Handbook of Employee Selection

HANDBOOK OF EMPLOYEE SELECTION

“SIOP emphasizes the scientist-practitioner model. Who exemplifies this model more in the field
of selection than Drs. Farr and Tippins? The new Alliance for Organizational Psychology includes
EAWOP, Division 1 of IAAP, and SIOP members. This book includes the perspectives of authors
with a global view of selection. There is simply no more authoritative text on the planet than this one
for members of the Alliance.” - Gary P. Latham, Secretary of State Professor of Organizational
Psychology, Rotman School of Management, University of Toronto, and Past President of the
Canadian Psychological Association and the Society for Industrial and Organizational Psychology

“Farr and Tippins have assembled an impressive line-up of leading scholars in the field to present a
comprehensive and up-to-date treatment of employee selection. Broad in scope and rich in content,
the Handbook of Employee Selection offers an evidence-based perspective on the design,
implementation, and evaluation of selection systems in organizational contexts. The handbook also offers
an in-depth treatment of criterion development, important legal and ethical issues in employee
selection, and an in-depth discussion of the unique selection issues associated with various organizational
contexts, including the military, blue collar occupations, and multinational corporations. This book
is a must-read for practitioners and academics alike, and is positioned to have a major impact on
the field of employee selection.” - Lillian T. Eby, Professor of Psychology, Associate Editor,
Personnel Psychology, University of Georgia

“James L. Farr and Nancy T. Tippins have produced a ‘must-have’ handbook on employee selection
for anybody engaged in the selection and recruitment of talent at work, and for all students and
academics in HR and personnel/occupational psychology. It contains most of the international leading
lights in the field. It not only includes the state-of-the-art research in this arena, but also how this
can be translated into effective practice. A bookshelf in HR is not complete without this volume.”
- Cary L. Cooper, CBE, Distinguished Professor of Organizational Psychology and Health at
Lancaster University Management School, England

“Farr and Tippins’ Handbook of Employee Selection is an impressive compilation of chapters
written by leaders in the field covering a) psychometric issues, b) design, implementation, and
evaluation issues, and c) historical and legal contexts relating to selection of employees in organizations.
Chapters are, at the same time, sophisticated and readable. They summarize the state of the science
and practice to date, and provide signposts to the future of employee selection. Many of the
handbook’s chapters will be citation classics well into the 21st century. It’s on my bookshelf and should
be on yours too!” - Charles E. Lance, Associate Editor, Organizational Research Methods,
Professor of Psychology, The University of Georgia

“Farr and Tippins have assembled the definitive who’s who in employee selection.” - Milton D.
Hakel, Department of Psychology, Bowling Green State University
HANDBOOK OF EMPLOYEE SELECTION

EDITED BY

James L. Farr
Pennsylvania State University
University Park, PA

Nancy T. Tippins
Valtera Corporation
Greenville, SC

SECTION EDITORS

Walter C. Borman
John P. Campbell
David Chan
Leaetta Hough
Ann Howard
Rick Jacobs
P. Richard Jeanneret
Jerard F. Kehoe
Frank J. Landy
Kevin R. Murphy
Robert E. Ployhart
Elaine D. Pulakos
Ann Marie Ryan
Paul R. Sackett
Neal W. Schmitt
Ben Schneider
Routledge
Taylor & Francis Group
270 Madison Avenue
New York, NY 10016

Routledge
Taylor & Francis Group
27 Church Road
Hove, East Sussex BN3 2FA
© 2010 by Taylor and Francis Group, LLC
Routledge is an imprint of Taylor & Francis Group, an Informa business

Printed in the United States of America on acid-free paper


10 9 8 7 6 5 4 3 2 1

International Standard Book Number: 978-0-8058-6437-3 (Hardback)

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://
www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923,
978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Handbook of employee selection / editors, James L. Farr, Nancy T. Tippins.


p. cm.
Includes bibliographical references and index.
ISBN 978-0-8058-6437-3 (hardcover : alk. paper)
1. Employee selection. 2. Employee selection--Handbooks, manuals, etc. I. Farr, James L. II.
Tippins, Nancy Thomas, 1950-

HF5549.5.S38H36 2010
658.3’112--dc22 2009052677

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com
and the Psychology Press Web site at
http://www.psypress.com
To Diane and Mac, our best selection decisions, and to the too-numerous-to-name mentors,
colleagues, students, and clients who have informed our thinking about employee selection
throughout our careers. And to Frank J. Landy. In the last days of editing this book, we learned
of Frank’s death on January 12, 2010. We dedicate this book to Frank’s memory for his many
contributions to the science and practice of employee selection.
Contents
Preface.............................................................................................................................................xix
Editors .............................................................................................................................................xxi
Contributors ................................................................................................................................. xxiii

Chapter 1 Handbook of Employee Selection: An Introduction and Overview ............................. 1


James L. Farr and Nancy T. Tippins
Brief History of Science- and Data-Based Employee Selection ..................................1
Structure of Handbook of Employee Selection ............................................................ 3
References ....................................................................................................................6

PART 1 Foundations of Psychological Measurement and Evaluation Applied to Employee Selection
John P. Campbell and Frank J. Landy, Section Editors

Chapter 2 Reliability and Validity ................................................................................................ 9


Dan J. Putka and Paul R. Sackett
Overview .................................................................................................................... 10
Reliability ................................................................................................................... 11
Role of Measurement Models..................................................................................... 14
Estimation of Reliability ............................................................................................ 22
Emerging Perspectives on Measurement Error .......................................................... 36
Closing Thoughts on Reliability ................................................................................ 37
Concept of Validity .................................................................................................... 38
References .................................................................................................................. 45

Chapter 3 Validation Strategies for Primary Studies .................................................................. 51


Neal W. Schmitt, John D. Arnold, and Levi Nieminen
Nature of Validity ....................................................................................................... 51
Validation in Different Contexts ................................................................................ 53
Special Considerations in Criterion-Related Studies ................................................. 59
Concerns About the Quality of the Data: Cleaning the Data ....................................60
Modes of Decision-Making and the Impact on Utility and Adverse Impact ............. 61
Scientific or Long-Term Perspective: Limitations of Existing Primary
Validation Studies, Including the Current Meta-Analytic Database......................64
Conclusions ................................................................................................................ 67
References .................................................................................................................. 67


Chapter 4 Work Analysis ............................................................................................................ 73


Kenneth Pearlman and Juan I. Sanchez
Traditional Selection-Related Applications of Work Analysis .................................. 73
A Review of Major Work Analysis Methods and Approaches .................................. 75
Key Work Analysis Practice Issues ............................................................................ 83
Frontiers of Work Analysis: Emerging Trends and Future Challenges ..................... 88
Synopsis and Conclusions ..........................................................................................94
References ..................................................................................................................94

Chapter 5 Current Concepts of Validity, Validation, and Generalizability ................................99


Jerard F. Kehoe and Kevin R. Murphy
Converging Trends in Psychometrics—Toward a Unified Theory of Validity ........ 100
Reliability, Validity, and Generalizations From Test Scores.................................... 101
Validation Process: Linking Tests With Their Uses ................................................ 102
Generalizing Research-Based Inferences to Implementation Decisions ................. 108
References ................................................................................................................ 120

PART 2 Implementation and Management of Employee Selection Systems in Work Organizations
Jerard F. Kehoe and Robert E. Ployhart, Section Editors

Chapter 6 Attracting Job Candidates to Organizations ............................................................ 127


Ann Marie Ryan and Tanya Delany
Reaching Potential Applicants ................................................................................. 127
Sourcing Globally..................................................................................................... 130
Technology as a Means of Improving Reach ........................................................... 133
Considering Strategic Talent Management ............................................................. 134
Maintaining Interest ................................................................................................. 136
Maintaining Interest Around the Globe ................................................................... 139
Technology’s Role in Maintaining Interest .............................................................. 140
Maintaining the Interest of Targeted Talent ............................................................. 141
Accepting Offers ...................................................................................................... 142
Conclusions .............................................................................................................. 145
References ................................................................................................................ 146

Chapter 7 Test Administration and the Use of Test Scores ...................................................... 151
Jeff W. Johnson and Frederick L. Oswald
Use of Test Scores .................................................................................................... 151
Decisions to Make Before Collecting Test Scores ................................................... 151
Collection of Test Scores .......................................................................................... 153
Computation of Test Scores...................................................................................... 159

Making Selection Decisions ..................................................................................... 163


Conclusions .............................................................................................................. 166
References ................................................................................................................ 166

Chapter 8 Technology and Employee Selection........................................................................ 171


Douglas H. Reynolds and David N. Dickter
Introduction: The Changing Relationship Between Industrial-Organizational
Psychology and Information Technology ............................................................. 171
Critical Issues in Technology-Based Selection ........................................................ 176
Implementation Issues .............................................................................................. 186
Future Directions...................................................................................................... 189
References ................................................................................................................ 192

Chapter 9 Strategy, Selection, and Sustained Competitive Advantage..................................... 195


Robert E. Ployhart and Jeff A. Weekley
Why Personnel Selection Must Show Business-Unit-Level Value........................... 196
SHRM ...................................................................................................................... 197
Alignment of Selection and Strategy .......................................................................200
Selection’s Potential Contribution to Business Unit Value....................................... 203
Selection’s Symbiotic Relationship With Other HR Activities ................................208
Conclusions ..............................................................................................................209
Acknowledgments ....................................................................................................209
References ................................................................................................................ 210

Chapter 10 Managing Sustainable Selection Programs .............................................................. 213


Jerard F. Kehoe, Stefan T. Mol, and Neil R. Anderson
Organization Context for Selection .......................................................................... 213
Defining Selection System Sustainability ................................................................ 214
HR Technology......................................................................................................... 225
Conclusions .............................................................................................................. 233
References ................................................................................................................ 233

Chapter 11 The Business Value of Employee Selection ............................................................. 235


Wayne F. Cascio and Lawrence Fogli
Traditional Model of Employee Selection ................................................................ 235
Challenges to the Business Value of the Traditional Approach ............................... 236
Dynamic, Contemporary Approach to Selection as an Organizational
Process ........................................................................................................237
Importance of Social Context and Interpersonal Processes in Selection
Decisions ............................................................................................................ 240
Conclusions: How Should We Value the Success of Selection in
Organizations? ................................................................................................248
References ................................................................................................................ 250

PART 3 Categories of Individual Difference Constructs for Employee Selection
David Chan and Leaetta Hough, Section Editors

Chapter 12 Cognitive Abilities.................................................................................................... 255


Deniz S. Ones, Stephan Dilchert, Chockalingam Viswesvaran,
and Jesús F. Salgado
History, Current Usage, and Acceptability of Cognitive Ability Measures
in Employee Selection .......................................................................................... 255
Definitions and Theoretical Underpinnings ............................................................. 258
Structure ................................................................................................................... 258
Measurement ............................................................................................................ 259
Criterion-Related Validity Evidence ........................................................................ 261
Group Differences on Cognitive Ability Measures..................................................264
Future Challenges for Research and Practice .......................................................... 269
Epilogue.................................................................................................................... 270
References ................................................................................................................ 270

Chapter 13 Physical Performance Tests ...................................................................................... 277


Deborah L. Gebhardt and Todd A. Baker
Job Analysis for Arduous Jobs ................................................................................. 277
Physical Performance Tests ...................................................................................... 281
Validity of Physical Performance Tests.................................................................... 285
Test Scoring and Administration ............................................................................. 287
Legal Issues .............................................................................................................. 291
Benefits of Physical Testing...................................................................................... 294
References ................................................................................................................ 294

Chapter 14 Personality: Its Measurement and Validity for Employee Selection ........................ 299
Leaetta Hough and Stephan Dilchert
Structure of Personality Variables ........................................................................... 299
Measurement Methods ............................................................................................. 301
Validity of Personality Constructs and Other Factors That Affect
Their Usefulness...................................................................................................308
Conclusions .............................................................................................................. 312
References ................................................................................................................ 312

Chapter 15 Values, Styles, and Motivational Constructs ............................................................ 321


David Chan
Values ....................................................................................................................... 321
Cognitive Styles........................................................................................................ 324
Motivational Constructs ........................................................................................... 327
Practical Considerations and Future Research Challenges ...................................... 332
Epilogue.................................................................................................................... 335
References ................................................................................................................ 336

Chapter 16 Practical Intelligence, Emotional Intelligence, and Social Intelligence .................. 339
Filip Lievens and David Chan
Definitions and Conceptualizations ......................................................................... 339
Measurement Approaches ........................................................................................344
Conceptual Framework for Examining Practical, Emotional, and
Social Intelligence ................................................................................................ 349
Strategies for Future Research ................................................................................. 351
Epilogue.................................................................................................................... 354
References ................................................................................................................ 355

PART 4 Decisions in Developing, Selecting, Using, and Evaluating Predictors
Ann Marie Ryan and Neal W. Schmitt, Section Editors

Chapter 17 Decisions in Developing and Selecting Assessment Tools ...................................... 363


Nancy T. Tippins, Jone M. Papinchock, and Emily C. Solberg
Introduction .............................................................................................................. 363
Which Constructs Should Be Measured?.................................................................364
How Should the Constructs Be Measured? .............................................................. 366
How Should Validity Evidence Be Gathered? ......................................................... 371
How Should Scores Be Used? .................................................................................. 373
Conclusions .............................................................................................................. 375
References ................................................................................................................ 375

Chapter 18 Administering Assessments and Decision-Making ................................................. 377


R. Stephen Wunder, Lisa L. Thomas, and Zupei Luo
Unproctored Internet Testing ................................................................................... 377
Combination of Candidate Information: The Role of the Decision-Maker ............. 390
Conclusions .............................................................................................................. 396
References ................................................................................................................ 396

Chapter 19 Evaluation of Measures: Sources of Error, Sufficiency, and Contamination ........... 399
Michael J. Zickar, Jose M. Cortina, and Nathan T. Carter
Reliability .................................................................................................................400
Validity .....................................................................................................................402
Sources of Invariance: A Handful of Hypotheses....................................................407
CFA MI/E in Selection Research ............................................................................. 411
Conclusions .............................................................................................................. 413
References ................................................................................................................ 414

Chapter 20 Assessment Feedback ............................................................................................... 417


Manuel London and Lynn A. McFarland
Some Case Examples ............................................................................................... 417

Feedback and Professional Standards ...................................................................... 419


Applicants’ Reactions to Feedback .......................................................................... 420
Benefits and Costs of Feedback................................................................................ 426
Implications for Practice .......................................................................................... 431
Implications for Research......................................................................................... 433
Conclusions .............................................................................................................. 434
References ................................................................................................................ 434

PART 5 Criterion Constructs in Employee Selection
Kevin R. Murphy and Elaine D. Pulakos, Section Editors

Chapter 21 The Measurement of Task Performance as Criteria in Selection Research .................. 439
Walter C. Borman, Rebecca H. Bryant, and Jay Dorio
Objective Criteria ..................................................................................................... 439
Subjective Criteria .................................................................................................... 441
Dimensionality of Job Performance ......................................................................... 447
Predictors of Task Performance Dimensions ........................................................... 454
References ................................................................................................................ 458

Chapter 22 Adaptive and Citizenship-Related Behaviors at Work ............................................. 463


David W. Dorsey, Jose M. Cortina, and Joseph Luchman
Conceptualization..................................................................................................... 463
Adaptive Behavior Defined ...................................................................................... 463
Citizenship Defined ..................................................................................................465
Individual Difference Predictors ..............................................................................468
Measurement ............................................................................................................ 474
Moderators and Mediators—Adaptability ............................................................... 476
Moderators and Mediators—Citizenship ................................................................. 477
Impact on Organizational Outcomes ....................................................................... 479
Too Much of a Good Thing? ....................................................................................480
Conclusions .............................................................................................................. 481
References ................................................................................................................ 482

Chapter 23 Counterproductive Work Behavior and Withdrawal ................................................ 489


Maria Rotundo and Paul E. Spector
Nature of CWB ........................................................................................................ 489
Assessment of CWB................................................................................................. 491
Potential Antecedents of CWB ................................................................................ 492
Environment ............................................................................................................. 499
Person and Environment .......................................................................................... 502
Consequences ........................................................................................................... 503
Future Directions...................................................................................................... 505
Conclusions ..............................................................................................................506
References ................................................................................................................506

Chapter 24 Defining and Measuring Results of Workplace Behavior ........................................ 513


Elaine D. Pulakos and Ryan S. O’Leary
Measuring Workplace Behavior Versus Results ...................................................... 514
Defining Individual Performance Objectives ........................................................... 516
Challenges Associated With Developing Individual Objectives and
Mitigation Strategies ............................................................................................ 519
Measuring Results of Performance Objectives ........................................................ 522
Challenges Associated With Measuring Results and Mitigation Strategies ............ 524
Individual Difference Predictors of Results ............................................................. 526
Conclusions .............................................................................................................. 527
References ................................................................................................................ 528

Chapter 25 Employee Work-Related Health, Stress, and Safety ................................................ 531


Lois E. Tetrick, Pamela L. Perrewé, and Mark Griffin
Healthy Workers ....................................................................................................... 531
Work Stress............................................................................................................... 535
Occupational Safety ................................................................................................. 541
Conclusions .............................................................................................................. 545
References ................................................................................................................546

Chapter 26 Criterion Validity and Criterion Deficiency: What We Measure
Well and What We Ignore ........................................................................ 551
Jeanette N. Cleveland and Adrienne Colella
Work and Workforce in the 21st Century: Outmoded Assumptions and
Bases for Change .................................................................................................. 552
Criterion Problem in I-O Psychology ....................................................................... 552
Multilevel Issues in Defining Performance and Success ......................................... 558
“Closing In” on Criterion Deficiency: One Approach to Bridging HR
Systems With Business Unit Strategy .................................................................. 562
Conclusions .............................................................................................................. 563
References ................................................................................................................564

PART 6 Legal and Ethical Issues in Employee Selection
P. Richard Jeanneret and Paul R. Sackett, Section Editors

Chapter 27 Ethics of Employee Selection ................................................................................... 571


Joel Lefkowitz and Rodney L. Lowman
Some Meta-Issues..................................................................................................... 571
Ethical Principles and Dilemmas ............................................................................. 576
Role of Ethical Codes in Professional Practice: Historical and Current
Perspectives .......................................................................................................... 580
Some Specific Issues and Sources of Ethical Problems ........................................... 582
Conclusions .............................................................................................................. 587
References ................................................................................................................ 589

Chapter 28 Professional Guidelines/Standards ........................................................................... 593


P. Richard Jeanneret and Sheldon Zedeck
Introduction ............................................................................................................. 593
Standards for Educational and Psychological Testing ........................................... 595
Principles for the Validation and Use of Personnel Selection Procedures .............604
Uniform Guidelines on Employee Selection Procedures......................................... 610
Comparisons Among the Three Authorities ............................................................ 615
Final Thoughts ......................................................................................................... 621
Conclusions .............................................................................................................. 621
References ................................................................................................................ 623

Chapter 29 A Sampler of Legal Principles in Employment Selection ........................................ 627


Frank J. Landy, Arthur Gutman, and James L. Outtz
Introduction .............................................................................................................. 627
Principles and Exemplar Case Law.......................................................................... 629
Conclusions ..............................................................................................................648
References ................................................................................................................648

Chapter 30 Perspectives From Twenty-Two Countries on the Legal Environment
for Selection .............................................................................. 651
Paul R. Sackett, Winny Shen, Brett Myors, and Colleagues
Data Collection Methodology .................................................................................. 652
Discussion................................................................................................................. 673
Authors’ Note ........................................................................................................... 675
References ................................................................................................................ 675

PART 7 Employee Selection in Specific Organizational Contexts
Rick Jacobs and Ann Howard, Section Editors

Chapter 31 Selection and Classification in the U.S. Military ..................................................... 679


Wayne S. Sellman, Dana H. Born, William J. Strickland, and Jason J. Ross
Military Personnel System ....................................................................................... 679
Indicators of Recruit Quality ....................................................................................680
Need for Military Selection ...................................................................................... 685
Short History of Military Personnel Testing (Pre-All Volunteer Force)................... 686
Moving to an All-Volunteer Force ............................................................................ 686
ASVAB Misnorming and Job Performance Measurement Project .......................... 688
Enlisted Selection and Classification in Today’s Military ........................................ 691
Enlistment Process ................................................................................................... 692
Recruit Quality Benchmarks and Enlistment Standards .......................................... 693
Selection for Officer Commissioning Programs ...................................................... 693
Officer Retention and Attrition ................................................................................. 695
Officer Executive Development................................................................................ 696

Command Selection and Career Broadening Experiences ....................................... 696


Defense Transformation in Military Selection ......................................................... 697
Conclusions .............................................................................................................. 698
References ................................................................................................................ 699

Chapter 32 Public Sector Employment ....................................................................................... 705


Rick Jacobs and Donna L. Denning
Position Classification in the Public Sector .............................................................. 706
Civil Service Examinations...................................................................................... 706
Linking Tests to Jobs: Validation and Its Many Forms............................................ 708
Creating a Talent Pipeline: Recruiting Candidates .................................................. 711
Promotional Processes: Using What We Know About People and Their
Capabilities to Our Advantage ............................................................................. 712
Personnel Decision-Making and Legal Jeopardy..................................................... 714
Conclusions .............................................................................................................. 718
References ................................................................................................................ 719

Chapter 33 Selection Methods and Desired Outcomes: Integrating Assessment Content and
Technology to Improve Entry- and Mid-Level Leadership Performance ............................. 721
Scott C. Erker, Charles J. Cosentino, and Kevin B. Tamanini
Current Business Trends Affecting Leadership Selection and Development .......... 722
Changing Behavioral Requirements for Leaders ..................................................... 725
Assessment Tools and Techniques ........................................................................... 728
Case Studies of Leadership Selection ...................................................................... 733
Conclusions .............................................................................................................. 737
References ................................................................................................................ 738

Chapter 34 Blue-Collar Selection in Private Sector Organizations ............................................ 741


Wanda J. Campbell and Robert A. Ramos
Definition of Blue-Collar Jobs ................................................................................. 741
Environment ............................................................................................................. 741
Planning and Developing the Selection Process: Psychometric and
Practical Considerations ....................................................................................... 744
Planning and Developing the Selection Process: The Constituents,
Their Issues, and Preferences ............................................................................... 749
Implementing a Selection Procedure ....................................................................... 754
Maintaining a Selection Procedure .......................................................................... 756
Recruitment and Employee Development ................................................................ 759
References ................................................................................................................ 762

Chapter 35 Selection for Service and Sales Jobs ........................................................................ 765


John P. Hausknecht and Angela M. Langevin
Nature of Service and Sales Work ........................................................................... 765

Research on Selection for Service and Sales Workers ............................................. 768


Implications for Practice and Future Research ........................................................ 774
Conclusions .............................................................................................................. 776
References ................................................................................................................ 776

Chapter 36 Selection in Multinational Organizations ................................................................ 781


Paula Caligiuri and Karen B. Paul
Employee Selection in Multinational Organizations ............................................... 781
Challenges of Cross-National Selection and Assessment ........................................ 785
International Assignments: Implications for Employee Selection ........................... 789
Conclusions .............................................................................................................. 795
References ................................................................................................................ 796

Chapter 37 Selection for Team Membership: A Contingency and Multilevel Perspective......... 801
Susan Mohammed, Jan Cannon-Bowers, and Su Chuen Foo
Conceptual Framework for Understanding Selection for Team Membership..........802
Individual-Level Considerations ..............................................................................804
Team-Level Considerations ......................................................................................809
Discussion................................................................................................................. 815
Conclusions .............................................................................................................. 817
References ................................................................................................................ 818

Chapter 38 Selecting Leaders: Executives and High Potentials ................................................. 823


George C. Thornton III, George P. Hollenbeck, and Stefanie K. Johnson
Executives and High Potentials: Who Are They? What Do They Do?.................... 823
Executive Competencies and Attributes...................................................................824
Assessment Techniques ............................................................................................ 826
Executive Selection in an HRM System .................................................................. 830
Performance Reviews: The Role of Top Executives and Boards ............................. 830
Fit: Individuals, Team, Outcome, Followers, and Culture ....................................... 831
Does It Work? ........................................................................................................... 833
Conclusions .............................................................................................................. 834
References ................................................................................................................ 835

PART 8 Milestones in Employee Selection
Walter C. Borman and Benjamin Schneider, Section Editors

Chapter 39 The Management Progress Study and Its Legacy for Selection............................... 843
Ann Howard
The Foundations of the Assessment Center ............................................................. 843
The Beginnings of Managerial Assessment ............................................................. 845
Predicting Managerial Success ................................................................................ 847
Successful Managers’ Development Over Time ...................................................... 852
AC Advantages and Disadvantages.......................................................................... 855

The MPS Selection Legacy ...................................................................................... 858


References ................................................................................................................ 863

Chapter 40 Project A: 12 Years of R & D................................................................................... 865


John P. Campbell and Deirdre J. Knapp
Origins of Project A ................................................................................................. 865
Enabling of Project A ...............................................................................................866
Specific Research Objectives ................................................................................... 867
Overall Research Design .......................................................................................... 867
Research Instrument Development: Predictors ........................................................ 869
Job Analyses and Criterion Development ................................................................ 873
Modeling the Latent Structure of Performance ....................................................... 875
Correlations of Past Performance With Future Performance .................................. 878
Criterion-Related Validation .................................................................................... 879
Some Broader Implications ...................................................................................... 882
Conclusions ..............................................................................................................884
References ................................................................................................................ 885

Chapter 41 The Dictionary of Occupational Titles and the Occupational
Information Network ................................................................................ 887
Norman Peterson and Christopher E. Sager
Milestones ................................................................................................................ 887
Organizing Framework ............................................................................................ 887
DOT .......................................................................................................................... 889
DOT to O*NET™ Transition ................................................................................... 894
Development and Testing of a Prototype O*NET .................................................... 895
O*NET Becomes Operational.................................................................................. 899
Conclusions ..............................................................................................................905
References ................................................................................................................906

Chapter 42 Situational Specificity and Validity Generalization .................................................909


Lawrence R. James and Heather H. McIntyre
Situational Specificity .............................................................................................. 910
VG ............................................................................................................................ 912
Situational Specificity Response to VG ................................................................... 914
Substantive Tests of Situational Specificity.............................................................. 917
Conclusions .............................................................................................................. 918
References ................................................................................................................ 919

Chapter 43 Employee Selection in Europe: Psychotechnics and the
Forgotten History of Modern Scientific Employee Selection .................................. 921
Jesús F. Salgado, Neil R. Anderson, and Ute R. Hülsheger
Introduction .............................................................................................................. 921
The European Origin of Scientific Employee Selection: The Years of Success
(1900–1945) .......................................................................................................... 921
The Fall of the European Employee Selection Enterprise After
World War II: The Years of Decline and Stagnation (1945–1975) ...................... 931
The Resurgence of European Employee Selection: The Years of
Optimism and Growth (Since 1975)..................................................................... 934
Conclusions .............................................................................................................. 936
Acknowledgments .................................................................................................... 937
References ................................................................................................................ 937

Chapter 44 Employee Selection: Musings About Its Past, Present, and Future.......................... 943
Robert M. Guion
Psychometric Musings.............................................................................................. 943
Musings on the Perpetual Disconnect: Research and Practice ................................ 950
Is Education Really Broadening? ............................................................................. 954
A Final Musing......................................................................................................... 956
References ................................................................................................................ 956
Author Index................................................................................................................................. 959
Subject Index ................................................................................................................................ 983
Preface
Industrial and organizational psychologists have long studied employee selection. For more than
100 years, psychologists have identified new processes for developing and evaluating selection
instruments and created new predictors and criteria for a wide range of jobs. At the same time, the
organizational environments in which selection tools are used have continued to evolve. Because the
literature on selection is massive, this handbook can only summarize the important ideas that influence
selection efforts today. Undoubtedly, some important ideas have been omitted and many others are yet
to be developed.
This handbook was designed to cover the current thinking on not only basic concepts in employee
selection but also specific applications of those concepts in various organizational settings. In addition,
the book looks backward at the milestones that have been passed in employee selection in the United
States and in Europe and considers the future direction of selection in the final chapter. Chapter 1
provides an overview of the entire volume. Throughout the book, we have encouraged authors to (a)
balance the treatment of scientific (i.e., research findings and theory) and practical concerns related
to implementation and operational use and (b) take a global perspective that reflects the concerns of
multinational corporations and cross-cultural differences in testing practices and applicant skill.
Our hope for this handbook is that it serves as a reference for the informed reader possessing an
advanced degree in industrial and organizational psychology, human resource management, and
other related fields, as well as for graduate students in these fields. Because the intended audience
for the handbook is professionals who work in the area of employee selection in academic and
professional settings, a conscientious effort has been made to include the latest scientific thought and
the best practices in application.
Handbooks of this size are not published without the help of many people. We are particularly
grateful for the contributions of the 128 authors who wrote these 44 chapters. Without their
expertise and willingness to share their professional and scientific knowledge, there would be no
handbook. Similarly, we also attribute the existence of this handbook to the 16 section editors (John
P. Campbell, Frank J. Landy, Jerard F. Kehoe, Robert E. Ployhart, David Chan, Leaetta Hough, Ann
Marie Ryan, Neal W. Schmitt, Kevin R. Murphy, Elaine D. Pulakos, P. Richard Jeanneret, Paul R.
Sackett, Rick Jacobs, Ann Howard, Walter C. Borman, and Benjamin Schneider) of the eight major
parts of the handbook, who not only helped us refine our outlines and identify authors but also
guided the authors of each chapter in their respective parts.
Anne Duffy, Senior Editor at Psychology Press/Routledge of the Taylor and Francis Group,
deserves a special acknowledgement for initially cajoling us with good humor and positive affect
into taking this project on and then guiding and encouraging us throughout the process, as do Erin
Flaherty for handling the administrative duties for Taylor and Francis and Betsy Saiani of Valtera
for keeping us organized.
There are innumerable teachers, mentors, colleagues, and friends who have taught us much about
employee selection and gotten us to this point in our careers. We are appreciative of their support
and encouragement and their indirect contribution to this handbook. Of course, our respective
families have made a significant contribution to this book through their encouragement, support,
patience, and tolerance. Thank you, Diane and Mac.

James L. Farr
State College, Pennsylvania

Nancy T. Tippins
Greenville, South Carolina

Editors
James L. Farr
Dr. James L. Farr received his PhD in industrial and organizational psychology from the University
of Maryland. Since 1971 he has been on the faculty of Pennsylvania State University, where he is
currently professor of psychology. He has also been a visiting scholar at the University of Sheffield
(United Kingdom), the University of Western Australia, the Chinese University of Hong Kong, and
the University of Giessen (Germany). His primary research interests are performance appraisal
and feedback, personnel selection, the older worker, and innovation and creativity in work set-
tings. Dr. Farr is the author or editor of over 80 publications in professional journals and books,
including The Measurement of Work Performance (with Frank J. Landy; Academic Press, 1983),
Innovation and Creativity at Work: Psychological and Organizational Strategies (coedited with
Michael West; John Wiley & Sons, 1990), and Personnel Selection and Assessment: Individual and
Organizational Perspectives (coedited with Heinz Schuler and Mike Smith; Lawrence Erlbaum
Associates, 1993). He was the editor of Human Performance from 2000–2006 and has been a
member of the editorial boards of numerous other professional journals, including Journal of Applied
Psychology, Organizational Behavior and Human Decision Processes, Journal of Occupational
and Organizational Psychology, and Journal of Business and Psychology. Active in a number of
professional organizations, Dr. Farr was president of the Society for Industrial and Organizational
Psychology (SIOP) in 1996–97 and has served in a variety of other positions for SIOP. He was an
elected member of the Board of Representatives for the American Psychological Association (APA)
from 1993–1996 and 2002–2004, representing SIOP. He is an elected fellow of SIOP and APA.
A strong believer in the scientist-practitioner model for industrial/organizational (I/O) psychology,
Dr. Farr was a winner of SIOP’s 1980 James McKeen Cattell Award for Research Design (with
Frank J. Landy and Rick Jacobs) and its 1998 M. Scott Myers Award for Applied Research in the
Workplace (with Frank J. Landy, Edwin Fleishman, and Robert Vance). In 2001 he was the winner
of SIOP’s Distinguished Service Award.

Nancy T. Tippins
Dr. Nancy T. Tippins is a senior vice president and managing principal of Valtera Corporation,
where she is responsible for the development and execution of firm strategies related to employee
selection and assessment. She has extensive experience in the development and validation of
selection tests and other forms of assessment, including performance appraisals for all levels of
management and hourly employees as well as in designing performance management programs
and leadership development programs. Prior to joining Valtera, Dr. Tippins worked as an internal
consultant in large Fortune 100 companies (e.g., Exxon, Bell Atlantic, and GTE) developing and
validating selection and assessment tools.
Dr. Tippins is active in professional affairs, including SIOP, where she has served as Chair of
the Committee on Committees, Secretary, Member at Large, and President (2000–2001). In addi-
tion, she served on SIOP’s Ad Hoc Committee on the Revision of the Principles for the Validation
and Use of Personnel Selection Procedures. She is currently the Secretary of the SIOP Foundation.
She has been SIOP’s representative to the APA’s Council of Representatives and served on the
APA’s Board of Professional Affairs. She is currently a commissioner on the Commission for the
Recognition of Specialties and Proficiencies in Professional Psychology and a member of the Joint
Committee for the revision of the Standards for Educational and Psychological Testing. She is on


the editorial boards of the Journal of Applied Psychology and Personnel Psychology, where she was
formerly the associate editor of the scientist-practitioner forum of Personnel Psychology. Dr. Tippins
is a fellow of SIOP, the APA, and the American Psychological Society (APS).
Dr. Tippins received MS and PhD degrees in industrial and organizational psychology from the
Georgia Institute of Technology. She holds an MEd in counseling and psychological services from
Georgia State University and a BA in history from Agnes Scott College.
Contributors
Herman Aguinis, University of Colorado Denver, Denver, Colorado
Neil R. Anderson, Faculty of Economics and Business, University of Amsterdam, Amsterdam, The Netherlands
John D. Arnold, Applied Psychology and Organizational Research Group, Wayne State University, Detroit, Michigan
Todd A. Baker, Human Performance Systems, Inc., Alburtis, Pennsylvania
Peter Bamberger, Technion-Israel Institute of Technology, Haifa, Israel
Mahmut Bayazit, Sabanci University, Istanbul, Turkey
Marilena Bertolino, University of Trento, Trento, Italy
Walter C. Borman, Department of Psychology, University of South Florida, Tampa, Florida
Dana H. Born, United States Air Force Academy, USAFA, Colorado
Rebecca H. Bryant, Department of Psychology, University of South Florida, Tampa, Florida
Paula Caligiuri, Rutgers University, New Brunswick, New Jersey
John P. Campbell, Department of Psychology, University of Minnesota, Minneapolis, Minnesota
Wanda J. Campbell, Edison Electric Institute, Clarksville, Maryland
Jan Cannon-Bowers, Institute for Simulation & Training, Digital Media Department, University of Central Florida, Orlando, Florida
Nathan T. Carter, Department of Psychology, Bowling Green State University, Bowling Green, Ohio
Wayne F. Cascio, The Business School, University of Colorado Denver, Denver, Colorado
David Chan, School of Social Sciences, Singapore Management University, Republic of Singapore
Oleksandr Chernyshenko, University of Canterbury, Christchurch, New Zealand
Aichia Chuang, National Taiwan University, Taipei, Taiwan
Jeanette N. Cleveland, Department of Psychology, Pennsylvania State University, University Park, Pennsylvania
Adrienne Colella, A. B. Freeman School of Business, Tulane University, New Orleans, Louisiana
Mark Cook, University of Wales, Wales, United Kingdom
Jose M. Cortina, Department of Psychology, George Mason University, Fairfax, Virginia
Charles J. Cosentino, Development Dimensions International, Inc., Bridgeville, Pennsylvania
Steven F. Cronshaw, University of Northern British Columbia, Prince George, British Columbia, Canada
Tanya Delany, Global Testing and Assessment, IBM, San Diego, California
Donna L. Denning, City of Los Angeles, Los Angeles, California
David N. Dickter, Talent Assessment, PSI, Burbank, California
Stephan Dilchert, Zicklin School of Business, Baruch College, City University of New York, New York, New York
Jay Dorio, Department of Psychology, University of South Florida, Tampa, Florida
David W. Dorsey, Personnel Decisions Research Institute, Arlington, Virginia
Paul Englert, OPRA Consulting Group, Wellington, New Zealand
Scott C. Erker, Selection Solutions Group, Development Dimensions International, Inc., Bridgeville, Pennsylvania
Arne Evers, University of Amsterdam, Amsterdam, The Netherlands
Lawrence Fogli, People Focus, Inc. (a company in the Assessio Group), Pleasant Hill, California
Su Chuen Foo, Department of Psychology, Pennsylvania State University, University Park, Pennsylvania
Franco Fraccaroli, University of Trento, Trento, Italy
Andreas Frintrup, HR Diagnostics, Stuttgart, Germany
Deborah L. Gebhardt, Human Performance Systems, Inc., Beltsville, Maryland
Mark Griffin, Institute of Work Psychology, University of Sheffield, Sheffield, United Kingdom
Robert M. Guion, Department of Psychology, Bowling Green State University, Bowling Green, Ohio
Arthur Gutman, College of Psychology and Liberal Arts, Florida Institute of Technology, Melbourne, Florida
John P. Hausknecht, Department of Human Resource Studies, Cornell University, Ithaca, New York
George P. Hollenbeck, Hollenbeck Associates, Livingston, Texas
Leaetta Hough, Dunnette Group, Ltd., St. Paul, Minnesota
Ann Howard, Development Dimensions International, Inc., Bridgeville, Pennsylvania
Ute R. Hülsheger, Department of Work and Organizational Psychology, Maastricht University, Maastricht, The Netherlands
Rick Jacobs, Department of Psychology, Pennsylvania State University, State College, Pennsylvania
Lawrence R. James, School of Psychology, Georgia Institute of Technology, Atlanta, Georgia
P. Richard Jeanneret, Valtera Corporation, Houston, Texas
Jeff W. Johnson, Personnel Decisions Research Institutes, Minneapolis, Minnesota
Stefanie K. Johnson, Department of Psychology, Colorado State University, Fort Collins, Colorado
Tina Joubert, SHL, Groenkloof, South Africa
Jerard F. Kehoe, Selection and Assessment Consulting, Olympia, Washington
Deirdre J. Knapp, Human Resources Research Organization, Alexandria, Virginia
Cornelius J. König, Universität Zürich, Zurich, Switzerland
Hennie J. Kriek, SHL and University of South Africa, Pretoria, South Africa
Frank J. Landy, Landy Litigation Support Group, New York, New York
Angela M. Langevin, Department of Human Resource Studies, Cornell University, Ithaca, New York
Joel Lefkowitz, Baruch College, City University of New York, New York, New York
Filip Lievens, Department of Personnel Management and Work and Organizational Psychology, Ghent University, Ghent, Belgium
Manuel London, College of Business, State University of New York at Stony Brook, Stony Brook, New York
Rodney L. Lowman, Marshall Goldsmith School of Management, Alliant International University, San Diego, California
Joseph Luchman, George Mason University, Fairfax, Virginia
Zupei Luo, Strategic Resources Department, State Farm Insurance Companies, Bloomington, Illinois
Marco Mariani, University of Bologna, Bologna, Italy
Lynn A. McFarland, Department of Psychology, Clemson University, Clemson, South Carolina
Heather H. McIntyre, School of Psychology, Georgia Institute of Technology, Atlanta, Georgia
Antonio Mladinic, Pontificia Universidad Catolica de Chile, Santiago, Chile
Susan Mohammed, Department of Psychology, Pennsylvania State University, University Park, Pennsylvania
Stefan T. Mol, Amsterdam Business School, University of Amsterdam, Amsterdam, The Netherlands
Kevin R. Murphy, Department of Psychology, Pennsylvania State University, University Park, Pennsylvania
Brett Myors, Griffith University, Brisbane, Australia
Levi Nieminen, Wayne State University, Detroit, Michigan
Ioannis Nikolaou, Athens University of Economics and Business, Athens, Greece
Ryan S. O'Leary, PDRI, a PreVisor Company, Arlington, Virginia
Deniz S. Ones, Department of Psychology, University of Minnesota, Minneapolis, Minnesota
Betty Onyura, University of Guelph, Ontario, Canada
Frederick L. Oswald, Department of Psychology, Rice University, Houston, Texas
James L. Outtz, Outtz and Associates, Washington, DC
Jone M. Papinchock, Valtera Corporation, Greenville, South Carolina
Karen B. Paul, 3M, St. Paul, Minnesota
Kenneth Pearlman, Creative Personnel Management Consulting, Sarasota, Florida
Pamela L. Perrewé, College of Business, Florida State University, Tallahassee, Florida
Norman Peterson, SPR Center, Inc., Minneapolis, Minnesota
Robert E. Ployhart, Department of Management, University of South Carolina, Columbia, South Carolina
Elaine D. Pulakos, PDRI, a PreVisor Company, Arlington, Virginia
Dan J. Putka, Human Resources Research Organization, Alexandria, Virginia
Shabu B. Raj, Defence Institute of Psychological Research, Delhi, India
Robert A. Ramos, Edison Electric Institute, Arlington, Virginia
Douglas H. Reynolds, Assessment Technology Group, Development Dimensions International, Inc., Bridgeville, Pennsylvania
Viviana Rodríguez, Pontificia Universidad Catolica de Chile, Santiago, Chile
Florence Rolland, Université de Nice-Sophia Antipolis, Nice, France
Jason J. Ross, United States Air Force Academy, USAFA, Colorado
Maria Rotundo, Joseph L. Rotman School of Management, University of Toronto, Toronto, Ontario, Canada
Ann Marie Ryan, Department of Psychology, Michigan State University, East Lansing, Michigan
Paul R. Sackett, Department of Psychology, University of Minnesota, Minneapolis, Minnesota
Christopher E. Sager, Department of Psychology, University of Central Florida, Orlando, Florida
Jesús F. Salgado, Department of Social and Basic Psychology, University of Santiago de Compostela, Santiago de Compostela, Spain
Juan I. Sanchez, Department of Management and International Business, Florida International University, Miami, Florida
Neal W. Schmitt, Department of Psychology, Michigan State University, East Lansing, Michigan
Eveline Schollaert, Ghent University, Ghent, Belgium
Heinz Schuler, University of Hohenheim, Stuttgart, Germany
Tomoki Sekiguchi, Osaka University, Osaka, Japan
Wayne S. Sellman, Human Resources Research Organization, Alexandria, Virginia
Winny Shen, Department of Psychology, University of Minnesota, Minneapolis, Minnesota
Handan Kepir Sinangil, Marmara University, Istanbul, Turkey
Emily C. Solberg, Valtera Corporation, Rolling Meadows, Illinois
Paul E. Spector, Department of Psychology, University of South Florida, Tampa, Florida
Dirk D. Steiner, Université de Nice-Sophia Antipolis, Nice, France
William J. Strickland, Human Resources Research Organization, Alexandria, Virginia
S. Subramony, Defence Institute of Psychological Research, Delhi, India
Kevin B. Tamanini, Development Dimensions International, Inc., Bridgeville, Pennsylvania
Lois E. Tetrick, Department of Psychology, George Mason University, Fairfax, Virginia
Lisa L. Thomas, Strategic Resources Department, State Farm Insurance Companies, Bloomington, Illinois
Larissa A. Thommen, Universität Zürich, Zurich, Switzerland
George C. Thornton III, Department of Psychology, Colorado State University, Fort Collins, Colorado
Maria Tomprou, Athens University of Economics and Business, Athens, Greece
Shay Tzafrir, University of Haifa, Haifa, Israel
Hyuckseung Yang, Yonsei University, Seoul, South Korea
Greet Van Hoye, Ghent University, Ghent, Belgium
Chockalingam Viswesvaran, Department of Psychology, Florida International University, Miami, Florida
Jeff A. Weekley, Kenexa, Frisco, Texas
R. Stephen Wunder, Strategic Resources Department, State Farm Insurance Companies, Bloomington, Illinois
Sheldon Zedeck, Department of Psychology, University of California at Berkeley, Kensington, California
Michael J. Zickar, Department of Psychology, Bowling Green State University, Bowling Green, Ohio

1 Handbook of Employee Selection

An Introduction and Overview

James L. Farr and Nancy T. Tippins

Employee selection is a major human resources function in work organizations. Employee selec-
tion refers to the process that employers use to make decisions concerning which individuals from
a group to choose for particular jobs or roles within the organization. Although frequently the job
candidates are not employed by the organization at the time selection decisions are made, employee
selection can also refer to cases in which current organizational employees may be candidates for
another job (e.g., a promotion to a higher organizational level or a lateral move to a different unit or
function). Guion (1998) has noted that the same selection logic also applies to other types of human
resource decisions, such as choosing which employees should be offered participation in skill or
competency development programs.
The chapters in Handbook of Employee Selection adopt the point of view that employment selec-
tion decisions should be based on information systematically obtained about job candidates and
that such candidate information has documented usefulness for predicting which candidates are
most likely to meet the objectives of the selection process. Selection objectives differ in their details
across organizations, job characteristics, and occasions but in general fall into three broad catego-
ries: (a) improving organizational effectiveness and efficiency; (b) complying with legal require-
ments and societal norms regarding equal employment opportunities related to candidates’ race,
ethnicity, gender, age, and disability status; and (c) helping individuals gain meaningful and sat-
isfying employment. Thus, the perspective regarding employee selection that is advocated in this
handbook is science- and data-based and consistent with the contemporary approach to organiza-
tional decision-making known as evidence-based management (e.g., Pfeffer & Sutton, 2006). This
perspective has evolved over the past century, and its history is briefly reviewed below.

BRIEF HISTORY OF SCIENCE- AND DATA-BASED EMPLOYEE SELECTION


Employee selection has been a major component of industrial-organizational (I-O) psychology1 since
its beginning as a subdiscipline within psychology near the beginning of the 20th century. Vinchur
(2007) provided a detailed discussion of the history of employee selection with particular emphasis
on the pre-1930 era, and we do not attempt to duplicate that effort. As Vinchur noted, employee
selection predates the existence of I-O psychology as a scientific and applied field. The growth of

1 What is now known as I-O psychology in North America, and work and organizational psychology in much of the rest
of the world, has had several names during its history; prior to 1973 the most common label was industrial psychology.
Throughout this handbook we generally use industrial-organizational (often abbreviated as I-O) psychology as the disci-
plinary name.


work organizations and paid employment necessitated that employers hire or select employees from
a larger group of potential workers (job applicants). Employee selection within work organizations
was largely informal, nonsystematic, and idiosyncratic before the involvement of I-O psychologists.
I-O psychology emphasized the empirical evaluation of the value of employee selection methods
that were based on principles of scientific research methodology being developed in experimental
psychology and on standardized measurement of individual differences (Vinchur, 2007). The basic
cornerstone of employee selection from the perspective of I-O psychology is that measures of the
abilities, knowledge, skills, and other characteristics of job applicants can be used to predict which
individuals will be (more) effective employees. Such measures (usually labeled as tests) are developed
systematically, based on job requirements, to maximize accuracy and minimize error in the assessment
of these individual differences among applicants.
The process of empirical evaluation of employee selection methods soon became known as
test validation, and by the 1920s the general components of an employee selection process and the
procedures for establishing the value of the selection process were much like what they are to
this day (Vinchur, 2007). For example, Kornhauser and Kingsbury (1924) described an initial job
analysis that led to the selection of existing tests or the creation of new tests to be administered to
job applicants. The scores on the selection tests were used to predict the individuals’ performance
on a criterion that represented success or proficiency on the job. Accuracy in the prediction of job
performance (criterion scores) was a measure of the validity of the test(s). Finally, the minimally
acceptable performance score was used to establish a passing or cut score on the test to determine
which applicants would be hired or rejected for the job. Bingham and Freyd (1926) addressed
various questions that should be asked (and answered) in advance of a validation investigation to
determine its scientific and practical feasibility. These issues concerned whether inadequate
current selection procedures were likely causes of performance deficiencies in the target job,
whether there were more job applicants than job openings, whether the number of projected hires
was adequate for scientific evaluation of test validity, whether an accurate and reliable performance
criterion was available, and whether the organizational members would be receptive to the validity
investigation.
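
To make the early validation logic just described concrete, the following minimal sketch (offered here purely for illustration; the data, variable names, and cut-score rule are hypothetical and are not drawn from Kornhauser and Kingsbury or from this chapter) correlates test scores with a later criterion to obtain a validity coefficient and then inverts a simple regression line to translate a minimally acceptable criterion score into a passing score on the test, one of several ways such a cut score might be set.

```python
import numpy as np

# Hypothetical data: predictor (test) scores and later criterion (job performance)
# measures for the same 200 individuals. All values are simulated for illustration.
rng = np.random.default_rng(0)
test_scores = rng.normal(50, 10, size=200)
job_perf = 0.5 * test_scores + rng.normal(0, 8, size=200)

# Criterion-related validity: the correlation between test and criterion scores.
validity = np.corrcoef(test_scores, job_perf)[0, 1]

# Least-squares regression of the criterion on the test.
slope, intercept = np.polyfit(test_scores, job_perf, deg=1)

# Translate a minimally acceptable criterion score into a test cut score by
# inverting the regression line (one of several ways a cut score might be set).
min_acceptable_perf = 25.0  # hypothetical performance standard
cut_score = (min_acceptable_perf - intercept) / slope

print(f"Observed validity coefficient: {validity:.2f}")
print(f"Regression-based cut score on the test: {cut_score:.1f}")
```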
Although these components of the selection process and considerations related to the feasibility
of test validation are still generally applicable to contemporary employee selection, we do not want
to convey the impression that nothing has changed since the 1920s! Much has changed. Test vali-
dation and the operational use of validated tests for employee selection decisions in the 1920s and
following decades were based on the assumption that job requirements and organizational contexts
were relatively stable over time. The pace of change in contemporary organizations has increased
considerably, with technological, product, and market changes having large and frequent impacts on
how jobs are performed. It was also assumed in those early decades that individual job performance
was linked to organizational effectiveness in a cumulative and direct way; that is, improvements
in the job performance of individual employees lead directly to improved organizational effective-
ness. There was little concern for performance of work groups or selecting for improved group
functioning. Today, much work is performed by groups, who may work together face-to-face or only
communicate via technology. In addition, the early emphasis in developing selection tests was on the
assessment of cognitive and psychomotor abilities that could be shown to be predictive of task per-
formance. Personality and motivation were rarely evaluated as possible predictors of performance
criteria, whereas contemporary group-based work places considerable interpersonal and teamwork
performance demands on employees that may be better predicted by noncognitive measures. Prior
to about 30 years ago, a selection test found to be a valid predictor of job performance in an organi-
zational setting was not considered useful for employee selection for the same job in similar organi-
zational settings unless the test score-criterion measure relationship was empirically demonstrated
in each new setting. It was thought that unmeasured situational factors might attenuate or even
eliminate the test-criterion relationship demonstrated in the original setting. More recently, profes-
sional thinking has moved toward the view that validity evidence obtained in one organizational
setting for a particular assessment measure can be used to support the use of that measure in other
settings, provided that sufficient similarity of the job requirements, applicant groups, and organiza-
tional context can be shown.

STRUCTURE OF HANDBOOK OF EMPLOYEE SELECTION


The chapters in this handbook address the issues noted above and many other recent develop-
ments in theory and research findings that have changed how I-O psychologists and other human
resource professionals approach employee selection in the 21st century. The handbook is divided
into eight parts, the chapters of which focus on a specific topic related to employee selection. Two
section coeditors assisted the editors with the chapters in each part of the handbook. They are listed
below:

• Part 1: Foundations of Psychological Measurement and Evaluation Applied to Employee
Selection (section coeditors: John P. Campbell and Frank J. Landy)
• Part 2: Implementation and Management of Employee Selection Systems in Work
Organizations (section coeditors: Jerard F. Kehoe and Robert E. Ployhart)
• Part 3: Categories of Individual Difference Constructs for Employee Selection (section
coeditors: David Chan and Leaetta Hough)
• Part 4: Decisions in Developing, Selecting, Using, and Evaluating Predictors (section
coeditors: Ann Marie Ryan and Neal W. Schmitt)
• Part 5: Criterion Constructs in Employee Selection (section coeditors: Kevin R. Murphy
and Elaine D. Pulakos)
• Part 6: Legal and Ethical Issues in Employee Selection (section coeditors: P. Richard
Jeanneret and Paul R. Sackett)
• Part 7: Employee Selection in Specific Organizational Contexts (section coeditors: Rick
Jacobs and Ann Howard)
• Part 8: Milestones in Employee Selection (section coeditors: Walter C. Borman and
Benjamin Schneider)

The remainder of this chapter presents brief summaries of the eight major parts of the handbook.
We do not attempt to address the specifics of each chapter in any detail. Rather, we synthesize the
important issues discussed in the part’s chapters. Some chapters are necessarily discussed more
than others, but that does not reflect our evaluative judgments concerning the chapters. We trust
that reading these brief comments about the parts of the handbook will help send you to the specific
chapters that interest you most at a given point in time.
The four chapters in Part 1 focus on topics that are the basic building blocks for science- and
data-based employee selection. These topics include reliability and validity, design of validation
studies, generalizability of validity evidence, and measurement of work activities within their orga-
nizational context. Important conclusions that can be drawn from the chapters of Part 1 include the
following:

1. In employee selection, the principal concern is for the validity of the predictive inference
that future job behavior can be linked to current test scores.
2. There are different types of validity evidence for the predictive inference and several ways
to obtain that evidence.
3. Although there is support for the general concept that validity evidence obtained in one
organizational setting may support a similar predictive inference in another setting, numer-
ous situational factors affect that transfer of validity evidence.
4. Validity evidence quantitatively aggregated across studies can provide more accurate
estimates of test score-work behavior relationships than evidence drawn from individual
studies, but the quality of primary validity studies, including analysis of work activities, is
critical for the validity of the aggregated data.
5. The validity evidence for criterion (work behavior) measures is as important as the evidence
for predictor (test scores) measures when examining the validity of predictive inferences.

Part 2 contains six chapters that address various aspects of how selection systems are used
operationally and how they are managed within the organization. Chapters examine such specific
topics as applicant recruitment, test administration, use of technology, and the organizational fac-
tors affecting the implementation and sustainability of employee selection programs. Themes that
emerge in Part 2 include the following:

1. Scientific advancements related to the implementation and continuation of selection sys-
tems lag behind the operational use of new approaches and methods in organizations.
2. To be accepted within the organization, employee selection systems must be aligned with
organizational strategy and other human resource functions.
3. Organizational decision-makers are more concerned with the contribution of employee
selection to business-unit-level performance and the organization’s competitive advantage
than with the validity of predictive inferences at the level of individual employees.
4. Successful implementation and sustainment of selection systems require skills in
change management and process consultation as much as skills in test validation and
psychometrics.

Part 3 contains five chapters, each of which examines a different category of predictors frequently
used in employee selection. These include cognitive ability, personality, values, cognitive styles,
motives and needs, emotional and social intelligence, and physical abilities. Although cognitive ability measures have
demonstrated their importance as predictors of job performance in many jobs and organizations,
evidence is presented in other chapters in Part 3 that noncognitive constructs often provide incre-
mental predictive validity beyond that related to cognitive ability tests. In addition, concerns about
the group mean differences and adverse impact often found with cognitive tests have motivated I-O
psychologists to search for combinations of cognitive and noncognitive predictors to meet validity
requirements and social objectives in employment practices. Enhanced theory and improved mea-
surement of noncognitive constructs in recent years have also increased their use in selection.
Part 4 contains four chapters that are focused on decisions that must be made in designing an
assessment and determining the conditions under which it will be used. These chapters address
various tradeoffs (or balancing of multiple goals) that must be considered when making decisions
about the development of an employee selection system and its operational use once developed. In
addition to the validity-adverse impact tradeoff noted above in Part 3, decisions regarding develop-
ment and operation of selection systems in an organization must include considerations of required
time and budget resources and the number of organizational incumbents needed for various pur-
poses. In addition, the relation between the selection system and the managers for whom the new
employees will work often must be negotiated, and, minimally, the selection system developers must
understand who makes the final hiring decision. Finally, decisions must be made regarding what
information (assessment feedback) is provided to job candidates and the organization. For example,
successful candidates often have developmental needs, but should such information be communi-
cated to the candidate only, to organizational management only, or to both? How much information
is provided to unsuccessful candidates? Do such candidates from outside of the organization receive
the same information as those who are current employees seeking a new job?
The six chapters in Part 5 address the measurement of employees' behavior on the job. Not
too many years ago, probably only one or two chapters could comprehensively cover this topic.
Traditionally in I-O psychology, the job incumbent’s performance of the tasks found on the job
description would have been the criterion measure of interest when assessing the validity of an
employee selection system. For some jobs characterized by high turnover, absence rates, or expen-
sive training costs, a relevant prediction would have been whether a job candidate could be expected
to remain on the job for some period of time, to attend work regularly, or to complete training
successfully. The definition of an “effective employee” has expanded considerably in the past two
decades to include more than someone who completes training programs, performs adequately
on core job tasks, attends work regularly, and remains on the job. The importance of displaying
such behaviors as adaptability, creativity, and organizational citizenship, while minimizing such
counterproductive work behaviors as incivility, verbal and physical aggression, and theft and sabo-
tage of organizational resources, has been recognized and documented in organizational research,
and these additional behaviors have emerged as relevant criteria for employee selection systems.
The expanded definition of effective employee performance makes it more likely that predictors
beyond cognitive ability will contribute to the prediction of effectiveness, because some behavior
domains (e.g., citizenship and counterproductive behavior) are likely to be related to personality
and motivation.
Part 6 contains four chapters that address the environment in which employee selection occurs.
One chapter examines the general nature of ethical and moral dilemmas and, more specifically,
the Code of Ethics of the American Psychological Association, as they relate to employee selec-
tion. Another discusses three documents [Standards for Educational and Psychological Testing
(American Educational Research Association, American Psychological Association, & National
Council on Measurement in Education, 1999), Principles for the Use and Validation of Personnel
Selection Procedures (Society for Industrial and Organizational Psychology, 2003), and Uniform
Guidelines for Employee Selection Procedures (Equal Employment Opportunity Commission,
Civil Service Commission, Department of Labor, & Department of Justice, 1978)] containing legal
or professional guidelines and standards that are directly linked to employee selection and pro-
vide potential yardsticks by which selection instruments and programs can be evaluated. The final
two chapters in this part look at legal issues related to selection. One discusses 14 legal principles
derived from U.S. statutory law and from case law based on written opinions of the U.S. federal
judiciary, noting their application to and implications for employee selection. The other discusses
commonalities and differences in the legal environment of employee selection across 22 countries,
addressing such issues as the subgroups (racial, ethnicity, religious, gender) that are protected from
employment discrimination, the employers and employment practices covered by employment leg-
islation, the penalties for unfair employment practices, and procedures for charging and defending
claims of discrimination. The chapters in Part 6 define the important sociolegal environment in
which employee selection must exist.
The chapters in Parts 1–6 of this handbook in large part are written to be somewhat generic in
their content so that they apply well to most types of organizations and jobs. In contrast, the chap-
ters in Part 7 examine employee selection related to specific types of organizations or occupations.
Chapters focus on selection in the U.S. military, public (governmental) agencies, and multinational
organizations with particular emphasis on the factors that raise specific concerns in each of these
types of organizations. Other chapters address selection within particular occupations, including
skilled craft and technician jobs, sales and service jobs, entry- and middle-manager positions, and
executive leadership positions, again focusing on specific concerns. In addition, another chapter
addresses special issues related to selecting individuals for membership in work teams, including
concerns about individual-team fit and complementarity.
Part 8 presents a somewhat different perspective on employee selection than the other parts of
the handbook. As its title denotes, the chapters in Part 8 describe important events or concepts in the
relatively recent history of employee selection, true professional milestones. Three chapters focus
on large-scale, longitudinal projects that had a major impact on science- and data-based employee
selection. These include the development and implementation of the assessment center method at
AT&T (Chapter 39); the advancement of job performance theory that resulted from Project A con-
ducted within the U.S. Army (Chapter 40); and the development of O*NET™, a comprehensive
database and taxonomy that provides descriptions of all occupations in the U.S. economy that are
linked to many other occupational and labor market databases and are accessible to users on the
Internet (Chapter 41). Chapter 42 addresses the issues of validity generalization and situational
specificity, tracing the development of arguments and related evidence from both perspectives.
The situational specificity perspective, which argued that evidence for the validity of a selection
system (or test) was required for each organizational situation in which the test was used, was the
dominant professional viewpoint for the first seven or eight decades of data-based selection.
Validity generalization shifted professional thinking by presenting evidence that much of
the situational variation in validity evidence for similar selection tests was related to statistical
artifacts in the sample data sets. In Chapter 42, a somewhat different approach to these questions
is presented, and it is argued that situational moderators of test validity do exist—specifically,
job type and task complexity—and that other moderators are likely to exist, supporting the need
for continued development of situational measurement and inclusion of situational variables in meta-
analytic studies of validity generalization. The final chapter in Part 8 looks somewhat further
back into the history of science-based selection but focuses on selection research and application
in Europe. Considerable scientific and applied activity related to employee selection occurred in
many European countries in parallel to similar efforts being conducted in North America, but they
remain relatively unknown outside of Europe, an unfortunate omission in the knowledge domain
of I-O psychologists in other parts of the world.
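
The statistical-artifact argument behind validity generalization summarized above can also be illustrated with a bare-bones numerical sketch. The validity coefficients and sample sizes below are hypothetical, and the calculation is only a simplified version of the kind of psychometric meta-analysis discussed in Chapter 42: it asks how much of the observed between-study variance in validities would be expected from sampling error alone, leaving the residual as the variance available for situational moderators or other artifacts to explain.

```python
import numpy as np

# Hypothetical observed validity coefficients and sample sizes from k = 6 local studies.
r = np.array([0.18, 0.31, 0.22, 0.40, 0.12, 0.27])
n = np.array([68, 120, 85, 210, 55, 140])

# Sample-size-weighted mean validity across studies.
r_bar = np.sum(n * r) / np.sum(n)

# Observed (weighted) variance of validities across studies.
var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)

# Variance expected from sampling error alone, using the large-sample
# approximation Var(r) = (1 - r_bar**2)**2 / (N - 1), averaged across studies.
var_sampling = np.sum(n * ((1 - r_bar**2) ** 2 / (n - 1))) / np.sum(n)

# Residual variance: between-study variation not attributable to sampling error.
var_residual = max(var_obs - var_sampling, 0.0)

print(f"Mean r = {r_bar:.3f}")
print(f"Observed variance = {var_obs:.4f}, sampling-error variance = {var_sampling:.4f}")
print(f"Residual variance available for moderators or other artifacts = {var_residual:.4f}")
```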
The handbook concludes with an epilogue written by Robert M. Guion, who draws upon his 50+
years of scientific and professional involvement in employee selection to reflect on the past, present,
and future status of selection. Guion muses on such topics as psychometric issues, the “perpetual
disconnect” between science and practice, the role of narrow and broad constructs in advancing
selection knowledge, and the integration of multilevel concepts into our thinking about selection.
Such musings provide a clear message that there is still much to learn about employee selection
with regard to the psychological constructs that form the basis of effective job performance and to
the implementation of selection procedures that help organizations and employees come together to
achieve both of their goals for success.

REFERENCES
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education (1999). Standards for educational and psychological testing. Washington,
DC: Author.
Bingham, W. V., & Freyd, M. (1926). Procedures in employment psychology: A manual for developing scien-
tific methods of vocational selection. New York, NY: McGraw-Hill.
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Depart-
ment of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43,
8290–8315.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ: Lawrence
Erlbaum.
Kornhauser, A. W., & Kingsbury, F. A. (1924). Psychological tests in business. Chicago, IL: University of Chicago
Press.
Pfeffer, J., & Sutton, R. I. (2006). Hard facts, dangerous half-truths, and total nonsense: Profiting from
evidence-based management. Boston, MA: Harvard Business School Press.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of person-
nel selection procedures (4th ed.). Bowling Green, OH: Author.
Vinchur, A. J. (2007). A history of psychology applied to employee selection. In L. L. Koppes (Ed.), Historical
perspectives in industrial and organizational psychology (pp. 193–218). Mahwah, NJ: Lawrence
Erlbaum.
Part 1
Foundations of Psychological
Measurement and Evaluation
Applied to Employee Selection
John P. Campbell and Frank J. Landy,
Section Editors
2 Reliability and Validity
Dan J. Putka and Paul R. Sackett

Reliability and validity are concepts that provide the scientific foundation upon which we construct
and evaluate predictor and criterion measures of interest in personnel selection. They offer a com-
mon technical language for discussing and evaluating (a) the generalizability of scores resulting
from our measures (to a population of like measures), as well as (b) the accuracy of inferences we
desire to make based on those scores (e.g., high scores on our predictor measure are associated with
high levels of job performance; high scores on our criterion measure are associated with high levels
of job performance).1 Furthermore, the literature surrounding these concepts provides a framework
for scientifically sound measure development that, a priori, can enable us to increase the likelihood
that scores resulting from our measures will be generalizable, and inferences we desire to make
based upon them, supported.
Like personnel selection itself, science and practice surrounding the concepts of reliability and
validity continue to evolve. The evolution of reliability has centered on its evaluation and framing
of “measurement error,” as its operational definition over the past century has remained focused on
notions of consistency of scores across replications of a measurement procedure (Haertel, 2006;
Spearman, 1904; Thorndike, 1951). The evolution of validity has been more diverse—with changes
affecting not only its evaluation, but also its very definition, as evidenced by comparing editions
of the Standards for Educational and Psychological Testing produced over the past half century
by the American Educational Research Association (AERA), American Psychological Association
(APA), and National Council on Measurement in Education (NCME) (AERA, APA, & NCME,
1999). Relative to the evolution of reliability, the evolution of validity has been well covered in the
personnel selection literature (e.g., Binning & Barrett, 1989; McPhail, 2007; Schmitt & Landy,
1993; Society for Industrial and Organizational Psychology, Inc., 2003), and will continue to be
well covered in this handbook. Indeed a section of this chapter and the three chapters that follow all
speak directly to developments with regard to validity—particularly as it relates to personnel selec-
tion. For this reason and those we note below, the bulk of this chapter will be devoted to providing
an integrated, modern perspective on reliability.
In reviewing literature in preparation for this chapter, we were struck at the paucity of organi-
zational research literature that has attempted to juxtapose and integrate perspectives on reliability
of the last 50 years, with perspectives on reliability from the first half of the 20th century. Indeed,
Borsboom (2006) lamented that to this day many treatments of reliability are explicitly framed or
implicitly laden with assumptions based on measurement models from the early 1900s. While clas-
sical test theory (CTT) certainly has its place in treatments of reliability, framing entire treatments
around it serves to “trap” us within the CTT paradigm (Kuhn, 1962). This makes it difficult for
students of the field to compare and contrast—on conceptual and empirical grounds—perspectives
offered by other measurement theories and approaches to reliability estimation. This state of affairs
is highly unfortunate because perspectives on reliability and methods for its estimation have evolved

1 Throughout this chapter we use the term “scores” to generically refer to observed manifestations of a measurement
procedure—thus, scores might be ratings, behavioral observations, test scores, etc.

greatly since Gulliksen’s codification of CTT in 1950, yet these advances have been slow to dis-
seminate into personnel selection research and practice. Indeed, our review of the literature reveals
what appears to be a widening gap between perspectives of reliability offered in the organizational
research literature and those of the broader psychometric community (Borsboom, 2006). Couple
this trend with (a) the recognized decline in the graduate instruction of statistics and measure-
ment over the past 30 years in psychology departments (Aiken, West, & Millsap, 2008; Merenda,
2007), as well as (b) the growing availability of statistical software and estimation methods since
the mid-1980s, and we have a situation where the psychometric knowledge base of new researchers
and practitioners can be dated prior to exiting graduate training. Perhaps more disturbing is that
the lack of dissemination of modern perspectives on reliability can easily give students of the field
the impression that there have not been many scientifically or practically useful developments in the
area of reliability since the early 1950s.
In light of the issues raised above, our aim in the first part of this chapter is to parsimoniously
reframe and integrate developments in the reliability literature over the past century that reflects,
to the extent of our knowledge, our modern capabilities. In laying out our discussion, we use exam-
ples from personnel selection research and practice to relate key points to situations readers may
confront in their own work. Given our focus, we should note that that several topics commonly
discussed in textbook or chapter-length treatments of reliability are missing from this chapter. For
example, topics such as standard errors of measurement, factors affecting the magnitude of reliabil-
ity coefficients (e.g., sample heterogeneity), and applications of reliability-related data (e.g., correc-
tions for attenuation, measure refinement) receive little or no attention here. The omission of these
topics is not meant to downplay their importance to our field; rather, it just reflects the fact that fine
treatments of these topics already exist in several places in the literature (e.g., Feldt & Brennan,
1989; Haertel, 2006; Nunnally, 1978). Our emphasis is on complementing the existing literature,
not repeating it. In place of these important topics, we focus on integrating and drawing connec-
tions between historically disparate perspectives on reliability. As we note below, such integration
is essential, because the literature on reliability has become extremely fragmented.
For example, although originally introduced as a “liberalization” of CTT over 40 years ago,
generalizability theory is still not well integrated into textbook treatments of reliability in the orga-
nizational literature. It tends to be relegated to secondary sections that appear after the primary
treatment of reliability (largely based on CTT) is introduced, not mentioned at all, or treated as
if it had value in only a limited number of measurement situations faced in research and practice.
Although such a statement may appear as a wholesale endorsement of generalizability theory and
its associated methodology, it is not. As an example, the educational measurement literature has
generally held up generalizability theory as a centerpiece of modern perspectives on reliability, but
arguably, this has come at the expense of shortchanging confirmatory factor analytic (CFA)-based
perspectives on reliability and how such perspectives relate to and can complement generalizability
theory. Ironically, this lack of integration goes both ways, because CFA-based treatments of reli-
ability rarely if ever acknowledge how generalizability theory can enrich the CFA perspective, but
rather link their discussions of reliability to CTT. Essentially, investigators faced with understand-
ing modern perspectives on reliability are faced with a fragmented, complex literature.

Overview
Our treatment of reliability is organized into four main sections. The first section offers a concep-
tual, “model-free” definition of measurement error. In essence, starting out with such a model-
free definition of error is required to help clarify some confusion that tends to crop up when one
begins to frame error from the perspective of a given measurement theory and the assumptions
such theories make regarding the substantive nature of error. Next we overlay our conceptual
treatment of error with perspectives offered by various measurement models. Measurement mod-
els are important because they offer a set of hypotheses regarding the composition of observed
scores, which, if supported, can allow us to accurately estimate reliability from a sample of data
and apply those estimates to various problems (e.g., corrections for attenuation, construction of
score bands). Third, we compare and contrast three traditions that have emerged for estimating
reliability: (a) a classical tradition that arose out of work by Spearman (1904) and Brown (1910),
(b) a random-effects model tradition that arose out of Fisher’s work with analysis of variance
(ANOVA), and (c) a CFA tradition that arose out of Joreskog’s work on congeneric test models.
Lastly, we close our discussion of reliability with a critical examination of how we have histori-
cally framed and dealt with measurement error. This closing section leads naturally into a discus-
sion of validity.
Our treatment of validity is entirely conceptual in nature. We do not address operational issues
in the design of studies aimed at obtaining various types of validity evidence. Rather, we attempt
to outline a set of distinctions that we view as central to an understanding of validity. Namely, we
discuss (a) validity as predictor-criterion relationship versus broader conceptualizations, (b) valid-
ity of an inference versus validity of a test, (c) types of validity evidence versus types of validity,
(d) validity as an inference about a test score versus validity as a strategy for establishing job relat-
edness, (e) the predictive inference versus the evidence for it, (f) classical versus modern notions
of reliability as a limiter of validity, and (g) validity limited to inferences about individuals versus
including broader consequences of test score use. Our belief is that a clear understanding of these
foundational issues in validity is essential for effective research and practice in the selection arena.

Reliability
A specification for error is central to the concept of reliability, regardless of one’s theoretical
perspective; but to this day the meaning of the term “error” is a source of debate and confusion
(Borsboom & Mellenbergh, 2002; Murphy & DeShon, 2000; Schmidt, Viswesvaran, & Ones,
2000). The sources of variance in scores that are designated as sources of error can differ as a func-
tion of (a) the inferences or assertions an investigator wishes to make regarding the scores, (b) how
an investigator intends to use the scores (e.g., for relative comparison among applicants or absolute
comparison of their scores to some set standard), (c) characteristics of the measurement procedure
that produced them, and (d) the nature of the construct one is attempting to measure. Consequently,
what is called error, even for scores produced by the same measurement procedure, may legiti-
mately reflect different things under different circumstances. A convenient way to address these
points is to examine how error has come to be operationally defined in the context of estimating reli-
ability. All measurement theories seem to agree that reliability estimation attempts to quantify the
expected degree of consistency in scores over replications of a measurement procedure (Brennan,
2001a; Haertel, 2006). Consequently, from the perspective of reliability estimation, error reflects
the expected degree of inconsistency between scores produced by a measurement procedure and
replications of it. There are several elements of these operational definitions that warrant further
explanation, beginning with the notion of replication. Clarifying these elements will provide an
important foundation for the remainder of our discussion.
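
One generic way to formalize this operational definition, before committing to any particular measurement model, is to express reliability as the proportion of observed-score variance that is consistent across replications of the measurement procedure. The notation below is ours for illustration only (the chapter develops its own machinery later); "cons" and "incons" simply stand for whatever produces agreement and disagreement, respectively, across replicates:

```latex
% Reliability as expected consistency across replications (generic form).
% \sigma^2_{\mathrm{cons}}: variance shared (consistent) across replicates;
% \sigma^2_{\mathrm{incons}}: variance that differs across replicates.
\rho_{XX'} \;=\; \frac{\sigma^2_{\mathrm{cons}}}
                      {\sigma^2_{\mathrm{cons}} + \sigma^2_{\mathrm{incons}}}
```

Equivalently, this ratio can be read as the expected correlation between scores produced by the measurement procedure and scores produced by a replication of it.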

Replication
From a measurement perspective, replication refers to the repetition or reproduction of a mea-
surement procedure such that the scores produced by each “replicate” are believed to assess the
same construct.2 There are many ways of replicating a measurement procedure. Perhaps the most

2 As we discuss later, the degree to which replicates are assumed to “assess the same construct” differs across measure-
ment theories. The degree of similarity among replicates has been discussed under the rubric of degrees of part-test
similarity (Feldt & Brennan, 1989) and degrees of parallelism (Lord, 1955). At this point, further discussion of this issue
is unnecessary, but we will revisit this issue when discussing the role of measurement models in reliability estimation.

straightforward way would be to administer the same measurement procedure on more than one
occasion, which would provide insight into how consistent scores are for a given person across
occasions. However, we are frequently interested in more than whether our measurement proce-
dure would produce comparable scores on different occasions. For example, would we achieve
consistency over replicates if we had used an alternative, yet similar, set of items to those that
comprise our measure? Answering the latter question is a bit more difficult in that we are rarely
in a position to replicate an entire measurement procedure (e.g., construct two or more 20-item
measures of conscientiousness and compare scores on each). Consequently, in practice, “parts”
or “elements” of our measurement procedure (e.g., items) are often viewed as replicates of each
other. The observed consistency of scores across these individual elements is then used to make
inferences about the level of consistency we would expect if our entire measurement procedure
was replicated, that is, how consistent would we expect scores to be for a given person across
alternative sets of items we might use to assess the construct of interest. The forms of replication
described above dominated measurement theory for nearly the first 5 decades of the 20th century
(Cronbach, 1947; Gulliksen, 1950).
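
To make the occasion- and item-based forms of replication concrete, the sketch below computes two familiar consistency estimates from simulated data: a test-retest correlation (the same procedure administered on two occasions) and coefficient alpha (individual items treated as replicates of one another). The data, sample sizes, and function are hypothetical illustrations, and neither estimator is being singled out here as the chapter's preferred approach.

```python
import numpy as np

rng = np.random.default_rng(1)
n_people, n_items = 300, 20

# Hypothetical item responses: a common person factor plus item-specific noise,
# generated for two administrations (occasions) of the same 20-item measure.
person = rng.normal(0, 1, size=(n_people, 1))
items_t1 = 0.6 * person + rng.normal(0, 0.8, size=(n_people, n_items))
items_t2 = 0.6 * person + rng.normal(0, 0.8, size=(n_people, n_items))

def coefficient_alpha(item_scores: np.ndarray) -> float:
    """Internal-consistency estimate treating items as single-facet replicates."""
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1)
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Occasions as replicates: correlate total scores from the two administrations.
retest_r = np.corrcoef(items_t1.sum(axis=1), items_t2.sum(axis=1))[0, 1]

print(f"Coefficient alpha (items as replicates): {coefficient_alpha(items_t1):.2f}")
print(f"Test-retest correlation (occasions as replicates): {retest_r:.2f}")
```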
Modern perspectives on reliability have liberalized the notion of replicates in terms of (a) the
forms that they take and (b) how the measurement facets (i.e., items, raters, tasks) that serve to
define them are manifested in a data collection design (i.e., a measurement design). For example,
consider a measurement procedure that involves having two raters provide ratings for individuals
with regard to their performance on three tasks designed to assess the same construct. In this case,
replicates take the form of the six rater-task pairs that comprise the measurement procedure, and as
such, are multifaceted (i.e., each replicate is defined in terms of specific rater and a specific task).
Prior to the 1960s, measurement theory primarily focused on replicates that were defined along a
single facet (e.g., replicates represented different items, different split-halves of a test, or the same
test administered on different occasions).3 Early measurement models were not concerned with rep-
licates that were multifaceted in nature (Brown, 1910; Gulliksen, 1950; Spearman, 1910). Modern
perspectives on reliability also recognize that measurement facets can manifest themselves differ-
ently in any given data collection design. For example, (a) the same raters might provide ratings for
each ratee, (b) a unique, nonoverlapping set of raters might provide ratings for each ratee, or (c) sets
of raters that rate each ratee may vary in their degree of overlap. As we note later, the data collection
design underlying one’s measurement procedure has important implications for reliability estima-
tion, which, prior to the 1960s, was not integrated into measurement models. It was simply not the
focus of early measurement theory (Cronbach & Shavelson, 2004).
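
For readers who want to see how multifaceted replicates can be handled numerically, the sketch below simulates the kind of fully crossed person x rater x task design described above, estimates random-effects variance components from the ANOVA mean squares, and combines them into a generalizability coefficient for relative decisions. Everything here is assumed for illustration: the data are simulated, the design is taken to be fully crossed with one observation per cell, and the expected-mean-square algebra follows standard generalizability-theory treatments rather than anything specific to this chapter.

```python
import numpy as np

rng = np.random.default_rng(2)
n_p, n_r, n_t = 100, 2, 3  # persons, raters, tasks (fully crossed, one score per cell)

# Simulate hypothetical ratings as a sum of random person, rater, task,
# interaction, and residual effects.
X = (3.0
     + rng.normal(0, 1.0, size=(n_p, 1, 1))      # person
     + rng.normal(0, 0.3, size=(1, n_r, 1))      # rater
     + rng.normal(0, 0.4, size=(1, 1, n_t))      # task
     + rng.normal(0, 0.4, size=(n_p, n_r, 1))    # person x rater
     + rng.normal(0, 0.5, size=(n_p, 1, n_t))    # person x task
     + rng.normal(0, 0.7, size=(n_p, n_r, n_t))) # residual (incl. p x r x t)

# Marginal means needed for the sums of squares.
grand = X.mean()
m_p, m_r, m_t = X.mean(axis=(1, 2)), X.mean(axis=(0, 2)), X.mean(axis=(0, 1))
m_pr, m_pt, m_rt = X.mean(axis=2), X.mean(axis=1), X.mean(axis=0)

# Mean squares from the three-way crossed random-effects ANOVA.
ms_p = n_r * n_t * np.sum((m_p - grand) ** 2) / (n_p - 1)
ms_pr = (n_t * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2)
         / ((n_p - 1) * (n_r - 1)))
ms_pt = (n_r * np.sum((m_pt - m_p[:, None] - m_t[None, :] + grand) ** 2)
         / ((n_p - 1) * (n_t - 1)))
ms_e = (np.sum((X - m_pr[:, :, None] - m_pt[:, None, :] - m_rt[None, :, :]
                + m_p[:, None, None] + m_r[None, :, None] + m_t[None, None, :]
                - grand) ** 2)
        / ((n_p - 1) * (n_r - 1) * (n_t - 1)))

# Solve the expected-mean-square equations for the components that contribute
# to relative error (person, person x rater, person x task, residual).
var_e = ms_e
var_pr = max((ms_pr - ms_e) / n_t, 0)
var_pt = max((ms_pt - ms_e) / n_r, 0)
var_p = max((ms_p - ms_pr - ms_pt + ms_e) / (n_r * n_t), 0)

# Generalizability (relative) coefficient for a design with n_r raters and n_t tasks.
e_rho2 = var_p / (var_p + var_pr / n_r + var_pt / n_t + var_e / (n_r * n_t))
print(f"var_p={var_p:.2f}, var_pr={var_pr:.2f}, var_pt={var_pt:.2f}, var_e={var_e:.2f}")
print(f"Generalizability coefficient (2 raters x 3 tasks): {e_rho2:.2f}")
```

With different decision-study sample sizes (e.g., more raters or tasks per ratee), the same components can be recombined to project how the coefficient would change, which is precisely the kind of design question that early measurement models were not built to address.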

Expectation
A second key element of the operational definition of reliability offered above is the notion of
expectation. The purpose of estimating reliability is not to quantify the level of consistency in
scores among the sample of replicates that comprise one’s measurement procedure for a given
study (e.g., items, raters, tasks, or combinations thereof). Rather, the purpose is to use such
information to make inferences regarding (a) the consistency of scores resulting from our mea-
surement procedure as a whole with the population from which replicates comprising our mea-
surement procedure were drawn (e.g., the population of items, raters, tasks, or combinations
thereof believed to measure the construct of interest) and (b) the consistency of the said scores
for the population of individuals from which our sample of study participants was drawn. Thus,
the inference space of interest in reliability estimation is inherently multidimensional. As we
describe in subsequent sections, the utility of measurement theories is that they help us make
this inferential leap from sample to population; however, the quality with which estimation
approaches derived from these theories do so depend on the properties of scores arising from

3 A key exception here is Cronbach’s (1947) treatment of a coefficient of equivalence and stability.

each replicate, characteristics of the construct one is attempting to measure, and characteristics
of the sample of one’s study participants.

Consistency and Inconsistency


Lastly, the third key element of the operational definition of reliability is notion of consistency in
scores arising from replicates. Defining reliability in terms of consistency of scores implies that
error, from the perspective of reliability, is anything that gives rise to inconsistency in scores.4
Conversely then, anything that gives rise to consistency in a set of scores, whether it is the construct
we intend to measure or some contaminating source of construct-irrelevant variation that is shared
or consistent across replicates, serves to delineate the “true” portion of an observed score from the
perspective of reliability. Indeed, this is one reason why investigators are quick to note that “true
score,” in the reliability sense of the word, is a bit of a misnomer for the uninitiated—it is not the
same as a person’s true standing on the construct of interest (Borsboom & Mellenbergh, 2002; Lord
& Novick, 1968; Lumsden, 1976). Thus, what may be considered a source of error from the perspec-
tive of validity may be considered true score from the perspective of reliability.
Although an appreciation of the distinction between true score from the perspective of reliability
and a person’s true standing on a construct can both readily be gleaned from the extant literature,
where there seems to be a bit more debate is with regard to the substantive properties of error. The
confusion in part stems from a disconnect between the operational definition of error outlined
above (i.e., inconsistency in scores across replicates) and hypotheses that measurement theories
make regarding the distributional properties of such inconsistencies—which may or may not reflect
reality. For example, in the sections above we made no claims with regard to whether inconsistency
in scores across replications reflected (a) unexplainable variation that would be pointless to attempt
to model, (b) explainable variation that could potentially be meaningfully modeled using exogenous
variables as predictors (i.e., measures other than our replicates), or (c) a combination of both of these
types of variation. Historically, many treatments of reliability, whether explicitly or implicitly, have
equated inconsistency in scores across replicates with “unpredictable” error (e.g., AERA, APA,
& NCME, 1999, p. 27). However, there is nothing in the operational definition of error laid out
above that necessitates that inconsistencies in scores be unpredictable. Part of the confusion may lie in the fact that we often conceive of replicates as having been randomly sampled from, or as at least representative of, some broader population(s) (Cronbach, Gleser, Nanda,
& Rajaratnam, 1972; Nunnally, 1978; Tryon, 1957). From a statistical perspective, the effects
associated with such replicates on scores would be considered random (Jackson & Brashers, 1994),
but this does not necessitate that variation in those effects is unexplainable or beyond meaningful
prediction—particularly when raters serve to define the replicates of interest (Cronbach et al., 1972;
Murphy & DeShon, 2000). Thus, one should be cautious when framing inconsistency in scores
as reflecting random errors of measurement because it is often confused with the notion that such
errors are beyond meaningful explanation (Ng, 1974).

Summary
In this section, we attempted to offer a model-free perspective on error and how it has come to be
operationally defined from the perspective of reliability. We adopted this strategy in part because of the confusion noted above, but also to bring balance to existing treatments of reliability in the industrial-organizational (I-O) literature, which explicitly or implicitly tend to frame discussions of reliability from the CTT tradition. The language historically used in treatments of CTT makes it difficult for investigators to recognize that inconsistency in scores is not necessarily beyond meaningful explanation, even when we conceive of it as random. Another reason we belabor this point is that, beginning with Spearman (1904), a legacy of organizational research emerged that focuses more on making adjustments for error in our measures (e.g., corrections for attenuation) rather than on developing methods for modeling and understanding error in our measures, which in part may reflect our tendency to view such error as unexplainable. The implications of this legacy and the potential for remediation in light of recent methodological advances are touched upon in the concluding sections of our discussion of reliability.

4 Actually, this is a bit of an overstatement. As alluded to in the opening paragraph, error, in any given situation, will be partly dependent on the generalization(s) the investigator wishes to make regarding scores. In some cases, investigators may choose not to treat a given source of inconsistency in scores as error. For example, this might occur in the context of performance ratings where inconsistencies in job incumbents' scores across different dimensions of performance may be viewed as acceptable by the investigator (Scullen, Mount, & Goff, 2000). This example illustrates why the investigator is a critical part of defining error in any given situation. We will touch upon this topic again later when we discuss generalizability theory.

Role of Measurement Models


The defining characteristic of a measurement model is that it specifies a statistical relationship
between observed scores and unobserved components of those scores. Such unobserved compo-
nents may reflect sources of consistency in scores (across replicates), whereas others may reflect
sources of inconsistency. As noted earlier, the utility of measurement models is that they offer a set
of hypotheses regarding the composition of observed scores, which, if supported, can allow us to
accurately estimate reliability (e.g., reliability coefficients, standard errors of measurement) from a
sample of data and apply those estimates to various problems (e.g., corrections for attenuation, con-
struction of score bands). To the extent that such hypotheses are not supported, faulty conclusions
regarding the reliability of scores may be drawn, inappropriate uses of the reliability information
may occur, and knowledge regarding inconsistencies in our scores may be underutilized. In this
section, we compare and contrast measurement models arising from two theories that underlie the
modern literature on reliability, namely CTT and generalizability theory (G-theory).5
The measurement models underlying CTT and G-theory actually share some important simi-
larities. For example, both (a) conceive of observed scores as being an additive function of true
score and error components and (b) assume that true score and error components are uncorrelated.
Nevertheless, as we discuss below, there are certain characteristics of the G-theory model that
enable it to be meaningfully applied to a much broader swath of the measurement procedures we encounter in personnel selection than the CTT model can handle. Rather than being competing models, it is now commonly acknowledged that the CTT model is simply a more restrictive, narrower
version of the G-theory model, which is why G-theory is generally viewed as a “liberalization”
of CTT (AERA, APA, & NCME, 1999; Brennan, 2006; Cronbach, Rajaratnam, & Gleser, 1963).
Nevertheless, given its relatively narrow focus, it is convenient for pedagogical purposes to open
with a brief discussion of CTT before turning to G-theory.

Classical Test Theory


Under classical test theory, the observed score (X) for a given person p that is produced by replicate
r of a measurement procedure is assumed to be a simple additive function of two parts: the person’s
true score (T) and an error score (E).
$$X_{pr} = T_p + E_{pr} \qquad (2.1)$$

5 Readers may question the omission of item response theory (IRT) from the subsequent discussion. Like Brennan (2006),
we tend to view IRT as offering a “scaling” model rather than a “measurement” model because it does not have a built-in
explicit consideration of measurement error. Further, the focus of applications of IRT is often on estimation/scaling of a
latent trait, ability of interest, or calibration of item parameters rather than the isolation and quantification of measurement
error (see Brennan 2006, pp. 6-7). Although we are not downplaying the obvious importance of IRT for psychometrics and
personnel selection, we felt it was beyond the scope of this chapter to address IRT while still addressing reliability and valid-
ity as we have done herein. For a recent, parsimonious treatment of IRT, we refer readers to Yen and Fitzpatrick (2006).

Conceptually, a person’s true score equals the expected value of their observed scores across an
infinite set of replications of the measurement procedure. Given that such an infinite set of rep-
lications is hypothetical, a person's true score is unknowable but, as it turns out, not necessarily beyond estimation (see Haertel, 2006, pp. 80–82). As noted earlier, true score represents the source(s) of
consistency in scores across replicates (note there is no “r” subscript on the true score component
in Equation 2.1)—in CTT it is assumed to be a constant for a given person across replicates. Error,
on the other hand, is something that varies from replicate to replicate, and CTT hypothesizes that
the mean error across the population of replicates for any given person will be zero. In addition to
these characteristics, if we look across persons, CTT hypothesizes that there will be (a) no correla-
tion between true and error score associated with a given replicate (r Tp.Ep1 = 0), (b) no correlation
between error scores from different replicates (r Ep1.Ep2 = 0), and (c) no correlation between error
scores from a given replicate and true scores from another replicate (r Ep1.Tp2 = 0). Although the CTT
score model does not necessitate that error scores from a given replicate (or composite of replicates)
be uncorrelated with scores from measures of other attributes, the latter is a key assumption under-
lying the use of reliability coefficients to correct observed correlations for attenuation (Spearman,
1910; Schmidt & Hunter, 1996). Essentially, this last assumption implies that inconsistency in a
measurement procedure will be unrelated to any external variables (i.e., variables other than our
replicates) and therefore beyond meaningful prediction. From basic statistics we know that the vari-
ance of the sum of two independent variables (such as T and E) will simply equal the sum of their
variances; thus, under CTT, observed score variance across persons for a given replicate is simply
the sum of true score variance and error variance.

$$\sigma^2_X = \sigma^2_T + \sigma^2_E \qquad (2.2)$$

As we detail later, reliability estimation attempts to estimate the ratio of σ2T over σ2T + σ2E, not for a
single replicate, but rather for a measurement procedure as a whole, which as noted earlier is often
conceived as being comprised of multiple replicates. Thus, reliability coefficients are often inter-
preted as the proportion of observed score variance attributable to true score variance, or alterna-
tively, the expected squared correlation between observed scores resulting from our measurement procedure
and scores that would be obtained had we based our measure on the full population of replicates of
interest (i.e., hypothetical true scores).
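To make this variance-ratio interpretation concrete, the following minimal Python sketch (ours, not part of the original treatment, and using hypothetical variance values) simulates scores under Equation 2.1 and shows that the ratio σ2T/(σ2T + σ2E) matches the squared correlation between observed scores and the true scores that generated them.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons = 100_000
true_var, error_var = 4.0, 1.0                      # hypothetical population values

T = rng.normal(0, np.sqrt(true_var), n_persons)     # true scores (constant over replicates)
E = rng.normal(0, np.sqrt(error_var), n_persons)    # error for a single replicate
X = T + E                                           # Equation 2.1: X = T + E

reliability = true_var / (true_var + error_var)     # ratio implied by Equation 2.2
print(round(reliability, 2))                        # 0.80
print(round(np.corrcoef(X, T)[0, 1] ** 2, 2))       # ~0.80: squared correlation of X with T
```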
One of the key defining characteristics of CTT is the perspective it takes on replicates. Recall
that earlier we offered a very generic definition for what constitutes a replicate. We described how
we often conceive of parts or elements of a measurement procedure as replicates and use them to
estimate the reliability of scores produced by our procedure as a whole. As we note later, CTT-based
reliability estimation procedures assume that replicates have a certain degree of “parallelism.” For
example, for two replicates to be considered strictly (or classically) parallel they must: (a) produce
identical true scores for a given individual (i.e., Tp for Replicate A = Tp for Replicate B), (b) have iden-
tical mean observed scores, and (c) have identical error variances.6 The commonly used Spearman-
Brown prophecy formula is an example of a CTT-based estimation procedure that is based on the
assumption that replicates involved in its calculation are strictly parallel (Feldt & Brennan, 1989).
Oftentimes it is not realistic to expect any two replicates to be strictly parallel. For example,
items on a test of cognitive ability are rarely of the same difficulty level, and raters judging
incumbents’ job performance often differ in their level of leniency/severity. Under such condi-
tions, item means (or rater means) would differ, and thus, such replicates would not be considered
strictly parallel. In recognition of this, CTT gradually relaxed its assumptions over the years to
accommodate the degrees of parallelism that are more likely to be seen in practice. The work of
Lord (1955), Lord and Novick (1968), and Joreskog (1971) lays out several degrees of parallelism,
which we briefly review below.

6 Given conditions (a) and (c), such replicates will also have identical observed score variances.

Tau-equivalent replicates (a) produce identical true scores for a given individual and have (b)
identical mean observed scores, but may have different error variances (across persons) and as such
different observed variances. Essentially tau-equivalent replicates relax assumptions further, in that
they allow true scores produced by the replicates to differ by a constant (i.e., Tp for Replicate 1 = Tp
for Replicate 2 + C). As such, essential tau-equivalence accommodates the situation in which there
are mean differences across replicates (e.g., items differ in their difficulty, and raters differ in their
leniency/severity). The assumption of essential tau-equivalence underlies several types of coeffi-
cients commonly used in reliability estimation such as coefficient alpha, intraclass correlations, and
as we discuss in the next section, generalizability coefficients.7
One thing that may not be immediately obvious from the description of essential tau-equivalence
offered above is that it does not accommodate the situation in which replicates differ in true score
variance (across persons). Joreskog’s (1971) notion of congeneric test forms (or more generally, con-
generic replicates) accommodated this possibility. Specifically, the congeneric model allows true
scores produced by a given replicate to be a linear function of true scores from another replicate
(i.e., Tp for Replicate 1 = b × Tp for Replicate 2 + C). As we will illustrate in our later section on reli-
ability estimation, this accommodates the possibility that replicates may be differentially saturated
with true score variance or be measured on a different metric.
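The distinctions among these degrees of parallelism can be illustrated with a brief simulation sketch (ours, with hypothetical values) that generates a reference replicate plus a strictly parallel, an essentially tau-equivalent, and a congeneric counterpart from the same set of true scores.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
T = rng.normal(50, 10, n)                   # true scores on replicate 1's metric

x1 = T + rng.normal(0, 4, n)                # reference replicate
x2 = T + rng.normal(0, 4, n)                # strictly parallel: same T, same error variance
x3 = (T + 5) + rng.normal(0, 6, n)          # essentially tau-equivalent: T shifted by a
                                            # constant, different error variance
x4 = (0.5 * T + 10) + rng.normal(0, 4, n)   # congeneric: T rescaled (different metric or
                                            # true-score saturation)

for name, x in [("parallel", x2), ("essentially tau-equivalent", x3), ("congeneric", x4)]:
    print(f"{name:28s} mean={x.mean():6.1f}  var={x.var():6.1f}  "
          f"r with x1={np.corrcoef(x1, x)[0, 1]:.2f}")
```

The printed means, variances, and correlations simply mirror the definitions above: parallel replicates match the reference on both mean and variance, essentially tau-equivalent replicates differ in mean (and possibly error variance), and congeneric replicates differ in both metric and true score saturation.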
The degrees of parallelism discussed above have implications for estimating reliability; more
specifically, they have implications for the accuracy of results produced by reliability estimation
methods that we apply to any given set of replicates. As we discuss later, we can apply nearly
any reliability estimation method derived from the classical tradition to any sample of replicates,
regardless of their underlying properties; however, the estimate we get will differ in its accuracy
depending on (a) the extent to which the underlying properties of those replicates conform to the
assumptions above and (b) characteristics of the construct one is attempting to measure. It is beyond
the scope of this chapter, and not its intent, to provide a catalog of coefficients that may be appropriate
for estimating the reliability depending on the degree of parallelism among the replicates of interest,
because excellent descriptions exist elsewhere in the literature (e.g., Feldt & Brennan, 1989, Table 3,
p. 115). However, in reviewing treatments such as the one offered by Feldt and Brennan (1989), be
cognizant that the myriad coefficients they review (including the commonly used Spearman-Brown
prophecy and coefficient alpha) were formulated to deal with scores arising from measurement
procedures in which (a) replicates were defined by a single facet (e.g., replicates reflect different
items or test parts) and (b) that facet was fully crossed with one’s objects of measurement (e.g., all
test takers are administered the same set of items, all test takers completed the same test on two
different occasions). As we will see below, application of classical reliability estimation methods
in cases in which replicates are multifaceted (e.g., replicates representing task-rater pairs) or cases
in which the design underlying one’s measurement procedure is not fully crossed is problematic
(Cronbach & Shavelson, 2004). The treatment of reliability for measurement procedures character-
ized by multifaceted replicates or involving noncrossed measurement designs leads naturally to the
introduction of G-theory.

Generalizability Theory
G-theory liberalizes CTT in that it has mechanisms within its score model for (a) dealing with
single-faceted and multifaceted replicates, (b) simultaneously differentiating and estimating mul-
tiple sources of error arising from different measurement facets (e.g., items, raters, occasions,
tasks), (c) dealing with scores produced by a wide variety of data collection designs (e.g., crossed,
nested, and ill-structured measurement designs), (d) adjusting the composition of true score and error depending on the generalizations one wishes to make regarding the scores, (e) adjusting the composition of true score and error depending on how one intends to use the scores (e.g., for relative comparison among applicants or absolute comparison of their scores to some set standard), and (f) relaxing some of the assumptions that CTT places on the distributional properties of true and error components. The purpose of this section is to elaborate on these features of the G-theory model in a way that is relatively free from G-theory jargon—which has been cited as one reason why the unifying perspective that G-theory offers on reliability has yet to be widely adopted by organizational researchers (DeShon, 2002).

7 Actually, this statement is a bit of a misnomer, because coefficient alpha and intraclass correlations simply represent specific computational forms of a broader class of coefficients known as generalizability coefficients (Cronbach & Shavelson, 2004).
Perhaps the most visible way the G-theory model liberalizes the CTT model is its ability to han-
dle measurement procedures comprised of multifaceted replicates. To illustrate this key differ-
ence between the G-theory and CTT models, let us first consider an example in which we have
observed scores based on ratings of job applicants’ responses to three interview questions designed
to assess interpersonal skill. Say that we had the same three raters interview each applicant and
that each rater asked applicants the same three questions (i.e., applicants, raters, and questions are
fully crossed). Thus, we have nine scores for each applicant—one for each of our nine “replicates,”
which in this case are defined by unique question-rater combinations. Under CTT and G-theory, we
might conceive of an applicant’s true score as the expected value of his/her observed score across
the population of replicates—in this case it is the population of raters and questions. However,
if we were to apply the CTT score model to such replicates it would break down because it does
not account for the fact that some replicates share a rater in common and other replicates share a
question in common. As such, the error associated with some replicates will be correlated across
applicants, therefore violating one of the key assumptions underlying the CTT measurement model (i.e.,
errors associated with different replicates are uncorrelated). As we show below, the G-theory mea-
surement model permits the addition of terms to the model that account for the fact that replicates
are multifaceted. The insidious part of this illustration is that the situation above would not prevent
us from applying estimation methods derived from CTT to these data (e.g., calculating coefficient
alpha on the nine replicates). Rather, perhaps unbeknownst to the investigator, the method would
allocate error covariance among replicates that share a rater or question in common to true score
variance because they are a source of consistency across at least some of the replicates (Komaroff,
1997; Raykov, 2001a). That is, the CTT score model and commonly used coefficients derived from
it (e.g., coefficient alpha) are blind to the possibility of multifaceted replicates, which is a direct
reflection of the fact that early measurement theory primarily concerned itself with fully crossed,
single-faceted measurement designs (Cronbach & Shavelson, 2004).
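A small Monte Carlo sketch (ours, with hypothetical variance values) makes this misallocation visible: when coefficient alpha is computed over nine question-rater replicates, the shared question and rater interaction effects are counted as consistency, so alpha overstates the proportion of variance that generalizes across both facets (the generalizability coefficient introduced later in the chapter). Question and rater main effects are omitted here because they do not affect covariances among the nine columns.

```python
import numpy as np

rng = np.random.default_rng(3)
n_p, n_q, n_r = 20_000, 3, 3
v_p, v_pq, v_pr, v_res = 0.50, 0.30, 0.10, 0.10    # hypothetical variance components

# person, person-by-question, and person-by-rater effects plus residual
p  = rng.normal(0, np.sqrt(v_p),   (n_p, 1, 1))
pq = rng.normal(0, np.sqrt(v_pq),  (n_p, n_q, 1))
pr = rng.normal(0, np.sqrt(v_pr),  (n_p, 1, n_r))
e  = rng.normal(0, np.sqrt(v_res), (n_p, n_q, n_r))
X  = (p + pq + pr + e).reshape(n_p, n_q * n_r)     # nine question-rater "replicates"

k = X.shape[1]
alpha = k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

# proportion of variance in the 9-replicate average that is consistent across BOTH
# questions and raters, computed from the known generating components
g_coef = v_p / (v_p + v_pq / n_q + v_pr / n_r + v_res / (n_q * n_r))

print(round(alpha, 2))    # roughly .93: shared question/rater effects counted as "true"
print(round(g_coef, 2))   # .78
```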
To account for the potential that replicates can be multifaceted, G-theory formulates its measure-
ment model from a random-effects ANOVA perspective. Unlike CTT, which has its roots in the
correlational research tradition characteristic of Spearman and Pearson, G-theory has its roots in
the experimental research tradition characteristic of Fisher (1925). As such, G-theory is particularly
sensitive to dealing with replicates that are multifaceted in nature and both crossed and noncrossed
measurement designs. It has long been acknowledged that issues of measurement design have
been downplayed and overlooked in the correlational research tradition (Cattell, 1966; Cronbach,
1957), and this is clearly evident in reliability estimation approaches born out of CTT (Cronbach &
Shavelson, 2004; Feldt & Brennan, 1989). To ease into the G-theory measurement model, we start
with a simple example, one in which we assume observed scores are generated by a replicate of a
measurement procedure that is defined along only one facet of measurement. The observed score
(X) for a given person p that is produced by any given replicate defined by measurement facet “A”
(e.g., “A” might reflect items, occasions, raters, tasks, etc.) is assumed to be an additive function:

$$X_{pa} = b_0 + u_p + u_a + u_{pa} + e_{pa} \qquad (2.3)$$

where b0 is the grand mean score across persons and replicates of facet A; up is the main effect of
person p and conceptually the expected value of p’s score across the population of replicates of facet
A (i.e., the analogue of true score); ua represents the main effect of replicate a and conceptually is
the expected value of a’s score across the population of persons; upa represents the p × a interaction
effect and conceptually reflects differences in the rank ordering of persons across the population
of replicates of facet A; and lastly, epa is the residual error that conceptually is left over in Xpa after
accounting for the other score effects.8 As with common random-effects ANOVA assumptions,
these score effects are assumed to (a) have population means of zero, (b) be uncorrelated, and (c)
have variances of σ2P, σ2A, σ2PA, and σ2Residual, respectively (Jackson & Brashers, 1994). It is the
latter variance components that are the focus of estimation efforts in G-theory, and they serve as
building blocks of reliability estimates derived by G-theory.
Of course, the example above is introduced primarily for pedagogical purposes; the real strength
of the random-effects formulation is that the model above is easily extended to measurement proce-
dures with multifaceted replicates (e.g., replicates that reflect question-rater pairs). For example, the
observed score (X) for a given person p that is produced by any given replicate defined by measure-
ment facets “A” and “B” (e.g., “A” might reflect questions and “B” might reflect raters) is assumed
to be an additive function.

$$X_{pab} = b_0 + u_p + u_a + u_b + u_{pa} + u_{pb} + u_{ab} + u_{pab} + e_{pab} \qquad (2.4)$$

A key difference to point out between the models specified in Equations 2.3 and 2.4 is the
interpretation of the main effects for individuals. Once again, up is the main effect of person p, but
conceptually it is the expected value of p’s score across the population of replicates defined by fac-
ets A and B. Thus, although the up term in Equations 2.3 and 2.4 provides an analogue to the true
score, the substance of true scores differs depending on the nature of the population(s) of replicates
of interest. Extending this model beyond two facets (e.g., a situation in which replicates are defined
as a combination of questions, raters, and occasions) is straightforward and simply involves adding
main effect terms for the other facets and associated interaction terms (Brennan, 2001b).
One thing that is evident from the illustration of the G-theory model provided above is that,
unlike the CTT model, it is scalable; that is, it can expand or contract depending on the degree to
which replicates underlying a measurement procedure are faceted. Given its flexibility to expand
beyond simply a true and an error component, the G-theory model potentially affords investigators several more components of variance to consider than does the CTT model. For example,
using the interview example presented above, we could potentially decompose variance in interview
scores for applicant p on question q as rated by rater r into seven components.9

$$\sigma^2_X = \sigma^2_P + \sigma^2_Q + \sigma^2_R + \sigma^2_{PQ} + \sigma^2_{PR} + \sigma^2_{QR} + \sigma^2_{PQR,\mathrm{Residual}} \qquad (2.5)$$
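As an illustration of how such a decomposition might be carried out in practice, the following sketch (ours; the function name g_study_components is introduced here for illustration only) estimates the components in Equation 2.5 from a fully crossed applicant × question × rater array with one observation per cell, using the standard random-effects ANOVA (expected mean squares) approach. It assumes the fully crossed design noted in footnote 9 and truncates negative estimates at zero.

```python
import numpy as np

def g_study_components(X):
    """Estimate the variance components in Equation 2.5 from a fully crossed
    applicant x question x rater array (one observation per cell)."""
    n_p, n_q, n_r = X.shape
    grand = X.mean()
    m_p, m_q, m_r = X.mean(axis=(1, 2)), X.mean(axis=(0, 2)), X.mean(axis=(0, 1))
    m_pq, m_pr, m_qr = X.mean(axis=2), X.mean(axis=1), X.mean(axis=0)

    # sums of squares for main effects and two-way interactions
    ss_p = n_q * n_r * np.sum((m_p - grand) ** 2)
    ss_q = n_p * n_r * np.sum((m_q - grand) ** 2)
    ss_r = n_p * n_q * np.sum((m_r - grand) ** 2)
    ss_pq = n_r * np.sum((m_pq - m_p[:, None] - m_q[None, :] + grand) ** 2)
    ss_pr = n_q * np.sum((m_pr - m_p[:, None] - m_r[None, :] + grand) ** 2)
    ss_qr = n_p * np.sum((m_qr - m_q[:, None] - m_r[None, :] + grand) ** 2)
    ss_pqr = np.sum((X - grand) ** 2) - (ss_p + ss_q + ss_r + ss_pq + ss_pr + ss_qr)

    # mean squares
    ms = {'p': ss_p / (n_p - 1), 'q': ss_q / (n_q - 1), 'r': ss_r / (n_r - 1),
          'pq': ss_pq / ((n_p - 1) * (n_q - 1)),
          'pr': ss_pr / ((n_p - 1) * (n_r - 1)),
          'qr': ss_qr / ((n_q - 1) * (n_r - 1)),
          'pqr': ss_pqr / ((n_p - 1) * (n_q - 1) * (n_r - 1))}

    # solve the expected-mean-square equations (negative estimates truncated at zero)
    v = {'pqr,res': ms['pqr']}
    v['pq'] = max((ms['pq'] - ms['pqr']) / n_r, 0)
    v['pr'] = max((ms['pr'] - ms['pqr']) / n_q, 0)
    v['qr'] = max((ms['qr'] - ms['pqr']) / n_p, 0)
    v['p'] = max((ms['p'] - ms['pq'] - ms['pr'] + ms['pqr']) / (n_q * n_r), 0)
    v['q'] = max((ms['q'] - ms['pq'] - ms['qr'] + ms['pqr']) / (n_p * n_r), 0)
    v['r'] = max((ms['r'] - ms['pr'] - ms['qr'] + ms['pqr']) / (n_p * n_q), 0)
    return v

# usage: X is an applicants x questions x raters array of interview ratings
# components = g_study_components(X)
```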

Recall from our discussion of CTT that the basic form reliability coefficients take on is σ2T/(σ2T +
σ2E). This fact raises the question, from the G-theory perspective, of what sources of variance com-
prise σ2T and σ2E? As one might guess from the decomposition above, the G-theory model offers
researchers a great deal of flexibility when it comes time to specifying what constitutes error vari-
ance and true score variance in any given situation. As we will demonstrate below, having this flex-
ibility is of great value. As alluded to in our opening paragraph, the sources of variance in scores
that are considered to reflect error (and true score, for that matter) can differ depending on (a) the generalizations an investigator wishes to make regarding the scores; (b) how an investigator intends to use the scores (e.g., for relative comparison among applicants or absolute comparison of their scores to some set standard); and (c) characteristics of the data collection or measurement design itself, which can limit an investigator's ability to estimate various components of variance. The idea of having the flexibility to specify which components of observed variance contribute to true score and error is something that is beyond the CTT score model because it only partitions variance into two components. In the following sections, we highlight how the G-theory model offers investigators flexibility for tailoring the composition of σ2T and σ2E to their situation.

8 The highest order interaction term and the residual term in G-theory models are confounded because such designs essentially amount to having one observation per cell. Thus, in practice, it is not possible to generate separate estimates of variance in X attributable to these two effects.
9 Two notes here: First, as we will discuss in the following sections, one's ability to estimate each of these components will be limited by the measurement design underlying one's measurement procedure. The example here assumes a fully crossed design, which will often not be the case in practice. Second, note that in Equation 2.5 we combine variance components for the applicant-question-rater interaction and residual terms; this reflects the fact that these sources of variance will not be uniquely estimable.

Dependency of σ2T and σ2E on Desired Generalizations

The decision of what components of variance comprise σ2T and σ2E depends in part on the generalizations the investigator wishes to make based on the scores. To illustrate this, let us take the interview example offered above and say that the investigator was interested in (a) generalizing scores from his or her interview across the population of questions and raters and (b) using the scores to make relative comparisons among applicants who completed the interview. In such a case, variance associated with applicant main effects (σ2P) would comprise σ2T, and variance associated with interactions between applicants and each type of measurement facet (i.e., applicant-question interaction variance, σ2PQ; applicant-rater interaction variance, σ2PR; and applicant-question-rater interaction variance and residual variance, σ2PQR,Residual) would comprise σ2E. The relative contribution of these latter effects to error variance would be scaled according to the number of questions and raters involved in the measurement procedure. As the number of questions increases, the contribution of σ2PQ would go down (i.e., error associated with questions would be averaged away), and as the number of raters increases, the contribution of σ2PR would go down (i.e., error associated with raters would be averaged away). Specifically, the “generalizability” coefficient described above would be

$$E\rho^2 = \frac{\sigma^2_P}{\sigma^2_P + \left[\dfrac{\sigma^2_{PQ}}{n_Q} + \dfrac{\sigma^2_{PR}}{n_R} + \dfrac{\sigma^2_{PQR,\mathrm{Residual}}}{n_Q\, n_R}\right]} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E} \qquad (2.6)$$
where the term in brackets represents σ2E, nQ is the number of interview questions, nR is the number
of raters, and nQnR is the product of the number of questions and raters.10 Note that increasing the
number of questions and/or raters will result in decreasing that part of error associated with ques-
tions and/or raters, respectively. The idea that G-theory allows for the scaling of these effects as a
function of the number of questions and raters sampled is analogous to the role of the Spearman-
Brown prophecy in CTT, in which the number of replicates that comprise a measurement procedure
directly affects the estimated reliability of scores produced by that procedure (Feldt & Brennan,
1989). The key difference here is that G-theory allows one to differentiate and examine the effect
that adjusting the sampling of different types of facets has for reliability (e.g., separately adjusting
the number of questions and raters), whereas the Spearman-Brown prophecy does not allow such
differentiation to occur. As such, applying the Spearman-Brown prophecy to estimate what the reli-
ability of scores would be if the length of a measure is changed can greatly mislead investigators if
the replicates that comprise that measure are multifaceted (Feldt & Brennan, 1989).
10 Although labeled as a “generalizability” coefficient, note that this formula provides an estimate of σ2T over σ2T + σ2E, and as such may be considered an estimate of reliability.

To illustrate, let us take the interview example offered above and say σ2P = .50, σ2PQ = .30, σ2PR = .10, and σ2PQR,Residual = .10. Recall that our interview comprises three questions and three raters (i.e., nine question-rater pairs serve as replicates). Using Equation 2.6, the estimated reliability of the average rating across questions and raters would be .78 (σ2T = .50, σ2E = .14). Now, if we were to ask what effect “doubling the length of the interview” would have on reliability, and we used the
Spearman-Brown prophecy (i.e., 2Eρ2/[1 + Eρ2]) to answer that question, we would achieve an esti-
mate of .88, which is analogous to what we achieve if we replaced nQ, nR, and nQnR in Equation 2.6
with 2nQ, 2nR, and 2nQ2nR. Note that the Spearman-Brown prophecy does not provide the estimated
reliability for 18 question-rater pairs (i.e., double the existing number of replicates), but rather an
estimated reliability for 36 question-rater pairs (i.e., six questions × six raters). As such, in this case,
the Spearman-Brown formula gives us an estimated reliability if the effective length of the inter-
view were quadrupled rather than doubled. Another shortcoming of the Spearman-Brown formula
is that it fails to account for the fact that there are multiple ways one can effectively double the length
of the interview, each of which may produce a different reliability estimate. For example, we can
have 2 questions and 9 raters, which would give us 18 question-rater pairs and result in an average
rating reliability of .75 on the basis of Equation 2.6. Alternatively, we can have 9 questions and 2
raters, which would also give us 18 question-rater pairs but result in an average rating reliability of
.85 on the basis of Equation 2.6. Essentially there is no mechanism within the Spearman–Brown
formula that accounts for the fact that facets may differentially contribute to error. As this example
illustrates, making adjustments to the number of levels sampled for one facet (e.g., questions in this
case) may have a much more profound effect on error than making adjustments to the number of
levels sampled for other facets (e.g., raters) included in the design.
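The arithmetic behind this example is easy to reproduce. The brief sketch below (ours) implements Equation 2.6 for the average rating and recovers the values discussed above: .78 for the original three-question, three-rater design; .75 and .85 for the two alternative ways of doubling the number of question-rater pairs; and .88 for the quadrupled (six-question, six-rater) design that the Spearman-Brown projection effectively assumes.

```python
def g_coefficient(v_p, v_pq, v_pr, v_pqr_res, n_q, n_r):
    """Equation 2.6: generalizability of the average rating over n_q questions
    and n_r raters when scores are used for relative comparisons."""
    relative_error = v_pq / n_q + v_pr / n_r + v_pqr_res / (n_q * n_r)
    return v_p / (v_p + relative_error)

# variance components from the chapter's example
print(round(g_coefficient(.50, .30, .10, .10, n_q=3, n_r=3), 2))  # 0.78
print(round(g_coefficient(.50, .30, .10, .10, n_q=2, n_r=9), 2))  # 0.75
print(round(g_coefficient(.50, .30, .10, .10, n_q=9, n_r=2), 2))  # 0.85
print(round(g_coefficient(.50, .30, .10, .10, n_q=6, n_r=6), 2))  # 0.88
```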
Returning to our discussion of the dependency of σ2T and σ2E on the generalizations one wishes
to make regarding their scores, let us now say that a different investigator uses the same interview
procedure described above, but instead only wished to generalize scores from the procedure across
the population of raters. For example, this might be the case if the investigator feels that the ques-
tions get at different parts of the interpersonal skill construct, and as such does not wish to treat
inconsistency in scores across questions (for a given applicant) as error. In such a case, variance
associated with applicant main effects (σ2P) and a function of applicant-question interaction effects
(σ2PQ) would comprise σ2T, and variance associated with interactions between applicants and rat-
ers (σ2PR) and the applicant-question-rater interaction along with residual error (σ2PQR,Residual) would
comprise σ2E (Brennan, 2001b; DeShon, 2002). In this situation, what the investigator is essen-
tially doing is examining the consistency of scores across raters on the basis of ratings that have
been averaged across the three interview questions—in G-theory this is known as fixing a facet of
measurement.11

Dependency of σ2T and σ2E on Intended Use of Scores


Slightly varying the example above allows us to illustrate how an investigator's intended use of the scores affects the sources of variance that contribute to σ2T and σ2E. For example, let
us say the interview above was conducted to determine if applicants met some minimum level of
interpersonal skill. That is, rather than comparing applicants against one another, the interest is in
comparing their scores to some standard of interpersonal skill. Also, let us return to the original
example in which the investigator was interested in generalizing scores across the population of
questions and raters. In this case, variance due to the main effects of questions and raters, as well
as their interaction (i.e., σ2Q, σ2R, σ2QR), would contribute to σ2E (in addition to sources identified ear-
lier, σ2PQ, σ2PR, σ2PQR,Residual) because they influence the absolute magnitude of the score any given
applicant receives. In the example from the previous paragraphs in which we were only interested
in using scores to make relative comparisons among applicants, these effects did not contribute to
error because they have no bearing on how applicants were rank ordered (i.e., question and rater
main effects are constants across applicants for designs in which questions and raters are fully
crossed with applicants). The potential for such effects to contribute to σ2E in crossed designs (as they do in this example) is not addressed by CTT, because it is simply beyond the scope of the CTT model to handle error of that type (Cronbach & Shavelson, 2004).

11 Note the idea of fixing a facet of measurement for purposes of estimating σ2T and σ2E in the context of G-theory is different from modeling a factor or covariate as fixed in the context of mixed-effects models (DeShon, 2002; Searle, Casella, & McCulloch, 1992).
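To make the relative versus absolute distinction concrete, the sketch below (ours) computes the absolute-decision analogue of Equation 2.6, commonly labeled Φ (an index of dependability) in G-theory treatments. The person-related components mirror the earlier example; the question, rater, and question-rater main-effect values are hypothetical, as is the function name.

```python
def phi_coefficient(v, n_q, n_r):
    """Dependability for absolute decisions: question and rater main effects and their
    interaction join the error term, unlike the relative-decision case (Equation 2.6)."""
    absolute_error = (v['q'] / n_q + v['r'] / n_r + v['qr'] / (n_q * n_r) +
                      v['pq'] / n_q + v['pr'] / n_r + v['pqr,res'] / (n_q * n_r))
    return v['p'] / (v['p'] + absolute_error)

# person-related components mirror the earlier example; facet main effects are hypothetical
v = {'p': .50, 'pq': .30, 'pr': .10, 'pqr,res': .10, 'q': .08, 'r': .05, 'qr': .02}
print(round(phi_coefficient(v, n_q=3, n_r=3), 2))  # 0.72, versus .78 for relative decisions
```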

Dependency of σ2T and σ2E on Characteristics of the Measurement Procedure


Critics may argue that the interview examples offered above do not reflect the reality of measure-
ment designs faced in applied organizational research and practice. Such critics would be right.
Rarely, if ever, are the measurement designs involving ratings that we confront in applied organizational research and practice fully crossed. Oftentimes, when we are fortunate to have two or more raters for each ratee, the orientation of raters to ratees is what Putka, Le, McCloy, and Diaz (2008) have termed ill-structured.12 Specifically, the sets of raters that rate each ratee are neither
identical (indicative of a fully crossed design) nor completely unique (indicative of a design in
which raters are nested with ratees); rather, each ratee is rated by a set of raters that may vary in
their degree of overlap. The implication of the departure of measurement designs from the fully crossed ideal is that it can limit our ability to uniquely estimate the components of variance that underlie observed scores (e.g., those illustrated in Equation 2.5), which in turn limits our flexibility for choosing which components contribute to σ2T and σ2E. To illustrate this, we consider a few variants
on the interview example above.
Say that instead of having three raters rate each applicant on each interview question, a different
nonoverlapping set of three raters rates each applicant (i.e., raters are nested within applicants). In
this case, rater main effect variance (σ2R) and applicant-rater interaction effect variance (σ2PR) would
be inseparable, and both will contribute to σ2E regardless of whether the investigator was interested
in using the scores simply to rank order applicants or compare applicants’ scores to some fixed
standard (McGraw & Wong, 1996; Shrout & Fleiss, 1979). However, often in practice we are not
dealt such nested designs—the sets of raters that may rate each ratee tend to vary in their degree of
overlap. Although less “clean” than the aforementioned nested design, having some degree of over-
lap actually gives us an opportunity to uniquely estimate σ2R and σ2PR (as we are able to do in a fully
crossed design) (Putka et al., 2008). Nevertheless, as was the case with the nested design, σ2R and
σ2PR will contribute to σ2E —because the raters that rate each ratee are not identical, σ2R and σ2PR will
affect the rank ordering of ratees’ scores (Schmidt et al., 2000). However, unlike the nested design,
the contribution of σ2R to σ2E will be dependent on the amount of overlap between the sets of raters
that rate each ratee—a subtlety not widely known but pertinent to many organizational researchers
who work with ratings (Putka et al., 2008).
Lastly, we use our interview example one more time to provide a critical insight offered by
G-theory—the notion of hidden measurement facets and their implications for interpreting the sub-
stantive nature of σ2T and σ2E. In laying out the interview example above, it was implicit that raters
conducted interviews on separate occasions. However, a more common situation might be that raters
sit on a panel, and as such the three questions are asked of a given applicant on the same occasion.
In either case, we have measurement procedures with designs that are “notationally” identical (i.e.,
applicants × questions × raters); however, the variance components underlying scores produced by
these interview procedures have different substantive meanings. If each rater conducted a separate
interview, variance attributable to the applicant-rater interaction (σ2PR) would also reflect applicant-
occasion variance (σ2PO). In other words, σ2PR would not only reflect inconsistencies in raters’ rank-
ordering of applicants, but also inconsistency in the applicants’ responses across the occasions on
which the interviews were conducted. If the applicant participated in the panel interview, raters
would be rating the applicants' responses on the same occasion, and as such variance attributable to the applicant-rater interaction (σ2PR) would be just that, but variance attributable to applicant main effects (σ2P) would also reflect applicant-occasion variance (σ2PO). This stems from the fact that raters are observing a given applicant on the same occasion, and as such occasion of measurement serves as a source of consistency in raters' ratings that would not be present if raters conducted separate interviews. In both of the examples above, σ2PO is not separable from the other source of variance with which it is confounded. In the case of separate interviews, raters are completely confounded with occasions; in the case of panel interviews, occasions are not replicated for a given applicant. Thus, these examples illustrate how a measurement facet can hide in different ways to influence the substantive meaning of σ2E (in the case of the separate interview) and σ2T (in the case of the panel interviews).

12 Another common ratings design faced in practice (particularly with job performance ratings) is one in which ratees are nested within raters (e.g., each group of incumbents is rated by their respective group supervisor). In this case, each ratee has only one rater, and as such there is no way to distinguish between σ2PR (typically considered a source of error) and σ2P (typically considered true score variance). Thus, estimating inter-rater reliability on the basis of data structured in this manner is not possible.
The examples above also serve to illustrate an important point—just because we cannot isolate
or estimate a source of variance underlying observed scores does not mean those sources of vari-
ance are not present and influencing our scores (Brennan, 2001b; DeShon, 1998; Feldt & Brennan,
1989; Schmidt & Hunter, 1996). Indeed, it is interesting to take the concept of hidden facets and
use them to frame some common measurement issues in personnel selection. For example, the
magnitude of person-rater interaction variance (σ2PR) in job performance ratings has been found to
be quite large (e.g., Scullen et al., 2000; Schmidt et al., 2000). However, if raters are viewing the
performance of individuals on (a) different occasions, (b) different tasks, and/or (c) different tasks
on different occasions, then part of what we typically label person-rater interaction variance may
actually also reflect several other sources of variance (e.g., person-occasion interaction variance,
person-task interaction variance, and person-task-occasion interaction variance). In other words,
the hidden facets of occasion and task might help explain the sizable person-rater interaction effects
often found in job performance ratings. In the context of assessment centers, hidden facets might
partially explain the common finding of the dominance of exercise effects over dimension effects
(Lance, 2008). For example, dimensions within exercises share an occasion of measurement in
common (and sometimes share raters as well), whereas dimensions in different exercises do not.
As such, all else being equal we would expect scores for dimensions within exercises to be more
consistent with each other than scores for dimensions in different exercises. Thus, what is inter-
preted as an exercise effect in the context of assessment center ratings may partially be explained by
hidden occasion and rater facets of measurement that serve to increase consistency among dimen-
sion scores within exercises relative to dimension scores across exercises (e.g., Cronbach, Linn,
Brennan, & Haertel, 1997). These examples illustrate the potential utility of framing common mea-
surement issues through the lens of hidden facets illuminated by G-theory.

Summary
In this section, we described perspectives on observed scores adopted by two measurement theories
that dominate current discussions of reliability. Through a single example, we illustrated the many
ways in which G-theory liberalizes not only the score model offered by CTT, but also the very per-
spective it offers on reliability. By no means did the discussion above fully illustrate how G-theory
is applied or how reliability coefficients based on G-theory are calculated. For such details, the
reader is referred to other treatments (Brennan, 2001b; DeShon, 2002; Haertel, 2006; Shavelson
& Webb, 1991). Nevertheless, given space constraints, this was not our intent. Rather, we tried, in
a way that was relatively free of G-theory jargon, to show how G-theory offers a way for framing
and dealing with measurement situations that CTT was designed to handle, as well as those that
CTT was never really designed to handle—a key reason why G-theory currently underlies modern
perspectives on reliability (Cronbach & Shavelson, 2004).

Estimation of Reliability
In the sections above, we have outlined conceptual and model-based perspectives on reliability
and measurement error. In this section, we address how these concepts and models translate into
methods for estimating reliability. Reliability is often summarized in terms of (a) coefficients rang-
ing from 0 to 1 or (b) standard errors of measurement (SEMs) expressed in a raw score metric.
Our focus is on the former, partly because of page limits and partly because the latter can typically be
calculated from components of the former.13 As noted earlier, under CTT and G-theory the goal of
reliability estimation is to estimate the ratio σ2T / (σ2T + σ2E). The following sections discuss methods
for estimating this ratio and components of it. Our intent here is not to provide a catalog of different
types of reliability coefficients, nor is our intent to provide a cookbook on how to estimate reliability
in any given situation. Indeed, as should be clear from the previous section, doing so would not be
fruitful given that the composition of σ2T and σ2E in any given situation partly reflects the aims of
the individual investigator. Rather, we focus on comparing and contrasting different historical tradi-
tions on estimating reliability, examining the pros and cons of each, and speaking to their equifinal-
ity under certain conditions.
The extant literature on reliability estimation is characterized by a multitude of loosely orga-
nized coefficients and estimation methods. Historically, the psychometric literature tended to
organize discussions of reliability estimation in terms of categories or types of reliability (e.g.,
test-retest reliability, split-half, parallel-forms, coefficients of equivalence, stability, precision;
Cronbach, 1947; Gulliksen, 1950). With the advent of G-theory, psychometricians have slowly
gravitated away from categories or types of coefficients that characterized early test theory
because “the categories may now be seen as special cases of a more general classification, gen-
eralizability coefficients” (AERA et al., 1999, p. 27). As Campbell (1976) noted, the G-theory
model “removes the somewhat arbitrary distinctions among coefficients of stability, equivalence,
and internal consistency and replaces them with a general continuum of representativeness”
(p. 202). Interestingly, this movement toward a unitarian perspective on reliability has temporally
coincided with the movement from trinitarian to unitarian perspectives on validity (Brennan,
2006). Ironically, unlike our views on validity, our formal treatments of reliability estimation in
organizational research have remained focused on categories or types of reliability coefficients
(e.g., Aguinis, Henle, & Ostroff, 2001; Guion, 1998; Le & Putka, 2007; Ployhart, Schneider, &
Schmitt, 2006; Schmidt & Hunter, 1996).14 Rather than continuing to bemoan the current state
of affairs, we offer an alternative way of framing discussions of estimating reliability that may
help bring organizational research, practice, and pedagogy more in line with modern psychomet-
ric thought. Before doing so, we offer a quick example to help illustrate the rationale behind the
structure we offer below.
When calculating existing types of reliability coefficients, such as a simple Pearson correlation
calculated between two replicates (Brown, 1910; Spearman, 1910), coefficient alpha (Cronbach,
1951), or intraclass correlation (ICC; Shrout & Fleiss, 1979)—coefficients with which most investigators are familiar—it is important to remember that these are just sets of mathematical operations that can be
applied to any set of replicates of our choosing (e.g., raters, items, tasks, occasions). They will all
produce, to varying degrees of quality (depending on the properties of the underlying data and con-
struct being measured), estimates of the ratio σ2T/(σ2T + σ2E). As noted above, the substantive mean-
ing of σ2T and σ2E will depend in large part on the types of replicates to which the mathematical
operations are applied. For example, if we apply them to replicates defined as items, σ2T will reflect
consistency across items; if we apply them to replicates defined as occasions, σ2T will reflect consistency across occasions; if we apply them to replicates defined as raters, σ2T will reflect consistency across raters; and so on and so forth.15 Unfortunately, our literature has a tendency to associate certain types of coefficients with certain types of replicates (e.g., coefficient alpha with items, ICCs with raters). This association is misleading, because it simply reflects the type of replicate with which these procedures happened to be introduced by earlier authors. Computationally, the procedures are blind to the types of replicates to which they are applied, and many are algebraically identical (Cronbach & Shavelson, 2004; Feldt & Brennan, 1989). For example, alpha is a specific type of ICC, and all ICCs can be framed as generalizability coefficients (Brennan, 2001b; McGraw & Wong, 1996). We organize our discussion below around three traditions for estimating reliability. The classical tradition largely attempts to estimate reliability directly, with little attention toward estimating components of it. More modern traditions (e.g., those based on random-effects models and CFA models) attempt to generate estimates of σ2T and σ2E, or components of them, which gives investigators flexibility to combine components in different ways to calculate reliability estimates appropriate for their situation and achieve a better understanding of the sources of error (and true score) in their measures.

13 We refer the interested readers to Brennan (1998), Haertel (2006), and Qualls-Payne (1992) for modern treatments of SEMs in the context of CTT and G-theory. One advantage of SEMs over reliability coefficients is that they can be tailored to individuals being measured (e.g., differential amounts of error depending on individuals' level of true score), whereas reliability coefficients are typically associated with groups of individuals. The latter is often cited as one benefit of IRT-based perspectives on measurement over CTT- and G-theory-based perspectives; however, CTT and G-theory also offer methods for generating individual-level SEMs (Haertel, 2006).
14 Our speculation on why this occurred is (a) the perceived complexity and jargon-loaded nature of G-theory (DeShon, 2002), (b) the overarching dominance of the correlational research tradition underlying selection research and practice (Cronbach, 1957; Dunnette, 1966; Guion, 1998), and (c) the steady decline of teaching psychometrics and statistics in our graduate programs since the 1970s (Aiken et al., 2008; Merenda, 2007).

Classical Tradition
The classical tradition has its roots in using the Pearson correlation between two replicates (e.g., split-
halves of a single test, tests administered on two different occasions) to estimate reliability (Brown,
1910; Spearman, 1910). It is based on the premise that the correlation between two strictly paral-
lel replicates (e.g., split-halves of a test, the same test administered on two occasions) equals the
proportion of observed score variance attributable to true scores from a single replicate. If applied
to split-halves of a test, the Spearman-Brown prophecy formula would then be used to “step-up”
the said correlation to arrive at an estimate of reliability for scores produced by the full test. The
primary strength of estimation methods based on this tradition is their simplicity and widespread
familiarity. Pearson correlations are easy to calculate and widely used in selection research and
practice (Schmidt & Hunter, 1996).
Early psychometricians realized that the Spearman-Brown approach described above becomes
unwieldy in situations dealing with more than two replicates (e.g., a 10-item conscientiousness
scale). Specifically, they realized that depending on which split-halves of their test they calculated
their correlation on, they would get a different estimate of reliability (Kuder & Richardson, 1937).
In light of this difficulty, researchers developed alternative approaches to estimating reliability
that were a function of replicate variances (e.g., item variances) and observed score variances
(e.g., variance of the full test). These approaches provided a computationally simple solution
that could easily accommodate measures involving two or more single-faceted replicates and are
reflected in Kuder and Richardson’s (1937) KR-20, Guttman’s (1945) set of lambda coefficients,
and Cronbach’s coefficient alpha (Cronbach, 1951).16, 17 Another positive characteristic of these
latter approaches relative to the Spearman-Brown prophecy is that they only necessitate that replicates be essentially tau-equivalent, as opposed to strictly parallel (Novick & Lewis, 1967), although subsequent research has found that alpha is robust to violations of essential tau-equivalence
(Haertel, 2006).
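The instability of split-half estimates that motivated these variance-based coefficients is easy to demonstrate with a brief simulation sketch (ours, using simulated essentially tau-equivalent items with hypothetical variances): different random splits of the same ten items yield a range of stepped-up split-half values, whereas coefficient alpha provides a single summary.

```python
import numpy as np

rng = np.random.default_rng(4)
n_persons, n_items = 500, 10
T = rng.normal(0, 1, (n_persons, 1))
items = T + rng.normal(0, 1, (n_persons, n_items))   # essentially tau-equivalent replicates

def split_half_sb(data, first_half):
    """Correlate two half-test scores and step the result up via Spearman-Brown."""
    rest = [j for j in range(data.shape[1]) if j not in first_half]
    a, b = data[:, list(first_half)].sum(axis=1), data[:, rest].sum(axis=1)
    r = np.corrcoef(a, b)[0, 1]
    return 2 * r / (1 + r)

estimates = []
for _ in range(20):                                   # 20 random splits of the 10 items
    perm = rng.permutation(n_items)
    estimates.append(split_half_sb(items, set(perm[:n_items // 2].tolist())))
print(f"split-half estimates range from {min(estimates):.2f} to {max(estimates):.2f}")

k = n_items                                           # coefficient alpha: one summary value
alpha = k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum() / items.sum(axis=1).var(ddof=1))
print(f"coefficient alpha: {alpha:.2f}")
```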

15 Given the discussion raised earlier, σ2T in any of these examples may also reflect variance attributable to one or more hidden facets of measurement.
16 On a historical note, Cronbach did not invent coefficient alpha per se—Guttman's (1945) L3 coefficient and Hoyt's (1941) coefficient are algebraically identical to alpha and were introduced long before Cronbach's (1951) landmark article.
17 We should note that a subtle difference between Pearson r-based indices of reliability and those noted here (i.e., KR-20, Guttman's lambdas, alpha) is that the latter assess the additive relationship between replicates, whereas Pearson r assesses the linear relationship between replicates. Differences in the variances of replicates will reduce alpha and other additive reliability indices, but they will have no effect on Pearson r-based indices because the latter standardize any variance differences between replicates away (McGraw & Wong, 1996).

Unfortunately, all of the classical estimation approaches described above, from Spearman-Brown
through coefficient alpha, are limited in some important ways. As noted earlier, the CTT model on
which these coefficients are based was developed for use with measurement procedures involving
single-faceted replicates that were fully crossed with one’s objects of measurement (Cronbach &
Shavelson, 2004). The simplicity of calculating a Pearson r, the Spearman-Brown prophecy, and
alpha belies interpretational and statistical problems that arise if one attempts to apply them to
replicates that are (a) not fully crossed with one’s objects of measurement or (b) multifaceted in
nature. As Cronbach and Shavelson (2004) noted in their discussion of the possibility of applying
alpha to replicates that are not fully crossed, “Mathematically, it is easy enough to substitute scores
from a nested sample matrix by simply taking the score listed first for each (person) as belonging in
Column 1, but this is not the appropriate analysis” (p. 400). Nevertheless, application of such clas-
sical estimation methods, regardless of a procedure’s underlying design, has been common practice
in organizational research (Viswesvaran, Schmidt, & Ones, 2005).
To illustrate the problems that arise when classical estimation methods are applied to measure-
ment designs that are not fully crossed, consider an example in which job incumbents are each
rated by two raters on their job performance. Some incumbents may share one or more raters in
common, whereas others may share no raters in common. In this case, standard practice is to (a)
randomly treat one rater for each ratee as “rater 1” and the other as “rater 2,” (b) assign the rat-
ings of “rater 1” to column 1 and the ratings of “rater 2” to column 2 in a data set, (c) calculate the
Pearson correlation between columns to estimate the reliability of a single-rater’s ratings, and then
(d) use the Spearman–Brown prophecy on the said correlation to estimate the reliability for the
average rating (Viswesvaran et al., 2005). Putka et al. (2008) elaborated on several problems with
this common practice, namely (a) the estimates derived from this process can differ depending on
the assignment of raters to columns 1 and 2 for each ratee; (b) Pearson r fails to account for the fact
that residual errors are nonindependent for ratees that share one or more raters in common, which
leads to a downward bias in estimated true score variance (σ2T) (Kenny & Judd, 1986); and (c) the
Spearman–Brown prophecy inappropriately scales the contribution of rater main effect variance to
error variance (σ2E) as a function of the number of raters per ratee, rather than the amount of over-
lap between sets of raters that rate each ratee, leading to an overestimate of σ2E (see also, Brennan,
2001b, p. 236). In addition to these points, Putka and his colleagues offer a solution for dealing
with this type of design that is based on the random-effects model tradition of estimating reliability
(discussed later).
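Point (a) above can be illustrated with a brief simulation sketch (ours; all variance values are hypothetical, and this is not the Putka et al., 2008, estimator itself): when each ratee's two raters come from a shared pool, the Pearson r/Spearman-Brown estimate shifts depending on the arbitrary assignment of raters to "column 1" and "column 2."

```python
import numpy as np

rng = np.random.default_rng(5)
n_ratees, n_raters = 300, 12
true = rng.normal(0, 1.0, n_ratees)                  # ratee performance (true score)
leniency = rng.normal(0, 0.5, n_raters)              # rater main effects

# ill-structured design: each ratee's two raters are drawn from a common pool,
# so rater sets overlap to varying degrees across ratees
panels = [rng.choice(n_raters, size=2, replace=False) for _ in range(n_ratees)]
ratings = {(i, r): true[i] + leniency[r] + rng.normal(0, 0.7)
           for i, pair in enumerate(panels) for r in pair}

def column_based_estimate():
    """Common practice: randomly call one rater 'rater 1' and the other 'rater 2',
    correlate the columns, and apply Spearman-Brown for the two-rater average."""
    c1, c2 = [], []
    for i, pair in enumerate(panels):
        r1, r2 = rng.permutation(pair)
        c1.append(ratings[(i, r1)])
        c2.append(ratings[(i, r2)])
    r = np.corrcoef(c1, c2)[0, 1]
    return 2 * r / (1 + r)

estimates = [column_based_estimate() for _ in range(10)]
print(f"estimates shift with the arbitrary column assignment: "
      f"{min(estimates):.2f} to {max(estimates):.2f}")
```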
Second, with regard to the problem of applying classical methods to multifaceted replicates, the
task-rater example presented earlier clearly showed the hazards of blindly applying alpha to repli-
cates of such nature. However, it would be a fallacy to suggest that investigators who adopt classical
methods would actually apply alpha or other classical methods in such a manner. Indeed, early
psychometricians seemed acutely aware of the limitations of the CTT model, and calculating dif-
ferent types of coefficients was the way early psychometricians attempted to deal with the inability
of the CTT model to account for multifaceted replicates. For example, Cronbach (1947) discussed
the coefficient of equivalence and stability (CES), which was calculated by correlating two different
forms of a measure completed by the same respondents on two different occasions (i.e., replicates
defined by form-occasion combinations). Cronbach later realized that emergence of the G-theory
score model in the 1960s eliminated the need to “mix and match” pairs of replicates like this and
provided a generalized solution that applied regardless of whether one was dealing with single-
faceted or multifaceted replicates and regardless of whether one was dealing with crossed or non-
crossed designs (Cronbach & Shavelson, 2004).
Although the tradition of using coefficients such as CES to deal with multifaceted replicates
has faded in psychometrics, it has continued to characterize organizational research and practice,
because we have continued to frame problems of reliability in a way that, for better or worse,
resembles the psychometric literature of the 1940s. For example, Schmidt and his colleagues have
demonstrated how, in the context of fully crossed designs, one can calibrate different sources of
error in scores (e.g., error arising from inconsistencies across items, occasions, raters, etc.) through
the addition and subtraction of Pearson correlations applied to different types of replicates (Schmidt
et al., 2000). Indeed, for fully crossed designs, Schmidt and others illustrated how one can arrive
at estimates for at least some of the variance components estimable based on the random-effects
model underlying G-theory (Schmidt, Le, & Ilies, 2003; Le, Schmidt, & Putka, 2009). However,
it is important to note that the calibration methods based on the classical coefficients alluded to
above will not be able to estimate all components of variance that a given measurement design
may support estimating—even if the design is fully crossed. For example, such methods cannot
be used to estimate the unique contribution of facet main effects (e.g., rater main effects, question
main effects) or interactions among facets (e.g., question-rater effects). Lacking this flexibility is
unfortunate, particularly if one is interested in (a) comparing scores to standards (e.g., cut-off score)
rather than simply making relative comparisons among individuals or (b) simply gaining a more
comprehensive understanding of the sources of variance underlying scores. Remember that the
CTT score model that gave rise to the classical coefficients discussed above was never designed to
account for main effects of measurement facets, largely because they were assumed not to exist (e.g.,
recall parallel measures have equal means) and because they were not of interest in the problems
that Spearman and other early psychometric researchers concerned themselves with (Cronbach &
Shavelson, 2004).
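To illustrate the general logic of this calibration approach, the sketch below is ours (based on our reading of the cited work, with hypothetical coefficient values): differences among classical coefficients computed on a fully crossed forms-by-occasions design can be used to apportion observed variance among several error sources.

```python
# Hedged sketch of the calibration arithmetic for a fully crossed forms-by-occasions design.
# The coefficient values below are hypothetical.
ce  = 0.85   # coefficient of equivalence (different forms, same occasion)
cs  = 0.80   # coefficient of stability (same form, different occasions)
ces = 0.75   # coefficient of equivalence and stability (different forms, different occasions)

# Each coefficient treats a different mix of effects as "true," so simple additions and
# subtractions isolate the proportion of observed variance due to each error source.
transient_error       = ce - ces              # consistent within an occasion only
specific_factor_error = cs - ces              # consistent within a form only
random_response_error = 1 - ce - cs + ces     # consistent with neither form nor occasion
true_score_proportion = ces

print(transient_error, specific_factor_error, random_response_error, true_score_proportion)
```

Note, however, that no such arithmetic recovers facet main effects or facet interactions, which is the limitation discussed in this paragraph.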

Random-Effects Model Tradition


If one has a measurement procedure involving multifaceted replicates, or the design that under-
lies the procedure is something other than fully crossed, a natural choice for estimating reliabil-
ity is based on variance components generated by fitting a random-effects model to one’s data
(Jackson & Brashers, 1994; Searle et al., 1992). The modern random-effects model has its roots
in Fisher's early work on the ANOVA model and ICCs (Fisher, 1925). Work by Hoyt
(1941) and Ebel (1951) provided early examples of using the ANOVA framework for estimating
reliability for single-faceted replicates. Of particular note was Ebel’s (1951) work on ratings in
which he dealt with crossed and nested measurement designs. This early work branched in two
directions, one that manifested itself in today’s literature on ICCs (e.g., McGraw & Wong, 1996;
Shrout & Fleiss, 1979), and the other developed into G-theory (Cronbach et al., 1972). Although
rarely acknowledged in the ICC literature on reliability estimation, G-theory encompasses that
literature. ICCs and reliability coefficients produced under G-theory (i.e., G-coefficients) are
nothing more than ratios of variance components; for example, σ2T/(σ2T + σ2E). G-theory simply
acknowledges that these ICCs can take on many more forms than those discussed by McGraw
and Wong (1996) and Shrout and Fleiss (1979), and, per our earlier discussion on G-theory,
offers a comprehensive framework for constructing a reliability estimate that is appropriate
given one’s situation.
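As a minimal numerical illustration of this point (our example, with hypothetical variance components for a persons-by-raters design), both the ICC forms discussed in that literature and G-coefficients reduce to ratios of variance components:

```python
# Minimal sketch: G-coefficients (and ICCs) as ratios of variance components.
# Hypothetical components for a fully crossed persons x raters (p x r) design.
var_p, var_r, var_pr_e = 0.50, 0.10, 0.40   # person, rater, and person-x-rater/residual
n_r = 2                                     # number of raters averaged per person

# Relative (generalizability) coefficient: rater main effects do not alter rank order.
g_relative = var_p / (var_p + var_pr_e / n_r)

# Absolute (dependability) coefficient: rater main effects count as error, which matters
# when scores are compared with a fixed standard such as a cutoff.
g_absolute = var_p / (var_p + (var_r + var_pr_e) / n_r)

print(round(g_relative, 3), round(g_absolute, 3))
```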
As alluded to in our earlier treatment of G-theory, when it was originally developed, the primary
approach of estimating variance components that contributed to σ2T and σ2E was the random-effects
ANOVA model. This same approach to estimating variance components underlies the modern
literature on ICCs (e.g., McGraw & Wong, 1996). Unfortunately, estimating variance components
using ANOVA-based procedures can be an arduous process that until recently and without highly
specialized software involved numerous manipulations of the sums of squares resulting from
ANOVA tables (e.g., Cronbach et al., 1972; Shavelson & Webb, 1991). Relative to calculating
coefficients arising from the classical tradition, the difference in simplicity of estimating reliability
could be substantial. Indeed, this may be a large reason why G-theory never gained traction among
organizational researchers. However, since the 1960s there have been several advances in random-
effects models that have made estimation of variance components much simpler and resolved
many problems associated with ANOVA-based estimators of variance components (DeShon, 1995;
Marcoulides, 1990; Searle et al., 1992). Unfortunately, this knowledge has been slow to disseminate
into the psychometric and I-O literature, because many still seem to equate G-theory with ANOVA-
based variance component estimation procedures that characterized G-theory upon its introduction
to the literature.
Procedures for the direct estimation of variance components that underlie all reliability coef-
ficients are now widely available in common statistical packages (e.g., SAS, SPSS) and allow inves-
tigators to estimate variance components with a few clicks of a button. DeShon (2002) and Putka
and McCloy (2008) provided clear examples of the ease with which variance components can
be estimated within SAS and SPSS. As such, modern methods of variance component estima-
tion are far easier to implement than (a) procedures characteristic of the early G-theory literature
and (b) the calibration techniques discussed by Schmidt et al. (2000), which would require an
investigator to engage in a series of manipulations with various types of coefficients arising out
of the classical tradition. In addition to offering parsimony, modern methods of variance compo-
nent estimation have another key advantage: they can readily deal with missing data and unbal-
anced designs characteristic of organizational research (DeShon, 1995; Greguras & Robie, 1998;
Marcoulides, 1990; Putka et al., 2008). In contrast, ANOVA-based variance component estima-
tors characteristic of the early G-theory literature are not well equipped to handle such messy
designs. Indeed, when confronted with such designs, advocates of G-theory have often suggested
discarding data to achieve a balanced design for purposes of estimating variance components (e.g.,
Shavelson & Webb, 1991)—with modern methods of variance component estimation, the need
for such drastic steps has subsided. The most notable drawback of modern methods of variance
component estimation—largely based on full or restricted maximum likelihood—is that they can
involve rather substantial memory requirements for large measurement designs (Bell, 1985; Littell,
Milliken, Stroup, & Wolfinger, 1996). In some cases, such requirements may outstrip the memory
that Windows-based desktop computers can currently allocate to programs for estimating variance
components (e.g., SAS and SPSS).
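Although the examples above reference SAS and SPSS, the same direct estimation is possible in open-source tools. The sketch below is ours, uses simulated data, and assumes that statsmodels' MixedLM can accommodate crossed person and rater effects through a variance-components formula with a single dummy grouping variable; the assumed ordering of the estimated components follows the order of the vc_formula entries.

```python
# Hedged sketch (simulated data): REML estimation of variance components for a fully
# crossed persons x raters design, assuming statsmodels' MixedLM supports this usage.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_persons, n_raters = 60, 4
person_eff = rng.normal(0, np.sqrt(0.5), n_persons)   # sigma^2_person = .50
rater_eff  = rng.normal(0, np.sqrt(0.1), n_raters)    # sigma^2_rater  = .10

rows = [(p, r, 3 + person_eff[p] + rater_eff[r] + rng.normal(0, np.sqrt(0.4)))
        for p in range(n_persons) for r in range(n_raters)]
df = pd.DataFrame(rows, columns=["person", "rater", "score"])
df["g"] = 1   # single dummy group so person and rater enter as crossed random effects

model = sm.MixedLM.from_formula(
    "score ~ 1", data=df, groups="g", re_formula="0",
    vc_formula={"person": "0 + C(person)", "rater": "0 + C(rater)"})
fit = model.fit(reml=True)

var_person, var_rater = fit.vcomp   # assumed to follow the vc_formula order given above
var_residual = fit.scale            # person x rater interaction plus residual error
print(var_person, var_rater, var_residual)
```

These components can then be assembled into relative or absolute coefficients exactly as in the earlier ratio sketch.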
The strengths of reliability estimation methods based on the random-effects model tradition
relative to the classical tradition are substantial. First, they fully encompass classical methods in
that they can be used to estimate reliability for measurement procedures involving single-faceted
replicates that are fully crossed with one’s object of measurement. Second, unlike classical meth-
ods, they can easily be used to formulate reliability estimates for measurement procedures involving
multifaceted replicates in which the facets are crossed, nested, or any combination thereof. Third,
the random-effects tradition provides investigators not only with coefficients, but also the variance
components that underlie them. As Cronbach and Shavelson (2004) stated, “Coefficients (reliability)
are a crude device that do not bring to the surface many subtleties implied by variance components”
(p. 394). Variance components allow researchers to get a much finer appreciation of what com-
prises error than simply having one omnibus estimate of error. Readers interested in learning more
about formulation of reliability estimates via variance components estimated by random-effects
models—or more generally, G-theory—are referred to DeShon (2002), Haertel (2006), Putka et al.
(2008), and Shavelson and Webb (1991). For a more thorough technical presentation, one should
consult Brennan (2001b).

Confirmatory Factor Analytic Tradition


Despite the fact that G-theory is often espoused as a conceptual centerpiece of modern psycho-
metrics (along with IRT), it is important to separate the conceptual perspective G-theory offers
on reliability from the estimation methods (random-effects models) it prescribes. Such a distinc-
tion is important because although the conceptual perspective offered by G-theory can serve
as a parsimonious way to frame the problem of building a reliability coefficient appropriate for
one’s situation (regardless of whether one uses classical methods, random-effects methods, or
CFA methods to derive estimates of such coefficients), the random-effects model that undergirds
G-theory and classical methods of estimating reliability share a key drawback. Specifically,
they offer no clear mechanism for (a) testing or dealing with violations of CTT and G-theory
measurement model assumptions and (b) specifying or testing alternative factorial composi-
tions of true score—both of which have fundamental implications for the interpretation of reli-
ability estimates.18 It is in this regard that CFA approaches to estimating reliability are strong
(McDonald, 1999).
Unlike reliability estimation approaches born out of the classical and random-effects traditions,
CFA-based approaches force investigators to be specific about the substantive nature of the latent
structure underlying their replicates (indicators, in CFA terms). For example, CFA forces them to
face questions such as:

• Is the covariance shared among replicates (i.e., true score variance from the perspective of
classical and random-effects approaches) accounted for by a single latent true score factor
or multiple latent factors?
• Do indicators of the latent true score factor(s) load equally on that/those factor(s) (i.e., are
they at least essentially tau-equivalent) or is there heterogeneity in factor loadings (i.e., sug-
gesting they are not at least essentially tau-equivalent)?
• What proportion of true score variance (as defined in CTT and G-theory) reflects the
effects of a single latent factor, as opposed to residual covariances?

Although such questions have implications for reliability estimates arising from the classical and
random-effect traditions, neither of these traditions has a built-in mechanism for addressing them.
In essence, they ascribe all shared covariance among replicates to a latent entity (e.g., true score)
regardless of whether it stems from a single factor or multiple factors. Thus, in some ways CFA can
be seen as a way of clarifying the factorial composition of true score variance as conceived by CTT
and G-theory measurement models. One may argue that such clarification is more an issue of valid-
ity rather than reliability (e.g., Schmidt et al., 2000); however, as we discuss below, the dimensional-
ity of the focal construct of interest has implications for the accuracy of reliability estimates based
on the classical and random-effects traditions (Lee & Frisbie, 1999; Rae, 2007; Rogers, Schmitt, &
Mullins, 2002).
The CFA tradition of reliability estimation arose out of Joreskog’s (1971) work on the notion
of congeneric tests discussed earlier. To illustrate, consider a situation in which we administer a
10-item measure of agreeableness to a sample of job applicants. In this case, our replicates are
single-faceted and defined in terms of items, and those items are fully crossed with our objects of
measurement—applicants. From the CFA perspective, we might view the replicates as indicators of
a latent factor representing true score, and then fit a model to the data such that the variance of the
latent factor is set to one, and the factor loadings and unique variances are freely estimated. On the
basis of such a model, the estimated reliability of the sum of the k replicates can be obtained via
\omega = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^2}{\left(\sum_{i=1}^{k} \lambda_i\right)^2 + \sum_{i=1}^{k} \theta_{ii}} = \frac{\hat{\sigma}_T^2}{\hat{\sigma}_T^2 + \hat{\sigma}_E^2}          (2.7)

where λi represents the estimated factor loading for the ith of k replicates, and θii represents the esti-
mated unique variance for the ith replicate (McDonald, 1999; Reuterberg & Gustafsson, 1992).19

18 One potential caveat to this regards the fairly recent ability of SAS and SPSS to fit random-effects models that allow for heterogeneity in variance component estimates (e.g., Littell et al., 1996; SPSS Inc., 2005). Such capabilities might be leveraged to test parallelism assumptions underlying the CTT and G-theory score models.
19 McDonald (1999) refers to this coefficient as "omega" (p. 89).

As
with the application of classical and random-effects approaches to reliability, the substantive mean-
ing of σ2T and σ2E based on this formulation will differ depending on the type of replicates to which
they are applied (e.g., items, raters, occasions, tasks, etc.). Thus, if applied to replicates defined by
items, the estimate of σ2T provided by the squared sum of loadings will reflect consistency among
items (attributable to the latent true score factor). If applied to replicates defined by raters, the esti-
mate of σ2T provided by the squared sum of loadings will reflect consistency among raters (again,
attributable to the latent true score factor).
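As a small numerical illustration of Equation 2.7 (our example; the loadings and unique variances below are hypothetical values of the sort a fitted single-factor model might return), omega is simply the squared sum of loadings divided by that quantity plus the summed unique variances:

```python
# Minimal sketch of Equation 2.7: composite reliability (omega) from a fitted
# single-factor model's loadings and unique variances (hypothetical values).
import numpy as np

loadings = np.array([0.70, 0.65, 0.60, 0.55, 0.50])   # lambda_i, factor variance fixed at 1
uniques  = np.array([0.51, 0.58, 0.64, 0.70, 0.75])   # theta_ii

true_var  = loadings.sum() ** 2     # (sum of loadings)^2 serves as sigma^2_T
error_var = uniques.sum()           # sum of unique variances serves as sigma^2_E
omega = true_var / (true_var + error_var)
print(round(omega, 3))
```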
A key benefit of the CFA approach described above is it will allow one to impose constraints
on parameter estimates (e.g., the λs and θs) that allow one to test various assumptions underlying
the CTT and G-theory score models (see Joreskog & Sorbom, 2001, pp. 124–128; Reuterberg &
Gustafsson, 1992; Werts, Linn, & Joreskog, 1974). If replicates are strictly parallel (an assumption
underlying the Spearman-Brown prophecy; Feldt & Brennan, 1989), they should have equal factor
loadings (i.e., λ1 = λ2 = λk) and equal unique variances (i.e., θ11= θ22 = θkk). If replicates are tau-
equivalent or essentially tau-equivalent (an assumption underlying alpha and coefficients based on
the random-effects tradition), they should have equal factor loadings but their unique variances
can differ. To the extent that factor loadings vary across replicates (i.e., the replicates are not at
least essentially tau-equivalent), most reliability estimates based out of the classical and random-
effects tradition (e.g., alpha) will tend to be slightly downwardly biased (Novick & Lewis, 1967).20
Nevertheless, this common claim is based on the premise that the replicates on which alpha is
estimated are experimentally independent—from a CFA perspective this would imply there are no
unmodeled sources of covariance among replicates after accounting for the latent true score factor
(Komaroff, 1997; Raykov, 2001a; Zimmerman, Zumbo, & LaLonde, 1993). In light of the fact
that many constructs of interest to organizational researchers are heterogeneous (e.g., situational
judgment) or clearly multidimensional (e.g., job performance), application of the formula shown in
Equation 2.7 would be questionable because it implies that a single common factor accounts for the
covariance among replicates, which in practice may rarely be true.
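The following sketch (ours, with hypothetical parameter values and uncorrelated errors) illustrates the point about tau-equivalence: when loadings are unequal, alpha computed on the model-implied covariance matrix falls slightly below omega.

```python
# Sketch: comparing alpha with omega when loadings are unequal (congeneric items,
# uncorrelated errors). Parameter values are hypothetical, for illustration only.
import numpy as np

loadings = np.array([0.90, 0.70, 0.50, 0.30])
uniques  = np.array([0.19, 0.51, 0.75, 0.91])

# Model-implied covariance matrix: Sigma = lambda lambda' + diag(theta)
sigma = np.outer(loadings, loadings) + np.diag(uniques)

k = len(loadings)
alpha = (k / (k - 1)) * (1 - np.trace(sigma) / sigma.sum())
omega = loadings.sum() ** 2 / sigma.sum()

# With uncorrelated errors and unequal loadings, alpha is slightly below omega,
# consistent with the downward bias noted in the text (Novick & Lewis, 1967).
print(round(alpha, 3), round(omega, 3))
```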
The observation above brings us to a critical difference between the CFA-based formulation of
reliability noted in Equation 2.7 and those based on the classical and random-effects traditions—
the former specifies a single latent factor as the sole source of covariance among replicates, and
as such only variance in replicates attributable to that factor is treated as true score variance.
Recall from the operational definition of true score offered earlier and the perspective on true
score offered by the CTT and G-theory score models that true score reflects all sources of consis-
tency across replicates. As Ghiselli (1964) noted, “The fact that a single term … has been used to
describe the amount of the trait an individual possesses should not be taken to imply that individual
differences in scores on a given test are determined by a single factor” (p. 220). The implications
of this are that whereas the CFA formulation above ignores any covariance among replicates that
is left over after extracting a first latent factor, classical coefficients such as alpha and coefficients
derived from the random-effects tradition lump such covariance into the estimate of true score
variance (Bost, 1995; Komaroff, 1997; Maxwell, 1968; Raykov, 2001a; Smith & Luecht, 1992;
Zimmerman et al., 1993).
This characteristic of the CFA approach offered above presents investigators with a dilemma:
Should residual covariance observed when adopting such an approach be treated as (a) error variance
(σ2E) or (b) a source of true score variance (σ2T)? In estimates of reliability based on the classical or
random-effects traditions, one does not have much of an option. True score variance as estimated
under each of those traditions reflects any source of consistency in scores, regardless of whether it

20 This downward bias arises from the fact that most reliability estimates based on these traditions rely on the average covariance among replicates to make inferences regarding the magnitude of true score variance for each replicate. To the extent that replicates are not essentially tau-equivalent, this average covariance will tend to underestimate true score variance for each replicate (a component of true score variance of the composite of replicates), thus leading to a slight underestimation of reliability when all other assumptions are met (e.g., uncorrelated errors among replicates) (Feldt & Brennan, 1989; Raykov, 2001a).
stems from a first common factor, or what, in CFA-terms, would be viewed as residual covariance
or correlated uniquenesses (Komaroff, 1997; Scullen, 1999). However, with CFA, researchers have
the flexibility to distinguish between true score variance that (a) arises from the first common ­factor,
and (b) reflects residual covariance among replicates after extracting the first factor (Raykov, 1998;
2001b). Although in theory having this flexibility is valuable because it allows one insight into the
substance of true score variance, it also has practical benefits in that it can allow investigators to
tailor a reliability coefficient to their situation depending on the nature of the construct they are
assessing. To illustrate this flexibility, we offer three examples below that selection researchers and
practitioners may encounter.
First, let us say one (a) designs a measurement procedure to assess a unidimensional construct,
(b) uses a fully crossed measurement design comprising replicates defined by a single facet of
measurement (e.g., items) to assess it, (c) fits the single-factor CFA model described above to the
resulting data, and (d) finds evidence of residual covariance. Assuming there is no pattern to the
residual covariance that would suggest the presence of additional substantively meaningful fac-
tors, the investigator would likely desire to treat the residual covariance as a source of error variance
(σ2E) rather than a source of true score variance (σ2T).21 Fortunately, such residual covariance can
be easily incorporated into Equation 2.7 by replacing the term corresponding to the sum of unique
variances with a term that reflects the sum of unique variances and residual covariances or by
simply replacing the denominator with observed score variance (Komaroff, 1997; Raykov, 2001a).
If one were to calculate alpha on these same data, or fit a simple random-effects model to estimate
σ2T, such residual covariance would be reflected in σ2T as opposed to σ2E and thus would produce a
reliability estimate that is higher than the modified omega-coefficient described here when the sum
of the residual covariances are positive (lower when the sum is negative) (Komaroff, 1997). It is
important to note that the comparison made here between alpha and modified omega is based on the
assumption that the replicates in the analysis are at least essentially tau-equivalent. If the replicates
are not at least essentially tau-equivalent, then this would serve to lower the estimate of alpha, thus
either partially or completely offsetting any positive bias created by the presence of residual covari-
ance (Raykov, 2001a).
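A numerical sketch of this first example (ours; all values are hypothetical) shows the consequence of the two treatments: when the items are essentially tau-equivalent and one pair of them shares positive residual covariance, alpha exceeds the modified omega because alpha folds that covariance into true score variance.

```python
# Sketch of the first example: a unidimensional measure where one pair of items shows
# residual covariance that the investigator treats as error. Values are hypothetical.
import numpy as np

k = 5
loadings = np.full(k, 0.60)          # essentially tau-equivalent items
uniques  = np.full(k, 0.64)

resid_cov = np.zeros((k, k))
resid_cov[0, 1] = resid_cov[1, 0] = 0.15   # unmodeled covariance between items 1 and 2

# Implied observed covariance matrix includes the residual covariance.
sigma = np.outer(loadings, loadings) + np.diag(uniques) + resid_cov

true_var = loadings.sum() ** 2
omega_modified = true_var / sigma.sum()     # residual covariance counted as error (sigma^2_E)

alpha = (k / (k - 1)) * (1 - np.trace(sigma) / sigma.sum())

# With tau-equivalent items and positive residual covariance, alpha exceeds the
# modified omega because alpha treats that covariance as true score variance.
print(round(omega_modified, 3), round(alpha, 3))
```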
As another example, let us say one (a) designs a measurement procedure to assess a relatively
heterogeneous, but ill-specified construct (e.g., situational judgment) and again, (b) uses a fully
crossed measurement design comprising replicates defined by a single facet of measurement (e.g.,
scenarios) to assess it, (c) fits the single-factor CFA model described above to the resulting data,
and (d) finds evidence of residual covariance. In this case, the investigator may choose to treat
the residual covariance as a source of true score variance (σ2T) rather than error variance (σ2E).
Unlike the first example, given the heterogeneous nature of the situational judgment construct,
the investigator would not likely expect the covariance among scenarios to be accounted for by
a single factor. For example, the investigator may hypothesize that the scenarios comprising the
assessment vary in the degree to which various combinations of individual differences (e.g.,
interpersonal skill, conscientiousness, and general mental ability) are required to successfully
resolve them. As such, scores on scenarios may differentially covary depending on the simi-
larity of the individual difference profile required to resolve them. Under such conditions, one
would need more than a single factor to account for covariation among replicates, but given the
ill-structured nature of the situational judgment construct, the investigator may not find strong evi-
dence for a simple factor structure. As was the case with treating residual covariances as σ2E in
the previous example, Equation 2.7 can easily be modified to treat residual covariances as σ2T by

21 Even if there is a pattern to the residual covariances, the investigator might still wish to treat them as contributing to σ2E if they reflect an artifact of the particular measurement situation (e.g., Green & Hershberger, 2000). This raises an important point: The examples offered here are for illustration; they are not prescriptions for future research and practice. Ultimately it will be the individual investigator who decides how to treat residual covariance given the characteristics of the measurement situation he or she faces.
adding a term to the squared sum of loadings that reflects the sum of all residual covariances,
specifically

\omega' = \frac{\left(\sum_{i=1}^{k} \lambda_i\right)^2 + \sum_{i=1}^{k} \sum_{j \neq i}^{k} \mathrm{Cov}(e_i, e_j)}{\left(\sum_{i=1}^{k} \lambda_i\right)^2 + \sum_{i=1}^{k} \sum_{j \neq i}^{k} \mathrm{Cov}(e_i, e_j) + \sum_{i=1}^{k} \theta_{ii}} = \frac{\hat{\sigma}_T^2}{\hat{\sigma}_T^2 + \hat{\sigma}_E^2}          (2.8)

Note that treating residual covariance as σ2T represents a departure from how the CFA literature on
reliability estimation has generally espoused treating such covariance when estimating reliability
(e.g., Komaroff, 1997; Raykov, 2001b). Nevertheless, the perspective offered by these authors is
largely based on the assumption that the investigator is assessing a unidimensional construct. If
one were to calculate alpha on such replicates, or fit a random-effects model to estimate σ2T, the
covariance among residuals noted above would contribute to σ2T as opposed to σ2E and as such
would produce a coefficient similar in magnitude to what is provided by Equation 2.8 (Komaroff,
1997).
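Continuing the same hypothetical numbers used in the earlier sketch (again, ours and purely illustrative), Equation 2.8 adds the residual covariance to the numerator as well, and the resulting coefficient is indeed close to alpha:

```python
# Sketch of Equation 2.8 using the same hypothetical values as the earlier sketch,
# but now treating the residual covariance as true score variance.
import numpy as np

k = 5
loadings = np.full(k, 0.60)
uniques = np.full(k, 0.64)
resid_cov = np.zeros((k, k))
resid_cov[0, 1] = resid_cov[1, 0] = 0.15

sigma = np.outer(loadings, loadings) + np.diag(uniques) + resid_cov

omega_prime = (loadings.sum() ** 2 + resid_cov.sum()) / sigma.sum()
alpha = (k / (k - 1)) * (1 - np.trace(sigma) / sigma.sum())
print(round(omega_prime, 3), round(alpha, 3))   # similar in magnitude, as noted in the text
```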
Lastly, and as a third example, let us say one (a) designs a measurement procedure to assess a
multidimensional construct (e.g., job performance), (b) uses a fully crossed measurement design
comprising replicates defined by a single facet of measurement (e.g., items) to assess it, and (c)
samples content for the measure in a way that allows one to distinguish between different dimen-
sions of the construct (e.g., samples items corresponding to multiple job performance dimensions).
In this situation, one might be interested in estimating the reliability of scores on each dimension
of the construct separately, as well as estimating the reliability of a composite score based on the
sum of dimension-level scores (e.g., an overall performance score). To achieve a reliability estimate
for scores on the overall composite, the single-factor CFA model described above would clearly not be
appropriate. Rather, a multifactor model may be fitted in which each factor reflects dimensions of
the construct being targeted by the measure. Indicators would be allowed to load only on those fac-
tors they are designed to reflect, and the reliability and true score variance of the overall composite
score would be a function of factor loadings and factor covariances (Kamata, Turhan, & Darandari,
2003; Raykov, 1998; Raykov & Shrout, 2002). Any residual covariance among indicators associ-
ated with a given factor could be treated as noted in the earlier examples (i.e., treated as σ2T or σ2E)
depending on how the investigator views such covariance in light of the substantive nature of the
target construct and particular measurement situation. Such multifactor models could also be used
to simultaneously generate separate estimates of reliability of scores for each dimension of the con-
struct (Raykov & Shrout, 2002).
Although classical and random-effects traditions do not concern themselves with the factorial
composition of true score covariance as the CFA tradition does, estimation methods arising out of
the former traditions have been developed to deal with reliability estimation for scores produced by mea-
sures that clearly reflect the composite of multiple dimensions. Such methods have typically been
discussed under the guise of (a) reliability estimation for measures stratified on content (e.g., items
comprising the measure were sampled to assess relatively distinct domains such as deductive and
inductive reasoning) or, more generally, (b) reliability estimation for composite scores (Cronbach,
Schoneman, & McKie, 1965; Feldt & Brennan, 1989). Here “composites” do not necessarily refer
to a compilation of items thought to reflect the same construct (i.e., replicates), but rather compila-
tions of measures designed to reflect distinct, yet related constructs (e.g., proficiency with regard to
different requirements for a trade or profession) or different components of a multidimensional con-
struct (e.g., task and contextual performance). Scores produced by such component measures may
differ in their reliability and their observed relation with one another. Sensitivity to such issues is
clearly seen in classical formulas for the reliability of composites such as stratified coefficient alpha
(Cronbach et al., 1965) and Mosier’s (1943) formula for the reliability of a weighted composite.22
In the case of stratified alpha and Mosier’s coefficient, σ2T for the overall composite score reflects
the sum of σ2T for each component of the composite and the sum of covariances between replicates
(e.g., items, raters) comprising different components.23 The fact that covariances between replicates
from different components of the composite contribute to true score variance has a very important
implication—these estimates will likely produce inflated estimates of reliability in cases in which
measures of each component share one or more elements of a facet of measurement (e.g., raters,
occasions) in common.
For example, consider a situation in which one gathers job performance ratings on two dimensions
of performance for each ratee—task performance and contextual performance. Assume that for any
given ratee, the same two raters provided ratings of task performance and contextual performance.
In this case, the measures of task and contextual performance share raters in common and as such
are “linked” (Brennan, 2001b). Thus, the covariation between task and contextual performance
in this example reflects not only covariance between their true scores but also covariance arising
from the fact that they share a common set of raters. Were we to apply stratified alpha or Mosier’s
formula to estimate the reliability of the composite score produced by summing across the two
dimensions (using inter-rater reliability estimates for each component in the aforementioned
formulas), covariance attributable to having a common set of raters would contribute to true score
variance, thus artificially inflating the reliability estimate (assuming we wish to generalize the
measures across raters). Stratified alpha and Mosier’s formula are based on the assumption that
errors of measurement associated with components that comprise the composite are uncorrelated;
to the extent they are positively correlated—a likely case when components share one or more
elements of a facet of measurement in common—the estimates they provide can be substantially
inflated (Rae, 2007). Outside of multivariate G-theory, which is not widely used nor discussed in the
organizational research literature (Brennan, 2001b; Webb & Shavelson, 1981), there appear to be no
practical, straightforward analytic solutions to this situation on the basis of classical and random-
effects estimation traditions.
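For concreteness, the sketch below (ours, with hypothetical values) computes stratified coefficient alpha for a two-component composite; note that the covariance between components is treated entirely as true score variance, which is precisely why shared raters or occasions can inflate the estimate.

```python
# Hedged sketch of stratified coefficient alpha for a composite of two components
# (e.g., task and contextual performance scales). Values are hypothetical.
import numpy as np

component_var   = np.array([4.0, 3.0])    # observed variance of each component score
component_alpha = np.array([0.85, 0.80])  # reliability estimate for each component
cov_between     = 1.5                     # observed covariance between the two components

composite_var = component_var.sum() + 2 * cov_between

# Stratified alpha: error variance for each component is (1 - alpha_j) * var_j;
# covariance between components is treated entirely as true score variance.
error_var = ((1 - component_alpha) * component_var).sum()
stratified_alpha = 1 - error_var / composite_var

# If the two components share raters, part of cov_between may reflect shared rater
# effects rather than true score covariance, inflating this estimate (see text).
print(round(stratified_alpha, 3))
```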

Multifaceted Replicates and Noncrossed Measurement Designs in CFA


In all of the CFA examples offered above, the discussion assumed that the source(s) of extra covaria-
tion among replicates beyond the first factor was due to multidimensionality, or more generally het-
erogeneity in the construct being measured. However, as the example from the previous paragraph
illustrated, such covariation can also arise from the characteristics of one’s measurement design.
For example, such extra covariation can also arise if the replicates that serve as indicators in a CFA
are multifaceted and share a measurement design element (e.g., a rater, an occasion) in common.
This brings us to another critical point regarding the CFA-based approach to reliability estima-
tion discussed above. When Joreskog (1971) originally formulated the congeneric test model upon
which many CFA-based estimates of reliability are grounded, it was based on a set of replicates
defined along a single facet of measurement (e.g., items), and that facet was assumed to be fully
crossed with the objects of measurement (e.g., persons). However, as noted above, when replicates

22 Note Mosier's (1943) formula is equivalent to the formula for stratified coefficient alpha if elements comprising a composite are equally weighted.


23 Note that all else being equal, stratified alpha will tend to be higher (appropriately so) than coefficient alpha applied to the same data if between-component item covariances are lower than within-component item covariances—likely a common occurrence in practice for measures of multidimensional constructs (Haertel, 2006; Schmitt, 1996). In both cases the denominators of these coefficients are the same (observed variance); what changes is how true score variance for each component of the composite is estimated (these in turn are part of what contributes to σ2T for the overall composite). For stratified alpha, σ2T for any given component is a function of the average covariance among items within that component; for alpha, σ2T for any given component is a function of the average covariance among all items, regardless of component. As such, if between-component item covariances are lower than within-component item covariances, σ2T for any given component will be lower if alpha is applied to the data rather than stratified alpha; in turn, the estimate of σ2T for the overall composite produced by alpha will also be lower.
are multifaceted, those replicates that share a level of a given facet in common (e.g., replicates that
share a common rater or occasion of measurement) will covary above and beyond any substantive
factors (e.g., interpersonal skill, job performance) that underlie the replicates (DeShon, 1998).
There are numerous ways to account for multifaceted replicates within the CFA framework;
however, only recently have they begun to find their way into the literature (e.g., DeShon, 1998;
Green, 2003; Le et al., 2008; Marsh & Grayson, 1994; Marcoulides, 1996). Many of the methods
being espoused for handling multifaceted replicates in the context of CFA have their roots in the
literature on modeling of multitrait-multimethod data (e.g., Kenny & Kashy, 1992; Widaman, 1985).
For example, in the context of the interview example offered earlier, we might fit a model that not
only includes a latent factor corresponding to the construct of interest (e.g., interpersonal skill), but
also specify latent factors that correspond to different raters or interview questions (e.g., all indica-
tors associated with rater 1 would load on a “Rater 1” factor, all indictors associated with rater 2
would load on a “Rater 2” factor). Alternatively, one might allow uniqueness for those indicators
that share a rater or question in common to covary and constrain those that do not to zero (e.g., Lee,
Dunbar, & Frisbie, 2001; Scullen, 1999). By fitting such models, one can derive estimates of vari-
ance components associated with various elements of one’s measurement design (e.g., person-rater
effects, person-question effects) that resemble what is achieved by fitting a random-effects model
to the data described earlier (e.g., DeShon, 2002; Le et al., 2008; Marcoulides, 1996; Scullen et al.,
2000). As illustrated earlier in our discussion of G-theory, these variance components can then be
used to construct reliability coefficients appropriate for one’s situation.
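The sketch below (ours; the loadings are hypothetical values of the kind such a model might return) shows how parameter estimates from a trait-plus-rater-factor model can be assembled into variance components and a reliability coefficient for the composite of all indicators, treating rater (method) variance as error when one wishes to generalize across raters.

```python
# Hedged sketch: variance components implied by a (hypothetical) CFA solution in which
# six indicators (3 questions x 2 raters) load on an interpersonal-skill trait factor
# and on a method factor for their rater (cf. the MTMM-style models cited in the text).
import numpy as np

trait_loadings = np.array([0.7, 0.6, 0.6, 0.7, 0.6, 0.6])   # person (true score) effects
rater_factor   = np.array([0, 0, 0, 1, 1, 1])               # rater factor for each indicator
rater_loadings = np.array([0.4, 0.4, 0.4, 0.5, 0.5, 0.5])   # person-x-rater (method) effects
uniques        = np.array([0.3, 0.4, 0.4, 0.3, 0.4, 0.4])

# Variance of the unit-weighted composite of all six indicators (factor variances = 1):
trait_var  = trait_loadings.sum() ** 2                            # shared across all indicators
method_var = sum(rater_loadings[rater_factor == r].sum() ** 2     # shared only within a rater
                 for r in np.unique(rater_factor))
error_var  = uniques.sum()

total_var = trait_var + method_var + error_var
# If we wish to generalize across raters, rater method variance counts as error:
reliability = trait_var / total_var
print(round(reliability, 3))
```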
Unfortunately, as was the case with using classical reliability coefficients to calibrate various
sources of error in scores (e.g., Schmidt et al., 2000), CFA-based approaches to variance compo-
nent estimation have a few drawbacks. First, they do not easily lend themselves to estimating variance
attributable to (a) facet main effects (e.g., rater main effects, question main effects) or (b) interac-
tions among measurement facets (e.g., rater-question interaction effects). Although it is possible to
estimate the effects above, this would require calculating covariances among persons (i.e., persons
as columns/variables) across facets of measurement of interest (e.g., raters, question) as opposed
to the typical calculation of covariances among question-rater pairs (i.e., question-rater pairs are
treated as columns/variables) across objects of measurement (e.g., persons).24 Furthermore, it is not
clear how CFA could be leveraged to deal with designs that are more ill-structured in nature (e.g.,
Putka et al., 2008). For example, recall the example earlier where we had performance ratings for
a sample of incumbents that were rated by multiple raters, and the raters that rated each incumbent
varied in their degree of overlap. When confronted with such designs in past applications of
CFA, organizational researchers have generally resorted to random assignment of raters to columns
for each ratee (e.g., Mount, Judge, Scullen, Sytsma, & Hezlett, 1998; Scullen et al., 2000; Van
Iddekinge, Raymark, Eidson, & Attenweiler, 2004). As noted earlier, the drawback of doing this is
that it can produce results that (a) vary simply depending on how raters are assigned for each ratee
and (b) fail to account for the nonindependence of residuals for incumbents that share a rater in
common, which serves to downwardly bias estimates of true score variance (Kenny & Judd, 1986;
Putka et al., 2008).
Lastly, for better or worse, the literature on CFA offers myriad ways to parameterize a model to
arrive at variance component estimates, each of which have various strengths and weaknesses that
are still in the process of being ironed out (e.g., Eid, Lischetzke, Nussbeck, & Trierweiler, 2003;
Lance, Lambert, Gewin, Lievens, & Conway, 2004; Marsh & Grayson, 1994). With the random-
effects model discussed above, the number of alternative parameterizations (at least as currently
implemented in common statistical software such as SAS and SPSS) is quite limited. The difficulty
this creates when using CFA to estimate variance components is determining which parameteriza-
tion is most appropriate in a given situation, because a clear answer has not emerged and will likely
ultimately depend on characteristics of the construct being measured and characteristics of one’s

24 Interested readers are referred to Hocking (1995) and Brennan (2001b, pp. 166–168).
measurement situation (Marsh & Grayson, 1994). This is complicated by the fact that oftentimes
the choice of which parameterization is adopted may be less a matter of substantive considerations
and more a reflection of the parameterization that allowed the software fitting the model to converge
to an admissible solution (Lance et al., 2004). Ultimately, such nuances, coupled with the com-
plexity of CFA-based approaches to variance component estimation, may limit the utility of such
approaches for reliability estimation in general selection research and practice.

Summary: Comparison and Equifinality of Estimation Traditions


On the basis of the examples above, one might ask which tradition best serves the needs of personnel
selection research and practice. Our answer would be that no single tradition currently satisfies all needs.
Table 2.1 summarizes characteristics of the reliability estimation traditions discussed above.
Beyond their simplicity and familiarity, classical approaches do not appear to have much to offer.
Modern random-effects approaches not only address measurement situations classical approaches
were initially designed to handle (e.g., those involving single-faceted replicates and fully crossed
designs), but also those situations that classical approaches were not designed to handle (i.e., pro-
cedures involving multifaceted replicates and/or noncrossed designs). Couple this with the ease
with which variance components can now be estimated using widely available software (e.g., SPSS,
SAS), as well as the consistency of the random-effects model with modern psychometric perspec-
tives on reliability (i.e., G-theory; AERA, APA, & NCME, 1999; Brennan, 2006), and it appears the
random-effects tradition has much to offer. Nevertheless, the classical and random-effects traditions
suffer from two similar drawbacks in that their estimation procedures offer no clear mechanism
for (a) testing or dealing with violations of CTT and G-theory measurement model assumptions on
which their formulations of reliability are based and (b) specifying or testing ­alternative factorial
compositions of true score. The latter drawback can make the interpretation of reliability estimates

TABLE 2.1
Relative Advantages and Disadvantages of Reliability Estimation Traditions

Characteristics of Estimation Tradition | Classical | Random Effects | CFA
Perceived simplicity | Yes | No | No
Widely discussed in organizational literature on reliability estimation | Yes | No | No
Easily implemented with standard statistical software (e.g., SPSS, SAS) | Yes | Yes | No (a)
Direct and simultaneous estimation of variance components underlying σ2T and σ2E | No | Yes | Yes
Straightforward to apply to nested and ill-structured measurement designs confronted in applied organizational research and practice | No | Yes | No
Capacity to isolate and estimate variance attributable to facet main effects and interactions among facets | No | Yes | No (b)
Offers mechanism for testing and dealing with violations of CTT and G-theory measurement model assumptions | No | No (c) | Yes
Offers mechanism for specifying and testing alternative factorial compositions of true score | No | No | Yes

a Potential exception is PROC CALIS within SAS.
b As noted in text, such effects could be estimated by fitting CFA models to covariances calculated across measurement facets (e.g., question, raters, question-rater pairs) as opposed to objects of measurement (e.g., persons).
c SAS and SPSS now offer users the ability to fit "heterogeneous variance" random-effect models, which for some designs can be used to assess various equivalence assumptions underlying the CTT and G-theory measurement models (e.g., Is σ2T for Rater 1 = σ2T for Rater 2?).
difficult because of ambiguity of what constitutes true score, particularly for measures of heteroge-
neous constructs. This is where the CFA tradition can offer an advantage; however, this advantage
does not come freely—its price is added complexity.
For single-faceted replicates that are fully crossed with one’s objects of measurement, CFA
methods are straightforward to apply and clear examples exist (e.g., Brown, 2006; McDonald,
1999). For multifaceted replicates, a systematic set of examples has yet to be provided for inves-
tigators to capitalize on, which is complicated by the fact that the CFA models can be param-
eterized in numerous different ways to arrive at a solution (Marsh & Grayson, 1994). This has
a tendency to restrict such solutions to psychometrically savvy researchers and practitioners.
Moreover for the ill-structured measurement designs discussed by Putka et al. (2008), which are
all too common in selection research involving ratings (e.g., assessment centers, interviews, job
performance), it is not clear how the CFA models would overcome the issues raised. Thus we have
a tradeoff between the ease with which modern random-effects models and software can deal
with multifaceted measurement designs of any sort and the model fitting and testing capabilities
associated with CFA, which can serve to not only check on measurement model assumptions,
but also refine our specification (and understanding) of true score for measures of heterogeneous
constructs.
Although we have treated reliability estimation approaches arising out of classical, random
effects, and CFA traditions separately, it is important to recall how we began this section—all of
these traditions can be used to arrive at the same ratio—σ2T/(σ2T + σ2E). The substantive meaning
of σ2T and σ2E will of course depend on the type of replicates examined, the nature of the measure-
ment procedure, and the construct that one is assessing. Nevertheless, all of these traditions can
potentially be leveraged to arrive at an estimate of this ratio and/or components of it. How they
arrive at those estimates, the assumptions they make in doing so, and the strengths and weaknesses
of the methodologies they use is what differentiates them. In cases in which one has a measurement
procedure comprised of single-faceted replicates or multifaceted replicates in which facets are fully
crossed with one’s objects of measurement and one is interested solely in using scores to make
relative comparisons among objects of measurement (e.g., persons), much literature has accumu-
lated indicating that these traditions can produce very similar results, even in the face of moderate
violation of common tau-equivalence assumptions (e.g., Le et al., 2008; Reuterberg & Gustafsson,
1992).
For example, Brennan (2001b) and Haertel (2006) show how random-effects ANOVA models
may be used to estimate variance components and form reliability coefficients that are identical to
the types of reliability coefficients from the classical tradition (e.g., alpha, coefficients of equivalence
and stability). Marcoulides (1996) demonstrated the equivalence of variance components estimated
based on CFA and random-effects ANOVA models fitted to a multifaceted set of job analysis data.
Le et al. (2008) illustrated how one can arrive at similar variance component estimates using func-
tions of Pearson correlations, random-effects models, and CFA models. Lastly, Brennan (2001b)
and Hocking (1995) demonstrated how it is possible to generate variance component estimates
without even invoking the random-effects or CFA models but simply calculating them as functions
of observed variance and covariances (in some ways, akin to Schmidt et al., 2000). Each of these
works illustrates that under certain conditions, the three traditions discussed above can bring inves-
tigators to similar conclusions. However, as illustrated above, nuances regarding the (a) generaliza-
tions one wishes to make regarding their scores, (b) the intended use of those scores (e.g., relative
comparisons among applicants vs. comparisons of their scores to a fixed cutoff), (c) characteristics
of one’s measurement procedure itself (e.g., nature of its underlying design), and (d) characteristics
of the construct one is attempting to measure (e.g., unidimensional vs. multidimensional, homoge-
neous vs. heterogeneous) make some of these approaches more attractive than others under different
circumstances. Ideally, the well-informed investigator would be in a position to capitalize on the
relative strengths of these traditions when formulating reliability and variance component estimates
of interest given his/her situation.
Emerging Perspectives on Measurement Error


In concluding our treatment of reliability, we would like to revisit a point that we closed our introductory
section with and raise the possibility that our psychometric traditions have created a critical gap in
existing paradigms for framing and studying measurement error. As alluded to in our introductory
section, beginning with Spearman (1904), our field has a curious legacy of dealing with measurement
error from the perspective of reliability. Rather than attempt to explain or model it, tradition has been
to correct observed relations for it (Schmidt & Hunter, 1996).25 This legacy is not without criticism
inside (DeShon, 2002; Murphy & DeShon, 2000) and outside of organizational research (Ng, 1974;
Rozeboom, 1966; Zimmerman & Williams, 1980, 1997). Nevertheless, researchers have countered
that we appear to be without viable alternatives (Schmidt et al., 2000). At risk of being viewed as
nihilistic, we briefly discuss the emergence of such alternatives—namely, refined models of error
that can help explain the error that reliability theories help us quantify.
As noted above, estimating reliability is an exercise in quantifying the consistency (or inconsis-
tency) expected in scores produced by a measurement procedure if it is replicated. These estimates
are based on quantification of general sources of inconsistency arising from nominal elements of
one’s measurement design (e.g., raters sampled, items sampled, etc.). The perspective reliability
offers on error in scores is complemented by the perspective on error offered by validity. One form
of validity evidence is establishing support for the hypothesis that “true score” in investigations
of reliability reflects consistency because of the construct of interest, rather than the construct-
irrelevant attributes that may contribute to consistency across replicates. What arguably falls
through the cracks of our current psychometric paradigms and traditions is a framework for model-
ing the substantive basis of inconsistency in scores. In other words, if validity tends to concern itself
with differentiating between invalid and valid forms of consistency in scores, and reliability simply
concerns itself with quantifying inconsistency in scores, what psychometric paradigm concerns
itself with explaining inconsistency in scores?
One might argue that G-theory provides a paradigm and mechanism for explaining such incon-
sistencies (Murphy & DeShon, 2000), but the mechanism it offers for doing so (the random-effects
model) is limited. For example, G-theory provides a decomposition of variance in observed scores
as a function of nominal facets of one’s measurement design, but does not attempt to explain the
substantive basis of such variance components. Returning to the interview example
we used earlier, fitting a random-effects model to such data may reveal that a sizable portion of
variance in ratings is attributable to rater severity/leniency effects, as opposed to the ease/difficulty
of interview questions, but such an analysis does little to explain what characteristics of raters are
associated with the degree of their severity/leniency; that is, although G-theory offers clues as to where
variation in scores resides, it does not explain potential reasons for that variation.
To further illustrate this point, we draw an analogy to applications of multilevel modeling in I-O
research (Bliese, 2002). In typical applications of multilevel modeling, we first divvy up variance
in an outcome across levels of analysis (e.g., individual-level, group-level)—much like we divvy
up variance in an outcome across measurement facets using random-effects models in G-theory.
Rarely is the focus of multilevel modeling efforts simply to divvy up such variance; rather, its pur-
pose is also to model variance found at each level—the initial decomposition of variance simply
sets the baseline for variance that could potentially be explained. On the other hand, G-theory is
structured around simply divvying the variance up; it stops short of attempting to model it because
its purpose is simply to quantify, not explain it. Indeed, Cronbach et al. (1972) acknowledged this:
“Suppose, for example, it is found that in peer ratings there is a substantial subject-rater interaction
component … the finding should impel the investigator to ask what rater characteristics contribute
to [explain variance attributable to] the interaction” (p. 382; bracketed text added for clarity).

25 Per our discussion in earlier sections, we attribute this in part to the tendency of CTT-based treatments of reliability to equate "random" error with "unpredictable" error.


The above example might suggest that hierarchical linear models (HLMs) underlying multilevel
modeling efforts could provide a framework for modeling inconsistencies in scores. However, software
for fitting such models has historically been based on algorithms that have been optimized for deal-
ing with nested data structures (e.g., Bryk & Raudenbush, 1992), rather than the myriad of data struc-
tures that characterize measurement designs faced in research and practice. To model inconsistencies
in scores arising from myriad types of measurement designs, a more general modeling framework is
required. Along these lines, efforts are emerging on several fronts. For example, Putka, Ingerick, and
McCloy (2008a) elaborate on how linear mixed models (LMMs) offer a powerful framework for not
only partitioning error in ratings (per G-theory), but also modeling the substantive determinants of
components of error regardless of the measurement design underlying those ratings. Applied empiri-
cal examples of attempts to model inconsistency on the basis of LMMs, and more limited versions
of them are beginning to emerge (e.g., Hox, 1994; LaHuis & Avis, 2007; Laenen, Vangeneugden,
Geys, & Molenberghs, 2006; Putka et al., 2008b; Shrout, 1993; Van Iddekinge, Putka, Raymark, &
Eidson, 2005). Taking a slightly different tack, Englehard (1994) and Patz, Junker, Johnson, and
Mariano (2002) have described IRT-based approaches to modeling inconsistencies in ratings. The
theme shared by all of these efforts is that they are attempting to model inconsistencies in measures
through capitalizing on modern analytic methods that post-date the developments of both CTT and
G-theory, but are not easily placed in the psychometric curricula because such investigations may be
beyond the purview of traditional investigations of reliability and validity. Nevertheless, these may
represent the emergence of a refined psychometric paradigm that may prove particularly useful for
understanding and improving our most error-prone measurement procedures.
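To convey the basic idea, the sketch below is ours (simulated data, not an example from the studies cited); it assumes statsmodels' MixedLM and a hypothetical rater-level covariate, rater_experience, and compares the rater variance component before and after a substantive predictor of rater leniency enters the model.

```python
# Hedged sketch: using a linear mixed model not only to partition rating variance
# (per G-theory) but also to model a determinant of rater severity/leniency effects.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(2)
n_ratees, n_raters = 50, 8
experience = rng.normal(0, 1, n_raters)                       # hypothetical rater covariate
rater_eff = 0.5 * experience + rng.normal(0, 0.3, n_raters)   # leniency partly driven by experience
ratee_eff = rng.normal(0, 0.7, n_ratees)

rows = [(p, r, experience[r], 3 + ratee_eff[p] + rater_eff[r] + rng.normal(0, 0.6))
        for p in range(n_ratees) for r in range(n_raters)]
df = pd.DataFrame(rows, columns=["ratee", "rater", "rater_experience", "rating"])
df["g"] = 1   # dummy group so ratee and rater enter as crossed variance components

def fit_vc(formula):
    m = sm.MixedLM.from_formula(formula, data=df, groups="g", re_formula="0",
                                vc_formula={"ratee": "0 + C(ratee)", "rater": "0 + C(rater)"})
    return m.fit(reml=True)

base = fit_vc("rating ~ 1")                      # partitions variance, as in G-theory
expl = fit_vc("rating ~ rater_experience")       # models a determinant of rater effects

# The rater component should shrink once rater_experience enters the model, illustrating
# "explaining" rather than merely quantifying a source of error in ratings.
print("Variance components (ratee, rater) without covariate:", base.vcomp)
print("Variance components (ratee, rater) with covariate:   ", expl.vcomp)
```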

Closing Thoughts on Reliability


Perspectives on reliability and methods for its estimation have evolved greatly over the last 50 years,
but these perspectives and methods have yet to be well integrated (Brennan, 2006). One potential
reason for this lack of integration may stem from the historical disconnect between experimen-
tal and correlation research traditions (Cronbach, 1957), which continues to manifest itself today,
particularly in our approaches to reliability estimation (Cronbach & Shavelson, 2004). Another
potential reason for this lack of integration may stem from the recognized decline in the graduate
instruction of statistics and measurement over the past 30 years in psychology departments (Aiken
et al., 2008; Merenda, 2007). For example, in reviewing results of their study of doctoral training
in statistics, measurement, and methodology in PhD psychology programs across North America,
Aiken et al. (2008) lament,

We find it deplorable … the measurement requirement occupies a median of only 4.5 weeks in the PhD
curriculum in psychology. A substantial fraction of programs offered no training in test theory or test
construction; only 46% of programs judge that the bulk of their graduates could assess the reliability
of their own measures (p. 43).

Under such conditions, it is nearly impossible for faculty to comprehensively integrate and dis-
cuss implications of developments in the areas above into classroom discussions of psychometrics—
almost out of necessity, we limit ourselves to basic treatment of age-old perspectives on measurement.
Couple this observation with the explosion of new statistical software and availability of new esti-
mation methods since the mid-1980s, and it creates a situation where staying psychometrically
current can be a challenge for those in academe, as well as those in practice. Of course, also com-
plicating the trends above is the course of normal science—which leads us to pursue incremental
research that serves to refine measurement models and the perspectives on reliability they offer but
does not emphasize integration of models and perspectives (Kuhn, 1962). Such a lack of integra-
tion among psychometric models and perspectives is unfortunate because it can serve as a source
of parsimony, which is critical when one has limited time to devote to such topics in the course of
graduate instruction and in the course of applied research and practice. We hope this treatment has
brought some degree of parsimony to what have often been treated as disparate, loosely related top-
ics. Furthermore, we hope it cast developments in the area of reliability in a novel light for selection
researchers and practitioners and encourages us to explore and capitalize on modern methods for
framing reliability, error, and their underlying components.

Concept of Validity
Validity, according to the 1999 Standards for Educational and Psychological Testing, is “the degree
to which evidence and theory support the interpretations of test scores entailed by proposed uses of
tests” (AERA, APA, & NCME, 1999, p. 9). There is a long history and considerable literature on the
subject of validity. With limited space here, it is impossible to do justice to the subject. We attempt
to highlight a set of important issues in the ongoing development of thinking about validity, but we
direct the interested reader to a set of key resources for a strong foundation on the topic. One key
set of references is the set of chapters on the topic of validity in the four editions of Educational
Measurement, which is that field’s analog to the Handbook of Industrial and Organizational
Psychology. Cureton (1951), Cronbach (1971), Messick (1989), and Kane (2006) each offer detailed
treatment of the evolving conceptualizations of validity. Another key set focuses specifically on
validity in the context of personnel selection. There have been two prominent articles on validity in
the employment context published in the American Psychologist by Guion (1974) and Landy (1986).
There is also a very influential paper by Binning and Barrett (1989). A third key set is made up of
classic highly cited articles in psychology—Cronbach and Meehl’s (1955) and Loevinger’s (1957)
treatises on construct validity.
Our focus here is entirely conceptual. We do not address operational issues in the design of
studies aimed at obtaining various types of validity evidence. Rather, we attempt to outline a set of
issues that we view as central to an understanding of validity.

Validity as Predictor-Criterion Relationship Versus Broader Conceptualizations


In the first half of the 20th century, validity was commonly viewed solely in terms of the strength of
predictor-criterion relationships. Cureton’s (1951) chapter on validity stated, reasonably, that validity
addresses the question of “how well a test does the job it was employed to do” (p. 621). But the “job it was
employed to do” was viewed as one of prediction, leading Cureton to state that, “Validity is … defined in
terms of the correlation between the actual test scores and the ‘true’ criterion measures” (pp. 622–623).
But more questions were being asked of tests than whether they predicted a criterion of interest.
These included questions about whether mastery of a domain could be inferred from a set of ques-
tions sampling that domain and about whether a test could be put forward as a measure of a speci-
fied psychological construct. A landmark event in the intellectual history of the concept of validity
was the publication of the first edition of what is now known as the Standards for Educational
and Psychological Testing (APA, 1954), in which a committee headed by Lee Cronbach, with Paul
Meehl as a key member, put forward the now familiar notions of predictive, concurrent, content,
and construct validity. Cronbach and Meehl (1955) elaborated their position on construct validity
a year later in their seminal Psychological Bulletin paper. Since then, validity has been viewed
more broadly than predictor-criterion correlations, with the differing validity labels viewed first as
types of validity and more recently as different types of validity evidence or as evidence relevant to differing inferences to be drawn from test scores.

Validity of an Inference Versus Validity of a Test


Arguably the single most essential idea regarding validity is that it refers to the degree to which
evidence supports inferences one proposes to draw about the target of assessment (in the I-O world,
most commonly an individual; in other settings, a larger aggregate such as a classroom or a school)
from their scores on assessment devices. The generic question “Is this a valid test?” is not a use-
ful one; rather, the question is “Can a specified inference about the target of assessment be validly
drawn from scores on this device?” Several important notions follow from this position.
First, it follows that the inferences to be made must be clearly specified. It is often the case
that multiple inferences are proposed. Consider a technical report stating, “This test representatively
samples the established training curriculum for this job. It measures four sub-domains of job knowl-
edge, each of which is predictive of subsequent on-the-job task performance.” Note that three claims
are made here, dealing with sampling, dimensionality, and prediction, respectively. Each claim is
linked to one or more inferences about a test taker (i.e., degree of curriculum mastery, differentia-
tion across subdomains, relationships with subsequent performance, and incremental prediction of
performance across subdomains).
Second, support for each inference is needed to support the multifaceted set of claims made
about inferences that can be drawn from the test. Each inference may require a different type of
evidence. The claim of representative content sampling may be supported by evidence of the form
historically referred to as “content validity evidence,” namely, a systematic documentation of the
relationship between test content and curriculum content, typically involving the judgment of subject
matter experts. The claim of multidimensionality may be supported by factor-analytic evidence, and
evidence in support of this claim is one facet of what has historically been referred to as construct
validity evidence (i.e., evidence regarding whether the test measures what it purports to measure).
The claim of prediction of subsequent task performance may be supported by what has historically
been referred to as “criterion-related validity evidence,” namely, evidence of an empirical relation-
ship between test scores and subsequent performance. Note that the above types of evidence are
provided as examples—it is commonly the case that multiple strategies may be selected alone or
in combination as the basis for support for a given inference. For example, empirical evidence of a
test-criterion relationship may be unfeasible in a given setting because of sample size limitations,
and the investigator may turn to the systematic collection of expert judgment as to the likelihood
that performance on various test components is linked to higher subsequent job performance.
Third, it may prove the case that some proposed inferences receive support as evidence is gath-
ered and evaluated, whereas others do not. In the current example, what might emerge is strong
support for the claim of representative curriculum sampling and strong support for the claim of
prediction of subsequent performance, but evidence in support of a unidimensional rather than
multidimensional structure. In such cases, one should revise the claims made for the test; in this
case, dropping the claim that inferences can be drawn about differential standing on subdomains
of knowledge.

Types of Validity Evidence Versus Types of Validity


Emerging from the 1954 edition of what is now the Standards for Educational and Psychological
Testing was the notion of multiple types of validity. The triumvirate of criterion-related validity,
content validity, and construct validity came to dominate writings about validity. At one level, this
makes perfect sense. Each of these deals with different key inferences one may wish to draw about
a test. First, in some settings, such as many educational applications, the key inference is one of
content sampling. Using tests for purposes such as determining whether a student passes a course,
progresses to the next grade, or merits a diploma relies heavily on the adequacy with which a test
samples the specified curriculum. Second, in some settings, such as the study of personality, the key
inference is one of appropriateness of construct labeling. There is a classic distinction (Loevinger,
1957) between two types of construct validity questions, namely, questions about the existence
of a construct (e.g., Can one define a construct labeled “integrity” and differentiate it from other
constructs?) and questions about the adequacy of a given measure of a construct (e.g., Can test
X be viewed as a measure of integrity?). Third, in some settings, such as the personnel selection
setting of primary interest for the current volume, the key inference is one of prediction: Can scores
from measures gathered before a selection decision be used to draw inferences about future job
behavior?
Over the last several decades, there has been a move from viewing these as types of validity to
types of validity evidence. All lines of evidence—content sampling, dimensionality, convergence
with other measures, investigations of the processes by which test takers respond to test stimuli, or
relations with external criteria—deal with understanding the meaning of test scores and the infer-
ences that can be drawn from them. Because construct validity is the term historically applied to
questions of the meaning of test scores, the position emerged that if all forms of validity evidence
contributed to understanding the meaning of test scores, then all forms of validity evidence were
really construct validity evidence. The 1999 edition of the Standards pushed this one step further:
If all forms of evidence are construct validity evidence, then “validity” and “construct validity”
are indistinguishable. Thus the Standards refer to “validity” rather than “construct validity” as the
umbrella term. This seems useful, because construct validity carries the traditional connotations
of referring to specific forms of validity evidence, namely convergence with conceptually related
measures and divergence from conceptually unrelated measures.
Thus, the current perspective reflected in the 1999 Standards is that validity refers to the eviden-
tiary basis supporting the inferences that a user claims can be drawn from a test score. Many claims
are multifaceted, and thus multiple lines of evidence may be needed to support the claims made
for a test. A common misunderstanding of this perspective on validity is that the test user’s burden
has been increased, because the user now needs to provide each of the types of validity evidence.
In fact, there is no requirement that all forms of validity evidence be provided; rather, the central
notion is, as noted earlier, that evidence needs to be provided for the inferences that one claims can
be drawn from test scores. For example, if one’s intended inferences make no claims about content
sampling, content-related evidence is not needed. If the claim is simply that scores on a measure
can be used to forecast whether an individual will voluntarily leave the organization within a year
of hire, the only inference that needs to be supported is the predictive one. One may rightly assert
that scientific understanding is aided by obtaining other types of evidence than those drawn on to
support the predictive inference (i.e., forms of evidence that shed light on the construct(s) underly-
ing test scores), but we view such evidence gathering as desirable but not essential. One’s obligation
is simply to provide evidence in support of the inferences one wishes to draw.

Validity as an Inference About a Test Score Versus Validity as a Strategy for Establishing Job Relatedness


In employment settings, the most crucial inference to be supported about any measure is whether
the measure is job-related. Labeling a measure as job-related means “scores on this measure can
be used to draw inferences about an individual’s future job behavior”—we term this the “predictive
inference.” In personnel selection settings, our task is to develop a body of evidence to support the
predictive inference. The next section of this chapter outlines mechanisms for doing so.
Some potential confusion arises from the failure to differentiate between settings where types of
validity evidence are being used to draw inferences about the meaning of test scores rather than to
draw a predictive inference. For example, content-related validity evidence refers to the adequacy
with which the content of a given measure samples a specified content domain. Assume that one is
attempting to develop a self-report measure of conscientiousness to reflect a particular theory that
specifies that conscientiousness has four equally important subfacets: dependability, achievement
striving, dutifulness, and orderliness. Assume that a group of expert judges is given the task of
sorting the 40 test items into these four subfacets. A finding that 10 items were rated as reflecting
each of the four facets would support the inference of adequate domain sampling and contribute to
an inference about score meaning. Note that this inference is independent of questions about the job
relatedness of this measure. One could draw on multiple lines of evidence to further develop the
case for this measure as an effective way to measure conscientiousness (e.g., convergence with other
measures) without ever addressing the question of whether predictive inferences can be drawn from
this measure for a given job. When one’s interest is in the predictive hypothesis, various types of
validity evidence can be drawn upon to support this evidence, as outlined below.

Predictive Inference Versus the Evidence for It


As noted above, the key inference in personnel selection settings is a predictive one, namely the
inference that scores on the test or other selection procedure can be used to make predictions about
the test takers’ subsequent job behavior. A common error is the equating of the type of inference to
be drawn with the type of evidence needed to support the inference. Put most bluntly, the error is
to assert that, “If the inference is predictive, then the needed evidence is criterion-related evidence
of the predictive type.”
Scholars in the I-O area have clearly articulated that there are multiple routes to providing evi-
dence in support of the predictive hypothesis. Figure 2.1 presents this position in visual form. Models
of this sort are laid out in Binning and Barrett (1989) and in the 1999 Standards. The upper half
of Figure 2.1 shows a measured predictor and a measured criterion. Because both are measured,
the relationship between these two can be empirically established. The lower half of Figure 2.1
shows an unmeasured predictor construct domain and an unmeasured criterion construct domain.
Of interest are the set of linkages between the four components of this model.
The first and most central point is that the goal of validation research in the personnel selection
context is to establish a linkage between the predictor measure (Figure 2.1, upper left) and the cri-
terion construct domain (Figure 2.1, lower right). The criterion construct domain is the conceptual
specification of the set of work behaviors that one is interested in predicting. This criterion construct
domain may be quite formal and elaborate, as in the case of a job-analytically-specified set of criti-
cal job tasks, or it may be quite simple and intuitive, as in the case of an organization that asserts
that it wishes to minimize voluntary turnover within the first year of employment and thus specifies
this as the criterion domain of interest.
The second central point is that there are three possible mechanisms for linking an observed
predictor score and a criterion construct domain.

[Figure 2.1 shows four components (an observed predictor measure, an observed criterion measure, a latent predictor domain, and a latent criterion domain) connected by five linkages: Linkage 1, observed predictor to observed criterion (via local validation study; via validity generalization/transportability evidence); Linkage 2, observed predictor to latent predictor domain (amassing theory and data to assign construct labels to predictor scores, informed via job analysis); Linkage 3, latent predictor domain to latent criterion domain (via logic and judgment based on existing theory and the body of relevant empirical evidence); Linkage 4, observed criterion to latent criterion domain (via SME judgment, e.g., comparison to job analysis data; via empirical means, e.g., CFA); Linkage 5, observed predictor to latent criterion domain (sampling of predictor content from the criterion domain).]

FIGURE 2.1  Routes to providing evidence in support of the predictive inference. (Adapted from Binning, J. F., & Barrett, G. V., Journal of Applied Psychology, 74, 478–494, 1989.)

The first is via a sampling strategy. If the predictor
measure is a direct sample of the criterion construct domain, then the predictive inference is estab-
lished based on expert judgment (e.g., obtained via a job analysis process) (Linkage 5 in Figure
2.1). Having an applicant for a symphony orchestra position sight read unfamiliar music is a direct
sample of this important job behavior. Having an applicant for a lifeguard position dive to the bot-
tom of a pool to rescue a simulated drowning victim is a simulation, rather than a direct sample of
the criterion construct domain. However, it does rely on domain sampling logic, and, like most work
sample tests, aims at psychological fidelity in representing critical aspects of the construct domain.
The second mechanism for linking an observed predictor and a criterion construct domain
is via establishing a pair of linkages, namely (a) the observed predictor–observed criterion link
(Linkage 1 in Figure 2.1) and (b) the observed criterion–criterion construct domain link (Linkage
4 in Figure 2.1). The first of these can be established empirically, as in the case of local criterion-
related evidence, or generalized or transported evidence. Critically, such evidence must be paired
with evidence that the criterion measure (e.g., ratings of job performance) can be linked to the
criterion construct domain (e.g., actual performance behaviors). Such evidence can be judgmental
(e.g., comparing criterion measure content to critical elements of the criterion construct domain
revealed through job analyses) and empirical (e.g., fitting CFA models to assess whether dimen-
sionality of the observed criterion scores is consistent with the hypothesized dimensionality of the
criterion construct domain). It commonly involves showing that the chosen criterion measures do
reflect important elements of the criterion construct domain. Observed measures may fail this test,
as in the case of a classroom instructor who grades solely on attendance when the criterion con-
struct domain is specified in terms of knowledge acquisition, or in the case of a criterion measure
for which variance is largely determined by features of the situation rather than by features under
control of the individuals.
The third mechanism also focuses on a pair of linkages, namely (a) linking the observed predic-
tor scores and the predictor construct domain (Linkage 2 in Figure 2.1) and (b) linking the predictor
construct domain and the criterion construct domain (Linkage 3 in Figure 2.1). The first linkage
involves obtaining data to support interpreting variance in predictor scores as reflecting variance in
a specific predictor construct domain. This reflects one form of what has historically been referred
to as construct validity evidence, namely, amassing theory and data to support assigning a speci-
fied construct label to test scores. For example, if a test purports to measure achievement striving,
one might offer a conceptual mapping of test content and one’s specification of the domain of
achievement striving, paired with evidence of empirical convergence with other similarly specified
measures of the construct. However, showing that the measure does reflect the construct domain is
supportive of the predictive inference only if the predictor construct domain can be linked to the
criterion construct domain. Such evidence is logical and judgmental, requiring a clear articulation
of the basis for asserting that individuals higher in the domain of achievement striving will have
higher standing on the criterion construct domain than individuals lower in achievement striving.
Thus, there are multiple routes to establishing the predictive inference. These are not mutually
exclusive; one may provide more than one line of evidence in support of the predictive inference. It
is also not the case that the type of measure dictates the type of evidentiary strategy chosen.

Classical Versus Modern Notions of Reliability as a Limiter of Validity


Modern perspectives on validity and reliability are concerned with assessing the veracity of infer-
ences. With reliability, we are primarily concerned with inferences regarding the similarity of
observed scores to scores that would be obtained from a population of like measures (replicates).
With validity, the inferences of interest can take on many forms, and multiple types of evidence may
be garnered to address the veracity of those inferences. Given this description of validity, one may
argue modern notions of validity encompass reliability (i.e., reliability regards a specific inference
one may make regarding observed scores). Regardless, a key feature of the "reliability inference" is that its veracity can have implications for the strength of evidence we can garner for inferences
that typically fall under discussions of validity (e.g., observed predictor scores are related to subse-
quent job behavior).
Historically, the idea that reliability limits or puts an upper bound on validity is grounded in
reliability theory from the early 20th century (i.e., CTT) and notions of validity that peaked in the
1950s when criterion-related validity was the gold standard. By operationalizing validity as a cor-
relation between two sets of scores (each of which are assumed to conform to the CTT score model),
it is straightforward to demonstrate that “reliability” serves to put an upper bound on “validity”
(Spearman, 1904, 1910). Clearly, validity and reliability have greatly evolved since the early 20th
century; such an evolution raises the question: How does the idea that "reliability limits validity"
manifest itself in modern perspectives on these concepts? Here we briefly juxtapose this classical
idea against the general forms of validity evidence that might be garnered to support the inference
that applicants’ scores on a selection procedure can predict their subsequent job behavior.
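
Stated compactly in our own notation (a standard CTT result added here for reference, not a formula given in the chapter): let X and Y denote observed predictor and criterion scores, T_X and T_Y their true scores, and rho_XX' and rho_YY' their reliabilities; then, assuming uncorrelated errors,

    \rho_{XY} \;=\; \rho_{T_X T_Y}\,\sqrt{\rho_{XX'}\,\rho_{YY'}}
    \quad\Longrightarrow\quad
    |\rho_{XY}| \;\le\; \sqrt{\rho_{XX'}\,\rho_{YY'}} \;\le\; \sqrt{\rho_{XX'}}

so the observed validity coefficient can be no larger than the square root of the product of the two reliabilities, which is the classical sense in which reliability limits validity.
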
Recall from the previous section that one mechanism for establishing evidence of a link between
observed predictor scores and subsequent job behavior involves demonstrating that predictor mea-
sures reflect a representative sampling of actual job behaviors (Linkage 5 in Figure 2.1). Typically,
such behaviors are identified through job analyses that may involve panels of job experts, surveys of
job incumbents, and so forth. Given the many judgments involved in the job analysis process, results
of that process are prone to similar types of errors we concern ourselves with in investigations of
reliability (e.g., inconsistency among raters’ ratings of job tasks, inconsistency among questions
designed to assess task criticality). To the extent that investigators recognize such inconsistencies as
error, and to the extent that such errors manifest themselves in job analysis results, then investiga-
tors’ argument that predictor measure content is indicative of actual behavior on the job would be
weakened. Note that although this example draws on issues related to reliability, the focus shifts
from reliability of a test score (the historic focus of the perspective that reliability puts an upper
bound on validity) to the reliability of processes used in the development of predictor measures.
A second mechanism for establishing evidence of a link between observed predictor scores and
subsequent job behavior involves demonstrating a link between (a) observed predictor and crite-
rion scores (Linkage 1 in Figure 2.1) and (b) observed criterion scores and the criterion construct
domain (Linkage 4 in Figure 2.1). As noted earlier, arguments for the veracity of the first link are
often based on local criterion-related validity studies or synthetic estimates (e.g., meta-analytic esti-
mates, job component validity). Here, the relation between reliability and validity can become quite
complex, because it involves many subtleties that are often masked in CTT-based discussion of
reliability’s effect on criterion-related validity estimates. For example, if the predictor and criterion
being examined share some element of the measurement design in common (e.g., assessed by com-
mon raters, assessed on a common occasion), then there will be competing biases on the observed
validity estimate. All else being equal, unreliability in the criterion will tend to downwardly bias
the validity estimate, whereas covariance arising from a common element of the measurement design
(e.g., raters, occasions) would tend to upwardly bias the validity estimate (Viswesvaran et al., 2005).
In this situation, error (in the reliability sense) serves to simultaneously downwardly and upwardly
bias observed predictor-criterion correlations, thus serving to weaken one’s arguments because of
the uncertainty it creates regarding the predictor-criterion relationship. Investigators' arguments
for the veracity of a link between observed criterion scores and the criterion construct domain may
be weakened by (a) unreliability in the job analytic judgments that gave rise to the content of the
observed criterion measure (per the example from the previous paragraph) and (b) unreliability of
the observed criterion scores themselves (e.g., inconsistency in raters’ performance ratings).
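
To make these competing biases concrete, the short Python sketch below (our own hypothetical simulation with arbitrary variances, not an analysis from the studies cited) generates latent predictor and criterion scores correlated at .50; adding random rating error to both observed measures attenuates the observed correlation, whereas adding a rater-like component shared by the two measures pushes the estimate back up:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    rho = 0.50  # hypothetical latent predictor-criterion correlation

    # Correlated latent (true-score) predictor and criterion
    latent = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
    t_pred, t_crit = latent[:, 0], latent[:, 1]

    e_pred = rng.normal(0, 0.6, n)   # random measurement error in the predictor
    e_crit = rng.normal(0, 0.8, n)   # random rating error in the criterion
    method = rng.normal(0, 0.7, n)   # rater/occasion effect shared by both measures

    def corr(a, b):
        return np.corrcoef(a, b)[0, 1]

    print("latent correlation:            %.3f" % corr(t_pred, t_crit))
    # Unreliability alone attenuates the observed validity estimate
    print("with random error only:        %.3f" % corr(t_pred + e_pred, t_crit + e_crit))
    # A shared design component (e.g., common raters) biases the estimate back upward
    print("error + shared rater variance: %.3f" % corr(t_pred + method + e_pred,
                                                       t_crit + method + e_crit))

With the hypothetical variances used here, the error-only correlation falls to roughly .33 while the shared rater component pushes it back near .50, illustrating how the two biases can offset one another and leave the true predictor-criterion relationship uncertain.
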
Lastly, a third mechanism for establishing evidence of a link between observed predictor scores
and subsequent job behavior involves demonstrating a link between (a) observed predictor scores
and the predictor construct domain (Linkage 2 in Figure 2.1) and (b) the predictor and criterion con-
struct domains (Linkage 3 in Figure 2.1). Although we do not typically correct for predictor unreli-
ability in criterion-related validity studies, if we concern ourselves with the link between observed
predictor scores and the theoretical domain that the predictor measure's content is believed to
represent, then unreliability in the predictor scores can serve to weaken arguments for that link.
The link between predictor and criterion construct domains is largely established based on theory.
In theory, we tend not to concern ourselves with properties of observed scores—as such, the idea
that reliability limits validity does not directly apply. Nevertheless, how we come to defensible
theory, or at least provide evidence for relationships that serve as its key tenets, is by empirical
means. Unreliability in scores produced by measures of key constructs within a theory can weaken
theory-based arguments by introducing uncertainty into the substantive nature of key relationships
underpinning the theory. For example, it is fairly well recognized that unreliability can have a
deleterious effect on investigators’ ability to detect nonlinear relationships, interaction effects (e.g.,
moderating effects), and increments in criterion-related validity. Theory-based arguments that rest
on the assumption that those effects do not exist may be weakened to the extent that the bodies of
literature on which those theory-based arguments are defended used unreliable measures (e.g., rat-
ings) of key constructs (e.g., job performance). Null findings may not be indicative that such effects
do not exist in reality, but rather simply indicate that there is too much “noise in the data” to detect
them. The net result of this phenomenon is that we may build up a base of empirical evidence that
“consistently” finds lack of such effects, which may lead one to potentially falsely conclude such
effects do not exist (i.e., the scientific allure of consistency in findings across studies may bias one
toward accepting the null findings). This example demonstrates the subtlety via which a lack of
reliability may weaken theory-based arguments for the link between observed predictor scores and
subsequent job behavior.
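
The point about interaction effects can be illustrated with another small, purely hypothetical Python simulation (again our own sketch, not an analysis from the cited literature): a genuine X × Z interaction is built into the criterion, and adding enough measurement error to bring each predictor's reliability to about .70 visibly shrinks the estimated interaction coefficient.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 50_000

    # Latent predictors and a criterion with a genuine interaction effect
    x = rng.normal(size=n)
    z = rng.normal(size=n)
    y = 0.4 * x + 0.4 * z + 0.3 * (x * z) + rng.normal(size=n)

    def interaction_beta(x_obs, z_obs, y_obs):
        """OLS estimate of the X*Z coefficient from a moderated regression."""
        design = np.column_stack([np.ones_like(x_obs), x_obs, z_obs, x_obs * z_obs])
        betas, *_ = np.linalg.lstsq(design, y_obs, rcond=None)
        return betas[3]

    print("interaction beta, perfectly reliable predictors: %.3f"
          % interaction_beta(x, z, y))

    # Add error so each observed predictor has reliability of about .70
    err_sd = np.sqrt(0.3 / 0.7)
    x_obs = x + rng.normal(scale=err_sd, size=n)
    z_obs = z + rng.normal(scale=err_sd, size=n)
    print("interaction beta, reliability of about .70:      %.3f"
          % interaction_beta(x_obs, z_obs, y))

Under these assumed conditions the estimated interaction coefficient drops to roughly half its true value (classical results suggest attenuation by approximately the product of the two reliabilities), so real moderating effects can easily be mistaken for noise in ordinary samples.
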

Validity Limited to Inferences About Individuals Versus Including Broader Consequences of Test Score Use


In the last two decades, considerable attention has been paid to new views of validity that extend
beyond the inferences that can be drawn about individuals to include a consideration of the con-
sequences of test use. The key proponent of this position is Messick (1989). Messick noted that it
is commonly asserted that the single most important attribute of a measure is its validity for its
intended uses. He noted that at times test use has unintended negative consequences, as in the case
in which a teacher abandons many key elements of a curriculum to focus all effort on preparing
students to be tested in one subject. Even if inferences about student domain mastery in that sub-
ject can be drawn with high accuracy, Messick argued that the negative consequences (i.e., ignor-
ing other subjects) may be so severe as to argue against the use of this test. If validity is the most
important attribute of a test, then the only way for negative consequences to have the potential to
outweigh validity evidence in a decision about the appropriateness of test use is for consequences
of test use to be included as a facet of validity. Thus he argued for a consideration of traditional
aspects of validity (which he labeled “evidential”) and these new aspects of validity (which he
labeled “consequential”). These ideas were generally well received in educational circles, and the
term “consequential validity” came to be used. In this usage, a measure with unintended negative
consequences lacks consequential validity. This perspective views such negative consequences as
invalidating test use.
The 1999 Standards rejects this view. Although evidence of negative consequences may influ-
ence decisions concerning the use of predictors, such evidence will only be related to inferences
about validity if the negative consequences can be directly traced to the measurement properties
of the predictor. Using an example that one of us (Sackett) contributed to the SIOP Principles for
the Validation and Use of Personnel Selection Procedures (2003), consider an organization that
(a) introduces an integrity test to screen applicants, (b) assumes that this selection procedure pro-
vides an adequate safeguard against employee theft, and (c) discontinues use of other theft-deterrent
methods (e.g., video surveillance). In such an instance, employee theft might actually increase after
the integrity test is introduced and other organizational procedures are eliminated. Thus, the inter-
vention may have had an unanticipated negative consequence on the organization. These negative
consequences do not threaten the validity of inferences that can be drawn from scores on the integ-
rity test, because the consequences are not a function of the test itself.

Summary
In conclusion, we have attempted to develop seven major points about validity. These are that (a) we
have moved far beyond early conceptualizations of validity as the correlation between test scores
and criterion measures; (b) validity is not a characteristic of a test, but rather refers to inferences
made from test scores; (c) we have moved from conceptualizing different types of validity to a
perspective that there are different types of validity evidence, all of which contribute to an under-
standing of the meaning of test scores; (d) the key inference to be supported in employment settings
is the predictive inference, namely, that inferences about future job behavior can be drawn from
test scores; (e) there are multiple routes to gathering evidence to support the predictive inferences;
(f) reliability may weaken arguments for validity of inferences, but does so in a way that is more
complex than traditional notions that reliability puts a specifiable upper bound on validity; and (g)
although evidence about unintended negative consequences of test use (e.g., negative applicant reac-
tions to the test) may affect a policy decision as to whether or not to use the test, such evidence is not
a threat to the predictive inference and does not affect judgments about the validity of the test. Our
belief is that a clear understanding of these foundational issues in validity is essential for effective
research and practice in the selection arena.

References
Aguinis, H., Henle, C. A., & Ostroff, C. (2001). Measurement in work and organizational psychology. In
N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.), Handbook of industrial, work, and
organizational psychology: Volume 1: Personnel psychology (pp. 27–50). London, England: Sage.
Aiken, L. S., West, S. G., & Millsap, R. E. (2008). Graduate training in statistics, measurement, and methodol-
ogy in psychology: Replication and extension of Aiken, West, Sechrest, and Reno’s (1990) survey of PhD
programs in North America. American Psychologist, 63, 32–50.
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (1999). Standards for educational and psychological testing. Washington,
DC: American Educational Research Association.
American Psychological Association (joint committee). (1954). Technical recommendations for psychological
tests and diagnostic techniques. Psychological Bulletin, 51, 201–238.
Bell, J. F. (1985). Generalizability theory: The software problem. Journal of Educational and Behavioral
Statistics, 10, 19–29.
Binning, J. F., & Barrett, G.V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential
and evidential bases. Journal of Applied Psychology, 74, 478–494.
Bliese, P. D. (2002). Multilevel random coefficient modeling in organizational research. In F. Drasgow &
N. Schmitt (Eds.), Measuring and analyzing behavior in organizations (pp. 401–445). San Francisco,
CA: Jossey-Bass.
Borsboom, D. (2006). The attack of the psychometricians. Psychometrika, 71, 425–440.
Borsboom, D., & Mellenbergh, G. J. (2002). True scores, latent variables, and constructs: A comment on
Schmidt and Hunter. Intelligence, 30, 505–514.
Bost, J. E. (1995). The effects of correlated errors on generalizability and dependability coefficients. Applied
Psychological Measurement, 19, 191–203.
Brennan, R. L. (1998). Raw-score conditional standard errors of measurement in generalizability theory.
Applied Psychological Measurement, 22, 307–331.
Brennan, R. L. (2001a). An essay on the history and future of reliability from the perspective of replications.
Journal of Educational Measurement, 38, 295–317.
Brennan, R. L. (2001b). Generalizability theory. New York, NY: Springer-Verlag.
Brennan, R. L. (2006). Perspectives on the evolution and future of educational measurement. In R. L. Brennan
(Ed.), Educational measurement (4th ed., pp. 1–16). Westport, CT: American Council on Education and
Praeger.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York, NY: Guilford Press.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of
Psychology, 3, 296–322.
Bryk, A. S., & Raudenbush, S.W. (1992). Hierarchical linear models. Newbury Park, CA: Sage.
Campbell, J. P. (1976). Psychometric theory. In M. D. Dunnette (Ed.), Handbook of industrial and organiza-
tional psychology (pp. 185–222). New York: John Wiley & Sons.
Cattell, R. B. (1966). The data box: Its ordering of total resources in terms of possible relational systems. In
R. B. Cattell (Ed.), Handbook of multivariate experimental psychology (pp. 67–128). Chicago, IL: Rand
McNally.
Cronbach, L. J. (1947). Test “reliability”: Its meaning and determination. Psychometrika, 12, 1–16.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 292–334.
Cronbach, L. J. (1957). The two disciplines of scientific psychology. American Psychologist, 12, 671–684.
Cronbach, L. J. (1971). Test validation. In R. L. Thorndike (Ed.), Educational measurement (pp. 221–237).
Washington, DC: American Council on Education.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measure-
ments: Theory of generalizability for scores and profiles. New York, NY: John Wiley.
Cronbach, L. J., Linn, R. L., Brennan, R. L., & Haertel, E. H. (1997). Generalizability analysis for perfor-
mance assessments of student achievement or school effectiveness. Educational and Psychological
Measurement, 57, 373–399.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52,
281–300.
Cronbach, L. J., Rajaratnam, N., & Gleser, G. (1963). Theory of generalizability: A liberalization of reliability
theory. British Journal of Statistical Psychology, 16(2), 137–163.
Cronbach, L. J., Schoneman, P., & McKie, D. (1965). Alpha coefficient for stratified-parallel tests. Educational
& Psychological Measurement, 25, 291–312.
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and its successor proce-
dures. Educational and Psychological Measurement, 64, 391–418.
Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621–694). Washington
DC: American Council on Education.
DeShon, R. P. (1995, May). Restricted maximum likelihood estimation of variance components in generaliz-
ability theory: Overcoming balanced design requirements. Paper presented at the 10th annual conference
of the Society of Industrial and Organizational Psychology, Orlando, FL.
DeShon, R. P. (1998). A cautionary note on measurement error corrections in structural equation models.
Psychological Methods, 3, 412–423.
DeShon, R. P. (2002). Generalizability theory. In F. Drasgow & N. Schmitt (Eds.), Measuring and analyzing
behavior in organizations (pp. 189–220). San Francisco, CA: Jossey-Bass.
Dunnette, M. D. (1966). Personnel selection and placement. Oxford, England: Wadsworth.
Ebel, R. L. (1951). Estimation of the reliability of ratings. Psychometrika, 16, 407–424.
Eid, M., Lischetzke, T., Nussbeck, F. W., & Trierweiler, L. I. (2003). Separating trait effects from trait-specific
method effects in multitrait-multimethod models: A multiple-indicator CT-C(M-1) model. Psychological
Methods, 8, 38–60.
Englehard, G. (1994). Examining rater errors in the assessment of written composition with a many-faceted
Rasch model. Journal of Educational Measurement, 31, 93–112.
Feldt, L. S., & Brennan, R. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed.,
pp. 105–146). New York: American Council on Education and Macmillan.
Fisher, R.A. (1925). Statistical methods for research workers (1st ed.). Edinburgh, Scotland: Oliver and Boyd.
Ghiselli, E. E. (1964). Theory of psychological measurement. New York: McGraw-Hill.
Green, S. B. (2003). A coefficient alpha for test-retest data. Psychological Methods, 8, 88–101.
Green, S. B., & Hershberger, S. L. (2000). Correlated errors in true score models and their effect on coefficient
alpha. Structural Equation Modeling, 7, 251–270.
Greguras, G. J., & Robie, C. (1998). A new look at within-source interrater reliability of 360 degree feedback
ratings. Journal of Applied Psychology, 83, 960–968.
Guion, R. M. (1974). Open a new window: Validities and values in psychological measurement. American
Psychologist, 29, 287–296.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ: Lawrence
Erlbaum.
Gulliksen, H. (1950). Theory of mental tests. New York, NY: Wiley.
Guttman, L. A. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282.
Haertel, E. H. (2006). Reliability. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 65–110).
Westport, CT: American Council on Education and Praeger.
Hocking, R. R. (1995). Variance component estimation in mixed linear models. In L. K. Edwards (Ed.), Applied
analysis of variance in behavioral science (pp. 541–571). New York, NY: Marcel Dekker.
Hox, J. J. (1994). Hierarchical regression models for interviewer and respondent effects. Sociological Methods
and Research, 22, 300–318.
Hoyt, C. (1941). Test reliability obtained by analysis of variance. Psychometrika, 6, 153–160.
Jackson, S., & Brashers, D. E. (1994). Random factors in ANOVA. Thousand Oaks, CA: Sage.
Joreskog, K. G. (1971). Statistical analysis of sets of congeneric tests. Psychometrika, 36, 109–133.
Joreskog, K. G., & Sorbom, D. (2001). LISREL 8: User's reference guide. Lincolnwood, IL: Scientific
Software International.
Kamata, A., Turhan, A., & Darandari, E. (2003, April). Estimating reliability for multidimensional composite
scale scores. Paper presented at the annual meeting of the American Educational Research Association,
Chicago, IL.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17–64). Westport,
CT: American Council on Education and Praeger.
Kenny, D. A., & Judd, C. M. (1986). Consequences of violating the independence assumption in analysis of
variance. Psychological Bulletin, 99, 422–431.
Kenny, D. A., & Kashy, D. A. (1992). Analysis of the multitrait-multimethod matrix by confirmatory factor
analysis. Psychological Bulletin, 112, 165–172.
Komaroff, E. (1997). Effect of simultaneous violations of essential tau-equivalence and uncorrelated errors on
coefficient alpha. Applied Psychological Measurement, 21, 337–348.
Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2,
151–160.
Kuhn, T. S. (1962). The structure of scientific revolutions. Chicago: The University of Chicago Press.
Laenen, A., Vangeneugden, T., Geys, H., & Molenberghs, G. (2006). Generalized reliability estimation using
repeated measurements. British Journal of Mathematical and Statistical Psychology, 59, 113–131.
LaHuis, D. M., & Avis, J. M. (2007). Using multilevel random coefficient modeling to investigate rater effects
in performance ratings. Organizational Research Methods, 10, 97–107.
Lance, C. L. (2008). Why assessment centers don’t work the way they’re supposed to. Industrial and
Organizational Psychology, 1, 84–97.
Lance, C. L., Lambert, T. A., Gewin, A. G., Lievens, F., & Conway, J. M. (2004). Revised estimates of dimen-
sion and exercise variance components in assessment center postexercise dimension ratings. Journal of
Applied Psychology, 89, 377–385.
Landy, F. J. (1986). Stamp collecting versus science: Validation as hypothesis testing. American Psychologist,
41, 1183–1192.
Le, H., & Putka, D. J. (2007). Reliability. In S. G. Rogelberg (Ed.), Encyclopedia of industrial and organiza-
tional psychology (Vol. 2, pp. 675–678). Thousand Oaks, CA: Sage.
Le, H., Schmidt, F. L., & Putka, D. J. (2009). The multifaceted nature of measurement artifacts and its implica-
tions for estimating construct-level relationships. Organizational Research Methods, 12, 165–200.
Lee, G., Dunbar, S. B., & Frisbie, D. A. (2001). The relative appropriateness of eight measurement models
for analyzing scores from tests comprised of testlets. Educational and Psychological Measurement, 61,
958–975.
Lee, G., & Frisbie, D. A. (1999). Estimating reliability under a generalizability theory model for test scores
composed of testlets. Applied Measurement in Education, 12, 237–255.
Littell, R. C., Milliken, G. A., Stroup, W. W., & Wolfinger, R. D. (1996). SAS system for mixed models. Cary,
NC: SAS Institute.
Loevinger, J. (1957). Objective tests as instruments of psychological theory [Monograph No. 9]. Psychological
Reports, 3, 635–694.
Lord, F. M. (1955). Estimating test reliability. Educational and Psychological Measurement, 15, 325–336.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Lumsden, J. (1976). Test theory. Annual Review of Psychology, 27, 251–280.
Marcoulides, G. A. (1990). An alternative method for estimating variance components in generalizability theory. Psychological Reports, 66, 102–109.
Marcoulides, G. A. (1996). Estimating variance components in generalizability theory: The covariance struc-
ture analysis approach. Structural Equation Modeling, 3, 290–299.
Marsh, H. W., & Grayson, D. (1994). Longitudinal confirmatory factor analysis: Common, time-specific,
­item-specific, and residual-error components of variance. Structural Equation Modeling, 1, 116–146.
Maxwell, A. E. (1968). The effect of correlated error on reliability coefficients. Educational and Psychological
Measurement, 28, 803–811.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum.
McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients.
Psychological Methods, 1, 30–46.
McPhail, S. M. (Ed.) (2007). Alternative validation strategies: Developing new and leveraging existing validity
evidence. San Francisco, CA: John Wiley and Sons.
Merenda, P. F. (2007). Psychometrics and psychometricians in the 20th and 21st centuries: How it was in the
20th century and how it is now. Perceptual and Motor Skills, 104, 3–20.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). New York:
American Council on Education and Macmillan.
Mosier, C. I. (1943). On the reliability of a weighted composite. Psychometrika, 8, 161–168.
Mount, M. K., Judge, T. A., Scullen, S. E., Sytsma, M. R., & Hezlett, S. A. (1998). Trait rater and level effects
in 360-degree performance ratings. Personnel Psychology, 51, 557–576.
Murphy, K. R., & DeShon, R. (2000). Interrater correlations do not estimate the reliability of job performance
ratings. Personnel Psychology, 53, 873–900.
Ng, K. T. (1974). Spearman’s test score model: A restatement. Educational and Psychological Measurement,
34, 487–498.
Novick, M. R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements.
Psychometrika, 32, 1–13.
Nunnally, J. C. (1978). Psychometric theory. New York: McGraw-Hill.
Patz, R. J., Junker, B. W., Johnson, M. S., & Mariano, L. T. (2002). The hierarchical rater model for rated
test items and its application to large-scale educational assessment data. Journal of Educational and
Behavioral Statistics, 27, 341–384.
Ployhart, R. E., Schneider, B., & Schmitt, N. (2006). Staffing organizations: Contemporary practice and theory
(3rd ed.). Mahwah, NJ: Lawrence Erlbaum.
Putka, D. J., Ingerick, M., & McCloy, R. A. (2008a, April). Integrating reliability and validity-based perspec-
tives on error in performance ratings. Presentation in J. Cortina (Chair), Write, for these words are true:
Uncovering complexity in I/O. Symposium conducted at the 23rd Annual Society for Industrial and
Organizational Psychology Conference, San Francisco, CA.
Putka, D. J., Ingerick, M., & McCloy, R. A. (2008b). Integrating traditional perspectives on error in ratings:
Capitalizing on advances in mixed effects modeling. Industrial and Organizational Psychology, 1,
167–173.
Putka, D. J., Le, H., McCloy, R. A., & Diaz, T. (2008). Ill-structured measurement designs in organiza-
tional research: Implications for estimating interrater reliability. Journal of Applied Psychology, 93,
959–981.
Putka, D. J., & McCloy, R. A. (2008, February). Estimating variance components in SPSS and SAS: An anno-
tated reference guide. Retrieved March 23, 2009, from http://www.humrro.org/djp_archive/Estimating
Variance Components in SPSS and SAS.pdf
Qualls-Payne, A. L. (1992). A comparison of score level estimates of the standard error of measurement.
Journal of Educational Measurement, 29, 213–225.
Rae, G. (2007). A note on using stratified alpha to estimate the composite reliability of a test composed of inter-
related nonhomogeneous items. Psychological Methods, 12, 177–184.
Raykov, T. (1998). Coefficient alpha and composite reliability with interrelated nonhomogeneous items.
Applied Psychological Measurement, 22, 375–385.
Raykov, T. (2001a). Bias of coefficient α for fixed congeneric measures with correlated errors. Applied
Psychological Measurement, 25, 69–76.
Raykov, T. (2001b). Estimation of congeneric scale reliability using covariance structure analysis with nonlin-
ear constraints. British Journal of Mathematical and Statistical Psychology, 54, 315–323.
Raykov, T., & Shrout, P. E. (2002). Reliability of scales with general structure: Point and interval estimation
using a structural equation modeling approach. Structural Equation Modeling, 9, 195–212.
Reuterberg, S. E., & Gustafsson, J. E. (1992). Confirmatory factor analysis and reliability: Testing measure-
ment model assumptions. Educational and Psychological Measurement, 52, 795–811.
Rogers, W. M., Schmitt, N., & Mullins, M. E. (2002). Correction for unreliability of multifactor measures:
Comparison of alpha and parallel forms of approaches. Organizational Research Methods, 5, 184–199.
Rozeboom, W. W. (1966). Foundations of the theory of prediction. Homewood, IL: Dorsey.
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research
scenarios. Psychological Methods, 1, 199–223.
Schmidt, F. L., Le, H., & Ilies, R. (2003). Beyond alpha: An empirical examination of the effects of different
sources of measurement error on reliability estimates for measures of individual-differences constructs.
Psychological Methods, 8, 206–224.
Schmidt, F. L., Viswesvaran, C., & Ones, D. S. (2000). Reliability is not validity and validity is not reliability.
Personnel Psychology, 53, 901–912.
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350–353.
Schmitt, N., & Landy, F. J. (1993). The concept of validity. In N. Schmitt, W.C. Borman & Associates (Eds.),
Personnel selection in organizations (pp. 275–309). San Francisco, CA: Jossey-Bass.
Scullen, S. E. (1999). Using confirmatory factor analyses of correlated uniquenesses to estimate method vari-
ance in multitrait-multimethod matrices. Organizational Research Methods, 2, 275–292.
Scullen, S. E., Mount, M. K., & Goff, M. (2000). Understanding the latent structure of job performance ratings.
Journal of Applied Psychology, 85, 956–970.
Searle, S. R., Casella, G., & McCulloch, C. E. (1992). Variance components. New York, NY: Wiley.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Newbury Park, CA: Sage.
Shrout, P. E. (1993). Analyzing consensus in personality judgments: A variance components approach. Journal
of Personality, 61, 769–788.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological
Bulletin, 86, 420–428.
Smith, P. L., & Luecht, R. M. (1992). Correlated effects in generalizability studies. Applied Psychological
Measurement, 36, 229–235.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of person-
nel selection procedures (4th ed.). Bowling Green, OH: Author.
Spearman, C. (1904). The proof and measurement of association between two things. American Journal of
Psychology, 15, 72–101.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 271–295.
SPSS. (2005). Linear mixed-effect modeling in SPSS: An introduction to the mixed procedure (Technical Report
LMEMWP-0305). Chicago, IL: Author.
Thorndike, R. L. (1951). Reliability. In E. F. Lindquist (Ed.), Educational measurement (pp. 560–620).
Washington, DC: American Council on Education.
Tryon, R. C. (1957). Reliability and behavior domain validity: Reformulation and historical critique.
Psychological Bulletin, 54, 229–249.
Van Iddekinge, C. H., Putka, D. J., Raymark, P. H., & Eidson, C. E., Jr. (2005). Modeling error variance in
job specification ratings: The influence of rater, job, and organization-level factors. Journal of Applied
Psychology, 90, 323–334.
Van Iddekinge, C. H., Raymark, P. H., Eidson, C. E., Jr., & Attenweiler, W. J. (2004). What do structured
selection interviews really measure? The construct validity of behavior description interviews. Human
Performance, 17, 71–93.
Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2005). Is there a general factor in ratings of job performance?
A meta-analytic framework for disentangling substantive and error influences. Journal of Applied
Psychology, 90, 108–131.
Webb, N. M., & Shavelson, R. J. (1981). Multivariate generalizability of general educational development rat-
ings. Journal of Educational Measurement, 18, 13–22.
Werts, C. E., Linn, R. L., & Joreskog, K. G. (1974). Intraclass reliability estimates: Testing structural assump-
tions. Educational and Psychological Measurement, 34, 25–32.
Widaman, K. F. (1985). Hierarchically nested covariance structure models for multitrait-multimethod data.
Applied Psychological Measurement, 9, 1–26.
Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational measure-
ment (4th ed., pp. 111–153). Westport, CT: American Council on Education and Praeger.
Zimmerman, D. W., & Williams, R. H. (1980). Is classical test theory “robust” under violation of the assump-
tion of uncorrelated errors? Canadian Journal of Psychology, 34, 227–237.
Zimmerman, D. W., & Williams, R. H. (1997). Properties of the Spearman correction for attenuation for normal
and realistic non-normal distributions. Applied Psychological Measurement, 21, 253–279.
Zimmerman, D. W., Zumbo, B. D., & LaLonde, C. (1993). Coefficient alpha as an estimate of test reliability
under violation of two assumptions. Educational and Psychological Measurement, 53, 33–49.
3 Validation Strategies for Primary Studies

Neal W. Schmitt, John D. Arnold, and Levi Nieminen

NATURE OF VALIDITY
Most early applications of tests as decision-making tools in the selection of personnel in
work organizations involved a validation model in which the scores on tests were correlated with
some measure or rating of job performance, such as the studies of salespersons by Scott (1915) and
streetcar motormen by Thorndike (1911). This view of validity was reinforced in books by Hull
(1928) and Viteles (1932). Subsequent reviews by Ghiselli (1966, 1973) were similarly focused on
what was by then known as criterion-related validity.
During this time, there was a recognition that tests could and should be based on other logical
arguments as well. Standards for Educational and Psychological Tests (American Psychological
Association [APA], American Educational Research Association [AERA], and National Council
on Measurement in Education [NCME], 1954) identified four aspects of validity evidence: content,
predictive, concurrent, and construct. With time, the predictive and concurrent aspects of validity
became seen as simply different research designs, the purpose of which was to establish a predictor-
criterion relationship; hence, they became known as criterion-related validity. Content and construct
validation were seen as alternate methods by which one could validate and defend the use of test
scores in decision-making. A much broader view of the nature of validity is accepted today, and in
general it is seen as the degree to which the inferences we draw from a set of test scores about job
performance are accurate.

TRINITARIAN VERSUS UNITARIAN IDEAS OF VALIDITY


The “trinitarian” approach to validity was popularized and incorporated in the 1974 revision
of the APA Standards. This conceptualization of validity holds that there are three approaches
to the validation of tests. Content validation involves a demonstration that test items are a
representative sample of the behaviors to be exhibited in some performance domain. Content
validation typically depends heavily on a job analysis that specifies the tasks performed on
a job and how those tasks (or very similar tasks) are reflected in the tests used to make deci-
sions. Criterion-related validation involves the demonstration that scores on a test are related
to job performance measures. If job performance measures and test scores are obtained from
job incumbents at approximately the same time, the study involves what has become known
as concurrent criterion-related validity. If test scores are collected from job applicants and
performance measures are collected some time after these individuals are hired, the study
represents a test of predictive criterion-related validity. Thousands of criterion-related studies
have been summarized in meta-analyses over the last 30 years, and Schmidt and Hunter (1998)
have summarized these meta-analyses. Construct validation often includes a series of studies or
collections of evidence that a psychological concept or construct explains test performance. In
addition, there should be support for the notion that the construct is central to the performance
of job tasks.
This separation of approaches to validity produced numerous problems, not the least of which
was the notion that there were times when one approach was to be preferred over another or that
there were different acceptable standards by which these different aspects of validity were to be
judged. Most important, however, was the realization on the part of measurement scholars that all
were aspects of construct validity—the theoretical reasonableness of our explanations of job behav-
ior. There was a recognition that the inferences we derive from test scores are central to all valida-
tion work. Content validity and the practices usually associated with it were recognized as desirable
practices in the development of any test. Careful consideration of the “theory” and hypotheses that
underlie our conceptualization of performance and how the constructs central to job performance
are represented in our tests is always important and unifying insofar as our validation efforts are
concerned. Traditional criterion-related research represents one type of evidence that can be col-
lected to confirm/disconfirm these hypotheses. This “unitarian” approach to validity was strongly
argued in the 1985 Standards and has been incorporated most completely in the 1999 version of the
Standards (AERA, APA, & NCME, 1999).

DIFFERENT APPROACHES TO THE COLLECTION OF DATA ABOUT VALIDITY


Validity, as defined in the most recent version of the Standards (1999), is the degree to which
evidence and theory support the interpretation of test scores proposed by the test user. The user
must state explicitly what interpretations are to be derived from a set of test scores including the
nature of the construct thought to be measured. The document goes on to describe a variety of
evidence that can support such an interpretation. Test themes, wording, item format, tasks, and administrative guidelines all comprise the “content” of a test; a careful logical or empirical analysis of the relationship of this content to the construct measured, as well as expert judgments about the representativeness of the items to that construct, supports validity.
Validity evidence can also take the form of an examination of the response processes involved
in responding to an item. For example, in evaluating the capabilities of an applicant for a mechanical
job, we might ask the person to read a set of instructions on how to operate a piece of equipment
and then ask the applicant to demonstrate the use of the equipment. Because the equipment is
used on the job, it would seem valid, but suppose we also find that test scores are highly related
to examinees’ vocabulary level. We would then want to know if vocabulary is necessary to learn
how to use this equipment on the job and depending on the answer to that question, we may want
to revise the test.
Yet a third piece of evidence might be to collect data regarding the internal structure of a test.
We would examine the degree to which different items in a test (or responses to an interview) yield
correlated results and whether items designed to measure one construct can be differentiated from items written to assess a different construct. Researchers interested in these questions use item
means, standard deviations, and intercorrelations as well as exploratory and confirmatory analyses
to evaluate hypotheses about the nature of the constructs measured by a test.
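
As a brief, hedged illustration of the internal-structure evidence just described, the sketch below computes item means, standard deviations, intercorrelations, and an internal-consistency estimate (Cronbach's alpha) for a small, invented item-response matrix; a full analysis would add exploratory or confirmatory factor models estimated with dedicated software.

import numpy as np

# Invented responses: 6 examinees by 4 items intended to measure one construct
items = np.array([
    [4, 5, 4, 3],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
], dtype=float)

item_means = items.mean(axis=0)                       # item means
item_sds = items.std(axis=0, ddof=1)                  # item standard deviations
intercorrelations = np.corrcoef(items, rowvar=False)  # item intercorrelation matrix

# Cronbach's alpha as one summary of internal consistency
k = items.shape[1]
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum() / items.sum(axis=1).var(ddof=1))

print(item_means, item_sds, round(alpha, 2))
print(intercorrelations.round(2))
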
Similar to looking at the internal structure of a test, researchers can also examine its external
validity by correlating the test with measures of theoretically similar or dissimilar constructs as
well as job performance measures that are hypothesized correlates of test scores. Validity in the
personnel selection area has been almost synonymous with the examination of the relationship
between test scores and job performance measures, most often referred to as criterion-related valid-
ity. Because there is a large body of primary studies of many job performance-test relationships,
one can also examine the extent to which tests of similar constructs are related to job performance
and generalize in a way that supports the validity of a new measure or an existing measure in a new
context. These are studies of validity generalization.

Finally, and somewhat controversially among industrial-organizational (I-O) psychologists, the Standards (1999) also suggest that researchers examine the intended and unintended consequences
of test use to make decisions. This evidence is referred to as consequential validity (Messick, 1998).
The consequences of most concern are the degree to which use of test scores results in dispro-
portionate hiring of one or more subgroups (e.g., gender, race, disabled). The 1999 version of the
Standards clearly reflects a unitarian view of validation; the old tripartite notions and terms are not
mentioned.
The Society for Industrial and Organizational Psychology’s Principles (2003) described the
sources of validity evidence in a way that more closely reflects the trinitarian approach to validation in that
it includes a relatively lengthy discussion of criterion-related evidence, evidence based on content,
and evidence based on the internal structure of tests and their relationships to other variables. The
Principles, like the Standards, also recognized the central role of psychological constructs in all
validation research.
Finally, some I-O psychologists have also noted that the traditional separation of reli-
ability and validity concepts may be inadequate (Lance, Foster, Gentry, & Thoresen, 2004;
Lance, Foster, Nemeth, Gentry, & Drollinger, 2007; Murphy & DeShon, 2000). Their ideas are
addressed in Chapter 2 of this volume. It is also the case that technology affords the opportunity to make the traditional one-time criterion-related validity study an ongoing effort in which predictor and criterion data can be collected and aggregated across time and organizations.

VALIDATION IN DIFFERENT CONTEXTS


This chapter will discuss validation largely within the context of personnel selection. This is the
most common application of the various approaches to validation. It is also the most straightforward
example of how validation approaches can be applied.
There is, however, a wide range of other contexts in which the validation of measures is desirable. Organizations should, for example, ensure they are using “validated” tools and processes in their performance management systems, in their assessments of training and development outcomes, in their promotion and succession planning processes, and so forth.
Each of these circumstances is associated with its own set of challenges as the researcher designs an appropriate validation study. However, the design of a well-constructed study will by necessity follow the same logic as that discussed below for the personnel selection context. Following this logic,
the studies should be structured to include the following three elements:

1. Job analysis: The foundation of validation in employment settings always involves the
development of a clear understanding of job and organizational requirements. For exam-
ple, for promotion purposes these would be the requirements of the target job(s) into which
a person might be promoted. For training and development purposes, these would be the
meaningful outcomes in terms of on-the-job performance that are the focus of the training/
development efforts.
2. Systematic development: As measures are developed, they need to follow an architecture
that is firmly grounded in the results of the job analysis. As the development of the mea-
sures is planned and as the tools are being constructed, activities need to be focused on
ensuring that the measures are carefully targeted to address the intended constructs.
3. Independent verification: Once the measures are developed, they need to be subjected
to independent verification that they measure the intended constructs. At times, this can
involve statistical studies to determine whether the measures exhibit expected relation-
ships with other independent measures (e.g., Does the 360-degree assessment of leader-
ship behavior correlate with an independent interview panel’s judgment of a leader’s
behavior?). Often, the independent verification is derived from structured expert reviews
of the measures that are conducted prior to implementation. Regardless of the method, this
“independent verification” is a necessary aspect of verifying the validity of a measure.
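
As a minimal illustration of the statistical form of independent verification just described, one might correlate the 360-degree ratings with the independent panel's judgments; the data, sample size, and variable names below are hypothetical.

import numpy as np
from scipy import stats

# Invented scores for eight leaders
ratings_360 = np.array([3.8, 4.2, 2.9, 3.5, 4.6, 3.1, 3.9, 4.4])    # 360-degree leadership ratings
panel_ratings = np.array([3.5, 4.0, 3.2, 3.4, 4.5, 2.8, 4.1, 4.2])  # independent panel judgments

r, p = stats.pearsonr(ratings_360, panel_ratings)
print(f"convergence between the two independent measures: r = {r:.2f}, p = {p:.3f}")
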

STRONG VERSUS WEAK INFERENCES ABOUT VALIDITY


The field’s evolving conceptualization of validity has important implications for I-O researchers
concerned with designing and conducting primary validation studies. Given that validation is a
process of collecting evidence to support inferences derived from test scores (e.g., that a person
will perform effectively on a job), the confidence with which inferences are made is a function of
the strength of the evidence collected. In this way, the strength of the validity evidence refers to the
probability that the inference is correct, with “stronger” evidence connoting higher probabilities.
Consistent with the unitarian view, validation can be viewed as a form of hypothesis testing (Landy,
1986; Messick, 1975), and judgments of validity are to be based on the same host of considerations
applicable to judgments concerning the veracity with which a null hypothesis would be rejected in
any psychological research (e.g., the extent to which threats to validity are ruled out; see Cook &
Campbell, 1979). Thus, it is critical for researchers designing and conducting local validation stud-
ies to concentrate their efforts on ensuring that studies result in strong evidence for the inferences
they wish to make in much the same way that they would otherwise “defend” their conclusions in a
hypothesis testing situation.
Gathering stronger evidence of validity almost always necessitates increased effort, resources,
and/or costs (e.g., to gain larger sample sizes or expand the breadth of the criterion measures).
Thus, a key decision for researchers designing primary validation studies involves determin-
ing how to optimize the strength of the study (assurance that inferences are correct) within the
bounds of certain practical limitations and organizational realities. Situations may vary in terms
of the extent to which feasibility drives the researcher’s choice among validation strategies. In
some cases, situational limitations may be the primary determinant of the validation strategies
available to researchers. For example, for situations in which adequately powered sample sizes
cannot be achieved, validation efforts may require use of synthetic validity strategies (Scherbaum,
2005), transporting validity evidence from another context that is judged to be sufficiently simi-
lar (Gibson & Caplinger, 2007), generalizing validity across jobs or job families on the basis of
meta-analytic findings (McDaniel, 2007; Rothstein, 1992), or relying on evidence and judgments
that the content of the selection procedures is sufficiently similar to job tasks to support their use
in decision-making. Other factors noted by the Principles that may serve to limit the feasibility
of certain validation strategies include unavailability of criterion data, inaccessibility to subject
matter experts (SMEs) as might be the case when consulting SMEs would compromise test secu-
rity, dynamic working conditions such that the target job is changing or does not yet exist, and
time and/or money. Clearly then, validation strategy needs to account for feasibility-driven con-
siderations and researchers’ judgment about the strength of evidence required. Further, because
these demands are often competing, researchers are frequently required to make the best of a less
than optimal situation.
Given the need to balance several competing demands (e.g., issues of feasibility limiting the
approach that can be taken vs. upholding high standards of professionalism and providing strong
evidence to support key inferences), it is essential that researchers understand the various factors
that have potentially important implications for the strength of evidence that is required in a given
validation scenario. In other words, part of the decision process, with regard to planning and imple-
menting validation strategy, is a consideration of how strong the evidence in support of key infer-
ences ought to be. The basic assumption here is that different situations warrant different strategies
along several dimensions (Sussman & Robertson, 1986), one of which has to do with the strength of
evidence needed in support of inferences. Consideration of how certain situational factors inform
the adequacy of a validation effort does not imply that the validation researcher adopt a minimalist
approach. Rather, all validation studies and selection practices should aspire to the ethical and
professional guidelines offered in the Principles, which means using sound methods rooted in sci-
entific evidence and exhibiting high standards of quality. However, the Principles’ guidelines are
not formulaic to the exclusion of professional judgment, nor are their applications invariant across
circumstances. In the following paragraphs, several factors are identified that have potential impli-
cations for the strength of the evidence needed by a local validation effort. In general, these factors
represent situations in which conclusions regarding the validity of a selection practice need to be
made with a higher degree of confidence than usual. In turn, these situations tend to warrant intensi-
fied research strategies.

SITUATIONAL FACTORS INFLUENCING THE STRENGTH OF EVIDENCE NEEDED


Although it is beyond the scope of this chapter to describe in full detail the legal issues surrounding
validation research and selection practice (see Chapters 29 and 30, this volume, for further discus-
sions of legal issues), it would be difficult if not impossible to describe applied validation strategy
without underscoring the influence of litigation or the prospect of litigation. It is becoming almost
cliché to state that, in circumstances in which there is a relatively high probability of litigation
regarding selection practices, validation evidence is likely to function as a central part of defending
selection practices in the courtroom. Indeed, much validation research is stimulated by litigation,
whether post facto or in anticipation of lawsuits. Within this context, researchers make judgments
regarding the potential for litigation and adapt their validation strategies accordingly. Numerous
contextual factors contribute to the probability that litigation will occur. A primary example has to
do with the type of selection instrument being validated and the potential for adverse impact, or the
disproportionate rejection of identifiable subgroups. Tests that have historically resulted in adverse
impact, such as tests of cognitive ability (Schmitt, Rogers, Chan, Sheppard, & Jennings, 1997)
or physical ability (Arvey, Nutting, & Landon, 1992; Hogan & Quigley, 1986) tend to promote
more litigation, and researchers validating these instruments in a local context should anticipate
this possibility. Similarly, selection instruments with low face validity (i.e., the test’s job relevance
is not easily discerned by test-takers) are more likely to engender negative applicant reactions
(Shotland, Alliger, & Sales, 1998), and decisions based on such tests may lead to applicant percep-
tions of unfairness (Cropanzano & Wright, 2003). In their recent review of the antecedents and
consequences of employment discrimination, Goldman, Gutek, Stein, and Lewis (2006) identified
employee perceptions of organizational and procedural justice as important antecedents of dis-
crimination lawsuits. In addition to considering characteristics of the selection instrument(s) being
validated, lawsuits over selection practice are more frequent in some industries (e.g., government) and job types (Terpstra & Kethley, 2002).
Researchers should also consider the implications and relative seriousness of selection systems
resulting in hiring decisions that are false-positive or false-negative errors. A false-positive error is made by selecting an unqualified individual whose performance on the job will be low, whereas a false-
negative error is made by rejecting a qualified individual whose performance on the job would have
been high. Making an error of either type can be potentially costly to the organization. However, the
relative impact of such errors can differ by occupation type and organizational context. For exam-
ple, the negative impact of a false positive in high-risk occupations (e.g., nuclear power plant opera-
tor or air-traffic controller) or high visibility occupations (e.g., Director of the Federal Emergency
Management Agency [FEMA]) can be catastrophic, threaten the organization’s existence, and so
on (Picano, Williams, & Roland, 2006). Alternatively, for occupations that are associated with less
risk, such that failure on the job does not have catastrophic consequences for the organization or
larger society, or when organizations use probationary programs or other trial periods, the cost of
false-positive errors may be relatively low. Although validation efforts in both situations would be
concerned with selection errors and demonstrating that use of tests can reduce the occurrence and
negative consequences of such errors, clearly there are some situations in which this would be more
of a central focus of the validation effort. It is our contention that validating selection systems for high-risk occupations is a special circumstance warranting particularly “watertight” validation
strategies in which strong evidence should be sought to support the inferences made. In these cir-
cumstances, a test with low validity (e.g., less than r = .10) might be used to make hiring decisions
if that relationship is found to be statistically significant.
In some circumstances, the cost of false negatives is more salient. For example, strong evidence
of a test’s validity may be warranted when an organization needs to fill a position or several posi-
tions, but applicants’ test scores are below some acceptable standard, indicating that they are not fit
to hire (i.e., predicted on-the-job performance is low or very low). In this case, the organization’s
decision to reject an applicant on the basis of their test scores would leave a position or several
positions within the organization vacant, a costly mistake in the event that false-negative errors
are present. Demonstrable evidence to support the test’s validity would be needed to justify such a
decision, and in essence, convince the organization that they are better off with a vacant position
than putting the wrong person in the job. In these instances, one might want evidence of a larger
test-criterion relationship (perhaps greater than r = .30) to warrant use of this test and the possible
rejection of competent applicants.
The possibility of false negatives becomes a special concern when members of some subgroup(s)
are selected less frequently than members of another subgroup. When unequal ratios of various
subgroups are selected, the organization must be prepared to show that false negatives are not
primarily of one group as opposed to another. When this is impossible, the legal and social costs
can be very high. Concern about these costs is another reason to be concerned about the false
negatives aside from the concerns associated with possible vacant positions.
Personnel psychologists have long been aware of the fact that the utility of selection systems increases as a function of selectivity, such that selection instruments even modestly related to impor-
tant outcomes can have large payoffs when there are many applicants from which only a few are to
be selected (Brogden, 1951, 1959). On the other hand, as selection ratios become extremely liberal,
such that nearly all applicants are accepted, even selection instruments highly related to perfor-
mance have less positive implications for utility. From a purely utilitarian perspective, it would
seem logical that demonstrating test validity is less of an impetus when selection ratios are liberal
(because even the best tests will have little effect) and more of an impetus when selection ratios are
low. The consequences of large-scale applicant rejections would also seem to justify more rigorous
validation methods from societal and legal perspectives. However, these extremes likely occur less
frequently in practice, as indicated by Schmidt and Hunter (1998), who cite typical selection ratios
that are moderate, ranging on average from .30 to .70.
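
To make the utility logic above concrete, the following sketch applies Brogden's classic result that, under strict top-down selection with a normally distributed predictor and a linear predictor-performance relationship, the expected standardized performance of those hired equals the validity coefficient multiplied by the mean predictor z-score of the selected group. The validity value and selection ratios shown are illustrative only.

from scipy.stats import norm

def expected_criterion_gain(validity, selection_ratio):
    """Expected mean criterion z-score of hires under strict top-down selection
    (Brogden: gain = validity x mean predictor z of the selected group)."""
    z_cut = norm.ppf(1 - selection_ratio)                # predictor cutoff for this selection ratio
    mean_z_selected = norm.pdf(z_cut) / selection_ratio  # mean predictor z among those selected
    return validity * mean_z_selected

validity = 0.30  # illustrative validity coefficient
for sr in (0.05, 0.30, 0.70, 0.95):
    print(f"selection ratio {sr:.2f}: expected gain = {expected_criterion_gain(validity, sr):.2f} SD")

As the selection ratio approaches 1.0, the expected gain shrinks toward zero even for a valid test, which is the point made above about liberal selection ratios.
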
In licensing examinations, this utility perspective takes a different form because the major pur-
pose of these examinations is to protect the public from “injuries” related to incompetent practice.
In this case, the license-no license decision point using test scores is usually set at a point that is
judged to indicate “minimal competence.” Depending on the service provided (e.g., hairdresser vs.
surgeon), the cost of inappropriately licensing a person could be very different. On the other hand,
certification examinations are usually oriented toward the identification of some special expertise in
an area (e.g., pediatric dentistry or forensic photography), hence a decision as to a score that would
warrant certification might result in the rejection of larger numbers or proportions of examinees.
The cost-benefit balance in this situation (assuming all are minimally competent) might accrue
mostly to the individual receiving the certification in the form of greater earning power.
Another factor that can affect the extent of the local validation effort that is required is the avail-
ability of existing validation research. The Principles describes three related validation strategies
that can be used as alternatives to conducting traditional local validation studies or to support the
conclusions drawn at the primary study level. First, “transportability” of validity evidence involves
applying validity evidence from one selection scenario to another, on the basis that the two con-
texts are judged to be sufficiently similar. Specifically, the Principles note that researchers should
be concerned with assessing similarity in terms of job characteristics [e.g., the knowledge, skills,
and abilities (or KSAs) needed to perform the job in each context], job tasks and content, applicant
pool characteristics, or other factors that would limit generalizability across the two contexts (e.g.,
cultural differences). Assessing similarity in this manner usually requires that researchers conduct
a job analysis or rely on existing job analysis materials combined with their own professional exper-
tise and sound judgment.
Second, synthetic validity is a process in which validity for a test battery is “synthesized” from
evidence of multiple predictor-job component relationships (Peterson, Wise, Arabian, & Hoffman,
2001; Scherbaum, 2005). Job analysis is used to understand the various components that comprise
a particular job, and then predictor-job-component relationships are collected for all available jobs
with shared components. Because evidence can be drawn from other jobs besides the focal job, syn-
thetic validity may be a particularly useful strategy for organizations that have too few incumbents
performing the focal job to reach adequate sample sizes for a traditional criterion-related study
(Scherbaum, 2005).
Third, validity generalization involves using meta-analytic findings to support the conclusion
that predictor-criterion validity evidence can be generalized across situations. Like transportability
strategies, meta-analytic findings provide researchers with outside evidence to support inferences
in a local context. The argument for validity generalization on the basis of meta-analyses is that
some selection tests, such as cognitive ability tests (Ones, Viswesvaran, & Dilchert, 2005), are valid
across selection contexts. Thus, the implication is that with validity generalization strategies, unlike
transportability, in-depth job analyses or qualitative studies of the local organizational context are
unnecessary. In support of this assertion, the program of research initiated by Schmidt and Hunter
and colleagues (for review, see Schmidt & Hunter, 2003) has argued that between-study variability
in validity coefficients can be largely attributed to statistical artifacts, such as range restriction,
unreliability, or sampling error. However, caution is warranted to the extent that meta-analyses have
identified substantive moderators, or in the presence of strong theory indicating that some variable
may moderate the magnitude of validity. Further, with regard to generalization across contexts,
inferences drawn from meta-analytic findings are limited to the contexts of those studies included
in the meta-analysis. At minimum, meta-analytic findings should be referenced in test development
and can be used to supplement evidence at the local level, either via theoretical or statistical means
(Newman, Jacobs, & Bartram, 2007). The argument for more direct use of validity generalization
strategies is dependent on the strength of the meta-analytic findings.
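
As a simplified sketch of the meta-analytic reasoning underlying validity generalization, the following "bare-bones" calculation (in the spirit of Schmidt and Hunter's procedures) computes a sample-size-weighted mean validity and estimates how much of the observed between-study variance could be explained by sampling error alone. The study results are invented, and a full psychometric meta-analysis would also correct for unreliability and range restriction.

import numpy as np

# Invented primary-study results: observed validity coefficients and sample sizes
r = np.array([0.22, 0.31, 0.18, 0.27, 0.35])
n = np.array([120, 85, 210, 60, 150])

r_bar = np.sum(n * r) / np.sum(n)                          # sample-size-weighted mean validity
var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)         # weighted observed variance of validities
var_sampling = ((1 - r_bar ** 2) ** 2) / (np.mean(n) - 1)  # expected sampling-error variance
var_residual = max(var_obs - var_sampling, 0.0)            # variance remaining after sampling error

print(f"mean observed validity = {r_bar:.3f}")
print(f"residual SD after removing sampling error = {var_residual ** 0.5:.3f}")
print(f"percent of observed variance attributable to sampling error = {100 * var_sampling / var_obs:.0f}%")
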

DESIGN CONSIDERATIONS WHEN STRONG EVIDENCE IS NEEDED


On the basis of the preceding discussion, situational factors can affect the feasibility and appropri-
ateness of the validation models applied to a given selection context. Moreover, researchers should
be particularly attuned to contextual variables that warrant an increased concern for demonstrating
the strength of evidence collected and high levels of confidence in the inferences to be made. The
validity strategies used reflect consideration of these contextual factors and others. For instance, in
apparent response to increasing litigation, Boehm (1982) found longitudinal trends suggesting that
published validation studies were using larger sample sizes, relying on supervisory rating criteria
less frequently, and utilizing more predictive (as opposed to concurrent) criterion-related designs.
Similarly, the discussion that follows is focused on identifying a handful of actionable validation
strategies to be considered by researchers when particularly strong evidence is needed.

Importance of the Nomological Net


Binning and Barrett (1989) offered a thorough conceptualization of the nomological network
implicit in validity models. Their model identifies multiple inferential pathways interrelating
psychological constructs and their respective operational measures. Inferential pathways in the
model are empirically testable using observed variables (e.g., linkages between operationalized
measures of constructs and linkages between constructs and their operationalized measures).
Others may be theoretically or rationally justified (e.g., construct to construct linkages) or tested
using latent variable models, although these applications are relatively rare in personnel selec-
tion research (see Campbell, McHenry, & Wise, 1990, for an attempt to model job performance).
Consistent with the unitarian conceptualization of validity, all validity efforts in a selection con-
text are ultimately concerned with demonstrating that test scores predict future job performance,
and each of the various inferential pathways represents sources or types of evidence to support
this common inference. Binning and Barrett (1989, p. 482) described how “truncated” validation
strategies often concentrate exclusively on demonstrating evidence for a single inferential path-
way and as a result provide only partial support for conclusions regarding test validity. A more
cogent argument for validity is built upon demonstration of strong evidence for several inferential
pathways within the nomological network. For example, in addition to demonstrating a statisti-
cal relationship between operational measures from the predictor and performance domain, as
is commonly the main objective in criterion-related validity studies, researchers should provide
evidence of the psychological constructs underlying job performance (as well as the predictor
measures) and demonstrate that the criterion measure adequately samples constructs from the
performance domain.

Criterion Concerns
In practice, criterion-related validity studies are often criticized for failing to adequately address
validity issues surrounding the criterion measure(s) used. The relative lack of scientific scrutiny
focused on criteria, termed the “criterion problem” (Austin & Villanova, 1992), has been a topic
of discussion among personnel psychologists for years (Dunnette, 1963; Fiske, 1951; Guion, 1961).
Universal to these discussions is the call for more rigorous validation evidence with respect to
the criteria that are used. Binning and Barrett (1989) outlined this task, underscoring two inter-
related goals for the validation researcher. First, they suggested that the selection of criteria should
be rooted in job analysis to the same extent that the selection of predictors traditionally is (i.e.,
more attention to rigorous “criterion development”). Other considerations relevant to the task of
criterion development and validation include the use of “hard” or objective criteria versus more
proximal behaviors that lead to these outcomes (Thayer, 1992), use of multiple relevant crite-
ria as opposed to a single overall criterion (Dunnette, 1963), and the possibility that criteria are
dynamic (Barrett, Caldwell, & Alexander, 1985). Second, researchers should be concerned with
demonstrating evidence of construct-related validity for the criterion. Investigators must specify
the latent dimensions that underlie the content of their criterion measures. This involves expansion
of the nomological network to include inferences that link the criterion measure(s) to constructs in
the performance domain (e.g., by demonstrating that criterion measures are neither contaminated
nor deficient with respect to their coverage of the intended constructs in the performance domain)
and link constructs in the performance domain to job demands that require specific ability or moti-
vational constructs (e.g., by demonstrating through job analysis that constructs in the performance
domain are organizationally meaningful). Campbell and his colleagues (e.g., Campbell, McCloy,
Oppler, & Sager, 1993) have repeatedly emphasized this focus on the nature of criteria or per-
formance constructs. These authors make the somewhat obvious, although often overlooked, point
that performance should be defined as behavior (“what people actually do and can be observed”);
the products of one’s behavior, or what are often called “hard criteria,” are only indirectly the
result of one’s behavior and other factors that are not attributable to an individual job incumbent.
Further, we may consider relatively short term or proximal criteria or distal criteria, such as the
impact of one’s career on some field of interest. Any specification of a performance or criterion
domain must also consider the impact of time (Ackerman, 1989; Henry & Hulin, 1989). In any
study of performance, these various factors must be carefully considered when one decides on the
nature of the performance constructs and actual operationalizations of the underlying constructs
and how those measures might or might not be related to measures of other constructs in the
domain of interest.

Multiple Inferences in Validation Research


Gathering evidence to support multiple inferences within a theoretically specified nomological net-
work resembles a pattern-matching approach. The advantage of pattern-matching research strate-
gies is that stronger support for a theory can be gained when complex patterns of observed results
match those that are theoretically expected (Davis, 1989). Logically, it would be less likely that a
complex pattern of results would be observed simply because of chance. In addition, when experi-
mental control of potentially confounding variables is not possible, pattern matching can be used
to preempt alternative explanations for the observed relationships (i.e., threats to validity; Cook &
Campbell, 1979).
Campbell and Fiske (1959) described a specific application of pattern matching that can be used
to support inferences regarding the appropriateness with which a construct is operationalized in
measurement. A slight variation of the convergence/divergence idea is included in Binning and
Barrett’s (1989) elaborated nomological network model. The elaborated model includes inferences
regarding the proximal versus distal nature of relationships between constructs. A sound argument
for validity can be made on the basis of results that indicate a reliable pattern, in which strong sta-
tistical relationships are obtained for constructs that are theoretically proximal to one another and
weaker statistical relationships are obtained for constructs that are theoretically more distal. This, of
course, is the rationale underlying mediation (Baron & Kenny, 1986) and the idea of causal chains.
Even without testing mediation directly, a strong argument can be made by demonstrating that a
predictor correlates highly with proximal operational measures of the criterion construct, and that
this relationship is attenuated as criterion measures become more distally related to the construct
space. For example, Thayer (1992) noted that in many cases objective outcome criteria, although
important at the organizational level, are distal indicators of the job performance construct because
they are contaminated by factors outside of the employee's control. Low reliability was also found for the theoretically removed, “hard” criterion measures discussed by Thayer. In contrast, criteria
that assess employee behavior directly as defined above are more proximal to the job performance
construct. To the extent that these proximal criteria can be identified and measured, stronger support
for validity is gained in the event that test scores correlate more highly with proximal as compared
to distal criteria.
A more extensive form of pattern matching involves the use of multiple studies, or research pro-
grams, to corroborate evidence of validity. Again, the logic is straightforward; stronger evidence
is gained when a constellation of findings all lead to the same conclusion. Sussman and Robertson
(1986) suggested that programs of research could be undertaken, “composed of multiple stud-
ies each utilizing a different design and aimed at collecting different types of evidence” (p. 467).
Extending the rationale of the multitrait-multimethod (MTMM) approach (Campbell & Fiske, 1959), con-
vergent evidence across studies may indeed be stronger if gained through different research designs
and methods. Landy’s (1986) assertion that test validation is a form of hypothesis testing, and that
judgments of validity are to be based on a “preponderance of evidence” (p. 1191; Guion, as cited in
Landy, 1986), provides the context for consideration of research strategies such as quasi-experimen-
tal designs (Cook & Campbell, 1979) and program evaluation research (Strickland, 1979). Binning
and Barrett (1989) presented a similar rationale by calling for “experimenting organizations”
(p. 490) in which local validation research is treated as an ongoing and iterative process. Published
research on use of these research methods in a selection-validation context remains sparse to date.

SPECIAL CONSIDERATIONS IN CRITERION-RELATED STUDIES


In the event that a criterion-related strategy is part of the validation effort undertaken, a special set
of considerations is relevant. Power analysis is a useful framework for interrelating the concepts
of statistical significance, effect size, sample size, and reliability (Cohen, 1988) and has design
and evaluation implications for the statistical relationships sought in criterion-related studies. For
instance, the sample size needed to demonstrate a statistically significant predictor-criterion rela-
tionship decreases as the magnitude of the relationship that exists between predictor and criterion
(i.e., effect size) increases. Sussman and Robertson (1986), in their assessment of various predic-
tive and concurrent validation designs, found that those strategies that allowed larger sample sizes
gained a trivial increment in power. This suggests that, as long as sample sizes can support the
use of a criterion-related design, further attention toward increasing N may not reap large benefits.
Other factors affecting power include the interrelatedness and number of predictors used, such that
the addition of nonredundant predictors increases power (Cascio, Valenzi, & Silbey, 1978). The
reliability of the predictors and criteria and the decision criteria used for inferring that a relation-
ship is nonzero also impact power.
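
As a rough illustration of these power considerations, the following sketch uses the common Fisher-z approximation to estimate the sample size needed to detect a population correlation of a given size with conventional alpha and power levels; the target effect sizes are arbitrary.

import math
from scipy.stats import norm

def n_for_correlation(rho, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a two-tailed correlation of size rho."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    fisher_z = math.atanh(rho)  # Fisher z transformation of the target effect size
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

for rho in (0.10, 0.20, 0.30, 0.50):
    print(f"rho = {rho:.2f}: approximate N = {n_for_correlation(rho)}")
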
By incorporating power analysis in validation design, researchers can increase the likelihood
that relationships relevant to key inferences will be tested with sufficient sample size upon which to
have confidence in the results. However, from a scientific standpoint, the importance of demonstrat-
ing that predictor-criterion relationships are statistically significant may be overstated, given that
relationships, which may not be practically meaningful, can reach statistical significance with large
enough sample sizes. For instance, a statistically significant relationship, in which a test accounts for
less than 5% of the variance in job performance, is not unequivocal support for the test’s use. This
is especially evident when there is reason to suggest that other available tests could do a better job
predicting performance. Nonetheless, we agree with Cortina and Dunlap’s (1997) overall contention
that statistical significance testing remains a useful tool to researchers when decision criteria are
needed. For this reason, strong evidence in support of the predictor-criterion relationship should be
derived based on the size and significance of the validity coefficients.
Operationally, there are several other important considerations in criterion-related research (e.g.,
job analyses that support the relevance of predictor and criterion constructs and the quality of the
measures of each set of constructs). However, those concerns are addressed repeatedly in textbooks
(e.g., Guion, 1998; Ployhart, Schneider, & Schmitt, 2006). In the next section of this chapter, we
address a very important concern that is rarely discussed.

CONCERNS ABOUT THE QUALITY OF THE DATA: CLEANING THE DATA


Once data have been collected, quality control techniques should be applied to ensure that the data
are clean before proceeding to statistical analysis. Some basic quality control techniques include
double-checking data for entry errors, spot checks for discrepancies between the electronic data
and original data forms, inspecting data for out-of-range values and statistical outliers, and visual
examination of the data using graphical interfaces (e.g., scatter plots, histograms, stem-and-leaf
plots). Special concern is warranted in scenarios with multiple persons accessing and entering data
or when data sets from multiple researchers are to be merged. Although these recommendations may appear trite, they are often overlooked, and the consequences of erroneous data for the results of analyses and their interpretations can be profound.
A study by Maier (1988) illustrated, in stepwise fashion, the effects of data cleaning procedures
on validity coefficients. Three stages of data cleaning were conducted, and the effects on correla-
tions between the Armed Services Vocational Aptitude Battery (ASVAB) and subsequent perfor-
mance on a work sample test for two military jobs (radio repairers and automotive mechanics) were
observed. Selection was based on the experimental instrument (the ASVAB) and the work sample
criterion tests were administered to incumbents in both occupations after some time had passed. In
Phase 1 of the data cleaning process, the sample was made more homogenous for the radio repairers
group by removing the data of some employees who received different or incomplete training before
criterion data collection. In comparison to the total sample, the validity coefficient for the remain-
ing, more representative group that had received complete training before criterion collection was
decreased (from .28 to .09). The initial estimate had been inflated because of the partially trained
group having scored low on the predictor and criterion.

In Phase 2, scores on the criterion measure (i.e., ratings from a single rater on a work sample)
were standardized across raters. There were significant differences between raters that were attrib-
uted to different rating standards and not to group differences in ratees, such as experience, rank,
or supervisor performance ratings. The raters were noncommissioned officers and did not receive
extensive training in the rating task, so that differences between raters in judgmental standards
were not unexpected. As a result, the validity coefficients for both jobs increased (radio repairers,
from .09 to .18; automotive mechanics, from .17 to .24). In Phase 3, validity coefficients were cor-
rected for range restriction, which again resulted in an increase in the observed validity coefficients
(radio repairers, from .18 to .49; automotive mechanics, from .24 to .37). Maier noted that the final
validity coefficients were within the expected range on the basis of previous studies.
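
The statistical steps in Phases 2 and 3 can be sketched as follows. The data are invented, and the range restriction correction shown is the standard Thorndike Case II formula for direct restriction rather than necessarily the exact procedure Maier used.

import numpy as np
import pandas as pd

# Phase 2 analogue: standardize work-sample ratings within rater (data are invented)
ratings = pd.DataFrame({
    "rater":  ["A", "A", "A", "B", "B", "B"],
    "rating": [78, 85, 90, 55, 60, 67],
})
ratings["rating_z"] = ratings.groupby("rater")["rating"].transform(
    lambda x: (x - x.mean()) / x.std(ddof=1)
)

# Phase 3 analogue: correct an observed validity for direct range restriction
# (Thorndike Case II; the input values are illustrative)
def correct_direct_range_restriction(r_obs, sd_restricted, sd_unrestricted):
    u = sd_unrestricted / sd_restricted
    return (r_obs * u) / np.sqrt(1 + (u ** 2 - 1) * r_obs ** 2)

print(ratings)
print(round(correct_direct_range_restriction(0.18, sd_restricted=6.0, sd_unrestricted=9.0), 2))
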
The Maier (1988) study is illustrative of the large effect that data cleaning can have for attaining
more accurate estimates of validity coefficients in a predictive design scenario. Several caveats are
also evident, so that researchers can ensure that data cleaning procedures conducted on sound pro-
fessional judgment are not perceived as data “fudging.” First, the cleaning procedures need to have
a theoretical or rational basis. Researchers should document any decision criteria used and the sub-
stantive changes that are made. For example, researchers should record methods used for detecting
and dealing with outliers. In addition, a strong case should be built in support of any decisions made.
The researcher bears the burden of defending each alteration made to the data. For example, in the
Maier study, the decision to standardize criterion data across raters (because raters were relatively
untrained and used different rating standards) was supported by empirical evidence that ruled out
several alternative explanations for the mean differences observed between raters.

MODES OF DECISION-MAKING AND THE IMPACT ON UTILITY AND ADVERSE IMPACT

If we have good quality data, it still matters how we use those data in making decisions as to
whether or not use of the test produces aggregated performance improvements. In this section, we
will discuss the impact of various modes of decision-making on two outcomes that are of concern
in most organizations: Overall performance improvement or utility and adverse impact on some
protected group defined as unequal proportions of selection across subgroups. Advancing both out-
comes is often in conflict, especially when one uses cognitive ability tests to evaluate the ability of
members of different racial groups or physical ability when evaluating male and female applicants
for a position. Measures of some other constructs (e.g., mechanical ability) produce gender or race
effects, but the subgroup differences that are largest and affect the most people are those associated
with cognitive and physical ability constructs.

TOP-DOWN SELECTION USING TEST SCORES


If a test has a demonstrable relationship to performance on a job, it is the case that the optimal util-
ity in terms of expected employee performance will occur when the organization selects the top
scoring persons on the test to fill its positions (Brown & Ghiselli, 1953). Expected performance is
a direct linear function of the test score-performance relationship in the situation in which the top
scoring individuals are selected. However, use of tests in this fashion when it is possible will mean
that lower scoring subgroups will be less likely to be selected (Murphy, 1986). This conflict between
maximization of expected organizational productivity and adverse impact is well known and has
been quantified for different levels of subgroup mean differences in ability and selection ratios
(Sackett, Schmitt, Ellingson, & Kabin, 2001; Sackett & Wilk, 1994; Schmidt, Mack, & Hunter,
1984). For social, legal, and political reasons as well as long-term organizational viability in some
contexts, the adverse impact of a strict top-down strategy of test use often cannot be tolerated. For
these reasons as well as others, researchers and practitioners have often experimented with and used
other ways of using test scores.
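
The utility-adverse impact trade-off described above can be quantified under standard normal assumptions. In the sketch below, the subgroup mean difference (d = 1.0, in the neighborhood of values often reported for cognitive ability tests), the overall selection ratio, and the applicant pool composition are all illustrative values, not estimates for any particular setting.

from scipy.stats import norm
from scipy.optimize import brentq

d = 1.0               # standardized majority-minority mean difference on the test (illustrative)
overall_sr = 0.20     # overall proportion of applicants to be hired
prop_minority = 0.25  # minority share of the applicant pool

# Under top-down selection, find the single test cutoff (in majority z-score units)
# that yields the overall selection ratio; majority ~ N(0, 1), minority ~ N(-d, 1).
cutoff = brentq(
    lambda c: (1 - prop_minority) * norm.sf(c) + prop_minority * norm.sf(c + d) - overall_sr,
    -4.0, 4.0,
)

sr_majority = norm.sf(cutoff)      # majority selection rate
sr_minority = norm.sf(cutoff + d)  # minority selection rate
print(f"cutoff z = {cutoff:.2f}")
print(f"majority rate = {sr_majority:.3f}, minority rate = {sr_minority:.3f}, "
      f"adverse impact ratio = {sr_minority / sr_majority:.2f}")
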

BANDING AND CUT SCORES


One method of reducing the consequences of subgroup differences in test scores and top-down
selection is to form bands of test scores that are not considered meaningfully different, usually using a statistical criterion known as the standard error of the difference, which is based on the reliability of the test.
Most of us are familiar with a form of banding commonly used in academic situations. Scores on
tests are usually grouped into grades (e.g., A, B, C, etc.) that are reported without specific test score
information. So persons with scores of 99 and 93 might both receive an A in a course just as two
with scores of 88 and 85 would receive a B. The theory in employment selection use of banding is
that the unreliability inherent in most tests makes the people within a band indistinguishable from
each other just as occurs when grades are assigned to students.
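
A minimal sketch of how such a band might be computed from a test's reliability follows; the test statistics, the 1.96 multiplier, and the fixed (as opposed to sliding) band are all assumptions chosen for illustration.

import math

sd_test = 10.0      # test standard deviation (hypothetical)
reliability = 0.85  # reliability estimate (hypothetical)
top_score = 96      # highest score obtained
multiplier = 1.96   # multiplier for an approximately 95% band (a judgment call)

sem = sd_test * math.sqrt(1 - reliability)  # standard error of measurement
sed = sem * math.sqrt(2)                    # standard error of the difference between two scores
band_floor = top_score - multiplier * sed

print(f"SEM = {sem:.2f}, SED = {sed:.2f}")
print(f"scores from {band_floor:.1f} to {top_score} fall within the top band")
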
Because minorities tend to score lower on cognitive ability tests, creating these bands of indis-
tinguishable scores helps increase the chances that minority applicants will fall in a top band and be
hired. There are two ways in which banding can increase minority hiring. One is to make the bands
very wide so that a greater number of minority test scorers will be included in the top bands. Of
course, a cynic may correctly point out that a test of zero reliability will include everyone in the top
band and that this approach supports the use of tests with low reliability. A second way in which to
impact the selection of minority individuals is the manner in which individuals are chosen within a
band. The best way to increase the selection of minority individuals is to choose these persons first
within each band before proceeding to consider other individuals in the band, but this has proven
difficult to legally justify in U.S. courts (Campion et al., 2001). Other approaches to selection within
a band include random selection or selection on secondary criteria unrelated to subgroup status, but
these procedures typically do not affect minority hiring rates in practically significant ways (Sackett
& Roth, 1991). A discussion of various issues and debates regarding the appropriateness of banding
is contained in an edited volume by Aguinis (2004).
An extreme departure from top-down selection occurs when an organization sets a minimum
cutoff test score such that individuals above some score are selected whereas those below that score
are rejected. In essence, there are two bands of test scores—those judged to represent a passable
level of competence and those representing a failing level of performance. Perhaps the most com-
mon use of cutoff scores is in licensing and credentialing, in which the effort is usually to identify
a level of expertise and knowledge of the practice of a profession below which a licensure candidate
is likely to bring harm to clients. In organizational settings a cutoff is often required when selection
of personnel is done sequentially over time rather than from among a large number of candidates
at a single point in time. In this case, hire-reject decisions are made about individuals and a pass
score is essential.
Use of a single cutoff score will certainly reduce the potential utility inherent in a valid test
because it ignores the individual differences in ability above the test score cutoff. There exists a
great deal of evidence (e.g., Coward & Sackett, 1990) that test score-job performance relationships
are linear throughout the range of test scores. However, using a minimum cut score on a cognitive ability test (on which we usually see the largest minority-majority differences) and then selecting above that cutoff randomly, or on the basis of some other valid procedure that does not display subgroup differences, may greatly reduce the adverse impact that usually occurs with top-down selection using a cognitive ability test.
Perhaps the biggest problem with the use of cutoff scores is deriving a justifiable cut score.
Setting a cutoff is always judgmental. Livingston (1980) and Cascio, Alexander, and Barrett (1988)
among others have usually specified the following as important considerations in setting cutoffs:
the qualifications of the experts who set the cutoff, the purpose for which the test is being used, and
the consideration of the various types of decision errors that can be made (i.e., denying a qualified
person and accepting an unqualified individual). One frequently used approach is the so-called
Angoff method, in which a representative sample of experts examines each test item and deter-
mines the probability that a minimally competent person (the definition and experts’ understanding
of minimally competent is critical) would answer the question correctly. These probabilities are averaged across experts and summed across items. The result is the cutoff score. A second approach to the
setting of cut scores is to set them by reference to some acceptable level of performance on a crite-
rion variable. In this case, one could end up saying that an individual with a score of 75 on some test
has a 10% (or any percent) chance of achieving success on some job. However, this “benchmark-
ing” of scores against criteria does not resolve the problem because someone will be asked to make
sometimes equally difficult decisions about what constitutes acceptable performance. Cizek (2001)
provided a comprehensive treatment of methods of setting performance standards.
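
A minimal sketch of the Angoff computation described above is given below, using an invented matrix of expert judgments in which each entry is one expert's estimate of the probability that a minimally competent person would answer that item correctly.

import numpy as np

# Rows = experts, columns = items; each probability judgment is invented
judgments = np.array([
    [0.80, 0.60, 0.90, 0.50, 0.70],   # expert 1
    [0.75, 0.65, 0.85, 0.55, 0.65],   # expert 2
    [0.85, 0.55, 0.95, 0.45, 0.75],   # expert 3
])

# Average the judgments across experts for each item, then sum across items
# (equivalently, sum each expert's row and average the row totals).
item_means = judgments.mean(axis=0)
cut_score = item_means.sum()
print(f"recommended cut score = {cut_score:.1f} of {judgments.shape[1]} items")
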
The use of score cutoffs to establish minimum qualifications or competency is common in licens-
ing exams. Credentialing exams may require evidence of a higher level of skill or performance
capability in some domain, but they too usually require only a “pass-fail” decision. Validation of
these cutoffs almost always relies solely on the judgments of experts in the performance area of
interest. In these cases, careful explication of the behaviors required to perform a set of tasks and
the level of “acceptable” performance is essential and likely the only possible form of validation.

USING PROFILES OF SCORES


Another possibility when scores on multiple measures of different constructs are available is that a
profile of measured abilities is constructed and that this profile is matched to a profile of the abilities
thought to be required in a job. In this instance, we might measure and quantify the type of job expe-
riences possessed by a job candidate along with their scores on various personality tests, and their
oral communications and social skills as measured in an interview and scores on ability tests. If this
profile of scores matches that required in the job, the person would be selected. This contrasts with
the traditional approach described in textbooks in which the person’s scores on these tests would be
linearly related to performance and combined using a regression model so that each score was opti-
mally linearly related to job performance. In using profiles, one is interested in patterns of scores
rather than an optimally weighted composite. Use of profiles of scores presents various complex
measurement and statistical problems of which the user should be aware (Edwards, 2002). Instances
in which selection decisions are made in this fashion include individual assessments (Jeanneret &
Silzer, 1998), which involve the use of multiple techniques using multiple methods of assessment
and a clinical judgment by the assessor that a person is qualified for some position (Ryan & Sackett,
1987; 1992; 1998). Another venue in which profiles of test scores are considered is in assessment
centers in which candidates for positions (usually managerial) are evaluated in various exercises on
different constructs and assessors make overall judgments that are then used in decision-making.
Overall judgments based on these procedures have shown criterion-related validity [see Ryan &
Sackett (1998) for a summary of data relevant to individual assessment and Gaugler, Rosenthal,
Thornton, & Bentson (1987) or Arthur, Day, McNelly, & Edens (2003) on assessment center valid-
ity], but we are aware of no evidence that validates a profile or configural use of scores.
Perhaps the best description of the research results on the use of profiles to make high-stakes
decisions is that we know very little. The following would be some of the issues that should
receive research attention: (a) Is a profile of scores actually used, implicitly or explicitly, in com-
bining information about job applicants and what is it? (b) What is the validity of such use and
its incremental validity over the use of individual components of the profile or linear composites
of the scores in the profile? and (c) What is the adverse impact on various subgroups using profile
judgments?

CLINICAL VERSUS STATISTICAL JUDGMENT


Clinical judgment refers to the use and combination of different types of information to make a
decision or recommendation about some person. In psychology, clinical judgment may be most
often discussed in terms of diagnoses regarding clinical patients (Meehl, 1954). These judgments
are likely quite similar to those made in the individual assessments discussed in the previous section
of this chapter but also may occur when judgments are made about job applicants in employment
interviews, assessment centers, and various other instances in which human resource specialists
or psychologists make employment decisions. Clinical judgment is often compared with statisti-
cal judgment in which test scores are combined on the basis of an arithmetic formula that reflects
the desired weighting of each element of information. The weights may be determined rationally
by a group of job experts or by using weights derived from a regression of a measure of overall
job success on scores on various dimensions using different methods of measurement. Meehl’s
original research (1954) showed that the accuracy of the worst regression estimate was equal to
the judgments made by human decision-makers. A more recent treatment and review of this litera-
ture by Hastie and Dawes (2001) has reaffirmed the general conclusion that predictions made by
human experts are inferior to those based on a linear regression model. However, human experts are
required to identify the types of information used in the prediction task. The predictions themselves
are likely best left to some mechanical combination rule if one is interested in maximizing a perfor-
mance outcome. The overall clinical judgment when used to make decisions should be the focus of
the validation effort, but unless it is clear how information is combined by the decision-maker, it is
unclear what constructs are playing a role in their decisions. The fact that these clinical judgments are often not as highly correlated with externally relevant and important outcomes as statistical combinations are suggests that the constructs these decision-makers use are not relevant.
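
As a minimal sketch of the mechanical (statistical) mode of combination discussed above, standardized predictor scores can be combined with fixed weights, whether rational weights supplied by job experts or regression weights estimated in a prior criterion-related study. The predictors, weights, and scores below are hypothetical.

import numpy as np

# Hypothetical standardized scores for three applicants on three predictors:
# columns = cognitive test, structured interview, personality composite
scores = np.array([
    [ 0.8,  0.2,  1.1],
    [-0.3,  1.0,  0.4],
    [ 1.2, -0.5, -0.2],
])

weights = np.array([0.5, 0.3, 0.2])   # rational (expert-derived) or regression-based weights
composite = scores @ weights          # mechanical linear combination of the predictors

order = np.argsort(-composite)        # top-down rank order of applicants
print("composite scores:", composite.round(2))
print("rank order (best first):", (order + 1).tolist())
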
In clinical judgment, the presence or absence of adverse impact can be the result of a combina-
tion of information that does not display sizable subgroup differences or a bias on the part of the
person making the judgment. Psychologists making clinical judgments may mentally adjust scores
on the basis of their knowledge of subgroup differences on various measures. There are again no
studies of which we are aware that address the use or appropriateness of such adjustments.

SCIENTIFIC OR LONG-TERM PERSPECTIVE: LIMITATIONS OF EXISTING PRIMARY VALIDATION STUDIES, INCLUDING THE CURRENT META-ANALYTIC DATABASE

There are a great many meta-analyses of the criterion-related validity of various constructs in the
prediction of job performance and many thousands of primary studies. Secondary analyses of meta-
analyses have also been undertaken (e.g., Schmidt & Hunter, 1998). The studies that provided these
data were nearly all conducted more than 30 years ago. Although it is not necessarily the case that
the relationships between ability and performance documented in these studies have changed in
the last half-century or so, this database has some limitations. In this section, we describe these
limitations and make the case that researchers should continue their efforts to evaluate test-performance
relationships and improve the quality of the data that are collected.
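For readers less familiar with how such meta-analytic estimates are produced, the following sketch illustrates the basic "bare-bones" computations (a sample-size-weighted mean validity and the variance remaining after expected sampling error is removed) that precede any artifact corrections. The coefficients and sample sizes are invented for illustration and are not taken from any published meta-analysis.

    # Bare-bones meta-analysis of observed validity coefficients (illustrative values).
    studies = [  # (observed r, sample size)
        (0.22, 68), (0.31, 120), (0.18, 45), (0.27, 210), (0.35, 90),
    ]

    total_n = sum(n for _, n in studies)
    mean_r = sum(r * n for r, n in studies) / total_n

    # Sample-size-weighted observed variance of the coefficients.
    obs_var = sum(n * (r - mean_r) ** 2 for r, n in studies) / total_n

    # Expected sampling-error variance, given the mean validity and average N.
    avg_n = total_n / len(studies)
    sampling_var = (1 - mean_r ** 2) ** 2 / (avg_n - 1)

    residual_var = max(obs_var - sampling_var, 0.0)
    print(f"mean observed validity = {mean_r:.3f}")
    print(f"variance after removing sampling error = {residual_var:.4f}")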

CONCURRENT VALIDATION DESIGNS


In criterion-related validation research, concurrent validation studies in which predictor and cri-
terion data are simultaneously collected from job incumbents are distinguished from predictive
designs. In the latter, predictor data are collected before hiring from job applicants and criterion
data are collected from those hired (presumably on the basis of criteria that are uncorrelated with
the predictor data) after some appropriate period of time, when job performance is thought to have
stabilized. Defects in the concurrent design (i.e., restriction of range and a different motivational set
on the part of incumbents versus applicants) have been described frequently (Barrett, Phillips, &
Alexander, 1981). Most comparisons of predictive and concurrent designs indicate that they provide
similar estimates of validity. However, it is probably the case that tests that are more susceptible to
motivational differences between job incumbents and applicants, as many noncognitive measures are,
would display differences in validity when the research participants were actually being evaluated for
employment rather than responding "for research purposes." To our knowledge, this comparison has
not been made frequently, and when it has been done in meta-analyses, cognitive and noncognitive
test validities have not been separated
(Schmitt, Gooding, Noe, & Kirsch, 1984). It is the case that practical considerations have made the
use of concurrent designs much more frequent than predictive designs (Schmitt et al., 1984).
Meta-analytic data suggest that there are not large differences in the validity coefficients resulting
from these two designs. Further, range restriction corrections can be applied to correct for the fact
that data for lower-scoring persons are absent from concurrent studies, but the information needed to
make these corrections is often absent in reports of criterion-related research. Nor can we estimate any effects on test scores
that might result from the fact that much more is at stake in a testing situation that may result in
employment as opposed to one that is being done for research purposes. Moreover, as Sussman
and Robertson (1986) maintained, the manner in which some predictive studies are designed and
conducted makes them little different from concurrent studies.

UNIDIMENSIONAL CRITERION VERSUS MULTIDIMENSIONAL PERSPECTIVES


Over the last two decades, the view that job performance is multidimensional has become much
more widely accepted by I-O psychologists (Borman & Motowidlo, 1997; Campbell, Gasser, &
Oswald, 1996). Early validation researchers often used a single rating of what is now called task
performance as a criterion, or they combined a set of ratings into an overall performance measure.
In many cases a measure of training success was used as the criterion. The Project A research
showed that performance comprises clearly identifiable dimensions (Campbell et al., 1990),
and subsequent research has very often included the use of measures of contextual and task perfor-
mance (Motowidlo, 2003). Some researchers also argue that the nature of what constitutes perfor-
mance has changed because jobs have changed (Ilgen & Pulakos, 1999). In all cases, the underlying
performance constructs should be specified as carefully as possible, perhaps particularly so when
performance includes contextual dimensions, which, as is true of any developing literature, have at
times been stretched to cover everything that falls outside the "core" task aspects of a job. Validation studies (and meta-
analyses) that include this multidimensional view of performance are very likely to yield informa-
tion that updates earlier validation results.

SMALL SAMPLE SIZES


The limitations of small sample sizes in validity research have become painfully obvious with the
development of meta-analyses and validity generalization research (Schmidt & Hunter, 1977) as
well as the recognition that the power to reject a null hypothesis that there is no test score-perfor-
mance relationship is very low in much early validation work (Schmidt, Hunter, & Urry, 1976).
Although methods to correct for the variability in observed validity coefficients are available and
routinely used in meta-analytic and validity generalization research, the use of small samples does
not provide for confidence in the results of any single study and can be misleading in the short term
until enough small-sample studies have been conducted and reported to discern generalizable findings. This
may not be a problem if we are satisfied that the relationships studied in the past are the only ones in
which our field is interested, but it is a problem when we want to evaluate new performance models
(e.g., models that distinguish among task, contextual, and other performance dimensions), new predic-
tor constructs (e.g., some noncognitive constructs or even spatial or perceptual measures), or when
we want to assess cross- or multilevel hypotheses.
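The power problem can be illustrated with the standard Fisher-z approximation. In the sketch below (the validity value and sample sizes are purely illustrative), a population validity of .20 is detected only about half the time with a sample of 68 cases.

    import math

    def power_to_detect_validity(rho, n, z_crit=1.645):
        """Approximate one-tailed power to reject H0: rho = 0 at alpha = .05,
        using Fisher's z transformation (a textbook approximation)."""
        z_rho = 0.5 * math.log((1 + rho) / (1 - rho))       # Fisher z of the true validity
        se = 1.0 / math.sqrt(n - 3)                         # standard error of Fisher z
        z_stat = z_rho / se - z_crit                        # distance of the noncentral mean past the cutoff
        return 0.5 * (1 + math.erf(z_stat / math.sqrt(2)))  # standard normal CDF via erf

    # A true validity of .20 is detected only about half the time with N = 68,
    # but nearly always once samples reach several hundred cases.
    for n in (68, 150, 400):
        print(n, round(power_to_detect_validity(0.20, n), 2))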

INADEQUATE DATA REPORTING


The impact of some common deficiencies in primary validation studies is well known. Corrections
for range restriction and criterion unreliability (in the mean and variance of validity coefficients)
and for the variability due to small sample size are also well known and routinely applied in validity
generalization work. However, most primary studies do not report information that allows for
sample-based corrections for criterion unreliability or range restriction. Schmidt and Hunter (1977)
in their original meta-analytic effort used estimates of the sample size of the validity coefficients
they aggregated because not even sample size was available in early reports. Consequently, in
estimating population validity coefficients, meta-analysts have been forced to use assumed artifact
distributions based on the small amount of data that are available. There is some evidence that
these assumptions are approximately correct (e.g., Alexander, Carson, Alliger, & Cronshaw, 1989;
Sackett & Ostgaard, 1994) for range restriction corrections, but the use of such assumed artifact
distributions would not be necessary with adequate reporting of primary data. Unfortunately, such
information for most of our primary database is lost. In addition, researchers disagree regarding
the appropriate operationalization of criterion reliability (Murphy & DeShon, 2000; Schmidt,
Viswesvaran, & Ones, 2000).
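When the necessary information is reported, the standard corrections take a simple form. The sketch below applies the classic disattenuation formula for criterion unreliability and the Thorndike Case II formula for direct range restriction to an invented observed coefficient; the particular values, and the order in which the corrections are applied (itself a matter of some debate), are illustrative only.

    import math

    def correct_for_criterion_unreliability(r_obs, ryy):
        """Disattenuate an observed validity for unreliability in the criterion."""
        return r_obs / math.sqrt(ryy)

    def correct_for_range_restriction(r_obs, u):
        """Thorndike Case II correction for direct range restriction,
        where u = (unrestricted SD of the predictor) / (restricted SD)."""
        return (r_obs * u) / math.sqrt(1 + r_obs**2 * (u**2 - 1))

    # Illustrative values: observed concurrent validity of .25, criterion
    # reliability of .52, and an applicant-to-incumbent SD ratio of 1.5.
    r = 0.25
    r = correct_for_criterion_unreliability(r, ryy=0.52)
    r = correct_for_range_restriction(r, u=1.5)
    print(f"corrected validity estimate: {r:.2f}")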

CONSIDERATION OF MULTILEVEL ISSUES


As described in the section above on the utility and adverse impact associated with selection proce-
dures, selection researchers have made attempts to estimate the organizational outcomes associated
with the use of valid tests (Boudreau & Ramstad, 2003). Utility is a linear function of validity, less
the cost of recruiting and assessing personnel. When the validity-based gain is multiplied by the number
of people selected and the standard deviation of job performance in dollar terms, the resulting utility
estimates for most selection instruments are very large (e.g., see Schmidt, Hunter, Outerbridge, & Trattner, 1986).
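The utility logic referred to here is usually expressed through the Brogden-Cronbach-Gleser framework; the sketch below is a minimal illustration of that computation with invented figures and is not a reproduction of the analyses reported by Schmidt et al. (1986).

    def selection_utility(n_selected, validity, sd_y, mean_std_score,
                          cost_per_applicant, n_applicants, years=1):
        """Brogden-Cronbach-Gleser utility of a selection procedure.

        validity        -- criterion-related validity (r) of the procedure
        sd_y            -- standard deviation of job performance in dollars
        mean_std_score  -- mean standardized predictor score of those selected
        """
        gain = years * n_selected * validity * sd_y * mean_std_score
        testing_cost = cost_per_applicant * n_applicants
        return gain - testing_cost

    # Illustrative figures: 100 hires from 400 applicants with a test whose
    # validity is .40, SDy of $12,000, and a selection ratio yielding a mean
    # standardized score of about 1.0 among those selected.
    print(selection_utility(n_selected=100, validity=0.40, sd_y=12_000,
                            mean_std_score=1.0, cost_per_applicant=50,
                            n_applicants=400))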
Another body of research has focused on the relationship between organizational human resource
practices such as the use of tests and measures of organizational success. The organizational-level
research has documented the usefulness of various human resource practices including test use.
Terpstra and Rozell (1993) reported correlational data that supported the conclusion that organiza-
tions that used various selection procedures such as interviews, cognitive ability tests, and biodata
had higher annual levels of profit, growth in profit, and overall performance.
Various other authors have called for multilevel (individuals, work groups, organizations) or
cross-level research on the relationship between knowledge, skills, abilities, and other characteristics
(KSAOs) and organizational differences (Schneider, Smith, & Sipe, 2000). Ployhart and Schmitt
(2007) and Schneider et al. (2000) have proposed a series of multilevel questions that include consid-
erations of the relationships between the variance of KSAOs and measures of group and organiza-
tional effectiveness. In the context of the attraction-selection-attrition model (Schneider, 1987), there
are many issues of a multilevel and longitudinal nature that researchers are only beginning to address
and about which we have very little or no data. These questions should be addressed if we are to fully
understand the relationships between KSAOs and individual and organizational performance.

VALIDATION AND LONG-TERM OR SCIENTIFIC PERSPECTIVE


Given the various limitations of our primary database noted in the previous sections of this chapter,
we believe selection researchers should aim to conduct additional large-scale or consortium studies
like Project A (Campbell, 1990; Campbell & Knapp, 2001). These studies should include the fol-
lowing characteristics:

1. They should include both predictive (i.e., longitudinal, with data collection at multiple
points) and concurrent designs and should have sufficient sample sizes to allow for adequate
power in the tests of hypotheses. Large-scale studies in which organizations continue data
collection over time on an ever-expanding group of participants should be initiated.
2. Multiple criteria should be collected to allow for evaluation of various KSAO-performance
relationships.
3. Data should be collected to allow for artifact corrections such as unreliability in the criteria
and range restriction.
4. Unit-level data should be collected to allow for evaluation of multilevel hypotheses. These
data should include basic unit characteristics and outcome data.
5. Demographic data should be collected to allow for evaluation of subgroup differences
in the level of performance and differences in KSAO-performance relationships across
subgroups.
6. Data on constructs thought to be related (and unrelated) to the target constructs of interest
should be collected to allow for evaluation of broader construct validity issues.

Obviously, these studies would necessitate a level of cooperation and planning rarely achieved
among multiple researchers, much less multiple organizations. However, real advancement in our
understanding of individual differences in KSAOs and performance will probably not come from
additional small-scale studies or meta-analyses of primary studies that address traditional questions
with sample sizes, research designs, and measurement characteristics that are not adequate.

CONCLUSIONS
It is certainly true that meta-analyses have provided our discipline with strong evidence that many
of the relationships between individual differences and performance are relatively strong and gener-
alizable. However, many situations where validation is necessary do not lend themselves to validity
generalization or the use of meta-analytic databases. As a result, practitioners frequently find them-
selves in situations where well-designed primary studies are required. A focus on the appropriate
designs for these studies is therefore important.
Additionally, without primary studies of the relationships between individual differences and
performance, there can be no meta-analyses, validity transport, or validity generalization. The qual-
ity and nature of the original studies that are the source of our meta-analytic database determine to
a great extent the currency and quality of the conclusions derived from the meta-analyses, statistical
corrections notwithstanding.
We argue that the field would be greatly served by large-scale primary studies of the type con-
ducted as part of Project A (see Sackett, 1990 or Campbell & Knapp, 2001). These studies should
begin with a clear articulation of the performance and predictor constructs of interest. They should
involve the collection of concurrent and predictive data and improve upon research design and
reporting issues that have bedeviled meta-analytic efforts for the past three decades. Demographic
data should be collected and reported. Data should be collected across multiple organizational units
and organizations (and perhaps globally), and data describing the organizational context should be
collected and recorded. We know much more about the complexities of organizational behavior,
research design, measurement, and individual differences than we did 80–100 years ago, and this
should be reflected in how we collect our data and make them available to other professionals.

REFERENCES
Ackerman, P. L. (1989). Within-task intercorrelations of skilled performance: Implications for predicting
individual differences? (A comment on Henry & Hulin, 1987). Journal of Applied Psychology, 74,
360–364.
Aguinis, H. (Ed.). (2004). Test score banding in human resource selection: Legal, technical and societal issues.
Westport, CT: Praeger.
Alexander, R. A., Carson, K. P., Alliger, G. M., & Cronshaw, S. F. (1989). Empirical distributions of range-
restricted SDx in validity studies. Journal of Applied Psychology, 74, 253–258.
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (1985). Standards for educational and psychological testing. Washington,
DC: American Psychological Association.
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (1999). Standards for educational and psychological testing. Washington,
DC: American Educational Research Association.
American Psychological Association, American Educational Research Association, and National Council on
Measurement in Education. (1954). Technical recommendations for psychological tests and diagnostic
techniques. Psychological Bulletin, 51, 201–238.
Arthur, W., Jr., Day, E. A., McNelly, T. L., & Edens, P. S. (2003). A meta-analysis of the criterion-related valid-
ity of assessment center dimensions. Personnel Psychology, 56, 125–153.
Arvey, R. D., Nutting, S. M., & Landon, T. E. (1992). Validation strategies for physical ability testing in police
and fire settings. Public Personnel Management, 21, 301–312.
Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917–1992. Journal of Applied Psychology, 77,
836–874.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychologi-
cal research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social
Psychology, 51, 1173–1182.
Barrett, G. V., Caldwell, M. S., & Alexander, R. A. (1985). The concept of dynamic criteria: A critical reanalysis.
Personnel Psychology, 38, 41–56.
Barrett, G. V., Phillips, J. S., & Alexander, R. A. (1981). Concurrent and predictive validity designs: A critical
reanalysis. Journal of Applied Psychology, 66, 1–6.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential
and evidential bases. Journal of Applied Psychology, 74, 478–494.
Boehm, V. R. (1982). Are we validating more but publishing less? The impact of governmental regulation on
published validation research—An exploratory investigation. Personnel Psychology, 35, 175–187.
Borman, W. C., & Motowidlo, S. J. (1997). Task performance and contextual performance: The meaning for
personnel selection research. Human Performance, 10, 99–109.
Boudreau, J. W., & Ramstad, P. M. (2003). Strategic industrial and organizational psychology and the role
of utility. In W. C. Borman, D. R. Ilgen, & R. J. Klimoski (Eds.), Handbook of psychology (Vol. 12,
pp. 193–221). Hoboken, NJ: Wiley.
Brogden, H. E. (1951). Increased efficiency of selection resulting from replacement of a single predictor with
several differential predictors. Educational and Psychological Measurement, 11, 173–195.
Brogden, H. E. (1959). Efficiency of classification as a function of number of jobs, percent rejected, and the
validity and intercorrelation of job performance estimates. Educational and Psychological Measurement,
19, 181–190.
Brown, C. W., & Ghiselli, E. E. (1953). Percent increase in proficiency resulting from use of selection devices.
Journal of Applied Psychology, 37, 341–345.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod
matrix. Psychological Bulletin, 56, 81–105.
Campbell, J. P. (1990). An overview of the Army Selection and Classification Project (Project A). Personnel
Psychology, 43, 231–240.
Campbell, J. P., Gasser, M. B., & Oswald, F. L. (1996). The substantive nature of job performance variabil-
ity. In K. R. Murphy (Ed.), Individual differences and behavior in organizations (pp. 258–299). San
Francisco, CA: Jossey-Bass.
Campbell, J. P., & Knapp, D. J. (Eds.) (2001). Exploring the limits in personnel selection and classification.
Mahwah, NJ: Lawrence Erlbaum.
Campbell, J. P., McCloy, R. A., Oppler, S. H., & Sager, C. E. (1993). A theory of performance. In N. Schmitt &
W. C. Borman (Eds.), Personnel selection in organizations. San Francisco, CA: Jossey-Bass.
Campbell, J. P., McHenry, J. J., & Wise, L. L. (1990). Modeling job performance in a population of jobs.
Personnel Psychology, 43, 313–334.
Campion, M. A., Outtz, J. L., Zedeck, S., Schmidt, F. L., Kehoe, J. F., Murphy, K. R., & Guion, R. M. (2001).
The controversy over score banding in personnel selection: Answers to 10 key questions. Personnel
Psychology, 54, 149–185.
Cascio, W. F., Alexander, R. A., & Barrett, G. V. (1988). Setting cut scores: Legal, psychometric, and profes-
sional issues and guidelines. Personnel Psychology, 41, 1–24.
Cascio, W. F., Valenzi, E. R., & Silbey, V. (1978). Validation and statistical power: Implications for applied
research. Journal of Applied Psychology, 63, 589–595.
Cizek, G. J. (Ed.). (2001). Setting performance standards. Hillsdale, NJ: Lawrence Erlbaum.
Cohen, J. (1988). Statistical power for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings.
Chicago, IL: Rand-McNally.
Cortina, J. M., & Dunlap, W. P. (1997). On the logic and purpose of significance testing. Psychological Methods,
2, 161–172.
Coward, W. M., & Sackett, P. R. (1990). Linearity of ability-performance relationships: A reconfirmation.
Journal of Applied Psychology, 75, 297–300.
Cropanzano, R., & Wright, T. A. (2003). Procedural justice and organizational staffing: A tale of two paradigms.
Human Resource Management Review. Special Issue: Fairness and Human Resources Management, 13,
7–39.
Davis, J. E. (1989). Construct validity in measurement: A pattern matching approach. Evaluation and Program
Planning. Special Issue: Concept Mapping for Evaluation and Planning, 12, 31–36.
Dunnette, M. D. (1963). A note on the criterion. Journal of Applied Psychology, 47, 251–254.
Edwards, J. R. (2002). Alternatives to difference scores: Polynomial regression analysis and response surface
methodology. In F. Drasgow & N. Schmitt (Eds.), Measuring and analyzing behavior in organizations.
San Francisco: Jossey-Bass.
Fiske, D. W. (1951). Values, theory, and the criterion problem. Personnel Psychology, 4, 93–98.
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analyses of assessment center
validity. Journal of Applied Psychology, 72, 493–511.
Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York, NY: Wiley.
Ghiselli, E. E. (1973). The validity of aptitude tests in personnel selection. Personnel Psychology, 26,
461–478.
Gibson, W. M., & Caplinger, J. A. (2007). Transportation of validation results. In S. M. McPhail (Ed.),
Alternative validation strategies: Developing new and leveraging existing validity evidence: The profes-
sional practice series. (pp. 29–81). Hoboken, NJ: Wiley.
Goldman, B. M., Gutek, B. A., Stein, J. H., & Lewis, K. (2006). Employment discrimination in organizations:
Antecedents and consequences. Journal of Management, 32(6), 786–830.
Guion, R. M. (1961). Criterion measurement and personnel judgments. Personnel Psychology, 14, 141–149.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ: Lawrence
Erlbaum.
Hastie, R., & Dawes, R. M. (2001). Rational choice in an uncertain world. Thousand Oaks, CA: Sage.
Henry, R. A., & Hulin, C. L. (1989). Changing validities: Ability-performance relations and utilities. Journal
of Applied Psychology, 74, 365–367.
Hogan, J., & Quigley, A. M. (1986). Physical standards for employment and the courts. American Psychologist,
41, 1193–1217.
Hull, C. L. (1928). Aptitude testing. Yonkers, NY: World Book.
Ilgen, D. R., & Pulakos, E. D. (Eds.). (1999). The changing nature of performance. San Francisco: Jossey-
Bass.
Jeanneret, P. R., & Silzer, R. (Eds.). (1998). Individual psychological assessment: Predicting behavior in orga-
nizational settings. San Francisco, CA: Jossey-Bass.
Lance, C. L., Foster, M. R., Gentry, W. A., & Thoresen, J. D. (2004). Assessor cognitive processes in an opera-
tional assessment center. Journal of Applied Psychology, 89, 22–35.
Lance, C. L., Foster, M. R., Nemeth, Y. M., Gentry, W. A., & Drollinger, S. (2007). Extending the nomological
network of assessment center construct validity: Prediction of cross-situationally consistent and specific
aspects of assessment center performance. Human Performance, 20, 345–362.
Landy, F. J. (1986). Stamp collecting versus science: Validation as hypothesis testing. American Psychologist,
41, 1183–1192.
Livingston, S. A. (1980). Comments on criterion-referenced testing. Applied Psychological Measurement, 4,
575–581.
Maier, M. H. (1988). On the need for quality control in validation research. Personnel Psychology, 41,
497–502.
McDaniel, M. A. (2007). Validity generalization as a test validation approach. In S. M. McPhail (Ed.),
Alternative validation strategies: Developing new and leveraging existing validity evidence: The profes-
sional practice series (pp. 159–180). Hoboken, NJ: Wiley.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence.
Minneapolis: University of Minnesota Press.
Messick, S. (1975). The standard problem: Meaning and values in measurement and evaluation. American
Psychologist, 30, 955–966.
Messick, S. (1998). Test validity: A matter of consequence. Social Indicators Research, 45, 35–44.
Motowidlo, S. J. (2003). Job performance. In W. C. Borman, D. R. Ilgen, & R. J. Klimoski (Eds.), Handbook
of psychology (Vol. 12, pp. 39–53). Hoboken, NJ: Wiley.
Murphy, K. R. (1986). When your top choice turns you down: The effects of rejected offers on the utility of
selection tests. Psychological Bulletin, 99, 133–138.
Murphy, K. R., & DeShon, R. P. (2000). Interrater correlations do not estimate the reliability of job perfor-
mance ratings. Personnel Psychology, 53, 873–900.
Newman, D. A., Jacobs, R. R., & Bartram, D. (2007). Choosing the best method for local validity estima-
tion: Relative accuracy of meta-analysis versus a local study versus Bayes-analysis. Journal of Applied
Psychology, 92, 1394–1413.
Ones, D. S., Viswesvaran, C., & Dilchert, S. (2005). Cognitive ability in selection decisions. In O. Wilhelm
(Ed.), Handbook of understanding and measuring intelligence (pp. 431–468). Thousand Oaks, CA:
Sage.
Peterson, N. G., Wise, L. L., Arabian, J., & Hoffman, R. G. (Eds.). (2001). Synthetic validation and valid-
ity generalization: When empirical validation is not possible. In J. P. Campbell & D. J. Knapp (Eds),
Exploring the limits in personnel selection and classification (pp. 411–452). Mahwah, NJ: Lawrence
Erlbaum.
Picano, J. J., Williams, T. J., & Roland, R. R. (Eds.). (2006). Assessment and selection of high-risk operational
personnel. New York, NY: Guilford Press.
Ployhart, R. E., & Schmitt, N. (2007). The attraction-selection-attrition model and staffing: Some multilevel
implications. In D. B. Smith (Ed.), The people make the place: Exploring dynamic linkages between
individuals and organizations (pp. 89–102). Mahwah, NJ: Lawrence Erlbaum.
Ployhart, R. E., Schneider, B., & Schmitt, N. (2006). Staffing organizations: Contemporary practice and the-
ory. Mahwah, NJ: Lawrence Erlbaum.
Rothstein, H. R. (1992). Meta-analysis and construct validity. Human Performance, 5, 71–80.
Ryan, A. M., & Sackett, P. R. (1987). A survey of individual assessment practices by I/O psychologists.
Personnel Psychology, 40, 455–488.
Ryan, A. M., & Sackett, P. R. (1992). Relationships between graduate training, professional affiliation, and
individual psychological assessment practices for personnel decisions. Personnel Psychology, 45,
363–385.
Ryan, A. M., & Sackett, P. R. (1998). Individual assessment: The research base. In R. Jeanneret & R. Silzer
(Eds.), Individual psychological assessment: Predicting behavior in organizational settings (pp. 54–87).
San Francisco, CA: Jossey-Bass.
Sackett, P. R. (Ed.) (1990). Special Issue: Project A: The U. S. Army Selection and Classification Project.
Personnel Psychology, 43, 231–378.
Sackett, P. R., & Ostgaard, D. J. (1994). Job-specific applicant pools and national norms for cognitive ability
tests: Implications for range restriction corrections in validation research. Journal of Applied Psychology,
79, 680–684.
Sackett, P. R., & Roth, L. (1991). A Monte Carlo examination of banding and rank order methods of test score
use in personnel selection. Human Performance, 4, 279–295.
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment cre-
dentialing, and higher education: Prospects in a post-affirmative action world. American Psychologist,
56, 302–318.
Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other forms of score adjustment in preemploy-
ment testing. American Psychologist, 49, 929–954.
Scherbaum, C. A. (2005). Synthetic validity: Past, present, and future. Personnel Psychology, 58, 481–515.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generaliza-
tion. Journal of Applied Psychology, 62, 529–540.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychol-
ogy: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Schmidt, F. L., & Hunter, J. E. (2003). History, development, evolution, and impact of validity generalization
and meta-analysis methods, 1975–2001. In K. R. Murphy (Ed.), Validity generalization: A critical review.
Applied Psychology Series (pp. 31–65). Mahwah, NJ: Lawrence Erlbaum.
Schmidt, F. L., Hunter, J. E., Outerbridge, A. N., & Trattner, M. H. (1986). The economic impact of job selec-
tion methods on size, productivity, and payroll costs of the federal work force: An empirically based
demonstration. Personnel Psychology, 39, 1–30.
Schmidt, F. L., Hunter, J. E., & Urry, V. W. (1976). Statistical power in criterion-related validity studies. Journal
of Applied Psychology, 61, 473–485.
Schmidt, F. L., Mack, M. J., & Hunter, J. E. (1984). Selection utility in the occupation of U.S. park ranger for
three modes of test use. Journal of Applied Psychology, 69, 490–497.
Schmidt, F. L., Viswesvaran, C., & Ones, D. S. (2000). Reliability is not validity and validity is not reliability.
Personnel Psychology, 53, 901–912.
Schmitt, N., Gooding, R., Noe, R. A., & Kirsch, M. (1984). Meta-analyses of validity studies published between
1964 and 1982, and the investigation of study characteristics. Personnel Psychology, 37, 407–422.
Schmitt, N., Rogers, W., Chan, D., Sheppard, L., & Jennings, D. (1997). Adverse impact and predictive
efficiency of various predictor combinations. Journal of Applied Psychology, 82, 717–730.
Schneider, B. (1987). The people make the place. Personnel Psychology, 40, 437–454.
Schneider, B., Smith, D., & Sipe, W. P. (2000). Personnel selection psychology: Multilevel considerations.
In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations
(pp. 91–120). San Francisco, CA: Jossey-Bass.
Scott, W. D. (1915). The scientific selection of salesmen. Advertising and Selling, 5, 5–7.
Shotland, A., Alliger, G. M., & Sales, T. (1998). Face validity in the context of personnel selection: A multime-
dia approach. International Journal of Selection and Assessment, 6, 124–130.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of person-
nel selection procedures. Bowling Green, OH: Author.
Strickland, W. J. (1979). The relationship between program evaluation research and selection system validation:
Application to the assessment center method. Dissertation Abstracts International, 40(1-B), 481–482.
Sussman, M., & Robertson, D. U. (1986). The validity of validity: An analysis of validation study designs.
Journal of Applied Psychology, 71, 461–468.
Terpstra, D. E., & Kethley, R. B. (2002). Organizations’ relative degree of exposure to selection discrimination
litigation. Public Personnel Management, 31, 277–292.
Terpstra, D. E., & Rozell, E. J. (1993). The relationship of staffing practices to organizational level measures of
performance. Personnel Psychology, 46, 27–48.
Thayer, P. W. (1992). Construct validation: Do we understand our criteria? Human Performance, 5, 97–108.
Thorndike, E. L. (1911). Individuality. Boston, MA: Houghton-Mifflin.
Viteles, M. S. (1932). Industrial psychology. New York, NY: Norton.
4 Work Analysis
Kenneth Pearlman and Juan I. Sanchez

The purposes of this chapter are to describe and summarize the current state of the art with respect
to work analysis as it applies to employee or personnel selection and to suggest expansions of such
applications in light of emerging and anticipated changes in the world of work. We use the term
“work analysis” broadly to refer to any systematic process for gathering, documenting, and analyz-
ing information about (a) the content of the work performed by people in organizations (e.g., tasks,
responsibilities, or work outputs), (b) the worker attributes related to its performance (often referred
to as knowledge, skills, abilities, and other personal characteristics, or KSAOs), or (c) the context in
which work is performed (including physical and psychological conditions in the immediate work
environment and the broader organizational and external environment). Other terms, such as “job
analysis,” “occupational analysis,” and “job specification” are often used, sometimes interchange-
ably and with somewhat varying definitions in different contexts, to refer to one or more of these
activities. Our use of “work analysis” reflects our preference for a broader term that does not con-
note a focus on any particular aspect or unit of analysis in the study of work.
The present focus on staffing involves just one of a wide range of organizational and human
resources (HR) needs and purposes for which work analysis is used (e.g., training, job evaluation,
performance management, job design, workforce planning, and many others). However, we take a
broad view of this function, ranging from its narrowest conception (external hiring) to a wider range
of related HR functions (personnel placement and classification, internal staffing and promotion,
selection for training or special assignments, and selection out of an organization) and extending to
its broadest conception as reflecting different transition or decision points within an ongoing pro-
cess or “life cycle” of employee assimilation and socialization within an organization (Anderson &
Ostroff, 1997).
This chapter is organized into four major sections. “Traditional Selection-Related Applications of
Work Analysis” examines the primary applications of selection-oriented work analysis. “A Review
of Major Work Analysis Methods and Approaches” provides a review and analysis of major histori-
cal work analysis methods that have been used to support personnel selection. “Key Work Analysis
Practice Issues” is devoted to several key issues that arise in the practical application of work analy-
sis to personnel selection. “Frontiers of Work Analysis: Emerging Trends and Future Challenges”
discusses several emerging trends, issues, and challenges that we see as critical to the continuing
and future relevance and utility of selection-oriented work analysis. “Synopsis and Conclusions”
summarizes and draws some general conclusions on the basis of the material presented in the main
sections.

TRADITIONAL SELECTION-RELATED APPLICATIONS OF WORK ANALYSIS


Work analysis is seldom an end in itself but is almost always a tool in the service of some applica-
tion, a means to an end. We view it as axiomatic that options or alternatives regarding specific work
analysis methods and practices cannot be meaningfully considered without also specifying their
context or application, because this drives every facet of work analysis. In the present context (work


analysis for employee selection) it is therefore relevant to understand that an organization’s selection
processes are (or should be) derived from its overall selection strategy, which in turn is (or should
be) based on its broader goals and strategies. Just as selection systems must be viewed as part of
broader HR systems (recruitment, training, performance management, and so forth), and ideally
should be integrated in a planful manner with these other systems (which themselves are, ideally,
linked to and aligned with the organization’s broader goals), the work analysis systems that support
them should be similarly viewed. In other words, organizational goals and strategy should drive
selection system goals and strategy, which in turn should drive work analysis strategy, which then
serves as the basis for the many specific decisions involved in designing a particular work analysis
system, program, or project.
Broadly speaking, the purpose of work analysis for personnel selection applications is to ensure
that selection systems are work- or job-related, and hence valid, and thereby have value or utility for
the organization, as well as being legally defensible.1 Within this context, four general categories of
work analysis application can be distinguished, as follows.

WORK ANALYSIS FOR PREDICTOR DEVELOPMENT


There are two phases or aspects to this application. First is the use of work analysis information to
make inferences about the person requirements of work, that is, to determine what worker attributes
or KSAOs (perhaps also including at what level of proficiency) are needed to effectively perform the
behaviors or carry out the work activities identified or defined through the work analysis. Second is
the determination or development of appropriate measures of the KSAOs generated in phase one,
such as tests or assessments of particular abilities or skills.

WORK ANALYSIS FOR CRITERION DEVELOPMENT


Work analysis provides the information needed to understand the content (work activities, behav-
iors, or outcomes) and context (both the broader organizational and more specific work setting) of
work performance, and in so doing it provides the basis for developing work performance measures
or standards. Such measures or standards can, in turn, serve as criteria for formally evaluating
individual employee selection tools or an overall selection system; for example, in the context of a
criterion-related validation study. These criterion measures often take the form of specific dimen-
sions of work activity, along with associated rating scales, but can also take the form of more objec-
tively observable indices (production or error rates) or work sample measures.

WORK ANALYSIS FOR VALIDITY EVIDENCE DEVELOPMENT


This refers to the application of work analysis to the development of content-related evidence in
support of a selection procedure’s validity (more commonly referred to as “content validity”). For
such applications, work analysis is used to define a work domain or a job’s content domain in terms
of the important tasks, activities, responsibilities, or work behaviors performed and their associated
worker requirements, or KSAOs. Measures of the work content, or a selected subset of associated

1 The legal authority relevant to work analysis practices in the context of personnel selection is embodied in the Uniform
Guidelines on Employee Selection Procedures (Guidelines; EEOC, CSC, DOL, & DOJ, 1978) issued by the agencies
responsible for enforcing federal antidiscrimination law. The Guidelines outline several specific and somewhat differing
work analysis requirements for different selection procedure validation strategies that represent important constraints
and considerations that practitioners must take into account. The various legal and professional guidelines and standards
relevant to domestic and international personnel selection, including work analysis, are treated in depth elsewhere in
this volume and hence are not covered here. Detailed discussion of legal and professional standards for work analysis as
they relate specifically to the applications we describe below can be found in Brannick, Levine, and Morgeson (2007),
Gatewood, Feild, and Barrick (2008), and the Stelly/Goldstein, Gibson/Caplinger, Johnson, and McDaniel chapters in
McPhail (2007).
KSAOs, are then developed and judgmentally linked back to the content domain by subject matter
experts (SMEs). Validity is a function of the strength of these (measure-content domain) linkages,
which in turn are a function of the quality of the original work analysis, the quality of the experts,
and the rigor of the linkage process.

WORK ANALYSIS FOR VALIDITY EVIDENCE EXTENSION


This refers to the application of work analysis to methods for inferring a selection procedure’s valid-
ity for a given job or work setting without actually conducting a validation study in the new set-
ting. Three distinguishable but related concepts or methodologies for justifying such inferences are
commonly recognized: (a) validity generalization or meta-analysis, (b) synthetic or job-component
validity, and (c) validity transportability. (Extended treatment of all of these approaches can be
found in McPhail, 2007.) Validity generalization, a form of meta-analysis applied specifically to
the personnel selection context (Pearlman, Schmidt, & Hunter, 1980), involves a series of statisti-
cal analyses performed on a set of criterion-related validity coefficients accumulated from archival
sources to determine: (a) the best estimate of validity for the predictor-criterion (or job) combination
represented in the set and (b) the degree to which this estimate can be generalized to other situa-
tions. Synthetic validity encompasses several methods that involve the use of structured work analy-
sis questionnaires that can be scored in terms of work dimensions or job components, each of which
has pre-established relationships (generally via prior criterion-related validation work or expert
estimation) with one or more predictor constructs or measures (e.g., cognitive abilities). When a
new job is analyzed with this questionnaire and its component dimensions are scored, predictor
measure validity for the new job can be inferred or computed from these relationships. Validity
transportability refers to use of a specific selection procedure in a new situation based on results of
a validation study conducted elsewhere.
Work analysis is fundamental to establishing the strength of the inference of validity in all three
approaches to validity evidence extension, albeit in different ways. For validity generalization and
validity transportability, the key issue is establishing similarity between a job (or job group) for
which validity evidence has been obtained in one setting and the target job to which one wishes to
generalize or “transport” that evidence. Consequently, the key question is, “How similar is similar
enough?” Research suggests that a relatively molar or high-level work analysis (e.g., sufficient to
classify the target job into a broadly defined job family) may be sufficient for many applications,
because even relatively large task differences between jobs do not moderate the validity of many
types of ability tests (Schmidt, Hunter, & Pearlman, 1981). By its nature, synthetic validity requires
a fairly detailed work analysis; however, the precise form of the work analysis is dictated by the
characteristics of the focal work analysis questionnaire on which the predictor-job dimension rela-
tionships were originally established. Validity transportability applications legally require the tar-
get job to consist of “substantially the same major work behaviors” as that (or those) on which the
focal validation work was conducted (EEOC et al., 1978), implying the need for at least a somewhat
detailed level of work analysis. Such issues, as well as various approaches to job similarity evalua-
tion, are considered in depth by Harvey (1986), Pearlman (1980), and Sackett (2003).
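Although the judgment of job similarity cannot be reduced to a single number, transportability arguments are often supported with simple profile comparison statistics computed on structured work analysis ratings. The sketch below is a generic illustration (the dimensions and importance ratings are invented and tied to no particular instrument) comparing two jobs' profiles with a profile correlation and a mean absolute rating difference.

    import math

    # Importance ratings (1-5) of the same work analysis dimensions for two jobs.
    job_a = [4.6, 3.9, 2.1, 4.8, 1.5, 3.2]   # job on which validity evidence exists
    job_b = [4.4, 3.6, 2.4, 4.9, 1.8, 3.0]   # target job for the transportability argument

    def profile_correlation(x, y):
        """Pearson correlation between two jobs' dimension-rating profiles."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy)

    mean_abs_diff = sum(abs(a - b) for a, b in zip(job_a, job_b)) / len(job_a)
    print(f"profile correlation = {profile_correlation(job_a, job_b):.2f}")
    print(f"mean absolute rating difference = {mean_abs_diff:.2f}")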

A REVIEW OF MAJOR WORK ANALYSIS METHODS AND APPROACHES


WORK ANALYSIS INFORMATION FRAMEWORK
A useful framework (shown in Table 4.1) for classifying work analysis methods is premised on
the view that they can be broadly differentiated in terms of the process they use to compile,
analyze, and present work analytic information or data and the content of such information or
data. Work analysis processes can in turn be broadly differentiated in terms of whether they are
primarily qualitative or quantitative (sometimes characterized as an inductive-versus-deductive
TABLE 4.1
Work Analysis Information Framework

Work-oriented content
  Broad: Major duties; Major responsibilities
  Moderate: Task clusters or work activities; Work functions or processes; Material, equipment, tool, and machine categories
  Specific: Tasks or work steps; Work outputs; Specific materials, equipment, tools, and machines; Work-content-based performance standards; Specific work environment features and working conditions

Worker-oriented content
  Broad: Position/job/occupational titles
  Moderate: Generalized work behaviors; Worker functions or processes
  Specific: Worker-behavior-based performance standards; Behavioral indicators

Attribute requirements
  Broad: Personality traits, values, and interests; Aptitudes and abilities
  Moderate: Generic or cross-functional skills
  Specific: Specialized/technical skills; Specialized/technical knowledge
distinction). Qualitative approaches build up work or job information, often from scratch (e.g., on
the basis of observing or interviewing job incumbents to determine specific tasks performed) and
generally one job at a time, yielding detailed, narrative information that is customized to indi-
vidual jobs or specific work within an organization. Quantitative approaches are generally based
on the use of structured work analysis questionnaires or surveys comprised of pre-established
lists of different types of work or job descriptors (i.e., work characteristics or units of analysis,
such as work behaviors, worker functions, or KSAOs). These usually include rating scales that
permit SMEs (incumbents, supervisors, or job analysts) to quantify their judgments about individual
descriptors along dimensions of interest (e.g., performance frequency, importance, level of
complexity, and consequences of error).
Work analysis content refers to the types of work descriptors used and the level of analysis or
detail represented by these descriptors. McCormick (1976) has usefully distinguished among three
broad descriptor categories, including (in slightly adapted form) (a) work-oriented content descrip-
tors, in which the descriptive frame of reference is the work to be done, including the purpose,
steps, tools and materials, required resources, and conditions under which work is accomplished
(examples include tasks, activities, duties, responsibilities, working conditions, and work outputs);
(b) worker-oriented content descriptors, in which the descriptive frame of reference is what workers
do to carry out work (examples include worker functions, processes, or behaviors); and (c) attri-
bute requirement descriptors, in which the descriptive frame of reference is the attributes needed
by a worker to do the specified work (examples include skills, knowledge, abilities, and tempera-
ments or dispositions). Distinctions among these categories can at times be blurry because of some
natural overlap and the unavoidable imprecision of language (e.g., the second and third categories
are sometimes considered as a single “worker-oriented” or “attribute” category); nonetheless, they
have proven to be conceptually useful. Work descriptors can be further differentiated in terms of
the level of analysis or description reflected by a particular category of descriptor. Work-oriented
content descriptors can range from narrow and specific (such as tasks performed) to broad and gen-
eral (such as major duties or responsibilities), as can worker-oriented content descriptors (ranging
from specific behavioral performance indicators to broad position or job titles). Similarly, attribute
requirement descriptors can be represented by very narrowly defined characteristics (specialized
skills or knowledge) or very broadly defined human attributes (such as aptitudes and abilities or
personality traits).
Table 4.1 illustrates how these two facets of work analysis content interact and play out in terms
of specific descriptor types. It provides examples of descriptors representing each work descriptor
category within each of three levels of analysis (broad, moderate, and specific; these are admittedly
imprecise distinctions that, nonetheless, seem to work reasonably well as a descriptive frame of
reference for our purposes.) Note that the qualitative-quantitative work analysis process distinction
is not represented here but is represented in our later expansion of this table and associated text
discussion of specific work analysis methodologies.

REVIEW OF SPECIFIC WORK ANALYSIS METHODS, SYSTEMS, AND APPROACHES


Work analysis, at least conceptually, can be traced as far back as the 5th century BC to Socrates’
concerns with what work is required in a society, as well as how and by whom such work
should be done (Primoff & Fine, 1988). Even earlier beginnings—the Chinese Imperial Court
of 1115 BC—have been suggested by some (Wilson, 2007). However, the origins of modern
work analysis are most commonly traced to the early 20th century scientific management move-
ment in the United States, out of which grew the first systematic methods for the analysis of
work and workers, such as the Gilbreths' famed time-and-motion study methodology and Viteles'
job psychograph—the first standardized work analysis instrument. Many new paradigms and
methods for studying work and workers soon emerged, largely in response to external forces
such as (a) worker displacement and military staffing issues of the 1930s and 1940s that led to
work analysis methods based on the specification of (and identification of interrelationships
among) worker characteristics; (b) the expanding economy and social changes of the 1950s and
1960s, along with a rise in associated civil rights litigation, which led to work analysis methods
emphasizing the specific task and behavioral aspects of work and the cognitive or functional
components of work, and also led to the use of highly structured and standardized work analysis
instruments suitable for large-scale application; and (c) the increasing job complexity and work
diversity of the post-1970s “information age,” which led to more integrated, multidomain, and
multipurpose work analysis methods.
Space constraints preclude a comprehensive review of the dozens of work analysis methods,
systems, or instruments published commercially or currently in use in the private, military, and
public sectors (plus the innumerable organization-specific proprietary or consultant-developed
methods), or even a detailed description of just the most significant ones. [More detailed treatment
of such methods can be found in the encyclopedic volumes by Gael (1988), as well as in Brannick
et al. (2007), Gatewood et al. (2008), and Peterson, Mumford, Borman, Jeanneret, & Fleishman
(1999); historical perspectives are available in Mitchell & Driskill (1996), Primoff & Fine (1988),
and Wilson (2007).] Instead, we adapt our information framework as shown in Table 4.2 to display
some of the most significant or representative of current or historical work analysis methods, or
those that are of interest for other reasons so as to illuminate their major characteristics and general
similarities and differences.
Table 4.2 populates the nine cells of our framework with specific work analysis systems that
exemplify each descriptor-category by level-of-analysis combination while adding another row to
account for hybrid methods that cut across such combinations. Some of these represent specific and
well-defined instruments, methods, or programs, whereas others represent more general systems or
approaches. The process aspect of these methods, discussed earlier, is represented in Table 4.2 by
footnotes following methods that, partly or wholly, use qualitative data generation techniques; all
other (nonfootnoted) methods use entirely quantitative methods. We briefly describe a number of
these methods, and then we suggest some practical implications and broader perspectives on this
review. Given our present context of employee selection, we limit our consideration to methods that
either have a history of, or the potential for, such application.
TABLE 4.2
Significant Work Analysis Methods Organized by Work Analysis Information Framework Cells

Work-oriented content
  Broad: Job Diagnostic Survey; PPRF Work Styles; PIC Checklist
  Moderate: Minnesota Job Description Questionnaire
  Specific: TI/CODAP (a)

Worker-oriented content
  Broad: Dictionary of Occupational Titles classification structure
  Moderate: Position Analysis Questionnaire
  Specific: Cognitive Task Analysis (a); Critical Incident Technique (a)

Attribute requirements
  Broad: Fleishman Ability Requirements Scales; Holland Interest Taxonomy
  Moderate: SCANS; Work Keys
  Specific: Job Element Method (a); CIP-2000 Knowledge Taxonomy

Hybrid (multidescriptor and/or multilevel): Functional Job Analysis (a); MOSAIC (a); SHL Universal Competency Framework; Competency Modeling; Strategic Job Modeling (a); O*NET (a)

(a) Method generates some or all of its information using qualitative processes (see earlier text discussion).

We begin by noting a subset of what we consider to be "landmark" work analysis methods on
the basis of their historical significance, strong research base, wide usage, or impact on subsequent
developments in work analysis. The Dictionary of Occupational Titles (DOT) occupational classifi-
cation structure that came to fruition in the DOT’s third edition (U.S. Department of Labor, 1965a)
represents the synthesis of two classic work analytic methods—the Labor Department’s analyst-
based methodology (Droege, 1988) and the Data-People-Things scales of Functional Job Analysis.
The DOT resulted in the development of relatively detailed analytic information on essentially
every job in the U.S. economy hierarchically classified in terms of worker functions and worker-
oriented job content, and additionally linkable to extensive worker requirements information (edu-
cation, aptitudes, knowledge, interests, temperaments, and physical demands) gathered in parallel
initiatives (U.S. Department of Labor, 1965b).
The Critical Incident Technique (Flanagan, 1954), originating in the U.S. Air Force’s World War
II pilot selection research program, involves supervisors recalling actual incidents of notably good
and poor performance (observed behavior) in a target job, and analysts subsequently sorting and
grouping these incidents by theme (essentially worker KSAOs), which are then used as the basis for
developing selection tools (as well as performance measures, training programs, and other appli-
cations). It generates rich qualitative data that can provide useful insights into work performance,
especially regarding more subtle (interpersonal and communication) factors. However, it tends to
be relatively labor-intensive and produces organization-specific results that, given their qualitative
basis, may be less stable than empirically derived worker dimensions. Functional Job Analysis
(FJA; Fine & Wiley, 1971) generates carefully structured qualitative information about what a
worker does (tasks) and quantitative information about how a task is performed in terms of the cog-
nitive, interpersonal, and physical functions of a worker, as measured by hierarchically organized
rating scales for data (information or ideas), people (co-workers, customers), and things (machines,
equipment), as well as by additional rating scales for “worker instructions” (degree of work discre-
tion) and general educational development (reasoning, mathematical, and language demands). It is
a highly systematic procedure using qualitative and quantitative data gathering processes, but can
be costly, time-consuming, and labor-intensive, as well as requiring special training for appropriate
application.
The Position Analysis Questionnaire (PAQ; McCormick, Jeanneret, & Mecham, 1972), one
of the most extensively researched and widely used of all work analysis methods, is a structured
questionnaire completed by incumbents or supervisors that describes jobs in terms of 27 stan-
dardized worker-oriented dimensions that are neither highly specific nor extremely broad and are
common across nearly all jobs. It thus lends itself well to quantitative cross-job comparisons, job
family development, and validity evidence extension applications; in particular, synthetic validity
(originated by McCormick as “job-component” validity on the basis of PAQ research) and valid-
ity transportability. TI/CODAP (an acronym for Task Inventory/Comprehensive Occupational Data
Analysis Programs; Christal & Weissmuller, 1988) refers to a collection of computer programs and
applications that analyze and array in various ways quantitative data collected from standardized
task inventories on enlisted military and officer jobs. Following initial implementation in the U.S.
Air Force in 1967, it was eventually adopted as the primary work analysis tool for all branches of the
military and has expanded into widespread use in academia, business, industry, and federal, state,
and local governments. The method is simple and efficient, permitting relatively quick gathering
and analysis of detailed task information from large samples of workers. Its disadvantages include
the often lengthy process required for task inventory development and the fact that its exclusively
work-oriented output is not directly suitable for applications requiring worker-oriented or KSAO-
type data without additional steps or procedures.
The Fleishman Ability Requirement Scales (ARS) methodology is an outgrowth of several lines
of programmatic research involving task characteristics and human ability taxonomies begun in the
1960s (Fleishman & Quaintance, 1984). This led to the development of a highly construct-valid set
of 52 cognitive, physical, psychomotor, and sensory abilities, along with associated rating scales
based on empirically calibrated task or behavioral anchors used to evaluate the ability requirements
of tasks, broader job components, or entire jobs. Later research extended the approach to the cover-
age of social/interpersonal dimensions, knowledge, and skills. The method’s chief assets are the
strong research base underlying the ability sets and rating scales, its relative ease of understanding
and use, and the utility of the resulting ability requirement profiles for quantitative job comparison
and job family development, as well as in providing a direct linkage to potential selection tools
within a criterion-related or construct validation context. It is less well suited to applications requir-
ing work-oriented descriptor content, such as validity evidence development (content validity) and
criterion development, although Landy (1988) provided a case study illustrating how this can be
accomplished.
There is another subset of work analysis methods that can be thought of as “taxonomy-driven,”
that is, methods in which judgments or ratings are made in terms of a predetermined list of attribute
requirement descriptors (which may represent one or more levels of analysis) determined through
prior research to represent all (or the important) elements of a meaningful KSAO domain. In this
respect they are similar to the ARS methodology, but differ from the PAQ approach, in which
analyst ratings are made on specific items and are subsequently converted statistically into broader
dimension scores. Examples of such methods include the Holland interest taxonomy (Holland,
1973), the SHL Universal Competency Framework (Bartram, 2005), the CIP-2000 knowledge tax-
onomy (embodied in the U.S. Department of Education’s current Classification of Instructional
Programs; USDE, 2002), the SCANS skill set (Secretary’s Commission on Achieving Necessary
Skills, 1992), and Work Keys (McLarty & Vansickle, 1997). Not all of these were developed as work
analysis methods per se (such as SCANS and CIP-2000), but they all can be or have been readily
adapted for such application.
Several of the methods included in Table 4.2 might be thought of as “special-purpose” methods.
The Job Element Method (Primoff, 1975), developed at the U.S. Civil Service Commission (now the U.S. Office of Personnel Management) during the 1950s, was an innovative approach to developing selection standards, primarily for trades and labor jobs, and an early application of synthetic validity, although it is no longer in significant use. The Job Diagnostic Survey (JDS;
Hackman & Oldham, 1976) measures, through survey items, the “motivating potential” of jobs in
terms of five broad work-oriented job dimensions—skill variety, task identity, task significance,
autonomy, and feedback—predicted by the method’s underlying “job characteristics theory” to be
positively related to work performance because they reflect inherently motivating characteristics of
work. The Minnesota Job Description Questionnaire (MJDQ; Dawis, 1991) is based on a taxonomy
of 21 “occupational reinforcers” that describe occupations in terms of various job circumstances or
conditions that can be related to individuals’ work preferences. The JDS and MJDQ are generally
viewed as aids to job design, work satisfaction prediction, and applications other than conventional
personnel selection (they do not lend themselves to the identification of sets of KSAOs on which
selection tools would be built), but instruments of this type do offer possible benefits for selection
that we discuss later in the chapter.
We characterize a final group of methods displayed in Table 4.2 as “new or emerging” approaches
to work analysis in that they attempt, in one way or another, to meet the challenges of the chang-
ing nature of work. (We provide a more in-depth discussion of one such approach, Competency
Modeling, later in this chapter.) The Personality-Related Position Requirements Form (PPRF;
Raymark, Schmit, & Guion, 1997) and the Performance Improvement Characteristic (PIC) check-
list (Hogan, Davies, & Hogan, 2007) attempt to measure the personality requirements of jobs via
structured questionnaires, thereby providing a means to supplement more conventional (task- or
ability-oriented) work analysis methods with the identification of some of the nontask and noncog-
nitive requirements of work, as we discuss later. Cognitive task analysis (Schraagen, Chipman, &
Shalin, 2000) encompasses various qualitative methods for analyzing the (unobservable) mental
processes (e.g., strategies and knowledge) used by experts to perform cognitively complex work
tasks and how experts and novices differ in such performance. Used primarily to support training
applications, such methods have not as yet demonstrated obvious or easy applicability to employee
selection—especially given their cost, complexity, and time- and labor-intensiveness—but they do
warrant our continued interest, particularly in view of the increasing cognitive complexity of work
in many sectors of our economy.
The U.S. Office of Personnel Management’s Multipurpose Occupational Systems Analysis
Inventory-Close-Ended (MOSAIC) system (Rodriguez, Patel, Bright, Gregory, & Gowing, 2002),
which has become the federal government’s primary work analysis system, is a multipurpose,
automated, survey-based work analysis approach used to simultaneously collect information (from
incumbents and supervisors) on many jobs within a broad occupational area (such as information
technology, professional and administrative, trades and labor). It is based on use of a common lan-
guage for both competencies (KSAOs) and generalized task statements that are relevant to an entire
occupational area (rather than specific jobs), and hence lends itself well to cross-job comparisons in
terms of work- and worker-oriented descriptors. It is useful for personnel selection as well as other
HR applications (such as workforce planning, training needs assessment, and performance manage-
ment), the integration of which is facilitated by the underlying common descriptor base.
The Occupational Information Network (O*NET™), treated in depth elsewhere in this volume,
was developed by the U.S. Department of Labor in the mid-1990s (Peterson et al., 1999) to replace
an outmoded DOT system with an online electronic database of greatly expanded job information
and more effective methodologies for data collection, analysis, and updating. O*NET was premised
on the need for a comprehensive, theoretically based, and empirically validated common language
that represented a hierarchical, taxonomic approach to work description and would therefore be
capable of describing the characteristics of work and workers at multiple levels of analysis. Its
centerpiece is the O*NET Content Model, which serves as the blueprint and integrating framework
for the various descriptive components of the system. The content model encompasses six major
domains of job descriptors representing some 20 individual job descriptor categories or taxonomies
that reflect various (over 270) work- and worker-oriented descriptors on which data are collected
from trained analysts and job incumbents by means of various structured questionnaires and sur-
veys. O*NET research and development is ongoing, with core data having been collected on most
of its 812 substantively defined occupations, whereas several of the newer descriptor taxonomies
(e.g., Tools and Technology, Detailed Work Activities) and occupational task content are still being
built out across the occupational spectrum. Nonetheless, the internal research studies completed to
date provide a great deal of supportive reliability and validity evidence for the system. Moreover,
its potential for various research uses (e.g., see Dierdorff & Morgeson, 2007) and applications (as
evidenced in large and steadily increasing numbers of user downloads of O*NET online products
and tools) is becoming increasingly evident.
Strategic Job Modeling (SJM) is a term coined by Schippmann (1999) to describe an approach to
work analysis that can serve as a basis for integrated HR systems. Its core is a conceptual framework
outlining the key descriptive elements on the work and person sides of the performance equation.
The approach itself consists of a series of steps, suggested guidelines, procedures, and work aids for
obtaining information on these descriptors within the context of a given SJM project. No specific
instrumentation is provided as part of the method; it is essentially a practitioner guide for conduct-
ing work analysis with an explicit emphasis on taking account of strategic organizational issues as a
point of departure. It attempts to integrate the rigor of traditional work analysis methods and prac-
tices within the less well-defined terrain of strategic work analysis (which we discuss later in this
chapter), resulting in a potentially significant evolutionary next step for work analysis and serving as
a stimulus for further thinking and development in such directions (see, e.g., Barney, 2000).

WORK ANALYSIS METHODS REVIEW: SOME PRACTICAL IMPLICATIONS AND BROADER PERSPECTIVES
The preceding review indicates that practitioners have many options to choose from when consider-
ing what method will best support a particular personnel selection application. There are several
possible ways to narrow these choices. One is to view the specific selection-related applications
discussed previously in terms of our work analysis information framework, as illustrated in Table
4.3. Broadly speaking, predictor development applications are likely to be best served by attri-
bute requirement descriptors at any level of analysis, because selection tools are most commonly
designed as measures of work-related KSAOs. Criterion development applications are best served
either by work-oriented content descriptors at any level of analysis or worker-oriented content
descriptors at a relatively specific level of analysis, because such information provides the most
useful basis for developing relevant work performance measures. Validity evidence development
(content validity) applications are best served by work-oriented content or attribute requirement
descriptor information at a specific or possibly moderate levels of analysis in which the neces-
sary linkages between specific work content and attribute measures can be most readily made and
documented. Existing research suggests that validity evidence extension applications are likely
to be best served by worker-oriented content or attribute requirement descriptors at moderate or
broad analytic levels for validity generalization and work- or worker-oriented content descriptors at a moderate level of analysis for synthetic validity, whereas legal considerations suggest that worker-oriented content descriptors at specific or moderate levels of analysis are most appropriate for validity transportability.

TABLE 4.3
Descriptor Appropriateness for Selection-Related Work Analysis Applications

                                           Level of Analysis
Work Descriptor Category      Broad           Moderate                    Specific
Work-oriented content         CD              CD, VD, VE-SV               CD, VD
Worker-oriented content       VE-VG           VE-VG, VE-SV, VE-VT         CD, VE-VT
Attribute requirements        PD, VE-VG       PD, VD, VE-VG               PD, VD

Note: CD = criterion development applications; PD = predictor development applications; VD = validity evidence development (content validity) applications; VE = validity evidence extension applications, encompassing VE-VG (validity generalization), VE-SV (synthetic validity), and VE-VT (validity transportability).
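For practitioners who maintain such guidance in electronic form, the mapping in Table 4.3 can be represented as a simple lookup structure. The following sketch (a hypothetical Python illustration, not part of any published work analysis instrument) encodes the table and returns the applications judged most appropriate for a given descriptor category and level of analysis.

# Hypothetical encoding of Table 4.3: (descriptor category, level of analysis) ->
# selection-related application codes (see the table note for code definitions).
TABLE_4_3 = {
    ("work-oriented content", "broad"): ["CD"],
    ("work-oriented content", "moderate"): ["CD", "VD", "VE-SV"],
    ("work-oriented content", "specific"): ["CD", "VD"],
    ("worker-oriented content", "broad"): ["VE-VG"],
    ("worker-oriented content", "moderate"): ["VE-VG", "VE-SV", "VE-VT"],
    ("worker-oriented content", "specific"): ["CD", "VE-VT"],
    ("attribute requirements", "broad"): ["PD", "VE-VG"],
    ("attribute requirements", "moderate"): ["PD", "VD", "VE-VG"],
    ("attribute requirements", "specific"): ["PD", "VD"],
}

def appropriate_applications(descriptor_category, level_of_analysis):
    """Return the application codes Table 4.3 lists for the given cell."""
    return TABLE_4_3.get((descriptor_category.lower(), level_of_analysis.lower()), [])

# Example: applications served by attribute requirement descriptors at a moderate level.
print(appropriate_applications("Attribute requirements", "Moderate"))  # ['PD', 'VD', 'VE-VG']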
Alternative work analysis methods can also be evaluated in terms of various specific practi-
cal and technical considerations, such as those described by Gatewood et al. (2008) and further
adapted and expanded here, including such things as the method’s (or instrument’s) (a) availability
and readiness for use; (b) versatility of application to different jobs and work settings, descriptor
needs, and units of analysis; (c) degree of standardization; (d) acceptability to users and stakehold-
ers; (e) analyst or respondent training requirements; (f) sample size needs; (g) reliability and valid-
ity of resulting data; (h) cost; and (i) suitability and utility for the specific intended application(s).
Such evaluations are best undertaken without expecting to find or create the “perfect” work analysis
system or method, because each has strengths and limitations and relatively greater or lesser suit-
ability for particular purposes or applications. Moreover, it is not always necessary to pick a single
method, because it is often possible to create hybrid methods that borrow or adapt elements from
several different approaches.
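Where several candidate methods survive an initial screen, the considerations listed above can be turned into an explicit, if rough, comparison aid. The sketch below is a hypothetical weighted-scoring illustration rather than a validated evaluation instrument: each candidate method is rated 1-5 on a subset of the considerations, the ratings are weighted by their judged importance for the project at hand, and the totals are compared.

# Hypothetical 1-5 ratings of two candidate methods on selected considerations,
# with project-specific importance weights (weights sum to 1.0).
weights = {"availability": 0.15, "versatility": 0.20, "standardization": 0.15,
           "acceptability": 0.15, "cost": 0.15, "suitability": 0.20}
ratings = {
    "Method X": {"availability": 5, "versatility": 3, "standardization": 5,
                 "acceptability": 4, "cost": 2, "suitability": 4},
    "Method Y": {"availability": 3, "versatility": 5, "standardization": 3,
                 "acceptability": 4, "cost": 4, "suitability": 5},
}

for method, r in ratings.items():
    score = sum(weights[c] * r[c] for c in weights)  # weighted sum across considerations
    print(f"{method}: weighted score = {score:.2f}")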
Reflecting more broadly on this review, it appears that work analysis is at a crossroads—one
rooted in the fact that whereas work in many economic sectors has been changing a lot, work analy-
sis methodology has been changing only a little. The general phenomenon of the changing nature of
work, workers, and the workplace resulting from broader economic, demographic, and technologi-
cal changes has been extensively described and documented for at least the last 20 years (Coates,
Jarratt, & Mahaffie, 1990; Johnston & Packer, 1987), as has its specific ramifications for many orga-
nizational and HR practices, including personnel selection and work analysis (Offerman & Gowing,
1993; Pearlman & Barney, 2000; Sanchez, 1994). Rather than slowing down or stabilizing, the pace
of such changes, if anything, appears to be accelerating (Landy, 2007). However, for work analysis,
after periods of substantial innovation in the middle part of the 20th century, with a few exceptions
the last several decades have been largely ones of methodological refinements, variations, and new
combinations of tried-and-true methods.
Like all organizational practices, work analysis must continue to adapt and evolve to maintain
its relevance and utility. Although the traditional concept of a job may not be “dead,” as some have
argued (Bridges, 1994), its changed settings and dynamics have created new and formidable chal-
lenges for traditional work analysis assumptions and practices. Among those who have speculated
about this (Cunningham, 2000; Fogli & Whitney, 1998; Levine & Sanchez, 2007; Pearlman &
Barney, 2000), there has been some consensus that such challenges imply the need for work analysis
methods with a greater ability to capture such things as (a) strategic and future work requirements
that are based on a more macro, “top-down” (i.e., organization-level) than a micro, “bottom-up”
(i.e., individual and job level) orientation; (b) broader and multiple work roles and work processes
rather than individual jobs and work content; (c) broader sets of worker attributes (e.g., personal-
ity, attitudes, and values) relevant to contextual performance (Borman & Motowidlo, 1993), team
and organizational performance outcomes, and task and individual performance outcomes; and
(d) important elements of the broader work and organizational environment, as well as to incorpo-
rate interdisciplinary perspectives and methodological innovations that would facilitate and enhance
such pursuits. (We elaborate on many of these points later in this chapter.)
There are signs that critical mass has been building toward a new generation of work analysis
approaches that reflect some of these directions, as exemplified by some of the more expansive, ver-
satile, future-oriented, and multipurpose approaches to work analysis that have appeared in orga-
nizational practice and the research literature in recent years, such as PPRF, MOSAIC, O*NET,
various forms of strategic work analysis (such as SJM), work analysis for teams (Brannick et al.,
2007; Jones, Stevens, & Fischer, 2000), and as reflected in an increasing number of recent texts
that recognize and promote such forward-looking perspectives (e.g., Brannick et al., 2007; Landy
& Conte, 2007; Ployhart, Schneider, & Schmitt, 2006). Nonetheless, much of this territory remains
uncharted and unexplored.
KEY WORK ANALYSIS PRACTICE ISSUES

DATA COLLECTION ISSUES IN WORK ANALYSIS
Work Analysis Data Sources
The choice of data sources, like all methodological decisions in work analysis, should be driven by
the specific purposes and goals of the analysis. Among the more commonly used sources of work
analysis data (incumbents, supervisors, higher-level managers, job analysts, training specialists,
or other technical experts), job incumbents are by far the most frequently used. However, there is
an important distinction to be made between a job and its incumbents. Jobs are, in fact, social and
organizational constructions, abstractions based on specific sets of responsibilities required of one
or more job incumbents, such that each incumbent of the same job is charged with performing the
same set of responsibilities. Hence, jobs are actually highly dynamic (sets of responsibilities change
over time, and not all incumbents carry them out in the same way) and relativistic—a single task
for one incumbent (“making sandwiches” for a short-order cook in a diner) may comprise an entire
job for another (“sandwich-maker” in a specialized gourmet deli). However, most traditional meth-
ods of work analysis implicitly assume the existence of an absolute, or reasonably stable, job as at
least a “convenient truth.” It is therefore not surprising that large numbers of observers of this “job
reality” often have been enlisted in the work analysis process so as to mitigate the bias and idiosyn-
crasies of individual observers by combining and averaging observations of the same “object” (their
job) from multiple vantage points. Under the assumption that those closest to the behavioral realities
of the job are its most objective sources, incumbent ratings of work-analytic units such as job tasks
and KSAOs are often preferred to the ratings of nonincumbents (e.g., trained analysts, supervisors,
and psychologists) because of their higher “face validity” and acceptability among the end users of
such data. In other words, there is a widespread assumption in work analysis that job incumbency
is necessary and even sufficient to ensure valid ratings. However, there is no convincing body of
evidence backing such a belief. That is, selection procedures based on incumbent ratings have not
been found more valid or effective than those based on nonincumbent ratings (Sanchez, 2000).
Moreover, several logical arguments can be made regarding the potential disadvantages of using
incumbents (greater susceptibility to various social influence and impression management biases)
and the potential advantages of using nonincumbents (greater objectivity) as sources of work analy-
sis ratings under different circumstances (Morgeson & Campion, 1997).
The argument for expanding the range of data sources beyond incumbents is further strength-
ened by several characteristics of many contemporary work settings, such as the need for workers
to cross functional boundaries and an emphasis on teamwork and customer service. This implies
potential value in expanding units of work description and analysis beyond the conventional (indi-
vidual positions and jobs) to such units as functional or process teams or broader organizational
entities. This in turn suggests that internal and external customers, suppliers and vendors, and other
colleagues and points of coordination along a product or service delivery chain could likely add
value as sources of information about a given job, work function, or work process in a more broadly
conceived, 360-degree approach to work analysis. Depending on the organizational setting, this
principle could be extended to encompass such additional data sources as organizational strategists
and business planners, job and work process designers, strategically targeted worker subpopula-
tions that cross nominal job boundaries (e.g., knowledge workers and international workers), and
various subsets of contingent workers (e.g., contract and life-of-project workers, teleworkers, and
virtual team members). However, when nonincumbents are used to provide work analysis ratings, it
is important that they have sufficient opportunity to gain first-hand familiarity with the focal job (or
other unit of analysis involved), such as through job observation or interviews, rather than making
judgments based solely on review of written material (lists of tasks, duties, and responsibilities) that
is unlikely to provide the insights necessary to make well-informed judgments (e.g., when estimat-
ing KSAO requirements).
Work Analysis Data Types and Level of Analysis

As illustrated in Table 4.1, there are various work descriptors—specific types of information about
work, workers, and the work environment—on which data can be collected as part of a work analy-
sis. Choices among these should be informed by the purposes of the analysis. As Table 4.2 and
our work analysis methods review showed, there are one or more methods or instruments avail-
able for all combinations of descriptor type and level of analysis. Nonetheless, not all such com-
binations are equally well covered, nor do existing methods address all aspects of work analysis
data content that may be of interest or relevance for certain applications or circumstances. For
example, highly cognitive and highly interpersonally oriented work content remain a challenge,
because these domains are not easily represented in task inventories, rooted as they are in largely
unobservable or nuanced behavior. The personality requirements of work also have not been well
represented in traditional work analysis methods, although there are signs of progress in this area,
as described earlier. Finding ways to represent and capture the dynamic nature of work has been a
long-standing problem in work analysis. Methods from other disciplines are available for describ-
ing dynamic work processes, such as the flow, interaction, and strategic impact of work processes
across functions and time. One such method is work process mapping (Brannick et al., 2007), in
which relationships of tasks and work roles to one another and to specific work goals are displayed
in flowchart form. Several innovative approaches along these lines have been detailed by Barney
(2000), such as impact mapping and “strategic modeling scales,” analytic methods for linking work
tasks and worker attributes to an organization’s broad strategic goals. Such techniques could be
helpful supplements to more traditional work description but have not as yet found their way into
mainstream work analysis practice.
Other potentially fruitful types of work analysis data are rooted in the dynamic and relativistic
nature of work and jobs to which we alluded earlier. Job responsibilities are differentially construed
by incumbents as a function of their past experiences and interpretation of the job; that is, they draw
conclusions regarding the meaning of their assigned work on the basis of their own unique process of
sense-making. The resulting variability in incumbents’ interpretations of their jobs leads to different
behavioral patterns of work, even when the work environment is held constant across incumbents.
Without diminishing the value of an “average” work profile resulting from the use of standardized
task- and behavior-based work descriptors (the “convenient truth” to which we alluded earlier),
this situation nonetheless implies potential value in supplementing such data with an approach that
permits a description of how individuals subjectively experience work. For example, this might be
accomplished by having incumbents create and complete their own task or work behavior invento-
ries. Comparison of results across incumbents could reveal important differences in job interpreta-
tion and lead to useful insights having implications for employee selection (as well as work design,
performance management, and job satisfaction). Such an approach could yield similar benefits for
the evolving study of “emotional labor” (Kanfer & Kantrowitz, 2002; Morris & Feldman, 1996),
which explores the antecedents and consequences of various types of work (especially prevalent in
retail, hospitality, sales, customer service, and personal services work) that requires incumbents to
display certain organizationally desired emotions (e.g., empathy when handling a customer com-
plaint) and repress some genuine emotions. For example, work analysis data regarding incumbents’
subjective experience of such emotional demands might point to personality factors of potential
value in selection that would not come to light using more conventional analytic methods.
We touch on descriptor level-of-analysis issues later in this chapter (in discussing the challenges
of large-scale work analysis systems), but a few key points or issues for practitioners are worth not-
ing at this juncture. One perhaps obvious, but still important, point for practitioners is that the more
molecular the level of analysis selected for use in a work analysis, the larger will be the number of
elements (specific descriptors) needed for adequate job or work description. For example, many jobs
can be described in terms of two to five general responsibilities or broad duties, whereas a descriptor
representing a more specific level of analysis (e.g., tasks performed) might require several hundred
task statements to fully describe each of those jobs. This in turn often implies that, other things
equal, more data will have to be collected and that the work analysis process will be correspond-
ingly more costly and labor-intensive than when a broader level of description is involved. Another
point to recognize is that it is always possible to “aggregate up,” or combine data gathered on specific
descriptors into broader descriptor categories, but it is impossible to disaggregate broader descriptor
data to a level of detail on which data have not been collected. The implication of this for practice
is to use the most specific level of analysis for which any future use of the data may be planned or
anticipated. A corollary recommendation is, to the degree that either multiple applications are cur-
rently at hand or unexpected applications may emerge in the future, to collect data on descriptors
representing multiple levels of analysis, perhaps with a bias toward gathering more rather than less
detailed information; this will facilitate the use of the work analysis results in support of more than
one application. A final point, applicable to situations in which multiple jobs are to be analyzed and
compared, is the importance of recognizing that the use of highly specific descriptors (e.g., tasks)
tends to exaggerate the degree of between-job differences, whereas the use of very broad descriptors
(such as abilities) tends to minimize such differences. This is why descriptors at a moderate level
of analysis, such as PAQ dimensions and their derivatives (generalized work behaviors), have often
been found optimal for such applications.
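The asymmetry between aggregating and disaggregating descriptor data can be illustrated with a small numerical sketch (the ratings, tasks, and duty groupings below are hypothetical, and the example is in Python): task-level importance ratings can always be averaged up into the broader duties to which the tasks belong, but nothing in a duty-level rating allows the analyst to recover the task-level detail afterward.

# Hypothetical task-level importance ratings (1-5 scale) and a task-to-duty mapping.
task_importance = {
    "answer customer calls": 4.6,
    "log call outcomes": 3.8,
    "prepare monthly call report": 3.2,
    "train new call agents": 2.9,
}
duty_of_task = {
    "answer customer calls": "customer contact",
    "log call outcomes": "customer contact",
    "prepare monthly call report": "administration",
    "train new call agents": "administration",
}

# "Aggregating up": duty-level importance as the mean of its tasks' ratings.
grouped = {}
for task, rating in task_importance.items():
    grouped.setdefault(duty_of_task[task], []).append(rating)
duty_importance = {duty: round(sum(r) / len(r), 2) for duty, r in grouped.items()}
print(duty_importance)  # {'customer contact': 4.2, 'administration': 3.05}

# The reverse is not possible: given only duty_importance, there is no way to
# reconstruct the individual task ratings that produced each duty-level value.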

Work Analysis Data Collection Methods

Various data collection methods are available to work analysts, including such things as observa-
tion, interview, questionnaire, diaries, review of work-related documents or records, various forms
of technology-assisted (audio or video) event recording or electronic monitoring, and direct work
participation. Choice of method is dictated chiefly by such factors as the nature of the jobs to be
covered in the analysis, the characteristics of the job incumbents, budget and time constraints, and
(of course) the purpose of the analysis. Earlier we noted the broad distinction that can be drawn
between data collection processes that are primarily qualitative (such as observation- and interview-
based methods) and those that are primarily quantitative (such as structured-questionnaire-based
methods), regarding which several broad characterizations can be made. Qualitative methods tend
to produce highly customized and detailed job-specific information that is invaluable for gaining
an in-depth understanding and feel for a job. By the same token, such information tends to pre-
clude generalization, so the data collection process must be repeated for each job to be analyzed.
Consequently, such methods are often costly, time-consuming, and produce information that is dif-
ficult to keep current or use as a basis for cross-job comparison. Quantitative methods (especially
if automated, which is increasingly the case) allow many different jobs to be analyzed simultane-
ously and quickly, on the basis of quantitative input from many respondents, in terms of the same
variables or characteristics and using the same metrics, thereby facilitating cross-job comparisons.
Consequently, they are generally more efficient, more cost-effective, and (potentially) more reliable
and accurate (although these can be elusive concepts, as we discuss later). On the other hand, they
generally preclude the collection of highly customized or job-specific information.
An overarching issue affecting any data collection processes involving human sources is the
potential for distortion of job information, with or without conscious intent. (Morgeson & Campion,
1997, detailed the many facets of this issue.) For example, incumbents may be motivated for
various reasons to present a positive image of their jobs and are thus prone to inflating their ratings
of task or KSAO importance. The reverse is also possible, as in the case of information sources
such as supervisors or second-level managers underrating the importance of their subordinates’ job
responsibilities so as to elevate the importance of their own roles. We further explore the issue of
work analysis information accuracy and quality below, but one implication of such possibilities is a
recommendation to use more than one data collection methodology whenever possible (e.g., interviews
followed by a structured questionnaire), making it possible to check for convergence between the
information gathered through different methods. This may not be as impractical as it might initially
seem, because the development of a structured work analysis survey generally requires the collection
of qualitative data via interviews or job observation as input into the survey development process.
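One simple way to operationalize the recommended convergence check, sketched below under the assumption that the same set of tasks has been rated for importance through two different methods (e.g., interview-derived ratings and a structured questionnaire), is to correlate the two importance profiles; a low correlation would flag tasks or sources needing follow-up. The data and the interpretive cutoff here are hypothetical illustrations.

# Hypothetical importance ratings (1-5) for the same six tasks obtained by two methods.
interview_ratings = [4.5, 3.0, 4.0, 2.5, 3.5, 4.8]
questionnaire_ratings = [4.2, 3.4, 3.9, 2.8, 3.1, 4.6]

def pearson_r(x, y):
    """Pearson correlation between two equal-length rating profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson_r(interview_ratings, questionnaire_ratings)
print(f"Method convergence r = {r:.2f}")
# A notably low value (the exact cutoff is a judgment call) would prompt follow-up
# interviews or a review of the questionnaire items before the data are used.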
INFERENTIAL LEAPS AND LINKAGES IN WORK ANALYSIS

Gatewood et al. (2008) outlined four types of inferential leaps that come into play when work analy-
sis is applied to employee selection, including (a) the translation of work content information into
worker attribute information, (b) the translation of work content information into work performance
or criterion measures, (c) the translation of worker attributes into actual selection instruments, and
(d) the inferential leap between selection instruments and performance measures. It is important to
recognize that not all inferences are “created equal,” in terms of magnitude or difficulty to make.
Inferences involving fewer or more concretely based judgments or interpretations are generally
of smaller magnitude and easier to make than those involving larger or more abstract judgments.
Hence, inferences b and c tend to be more tractable than inferences a and d. Perhaps the most readily
visible operation of these inferential leaps is in the context of validity evidence development (i.e.,
content validity) applications of work analysis, in which it is entirely possible for all four types of
inferences to be required depending on the specific circumstances. For example, the development
of professional or occupational certification or licensure tests (a significant and highly consequen-
tial application of content validity) typically begins with a detailed analysis of a job or occupation
in terms of its tasks, duties, and responsibilities. Subsequently, this information is used to infer the
important knowledge components (and their relative weights) of the occupation (inference a) and
may at times be used to develop one or more performance or criterion measures (inference b) for
use in future criterion-related validation studies conducted to augment the content validity evidence.
The knowledge requirements are then further specified and translated into a test plan on the basis
of which items are constructed to measure all of the required knowledge areas in the appropriate
proportions (inference c). The test’s content validity may be established through several means,
including appropriate documentation of all the steps and judgments just described, the use of one
or more statistical indices (summarized and evaluated by Lindell & Brandt, 1999) available for
evaluating the strength of the test-performance relationship (inference d) on the basis of item-level
job-relatedness judgments of SMEs and eventual examination of the empirical relationship between
test and job performance (also inference d), as just noted.
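The translation of knowledge-area weights into a test plan (inference c) is, at its simplest, an exercise in proportional allocation. The sketch below is a hypothetical illustration of that arithmetic, not a prescribed procedure: given judged weights for four knowledge areas and a target test length, it allocates item counts in proportion to the weights, rounding so that the counts still sum to the intended total.

# Hypothetical knowledge-area weights from the work analysis (summing to 1.0)
# and a target test length.
weights = {"domain A": 0.40, "domain B": 0.30, "domain C": 0.20, "domain D": 0.10}
total_items = 75

# Proportional allocation with largest-remainder rounding so counts sum to total_items.
raw = {area: w * total_items for area, w in weights.items()}
counts = {area: int(v) for area, v in raw.items()}
shortfall = total_items - sum(counts.values())
for area in sorted(raw, key=lambda a: raw[a] - counts[a], reverse=True)[:shortfall]:
    counts[area] += 1

print(counts)  # e.g., {'domain A': 30, 'domain B': 23, 'domain C': 15, 'domain D': 7}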
Content validity applications notwithstanding, it is inference a—the use of job or work infor-
mation to determine the worker attributes needed to effectively carry out a job’s activities and
perform its required behaviors—that has received the most attention in the general application of
work analysis to predictor development. Several methods have been developed to bring a degree of
rigor to this judgment process. In some applications, such as in the O*NET system, the importance
of various attribute requirements is rated or estimated directly by incumbents on the basis of their
own job knowledge and experience with the aid of task- and behavior-anchored rating scales. Other
researchers have proposed an explicit set of “linkage” ratings (Goldstein, Zedeck, & Schneider,
1993) for determining KSAO importance on the basis of the strength of the judged relationship
between individual tasks and individual KSAOs (hence these are considered indirect attribute
importance estimates). Research has shown that SMEs are indeed capable of forming reliable link-
age ratings, although analysts’ judgments may be somewhat more reliable than those of incumbents
(Baranowski & Anderson, 2005). Still another stream of research has explored the underlying cova-
riance between task and KSAO ratings, even suggesting the possibility of empirical derivation of
KSAOs from task ratings (Arvey, Salas, & Gialluca, 1992; Goffin & Woycheshin, 2006; Sanchez
& Fraser, 1994). Fleishman and colleagues showed how abilities can be mapped onto empirically
derived dimensions of performance (Fleishman & Quaintance, 1984), whereas McCormick’s PAQ
research (McCormick et al., 1972) uncovered empirical relationships between job elements and
worker attributes via factor analysis. More recent research along similar lines using the initial
O*NET database also found meaningful linkages among a wide range of work and worker char-
acteristics, including abilities, generalized work activities, work styles, knowledge, occupational
values, skills, and work context (Hanson, Borman, Kubisiak, & Sager, 1999). In summary, there
is strong evidence of meaningful covariation between ratings of different work-, worker-oriented,
and attribute-oriented descriptors, suggesting that SMEs’ inferences about attribute requirements—
whether via direct or indirect estimation—are well grounded in their judgments of work activities
or job tasks. Although the specific processes and individual judgment policies through which such
ratings are made remain a matter for further research, these findings nonetheless provide reasonably
strong underpinnings for the various methods and techniques that have been developed and used to
make such linkages in practice.
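A minimal sketch of the indirect (linkage-based) estimation approach is shown below, under the assumption that SMEs have rated task importance and the strength of each task-KSAO relationship on simple numeric scales; the overall importance of each KSAO is then taken as the task-importance-weighted average of its linkage ratings. The tasks, KSAOs, scales, and weighting rule are hypothetical illustrations rather than the specific procedures of any of the studies cited above.

# Hypothetical SME data: task importance (1-5) and task x KSAO linkage ratings
# (0 = no relationship ... 3 = essential to task performance).
task_importance = {"greet customers": 4.0, "resolve billing disputes": 4.5, "file paperwork": 2.0}
linkage = {  # linkage[task][ksao]
    "greet customers":          {"oral communication": 3, "numerical reasoning": 0, "conscientiousness": 1},
    "resolve billing disputes": {"oral communication": 2, "numerical reasoning": 3, "conscientiousness": 2},
    "file paperwork":           {"oral communication": 0, "numerical reasoning": 1, "conscientiousness": 3},
}

def ksao_importance(ksao):
    """Task-importance-weighted mean of the linkage ratings for one KSAO."""
    numerator = sum(task_importance[t] * linkage[t][ksao] for t in task_importance)
    denominator = sum(task_importance.values())
    return numerator / denominator

for k in ("oral communication", "numerical reasoning", "conscientiousness"):
    print(f"{k}: {ksao_importance(k):.2f}")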

EVALUATING WORK ANALYSIS QUALITY

Primoff and Fine (1988), describing the state of work analysis in the first part of the 20th century,
noted, “Although described by Munsterberg, the idea that the validity of a job analysis was a circu-
lar process dependent on the validity of the products for which the job analysis was undertaken was
not yet widely understood” (p. 17). After characterizing work analysis as “analytic fact gathering
about job-worker situations designed to meet a particular purpose and satisfy a particular need,”
they noted that “not until the outcome of the job analysis, the need for which it was undertaken, is
scrutinized can we determine how complete, accurate, and objective the analysis was, and how effi-
cient were the mode and extent of the fact gathering” (p. 14). They went on to point out how the early
20th-century “scientific management” movement in industry led to a focus on the methodologies
of fact-gathering as such, which they asserted are secondary to more fundamental conceptual chal-
lenges of work analysis, including the importance of recognizing “that the validity of a job analysis
is integral with the validity of the products derived from it, whether a selection test, a performance
evaluation scale, or a training curriculum—the validity of the one is the validity of the other” (p.
14). It is hard to imagine a clearer or more forceful argument for what has since been termed work
analysis “consequential validity” (Sanchez & Levine, 2000), linking this notion back to the very
roots of industrial-organizational (I-O) psychology.
Discussion of how best to evaluate the accuracy, quality, and meaning of work analysis data has
been going on for a long time, but has been given more recent impetus by various empirical studies
(Wilson, 1997) and conceptual/theoretical work (Morgeson & Campion, 1997, 2000) illustrating
how potentially difficult it is to collect accurate and valid job or work information—or even to agree
on what this means. For example, is it inter-rater agreement, convergence among different data
sources, or convergence between work analysis output and some external standard or benchmark?
This has in turn led to some spirited discussion and exchanges on the subject and continued debate
as to the most appropriate standards for evaluating work analysis. Cogent arguments have been
made to the effect that (a) work analysis data accuracy is a meaningful concept and an attainable
goal (Harvey & Wilson, 2000); (b) such accuracy is an ambiguous and elusive concept that may or
may not be attainable, depending on how it is defined (Morgeson & Campion, 2000); and (c) such
accuracy is neither meaningful nor attainable because of the dynamic and relativistic nature of work
and jobs (as we described earlier) and the consequent absence of a “gold standard” or true score on
which to evaluate accuracy and is in any case subsumed by the more relevant standard of conse-
quential validity (Sanchez & Levine, 2000). These arguments involve several methodological and
even philosophical complexities that are beyond the scope of this discussion, but we also note that
they are somewhat confounded by differing frames of reference and definitions of terms. Despite
this, there seems to be little disagreement about the ultimate and overarching importance of the
validity of the inferences made from work analysis data, that is, the effectiveness or utility of such
data for making sound personnel decisions, as advocated long ago by Munsterberg.
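To make one of the candidate evaluation criteria mentioned above concrete, the sketch below illustrates a basic inter-rater agreement check on incumbent importance ratings; the ratings, tasks, and flagging cutoff are hypothetical, and such a check addresses only one of the several possible standards (agreement, cross-source convergence, or comparison with an external benchmark) discussed in this literature.

# Hypothetical importance ratings (1-5) for four tasks from five incumbent raters.
ratings = {
    "task 1": [5, 4, 5, 4, 5],
    "task 2": [3, 4, 2, 5, 1],
    "task 3": [4, 4, 4, 3, 4],
    "task 4": [2, 2, 3, 2, 2],
}

def stdev(xs):
    """Sample standard deviation of a list of ratings."""
    m = sum(xs) / len(xs)
    return (sum((x - m) ** 2 for x in xs) / (len(xs) - 1)) ** 0.5

# Flag items on which raters disagree markedly (the cutoff here is arbitrary).
for task, rs in ratings.items():
    sd = stdev(rs)
    flag = "  <- review with SMEs" if sd > 1.0 else ""
    print(f"{task}: mean={sum(rs) / len(rs):.1f}, sd={sd:.2f}{flag}")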
Within the context of personnel selection, the inferences at issue range from those that are imme-
diately supported by work analysis data (such as inferring worker attributes from task importance
ratings) to those that are more distally supported by such data (such as inferring the validity of
work-analysis-based selection instruments by examining their correlations with performance mea-
sures). Although we recognize that such consequences provide only partial information about the
impact of work analysis on decision-making, we nonetheless believe there is considerable value in
thinking of work analysis in terms of its more broadly conceived consequential validity, because
this provides a framework for demonstrating and documenting to organizational leaders and stake-
holders its pivotal role in linking personnel selection (as well as many other HR practices) to critical
aspects of work performance (as revealed through work analysis). This value is further enhanced to
the degree that such practices can be shown to produce significant “returns on investment” (Becker,
Huselid, & Ulrich, 2001); for example, in the form of individual or organization-level performance
improvements.

FRONTIERS OF WORK ANALYSIS: EMERGING TRENDS AND FUTURE CHALLENGES
WORK ANALYSIS IN SUPPORT OF SELECTION FOR “HIGH-PERFORMANCE” WORKPLACES
The concept of high-performance (or high-involvement) organizations (HPOs; Lawler, Mohrman, &
Benson, 2001) is a direct outgrowth of the sweeping work and workplace changes to which we
alluded earlier that have been occurring over the last 20 years or so. It refers to organizations that
have incorporated into their strategy, culture, and structure various elements believed to maximize
their performance and ability to compete effectively in the global economy. These include such
workplace practices as (a) worker empowerment, participation, and autonomy; (b) the use of self-
managed and cross-functional teams; (c) commitment to superior product and service quality;
(d) flat organizational structures; (e) the use of contingent workers; (f) flexible or enriched design
of work that is defined by roles, processes, output requirements, and distal criteria (customer
satisfaction, contribution to organization values), rather than by (or in addition to) rigidly prescribed
task- or job-specific requirements; (g) rigorous selection and performance management processes;
and (h) various worker- and family-friendly HR policies that reward employee development and
continuous learning and support work-life balance. There is a growing body of evidence (e.g.,
Cascio & Young, 2005; others summarized in Gibson, Porath, Benson, & Lawler, 2007) that such
workplace practices can indeed contribute to important organization-level outcomes (e.g., financial
performance, productivity, and customer satisfaction), although these conclusions are far from
definitive (Boselie, Dietz, & Boon, 2005; Staw & Epstein, 2000).
Particularly relevant for our discussion is evidence within this larger body of research that links
such HPO-oriented workplace practices and outcomes with the individual worker attributes and
behaviors needed to affect them (Boselie et al., 2005; Guest, Conway, & Dewe, 2004; Guthrie, 2001;
Huselid, 1995; MacDuffie, 1995; Spreitzer, 1995). For example, formerly individual-contributor sci-
entists who have been reorganized into cross-functional teams with engineering and marketing
staff to improve a product delivery cycle may need social and communication skills in addition
to research skills (a sort of work context “main effect”). Work that has been redesigned to cre-
ate greater worker autonomy may improve motivation, and hence performance, among individuals
with high growth need strength but not in others (a work context “interaction effect”; Hackman &
Oldham, 1976). It is not enough for employees at Disneyland (“the happiest place on earth”) to sim-
ply work; they must “whistle while they work”—literally for seven particular work roles and figura-
tively for all others—so as to contribute to one of that setting’s most critical outputs (cheerfulness).
In other words, different organizational strategies—and how they are reflected in an organization’s
culture and structure—imply potential needs for additional or different (or different configurations
or weightings of) worker KSAOs, the measurement of which could enhance existing selection sys-
tems. Moreover, HPO-associated strategies have, in many cases, increased the organizational value
of various non-job- and nontask-specific performance criteria, such as contextual performance;
employee satisfaction, commitment, engagement, and retention; and avoidance of employee with-
drawal and counterproductive behaviors.
These developments imply the need for work analysis methods that incorporate measures
of a greater variety of work context factors—particularly those associated with or driven by an
organization’s vision, mission, strategy, structure, culture, and values—than are addressed in con-
ventional methods, which, if present at all (e.g., as they are in FJA, PAQ, and O*NET), tend to be
limited to those associated with specific jobs and work settings (work schedule, working conditions,
environmental hazards) rather than the broader organizational and external context (business strat-
egy, competitive environment, market conditions). Absent such measures, we may fail to detect the
need for potentially important or useful worker attributes and potentially critical selection proce-
dure validation criteria. For example, it is plausible that areas where noncognitive attributes (such
as personality traits, attitudes, and values) might have their greatest predictive value remain largely
unexplored because of our historical focus on more conventional (job- and performance-oriented)
criteria that ignore the broader organizational context of work. Models for explaining and under-
standing this are only just beginning to emerge (Tett & Burnett, 2003). We would go so far as to
argue that the definition, measurement, and mapping of the work environment—in effect, creating
a “common language” of work context—at multiple levels of analysis is the next major frontier in
work analysis. This is not a small challenge. It is a problem of long standing in psychology as a
whole (Frederiksen, 1972) and continues to be one of acute importance in I-O psychology today
(Johns, 2006).

STRATEGIC WORK ANALYSIS AND COMPETENCY MODELING

The developments described in the preceding subsection, viewed from a broader level, imply the
need for a fundamental rethinking of a work analysis enterprise that is largely rooted in the tra-
ditional, industrial-age workplace, an enterprise designed to support what Snow and Snell (1993)
characterized as staffing Model 1—the traditional practice of matching people to individual jobs.
As a result, most current selection-related work analysis practice is premised on what has increas-
ingly become a myth—that all the important human attributes an organization might want to mea-
sure in a selection system are discoverable by studying what individuals do on their jobs. This
bottom-up orientation, inherent in work analysis methods based on analyst observation or incum-
bent self-reporting of mostly observable job activities and conditions of work, largely ignores three
major top-down elements—organizational strategy, organizational structure, and organizational
culture—that reflect an organization’s vision and mission, drive much of its daily functioning, and
can (in some instances, profoundly) affect the choice, configuration, and relative importance of
KSAOs comprising an organization’s selection system, independent of the nature of the work per-
formed (i.e., specific jobs, work processes, or work outputs) by individuals across the organization.
(Williams & Dobson, 1997, provided specific examples of this.) Therefore this latter top-down per-
spective is highly relevant to the alternative, nontraditional staffing models being adopted by many
contemporary organizations that, for example, view staffing as a tool in strategy implementation
(Snow and Snell’s Model 2, applicable to organizations with clear strategies and known competitors)
or strategy formation (Model 3, applicable to organizations that need to develop or change strategies
quickly).
Such circumstances challenge us to devise work analysis methods that are better adapted to these
new or emerging organizational realities, as suggested in our earlier discussion of data collection
issues, such as (a) the use of multiple or combined work analysis methods and different units of
analysis; (b) the adaptation of innovative techniques from other disciplines; (c) the development and
deployment of enhanced work context descriptors; (d) the development and deployment of enhanced
descriptor sets for the cognitive, interpersonal, and temperament worker attribute domains; and
(e) the use of more diverse types of SMEs as data sources, as well as the use of conventional
data sources (incumbents) in innovative ways. Perhaps equally important, such challenges demand
work analysis methods that (preferably) begin with, or at least explicitly incorporate, various types
of (external and internal) organization-level analyses—market and demographic trends, competi-
tive environment, emerging technology, business and strategic plans, organizational culture and
style—as is routinely done in work analysis to support training system development and has been
similarly recommended for selection-oriented work analysis (Goldstein, 1997). This would provide
the critical context to facilitate more specific work analytic efforts, thereby also facilitating the
direct generation of worker KSAOs related to broader organizational criteria, strategies, and goals.
Such an approach could in turn provide a framework from which other, more broadly conceived,
selection-related applications of work analysis (in addition to the four outlined earlier) might be
explored and capitalized on. For example, one such application could be the provision (via ads, real-
istic job previews, or other recruiting tools) of customized information to applicants about various
aspects of work and worker requirements (e.g., context factors, career ladders and lattices based on
job interrelationships and skill or knowledge transferability) that are potentially related to applicant
attraction and subsequent organizational commitment. Another example is collecting work analy-
sis data on contextual and other factors relevant to selection for nontraditional or nonperformance
criteria, such as successful post-hire assimilation and socialization, or different levels of employee
“fit” (person-job, person-role, person-team, person-organization; Higgs, Papper, & Carr, 2000). Yet
another example is using work analysis questionnaires more in the mode of an organizational survey
(i.e., as an ongoing or regularly recurring intervention) rather than exclusively as a “one-and-done”
tool for work profiling; this could provide a measure of work content and work context stability/
volatility and offer insights into the nature of such content or context changes and their potential
implications for selection-related worker attributes.
All of this suggests a potentially useful reconceptualization of work analysis as organizational
strategy—that is, as a strategic tool—and hence characterized by a strong organization develop-
ment (OD) component, as has been suggested by Schippmann (1999) and Higgs et al. (2000). It
also suggests the need to bridge a historical and professional disconnect between those who have
tended to view jobs and work from a traditional, “micro” perspective (e.g., personnel and industrial
psychologists, training and education professionals, cognitive psychologists, occupational analysts,
industrial engineers, and human factors specialists) and those who have tended to look at work
from a more “macro” perspective (e.g., labor economists; sociologists; business and management
consultants; demographers; ethnologists; and clinical, social, and organizational psychologists). In
our view, the work analysis enterprise would be better served by an integration of these perspec-
tives, facilitated by much more interdisciplinary work among such professionals than historically
has been the case, as some have called for (Barney, 2000; Cunningham, 2000). Such a reframing
and associated changes in work analysis practice—and practitioners—underlie what we believe to
be a broader and potentially more useful concept of “strategic work analysis” (SWA) as a systematic
effort to identify or define current or anticipated work or worker requirements that are strategically
aligned with an organization’s mission and goals. This would subsume some other related terms and
practices in current use, such as future-oriented job analysis, strategic job analysis (which is some-
times used as a synonym for the prior term—and sometimes not), strategic job (or work) modeling,
and competency modeling, which, given its substantial impact on contemporary work analysis dis-
cussion and practice, we now consider in greater depth.
Competency modeling (CM) is a form of work analysis whose use has become widespread since
about the mid-1980s. Its proponents (chiefly practitioners and consultants) view it as a fundamen-
tally new and different form of work analysis that holds the key to developing integrated HR systems
that are aligned with an organization’s strategy, culture, or values. Its critics (chiefly researchers
and more traditional work analysis professionals) have viewed it as simply an alternative label for
conventional work analysis and have expressed doubts and concerns about its technical rigor and
overall validity (Barrett & Callahan, 1997), concerns that have been borne out in empirical evalu-
ations (Lievens, Sanchez, & De Corte, 2004; Morgeson, Delaney-Klinger, Mayfield, Ferrara, &
Campion, 2004). Discussions of the subject are complicated by the fact that there is no professional
consensus regarding the meaning of either “competency” as a concept or “competency modeling”
as a methodology. There appear to be almost as many definitions of these terms as there are users
of them (Schippmann et al., 2000). This problem is compounded by the fact that most existing defi-
nitions describe a complex and multifaceted concept, such as that offered by Spencer, McClelland,
and Spencer (1994). They define competency as a combination of motives, traits, self-concepts, atti-
tudes, values, content knowledge, or cognitive behavior skills and as any individual characteristic
that can be reliably measured or counted and that can be shown to differentiate superior from aver-
age performers. The difficulty with such definitions is that they lump together in a single construct
attributes representing vastly different domains, characteristics, and levels of analysis, which limits
its value conceptually (Clouseau-like, it means everything, therefore it means nothing) and practi-
cally (as a useful or measurable descriptor or unit of analysis in work analysis).
The typical output of a CM project is a set of worker attributes (competencies) believed to con-
tribute to an organization’s broad strategy and goals, culture, or values. As such, these attributes
are (at least in theory) applicable across the entire organization, or within large units or functional
areas, and thereby (in a project’s fullest application) able to serve as a common framework underly-
ing the various components of an integrated HR system, such as training and development, perfor-
mance management, compensation, and selection/promotion. Each competency is given a name or
label (and, in some cases, a definition) and is usually accompanied by a set of behavioral indicators
(BIs) that exemplify desirable (and, in some cases, undesirable, as well as “moderate”) behavioral
manifestations of the competency and thereby serve as the basis for measuring individuals’ stand-
ing on the competency. Multiple sets of BIs are often developed to address a given competency’s
manifestation across different job families (sales, engineering), across functional specialties within
a job family (account executive, technical consultant, customer service representative), or across
occupational levels within a single job. For example, a “systems thinking” competency (defined as
“making calculated decisions that take into account impact on other activities, units, and individu-
als”) might have different BIs for sales managers (“evaluates the impact on others before changing
work processes”) than for account team leaders (“helps staff understand how their function relates
to the overall organization”), as appropriate for these different roles.
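The kind of output described above lends itself naturally to a simple structured representation. The sketch below (with hypothetical field names and structure, drawing only on the illustrative competency and behavioral indicators quoted in this paragraph) shows how a competency, its definition, and its role-specific BIs might be stored so that the appropriate indicators can be pulled into, for example, a role-specific feedback survey.

# Hypothetical representation of one competency with role-specific behavioral indicators.
systems_thinking = {
    "name": "Systems thinking",
    "definition": ("Making calculated decisions that take into account impact "
                   "on other activities, units, and individuals"),
    "behavioral_indicators": {
        "sales manager": [
            "Evaluates the impact on others before changing work processes",
        ],
        "account team leader": [
            "Helps staff understand how their function relates to the overall organization",
        ],
    },
}

def indicators_for_role(competency, role):
    """Return the behavioral indicators defined for a given role, if any."""
    return competency["behavioral_indicators"].get(role, [])

print(indicators_for_role(systems_thinking, "sales manager"))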
We believe there is a huge chasm between the CM ideal envisioned by its proponents and the
actual practices that, under the rubric of CM, produce the type of output described above. In our
experience, the majority of such practices fall into one of two categories. The first category involves
entirely traditional work analysis, of one form or another, which leads to the development of sets
or taxonomies of well-defined, work-related (but not strategy-, culture-, or values-related) person
attributes (KSAOs) and associated metrics (behavioral or numerical rating scales, tests, or other
instrumentation) that meet accepted professional standards for such work. Such activities are
labeled as CM, and the KSAOs called competencies, to satisfy explicit or implicit requirements
of organizations or particular leaders. The second category purports to derive attributes related
to organizational strategy, culture, and values but entails the use of poorly conceived, incomplete,
or otherwise inadequate procedures (e.g., convenience samples engaged in unstructured group
discussions conducted without reference to individual or organizational work performance) that
lead to the development of ad hoc, idiosyncratic, ill-defined (or undefined) concepts or “folk
constructs”—ad hoc or “armchair” concepts or labels devised without reference to existing research
or theory—that are often little more than a “wish list” of desired worker attributes or purported
organizational values along with brainstormed (and typically unvetted and unvalidated) examples
(behavioral indicators) of good performance for each identified competency.
The above discussion is not meant to impugn the CM ideal of its proponents, but rather to high-
light the disconnect we perceive between this ideal and most contemporary CM practice, which
is either fairly rigorous but not explicitly “strategic” (the first category described above) or osten-
sibly strategic but not very rigorous, and hence ultimately unsuccessful in its strategic intent (the
second category described above). We would, in fact, classify this ideal as a third (although still
mostly unrealized) category of CM practice—one which combines the laudable goals of (a) linking
organizational strategy (and other organization-level variables and outcomes) to desired individual
employee attributes (an early model for which was presented by McLagan, 1988) and (b) utilizing
rigorous development methodologies of both conventional work analysis and other disciplines to
ensure the validity of these linkages, much as proposed by Schippmann (1999) and Barney (2000).
For example, the “traditional” Critical Incident Technique can be readily adapted to the genera-
tion of genuinely strategically driven competencies and associated BIs, requiring only a change
of frame of reference for incident generation from specific jobs to various organization-level vari-
ables, combined with the use of SMEs appropriate for this frame of reference. The unique aspect
of this category of CM practice is its explicit strategic organizational focus, without reference to
the work performed in any particular jobs. This is why we believe it is most appropriately regarded
as simply one particular form of the more broadly conceived SWA concept we proposed above,
and why (along with all of the conceptual and definitional ambiguities noted above) it has been
argued (Pearlman, 1997) that the terms “competency” and “competency modeling” be abandoned
altogether.
Viewing the CM ideal as one form of SWA (or, perhaps more precisely, as SWA with a particular
output format, as outlined above) clarifies and hence enhances our understanding of its potential
value to organizations. For example, CM proponents tout the utility of CM output for such purposes
as employee selection, promotion assessment, succession planning, and performance management
(including multisource feedback, via assembling and adapting BIs from an entire set of competen-
cies into a survey format). However, a less frequently recognized value of this particular SWA out-
put format, when rigorously developed, lies in the fact that it in effect functions as a set of relatively
concrete and meaningful performance standards for often abstract or complex constructs, as cus-
tomized for a given organization. As such, it serves a broader (and more subtle) purpose of influenc-
ing and (when built into performance management and feedback systems) incenting employees to
behave in ways that are consistent with the organization’s vision, mission, strategic objectives, and
values, especially to the degree that the competency model is widely communicated and promoted
by leaders throughout the organization, as is often the case. In other words, it serves as a strategic
tool to build and reinforce an organization’s culture.
To the degree that this is regarded as a major purpose of a given CM application, it arguably
should not be evaluated in terms of the same criteria as conventional work analysis methods used for
conventional HR applications. For example, it can be argued that, to the degree they achieve the goal
of influencing people to behave in strategically aligned ways within a given organizational context,
competency models that are idiosyncratic to particular organizations (as opposed to taxonomically
or theoretically driven) are of legitimate value (Becker et al., 2001), as would be consistent with the
OD, rather than psychometric, orientation of such applications—an assertion that gives new mean-
ing to the (tongue-in-cheek) adage, “It works great in practice, but it’ll never work in theory!” This
is not an argument for poor methodology or sloppy development work. Obviously, competencies
that are so nebulous or behaviorally “empty” as to defy organizationally meaningful definition and
behavioral referents will scarcely lend themselves to objective measurement or be of any organiza-
tional utility. As we noted above, this is where conventional work analysis methods, appropriately
adapted, can play a significant role by bringing methodological rigor to the development process.
The major need going forward, as we see it, is for creative thought and research addressing such
potential adaptations (such as the beginning efforts of Lievens et al., 2004, and Lievens & Sanchez,
2007), as well as the development of new data collection methods and approaches, to support all
varieties of SWA.

QUEST FOR A "COMMON LANGUAGE" AND THE CHALLENGE OF LARGE-SCALE, MULTIPURPOSE WORK ANALYSIS SYSTEMS
The concept seems simple. Develop comprehensive sets of standardized work- and worker-oriented
descriptors representing multiple levels of analysis and then determine their interrelationships
within a single analytic system that could thereby be used to derive the work content and worker
requirements of any job. Such a system, especially when fully automated (as is easily accomplished
nowadays), could serve as the basis for a powerful HR data and information system (or “human asset
management” system, in today’s jargon), underpinning and integrating numerous HR functions,
such as selection and staffing (especially validity evidence extension applications, because it is
ideally suited for cross-job comparison), training and career development, performance manage-
ment, and workforce planning. At a broader level it could provide the means for tracking trends and
changes in work content and occupational structure across the economy; for assessing and address-
ing national "skill gaps," and skill transferability and occupational portability issues; and for study-
ing selection and talent allocation issues at a national level. From a scientific standpoint, it would
constitute a critical research tool for advancing theory development regarding work performance
and the structure of work or occupations—in effect, moving us closer to a “unified theory of work”
(Vaughn & Bennett, 2002).
This notion of “a complete, universally applicable information system for human resources allo-
cation” (Peterson & Bownas, 1982, p. 49) based on taxonomic information about work, work envi-
ronments, and human attributes—a “common language” of people and jobs—has long been viewed
as something of a “holy grail,” enticing work analysis researchers and practitioners for the better
part of 80 years. Such a system, when fully realized, would have underpinnings of structure (mean-
ing logical interrelationships among both descriptor categories and specific elements within those
categories) and standardization (meaning common definitions, rules, and metrics) that thereby pro-
mote common understanding and usage of system elements among all users and stakeholders. This
was the driving vision behind the U.S. Labor Department’s Occupational Research Program of the
1930s and its development of the DOT, and was reflected to varying degrees in such later systems
as FJA, the PAQ, ARS, MOSAIC, and O*NET. In our view, no single system has as yet been able
to fully realize this vision, although O*NET probably comes the closest in terms of its scope and
analytic capabilities.
Despite the simplicity and elegance of the concept, the practical realization of such a system is
enormously complex. This led Higgs et al. (2000) to conclude that “most systems like this … have
held great conceptual promise but … have eventually died of their own administrative weight and
expense” (p. 108). Many complex choices and decisions must be made in the conception, design,
and implementation of such a system, depending on the applications or objectives at issue, such
as (a) descriptor coverage—how many and which work- and worker-oriented attribute domains
will be included in the system—and the associated question of whether the common framework
will be operationalized as a single set of a relatively limited number of descriptor elements rep-
resenting a single level of description, or as multiple descriptor sets or taxonomies representing
multiple attribute domains and levels of descriptions; (b) descriptor level of analysis (the breadth
or narrowness of descriptor definition, as well as whether to allow multiple levels of analysis
via the use of hierarchical descriptor element taxonomies); (c) whether descriptor coverage will
apply (or will be designed so as to allow or promote application) to work, to workers, or to both;
(d) whether individual jobs will be described exclusively in terms of descriptor sets that are used
across all jobs in the system or will also include some types of job-specific information (such as
tasks or tools/technology); (e) the characteristics of the metrics or scales by which descriptors will
be quantified; (f) the policy and deployment questions of how much and which parts (descriptors)
of a common framework will be required for use by all organizational units and which parts, if
any, can be user-specific, which speaks to the critical issue of gaining the support and cooperation
of multiple users and stakeholders, without which the system is unlikely to succeed; (g) devising
efficient and effective procedures for ongoing data collection; and (h) devising procedures for
maintaining and updating the system’s data structure, which itself involves numerous technical
and practical challenges (e.g., the dilemma of changing or incorporating new data elements to
respond to changed needs or realities while maintaining comparability and continuity with the
prior data structure).
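As a purely illustrative sketch of what the "structure" and "standardization" underpinnings described above might look like in a data system, the following fragment defines hypothetical descriptor taxonomies with common metrics and job profiles expressed on them. All names, fields, and scale choices here are assumptions made for illustration; they are not features of any existing system such as O*NET.

```python
from dataclasses import dataclass, field

@dataclass
class Descriptor:
    """A standardized work- or worker-oriented element with a common metric."""
    code: str               # stable identifier, e.g., "GWA.4.A.2" (hypothetical)
    label: str              # e.g., "Analyzing data or information"
    domain: str             # e.g., "generalized work activity", "ability", "skill"
    level_of_analysis: str  # e.g., "broad", "intermediate", "specific"
    scale: str = "level"    # common rating metric (level, importance, frequency)

@dataclass
class JobProfile:
    """A job described on the common descriptor framework, plus job-specific data."""
    title: str
    ratings: dict[str, float] = field(default_factory=dict)          # descriptor code -> rating
    job_specific: dict[str, list[str]] = field(default_factory=dict) # e.g., tasks, tools

def profile_similarity(a: JobProfile, b: JobProfile) -> float:
    """Crude cross-job comparison: proportion of shared descriptors with similar ratings.
    Standardized descriptors and metrics are what make this comparison possible at all."""
    shared = set(a.ratings) & set(b.ratings)
    if not shared:
        return 0.0
    close = sum(abs(a.ratings[c] - b.ratings[c]) <= 1.0 for c in shared)
    return close / len(shared)
```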
Despite this wide range of options, decisions, and challenges, we believe that the vision of such
a system continues to be both worthy and viable, if approached in manageable steps, segments, or
prototypes, on the basis of sound professional judgment, and undertaken with broad and high-level
organizational support.
SYNOPSIS AND CONCLUSIONS


Work analysis seems to have garnered a reputation as one of the less interesting and challenging
areas of I-O psychology and HR practice. (It has been noted, in a masterstroke of understatement,
that “job and occupational analysis is not a glamorous or high visibility area on which to build a per-
sonal career or secure tenure” [Mitchell & Driskill, 1996, p. 129].) This is surprising, given that its
subject matter—the who, what, where, when, why, and how of work—is at the core of most of what
these fields are concerned with. One possible explanation may lie in the fact that, as we noted at the
chapter’s outset, work analysis is rarely a destination; it is almost always a road—a way to get from
here to there. In the rare instances in which it is a destination (most commonly, in the conduct of work
analysis as documentation for an actual or potential lawsuit), it is not an eagerly anticipated one.
We hope that this brief “walk down the road” of work analysis in the context of personnel selec-
tion serves to change such perceptions. We believe that, to meet the types of challenges described
in the previous section, work analysis needs to be reconceptualized more broadly as a strategic,
multistep, multifaceted, and interdisciplinary effort that is at least as much a top-down process (i.e.,
one based on analysis and understanding of macro-organizational strategy and context factors) as a
bottom-up process (i.e., one based on analysis of what workers actually do). This implies the need
to rethink the conventional boundaries of work analysis—what it consists of, who does it (and with
what qualifications and organizational roles), and how it gets done. Such rethinking would promote
a transformation of the work analysis enterprise from one of merely gathering information to one
of generating insight, meaning, and knowledge about work. This would in turn contribute to theory
and practice. We believe that even modest strides in these directions would yield significant returns
in terms of improving the efficiency and effectiveness of the (broadly conceived) employee selec-
tion life cycle. Although such a shift in orientation may not immediately change the work analysis
enterprise from a road to a destination (nor necessarily should it), it will at least make the journey
more interesting and productive.

REFERENCES
Anderson, N., & Ostroff, C. (1997). Selection as socialization. In N. Anderson & P. Herriot (Eds.), International
handbook of selection and assessment (pp. 413–440). New York, NY: Wiley.
Arvey, R. D., Salas, E., & Gialluca, K. A. (1992). Using task inventories to forecast skills and abilities. Human
Performance, 5, 171–190.
Baranowski, L. E., & Anderson, L. E. (2005). Examining rating source variation in work behavior to KSA link-
ages. Personnel Psychology, 58, 1041–1054.
Barney, M. (2000). Interdisciplinary contributions to strategic work modeling. Ergometrika, 1, 24–37.
Barrett, G. V., & Callahan, C. M. (1997, April). Competencies: The Madison Avenue approach to professional
practice. In R. C. Page (Chair), Competency models: What are they and do they work? Practitioner forum
conducted at the meeting of the Society for Industrial and Organizational Psychology, St. Louis, MO.
Bartram, D. (2005). The great eight competencies: A criterion-centric approach to validation. Journal of Applied
Psychology, 90, 1185–1203.
Becker, B. E., Huselid, M. A., & Ulrich, D. (2001). The HR scorecard: Linking people, strategy, and perfor-
mance. Boston, MA: Harvard Business School Press.
Borman, W. C., & Motowidlo, S. J. (1993). Expanding the criterion domain to include elements of contextual
performance. In N. Schmitt, W. C. Borman, & Associates (Eds.), Personnel selection in organizations
(pp. 71–98). San Francisco, CA: Jossey-Bass.
Boselie, P., Dietz, G., & Boon, C. (2005). Commonalities and contradictions in HRM and performance research.
Human Resource Management Journal, 15, 67–94.
Brannick, M. T., Levine, E. L., & Morgeson, F. P. (2007). Job and work analysis: Methods, research, and appli-
cations for human resource management. Los Angeles, CA: Sage.
Bridges, W. (1994). Job shift: How to prosper in a world without jobs. Reading, MA: Addison-Wesley.
Cascio, W. F., & Young, C. E. (2005). Work-family balance: Does the market reward firms that respect it? In
D. F. Halpern & S. E. Murphy (Eds.), From work-family balance to work-family interaction: Changing
the metaphor (pp. 49–63). Mahwah, NJ: Lawrence Erlbaum.
Christal, R. E., & Weissmuller, J. J. (1988). Job-task inventory analysis. In S. Gael (Ed.), The job analysis hand-
book for business, industry, and government (Vol. II, pp. 1036–1050). New York, NY: Wiley.
Coates, J. F., Jarratt, J., & Mahaffie, J. B. (1990). Future work: Seven critical forces reshaping work and the
work force in North America. San Francisco, CA: Jossey-Bass.
Cunningham, J. W. (2000). Introduction to a new journal. Ergometrika, 1, 1–23.
Dawis, R. V. (1991). Vocational interests, values, and preferences. In M. D. Dunnette & L. M. Hough (Eds.),
Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 833–872). Palo Alto, CA:
Consulting Psychologists Press.
Dierdorff, E. C., & Morgeson, F. P. (2007). Consensus in work role requirements: The influence of discrete
occupational context on role expectations. Journal of Applied Psychology, 92, 1228–1241.
Droege, R. C. (1988). Department of Labor job analysis methodology. In S. Gael (Ed.), The job analysis hand-
book for business, industry, and government (Vol. II, pp. 993–1018). New York, NY: Wiley.
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department
of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166),
38295–38309.
Fine, S. A., & Wiley, W. W. (1971). An introduction to Functional Job Analysis: A scaling of selected tasks.
Kalamazoo, MI: Upjohn Institute.
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51, 327–358.
Fleishman, E. A., & Quaintance, M. K. (1984). Taxonomies of human performance. Orlando, FL: Academic
Press.
Fogli, L., & Whitney, K. (1998). Assessing and changing managers for new organizational roles. In R. Jeanneret &
R. Silzer (Eds.), Individual psychological assessment (pp. 285–329). San Francisco, CA: Jossey-Bass.
Frederiksen, N. (1972). Toward a taxonomy of situations. American Psychologist, 27, 114–123.
Gael, S. (Ed.). (1988). The job analysis handbook for business, industry, and government. New York, NY: Wiley.
Gatewood, R. D., Feild, H. S., & Barrick, M. (2008). Human resource selection (6th ed.). Mason, OH: Thomson/
South-Western.
Gibson, C. B., Porath, C. L., Benson, G. S., & Lawler, E. E., III. (2007). What results when firms implement
practices: The differential relationship between specific practices, firm financial performance, customer
service, and quality. Journal of Applied Psychology, 92, 1467–1480.
Goffin, R. D., & Woycheshin, D. E. (2006). An empirical method of determining employee competencies/
KSAOs from task-based job analysis. Military Psychology, 18, 121–130.
Goldstein, I. L. (1997). Interrelationships between the foundations for selection and training systems. In
N. Anderson & P. Herriot (Eds.), International handbook of selection and assessment (pp. 529–541).
New York, NY: Wiley.
Goldstein, I. L., Zedeck, S., & Schneider, B. (1993). An exploration of the job analysis-content validity process.
In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 3–34). San Francisco,
CA: Jossey-Bass.
Guest, D., Conway, N., & Dewe, P. (2004). Using sequential tree analysis to search for “bundles” of HR prac-
tices. Human Resource Management Journal, 14, 79–96.
Guthrie, J. (2001). High-involvement work practices, turnover, and productivity: Evidence from New Zealand.
Academy of Management Journal, 44, 180–192.
Hackman, J. R., & Oldham, G. R. (1976). Motivation through the design of work: Test of a theory. Organizational
Behavior and Human Performance, 16, 250–279.
Hanson, M. A., Borman, W. C., Kubisiak, U. C., & Sager, C. E. (1999). Cross-domain analysis. In N. G.
Peterson, M. D. Mumford, W. C. Borman, P. R. Jeanneret, & E. A. Fleishman (Eds.), An occupational
information system for the 21st century: The development of O*NET (pp. 247–258). Washington, DC:
American Psychological Association.
Harvey, R. J. (1986). Quantitative approaches to job classification: A review and critique. Personnel Psychology,
39, 267–289.
Harvey, R. J., & Wilson, M. A. (2000). Yes Virginia, there is an objective reality in job analysis. Journal of
Organizational Behavior, 21, 829–854.
Higgs, A. C., Papper, E. M., & Carr, L. S. (2000). Integrating selection with other organizational processes and
systems. In J. F. Kehoe (Ed.), Managing selection in changing organizations: Human resource strategies
(pp. 73–122). San Francisco, CA: Jossey-Bass.
Hogan, J., Davies, S., & Hogan, R. (2007). Generalizing personality-based validity evidence. In S. M. McPhail
(Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence
(pp. 181–229). San Francisco, CA: Jossey-Bass.
Holland, J. L. (1973). Making vocational choices: A theory of careers. Englewood Cliffs, NJ: Prentice-Hall.
Huselid, M. (1995). The impact of human resource management practices on turnover, productivity, and corpo-
rate performance. Academy of Management Journal, 38, 635–672.
Johns, G. (2006). The essential impact of context on organizational behavior. Academy of Management Review,
31, 386–408.
Johnston, W. B., & Packer, A. E. (1987). Workforce 2000. Indianapolis, IN: Hudson Institute.
Jones, R. G., Stevens, M. J., & Fischer, D. L. (2000). Selection in team contexts. In J. F. Kehoe (Ed.), Managing
selection in changing organizations: Human resource strategies (pp. 210–241). San Francisco, CA:
Jossey-Bass.
Kanfer, R., & Kantrowitz, T. M. (2002). Emotional regulation: Command and control of emotion in work
life. In R. G. Lord, R. J. Klimoski, & R. Kanfer (Eds.), Emotions in the workplace (pp. 433–472). San
Francisco, CA: Jossey-Bass.
Landy, F. J. (1988). Selection procedure development and usage. In S. Gael (Ed.), The job analysis handbook
for business, industry, and government (Vol. I, pp. 271–287). New York, NY: Wiley.
Landy, F. J. (2007). The validation of personnel decisions in the twenty-first century: Back to the future. In
S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity
evidence (pp. 409–426). San Francisco, CA: Jossey-Bass.
Landy, F. J., & Conte, J. M. (2007). Work in the 21st century (2nd ed.). Boston, MA: Blackwell.
Lawler, E. E., III, Mohrman, S. A., & Benson, G. (2001). Organizing for high performance: Employee involve-
ment, TQM, reengineering, and knowledge management in the Fortune 1000. San Francisco, CA: Jossey
Bass.
Levine, E. L., & Sanchez, J. I. (2007). Evaluating work analysis in the 21st century. Ergometrika, 4, 1–11.
Lievens, F., & Sanchez, J. I. (2007). Can training improve the quality of inferences made by raters in compe-
tency modeling? A quasi-experiment. Journal of Applied Psychology, 92, 812–819.
Lievens, F., Sanchez, J. I., & De Corte, W. (2004). Easing the inferential leap in competency model-
ing: The effects of task-related information and subject matter expertise. Personnel Psychology, 57,
881–904.
Lindell, M. K., & Brandt, C. J. (1999). Assessing interrater agreement on the job relevance of a test: A compari-
son of the CVI, T, rWG(J), and r*WG(J) indexes. Journal of Applied Psychology, 84, 640–647.
MacDuffie, J. P. (1995). Human resource bundles and manufacturing performance: Organizational logic and
flexible production systems in the world auto industry. Industrial and Labor Relations Review, 48,
197–221.
McCormick, E. J. (1976). Job and task analysis. In M. D. Dunnette (Ed.), Handbook of industrial and organi-
zational psychology (pp. 651–696). Chicago, IL: Rand McNally.
McCormick, E. J., Jeanneret, P. R., & Mecham, R. M. (1972). A study of job characteristics and job dimen-
sions as based on the Position Analysis Questionnaire (PAQ). Journal of Applied Psychology, 56,
347–368.
McLagan, P. A. (1988). Flexible job models: A productivity strategy for the information age. In J. P. Campbell,
R. J. Campbell, & Associates (Eds.), Productivity in organizations (pp. 369–387). San Francisco, CA:
Jossey-Bass.
McLarty, J. R., & Vansickle, T. R. (1997). Assessing employability skills: The Work Keys™ super-system. In
H. F. O’Neil, Jr. (Ed.), Workforce readiness: Competencies and assessment (pp. 293–325). Mahwah, NJ:
Lawrence Erlbaum.
McPhail, S. M. (Ed.). (2007). Alternative validation strategies: Developing new and leveraging existing valid-
ity evidence. San Francisco, CA: Jossey-Bass.
Mitchell, J. L., & Driskill, W. E. (1996). Military job analysis: A historical perspective. Military Psychology,
8, 119–142.
Morgeson, F. P., & Campion, M. A. (1997). Social and cognitive sources of potential inaccuracy in job analysis.
Journal of Applied Psychology, 82, 627–655.
Morgeson, F. P., & Campion, M. A. (2000). Accuracy in job analysis: Toward an inference-based model.
Journal of Organizational Behavior, 21, 819–827.
Morgeson, F. P., Delaney-Klinger, K., Mayfield, M. S., Ferrara, P., & Campion, M. A. (2004). Self-presentation
processes in job analysis: A field experiment investigating inflation in abilities, tasks, and competencies.
Journal of Applied Psychology, 89, 674–686.
Morris, J. A., & Feldman, D. C. (1996). The dimensions, antecedents, and consequences of emotional labor.
Academy of Management Review, 21, 986–1010.
Offermann, L. R., & Gowing, M. K. (1993). Personnel selection in the future: The impact of changing demo-
graphics and the nature of work. In N. Schmitt, W. C. Borman, & Associates (Eds.), Personnel selection
in organizations (pp. 385–417). San Francisco, CA: Jossey-Bass.
Pearlman, K. (1980). Job families: A review and discussion of their implications for personnel selection.
Psychological Bulletin, 87, 1–28.
Pearlman, K. (1997, April). Competencies: Issues in their application. In R. C. Page (Chair), Competency
models: What are they and do they work? Practitioner forum conducted at the meeting of the Society for
Industrial and Organizational Psychology, St. Louis, MO.
Pearlman, K., & Barney, M. F. (2000). Selection for a changing workplace. In J. F. Kehoe (Ed.), Managing selec-
tion in changing organizations: Human resource strategies (pp. 3–72). San Francisco, CA: Jossey-Bass.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests used to predict job
proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373–406.
Peterson, N. G., & Bownas, D. A. (1982). Skill, task structure, and performance acquisition. In M. D. Dunnette
& E. A. Fleishman (Eds.), Human performance and productivity: Vol. 1. Human capability assessment
(pp. 49–105). Hillsdale, NJ: Lawrence Erlbaum.
Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., & Fleishman, E. A. (Eds.). (1999). An
occupational information system for the 21st century: The development of O*NET. Washington, DC:
American Psychological Association.
Ployhart, R. E., Schneider, B., & Schmitt, N. (2006). Staffing organizations: Contemporary theory and prac-
tice. Mahwah, NJ: Lawrence Erlbaum.
Primoff, E. S. (1975). How to prepare and conduct job-element examinations (U.S. Civil Service Commission
Technical Study 75-1). Washington, DC: U.S. Government Printing Office.
Primoff, E. S., & Fine, S. A. (1988). A history of job analysis. In S. Gael (Ed.), The job analysis handbook for
business, industry, and government (Vol. I, pp. 14–29). New York, NY: Wiley.
Raymark, P. H., Schmit, M. J., & Guion, R. M. (1997). Identifying potentially useful personality constructs for
employee selection. Personnel Psychology, 50, 723–736.
Rodriguez, D., Patel, R., Bright, A., Gregory, D., & Gowing, M. K. (2002). Developing competency models to
promote integrated human-resource practices. Human Resource Management, 41, 309–324.
Sackett, P. R. (2003). Exploring strategies for clustering military occupations. In A. K. Wigdor & B. F. Green
(Eds.), Performance assessment for the workplace (Vol. II, pp. 305–332). Washington, DC: National
Academy Press.
Sanchez, J. I. (1994). From documentation to innovation: Reshaping job analysis to meet emerging business
needs. Human Resource Management Review, 4, 51–74.
Sanchez, J. I. (2000). Adapting work analysis to a fast-paced and electronic business world. International
Journal of Selection and Assessment, 8, 204–212.
Sanchez, J. I., & Fraser, S. L. (1994). An empirical procedure to identify job duty-skill linkages in managerial
jobs: A case example. Journal of Business and Psychology, 8, 309–326.
Sanchez, J. I., & Levine, E. L. (2000). Accuracy or consequential validity: Which is the better standard for job
analysis data? Journal of Organizational Behavior, 21, 809–818.
Schippmann, J. S. (1999). Strategic job modeling: Working at the core of integrated human resources. Mahwah,
NJ: Lawrence Erlbaum.
Schippmann, J. S., Ash, R. A., Battista, M., Carr, L., Eyde, L. D., Hesketh, B., et al. (2000). The practice of
competency modeling. Personnel Psychology, 53, 703–740.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. (1981). Task differences as moderators of aptitude test validity in
selection: A red herring. Journal of Applied Psychology, 66, 166–185.
Schraagen, J. M., Chipman, S. F., & Shalin, V. L. (Eds.). (2000). Cognitive task analysis. Mahwah, NJ:
Lawrence Erlbaum.
Secretary’s Commission on Achieving Necessary Skills. (1992, April). Learning a living: A blueprint for high
performance: A SCANS report for America 2000. Washington, DC: U.S. Department of Labor.
Snow, C. C., & Snell, S. A. (1993). Staffing as strategy. In N. Schmitt, W. C. Borman, & Associates (Eds.),
Personnel selection in organizations (pp. 448–478). San Francisco, CA: Jossey-Bass.
Spencer, L. M., McClelland, D. C., & Spencer, S. (1994). Competency assessment methods: History and state
of the art. Boston, MA: Hay-McBer Research Press.
Spreitzer, G. M. (1995). Psychological empowerment in the work place: Construct definition, measurement,
and validation. Academy of Management Journal, 38, 1442–1465.
Staw, B. M., & Epstein, L. (2000). What bandwagons bring: Effects of popular management techniques on
corporate performance, reputation, and CEO pay. Administrative Science Quarterly, 45, 523–559.
Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job performance. Journal
of Applied Psychology, 88, 500–517.
U.S. Department of Education, National Center for Education Statistics. (2002). Classification of instructional
programs: 2000 edition. Washington, DC: Author.
U.S. Department of Labor. (1965a). Dictionary of Occupational Titles (3rd ed., Vol. 1). Washington, DC: U.S.
Government Printing Office.
U.S. Department of Labor. (1965b). Dictionary of Occupational Titles (3rd ed., Vol. 2). Washington, DC: U.S.
Government Printing Office.
Vaughn, D. S., & Bennett, W. Jr. (2002, November). Toward a unified theory of work. Organizational simula-
tions and policy analyses (AFRL-HE-AZ-TP-2002-0014). Mesa, AZ: Air Force Research Laboratory.
Williams, A. P. O., & Dobson, P. (1997). Personnel selection and corporate strategy. In N. Anderson & P. Herriot
(Eds.), International handbook of selection and assessment (pp. 219–245). New York, NY: Wiley.
Wilson, M. A. (1997). The validity of task coverage ratings by incumbents and supervisors. Journal of Business
and Psychology, 12, 85–95.
Wilson, M. A. (2007). A history of job analysis. In L. Koppes (Ed.), Historical perspectives in industrial and
organizational psychology (pp. 219–241). Mahwah, NJ: Lawrence Erlbaum.
5 Current Concepts of Validity, Validation, and Generalizability

Jerard F. Kehoe and Kevin R. Murphy

In this chapter, we consider the ways psychologists’ understanding of the meaning of the construct
“validity” and of the links between the procedures used to investigate and establish validity and this
construct have evolved over the last 50 years (Angoff, 1988; Langenfeld & Crocker, 1994). At one
time, psychologists spoke of different types of validity and advocated several seemingly distinct
strategies for defining and estimating the validity of tests (American Psychological Association
et al., 1954; Guion, 1980; Landy, 1986). In particular, it was assumed that there were substantial and
important distinctions between content, construct, and criterion-related validity, and this assump-
tion was incorporated into the guidelines that still regulate the enforcement of equal employment
law in the United States (Guion, 1980; Kleiman & Faley, 1985; Uniform Guidelines, 1978). Other
variants, including synthetic validity (Lawshe, 1952) and convergent and discriminant validity
(Campbell & Fiske, 1959) were introduced, adding to the complexity of the debate about how to
define and understand validity.
Since the late 1980s there has been clear and steady evolution in our understanding of valid-
ity. First, it is now widely accepted that there are not many different types of validity; rather,
validity is now recognized as a unitary concept (American Educational Research Association
[AERA], American Psychological Association [APA], & National Council on Measurement in
Education [NCME], 1985). It is also widely accepted that validity is a property of the infer-
ences or generalizations made on the basis of test scores, not a property of the tests themselves
(e.g., Lawshe, 1985). That is, it does not make sense to label a test “valid” or “not valid.” The
inferences made about people on the basis of test scores (e.g., Sam received a high score on a
conscientiousness scale, therefore Sam is probably dependable and reliable) might or might not
be valid, and validation is a process of determining whether or not these generalizations about
individuals are correct and meaningful. There are various strategies for investigating validity,
some of which concentrate on content-based evidence, some of which concentrate on the rela-
tionships between test scores and other criteria, but the goal of all methods of validation is to
determine whether particular conclusions one draws about individuals on the basis of test scores
are valid.
In this chapter, we will make five key points in discussing the meaning of the construct validity
and its relationship with the procedures commonly used for validating tests and assessments:

1. Virtually any investigation into the meaning and implications of test scores can be thought
of as validation.
2. Topics such as reliability, scaling, and factor structure represent specialized niches within
the broader umbrella of validity.

3. The key concern of virtually all methods of validation is with testing the inferences and
generalizations made on the basis of test scores.
4. The validity of tests and assessments cannot be divorced from the uses and applications of
those tests.
5. Decisions about applied selection programs rely on a wide range of generalizations about
specific facets of the validity of the selection process.

CONVERGING TRENDS IN PSYCHOMETRICS—TOWARD A UNIFIED THEORY OF VALIDITY
The movement of validity from an emphasis on multiplicity (i.e., different types of validity) toward
unity is part of a broader trend in psychometrics; that is, the growing recognition that topics such as
reliability, item response theory, validity, factor analysis, and structural modeling are closely inter-
related, and investigations of the validity of test scores, the factor structure of test batteries, the sta-
bility of test scores over time, etc., are all concerned with the same general question—understanding
the meaning of test scores. Table 5.1 lists timelines that display major events in the development of
factor analysis, reliability, and validity; the central point in this table is that theory and research in all
three areas have moved in converging directions. McDonald (1999) advocated a unified view that
links scaling, factor analysis, reliability, and validity under a broad umbrella of concepts that all
have to do with understanding the meaning of test scores. In particular, the perspective he presents
emphasizes the concept that is central to this chapter, that is, that all investigations into the meaning
and implications of test scores can be thought of as aspects of validity.
For the first 50 years of systematic research in psychology, there was little consensus about the
meaning or the definition of validity. There was some agreement that a test might be thought of as
valid if it measured what it was designed to measure, or if it predicted some important outcome, but

there was no real agreement about how validity should be assessed or about the standards of evidence
for defining validity. In 1954, a joint committee of the APA, the AERA, and the National Council
on Measurement Used in Education released their Technical Recommendations for Psychological
Tests and Diagnostic Techniques (APA, 1954). This was the first of several sets of testing standards
released over the next half-century, and it served to define many of the key terms in psychological
measurement.

TABLE 5.1
Trends in the Development of Factor Analysis, Reliability, and Validity

1900–1930
  Factor analysis: Spearman introduces factor analysis; multiple factor analysis developed
  Reliability: Correction for attenuation; Spearman-Brown formula; classic true score model
  Validity: Limited consensus about definition or estimation of validity

1930–1960
  Factor analysis: Oblique rotation; hierarchical factor analysis
  Reliability: Coefficient alpha
  Validity: Technical Recommendations for Psychological Tests and Diagnostic Techniques (1954); predictive, concurrent, content, and construct distinctions

1960–1990
  Factor analysis: Procrustes rotation; confirmatory factor analysis; structural equation modeling
  Reliability: Item response theory; generalizability theory
  Validity: Standards for Educational and Psychological Tests and Manuals (1966, 1985); validity defined in terms of appropriateness of inferences from test scores

1990–present
  Factor analysis: Widespread availability and use of SEM
  Reliability: Reliability estimation via SEM
  Validity: Standards for Educational and Psychological Testing (1999); construct validity as an umbrella

The 1954 Technical Recommendations codified a view of validity that sorted validity into several different types: criterion-related, content, and construct validity. This view of valid-
ity is still central to equal employment law in the United States (Guion, 1980; Kleiman & Faley,
1985; Landy, 1986; Uniform Guidelines, 1978) but is no longer accepted as useful by researchers.
Subsequent testing standards explicitly rejected the idea of different types of validity and advocated
the idea that validity is a unitary concept (Standards for Educational and Psychological Testing,
1985, 1999), and that validity is a property of the inferences made on the basis of test scores, not a
property of the tests themselves (e.g., Lawshe, 1985).
The current consensus suggests that there are various strategies for investigating validity, some
of which concentrate on content-based evidence, some of which concentrate on the relationships
between test scores and other criteria, but the goal of all methods of validation is to determine
whether particular conclusions one draws about individuals on the basis of test scores are valid. The
overarching concept that is increasingly used to describe and refer to all types of validation research
is “construct validation” (Cronbach, 1988, 1989; Cronbach & Meehl, 1955). As with factor analy-
sis and reliability, the trend in thinking about validity has moved in a very clear direction—toward
a recognition of the fundamental links between virtually all methods of collecting, analyzing,
and interpreting data about the meaning of test scores (McDonald, 1999), all of which therefore
represent investigations of construct validity.
There has been a growing recognition that analyses that would once have been thought of as
addressing quite different questions (e.g., factor analysis, assessment of reliability) are better thought
of as parts of the general process of validation. One implication is that topics that have been tradi-
tionally thought of as quite distinct by personnel psychologists may need to be reconsidered from
the perspective of a more unified view of psychometrics. This trend is most evident when consider-
ing the question of whether there is any useful difference between reliability and validity.

RELIABILITY, VALIDITY, AND GENERALIZATIONS FROM TEST SCORES


Traditionally, reliability and validity have been presented as somewhat separate concepts. It has
long been understood that low reliability limits validity, in the sense that a test that is plagued by
measurement error is unlikely to be correlated with other measures. However, many if not most per-
sonnel psychologists would agree with Schmidt, Viswesvaran, and Ones’ (2000) assertion that “reli-
ability is not validity and validity is not reliability” (p. 901). Murphy and DeShon (2000b) argued
that the field of personnel psychology is seriously out of step with modern psychometric thinking.
For example, leading researchers in personnel psychology still take the parallel test model quite
seriously, even in situations where the measures being examined are clearly not parallel (Murphy &
DeShon, 2000a). More generally, the idea that reliability and validity are fundamentally different
concepts represents a throwback to an earlier generation of psychometric research and theory that
has largely been abandoned by serious psychometricians. There are sometimes useful distinctions
between the types of questions asked in analyses of reliability and validity, but the core concepts are
not as distinct as most psychologists seem to think they are.
Several authors have noted that reliability and validity are closely related concepts, and that the
traditional distinction between the two is an issue of style rather than substance (Dimitrov, 2002;
Graham, 2006; McDonald, 1999; Murphy & DeShon, 2000b). The idea that reliability and validity
are variations on a common theme is hardly a new one; Cronbach, Gleser, Nanda, and Rajaratnam
(1972) and Nunnally (1975) made this point over 30 years ago. Virtually all of the concerns of
reliability or generalizability theory can and should be subsumed under the broad heading of con-
struct validity.
The aim of validity is to understand the meaning of test scores. One way to do this is to develop
and test structural models of test scores, identifying the sources of variance in test scores. When you
say that a test shows a high level of construct validity, this is the same as saying that a large portion
of the variance in test scores is due to the construct the test is designed to measure. Measurement
error, in the classical sense, is one of many things that can cause test scores to imperfectly repre-
sent constructs. Generalizability theory (Brennan, 1983; Cronbach et al., 1972) suggests that there
is often variation in test scores that is due to things other than random measurement error or the
construct being measured (e.g., systematic variation attributable to measurement conditions, rater
differences, etc.); this theory lays out several methods of estimating these various effects.
Reliability assessment represents a very special, and possibly very small, niche within the larger
domain of construct validity. Classical test theory (CTT) deals with random measurement error,
which can be dealt with in relatively straightforward ways if the extent of error is known. When
the assumptions of CTT are met (e.g., observed scores can be sensibly divided into true scores and
random measurement errors), reliability theory can provide elegant and powerful approaches for
interpreting data, but the idea that one can or should investigate reliability without at the same time
thinking about validity is not a sensible one (Murphy & DeShon, 2000b). Several recent studies have
discussed or illustrated ways of linking assessments of reliability with assessments of construct
validity (Kraiger & Teachout, 1990; Murphy & DeShon, 2000a).
The goal of virtually all psychometric investigations is the same—to make sense of test scores. The
methods of evaluating reliability described here can all contribute to that goal. Knowing whether or
not test scores are stable over time (test-retest reliability), whether difference evaluators reach similar
conclusions (inter-rater reliability), or whether the different items on a test hang together (coefficient
alpha) can help in understanding what tests mean. It is not sensible, and indeed it is not clear that it
is possible, to investigate reliability without learning about validity or to investigate validity without
learning about reliability (McDonald, 1999; Murphy & DeShon, 2000b; Nunnally, 1975).
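Because coefficient alpha, test-retest correlations, and inter-rater indices are all mentioned above as sources of evidence about what scores mean, a minimal computational sketch may help. The item responses below are invented for illustration, and the function implements the standard Cronbach's alpha formula using plain NumPy rather than any particular psychometrics package.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an examinees x items matrix of scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented 5-item responses for six examinees (illustration only)
scores = np.array([
    [4, 5, 4, 4, 5],
    [2, 2, 3, 2, 2],
    [3, 3, 3, 4, 3],
    [5, 4, 5, 5, 4],
    [1, 2, 1, 2, 2],
    [3, 4, 3, 3, 4],
])
print(round(cronbach_alpha(scores), 3))  # internal-consistency estimate for these data
```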

VALIDATION PROCESS: LINKING TESTS WITH THEIR USES


In personnel selection, the most common use of tests and assessments is to help in making decisions
about job applicants, usually on the basis of test-based predictions of their future performance,
effectiveness, ability to learn, etc. The process of gathering evidence to evaluate the validity of
tests and assessments used in personnel selection usually involves collecting some combination of
evidence based on the relationship between test scores and measures of performance on the job or
in training, evidence based on a comparison between the content of tests and the content of jobs,
and evidence of the relationships between the constructs that underlie tests and the constructs that
underlie performance on the job or in training.

VALIDATION IN TERMS OF RELATIONSHIPS BETWEEN TESTS AND CRITERIA


One category of evidence that is useful for understanding the meaning of test scores is obtained by
examining the relationships between test scores and criteria, that is, variables that are of substantive
interest to a test user or researcher. The choice of criteria often depends on a combination of the
purpose of testing (e.g., if tests are used to make personnel selection decisions, variables such as job
performance are likely to be viewed as appropriate criteria) and the realities of the data collection
environment (e.g., measures of performance that are psychometrically preferable may be viewed as
burdensome and impractical).
Traditionally, the research designs used to gather criterion-related evidence have been classified
as predictive or concurrent. In predictive designs, scores on selection tests are obtained for a sample
of job applicants and are correlated with criterion measures obtained from these applicants at some
later time (e.g., post-hire measures). In concurrent designs, test scores and criterion measures are
often obtained at roughly the same time. However, it is not the temporal relationship between tests
and criteria that distinguishes these two designs, but rather the question of whether test scores play
a direct or indirect role in defining the sample. In predictive designs, test scores are not used to
select job applicants. Job applicants might be selected at random or might be hired on the basis of
screening criteria that are uncorrelated with the test being validated; however, in a predictive design,
the sample of individuals who provide predictor and criterion scores is assumed to be representative
of the broader population of applicants that the tests being validated will eventually be used to
screen. In concurrent designs, test scores, or measures that are likely to be correlated with test
scores, are used to screen job applicants. For example, the most common concurrent design is one
in which the scores of job incumbents (who were selected on the basis of the tests being validated
or other selection processes) are correlated with performance measures. In theory, this should lead
to potentially serious levels of range restriction and should substantially depress validity estimates.
However, several reviews have shown that concurrent validity estimates are generally quite similar
to estimates obtained in predictive designs (Barrett, Phillips, & Alexander, 1981; Guion & Cranny,
1982; Nathan & Alexander, 1988; Schmitt, Gooding, Noe, & Kirsch, 1984). In part, this is because
severe range restriction is fairly rare, and in part it is because correction formulas ignore many of
the factors that affect the outcomes of real-world validity studies.
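The range-restriction issue raised in this paragraph is usually handled with the familiar Thorndike Case II correction. The short function below is a sketch of that textbook formula (not code from any cited study); u is the ratio of the restricted to the unrestricted predictor standard deviation, and the numeric inputs are made up for illustration.

```python
import math

def correct_range_restriction(r_restricted: float, u: float) -> float:
    """Thorndike Case II correction for direct range restriction on the predictor.
    u = SD(predictor in restricted sample) / SD(predictor in applicant pool)."""
    r = r_restricted
    return (r / u) / math.sqrt(1 - r**2 + (r**2 / u**2))

# Example: observed concurrent validity of .25 with u = .70 (illustrative values)
print(round(correct_range_restriction(0.25, 0.70), 3))  # estimated unrestricted validity
```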
There have been hundreds, if not thousands, of studies examining the relationships between
personnel selection tests and measures of job performance, performance in training, and similar
criteria. One of the major tasks of personnel psychologists over the last 40 years has been to make
sense of this vast body of literature; meta-analysis and validity generalization methods have con-
tributed substantially to this task.

Validity Generalization
In a series of papers, Schmidt, Hunter, and their colleagues have argued that it is often possible to
generalize from existing research on test validity to draw conclusions about the validity of tests in
various settings. In particular, they have shown that it is often possible to generalize the findings of
existing validity research to new settings, thus providing a general solution to the same problem that
motivated the synthetic validity model (i.e., the problem of estimating validity in a particular context
without doing a new validity study in that context). Personnel psychologists had long assumed that
validity estimates were too low and that the results of validity studies were simply too inconsistent
from organization to organization or from job to job to permit this sort of generalization. Schmidt
and Hunter’s critical insight was that statistical artifacts, such as sampling error, range restriction,
and unreliability, had led personnel psychologists to underestimate the size and overestimate the
variability of validity coefficients.
Schmidt and Hunter (1977) developed a validity generalization model that estimates and corrects
for the effects of statistical artifacts (e.g., limited reliability and range restriction) on the distribu-
tions of test validity estimates. The Schmidt-Hunter validity generalization model suggests that in
almost every test, true validities are (a) substantially larger and (b) much less variable than psychol-
ogists have traditionally believed (Murphy, 2000, 2003; Salgado, Anderson, Moscoso, Bertua, &
de Fruyt, 2003; Schmidt, 1992; Schmidt & Hunter, 1977, 1980, 1998).
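As a rough illustration of the logic described here, the sketch below implements a "bare-bones" version of the kind of psychometric meta-analysis popularized by Schmidt and Hunter: it pools observed validities, removes the variance expected from sampling error alone, and then applies simplified corrections to the mean for criterion unreliability and range restriction. The study data, the default artifact values, and the order of the correction steps are assumptions for illustration; the published procedures involve refinements not shown here.

```python
import math

def bare_bones_vg(validities, ns):
    """Sample-size-weighted mean validity and residual variance after removing
    the variance expected from sampling error alone."""
    total_n = sum(ns)
    mean_r = sum(r * n for r, n in zip(validities, ns)) / total_n
    obs_var = sum(n * (r - mean_r) ** 2 for r, n in zip(validities, ns)) / total_n
    sampling_var = sum(((1 - mean_r**2) ** 2 / (n - 1)) * n for n in ns) / total_n
    return mean_r, max(obs_var - sampling_var, 0.0)

def correct_mean(mean_r, criterion_reliability=0.52, u=0.8):
    """Simplified corrections of the mean validity for criterion unreliability
    and direct range restriction (illustrative default artifact values)."""
    r = mean_r / math.sqrt(criterion_reliability)       # attenuation correction
    return (r / u) / math.sqrt(1 - r**2 + r**2 / u**2)  # Thorndike Case II

# Invented mini-database of local validity studies as (observed r, N) pairs
studies = [(0.18, 68), (0.31, 120), (0.22, 45), (0.27, 210), (0.12, 80)]
rs, ns = zip(*studies)
mean_r, residual_var = bare_bones_vg(rs, ns)
print(round(mean_r, 3), round(residual_var, 4), round(correct_mean(mean_r), 3))
```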
There is a large body of research examining the criterion-related validity of different selection
tests, particularly written tests that measure general or specific facets of cognitive ability. Validity
generalization analyses of these studies suggest that measures of cognitive ability are positively cor-
related with measures of performance in virtually all jobs, and that the results are sufficiently con-
sistent across hundreds of validity studies (after correcting for the effects of sampling error, limited
reliability, etc.) to support the inference that ability tests are valid predictors of performance in vir-
tually any job (Hunter, 1986; McHenry, Hough, Toquam, Hanson, & Ashworth, 1990; Ree & Earles,
1992; Schmidt & Hunter, 1998; Schmidt, Ones, & Hunter, 1992). Broad validity generalization has
also been claimed for personality traits such as conscientiousness (Barrick & Mount, 1991).
Since the late 1970s several variations of the basic model first proposed by Schmidt and Hunter
(1977) have been proposed. Murphy’s (2003) edited volume on validity generalization includes
excellent chapters that review the history of validity generalization models (Schmidt, Landy), pro-
pose new analytic models on the basis of maximum likelihood estimation (Raju and Drasgow) and
Bayesian statistics (Brannick and Hall), and discuss the conceptual and analytic challenges in apply-
ing these models to draw inferences about the validity of selection tests (Bobko and Roth, Burke
and Landis). Bayesian models for validity generalization appear to have considerable potential for
addressing the problem of how to interpret and apply validity generalization results (Newman,
Jacobs, & Bartram, 2007).

Synthetic Validity
When Lawshe (1952) introduced the method of synthetic validity, he noted that jobs can be broken
down into several components, and these components (e.g., supervision, obtaining information
from gauges and devices) are shared across several settings. Jobs differ in terms of their particu-
lar mixes of components, but the general building blocks are quite similar across many jobs and
organizations. Lawshe realized that if one could obtain estimates of the validity of tests for each
of these components, it would be possible to combine these to produce an estimate of the valid-
ity of the test or tests for any job, regardless of its specific content (Peterson, Wise, Arabian, &
Hoffman, 2001).
Implementing this method would certainly be a challenge; this method would require you to
describe jobs in terms of a relatively universal taxonomy of components, establish the validity of
different classes of tests for each component, then build a validity estimate on the basis of what
you know about the validities of the tests for each component and the intercorrelations of the tests
and the components, all of which might prove challenging. Nevertheless, the method has clear con-
ceptual appeal. However, despite more than 50 years of research on synthetic validity, this method
has not proved completely practical or successful (Steel, Huffcutt, & Kammeyer-Mueller, 2006).
Advances in meta-analysis have made this method more feasible (Steel et al., 2006), but probably
have also suggested that there may be simpler solutions to the problem of estimating validity in a
job where no validity studies exist.
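A minimal numerical sketch of Lawshe's idea, under heavy simplifying assumptions: if component-level validities are available and a job can be described as a weighted mix of those components, a job-level validity estimate can be assembled from the pieces. The components, weights, and validities below are invented, and the simple weighted average deliberately ignores the test and component intercorrelations that a full implementation would require, as noted in the text.

```python
# Hypothetical component-level validities for one test, pooled from prior research
component_validity = {
    "supervision": 0.20,
    "reading gauges and devices": 0.35,
    "customer interaction": 0.15,
}

# A target job described as a weighted mix of those components (weights sum to 1)
job_component_weights = {
    "supervision": 0.5,
    "reading gauges and devices": 0.3,
    "customer interaction": 0.2,
}

# Naive synthetic estimate: importance-weighted average of component validities.
synthetic_r = sum(job_component_weights[c] * component_validity[c]
                  for c in job_component_weights)
print(round(synthetic_r, 3))
```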

Consequential Validity
Messick (1988, 1989) suggested that we should consider the consequences of test use when evaluat-
ing validity. For example, the use of cognitive ability tests to make high-stakes decisions about indi-
viduals contributes to the racial segregation of some workplaces and schools (Gottfredson, 1986;
Herrnstein & Murray, 1994; Jensen, 1980). These tests appear to measure cognitive abilities reason-
ably well and they predict important criteria, but they also lead to very negative consequences for
some examinees. Should this unfortunate fact influence our evaluation of the validity of cognitive
ability tests? The answer probably depends on how narrowly or broadly one defines validity.
If validity is defined solely in terms of whether a test measures what it is supposed to measure
or predicts what it is supposed to predict, the consequences of test use might arguably be irrelevant
to validity, except where consequences provide evidence relevant to the intended meaning or use of
tests. If validity is defined in terms of an overall assessment of the value or the impact of a test, these
same consequences might be central to assessing validity.
One resolution to the question of whether or not the consequences of test use should factor
into evaluations of validity is to remember that tests are neither valid nor invalid; rather, they are
valid for some purposes and not for others. Thus, the validity of a test cannot be fully defined
without reference to the way the test will be used, and different perspectives on the purpose of
testing might lead to quite different assessments of validity. Thus, the same test might be seen as
valid for the purpose of maximizing expected performance while minimizing costs and invalid
for making decisions that do not unduly disadvantage members of particular racial or ethnic
groups.
A Multivariate Model for Criterion-Related Validity


Murphy and Shiarella (1997) noted that personnel selection is rarely if ever a univariate prob-
lem. Organizations use multiple tests and assessments to select employees, and constructs such
as performance or effectiveness have multiple dimensions. They proposed a general multi-
variate model for validity studies and noted that the validity of selection test batteries would
depend not only on the tests and the performance components, but also on the relative weight
given to these by the organization. Most critically, two organizations might define performance
in quite different terms (e.g., one might give a great deal of weight to task accomplishment,
whereas another gives substantial weight to contextual factors such as teamwork), and the rela-
tive weights given to performance dimensions and selection tests can substantially affect valid-
ity. Murphy (2010) extended the logic of this model to show how incorporating criteria such as
the likelihood that a test would lead to substantial adverse impact into the validation of a test
for personnel selection could radically change conclusions about which tests are most or least
valid for that purpose.
Although not cast in terms of synthetic validity, Murphy and Shiarella’s (1997) multivariate
model provides a framework for synthetic validation. That is, they present a general set of equa-
tions for linking scores on a set of tests with measures of a set of performance components, and
they allow for differences (across occasions, organizations, etc.) in the relative weight attached
to predictor tests and performance dimensions. It also presents a framework for integrating tra-
ditional conceptions of criterion-related validity (e.g., the prediction of performance) with more
controversial conceptions of consequential validity. For example, Murphy (2010) noted that
the purpose of a selection test is often multifaceted, and that performance and adverse impact
are likely to be variables of substantive interest to an organization. The models developed by
Murphy and Shiarella (1997) can be used to evaluate the validity of a test in a multivariate
criterion space.
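To make the weighting logic concrete, the following sketch computes the correlation between a weighted predictor composite and a weighted criterion composite from their intercorrelation matrices. It is an illustrative simplification rather than Murphy and Shiarella's (1997) own presentation, and every numerical value below is hypothetical.

```python
import numpy as np

def composite_validity(R_xx, R_yy, R_xy, w_x, w_y):
    """Correlation between a weighted predictor composite and a weighted
    criterion composite, given predictor intercorrelations (R_xx), criterion
    intercorrelations (R_yy), and predictor-criterion correlations (R_xy)."""
    cov = w_x @ R_xy @ w_y
    var_x = w_x @ R_xx @ w_x
    var_y = w_y @ R_yy @ w_y
    return cov / np.sqrt(var_x * var_y)

# Hypothetical values: two predictors (cognitive test, personality scale) and
# two performance dimensions (task performance, contextual performance)
R_xx = np.array([[1.0, 0.3], [0.3, 1.0]])
R_yy = np.array([[1.0, 0.4], [0.4, 1.0]])
R_xy = np.array([[0.50, 0.15],    # rows: predictors, columns: criteria
                 [0.10, 0.30]])

w_x = np.array([0.5, 0.5])            # equal predictor weights
task_heavy = np.array([0.9, 0.1])     # organization that values task accomplishment
context_heavy = np.array([0.1, 0.9])  # organization that values teamwork/context
for w_y in (task_heavy, context_heavy):
    print(w_y, round(composite_validity(R_xx, R_yy, R_xy, w_x, w_y), 3))
```

With these hypothetical matrices, the two criterion weightings yield noticeably different composite validities for the same set of tests, which is the central point of the model.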

CONTENT-BASED APPROACHES
All methods of validation share the same basic aim: to present evidence and theory that support
particular inferences made about test scores. One common method of validation involves examina-
tion of the content of a test using expert judgment and/or empirical analyses to assess the relation-
ship between the content of a test and the content domain the test purports to measure or predict
(AERA, APA, & NCME, 1985). The assumption underlying this method of validation is that if the
test is a representative sample of the domain, people who receive high scores on the test are likely
to exhibit a high level of the attribute the test is designed to measure.
In the context of personnel selection, the term “content validity” has taken on a unique and
special meaning—using assessments of the content of a test to make inferences about the likelihood
that individuals who receive high scores on a selection test will also perform well on the job
(Binning, 2007; Guion, 1998; Murphy & Davidshofer, 2005; Uniform Guidelines on Employee
Selection Procedures, 1978). In principle, this inference could be tested directly by standard
methods of criterion-related validation (i.e., by correlating scores on these tests with measures of job
performance). Content-oriented methods provide a different approach, in which inferences about
the predictive value of test scores are established on the basis of a systematic comparison between
the content of a test and the content of a job. If a systematic match between the content of a test
and the content of a job can be established, the usual inference is that people who do well on the test
will also do well on the job.
There are several different approaches that are used to assess the match between test content and
job content. One approach laid out in the Uniform Guidelines on Employee Selection Procedures
(1978) is to argue that the test is a valid predictor of performance if it can be shown to be a repre-
sentative sample of the work domain. Section 14C(4) of the Guidelines states,

To demonstrate the content validity of a selection procedure, a user should show that the behavior(s)
demonstrated in the selection procedure are a representative sample of the behavior(s) of the job in
question or that the selection procedure provides a representative sample of the work product of the job.

Similarly, the third edition of the Principles for the Validation and Use of Personnel Selection
Procedures (Society for Industrial and Organizational Psychology [SIOP], 1987) stated that a test is
content valid if it is “… a representative sample of the tasks, behaviors, or knowledge drawn from
that [job] domain” (p. 19). Under this definition, content validity can be established if experts agree
that a test adequately samples the portions of the job domain that are critical to adequate perfor-
mance. This strategy is based on the inference that people who do well on a sample of the job will
also do well if selected into that job.
An alternate strategy is to describe a test as content-valid if it measures knowledge, abilities, and/
or skills that are required for successful job performance. Section 14C(1) of the Uniform Guidelines
(1978) notes:

Selection procedures which purport to measure knowledge, skills, or abilities may in certain
circumstances be justified by content validity, although they may not be representative samples, if the
knowledge, skill, or ability measured by the selection procedure can be operationally defined … and if
that knowledge, skill, or ability is a necessary prerequisite to successful job performance.

Similarly, the Principles for the Validation and Use of Personnel Selection Procedures (SIOP, 2003)
state “Evidence based on test content may include logical or empirical analyses that compare the
adequacy of the match between test content and work content, worker requirements, or outcomes
of the job” (p. 6) and “Evidence for validity based on content typically consists of a demonstration
of a strong linkage between the content of the selection procedure and important work behaviors,
activities, worker requirements, or outcomes on the job.” (p. 21)
Linkage methods represent a variation on the performance requirements approach described
above in which subject matter experts are asked to make judgments about overlaps in the
knowledge, skills, and abilities (KSAs) needed to do well on a test and those needed to do well
on a job. Goldstein and Zedeck (1996) argued that “… when there is congruence between the
KSAs required to perform on the job and the KSA required to perform on the testing instrument,
then it should be possible to make inferences about how the test scores relate to job behavior.”
(p. 28; see also Goldstein, Zedeck, & Schneider, 1993). This method involves (a) linking KSAs to
specific job elements (i.e., making a judgment that specific KSAs are required for or contribute to
performance of specific aspects of a job), (b) linking KSAs to test items or to subtests (these steps
are often done by independent groups of experts; Goldstein & Zedeck, 1996, refer to this step as
retranslation), and (c) assessing the communalities between these KSA lists. If the same KSAs
are judged to be required for performing well on the test and for performing well on the job, then
the inference is made that people who do well on the test will also do well on the job.
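As a deliberately simplified illustration of this linkage step, the sketch below computes one possible index of test-to-job overlap: the share of test items whose linked KSA was also judged necessary for critical job elements. The KSAs, items, and judgments are hypothetical, and operational linkage studies rely on more elaborate rating and retranslation procedures than this.

```python
# Hypothetical expert judgments from two independent panels
ksa_to_job = {                 # KSA -> judged necessary for critical job elements?
    "data entry accuracy": True,
    "filing procedures": True,
    "customer rapport": False,
}
item_to_ksa = {                # test item -> KSA the item is judged to measure
    "item_01": "data entry accuracy",
    "item_02": "data entry accuracy",
    "item_03": "filing procedures",
    "item_04": "customer rapport",
}

# Share of test items whose linked KSA is also required on the job -- one
# simple summary of the communality between the two KSA lists
linked = [ksa_to_job.get(ksa, False) for ksa in item_to_ksa.values()]
overlap = sum(linked) / len(linked)
print(f"Share of items linked to job-required KSAs: {overlap:.2f}")
```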

Criticism of Content-Oriented Methods


There is a long history of criticism of content validity in the personnel psychology literature, most
notably a set of papers by Guion (1977, 1978a, 1978b), who claimed that “There is no such thing as
content validity …” (1978a, p. 212). Guion (1978b) went on to note that validity refers to inferences
from scores, not to the tests themselves, suggesting that an examination of test content might not
in principle be sufficient to support an inference that tests scores predict performance. His argu-
ment is that tests cannot be valid or invalid; even if items are representative of content domain,
this does not guarantee that scores obtained when administering a test made up of these items will
predict future performance on that domain. For example, bias in scoring could affect validity. Tests
might be so easy or so difficult that there is little variance in scores; severe restriction in range
will substantially limit the correlations between test scores and performance measures. Test items
might be representative of the job, but the responses people make to these items might have little or
nothing to do with responses to the same sort of content in a work setting (Guion, 1977; Murphy &
Davidshofer, 2005; Sackett, 1987).
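One of these threats, restriction in range, is easy to illustrate numerically. The sketch below applies the standard formula for direct range restriction to show how an unrestricted validity coefficient shrinks as the selected group's predictor standard deviation falls relative to the applicant pool's. The starting validity is hypothetical, and indirect restriction and other complications are ignored.

```python
import numpy as np

def restricted_validity(R, u):
    """Validity expected in a selected group whose predictor SD is u times the
    applicant-pool SD, given unrestricted validity R (direct restriction)."""
    return (u * R) / np.sqrt(1 + R**2 * (u**2 - 1))

R = 0.50   # hypothetical validity in the full applicant pool
for u in (1.0, 0.8, 0.6, 0.4):
    print(f"SD ratio {u:.1f}: observed validity ~ {restricted_validity(R, u):.2f}")
```

Even a test whose content closely matches the job can therefore show a weak observed correlation with performance when scores in the studied group have little variance.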

AN INTEGRATED FRAMEWORK FOR VALIDATION


Binning and Barrett (1989) proposed a general framework for integrating the various questions
about validity that might sensibly be asked in evaluating the validity of a selection test; this frame-
work is illustrated in Figure 5.1.
This framework highlights several aspects of validity that are likely to be of interest when evaluating
a selection test, including:

• Measure-to-measure relationships: How well do scores on the test correlate with perfor-
mance ratings (criterion-related validity)?
• Measure-to-construct relationships: How well do test scores represent cognitive ability?
How well do performance ratings represent job performance (construct validity)?
• Predictor-measure-to-criterion-construct relationships: Do test scores predict job perfor-
mance (as opposed to performance ratings)?

Typically, criterion-related validity studies start (and sometimes end) with measure-to-measure
relationships (e.g., the correlation between test scores and performance ratings). As the Binning
and Barrett framework makes clear, these correlations are not really the main focus of a validity
study. Validity studies usually start with the assumption that there are strong links between the
criterion measure and the construct it is supposed to represent; here this is equivalent to assuming
that performance ratings are good measures of job performance, an assumption that is probably not
true (Murphy & Cleveland, 1995). If there is strong evidence of the construct validity of criterion
measures, the Binning and Barrett (1989) framework suggested that it should be possible to make
strong inferences about the link between test scores and the performance construct (this link is
shown with a heavy line in Figure 5.1). However, establishing a strong link between the predictor
and the criterion construct requires, first and foremost, good criterion measures. Given the long and
sorry history of criterion measures used in personnel psychology (Austin & Villanova, 1992), it is
hard to be optimistic about our ability to make well-founded inferences about links between test
scores and poorly measured criteria.
This framework implies that construct validity issues may have different meanings for predic-
tors and criteria. The simplest case is one in which the test and the criterion show strong evi-
dence of construct validity. In that case, empirical correlations between tests and criterion measures
provide convincing evidence that the underlying constructs are related. On the other hand, it is

[Figure 5.1 depicts the linkages among the ability test, the performance rating, the cognitive ability construct, and the job performance construct.]
FIGURE 5.1 Relationships between constructs and measures. (Adapted from Binning, J. F., & Barrett, G. V., Journal of Applied Psychology, 74, 478–494, 1989.)

possible that a test that shows poor construct validity could nevertheless show good performance
as a predictor. The correlations between test scores and both criterion measures and the criterion
constructs they represent do not necessarily depend on the construct validity of the predictor. On
the other hand, poor construct validity on the criterion side renders empirical evidence of correla-
tions between predictors and criterion measures potentially useless. Nobody really cares if tests are
correlated with performance ratings; what is important is whether or not test scores are correlated
with performance.

GENERALIZING RESEARCH-BASED INFERENCES TO IMPLEMENTATION DECISIONS
This section shifts the focus to the link between the science of validation and decisions about
selection processes. Perhaps the most informative way to describe this link is in terms of the
generalizations necessary to translate general inferences about validity into decisions about the
design and implementation of local selection procedures. In this section, we address four topics
about these generalizations. First, we begin by describing the types of generalizations that support
implementation decisions. Second, we describe several examples of the generalizations required of
virtually all selection implementation designs. Third, we describe the extent to which the common
validation methodologies inform these local generalizations. Finally, we evaluate the strengths and
weaknesses of these validation methodologies and suggest possible improvements that would better
enable scientifically sound implementation decisions.

LOCAL GENERALIZATIONS
First, it is important to clarify the meaning of “generalization” in the context of this discussion
about implementation decisions. Generalizations that support (justify) implementation decisions
refer to inferences that previous validation research results apply in the specific local setting
in which a selection program is being designed and implemented. We refer to these as “local
generalizations.” Local generalization is somewhat different from the generalization that
underlies the typical scientific inference from empirical evidence to theoretical conclusion.
For example, the inference from research evidence that conscientiousness leads to low error
rates is an inference from data to a theoretical proposition. This inference is different from the
conclusion that, say, Hogan Personality Inventory conscientiousness scores will predict filing
accuracy among new clerical hires in XYZ Corporation. The latter conclusion is an inference
that a theoretical proposition applies in a specific local setting. This is a local generalization. The
guidelines in the Standards and Principles about selection validation methods as described above
are primarily focused on assuring meaningful generalizations from data to theory. Similarly, the
principles of scientific inference (e.g., see Cook & Campbell, 1979) describe the scientific controls
and conditions necessary for a set of data to lead to unambiguous, unconfounded theoretical
conclusions. In contrast, the Standards and Principles provide far less guidance about sound
generalizations from theory to specific settings.
In a modest effort to codify the basis for making sound local generalizations, we suggest that the
tradition of selection research and practice supports two qualitatively different rationales, either of
which can support local generalizations. The first is a sampling rationale. The essential feature of
this rationale is that important features of the local, specific setting such as applicant type, predic-
tor construct, and criterion construct are represented by the set of validation research results from
which one is making a local generalization. The main focus of this rationale is the claim that fea-
tures of the local setting are adequately sampled in some sense by the study conditions that produced
the set of results. It should be understood that there is, or could be, a theoretical component to this
sampling rationale. The theoretical component would serve to identify some finite set of features
of the local setting that are relevant to the use of the selection procedure. This set of theoretically
relevant features would determine the manner in which the local setting should be sampled by the
set of research results. Consider the example of an organization’s decision to include personality
assessment in an assessment center. Lievens, Chasteen, Day, and Christiansen (2006) argued that
the validity of personality assessments depends on the “strength” of the work situation. If that is an
accepted theoretical conclusion, then the sampling approach to local generalization would require
that the research samples adequately represent the situation strength profile of the local setting.
The second type of rationale supporting local generalizations is based on a purely theoretical
justification. In this case, the argument justifying local generalization is that the theoretical rationale
accounting for the validity of a selection procedure is judged to apply to the local setting because
the defined theoretical components are known to be relevant in the local setting or because the
theoretical explanation discounts major situational differences as being irrelevant to the theoretical
framework that explains the predictor-criterion relationship. An expert makes this judgment on
the basis of the logical implications of the accepted theoretical framework. Consider the example
of job component validity (JCV), which is a method for local generalization because its purpose
is to derive an estimate of validity in a specific setting on the basis of previous research results.
Hoffman, Rashovsky, and D’Egidio (2007) described this method of generalization as relying on a
theoretical framework linking validity to job components and a “gravitational” hypothesis justifying
assumptions about the meaning of long-term incumbents’ test scores. A JCV estimate is derived
from a theoretical rationale, not a sampling rationale.
Finally, it is important to note that local generalizations are equally relevant to conclusions
about local measurement properties and prediction properties. From an implementation perspec-
tive, measurement inferences are used to identify and attach meaning to the predictors and criteria,
and prediction inferences define the impact or usefulness of selection decisions. Local general-
ization inferences provide confidence that the measurement and prediction inferences are accu-
rate in the context in which these measures will be used. In the Binning and Barrett framework
described above, measurement inferences are represented by the vertical links and prediction infer-
ences by the horizontal links. Our view of this model is that it is intended to capture the diverse
types of data-to-theory inferences underlying selection validity and is not intended to represent
local generalizations. We take this view because the two considerations represented in the model—
criterion-versus-predictor and measure-versus-construct considerations—are separate from the
general-versus-specific considerations underlying local generalizations. To capture considerations
of local generalization, the model could be expanded to include this third dimension distinguishing
general theoretical conclusions from generalizations to local settings. In this extended model, link-
ages would represent local generalizability across this third dimension.

Examples of Local Generalizations


A point of emphasis in this chapter is the importance of generalization in decisions about the design
and implementation of selection programs. It is our experience that the frequency and importance
of local generalizations are underestimated in the professional discourse about personnel selection.
This may be a surprising claim given the dominance of the meta-analytic basis for validity general-
ization in personnel selection research over the past 30 years. The point is that there are more types
of local generalizations about setting-specific validity than are usually discussed in the research
literature.
Selection programs are typically designed to be applied consistently across conditions, time,
and people. Also, cost and practical considerations often substantially limit the complexity and
diversity of selection practices, even within very large organizations with diverse types of work.
These pressures toward consistency, simplicity, and low cost, along with the usual limitations on
professional resources to support selection programs, mean that the number of different selection
processes is often small compared to the overall diversity of work and conditions within an
organization. These pressures also imply that core components of selection programs are relatively
unchanging over time. Although this relative constancy is driven in some part by pragmatic
considerations, it is justified by the local generalization that the usefulness of the selection program
is similarly constant across those times and situations.
Each of the following briefly describes a characteristic of typical selection programs that implies
some specific local generalization necessary to support the design of the selection program.

1. One set of selection procedures is applied across a wide range of work differences. These
include differences between positions within jobs, jobs within job families, job families
within organizations, across changes in the work, and across time. It is often the case
that selection program designers seek to create job families that are as wide as possible
but still admit a single selection solution across all differences within the family. This is
often driven by considerations of cost, efficiency, and ease of employee movement within
the family of jobs. But program designers make a decision about the actual range of jobs
within the family that will share the same selection process. This decision is based, in
part, on a conclusion that the range of true criterion differences within the family is small
and the range of predictor validity values across those criteria is similarly small. It should
be noted that the local generalization about the range of criterion differences is not an
inference about measurement. This is an inference about the meaning of the true latent
structure of the diverse criteria and not a conclusion about the measurement of these cri-
teria. After all, the organization values workplace behavior, not the measure of workplace
behavior.
2. All candidates are treated the same with respect to their scores on the selection proce-
dures. Although this approach may, to a great extent, grow out of the regulatory pressure
not to distinguish between protected and majority groups, other differences between can-
didates are also often ignored, such as differences between experienced and inexperienced
candidates, candidates with internal performance records and those with none, and candi-
dates with recent educational experience and those with older education experience. The
underlying generalization is not that candidate factors have no effect on validity or score
levels but that the effects of such candidate differences are small compared to the costs that
would be incurred by tailoring selection practices to such candidate differences.
The generalization that supports the decision to treat all candidates’ scores the same is
a measurement inference and a prediction inference. The measurement inference is that the
meaning of candidate scores is independent of candidate differences such as work experi-
ence, education, and test preparation. A particularly compelling version of this practice is
the strategy of administering a selection procedure only in English. Although the organi-
zation-level rationale for this decision may be that English is the language of the job, the
underlying measurement conclusion supporting this practice is that linguistic differences
between candidates do not affect the meaning of the assessment scores. Of course, this is
a less important conclusion for selection procedures that have little or no verbal content.
The companion prediction conclusion supporting this practice is that candidate differ-
ences do not systematically moderate the predictive validity of the selection procedure.
Consider the example of a sales situational judgment test (SJT) developed in the context
of a particular sales job within an organization. Candidate differences in a priori knowl-
edge of sales job procedures may systematically influence the predictive validity of the
sales SJT.
3. Once implemented, all predictors in a selection system are often managed in the same
way. Predictors differ with respect to the variance of their validities. But often different
predictors are treated as if their validities are as equally unchanging across time and situ-
ations. For example, it is not uncommon for a combination of cognitive and personality
predictors to be implemented in the same way across the same family of jobs regardless
of evidence that personality is likely to have more variable validity across those jobs. This
simplification of process is almost always attributed, in part, to the administrative value of
selection processes that are easy to manage. This may also be attributable, in part, to the
scientific result that the combined effect of positively correlated predictors is near optimal
when they are given equal weight even if they have unequal validities (see the sketch following this list).
The local generalization that justifies the practice of treating all predictors alike is about
prediction validity differences across time and conditions. The supporting conclusion is
that such differences are small relative to the cost and administrative burden that would be
caused by tailoring selection processes to such differences.
4. Relatively short tests are chosen to represent desired predictor constructs. In large-scale
commoditized employment processes, it is not uncommon for vendors to offer and orga-
nizations to choose shorter assessments of desired predictor constructs. (This point is
separate from the relative merits of computer adaptive assessments.) For example, it is not
uncommon for operational cognitive ability batteries to consist of three tests each requir-
ing 5 minutes to administer. Similarly, short personality forms are increasingly available.
The practical considerations are obvious—shorter costs less. The underlying local gener-
alization is a measurement inference about the predictors that the accumulated effects of
shorter length do not outweigh the lower cost. The possible effects of shorter length may
include predictor construct under-representation, lower validity, and higher variability in
scores and validity across individual differences and test conditions.
5. New hires are expected to be acceptable on every facet of work behavior. Individual
selection studies typically focus on criteria that capture a manageable number of work
behaviors. Most often, these include some combination of task performance behaviors,
contextual behaviors, overall performance, and turnover. Typically, validity studies either
focus on specific facets of work behavior or aggregate a limited number of facets such as
task performance and contextual performance to create an overall performance measure.
In contrast, selected candidates are expected to demonstrate acceptable behavior in every
facet of work behavior valued by the organization. For example, new hires are expected
to be productive enough, fast enough, helpful enough, honest enough, reliable enough,
prompt enough, have appropriate enough personal habits, stay long enough, work safely
enough, be easy enough to work with, and so on.
The selection program designer’s conclusion that those selected will be acceptable in
every facet of employee value is, at root, a local generalization about the criterion constructs
measured in validity studies. There appear to be three possible conclusions about criterion
constructs that would support the expectation that selected employees are acceptable on all
facets of work behavior. First, one might conclude that each facet of work behavior valued
in the local job and organization is represented among the criteria used in existing validity
studies. Second, one might conclude that aggregate measures of performance commonly
reported in existing validity studies have sufficient construct scope that the validity with
which they are predicted is generalizable to any particular facet of work behavior. Third, one
could rely on a criterion “halo” effect that, if existing studies demonstrated validity for some
facets of work behavior, then that level of validity may be generalized to unmeasured facets of
work behavior because people who are good at some things tend to be good at other things.
6. Test scores are not adjusted to eliminate the effects of candidates’ previous experience
or insight about tests. Selection program designers decide whether to adjust test scores on
the basis of candidate experience with or insight about the tests. Selection programs rarely
adjust cognitive test scores to correct for effects of retesting, test preparation training,
test anxiety counseling, or other test-related experience that may impact test scores. This
is a consequential decision because recent evidence (Hausknecht, Halpert, Di Paolo, &
Moriarty, 2007) continues to confirm that retesting leads to approximately a 0.25 stan-
dard deviation improvement in selection test scores, and new evidence (Lievens, Reeve, &
Heggestad, 2007) suggests retesting changes measurement constructs and decreases pre-
dictive validity. In contrast, it appears more likely that some selection programs have
implemented adjustment strategies for personality assessment scores. (We have no empiri-
cal data on this point, but the research attention given to indices of socially desirable
responding suggests that some selection programs use such indices to adjust personality
scores.)
Two types of generalizations are needed to support decisions made about score adjust-
ments. The first is a measurement inference about the effect of candidate experience and
insight differences on scores and whether those score effects are construct-relevant. The
second is a prediction inference about the effects such score differences have on criterion
validity. A decision not to adjust scores would be justified by the conclusion that measure-
ment effects are small or, even if measurement effects are not small, effects on criterion
validity are small. As with all other program decisions described here, other factors unre-
lated to measurement and prediction will play a significant role in decisions about score
adjustment and are likely to force the program designer into an evaluation of the tradeoffs
between competing interests.
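The claim in the third example, that equally weighted, positively correlated predictors perform nearly as well as optimally weighted ones, can be checked with a small numerical sketch. The correlations below are hypothetical, and standardized predictors are assumed.

```python
import numpy as np

r12 = 0.40               # hypothetical correlation between the two predictors
r1y, r2y = 0.50, 0.30    # their hypothetical validities against the criterion

def composite_r(w1, w2):
    """Validity of the composite w1*x1 + w2*x2 for standardized predictors."""
    cov = w1 * r1y + w2 * r2y
    var = w1**2 + w2**2 + 2 * w1 * w2 * r12
    return cov / np.sqrt(var)

# Optimal (regression) weights versus simple unit weights
R = np.array([[1.0, r12], [r12, 1.0]])
b1, b2 = np.linalg.solve(R, np.array([r1y, r2y]))
print(f"regression weights: R = {composite_r(b1, b2):.3f}")
print(f"unit weights:       R = {composite_r(1.0, 1.0):.3f}")
```

With these values the unit-weighted composite retains roughly 93% of the validity of the regression-weighted composite, consistent with the general result cited in that example.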

A key point here is that each of these six examples of a common selection practice represents
a set of decisions made by the selection program designer on the basis of conclusions about the
effectiveness of the selection practice in the local setting. Even if one argues that these practices
simply reflect a pragmatic tolerance of imperfection or the unknown, all decisions affect the value
of a selection program for an organization. In this discussion about professional selection practice,
decisions with such consequences are assumed to have a substantive rationale relevant to the valid-
ity of the selection program.
These six examples of local generalizations may or may not apply to any particular selection
program. On the basis of these and other examples of specific local generalizations, Table 5.2
describes a simple, more general taxonomy of local generalizations relevant to all selection programs.
This simple model is organized around two facets of any local generalization. In an applied selection
context, all local generalizations are generalizations of either a theoretical measurement inference
or a theoretical prediction inference and apply to the predictor, or the criterion, or both. Within the
family of theoretical measurement inferences, some relate to the meaning of scores, others relate to
characteristics of the score distributions. This is an important reminder that selection designers and
implementers care about more than relationships. Score levels are also important. Within the family
of theoretical prediction inferences, some relate to the relationship between the predictor and the
criterion, others refer to independent factors that may moderate that relationship. Admittedly, this
is a fine distinction. However, many of the most typical or important challenges for local selection
implementation work involve a consideration of possible moderators.

RELATIONSHIPS BETWEEN TYPES OF EVIDENCE AND LOCAL GENERALIZATIONS SUPPORTING IMPLEMENTATION DECISIONS
To describe the relationships between validity evidence and decisions about the design and imple-
mentation of selection programs, we organize validity evidence into the same four categories as
described in the previous section—criterion oriented methods, meta-analysis-based validity gen-
eralization, synthetic validation, and content-oriented methods—and, additionally, evidence based
on internal structure. Four of these five categories were described in the previous section focusing
on their methodological definitions and distinctions. In contrast, the treatment here focuses on the
extent to which each type of evidence is relevant to local generalizations that make inferences
about validity in the specific local setting from evidence gathered in research settings. Also, for the
purpose of this evaluation, we broaden the meaning of synthetic validity to include job component
validity and transportability methods. These share the common feature that they develop validity
inferences on the basis of the assumption that whole job validity is a function of the aggregation of

TABLE 5.2
Local Generalizations Relevant to Selection Programs

Measurement
  Score meaning
    Predictor:
    • Chosen predictors assess intended constructs
    • Locally developed predictors assess intended constructs
    • Motives of local test takers do not alter intended constructs
    • Local test preparation options do not alter intended constructs
    • Local retesting practices do not alter intended constructs
    • Local accommodations to assessment procedures for candidates with disabilities do not alter intended constructs
    • Local exemptions/waivers from assessment requirements are based on alternative representations of the constructs assessed by the predictor
    Criterion:
    • Local criteria of interest (implicit or explicit) associated with the intended use of the selection procedures represent the constructs underlying assumed levels of validity
    • Local job analysis methods accurately represent the major facets of work and worker attributes
  Score psychometrics
    Predictor:
    • Response strategies of local candidates do not alter expected scores
    • Score reliability in local candidate samples is sufficient
    • Local circumstances such as recruiting practices do not alter expected group differences
    Criterion:
    • Systematic errors by criterion raters, if used, do not alter the level of performance targeted by choice of qualifying standards, if any

Prediction
  Predictor-criterion relationship
    Predictor:
    • Local conditions do not alter the level of validity inferred from previous research
    Criterion:
    • The introduction of the selection program does not, itself, alter the importance of criteria targeted by the program's intended use
  Prediction moderators
    Predictor:
    • Local conditions do not cause irrelevant changes in differential prediction slopes or intercepts in analyses of differential prediction
    • Local candidate characteristics ignored in the selection process, such as level of education and experience, do not moderate level of validity
    • Local differences between positions, jobs, or work families for which the same selection process is applied do not alter the level of validity expected from previous research
    Criterion:
    • Local conditions do not introduce unmeasured criterion constructs, leading to biased differential prediction results

job element validities across the job elements thought to be relevant to a whole job. Finally, we have
not included evidence based on response processes or evidence based on consequences of testing,
both of which are included in the Standards. Evidence based on consequences does not derive from
a unique methodological approach, as do the five methods described here. In effect, asserting that
evidence based on consequences is within the scope of selection validity simply expands the scope
of criteria to include possible consequences other than group effects. It has no unique implications
for validation methodology other than a possible organization level of analysis. For this reason, it
has no unique implications for local generalization other than to suggest a broader scope. Similarly,
evidence based on response processes is distinguished primarily by the nature of its theoretical
foundation, not its methodology. In fact, one could argue that every predictor-criterion relation-
ship of interest to selection psychology could admit a theoretical analysis based on the examinee’s
response process.
For each of the five types of evidence, we describe its relevance to local generalizations. Evidence
supporting local generalizations justifies a conclusion that validity relationships demonstrated by the
validation methodology apply in the specific local setting. The relevance of each type of evidence
can be evaluated by examining its relevance to measurement and prediction conclusions and by
examining the extent to which the evidence provides a sampling or theoretical rationale linking the
specific local setting to the research evidence.
A final point about the evaluations that follow is that all of the five categories of validation
methods are characterized by the types of variables they utilize (e.g., item content, performance
measures, inter-item correlations) and the types of information they gather (e.g., empirical data
and expert judgments). These validation methods are not distinguished by their application of the
basic requirements of unconfounded inference. Examples of such requirements include the repre-
sentativeness of samples with respect to intended populations, the experimental independence of
cause and effect measures, a temporal sequence of observations consistent with the hypothesized
relationship, and adequate sample sizes. These fundamental requirements of scientific inference are
applicable to all five categories of methods although often in somewhat different ways. However,
because they are not specific to one or another category of validation methods, these fundamental
principles of sound inference are not included in the evaluation of these five methods relevant to
local generalizations. We assume the version of each method we evaluate complies with these fun-
damental requirements.

Criterion-Oriented Methods: Characteristic Features


Evidence about the relationships between test scores and criteria is a common type of validity
evidence for selection procedures. There are three distinguishing features of this type of evidence.
First, it is typically theory-based. Relationships between predictor variables and criterion variables
provide evidence of predictor validity to the extent that there is a theory-based rationale for
hypothesizing or expecting such relationships. This is particularly true of predictive designs in
which predictor assessments are obtained prior to the occasion of criterion outcomes. Concurrent
validation designs lack this important feature. Second, it is virtually always empirical, which not
only provides a basis for hypothesis testing of theoretical relationships but also enables sampling-
based generalizations across various situational factors. Third, this type of validation method can be
applied to investigations of measurement validity and prediction validity. When used to investigate
measurement validity, criterion measures are replaced with marker variables for analyses of
convergent and divergent relationships. Marker variables serve the same purpose in investigations
of measurement validity that criterion variables serve in investigations of prediction validity.
Both serve as experimentally independent variables with which the predictor of interest has some
hypothesized relationship(s).

Criterion-Oriented Methods: Relationship to Local Generalizations


Criterion-oriented validation methods have several features that are favorable for local generalizations.
First, these methods can provide evidence about measurement and prediction validity. This feature
enables criterion methods to provide information about the meaning of the measures of predictors
and criteria as well as the relationships between them. In our experience, many implementation
decisions are rooted in conclusions about the meaning of locally chosen or developed measures.
Second, criterion methods are often theory-based. Because criterion methods typically focus on the
consistency between empirical relationships and theory-based hypothesized relationships, they are
able to confirm theoretical explanations of predictive relationships and/or the meaning of measures.
There is no more compelling rationale for making an inference about relationships in a local setting
than considering the relevance of the theoretical explanation to the specific circumstances of the
local setting. For example, consider the well-known validity of general mental ability (GMA)
tests for the prediction of task proficiency. The theoretical meaning of GMA and the cognitive
requirements of task proficiency lead to the empirically supported conclusion that task complexity
moderates validity. Further, this theoretical explanation suggests no plausible basis for expecting
many possible characteristics of specific local settings (e.g., industry type, organization size, or
culture) to moderate the validity of GMA tests. A strong theory-based explanation for validity
can discount many possible setting-specific moderators. This, in turn, greatly facilitates the local
generalization that validity levels observed in research settings apply to the local specific setting.
A compelling example of this type of theory-based generalization is offered by Hogan, Davies,
and Hogan (2007), who developed a theory of personality at work to explain why the validity
of personality generalizes across contexts. Third, criterion methods are empirical. They involve
samples of research participants. This sampling feature enables selection program implementers to
compare local candidate pools to well-described research samples. Generalizing validity evidence
to the specific local setting is supported by the demonstration that the research samples represent
the local candidate pool.
An important strength of criterion methods is that they can provide sampling and theory-based
evidence. This is valuable because sampling-based generalizations and theory-based generaliza-
tions are interdependent. Theory-based explanations for valid relationships allow one to recognize
what sample characteristics are relevant to the generalization from research samples to local candi-
date pools. In the example above, empirically supported theory implies that GMA-task proficiency
validity is not moderated by organization culture. On the basis of this theoretical conclusion, the
local selection designer is justified in not considering organization culture when judging whether
the research samples represent the local candidate pool.

Meta-Analysis: Characteristic Features


In this chapter we distinguish between meta-analyses of empirical research results to draw conclusions
about general and local validity and other methods (e.g., synthetic validity) for estimating local
validity on the basis of judgments about job elements. Meta-analytic studies have three primary
characteristics. First, they are empirical in that they mathematically aggregate empirical research
across studies. Although various aggregation methods have been developed, all share the same
essential feature that they summarize empirical validity results by mathematical methods. As a
consequence, the validity estimates produced can be considered empirical values, not judgments.
Second, meta-analyzed results are used to test hypotheses about the level and variability of validity
estimates. Meta-analysis is not simply empirical aggregation. In the context of selection research
it also is used to evaluate theoretical hypotheses about validity relationships. A unique feature of
meta-analysis is that it can evaluate two types of theoretical hypotheses—one methodological/
psychometric and the other psychological. Meta-analysis can evaluate hypotheses about the
extent to which validity indices are affected by method/psychometric artifacts (e.g., measurement
unreliability and range restriction) by developing and applying corrections for such artifacts and
examining the impact of these corrections on validity estimates. Also, meta-analysis results can
be organized around particular predictors and criteria of interest to evaluate psychological theories
about predictors and criteria and possible moderators.
Third, meta-analysis can be used to estimate validity values in local settings. There are two
general approaches to this estimation process. One approach, referred to as the “medical model”
by Schmidt and Raju (2007), uses the weighted average validity across samples as the estimate
of local validity. The theoretical rationale supporting this approach is the conclusion that situ-
ational variability in validity has been found to be relatively small. As a result, the mean valid-
ity across study samples serves as an accurate estimate of local validity. In the medical model
approach, if local empirical validity results are available they are combined with all other study
results based on sample size alone and are given no unique consideration. The other approach
to local validity estimation from meta-analytic results is to combine available local validity val-
ues with meta-analytic estimates using Bayesian methods. Brannick (2001), Schmidt and Raju
(2007), and Newman, Jacobs, and Bartram (2007) described somewhat different approaches to
weighting current local study results and previous validity estimates, but all share the essential
Bayesian feature that the combination weights assigned to current local validity results and
previous validity estimates are a function of some index of the uncertainty of the current and
previous estimates. Bayesian methods for estimating local validity usually give greater weight,
and in some cases much greater weight, to the current local study estimates than would the
medical model.
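The following sketch illustrates, in deliberately simplified form, the two estimation approaches just described: a bare-bones, sample-size-weighted aggregation in the spirit of the medical model, and a precision-weighted combination of that aggregate with a new local study in the spirit of the Bayesian approaches cited above. All validities and sample sizes are hypothetical, and the formulas are textbook approximations rather than any cited author's exact procedure.

```python
import numpy as np

# Hypothetical observed validities and sample sizes from k prior studies
r = np.array([0.22, 0.31, 0.18, 0.27, 0.35])
n = np.array([120, 85, 200, 150, 95])

# Bare-bones aggregation: sample-size-weighted mean validity and the variance
# remaining after subtracting expected sampling-error variance
r_bar = np.average(r, weights=n)
var_obs = np.average((r - r_bar) ** 2, weights=n)
var_err = (1 - r_bar**2) ** 2 / (n.mean() - 1)   # approximate sampling-error variance
var_rho = max(var_obs - var_err, 0.0)            # residual ("situational") variance
print(f"mean r = {r_bar:.3f}, residual SD = {np.sqrt(var_rho):.3f}")

# Precision-weighted combination of the meta-analytic prior with a new local
# study -- a simplified stand-in for the Bayesian approaches cited above
r_local, n_local = 0.15, 60
var_local = (1 - r_local**2) ** 2 / (n_local - 1)  # sampling variance of local r
var_prior = var_rho + var_err / len(r)             # rough uncertainty in the prior mean
w_prior, w_local = 1 / var_prior, 1 / var_local
r_posterior = (w_prior * r_bar + w_local * r_local) / (w_prior + w_local)
print(f"combined local validity estimate = {r_posterior:.3f}")
```

With these numbers, the small local study is pulled strongly toward the meta-analytic mean, which mirrors the point that the relative weights assigned to current and previous estimates depend on the uncertainty attached to each.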

Meta-Analysis: Relationship to Local Generalizations


Meta-analytic methods can provide strong support for local generalizations relating to predictive
validity and sample representation. In practice, most selection-related meta-analyses are about pre-
dictor-criterion relationships, not predictor or criterion measurement validity. Perhaps the primary
contribution of meta-analysis methods to local generalizations relevant to selection programs is the
set of conclusions about the size and variability of predictor-criterion validity relationships. This set
of conclusions is known as validity generalization (VG) and consists of findings related to validity
estimates and to situational moderators of validity estimates, both of which have direct relevance
to local generalizations. The findings about validity estimates tend to show that validity levels are
higher than observed validity estimates because of the net effects of methodological and psycho-
metric artifacts. The findings relating to situational factors are more directly relevant to local gener-
alizations because these findings help practitioners judge the relevance of specific features of their
local settings to the generalization of validity to their local setting. Meta-analysis methods have
been developed to estimate confidence intervals around validity estimates to describe the extent to
which situational factors might impact the level of validity.
Although meta-analyses of selection research have focused primarily on predictor-criterion
relationships, other meta-analyses have addressed measurement issues. Viswesvaran, Schmidt, and
Ones’ (2005) meta-analyzed performance rating reliabilities to draw conclusions about a general
performance factor. McDaniel, Whetzel, Schmidt, and Maurer (1994) examined the relationship
between interview structure and validity and differences between performance ratings provided
for administrative and research purposes. Many meta-analyses have been conducted to estimate
the effects of group differences on predictors (e.g., Roth, BeVier, Bobko, Switzer, & Tyler, 2001)
and performance measures (e.g., McKay & McDaniel, 2006). Meta-analyses of retesting effects
on test scores have also been conducted (Lievens, Reeve, & Heggestad, 2007; Hausknecht et al.,
2007). In meta-analyses of prediction and measurement validity, the primary contributions to local
generalizations are the magnitude of validity estimates and evidence about the extent to which
validity magnitude depends on situational factors.
A final note is needed concerning the methods of local validity estimation described by Brannick
(2001), Newman, Jacobs, and Bartram (2007), and Schmidt and Raju (2007). These methods are
mathematical and rational models for defining and computing accurate estimates of local validity.
They are not methods for confirming that the sampling and/or theoretical rationales for local gener-
alizations are adequately supported.

Synthetic Validation: Characteristic Features


We distinguish synthetic validation evidence from the others because this class of evidence uniquely
relies on some form of synthesis of predictive validity estimates at the job component level into an
estimate or prediction of predictive validity for the job as a whole. For this discussion, the syn-
thetic validation category includes transportability evidence, job component validity evidence (the
method described by McCormick, 1959), and the broader subclass of methods known as synthetic
validation.
Synthetic validation methods have three defining characteristics. First, they decompose a job
into component parts by some form of job analysis methodology. Second, a validity estimate is
associated by some rationale with each component representing the extent to which the selection
procedure in question is expected to predict performance in that component of the job. Validity
values may be linked to these components by various methods including expert judgment, local
validity studies at the component level, and validities constructed from historical data. Third, some
method is applied to the set of component-wise validities to aggregate them into a single validity
associated with the whole job.
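A minimal sketch of the aggregation step, under the assumption that a simple importance-weighted average of component validities is an acceptable whole-job summary, is shown below. The components, weights, and component validities are hypothetical; operational synthetic validity and job component validity procedures use more elaborate models than this.

```python
# Hypothetical job-component breakdown for a clerical job. Each entry holds the
# component's importance weight from job analysis and the estimated validity of
# the predictor for that component (from prior research or SME judgment).
components = {
    "data entry":       (0.40, 0.45),
    "customer contact": (0.25, 0.20),
    "scheduling":       (0.20, 0.35),
    "record keeping":   (0.15, 0.40),
}

# Importance-weighted average of component validities as the whole-job estimate
total_weight = sum(w for w, _ in components.values())
whole_job_validity = sum(w * v for w, v in components.values()) / total_weight
print(f"Estimated whole-job validity: {whole_job_validity:.3f}")
```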

Synthetic Validation: Relationship to Local Generalizations


There is a clear and strong relationship between synthetic validation methods and local generaliza-
tions in that the methodology targets a specific, clearly defined job that is virtually always a specific
job in the local setting. Synthetic validation methods are constructed to provide a local generaliza-
tion in the form of a predictive validity value associated with a specific job. At the same time that
this relationship is strong, it is also narrow. Synthetic validation methods provide a single conclusion
about validity in the local setting. They provide an estimate of or prediction about the predictive
validity expected of a specific selection procedure for component-specific performance. They do
not provide generalizations about measurement validity, group differences, or any of the other types
of local generalizations underlying the six generalizations described above. One particularly subtle
limitation of synthetic validation is that such methods constrain the scope of the implicit criterion
space to include work behavior directly associated with individual components of a job. In effect,
synthetic validation methods constrain the intended use of the selection strategy to be a function of
the job rather than a function of other considerations such as managers’ and organizations’ priorities
about people and work.

Content-Oriented Evidence: Characteristic Features


As we noted above, despite the continuing controversy over content-oriented methods of valida-
tion, legal and professional guidelines support the use of content evidence as a basis for inferring
the validity of a selection procedure. Content-oriented strategies have been described in various
ways (Stelly & Goldstein, 2007). Across these descriptions, four core characteristics emerge. First,
the target work is described in terms of important work behaviors, tasks or activities, work outputs
or worker requirements. Second, target worker attributes such as knowledge, skill, and ability, are
described. These descriptions are often based in part on the intended type of selection procedure
(e.g., job knowledge test or SJTs) and in part on the outcomes of the job analysis. Much has been
written and debated about these two sets of descriptions, but, at a minimum, it can be said that
content-based methods provide persuasive evidence of validity only to the extent that job experts
can accurately infer the meaning of relevant worker attributes from the work description provided.
Third, job experts judge the linkage between test content and job content. This may take many
forms but often consists of judgments by the job experts about the importance of each test item
or each tested knowledge for successful performance in one or more job activities. Fourth, some
method of analysis is applied to this linkage to ensure that the test content represents important
job components. This analysis may take the form of a document describing the extent to which the
test items assess the skills, knowledge, or other attributes most important to job success. In some
cases this feature includes the actual development or selection of test items by job experts. In some
cases, content evidence is further established by having the job experts estimate the test score that
would be expected of incumbents who perform the job at some target level of proficiency. These
core characteristics of a content-oriented validation strategy provide a rationale for a measurement
validity conclusion that the test scores measure worker attributes important for job performance.
The tradition within selection psychology is that content validation also justifies a prediction infer-
ence that “scores from the selection procedure can be generalized to the work behaviors and can be
interpreted in terms of predicted work performance” (Principles, p. 25). It is important to note that
the relationship of content validation methods to predictive validity is limited. Although content
evidence offers descriptions of criterion information (i.e., important work behaviors, activities, etc.)
and predictor information (i.e., targeted worker attributes to be assessed by the selection proce-
dures), it provides no information about the psychometric characteristics of predictor or criterion
scores. For this reason, the claim that content evidence allows predictor scores to “be interpreted
in terms of predicted work performance” should be interpreted as a theoretical proposition, not an
empirical result. The precision and accuracy of prediction can be estimated only with additional
types of evidence.

Content-Oriented Evidence: Relationship to Local Generalizations


The relationship between content-related evidence and inferences about generalization to other jobs
or situations or contexts is clear. Because content-related evidence is typically based on information
about specific jobs, the resulting validity conclusions apply directly to the local setting from which
the job information was derived. However, in addition to the particular local setting associated with
the analyzed job, inferences of validity can be directly supported in other settings on the basis of a
comparison of work components between settings. In effect, content-related validity inferences—
measurement and prediction—are about components of jobs, not whole jobs. An inference about
a whole job is essentially the aggregation of inferences about several of its components.
Content evidence supports essentially the same local generalization rationale as transportability
evidence. Both rely on the relative clarity and persuasiveness that two similarly described work
components represent the same performance constructs. Such a comparison is far more complex
and ambiguous when two whole jobs and their associated contexts are being compared. The local
generalization rationale is that, where the major job components are similar, a content-based
inference of validity applies.

Internal Structure Evidence: Characteristic Features


Internal structure evidence is about measurement inferences. It informs prediction inferences only
indirectly to the extent prediction is a function of the internal structure of the measures. The vari-
ables of interest are variables within the structure of a particular measure, such as items within a test
and subscales within an inventory. Strictly, internal structure evidence does not include marker vari-
ables that are external to the measure itself. The two key features of internal structure evidence are
(a) that hypothesized relationships among internal variables are derived from theoretical propositions
about the variables and (b) that structural analyses are conducted to evaluate whether the empirical
data fit the hypothesized relationships. To be sure, this simple description belies considerable com-
plexity in the technology of the structural analyses that can be used to investigate internal evidence.
For example, MacKenzie, Podsakoff, and Jarvis (2005) showed that mis-specifying formative mea-
surement models as reflective models can lead to substantial bias in structural parameter estimates.
On the other hand, at its simplest level, internal structure evidence can take the form of simple item
analyses that evaluate the internal consistency of items that are intended to be homogeneous.
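At its simplest, such an item analysis might look like the sketch below, which computes coefficient alpha and corrected item-total correlations for a small, hypothetical set of item scores. This illustrates only the most elementary form of internal structure evidence; the factor-analytic and structural modeling approaches mentioned above require considerably more machinery.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for an (examinees x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 6 examinees by 4 items intended to be homogeneous
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 4, 5, 4],
    [3, 3, 3, 4],
    [1, 2, 2, 1],
    [4, 4, 5, 5],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")

# Corrected item-total correlations: each item against the sum of the others
for j in range(scores.shape[1]):
    rest = scores.sum(axis=1) - scores[:, j]
    print(f"item {j + 1}: r(item, rest) = {np.corrcoef(scores[:, j], rest)[0, 1]:.2f}")
```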

Internal Structure Evidence: Relationship to Local Generalization


Unlike content, synthetic, and meta-analytic evidence, internal structure evidence does not translate
directly into local generalizations. It is similar to criterion-oriented evi-
dence in this regard. Both provide evidence of considerable relevance to the design and implemen-
tation of selection programs. Internal structure evidence is usually empirical evidence that can
provide empirical estimates of psychometric properties such as reliability, group differences, and
measurement bias, all of which are fundamental to decisions that selection program designers make
about the use of specific selection procedures. This information is especially valuable when the
samples represent relevant applicant populations. In that case, such empirical estimates can inform
decisions about the use of test scores, such as the setting of cut scores, the estimation of range restriction values,
and the like. However, the local generalization of their conclusions to a specific setting requires
consideration of additional features of the research design such as information about sample char-
acteristics, test taker motives, testing conditions, and the like.

PROFESSIONAL CONSIDERATIONS REGARDING LOCAL GENERALIZATION AS PART OF PRACTICE

This summary of the relationships between several common types of validation evidence and the
requirements of local generalizations makes the point that two types of evidence—synthetic and
meta-analytic evidence—are designed specifically to evaluate a rationale for local generalizations.

Synthetic validation consists of methods for constructing local generalization arguments on the
basis of similarity of job components across jobs/settings, their relative importance, and informa-
tion about predictive validity values for individual components. Content validation evidence could
be used in the same way, although it is typically not used in this fashion. Typically, content evidence
is gathered in the specific setting in which the selection procedure will be used. Any validation
inference based on locally gathered content evidence is an inference about local validity.
Meta-analyses of prediction results and, less often, measurement data can support three separate
rationales for justifying local generalizations. First, meta-analyses summarize information about
validity differences across samples. This provides a sampling rationale for local generalizations to
the extent that the local setting is well represented by the range of samples contributing to the meta-
analysis results. Second, meta-analyses lead to theoretical conclusions about the likely presence and
meaning of situational variables that moderate validity. Where meta-analyses indicate that mea-
sured moderators explain relatively little of the observed validity variability, a theoretical argument
can be made that such validities are unlikely to be affected to any great extent by any situational
moderators, measured or unmeasured (Schmidt & Hunter, 1998). Where that theoretical argument
is accepted, local generalizations need not depend on any evaluations of situational moderators. In
effect, the sampling rationale becomes moot. Finally, mathematical techniques have been developed
and evaluated by which meta-analytic mean validity values may be translated into estimates of local
validity values (Brannick, 2001; Newman, Jacobs, & Bartram, 2007; Schmidt & Raju, 2007).
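
As a rough illustration of the general shrinkage logic underlying such techniques (not a reproduction of the specific estimators in the cited papers), the Python sketch below blends a meta-analytic mean validity with a locally observed validity coefficient using precision weights; the function name, the approximate sampling-variance formula for a correlation, and the example values are assumptions introduced here for illustration only.

def local_validity_estimate(meta_mean: float, tau_squared: float,
                            r_local: float, n_local: int) -> float:
    # Precision-weighted blend of a meta-analytic mean validity (with
    # between-study variance tau_squared, assumed > 0) and a local observed
    # correlation r_local based on n_local cases.
    sampling_var = (1 - r_local ** 2) ** 2 / (n_local - 1)  # approximate variance of r
    w_meta = 1.0 / tau_squared
    w_local = 1.0 / sampling_var
    return (w_meta * meta_mean + w_local * r_local) / (w_meta + w_local)

# A small local study (r = .18, n = 80) is pulled toward a meta-analytic mean
# of .30 when the between-study variance is small (tau_squared = .01).
print(round(local_validity_estimate(0.30, 0.01, 0.18, 80), 2))

The practical point is that the smaller and noisier the local study, the more the resulting estimate is shrunk toward the meta-analytic mean, which is the sense in which meta-analytic evidence can inform a local validity inference.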
Other categories of evidence do not provide rationales for local generalization that are as direct. Criterion-
related evidence and internal structure evidence do not, themselves, aggregate information in a way
that leads directly to local generalizations. Certainly meta-analysis may be applied to results from
both to provide a direct rationale for local generalizability. This observation is not a criticism as
much as it simply points out the boundaries of certain classes of validation evidence.
Two final observations merit attention here. The first is that two traditions have emerged and have
been supported for justifying local generalizations. First is the empirical tradition, which is repre-
sented by meta-analytic techniques applied primarily to the results of criterion-oriented research.
The empirical tradition supports a sampling rationale by virtue of the fact that all empirical research
involves samples of participants who generate data in some applied or laboratory setting. The empir-
ical tradition also supports theory-based local generalizations by which the theoretical framework
explaining the valid relationship of interest can be interpreted to draw conclusions about the likely
strength of situation-specific moderators. Where theoretical analysis suggests few important mod-
erators, local generalizations are justified.
The second tradition is captured in content validation and synthetic validation methods. This
tradition might be called the expert judgment tradition of validation and local generalization. It
is important to acknowledge this tradition and, perhaps, check its boundaries from time to time.
This tradition manifests itself in content validation and synthetic validation methods in which
judgments by job experts of the relevance and/or validity of worker attributes with respect to job
components create the foundation for the validity inference. When these judgments are made by
job experts who are work scientists, they might be expected to be grounded in empirical research
findings relevant to the judgment task. Where these judgments are made by job experts who are
not scientists, they are presumed to be grounded in experiential learning about the requirements of
successful job performance. Although not without criticism as noted above, the Principles endorses
both as sources of validity evidence. The question that should be raised about the expert judgment
tradition concerns the degree of abstraction from observable work behavior at which the accuracy
of expert judgments begins to degrade. There would be two benefits of a professional effort to identify
standards of validation that must be satisfied by expert judgment processes. One benefit would
apply directly to content and synthetic validation methods by defining process standards necessary
to warrant judgments that could be treated as a source of validity evidence. The second benefit
would be that the same standards could be applied to the process by which selection designers and
implementers make judgments about the appropriateness of local generalizations. Both are expert
judgments about the strength of an inference in which the inference relies on the experts’ knowledge
of the relevant research foundation and ability to interpret and synthesize that knowledge into
informative, accurate judgments.
The final observation acknowledges the importance for local generalizations of the accepted
body of knowledge representing selection psychology. Local generalizations, like any theoretical
inference, are made in the context of the accumulated body of knowledge relevant to the issue.
Although much of this section addresses the manner in which local generalizations are informed
by diverse types of validation methods and processes, local generalizations are likely to be less
accurate if each requires a repeated reevaluation of the full litany of all relevant research findings.
Rather, local generalizations would benefit greatly by access to professionally representative, if
not consensus, summaries of the existing body of research relevant to the validity issue in ques-
tion. Currently, the profession of industrial-organizational (I-O) psychology has no such systematic
forum for documenting what we know in a way that conveys broad professional acceptance.
In summary, we have described validity as a changing and multifaceted concept. Validity is the
accumulation of evidence of many types, including psychometric and structural relationships, evi-
dence of meaning, evidence of prediction, and evidence of consequences to support many types of
professional decisions relating to the implementation of selection programs. No single claim about
validity and no single method of validation are likely to capture all of these facets.

REFERENCES
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (1985). Standards for educational and psychological testing. Washington,
DC: American Educational Research Association.
American Psychological Association. (1954). Technical recommendations for psychological tests and
diagnostic techniques. Washington, DC: Author.
Angoff, W. H. (1988). Validity: An evolving concept. In H. Wainer & H. Braun (Eds.), Test validity (pp. 19–32).
Hillsdale, NJ: Lawrence Erlbaum.
Austin, J. T., & Villanova, P. (1992). The criterion problem 1917–1992. Journal of Applied Psychology, 77,
836–874.
Barrett, G. V., Phillips, J. S., & Alexander, R. A. (1981). Concurrent and predictive validity designs: A critical
reanalysis. Journal of Applied Psychology, 66, 1–6.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Binning, J. F. (2007). Validation strategies. In S. G. Rogelberg (Ed.), Encyclopedia of industrial and organiza-
tional psychology (Vol. 2, pp. 859–864). Thousand Oaks, CA: Sage.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential
and evidential bases. Journal of Applied Psychology, 74, 478–494.
Brannick, M. T. (2001). Implications of empirical Bayes meta-analysis for test validation. Journal of Applied
Psychology, 86, 468–480.
Brennan, R. L. (1983). Elements of generalizability theory. Iowa City, IA: American College Test Program.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validity in the multi-trait multi-method
matrix. Psychological Bulletin, 56, 81–105.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings.
Chicago, IL: Rand McNally.
Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H. Brown (Eds.), Test valid-
ity. Hillsdale, NJ: Lawrence Erlbaum.
Cronbach, L. J. (1989). Construct validation after thirty years. In R. Linn (Ed.), Intelligence: Measurement,
theory, and public policy. Urbana: University of Illinois Press.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measure-
ments: Theory of generalizability for scores and profiles. New York, NY: Wiley.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52,
281–302.
Dimitrov, D. M. (2002). Reliability: Arguments for multiple perspectives and potential problems with general-
ization across studies. Educational and Psychological Measurement, 62, 783–801.
Goldstein, I. L., & Zedeck, S. (1996). Content validation. In R. Barrett (Ed.), Fair employment strategies in
human resource management (pp. 27–37). Westport, CT: Quorum Books.
Goldstein, I. L., Zedeck, S., & Schneider, B. (1993). An exploration of the job analysis-content validity process.
In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations. San Francisco, CA: Jossey-
Bass.
Gottfredson, L. S. (1986). Societal consequences of the g factor in employment. Journal of Vocational Behavior,
29, 379–410.
Graham, J. M. (2006). Congeneric and (essentially) tau-equivalent estimates of score reliability: What they are
and how to use them. Educational and Psychological Measurement, 66, 930–944.
Guion, R. M. (1977). Content validity, the source of my discontent. Applied Psychological Measurement, 1,
1–10.
Guion, R. M. (1978a). “Content validity” in moderation. Personnel Psychology, 31, 205–213.
Guion, R. M. (1978b). Scoring content domain samples: The problem of fairness. Journal of Applied Psychology,
63, 499–506.
Guion, R. M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11, 385–398.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ: Lawrence
Erlbaum.
Guion, R. M., & Cranny, C. J. (1982). A note on concurrent and predictive validity designs. Journal of Applied
Psychology, 67, 239–244.
Hausknecht, J. P., Halpert, J. A., Di Paolo, N. T., & Moriarty Gerrard, M. O. (2007). Retesting in selection: A
meta-analysis of coaching and practice effects for tests of cognitive ability. Journal of Applied Psychology,
92, 373–385.
Herrnstein, R. J., & Murray, C. (1994). The bell curve: Intelligence and class structure in American life. New
York, NY: Free Press.
Hoffman, C. C., Rashovsky, B., & D’Egidio, E. (2007). Job component validity: Background, current research,
and applications. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leverag-
ing existing validity evidence. San Francisco, CA: Jossey-Bass.
Hogan, J., Davies, S., & Hogan, R. (2007). Generalizing personality-based validity evidence. In S. M. McPhail
(Ed.). Alternative validation strategies: Developing new and leveraging existing validity evidence. San
Francisco, CA: Jossey-Bass.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of
Vocational Behavior, 29, 340–362.
Jensen, A. R. (1980). Bias in mental testing. New York, NY: Free Press.
Kleiman, L. S., & Faley, R. H. (1985). The implications of professional and legal guidelines for court decisions
involving criterion-related validity: A review and analysis. Personnel Psychology, 38, 803–833.
Kraiger, K., & Teachout, M. S. (1990). Generalizability theory as construct-related evidence of the validity of
job performance ratings. Human Performance, 3, 19–36.
Landy, F. J. (1986). Stamp collecting versus science: Validation as hypothesis testing. American Psychologist,
41, 1183–1192.
Langenfeld, T. E., & Crocker, L. M. (1994). The evolution of validity theory: Public school testing, the courts,
and incompatible interpretations. Educational Assessment, 2, 149–165.
Lawshe, C. H. (1952). Employee selection. Personnel Psychology, 6, 31–34.
Lawshe, C. H. (1985). Inferences from personnel tests and their validities. Journal of Applied Psychology, 70,
237–238.
Lievens, F., Chasteen, C. S., Day, E. A., & Christiansen, N. D. (2006). Large-scale investigation of the role of
trait activation theory for understanding assessment center convergent and discriminant validity. Journal
of Applied Psychology, 91, 247–258.
Lievens, F., Reeve, C. L., & Heggestad E. D. (2007). An examination of psychometric bias due to retesting on
cognitive ability tests in selection settings. Journal of Applied Psychology, 92, 1672–1682.
McCormick, E. J. (1959). The development of processes for indirect or synthetic validity: III. Application of
job analysis to indirect validity. A symposium. Personnel Psychology, 12, 402–413.
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment inter-
views: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599–616.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Lawrence Erlbaum Associates.
McHenry, J. J, Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results:
The relationship between predictor and criterion domains. Personnel Psychology, 43, 335–355.
McKay, P. F., & McDaniel, M. A. (2006). A Re-examination of Black-White mean differences in work perfor-
mance: More data, more moderators. Journal of Applied Psychology, 91, 538–554.
MacKenzie, S. B., Podsakoff, P. M., & Jarvis, C. B. (2005). The problem of measurement model misspecifica-
tion in behavioral and organizational research and some recommended solutions. Journal of Applied
Psychology, 90, 710–730.
Messick, S. (1988). The once and future issues of validity: Assessing the meaning and consequences of mea-
surement. In H. Wainer & H. Brown (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum.
Messick, S. (1989). Meaning and values in test validation: The science and ethics of assessment. Educational
Researcher, 18, 5–11.
Murphy, K. R. (2000). Impact of assessment of validity generalization and situational specificity on the science
and practice of personnel selection. International Journal of Selection and Assessment, 8, 194–206.
Murphy, K. R. (2003). Validity generalization: A critical review. Mahwah, NJ: Lawrence Erlbaum.
Murphy, K. R. (2010). How a broader definition of the criterion domain changes our thinking about adverse
impact. In J. Outtz (Ed.), Adverse impact (pp. 137–160). San Francisco, CA: Jossey-Bass.
Murphy, K. R., & Cleveland, J. (1995). Understanding performance appraisal: Social, organizational and
goal-oriented perspectives. Newbury Park, CA: Sage.
Murphy, K. R., & Davidshofer, C. O. (2005). Psychological testing: Principles and applications (6th ed.). Upper
Saddle River, NJ: Prentice Hall.
Murphy, K. R., & DeShon, R. (2000a). Inter-rater correlations do not estimate the reliability of job performance
ratings. Personnel Psychology, 53, 873–900.
Murphy, K. R., & DeShon, R. (2000b). Progress in psychometrics: Can industrial and organizational psychol-
ogy catch up? Personnel Psychology, 53, 913–924.
Murphy, K. R., & Shiarella, A. (1997). Implications of the multidimensional nature of job performance for the
validity of selection tests: Multivariate frameworks for studying test validity. Personnel Psychology, 50,
823–854.
Nathan, B. R., & Alexander, R. A. (1988). A comparison of criteria for test validation: A meta-analytic investi-
gation. Personnel Psychology, 41, 517–535.
Newman, D. A., Jacobs, R. R., & Bartram, D. (2007). Choosing the best method for local validity estima-
tion: Relative accuracy of meta-analysis versus a local study versus Bayes-analysis. Journal of Applied
Psychology, 92, 1394–1413.
Nunnally, J. (1975). Psychometric theory 25 years ago and now. Educational Researcher, 4, 7–14, 19–21.
Peterson, N. G., Wise, L. L., Arabian, J., & Hoffman, R. G. (2001). Synthetic validation and validity gen-
eralization when empirical validation is not possible. In J. Campbell & D. Knapp (Eds.), Exploring
the upper limits of personnel selection and classification (pp. 411–451). Mahwah, NJ: Lawrence
Erlbaum.
Ree, M. J., & Earles, J. A. (1992). Intelligence is the best predictor of job performance. Current Directions in
Psychological Science, 1, 86–89.
Roth, P. L., BeVier, C. A., Bobko, P., Switzer, F. S. III, & Tyler, P. (2001). Ethnic group differences in
cognitive ability in employment and education settings: A meta-analysis. Personnel Psychology, 54,
297–330.
Sackett, P. R. (1987). Assessment centers and content validity: Some neglected issues. Personnel Psychology,
40, 13–25.
Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., & de Fruyt, F. (2003). International validity generaliza-
tion of GMA and cognitive abilities: A European Community meta-analysis. Personnel Psychology, 56,
573–605.
Schmidt, F. L. (1992). What do data really mean? Research findings, meta-analysis, and cumulative knowledge
in psychology. American Psychologist, 47, 1173–1181.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generaliza-
tion. Journal of Applied Psychology, 62, 643–661.
Schmidt, F. L., & Hunter, J. E. (1980). The future of criterion-related validity. Personnel Psychology, 33,
41–60.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psycho-
logy: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. Annual Review of Psychology, 43,
627–670.
Schmidt, F. L., & Raju, N. S. (2007). Updating meta-analytic research findings: Bayesian approaches versus the
medical model. Journal of Applied Psychology, 92, 297–308.
Schmidt, F. L., Viswesvaran, C., & Ones, D. S. (2000). Reliability is not validity and validity is not reliability.
Personnel Psychology, 53, 901–912.
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analysis of validity studies published
between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37,
407–422.
Society for Industrial and Organizational Psychology. (1987). Principles for the validation and use of personnel
selection procedures (3rd ed.). Bowling Green, OH: Author.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel
selection procedures (4th ed.). Bowling Green, OH: Author.
Steel, P. G., Huffcutt, A. I., & Kammeyer-Mueller, J. (2006). From the work one knows the worker: A sys-
tematic review of challenges, solutions and steps to creating synthetic validity. International Journal of
Selection and Assessment, 14, 16–36.
Stelly, D. J., & Goldstein, H. W. (2007). Application of content validation methods to broader constructs. In
S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leveraging existing validity
evidence. San Francisco, CA: Jossey-Bass.
Thorndike, R. M., & Lohman, D. F. (1990). A century of ability testing. Chicago, IL: Riverside.
Uniform guidelines on employee selection procedures. (1978). Federal Register, 43(166), 38290–39309.
Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (2005). Is there a general factor in ratings of performance?
A meta-analytic framework for disentangling substantive and error influences. Journal of Applied
Psychology, 90, 108–131.
Part 2
Implementation and Management
of Employee Selection Systems
in Work Organizations
Jerard F. Kehoe and Robert E. Ployhart,
Section Editors
6 Attracting Job Candidates
to Organizations

Ann Marie Ryan and Tanya Delany

Recruiting is more complex today than it has ever been. Technology is helping corporations find
skilled, cost-effective talent in all corners of the world, enabling globally integrated workforces.
However, to be successful, corporations need recruiting models that accommodate growth markets
and mature markets, entry-level and experienced professionals, and cross-generational applicants
(twenty-somethings, thirty-somethings, and so on). Corporations must also develop successful
recruiting strategies to secure hot skills or market-value skills. Recruiting models must leverage
global best practices while complying with local legislation and managing local cultures. Recruiting
must involve ways to move candidates through the hiring process more quickly than ever while managing
greater volumes of applicants than in the past.
The ability to attract individuals to work at organizations is a topic of perennial research inter-
est. Major reviews of the research on recruitment appear periodically (e.g., Barber, 1998; Breaugh
& Starke, 2000; Ployhart, 2006; Rynes, 1991; Rynes & Cable, 2003; Rynes, Heneman, & Schwab,
1980; Taylor & Collins, 2000), noting findings and lamenting a lack of progress in addressing many
questions (Saks, 2005). In this chapter, we look at current questions regarding applicant attraction
arising from the recent workplace trends noted above as a means of framing a practical research
agenda for the future. Specifically, we address what is known and what we need to know about
applicant attraction in light of globalization, advances in technology in recruitment, and organiza-
tional efforts toward more strategic talent management.
We have organized our review from a more traditional recruitment process perspective into the
three stages of reaching potential applicants, maintaining applicant interest, and securing offer
acceptance (Barber, 1998), recognizing that the same organizational practices may affect each of
these phases and that an applicant’s ability to separate these is not always great. We also have fore-
gone a study-by-study review, because these are already plentiful, in favor of a high-level integration
of key findings.

REACHING POTENTIAL APPLICANTS


Traditionally, human resource (HR) efforts at recruitment have placed a heavy emphasis on how to
create awareness of opportunities among desired potential applicants. Job seekers do have to have a
minimal level of attraction to give an organization consideration as an employer (Cable & Turban,
2001), and without this minimal attraction, recruitment activities will not be attended to and hence will have no influence.
Most of the research on generating interest in job openings relates to (a) how to provide information
and catch attention (i.e., recruitment sources, such as advertising and websites), (b) what informa-
tion to provide (e.g., how much specificity, how much realism, creating brand equity), and (c) how
information source and content affect decisions to pursue jobs. Some general conclusions can be
reached from this body of research.


NATURE AND SOURCE OF RECRUITMENT INFORMATION AFFECT APPLICANT POOL QUANTITY AND QUALITY

Quantity and quality of the applicant pool are affected by the source of recruitment information and
the nature of the information (i.e., specificity, realism, and valence; e.g., Barber & Roehling, 1993;
Maurer, Howe, & Lee, 1992). Research on recruitment source effects on applicant pool quality has
yielded mixed results (Zottoli & Wanous, 2000), with questions raised as to whether source effects
are job- or industry-specific. Because job seekers often obtain information from multiple sources
(Vecchio, 1995) and the same source can be used by job seekers in different ways (Rynes & Cable,
2003), pinpointing specific source effects may be challenging. Still, the overall conclusion of this
research is that source can play a role in applicant initial attraction.

PRACTICE IMPLICATION
Companies today should have some working knowledge of what sources of information are
being viewed by applicants. For example, blogs are becoming a common way for applicants
(and employees) to share experiences and perceptions of a company and its hiring processes
in real time. A recommendation for all recruiters would be to “Google” your company’s name
to understand what information is available to the applicant. Read this information regardless
of the source to gain insight as to what image is being portrayed about your company.

Realities of the job and working within a company have been a particular research focus for
quite some time (see Phillips, 1998 and Premack & Wanous, 1985 for early reviews). Studies
suggest that the presentation of negative or realistic information will have differential effects on
different categories of job seekers [e.g., those with less prior job experience (Meglino, DeNisi, &
Ravlin, 1993); higher quality applicants (Bretz & Judge, 1998; Thorsteinson, Palmer, Wulff, &
Anderson, 2004)]. Further, Highhouse, Stanton, and Reeve (2004) found that negative informa-
tion about prospective employers is discounted more than positive information. However, Reeve,
Highhouse, and Brooks (2006) also showed that one negative piece of information can affect
the accumulated effects of multiple moderately positive pieces of information (i.e., the relative
balance of positive or negative information is not as important as the intensity of one’s affective
response to a piece of negative information). The overall conclusion of this line of research is
that although providing realistic information (and negative information) may adversely affect the
attraction of some desired applicants, its overall effect on applicant pool quality and quantity may
depend on the specifics of the job, industry, labor market, job seekers, and nature of the informa-
tion itself.

PRACTICE IMPLICATION
The challenge is for recruiters and hiring managers to acquire and perfect techniques that
balance the “sell” with the job and company “realities.” A suggestion for recruiters would be
to create a realistic job preview discussion guide that can be used by interviewers, recruiters,
and hiring managers. The guide could have competitive advantage talking points as well as
information on the realities of working in the company and in a particular role. Interviewers
should be taught how to best sell the company and how to set realistic expectations. Indeed,
recent rises in truth-in-hiring or fraudulent inducement lawsuits suggest that special care
be taken when making specific statements regarding time to promotion and variable pay
(Hansen, 2007).

Highhouse and colleagues have generated several studies integrating decision-making theory
with recruitment research that address issues of how to present information to potential applicants.
Some conclusions from these studies are as follows:

1. Adding more relevant information to job postings can lead to greater job attractiveness
perceptions (Yuce & Highhouse, 1997; see also Barber & Roehling, 1993 for similar
findings).
2. Lack of information (e.g., about pay) can lead to negative inferences and lower attractive-
ness perceptions (Yuce & Highhouse, 1997; however, see also Highhouse & Hause, 1995
and Maurer et al., 1992).
3. Suggesting vacancy scarcity or time scarcity (limited application period) can lead to more
positive employer image inferences (Highhouse, Beadle, Gallo, & Miller, 1998).

PRACTICE IMPLICATION
The practical advice is “more information is better.” Organizations should review job postings
to add position- and organization-relevant information. If appropriate, vacancy and/or time
scarcity should be emphasized.

Methodological shortcomings of research have been pointed out in many previous reviews; these
reviews typically end with a call for research designs that recognize the complexities involved in
attraction (e.g., consider multiple sources; consider specific information provided; consider context
such as labor market; consider job seeker experience; consider organizational-level antecedents and
outcomes, not just individual level). That is, research seldom focuses on more than one factor at a
time or on the relative influence of multiple factors. One other shortcoming is that the focus on recent college gradu-
ates leaves a gap in knowledge of the generalizability of findings to other groups.

KNOWLEDGE OF THE COMPANY IS CRITICAL TO APPLICANT ATTRACTION


Cable and Turban (2001) described an applicant’s knowledge of the company as having three dimen-
sions: familiarity (awareness), reputation (global affective impression, brand), and image (attributes
associated with organization). They argued that these, in conjunction with a job seeker’s values
and needs, will determine attraction. For example, inferences regarding the organization’s values
influence attraction beyond instrumental aspects of the position, such as pay and location (Lievens
& Highhouse, 2003; Chapman, Uggerslev, & Webster, 2003). One interesting research perspective
is that of Highhouse, Thornbury, and Little (2007), who suggested that the connection of employer
knowledge to firm attraction is affected by the job seeker’s self-image. That is, individuals are
attracted to an organization if they feel that it invokes prestige and impresses others and/or allows
them to express their values. Researchers have explored what specific organizational image attri-
butes are perceived most favorably. For example, achievement, concern for others, honesty, and
fairness are seen as the most salient work values, and their effects on applicant behavior have been
established (Judge & Bretz, 1992; Ravlin & Meglino, 1987).
Although the traditional school of thought has been that familiarity leads to positive perceptions,
Brooks and Highhouse (2006) showed the fallacy in this line of thinking, noting that although
familiarity with an employer is a necessary prerequisite to organizational attraction, greater famil-
iarity beyond name recognition often results in ambivalence (i.e., positive and negative attitudes)
rather than increasingly positive perceptions (see Brooks, Highhouse, Russell, & Mohr, 2003 for an
empirical demonstration of this effect).
In summary, applicant attraction is affected by how familiar the organization is to applicants,
what its reputation is, and the image job seekers hold of the organization’s values and attributes.

PRACTICE IMPLICATION
Evaluate an organization’s image and reputation, contrast what is found with what is desired,
and then develop planned activities to affect images held by job seekers.

RECRUITMENT ACTIVITIES INFLUENCE KNOWLEDGE ABOUT THE EMPLOYER


Recruitment activities do influence knowledge of the employer, thus affecting attraction (Rynes,
Bretz, & Gerhart, 1991; Turban, 2001; Turban & Cable, 2003; Turban, Forret, & Hendrickson,
1998). The use of advertising affects organizational image and applicant pool quantity and qual-
ity (Collins & Han, 2004; Collins & Stevens, 2002), as do publicity and endorsements by outside
sources (Collins & Stevens, 2002). Although we know these effects occur, we need to conduct
more theory-driven research on the processes by which they occur if we wish to truly develop bet-
ter advice for organizations. Cable and Yu (2007) noted that it is important to go beyond knowing
that applicant impressions influence attraction to understanding how impressions are formed and
can be changed. They proposed that media richness (multiple cues, language variety, feedback, and
personal focus) and media credibility (expertise and trustworthiness) are particularly influential in
the formation of applicant beliefs regarding organizations and jobs.
Similarly, Rynes and Cable (2003) urged greater attention to ways to improve organizational
image. Recent research has focused on how recruiting advertisement may serve as a means to
overcome negative publicity/reputation (e.g., Van Hoye & Lievens, 2005). For example, Avery and
McKay (2006) reviewed several examples of organizational attempts to overcome negative publicity
regarding discriminatory practices through targeted recruitment activities. Research by Collins and
Han (2004) suggested that low involvement recruitment practices (exposure to organization with no
detailed information on company or job) will not affect the reputation of a well-known organization,
suggesting that overcoming negative reputations may be challenging.

PRACTICE IMPLICATION
Evaluate how recruitment and advertising practices might be used to enhance organizational
image. In particular, consider how to engage applicants via media that are rich and credible.

We now discuss generating applicant interest in light of workplace trends regarding globalization,
technology, and strategic talent management.

SOURCING GLOBALLY
It is critical for an employer to recognize that in a global market, there may be large differences in
familiarity, reputation, and image across regions; hence, recruitment activities may need to vary to
create the type and level of employer knowledge desired (i.e., a familiar organization with a positive
reputation in one locale may have to engage in more and different recruitment activities in another
locale). To tap the same caliber of talent in different locations may require different sourcing strate-
gies. For example, an Internet presence may greatly expand an organization’s reach, but, because
of differences in Internet access, it may lead to missing key talent sources in some locations. The
issue of the “digital divide” has been raised with regard to the use of technology in recruiting
(Cober, Brown, Keeping, & Levy, 2004), although evidence for negative effects of technology on
pool diversity in North America is mixed (see Russell, 2007 for a review). However, in terms of
global recruiting, access remains an important concern. Another example would be the need for
appropriate translations. Although the ubiquity of English as the language of business can lead to
questioning investments in translations, translations of recruiting and assessment materials will


affect which applicants are reached. A third example would be the market type: growth or emerging
markets versus mature markets. Growth markets often require dynamic strategies, whereas mature
markets may draw upon more traditional approaches to sourcing applicants. In summary, global
branding and advertising require synthesizing a desired global organizational image with awareness
of local customs, needs, and language.
One overarching concern with regard to globalization and recruitment is that we lack cross-
cultural research on the information processing strategies and needs of job applicants. Although
many of the theoretical perspectives that are being applied to the recruitment arena are general
theories of human behavior, there is a need to incorporate the vast body of research on culture and
communication (see Matsumoto, 2001 and Orbe, 1998 for reviews) to effectively consider source,
content, and other factors in generating applicant interest on a global basis. For example, does the
same information serve the same role in input into employer knowledge across cultures? Is negative
or missing information considered similarly across cultures? Does the influence of recruitment activ-
ity on employer knowledge vary by culture? Do certain cultures attend more to certain information
sources or types? For example, referral programs are particularly important outside of the United
States; one large multinational found that applicants in Latin American countries are more likely to
have learned of positions from family and friends than those in Asia-Pacific or European countries.
Further, we lack understanding of the generalizability of recruitment efforts for different job levels in
different cultures and markets. For example, sources that are effective in the recruitment of blue-col-
lar workers but not managers in one country may play different or even opposite roles in another.
Research on culture and recruitment should build from theories of the processes through which
recruitment activities have effects rather than adopting a simple between-country comparative
approach that is atheoretical in nature. Although actions such as presenting one’s organization as
environmentally responsible or as prestigious (e.g., Fortune ranking) would seem to increase the
likelihood of being viewed favorably, one might consider how a cultural lens might affect what
is seen as impressive or what values are seen as socially approved. Indeed, we would posit that
although factors affecting what is considered prestigious (e.g., organization rankings, high pay)
might be similar across cultures, there may be some differences in what is associated with respect-
ability (e.g., what organizations are seen as good and honorable). Another example would be to
consider media richness and media credibility as potentially influenced by culture, or their effects
on attraction as moderated by culture.
A second concern is the need for cultural adaptation in recruitment. The literature on culture
and marketing is fairly extensive (Hermeking, 2005) and has established the need to consider cul-
tural receptivity as well as socioeconomic conditions in designing advertising campaigns (Karande,
Almurshidee, & Al-Olayan, 2006); hence, it is no surprise that organizations recognize that recruit-
ment activities and materials may need to vary by country. For example, Baack and Singh (2007)
demonstrated that individuals prefer websites that are culturally adapted (i.e., changing content
and presentation to fit cultural values of a target country), a principle that has permeated market-
ing communications in print and advertising (see also Fletcher, 2006). At first blush, this appears
to fly in the face of “global branding,” in which a single website and a single projected image are the goal.
However, we contend that organizations can work to project a global brand image while also mak-
ing appropriate cultural adaptations. For example, an organization that wants to project an image
of caring for the customer can provide culturally appropriate examples of showing good customer
service in each location or show photographs of individuals of different nationalities depending on
location (Brandel, 2006).
A third overarching concern in global recruiting is how best to attract a diverse workforce.
In the past several years, how to attract a diverse applicant pool has been the focus of consider-
able research (see Avery & McKay, 2006 for a review). However, almost all of this research has
focused on the role of managing diversity policy statements (e.g., Equal Employment Opportunity,
affirmative action) on attraction in North American samples (e.g., Saks, Leck, & Saunders, 1995;
Williams & Bauer, 1994). Harrison, Kravitz, Mayer, Leslie, and Lev-Arey (2006) provided a meta-
analytic review of this research and suggested that the use of less prescriptive affirmative action
plans (i.e., not preferential treatment), communication of detailed features, and providing justifica-
tions for use are likely to result in the most positive reactions across the applicant pool. Initially, it
might seem that this research would readily generalize globally. However, Myors et al. (2008) noted
that the United States is somewhat of an outlier in its prohibition on quotas and within-group stan-
dards. The greater prevalence of preferential treatment in hiring outside of the United States means
that one needs to consider how culture and economic conditions will affect applicant perceptions of
policy statements and hiring practices.
Researchers have specifically considered how the portrayal of a diverse workforce in recruitment
advertisements affects attraction (e.g., Avery, Hernandez, & Hebl, 2004; Perkins, Thomas, & Taylor,
2000). For example, Avery (2003) found complex reactions for Blacks and Whites to ad diversity,
depending upon the individual’s other-group orientation as well as whether the ads portrayed more
or less racial vertical integration. Despite some complexity in findings, the general advice has been
that ads depicting greater diversity should be used (Avery & McKay, 2006). On a global basis, this
would extend beyond considerations of racial/ethnic and gender diversity to diversity of culture (e.g.,
through dress). Vertical integration of the organization by individuals of different nationalities may
also affect attraction across countries.
Table 6.1 outlines some suggested research directions related to globalization and attracting
applicants.

TABLE 6.1
Reaching Potential Applicants—Unanswered Research Questions

Globalization
- Are there differences in recruitment source effects across countries because of differences in economies and cultures?
- Does the role of various attributes of information (e.g., valence, specificity) in attraction vary by culture and economic conditions?
- How do differences in familiarity, reputation, and image across countries affect an organization’s ability to attract applicants?
- How does research on U.S.-based preferential treatment and diversity policy statements apply to attraction of applicants globally, where different norms and standards apply with regard to hiring?
- Are the effects of media richness, media credibility, and specific recruitment activities moderated by culture?
- When and how should recruitment materials and activities be culturally adapted?
- How does vertical integration by nationality affect attraction on a global basis?

Technology
- Does the use of technology in recruitment only influence the “how” and “amount” of information delivery to potential applicants, or does it also alter “what” attracts applicants?
- How can individual differences in what attracts be leveraged to customize messages in recruitment?
- How does job seeker web search behavior influence how to best attract applicants?
- What are the most effective, innovative uses of technology for generating interest among high-quality applicants?
- How does technology’s role in attraction differ for different job levels and applicant pools?
- How do targeted individuals react to customized versus noncustomized messages?

Strategic Talent Management
- How do organizational recruitment activities vary according to strategic talent management objectives?
- To what extent do strategic ad placements affect recruiting outcomes differently than less targeted advertising?
- How do policy/benefit emphases in attraction of targeted talent affect recruiting outcomes for targeted groups?
- How can source-of-hire information be used effectively in targeted recruitment?
- What is a cost-effective level of customization in recruitment?

PRACTICE IMPLICATION
Organizations need to evaluate sourcing strategies in different markets and culturally adapt
recruiting materials as needed. The organizational attributes that are valued in different cultures
should be considered when developing recruiting strategies and materials.

TECHNOLOGY AS A MEANS OF IMPROVING REACH


In their review on Internet recruiting, Lievens and Harris (2003) noted that technological innova-
tions in recruitment since the late 1980s include the pervasive use of company websites, job boards,
applicant tracking systems, and various e-recruiting approaches (e.g., using the Internet to identify
potential applicants, using e-mail and other presence on the Internet to contact passive applicants or
potential applicants who have not expressed an interest, and using the Internet to assess applicants).
Some uses (corporate websites) have received more research attention than others (job boards), per-
haps related to growing practical concerns about the efficiency of job boards as generators of quality
applicants and the relative investments required by organizations. In general, technology, and in
particular the Internet, has facilitated the capabilities of recruiting functions to reach more potential
applicants in less time and for less money; that is, technology exponentially enhances the efficiency
of recruiting (Lievens & Harris, 2003). Internet use in recruiting reduces differences between large
and small firms in recruiting activities (Hausdorf & Duncan, 2004). Technology can also facilitate
the identification of particular talent pools (e.g., listservs and other subscriber groups and sites as
sources), the tailoring of materials to particular target groups (e.g., different web content depending
on answers to a set of questions regarding interests/values), and the inclusion of more informa-
tion than traditional advertisements (as noted above). However, a survey of HR professionals by
Chapman and Webster (2003) found that most reported only “moderate” results from technology
implementation, with many noting that expanded reach did not translate into higher quality appli-
cants. Organizations have also noted the downside of using technology in the recruiting process,
such as making it easier for applicants to apply to positions regardless of qualifications, creating a
greater pool of qualified and unqualified applicants that recruiters must sift through.
Rynes and Cable (2003) noted that recruitment research before 1990 focused on a narrow set
of practice-driven research questions (e.g., What are the best sources for quality applicants? What
characteristics make for effective recruiters?) and that there was a need for greater cross-level
research and inductive theory building. However, the advances in technology since 1990 have led
the predominant focus of recruitment research in the past several years to be on narrow, practical
questions related to technology and attraction (e.g., web design features and attraction) with only a
few studies attempting to link practical issues to theories of attraction.
Key findings in this line of research are that website content and usability play important roles
in attraction (Braddy, Thompson, Wuensch, & Grossnickle, 2003; Cober, Brown, Levy, Cober, &
Keeping, 2003), but website aesthetics are also fairly important (Cober, Brown, & Levy, 2004;
Dineen, Ling, Ash, & DelVecchio, 2007; Zusman & Landis, 2002). Cober, Brown, and Levy (2004)
noted that the interaction of form, content, and function is essential (i.e., good content has to be
presented in an interactive, navigable, and pleasing way). Similar advice comes from Cable and Yu
(2007), who found most recruitment web pages were low in media richness. Technological advances
do not appear to alter conclusions of prior research regarding what influences attraction (although
we see this as an under-researched question) but do afford organizations greater and unique oppor-
tunities to influence employer knowledge and to provide more information in much more efficient
and effective ways.
An example of an innovative use of technology to generate applicant interest is through forums
such as Second Life, a three-dimensional virtual world in which individuals can meet people,
participate in activities, and purchase virtual property and goods. Because Second Life provides
access to a wide range of individuals who are geographically dispersed, recruiting in Second
Life is taking place. For example, KPMG conducted an online fair, reaching many more
applicants than a traditional job fair (Flinders, 2007). Other approaches include webinars,
such as those hosted by Merrill Lynch to reach multiple people in multiple locations efficiently
(Mullich, 2004), and organizational involvement in electronic gaming as a means of garnering attrac-
tion. Cosmetics maker L’Oreal bypassed the traditional visits to top business schools in search of
talent via an online competition. More than 30,000 students from over 110 countries logged on,
and it is now one of the world’s largest business planning contests (see http://www.estrat.loreal.
com/_int/_en/whatis.aspx?).
In the past, considering individual differences in reactions to recruitment materials and selec-
tion processes was seen as less useful from a practical perspective because developing different
content and processes for different types of prospective applicants was seen as resource-prohibitive.
Technology allows for a much greater capability for differences in what applicants are provided,
and thus there is renewed interest in individual differences. Customizing messages can occur in
terms of the characteristics of those who frequent a particular website, LISTSERV, or job board.
Customization can also occur in the specific types of messages sent to various groups.
Research on customization of messages is fairly nascent. In a general vein, researchers have
compared websites that are more recruiting-oriented to those that are more screening-oriented (Hu,
Su, & Chen, 2007; Williamson, Lepak, & King, 2003). Although some of this research suggests
that more recruiting-oriented websites generate greater attraction, several studies have shown that
providing self-screening information (e.g., assessments of fit with the position in terms of abilities,
values, and needs) is seen as particularly valuable by applicants and directly affects variables such
as information recall and site viewing time (e.g., Dineen, Ash, & Noe, 2002; Dineen et al., 2007).
Given that fit perceptions influence self-selection (Cable & Judge, 1996; Kristof-Brown, Zimmerman,
& Johnson, 2005), enabling accurate self-assessment of fit can be a vital role for websites.
Another individual difference consideration with regard to technology and recruitment is vari-
ability in job seeker web search behavior (Cober, Brown, Keeping, & Levy, 2004). That is, consider-
ation of the amount of time spent on information accessed, number of different components or sites
accessed, and total time spent on web searching may be important in assessing which applicants are
best reached through these media. Cober et al. (2004) focused on web search behavior as a media-
tor of website effects on attraction to a specific organization; extending this research to the broader
question of how applicants avail themselves of recruitment information would usefully extend
earlier research on recruitment sources.
Table 6.1 (p. 132) provides a summary of research directions on technology and applicant
attraction.

PRACTICE IMPLICATION
In implementing technology in recruitment, consider content, usability, and aesthetics in
conjunction with one another. Organizations that develop innovative uses of technology set
themselves apart from competitors. Messages should be customized where appropriate, and
applicants should be provided with self-screening opportunities.

CONSIDERING STRATEGIC TALENT MANAGEMENT


At a basic level, organizations have long been interested in recruitment strategy. Breaugh and Starke
(2000) noted that the establishment of objectives is the first phase of the recruiting process, with
organizations planning a strategy for attracting talent on the basis of these objectives. Current
approaches to talent management in organizations place recruitment activities under a broader
umbrella of talent management practices and emphasize alignment of recruitment practices with
strategic goals (Lewis & Heckman, 2006).
One of the keys to strategic talent management is to identify which talent pools are pivotal, that is, those in which
small gains in applicant quantity or quality will yield large returns (Boudreau & Ramstad, 2005;
Lewis & Heckman, 2006). Researchers have shown that differences in applicant pool quality have util-
ity in that hiring higher quality applicants has benefits for the organization (Connerley, Carlson, &
Mecham, 2003). One then has to identify which practices might lead to achieving these gains.
We can envision ways in which the topic of generating applicant interest might be approached
with a consideration of the organization’s strategy for talent management. For example, which tal-
ent pools are the organization most concerned about retaining and, therefore, for which applicant
pools should concerns about applicant quality predominate? Which talent pools has the organiza-
tion determined to be ones for greater investments in development internally and how might that
affect recruitment targets? Although it is easy to see how such questions can serve in a practical way
to ensure that organizational resources for recruitment are directed in keeping with strategy, it also
may be important to consider an organization’s strategy in determining what it considers recruit-
ment effectiveness (i.e., for a given job type, one organization’s strategy might suggest that generat-
ing strong interest among high talent is critical, whereas another’s might suggest that applicant pool
quality does not have to be as strong).
Organizational strategies for recruitment are likely to vary based on context. For example,
Hallier (2001) discussed how recruitment for Greenfield sites might seek to attract a different type
of applicant than might be traditionally sought for individuals in similar jobs at other locations. As
another example, Taylor (2006) pointed to the need for better research on recruitment strategies for
small firms.
Strategic ad placement to attract specific talent pools has often been discussed in the context of
recruiting a more diverse applicant pool (see Avery & McKay, 2006 for a review) but can be consid-
ered more broadly as a talent management practice. For example, an organization concerned with
attraction of a specific talent pool may think about ad placement in terms of reaching that particular
pool. Technology has enabled location- or context-based advertising on the web in innovative ways
(e.g., placing job ads next to specific types of just-released content on websites).
Similarly, whereas targeted recruitment messages to attract minority applicants, targeted cam-
puses in recruitment, and targeted job fairs have all been discussed in terms of attraction of minority
applicants (Avery & McKay, 2006), one can envision other targeted messages and targeted sourcing
depending on other talent pool targets. For example, organizations can send customized text mes-
sages to potential applicants at certain universities or belonging to certain professional organiza-
tions to advertise openings or events. Cisco has held focus groups with ideal recruitment targets to
uncover how they spend free time and websites they visit to better target their recruitment efforts
(e.g., recruiters attend art fairs, microbrewery festivals, and home and garden shows; advertisements
are placed at local movie theatres; Nakache, 1997). The company has also used a “friends program”
in which prospects are matched with current employees with similar background and skills who call
to discuss what it is like to work there (Nakache, 1997). In the Dineen et al. (2007) study mentioned
earlier, customizing the message or providing feedback regarding organizational fit likelihood was
found to increase the attraction of qualified applicants and decrease the attraction of unqualified
individuals. Although customization and provision of feedback can be more costly to initiate, the long-run cost savings of creating a more targeted applicant pool are apparent. Cober,
Brown, Keeping, and Levy (2004) noted that we need more research on the effects of such messages
on intended targets as well as what the unintended effects are on those outside of the target group.
Although research on how organizational policies and benefits affect recruitment outcomes is not new (e.g., job security rights, Roehling & Winters, 2000; salary negotiations, Porter, Conlon, & Barber, 2004; work-family balance initiatives, Nord, Fox, Phoenix, & Viano, 2002; affirmative action policies, Harrison et al., 2006), strategic talent management suggests directly tying the promotion of policies and benefits to targeted applicant groups in order to succeed in recruiting those groups. For
example, companies interested in attracting women into occupations where they are underrepresented (e.g., engineering) may emphasize work-life initiatives; however, surprisingly little empirical research has actually examined the impact of work-life initiatives on attraction (Carless & Wintle, 2007; Casper & Buffardi, 2004), let alone on offer acceptances.
Applicant tracking systems (ATS) allow organizations to better determine which sources are yielding desired applicants (e.g., by tracking location-based advertising). However, source-of-hire tracking can be of poor quality if insufficient attention is paid to the quality of tracking tokens or to developing drop-down self-report boxes of sources that are specific and easy for applicants to use (Ruiz, 2007).
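As a purely illustrative sketch of the tracking-token and self-report mechanisms just described (the function, field names, and token values below are hypothetical and not drawn from any particular ATS), source-of-hire data might be captured and audited along these lines:

# Hypothetical sketch: reconciling an ATS tracking token with an applicant's
# self-reported source. All names and values here are illustrative only.
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping from tokens embedded in job-ad URLs to recruitment sources.
TOKEN_TO_SOURCE = {
    "cmp_campus2024": "campus job fair",
    "cmp_nicheboard": "niche job board",
    "cmp_referral": "employee referral",
}

# Hypothetical drop-down options, kept short and specific so applicants can use them easily.
SELF_REPORT_OPTIONS = {"campus job fair", "niche job board", "employee referral", "other"}

def record_source(application_url: str, self_report: str) -> dict:
    """Return tracked and self-reported sources together so recruiters can audit mismatches."""
    token = parse_qs(urlparse(application_url).query).get("src", [None])[0]
    tracked = TOKEN_TO_SOURCE.get(token, "unknown")
    reported = self_report if self_report in SELF_REPORT_OPTIONS else "other"
    return {
        "tracked_source": tracked,
        "self_reported_source": reported,
        "mismatch": tracked not in ("unknown", reported),
    }

# Example: the URL token says "niche job board," but the applicant reports a referral.
print(record_source("https://careers.example.com/apply?src=cmp_nicheboard", "employee referral"))

Recording both signals side by side, rather than trusting either alone, is one way to surface the kind of data-quality problems noted above.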
Cable and Turban (2001) stated that "There are not recruitment 'best practices' across firms"
because firms need to assess what employer knowledge their target market holds before developing
a recruitment strategy. Approaching recruitment from an integrated talent management perspective
suggests a strong shift away from examining what works best across applicants to what works best
for specific targets of interest in specific contexts. This is hardly a new suggestion—it has been made
by Rynes (Rynes, 1991; Rynes & Cable, 2003; Rynes et al., 1980) in all of her reviews—but the cur-
rent zeitgeist with regard to strategic talent management may increase the likelihood of the research
shift. Once again, Table 6.1 (p. 132) summarizes research suggestions on attracting targeted talent.

PRACTICE IMPLICATION
Organizations need to identify pivotal talent pools and use strategic ad placement, targeted
messages, and targeted sourcing for those pools. A quality ATS to monitor, evaluate, and
evolve recruiting efforts is key to success.

MAINTAINING INTEREST
Although it is clear that many of the factors that relate to generating interest (e.g., organizational
image, job attributes) relate to maintaining interest, there are several research topics concentrated
primarily on keeping applicants in the pipeline once they have applied. Some general findings are
discussed in the following sections.

APPLICANTS PREFER RECRUITERS WHO TREAT THEM WELL AND ARE INFORMATIVE
Applicants prefer and react more positively to recruiters who treat them well and are informative
(see Breaugh & Starke, 2000 or Rynes & Cable, 2003 for reviews of this research).
In other words, applicants seek informational and interactional justice (Bell, Wiechmann, &
Ryan, 2006) as they proceed through the hiring process. However, it is important to note that the general consensus is that the job and organization are more important than the recruiter (Chapman, Uggerslev, Carroll, Piasentin, & Jones, 2005).
In our view, advice to organizations on this topic remains pithy: treat applicants nicely, make sure
recruiters know the jobs for which they are recruiting, and train recruiters. It seems that research-
ers should focus more on the micro level of interactions between recruiters and applicants to better
inform training as to what causes affective shifts among applicants. For example, we need to under-
stand better where the balance lies between technology-enabled processes and high-touch recruit-
ment practices—at what point do applicants perceive less high touch and become less satisfied?

PRACTICE IMPLICATION
Recruiters need to be carefully selected. Training and enabling of recruiters requires
investment.

RECRUITER DEMOGRAPHICS DO NOT CLEARLY INFLUENCE APPLICANT ATTRACTION TO A JOB


Recruiter demographics do not seem to have a clear and consistent influence on applicant attrac-
tion to the job (Avery & McKay, 2006; Chapman et al., 2005; Rynes, 1991). Research on relational
demography in recruitment suggests at best a mixed picture (e.g., Cable & Judge, 1996; Turban &
Dougherty, 1992), in part because many studies have not used appropriate methodologies (Riordan
& Holiday-Wayne, 2008). Further, as we noted earlier, research designs that consider the relative
influence of multiple factors (e.g., influence of recruiter-applicant similarity relative to pay, location,
promotional opportunities, etc.) are not prevalent.
McKay and Avery (2006) suggested that both encounter demographics (e.g., the vertical
integration of minorities in the organization and in the community) and the quality of the
interaction between groups, not just recruiter demographics, will affect applicant perceptions
of organizational diversity climate and subsequent job acceptance intentions. These research-
ers also noted that there is likely significant within-group variance among minority job seek-
ers in reaction to these factors, depending on applicant racioethnic identity, social dominance
orientation, and other group orientation (McKay & Avery, 2006). Most important, they noted
that depictions of diversity in advertising that might suggest a positive diversity climate will
be seen as just window dressing if site visits do not live up to what is advertised. In sum, we need
to move from examining demographics of one-time, dyadic campus recruitment interviews to
understanding the process of how applicants draw inferences regarding organizational diver-
sity climate as they proceed through the recruitment process. As cross-border hiring becomes
part of the landscape of recruiting, researchers will need to continue to expand their definition
of diversity and examine how applicants draw inferences regarding whether an organization is
a truly global employer.

PRACTICE IMPLICATION
Rather than focusing on recruiter-applicant similarity as critical, efforts should be directed at understanding applicants' views of the organizational diversity climate, where those views derive from, and how they are influenced by organizational recruitment activities.

SITE VISITS AFFECT CHOICE


Site visits affect eventual choice (Rynes et al., 1991; Turban, Campion, & Eyring, 1995). Although obviously more geared toward the recruitment of graduating college students, studies on site visits suggest the importance of face-to-face encounters with those at the work site. Breaugh and Starke (2000) noted that despite awareness of this, little research has actually focused on the details of site visits to guide HR practitioners in what truly makes a difference. IBM's Talent Management Team found that work environment was ranked as the most influential factor in determining acceptance of an offer after site visits. McKay and Avery (2006) noted the importance of community char-
acteristics beyond the worksite. As noted earlier, moving toward more theory-driven research on
impression formation and change should inform our knowledge of what makes for more effective
site visits.

PRACTICE IMPLICATION
Site visit features (i.e., how time is spent) need to be investigated in terms of which ones relate
to actual offer acceptance.

SELECTION PROCEDURES AFFECT APPLICANT JOB ATTRACTION AND INTENTIONS


Applicants do react to the selection procedures they encounter, and this does affect attraction and
job pursuit intentions (see Hausknecht, Day, & Thomas, 2004 for a meta-analysis). The practical
advice emanating from this research is to make screening procedures job-related, provide for two-
way communication, and attend to interpersonal treatment of applicants. Further, delays in the
recruiting process do indeed affect applicant perceptions (e.g., Rynes et al., 1980); thus, knowing
applicant expectations regarding timing of communications is critical.
One major lament in this area has been that so few studies examine actual applicant job choice
(Ryan & Ployhart, 2000), so it is hard to determine whether dislike of certain selection procedures
really makes a meaningful difference in the end. However, on the basis of a 2007 applicant reaction survey conducted by the University of Connecticut and IBM, experience during the hiring process was ranked 12th of 15 factors in choosing to accept an offer. More influential were factors such as opportunity for advancement, salary, work/life balance, and type of work. Further, when asked about willingness to complete selection procedures during the hiring process, applicants were favorable to participating in face-to-face interviews (93.7%), phone interviews (79.7%), employment tests (68.3%), exercises and case studies (53.9%), and assessment centers (52.2%). This implies that candidates, although they may not enjoy being subjected to selection procedures, are accepting of most of them.
Echoing a point already made, research designs need to consider the relative influence of perceptions of
procedures in light of the many other factors (e.g., pay, location) that may be influencing attraction.
It is important to keep in mind that research clearly shows that the outcome received by an appli-
cant plays a critical role in how the hiring process is perceived. Individuals do react more negatively
to rejection, regardless of how well such a decision is couched, and in any selection context some people will be left with negative outcomes and many may make attributions to the process rather
than to themselves. Organizations can lessen negative perceptions of processes, but probably cannot
entirely remove rejected applicants’ negative feelings.

PRACTICE IMPLICATION
Data on perceptions of the hiring tools used should be regularly gathered and monitored;
however, conclusions should be couched in terms of the many other factors exerting influence
on applicant decision-making.

REASONS FOR SELF-SELECTION OUT OF HIRING PROCESSES VARY


Not all self-selection is adverse and not all is preventable (Bretz & Judge, 1998; Ryan, Sacco,
McFarland, & Kriska, 2000). Research on why individuals initially express an interest in a job but
then stop job pursuit seems like it would be essential to understanding what works in recruitment;
however, researchers more often focus on decisions (or intentions) to apply than on who self-selects
out along the way. Research in this area suggests varied reasons for dropping out (e.g., stay with
current job, took another job), with some research suggesting those self-selecting out may be more
similar to ultimately unsuccessful applicants (Ryan et al., 2000), at least attitudinally; hence, self-
selection can be positive.

PRACTICE IMPLICATION
Rather than passively accepting self-selection rates, organizations should investigate reasons
for self-selection to uncover whether any causes are problematic or are leading to a loss of
desirable applicants. Also, desirable self-selection should be facilitated through tools that
enable assessment of job and organization fit.

Trends toward globalization, changes in technology, and increased strategic talent management suggest some new directions for research on maintaining applicant interest. These are listed in Table 6.2 and described below.

TABLE 6.2
Maintaining Interest—Unanswered Research Questions

Globalization
How do cultural and economic differences affect perceptions of the interactional and informational justice of encounters with recruiters?
Do recruiter/applicant similarity effects exist on features such as culture or nationality?
What is the role of culture in reactions to selection tools?
How do applicants form inferences of organizational climate for global inclusion?
What are cultural and economic influences on the relative roles of pay, promotion opportunities, job security, signing inducements, social influencers, and work-life balance in job offer acceptance?
What factors influence relocation decisions?
How does organizational off-shoring and outsourcing affect applicant perceptions of job security and how does that affect offer acceptance?

Technology
What is the relative balance between high-touch and high-tech in maintaining attraction?
How effective are innovative technologies at maintaining applicant interest?
What features of Internet-based testing lead to more positive or more negative reactions?
How can organizations stimulate positive and counteract negative word-of-mouse?
How does organizational use of innovations in technology in recruitment affect job offer acceptance?

Strategic Talent Management
What factors differentially affect the interest of high-quality from low-quality applicants?
How do competitor recruitment practices affect organizational ability to maintain interest?
Do recruiter characteristics have differential effects on targeted versus nontargeted individuals? Which characteristics?
Are there differences in self-selection rates and reasons for targeted and nontargeted individuals?
How do changes in recruitment activities to target applicant pools affect subsequent training and socialization activities and other HR practices?

MAINTAINING INTEREST AROUND THE GLOBE


Maintaining applicant interest may entail similar concerns and processes globally, but the relative effectiveness of recruiting efforts may vary with local conditions. First, job market variability across nations likely will affect the number of alternatives that applicants have; how willing or able individuals are to tolerate delays; and, hence, self-selection rates. Second, what constitutes a warm recruiter or fair treatment may not be the same in different regions, because cultures differ in beliefs regarding the appropriateness of assertive behavior in interviews (Vance & Paik, 2006). As for recruiter demographics, research focused on U.S. ethnic diversity (McKay & Avery, 2006) may or may not have implications for global diversity considerations, because reasons for disadvantage in the labor market vary from locale to locale (e.g., immigrant status, Aboriginal group membership, language differences, religious differences). We also know little about how applicants form impressions of the opportunities available in multinational corporations (MNCs) to those of their national origin, or of an organization's climate for global inclusion.
There has been some research attention to the role of culture in applicant reactions to selection
tools, but most of this has been research comparing reactions by country rather than driven by
theories of cultural differences, a problematic approach because within-country variance in cul-
tural values is ignored (Triandis, 1995). Further, the findings of these studies have not evidenced
any strong consistent pattern of relations between type of selection tool and applicant reactions that
indicates particular cultural values as key to reactions (see Anderson & Witvliet, 2008 for a review).
For example, Ryan et al. (2009) found that similarities outweighed differences in perceptions of
selection tools where cultural values were directly assessed and that country gross domestic product
(GDP) was a greater contributor to variance in perceptions at the country level than were national
standings on cultural values.

PRACTICE IMPLICATION
There is a need to consider how job markets affect the ability to maintain interest, and mecha-
nisms for doing so need to be implemented in hot and/or valued skill markets. Cultural influ-
ences should be considered throughout selection process design.

TECHNOLOGY’S ROLE IN MAINTAINING INTEREST


In terms of maintaining interest, advances in technology can enable greater, continued contact
with applicants (e.g., e-mail updates on status, providing information). For example, Accenture
sends customized eCards to prospective hires (Mullich, 2004). Applicant tracking systems can
provide faster feedback, make the recruiting process more collaborative for applicants, provide
more information to applicants regarding positions available worldwide, and allow assessments
to be efficiently integrated and streamlined. At some point in the process, individualized contact
is important, even if that individualized contact is technology-enabled. However, organizations
must be diligent in understanding their applicant pools’ preferences and manage technology-
enabled communication accordingly. When asked about preferences for communication modes, applicants applying to IBM in 2007–2008 stated that e-mail and telephone are still the two main preferred methods of contact after an application is submitted. A higher preference for e-mail was found in all regions except Asia-Pacific and southwestern Europe, reinforcing the earlier point that tapping the same caliber of talent in different locations may require different sourcing strategies.
Just as technology can help maintain interest, questions have arisen as to its potential negative effects on retaining applicants. For example, Anderson (2003) noted that although most research suggests that reactions to Internet-based testing are positive, the design of the specific tool itself is probably most critical to its acceptability. A study at Procter & Gamble (P&G) found that messaging around assessments must be well crafted to ensure acceptability (Gibby & Boyce, 2007). However, we would argue that these same points apply to paper-and-pencil tools and are not unique to the acceptability of technology-enabled assessments. One must also consider the target audience when leveraging technology. For some, including executives and passive applicants not willing to jump through hoops, technology can be viewed as cold and inhuman. Applicants with hot or market-valued skills are opting to work with "agents" (headhunters) throughout the recruiting process, ensuring a high-touch experience.
Technology also allows job seekers to obtain a great deal of independent information about orga-
nizations from blogs, chat rooms, and other postings. Van Hoye and Lievens (2007) examined how
this “word-of-mouse” affected applicant attraction, noting how the inability to control company-
independent information makes it more difficult to ensure recruitment messages get across. They
found that web-based independent sources are seen as more credible than employee testimonials on
company websites. Van Hoye and Lievens emphasized how important it is for organizations to try to stimulate positive word-of-mouse.
Less research has focused on the use of technology in interviewing, but what is available sug-
gests that videoconference interviews and telephone interviews may not be perceived as positively
as face-to-face encounters (Chapman & Rowe, 2002; Chapman, Uggerslev, & Webster, 2003;
Silvester & Anderson, 2003; Silvester, Anderson, Haddleton, Cunningham-Snell, & Gibb, 2000;
Straus, Miles, & Levesque, 2001).
Some researchers have questioned whether interested individuals might be reluctant to complete
applications and screening tools on the Internet because of privacy concerns (Lievens & Harris,
2003). Although there are a few studies that suggest privacy concerns relate to willingness to use
Internet-based recruitment tools (Bauer et al., 2006; Harris, Ispas, & Mano, 2007; Harris, Van Hoye,
& Lievens, 2003), there is no compelling evidence that applicants are unwilling to complete online
applications in actual practice. However, some popular technological innovations may raise privacy
concerns if not appropriately managed. For example, the practice of tagging and tracking visitors
to corporate career websites and then deploying company ads in other web locations they visit to
maintain a company's presence in the job seeker's mind has increased (Ruiz, 2008). It is important to ensure both that personal data on the individual are not captured and retained and that pervasive advertising does not turn off applicants.
Technology by itself is neither good nor bad. It has great potential in the recruiting process, but where it helps and where it hurts that process is far from understood.

PRACTICE IMPLICATION
Investigating applicant pool preferences with regard to technology can lead to appropriate
adaptations that are targeted to those applicants. Attending to how technology-enabled pro-
cesses are messaged to applicants is also important. Organizations must combat negative and stimulate positive "word-of-mouse."

MAINTAINING THE INTEREST OF TARGETED TALENT


Of particular importance in strategic talent management is uncovering when and why differ-
ential reactions to recruitment activities occur for those with higher potential and/or greater
alternative opportunities. Several studies have documented that students with a high grade point
average (GPA) react more negatively to recruiting delays (Rynes et al., 1991). Indeed, the gen-
eral evidence is that high-quality applicants react more critically to negative information (Bretz
& Judge, 1998; Connerley & Rynes, 1997). Similarly, the applicant reaction literature often
contains the suggestion that high-quality applicants react more negatively to certain types of
assessments, although there is actually no clear research support of such assertions (Hausknecht
et al., 2006). However, it is important to note that high-quality applicants may differ from low-quality applicants in reactions to as yet uninvestigated aspects of the process, such as the amount of high-touch recruitment practices used or the vertical integration of women and minorities.
Given this evidence and the importance of focusing on high-quality applicants, the role of alternative job opportunities is surprisingly ignored in many research studies, as previous reviews of recruitment have pointed out. Maintaining interest must consider not only the organization's
efforts (i.e., Do we have warm, friendly, diverse, trained recruiters? Are our screening and assess-
ment procedures face-valid?), but also what is happening with competitor organizations (i.e., Are
their recruiters warmer, friendlier, more diverse, better trained? Are their procedures more face
valid?).
In terms of other specific targeted characteristics (e.g., demographic groups, individuals with
certain skill sets), one can also envision differential findings for the effects of recruiter warmth,
informativeness, and demographic characteristics on attraction for targeted versus nontargeted
individuals. Site visit features of importance might differ as well (e.g., McKay & Avery’s [2006]
propositions regarding minority job seekers and influences during site visits). Researchers may need to focus less on overall applicant pool self-selection effects and instead drill down on the specific reasons for self-selection among different target groups.

PRACTICE IMPLICATION
Investigate whether high-quality applicants react differently to recruitment activities than
low-quality applicants and then adapt to their preferences. Focusing efforts on high-quality
applicants will lead to a more effective recruitment process, but should be coupled with efforts
to dissuade low-quality applicants in effective and nonoffensive ways.

ACCEPTING OFFERS
The ratio of job acceptances to offers is considered an important indicator of recruiting success for
many organizations (i.e., Do the ones we want want us?). The factors mentioned earlier as affect-
ing attraction are sometimes not as critical to an acceptance: Individuals gather more information,
eliminate options on those factors, and change their criteria as they proceed through the job search
process. Prior reviews have noted that weak methodologies (e.g., examining intent to accept at early
stages of the process) have clouded findings on what affects decisions to accept offers. However,
several general conclusions have emerged.

PAY AND PROMOTION OPPORTUNITIES


Pay and promotion opportunities are important job features in applicant decision-making (e.g., see
Rynes & Cable, 2003 for a review of the pay/attraction literature). Studies exploring “what is most
important” to offer acceptance have been criticized for not creating choice tasks that reflect the
informational and motivational context of a job applicant considering an actual offer. Hence, for a
long time the important role of factors like pay and promotional opportunities in job offer accep-
tance was not completely clear, but better methodologies that use greater complexity in information
provided have reinforced their importance.
Organization characteristics are stronger predictors of acceptance intentions than recruiter
characteristics, perceptions of the hiring process, or other variables (Chapman et al., 2005).
However, we know that applicants make decisions in stages, first screening out incompatible options and then choosing from among the surviving options (Beach, 1993), but researchers seldom use a design that allows for this staged processing. Hence, job choice often is not well predicted
because of issues associated with not considering time of measurement (i.e., range restriction on
key variables, applicant motivation and information levels). We echo an earlier point regarding
the need for research on recruitment for different types of jobs; factors influencing job choice
of recent college graduates may not generalize to job seekers with less education or more job
search experience.

TIMELINESS
Timeliness of offer is important (Rynes et al., 1991). Pressures by hiring managers to speed up the
recruitment process are not without empirical backing, because one can lose desirable individuals
with delays. However, Rynes and Barber (1990) noted that although offers in hand are generally
favored over uncertain possibilities from other organizations, this likely varies with applicant quality, because highly competitive individuals can afford to wait longer but also may be "snatched up" sooner.

PRACTICE IMPLICATION
Evaluation of offer acceptance influencers needs to be made with data gathered later in the
process. Data gathered earlier can be used to assess influences on attraction and maintaining
interest but should not be considered as necessarily accurate when it comes to influences on
offer acceptance.

INDUCEMENTS
Inducements can make a difference (Rynes & Barber, 1990). Organizations use signing bonuses
routinely in tight labor markets for particular job categories (e.g., nursing). For example, the
U.S. Army began offering “quick ship” bonuses of $20,000 for recruits who agree to report
for training within 30 days (Shanker, 2007). Although inducements certainly affect offer acceptance, we lack empirical data on their effects on applicant quality and turnover, a concern given that bonuses may appeal to individuals willing to move around for short-term gains (Hansen, 2006; Medland, 2003).

FAMILY AND FRIENDS


Family and friends influence decisions. Although the role of social influencers (e.g., family and friends) in job choice has long been suggested as important (Kilduff, 1990), it is relatively under-researched. One exception is the U.S. military's longtime focus on influencers of enlistment decisions (Legree et al., 2000) through the Youth Attitude Tracking Study and other
efforts that have demonstrated how influencers affect choices. However, in practice this role is
recognized in various employee referral programs as well as in recruitment activities. For example,
to obtain a competitive advantage in attracting applicants, a call center in India conducts “Family
Days,” which provide members of a potential applicant’s family with an opportunity to learn about
the company. The U.S. military developed advertisements specifically targeted at the hopes and
concerns of parents regarding military careers (Neal, 2005).

WORK-LIFE BALANCE
The role of work-life balance in job choice appears to be increasing in importance. Much has been
written about how younger generations are more concerned about work-life balance than previous
labor market entrants (Charles & Harris, 2007; Fleetwood, 2007; Roos, Trigg, & Hartman, 2006).
Of particular note is the importance of this factor to high-quality applicants. For example, in a 2007 study of turnover at IBM, top performers reported leaving in search of greater work-life balance, whereas solid contributors left for more traditional factors such as pay and promotion opportunities.
Trends in the workplace such as globalization and technological innovations have not appeared
to have a great impact on recruitment research on offer acceptance, but there are a few areas in
which we think research attention is warranted (see Table 6.3).

OFFER ACCEPTANCE AROUND THE GLOBE


Cultural influences on the relative roles of various factors (pay, promotion opportunities, inducements, family and friends, work-life balance) in job offer acceptance need to be examined. We would anticipate, on the basis of our own practical experience and suggestions in the literature, that factors such as pay, opportunities, and signing bonuses would play stronger roles in emerging markets than in the mature markets where research traditionally has been based. Lucrative job offers are
tickets to upward mobility, and so salary plays a bigger role in job interest in those locations. Potential recruits often switch jobs frequently in efforts to obtain salary increases. Further, because compensation packages and hours worked vary widely throughout the world, global recruiting requires adjusting offers to make them attractive in different locations (Brandel, 2006). We have already noted that the role of social influencers will likely vary with the role accorded family and friends in a given culture. Hence, although comparative empirical research on offer acceptance and country is not available, existing evidence strongly suggests some differential weighting of factors by culture and economic conditions.

TABLE 6.3
Accepting Offers—Unanswered Research Questions

Globalization
What are cultural and economic influences on the relative roles of pay, promotion opportunities, job security, signing inducements, social influencers, and work-life balance on job offer acceptance?
What factors influence relocation decisions?
How does organizational off-shoring and outsourcing affect applicant perceptions of job security and how does that affect offer acceptance?

Technology
How does organizational use of innovations in technology in recruitment affect job offer acceptance?

Strategic Talent Management
How do changes in recruitment activities to target applicant pools affect subsequent training and socialization activities and other HR practices?

One of the biggest concerns of those involved in global hiring is constantly anticipating where cost-effective labor can be found that is willing to do lower-level work and stay in the role. For example, 20 years ago, India had a pool of skilled labor. As more companies tapped India, attrition and the cost of labor spiked. In 2008, companies were quickly moving to countries such as Malaysia, Vietnam, and South Africa to find skilled, cost-effective labor. Increasing off-shoring and outsourcing to low-cost countries may also lead applicants in locations where organizations have more mature businesses to take a closer look at job security in their decision-making than has been the case in the past. We know little about how publicity regarding an organization's layoffs and off-shoring affects willingness to apply and to accept offers (Aiman-Smith, Bauer, & Cable, 2001; Kammeyer-Mueller & Liao, 2006).

PRACTICE IMPLICATION
Recognizing cultural and market influences on the relative importance of factors in offer
acceptance is important. Anticipating how market changes will affect recruitment efforts and
organizational image can facilitate the effectiveness of future efforts.

TECHNOLOGY AND OFFERS


The role of technology in offer acceptance should be similar to what we have described in terms of
attraction and maintaining interest. We would expect that at the point of making offers, most orga-
nizations are engaged in more high-touch contact than high-tech contact with prospects. Obviously
an organization’s investment in technology and its innovative use of technology in and of itself
may influence applicant attraction because the employer will be seen as "leading edge." Once again, these speculations may be less applicable to the recruitment of populations other than recent college graduates.

PRACTICE IMPLICATION
Review and evaluate technology use and messaging in later stages of the process, not just
early ones.

OFFER ACCEPTANCE AND TALENT MANAGEMENT


One way organizations have focused efforts to tap targeted talent is by using applicant tracking
information to narrow their campus recruitment efforts, focusing on fewer schools that yield the
best offer acceptance rates among the higher quality applicants (Mullich, 2004). For example, Shell
changed from an on-campus presence at 84 colleges and universities to just 26 schools, expanding
programs and developing closer ties at those locations (Mullich, 2004). Early identification of top
talent is a strategy that organizations are using to increase odds of job offer acceptance. In addi-
tion to the traditional route of internships, programs are now offered for sophomore students (e.g.,
special presentations on topics of interest to students, scholarships and other awards to promising
individuals), long-term relationships with faculty are developed, and senior executives regularly
visit schools (Younger, Smallwood, & Ulrich, 2007).
One strategic question raised by Rynes and Barber (1990) is whether the organization has sufficiently considered how changes in attraction strategy intended to increase offer acceptances might affect existing HR policies and practices. For example, offering signing bonuses may increase attraction of
targeted individuals but may cause internal equity problems. Focusing on attracting individuals
with unproven potential may necessitate major changes to training and socialization efforts.

PRACTICE IMPLICATION
Narrow recruitment efforts through targeting. Focus on the early identification of talent and
generating interest in those pools. Evaluate how attraction strategies such as inducements can
affect internal equity.

CONCLUSIONS
In this chapter, we have summarized conclusions in recruitment research that have broad general-
ity. This focus may leave the reader wondering about the pros and cons of more contextualized approaches to researching recruitment that are specific to an industry, job level, geography, or target applicant group. Our conclusion is that contextual factors must be evaluated, but they are not necessarily going to change recruitment theory and models. For example, although we provided numerous examples in this chapter in which culture might make a difference, we also provided numerous examples in which it does not appear to greatly influence research findings and/or practice. Advancements in recruitment research and practice will come from better articulation of when one ought to attend to these contextual factors and when they can be ignored or addressed with only minor modifications in approach.
We have eschewed a traditional review of the recruitment literature for a more focused look at
how some of the key conclusions of research should be interpreted in light of the important trends of
increasing globalization, increasing use of technology in recruiting, and increasing attention to stra-
tegic talent management. Our review leads us to conclude that although organizations are certainly
using new practices and adopting new strategies in response to these trends, the research base lags
practice in these areas. Increasing our attention to recruitment processes with these trends in mind
should yield more theory-driven practices than those adopted today, while at the same time better
informing our understanding of what influences attraction to organizations.

REFERENCES
Aiman-Smith, L., Bauer, T. N., & Cable, D. M. (2001). Are you attracted? Do you intend to pursue? A recruit-
ing policy-capturing study. Journal of Business and Psychology, 16, 219–237.
Anderson, N. (2003). Applicant and recruiter reactions to new technology in selection: A critical review and
agenda for future research. International Journal of Selection and Assessment, 11, 121–136.
Anderson, N., & Witvliet, C. (2008). Fairness reactions to personnel selection methods: An international com-
parison between the Netherlands, the United States, France, Spain, Portugal and Singapore. International
Journal of Selection and Assessment, 16, 1–13.
Avery, D. R. (2003). Reactions to diversity in recruitment advertising—Are differences Black and White?
Journal of Applied Psychology, 88, 672–679.
Avery, D. R., Hernandez, M., & Hebl, M. R. (2004). Who’s watching the race? Racial salience in recruitment
advertising. Journal of Applied Social Psychology, 34, 146–161.
Avery, D. R., & McKay, P. F. (2006). Target practice: An organizational impression management approach to
attracting minority and female job applicants. Personnel Psychology, 59, 157–187.
Baack, D. W., & Singh, N. (2007). Culture and web communications. Journal of Business Research, 60,
181–188.
Barber, A. E. (1998). Recruiting employees: Individual and organizational perspectives. Thousand Oaks, CA: Sage.
Barber, A. E., & Roehling, M. V. (1993). Job postings and the decision to interview: A verbal protocol analysis.
Journal of Applied Psychology, 78, 845–856.
Bauer, T., Truxillo, D., Tucker, J., Weathers, V., Bertolino, M., Erdogan, B., & Campion, M. (2006). Selection
in the information age: The impact of privacy concerns and computer experience on applicant reactions.
Journal of Management, 32, 601–626.
Beach, L. R. (1993). Broadening the definition of decision making: The role of prechoice screening of options.
Psychological Science, 4, 215–220.
Bell, B. S., Wiechmann, D., & Ryan, A. M. (2006). Consequences of organizational justice expectations in a
selection system. Journal of Applied Psychology, 91, 455–466.
Boudreau, J. W., & Ramstad, P. M. (2005). Talentship and the new paradigm for human resource management:
From professional practice to strategic talent decision science. Human Resource Planning, 28, 17–26.
Braddy, P. W., Thompson, L. F., Wuensch, K. L., & Grossnickle, W. F. (2003). Internet recruiting: The effects
of web page design features. Social Science Computer Review, 21, 374–385.
Brandel, M. (2006). Fishing in the global talent pool. Computerworld, 40, 33–35.
Breaugh, J., & Starke, M. (2000). Research on employee recruiting: So many studies, so many remaining ques-
tions. Journal of Management, 26, 405–434.
Bretz, R. D., & Judge, T. A. (1998). Realistic job previews: A test of the adverse self-selection hypothesis.
Journal of Applied Psychology, 83, 330–337.
Brooks, M. E., & Highhouse, S. (2006). Familiarity breeds ambivalence. Corporate Reputation Review, 9,
105–113.
Brooks, M. E., Highhouse, S., Russell, S., & Mohr, D. (2003). Familiarity, ambivalence, and firm reputation: Is
corporate fame a double-edge sword? Journal of Applied Psychology, 88, 904–914.
Cable, D. M., & Judge, T. A. (1996). Person-organization fit, job choice decisions, and organizational entry.
Organizational Behavior and Human Decision Processes, 67, 294–311.
Cable, D. M., & Turban, D. B. (2001). Establishing the dimensions, sources and value of job seekers’
employer knowledge during recruitment. Research in Personnel and Human Resources Management,
20, 115–163.
Cable, D. M., & Yu, K. Y. T. (2007). How selection and recruitment practices develop the beliefs used to assess
fit. In C. Ostroff & T. A. Judge (Eds.), Perspectives on organizational fit (pp. 155–182). New York, NY:
Lawrence Erlbaum.
Carless, S. A., & Wintle, J. (2007). Applicant attraction: The role of recruiter function, work-life balance poli-
cies and career salience. International Journal of Selection and Assessment, 15(4), 394–404.
Casper, W. J., & Buffardi, L. C. (2004). Work-life benefits and job pursuit intentions: The role of anticipated
organizational support. Journal of Vocational Behavior, 65(3), 391–410.
Chapman, D. S., & Rowe, P. M. (2002). The influence of videoconference technology and interview structure
on the recruiting function of the employment interview: A field experiment. International Journal of
Selection and Assessment, 10(3), 185–197.
Chapman, D. S., Uggerslev, K. L., Carroll, S. A., Piasentin, K. A., & Jones, D. A. (2005). Applicant attrac-
tion into organizations and job choice: A meta-analytic review of the correlates of recruiting outcomes.
Journal of Applied Psychology, 90, 928–944.
Chapman, D. S., Uggerslev, K. L., & Webster, J. (2003). Applicant reactions to face-to-face and technology-
mediated interviews: A field investigation. Journal of Applied Psychology, 88(5), 944–953.
Chapman, D. S., & Webster, J. (2003). The use of technologies in the recruiting, screening, and selection pro-
cesses for job candidates. International Journal of Selection and Assessment, 11(2–3), 113–120.
Charles, N., & Harris, C. (2007). Continuity and change in work-life balance choices. British Journal of
Sociology, 58(2), 277–295.
Cober, R. T., Brown, D. J., Keeping, L. M., & Levy, P. E. (2004). Recruitment on the Net: How do organiza-
tional web site characteristics influence applicant attraction? Journal of Management, 30, 623–646.
Cober, R. T., Brown, D. J., & Levy, P. E. (2004). Form, content, and function: An evaluative methodology for
corporate employment Web sites. Human Resource Management, 43, 201–218.
Cober, R. T., Brown D. J., Levy, P. E., Cober, A. B., & Keeping, L. M. (2003). Organizational Web sites: Web
site content and style as determinants of organizational attraction. International Journal of Selection and
Assessment, 11, 158–169.
Collins, C. J., & Han, J. (2004). Exploring applicant pool quantity and quality: The effects of early recruitment
practices, corporate advertising, and firm reputation. Personnel Psychology, 57, 685–717.
Collins, C. J., & Stevens, C. K. (2002). The relationship between early recruitment-related activities and the
application decisions of new labor-market entrants: A brand equity approach to recruitment. Journal of
Applied Psychology, 87, 1121–1133.
Connerley, M. L., Carlson, K. D., & Mecham, R. L. (2003). Evidence of differences in applicant pool quality.
Personnel Review, 32, 22–39.
Connerley, M. L., & Rynes, S. L. (1997). The influence of recruiter characteristics and organizational recruit-
ment support on perceived recruiter effectiveness: Views from applicants and recruiters. Human Relations,
50, 1563–1586.
Dineen, B. R., Ash, S. R., & Noe, R. A. (2002). A web of applicant attraction: Person-organization fit in the
context of Web-based recruitment. Journal of Applied Psychology, 87, 723–734.
Dineen, B. R., Ling, J., Ash, S. R., & DelVecchio, D. (2007). Aesthetic properties and message customization:
navigating the dark side of web recruitment. Journal of Applied Psychology, 92, 356–372.
Fleetwood, S. (2007). Why work-life balance now? International Journal of Human Resource Management,
18, 387–400.
Fletcher, R. (2006). The impact of culture on web site content, design, and structure: An international and a
multicultural perspective. Journal of Communication Management, 10, 259–273.
Flinders, K. (2007, October 23). Harnessing Generation Y. Computer Weekly, 142, pNA.
Gibby, R. E., & Boyce, A. S. (2007, April). A cross-cultural look at items of numerical reasoning. Presented at
the annual meetings of the Society for Industrial and Organizational Psychology, New York, NY.
Hallier, J. (2001). Greenfield recruitment and selection: Implications for the older worker. Personnel Review,
30, 331–350.
Hansen, F. (2006). Refining signing bonuses. Workforce Management, 85, 39–41.
Hansen, F. (2007). Avoiding truth-in-hiring lawsuits. Workforce Management Online. Retrieved December 13,
2007, from http://www.workforce.com
Harris, M. M., Ispas, D., & Mano, H. (2007, April). Applicant perceptions of recruitment sources: A Romanian
sample. Presented at the annual meetings of the Society for Industrial and Organizational Psychology,
New York, NY.
Harris, M. M., Van Hoye, G., & Lievens, F. (2003). Privacy and attitudes towards internet-based selec-
tion systems: A cross-cultural comparison. International Journal of Selection and Assessment, 11,
230–236.
Harrison, D. A., Kravitz, D. A., Mayer, D. M., Leslie, L. M., & Lev-Arey, D. (2006). Understanding attitudes
toward affirmative action programs in employment: Summary and meta-analysis of 35 years of research.
Journal of Applied Psychology, 91, 1013–1036.
Hausdorf, P. A., & Duncan, D. (2004). Firm size and internet recruiting in Canada: A preliminary investigation.
Journal of Small Business Management, 42, 325–334.
Hausknecht, J. P., Day, D. V., & Thomas, S. C. (2004). Applicant reactions to selection procedures: An updated
model and meta-analysis. Personnel Psychology, 57, 639–683.
Hermeking, M. (2005). Culture and Internet consumption: Contributions from cross-cultural marketing and
advertising research. Journal of Computer-Mediated Communication, 11, 192–216.
Highhouse, S., Beadle, D., Gallo, A., & Miller, L. (1998). Get ‘em while they last! Effects of scarcity informa-
tion in job advertisements. Journal of Applied Social Psychology, 28, 779–795.
Highhouse, S., & Hause, E. L. (1995). Missing information in selection: An application of the Einhorn-Hogarth
ambiguity model. Journal of Applied Psychology, 80, 86–93.
Highhouse, S., Stanton, J. M., & Reeve, C. L. (2004). Examining reactions to employer information using a
simulated web-based job fair. Journal of Career Assessment, 12, 85–96.
Highhouse, S., Thornbury, E. E., & Little, I. S. (2007). Social-identity functions of attraction to organizations.
Organizational Behavior and Human Decision Processes, 103, 134–146.
Hu, C., Su, H., & Chen, C. B. (2007). The effect of person-organization fit feedback via recruitment web sites
on applicant attraction. Computers in Human Behavior, 23, 2509–2523.
Judge, T. A., & Bretz, R. D. (1992). Effects of work values on job choice decisions. Journal of Applied
Psychology, 77, 261–271.
Kammeyer-Mueller, J., & Liao, H. (2006). Workforce reduction and job seeker attraction: Examining
job seekers’ reactions to firm workforce reduction policies. Human Resource Management, 45,
585–603.
Karande, K., Almurshidee, K. A., & Al-Olayan, F. (2006). Advertising standardisation in culturally similar
markets: Can we standardise all components? International Journal of Advertising, 25, 489–512.
Kilduff, M. (1990). The interpersonal structure of decision making: A social comparison approach to organiza-
tional choice. Organizational Behavior and Human Decision Processes, 47, 270–288.
Kristof-Brown, A. L., Zimmerman, R. D., & Johnson, E. C. (2005). Consequences of individual’s fit at work:
A meta-analysis of person-job, person-organization, person-group, and person-supervisor fit. Personnel
Psychology, 58, 281–342.
Legree, P. J., Gade, P. A., Martin, D. E., Fischl, M. A., Wilson, M. J., Nieva, V. F., McCloy, R., & Laurence, J. (2000). Military enlistment and family dynamics: Youth and parental perspectives. Military Psychology, 12, 31–49.
Lewis, R. E., & Heckman, R. J. (2006). Talent management: A critical review. Human Resource Management
Review, 16, 139–154.
Lievens, F., & Harris, M. M. (2003). Research on internet recruiting and testing: Current status and future
directions. In C. Cooper & I. Robertson (Eds.), International review of industrial and organizational
psychology (Vol. 18, pp. 131–165). Chichester, England: Wiley.
Lievens, F., & Highhouse, S. (2003). The relation of instrumental and symbolic attributes to a company’s attrac-
tiveness as an employer. Personnel Psychology, 56, 75–102.
Marcus, B. (2003). Attitudes towards personnel selection methods: A partial replication and extension in a
German sample. Applied Psychology: An International Review, 52(4), 515–532.
Matsumoto, D. (2001). The handbook of culture and psychology. New York, NY: Oxford University Press.
Maurer, S. D., Howe, V., & Lee, T. W. (1992). Organizational recruiting as marketing management: An inter-
disciplinary study of engineering graduates. Personnel Psychology, 45, 807–833.
McKay, P. F., & Avery, D. R. (2006). What has race got to do with it? Unraveling the role of racioethnicity in
job seekers’ reactions to site visits. Personnel Psychology, 59, 395–429.
Medland, M. E. (2003). When to pay signing bonuses. HR Magazine, 48, 99–102.
Meglino, B. M., DeNisi, A. S., & Ravlin, E. C. (1993). Effects of previous job exposure and subsequent job
status on the functioning of a realistic job preview. Personnel Psychology, 46(4), 803–822.
Moscoso, S., & Salgado, J. F. (2004). Fairness reactions to personnel selection techniques in Spain and Portugal.
International Journal of Selection and Assessment, 12, 187–196.
Mullich, J. (2004). Finding the schools that yield the best job-applicant ROI. Workforce Management, 83,
67–68.
Myors, B., Lievens, F., Schollaert, E., Van Hoye, G., Cronshaw, S. F., Mladinic, A., et al. (2008). Broadening international perspectives on the legal environment for personnel selection. Industrial and Organizational Psychology, 1, 266–270.
Nakache, P. (1997). Cisco’s recruiting edge. Fortune, 136, 275–276.
Neal, T. M. (2005, August 22). Military’s recruiting troubles extend to affluent war supporters. Washington
Post.
Nord, W. R., Fox, S., Phoenix, A., & Viano, K. (2002). Real-world reactions to work-life balance programs:
Lessons for effective implementation. Organizational Dynamics, 30, 223–238.
Orbe, M. P. (1998). Constructing co-cultural theory: An explication of culture, power, and communication.
Thousand Oaks, CA: Sage.
Perkins, L. A., Thomas, K. M., & Taylor, G. A. (2000). Advertising and recruitment: Marketing to minorities.
Psychology & Marketing, 17, 235–255.
Phillips, J. M. (1998). Effects of realistic job previews on multiple organizational outcomes: A meta-analysis.
Academy of Management Journal, 41, 673–690.
Ployhart, R. E. (2006). Staffing in the 21st century: New challenges and strategic opportunities. Journal of
Management, 32, 868–897.
Porter, C. O. L. H., Conlon, D. E., & Barber, A. E. (2004). The dynamics of salary negotiations: Effects on
applicant’s justice perceptions and recruitment decisions. International Journal of Conflict Management,
15(3), 273–303.
Premack, S. L., & Wanous, J. P. (1985). A meta-analysis of realistic job preview experiments. Journal of Applied Psychology, 70, 706–719.
Ravlin, E. C., & Meglino, B. M. (1987). Effect of values on perception and decision making: A study of alterna-
tive work values measures. Journal of Applied Psychology, 72, 666–673.
Reeve, C. L., Highhouse, S., & Brooks, M. E. (2006). A closer look at reactions to realistic recruitment mes-
sages. International Journal of Selection and Assessment, 14, 1–15.
Riordan, C. M., & Holiday-Wayne, J. (2008). Are all measures the same? A critical review and examination of demographic similarity measures in relational demography within groups research. Organizational Research Methods, 11, 562–592.
Roehling, M. V., & Winters, D. (2000). Job security rights: The effects of specific policies and practices on the
evaluation of employers. Employee Rights and Responsibilities Journal, 12, 25–38.
Roos, P. A., Trigg, M. K., & Hartman, M. S. (2006). Changing families/changing communities: Work, family
and community in transition. Community, Work & Family, 9, 197–224.
Ruiz, G. (2007). Where’d they come from? Workforce Management, 86, 39–40.
Ruiz, G. (2008). PeopleFilter amplifies ability to track job applicants. Workforce Management Online. Retrieved
February 2, 2008, from http://www.workforce.com
Russell, D. P. (2007). Recruiting and staffing in the electronic age: A research-based perspective. Consulting
Psychology Journal: Practice and Research, 59, 91–101.
Ryan, A. M., Boyce, A. S., Ghumman, S., Jundt, D., Schmidt, G., & Gibby, R. (2009). Going global: Cultural
values and perceptions of selection procedures. Applied Psychology: An International Review, 58,
520–556.
Ryan, A. M., & Ployhart, R. E. (2000). Applicants’ perceptions of selection procedures and decisions: A critical
review and agenda for the future. Journal of Management, 26, 565–606.
Ryan, A. M., Sacco, J. M., McFarland, L. A., & Kriska, S. D. (2000). Applicant self-selection: Correlates of
withdrawal from a multiple hurdle process. Journal of Applied Psychology, 85, 163–179.
Rynes, S. L. (1991). Recruitment, job choice, and post-hire consequences: A call for new research directions. In
M. D. Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 399–444). Palo Alto,
CA: Consulting Psychologists Press.
Rynes, S. L., & Barber, A. E. (1990). Applicant attraction strategies: An organizational perspective. Academy
of Management Review, 15, 286–310.
Rynes, S. L., Bretz, R. D., & Gerhart, B. (1991). The importance of recruitment in job choice: A different way
of looking. Personnel Psychology, 44, 487–521.
Rynes, S. L., & Cable, D. M. (2003). Recruitment research in the twenty-first century. In W. C. Borman,
D. R. Ilgen, & R. J. Klimoski (Eds.), Handbook of psychology: Volume 12 Industrial-Organizational
Psychology (pp. 55–76). Hoboken, NJ: John Wiley & Sons.
Rynes, S. L., Heneman, H. G., & Schwab, D. P. (1980). Individual reactions to organizational recruiting: A
review. Personnel Psychology, 33, 529–542.
Saks, A. M. (2005). Job search success: A review and integration of the predictors, behaviors, and outcomes.
In S. D. Brown & R. W. Lent (Eds.), Career development and counseling: Putting theory and research to
work (pp. 155–179). Hoboken, NJ: John Wiley & Sons.
Saks, A. M., Leck, J. D., & Saunders, D. M. (1995). Effects of application blanks and employment equity on
applicant reactions and job pursuit intentions. Journal of Organizational Behavior, 16, 415–430.
Shanker, T. (2007, August 11). Army, shedding a slump, met July recruiting goal. The New York Times. Retrieved
from http://www.nytimes.com/2007/08/11/us/11recruit.html
Silvester, J., & Anderson, N. (2003). Technology and discourse: A comparison of face-to-face and telephone
employment interviews. International Journal of Selection and Assessment, 11, 206–214.
Silvester, J., Anderson, N., Haddleton, E., Cunningham-Snell, N., & Gibb, A. (2000). A cross-modal compari-
son of telephone and face-to-face interviews in graduate recruitment. International Journal of Selection
and Assessment, 8, 16–21.
Straus, S. G., Miles, J. A., & Levesque, L. L. (2001). The effects of videoconference, telephone, and face-to-
face media on interviewer and applicant judgments in employment interviews. Journal of Management,
27, 363–381.
Taylor, M. S., & Collins, C. J. (2000). Organizational recruitment: Enhancing the intersection of theory and
practice. In C. L. Cooper & E. A. Locke (Eds.), Industrial and organizational psychology: Linking theory
and practice (pp. 304–334). Oxford, England: Basil Blackwell.
Taylor, S. (2006). Acquaintance, meritocracy and critical realism: Researching recruitment and selection pro-
cesses in smaller and growth organizations. Human Resource Management Review, 16, 478–489.
Thorsteinson, T. J., Palmer, E. M., Wulff, C., & Anderson, A. (2004). Too good to be true? Using realism to
enhance applicant attraction. Journal of Business and Psychology, 19, 125–137.
Triandis, H. C. (Ed.). (1995). Individualism & collectivism. Boulder, CO: Westview Press.
Turban, D. B. (2001). Organizational attractiveness as an employer on college campuses: An examination of the
applicant population. Journal of Vocational Behavior, 58, 293–312.
Turban, D. B., & Cable, D. M. (2003). Firm reputation and applicant pool characteristics. Journal of
Organizational Behavior, 24, 733–751.
Turban, D. B., Campion, J. E., & Eyring, A. R. (1995). Factors related to job acceptance decisions of college
graduates. Journal of Vocational Behavior 47, 193–213.
Turban, D. B., & Dougherty, T. W. (1992). Influences of campus recruiting on applicant attraction to firms.
Academy of Management Journal, 35, 739–765.
Turban, D. B., Forret, M. L., & Hendrickson, C. L. (1998). Applicant attraction to firms: Influences of organiza-
tion reputation, job and organizational attributes, and recruiter behaviors. Journal of Vocational Behavior,
52, 24–44.
Van Hoye, G., & Lievens, F. (2005). Recruitment-related information sources and organizational attractiveness:
Can something be done about negative publicity? International Journal of Selection and Assessment, 13,
179–187.
Van Hoye, G., & Lievens, F. (2007). Investigating web-based recruitment sources: Employee testimonials vs.
word-of-mouse. International Journal of Selection and Assessment, 15, 372–382.
Vance, C. M., & Paik, Y. (2006). Managing a global workforce: Challenges and opportunities in International
Human Resource Management. London, England: M. E. Sharpe.
Vecchio, R. P. (1995). The impact of referral sources on employee attitudes: Evidence from a national sample.
Journal of Management, 21, 953–965.
Williams, M. L., & Bauer, T. N. (1994). The effect of a managing diversity policy on organizational attractive-
ness. Group & Organization Management, 19, 295–308.
Williamson, I. O., Lepak, D. P., & King, J. (2003). The effect of company recruitment Web site orientation of
individuals’ perceptions of organizational attractiveness. Journal of Vocational Behavior, 63, 242–263.
Younger, J., Smallwood, N., & Ulrich, D. (2007). Developing your organization’s brand as a talent developer.
Human Resource Planning, 30, 21–29.
Yuce, P., & Highhouse, S. (1997). Effects of attribute set size and pay ambiguity on reactions to ‘help wanted’
advertisements. Journal of Organizational Behavior, 19, 337–352
Zottoli, M. A., & Wanous, J. P. (2000). Recruitment source research: Current status and future directions.
Human Resource Management Review, 10, 353–382.
Zusman, R. R., & Landis, R. S. (2002). Applicant preferences for web-based versus traditional job postings.
Computers in Human Behavior, 18, 285–296.
7 Test Administration and
the Use of Test Scores

Jeff W. Johnson and Frederick L. Oswald

USE OF TEST SCORES


Tests and test scores are ubiquitous, and they affect most of us in meaningful ways from the very
moment we are born—or even earlier with prenatal tests. This chapter does not address ultrasounds
or Apgar tests; instead, we focus on the use of tests and test scores in employment selection set-
tings. It is important from the outset to remind ourselves that personnel selection is only one general
approach out of many that, taken together, contribute toward meeting organizational goals such
as improved job performance, more effective learning outcomes, higher motivation, and reduced
turnover. Selection researchers and practitioners dedicated to these goals can broaden their thinking
and improve their work by staying connected with the literature in training, motivation, leadership,
teamwork, and other relevant areas outside of their niche. Complex organizational problems are
usually not purely “selection problems” and therefore are most effectively evaluated and addressed
by taking an integrated approach to practice that involves a broad set of professional, strategic, and
technical skills (Huselid, Jackson, & Schuler, 1997). Similarly, selection test scores do not exist in a
vacuum. Decisions about the type of test scores to collect might be influenced by the type of train-
ing provided (or not provided) to those who are selected; the knowledge, skills, abilities, and other
characteristics (KSAOs) of applicants who are attracted to the organization’s recruiting strategies;
and so on. In addition, conceptually sound and statistically reliable test scores are inextricably tied
to job analysis, construct measurement, validation efforts, and many other topics covered in this
handbook.
The purpose of this chapter is to provide some practical guidance for using test scores in per-
sonnel selection. Without the knowledge of a particular organizational problem at hand, our advice
remains at a general level, but we attempt to synthesize and interpret the most current and reliable
information in the selection literature to date. Our chapter attempts to keep several sensitivities in
mind. First, we attempt to lean toward being practical when interpreting relevant statistical and
psychometric concepts and findings in the literature that can sometimes appear rather abstruse at
first glance, even to the intended audience of selection researchers and practitioners. Second, our
advice is intended to be consistent with the use of test scores in a legally defensible manner,
although organizational science and the law are continuously evolving disciplines. Finally, our
review and suggestions based on it attempt to be sensitive to the wide variety of organizational
contexts, whether public versus private sector, high- versus low-volume employment, and large
versus small organizations.

DECISIONS TO MAKE BEFORE COLLECTING TEST SCORES


Although a chapter on the use of test scores implies the scores have already been collected, most of
the decisions about how test scores will be used must be made at the outset of the testing program.


TABLE 7.1
General Considerations for the Use of Selection Systems
• Discuss whether or not the organization has a “selection problem.” Personnel selection is only one option or
approach out of many possibilities (training, motivational interventions, or organizational redesign may supplement
or supplant selection).
• Define the desired purpose(s) for which test scores will be used. Direct uses include predicting performance,
minimum proficiency, safe behaviors, and reduced turnover; less direct uses include identifying training needs,
identifying potential for career advancement, or providing sponsored licensure or certification.
• Determine the characteristics to be measured by tests. KSAOs may be more directly measurable for applicants
with job experience; conversely, entry-level jobs or future job requirements may require the measurement of more
distal constructs (e.g., ability, motivation).
• Identify the practical constraints that influence your selection process—and how to deal with them. These
constraints include the volume of testing over time, the number of jobs into which individuals are being selected or
classified, the available resources of the organization (e.g., type of tests, number of computer testing terminals,
availability of test administrators), and the level of test security that can be achieved.

This section presents several issues that must be addressed prior to data collection when determin-
ing how test scores will be used. Table 7.1 provides a summary of these issues.
First, for what purpose will the test scores be used? We focus on selection issues in this chapter,
but testing is useful for several organizational purposes. Common uses of test scores are (a) to select
job applicants for employment, (b) to identify training needs, (c) to promote current employees or
determine who will be put on a fast track for later promotion, and (d) to support licensure or certification decisions.
Second, what is the ultimate goal in collecting these test scores? Typical goals to which a testing
program can contribute are maximizing the mean level of task-specific or contextual job perfor-
mance; maximizing the number of employees who meet a minimum level of proficiency; improving
employee retention; increasing the diversity of skill sets in the organization; or minimizing counter-
productive behaviors such as theft, absenteeism, unsafe work behavior, or drug abuse. Taxonomies
of job performance have helped sharpen the models, definitions, and metrics that underlie organi-
zational objectives, but they also have opened our eyes to the fact that multiple objectives are often
of interest, making selection goals complicated. If the objective is a combination of two or more of
these goals, as it often is, then a decision must be made about the relative importance of each goal
to the organization. If no explicit consideration is paid to how testing relates to major organizational
goals, decisions made along the way could contradict those made in conjunction with other practices
and policies of the organization.
Third, what characteristics do you want to measure? For example, when selecting applicants
for employment, it is legitimate to select for characteristics that are important for the current job,
future job requirements, or person-organization fit. Prediction and applicant reactions to testing are
usually most favorable when the test or test battery measures characteristics that are more proximal
to performance, such as job knowledge and skill. Job knowledge tests and work samples tend to
be more expensive than measures of more indirect determinants of job performance, such as abil-
ity and personality; however, it may not be possible to test for specific job knowledge and skill for
entry-level positions.
Fourth, what is the volume of testing necessary? Tests can be used to select a single individual
from a small number of applicants into a specific position or to generate a large pool of individu-
als who may be selected or classified into multiple jobs. Given the probabilistic nature of selection
benefits (i.e., small gains accumulate over large numbers of selection decisions), test users must
determine if the costs associated with a small volume of testing are worth the benefits that are
likely to be obtained. Testing in small versus large organizations can influence the way test scores
are used, because large organizations tend to have the resources that allow them more freedom to
do different things with test scores. Smaller organizations may be limited to buying off-the-shelf
tests that are scored and interpreted by outside vendors, and support for the use of the measure may
require relying on multiple sources of external information (e.g., job analysis, transporting validity
from other situations, and meta-analyses). Although this is also an option for larger organizations,
they are more likely to have the option of developing their own tests and conducting a local valida-
tion study. This offers more freedom in customizing, weighting, combining, reporting, and applying
test scores.
Finally, the mode of administration can influence how test scores are used. Will the test be
administered via paper and pencil, computer, role play, work sample, or even interactive voice
response (IVR)? Will the test be proctored or unproctored, one-on-one or group administration?
The shift to Internet or computer-based testing opens the door to a much wider range of test formats
(e.g., video, interactive, adaptive) and new constructs that can be measured more reliably (e.g., inter-
personal skills, ability to speak a foreign language).
The decisions we have just briefly covered influence later decisions in the testing process. The
remainder of this chapter discusses (a) collection of test scores, (b) computation of test scores, and
(c) making selection decisions on the basis of test scores.

COLLECTION OF TEST SCORES


When collecting test scores, there are several decisions that can influence data quality as well as
applicant satisfaction with and legal defensibility of the process. In this section, we discuss issues
associated with test security, testing time, alternate formats, and retesting.

MAINTAINING TEST SECURITY


In high-stakes testing settings, organizations and test vendors have a keen interest in test security
for two primary reasons. First, test owners want to protect their proprietary rights to the content,
format, and unique aspects of the administration and scoring of the test items. Second, test users
want to maintain the test’s fairness and validity by preventing the spread of test-specific information
that would allow for cheating (e.g., individuals who post information about a test on the Internet or
pass it on to their friends; individuals who memorize test questions to unfairly benefit on a retest).
Organizations also have an ethical responsibility to maintain the privacy and security of individuals’
test scores in high-stakes testing situations, and data are likely to be of higher quality (e.g., relaying
information about confidentiality may reduce test-taker anxiety). Maintaining test security and
confidentiality is more important now than ever, because modern methods of communication make
it easier to compromise test security, and a compromised test can be circulated very quickly to many
people.
Security breaches are more likely when (a) administration is via paper and pencil, (b) there are
many examinees, (c) tests are older and/or commercially available, and (d) there is a single form
of the test. Paper-and-pencil administration requires hard copies of tests that are easier to steal
(computer-administered tests can also be accessed illicitly, but it is much more difficult if proper
safeguards are in place). Many examinees means a greater likelihood that unscrupulous examinees
will be among those tested, greater demand for obtaining test information among applicants, and
larger testing sessions that are more difficult to proctor. Older and/or commercial tests are more
widely available, providing more opportunity for the test to be compromised, especially given a
limited number of test forms or a small item pool.

Alternate Forms
A common and effective strategy for enhancing test security is creating alternate forms of the
test, thus increasing test security while maintaining comparable scores across forms. Given that
a test reflects a representative sample of content from the construct domain of interest, it should
be possible to develop alternate measures of the same construct that exhibit similar psychometric
properties in terms of reliability and validity. High correlations between scores across test forms
provide evidence that scores from a particular test are reflective of an individual’s standing on an
underlying construct rather than reflective of an individual’s understanding of content unique to
a particular test form. The following subsections review (a) creating alternate forms, (b) test form
equating, (c) computer adaptive tests, and (d) online testing.

Creating Alternate Forms


Creating alternate forms for some ability constructs may be a relatively simple task. For example,
when creating alternate forms that test for the ability to multiply two 2-digit numbers, substituting
different numbers for the originals will suffice. However, in many cases constructs are defined with
enough conceptual breadth that constructing alternate forms is not straightforward. In these cases, the
constructs to be tested should be well defined and theoretically driven. Carroll’s (1993) hierarchical
taxonomy of human abilities would serve as a good reference in the cognitive domain. Alternate test
forms should be developed so that psychometric characteristics across forms have similarly high
reliability and patterns of criterion-related validity, which can be accomplished by sampling items
representatively from a well-defined construct domain. A good approach to creating alternate forms
of an ability test is to write about three times as many items as will reliably measure the construct
on one form. After collecting data for all items in a pilot test, assign items with similar content and
psychometric properties (e.g., proportion correct, corrected item-total r) to alternate forms. Experience
tells us that about one third of the items will drop out because of inadequate psychometrics. Next,
ensure that the test forms have similar alpha internal consistency reliabilities and that the correlation
between forms is high (at least .90 after correcting for unreliability using alphas from each form). For
speeded tests, alpha reliability and item-total correlations are not appropriate reliability measures;
other approaches, such as alternate-forms or test-retest reliability, should be used instead. Finally, where
possible, determine whether criterion-related validities are similar across forms. Items can be moved
from one form to another to improve the comparability of the forms.
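To make the mechanics of this procedure concrete, the sketch below (in Python, using only NumPy) computes the item statistics mentioned above for a simulated pilot sample, assigns items to two forms matched on difficulty, and then checks coefficient alpha for each form and the correlation between forms after correcting for unreliability. The data, item counts, and tolerances are hypothetical; an operational program would add content matching by SMEs and, where possible, criterion-related checks.

```python
import numpy as np

def item_stats(X):
    """Proportion correct and corrected item-total r for a 0/1 item matrix (persons x items)."""
    p = X.mean(axis=0)
    total = X.sum(axis=1)
    r_it = np.array([np.corrcoef(X[:, j], total - X[:, j])[0, 1] for j in range(X.shape[1])])
    return p, r_it

def cronbach_alpha(X):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = X.shape[1]
    return (k / (k - 1)) * (1 - X.var(axis=0, ddof=1).sum() / X.sum(axis=1).var(ddof=1))

# Illustrative (simulated) pilot data: 400 examinees by 60 surviving items
rng = np.random.default_rng(0)
ability = rng.normal(size=(400, 1))
X = (ability + rng.normal(size=(400, 60)) > rng.normal(0, 0.5, size=60)).astype(int)

p, r_it = item_stats(X)

# Alternating assignment after sorting by difficulty keeps the forms matched on item statistics
order = np.argsort(p)
form_a, form_b = order[0::2], order[1::2]

alpha_a, alpha_b = cronbach_alpha(X[:, form_a]), cronbach_alpha(X[:, form_b])
r_ab = np.corrcoef(X[:, form_a].sum(axis=1), X[:, form_b].sum(axis=1))[0, 1]
print(alpha_a, alpha_b, r_ab / np.sqrt(alpha_a * alpha_b))  # target: disattenuated r of at least .90
```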
Creating alternate forms for job knowledge tests can follow the same procedure, but it is usually
more difficult because of the items’ specificity. The test developer also often lacks familiarity with
the content area, particularly when job knowledge is highly technical. A good strategy is to have
subject matter experts (SMEs; e.g., trainers, supervisors, incumbents) write test items and to instruct
them to write an “item buddy” for each item they write. The item buddy would be similar to the
original item in terms of content and difficulty, but different enough that knowing the answer to one
does not give away the answer to the other.
Developing alternate forms for situational judgment tests (SJTs) is a challenge because SJT con-
tent can reflect a wide variety of constructs, but analyses usually result in a single general situational
judgment construct. The construct validity of SJTs therefore remains elusive (Schmitt & Chan,
2006). Lievens and Sackett (2007) explored three methods for creating alternate forms for SJTs: (a)
assigning items randomly to forms, (b) creating forms with similar situations, and (c) creating forms
with similar situations and similar item responses. Internal consistency reliabilities and validity
coefficients for predicting grade point average (GPA) for interpersonal courses relevant to the SJT
were low for all forms. The latter two methods did show higher test-retest correlations, indicating
that random assignment of SJT items may not be a sound approach to developing alternate forms.
Oswald, Friede, Schmitt, Kim, and Ramsay (2005) developed multiple parallel forms of an SJT
that was designed to measure 12 broad dimensions of college student performance (e.g., continu-
ous learning, leadership, ethics). Although this SJT was found to be empirically unidimensional,
items were sampled representatively from each of the 12 dimensions to help ensure similarly broad
content coverage for each form. Oswald et al. (2005) winnowed 10,000 computer-generated forms
down to 144 tests with scores having similar means and standard deviations (SDs), high estimated
alpha reliability, high estimated validity, and low item overlap. Thus all test forms were similar in
desired practical qualities of the SJT.
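The following sketch illustrates the general logic of generating many candidate forms and retaining only those that meet psychometric targets. It is not the algorithm used by Oswald et al. (2005); the item pool, dimension structure, and tolerance values are hypothetical, and an operational version would also screen forms on estimated reliability, estimated validity, and item exposure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical pilot statistics for a pool of 120 SJT items (12 dimensions x 10 items each)
n_dims, per_dim, per_form_dim = 12, 10, 3          # sample 3 items per dimension for each form
item_mean = rng.uniform(2.5, 4.0, n_dims * per_dim)

def sample_form():
    """Sample items representatively: a fixed number from each content dimension."""
    picks = [rng.choice(np.arange(d * per_dim, (d + 1) * per_dim), size=per_form_dim, replace=False)
             for d in range(n_dims)]
    return np.concatenate(picks)

target = item_mean.mean()
candidates = [sample_form() for _ in range(10_000)]
similar = [f for f in candidates if abs(item_mean[f].mean() - target) < 0.03]  # similar form means

# Keep forms whose pairwise item overlap stays near chance levels, to limit item exposure
final, max_overlap = [], 12
for f in similar:
    if all(len(np.intersect1d(f, g)) <= max_overlap for g in final):
        final.append(f)
print(len(similar), len(final))
```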
It is possible to create alternate forms of personality tests using the same domain-sampling pro-
cedure as for ability tests because there are many potential items to measure personality constructs.
However, this may not be necessary because the “correct” answer in a personality test is usually not
difficult to determine. Prior exposure to personality items is usually not necessary to fake a person-
ality test, as demonstrated in directed faking studies (Viswesvaran & Ones, 1999). A common prac-
tice is simply to randomize the presentation order of the same items when creating alternate forms.
In general, the importance of having alternate forms corresponds to the extent to which a single
correct answer can be determined for the test questions. Thus, alternate forms are less important for
personality and biodata tests; more important for SJTs, interviews, and work simulations; and most
important for knowledge or ability tests.
A recent trend is the development of many forms (e.g., 10 instead of 2) in an attempt to minimize
cheating and keep a testing program going if one of the test forms is compromised. Oswald et al.
(2005) extended a method proposed by Gibson and Weiner (1998) for creating many test forms on
the basis of the statistics from a pool of items that may not have appeared on the same test form or
have been given to the same sample. This method of generating parallel forms potentially mini-
mizes the exposure of any single test item; in fact, item exposure can be a constraint built into the
procedure for generating test forms (see van der Linden, 2005). A more common strategy for creat-
ing parallel forms is to create three or four unique alternate forms, then create additional forms by
changing the item order and/or by taking some items from each of the original forms. Note, how-
ever, that mixing items from two or more unique forms to create a new form means that some parts
of the unique forms are compromised if the new form is stolen.

Equating
When alternate forms are used, it is necessary to equate test scores across forms so that a given
score on one form is equivalent to the same score on the other form. Although equating can intro-
duce some additional error into the selection process as opposed to using the same measure across
all applicants, careful attention to the process of test development and the establishment of equiv-
alent forms reduces such error. When samples are randomly equivalent and there are common
anchor items across test forms that have similar parameters across samples, several different equat-
ing methods from item response theory (IRT) yield similarly good results (Kim, Choi, Lee, & Um,
2008). In cases in which sample sizes are smaller (less than 500 per item) or IRT assumptions are
untenable, it is necessary to rely on other equating methods. There are two other kinds of equating
methods: (a) linear equating and (b) equipercentile equating. In linear equating, scores on two tests
are considered to be equated if they correspond to the same number of SD units from the mean.
Because linear equating is entirely analytical and does not require data at each point in the test
score range, it offers the advantages of allowing a mapping of scores from one version to the other
throughout the entire range of scores and requires smaller sample sizes. A disadvantage of linear
equating is that it requires the assumption that differences in the shapes of the raw-score distribu-
tions for each form are trivial.
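As an illustration, linear equating reduces to a single mapping of z-scores from one form onto the score scale of the other. The form means and SDs below are hypothetical.

```python
import numpy as np

def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Map a form-X raw score to the form-Y scale by matching z-scores (linear equating)."""
    return mean_y + sd_y * (x - mean_x) / sd_x

# Hypothetical form statistics from randomly equivalent groups
print(linear_equate(x=32, mean_x=30.0, sd_x=6.0, mean_y=28.5, sd_y=5.5))  # about 30.3
```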
In equipercentile equating, scores on two tests are considered to be equated if they correspond to
the same percentile rank (for details, see Livingston, 2004, pp. 17–23). A problem with equipercen-
tile equating is that it requires very large sample sizes to precisely equate the entire range of scores
on each test. In this method, large errors of estimation are likely in score ranges where data are
scant or erratic, so there must be many observations for each possible score on each form (Peterson,
Kolen, & Hoover, 1989). Methods are available that smooth the empirical distribution of the data,
allowing for more reasonable equipercentile equating in ranges of the scale with less data, assum-
ing the smoothed distribution is the correct one underlying the data (see Dorans, Pommerich, &
Holland, 2007). If the only score that needs to be equated is a cut score and only a pass-fail decision
is communicated to applicants, we recommend equating the cut score on the basis of equipercentile
equating because that will lead to the same pass rate within each form. Whether simpler methods
are as useful as IRT-based approaches is an empirical question, but given very similar empirical
outcomes between IRT and classical test theory (Fan, 1998; MacDonald & Paunonen, 2002), we
suspect they will often yield similar results.
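For the common case in which only a cut score must be carried from an operational form to a new form, a minimal equipercentile sketch looks like the following. The score distributions are simulated, only one simple definition of percentile rank is used, and a production application would typically smooth the distributions first (Dorans et al., 2007).

```python
import numpy as np

def equipercentile_cut(scores_x, scores_y, cut_x):
    """Find the form-Y score at the same percentile rank as cut_x on form X (proportion scoring below)."""
    pct_rank = np.mean(np.asarray(scores_x) < cut_x) * 100.0
    return np.percentile(scores_y, pct_rank)

rng = np.random.default_rng(2)
form_x = rng.normal(30, 6, 2000).round()   # hypothetical operational-form scores
form_y = rng.normal(28, 5, 2000).round()   # hypothetical new-form scores
print(equipercentile_cut(form_x, form_y, cut_x=27))  # form-Y cut giving about the same pass rate
```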

Computer Adaptive Tests


Computer adaptive tests (CATs) provide different items to test-takers depending on their previous
responses. For example, an individual who answers an item incorrectly will be given an easier item
next, whereas an individual who answers that item correctly will be given a more difficult item
next. Even for unidimensional measures, developing CATs requires very large sample sizes (at least
500–1,000) and very large item pools, but if this investment can be made, the advantage of CAT is
that fewer items have to be administered to each individual, reducing test fatigue and time while
maintaining high reliability across levels of the construct to be measured. Test security is theoreti-
cally more likely to be maintained because each individual receives a different subset of the item
pool; it is therefore possible for each individual to receive a unique form of the test. This requires
the item pool to be sufficiently large to ensure that test items do not reappear with a frequency
sufficient to allow examinees to memorize them. It is also possible for a savvy test-taker trying to
see many items to purposely answer certain items incorrectly to ensure that additional items are
presented. Other potential disadvantages of CATs are that test-takers cannot review their previous
answers to correct them, and the test scores resulting from a CAT may be more sensitive to initial
responses than to subsequent ones (Chang & Ying, 2008). Investment in CAT development in the
private sector is just beginning, so it remains to be seen if the cost-benefit of CAT is superior to that
of traditional test formats.
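The sketch below shows the core adaptive loop in its simplest form, assuming a Rasch (one-parameter) model with hypothetical item difficulties: at each step, the unused item that is most informative at the current ability estimate is administered, and ability is re-estimated by grid-search maximum likelihood. Operational CATs add exposure control, content balancing, and more refined estimators.

```python
import numpy as np

rng = np.random.default_rng(3)
b = rng.normal(0, 1, 300)          # hypothetical Rasch difficulties for a 300-item pool
true_theta = 0.8                   # simulated examinee

def p_correct(theta, b_item):
    """Rasch model probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b_item)))

def ml_theta(responses, b_used, grid=np.linspace(-4, 4, 161)):
    """Grid-search maximum-likelihood ability estimate given the responses so far."""
    r = np.asarray(responses)
    loglik = [np.sum(r * np.log(p_correct(t, b_used)) + (1 - r) * np.log(1 - p_correct(t, b_used)))
              for t in grid]
    return grid[int(np.argmax(loglik))]

theta_hat, used, responses = 0.0, [], []
for _ in range(20):                              # a 20-item adaptive test
    available = np.setdiff1d(np.arange(len(b)), used)
    item = available[np.argmin(np.abs(b[available] - theta_hat))]  # Rasch information peaks at b = theta
    used.append(item)
    responses.append(int(rng.random() < p_correct(true_theta, b[item])))
    theta_hat = ml_theta(responses, b[used])
print(round(theta_hat, 2))
```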

Online Testing
Where possible, we recommend using computer administration to enhance test security. Although
computer-based testing does not resolve all security issues (e.g., test-takers can still memorize items),
the elimination of paper forms that are more easily stolen is a clear advantage. However, to maintain
the security advantage of computer-administered tests, strict physical and information technology
(IT) security measures must be in place to protect against unauthorized access to the testing soft-
ware. Reputable vendors that specialize in online testing will have extensive security procedures for
protecting their intellectual property. We do not recommend that organizations conduct large-scale
testing using their own computer system unless extensive and up-to-date security measures are in
place and have been thoroughly tested for this purpose.
Unproctored Internet testing is increasingly common and presents unique problems for main-
taining test security. There are several things that can be done to help minimize test exposure and
motivation to cheat in this situation (Tippins et al., 2006). First, there should be a single point of
entry so that applicants can only complete the test one time (applicants should not be allowed to
retest to improve their score without the knowledge and approval of the hiring organization). The
system should recognize returning applicants so they are not allowed to take the test again. Second,
an applicant tracking system should be used that collects substantial identification information from
the applicant. Third, applicants should be warned about the consequences of cheating and how their
identity and their answers may be verified. Fourth, item order should be randomized with each
administration and/or items should be drawn from item banks so that the same set of items is not pre-
sented to all applicants in the same order. Finally, unproctored Internet tests might be used for initial
testing or to screen out candidates who are highly unlikely to be suitable for the job if they are then
followed up with a proctored test (see Nye, Do, Drasgow, & Fine, 2008). Cheating at the unproctored
stage of testing offers little advantage if the cheater is unlikely to pass the second proctored stage.
In a variation of this approach, it has been suggested that a proctored testing session could contain
some smaller versions of the tests administered in the unproctored testing session to corroborate the
unproctored test scores (Segall, 2001). Scores that are too discrepant between the two sessions would
be called into question. Ultimately, organizations using unproctored Internet testing must accept that
the test items are in the public domain and will be exposed to anyone who really wants access.
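One simple way to operationalize the "too discrepant" judgment, offered here only as an illustration and not as Segall's (2001) verification procedure, is to compare the difference between the standardized unproctored and proctored scores with the standard error of that difference implied by the two tests' reliabilities. The reliabilities and cutoff below are hypothetical.

```python
import numpy as np

def flag_discrepant(z_unproctored, z_proctored, r_xx1=0.85, r_xx2=0.80, crit=-2.0):
    """
    Flag cases in which the proctored verification score falls well below the unproctored
    score, relative to the standard error of a difference between two z-scores
    (a simplified screen assuming independent measurement errors).
    """
    se_diff = np.sqrt((1 - r_xx1) + (1 - r_xx2))
    z_diff = (z_proctored - z_unproctored) / se_diff
    return z_diff < crit

# Hypothetical applicant: roughly 95th percentile unproctored, 40th percentile proctored
print(flag_discrepant(z_unproctored=1.64, z_proctored=-0.25))  # True -> investigate further
```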
TESTING TIME
The amount of time available for testing is often a constraint that influences decisions about the
types of tests to be administered, the mode of administration, the number of tests included in a test
battery, the number of items in a test, and the number of stages in the selection system. For example,
if financial or other considerations place a strict time limit of 1 hour on test administration, that
limits the number of different constructs that can be assessed, the types of constructs that can be
assessed (e.g., reading comprehension takes longer than perceptual speed and accuracy), and the
testing method (e.g., SJTs generally take longer than biodata inventories). Our general advice when
presented with such constraints is to go for depth over breadth. It may be tempting to measure as
many different constructs as possible, but it is much better to measure a few things well than to mea-
sure many things poorly. Because testing time is finite, organizations should maintain their focus in
effectively measuring a relatively small handful of key constructs relevant for selection purposes,
fully recognizing that not all constructs of potential relevance can be measured. One way to reduce
the size of a test battery is to remove tests that are highly correlated with other tests in the battery
and do not provide incremental validity. Such a reduced test battery can often maintain its worth in
terms of reliability and validity for its intended purposes (see Donnellan, Oswald, Baird, & Lucas,
2006; Stanton, Sinar, Balzer, & Smith, 2002). In addition, many organizations cut down on testing
time by using unproctored web administration of noncognitive predictors (Tippins et al., 2006).
Those who meet a minimum score are invited to the proctored testing session where more tests
are administered. If the unproctored test scores can be linked to the proctored test scores via the
applicant name or identification number, then the number of constructs assessed can be increased
without increasing proctored testing time.
Power tests are defined as those tests for which speed is not relevant to the measurement of the
construct, such as for many cognitive ability and achievement tests. Thus, test-takers should be
given adequate time to complete the entire test. However, because unlimited time is not available
for test administration, a rule of thumb for a minimum amount of testing time for power tests is the
time it takes for 90% of the examinees to complete 90% of the items. Unlike power tests, speeded
tests require test-takers to perform quickly on simpler tasks within the time allotted. For paper-and-
pencil administration, speeded tests must be long enough, and the time limit short enough, that no one is
able to finish. The test score is then the number of correct responses minus a correction for guessing
(see Cronbach, 1990). For computer-administered tests, the computer can keep track of the time it
takes to complete a set number of items.
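Two of the computations just mentioned are simple enough to state directly. The sketch below shows the conventional correction-for-guessing formula score for k-option multiple-choice items and a literal reading of the 90/90 rule of thumb for setting a power-test time limit from pilot timing data; both are offered as illustrations rather than prescriptions.

```python
import numpy as np

def formula_score(n_right, n_wrong, n_options):
    """Conventional correction for guessing: R - W/(k - 1) for k-option items."""
    return n_right - n_wrong / (n_options - 1)

print(formula_score(n_right=42, n_wrong=12, n_options=4))   # 42 - 4 = 38

def power_test_time_limit(time_to_finish_90pct_items):
    """
    90/90 rule of thumb: set the limit at the time by which 90% of pilot examinees
    had completed 90% of the items (the 90th percentile of those times).
    """
    return np.percentile(np.asarray(time_to_finish_90pct_items), 90)
```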
Issues surrounding equity in testing could dramatically influence decisions on the amount of
testing time allocated. Consider job applicants who are nonnative speakers of the language in which
the test is written. If it is safe to assume that language skills are not relevant to the construct being
assessed by the test, then testing time should be set so that nonnative speakers have time to complete
the test when they may take more time to read instructions and test items. Alternatively, tests could
be redesigned so that written language or other factors irrelevant to the construct are minimized,
such as administering a video-based form of a test instead of a traditional written form (Chan &
Schmitt, 1997; Weekley & Jones, 1997).
Consideration of individuals covered by the Americans with Disabilities Act (ADA; 1990) is
also an organizational imperative. If the test is not speeded, applicants with disabilities preventing
them from completing the test in the normally allotted time should be given extra time to complete
the test. Readers are referred to Campbell and Reilly (2000) for an excellent discussion of ADA
accommodations in testing.

ALTERNATE TEST FORMATS


The need to create alternate tests that differ only in their format may arise to test different sub-
groups of individuals in a comparable manner. For example, individuals with visual impairments
may require that a test be read to them; tests administered in different countries may require that the
test be translated accurately into several languages; and when organizations are not always capable
of administering their test by computer, they must have a paper form on hand. In some cases it is
reasonable to assume that differences in test format are merely cosmetic and have no bearing on
construct measurement. However, in many cases it is necessary to formally test whether format dif-
ferences lead to score differences that are irrelevant to the constructs being measured.
Statistical tests for measurement invariance allow one to determine whether constructs are being
measured in the same or similar way between two test formats. Ideally, scores from different test
formats would exhibit strict measurement invariance (e.g., similar patterns and magnitudes of factor
loadings, similar error variance estimates; Vandenberg & Lance, 2000). When partial invariance
is said to exist, then further work is needed to determine which items behave differently across
test forms and need to be revised or deleted. Recent simulation and empirical work suggests that
partial invariance may only have a slight impact on selection outcomes in many practical situations
(Millsap & Kwok, 2004; Stark, Chernyshenko, & Drasgow, 2004), but the infancy of this work
prevents us from making any conclusive statements (for a current and comprehensive review of
tests of measurement invariance, see Schmitt & Kuljanin, 2008). Large sample sizes are desirable
in conducting appropriate tests for measurement invariance (see Meade & Bauer, 2007); otherwise,
one may have to rely on a strong rational basis for the equivalence of test formats (e.g., providing
evidence that the formats of the different tests do not influence the construct of interest and should
be unrelated to differences in the samples tested).

RETESTING
According to professional guidelines, allowing candidates to retest at a later date is a best practice
(Society for Industrial and Organizational Psychology, 2003; U.S. Department of Labor, 1999).
Because any single assessment can be influenced by various types of systematic measurement error
unrelated to the construct being measured (e.g., illness, extreme anxiety, not meeting the testing
prerequisites), it is reasonable to offer the opportunity to retest when it is appropriate and feasible.
Retesting is typically not a relevant concern for small-scale testing programs. If a small company
is testing to fill a specific position, the opportunity to retest cannot be offered after the position has
been filled. Candidates should be given the chance to retest if job opportunities are continuously
available, especially if it involves internal candidates. On the other hand, how are the psychomet-
ric qualities of the test affected by allowing candidates to take it two, three, or four times? Is the
purpose of the test defeated if a candidate can retake the test as many times as it takes to finally
pass? Score increases from retesting could be partially due to regression to the mean, because low
test-scorers are more likely to retest, and their retest scores would tend to increase by chance alone.
Alternatively, score increases could be due to practice effects on the test-specific content (e.g., mem-
orizing the correct answers) or in picking up on effective test strategies (e.g., learning that picking
the longest multiple choice answer is beneficial when in doubt). These are undesirable effects that
need to be considered if not controlled for or prevented by not allowing for a retest. On the other
hand, retest scores can also reflect desirable effects. Some test-takers could be less anxious about
the test when they retake it, which could lead to score increases even if their underlying knowledge
upon retest remains the same. Test-takers may also consider the types of questions they missed
when they first tested and concentrate subsequent study in those areas so that the retest reflects true
increases in the construct being measured.
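A short simulation makes the regression-to-the-mean point concrete: if only low scorers retest on a test with imperfect reliability, their mean score rises on retest even when nothing true has changed. The reliability and the retesting rule below are hypothetical, and the size of the artifact in practice depends on both.

```python
import numpy as np

rng = np.random.default_rng(4)
n, r_xx = 100_000, 0.80                       # hypothetical test-retest reliability
true = rng.normal(0, np.sqrt(r_xx), n)        # variance split so that var(observed) = 1
e1, e2 = rng.normal(0, np.sqrt(1 - r_xx), (2, n))
time1, time2 = true + e1, true + e2           # no true change between administrations

retesters = time1 < np.percentile(time1, 30)  # suppose only the bottom 30% choose to retest
gain = time2[retesters].mean() - time1[retesters].mean()
print(round(gain, 2))                         # positive mean gain (in SD units) with no true change
```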
Some recent studies have examined the effect of retesting on scores for different types of tests.
In a study of admission exams for medical students, Lievens, Buyse, and Sackett (2005) found stan-
dardized mean gains in test scores (after correcting for test-retest reliability) of 0.46 for a cognitive
ability test, 0.30 for a knowledge test, and 0.40 for a SJT. Retesting was conducted using alternate
forms. Raymond, Neustel, and Anderson (2007) found standardized mean gains of 0.79 and 0.48 for
two different medical certification exams. In both cases, these gains were nearly the same whether
the identical test or a parallel test was used. This latter finding was contrary to a recent meta-
analysis of retesting effects on cognitive ability tests that found an adjusted overall effect size of
0.46 for identical forms and 0.24 for alternate forms (Hausknecht, Halpert, Di Paolo, & Moriarty
Gerrard, 2007). Overall, Hausknecht et al. (2007) found a gain of 0.26 adjusted SD units between
time 1 and time 2 and 0.18 between time 2 and time 3. The portion of these effects that was due
to regression to the mean was estimated to be less than 10%. Test coaching, defined as instruction
aimed at improving test scores (through learning either test-related skills or test-taking strategies),
had a large effect. The effect size for individuals who had some form of test coaching was 0.70, as
opposed to 0.24 for individuals who did not have any coaching. Another moderator was the type
of cognitive ability assessed, with tests of quantitative and analytical ability showing larger mean
gains than tests of verbal ability.
Do scores on a retest reflect the individual’s ability better than scores on the initial test? Using
a within-person analysis of different forms of a knowledge test, Lievens et al. (2005) found
significantly higher validity coefficients for retest scores than for initial scores. There were no
significant differences in validity coefficients for the cognitive ability test or the SJT. However,
in a follow-up study, Lievens, Reeve, and Heggestad (2007) conducted within-group analyses of
two administrations of the same cognitive ability test and found a validity coefficient for retest
scores of .00, compared to .19 for initial scores. The initial score correlation was not significant,
but it suffered from range restriction and a small sample size. The validity coefficient for the larger
group of examinees who did not retest was .48 (p < .01). Lievens et al. (2007) further showed that
the psychometric properties of the test were not invariant from time 1 to time 2, meaning that test
scores appeared to reflect different constructs across administrations. The initial test was more
strongly related to general cognitive ability, whereas the retest was more strongly related to memory.
This may have been because the retest was the same test rather than an alternate form as in Lievens
et al. (2005), allowing the test-taker to benefit from remembering the content from the initial test.
These findings are informative, but a great deal more research is necessary before conclusions can
be drawn about the relative validity of retest scores.
There is also a need for research on the effect of retesting on adverse impact. Specifically, are
expected score gains upon retesting equivalent for different gender or racial/ethnic subgroups?
Are candidates within different subgroups equally likely to retest when given the opportunity?
Questions such as these must be answered before it is possible to speculate on the impact of retesting
on adverse impact ratios found in practice.
A common question is how long a candidate should have to wait before being allowed to retest.
There is little research to inform this decision, although research suggests that the length of the time
interval does not appear to influence test score gains when alternate forms are used. Hausknecht et
al. (2007) noted that score gains tended to decrease as the time interval increased only for identical
forms. Raymond et al. (2007) found no effect of time delay on score increases regardless of whether
identical or alternate forms were used. Using identical forms, Burke (1997) found different retest
gains across components of a cognitive and psychomotor ability selection battery, but the retesting
time interval (ranging from 1 to 5 years) did not moderate these gains by any significant amount.
The appropriate time interval for retesting in employment settings depends on factors such as the
availability of alternate forms, cost of testing, the need to fill positions, and the type of test. A mini-
mum of 6 months between administrations has been a common rule of thumb for large-scale ability
or knowledge testing programs, and 30–60 days is more typical for certain types of skills tests (e.g.,
typing, software proficiency).

COMPUTATION OF TEST SCORES


After test data have been collected, scores must be computed and transformed into a single score or
multiple scores that will be used as the basis for making the selection decision. In this section, we
discuss creating predictor composites and reporting test scores.
CREATING PREDICTOR COMPOSITES


It is common for organizations to use more than one predictor for decision-making and to combine the
scores on each predictor into a single composite score. For example, standardized scores on a reading
comprehension test, a conscientiousness test, and a SJT may be added together to create a single score
that better describes an individual than could any one test alone. This is a compensatory approach,
in which high scores on one predictor can compensate for low scores on another predictor. This is
contrasted with a noncompensatory or multiple-hurdles approach, in which selected applicants must
meet a minimum passing score on each predictor. Issues associated with a multiple-hurdles approach
are discussed later in this chapter. There are two key decisions that must be made when compiling a
predictor battery and computing a composite score. First, what predictors should be included in the
battery? Second, how should each predictor be weighted in computing the composite score?

Choosing Predictors
A common problem faced by personnel selection researchers and practitioners is choosing a set of
predictors from a larger set of potential predictors for the purpose of creating a predictor battery.
In large organizations, an experimental predictor battery may have been assembled for a validation
study, and the choice of which predictors to include is based on the psychometric characteristics of
the predictors as measured in that study. In smaller organizations, the choice may be based on pub-
lished norms or meta-analysis results for different kinds of predictors. Often, there are additional
practical constraints such as test availability, cost of the tests, and required testing time.
There are usually two goals that are often at odds with one another when assembling a predictor
battery—maximizing criterion-related validity while simultaneously attempting to minimize
adverse impact against protected groups (e.g., racial/ethnic minorities or females). Creating a
composite of several valid predictors is a common strategy for reducing the degree to which a
selection procedure produces group differences (Campbell, 1996; Sackett & Ellingson, 1997;
Schmitt, Rogers, Chan, Sheppard, & Jennings, 1997). The problem is that most valid predictors of
performance are cognitive in nature, and those predictors also tend to have the largest potential for
adverse impact. Therefore, adding a cognitive predictor that increases the validity of the composite
will often have the simultaneous effect of increasing adverse impact (Sackett & Ellingson, 1997).
Compounding the problem is the fact that adding a predictor with little adverse impact to a
predictor with large adverse impact typically does not reduce the adverse impact of the composite
to the extent that would generally be expected (Potosky, Bobko, & Roth, 2005; Sackett & Ellingson,
1997). Sackett and Ellingson (1997) gave an example of two uncorrelated predictors. One predictor
had a standardized mean subgroup difference (d) of 1.00 and the other had a d of 0.00. Most
researchers would expect that the two predictors would offset each other, so the d of an equally
weighted composite of the two predictors would be 0.50. In fact, the d of this composite would
be 0.71. Potosky et al. (2005) further demonstrated the difficulty in reducing adverse impact with
a predictor composite by pointing out that range restriction in many published studies results in
d values for some noncognitive predictors that are too small relative to the d values that exist in
applicant populations. In other words, the potential for adverse impact in many predictors has been
underestimated because d has been computed in range-restricted samples of job incumbents rather
than in the full range of job applicant samples. On the other hand, a meta-analysis of cognitive ability
and race/ethnic differences found that although overall Black-White d values hover around 1.0, the d
values are about 0.60 in high-complexity jobs (Roth, Bevier, Bobko, Switzer, & Tyler, 2001). Thus,
the results of these two meta-analyses suggest that the potential adverse impact of cognitive ability
may be less than previously estimated and the potential adverse impact of alternative predictors may
be greater than previously estimated.
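The arithmetic behind these results can be reproduced with the standard expression for the standardized subgroup difference on a composite: the weighted sum of the predictor d values divided by the SD of the composite, which depends on the predictor intercorrelations. The sketch below reproduces the Sackett and Ellingson (1997) two-predictor example; the correlation of .30 in the second call is hypothetical.

```python
import numpy as np

def composite_d(d, R, w=None):
    """
    Standardized subgroup difference on a weighted composite of standardized predictors:
    d_composite = w'd / sqrt(w'Rw), where R is the (common within-group) predictor
    intercorrelation matrix and w defaults to unit weights.
    """
    d, R = np.asarray(d, float), np.asarray(R, float)
    w = np.ones(len(d)) if w is None else np.asarray(w, float)
    return float(w @ d / np.sqrt(w @ R @ w))

# Two uncorrelated, unit-weighted predictors with d = 1.0 and d = 0.0 (Sackett & Ellingson, 1997)
print(round(composite_d([1.0, 0.0], np.eye(2)), 2))                 # 0.71, not 0.50
# A hypothetical correlation of .30 between the predictors changes the result
print(round(composite_d([1.0, 0.0], [[1.0, 0.3], [0.3, 1.0]]), 2))  # about 0.62
```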
The mathematical presentation of Sackett and Ellingson (1997) and the meta-analysis of Potosky
et al. (2005) demonstrated that reducing adverse impact by adding predictors to a composite is not
as easy as it seems at first glance. The take-away message is that reducing adverse impact is not
a simple matter of adding a noncognitive predictor or two to a predictor composite that includes a
measure of cognitive ability. Researchers and practitioners trying to balance validity against poten-
tial adverse impact when creating predictor composites should explore a wider variety of alterna-
tive predictors and weighting schemes rather than relying on offsetting the adverse impact of one
predictor with another.
How does one choose between alternative composites when the composite that has the highest
validity may have the worst adverse impact, and vice versa? To help inform the decision about which
of two or more potential predictor composites does the best job of maximizing validity and mini-
mizing adverse impact, Johnson, Abrahams, and Held (2004) created a combined validity/adverse
impact index score (VAI). This is a single number that indicates the extent to which a unit-weighted
composite score balances the level of criterion-related validity with the average standardized mean
subgroup difference. Importantly, this index gives more weight to higher levels of d. For example,
an increase in d from 0.00 to 0.10 is not as damaging as an increase from 0.50 to 0.60. The increases
are of the same magnitude, but in the first case potential adverse impact is still low and in the second
case potential adverse impact is becoming less acceptable. The composite that most effectively bal-
ances validity against potential adverse impact according to this procedure is the one at which VAI
is at its maximum. The VAI formula can be adjusted to reflect the relative value the user places on
validity and adverse impact.

Weighting Predictors
When multiple predictors are used, a decision must be made on how much weight to apply to each
predictor when computing a composite score. Two types of weights are considered here—statistical
and rational weights. Statistical weights are data-driven whereas the researcher specifies rational
weights, perhaps with input from SMEs. The most common statistical weights are derived using
multiple regression because regression weights maximize the prediction of the criterion in a sample.
However, regression weights have numerous limitations that suggest alternative weighting schemes
are often more desirable. First, criterion scores must be available, which would not be the case if a
content-oriented validation strategy is used. Second, regression weights focus entirely on prediction
so they cannot take into account adverse impact or other practical considerations. Third, regression
weights can be difficult to interpret and explain to stakeholders.
Finally, the question of ultimate interest is how well regression weights predict when they are
applied to data in other independent samples (e.g., job applicants), not to the specific sample in
which the weights were derived. There are two primary factors influencing the extent to which
regression weights derived in one sample will predict in another. First, sampling error variance
makes regression weights prone to inaccuracy compared to unit weights when sample sizes are
relatively small (i.e., less than about 180; Schmidt, 1971). Second, high predictor intercorrelations
(i.e., multicollinearity) lead to instability in regression weights, making them less applicable to
other samples (Green, 1977). Thus, regression weights derived in a small sample and/or with highly
intercorrelated predictor variables could be very different from the most stable or optimal weights
based on the population of interest.
De Corte, Lievens, and Sackett (2007) presented a procedure for weighting predictors in such
a way that the tradeoff between selection quality (e.g., validity, average criterion score of those
selected) and adverse impact is Pareto-optimized. Pareto optimization means that mean subgroup
differences are minimized for a given level of validity (or validity is maximized for a given level of
mean subgroup differences). The procedure applies optimization methods from the field of opera-
tions research and involves nonlinear programming. The authors offer a computer program for
application of the procedure. Unfortunately, the Pareto-optimal composites require down-weighting
cognitive ability considerably to reduce mean subgroup differences on the composite by a practi-
cally significant amount, and this compromises validity in most cases.
Turning from empirical to rational weighting schemes, examples include job analysis ratings of
importance (Goldstein, Zedeck, & Schneider, 1993) and expert judgments of predictor importance
(Janz, Hellervik, & Gilmore, 1986). Johnson (2007) has found that weighting each predictor by the
number of performance dimensions to which it is relevant yields higher validities than does apply-
ing unit weights. Given the multidimensional nature of job performance, the approach of placing
greater weight on predictors that are likely to influence a wider range of outcomes is an attractive
approach from a conceptual standpoint as well as a predictive standpoint. There are also other con-
siderations that may influence the weighting scheme, for better or for worse, such as equally weight-
ing cognitive and noncognitive portions of the predictor battery or weighting to please stakeholders
(e.g., the CEO thinks the interview should be given more weight than the cognitive test).
An important point to keep in mind when considering alternative weighting schemes is that the
weights that are directly applied to a set of predictors, called nominal weights, may not have the desired
effect. Brannick and Darling (1991) showed that variables are not actually weighted as intended unless
they are uncorrelated. This is because applying a nominal weight to one variable also applies an
implicit weight to each of the variables with which that variable is correlated. As a result, the effective
weight applied to a given variable is not the nominal weight, but instead reflects a combination of the
nominal weight and the implicit weights resulting from that variable’s correlation with each of the
other variables in the composite (see Guion, 1998). Because many components of a selection battery
are likely to be positively correlated to some extent, the composite scores created by weighting predic-
tors will not actually reflect the intended weighting scheme. Brannick and Darling (1991) presented
a procedure for translating nominal weights into effective weights, which, when applied, allow each
predictor to contribute to the composite in the manner that was originally intended.
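To illustrate the nominal-versus-effective distinction without reproducing Brannick and Darling's (1991) translation procedure, the sketch below uses one common definition of a predictor's effective contribution, namely its share of the composite variance, for a hypothetical intercorrelation matrix. Intended weights of 2, 1, and 1 (a 50/25/25 split) no longer yield 50/25/25 contributions once the predictors are correlated.

```python
import numpy as np

def effective_weights(w, R):
    """
    Effective contribution of each standardized predictor to composite variance,
    w_i * (Rw)_i, normalized to sum to 1 (one common definition; not Brannick and
    Darling's exact translation procedure).
    """
    w, R = np.asarray(w, float), np.asarray(R, float)
    contrib = w * (R @ w)
    return contrib / contrib.sum()

R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])                 # hypothetical predictor intercorrelations
print(effective_weights([2.0, 1.0, 1.0], R))    # roughly .57/.23/.21, not the intended .50/.25/.25
```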
Before spending time devising a weighting scheme and worrying about whether our nominal weights
correspond to our effective weights, we should first ask to what extent differential weighting of
predictors influences the overall composite score. Koopman (1988) and Ree, Carretta, and
Earles (1998) showed that very different sets of weights can lead to highly correlated composite
scores (often above .95). Bobko, Roth, and Buster (2007) reviewed the literature on the usefulness
of unit weights and concluded that unit weights are highly appropriate under many circumstances,
including when using a content-oriented validation strategy. We recommend applying unit weights
to standardized variables in most situations because different sets of weights often make little dif-
ference in the resulting composite score, effective weights rarely correspond to the nominal weights
applied to predictors, and unit weights have been shown to be superior or equivalent to other weight-
ing schemes (Bobko et al., 2007). These weights are the easiest to calculate, the easiest to explain,
and the most generalizable to different situations. Unless sample sizes are large, the number of
predictors is small, and predictor intercorrelations are low, it is probably best to keep weighting of
predictors simple (Bobko et al., 2007; Cohen, 1990).
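The point that weighting schemes often matter less than expected is easy to demonstrate by simulation: with positively intercorrelated predictors, quite different weight vectors produce composites that are nearly interchangeable. The intercorrelation matrix and weights below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
X = rng.multivariate_normal(np.zeros(3), R, size=5000)   # simulated standardized predictors

unit = X @ np.array([1.0, 1.0, 1.0])
uneven = X @ np.array([2.0, 1.0, 0.5])                   # a noticeably different weighting scheme
print(round(np.corrcoef(unit, uneven)[0, 1], 3))          # typically above .95
```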
Going beyond choosing and weighting predictors, three review papers take a broader focus on
the frequent tradeoff between validity and reducing adverse impact (Kravitz, 2008; Ployhart &
Holtz, 2008; Sackett, Schmitt, Ellingson, & Kabin, 2001). These authors examined a wide variety
of organizational strategies indicating which strategies appear to have promise (e.g., administering
tests in a video format to reduce an unnecessary language burden), which strategies have not been
as fruitful (e.g., modifying tests based on differential item functioning statistics), and which longer-
term strategies lack empirical evidence but may be promising (e.g., engaging the organization in
community-based projects in a broad effort to increase visibility and attract more qualified minority
applicants).

REPORTING TEST SCORES


After applicants have been tested, it is necessary to communicate to them how they did. There is
no standard for how much information must be provided, so score reporting runs the gamut from
a simple pass/fail notification to a detailed report of the number correct on each test in the battery
and how the information was combined to create an overall score. We are unaware of any research
on applicant reactions to how test scores are reported (apart from reactions to how the selection
decision was made, such as top-down selection versus banding; Truxillo & Bauer, 1999, 2000). The
type of score information provided should depend on the purpose of the assessment and the nature
of the applicant. For example, if an assessment center is used to evaluate internal candidates for
promotion, it is probably beneficial to the organization and the candidates to provide extensive feed-
back on how the candidate did on each exercise. This provides feedback on developmental needs
that could lead to targeted training interventions and performance improvement among employees
(see Chapter 20, this volume, for details on providing assessment feedback to applicants). On the
other hand, if a standardized test is used to screen out many external applicants who do not meet a
minimum cut score, no more than a pass/fail decision need be communicated. If a pool of potential
candidates is identified that exceed a cut score, but the highest-scoring individuals in that pool will
be given the first opportunity for job openings, then some information about where the individual
stands among other candidates (e.g., a ranking or percentile score) would be appropriate to com-
municate an idea of how likely a job offer is in the near future.
Raw test scores are often transformed to make them more interpretable to applicants. This is espe-
cially important when the selection instrument contains tests that do not have right and wrong answers
(e.g., personality tests). To avoid confusion, we recommend not reporting negative scores, which can
occur with raw scores from biodata inventories or with z-scores. A common transforma-
tion for score reporting is to standardize scores to have a mean of 50 and SD of 10 (called T-scores).
This provides information about how the applicant did on each test but does not explicitly state the
number correct. Another transformation is to convert the total score to a 100-point scale, and when
a cut score is used, the cut score can be set at a value such as 70. This reporting method is easy to
understand, because scores are similar to grades in school, and provides applicants with a good idea
of where they stand compared to the maximum. A downside is it may incorrectly imply that the score
represents the percentage of items answered correctly. Scores may also be placed into categories for
the purposes of communication to management or other constituencies (e.g., “excellent,” “good,” “bor-
derline,” “poor”). These coarser categories are sometimes used for selection, although that will reduce
the validity of the selection measure, as discussed in the next section.
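A brief sketch of the two transformations just described (hypothetical raw scores, cut score, and maximum assumed only for illustration): T-scores rescale standardized scores to a mean of 50 and an SD of 10, and a 100-point rescaling can be anchored so that the cut score reports as 70 and the maximum possible score as 100.

    import numpy as np

    raw = np.array([12.0, 18.0, 22.0, 25.0, 30.0, 34.0])   # hypothetical raw test scores
    raw_cut, raw_max = 24.0, 40.0                           # hypothetical cut and maximum scores

    # T-scores: mean 50, SD 10 (standardized here against the applicant sample)
    z = (raw - raw.mean()) / raw.std(ddof=1)
    t_scores = 50 + 10 * z

    # 100-point scale anchored so that raw_cut maps to 70 and raw_max maps to 100
    scaled = np.where(
        raw < raw_cut,
        70 * raw / raw_cut,
        70 + 30 * (raw - raw_cut) / (raw_max - raw_cut),
    )
    print(np.round(t_scores, 1))
    print(np.round(scaled, 1))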

MAKING SELECTION DECISIONS


When the final test scores or composite scores have been computed, they must be translated into
a selection decision. There are many ways to arrive at the decision to select, reject, or move to the
next phase of the selection process. In this section, we discuss top-down selection, setting cut scores,
banding, multiple hurdles, and selection to fit a profile.

METHODS OF SELECTION
Given a selection process that shows linear prediction of a meaningful criterion, top-down selection
using a linear composite of the standardized scores is the standard method for maximizing the util-
ity of criterion-related validity coefficients, assuming there is no nonrandom attrition such as when
the top-ranked talent decides to take a job elsewhere (Murphy, 1986). Any alternative to top-down
selection usually means compromising the validity and utility of the test at least to some extent, and
sometimes to a great extent (Schmidt, Mack, & Hunter, 1984; Schmidt, 1991). However, as we have
seen, maximizing validity often means high adverse impact, so many larger organizations look for
alternatives to top-down selection. Three major alternatives are prominent in the selection litera-
ture. The first alternative is setting a cut score, above which applicants are selected and below which
applicants are rejected. Often everyone who passes the test is considered qualified, but the number
of job openings is smaller than the number of qualified applicants. Job offers may then be made
in a top-down fashion, making the cut score almost irrelevant. Alternatively, other considerations
may come into play, such as job experience or other additional skill sets. In these cases, selection is
operating more like a multiple-hurdle selection system.
In fact, setting a cut score for a test as part of a multiple hurdle selection system is the second major
alternative to top-down selection. Those scoring above the cut score will move to the next stage of the
selection process, which may be another test or something as simple as an unstructured interview or
reference check. It is possible to set up a multiple-hurdle selection system such that the selection cutoffs
and the order of the predictors reduce mean subgroup differences and adverse impact ratios while
retaining the highest possible levels of mean predicted performance (De Corte, Lievens, & Sackett,
2006; Sackett & Roth, 1996). Ideally this approach could be Pareto-optimized, as we discussed with
regard to weighting predictor composites (De Corte et al., 2007), but the multivariate conditions are
very complex to solve even with nonlinear programming. Multiple hurdles offer the advantage of
reducing the overall cost of the selection procedure, because all applicants do not complete each com-
ponent. This allows the more expensive tests (e.g., computer simulations) or time-intensive tests (e.g.,
assessment center) to be administered to smaller numbers of applicants at the end of the process. This
increases the utility of the selection system compared with a compensatory approach by reducing the
cost of selection, but it depends on the assumption that low scores at an early stage of selection can-
not be compensated for by high scores at a later stage. A disadvantage of multiple hurdles is that the
reliability of the entire selection system is reduced because the reliability of the entire system is the
product of the reliabilities of each element of the system (Haladyna & Hess, 1999).
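A simple numerical illustration of that point, using hypothetical stage reliabilities assumed purely for the example:

    # Hypothetical reliabilities for three hurdles administered in sequence
    stage_reliabilities = [0.90, 0.85, 0.80]

    system_reliability = 1.0
    for r in stage_reliabilities:
        system_reliability *= r

    # The product (about .61) is lower than the reliability of any single stage
    print(round(system_reliability, 2))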
The third alternative is the use of test score banding procedures. Banding is a broad term
that encompasses any selection procedure that groups test scores together and considers them to
be equivalent. For example, standard error of the difference (SED) banding considers scores to
be equivalent unless they are significantly different from each other (Cascio, Outtz, Zedeck, &
Goldstein, 1991). Several empirical papers provide information on the necessary tradeoff between
maximizing validity or mean predicted performance and minimizing adverse impact that results
from the use of banding (e.g., Campion et al., 2001; Sackett & Roth, 1991; Schmitt & Oswald,
2004). The tradeoff tends to be larger when the bands are larger, the selection ratio is smaller, and
the standardized mean difference between groups is larger. However, these rules are not set in stone
because results also depend on the banding method and how the size of the band happens to align
with the cutoff point for selection in a particular data set. Ironically, there is very little evidence that
banding has much of a practical effect in reducing adverse impact (Barrett & Lueke, 2004), except
in the case of top-down selection of protected group members within bands (Sackett & Roth, 1991).
This is not a viable strategy because the Civil Rights Act of 1991 prohibits selection on the basis of
protected class status without a consent decree, and random selection within bands may actually
increase adverse impact (Barrett & Lueke, 2004).
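To make the SED banding idea concrete, the following sketch (hypothetical test SD and reliability; the 1.96 multiplier reflects a conventional two-tailed .05 criterion) computes a band width and the lowest score that would be treated as equivalent to the top score.

    import math

    sd_test, reliability = 10.0, 0.85              # hypothetical test SD and reliability
    sem = sd_test * math.sqrt(1 - reliability)     # standard error of measurement
    sed = sem * math.sqrt(2)                       # standard error of the difference
    band_width = 1.96 * sed

    top_score = 92.0
    lower_bound = top_score - band_width           # scores down to here treated as equivalent
    print(round(sed, 2), round(band_width, 2), round(lower_bound, 1))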
Although there may be many situations in which the use of these alternatives to top-down selec-
tion makes sense for the organization, we do not recommend adopting them solely for the purpose
of reducing adverse impact. When cognitive ability measures are incorporated into a selection bat-
tery, large tradeoffs between adverse impact and validity are often impossible to avoid. Although
nothing in today’s U.S. legal system bars an organization from sacrificing validity to reduce adverse
impact, doing so fails to take full advantage of what selection research on validity has to offer
(Pyburn, Ployhart, & Kravitz, 2008). Furthermore, we would agree that the courts could perceive a
procedure that decreases the validity of the predictors (cognitive and noncognitive) as a decrease in
the job relevance of the selection system. Addressing adverse impact concerns in a proactive manner
should go well beyond simply assigning applicants to coarse categories (e.g., active recruitment of
minorities, fostering a climate for diversity, engagement in the community).

SETTING CUT SCORES


When it is necessary to set one or more cut scores, there are numerous methods from which to
choose (see Kehoe & Olson, 2005 and Mueller, Norris, & Oppler, 2007 for comprehensive sum-
maries of methods for setting cut scores). In general, methods for determining cut scores can be
grouped into the categories of judgmental methods and empirical methods. The most common
judgmental method is the Angoff (1971) method, in which expert judges estimate the probabil-
ity that a minimally qualified candidate will answer each test item correctly. The estimates are
summed across items to calculate the expected value of the mean test score for minimally qualified
candidates. A criticism of the Angoff method is that judges generally find it difficult to estimate
these probabilities. In fact, they tend to overestimate them, resulting in higher cut scores than those
determined by other methods. The cut score is often subsequently adjusted by lowering it one or two
standard errors of measurement of the test. Despite this general problem in estimation, Angoff-like
methods have the advantage of being well received by the courts (Kehoe & Olson, 2005).
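A minimal sketch of the Angoff computation, using hypothetical judge ratings and test statistics assumed only for illustration: each judge's item probabilities are summed, judges are averaged, and the result is optionally lowered by one standard error of measurement.

    import numpy as np

    # Hypothetical ratings: 3 judges x 10 items; each entry is the judged probability that a
    # minimally qualified candidate answers the item correctly
    ratings = np.array([
        [.7, .6, .8, .5, .9, .6, .7, .5, .8, .6],
        [.6, .7, .7, .6, .8, .5, .6, .6, .7, .7],
        [.8, .6, .9, .5, .8, .6, .7, .4, .8, .5],
    ])

    # Sum across items within each judge, then average across judges
    angoff_cut = ratings.sum(axis=1).mean()

    # Optional downward adjustment by one SEM of the test
    sd_test, reliability = 2.5, 0.85               # hypothetical test SD and reliability
    sem = sd_test * np.sqrt(1 - reliability)
    print(round(angoff_cut, 2), round(angoff_cut - sem, 2))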
Empirical methods for establishing cut scores are based on the relationship between test per-
formance and criterion performance, so a criterion-related validation study is required to use these
methods. In the regression technique, the criterion score associated with successful job performance
is determined and regression is used to find the test score corresponding to that predicted criterion
score. Forward regression regresses criterion scores on test scores, and reverse regression regresses
test scores on criterion scores. These methods produce different cut scores, so both methods could
be used to produce a range of cut scores and expert judgment could be used to set the appropriate
cut score within that range (Mueller et al., 2007).
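The forward and reverse regression logic can be sketched as follows, with simulated validation data standing in for a real criterion-related study:

    import numpy as np

    rng = np.random.default_rng(1)
    test = rng.normal(50, 10, 200)                            # hypothetical test scores
    criterion = 2 + 0.05 * test + rng.normal(0, 0.6, 200)     # hypothetical performance ratings
    min_success = 4.0                                         # criterion level defining success

    # Forward regression: criterion on test; solve for the test score predicting min_success
    b_slope, b_intercept = np.polyfit(test, criterion, 1)
    forward_cut = (min_success - b_intercept) / b_slope

    # Reverse regression: test on criterion; predict the test score at min_success
    c_slope, c_intercept = np.polyfit(criterion, test, 1)
    reverse_cut = c_intercept + c_slope * min_success

    # The two values typically differ and can bound a range for expert judgment
    print(round(forward_cut, 1), round(reverse_cut, 1))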
Expectancy charts are a useful way of presenting information on the relationship between test
performance and criterion performance and can be used to help set a cut score. Expectancy charts
can graphically display expected criterion performance or the percentage of those selected who
would be expected to be successful on the job given a set of alternative cut scores. The advantages
of using expectancy charts to set cut scores are (a) they are easy for decision-makers to understand,
(b) the cut score is based on expected criterion performance, and (c) the courts have shown support
for these types of methods (Kehoe & Olson, 2005).
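The computation behind such a chart is straightforward; the sketch below (simulated scores and an arbitrary definition of success, assumed only for illustration) reports, for several candidate cut scores, the percentage of those selected who would be expected to be successful.

    import numpy as np

    rng = np.random.default_rng(2)
    test = rng.normal(50, 10, 1000)
    criterion = 0.05 * test + rng.normal(0, 0.6, 1000)
    successful = criterion >= np.median(criterion)            # hypothetical success standard

    for cut in (45, 50, 55, 60):
        selected = test >= cut
        pct_success = 100 * successful[selected].mean()
        print(f"cut {cut}: {selected.mean():.0%} selected, "
              f"{pct_success:.0f}% of selected expected to succeed")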
The method used to select cut scores deserves a great deal of attention because cut scores have
increasingly been subject to legal challenge. When cut scores are used, test users should put as much
effort into setting cut scores as they put into establishing the validity of the test. Test users should
carefully consider the need for a cut score and determine if a top-down selection strategy would be
more desirable. If a good business necessity case can be made, defending a top-down strategy may
be easier than defending a cut score because there is a great deal of room for interpretation in the
legal and professional literature on cut scores (e.g., what constitutes minimum qualifications). Test
users must be careful to have a clear and consistent rationale based on sound professional judgment
for what is done at each step of the cut score setting process. Because cut scores are likely to remain
in use in many selection contexts, future research could profitably investigate further the major sub-
stantive and methodological factors involved in the justification and setting of cut scores.

SELECTION TO FIT A PROFILE


The selection methods we have reviewed thus far are based on the assumption that higher scores
are better, but some have advocated selection on the basis of how well an individual fits a pro-
file (e.g., McCulloch & Turban, 2007). There are many types of fit (e.g., person-job, person-
organization, person-group, person-supervisor; Kristof-Brown, Zimmerman, & Johnson, 2005),
but person-organization (P-O) fit seems to be most commonly advocated for selection. P-O fit is
typically conceptualized as congruence between individual and organizational values or goals
and is strongly related to organizational attitudes (Kristof-Brown et al., 2005).
Conceptually, P-O fit is thought to predict important organizational criteria such as job performance
and turnover in ways that traditional selection measures do not. Empirical support for this is found
in a meta-analysis of P-O fit by Arthur, Bell, Villado, and Doverspike (2006), who found corrected
mean validities of .15 for predicting job performance and .24 for predicting turnover. Indications
were that work attitudes partially mediated the relationships between P-O fit and these criteria, so
selection may be more on the basis of satisfaction than job performance. Arthur et al. (2006) recom-
mended not using P-O fit to make selection decisions in the absence of a local validation study, yet
this meta-analysis also suggests that fit measures may be useful tools postselection for developmental
purposes, such as when working with employees who may develop performance issues or who are
withdrawing and may be considering leaving the organization because of some type of misfit.
A major issue with selection for any type of fit is calculating a fit score. Common techniques are
difference scores and correlations. Difference scores suffer from several methodological problems
(Edwards, 1994), including the confounding of effects of the variables composing the difference
score and attenuation of reliability. Correlations reflect similarity in profile shape but not absolute
differences between scores. Arthur et al. (2006) found stronger relationships when fit was calculated
via correlations than via difference scores. Polynomial regression is the appropriate analysis method
when evaluating the relationship between fit and a criterion (Edwards, 1994), but this cannot be used
for within-subject analyses to assign scores to individuals. As yet, there is no ideal way to measure
fit (Kristof-Brown et al., 2005).
On the basis of current research, we recommend that P-O fit only be used for selection when the
goal is to minimize turnover and a local validation study is possible. A procedure similar to that of
McCulloch and Turban (2007) holds promise and should be legally defensible. They had call center
managers describe the characteristics of the call center using a Q-sort measure to sort 54 work
descriptors using a 9-point scale in a forced normal distribution. This defined the call center profile.
Call center representatives sorted the same descriptors in terms of how much they valued each char-
acteristic. The P-O fit score was the correlation between the individual profile and the call center
profile. P-O fit was correlated .36 with employee retention and uncorrelated with job performance.
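A minimal sketch of that fit calculation, with hypothetical Q-sort values assumed in place of the actual 54 descriptors: the fit score is simply the correlation between the organization profile and the individual profile across the same descriptors.

    import numpy as np

    # Hypothetical Q-sort values (1-9) for ten work descriptors
    org_profile = np.array([9, 7, 8, 5, 6, 4, 3, 5, 2, 6])       # managers' description of the unit
    person_profile = np.array([8, 6, 9, 4, 5, 5, 2, 6, 3, 7])    # how much the applicant values each

    # P-O fit as a profile correlation (captures shape similarity, not absolute differences)
    fit_score = np.corrcoef(org_profile, person_profile)[0, 1]
    print(round(fit_score, 2))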
Personality has also been used as a content domain for assessing P-O fit (Kristof-Brown et
al., 2005). Tett, Jackson, Rothstein, and Reddon (1999) showed that personality scales that are
positively related to a performance dimension in some situations may have legitimately negative
correlations with the same performance dimension in other situations. For example, a person high
in agreeableness may do well in an organization that has a team-based, cooperative culture but
may have difficulty in an organization with a culture that is highly competitive and adversarial.
This suggests that the first organization should select applicants who are high on agreeableness
and the second organization should select applicants who are low on agreeableness, but selecting
individuals who are low on a positive trait measure is probably not the best approach. Rather, the
second organization should include a measure of competitiveness in the selection system and use
top-down selection.

CONCLUSIONS
Clearly, the use of test scores in personnel selection is embedded within a large and multifaceted
context. Given that selection is only one part of a set of system-wide organizational policies and
practices, there are many important questions whose answers are worth the time required to pursue
them. In this chapter, we highlighted questions that must be asked (a) at the outset of the
testing program, (b) with regard to collection of test scores, (c) when computing test scores, and (d)
when making selection decisions. Test users should be aware of the legal implications of each deci-
sion made to avoid potential litigation problems. They should also be aware of the organizational
implications of each decision to ensure that the testing program is consistent with other organiza-
tional goals. When used properly by an experienced professional, test scores have great potential to
positively influence organizational outcomes.

REFERENCES
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measure-
ment (2nd ed., pp. 508–600). Washington, DC: American Council on Education.
Arthur, W., Bell, S. T., Villado, A. J., & Doverspike, D. (2006). The use of person-organization fit in employment deci-
sion making: An assessment of its criterion-related validity. Journal of Applied Psychology, 91, 786–801.
Barrett, G. V., & Lueke, S. B. (2004). Legal and practical implications of banding for personnel selection. In
H. Aguinis (Ed.), Test-score banding in human resource selection: Technical, legal, and societal issues
(pp. 71–111). Westport, CT: Praeger.
Bobko, P., Roth, P. L., & Buster, M. A. (2007). The usefulness of unit weights in creating composite scores: A
literature review, application to content validity, and meta-analysis. Organizational Research Methods,
10, 689–709.
Brannick, M. T., & Darling, R. W. (1991). Specifying importance weights consistent with a covariance struc-
ture. Organizational Behavior and Human Decision Processes, 50, 395–410.
Burke, E. F. (1997). A short note on the persistence of retest effects on aptitude scores. Journal of Occupational
and Organizational Psychology, 70, 295–301.
Campbell, J. P. (1996). Group differences and personnel decisions: Validity, fairness, and affirmative action.
Journal of Vocational Behavior, 49, 122–158.
Campbell, W. J., & Reilly, M. E. (2000). Accommodations for persons with disabilities. In J. F. Kehoe (Ed.),
Managing selection in changing organizations (pp. 319–370). San Francisco, CA: Jossey-Bass.
Campion, M. A., Outtz, J. L., Zedeck, S., Schmidt, F. L., Kehoe, J. F., Murphy, K. R., & Guion, R. M. (2001).
The controversy over score banding in personnel selection: Answers to 10 key questions. Personnel
Psychology, 54, 149–185.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York, NY: Cambridge
University Press.
Cascio, W. F., Outtz, J., Zedeck, S., & Goldstein, I. L. (1991). Statistical implications of six methods of test
score use in personnel selection. Human Performance, 4, 233–264.
Chan, D., & Schmitt, N. (1997). Video versus paper-and-pencil method of assessment in situational judg-
ment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied
Psychology, 82, 143–159.
Chang, H. H., & Ying, Z. (2008). To weight or not to weight? Balancing influence of initial items on adaptive
testing. Psychometrika, 73, 441–450.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304–1312.
Cronbach, L. J. (1990). Essentials of psychological testing (5th ed.). New York, NY: Harper Collins.
De Corte, W., Lievens, F., & Sackett, P. R. (2006). Predicting adverse impact and mean criterion performance
in multistage selection. Journal of Applied Psychology, 91, 523–537.
De Corte, W., Lievens, F., & Sackett, P. R. (2007). Combining predictors to achieve optimal trade-offs between
selection quality and adverse impact. Journal of Applied Psychology, 92, 1380–1393.
Donnellan, M. B., Oswald, F. L., Baird, B. M., & Lucas, R. E. (2006). The Mini-IPIP scales: Tiny-yet-effective
measures of the Big Five factors of personality. Psychological Assessment, 18, 192–203.
Dorans, N. J., Pommerich, M., & Holland, P. W. (2007). Potential solutions to practical equating issues. New
York, NY: Springer.
Edwards, J. R. (1994). The study of congruence in organizational behavior research: Critique and a proposed
alternative. Organizational Behavior and Human Decision Processes, 58, 51–100.
Fan, X. (1998). Item response theory and classical test theory: An empirical comparison of their item/person
statistics. Educational and Psychological Measurement, 58, 357–381.
Gibson, W. M., & Weiner, J. A. (1998). Generating random parallel test forms using CTT in a computer-based
environment. Journal of Educational Measurement, 35, 297–310.
Goldstein, I. L., Zedeck, S., & Schneider, B. (1993). An exploration of the job analysis-content validity process.
In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 3–34). San Francisco,
CA: Jossey-Bass.
Green, B. F. (1977). Parameter sensitivity in multivariate methods. Multivariate Behavioral Research, 12,
264–288.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ: Lawrence
Erlbaum.
Haladyna, T., & Hess, R. (1999). An evaluation of conjunctive and compensatory standard-setting strategies for
test decisions. Educational Assessment, 6, 129–153.
Hausknecht, J. P., Halpert, J. A., Di Paolo, N. T., & Moriarty Gerrard, M. O. (2007). Retesting in selection: A
meta-analysis of coaching and practice effects for tests of cognitive ability. Journal of Applied Psychology,
92, 373–385.
Huselid, M. A., Jackson, S. E., & Schuler, R. S. (1997). Technical and strategic human resource management
effectiveness as determinants of firm performance. Academy of Management Journal, 40, 171–188.
Janz, T., Hellervik, L., & Gilmore, D. (1986). Behavior description interviewing: New, accurate, cost effective.
Boston, MA: Allyn and Bacon.
Johnson, J. W. (2007). Synthetic validity: A technique of use (finally). In S. M. McPhail (Ed.), Alternative
validation strategies: Developing new and leveraging existing validity evidence (pp. 122–158). San
Francisco, CA: Jossey-Bass.
Johnson, J. W., Abrahams, N., & Held, J. D. (2004, April). A procedure for selecting predictors considering
validity and adverse impact. Poster presented at the 19th Annual Conference of the Society for Industrial
and Organizational Psychology, Chicago, IL.
Kehoe, J. F., & Olson, A. (2005). Cut scores and employment discrimination litigation. In F. J. Landy (Ed.),
Employment discrimination litigation: Behavioral, quantitative, and legal perspectives (pp. 410–449).
San Francisco, CA: Jossey-Bass.
Kim, D.-I., Choi, S. W., Lee, G., & Um, K. R. (2008). A comparison of the common-item and random-groups
equating designs using empirical data. International Journal of Selection and Assessment, 16, 83–92.
Koopman, R. F. (1988). On the sensitivity of a composite to its weights. Psychometrika, 53, 547–552.
Kravitz, D. A. (2008). The validity-diversity dilemma: Beyond selection—The role of affirmative action.
Personnel Psychology, 61, 173–193.
Kristof-Brown, A. L., Zimmerman, R. D., & Johnson, E. C. (2005). Consequences of individuals’ fit at work:
A meta-analysis of person-job, person-organization, person-group, and person-supervisor fit. Personnel
Psychology, 58, 281–342.
Lievens, F., Buyse, T., & Sackett, P. R. (2005). Retest effects in operational selection settings: Development and
test of a framework. Personnel Psychology, 58, 981–1007.
Lievens, F., Reeve, C. L., & Heggestad, E. D. (2007). An examination of psychometric bias due to retesting on
cognitive ability tests in selection settings. Journal of Applied Psychology, 92, 1672–1682.
Lievens, F., & Sackett, P. R. (2007). Situational judgment tests in high-stakes settings: Issues and strategies
with generating alternate forms. Journal of Applied Psychology, 92, 1043–1055.
Livingston, S. A. (2004). Equating test scores (without IRT). Princeton, NJ: Educational Testing Service.
MacDonald, P., & Paunonen, S. V. (2002). A Monte Carlo comparison of item and person characteristics based
on item response theory versus classical test theory. Educational and Psychological Measurement, 62,
921–943.
McCulloch, M. C., & Turban, D. B. (2007). Using person-organization fit to select employees for high-turnover
jobs. International Journal of Selection and Assessment, 15, 63–71.
Meade, A. W., & Bauer, D. J. (2007). Power and precision in confirmatory analytic tests of measurement invari-
ance. Structural Equation Modeling: A Multidisciplinary Journal, 14, 611–635.
Millsap, R. E., & Kwok, O. (2004). Evaluating the impact of partial factorial invariance on selection in two
populations. Psychological Methods, 9, 93–115.
Mueller, L., Norris, D., & Oppler, S. (2007). Implementation based on alternate validation procedures: Ranking,
cut scores, banding, and compensatory models. In S. M. McPhail (Ed.), Alternative validation strategies:
Developing new and leveraging existing validity evidence (pp. 349–405). San Francisco, CA: Jossey-
Bass.
Murphy, K. R. (1986). When your top choice turns you down: Effect of rejected offers on the utility of selection
tests. Psychological Bulletin, 99, 133–138.
Nye, C. D., Do, B.-R., Drasgow, F., & Fine, S. (2008). Two-step testing in employment selection: Is score infla-
tion a problem? International Journal of Selection and Assessment, 16, 112–120.
Oswald, F. L., Friede, A. J., Schmitt, N., Kim, B. H., & Ramsay, L. J. (2005). Extending a practical method for devel-
oping alternate test forms using independent sets of items. Organizational Research Methods, 8, 149–164.
Peterson, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L. Linn (Ed.),
Educational measurement (3rd ed.). New York, NY: American Council on Education and MacMillan.
Ployhart, R. E., & Holtz, B. C. (2008). The diversity-validity dilemma: Strategies for reducing racioethnic and
sex subgroup differences and adverse impact in selection. Personnel Psychology, 61, 153–172.
Potosky, D., Bobko, P., & Roth, P. L. (2005). Forming composites of cognitive ability and alternative measures
to predict job performance and reduce adverse impact: Corrected estimates and realistic expectations.
International Journal of Selection and Assessment, 13, 304–315.
Pyburn, K. M., Jr., Ployhart, R. E., & Kravitz, D. (2008). The validity-diversity dilemma: Overview and legal
context. Personnel Psychology, 61, 143–151.
Raymond, M. R., Neustel, S., & Anderson, D. (2007). Retest effects on identical and parallel forms in certifica-
tion and licensure testing. Personnel Psychology, 60, 367–396.
Ree, M., Carretta, T., & Earles, J. (1998). In top-down decisions, weighting variables does not matter: A con-
sequence of Wilks’ theorem. Organizational Research Methods, 1, 407–420.
Roth, P. L., Bevier, C. A., Bobko, P., Switzer, F. S., & Tyler, P. (2001). Ethnic group differences in cognitive
ability in employment and educational settings: A meta-analysis. Personnel Psychology, 54, 297–330.
Sackett, P. R., & Ellingson, J. E. (1997). The effect of forming multi-predictor composites on group differences
and adverse impact. Personnel Psychology, 50, 707–721.
Sackett, P. R., & Roth, L. (1991). A Monte Carlo examination of banding and rank order methods of test score
use in personnel selection. Human Performance, 4, 279–295.
Sackett, P. R., & Roth, L. (1996). Multi-stage selection strategies: A Monte Carlo investigation of effects on
performance and minority hiring. Personnel Psychology, 49, 549–572.
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. (2001). High-stakes testing in employment, creden-
tialing, and higher education: Prospects in a post-affirmative-action world. American Psychologist, 56,
302–318.
Schmidt, F. L. (1971). The relative efficiency of relative and simple unit predictor weights in applied differen-
tial psychology. Educational and Psychological Measurement, 31, 699–714.
Schmidt, F. L. (1991). Why all banding procedures in personnel selection are logically flawed. Human
Performance, 4, 265–277.
Schmidt, F. L., Mack, M. J., & Hunter, J. E. (1984). Selection utility in the occupation of U.S. park ranger for
three modes of test use. Journal of Applied Psychology, 69, 490–497.
Schmitt, N., & Chan, D. (2006). Situational judgment tests: Method or construct? In J. A. Weekley &
R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application (pp. 135–156).
Mahwah, NJ: Lawrence Erlbaum.
Schmitt, N., & Kuljanin, G. (2008). Measurement invariance: Review of practice and implications. Human
Resource Management Review, 18, 210–222.
Schmitt, N., & Oswald, F. L. (2004). Statistical weights of ability and diversity in selection decisions based on
various methods of test-score use. In H. Aguinis (Ed.), Test-score banding in human resource selection:
Technical, legal, and societal issues (pp. 113–131). Westport, CT: Praeger.
Schmitt, N., Rogers, W., Chan, D., Sheppard, L., & Jennings, D. (1997). Adverse impact and efficiency among
various predictor combinations. Journal of Applied Psychology, 82, 719–730.
Segall, D. O. (2001, April). Detecting test compromise in high-stakes computerized adaptive testing: A verifica-
tion testing approach. Paper presented at the annual meeting of the National Council on Measurement in
Education, Seattle, WA.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of person-
nel selection procedures (4th ed.). Bowling Green, OH: Author.
Stanton, J. M., Sinar, E. F., Balzer, W. K., & Smith, P. C. (2002). Issues and strategies for reducing the length
of self-report scales. Personnel Psychology, 55, 167–194.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2004). Examining the effects of differential item functioning
and differential test functioning on selection decisions: When are statistically significant effects practi-
cally important? Journal of Applied Psychology, 89, 497–508.
Tett, R. P., Jackson, D. N., Rothstein, M., & Reddon, J. R. (1999). Meta-analysis of bi-directional relations in
personality-job performance research. Human Performance, 12, 1–29.
Tippins, N. T., Beaty, J., Drasgow, F., Gibson, W. M., Pearlman, K., Segall, D. O., & Shepherd, W. (2006).
Unproctored Internet testing in employment settings. Personnel Psychology, 59, 189–225.
Truxillo, D. M., & Bauer, T. N. (1999). Applicant reactions to test score banding in entry-level and promotional
contexts. Journal of Applied Psychology, 84, 322–339.
Truxillo, D. M., & Bauer, T. N. (2000). The roles of gender and affirmative action in reactions to test score use
methods. Journal of Applied Social Psychology, 30, 1812–1828.
U.S. Department of Labor. (1999). Testing and assessment: An employer’s guide to good practices. Washington,
DC: U.S. Department of Labor, Employment and Training Administration.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature:
Suggestions, practices and recommendations for organizational research. Organizational Research
Methods, 3, 4–70.
van der Linden, W. J. (2005). Linear models for optimal test design. New York, NY: Springer.
Viswesvaran, C., & Ones, D. S. (1999). Meta-analyses of fakability estimates: Implications for personality
measurement. Educational and Psychological Measurement, 59, 197–210.
Weekley, J. A., & Jones, C. (1997). Video-based situational testing. Personnel Psychology, 50, 25–49.
8 Technology and Employee Selection

Douglas H. Reynolds and David N. Dickter

INTRODUCTION: THE CHANGING RELATIONSHIP BETWEEN INDUSTRIAL-ORGANIZATIONAL PSYCHOLOGY AND INFORMATION TECHNOLOGY
Technology has become an essential aspect of personnel selection. Organizations expect efficiency,
convenience, and cost savings—computing and technology provide these benefits. Selection pro-
grams are implemented with a range of rules, processes, and systems, and the validity and reli-
ability of these programs will be bounded by the organization’s ability to transact the information
exchanges effectively. Automation provides standardization, reduces the administrative burden, and
removes human error from some parts of the process, thereby ensuring logic and consistency in
selection processes. Handbook of Employee Selection includes this chapter because the trend will
be toward more automation rather than less.
With the widespread use of computers and technology in organizations, it is reasonable to
expect industrial-organizational (I-O) psychologists and other selection experts to achieve a degree
of mastery in the implementation of selection systems through technology platforms. Indeed,
technology has become an essential competency in a broader sense for I-O psychology, worthy
of special training just as statistical data analysis software has been for generations of graduate
students. As scientists and scholars, we must embrace and lead the study of how technologies are
impacting organizational systems. As practitioners, several new organizational realities require our
technical expertise as shown in Table 8.1. Clearly, the advent of technology requires us to stay on
top of information management and policy. Without acquiring and maintaining expert knowledge
of technology, psychologists’ interests may become subordinated to information technology (IT)
departments and software providers that typically design the data infrastructure. If IT drives the
selection process, then the function becomes software engineering, and psychologists may lose the
opportunity to be providers of trusted advice and consultation.
If applied thoughtfully, technology represents an opportunity for selection specialists to become
more strategic. Human resources (HR) departments tend to be viewed as service providers rather
than partners in setting strategy and making decisions. Psychologists can improve this perception
several ways. By using data to gain insight about a workforce, selection specialists earn a seat at the
decision-making table as a source of wisdom about HR planning. Given the trend toward integrat-
ing company data across tests, performance, development systems, and organizational metrics, it is
easier to gain access to information for conducting linkage research, thus showing the connection
between employee actions and organizational outcomes. This information is valuable to leaders and
also will provide the field with new recommendations for research and practice. By studying the use
of technology in the workplace, psychologists can add value in organizations by providing expertise
for shaping how technology can best support people processes to influence business outcomes (see
Table 8.1).


TABLE 8.1
How Technology Has Changed I-O Practice and Educational Requirements
Some New Realities

Business leadership perceives technology as essential for efficiency.


• Technology is viewed as essential for HR innovation—required to collect, store, and transmit data.

Automated workflows are commonplace.


• Transactional systems maximize efficiency and provide self-service tools to HR and businesses.
• Self-service for the examinee: unproctored Internet testing.
• I-O psychologists must, at a minimum, maintain and transact via these systems.

Scope and complexity have increased.


• Large-scale databases with integrated information about tests, performance, development, and organizational metrics
[e.g., enterprise resource planning (ERP), talent management systems].
• Rapid pace of change in systems (upgrades, new versions, new features).
• Technology implementation failures are common; a root cause is the tendency to underestimate the complexity of the
project (Standish Group, 1995).

Recommendations for I-O Psychologists


• Stay informed about technology.
• Forge partnerships with IT departments to develop new products and systems.
• Use technology as an opportunity to become more strategic.
• Provide information valuable to leaders; for example, use data to inform decision-making, serve as a unique viewpoint
about HR planning, and gain access to information for conducting linkage research.
• Study the use of technology in the workplace, providing I-O psychology with new recommendations for research and
practice.
• Learn and teach new statistical methods such as analysis at multiple organizational levels (individual, group, and
company); dynamic, time-dependent analyses tracking longitudinal employee data (e.g., event history methods,
time-series analyses, latent growth modeling, dynamic performance modeling); and IRT or other methods to reduce
the length of tests and the exposure of content from fixed forms.

What Is at Stake
• Retaining roles as functional architects of HR systems vs. system administrators.
• Setting strategy and making decisions vs. acting as service providers.
• Moving I-O psychology forward with theories and analytical methods required to understand modern work
environments vs. missing opportunities to advance the discipline in step with technology.

In this chapter, we first provide an overview of common technology-based systems and their
benefits and limitations. Then, we cover a core set of critical issues common to technology-based
selection systems—are they always the best approach, do they add utility, and are they fair and
defensible? The issues include measurement equivalence, unproctored Internet administration,
advanced delivery methods, demographic issues regarding technology, applicant reactions, and
legal considerations. Next, we discuss a range of implementation issues confronting the practi-
tioner: how to build and maintain the system; how to transition from other systems and/or legacy
paper-pencil processes; and how to integrate between selection systems and other components of
the talent management process. Finally, we discuss current HR technology and how to stay up to
date. This section will address technology advances that are new as of this writing, the drivers for
other innovation trends that appear likely for the future, and references for staying current in HR
technology applications.
The next section highlights some of the common technologies currently in use by organizations
for personnel selection.

TECHNOLOGY-BASED SYSTEMS TO SUPPORT EMPLOYEE SELECTION


Effective HR software systems must address important business needs for the organizations that
invest in them. For HR processes, automated tools tend to fulfill needs that range on a continuum
from tactical concerns, such as data tracking and storage, to more strategic issues such as gaining
additional insight regarding the individuals who apply to the organization and deploying these indi-
viduals to take maximal advantage of these new insights; see Figure 8.1.
The contrast between tactical process facilitation and the ability to provide strategic insight and
advantage is a long-standing concern for HR practitioners, and it extends into software applications
that serve this market also (e.g., Fletcher, 2005). Both sides of the continuum represent essential
roles for an HR function, but the challenge has been reaching beyond tactics to inform organiza-
tional strategy.
On the tactical side of this continuum, the value to the organization is focused on improve-
ments in speed and efficiency. By automating standard tasks such as the movement of job seekers
through a selection process, the use of software can reduce the labor costs associated with manual
processes while adding benefits associated with standardization of the activities involved. These
improvements create value for organizations that are large enough to accrue enough efficiency
gains to justify the expense of the tool and its implementation. For smaller organizations, the
gains in labor cost are lower, so the tools must be more streamlined and less expensive. Moving
toward the strategic side of the continuum, value is derived by taking advantage of information
about people—information that could not be collected on a large scale without automation. An
example of a strategic contribution includes increasing the ability of an organization to pre-
dict the performance, satisfaction, or tenure of a new associate on the basis of psychometrically
sound tests. Of course effective psychometric tests were available before automation was a market
requirement; however, software providers have incorporated tests and other measures into their
product offerings with intent to provide additional value by moving up the continuum toward a
more strategic contribution.
Major examples of software systems that are designed to support employee selection processes,
ranging from tactical process facilitators to tools that provide insight that can help inform strat-
egy, can be grouped into a few common categories. These are discussed briefly in the following
sections.

Applicant Tracking Systems


The primary value of an applicant tracking system (ATS) is tactical. The system should collect,
track, and report critical information about open positions, candidates, and selection processes to
enable the efficient management of recruiting and staffing functions. ATSs also frequently serve as

FIGURE 8.1 Tactical vs. strategic impact of technology-based selection tools. (The figure depicts a continuum: at the tactical end, resume tracking and qualifications screening provide automation/standardization, process facilitation, and speed/efficiency; at the strategic end, testing and assessment provide insight into candidate capabilities, insight to inform organizational strategy, and connections between information systems to examine employees over time.)


software hubs for additional services (e.g., job posting and background checking) to further extend
the value they provide to the hiring process through automation.
In addition to the main functions of data tracking and management, an ATS will enable storage
and reporting of candidate quality and flow rates throughout the staffing process. This allows
for computation of effectiveness metrics, such as the success rates of recruiters and recruiting
channels, time to hire, and the criterion validity of candidate information. Data storage and
reporting are also critical for understanding how the system as a whole is operating with respect to
critical outcomes such as the diversity mix of the candidate pool at each stage of selection. These
reports are required to support government recordkeeping requirements and to respond to audit and
challenge requests.
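As a sketch of the kind of flow reporting these systems support (hypothetical stage counts and subgroup labels assumed for illustration), stage-by-stage selection rates by subgroup can be turned into adverse impact ratios for monitoring against the familiar four-fifths rule of thumb.

    # Hypothetical counts: (passed, entered) for each subgroup at each stage
    flow = {
        "screen":    {"group_a": (400, 1000), "group_b": (150, 500)},
        "test":      {"group_a": (200, 400),  "group_b": (60, 150)},
        "interview": {"group_a": (50, 200),   "group_b": (12, 60)},
    }

    for stage, groups in flow.items():
        rate_a = groups["group_a"][0] / groups["group_a"][1]
        rate_b = groups["group_b"][0] / groups["group_b"][1]
        impact_ratio = min(rate_a, rate_b) / max(rate_a, rate_b)
        flag = "review" if impact_ratio < 0.8 else "ok"       # four-fifths guideline
        print(f"{stage}: rates {rate_a:.2f} vs. {rate_b:.2f}, impact ratio {impact_ratio:.2f} ({flag})")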
On its own, an ATS typically provides very little sophistication to support the measurement of
people, so supplemental processes are often added to support the measurement required for strong
selection decisions, and these tools are usually required to integrate with the processes and data
formats supported by the ATS.

Resume Storage, Parsing, and Keyword Search


Resume storage tools are typically built to work in concert with an ATS. These tools are designed
to address the tactical issues associated with managing the high volume of candidates that can be
generated via Internet job postings. Generally, the purpose of these tools is to support the storage
and retrieval of job seeker information on the basis of a text search of the information contained
within candidate resumes. Typical features for these tools include keyword parsing and various
search methods.
Resume parsing tools automatically deconstruct the resume and place relevant information (e.g.,
contact information, degrees, educational institutions, and certifications) into database fields.
These tools thus create efficiencies by eliminating the manual entry of this information into
the ATS.
Once critical information is gleaned from the resume, keyword search tools can be deployed
to assist recruiters in the task of assembling a group of job seekers who meet criteria that can be
imposed during a database search. These tools may include advanced technologies that allow the
meaning of a word or phrase to be detected from the context in which it appears in the resume. For
example, in the resume of a financial analyst who worked for State Street, such a tool would classify
“State Street” as an employer under work experience rather than as part of the job seeker's street address.
Resume search tools can help improve the efficiency of large-scale recruiting processes, but they
have severe limitations for providing insight into job seeker qualities. Because they are not stan-
dardized, resumes cannot reveal the quality of prior work, nor do they reflect the learning gained
from prior experience. These essential characteristics can best be determined via more advanced
screening, assessment, and interviewing techniques.

Applicant Screening Tools


Applicant screening tools can provide benefits for both tactical efficiency and strategic insight.
When they are designed and implemented effectively, they can provide a standardized method for
quickly collecting background data on job seekers and grouping them on the basis of their predicted
success for specific open positions.
These tools provide a means to construct or choose questions for administration to job seekers,
methods for scoring the questions, and ranking people on the factors that are assessed. These tools,
often described as “resume builders,” use standardized questions that have been studied through job
analysis and determined to be relevant to the job.
Screening systems can offer a range of options for collecting job-relevant information about
job seekers. Common questions include work and educational history (some systems may extract
this information from a resume and have candidates review and update the extracted information),
basic qualifications (e.g., licenses, certifications, and years of relevant experience), and specific
experiences (e.g., with equipment, work processes, or business issues common to the job). Because
it is possible for HR administrators and hiring managers to build these questions themselves within
the software, sometimes with little oversight, it is important to emphasize that screening questions,
like test questions, are pre-employment tools that must be job-relevant and valid. This requirement
can create a challenge for the broad distribution of the user rights to construct these automated
screening questions.
Sophisticated versions of these screening tools are based on the same measurement principles as
the weighted application blank technique (e.g., England, 1971), in which scores are set based on the
value of a particular response for predicting job success. Here again, the sophistication of the mea-
surement should be balanced against the qualifications of the users. Unlike a weighted application
blank, where an I-O psychologist would likely guide the design of the tool, the design of the scoring
scheme in many automated tools may be delegated to a broader range of system users. Software
developers and users may perceive this feature as a benefit, but the flexibility comes with substantial
risk. For example, basic qualification questions are frequently too strict or bear little relationship to
the performance on the job if they are not constructed carefully. Well-designed systems will include
controls that limit access to question construction, scoring, and deployment to qualified users.
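A minimal sketch of the weighted-application-blank logic (hypothetical items, response options, and weights; in practice the weights would come from an empirical keying study): each response option carries a weight reflecting its relationship with job success, and the screening score is the sum of the weights for the options a candidate endorses.

    # Hypothetical option weights derived from a prior validation sample
    option_weights = {
        "years_experience": {"<1": 0, "1-3": 1, "3-5": 2, ">5": 3},
        "certification":    {"none": 0, "related": 1, "required": 2},
        "shift_flexible":   {"no": 0, "yes": 1},
    }

    def wab_score(responses):
        """Sum the keyed option weights for one candidate's responses."""
        return sum(option_weights[item][answer] for item, answer in responses.items())

    candidate = {"years_experience": "3-5", "certification": "related", "shift_flexible": "yes"}
    print(wab_score(candidate))   # -> 4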
Another inherent limitation with automated screening tools stems from the fact that they are
designed to be deployed early in a selection process when the commitment of the job seeker to the
selection process can be low. This places operational pressure to keep screening content brief, so
the candidate is not overburdened early in the process. For example, many employers desire their
screening and application process to require no more than 20 or 30 minutes, and this requirement
places restrictions on the types of questions that may be used. Additionally, screening questions that
are embedded in the application are broadcast widely, usually on the employer’s “Careers” page of
its website. Because screening content is frequently distributed broadly on open-access websites,
ability, knowledge, skill, and other question types in which right/wrong scoring is typical should be
avoided because of concerns for item security and cheating by candidates. Security considerations
for Internet-based selection content will be expanded on later in this chapter.

Automated Testing
Compared with screening tools, testing provides greater insight into individual characteristics by
deploying psychometric instruments that can provide more accurate measurement of constructs that
are difficult to index with screening questions, such as abilities, traits, and knowledge.
Automated delivery systems for these tools tend to have several common features. Test-takers
are typically invited to take the assessment by providing secure log-in and password information;
standardized instructions, help menus, and practice items are then provided to orient the test-taker.
During the test session, several features are deployed to facilitate test-taking, including count-down
timers, progress indicators, and other navigational aids. Many test delivery systems also include
advanced features such as the capability to present audio, video, and animated graphics as part of
the question stimuli; they may also allow for alternative response formats such as “drag and drop”
controls to provide a greater flexibility for handling a range of item types.
Despite the rapid introduction of new technology into test delivery, it can be argued that little
has changed regarding the test instruments themselves; it is the conditions under which tests are
administered that have changed more dramatically. Many systems allow for the deployment of a
test in any location where the Internet is accessible, thereby eliminating the role of the test proc-
tor. Internet delivery of tests has raised new debate over the appropriate security conditions for
these measures (Naglieri et al., 2004). Professional standards and guidelines for testing dictate the
importance of material security, standardization of the test environment, and the control of factors
that may impact test performance aside from the construct being assessed (American Educational
Research Association, American Psychological Association, and National Council on Measurement
in Education, 1999; Society for Industrial and Organizational Psychology [SIOP], 2003). However,
operational pressures push toward remote deployment methods because they allow employers to
efficiently gain insight into job candidates before they are brought on-site for more expensive activi-
ties, assuming that the test results are not damaged by the lack of standardization in the testing
environment. We return to this issue in more detail later in the chapter.

Additional Tools and Facilitators


Various automated tools for gaining insight into people exist beyond the classifications provided
above. Tools for supporting behavioral assessment, such as work simulations and assessment cen-
ters, allow for presentation of stimuli via controlled e-mail inboxes, instant messaging tools, and
voice and video mail. These approaches allow for the replication of the modern office environment
in a manner that is a better reflection of modern managerial work and has higher degrees of control
and standardization than nonautomated alternatives. Automated tools have also been developed to
help to structure and facilitate the interview process. Interview facilitators often allow for the iden-
tification of the rating targets (e.g., competencies, past behaviors), the construction or identification
of questions that assess these targets, the assignment of questions to interviewers, and a process for
data combination across interviewers. Furthermore, the tools can help with records retention if the
interview protocol, summary notes, and ratings are maintained in the system. Many of these steps
are geared toward improving the efficiency and standardization of the interview process; if the tool
is based on an interview technique that has been well researched and proven, additional insight into
candidates may also be gained.
Many other technology-based products exist to enable the hiring process and provide insight into
individuals, and certainly many more will be developed. Fundamental issues regarding efficiency,
control, and insight will continue to underlie the business value of new approaches. Sophisticated
buyers and users will evaluate advancements on the basis of the balance between their business
value and the risks they pose as increasingly complex functions become automated and decentral-
ized. The following sections will introduce some of the major issues and concerns that are raised as
organizations continue to automate the staffing process.

CRITICAL ISSUES IN TECHNOLOGY-BASED SELECTION


Beyond the various tools and techniques that have evolved with the advancements in technology,
it is important to examine the issues that are raised by the use of electronic tools in the hiring pro-
cess. Should automated selection tools be considered to be close variants of traditional assessment
and selection approaches, simply deployed through different media? Alternatively, does automation
introduce dramatic differences from traditional deployment conditions?
Various factors may contribute to the potential impact of technology. These factors raise several
issues regarding how the use of technology affects the operation of organizational selection systems.
Some of these are unique to technology deployment (e.g., the equivalence of assessments across
media), and others may be inherent to the endeavor of personnel selection, but their impact may be
magnified by the use of technology.
For example, research has demonstrated that simply changing the media for test administration
from paper-and-pencil to computer has an impact on some types of assessments, primarily speeded
measures of ability (Mead & Drasgow, 1993). Technology also allows for the use of alternative
types of assessments that may only be deployed with computer technologies, such as the use of
embedded video, audio, animated imagery, and branched items—advancements that are viable on
a large scale only through computer administration.
Other factors related to use of Internet-based technologies may also have dramatic implications
for how selection systems operate: the sheer size of the potential applicant pool is far bigger because
of access to job application processes via the Internet; applicant pools may thus span geographic
and political boundaries more readily, calling into play international rules regarding the handling
of assessment data.

In the sections that follow, we describe some of the issues that arise because of the increased use
of technology in the selection process.

MEASUREMENT EQUIVALENCE AND THE FACTORS THAT INFLUENCE EQUIVALENCE


One critical issue for selection technology is the extent to which the mode of administration (paper,
computer, web, etc.) affects the measurement characteristics of the test. Practitioners and research-
ers are concerned about equivalency for several reasons. First, much of what is known about the use
of tests in organizations, and indeed about testing in general, comes from an understanding of the
measurement properties of paper tools. Technological testing applications are new enough that most
commercially available published tests and large-volume, privately developed tests were created in
paper-and-pencil format, and most technology-based tests are the computerized incarnations of the
same instruments. Second, research has shown that the mode of administration affects the measure-
ment properties of certain types of tests—even when the computerized test is designed to duplicate
the paper-test experience, as we discuss below. Therefore, an organization should not simply place
the content on a computer or online and declare that the test has the same measurement proper-
ties and difficulty as the paper test by virtue of its content, while ignoring the effects that may be
introduced by the interface. Third, organizational selection systems frequently use both computer
and paper-and-pencil format, as driven by the availability of technology. For instance, some orga-
nizations use computers when testing a handful of candidates at once, and paper and pencil at other
times to accommodate large, simultaneous test administrations where there are not enough comput-
ers or computers are not available (such as at a job fair away from the organization’s premises).
The literature on paper-computer equivalence has focused on two primary sources for differ-
ences: information-processing effects on cognitive ability measurement and the effects of privacy or
monitoring on personality measurement. Research on cognitive ability tests has shown differences
for speeded tests. Mead and Drasgow (1993) reported in a meta-analysis that paper and computer-
based speeded ability tests correlated .72 when corrected for the reliability of the tests, compared
with .97 for power (unspeeded) tests. The authors chose ability tests such as the General Aptitude
Test Battery (GATB), for which the intent was to duplicate the paper experience on the computer. It
is easy to see that when time is critical for performance, test materials and resources could make a
difference (e.g., turning pages vs. clicking buttons to navigate; filling in circles vs. gridding answers
on an answer sheet). For noncognitive tools such as personality tests and inventories, research has
focused on whether using a computerized format affects the constructs measured; for instance, by
introducing or exaggerating anxiety associated with computer use and monitoring (e.g., see George,
Lankford, & Wilson, 1998; Salgado & Moscoso, 2003) or by affecting a test-taker’s propensity to
distort answers to the questions (Richman, Kiesler, Weisband, & Drasgow, 1999). The equivalence
literature on personality and biodata has suggested that the relationships between the factors mea-
sured by these tools, as well as the internal consistency of the personality scales, are similar across
paper and computer administrations (e.g., Bartram & Brown, 2004; Salgado & Moscoso, 2003).
However, currently there is little published research indicating that mean scores and score distribu-
tions are also similar.
Some studies have found noteworthy differences between paper and computer test modes. For
example, Ployhart, Weekley, Holtz, and Kemp (2003) analyzed data from proctored administrations
of personality, biodata, and situational judgment tests (SJTs) and reported certain advantages of Internet-based
tests over paper tests for personality measurement—including higher variance in scores, higher
internal consistency, and more normal distributions—as well as the disadvantage of higher correla-
tions between scales, thus reducing their potential uniqueness in predicting performance. Although
the findings justify further research, this study and others in this literature used a quasi-experi-
mental design that compared scores between groups without randomization—differences between
these groups may have affected the results. Potosky and Bobko (2004) called for more within-group,
repeated-measure studies, in which examinees take paper and computerized forms of the tests.
The literature lacks such studies that also specifically use pre-employment tests (they note Salgado
& Moscoso, 2003, as a lone example of a within-group design in this literature). New studies
have introduced other interesting factors that may play a role. For example, Meade, Michels, and
Lautenschlager (2007) found scores to be comparable for many personality constructs; however,
there were exceptions when the authors introduced having (or not having) a choice in whether the
test was taken via paper or Internet. They also distinguish between computerized and Internet-
based tests in the literature, pointing out that bandwidth and Internet connectivity affect test down-
load speeds, which in turn affect administration time and therefore measurement, particularly for
speeded tests. (Note that several studies cited in this section were focused on computer-based rather
than Internet-based testing, including George et al., 1998; Mead & Drasgow, 1993; and Richman
et al., 1999.) Ideally, future research will examine equivalency between paper, computer, and online
personnel tests using repeated-measure designs.
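To illustrate what such a repeated-measures equivalence analysis involves, the following sketch (in Python) simulates examinees who take both forms of a test and then computes the cross-mode correlation and the standardized mean difference. The data are simulated purely for illustration; an actual study would also counterbalance form order and test for practice effects.

```python
# Minimal sketch of a within-group (repeated-measures) equivalence check in which
# each examinee takes both the paper and the computerized form of a test. All data
# here are simulated for illustration; a real study would also counterbalance form
# order and examine practice effects.
import math
import random

random.seed(1)
true_scores = [random.gauss(50, 10) for _ in range(200)]
paper = [t + random.gauss(0, 4) for t in true_scores]      # paper-and-pencil form
computer = [t + random.gauss(0, 4) for t in true_scores]   # computerized form

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

# Cross-mode correlation for the same examinees
mp, mc = mean(paper), mean(computer)
cov = sum((p - mp) * (c - mc) for p, c in zip(paper, computer)) / (len(paper) - 1)
r = cov / (sd(paper) * sd(computer))

# Standardized mean difference between modes (paired data)
d = (mc - mp) / math.sqrt((sd(paper) ** 2 + sd(computer) ** 2) / 2)
print(f"cross-mode r = {r:.2f}, standardized mean difference d = {d:.2f}")
```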

UNPROCTORED INTERNET TESTING: NEW OPPORTUNITIES AND MITIGATION OF RISK


Another key issue for technology and selection involves the pros and cons of unproctored
(unsupervised) Internet testing. Ten years prior to this writing, Internet-based testing was a
novelty; now it is the standard for many organizations that use selection tests. The convenience
and accessibility of Internet computing has spawned a market for anywhere and anytime
testing. Obvious benefits include candidate convenience, cost savings, and faster hiring cycles.
In addition, an organization can better recruit semipassive candidates who apply on a whim
while also portraying the company as a tech-savvy organization for which innovation begins
with the selection process. I-O psychologists must address these organizational motivations
and expectations, mitigating the risks of unproctored Internet testing while capitalizing on
the opportunities. In the following section, we address several key issues, including examinee
identification, test security, cheating, and test standardization (for a detailed discussion, see
Naglieri et al., 2004; see also Tippins et al., 2006).

Examinee Identification
Currently there is no way to verify that the person taking the test on the Internet is the actual can-
didate. Tippins et al. (2006) described and critiqued options for retesting all or some candidates
(e.g., randomly selected individuals) and the practical challenges of administering this program
and interpreting any score differences that are observed between administrations. In the future,
biometric verifications, real-time video communications, and other technology may make examinee
identification more practical.

Test Security
Security is a concern mostly for the publisher and user of the test. The user might be concerned
about his/her data, although the encryption protocols for Internet-based tests can theoretically be
as good as those for any other data exchange. Although the delivery system can block the ability
to share and distribute the test using the local computer’s operating system, there is no stopping the
determined candidate from using other technology to circumvent that security (e.g., by taking pic-
tures of the computer screen). Test publishers typically include a statement that the candidate must
agree to in order to access the test, stipulating that he or she will take the test honestly and will not
distribute the information. The organization also may attempt to limit access to the test to serious
candidates who have been prescreened rather than opening the test content to the general public.

Cheating
Clearly, candidates who want to cheat will find that unproctored Internet testing provides them
with opportunities because no one is watching them take the test. To date there have been many
conference presentations, but few publications, on the effect of cheating on test results. In personality
testing, a related literature of meta-analyses on social desirability, response distortion, and faking
provides evidence that paper personality tests do not appear to differ from computerized tests with
regard to cheating (Dwight & Feigelson, 2000; Richman et al., 1999). It is likely that mitigating
factors in a candidate’s decision to cheat will include the existence of “right” answers (such as those
on an ability test) versus opinions or attitudes (such as in personality tests in which the scales of
interest and the right answers are less obvious to the candidate) and the extent to which the test event
is seen as a hurdle (if not viewed as pass/fail, there is less at stake and therefore less motivation to
cheat). Therefore, the organization might attempt to deemphasize the high-stakes nature of the test,
for instance, by describing it as one data point among many. Threatening to retest candidates in a
proctored second hurdle might also deter some cheaters.

Test Standardization
Dispensing with proctoring limits the ability to control the test environment, leading to situa-
tions that may disadvantage some candidates. In addition to the distractions that may be present
in an unsupervised setting, hardware and unforeseen software issues may arise. Differences
between examinees’ computers and Internet connections could affect testing, particularly on
speeded tests. The remedy for this is to use as little local software as possible—ideally, with
current technology this means a browser-based test—and to have instructions and trouble-
shooting tips available for the examinee to confirm optimal usability of the test system prior to
administration.
As shown in this section, unproctored testing brings ethical issues to the forefront. The threat of
cheating or unfair conditions creates an ethical dilemma for I-O psychologists, whose duty is to pro-
vide a fair and consistent test experience for all and an equal opportunity for test takers to perform
up to their level of ability or trait.

ADVANCED TESTING METHODS


An assumed promise of technology is the belief that psychological measurement can be advanced
through the interplay of psychometrics and computer programming. Although there are many
examples in which this promise is more marketing hype than substance, there are a few areas of
research and development that have real potential to improve measurement above gains in efficiency
and standardization of processes. Computer adaptive testing and high-fidelity item content deserve
mention in the latter category.

Computer Adaptive Testing


Test providers have suggested the use of computer adaptive testing (CAT) with increasing frequency
in recent years in the personnel selection context. Pressures to conduct remote and unproctored test-
ing may have contributed to the increased interest in adaptive testing. Adaptive tests expose each
candidate to only a small subset of the operational testing items—the items that are administered
to a given candidate will depend on the candidate’s responses to prior items, their estimated ability
level on the construct being assessed, and random factors that are designed to prevent overexposure
of the items in the item bank. CAT is thus viewed as a viable protection against the security threats
posed by unproctored testing. Additionally, a well-designed CAT will provide a more accurate
assessment of candidate ability compared to a linear test, because candidates are exposed to more
items that are closer to their true ability level. Comparisons to a secondary (proctored) test in the
hiring process can then be used to discourage candidates from having confederates take the first exam, because
the actual job candidates will need to perform at comparable levels or better once they are onsite
for the proctored test.
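To make the adaptive logic concrete, the following sketch (in Python) implements a bare-bones item selection loop under a two-parameter logistic IRT model. The item parameters, the ability-update rule, and the exposure-control step are simplified illustrations only, not a description of any operational CAT engine.

```python
# Minimal sketch of adaptive item selection under a 2PL IRT model.
# Illustration only: the item bank, ability-update rule, and exposure-control
# step are simplified stand-ins, not a production CAT engine.
import math
import random

def p_correct(theta, a, b):
    """2PL item response function: P(correct | theta) given discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, item_bank, administered, top_k=5):
    """Pick a near-maximum-information item, randomizing among the top candidates
    to limit overexposure of the most informative items."""
    available = [i for i in item_bank if i["id"] not in administered]
    ranked = sorted(available,
                    key=lambda i: item_information(theta, i["a"], i["b"]),
                    reverse=True)
    return random.choice(ranked[:top_k])

def update_theta(theta, item, correct, step=0.5):
    """Crude ability update: nudge theta toward or away from the item difficulty.
    A real CAT would use maximum-likelihood or Bayesian (EAP) estimation."""
    return theta + step * (1 if correct else -1) * (1.0 / (1.0 + abs(theta - item["b"])))

# Example usage with a tiny hypothetical item bank.
random.seed(7)
bank = [{"id": k, "a": random.uniform(0.8, 2.0), "b": random.uniform(-2, 2)}
        for k in range(50)]
theta, administered = 0.0, set()
for _ in range(10):                       # fixed-length test of 10 items
    item = next_item(theta, bank, administered)
    administered.add(item["id"])
    # simulate a response from a candidate whose true ability is 0.7
    correct = random.random() < p_correct(0.7, item["a"], item["b"])
    theta = update_theta(theta, item, correct)
print(f"Estimated ability after 10 items: {theta:.2f}")
```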
Although adaptive testing may offer some relief from the challenges posed by unproctored test-
ing, it is not without significant limitations. Adaptive measures must focus their measurement
on a unitary, well-defined construct, and they are thus better suited for testing cognitive abilities
or knowledge areas versus personality or other softer constructs. CAT also requires a large
investment in the item bank development and calibration research. A single adaptive scale that
is designed for use with test-takers of wide ability ranges can easily require hundreds of test
items, each scaled on over a thousand test-takers. Even with the extra security and precision
afforded by a CAT, if the test is deployed without a proctor, the measure will still likely require
some form of follow-up test to estimate whether the candidate took the initial test without
assistance. Despite these concerns, the increase in unproctored testing provides a business
rationale for the investment in these tools by a broader range of organizations than has been
the case in the past, so it is possible that the theory and techniques behind adaptive testing may
increase in their popularity.
In practice, the use of CAT has been more prominent in high-stakes certification testing than
in the employment context; however, Internet-based CAT has now been developed for high-
volume selection (e.g., Gibby, Biga, Pratt, & Irwin, 2008). Techniques have also been developed
for applying adaptive testing technologies to personality assessment (e.g., McCloy, Heggestad, &
Reeve, 2005; Roberts, Donoghue, & Laughlin, 2000; Stark, Chernyshenko, Drasgow, & Williams,
2006), and applications of personality-based CAT are under development for use in the
armed services (e.g., Borman et al., 2008). The reemergence of CAT as a tool for Internet-based
selection has also contributed to confusion among lay people and software programmers alike as
to the differences between CAT and branched item delivery. Branched items (sometimes referred
to as “chained” items) are constructed to represent test content where a series of events has inter-
dependencies, and the choice of a response to an initial item determines the next item in the chain.
For example, a series of questions about medical diagnosis may begin with a set of symptoms and
the respondent must choose between several diagnostic options. On the basis of the option the
respondent picks, the next test question will provide results from their choice and the next step
must then be chosen. In this manner, a decision tree may be constructed to determine whether
the respondent knows the correct path toward a diagnosis. The ability to branch item presentation
on the basis of prior responses is also enabled by computer-based test delivery. However, unlike
adaptive delivery, branching in this manner does not meet the assumption of local independence
of test items that is required for adaptive testing. Thus, branched tests are able to reflect complex
test content, but they do not operate in the same manner and with the same benefits as adaptive
tests. Nevertheless, the concept of branching is appealing and several vendors of Internet test
engines actively market the capability as adaptive testing. Branching is perceived as adding real-
ism because later items can represent the consequences of prior actions. Although this feature
may add interest to low-stakes assessments such as post-training knowledge checks, there are
reasons to be skeptical regarding the quality of these measures if they violate assumptions
of the underlying measurement theory. As is the case with many aspects of technology-driven
assessment, the challenge is the education of lay decision-makers on the differences between
these methodologies.
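The contrast can be illustrated with a small sketch of branched delivery: the next item is fixed entirely by the option chosen, with no ability estimation involved. The item text, identifiers, and scoring rule below are hypothetical illustrations.

```python
# Minimal sketch of branched ("chained") item delivery: the next item is determined
# by the response chosen, not by an ability estimate. Item content and the scoring
# key are hypothetical illustrations.
BRANCHED_ITEMS = {
    "q1": {"text": "Patient presents with fever and cough. First step?",
           "options": {"a": ("Order chest X-ray", "q2a"),
                       "b": ("Prescribe antibiotics immediately", "q2b")}},
    "q2a": {"text": "X-ray shows infiltrate. Next step?",
            "options": {"a": ("Begin targeted treatment", None),
                        "b": ("Discharge without treatment", None)}},
    "q2b": {"text": "Symptoms persist after one week. Next step?",
            "options": {"a": ("Reassess and order imaging", None),
                        "b": ("Increase dosage without reassessment", None)}},
}
CORRECT_PATH = [("q1", "a"), ("q2a", "a")]  # the intended diagnostic route

def administer(responses):
    """Walk the decision tree for a dict of {item_id: chosen_option}; return the path taken."""
    path, current = [], "q1"
    while current is not None:
        choice = responses[current]
        path.append((current, choice))
        current = BRANCHED_ITEMS[current]["options"][choice][1]
    return path

# Because items along a branch are interdependent, the local-independence
# assumption of IRT-based adaptive testing does not hold for this structure.
print(administer({"q1": "a", "q2a": "a"}) == CORRECT_PATH)  # True
```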
To enable better understanding of these advanced testing techniques, students of I-O psychol-
ogy should be educated on appropriate test theory and methods. For instance, item response the-
ory (IRT) will become increasingly important because it provides the foundation for CAT and it
improves measurement over classical test theory (CTT; Embretson, 2000). Historically, many I-O
psychology graduate programs have taught CTT but have placed less emphasis on IRT, which is
more complex and arguably less accessible if covered as one section of a general survey course on
testing. The inclusion of a full IRT course in the curriculum would be advantageous for the field.

High-Fidelity Item Presentation


Technology also allows for the presentation of richer forms of stimuli than are possible with paper
formats. Available technologies include embedded video and/or audio stimuli that present realistic
scenarios before test-takers are cued for a response. For example, typical customer complaints may
be presented in digitized video clips during an assessment of applicants for customer service posi-
tions, thereby providing a more realistic context than is possible through written scenarios and more
standardized than is possible with live role players.
Advances in computer animation, driven largely by the growth of the computer gaming industry,
have provided another set of tools that are gaining in popularity for assessment. Some assessment
platforms have been developed that incorporate animated avatars and simulated work environments
in an attempt to accommodate a wider range of stimuli. Questions remain as to whether the elicited
responses are adequately representative of real-world actions; however, visual appeal and the ability
to simulate rare events will likely spark enough applications of these assessments to provide a test
bed for research on these questions.
Additionally, new forms of test items can be created by combining animation with graphical
tools such as drag-and-drop controls. For example, work processes that involve the placement of
objects in a location and sequence can be modeled in animated environments. Test takers then click
and drag images to the correct location to complete the assessment. These item types can be used
to simulate processes such as food preparation, housekeeping rules, assembly tasks, and medical
procedures.
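As a simple illustration of how such items might be scored, the sketch below awards credit for both absolute placement and relative ordering of the dragged elements. The steps and the partial-credit rule are hypothetical and would need to be defined during test development.

```python
# Minimal sketch of scoring a drag-and-drop sequencing item (e.g., ordering food
# preparation steps). The steps and the partial-credit rule are hypothetical
# illustrations, not a published scoring algorithm.
KEY = ["wash_hands", "gather_ingredients", "prep_surface", "cook", "plate"]

def score_sequence(submitted, key=KEY):
    """Award 1 point per step placed in its correct position, plus 1 point per
    correctly ordered adjacent pair, normalized to a 0-1 scale."""
    position_pts = sum(1 for i, step in enumerate(submitted)
                       if i < len(key) and step == key[i])
    key_pairs = set(zip(key, key[1:]))
    pair_pts = sum(1 for pair in zip(submitted, submitted[1:]) if pair in key_pairs)
    max_pts = len(key) + (len(key) - 1)
    return (position_pts + pair_pts) / max_pts

print(score_sequence(["wash_hands", "gather_ingredients", "prep_surface", "cook", "plate"]))  # 1.0
print(score_sequence(["gather_ingredients", "wash_hands", "prep_surface", "cook", "plate"]))  # partial credit
```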
Aside from the measurement benefits of higher fidelity stimulus presentation and enhanced
standardization, increased use of high-fidelity item types may also serve to engage job appli-
cants in the assessment task. This outcome is of increasing value in a recruiting market geared
toward younger job seekers who may become frustrated with lengthy application and assessment
processes.
A final benefit from high-fidelity item presentation may be in its potential to reduce score
differences among demographic groups. Outtz (1998) presented findings suggesting that video
presentation of stimuli may reduce observed score differences. Although this finding has not
been widely demonstrated (e.g., Weekley & Jones, 1997), nor are the reasons behind the finding
clear (hypotheses include increased social context cues, reduced reading level, or psychomet-
ric artifact due to differences in the reliability between assessment methods), any method that helps
to better understand this pervasive issue would be advantageous and deserves further
investigation.
The use of high-fidelity item presentation also raises various challenges and issues. From a prac-
tical perspective, the deployment of assessments that include video, audio, and/or graphical stimuli
requires additional server capacity for the provider, bandwidth at the delivery site, and compatible
playback software on the local computer used for the assessment. Installation and maintenance of
these components can be complex and expensive, and failures can quickly erode the benefits of
standardization across assessment participants.
For developers of assessment software, the apparent benefits of higher fidelity presentation must
be counter-balanced with the challenges posed by the specificity of the stimuli. To appreciate this
point, consider the customer service assessment mentioned above. Should the assessment provider
wish to deploy the assessment in a healthcare environment, the stimulus video would likely need to
be reproduced in a healthcare setting. In contrast, if the assessment were simply a written SJT, the
adaptation could be as easy as changing the word “customer” to “patient.” In general, the higher
the realism provided in the assessment stimuli, the less generalizable that assessment is likely to be
across the environments where it could be deployed.
We offer one final note of caution regarding computer-driven high-fidelity assessments.
Psychologists have long been seduced by the idea that microbehaviors are closely related to
brain function and therefore can reveal important truths about individuals. With automation,
this reductionism is fed by various new microbehaviors that may be collected in the course of
computer-delivered assessments. Mouse-over or “hover-time” measures, click patterns, page-
view counts, and similar metrics are possible to extract from some assessment delivery vehi-
cles. To be sure, adequate research has yet to be conducted on the potential value that may be
hidden in these data piles; our prediction is that these endeavors are more likely to be a red
herring than a guidepost on the path toward understanding and predicting important workplace
behavior.

DEMOGRAPHIC CONSIDERATIONS
Computer ownership and usage are not ubiquitous. Although access rates to computers and the
Internet have risen dramatically since the mid-1990s (National Telecommunications and Information
Administration [NTIA], 2002; Pew Research Center, 2007), there remains a large enough shortfall
in certain portions of the population that employers need to consider the impact of adopting auto-
mated assessment processes on their resultant applicant pools. There are two reasons why employ-
ers should be concerned about the accessibility of technology. First, limited access to the Internet
may discourage some applicants from applying. Second, automation could negatively affect the
attitudes of some applicants toward the hiring process and the company.
In many cases, requiring applicants to use the Internet as an entry into the hiring process will
have a negligible effect on the composition of the applicant pool; however, the degree to which this
projection is true likely depends on a range of factors such as the level of the job, the location and
type of work involved, and the characteristics of the available qualified job seekers in the market.
The extent of the impact should be considered as recruitment processes and automated tools for
screening and assessing applicants are implemented. For example, data from the NTIA and the Pew
Research Center show that Internet usage at home has increased rapidly for all segments of society
in the past few years. Furthermore, Internet usage patterns also show that job search is a com-
mon activity for demographic groups who use the Internet outside of their homes (NTIA, 2002).
However, it is also important to note that some groups (such as those over age 65 and those with
less than a high-school education) have lower Internet usage rates (i.e., below 50% as of 2007; Pew
Research Center, 2007).
When evaluating the potential impact of Internet access rates on applicant pool demographics,
employers should collect evidence on the use of the Internet for job search by different demographic
groups. Carefully monitored demographics can reveal if some aspects of the applicant population
are not accessing the selection process, suggesting that parallel application processes should be
pursued. However, given the trend toward wider Internet access, increasing computer literacy, and
strong competition for labor, organizations will risk losing qualified candidates if an Internet-based
recruitment strategy is not pursued to some extent.
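The sketch below illustrates one simple way to monitor this: compare each group's share of online applicants against its share of the relevant labor market and flag large shortfalls. The group labels, counts, benchmarks, and the 0.80 flagging threshold are all hypothetical illustrations, not recommended standards.

```python
# Minimal sketch of monitoring whether an Internet-only application process is
# reaching all segments of the intended applicant population. Group labels and
# counts are hypothetical; benchmark shares would come from the relevant labor
# market (e.g., census or availability data), not from this illustration.
labor_market_share = {"group_a": 0.60, "group_b": 0.25, "group_c": 0.15}
online_applicants  = {"group_a": 540,  "group_b": 180,  "group_c": 45}

total = sum(online_applicants.values())
for group, share in labor_market_share.items():
    observed = online_applicants[group] / total
    ratio = observed / share
    # The 0.80 cutoff loosely echoes the four-fifths convention but is illustrative only.
    flag = "  <- consider a parallel (non-Internet) application channel" if ratio < 0.80 else ""
    print(f"{group}: expected {share:.0%}, observed {observed:.0%}, ratio {ratio:.2f}{flag}")
```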

REACTIONS TO TECHNOLOGY-BASED SELECTION METHODS


Because most large organizations have now shifted toward Internet-based recruitment and hiring
processes—estimates were as high as 90% as early as 2001 (Cappelli, 2001)—the reactions of
applicants to these changes are important to understand. Although applicant reactions to selection
processes are discussed in detail in another chapter in this volume, reactions to technology-based
processes are briefly considered here.
The issue of how applicants perceive a company’s selection processes can have strong organiza-
tional potency—applicants who dislike a selection process may be less likely to continue and selec-
tion procedure validity may be impacted, among other potentially negative consequences (Gilliland
& Cherry, 2000). Diminishing applicant commitment may have a heightened impact in Internet-
based recruiting and selection processes, where the decision to abort a selection process and initiate
one with a competitor may be only a few clicks away.
Of course, strongly negative applicant reactions may have even more severe consequences.
Negative reactions toward various aspects of online selection systems have been shown to affect
perceptions of the organization, especially among those having less Internet experience (Sinar,
Reynolds, & Paquet, 2003). Such reactions toward the organization could potentially increase the
propensity to file complaints by those who are screened out by a selection procedure. The impact
of these effects may be magnified if negative reactions are more predominant for individuals
belonging to legally protected demographic groups. Internet-based selection techniques may be
especially susceptible to negative reactions if some applicant groups do not have easy Internet
access.
Generally, however, research has not shown large differences in applicant reactions to selection
procedures because of race (Hausknecht, Day, & Thomas, 2004), although under some conditions
these differences may emerge more prominently (Ryan & Ployhart, 2000). When group differ-
ences in reactions toward technology-based selection procedures have been examined directly, few
meaningful differences have been found for racial groups, whereas age group differences were
potentially more of a concern (Reynolds & Lin, 2003).
In sum, the available research suggests that some of the more negative assumptions regarding
reactions to technology have not been realized. In fact, traditionally disadvantaged groups may
perceive some benefit to the objectivity provided by an automated system. Compared to traditional
processes that may have required a visit to the employer to complete an application, perhaps finding
access to the Internet has proven to be less of a challenge. More work in this area will be required
to better understand the factors that affect applicant perceptions to technology-based selection tools
and their potential to influence broader attitudes and behavior during the application and socializa-
tion process.

LEGAL CONSIDERATIONS
Despite the “wild west” attitude inspired by many software vendors who have reshaped recruitment
and selection processes, all of the usual employment laws apply to these new techniques. Less well
known are the risks some of the newer procedures may carry for employers because traditional con-
cepts related to the formal submission of a job application are reshaped by new technologies. Two
areas of regulation have focused most directly upon Internet-based hiring: the U.S. federal rules
regarding recordkeeping for Internet applicants and European data protection regulations.

Internet Applicant Defined


As employee selection processes were increasingly deployed via the Internet in the late 1990s,
the trend raised questions for employers as to how to best track and record job seeker interest in
specific positions as required by U.S. federal regulations. For example, does each job seeker who
submits an electronic resume need to be considered as an applicant for recordkeeping purposes?
In 2005, the Office of Federal Contract Compliance Programs (OFCCP) issued guidance on this issue
(U.S. Department of Labor, 2005). This notice served to officially connect the rules and logic of the
Uniform Guidelines to Internet-based employment processes, at least for companies that contract
with the U.S. government and thus fall under the purview of the OFCCP. Three critical issues are
addressed in the official rule: (a) the definition of an applicant within the context of an Internet
selection process, (b) the recordkeeping requirements when automated tools and search techniques
are used, and (c) a definition of basic qualifications.
According to the final rule, an individual is considered an “Internet applicant” when four criteria
are met.

1. The individual submits an expression of interest in employment through the Internet or related electronic data technologies.
2. The employer considers the individual for employment in a particular position.
3. The individual’s expression of interest indicates that he or she possesses the basic qualifica-
tions for the position.
4. The individual at no point in the selection process prior to receiving an offer of employ-
ment from the employer removes himself or herself from further consideration or other-
wise indicates that he or she is no longer interested in the position.

Regarding recordkeeping, employers are required to retain:

• All expressions of interest that are submitted through the Internet or related technologies
that the employer considers for a particular position, regardless of whether the individual
is declared an “Internet applicant”
• Internal resume databases, including items for each record such as the date each was
added, the positions for which each search of the database was made, and the criteria used
for each search
• External database search information, such as the positions for which each search was
made, the date of the search, the search criteria used, and records for each individual who
met the basic requirements for the position
• All tests, test results, and interview notes
• The gender, race, and ethnicity of each Internet applicant

The OFCCP also provided elaborate guidance on what constitutes a basic qualification. These are
qualifications that are either advertised (e.g., posted on a website) or established prior to considering
any expression of interest for a particular position (e.g., when searching an external resume data-
base). Furthermore, basic qualifications must meet three criteria: (a) they must not involve compari-
son of qualifications between applicants (e.g., 2 years of engineering experience is noncomparative;
the most experienced engineer in a pool of engineering applicants is comparative), (b) they must be
objective (i.e., not dependent upon judgment), and (c) they must be “relevant to performance of the
particular position and enable the [employer] to accomplish business-related goals.” By definition,
passing an employment test cannot be a basic qualification.
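The distinction between noncomparative and comparative criteria can be illustrated with a short sketch; the field names and the 2-year threshold below are hypothetical and simply echo the example given in the rule.

```python
# Minimal sketch of screening on a "basic qualification" as the rule frames it:
# the criterion is noncomparative (a fixed threshold, not a ranking against other
# applicants) and objective (no judgment call). Field names are hypothetical.
def meets_basic_qualifications(applicant):
    """Noncomparative, objective screen: at least 2 years of engineering experience."""
    return applicant.get("years_engineering_experience", 0) >= 2

def most_experienced(applicants):
    """Comparative criterion (ranking applicants against one another); under the
    rule, this would NOT qualify as a basic qualification."""
    return max(applicants, key=lambda a: a.get("years_engineering_experience", 0))

pool = [{"id": 1, "years_engineering_experience": 3},
        {"id": 2, "years_engineering_experience": 1}]
print([a["id"] for a in pool if meets_basic_qualifications(a)])  # [1]
```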
In 2004, an interagency task force had proposed a separate set of guidance, consisting of
five additional Questions and Answers regarding the Uniform Guidelines on Employee Selection
Procedures (Equal Employment Opportunity Commission, U.S. Department of Labor, U.S.
Department of Justice, and Office of Personnel Management, 2004). These questions and answers
clearly linked Internet-based selection processes to the rules set forth in the Uniform Guidelines.
They also defined Internet applicants, but their definition differs from the OFCCP version. These
questions and answers have not been issued in final form, and given evident differences with the
OFCCP’s finalized rules, they may never be published in final form.

Data Privacy and Protection


As the use of the Internet for recruitment and selection extends the reach of these processes to
a worldwide audience, organizations must be aware of and compliant with the various inter-
national rules that apply to the transfer of individual data across political borders. Of primary
importance are the European data protection regulations. For example, online tools that allow
job seekers in Europe to apply for a job using systems that are hosted on computers located in
the United States are collecting personal data and transferring these data across borders. By
engaging in this activity, an organization could potentially be in violation of the domestic laws
of the European Union (EU) Member States. These laws were implemented as a result of the EU
Directive on Data Protection, which aims to prohibit the free flow of personal information from
EU nations to countries that have been deemed to have inadequate privacy protection, such as
the United States.
The Directive, which went into effect in 1998, underscores the differences between the cultures
and legal systems of Europe and the United States. In Europe, privacy protection is viewed
as a personal right. To protect this right, the various EU Member States have over the past several
decades enacted an aggregation of legislation administered through government data protection
agencies. The Directive’s primary intended purpose is to set minimum privacy protection standards
for each of the EU Member States and make it easier to transfer personal data within the EU. In
the United States, by contrast, the protection of private information is viewed less uniformly, with
differing standards for varying circumstances; therefore, privacy protection in the United States is
guided more by limited legislation and regulation, and by self-regulation.
Organizations that seek to deploy Internet-based HR systems that involve international data
transfers have several compliance options. Two approaches are most common. First, organizations
can seek permission from data providers (i.e., job seekers) regarding how and where their data will
be processed. Second, U.S.-based organizations may join a Department of Commerce program that
certifies them as a safe harbor for personal data. This certification states the organization’s willing-
ness to adhere to seven Safe Harbor Privacy Principles that the Commerce Department negotiated
with the EU. This program, and the EU laws to which it relates, are described at http://www.export.
gov/safeharbor. The seven principles are as follows:

1. Notice: Individuals must be informed, as early as possible and in unambiguous language, about the organization's reasons for collecting and using their personal information.
2. Choice: Individuals must be allowed to decide if and how their information is to be used
or disclosed to third parties beyond the purpose originally specified and authorized by the
organization collecting the information.
3. Onward transfer: Personal information may only be transferred to a third party under
the Notice and Choice conditions specified above. One organization can transfer data to
another without participant assent only if the third-party organization is also qualified as a
safe harbor or otherwise satisfies the requirements of the Directive.
4. Access: Within logistical reason, individuals must have access to and be able to correct,
add to, or delete their personal information where it is deemed inaccurate.
5. Security: Data must be reasonably protected from loss, misuse, unauthorized access, and
disclosure.
6. Data integrity: Personal information must be relevant, reliable, accurate, current, complete,
and used only for the purpose for which it was collected and authorized by the individual.
7. Enforcement: Organizations must provide mechanisms for complaints and recourse, pro-
cedures for verifying adherence to the safe harbor principles and obligations to remedy
problems.

In addition to the EU rules, a growing number of U.S. laws deal with privacy considerations. For
example, California laws have also been enacted that require a privacy statement to be posted on
any site that collects personal information from California residents, and additional rules require
the disclosure of any breach of personal data security.
Several organizations have been established to help companies comply with the various aspects
of the expanding protections and rules regarding data privacy. Services such as TRUSTe will review
data privacy policies and procedures in light of the applicable rules and will act as arbiters to settle
disputes over data handling, thereby satisfying the seventh safe harbor principle of enforcement that
requires the presence of a formal resolution mechanism.
Clearly, the liabilities associated with these responsibilities need to be carefully examined with
respect to any HR processes, and online recruitment and selection processes are of particular con-
cern because of their broad reach to the public at large. Companies that use online recruitment and
selection processes should be aware of these privacy considerations and take steps to ensure their
online tools are compliant; otherwise, Internet-based systems that collect information broadly from
job seekers will raise risks associated with liability for data processing that is inconsistent with EU
and U.S. laws.

Case Law
As of this writing, few cases have moved through the courts that relate to the appropriate use
of technology in the hiring process; cases relating more generally to data privacy are more
prevalent (e.g., FTC v. Toysmart.com, 2000). However, more of these cases are sure to emerge
and provide further guidance regarding the risks and boundaries of technology-based selection
practices.

IMPLEMENTATION ISSUES
I-O psychologists must consider a range of issues when implementing technology-driven selection
systems. This section discusses the issues in terms of the development, administration, and support
of technology-based selection systems.

DEVELOPMENT
Designing and managing technology-based selection systems involves skills that are related to the
analytical and decision-making skills of an I-O psychologist, yet also requires skills that are not
central, such as business acumen and IT expertise. Development includes making a business case
for purchasing or building a solution, acting as an information architect for the new system, and
managing the transition from traditional to technology-based systems.

Making the Business Case


Justification for a technology system generally will rely upon reduction of labor costs and improved
efficiency associated with automation. One of the first considerations in the business case is whether
to buy or build. The choice affects the implementation time horizon, and features that are already available off the shelf
can help in the justification. Now that the market for technology-based selection systems has begun
to mature and there is a choice of off-the-shelf solutions with a range of functions, organizations
should seek the assistance of vendors in making the business case for them; for instance, by sending
out a formal or informal request for information (RFI) prior to soliciting bids. When seeking bids,
the costs of customization should be included. Many factors can drive the need for customization,
including creating customized score reports, migrating tests or examinee data onto the vendor’s
platform, and setting up systems that mirror the company’s organizational and/or regional struc-
tures (Kehoe, Dickter, Russell, & Sacco, 2005). Although many organizations may see benefits
to customization, it should also be recognized that customization has many drawbacks, usually
in higher upfront costs and ongoing maintenance fees, because the resulting software is divergent
from the provider’s standard platform. For these reasons, configuration of available options within a
system is usually preferable to customization of software. Table 8.2 provides general guidance about
the categories of costs associated with technology-based selection systems.

Determining the Features


I-O psychologists implementing selection technology within organizations should keep in mind
three levels of users with a stake in the day-to-day operation of the system: the candidate, HR man-
ager or administrator, and the manager or supervisor receiving the candidates (Gilliland & Cherry,
2000). Flowcharts should be developed to map the current and desired processes and features and
to understand them from each type of stakeholder’s point of view (Kehoe et al., 2005).
The solution’s functionality will need to be scalable and flexible. For example, it should be pos-
sible to implement the standardized solution without new development or workarounds to accom-
modate different departments in the organization (scalability). The solution must also be adaptable
to meet future requirements and technology upgrades. Kehoe et al. (2005) discussed questions the
I-O psychologist should ask when developing a technology-based selection system. These include
how the administrative rights to it will be managed, how candidates will gain access to tests, how
test security will be assured, whether the system can be configured to apply the organization’s
test policies (such as retests or disability accommodation), and how test results will be stored and
communicated. The organization’s available hardware/software and IT infrastructure are also key
considerations. The selection technology’s requirements (e.g., operating systems, browsers) must be
compatible with the organization’s special requirements (such as security protocols and firewalls).
Whether the solution is created in-house or procured from a vendor, the IT department must assist
with its implementation.

TABLE 8.2
Costs of Implementing Technology-Based Selection

Source        Examples
Development   • Software development and/or customization
              • Technology integration (with applicant tracking systems, enterprise resource planning, etc.)
              • Equivalency studies (paper/computer/Internet comparisons)
              • Hardware (as applicable); for example, fax machines or scanners for scoring paper tests electronically
Deployment    • Field testing/quality assurance
              • System hosting fees
              • Installation costs if locally hosted
              • Account setup (process flows, permissions, reporting requirements)
              • Training
Maintenance   • Tracking and reporting
              • Upgrades
              • Security monitoring
              • Backups and failure recovery

Managing the Transition


Implementing a technology solution can be a complex project, from the design of the new selec-
tion process itself (as described in other chapters in this volume) to managing the technology. With
regard to the latter, we emphasize the inherent challenges in developing a functioning system and
in migrating to a computerized or other technology-driven process. These projects involve software
development and database administration, two skills that are not standard in I-O training.
The plan for implementation should include detailed specifications of functionality, whether for
building the system or understanding how an off-the-shelf solution will work with the organization’s
hardware and software. The new system also must accommodate any legacy processes and data.
The more complex the organization’s technology infrastructure, the more fine-grained the details
should be about software functionality. Failing to specify all of the software requirements ahead of
time will delay or derail the project. If paper-based tests are being used, test equivalence must be
addressed (particularly speeded cognitive tests as discussed earlier).
Because of the high-stakes nature of selection and the precision required for selection systems, a
rigorous quality assurance process plan is essential. The system should be beta-tested to make sure
the software functionality is intact and that it is in line with user expectations.
The selection software also must be able to integrate with other systems the organization may be
using—whether other selection systems or related systems (e.g., applicant tracking and career man-
agement tools). In anticipation of the need for different systems to communicate, organizations such as
the HR-XML Consortium (http://www.hr-xml.org/hr-xml/wms/hr-xml-1-org/index.php?language=2),
the Object Management Group (http://www.omg.org), and the IMS Global Learning Consortium
(http://www.imsglobal.org) have advocated for XML, a type of language for storing and organizing
information, as the standardized format for passing data between systems (Weiss, 2001). These groups
have provided industry-standard terms for organizing and storing data so that they can be transferred
more easily between systems. For instance, by using or selecting vendors who use HR-XML, organi-
zations will have choices in HR solutions without becoming locked into a single platform.
XML, or extensible markup language, is a standard language for describing data. Unlike the
more familiar hypertext markup language (HTML), XML allows users to define their own “tags,”
or ways of describing the data contained in a software system (Weiss, 2001). The tags facilitate the
transfer of data from one system to another, providing the information the receiving system will
need to process the information. However, not all systems currently use XML, and there may be
differences between systems in data definition that the manager of the integration project should
become aware of to have a successful implementation.
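As a simple illustration of the underlying idea, the sketch below builds and parses a small XML payload of assessment results using Python's standard library. The tag names are hypothetical; a real integration would follow the schema agreed upon with the vendor or the relevant HR-XML specification rather than these ad hoc tags.

```python
# Minimal sketch of exchanging assessment results as XML between systems using
# Python's standard library. The tag names here are hypothetical illustrations,
# not the actual HR-XML schema.
import xml.etree.ElementTree as ET

def build_result_xml(candidate_id, assessment_id, score, completed_on):
    """Serialize one assessment result for transfer to another system."""
    root = ET.Element("AssessmentResult")
    ET.SubElement(root, "CandidateId").text = candidate_id
    ET.SubElement(root, "AssessmentId").text = assessment_id
    ET.SubElement(root, "Score").text = str(score)
    ET.SubElement(root, "CompletedOn").text = completed_on
    return ET.tostring(root, encoding="unicode")

def parse_result_xml(payload):
    """Map the incoming tags onto the receiving system's field names."""
    root = ET.fromstring(payload)
    return {"candidate_id": root.findtext("CandidateId"),
            "assessment_id": root.findtext("AssessmentId"),
            "score": float(root.findtext("Score")),
            "completed_on": root.findtext("CompletedOn")}

payload = build_result_xml("C-1047", "COG-BASIC-01", 42.5, "2009-03-15")
print(parse_result_xml(payload))
```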

ADMINISTRATION
When administering the system, the organization must pay special attention to the ways in which
information is accessed, processed, and stored.

Access
Different types of users will have varying levels and methods of access to the selection tools and
systems. The organization must have policies detailing when, or if, test information is available to a
user (i.e., applicants, HR staff, and hiring managers). For instance, external applicants might be able
to take tests only after a screening and approval process categorizes them as viable candidates (e.g.,
application blank, phone screen). The system might provide internal applicants, but not externals,
with a feedback report about their results. HR and other administrative staff might have different
levels of access, from an administrator with basic privileges (access to deliver tests and see only
basic results), to an HR manager with purchasing authority (buy tests, view applicant inventory, see
detailed results), to analysts and I-O psychologists (run database queries of raw data, conduct item
analyses). Hiring managers might have another type of access, allowing them to see only those
candidates who are eligible for interviews.
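A minimal sketch of such role-based access rules is shown below; the roles and permissions mirror the examples just given but are hypothetical, and a production system would enforce these checks server-side against authenticated accounts.

```python
# Minimal sketch of role-based access rules for a selection system. Roles and
# permissions are hypothetical illustrations of the access levels described above.
ROLE_PERMISSIONS = {
    "external_applicant": {"take_test"},
    "internal_applicant": {"take_test", "view_feedback_report"},
    "test_administrator": {"deliver_test", "view_basic_results"},
    "hr_manager":         {"deliver_test", "view_basic_results", "purchase_tests",
                           "view_detailed_results", "view_applicant_inventory"},
    "io_psychologist":    {"run_raw_data_queries", "conduct_item_analyses",
                           "view_detailed_results"},
    "hiring_manager":     {"view_interview_eligible_candidates"},
}

def can(role, action):
    """Return True if the given role is permitted to perform the action."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(can("hiring_manager", "view_detailed_results"))  # False
print(can("hr_manager", "view_detailed_results"))      # True
```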

Automation
Organizations must decide to what extent the information will be processed by rules and automa-
tion, and to what extent HR experts will be involved to make judgments and carry out transactions.
The system can enable users to do work that formerly had been conducted by others, such as work-
force and adverse impact analyses that business managers and HR generalists could conduct them-
selves rather than requesting from an I-O psychologist. Having this capability for self-service brings
into question the role of the I-O psychologist in the system and the extent to which expertise should
be offloaded to automation. On the one hand, when automation and user training are effectively
combined, the psychologist can be freed from routine, transactional work to become more strategic.
On the other hand, there is a risk of misuse of the system if results are interpreted incorrectly or if
sensitive data fall into the wrong hands.

Data Storage
The method of storage partly determines the administration possibilities. For instance, a data-
base structure that permits data integration across systems, whether through XML technology as
described above or through other means, opens the possibilities for different types of selection tools
and other HR data to be part of a larger transactional database. Thus, test results could be used for
succession planning, individual development plans, etc.

SUPPORT
The selection system will not be a static entity. As technology progresses, the organizational struc-
ture changes, and HR processes are upgraded, there will be a need for consulting and dedicated
staff time in the form of technical support, ongoing maintenance, and user training. When provided
by a vendor, these services should be included in a service-level agreement as part of the contract
for the vendor’s services.

Technical Support
Expert support for the technology-based selection system includes services to each major group
of stakeholders (i.e., candidates, HR staff, and business managers). As with any selection system,
whether traditional or technology-based, candidates may need support to access information about
tests and eligibility for testing and may also require troubleshooting to access and take tests that are
delivered unproctored over the Internet. HR staff may encounter technical difficulties requiring IT
support. Business managers may need guidance from HR, including I-O psychologists, about the
use of the system and interpretation of results.

Maintenance
Ongoing maintenance, updates, and revisions (e.g., content and feature changes to hosted websites)
are another dynamic aspect of the selection system. Some routine downtime is expected for main-
tenance as the system is checked for ongoing quality assurance and upgrades. These activities must
be high priorities for the vendor and the organization’s internal IT department.

Staff Training
There should be multiple methods for training, including live sessions, websites with frequently
asked questions, and self-guided training modules. Retraining should be readily available as the
organization changes, the system advances, and staff turns over. Even with the best-designed inter-
face, failure to provide adequate staff training can derail the selection system through improper
administration of test procedures and policies. Further, as test data become increasingly integrated
into the broader HR strategy of attracting, developing, and retaining talent, the trend will be toward
more and more complex systems that require more training.
Importantly, the overarching trend that influences development, implementation, and support
is technology integration. Many technology providers are building market share by supporting a
broader array of HR functions across the “talent management lifecycle,” requiring the integra-
tion of data from recruitment and organizational entry through management and career progres-
sion. In the future, it may be rare for a technology-based selection system to be implemented and
administered in isolation from other systems. The facets of talent management will be integrated
into a single enterprise resource planning (ERP) database for use in making strategic decisions
about human capital. As mentioned earlier, there are several implications for I-O psychologists.
Researchers and practitioners will be able to obtain data that are harder to come by today (e.g.,
performance and return-on-investment data for use in validation studies, program evaluation, and
employee-organization linkage research). Psychologists will be able to earn a broader role in strat-
egy-setting and decision-making if they are able to provide an analytical, forward-thinking use of
the information. Most importantly, I-O psychologists will be able to broaden their own expertise,
as well as improve the field’s knowledge and credibility for research and practices that have true
organizational impact.

FUTURE DIRECTIONS
Looking ahead, the practice of employee selection will likely continue to experience rapid change
as a result of technology advancements. Many of the topics described in this chapter are in their
infancy; as long as technology ventures continue to be a potentially high-return (albeit high-risk)
investment, funding will continue to drive innovation into areas that will impact HR tools and pro-
cesses. On the basis of advancements that are already implemented in adjacent industries and trends
already in place, several areas of innovation for selection practices are likely.
Regarding assessment presentation formats, advancements in the construction and animation
of computer graphics will continue to provide assessment developers with new options for stimuli.
Importantly, the technology to construct graphical stimuli will become simpler to use and less
expensive to acquire, thereby allowing for broader application and experimentation. Tools are avail-
able now for constructing animated sequences that can be configured by end users. As these tools
are implemented and experimental data are collected, the potential operational and measurement
benefits will be better known. These foundational tools will continue to be funded from other indus-
tries, but applicants will likely see increased use of these features as a side benefit as these technolo-
gies develop further.
The push toward integrated media components will also continue. Technologies used at work will
be more likely to involve integrated interactive video and audio capability, and as these tools become
more commonplace, assessment delivery systems will also incorporate them to improve the fidelity
of stimulus presentation and to increase the realism of the experience. Many approaches to assess-
ment can benefit from these technologies, but assessment centers and other forms of work simulation
will likely be at the forefront because these techniques emphasize stimulus and response fidelity, and
these in turn create the need for process efficiencies that can be addressed by technology. Several
professional workshops have already been devoted to the application of these technologies in assessment
centers and simulations (e.g., Howard, Landy, & Reynolds, 2002; Rupp & Fritzsche, 2008).
Response fidelity will also increase as technology evolves. Currently high-fidelity response (e.g.,
speech or written text) is often scored by trained assessors in the employment context. High-volume
applications of assessments in education are already deploying handwriting and text recognition
tools and scoring algorithms that relate to human ratings of the subject matter (e.g., Landauer,
2006). These tools also should be expected to jump into employment testing and assessment as these
approaches are refined. Speech recognition tools will also be likely to improve to a point where they
can be more readily deployed for assessment purposes. The combination of more accurate speech-
to-text tools and the advancements of semantic recognition algorithms will allow for more complex
assessment to be deployed on a greater scale than we see today.
Technology advancement may also expand the set of variables we are able to measure and use
in research and practice. For example, some research has pointed to the relationship between timed
response latency and trait strength (Siem, 1991), suggesting a possible tool for detecting distortion
and deception in test responses.
In a broader context, the integration of technologies that support the full range of people pro-
cesses will create some new opportunities for the use of selection data. For example, as selection
tools are integrated into common data frameworks such as those held in learning and performance
management systems, the use of selection data to help shape the on-boarding and career develop-
ment process will be far more operationally viable than it is today. The opportunities to develop
data-leveraged processes to manage talent more effectively will multiply as the number of inte-
grated talent management components expands.
Of course, work itself will also continue to be affected by technology shifts. Organization struc-
tures and operational systems may shift from strict hierarchies toward networks, in part because
technologies allow for natural communication patterns and functional interdependencies to be
quickly established and supported through multifunctional shared workspaces. Through these tools,
global networks for accomplishing work can be established. These changes will need to be reflected
in the selection processes that are used to staff organizations that work in these ways.
This brief discussion touches on only a few of the likely sources for innovation in technology-
based selection, and many of these examples are already possible, but they are currently deployed in
limited and localized circumstances because of their complexity, cost, and primitive stage of devel-
opment. Future innovation will have a broad set of drivers and influences, including advancements
in technology standards that unify how people and data are represented to better enable integration
(e.g., HR-XML Consortium standards); case law and government regulations will have a broader
impact on technology application in areas that relate to fairness, privacy, and accessibility; and
venture investment will flow toward new technologies that have the potential for business impact.
These facilitators and potential barriers will interplay to shape the availability and distribution of
new approaches, providing a rich test bed for I-O psychologists who are prepared to participate.

As discussed earlier, I-O psychologists must collaborate closely with their IT staff and technology
providers outside of their organizations to meet their stakeholders’ needs, to stay educated about
advances that affect their work, and to obtain and hold a strategic role in their organization’s
decisions and policies. As technology advances, it will be important to continuously look for
opportunities for research and practice improvements, rather than being reactive to these trends,
responding only after other stakeholders have noted the opportunities and begun to demand new
systems and processes.
Getting to know the IT staff in your organization is one way to stay on top of technologi-
cal innovations; another is supplementing the SIOP conference with others that follow technology
in HR, such as the HR Executive HR Technology Conference. Other conferences that cover the
topic include the annual meetings of the International Test Commission and the Association of Test
Publishers. For reference material on trends and business applications written from a practitioner
perspective, I-O psychologists should consult IT-focused business articles in Human Resources
Executive, Harvard Business Review, and the MIT Sloan Management Review, for example. SIOP’s
volume on technology in HR (Gueutal & Stone, 2005) provided practical advice on a range of top-
ics. Readers who are interested in technologies that are currently available for use in organizations
should consult industry analysts and commentators, such as HR.com, Human Resource Executive
Online (HREonline.com), and Rocket-hire.com, in which new technology products are reviewed
frequently.
As described in the section on critical issues and elsewhere in this chapter, technology-based
testing generates questions about the constructs that are being measured and the extent to which
administration methods (such as unproctored testing) affect measurement. Because this literature
is practice-oriented and typically atheoretical, the results often pertain to best practices or research
methods rather than to psychological theory. In fact, it may be the lack of theory associated with
the development and implementation of technology that has held back research in this area. Many
of the issues raised by technology are fundamentally practical and thus may hold less appeal for
full-time researchers. The published literature on computerized and Internet testing is relatively
small. To generate further research interest, we have summarized many of the issues raised in this
chapter in Table 8.3.

TABLE 8.3
Summary: Areas for Future Research

Equivalence
• Research should better distinguish between computerized and Internet-based assessment to examine Internet-specific considerations (e.g., bandwidth, Internet connectivity issues, browser constraints).
• More research in the noncognitive domain: Can new technologies limit response distortion?
• Within-group, repeated-measures studies of technology-based tests and assessments.

Assessment Environment
• Examine differences under the most controversial usage conditions: high-stakes testing with clear pass/fail outcomes under various proctoring conditions.
• When conducting quasi-experimental field studies, research should classify the factors that influence scores so a taxonomy of influences and acceptable administration protocols can be developed.

New Technologies
• Does adaptive test delivery limit security risks for unproctored testing?
• How might video presentation or other technology mitigate adverse impact?
• Does animation (e.g., animated work scenarios, use of avatars) influence assessment responses and/or reactions? Are responses consistent with those generated by live role players?
• What assessment value is added by tracking “microbehaviors” such as hover-time and click patterns?
• Can technical advances such as noncognitive CAT approaches limit score inflation in applied settings?

Organizational Impact
• What combinations of technologies (e.g., selection systems, performance management systems) add incremental value to organizations?

REFERENCES
American Educational Research Association, American Psychological Association, and National Council on
Measurement in Education. (1999). Standards for educational and psychological testing. Washington,
DC: American Psychological Association.
Bartram, D., & Brown, A. (2004). Online testing: Mode of administration and the stability of OPQ 32i scores.
International Journal of Selection and Assessment, 12, 278–284.
Borman, W. C., Lentz, E. M., Schneider, R. J., Houston, J. S., Bearden, R., & Chen, H. T. (2008). Adaptive
personality scales as job performance predictors: Initial validation results. In M.S. Fetzer & S.E. Lambert
(Eds.), Computer adaptive testing (CAT) and personnel selection. Presented at the annual meeting of the
Society for Industrial and Organizational Psychology, San Francisco, CA.
Cappelli, P. (2001). Making the most of online recruiting. Harvard Business Review, 79, 139–146.
Dwight, S. A., & Feigelson, M. E. (2000). A quantitative review of the effect of computerized testing on the
measurement of social desirability. Educational and Psychological Measurement, 60, 340–360.
Embretson, S. E. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
England, G. W. (1971). Development and use of weighted application blanks. Dubuque, IA: Brown.
Equal Employment Opportunity Commission, U.S. Department of Labor, U.S. Department of Justice, and
Office of Personnel Management. (2004). Agency information collection activities: Adoption of addi-
tional questions and answers to clarify and provide a common interpretation of the Uniform Guidelines
on Employee Selection Procedures as they relate to the Internet and related technologies. Federal Register,
69(43), 10152–10158.
Federal Trade Commission v. Toysmart.com, LLC, 2000 WL 34016434 (D. Mass. July 21, 2000).
Fletcher, P. A. K. (2005). From personnel administration to business-driven human capital management. In
H. Gueutal, D. L. Stone, & E. Salas (Eds.), The brave new world of eHR: Human resources in the digital
age (pp. 54–103). New York, NY: Wiley & Sons.
George, C. E., Lankford, J. S., & Wilson, S. E. (1998). The effects of computerized versus paper-and-pencil
administration of measures of negative affect. Computers in Human Behavior, 8, 203–209.
Gibby, R. E., Biga, A. M., Pratt, A. K., & Irwin, J. L. (2008). Online and unsupervised adaptive cognitive ability
testing: Lessons learned. In R. E. Gibby & R. A. McCloy (Chairs), Benefits and challenges of online and
unsupervised adaptive testing. Symposium presented at the annual meeting of the Society for Industrial
and Organizational Psychology, San Francisco, CA.
Gilliland, S., & Cherry, B. (2000). Customers of selection processes. In J.F. Kehoe (Ed.), Managing selection
strategies in changing organizations (pp. 158–196). San Francisco: Jossey-Bass.
Gueutal, H. G., & Stone, D. L. (Eds.) (2005). The brave new world of eHR: Human resources in the digital age.
San Francisco, CA: Jossey-Bass.
Hausknecht, J. P., Day, D. V., & Thomas, S. C. (2004). Applicant reactions to selection procedures: An updated
model and meta-analysis. Personnel Psychology, 57, 639–683.
Howard, A., Landy, F. J., & Reynolds, D. H. (2002). Marrying assessment and technology: For better and
worse. Workshop conducted at the meeting of the Society for Industrial and Organizational Psychology,
Toronto, Canada.
Kehoe, J. F., Dickter, D. N., Russell, D. P., & Sacco, J. M. (2005). e-Selection. In H. Gueutal, D. L. Stone, &
E. Salas (Eds.), The brave new world of eHR: Human resources in the digital age (pp. 54–103). New
York, NY: Wiley & Sons.
Landauer, T. (2006). Reliability and validity of a new automated short answer scoring technology. Paper pre-
sented at the Annual Meeting of the Association of Test Publishers, Orlando, FL.
McCloy, R. A., Heggestad, E. D., & Reeve, C. L. (2005). A silk purse from the sow’s ear: Retrieving nor-
mative information from multidimensional forced-choice items. Organizational Research Methods, 8,
222–248.
Mead, A. D., & Drasgow, F. (1993). Effects of administration medium: A meta-analysis. Psychological Bulletin,
114, 449–458.
Meade, A. W., Michels, L. C., & Lautenschlager, G. J. (2007). Are Internet and paper-and-pencil personality tests
truly comparable? Organizational Research Methods, 10, 322–345.
Naglieri, J. A., Drasgow, F., Schmitt, M., Handler, L., Prifitera, A., Margolis, A., & Velasquez, R. (2004).
Psychological testing on the Internet: New problems, old issues. American Psychologist, 59, 150–162.
National Telecommunications and Information Administration. (2002). A nation online: How Americans are
expanding their use of the Internet. Washington, DC: U.S. Department of Commerce.
Outtz, J. (1998). Video-based situational testing: Pros and cons. Workshop presented at the annual meeting of
the Society for Industrial and Organizational Psychology. Dallas, TX.
Pew Research Center. (2007). Demographics of Internet users. Retrieved June 15, 2008, from http://www.
pewinternet.org/Data-Tools/Download-Data/Trend-Data.aspx
Ployhart, R. E., Weekley, J. A., Holtz, B. C., & Kemp, C. (2003). Web-based and paper-and-pencil testing of
applicants in a proctored setting: Are personality, biodata, and situational judgment tests comparable?
Personnel Psychology, 56, 733–752.
Potosky, D., & Bobko, P. (2004). Selection testing via the Internet: Practical considerations and exploratory
empirical findings. Personnel Psychology, 57, 1003–1034.
Reynolds, D. H. & Lin, L. (2003). An unfair platform? Subgroup reactions to Internet selection techniques. In
T. N. Bauer (Chair), Applicant reactions to high-tech recruitment and selection methods. Symposium
conducted at the meeting of the Society for Industrial and Organizational Psychology, Orlando, FL.
Richman, W. L., Kiesler, S., Weisband, S., & Drasgow, F. (1999). A meta-analytic study of social desirability
distortion in computer-administered questionnaires, traditional questionnaires, and interviews. Journal
of Applied Psychology, 84, 754–775.
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general item response theory model for unfolding
unidimensional polytomous responses. Applied Psychological Measurement, 24, 3–32.
Rupp, D. E., & Fritzsche, B. A. (2008). Using technology to enhance assessment and development programs.
Workshop conducted at the meeting of the Society for Industrial and Organizational Psychology, San
Francisco, CA.
Ryan, A. M., & Ployhart, R. E. (2000). Applicants’ perceptions of selection procedures and decisions: A critical
review and agenda for the future. Journal of Management, 26, 565–606.
Salgado, J. F., & Moscoso, S. (2003). Internet-based personality testing: Equivalence of measures and asses-
sees’ perceptions and reactions. International Journal of Selection and Assessment, 11, 194–203.
Siem, F. M. (1991). Predictive validity of response latencies from computer administered personality tests.
Paper presented at the 33rd Annual Conference of the Military Testing Association, San Antonio, TX.
Sinar, E. F., Reynolds, D. H., & Paquet, S. L. (2003). Nothing but ‘Net? Corporate image and web-based
testing. International Journal of Selection and Assessment, 11, 150–157.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of person-
nel selection procedures (4th ed.). Bowling Green, OH: Author.
Standish Group. (1995). The chaos report. West Yarmouth, MA: Author.
Stark, S., Chernyshenko, O. S., Drasgow, F., & Williams, B. A. (2006). Examining assumptions about item
responding in personality assessment: Should ideal point methods be considered for scale development
and scoring? Journal of Applied Psychology, 91, 25–39.
Tippins, N. T., Beaty, J., Drasgow, F., Gibson, W. M., Pearlman, K., Segall, D. O., & Shepherd, W. (2006).
Unproctored Internet testing in employment settings. Personnel Psychology, 59, 189–225.
U.S. Department of Labor. (2005). Obligation to solicit race and gender data for agency enforcement purposes:
Final rule. Federal Register, 70 (194), 58947–58961.
Weekley, J. A., & Jones, C. (1997). Video-based situational testing. Personnel Psychology, 50, 25–49.
Weiss, J. R. (2001). Six things you should know about XML. The Industrial Organizational Psychologist, 39,
30–34.
9 Strategy, Selection, and Sustained Competitive Advantage

Robert E. Ployhart and Jeff A. Weekley

This chapter is motivated by a simple question: Do professionally developed personnel selection
practices offer strategic value to the firm? Most industrial-organizational (I-O) psychologists would
answer this question with an enthusiastic “Yes!” The belief that hiring better people will result in
better job performance, which will in turn contribute to better functioning organizations, is imbued
early in most I-O psychologists. Utility analyses indicate that selection systems with high validity
will generate monetary returns far in excess of the costs associated with selection. How then, despite
nearly a century of research demonstrating the consequences of effective selection at the level of the
individual, does the question posed above remain a real concern amongst practitioners, consultants,
and academicians?
Consider what it means for human resources (HR) to offer strategic value to a firm. At a high
level, an HR practice will add strategic value to the extent it supports execution of the firm’s business
strategy. Stated differently, an HR practice such as selection must support a firm’s strategy and
uniquely enable it to compete against other firms. Personnel selection is focused on identifying
whether applicants have the necessary knowledge, skills, abilities, or other characteristics (KSAOs)
to contribute to effective individual performance on some criterion or criteria. However, demonstrating
the validity of a selection procedure is, by itself, insufficient to create sustainable competitive
advantage. That requires demonstrating firm- (or unit-) level consequences of a selection procedure
that cannot be easily replicated. The latter point is important because there
is growing recognition that HR practices are easily copied and consequently may not form a basis
for strategic value (Wright, Dunford, & Snell, 2001).
Although it is likely true that using a more valid selection system will improve the quality of a
firm’s workforce (all else being equal), that by itself does not make selection strategically valuable.
Because of the outsourcing of selection practices, many competitors can (and often do) use the
same vendor’s selection assessments. As a result, selection in such firms cannot contribute to their
sustained competitive advantage although they may use selection procedures that are predictive of
individual job performance. In this chapter, we do not question whether professionally developed
selection practices can add value to the firm. We believe that they often do. However, we doubt
whether this will always result in competitive advantage, and we offer some broad guidance about
the conditions under which it will and will not occur. We also discuss how a broader perspective on
selection can better articulate its value and thus perhaps increase the likelihood of effective selec-
tion practices being implemented and supported by top management. Demonstrating how effective
personnel selection practices contribute to firm performance only increases the likelihood of such
practices being implemented.


In the sections that follow, we first discuss the need for selection to take a broader perspective
and show consequences at the business unit level. We then review the dominant strategic HR per-
spectives on how HR practices and human capital resources are linked to a business unit’s strat-
egy. This is followed by a critical examination of whether personnel selection contributes to such
resources and sustained competitive advantage. We then discuss several ways through which selec-
tion may contribute to the unit’s ability to execute its strategy. We conclude by considering selection
in relation to other HR practices (e.g., training, compensation) for creating competitive advantage.
Most of the discussion to follow is based on integrating theoretical perspectives, with less emphasis
on empirical research, simply because such research has not yet been conducted. The hope is that
this chapter can serve as a springboard for empirical research into the effects of selection on strategy
execution and ultimately organizational performance.

WHY PERSONNEL SELECTION MUST SHOW BUSINESS-UNIT-LEVEL VALUE


The most basic requirement for an HR practice to provide strategic value is to demonstrate that the prac-
tice has noticeable effects on outcomes at the business-unit level. Business units are broadly defined as
those organizational entities that meaningfully describe unit functions or structures. Examples include
departments, stores, divisions, lines of business, and of course, entire firms. Most key organizational
decision-makers are held accountable for consequences that exist at these unit levels, and key orga-
nizational decisions are driven by the performance of these units. Therefore, business-unit-level
consequences are critical for demonstrating that the HR practice contributes to the firm’s success.
Several staffing scholars have argued that recruitment and selection research has failed to
sufficiently demonstrate business unit value because it is still focused only on individual level
outcomes (Ployhart, 2006; Saks, 2005; Taylor & Collins, 2001). It is safe to say that most personnel
selection research is limited to the individual level. There is an abundance of evidence that best
practices in selection will lead to the identification of KSAOs necessary for effective individual
performance. The use of sound selection procedures is based on the expectation that such practices
contribute to improved organizational effectiveness (Schneider, Smith, & Sipe, 2000). Although the
development of the selection system may be focused on improving individual job performance, the
ultimate belief is that hiring better employees will result in more effective firms.
Yet reviews of the selection literature indicate that most research has never examined this question
directly (Ployhart, 2004). Utility analysis is an estimate of the monetary value of selection, but it does
not test the effects directly (Schneider et al., 2000). Among other things, most conceptualizations
of utility analysis assume financial returns on staffing are linear, additive, and stable. Such
assumptions are questionable. This may be, in part, why managers have found utility estimates
unbelievable. For example, Latham and Whyte (Latham & Whyte, 1994; Whyte & Latham, 1997)
found that utility analysis reduced managerial support for implementing a valid selection procedure,
although the economic benefits of doing so were substantial and the logic and merits of utility
analysis as a decision-making tool were carefully described. Perhaps one reason managers have
not embraced utility analysis, in addition to not appreciating the mathematical proofs behind the
Brogden-Cronbach-Gleser and related models, is the extremely high valuations placed on the typical
selection intervention. Consequently, I-O psychology continues to struggle with demonstrating the
business-unit-level value of selection in a manner that managers find credible.
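For readers who want to see where these very high valuations come from, the standard Brogden-Cronbach-Gleser
estimate can be written as follows; the plugged-in numbers below are purely illustrative and are not drawn from
any particular study.

\[
\Delta U = N \cdot T \cdot r_{xy} \cdot SD_y \cdot \bar{z}_x - n \cdot C
\]

Here ΔU is the estimated dollar gain, N is the number of people hired, T is their expected tenure in years,
r_xy is the criterion-related validity of the predictor, SD_y is the dollar value of one standard deviation of
job performance, z̄_x is the mean standardized predictor score of those hired, n is the number of applicants
assessed, and C is the cost per applicant. With hypothetical values of N = 100 hires, T = 2 years, r_xy = .30,
SD_y = $20,000, z̄_x = 0.80, n = 200 applicants, and C = $100 per applicant, the estimate is
100 × 2 × .30 × $20,000 × 0.80 − $20,000 = $940,000, which illustrates how even a modest validity coefficient
produces the kinds of figures that managers often find difficult to believe.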
More recent research within the tradition of strategic human resource management (SHRM) finds
that firms using professionally developed selection practices perform better on financial, account-
ing, or market-based criteria than those that do not (e.g., Huselid, 1995). However, one important
limitation of most of these studies is that they rely on a manager (or small group of managers) to
self-report the nature of the selection practice for the entire firm (see Gerhart, 2005; Wright &
Haggerty, 2005). The often-cited paper by Terpstra and Rozell (1993), who showed that firms using
more valid predictors outperformed those that did not, even used a self-report measure of firm effec-
tiveness. It is interesting that as a profession we seem willing to place confidence in self-reports of
unit-level performance, when we place no such confidence in self-assessments of one’s own perfor-
mance! A second, more important limitation of this research is that it asks only generally whether
“a valid or systematic selection process is used.” Demonstrating that higher performing firms are
more likely to use systematic selection procedures does not establish causality and is a far cry from
showing that units with the greatest levels of talent perform the best. As DeNisi, Hitt, and Jackson
(2003, p. 12) questioned, “If hiring ‘better’ people results in higher productivity, how exactly does
the selection of individuals translate into improved organizational performance?”
This all leads to the rather uneasy conclusion that the existing literature says relatively little about
whether selection contributes to business unit effectiveness—and hence offers the possibility of
competitive advantage to the firm (Ployhart, 2006). We believe it is critical for selection researchers
to show such value for three reasons. First, to the extent selection is perceived as nonstrategic, many
managers and organizational decision-makers will continue to use suboptimal selection practices.
Second, an inability to demonstrate value may limit the rigorous I-O approach to selection primar-
ily to entry-level hires (or “bottom of the pyramid hires”), where the focus may be on efficiency
and cost-effectiveness rather than added value. Third, the profession of I-O psychology may not
achieve the degree of reputation and visibility it deserves by failing to demonstrate how one of its
core practices, selection, adds value to firms. We do not mean to overstate our points here because
clearly many firms do not suffer from these issues. But it is also apparent that many firms do not
use selection practices as I-O psychologists would advocate (Anderson, 2005; Rynes, Brown, &
Colbert, 2002). We believe part of the reason stems from the challenge associated with showing selec-
tion’s unit-level impact (Ployhart, 2006).

SHRM
The field of SHRM has grown rapidly since the late 1980s. Most traditional HR, and especially
I-O, research focuses on predicting, explaining, or controlling individual behavior. As noted above,
selection in particular has treated individual job performance (and related criteria like turnover) as
“the ultimate criterion.” In contrast, the criteria of interest in SHRM scholarship are at the unit level,
and most typically that of the firm (Wright & Boswell, 2002). SHRM scholarship tends to focus on
between-firm (or business unit) differences in HR practices that help explain between-firm (or busi-
ness unit) differences in performance. The typical SHRM study involves an examination of how
business units that use different HR practices perform differently (Becker & Huselid, 2006). Thus,
unlike traditional HR research, which focuses on individual differences in outcomes, SHRM research
focuses on unit differences in outcomes. The other part to the story is that SHRM researchers tend to
focus on predicting financial, market, or accounting criteria (Gerhart, 2005). Hence, SHRM scholar-
ship has attracted a great deal of attention from the business community precisely because it shows
that HR practices can improve the organizational metrics most important to stakeholders.
Broadly speaking, there are three dominant perspectives on how to best use HR practices
(Delery & Doty, 1996). The universalistic perspective suggests that use of certain HR practices will
always be useful and relevant. Note that such a belief suggests using the appropriate practices will
always improve a firm’s effectiveness, irrespective of changes in the economy, the firm’s strategy, or
its competition. Colbert (2004) argued:

Research under this perspective has been useful in identifying discrete HR practices that are universally
sensible, but it has not contributed much to HRM in the strategic sense, if we take strategic to mean
practices that differentiate the firm in its industry and that lead to sustainable competitive advantage.
(p. 344).

Yet this universalistic approach is precisely the approach taken by many selection scholars (e.g.,
cognitive ability is always a good predictor of job performance). The contingency perspective
suggests that HR practices will be useful and relevant only when they match with each other and
the firm’s strategy. The contingency perspective is more directly linked to adding value for the firm
because it recognizes that HR practices must support the firm’s strategy and be internally consistent
with other practices. Attempts have been made to link HR strategies to generic business strategies
(e.g., Porter’s cost leadership, product differentiation, and focus; Schuler, Galante, & Jackson, 1987).
Although the idea that the “appropriate” set of HR practices can be deduced from a general business
strategy has obvious appeal, the approach has been widely criticized (Chadwick & Cappelli, 1999).
The configural perspective builds from the contingency approach to further recognize synergies
that exist in patterns of practices that fit with particular strategies. This approach takes the most
holistic view of HR management and suggests specific HR practices cannot be understood (or their
effects decomposed) in isolation from other practices and the firm’s unique strategy. Rather, there
appear to be bundles of HR practices that must be used in combination to drive strategic value.
These are most often called high-performance work systems and involve combinations of practices
that include systematic staffing, training, compensation, and related practices. Firms that use these
high performance work systems outperform those that do not (Huselid, 1995).
Although there is now fairly compelling evidence that use of HR practices and high-performance
work systems is related to firm value (Combs, Liu, Hall, & Ketchen, 2006; although see Wright,
Gardner, Moynihan, & Allen, 2005), it should be recognized that most SHRM research only exam-
ines the link between unit HR practices and effectiveness. Intervening explanatory processes, such
as how the HR practice influences the cognitions, affect, and behavior of individuals, are rarely
considered (Gerhart, 2005; Becker & Huselid, 2006; Wright & Haggerty, 2005). These unexamined
intervening processes have been referred to as SHRM’s “black box.” It is within this black box that
I-O psychology in general, and selection specialists in particular, are uniquely situated to demon-
strate value. First, however, it is important to understand the dominant theoretical perspectives that
are invoked to explain why HR practices contribute to firm performance.

RESOURCE-BASED VIEW OF THE FIRM


Wright, Dunford, and Snell (2001) noted that the dominant theoretical perspective adopted by SHRM
scholars has been the resource-based view of the firm (RBV), as articulated by Barney (1991). What
makes the RBV important among strategy theories is its emphasis on a firm’s internal resources.
Internal resources may represent human capital, top management expertise, financial capital, coor-
dination processes, and related factors. Importantly, the RBV argues that there is heterogeneity in
firm-level resources that contributes to some firms having a competitive advantage over other firms.
Further, the RBV makes clear predictions about the characteristics of resources that have the poten-
tial to underlie sustained competitive advantage.
First, valuable resources are those linked to the firm’s strategy and allow it to perform better
than competitors. For example, having highly qualified employees could be a valuable resource if
they resulted in firm-level competencies that manifested themselves in firm-level outcomes. Second,
rare resources are more likely to result in an organizational competitive advantage because there is
an insufficient quantity in the market. By definition, the most talented people will be rare (think of
a normal distribution), so firms that better attract and retain the best talent should benefit directly
(they have the rare talent) and indirectly (by keeping it out of the competition). Together, valuable
and rare resources create opportunities for a firm to achieve competitive advantage. However, what
firms need to be more concerned with is sustainable competitive advantage. A competitive advan-
tage that is not sustainable leads only to conditions of temporary superiority followed typically by
parity with other firms. Two additional conditions must be met for a competitive advantage to be
sustainable.
Inimitable resources are those that competitors cannot readily duplicate. For example, if one firm
retains high-quality talent better than its competitors, then it has created a resource that is inimitable.
Social complexity, time compression diseconomies, and causal ambiguity contribute to inimitability
(Barney & Wright, 1998). Social complexity refers to resources that only exist among aggregate
collectives of people. For example, in many organizations knowledge is shared informally through
social networks, rather than through more formal organizational structures and processes. As such,
it is quite difficult to replicate the knowledge and knowledge sharing process in other organizations.
A highly effective climate is likewise socially complex because it exists in the shared perceptions
of employees. Southwest Airlines has fended off multiple imitators (remember Continental Lite or
the United Shuttle?). Although these imitators could replicate the business model (e.g., point-to-
point service, single type of aircraft, minimal in-flight service, etc.), they could not duplicate key
elements of the organization’s culture. Time compression diseconomies represent the notion that
time is often not something that can be compressed with equal effectiveness (Dierickx & Cool,
1989). For example, firms that have strong brand equity have an advantage that competitors cannot
easily copy because it takes a long time to generate brand recognition and brand loyalty. Causal
ambiguity describes resources that are linked to effective firm performance, but the specific reasons
or paths through which they contribute are not obvious. For example, it may not be apparent which
specific HR practices or combinations of practices contribute to building a more effective workforce.
Because creation of these resources is not easily understood, it is hard for competitors to copy them
with equal effectiveness.
In the RBV, the final condition for creating sustainable competitive advantage is that the resources
be nonsubstitutable. Nonsubstitutable resources are those that are both necessary and sufficient for
effective firm performance. For example, ATMs are an effective technological substitution for most
bank teller transactions, making bank tellers’ talent a substitutable resource. However, suppose that
bank tellers also provide financial advice that adds value and increases the customer base—in this
example bank tellers’ talent would be nonsubstitutable. Thus, only resources that are valuable, rare,
inimitable, and nonsubstitutable can create sustained competitive advantage. Selection practices
may or may not meet these conditions.

HUMAN AND SOCIAL CAPITAL THEORIES


Human capital theory (Becker, 1964) is a broad theory originating in economics (in fact, Becker
won the 1992 Nobel Prize for his work on human capital theory). As it relates to HR, human capital
theory argues that the acquisition and retention of firm-specific knowledge is an important determi-
nant of firm effectiveness (Strober, 1990). Generic knowledge is not particularly valuable because
it can be applied equally to any firm. However, as employee knowledge becomes firm-specific, it
generates potential for improving the firm because of increased knowledge of processes, opera-
tions, products, customers, and coworkers. It could be argued that firm-specific knowledge is also
more valuable, rare, inimitable, and possibly nonsubstitutable. Modern extensions to human capital
theory include the knowledge-based view of the firm (Grant, 1996). Note that such perspectives on
knowledge are not merely job knowledge, but include knowledge of the organization’s customers,
products, services, structure, processes, culture, and related factors. Thus, human capital represents
the business-unit-level aggregate of employee KSAOs.
Whereas human capital theory usually emphasizes aggregate employee education, experience,
or knowledge, social capital theory emphasizes the interpersonal relationships and networks that
exist among employees, units, and organizations (Nahapiet & Ghoshal, 1998). Given that modern
work is increasingly knowledge- and team-based, social networks are a critical means for sharing
and creating knowledge (Oldham, 2003). These social networks can generate effects on unit crite-
ria unrecognized at the individual level. For example, Shaw, Duffy, Johnson, and Lockhart (2005)
demonstrated that using unit-level turnover rates (percent of people who quit) as a predictor of unit
performance underpredicted the true costs of turnover. Furthermore, they showed that the negative
effects of turnover were greater when those who left the firm were more centrally located in the
employees’ social networks. Simply put, losing a “more connected” person is more damaging than
the departure of someone on the periphery. Thus, social capital represents the business-unit-level
aggregate of employee social networks, relationships, and structures (Nahapiet & Ghoshal, 1998).

ALIGNMENT OF SELECTION AND STRATEGY


From the preceding discussion it is apparent that selection practices, by themselves, may not
contribute to sustained competitive advantage and hence demonstrate firm-level value. Figure 9.1
is an attempt to illustrate and integrate the various SHRM concepts noted above with personnel
selection. This figure is based on a multilevel, strategic model of staffing presented by Ployhart
(2006). However, it makes more careful consideration of the types of firm-level resources likely to
be important for sustained competitive advantage.
In Figure 9.1, notice that a business unit’s HR practices in general, and selection practices in
particular, will have a direct impact on the individual-level KSAOs attracted, selected, and retained
(Bowen & Ostroff, 2004; Schneider, 1987). This is as one would expect; use of more valid assess-
ments should better isolate the desired KSAOs (Guion, 1998). It is also as expected that these KSAOs
have a direct relationship with individual job performance (denoted by the dashed lines in Figure 9.1).

[Figure 9.1 is a two-level diagram. At the selection (individual) level, the business unit’s human
resource practices shape the individual differences (KSAOs) acquired and retained, which predict
individual-level job performance; through human capital emergence, these KSAOs combine into
unit-level human and social capital. At the strategic (macro) level, the business unit’s strategy and
its mix of strategic resources (valuable, rare, inimitable, nonsubstitutable) determine whether
unit-level human and social capital produces unit-level sustained competitive advantage.]

FIGURE 9.1 Conceptual model linking personnel selection with business unit strategy. Notice that the extent
to which unit-level resources are valuable, rare, inimitable, and nonsubstitutable will determine the strength
of the relationship between unit-level human and social capital and sustained competitive advantage. However,
note that the contingent relationships found at the strategic level do not moderate individual-level predictor
validity. The dashed line represents the primary relationship examined in personnel selection research. Only
the bold box represents potentially strategic resources. (Adapted from Ployhart, R. E., Journal of Management,
32, 868–897, 2006.)

Here is where the usual selection story ends, with perhaps the additional effort in estimating utility.
However, considering our review of the SHRM literature earlier, it is apparent that effective selec-
tion practices may or may not support the business unit’s strategy and add value.
One of the main reasons for this ambivalence is that selection practices can be copied and will
not, by themselves, form the basis for sustained competitive advantage. Such practices may lead to
short-lived advantage or contribute to maintaining parity with competing units. Instead, unit-level
competencies (i.e., collective human and social capital; the bold box in Figure 9.1) form the basis
for sustained competitive advantage (Prahalad & Hamel, 1990; Barney & Wright, 1998). Consider
the organization selecting frontline employees on the basis of conscientiousness, agreeableness, and
emotional stability. By themselves, these individual-level KSAOs can be, and are, screened for
by many companies. However, when combined with other properly aligned HR practices, a strong
culture, and other management practices, selecting on these individual-level KSAOs can lead to the
emergence of a unit-level competency such as “aggregate customer service.” Thus, unit-level human
and social capital competencies are the source of sustained competitive advantage; they represent
intangible assets and add value to the firm but are created through application of HR practices like
personnel selection. However, the following paragraphs discuss certain conditions that must be met
before unit-level competencies will create sustainable unit-level differences in performance.
First, validity at the individual-level generalizes across like jobs, as is consistent with most
selection research (note there is no moderator of the dashed relationship in Figure 9.1). However,
this does not mean the validity of unit-level competencies (human and social capital) generalizes
across contexts. Rather, human capital at the unit level only adds value to the extent it is consistent
with the firm’s strategy. This requirement, also known as “external fit,” means that an organization creating
a unit-level competency such as customer service will benefit only to the extent that the competency is critical to its
business strategy. Whereas one retailer competing on service would benefit from the creation of a
customer service competency, another competing on price would experience less, if any, impact
from the same activities. This is an important implication because it means one can be doing a
wonderful job of selecting at the individual level, yet adding nothing to the sustained competitive
advantage of the firm.
Second, it was previously asserted that selection practices alone are insufficient to create the
unit-level competencies that are the basis of sustained competitive advantage. Selection practices
must be aligned with other HR programs (e.g., training, performance management, and compensa-
tion) to create the desired impact. Selection practices that are inconsistently applied, or conflict with
other HR practices, will not create the type of human capital emergence necessary for a unit to build
a high-quality stock of aggregate human and social capital. For example, a unit may use exemplary
selection practices, but have an inability to recruit top talent. Or, the unit may attract and select top
talent, but fail to keep them in sufficient quantities. The unit’s climate, leadership, or compensation
may further enhance or lower the capital’s potential. This alignment, also known as “internal fit,” is
important because selection is but one lever available to management and the greatest impact comes
when all levers are aligned and pointing in a common direction.
Third, selection can have significant strategic value if the targeted KSAOs fit externally (with
the business strategy) and internally (with other HR programs) to create unit-level competencies
that drive important firm outcomes. Firms may compete on service, cost, or innovation; they do
not select on these competencies. While complicated enough, this scenario is likely to have several
important boundary conditions. For example, the relationship between unit competencies and unit
outcomes is unlikely to be linear cross-sectionally or over time (see Ployhart, 2004). Figure 9.1
emphasizes a contingency or configural view of selection practice because the value of unit-level
human and social capital is dependent on (i.e., moderated by) the unit’s strategy. This is in contrast to the
personnel selection literature, which in many ways argues for a universalistic approach. It may well
be the case that cognitive ability and conscientiousness predict the individual performance of most
jobs (Schmidt & Hunter, 1998). Although true, it does not necessarily follow that selecting on these
attributes will result in sustainable competitive advantage. Further, universalistic findings at the
individual level become contextualized at the unit level, and hence require a configural perspective.
For example, looking cross-sectionally, there is likely a need for a critical mass of human and social
capital (Dierickx & Cool, 1989). Particularly in highly task-interdependent work, hiring only a few
highly talented employees is unlikely to result in unit-level competencies (i.e., ensure sufficient
quantities of human and social capital) that will translate into improved business unit performance.
Although the “tipping point” defining critical mass likely changes by industry, occupation, and
organization, there is always a threshold. The relationship between the level of unit’s talent and its
performance is likely to be nonlinear.
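One way to make this critical-mass argument concrete, offered here purely as an illustrative sketch rather than
a model proposed in the sources cited above, is to express unit performance as an S-shaped (logistic) function
of the unit’s stock of human and social capital:

\[
P_j = P_{\min} + \frac{P_{\max} - P_{\min}}{1 + e^{-k\,(H_j - H^{*})}}
\]

where P_j is unit j’s performance, H_j is its aggregate human and social capital, H* is the threshold (the
“tipping point”), and k governs how sharply performance rises once the threshold is crossed. All of these symbols
are introduced only for illustration; under such a form, adding talent below H* yields little visible return,
returns accelerate near the threshold, and they flatten again once the unit is saturated.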
Fourth, looking longitudinally, time is a critical element for modeling the value of human and
social capital. It takes time for HR practices to create human and social capital, and more time
for the increasing stock to translate into unit-level outcomes (Ployhart, 2004). Such a temporal
perspective is true for all unit-level processes, with higher-level processes requiring a longer time
perspective than lower-level processes (Kozlowski & Klein, 2000). In short, implementation of a
new selection procedure, regardless of its merits, will take time before its effects can be measured
at the unit level. This, of course, assumes that the other HR practices comprising the “bundle” are
themselves stable. Where other HR practices are evolving, the effects of a selection procedure may
be even more difficult to ascertain (or their effects accelerated). This requisite leap of faith may
explain why some organizations frequently change their selection processes.
One final caveat is that the predictors of individual job performance may or may not be the best
drivers of sustained competitive advantage. Certainly one expects better individual performance
to contribute to better unit performance, but this relationship will not necessarily be linear or
isomorphic (e.g., because of group process losses). Because the criteria are not isomorphic across levels
(e.g., adding one additional “high ability” employee to a casino operator employing hundreds is
unlikely to measurably affect that unit’s performance), there is a good possibility that the predictors
will not be either (Bliese, 2000). This means additional KSAOs may be important drivers of
unit effectiveness and sustained competitive advantage (e.g., those contributing to shared collective
performance and social capital, such as personality traits linked to interpersonal and teamwork skills;
Noe, Colquitt, Simmering, & Alvarez, 2003; Oldham, 2003). In short, characteristics unrelated to
individual performance may be important for unit performance (see also Chapter 35, this volume,
regarding team member selection).
If the above propositions are compared to the typical model for utility analysis, similarities and
differences become apparent. In terms of similarities, utility analysis and SHRM scholarship would
predict KSAOs that are more rare (have a lower base rate in the population) will result in higher
utility and competitive advantage. However, there is also an important difference. Utility analysis
assumes that economic value resulting from more valid selection will produce firm-level value;
SHRM scholarship does not. Validity does not equal value, and valuable and rare resources may
only contribute to parity or temporary competitive advantage. Sustained competitive advantage also
requires resources that are inimitable and nonsubstitutable. It is on these latter two points that utility
analysis and the contemporary view of selection fall short.
These points illustrate that one cannot simply assume that using more valid selection procedures
will add firm-level value, improve organizational performance, or create an opportunity for sus-
tained competitive advantage. In fact, sole focus on individual level selection prohibits such conclu-
sions. Again, consider one of the most important implications of Figure 9.1: an organization can be
using highly valid predictors of job performance at the individual level but not necessarily develop-
ing the kinds of human capital necessary for the firm’s sustained competitive advantage! This may
seem discouraging, but it represents incredible opportunity for selection scholars and practitioners
to better demonstrate their value to organizational decision-makers. There exists a bridge to connect
micro and macro disciplines of scholarship by linking selection (micro) to strategy (macro), and
each discipline has the opportunity to strengthen the other. From a practical perspective, articu-
lating selection’s value will increase the likelihood of effective practices being implemented and
supported.

Consider the fact that staffing impacts the firm’s strategy and helps implement it. In terms of
impact, Figure 9.1 alludes to the expectation that implementing HR practices aligned with the unit’s
strategy will contribute to human and social capital emergence that is valuable, rare, inimitable,
and nonsubstitutable. However, through the process of developing and administering the selection
system, selection also helps to articulate and reinforce the firm’s strategy through all levels of the
organization. In the following section, we consider various means through which personnel selec-
tion can establish business-unit-level value.

SELECTION’S POTENTIAL CONTRIBUTION TO BUSINESS UNIT VALUE


Organizational decision-makers will devote the necessary resources for staffing practices because
a firm needs talent to survive. But organizational decision-makers are more likely to devote con-
siderable resources to the extent they believe selection practices will reinforce the firm’s strategy.
In the sections that follow, we discuss several opportunities for making this value proposition. We
acknowledge there is little to no research on many of these topics, yet we hope these gaps will help
stimulate some much-needed research.

MULTILEVEL SELECTION SHOWS BUSINESS UNIT CONSEQUENCES


Linking selection to a unit’s strategy requires a multilevel focus (Ployhart, 2006). Specifically,
one must show the emergence of individual-level KSAOs into unit-level human and social capital
(KSAO composition; see Figure 9.1). In multilevel language, emergence refers to ways in which
lower level KSAOs combine to form collective, unit-level constructs (Kozlowski & Klein, 2000)
that have frequently been referred to as core competencies. This is the critical link in Figure 9.1
that translates HR practices into strategically valuable intangible assets (i.e., human and social
capital). Emergence takes two general types. Composition forms of emergence represent agreement
or similarity among observations. When systematic selection results in homogeneity in KSAOs, it
becomes possible to describe the unit in terms of those KSAOs. This is of course the basis of the
attraction-selection-attrition model (Schneider, 1987). For example, one firm that attracts, selects,
and retains only those with high levels of conscientiousness will be more conscientious than a firm
unable to develop such talent. Compilation forms of emergence represent dissensus or dissimilarity.
For example, academic departments often specifically target applicants who have skills and
research interests absent among current faculty, thereby filling a hole in the department’s portfolio
of knowledge and teaching expertise.
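To make this distinction concrete, the short sketch below, written in Python with entirely hypothetical scores
and an arbitrary agreement cutoff (none of which come from the studies cited here), shows one simple way
composition and compilation can be operationalized: the unit mean is treated as a meaningful unit-level KSAO only
when unit members are sufficiently similar, whereas dissimilar units call for modeling the dispersion or
configuration itself.

from statistics import mean, pstdev

# Hypothetical individual conscientiousness scores (1-5 scale), grouped by unit.
units = {
    "store_A": [4.2, 4.4, 4.1, 4.3, 4.5],   # homogeneous: composition is plausible
    "store_B": [2.1, 4.8, 3.0, 4.9, 1.9],   # heterogeneous: consider compilation indices
}

MAX_WITHIN_UNIT_SD = 0.5  # arbitrary illustrative cutoff for "enough" agreement

for unit, scores in units.items():
    unit_mean = mean(scores)   # composition index: the unit's aggregate KSAO level
    unit_sd = pstdev(scores)   # dispersion: the raw material of compilation models
    if unit_sd <= MAX_WITHIN_UNIT_SD:
        verdict = "composition (the unit mean is a defensible unit-level descriptor)"
    else:
        verdict = "compilation (model the dispersion or configuration itself)"
    print(f"{unit}: mean = {unit_mean:.2f}, within-unit SD = {unit_sd:.2f} -> {verdict}")

In practice, researchers would rely on agreement indices such as rwg or ICC(1) rather than a raw standard
deviation cutoff; the sketch is intended only to show the logic of moving from individual scores to a unit-level
construct.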
There has been a fair amount of theory in the last decade focused on developing multilevel staff-
ing practices. Schneider et al. (2000) noted that personnel selection practices must become multilevel
and suggested that between-firm differences are driven by the effects of unit-level competencies,
which themselves are composed of similarity on individual differences. Ployhart and Schneider
(2002) discussed the practical implications of multilevel personnel selection. They noted that a
focus on “the job” is insufficient and developed a model to show how practitioners could establish
multilevel validity. Ployhart and Schneider (2005) then proposed a series of methodological and
statistical approaches for establishing validity at multiple levels. Ployhart (2004, 2006) attempted
to integrate the SHRM research on human capital with personnel selection scholarship to develop
a more comprehensive multilevel selection model that should articulate selection’s strategic value
to the firm.
This research on multilevel selection fits within the broader movement of SHRM scholarship
to link individual and unit levels. Wright and Boswell (2002) persuasively called for an integration
of micro (HR) and macro (strategy) scholarship because each level has implications for the other.
More recently, Gerhart (2005) and Wright and Haggerty (2005) noted that an important direction
for SHRM scholarship is to move beyond HR practices to demonstrating how HR practices build
aggregate compositions of human and social capital (competencies) linked to unit-level outcomes
(interestingly, compilation models are less-mentioned). They essentially argued for scholars to
focus on how HR practices create human and social capital emergence. Bowen and Ostroff (2004)
developed the concept of HR strength, an idea that stronger (more consistent, greater internal fit) HR
practices create more cohesive climates than do weaker (less consistent) HR practices.
So, how might HR practices, and selection in particular, contribute to the emergence of unit-
level competencies that form the basis for sustained competitive advantage? Lepak and Snell
(1999) suggested there are different types of employee groups that differ in their strategic value
and uniqueness to the firm and that firms should vary their HR practices accordingly. For example,
firms should use commitment-based HR practices for employees who are both highly valuable and
unique (e.g., top managers), but use efficiency-based HR practices for employees who are valuable
but not unique (e.g., entry-level employees). Jobs have similarly been differentiated as “A” jobs and
“C” jobs, with the former being central to the firm’s ability to execute its strategy (Kaplan & Norton,
2003). Selection is most likely to create unit-level impact when applied to A jobs.
As an example, in most banks the loan officer job is an important one. Loan officers are the interface
between the bank and its customers and make decisions about whether the bank should lend money.
Incumbents in such positions are often evaluated on sales and service criteria. The KSAOs key
to selection in this context would be those related to sales (e.g., persistence, energy, persuasion)
and service (agreeableness, conscientiousness). However, much more critical to the organization’s
success are the far smaller number of A jobs responsible for setting the organization’s overall credit
policies and managing its risk. Although small in number, their impact on the firm is enormous
(witness the meltdown in the financial markets from overzealous lending practices). Criteria used to
evaluate these jobs might include various ratios of returns to bad loans. The KSAOs would be sub-
stantially different, focusing on quantitative analysis, forecasting, geopolitical trends, and the like.
Consistent across all this theory is the expectation that HR practices create a composition and/or
compilation of competencies (unit KSAOs) from individual KSAOs. It bears repeating that the stra-
tegic value manifests from the unit-level competencies, not individual KSAOs. HR practices pro-
duce unit differences and may contribute to sustained competitive advantage through the creation
of valuable, rare, inimitable, and nonsubstitutable competencies (human and social capital at the
unit level).
Empirical research is just starting to appear that supports many of these expectations. Lepak and
Snell (2002) found that firms do in fact use different HR practices for different employee groups
varying in terms of value and uniqueness. Takeuchi, Lepak, Wang, and Takeuchi (2007) found that
aggregate manager perceptions of employee competencies were positively related to self-reports of
firm performance. Several studies have found that human capital manifests different forms of com-
position across units, and hence units can be distinguished in terms of aggregate individual KSAOs
(Jordan, Herriot, & Chalmers, 1991; Schaubroeck, Ganster, & Jones, 1998; Schneider, White, &
Paul, 1998). Ployhart, Weekley, and Baughman (2006) extended these findings to show that job- and
organizational-level human capital helped explain between-unit differences in job performance and
job satisfaction.
Considerably more research is needed linking HR practices to specific types of human and social
capital emergence. For example, do firms that use more aligned recruiting, selection, and retention
practices create more valuable and rare forms of human capital more quickly? Do such firms have
greater control over the flow of human and social capital? Do firms that use such practices outper-
form rivals? SHRM scholarship needs to focus squarely on human and social capital emergence,
thereby illuminating the black box between HR practices and unit outcomes. Fortunately, I-O psy-
chologists and selection specialists are the torch that can light the examination of this black box.

SELECTION AND RETENTION


Turnover nullifies the positive human and social capital enhancements accrued through effective
selection. The extant literature suggests effective selection can help reduce turnover (Barrick &
Zimmerman, 2005), but this only scratches the surface of retention issues. For example, recent unit-
level research suggests the economic costs associated with turnover are far greater than individual
research might indicate. Unit-level human and social capital losses represent the quantity and quality
of talent lost through turnover, the costs of which may be greater than simply the rate of turnover
(Glebbeek & Bax, 2004; Kacmar, Andrews, Van Rooy, Steilberg, & Cerrone, 2006; Shaw, Gupta, &
Delery, 2005). This is an important distinction because unit-level turnover captures losses of collective
processes (e.g., coordination and communication) not apparent in individual-level research. Dess
and Shaw (2001) theorized that turnover rates may not fully capture the costs of turnover. For
example, service- and knowledge-based firms are highly dependent on social capital (the collective
social relationships, interactions, and networks within a unit). Effective unit performance in these
settings requires employees to share information about customers, services, products, and best
practices. Unit turnover creates holes in social networks that have negative consequences beyond
simple turnover rates (Shaw et al., 2005).
Human capital theory predicts that the loss of talent through turnover means the investment
made in unit human capital may not be recouped. But all loss is not considered equal. For example,
turnover in A jobs should be more costly than turnover in C jobs. That is, the opportunity costs
of turnover in strategically critical positions should be higher than the same costs in less mission-
critical positions. Similarly, higher quality unit human capital is expected to produce more value,
so turnover among top talent will incur greater opportunity costs than turnover amongst lesser
talent (Becker, 1964). Turnover should also be more damaging when those who leave have more
tacit knowledge (Strober, 1990). From the RBV perspective, units are not equally distributed with
respect to human capital. Turnover may diminish the stock of unit human capital, but turnover
among the highest quality human capital may make that resource less valuable, less rare, and more
easily imitated unless replacements are equally qualified (Lepak & Snell, 1999). Because individual
KSAOs tend to be normally distributed, by definition higher quality talent will be more scarce
and difficult to replace. Empirically, Shaw et al. (2005) found that unit social capital losses were
associated with additional negative effects beyond mere performance losses, indicating the quality
of social capital was an important factor. Research similarly finds a loss of shared tacit knowledge
(Hitt, Bierman, Shimizu, & Kochhar, 2001) and an erosion of communication and coordination
(Sacco & Schmitt, 2005).
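To make the distinction between the rate of turnover and the quantity and quality of capital lost more concrete, consider the following minimal sketch (Python, with entirely hypothetical records and an illustrative weighting scheme rather than a validated costing model). It contrasts a raw turnover rate with a simple loss index that weights each departure by the leaver's assessed talent and the strategic criticality of the job vacated.

# Hypothetical records of employees who left a unit during the year.
# 'talent' is a standardized assessment or performance score (z-score);
# 'criticality' weights A jobs more heavily than C jobs (illustrative values).
leavers = [
    {"job": "A", "talent": 1.4, "criticality": 3.0},
    {"job": "C", "talent": -0.5, "criticality": 1.0},
    {"job": "C", "talent": 0.1, "criticality": 1.0},
]
headcount = 60  # unit headcount at the start of the year (hypothetical)

# Conventional metric: the raw turnover rate ignores who left and from which job.
turnover_rate = len(leavers) / headcount

# Illustrative loss index: each departure counts more when the leaver was more
# talented and occupied a more strategically critical position; below-average
# leavers contribute no loss in this simple scheme.
loss_index = sum(max(l["talent"], 0.0) * l["criticality"] for l in leavers)

print(f"Raw turnover rate: {turnover_rate:.1%}")
print(f"Quality/criticality-weighted loss index: {loss_index:.2f}")

Two units with identical turnover rates can show very different loss indices, which is the unit-level point made by Dess and Shaw (2001) and Shaw et al. (2005).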
Selection experts should take ownership of turnover problems. For example, an HR executive
of a Fortune 500 company told one of the authors: “I tell my (HR) team that turnover means
our hiring process failed.” That is a bold statement (people quit for all kinds of reasons that
have nothing to do with their ability to perform the job), but there is some truth to better
utilizing staffing as a means to control turnover. That said, adequately solving this problem
will require a deeper understanding of turnover than simply identifying the KSAOs necessary
for job performance. For example, it will probably require an understanding of how individuals fit within the culture of the firm and the characteristics of the work group, something that is rarely assessed as part of selection practices. Note that doing so would obviously contextualize selection, an idea that has recently been advocated (Cascio & Aguinis, 2008). This research may help link selection to important internal processes that contribute to the firm’s ability to retain the high-quality stock of strategically valuable human and social capital necessary for strategy implementation.

TALENT AS ASSETS VERSUS COSTS


Current models of accounting treat HR-related activities as costs. For better or worse, accounting is
the language of business and so viewing HR as a cost likely contributes to managers’ perceptions of
HR not adding strategic value. It is incumbent on HR managers and staffing specialists to convey
how human and social capital can become a financial asset. Chapter 11, this volume, discusses this
issue in some detail, but we raise a few additional points here.
Earlier when discussing Figure 9.1, we noted some similarities and differences between viewing
staffing as a strategic process versus traditional utility analysis. We emphasize that it is the unit-
level human and social capital that creates sustained competitive advantage. Therefore, it is critical
to articulate the economic value of this intangible asset—what Ulrich and Smallwood (2005) called
“return on intangibles.” In some of our own research, we have shown how individual KSAOs emerge
to create meaningful between-unit differences in human capital (Ployhart, Weekley, & Ramsey,
2005). More important, we quantified this unit human capital in a composition model and linked
it to unit-level financial and productivity-based outcomes over time. In a sample of over 100,000
employees and over 1,000 retail chain stores, we showed how store levels of service orientation
contributed to between-store differences in sales and controllable profit. Stores performed better as
they acquired higher levels of human capital and performed more poorly as they lost human capital.
Thus, we showed how this “intangible asset” was in fact tangibly linked to important store financial
criteria.
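As a concrete illustration of the analytic logic behind such a linkage study, the sketch below (Python, simulated and entirely hypothetical data) aggregates individual assessment scores to store means under a simple composition model and correlates the resulting store-level human capital index with a store-level sales criterion. It illustrates the general idea only and is not a reproduction of the study described above.

import random
import statistics

random.seed(1)

# Simulate 50 stores, each with 40 employees (all values hypothetical).
stores = []
for _ in range(50):
    true_store_capital = random.gauss(0, 0.5)           # latent store-level human capital
    scores = [true_store_capital + random.gauss(0, 1)   # individual service-orientation scores
              for _ in range(40)]
    store_mean = statistics.mean(scores)                 # composition model: unit mean of KSAOs
    sales = 100 + 15 * true_store_capital + random.gauss(0, 10)  # store sales criterion
    stores.append((store_mean, sales))

# Unit-level correlation between aggregated human capital and sales.
x = [s[0] for s in stores]
y = [s[1] for s in stores]
mx, my = statistics.mean(x), statistics.mean(y)
cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
r = cov / (statistics.stdev(x) * statistics.stdev(y))
print(f"Store-level correlation between mean assessment score and sales: r = {r:.2f}")

Note that the effective sample size in such an analysis is the number of stores, not the number of employees, which is one reason large multiunit samples are required.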

SELECTION AS A LEVER FOR (OR BARRIER TO) CHANGE


A firm’s business strategy and personnel selection practices should form a symbiotic relationship.
When a firm decides to change strategic direction (e.g., by shifting from competing on cost to competing on quality or by expanding into completely new markets), that decision should require changes in the selection process. The need for a change in strategy may be signaled by a new CEO, declining market share, a paradigm-threatening advance in technology, or a new entrant into the competitive
landscape. In such cases, new corporate competencies may be required. For such competencies to
emerge, selection efforts will have to focus on the KSAOs comprising the basic ingredients. When
combined with appropriate changes to other HR practices (e.g., performance management and com-
pensation), new competencies may begin to form to support execution of the firm’s new strategy.
Additionally, most organizational change efforts involve a substantial change in personnel, at least
at the top levels of the firm. By attracting, selecting, and retaining people who share the new vision
and have the skills necessary to achieve the vision, a firm can quickly and possibly radically alter its
human and social capital composition. These are important selling points HR managers can use to
secure support and resources for selection procedures.
Of course, selection can also serve as a barrier to change when it is divorced from the firm’s
strategy and results in the continued acquisition of employees who lack the skills or vision neces-
sary for the company’s survival. Organizations that rely on an internal labor market (i.e., hire at the
entry level and promote exclusively from within) are often at a disadvantage when change becomes
a necessity. Such firms may find it difficult to alter their skill mix in the short-term without reliance
on talent secured from the outside. Conversely, organizations that rely on the external labor market
for resources (i.e., buying the talent on the open market as needed) may be more nimble when faced
with the need for change. An organization attempting to shift from a commodities market to an
“upscale market” may find itself in need of people able to envision products worthy of a price pre-
mium (as opposed to people who focus primarily on cost control). Growing them internally would
be slow, at best. To the extent the organization’s new strategy requires new organizational compe-
tencies, extensive change in the selection procedures is inevitable.

GLOBAL CONSIDERATIONS IN SELECTION AND STRATEGY


Chapter 36 of this volume discusses various issues relating to the global implementation of selection
practices, but here we discuss a few specific to linking selection with strategy. Clearly organizations
operating globally face a unique set of selection issues. Organizations that operate globally often
have different functions located in different parts of the world (e.g., design in the United States and
manufacturing in China). As a result, different core competencies are required by different parts of
the organization located in different parts of the globe. Obviously, different selection criteria may be
called for in different parts of the world, and thus the KSAOs matched to the competencies needed
to execute respective functions will also differ.
Global selection also means that selection methods developed in one part of the world may not
generalize to other countries or cultures (e.g., Ryan, McFarland, Baron, & Page, 1999). Even where
the same competencies are desired, indicating the same KSAOs are required, different or at least
modified methods may be required to identify people who possess them. For example, it is well
established that personality measures are affected by response styles (e.g., extreme responding,
acquiescence), which in turn vary across cultures. Other selection methods like biodata and situ-
ational judgment, which are usually developed and keyed at a local level, may not generalize across
cultural boundaries (Lievens, 2006). The obvious point is that globalization greatly compounds the
challenges facing selection researchers attempting to build the corporate competencies required to
support execution of the business strategy and impact key outcomes. The talent necessary to com-
pete in one local economy (e.g., China, India) may contribute little to supporting the organization’s
overall strategy. For example, suppose a U.S.-based organization opens a manufacturing facility in China. The firm may choose to employ local managers because of their expertise in Chinese culture and business practices, yet those managers may lack key competencies necessary to understand how their operation adds value to the overall firm. Hence, assuming some autonomy, they may make choices that run counter to the firm’s strategy even though they are successful locally. The critical
issue is alignment between the firm’s strategy and talent within and across geographic locations.
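As one illustration of how response-style differences can be handled, analysts sometimes apply a within-person correction before comparing personality scores across regions; the sketch below (Python, hypothetical ratings) centers each respondent's ratings on his or her own mean across a balanced item set so that a uniform tendency to agree does not inflate scale scores. This is only one commonly discussed option, shown for illustration; operational corrections should be selected and validated by measurement specialists for the cultures involved.

import statistics

# Hypothetical 1-5 ratings on a balanced item set (reverse-keyed items already
# recoded) for two applicants from different regions.
applicants = {
    "applicant_1": [4, 5, 4, 5, 4, 5],   # tends to agree with nearly everything
    "applicant_2": [2, 4, 2, 4, 2, 4],
}

for name, ratings in applicants.items():
    person_mean = statistics.mean(ratings)
    # Within-person centering: removes a uniform acquiescent elevation,
    # leaving the relative pattern of responses across items.
    centered = [r - person_mean for r in ratings]
    print(name, "raw mean:", round(person_mean, 2),
          "centered:", [round(c, 2) for c in centered])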

SELECTION INFLUENCES DIVERSITY


Strategy, selection, and diversity are an important combination for at least two reasons. First,
many U.S. firms report that increasing, or at least valuing, diversity is one of their corporate goals.
Whether this is done because they think it helps the bottom line, for social responsibility, or simply
to enhance public relations, the fact remains that many firms place a premium on attracting diverse
talent. In short, diversity may be part of the firm’s strategy. Many firms will use a systematic selection process only to the extent that it increases diversity; validity will often take a secondary role, if it takes a role at all. A common scenario is an organization choosing not to implement a locally validated cognitive ability test because of the concomitant adverse impact implications. (Note that in
this section, our treatment of diversity is primarily in terms of demographic diversity; psychological
diversity is actually a subset of compilation models discussed earlier.)
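For readers less familiar with how adverse impact is typically gauged, the sketch below (Python, hypothetical applicant counts) computes group selection rates and the impact ratio that is compared against the widely cited four-fifths (80%) guideline. The figures are illustrative only, and the guideline is a rough screening rule rather than a definitive legal or statistical test.

# Hypothetical applicant flow data for one selection procedure.
applicants = {"group_a": {"applied": 200, "hired": 60},
              "group_b": {"applied": 100, "hired": 18}}

# Selection rate per group and the ratio of each rate to the highest rate.
rates = {g: d["hired"] / d["applied"] for g, d in applicants.items()}
highest = max(rates.values())

for group, rate in rates.items():
    ratio = rate / highest
    flag = "below 4/5 guideline" if ratio < 0.8 else "ok"
    print(f"{group}: selection rate {rate:.2f}, impact ratio {ratio:.2f} ({flag})")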
However, it should be remembered that selection is the only way to influence demographic
diversity (notwithstanding mergers and acquisitions). Thus, selection is the primary mechanism for
enhancing diversity, and arguing for selection in this respect can be a powerful means to articulate
its value (Ployhart, 2006). For example, one of the authors worked on developing a selection system
for investment bankers. Because the investment banking community is rather small, many of the
hires were based on recommendations from current employees. Relying on recommendations from
internal employees almost ensures the status quo with respect to demographic diversity. The line
director wanted a systematic selection practice because he believed it would result in a more demo-
graphically diverse slate of applicants.

SELECTION SUPPORTS TALENT SEGMENTATION


The strategy determines which jobs are A jobs and which are C jobs. Selection’s role is in recognizing
which jobs are most critical, why they are so (e.g., because they directly impact the emergence of core
competencies), and in ensuring an adequate supply of talent in those jobs (through attraction, selection,
and/or retention). One problem facing selection researchers is that the C-level jobs are often the high
population jobs most suitable to local validation efforts. The truly critical jobs for an organization may
encompass a relatively small population, making criterion-related validation efforts difficult at best.
Research on the unit-level impact of alternative validation strategies is clearly needed.
Although the strategy determines which are A jobs, performance largely determines who is
considered an A player. Selection specialists can play a critical role in a firm’s success to the extent
they can match A players with A jobs. Firms that are successful in placing their most talented indi-
viduals in the most mission-critical jobs should be more successful than those taking a more casual
approach to placement. Although seemingly obvious, we are not aware of any systematic research
examining the issue. However, the role is not limited to selecting talent. It also involves deselecting
marginal performers from critical jobs. If the job is truly essential to the execution of the firm’s
strategy, then rigorous performance management, including the reallocation or separation of weak
performers, becomes an important if unpleasant task.

SELECTION HELPS DEVELOP A CRITICAL MASS


The assumption in staffing is that adding higher quality employees will improve the firm’s effectiveness in a linear manner. This assumption is the basis of traditional utility analysis. Yet SHRM research provides some theory to suggest that a unit must develop a critical mass of talent before that talent becomes strategic. Unit-level
competencies, such as customer service or reliability (e.g., package delivery), are unlikely to emerge
where only a few members of the unit possess the appropriate individual-level KSAOs. In essence,
Figure 9.1 suggests that hiring only one or a few highly talented applicants is unlikely to produce
sustained competitive advantage. Rather, there must be sufficient quantities and quality so that
“human and social capital emerges” and hence can influence unit-level outcomes (Ployhart, 2006).
If correct, this has two important implications. First, it means that human and social capital has a
minimum threshold that must be passed to ensure contribution to sustained competitive advantage.
This means that an adequate flow (attraction and retention) of talent becomes paramount. Second,
it means that human and social capital might also have a maximum threshold, or at least a point
of diminishing returns. This would indicate a point where there is little relative value to be gained
by more effective selection, at least to the extent retention is stable. There is almost no research
that speaks to these issues, but such research is necessary because both implications are counter to
utility analysis.
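The contrast with utility analysis can be made concrete with a toy comparison of functional forms. Basic utility models treat the value of selection as approximately linear in the talent acquired, whereas the critical-mass argument implies something closer to an S-shaped curve with a lower threshold and a region of diminishing returns. The sketch below (Python) compares the two; the logistic parameters are arbitrary illustrations, not estimates from any study.

import math

def linear_value(talent_share, slope=100):
    # Linear assumption implicit in basic utility analysis:
    # each increment of talent adds the same unit-level value.
    return slope * talent_share

def critical_mass_value(talent_share, max_value=100, midpoint=0.5, steepness=12):
    # Illustrative S-curve: little payoff until a critical mass of talent is
    # reached, then rapid gains, then diminishing returns near saturation.
    return max_value / (1 + math.exp(-steepness * (talent_share - midpoint)))

for share in [0.1, 0.3, 0.5, 0.7, 0.9]:
    print(f"talent share {share:.1f}: linear {linear_value(share):6.1f}  "
          f"critical-mass {critical_mass_value(share):6.1f}")

In the S-shaped case, the marginal value of better selection is greatest near the critical-mass threshold and shrinks as the unit approaches saturation, which is precisely the pattern that linear utility models cannot represent.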

SELECTION’S SYMBIOTIC RELATIONSHIP WITH OTHER HR ACTIVITIES


It has been argued that unit-level competencies can produce strategic benefits and that they are
the result of the careful alignment of selection and other HR programs with the firm’s strategy.
Selection is merely one way in which an organization can influence the talent it employs. Clearly,
training and development activities can impact individual-level talent and ultimately unit
competence. Performance management systems similarly can signal desired behaviors and shape
the development and utilization of unit competencies. Compensation systems can be designed to
reward the acquisition of individual knowledge, skills, and abilities that support unit competency
(e.g., pay-for-knowledge systems) and can play an important role in recruitment and retention. The
largest impact comes when all elements of the HR system are congruent in terms of effect; all
point toward and support the development of the same unit-level competencies linked to the firm’s
strategy.
HR executives are under increasing pressure to demonstrate the value the function adds to the
organization. Boards of directors are increasingly demanding evidence of an adequate return on
investment for HR-related expenditures. To meet this challenge, practitioners have begun examin-
ing group-level outcomes as they relate to HR activities. For example, in the survey arena, linkage
studies are increasingly common as HR executives seek to demonstrate that employee engagement
and satisfaction are related to important outcomes. This same approach will need to be applied
to selection research—demonstrating unit-level outcomes of individual-level selection systems.
As an example, one of the authors recently completed a study wherein an assessment designed
to predict customer service behaviors was related at the unit level to customer satisfaction scores
TABLE 9.1
Summary of Key Implications: Integrating Strategic Human Resources
With Personnel Selection
1. Only unit-level human capital and social capital can offer strategic value to the firm. Personnel selection and selection
on targeted individual-level KSAOs can only contribute to the firm’s strategy insofar as they contribute to the
emergence of strategically valuable unit-level human and social capital.
2. Individual-level criterion-related validity is insufficient evidence to demonstrate the strategic value of selection. It is
necessary, but not sufficient. Simply using selection procedures that predict individual job performance is no guarantee
that personnel selection will contribute to the firm’s sustained competitive advantage.
3. Validity at the individual level generalizes across contexts, but the validity of unit-level human capital and social
capital does not generalize. The extent to which these unit-level competencies have relationships with unit outcomes is
dependent on the firm’s strategy, market, competitors, and related factors. Indeed, if unit-level human and social
capital are to add value to the firm’s competitive advantage, then the firm would not want these relationships to
generalize to other firms!
4. Demonstrating the strategic value of personnel selection will likely require a longitudinal focus, because selection
(and related HR) practices must be implemented appropriately over time to create a critical mass of unit-level human
and social capital emergence.
5. A critical mass of aggregate KSAOs is necessary to influence unit-level consequences. This has three important
implications. First, the development and sustainability of this critical mass is likely to be dynamic and nonlinear.
Second, the relationships between unit-level human and social capital and unit-level outcomes are likely to be dynamic and nonlinear. Third, there is likely a threshold of diminishing returns; a tipping point where adding more talent
is unlikely to provide the same value to the firm.
6. Because performance criteria are not isomorphic between the individual and unit levels, there is the possibility that
different predictors of performance will be present at each level.

(specifically, mean store assessment scores correlated with a store-level customer service index at r = .19, p < .05). Although this does not prove that the individual-level assessment caused customer service, the unit-level finding was perceived by the participating organization as more compelling than yet another individual-level validation study.
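A practical corollary concerns how many units such a study requires. The sketch below uses the standard Fisher z approximation for correlation power analysis (a generic calculation, not one drawn from the study described above) to estimate the number of stores needed to detect a unit-level correlation of the size reported here with conventional power.

import math

def units_needed(r, z_alpha=1.96, z_beta=0.84):
    # Fisher z approximation for the sample size needed to detect a population
    # correlation r with a two-tailed alpha of .05 and power of .80.
    fisher_z = math.atanh(r)
    return math.ceil(((z_alpha + z_beta) / fisher_z) ** 2 + 3)

print("Units needed to detect r = .19:", units_needed(0.19))
print("Units needed to detect r = .40:", units_needed(0.40))

Because the unit, not the individual, is the observation, linkage studies with only a few dozen units will usually be underpowered for effects of this magnitude.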

CONCLUSIONS
In this chapter, we have argued that selection scholarship must be expanded to consider how, when,
and why personnel selection practices will contribute to creating business-unit-level value and
sustained competitive advantage. We noted that in contrast to expectations, effective selection prac-
tices will not always translate into firm-level value. Table 9.1 summarizes the main implications of
our chapter.
As a consequence of the issues summarized in Table 9.1, we have also tried to articulate ways
through which selection practices can manifest such value. Much is currently written about how
the HR profession needs to be part of the firm’s strategic decision-making team; this is even more
true for I-O psychology. Being able to demonstrate selection’s contribution to the firm’s strategy
is one way to accomplish this goal. Although the road toward establishing empirical connections
between selection practices and business unit sustained competitive advantage will not be easy
or quick, we believe it is vital for the future of our profession and I-O psychology’s own strategic
direction.

ACKNOWLEDGMENTS
We thank James L. Farr, Nancy T. Tippins, Jerard F. Kehoe, and Janice Molloy for providing sug-
gestions on earlier drafts of this chapter.
REFERENCES
Anderson, N. R. (2005). Relationships between practice and research in personnel selection: Does the left hand
know what the right is doing? In A. Evers, N. R. Anderson, & O. Smit-Voskuyl (Eds.), The Blackwell
handbook of personnel selection (pp. 1–24). Oxford, England: Blackwell.
Barney, J. B. (1991). Firm resources and sustained competitive advantage. Journal of Management, 17,
99–120.
Barney, J. B., & Wright, P. M. (1998). On becoming a strategic partner: The role of human resources in gaining
competitive advantage. Human Resource Management, 37, 31–46.
Barrick, M. R., & Zimmerman, R. D. (2005). Reducing voluntary, avoidable turnover through selection. Journal
of Applied Psychology, 90, 159–166.
Becker, B. E., & Huselid, M. A. (2006). Strategic human resource management: Where do we go from here?
Journal of Management, 32, 898–925.
Becker, G. S. (1964). Human capital: A theoretical and empirical analysis with special reference to education.
New York, NY: National Bureau of Economic Research.
Bliese, P. D. (2000). Within-group agreement, non-independence, and reliability: Implications for data
aggregation and analysis. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and
methods in organizations: Foundations, extensions, and new directions (pp. 349–381). San Francisco,
CA: Jossey-Bass.
Bowen, D. E., & Ostroff, C. (2004). Understanding HRM-firm performance linkages: The role of the “strength”
of the HRM system. Academy of Management Review, 29, 203–221.
Cascio, W. F., & Aguinis, H. (2008). Staffing twenty-first-century organizations. Academy of Management
Annals, 2, 133–165.
Chadwick, C., & Cappelli, P. (1999). Alternatives to generic strategy typologies in strategic human resource
management. In P. M. Wright, L. D. Dyer, J. W. Boudreau, & G. T. Milkovich (Eds.), Research in person-
nel and human resources management, (Suppl. 4, pp. 1–29). Stamford, CT: JAI Press.
Colbert, B. A. (2004). The complex resource-based view: Implications for theory and practice in strategic
human resource management. Academy of Management Review, 29, 341–358.
Combs, J., Liu, Y., Hall, A., & Ketchen, D. (2006). How much do high-performance work practices matter?
A meta-analysis of their effects on organizational performance. Personnel Psychology, 59, 501–528.
Delery, J. E., & Doty, D. H. (1996). Modes of theorizing in strategic human resource management: Tests
of universalistic, contingency, and configurational performance predictions. Academy of Management
Journal, 39, 802–835.
DeNisi, A. S., Hitt, M. A., & Jackson, S. E. (2003). The knowledge-based approach to sustainable competi-
tive advantage. In S. E. Jackson, M. A. Hitt, & A. S. DeNisi (Eds.), Managing knowledge for sustained
competitive advantage (pp. 3–33). San Francisco, CA: Jossey-Bass.
Dess, G. G., & Shaw, J. D. (2001). Voluntary turnover, social capital, and organizational performance. Academy
of Management Review, 26, 446–456.
Dierickx, I., & Cool, K. (1989). Asset stock accumulation and sustainability of competitive advantage.
Management Science, 35, 1504–1511.
Gerhart, B. (2005). Human resources and business performance: Findings, unanswered questions, and an
alternative approach. Management Revue, 16, 174–185.
Glebbeek, A. C., & Bax, E. H. (2004). Is high employee turnover really harmful? An empirical test using
company records. Academy of Management Journal, 47, 277–286.
Grant, R. M. (1996). Toward a knowledge-based theory of the firm. Strategic Management Journal, 17,
109–122.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions (2nd ed.). Mahwah, NJ:
Erlbaum.
Hitt, M. A., Bierman, L., Shimizu, K., & Kochhar, R. (2001). Direct and moderating effects of human capital
on strategy and performance in professional service firms: A resource-based perspective. Academy of
Management Journal, 44, 13–28.
Huselid, M. A. (1995). The impact of human resource management practices on turnover, productivity, and
corporate financial performance. Academy of Management Journal, 38, 635–672.
Jordan, M., Herriot, P., & Chalmers, C. (1991). Testing Schneider’s ASA theory. Applied Psychology, 40,
47–53.
Kacmar, K. M., Andrews, M. C., Rooy, D. L. V., Steilberg, R. C., & Cerone, S. (2006). Sure everyone can be
replaced … but at what cost? Turnover as a predictor of unit-level performance. Academy of Management
Journal, 49, 133–144.
Kozlowski, S. W. J., & Klein, K. J. (2000). A multilevel approach to theory and research in organizations:
Contextual, temporal, and emergent processes. In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel
theory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 3–90).
San Francisco, CA: Jossey-Bass.
Latham, G. P., & Whyte, G. (1994). The futility of utility analysis. Personnel Psychology, 47, 31–46.
Lepak, D. P., & Snell, S. A. (1999). The human resource architecture: Toward a theory of human capital allocation
and development. The Academy of Management Review, 24, 34–48.
Lepak, D. P., & Snell, S. A. (2002). Examining the human resource architecture: The relationships among
human capital, employment, and human resource configurations. Journal of Management, 28, 517–543.
Lievens, F. (2006). International situational judgment tests. In J. A. Weekley & R. E. Ployhart (Eds.), Situational
judgment tests (pp. 279–300). Mahwah, NJ: Lawrence Erlbaum.
Nahapiet, J., & Ghoshal, S. (1998). Social capital, intellectual capital and the organizational advantage.
Academy of Management Review, 23, 242–266.
Noe, R. A., Colquitt, J. A., Simmering, M. J., & Alvarez, S. A. (2003). Knowledge management: Developing
intellectual and social capital. In S. E. Jackson, M. A. Hitt, & A. S. DeNisi (Eds.), Managing knowledge
for sustained competitive advantage (pp. 209–242). San Francisco, CA: Jossey-Bass.
Oldham, G. R. (2003). Stimulating and supporting creativity in organizations. In S. E. Jackson, M. A. Hitt,
& A. S. DeNisi (Eds.), Managing knowledge for sustained competitive advantage (pp. 274–302). San
Francisco, CA: Jossey-Bass.
Ployhart, R. E. (2004). Organizational staffing: A multilevel review, synthesis, and model. In J. Martocchio (Ed.),
Research in personnel and human resources management (Vol. 23, pp. 121–176). Oxford, England: Elsevier.
Ployhart, R. E. (2006). Staffing in the 21st century. Journal of Management, 32, 868–897.
Ployhart, R. E., & Schneider, B. (2002). A multilevel perspective on personnel selection: Implications for
selection system design, assessment, and construct validation. In F. J. Dansereau & F. Yammarino (Eds.),
Research in multi-level issues Volume 1: The many faces of multi-level issues (pp. 95–140). Oxford,
England: Elsevier Science.
Ployhart, R. E., & Schneider, B. (2005). Multilevel selection and prediction: Theories, methods, and models.
In A. Evers, O. Smit-Voskuyl, & N. Anderson (Eds.), Handbook of personnel selection (pp. 495–516).
Oxford, England: Blackwell.
Ployhart, R. E., Weekley, J. A., & Baughman, K. (2006). The structure and function of human capital emergence:
A multilevel examination of the ASA model. Academy of Management Journal, 49, 661–677.
Ployhart, R. E., Weekley, J. A., & Ramsey, J. (2005). Does human capital relate to unit level effectiveness over
time? Paper presented at the 2005 Annual Meeting of the Academy of Management, Honolulu, HI.
Prahalad, C. K., & Hamel, G. (1990). The core competence of the corporation. Harvard Business Review, 68,
79–91.
Ryan, A. M., McFarland, L. A., Baron, H., & Page, R. (1999). An international look at selection practices:
Nation and culture as explanations for variability in practice. Personnel Psychology, 52, 359–391.
Rynes, S. L., Brown, K. G., & Colbert, A. E. (2002). Seven misconceptions about human resource practices:
Research findings versus practitioner beliefs. Academy of Management Executive, 16, 92–103.
Sacco, J. M., & Schmitt, N. (2005). A dynamic multilevel model of demographic diversity and misfit effects.
Journal of Applied Psychology, 90, 203–231.
Saks, A. M. (2005). The impracticality of recruitment research. In A. Evers, O. Smit-Voskuyl, & N. Anderson
(Eds.), Handbook of personnel selection (pp. 47–72). Oxford, England: Blackwell.
Schaubroeck, J., Ganster, D. C., & Jones, J. R. (1998). Organization and occupation influences in the attraction-
selection-attrition process. Journal of Applied Psychology, 83, 869–891.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology:
Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Schneider, B. (1987). The people make the place. Personnel Psychology, 40, 437–454.
Schneider, B., Smith, D. B., & Sipe, W. P. (2000). Personnel selection psychology: Multilevel considerations.
In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations:
Foundations, extensions, and new directions (pp. 91–120). San Francisco, CA: Jossey-Bass.
Schneider, B., White, S. S., & Paul, M. C. (1998). Linking service climate to customer perceptions of service
quality: Test of a causal model. Journal of Applied Psychology, 83, 150–163.
Schuler, R. S., Galante, S. P., & Jackson, S. E. (1987). Matching effective HR practices with competitive strategy.
Personnel, 64, 18–27.
Shaw, J. D., Duffy, M. K., Johnson, J. L., & Lockhart, D. E. (2005). Turnover, social-capital losses, and performance.
Academy of Management Journal, 48, 594–606.
Shaw, J. D., Gupta, N., & Delery, J. E. (2005). Alternative conceptualizations of the relationship between vol-
untary turnover and organizational performance. Academy of Management Journal, 48, 50–68.
Strober, M. H. (1990). Human capital theory: Implications for HR managers. Industrial Relations, 29,
214–239.
Takeuchi, R., Lepak, D. P., Wang, H., & Takeuchi, K. (2007). An empirical examination of the mechanisms
mediating between high-performance work systems and the performance of Japanese organizations.
Journal of Applied Psychology, 92, 1069–1083.
Taylor, M. S., & Collins, C. J. (2000). Organizational recruitment: Enhancing the intersection of theory and
practice. In C. L. Cooper & E. A. Locke (Eds.), Industrial and organizational psychology: Linking theory
and practice (pp. 304–334). Oxford, England: Blackwell.
Terpstra, D. E., & Rozell, E. J. (1993). The relationship of staffing practices to organizational level measures of
performance. Personnel Psychology, 46, 27–48.
Ulrich, D., & Smallwood, N. (2005). HR’s new ROI: Return on intangibles. Human Resource Management,
44, 137–142.
Whyte, G., & Latham, G. P. (1997). The futility of utility analysis revisited: When even an expert fails. Personnel
Psychology, 50, 601–610.
Wright, P. M., & Boswell, W. R. (2002). Desegregating HRM: A review and synthesis of micro and macro HR
research. Journal of Management, 28, 247–276.
Wright, P. M., Dunford, B. D., & Snell, S. A. (2001). Human resources and the resource-based view of the firm.
Journal of Management, 27, 701–721.
Wright, P. M., Gardner, T. M., Moynihan, L. M., & Allen, M. R. (2005). The relationship between HR practices
and firm performance: Examining causal order. Personnel Psychology, 58, 409–446.
Wright, P. M., & Haggerty, J. J. (2005). Missing variables in theories of strategic human resource management:
Time, cause, and individuals. Management Revue, 16, 164–173.
10 Managing Sustainable Selection Programs

Jerard F. Kehoe, Stefan T. Mol, and Neil R. Anderson

The objective of this chapter is to describe the major features of selection programs that contribute
to their sustainable success. This chapter will focus primarily on the organizational context of selec-
tion programs considering the influence of governance, fit, strategy, and outcomes. The chapter will
not focus on the psychometric technology of selection practices that affect the value of selection
decisions as this content is treated elsewhere in this handbook. However, because the context of per-
sonnel selection science is so directly relevant, the relationship between program sustainability and
the scientific context of validation, research, and data will be explored. Finally, this chapter is the
result of collaboration between psychologists with U.S.- and European-centric professional experi-
ence. The intent is not so much to ensure comprehensive coverage of cultural or national differences between sustainable selection programs as to ensure that this chapter is relevant to modestly diverse cultural and national perspectives and contexts.
Two recent chapters (Tippins, 2002; Roe, 2005) and one article (Klehe, 2004) have addressed the
design and implementation of selection programs. This chapter’s focus on the organizational context
for selection programs complements these earlier works. Tippins (2002) and Roe (2005) focused
primarily on the procedural elements of the selection process itself. For example, both addressed
selection decision processes such as the use of cut scores, multiple hurdle methods, and compensa-
tory methods. Roe (2005) placed somewhat greater emphasis on the planning processes such as the
use of a program of requirements (PoR) for the design of selection systems. Indeed, many of Roe’s
suggestions regarding planning processes could be usefully applied to any of several HR processes.
In contrast, Tippins (2002) focused more on the necessary elements of a fully functioning selection
program, including critical elements such as the management of test materials, test administra-
tion processes, test preparation strategies, and test use rules. Finally, Klehe (2004) focused on the
institutional pressures that may help or hinder the adoption of selection procedures that are recom-
mended by academia.
This chapter completes the picture by addressing the organizational context in which the design,
implementation, and management of selection programs takes place. It should also be noted that this
chapter complements Chapter 9, this volume, which also addresses the organizational context for
selection. In contrast to Chapter 9, this chapter treats the organizational context as an independent
variable, if you will, that influences the features of selection programs necessary to be sustainable.
In Chapter 9, Ployhart and Weekley focus on the organization as the dependent variable by consider-
ing the impact of selection as a human resources (HR) management strategy on the organization.

ORGANIZATION CONTEXT FOR SELECTION


Four layers of organization context and structure will be described here that influence the sustain-
ability of selection programs. These are (a) support of organization purposes, (b) alignment with HR
strategy, (c) acceptability of governance, and (d) effectiveness of process management. At the most
general level, organization purposes create the environment in which the most fundamental deci-
sions about sourcing employees are made. This chapter focuses on organization purposes served by
selection systems, in particular the multiple layers of purposes that can be served.
At the next level, the HR strategy—if it can be defined—is likely to provide essential direction in
establishing the goals and objectives of a selection program and the manner in which it is integrated
with other HR programs and business processes. At the next level of specificity, governance in the
form of operating principles, policies, and guidelines, establishes the authorities, accountabilities,
boundary conditions, roles, and the like, that enable the selection program to function effectively
and efficiently within the context of other HR processes. Finally, selection process management is
the most specific form of structure within which selection programs operate. The elements of pro-
cess management are highly specific to the functioning of a selection program. They can be com-
mon across all units and jobs or they can vary across units and jobs. Figure 10.1 provides a visual
depiction of these sustainability considerations as well as the specific points underlying each one
that are addressed in this chapter.
Before describing the four layers of organizational context that affect sustainability, we offer our
perspective about the meaning of selection system sustainability.

DEFINING SELECTION SYSTEM SUSTAINABILITY


Within the current chapter we apply an institutional perspective (DiMaggio & Powell, 1991; Scott,
1995) to the definition of selection system sustainability by our focus on organization purpose, HR
strategy, governance, and process management. Thus, rather than defining selection system sus-
tainability in terms of economically rational decision-making, which is epitomized in much of the
academic literature pertaining to personnel selection, selection system sustainability is defined here
in terms of a normative rationality that is contingent upon factors at the individual level (e.g., managers’ norms, habits, and unconscious conformity to organizational traditions), the organizational level (e.g., corporate culture, shared belief systems, and political processes), and the societal level (e.g., legislation and professional standards) (Oliver, 1997). In our view, a selection system is sustainable to the extent that its purpose, strategy, governance, and management are consistent with these touchstones. A vital implication of our perspective is that although an organization may have designed a sophisticated selection procedure that displays high validity, reliability, and fairness to begin with, paying insufficient attention to sustainability issues will inevitably result in disuse or, more subtly, a gradual (or even rapid) decline in the psychometric performance of the system over time.

FIGURE 10.1 The four-part model of organizational factors influencing the sustainability of selection programs. The four parts and their underlying elements are (A) support of organization purpose (person-job fit, person-team fit, person-organization fit); (B) alignment with HR strategy (employee skill strategy, management of corporate and unit interests, role of professional selection expertise, influence of employment/staffing delivery, impact of HR technology); (C) acceptability of governance framework (guiding principles, policies); and (D) effectiveness of process management.

ORGANIZATION PURPOSES
Traditionally, personnel selection has been conceived as the fit between the person and their immedi-
ate job role (i.e., P-J fit; Chatman, 1991; Ostroff & Rothausen, 1996). As organizations have become
more delayered, flexible, and team-based in their structures, so the imperative to consider person-
nel selection from other, more superordinate levels of fit has gained momentum among researchers
and personnel practitioners active in employee selection. Principally this has meant that issues of
person-team fit (P-T fit) and person-organization fit (P-O fit) have been added to the selection agenda
over recent years, and the need to consider any selection decision from all three levels of analysis—
person-job fit (P-J fit), P-T fit, and P-O fit—has been increasingly recognized (e.g., Ployhart &
Schneider, 2005). In effect, this has resulted in the criterion space under consideration in selec-
tion being substantially extended to include other levels of fit. Yet this expansion of the criterion
space has only been rather recent, and the research-base upon which I/O psychologists can make
grounded recommendations to organizations to best manage multilevel selection systems remains
underdeveloped. To illustrate this point, two quotes will suffice.

The critical challenge is to expand our conceptual horizon beyond the level of person-job fit and to
incorporate multiple and interactive levels of analysis into selection decision-making. (Herriot &
Anderson, 1997, p. 26)

Reflecting on nearly a century of personnel selection research, it is quite troubling to us that we have
no solid answers … and approaches to answering the questions remain outside of traditional personnel
selection research. We may be able to show how hiring better people contributes to better individual job
performance, but we have hardly examined whether this contributes to better unit-level performance.
(Ployhart & Schneider, 2005, p. 496)

This relative paucity of research into selecting for P-T and P-O fit compared against the mass
of studies into aspects of P-J fit of course leads to problems in making sound recommendations for
the management of selection systems at different levels of analysis (e.g., Anderson, Lievens, van
Dam, & Ryan, 2004; Ployhart, 2007; Schneider, Smith, & Sipe, 2000). Despite this discrepancy in
research coverage, the area of multilevel selection has recently become far more active, and sev-
eral authors internationally have contributed theoretical models (e.g., Ployhart & Schneider, 2002;
Ployhart, 2007; Stevens & Campion, 1994), empirical studies have been published (e.g., LePine,
Hollenbeck, Ilgen, & Hedlund, 1997; Morgeson, Reider, & Campion, 2005), and even validated
measures of P-T fit have appeared in publication (e.g., Burch & Anderson, 2004; Mumford, Van
Iddekinge, Morgeson, & Campion, 2008). In short, the area remains more speculative than settled: the focus to date has been on generating theoretical models and conceptual think-piece papers rather than on publishing robust empirical studies of multilevel selection effects.
Despite these shortcomings, several important implications for the management of selection sys-
tems can be gleaned from the recent literature. In perhaps the most detailed and directly relevant
contribution to the validation of multilevel selection decisions, Ployhart and Schneider (2005) pro-
posed a ten-stage model for the conduct of any such validation study. The stages are as follows:

1. Articulate theory: Operationalize hypotheses of within- and across-level relationships between predictor constructs.
2. Articulate relationships between theory and measurement issues, especially with regard to
data aggregation.
3. Articulate predictors: Define predictor methods and specify their predominant level/levels of analysis.
4. Specify within-level relationships: Operationalize the direction and strength of knowledge, skills, abilities, and other characteristics (KSAO)-criterion relationships within level (i.e., P-J, P-T, and P-O).
5. Specify cross-level relationships: Operationalize contextual effects, cross-level effects,
and multiple-level effects.
6. Sample units: Sample a sufficient number of units to test for within- and cross-level
relationships.
7. Measures: Use appropriate measures for each level of analysis.
8. Aggregation: Test for unit-level variance and the reliability of aggregation procedures (an illustrative sketch follows this list).
9. Analyze data using appropriate procedures.
10. Interpret results giving consideration to within- and cross-level findings.
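Step 8 is the most statistical of these stages, so a brief illustration may be helpful. The sketch below (Python, simulated and entirely hypothetical scores nested in units) computes ICC(1) and ICC(2) from a one-way analysis of variance, the aggregation statistics most commonly consulted when judging whether unit means are reliable enough to support aggregation. The data generation and any cutoffs applied to such values are illustrative choices, not fixed standards.

import random
import statistics

random.seed(7)

# Simulate scores for 20 units with 10 members each (hypothetical data).
units = []
for _ in range(20):
    unit_effect = random.gauss(0, 0.6)  # between-unit variance
    units.append([unit_effect + random.gauss(0, 1.0) for _ in range(10)])

k = len(units[0])  # members per unit (balanced design)
grand_mean = statistics.mean(score for unit in units for score in unit)

# One-way ANOVA mean squares.
ss_between = sum(k * (statistics.mean(unit) - grand_mean) ** 2 for unit in units)
ss_within = sum((score - statistics.mean(unit)) ** 2 for unit in units for score in unit)
ms_between = ss_between / (len(units) - 1)
ms_within = ss_within / (len(units) * (k - 1))

icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
icc2 = (ms_between - ms_within) / ms_between

print(f"ICC(1) = {icc1:.2f}  (proportion of variance attributable to unit membership)")
print(f"ICC(2) = {icc2:.2f}  (reliability of the unit means)")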

This procedure for validating multilevel selection procedures is comprehensive, but it is apparent that only the most statistically versed HR practitioners, supported by an industrial-organizational (I-O) psychologist able to undertake the relevant analyses, would be able to complete it. It is far more conceivable in practice that selectors will approach such decisions in a notably ad hoc manner, will weight different within- and cross-level variables on the basis of notional “rules of thumb” known only to themselves, and will be prone to a gamut of errors brought on by information overload, imperfect information processing, and satisficing in their decision-making strategies. Indeed, this is what we would expect from, say, the vast literature now accumulated on interviewer and assessor decision-making under conditions of information overload.
Yet Ployhart and Schneider’s (2005) model for validation, and thus sustainability management,
is highly valuable in pointing up the complexities of the task facing any practitioner or researcher.
Other authors have highlighted other issues of concern, including the likelihood that maximizing fit
at one level of analysis can lead to declines in fit at other levels of analysis. For instance, Anderson
et al. (2004) proposed three types of effects in cross-level selection decision-making: (a) comple-
mentary, (b) neutral, and (c) contradictory fit. That is, KSAOs being sought by an organization at
the level of P-J fit can either be complementary to P-T and P-O fit, neutral in their overlaps, or more
problematically, contradictory in their effects. For example, high extraversion needed for P-J fit can
be complementary for team-level issues of fit, whereas high rule independence needed
for innovation potential in a research and development (R&D) scientist may militate against P-O
climate fit in an organization that possesses a strong climate in which conformity is valued.
The subdomain of multilevel fit in selection promises to generate novel but far more complex
models of decision-making to support organizational efforts to optimize P-J, P-T, and P-O fit.
However, this field remains at a very embryonic stage of development with mostly theoretical and
model-building contributions published to date. Applied research in field study settings is badly
needed to begin to extend and clarify our understanding of these complexities and how best to
advise organizations to deal with the many issues, challenges, and controversies thrown up by mul-
tilevel fit in employee selection.

HR STRATEGY
HR strategy can vary significantly across organizations. For example, very small or highly entrepre-
neurial organizations may have no formalized HR strategy, whereas large organizations are likely
to have an HR strategy that is professionally developed and integrated to some extent into the busi-
ness strategy. Our experience with medium and large organizations points to the importance of five
key HR strategies in determining characteristics of successful, sustainable selection programs. The first and, perhaps, most important is the organization’s employee skill strategy. (The term “skill”
here is used in the broadest sense to include all employee characteristics that create value for the
organization.) The skill strategy often defines how the organization balances the internal develop-
ment of employee skills (building) with the external acquisition of employee skills (buying). Also,
the skill strategy often determines how the organization uses employee skills once obtained either
by building or buying.
The second strategy is more relevant for larger organizations. How are the interests of units and
the interests of the organization as a whole managed? This is a critical consideration particularly for
regulated HR processes such as employee selection. These first two questions are general and have
implications for many HR programs. The third strategy is specific to the HR function responsible
for the development and validation of selection procedures and processes. Is this design function
positioned within the HR strategy as an expert role that has no ownership of any elements of the HR
strategy as might be found in a Center of Excellence (COE)? Or, is this function positioned as an
owner of the HR strategy for personnel selection? This proves to be an important consideration at
many levels. The fourth question is about the relationship between the responsibility for designing
and developing selection procedures and processes and the responsibility for delivering or manag-
ing employment and staffing functions that house the selection procedures themselves. Finally, we
acknowledge the importance of HR technology for selection program management. This issue is
addressed in considerable detail in Chapter 8, this volume.

EMPLOYEE SKILL STRATEGY


The role of training and development to create skilled employees has a significant impact on the
selection program. Generally, to the extent the organization emphasizes training and development
as the source of employee skills either (or both) of two things may be true of the selection program.
One possibility is that the focus of the selection criteria shifts to general learning ability and com-
plementary general attributes such as conscientiousness, leadership, motivation, and integrity. This
shift is likely to be accompanied by a shift away from job-specific skills such as job knowledge,
work simulations, and high-fidelity situational judgment tests (SJTs).
A more sophisticated version of this shift is that the selection criteria are tailored to the specific
training and development objectives. For example, the job analysis effort preceding the develop-
ment of the selection criteria may describe the knowledge, skill, and ability prerequisites of the
organization’s training and development content and may create selection procedures that target
those prerequisites.
Where the HR strategy focuses on “buying” skills rather than “building” skills, the selec-
tion program is frequently a major mechanism by which this HR strategy is implemented. In
this case, the selection program is likely to emphasize the knowledge, skills, and experiences
necessary to perform the job at some adequate level of proficiency with a minimum of training
or development.
Of course, this feature of an organization’s HR strategy is usually dynamic and depends on
the particular job family, changes in organization budgets and business plans, and the particular
personnel decision for which the selection program is being used. Certainly, knowledge-based jobs
requiring advanced degrees (e.g., high-tech research positions) are virtually always supported by a
“buy” strategy even in the same organization that may adopt a “build” strategy for other jobs (e.g., a
customer service job requiring knowledge of specific product features). Similarly, internal progres-
sion programs that define the bases for promotion within an organization may constitute a build
strategy by relying on specific proficiencies demonstrated in feeder jobs. At the same time, entry
into the feeder jobs may reflect a buy strategy.
This complexity also extends to two more recent work management strategies that impact the
selection program in this way. First, an increasing emphasis on workforce management requires
that information about employees’ current skills be used to make selection decisions about moving
employees to other jobs. In this situation, selection programs may need to focus on two consider-
ations—the relevance of current skills to future work and the relevance of current performance to
future performance in new jobs. In this scenario, the distinction between skills and performance
can be important for a workforce management selection program. This distinction hinges on the
assumption that skills and performance are assessed differently. In our experience, workforce man-
agement strategies that focus on the movement of employees between jobs vary in the extent to
which they rely on records of performance and assessments of skills. Where the movement under
consideration is between two similar jobs, the selection emphasis is often on recent performance
in the current job. Recent performance is usually assessed by referring to administrative records
of recent job performance such as appraisal ratings, salary and bonus awards, progressive work
assignments, and the like. This approach is particularly evident within the realm of expatriate man-
agement, in which selection decisions are typically based on the expatriate’s technical expertise and
domestic track record as opposed to language skills, international adaptability, and other selection
context predictors (Bonache, Brewster, & Suutari, 2001; Harris & Brewster, 1999; Mendenhall &
Oddou, 1985). This may in part be due to the fact that the expatriate position for which the candi-
date is being sought is highly similar to the domestic position this candidate is vacating. In contrast,
where the movement is between dissimilar jobs, the selection focus is more likely to be on skills that
are assessed independently of administrative assessments of performance. Such skill assessments
may include ad hoc supervisor’s ratings of defined skill levels, skill tests, and ad hoc interviews.
Another scenario can be even more problematic for selection programs. Many organizations con-
tract with external organizations to provide workers who perform work in the client organization. If
the client organization does not require contract employees to complete its own selection process to
be assigned to the jobs in question, it almost certainly faces a future dilemma. The dilemma arises
when, as is often the case, the client organization eventually wants to hire a contract employee who
has demonstrated sufficient job success as its own employee. In this case, there is the very real and
predictable likelihood that some significant percentage of successful contract employees will fail
to satisfy the client organization’s selection criteria despite their demonstrated success on the very
job for which they are being hired. This conflict between job success and selection failure can cause
serious harm to the credibility and defensibility of the selection program, although it may be entirely
consistent with the level of validity and the de facto selection rate. Within sustainable selection pro-
grams, a strategy will be pursued to avoid this conflict by requiring all contract employees to satisfy
the selection criteria prior to being assigned to work in the client organization or by establishing
some form of selection policy (see below) that allows recent success in the same job to be a surrogate
for satisfying that job’s selection criteria.
This prospect of having two ways of satisfying selection standards for a job may also manifest
itself where a vacancy may be filled by external applicants or by incumbent employees as part of
internal progression programs. For example, the movement of employees from entry-level techni-
cal positions to higher-level technical positions may be governed by a progression program that
specifies requirements for progression from one level to the next. In such progression programs,
progression requirements are selection criteria, and the employee-applicant often has some degree
of control over the process of satisfying the requirements. Internal progression requirements often
consist of various standards including demonstrated skills, training certifications, and/or current
job performance. In contrast, external hiring into the same job may consist of a different profile
of selection criteria such as education degrees, years of experience, interviews, and perhaps even
qualification test results. It is not uncommon for internal progression requirements to be different
than external hiring criteria for the same position simply because more local information is known
about incumbent employees than about external applicants.
Where such differences occur, it is crucial to give careful consideration to the equivalence
of the two paths to the target position. In many cases, it is very difficult, if not impossible, to
define equivalence psychometrically. There may be few, if any, people who have scores on both
sets of selection criteria. The selection criteria in the two paths may be qualitatively different.
For example, internal progression may rely heavily on administrative documentation of local
workplace behavior such as performance and training achievement, whereas external hiring is
likely to rely on indicators such as degrees and test and interview results. One possible empirical
definition of equivalence is that job hires coming from the two paths tend to perform equally well;
that is, they have the same expected performance level. Other definitions of equivalence may be
rational, rather than empirical. One rational definition is that receiving managers agree the two
sets of standards are equivalent. A similar definition is that receiving managers are indifferent with
regard to the qualifications of the two types of applicants. However established, it is crucial that
the organization establishes the equivalence of the two paths for the two sets of selection criteria
to be simultaneously sustained.
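One way to give the empirical definition of equivalence some teeth is to compare the subsequent performance of hires entering through each path. The sketch below (Python, hypothetical ratings) compares group means and a simple standardized difference; in practice an organization would want larger samples, an agreed band of practical equivalence, and tests chosen by its selection specialists.

import statistics

# Hypothetical first-year performance ratings for hires from the two paths.
internal_progression = [3.8, 4.1, 3.9, 4.3, 3.7, 4.0, 4.2, 3.6]
external_hires = [3.9, 4.0, 3.6, 4.2, 3.8, 4.1, 3.7, 3.9]

mean_int = statistics.mean(internal_progression)
mean_ext = statistics.mean(external_hires)

# Pooled standard deviation (equal group sizes) and standardized mean difference.
var_int = statistics.variance(internal_progression)
var_ext = statistics.variance(external_hires)
pooled_sd = ((var_int + var_ext) / 2) ** 0.5
d = (mean_int - mean_ext) / pooled_sd

print(f"Internal-progression mean: {mean_int:.2f}")
print(f"External-hire mean:        {mean_ext:.2f}")
print(f"Standardized difference (d): {d:.2f}")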

MANAGING CORPORATE AND UNIT INTERESTS


One of the most significant HR strategy factors for the success of selection programs in medium to
large organizations is the manner in which the sometimes conflicting corporate and unit interests
are managed. (To be sure, whatever balance might be achieved between these two interests, it is
likely to be dynamic and will change with business conditions.) In our experience, three dimensions
capture the majority of these issues—funding source, approval roles, and the myriad facets of
location differences.

Funding
It is trite to say that sustainable selection programs must be adequately funded. Although true, that is
not the point being made here. Rather, the manner in which funding for selection programs derives
from corporate budgets and/or unit budgets has great influence on the organizational pressures
acting on the selection program. Where funding is mostly or entirely from corporate budgets and
is relatively distant from the means by which units fund the corporation, it is likely that corporate
interests in defensibility, fit with HR strategy, and perceived fairness and equivalence across units
will take on more importance in the design and management of selection programs. Where unit-based
funding is substantial or, often, even contingent on unit-level satisfaction with selection programs,
the pressures for unit-specific design and management are likely to be much greater. Our view is
that the latter condition is more difficult to manage for selection program managers because it can
create pressures that are more likely to conflict with the professional values of consistency, validity
across broad job families, and job focus. In general, corporate interests tend to have a convergent
influence supportive of a single, whole, integrative selection program whereas unit interests tend
to have a divergent influence that leads to differentiated, multiple selection practices across units.
Divergence of interests is more likely to create fundamental conflicts with the professional and legal
standards for selection programs.

Approval Roles
Two types of approvals are covered here: (a) the approval to implement or change a selection program
and (b) the approval to waive or exempt individuals from the requirements of a selection program.
(More detail about such approvals is provided below in the discussion of policy.) Where these two
approval roles reside in a corporate organization—perhaps even in the corporate organization that
designed and validated the selection procedures—the interests of the corporation are likely to be more
influential than if either or both approval roles reside in the affected units. In many ways, the impact
of approval roles is the same as the impact of funding source. The organizational entity that funds and
approves has more influence. However, we have seen various combinations of funding and approval
roles that create complexity in their mutual impact on selection programs. Indeed, selection programs
may be most sustainable where funding is corporate but approval is local. (The reverse combination of
local funding with corporate approval is unlikely in our experience except in organizations with very
highly centralized authorities.) The relevance of approval roles to sustainable selection programs is
that, at its core, the authority to approve the implementation of, changes to, or exceptions to a selection
program is tantamount to approval authority over the content of the selection program.
It may be difficult to reach agreement to organizationally separate funding and approval roles,
but, when separated, they create a form of checks and balances that may serve to sustain a selec-
tion program across a wider range of circumstances than if both were housed in the same level of
the organization. Corporate funding and local approval give both organizational levels a significant
operational stake in and influence over the selection program that is commensurate with their neces-
sary interests in the program.
The value we place on balancing these interests is rooted in the perspective that the effectiveness
of a selection program (of great local interest) and its defensibility or compliance (of great corporate
interest) are both necessary considerations and both require attention to be optimized. An alterna-
tive perspective we have observed in some organizations is that a selection program’s defensibility
can be quite variable and is assured only by persistent and rigorous attention, whereas the effective-
ness of a selection program can be satisfied more easily by the involvement of professional-level
selection expertise. In effect, this perspective holds that effectiveness can be attained by the exper-
tise of the designer but that defensibility requires continual attention to and alignment among all
processes that comprise a selection system. This latter perspective is less likely to seek a balance
between effectiveness and defensibility and is more likely to place great weight on defensibility.

Location Differences: Expatriate Selection


There is, perhaps, no better manifestation of the potential for location differences to impact selection
strategy than expatriate selection. Conflicts between corporate and unit interests are likely to be particularly salient in multinational companies (MNCs), in which personnel decision-making is further complicated by determining whether expatriates—who can be parent country nationals (PCNs), third country nationals (TCNs), or host country nationals (HCNs)—should be employed.
Welch (1994), in her framework of determinants of international HR management approaches and
activities, has conceived MNC personnel selection to be contingent upon (a) contextual variables
relating to the particular host country environment (i.e., the legal system and cultural distance),
(b) firm-specific variables (e.g., stage in internationalization, type of industry), and (c) situation
variables (staff availability, location of assignment, need for control, and locus of decision). Dowling
and Welch (2004) further added (d) the particular approach to staffing (ethnocentric, polycentric,
regiocentric, or geocentric) that the MNC embraces to this list of antecedents of MNC selection
practices. Within the ethnocentric approach, strategic decisions pertaining to selection are made at headquarters; subsidiaries, which are managed mostly by PCNs, have little or no autonomy in decision-making. The polycentric approach is characterized by more decision-making autonomy
on the part of subsidiary organizations, which are usually also managed by HCNs. Within the
geocentric approach, applicants are drawn from an international pool of executives, and PCNs,
HCNs, and TCNs may be selected into any job in any country depending on their ability. Finally,
the regiocentric approach is similar to the geocentric approach but different in that decision-making
is deferred to regional headquarters.
Although it is beyond the scope of this chapter to consider MNC staffing in detail (see Chapter
36, this volume, for further discussion), the point being made here is that the particular organiza-
tional environment created by these antecedents may compromise selection system sustainability.
For instance, MNCs with a geocentric staffing policy may be forced to revise their selection systems
in light of host country legal regulations and immigration policies enforced to promote the hiring of
HCNs. Similarly, MNCs that seek to exert control over their overseas subsidiary operations through
ethnocentric staffing policies may find that the HCN employees within the subsidiary perceive they
are being unfairly treated in comparison to expatriate PCNs. Finally, MNCs favoring a geocentric
staffing policy may find this selection system unsustainable because of the huge costs involved in
the training and relocation of their HCN, PCN, and TCN staff.
In addition to the above issues, the sustainability of an expatriate selection system may be further complicated by the fact that expatriates occupy a myriad of different occupations across many different countries. The term expatriate may thus be legitimately used to describe a French banker in Hong
Kong and an American geologist working for an oil company in Saudi Arabia. Any standardization
vis-à-vis expatriate selection decision-making is therefore likely to imply the comparison of apples
and oranges. This being the case, Mol (2007) has called for an abandonment of research into expa-
triate selection as such. A multinational bank, for example, might be better off selecting expatriate bankers with the selection system already in place for domestic bankers rather than trying to develop a system tailored specifically to expatriate bankers in Hong Kong.

ROLE OF PROFESSIONAL SELECTION EXPERTISE


A third strategic consideration is the role of the selection professional(s) who designs, develops, and
validates selection programs. There can be several dimensions to the scope of this selection design
role; for example, inside or outside, broad or narrow, and large or small. Among the several pos-
sible dimensions, we view the expert versus owner dimension as the one having the most significant
strategic impact on the sustainability of selection programs.
This dimension refers to the extent to which the selection design and validation role, which
is virtually always supported by professional expertise in some fashion, is accompanied by stra-
tegic ownership responsibilities. These strategic ownership responsibilities might include any of
the following: (a) ownership of the budget for selection design and support work; (b) ownership of
approval authority over features of selection decisions and processes; (c) ownership of data systems
in which selection data about applicants and, more importantly, employees are stored and managed;
(d) authority over use of selection assessment results; (e) ownership of compliance responsibilities
beyond validation, such as monitoring and reporting to enforcement agencies; (f) ownership of
employment delivery functions that manage employment processes, including selection assessment
tools; and (g) ownership of the agenda for the development and maintenance of existing and new
selection programs.
The fundamental issue is the extent to which the organization’s strategic direction for selection
programs is owned by the experts who also design and develop those programs. Of course, there are
probably as many combinations of roles relating to this issue as there are organizations with selec-
tion programs. Nevertheless, we make the simplifying assumption that any combination of such fea-
tures can be placed along a continuum from expert-only at one end to expert-owner at the other end
of a bipolar dimension. Here we describe the ways the expert-only scenario and the expert-owner
scenario can manifest themselves in an organization.

Expert-Only Strategy
In the expert-only strategy, the professionals who design, develop, and validate selection procedures
do not own the strategic direction of the organization’s selection programs. Although they may cre-
ate selection procedures, they do not determine which organizational needs will be addressed by
selection solutions, they do not determine what selection strategies will be applied across units, they
do not have authority over tradeoffs between cost and value of selection procedures, and so on. This
expert-only strategy can manifest itself in various ways. A recent organizational strategy is to house selection experts in HR organizations sometimes called centers of expertise (COEs). These COEs
are positioned as technical resources to the business, which may be accessed by business units as
needed—in the judgment of the business units—to develop HR solutions to business problems.
Similarly, selection experts who are described as internal consultants serve in roles very similar
to COEs. COEs are almost certainly an indication of an expert-only approach. Another clear sign
of an expert-only approach is the situation in which selection experts are funded only on a project-
specific basis. This can be the case whether selection experts are located in corporate or local
organizations. A third sign of an expert-only approach is that the selection experts do not report into
an HR department. Being housed outside of HR almost always means that selection budget funding is closely tied to specific projects rather than to an overall strategic purpose for selection. Yet another,
more dramatic version of the COE approach is outsourcing, in which the selection design role is
moved from the home organization to an external organization to provide design services back to
the home organization. In this case, the role and scope of the selection designer’s work is specified
by a services contract, which is typically overseen and approved by nonselection experts. Finally,
the HR strategy of hiring vendors for professional selection expertise almost certainly indicates an
expert-only role.
The expert-only approach is likely to have several typical consequences for selection programs.
First, it will be difficult to develop a long-term plan for the gradual restructuring or introduction of a
comprehensive selection program or to evolve to become an experimenting organization (cf. Binning
& Barrett, 1989) where selection assessment results might become part of a larger workforce plan-
ning strategy. Second, virtually all authority over the administration of the selection program and
over initial decisions about standards and policies is likely to reside in the funding organization or,
possibly, in the organization responsible for the administration of the selection procedures to can-
didates. Third, the development of selection programs that apply in some consistent fashion across
different organizational units will be difficult. The scope of selection design work is more likely
to have a local focus on particular jobs within particular units. Fourth, local business leaders may
provide stronger support to locally focused selection programs than corporately focused programs
if they see the programs as more directly tailored to their specific needs.

Expert-Owner Strategy
Selection experts who also own selection strategy identify strategic directions by analyzing orga-
nizational needs that cut across units or that have the greatest long-term benefits for the organiza-
tion as a whole. A critical strategic activity for selection owners is long-term planning. Strategic
planning can take many forms but almost always includes collaborative planning with HR leaders
across units of the organization. Such planning would typically focus on common interests across
units as well as unique interests of specific units. As mentioned earlier, particular challenges may be
faced in this regard by expert-owners in MNCs in which subsidiary local idiosyncrasies (such as the
host country legal context and the local labor market) may prevent the establishment of a selection
strategy that cuts across the various units of the organization. Here, multinational organizations will
clearly need to be sensitive to local needs rather than merely attempting to impose a standardized
procedure upon multiple units. Indeed, we would argue that this tension between standardization and country specificity will require active management.
Strategy ownership can manifest itself in various ways. Owners are more likely to have responsi-
bility for neighboring functions such as employment policy, employee research, training of employ-
ment administrative staff, and regulatory compliance that depend on or influence the selection
strategy. Strategy owners may have stronger and longer relationships than expert-only managers
with corporate and unit HR leaders because their roles are more comparable and interrelated. At the
same time, owners may not have strong relationships with unit business leaders because funding is
not as likely to be directly tied to business unit sources. This can be an important consideration for
selection strategy owners. Certainly there is considerable value in well-developed relationships with
HR leaders and with the business leaders they support. Typically, these relationships are not man-
aged independently. One approach is for the selection owner's primary unit relationship to be with the unit's HR leader, with the business-leader relationship managed under the umbrella of the HR-leader relationship.
Strategy ownership has other implications for selection programs, which are more likely to be both
coherently integrated across units of the organization and supported by a common set of policies
and practices. Strategic selection programs are more likely to be integrated with other HR programs
such as training and development, workforce management, compliance support functions, and
organization-wide HR information systems. The selection development function is more likely to
have policy authority regarding managers’ roles in selection decision-making, even if these roles vary
from unit to unit. One of the most tangible possible indicators of strategy ownership would be that
selection developers would have created a selection strategy document used to communicate and plan
with units and other HR functions and to develop selection budgets.
Overall, strategy ownership can be used to build in several features of selection programs that
promote sustainability. The one cautionary note is that strategic ownership tends to align itself
with corporate interests that are longer term and cross-unit. It is critical that the strategic role not
inadvertently lead to weaker relationships with local units where the actual selection decisions
are made.

ALIGNMENT OF SELECTION DEVELOPMENT AND SELECTION DELIVERY


A key issue for selection program managers, although somewhat narrower than the strategic issues
above, is the relationship between the research-based, professional selection development function
and the operational, transaction management function that administers, scores, and uses selection
procedures to make selection decisions. In many organizational arrangements, the development of
selection procedures is an organizationally separate function from the delivery of those same pro-
cedures. Even if the development and delivery functions have a collaborative working relationship,
their budgets often are developed and managed separately.
The primary issue is that these two HR functions are likely to have somewhat different priorities
and success criteria. In our experience, the priorities for selection developers tend to center on issues
of validity such as job relevance, assessment content, and impact on business needs; and on legal
defensibility. Their education, professional standards, and organizational expectations point them in
these directions. In many cases, selection developers’ budgets do not pay for the transactions associ-
ated with administering, scoring, and using the results of selection assessments.
In contrast, the organizations that deliver the selection procedures are often faced with very dif-
ferent measures of success. Their performance is typically measured in terms of cycle time, cost per
hire, average time to fill a vacancy, and other process management metrics. Because selection deliv-
ery is viewed most often as transaction management, its success is measured in terms of transaction
characteristics. Delivery functions may even have service agreements with units that specify target
values for speed and cost metrics. This is now typical in the case of outsourced employment delivery
services. This perspective is reflected in the many available HR benchmarking resources such as the
Saratoga Institute (http://www.pwc.com/us/en/hr-saratoga/index.jhtml) that describe employment
delivery “best practices” in process management terms.
There is frequently a natural tension between quality of selection programs and the speed and
cost of delivering them. Changing employment market conditions may drive per-hire speed down
and cost up. Changes in business conditions may alter the urgency with which vacant positions
must be filled. Managers’ satisfaction with new hires may drop due to changing job requirements or
conditions. Any number of variable factors such as these can create circumstances in which there
is pressure to rebalance the existing combination of quality, speed, and cost. This is a dynamic ten-
sion and how that tension is managed can have a significant impact on the sustainability of selection
programs. The first step toward effectively managing the tension is to determine who “owns” the
conflicting interests. In most cases, the answer is that the employing unit is the ultimate owner of
the interest in quality selection and the interest in the speed and cost of selection. Of course, other
stakeholders such as corporate HR leaders, employment lawyers, and compliance leaders may have
an interest as well.
The second step is to determine how the owner’s interests are assessed. What is the current cost
per hire and how and why has it changed? What are the current turnover rates, new hire failure rates,
sales success rates, and so on? How and why have they changed? Frequently, by virtue of their focus
on transaction management, delivery functions have established performance metrics that continu-
ously track speed and cost metrics. The nature of transaction management is that it often tracks such
factors to respond to changes when they occur. In sharp contrast, developers of selection programs
rarely track quality indicators such as turnover, performance levels, and job requirements on a con-
tinuous basis. One reason is that employee quality data are more difficult to obtain and developers
often spend the effort to gather them only in the context of ad hoc validation studies. Another, per-
haps more fundamental reason is that the quality of selection programs is not viewed in most cases
as highly variable across continuous short time intervals. A third, more subtle reason may be that
selection developers are generally conservative about the "validator's risk" (M. Tenopyr, personal communication, 1988). The validator's risk is the gamble selection developers take with local vali-
dation studies that the study may not support a conclusion of validity. In countries where the regula-
tion of employment selection procedures hinges on validation evidence, selection developers view
validation evidence in a legal context. The validator’s risk combined with the legal context often
results in developers being conservative about the conduct of validation studies. It is very unusual
for developers to have a continuous validation process in place. (A major exception to this is the case
in which the developer is a large consulting house with a selection process implemented in many
client organizations. In this case, the developer may have virtually continuous validation efforts
underway within and across client organizations. This ongoing, large-scale validation strategy tends
to minimize the validator’s risk, maximize defensibility, respond to local client desires for impact
measures, and provide external marketing documentation.)
The net result of these factors is that delivery functions are more likely than development func-
tions to have recent and continuous assessments of factors of interest to the owner. Indeed, with the
growing popularity of customer satisfaction measures for HR processes, selection delivery func-
tions may even have ratings of new hire quality (the developer’s domain) from hiring managers.
Independent of any other considerations, the ready availability of speed and cost metrics compared
with quality metrics can cause speed and cost metrics to be given more weight in the process of
rebalancing quality with speed and cost.
Given the availability of information about quality and speed and cost, the third step is to
determine the decision process(es) by which the current pressure to rebalance interests is resolved.
One efficient approach to these decisions is to establish a two-stage decision process. Stage one
is an administrative process designed to handle routine minor or temporary fluctuations without
directly involving the ultimate owner of the competing interests. Policies and practices can be
established with the owner’s concurrence to resolve such pressures. For example, temporary busi-
ness conditions that increase the urgency with which a vacant position must be filled might be
routinely addressed by an administrative approval process for authorizing a temporary change in
the selection standards. The key features of this first stage are that it is an established process the
owner has endorsed and that the developer and/or deliverer manages the process on behalf of the
owner’s interests.
Stage two is reserved for occasions in which the pressure to rebalance is greater in scope, more
important, less routine, and/or has longer-term implications than the simpler administrative process.
The key difference from the first stage is that here the owner is directly involved in the rebalanc-
ing decision. In stage two, the role of the developer and deliverer is to provide information about
the competing factors and to describe the methods and implications of changes to those factors as
well as constraints on what is possible. They may also be well served to make recommendations to
the owner.
The underlying principle of this two-stage process is that, above some threshold of importance,
the accountability for balancing competing interests of quality, speed, and cost lies with the ulti-
mate owner of those interests. One of the greatest risks to a selection program’s sustainability is the
disengagement of the ultimate owner from key decisions that impact the value and cost of the selec-
tion program for the owner. An important secondary benefit of an owner-based decision process for
rebalancing competing interests is that it occasionally re-engages the owner with the accumulated
information and decisions that give direction to selection programs and that ultimately impact the
owner’s success.

HR TECHNOLOGY
We briefly note here that the manner in which HR technology is applied to selection programs can
have a significant impact on the sustainability of selection systems. This subject is taken up in con-
siderable detail in Chapter 8, this volume.

SELECTION SYSTEM GOVERNANCE


Some amount of governance is inevitable for any organizational process, such as a selection system, that affects the people of an organization. At a minimum, governance of selection processes serves
to promote efficiency, fairness, accountability, and compliance with legal regulations and corporate
mandates. Beyond that, governance can enable more strategic objectives such as promoting people’s
effectiveness and contribution to the organization, facilitating the integration of multiple related
processes and organizational units, and shaping the culture of an organization or ensuring the fit
between culture and process.
Governance of selection processes can be narrow or broad. Narrow governance of a selection program often focuses on legal/regulatory compliance and may take the form of oversight by an
employment attorney or HR policies defining and limiting the use of assessment results. Broader
governance can address a much wider range of process issues such as the acceptability of selection
practices given an organization’s culture, ownership of selection decisions, rules relating to the use
of test scores as discussed by Tippins (2002), the role of local managers and HR staff in support-
ing or participating in selection processes, metrics for managing selection, and corporate and local
authority over the selection processes.
In general, there are two layers of governance—guiding principles and policy requirements.
At the higher level of governance, organizations sometimes establish general principles that guide
various decisions in the development and use of selection programs. At the lower level, policies
shape and control the behavior of stakeholders in selection programs. Both are critical in creating
and sustaining selection programs. It can be useful to think of operating principles as precursors that provide overarching direction for key decisions about selection programs. Selection
policies can be more or less specific. Some policies, such as HR-level policies and guidelines, are
like operating principles in that they are precursors that shape selection programs; other policies are
a function of the legal and technical requirements of effective, defensible selection programs and
follow from those requirements.

GUIDING PRINCIPLES
We describe guiding principles here despite our experience that many organizations do not explic-
itly identify the guiding principles that shape selection programs. Where used, guiding principles
express the implications of an organization’s cultural values for selection programs. One reason
they may be less common is that many organizations have defined and communicated explicit
cultural values that are general in nature (e.g., integrity, respect for others, customer focus, team-
work, and safety), which may be seen as sufficient to provide overall guidance to HR programs,
including selection programs. Also, it is frequently the case that selection programs have a
considerable amount of process-specific governance in the form of policies, systems requirements,
and well-defined practices. Even if guiding principles have a strong influence on the development
of such policies and practices, once those policies and practices are implemented, behavior in
the selection processes may be constrained to the point that guiding principles may have little
incremental value.
Guiding principles apply across all units within an organization. They are not authority-based but,
like general organizational values, are expectations or aspirations. Even where guiding principles
are defined and made explicit for selection programs, there is rarely any enforcement mechanism
put in place other than the general accountability associated with an organization’s code of conduct,
explicitly or implicitly stated.
The benefit of guiding principles is that they provide operationally relevant guidance to selec-
tion programs. The overall purpose of this guidance is to ensure selection programs are designed,
delivered, and used consistent with the values most fundamental to the organization. In this way,
they enhance the sustainability of selection programs by ensuring a fit with the organization. The
following briefly describes selection program operating principles we have found in various large
organizations.

1. People are accountable for selection decisions: This principle fixes the accountability for
selection decisions on the people who make hiring decisions, rather than on rules or algorithms
that might be built into decision support tools. An implication of this principle is that selection
programs should be designed to inform and guide decision-makers, not replace them.
2. Choose the best: This principle establishes a deliberately high standard for who is
selected.
3. Equal access/opportunity: Many organizations will espouse an operating principle relating
to some aspect of fairness in the selection processes. In cultures that place high value on
performance-based results, such as the Latin American societal cluster (Gupta & Hanges,
2004), this principle is unlikely to refer to equal outcomes and is more likely to refer to
some other equity principle such as equal access or equal opportunity.
4. Compliance with government regulations: An operating principle endorsing compliance
with prevailing laws may seem unnecessary. Nevertheless, organizations that choose to
endorse such a principle may do so to set an important tone for all participants in its selec-
tion programs. Communicating about the importance of compliance can have a deterrent effect on risky behavior, and the language an organization uses to describe a compliance principle can have a significant impact on the manner in which compliance is addressed.
5. Selection processes do not replace performance management: This principle addresses
the appropriateness of possible uses of assessment results. Our experience has been that,
occasionally, the relative ease and simplicity of assessment-based selection processes lead
managers to consider ways in which assessment could be used to facilitate other person-
nel decisions. This operating principle would discourage the use of ad hoc assessments as
surrogates for current performance information in situations in which performance has not been well managed.
6. Selection assessment benefits the individual as well as the organization: Organizations
that embrace an explicit commitment to act in the interests of employees and even appli-
cants may endorse some form of principle that selection assessment results should benefit
the people who are assessed. This can be a strong principle that leads to assessment feed-
back, assessment preparation information, and assessment-based development feedback
that might not be provided otherwise.

In summary, guiding principles are intended to provide values-based guidance to the development
and ongoing administration of selection programs as well as to the individuals who make selection
decisions. They are likely to have less ongoing value where an organization develops policies and
practices that impose relatively “tight” constraints on processes and people. But where such policies
and practices are either nonexistent or impose only “loose” constraints, guiding principles may be
more important.

SELECTION POLICY
In contrast to the guidance provided by operating principles, selection policies are prescriptive. They define
authority and roles, establish rules and requirements, and set limits and boundary conditions.
Because policies have direct and explicit impact on the behavior of virtually all participants in the
selection process, they often are seen as the “face” of the selection program. They are some of the
most tangible aspects of a selection program. Selection policies also are perhaps the best indication
of the ownership of the selection program. Because policy establishes authority, policy ownership is
the clearest indicator of selection program ownership.
It is likely that selection programs are among the most policy-oriented of all HR programs and
practices. There are three primary causes. First, in many countries, national and/or local laws regu-
late employment selection. Where it is regulated, organizations are very likely to establish policies
that enable compliance with those legal requirements. Second, employment selection is a high-
stakes process. Employment decisions have large consequences for people on both sides of the deci-
sion. People care a lot about employment decisions, and their interests sometimes conflict. Policies
are often used to manage these interests and the consequences for people. Third, employment selec-
tion is about a scarce, but valuable, resource. A qualified person selected into one job by one unit is
no longer available to be selected by other units for other jobs. Many organizations have found that
policy is required to govern managers’ access to candidates and candidates’ access to jobs.

A Taxonomy of Selection Policy


Perhaps the best way to describe selection policies is to provide a broad taxonomy with instructive
examples in the major cells. But this is not easy. The full set of policies may have accumulated over
time and, on some occasions, without great attention to the possible interrelationships among them.
Indeed, this week’s major conflict may become next week’s new policy. Some policies are ad hoc to
specific issues in specific situations. Nevertheless, a reasonably representative taxonomy of policy
content is one that organizes selection policies into four interrelated categories: (a) selection data
and results, (b) uses of selection results, (c) access to selection processes, and (d) legal compliance.

Policy About Selection Data and Results


This category of policy governs the creation, storage, and access to the formal information used to
make selection decisions. The information ranges from resume information (e.g., degrees, previous
work history, and demographic information) to formal assessment scores and results (e.g., tests,
interviews, inventories, and formalized scores representing past accomplishments, among other
things). Policies govern who may create the data, the rules by which the data are generated, the
place and method by which the data are stored, and access to the data once stored. Policies about
who may create or generate the data usually take the form of role definitions and training require-
ments for the people who administer and score tests, interviews, and other formal assessments as
well as the people and/or system processes that search resumes and codify information into more
manageable formats.
An increasingly important subset of policy regarding selection data and results governs access to
these data. The question is, who may have access to a candidate’s selection data? In our experience,
this policy consideration varies considerably across different types and sizes of organizations. In
many small organizations, formal selection data might be found in HR managers’ files or in local
hiring managers’ files where access is governed informally only by local policies, if any, regarding
access to other managers’ files. In some large organizations where privacy concerns and the legal
salience of such data are important, explicit policies may specifically restrict access to selection data
on a need-to-know basis. In many organizations, access to selection data is treated with the same
protections as are provided for compensation data but with somewhat lesser protections than are
provided for employee’s medical records.
In the United States, a significant consideration is the extent to which and circumstances
under which privacy and confidentiality protections under the Health Insurance Portability and
Accountability Act (HIPAA; 1996) apply to selection data. In general, where selection data are not "health information," where employers are not acting as "healthcare providers" in making selection decisions, and where the selection process does not constitute the provision of "health care," it is unlikely
that selection policy about access to selection data would need to incorporate HIPAA provisions.
A fundamental principle underlying HIPAA protections is that the patient “owns” her health infor-
mation and that the patient’s authorization is required to release and transmit that information.
Notwithstanding HIPAA in the United States, some organizations may choose to establish a selec-
tion data ownership policy that explicitly establishes ownership of selection data. For various reasons, it is unlikely an organization would regard the applicant as the "owner" of her selection data.
However, the organization may establish access rules that protect the interests of candidates to have
the selection data used only for purposes for which they are intended and valid. For example, as a
practical matter, a policy might prohibit managers from accessing an employee’s selection data for
purposes of performance management feedback or promotion consideration to a job for which the
selection data are not relevant.
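Where an organization chooses to formalize such need-to-know restrictions, the policy can be expressed as an explicit mapping of roles to permitted purposes. The sketch below, in Python, is a hypothetical illustration of that idea; the roles, purposes, and rule table are placeholders, not recommendations.

# A minimal sketch of a "need-to-know" access rule for selection data, of the
# kind an explicit access policy might formalize. Roles and purposes are
# hypothetical placeholders.
ACCESS_RULES = {
    # role: purposes for which that role may view a candidate's selection data
    "selection_developer": {"validation_research", "compliance_reporting"},
    "hr_staffing":         {"selection_decision", "compliance_reporting"},
    "hiring_manager":      {"selection_decision"},
    "line_manager":        set(),   # e.g., no access for performance feedback
}

def may_access(role: str, purpose: str) -> bool:
    """Return True only if the role is authorized for the stated purpose."""
    return purpose in ACCESS_RULES.get(role, set())

print(may_access("hiring_manager", "selection_decision"))    # True
print(may_access("line_manager", "performance_feedback"))    # False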

Uses of Selection Data


The broadest category of selection policy includes policies relating to the use
of selection data and results. These policies cover a broad range of topics from initial approval to
implement the selection process, to decisions about specific standards or qualification requirements,
and to questions of alternative ways of satisfying selection standards.
Perhaps the most consequential policy in this category is the policy that establishes the authority
to approve the implementation of, changes to, or discontinuation of a selection process.
The three most likely choices are a corporate HR role, a unit HR role, or a unit business leader role.
Rather than recommend one or the other, the primary point to make regarding this policy is that
it reflects two layers of authority. One layer is the authority granted by the policy to the person/
role who may authorize implementation. For example, a policy may grant a business unit leader
the authority to implement and change a selection process. But there is another layer of authority
implied by the policy itself. This policy, like every other, is issued by some organizational entity
that grants authority to others (in this case to business unit leaders) to authorize the implementation
of a selection program. This meta-authority might be considered the executive owner of selection
programs because it grants operational authorities over the details of the program. It is important
for successful programs that it be clear where the meta-authority lies. That is, it is critical that it be
clear who the policy-maker is, even if others have been granted operational authorities over aspects
of the selection program.
Within this category of selection data uses, a major subcategory consists of the authority(ies)
for the decisions that establish the standards for selection decisions. The standards are the rules
by which the selection results may be used to inform, influence, or dictate selection decisions. For
example, cut scores that determine whether a candidate is qualified or not are standards. Strong
selection programs formalize these standards so that they may be authorized and implemented.
Typically, the authority to authorize standards is the same as the authority to waive a standard in a
particular case or exempt a candidate from having to meet a standard. However, additional policies
may be established to provide for a more administratively feasible process of evaluating and autho-
rizing ad hoc waivers and exemptions. If high-ranking managers or executives own implementation
approval authority, it may be administratively helpful not to involve these time-pressured executives
in all ad hoc requests for waivers or exemptions. In this case, policies may be established that autho-
rize representatives of the executive to evaluate and decide routine waiver or exemption requests.
The policies may even provide guidelines to be considered by the authorizer.
In contrast to policies authorizing ad hoc waiver and exemption decisions, routine exemptions
are usually handled as part of the full set of rules governing the selection program. Routine exemp-
tions refer to known, anticipated conditions under which a candidate is not required to satisfy an
ordinary selection requirement. Three types of standard exemptions are common. First, so-called
grandfathering exemptions refer to conditions in which a candidate for a particular job has already
performed that same job at some satisfactory level of proficiency for a period of time. Grandfathering
rules would exempt such candidates if they satisfy the specific conditions laid out by the rules.
The most common example of grandfathering applies to incumbents in a job when new or revised
selection standards are applied to that job.
A second type of standard exemption relies on an equivalency between two different sets of
selection standards. For example, a work simulation assessing telephone-based customer handling
skills in a sales context may be regarded as assessing the same skills, perhaps at a higher level, as
a similar work simulation designed in a service context. An equivalency policy means that candi-
dates who satisfy a particular standard on one selection procedure are treated as having satisfied
a particular standard on another selection procedure. The third common example of a standard
exemption relies less on an equivalency rationale than on a substitution rationale. For example, a
candidate who has demonstrated certain work experience, training, degrees, or other education/
training experience may be exempt from having to meet a test standard designed to be predictive of
those very accomplishments. In effect, the candidate has accomplished the criterion result the test
was designed to predict.
Selection programs are less prone to incremental erosion of confidence and credibility to the extent that systematic rationales for exemptions can be anticipated and accounted for in the original application rules and taken out of the hands of local, ad hoc decision-makers.
A final example is offered of a policy designed to establish authority for an ad hoc decision about
the use of selection standards. This example is different from the examples above, which rely on
some form of equivalence or substitution rationale. In effect, those rationales are all grounded in
the construct relevance of one set of standards to another set of standards. In contrast, this example
is grounded in what might be called a business necessity rationale. The typical situation is one in
which there is an ordinary set of selection standards for a particular job. For the sake of this exam-
ple, assume this ordinary set of standards is typically satisfied by 20% of the applicants. In all likeli-
hood this set of standards was chosen, in part, because the selection ratio yielded by these standards
enabled the hiring organization to meet its ordinary hiring needs at an acceptable cost and speed.
But business circumstances are always changing. From time to time, the hiring organization may
have an urgent need to substantially increase its hiring rate. For example, in the Netherlands, mandatory military service was lifted in the 1990s, resulting in thousands of unfilled vacancies. In this
case, there can be a compelling business necessity rationale to temporarily or permanently reduce
the selection standards to achieve the increased hire rate necessary to meet the business need. A
policy can be developed to address this special case that allows standards to be temporarily lowered
and may even specify certain conditions that must be satisfied to ensure the need is substantial. At
root, this authority, like the others described above, owns the responsibility to evaluate the ad hoc
tradeoff between the benefits of a faster, easier, less onerous, and possibly fairer-seeming selection process and the potential loss in expected performance among those selected. However the policy assigns authority, it is important for these exemption processes to rely on input from the affected business managers about the impact of the tradeoff on their business.
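The size of the tradeoff at issue in such business necessity decisions can be roughed out quantitatively. The sketch below, in Python, assumes (hypothetically) that predicted performance is normally distributed among applicants, that selection is top-down, and that the selection procedure has an operational validity of .40; it shows how the expected performance of those selected declines as the selection ratio is relaxed from the ordinary 20% toward higher rates.

# A minimal sketch of the quality/quantity tradeoff behind a business
# necessity decision to lower standards. All assumptions are hypothetical:
# standard-normal predicted performance, top-down selection, validity = .40.
from statistics import NormalDist

VALIDITY = 0.40
norm = NormalDist()

def expected_performance(selection_ratio: float) -> float:
    """Mean standardized performance of those selected: validity times the
    mean standard score of the selected (top) portion of the applicant pool."""
    cutoff = norm.inv_cdf(1.0 - selection_ratio)
    mean_selected_score = norm.pdf(cutoff) / selection_ratio
    return VALIDITY * mean_selected_score

for sr in (0.20, 0.35, 0.50):   # ordinary vs. temporarily relaxed standards
    print(f"selection ratio {sr:.0%}: expected performance "
          f"{expected_performance(sr):+.2f} SD above the applicant mean")

Estimates of this kind can give the affected business managers a concrete sense of what is being traded away when standards are temporarily lowered.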

Access to the Selection Process


A third category of policy considerations is about candidates’ access to the selection process. Where
selection processes are in place, they serve as one of the gateways to desired jobs. Candidates who
do not have access to the selection process are effectively excluded from the sought jobs. A typical
selection program will have rules or practices defining how candidates have access to the selection
process. These might be as simple as scheduling requirements or as complex as having to satisfy a
series of prescreening steps, each requiring time and effort.
Some of the most common policy considerations for managing access include retest require-
ments, the ability to complete the assessment processes, physical accessibility, basic qualifications,
restrictions placed on incumbents regarding frequency of internal movement, where and when the
assessment processes may be completed, what organization resources (e.g., proctors and appropri-
ate space) are required to administer assessment processes, and the number of available vacancies
needing to be filled.
Often there are competing interests with respect to policy efforts designed to manage applicants’
access to selection processes. Policies that restrict access often have the direct or indirect effect of
increasing the yield rate among the applicants who do have access under those policies. For exam-
ple, typical retest policies limit applicants’ opportunities to retake selection tests they have previ-
ously failed. Given that retest policies, by their nature, limit the access of people who do relatively
poorly on tests, they are likely to increase the overall yield rate of the selection process by limiting
access to some lower scorers. Of course, the countervailing effect of retesting is that it generally
increases scores by approximately one-fourth of a standard deviation (Hausknecht, Halpert, Di
Paolo, & Moriarty Gerrard; 2007), thus increasing the pass rate among the applicants who retake
tests. However, recent evidence from Lievens, Reeve, and Heggestad (2007) indicated that this score
increase introduces measurement and predictive bias that harm criterion validity. Similarly, policies
that exclude candidates who do not meet basic qualifications such as education and work experience
are, in many cases, more likely to block lower qualified applicants, thus increasing the overall yield
rate. These types of access policies that increase the yield rate will, all else the same, reduce cost
per hire and, possibly, reduce the cycle times for employment processes.
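The interplay between a retest policy and pass rates can be quantified in rough terms. The sketch below, in Python, uses a hypothetical cut score set near the 80th percentile of the applicant distribution and assumes, for simplicity, that every retaker gains the average of roughly one-quarter of a standard deviation reported by Hausknecht et al. (2007).

# A minimal sketch of how an average retest gain of about 0.25 SD can change
# pass rates. The cut score and the uniform-gain assumption are hypothetical
# simplifications.
from statistics import NormalDist

norm = NormalDist()
cut_z = 0.84          # hypothetical cut near the 80th percentile of applicants
gain = 0.25           # approximate average retest gain, in SD units

first_attempt_pass_rate = 1.0 - norm.cdf(cut_z)
# Among initial failures (scores below the cut), the share who would pass
# after a uniform 0.25 SD gain:
retaker_pass_rate = (norm.cdf(cut_z) - norm.cdf(cut_z - gain)) / norm.cdf(cut_z)

print(f"Pass rate on first attempt: {first_attempt_pass_rate:.1%}")
print(f"Pass rate among retakers (uniform +0.25 SD gain): {retaker_pass_rate:.1%}")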
On the other hand, policies that facilitate the access of larger numbers of applicants better ensure
that enough qualified candidates are available at any point in time. Also, they accommodate the
desires of candidates who seek jobs in the organization, thus potentially improving the candidates’
goodwill toward the organization. In addition, increased access may reduce recruiting expenses, all else
the same.

Legal Compliance
Certain selection policies are directed primarily at the organization’s legal compliance responsibili-
ties. The policies we include in this category are the policies that establish authority for monitoring
selection results for evidence of prohibited discrimination, for owning modifications to selection
procedures to improve compliance, for the decisions about whether any modifications to selection
procedures should be made, and for responding to enforcement agencies’ requests for compliance
information.
This category of policies also relates to the question of the “official” database of selection results
for applicants and employees. It is often the case that selection data are formally and informally
located in various files—paper and electronic. Certain selection data such as hiring manager inter-
view ratings and protocols are often kept in local personnel files or even in hiring manager files.
In contrast, other selection data such as formal assessment results, demographic data, and resume
information are often maintained in corporate or unit HR information system databases. Compliance
support policy should specify what the “official” selection database is, how selection data get into
that database, how they are organized there, and who is responsible for putting them there.
An additional consideration regarding compliance policy is that the compliance issues associated
specifically with selection processes are often part of a larger employment and recruiting context.
Enforcement agencies may be as interested in recruiting and sourcing methods, resume searching
and screening, and an organization’s internal staffing system features as they are in detail about
selection procedures, data, and decisions. This broader context of compliance issues often involves
other roles and organizations beyond the development, validation, and maintenance of selection
programs. In this situation of multiple organizations having a role in employment compliance, selec-
tion policy is best integrated with compliance policies of the other organizations. However this inte-
gration takes place, it is advantageous to have a clearly established role with overarching authority
over responses to enforcement agencies.

Authority and Accountability Alignment Principle


A final observation about selection policy is that the sustainability of a selection program relies on
policies that align authority with accountability. As noted above, policies often specify who holds the authority for making decisions about selection programs and where that authority resides. One specific example is the
policy that determines who authorizes the selection standards for a particular selection procedure.
To help make this point about alignment, suppose a new selection procedure is designed to make
hiring decisions for a call center where account representatives talk to customers about their orders,
bills, and payments. The selection procedure consists of a work sample exercise to assess customer
handling skills and a cognitive ability test to assess information learning and processing skills.
In this example, a policy question is, “Who should have the authority to approve the standards by
which these selection procedures are used to make selection decisions?” The standards can take
many forms, including pass-fail cut scores, score bands, and methods of combining the work simu-
lation and cognitive test results. The choice of standards will impact the cost and speed of the hiring
process, the performance of the new hires, and the degree of impact on protected groups, if any. In
determining who should have the authority to approve the final set of standards, the question that
should be asked is, “Who has accountability for the outcomes that will be affected by the approved
standards?” In this example, the accountabilities may be divided among different organizations
and roles. For example, the business leader over the call center operation is likely to have ultimate
accountability for the performance of the account representatives. In some organizations, that same
business leader may also have ultimate accountability for the supporting employment process and
its compliance with prevailing regulations. In this situation, a very strong case can be made that the
business leader who is accountable for all of the most important consequences of the selection deci-
sions should have the authority to approve selection standards. This policy would then, presumably,
define the role of the designer of the selection system, the manager of the employment process, and
the compliance manager as expert resources to the business leader’s decision about the standards.
This situation is an example of high alignment between authority and accountability.
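To make the call-center example concrete, the sketch below, in Python, shows two common ways the work sample and cognitive test results might be turned into standards: a multiple-hurdle rule with a cut on each procedure, and a compensatory composite with a single cut. The weights and cut scores are hypothetical placeholders, not recommendations, and either choice would still require the approval of whoever holds the standards authority.

# A minimal sketch of two ways of expressing selection standards for the
# call-center example. Scores are standardized; weights and cuts are
# hypothetical.
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    work_sample_z: float    # standardized work simulation score
    cognitive_z: float      # standardized cognitive ability test score

def passes_hurdles(c: Candidate, cut_work=-0.25, cut_cog=-0.25) -> bool:
    """Multiple-hurdle standard: the candidate must clear a cut on each procedure."""
    return c.work_sample_z >= cut_work and c.cognitive_z >= cut_cog

def passes_composite(c: Candidate, w_work=0.6, w_cog=0.4, cut=0.0) -> bool:
    """Compensatory standard: a weighted composite with a single cut score."""
    return w_work * c.work_sample_z + w_cog * c.cognitive_z >= cut

applicants = [
    Candidate("A", work_sample_z=0.9, cognitive_z=-0.4),
    Candidate("B", work_sample_z=-0.3, cognitive_z=1.1),
    Candidate("C", work_sample_z=-0.6, cognitive_z=-0.1),
]
for c in applicants:
    print(c.name, "hurdles:", passes_hurdles(c), "composite:", passes_composite(c))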
The point of this subsection is that selection policies contribute to selection system sustainability
in various ways, but that a paramount requirement of selection policies is that the authority granted
by a policy should be aligned with the accountability for the consequences of the decisions made
under the policy. One implication of this alignment principle is that the selection program designer
may not have the authority over all selection-relevant policy decisions. In particular, the authority
to approve the selection standards that drive key business results is most aligned with the role that
“owns” the same business results.

SELECTION PROCESS MANAGEMENT


This chapter has considered several layers of sustainability factors, ranging from organizational-level considerations of fit and HR strategy to operating principles and policies. This sequence has progressed from general to specific: organizational purposes and HR strategy provide general direction for selection programs, and operating principles and policies specify increasingly specific characteristics of sustainable selection programs. Process specifications and process management
are at the most specific end of this continuum. Process is the layer at which the most specific and
detailed characteristics of a selection program are defined and managed. It is not the purpose of this
chapter to consider all the possible variations of selection process detail. That variation is as wide
as the differences between organizations. Rather, this chapter addresses one specific component
of process specification and management that is becoming an increasingly significant factor in the
management of selection programs. This component is the role and application of process metrics
used in the management of selection programs.
It is our observation that the growing emphasis in HR management on HR process benchmarking,
best practices, plug-in systems, and cross-HR process integration is reaching into the management
of selection programs. Clearly, this impetus is coming from trends in the HR management profes-
sion and not from any such trends in the personnel selection profession. For selection practitioners,
the focus of this trend is significantly different from the selection profession’s historically research-
oriented focus on validation, tools, constructs, and predicted outcomes. This change emphasizes
processes and metrics as the mechanisms for managing HR work. We will briefly discuss here the
impact this trend is having on the management of selection programs and will offer suggestions
about strategies for sustaining selection programs in this changing context.
The distinction between the transaction management work of employment process manage-
ment and the “knowledge management” work of the development and validation of selection
programs is important. Like other HR-oriented “knowledge” work (e.g., compensation and labor
relations), the development and validation of selection programs has historically been managed
as an expertise, not a process. In general, the performance standards for these types of work have
been imprecise and general. Typically, the evaluation of a selection developer’s performance in
the development of new selection procedures does not rely on quantified metrics describing the
development process.
Increasingly, the focus on process management has invited the "customers" of employment processes (hiring managers and business leaders) to require metrics of the employment process as the means by which they evaluate the quality of those services. Common employment process
metrics include (a) cycle time measures such as time from requisition to hire and time between
employment process events; (b) flow rates through each step in the employment process (e.g., the
rate at which people who schedule an employment office interview actually show up, complete
the interview, and move on to the next event); and (c) various cost measures such as cost per hire,
cost per candidate, or cost per event such as cost per assessment test or per interview. Clearly,
these process-oriented metrics are affected by the selection procedures and standards produced
by the selection developer, which may be seen as a root cause of satisfactory or unsatisfactory
process metrics.
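As a minimal illustration, the sketch below (Python, with a hypothetical event log and invented field names) computes the three families of metrics just described: cycle time from requisition to hire, a flow rate between two process steps, and cost per hire.

```python
# Minimal sketch with hypothetical data: cycle time, flow rate, and cost per hire.
from datetime import date

requisitions = [  # (requisition_opened, hire_date) -- hypothetical
    (date(2024, 1, 5), date(2024, 2, 20)),
    (date(2024, 1, 12), date(2024, 3, 1)),
    (date(2024, 2, 1), date(2024, 2, 28)),
]
interviews_scheduled, interviews_completed = 80, 52   # hypothetical counts
total_staffing_cost, hires = 45_000.0, len(requisitions)

avg_cycle_days = sum((hired - opened).days for opened, hired in requisitions) / hires
show_up_rate = interviews_completed / interviews_scheduled
cost_per_hire = total_staffing_cost / hires

print(f"Average requisition-to-hire cycle time: {avg_cycle_days:.0f} days")
print(f"Interview show-up/completion rate: {show_up_rate:.0%}")
print(f"Cost per hire: ${cost_per_hire:,.0f}")
```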
Beyond these most typical metrics, additional metrics may be included to capture information
about the quality of the selected employees. The two most frequent examples of quality-of-hire met-
rics are early survival rates (e.g., 3-, 6-, and 12-month survival) and hiring manager (i.e., customer)
ratings of early overall satisfaction with the new hires. However, a fundamental problem is that the
process options available to employment process managers may have little effect on quality-of-hire
metrics. Indeed, options such as enhanced job preview processes and more targeted recruiting prac-
tices, which may offer some improvement in quality-of-hire metrics, may do so at a higher cost.
We suggest an approach that may be helpful for selection program managers facing this challenge, in which employment process metrics create new pressure on the sustainability of selection procedures. Essentially, this approach is to rethink the potential value of process metrics,
not in terms of research value but in terms of business decision value, and change or supplement the
information available to business leaders to help them continuously monitor the benefit of selection
procedures and accompanying usage standards. The research perspective tends to view a selection
program as a relatively fixed, unchanging manifestation of the basic, stable requirements of job suc-
cess. The business process perspective views selection programs as organizational processes in the
context of real-time business conditions that can change rapidly.
These different perspectives have led to very different approaches to the evaluation of selection
procedures and employment processes. From the research perspective, validation is an episodic event, needed only every several years to confirm that the causal model has not changed. From the process perspective, metrics provide a continual stream of information that enables process managers to optimize processes as needed. Business
managers are not trying to confirm scientific conclusions; they are trying to make business decisions
with uncertain data to optimize important outcomes.
Our own view of these divergent perspectives is that, although selection developers
cannot surrender the importance they attach to validation, they would be wise to become more open
to the prescientific value of continuously gathered data about worker behavior such as the quality-of-
hire data gathered by employment process managers. For many reasons, these types of data do not
have the information value of worker behavior data gathered in research settings. But they do have
value for building a more complete understanding of the possible situational dynamics that impact
worker behavior and a deeper understanding of the relationship between worker behavior and the
business outcomes that are most important to work managers.

CONCLUSIONS
This chapter describes the organizational considerations that directly influence the sustainability
of selection programs. The four overarching categories of these organizational considerations are
organization purpose, HR strategy, governance, and process management. Beyond the technical
considerations of validity, utility, bias, and fairness, we make the case that these organizational con-
siderations are critical in designing and implementing a selection program. To the extent that pur-
pose, strategy, governance, and process are deliberately incorporated into the design of the selection
program, the success of that program is better ensured. Inattention to these organizational consider-
ations can undermine the sustainability of a selection program despite its validity.
We also note here that much of this chapter has been written from experience more than research.
The sustainability of selection programs warrants more research attention than has been given in
the past. Psychometric concerns are critical, but any organization that neglects sustainability does
so at its own peril and likely will find, in due course, that the psychometric integrity of its selection
procedures is inevitably compromised.

REFERENCES
Anderson, N., Lievens, F., van Dam, K., & Ryan, A. M. (2004). Future perspectives on employee selec-
tion: Key directions for future research and practice. Applied Psychology: An International Review, 53,
487–501.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential
and evidential bases. Journal of Applied Psychology, 74, 478–494.
Bonache, J., Brewster, C., & Suutari, V. (2001). Expatriation: A developing research agenda. Thunderbird
International Business Review, 43, 3–20.
Burch, G. S. J., & Anderson, N. (2004). Measuring person-team fit: Development and validation of the team
selection inventory. Journal of Managerial Psychology, 19, 406–426.
Chatman, J. A. (1991). Matching people and organizations: Selection and socialization in public accounting
firms. Administrative Science Quarterly, 36, 459–484.
DiMaggio, P. J., & Powell, W. W. (1991). Introduction. In W. W. Powell & P. J. DiMaggio (Eds.), The new
institutionalism in organizational analysis (pp. 1–38). Chicago: University of Chicago Press.
Dowling, P. J., & Welch, D. E. (2004). International human resource management: Managing people in a mul-
tinational context (4th ed.). London, England: Thomson Learning.
Gupta, V., & Hanges, P. J. (2004). Regional and climate clustering of social cultures. In R. J. House, P. J. Hanges,
M. Javidan, P. W. Dorfman, & V. Gupta (Eds.), Leadership, culture, and organizations: The GLOBE
study of 62 societies (pp. 178–218). Thousand Oaks, CA: Sage.
Harris, H., & Brewster, C. (1999). The coffee-machine system: How international selection really works.
International Journal of Human Resource Management, 10, 488–500.
Hausknecht, J. P., Halpert, J. A., Di Paolo, N. T., & Moriarty Gerrard, M. O. (2007). Retesting in selection: A
meta-analysis of coaching and practice effects for tests of cognitive ability. Journal of Applied Psychology,
92, 373–385.
Health Insurance Portability and Accountability Act of 1996. Public Law No. 104-191, 110 Stat. 1936.
Herriot, P., & Anderson, N. (1997). Selecting for change: How will personnel and selection psychology sur-
vive? In N. Anderson & P. Herriot (Eds.), International handbook of selection and assessment (pp. 1–38).
London, England: Wiley.
Klehe, U. C. (2004). Choosing how to choose: Institutional pressures affecting the adoption of personnel selec-
tion procedures. International Journal of Selection and Assessment, 12, 327–342.
LePine, J. A., Hollenbeck, J. R., Ilgen, D. R., & Hedlund, J. (1997). Effects of individual differences on the
performance of hierarchical decision-making teams: Much more than g. Journal of Applied Psychology,
82, 803–811.
Lievens, F., Reeve, C. L., & Heggestad, E. D. (2007). An examination of psychometric bias due to retesting on
cognitive ability tests in selection settings. Journal of Applied Psychology, 92, 1672–1682.
Mendenhall, M., & Oddou, G. (1985). The dimensions of expatriate acculturation: A review. The Academy of
Management Review, 10, 39–47.
Mol, S. T. (2007). Crossing borders with personnel selection: From expatriates to multicultural teams.
Unpublished dissertation, Rotterdam, The Netherlands: Erasmus University.

Morgeson, F. P., Reider, M. H., & Campion, M. A. (2005). Selecting individuals in team settings: The impor-
tance of social skills, personality characteristics, and teamwork knowledge. Personnel Psychology, 58,
583–611.
Mumford, T. V., Van Iddekinge, C. H., Morgeson, F. P., & Campion, M. A. (2008). The team role test:
Development and validation of a team role knowledge situational judgment test. Journal of Applied
Psychology, 93, 250–267.
Oliver, C. (1997). Sustainable competitive advantage: Combining institutional and resource-based views.
Strategic Management Journal, 18, 697–713.
Ostroff, C., & Rothausen, T. J. (1996). Selection and job matching. In D. Lewin, D. J. B. Mitchell, & M. A.
Zaidi (Eds.), Human resource management handbook (pp. 3–52). Greenwich, CT: JAI Press.
Ployhart, R. E. (2007). Organizational staffing: A multilevel review, synthesis, and model. In G. R. Ferris &
J. Martocchio (Eds.), Research in personnel and human resource management (Vol. 23). Oxford,
England: Elsevier.
Ployhart, R. E., & Schneider, B. (2002). A multi-level perspective on personnel selection research and practice:
Implications for selection system design, assessment, and construct validation. In F. J. Yammarino &
F. Dansereau (Eds.), The many faces of multi-level issues: Research in multi-level issues (Vol. 1,
pp. 95–140). Oxford, England: Elsevier.
Ployhart, R. E., & Schneider, B. (2005). Multilevel selection and prediction: Theories, methods, and models.
In A. Evers, N. Anderson, & O. Voskuijl (Eds.), The Blackwell handbook of personnel selection (pp.
495–516). Oxford, England: Blackwell.
Roe, R. A. (2005). The design of selection systems—Context, principles, issues. In A. Evers, N. Anderson, &
O. Smit (Eds.), Handbook of personnel selection (pp. 73–97). Oxford, England: Blackwell.
Schneider, B., Smith, D. B., & Sipe, W. P. (2000). Personnel selection psychology: Multilevel considerations.
In K. J. Klein & S. W. J. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations:
Foundations, extensions, and new directions. San Francisco, CA: Jossey-Bass.
Scott, W. R. (1995). Institutions and organizations. Thousand Oaks, CA: Sage.
Stevens, M. J., & Campion, M. A. (1994). The knowledge, skill and ability requirements for teamwork:
Implications for human resource management. Journal of Management, 20, 503–530.
Tippins, N. T. (2002). Issues in implementing large-scale selection programs. In J. W. Hedge & E. D. Pulakos
(Eds.), Implementing organization interventions: Steps, processes, and best practices (pp. 232–269). San
Francisco, CA: Jossey-Bass.
Welch, D. (1994). Determinants of international human resource management approaches and activities: A
suggested framework. Journal of Management Studies, 31, 139–164.
11 The Business Value of Employee Selection

Wayne F. Cascio and Lawrence Fogli

Hiring good people is hard. Hiring great people is brutally hard. And yet nothing matters more in
winning than in getting the right people on the field. All the clever strategies and advanced technologies
in the world are nowhere near as effective without great people to put them to work.
Jack Welch
Winning (2005, p. 81)

Industrial and organizational (I-O) psychologists have played major roles in developing selection (or
staffing) tools and implementing selection programs at every level for organizations of every size
and in every industry, domestic and multinational. This chapter focuses on evaluating, monitoring,
and managing the business value of employee selection. We begin by offering some general com-
ments about the traditional model of employee selection, or staffing, its focus, and its components,
with particular emphasis on selection as a dynamic organizational process and the rationale for
evaluating the business value of employee selection. We then consider what managers know about
employee selection, the different perspectives of I-O psychologists and managers, and what both
groups should know about the value of employee selection. Following that discussion, we present a
decision-based framework that illustrates the logic of employee selection, with particular emphasis
on assessing the outcomes of selection efforts. Such outcomes may be expressed in qualitative or
quantitative terms, and we illustrate both. We also illustrate the potential payoffs associated with
valid selection programs over multiple cohorts and time periods, and the assessment of employee
performance in economic terms. We conclude with a set of recommendations for managing and
monitoring the business value of employee selection, including trade-offs among managerial con-
cerns for “better, faster, cheaper, with less adverse impact.”

TRADITIONAL MODEL OF EMPLOYEE SELECTION


I-O psychologists have developed a general approach to employee selection that has evolved over
many decades (Cascio & Aguinis, 2010). Essentially, it consists of defining the work to be done, iden-
tifying individual-level characteristics that are hypothesized to predict performance with respect to
the work to be done, and developing measurement instruments to assess the relative standing of job
applicants on each of the individual-level characteristics (Binning & Barrett, 1989). Then applicants
are rank-ordered based on their relative standing, and those with the best scores are selected for the
job in what is called a “top-down” fashion.
Over time, various instruments to predict future job performance have appeared, and such
instruments are quite diverse, as noted in a recent review (Cascio & Aguinis, 2008). In general,
they assess information collected directly from job applicants or indirectly from other sources (e.g.,
past employers). Some types of measures are typically used at the beginning stages of the selection


process as prescreening devices. A set of measures that is consistent with the current staffing
model (Schmidt & Hunter, 1998) may include biographical data collected using application blanks
(Schmitt et al., 2007), integrity tests (Berry, Sackett, & Wiemann, 2007), and drug testing (Haar &
Spell, 2007). Often those job applicants who successfully complete the initial screening stage may
be required to pass a background check (Connerley, Arvey, & Bernardy, 2001) and, if they do,
they may be given paper-and-pencil or computer-administered tests that assess their general mental
abilities (GMAs) and personality traits (Behling, 1998; Frase, 2007; Hogan, Davies, & Hogan, 2007;
Morgeson et al., 2007), followed by an interview (Chapman & Zweig, 2005). Finally, for managerial
and other high-level jobs, there may be an additional stage, including a work-sample test (Roth,
Bobko, & McFarland, 2005) or an assessment center (Lance, Woehr, & Meade, 2007), in which
applicants must demonstrate specific knowledge and skills by performing a limited number of job-
related tasks in a controlled environment. With respect to work-sample tests, we do not mean to
imply that they are appropriate only for high-level jobs. They can also be extremely valid predictors
of performance in other jobs, such as craft jobs (electricians, plumbers, mechanics) and customer
service jobs (Cascio & Aguinis, 2010).
In the past, organizations have benefited from this traditional approach of “pick good people to
get good performance,” but the changing workplace is dramatically redefining personnel selection
(Pearlman & Barney, 2000). The next section describes some of the challenges to the business value
of that approach.

CHALLENGES TO THE BUSINESS VALUE OF THE TRADITIONAL APPROACH
The following list describes seven such challenges:

1. Past behavior may not always predict future behavior (behavioral consistency), particu-
larly if the new job differs in the types of personal characteristics necessary for successful
performance. Past behavior that is relevant to future performance may predict that perfor-
mance effectively.
2. Selection decisions about people and jobs are not independent events in an organization.
Indeed, the broader business value of selection is often linked to other human resources
(HR) processes, such as training, promotion, special assignments, staff reductions, career
development, and succession planning.
3. Hiring managers do not always hire the best scorers. Validated selection techniques are
rarely the only source of information for selection decision-making.
4. Jobs are changing faster than we can do validation studies.
5. Assessing the business value of selection is complex, because different constituents—
managers, applicants, HR professionals, and those who implement selection systems—value
different outcomes.
6. The social context and social psychological processes of selection decisions are often
ignored in the traditional approach. Interpersonal processes in group decision-making are
extremely important to the implementation of selection systems. For example, a particular
decision-maker’s position power, influence, and interpersonal attraction to another person
may be important to understand in selecting employees.
7. Utility calculations that estimate economic returns on investments for valid selection tech-
niques are not widely accepted or understood by business managers. Managers often do
not believe the magnitude of the estimated returns because of their size and also because
of the use of complex formulas with too many estimates and assumptions. To many, the
dollar returns associated with improved performance are not “tangible,” and certainly less
so than the dollars in one’s departmental budget. All of this suggests that few organiza-
tions, if any, view the costs related to selection as investments; rather, they consider them

as expenses. Beyond that, validity coefficients of equal size, say, 0.35, are not necessarily
equally valuable to decision-makers if they reference different criteria. A sales manager,
for example, may or may not view a validity of .35 for predicting organizational citizenship
behaviors as equal in value to a validity of .35 for predicting the dollar volume of sales.1

DYNAMIC, CONTEMPORARY APPROACH TO SELECTION AS AN ORGANIZATIONAL PROCESS
This section presents a broader framework for understanding and valuing selection as an organiza-
tional process. Rather than considering selection as an independent event whose sole purpose is to
identify the best people to perform specific jobs, we propose a broader, macro approach to the busi-
ness “value added” of the selection process. This contemporary approach integrates the traditional
approach as a “good start” for designing selection systems (the use of validated selection tools),
but certainly not “the end.” We begin our discussion by examining four contemporary drivers that
frame selection as a dynamic organizational process:

1. Dynamic change and change management
2. Expectations of multiple organizational stakeholders
3. Selection beyond hiring and HR management
4. The importance of social context and interpersonal processes in selection decisions

DYNAMIC CHANGE AND CHANGE MANAGEMENT


Selection procedures need to be flexible and adaptable to changing organizations. A significant future human capital challenge will be recruiting, staffing, and retention (Society for Human Resource Management Foundation, 2007). The speed of organizational change has been discussed previously in this chapter. Numerous authors have identified changing demographics, the speed of technological change, increased customer expectations, increased competition and globalization, and increased shareholder pressure as the drivers of unprecedented change with the greatest impact on people and jobs (Pearlman & Barney, 2000; Cascio & Aguinis, 2008; Kraut & Korman, 1999; Cascio & Fogli, 2004; Fogli, 2006).
The message is clear: The traditional, static model of selection needs to be “reinvented” or “reen-
gineered” to select people to perform changing jobs in changing organizations. No job or career
today is “safe and secure.” The value of selection for an organization is predicated on how people
perform in the context of changing organizations. Several authors have presented models of job-
person, team-person, and organization-person assessments (Anderson, Lievens, van Dam, & Ryan,
2004; Cascio & Aguinis, 2008; Pearlman & Barney, 2000). Organizations have merged, acquired,
downsized, and reorganized to become more flexible, adaptable, efficient, and high performing.
The impact of change on talent acquisition is twofold: (a) new jobs are created and old jobs are redefined, enriched, or eliminated; and (b) people are recruited, selected, trained, developed, or eliminated.
Pearlman and Barney (2000) noted some significant outcomes of these changes for selection
processes:

• Increased use of performance competencies (variables related to overall organizational fit, as well as personality characteristics consistent with the organization's vision) (Schippmann et al., 2000)
• The value placed on intellectual capital and learning organizations
• The value of speed, process improvement, and customer service

1 We would like to thank Jerard F. Kehoe for suggesting these last two points.

They offered a contemporary model of work performance with two distinguishable compo-
nents: (a) task performance of a specific job, and (b) contextual performance—performance
related to organizational and social performance activities. Contextual performance includes
three levels of analysis: external, organizational, and the immediate work or job context. Table
11.1 describes their model of worker-attribute categories needed to predict success beyond job
performance per se.
The key challenge in predicting performance at any level is that our current selection methods
have demonstrated limited usefulness, despite 80 years of staffing research. Limitations of the cur-
rent approach include the following (Cascio & Aguinis, 2008): a near exclusive focus at the level
of the individual, the assumption of behavioral consistency, a focus on thin slices of behavior and
behavior that may not be representative of actual performance on a job, selection systems that pro-
duce high levels of adverse impact, overestimation of expected economic payoffs from the use of
valid selection procedures, and limited applicability of the traditional model when applied to execu-
tives and expatriates.

TABLE 11.1
Definitions and Examples of Work-Performance Model Worker-Attribute Categories

Aptitude and abilities. Definition: Capacity to perform particular classes or categories of mental and physical functions. Examples: Cognitive, spatial/perceptual, psychomotor, sensory, and physical abilities.

Workplace basic skills (a). Definition: Fundamental developed abilities that are required to at least some degree in virtually all jobs. Examples: Reading, writing, and arithmetic or computational skills.

Cross-functional skills. Definition: Various types of developed generic skills that are related to the performance of broad categories of work activity and that tend to occur across relatively wide ranges of jobs. Examples: Oral communication, problem analysis, interpersonal skills, negotiating, information gathering, organizing, planning, and teamwork skills.

Occupation-specific skills. Definition: Developed ability to perform work activities that occur across relatively narrow ranges of jobs or are defined in relatively job- or activity-specific terms. Examples: Ability to read blueprints, to repair electrical appliances, to operate a milling machine, to operate a forklift, to do word processing.

Occupation-specific knowledge. Definition: Understanding or familiarity with the facts, principles, processes, methods, or techniques related to a particular subject area, discipline, trade, science, or art; includes language proficiency. Examples: Knowledge of financial planning and analysis, fire-protection systems, computer graphics, data communication networks, patent law, Spanish, COBOL, spreadsheet software.

Personal qualities (also known as personality traits, temperaments, or dispositions). Definition: An individual's characteristic, habitual, or typical manner of thinking, feeling, behaving, or responding with respect to self and others, situations, or events. Examples: Adaptability, empathy, conscientiousness, self-esteem, autonomy, sociability, service orientation, emotional stability, integrity, honesty.

Values. Definition: Goals, beliefs, or ideals an individual holds as important and that function as the standards or criteria by which he or she evaluates things. Examples: Empowerment, cooperation, achievement, initiative, work ethic.

Interests. Definition: An individual's characteristic work-related preferences or likes and dislikes regarding specific (or classes of) work activities. Examples: Realistic, investigative, artistic, social, enterprising, and conventional.

(a) Workplace basic skills are differentiated from aptitudes and abilities because of their significant knowledge and learning components.
Source: From Pearlman, K., & Barney, M., Selection for a changing workplace, in J. Kehoe (Ed.), Managing selection in changing organizations: Human resource strategies, Jossey-Bass, San Francisco, CA, 2000. With permission.

Although many existing selection methods perform well, we believe that if selection strategies
are to be more useful in response to rapidly changing organizations, then we need to broaden our
perspectives of the relevant criterion space and criterion constructs, from a focus on predicting task
performance per se, to the kinds of characteristics identified in Table 11.1. Indeed, the construct of
adaptability (Pulakos, Arad, Donovan, & Plamondon, 2000) may well be a key predictor of success
in a rapidly changing organization.
A related consideration for utility researchers is the meaning of the parameter t in the general
utility equation (see Cascio & Boudreau, 2008). Traditionally that parameter represents the average
tenure of individuals in a given job. Perhaps a more nuanced view is to define t as the length of time
that the constructs measured by the current selection system remain relevant. The faster that jobs
and organizations change, the lower the value of t.
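For readers unfamiliar with the general utility equation, a common textbook statement of the Brogden-Cronbach-Gleser model is sketched below; the notation varies somewhat across sources, and the symbols shown here follow widely used conventions rather than any single chapter in this volume.

```latex
% A common form of the Brogden-Cronbach-Gleser utility model (a sketch;
% notation varies across sources).
\Delta U = N_s \, t \, r_{xy} \, SD_y \, \bar{z}_x - C
% N_s       = number of applicants selected
% t         = period over which the performance gain persists (traditionally
%             average tenure; alternatively, the time the measured constructs
%             remain relevant, as suggested above)
% r_{xy}    = validity of the selection procedure
% SD_y      = standard deviation of job performance in dollar terms
% \bar{z}_x = mean standardized predictor score of those selected
% C         = total cost of the selection program
```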

EXPECTATIONS OF MULTIPLE ORGANIZATIONAL STAKEHOLDERS


There are several stakeholders, direct and indirect, who have similar and sometimes different
expectations of the value of selection systems. Among the major stakeholders are line managers,
HR staff, those who implement selection procedures, and applicants. Jayne (2000) has described
the values of these different stakeholders. Line managers value benchmarking evidence about the
“best” selection systems, administrative efficiency (cycle time to fill a position), process metrics
(costs and results), and process reliability to meet the needs of different organizational units. Those
who implement selection systems primarily value administrative efficiency and process metrics.
Applicants value administrative efficiency, relationship to the job, fairness of the process, and qual-
ity of the information received about the job. HR managers are particularly concerned that selection
systems be aligned with diversity and affirmative action goals.
These various constituents of employee selection have similar, but sometimes competing, values and
expectations. Balancing these competing needs is critical to implementing successful selection systems.

SELECTION: BEYOND HIRING AND HR MANAGEMENT


Employee selection is more complex than hiring a qualified employee to perform a particular job.
Higgs, Papper, and Carr (2000) emphasized the important point that selection processes and tech-
niques are often keys to the effective execution of other HR processes. Below is Table 11.2 from
Higgs et al. (2000), describing how other HR processes depend on selection.

TABLE 11.2
HR Processes That Depend Upon Selection

Hiring: Multiple-stage process using various techniques and types of information for mutual selection decision by organization and candidate
Training: Selection for participation in particular training programs
Superior performance evaluation: Selection for limited-frequency ratings or for distribution-controlled rankings
Promotion: Selection for limited promotional opportunities or for job families or job levels with limited population sizes
Special assignments: Selection for assignments to task forces, committees, special projects
Career development: Selection for development processes, programs, or mentors
Succession planning: Selection for inclusion in replacement-planning or succession-planning databases or management-planning sessions

Source: From Higgs, A. C., Papper, E. M., & Carr, L. S., Integrating selection with other organizational processes and systems, in J. Kehoe (Ed.), Managing selection in changing organizations: Human resource strategies, Jossey-Bass, San Francisco, CA, 2000. With permission.

As stated earlier, selection is a key component in the overall life cycle of an individual’s employ-
ment with an organization. That life cycle includes changing jobs and changing people.

IMPORTANCE OF SOCIAL CONTEXT AND INTERPERSONAL PROCESSES IN SELECTION DECISIONS
Ramsay and Scholarios (1999) challenged the traditional I-O psychology selection process for being
too micro in its orientation and for failing to integrate the social context and interpersonal processes
into selection decisions. Beyond the individual differences of job applicants, these authors argue
(and we agree) that the cognitive processes of key decision-makers, organizational characteristics,
group processes, and contextual factors constrain and shape a manager’s actual staffing decisions.
In contrast, the traditional psychometric paradigm of selection necessarily assumes that: (a)
effective (and ineffective) performance in most jobs can be reduced to relatively stable, observable
behaviors and static job demands; (b) intra- and interindividual differences in human capacities
(knowledge, skills, abilities, and other characteristics, or KSAOs) account for most differences in
job performance; and, consequently, that (c) effective staffing decisions depend largely on the effi-
cient processing of information about job-related human capacities.
Actual selection decisions are made based on the social context as well as individual differences.
Boudreau, Sturman, and Judge (1994) and others, including Skarlicki, Latham, and Whyte (1996),
Latham and Whyte (1994), and Whyte and Latham (1997), have raised serious concerns about the
ways that hiring managers actually use selection information in decision-making, specifically, about
how they use “rational” selection data (e.g., test scores). There may be only a weak link between
rational selection information and actual selection decisions. Therefore managers may actually
ignore valid information in their decisions to adopt particular information-gathering procedures,
being more receptive to other, “unscientific” sources of persuasion (Ramsay & Scholarios, 1999).
In fact, important social-psychological phenomena operate in selection decisions, including
interpersonal attraction, interviewer biases in processing information, the power and influence of
managers/executives to shape the perceptions of others, and the inclusion of nonjob-specific behav-
iors (e.g., organizational citizenship and pro-social behaviors) as important selection criteria for
hiring managers (Borman et al., 1997; Motowidlo, 2003; Anderson et al., 2004).
In light of these changes, and the contemporary view of selection as a dynamic organizational
process, it is important that we articulate the rationale for evaluating the business value of employee
selection. The next section considers that topic in greater detail.

RATIONALE FOR EVALUATING THE BUSINESS VALUE OF EMPLOYEE SELECTION


As Rynes, Giluk, and Brown (2007) have noted, management is not truly a profession like medi-
cine, education, or law. There is no requirement that managers be exposed to scientific knowledge
about management, that they pass examinations to become licensed to practice, or that they pursue
continuing education to be allowed to maintain their practice. Although they might not be familiar
with statistical terminology and methodology, the language of science, managers tend to be very
smart people who grasp ideas quickly and process information critically and analytically. To many
of them, employee selection is a cost, not an investment, and, as with any other area of business, they
want to minimize their costs. This is the origin of the mindset and desire of managers for selection
methods that are “better, faster, cheaper, with less adverse impact.”
As we shall demonstrate below, many, if not most, assessments of the outcomes of employee-
selection efforts are expressed in statistical terms, at least in the scientific literature. Because extremely
few managers read such literature, including academic publications (Rynes, Colbert, & Brown, 2002),
they are simply unaware of much of this potentially valuable information. Managers and academics
exist in different “thought worlds” (Cascio, 2007); therefore, an ongoing challenge is to educate
managers about the business value of selection efforts and to enable them to see those efforts as

investments that will generate a stream of benefits over time. We hasten to add that the term “business
value” does not imply that all outcomes must be expressed exclusively in monetary or quantitative
terms. Indeed, as we shall demonstrate, much of the business value of selection may be expressed in
qualitative terms (e.g., improvements in customer service, team dynamics, or innovations).

ASSESSING THE OUTCOMES OF EMPLOYEE SELECTION


In theory, there are multiple strategies for assessing the outcomes of employee selection. In general,
they comprise two broad categories: quantitative (or statistical) and qualitative (or behavioral). Four
common statistical approaches to evaluation are validity coefficients, effect sizes, utility analyses,
and expectancy charts. Of these, validity coefficients and effect sizes are currently most popular.
Validity coefficients are typically expressed in terms of Pearson product-moment correlation
coefficients that summarize the overall degree of linear relationship between two sets of scores:
those on the predictor in question (e.g., a test of GMA) and a criterion (some measure of job perfor-
mance). Chapters 2, 3, and 5 in this volume address the concept of validity, the validation process,
and validation strategies in considerable detail, so we need not repeat that information here.
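As a concrete illustration, the sketch below (in Python, with purely hypothetical scores) computes an observed validity coefficient as the Pearson correlation between predictor scores and criterion ratings; it is offered only to make the arithmetic explicit, not as a substitute for a proper validation study.

```python
# Minimal sketch (hypothetical scores): the validity coefficient is simply the
# Pearson correlation between predictor scores and a job-performance criterion.
from statistics import mean, stdev

predictor = [52, 61, 45, 70, 58, 66, 49, 73]            # e.g., GMA test scores (hypothetical)
criterion = [3.1, 3.8, 2.9, 4.2, 3.5, 4.0, 3.0, 4.4]    # e.g., supervisor ratings (hypothetical)

def pearson_r(x, y):
    mx, my, sx, sy = mean(x), mean(y), stdev(x), stdev(y)
    n = len(x)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (sx * sy)

print(f"Observed validity coefficient r = {pearson_r(predictor, criterion):.2f}")
```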
Using the methods of meta-analysis (Le, Oh, Shaffer, & Schmidt, 2007; Schmidt & Hunter, 2003)
to express cumulative results across validity studies that have used the same predictor over time and
situations, researchers typically have expressed their results in statistical (i.e., correlational) terms.
For example, summarizing the results of 85 years of research findings in employee selection, Schmidt
and Hunter (1998) reported that the top ten predictors of subsequent job performance are GMA tests
(meta-correlation of .51), work-sample tests (.54), integrity tests (.41), conscientiousness tests (.31),
structured employment interviews (.51), unstructured employment interviews (.38), job-knowledge
tests (.48), job-tryout procedures (.44), peer ratings (.49), and ratings of training and experience (.45).
Some validity studies express outcomes in terms of effect sizes. An effect size expresses the degree
to which a phenomenon is present in a population of interest, or, alternatively, the degree to which a
null hypothesis is false (Cohen, 1977). The null hypothesis (“no difference”) always means that the
effect size is zero, as when two tests are compared to determine which one is the better predictor of
some criterion of job performance. Regardless of which statistic is used to compare the results of the
tests (e.g., Pearson product-moment correlation, t, z, or F), each has its own effect-size index. The only
requirement for an effect-size index is that it be a pure (dimensionless) number, one not dependent on
the units of the measurement scales (Cohen, 1977). Examples include the population correlation coef-
ficient or the difference between two means expressed in units of standard deviation. Many studies in
the behavioral sciences express outcomes in terms of effect sizes (e.g., see Murphy & Myors, 1998).
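For reference, two common effect-size indexes and their approximate interconversion (for roughly equal group sizes) are shown below; these are standard formulas rather than anything specific to the studies cited.

```latex
% Standardized mean difference (Cohen's d) and its approximate relation to r
% when the two groups are of roughly equal size.
d = \frac{\mu_1 - \mu_2}{\sigma}, \qquad
r \approx \frac{d}{\sqrt{d^2 + 4}}, \qquad
d \approx \frac{2r}{\sqrt{1 - r^2}}
```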
Many operating executives may be unfamiliar with validity coefficients and effect sizes. Even
when they are, they may view these indexes as too abstract from which to draw implications about
the effects of employee-selection efforts on their businesses. In such situations, utility analyses and
expectancy charts may be valuable, for they express the outcomes of selection in monetary terms or
in terms of the likelihood of success on a job, given a particular level of performance on a selection
procedure. We consider each of these approaches in the following sections.

Utility Analyses
The utility of a selection device is the degree to which its use improves the quality of the individuals
selected beyond what would have occurred had that device not been used (Taylor & Russell, 1939).
Because the technical details of utility analysis have been addressed elsewhere (Boudreau, 1991;
Boudreau & Ramstad, 2003; Cabrera & Raju, 2001; Cascio, 2000, 2007; Cascio & Boudreau, 2008),
we focus here only on the logic of utility analysis as illustrated in Figure 11.1.
At its core, utility analysis considers three important parameters: quantity, quality, and cost.
The top row of Figure 11.1 refers to the characteristics of candidates for employment as they flow
through the various stages of the staffing process. At each stage, the candidate pool can be thought
of in terms of the quantity of candidates, the average and dispersion of the quality of the candidates,

[Figure 11.1 shows two parallel rows. The top row traces talent flows: potential labor pool, labor pool, applicant pool, candidates for further evaluation, offer candidates, new hires, and productive employees. The bottom row lists the staffing processes that produce those flows: building and planning, recruiting, screening, selecting, offering and closing, and on-boarding.]
FIGURE 11.1 The logic of utility analysis. (From Cascio, W. F., & Boudreau, J. W., Investing in people: Financial impact of human resource initiatives, Pearson, Upper Saddle River, NJ, 2008. With permission.)

and the cost of employing the candidates. For example, the “applicant pool” might have a quantity
of 100 candidates, with an average quality value of $50,000 per year and a variability in quality
value that ranges from a low of $25,000 to a high of $100,000. This group of candidates might have
an anticipated cost (salary, benefits, training, and so on) of 70% of their value. After screening and
selection, the “offer candidates” might have a quantity of 50 who receive offers, with an average
quality value of $65,000 per year, ranging from a low of $35,000 to a high of $100,000. Candidates
who receive offers might require employment costs of 80% of their value, because they are highly
qualified and sought-after individuals. Eventually, the organization ends up with a group of “new
hires” (or promoted candidates in the case of internal staffing) that can also be characterized by
quantity, quality, and cost.
Similarly, the bottom row of Figure 11.1 reflects the staffing processes that create the sequen-
tial filtering of candidates. Each of these processes can be thought of in terms of the quantity of
programs and practices used, the quality of the programs and practices, as reflected in their abil-
ity to improve the value of the pool of individuals that survives, and the cost of the programs and
practices in each process. For example, as we have seen, the quality of selection procedures is often
expressed in terms of their validity, or accuracy in forecasting future job performance. Validity may
be increased by including a greater quantity of assessments (e.g., a battery of selection procedures),
each of which focuses on an aspect of KSAOs that has been demonstrated to be important to suc-
cessful performance on a job. Higher levels of validity imply higher levels of future job performance
among those selected or promoted, thereby improving the overall payoff to the organization. As
a result, those candidates that are predicted to perform poorly never get hired or promoted in the
first place. Decision-makers naturally focus on the cost of selection procedures because they are
so vividly depicted by standard accounting systems; but the cost of errors in selecting, hiring, or
promoting the wrong person is often much more important.
Utility analysis has achieved limited success in translating the value of valid selection pro-
cedures into terms that managers and organizational decision-makers understand (Jayne, 2000).
Unfortunately, in many cases such analyses lack credibility because of complex formulas and dol-
lar-based return-on-investment analyses that seem “too good to be true” (Ashe, 1990; Cascio, 1993;
Schmitt & Borman, 1993). Indeed, one may logically ask, if the return on investment associated
with such programs is so high, then why don’t all companies invest substantial amounts of resources
in them? The answer is that the actual returns are likely to be considerably lower than the estimated
returns, because researchers have tended to make simplifying assumptions with regard to variables
like economic factors that affect payoffs and to omit others that add to an already complex mix of
factors.

Some of those other factors are economic factors (the effects of taxes, discounting, and variable
costs), employee flows into and out of the workforce, probationary periods (the difference in perfor-
mance between the pool of employees hired initially and those who survive a probationary period),
the use of multiple selection devices, and rejected job offers. One study used computer simula-
tion of 10,000 scenarios, each of which comprised various values of these five factors (Sturman,
2000). Utility estimates were then computed using the five adjustments applied independently. The
median effect of the total set of adjustments was -91% (i.e., the adjusted values were, on average,
91% lower than the unadjusted values), with a minimum effect of -71% and negative estimates 16%
of the time. Although most utility estimates for the simulated scenarios remained positive, the five
modifications had sizable and noteworthy practical effects. These results suggest that although valid
selection procedures may often lead to positive payoffs for the organization, actual payoffs depend
significantly on organizational and situational factors that affect the quantity, quality, and cost of
the selection effort.
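To make the direction of these adjustments concrete, the sketch below (Python, with entirely hypothetical values; it does not reproduce Sturman's simulation) compares an unadjusted utility estimate with one crudely adjusted for variable costs, taxes, and discounting of future benefits.

```python
# Illustrative sketch only (hypothetical values, not Sturman's simulation):
# an unadjusted Brogden-Cronbach-Gleser estimate, then a crude adjustment for
# variable costs, taxes, and discounting of future benefits.

n_selected   = 50         # hires per cohort (hypothetical)
validity     = 0.35       # operational validity
sd_y         = 20_000.0   # SD of performance in dollars
z_x          = 1.0        # mean standardized score of those selected
years        = 5          # period over which benefits are counted
program_cost = 100_000.0  # total selection cost

unadjusted = n_selected * years * validity * sd_y * z_x - program_cost

variable_cost_rate = 0.30   # portion of performance value offset by variable costs (hypothetical)
tax_rate           = 0.35
discount_rate      = 0.10

annual_benefit = n_selected * validity * sd_y * z_x * (1 - variable_cost_rate) * (1 - tax_rate)
adjusted = sum(annual_benefit / (1 + discount_rate) ** t for t in range(1, years + 1)) - program_cost

print(f"Unadjusted utility: ${unadjusted:,.0f}")
print(f"Adjusted utility:   ${adjusted:,.0f} ({(adjusted - unadjusted) / unadjusted:+.0%} change)")
```

With these particular assumptions, the adjusted figure falls well below the unadjusted one, which is consistent in spirit with (although not a replication of) the simulation results described above.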

Expectancy Charts and Performance Differences Between High and Low Scorers
Expectancy charts allow managers to see graphically, for example, the likelihood that each
quintile of scorers on an assessment procedure will perform successfully on a job. More formally,
organizational or institutional expectancy charts depict the likelihood of successful criterion
performance to be expected from any given level of predictor scores. Individual expectancy charts
depict the likelihood of successful criterion performance to be expected by an individual score
at any given level on an assessment procedure. Figure 11.2 shows these two types of expectancy
charts. Figure 11.2a is an organizational expectancy chart, whereas Figure 11.2b is an individual
expectancy chart.

[Figure 11.2a, an organizational expectancy chart, shows the percentage of successful employees expected at successive cutoffs: select top 20%, 82% successful; top 40%, 71%; top 60%, 65%; top 80%, 60%; all applicants, 53%. Figure 11.2b, an individual expectancy chart, shows the likelihood of success by predictor score range: 1852 and above, 82%; 1827-1851, 61%; 1804-1826, 53%; 1779-1803, 43%; 1778 and below, 26%.]

FIGURE 11.2 (a) Organizational and (b) individual expectancy charts.



The organizational expectancy chart provides an answer to the question, “Given a selection
ratio of .20, .40, .60, etc., what proportion of successful employees can be expected if the future is
like the past?” Such an approach is useful in attempting to set cutoff scores for future hiring pro-
grams. In similar fashion, the individual expectancy chart illustrates the likelihood of successful
criterion performance for an individual whose score falls within a specified range on the predictor
distribution.
Computational procedures for developing empirical expectancies are straightforward, and theo-
retical expectancy charts are also available (Lawshe & Balma, 1966). In fact, when the correlation
coefficient is used to summarize the degree of predictor-criterion relationship, expectancy charts
are a useful way of illustrating the effect of the validity coefficient on future hiring decisions. In
situations in which tests have only modest validities for predicting job performance, test-score dif-
ferences that appear large will correspond to modest scores on the expectancy distribution, reflect-
ing the modest predictability of job performance from test scores (Hartigan & Wigdor, 1989).
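As a sketch of how straightforward those computational procedures are, the following Python fragment (hypothetical data) builds an empirical organizational expectancy table by sorting past hires on their predictor scores and computing the proportion judged successful at each "select top X%" cutoff.

```python
# Minimal sketch of an empirical organizational expectancy chart (hypothetical
# data): for each "select top X%" cutoff, what proportion of those selected
# were later judged successful on the criterion?

# (predictor_score, was_successful) pairs for past hires -- hypothetical
history = [(88, True), (75, True), (91, True), (62, False), (70, True),
           (55, False), (83, True), (67, False), (78, True), (59, False),
           (95, True), (64, True), (72, False), (81, True), (50, False)]

history.sort(key=lambda rec: rec[0], reverse=True)  # best scorers first

for pct in (20, 40, 60, 80, 100):
    k = max(1, round(len(history) * pct / 100))
    selected = history[:k]
    success_rate = sum(ok for _, ok in selected) / k
    label = "All" if pct == 100 else f"Select top {pct}%"
    print(f"{label:>15}: {success_rate:.0%} successful")
```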
Another way to demonstrate the business value of selection—in this case, the value of a testing
program—is to compare performance differences between individuals who score at the top and
bottom of the test-score distribution on job-related criteria. For example, managers can learn that
a bank teller who scored in the top 80% on a test will serve 1,791 customers and refer 22 new cus-
tomers in 1 month, compared to the bottom 20% of test scorers, who will serve only 945 customers
and refer only 10 new customers (People Focus, 1998). Table 11.3 shows performance differences
in job-related criteria across three companies in a Food Marketing Institute study of supermarket
cashiers (Food Marketing Institute, 1985).

TABLE 11.3
Job-Performance Differences Among Supermarket Cashiers Who Score at the Top and Bottom 50% of a Test Battery

Company A
  Amount over or under: Top 50% = 1.53; Bottom 50% = 2.18
  Items per minute: Top 50% = 19.15; Bottom 50% = 17.43
  Rings per minute: Top 50% = 18.18; Bottom 50% = 17.33
  Number of voids: Top 50% = 7.17; Bottom 50% = 9.08
Company B
  Amount over or under: Top 50% = 1.55; Bottom 50% = 2.37
  Items per minute: Top 50% = 21.47; Bottom 50% = 17.67
  Rings per minute: Top 50% = 18.29; Bottom 50% = 16.01
  Number of voids: Top 50% = 6.84; Bottom 50% = 10.99
Company C
  Amount over or under: Top 50% = 1.47; Bottom 50% = 1.94
  Items per minute: Top 50% = 21.60; Bottom 50% = 18.63
  Rings per minute: Top 50% = 15.27; Bottom 50% = 15.92
  Number of voids: Top 50% = 5.83; Bottom 50% = 5.73

In our opinion, expectancy charts, together with illustrations of performance differences between
high- and low-scoring individuals on an assessment procedure, provide credible, tangible evidence
of the business value of selection.

Qualitative (Behavioral) Approaches to Assessing the Outcomes of Employee-Selection Programs
Qualitative outcomes can help to enrich our understanding of the actual operation of selection
programs, including their efficiency and effectiveness. Qualitative outcomes can also contribute to
the nomological network of evidence that supports the construct validity of selection instruments.
That network relates observable characteristics to other observables, observables to theoretical con-
structs, or one theoretical construct to another theoretical construct (Cronbach & Meehl, 1955).
Information relevant either to the construct itself or to the theory surrounding the construct can
be gathered from a wide variety of sources. Here is a practical example (Fisher, 2005). In 2002, J.D.
Power’s customer-satisfaction surveys ranked T-Mobile dead last in its industry, trailing Verizon,
Cingular, Nextel, and Sprint. The first step toward improvement was to bring together T-Mobile’s
HR people and its marketing managers to sit down and talk. The idea was to change the com-
pany’s hiring practices in an effort to improve the quality of customer service representatives who
would be willing and able to follow through on the promises that marketing representatives made
to customers.
Although this might sound like common sense, in practice the customer-contact people did not
report to anyone in marketing or have any contact with them. Nor did anyone in HR, so HR was not able to understand the needs of managers in customer service, who, in turn, needed people in place who could deliver on the marketers' message.
As a result of the in-depth discussions among representatives from customer service, HR, and
marketing, T-Mobile instituted a new set of hiring criteria that emphasized traits like empathy and
quick thinking. After all, customers want their problems resolved fast, in one phone call, and in
a courteous manner. In addition, T-Mobile made sure that all employees knew exactly how they
would be evaluated. By ensuring that HR and marketing were in sync, the company found that its
employee-incentive plans also worked well, because hiring, performance management, and rewards
all were linked to a common message and a common theme.
The broad-based effort paid off. By 2005, attrition and absenteeism each dropped 50% relative
to 2002, while productivity tripled. As for T-Mobile’s formerly exasperated customers, J.D. Power
ranked T-Mobile number 1 in customer service for 2 years running. This example illustrates
nicely how qualitative outcomes can help to enrich our understanding of the actual operation of
selection programs, including their efficiency and effectiveness. That approach certainly helped
T-Mobile.

WHAT MANAGERS KNOW (AND DO NOT KNOW) ABOUT EMPLOYEE SELECTION


Here are six well-established findings in the field of I-O psychology regarding employee selection.
A study of nearly 1,000 HR vice presidents, directors, and managers found that more than 50% of
them actively disagreed with or did not know about the findings (Rynes, Colbert, & Brown, 2002).

1. Intelligence predicts job performance better than conscientiousness (Schmidt & Hunter,
1998).
2. Screening for intelligence results in higher job performance than screening for values or
values fit (Schmidt & Hunter, 1998; Meglino & Ravlin, 1998).
3. Being very intelligent is not a disadvantage for performing well on a low-skilled job
(Hunter, 1986; Schmidt & Hunter, 1998).
4. Personality inventories vary considerably in terms of how well they predict applicants’ job
performance (Barrick & Mount, 1991; Gardner & Martinko, 1996).

5. Integrity tests successfully predict whether someone will steal, be absent, or otherwise
take advantage of employers, although individuals can “fake good” on them (Ones,
Viswesvaran, & Schmidt, 1993; Ones, Viswesvaran, & Reiss, 1996).
6. Integrity tests do not have adverse impact on racial minorities (Ones & Viswesvaran,
1998).

Needless to say, these findings are disturbing, for they indicate that HR vice presidents, directors,
and managers live in a very different world from that of I-O psychologists. To bridge that gap, the
Society for Human Resource Management (SHRM) Foundation (half of whose Board of Directors
are I-O psychologists—academics and practitioners) has pursued an aggressive strategy over the
past 5 years. It commissions reviews of the professional literature in key HR areas (e.g., performance
management, employee selection, retention, reward strategies, employee engagement, and commit-
ment) by knowledgeable professionals and has them “translate” the results of published research
into practical guidelines. An academic and a practitioner review each draft of the report to ensure that it is well organized and jargon-free, the findings are presented clearly, and the implications of the
findings for professional practice are highlighted. The name of this initiative is “Effective Practice
Guidelines,” and each report may be downloaded in PDF form from http://www.shrm.org/about/
foundation/Pages/default.aspx.
We know that senior-level managers are extremely aware of the importance of hiring the right people
for their organizations. For example, the 2006 CEO Briefing by the Economist Intelligence Unit (EIU)
highlights significant management challenges that face the world’s corporate leaders on the basis of
responses from 555 senior executives from 68 countries. Two of the most critical ones are as follows:

1. Recruitment and retention of high-quality people across multiple territories, particularly as competition for top talent grows more intense
2. Improving the appeal of the company culture and working environment

In fact, respondents in the 2006 EIU study, by 45–55%, said that they expect to spend more on
people than on technology in the next 3 years. That’s an important change that highlights the impor-
tance of staffing issues.
More recently, the SHRM Foundation engaged the Hay Group to conduct research to identify the
most pressing human capital challenges faced by chief human resources officers and other C-suite
executives (Hay Group, 2007). There were two key phases of the research: in-depth interviews with
36 C-suite executives and an online survey with responses from 526 C-suite executives. Sixty-nine
percent of the respondents said that recruiting and selecting talented employees to positions was
one of the top three human capital challenges that could affect the ability of their organizations to
achieve strategic objectives.
Together, these results suggest that, whether an organization is purely domestic or international
in its scope of operations, senior managers recognize the critical importance of employee selection
to the achievement of their strategic objectives. There is therefore a pressing need and a ripe oppor-
tunity for I-O psychologists to have a major impact on organizations by demonstrating the business
value of employee selection. Executives need this information to make informed decisions about
selection tools and processes, and they have never been more receptive to it than now, in light of the
key human capital challenges they are facing.

BENCHMARKING CURRENT PRACTICES


Cascio and Fogli (2004) summarized an earlier study by HR Strategies of assessment methods
used to select entry-level managers. Forty-three companies participated, 22 industries were repre-
sented, and 77 programs were described. Table 11.4 shows the most common procedures used to
hire entry-level managers.

TABLE 11.4
Historical Best-Practices Study: Most Common Procedures
Used to Select Entry-Level Managers
Selection Tools Percentage of Programs
Site interview 99
Resume/experience/education 71
Drug screen 59
Campus interview 40
Tests 40
Background check 36
Job performance 32
Other 31
References 28
Academics/grade point average 25
Work simulation 13
Job trial/probation 7
Assessment center 7

Source: Data from Cascio, W., & Fogli, L., Talent acquisition: new realities of
attraction, selection, and retention. Preconference workshop presented
at the annual conference of the Society for Industrial Organizational
Psychology, Chicago, IL, April 2004.

The interview, résumé, and a drug screen were the most frequently used methods. Given the
widespread use of the Internet and the kinds of changes in organizations we described earlier, Cascio
and Fogli (2004) described four current organizational themes in selection and assessment: (a) speed
(results of selection procedures should be fast to process; get results to hiring manager), (b) time
(cannot be time-consuming for applicant or hiring manager), (c) cost (vendors of selection instruments
offer volume discounts and licensing/new pricing strategies), and (d) quality (must be seen as essential
to getting better employees). To be sure, there are trade-offs among these four themes, and speed, time,
cost, and quality may be incompatible goals under certain circumstances. An online measure of ability
to work in cross-functional teams (low cost, fast) may not produce the same level of quality (validity)
as a more expensive, time-consuming work-sample test in the context of an assessment center. There
is no singularly “correct” approach, but the optimal approach to use in a given situation requires a
deep understanding of the needs of decision-makers, the interrelationships of selection with other HR
processes, and the relative importance of each theme to a particular organization and its managers.
Indeed, the advent of online applications and prescreening procedures has led to the funnel and multiple-hurdle hiring processes depicted in Figure 11.3. In terms of the four themes described
above, note how the faster, less costly, less time-consuming procedures (online application and pre-
screening) precede the slower, more costly and time-consuming ones (assessment, interview). From
start to finish, the overall process is designed to ensure the delivery of high-quality employees.
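To make the ordering principle concrete, the following sketch (in Python) models such a funnel as an ordered sequence of hurdles. Everything in it (the stage names, the per-candidate costs, and the pass rules) is hypothetical and intended only to illustrate the logic, not to represent any particular organization's process or the specifics of Figure 11.3.

# Illustrative multiple-hurdle funnel (hypothetical stages, costs, and cut rules).
# Cheap, fast hurdles are applied first; only survivors reach the costlier stages.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hurdle:
    name: str
    cost_per_candidate: float          # illustrative processing cost, not real data
    passes: Callable[[Dict], bool]     # decision rule applied to a candidate record


def run_funnel(candidates: List[Dict], hurdles: List[Hurdle]) -> List[Dict]:
    """Apply hurdles in order; a candidate must clear every hurdle to remain."""
    pool = candidates
    for hurdle in hurdles:
        pool = [c for c in pool if hurdle.passes(c)]
        print(f"{hurdle.name}: {len(pool)} candidates remain")
    return pool


# Hypothetical hurdles, ordered from fast/inexpensive to slow/expensive.
funnel = [
    Hurdle("online application", 1.0, lambda c: c.get("meets_minimum_qualifications", False)),
    Hurdle("prescreening", 5.0, lambda c: c.get("background_clear", False) and c.get("salary_ok", False)),
    Hurdle("assessment", 50.0, lambda c: c.get("assessment_score", 0) >= 60),
    Hurdle("interview", 200.0, lambda c: c.get("interview_rating", 0) >= 3),
]

applicants = [
    {"meets_minimum_qualifications": True, "background_clear": True, "salary_ok": True,
     "assessment_score": 72, "interview_rating": 4},
    {"meets_minimum_qualifications": True, "background_clear": False, "salary_ok": True,
     "assessment_score": 80, "interview_rating": 5},
]

finalists = run_funnel(applicants, funnel)

The design point is simply the ordering: the most expensive, most time-consuming procedures are reserved for the smallest surviving pool, which is how such a process balances speed, time, cost, and quality.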

LOOKING TO THE FUTURE: THE NEED FOR IMPROVED SELECTION PROCESSES


Pearlman and Barney (2000) described the need to design more cost-effective and less time-con-
suming selection procedures that may include:
• The use of carefully constructed applicant self-rating or supplemental application forms
(for self-assessment of occupation-specific skills and knowledge).
• The use of other forms of structured, applicant-provided information (such as work-
experience questionnaires).

FIGURE 11.3 Today’s typical assessment process: online application → prescreening (background check; salary requirements and hours; minimum-qualification questions) → assessment (aptitudes; work sample; personality; honesty) → interview → job offer extended.

• The use of college or high school transcripts.


• The use of required prehire training as a prescreening procedure.
• The use of realistic job previews (via paper, video, Internet, telephone, or other media).
• The use of telephone- or interactive voice-response-based prescreening on basic job quali-
fications and work preferences. This can also usefully incorporate realistic job-preview
information on work content (major tasks, responsibilities, and outputs) and work context
(the work setting and facility, pay, hours, special work requirements, and other conditions
of employment).
• The use of self-administered online practice tests or simulations that provide diagnostic feedback regarding one’s probability of success on the operational selection process and that provide linkages to related training.

Assuming an organization is willing to trade off speed and time for quality (validity), Cascio and Aguinis (2008) recommended, where feasible, the use of internships, contingent work arrangements, and virtual-reality technology to predict future “in situ” performance. In situ performance reflects the broad range of influences—situational, contextual, strategic, and environmental—that may affect individual, team, or organizational performance. Such specification provides
a richer, fuller, context-embedded description of the criterion space that we wish to predict. As an
example of this approach, consider that United Parcel Service routinely hires many of its full-time
employees from the ranks of contingent workers—almost half of its workforce in 2006, with 55%
of them college students (Bolgar, 2007).

CONCLUSIONS: HOW SHOULD WE VALUE THE SUCCESS OF SELECTION IN ORGANIZATIONS?
Earlier in this chapter we argued that the traditional approach of developing valid selection methods
to predict individual job performance is only “the start” of selection as an organizational process.

We agree with Higgs et al. (2000) that it is important to adopt a broader perspective of successful
selection processes. Those processes should be assessed in terms of criteria that include empirical
validity, face validity, selection ratios, marketability of the selection effort, demonstration of the
business value of selection using quantitative as well as qualitative metrics, the effectiveness of the
management of selection processes, candidate reactions to those processes, overall expense, and the
timeliness with which selection decisions can be made.

RECOMMENDATIONS AND FUTURE DIRECTIONS


I-O psychologists need to do more as professionals than simply develop selection systems character-
ized by sound psychometric qualities. Our role cannot be limited to that of technicians, because our
responsibility does not end after developing a valid selection system. It does little good to say, “I
developed a valid selection system, but the organization misused it.” We need to be better scientists/
practitioners in integrating selection systems into organizations. Beyond traditional technical psycho-
metric competencies, we need to provide facilitation and process-consulting skills within the business
context. As selection-system developers, we need to implement change-management techniques (e.g.,
overcoming resistance to change) and practices (e.g., involvement and participation) when implement-
ing these systems. We need to extend our role as scientists/practitioners to achieve successful imple-
mentation of selection systems within the context of specific organizational characteristics, social
contexts, and interpersonal processes. The Society for Industrial and Organizational Psychology
(SIOP) could provide education and skill development (e.g., workshops whose objectives are to develop skills and techniques that facilitate the implementation of selection systems in organi-
zational contexts). Graduate training and internships could require students to demonstrate competen-
cies in facilitation and process-consulting skills related to implementing selection systems.
Consider a real-world example that is based on the direct involvement of one of the authors. The
organization in question is a major fashion retailer that has used a recruitment/selection system for
over 10 years to hire college graduates from the most prestigious universities for a management
development program. The continued success of this program can be attributed to some fundamen-
tal development and implementation practices, including:

1. Involvement of managers, executives, decision-makers, and HR in selection-technique development and implementation. This includes job analysis, simulations, and interviews.
2. Updating job and competency requirements and selection techniques to meet the requirements of changing management jobs.

College graduate candidates are screened on college campuses with interviews targeted to man-
agement competencies and organizational success factors. Candidates who pass the on-campus
screen are invited to a 1-day assessment at the corporate headquarters. This 1-day assessment
includes (a) learning financial analysis skills, (b) participating in a group-based leadership exercise to improve retail-store effectiveness, and (c) participating in two panel-group interviews. All assessors are trained to evaluate candidate performance using behavioral benchmarks and standards. Standards and guidelines for both independent and consensus ratings are required. An assessor conference is held after the 1-day
assessment. Selection-technique data, including ratings and behavioral observations, are reported to
the assessors, who include managers, executives, incumbents, and HR staff. Guidelines are provided
to establish bands of scores and to make successful decisions.
Observations of why this selection system has been successful in predicting job success and in
becoming integrated into the culture of the business include:

1. The original development and updates to the selection system have involved multiple orga-
nizational participants, including executives, managers, job incumbents, and representa-
tives from recruitment, staffing, and training.

2. Hiring decisions are made in a 1-day session with all key decision-makers involved.
3. Selection techniques are developed in the business context and are updated at least every
3 years. Interviews contain behavior-description questions and situational questions. The
leaderless-group-competition exercise requires candidates to visit company and competi-
tors’ stores and to read consumer information regarding trends and the latest company
strategies for business development.
4. Assessors self-monitor and also monitor each other to evaluate candidates using behaviors/
benchmarks related to competencies and success factors.

In our opinion, the key to successful implementation of a selection system is to involve decision-
makers and stakeholders. Development of selection techniques is therefore a necessary, but not sufficient, condition for their successful acceptance and use by decision-makers. Implementation is
an ongoing challenge.

REFERENCES
Anderson, N., Lievens, F., van Dam, K., & Ryan, A. M. (2004). Future perspectives on employee selection: Key
directions for future research and practice. Applied Psychology: An International Review, 53, 487–501.
Ashe, R. L., Jr. (1990, April). The legality and defensibility of assessment centers and in-basket exercises. Paper
presented at the meeting of the Society for Industrial and Organizational Psychology, Miami Beach,
FL.
Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Behling, O. (1998). Employee selection: Will intelligence and conscientiousness do the job? Academy of
Management Executive, 12, 77–86.
Berry, C. M., Sackett, P. R., & Wiemann, S. (2007). A review of recent developments in integrity test research.
Personnel Psychology, 60, 271–301.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential
and evidential bases. Journal of Applied Psychology, 74, 478–494.
Bolgar, C. (2007, November 15). High performance: How talent creates a competitive edge and powers organi-
zations to success. The Wall Street Journal, p. A6.
Boudreau, J. W. (1991). Utility analysis for decisions in human resource management. In M. D. Dunnette &
L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 2, pp. 621–
745). London, England: SAGE Publications.
Boudreau, J. W., & Ramstad, P. M. (2003). Strategic industrial and organizational psychology and the role of
utility analysis models. In W. C. Borman, D. R. Ilgen, & R. J. Klimoski (Volume Eds.), Handbook of
psychology, Vol. 12, Industrial and organizational psychology (pp. 193–221). Hoboken, NJ: Wiley.
Boudreau, J. W., Sturman, M. C., & Judge, T. A. (1994). Utility analysis: What are the black boxes, and do they
affect decisions? In N. Anderson & P. Herriot (Eds.), Assessment and selection in organizations. Methods
and practice for recruitment and appraisal (pp. 77–96). New York, NY: John Wiley.
Cabrera, E. F., & Raju, N. S. (2001). Utility analysis: Current trends and future directions. International Journal
of Selection and Assessment, 9, 92–102.
Cascio, W. F. (1993). Assessing the utility of selection decisions: Theoretical and practical considerations. In
N. Schmitt & W.C. Borman (Eds.), Personnel selection in organizations (pp. 310–340). San Francisco,
CA: Jossey-Bass.
Cascio, W. F. (2000). Costing human resources: The financial impact of behavior in organizations (4th ed.).
Cincinnati, OH: South-Western.
Cascio, W. F. (2007). Evidence-based management and the marketplace for ideas. Academy of Management
Journal, 50(5), 1009–1012.
Cascio, W. F., & Aguinis, H. (2008). Staffing 21st-century organizations. Academy of Management Annals, 2, 133–165.
Cascio, W. F., & Aguinis, H. (2010). Applied psychology in human resource management (7th ed.). Upper Saddle River, NJ: Pearson Prentice Hall.
Cascio, W. F., & Boudreau, J. W. (2008). Investing in people: Financial impact of human resource initiatives.
Upper Saddle River, NJ: Pearson.

Cascio, W. F., & Fogli, L. (2004, April). Talent acquisition: New realities of attraction, selection, and retention.
Pre-conference workshop presented at the annual conference of the Society for Industrial Organizational
Psychology, Chicago, IL.
Chapman, D. S., & Zweig, D. I. (2005). Developing a nomological network for interview structure: Antecedents
and consequences of the structured selection interview. Personnel Psychology, 58, 673–702.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences (rev. ed.). New York, NY: Academic
Press.
Connerley, M. L., Arvey, R. D., & Bernardy, C. J. (2001). Criminal background checks for prospective and
current employees: Current practices among municipal agencies. Public Personnel Management, 30,
173–183.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52,
281–302.
Economist Intelligence Unit (2006, January). CEO briefing: Corporate priorities for 2006 and beyond.
Retrieved February 24, 2006, from http://www.eiu.com
Fisher, A. (2005, November 28). For happier customers, call HR. Fortune, p. 272.
Fogli, L. (2006, July). Managing change in turbulent times. Paper presented at the Sixth International
Conference on Knowledge, Culture and Change in Organizations. Monash University Centre, Prato,
Italy.
Food Marketing Institute. (1985). Cashier test battery administrator’s manual. Washington, DC: Author.
Frase, M. J. (2007, December). Smart selections: Intelligence tests are gaining favor in the recruiting world as
reliable predictors of future executive performance. HR Magazine, pp. 63–67.
Gardner, W. L., & Martinko, M. J. (1996). Using the Myers-Briggs type indicator to study managers: A litera-
ture review and research agenda. Journal of Management, 22, 45–83.
Haar, J. M., & Spell, C. S. (2007). Factors affecting employer adoption of drug testing in New Zealand. Asia
Pacific Journal of Human Resources, 45, 200–217.
Hartigan, J. A., & Wigdor, A. K. (Eds.). (1989). Fairness in employment testing: Validity generalization, minor-
ity issues, and the General Aptitude Test Battery. Washington, DC: National Academy Press.
Hay Group. (2007, October 23). SHRM Foundation research on human capital challenges. Washington, DC:
Hay Group.
Higgs, A. C., Papper, E. M., & Carr, L. S. (2000). Integrating selection with other organizational processes and
systems. In J. Kehoe (Ed.), Managing selection in changing organizations: Human resource strategies
(p. 94). San Francisco, CA: Jossey-Bass.
Hogan, J., Davies, S., & Hogan, R. (2007). Generalizing personality-based validity evidence. In S. M. McPhail
(Ed.), Alternative validation strategies (pp. 181–229). San Francisco, CA: Jossey-Bass.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of
Vocational Behavior, 29, 340–362.
Jayne, M. E. A., & Rauschenberger, J. M. (2000). Demonstrating the value of selection in organizations. In
J. Kehoe (Ed), Managing selection in changing organizations: Human resource strategies (pp. 123–157).
San Francisco, CA: Jossey-Bass.
Kraut, A. I., & Korman, A. K. (Eds.). (1999). Evolving practices in human resource management: Responses
to a changing world of work. San Francisco, CA: Jossey-Bass.
Lance, C. E., Woehr, D. J., & Meade, A. W. (2007). Case study: A Monte Carlo investigation of assessment
center construct validity models. Organizational Research Methods, 10, 430–448.
Latham, G. P., & Whyte, G. (1994). The futility of utility analysis. Personnel Psychology, 47, 31–46.
Lawshe, C. H., & Balma, M. J. (1966). Principles of personnel testing (2nd ed.). New York, NY: McGraw-
Hill.
Le, H., Oh, I., Shaffer, J., & Schmidt, F. L. (2007). Implications of methodological advances for the prac-
tice of personnel selection: How practitioners benefit from meta-analysis. Academy of Management
Perspectives, 21, 6–15.
Meglino, B. G., & Ravlin, E. C. (1998). Individual values in organizations: Concepts, controversies, and
research. Journal of Management, 24, 351–389.
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007).
Reconsidering the use of personality tests in personnel selection contexts. Personnel Psychology, 60,
683–729.
Motowidlo, S. J. (2003). Job performance. In W. C. Borman, D. R. Ilgen, & R. J. Klimoski (Eds.), Comprehensive
handbook of psychology. Industrial and organizational psychology (Vol. 12, pp. 39–53). New York, NY:
Wiley.
Murphy, K. R., & Myors, B. (1998). Statistical power analysis. Mahwah, NJ: Lawrence Erlbaum.

Ones, D. S., & Viswesvaran, C. (1998). Gender, age, and race differences on overt integrity tests: Results across
four large-scale job applicant data sets. Journal of Applied Psychology, 83, 35–42.
Ones, D. S., Viswesvaran, C., & Reiss, A. D. (1996). Role of social desirability in personality testing for per-
sonnel selection: The red herring. Journal of Applied Psychology, 81, 660–679.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities:
Findings and implications for personnel selection and theories of job performance. Journal of Applied
Psychology, 78, 679–703.
Pearlman, K., & Barney, M. (2000). Selection for a changing workplace. In J. Kehoe (Ed.), Managing selection
in changing organizations: Human resource strategies (pp. 3–72). San Francisco, CA: Jossey-Bass.
People Focus (1998). Bank of America new hire assessment predictive validation report. Pleasant Hill, CA:
Author.
Pulakos, E. D., Arad, S., Donovan, M. A., & Plamondon, K. E. (2000). Adaptability in the workplace:
Development of a taxonomy of adaptive performance. Journal of Applied Psychology, 85, 612–624.
Ramsay, H., & Scholarios, D. (1999). Selective decisions: Challenging orthodox analyses of the hiring process.
International Journal of Management Reviews, 1, 63–89.
Roth, P. L., Bobko, P., & McFarland, L.A. (2005). A meta-analysis of work sample test validity: Updating and
integrating some classic literature. Personnel Psychology, 58, 1009–1037.
Rynes, S. L., Colbert, A. E., & Brown, K. G. (2002). HR professionals’ beliefs about effective human resource
practices: Correspondence between research and practice. Human Resource Management, 41, 149–174.
Rynes, S. L., Giluk, T. L., & Brown, K. G. (2007). The very separate worlds of academic and practitioner
periodicals in human resource management: Implications for evidence-based management. Academy of
Management Journal, 50, 987–1008.
Schippmann, J. S., Ash, R. A., Battista, M., Carr, L., Eyde, L. D., Hesketh, B., et al. (2000). The practice of
competency modeling. Personnel Psychology, 53, 703–740.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychol-
ogy: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Schmidt, F. L., & Hunter, J. E. (2003). History, development, evolution, and impact of validity generalization
and meta-analysis methods, 1975–2002. In K. R. Murphy (Ed.), Validity generalization: A critical review
(pp. 31–66). Hillsdale, NJ: Lawrence Erlbaum.
Schmitt, N., & Borman, W. C. (Eds.). (1993). Personnel selection in organizations. San Francisco, CA: Jossey-
Bass.
Schmitt, N., Oswald, F. L., Kim, B. H., Imus, A., Merritt, S., Friede, A., & Shivpuri, S. (2007). The use of
background and ability profiles to predict college student outcomes. Journal of Applied Psychology, 92,
165–179.
Skarlicki, D. P., Latham, G. P., & Whyte, G. (1996). Utility analysis: Its evolution and tenuous role in human
resource management decision making. Canadian Journal of Administrative Sciences, 13, 13–21.
Society for Human Resource Management Foundation. (2007, December 13). Top U.S. executives see rapidly
shrinking talent pool: HR survey finds recruiting and retaining tomorrow’s leaders and skilled employ-
ees is number one priority. Retrieved December 15, 2007, from http://www.shrm.org/about/foundation/
Pages/default.aspx
Sturman, M. C. (2000). Implications of utility analysis adjustments for estimates of human resource interven-
tion value. Journal of Management, 26, 281–299.
Taylor, H. C., & Russell, J. T. (1939). The relationship of validity coefficients to the practical effectiveness of
tests in selection. Journal of Applied Psychology, 23, 565–578.
Welch, J. (2005). Winning. New York, NY: HarperBusiness.
Whyte, G., & Latham, G. (1997). The futility of utility analysis revisited: When even an expert fails. Personnel
Psychology, 50, 601–610.
Part 3
Categories of Individual Difference Constructs for Employee Selection
David Chan and Leaetta Hough, Section Editors
12 Cognitive Abilities

Deniz S. Ones, Stephan Dilchert, Chockalingam Viswesvaran, and Jesús F. Salgado

Intelligence affects individuals’ lives in countless ways. It influences the work lives of employees perhaps to a greater extent than any other individual-differences trait. It determines whether an
employee will be able to perform assigned tasks. It is the strongest predictor of learning and acquisi-
tion of job knowledge as well as overall job performance. It is remarkably relevant regardless of the
occupation one holds. It even predicts extrinsic career success (i.e., earnings and promotions). As
such, it is an exceedingly precious trait to include in employee selection systems.
In this chapter, we provide an overview of cognitive ability’s key role in staffing organizations
and provide evidence-based practice recommendations. We first present a brief synopsis of the
history, current usage, and acceptance of cognitive ability tests in employee selection. Second, we
highlight the theoretical underpinnings of cognitive ability as a construct. Third, we discuss devel-
opments in its measurement. Fourth, we present an overview of the criterion-related validity of cog-
nitive ability tests in predicting valued work behaviors and outcomes. Fifth, we discuss the issue of
group differences in cognitive ability test scores both within the United States and internationally.
We conclude by discussing future research and practice challenges.

HISTORY, CURRENT USAGE, AND ACCEPTABILITY OF COGNITIVE ABILITY MEASURES IN EMPLOYEE SELECTION
It has been more than 100 years since the publication of Spearman’s influential (1904) article
“ ‘General Intelligence,’ Objectively Determined and Measured.” 1 Very early in the 20th century,
researchers began to study the usefulness of cognitive ability measures for predicting learning and
performance in educational and organizational settings. However, over the next few decades results
revealed much variability, particularly with regard to the supposed usefulness of such measures to
predict job performance. It seemed that the specific jobs under investigation, specific organizational
settings, the particular ability measures used, and many unidentified (and unidentifiable) factors
all contributed to the variability of observed results (e.g., Hull, 1928). Moreover, validation results
differed even when jobs, organizations, and measures were held constant. Industrial psychologists
came to believe that subtle, undetectable differences in situations were responsible for differences
observed in the predictive value of the tests studied. By the 1960s, this belief in situational specific-
ity dominated the scientific literature and was well entrenched among practitioners (see Chapter 42,
this volume). The breakthrough came in the 1970s. Industrial psychologists Frank Schmidt and
Jack Hunter demonstrated (Schmidt & Hunter, 1977) that most differences observed across stud-
ies of cognitive ability were due to sampling error (sample sizes for validation studies in the 1960s

1 For the 100th anniversary of that article’s publication, the Journal of Personality and Social Psychology published
a special section attesting to the impact of cognitive ability on a multitude of life domains (Deary, Whiteman, Starr,
Whalley, & Fox, 2004; Gottfredson, 2004a; Kuncel, Hezlett, & Ones, 2004; Lubinski, 2004; Plomin & Spinath, 2004;
Schmidt & Hunter, 2004).


displayed a median of 68; see Lent, Aurbach, & Levin, 1971), differences in level of restriction of
range in samples (typically employees in concurrent studies who had already been selected into an
organization), and differences across studies in the unreliability of criterion measurement (typically
supervisory ratings of job performance). These statistical and measurement artifacts were respon-
sible for the differences in results observed across most previous validation studies, discrediting the
theory of situational specificity. The invention of meta-analysis (known as “validity generalization”
in the employee selection literature) paved the way to systematic investigations of predictor validity,
also reaching beyond the domain of cognitive abilities.2
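For readers who want the mechanics behind these artifact corrections, the following is a minimal sketch in standard psychometric notation. These are the textbook forms of the corrections; the meta-analyses cited here use more elaborate artifact-distribution procedures, so the equations below are illustrative rather than a description of those studies’ exact computations.

\[
\operatorname{Var}_e(r) \approx \frac{(1 - \bar{r}^{2})^{2}}{\bar{N} - 1},
\qquad
\rho = \frac{r_{xy}}{\sqrt{r_{yy}}},
\qquad
r_{u} = \frac{u\,r}{\sqrt{1 + r^{2}(u^{2} - 1)}},
\]

where \(\bar{r}\) is the mean observed validity across studies, \(\bar{N}\) is the average sample size, \(r_{yy}\) is the reliability of the criterion measure, and \(u = SD_{\text{unrestricted}}/SD_{\text{restricted}}\) indexes direct range restriction on the predictor. The first expression gives the variance in observed correlations expected from sampling error alone; the second corrects an observed validity for attenuation due to criterion unreliability; the third is Thorndike’s Case II correction for direct range restriction.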
Standardized, objective cognitive ability tests by now have a century-long history. Group tests
of intelligence were developed prior to World War I and used extensively during both World Wars.
European and U.S. armed forces continue to utilize cognitive ability tests for selection and/or place-
ment. Today, cognitive ability measures are used in educational admissions and civilian personnel
staffing. But how widespread is the use of cognitive ability measures in organizational settings in
general, as well as vis-à-vis other tools available for personnel decision-making? During the 1990s,
a patchwork of surveys indicated that cognitive ability tests were being used for employee selection
across the globe, but their usage varied by country (Salgado, Viswesvaran, & Ones, 2001). At the
turn of the century, the most comprehensive survey was conducted by Ryan, McFarland, Baron, and
Page (1999). Ryan and colleagues surveyed 959 organizations from 20 countries by randomly sam-
pling 300 large organizations (with more than 1,000 employees) in each country. The focus of their
study was the examination of national and cultural influences on many selection system features.
The pervasiveness of cognitive ability tests was also surveyed. On the basis of their data, Table 12.1
summarizes how extensively cognitive ability measures were used.
Across the 18 countries for which data were reported, on average, cognitive ability tests were
used between 21% and 50% of the time in employee selection. Organizations in The Netherlands,
Belgium, Portugal, Spain, South Africa, New Zealand, and the United Kingdom reported above
average use, whereas organizations in Germany, Hong Kong, and Italy reported especially low levels
of cognitive test use. Within each country, of 14 selection methods presented to study participants
(cognitive ability tests, physical ability tests, foreign language tests, work samples, personality tests,
integrity tests, interest inventories, simulation exercises, situational judgment tests, video-based
tests, projective techniques, drug tests, medical screens, and graphology), cognitive ability tests
were ranked in the top three most frequently utilized methods in 15 of 18 countries. It is of value
to note that some of the methods listed, such as situational judgment tests (SJTs) or simulations,
can be used to measure a variety of constructs, and thus data on their use are not necessarily
directly comparable to that of construct-specific ones such as standardized tests of cognitive ability
and personality. However, it appears that if objective tests are utilized at all in personnel staffing
decisions, cognitive ability measures are included with some frequency. Unfortunately, data on the
extensiveness of cognitive ability test use come from countries and cultural regions that are not
entirely representative of the world’s cultural regions. Data from eastern Europe (e.g., Ukraine,
Russia) and southern Asia (e.g., India, Pakistan) are meager; systematic, large-scale surveys from
Latin America, the Middle East, and Africa are also lacking. Countries from these world regions
offer a unique opportunity for industrial-organizational (I-O) psychologists to assess cultural variability in the extent of use of, as well as reactions to, employee selection tools, including cognitive ability tests.
Prevalence data provide an index of organizational acceptance of selection tools. Another per-
spective on this issue can be gained by examining applicant reactions. Applicant reactions to selection
tests vary by test type (Kluger & Rothstein, 1993). There are a few international comparative studies
of applicant reactions to selection tests (including Belgium, France, Germany, Greece, Portugal,

2 In fact, 30 years after its inception, meta-analysis has changed the nature of epistemological inquiry in all sciences. To
date, there have been 40,318 peer-reviewed publications that have used or discussed meta-analytic methods, garnering
754,430 citations as of January 22, 2007 (Christensen, Beatty, Selzer, & Ones, under review).

TABLE 12.1
Cognitive Ability Test Use in 18 Countries
Country Organizations (N) Mean SD Rank
Australia 66 2.39 1.57 3
Belgium 68 3.85 1.30 1
Canada 84 2.59 1.62 3
France 35 2.29 1.72 5
Germany 35 1.90 1.52 2
Greece 27 2.54 1.56 3
Hong Kong 8 1.83 0.98 4
Ireland 49 2.79 1.42 3
Italy 29 1.33 0.82 5
Netherlands 66 3.76 1.41 2
New Zealand 112 3.37 1.41 2
Portugal 31 3.27 1.75 2
Singapore 16 2.83 1.60 2
South Africa 54 3.25 1.44 3
Spain 24 3.75 1.44 2
Sweden 91 2.86 1.37 3
United Kingdom 108 3.08 1.52 3
United States 52 2.09 1.26 3
All Countries 955 2.98 1.54 3

Responses were scaled as follows: 1 = never, 2 = rarely (1–20%), 3 = occasionally (21–50%), 4 = often (51–80%), and 5 =
almost always or always (81–100%). Rank = within-country rank among selection methods, computed by the authors. For
Hong Kong, rank 4 was a tie.
Source: Adapted from Ryan, A. M., McFarland, L., Baron, H., & Page, R., Personnel Psychology, 52, 359–391, 1999.

Italy, Spain, and the United States, among others; see Bertolino & Steiner, 2007; Moscoso, 2006;
Nikolaou & Judge, 2007), and the results suggest that general mental ability tests are among the
selection procedures rated more favorably. Hausknecht, Day, and Thomas (2004) conducted a meta-
analysis of applicant reactions to employee selection procedures. In ten of the studies they located,
study participants (in laboratory and field settings) rated the favorability of cognitive ability tests
along with other methods of employee selection. Pooled ratings from 1,499 study participants
indicated that the mean favorability ratings for cognitive ability tests were lower than those for
interviews, work samples, resumes, and references, but higher than those for (in descending order)
personality tests, biodata, personal contacts, honesty tests, and graphology. However, caution is
warranted in drawing conclusions about reactions of actual job applicants to cognitive ability tests
on the basis of these results: Participants in the studies contributing to the Hausknecht et al. (2004)
meta-analysis were not necessarily applying for jobs, were not in selection settings, and did not
experience each of the tools they were rating.
On the basis of the available data and our experiences in practice, we would like to offer the
following observations. Applicant reactions to cognitive ability tests are largely determined by
their perceived fairness and their perceived predictive validity (Chan & Schmitt, 2004). Applicant
perceptions of fairness can be improved by enhancing the procedural fairness of selection systems
as a whole, but are likely to present a challenge in employee selection as long as any traditionally
disadvantaged group (broadly defined) scores systematically lower on a given predictor battery.
Furthermore, it has recently been shown that general mental ability is a common antecedent not only
of performance on cognitive ability tests, but also of perceived test fairness and test-taking motivation
(Reeve & Lam, 2007). Regarding cognitive ability test use, tackling the issue of perceived predictive

validity (or lack thereof) will be one of the most crucial tasks for our profession in the years to come.
Vocal opponents of high-stakes testing have promulgated myths about the ability of intelligence
tests to predict valued outcomes. These myths, although often entirely unsupported by empirical
evidence or even common logic, are difficult to dispel and, if allowed to inform organizational
decisions on selection tool use, pose a threat to organizations and, ultimately, to societies’ economic
welfare (Sackett, Borneman, & Connelly, 2008; Schmidt & Hunter, 1981).

DEFINITIONS AND THEORETICAL UNDERPINNINGS


The core of intelligence as a psychological construct has long been conceptualized as reasoning
ability and a form of mental adaptability (Stern, 1911). Despite the central place this construct
takes in determining individual behavior, it took almost a century for a broad scientific consensus
to emerge on its definition. A group of 52 experts that included luminaries of psychological science
defined intelligence as “a very general mental capacity that, among other things, involves the ability
to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and
learn from experience” (Gottfredson, 1997, p. 13). This group of scholars, drawn from various
psychological disciplines, goes on to state that intelligence “is not merely book learning, a narrow
academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for
comprehending our surroundings—‘catching on,’ ‘making sense’ of things, or ‘figuring out’ what
to do” (p. 13).
The importance of such a broad definition (in contrast to folk concepts such as “book smarts”)
cannot be overstated. Conceptually, intelligence, in humans and other species, indicates the com-
plexity and efficiency of cognitive functioning. Here, complexity refers to the “sophistication of the
intellectual repertoire” (Lubinski, 2004, p. 98), whereas the efficiency aspect refers to the effective-
ness of information processing skills. Both aspects are critical to performance in all domains of
life (in interpersonal interactions, at home, school, or work), and their impact on individual dif-
ferences in problem solving ability can be observed in individuals of all ages. The realization that
such information-processing skills “can be applied to virtually any kind of content in any context”
(Gottfredson, 2004b, p. 23) is of relevance to scientists and practitioners alike. The application of
this principle to organizational contexts forms the conceptual basis of Campbell’s (1990) funda-
mental statement that “general mental ability is a substantively significant determinant of individual
differences in job performance for any job that includes information-processing tasks” (p. 56).
Cognitive ability is an integral part in models of job performance because of its relation to
knowledge and skill acquisition. General mental ability predicts job performance because it is a
causal determinant of acquisition of job knowledge (McCloy, Campbell, & Cudeck, 1994; Schmidt,
Hunter, & Outerbridge, 1986). The more cognitively demanding the knowledge to be acquired and
the more complex the task to be performed, the greater is the relationship between cognitive ability
and performance (Hunter & Hunter, 1984).

STRUCTURE
Although there are numerous specific cognitive abilities, they all share a common construct core:
general mental ability, popularly called intelligence, or g (for general intelligence factor) in many
scientific writings. As Gottfredson (2002) so aptly noted, the multitude of ways to measure g attest
to its generality. Although measures of intelligence may look different, use different formats (e.g.,
individually administered tasks, paper-and-pencil tests, computerized batteries), and different item types (e.g., verbal, figural, numerical), this does not mean they assess entirely distinct constructs.
The structure of cognitive abilities has been examined extensively since Spearman’s (1904) dis-
tinction of g and s (specific abilities). A century of research has yielded hundreds of data sets in
which individuals took multiple cognitive ability measures. Carroll (1993) compiled, analyzed, and
summarized the correlation matrices resulting from over 460 of such data sets. The result was his

popular three-stratum model. Cognitive abilities are hierarchically organized. At the apex is g,
or an ability that is general. At the second stratum are group factors or broad abilities (e.g., prior
acquisition of knowledge, visual perception, and production of ideas). At the lowest level of the
hierarchy are specific factors or narrow abilities. It is quite common that individuals of similar
intelligence (i.e., at the same trait level of general mental ability) differ in their standing on specific
abilities because of differential “investment” of their cognitive capacity (guided by other personal
characteristics as well as idiosyncratic developmental and educational experiences) in these narrow
cognitive domains.
The most significant content domains that surface in most ability models are verbal/linguis-
tic, quantitative/numerical, and spatial/mechanical. The distinction between fluid and crystallized
intelligence (termed gf and gc respectively; see Cattell, 1971) provides a still popular conceptual
model but has also been shown to distinguish between g and lower-level abilities rather than between ability factors at the same level of the taxonomic hierarchy. Fluid and crystallized intelligence tend
to correlate around .70, and gf has been found to be indistinguishable from g (Gustafsson, 2002).
Today, researchers continue to clarify and refine the structure of intelligence in individual studies
(e.g., see Johnson & Bouchard, 2005; Johnson, te Nijenhuis, & Bouchard, 2007, 2008); however, the
overview provided by Carroll’s (1993) masterpiece remains the broadest and most comprehensive
empirical analysis of the issue to date.
When various cognitive ability tests are administered to test-takers reflecting the entire range of
intelligence from the general population, about 50% of the common variance is due to g, whereas
8–10% of the remaining common variance is attributable to verbal, quantitative, and spatial abilities
(Lubinski, 2004). g is also a great source of predictive power for behavior in many contexts, includ-
ing on the job (see below).
It has been found that relationships among cognitive ability scales are weaker at higher levels of
the ability spectrum (e.g., see Detterman & Daniel, 1989; Kane, Oakland, & Brand, 2006), imply-
ing a smaller amount of common variance due to g. Theoretically, one implication could be that
there may be more room among high-ability individuals for specific abilities to yield incremental
validities over tests of general mental ability. However, investigations of incremental validity in
highly complex jobs have so far yielded mixed results. Olea and Ree (1994) reported that specific
abilities contributed little beyond g to the prediction of job performance among pilots and naviga-
tors, whereas Ree and Carretta (1996) concluded that some specific abilities had the potential to
add incremental value at least for the prediction of military pilot performance. If the common variance among individual ability tests that is accounted for by g is consistently found to be smaller in high-ability samples than in samples of broad talent, then it is plausible that specific abilities could add incremental value over general mental ability for such groups. However, direct tests of this hypothesis among high-ability
job applicant samples are still called for.3
When broad job categories and applicants of a wide range of talent are studied, analyses directed
at incremental validities of specific ability measures over g have yielded disappointing results:
Specific abilities do not provide substantial incremental validity over g. However, we should also
note that in some meta-analyses, specific abilities have been shown to be similarly valid for the
prediction of some criteria (see below). In addition, there may also be nonvalidity-related consid-
erations for practitioners to include specific ability measures in their selection systems, such as the
consequences of anticipated group differences or applicant reactions.

MEASUREMENT
The list of cognitive ability measures available to scientists, individual practitioners, and organiza-
tions runs in the hundreds and includes everything from simple, homegrown measures to tests of
wide circulation supported by many decades of empirical research evidence. A discussion of the

3 Such investigations would have to sort out potential range restriction effects in these samples.

merits of individual measures cannot be provided in this chapter. However, a brief discussion of
commonly used methods, as well as current trends in cognitive ability assessment, is warranted.
Traditional, standardized tests are the most widespread method for measuring all types of cogni-
tive abilities. Their popularity is not because of a lack of alternative methods, but primarily because
of their excellent reliability, ease of administration, and scoring. Although validity (including pre-
dictive validity) is the property of a psychological construct (e.g., the abilities measured by a test,
not the test itself), the reliability of the assessment methods provides a ceiling for validities that can
be obtained in applied settings. From this point of view, standardized tests provide probably the
best solution for organizations looking to assess cognitive ability in a reliable, standardized, and
objective manner.
The use of standardized tests in employee selection and academic settings is not without con-
troversy. Unfortunately, criticism levied against cognitive ability tests, like that directed at other
standardized testing, often falls victim to “content-format confusion” (Ryan & Greguras, 1998; see
also Chan & Schmitt, 2004). In addition to standardized tests, many other methods can be used
to assess cognitive ability constructs, and a careful investigation of these methods and how they
are typically used can inform decisions on whether they are suitable for a given purpose and set-
ting. Interviews, assessment centers (ACs), and SJTs are all methods that assess cognitive ability
to varying degrees—sometimes by design, sometimes by accident. For example, the most recent
meta-analysis of the interview-cognitive ability correlations reported a mean, range-restriction cor-
rected correlation of .27 (N = 11,317, k = 40; Berry, Sackett, & Landers, 2007). Some intriguing
results reported include higher interview-ability test correlations when interview validity is high
and job complexity is low. Interviews with greater cognitive content can be expected to yield higher
criterion-related validities. Also, for low-complexity jobs, interviews may function as more of a
cognitive screen than for higher-complexity jobs.
Relationships between cognitive ability and overall AC ratings have also been examined. A
meta-analysis by Collins et al. (2003) reported that cognitive ability test scores correlated .43 with
overall AC ratings (N = 5,419, k = 34). However, AC dimensions have differential cognitive loads. In
a large-scale study, Dilchert and Ones (2009) reported that the highest correlations were found for
the AC dimension problem solving (r = .32, N = 4,856), providing further evidence for the fact that
cognitive ability measures capture real-world problem solving abilities (cf. Gottfredson, 1997).
Increasingly popular SJTs are also correlated with cognitive ability; however, the magnitude of
the correlation depends on the instructions given to participants. Knowledge instructions (e.g., “what
should one do,” “rate the best/worst option”) in completing SJTs produce an observed correlation
of .32 (N = 24,656, k = 69), whereas SJTs with behavioral tendency instructions (e.g., “what would
you do”) correlate .17 (N = 6,203, k = 26) with cognitive ability (McDaniel, Hartman, Whetzel, &
Grubb, 2007). Thus, if job applicants complete SJTs, especially under knowledge instructions,
assessments produce a ranking of job applicants on cognitive ability to a certain degree. This link
between cognitive ability and SJTs is also likely to be affected by other moderators in addition to
the type of instructions.
Many assessment methods increasingly rely on formats other than the traditional paper-and-pen-
cil form, a trend that is also reflected in ability measurement. Earlier research used meta-analysis to
establish the equivalence of computerized and paper-and-pencil versions of cognitive ability tests
(Mead & Drasgow, 1993). Recent trends in web-based assessment and test content delivery build
on the fact that tests of many individual difference predictors (not only cognitive ability) have been
shown to be equivalent between paper-and-pencil and computerized versions. However, the real
challenge arises not from a change in test format, but a change in administration mode.
Web-based, unproctored cognitive ability assessment is aimed at streamlining the application
process for applicants and organizations. Critics argue that this approach requires strong confidence
in the honesty of test-takers (who, at least in selection contexts, presumably have a strong incentive
to cheat). We would like to argue that these trends provide an opportunity for our field to take some
truly large steps in the development of cognitive ability assessments. Some organizations, confronted

by the real-world challenges of having to assess hundreds of thousands of applicants every year,
are already using computerized adaptive testing and constantly updated test materials to conduct
unproctored web-based testing (Gibby, 2008). Commercial test publishers and assessment providers
have also developed several strategies to address issues of cheating and test security, ranging from
regular monitoring for item-piracy and systematic, proctored retesting of test-takers (Burke, 2008)
to remote, video-based proctoring and biometric test-taker identification (Foster, 2008). We are
certain that for large-scale assessments, such trends will soon become the everyday reality, dictated
by demands for more streamlined assessment procedures from applicants and organizations alike
(see also Chapter 8, this volume). The challenges posed by remote, unproctored cognitive ability
assessment will need to be addressed by a more intense collaboration of scientists and practitioners,
as well as by drawing on expertise from outside of the psychological domain. Organizations and
providers that invest in the appropriate know-how and technology will be at the forefront of the next
big development in cognitive ability measurement.

CRITERION-RELATED VALIDITY EVIDENCE


The job relatedness and usefulness of cognitive ability tests in employee selection have been
documented in dozens of quantitative reviews in the form of publications and technical reports,
incorporating over 1,300 meta-analyses summarizing results from over 22,000 primary studies.
The total sample size of job applicants and employees providing data for these validation
studies is in excess of 5 million individuals (Ones, 2004; Ones & Dilchert, 2004). The question
of whether cognitive ability tests are useful predictors of performance in occupational settings
has been definitively answered: Yes, they are excellent predictors of training performance and job performance. In fact, no other predictor construct in employee selection produces validities as high, as consistently, as cognitive ability does. In addition, no assessment method has so far achieved as reliable an assessment of cognitive ability as standardized tests have, making such tests the ideal choice for
predicting performance in organizational settings.
Meta-analyses of cognitive ability test validities up to the year 2004 have been tabulated and
summarized by Ones, Viswesvaran, and Dilchert (2005). Rather than covering the same ground
again, here we provide an overview of conclusions from these quantitative reviews. Readers inter-
ested in the specific meta-analyses supporting each conclusion are encouraged to review Tables 7.1
and 7.2 of Ones et al.’s (2005) chapter.

COGNITIVE ABILITY TESTS PREDICT LEARNING, ACQUISITION OF JOB KNOWLEDGE, AND JOB TRAINING PERFORMANCE WITH OUTSTANDING VALIDITY (OPERATIONAL VALIDITIES IN THE .50 TO .70 RANGE)
Validities for training criteria generalize across jobs, organizations, and settings. Meta-analyses
provide voluminous evidence of high validity for training success in military and civilian organiza-
tions. Operational validities (correlations corrected for attenuation due to unreliability in criterion
measures and range restriction, where applicable) are highest for general mental ability and specific
quantitative and verbal abilities, and somewhat lower for memory (although still highly useful with
a sample-size-weighted operational validity of .46). Validities are moderated by job complexity.
The greater the complexity of jobs being studied, the higher the validity of cognitive ability tests in
predicting training performance (Hunter & Hunter, 1984; Salgado, Anderson, Moscoso, Bertua, de
Fruyt, et al., 2003). Superior validities of cognitive ability tests for learning are in line with findings
that cognitive ability is the strongest determinant of knowledge acquisition, in this case acquisition
of job knowledge (Schmidt et al., 1986). The more complex jobs are, the more complex and vast
the knowledge to be acquired. Brighter individuals learn quicker, learn more, and can acquire more
complex knowledge with ease.

COGNITIVE ABILITY TESTS PREDICT OVERALL JOB PERFORMANCE WITH HIGH VALIDITY (OPERATIONAL VALIDITIES IN THE .35 TO .55 RANGE)
Table 12.2 summarizes the potential moderators of cognitive ability test validity in employment settings,
indicating those supported and those rejected on the basis of meta-analyses, as well as those awaiting
investigation. Validities for overall job performance criteria generalize across jobs, organizations,
and settings. Support for these key conclusions comes from meta-analyses of studies using narrow
job groupings (e.g., mechanical repair workers, first-line supervisors, health technicians, computer
programmers, lawyers, retail sales personnel, firefighters), broad job groupings (e.g., clerical jobs, law
enforcement, maintenance trades), and heterogeneous job groupings (e.g., by job complexity). Individual
large sample studies (e.g., Project A) also point to the same conclusions. Operational validities are
highest for general mental ability and quantitative abilities and somewhat lower for memory (although
still useful with a sample-size-weighted operational validity of .39 across 12 different meta-analyses;
Ones & Dilchert, 2004). The method of performance measurement employed (objective vs. subjective)
does not lead to different conclusions about the usefulness of cognitive ability tests, and different indices
of performance (rankings, ratings, etc.) produce similar operational validities. Job complexity also
moderates the validities of cognitive employment tests. Higher validities are found for jobs of higher
complexity. Validity generalization studies have clearly demonstrated that matching specific cognitive
abilities to aspects of task performance deemed important in a given job is not necessary. In sum, it is
remarkable that even when moderators have been reported for cognitive ability test validity, they do not
result in validities reversing direction or shrinking to negligible levels in magnitude. Useful levels of
validity are found even for the more specific cognitive abilities and for lowest levels of job complexity.
There are a multitude of additional variables that have been tested as potential moderators of
validity in the meta-analyses reviewed in Ones et al. (2005), and all can be dismissed based on
empirical evidence. These include organizational setting, method of criterion measurement (ratings,
rankings etc.; Nathan & Alexander, 1988), race and ethnicity (U.S. Whites, Blacks, and Hispanics;
see below), sex (see below), validation study design (concurrent vs. predictive; Barrett, Phillips,
& Alexander, 1981), and length of time on the job (experience up to 5 years; Schmidt, Hunter,
Outerbridge, & Goff, 1988). In sum, cognitive ability test validity does not vary substantially
and systematically across organizational settings or for the subgroups that have been examined.
Concurrent validities approximate predictive validities, and cognitive ability tests show no declines
in validity as workers gain experience. There are nonetheless still some potential moderators waiting
to be tested in large-scale, representative studies, or more systematically or thoroughly investigated

TABLE 12.2
Hypothesized Moderators of Cognitive Ability Test Validities

Yes: Confirmed Moderators
   Job complexity
   Criterion predicted (training performance, job performance, leadership, turnover, etc.)
   Cognitive ability construct assessed (GMA, verbal ability, memory, etc.)

No: Rejected Moderators
   Situational variables
   Organizational setting
   Race (African Americans, Hispanics)
   Sex
   Military/civilian setting
   Validation design (concurrent/predictive)
   Length of time on the job (up to 5 years)
   Method of criterion measurement (e.g., ratings, production quantity, work samples)

?: Moderating Effect Unknown
   Time of study (historical age)
   Age
   Race (Asian Americans, Native Americans)
   National setting and culture (except for some European countries)

using meta-analytic approaches. Studies of Asian and Native Americans as well as older adults are
notably absent from the I-O psychology literature (see below for more details).
Although we know a great deal about the validity of cognitive ability tests for predicting training,
task, and overall job performance criteria, knowledge of how cognitive ability relates to other aspects
of work behavior (e.g., organizational citizenship) is limited. Nevertheless, intriguing findings are
being reported. Alonso, Viswesvaran, and Sanchez (2008) found that cognitive ability correlated
more highly with contextual performance than personality factors. Also, the first direct predictive
validity study relating a cognitive ability measure to counterproductive behaviors was published
only recently (Dilchert, Ones, Davis, & Rostow, 2007), indicating that intelligent individuals avoid
engaging in organizational and interpersonal deviance on the job. More research into correlates of
cognitive ability outside of the traditional job performance domain would be welcome.
So far, there have been no investigations of cognitive ability test validity across time (i.e., has validity
for cognitive ability tests in general changed over time?). Labor force changes paired with progressive
changes in the way work is done in many fields (more complex processes, greater use of technology)
call for renewed inquiries. One hypothesis that we would like to offer is that cognitive ability tests
today have greater validity than even half a century ago. As job roles and tasks for jobs in most sectors
change over time to include more complex tools (e.g., computers), processes (e.g., virtual teamwork),
and requirements (e.g., multiple languages), the power of general mental ability as the basic learning
skill in predicting performance may increase substantially, especially in mid- to high-complexity jobs.
Another change that has put increasing demand on individuals and organizations over the last
2 decades or so is internationalization. We already know that organizations, small and large, com-
pete for customers in a global economy. However, in a time when mobility—real and virtual—is
greater than ever in humanity’s history, organizations now also compete internationally for their
labor force. Validity of cognitive ability tests has been studied in international contexts, most exten-
sively in Europe (Hülsheger, Maier, & Stumpp, 2007; Salgado, Anderson, Moscoso, Bertua, & de
Fruyt, 2003; Salgado, Anderson, Moscoso, Bertua, de Fruyt et al., 2003). Table 12.3 summarizes

TABLE 12.3
Validity of Cognitive Ability Tests for Predicting Job Performance
in European Countries
k N r ρ

Across countries
High-complexity jobs 14 1,604 .23 .64
Medium-complexity jobs 43 4,744 .27 .53
Low-complexity jobs 12 864 .25 .51

Analyses by country
France 26 1,445 .48 .64
Germany 8 746 .33 .53
Belgium and The Netherlands 15 1,075 .24 .63
Spain 11 1,182 .35 .64
United Kingdom 68 7,725 .26 .56

N = total number of subjects; k = number of studies summarized in meta-analysis; r = sample size weighted mean observed
correlation; ρ = operational validity, corrected only for sampling error and attenuation due to unreliability in the criterion.
Source: Data across countries summarized from Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., de Fruyt, F., &
Rolland, J. P., Journal of Applied Psychology, 88, 1068–1081, 2003. Data for individual countries except Germany
summarized from Salgado, J. F., & Anderson, N., European Journal of Work and Organizational Psychology, 12,
1–17, 2003. Data for Germany summarized from Hülsheger, U. R., Maier, G. W., & Stumpp, T., International
Journal of Selection and Assessment, 15, 3–18, 2007.

the results of European meta-analyses for job performance. Findings are fully parallel to those from
the United States: Cognitive ability tests show substantial validity, and higher validities are found
for higher complexity jobs. Moreover, highest validities are found for general mental ability rather
than specific abilities (Salgado, Anderson, Moscoso, Bertua, & de Fruyt, 2003).
There are a few practical issues worth discussing here. First, it is often naively argued that educa-
tional requirements serve as proxies for cognitive ability. This argument suggests that using a cogni-
tive ability test would not be necessary if a screening based on educational credentials were in place.
There are two flaws in this line of reasoning. First, educational qualifications of applicants to the same
jobs are very similar, or at least more homogeneous than those of the population at large, and thus provide
little basis for differentiating among applicants. Second, even among those who hold advanced degrees
(e.g., doctoral, medical, and law degrees), there is still substantial variability in cognitive ability
(Sackett & Ostgaard, 1994; Wonderlic Inc., 2002), indicating room for utility to be derived from
cognitive testing. Findings of highest predictive validities for
the most complex jobs (e.g., lawyers, physicians) underscore the usefulness of cognitive tests even
when there is homogeneity in the educational level of job applicants. If for some reason individuals
of mixed educational level were to apply for the same job, a cognitive ability test is a more precise,
valid, and efficient selection tool to use (Berry, Gruys, & Sackett, 2006).
It is also often suggested that beyond a certain required level of ability, cognitive capacity does
not contribute to performance. Such arguments essentially suggest a nonlinear relationship between
cognitive ability and performance: Cognitive ability predicts up to a certain point on the trait con-
tinuum, but validity drops off beyond that. All data that have been brought to bear on this question
tell an opposite story (Arneson, 2007; Coward & Sackett, 1990; Ghiselli & Kahneman, 1962; Hawk,
1970; Tiffin & Vincent, 1960). The relationship between ability measures and performance is linear
throughout the range of the ability spectrum. Validity remains equally high even among the most
extremely talented individuals (Wai, Lubinski, & Benbow, 2005).
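For readers who want to see how such linearity claims are typically probed, the sketch below fits a linear and a linear-plus-quadratic regression and compares their R-squared values. It is purely illustrative: the data are simulated and the variable names are invented rather than drawn from any study cited above.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a linear ability-performance relationship with noise
n = 2000
ability = rng.normal(0.0, 1.0, n)
performance = 0.5 * ability + rng.normal(0.0, 1.0, n)


def r_squared(predictors, y):
    """R^2 from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals.var() / y.var()


r2_linear = r_squared(ability, performance)
r2_quadratic = r_squared(np.column_stack([ability, ability**2]), performance)

# If the true relationship is linear, the quadratic term adds essentially nothing
print(f"R^2 linear:    {r2_linear:.4f}")
print(f"R^2 quadratic: {r2_quadratic:.4f}")
print(f"Increment:     {r2_quadratic - r2_linear:.4f}")
```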

GROUP DIFFERENCES ON COGNITIVE ABILITY MEASURES


One of the greatest points of concern in using cognitive ability measures in the United States is the
potential for adverse impact. In this section, we review mean group differences on cognitive abil-
ity measures and discuss their implications for adverse impact. We also review findings regarding
predictive fairness and discuss group differences in international contexts.
In the United States, employment discrimination on the basis of race, color, religion, sex, and national
origin is prohibited. Age discrimination in making employment decisions is also illegal. Historically,
disadvantaged racial and ethnic groups in the United States are African Americans (Blacks), Hispanic
Americans (Latinos), Asian Americans, and Native Americans/Pacific Islanders (see Chapter 30, this
volume, for an international perspective). Women and older adults have also been historically disad-
vantaged in many contexts. If selection decisions result in selection ratios for subgroups that are less
than 80% of those for the better performing group (typically Whites), presence of adverse impact is
concluded and the burden of proof shifts to the employer to establish the job relatedness of the selection
tools utilized, typically using criterion-related validity evidence (see also Chapter 29, this volume).
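To make the four-fifths comparison concrete, here is a minimal sketch of the calculation. The counts and the function name are hypothetical illustrations, not data or procedures from the chapter.

```python
def adverse_impact_ratios(selected, applicants):
    """Compute selection rates and four-fifths (80%) rule impact ratios.

    selected and applicants are dicts mapping group name -> counts.
    Returns a dict of impact ratios relative to the group with the
    highest selection rate.
    """
    rates = {g: selected[g] / applicants[g] for g in applicants}
    reference_rate = max(rates.values())
    return {g: rate / reference_rate for g, rate in rates.items()}


# Hypothetical applicant flow data
applicants = {"White": 200, "Black": 100, "Hispanic": 80}
selected = {"White": 60, "Black": 18, "Hispanic": 16}

for group, ratio in adverse_impact_ratios(selected, applicants).items():
    flag = "potential adverse impact" if ratio < 0.80 else "within four-fifths rule"
    print(f"{group}: impact ratio = {ratio:.2f} ({flag})")
```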
However, it is important to remember that adverse impact (or lack thereof) is the result of an
employee selection system and not only a single test. That is, adverse impact is the end result of
the magnitude of group differences, selection ratios, use of different selection tools in combination,
and the manner in which scores are combined and utilized. Sackett and Roth (1996) used a series
of Monte Carlo simulations to investigate the effects of multistage selection strategies on minority
hiring. The important features of selection systems that contributed to the level of minority hiring
included subgroup differences on the predictors, intercorrelations among the predictors in the selec-
tion system, the overall selection ratio, and the selection strategy used (i.e., top-down, hurdle, etc.).
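The following sketch conveys the general logic of such a simulation; it is not the Sackett and Roth (1996) code. It assumes two correlated, standardized predictors with fixed subgroup mean differences and compares minority hiring rates under a top-down composite strategy and a hurdle strategy, with all parameter values invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate_minority_hiring(n_majority=8000, n_minority=2000,
                             d1=1.0, d2=0.3, rho=0.3,
                             overall_sr=0.15, hurdle_pass_rate=0.50):
    """Monte Carlo sketch of two multistage selection strategies.

    Predictor scores are standardized in the majority group; minority
    means are shifted down by d1 and d2. Returns the minority share of
    hires under (a) top-down selection on a unit-weighted composite and
    (b) a hurdle on predictor 1 followed by top-down on predictor 2.
    """
    cov = np.array([[1.0, rho], [rho, 1.0]])
    maj = rng.multivariate_normal([0.0, 0.0], cov, size=n_majority)
    mino = rng.multivariate_normal([-d1, -d2], cov, size=n_minority)
    scores = np.vstack([maj, mino])
    is_minority = np.r_[np.zeros(n_majority, bool), np.ones(n_minority, bool)]
    n_hired = int(overall_sr * len(scores))

    # Strategy A: top-down on the composite of both predictors
    composite = scores.sum(axis=1)
    hired_a = np.argsort(composite)[::-1][:n_hired]

    # Strategy B: hurdle on predictor 1, then top-down on predictor 2
    cutoff = np.quantile(scores[:, 0], 1.0 - hurdle_pass_rate)
    passed = np.where(scores[:, 0] >= cutoff)[0]
    ranked = passed[np.argsort(scores[passed, 1])[::-1]]
    hired_b = ranked[:n_hired]

    return is_minority[hired_a].mean(), is_minority[hired_b].mean()


rate_top_down, rate_hurdle = simulate_minority_hiring()
print(f"Minority share of hires, top-down composite:  {rate_top_down:.3f}")
print(f"Minority share of hires, hurdle then top-down: {rate_hurdle:.3f}")
```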
For cognitive ability tests, group differences have been examined in dozens of primary studies
and have been meta-analytically summarized. In these studies, the measure of group differences
is typically Cohen’s d, which expresses the differences between the means of two groups being

compared in standard deviation units. In meta-analyzing these effect sizes, d values from all
individual studies are pooled and averaged to obtain an overall effect size that reflects the magnitude
of group differences in the population at large. Corrections for unreliability in cognitive ability
measures are typically not applied, because selection decisions are based on observed scores.4 In
general, d values of .80 or greater are considered large effects, those around .50 are moderate, and
those below .20 are small (Cohen, 1977). (From a theoretical perspective, d values under .20 are
often trivial; however, under extreme conditions, such as when the majority group selection ratio is
under 1%, even small differences in the .10 to .20 range can lead to violation of the four-fifths rule
and thus constitute adverse impact.)
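As a brief illustration of the two computations just described, the sketch below calculates Cohen's d for a single hypothetical study and a sample-size-weighted mean d across studies; the numbers and helper names are invented.

```python
import math


def cohens_d(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Standardized mean difference using the pooled within-group SD."""
    pooled_var = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var)


def weighted_mean_d(d_values, sample_sizes):
    """Sample-size-weighted average effect size across studies."""
    total_n = sum(sample_sizes)
    return sum(d * n for d, n in zip(d_values, sample_sizes)) / total_n


# One hypothetical study: group 1 scores 5 points higher on a test with SD of 10
print(round(cohens_d(105, 100, 10, 10, 200, 150), 2))        # 0.50

# Pooling three hypothetical studies into an overall effect size
print(round(weighted_mean_d([0.9, 1.1, 0.7], [500, 300, 200]), 2))
```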
In the I-O psychology literature, it is widely believed and reported that sex differences in cogni-
tive ability are nonexistent (e.g., see Table 1 in Ployhart & Holtz, 2008). Table 12.4 offers a more
precise and detailed view in summarizing sex differences on cognitive variables. Differences in
verbal and mathematical abilities are negligible. Women score moderately higher than men on one
particular verbal ability marker—speech production. Largest sex differences are found on visual-
spatial measures such as mental rotation and spatial perception (meta-analytic d values in favor of
men are in the .40 to .70 range). On measures of figural reasoning that incorporate some mental
rotation and spatial perception items, sex differences are about .30 standard deviation units, with
men scoring higher on average. Thus, given selection ratios of 50% or lower for men, cognitive tests
with visual-spatial items could result in adverse impact against women.
More so than sex differences in cognitive ability, race and ethnic group differences have
consumed attention in employee selection research and practice, especially in the North American
context (Hough, Oswald, & Ployhart, 2001). Table 12.5 summarizes race and ethnic group
differences on cognitive ability from the only existing meta-analysis of the literature (Roth, Bevier,
Bobko, Switzer, & Tyler, 2001). On average, Blacks score 1.00 and Hispanics .83 standard deviation
units lower than Whites on general mental ability measures used in employee selection. Group
differences on measures used in military settings are somewhat larger, especially for the White-Black
comparison. One explanation for this finding could be the greater heterogeneity among military job
applicants. Cognitive ability differences between Black and White applicants to high-complexity
jobs are smaller than among applicants to lower complexity jobs, most likely because of severe self-
selection as well as higher minimum requirements with regard to educational credentials. Among
applicants to medium- and low-complexity jobs, White-Black and White-Hispanic differences in
cognitive ability tests are large and almost certain to result in adverse impact if cognitive ability
were the only predictor used in employee selection. This finding is at the root of the validity-
diversity dilemma that most U.S. organizations face today (Kehoe, 2008; Kravitz, 2008; Potosky,
Bobko, & Roth, 2008; Sackett, De Corte, & Lievens, 2008). The situation is slightly better among
applicants to high-complexity jobs, in which group mean-score differences in cognitive ability
are only moderate (d = .63) and thus carry slightly less severe implications for adverse impact.
Data on Asian American- and Native American-White cognitive ability differences among job
applicants are scant to nonexistent, especially when the job applied to is held constant (i.e., within-job
examinations). The broader psychological literature indicates slightly higher scores among Asian
Americans compared with Whites (Gottfredson, 1997), but again, systematic data on job applicants
are scarce. The response categories used for demographic data collection in psychological research
often subsume individuals from very heterogeneous race and ethnic backgrounds in a single
category, which complicates such analyses, especially White-Asian comparisons.
(Illustrating this fact is the 2000 special census report on Asians in the United States, which lists
11 national and ethnic categories of interest as well as one residual “other” category; see Reeves &
Bennett, 2004.) The educational literature reports disconcertingly sizable lower scores among Native
Americans when compared with Whites (Humphreys, 1988). We were able to locate only one study

4 However, corrections for attenuation due to range restriction and unreliability in the criterion are advisable when compar-
ing results across studies differing in their levels of range restriction and unreliability.

TABLE 12.4
Meta-Analyses of Sex Differences on Cognitive Ability Measures
Cognitive Variable Study k d
Vocabulary Hyde & Linn (1988) 40 −.02
Reading comprehension Hyde & Linn (1988) 18 −.03
Speech production Hyde & Linn (1988) 12 −.33
Mathematics computation Hyde, Fennema, & Lamon (1990) 45 −.14
Mathematics concepts Hyde, Fennema, & Lamon (1990) 41 −.03
Mathematics problem solving Hyde, Fennema, & Lamon (1990) 48 .08
Spatial perception Linn & Petersen (1985) 62 .44
Spatial perception Voyer, Voyer, & Bryden (1995) 92 .44
Mental rotation Linn & Petersen (1985) 29 .73
Mental rotation Voyer, Voyer, & Bryden (1995) 78 .56
Spatial visualization Linn & Petersen (1985) 81 .13
Spatial visualization Voyer, Voyer, & Bryden (1995) 116 .19
Figural reasoning (matrices) Lynn & Irwing (2004) 10 .30

k = number of studies summarized in meta-analysis; d = standardized group mean score difference. Positive effect sizes
indicate males scoring higher on average.
Source: Data from Hyde, J. S., American Psychologist, 60, 581–592, 2005.

that compared Native North Americans and Whites in a job context (Vanderpool & Catano, 2008).
In this study, Native North Americans (Canadian Aboriginals) scored much lower on verbal ability tests
than on nonverbal tests.
It is important to stress that although race and ethnicity are protected categories in the United
States, the characteristics that define disadvantaged groups elsewhere are diverse (cf. Myors et al.,
2008). Cognitive ability test scores of disadvantaged groups around the globe remain largely
unstudied in employee selection settings, although exceptions can be found in a handful of countries
(e.g., Australia, Israel, New Zealand, South Africa, Taiwan, and The Netherlands). Extant research
appears to point to consistently lower scores of disadvantaged groups (e.g., Aborigines in Australia,
Canada, New Zealand, and Taiwan; Blacks in South Africa; immigrants in The Netherlands;
Sackett & Shen, 2008).

TABLE 12.5
Race and Ethnic Group Mean Score Differences in GMA Among Job Applicants
Group Comparison Setting Job Complexity N k d
White-Black Industrial Across complexity levels 375,307 11 1.00
Industrial (within-job studies) Low 125,654 64 .86
Moderate 31,990 18 .72
High 4,884 2 .63
Military Across complexity levels 245,036 1 1.46
White-Hispanic Industrial Across complexity levels 313,635 14 .83
Military Across complexity levels 221,233 1 .85

k = number of studies summarized in meta-analysis; d = standardized group mean score difference; N = total sample size.
Positive effect sizes indicate Whites scoring higher on average.
Source: Data from Tables 2, 4, and 7 of Roth, P. L., Bevier, C. A., Bobko, P., Switzer, F. S., & Tyler, P., Personnel Psychology,
54, 297–330, 2001.

Large-scale international comparisons using the same cognitive ability test in employment
settings are rare, if not nonexistent. The first two authors of this chapter recently developed items
for a computer adaptive figural reasoning test for operational use around the globe (Dilchert &
Ones, 2007). The test was intended to be used in the selection of employees for a Global 100
company regardless of job level, complexity, or location (Gibby & Boyce, 2007). Over 120,000
applicants from 100 countries, speaking a multitude of languages, took the test during the first
several months after it was made available online. The data allow us an extraordinary global look
at group differences on the same, nonverbal reasoning measure. The sampling of job applicants to
the same organization heightens the uniqueness and relevance of this data set for other employment
contexts. Among the nearly 60,000 U.S.-based applicants in this sample, group differences were
in the expected direction, with all minority groups except Asian Americans scoring lower than
Whites. However, typically observed group differences were reduced. We also analyzed national
differences for countries where sample sizes exceeded 100 applicants. The ten countries in which
job applicants scored highest on average were southeast Asian (4) and European (6). An analysis
of the data by cultural clusters revealed Confucian Asia and southern and northern Europe scoring
higher than other cultural regions. Of course, the exact nature of these group differences remains to
be studied (e.g., are they also in part due to differential attraction to jobs across countries and cultural
regions?). Regardless of the underlying mechanisms, observed differences between applicants from
different cultural regions applying to the same organization present it with both a challenge and an
opportunity to actively shape its workforce on the basis of diversity and talent goals.
One area of group differences that has received little attention in the adverse impact literature
is that of cognitive ability differences between younger and older adults. One large-scale exami-
nation of cognitive abilities across the working life span (Avolio & Waldman, 1994) offers some
insights into the magnitudes of age-related declines in cognitive ability. Avolio and Waldman
(1994) reported mean scores of 25,140 White, Black, and Hispanic job applicants who had taken
the General Aptitude Test Battery (GATB) of the U.S. Employment Service and broke these scores
down by age groups. We used the data presented by Avolio and Waldman to compute standardized
effect sizes (d values) documenting the magnitudes of age-related declines in cognitive abilities
within race and ethnic groups. Table 12.6 presents these findings.
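A minimal sketch of the computation just described is shown below. The summary statistics are invented rather than taken from Avolio and Waldman (1994), and it assumes d is the reference-group mean minus the comparison-group mean divided by a pooled within-group standard deviation, with the 20- to 24- and 25- to 34-year-old groups first combined by sample-size weighting the means and pooling the standard deviations.

```python
import math


def combine_groups(groups):
    """Combine subgroups into one reference group.

    groups is a list of (n, mean, sd) tuples. Means are sample-size
    weighted; SDs are pooled (within-group variance only, for simplicity).
    """
    total_n = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total_n
    pooled_var = sum((n - 1) * sd**2 for n, _, sd in groups) / (total_n - len(groups))
    return total_n, mean, math.sqrt(pooled_var)


def standardized_difference(ref, other):
    """d = (reference mean - comparison mean) / pooled within-group SD."""
    n1, m1, s1 = ref
    n2, m2, s2 = other
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd


# Hypothetical GMA summary statistics (n, mean, SD) by age group
reference = combine_groups([(3000, 102.0, 15.0), (5000, 101.0, 15.0)])  # 20-24 and 25-34
older_55_65 = (968, 89.0, 15.0)

print(f"d (20-34 vs. 55-65) = {standardized_difference(reference, older_55_65):.2f}")
```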
For all race and ethnic groups, there are sizable differences in general mental ability and specific
cognitive abilities between older and younger individuals. Using the 20–34 year olds as a com-
parison group, 35–44 year olds score about .30 standard deviation units lower on general mental
ability. Individuals who are between 45 and 54 score about .50 standard deviation units lower, and
55–65 year olds score approximately .80 standard deviation units lower. Trends for Blacks and
Hispanics mimic the data for Whites but are less precise because of smaller sample sizes. When
a general mental ability measure is used in employee selection, younger applicants stand to get
selected at greater rates than older applicants. The disparity in selection ratios would be particularly
severe if applicant pools included individuals from the entire age spectrum of adults. Comparisons
of young, White applicants with older, minority group applicants will lead to even larger observed
differences.
Focusing on specific abilities, differences between older and younger adults are lowest for verbal
ability. The oldest age group in the sample, 55–65 year olds, scores about .50 standard deviation
units lower than 20–34 year olds. The magnitudes of differences for numerical, spatial, and clerical
ability are similar to those for general mental ability. Enormously large differences are found for
form perception. The 55–65 year olds on average score about 1.53 standard deviation units lower
than 20–34 year olds. The difference between 45–54 year olds and those between 20 and 34 is also
larger than 1 standard deviation unit. Use of form perception measures is certain to result in a dras-
tic reduction of hiring rates for older adults.
Aging does result in cognitive ability declines, but certain abilities decline more with age than
others, and knowledge-based (or so-called “crystallized”) abilities (e.g., vocabulary and certain
numerical skills) seem to display the slowest rate of decline in the normal population (see Avolio &

TABLE 12.6
Age Differences in Cognitive Ability
Cognitive Variable   Age Group Comparison (Reference Group 20–34 Yearsa)   White: N, d   Black: N, d   Hispanic: N, d
GMA 35–44 2,571 .35 1,374 .32 360 .24
45–54 2,011 .57 643 .48 171 .62
55–65 968 .82 144 .68 49 .70
Verbal ability 35–44 2,571 .28 1,374 .24 360 .16
45–54 2,011 .36 643 .33 171 .39
55–65 968 .51 144 .39 49 .44
Numerical 35–44 2,571 .35 1,374 .40 360 .26
ability 45–54 2,011 .59 643 .57 171 .68
55–65 968 .82 144 .78 49 .76
Spatial ability 35–44 2,571 .32 1,374 .27 360 .27
45–54 2,011 .58 643 .46 171 .55
55–65 968 .91 144 .73 49 .78
Form 35–44 2,571 .54 1,374 .59 360 .59
perception 45–54 2,011 1.01 643 1.10 171 1.23
55–65 968 1.53 144 1.43 49 1.72
Clerical ability 35–44 2,571 .36 1,374 .49 360 .39
45–54 2,011 .62 643 .85 171 .90
55–65 968 .88 144 1.01 49 1.06

a Means and standard deviations for the reference group were obtained by sample-size weighting means and pooling stan-
dard deviations for 20- to 24-year-old and 25- to 34-year-old age groups. N = sample sizes for 20- to 34-year-old reference
groups (7,970, 4,720, and 1,056 for Whites, Blacks, and Hispanics, respectively). d = standardized group mean-score dif-
ference. Positive effect sizes indicate younger individuals scoring higher on average.
Source: Based on data presented in Table 3 of Avolio, B. J., & Waldman, D. A., Psychology and Aging, 9, 430–442, 1994.

Waldman, 1994; Owens, 1966; Schaie, 1994). Other evidence from the individual differences litera-
ture suggests that rates of cognitive decline are slower for those who have higher initial baseline abil-
ity (Deary, MacLennan, & Starr, 1998), higher levels of education (Deary et al., 1998; Rhodes, 2004),
and those employed in complex or enriched jobs that presumably use their cognitive abilities to a
greater extent (Schooler, Mulatu, & Oates, 1999). A recent meta-analysis suggests higher vocabulary
scores for older adults, although this effect appeared confounded by education (Verhaeghen, 2003).
In the future, the aging workforces of most Western countries will certainly necessitate greater atten-
tion to the consequences of cognitive ability test use for workforce diversity with regard to age.
Thus far, we have discussed only group mean score differences on cognitive ability tests. Another
salient issue is that of differential validity. Hunter, Schmidt, and Hunter (1979) and Schmidt,
Pearlman, and Hunter (1980) have quantitatively summarized dozens of validation studies using the
GATB with Blacks and Hispanics, respectively. Hunter et al.’s (1979) analysis demonstrated that on
average, validities for Whites were .01 correlational points higher than those for Blacks in predict-
ing objective performance criteria and .04 correlational points higher for predicting subjective rat-
ings of job performance (k = 866 validity pairs incorporating data from 120,294 Whites and 65,193
Blacks). Across 1,128 pairs of validity coefficients, Schmidt et al. (1980) showed White validities on
average to be .02 correlational points higher than Hispanic validities. Differential validity of cogni-
tive ability tests in organizational settings has not been reported for Asian and Native Americans in
the peer reviewed literature.

Rothstein and McDaniel (1992) reported an examination of differential validity by sex for cogni-
tive ability tests. Using 59 pairs of male and female correlations (N = 5,517 and 9,428, respectively),
they found observed validities to be on average .03 correlational points higher for women (validities
corrected for range restriction and unreliability in the criteria were .05 correlational points higher).
The higher validity for women was more marked in lower-complexity jobs and female-dominated
occupations. In male-dominated occupations, the validity was higher for predicting performance
among men. We were unable to locate differential validity investigations for older versus younger
adults. Future research should examine differential validity for hitherto unexamined groups (Asians,
Native Americans, older adults).
We would like to stress that an update of the existing literature on differential validity would
be valuable, because analyses for Black and Hispanic groups have relied on validation studies
conducted prior to the 1980s, and analyses by sex relied on data collected prior to 1990. Labor
force participation and occupational distributions of women, Blacks, and Hispanics are much
different today than 20–30 years ago. Changes in the nature of many jobs (e.g., greater complexity,
greater technological demands) as well as changes in the social milieu in many organizations
(e.g., emergence of workforce diversity as a core value) may manifest themselves in cognitive
ability-criteria relations. Research must also examine whether differential validity is found for
criteria other than overall job performance. In our opinion, studies on organizational citizenship
behaviors, task performance, and leadership criteria may constitute priorities. The only study yet
to examine Black-White differential validity of a cognitive ability test for predicting nontraditional
performance criteria investigated incidents of detected counterproductive behaviors (interpersonal
and those targeted at the organization) and found no evidence of differential validity (Dilchert et al.,
2007). However, replications of these results, as well as investigations among other minority groups,
are certainly warranted.
While differential validity compares the magnitudes of criterion-related validities between
groups of interest, differential prediction simultaneously compares slopes and intercepts of regres-
sion lines for such groups. A healthy body of literature in employee selection has led to the conclu-
sion that there is no predictive bias against Blacks in the United States (Rotundo & Sackett, 1999;
Schmidt, 1988). Some exceptions notwithstanding (e.g., The Netherlands and South Africa), studies
from other parts of the world (as well as those for other minority groups) are sparse and need to be
conducted.
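To illustrate the distinction between differential validity and differential prediction, the sketch below fits the moderated regression commonly used for the latter: the criterion is regressed on test score, group membership, and their product, where a nonzero group coefficient signals intercept differences and a nonzero product coefficient signals slope differences. The data are simulated and the variable names are invented.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate test scores and a criterion sharing one regression line across groups
n_per_group = 1000
group = np.r_[np.zeros(n_per_group), np.ones(n_per_group)]          # 0 = group A, 1 = group B
test = rng.normal(0.0, 1.0, 2 * n_per_group) - 0.5 * group          # mean difference on the test
criterion = 0.5 * test + rng.normal(0.0, 1.0, 2 * n_per_group)      # same slope, same intercept

# Moderated regression: criterion = b0 + b1*test + b2*group + b3*(test*group)
X = np.column_stack([np.ones_like(test), test, group, test * group])
beta, *_ = np.linalg.lstsq(X, criterion, rcond=None)

labels = ["intercept", "test slope", "group (intercept difference)",
          "test x group (slope difference)"]
for label, b in zip(labels, beta):
    print(f"{label:32s} {b: .3f}")
```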

FUTURE CHALLENGES FOR RESEARCH AND PRACTICE


In this chapter, we have identified specific areas in need of additional research attention as well
as some challenges for the use of cognitive ability tests in applied settings. The high validity of
cognitive measures makes them attractive for use in employee selection. Their ability to enhance
productivity and offer substantial economic utility to organizations is indisputable. However, many
applied psychologists are frustrated, and understandably so, that various groups (e.g., Blacks,
Hispanics, other disadvantaged ethnic groups, and older applicants) on average score lower than the
majority applicants, often resulting in differential selection ratios for different groups. Our literature
is filled with suggestions about reducing the potential for adverse impact. Thoughtful description
and evaluation of various proposed alternatives is not possible in this short chapter but is available
in various papers (e.g., Campbell, 1996; De Corte, Lievens, & Sackett, 2007; Hough et al., 2001;
Hunter & Schmidt, 1982; Ployhart & Holtz, 2008; Potosky, Bobko, & Roth, 2005; Sackett, Schmitt,
Ellingson, & Kabin, 2001). Frankly, we believe that structural and procedural proposals to reduce
adverse impact are stopgap measures that are too little and too late in dealing with profound group
differences observed in occupational settings. Although the exact definition of what constitutes
protected classes may differ, societies around the world are now facing similar issues—this fact
has most recently been corroborated by the mandatory adoption of antidiscrimination laws across
countries in the European Community. It will require the collective wisdom of scientists across

disciplines to evaluate whether some group differences on individual differences traits can be
reduced, and if so, how. In the meantime, I-O psychologists need to face the challenges that these
group differences pose in applied settings. To this end, the responsibility is equally distributed
among scientists, who need to address the areas of concern summarized above; test publishers, who
need to continuously collect and make available data regarding group differences and predictive
fairness of their tests; and individual practitioners, who need to educate themselves on the past and
current research as well as its implications for their specific purpose.
Another, somewhat easier challenge is that of enhancing the acceptability of cognitive ability mea-
sures among applicants to high-complexity jobs. As Lubinski (2004) pointed out, cognitive ability
can be assessed in all shapes and forms: “Variegated conglomerations of information and problem-
solving content, not necessarily tied to an educational program, which may involve fresh as well as old
learning (acquired in or out of school), may be used to assess general intelligence” (p. 98). However, it
is our opinion that when new formats and approaches are used to address issues of applicant reactions
and face validity, intellectual honesty still mandates an acknowledgment of the construct being mea-
sured. The proliferation of “new” abilities and claims that such abilities are independent of traditional
intelligence are insincere and harmful to the professional reputation of our field.
We have also observed that sometimes preconceived notions of cognitive test acceptability can
cloud our judgment. Our work with nonverbal figural reasoning tests, arguably an item type that on
the surface does not appear extraordinarily related to most real-world tasks, yielded some surpris-
ing findings. Data show that such items, especially when compared to those with verbal content,
are received very positively by applicants. Although contextualization is certainly a viable method
of achieving face validity, items need not always be contextualized to invoke positive applicant
reactions.

EPILOGUE
This chapter aimed to offer a broad and forthright overview of cognitive ability tests and their use
in employee selection. Other excellent overviews of the topic may be found in Drasgow (2003, espe-
cially with regard to structural issues); Ree, Carretta, and Steindl (2001, especially with regard to
broader life correlates); Ones, Viswesvaran, and Dilchert (2004, especially with regard to validity
for learning criteria); and Ones et al. (2005, especially with regard to a criterion-related validity in
organizational settings). Debates over the use of cognitive ability tests in selection settings can also
be found in a special issue of Human Performance (Viswesvaran & Ones, 2002).
Cognitive ability is the capacity to learn, solve problems, and adapt to environments. Abstract
thinking and logical reasoning determine success in various life domains by allowing us not only to
rely on skills acquired through past experience, but also to react to novel situations through knowledge
and insights acquired in mental simulations. Cognitive ability continues to be the single best deter-
minant of work performance. We believe that the benefits associated with cognitive ability test use
in employee selection far outweigh potential concerns. Nonetheless, we have summarized some
challenges that need to be tackled if I-O psychology wants to continuously develop as a field. We
are certain our profession will be up to those challenges.

REFERENCES
Alonso, A., Viswesvaran, C., & Sanchez, J. I. (2008). The mediating effects of task and contextual performance.
In J. Deller (Ed.), Research contributions to personality at work (pp. 3–17). Munich, Germany: Rainer
Hampp.
Arneson, J. J. (2007). An examination of the linearity of ability—Performance relationships among high scor-
ing applicants. Unpublished doctoral dissertation, University of Minnesota, Minneapolis, MN.
Avolio, B. J., & Waldman, D. A. (1994). Variations in cognitive, perceptual, and psychomotor abilities across
the working life span: Examining the effects of race, sex, experience, education, and occupational type.
Psychology and Aging, 9, 430–442.

Barrett, G. V., Phillips, J. S., & Alexander, R. A. (1981). Concurrent and predictive validity designs: A critical
reanalysis. Journal of Applied Psychology, 66, 1–6.
Berry, C. M., Gruys, M. L., & Sackett, P. R. (2006). Educational attainment as a proxy for cognitive ability in selec-
tion: Effects on levels of cognitive ability and adverse impact. Journal of Applied Psychology, 91, 696–705.
Berry, C. M., Sackett, P. R., & Landers, R. N. (2007). Revisiting interview-cognitive ability relationships:
Attending to specific range restriction mechanisms in meta-analysis. Personnel Psychology, 60, 837–874.
Bertolino, M., & Steiner, D. D. (2007). Fairness reactions to selection methods: An Italian study. International
Journal of Selection and Assessment, 15, 197–205.
Burke, E. (2008, April). Remarks. In N. Tippins (Chair), Internet testing: Current issues, research, solutions,
guidelines, and concerns. Symposium conducted at the annual conference of the Society for Industrial
and Organizational Psychology, San Francisco, CA.
Campbell, J. P. (1990). The role of theory in industrial and organizational psychology. In M. D. Dunnette &
L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 1, pp. 39–73). Palo
Alto, CA: Consulting Psychologists Press.
Campbell, J. P. (1996). Group differences and personnel decisions: Validity, fairness, and affirmative action.
Journal of Vocational Behavior, 49, 122–158.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York, NY: Cambridge
University Press.
Cattell, R. B. (1971). Abilities: Their structure, growth, and action. Oxford, England: Houghton Mifflin.
Chan, D., & Schmitt, N. (2004). An agenda for future research on applicant reactions to selection procedures:
A construct-oriented approach. International Journal of Selection and Assessment, 12, 9–23.
Christensen, F. G. W., Beatty, A. S., Selzer, B. K., & Ones, D. S. (2009). Thirty years of meta-analysis:
Psychology’s gift to the sciences. Manuscript submitted for publication.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. San Diego, CA: Academic Press.
Collins, J. M., Schmidt, F. L., Sanchez-Ku, M., Thomas, L., McDaniel, M. A., & Le, H. (2003). Can basic indi-
vidual differences shed light on the construct meaning of assessment center evaluations? International
Journal of Selection and Assessment, 11, 17–29.
Coward, W., & Sackett, P. R. (1990). Linearity of ability-performance relationships: A reconfirmation. Journal
of Applied Psychology, 75, 297–300.
Deary, I. J., MacLennan, W. J., & Starr, J. M. (1998). Is age kinder to the initially more able?: Differential age-
ing of a verbal ability in the Healthy Old People in Edinburgh Study. Intelligence, 26, 357–375.
Deary, I. J., Whiteman, M. C., Starr, J. M., Whalley, L. J., & Fox, H. C. (2004). The impact of childhood intel-
ligence on later life: Following up the Scottish mental surveys of 1932 and 1947. Journal of Personality
and Social Psychology, 86, 130–147.
De Corte, W., Lievens, F., & Sackett, P. R. (2007). Combining predictors to achieve optimal trade-offs between
selection quality and adverse impact. Journal of Applied Psychology, 92, 1380–1393.
Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other and with cognitive
variables are highest for low IQ groups. Intelligence, 13, 349–359.
Dilchert, S., & Ones, D. S. (2007, April). Influence of figural reasoning item characteristics on group mean-score
differences. In A. S. Boyce & R. E. Gibby (Chairs), Global cognitive ability testing: Psychometric issues
and applicant reactions. Symposium conducted at the annual conference of the Society for Industrial and
Organizational Psychology, New York, NY.
Dilchert, S., & Ones, D. S. (2009). Assessment center dimensions: Individual differences correlates and meta-
analytic incremental validity. International Journal of Selection and Assessment, 17, 254–270.
Dilchert, S., Ones, D. S., Davis, R. D., & Rostow, C. D. (2007). Cognitive ability predicts objectively measured
counterproductive work behaviors. Journal of Applied Psychology, 92, 616–627.
Drasgow, F. (2003). Intelligence and the workplace. In W. C. Borman, D. R. Ilgen, & R. J. Klimoski (Eds.),
Handbook of psychology: Industrial and organizational psychology (Vol. 12, pp. 107–130). Hoboken,
NJ: John Wiley & Sons.
Foster, D. (2008, April). Remarks. In N. Tippins (Chair), Internet testing: Current issues, research, solutions,
guidelines, and concerns. Symposium conducted at the annual conference of the Society for Industrial
and Organizational Psychology, San Francisco, CA.
Ghiselli, E. E., & Kahneman, D. (1962). Validity and non-linear heteroscedastic models. Personnel Psychology,
15, 1–11.
Gibby, R. E. (2008, April). Online and unsupervised adaptive cognitive ability testing: Lessons learned. In
R. E. Gibby & R. A. McCloy (Chairs), Benefits and challenges of online and unsupervised adaptive
testing. Symposium conducted at the annual conference of the Society for Industrial and Organizational
Psychology, San Francisco, CA.

Gibby, R. E., & Boyce, A. S. (2007, April). Global cognitive ability testing: Psychometric issues and applicant
reactions. Symposium conducted at the annual conference of the Society for Industrial and Organizational
Psychology, New York, NY.
Gottfredson, L. S. (1997). Mainstream science on intelligence: An editorial with 52 signatories, history and
bibliography. Intelligence, 24, 13–23.
Gottfredson, L. S. (2002). Where and why g matters: Not a mystery. Human Performance, 15, 25–46.
Gottfredson, L. S. (2004a). Intelligence: Is it the epidemiologists’ elusive “fundamental cause” of social class
inequalities in health? Journal of Personality and Social Psychology, 86, 174–199.
Gottfredson, L. S. (2004b). Life, death, and intelligence. Journal of Cognitive Education and Psychology, 4,
23–46.
Gustafsson, J.-E. (2002). Measurement from a hierarchical point of view. In H. I. Braun, D. N. Jackson, &
D. E. Wiley (Eds.), The role of constructs in psychological and educational measurement (pp. 73–95).
Mahwah, NJ: Lawrence Erlbaum.
Hausknecht, J. P., Day, D. V., & Thomas, S. C. (2004). Applicant reactions to selection procedures: An updated
model and meta-analysis. Personnel Psychology, 57, 639–683.
Hawk, J. (1970). Linearity of criterion-GATB aptitude relationships. Measurement and Evaluation in Guidance,
2, 249–251.
Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants, detection and amelioration of adverse
impact in personnel selection procedures: Issues, evidence and lessons learned. International Journal of
Selection and Assessment, 9, 152–194.
Hull, C. L. (1928). Aptitude testing. Yonkers-on-Hudson, NY: World Book.
Hülsheger, U. R., Maier, G. W., & Stumpp, T. (2007). Validity of general mental ability for the prediction of
job performance and training success in Germany: A meta-analysis. International Journal of Selection
and Assessment, 15, 3–18.
Humphreys, L. G. (1988). Trends in levels of academic achievement of Blacks and other minorities. Intelligence,
12, 231–260.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin, 96, 72–98.
Hunter, J. E., & Schmidt, F. L. (1982). Ability tests: Economic benefits versus the issue of test fairness.
Industrial Relations, 21, 122–158.
Hunter, J. E., Schmidt, F. L., & Hunter, R. (1979). Differential validity of employment tests by race: A compre-
hensive review and analysis. Psychological Bulletin, 86, 721–735.
Hyde, J. S. (2005). The gender similarities hypothesis. American Psychologist, 60, 581–592.
Hyde, J. S., Fennema, E., & Lamon, S. J. (1990). Gender differences in mathematics performance: A meta-
analysis. Psychological Bulletin, 107, 139–155.
Hyde, J. S., & Linn, M. C. (1988). Gender differences in verbal ability: A meta-analysis. Psychological Bulletin,
104, 53–69.
Johnson, W., & Bouchard, T. J. (2005). The structure of human intelligence: It is verbal, perceptual, and image
rotation (VPR), not fluid and crystallized. Intelligence, 33, 393–416.
Johnson, W., te Nijenhuis, J., & Bouchard, T. J. (2007). Replication of the hierarchical visual-perceptual-image
rotation model in de Wolff and Buiten’s (1963) battery of 46 tests of mental ability. Intelligence, 35, 69–81.
Johnson, W., te Nijenhuis, J., & Bouchard, T. J. (2008). Still just 1 g: Consistent results from five test batteries.
Intelligence, 36, 81–95.
Kane, H. D., Oakland, T. D., & Brand, C. R. (2006). Differentiation at higher levels of cognitive ability:
Evidence from the United States. Journal of Genetic Psychology, 167, 327–341.
Kehoe, J. F. (2008). Commentary on pareto-optimality as a rationale for adverse impact reduction: What would
organizations do? International Journal of Selection and Assessment, 16, 195–200.
Kluger, A. N., & Rothstein, H. R. (1993). The influence of selection test type on applicant reactions to employ-
ment testing. Journal of Business and Psychology, 8, 3–25.
Kravitz, D. A. (2008). The diversity-validity dilemma: Beyond selection—The role of affirmative action.
Personnel Psychology, 61, 173–193.
Kuncel, N. R., Hezlett, S. A., & Ones, D. S. (2004). Academic performance, career potential, creativity, and
job performance: Can one construct predict them all? Journal of Personality and Social Psychology, 86,
148–161.
Lent, R. H., Aurbach, H. A., & Levin, L. S. (1971). Predictors, criteria, and significant results. Personnel
Psychology, 24, 519–533.
Linn, M. C., & Petersen, A. C. (1985). Emergence and characterization of sex-differences in spatial ability: A
meta-analysis. Child Development, 56, 1479–1498.

Lubinski, D. (2004). Introduction to the special section on cognitive abilities: 100 Years after Spearman’s
(1904) “ ‘General Intelligence,’ Objectively Determined and Measured”. Journal of Personality and
Social Psychology, 86, 96–111.
Lynn, R., & Irwing, P. (2004). Sex differences on the progressive matrices: A meta-analysis. Intelligence, 32,
481–498.
McCloy, R. A., Campbell, J. P., & Cudeck, R. (1994). A confirmatory test of a model of performance determinants.
Journal of Applied Psychology, 79, 493–505.
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. (2007). Situational judgment tests, response
instructions, and validity: A meta-analysis. Personnel Psychology, 60, 63–91.
Mead, A. D., & Drasgow, F. (1993). Equivalence of computerized and paper-and-pencil cognitive ability tests:
A meta-analysis. Psychological Bulletin, 114, 449–458.
Myors, B., Lievens, F., Schollaert, E., Van Hoye, G., Cronshaw, S. F., Mladinic, A., et al. (2008). International
perspectives on the legal environment for selection. Industrial and Organizational Psychology, 1,
206–246.
Nathan, B. R., & Alexander, R. A. (1988). A comparison of criteria for test validation: A meta-analytic investigation.
Personnel Psychology, 41, 517–535.
Nikolaou, I., & Judge, T. A. (2007). Fairness reactions to personnel selection techniques in Greece: The role of
core self-evaluations. International Journal of Selection and Assessment, 15, 206–219.
Olea, M. M., & Ree, M. J. (1994). Predicting pilot and navigator criteria: Not much more than g. Journal of
Applied Psychology, 79, 845–851.
Ones, D. S. (2004, October). Validity of cognitive ability tests in selection: Quantitative summaries of
meta-analyses. Cattell Award address given at the annual conference of the Society for Multivariate
Experimental Psychology, Naples, FL.
Ones, D. S., & Dilchert, S. (2004). Practical versus general intelligence in predicting success in work and
educational settings: A first-order and a second-order meta-analysis. Paper presented at the University
of Amsterdam, Amsterdam, The Netherlands.
Ones, D. S., Viswesvaran, C., & Dilchert, S. (2004). Cognitive ability in selection decisions. In O. Wilhelm &
R. W. Engle (Eds.), Handbook of understanding and measuring intelligence (pp. 431–468). Thousand
Oaks, CA: Sage.
Ones, D. S., Viswesvaran, C., & Dilchert, S. (2005). Cognitive ability in personnel selection decisions. In
A. Evers, O. Voskuijl, & N. Anderson (Eds.), Handbook of selection (pp. 143–173). Oxford, England:
Blackwell.
Owens, W. (1966). Age and mental abilities: A second adult follow-up. Journal of Educational Psychology,
57, 311–325.
Plomin, R., & Spinath, F. M. (2004). Intelligence: Genetics, genes, and genomics. Journal of Personality and
Social Psychology, 86, 112–129.
Ployhart, R. E., & Holtz, B. C. (2008). The diversity-validity dilemma: Strategies for reducing racioethnic and
sex subgroup differences and adverse impact in selection. Personnel Psychology, 61, 153–172.
Potosky, D., Bobko, P., & Roth, P. L. (2005). Forming composites of cognitive ability and alternative measures
to predict job performance and reduce adverse impact: Corrected estimates and realistic expectations.
International Journal of Selection and Assessment, 13, 304–315.
Potosky, D., Bobko, P., & Roth, P. L. (2008). Some comments on Pareto thinking, test validity, and adverse
impact: When “and” is optimal and “or” is a trade-off. International Journal of Selection and Assessment,
16, 201–205.
Ree, M. J., & Carretta, T. R. (1996). Central role of g in military pilot selection. International Journal of
Aviation Psychology, 6, 111–123.
Ree, M. J., Carretta, T. R., & Steindl, J. R. (2001). Cognitive ability. In N. Anderson, D. S. Ones, H. K. Sinangil,
& C. Viswesvaran (Eds.), Handbook of industrial, work and organizational psychology: Vol. 1: Personnel
psychology (pp. 219–232). Thousand Oaks, CA: Sage.
Reeve, C. L., & Lam, H. (2007). Consideration of g as a common antecedent for cognitive ability test performance,
test motivation, and perceived fairness. Intelligence, 35, 347–358.
Reeves, T. J., & Bennett, C. E. (2004). We the people: Asians in the United States—Census 2000 special report
(CENSR-17). Washington, DC: U.S. Department of Commerce, Economics and Statistics Administration,
U.S. Census Bureau.
Rhodes, M. G. (2004). Age-related differences in performance on the Wisconsin Card Sorting Test: A meta-
analytic review. Psychology and Aging, 19, 482–494.
Roth, P. L., Bevier, C. A., Bobko, P., Switzer, F. S., & Tyler, P. (2001). Ethnic group differences in cognitive
ability in employment and educational settings: A meta-analysis. Personnel Psychology, 54, 297–330.

Rothstein, H. R., & McDaniel, M. A. (1992). Differential validity by sex in employment settings. Journal of
Business and Psychology, 7, 45–62.
Rotundo, M., & Sackett, P. R. (1999). Effect of rater race on conclusions regarding differential prediction in
cognitive ability tests. Journal of Applied Psychology, 84, 815–822.
Ryan, A. M., & Greguras, G. J. (1998). Life is not multiple choice: Reactions to the alternatives. In M. D. Hakel
(Ed.), Beyond multiple choice: Evaluating alternatives to traditional testing for selection (pp. 183–202).
Mahwah, NJ: Lawrence Erlbaum.
Ryan, A. M., McFarland, L., Baron, H., & Page, R. (1999). An international look at selection practices: Nation
and culture as explanations for variability in practice. Personnel Psychology, 52, 359–391.
Sackett, P. R., Borneman, M. J., & Connelly, B. S. (2008). High stakes testing in higher education and employ-
ment: Appraising the evidence for validity and fairness. American Psychologist, 63, 215–227.
Sackett, P. R., De Corte, W., & Lievens, F. (2008). Pareto-optimal predictor composite formation: A comple-
mentary approach to alleviating the selection quality/adverse impact dilemma. International Journal of
Selection and Assessment, 16, 206–209.
Sackett, P. R., & Ostgaard, D. J. (1994). Job-specific applicant pools and national norms for cognitive ability
tests: Implications for range restriction corrections in validation research. Journal of Applied Psychology,
79, 680–684.
Sackett, P. R., & Roth, L. (1996). Multi-stage selection strategies: A Monte Carlo investigation of effects on
performance and minority hiring. Personnel Psychology, 49, 549–572.
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment,
credentialing, and higher education: Prospects in a post-affirmative-action world. American Psychologist,
56, 302–318.
Sackett, P. R., & Shen, W. (2008, April). International perspectives on the legal environment for selec-
tion. Symposium conducted at the annual conference of the Society for Industrial and Organizational
Psychology, San Francisco, CA.
Salgado, J. F., & Anderson, N. (2003). Validity generalization of GMA tests across countries in the European
Community. European Journal of Work and Organizational Psychology, 12, 1–17.
Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., & de Fruyt, F. (2003). International validity generalization
of GMA and cognitive abilities: A European Community meta-analysis. Personnel Psychology, 56, 573–605.
Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., de Fruyt, F., & Rolland, J. P. (2003). A meta-analytic
study of general mental ability validity for different occupations in the European Community. Journal of
Applied Psychology, 88, 1068–1081.
Salgado, J. F., Viswesvaran, C., & Ones, D. S. (2001). Predictors used for personnel selection: An overview
of constructs, methods and techniques. In N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran
(Eds.), Handbook of industrial, work and organizational psychology: Vol. 1: Personnel psychology
(pp. 165–199). London, England: Sage.
Schaie, K. W. (1994). The course of adult intellectual development. American Psychologist, 49, 304–313.
Schmidt, F. L. (1988). The problem of group differences in ability test scores in employment selection. Journal
of Vocational Behavior, 33, 272–292.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generalization.
Journal of Applied Psychology, 62, 529–540.
Schmidt, F. L., & Hunter, J. E. (1981). New research findings in personnel selection: Myths meet realities in
the 1980s. Management, 23, 23–27.
Schmidt, F. L., & Hunter, J. (2004). General mental ability in the world of work: Occupational attainment and
job performance. Journal of Personality and Social Psychology, 86, 162–173.
Schmidt, F. L., Hunter, J. E., & Outerbridge, A. N. (1986). Impact of job experience and ability on job
knowledge, work sample performance, and supervisory ratings of job performance. Journal of Applied
Psychology, 71, 432–439.
Schmidt, F. L., Hunter, J. E., Outerbridge, A. N., & Goff, S. (1988). Joint relation of experience and ability with
job performance: Test of three hypotheses. Journal of Applied Psychology, 73, 46–57.
Schmidt, F. L., Pearlman, K., & Hunter, J. E. (1980). The validity and fairness of employment and educational
tests for Hispanic Americans: A review and analysis. Personnel Psychology, 33, 705–724.
Schooler, C., Mulatu, M. S., & Oates, G. (1999). The continuing effects of substantively complex work on the
intellectual functioning of older workers. Psychology and Aging, 14, 483–506.
Spearman, C. (1904). “General intelligence,” objectively determined and measured. American Journal of
Psychology, 15, 201–293.
Stern, W. (1911). Die differentielle Psychologie in ihren methodischen Grundlagen [Differential psychology
and its methodological foundations]. Leipzig, Germany: J. A. Barth.

Tiffin, J., & Vincent, N. L. (1960). Comparison of empirical and theoretical expectancies. Personnel Psychology,
13, 59–64.
Vanderpool, M., & Catano, V. M. (2008). Comparing the performance of Native North Americans and pre-
dominantly White military recruits on verbal and nonverbal measures of cognitive ability. International
Journal of Selection and Assessment, 16, 239–248.
Verhaeghen, P. (2003). Aging and vocabulary scores: A meta-analysis. Psychology and Aging, 18, 332–339.
Viswesvaran, C., & Ones, D. S. (2002). Special issue: Role of general mental ability in industrial, work, and
organizational psychology. Human Performance, 15.
Voyer, D., Voyer, S., & Bryden, M. (1995). Magnitude of sex differences in spatial abilities: A meta-analysis
and consideration of critical variables. Psychological Bulletin, 117, 250–270.
Wai, J., Lubinski, D., & Benbow, C. P. (2005). Creativity and occupational accomplishments among intellectu-
ally precocious youths: An age 13 to age 33 longitudinal study. Journal of Educational Psychology, 97,
484–492.
Wonderlic Inc. (2002). Wonderlic Personnel Test and Scholastic Level Exam user’s manual. Libertyville, IL:
Author.
13 Physical Performance Tests
Deborah L. Gebhardt and Todd A. Baker

Assessment of physical performance has a historical base in the exercise science, medical,
psychology, and military fields. It is an interdisciplinary field of study that encompasses work
physiology, biomechanics, industrial engineering, applied psychology, and medicine. Because of
the multidisciplinary aspects and uses of physical testing, it has been influenced by the law and
standards for testing in occupational settings. Hogan (1991a) provided a rich historical background
that highlights many of the factors that lead to the types of physical testing used today. Although
our society has now become computer-driven, there still remains a segment of civilian and military
jobs that are arduous in nature. A survey in 2001 indicated that preemployment physical tests
were used by fewer than 10% of employers (Salgado, Viswesvaran, & Ones, 2001). However,
physical performance tests remain integral in the selection of public safety personnel and have become
popular for evaluating applicants seeking blue collar jobs in the warehouse, manufacturing, longshore,
telecommunications, railroad, airline, electric, and natural gas industries, as well as for law
enforcement and paramedic personnel (Gebhardt & Baker, 2007).
Physical performance tests are used for selection of applicants into arduous jobs, retention of
incumbents, and evaluation of general levels of physical fitness. Of primary interest in the employment
setting are the uses for selection and retention. Although physical testing is common in the selection
setting, few organizations evaluate incumbent personnel at specified intervals (e.g., annually) because
of union agreements and divergent opinions on the consequences of failing to meet the established
standard. A few public safety agencies require annual physical qualification (e.g., Nuclear Regulatory
Commission, state police agencies) with employment consequences (Gebhardt & Baker, 2006). The
employment consequences range from remedial training to job suspension, denial of promotion and
bonus payments, and job loss.

JOB ANALYSIS FOR ARDUOUS JOBS


Similar to all other assessments used to make hiring or promotion decisions, physical performance
tests need to be supported by a detailed job analysis. In the physical area it is important to consider
all of the underlying parameters (e.g., environment, equipment) that may impact job performance. It
would be unrealistic to consider police officer job tasks without including the weight of the equip-
ment worn by the officer (e.g., bulletproof vest, weapon, ammunition, radio, handcuffs). As with all
job analyses, the essential tasks and abilities should be identified.

ESSENTIAL PHYSICALLY DEMANDING TASKS


The tasks that comprise the job—whether physical, cognitive, or psychomotor—are rated on scales
such as frequency of performance and importance to the job. When the intent of the job analysis is
to develop and validate physical performance tests, the task statements should be specific in nature
to allow for identification of the frequency of specific types of physical activity (e.g., lifting different


types and weights of objects). Further, scales that define the amount of time required to complete a
task (e.g., 10 seconds, 5 minutes) and the physical effort involved enable the job analyst to classify
tasks by physical demand. An additional scale, expected to perform, is used for public safety jobs
(e.g., nuclear security officer) that contain tasks with high importance to job performance, but which
are rarely performed (e.g., fire weapon at suspect). For example, nuclear plant security officers are
responsible for protecting the plant from sabotage and terrorist attack. Although an armed attack on
a nuclear plant has not occurred, the officers are nonetheless expected to perform the tasks required
to protect the plant should an attack occur.
Past research found that task ratings can be completed by job incumbents or supervisors (Hogan,
1991a). Because the frequency and time-spent ratings are central to the development of the
physical tests and criterion measures, incumbent personnel are typically best suited to complete the
job analysis questionnaire: they perform the tasks and therefore know the frequency and
duration parameters. If supervisors are used, first-line supervisors with prior experience performing
the job should be selected.
Numerous decision rules or algorithms (frequency, importance, and time-spent combinations)
have been used to identify essential tasks from task ratings. There is not one specific algorithm
associated with physical-oriented job analysis. Selection of the most appropriate algorithm depends
upon the nature and mission of the job. For example, jobs in which most tasks are performed fre-
quently (e.g., assembly line) may require a larger weighting for frequency than importance. Jobs that
include highly important tasks that are infrequently performed (e.g., fire weapon at suspect) may use
an algorithm in which there is a separate importance or frequency mean cutoff to identify essential
tasks. Selection of the most appropriate algorithm requires knowledge of the job.
An initial overview of the physical demand of the essential tasks can be obtained by using the
physical effort scale (Fleishman, Gebhardt, & Hogan, 1986). Tasks with mean physical effort rat-
ings equal to or above a specified value (e.g., 4 on a 7-point scale) are considered physically demand-
ing. An alternate method to identify the essential physically demanding tasks is to compute the
product for each essential task’s mean physical effort and frequency task means. These data assist
the researcher in determining whether the essential tasks have adequate physical demand to require
further investigation.
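A minimal sketch of how such decision rules might be implemented appears below. The weights, cutoffs, field names, and task ratings are illustrative assumptions, not values recommended in this chapter; as noted above, the algorithm must be tailored to the nature and mission of the job.

```python
def flag_physically_demanding_tasks(tasks, effort_cutoff=4.0, freq_weight=0.5,
                                    importance_weight=0.5, essential_cutoff=3.0):
    """Apply a simple frequency/importance rule, then screen by physical effort.

    tasks is a list of dicts with mean ratings for 'frequency', 'importance',
    and 'effort' (for example, on 7-point scales). Returns the essential,
    physically demanding tasks with an effort-by-frequency index attached.
    """
    flagged = []
    for task in tasks:
        essentiality = (freq_weight * task["frequency"]
                        + importance_weight * task["importance"])
        if essentiality >= essential_cutoff and task["effort"] >= effort_cutoff:
            demand_index = task["effort"] * task["frequency"]
            flagged.append({**task, "demand_index": round(demand_index, 1)})
    return sorted(flagged, key=lambda t: t["demand_index"], reverse=True)


# Hypothetical task ratings from a job analysis questionnaire
ratings = [
    {"task": "Carry 50-lb sack up stairs", "frequency": 4.2, "importance": 5.8, "effort": 5.6},
    {"task": "Complete daily paperwork", "frequency": 6.5, "importance": 4.0, "effort": 1.5},
    {"task": "Drag hose to structure", "frequency": 2.1, "importance": 6.7, "effort": 6.2},
]

for task in flag_physically_demanding_tasks(ratings):
    print(task["task"], "-> demand index", task["demand_index"])
```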

ERGONOMIC/BIOMECHANICAL/PHYSIOLOGICAL ANALYSIS
Ergonomic, physiological, and biomechanical data are used to quantify the demands of a job.
These three types of analysis provide direct measures of job demand. The approaches range
from simple measures such as the distance a worker walks when gathering tools, to sophisticated
measures involving oxygen uptake, mathematical modeling, and use of archival engineering
data. For most jobs, simple ergonomic measures such as weights of objects, distances objects
are carried, and heights lifted to and from can be easily gathered at the job site. Where needed,
the forces required to push/pull objects can be obtained using a load-cell device that measures
force production.
Basic physiological and biomechanical data can define the actual demands of job tasks. The type
of data gathered is dependent upon the essential tasks and physical demands of the job. The data can
be gathered using a variety of equipment (e.g., heart rate monitor, accelerometer, oxygen/gas sensor,
mathematical modeling). Heart rate monitors can be used to assess physiological workload for jobs
that require sustained performance of one or more tasks at high intensities for extended time periods
(e.g., order filler, firefighter). The monitor collects the individual’s heart rate while performing physi-
cally demanding tasks to determine the heart rate response to the work and thus, the percentage of
maximum heart rate at which the individual was working. For example, if a 30-year-old warehouse
order filler with a maximum heart rate of 190 beats per minute (bpm) (220 – 30 = 190 bpm) is work-
ing at an average heart rate of 142.5 bpm, he is working at 75% of his maximum (142.5/190 = 0.75).
The American College of Sports Medicine (ACSM) classified the intensity of physical activity in
terms of the percentage of maximum heart rate (Whaley, Brubaker, & Otto, 2006). Table 13.1 lists the
ACSM intensities, which range from very light to maximum (Whaley et al., 2006). Gebhardt, Baker,
and Thune (2006) found that workers in the order-filler job had heart rates of 71–81% of maximum
across a 3- to 4-hour time frame, thus placing the job in the “hard” intensity level. This information
can then be used to determine an estimate of the maximum oxygen uptake (VO2submax) needed to
perform the job tasks.
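This arithmetic, together with the intensity bands in Table 13.1, is easily reproduced; the short sketch below is illustrative only, and the age-predicted maximum (220 − age) is the simple approximation used in the example above rather than a clinically measured maximum.

    # Classify a working heart rate as a percentage of the age-predicted maximum
    # (220 - age), using the ACSM intensity bands listed in Table 13.1
    # (band boundaries approximated for whole-number percentages).
    def percent_max_hr(age, working_hr):
        max_hr = 220 - age                      # age-predicted maximum heart rate
        return 100.0 * working_hr / max_hr

    def acsm_intensity(pct):
        if pct < 35:
            return "very light"
        if pct < 55:
            return "light"
        if pct < 70:
            return "moderate"
        if pct < 90:
            return "hard"
        if pct < 100:
            return "very hard"
        return "maximum"

    # 30-year-old order filler working at an average of 142.5 bpm
    pct = percent_max_hr(30, 142.5)             # 75.0
    print(round(pct, 1), acsm_intensity(pct))   # 75.0 hard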
Past research indicated that to sustain arduous work for an 8-hour period, the individual must be
working at 40–50% of maximum aerobic capacity (VO2 max) (Astrand, Rodahl, Dahl, & Stromme,
2003; McArdle, Katch, & Katch, 2007). Direct measure of oxygen uptake has been performed on
jobs ranging from light industry (e.g., manual materials handling) to firefighting and military jobs
(Bilzon, Scarpello, Smith, Ravenhill, & Rayson, 2001; Sothmann, Gebhardt, Baker, Kastello, &
Sheppard, 2004). For example, studies that gathered VO2 data found that shipboard
firefighting required a VO2 of 3.1 liters/minute (L · min–1), or approximately 36.5 milliliters of
oxygen/kilogram of body weight/minute (mL · kg–1 · min–1), whereas the minimum VO2 requirements
for urban and forest firefighters ranged from 33.5 to 45.0 mL · kg–1 · min–1 (Bilzon et al., 2001; Gledhill &
Jamnik, 1992; Sothmann et al., 2004). This type of physiological data, along with heart rate data,
can be used to determine whether a test of aerobic capacity would be useful and establish job-
related passing scores for aerobic capacity tests.
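The conversion between the two oxygen uptake units reported above is straightforward: relative VO2 (mL · kg–1 · min–1) equals absolute VO2 (L · min–1) multiplied by 1,000 and divided by body mass in kilograms. In the brief sketch below, the 85-kg body mass is an assumed value chosen only to approximate the shipboard figure, not a value taken from the cited studies.

    # Convert absolute oxygen uptake (L/min) to relative uptake (mL/kg/min).
    def relative_vo2(absolute_l_per_min, body_mass_kg):
        return absolute_l_per_min * 1000.0 / body_mass_kg

    # 3.1 L/min for an (assumed) 85-kg firefighter is roughly 36.5 mL/kg/min.
    print(round(relative_vo2(3.1, 85), 1))   # 36.5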
Biomechanical data, which encompass the use of physics principles to define human movement,
can be gathered by direct measurement (e.g., weight, time, distance) and film/video data. If the
force to move or lift an object cannot be measured with a load cell as described above, the param-
eters related to the movement can be used to calculate the force exerted in a movement. A simple
biomechanical or mathematical model was developed to calculate the force a paramedic needed
to exert to lift either the head or foot end of a patient-loaded stretcher (Gebhardt & Crump, 1984).
The length and weight of the stretcher and the height and weight of the most common patient were
used to determine that a force of 152 lb was required to lift the head end of a stretcher carrying a
200-lb patient. Another method requires filming (motion analysis) workers performing a job task.
The motions are captured using optical sensors placed at the subjects’ joints (e.g., elbow). These data
are then mathematically transformed to provide indications of the forces incurred at specific body
joints (e.g., knee, hip), which in turn yield an indication of the forces the worker needs to generate
to complete the work task.
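The kind of static moment balance underlying the stretcher example can be sketched as follows; the stretcher length, stretcher weight, and load position used here are illustrative assumptions, not the values reported by Gebhardt and Crump (1984).

    # Static two-person lift: force required at the head end of a loaded stretcher,
    # from a moment balance about the foot-end handhold.
    # All parameter values are illustrative assumptions.
    def head_end_force(patient_wt, patient_com, stretcher_wt, stretcher_len):
        """patient_com: distance (ft) of the patient's center of mass from the
        foot-end handhold; the stretcher's weight is assumed to act at mid-length."""
        moment = patient_wt * patient_com + stretcher_wt * (stretcher_len / 2.0)
        return moment / stretcher_len

    # A 200-lb patient whose center of mass lies 5.4 ft from the foot end of an
    # 8-ft, 30-lb stretcher requires roughly 150 lb of force at the head end.
    print(round(head_end_force(200, 5.4, 30, 8.0), 1))   # 150.0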
In summary, ergonomic, biomechanical, and physiological data provide information that is
important to defining the physical demand of a job and can be used in a validation study to assist in
setting passing scores. These data are not readily available and must be gathered on site.

IDENTIFICATION OF ENVIRONMENTAL CONDITIONS


The environmental working conditions (e.g., heat, cold) play an integral part in the performance of
physical tasks. Arduous work performed in high temperatures (e.g., 90°F or greater) and/or in occlusive
clothing can reduce the frequency of task performance or increase the time required for task completion.

TABLE 13.1
ACSM’s Categories of Physical Activity Intensity
Intensity       Percentage of Maximum Heart Rate
Very light      <35
Light           35−54
Moderate        55−69
Hard            70−89
Very hard       ≥90
Maximum         100

In either case, these factors should be taken into account when viewing the results of
job task ratings that have unexpected mean values. For example, nuclear power plant workers wear
occlusive clothing to protect them from the radiation. This clothing increases the workers’ core
temperature, which causes excessive sweating and reduces the workers’ ability to perform job tasks.
Research has shown that individuals with higher cardiovascular endurance are more readily able
to adjust to a heated environment (Astrand et al., 2003; Pandolf, Burse, & Goldman, 1977). Thus,
aerobic capacity may be relevant to job success because of the interaction of the physical demand
and the environmental conditions. Defining the demands of the essential tasks may assist in design-
ing the testing procedures, as well as criterion measures used in criterion-related validation studies.
Environmental condition information can be collected through job analysis questionnaires, incum-
bent focus group meetings, company operating procedures, and past weather history.

IDENTIFICATION OF REQUIRED PHYSICAL ABILITIES


Identification of the physical abilities required for a position provides the link between basic ability
tests and job analysis results. Because basic ability tests assess general abilities and not job tasks,
the required abilities for the job need to be established to provide job-related support for the tests.
Physical abilities have been defined in many contexts. The abilities below are defined from a physi-
ological standpoint (Astrand et al., 2003; McArdle et al., 2007).

1. Muscular strength is the ability to exert force to lift, push, pull, or hold objects.
2. Muscular endurance is the ability to exert force continuously over moderate to long time
periods.
3. Aerobic capacity or cardiovascular endurance is the ability of the respiratory and cardio-
vascular systems to provide oxygen to the body systems for medium- to high-intensity
tasks performed over a moderate time period.
4. Anaerobic power is the ability to complete high-intensity, short-duration (e.g., 5–90 sec-
onds) tasks using stored energy (e.g., adenosine triphosphate).
5. Flexibility involves the range of motion at the joints (e.g., knee, shoulders) to bend, stoop,
rotate, and reach in all directions with the arms and legs.
6. Equilibrium is the ability to maintain the center of gravity over the base of support (e.g.,
feet) when outside forces (e.g., gravity, slipping on ice) occur.
7. Coordination is the ability to integrate sight, hearing, and other neurosensory cues to
perform motor activities (e.g., change of direction) in an accurate sequential pattern.

Past research has identified different structures for classifying physical abilities. One of the first
taxonomies of physical abilities was generated by Fleishman through factor analysis and was refined
into nine abilities (e.g., static strength, dynamic strength, coordination, equilibrium) similar to those
listed above (Fleishman & Quaintance, 1984; Fleishman, 1964). A second study using test data
from workers in physically demanding jobs found three components describing physical perfor-
mance: muscular strength, endurance, and movement quality (Hogan, 1991b). A subsequent study,
which unlike previous studies used equal samples of men and women, included 25 physical tests,
some of which were drawn from the Fleishman work (Myers, Gebhardt, Crump, & Fleishman, 1993).
A confirmatory factor analysis found that a six-factor structure best described physical
performance for men and women; the factors included static or muscular strength, explosive strength or
anaerobic power, aerobic capacity (stamina), trunk strength, and flexibility. A comparison of most
of these studies is presented by Guion (1998) in which muscular strength, muscular endurance, and
muscular power (anaerobic power) were grouped under a single factor (muscular strength) and flex-
ibility, balance, and coordination were grouped under Hogan’s movement quality factor. However,
use of a single strength factor does not correspond to the physiological components that underlie
performance of different types of physical tasks. For example, it takes 5–10 minutes to complete
300 turns when closing large wheel valves. This requires a high level of muscular endurance and
would fall under the strength factor in Hogan’s model. However, a test of muscular strength may
not be the best predictor for performing this task. Rather, a test of muscular endurance would be
better suited. Although each of these structures has scientific merit, a combination of these studies
provides a framework for identifying the physical requirement in the work setting. Theses abilities
are muscular strength, muscular endurance, aerobic capacity, anaerobic power, flexibility, and equi-
librium, along with a coordination factor.
Performance of physical tasks requires varying levels of the different physical abilities. The
muscular strength required may be as little as that needed to lift a spoon or as great as that needed to lift 90-lb cement bags.
Similarly, the energy expenditure may be primarily anaerobic (e.g., drag a victim 50 feet from
a fire) or aerobic (e.g., fill eight warehouse orders totaling 8,900 lbs. in 7 hours) depending on
the duration and intensity of the activity. When identifying the relevant physical abilities for a
position, it is more informative to gather information related to the level of the physical abilities
needed to complete essential job tasks. The Fleishman Job Analysis Survey provides a set of nine
Likert physical ability-rating scales and 10 psychomotor ability scales (Fleishman & Quaintance,
1984). An alternate physical demands inventory that uses seven physical ability rating scales
(e.g., muscular strength, muscular endurance, anaerobic power, aerobic capacity) and behavioral
anchors targeted at work behaviors has also been used to define job demands (Gebhardt, 1984).
Incumbents, supervisors, or job analysts can complete these ratings with the end product being a
profile of the physical demand of the essential job tasks. This approach allows for comparison of
multiple jobs and assists in the selection or design of testing procedures for only relevant abilities
(Gebhardt, Baker, Curry, & McCallum, 2005). Regardless of the method used to determine the
abilities related to job performance, the procedures must be logical and linked back to the essential
job tasks.

PHYSICAL PERFORMANCE TESTS


A variety of physical performance predictor tests have been used for candidate selection and incum-
bent assessment. These tests can be classified as basic ability or job simulation/work sample tests.
Basic ability tests measure a single ability or construct (e.g., muscular strength, flexibility) and
typically do not resemble job tasks. These tests assess the physical abilities identified as required
for performance of essential job tasks. Basic ability tests can be used for multiple jobs in an orga-
nization that requires the same or a subset of abilities. These tests are safe to administer because of
the controlled nature of the testing protocol. Examinees typically perform simple movements (e.g.,
elbow flexion, stepping onto a platform at a specified cadence) in a basic ability test, thus resulting
in a low risk of injury for applicants. Several overviews of basic ability tests can be found in reviews
completed in the 1990s (Hogan, 1991a; Landy et al., 1992).
Muscular strength tests fall into three categories: isometric, isotonic, and isokinetic. Isometric or
static strength tests require exerting a maximum force without movement at the joint (e.g., elbow).
In this type of test, a muscle group generates force but the length of the muscles remains unchanged
(Astrand et al., 2003; McArdle et al., 2007). The arm lift test, an example of an isometric or static
strength test, requires an individual to stand with the arms at the sides of the torso with elbows
flexed to 90º while holding a bar that is connected to a stationary platform. The subject exerts an
upward vertical force on the bar and receives a score that measures the force generated (Chaffin,
Herrin, Keyserling, & Foulke, 1977). Isometric shoulder, arm, torso, and leg strength tests have
been used extensively in a selection setting and have been shown to be valid predictors of job perfor-
mance (r = .39 to .63) (Blakely, Quinones, Crawford, & Jago, 1994; Gebhardt, Baker, & Sheppard,
1998; Jackson & Sekula, 1999).
Isotonic tests measure maximum force through a range of motion at a joint(s) (e.g., hip, knee)
and are predictive of job tasks performed in a similar range of motion. A muscle group generates
force and the length of the muscles shortens with the concentric contraction (Astrand et al., 2003;
McArdle et al., 2007). Tests such as one repetition bench press or a dynamic lift to a specified height
are examples of isotonic strength tests. Isotonic tests have been found to be significant predictors for
public safety jobs (Davis, Dotson, & Santa Maria, 1982; Gebhardt & Crump, 1984).
Isokinetic testing assesses the force produced through a specified range of motion at the shoul-
der, back, and knee joints. The equipment used incorporates a force recording device (load cell)
and computer software, which controls the speed (degrees/second) at which a subject can per-
form maximal flexion and extension movements. The force generated by a subject is measured
in units of torque (τ), a vector quantity that represents the force generated when rotating an object
(e.g., lower leg) about an axis (e.g., knee) (McGinnis, 2007). A torque curve is produced for each
joint measured. Scores are generated for each joint and summed to form a strength index. There
is limited published research using isokinetic testing in an occupational setting. However, some
research has shown a relationship between isokinetic test scores and injury reduction (Gilliam
& Lund, 2000; Karwowski & Mital, 1986). Research comparing isokinetic tests with isometric
and isotonic tests found the correlations among the tests to be high (r = .91 to .94; Karwowski &
Mital, 1986).
Muscular endurance (or dynamic strength) tests assess the ability to withstand muscular fatigue.
The duration of these tests varies in relation to the desired outcome and demands of the job. The
arm endurance test, in which a subject pedals an arm ergometer at a set resistance level (e.g., 50 W)
for a specified time period, is an example of a muscular endurance test (Gebhardt et al., 1998). The
test can be scored by counting the number of revolutions in a specified time period or the time until
a subject is unable to maintain a specific cadence. Other tests such as situps and pushups can also
be measured to exhaustion or for a specified duration.
Aerobic capacity tests assess the efficiency of the cardiovascular system (i.e., lungs, blood ves-
sels, heart) to deliver oxygen to the muscles. The tests can be classified as maximal and submaxi-
mal. In a maximal test, the subject typically runs on a treadmill or pedals a bicycle at incremental
workloads (e.g., increased treadmill speed and/or slope) until reaching exhaustion. The test can be
scored as the time to exhaustion or using a regression equation to determine the oxygen uptake value
(i.e., mL · kg−1 · min−1). The Bruce or the Balke treadmill protocols are commonly used (Whaley
et al., 2006). The submaximal assessments include the step test, 1.5-mile run, 1-mile walk, and the
YMCA or Astrand-Rhyming bicycle test. The goal of the step and bicycle tests is to provide an
estimate of VO2max using heart rate response to the workload of the exercise (Astrand et al., 2003;
Golding, 2000). The results are reported in milliliters of oxygen per kilogram of body weight per
minute and expressed as VO2submax. The submaximal tests are used more frequently in employee
selection. Maximal and submaximal tests are considered to be medical assessments as defined by
the Americans with Disabilities Act of 1990 (ADA) (1990) and, therefore, should be given after
conditional offer of the job.
Flexibility and equilibrium tests have been used in employee selection. However, they are rarely
significant predictors of job performance. The correlations between job performance and these tests
range from 0.00 to 0.18 (Gebhardt et al., 2005; Gebhardt, Baker, & Sheppard, 1992). These correlations
may reflect the fact that flexibility is specific to the joints involved in the movement (e.g., knee, shoulder).
Similarly, equilibrium appears to be a significant predictor of job performance only when high levels
of it are required (e.g., leaning out from heights of 40 ft or greater when lashing containers to a ship)
(Gebhardt, Schemmer, & Crump, 1985).
Some basic ability tests can be used to assess multiple abilities depending upon the intensity
and duration of the test. For example, the arm endurance test described above can be a muscular
endurance test or can become an anaerobic power test by shortening the duration (e.g., 10 sec) and
increasing the resistance (e.g., 100 W). Finally, most basic ability tests are practical in that they
are easily stored when not in use, have a small footprint, and are easy to transport and set up when
testing at multiple locations. The shortcoming of basic ability tests is that they do not resemble the
job tasks (Hogan, 1991a). Table 13.2 provides a listing of commonly used basic ability tests. For an
overview of these tests, the reader should see Hogan (1991a).

TABLE 13.2
Basic Physical Ability Tests

Physical Ability            Example Tests
Muscular strength
  Upper body                Arm lift, Shoulder lift, Handgrip, Static push, Static pull,
                            Chest pull, Dynamic lift, Pushups
  Trunk/core                Situps, Trunk pull
  Lower body                Leg lift, Leg press
Muscular endurance
  Upper body                Arm endurance, Pushups
  Trunk/core                Situps
  Lower body                Stepping platform
Aerobic capacity            Step test, 1.5-mile run, 1-mile walk
Flexibility                 Sit and reach, Joint range of motion
Equilibrium                 Stabiliometer, Balance beam
Anaerobic power             Shuttle run, 100-yd run, Arm ergometer (10 seconds),
                            Margaria test, Purdue pegboard test

Job simulations or work sample tests include components of the job (e.g., pursuing a suspect,
lifting and carrying boxes) and are used as predictors or criterion measures. Job simulations require
performance of actual or simulated job tasks during the test and may require use of job-specific
equipment. The primary advantage of a job simulation is its resemblance to the job. Further, they
can be developed directly from the essential job tasks and can provide an initial indication of how an
individual handles equipment. The feasibility of developing a simulation that does not include equip-
ment skills learned in training or on the job may be difficult. However, other equipment (e.g., weight
vest) can be used to simulate actual equipment (e.g., firefighter bunker gear). When job simulations
consist of a series of tasks, the sequence of performance should replicate the job as closely as pos-
sible The duration and intensity of a job simulation should reflect job conditions and not a protracted
set of tasks that last substantially longer than job events (Baker, Sheppard, & Gebhardt, 2000). It
is paramount that simulated tasks represent the critical physical job behaviors and that the param-
eters selected can be scored in a meaningful way to identify true individual differences (American
Educational Research Association [AERA], American Psychological Association, & National
Council on Measurement in Education, 1999). Many job simulations are time dependent, whereas
others call for the completion of a task or set of tasks in a specific time period. Regardless of the
scoring procedure used, it should reflect the job tasks and working conditions (e.g., distance run).
The primary disadvantages of job simulations are equipment size and construction (safe, durable,
storage), applicant safety (e.g., higher injury risk), and scoring the simulation. Despite the disad-
vantages, job simulations based on essential tasks and ergonomic parameters (e.g., duration, inten-
sity) possess content validity. Depending upon the job of interest, the following parameters should
be considered when designing a job simulation: (a) dimensions and weights of objects handled,
(b) frequency with which each weight category (e.g., 30–50 lb) is handled in a time period (e.g., hour, day), (c) heights
to which objects are lifted, (d) distances objects are carried, (e) distances workers walk, (f) speed of task
movement, and (g) fidelity to the job.
Job simulations are commonly used in public safety jobs by law enforcement and fire depart-
ments. Less common are simulations of blue-collar jobs (e.g., warehouse worker, pipefitter, electri-
cian). These simulations involve lifting, pushing, and pulling movements, with lifting and carrying
of weighted objects being the most commonly used assessment.
Another type of lifting test—isoinertial—was developed to assess work capacity in a struc-
tured manner and increase the safety of the lifting tasks. Isoinertial tests involve lifting predeter-
mined weights from floor level to a defined height (e.g., waist, shoulder) at a specified pace (e.g.,
every 5 sec). This differs from the psychophysical approach in which the subject determines the
weight lifted. For isoinertial tests, the weight is increased by 5 or 10 lb every 20 to 40 seconds.
Depending upon the protocol, the weight of the objects is increased until the subject achieves
maximum capacity, or the maximum weight defined by the job analysis (Gebhardt et al., 2006;
Mayer, Gatchel, & Mooney, 1990). Isoinertial tests are used for selection of manual materi-
als handling personnel. Past research has shown that a progressive isoinertial lifting test is a
good assessment of an individual’s ability to perform lifting tasks over a time period (Hazard,
Reeves, & Fenwick, 1992) and an effective baseline test for lifting capacity. Isoinertial tests
were found to be a reliable, safe, and inexpensive method to screen for jobs with frequent lift-
ing (Hattori et al., 1998; Lygren, Dragesund, Joensen, Ask, & Moe-Nilssen, 2005). Two stud-
ies found that the inclusion of an isoinertial lifting evaluation was more predictive of injury
than basic strength tests (Gebhardt et al., 2006; Mayer et al., 1990). Other research found that
individuals with back pain performed poorly on an isoinertial lifting test (Ruan, Haig, Geisser,
Yamakawa, & Buchholz, 2001).
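The progressive protocol described above is easy to express procedurally; the sketch below is illustrative only, and the starting weight, increment, ceiling, and examinee-capacity function are placeholder assumptions rather than parameters from any published protocol.

    # Progressive isoinertial lifting protocol (illustrative parameters only):
    # the weight increases by a fixed increment until the examinee reaches maximum
    # capacity or the maximum weight defined by the job analysis.
    def run_isoinertial_test(start_lb=20, increment_lb=10, job_max_lb=70,
                             examinee_can_lift=lambda w: w <= 60):
        """Return the heaviest weight lifted successfully. In practice each lift
        is paced (e.g., one lift every 20-40 seconds); pacing is omitted here."""
        weight, last_completed = start_lb, None
        while weight <= job_max_lb and examinee_can_lift(weight):
            last_completed = weight
            weight += increment_lb
        return last_completed

    print(run_isoinertial_test())   # 60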

FACTORS TO CONSIDER IN TEST DEVELOPMENT OR SELECTION


When developing or selecting a physical performance test, one must consider the reliability, adverse
impact, safety, and logistics related to test setup and administration. Myers et al. (1993) reviewed
over 20 basic ability tests and found the tests to be reliable with test-retest reliabilities ranging
from 0.65 to 0.95. Reliability coefficients for job simulations tend to be similar to basic ability tests
(r = .50 to .92), with lowest reliabilities associated with lift/carry simulations (r = .50 to .57)
(Gebhardt et al., 1992; Gebhardt et al., 1998). The reliabilities for other job simulations such as
manhole hoist (0.83); ladder climb and carry (0.80–0.88); pursuit with turns, stairs, and drag (0.85);
fence climbs (0.88–0.94); maze crawl (0.76); and pole climb (0.79) were similar to basic ability tests
(Baker & Gebhardt, 2005a; Baker et al., 2000; Gebhardt & Baker, 1999; Gebhardt et al., 1998;
Gebhardt et al., 1992).
Adverse impact by sex and age is a concern with physical tests. Basic ability and job simulation
tests demonstrate large test score differences between men and women, with effect sizes
typically exceeding 1.0 when strength, aerobic capacity, and anaerobic power are measured (Baker,
2007; Blakely et al., 1994; Gebhardt, 2007; Gebhardt et al., 2005). When a test did not focus
predominantly on strength or aerobic components, women performed similarly to men (e.g., maze crawl,
ladder climb) (Gebhardt & Baker, 1999). These gender differences are attributed to physiological
differences (e.g., lean body mass, percent body fat, height, weight) between men and women. Arvey,
Landon, Nutting, and Maxwell (1992) used hierarchical regression analysis controlling for physi-
ological differences (e.g., body fat, lean body mass) and found fewer test score differences between
men and women. However, this does not change the fact that women’s performance is significantly
lower than men’s on physical tests.
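The effect sizes reported in this literature are standardized mean differences; a minimal sketch of the computation (Cohen's d with a pooled standard deviation) is given below, using fabricated scores purely for illustration.

    # Standardized mean difference (Cohen's d, pooled SD) for a physical test score;
    # the score vectors are fabricated for illustration only.
    from statistics import mean, stdev

    def cohens_d(group1, group2):
        n1, n2 = len(group1), len(group2)
        s1, s2 = stdev(group1), stdev(group2)
        pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
        return (mean(group1) - mean(group2)) / pooled_sd

    men_scores = [105, 98, 112, 95, 101, 108]
    women_scores = [82, 90, 78, 88, 85, 80]
    print(round(cohens_d(men_scores, women_scores), 2))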
Similarly, age differences have been found between the younger (under 40 years old) and older
(50 and over) age groups, and intermittently between the under- and over-40-year-old groups (Arvey
et al., 1992; Blakely et al., 1994; Gebhardt et al., 2005). These differences were found for basic abil-
ity tests and job simulations. It should be noted that when sex and age differences are examined
using differential prediction analyses in criterion-related validity studies, the results indicate that the tests
are fair across sex and age groups (Gebhardt & Baker, 1999; Gebhardt et al., 1998).
In an analysis of more than 50,000 subjects, Baker (2007) found numerous differences across
White, African-American, and Hispanic groups. To eliminate a gender effect, only men were
included in the study. Significant differences with effect sizes ranging from 0.29 to 0.52 were found,
with White men performing better than African-American men on basic ability and job simulation
tests that required some level of quick and/or continuous movement (e.g., pursuit run, 1.5-mile run,
arm endurance, firefighter evolution). For strength tests, White and African-American men had
significantly better scores than Hispanic men (Baker, 2007; Blakely et al., 1994).
Although one would ideally select tests with no adverse impact, this is not possible in the physi-
cal testing arena. Rather, one must select and/or design tests that have less adverse impact. Choosing
a job simulation over a basic ability test will not necessarily reduce adverse impact for women or
minorities. Further, if the job demands have a considerable upper body strength component (e.g.,
lineworker), tests that evaluate that ability should be used. Review of past research is the best method
to determine the validity and adverse impact of specific physical tests. When designing new tests,
one must ensure that the pilot or test sample includes an adequate number of women. Because of the
low number of women in physically demanding jobs, organizations should attempt to recruit women
from jobs with similar physical demands to participate in the data-gathering phase. This approach
was used in a statewide firefighter study in which women firefighters in fire suppression jobs were
recruited from neighboring states to participate in a validation study (Gebhardt & Baker, 1999).
Without these data, identification of a fair and sound passing score would not have been possible.
Although the reliability and adverse impact issues are important, one must pay special atten-
tion to the safety and logistics associated with administering physical performance tests. As stated
above, basic ability tests provide a more controlled environment than job simulations. If job simula-
tions are used, issues such as the floor surface, temperature, and general testing environment must
be monitored. Both types of tests can be administered safely.

VALIDITY OF PHYSICAL PERFORMANCE TESTS


Validity has been defined as “the degree to which evidence and theory support specific interpreta-
tions of test scores entailed by proposed uses of tests” (AERA et al., 1999, p. 9). Two approaches
are typically used to establish the validity of physical tests. One involves establishing an empiri-
cal relationship between the test and a criterion. The second encompasses gathering evidence
of a verifiable link between the test components and the job content or requirements.
This validity may be established by confirming that the tests are related to a construct (e.g., mus-
cular strength) or tasks required for job performance. However, the Uniform Guidelines (1978)
listed three types of validity: content, construct, and criterion-related. These three types are sub-
sumed within the validity definitions in the Society for Industrial and Organizational Psychology (SIOP)
Principles (SIOP, 2003).
Various studies have shown that physical performance tests are valid predictors of job perfor-
mance. Hogan (1991a) reviewed over 14 journal articles and technical reports across a 13-year
time frame (1979–1986) that included data evaluating sex differences on physical test and objective
criteria of job performance (e.g., work sample, production rates, medical leave, job ratings). This
review demonstrated the validity of basic ability tests (e.g., strength, flexibility) in terms of physi-
cal employment requirements. Other studies, including a meta-analysis of physical tests, found that
muscular strength, muscular endurance, and aerobic and anaerobic tests (e.g., arm lift, 1-mile run,
situps) were valid predictors for public safety and blue-collar positions (e.g., construction, main-
tenance, repair) (Arvey et al., 1992; Blakely et al., 1994; Gebhardt, 2000). The significant simple
validity coefficients for basic ability tests with supervisor-peer ratings in these studies ranged from
0.14 to 0.63. Significant correlations between basic ability tests and work simulations, ranging from
0.20 to 0.64, were identified.
Job simulations not only demonstrate content validity, they have been shown to be predictive of
objective job performance measures (e.g., job ratings, productivity rates). The simulations used in
the selection setting included lift/carry tasks, pursuit and arrest simulation, firefighter tasks, valve
turning, and lashing (Anderson, 2003; Gebhardt, 2007). Sothmann and associates in multiple studies
found performance on a firefighter job simulation to be significantly related (r = .67) to actual heart
rate response and oxygen uptake values during firefighting tasks (Sothmann et al., 1990; Sothmann
et al., 2004). The progression of these studies demonstrated the validity of job simulations and basic
ability tests with criterion measures.
When conducting a criterion-related validity study, creation/selection of the criterion measure(s)
is as important as test selection. The most common criteria used are supervisor and/or peer ratings
of job performance, productivity measures, and work samples. Injury and lost workdays are viable criteria,
but very large samples are required for use of these data. Further, injury data may lack
variability and be confounded with other safety initiatives implemented concurrently with the test-
ing by an organization. Regardless of the type of criteria used, the reliability of the measure should
be determined (Shrout & Fleiss, 1979).
Job simulation tests possess content and construct validity in terms of tasks, conditions, duration,
and intensity, whereas basic ability tests possess construct validity (Arvey et al., 1992; Fleishman,
1964; Hogan, 1991b; Myers et al., 1993). However, these types of validity may not be sufficient for
physical performance tests to withstand legal scrutiny (United States v. City of Erie, 2005; EEOC
v. Dial Corporation, 2006).
The difficulty with content and construct models is establishing an accurate passing score for
a test or test battery. This is especially difficult if a job simulation consists of a series of tasks. If
the test consists of one job task with a known performance criterion obtained in the job analysis,
completion of the task can serve as the passing score. For example, in a paramedic test in which the
force to lift the head end of a stretcher (152 lb) was established, the completion of two consecutive
lifts requiring 152 lb of force constituted a passing score (Gebhardt & Crump, 1984). In light of
the Lanning v. SEPTA (1999) litigation, the organization has an added responsibility to gather
empirical data (e.g., arrest rates) to establish a passing score that reflects the minimally acceptable
level of job performance and is consistent with business necessity.
When determining the validity model to use, several factors must be considered, including (a) type
of test desired, (b) availability of job performance information (e.g., ratings, productivity, attrition),
and (c) organizational resources and commitment to the study. The latter two are related primarily
to criterion data collection. When using supervisor or peer ratings, the availability of personnel and
probability of obtaining individual differences must be considered. Finally, if productivity, attrition,
or injury data are available, the data must be reviewed for applicability with the abilities or tasks
being measured (e.g., musculoskeletal injury vs. eye injury), type of data available (e.g., quantitative
versus qualitative), and confounding effects of other organizational programs (e.g., safety). A com-
pendium of criteria for workplace assessment that addresses these issues is found in Performance
Assessment for the Workplace (Wigdor & Green, 1991).

SELECTION OF FINAL TEST BATTERY


If empirical data relating the predictor tests to a criterion measure are available, various statistical
procedures can be used to establish the test validity and test battery components. The first assess-
ment should be a review of the simple validity correlations between the predictor tests and crite-
rion measure(s) to identify potential tests. Depending upon the goal and constraints of the study,
statistical procedures such as multiple regression, logistic regression, canonical correlation, and
others are used. Multiple regression is commonly used in physical testing to identify those tests that
significantly add to the prediction of job performance. The use of multiple regression analysis to
identify a valid test battery helps to eliminate test redundancy (e.g., highly correlated tests). Because
most physical performance tests adversely impact women, use of two highly correlated tests (e.g.,
two upper body strength tests) increases adverse impact. Use of multiple regression helps avoid this
possibility.
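A minimal sketch of this incremental-prediction logic is shown below using ordinary least squares from the statsmodels library; the test names, simulated scores, and criterion are placeholders for actual validation data, and the procedure is illustrative rather than a complete battery-selection analysis.

    # Regress a job performance criterion on several physical tests to see which
    # tests add significantly to prediction; data are simulated placeholders.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 120
    arm_lift = rng.normal(50, 10, n)          # upper body strength test
    arm_endurance = rng.normal(80, 15, n)     # muscular endurance test
    step_test = rng.normal(35, 5, n)          # aerobic capacity test
    criterion = 0.4 * arm_lift + 0.3 * step_test + rng.normal(0, 8, n)  # simulated ratings

    X = sm.add_constant(np.column_stack([arm_lift, arm_endurance, step_test]))
    fit = sm.OLS(criterion, X).fit()
    print(round(fit.rsquared, 4))
    print(fit.pvalues.round(3))  # a nonsignificant weight suggests the test adds little (redundancy)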
Other statistical procedures used are logistic regression and canonical correlation. Logistic
regression allows for use of multiple predictors but requires a dichotomous criterion measure
(Pedhazur, 1997). This type of analysis is useful if the criterion measure can be used in a dichoto-
mous format such as the level of aerobic capacity required to perform an order selector or firefighter
job, or the likelihood of injury (Gebhardt et al., 2006; Hodgdon & Jackson, 2000; Sothmann et al.,
2004). Canonical correlation yields a correlation of two latent variables, one representing a set of
independent variables, the other a set of dependent variables (Levine, 1977; Tabachnick & Fidell,
1997). This method allows the researcher to investigate a set of dependent variables instead of
one variable. Each of these methods has advantages and disadvantages. Selection of a statistical
technique is dependent on the available data, types of tests and criteria, organizational goals, and
business necessity.
Similar to other employment assessments, the fairness of the test across protected groups should
be established. Because physical performance tests commonly demonstrate adverse impact against
women and older individuals in terms of test scores and passing rates, it is important to establish
test fairness. Regression analysis can be used to examine subgroup differences (Bartlett, Bobko,
Mosier, & Hannan, 1978; Cleary, 1968). Research using a moderated regression analysis to evaluate
test fairness typically found physical tests to be fair across sex and age subgroups (Gebhardt et al.,
1998; Sothmann et al., 2004).
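A sketch of the moderated (Cleary-type) regression approach follows: the criterion is regressed on the test score, a subgroup indicator, and their interaction, and the subgroup and interaction terms are examined for significance. The data are simulated placeholders, not results from the cited studies.

    # Cleary-type moderated regression: significant subgroup or interaction terms
    # would indicate intercept or slope differences (potential predictive bias).
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 200
    sex = rng.integers(0, 2, n)                   # 0 = men, 1 = women (indicator)
    test = rng.normal(60, 12, n) - 8 * sex        # mean score difference by sex
    criterion = 0.5 * test + rng.normal(0, 6, n)  # common regression line for both groups

    X = sm.add_constant(np.column_stack([test, sex, test * sex]))
    fit = sm.OLS(criterion, X).fit()
    print(fit.pvalues.round(3))  # nonsignificant sex and interaction terms -> no evidence of bias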

TEST SCORING AND ADMINISTRATION


TYPES OF SCORING
Two types of scoring methods commonly used for physical performance test batteries are multiple
hurdle and compensatory models. The multiple hurdle approach establishes a separate passing score
for each test in the battery. The compensatory approach combines the scores from the individual
tests into an overall score that determines whether an individual passed or failed the test battery.
The compensatory approach allows for an individual to offset poorer scores on one test with better
scores on other tests. Different methods have been used to combine test scores into an overall test
battery score. When using a compensatory model, one must consider whether equal weighting (e.g.,
z score) or multiple regression beta weights are most suited for the test battery.
A compensatory model normally results in less adverse impact against women than the multiple
hurdle approach (Sothmann et al., 2004). Although the compensatory approach reduces adverse
impact by sex, an applicant may receive an extremely low score on one test and pass the test battery.
To alleviate this problem, minimum individual test scores can be established if there is a known
specific training requirement for new hires. For example, law enforcement candidates enter an acad-
emy with defined physical performance graduation standards. By determining the magnitude of
physical improvement achieved in the academy, one can define a minimum test score that allows for
this projected improvement. In this model, the candidate must achieve the minimum level on each
test and a passing score for the combination of the tests (Baker & Gebhardt, 2005b).
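The two scoring models can be contrasted in a few lines of code; the tests, cutoffs, minimums, and equal z-score weighting in the sketch below are illustrative assumptions only.

    # Multiple hurdle vs. compensatory scoring for a three-test battery.
    # Cutoffs, minimums, norms, and equal z-score weighting are illustrative only.
    hurdle_cutoffs = {"arm_lift": 45.0, "arm_endurance": 70.0, "step_test": 32.0}
    minimums = {"arm_lift": 35.0, "arm_endurance": 55.0, "step_test": 26.0}  # trainable floor
    norms = {"arm_lift": (50.0, 10.0), "arm_endurance": (80.0, 15.0), "step_test": (35.0, 5.0)}

    def multiple_hurdle(scores):
        # Pass only if every test meets its own cutoff.
        return all(scores[t] >= cut for t, cut in hurdle_cutoffs.items())

    def compensatory(scores, battery_cutoff=-0.25):
        # Each test must clear its minimum; remaining strengths and weaknesses offset.
        if any(scores[t] < minimums[t] for t in minimums):
            return False
        z = [(scores[t] - m) / sd for t, (m, sd) in norms.items()]
        return sum(z) / len(z) >= battery_cutoff

    candidate = {"arm_lift": 42.0, "arm_endurance": 85.0, "step_test": 36.0}
    print(multiple_hurdle(candidate), compensatory(candidate))   # False True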
Some compensatory models assign point values to ranges of test scores and sum the points
achieved across all tests to attain an overall score that is compared to a passing score. The points are
assigned using various techniques that provide a distribution of test scores (e.g., percentile, stanine
scores). Two issues arise when attempting to use this method with multiple tests: (a) identification of
the bandwidth for test scores assigned the same point value and (b) number of point values utilized
per test. The bandwidth should be generated using data from test scores and take into account the
statistical properties of the tests (e.g., standard error of the difference; Cascio, Outtz, Zedeck, &
Goldstein, 1991). Because the number of point values utilized is dependent on the width of the test
score bands or ranges, large samples are needed to construct point value distributions.
Attempts have been made to combine physical performance and other selection tests (e.g., cogni-
tive) in a compensatory model for making selection decisions. With these attempts at combining
physical and cognitive test scores, several questions have arisen. First, are the cognitive and physi-
cal performance components of a job independent or dependent? Can an individual compensate for
lesser physical capabilities with higher levels of cognitive capabilities? Because these components
are independent, higher levels of either factor cannot compensate for lower levels of the other factor.
Second, how will the weightings be determined? Job analysis data [e.g., essential task frequency;
time spent; knowledge, skills, and abilities (KSAs)] can be used to develop weightings that reflect
each component’s portion of the job. Alternatively, cognitive and physical KSA importance ratings were used to
establish a 50%-50% weighting scheme for a firefighter test (Brunet v. City of Columbus, 1995).
Although this weighting was upheld in the district and appellate courts, this approach may result in
greater adverse impact (e.g., sex, ethnic), especially if ranking is required.

ESTABLISHING PASSING SCORES


In the employment setting, passing scores are used to identify individuals who are capable of
performing or being trained to perform essential job tasks. There are two basic types of passing
scores: criterion-referenced and norm-referenced (Landy & Conte, 2007; Safrit & Wood, 1989). The
Uniform Guidelines (1978) indicated that passing scores should be “reasonable and consistent” with
proficient job task performance. Criterion-referenced approaches are best suited for setting passing scores for
employment decisions. Establishing the passing score(s) for physical performance tests is similar to
other selection assessments (e.g., cognitive). Concurrent and/or predictive test validation data (e.g.,
test scores, criterion measures) are used to identify the potential passing score(s) that minimize
selection error (false positive, false negative) and maximize prediction of effective job performance
(true positive, true negative). There are several methods used to identify valid passing scores. These
include expectancy tables, contingency tables, ergonomic data, and Taylor-Russell tables.
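An expectancy-style tabulation of validation data can make the trade-off among candidate passing scores concrete; the sketch below uses simulated scores and a hypothetical dichotomized criterion, and is intended only to illustrate the logic, not the formal Taylor-Russell procedure.

    # For each candidate passing score, tabulate the proportion of successful
    # performers among those who would pass and among those screened out.
    # Scores and the success criterion are simulated placeholders.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 300
    test = rng.normal(50, 10, n)
    success = (0.5 * test + rng.normal(0, 8, n)) > 30    # dichotomized criterion

    for cutoff in (40, 45, 50, 55):
        passed = test >= cutoff
        print(cutoff,
              round(success[passed].mean(), 2),          # success rate among those passing
              round(success[~passed].mean(), 2))         # success rate among those screened out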
Ergonomic or physiological data can provide actual values for completion of the work and in
turn a passing score for a test. Sothmann and colleagues determined the minimum level of aerobic
capacity required to perform firefighter tasks (e.g., pulling down ceiling with a pike pole, climbing
stairs with a high rise pack) and used these data to establish the minimum score for a measure of
aerobic capacity in firefighter selection (Sothmann et al., 1990; Sothmann et al., 2004). Similarly,
tasks involving muscular strength (e.g., crack a valve, tighten a turnbuckle) that can be measured
directly (e.g., force) have been used as the point that defines successful and unsuccessful perfor-
mance (Gebhardt et al., 1985; Jackson, Osburn, Laughery, & Vaubel, 1992). Absent these types of
data, one must use a combination of expectancy and contingency tables, job analysis information,
organizational preferences (e.g., test type), and business necessity to identify a passing score that
maximizes prediction and minimizes adverse impact on protected groups.
In most instances, passing scores are set using incumbent data because organizations want a defensible
passing score prior to test implementation. Cascio, Alexander, and Barrett (1988) stated that the
use of incumbents who are older and more experienced may lead to test score differences between
incumbents and candidates. Research in physical testing has found age differences. However,
older workers perform at lower levels on physical jobs than younger workers (20–39 years), thus
negating this concern (Baker et al., 2000; Gebhardt et al., 1998). This is especially true if basic
ability tests are used. If job simulations are used, experience has been shown to be a factor if the
test includes movements that will become more efficient with practice (e.g., use of a pike pole to
pull down ceiling in a firefighter selection test battery). Therefore, simple tasks requiring little or no
instruction (e.g., drag hose) eliminate the experience factor.
For jobs that are time-sensitive (e.g., law enforcement, fire suppression, emergency medical ser-
vice), the pace with which an individual responds is important to effective performance and public
safety. For example, firefighters do not run when performing fire suppression activities because
of the inherent dangers of the environment (e.g., surface conditions, smoke, backdraft). However,
if a firefighter moves too slowly, lives may be lost. Experienced emergency personnel know the
paces at which effective incumbents perform a job. Pacing information coupled with other validity
data has been used to establish passing scores. Sothmann and colleagues used data from multiple
studies and pacing data to establish the passing score for a firefighter test battery (Sothmann et al.,
1990; Sothmann et al., 2004). Six video-pacing tapes of a firefighter suppression evolution criterion
measure were generated based on the validation data from incumbent firefighters (Sothmann et al.,
2004). The paces ranged from very fast (i.e., 1 standard deviation faster than the mean time) to very
slow (i.e., 3 standard deviations slower than the mean time). A random sample of 41 incumbent
firefighters (31 men, 10 women) was selected to determine which pace was acceptable or unac-
ceptable. On the basis of the incumbent responses, the slowest pace viewed as acceptable by most
incumbents was identified and served as the minimally acceptable level of job performance for the
firefighter work sample criterion measure. If pacing or other types of video-based ratings are used,
the individual criteria for judging the behaviors must be established, raters must be trained, and rat-
ings should be independently derived.
Physical performance tests in a selection setting typically use a single passing score that is
applicable to all candidates. However, a few law enforcement agencies utilize sex and/or age nor-
mative data to identify multiple passing scores in their selection procedures. For example, men age
20–29 years must complete 40 situps, whereas women age 20–29 years must complete 35. This
practice appears to be a clear violation of the Civil Rights Act of 1991, which states that passing
scores cannot vary on the basis of gender, ethnic group, or age. The rationale for using normative
gender and/or age data as passing scores is based on the premise that the agency is measuring
physical fitness and not job performance. This type of testing in which test scores are compared
to previously gathered data is called norm-referenced testing (Safrit & Wood, 1989). Most norma-
tive physical performance test data were gathered on the “normal population” and are not specific
to a job. These data allow an agency to classify individuals in terms of percentile or an overall
category (e.g., good, fair, poor). From an employment perspective, these classifications by age and/
or gender do not provide an indication of acceptable or unacceptable job performance. Individuals
supporting norm-referenced testing believe that different passing scores for men and women (or
age groups) are, in fact, the same because they represent the same percentile rank in the norm-
reference tables. However, these same individuals recommend a single test passing score when job
simulations are used.
In the 1980s and 1990s, the Employment Litigation Section (ELS) of the Department of Justice
urged law enforcement agencies to use norm-referenced selection tests (Ugelow, 2005). ELS took
the position that sex-normed passing scores could be used to make selection decisions because the
candidates would receive physical training in the law enforcement academy. One court supported
the use of sex- and age-normed passing scores for selection and retention (Peanick v. Reno, 1995).
Another court upheld use of norm-referenced tests on the basis that the tests were assessing fitness
and not job requirements (Alspaugh v. Michigan Law Enforcement Officers Training Council, 2001).
In Lanning v. Southeastern Pennsylvania Transportation Authority (SEPTA) (1999), the court ruled
against the plaintiff’s recommendation to use sex-normed passing scores for a 1.5-mile run because
the percentile scores offered no evidence of how the times related to job performance. In these cases
that supported normed passing scores, little or no mention was made of how the percentile passing
scores were related to job performance.
Norm-referenced testing has not been used in firefighter or blue-collar job selection. Two pri-
mary issues must be considered prior to using a norm-referenced approach. First, multiple passing
scores will be used, thus violating a tenet of the Civil Rights Act of 1991. Second, the Uniform
Guidelines (1978) specified that a test’s passing score must represent the minimally acceptable level
of job performance. Use of percentiles does not establish the evidence needed to demonstrate that
the passing score represents or predicts minimally acceptable job performance. The Lanning v.
SEPTA (1999) decision rejected the use of percentiles because of this issue. It is the opinion of the
authors that the use of sex- and/or age-normed test scoring does not comply with employment
laws and statutes because multiple passing scores for a single test have not been shown to reflect
minimally acceptable job performance.

ADMINISTRATION OF TESTS
Physical performance test administration differs from cognitive testing in several respects. First,
test instructions must provide adequate detail to ensure the examinee understands the purpose and
goal of the test (e.g., complete maximum number of revolutions). Test instructions should include
information related to committing errors (e.g., running when only walking is permitted) and the
consequence of the errors (e.g., repeat trial, fail test). Having the test administrator demonstrate
correct and incorrect test performance enhances the instructions. Second, test administrators must
be trained to administer the test, recognize testing errors, and score the tests (e.g., time, count).
A video/DVD of a test can enhance the examinee’s understanding of the test but cannot replace the
test administrator.
Further, administrators and others should not provide encouragement (e.g., cheering) to
examinees because external motivation can alter performance. Examinees should be self-
motivated. Testing examinees separately removes the possibility of external motivation and
prevents subsequent examinees (e.g., second, third) from gaining test insight (e.g., pace) not
available to the first examinee, thus allowing for equal treatment of all examinees. Finally, if
a job simulation is used, test administrators must practice cuing the examinee to the next test
component because improper timing of test cues can impact examinee performance. Tests that are
timed and/or require counting the number of event completions involve coordinating a timing
device (e.g., stopwatch) or counter with the occurrence of an event (e.g., crossing the finish line, completion
of a sequence).
Placement of physical performance tests in the selection continuum and retest policies vary in
relation to business necessity and type of test used. Typically, physical tests have been used at three
points in the selection process: (a) first selection assessment, (b) after completion and passing of the
cognitive test, and (c) after conditional offer of the job. If testing occurs at points a and b, candidates
who pass the test are placed in a pool and move onto the next assessment phase (e.g., medical exam,
interview). If one or more of the tests in a battery use any physiological parameter (e.g., heart rate)
to score a test, it is construed as a medical assessment and must be given after the conditional job
offer to be in compliance with ADA (1990). If retesting is allowed for those who fail the test bat-
tery, the time between the original and retest should be determined by the time needed to alter the
physiological state (e.g., muscular strength) and the organization’s needs. The literature is replete
with research demonstrating the ability of women and men to increase their physical capabilities
(Knapik et al., 2001; Kraemer et al., 2001; Nindl et al., 2007). From a physiological standpoint,
2–3 months of sustained exercise are required to realize substantial gains in strength and aerobic
capacity (McArdle et al., 2007). Although retesting can effectively take place 3 months after ini-
tial testing, an organization may determine that the logistics for retesting are difficult or the pool
of qualified applicants is sufficient. Conversely, the “shelf life” of test results may be impacted by
inactivity, injury, or aging for individuals not initially selected for the job. Therefore, a retest may
be appropriate prior to entry into the job.

PHYSICAL PERFORMANCE TEST PREPARATION


Participation in a sustained physical exercise program prior to testing will increase an individual’s
likelihood of passing the test. Various physical preparation programs designed to increase the
applicants’ physical capabilities have been successful in increasing the selection rate of women
and men (Baker & Gebhardt, 2005b; Gebhardt & Baker, 1999; Gebhardt & Crump, 1990; Hogan &
Quigley, 1994; Knapik et al., 2006). These programs have been especially effective for women
(Gebhardt & Baker, 2007; Hogan & Quigley, 1994). When job simulation tests are used, applicant
practice sessions and instructional material (e.g., video, DVD) that outline the test
and how to prepare for it are effective preparation techniques (Hogan & Quigley, 1994;
Sothmann et al., 2004).

LEGAL ISSUES
As women were denied the opportunity to enter higher paying trades and public safety jobs,
greater scrutiny was placed on the selection procedures and decision models used by organizations.
Litigation in the physical testing area focused mainly on adverse impact in the selection setting,
with a few cases related to job retention. As stated above, the physiological sex differences led to
test score differences, and in turn, disproportionate hiring of women. These test differences are not
due to test bias, because corresponding differences are found for the criterion measure of interest
(Hogan, 1991a). In fact, almost all physical tests violate the four-fifths rule, which defines adverse
impact as the passing rate of a protected group (e.g., women, minorities) being less than 80% (4/5)
of the majority group (e.g., men) (EEOC, 1978). Although almost all physical tests have an adverse
impact on women, they have been upheld when the validity evidence demonstrates the relationship
of the test and passing score(s) to the job (e.g., Porch v. Union Pacific Railroad, 1997). However,
when job analysis and/or validity evidence is lacking, the tests have not been upheld (e.g., Teresa
D. Varden v. Alabaster, AL et al., 2005). Prior papers have reviewed the physical testing litigation
(Hogan & Quigley, 1986; Terpstra, Mohamed, & Kethley, 1999). The review provided in this
section focuses on recent physical testing litigation and is organized by the laws under which the cases
were filed. The two laws, Civil Rights Act (1964, 1991) and ADA (1990), are similar, but they have
one difference that impacts physical testing. Both laws require showing the job-relatedness of a
selection procedure, but the ADA also requires determining whether a reasonable accommodation
was available.
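The four-fifths rule reduces to a simple comparison of selection rates; the sketch below uses hypothetical pass counts to illustrate the computation.

    # Four-fifths (80%) rule: adverse impact is indicated when a protected group's
    # selection rate is less than 80% of the highest group's rate. Counts are hypothetical.
    def selection_rate(passed, tested):
        return passed / tested

    men_rate = selection_rate(70, 100)
    women_rate = selection_rate(30, 100)
    impact_ratio = women_rate / men_rate
    print(round(impact_ratio, 2), impact_ratio < 0.80)   # 0.43 True -> adverse impact indicated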

ADA OF 1990
The ADA (1990) was designed to protect individuals with mental and physical disabilities in the
private sector workforce. In the federal sector, the Rehabilitation Act of 1973 (1973) is a corollary
to the ADA. Title I of ADA states that health/medical status inquiries must follow conditional offer
of employment and physical tests can be given prior to conditional job offer [42 U.S.C. § 12112
(b) (5-6) (d) (2-4)]. However, these stipulations impact the type of test used for pre-job offer testing.
For example, submaximal aerobic capacity tests (e.g., step, bicycle, treadmill) require monitoring of
heart rate and cannot be given during the pre-offer stage. Physical performance testing has inherent
safety issues when assessing individuals ranging in age from 20 to more than 50 years. The ACSM
provides recommendations for screening (e.g., heart rate, blood pressure) prior to participation in
exercise/testing (Whaley et al., 2006). Because of the ADA medical test restrictions (e.g., blood
pressure, heart rate taken to ensure safety when completing tests), employers have used waiver
forms and medical certification by a physician for pre-job offer testing and medical examinations
for the post-offer testing. It should be noted that use of a waiver or any prescreening approach does
not absolve the employer of responsibility for the safety of the applicant, as seen in White v. Village
of Homewood (1993), in which the court ruled that a signed waiver was not enforceable.
Most ADA litigation has dealt with medical issues (e.g., vision, diabetes, bipolar disorder) and
incumbent personnel, rather than physical performance issues (Rothstein, Carver, Schroeder, &
Shoben, 1999). However, three court cases filed by incumbents included physical testing. In Andrews
v. State of Ohio (1997), officers failed to meet the Ohio State Highway Patrol strength, aerobic, and/
or weight standards and filed suit. The court dismissed the case and stated that the plaintiffs were
not disabled or regarded as disabled. Similarly, in Smith v. Des Moines (1996) the court ruled in
summary judgment that a firefighter who failed to meet the city’s aerobic standard was not disabled,
just unfit for firefighter work. In Belk v. Southwestern Bell Telephone Company (1999), the plain-
tiff, who wore leg braces because of residual effects of polio, desired to move from a sedentary to
a physically demanding job. The plaintiff requested and received a physical test modification but
failed the test. The court ruled in favor of the plaintiff for reasons other than failing the physical
tests. On appeal, the decision was vacated and settled out of court.

TITLE VII OF THE 1964 CIVIL RIGHTS ACT AND AGE DISCRIMINATION IN EMPLOYMENT ACT OF 1967
Hogan and Quigley (1986) conducted a review of earlier Title VII court cases involving physical
performance tests. Of the 44 cases reviewed, all but one involved public safety positions (police
officer and firefighter). Of the ten public safety cases involving physical testing, three upheld the test
and seven struck it down. Hogan and Quigley concluded that the courts were most
concerned with whether a job analysis was conducted, and if so, the quality and appropriateness of
the job analysis in relation to the test validation strategy used. Content validity was supported for
work sample tests coupled with a proper job analysis (Hardy v. Stumpf, 1978). However, the content
validation of basic ability tests did not pass legal scrutiny (Berkman v. City of New York, 1982;
Harless v. Duck, 1980). Tests defended by claims of criterion-related validation were struck down
because of the study quality or criterion measure used. In summary, Hogan and Quigley’s findings
for cases filed in the late 1970s and early 1980s did not bode well for the defense of physical per-
formance tests. However, these physical testing cases, which corresponded with the enactment of
the EEOC Uniform Guidelines (1978), were the impetus for organizations to perform thorough job
analyses and use accepted validation procedures.
In later cases, the courts have continued to examine the quality of the job analysis and strategy
used to validate physical tests. For example, even a direct simulation of firefighter work (e.g., lifting/
carrying ladders, dragging hose) did not withstand legal scrutiny in Legault v. Russo (1994) because
the job analysis lacked the detail to support the use of several test events. Similarly, in 2005, a
police department implemented a physical test requiring candidates to complete an obstacle course
coupled with basic ability tests in a specified time (United States v. City of Erie, 2005). The test,
based on officer experience without reference to any job analysis, had a passing score equivalent
to the mean time for incumbent officers. The court found the test invalid and ruled in favor of the
plaintiff, demonstrating the importance of job analysis in test development and validation. Further,
the court provided additional criteria for test development and validation that the city failed to meet,
including (a) a test must be job-related as indicated in the Uniform Guidelines and professional stan-
dards such as the SIOP Principles, (b) physical performance tests like other tests must meet a busi-
ness necessity as indicated by the EEOC Guidelines, (c) a test must be administered to candidates in
the same manner it was validated, and (d) test validity and passing scores must be determined and
documented using professionally accepted methods.
Since the 1986 review, other issues have come to the forefront, including (a) use of physical
tests for incumbent assessment, (b) job-relatedness of the passing score, (c) appropriateness
of criterion measures, and (d) evidence used to support the business necessity of the test. There
have been several challenges to the use of physical performance tests for employee retention or
promotion. These challenges focused on fire or law enforcement departments that instituted some
form of incumbent physical assessment. The courts ruled that an employer can institute incumbent
physical assessments, but these assessments must stand up to legal scrutiny in regard to validity
and job relatedness (Smith v. Des Moines, 1997; Fraternal Order of Police v. Butler County Sheriff
Department, 2006; Pentagon Force Protection Agency v. Fraternal Order of Police, 2004). In the
private sector, an arbitrator upheld the use of physical tests for incumbent job transfers to physically
demanding jobs and upheld the test battery (UWUA Local 223 & The Detroit Edison Co, 1991).
Incumbent physical testing became an issue in litigation related to mandatory retirement when the
Massachusetts State Police reorganized to include former state police agencies, resulting in a more
restrictive mandatory retirement age. In 1992, officers brought suit under the Age Discrimination in
Employment Act of 1967 (ADEA, 1967) against the Commonwealth to nullify the mandatory retire-
ment age of 55 (Gately v. Massachusetts, 1992). In 1996, after several injunctions, the court allowed
individuals aged 55 years or older to continue as active members of the Massachusetts State Police
(Gately v. Massachusetts, 1996) with the proviso that they take and pass a physical performance test.
In 2006, a validated physical performance test battery was implemented to evaluate incumbents for
continued employment with the Massachusetts State Police. Enlisted members (i.e., troopers) who
do not pass the test (after repeated trials and remedial training) will be dismissed from the state
police regardless of age (Gebhardt & Baker, 2006).
The courts have also ruled on issues related to the development and use of passing scores for
physical performance tests. The Standards for Educational and Psychological Testing (AERA et
al., 1999) and SIOP Principles (2003) state that passing scores should be set using empirical data
that identify the relationship of the test to “relevant criteria.” The use of passing scores has more
recently been linked to business necessity and minimum qualifications (Kehoe & Olson, 2005).
One of the cases that articulated use of business necessity and minimum qualifications was Lanning
v. SEPTA (1999). Both Lanning v. SEPTA (1999, 2002) cases and their impact on passing scores
have received a great deal of attention (Gutman, 2003; Sharf, 1999, 2003). Sharf (2003) provides
an excellent overview and interpretation of the Lanning outcomes. In addition, the article provides
insight from employment attorneys pertaining to how the Lanning decision will impact the defense
of employment tests. A brief description of the Lanning cases is provided here, followed by the
impact the decisions have on physical performance testing.
In an attempt to improve the law enforcement capabilities of its police force, SEPTA imple-
mented a physical performance test battery that included a 1.5-mile run. SEPTA’s consultant
established the passing score for this test at 12 minutes, which equated to an aerobic capacity of
42.5 mL·kg⁻¹·min⁻¹. This passing score led to disparate passing rates of 55.6% for men and
6.7% for women. Plaintiffs (women who failed the test and the Department of Justice) filed suit
under Title VII against SEPTA on the basis of sex discrimination. The District Court for the first
Lanning case (1999) ruled in favor of SEPTA and concluded the test was job-related and a business
necessity. This decision was appealed and the Third Circuit remanded the case back to the District
Court. Using the instructions for business necessity provided by the Circuit Court, the District
Court convened in 2002 and again ruled in favor of SEPTA. The second appeal to the Third Circuit
affirmed the District Court ruling.
Some decisions from the Lanning 2002 ruling are perplexing and have been questioned in other
District Courts. The Lanning 2002 ruling suggested that the SIOP Principles (2003) are not relevant
in establishing the job-relatedness and business necessity of employment tests. In addition, the
Lanning decision stated that a successful legal defense was not predicated on compliance with all
sections of the EEOC Uniform Guidelines (1978). However, in United States v. City of Erie (2005),
a different District Court concluded that the SIOP Principles and EEOC Uniform Guidelines were
relevant in the defense of a test.
The Lanning ruling applied a stricter burden to prove job-relatedness and business necessity of
a physical performance test. To meet these requirements, there must be evidence that (a) the test
is related to job performance, (b) alternative tests and methods have been considered, and (c) the
passing score reflects the minimally acceptable level of job performance. The minimally accept-
able level of performance was defined as “likely to do the job,” not “some chance of doing the job”
(Sharf, 2003). In addition, the minimal level of performance did not have to be defined by the fitness
level of current employees. This case indicated that an employer has the right to improve the physi-
cal capabilities of its workforce. Thus, if a minimum acceptable level of test performance can be
established using empirical criteria (e.g., arrest records, archival data), it is irrelevant if incumbents
cannot meet the minimal levels. The Lanning case showed that various information sources can be
used to defend a test and establish a minimum acceptable passing score.
In EEOC v. Dial Corp (2006), the EEOC alleged that a physical performance test involving mov-
ing/lifting 35-lb bars to heights of 30 and 60 in. discriminated against women and was more dif-
ficult than the job. Further, some women who completed the test received a fail status on the basis of
comments made by the test administrators. Dial used a business necessity defense and claimed the
test resulted in lower on-the-job injuries and that the test accurately measured the job requirements.
The EEOC noted that the reduction in injuries began 2 years prior to the implementation of the test and
could thus be attributed to earlier measures (e.g., job rotation, job redesign) instituted by Dial. The court
ruled in favor of the plaintiff, stating that Dial did not adequately demonstrate that the test was valid
or a business necessity. This ruling was later upheld on appeal.
The recurring theme evident in the EEOC v. Dial Corp ruling is the need for accurate job analysis
to develop an acceptable test. Dial defended the test with two pieces of flawed information: (a) the
test reflected actual job tasks and (b) use of the test resulted in a decline in on-the-job injuries.
Further, subjective judgment of an individual’s performance on physical tests should be avoided.

BENEFITS OF PHYSICAL TESTING
The benefits of physical testing for selection into arduous jobs have been demonstrated in the public,
private, and military sectors. These benefits range from reductions in lost work time and injuries to
increases in productivity. Studies in the military have demonstrated the relationship between physical
capabilities and injuries. Many of the military studies are summarized in a recent National Academy
of Sciences publication (Sackett & Mavor, 2006). In a longitudinal study, the military demonstrated
a reduction in injuries in basic training by using physical testing to identify individuals who were
unable to meet the training demands (Knapik et al., 2007). In the private sector, it is more difficult to
gather measures that define the impact of physical testing. Fitness programs in the work setting have
resulted in reductions in turnover, absenteeism, and healthcare costs when exercisers are compared
to nonexercisers (Chenoweth, 1994; Gebhardt & Crump, 1990). Craig, Congleton, Kerk, Amendola,
and Gaines (2006) found several physical performance factors (e.g., aerobic capacity, sit and reach
flexibility) were significantly associated with injuries in manual materials handling jobs. Similar
research found a reduction of injuries with the implementation of a preemployment physical test for
truck drivers and dockworkers (Gilliam & Lund, 2000). Some studies showed the positive impact
of physical selection tests by comparing the injuries, days lost from work, and administrative costs
between workers who were tested prior to job entry and workers who were not tested. One study
examined 5 years of injury and time loss data in the railroad industry using data from the two samples
of train service workers (tested and hired, n = 12,714; not tested and hired, n = 15,794) hired during
the same 5-year period (Baker & Gebhardt, 2001). The tested group had fewer workers injured than
the nontested group (648 vs. 3,898). When age, tenure, and year injured were controlled (ANCOVA),
the results showed significant differences (p < .001) on days lost (tested = 77.2; not tested = 142.4) and
injury costs (p < .01; tested = $15,315; not tested = $66,148). When the age, job tenure, and year injured
were controlled separately, the significant differences remained. Research in the freight industry found
significantly fewer lost work days for the tested group, which was 1.7–2.2 times less likely (odds ratio)
to be injured than the nontested group (Baker & Gebhardt, 2001). Physical performance tests developed
and validated in accordance with the laws and professional standards benefit the employer and the
employee by identifying individuals who are capable of meeting the physical demands of arduous
jobs. Individuals who pass such tests are more likely to be successful performing physical work and
less likely to incur workers' compensation costs (e.g., lost work days, injury).
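As a back-of-the-envelope check on the railroad figures above, the raw injury proportions implied by the reported counts can be computed directly (a simple Python sketch; unlike the ANCOVA results reported by Baker and Gebhardt, 2001, these unadjusted rates do not control for age, tenure, or year injured):

# Counts reported for the railroad study (Baker & Gebhardt, 2001)
tested_hired, tested_injured = 12714, 648
untested_hired, untested_injured = 15794, 3898

tested_rate = tested_injured / tested_hired          # about 0.051
untested_rate = untested_injured / untested_hired    # about 0.247
print(f"tested: {tested_rate:.1%}, not tested: {untested_rate:.1%}")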

REFERENCES
Age Discrimination in Employment Act of 1967, 29 U.S.C. Sec. 621, et. seq. (1967).
Alspaugh v. Michigan Law Enforcement Officers Training Council, 634 N.W.2d 161 (Mich. App. 2001).
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (1999). Standards for educational and psychological testing. Washington,
DC: American Educational Research Association.
Americans with Disabilities Act of 1990, 42 U.S.C. § 12101 et seq. (1990).
Anderson, C. K. (2003). Physical ability testing for employment decision purposes. In W. Karwowski & W. S.
Marras (Eds.). The occupational ergonomics handbook. New York, NY: Routledge.
Andrews v. State of Ohio, 104 F.3d 803 (6th cir., 1997).
Arvey, R. D., Landon, T. E., Nutting, S. M., & Maxwell, S. E. (1992). Development of physical ability tests for
police officers: A construct validation approach. Journal of Applied Psychology, 77, 996–1009.
Astrand, P., Rodahl, K., Dahl, H. A., & Stromme, S. G. (2003). Textbook of work physiology (4th ed.) Champaign,
IL: Human Kinetics.
Baker, T. A. (2007, April). Physical performance test results across ethnic groups: Does the type of test have
an impact? Paper presented at the annual meeting of the Society for Industrial and Organizational
Psychology. New York, NY.
Baker, T. A., & Gebhardt, D. L. (2001). Utility of physical performance tests in reduction of days lost and
injuries in railroad train service positions. Beltsville, MD: Human Performance Systems.
Baker, T. A., & Gebhardt, D. L. (2005a). Development and validation of selection assessments for Energy
Northwest nuclear security officers. Beltsville, MD: Human Performance Systems.
Baker, T. A., & Gebhardt, D. L. (2005b). Examination of revised passing scores for state police physical per-
formance selection tests. Beltsville, MD: Human Performance Systems.
Baker, T. A., Sheppard, V. A., & Gebhardt, D. L. (2000). Development and validation of physical performance
test for selection of City of Lubbock police officer. Beltsville, MD: Human Performance Systems.
Bartlett, C. J., Bobko, P., Mosier, S. B., & Hannan, R. (1978). Testing for fairness with a moderated multiple
regression strategy: An alternative to differential analysis. Personnel Psychology, 31, 233–241.
Belk v. Southwestern Bell Telephone Company 194 F.3d 946 (8th Cir. 1999).
Berkman v. City of New York, 536 F. Supp. 177, 30 Empl. Prac. Dec. (CCH) 33320 (E.D.N.Y. 1982).
Bilzon, J. L., Scarpello, E. G., Smith, C. V., Ravenhill, N. A., & Rayson, M. P. (2001). Characterization of the
metabolic demands of simulated shipboard Royal Navy fire-fighting tasks. Ergonomics, 44, 766–780.
Blakely, B. R., Quinones, M. A., Crawford, M. S., & Jago, I. A. (1994). The validity of isometric strength tests.
Personnel Psychology, 47, 247–274.
Brunet v. City of Columbus, 58 F.2d 251 (6th Cir. 1995).
Cascio, W. F., Alexander, R. A., & Barrett, G. V. (1988). Setting cutoff scores: Legal, psychometric, and profes-
sional issues and guidelines. Personnel Psychology, 41, 1–24.
Cascio, W. F., Outtz, J. L., Zedeck, S., & Goldstein, I. L. (1991). Statistical implications of six methods of test
score use in personnel selection. Human Performance, 4, 233–264.
Chaffin, D. B., Herrin, G. D., Keyserling, W. M., & Foulke, J. A. (1977). Pre-employment strength testing in
selecting workers for materials handling jobs (Report CDC-99-74-62). Cincinnati, OH: National Institute
for Occupational Safety and Health, Physiology, and Ergonomics Branch.
Chenoweth, D. (1994). Positioning health promotion to make an economic impact. In J. P. Opatz (Ed.),
Economic impact of worksite health promotion. Champaign, IL: Human Kinetics.
Civil Rights Act of 1964 (Title VII), 42 U.S.C. §2000e-2, et seq., (1964).
Civil Rights Act of 1991, S. 1745, 102nd Congress, (1991).
Cleary, T. A. (1968). Test bias: Prediction of grades of Negro and white students in integrated colleges. Journal
of Educational Measurement, 5, 115–124.
Craig, B. N., Congleton, J. J., Kerk, C. J., Amendola, A. A., & Gaines, W. G. (2006). Personal and
non-occupational risk factors and occupational injury/illness. American Journal of Industrial Medicine,
49, 249–260.
Davis, P. O., Dotson, C. O., & Santa Maria, D. L. (1982). Relationship between simulated fire fighting tasks and
physical performance measures. Medicine and Science in Sports and Exercise, 14, 65–71.
Equal Employment Opportunity Commission v. Dial Corp, No. 05-4183/4311 (8th Cir. 2006).
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, and Department
of Justice. (1978). Uniform Guidelines on Employee Selection Procedures. Washington, DC: Bureau of
National Affairs, Inc.
Fleishman, E. A. (1964). Structure and measurement of physical fitness. Englewood, NJ: Prentice Hall.
Fleishman, E. A., Gebhardt, D. L., & Hogan, J. C. (1986). The perception of physical effort in job tasks. In
G. Borg & D. Ottoson (Eds.), The perception of exertion in physical work (pp. 225–242). Stockholm,
Sweden: Macmillan Press.
Fleishman, E. A., & Quaintance, M. K. (1984). Taxonomies of human performance. New York, NY: Academic Press.
Fraternal Order of Police Local 101 v. Butler County Sheriff’s Department, #05-ULP-09-0509, 23 OPER 30
(Ohio SERB, 2006).
Gately v. Massachusetts, 92-CV-13018-MA (D. Mass. Dec. 30, 1992).
Gately v. Massachusetts, No. 92-13018 (D. Mass. Sept. 26, 1996).
Gebhardt, D. L. (1984). Revision of physical ability scales. Bethesda, MD: Advanced Research Resources
Organization.
Gebhardt, D. L. (2000). Establishing performance standards. In S. Constable & B. Palmer (Eds.), The pro-
cess of physical standards development (pp. 179–197). Wright-Patterson Air Force Base, OH: Human
Systems Information Analysis Center.
Gebhardt, D. L. (2007, April). Physical performance testing: What is the true impact? Paper presented at the
annual meeting of the Society for Industrial and Organizational Psychology. New York, NY.
Gebhardt, D. L., & Baker, T. A. (1999). Validation of physical performance tests for the selection of firefighters
in the State of New Jersey. Beltsville, MD: Human Performance Systems.
Gebhardt, D. L., & Baker, T. A. (2006). Determination of incumbent passing scores for the Massachusetts State
Police physical performance test. Beltsville, MD: Human Performance Systems.
Gebhardt, D. L., & Baker, T. A. (2007). Physical performance assessment. In S. G. Rogelberg (Ed.), Encyclopedia
of industrial and organizational psychology (pp. 179–199). Thousand Oaks, CA: Sage.
Gebhardt, D. L., Baker, T. A., Curry, J. E., & McCallum, K. (2005). Development and validation of medical
guidelines and physical performance tests for U. S. Senate Sergeant at Arms positions: Vol. I and II.
Beltsville, MD: Human Performance Systems.
Gebhardt, D. L., Baker, T. A., & Sheppard, V. A. (1992). Development and validation of physical performance
tests dockworker, hostler, and driver jobs in the freight industry. Hyattsville, MD: Human Performance
Systems.
Gebhardt, D. L., Baker, T. A., & Sheppard, V. A. (1998). Development and validation of physical performance
tests for BellSouth physically demanding jobs. Hyattsville, MD: Human Performance Systems.
Gebhardt, D. L., Baker, T. A., & Thune, A. (2006). Development and validation of physical performance, cognitive,
and personality assessments for selectors and delivery drivers. Beltsville, MD: Human Performance
Systems.
Gebhardt, D. L. & Crump, C. E. (1984). Validation of physical performance selection tests for paramedics.
Bethesda, MD: Advanced Research Resources Organization.
Gebhardt, D. L., & Crump, C. E. (1990). Employee fitness and wellness programs in the workplace. American
Psychologist, 45, 262–272.
Gebhardt, D. L., Schemmer, F. M., & Crump, C. E. (1985). Development and validation of selection tests for
longshoremen and marine clerks. Bethesda, MD: Advanced Research Resources Organization.
Gilliam, T., & Lund, S. J. (2000). Injury reduction in truck driver/dock workers through physical capability new
hire screening. Medicine and Science in Sports and Exercise, 32, S126.
Gledhill, N., & Jamnik, V. K. (1992). Characterization of the physical demands of firefighting. Canadian
Journal of Sport Science, 17, 207–213.
Golding, L. A. (2000). YMCA fitness testing and assessment manual (4th ed.). Champaign, IL: Human
Kinetics.
Guion, R. M. (1998). Assessment, measurement and prediction for personnel decisions. Mahwah, NJ: Lawrence
Erlbaum.
Gutman, A. (2003). Adverse impact: Why is it so difficult to understand? The Industrial-Organizational
Psychologist, 40, 50.
Hardy v. Stumpf, 17 Fair Empl. Prac. Cas. (BNA) 468 (Supp. Ct. Cal. 1978).
Harless v. Duck, 22 Fair Empl. Prac. Cas. (BNA) 1073 (6th Cir. 1980).
Hattori, Y., Ono, Y., Shimaoka, M., Hiruta, S., Kamijima, M., & Takeuchi, Y. (1998). Test-retest reliability of
isometric and isoinertial testing in symmetric and asymmetric lifting. Ergonomics, 41, 1050–1059.
Hazard, R. G., Reeves, V., & Fenwick, J. W. (1992). Lifting capacity. Indices of subject effort. Spine, 17,
1065–1070.
Hodgdon, J. A., & Jackson, A. S. (2000). Physical test validation for job selection. In S. Constable & B. Palmer
(Eds.), The process of physical fitness standards development (pp. 139–177). Wright-Patterson Air Force
Base, OH: Human Systems Information Analysis Center.
Hogan, J. C. (1991a). Physical abilities. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and
organizational psychology (Vol. 2, pp. 753–831). Palo Alto, CA: Consulting Psychologist Press.
Hogan, J. C. (1991b). Structure of physical performance in occupational tasks. Journal of Applied Psychology,
76, 495–507.
Hogan, J. C., & Quigley, A. M. (1986). Physical standards for employment and the courts. American
Psychologist, 41, 1193–1217.
Hogan, J. C., & Quigley, A. M. (1994). Effects of preparing for physical ability tests. Public Personnel
Management, 23, 85–104.
Jackson, A. S., Osburn, H. G., Laughery, K. R., & Vaubel, K. P. (1992). Validity of isometric tests for
predicting the capacity to crack open and closed industrial valves (pp. 688–691). In Human Factors and
Ergonomics Society annual meeting proceedings. Santa Monica, CA: Human Factors and Ergonomics
Society.
Jackson, A. S., & Sekula, B. K. (1999). The influence of strength and gender on defining psychophysical lifting
capacity. Proceeding of the Human Factors and Ergonomics Society, 43, 723–727.
Karwowski, W., & Mital, A. (1986). Isometric and isokinetic testing of lifting strength of males in teamwork.
Ergonomics, 29, 869–878.
Kehoe, J. F., & Olson, A. (2005). Cut scores and employment discrimination litigation. In F. J. Landy (Ed.),
Employment discrimination litigation. San Francisco, CA: Jossey-Bass.
Knapik, J. J., Canham-Chervak, M., Hoedebecke, E., Hewitson, W. C., Hauret, K., Held, C. et al. (2001). The
fitness training unit in U.S. Army basic combat training: Physical fitness, training outcomes, and injuries.
Military Medicine, 166, 356–361.
Knapik, J. J., Darakjy, S., Hauret, K. G., Canada, S., Scott, S., Rieger, W. et al. (2006). Increasing the physical
fitness of low-fit recruits before basic combat training: An evaluation of fitness, injuries, and training
outcomes. Military Medicine, 171, 45–54.
Knapik, J. J., Jones, S. B., Darakjy, S., Hauret, K. G., Bullock, S. H., Sharp, M. A., et al. (2007). Injury rates and
injury risk factors among U.S. Army wheel vehicle mechanics. Military Medicine, 172, 988–996.
Kraemer, W. J., Mazzetti, S. A., Nindl, B. C., Gotshalk, L. A., Volek, J. S., Bush, J. A., et al. (2001). Effect of
resistance training on women’s strength/power and occupational performances. Medicine and Science in
Sports and Exercise, 33, 1011–1025.
Landy, F., Bland, R., Buskirk, E., Daly, R. E., Debusk, R. F., Donovan, E., et al. (1992). Alternatives to chrono-
logical age in determining standards of suitability for public safety jobs (Technical Report) University
City, PA: Center for Applied Behavioral Sciences, Pennsylvania State University.
Landy, F. J., & Conte, J. M. (2007). Work in the 21st century: An introduction to industrial and organizational
psychology. Malden, MA: Blackwell.
Lanning v. Southeastern Pennsylvania Transportation Authority, 181 F.3d 478, 482-484 (3rd Cir. 1999).
Lanning v. Southeastern Pennsylvania Transportation Authority, 308 F.3d 286 (3rd Cir. 2002).
Legault v. Russo, 64 FEP Cases (BNA) 170 (D.N.H., 1994).
Levine, M. S. (1977). Canonical correlation analysis: Uses and interpretation. Beverly Hills, CA: Sage.
Lygren, H., Dragesund, T., Joensen, J., Ask, T., & Moe-Nilssen, R. (2005). Test-retest reliability of the
Progressive Isoinertial Lifting Evaluation (PILE). Spine, 30, 1070–1074.
Mayer, T., Gatchel, R., & Mooney, V. (1990). Safety of the dynamic progressive isoinertial lifting evaluation
(PILE) test. Spine, 15, 985–986.
McArdle, W. D., Katch, F. I., & Katch, V. L. (2007). Exercise physiology: Energy, nutrition, and human perfor-
mance physiology (5th ed.). Baltimore, MD: Lippincott Williams & Wilkins.
McGinnis, P. M. (2007). Biomechanics of sport and exercise (2nd ed.). Champaign, IL: Human Kinetics.
Myers, D. C., Gebhardt, D. L., Crump, C. E., & Fleishman, E. A. (1993). The dimensions of human physical
performance: Factor analyses of strength, stamina, flexibility, and body composition measures. Human
Performance, 6, 309–344.
Nindl, B. C., Barnes, B. R., Alemany, J. A., Frykman, P. N., Shippee, R. L., & Friedl, K. E. (2007). Physiological
consequences of U.S. Army Ranger training. Medicine and Science in Sports and Exercise, 39,
1380–1387.
Pandolf, K. B., Burse, R. L., & Goldman, R. F. (1977). Role of physical fitness in heat acclimatization, decay
and reinduction. Ergonomics, 20, 399–408.
Peanick v. Reno, 95-2594 (8th Cir. 1995).
Pedhazur, E. J. (1997). Multiple regression in behavioral research: Explanation and prediction (3rd ed.). New
York, NY: Harcourt Brace College Publishers.
Pentagon Force Protection Agency v. Fraternal Order of Police DPS Labor Committee, FLRA Case #WA-CA-
04-0251 (Wash. Region, 2004).
Porch v. Union Pacific Railroad, administrative law proceeding, State of Utah (1997).
Rehabilitation Act of 1973, 29 U.S.C. 701 et. seq. (1973).
Rothstein, M. A., Carver, C. B., Schroeder, E. P., & Shoben, E. W. (1999). Employment law (2nd ed.). St. Paul,
MN: West Group.
Ruan, C. M., Haig, A. J., Geisser, M. E., Yamakawa, K., & Buchholz, R. L. (2001). Functional capacity evalua-
tions in persons with spinal disorders: Predicting poor outcomes on the Functional Assessment Screening
Test (FAST). Journal of Occupational Rehabilitation, 11, 119–132.
Sackett, P. R., & Mavor, A. S. (2006). Assessing fitness for military enlistment: Physical, medical and mental
health standards. Washington, DC: The National Academies Press.
Safrit, M. J., & Wood, T. M. (1989). Measurement concepts in physical education and exercise science.
Champaign, IL: Human Kinetics.
Salgado, J. F., Viswesvaran, C., & Ones, D. S. (2001). Predictors used for personnel selection. In N. Anderson,
D. S. Ones, H. K. Sinaangil, & C. Viswesvaran (Eds.), Handbook of work and organizational psychology
(Vol. 1, pp. 164–199). London, England: Sage.
Sharf, J. C. (1999). Third circuit’s Lanning v. SEPTA decision: Business necessity requires setting minimum
standards. The Industrial-Organizational Psychologist, 37, 149.
Sharf, J. C. (2003). Lanning revisited: The third circuit again rejects relative merit. The Industrial-Organizational
Psychologist, 40, 40.
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological
Bulletin, 86, 420–428.
Smith v. Des Moines, #95-3802, 99 F.3d 1466, 1996 U.S. App. Lexis 29340, 72 FEP Cases (BNA) 628, 6 AD
Cases (BNA) 14 (8th Cir. 1996). [1997 FP 11]
Society for Industrial and Organizational Psychology (2003). Principles for the validation and use of personnel
selection procedures (4th ed.). Bowling Green, OH: Author.
Sothmann, M. S., Gebhardt, D. L., Baker, T. A., Kastello, G. M., & Sheppard, V. A. (2004). Performance
requirements of physically strenuous occupations: Validating minimum standards for muscular strength
and endurance. Ergonomics, 47, 864–875.
Sothmann, M. S., Saupe, K., Jasenof, D., Blaney, J., Donahue-Fuhrman, S., Woulfe, T., et al. (1990). Advancing
age and the cardiovascular stress of fire suppression: Determining the minimum standard for aerobic
fitness. Human Performance, 3, 217–236.
Tabachnick, B. G., & Fidell, L. S. (1997). Using multivariate statistics. New York, NY: HarperCollins College
Publishers.
Teresa D. Varden v. City of Alabaster, Alabama, and John Cochran; U.S. District Court, Northern District of
Alabama, Southern Division; 2:04-CV-0689-AR (2005).
Terpstra, D. A., Mohamed, A. A., & Kethley, R. B. (1999). An analysis of federal court cases involving nine
selection devices. International Journal of Selection and Assessment, 7, 26–34.
Ugelow, R. S. (2005). I-O psychology and the Department of Justice. In F. J. Landy (Ed.), Employment
discrimination litigation: Behavioral, quantitative, and legal perspectives (pp. 463–490). San Francisco,
CA: Jossey-Bass.
United States v. City of Erie, Pennsylvania, #04-4, 352 F. Supp. 2d 1105, (W.D. Pa. 2005).
UWUA Local 223 & The Detroit Edison Co., AAA Case No. 54-30-1746-87 (Apr. 17, 1991) (Lipson, Arb.).
Whaley, M. H., Brubaker, P. H., & Otto, R. M. (2006). ACSM’s guidelines for exercise testing and prescription
(7th ed.). Philadelphia, PA: Lippincott Williams & Wilkins.
White v. Village of Homewood, 628 N.E.2d 616 (Ill. App. 1993).
Wigdor, A. K., & Green, B. F. (1991). Performance assessment for the workplace. Washington, DC: National
Academy Press.
14 Personality: Its Measurement and Validity for Employee Selection


Leaetta Hough and Stephan Dilchert

Personality variables have had a roller-coaster-like ride in employee selection during the 20th and
21st centuries. They have been denounced and rescued several times over the years. We foresee
a more stable and sanguine future as evidence continues to mount documenting the importance
of personality variables as determinants of individual- and team-level performance. Indeed, one of
the most important advances in our field can be attributed to the recognition of the importance of
personality variables in determining and explaining performance. With the addition of personality
variables to our models of job performance, we are now able to explain significantly more variation
in behavior and performance than ever before (Hough, 2001).
In this chapter, we review the issues, document the evidence, and describe the consensus emerg-
ing about the usefulness of personality variables in employee selection. We describe factors that
hinder our understanding and those that help increase our knowledge of personality variables and
their role in more accurately predicting work-related criteria. We address issues related to taxo-
nomic structure, measurement methods, level of measurement, validity, and factors that threaten
and enhance the validity of personality measures.

STRUCTURE OF PERSONALITY VARIABLES
The taxonomic structure of personality variables is critically important to industrial-
organizational (I-O) psychology, and it is nowhere more important than in employee selection
research and practice. Personality constructs now play key roles in our models of individual and
team performance. Researchers accumulate criterion-related validity studies to meta-analytically
summarize the relationships between personality and criterion constructs. Practitioners contribute
to the research base and benefit from the accumulation of knowledge generated by meta-analyses,
enabling us to build better prediction equations for criteria of interest. All of these activities
and contributions are dependent upon a good and generally agreed upon taxonomic structure of
personality variables.
Although criticism has waxed and waned, today the Five-Factor Model (FFM) is the most widely
accepted structure of personality variables (Goldberg, 1993; Wiggins & Trapnell, 1997; for a history
of the FFM, see Dilchert, Ones, Van Rooy, & Viswesvaran, 2006; Hough & Schneider, 1996; and
Schneider & Hough, 1995). The earliest version of the FFM (emotional stability, surgency, culture,
dependability, and agreeableness) dates back to Tupes and Christal's work in the 1950s and early
1960s (Tupes & Christal, 1958, 1961; see also Tupes & Christal, 1992). The specifics of the FFM
have evolved somewhat over the years, and the factors are now often labeled emotional stability
(or neuroticism), extraversion, openness to experience, conscientiousness, and agreeableness (see
Goldberg, 1993, for a concise summary of the FFM structure). Since Barrick and Mount (1991),
most researchers have followed their example of summarizing relationships between personality vari-
ables and work-related criteria according to the FFM.

Nonetheless, Hough and her colleagues (Hough, 1992, 1997, 1998; Hough & Oswald, 2000,
2005, 2008; Hough & Schneider, 1996; Schneider & Hough, 1995; Schneider, Hough, & Dunnette,
1996) have steadfastly criticized the FFM, concluding it is an inadequate taxonomy of personality
variables for I-O psychology to build knowledge and understand the determinants of work behavior
and performance. They and others (especially Block, 1995) argued that the FFM is not comprehen-
sive, combines variables into factors that are too heterogeneous, and is method-bound, dependent
upon factor analysis of lexical terms. Variables that are not well represented at the factor level of
the FFM include autonomy versus dependency and succorance (Costa & McCrae, 1988), social
competence and insight (Davies, Stankov, & Roberts, 1998; Gough, 1968; Hogan, 1969; Schneider,
Ackerman, & Kanfer, 1996), risk taking (Paunonen & Jackson, 1996; Zuckerman, Kuhlman,
Joireman, Teta, & Kraft, 1993), masculinity/femininity (Hough, 1992; Kamp & Gough, 1986),
moral development and ego development (Kohlberg, Levine, & Hewer, 1994), tolerance for con-
tradiction (Chan, 2004), ego depletion (Baumeister, Gailliot, DeWall, & Oaten, 2006), and emo-
tionality (Averill, 1997; Russell & Carroll, 1999) (see Hough & Furnham, 2003, for other missing
variables).
Although some of these traits are included as lower-order facets in inventory-specific conceptu-
alizations of the FFM, they are not necessarily measuring the same trait, nor are they necessarily
narrow or homogeneous enough to constitute personality facets. Compound traits such as integrity,
managerial potential, or customer service orientation (cf. Ones & Viswesvaran, 2001) are made up
of several homogeneous traits that do not necessarily covary, but all relate to a criterion of interest
(Hough & Schneider, 1996). Hough and Ones (2001) have offered a working taxonomy of personal-
ity compound traits including scales available to measure them.
In addition, a lack of generally accepted facet-level taxonomies for the Big Five domains and
the resulting reliance on inventory-specific, lower-level trait descriptions has impeded research and
practice of personality measurement relating to prediction of behaviors and performance in work
settings. This issue is only now being addressed as researchers are developing empirically derived,
facet-level taxonomies for Big Five domains (see Birkland & Ones, 2006, for emotional stability;
Connelly, Davies, Ones, & Birkland, 2008a, for agreeableness; Connelly et al., 2008b, for openness;
Connelly & Ones, 2007, as well as Roberts, Chernyshenko, Stark, & Goldberg, 2005, for conscien-
tiousness; Davies, Connelly, Ones, & Birkland, 2008, for extraversion).
Although most meta-analyses have utilized the FFM to summarize the relationships between
personality variables and job-related criteria, summaries of relationships at this broad a level
can mask relationships that emerge between narrower facets and performance constructs. Hough
and her colleagues (Hough, 1992, 1997, 1998; Hough, Eaton, Dunnette, Kamp, & McCloy, 1990;
Hough & Oswald, 2005; Schneider, Hough et al., 1996) have long argued that focusing exclusively
on factor-level personality traits in the prediction of heterogeneous work-related criteria can be
counterproductive for a science aiming to explain the relationships between personality constructs
and work-related constructs. The predictive validity of a personality variable depends on (a) the
criterion content domain being predicted (Bartram, 2005; Hogan & Holland, 2003; Hough, 1992;
Ones, Dilchert, Viswesvaran, & Judge, 2007) and (b) the hierarchical match between the predictor
and criterion measures (Hogan & Roberts, 1996; Ones & Viswesvaran, 1996; Schneider, Hough
et al., 1996).
This is not to conclude that measurement at levels even more narrow than the facet level of the
FFM is better. Overly narrow personality constructs can impede the growth of knowledge just as
overly broad constructs can impede our science and practice (Hough, 1997; Ones, Viswesvaran, &
Dilchert, 2005; Oswald & Hough, 2008). Although it is appropriate to summarize the relationships
between narrow constructs and various criteria, it is difficult to build a science without learning
about the extent to which information and conclusions generalize at a broader construct level as
well, including the Big Five and even higher order factors (cf. Digman, 1997). Combining variables
into compound variables (such as integrity and customer service orientation) that consist of multiple
Big Five domains, such as conscientiousness, agreeableness, and emotional stability, can increase
the predictive accuracy of personality variables (Hough & Ones, 2001; Ones et al., 2007; Ones &
Viswesvaran, 2001).
Meta-analytic evidence summarizing personality-criterion relationships at various levels,
including Big Five factors, facets, and compound scales, indicates that validity varies as a func-
tion of the theoretical relevance of the predictor to the criterion, which includes similarity of
bandwidth (Hogan & Roberts, 1996; Hough, 1992; Hough et al., 1990; Rothstein & Goffin, 2000;
Schneider, Hough et al., 1996; Tett, Jackson, Rothstein, & Reddon, 1999). Personality variables
that are a priori identified as theoretically relevant to a criterion correlate more highly with the cri-
terion, and the overall predictor and criterion should be similarly heterogeneous/homogeneous.
The FFM provides an important organizing function for I-O psychology and has helped con-
nect our science to our sister sciences. At the same time, we encourage the search for personality
constructs that consist of variables with similar nomological nets to improve our understanding of
personality structure. Hough (see Hough & Ones, 2001) proposed a nomological-web clustering
approach to develop personality constructs that are conceptually and empirically similar. She pro-
posed that cluster analysis of the patterns of relationships of target variables with other variables,
including variables other than personality variables, is needed to identify homogeneous personal-
ity constructs that are characterized as having similar nomological nets. Using this logic, Hough
and Ones (2001) conducted a qualitative cluster analysis of such personality variable profiles and
recommended others use their taxons and further improve their taxonomy. Some quantitative sum-
maries have used this taxonomy to summarize results across personality scales (e.g., Dudley,
Orvis, Lebiecki, & Cortina, 2006; Foldes, Duehr, & Ones, 2008). Dudley et al. (2006) examined the
criterion-related validities of four of the facets of conscientiousness defined by Hough and Ones,
and found that these facets (a) have only low to moderate correlations with each other; (b) correlate
differentially with the broad factor conscientiousness; and (c) depending upon the occupation and
criterion, correlate more highly with the criterion than does global conscientiousness. Foldes et al. (2008)
used the Hough and Ones taxonomy to summarize mean score differences between Whites and dif-
ferent ethnic groups, finding that (a) facet-level mean score differences varied although the facets
all belonged to the same domain and (b) factor-level differences varied from their facet-level mean
score differences. Taken together, these summaries of very different types of information provide
construct validity evidence for the Hough and Ones personality taxonomy as well as its usefulness
for the science and practice of personnel selection. We urge others to report validities according to
Hough and Ones’ proposed structure, as well as to refine their structure to increase our understand-
ing of the pattern of relationships between personality constructs and other constructs.
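The nomological-web clustering logic can also be sketched computationally. In the minimal Python example below, each personality scale is represented by its profile of correlations with a common set of external variables, and scales with similar profiles are grouped by hierarchical clustering; the scale names, correlation values, and the particular clustering algorithm are illustrative assumptions, not the qualitative procedure Hough and Ones actually followed.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Rows: personality scales; columns: hypothetical correlations with a common
# set of external variables (other traits, criteria).
scales = ["Order", "Achievement", "Sociability", "Dominance"]
profiles = np.array([
    [0.30, 0.10, 0.05, 0.25],
    [0.28, 0.12, 0.08, 0.22],
    [0.05, 0.40, 0.35, 0.02],
    [0.08, 0.38, 0.30, 0.05],
])

# Group scales whose correlation profiles are most similar.
clusters = fcluster(linkage(pdist(profiles), method="average"),
                    t=2, criterion="maxclust")
print(dict(zip(scales, clusters)))  # Order/Achievement vs. Sociability/Dominance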

MEASUREMENT METHODS
Although personality variables are typically measured with self-report, Likert-type items and scales,
other assessment methods can be used. In this section we describe reliability, construct validity,
and criterion-related validity evidence and discuss practical issues such as development costs and the
cost and ease of administration of self-report measures [including Likert-type, forced choice, item
response theory (IRT), and other recent innovations] as well as several other methods of measuring
personality [i.e., biodata, interviews, situational judgment tests (SJTs), simulations, and assessment
centers]. For each of these different methods of measuring personality variables we summarize the
effects of response distortion and measurement mode (i.e., paper-and-pencil administered, com-
puter administered, and Internet administered).

SELF-REPORT QUESTIONNAIRE MEASURES
Personality measurement is almost synonymous with standardized self-report questionnaires. Many
other methods, some of them discussed below, can also be thought of as a form of self-report. For
example, the information provided in interviews and assessment centers is self-reported, despite
being other-rated or recorded, and in most cases captured in a less standardized fashion. Personality
questionnaires elicit an individual’s responses to items and use these responses (assuming Likert-
type scaling) to express the individual’s trait standing in comparison to a normative group. What
distinguishes them from most other self-report methods is the degree of standardization they pro-
vide in eliciting test-taker responses, allowing the user to reliably compare an individual’s scores to
those of other test-takers.
Decades of research, hundreds of primary studies, and dozens of quantitative summaries have
shown that such standardized, self-report tests of personality traits provide (a) reliable assessments
(cf. Viswesvaran & Ones, 2000) and (b) scores that correlate at highly useful levels with valued
organizational outcomes and criteria. Hough and Furnham (2003), Hough and Oswald (2008), Ones
et al. (2005), and Ones et al. (2007) provided comprehensive overviews of the validity of personality
measures for predicting various valued behaviors and outcomes in organizational settings, and we
refer interested readers to these summaries.
Despite the strong empirical evidence for their validity (see the section “Validity of Personality
Constructs and Other Factors that Affect Their Usefulness,” p. 308, for details), self-report mea-
sures of personality are often criticized when used in employee selection because of the possibility
of intentional response distortion. Much of the research addressing the issue of response distortion
has focused on standardized tests (rather than other ways of assessing personality; see below) and
much of the basis of criticism of self-report measures is, as Chan (2008) concluded, more of an
urban-legend than reality. Dilchert, Ones, Viswesvaran, and Deller (2006) have summarized sug-
gested palliatives and evaluated their merit to deal with intentional distortion and socially desirable
responding on such measures, concluding that approaches such as score corrections or exclusion of
test-takers on the basis of social desirability scale scores have little merit and that future improve-
ments are more likely to come from the use of new and innovative item formats. Oswald and Hough
(2008) and Hough and Oswald (2008) also summarized the literature, cautioning that results may
differ depending on (a) item transparency (i.e., subtle vs. obvious items), (b) research setting (i.e.,
experimental vs. real-life applicant selection setting), and (c) research design [i.e., concurrent vs.
predictive (longitudinal) design]. They concluded that (a) validities for both types of scales remain
essentially intact in real-life applicant selection situations using concurrent validation studies and
(b) subtle-item scales also retain their validities in predictive designs. Below, we review new devel-
opments in this area and evaluate their promise for addressing concerns about response distortion
in typical, Likert-type self-report personality scales.

Forced-Choice Item Response Formats
Forced-choice formats require the respondent to endorse one statement (or
characteristic) but not others, thus forcing the respondent to score lower on one of the characteristics
or scales. The initial motive for developing forced-choice tests was to counter the effects of intentional
distortion by matching response options for desirability. Most forced-choice response formats result
in ipsative or partially ipsative scores (cf. Hicks, 1970). Ipsative scores produce a rank ordering of
traits within individuals and do not allow for comparisons across individuals. Selection decisions
require choosing between and among candidates and thus depend upon normative measurement.
Although purely ipsative measures are inappropriate for employee selection, partially ipsative
measures can produce reasonably normative information, which led to initial optimism regarding
their value in applied settings. However, a careful evaluation of the usefulness of force-choice
formats failed to provide the hoped-for results. Hicks (1970) reviewed the psychometric and
statistical properties and validity evidence of forced-choice (ipsative) measures and concluded, “One
cannot reject the possibility that the rare positive results for partially ipsative measures represent
a random departure from the general condition of inferiority for both purely ipsative and partially
ipsative measures” (p. 181).
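A toy Python sketch (the traits, item pairs, and response patterns are hypothetical) illustrates why fully forced-choice scoring is ipsative: every choice awards a point to one trait at the expense of another, so each respondent's scores sum to the same total and only within-person rank orders are interpretable.

# Each pair pits a statement keyed to one trait against a statement keyed to another.
pairs = [("Conscientiousness", "Extraversion"),
         ("Conscientiousness", "Agreeableness"),
         ("Extraversion", "Agreeableness")]

def score(choices):
    totals = {"Conscientiousness": 0, "Extraversion": 0, "Agreeableness": 0}
    for (option_a, option_b), pick in zip(pairs, choices):
        totals[option_a if pick == "a" else option_b] += 1
    return totals

person_1 = score(["a", "a", "b"])  # C = 2, E = 0, A = 1
person_2 = score(["b", "b", "a"])  # C = 0, E = 2, A = 1
print(sum(person_1.values()), sum(person_2.values()))  # 3 3: constant-sum (ipsative) scores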
Despite the concerns about ipsative measurement for employee selection applications, continu-
ing concern about intentional distortion and its possible effects on criterion-related validity has led
to renewed interest in forced-choice formats as an alternative to Likert-type scales. For example,
Christiansen (1997); Jackson, Wroblewski, and Ashton (2000); White, Young, and Rumsey (2001);
Martin, Bowen, and Hunt (2002); and Young, White, Heggestad, and Barnes (2004) compared
intentional distortion on Likert-type scales with partially ipsative, forced-choice scales and found
that less distortion occurred on the partially ipsative, forced-choice scales. However, Heggestad,
Morrison, Reeve, and McCloy (2006) also examined the extent of distortion at the individual level
and found that the forced-choice measures were as affected by faking as the Likert-type scales.
McCloy, Heggestad, and Reeve (2005) also raised concerns that although forced-choice measures
may effectively curb faking at the item level, it can still be possible to distort one's scores at the
scale level by identifying the traits hypothesized to relate to job performance and endorsing state-
ments accordingly.
As with all faking research, conclusions are different when comparing laboratory studies (such
as those cited above) to data from real-world assessment settings. When partially ipsative, forced-
choice scales are used in operational settings, substantive scale scores are elevated by about 1 stan-
dard deviation (Young et al., 2004). The evidence so far suggests that forced-choice methods are not
fake-resistant in real-life settings.

Computer Adaptive, IRT, Nonipsative Forced Choice
One way to avoid ipsativity in forced-choice responses is to present response options that reflect
different trait levels of the same construct. Rather than forcing the respondent to choose between
equally attractive options loading on different traits, this approach uses IRT to focus on developing
more accurate measurement along the entire continuum of a given trait.
The Navy Computer Adaptive Personality Scales (NCAPS) are one example of this type of
personality measurement (see Houston, Borman, Farmer, & Bearden, 2005; Underhill, 2006). A
computer-adaptive, forced-choice format (albeit with simultaneously presented response options
loading onto the same trait) and a traditional version (nonadaptive, nonforced choice) of the NCAPS
were developed. Both types of scales correlated with the targeted criteria. However, in all but one
comparison, the traditional NCAPS scales outpredicted the computer-adaptive, forced-choice scales
and reached near-maximum validity with fewer items (six or seven item pairs for traditional NCAPS
vs. eight or nine for adaptive NCAPS). According to Underhill (2006), although the “item cutoff
adaptive component of the Adaptive NCAPS version did not meet expectations” (p. viii), further
research is warranted.

Ideal Point Response Methods
Stark, Chernyshenko, and their colleagues (Chernyshenko, Stark, Drasgow, & Roberts, 2007;
Stark & Chernyshenko, 2007; Stark, Chernyshenko, & Drasgow, 2005; Stark, Chernyshenko,
Drasgow, & Williams, 2006) are involved in a programmatic effort to improve current measure-
ment of personality constructs. They proposed that ideal point response scales, which are based
largely on Thurstone’s (1928) scaling method and assumptions, better fit the nature of item respond-
ing than the Likert (1932) method and assumptions. Ideal point response scales assume that people
endorse items that are closer to their true trait level (i.e., an individual’s ideal point) than items that
are further away from their true trait level. Ideal point methods more precisely measure all points
on the trait continuum than Likert-type scales. Items that differentiate people at the extreme ends
of the continuum are infrequently endorsed, resulting in low variances and low item-total scale
correlations. Such items are retained in ideal point scaling methods but discarded in Likert scaling
methods. With Likert-type scaling methods, desirable items have monotonically increasing item
response functions whereas items selected using ideal point response methods have bell-shaped
item response functions. On the basis of item-response theory analyses, Stark, Chernyshenko, and
colleagues conclude that ideal point response methods (a) fit monotonically increasing item response
functions (although they, unlike Likert-type scales, do not require them), (b) do not negatively
affect the criterion-related validity of personality scales, and (c) provide more accurate measurement of
high and low scoring individuals and thus potentially lead to better selection decisions (although
criterion-related validity is unlikely to be affected because the correlation coefficient is relatively
insensitive to minor changes in rank order) (Chernyshenko et al., 2007; Stark et al., 2006).
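The contrast between dominance (Likert-type) and ideal point item response functions can be illustrated with a simplified Python sketch; the Gaussian-distance form below is only a stand-in for the more elaborate unfolding models (e.g., the generalized graded unfolding model) used in this research program, and the parameter values are arbitrary.

import math

def dominance_irf(theta, b, a=1.5):
    # Monotonic (2PL-style) function: endorsement probability rises with theta
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def ideal_point_irf(theta, delta, width=1.0):
    # Simplified unfolding function: endorsement peaks when theta equals delta
    return math.exp(-((theta - delta) ** 2) / (2 * width ** 2))

item_location = 0.0
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(dominance_irf(theta, item_location), 2),
          round(ideal_point_irf(theta, item_location), 2))
# The dominance values increase monotonically; the ideal point values are
# bell-shaped, peaking for respondents whose trait level matches the item.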

OTHER-REPORTS
Organizations frequently use 360º feedback measures (a form of other-reports) and observers fre-
quently rate personality characteristics of participants in simulations and assessment center exer-
cises (see below). Yet organizations rarely use other-reports of individuals’ personality that are
assessed with standardized personality measures.
A meta-analysis summarizing data published before 1996 shows that true-score correlations between
self-reports and other-reports for the Big Five range from .46 to .62, suggesting that other-reports
can increment criterion-related validity beyond that of self-reports (Connolly, Kavanagh, &
Viswesvaran, 2007). The cumulative evidence also shows that although close relatives’ descriptions
of a target person are more similar to that of the target’s self-description than are peer ratings, even
strangers can provide valuable insight into a target individual’s personality on easily observed traits,
such as extraversion (Connolly et al., 2007). Not surprisingly, traits that are more easily observed in
short-acquaintance situations (such as extraversion) also exhibit higher levels of interobserver agree-
ment (Kenny, Albright, Malloy, & Kashy, 1994).
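The logic of incrementing validity with other-reports can be illustrated with the standard two-predictor multiple correlation formula; in the short Python sketch below, all three correlations are hypothetical values chosen only to show the mechanics, not estimates from the meta-analyses cited here.

import math

def multiple_r(r_y1, r_y2, r_12):
    # Multiple correlation of a criterion with two correlated predictors
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

r_self, r_other, r_self_other = 0.20, 0.25, 0.55  # hypothetical correlations
print(round(multiple_r(r_self, r_other, r_self_other), 2))  # 0.26 vs. 0.20 for self-report alone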
Interestingly, a more recent meta-analysis shows that work colleagues’ perceptions of individuals’
personality traits were only slightly more accurate than strangers’ perceptions (as indexed by self-
other and other-other correlations) and consistently less accurate than those of family members,
friends, and cohabitators (Connelly, 2008). These results held regardless of whether the work
colleague was a supervisor, coworker, or reference. Despite these low self-other and other-other
correlations, a single coworker’s description of a target’s personality predicts job performance better
than does a self-rating of personality (Connelly, 2008; Mount, Barrick, & Strauss, 1994; Nilsen,
1995). Parallel results have been found for classmates’ and friends’ personality ratings predicting
grades and academic achievement (Connelly, 2008). However, only a handful of studies in Connelly’s
meta-analysis have examined the predictive validity of others’ ratings of personality, and further
research on this topic is warranted.
Studies investigating potential response distortion in standardized other-reports of personality
are also scarce. It is safe to assume that if such measures were used to elicit information from
candidates’ acquaintances, the choice of rating source will influence the degree of distortion to be
expected. However, it is unlikely that organizations are willing to rely on ratings obtained from
spouses or friends in selecting among job applicants. We see the potential for other-ratings of
personality for applications in which the source of the ratings can be standardized and verified
(e.g., personality ratings made by the last two supervisors). Other tools used in employee selection
already employ a similar rationale (e.g., letters of reference) but do not provide the benefit that
standardized ratings of personality could provide: wide distributions of scores that could be used
to select rather than identify negative indicators that allow screening out of potential candidates.
We encourage researchers and practitioners to explore the potential for standardized other-ratings
of personality and to conduct additional studies investigating their criterion-related validity and
potential for incrementing validity.

BIODATA
Biodata measures (also known as biographical data and autobiographical information) focus on
previous life experiences and have a long history in I-O psychology as useful predictors of work-
related criteria (see reviews by Barge & Hough, 1988; Ghiselli, 1966; Griffeth, Hom, & Gaertner,
2000; Hough, in press; Hunter & Hunter, 1984; Reilly & Chao, 1982; Schmitt, Gooding, Noe, &
Kirsch, 1984). The premise underlying their success is the old adage that past behavior is the best predictor of future behavior (the consistency principle). Using Wernimont and Campbell’s (1968) terminology,
they are samples of relevant behavior rather than signs. When scale development is construct-
oriented, biodata represent another method of measuring individual differences such as personality
constructs (Hough, 1989; Hough & Paullin, 1994). As Tenopyr (1994) hypothesized, biodata scales
developed to measure personality constructs (e.g., Big Five factors and their facets) correlate
appropriately with each other and with work-related criteria (cf. Kilcullen, White, Mumford, &
Mack, 1995; Manley, Benavidez, & Dunn, 2007; Oviedo-Garcia, 2007; Sisco & Reilly, 2007a;
Stokes & Cooper, 2001).
Although intentional distortion occurs on both biodata and traditional personality scales, the evidence on whether biodata are less susceptible to distortion is mixed. Some studies
report less distortion on biodata scales (e.g., Kilcullen et al., 1995; Sisco & Reilly, 2007b; Stokes,
Hogan, & Snell, 1993), whereas other research suggests little difference in the amount of faking
on biodata versus standard personality scales (e.g., McFarland & Ryan, 2000; White et al., 2001).
Evidence suggests that one fruitful approach to reduce distortion is to require respondents to elabo-
rate on their responses to biodata items (Schmitt & Kunce, 2002; Schmitt et al., 2003). Moreover,
both verifiable and subtle items (where the construct measured is less apparent) appear to retain
their validity when used in real-life applicant settings (Alliger, Lilienfeld, & Mitchell, 1996; Gough,
1994; Harold, McFarland, & Weekley, 2006; White, Young, Hunter, & Rumsey, 2008). Given
the advantages of biodata, it is surprising that biodata measures are not used more frequently for
employee selection (Stokes & Cooper, 2004).

INTERVIEW
Around the world the interview is probably the most frequently used employee selection assessment
method (Moscoso, 2000; Ryan, McFarland, Baron, & Page, 1999) and is most often intended
to measure personality characteristics (Huffcutt, Conway, Roth, & Stone, 2001). Huffcutt and colleagues developed a comprehensive taxonomy of constructs that interview questions might measure. The seven constructs were (a) mental ability, (b) knowledge and skills,
(c) basic personality characteristics (such as the Big Five), (d) applied social skills such as social
intelligence and social competence, (e) interests and preferences, (f) organizational fit, and (g)
physical attributes. They sorted 338 interview questions from 47 actual employment interviews
into the seven constructs. They found that interview questions were most often intended to
measure personality characteristics (35% of the questions), followed by applied social skills
(28%), mental ability (16%), knowledge and skills (10%), interest and preferences (4%), physical
attributes (4%), and organizational fit (3%). Sixteen percent of all questions were intended to
measure conscientiousness or its facets.1 Their study did not address the construct validity of the interview ratings but did find that interview ratings of personality correlated well with overall
job performance in various jobs. The correlations (corrected for range restriction in interview
scores and measurement error in performance evaluations) with overall job performance were .33
for extraversion, .33 for conscientiousness, .51 for agreeableness, and .47 for emotional stability.
This study and meta-analyses of the criterion-related validity of interviews in general (Huffcutt &
Arthur, 1994; McDaniel, Whetzel, Schmidt, & Maurer, 1994; Wiesner & Cronshaw, 1988) leave
little doubt about the criterion-related validity of the interview.

1 The results of differential observer agreement for personality traits reviewed above can provide helpful information about which traits are best assessed with traditional employment interviews, at least with regard to issues of reliability.

Studies that investigate the construct validity of the interview often examine external correlates, answering questions about the trait saturation of the interview ratings but often leaving unanswered whether or not the interview measured the construct(s) intended. For example, meta-analytic evidence indicates that interview ratings in general correlate .40 with measures of cognitive ability (Huffcutt, Roth, & McDaniel, 1996) and .26 with measures of conscientiousness (corrected
correlations, Cortina, Goldstein, Payne, Davison, & Gilliland, 2000). Barrick, Patton, and Haugland
(2000) had interviewers assess interviewee personality characteristics and correlated those ratings
with interviewee personality self-reports. Corrected correlations were .19 for conscientiousness, .24
for emotional stability, .49 for extraversion, .44 for openness, and .37 for agreeableness. As would
be expected, the overlap between self-reports and interview scores was slightly lower than typically
reported self-other correlations using standardized personality tests for the same dimensions. More
disconcerting: the two best personality predictors of job performance, conscientiousness and emo-
tional stability, were clearly not well measured in the interview—at least if self-reports are used as
the criterion. However, in this study, the interview questions were not designed specifically to mea-
sure personality characteristics; interviewers’ ratings were based on their general impressions of the interviewees’ personality characteristics. If assessment of personality characteristics via the interview
is the goal, questions should be developed specifically for this purpose. However, the few studies
that have examined the construct validity of personality scores obtained from interviews designed
specifically to measure personality variables do not provide much support for the construct validity
of such interview scores (e.g., Van Iddekinge, Raymark, Eidson, & Attenweiler, 2004).
A second study by Van Iddekinge, Raymark, and Roth (2005) examined the construct valid-
ity of an interview for assessing the NEO Personality Inventory (Costa & McCrae, 1992) facets
of altruism, self-discipline, and vulnerability. Interviewees described themselves using the NEO
facet scales, and experienced interviewers interviewed the mock candidates, asking them ques-
tions intended to measure the three characteristics. The interviewers provided interview ratings of the personality constructs and also completed the NEO facet scales to describe the candidates.
The study included an honest as well as an applicant-like condition. In the honest condition, con-
vergent validities of interviewer-based NEO ratings with the self-report NEO ratings averaged .32
(discriminant validities averaged .20); convergent validities of the interview ratings with self-report
NEO ratings averaged .24 (discriminant validities averaged .16). Neither type of interviewer-based
assessment of personality showed good convergent validity with the target constructs. Convergent
validities were even lower in the applicant-like condition (Van Iddekinge et al., 2005), possibly
because of the effect of response distortion in self-reports and interview scores in this condition.
The somewhat disappointing results for construct validity can perhaps be improved with atten-
tion to four variables that moderate the accuracy of personality judgments: the judge, the target individual, the trait, and the information obtained (Funder, 1995). Research suggests that (a) unstructured interviews are better measures of personality variables than structured interviews (although criterion-related validity may suffer), (b) visible traits such as extraversion and agreeableness are
better measured than less visible traits, and (c) accuracy increases with more information about the
target individual (Blackman & Funder, 2002).

SJTS
SJTs present test-takers with a scenario (in written, audio, or video format) and several response
options describing possible courses of action. For employee selection purposes, SJTs are most often
contextualized for specific occupational domains (e.g., law enforcement or customer service) and
designed to measure personality traits deemed particularly relevant (e.g., conscientiousness or extra-
version). To capture personality-relevant variance in test-taker responses, the development of sce-
narios and response options must be theory- and data-driven, and SJT scores hypothesized (based
on item content) to measure a certain personality trait should relate to external measures of the same
construct (Chan & Schmitt, 2005).
As with all individual difference measures, reliability is an important factor when evaluating the usefulness of SJTs for employee selection purposes. SJTs are often multidimensional (McDaniel, Hartman, Whetzel, & Grubb, 2007), rendering internal consistency estimates of little value (just as they would be if computed across items from different scales of a traditional personality test). In these circumstances, parallel-form reliability (Chan & Schmitt, 2002) and test-retest correlations (over a short time period; Schmidt &
Hunter, 1996) are appropriate methods of estimating reliability. However, both types of estimates
for SJTs measuring personality variables are rarely presented in the literature, and we encourage
scientists and practitioners alike to investigate and report on this issue to further improve our knowl-
edge of personality measurement using situational judgment approaches.
Another important issue concerns the distribution of constructs assessed. Providing response
options that load on conceptually and empirically different traits complicates score interpretation
and makes interindividual comparisons difficult. This is especially true in the case of personal-
ity assessment. The challenge lies in developing different response options for SJT scenarios that
are all expressions of the same personality trait, albeit at different trait levels. Making test-takers
choose between response options loading on different personality dimensions will result in ipsative
or partially ipsative scores, limiting their usefulness for employee selection purposes (see earlier
discussion on ipsativity).
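As a concrete illustration of the scoring problem, consider the brief sketch below (in Python). It assumes a purely hypothetical forced-choice scoring scheme, not any published SJT, in which each block awards one point to the dimension of the chosen option; it simply shows why such scores are ipsative, because every respondent’s dimension scores sum to the same constant.

    from collections import Counter
    import random

    # Hypothetical forced-choice scoring: each block awards one point to the
    # dimension of the chosen response option. Because every block adds exactly
    # one point somewhere, each respondent's scores sum to the same constant
    # (the number of blocks), which is what makes the scores ipsative.
    dimensions = ["conscientiousness", "extraversion", "agreeableness"]
    n_blocks = 30

    def simulate_respondent(seed):
        rng = random.Random(seed)
        choices = [rng.choice(dimensions) for _ in range(n_blocks)]
        return Counter(choices)

    for seed in range(3):
        scores = simulate_respondent(seed)
        print(dict(scores), "sum =", sum(scores.values()))  # always sums to 30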
A further important issue in assessing personality with SJTs is that of response instructions. A major distinction is between behavioral tendency and knowledge instructions, sometimes
conceptualized as “would do” versus “should do.” Although there are variations on these themes
(e.g., asking respondents what they “would do,” what they “would most and least likely do,” and
what they “have done in the past”), different behavioral tendency instructions produce scores that
are highly correlated (Ployhart & Ehrhart, 2003). Conceptually, SJTs administered with behavioral
tendency instructions are more likely to elicit responses that resemble future behavior on the job,
rather than mere knowledge of appropriate responses to a given scenario.
A recent meta-analysis helps shed light on the constructs (including personality traits) typically
assessed using SJTs under varying response instructions (McDaniel et al., 2007). We used the meta-
analytic true-score correlations provided by McDaniel et al. in combination with Big Five intercor-
relations to estimate the amount of personality variance typically observed in SJTs. We obtained
meta-analytic Big Five intercorrelations from the Ones (1993) meta-analysis2 and attenuated them to
reflect observed relationships (using meta-analytic reliability estimates from Viswesvaran & Ones,
2000). A multiple regression of SJT scores on the Big Five indicates that at the construct level, 25%
of the variance assessed by SJTs with behavioral tendency instructions (“what would you do?”)
is personality (Big Five) variance. If SJTs are administered with knowledge instructions (“what
should you do?”), less than 10% of the variance is explained by the Big Five. This suggests that SJTs
with behavioral tendency instructions are better suited to measure personality traits. “Would do”
instructions elicit more personality-saturated responses and thus predict outcomes in organizational
settings that are often predicted by personality variables (e.g., overall job performance, contextual
performance, leadership, job satisfaction). Unfortunately, McDaniel et al.’s (2007) meta-analyses
of SJT validity by response instructions could only investigate validity for predicting overall job
performance, not for performance facets (e.g., task vs. contextual performance). For overall job
performance, the operational validity for SJTs with behavioral tendency instructions (those most strongly related to the Big Five domains of personality) was .26 (corrected for sampling error and attenuation due to criterion unreliability).

2 Although some have criticized the use of these intercorrelations on the basis that they are purportedly “unrealistically low” (Morgeson et al., 2007a, p. 1035), the meta-analyses are based on data from thousands of people. Other researchers have also used these estimates to compute construct overlap between personality measures and other individual difference variables to estimate incremental validity (e.g., Judge, Heller, & Mount, 2002; Judge & Ilies, 2002; McDaniel et al., 2007).
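To make the variance-decomposition procedure described above concrete, the following sketch (in Python) illustrates the general logic of attenuating true-score intercorrelations and regressing SJT scores on the Big Five from a correlation matrix. The correlation and reliability values are hypothetical placeholders, not the meta-analytic estimates from Ones (1993), Viswesvaran and Ones (2000), or McDaniel et al. (2007) that were used in the analysis reported here.

    import numpy as np

    # Hypothetical true-score Big Five intercorrelations (order: E, A, C, ES, O);
    # placeholders only, not the meta-analytic values used in the text.
    R_true = np.array([
        [1.00, 0.25, 0.20, 0.30, 0.35],
        [0.25, 1.00, 0.30, 0.25, 0.15],
        [0.20, 0.30, 1.00, 0.30, 0.10],
        [0.30, 0.25, 0.30, 1.00, 0.20],
        [0.35, 0.15, 0.10, 0.20, 1.00],
    ])

    # Hypothetical reliabilities of the five scales.
    rel = np.array([0.78, 0.75, 0.78, 0.78, 0.73])

    # Attenuate to the observed-score metric: r_obs = r_true * sqrt(rel_i * rel_j).
    R_obs = R_true * np.sqrt(np.outer(rel, rel))
    np.fill_diagonal(R_obs, 1.0)

    # Hypothetical observed correlations of SJT scores with the Big Five.
    r_sjt = np.array([0.14, 0.25, 0.27, 0.22, 0.13])

    # Regression from a correlation matrix: beta = R_obs^{-1} r, R^2 = r' beta.
    beta = np.linalg.solve(R_obs, r_sjt)
    r_squared = float(r_sjt @ beta)
    print(f"Big Five variance in SJT scores: {r_squared:.1%}")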

SIMULATIONS AND ASSESSMENT CENTERS


Assessment centers (ACs) have received much attention in the research literature, yet high develop-
ment and administration costs have limited their use only to occupations in which the dollar value
of performance variability is large (e.g., as selection tools for higher-level managerial positions
or screening tools in high-risk jobs). This is also true for what can be considered their building
blocks—single exercises or simulations that can be administered individually to assess personal
characteristics.
Motowidlo, Dunnette, and Carter (1990) described a simulation as any situation that “present[s]
applicants with a task stimulus that mimics an actual job situation” (p. 640). Simulations are now considered situational tests that have greater fidelity than paper-and-pencil tests (Thornton & Rupp,
2003). Most often, these simulations are designed to assess constructs that are supposedly hard to
measure with paper-and-pencil tests. However, this claim is rarely substantiated, and construct-
validity evidence for the traits underlying performance on simulations is often sparse, making it dif-
ficult to evaluate whether the same traits could be assessed by other methods. A systematic review
of the available literature reveals that many dimensions assessed in ACs are at least conceptually
related to personality dimensions (Arthur, Day, McNelly, & Edens, 2003).
Arthur et al.’s (2003) construct-based meta-analysis of the AC method has shown that personality-
based AC dimensions, especially influencing others (a facet of extraversion), possess predictive
validity that rivals that of cognitive-ability-based dimensions such as problem solving. A recent
survey of AC practices among 97 organizations in western Europe and North America (Krause &
Thornton, 2009) shows that personality-based, extraversion-related dimensions are among those
most commonly assessed in ACs, in addition to interpersonal ones conceptually related to agree-
ableness (e.g., consideration of others).
In ACs, personality-relevant variance is captured using simulations and exercises such as role-
plays, group discussions, or in-baskets. An early meta-analysis by Scholz and Schuler (1993) revealed
an interesting pattern of findings, indicating that scores obtained in group discussion exercises
mainly captured openness to experience, dominance, and self-confidence (ρ = .46, .34, and .39,
respectively, N = 236–318), whereas in-basket exercises only reflected dominance (ρ = .23, N = 273).
A recent large-scale investigation in two primary samples (N = 3,748–4,770) showed that scores on
many simulations correlate only negligibly with personality characteristics, with the exception of
extraversion (Ones & Dilchert, 2008).
Simulations and exercises are often tailored to a given job context to make them more realistic
and face-valid. However, design features can impact the nature of the construct measured and the
quality of the measurement (e.g., reliability). For example, a leaderless group discussion that is
competitive (e.g., framed in a negotiation scenario) is likely to elicit behaviors indicative of differ-
ent personality traits than a discussion that is cooperative (e.g., framed in a team problem-solving
context). The selection of simulations and exercises for the prediction of specific criteria should
take such issues into account. Factors such as observability also affect the reliability and validity
of scores. The survey by Krause and Thornton (2009) indicated that in about 50% of organizations
surveyed, most (>75%) AC exercises are specifically developed for an organization. Customization
is costly. If customization elicits behavior indicative of traits particularly valued in a given context,
the cost is likely worthwhile. However, if customization is simply to increase face validity, it is a
missed opportunity.

VALIDITY OF PERSONALITY CONSTRUCTS AND OTHER FACTORS THAT AFFECT THEIR USEFULNESS
Our opening paragraph portrayed a phoenix-like history for personality variables in I-O psychol-
ogy. Significant evidence documents the utility of personality variables for predicting important
organizational criteria. Yet there are those who sharply criticize the utility of personality vari-
ables for employee selection on the grounds of purportedly low validities. For an exchange on this
issue, see Morgeson et al. (2007a, 2007b) and Murphy and Dzieweczynski (2005) for one side of
the argument, and Ones et al. (2007), Tett and Christiansen (2007), R. T. Hogan (2005a, 2005b),
Barrick and Mount (2005), Hough and Oswald (2005), and Ones et al. (2005) for the other side.
In addition, we refer the reader to meta-analyses and reviews of the literature such as Barrick,
Mount, and Judge (2001); Dudley et al. (2006); Hogan and Holland (2003); Hogan and Ones (1997);
Hough and Furnham (2003); Hough and Ones (2001); Hough and Oswald (2008); Ones et al. (2007);
Ones, Viswesvaran, and Schmidt (1993); Roberts, Kuncel, Shiner, Caspi, and Goldberg (2007); and
Rothstein and Goffin (2006). These summaries indicate that personality constructs predict many
important criteria, including major life outcomes. The list of criteria that are well predicted by per-
sonality variables includes, among others:

• Overall job performance: Conscientiousness, r_true = .23 (Barrick & Mount, 1991) and r_operational = .20 (Hurtz & Donovan, 2000); personality-based integrity tests, r_operational = .37 (Ones et al., 1993)
• Organizational citizenship: Conscientiousness, r_observed = .19; positive affectivity, r_observed = .16 (Borman, Penner, Allen, & Motowidlo, 2001)
• Counterproductive work behavior: Conscientiousness, r_operational = −.26 (Salgado, 2002); dependability, r_true = −.34 (Dudley et al., 2006); personality-based integrity tests, r_operational = −.32 (Ones et al., 1993)
• Managerial effectiveness: Dominance, r_operational = .27; energy level, r_operational = .20; achievement orientation, r_operational = .17 (Hough, Ones, & Viswesvaran, 1998); conscientiousness, r_true = .22 (Barrick & Mount, 1991)
• Customer service: Customer service scales, r_operational = .34 (Ones & Viswesvaran, 2008)
• Job satisfaction: Emotional stability, r_true = .29; conscientiousness, r_true = .26; extraversion, r_true = .25; agreeableness, r_true = .17 (Judge et al., 2002)
• Divorce: Conscientiousness, r_observed = −.13; emotional stability, r_observed = −.17; agreeableness, r_observed = −.18 (Roberts et al., 2007)
• Mortality: Conscientiousness, r_observed = −.09; extraversion/positive emotion, r_observed = −.07; emotional stability, r_observed = −.05; agreeableness/lack of hostility, r_observed = −.04; each greater than the effects of socioeconomic status and IQ (Roberts et al., 2007)

Ones et al. (2005) have summarized the meta-analytic evidence for compound personality scales
in predicting work-related criteria and shown that these scales have high validity in predicting the
specific criteria they were developed for and for overall job performance. We readily acknowl-
edge that it is not necessarily Big Five factors that predict valued outcomes. Indeed, we argue that
(a) more specific criteria are predicted by more narrow personality traits; (b) complex criteria are
predicted by theoretically appropriately matched predictors; and (c) for some of the criteria listed
above, the highest predictive validities are not necessarily obtained at the factor level.
We do not want to understate the importance of the FFM. It has provided a structure for us to think about personality variables. Prior to its acceptance, personality and I-O psychology had little from which to generalize: personality measures and variables numbered in the hundreds, and there were different names for the same or similar constructs or the same name for
different constructs. We are not advocating a return to the “good old daze” (Hough, 1997). We
applaud the interest and evidence coming from studies that examine facet-level variables of the
FFM. We urge more such research, especially research based on empirically derived, generalizable,
facet-level personality taxonomies.

VARIABLES THAT MODERATE VALIDITY OF PERSONALITY CONSTRUCTS


Many variables affect the magnitude of the criterion-related validity that is obtained in primary
and meta-analytic studies. Important factors are the type of criterion, the criterion measurement
method, the relevance of the predictor for the criterion, personality measurement method (see
above), research setting (experimental/laboratory vs. real-life selection), research design (concurrent
vs. predictive/longitudinal), item transparency (subtle vs. obvious), and rater perspective (e.g., self
vs. other). The more theoretically relevant the predictor is to the criterion, the higher the validity.
The Hough and Furnham (2003) summary of meta-analyses according to predictor and criterion
construct provided excellent examples of how predictor-criterion relevance affects the relation-
ship between the two. In addition, validities are typically higher in concurrent validation studies
compared to longitudinal validity studies (see Lievens, Dilchert, & Ones, 2005, for exceptions).
Validities are also higher in “weak” situations in which people have more autonomy and control
compared with “strong” situations in which people have few options. Trait activation theory (Tett &
Burnett, 2003; Tett & Guterman, 2000) provides an integrated framework for understanding how
the situation can explain variability in the magnitude of the relationships between personality and
behavior and performance.

INCREMENTAL VALIDITY
Personality variables can increment criterion-related validity in at least one of two ways. One way
is in combination with other relevant personality variables. A second way is in combination with
other individual variables such as measures of cognitive ability. If the predictor variables have low
intercorrelations but do correlate with the criterion variable, criterion-related validity will increase
when they are used in combination.3 Personality variables generally have low correlations with
cognitive ability measures and do increment validity when jointly used (Bartram, 2005; McHenry,
Hough, Toquam, Hanson, & Ashworth, 1990; White et al., 2008). When used in combination with
other measures such as the interview, biodata, and situational judgment, personality variables also
increment validity (DeGroot & Kluemper, 2007; McManus & Kelly, 1999).

3 Suppressor variables (i.e., variables that are uncorrelated with the criterion but correlated with another predictor) can increment criterion-related validity; such cases are very rare among personality variables.
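To illustrate the arithmetic behind such incremental validity, the sketch below (in Python) applies the standard two-predictor multiple correlation formula. The correlation values are hypothetical, chosen only to mirror the typical pattern of a valid cognitive ability measure supplemented by a modestly valid, largely uncorrelated personality measure; they are not estimates from any study cited above.

    from math import sqrt

    # Hypothetical correlations, for illustration only.
    r_cog = 0.50    # cognitive ability with job performance
    r_pers = 0.25   # conscientiousness with job performance
    r_12 = 0.05     # predictor intercorrelation (typically low)

    # Two-predictor squared multiple correlation:
    # R^2 = (r1^2 + r2^2 - 2 * r1 * r2 * r12) / (1 - r12^2)
    r_squared = (r_cog**2 + r_pers**2 - 2 * r_cog * r_pers * r_12) / (1 - r_12**2)
    multiple_r = sqrt(r_squared)
    print(f"Multiple R = {multiple_r:.3f}; increment over cognitive ability alone = "
          f"{multiple_r - r_cog:.3f}")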

ADVERSE IMPACT
Group mean score differences on measures used in employee selection are one of the major factors
determining adverse impact against protected groups, in addition to the selection ratio and score
variability. Hough, Oswald, and Ployhart (2001) summarized studies that examined mean score
differences between Whites and various ethnic minorities, between men and women, and between
older and younger people on personality traits, cognitive ability, and physical abilities. They found
essentially no differences between Whites and ethnic minorities for most personality variables.
They also examined mean score differences between groups at the facet-level of the Big Five with
some unexpected findings: For some facets, mean-score differences differed from that of their
respective Big Five factor (e.g., a Black-White difference of d = −.10 on global extraversion, but .12
on surgency/dominance). A recent meta-analysis of race and ethnic group differences on personal-
ity measures also showed modest differences between Whites and ethnic minority groups on other
facets of the Big Five (Foldes et al., 2008) and again established that differential patterns may exist
for Big Five factors and facets (e.g., a Black-White difference of −.12 on global emotional stability
measures but .17 on self-esteem, a facet of emotional stability). Table 8 of Foldes et al. (2008) also
provides a summary of scenarios based on majority/minority group selection ratios under which
these observed group differences can result in adverse impact.
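As a rough illustration of how a standardized group difference translates into adverse impact under top-down selection, the sketch below (in Python) assumes normally distributed scores, a single cutoff, and hypothetical input values; it is not a reproduction of the Foldes et al. (2008) scenarios.

    from statistics import NormalDist

    d = 0.17               # hypothetical majority-minority difference (in SD units)
    majority_rate = 0.50   # hypothetical proportion of majority applicants selected

    norm = NormalDist()
    cutoff = norm.inv_cdf(1 - majority_rate)    # cutoff on the majority-group metric
    minority_rate = 1 - norm.cdf(cutoff + d)    # minority mean assumed d SDs lower
    ai_ratio = minority_rate / majority_rate
    print(f"Adverse impact ratio = {ai_ratio:.2f} "
          "(the four-fifths rule flags values below 0.80)")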

NATURE OF PREDICTOR-CRITERION RELATIONSHIPS


As with all employee selection measures (whether standardized tests, interviews, simulations, or
ACs), their utility depends on the nature of the predictor-criterion relationship. In making top-down
hiring decisions, linearity between predictor and criterion scores is typically assumed. Pearson
correlations, which are most commonly used to estimate operational validities, also assume linearity, the same assumption that is critical to traditional utility approaches (cf. Schmidt, Hunter,
McKenzie, & Muldrow, 1979).
Nonlinearity has been postulated in other predictor domains (such as cognitive abilities), but few
if any such predictor-criterion relationships have been substantiated based on empirical evidence
(cf. Coward & Sackett, 1990). At least two plausible scenarios of nonlinearity between personality
traits and criterion variables that would impact employee selection can be envisioned: A relationship
between the predictor and criterion in which an asymptote is reached after a certain level of predic-
tor scores (e.g., beyond a certain point, all conscientious individuals keep their workplace similarly
tidy, and differences in orderliness do not translate into performance differences). Additionally, a U-
(or inverted U) shaped function, in which the direction of the relationship actually reverses beyond a
certain level of predictor scores, is possible. Other types of nonlinear relationships are also possible
(see Guion, 1991), but for most selection systems, especially those that use rigorous top-down selec-
tion, the two relationships described above are most detrimental to the utility of a hiring strategy
that had assumed a linear relationship between independent and dependent variables.
In the case of an asymptotic relationship between personality and criterion scores, there is still
potential utility in using personality as part of a selection process. Higher scoring applicants have
higher predicted performance than many low-scoring individuals, even if the highest scoring can-
didates are indistinguishable from one another in terms of their performance once hired. Many
organizations using minimum standards or defined cut-off scores do so because of the implicit
assumption that predictor scores do not matter after a certain cutoff (Coward & Sackett, 1990).
Nonetheless, the utility of selecting for a given characteristic depends on the make-up of the appli-
cant pool (e.g., how many individuals score above a certain level on the predictor) and factors such
as the number of applicants hired.
If, however, personality-performance relationships are described by an inverted U-shaped func-
tion, the detrimental effect on overall utility of a selection system is significant. In cases where
predictor-criterion relationships reverse direction, top-down hiring could result in the acceptance of
applicants who display high predictor scores but actually perform worse than some lower scoring
candidates. Such systematically documented nonlinearity would pose a threat to the usual opera-
tional use of personality measures.
In general, nonlinear relationships occur when the independent or dependent variable is non-nor-
mally distributed. Most personality traits used in employee selection (rather than screening) are nor-
mally distributed; thus, there is little concern. Nonlinearity may be more of an issue when measures
are used that assess only a narrow, extreme range of the trait continuum. For example, a measure of
abnormal personality designed to detect infrequently occurring psychopathological characteristics
will have a nonnormal predictor score distribution in most applicant populations. However, most of
these measures are not suitable for pre-offer employee selection and are most commonly employed
for screening out extreme cases after a conditional job offer has been made. Similarly, some par-
ticular performance indicators, most notably those of distal performance outcomes (e.g., salary,
frequency of citations to published works), are known to be nonnormally distributed and could thus
lead to nonlinear relationships.
Very few studies have investigated the issue of nonlinear personality-criterion relationships. An
early study reported nonlinear relationships between impulsive expression and two narrow per-
formance measures in a small sample of accountants (Day & Silverman, 1989). A study involving
several larger samples found no evidence of nonlinearity for conscientiousness predicting overall
job performance (Robie & Ryan, 1999). Other research reported nonlinear relationships between
conscientiousness and job performance in two samples (one involved biodata and a SJT, another
involved a traditional personality test); however, the nature of the nonlinear relationships differed
(inverted U vs. asymptote) (LaHuis, Martin, & Avis, 2005). Benson and Campbell (2007) reported
nonlinear relationships between composites of “dark-side” personality traits and leadership as
assessed in AC dimensions and supervisory ratings. Large-scale investigations that include per-
formance-relevant personality traits, diverse samples of occupations, and various measures of job
performance are needed to determine whether nonlinearity occurs in organizational settings, and if
so, what its impact is on employee selection.
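For readers who wish to probe such possibilities in their own data, a common (though not the only) approach is hierarchical regression in which a quadratic term is added to the linear personality-performance model, as in the sketch below (in Python). The data are simulated for illustration only, with an inverted-U built in; the variable names and effect sizes are assumptions, not values from any study cited here.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    conscientiousness = rng.standard_normal(n)       # standardized predictor
    performance = (0.30 * conscientiousness
                   - 0.10 * conscientiousness**2     # simulated curvature
                   + rng.standard_normal(n))

    def r_squared(X, y):
        """R^2 from an ordinary least squares fit with an intercept."""
        X = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return 1 - resid.var() / y.var()

    r2_linear = r_squared(conscientiousness, performance)
    r2_quad = r_squared(np.column_stack([conscientiousness, conscientiousness**2]),
                        performance)
    print(f"Change in R^2 when the quadratic term is added: {r2_quad - r2_linear:.3f}")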

CONCLUSIONS
We understand personality and its role in determining work behavior and performance better than
ever before. Although the FFM has provided an important framework to organize our research and
systematically cumulate evidence, understanding personality and personality-criterion relationships
requires more variables, including broader and narrower variables. Current research examining
the taxonomic structure at the facet-level of the FFM will benefit science and practice as generally
accepted models emerge. Such models allow us to move beyond inventory-specific investigations of
limited generalizability to cumulating results across studies and settings, thus enabling systematic
investigations of moderator variables. Such models also enhance our theory building and theory
testing. As our knowledge of personality-criterion relationships grows for different hierarchical lev-
els of predictor and criterion variables, we learn how to combine predictor variables into criterion-
appropriate variables that will enhance the prediction of valued outcomes in applied settings.
The prospects of better understanding the determinants of work behavior and performance are
exciting. Already primary studies, meta-analyses, and second-order meta-analyses provide ample
evidence that traditional self-report questionnaires of personality are among the most powerful
predictors of behavior in work settings. New developments in assessment and scoring methods
show promise for further improvements in measurement and prediction. Although initial optimism
regarding alternate response formats (e.g., fully ipsative forced-choice scales) proved unjustified,
other innovations (e.g., ideal point response methods and adaptive testing based on IRT) are prom-
ising ways to address concerns about traditional self-reports of personality on Likert-type scales.
Moreover, I-O psychologists have several other assessment tools at their disposal to measure per-
sonality (e.g., biodata, interviews, other-reports, SJTs, and ACs).
In addition to improving measurement using self-report personality measures, we encourage
researchers to thoroughly investigate the value of standardized other-reports in relation to occu-
pational criteria. The few studies that have investigated their criterion-related validity suggest that
other-reports may be even more valid for certain criteria than are self-report measures of personal-
ity. Other-reports can reliably capture personality variance that improves construct coverage and
thus have the potential to increment criterion-related validity. More evidence for the validity of
other-reports must be established and moderator variables (such as rating source) more systematically investigated before organizations will be persuaded to implement such measures more fully in employee selection.
Personality variables add significant explanatory and predictive power beyond other variables
(e.g., educational credentials, cognitive ability, work experience) often assessed during employment
decision-making. With better understanding of the structure of personality and criterion variables
and better measurement of both, personality will be more fully recognized for its very important
role in determining work behavior and performance.

REFERENCES
Alliger, G. M., Lilienfeld, S. O., & Mitchell, K. E. (1996). The susceptibility of overt and covert integrity tests
to coaching and faking. Psychological Science, 7, 32–39.
Arthur, W., Day, E. A., McNelly, T. L., & Edens, P. S. (2003). A meta-analysis of the criterion-related validity
of assessment center dimensions. Personnel Psychology, 56, 125–154.
Averill, J. R. (1997). The emotions: An integrative approach. In R. Hogan, J. A. Johnson, & S. R. Briggs (Eds.),
Handbook of personality psychology (pp. 513–541). San Diego, CA: Academic Press.
Barge, B. N., & Hough, L. M. (1988). Utility of biographical data for the prediction of job performance. In
L. M. Hough (Ed.), Literature review: Utility of temperament, biodata, and interest assessment for predicting
job performance (ARI Research Note 88-020). Alexandria, VA: U.S. Army Research Institute.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Barrick, M. R., & Mount, M. K. (2005). Yes, personality matters: Moving on to more important matters. Human
Performance, 18, 359–372.
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and performance at the beginning of the
new millennium: What do we know and where do we go next? International Journal of Selection and
Assessment, 9, 9–30.
Barrick, M. R., Patton, G. K., & Haugland, S. N. (2000). Accuracy of interviewer judgments of job applicant
personality traits. Personnel Psychology, 53, 925–951.
Bartram, D. (2005). The Great Eight competencies: A criterion-centric approach to validation. Journal of
Applied Psychology, 90, 1185–1203.
Baumeister, R. F., Gailliot, M., DeWall, C., & Oaten, M. (2006). Self-regulation and personality: How interven-
tions increase regulatory success, and how depletion moderates the effects of traits on behavior. Journal
of Personality, 74, 1773–1801.
Benson, M. J., & Campbell, J. P. (2007). To be, or not to be, linear: An expanded representation of personality
and its relationship to leadership performance. International Journal of Selection and Assessment, 15,
232–249.
Birkland, A. S., & Ones, D. S. (2006). The structure of emotional stability: A meta-analytic investigation. Paper
presented at the International Congress of Applied Psychology, Athens, Greece.
Blackman, M. C., & Funder, D. C. (2002). Effective interview practices for accurately assessing counterpro-
ductive traits. International Journal of Selection and Assessment, 10, 109–116.
Block, J. (1995). A contrarian view of the five-factor approach to personality description. Psychological
Bulletin, 117, 187–215.
Borman, W. C., Penner, L. A., Allen, T. D., & Motowidlo, S. J. (2001). Personality predictors of citizenship
performance. International Journal of Selection and Assessment, 9, 52–69.
Chan, D. (2004). Individual differences in tolerance for contradiction. Human Performance, 17, 297–324.
Chan, D. (2008). So why ask me? Are self-report data really that bad? In C. E. Lance & R. J. Vandenberg
(Eds.), Statistical and methodological myths and urban legends: Received doctrine, verity, and fable in
the organizational and social sciences. Hillsdale, NJ: Lawrence Erlbaum.
Chan, D., & Schmitt, N. (2002). Situational judgment and job performance. Human Performance, 15,
233–254.
Chan, D., & Schmitt, N. (2005). Situational judgment tests. In A. Evers, O. Voskuijl, & N. Anderson (Eds.),
Handbook of selection (pp. 219–242). Oxford, England: Blackwell.
Chernyshenko, O. S., Stark, S., Drasgow, F., & Roberts, B. W. (2007). Constructing personality scales under the
assumptions of an ideal point response process: Toward increasing the flexibility of personality measures.
Psychological Assessment, 19, 88–106.
Christiansen, N. D. (1997). The development and validation of a job-related choice method of personality
assessment. Unpublished doctoral dissertation, Northern Illinois University, DeKalb, IL.
Connelly, B. S. (2008). The reliability, convergence, and predictive validity of personality ratings: Another
perspective. Unpublished doctoral dissertation, University of Minnesota, Minneapolis, MN.
Connelly, B. S., Davies, S. E., Ones, D. S., & Birkland, A. S. (2008a, April). Agreeableness: A meta-analytic
review of structure, convergence, and predictive validity. In J. P. Thomas & C. Viswesvaran (Chairs),
Personality in the workplace: Advances in measurement and assessment. Symposium conducted at the
annual conference of the Society for Industrial and Organizational Psychology, San Francisco, CA.
Connelly, B. S., Davies, S. E., Ones, D. S., & Birkland, A. S. (2008b, January). Opening up openness: A meta-
analytic review of measures of the personality construct. Poster session presented at the annual confer-
ence of the Society for Personality and Social Psychology, Albuquerque, NM.
Connelly, B. S., & Ones, D. S. (2007, April). Combining conscientiousness scales: Can’t get enough of the trait,
baby. Poster session presented at the annual conference of the Society for Industrial and Organizational
Psychology, New York, NY.
Connolly, J. J., Kavanagh, E. J., & Viswesvaran, C. (2007). The convergent validity between self and observer
ratings of personality: A meta-analytic review. International Journal of Selection and Assessment, 15,
110–117.
Cortina, J. M., Goldstein, N. B., Payne, S. C., Davison, H., & Gilliland, S. W. (2000). The incremental validity
of interview scores over and above cognitive ability and conscientiousness scores. Personnel Psychology,
53, 325–351.
Costa, P. T., & McCrae, R. R. (1988). From catalog to classification: Murray’s needs and the Five-Factor
Model. Journal of Personality and Social Psychology, 55, 258–265.
Costa, P. T., & McCrae, R. R. (1992). The revised NEO Personality Inventory (NEO PI-R) and NEO Five-
Factor Inventory (NEO-FFI) professional manual. Odessa, FL: Psychological Assessment Resources.
Coward, W., & Sackett, P. R. (1990). Linearity of ability-performance relationships: A reconfirmation. Journal
of Applied Psychology, 75, 297–300.
Davies, M., Stankov, L., & Roberts, R. D. (1998). Emotional intelligence: In search of an elusive construct.
Journal of Personality and Social Psychology, 75, 989–1015.
Davies, S. E., Connelly, B. S., Ones, D. S., & Birkland, A. S. (2008, April). Enhancing the role of extraversion
for work related behaviors. Poster session presented at the annual conference of the Society for Industrial
and Organizational Psychology, San Francisco, CA.
Day, D. V., & Silverman, S. B. (1989). Personality and job performance: Evidence of incremental validity.
Personnel Psychology, 42, 25–36.
DeGroot, T., & Kluemper, D. (2007). Evidence of predictive and incremental validity of personality factors,
vocal attractiveness and the situational interview. International Journal of Selection and Assessment, 15,
30–39.
Digman, J. M. (1997). Higher-order factors of the Big Five. Journal of Personality and Social Psychology, 73,
1246–1256.
Dilchert, S., Ones, D. S., Van Rooy, D. L., & Viswesvaran, C. (2006). Big Five factors of personality. In
J. H. Greenhaus & G. A. Callanan (Eds.), Encyclopedia of career development (pp. 36–42). Thousand
Oaks, CA: Sage.
Dilchert, S., Ones, D. S., Viswesvaran, C., & Deller, J. (2006). Response distortion in personality measurement:
Born to deceive, yet capable of providing valid self-assessments? Psychology Science, 48, 209–225.
Dudley, N. M., Orvis, K. A., Lebiecki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of consci-
entiousness in the prediction of job performance: Examining the intercorrelations and the incremental
validity of narrow traits. Journal of Applied Psychology, 91, 40–57.
Foldes, H. J., Duehr, E. E., & Ones, D. S. (2008). Group differences in personality: Meta-analyses comparing
five U.S. racial groups. Personnel Psychology, 61, 579–616.
Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach. Psychological Review,
102, 652–670.
Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York, NY: Wiley.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34.
Gough, H. G. (1968). The Chapin Social Insight Test manual. Palo Alto, CA: Consulting Psychologists Press.
Gough, H. G. (1994). Theory, development, and interpretation of the CPI Socialization scale. Psychological
Reports, 75, 651–700.
Griffeth, R. W., Hom, P. W., & Gaertner, S. (2000). A meta-analysis of antecedents and correlates of employee
turnover: Update, moderator tests, and research implications for the next millennium. Journal of
Management, 26, 463–488.
Guion, R. M. (1991). Personnel assessment, selection, and placement. In M. D. Dunnette & L. M. Hough
(Eds.), Handbook of industrial and organizational psychology (Vol. 2, pp. 327–397). Palo Alto, CA:
Consulting Psychologists Press.
Harold, C. M., McFarland, L. A., & Weekley, J. A. (2006). The validity of verifiable and non-verifiable bio-
data items: An examination across applicants and incumbents. International Journal of Selection and
Assessment, 14, 336–346.
Heggestad, E. D., Morrison, M., Reeve, C. L., & McCloy, R. A. (2006). Forced-choice assessments of person-
ality for selection: Evaluating issues of normative assessment and faking resistance. Journal of Applied
Psychology, 91, 9–24.
Hicks, L. E. (1970). Some properties of ipsative, normative, and forced-choice normative measures.
Psychological Bulletin, 74, 167–184.
Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job-performance relations: A socio-
analytic perspective. Journal of Applied Psychology, 88, 100–112.
Hogan, J., & Ones, D. S. (1997). Conscientiousness and integrity at work. In R. Hogan & J. A. Johnson (Eds.),
Handbook of personality psychology (pp. 849–870). San Diego, CA: Academic Press.
Hogan, J., & Roberts, B. W. (1996). Issues and non-issues in the fidelity-bandwidth trade-off. Journal of
Organizational Behavior, 17, 627–637.
Hogan, R. T. (1969). Development of an empathy scale. Journal of Consulting & Clinical Psychology, 33,
307–316.
Hogan, R. T. (2005a). Comments. Human Performance, 18, 405–407.
Hogan, R. T. (2005b). In defense of personality measurement: New wine for old whiners. Human Performance,
18, 331–341.
Hough, L. M. (1989). Biodata and the measurement of individual differences. In T. W. Mitchell (Chair),
Biodata vs. personality: The same or different class of individual differences? Symposium conducted at
the annual conference of the Society for Industrial and Organizational Psychology, Boston, MA.
Hough, L. M. (1992). The “Big Five” personality variables—construct confusion: Description versus predic-
tion. Human Performance, 5, 139–155.
Hough, L. M. (1997). The millennium for personality psychology: New horizons or good ole daze. Applied
Psychology: An International Review, 47, 233–261.
Hough, L. M. (1998). Personality at work: Issues and evidence. In M. D. Hakel (Ed.), Beyond multiple choice:
Evaluating alternatives to traditional testing for selection (pp. 131–166). Mahwah, NJ: Lawrence
Erlbaum.
Hough, L. M. (2001). I-Owes its advances to personality. In B. W. Roberts & R. Hogan (Eds.), Personality
psychology in the workplace (pp. 19–44). Washington, DC: American Psychological Association.
Hough, L. M. (in press). Assessment of background and life experience: Past as prologue. In J. C. Scott &
D. H. Reynolds (Eds.), Handbook of workplace assessment: Selecting and developing organizational
talent. Hoboken, NJ: Wiley & Sons.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validi-
ties of personality constructs and the effect of response distortion on those validities. Journal of Applied
Psychology, 75, 581–595.
Hough, L. M., & Furnham, A. (2003). Use of personality variables in work settings. In W. C. Borman, D. R. Ilgen
& R. J. Klimoski (Eds.), Handbook of psychology. Vol. 12: Industrial and organizational psychology
(pp. 131–169). Hoboken, NJ: John Wiley & Sons.
Hough, L. M., & Ones, D. S. (2001). The structure, measurement, validity, and use of personality variables
in industrial, work, and organizational psychology. In N. Anderson, D. S. Ones, H. K. Sinangil, &
C. Viswesvaran (Eds.), Handbook of industrial, work, and organizational psychology. Vol. 1: Personnel
psychology (pp. 233–277). London, England: Sage.
Hough, L. M., Ones, D. S., & Viswesvaran, C. (1998, April). Personality correlates of managerial perfor-
mance constructs. Poster session presented at the annual conference of the Society for Industrial and
Organizational Psychology, Dallas, TX.
Hough, L. M., & Oswald, F. L. (2000). Personnel selection: Looking toward the future—Remembering the
past. Annual Review of Psychology, 51, 631–664.
Hough, L. M., & Oswald, F. L. (2005). They’re right, well... mostly right: Research evidence and an agenda to
rescue personality testing from 1960s insights. Human Performance, 18, 373–387.
Hough, L. M., & Oswald, F. L. (2008). Personality testing and I-O psychology: Reflections, progress, and
prospects. Industrial and Organizational Psychology, 1, 272–290.
Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants, detection and amelioration of adverse
impact in personnel selection procedures: Issues, evidence and lessons learned. International Journal of
Selection and Assessment, 9, 152–194.
Hough, L. M., & Paullin, C. (1994). Construct-oriented scale construction: The rational approach. In G. S.
Stokes & M. D. Mumford (Eds.), Biodata handbook: Theory, research, and use of biographical informa-
tion in selection and performance prediction (pp. 109–145). Palo Alto, CA: CPP Books.
Hough, L. M., & Schneider, R. J. (1996). Personality traits, taxonomies, and applications in organizations. In
K. R. Murphy (Ed.), Individual differences and behavior in organizations (pp. 31–88). San Francisco,
CA: Jossey-Bass.
Houston, J. S., Borman, W. C., Farmer, W., & Bearden, R. M. (2005). Development of the Enlisted Computer
Adaptive Personality Scales (ENCAPS), Renamed Navy Computer Adaptive Personality Scales (NCAPS)
(Institute Report #503). Minneapolis, MN: Personnel Decisions Research Institutes.
Huffcutt, A. I., & Arthur, W. (1994). Hunter and Hunter (1984) revisited: Interview validity for entry-level jobs.
Journal of Applied Psychology, 79, 184–190.
Huffcutt, A. I., Conway, J. M., Roth, P. L., & Stone, N. J. (2001). Identification and meta-analytic assessment
of psychological constructs measured in employment interviews. Journal of Applied Psychology, 86,
897–913.
Huffcutt, A. I., Roth, P. L., & McDaniel, M. A. (1996). A meta-analytic investigation of cognitive ability in
employment interview evaluations: Moderating characteristics and implications for incremental validity.
Journal of Applied Psychology, 81, 459–473.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin, 96, 72–98.
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of
Applied Psychology, 85, 869–879.
Jackson, D. N., Wroblewski, V. R., & Ashton, M. C. (2000). The impact of faking on employment tests: Does
forced choice offer a solution? Human Performance, 13, 371–388.
Judge, T. A., Heller, D., & Mount, M. K. (2002). Five-factor model of personality and job satisfaction: A
meta-analysis. Journal of Applied Psychology, 87, 530–541.
Judge, T. A., & Ilies, R. (2002). Relationship of personality to performance motivation: A meta-analytic review.
Journal of Applied Psychology, 87, 797–807.
Kamp, J. D., & Gough, H. G. (1986). The Big Five personality factors from an assessment context. Paper pre-
sented at the annual conference of the American Psychological Association, Washington, DC.
Kenny, D. A., Albright, L., Malloy, T. E., & Kashy, D. A. (1994). Consensus in interpersonal perception:
Acquaintance and the Big Five. Psychological Bulletin, 116, 245–258.
Kilcullen, R. N., White, L. A., Mumford, M. D., & Mack, H. (1995). Assessing the construct validity of rational
biodata scales. Military Psychology, 7, 17–28.
Kohlberg, L., Levine, C., & Hewer, A. (1994). Moral stages: A current formulation and a response to critics.
In B. Puka (Ed.), Moral development: A compendium. Vol. 5: New research in moral development
(pp. 126–188). New York, NY: Garland.
Krause, D. E., & Thornton, G. C. (2009). A cross-cultural look at assessment center practices: Survey results
from Western Europe and North America. Applied Psychology: An International Review, 58, 557–585.
LaHuis, D. M., Martin, N. R., & Avis, J. M. (2005). Investigating nonlinear conscientiousness—Job performance
relations for clerical employees. Human Performance, 18, 199–212.
Lievens, F., Dilchert, S., & Ones, D. S. (2005, April). Personality validity increases in medical school:
A seven-year longitudinal investigation. In L. M. Hough & D. S. Ones (Chairs), Power of personality:
Longitudinal studies and meta-analyses. Symposium conducted at the annual conference of the Society
for Industrial and Organizational Psychology, Los Angeles, CA.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22, 55.
Manley, G. G., Benavidez, J., & Dunn, K. (2007). Development of a personality biodata measure to predict
ethical decision making. Journal of Managerial Psychology, 22, 664–682.
Martin, B. A., Bowen, C. C., & Hunt, S. T. (2002). How effective are people at faking on personality question-
naires? Personality and Individual Differences, 32, 247–256.
McCloy, R. A., Heggestad, E. D., & Reeve, C. L. (2005). A silk purse from the sow’s ear: Retrieving normative
information from multidimensional forced-choice items. Organizational Research Methods, 8, 222–248.
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. (2007). Situational judgment tests, response
instructions, and validity: A meta-analysis. Personnel Psychology, 60, 63–91.
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment inter-
views: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599–616.
McFarland, L. A., & Ryan, A. M. (2000). Variance in faking across noncognitive measures. Journal of Applied
Psychology, 85, 812–821.
McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results:
The relationship between predictor and criterion domains. Personnel Psychology, 43, 335–354.
McManus, M. A., & Kelly, M. L. (1999). Personality measures and biodata: Evidence regarding their incremental
predictive value in the life insurance industry. Personnel Psychology, 52, 137–148.
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007a). Are
we getting fooled again? Coming to terms with limitations in the use of personality tests for personnel
selection. Personnel Psychology, 60, 1029–1049.
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007b).
Reconsidering the use of personality tests in personnel selection contexts. Personnel Psychology, 60,
683–729.
Moscoso, S. (2000). Selection interview: A review of validity evidence, adverse impact and applicant reactions.
International Journal of Selection and Assessment, 8, 237–247.
Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (1990). An alternative selection procedure: The low-fidelity
simulation. Journal of Applied Psychology, 75, 640–647.
Mount, M. K., Barrick, M. R., & Strauss, J. P. (1994). Validity of observer ratings of the big five personality
factors. Journal of Applied Psychology, 79, 272–280.
Murphy, K. R., & Dzieweczynski, J. L. (2005). Why don’t measures of broad dimensions of personality per-
form better as predictors of job performance? Human Performance, 18, 343–357.
Nilsen, D. (1995). An investigation of the relationship between personality and leadership performance.
Unpublished doctoral dissertation, University of Minnesota, Minneapolis, MN.
Ones, D. S. (1993). The construct validity of integrity tests. Unpublished doctoral dissertation, University of
Iowa, Iowa City, IA.
Ones, D. S., & Dilchert, S. (2008, February). Recent assessment center research: Dimensions, exercises, group
differences, and incremental validity. Paper presented at the annual Assessment Centre Study Group
conference, Stellenbosch, South Africa.
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment in orga-
nizational settings. Personnel Psychology, 60, 995–1027.
Ones, D. S., & Viswesvaran, C. (1996). Bandwidth-fidelity dilemma in personality measurement for personnel
selection. Journal of Organizational Behavior, 17, 609–626.
Ones, D. S., & Viswesvaran, C. (2001). Personality at work: Criterion-focused occupational personality scales
used in personnel selection. In B. W. Roberts & R. Hogan (Eds.), Personality psychology in the work-
place (pp. 63–92). Washington, DC: American Psychological Association.
Ones, D. S., & Viswesvaran, C. (2008). Customer service scales: Criterion-related, construct, and incremental
validity evidence. In J. Deller (Ed.), Research contributions to personality at work (pp. 19–46). Mering,
Germany: Hampp.
Ones, D. S., Viswesvaran, C., & Dilchert, S. (2005). Personality at work: Raising awareness and correcting
misconceptions. Human Performance, 18, 389–404.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities:
Findings and implications for personnel selection and theories of job performance. Journal of Applied
Psychology, 78, 679–703.
Oswald, F. L., & Hough, L. M. (2008). Personality testing and I-O psychology: A productive exchange and
some future directions. Industrial and Organizational Psychology, 1, 323–332.
Oviedo-Garcia, M. (2007). Internal validation of a biodata extraversion scale for salespeople. Social Behavior
and Personality, 35, 675–692.
Paunonen, S. V., & Jackson, D. N. (1996). The Jackson Personality Inventory and the Five-Factor Model of
personality. Journal of Research in Personality, 30, 42–59.
Ployhart, R. E., & Ehrhart, M. G. (2003). Be careful what you ask for: Effects of response instructions on the
construct validity and reliability of situational judgment tests. International Journal of Selection and
Assessment, 11, 1–16.
Reilly, R. R., & Chao, G. R. (1982). Validity and fairness of some alternative employee selection procedures.
Personnel Psychology, 35, 1–62.
Roberts, B. W., Chernyshenko, O. S., Stark, S. E., & Goldberg, L. R. (2005). The structure of conscientious-
ness: An empirical investigation based on seven major personality questionnaires. Personnel Psychology,
58, 103–139.
Roberts, B. W., Kuncel, N. R., Shiner, R., Caspi, A., & Goldberg, L. R. (2007). The power of personality:
The comparative validity of personality traits, socioeconomic status, and cognitive ability for predicting
important life outcomes. Perspectives on Psychological Science, 2, 313–345.
Robie, C., & Ryan, A. M. (1999). Effects of nonlinearity and heteroscedasticity on the validity of conscien-
tiousness in predicting overall job performance. International Journal of Selection and Assessment, 7,
157–169.
Rothstein, M. G., & Goffin, R. D. (2000). The assessment of personality constructs in industrial-organizational
psychology. In R. D. Goffin & E. Helmes (Eds.), Problems and solutions in human assessment: Honoring
Douglas N. Jackson at seventy (pp. 215–248). New York, NY: Kluwer Academic/Plenum.
Rothstein, M. G., & Goffin, R. D. (2006). The use of personality measures in personnel selection: What does
current research support? Human Resource Management Review, 16, 155–180.
Russell, J. A., & Carroll, J. M. (1999). On the bipolarity of positive and negative affect. Psychological Bulletin,
125, 3–30.
Ryan, A. M., McFarland, L., Baron, H., & Page, R. (1999). An international look at selection practices: Nation
and culture as explanations for variability in practice. Personnel Psychology, 52, 359–391.
Salgado, J. F. (2002). The Big Five personality dimensions and counterproductive behaviors. International
Journal of Selection and Assessment, 10, 117–125.
Schmidt, F. L., & Hunter, J. E. (1996). Measurement error in psychological research: Lessons from 26 research
scenarios. Psychological Methods, 1, 199–223.
Schmidt, F. L., Hunter, J. E., McKenzie, R. C., & Muldrow, T. W. (1979). Impact of valid selection procedures
on work-force productivity. Journal of Applied Psychology, 64, 609–626.
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analyses of validity studies published
between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37,
407–422.
Schmitt, N., & Kunce, C. (2002). The effects of required elaboration of answers to biodata questions. Personnel
Psychology, 55, 569–587.
318 Handbook of Employee Selection

Schmitt, N., Oswald, F. L., Kim, B. H., Gillespie, M. A., Ramsay, L. J., & Yoo, T.-Y. (2003). Impact of elabora-
tion on socially desirable responding and the validity of biodata measures. Journal of Applied Psychology,
88, 979–988.
Schneider, R. J., Ackerman, P. L., & Kanfer, R. (1996). To “act wisely in human relations”: Exploring the
dimensions of social competence. Personality and Individual Differences, 21, 469–481.
Schneider, R. J., & Hough, L. M. (1995). Personality and industrial/organizational psychology. In C. L. Cooper &
I. T. Robertson (Eds.), International review of industrial and organizational psychology (pp. 75–129).
Chichester, England: Wiley.
Schneider, R. J., Hough, L. M., & Dunnette, M. D. (1996). Broadsided by broad traits: How to sink science in
five dimensions or less. Journal of Organizational Behavior, 17, 639–655.
Scholz, G., & Schuler, H. (1993). Das nomologische Netzwerk des Assessment Centers: eine Metaanalyse.
[The nomological network of the assessment center: A meta-analysis]. Zeitschrift für Arbeits- und
Organisationspsychologie, 37, 73–85.
Sisco, H., & Reilly, R. R. (2007a). Development and validation of a biodata inventory as an alternative method
to measurement of the Five Factor Model of personality. Social Science Journal, 44, 383–389.
Sisco, H., & Reilly, R. R. (2007b). Five Factor Biodata Inventory: Resistance to faking. Psychological Reports,
101, 3–17.
Stark, S., & Chernyshenko, O. S. (2007, June). Adaptive testing with the multi-unidimensional pairwise prefer-
ence model. Paper session: New CAT models: Graduate Management Admission Council Conference on
Computerized Adaptive Testing. Minneapolis, MN.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2005). An IRT approach to constructing and scoring pair-
wise preference items involving stimuli on different dimensions: The multi-unidimensional pairwise-
preference model. Applied Psychological Measurement, 29, 184–203.
Stark, S., Chernyshenko, O. S., Drasgow, F., & Williams, B. A. (2006). Examining assumptions about item
responding in personality assessment: Should ideal point methods be considered for scale development
and scoring? Journal of Applied Psychology, 91, 25–39.
Stokes, G. S., & Cooper, L. A. (2001). Content/construct approaches in life history form development for selec-
tion. International Journal of Selection and Assessment, 9, 138–151.
Stokes, G. S., & Cooper, L. A. (2004). Biodata. In J. C. Thomas (Ed.), Comprehensive handbook of psycho-
logical assessment (Vol. 4: Industrial and organizational assessment, pp. 243–268). Hoboken, NJ: John
Wiley & Sons.
Stokes, G. S., Hogan, J. B., & Snell, A. F. (1993). Comparability of incumbent and applicant samples for the
development of biodata keys: The influence of social desirability. Personnel Psychology, 46, 739–762.
Tenopyr, M. L. (1994). Big Five, structural modeling, and item response theory. In G. S. Stokes, M. D. Mumford
& W. A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in
selection and performance prediction (pp. 519–533). Palo Alto, CA: Consulting Psychologists Press.
Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job performance. Journal
of Applied Psychology, 88, 500–517.
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion,
Dipboye, Hollenbeck, Murphy, and Schmitt (2007). Personnel Psychology, 60, 967–993.
Tett, R. P., & Guterman, H. A. (2000). Situation trait relevance, trait expression, and cross-situational consis-
tency: Testing a principle of trait activation. Journal of Research in Personality, 34, 397–423.
Tett, R. P., Jackson, D. N., Rothstein, M., & Reddon, J. R. (1999). Meta-analysis of bidirectional relations in
personality-job performance research. Human Performance, 12, 1–29.
Thornton, G. C., & Rupp, D. E. (2003). Simulations and assessment centers. In W. C. Borman, D. R. Ilgen, &
R. J. Klimoski (Eds.), Handbook of psychology: Industrial and organizational psychology (Vol. 12,
pp. 319–344). New York, NY: Wiley & Sons.
Thurstone, L. L. (1928). Attitudes can be measured. American Journal of Sociology, 33, 529–554.
Tupes, E. C., & Christal, R. E. (1958). Stability of personality trait rating factors obtained under diverse condi-
tions (WADC-TN-58-61). Lackland Airforce Base, TX: Personnel Laboratory, Wright Air Development
Center, U.S. Air Force.
Tupes, E. C., & Christal, R. E. (1961). Recurrent personality factors based on trait ratings (ASD-TR-61-97).
Lackland Airforce Base, TX: Personnel Laboratory, Aeronautical Systems Division, U.S. Air Force.
Tupes, E. C., & Christal, R. E. (1992). Recurrent personality factors based on trait ratings. Journal of Personality,
60, 225–251.
Underhill, C. M. (2006). Investigation of item-pair presentation and construct validity of the Navy Computer
Adaptive Personality Scales (NCAPS). Millington, TN: Navy Personnel Research, Studies, Technology
Division, Bureau of Naval Personnel.
Personality 319

Van Iddekinge, C. H., Raymark, P. H., Eidson, C. E., Jr., & Attenweiler, W. J. (2004). What do structured
selection interviews really measure? The construct validity of behavior description interviews. Human
Performance, 17, 71–93.
Van Iddekinge, C. H., Raymark, P. H., & Roth, P. L. (2005). Assessing personality with a structured employ-
ment interview: Construct-related validity and susceptibility to response inflation. Journal of Applied
Psychology, 90, 536–552.
Viswesvaran, C., & Ones, D. S. (2000). Measurement error in “Big Five factors” personality assessment:
Reliability generalization across studies and measures. Educational and Psychological Measurement,
60, 224–235.
Wernimont, P. F., & Campbell, J. P. (1968). Signs, samples, and criteria. Journal of Applied Psychology, 52,
372–376.
White, L. A., Young, M. C., Hunter, A. E., & Rumsey, M. G. (2008). Lessons learned in transitioning personality
measures from research to operational settings. Industrial and Organizational Psychology, 1, 291–295.
White, L. A., Young, M. C., & Rumsey, M. G. (2001). ABLE implementation issues and related research. In
J. P. Campbell & D. J. Knapp (Eds.), Exploring the limits in personnel selection and classification (pp.
525–558). Mahwah, NJ: Lawrence Erlbaum.
Wiesner, W. H., & Cronshaw, S. F. (1988). A meta-analytic investigation of the impact of interview format and
degree of structure on the validity of the employment interview. Journal of Occupational Psychology,
61, 275–290.
Wiggins, J. S., & Trapnell, P. D. (1997). Personality structure: The return of the Big Five. In R. Hogan,
J. A. Johnson, & S. R. Briggs (Eds.), Handbook of personality psychology. (pp. 737–765). San Diego,
CA: Academic Press.
Young, M. C., White, L. A., Heggestad, E. D., & Barnes, J. D. (2004). Operational validation of the Army’s new
pre-enlistment attrition screening measure. Paper presented at the annual conference of the American
Psychological Association, Honolulu, HI.
Zuckerman, M., Kuhlman, D. M., Joireman, J., Teta, P., & Kraft, M. (1993). A comparison of three structural
models for personality: The Big Three, the Big Five, and the Alternative Five. Journal of Personality &
Social Psychology, 65, 757–768.
This page intentionally left blank
15 Values, Styles, and Motivational Constructs

David Chan

For several decades, cognitive ability and personality traits have been the two major types of predictors examined in employee selection research. Construct-oriented studies have focused on the
structure and taxonomy of cognitive ability (see Chapter 12, this volume) and personality traits
(see Chapter 14, this volume), as well as the validity evidence for these two types of constructs.
In contrast, selection researchers have paid little attention to other types of individual difference
predictors such as those in the domains of values, cognitive styles, and motivational constructs.
To the extent that individual differences in values, cognitive styles, and motivations are distinct
from cognitive ability and personality constructs, and to the extent that these individual differences
predict work-relevant attitudes, perceptions, and behaviors, there is a need in selection research to
direct more attention to these “nontraditional” predictor constructs. The purpose of this chapter is
to provide an overview of the major values, cognitive styles, and motivational constructs that are
likely to be relevant in employee selection research. In the following sections, I discuss each of these three construct domains with three objectives: (a) to explain the basic conceptualizations of the focal constructs and their potential value in employee selection research and practice, (b) to illustrate the variety of constructs and present the theory and research associated with their structure and validity, and (c) to discuss current concerns and emerging issues in the conceptualization and measurement of these constructs. I end the chapter with a discussion of practical considerations in the use of these
constructs in employee selection and a proposed strategic agenda for future research directions.

VALUES
The interest in the psychological research on the concept of values may be traced back to the
publication of Rokeach’s (1973) influential book The Nature of Human Values and the Rokeach
Value Survey, which he developed to measure the various value constructs described in his book.
Subsequent researchers who examined the structure of values or criterion-related validities of
values have tended to rely on Rokeach’s conceptual definition of values, which refers to the
individual’s “enduring belief that a specific mode of conduct or end-state of existence is person-
ally or socially preferable to an opposite or converse mode of conduct or end-state of existence”
(Rokeach, 1973, p. 5). Although researchers have defined values in different ways, there appears
to be a consensus from their conceptual definitions that values are the individual’s stable beliefs
that serve as general standards by which he or she evaluates specific things, including people,
behaviors, activities, and issues. These standards of evaluation are also considered abstract goals,
which are important guiding principles in life for the individual. There is also agreement that
values are more general than attitudes in that the latter are more referent-specific. Values are also
differentiated from interests in that the former are scaled on relative importance whereas the latter are scaled on relative liking.


WHY STUDY VALUES?


The rationale for the study of values rests primarily on their criterion-related validity. Because values are assumed to occupy a central position in the individual's network of cognitive beliefs and attitudes, we would expect values to be associated with, and hence predictive of, criteria such as specific beliefs, attitudes, perceptions, and behaviors. Indeed, much of the early interest in empirical studies of values was generated by Rokeach's (1973) seminal research showing that rankings of the importance of values were predictive of a wide variety of attitudes and behaviors. Subsequent to Rokeach's
work, the criterion-related validities of values were quite consistently demonstrated over the years
for diverse criteria including attitudinal, perceptual, and behavioral outcomes (e.g., England & Lee,
1974; Kidron, 1978; Maio, Roese, Seligman, & Katz, 1996; Ravlin & Meglino, 1987). For example,
Ravlin and Meglino (1987) showed that achievement, concern for others, fairness, and honesty were
major values that predicted various perceptions and decisions at the workplace.
A second reason for studying values is that value congruence, or similarity versus dissimilar-
ity of values, is expected to lead to important outcomes. For example, studies have found that
value congruence between managers and their organizations predicted the managers’ success and
intention to remain in the organization (Posner, Kouzes, & Schmidt, 1985), and value congruence
between subordinates and supervisors predicted subordinates’ ratings of supervisors’ competence and
success (Weiss, 1978). However, the inferences from the results of many value congruence studies
tend to be less conclusive given the difficulty of interpretation associated with methodological problems
in these studies (Meglino, Ravlin, & Adkins, 1989).

STRUCTURE OF VALUES
Following his conceptual definition of values, Rokeach (1973) made two useful distinctions in
the structure of values. The first distinction is between instrumental values and terminal values.
Instrumental values are about modes of conduct, and they refer to the subjective desirability about
the actions or conduct, such as being honest, obedient, or courageous, which are presumed as means
that lead to certain desirable outcomes. Terminal values are about end-states of existence, and they
refer to the subjective desirability of life outcomes such as equality or a peaceful world. The second
distinction is between values about well-being of the self and values about well-being of others. On
the basis of the above two distinctions, Rokeach produced a useful taxonomy of four major types
of values by factorially crossing the two independent distinctions. Instrumental values that are self-
oriented are called competence values (e.g., being ambitious, independent), whereas instrumental
values that are other-oriented are called moral values (e.g., being altruistic, forgiving). Terminal values that are self-oriented are called personal values (e.g., a materially comfortable life, a well-respected person), whereas terminal values that are other-oriented are called social values (e.g., a
peaceful world, a society with little or no inequality).
Schwartz (1992) argued that the conceptual distinction between instrumental and terminal val-
ues, although intuitively attractive, may not be necessary and may in fact create confusion because
in many cases the same value may be construed as a means and an end. For example, pleasure may
be construed as a terminal value but it may also serve as an instrumental value in promoting other
terminal values such as happiness. Similarly, an instrumental value, such as being honest, could be seen as a terminal value to be promoted by other instrumental values, such as being courageous.
Dissatisfied with the typology of values provided by Rokeach (1973), Schwartz and his col-
leagues (Schwartz, 1992; Schwartz & Bilsky, 1990) proposed a new framework or structure of
values that they believed to have universal content that can be applied across cultures. Schwartz and colleagues presented respondents with items representing specific values and asked them to rate the importance of each value to their lives. On the basis of these importance ratings from large and diverse samples of respondents, Schwartz organized the large variety of individuals' specific
values into ten value types (e.g., power, achievement, hedonism, self-direction). Schwartz further
proposed that the ten values may be organized at a higher level into two bipolar value dimensions;
namely, openness to change versus conservation and self-enhancement versus self-transcendence.
However, research using Schwartz’s framework has focused almost exclusively on the ten value
types probably because of the generic (and hence less useful) nature of the two bipolar value
dimensions. In addition to the ten value types at the individual level, Schwartz proposed seven
value dimensions at the cultural level to allow for cross-cultural comparisons in value research.
Examples of these cultural-level value dimensions are prosocial (active protection or enhancement
of the welfare of others), restrictive conformity (restraint of actions likely to harm others or vio-
late norms), and security (safety, harmony, and stability of the society or groups with whom one
identifies).
A major contribution of Schwartz’s framework is that in addition to providing a categorization
of values at the individual level, it offers a conceptual guide for us to understand and compare cul-
tures in terms of value dimensions. There is considerable empirical evidence that the framework,
including the ten value types and seven culture dimensions, can be used on a global basis to identify
and understand the content and structure of values across diverse cultures (e.g., Schwartz & Sagiv,
1995). To date, Schwartz’s framework represents the most comprehensive typology of values at the
individual and culture levels of analysis, and there is also a relatively large research literature on the
results of the Schwartz Value Survey administered in diverse cultures.
Another large-scale value survey project is the well-known World Values Survey, which was
developed from the original European Value Survey. The first World Values Survey, conducted in
1981, contained only 22 countries with 14 of them outside of Europe. The second wave, which con-
tained 42 countries, was conducted 10 years later. Subsequent waves, containing increasingly more
countries, were conducted at approximately 5-year intervals. Results on the World Values Survey
are freely available at http://www.worldvaluessurvey.com. One of the most well-known interpreta-
tions of the results of the World Values Survey is that the many values across countries may be
factor-analytically summarized into two global dimensions of cultural variation labeled as “tradi-
tional versus secular-rational” and “survival versus self-expression.” Given the large-scale results on
diverse cultures available on the World Values Survey and the Schwartz Value Survey, the utility of
these two value frameworks is likely to continue for many years.

CURRENT CONCERNS AND EMERGING ISSUES


The scientific defensibility and practical usefulness of values for employee selection are dependent
on the extent to which values are adequately conceptualized and measured. The following high-
lights some of the current concerns and emerging issues associated with conceptualization and
measurement in the study of values.

1. An adequate structure of values clarifying taxonomy and typology issues (i.e., number,
level, and type of values) is fundamental for the study of values to contribute to the science
and practice of employee selection. In employee selection, the criterion constructs of inter-
est are primarily work-relevant attitudes and behaviors. The structure of values is impor-
tant because it provides the conceptual organizing principles to relate these work-relevant
attitudes and behaviors to value constructs. Although we now have several conceptual
frameworks that provide researchers a working structure of values, it remains unclear what
degree of comprehensiveness and level of specificity we would require of a values structure
for the purpose of employee selection research and practice. A structure is nonparsimoni-
ous and impractical if it specifies a large variety of specific values organized into many
different types, domains, and levels of conceptualization. On the other hand, a structure
with a few generic values is likely to lead to studies with misspecified models because of
omitted value variables. There has been a proliferation of value measures that are rarely
reconciled with earlier measures in the literature, and this makes comparison of studies
problematic and accumulation of knowledge difficult. An adequate structure of values is
needed to guide researchers and provide more precise and theory-driven operationaliza-
tions of value constructs.
2. Even when multiple values are examined in a single study, researchers tend to study each
value in isolation rather than studying the effects of an individual's actual profile of values.
Note that this goes beyond studying joint effects of multiple values at the aggregate level
of analysis (e.g., incremental validity of conformity over tradition, interaction effect of
power and achievement), which could examine only a small number of values. The study
of interindividual differences in intraindividual profiles of values is important because
it is unlikely that an individual’s attitude or behavior is determined by a single value in
isolation. Intraindividual analyses also directly address the issue of intraindividual value
conflicts, which should be most relevant in work situations involving moral dilemmas. The
study of interindividual differences in intraindividual profiles of values and intraindividual
changes in values over time involves difficult measurement and data analysis issues, but
there are recent methodological advances that provide useful tools for conceptualizing and
assessing these differences (see Chan, 1998a, 2002).
3. The study of individual values and cultural values raises important levels of analysis issues
that need to be addressed. For example, does a value construct change in meaning when it
is composed from the individual level to the cultural level of analysis? The functional rela-
tionships between the same value construct across different (individual vs. cultural) levels
of analysis need to be carefully specified in a composition model. Failing to adequately
address these multilevel issues could lead to critical conceptual, measurement, analysis,
and inferential errors (see Chan, 1998b, 2005a).
4. Two increasingly important areas in employee selection are staffing teams (see Chapter
37, this volume) and selection of employees for expatriate assignments (see Chapter 36,
this volume). In these two areas, as well as the ongoing area of interest relating to
person-organization fit, the critical issue in the study of values concerns value congruence.
Advancement in these areas of employee selection research and practice is dependent on
advancements in person-environment fit research, particularly in issues relating to the
different conceptualizations and measurements of fit (e.g., objective fit vs. subjective fit;
supplementary fit vs. complementary fit). For example, value fit is almost always conceptu-
alized as supplementary fit defined in terms of similarity of values between the person and
the environment. Are there situations in which value fit is better conceptualized as comple-
mentary fit defined in terms of the environment meeting certain value needs/demands of the
person? In other words, value congruence may not always mean or imply value similarity.
A different conceptualization of the type of value congruence or fit could open up new and
useful areas for research and practice in employee selection. These issues on value congruence
apply not only to person-environment fit but also to person-person fit.
5. Given the research dependence on large-scale international surveys, which are largely
western in origin, applications of research findings on individual- and cultural-level values
to nonwestern cultures will need to pay careful attention to validity concerns associated
with methodological issues in cross-cultural measurement.

COGNITIVE STYLES
Cognitive styles refer to the characteristic modes, typical methods, habitual patterns, or preferred
ways of processing information that are consistent over time and across many areas of activity. So,
we can speak of cognitive styles in terms of thinking styles, problem-solving styles, learning styles,
and so forth.
As noted by Sternberg and Grigorenko (1997), cognitive styles should be distinguished from
strategies. The latter refer to the operations that individuals use or follow to minimize errors in
problem-solving and decision-making. The use of strategies involves the conscious choice of alter-
native operations, whereas cognitive styles typically function without the individual’s awareness.
In addition, strategies are used in task- or context-specific situations, whereas cognitive styles refer
to more stable characteristic modes of information processing that the individual uses consistently
across a large variety of task situations or contexts.
Cognitive styles refer to a set of preferences or habits and hence should be distinguished from
cognitive abilities. We can construe cognitive abilities as the “can do” aspect of cognition and
cognitive styles as the “tend to do” aspect of cognition. Because styles are not abilities, they should
not be inherently better or worse in an absolute or context-free sense. Instead, cognitive styles may
differ in their goodness of fit to different environments or situations, and the degree of fit could
lead to varying degrees of positive or negative consequences. Cognitive styles are also distinct from
personality traits. Although personality traits such as conscientiousness and extraversion also refer
to individual differences in stable characteristic modes of behaviors, they tend to be construed as
generic behavioral tendencies or predispositions, whereas cognitive styles refer to typical modes of
information processing.

WHY STUDY STYLES?


Conceptually, the rationale for studying cognitive styles is fairly obvious because an individual’s
habitual or preferred ways of processing information would affect the individual’s perception, learn-
ing, and performance. Hence, employee selection researchers and practitioners should be interested
in cognitive styles as potential predictors for various work-relevant criterion outcomes. Given the
centrality of information processing in learning and skill acquisition, cognitive styles should also be
of great interest in training research.
Because cognitive styles affect information processing and are distinguished from cognitive
abilities and personality traits, they provide another potential source of predictor constructs for
employee selection. In addition, it may be useful to relate cognitive styles to the maximum-typical
performance distinction in employee selection. Cognitive abilities are most relevant to maximum
performance, and personality traits are most relevant to typical performance. Cognitive styles refer
to the “tend to do” aspect of cognition and therefore provide a potential bridge between cognitive
ability and personality traits for investigating how these two traditional types of predictors may
interface.

VARIETIES OF STYLES
The idea of cognitive styles as an interface between cognitive ability and personality was very
popular in the 1950s and 1960s, and it was during that period that numerous types and measures of
cognitive styles were developed. However, not all of the purported cognitive style constructs are in
fact assessing cognitive styles. For example, Witkin and colleagues introduced the style construct
called field independence to refer to the degree to which individuals are dependent on or independent of the structure of the surrounding visual field when perceiving objects. The Rod and Frame Test
(Witkin, Dyke, Faterson, Goodenough, & Karp, 1962) and the Embedded Figures Test (Witkin,
Oltman, Raskin, & Karp, 1971) are the two most widely used measures of field independence. In
these measures, the individual's task is to locate a true vertical (in the Rod and Frame Test) or an object/figure (in the Embedded Figures Test), a task that can be accomplished only by ignoring the surrounding
visual field. The problem with the purported style construct of field independence is that it most
likely represents a cognitive ability as opposed to a cognitive style. The way the construct is con-
ceptualized and measured clearly involves objectively right and wrong answers, and it assesses the ability to obtain the right answer. Contrary to the conceptualization of a cognitive style
construct as not inherently adaptive or maladaptive, high field independence appears to be inher-
ently more adaptive than low field independence. It is difficult to think of situations in which field
dependence is better than field independence. Rather than a preferred way of processing information
(i.e., a style), high field independence refers to a specific type of information processing ability.
Whereas some measures of cognitive styles are in fact assessing cognitive abilities, others are
probably assessing personality traits or multidimensional constructs that are composites of styles
and personality traits. For example, Myers built on Jung’s (1923) theory of psychological types and
developed the Myers-Briggs Type Indicator (MBTI; Myers & McCaulley, 1985) as a cognitive style
measure consisting of four factors, each containing two categories (i.e., thinking vs. feeling, extra-
version vs. introversion, intuition vs. sensing, judgment vs. perception) that are combined to form
16 possible types of individuals. Although the MBTI is widely used in business and education settings, it has numerous validity problems (e.g., Druckman & Bjork, 1991, 1994). Moreover,
conceptually and empirically, each of the 16 types in the MBTI is clearly a composite of personality
traits (extraversion-introversion) and other individual difference constructs that may be cognitive
styles (e.g., intuition-sensing) or the degree to which personal values versus impersonal logic are
used as the basis for making judgments and decisions (thinking-feeling).
There are legitimate cognitive style constructs. For example, several researchers introduced
(differently labeled) constructs that all refer to the degree to which individuals see things as similar
or different. These include constructs such as categorizing behavior (Gardner, 1953), conceptual
differentiation (Gardner & Schoen, 1962), and compartmentalization (Messick & Kogan, 1963).
These constructs refer to the tendency to separate ideas or objects into discrete categories. Clearly,
any two ideas or objects are similar in some ways and different in other ways. Depending on the
problem or situation, the similarities (or differences) may be task-relevant or task-irrelevant. Hence,
consistent with the conceptualization of a cognitive style construct, the tendency to see things as
similar or different is not inherently adaptive or maladaptive—the adaptive value of any given level
on the construct is dependent on its fit with the problem situation.
Other examples of legitimate cognitive style constructs are the preference for abstract versus
concrete information (Harvey, Hunt, & Schroder, 1961), the adaption versus innovation cognitive
style (Kirton, 1976), and the tolerance for contradiction cognitive style (Chan, 2004). In two dif-
ferent studies, Chan demonstrated that a cognitive style is not inherently adaptive or maladaptive
and that it may interact disordinally with the style demands of the work context (Chan, 1996) or
practical intelligence (Chan, 2004) to produce positive or negative consequences. Using Kirton’s
(1976) conceptualization of the adaption versus innovation approach to problem solving, Chan (1996) showed that the degree of cognitive style mismatch between the individual's problem-solving style and the style demands of the work context predicted actual turnover over and above the predictability provided by job performance. In Chan (2004), construct validity evidence for the cognitive style construct of tolerance for contradiction was provided in terms of convergent and discriminant validity with an
established set of external constructs. Using a sample different from the validation sample, Chan
(2004) then showed that tolerance for contradiction positively predicted job performance among
individuals with high practical intelligence but negatively predicted job performance among those
with low practical intelligence.

CURRENT CONCERNS AND EMERGING ISSUES


Similar to the study of values, there are basic conceptualization and measurement issues that need
to be adequately addressed for the study of cognitive styles to contribute to the science and practice
of employee selection. The following are some major concerns and emerging issues.

1. Unlike the case for values, there is no widely used or commonly accepted framework or taxonomy of cognitive styles. Sternberg and Grigorenko (1997) classified some
of the styles available in the literature into three broad categories: cognition-centered,
personality-centered, and activity-centered. However, this classification is not very useful
for various reasons. First, only a few examples are given in each category. Second, several
of the constructs are very closely related conceptually and may even be identical. For
example, it is unclear if cognitive complexity, compartmentalization, conceptual differen-
tiation, and conceptual integration are four distinct styles or if some of these are simply
different labels for the same construct. Third, the cognition-centered category includes
some cognitive styles that are clearly cognitive abilities and others that more closely fit
the conceptualization of cognitive styles. Fourth, the only two examples [the MBTI and
Gregorc’s (1985) Energic Model] given in the personality-centered category are models or
typologies in which individuals are classified into composite types simply obtained from
a combination of several factors that appear to include cognitive styles, personality traits,
and other types of individual difference constructs. Fifth, the activity-centered category,
which consists of learning and teaching styles, is simply a description of the learning or
teaching contexts in which various types of cognitive styles, personality traits, and motiva-
tional constructs may be applicable. An adequate taxonomy or typology of cognitive styles
is needed to organize the extant style constructs and measures; reduce the proliferation of
different construct labels, which in fact represent the same construct; provide meaningful
comparisons of results across studies; and aid the meta-analysis of cognitive styles.
2. Although cognitive styles are conceptually distinct from cognitive abilities and personality
traits, the literature on cognitive styles contains numerous conceptualizations and mea-
sures of styles that are highly related to or even indistinguishable from cognitive abilities or
personality traits. On the other hand, there are examples of cognitive styles with empirical
evidence suggesting that they are distinct from cognitive ability and personality traits [e.g.,
Chan’s (2004) tolerance for contradiction style; Harvey et al.’s (1961) abstract-concrete
preference; Kirton’s (1976) adaption-innovation style]. When studying a cognitive style in
the context of employee selection, it is important to provide clear theoretical arguments and
empirical evidence for the cognitive style vis-à-vis the traditional predictor space contain-
ing cognitive ability and personality traits (an adequate taxonomy of cognitive styles will
provide a useful conceptual basis). When carefully studied, cognitive styles could provide
important contributions in terms of incremental validity or interaction effects involving
other individual difference constructs or situational variables (e.g., Chan, 1996, 2004).
3. Given the basic definition that cognitive styles are not inherently adaptive or maladaptive,
it is important to validate new cognitive style constructs by identifying and showing, in
theory-driven ways, the boundary conditions under which the cognitive style is adaptive
and those under which it is maladaptive.
4. Cognitive style constructs are often conceptualized, and probably correctly so, as continu-
ous variables. However, many studies measure and analyze cognitive styles as categorical
variables in which individuals are classified into discrete types. This is not merely an issue
of loss of statistical power to detect an effect due to artificial categorization of a continu-
ous variable. It concerns mismatch in theory, measurement, and analysis, which are likely
to lead to erroneous substantive inferences. For example, dichotomizing the abstract-con-
crete style continuum into the abstract type or concrete type (hence ignoring the degree of
abstraction) makes it impossible to conceptualize and empirically test the hypothesis that
degree of abstraction is curvilinearly related to a criterion variable of interest, such as task
performance.

MOTIVATIONAL CONSTRUCTS
Motivation is often defined in terms of three features: it directs behavior (i.e., it is goal-oriented), it energizes behavior (i.e., activation and activity), and it sustains behavior (i.e., effort and persistence). Clearly, motivation is necessary for accomplish-
ing many tasks. Many researchers would agree with the conceptualization of job performance as a
function of ability and motivation (e.g., Campbell & Pritchard, 1976). Yet, in terms of the nonability
predictor construct space, the past 3 decades of employee selection research have largely focused
on personality traits rather than motivational constructs, such as trait goal orientations and need for
achievement. Some personality traits (e.g., conscientiousness) are more easily construed as motiva-
tional constructs than others (e.g., extraversion and neuroticism). Personality may thus overlap with motivation, but even if we assume that personality is a subset of motivation (and I suspect not many of us would make this assumption), there remains a large part of the motivational construct
space that is not captured by personality traits.
Although motivational constructs may be captured in selection methods such as interviews,
accomplishment records, biodata measures, and situational judgment tests, we must not confound
these methods with constructs (see Chan & Schmitt, 2005). These selection methods may be used
to assess a wide range of constructs including cognitive ability, personality traits, and motivational
constructs. Employee selection research has focused heavily on cognitive ability and personality constructs
but paid relatively little explicit attention to motivational constructs, although some motivational
constructs may in fact be assessed together with ability and personality in the variety of selection
methods used. The purpose of this section is to highlight the fact that many established motivational
constructs are available in the literature and that they deserve more attention from employee selection researchers than they currently receive.

WHY STUDY MOTIVATIONAL CONSTRUCTS?


Research on motivational constructs is easily justified by the assumption that motivation is neces-
sary for job performance and the fact that the motivational construct space may overlap with, but is certainly not exhausted by, personality constructs. In addition, values and cognitive styles, as defined
and illustrated in this chapter, do not appear to possess all three features of motivation. Specifically,
most value and cognitive style constructs need not be goal-directed, activation- or activity-oriented, and effortful. Motivation should be critical in learning and skill acquisition and
therefore should predict work-relevant outcomes associated with newcomer adaptation and training.
Motivation is also central in the conceptual definition of typical performance, in which the basis is
the “will do” aspect of performance. Finally, motivation is clearly the central conceptual feature in
work-relevant criterion outcomes such as organizational commitment, withdrawal behaviors, and
turnover.
In short, the study of motivational constructs is important because some of these constructs are
likely to provide incremental prediction for important work-relevant criteria over the predictability
provided by cognitive ability, personality traits, values, and cognitive style constructs.

EXAMPLES OF MOTIVATIONAL CONSTRUCTS


Instead of attempting a review of the numerous motivational constructs in the literature, which
is beyond the scope of this chapter, this section will briefly describe three types of motivational
constructs: trait goal orientations, achievement motivations, and interests. The three types are
clearly nonexhaustive—the purpose is to illustrate how the study of motivational constructs may
contribute to employee selection in various ways.

Trait Goal Orientations


The motivational construct of trait goal orientation originated from Dweck and her colleagues
(Dweck, 1986; Dweck & Leggett, 1988; Elliott & Dweck, 1988). These authors proposed a theory of
motivation that posited that individuals exhibit different response patterns according to stable dif-
ferences in their goal orientations. Two types of goals are distinguished—learning goals and perfor-
mance goals. Individuals who are high in learning goal orientation are motivated to learn something
new or increase their competence in a domain. They exhibit a “mastery-oriented” response pattern
characterized by seeking challenging tasks, treating their performance errors as useful feedback,
and persisting to arrive at solutions in the face of repeated failures and difficult task conditions.
Individuals who are high in performance goal orientation are motivated to seek favorable or avoid
unfavorable evaluations of their performance or competence. They tend to attribute performance
errors and failures to low competence and hence avoid challenges or difficult situations that are
“error-prone.”
The bulk of the research on goal orientation is found in the educational literature. In the 1990s,
several researchers noted that goal orientation is potentially useful in organizational research, including
studies on design and implementation of training programs, performance appraisal systems, and
task performance in general (e.g., Bobko & Colella, 1994; Farr, Hofmann, & Ringenbach, 1993).
Consequently, there has been strong interest in applying goal orientation in several areas within the
employee selection and organizational behavior domains (e.g., Van de Walle, Brown, Cron, & Slocum,
1999). Because of the potential applied value of goal orientation in organizational research, it is likely
that the interest in trait goal orientations will continue.
Fundamental issues of construct validation need to be better addressed to guide substantive
studies of goal orientation in organizational settings. The works of Dweck and colleagues appear
to treat goal orientation as a single bipolar continuum with learning goal orientation at one end and
performance goal orientation at the other. However, subsequent researchers have argued that learn-
ing goal and performance goal orientation are distinct factors. Button, Mathieu, and Zajac (1996)
reviewed the conceptualizations of goal orientation and argued for an uncorrelated two-factor
model in which learning goal and performance goal orientations are distinct and independent.
Although there is agreement with the conceptualization of learning goal orientation (LGO),
previous research has not distinguished or paid sufficient attention to two important, distinct,
and relatively independent dimensions of performance goal orientation. As noted by Elliot and
Harackiewicz (1996) and Van de Walle (1997), goal orientations can be conceptualized as a three-
factor model because performance goal orientation can be construed (and assessed) in terms of
either an avoid performance goal orientation (APGO) or a prove performance goal orientation
(PPGO). Individuals high on APGO strive to avoid unfavorable judgments about their ability.
Given this conceptualization, APGO individuals are less likely to be high on LGO because they
tend to perceive error-prone and difficult situations as threats that expose them to negative evaluation rather than as learning opportunities for increasing job performance. Individuals high on
PPGO strive to gain favorable judgments by demonstrating their ability and competence to others
through their performance. Unlike APGO, which is conceptualized as negatively associated with
LGO, PPGO is conceptually independent of LGO.
Previous research has produced mixed findings on the association between LGO and performance
goal orientation, with studies reporting zero, positive, and negative correlations (see Button et al.,
1996). The failure to distinguish the notion of performance goal orientation into its two relatively
independent dimensions (APGO vs. PPGO) may be one reason for the apparently mixed findings in
previous research. Conceptually, we would expect LGO to be negatively and substantially related
with APGO but unrelated with PPGO. Given this differential pattern of associations across the two
performance goal orientations, the “mixed findings” in research may not be surprising because the
magnitude and direction of the correlation between LGO and performance goal orientation would
be dependent on the relative extent to which the performance goal orientation measure was loaded
with APGO and PPGO. Because previous performance goal orientation items were not designed to
assess two independent dimensions, some of the items are likely to be bi- or multidimensional rather
than pure markers of APGO or PPGO.

Achievement Motivations
The most well-known construct of achievement motivation is McClelland’s (1961) Need for
Achievement. Individuals with high need for achievement have a strong desire for significant accom-
plishments. They tend to be approach-oriented, and they work harder and expend substantial effort in striving to achieve success. In addition, they tend to be moderate risk takers and select tasks with an intermediate level of difficulty so that they have more than a 50% chance of achieving success
(McClelland, 1985). According to McClelland, individuals high in need for achievement have a
greater need to achieve success and, conversely, to avoid failure. That is, individuals high in need for achievement tend also to have a high fear of failure and therefore tend to be avoidance-oriented when it
comes to tasks with high risks of failure.
McClelland's conceptualization of need for achievement dominated the study of motivational constructs from the 1960s to the 1980s. Since the 1980s, the concept of need for achievement has evolved in
important ways with regard to the way the concept of achievement is construed. A major advance-
ment came from researchers in cross-cultural social psychology. These researchers distinguish
between the individualistic notion of achievement, which is based on an independent view of the
self as originally conceived by McClelland, and a different notion of achievement that is based on
an interdependent view of the self and more characteristic of individuals from collectivistic cultures
(e.g., East Asian) in which group harmony, interconnectedness, and social relationships are empha-
sized (e.g., Markus & Kitayama, 1991). Cultural models of self and need for achievement provide
important conceptual bases for addressing challenging cross-cultural issues of construct equiva-
lence, measurement invariance of responses to measures, and comparisons of criterion-related
validity involving achievement motivational constructs and achievement-related criterion contexts.
Advances in these areas will directly contribute to the employee selection research on issues related
to staffing cross-cultural teams and expatriate assignment. Another major advancement in the con-
strual of need for achievement is the distinction of different achievement domains in terms of the
type of goal striving. Trait goal orientation, as described above, is essentially a multidimensional
view of need for achievement according to the type of goals that one is striving to achieve.

Interests
Interest measures have been used more frequently in vocational guidance situations than in
employee selection, but the goal of selection, most broadly, is to find a person who has the charac-
teristics that best fit those of the job position, organization, or occupation. Interests in certain types of work or careers certainly could be one type of these characteristics, and they are therefore relevant
to employee selection. The primary reason for considering and measuring interests in employee
selection lies in the assumption that a person will be happiest and most productive when he or she
is working in a job or occupation in which he or she is interested (Schmitt & Chan, 1998). Dawis
(1991) summarized research indicating that interest and personality measures are only weakly correlated. Although there are no reviews examining the correlations between interests and values or cognitive styles, the notion of interests is conceptually distinct from values and cognitive
styles. Interests are scaled in terms of liking, whereas values are scaled in terms of importance and
cognitive styles are scaled in terms of preference in information processing. Interests may be con-
strued as primarily motivational constructs insofar as interests tend to have the three motivational
features; namely, goal-orientation, activation and activity, and effort.
Holland (1985) focused on the similarity between an individual’s interests and the degree to
which an environment provides for engagement in activities of interest to the individual. According
to Holland’s (1985) framework, which is the most well-known taxonomy of interests, individuals and
environments could be characterized along six major dimensions: social, enterprising, conventional,
realistic, investigative, and artistic. For example, high scorers on the realistic dimension are usually
interested in dealing with concrete things and relatively structured tasks, and they are attracted to such occupations as engineer, farmer, and carpenter. Individuals who score high on the social dimension are interested in working with and helping others, and they are attracted to such occupations as teacher, social worker, flight attendant, and mental health worker.
According to Holland, the interest patterns are organized in a hexagonal structure.
Interest areas next to an area of primary interest are also likely to be of interest to an individual,
whereas those interests opposite to a primary area on the hexagon are unlikely to be of much
interest. Holland’s structure of interests, measured by The Strong Vocational Interest Blank, has
received considerable corroborative support (see Tracey & Rounds, 1993).
Holland's framework for the structure and understanding of interests dominates the field of
counseling and vocational guidance, and it has potential for employee selection, although its
direct use is surprisingly limited. However, some of Holland’s interest dimensions are probably
captured in biodata measures.

CURRENT CONCERNS AND EMERGING ISSUES


The following are some areas of concern; addressing these issues would contribute to the study
of motivational constructs in employee selection research.

1. With the emergence of new motivational constructs, basic construct validation efforts are
necessary. Specifically, clarifying the dimensionality of a motivational construct is critical
because it affects our theorizing and directs our hypothesis formulation and our interpreta-
tion of findings regarding the motivational construct. Consider the research on trait goal
orientations. If a three-factor model is correct, then future meta-analytic studies have to
take into account the type of performance goal orientation being assessed when coding
each primary study. The research on dimensionality of trait goal orientations also high-
lights the importance of explicating the role of goals in a motivational construct, including
the content and structure of goals and the goal striving process.
2. An important issue in the conceptualization and hence measurement of motivational
constructs concerns the level of specificity. Although the appropriateness of the level of
specificity of a motivational construct is likely to be dependent on the particular research
question or practical use, we need to ensure conceptual clarity as we move up or down the
ladder of specificity. For example, when a motivational construct is conceptualized at a
very general level, it is likely to be multidimensional and made up of multiple constructs
that may be motivational or nonmotivational constructs. This is best illustrated in the study
of interests. Although the concept of interest has the elements of motivational constructs,
the interest dimensions in Holland’s structure are descriptive categories of individuals or
environments rather than unitary individual difference motivational constructs. In fact,
each interest dimension probably reflects multiple personality traits and cognitive styles,
in addition to motivational constructs. For example, the artistic dimension describes a
category of individuals who are likely to also score high on personality traits such as
openness to experience and cognitive style constructs such as preference for abstraction.
In addition, individuals’ knowledge and skills (e.g., artistic “talent”) as well as their
education, opportunities, and experiences are likely to shape their interests. In short,
interests are probably better understood as composite descriptions of individuals or environments that reflect not only motivational constructs but also a variety of knowledge, skills, abilities, and other characteristics (KSAOs), such as personality traits
and cognitive styles.
3. Motivation is a process. Hence, to understand how motivational constructs affect behav-
iors, we may require conceptualizations of motivational constructs that are more dynamic
than the static conceptualizations that are typical of personality traits, values, and cog-
nitive styles. To begin, studies on motivational constructs need to relate the individual
differences in motivation to the larger literature on motivation, particularly the literature
on theoretical models of work motivation (for review, see Mitchell & Daniels, 2003). A
theoretical model of work motivation specifies the motivational processes or mechanisms
by which motivated individuals select specific goals and pursue them through allocating
effort, monitoring progress, and responding to obstacles and feedback. In each of the estab-
lished models in the work motivation literature, what is the role of individual differences in
motivational constructs? Specifically, where in the work motivational model do we locate
the motivational construct(s)? A theory-driven framework for including motivational
constructs in employee selection would require us to specify the appropriate direct effects
and interaction effects linking motivational constructs and the focal variables in the par-
ticular work motivation model.
4. As illustrated in the above discussion on cultural models of need for achievement, studies
on motivational constructs need to be sensitive to cultural differences in the conceptual
definition of the motivational construct. Even if construct equivalence exists across cul-
tures, culture effects may operate in other ways. For example, it is possible that culture
may moderate the relationship between a motivational construct and a criterion variable.
Consider the motivational construct of APGO, which has almost always been construed
and empirically demonstrated to be negatively associated with job performance in western
samples. It may be possible that in cultures (or task settings) in which there is low toler-
ance for performance errors and high emphasis on speed and accuracy, individuals high on
APGO may not necessarily be rated as poorer performers than those low on APGO, and
they may even be rated as better performers.
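
As a minimal sketch of the kind of dimensionality test called for in point 1 (and referenced there), competing factor structures for a set of goal orientation items can be specified as nested confirmatory factor analysis models and compared formally. The two-versus-three-factor contrast and the labels below are illustrative assumptions, not a prescription tied to any particular inventory.

```latex
% Model A (two factors): learning goal orientation and a single performance
%   goal orientation factor.
% Model B (three factors): learning, prove-performance, and avoid-performance
%   goal orientations as separate, correlated factors.
% Model A can be obtained from Model B by constraining the correlation between
% the two performance factors to unity, so the models are nested and can be
% compared with a chi-square difference test:
\[
  \Delta\chi^{2} = \chi^{2}_{A} - \chi^{2}_{B},
  \qquad
  \Delta df = df_{A} - df_{B}
\]
```

A reliable chi-square difference, together with better descriptive fit for Model B, would support the three-factor representation and, by implication, separate meta-analytic coding of the two performance goal orientations.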

PRACTICAL CONSIDERATIONS AND FUTURE RESEARCH CHALLENGES


In this chapter, I have discussed the basic conceptualizations of values, cognitive styles, and moti-
vational constructs. Using various specific examples in each of these types of constructs as illus-
trations, I have raised several concerns and issues with regard to fundamental conceptualization
and measurement issues that need to be addressed as we incorporate these constructs in employee
selection. There are some commonalities in the critical issues associated with the study of each of
the three types of constructs that will impact employee selection. In this final section of the chapter,
I will discuss several practical considerations in the use of these constructs in employee selection
and propose a strategic agenda for future research directions.

PRACTICAL CONSIDERATIONS IN EMPLOYEE SELECTION


The following four types of practical considerations in the use of values, cognitive styles, and
motivational constructs in employee selection will be considered: legal and social issues, subgroup
differences, cultural differences, and problems with self-report data.

1. Legal and social issues: We need to consider the legal and social constraints when
recommending the use of individual difference measures of values, cognitive styles, or
motivations for the purpose of making employee selection decisions. Virtually all of the
legal and social issues involving the use of cognitive ability and personality tests (see
Chapters 29 and 30, this volume) are applicable to the use of values, cognitive styles,
and motivational measures, although the importance of each issue is dependent on
the specific measure and situation of use. Examples of these issues include the legal
determination of job-relevance, which may or may not overlap with psychometric validity;
allegations of discriminatory hiring practices; affirmative action and equal employment
opportunities; applicant reactions; and the distinction between psychometric test bias
and nonpsychometric fairness perceptions (for review, see Schmitt & Chan, 1998). In
practice, legal and social issues are often closely related, as evident in the issue of adverse
impact. In addition, the extent to which it is appropriate or acceptable (whether legally
or socially) to assess a construct for employee selection decisions may be tied to the
selection procedures used and the extent to which the construct is explicitly assessed.
For example, values may be assessed in some biodata items and interviewers’ assessment
of applicants’ values is probably captured, although mostly not in an explicit manner, in
the interview scores. The measurement of values as a component of biodata or interview
scores may not attract as much legal or social attention as the use of an inventory designed
specifically to measure values. The last two decades of employee selection research have
focused much attention on applicant reactions, including their importance and the various
ways to engender favorable reactions. When adequately developed, measures of values,
cognitive styles, and motivational constructs can lead to positive applicant reactions (see
Chan & Schmitt, 2004; Schmitt & Chan, 1997).
2. Subgroup differences: A practical problem faced by many organizations is the use of
selection tests (particularly cognitive ability tests) that are valid predictors of job per-
formance for majority and minority applicants but show large subgroup differences in
mean test scores favoring the majority subgroup. This situation leads to a conflict between
the organization’s need to use a valid test and the goal to hire a diverse workforce for
legal and social reasons. In general, measures of values, cognitive styles, and motiva-
tional constructs are probably more similar to personality tests than cognitive ability
tests in that there is no evidence of substantial subgroup differences between majority
and minority applicants. However, this may not be true of some specific measures even if
the measures do not assess cognitive ability constructs. For example, it has been argued
and there is some empirical evidence showing that Black Americans, as compared to
White Americans, tend to perform better on a test that is loaded with socially interactive
and visual information than on a test loaded with written and verbal information (Chan
& Schmitt, 1997). Hence, using a cognitive style measure to assess the preference for
processing visual versus verbal information is likely to result in a Black-White subgroup
difference in test scores, leading to adverse impact problems in employee selection. But
in general, adding measures of values, cognitive styles, and motivational constructs to
cognitive ability tests is likely to reduce the subgroup difference in the composite test scores
and hence adverse impact (a numerical sketch of this composite effect is given after this
list). Including these nonability measures also increases criterion-related validity to the
extent that the criterion space is expanded from the narrow focus on ability-based maximum
and technical job performance to nonability-based typical and contextual job performance.
3. Cultural differences: Issues of possible cultural differences in test validity (in terms of
content, criterion-related, and construct validity evidence) need to be considered whenever
we use a selection measure in a culture different from the culture in which the measure is
developed and validated. Discussions on methodological issues in cross-cultural measure-
ment, such as response sets and measurement invariance, are readily available in the lit-
erature and will not be repeated here (for a recent review, see Chan, 2008a). Note, however,
that cultural differences may affect the conceptualization and measurement of constructs
in substantive ways that go beyond the technical issues of cross-cultural measurement. This
is particularly relevant to values and motivational constructs given that cultures may differ
qualitatively in their conceptualizations of certain values (e.g., freedom, happiness) and
motivations (e.g., need for achievement).
4. Problems with self-report data: In the assessment of values, cognitive styles, and
motivational constructs, the large majority of the measures used are in self-report
format. Similar to the use of personality inventories, issues related to the validity
problems of self-report data are relevant when self-report measures of these three types
of constructs are used in employee selection, especially given the high stakes involved
in actual employee selection contexts. Some values (e.g., honesty) and motivational
constructs (e.g., LGO), given the evaluative nature of their content, may be particularly
susceptible to social desirability responding problems. In general, cognitive styles
are probably less likely than values and motivational constructs to suffer from social
desirability responding given the nonevaluative nature of cognitive style items. Finally,
although self-report data problems do occur in the measurement of values, cognitive
styles, and motivational constructs, many of the purported problems are often overstated
(see Chan, 2008b).
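
To make the arithmetic behind the composite-score point in consideration 2 concrete, the following minimal sketch applies the standard formula for the standardized subgroup difference of a weighted composite (the weighted sum of the predictor ds divided by the standard deviation of the composite, assuming standardized predictors and a common covariance structure across groups). All numbers, the two predictor ds, their intercorrelation, and the unit weights, are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical standardized subgroup differences (d) for two predictors
# and their intercorrelation; none of these values are empirical estimates.
d = np.array([1.00, 0.20])      # cognitive ability test, noncognitive measure
R = np.array([[1.0, 0.2],
              [0.2, 1.0]])      # predictor intercorrelation matrix
w = np.array([1.0, 1.0])        # unit weights for the composite

# Standardized subgroup difference of the weighted composite: w'd / sqrt(w'Rw)
d_composite = (w @ d) / np.sqrt(w @ R @ w)
print(round(float(d_composite), 2))   # prints 0.77
```

The composite difference (about .77 here) is smaller than that of the cognitive test alone, but it is not simply the average of the two predictor ds; the amount of reduction depends on the predictor intercorrelation and the weights, which is why adding noncognitive measures tends to shrink rather than eliminate adverse impact.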

STRATEGIC AGENDA FOR FUTURE RESEARCH DIRECTIONS


On the basis of the above discussions on values, cognitive styles, and motivational constructs,
I propose the following strategic agenda for future research directions.

1. Dimensionality: Construct validation efforts that specify and test the dimensionality of a
construct are fundamental when examining a value, cognitive style, or motivational con-
struct. Specifically, it is important to determine if the construct of interest under study is
a single, “pure” factor or a composite construct consisting of multiple factors. Composite
constructs are particularly difficult to deal with. First, we will need to identify the number
and nature of the various factors. Second, we will need to establish the different contribu-
tions of the various factors to the composite construct. Third, failing to accurately identify
the number, nature, and weights of the factors making up the composite construct will
result in substantive inferential errors about values, cognitive styles, or motivational con-
structs. For example, a purportedly motivational construct may in fact be a composite label
reflecting not only multiple motivational constructs but also various nonmotivational con-
structs such as knowledge, skills, abilities, personality traits, values, and cognitive styles.
2. Level of specificity: Closely related to the issue of dimensionality and composite constructs
is the issue of level of specificity of a construct. Depending on the particular research
question, researchers need to ensure that the level of specificity of the value, cognitive style,
or motivational construct is appropriate, and this requires clear conceptual definitions of the
constructs and appropriate matching between predictor and criterion constructs. Broader
constructs (e.g., individualistic vs. collectivistic values, need for achievement) may be more
useful for obtaining better prediction of a general criterion (e.g., organizational commitment,
overall job performance) in a parsimonious and generalizable manner. More narrowly
defined constructs may be more useful for increasing understanding of the criterion space
and the predictor-criterion relationships, including possible mediating mechanisms (e.g.,
linking specific trait goal orientations to specific dimensions of job performance). Note
that the issue here is not about any inherently optimal level of specificity of the construct.
Any general statement on the relative value of broad versus narrowly defined constructs
is unlikely to be useful because it is the clarity of conceptual definition of constructs and
appropriate matching between the predictor and criterion spaces that will lead to higher
validities and better explanations.
3. Adaptive value: Studies on the three types of constructs, particularly with respect to their
use in employee selection, need to explicate and test the adaptive value of the construct.
As noted in this chapter, cognitive styles are not inherently adaptive or maladaptive in an
absolute and context-free sense. Consider a measure that was designed to assess a cogni-
tive style. Let us suppose high scorers on this measure perform better in tasks across
many domains and it is very difficult or impossible to conceive of two different situa-
tions in which the high scorers are adaptive in one and maladaptive in the other. In this
scenario, the measure is likely to be assessing cognitive abilities or some other construct
that is inherently adaptive rather than a cognitive style. On the other hand, motivational
constructs are inherently adaptive in nature in that they should correlate positively rather
than negatively with a criterion in which higher scores represent higher adaptive value. It
is difficult to think of generalizable situations in which higher motivated individuals, as
compared to lower motivated individuals, will experience less positive or more negative
consequences. Whether a value construct is adaptive or maladaptive depends
on the nature of the construct. Values with higher evaluative content such as honesty and
fairness are likely to be adaptive in many situations, whereas those with lower evaluative
content such as individualism-collectivism may be adaptive or maladaptive depending
on the nature of the situational demands. In addressing the adaptive value of a predictor
construct, it is important to examine possible nonlinear relationships linking the predictor
construct and the criterion construct. For example, in certain work situations, collectivism
(a value construct) or need for achievement (a motivational construct) may be related to a
job performance construct by an inverted-U function rather than a linear association. The
specific functional form of the predictor-criterion relationship has clear implications for
employee selection. If the function is an inverted U, then individuals with moderate scores
on the value or motivational construct are more likely to be better performers on the job
than those with low or high scores (see the regression sketch following this list).
4. Person-environment fit: Another challenging and important future research direction is to
study individual differences in values, cognitive styles, and motivational constructs in the
context of person-environment fit. For example, Chan (1996) showed that a misfit between
the individual’s cognitive style and the commensurate style demands of the work environ-
ment predicted actual turnover beyond the predictability provided by job performance.
Such findings have potential practical implications for employee selection for various work
environments. Similar fit studies could be conducted for various value, cognitive style,
and motivational constructs by carefully mapping the individual difference construct to
the environmental construct. One promising area is to study the effects of fit between
trait goal orientations and the goal orientation demands of the work environment. Clearly,
studies of person-environment fit will require construct-oriented approaches that explicate
dimensionality and predictor-criterion relationships (Chan, 2005b). Fit is generally con-
strued as adaptive, whereas misfit is construed as maladaptive. However, advancements in
fit research are likely to occur if we can show when and how fit may have negative effects,
as well as when and how misfit may have positive effects. Examples of possible negative
effects of fit include homogeneity of individuals in an environment leading to groupthink
and cognitive style fit between individuals and the culture leading to failure to consider
alternatives. Examples of possible positive effects of misfit include diversity of individuals
leading to new ideas and value misfit leading to whistle blowing.
5. Interconstruct relationships: Any individual difference construct cannot be considered in
isolation. Future research should examine interconstruct relationships within and across
the three types of constructs. There are at least two ways to examine interconstruct rela-
tionships. The first way is to examine the incremental validity of one construct over another
in predicting a criterion. For example, Payne, Youngcourt, and Beaubien (2007) examined
trait goal orientations and found that these motivational constructs predicted job perfor-
mance above and beyond the prediction provided by cognitive ability and personality. It
is practically important to examine if a value, cognitive style, or motivational construct
offers any incremental validity in the prediction of job performance or other work-relevant
criteria over the predictability provided by the traditional predictor constructs, such as
cognitive ability and personality traits. The second way is to examine trait-trait interaction
effects on work-relevant criteria. For example, Chan (2004) found a disordinal interaction
effect between a cognitive style construct (tolerance for contradiction) and practical intel-
ligence such that the cognitive style positively predicted job performance among individu-
als high on practical intelligence but negatively predicted job performance among those
low on practical intelligence. Studies on trait-trait interactions are important because they
clarify and validate the nature of the individual difference constructs and identify the
boundary conditions for their criterion-related validities and adaptive effects.
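
As a minimal sketch of how the nonlinear relationships raised in point 3 and the incremental validity and trait-trait interaction questions raised in point 5 can be examined jointly, a moderated hierarchical regression along the following lines could be used. The data are simulated, and all variable names, effect sizes, and the choice of a quadratic term are hypothetical and purely illustrative.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500

# Simulated, standardized predictor scores (all hypothetical)
g = rng.normal(size=n)        # cognitive ability
consc = rng.normal(size=n)    # conscientiousness
nach = rng.normal(size=n)     # need for achievement (motivational construct)
style = rng.normal(size=n)    # a cognitive style (e.g., tolerance for contradiction)
pi = rng.normal(size=n)       # practical intelligence

# Simulated criterion with a built-in inverted-U effect of need for achievement
# and a style x practical intelligence interaction (illustration only)
perf = (0.40 * g + 0.20 * consc - 0.20 * nach**2 + 0.20 * style * pi
        + rng.normal(size=n))

# Step 1: traditional predictors only
X1 = sm.add_constant(np.column_stack([g, consc]))
step1 = sm.OLS(perf, X1).fit()

# Step 2: add the motivational construct, its quadratic term, the cognitive
# style, practical intelligence, and the style x practical intelligence product
X2 = sm.add_constant(np.column_stack([g, consc, nach, nach**2, style, pi, style * pi]))
step2 = sm.OLS(perf, X2).fit()

print(f"Incremental R^2 = {step2.rsquared - step1.rsquared:.3f}")
print(step2.params)  # a negative weight on the nach**2 term is consistent with an inverted U
```

The change in R-squared from Step 1 to Step 2 indexes incremental validity over the traditional predictors, a reliably negative weight on the squared term is consistent with an inverted-U relationship, and the sign and size of the product term speak to the kind of disordinal interaction described above.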

EPILOGUE
Given the modest amount of criterion variance typically accounted for in employee selection, it is
understandable that researchers and practitioners seek to expand the predictor construct space by
going beyond cognitive abilities and personality traits to include values, cognitive styles, and moti-
vational constructs. I have provided an overview of the nature of these three types of constructs, their
potential usefulness, issues relating to conceptualization and measurement, practical considerations
to take into account in the use of these constructs for employee selection, and a strategic agenda
for future research directions. It is hoped that this chapter will provide an effective springboard for
fruitful construct-oriented research on values, cognitive styles, and motivational constructs.

REFERENCES
Bobko, P., & Colella, A. (1994). Employee reactions to performance standards: A review and research proposi-
tions. Personnel Psychology, 47, 1–29.
Button, S. B., Mathieu, J. E., & Zajac, D. M. (1996). Goal orientation in organizational research: A conceptual
and empirical foundation. Organizational Behavior and Human Decision Processes, 67, 26–48.
Campbell, J. P., & Pritchard, R. D. (1976). Motivation theory in industrial and organizational psychology. In M.
D. Dunnette (Ed.), Handbook of industrial and organizational psychology (1st ed., pp. 63–130). Chicago,
IL: Rand McNally.
Chan, D. (1998a). The conceptualization and analysis of change over time: An integrative approach incorporat-
ing longitudinal means and covariance structures analysis (LMACS) and multiple indicator latent growth
modeling (MLGM). Organizational Research Methods, 1, 421–483.
Chan, D. (1998b). Functional relations among constructs in the same content domain at different levels of
analysis: A typology of composition models. Journal of Applied Psychology, 83, 234–246.
Chan, D. (2002). Longitudinal modeling. In S. Rogelberg (Ed.), Handbook of research methods in industrial
and organizational psychology (pp. 412–430). Malden, MA: Blackwell.
Chan, D. (2004). Individual differences in tolerance for contradiction. Human Performance, 17, 297–325.
Chan, D. (2005a). Multilevel research. In F. T. L. Leong & J. T. Austin (Eds.), The psychology research hand-
book (2nd ed., pp. 401–418). Thousand Oaks, CA: Sage.
Chan, D. (2005b). Current directions in employee selection. Current Directions in Psychological Science, 14,
220–223.
Chan, D. (2008a). Methodological issues in international Human Resource Management. In M. M. Harris
(Ed.), Handbook of research in international human resources management, (pp. 53–76). Mahwah, NJ:
Lawrence Erlbaum.
Chan, D. (2008b). So why ask me?—Are self-report data really that bad? In C. E. Lance & R. J. Vandenberg
(Eds.), Statistical and methodological myths and urban legends: Received doctrine, verity, and fable in
the organizational and social sciences (pp. 309–336). Hillsdale, NJ: Lawrence Erlbaum.
Chan, D., & Schmitt, N. (1997). Video-based versus paper-and-pencil method of assessment in situational judg-
ment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied
Psychology, 82, 143–159.
Chan, D., & Schmitt, N. (2004). An agenda for future research on applicant reactions to selection procedures:
A construct-oriented approach. International Journal of Selection and Assessment, 12, 9–23.
Chan, D., & Schmitt, N. (2005). Situational judgment tests. In A. Evers, O. Smit-Voskuijl, & N. Anderson
(Eds.), Handbook of personnel selection (pp. 219–242). Oxford, England: Blackwell.
Dawis, R. V. (1991). Vocational interests, values, and preferences. In M. D. Dunnette & L. M. Hough (Eds.),
Handbook of industrial and organizational psychology (Vol. 2, pp. 833–872). Palo Alto, CA: Consulting
Psychologists Press.
Druckman, D., & Bjork, R. A. (1991). In the mind’s eye. Washington, DC: National Academy Press.
Druckman, D., & Bjork, R. A. (Eds.). (1994). Learning, remembering, believing: Enhancing individual and
team performance. Washington, DC: National Academy Press.
Dweck, C. S. (1986). Motivational processes affecting learning. American Psychologist, 41, 1040–1048.
Dweck, C. S., & Leggett, E. L. (1988). A social-cognitive approach to motivation and personality. Psychological
Review, 95, 256–273.
Elliot, A. J., & Harackiewicz, J. M. (1996). Approach and avoidance achievement goals and intrinsic motiva-
tion: A mediational analysis. Journal of Personality and Social Psychology, 70, 461–475.
Elliott, E. S., & Dweck, C. S. (1988). Goals: An approach to motivation and achievement. Journal of Personality
and Social Psychology, 54, 5–12.
England, G. W., & Lee, R. (1974). The relationship between managerial values and managerial success in the
United States, Japan, England, and Australia. Journal of Applied Psychology, 59, 411–419.
Farr, J. L., Hofmann, D. A., & Ringenbach, K. L. (1993). Goal orientation and action control theory: Implications
for industrial and organizational psychology. In C. L. Cooper & I. T. Robertson (Eds.), International
review of industrial and organizational psychology (Vol. 8, pp. 191–232). New York, NY: Wiley.
Gardner, R. W. (1953). Cognitive style in categorizing behavior. Perceptual and Motor Skills, 22, 214–233.
Gardner, R. W., & Schoen, R. A. (1962). Differentiation and abstraction in concept formation. Psychological
Monographs, 76 (41, Whole No. 560).
Harvey, O. J., Hunt, D. E., & Schroder, H. M. (1961). Conceptual systems and personality organization. New
York, NY: Wiley.
Holland, J. L. (1985). Making vocational choices: A theory of careers (2nd ed.). Englewood Cliffs, NJ: Prentice
Hall.
Jung, C. (1923). Psychological types. New York, NY: Harcourt Brace.
Kidron, A. (1978). Work values and organizational commitment. Academy of Management Journal, 21, 239–247.
Kirton, M. J. (1976). Adaptors and innovators: A description and measure. Journal of Applied Psychology, 61,
622–629.
Maio, G. R., Roese, N., Seligman, C., & Katz, A. (1996). Ratings, rankings, and the measurement of values:
Evidence for the superior validity of ratings. Basic and Applied Social Psychology, 18, 171–181.
Markus, H., & Kitayama, S. (1991). Culture and self: Implications for cognition, emotion, and motivation.
Psychological Review, 98, 224–253.
McClelland, D. C. (1961). The achieving society. Princeton, NJ: Van Nostrand.
McClelland, D. C. (1985). Human motivation. New York, NY: Scott, Foresman.
Meglino, B. M., Ravlin, E. C., & Adkins, C. L. (1989). A work values approach to corporate culture: A field
test of the value congruence process and its relationship to individual outcomes. Journal of Applied
Psychology, 74, 424–432.
Messick, S., & Kogan, N. (1963). Differentiation and compartmentalization in object-sorting measures of
categorizing style. Perceptual and Motor Skills, 16, 47–51.
Mitchell, T. R., & Daniels, D. (2003). Motivation. In W. C. Borman, D. R. Ilgen, & R. J. Klimoski (Eds.),
Handbook of psychology (Vol. 12, pp. 225–254). New York, NY: Wiley.
Myers, I. B., & McCaulley, M. H. (1985). Manual: A guide to the development and use of the Myers-Briggs
Type Indicator. Palo Alto, CA: Consulting Psychologists Press.
Payne, S. C., Youngcourt, S. S., & Beaubien, J. M. (2007). A meta-analytic examination of the goal orientation
nomological net. Journal of Applied Psychology, 92, 128–150.
Posner, B. Z., Kouzes, J. M., & Schmidt, W. H. (1985). Shared values make a difference: An empirical test of
corporate culture. Human Resource Management, 24, 293–309.
Ravlin, E. C., & Meglino, B. M. (1987). Effects of values on perception and decision making: A study of
alternative work values measures. Journal of Applied Psychology, 72, 666–673.
Rokeach, M. (1973). The nature of human values. New York, NY: Free Press.
Schwartz, S. H. (1992). Universals in the content and structure of values: Theoretical advances and empirical tests
in 20 countries. In M. P. Zanna (Ed.), Advances in Experimental Social Psychology (Vol. 25, pp. 1–65). San
Diego, CA: Academic Press.
Schwartz, S. H., & Bilsky, W. (1990). Toward a theory of the universal content and structure of values:
Extensions and cross-cultural replications. Journal of Personality and Social Psychology, 58, 878–891.
Schwartz, S. H., & Sagiv, L. (1995). Identifying culture-specifics in the content and structure of values. Journal
of Cross-Cultural Psychology, 26, 92–116.
Schmitt, N., & Chan, D. (1998). Employee selection: A theoretical approach. Thousand Oaks, CA: Sage.
Sternberg, R. J., & Grigorenko, E. L. (1997). Are cognitive styles still in style? American Psychologist, 52,
700–712.
Tracey, T. J., & Rounds, J. B. (1993). Evaluating Holland’s and Gati’s vocational interest models: A structural
meta-analysis. Psychological Bulletin, 113, 229–246.
Van de Walle, D. (1997). Development and validation of a work domain goal orientation instrument. Educational
and Psychological Measurement, 57, 995–1015.
Van de Walle, D., Brown, S. P., Cron, W. L., & Slocum, J. W., Jr. (1999). The influence of goal orientation and
self-regulation tactics on sales performance: A longitudinal field test. Journal of Applied Psychology, 84,
249–259.
Weiss, H. M. (1978). Social learning of work values in organizations. Journal of Applied Psychology, 63, 711–718.
Witkin, H. A., Dyke, R. B., Faterson, H. F., Goodenough, D. R., & Karp, S. A. (1962). Psychological differen-
tiation. New York, NY: Wiley.
Witkin, H. A., Oltman, P. K., Raskin, E., & Karp, S. A. (1971). Embedded figures test, children’s embedded
figures test, group embedded figures test: Manual. Palo Alto, CA: Consulting Psychologists Press.
16 Practical Intelligence,
Emotional Intelligence,
and Social Intelligence

Filip Lievens and David Chan

Over the years, practical intelligence, social intelligence, and especially emotional intelligence
have received substantial attention in the academic and practitioner literatures. However, at the
same time, these individual difference “constructs” have also fueled controversies and criticisms,
including their applications to employee selection. Without doubt, their definition, dimensionality,
and operationalization (measurement) have been questioned much more than those of the more
traditional or established constructs (i.e., cognitive ability, personality) covered in this section of
the book.
This chapter has two main objectives. The first objective is to review and clarify the conceptual-
ization and measurement of these three constructs (or categories of constructs). In doing so, we aim
to identify commonalities and differences among the three constructs. The second objective is to
advance research on practical, social, and emotional intelligence. We aim to achieve both objectives
by placing the three intelligence constructs in an integrative conceptual framework that relates them
to traditional individual difference constructs and critical criterion constructs. We end by proposing
five strategies for future research.

DEFINITIONS AND CONCEPTUALIZATIONS


In this section, we review how practical, emotional, and social intelligence have been conceptualized
and the research that attempted to empirically test these conceptualizations.

PRACTICAL INTELLIGENCE
Sternberg and colleagues introduced the construct of practical intelligence in the mid- to late-1980s
(Sternberg, 1988; Wagner & Sternberg, 1985). A common thread running through the various
definitions is that practical intelligence refers to the ability of an individual to deal with the
problems and situations of everyday life (Bowman, Markham, & Roberts, 2001). In lay terms, it
can be characterized as "intuition" or "common sense," and it is often described as being "street
smart," in contrast with the "book smart" label used to characterize traditional analytical or
academic intelligence.
A central element in practical intelligence is tacit knowledge. Sternberg, Wagner, Williams, and
Horvath (1995) defined tacit knowledge as “action-orientated knowledge, acquired without direct
help from others, that allows individuals to achieve goals they personally value” (p. 916). This
definition encompasses the key characteristics of tacit knowledge (see Hedlund et al., 2003). First,
tacit knowledge is difficult to articulate because it is not formalized in explicit procedures and rules.


Second, tacit knowledge is typically procedural knowledge, telling people how to act in various
situations. Third, individuals acquire tacit knowledge on the basis of their own everyday experience
related to a specific domain. Thus, tacit knowledge is not formally taught. Fourth, tacit knowledge
is practical because it enables individuals to obtain the goals that they value in life. These charac-
teristics exemplify the claim that practical intelligence and tacit knowledge are constructs that are
conceptually distinct from academic intelligence, technical job knowledge, or personality.
Research by Sternberg and colleagues as well as by others has found some support for, or at least
produced findings consistent with, some of these claims. First, tacit knowledge seems to increase
with experience. For example, business managers received higher tacit knowledge scores than busi-
ness graduate students, who in turn outperformed undergraduate students, although sample sizes
in these groups were often small (Wagner, 1987). Second, scores on tacit knowledge inventories
showed low correlations (below .20) with measures of fluid and crystallized intelligence (Legree,
Heffner, Psotka, Martin, & Medsker, 2003; Tan & Libby, 1997). Finally, Bowman et al. (2001)
reviewed research on tacit knowledge in organizational, educational, and military settings and con-
cluded that the assessment of tacit knowledge has certain promise for predicting performance in
these real-world environments, although the level of prediction does not reach the values obtained
with g (see also Van Rooy, Dilchert, Viswesvaran, & Ones, 2006).
Bowman et al. (2001) leveled various criticisms with respect to the construct of practical intelli-
gence. From a conceptual point of view, questions have been raised about whether practical intelligence
(tacit knowledge) exists at all as a single construct that is different from other types of intelligence,
job knowledge, and personality (see also Gottfredson, 2003; McDaniel & Whetzel, 2005). In
particular, McDaniel and Whetzel (2005) put various claims related to practical intelligence (tacit
knowledge) to the test. To this end, they used research related to situational judgment tests (SJTs),
a measurement method that is closely related to tacit knowledge inventories (see below). Consistent
with research by Sternberg and colleagues, McDaniel and Whetzel concluded that such tests predict
job performance and have incremental validity over more common selection procedures. However,
they argued that there was no support for the other claims. Specifically, they cited studies showing
that SJTs of practical intelligence were factorially complex and could not be represented by a general
factor in factor analytic studies, as well as studies showing that these test scores were significantly related to
scores on established constructs such as g, conscientiousness, emotional stability, and agreeableness.
Later in this chapter, we argue that such criticisms are right and wrong—they are right that practical
intelligence is not a unitary construct, but they are wrong to conclude that the factorially complex
results and significant correlations with established constructs imply that practical intelligence is
not a distinct and valid construct.

EMOTIONAL INTELLIGENCE
Since the mid-1990s, emotional intelligence has probably been the psychological construct that has
received the greatest attention in the practitioner and academic literatures. Generally, a distinction is
made between two conceptualizations of emotional intelligence; namely, an ability emotional intel-
ligence model and a trait emotional intelligence model (e.g., Matthews, Zeidner, & Roberts, 2007).
The first model conceptualizes emotional intelligence as an ability akin to cognitive ability
and measures it via performance-based tests. In this paradigm, emotional intelligence is viewed
as another legitimate type of intelligence. Hence, this model is also referred to as emotional cogni-
tive ability or information processing emotional intelligence. Emotional intelligence is then defined
as “the ability to monitor one’s own and others’ emotions, to discriminate among them, and to
use the information to guide one’s thinking and actions” (Salovey & Mayer, 1990, p. 189). This
definition shows that the higher order construct of emotional intelligence is broken down into four
branches. The first branch—emotional identification, perception, and expression—deals with the
ability to accurately perceive emotions in others’ verbal and nonverbal behavior. Emotional facilita-
tion of thought is the second branch, referring to the ability to use emotions to assist thinking and
problem-solving. Third, emotional understanding denotes the ability to analyze feelings, discrimi-
nate among emotions, and think about their outcomes. Finally, emotional management deals with
abilities related to maintaining or changing emotions.
The second model, the trait EQ model, views emotional intelligence as akin to personality and
assesses it via self-report. In this model, emotional intelligence is defined as “an array of non-
cognitive capabilities, competencies, and skills that influence one’s ability to succeed in coping
with environmental demands and pressures” (Bar-On, 1997, p. 16). As the name suggests, this
model uses a broad definition of emotional intelligence. Abilities such as emotion perception are
typically combined with noncognitive competencies, skills, and personality traits. For example,
one of the most popular mixed models (Bar-On, 1997) measures five broad factors and 15 facets:
(a) intrapersonal (self-regard, emotional self-awareness, assertiveness, independence, and self-ac-
tualization), (b) interpersonal (empathy, social responsibility, interpersonal relationship), (c) stress
management (stress tolerance and impulse control), (d) adaptability (reality testing, flexibility, and
problem solving), and (e) general mood (optimism and happiness). In the Goleman (1995) model,
a similar expanded definition of emotional intelligence is used, referring to emotional intelligence
as a set of learned competencies. Emotional intelligence competence is then defined as “an ability
to recognize, understand, and use emotional information about oneself or others that leads to or
causes effective or superior performance” (Boyatzis & Sala, 2004, p. 149). A distinction is further
made between five main competency clusters (with various subcompetencies): self-awareness, self-
regulation, motivation, empathy, and social skills. Given the trait-like nature of the mixed model,
some researchers have suggested using terms such as “trait emotional intelligence,” “emotional self-
efficacy" (Petrides & Furnham, 2003), or "emotional self-confidence" (Roberts, Schulze, Zeidner, &
Matthews, 2005).
Recent meta-analytic research (Van Rooy, Viswesvaran, & Pluta, 2005) has demonstrated
that these two models are not measuring the same constructs. Measures based on the two models
correlated only .14 with one another. In addition, these two models had different correlates.
Emotional intelligence measures based on the mixed model overlapped considerably with personality
trait scores but not with cognitive ability. Conversely, emotional intelligence measures developed
according to an emotional intelligence ability model correlated more with cognitive ability and less
with personality. Other research has clarified that ability model measures correlate especially with
verbal (crystallized) ability, with correlations typically between .30 and .40 (Mayer, Roberts, &
Barsade, 2008). Hence, some have posited that the term “emotional intelligence” should be replaced
by the term “emotional knowledge” (Zeidner, Matthews, & Roberts, 2004).
In addition to the construct validity of emotional intelligence, the criterion-related validity has
also been scrutinized. Van Rooy and Viswesvaran (2004) conducted a meta-analysis of emotional
intelligence measures (collapsing both models) for predicting performance. Their analysis of
59 independent empirical samples obtained a mean corrected correlation of .23. The validity of
emotional intelligence measures was .24, .10, and .24 for predicting performance in occupational,
academic, and life settings, respectively. However, a caveat is in order when interpreting the results
of this meta-analysis as it included only a small number of studies using ability-based emotional
intelligence instruments and a sizable number of studies using self-report measures of performance.
Thus, we are still far from being at the point of rendering a decision as to the incremental value of
emotional intelligence for selection purposes. However, in recent years, more positive conclusions
regarding the validity of emotional intelligence for predicting performance have been drawn. For
instance, Druskat and Jordan (2007) reviewed 26 studies that examined the validity of emotional
intelligence (both models) for predicting performance at the individual, team, and leadership level.
Importantly, all of the studies reviewed were published in peer-reviewed journals. The overall
conclusion was that “emotional intelligence predicts work performance over and above measures of
personality and general mental ability” (p. 2).
Despite this recent optimism, there are conceptual and methodological problems associated with
the research on emotional intelligence. Most criticisms were directed at the mixed model (Mayer
et al., 2008). First, the ambiguous (all-encompassing) definition and the very broad content of the
mixed model have been criticized (e.g., Landy, 2005; Locke, 2005; Matthews, Roberts, & Zeidner,
2004). For example, Landy (2005) succinctly noted: “The construct [of emotional intelligence] and
the operational definitions of the construct (i.e., the actual measurement instruments) are moving
targets” (p. 419). Similarly, Locke (2005) posited that “The concept of EI [emotional intelligence]
has now become so broad and the components so variegated that no one concept could possibly
encompass or integrate all of them, no matter what the concept was called; it is no longer even an
intelligible concept” (p. 426).
Another criticism relates to redundancy of the mixed model with Big Five personality traits.
For instance, De Raad (2005) explored to what extent emotional intelligence (mixed model) can be
expressed in terms of personality traits. To this end, he gathered a total of 437 items from emotional
intelligence inventories. Sixty-six percent of the emotional intelligence descriptors could be classi-
fied in a well-known Big Five framework (The Abridged Big Five-Dimensional Circumplex). The
lion share of the terms was categorized under agreeableness and emotional stability. The main rea-
son for items not being classifiable was that they were ambiguous because they were related to vari-
ous Big Five factors. In other studies, the multiple correlation between Big Five scores and scores on
mixed model emotional intelligence measures ranged between .75 and .79 (Brackett & Mayer, 2003;
Grubb & McDaniel, 2007). However, other studies found incremental validity of the mixed model
over and above personality (Law, Wong, & Song, 2004; Tett, Fox, & Wang, 2005). Nonetheless, in
the scientific community, there have been calls to give up the mixed model (which is very popular in
practice) and focus solely on the ability model (Daus & Ashkanasy, 2005).
The ability model is not without limitations either. For example, a large-scale examination of
many emotional intelligence, cognitive intelligence, and personality measures showed that emotion
perception (as represented by measures of perception of emotions in faces and pictures) was the only
branch of the four branches of the ability model that could not be classified under established mea-
sures (Davies, Stankov, & Roberts, 1998). But even the emotion perception construct has drawbacks
because the construct does not seem to have generalizability across different measures (Gohm,
2004). That is, existing emotion perception measures correlate only weakly among themselves (Roberts
et al., 2006).
In comparing the findings from the ability and the trait models, a major methodological prob-
lem exists because of a method-construct confound resulting from the fact that the ability model
is often measured using performance-based tests whereas the trait model is often measured using
self-reports. To advance research on the comparison of ability and trait models of emotional intel-
ligence (and also on the comparison of these models when applied to practical intelligence or social
intelligence), rigorous designs that allow us to clearly isolate construct and method variances are
needed (Chan & Schmitt, 2005).
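
One generic way to express what such multitrait-multimethod designs aim to unconfound is the following measurement decomposition for a score on trait t obtained with method m; the notation is a generic sketch rather than the specification used in any particular study.

```latex
% Each observed measure loads on one trait factor and one method factor:
\[
  X_{tm} = \lambda^{T}_{tm}\, T_{t} + \lambda^{M}_{tm}\, M_{m} + \varepsilon_{tm}
\]
% Fitting this model by confirmatory factor analysis yields separate estimates
% of trait loadings, method loadings, and uniquenesses, so variance shared by
% different methods measuring the same construct (trait variance) can be
% distinguished from variance shared by different constructs measured with the
% same method (method variance).
```

Under such a decomposition, for example, a performance-based test and a self-report measure of the same emotional intelligence facet should covary through the trait factor rather than through a shared method factor.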

SOCIAL INTELLIGENCE
Of the three intelligence constructs, social intelligence has the longest history. The idea goes back
to Thorndike (1920), who defined social intelligence as “the ability to understand and manage men
and women, boys and girls to act wisely in human relations” (p. 228). As noted by Landy (2005),
Thorndike did not build a theory of social intelligence; he only used the notion of social intelligence
to clarify that intelligence could manifest itself in different facets (e.g., abstract, mechanical,
social).
Social intelligence has a checkered history. Early studies tried to distinguish social intelligence
from academic intelligence (e.g., Hoepfner & O'Sullivan, 1968; Keating, 1978). However, these
research efforts were unsuccessful. The problem was that measures of social intelligence did not
correlate highly among themselves and that academic intelligence and social intelligence formed one
factor. Methodologically, it was troublesome that both intelligences were measured with the same
method (paper-and-pencil measures). The early research led to the conclusion that the “putative
domain of social intelligence lacks empirical coherency, at least as it is represented by the measures
used here” (Keating, 1978, p. 221).
Two advancements led to more optimism. The first was the distinction between cognitive social
intelligence (e.g., social perception or the ability to understand or decode verbal and nonverbal
behaviors of other persons) and behavioral social intelligence (effectiveness in social situations).
Using this multidimensional definition of social intelligence and multiple measures (self, teacher,
and peer ratings), Ford and Tisak (1983) were able to distinguish social intelligence from academic
intelligence. In addition, social intelligence predicted social behavior better than academic intel-
ligence (see also Marlowe, 1986). The second advancement was the use of multitrait-multimethod
designs (and confirmatory factor analysis) to obtain separate and unconfounded estimates of trait
and method variance (Jones & Day, 1997; Wong, Day, Maxwell, & Meara, 1995).
These more sophisticated multitrait-multimethod designs have brought further evidence for
the multidimensionality of social intelligence and for its discriminability vis-à-vis academic
intelligence. For example, the aforementioned distinction made between cognitive social intelligence
and behavioral social intelligence has been confirmed (e.g., Wong et al., 1995). Similarly, a
distinction is often made between fluid and crystallized social intelligence. The fluid form of social
intelligence refers to social-cognitive flexibility (the ability to flexibly apply social knowledge in
novel situations) or social inference. Conversely, a term such as social knowledge (knowledge of
social etiquette, procedural and declarative social knowledge about social events) denotes the more
crystallized component of social intelligence (Jones & Day, 1997). Despite these common findings,
the dimensions, definitions, and measures of social intelligence still vary considerably across studies.
Along these lines, Weis and Süss (2005) recently gave an excellent overview of the different facets
of social intelligence that have been examined. This might form the basis to use a more uniform
terminology when describing social intelligence subdimensions.
In recent years, interest in social intelligence has also enjoyed a renaissance under the general
term of social effectiveness constructs. According to Ferris, Perrewé, and Douglas (2002), social
effectiveness is a “broad, higher-order, umbrella term, which groups a number of moderately-
related, yet conceptually-distinctive, manifestations of social understanding and competence”
(p. 50). Examples are social competence, self-monitoring, emotional intelligence, social skill, social
deftness, practical intelligence, etc. The value of social skills has been especially scrutinized.
Similar to social intelligence, social skills are posited to have a cognitive component (interpersonal
perceptiveness) and a behavioral component (behavioral flexibility; Riggio, 1986; Schneider,
Ackerman, & Kanfer, 1996).
A key distinction between social skills and personality traits is that the former are learned (i.e.,
an ability), whereas the latter are relatively stable. Research has found that they are only moderately
(.20) correlated (Ferris, Witt, & Hochwarter, 2001). However, both constructs are also related in
that social skills enable personality traits to show their effects (Ferris et al., 2001; Hogan & Shelton,
1998). Research has indeed confirmed that social skills moderate the effects of personality traits
(conscientiousness) on job performance (Witt & Ferris, 2003). Social skills were also found to have
direct effects on managerial job performance, although personality and cognitive ability were not
controlled for in most studies (Semadar, Robins, & Ferris, 2006).

CONCLUSIONS
Our review of practical, social, and emotional intelligence highlights that these three constructs
share remarkable similarities. Specifically, we see at least three parallels. First, the origins and
rationale behind each of the constructs can be summarized as “going beyond g.” Cognitively
oriented measures of ability and achievement have been traditionally used in employment and
educational contexts. However, at the same time, there has always been substantial interest in
exploring possible supplemental (“alternative”) predictors for broadening the constructs measured
and reducing possible adverse impact. Supplementing cognitive predictors with alternative predictors is seen
as a mechanism for accomplishing this (Sackett, Schmitt, Ellingson, & Kabin, 2001). Whereas
social intelligence is the oldest construct, practical intelligence came into fashion at the end of
the 1980s and, since Goleman's (1995) book, emotional intelligence has been the newest fad. Each time, the
construct was introduced as the panacea for the problem of an exclusive reliance on g. We agree
that there is a need to go beyond g and identify new and non-g constructs, but a new construct
has little scientific explanatory and utility value if it is defined solely by negation (i.e., as non-g).
Hence, good construct validity evidence for the three constructs is needed and the current state
of research indicates to us that more rigorous construct validation studies are needed. Second, the
conceptualizations of these three constructs have salient parallels. Each of these three constructs
has various definitions, is multidimensional, and there exists debate about their different dimen-
sions. Third, for each of these constructs, investigations of incremental validity over and above
more established constructs, such as cognitive ability and personality, have been the focus of
debate and research.
So, are there conceptual differences between the three constructs? According to Landy (2005),
emotional intelligence as a so-called new construct has simply replaced the older notion of social
intelligence. Similarly, Bowman et al. (2001) posited that “it is not certain to what extent tacit knowl-
edge, social, and EQ measures are structurally independent” (p. 148). Although our review shows
that these three constructs are definitely overlapping, it is possible to make at least some subtle
distinctions. On the one hand, emotional intelligence might be somewhat narrower than social intel-
ligence because it focuses on emotional problems embedded in social problems (Mayer & Salovey,
1993). That is probably why Salovey and Mayer (1990) defined emotional intelligence as a subset of
social intelligence (p. 189). Conversely, one might also posit that emotional intelligence is broader
than social intelligence because internal regulatory processes/emotions are also taken into account,
something that is not the case in social intelligence. Practical intelligence, with its emphasis on
real-world problems, is more distinct from the other two constructs because it makes no reference to
interpersonal skills (Austin & Saklofske, 2005). Domain specificity is another aspect of tacit knowl-
edge that contrasts with the more generic nature of social and emotional intelligence. In any case, these
conceptual distinctions are open to investigation because no study has explicitly examined the three
constructs together (Weis & Süss, 2005).

MEASUREMENT APPROACHES
In the previous section, we showed that the conceptual debate around practical, social, and emotional
intelligence shared many parallels. The same can be said about the measurement approaches used
because the similarities in how practical intelligence, social intelligence, and emotional intelligence
are measured are striking. Generally, six measurement approaches might be distinguished: (a) self-
reports, (b) other-reports, (c) interviews, (d) tests, (e) SJTs, and (f) assessment center exercises. The
following sections discuss each of these approaches including their advantages and disadvantages.
Examples of instruments are also given, and these are summarized in Table 16.1.

SELF-REPORTS
The self-report approach presents respondents with descriptive statements and asks them to use
some form of rating scale to indicate the extent to which they agree or disagree with the respective
statements. An important advantage of self-report measures is that they can be administered
inexpensively and quickly to large groups of respondents.
Examples of the self-report approach are many. In fact, most examples of self-report emotional
intelligence measures are based on the popular mixed model approach to emotional intelligence.
Examples are the Emotional Competence Inventory (ECI; Sala, 2002), the Trait Meta-Mood Scale
(TMMS; Salovey, Mayer, Goldman, Turvey, & Palfai, 1995), EQ-I (Bar-On, 1997), and the Trait
Emotional Intelligence Questionnaire (TEIQue; Petrides & Furnham, 2003). Other emotional
intelligence measures are based on the four-branch model (or its predecessors) (Salovey & Mayer,
1990) but use a self-report methodology (instead of performance-based tests) for measuring it.
Examples are the Wong Law Emotional Intelligence Scale (WLEIS; Law et al., 2004; Wong &
Law, 2002), the Multidimensional Emotional Intelligence Assessment (MEIA; Tett et al., 2005),
the Swinburne University Emotional Intelligence Test (SUEIT; Palmer & Stough, 2001), or the
Schutte Self-Report Emotional Intelligence Test (SREIT; Schutte et al., 1998). We refer to Pérez,
Petrides, and Furnham (2005) for a comprehensive list of trait EQ measures. There are also self-
report inventories of social intelligence/social skills (e.g., Ferris et al., 2001; Riggio, 1986; Schneider
et al., 1996). We are not aware of self-report instruments (excluding SJTs as self-report measures)
that assess tacit knowledge.

TABLE 16.1
Overview of Methods (Including Some Examples) for Measuring Practical, Emotional, and Social Intelligence

Self-reports
   Ability emotional intelligence model: WLEIS, SREIT, MEIA, SUEIT
   Trait emotional intelligence model: EQ-I, ECI, TMMS, TEIQue
   Practical intelligence: Self-reports of people's behavior in everyday situations
   Social intelligence: Social skills inventories

Other-reports
   Ability emotional intelligence model: Same as self-reports; Workgroup Emotional Intelligence Profile
   Trait emotional intelligence model: Same as self-reports
   Practical intelligence: Other-reports of people's behavior in everyday situations
   Social intelligence: Same as self-reports

Performance-based tests
   Ability emotional intelligence model: MSCEIT, DANVA2, PONS, JACBART, EARS, VOCAL-I, MSFDE
   Trait emotional intelligence model: No known examples
   Practical intelligence: Basic skills tests
   Social intelligence: LEAS, IPT-15, four-/six-factor tests of social intelligence

Interviews
   Ability emotional intelligence model: Interview rating on components of the four-branch model of Mayer, Salovey, and Caruso
   Trait emotional intelligence model: Interview rating on mixed model emotional intelligence competencies (interpersonal sensitivity, stress tolerance, etc.)
   Practical intelligence: Interview rating on people's reported behavior in everyday situations
   Social intelligence: Interview rating on applied social skills

SJTs
   Ability emotional intelligence model: STEU, STEM
   Trait emotional intelligence model: SJTs that aim to measure mixed model emotional intelligence competencies
   Practical intelligence: Tacit knowledge inventories
   Social intelligence: George Washington Social Intelligence Test (judgment in social situations)

ACs
   Ability emotional intelligence model: AC rating on components of the four-branch model of Mayer, Salovey, and Caruso
   Trait emotional intelligence model: AC rating on mixed model emotional intelligence competencies
   Practical intelligence: Case situational problems
   Social intelligence: AC rating on applied social skills

In the personality domain, there is a long history of using self-report measures and an equally
long debate over its use. Clearly, the debate and issues concerning the use of self-report measures
in personality research is readily generalizable to the use of self-report measures in assessing social
and emotional intelligence. A detailed review of the pros and cons of self-report measures is beyond
the scope of this chapter. Suffice to say that self-report data are by no means perfect and are in prin-
ciple susceptible to various validity problems such as faking and inflation of correlations because
of common method variance. However, it is noteworthy that the severity of many of the purported
problems of self-report data may be overstated (for details, see Chan, 2008).

OTHER-REPORTS
Other-reports (or informant reports) have also been used for measuring emotional and social intel-
ligence. One reason is that knowledgeable others might provide less lenient and more reliable
measurement. Another reason is that multidimensional constructs such as emotional and social
intelligence inherently have an important interpersonal component. Hence, it makes sense that in
other-reports the same emotional and social intelligence scales as listed above are used, with others
(peers, colleagues, teachers, parents, friends) now rating the focal person on descriptive statements.
For example, peers or supervisors can also complete the ECI of Goleman. There also exist emo-
tional intelligence measures that were specifically developed for use in team settings. For instance,
Jordan, Ashkanasy, Hartel, and Hooper (2002) developed a specific work group emotional intel-
ligence measure, namely the Workgroup Emotional Intelligence Profile.
Although there exists much research supporting the use of peers in the personality domain
(e.g., Borkenau & Liebler, 1993; Funder, 1987; Kenny, 1991), research with other-based emotional
intelligence measures is relatively scarce. Van der Zee, Thijs, and Schakel (2002) found that
peer ratings of emotional intelligence were more reliable than self-ratings, although these peer
ratings also suffered from leniency. Law et al. (2004) reported that peer-reports of a trait-based emotional
intelligence measure had substantial incremental validity over self-reports of emotional intelligence and
personality. Thus, it seems beneficial to use peer-reports for such emotional intelligence measures.

PERFORMANCE-BASED TESTS
Whereas both self-reports and peer-reports are assumed to be measures of typical performance,
performance-based tests are posited to measure maximal performance. The rationale behind these
tests parallels the one behind cognitive ability tests because these tests present people with social
or emotion-based problem solving items. For example, in popular tests of emotion perception, indi-
viduals are presented with faces, voices, or pictures and are then asked to describe the emotions.
Historically, performance-based tests have been used for measuring social intelligence. An often-
cited example is O'Sullivan, Guilford, & deMille’s (1965) tests of social intelligence (see Landy,
2006, for other older examples). A more modern example is the Levels of Emotional Awareness
Scale (LEAS; Lane, Quinlan, Schwartz, Walker, & Zeitlin, 1990), although this test has also been
used as a measure of emotional intelligence (e.g., Barchard, 2003). Similarly, the Interpersonal
Perception Task-15 (IPT-15; Costanzo & Archer, 1993) is a performance-based measure that pres-
ents videotapes to participants.
Recently, these tests have experienced a renaissance in the context of the ability model of emo-
tional intelligence, with the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) as the
best-known example. Other well-known examples are the Japanese and Caucasian Brief Affect
Recognition Test (JACBART; Matsumoto et al., 2000), the Diagnostic Analysis of Nonverbal
Accuracy (DANVA2; Nowicki, 2004), the Profile of Nonverbal Sensitivity (PONS; Rosenthal, Hall,
DiMatteo, Rogers, & Archer, 1979), the Emotional Accuracy Scale (EARS; Mayer & Geher, 1996),
the Montreal Set of Facial Displays of Emotion (MSFDE; Beaupré, Cheung, & Hess, 2000), and the
Index of Vocal Emotion Recognition (Vocal-I; Scherer, Banse, & Wallbott, 2001).
As noted by Spector and Johnson (2006), there is a difference between knowledge about
emotions and the actual skill: knowing how to regulate one's emotions in the face of problems
does not guarantee that one will actually do so in a real context. With regard to practical intelligence, this
problem has been circumvented by using basic skills tests (Diehl, Willis, & Schaie, 1995). These tests
measure, among other things, the ability to perform daily tasks such as cooking or using a bus schedule.
Scoring constitutes another problem of performance-based tests. In contrast to cognitive ability
tests, emotional intelligence tests using the ability model, for instance, do not have objectively
correct answers.
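In practice, this scoring problem is often addressed with consensus or expert scoring, in which a response earns credit in proportion to how often it is endorsed by a normative sample or by an expert panel. The following minimal sketch (in Python, using entirely invented response data and a hypothetical four-option item) illustrates the consensus-scoring logic; it is not the scoring algorithm of any specific published test.

```python
import numpy as np

# Hypothetical responses of a norm sample to one emotion-perception item
# with four response options (0-3); the data are invented for illustration.
norm_responses = np.array([2, 2, 1, 2, 3, 2, 2, 1, 2, 2])

# Consensus weights: proportion of the norm sample endorsing each option.
options = np.arange(4)
consensus_weights = np.array([(norm_responses == k).mean() for k in options])

# An examinee's score on the item is the weight of the chosen option, so
# answers shared by more of the norm group earn more credit.
def consensus_score(chosen_option: int) -> float:
    return consensus_weights[chosen_option]

print(consensus_weights)   # [0.  0.2 0.7 0.1]
print(consensus_score(2))  # majority answer -> 0.7
print(consensus_score(3))  # rare answer -> 0.1
```

Expert scoring follows the same logic, with the weights derived from an expert panel rather than from the full norm sample.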

INTERVIEWS
Interviews constitute another possible method for measuring practical, social, and emotional intel-
ligence. In the past, social skills (social intelligence) in particular have frequently been measured
in interviews. This is demonstrated by the meta-analysis of Huffcutt, Conway, Roth, and Stone
(2001), who reviewed the types of constructs most frequently targeted by interviews in 47 studies.
Specifically, social skills were measured in 27.8% of the interviews. Moreover, applied social skills were
twice as frequently rated in high-structure interviews (behavior description interviews and situ-
ational interviews) as compared with low-structure interviews (34.1% vs. 17.7%).
Essentially, interviews are measurement methods that can be used to assess a wide variety of con-
structs. On the basis of multiple job-related questions, interviewees are asked to describe behavior that
is relevant for constructs deemed important. Therefore, interviews could also be used for measuring
practical intelligence (Fox & Spector, 2000) and emotional intelligence (mixed model; Schmit, 2006).
Schmit noted how interview questions can try to elicit situations from interviewees wherein they had
to recognize emotions of others and how they dealt with this. Yet, in interviews, observable samples of
behavior can be observed only for specific dimensions (e.g., interpersonal skills or oral communication
skills; Van Iddekinge, Raymark, & Roth, 2005). For other dimensions, candidates report past behavior
(in behavior description interviews) or intended behavior (in situational interviews).

SJTS
SJTs might be another approach for measuring practical, social, and emotional intelligence (Chan,
2000, 2006; O’Sullivan, 2007; Schulze, Wilhelm, & Kyllonen, 2007). SJTs are measurement meth-
ods that present respondents with job-related situations and sets of alternate courses of action to
these situations. For each situation, respondents either select the best and worst options or rate each
of the alternative actions in terms of its effectiveness. Because respondents have to respond to real-
istic (written and especially video-based) scenarios, SJTs might constitute a more contextualized
(ecologically valid) way of measuring practical, social, and emotional intelligence. This judgment
in a realistic context contrasts with the decontextualized nature of standardized tests. Technological
advancements make it possible to develop interactive SJTs that present different video fragments on
the basis of responses to earlier video fragments. This allows the SJT to simulate the dynamics of
interaction. Similar to emotional intelligence tests (ability model), multiple-choice SJTs are scored
against keys developed on expert grounds (e.g., responses of excellent employees) or empirical grounds (e.g., data from large pilot samples).
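As an illustration of the keying logic for the pick-best/pick-worst response format, consider the following minimal sketch (in Python). The item options and the expert key are invented for illustration; operational keys would be derived from expert panels or pilot data as described above.

```python
# Scoring one pick-best / pick-worst SJT item against an expert-derived key.
# The options ("A"-"D") and the key below are invented for illustration only.
expert_key = {"best": "B", "worst": "D"}

def score_item(picked_best: str, picked_worst: str) -> int:
    """Award one point for each judgment that agrees with the expert key."""
    score = 0
    if picked_best == expert_key["best"]:
        score += 1
    if picked_worst == expert_key["worst"]:
        score += 1
    return score

print(score_item("B", "D"))  # full agreement with the key -> 2
print(score_item("A", "D"))  # only the 'worst' judgment agrees -> 1
```

For rate-each-option formats, scores are instead typically based on the distance between a respondent's effectiveness ratings and the mean ratings of the expert or pilot sample.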
Over the years, SJTs have been developed for measuring each of the three constructs. First, as
noted by McDaniel, Morgeson, Finnegan, Campion, and Braverman (2001), the first SJTs were
social intelligence tests, namely the Judgment in Social Situations subtest of the George Washington
Social Intelligence Test. Second, instruments very similar to SJTs are used under the label “tacit
knowledge tests” for measuring practical intelligence (Sternberg et al., 1995). Examples are the Tacit-
Knowledge Inventory for Managers or the Tacit-Knowledge Inventory for Military Leaders. Third,
recent research has explored the use of SJTs for measuring two branches of Mayer and Salovey’s
emotional intelligence model. Specifically, MacCann and Roberts (2008) developed the Situational
Test of Emotional Understanding (STEU) and the Situational Test of Emotion Management (STEM).
There have also been recent attempts to develop video-based SJTs for measuring emotional intel-
ligence (Bedwell & Chuah, 2007).
SJTs are also referred to as low-fidelity simulations. Although they aim to provide a more eco-
logically valid approach for measuring practical, social, and emotional intelligence, they do not
require candidates to actually show how they would handle a specific situation. Candidates have
to pick the “correct” answer from a limited set of predetermined response options. Nevertheless,
a meta-analysis by McDaniel et al. (2001) found a corrected correlation between SJTs and job
performance in employment settings of .34. In addition, recent research (Chan & Schmitt, 2002;
McDaniel, Hartman, Whetzel, & Grubb, 2007) provided evidence for the incremental validity of
SJTs in predicting job performance over and above the prediction provided by cognitive ability and
personality. Other validity research also found that video-based SJTs are more valid than written
ones (Lievens & Sackett, 2006).
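The incremental validity analyses cited here typically compare hierarchically nested regression models: job performance is first regressed on cognitive ability and personality, and the gain in explained variance when the SJT is added is then examined. The sketch below (in Python with the statsmodels package and simulated data; all variable names and effect sizes are invented) illustrates that logic.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated illustration of an incremental validity analysis: does an SJT add
# to the prediction of job performance over cognitive ability and
# conscientiousness? All data are randomly generated for illustration only.
rng = np.random.default_rng(1)
n = 300
cognitive = rng.normal(size=n)
consc = rng.normal(size=n)
sjt = 0.5 * cognitive + rng.normal(size=n)            # SJT is partly g-loaded
performance = 0.4 * cognitive + 0.3 * consc + 0.3 * sjt + rng.normal(size=n)
df = pd.DataFrame(dict(cognitive=cognitive, consc=consc,
                       sjt=sjt, performance=performance))

base = smf.ols("performance ~ cognitive + consc", data=df).fit()
full = smf.ols("performance ~ cognitive + consc + sjt", data=df).fit()

# Incremental validity is summarized by the change in R-squared.
print(f"R2 base = {base.rsquared:.3f}, R2 full = {full.rsquared:.3f}, "
      f"delta R2 = {full.rsquared - base.rsquared:.3f}")
```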
An interesting aspect of SJTs is that differences in mean SJT scores between racial subgroups
are typically smaller than those reported for cognitive ability tests. The meta-analysis of Nguyen,
Biderman, and McDaniel (2005) found a difference in mean SJT scores between Whites and Blacks
of about .30 SD in favor of White candidates, which is much smaller than the 1.00 SD typically
found for cognitive ability tests (Jensen, 1998). A key determinant of whether SJTs show adverse
impact is the correlation of SJTs with cognitive ability. Yet, it should be noted that the lower reli-
ability of SJTs might also partially explain the lower subgroup differences found.
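The subgroup differences discussed here are expressed as standardized mean differences (the difference between the subgroup means divided by the pooled within-group standard deviation). A minimal sketch of that computation, on invented data, follows.

```python
import numpy as np

def cohens_d(group1: np.ndarray, group2: np.ndarray) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * group1.var(ddof=1)
                  + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2)
    return (group1.mean() - group2.mean()) / np.sqrt(pooled_var)

# Invented SJT scores for two subgroups, purely to illustrate the metric.
rng = np.random.default_rng(7)
scores_group_a = rng.normal(loc=0.30, scale=1.0, size=500)
scores_group_b = rng.normal(loc=0.00, scale=1.0, size=500)
print(f"d = {cohens_d(scores_group_a, scores_group_b):.2f}")  # roughly 0.30
```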
SJTs are inherently multidimensional because SJT items may refer to a range of situations and
include different types of content to which applicants attend when making a decision. In addition,
responses to SJT items with multiple options are the result of a combination of ability, experi-
ence, and personality (McDaniel et al., 2001; McDaniel & Whetzel, 2005). The multidimensional
nature of SJTs makes it often difficult to assess what they exactly measure. For instance, factor
analytic research on SJTs typically reveals a plethora of factors that are difficult to interpret (Chan &
Schmitt, 2005).

ASSESSMENT CENTER EXERCISES


A final possible approach for measuring practical, social, and interpersonal intelligence consists of
putting people in a simulated situation, observing their actual behavior, and then making inferences
about their standing on the construct of interest. Performance (or authentic) assessment is often used
as a general term for describing this strategy. In industrial and organizational (I-O) psychology, this
contextualized approach focusing on actual behavior is exemplified by assessment centers (ACs). In
ACs, several job-related simulations (e.g., role-play, interview simulation, in-basket, group discus-
sion) aim to elicit behavior relevant to the constructs under investigation. The assumption is that
individuals’ responses to these simulations reflect the responses that they would exhibit in the real
world. Multiple trained assessors observe and rate the candidates on these constructs.
According to Gowing (2001), the roots of the measurement of social, practical, and emotional
intelligence can be traced to this AC approach. Although these constructs are not explicitly mea-
sured in AC exercises, they correspond well to the competencies typically targeted by AC exercises.
In particular, AC competencies such as flexibility, awareness of others, interpersonal skills,
stress tolerance, and communication clearly resemble practical, emotional,
and social intelligence. The context sensitivity of what constitutes good performance in AC exer-
cises and the ease with which situations may temporally unfold or change through injecting novel
demands as the exercise progresses are features of the AC that make it a useful method for mea-
suring the adaptability competencies associated with practical, emotional, and social intelligence
(Chan, 2000).
Several researchers have explicitly related the measurement of these AC dimensions to the
measurement of one or more of the three intelligence constructs. Specifically, Spector and
Johnson (2006) presented various examples of how AC exercises might be adapted for measuring
emotional intelligence. For example, in a role-play, a participant might be asked to deal with
an irate customer or to comfort an upset colleague. Assessors might then rate the assessees on
broad-based competencies or on more detailed verbal/nonverbal behaviors. Another example is
Stricker and Rock’s (1990) Interpersonal Competency Inventory (ICI), wherein participants have
to respond orally to videotaped scenes. Similarly, Sternberg and colleagues have argued that the
typical AC exercises are very useful for assessing practical intelligence. For example, Hedlund,
Wilt, Nebel, Ashford, and Sternberg (2006) developed so-called “case scenario problems” as a
skill-based measure of practical intelligence. These case scenario problems consist of a fictitious
business case wherein participants are given information such as the history of the organization,
their role, memos, e-mails, and financial tables. Individuals have to use their practical intelligence
(practical problem-solving skills) to solve these contextual and poorly defined problems. Clearly,
this methodology is very similar to the in-basket format that has been used for decades in ACs.
Although the emphasis on simulations and actual behavior results in good AC validities (Arthur,
Day, McNelly, & Edens, 2003) and little adverse impact (Terpstra, Mohamed, & Kethley, 1999),
the quality of construct measurement remains the Achilles heel of ACs (Lance, Lambert, Gewin,
Lievens, & Conway, 2004). Ratings of the same competency do not converge well across exercises
(i.e., poor convergent validity). In addition, there is little distinction between dimensions within a
specific exercise because within-exercise dimension ratings are highly correlated (i.e., poor dis-
criminant validity).

CONCLUSIONS
Our review of measurement approaches suggests parallels in how the three constructs are measured.
Although it is often thought that the three constructs are primarily measured with self-reports and
performance tests, this section highlighted that a wide array of other options is possible.
Specifically, interviews, peer-reports, and instruments with somewhat more fidelity (e.g., SJTs and
AC exercises) are viable measurement approaches. Future research should explore these alternative
measurement methods.

CONCEPTUAL FRAMEWORK FOR EXAMINING PRACTICAL, EMOTIONAL, AND SOCIAL INTELLIGENCE
In Figure 16.1, we present a conceptual framework that we adapted from Chan and Schmitt (2005)
to organize the discussion and guide future research on the validity of practical, emotional, and
social intelligence. Following Chan and Schmitt, the framework construes all three types of intel-
ligence as competencies that are multidimensional constructs, each of which is a partial mediator
of the predictive or causal effect of unidimensional knowledge, skills, abilities, and other character-
istics (KSAOs) on job performance or other job-relevant criteria. In addition, our framework con-
strues the three types of intelligences as distinct but related competencies with common and unique
construct space as depicted by the three overlapping circles representing practical, emotional, and
social intelligence.
The framework in Figure 16.1 shows that proponents and opponents of each of these three con-
structs are right and wrong in different ways. Specifically, the opponents typically focus on the
KSAOs and correctly argue that practical, emotional, and social intelligences are not factorially
pure (unitary) KSAOs, but they incorrectly dismiss the validities and value of these intelligence
constructs. Conversely, the proponents typically focus on the multidimensional competencies and
correctly argue that practical, emotional, and social intelligences are proximal (and hence some-
times better) predictors of performance and other criteria, but they incorrectly ignore the important
role of KSAOs in determining the nature of these intelligence constructs.
Our framework is consistent with and may reconcile several extant findings and the debate over
the value of the three types of intelligence. For example, each of the three intelligence constructs is
inherently multidimensional in the sense that it is conceptualized as a multidimensional competency
resulting from a combination of several different individual difference constructs.

[Figure 16.1 depicts individual difference constructs (KSAOs), namely cognitive abilities, personality traits, cognitive styles, values, motivational constructs, experience, and job knowledge, feeding into three overlapping circles representing practical, emotional, and social intelligence, which in turn relate to job performance and other criterion constructs: task performance, contextual performance, adaptive performance, withdrawal behaviors, and counterproductive behaviors.]

FIGURE 16.1 Conceptual framework for examining practical, emotional, and social intelligence. (Adapted from Chan, D., & Schmitt, N., Situational judgment tests, in A. Evers, O. Smit-Voskuijl, & N. Anderson, Eds., Handbook of personnel selection, 219–242, Blackwell, Oxford, United Kingdom, 2005.)

The relationships
linking each type of intelligence and the various individual difference constructs explain the consis-
tent findings from factor analytic studies that the intelligence measure is factorially complex and the
data from the measure do not produce good fit with a single factor model. These relationships also
explain the significant and sometimes substantial correlations between the intelligence measure and
the established measures of traditional KSAOs such as cognitive ability and personality traits. In
addition, these relationships provide the conceptual bases for examining ability models, trait mod-
els, and mixed models of emotional (as well as practical or social) intelligence.
The findings on the substantial zero-order validities and incremental validities of practical
intelligence in predicting job performance over the prediction provided by cognitive ability and
personality traits (e.g., Chan & Schmitt, 2002) are consistent with the proximal status of practical
intelligence competencies (relative to the distal status of KSAOs) in the prediction of job perfor-
mance. Similarly, the proximal status of emotional and social intelligence also explains the find-
ings from studies that showed zero-order and incremental validities of these intelligence measures
in the prediction of job performance and other criteria (for meta-analytic review of studies, see
Druskat & Jordan, 2007). Interestingly, Figure 16.1 may also explain why SJTs and ACs, which
are multidimensional measures, do better than factorially pure measures of single unitary con-
structs (e.g., cognitive ability, personality) in predicting job-relevant performance criteria, which
are often multidimensional in nature. That is, much of what SJTs and ACs are assessing may well
be multidimensional competencies similar, if not identical, to practical, emotional, and social
intelligence.
We believe the conceptual framework in Figure 16.1 is consistent with existing findings and
reconciles much of the debate on the validity of practical, emotional, and social intelligence,
but more direct empirical support of the framework is certainly needed. We reiterate the call in
Chan and Schmitt (2005) that to obtain more direct evidence for a framework that construes the
intelligence competencies as multidimensional mediators in the relationship between KSAOs and
job performance (and other criteria), we would need to specify and test hypothesized and alternative
structural equation models (on the basis of primary data from a single study or cumulation of
results from past studies using meta-analyses) linking KSAOs, intelligence competencies, and job
performance or other criterion outcomes. Future research could derive theory-driven specific models
from the general framework depicted in Figure 16.1 to empirically examine the validity of one or
more of the three intelligence constructs that would facilitate the interpretation of the correlations
between the intelligence construct and more established individual difference KSAOs, as well as the
zero-order and incremental validities of the intelligence construct in predicting different criterion
outcomes. In the following section, we suggest various strategies for formulating theory-driven
testable models that are likely to advance research in ways that make conceptual and practical
contributions to the study of practical, emotional, and social intelligence.
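As a simplified illustration of the kind of model implied by Figure 16.1, the sketch below (in Python, with simulated data and invented effect sizes) estimates a single KSAO-competency-performance chain with two regressions and computes the indirect effect as the product of the two paths. It is a piecemeal stand-in for, not a substitute for, the simultaneous structural equation modeling advocated above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated illustration of the partial mediation structure in Figure 16.1:
# one KSAO (e.g., cognitive ability) -> one multidimensional competency
# (e.g., practical intelligence) -> job performance. Data are invented.
rng = np.random.default_rng(42)
n = 500
ksao = rng.normal(size=n)
competency = 0.5 * ksao + rng.normal(size=n)
performance = 0.2 * ksao + 0.4 * competency + rng.normal(size=n)
df = pd.DataFrame(dict(ksao=ksao, competency=competency, performance=performance))

a_path = smf.ols("competency ~ ksao", data=df).fit().params["ksao"]
outcome_model = smf.ols("performance ~ ksao + competency", data=df).fit()
b_path = outcome_model.params["competency"]
direct_effect = outcome_model.params["ksao"]

print(f"indirect effect (a*b) = {a_path * b_path:.2f}, "
      f"direct effect = {direct_effect:.2f}")
```

In a full test of the framework, the KSAOs, all three intelligence competencies, and multiple criteria would be estimated simultaneously, ideally on meta-analytically pooled correlation matrices, as suggested above.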

STRATEGIES FOR FUTURE RESEARCH


We suggest the following five strategies for future research on the three types of intelligence: (a) matching
predictor and criterion, (b) disentangling methods and constructs, (c) going beyond bivariate relation-
ships, (d) using longitudinal validation designs, and (e) adopting a multilevel perspective.

MATCHING BETWEEN PREDICTOR AND CRITERION


An important development in personnel selection research is the movement away from general
discussions of predictors as “valid” to consideration of “valid for what?”. This development of more
nuanced questions about predictor-criterion relationships was spurred by the taxonomic work on job
performance led by Campbell, McCloy, Oppler, and Sager (1993) that differentiated performance
into multiple distinct dimensions. Since then, selection researchers have significantly expanded the
notion of job performance to include distinct performance dimensions such as those listed in the
criterion space of the framework in Figure 16.1. The expansion of the definition of performance and
recognition of the multidimensional nature of performance led to streams of research demonstrating
that different predictor constructs and selection tests will offer optimal predictive validity depend-
ing on the performance dimension(s) of interest (Chan, 2005a). For example, research has shown
that task performance is better predicted by cognitive ability tests, whereas contextual performance
is better predicted by personality tests (McHenry, Hough, Toquam, Hanson, & Ashworth, 1990).
The key message here is that one needs to carefully attend to the constructs underlying both predic-
tors and criterion dimensions in developing hypotheses about predictor-criterion relationships.
Unfortunately, research on practical, social, and emotional intelligence has typically ignored
linking these constructs to relevant criterion variables (Landy, 2005). These constructs are often
proposed to predict almost everything. This is probably best exemplified by studies investigating the
validity of emotional intelligence for predicting academic performance (e.g., Amelang & Steinmayr,
2006; Barchard, 2003; Jaeger, 2003; Newsome, Day, & Catano, 2000; Parker, Hogan, Eastabrook,
Oke, & Wood, 2006). There is no clear theoretical basis or conceptual match between emotional
intelligence and grade point average (GPA). Clearly, emotional intelligence will have at best modest
value for predicting an omnibus, cognitively loaded criterion such as GPA. Hence,
we need studies that carefully match the three intelligence constructs and their subdimensions to
relevant criteria. For example, trait emotional intelligence might be a better predictor of performance
in courses that require teamwork and of satisfaction at school than of cumulative GPA (see also
Lievens, Buyse, & Sackett, 2005).
Referring to Figure 16.1, we could apply the conceptual matching between predictor and criterion
to foster our understanding of the link between the three intelligence constructs and the different
dimensions of job performance. For instance, task performance might be predicted by ability-based
emotional intelligence, whereas contextual performance might be predicted by trait-based emo-
tional intelligence. As another example, practical intelligence might predict adaptive performance
better than it predicts routine task performance.

DISENTANGLING METHODS AND CONSTRUCTS


In recent years, there has been increased recognition that methods should be distinguished from constructs
in the comparative evaluation of predictors (Arthur & Villado, 2008; Arthur et al., 2003; Bobko,
Roth, & Potosky, 1999; Chan & Schmitt, 1997, 2005; Lievens, Harris, Van Keer, & Bisqueret,
2003). Constructs refer to the substantive conceptual variables (e.g., conscientiousness, cognitive
ability, finger dexterity, field dependence-independence, reaction time, visual attention, emotional
intelligence) that the measures were designed to assess. Conversely, methods refer to the tests,
techniques, or procedures (e.g., paper-and-pencil tests, computer-administered tests, video-based
tests, interviews, ACs, self-reports, peer reports) used to assess the intended constructs. This
distinction between constructs and methods is especially crucial for multidimensional predictors
(Bobko et al., 1999). Conceptual and methodological issues of variance partition associated with the
construct-method distinction and their applications to constructs such as practical intelligence are
available in Chan and Schmitt (2005).
Given the multidimensional nature of practical, social, and emotional intelligence, clarity of the
method-construct distinction is critical. As shown in Table 16.1, practical, social, and emotional
intelligence might be measured in multiple ways. As noted above, recent research on social intel-
ligence has adopted such multitrait-multimethod design and cleared some of the confusion around
this construct. For example, social intelligence constructs (e.g., social understanding, memory, and
knowledge) were operationalized in a multitrait-multimethod design applying verbal, pictorial, and
video-based performance measures.
A similar strategy could be followed for clarifying some of the confusion related to emotional
intelligence. So far, research has mainly compared self-reports of ability-based emotional intelligence
or mixed model emotional intelligence to personality inventories (see Roberts et al., 2006, for
an exception). However, many more strategies are possible. One possibility is to operationalize a
specific branch of the emotional intelligence ability model via different measurement approaches
(Wilhelm, 2005). For example, the emotion understanding branch of the ability model might be
measured via the MSCEIT and an SJT. Similarly, the emotion perception branch might be mea-
sured via faces, pictures, movies, voices, etc. As another example, people might complete an ability
emotional intelligence test, they might provide self-reports of their emotional intelligence, and they
might be rated by trained assessors on emotional intelligence (or conceptually similar competencies
such as interpersonal sensitivity) in AC exercises. Such research designs (see also Landy, 2006)
focus on convergent validity and enable one to answer key questions such as the following:

• How well do these different methods converge in assessing emotional intelligence?
• How much variance is accounted for by method factors and how much variance is accounted for by substantive construct factors?
• What does this tell us about the construct?
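One simple way to begin answering these questions is to arrange the correlations among measures by trait and method and to compare the same-trait/different-method (convergent) correlations with the different-trait/same-method correlations. The sketch below (in Python) does this for a hypothetical matrix involving two emotional intelligence branches measured by two methods; all correlation values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical multitrait-multimethod correlation matrix for two constructs
# (emotion perception, emotion understanding) each measured by two methods
# (performance test, SJT). Values are invented for illustration only.
labels = ["percep_test", "underst_test", "percep_sjt", "underst_sjt"]
r = pd.DataFrame(
    [[1.00, 0.35, 0.55, 0.25],
     [0.35, 1.00, 0.30, 0.50],
     [0.55, 0.30, 1.00, 0.40],
     [0.25, 0.50, 0.40, 1.00]],
    index=labels, columns=labels)

# Convergent validities: same trait measured by different methods.
convergent = [r.loc["percep_test", "percep_sjt"],
              r.loc["underst_test", "underst_sjt"]]
# Heterotrait-monomethod correlations: different traits, same method.
same_method = [r.loc["percep_test", "underst_test"],
               r.loc["percep_sjt", "underst_sjt"]]

print(f"mean convergent r = {np.mean(convergent):.2f}")
print(f"mean heterotrait-monomethod r = {np.mean(same_method):.2f}")
```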

It is important to distinguish between methods and constructs because comparative evaluations of predictors might be meaningful only when one either (a) holds the method constant and varies the
content or (b) holds the constructs constant and varies the method. This is another reason why it is
crucial to operationalize emotional intelligence constructs via multiple methods. Moreover, it shifts
the attention from measures to constructs (Matthews et al., 2004). Similarly, the need to include
diversity in measurement also applies to the criterion side (see also Figure 16.1) because most stud-
ies on trait emotional intelligence are prone to common method variance (predictors and criteria
are measured with the same method, namely self-reports). We need studies that link the three intel-
ligence constructs to objective measures of the various performance constructs.

GOING BEYOND BIVARIATE RELATIONSHIPS


Current personnel selection research has gone beyond documenting simple bivariate relationships
between individual difference predictor and job performance criterion to examine mediator and
moderator relationships. Identifying mediators in the predictor-criterion relationship increases
our understanding of the prediction and helps in the search for alternative predictors or design of
interventions that influence individuals’ scores on the criteria (by understanding what might affect
the mediator). Research could attempt to explicate the precise affective, cognitive, motivational, and
behavioral mechanisms that mediate the effects of practical, emotional, or social intelligence on the
criterion and directly measure and test these hypothesized mediation mechanisms. For example,
cognitions and motivations (expectancy and instrumentality beliefs), or more subtle mediators (e.g.,
likeability), may mediate the intelligence effects on criteria such as job satisfaction and performance.
When an intelligence construct interacts with another predictor (e.g., personality trait) to affect
the criterion, the interaction effect is mathematically equivalent whether we select intelligence or
the other predictor as the moderator. However, conceptually, the predictor selected as the moderator
reflects different research questions. Identifying moderators that affect the magnitude and even nature
of the relationship between the intelligence and criterion constructs is important as the moderator
effect clarifies the range and boundary conditions of the predictive validity of the intelligence
construct. There has been increasing research examining moderator effects in the predictive validity
of personality traits (e.g., Barrick, Parks, & Mount, 2005). However, in the domain of practical,
emotional, and social intelligence, research on moderator effects on their validity is still scarce. For
instance, Côté and Miners (2006) found that emotional intelligence was linked to task performance
and organizational citizenship behavior (OCB) toward the organization only for people low on
cognitive ability. Another rare example is Ferris et al. (2001), who reported that the relationship
between social intelligence and job performance was stronger among workers high rather than low
in cognitive ability. On the other hand, when the intelligence construct is the moderator affecting
the relationship between another predictor and the criterion, the importance of the intelligence
construct is demonstrated not in terms of its bivariate predictive validity of the criterion, but in terms
of its role in determining the range and boundary conditions of the bivariate predictive validity
of another predictor. Several studies have demonstrated important moderator roles of practical,
emotional, and social intelligence constructs. For example, Witt and Ferris (2003) found that the
conscientiousness-performance relationship was moderated by social intelligence in that high
levels of conscientiousness together with poor social intelligence led to lower performance. Chan
(2006) found that proactive personality predicts work perceptions (procedural justice perception,
perceived supervisor support, social integration) and work outcomes (job satisfaction, affective
organizational commitment, job performance) positively among individuals with high practical
intelligence (construed in terms of situational judgment effectiveness) but negatively among those
with low practical intelligence. The findings on the disordinal interaction effects show that high
levels of proactive personality may be either adaptive or maladaptive depending on the individual’s
level of practical intelligence and caution against direct interpretations of bivariate associations
between proactive personality and work-relevant criteria. In short, fruitful future research could be
conducted by adopting a strategy that goes beyond bivariate relationships to examine the mediators
that link the intelligence construct to the criterion construct, the moderators that affect the nature
of the intelligence-criterion relationship, and the role of the intelligence construct as a moderator
affecting the nature of a predictor-criterion relationship.
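Interaction effects of the kind reported in these studies are typically tested with moderated regression, in which the product of the two predictors is added to their main effects. The following sketch (in Python with statsmodels; the data, variables, and effect sizes are simulated for illustration and do not reanalyze any published study) shows the basic setup for a practical intelligence by proactive personality interaction.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated moderated regression: does practical intelligence moderate the
# relation between proactive personality and job satisfaction? All values
# below are invented for illustration only.
rng = np.random.default_rng(3)
n = 400
proactive = rng.normal(size=n)
practical = rng.normal(size=n)
satisfaction = (0.1 * proactive + 0.2 * practical
                + 0.3 * proactive * practical + rng.normal(size=n))
df = pd.DataFrame(dict(proactive=proactive, practical=practical,
                       satisfaction=satisfaction))

# The '*' in the formula expands to both main effects plus their product term.
fit = smf.ols("satisfaction ~ proactive * practical", data=df).fit()
print(fit.params[["proactive", "practical", "proactive:practical"]])
```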

USING LONGITUDINAL VALIDATION DESIGNS


The time spans over which criteria are gathered for validation studies often reflect practical con-
siderations. In predictive studies, the time period selected for the criterion rarely exceeds a year or
two. Validation studies of practical intelligence, social intelligence, or emotional intelligence are no
exception. As such, criterion-related validities reported for these three constructs may or may not
accurately estimate the long-term validities associated with these constructs. That is, early perfor-
mance may not be reflective of typical performance over an individual’s tenure in an organizational
or educational context, and if so, early validation efforts would provide misleading results.
In the personnel selection domain, research has shown that predictors of job performance might
differ across job stages. Along these lines, the transitional job stage at which there is a need to learn
new things is typically contrasted with the more routine maintenance job stage (Murphy, 1989). For
instance, Thoresen, Bradley, Bliese, and Thoresen (2004) found that openness was related to per-
formance and performance trends in the transition stage but not to performance at the maintenance
stage. As another example, Jansen and Stoop (2001) discovered that the AC dimension of interper-
sonal effectiveness showed validity only after several years on the job.
We believe that future studies on practical, social, and emotional intelligence should also adopt a
longitudinal design where possible. Similar to personality, it might well be that the validity of these
intelligence constructs differs in the long run for predicting job performance. For example, the tran-
sitional job stage typically involves more adaptive demands than the routine maintenance job stage.
So, practical intelligence might predict job performance stronger in the transitional job stage than
in the routine maintenance job stage.
A construct-oriented approach to the study of practical, emotional, and social intelligence that
locates the constructs in the framework presented in Figure 16.1 would provide the conceptual basis
to hypothesize, test, and interpret performance changes over time. Using appropriate longitudinal
designs and change assessment techniques allows one to draw practical implications for key issues
such as changes in test validities, changes in mean performance, changes in rank order of individu-
als’ performance, and changes in dimensionality (i.e., number and/or nature of dimensions) of per-
formance (see Chan, 1998a, 2005a).
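A minimal first step toward such longitudinal designs is to estimate the predictor-criterion correlation separately for criterion data gathered at different job stages. The sketch below (in Python, with simulated data in which the stage difference is built in by construction) illustrates the comparison; it does not model intraindividual change, for which the change assessment techniques cited above are needed.

```python
import numpy as np
import pandas as pd

# Simulated illustration: criterion-related validity of a practical
# intelligence measure estimated separately for performance gathered in the
# transitional versus the maintenance job stage. Data are invented.
rng = np.random.default_rng(11)
n = 250
practical = rng.normal(size=n)
perf_transitional = 0.4 * practical + rng.normal(size=n)  # stronger link assumed
perf_maintenance = 0.1 * practical + rng.normal(size=n)   # weaker link assumed

df = pd.DataFrame(dict(practical=practical,
                       transitional=perf_transitional,
                       maintenance=perf_maintenance))
print(df.corr().loc["practical", ["transitional", "maintenance"]].round(2))
```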

ADOPTING A MULTILEVEL PERSPECTIVE


In many contexts, personnel selection researchers have to move beyond the individual level to con-
sider variables at the higher levels (e.g., group, organization) of analysis. For example, a study con-
cerned with identifying individual difference variables that predict work group performance has to
deal with constructs and data at the individual and group levels of analysis. In the conceptual frame-
work presented in Figure 16.1, the three intelligence constructs, and all of the other constructs in
the individual difference and criterion spaces, could be conceptualized, measured, and analyzed at
multiple levels of analysis (e.g., individual, group, organization).
So far, the research on practical, emotional, and social intelligence has not adopted a multilevel
approach. With the increasing reliance on the use of teams to accomplish work in various organizations,
the relevant job performance criteria are often at a higher level (e.g., team, organization) than the
individual level of analysis. When each of the three intelligence constructs is examined as a predictor
in the multilevel context of staffing teams or organizations, and is related to job performance at the
individual, team, and organizational levels, we would need appropriate composition models (Chan, 1998b)
that explicate the functional relationships linking the same intelligence construct across levels of
analysis, so that we have a clear conceptual understanding of, for example, what is meant by team social
intelligence and how to measure and analyze social intelligence at the team level. Unlike the traditional KSAOs,
which are single unitary constructs, the multidimensional nature of the practical, emotional, and social
intelligence constructs would pose challenges to multilevel research because of the increased difficulty
in formulating and testing appropriate composition models for these intelligence constructs.
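As a minimal illustration of one possible composition model (an additive or direct consensus model in which the team mean represents team social intelligence), the sketch below (in Python, with invented data) aggregates individual scores to team means and computes ICC(1) as one common check on whether individual scores share enough team-level variance to justify the aggregation.

```python
import numpy as np
import pandas as pd

# Simulated illustration of aggregating individual social intelligence scores
# to the team level. Teams, scores, and effect sizes are all invented.
rng = np.random.default_rng(5)
n_teams, team_size = 30, 5
team_effect = rng.normal(scale=0.5, size=n_teams)      # shared team component
scores = (team_effect[:, None] + rng.normal(size=(n_teams, team_size))).ravel()
team = np.repeat(np.arange(n_teams), team_size)
df = pd.DataFrame(dict(team=team, social_intel=scores))

# Team-level score under an additive/direct consensus composition model.
team_means = df.groupby("team")["social_intel"].mean()

# ICC(1) from the one-way ANOVA decomposition (equal team sizes assumed),
# as a check on whether individuals share team-level variance.
grand_mean = df["social_intel"].mean()
ms_between = team_size * ((team_means - grand_mean) ** 2).sum() / (n_teams - 1)
ms_within = df.groupby("team")["social_intel"].var(ddof=1).mean()
icc1 = (ms_between - ms_within) / (ms_between + (team_size - 1) * ms_within)
print(f"ICC(1) = {icc1:.2f}")
```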
Multilevel constructs and data bring with them complex conceptual, measurement, and data
analysis issues and discussion of these issues is beyond the scope of this chapter (for reviews, see
Chan, 1998b, 2005b). Our basic point is that a multilevel approach is a strategy for future research
on practical, emotional, and social intelligence that is not just desirable but probably necessary,
given the inherently multilevel nature of the criteria of interest (e.g., team performance) that are
emerging in personnel selection research.

EPILOGUE
We have, under the constraint of a relatively short chapter length, critically reviewed the vast litera-
ture on practical, emotional, and social intelligence constructs. We have proposed a conceptual
framework, adapted from Chan and Schmitt (2005), which provides a way to organize the conceptu-
alizations of the intelligence constructs and their relationships with other individual difference and
criterion constructs. We believe that this framework also reconciles some, if not most, of the findings
and debates in the literature on the intelligence constructs. Finally, by explicating several strategies
for future research, we hope that more scientifically rigorous studies will be conducted on practi-
cal, emotional, and social intelligence, providing practitioners in personnel selection and other HR
functions with a stronger evidence base for the use of these intelligence constructs and measures.

REFERENCES
Amelang, M., & Steinmayr, R. (2006). Is there a validity increment for tests of emotional intelligence in
explaining the variance of performance criteria? Intelligence, 34, 459–468.
Arthur, W., Day, E. A., McNelly, T. L., & Edens, P. S. (2003). A meta-analysis of the criterion-related validity
of assessment center dimensions. Personnel Psychology, 56, 125–154.
Arthur, W., & Villado, A. (2008). The importance of distinguishing between constructs and methods when
comparing predictors in personnel selection research and practice. Journal of Applied Psychology, 93,
435–442.
Austin, E. J., & Saklofske, D. H. (2005). Far too many intelligences? On the communalities and differences
between social, practical, and emotional intelligences. In R. Schulze & R. D. Roberts (Eds.), Emotional
intelligence—An international handbook (pp. 107–128). Cambridge, MA: Hogrefe & Huber.
Barchard, K. A. (2003). Does emotional intelligence assist in the prediction of academic success? Educational
and Psychological Measurement, 63, 840–858.
Bar-On, R. (1997). Bar-On Emotional Quotient Inventory: A measure of emotional intelligence. Toronto,
Ontario: Multi-Health Systems.
Barrick, M. R., Parks, L., & Mount, M. K. (2005). Self-monitoring as a moderator of the relationships between
personality traits and performance. Personnel Psychology, 58, 745–767.
Beaupré, M. G., Cheung, N., & Hess, U. (2000). The Montreal Set of Facial Displays of Emotion (Slides).
Available from Ursula Hess, Department of Psychology, University of Quebec at Montreal, P.O. Box
8888, Station“Centre-ville,” Montreal, Quebec H3C 3P8.
Bedwell, S., & Chuah, S. C. (2007, April). Video-based assessment of emotion perception: toward high fidel-
ity. Paper presented at the 22nd Annual Conference of the Society of Industrial and Organizational
Psychology, New York, NY.
Bobko, P., Roth, P. L., & Potosky, D. (1999). Derivation and implications of a meta-analytic matrix incorporat-
ing cognitive ability, alternative predictors, and job performance. Personnel Psychology, 52, 561–589.
Borkenau, P., & Liebler, A. (1993). Convergence of stranger ratings of personality and intelligence with self-
ratings, partner ratings, and measures of intelligence. Journal of Personality and Social Psychology, 65,
546–553.
Bowman, D. B., Markham, P. M., & Roberts, R. D. (2001). Expanding the frontier of human cognitive abilities:
So much more than (plain) g! Learning and Individual Differences, 13, 127–158.
Boyatzis, R. E., & Sala, F. (2004). Assessing emotional intelligence competencies. In G. Geher (Ed.), The
measurement of emotional intelligence: Common ground and controversy (pp. 147–180). Hauppauge,
NY: Nova Science.
Brackett, M. A., & Mayer, J. D. (2003). Convergent, discriminant, and incremental validity of competing mea-
sures of emotional intelligence. Personality and Social Psychology Bulletin, 29, 1147–1158.
Campbell, J. P., McCloy, R. A., Oppler, S. H., & Sager, C. E. (1993). A theory of performance. In N. Schmitt &
W. C. Borman (Eds.), Personnel selection in organizations (pp. 35–70). San Francisco, CA: Jossey-
Bass.
Chan, D. (1998a). The conceptualization of change over time: An integrative approach incorporating longitu-
dinal means and covariance structures analysis (LMACS) and multiple indicator latent growth modeling
(MLGM). Organizational Research Methods, 1, 421–483.
Chan, D. (1998b). Functional relations among constructs in the same content domain at different levels of
analysis: A typology of composition models. Journal of Applied Psychology, 83, 234–246.
Chan, D. (2000). Understanding adaptation to changes in the work environment: Integrating individual difference
and learning perspectives. Research in Personnel and Human Resources Management, 18, 1–42.
Chan, D. (2005a). Current directions in personnel selection. Current Directions in Psychological Science, 14,
220–223.
Chan, D. (2005b). Multilevel research. In F. T. L. Leong & J. T. Austin (Eds.), The psychology research hand-
book (2nd ed., pp. 401–418). Thousand Oaks, CA: Sage.
Chan, D. (2006). Interactive effects of situational judgment effectiveness and proactive personality on work
perceptions and work outcomes. Journal of Applied Psychology, 91, 475–481.
Chan, D. (2008). So why ask me?—Are self-report data really that bad? In C. E. Lance & R. J. Vandenberg.
(Eds.), Statistical and methodological myths and urban legends: Received doctrine, verity, and fable in
the organizational and social sciences (pp. 309–336). Hillsdale, NJ: Lawrence Erlbaum.
Chan, D., & Schmitt, N. (1997). Video-based versus paper-and-pencil method of assessment in situational judg-
ment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied
Psychology, 82, 143–159.
Chan, D., & Schmitt, N. (2002). Situational judgment and job performance. Human Performance, 15, 233–254.
Chan, D., & Schmitt, N. (2005). Situational judgment tests. In A. Evers, O. Smit-Voskuijl, & N. Anderson
(Eds.), Handbook of personnel selection (pp. 219–242). Oxford, England: Blackwell.
Costanzo, M., & Archer, D. (1993). The Interpersonal Perception Task-15 (Videotape). Berkeley, CA: University
of California Extension Media Center.
Côté, S., & Miners, C. (2006). Emotional intelligence, cognitive intelligence, and job performance.
Administrative Science Quarterly, 51, 1–28.
Daus, C. S., & Ashkanasy, N. M. (2005). The case for the ability-based model of emotional intelligence in
organizational behavior. Journal of Organizational Behavior, 26, 453–466.
Davies, M., Stankov, L., & Roberts, R. D. (1998). Emotional intelligence: In search of an elusive construct.
Journal of Personality and Social Psychology, 75, 989–1015.
De Raad, B. (2005). The trait-coverage of emotional intelligence. Personality and Individual Differences, 38,
673–687.
Diehl, M., Willis, S. L., & Schaie, K. W. (1995). Everyday problem-solving in older adults observational
assessment and cognitive correlates. Psychology and Aging, 10, 478–491.
Druskat, V., & Jordan, P. (2007). Emotional intelligence and performance at work. Manuscript submitted for
publication.
Ferris, G. R., Perrewé, P. M., & Douglas, C. (2002). Social effectiveness in organizations: Construct validity
and research directions. Journal of Leadership & Organizational Studies, 9, 49–63.
Ferris, G. R., Witt, L. A., & Hochwarter, W. A. (2001). Interaction of social skill and general mental ability on
job performance and salary. Journal of Applied Psychology, 86, 1075–1082.
Ford, M. E., & Tisak, M. S. (1983). A further search for social intelligence. Journal of Educational Psychology,
75, 196–206.
Fox, S., & Spector, P. E. (2000). Relations of emotional intelligence, practical intelligence, general intelligence,
and trait affectivity with interview outcomes: It’s not all just ‘G’. Journal of Organizational Behavior,
21, 203–220.
Funder, D. C. (1987). Errors and mistakes: Evaluating the accuracy of social judgment. Psychological Bulletin,
101, 75–90.
Gohm, C. L. (2004). Moving forward with emotional intelligence. Psychological Inquiry, 15, 222–227.
Goleman, D. (1995). Emotional intelligence: Why it can matter more than IQ. New York, NY: Bantam.
Gottfredson, L. S. (2003). Dissecting practical intelligence theory: Its claims and evidence. Intelligence, 31,
343–397.
Gowing, M. K. (2001). Measures of individual emotional competencies. In C. Cherniss & D. Goleman (Eds.),
The emotionally intelligent workplace (pp. 83–131). San Francisco, CA: Jossey-Bass.
Grubb, W. L., & McDaniel, M. A. (2007). The fakability of Bar-On’s Emotional Quotient Inventory Short
Form: Catch me if you can. Human Performance, 20, 43–59.
Hedlund, J., Forsythe, G. B., Horvath, J. A., Williams, W. M., Snook, S., & Sternberg, R. J. (2003). Identifying
and assessing tacit knowledge: Understanding the practical intelligence of military leaders. Leadership
Quarterly, 14, 117–140.
Hedlund, J., Wilt, J. M., Nebel, K. L., Ashford, S. J., & Sternberg, R. J. (2006). Assessing practical intelligence
in business school admissions: A supplement to the graduate management admissions test. Learning and
Individual Differences, 16, 101–127.
Hoepfner, R., & O'Sullivan, M. (1968). Social intelligence and IQ. Educational and Psychological Measurement,
28, 339–344.
Hogan, R., & Shelton, D. (1998). A socioanalytic perspective on job performance. Human Performance, 11, 129–144.
Huffcutt, A. I., Conway, J. M., Roth, P. L., & Stone, N. J. (2001). Identification and meta-analytic assessment
of psychological constructs measured in employment interviews. Journal of Applied Psychology, 86,
897–913.
Jaeger, A. J. (2003). Job competencies and the curriculum: An inquiry into emotional intelligence in graduate
professional education. Research in Higher Education, 44, 615–639.
Jansen, P. G. W., & Stoop, B. A. M. (2001). The dynamics of assessment center validity: Results of a 7-year
study. Journal of Applied Psychology, 86, 741–753.
Jensen, A. R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger.
Jones, K., & Day, J. D. (1997). Discrimination of two aspects of cognitive-social intelligence from academic
intelligence. Journal of Educational Psychology, 89, 486–497.
Jordan, P. J., Ashkanasy, N. M., Hartel, C. E., & Hooper, G. S. (2002). Workgroup emotional intelligence.
Scale development and relationship to team process effectiveness and goal focus. Human Resource
Management Review, 12, 195–214.
Keating, D. P. (1978). Search for social intelligence. Journal of Educational Psychology, 70, 218–223.
Kenny, D. A. (1991). A general-model of consensus and accuracy in interpersonal perception. Psychological
Review, 98, 155–163.
Lance, C. E., Lambert, T. A., Gewin, A. G., Lievens, F., & Conway, J. M. (2004). Revised estimates of dimen-
sion and exercise variance components in assessment center postexercise dimension ratings. Journal of
Applied Psychology, 89, 377–385.
Landy, F. J. (2005). Some historical and scientific issues related to research on emotional intelligence. Journal
of Organizational Behavior, 26, 411–424.
Landy, F. J. (2006). The long, frustrating, and fruitless search for social intelligence: A cautionary tale. In K.
R. Murphy (Ed.), A critique of emotional intelligence: What are the problems and how can they be fixed?
(pp. 81–123). Mahwah, NJ: Lawrence Erlbaum.
Lane, R. D., Quinlan, D. M., Schwartz, G. E., Walker, P. A., & Zeitlin, S. B. (1990). The levels of emotional
awareness scale—A cognitive-developmental measure of emotion. Journal of Personality Assessment,
55, 124–134.
Law, K. S., Wong, C. S., & Song, L. J. (2004). The construct and criterion validity of emotional intelligence and
its potential utility for management studies. Journal of Applied Psychology, 89, 483–496.
Legree, P. J., Heffner, T. S., Psotka, J., Martin, D. E., & Medsker, G. J. (2003). Traffic crash involvement:
Experiential driving knowledge and stressful contextual antecedents. Journal of Applied Psychology, 88,
15–26.
Lievens, F., Buyse, T., & Sackett, P. R. (2005). The operational validity of a video-based situational judgment
test for medical college admissions: Illustrating the importance of matching predictor and criterion con-
struct domains. Journal of Applied Psychology, 90, 442–452.
Lievens, F., Harris, M. M., Van Keer, E., & Bisqueret, C. (2003). Predicting cross-cultural training perfor-
mance: The validity of personality, cognitive ability, and dimensions measured by an assessment center
and a behavior description interview. Journal of Applied Psychology, 88, 476–489.
Lievens, F., & Sackett, P.R. (2006). Video-based versus written situational judgment tests: A comparison in
terms of predictive validity. Journal of Applied Psychology, 91, 1181–1188.
Locke, E. A. (2005). Why emotional intelligence is an invalid concept. Journal of Organizational Behavior,
26, 425–431.
MacCann, C., & Roberts, R. D. (2008). New paradigms for assessing emotional intelligence: Theory and data.
Emotion, 8, 540–551.
Marlowe, H. A. (1986). Social intelligence—Evidence for multidimensionality and construct independence.
Journal of Educational Psychology, 78, 52–58.
Matsumoto, D., LeRoux, J., Wilson-Cohn, C., Raroque, J., Kooken, K., Ekman, P., et al. (2000). A new test
to measure emotion recognition ability: Matsumoto and Ekman’s Japanese and Caucasian Brief Affect
Recognition Test (JACBART). Journal of Nonverbal Behavior, 24, 179–209.
Matthews, G., Roberts, R. D., & Zeidner, M. (2004). Seven myths about emotional intelligence. Psychological
Inquiry, 15, 179–196.
Matthews, G., Zeidner, M., & Roberts, R. R. (2007). Emotional intelligence: Consensus, controversies, and
questions. In G. Matthews, M. Zeidner, & R. R. Roberts (Eds.), The science of emotional intelligence—
Knowns and unknowns (pp. 3–46). New York, NY: Oxford University Press.
Mayer, J. D., & Geher, G. (1996). Emotional intelligence and the identification of emotion. Intelligence, 22,
89–113.
Mayer, J. D., Roberts, R. D., & Barsade, S. G. (2008). Human abilities: Emotional intelligence. Annual Review
of Psychology, 59, 507–536.
Mayer, J. D., & Salovey, P. (1993). The intelligence of emotional intelligence. Intelligence, 17, 433–442.
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L. (2007). Situational judgment tests, response
instructions, and validity: A meta-analysis. Personnel Psychology, 60, 63–91.
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001). Use of situ-
ational judgment tests to predict job performance: A clarification of the literature. Journal of Applied
Psychology, 86, 730–740.
McDaniel, M. A., & Whetzel, D. L. (2005). Situational judgment test research: Informing the debate on practi-
cal intelligence theory. Intelligence, 33, 515–525.
McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results—
The relationship between predictor and criterion domains. Personnel Psychology, 43, 335–354.
Murphy, K. R. (1989). Is the relationship between cognitive ability and job performance stable over time?
Human Performance, 2, 183–200.
Newsome, S., Day, A. L., & Catano, V. M. (2000). Assessing the predictive validity of emotional intelligence.
Personality and Individual Differences, 29, 1005–1016.
Nguyen, N. T., Biderman, M. D., & McDaniel, M. A. (2005). Effects of response instructions on faking a situ-
ational judgment test. International Journal of Selection and Assessment, 13, 250–260.
Nowicki, S. (2004). A manual for the Diagnostic Analysis of Nonverbal Accuracy tests (DANVA). Atlanta, GA:
Department of Psychology, Emory University.
O’Sullivan, M. (2007). Trolling for trout, trawling for tuna. In Matthews, G., Zeidner, M., & Roberts, R. R.
(Eds.), The science of emotional intelligence—Knowns and unknowns (pp. 258–287). New York, NY:
Oxford University Press.
O’Sullivan, M., Guilford, J. P., & deMille, R. (1965). The measurement of social intelligence (Psychological
Laboratory Report No. 34). Los Angeles, CA: University of Southern California.
Palmer, B., & Stough, C. (2001). The measurement of emotional intelligence. Australian Journal of Psychology,
53, 85–85.
Parker, J. D., Hogan, M. J., Eastabrook, J. M., Oke, A., & Wood, L. M. (2006). Emotional intelligence and
student retention: Predicting the successful transition from high school to university. Personality and
Individual Differences, 41, 1329–1336.
Pérez, J. C., Petrides, K. V., & Furnham, A. (2005). Measuring trait emotional intelligence. In Schulze, R. &
Roberts, R. D. (Eds.), Emotional intelligence—An international handbook (pp. 181–201). Cambridge,
England: Hogrefe & Huber.
Petrides, K. V., & Furnham, A. (2003). Trait emotional intelligence: Behavioural validation in two studies of
emotion recognition and reactivity to mood induction. European Journal of Personality, 17, 39–57.
Riggio, R. E. (1986). Assessment of basic social skills. Journal of Personality and Social Psychology, 51, 649–660.
Roberts, R. D., Schulze, R., O’Brien, K., MacCann, C., Reid, J., & Maul, A. (2006). Exploring the validity of
the Mayer-Salovey-Caruso emotional intelligence test (MSCEIT) with established emotions measures.
Emotion, 6, 663–669.
Roberts, R. D., Schulze, R., Zeidner, M., & Matthews, G. (2005). Understanding, measuring, and applying
emotional intelligence. In Schulze, R. & Roberts, R. D. (Eds.), Emotional intelligence—An international
handbook (pp. 311–336). Cambridge, England: Hogrefe & Huber.
Rosenthal, R., Hall, J. A., DiMatteo, M. R., Rogers, P. L., & Archer, D. (1979). Sensitivity to nonverbal com-
munication: The PONS test. Baltimore, MD: Johns Hopkins University Press.
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment,
credentialing, and higher education—Prospects in a post-affirmative-action world. American
Psychologist, 56, 302–318.
Sala, F. (2002). Emotional competence inventory: Technical manual. Philadelphia, PA: McClelland Center For
Research, HayGroup.
Salovey, P., & Mayer, J. D. (1990). Emotional intelligence. Imagination, Cognition, and Personality, 9,
185–211.
Salovey, P., Mayer, J., Goldman, S., Turvey, C., & Palfai, T. (1995). Emotional attention, clarity and repair:
Exploring emotional intelligence using the Trait Meta-Mood Scale. In J. W. Pennebaker (Ed.), Emotion,
disclosure, and health (pp. 125–154). Washington, DC: American Psychological Association.
Scherer, K. R., Banse, R., & Wallbott, H. G. (2001). Emotion inferences from vocal expression correlate across
languages and cultures. Journal of Cross-Cultural Psychology, 32, 76–92.
Schmit, M. J. (2006). EI in the business world. In K. R. Murphy (Ed.), A critique of emotional intelligence:
What are the problems and how can they be fixed? Mahwah, NJ: Lawrence Erlbaum.
Schneider, R. J., Ackerman, P. L., & Kanfer, R. (1996). To “act wisely in human relations”: Exploring the
dimensions of social competence. Personality and Individual Differences, 21, 469–481.
Schulze, R., Wilhelm, O., & Kyllonen, P. C. (2007). Approaches to the assessment of emotional intelligence. In
G. Matthews, M. Zeidner, & R. R. Roberts, (Eds.), The science of emotional intelligence—Knowns and
unknowns (pp. 199–229). New York, NY: Oxford University Press.
Practical Intelligence, Emotional Intelligence, and Social Intelligence 359

Schutte, N. S., Malouff, J. M., Hall, L. E., Haggerty, D. J., Cooper, J. T., Golden, C. J., et al. (1998). Development
and validation of a measure of emotional intelligence. Personality and Individual Differences, 25,
167–177.
Semadar, A., Robins, G., & Ferris, G. R. (2006). Comparing the validity of multiple social effectiveness
constructs in the prediction of managerial job performance. Journal of Organizational Behavior, 27,
443–461.
Spector, P. E., & Johnson, H. M. (2006). Improving the definition, measurement, and application of emotional
intelligence. In K. R. Murphy (Ed.), A critique of emotional intelligence: What are the problems and how
can they be fixed? (pp. 325–344). Mahwah, NJ: Lawrence Erlbaum.
Sternberg, R. J. (1988). The triarchic mind: A new theory of human intelligence. New York, NY: Penguin
Books.
Sternberg, R. J., Wagner, R. K., Williams, W. M., & Horvath, J. A. (1995). Testing common-sense. American
Psychologist, 50, 912–927.
Stricker, L. J., & Rock, D. A. (1990). Interpersonal competence, social intelligence, and general ability.
Personality and Individual Differences, 11, 833–839.
Tan, H. T., & Libby, R. (1997). Tacit managerial versus technical knowledge as determinants of audit expertise
in the field. Journal of Accounting Research, 35, 97–113.
Terpstra, D. E., Mohamed, A. A., & Kethley, R. B. (1999). An analysis of federal court cases involving nine
selection devices. International Journal of Selection and Assessment, 7, 26–34.
Tett, R. P., Fox, K. E., & Wang, A. (2005). Development and validation of a self-report measure of emo-
tional intelligence as a multidimensional trait domain. Personality and Social Psychology Bulletin, 31,
859–888.
Thoresen, C. J., Bradley, J. C., Bliese, P. D., & Thoresen, J. D. (2004). The big five personality traits and indi-
vidual job performance growth trajectories in maintenance and transitional job stages. Journal of Applied
Psychology, 89, 835–853.
Thorndike, E. L. (1920). Intelligence and its uses. Harper’s Magazine, 140, 227–235.
Van der Zee, K., Thijs, M., & Schakel, L. (2002). The relationship of emotional intelligence with academic
intelligence and the Big Five. European Journal of Personality, 16, 103–125.
Van Iddekinge, C. H., Raymark, P. H., & Roth, P. L. (2005). Assessing personality with a structured employ-
ment interview: Construct-related validity and susceptibility to response inflation. Journal of Applied
Psychology, 90, 536–552.
Van Rooy, D. L., Dilchert, S., Viswesvaran, C., & Ones, D. S. (2006). Multiplying intelligences: Are general,
emotional, and practical intelligences equal? In K. R. Murphy (Ed.), A critique of emotional intelligence:
What are the problems and how can they be fixed? (pp. 235–262). Mahwah, NJ: Lawrence Erlbaum.
Van Rooy, D. L., & Viswesvaran, C. (2004). Emotional intelligence: A meta-analytic investigation of predictive
validity and nomological net. Journal of Vocational Behavior, 65, 71–95.
Van Rooy, D. L., Viswesvaran, C., & Pluta, P. (2005). An evaluation of construct validity: What is this thing
called emotional intelligence? Human Performance, 18, 445–462.
Wagner, R. K. (1987). Tacit knowledge in everyday intelligent behavior. Journal of Personality and Social
Psychology, 52, 1236–1247.
Wagner, R. K., & Sternberg, R. J. (1985). Practical intelligence in real-world pursuits—The role of tacit knowl-
edge. Journal of Personality and Social Psychology, 49, 436–458.
Weis, S., & Süss, H.-M. (2005). Social intelligence—A review and critical discussion of measurement con-
cepts. In Schulze, R. & Roberts, R. D. (Eds.), Emotional intelligence—An international handbook
(pp. 203–230). Cambridge, UK: Hogrefe & Huber.
Wilhelm, O. (2005). Measures of emotional intelligence: Practice and standards. In R. Schulze & R. D. Roberts
(Eds.), Emotional intelligence—An international handbook (pp. 131–154). Cambridge, England: Hogrefe
& Huber.
Witt, L. A., & Ferris, G. R. (2003). Social skill as moderator of the conscientiousness—Performance relation-
ship: Convergent results across four studies. Journal of Applied Psychology, 88, 809–820.
Wong, C. M., Day, J. D., Maxwell, S. E., & Meara, N. M. (1995). A multitrait-multimethod study of academic
and social intelligence in college-students. Journal of Educational Psychology, 87, 117–133.
Wong, C. S., & Law, K. S. (2002). The effects of leader and follower emotional intelligence on performance
and attitude: An exploratory study. Leadership Quarterly, 13, 243–274.
Zeidner, M., Matthews, G., & Roberts, R. D. (2004). Emotional intelligence in the workplace: A critical review.
Applied Psychology—an International Review, 53, 371–399.
Part 4
Decisions in Developing, Selecting,
Using, and Evaluating Predictors
Ann Marie Ryan and Neal W. Schmitt,
Section Editors
17 Decisions in Developing and
Selecting Assessment Tools

Nancy T. Tippins, Jone M. Papinchock, and Emily C. Solberg

INTRODUCTION
Typically, employers embrace testing for several reasons: identifying capable applicants; enhancing
the productivity of their workforce; minimizing error, waste, and accidents; complying with Equal
Employment Opportunity (EEO) regulations; and minimizing staffing costs. Once employers have
decided that formal selection procedures will be useful, they then must determine what measures
best meet the organization’s needs. This chapter reviews the decisions that must be made in choos-
ing the selection procedure and identifying the criteria that must be considered, as well as the poten-
tial ramifications of those decisions.
There are four basic decisions associated with developing a selection system: (a) What con-
structs should be measured in the selection system? (b) How should the chosen constructs be
measured? (c) How should validity evidence be gathered? and (d) How should the resulting
scores be used? Answers to each of these questions are dependent upon several criteria. The
choice of constructs to be measured must take into account criteria such as the feasibility of
measuring a construct; the number of knowledge, skills, abilities, and other personal charac-
teristics (KSAOs) to be measured; the utility of measuring multiple constructs; and the goals
of the organization. The decision regarding how to measure the chosen constructs depends on
factors such as the anticipated validity and utility of various kinds of tests, the appropriateness
and feasibility of different validation strategies, the potential group differences in test scores and
adverse impact, the availability of alternative selection procedures, possible legal ramifications,
applicant reactions, and a host of administrative concerns including costs of development and
administration, time requirements for development and administration, and feasibility within
the staffing context. Resolving the question of score use must take into account the form of the
test score, the ways in which test scores can be combined and weighted, and the procedures for
choosing cutoff scores and bands.
Although each criterion will be discussed independently, none can be viewed in isolation. Nor
can the criteria be considered in strict sequence. One has an effect on another, and decisions may
need to be revisited as additional criteria are considered. Similarly, many of the criteria must be
considered simultaneously; optimization of one may have a direct effect on another. For example,
most employers cannot simply choose the test with the highest validity without consideration of the
cost and feasibility of administration in the staffing context.
Organizations pursue different goals when developing and implementing a selection procedure,
and these goals define which criteria are most relevant to the decisions that must be made. Some
common goals include minimizing costs, maximizing prediction of performance, avoiding legal
challenges, achieving a diverse workforce, and engendering positive applicant reactions toward
the job and organization. When the goal is maximizing performance, criteria related to valid-
ity may be paramount; when the goal is cost control, criteria related to costs and time may be most important. Often these goals are overlapping. For example, positive applicant reactions often
increase an applicant’s propensity to remain in the application process and thus reduce the costs
of recruiting. A diverse work force may increase the productivity of the organization or enhance
its ability to serve its customers. Avoidance of challenges to the selection system certainly reduces
overall organizational costs in the long term. In addition, the choice of one goal may heighten
the importance of another. When controlling costs takes precedence over other goals, criteria
related to the efficiency of administration may become more important and applicant reactions
less important.
The value placed on each of the goals differs by organization. Some organizations may place a
greater value on accuracy of prediction whereas others may focus on creating a diverse workforce
and engendering positive applicant reactions. Thus, an important step in any selection project is to
establish a clear understanding of the goals that the organization hopes to achieve.
The remainder of the chapter reviews the basic decisions and the criteria used in making those
decisions. It also highlights which issues are most important for the achievement of the organiza-
tion’s particular goals.

WHICH CONSTRUCTS SHOULD BE MEASURED?


A starting point in the decision-making process is the choice of what constructs to measure (see
Chapters 12–16, this volume, for more information on the measurement of specific constructs).
Should the selection process include a single measure of cognitive abilities (e.g., math), personality
factors (e.g., conscientiousness), or physical abilities (e.g., strength), or measures of more than one
of these constructs? The extent to which the whole content domain must be measured by a selection
process is not clear. Should all important KSAOs be measured? The most important ones? Those
that can be measured most efficiently? Some important criteria to consider when answering this
question are discussed below.

IMPORTANCE AND NEEDED AT ENTRY


Legal and professional guidelines such as the Uniform Guidelines on Employee Selection Procedures
(Uniform Guidelines; Equal Employment Opportunity Commission, 1978), the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999), and the Principles for the Validation and Use of Personnel Selection Procedures
of the Society for Industrial and Organizational Psychology (SIOP Principles; SIOP, 2003) make it
clear that tests measuring only the important KSAOs that are needed at entry to the job should be
chosen or constructed. If math and reading are important KSAOs for completing tasks on a job, and
job incumbents are required to have these KSAOs when they begin work, tests of math and reading
are likely to be predictive of job performance.
However, performance on a test should not be moderated by KSAOs that are not required for
the job. For example, a math test administered on a computer might not be appropriate if the job
does not require use of computers. If a computerized test is used, the test administration should
begin with a tutorial of computer functions that leads to a level of mastery sufficient for taking
the test. Similarly, a measure of manual dexterity that requires the test-taker to read detailed
instructions will confound the measure of manual dexterity with reading proficiency, and steps
must be taken to communicate the test directions in another form if reading is not also required
for the job.
Further complications may arise if an applicant for a job requiring manual dexterity skills
but not reading skills has a visual disability that makes reading printed instructions impossible.
If this applicant meets the requirements for protection under disability laws [e.g., the Americans
with Disabilities Act (ADA) and the ADA Amendments Act of 2008], the organization must find
a different method for assessing the applicant’s manual dexterity skills. Such issues make a clear delineation between the KSAOs to be measured (that are important to job performance) and the
KSAOs required to take the test a critical part of deciding the type of test to use.

FEASIBILITY OF MEASURING THE CONSTRUCT


Some important constructs are notoriously difficult to measure accurately. For example, highly pre-
dictive measures of an individual’s integrity are difficult to find or develop. Consequently, a factor
in deciding which construct to measure will be the extent to which the construct can be measured
validly and reliably.
Even when a set of constructs can be measured, organizational constraints such as those imposed
by the organization’s staffing context can limit which of the constructs will be measured. For exam-
ple, if an employer can only administer computer-based tests over the Internet, a direct measure of
oral communication skills or physical abilities would not be possible.

NUMBER OF KSAOS TO MEASURE


In practice, it is not clear how many KSAOs should be measured and how many tests should be used
to assess the job content domain (see Chapter 7, this volume, for an additional discussion on choos-
ing predictors). When criterion-oriented validity data are available, the researcher can make deci-
sions about the number of tests on the basis of the incremental validity of additional tests. However,
in most content-validity contexts, such data are not available. As noted in the SIOP Principles,
“The sufficiency of the match between (the) selection procedure and work domain is a matter of
professional judgment based on evidence collected in the validation effort” (SIOP Principles, 2003,
p. 25).
A thorough job analysis often results in far more important KSAOs that are needed at entry than
are feasible to test (see Chapter 4, this volume, for more information regarding work analysis.) The
SIOP Principles (2003) indicated that measurement of all the important KSAOs is not necessary:
“Not every element of the work domain needs to be assessed. Rather, a sample of the work behaviors,
activities, and worker KSAOs can provide a good estimate of the predicted work performance”
(p. 24).
In contrast, Goldstein, Zedeck, and Schneider (1993) indicated that measuring only 10–20% of a
job would be problematic. They suggested using a guide of measuring KSAOs that linked to at least
50% of the tasks. However, they noted that measuring the KSAOs linked to 50% of the tasks might
not be possible in some cases.
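As a purely illustrative sketch of this kind of coverage check (the tasks, KSAO labels, and linkages below are invented rather than drawn from any study cited here), one could compute the proportion of job-analysis tasks linked to the KSAOs that a proposed test battery would measure:

```python
# A minimal sketch of a job-content coverage check; all task and KSAO labels
# and linkages are hypothetical.
task_to_ksaos = {
    "resolve_billing_disputes": {"problem_solving", "service_orientation"},
    "schedule_field_visits": {"planning", "problem_solving"},
    "log_customer_contacts": {"attention_to_detail"},
    "explain_new_products": {"oral_communication", "service_orientation"},
}
measured_ksaos = {"problem_solving", "service_orientation"}

# A task counts as covered if at least one of its linked KSAOs is measured.
covered = [task for task, ksaos in task_to_ksaos.items() if ksaos & measured_ksaos]
coverage = len(covered) / len(task_to_ksaos)
print(f"Tasks linked to at least one measured KSAO: {coverage:.0%}")
```

A result below whatever threshold the organization adopts (for example, the 50% guide mentioned above) would argue for measuring additional KSAOs.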
For practitioners in the United States, the salience of the question of how much of the job content
should be represented by tests increases when one realizes that this very question has been litigated
on multiple occasions. When a selection practice is challenged under Title VII of the Civil Rights
Act of 1964 as amended in 1991 (Title VII), the user of a test(s) supported on the basis of content-
oriented validity may need to defend how well the job content domain is represented by the test(s).
Court opinions have varied on how the sufficiency of job content representation is determined; thus,
little general guidance can be gleaned from these opinions.
Similar concerns about the extent to which the job domain is covered also apply to tests that are
validated using a criterion-oriented strategy. When the user has relied upon a criterion-oriented
strategy, the problem may actually be exacerbated if no formal content validation was done, because
there may be no quantitative support for the number of KSAOs to measure. Many practitioners
argue that a test measuring a single, important KSAO can be demonstrated to be job-related and
a business necessity by virtue of the results of the job analysis and the criterion-oriented validity
study. However, efficient practice may conflict with the goal of avoiding challenges because regulatory agencies and courts may question the use of a test of a single, albeit important, KSAO on the grounds that even strong criterion-oriented validity coefficients do not explain a great deal of the variance in performance.

Another dimension of the issue of how much of the job domain to cover is the utility gained by
measuring multiple KSAOs. Many organizations attempt to balance their needs for accurate evalu-
ation of candidates’ skills, cost-effective staffing procedures, and legal compliance. Consequently,
they question the addition of a test that adds little incremental validity although additional KSAOs
are being measured.

IMPORTANCE OF CRITERIA RELATIVE TO THE GOALS OF THE ORGANIZATION


Although the goals of an organization typically have little effect on the requirement that all
constructs measured must be important and needed at entry, organizations have differing views on
the number of KSAOs to measure (see Chapters 9–11, this volume, for more information regarding
employee selection and organizational strategy). For example, organizations that are focused on
the cost-effectiveness of their selection programs will pay a great deal of attention to the number
of constructs measured and their utility. These organizations attempt to control the costs of test
development, validation, and administration by using fewer measures and ensuring that each
measure adds substantially to the overall validity of the selection process. In contrast, organizations
focused on the legal defensibility of a selection process may be more likely to include tests that
provide broader coverage of the domain of critical KSAOs.

HOW SHOULD THE CONSTRUCTS BE MEASURED?


Once the constructs to be measured are identified, the organization must determine the best way to
measure them. Many methods of testing exist to measure different content areas: multiple-choice
tests of cognitive abilities, demonstrations of job skills through work samples, interviews measur-
ing interpersonal and communication skills, etc., and each has its advantages and disadvantages.
Testing professionals must weigh their measurement options against several criteria.

TIMING
An important organizational consideration in the choice of selection tests is timing: How quickly
can the tests be deployed? Some employers have the luxury of time to develop a new selection
program that might be validated locally while continuing the use of an existing program; however,
many organizations lack an existing selection process and need to rapidly develop and install one.
The immediate need for a selection process may guide an organization toward off-the-shelf tests that
can be quickly validated through a generalizability study or a content-oriented validity strategy.

GROUP DIFFERENCES IN TEST SCORE MEANS AND ADVERSE IMPACT IN SELECTION DECISIONS
Regardless of the requirements of the applicable EEO laws, all organizations that desire a diverse
workforce and want to avoid unnecessarily eliminating members of protected groups are concerned
about group differences in test score means. Thus, another criterion that often informs decisions
about the choice of tests is group differences in test performance.
Group differences may be assessed in several ways. In addition to statistical analyses of mean score differences, “adverse impact,” a legal term in the United States, can be evaluated with more than one method. Perhaps the most common method, described in the Uniform Guidelines, is to determine whether the selection rate of a protected group (e.g., female applicants, Hispanic applicants) is less than 80% of the selection rate for the majority group (e.g., male applicants, White applicants).
Another way to calculate adverse impact is to compare the two selection rates using a statistical test
(e.g., Fisher’s exact test). Regardless of how group mean differences or adverse impact are assessed,
organizations must decide whether to avoid, reduce, or eliminate them through their choice of tests,
decisions on cutoff scores, or some other strategy such as alternate recruitment strategies.
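As a concrete illustration, the brief Python sketch below (with hypothetical applicant counts) applies both approaches to the same hiring data: the four-fifths rule expressed as an impact ratio, and Fisher's exact test on the 2 × 2 table of selection outcomes. Neither the numbers nor the code come from the chapter itself.

```python
# A minimal sketch of two common adverse impact checks on hypothetical data.
from scipy.stats import fisher_exact

# Hypothetical applicant counts: hired and not hired for each group.
majority = {"hired": 120, "not_hired": 80}    # selection rate = 0.60
protected = {"hired": 40, "not_hired": 60}    # selection rate = 0.40

maj_rate = majority["hired"] / (majority["hired"] + majority["not_hired"])
prot_rate = protected["hired"] / (protected["hired"] + protected["not_hired"])

# Four-fifths rule: flag if the protected group's rate is < 80% of the majority rate.
impact_ratio = prot_rate / maj_rate
flags_four_fifths = impact_ratio < 0.80

# Statistical comparison: 2 x 2 table of hired vs. not hired by group.
table = [[majority["hired"], majority["not_hired"]],
         [protected["hired"], protected["not_hired"]]]
odds_ratio, p_value = fisher_exact(table)

print(f"Impact ratio: {impact_ratio:.2f} (four-fifths flag: {flags_four_fifths})")
print(f"Fisher's exact test p-value: {p_value:.4f}")
```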

CONSIDERATION OF ALTERNATIVES
Another important factor when deciding what constructs to measure, how to measure them, and
how to combine the scores is consideration of alternate selection procedures. During this process,
an organization may compare various measures, combinations of measures, or methods of making
selection decisions (e.g., setting pass/fail cutoff scores, banding scores, using top-down selection)
to determine which ones result in the highest validities with the least amount of group differences.
In the United States, this quest for alternate selection procedures is an expectation first set by the
Uniform Guidelines.
It is not clear what was envisioned when the Uniform Guidelines or APA Standards were
written in terms of the steps that should be taken to comply with the consideration of alternatives
provision. From the practitioner’s perspective, many unanswered questions remain regarding the
effort that must be exerted to find acceptable alternatives. What is a legitimate alternative—a
different measure of the same construct, a measure of a different construct, different ways of
combining measures, different cutoff scores? In addition, there are practical issues to consider
when attempting to compare alternatives. For example, a comparison of the validity and adverse
impact of two alternatives assumes the availability of criterion-oriented validity data and adverse
impact data. Content-oriented validity strategies do not yield validity coefficients, and many do
not produce estimates of adverse impact unless the tests have been administered in a pilot study.
Often, validity data from criterion-oriented studies and adverse impact data are not available, or
such information has not been collected in similar situations (e.g., for similar jobs, using a similar
criterion).
If studies containing validity coefficients, group means, and adverse impact data can be obtained
for tests under consideration, the next step is to make comparisons among the tests. The Uniform
Guidelines indicate:

Where two or more selection procedures are available which serve the user’s legitimate interest in
efficient and trustworthy workmanship, and which are substantially equally valid for a given purpose,
the user should use the procedure which has been demonstrated to have the lesser adverse impact.
(Section 3B)

However, there is no guidance on what is “substantially equally valid” nor on what has “lesser
impact.” Thus, even if two tests have comparable validity coefficients and adverse impact data avail-
able, there are no common methods in the field for determining how much of a difference in valid-
ity coefficients or adverse impact findings is sufficient to warrant the use of one test over the other.
The test user must make this judgment. These difficulties with the process, in not only locating the
information but also determining how to make the comparisons, may explain why, in practice, some
test users have not made thorough reviews as pointed out by Guion (1998).
Broad reviews of alternative selection procedures may be limited in some circumstances. For
example, it is not uncommon in consulting firms that have a set of proprietary tests with robust
criterion-oriented validity evidence to choose from among those tests when developing a selection
process. Thus, the consideration of alternatives occurs by making decisions only on the basis of the
firm’s own tests. In fact, an organization that hires a particular consulting firm because of its tests
might question the consulting firm if tests from other firms are recommended.
Further, if a human resources professional or operating manager is choosing the tests without
the assistance of a qualified testing professional, he or she may not have the background necessary
to make nuanced judgments about the validity of different tests (Murphy & Davidshofer, 1988).
Moreover, decision-makers lacking a testing background may be easily swayed by test advertisements
that broadly claim the “validity” of a test, suggest a lack of adverse impact, or extol the ease of
administration. Test users who fail to understand that validity is not a characteristic of a test, but
rather a description of how well the test score predicts performance or other relevant criteria are not
in a position to critically evaluate such claims.

Regardless of the availability of validity and adverse impact data and the difficulty in interpret-
ing observed differences, the presence of a “better” alternative will have direct implications for the
choice of measure.

CONSIDERATION OF LEGAL RAMIFICATIONS


Although theoretical information and research findings should always be the primary drivers of
the choice of which tests to use, the outcome of previous litigation can inform the test user about
the potential ramifications of those choices, particularly in the United States (see Chapters 29 and
30, this volume, for a more detailed account of legal issues related to employee selection). Some
types of tests attract more legal scrutiny than others. Certainly those that generally produce large
group differences (e.g., multiple-choice measures of cognitive ability) are more often challenged
than those with smaller group differences (all other factors being equal). Additionally, tests in the
form of structured or unstructured interviews are less frequently reviewed than multiple-choice
tests in general, possibly because of the ubiquity of the use of interviews.
In addition to concerns about the likelihood of legal challenges, some organizations attempt to
avoid administrative challenges such as grievances and arbitrations by labor unions resulting from
their choice of tests. For example, an organization may avoid a personality test that uses a lie scale
or a social desirability scale because of its union’s negative reaction to these scales. Indeed, many
organizations with a represented labor force will avoid any selection tool that does not have univer-
sally right and wrong answers.
The type of test used in combination with the feasibility of an appropriate validity study may also
have legal implications. For example, when a criterion-oriented validity study is not feasible, fewer
questions may be raised about a work sample test that is obviously job-related than a personality
inventory or a more abstracted measure of cognitive ability. Thus, when a criterion-oriented valida-
tion strategy is not possible, some organizations will consider only those tools for which a content-
oriented strategy will provide compelling evidence.

ADMINISTRATIVE CONCERNS
In addition to the considerations of validity, adverse impact, and costs and timing of test development
and validation discussed above, administrative concerns are critical factors in the decision regarding
what test to use (see Chapter 7, this volume, for additional information on administrative concerns
associated with employee testing). The use of a particular test that is valid and complies with
legal guidelines and professional standards may not be feasible in the context of an organization’s
staffing environment. For example, a multiple-choice test of cognitive ability that must be proctored
is not practical in an organization that processes applications via the Internet and does not have
the facilities and staff available to administer tests in proctored settings. Similarly, an assessment
center designed to evaluate customer service skills may not be cost-effective when the hiring rate
of customer contact personnel is high because of turnover. Some common administrative concerns
are discussed below.

Administrative Personnel
The test user must consider whether the personnel required to administer and score the test are
available and affordable. For example, a work sample test measuring knowledge and skill in the
repair of electronic equipment may require that an electronics technician score the test. In this case,
a work sample may not be feasible simply because the personnel needed to administer and score the
work sample are not available.
Cost is a related factor. Even if personnel with the prerequisite skills are available to admin-
ister assessments, an organization may find the cost of using many of these employees prohibi-
tive. For example, an organization using a structured interview as a first screen for a high-volume, high-turnover position may well find the cost of the interviewer exceeds the value of the interview.
Organizations must take into account not only the cost of the administrative personnel’s time but
also the cost of time for training the administrator and scorer. The added cost of training inter-
viewers may place a heavy financial burden on the organization. Moreover, additional costs are
incurred by conscientious organizations that also re-train, calibrate, and monitor personnel to main-
tain proficiency.

Testing Facilities and Equipment


In addition to adequate personnel to administer selection tests, appropriate facilities are also neces-
sary, and the test user must consider the initial cost of the facilities and equipment and the cost for
ongoing maintenance and upgrades associated with the technology used for computer-administered
tests. Perhaps the most expensive form of testing in terms of facilities is an assessment center that
requires space for participants’ independent work, interviews with assessors, and group exercises
with other participants. Nevertheless, other forms of testing can also require expensive facilities.
Measures of physical abilities such as ladder or pole climbing exercises can be expensive to build.
Similarly, more abstract kinds of physical ability tests such as tensiometers can be expensive to buy
and maintain because they must be recalibrated regularly to ensure accurate ratings.
Although cost may be the overriding concern for many organizations when considering testing
facilities, the mobility of facilities and equipment can also be important in certain situations.
For example, physical ability tests that are based on samples of the tasks are often difficult if not
impossible to move, which could be problematic for an organization with centralized employment
offices and widely dispersed applicants. Similarly, facilities for assessment centers may be expensive
and time-consuming to replicate in multiple locations. Large amounts of bulky testing equipment
(e.g., computers, equipment for work samples) may not be feasible in large-group testing situations.
Computerized testing can be an accurate, inexpensive form of test delivery when all tests are
administered in dedicated employment offices. However, testing that occurs in large batches or
in changing locations such as job fairs may make computer-administered testing impractical. In
these situations, the cost of many computers to be used in batch hiring or the cost of moving them
to specific locations may be excessive. Additionally, some test administration platforms require a
connection to the Internet that may not be available in some testing locations.

Proctored and Unproctored Internet Testing


An issue that has gained increasing attention among test users is the question of whether to use proctored or unproctored Internet testing (see Chapter 18, this volume, for
additional information on unproctored testing). Although there is substantial disagreement on the
appropriateness of unproctored Internet testing, many test users agree that it is rarely appropriate for
high-stakes testing alone without some form of subsequent verification testing in a proctored setting
(Tippins et al., 2006). Some of the important concerns regarding unproctored, high-stakes Internet
testing are the opportunity for cheating, the question of the identity of the test-taker, variations in
the testing environment that may result in nonstandard test administration, the potential for com-
promising test security, differences in quality and speed of computers and Internet access, and the
applicability of reliability and validity evidence obtained in a proctored setting to an unproctored
use. Although some employers use only tests that are not particularly affected by cheating because
they do not have right and wrong answers (e.g., personality tests), these unproctored Internet tests
retain the other potential problems.

Time Requirements
The time to administer various types of tests is also a factor often considered in choosing a test.
The time spent testing has costs associated with the personnel required for administration and the
use of facilities. Instruments like situational judgment inventories (SJIs) can be good predictors that
also provide a realistic preview of the work. However, because of the reading demand, the SJIs may require more time to administer than other forms of tests. In approximately the same amount of
time, an employer could administer either a 20-item SJI or a 100-item biodata form.
Another concern about lengthy tests is their impact on applicant flow. Anecdotally at least, many
recruiters believe that applicants have a low tolerance for lengthy evaluations, especially when they
are administered in unproctored Internet settings. Even when the applicant is asked to take a test at
an employer’s site, the amount of time spent testing can be a deterrent to maintaining the applicant’s
interest. When applicants are currently employed, convincing them to take significant amounts of
leave from their jobs may be difficult. Similarly, the number of tests administered can pose a chal-
lenge. When the testing program is based on multiple hurdles and the applicant is asked to return
multiple times to take tests in the sequence, the number of tests and the time required become major
factors in the applicant’s decision to remain in the hiring process.

CONSEQUENCES OF POOR HIRING DECISIONS


The consequences of hiring an applicant without the necessary skills should also be considered
when choosing selection tests. If the repercussions of hiring someone who lacks skills are severe,
more accurate prediction is required, and a more extensive selection system is often used. (A more
elaborate selection procedure is typically one that covers more of the KSAOs required for the job
or one that measures them in a more reliable and valid manner, either in a high-fidelity context
or in multiple measurements.) When retraining costs are high or the consequences of error are
high, organizations are often more willing to implement comprehensive selection procedures. For
example, if a large amount of on-the-job training is necessary, the cost of retraining a replacement
for an unsuccessful employee could outweigh the cost of a more elaborate hiring system, making
an extensive selection system more viable. Similarly, a more comprehensive selection system would
be recommended when hiring individuals to fly airplanes or operate chemical or nuclear plants
because an error caused by an employee without the requisite skills could lead to injury, death, or
widespread property damage. In contrast, an organization may use less extensive selection instru-
ments when the repercussions of an error are relatively minor or when the cost of training a replace-
ment employee is minimal, such as hiring in many entry-level jobs.

ORGANIZATION REACTIONS
Most testing professionals who have spent time inside of organizations have learned that organiza-
tions have idiosyncratic preferences for certain types of tests and strong dislikes for others. For
example, some organizations promote the idea that with some hard work and a little coaching, an
individual can become anything he or she wants. Thus, tests that measure relatively immutable traits
(e.g., personality tests) or are based on past experiences (e.g., biodata) are not acceptable. Instead,
tests measuring developable skills and abilities (e.g., skills tests, achievement tests) or knowledge
that can be acquired are preferred. Other organizations promote the idea that they hire the best
by hiring individuals who have proven their skills by graduating from top schools. Consequently,
asking individuals to demonstrate their mental prowess through measures of cognitive ability is
anathema.

APPLICANT REACTIONS
When designing selection systems, applicants’ reactions should be considered for several reasons.
For example, applicants who are unhappy with a selection system may withdraw from the selection
process, may be more likely to file lawsuits, and may spread negative impressions of the organization.
In recent years, several research studies have investigated the role of various factors such as
test type, administration format, procedural characteristics, and personal variables on applicant reactions to the testing event (see Gilliland, 1993 for a theoretical model of applicant reactions).
In general, the research indicates that tests are perceived more positively when the relationship
between the content of the test and the duties of the job is clear to the applicants.
However, more research is needed before concluding that applicants’ reactions to selection pro-
cedures actually predict applicant behaviors (e.g., withdrawal from the selection process, job accep-
tance, job performance). Moreover, it is important to remember that an applicant’s performance
on a test may override other variables associated with the test or the testing context. As with other
criteria, an applicant’s reaction is not the only factor in deciding which test to use. For example,
if a job requires cognitive ability, the finding that cognitive tests are not perceived as favorably by
applicants as interviews and work samples may be irrelevant.

HOW SHOULD VALIDITY EVIDENCE BE GATHERED?


Once the constructs to be measured are identified and the method for measuring them is deter-
mined, the organization must determine what approaches to accumulating evidence of validity will
be used (see Chapters 2, 3, and 5, this volume, for more detailed discussions of validity).

VALIDITY OF THE TEST


Validity is a key factor in the selection of a test. It makes little sense to include a test in a selection
process for which there is little or no evidence of validity for the kind of inference to be made (e.g.,
using graphology to measure a cognitive ability). Because the results of past validity studies are
often indicative of future findings, the literature on the validity of inferences made from scores on
particular kinds of tests for specific populations is useful in answering the question, “How shall I
measure this construct?” Consequently, researchers deciding what tests to include in a battery often
review meta-analytic studies and other research to guide their choice of tests.
When making decisions about the selection process, testing professionals often consider reports
of the incremental validity of various tests (see Schmidt & Hunter, 1998) when they are combined
with a standardized, multiple-choice measure of cognitive ability. In addition, testing professionals
are well advised to consider the complexity of the job in estimating what the validity of a specific
test is likely to be (Hunter, Schmidt, & Judiesch, 1990).
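The following sketch (simulated data in Python; the predictor names and effect sizes are invented) illustrates the logic of an incremental validity check: compare the multiple correlation obtained from the baseline cognitive measure alone with the multiple correlation obtained after adding a second predictor.

```python
# A minimal sketch of an incremental validity comparison on simulated data.
import numpy as np

rng = np.random.default_rng(42)
n = 500
cognitive = rng.normal(size=n)
interview = 0.3 * cognitive + rng.normal(scale=0.95, size=n)
# Simulated criterion with known relationships to both predictors.
performance = 0.5 * cognitive + 0.25 * interview + rng.normal(scale=0.8, size=n)

def multiple_r(y, predictors):
    """Multiple correlation of y with the given predictors (intercept included)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta
    return np.corrcoef(y, y_hat)[0, 1]

r_baseline = multiple_r(performance, [cognitive])
r_combined = multiple_r(performance, [cognitive, interview])
print(f"R (cognitive only):        {r_baseline:.3f}")
print(f"R (cognitive + interview): {r_combined:.3f}")
print(f"Incremental validity (delta R): {r_combined - r_baseline:.3f}")
```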
The use of content- and/or criterion-oriented validity evidence as a means of choosing a test
is sometimes overlooked in practice, particularly when individuals other than those trained in
industrial-organizational (I-O) psychology are making the decision (Rynes, Colbert, & Brown,
2002). Possible reasons for this omission are inadequate or missing information about the validity
of tests being considered or a lack of understanding of the concept of validity or the meaning of the
available data. Some decision-makers within organizations tend to rely on their idiosyncratic beliefs
about what kinds of measures and what constructs are most useful for predicting job performance
and other criteria of interest (e.g., turnover, absenteeism). Some may not know where to find relevant
information about validity; others may not understand validity data even when they have access to
them. A few may even discount the value of such information.

APPROPRIATENESS AND FEASIBILITY OF VALIDATION STRATEGIES


As discussed, the validity of inferences can be derived from studies using different strategies for
accumulating evidence (e.g., content-oriented strategies, criterion-oriented strategies, transportabil-
ity studies, or other validity generalization techniques). However, the feasibility and appropriateness
of different kinds of validity strategies vary according to the organizational context (e.g., number
of incumbents), type of test, and standards of professional practice. For example, a content-oriented
validity strategy may be appropriate for a structured interview that assesses problem-solving ability
but not for a multiple-choice test of abstract reasoning. The relationship between problem-solving interview questions and the KSAOs may be easier to demonstrate with a content-oriented strategy
than that between abstract reasoning test items and the KSAOs. Moreover, relevant court cases
regarding test validity and concerns regarding future legal defensibility may guide the employer
toward one strategy over another. Many factors impact the feasibility of different kinds of validity
studies. Organizations with a small incumbent population and low hiring volume are not likely to
have samples large enough to meet the necessary statistical assumptions in a study using a criterion-
oriented strategy. In these cases, the primary strategy for establishing validity may be a content-ori-
ented one. Other possible validation strategies include the use of a validity generalization strategy
such as a transportability study (a study conducted to justify the use of a selection procedure based
on the validity results of another study) or reliance on the results of meta-analyses of validity studies
involving relevant instruments, criteria, and jobs. Additionally, some organizations may address the
problem of validation studies with small incumbent populations by using a synthetic or job compo-
nent validity approach in which validity inferences are based on the relationship between scores on
a test and performance on a component of the job.
New jobs can pose special problems for content- and criterion-oriented validation. It may not be
possible to obtain local subject matter expert (SME) input for a job that currently has no incumbents
or supervisors, so a traditional content-oriented study cannot be conducted. In addition, the lack of
incumbents makes a concurrent criterion study impossible. In some cases, archival information on
the validity of a test for a particular job and criterion may be used on an interim basis to justify the
use of the test. In other cases, the test user may gather organizational information about the impetus
for the newly created job, proposed minimum qualifications, jobs from which current employees
will be promoted, and proposed training; information about similar jobs from external sources (e.g.,
O*NET™, the literature); as well as input from those within the organization who are designing the
job about the work processes that will be covered by the job, the tasks that need to be performed,
the equipment that will be used, etc. This information can then be used to make linkages between
the KSAOs of the new job and the proposed tests.
The presence or absence of an internal I-O psychologist or other testing professional may deter-
mine the feasibility of certain kinds of validity studies or test development efforts. An organization
without qualified personnel may be overwhelmed by developing tests or executing a validation study.
Thus, the lack of an internal testing professional may force the organization to limit its consider-
ation of tests to those offered by test publishers and consulting firms or those that can be easily
validated without extensive internal guidance.
Another complicating factor for content- and criterion-oriented validation strategies can be the
consequences of the participation of employees. When the need for test security is high (e.g., police and fire exams), the involvement of incumbents in the test design process or validation effort may call into question whether the test content has remained confidential or has been compromised.
It merits noting that although some organizations will adapt their choice of selection tests to the
type of validation effort they are willing and able to make, others will simply choose their selection
tests and eschew the validation effort altogether. Just as some managers discount the information on
validity of inferences, some will also ignore practice guidelines that indicate the conditions under
which various strategies are appropriate. Consequently, small-sample criterion-oriented validity
studies are not infrequent.

COST OF TEST DEVELOPMENT AND VALIDATION STUDIES AND THE UTILITY OF THE SELECTION PROGRAM

There are many sources of costs associated with developing and validating a testing program. Validation
efforts can be expensive in terms of the time of employees who participate in job analysis activi-
ties, take experimental tests, or provide criterion data. Even retrieving archival performance data
can be difficult and time-consuming. The cost of equipment and supplies is another consideration.
Occasionally, special facilities for the validity study must be arranged and paid for. Moreover, the internal and/or external consultants who guide the validation effort add to the cost of developing
and validating a selection program. Many organizations have limited funds for the validation and
operational use of test programs. They must consider the cost of various kinds of validation efforts
when choosing the most appropriate test. For many, a less costly validation effort such as a trans-
portability study may dictate the use of an off-the-shelf test.
The source of the budget for these expenditures can become an important factor. For example, in
some organizations, test development and validation expenses are paid from a limited, centralized
human resources budget whereas administration costs come from richer, distributed operational
budgets. Thus, there is less money for development and validation and more for administration. In
these situations, an organization might be driven to select tools that are commercially available, less
costly to develop (e.g., interviews, work samples), or less costly to validate (e.g., those that can be
justified through a transportability study or a content-oriented validity strategy). An organization
that is less cost sensitive might prefer to develop and conduct a criterion-oriented validity study of a
proprietary test tailored to its industry, core values, or culture rather than buy one off-the-shelf.
When an organization has confidence in the value of its selection program, an important factor
can be the ownership of the selection tools. For some organizations, the volume of test use dictates
proprietary ownership. In others, the competitive advantage that accrues from the sole use of a
selection process may direct the employer to a proprietary test. When the value of a business’s ser-
vices is derived from something other than its employees (e.g., natural resources), a test shared with
others may be sufficient for its needs. In a few situations (e.g., utility companies), one organization
dominates a geographical area, and applicants come primarily from regional pools. Tests shared
with others have little effect on the organization’s competitive advantage.
Perhaps the ultimate yardstick for determining a test’s value to the organization is its utility,
which takes into account not only its costs but also its benefit. A test that is expensive to develop,
validate, and administer may be worth more to the organization than a less expensive test if its abil-
ity to predict future job performance is also higher.
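One common way to express this tradeoff is a Brogden-Cronbach-Gleser style utility estimate. The sketch below uses entirely hypothetical figures for validity, dollar-valued performance variability, hiring volume, and testing cost, and is meant only to show the arithmetic, not to suggest realistic values.

```python
# A rough sketch of a Brogden-Cronbach-Gleser utility estimate; all figures
# below are hypothetical.
from scipy.stats import norm

n_hired = 50             # number of selectees per year
tenure_years = 3         # average expected tenure of those hired
validity = 0.35          # criterion-related validity of the test (assumed)
sd_y = 15000             # SD of job performance in dollars (assumed)
selection_ratio = 0.25   # proportion of applicants hired
cost_per_applicant = 40  # cost of testing one applicant
n_applicants = int(n_hired / selection_ratio)

# Mean standard score of selectees under top-down selection from a normal pool:
# the normal ordinate at the cutoff divided by the selection ratio.
z_cut = norm.ppf(1 - selection_ratio)
mean_z_selected = norm.pdf(z_cut) / selection_ratio

gain = n_hired * tenure_years * validity * sd_y * mean_z_selected
cost = n_applicants * cost_per_applicant
print(f"Estimated net utility gain: ${gain - cost:,.0f}")
```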

HOW SHOULD SCORES BE USED?


Once the constructs to be measured have been determined and appropriate measures and validation
strategies of each identified, the test user must consider how he or she will calculate, report, and use
the resulting test scores (see Chapters 7 and 18, this volume, for more information about the use of
test scores). The criteria for this set of decisions include the form of the test score used, the combina-
tion of test scores, and their operational use.

CALCULATION AND FORM OF REPORTED TEST SCORE


The test user must determine how to calculate test scores (e.g., points are given for more than
one option of a question, a different number of points is given for different questions, points are
subtracted for guessing) and which form of test score (e.g., raw scores, percent scores, percentile
scores, standardized scores) to present. The determination of the reported score requires
consideration of several factors including the type of test (e.g., power vs. speeded, cognitive ability
versus personality, measurement of a single competency or multiple competencies), the ability of the
test score recipient to interpret it, the purpose of the test score, and the reliability and validity of the
test score. For example, a score based on the number correct or the percent correct may be useful
when communicating the extent to which an individual possesses a body of knowledge. In contrast,
the number correct on a personality inventory would make little sense. Similarly, a percent correct
would be appropriate on a power test but would not be meaningful on a speeded test. A standardized
score might be used when the intent is to provide information about an applicant’s standing relative
to other test-takers. Further, applicants and hiring managers may be less prepared to interpret some
forms of test scores (e.g., norm-referenced percentile scores with multiple norm groups) whereas testing professionals may gravitate toward more complex forms that convey more information about
the individual.
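As a small illustration of these reporting options, the sketch below converts a single hypothetical raw score into a percent-correct score, a z score, a T score, and an approximate percentile (assuming roughly normal norm-group scores); the norm-group mean and SD are invented.

```python
# A minimal sketch of common score-reporting forms; norm statistics are hypothetical.
from scipy.stats import norm

raw_score = 38
n_items = 50
norm_mean, norm_sd = 32.0, 6.0   # assumed norm-group mean and SD

percent_correct = 100 * raw_score / n_items
z_score = (raw_score - norm_mean) / norm_sd
t_score = 50 + 10 * z_score             # a common standardized scale
percentile = 100 * norm.cdf(z_score)    # assumes an approximately normal norm group

print(f"Percent correct: {percent_correct:.0f}%")
print(f"z = {z_score:.2f}, T = {t_score:.0f}, percentile ≈ {percentile:.0f}")
```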

COMBINING SCORES ACROSS TESTS


If multiple tests are used in a selection process, the organization must decide whether to combine
the test scores or to use them individually. Several questions arise: Should test scores be weighted
and combined in a compensatory fashion? Should a multiple hurdle model with cutoff scores on
each test be used? Or would it be more appropriate to use a mixed model in which a minimum level
of performance is required on all tests and then the scores are combined? Answers to questions like
these should take into account the requirements of the job as well as available data that may inform
the decision. A selection procedure for a customer service job that requires problem-solving skills
and service orientation may involve a multiple hurdles approach when job analysis data have estab-
lished that high levels of problem-solving skills do not compensate for low levels of service orienta-
tion nor do high levels of service orientation obviate the need for problem-solving ability. In another
job that requires lower levels of problem-solving skills along with planning and organizing skills, a
combination of test scores allowing for high levels of planning and organization to compensate for
lower problem-solving skills may be more appropriate. In still another job in which problem-solving
activities are 80% of the job and sales activities are 20% of the job, a compensatory model that
weights scores on tests measuring problem-solving and sales (e.g., 80/20) may be appropriate.
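The following sketch (hypothetical standardized scores, weights, and cutoffs) contrasts the two models described above: a weighted compensatory composite that mirrors an 80/20 split of job activities, and a multiple-hurdle screen in which a high score on one test cannot offset a low score on another.

```python
# A minimal sketch contrasting compensatory and multiple-hurdle scoring models.
def compensatory(scores, weights):
    """Weighted sum of standardized test scores."""
    return sum(scores[test] * w for test, w in weights.items())

def multiple_hurdle(scores, cutoffs):
    """Pass only if every test meets its minimum cutoff."""
    return all(scores[test] >= cut for test, cut in cutoffs.items())

applicant = {"problem_solving": 0.4, "sales": 1.6}   # hypothetical z scores

# Weights mirror the 80/20 split of job activities described above.
composite = compensatory(applicant, {"problem_solving": 0.8, "sales": 0.2})

# Hurdles require a minimum on each construct; a strong sales score cannot
# compensate for a weak problem-solving score.
passes = multiple_hurdle(applicant, {"problem_solving": 0.5, "sales": 0.0})

print(f"Compensatory composite: {composite:.2f}")
print(f"Passes multiple hurdles: {passes}")
```

In this toy case the applicant earns a respectable composite under the compensatory model but fails the hurdle model, which is exactly the kind of difference that job analysis evidence should be used to adjudicate.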

USE OF TEST SCORES


Test scores may be used in various ways. Occasionally, little guidance is provided to the recipient
of test scores. In these cases, test scores may be distributed to personnel making hiring decisions
as one source of job-relevant information that they use according to their professional judgment.
In contrast, test scores may be provided with an expectancy table and guidance regarding how the
data should be used. For example, a person in one range of scores may be hired without any other
education or experience credentials whereas another person with a score in a lower range may be
hired only if he or she has certain kinds of experience. Often, cutoff scores (for individual tests or a
battery) requiring minimum performance are applied, and decision-makers are given only pass/fail
information. A common variation to a single cutoff score is score bands that theoretically take into
account the unreliability of individual test scores and assume that all scores within the same band
predict the same level of performance. At times, sliding bands may be used. Finally, many organiza-
tions embrace top-down selection and choose the individual with the highest score.
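One frequently described way to form such bands uses the standard error of the difference between two scores. The sketch below assumes a hypothetical test reliability and score SD and is meant only to show how a fixed band width is typically computed, not to recommend particular values.

```python
# A sketch of a fixed score band based on the standard error of the difference;
# the reliability and score SD are hypothetical.
import math

sd_x = 8.0          # SD of test scores (assumed)
reliability = 0.85  # test reliability (assumed)
z = 1.96            # approximately 95% confidence

sem = sd_x * math.sqrt(1 - reliability)          # standard error of measurement
band_width = z * sem * math.sqrt(2)              # standard error of the difference
top_score = 92

band_floor = top_score - band_width
print(f"Scores from {band_floor:.1f} to {top_score} are treated as equivalent "
      f"(band width ≈ {band_width:.1f} points).")
```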
To inform the decision about how to use a test score, other questions must be posed. For example,
the feasibility of top-down selection depends on factors such as the type of hiring program and
the extent of criterion-oriented validity evidence. Top-down selection can work well when testing
occurs infrequently and the employees are drawn from one pool of qualified applicants; however, it
may be less tenable in a staffing environment in which testing occurs daily because the pool from
which employees are drawn varies from day to day and the top candidate may change every day.
Another important question involves the requirements of the job. Top-down hiring may be appro-
priate if there is a wide range of skill in the applicant population but may result in the employment
of unqualified individuals if there are few highly skilled individuals in the applicant pool for a job
requiring high skills. An organization may need to set a floor on test scores to ensure minimum
skill levels.
This fourth set of decisions can also have an impact on the first two. For example, assume that
an organization decides to use a multiple-hurdle approach that includes a math word problem test, a
reading test, and a personality inventory. On the basis of data from a concurrent criterion-oriented
validity study, the organization decides to set a cutoff score on the math word problems test that is
so high that 90% of those who pass the math word problems test also pass the reading test and 95% also pass the personality inventory—so there is little value in retaining the other two tests. Thus, the
decision on the cutoff score question negates the decision by the organization of which constructs
to measure and what tests to use.

CONCLUSIONS
This chapter has reviewed the issues test users consider in selecting or developing a test for valida-
tion or for operational use on an interim basis. As described, the issues are many, and hard and fast
answers are few. As noted earlier in the chapter, none of these factors can be evaluated without
consideration of the others. For example, the feasibility of test development and validation and their
costs are significant factors in the choice of tests. An organization with few incumbents in a job for
which tests are being considered may not be able to supply enough SMEs to complete job analysis
questionnaires and develop job-relevant tests, test-takers for a concurrent study, or SMEs for a con-
tent study. Even enterprises with many incumbents may not be able to relieve employees from their
job duties for the time needed to assist with test development and validation and maintain smooth
operations.
In addition, the answer to the first question (Which constructs should I measure?) may need
to be revisited depending on the answers to the next three questions (How should the constructs
be measured? How should evidence of validity be gathered? How should scores be used?). An
organization that decides to measure only problem-solving ability because it was the most important
KSAO for a particular job and then decides to use a work sample test may find that the work
sample can measure a broader array of KSAOs than just problem-solving abilities. Conversely, an
organization that decides to measure all of its important KSAOs may find that the number of tests
required is so large that testing requires 3 days and consequently is unaffordable to the organization
and intolerable to applicants.
A particularly difficult, overarching concern is how to arrive at one decision in the face of many
competing demands on the organization. Optimization of all factors would be challenging, if not
impossible. For example, increasing validity while minimizing adverse impact and meeting orga-
nizational constraints of time and cost associated with validation and administration remains a bal-
ancing act rather than a series of discrete decisions. Minimally, it is imperative that those who are
tasked with identifying or developing successful selection systems are familiar with the many deci-
sion points in the process. The test user responsible for designing selection systems must consider
these issues and their ramifications, weigh the tradeoffs, and make fully informed final decisions.

REFERENCES
American Educational Research Association, American Psychological Association, and National Council on
Measurement in Education (1999). Standards for educational and psychological testing. Washington,
DC: American Educational Research Association.
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department
of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43,
38290–38315.
Gilliland, S. W. (1993). The perceived fairness of selection systems: An organizational justice perspective.
Academy of Management Review, 18, 694–734.
Goldstein, I. L., Zedeck, S., & Schneider, B. (1993). An exploration of the job analysis-content validity process.
In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations. San Francisco, CA: Jossey-
Bass.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ: Lawrence
Erlbaum.
Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function
of job complexity. Journal of Applied Psychology, 75, 28–42.
Murphy, K. R., & Davidshofer, C. O. (1988). Psychological testing: Principles and applications. Englewood
Cliffs, NJ: Prentice-Hall.
Rynes, S. L., Colbert, A. E., & Brown, K. G. (2002). Human resource professionals’ beliefs about effec-
tive human resource practices: Correspondence between research and practice. Human Resource
Management, 41, 149–174.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychol-
ogy: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Society for Industrial and Organizational Psychology (2003). Principles for the validation and use of personnel
selection procedures (4th ed.). Bowling Green, OH: Author.
Tippins, N. T., Beatty, J., Drasgow, F., Gibson, W. M., Pearlman, K., Segall, D. O., et al. (2006). Unproctored
Internet testing in employment settings. Personnel Psychology, 59, 189–225.
18 Administering Assessments
and Decision-Making

R. Stephen Wunder, Lisa L. Thomas, and Zupei Luo

There are innumerable topics that could be covered under the heading “administering assessments
and decision-making.” Rather than attempt a superficial coverage of a wide range of topics, we have
chosen to focus our attention on two that are of particular relevance to the applied use of assess-
ments in the workplace. The first of these—unproctored Internet testing (UIT)—is a relatively
recent phenomenon and one that not only has the potential to greatly expand the reach of valid
professionally designed assessment but also to destroy valid assessment as we have known it in the
pre-Internet world. The second issue we cover is not new. It involves the aggregation of assessment
information for use by decision-makers. More specifically, it involves the knotty question of how to
maximize accuracy/validity, yet not steal from the authority of decision-makers to make decisions.
There is a thread that ties both issues together: how scientifically tested practices and the requirements of managerial preference and decision-making can be merged to yield sound selection solutions.

UNPROCTORED INTERNET TESTING


Probably the biggest single test administration issue of today is that of UIT administration. Cizek
(1999) devoted an entire volume to the issue of cheating on tests, showing it to be a serious problem
even before the advent of widespread testing on the Internet. And, as will be shown below, societal
cheating is a serious and growing problem. There is no reason to believe that high-stakes employ-
ment testing is immune to this trend.
Not so many years ago it would have been hard to find an industrial-organizational (I-O)
psychologist who would support sending a test to a candidate’s home, asking him/her to complete
it as instructed, and send it back, all without any assurance that the responses to the test were
the work of the candidate (and only the candidate), and that the candidate did not record any of
the test’s content. But, today there is great pressure from recruiters, managers, job candidates,
consultants, and test publishers to do just that. Recruiters and managers argue that the employer
must permit this to be competitive with other employers, to cut costs, to reduce time-to-hire, or
simply to demonstrate that the employer is technologically up-to-date by offering online testing
in the location of the candidate’s choice. Candidates, of course, favor it because it puts less
burden on them. And, test publishers recognize that removal of the barrier to testing imposed
by the requirement to proctor necessarily opens up larger markets. This is not to say that these
stakeholder groups do not also see negative aspects to UIT. Some employers may worry about
the legitimacy of any given candidate’s test score where there is no proctoring. Honest job
candidates may worry about being literally cheated out of a job by dishonest candidates. And,
some test publishers may recognize the threat UIT poses to the security of their intellectual
property. It is to say, however, that at present the forces in favor of UIT seem stronger than the

forces opposed. We do not see a critical mass of any of these stakeholder groups clamoring to
curtail UIT.
A good deal of literature has been produced on this topic, and we will not attempt to duplicate it.
We refer the reader to the International Test Commission (2005); Naglieri et al. (2004); and Tippins
et al. (2006) for excellent overviews of the major issues involved in UIT. For our purposes, four
issues are most relevant:

1. Standardization of the testing environment.
2. Examinee identity authentication and cheating.
3. Test security.
4. Perceived value of testing: Do decision-makers care?

This fourth issue has not, to our knowledge, been addressed in the research: Do decision-makers
believe assessments provide sufficient value to merit efforts to protect them from compromise?

STANDARDIZATION OF THE TESTING ENVIRONMENT


It is a longstanding principle of test administration that the examinee should be provided a testing
environment that is standardized and of sufficient quality that each candidate can perform his/her
best on the assessment. Unfortunately, UIT makes the requirement for a sound, consistent testing
environment problematic. There are technical issues such as bandwidth and its effects on timed tests
and graphic displays, connection reliability, and monitor resolution settings (especially important
to ensure that displays do not “scroll off” or where online simulations require a certain level of
resolution to work properly). Hughes and Tate (2007) surveyed over 500 university students on their
experiences with UIT, 319 of whom reported having been involved in high stakes (selection) UIT.
Thirty-two percent of the respondents to the survey reported having experienced technical difficul-
ties. Of these, 62% said those difficulties adversely affected their performance on the test. There
were also nontechnical issues such as noise and other interruptions. The Hughes and Tate (2007)
study reported that 46% of the UIT examinees surveyed said they experienced such interruptions,
and of these 63% felt the interruptions had negatively impacted their test performance. Huff (2006)
provided corroborating support for the existence of interruptions under UIT conditions. However,
Burke (2006), citing research by Weiner and Reynolds, reported that “the number of distractions
experienced with online testing is no greater than those experienced in traditional proctored or
classroom testing” (Burke, 2006, p. 4). Whether or not UIT has more distractions or only as many
as proctored testing, it cannot be denied that the requirement to provide an acceptable testing envi-
ronment is not given the degree of emphasis with UIT that it formerly received under proctored test-
ing. In addition, under traditional proctored conditions if there is a serious disruption to the testing
environment, it can be corrected and the candidate given an accommodation. In the UIT world this
is generally not possible, especially if, as Hughes and Tate (2007) reported, only about half of the
candidates experiencing a technical problem even reported it to the organization they were apply-
ing to and only 8% reported it if the interruption was nontechnical (e.g., excess noise). Yet, to give
the UIT testing environment insufficient attention is to lower the standard for UIT to a level that
would not be tolerated in a proctored environment. There are at least partial solutions to some of
the technical issues (e.g., use only assessments that are untimed or download the entire test to the
candidate’s desktop so that connection speed or reliability do not play a role). But, we have not seen
any solutions for standardizing the testing environment other than admonitions to the candidate to
create such an environment for himself/herself. Again, perhaps the benefits of UIT outweigh the
costs of diminished quality assurance of the test, perhaps not. But unless these issues matter less now than they did in pre-UIT days, much more research, and solutions resulting from it, is needed to address (a) whether a standardized testing environment even matters and, if it does, (b) how it can be reasonably assured in a UIT environment.

EXAMINEE IDENTITY AUTHENTICATION AND CHEATING


A threshold question about UIT is how to ensure that the person completing the assessment actually
is the person we think is completing the assessment. If we cannot ensure that this is the case, then
we have little defense against cheating, particularly through collaboration with others when taking
the assessment. There is evidence of a growing erosion of ethical norms in society. For instance, the
Ethics Resource Center (ERC) in its National Business Ethics Survey (ERC, 2007) reported that
the percentage of employees surveyed that “believe their peers are not committed to ethics” (p. 24)
went from 25% in 2003 to 34% in 2005 to 39% in 2007. Outside of the organizational setting, in
the Junior Achievement (JA)/Deloitte Teen Ethics Survey (JA/Deloitte, 2007), 18% of the students
surveyed said they had cheated on a test in the last year. Twenty-four percent thought cheating on
a test was at least sometimes acceptable. More than half of these cited “personal desire to succeed
in school” (p. 1) as the reason for believing cheating could be acceptable. The Josephson Institute
of Ethics' Report Card on the Ethics of American Youth (2006) reported that when high school students were asked "How many times have you cheated during a test at school in the past year?" over 60% responded "at least once," and more than 35% said "two or more times." In this ethical environment, self-policing by job candidates would seem to be largely futile.
Technological solutions ranging from web cameras trained on the examinee to so-called “bio-
metric” verification (fingerprints, retinal scans) to “forensic” analyses (e.g., Foster, 2008) have all
been suggested. Without meaning to dismiss these efforts out of hand, we do not see them fully
delivering on their promise any time soon. For example, the use of web cameras to monitor the
candidate while being tested has been touted. Yet, given that the current technological issues of
UIT center on fundamentals like bandwidth, connection reliability, and monitor resolution settings,
availability of web cameras at every candidate’s home computer does not have much more than
theoretical promise over the short-to-medium run. Likewise, biometric safeguards such as fingerprint readers or retinal scans are only a vision of the future and, arguably, would do nothing by themselves to prevent a collaborator from taking the assessment after the actual candidate logged
on. More exotic preventatives such as forensics involving the candidate’s keyboard typing rate or
response latency seem to us to suffer from the same drawbacks as faking scales in personality assessment in that they can raise a red flag about the likelihood of the undesirable behavior, but can provide no definitive proof that the candidate indulged in such behavior. When we see an elevated score on a faking scale, we are usually advised to "interpret with caution," whatever that means. We
think the forensic measures will likely suffer the same fate although there may be advantages to the
use of forensics that we will cover later. Although researchers in this area should be commended for
looking for innovative ways to address the problems created by UIT, for the foreseeable future we
do not see these safeguards providing candidate authentication and monitoring anywhere close to
the gold standard of proctored testing.
Likewise, several researchers in the area have suggested proctored verification or confirmatory
testing as a workable solution to the UIT problem (International Test Commission, 2005; Tippins et
al., 2006). Although individual variants on this idea differ in their specifics, they share in common
that there is an initial test administered under UIT conditions. It is acknowledged that those who
pass this test are a combination of legitimately higher-potential candidates and those who got
there by cheating. Selection decisions are not based on this initial test, but rather on a second
confirmatory test that is proctored. For many I-O psychologists who work in applied settings, the
proposed use of proctored verification tests does not solve the problem of UIT but only delays it.
Granted, there are fewer candidates to be tested on the verification test, but even here unless a very
aggressive cutoff score is used on the UIT-administered test, this advantage is likely to be minimal.
Moreover, confirmatory testing would now require the most promising candidates to be tested twice,
creating an extended-duration selection program and more—not less—inconvenience for the best
candidates, thus negating in large part the presumed advantages of UIT (i.e., a faster, cheaper, and
less burdensome assessment process). We are not opposed to verification testing, but simply point
out that its advantages should not be oversold. Discussion of an alternate use of verification testing
that might be more practical is presented later in this chapter.

TEST SECURITY
If examinee authentication is problematic, protecting test security under UIT is doubly so. Solutions
to the security problem can include hardware and software solutions (e.g., disabling the print screen
key, “locking down” the web browser), use of item banks, use of forensic analysis to detect suspi-
cious responding indicative of theft of test content, and monitoring of Internet activity and sites
where “pirated” test materials are trafficked (Burke, 2006).
All are worthy steps to consider in helping protect test security. Yet, it is clear that they can also
be ineffective and can lead to a sense of false security. For instance, safeguards such as disabled
hardware and locked-down browsers can easily be defeated by simply taking pictures of test screens
with a digital snapshot camera. Forensic analysis can only measure the likelihood of content theft
by a test-taker; it cannot provide proof. And, unlike in the case of cheating detection, in which
there is some potential remedy such as proctored verification testing, when test security is breached
the damage is done. The same is true of scanning websites for stolen test content. It may very well
locate pirated test materials, but by that point it is too late to do anything except begin planning the
replacement test. Moreover, as test security deteriorates, it would not be unreasonable to expect that
wide access to test contents might also increase the incidence of cheating on the test itself.
One solution is to use “honesty certificates” in which the candidate certifies before he/she begins
the test that he/she will not cheat and will not pilfer test content. Such an approach clearly com-
municates to the candidate the employer’s expectations regarding honesty (Burke, 2006; Cizek,
2001; Hughes & Tate, 2007; McCabe, Treviño, & Butterfield, 2001). This step is virtually without
cost and there is some indication it actually has some deterrent effect on cheaters and thieves. Yet,
Nathanson, Paulhus, and Williams’ (2006) finding that students high in subclinical psychopathy
were more likely to cheat than others reminds us that there are distinct limits on what we can expect
from honesty certificates. Rotating items from a large item bank will help by preventing the loss
of a test in toto. Yet, even here we know that merely having access to a parallel form of a test is
invaluable to a job candidate who wants a leg up on the competition. Nor should it be ignored that
developing an item bank of the size needed can be extremely costly.
We admit that we have painted a grim picture of the advisability of UIT from the standpoints of
candidate verification (cheating) and test security. In actuality, the picture may not be quite as dire
as the limited effectiveness or impracticality of many of the suggested remedies may imply. That is
because the simple fact of having some of these precautions in place and letting the examinees know
they are in place is likely to deter some of the cheating and test content theft.
However, in the end we agree with Pearlman, in Tippins et al. (2006), who wrote, “… it can be
safely assumed that for high-stakes applications, all test content available for viewing on a [UIT]
computer screen will eventually be compromised, the only questions being how quickly and to what
extent” (p. 205). It may be, at least with the tools available to us today, that eventual loss of test
security is, in the final analysis, an inherent cost of doing business in the UIT world.

PERCEIVED VALUE OF TESTING: DO DECISION-MAKERS CARE?


A discussion of UIT is not complete without consideration being given to a special nontechnical
issue. Specifically, do the decision-makers who are our clientele for assessment care if the assess-
ment process is impaired because of lack of proctoring or test security? This, we think, is a question
that deserves more attention than it has received. The debate within the I-O community contains an
implicit assumption that assessments are quite useful and therefore the integrity and quality control
of the conditions under which they are administered is an important issue. But, what if managers
perceive pre-employment assessment to be of, at best, modest use in identifying the right talent?
What if they see assessment’s main value as being a way to cheaply screen many candidates down
to a smaller number so that the “real” evaluation can then begin? If so, it is quite logical that they
would believe that the care under which assessments are administered could be greatly compro-
mised without incurring any meaningful damage. The answer to this question is important because
it will dictate how we address UIT going forward. If there is a pervasive belief within our clientele
that assessment does not really do much other than provide a cheap way to cull out candidates, then
we will probably not achieve a mutually acceptable solution to the issues surrounding UIT because
the I-O psychologists will be pursuing one goal and the clients another.
There is some evidence that suggests clients, even in organizations where assessment is well
established, derogate its importance in identifying top-quality job candidates. Ryan and Tippins
(2004) observed that:

Even when presented with evidence of strong relationships between test scores and job performance,
some managers are unwilling to concede that structured selection programs are not just good ways to
select employees; they are better than other less structured alternatives. The power of “gut instinct” and
“chemistry” seems to override the hard data and rational arguments. (p. 306)

An early theoretical paper by Shrivastava and Mitroff (1984) anticipated some of the Ryan and
Tippins (2004) conclusions. Shrivastava and Mitroff compared the “assumptions organizational
decision-makers use in making decisions with those of researchers” (p. 19). Among their propositions
were that decision-makers have a “preference for intuitive images of the problem” (p. 20) and that
“inquiry [by decision-makers] is conducted through informal, personal, nonstandard procedures”
(p. 20). If correct, these propositions are consistent with those of Ryan and Tippins (2004), and it
should therefore not come as a surprise when managerial decision-makers are not all that concerned
with the safeguards that come from proctoring.
Lievens, Highhouse, and De Corte (2005) found that managers placed more value on general
mental ability and extraversion as useful selection constructs when they were measured via unstruc-
tured interview than when they were measured by test. Terpstra and Rozell (1997) reported that the
leading reason cited by human resources (HR) managers for not using cognitive tests or biodata
(other types of noncognitive tests were not included in the study) was that these managers were either
uncertain of the usefulness of the tests or felt the tests were not effective. Perhaps most telling was a
study by Terpstra (1996). He surveyed 201 HR executives, asking them to rate nine different classes
of selection procedures in terms of “how predictive of future job performance” (p. 16) each was (i.e.,
their perceived validity). Four of the classes involved tests (i.e., specific aptitude tests, personality
tests, general cognitive ability tests, and biographical information inventories) whereas the other five
are generally not regarded as being tests in the lay usage of the term (i.e., work samples, references,
unstructured interviews, structured interviews, and assessment centers). Of the nine classes of selec-
tion procedures rated, these HR executives put the four classes of tests at the very bottom of the list.
Note that these beliefs were expressed despite well-documented evidence to the contrary. In this
context it would not be surprising if there were a willingness to sacrifice the assurance afforded by
proctoring if there is a fundamental lack of confidence in the first place that tests work.
Although the studies cited above are, to be sure, only a smattering of the literature that could be
brought to bear on the issue of the value placed by managers on tests, they suggest that there is not as
much confidence in the efficacy of tests as might be supposed. And, if confidence is lacking, it fol-
lows that not much is lost by compromising the standards under which the tests are administered.
However, one might wonder why the organization would use testing at all if it has such weak
confidence in the value of testing. We can only speculate, but in the real world of organizations it
is rarely the same decision-maker who put the testing program in place as the one who desires to
compromise it. Sometimes this is because of decision-makers being in different parts of the organi-
zation (e.g., HR vs. sales) or because of the simple passage of time (e.g., HR executive A installed the
testing program; years later HR executive B is open to compromising it). In any event, much more
research is needed to better understand the orientations of managers (HR and line) toward testing
and its value and how those orientations may relate to managers’ willingness to compromise quality
assurance steps in employment testing.

SIMULATING THE EFFECTS OF UIT—WHY SIMULATE?


Where decisions must be made in the absence of empirical data to help resolve a question, a useful
substitute is to construct a simulation that allows input of the key variables that are thought to affect
the outcome of interest and then to populate those variables with plausible values (assumptions).
By observing the range of results, a well-constructed simulation can help inform the decision.
Of course, the simulation is only as good as the assumptions that drive it, but if they are well-
thought out, as empirically anchored as they can be, and represent a range of conditions likely
to be encountered, the results can be very useful, especially when compared with the frequent
alternative—guesswork.
Simulation can be a very valuable tool to use in projecting the net effects of several variables
acting simultaneously. Consistent with this line of thought, Schmitt and Oswald (2006) conducted
a thorough simulation of the effects of faking on noncognitive measures. Their work, although on
a related but different topic than ours (faking vs. cheating), demonstrated the value of simulation
in estimating the effects of difficult-to-observe phenomena like cheating and faking on important
outcomes like test performance, pass rates, and test validity.
It appears that there is still consensus among I-O psychologists that the moral hazard of UIT is
severe enough to prevent its use where cognitive ability testing is concerned. There is less consensus
with regard to noncognitive measures such as biographical data or personality inventories, so in our
simulation we sought to estimate the effects of UIT on noncognitive testing. In the present case, we
thought a simulation would alert us to whether the practical consequences of UIT would be serious
or not. Our simulation required five things:

1. A distribution of proctored test scores
2. A specified cutoff score
3. Assumption(s) of the incidence of cheating
4. Estimated score gains through cheating
5. The estimated distribution of cheaters throughout the candidate population

Simulation Input 1: Distribution of Proctored Scores


A distribution of proctored scores is the necessary starting place because it enables us to see how
individual scores change as assumptions 3–5 are entered into the model. Clearly, an initial distri-
bution that is normal will produce quite different results than one that is, say, heavily skewed. An
empirical distribution of the actual assessment under consideration for UIT is preferable. Absent
that, an assumed distribution of scores could be generated.
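Where only an assumed distribution is available, it can be generated directly; the brief sketch below (Python) produces a somewhat negatively skewed distribution of proctored scores, with the distributional form and parameters chosen purely for illustration.

import numpy as np

rng = np.random.default_rng(42)

# Assumed proctored score distribution: a beta distribution rescaled to a
# 0-100 metric gives a mildly negatively skewed shape (more candidates near
# the high end, a longer tail of low scorers). Parameters are illustrative.
n_candidates = 10_000
proctored_scores = 100 * rng.beta(a=5, b=2.5, size=n_candidates)

# A median above the mean is one simple indication of negative skew.
print(f"Mean = {proctored_scores.mean():.1f}, median = {np.median(proctored_scores):.1f}")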

Simulation Input 2: Cutoff Score


All else equal, a very stringent cutoff score will produce different results in a simulation than a more
relaxed one.

Simulation Input 3: Assumed Incidence of Cheating


The third parameter that the model needs is assumptions about the incidence of cheating. Our
examination of the literature was disappointing because there is so little convergence on the inci-
dence of cheating; yet it was still instructive. Much of the lack of convergence is due to the inher-
ent difficulty in doing research on a phenomenon that lacks a clear definition and that those being
studied go out of their way to conceal (i.e., it is very difficult to observe cheating in situ). What is
left is an amalgam of criterion variables ranging from self-reports of personal cheating behavior to
reports of observed cheating to computer program “cheating detectors” (Nathanson et al., 2006).
Timeframes for the cheating to “count” vary widely. To further muddy the water, cheating behav-
ior can range from stealing answers to a test from someone else to using unauthorized aids (e.g.,
calculators on a math exam). For noncognitive measures, the most likely type of cheating would be
illicit collaboration (i.e., soliciting the unauthorized aid of another person in completing the assess-
ment) and only some of the studies in the literature address this variety of cheating. With respect
to assessment itself, several other variables come into play that prevent the current literature from
converging: whether the assessment is proctored or unproctored, whether it is cognitive or noncog-
nitive, whether high stakes or lower stakes are involved, and demographics (virtually all studies
involve student populations).
Our immediate interest in our simulation work was in unproctored, noncognitive, high-stakes
assessment involving adults. The literature that most influenced our decision on cheating rate
included studies where there was evidence reported of the two specific kinds of cheating of greatest
interest to us: cheating on a test (as opposed to, say, on homework) and willingness to engage in illicit,
deceptive collaboration with another. We would like to have focused on noncognitive measures and
adult populations, but the literature was virtually devoid of information in those areas. We also
would have liked to focus on situations involving only unproctored testing, but the vast majority of
the literature involved classroom testing, which we presumed to be at least to some degree proctored.
So, we were left with a literature mostly comprised of college student populations involved in the
kinds of test cheating/illicit collaboration found in the academic environment, that is, on cognitive
(achievement) tests taken under at least nominally proctored conditions rather than on noncognitive
tests taken by adults under unproctored conditions. The literature pertinent to estimating the third
parameter in our simulation (incidence of cheating) is summarized in Table 18.1.
In reviewing this literature, our first area of interest was the threshold question of how frequently
people cheat under UIT conditions. Hughes and Tate (2007) found that 12% of students in the United
Kingdom reported they had cheated under UIT conditions involving high-stakes (selection), ability
(cognitive) testing (time period covered was unspecified). One thing that is apparent in scanning the
studies summarized in Table 18.1 is that many capture cheating over extended periods of time (e.g.,
a year, a college career) and that all else being equal, longer time periods will inflate the reported
incidence of cheating. For that reason, we were particularly interested in the incidence of cheating
on a single, discrete event (i.e., on a given pre-employment test administration) or as close to it as we
could come. Nathanson et al. (2006) used a methodology that matched patterns of item responses
to identify pairs of students whose response patterns on an in-class exam were suspiciously similar.
The suspicions were corroborated by close proximity of the seats of the pairs of students in virtually
every case. The inferred cheating behavior was not limited to the single incident category, but it
was fairly close to it, being limited to two to five in-class exams. As might be expected, inferred
cheat rates were quite low—around 4%. However, this study involved only one category of cheating
(copying) and the tests were proctored, both of which would also drive down cheat rates. Blackburn
(1998) limited the observation period for the cheating behavior to count to the “first [college] class
[of] each week during one semester” (p. 173). Here too, self-reported cheat rates were comparatively
low, ranging from 2% (looking up answers in a closed-book test) to 17% (copying answers from
another student during a test). Although not explicitly stated, it could be assumed that where tests
were involved, they would be proctored, suppressing some cheating.
The next question had to do with the specific kind of cheating. In noncognitive UIT, we would
expect that the greatest source of cheating behavior would involve illicit collaboration with another
person to “figure out” the test. The studies we reviewed were rich in evidence of illicit collaboration.
We defined illicit collaboration as receiving assistance from another person with that person’s
acquiescence for the purpose of falsely improving one’s test score. This excluded such behaviors as
copying from another person during an exam without his/her knowledge. Eight studies (Table 18.1)
produced 17 estimates of the incidence of illicit collaboration, ranging from 1% (self-report of
taking a college exam for someone else or vice versa; Newstead, Franklyn-Stokes, & Armstead,
TABLE 18.1
Summary of Incidence of Cheating Studies
(For each study: description; percentage cheating, selected behavior, and period covered; stakes.)

Blackburn (1998)
Description: Self-report; college students (N = 225). Stakes: Moderate.
Measured 27 cheating behaviors—4 of interest in UIT; cheating behavior reported if it occurred "during the first class of each week over the course of a semester."
• 17%—copying answers from another student during a test or quiz
• 8%—using a "cheat sheet" during a test
• 2%—looking up answers in a closed-book test
• 5%—having another person write a paper or assignment handed in as own work

Hughes & McCabe (2006)
Description: Self-report; Canadian college students (N = 13,644 undergraduates; 1,318 graduate students). Stakes: Moderate.
Cheating behavior reported if it occurred "at least once in the past year."
• 18% (undergraduates); 10% (graduate students)—received unpermitted help on an exam
• 6% (undergraduates); 3% (graduate students)—copying from another student during a test with his or her knowledge
• 9% (undergraduates); 4% (graduate students)—turned in work done by someone else

Hughes & Tate (2007)
Description: Self-report; U.K. university students (N = 534; 319 were involved in high-stakes UIT). Stakes: High.
Cheating behavior reported if it ever occurred under UIT conditions; time period covered not specified.
• 12%—used any of the four cheating methods below
• 5%—prior practice with test under a pseudonym
• 4–5%—illicit collaboration with another
• 4–5%—advance access to test content
• 2%—circumvention of UIT technology to gain unfair advantage

Klein, Levenburg, McKendall, & Mothersell (2007)
Description: Self-report; college students (N = 268). Stakes: Moderate.
Measured 13 potential cheating behaviors—4 pertinent to UIT; time period covered: sometime during college career.
• 27%—collaborating on take-home exam
• 5%—using "cheat sheet" on an exam
• 14%—looking at someone else's exam during a test
• 29%—programming extra help into a calculator to use during an exam

McCabe, Butterfield, & Treviño (2006)
Description: Self-report; Canadian and U.S. graduate students (N = 5,331). Stakes: Moderate.
Cheating behavior reported if it occurred at least once within the last academic year.
• 10% (business students); 8% (nonbusiness students)—engaged in "serious test cheating"
• 28% (business students); 23% (nonbusiness students)—engaged in improper collaboration

McCabe, Treviño, & Butterfield (2001)
Description: Self-report; college students (N = 1,793). Stakes: Moderate.
Cheating behavior reported if it occurred "at least once"; time period not specified.
• 52%—copied from someone else on an exam
• 27%—used unauthorized crib notes in a test
• 37%—helped someone else on a test

Nathanson, Paulhus, & Williams (2006)
Description: Computer program to match improbably similar multiple-choice test answer sheets, corroborated by proximity in seating; college students (N = 770 for Study 1; 250 for Study 2). Stakes: Moderate.
Did not rely on self-report; inferred cheating through circumstantial evidence.
• 4% (Study 1); 4% (Study 2)—likely illicit copiers detected by computer program over two (Study 1) or five (Study 2) class tests

Newstead, Franklyn-Stokes, & Armstead (1996)
Description: Self-report; U.K. university students (N = 943). Stakes: Moderate.
Measured 21 cheating behaviors—3 of interest in UIT; cheating behavior reported if it occurred "at least once during the previous academic year."
• 18%—collaborated on coursework that was supposed to be individual work
• 5%—colluded with another student to communicate test answers during an exam
• 1%—taking an exam for someone else (or vice versa)

Robinson, Amburgey, Swank, & Faulkner (2004)
Description: Self-report; college students (N = 118). Stakes: Moderate.
Measured 7 cheating behaviors—4 of interest in UIT; cheating behavior reported if it occurred "2–10" or "more than 10" times during the previous academic year.
• 42%—copying from another student in an exam without his/her knowledge
• 37%—copying from another student in an exam with his/her knowledge
• 16%—using notes impermissibly during an exam
• 61%—getting questions or answers from someone who has already taken the test

1996) to 61% (self-report of getting test items or answers from someone who has already taken the
test; Robinson, Amburgey, Swank, & Faulkner, 2004), with a median of 10%. In the closest of these
studies to our primary interest, illicit collaboration under unproctored conditions, Klein, Levenburg,
McKendall, and Mothersell (2007) reported that 27% of college students said they had collaborated
on a take-home (i.e., unproctored) exam, whereas Hughes and Tate (2007) reported that 4–5% self-
reported having engaged in illicit collaboration with another person in a UIT setting.
The results of these studies were so divergent it would be tempting to conclude they are of little
use in arriving at informed assumptions for entry into a simulation model. On the contrary, we
found these results to still be useful, and we concluded that we could bracket the most likely inci-
dence of cheating under these conditions by entering our simulation with three different assump-
tions of the incidence of cheating: 10%, 15%, and 30%. To be sure, these percentages were not as
high as many found in the self-report literature. However, we were focusing on a single event (a
given pre-employment test), not over an extended period of time. We were focusing on noncognitive
testing, which would be expected to have a lower cheating rate. Age is a commonly cited factor
in rates of cheating. Our examinees were all adults. Taking these into account, our cheating rate
estimates for the simulation were lower than those often seen in the literature. Because most of the
literature is based on self-report, it undoubtedly reflects underreporting of the true incidence of
cheating, and this was the main basis for choosing 30% for our highest-level estimate. All three of
our estimates, then, were considered conservative in the sense that they did not “stack the deck”
against UIT, something we were sensitive to in preparing a balanced case.

Simulation Input 4: Estimated Score Gains Through Cheating


The next assumption entered into the model was an estimate of the amount of score gain a cheater
could expect. There were additional complications with using the research literature in that we
could find no studies that directly attempted to calculate the effects of cheating on test scores, much
less on high stakes, noncognitive testing specifically. As noted previously, the cheating literature
relies heavily on anonymous self-reports. This methodology is not well suited to ascertaining how
much the people who cheated on a test gained by their cheating. Indeed, it is possible, when the
cheating involved illicit collaboration with another, for score losses to occur if the collaborators
were poorly chosen.
With particular reference to noncognitive testing, the next best thing was to examine the faking
literature. It is important to distinguish the effects of gains through cheating from the effects
of “faking good.” We did not seek to address faking and assumed that any score gains due to
cheating would be above and beyond any gains due to faking. That said, the faking literature was
our best source of information to gain insight on what score gains we might expect to observe
through cheating on a high-stakes, noncognitive test. A threshold question was whether any gain
would be realized on a noncognitive test through cheating. Indeed, there is recognition within the
I-O psychology community that UIT is less a problem with noncognitive measures than it is for
cognitive measures (Tippins et al., 2006). Yet there is evidence that illicit coaching or collaboration
does occur and it does affect biodata scores (S. H. Brown, personal communication, June 1, 2007).
Although it is notoriously difficult to obtain direct evidence of illicit coaching of candidates (i.e., coaching
for the purpose of inappropriately helping them pass a test), Brown and his colleagues reasoned
that this coaching—by overzealous recruiters in this case—would result in certain anomalies
on a biodata instrument. They developed four forensic indicators of such anomalies. Two were
regression equations predicting responses on selected items from responses on other items, the third
involved response latencies, and the fourth was a measure of fit with clusters based on examinees’
backgrounds. Each job candidate received a “flag” when an equation produced an unusually high
residual, when the response latency was abnormally short, or when there was a lack of fit with
any of the clusters, for a maximum of four possible flags. Candidates who were hired with two
or more such flags were compared with those hired with less than two flags. The survival rate
in a commissioned sales occupation of the “high-flag” group (which was 2.4% of those hired)
was only about half that of the “low-flag” group. Moreover, the high-flag group’s commissions
were only about half those of the low-flag group. This research did not directly observe coaching/
collaboration on biodata, but it did produce circumstantial evidence that the relative invulnerability
of noncognitive measures that is often assumed may not hold true. It is worth noting that two of the approaches about which we expressed reservations earlier, forensic analysis (as exemplified in Brown and colleagues' work) and proctored confirmatory testing, could be useful in combination in situations like this. Proctored confirmatory testing would answer our question about what should be done when
forensic analysis flags a candidate. Limiting confirmatory tests to those who both passed and were
flagged by a forensic analysis would address our concern about requiring proctored confirmatory
testing of everyone who passed the UIT. This, of course, presumes that forensic analysis would flag
comparatively few candidates.
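The flag-counting logic can be illustrated with a minimal sketch in Python. The four-indicator structure and the two-flag rule follow the description above, but the thresholds, example values, and the routing step are hypothetical stand-ins rather than Brown and colleagues' actual scoring system.

# A minimal sketch of the flag-counting idea described above. Thresholds,
# candidate data, and the routing step are invented for illustration.

def count_flags(residual_z1, residual_z2, latency_seconds, cluster_fit_ok,
                residual_cut=2.0, latency_cut=3.0):
    """Return the number of forensic flags (0-4) for one candidate."""
    flags = 0
    flags += int(residual_z1 > residual_cut)     # unusually high residual, equation 1
    flags += int(residual_z2 > residual_cut)     # unusually high residual, equation 2
    flags += int(latency_seconds < latency_cut)  # abnormally short response latency
    flags += int(not cluster_fit_ok)             # no fit with any background cluster
    return flags

# Route only passing, high-flag (2+) cases to proctored confirmation testing,
# as suggested in the text.
candidates = [
    {"id": 1, "passed_uit": True,  "flags": count_flags(2.6, 1.1, 2.0, True)},
    {"id": 2, "passed_uit": True,  "flags": count_flags(0.4, 0.2, 8.5, True)},
    {"id": 3, "passed_uit": False, "flags": count_flags(2.9, 2.4, 1.5, False)},
]
needs_confirmation = [c["id"] for c in candidates
                      if c["passed_uit"] and c["flags"] >= 2]
print(needs_confirmation)  # -> [1]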
Zickar and Robie (1999) examined the item response theory (IRT) characteristics of personality
measures collected under various conditions related to their fakeability. One fortunate byproduct
of this study was that it compared scores collected under (a) instructions to respond honestly with,
(b) instructions to fake good with, and (c) instructions to fake good combined with some degree of
coaching. We were particularly interested in the difference between faking alone and faking plus
coaching because it provided some sense of the incremental score gain we might expect via coaching
(illicit collaboration in our case). Means under the coaching conditions were roughly in the range of
one-third to over one-half SD higher than means under the fake-good-only condition. This suggests
that coaching could indeed boost personality scores. These results on one kind of noncognitive
test (personality) are tempered by results on another kind (biodata) reported by Kluger, Reilly, and
Russell (1991). In their study of the fakeability of biodata, they found that the keying method used
made a substantial difference. When item keying was used (i.e., where the response scale has a
monotonic relationship with the criterion measure) fakeability was significant. We would assume
that if fakeability were possible, collaboration with another person could have similar effects.
However, when the keying method used was option-keying (i.e., where each response option is keyed
individually, often nonmonotonically) fakeability was stymied. In fact, the fake-good condition in
this study actually produced lower scores than the honest condition. This illustrated the need to be
conservative in assigning score gains because of cheating, especially if the noncognitive instrument
at issue is option-keyed biodata.
It was clear that much guesswork would be necessary to arrive at credible assumptions for this
element of our simulation model. Taking the results of the two studies above into consideration and
then weighing the fact that our testing would be done under high-stakes conditions, we elected to
bracket score gains in the range of 0.25 SD to 0.75 SD. We acknowledge that an optimal simulation
model would actually have a distribution of different-sized score gains through cheating, including
some with actual reduction in scores (because of the cheater’s picking a less-able collaborator or
because option-keyed biodata were used). However, we concluded that the state of the art was not
yet sufficiently developed to incorporate this level of sophistication into our model. Clearly, more
research is needed in this area.

Simulation Input 5: Assumed Distribution of Cheaters


This assumption, the distribution of those who will cheat, was the most problematic of all. For
example, if cheaters are concentrated at the lower end of the distribution, far from the cutoff score,
cheating will have no practical effect on who passes the test. On the other hand, if cheaters are
disproportionately represented at a place on the distribution that is just below the cutoff score,
even modest percentages of cheaters with modest score gains could push large numbers of “fails”
to become “passes.” A study by Arthur, Glaze, Villado, and Taylor (2008) sought to estimate the
distribution of “malfeasances” (a mingling of cheating and response distortion/faking), including
cheating on cognitive and noncognitive tests. Arthur et al. did not measure actual cheating, but they
reasoned that high-stakes testing conditions would be more likely to produce such malfeasances (we made the point earlier that cheating and response distortion are related, but different, classes of behavior).
Although Arthur et al. were not able to isolate cheating alone, they concluded in their noncognitive
test analysis that cheaters are likely to be fairly evenly distributed across the score range with some
tendency to be overrepresented at the higher end of the score range. Although their paper conflated
cheating and response distortion, we found it to contain the only empirically derived estimates of
the distribution of cheaters that we are aware of. As such, we found it quite useful in informing
assumption 5 of our simulation.

PUTTING IT TOGETHER TO CONDUCT THE SIMULATION


To conduct the simulation, we used a random number generator to select 10% of the cases from our
initial, proctored distribution of test scores. Then, the smallest assumed score gain (0.25 SD) was
added to each of those cases and the “fate” of each of those cases was observed to determine how


TABLE 18.2
Estimated Percentage of Cases Moving From "Fail" to "Pass" Because of Cheating

                                 Assumed Cheat Rate
Assumed Score Gain (SD)       10%       15%       30%
0.25                           1.1       1.7       3.4
0.50                           2.3       3.4       6.5
0.75                           4.1       6.1      11.5

Note: Entries were calculated as n(f→p) / (n(f→p) + n(p)), where n(f→p) is the number of cases that went from "fail" to "pass" because of the score gain obtained due to cheating, and n(p) is the number who passed the test before the adjustment for cheating gains was made.

many moved from fail to pass as a result of the adjustment to their scores for cheating. This process was iterated 1,000 times, and the result was the percentage of all passing cases that went from fail to pass because of cheating, averaged across the 1,000 iterations. It should be emphasized that we
approached this from a very pragmatic point of view: We were only interested in those cases which
moved from “fail” to “pass” because of estimated cheating. Cases that “cheated” but were already
above the cutoff score before having their score adjusted or cases that were designated as cheaters
but remained below the cutoff score after their scores were adjusted were not of interest. This pro-
cess was repeated for all remaining combinations of assumed cheat rate and score gain. As Table
18.2 shows, the estimated percentages of people who passed because of cheating would be minimal
under the most conservative of conditions (10% cheat rate, 0.25 SD score gain), with only about 1%
of the cases who were above the cutoff score getting there because of cheating. To the surprise of the
researchers, the effects of cheating were still quite modest even under the moderately conservative
condition (15% cheat rate, 0.50 SD score gain), rising to only 3.4%. Only under the upper realistic limit (30% cheat rate, 0.75 SD score gain) did we finally observe that the pool of candidates who passed the test contained more than 10% who got there by cheating.
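For readers who want to adapt the procedure, the following Python sketch implements the same steps under assumed inputs: a somewhat negatively skewed proctored distribution generated from a beta distribution, a cutoff set at the median, and the cheat rates and score gains shown in Table 18.2. Because the distribution and cutoff are stand-ins rather than our operational data, the percentages it produces will differ somewhat from those in the table.

import numpy as np

rng = np.random.default_rng(2024)

# Assumed inputs: a somewhat negatively skewed proctored score distribution
# and a cutoff set at the median (roughly a 50% pass rate). Both are
# illustrative stand-ins for the operational data described in the text.
proctored = 100 * rng.beta(a=5, b=2.5, size=5_000)
sd = proctored.std(ddof=1)
cutoff = float(np.percentile(proctored, 50))

def pct_passing_due_to_cheating(scores, cutoff, cheat_rate, gain_sd,
                                n_iterations=1_000):
    """Average percentage of all passers who moved from fail to pass because
    of the simulated cheating gain: n(f->p) / (n(f->p) + n(p))."""
    n = len(scores)
    n_cheaters = int(round(cheat_rate * n))
    n_pass_before = np.sum(scores >= cutoff)
    results = []
    for _ in range(n_iterations):
        cheaters = rng.choice(n, size=n_cheaters, replace=False)
        adjusted = scores.copy()
        adjusted[cheaters] += gain_sd * sd          # uniform gain for every cheater
        fail_to_pass = np.sum((scores < cutoff) & (adjusted >= cutoff))
        results.append(100 * fail_to_pass / (fail_to_pass + n_pass_before))
    return float(np.mean(results))

for gain in (0.25, 0.50, 0.75):
    row = [pct_passing_due_to_cheating(proctored, cutoff, rate, gain)
           for rate in (0.10, 0.15, 0.30)]
    print(f"gain {gain:.2f} SD:", [f"{v:.1f}%" for v in row])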
A very large caveat is in order here. The results shown in Table 18.2 should be taken only as a
demonstration of the value of simulation as an aid in the decision of whether or not to engage in
UIT. Specific results depend not only on the assumptions discussed above, but also the shape of the
proctored test distribution (ours was somewhat negatively skewed) and the cutoff score in use (ours
generally yielded a 45–55% pass rate). Beyond that, the assumptions themselves will produce widely
varying results. Although beyond the scope of this chapter, it is readily apparent that if the test at
issue were heavily cognitive in content, the temptation to cheat and the average score gains realized
from cheating would increase substantially, with the result being a possibly dramatic increase in the
negative effects of UIT over the example we present here.

UIT SIMULATION: THE FUTURE


The simulation could be used in other ways as well. Some studies of UIT show that it does not
adversely affect validity and/or does not result in significant increases or differences in mean test
scores (e.g., Fallon, Janovics, Lamazor, & Gust, 2006; Grauer, Beaty, Davis, & Meyer, 2006; Nye,
Do, Drasgow, & Fine, 2008; Templer & Lange, 2008; Tippins et al., 2006). Yet it is likely that
most of these studies are fairly blunt instruments with regard to detecting the effects of UIT. In
some cases, research designs were questionable (e.g., using within-subject designs without coun-
terbalancing where large practice effects are likely). But the biggest problem is that cheaters are likely embedded in a much larger group of noncheaters, making such studies insensitive to the effects, if any, of UIT. The simulation approach could be used to estimate what incidence and score
gains would be necessary before significant mean score differences appeared and/or validity were
impaired (simulation of the effects on validity would, of course, also require criterion data).
UIT is increasingly being portrayed as inevitable. The simulation approach can assist in the
decision-making process by helping to evaluate how serious the consequences will be of dropping
the safeguard of proctoring. Of course, research is needed to help hone the assumptions that go into the
simulation model. Better estimates of the incidence of cheating, especially that which distinguishes
high- from low-stakes assessment and noncognitive measures from cognitive measures, would help
immensely. A more sophisticated understanding of the score changes likely to be realized under
UIT is also needed. Our simulation assigned the same score gain to every “cheater” case. A better
simulation would assign differing amounts of score change to individual cases, including, especially
in the case of noncognitive testing, the real possibility that some scores might actually go down.
Finally, we need to have a much better understanding of where along the score continuum cheaters
are most likely to reside. Our simulation used essentially a random assignment, but depending
on the type of test, whether it is high stakes or low, and even the characteristics of the examinee
population, the distribution of cheaters could be considerably different from random. For example,
the literature provides information that could be used to fine tune the distributional assumptions
used in a simulation. Gender (men report more cheating), perceived or actual ability (low-ability
people report more cheating), and age (younger people report more cheating) are reported with
some reliability (Klein et al., 2007; McCabe et al., 2001; Nathanson et al., 2006; Newstead et al.,
1996). Instead of a random selection of cases to be designated as cheaters in the simulation, younger
cases, male cases, and/or cases at the lower end of the test distribution could be oversampled to
more realistically represent the distribution of cheaters. We were not ready to go to this level of
sophistication in our simulation, but the potential to move beyond simple random distribution of
cheaters is clearly there.
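As one minimal sketch of such oversampling (in Python), the probability of being designated a cheater can be made proportional to a set of weights for age, gender, and standing on the test; the multipliers used below are illustrative assumptions, not estimates from the cited studies.

import numpy as np

rng = np.random.default_rng(7)

# Hypothetical candidate pool with the characteristics mentioned above.
n = 5_000
scores = 100 * rng.beta(a=5, b=2.5, size=n)
age = rng.integers(18, 60, size=n)
male = rng.random(n) < 0.5

# Assumed relative weights: younger, male, and lower-scoring cases are made
# somewhat more likely to be designated cheaters.
weights = np.ones(n)
weights *= np.where(age < 25, 1.5, 1.0)
weights *= np.where(male, 1.3, 1.0)
weights *= np.where(scores < np.percentile(scores, 33), 1.4, 1.0)
probabilities = weights / weights.sum()

cheat_rate = 0.15
cheaters = rng.choice(n, size=int(cheat_rate * n), replace=False, p=probabilities)
print(f"Mean age of designated cheaters: {age[cheaters].mean():.1f} "
      f"(overall mean age: {age.mean():.1f})")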
Assuming that UIT is here to stay, we need much better capability to assess its likely conse-
quences in specific situations, and the simulation tools will improve only to the extent that the
assumptions entered into the simulations are themselves more accurate.

SO WHAT CAN WE DO?


First, it is useful to admit that UIT creates hazards that did not exist in pre-UIT days and some of
these may prove to be insoluble. In the future, safeguards such as webcam-based monitoring may
become viable, permitting an acceptable level of proctoring. In the meantime, there are things that
can be done to at least mitigate some of the issues of candidate cheating and test security.

• Use honesty certificates, requiring the examinee to certify that he/she will not cheat and
will not take test content.
• Forensics, hardware, and software solutions may not, at present, objectively prevent
cheating or content theft, but the known presence of such things can act as a deterrent.
Publicizing to job candidates that such deterrents are in place will dissuade some from
cheating and/or theft.
• If proctored confirmation testing is feasible, use it. However, for many organizations
administering a proctored test to everyone who passes the UIT is not going to be feasible
or even desirable. An alternative might be to use validated forensic analysis to detect cases
of suspected cheating from among those who passed the test and then subject only those
candidates to proctored confirmation testing.
• Use item banks to create multiple forms or presentations of the test, where feasible.
• Instruct candidates to set aside a specified number of uninterrupted minutes needed to start
and finish the test in one sitting. If a candidate knows that once he/she starts a test it must
be finished, the likelihood that he/she will ensure that distractions and interruptions are
minimized during that limited amount of time may be increased.
• Reconcile oneself to the fact that putting a test “out there” in people’s homes will eventu-
ally compromise it. Periodic replacement of the assessment should be regarded as a normal
cost of using UIT.

It is virtually certain that the push for UIT will intensify. A rational first step before deciding on
a course of action in any given testing situation would be to construct a simulation that takes into
account likely scenarios. This will help determine the potential seriousness of the consequences
of going to UIT. If the consequences are likely to be very serious (i.e., many candidates passing
because they cheated), it would make sense to hold the line on the requirement for proctoring, either
in all cases or in verification testing of all who passed a preliminary UIT screen, whichever made
more economic and practical sense. If the consequences—as in our example—are not projected to
be serious, UIT may be fully appropriate. In any event, simulation can bring the salient factors on
which the decision should be made more clearly into focus.
The six suggestions we made above will help mitigate, if not eliminate, any ill effects of UIT.
In the future, we fully expect that technological aids such as web cameras will become as standard
on home computers as color monitors are now. At that point, verifying candidates’ identities and
monitoring them as they take the test will become much easier. We expect forensics to become more
effective. It is not hard to imagine that at some future point we will retire the term “unproctored Internet
testing” because technology will truly provide a measure of proctoring that may even surpass diligent
human proctors in effectiveness.
We now turn from an issue in which the pressure is to exclude the human element from the assess-
ment process to one in which the pressure runs in the opposite direction—to include the human ele-
ment even where research clearly indicates that “mechanical” methods surpass human judgment
in arriving at good decisions.

COMBINATION OF CANDIDATE INFORMATION:
THE ROLE OF THE DECISION-MAKER

Psychological research has produced a great deal of useful information that could help organiza-
tions perform much better than they currently do in a wide range of areas, not the least of which
is assessment of human talent. Yet, often that information is not used even when it is known. In
this section we cover the issue of how candidate assessment information can best be assembled
for optimal decisions, why it often is not, how I-O technology can collide with decision-making
preferences, and some ideas on how these dilemmas can be mitigated. As will be seen below, this
discussion falls under the heading of “clinical versus mechanical combination of information” for
decision-making in the assessment context.

CLINICAL VERSUS MECHANICAL COMBINATION OF INFORMATION


We start with some examples to illustrate where these two ways of aggregating assessment data
commonly come into play. The following kinds of events happen every day in organizations:

• A structured panel interview is completed and the ratings across dimensions and inter-
viewers need to be aggregated to help arrive at the decision of whether to accept or reject
the candidate.
• Test scores must be combined with interview ratings to make an employment decision.
• Assessment center assessors’ evaluations must be combined before a decision to promote
or not can be made.
• Three tests make up a battery. The scores from the three must be used in some way to make
hiring decisions.

These are all familiar issues to I-O psychologists, but how they are handled is typically not a uni-
lateral decision made by the psychologist. Hiring and HR managers also typically weigh in on these
topics. For example, in a “high-touch” organization, interviewers may resist averaging across their
individual ratings but would rather discuss the candidates until consensus is reached. In another
organization, managers may have much more faith in interviews than in selection tests and so want
to give more emphasis in the decision-making process to the interview. And, in a third organization,
a hiring manager may have become enamored of a particular test, thus tending to defer to its results
in preference to other sources of candidate data.
In fact, a rich literature exists to help us optimize the ways we combine and use assessment
data on the way to making employment-related decisions. Meehl’s (1954) and later Dawes’ (1979)
seminal work comparing the predictive accuracy of mechanical (statistical, actuarial) combination
of data with clinical judgment set in motion a controversy within the psychological discipline when
they empirically supported the claim that mechanical combination of information was superior to
clinical combination. In the intervening years, the Meehl/Dawes hypothesis has been repeatedly
substantiated, and its main contention is no longer considered controversial within psychology (e.g.,
Dana & Thomas, 2006; Dawes, 1979; Grove & Meehl, 1996; Grove, Zald, Lebow, Snitz, & Nelson,
2000; Kleinmuntz, 1990).
There is one little fly in the ointment: psychologists do not typically run the
organizations we work in or consult to. Managers do, and it is here that mechanical combination of
data runs up against some strong opposition. Not only is the proposition that a formulaic combina-
tion of data “untouched by human hands” could be superior to human judgment extremely coun-
terintuitive for most decision-makers, it also impinges on an often jealously-guarded management
prerogative: to decide who is best suited to be hired or promoted.
So, what should the I-O practitioner do? Options include insisting on mechanical combination
and, frankly, running the real risk of being ignored by the organization. Or, the I-O practitioner can
step aside and just let nature take its course. This course of action, unfortunately, cheats the organi-
zation out of the benefit of the most effective selection procedures available.
We think the answer lies in looking for clues that help position objective aggregation of data
where it can do the most good without preempting management’s role in the process.
Johns (1993) examined I-O psychology’s penchant for supplying technical solutions to prob-
lems when he discussed the more general issue of why organizations so consistently adopt sub-
optimal solutions to HR problems when the organizational research clearly points to better
solutions. He noted that adoption of new practices is considered innovative, and then went on
to make the distinction between technical innovation and administrative innovation. Technical
innovations are those that are central to the organization’s products and services and are judged
by their technical merit. Administrative innovations, by contrast, are central to the organiza-
tion itself and whether they are adopted or not depends more on the preferences of manage-
rial decision-makers. In comparison with technical innovations, the organization’s judgment
of administrative innovations is heavily influenced by factors outside of technical merit, such
as political interests, imitation of other organizations, legal/regulatory issues, and the like.
According to Johns (1993), where the trouble arises is that I-Os see our innovations—such
as mechanical weighting of selection information or utility analysis—as technical innovation,
but organizational management sees them as administrative innovation. Macan and Highhouse
(1994) examined I-O psychologists’ use of dollar utility analysis in communicating the value of
HR programs to managers. Although they did not work from the Johns (1993) technical versus
administrative innovation framework, their results were completely consistent with it: Although
46% of their I-O respondents indicated they had used utility analysis (a technical argument) to
communicate the value of selection tests to decision-makers, when asked (a) if the managers
perceived the utility estimates to be credible and (b) if the utility-based communication was
more effective than other information in its persuasive value, these same I-O psychologists
produced mean ratings that fell somewhere between the rating scale anchors of “occasionally”
and “hardly ever.” Hazer and Highhouse (1997) tested various methods of estimating utility for
its persuasive value with managers. Among their conclusions was “… the degree to which man-
agers accept the notion that the effectiveness of HR programs can be measured in terms of dol-
lars remains an open question” (p. 110). As Latham (1988) observed, “The reason why Brogden
and Taylor’s (1950) work has been largely ignored for 38 years may be that the customer has not
asked for it” (p. 561).
All of these studies are compatible with Johns’ (1993) hypothesis that I-O arguments such as util-
ity or mechanical combination of information tend to be technical, which would be fine if decision-
makers considered employee selection (or training) to be in the class of technical innovation. But,
such arguments will likely fall on deaf ears if the decision-makers in reality consider them to be
administrative innovations. If Johns (1993) is right, then it is no surprise that mounting ever more
powerful technical arguments for the superiority of mechanical combination meets with so little
success.
Returning to the Shrivastava and Mitroff (1984) paper referenced earlier, managerial decision-makers’
preference for “subjective, experiential data” (p. 20) and the recognition that “the analytical devices,
methodological rules of inquiry, and the decision heuristics adopted by decision-makers are highly
personalized, informal and non-standard” (p. 20) are considerations worth taking into account when
trying to find appropriate roles for mechanical combination of data and for decision-makers. In our
view, we need to find a happy meeting point between the two if we
are to progress. Finding that point will not simply be a matter of “better education” for the decision-
makers so they can begin to see the error of their ways.
We can start by looking closely at the literature on clinical versus mechanical combination. The
original research was stimulated by a firmly entrenched belief within the psychotherapeutic and
medical diagnostic communities that the predictions “in the heads” of clinical practitioners consti-
tuted the gold standard. Further, that clinical gold standard was expensive, often involving the pooled
judgments of multiple highly compensated clinicians in case conferences. So, when
empirical research began to find that mechanical combination of data was equal to or better than
clinical combination in the vast majority of cases, the cost benefit of mechanical combination tipped
the scale in favor of conclusions of the almost universal superiority of mechanical combination—if
not in terms of predictive accuracy at least in terms of economic superiority. In the Grove et al.
(2000) meta-analysis, 136 studies were examined. In 63 (46%) of the studies, mechanical combina-
tion was superior in predictive validity. In only eight (6%) of the studies was clinical combination
more valid. But, in 65 (48%) of the studies there was no substantial difference. Ganzach, Kluger,
and Klayman (2000) reported the usual superiority of mechanical to clinical prediction in a study
of military recruits’ later performance. However, they indicated that the best prediction involved
both. These studies have profound implications for practice in organizations. Taken to its logical
conclusion, the odds are roughly equal that if we have a significant clinical element in the way
assessment information is combined, we will do about as well as if we follow a purely mechanical
combination model. And, if we pool clinical and mechanical prediction we could possibly do even
better. In other words, as I-O practitioners we need not be doctrinaire about the value of mechani-
cal combination to have an effective assessment program that is still palatable to decision-makers
and their needs.

LIVE WHERE DECISION-MAKERS LIVE


For us to be successful in bringing the best that psychological science has to offer to the organiza-
tions we serve, we must do a better job of relating to the influences that affect decision-makers
and adapt our strategies to those forces. The following are some thoughts on how that might
be done.

Stop Treating Administrative Innovation As If It Were Technical Innovation


Over a decade ago Johns (1993) proposed that:

The point is not so much that products of I-O science are too complex or “scientized.” Rather, it means
that they are out of sync with the way organizations make decisions about the class of problems for
which I-O innovations are thought to be solutions. (p. 573)

Applying this advice means putting less reliance on promoting the technical merits of mechanical
combination of pre-employment data and more on finding what drives managers’ decisions. It
means building persuasive cases that start by putting oneself in the mind of the decision-maker.
We might, for instance, capitalize on something Johns (1993) pointed out has great influence on
many managers: the persuasive power of imitation. Those who work inside organizations cannot
help but notice the legitimizing effect that framing an innovation as a “best practice” or through
“benchmarking” can have. If the I-O scientist-practitioner can find other, respected organizations
that have adopted less purely clinical ways of combining pre-employment predictors, he/she can
harness the power of imitation as an invaluable aid in helping mold decision-making.
It may help to capitalize on managerial values, some of which are the same things that cause
decision-makers to be attracted to UIT; namely, increased speed and reduced cost. Mechanical
combination generally beats clinical combination on both counts. However, both considerations
might be trumped by the felt need for “high touch” in some organizations. In these cases, it might
be worth experimenting with using mechanical combination to present the decision-makers not
with a single “right answer” but with a palette of options (or candidates) from which they can select.
Or, mechanical procedures could be used to supply the decision-makers with a suggested solution
(i.e., the candidate(s) proposed to be hired) but with the added step of asking the decision-makers
to ratify these recommendations. Of course, the decision-makers may choose to override these
recommendations, but this process would help retain the best of mechanical combination without
stripping managerial decision-makers of their latitude to make decisions.

Involve Decision-Makers in the Mechanical Combination Process


One of the more compelling aspects of Dawes’s (e.g., Dawes, 1979) research was the finding that
the weighting scheme used in combining variables matters little. Regression-based weights, unit
weights, or even randomly chosen weights worked about equally well in making mechanical pre-
dictions. Involving decision-makers in setting weights for mechanical combination could add a
“hands-on” aspect to mechanical combination that it otherwise lacks, resulting in greater buy-in for
the use of mechanical combination of information, all with little risk of any harm being done. The
chief objection to this approach might be that it is a bit manipulative. However, we would argue that
even full disclosure of the research on differential weighting would have little effect on decision-
makers’ desire to participate, principally because of its intuitive appeal. Whether this argument is
correct is an empirical question we have not tested.
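The following is a minimal sketch, using entirely hypothetical data, of the kind of demonstration that can be shared with decision-makers: regression weights, unit weights, and arbitrarily chosen positive weights are applied to the same standardized predictors, and their cross-validated correlations with the criterion are compared.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: three modestly correlated predictors and a criterion.
n, k = 2000, 3
predictors = rng.standard_normal((n, k)) + 0.5 * rng.standard_normal((n, 1))
criterion = predictors @ np.array([0.5, 0.3, 0.2]) + rng.standard_normal(n)

train, test = slice(0, 1000), slice(1000, None)
z = (predictors - predictors[train].mean(axis=0)) / predictors[train].std(axis=0)

beta = np.linalg.lstsq(z[train], criterion[train], rcond=None)[0]  # regression weights
schemes = {"regression weights": beta,
           "unit weights": np.ones(k),
           "arbitrary positive weights": rng.uniform(0.1, 1.0, k)}
for name, weights in schemes.items():
    composite = z[test] @ weights
    r = np.corrcoef(composite, criterion[test])[0, 1]
    print(f"{name}: cross-validated r = {r:.2f}")

Under conditions like these, the three weighting schemes typically yield nearly identical cross-validated correlations, which is the substance of Dawes’s point.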

Use, Don’t Fight, Decision-Makers’ Strengths


Researchers have noted that experts (decision-makers in this case) can do some things that mechani-
cal combination cannot do (e.g., Dana & Thomas, 2006; Kleinmuntz, 1990). Meehl (1954; also cited
in Grove & Meehl, 1996) noted what he called the “broken leg case.” In it he described a situation
in which an equation might be developed to predict whether people would go to a movie on a par-
ticular night. But, if a person had just broken his leg, the equation would obviously be wrong for
that individual. In the organizational selection context, decision-makers can be alert for “broken
legs” that would properly nullify the mechanical predictions. For example, say the organization had
developed a structured accomplishment record (Hough, 1984) for use in screening recent college
graduates. The accomplishment record had been validated as a predictor of early-career success by
assigning empirically derived points for occupying leadership roles in campus organizations, with
the score on the accomplishment record being a mechanical combination of the points assigned.
However, a decision-maker notes from a particular candidate’s resume and references that she had
put herself through school while working two jobs, and completed a very successful internship, all
while tending to her elderly grandmother. Clearly, there was no time left in the day for campus lead-
ership positions for this otherwise high-achieving student. This might be a broken leg case in which
it would make sense to temper the mechanical prediction of success through the intervention of the
decision-maker. To be sure, given the tendency of many decision-makers to clinicalize decision-
making, there is the real risk of progressively defining down what constitutes a metaphorical broken
leg. But, if well-defined guidelines can be developed for what constitutes a legitimate broken leg
and what does not, the decision-makers could play a valuable role in preventing mispredictions. Of
course, the guidelines themselves would have to have buy-in from the decision-makers. Better yet,
decision-makers could be involved in developing the guidelines. In this way, decision-makers could
be given a role that is complementary to, rather than competitive with, the formula.
Kleinmuntz (1990) added a couple of additional suggestions. He noted that researchers as far
back as the 1960s have written that a combination of expert judgment with mechanical combination
of results could actually outperform either clinical prediction alone or statistical prediction alone.
This, of course, raises the question of what manner of synthesis between clinical and mechanical pre-
diction improves results. It appears that when the experts (decision-makers, managers) are used to
collect the data and evaluate their subtleties and quirks, encode (rate) the information in a way that
is amenable to quantitative analysis, and then hand it over for statistical combination, the predictive
results are optimized. This approach assigns appropriate roles to the decision-maker/manager and
to the formulaic prediction model. Neither is poaching on the other’s territory; everyone is happy. In
the context of selection an obvious example is in interviewing. Here, interviewers, especially panels
of interviewers, could conduct a structured interview, judging from the candidate’s responses how
he/she will likely perform on the job. Then, the interviewer(s) encode their complex judgments as
ratings. At this point, if they are smart, the interviewers stop and let the formula combine their rat-
ings to make the prediction. Then, the decision-makers review the prediction, looking especially for
any broken legs. In most cases they will affirm the prediction. But, in some cases they will override
it, and provided they can clearly articulate job-related reasons why (“gut feel” would not qualify),
the override would usually be appropriate.
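A minimal sketch of this division of labor appears below. The dimension names, the 1–5 rating scale, the unit-weighted composite, and the rule that an override must carry a documented job-related reason are illustrative choices of ours rather than prescriptions from the literature.

from statistics import mean

def interview_decision(dimension_ratings, cutoff=3.5, broken_leg_reason=None):
    # Interviewers encode their judgments as ratings; a unit-weighted mean
    # across dimensions and panel members serves as the mechanical composite.
    composite = mean(mean(r) for r in dimension_ratings.values())
    mechanical = composite >= cutoff
    # An override is honored only when a documented, job-related reason is supplied.
    final = (not mechanical) if broken_leg_reason else mechanical
    return {"composite": round(composite, 2),
            "mechanical_recommendation": mechanical,
            "final_decision": final,
            "override_reason": broken_leg_reason}

# One hypothetical candidate rated by a three-person panel on three dimensions.
panel = {"problem solving": [4, 5, 4], "communication": [3, 4, 4], "teamwork": [4, 4, 5]}
print(interview_decision(panel))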
If decision-makers can be convinced to limit their role to what they do best, letting statistical
models do their part, prediction can be improved. However, there are impediments to this, a discussion
of which goes beyond the scope of this chapter. Suffice it to say that such things as overestimates by
decision-makers of their expertise (called “deluded self-confidence” by Kleinmuntz, 1990, p. 302)
can wreck the plan to apportion appropriate roles for clinical and statistical prediction. Steps beyond
just declaring that clinical and mechanical prediction must share the stage are usually going to be
necessary. This is discussed in the next section.

Capitalize on Self-Discovery
Self-discovery has long been recognized as an effective method of attitude and behavior change.
If decision-makers can be provided circumstances that allow them to discover on their own the
benefits of mechanical combination of data, the path will be paved for better use of the strengths
of data gathering (a clinical process) and data combination and prediction (a mechanical process).
Self-discovery could be helped along by borrowing from cognitive dissonance theory (Festinger,
1957; Wood, 2000). Decision-makers could be led to adopt mechanical combination more
extensively by dissonance created by the discrepancy between their clinical predictions’ results
and corresponding results from mechanical predictions. As noted in Wood (2000) one theory
of cognitive dissonance proposes that dissonance is increased “when people fail to behave in a
manner consistent with some valued self-standard” (p. 546). If, as seems likely, decision-makers
value hiring or promoting the right people, a large “miss rate” could create dissonance. Switching
behavior to a more mechanical approach would increase their hit rate and thus reduce dissonance.
This would require convincing decision-makers to formalize their predictions about candidates
and then track the degree of success of the candidates who were hired or promoted, especially
those who were hired/promoted in conflict with the predictions of a mechanical model. This
would provide the data necessary to create dissonance. Greater reliance on objective combination
of predictive information would reduce it. This approach may be facilitated by the current
trend toward evidence-based management (e.g., Pfeffer & Sutton, 2006), which has legitimized
quantitative analysis in many places where pure intuition previously ruled. In any event, it would be
important to have the tracking process be transparent to the decision-makers, even as the data are
being collected. To “ambush” them with results showing the superiority of mechanical decision-
making would be counterproductive if the goal is to rationalize the way data are combined and
used in employment decisions. Such transparency may not lead to the most rigorous science, but
if the real goal is informed decisions about decision-making processes, this compromise would
be well worth it.

IN SUMMARY
To reiterate, specific steps that can be taken to find a healthy balance between quantitative rigor and
managerial decision-making rights include:

• Volunteer a meaningful role for decision-makers to play in the process, even if it means
compromising somewhat on methodological rigor and accuracy. Do not put decision-
makers in a position where they feel their judgment must compete with a formula. The
formula will always eventually lose.
• Have the mechanical formula produce a slate of candidates, not just one. Charge the deci-
sion-makers with selecting the one to receive the job offer or promotion. This is not a new
concept; post-World War II era civil service statutes stipulated that a “rule of 3” be used,
calling for three candidates to be presented to the hiring authority for each opening to be
filled.
• Involve decision-makers in the mechanical combination process by letting them set the
weights for various pieces of information to be combined. It will incorporate the intuitive
element decision-makers prefer for administrative innovations, and although it may not
actually enhance the mechanical combination of information, neither will it do it damage.
• Specifically charge decision-makers with identifying Meehl’s broken leg situations in
which the mechanical, formulaic results should be overridden by human judgment. At the
same time, be sure there are objective rules in place to prevent what defines a broken leg
from becoming diluted.
• Create situations of self-discovery for decision-makers. Most managers want to make good
decisions, but conventional education will only go so far, especially when what it teaches
is counterintuitive. Engage them in developing “track records” comparing the success
rates of the formulaic approach with the decision-makers’ preferred judgmental approach.
Presuming the formulaic approach proves itself, the dissonance created should help in self-
discovery of “a better way.”
• Utilize the imitative tendencies of most organizations by finding best practices in other,
respected organizations that have successfully merged mechanical combination with man-
agerial clinical judgment. Likewise, capitalize on the things that are meaningful to manag-
ers (e.g., increased speed, reduced cost) to let them discover how mechanical combination
can be supportive of the objectives they value.
• Capitalize on general changes in thinking within the organization. Many organizations are
just now discovering the wisdom of quantitative, objective analysis—often referred to as
“evidence-based management.” If this is the case, use this as a springboard for legitimizing
the benefits of mechanical combination of data to improve selection decisions.

CONCLUSIONS
In this chapter, we covered two important topics, one that occurs at the beginning of the assessment
process, unproctored testing, and the other that occurs at the end of the process, combining data for
decision-making. With regard to the former, the Internet has created great opportunity and great risk
for assessment. In this chapter we suggested that simulation of the most likely consequences of UIT
is a useful first step in the process of determining whether the benefit is worth the risk. With regard
to employment-related decision-making, like most research-oriented I-O practitioners, we would
normally prefer to use algorithms that combine information objectively. Yet, we also acknowledge
that approaches that strip organizational decision-makers of their capability to make those decisions
likely do not have a rosy future in most organizations. So, the trick should become one of finding a
solution that balances accuracy of prediction with managers’ rights to make employment decisions.
Some ways in which this might be accomplished were suggested.
Although unproctored testing and combination of data for prediction might seem at first blush to
have little in common, in fact there is considerable commonality between them in that both represent areas
where the rules and values of science and research often collide with the rules and values of man-
agement. There were many issues we could have chosen from the very broad area of administration,
scoring, and interpretation of assessments where this difference might crop up. The most pressing
issues of today, such as UIT, will eventually find a resolution only to be replaced by other issues
that involve the interface between the opinions of stakeholders in organizations and those of the I-O
scientist-practitioner. Whether it is a manager who wants to go to UIT, a manager who believes she
has an innate ability to assess people, a manager who has too little trust in pre-employment tests, or
one who has too much, I-O psychologists must play a lead role in helping determine where the right
balance is. This means that I-O psychologists should have in their role statement an obligation to
help resolve all kinds of misconceptions about assessment. Moreover, we must know how to do it in
ways that touch upon managers’/decision-makers’ value systems. This means we must be prepared
to act not only as educators for our organizations but also, when called upon, as applied
social psychologists, creating situations for participation and self-discovery that lead to better-
informed, more effective practices within our organizations.

REFERENCES
Arthur, W., Glaze, R. M., Villado, A. J., & Taylor, J. E. (2008). Magnitude and extent of cheating and response
distortion effects on unproctored Internet-based tests of cognitive ability and personality. Manuscript
submitted for publication.
Blackburn, M. A. (1998). Cheating and motivation: An examination of the relationships among cheating
behaviors, motivational goals, cognitive engagement, and perceptions of classroom goal structures.
Unpublished doctoral dissertation, The University of Oklahoma, Norman, OK.
Brogden, H. E., & Taylor, E. K. (1950). The dollar criterion—Applying the cost accounting concept to criterion
construction. Personnel Psychology, 3, 133–154.
Burke, E. (2006). Better practice for unsupervised online assessment. Surrey, England: SHL Group.
Cizek, G. J. (1999). Cheating on tests: How to do it, detect it, and prevent it. Mahwah, NJ: Lawrence Erlbaum.
Cizek, G. J. (2001, April). An overview of issues concerning cheating on large-scale tests. Paper presented at
the annual meeting of the National Council on Measurement in Education, Seattle, WA.
Dana, J., & Thomas, R. (2006). In defense of clinical judgment...and mechanical prediction. Journal of
Behavioral Decision Making, 19, 413–428.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist,
34, 571–582.
Ethics Resource Center. (2007). National business ethics survey: An inside view of private sector ethics.
Arlington, VA: Author.
Fallon, J., Janovics, J., Lamazor, A., & Gust, J. (2006, May). An early adopter’s view: Data from a one-year
unproctored Internet testing study. Paper presented at the 21st annual conference of the Society for
Industrial and Organizational Psychology, Dallas, TX.
Festinger, L. (1957). A theory of cognitive dissonance. Evanston, IL: Row Peterson.
Foster, D. F. (2008, April). Secure, online, high-stakes testing: Science fiction or business reality? Paper pre-
sented at the 23rd annual conference of the Society for Industrial and Organizational Psychology, San
Francisco, CA.
Ganzach, Y., Kluger, A. N., & Klayman, N. (2000). Making decisions from an interview: Expert measurement
and mechanical combination. Personnel Psychology, 53, 1–20.
Grauer, E., Beaty, J., Davis, J., & Meyer, J. (2006, May). Unproctored Internet testing: Important questions
and empirical answers. Paper presented at the 21st annual conference of the Society for Industrial and
Organizational Psychology, Dallas, TX.
Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and
formal (mechanical, algorithmic) prediction procedures: The clinical–statistical controversy. Psychology,
Public Policy, and Law, 2, 293–323.
Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical predic-
tion: A meta-analysis. Psychological Assessment, 12, 19–30.
Hazer, J. T., & Highhouse, S. (1997). Factors influencing managers’ reactions to utility analysis: Effects of SDy
method, information frame, and focal intervention. Journal of Applied Psychology, 82, 104–112.
Hough, L. M. (1984). Development and evaluation of the “accomplishment record” method of selecting and
promoting professionals. Journal of Applied Psychology, 69, 135–146.
Huff, K. (2006). The effects of mode of administration on timed cognitive ability tests. Unpublished doctoral
dissertation, North Carolina State University, Raleigh, NC.
Hughes, D., & Tate, L. (2007). To cheat or not to cheat: Candidates’ perceptions and experiences of unsuper-
vised computer-based testing. Selection & Development Review, 23, 13–18.
Hughes, J. M. C., & McCabe, D. L. (2006). Academic misconduct within higher education in Canada. Canadian
Journal of Higher Education, 36, 1–21.
International Test Commission. (2005). International guidelines on computer-based and Internet delivered
testing. Author.
Johns, G. (1993). Constraints on the adoption of psychology-based personnel practices: Lessons from organi-
zational innovation. Personnel Psychology, 46, 569–592.
Josephson Institute of Ethics. (2006). Report card on the ethics of American youth. Los Angeles, CA. Retrieved
May 1, 2008, from http://charactercounts.org/programs/reportcard/2006/data-tables.html
Junior Achievement/Deloitte. (2007). 2007 JA Worldwide/Deloitte teen ethics survey. Author.
Klein, H. A., Levenburg, N. M., McKendall, M., & Mothersell, W. (2007). Cheating during the college
years: How do business school students compare? Journal of Business Ethics, 72, 197–206.
Kleinmuntz, B. (1990). Why we still use our heads instead of formulas: Toward an integrative approach.
Psychological Bulletin, 107, 296–310.
Kluger, A. N., Reilly, R. R., & Russell, C. J. (1991). Faking biodata tests: Are option-keyed instruments more
resistant? Journal of Applied Psychology, 76, 889–896.
Latham, G. P. (1988). Human resource training and development. Annual Review of Psychology, 39, 545–582.
Lievens, F., Highhouse, S., & DeCorte, W. (2005). The importance of traits and abilities in supervisors’
hirability decisions as a function of method of assessment. Journal of Occupational and Organizational
Psychology, 78, 453–470.
Macan, T. H., & Highhouse, S. (1994). Communicating the utility of human resource activities: A survey of I/O
and HR professionals. Journal of Business and Psychology, 8, 425–436.
McCabe, D. L., Butterfield, K. D., & Treviño, L. K. (2006). Academic dishonesty in graduate business pro-
grams: Prevalence, causes, and proposed action. Academy of Management Learning and Education, 5,
295–305.
McCabe, D. L., Treviño, L. K., & Butterfield, K. D. (2001). Cheating in academic institutions: A decade of
research. Ethics and Behavior, 11, 219–232.
Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence.
Minneapolis, MN: University of Minnesota Press.
Naglieri, J. A., Drasgow, F., Schmit, M., Handler, L., Prifitera, A., Margolis, A., et al. (2004). Psychological
testing on the Internet: New problems, old issues. American Psychologist, 59, 150–162.
Nathanson, C., Paulhus, D. L., & Williams, K. M. (2006). Predictors of a behavioral measure of scholastic
cheating: Personality and competence but not demographics. Contemporary Educational Psychology,
31, 97–122.
Newstead, S. E., Franklyn-Stokes, A., & Armstead, P. (1996). Individual differences in student cheating.
Journal of Educational Psychology, 88, 229–241.
Nye, C. D., Do, B.-R., Drasgow, F., & Fine, S. (2008). Two-step testing in employee selection: Is score inflation
a problem? International Journal of Selection and Assessment, 16, 112–120.
Pfeffer, J., & Sutton, R. I. (2006). Hard facts, dangerous half-truths, and total nonsense: Profiting from
evidence-based management. Boston, MA: Harvard Business School Press.
Robinson, E., Amburgey, R., Swank, E., & Faulkner, C. (2004). Test cheating in a rural college: Studying the
importance of individual and situational factors. College Student Journal, 38, 380–395.
Ryan, A. M., & Tippins, N. T. (2004). Attracting and selecting: What psychological research tells us. Human
Resource Management, 43, 305–318.
Schmitt, N., & Oswald, F. L. (2006). The impact of corrections for faking on the validity of noncognitive mea-
sures in selection settings. Journal of Applied Psychology, 91, 613–621.
Shrivastava, P., & Mitroff, I. I. (1984). Enhancing organizational research utilization: The role of decision mak-
ers’ assumptions. Academy of Management Review, 9, 18–26.
Templer, K. J., & Lange, S. R. (2008). Internet testing: Equivalence between proctored lab and unproctored
field conditions. Computers in Human Behavior, 24, 1216–1228.
Terpstra, D. E. (1996). The search for effective methods. HRFocus, 73, 16–17.
Terpstra, D. E., & Rozell, E. J. (1997). Why some potentially effective staffing practices are seldom used.
Public Personnel Management, 26, 483–495.
Tippins, N. T., Beaty, J., Drasgow, F., Gibson, W. M., Pearlman, K., Segall, D. O., et al. (2006). Unproctored
Internet testing in employment settings. Personnel Psychology, 59, 189–225.
Wood, W. (2000). Attitude change: Persuasion and social influence. Annual Review of Psychology, 51,
539–570.
Zickar, M. J., & Robie, C. (1999). Modeling faking good on personality items: An item level analysis. Journal
of Applied Psychology, 84, 551–563.
19 Evaluation of Measures
Sources of Error, Sufficiency,
and Contamination
Michael J. Zickar, Jose M. Cortina, and Nathan T. Carter

Suppose that we wish to develop a measure of cognitive ability. After receiving advice from an
“expert” neighborhood psychologist, we will ask each participant to throw a baseball as far as he/
she can, and the distance, in feet, will be our measure of cognitive ability. A shrewd observer might
remark that a given respondent might vary from one trial to the next in the distance that he/she
throws the ball. Wanting to improve upon our measure, we decide to use the average of ten throws.
Whereas there may be considerable intraindividual variability in the distance of a single throw,
there will be little variability in the average distance over ten throws.
A shrewder observer might now express concern over the possibility that men, being physically
larger than women on average, might have an unfair advantage. In addition, direction and strength of
the wind might play a role, as would previous experience throwing baseballs, age, physical strength,
etc. We therefore endeavor to hold wind constant and to adjust for any and all of the confounding
variables that have been mentioned.
We are now satisfied that we have intraindividual consistency and that confounding factors have
been eliminated. We now discover that, although we have no intraindividual variability, we also
have no interindividual variability. What to do …. Aha! After we have collected the average dis-
tance thrown for each participant, we will administer the most recent version of the Wechsler Adult
Intelligence Scale (WAIS; see Wechsler, 1939) and add that score to the distance score. This sum
will then serve as our measure of cognitive ability. To our great joy, we find in subsequent studies
that our measure predicts those things that should, rationally, relate to cognitive ability, and that it
does not predict those things that should not relate to cognitive ability.
This was a strange test development process to be sure. The initial result was off the mark (as was
Goddard’s initial attempt to screen immigrants at Ellis Island for cognitive ability with questions
requiring knowledge about aspects of American life, including what Crisco was; see Zenderland,
1998); however, our procedure did not stop there. It eventually went through all of the stages of test
development. We began by establishing the reliability of the measure. We focused only on test-retest
reliability because we were not sampling items from a domain of items (i.e., internal consistency),
and we were not using a measure for which scoring required judgment (i.e., interrater reliability).
Having established the reliability of the measure, we then set about establishing its validity. We
began by eliminating the influence of those factors that were not relevant but that might have exer-
cised influence over scores. When it then became obvious that the scores with which we were left
did not allow for appropriate inferences, we went searching for what was missing. In the end, we
had a measure with acceptably low levels of relevant forms of random error; although there was
some irrelevant variance in the measure, we deemed it to have acceptably low levels of contamina-
tion; finally, we deemed the measure to have acceptably high levels of sufficiency (i.e., the measure
covered the main sources of variance in the theoretical construct of interest).
Although we hope that most psychologists would not start off with such an absurd first attempt
at a psychological measure, our example details the main processes involved in evaluating a

psychological measure. This process involves the assessment of inferential appropriateness through
the establishment of reliability and validity. In the sections that follow, we consider the factors that
compromise each of these important components of test evaluation and the procedures and statis-
tics that are used to index them. We focus on explaining the various sources of measurement errors
that comprise reliability theory and explaining the concepts of sufficiency and contamination, two
important concepts of validity.

Reliability
Although reliability theory is one of the first topics covered in graduate measurement courses, it is
one of the most misunderstood topics. Most students learn about reliability in the context of classi-
cal test theory (CTT) and are deceived by the simple formula, X = T + E, where an observed score
is mysteriously parsed into a true score, T, and error, E. Students who delve a little deeper into reli-
ability theory realize that there is little “true” about the true score, and often what they think is error
is not. What is often lost with novice researchers is that the type of reliability coefficient calculated
dictates the source of error that is identified in a particular measure. In this chapter, we focus on
three common types of error that are often present in psychological measures: error associated with
different items, error associated with different raters, and error due to issues related to momentary
time-limited phenomena.
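In standard CTT notation, the decomposition and the definition of reliability that follows from it can be written as

X = T + E, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}

The reliability coefficients discussed below differ mainly in which sources of variance are counted in the error term: internal consistency treats item-specific variance as error, interrater coefficients treat rater-specific variance as error, and test-retest coefficients treat momentary, time-limited variance as error.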

Error Due to Items


When one of the authors (Zickar) took the GRE psychology subject exam, there was an item that
asked something like “What was the name of the first computerized learning system?” He got that
item correct, not because he knew a lot about psychology, but because he had been an undergradu-
ate student at the University of Illinois where nearly every freshman had to use the computerized
system PLATO to learn chemistry, mathematics, or economics. In a sense, Zickar got one extra item
correct because of the unique content of one item that was biased in his favor. Students from other
universities across the country and world were not so lucky.
Internal consistency measures of reliability, such as the popular coefficient alpha, are largely a
function of interitem covariances. As items relate more strongly with each other, holding all else
equal, internal consistency reliability increases. Tests that have a large percentage of items that are
dominated by unique variance will be more susceptible to error due to individual items. In addition,
all else equal, scales with few items are more susceptible to the unique influence of individual items.
For example, if the GRE psychology test had only three items and one was related to the PLATO
learning system, Zickar’s score would have been greatly inflated. As it was, the small increase that
he got by having “inside information” on that item probably made little difference on the overall test
score given the many items on the subject test.
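As a concrete illustration, the following is a minimal Python sketch of coefficient alpha computed from a respondents-by-items score matrix; the function name and the toy data are ours and purely illustrative.

import numpy as np

def coefficient_alpha(item_scores):
    # item_scores: respondents (rows) by items (columns)
    items = np.asarray(item_scores, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Five respondents answering a three-item scale (toy data).
responses = [[4, 5, 4], [2, 2, 3], [5, 5, 5], [3, 4, 3], [1, 2, 2]]
print(round(coefficient_alpha(responses), 2))

Because alpha is driven by the interitem covariances relative to the total-score variance, adding items that share true-score variance (without sharing irrelevant content) will raise it.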
Although it might be tempting to eliminate error due to the uniqueness of individual items by
administering a scale of items that ask the same question in slightly different ways, this approach runs
the risk of compromising measure sufficiency. Research has also shown that, although rewriting the
same item in slightly different ways may result in a high internal consistency index, the resulting nar-
rowness of the scale may result in reduced validity (see Roznowski & Hanisch, 1990). A better way to
minimize the error associated with unique item content is to increase the number of items while mak-
ing sure that individual items do not share construct-irrelevant components (i.e., are contaminated).

Error Due to Raters


The classic Japanese movie Rashomon (Jingo & Kurosawa, 1950) is a good way to understand the
nature of rater error. In that movie, several observers witness the same crime, but when they
retell what they observed, their retellings are vastly different. When observing behavior or coding
written behavior, observers interpret information differently. Some raters are more lenient, others
more stringent. Some interviewers might give preference to blondes whereas others may uncon-
sciously give high ratings to people who wear blue ties. Differences in rater behavior can sometimes
be reduced by providing training, although given the different ways individuals view the world,
these differences are unlikely to be completely eliminated.
There is a large amount of error related to raters when judging job-related variables. For exam-
ple, Woehr, Sheehan, and Bennett (2005) found that unique, idiosyncratic source-specific factors
were responsible for two-thirds of the variance in performance ratings. Employment interview
researchers have also demonstrated the interrater reliability of interviewees is typically fairly low
(see Conway, Jako, & Goodman, 1995).
There are many ways to reduce the amount of error related to raters. If at all possible, it is
important to standardize the nature of information that different raters observe. In addition, provid-
ing frame-of-reference training that attempts to provide common standards of comparison might
help improve interrater reliability. Computerized scoring algorithms are used by large-scale testing
companies to interpret and score written essays in the GRE and other certification tests, thereby
eliminating the possibility of rater unreliability. If you cannot reduce error by standardizing infor-
mation, the best way to reduce it is to increase the number of raters, thereby reducing the amount
of error through aggregation in the same way that increasing the number of items reduces internal
inconsistency. Taking an average of many raters will cancel out the positive and negative errors
associated with individual raters.
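The payoff from aggregation can be expressed with the familiar Spearman-Brown formula: if a single rater (or item) has reliability \rho, the reliability of the average of k parallel raters (or items) is

\rho_k = \frac{k\rho}{1 + (k - 1)\rho}

so that, for example, averaging the ratings of five raters whose individual reliability is .50 yields a composite reliability of about .83.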

Error Due to Momentary Time-Limited Factors


There are lots of reasons that scores on tests may vary from one testing administration to another.
Weird things can happen in testing administrations. For example, in an entrance testing session one
of our students witnessed another student vomiting (perhaps because of nervousness) in the vicin-
ity of other students. It is likely that the students who were near the projectile vomiter would score
lower on that particular administration compared with other times. Although that is a bizarre event,
there are many time-limited errors that can be due to test administrators, the testing environment,
or temporary issues related to the test-taker.
Test administrators can give too much time or not enough time. They can be unnecessarily
harsh and intimidating, thus increasing test anxiety, or they can be so welcoming and pleasant that
test-takers do much better than normal. Administrators can give erroneous instructions or they can
inadvertently give away correct answers for difficult items.
Related to the testing environment, the heating or air conditioning system can fail. A picture in
the testing room of the school principal might remind a test-taker of a mean uncle who used to taunt
him or her, thus prompting that student to do poorly. Or that student may be given a comfortable
chair that fits just right.
Test-takers can have unique things happen to them on one testing occasion that might not hap-
pen to them on another testing occasion. Test-takers can be hung over or sick with the flu. They
could have just been dumped by a fiancée. They may have had an especially good night’s sleep or
an especially restless one.
Regardless of the source of time-limited momentary effects, these are events that are unlikely to
happen if the test-taker were to take the test at a different time. Events that are predictable and are
expected to occur each and every time a respondent takes a test would not be considered error even
if they were distinct from the construct that the test is measuring. For example, test anxiety would
not be considered error in the context of test-retest reliability if the test-taker experienced the same
level of anxiety each time he/she took a math test, although test anxiety is clearly a different con-
struct than mathematics ability. Although it would be impossible to eliminate all sources of time-
limited error, it is possible to minimize the effects of error due to administration and environment
by having standardized instructions and environments for test-takers.

Measures of reliability sensitive to time-limited factors (e.g., test-retest reliability) rest on the
assumption that all score differences across two separate testing administrations are due to momen-
tary time-limited errors. Of course, differences in scores across two administrations can be due not
only to time-limited errors such as the ones mentioned but also to true change in the underlying
construct. For example, dramatic changes in vocabulary test scores given across 2 years’ time are
more likely due to true growth in vocabulary rather than momentary, time-limited errors.

Conclusions on Sources of Error


Error can contaminate our measures and compromise measurement. This can wreak havoc in work
contexts such as top-down selection, where small differences in test scores might have significant
consequences. Sources of error can be reduced by carefully attending to item content, increasing
the number of items, increasing the number of raters, providing training to standardize raters, and
standardizing test administrations. However, test developers are often unaware of the amount of
error that their tests are likely to generate because they have used a single operationalization of
reliability, generally internal consistency, which is sensitive to one source of error but ignores other
sources of error. One way to calculate the effects of multiple sources of error is through application
of generalizability theory, which is an analysis of variance (ANOVA)-based approach that can be
used to determine the magnitude of various sources of error simultaneously. Generalizability theory
approaches to error estimation are used less frequently than traditional approaches to reliability
because they require more extensive data collections (especially compared to internal consistency
analyses). For readers more interested in generalizability theory, we refer them to Chapter 2, this
volume, as well as Shavelson and Webb (1991).
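For readers who would like to see the mechanics, the following is a minimal Python sketch of a one-facet generalizability analysis for a fully crossed persons-by-items design with one observation per cell; the function names are ours, and real G studies would typically involve additional facets and specialized software.

import numpy as np

def variance_components_p_by_i(scores):
    # scores: persons (rows) by items (columns), fully crossed, one observation per cell
    X = np.asarray(scores, dtype=float)
    n_p, n_i = X.shape
    grand_mean = X.mean()
    ss_p = n_i * ((X.mean(axis=1) - grand_mean) ** 2).sum()
    ss_i = n_p * ((X.mean(axis=0) - grand_mean) ** 2).sum()
    ss_res = ((X - grand_mean) ** 2).sum() - ss_p - ss_i
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    var_p = max((ms_p - ms_res) / n_i, 0.0)   # universe-score (person) variance
    var_i = max((ms_i - ms_res) / n_p, 0.0)   # item-difficulty variance
    var_res = ms_res                          # person-by-item interaction plus error
    return var_p, var_i, var_res

def g_coefficient(var_p, var_res, n_items):
    # Relative generalizability coefficient for a test of n_items items.
    return var_p / (var_p + var_res / n_items)

scores = [[4, 5, 3], [2, 3, 2], [5, 5, 4], [3, 4, 3]]
vp, vi, vr = variance_components_p_by_i(scores)
print(round(g_coefficient(vp, vr, n_items=3), 2))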

Validity
Our review of validity focuses on sufficiency and contamination, two concepts that are deemed
critical for demonstrating evidence of validity. There are several approaches to validation, including
criterion-related validation, content validation, and construct validation, with each of the methods
having different techniques and methods for evaluating the amount of validity. We do believe that
all approaches to validation are related to each other and the concepts of sufficiency and contam-
ination—although most often used in discussion of content validation—are relevant to all forms
of validation [see Landy, 1986; the Society for Industrial and Organizational Psychology (SIOP)
Principles, SIOP, 2003]. However, as we describe below, the issue is complicated by confusion
over the role of sufficiency in unidimensional scales and by an implied quasi-isomorphism between
sufficiency and content validity. For these reasons, we focus most of our attention on the role of suf-
ficiency in content validation.

Sufficiency
In discussions of validity, it is often asked whether the test in question covers all the ground
that it should. For example, measures of job performance have been expanded to accommodate
dimensions that have been added to models of job performance (e.g., adaptive performance;
Pulakos, Arad, Donovan, & Plamondon, 2000). This change was made because of criticism that
existing measures of job performance focused too restrictively on dimensions related to produc-
ing while ignoring dimensions related to supporting the organization and its members. Similarly,
intelligence measures have been expanded to accommodate dimensions that have been added
to models of intelligence (e.g., practical intelligence; Sternberg, Wagner, Williams, & Horvath,
1995). These additions were made because of criticisms that existing measures of intelligence
focused too much on types of cognitive skills related to academic success compared with other
domains in life.

One might conclude from these criticisms that existing measures were insufficient. However, it
would be more appropriate to say that existing models of performance were insufficient, and that
the measures merely reflected the inferior models on which they were based. If we assume that a
measure is unidimensional, then insufficiency can only indicate factorial complexity at the model
level. It seems more parsimonious, then, to stipulate that sufficiency is a property of conceptual
models rather than measures. Once a model has been deemed to cover the full breadth of its domain
(e.g., a performance model that is composed of technical performance, contextual/citizenship per-
formance, adaptive performance, interpersonal performance, etc.), then unidimensional scales mea-
suring each factor can be developed. Reliability then reflects the proportion of true score variance,
and validity represents lack of contamination.
This position may seem at odds with the American Psychological Association (APA), National
Council for Measurement in Education (NCME), and American Educational Research Association
(AERA) Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999). In
the section on content-related evidence, it is stated that:

Construct underrepresentation … may give an unfair advantage or disadvantage to one or more


subgroups. Careful review of the construct and test content domain by a diverse panel of experts may
point to potential sources of irrelevant difficulty (or easiness) that require further investigation. (p. 12)

There are several observations to be made about this passage. The first is that sufficiency is inex-
tricably intertwined with content-related validity evidence. Evidence of insufficiency comes from
a comparison of test content to the “content domain.” Omissions suggest insufficiency. Second, the
solution that is offered in the passage has to do with contamination rather than sufficiency. This may
have been incidental, but it may also have been due to an inability to refer to insufficiency without
also referring to deficiencies in the definitions of the construct of interest and of the domain of items
that apply to it. Third, this passage is representative of the Standards (AERA, APA, & NCME,
1999) as a whole in that nowhere in the Standards are issues of sufficiency raised without reference
to content.
Although the term “sufficiency” does not appear in the index of the Standards (AERA, APA, &
NCME, 1999) or in any of the relevant standards themselves (e.g., 1.6, 1.7, 3.2, 3.6, 14.8, 14.11),
issues related to sufficiency appear in every section that deals with content-related evidence. Issues
relating to contamination, on the other hand, appear in every section that deals with evidentiary
bases of inferences.
Our position is that content-related evidence has the potential to expose insufficiency only if the
construct is poorly specified. If the construct is well specified, then insufficiency is not possible in
the absence of egregious oversight. Therefore, we recommend that to ensure sufficiency, researchers
devote additional effort to better explaining the conceptual foundations of their measure or model.
From our experience, many scale development efforts jump straight into item writing with little
attention paid to careful explication of the construct that those items are supposedly measuring.
Engaging in more “upfront” thinking about the target construct will help ensure sufficiency. Finally,
it should be noted that in some cases, organizations may purposely use measures that are insuffi-
cient for purposes of expediency or cost. In these cases, organizations should be careful to address
whether their insufficiency results in differences in selection rates for minority groups compared to
measures without insufficiency.

Contamination
As noted in the introduction, measurement contamination implies that a particular measure is influ-
enced by unwanted sources of variance, different from the construct of interest. Confirmatory factor
analytic frameworks are helpful in understanding the complex multidimensional nature of contami-
nation by isolating different sources of variance. As will be noted throughout this section, concern
for contamination is motivated not only by the psychometric goal of creating a “pure” measure, but
also by a desire to minimize sources of irrelevant variance that covary with membership in
demographic classes that are accorded special protection under employment law. Therefore, all
industrial-organizational (I-O) psychologists should be concerned with the contamination of their
instruments. Given the complexity of the analyses that can be used to quantify contamination, we
devote more space to this topic than to reliability and sufficiency.

Confirmatory Factor Analytic Approaches to Contamination


Contamination implies that a particular measure is influenced by sources of variance other than
the construct of interest. Although sources of irrelevant variance could arise from method effects,
response styles, or irrelevant constructs, the largest concern within a selection context is contami-
nation of scores by irrelevant variance due to respondents’ membership in legally protected classes
and/or disadvantaged groups. U.S. employment law prohibits making employment decisions on
the basis of group membership defined by race, color, religion, gender, nationality (Civil Rights
Act of 1964), age (Age Discrimination in Employment Act of 1967), and disability (Americans with
Disabilities Act of 1990), whether or not this is the intent of the employer. In this sense, the use
of test scores that vary due to race or another characteristic defining a protected class can cre-
ate adverse impact, increasing an employer’s chances of involvement in litigation (Williamson,
Campion, Malos, Roehling, & Campion, 1997). Some governments in other areas of the world
also have legislative constraints that have had a similar impact on selection practice (e.g. South
Africa, Canada; Myors et al., 2008), although these countries are a minority in that regard. Aside
from legal concerns, ignoring measurement differences between subpopulations can negatively
impact decisions based on organizational research (Drasgow, 1984; 1987) and diversification efforts
(Offerman & Gowing, 1993), and cause negative applicant reactions to the assessment (Gilliland &
Steiner, 1999). In an international context, interest in establishing validity evidence for selection
procedures in some countries has been motivated purely by public relations and/or social ethics,
not legislation (e.g., Belgium, Italy, The Netherlands, Spain, etc.; see Myors et al., 2008). Thus, it
is imperative for researchers in organizations to examine whether the adequacy of an assessment
method is similar across groups that may be legally, practically, socially, or theoretically important.
In addition to legal and practical concerns, the consideration of potential differences across sub-
populations has scientific value. For example, hypotheses about cultural differences can be tested by
examining the ways in which people from different cultures respond differently to certain items.
In the rest of the chapter, we will focus mostly on contamination resulting in subgroup differ-
ences for protected classes. Other forms of contamination are also worthy of concern but often
can be tolerated, at least in a personnel context, if they do not result in subgroup differences. For
example, if a particular measure of a personality trait is contaminated because of the influence of
a response style bias, this may result in lower validity for the personality measure, which would
itself be a negative consequence. However, reduction in validity is not likely to prompt litigation. It
should be noted that if there are differences in response bias across cultures (as evidence suggests),
this should prompt additional concern. It should also be noted that group differences on a particular
measure are acceptable if those group differences covary with variance that is job-related. In any
case, most of the lessons to be learned from subgroup-related contamination also apply to other
forms of contamination.
In the psychometric literature, the consideration of assessment adequacy across subpopulations is
often referred to as measurement invariance or measurement equivalence and is termed throughout
this chapter as measurement invariance/equivalence (MI/E). The term invariance refers to a desired
lack of systematic measurement variance as a function of group membership. For instance, the ques-
tion could be posed, “Are Asian Americans’ test scores systematically different than Caucasians’
regardless of their ability?” One methodology that permits us to make assertions about measure-
ment differences across groups is confirmatory factor analysis (CFA), or measurement modeling in
the linear structural relations (LISREL; Joreskog & Sorbom, 1996) model. CFA techniques provide
an appropriate way of assessing how the structure of constructs differs across groups. It should
be noted that item response theory (IRT) is another way to determine measurement equivalence,
although we have chosen to exclude discussion of IRT for parsimony’s sake. The interested reader
should consult Raju and Ellis (2002).

LISREL and CFA


The LISREL (Joreskog & Sorbom, 1996) model is a system through which hypotheses concerning
the interrelationships among observed and latent variables can be formulated and tested. Under the
LISREL model, the equation

x = Λξ + δ (19.1)

defines the vector of observed variables. This can be expressed as a matrix formula in which the
values of x are determined by the strength of the relationships between the variables and the traits
(i.e., factor loadings), Λ; the values of the traits themselves, ξ; and a vector containing the residuals,
or uniquenesses, δ. To elaborate, consider a person’s responses to a test with four items or observa-
tions and two proposed dimensions or factors, which are clarified in Equation 19.2.

\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} =
\begin{bmatrix} \lambda_{11} & 0 \\ \lambda_{21} & 0 \\ 0 & \lambda_{12} \\ 0 & \lambda_{22} \end{bmatrix}
\begin{bmatrix} \xi_1 \\ \xi_2 \end{bmatrix} +
\begin{bmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \delta_4 \end{bmatrix}
\qquad (19.2)

The researcher has specified that the first two items belong to the first of the j factors and the remain-
ing two to the second dimension by setting the factor loadings, λij , to 0 for the factors’ excluded
items, and leaving the others free to vary. Thus, the structure of the response-trait relationship is
specified a priori. The CFA model represented in Equation 19.2 is depicted in Figure 19.1.
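To make the matrix form concrete, here is a minimal numerical sketch of Equations 19.1 and 19.2 in Python/NumPy. The loading values, trait scores, and uniquenesses are hypothetical and chosen only for illustration; the zero entries in the loading matrix encode the a priori structure described above.

```python
import numpy as np

# Hypothetical loading matrix for Equation 19.2: items 1-2 load only on factor 1,
# items 3-4 load only on factor 2 (the zeros are fixed a priori; the lambdas are free).
Lambda = np.array([[0.8, 0.0],    # lambda_11
                   [0.7, 0.0],    # lambda_21
                   [0.0, 0.6],    # lambda_12
                   [0.0, 0.9]])   # lambda_22

xi = np.array([1.2, -0.4])                   # hypothetical latent trait scores
delta = np.array([0.10, -0.20, 0.05, 0.30])  # hypothetical uniquenesses (residuals)

# Equation 19.1: x = Lambda * xi + delta
x = Lambda @ xi + delta
print(x)  # the four observed item scores implied by the model for this person
```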
As can be seen in the figure, we are working not only with observed measures, but also with unob-
servables. Thus, we cannot work with Equation 19.1 alone. Instead, we must turn to the covariance
matrix of the observed variables, xi. To utilize the covariance matrix, it must be assumed that the
factors, ξ, and the uniquenesses, δ, are random variables. Second, the residuals, or uniquenesses,
should be uncorrelated with the factors. Finally, observations must be measured in deviations

21

1 2

1 2 3 4

x1 x2 x3 x4

1 2 3 4

1 2 3 4

FIGURE 19.1  A two-factor model with two items per factor.


from their mean. If these assumptions hold, we can conclude that the covariance matrix takes the
form

Σ = ΛxΦΛx′ + Θ (19.3)

Equation 19.3 states that the covariance matrix of the observed variables is equal to the product of
the factor loading matrix; its transpose; and the variance-covariance matrix, Φ, of the latent traits,
ξ, plus the covariance matrix, Θ, of the uniqueness terms, δ. By specifying the expected form of
the variables as shown with Equation 19.2, its covariance matrix (Equation 19.3) can be used to
determine if the data are likely given the hypothesized model. The resulting χ2 statistic reflects the
deviance of the predictions made by the hypothesized model from the observed data. This statistic
is used to evaluate the goodness-of-fit of the CFA model.
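As a rough numerical illustration of Equation 19.3 and the fit logic, the sketch below computes the model-implied covariance matrix from hypothetical parameter matrices and then evaluates the maximum-likelihood discrepancy between that implied matrix and a sample covariance matrix S; the χ2 statistic is approximately (N − 1) times this discrepancy when the model is estimated by maximum likelihood. All matrices and the sample size are invented for illustration.

```python
import numpy as np

Lambda = np.array([[0.8, 0.0],
                   [0.7, 0.0],
                   [0.0, 0.6],
                   [0.0, 0.9]])              # hypothetical factor loadings
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])                 # hypothetical factor variance-covariance matrix
Theta = np.diag([0.36, 0.51, 0.64, 0.19])    # hypothetical uniqueness variances

# Equation 19.3: model-implied covariance matrix of the observed variables
Sigma = Lambda @ Phi @ Lambda.T + Theta

def f_ml(S, Sigma):
    """Maximum-likelihood discrepancy between a sample covariance S and implied Sigma."""
    p = S.shape[0]
    return (np.log(np.linalg.det(Sigma)) + np.trace(S @ np.linalg.inv(Sigma))
            - np.log(np.linalg.det(S)) - p)

S = Sigma + 0.02 * np.eye(4)   # hypothetical sample covariance, close to the implied one
N = 500                        # hypothetical sample size
print("approximate chi-square:", (N - 1) * f_ml(S, Sigma))
```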

Overview of CFA MI/E


The characteristics of LISREL’s CFA submodel allow for what is called a nested modeling approach.
The term nested modeling is used because of the fact that within the relatively complex CFA mea-
surement model (Equation 19.1; Joreskog & Sorbom, 1996) there are simpler models “nested”
within. For example, if we set one of the terms in Equation 19.1 to be zero, the resulting model is
said to be nested within Equation 19.1. These nested models are derived by inducing constraints
on the parameters in the covariance matrix such that the more complex model contains all of the
parameters of the nested model plus one or more additional parameters. In conducting MI/E studies
comparing two groups, we have two models and, thus, two covariance matrices, Σg and Σg′.
If we test the equality of Σg and Σg′, we are essentially asking, “Are the covariance matrices the
same, in general, across groups?” To do this, we simply fit the model to the aggregated data (i.e.,
including both groups). If the test is nonsignificant, we have shown overall MI/E.
However, if we want to see if factor loadings (or any other parameter matrices) are equal across
groups, we would constrain the factor loading parameter, Λ, to be equal across groups, arriving at
one of the models nested within Equation 19.1 for each group. It should be noted that the omnibus
test for equality of covariance matrices is often statistically significant, so caution should be used in
interpreting statistical significance. At this point, fit for the constrained model is compared against the
less constrained model; if fit does not deteriorate significantly, any remaining lack of fit in the constrained
model cannot be attributed to different factor loadings between groups.
As noted by Wang and Russell (2005), the psychometric literature shows a high degree of consensus
among researchers that MI/E testing should be conducted in a hierarchical, stepwise fashion (Byrne,
1989; Raju, Laffitte, & Byrne, 2002; Rensvold & Cheung, 1998; Vandenberg & Lance, 2000). It
should be noted here that MI/E can be tested by the omnibus hypothesis: Σg = Σg′ where g denotes
a reference group, and g ′ denotes a focal group, usually the protected class. Confirmation of this
test would indicate that further analyses were not necessary because the covariance matrix and the
elements that it is linearly dependent on are equivalent across groups, meaning that scores are not
contaminated by group membership. Using this test as a starting point is the most efficient use of
the MI/E framework from a hypothesis-testing strategy. Despite the frequency with which MI/E
researchers suggest the use of the omnibus test, surprisingly few MI/E studies appear to apply it
(Vandenberg & Lance, 2000).
The omnibus χ2 test for the omnibus model, which is simply the fit of the model in the
entire sample (i.e., protected and nonprotected groups combined), will, more often than not, be
significant because it is known to have a high Type I error rate. However, it should still be conducted
to clearly justify carrying on with subsequent tests, using transformations of χ2 that control Type I
error, such as the root mean square error of approximation (RMSEA) or the comparative fit index
(CFI). This is similar to post hoc testing of factor levels in ANOVA, in that a main effect should be
established first. Differences in fit between models are evaluated via statistics such as Δχ2 or by placing confidence intervals
(CIs)1 around the RMSEAs for each model and comparing each succeeding model’s interval to the
prior model’s interval. If the two do not overlap, there is a significant difference in fit, and thus a
violation of measurement invariance (see Cheung & Rensvold, 2002; Kline, 2005, for an excellent
introduction to issues involving fit).
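The decision logic just described can be sketched in a few lines of Python using SciPy. The helper below is our own illustration (not code from the chapter): it reports the Δχ2 test for a pair of nested models and then checks whether two RMSEA 90% CIs overlap. The numbers are the Model 1 versus Model 2 comparison reported later in Table 19.1, with the CIs taken directly from that table rather than derived here.

```python
from scipy.stats import chi2

def delta_chi_square(chi2_constrained, df_constrained, chi2_free, df_free):
    """Chi-square difference test for two nested models (constrained vs. less constrained)."""
    d_chi2 = chi2_constrained - chi2_free
    d_df = df_constrained - df_free
    return d_chi2, d_df, chi2.sf(d_chi2, d_df)

def rmsea_cis_overlap(ci_a, ci_b):
    """True if the two RMSEA confidence intervals overlap (no meaningful difference in fit)."""
    return max(ci_a[0], ci_b[0]) <= min(ci_a[1], ci_b[1])

# Model 1 (configural) vs. Model 2 (metric) values from Table 19.1
print(delta_chi_square(22286, 5015, 20925, 4948))   # (1361, 67, p essentially 0)

# With samples this large the delta chi-square is "significant," but the RMSEA intervals
# reported in Table 19.1 overlap, so the difference in fit is not treated as meaningful.
print(rmsea_cis_overlap((.042, .051), (.043, .052)))  # True
```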
In the following section, we focus on the steps taken in the case in which the covariance matrices
fail to exhibit MI/E. The test of the above hypothesis is carried out in the same fashion as the tests discussed
here. If the test is rejected, we are concerned with uncovering the source of the problem through
sequential model comparisons, which are the focus of this chapter. The four forms of MI/E presented
here are (a) configural invariance, (b) metric invariance, (c) factor variance-covariance (FVC)
invariance, and (d) uniqueness invariance. We now present an examination of each form of MI/E,
how it is tested, and its implications for evaluating test scores across groups. Additionally, the
hypotheses are presented in the order in which they are conducted. These hypotheses provide infor-
mation relevant for evaluating contamination. They are presented utilizing an adaptation of Wang
and Russell’s (2005) examination of measurement equivalence between Chinese and American
respondents on the Job Descriptive Index (JDI).

Sources of Invariance: A Handful of Hypotheses


Configural Invariance
Configural invariance has been dubbed the weakest form of MI/E (Horn & McArdle, 1992;
Vandenberg & Lance, 2000). In other words, the establishment of the configural invariance of
a measure between two groups provides only limited justification for equating scores across the
groups. Evidence of configural invariance is achieved through examining the extent to which the
prespecified measurement model simultaneously fits in the two groups. Consider the model in
Figure 19.2. Configural invariance exists if this two-factor model holds for both groups. Configural
invariance is a weak form of invariance because it makes no reference to between-group differences
in factor loadings, unique variances, correlations among factors, etc. In Figure 19.2, each estimated
value has a unique subscript, indicating that each is estimated separately. There is no indication that
any of these values must be equal between groups. As long as items 1 and 2 load on one factor and
items 3 and 4 load on the other for both groups, configural invariance exists.
Although configural invariance is the weakest form of measurement equivalence, it is still impor-
tant to examine. First, the finding of configural invariance has been identified as a prerequisite to
continuing in the sequence of steps undertaken in specifying the source of MI/E (Meredith, 1993).
Second, researchers have called the test of configural invariance a baseline model (Bagozzi  &
Edwards, 1998; Reise, Widaman, & Pugh, 1993). The test of configural invariance is not only a test
of the hypothesis concerning the invariance of factor patterns across subgroups; it also serves as a
first step in examining the sources of MI/E and has thus been designated by researchers as a base-
line model. This model should be carefully constructed, using theory and past research to guide
its specification. Where theory or past research is not available, it is possible to conduct a sepa-
rate exploratory factor analysis2 study to build a theoretical model for confirmatory analyses; this
has been shown empirically and theoretically (Fabrigar, Wegener, MacCallum, & Strahan, 1999;
Gerbing & Hamilton, 1996; Hurley et al., 1997; Haig, 2005). The mis-specification of the baseline
model may lead to erroneous conclusions about the use of scores and may render all subsequent tests
of MI/E meaningless.

1 Constructing a 90% CI around models and comparing them for overlap is conceptually equivalent to a significance
test for ΔRMSEA at the .10 level; if the intervals overlap, the test is nonsignificant, whereas if they do not, the test is
significant.
2 It should be noted that principal components analysis is not appropriate here because we are dealing with latent traits, and principal components analysis does not discriminate between common and unique variance (see Fabrigar et al., 1999).
211

1 2

1 2 3 4

x1 x2 x3 x4

1 2 3 4

1 2 3 4

212

1 2

5 6 7 8

x1 x2 x3 x4

5 6 7 8

1 2 3 4

FIGURE 19.2  Two-group configural invariance model.

By confirming the hypothesis associated with configural invariance, we have only begun to
demonstrate MI/E, and only to a small degree. However, this is an important step in that (a) it is the
prerequisite to more accurately determining the “problem,” or source of MI/E; and (b) it allows us
to confirm that the same number and pattern of factors are found in different groups, suggesting that
the latent structure is not systematically impacted by group membership. By examining Table 19.1,
which was adapted from Wang and Russell’s (2005) examination of the JDI across American and
Chinese samples, we see that the RMSEA value is below .08, which suggests that configural invari-
ance holds for these groups. In the context of the JDI, this means that there is equivalent fit for the
data of both groups (Chinese and American) combined to a five-factor model including the trait fac-
tors pay, work, coworker, supervision, and promotion. By finding adequate fit in the baseline model,
we have only shown evidence that, across groups, the number and pattern of factors in the proposed
measurement model are adequate in describing the data. However, had the hypothesis been rejected,
stopping here would have left other important questions unasked. Thus, we continue the sequence
of hypothesis testing. Not surprisingly, we will see the other forms of invariance supported, as was
suggested by the omnibus test.
TABLE 19.1
Simultaneous CFA Across American (n = 1,664) and Chinese (n = 1,319) Subpopulations

Model Specified Across Samples     Type of Invariance   df      χ2       Comparison            Δdf   Δχ2     RMSEA (90% CI)
Model 1: free Λx, Φ, Θδ            Configural           4,948   20,925   None                  —     —       .047 (.042–.051)
Model 2: equal Λx; free Φ, Θδ      Metric               5,015   22,286   Model 1 vs. Model 2   67    1,361   .048 (.043–.052)
Model 3: equal Λx, Φ; free Θδ      FVC                  5,030   22,917   Model 2 vs. Model 3   15    630     .049 (.044–.053)
Model 4: equal Λx, Φ, Θδ           Uniqueness           5,102   25,078   Model 3 vs. Model 4   72    2,160   .051 (.046–.055)

Source: Adapted from Wang, M., & Russell, S. S., Educational and Psychological Measurement, 65, 709–732, 2005.
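The point RMSEA values in Table 19.1 can be approximately recovered from the reported χ2 and df values, which also makes the sequential, nested comparisons easy to script. In the sketch below, the √G multiplier for a G-group model is one common multiple-group convention (e.g., as used in LISREL output) rather than something stated in the chapter, so treat it as an assumption; the Δχ2 p-values illustrate why the authors lean on RMSEA CIs instead.

```python
import math
from scipy.stats import chi2

N, G = 1664 + 1319, 2            # combined sample size and number of groups (Table 19.1)
models = [                       # (label, chi-square, df) as reported in Table 19.1
    ("Model 1: configural", 20925, 4948),
    ("Model 2: metric",     22286, 5015),
    ("Model 3: FVC",        22917, 5030),
    ("Model 4: uniqueness", 25078, 5102),
]

def rmsea(chi_sq, df, n, g=1):
    # Point RMSEA; the sqrt(g) factor is one multiple-group convention (assumption).
    return math.sqrt(g) * math.sqrt(max((chi_sq - df) / (df * (n - 1)), 0.0))

previous = None
for label, chi_sq, df in models:
    line = f"{label}: RMSEA = {rmsea(chi_sq, df, N, G):.3f}"
    if previous is not None:
        d_chi2, d_df = chi_sq - previous[0], df - previous[1]
        line += f"; delta chi2({d_df}) = {d_chi2}, p = {chi2.sf(d_chi2, d_df):.3g}"
    print(line)
    previous = (chi_sq, df)
```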

Metric Invariance
Whereas configural invariance is a weak test of MI/E, the test of metric invariance is considered
a “strong” test of MI/E (Vandenberg & Lance, 2000, p. 12). Configural invariance does not allow
us to draw the conclusion that the relationship between latent variables and observed item scores is
similar across groups. Where the configural test asks if the hypothesized latent structure is similar
across groups, the metric invariance test examines whether the relationships between the observa-
tions (or item responses), xi, and the latent traits (i.e., the factor loadings, Λ) are equally strong
across groups, suggesting freedom from contamination by group membership. Stated formally, it is
the hypothesis that there is no significant difference between the factor loadings of the groups. By
placing the constraint of equality, Λg = Λg′, on the baseline model, we can test this hypothesis.
Consider the model in Figure 19.3. It is identical to that of Figure 19.2 except that loadings for
Group 2 have the same subscripts as the loadings for Group 1, indicating that the loading for each
item on its latent variable is constrained to be equal between groups.
Such a constraint also supplies us with Model 2 from Table 19.1. We may now compare Model
2 to the configural or baseline model (Model 1). As can be seen in Table 19.1, the 90% CIs for
RMSEA in the two models overlap. Therefore, we can say that evidence for metric invariance has
been observed, or that the relationship between the trait or factor and the observed response is
equivalent across Chinese and American respondents for the JDI.
A finding of metric invariance is deemed a strong test because it allows us to make much stronger
assertions about the comparability of items across groups or forms. If metric invariance holds for
the measure, it means that “People in different [groups] respond to items the same way” and may
therefore be compared across groups (Steenkamp & Baumgartner, 1998, p. 80). This is because
we have evidence that there is a similarly strong relationship between our observations and the
unobservable traits or abilities we are measuring. Further, this means that, for both groups, given a
certain increase in the latent trait, a particular magnitude of change is observed in scores. Without
showing evidence of metric invariance, comparing the two groups’ test scores would be analogous
to comparing temperatures between Arizona and Alaska when the researcher in Arizona has taken
measurements in Celsius and the Alaskan in Fahrenheit.
In a selection context, metric invariance is an important consideration. If the relationships between
the trait(s) and the observed responses are contaminated, resulting in lower scores for members of
one group in an applicant pool, an organization may unknowingly make employment decisions on
the basis of subgroups defined by race, sex, or national origin. Members of some of these groups are
protected by laws in several countries (see Myors et al., 2008 for a review). Additionally, Meredith
(1993) has shown that in a selection procedure (defining “fairness” as the case where two individu-
als with the same “true” score have the same probability of being selected from the applicant pool;
this is a scientific definition, not a legal one) metric invariance is necessary for fairness to exist.
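To show what the Λg = Λg′ constraint means structurally, here is a small NumPy sketch (hypothetical numbers, continuing the four-item, two-factor example used earlier): under metric invariance the two groups share one loading matrix, while their factor variance-covariance and uniqueness matrices are still free to differ.

```python
import numpy as np

Lambda_shared = np.array([[0.8, 0.0],   # one loading matrix for both groups:
                          [0.7, 0.0],   # this is the metric-invariance constraint
                          [0.0, 0.6],
                          [0.0, 0.9]])

groups = {
    "reference group g": {"Phi": np.array([[1.0, 0.30], [0.30, 1.0]]),
                          "Theta": np.diag([0.36, 0.51, 0.64, 0.19])},
    "focal group g'":    {"Phi": np.array([[1.2, 0.25], [0.25, 0.9]]),
                          "Theta": np.diag([0.40, 0.55, 0.60, 0.25])},
}

# Each group's implied covariance matrix uses the shared loadings but its own
# (hypothetical) Phi and Theta, mirroring "equal Lambda_x; free Phi, Theta_delta."
for name, par in groups.items():
    Sigma_g = Lambda_shared @ par["Phi"] @ Lambda_shared.T + par["Theta"]
    print(name)
    print(np.round(Sigma_g, 3))
```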

Factor Variance-Covariance Invariance


As stated earlier in this chapter, the LISREL model evaluates not only the relationships between
observed and unobserved variables (as is the case when considering metric invariance) but also the
interrelationships among unobserved variables. One way to do this is to consider the hypothesis that
relationships between the factors proposed in the specified model are equivalent across groups. For
instance, we may be concerned with a set of tests measuring the factors of verbal and mathematical
ability. Factor variance-covariance (FVC) invariance asks the question “Is the correlation between
verbal and math ability contaminated by group membership?”
The test of FVC invariance is an amalgam of two hypotheses: (a) That the factor variances and
(b) factor covariances are equivalent across subpopulations (Vandenberg & Lance, 2000). To test
this hypothesis, we induce the constraint that the FVC matrix be equal across groups, or Φg = Φg′.
For Figure 19.3 to represent a test of FVC invariance, the two groups' factor covariance terms, Φ, would
be constrained to equal one another. In addition, the factor variances would be constrained to
equality.
If the hypothesis of FVC invariance is rejected, then the relationships among tests, and perhaps
the conceptual domain itself, differ across subpopulations. Additionally, this test can allow us to
know, partially, if a hypothesized nomological network holds across two subpopulations. Rejecting
these tests separately will indicate that (a) respondents in the two groups do not use the same scale
range to respond to items and (b) the pattern and strength of factor covariances are not constant across
groups, respectively. Together, they have been taken to show that factor intercorrelations differ
across subpopulations (Steenkamp & Baumgartner, 1998). However, Vandenberg and Lance (2000)
have noted that researchers have not been specific concerning the meaning of this test despite the
frequency with which it has been tested. Examining Table 19.1, we see that the hypothesis of equiva-
lent interrelationships holds across groups, given that the RMSEA intervals exhibit overlap. This
finding would suggest the intercorrelations between latent attitudes on the JDI (e.g., the correlation
between pay and work) are not substantially different between the American and Chinese samples.

Uniqueness Invariance
Measurement error lies at the heart of every test development effort. In considering the use of
a test across subpopulations, it is important that the scores resulting from observations or responses
have equivalent residuals in both groups. If an assessment shows smaller residuals for one par-
ticular group, then scores in that group have increased measurement precision compared with
the other group. This would imply that our estimates of uniqueness are contaminated by group
membership.
The matrices we constrained to create Model 4 contain what are known as uniquenesses; each
uniqueness is the sum of (a) random error and (b) systematic error not accounted for by the latent factor
(Mulaik, 1972). We can determine the invariance of the uniquenesses of indicators by inducing the
constraint that Θδg = Θδg′; the unique variances are now constrained to be equal. Regarding Table 19.1,
such a constraint creates Model 4, which we in turn compare to Model 3. Examining Table 19.1,
we see that the hypothesis of equal residuals was not rejected. We can take this to mean that there
is no more measurement error for one group than the other. Several authors (e.g., Steenkamp
& Baumgartner, 1998; Wang & Russell, 2005) have used this interpretation. It should be noted
that attention has been drawn to the importance of establishing equal factor variances before the
interpretation of this test as one of equivalent uniquenesses, and that corrections must be made if
212

1 2

1 2 3 4

x1 x2 x3 x4

5 6 7 8

1 2 3 4

211

1 2

1 2 3 4

x1 x2 x3 x4

1 2 3 4

1 2 3 4

FIGURE 19.3  Two-group metric invariance model.

they are not (see Vandenberg & Lance, 2000, p. 34). Additionally, it is important to understand
that this test is not only indicative of differences in random error across groups, but also irrelevant
systematic error, which would be a source of contamination in scores. Thus, in the present case
(i.e., Table 19.1), we would conclude that similar levels of measurement precision (or inversely,
error) are obtained when fitting the five-factor JDI model to the American as compared with the
Chinese sample.
In evaluating employee selection measures, we would like to know that both groups’ scores are
equally precise. The amount of residual variance is indicative of (although not analogous to) the
lower bound of its validity (Crocker & Algina, 1986). Thus, tests (e.g., those in a selection battery)
whose scores are contaminated by group membership could have problems with validity in the form
of differential prediction or relationships with external variables.

CFA MI/E in Selection Research


Selection researchers have utilized CFA measurement equivalence studies to address various
questions and concerns. Here we review studies that discuss interesting applications of MI/E in
the selection context, organized into two broad categories: those focused on (a) predictors and
those focused on (b) criteria. However, it should be noted that this is not meant to be a comprehensive
review of such studies.

MI/E Studies Involving Predictors


Some researchers have addressed the role of race in selection testing through MI/E studies on
predictor types. Schmitt and Mills (2001) questioned the extent to which scores obtained via “tradi-
tional” (i.e., paper-and-pencil) selection tests could be compared to those extracted by means of a job
simulation, particularly a telephone-based exercise measuring similar constructs across Caucasian
and African-American applicants. The authors utilized the hypotheses of configural, metric, FVC
(utilizing separate tests for variances and covariances), and uniqueness invariance, among others,
to address this issue. The configural model represented two factors: a “traditional test” factor and
a “simulation test” factor. The configural, or baseline, model proved to be a good fit to the data.
When the constraint of equal factor loadings (metric invariance) was added, fit decreased trivially,
suggesting that observed scores were related to their respective factors with similar strength for
Caucasians and African Americans. Additionally, tests of uniqueness invariance showed little dif-
ference in residuals across groups. However, constraints creating the comparison of factor variances
(a partial test of FVC invariance) proved to decrease model-data fit significantly because of larger
variances in the African-American group, which are attributed to a greater number of low scores in
the African-American sample in traditional paper-and-pencil tests. Further, constraints creating a
test of factor covariances (completing the test of FVC) produced a significant drop in fit, suggesting
that the relationships between traditional and simulation tests were not as strong in the Caucasian
group as they were in the African-American group.
Robie, Schmit, Ryan, and Zickar (2000) examined the extent to which items that were work-
specific resulted in measurement invariance. Specifically, they hypothesized that contextualizing
personality items would narrow the range of the scale used by respondents. Using the configural,
metric, and uniqueness MI/E hypotheses outlined by Bollen (1989), they found, as expected, that
contextualizing items led to lower internal consistency reliability, thus exemplifying how MI/E can
be used to understand how cognitive framing may alter the measurement properties of psychologi-
cal measures. Their research suggested that framing items within a work context reduced contami-
nation in the personality measures.
Some studies have examined measurement equivalence across modes of administration.
One study, in particular, showed through a series of MI/E model tests that many personality
traits (e.g., conscientiousness) exhibited measurement equivalence between paper-and-pencil and
Internet-based tests, but that researchers must be wary of assuming that the psychometric properties
of a test hold when administering its e-version. This study is important in that it invoked and illustrated
a pairing of MI/E and experimental methodologies to uncover the source of MI/E (Meade,
Michels,  & Lautenschlager, 2007), which could prove to be a very helpful methodology for
selection researchers wishing to examine how different methods contaminate the measurement
of constructs of interest.

MI/E Studies Involving Criteria


Well-developed criteria (e.g., managerial performance ratings) are imperative to the establishment
of a meaningful basis by which employees are moved up or down in an organization. They are
also important for validating the basis for hiring. Thus, for legal and ethical reasons, it is advanta-
geous for organizational researchers to investigate MI/E across protected subgroups or groups of
interest with regard to criteria. Selection researchers must ensure that their criteria are uncontami-
nated by issues related to format, rater, or method.
Much interest in MI/E studies in the selection literature is focused on the equivalence of rating
sources. Facteau and Craig (2001) showed an interesting approach to this question by estimating a
baseline model across self, peer, supervisor, and subordinate performance ratings and comparing it
to a metric-invariant model. Evidence for metric invariance was found across all sources of ratings,
suggesting that these sources respond to items in the same fashion. This finding has repercussions
for how we view multisource rating data.
Diefendorff, Silverman, and Greguras (2005) extended this work by examining self, peer, and super-
visor ratings using the sequence of hypothesis testing suggested by Vandenberg and Lance (2000). Their
data showed a high degree of MI/E using various measures and these three rating sources. In the
same issue of the Journal of Business and Psychology, Greguras (2005) showed how MI/E analyses of
performance ratings can be used to ask theoretical questions about ratings. His analysis showed that
the rating process of managers does not appear to be disturbed by job experience. Thus, although
MI/E analyses are more common on the predictor side, they can also be useful in efforts designed
to evaluate the criterion side.

Conclusions
In this chapter, we have discussed various measurement criteria and ways to evaluate measures
all with the goal of determining whether an instrument has minimized various sources of error,
minimized contamination, and maximized sufficiency. Instead of rehashing the main points made
throughout this chapter, we thought it best to end with three summary points.

1. Statistics should inform scale development but not dominate the process. These days,
many scale evaluation efforts could be conducted by a computer program that was pro-
grammed to select items merely on the basis of maximizing some statistical criterion, usu-
ally internal consistency or goodness-of-fit. Such analyses are easy to do given the ease of
programs such as SPSS that report the alpha-if-item-deleted statistic (see the sketch following this list). People often have to
do strange things to get goodness-of-fit statistics to reach acceptable levels, such as letting
error variances covary. Statistics should certainly inform the process, because it is a good
thing, in general, to have models that fit and measures that reflect high internal consistency.
But relying on these statistics alone might result in narrow measures that fail to correlate
with any external variable.
2. Error is much more than internal consistency. Researchers are obsessed with internal con-
sistency measures for evaluating reliability. This fascination is undoubtedly due not to the
appropriateness of internal consistency but to the ease with which it can be computed and
the frequency with which it is accepted as the sole indicator of scale quality. Relying solely
on internal consistency may provide researchers with a false sense of comfort, given that
these indices ignore other sources of error that might be important. Other assessments of
reliability require multiple data collections or multiple raters, whereas internal consistency
merely requires multiple items.
3. Scale evaluation is hard work. Most researchers involved in scale development and evalu-
ation efforts are interested primarily in determining whether measure X is a good measure
of construct X for the purposes of conducting their substantive research. Scale evaluation
work is ancillary to their main interests and may warrant only a few sentences in the meth-
ods or results section. Scale development and evaluation should be considered a continuous
process that unfolds over time, rather than an objective that is achieved in a single study or
publication. Of course, the need to continuously improve a scale should be balanced by the
need for comparability across studies and time. Ideally, a scale should be scrutinized using
various methods and should be examined by various researchers. Too often, a scale gets
published and then other researchers use it assuming that, because it had been published,
that the psychometric work has been done. We wish things were that easy!
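Because the first point above leans on the alpha-if-item-deleted statistic, here is a minimal sketch of how such a statistic can be computed; the data are hypothetical and the functions are our own illustration, not SPSS output.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n respondents x k items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

def alpha_if_item_deleted(items):
    """Alpha recomputed with each item removed in turn."""
    k = np.asarray(items).shape[1]
    return {j: cronbach_alpha(np.delete(items, j, axis=1)) for j in range(k)}

# Hypothetical 5-point responses from six people to four items
scores = np.array([[4, 5, 4, 3],
                   [3, 4, 3, 2],
                   [5, 5, 4, 4],
                   [2, 3, 2, 3],
                   [4, 4, 5, 2],
                   [3, 3, 3, 4]])

print(cronbach_alpha(scores))         # overall internal consistency
print(alpha_if_item_deleted(scores))  # here, alpha rises if the fourth item is dropped
```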
References
Age Discrimination in Employment Act of 1967, 29 U.S.C. § 621 (1967).
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC:
American Educational Research Association.
Americans with Disabilities Act of 1990, 42 U.S.C. § 12101 (1990).
Bagozzi, R. P., & Edwards, J. R. (1998). A general approach for representing constructs in organizational
research. Organizational Research Methods, 1, 45–87.
Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: John Wiley.
Byrne, B. (1989). A primer of LISREL: Basic applications and programming for confirmatory factor analytic
models. New York, NY: Springer Verlag.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating goodness-of-fit indexes for testing measurement invari-
ance. Structural Equation Modeling, 9, 233–255.
Civil Rights Act of 1964, 42 U.S.C. § 253 (1964).
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. Belmont, CA: Wadsworth-
Thompson Learning.
Diefendorff, J. M., Silverman, S. B., & Greguras, G. J. (2005). Measurement equivalence and multisource rat-
ings for non-managerial positions: Recommendations for research and practice. Journal of Business and
Psychology, 19, 399–425.
Drasgow, F. (1984). Scrutinizing psychological tests: Measurement equivalence and equivalent relations with
external variables are central issues. Psychological Bulletin, 95, 134–135.
Drasgow, F. (1987). Study of the measurement bias of two standardized psychological tests. Journal of Applied
Psychology, 72, 19–29.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory
factor analysis in psychological research. Psychological Methods, 4, 272–299.
Facteau, J. D., & Craig, S. B. (2001). Are performance appraisal ratings from different rating sources compa-
rable? Journal of Applied Psychology, 86, 215–227.
Gerbing, D. W., & Hamilton, J. G. (1996). Viability of exploratory factor analysis as a precursor to confirmatory
factor analysis. Structural Equation Modeling, 3, 62–72.
Gilliland, S. W., & Steiner, D. D. (1999). Applicant reactions. In R.W. Eder & M. M. Harris (Eds.), The employ-
ment interview handbook (pp. 69–82). London, England: Sage.
Greguras, G. J. (2005). Managerial experience and the measurement equivalence of performance ratings.
Journal of Business and Psychology, 19, 383–397.
Haig, B. D. (2005). Exploratory factor analysis, theory generation, and the scientific method. Multivariate
Behavioral Research, 40, 303–329.
Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement invariance in aging
research. Experimental Aging Research, 18, 117–144.
Hurley, A. E., Scandura, T. A., Schriesheim, C. A., Brannick, M. T., Seers, A., Vandenberg, R. J., et al.
(1997). Exploratory and confirmatory factor analysis: Guidelines, issues, and alternatives. Journal of
Organizational Behavior, 18, 667–683.
Jingo, M., & Kurosawa, A. (1950). Rashomon [Motion picture]. Japan: Daiei.
Joreskog, K. G., & Sorbom, D. (1996). LISREL 8: User’s reference guide. Chicago, IL: Scientific Software
International.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.). New York, NY: Guilford.
Landy, F. J. (1986). Stamp collecting versus science: Validation as hypothesis testing. American Psychologist,
41, 1183–1192.
Meade, A. W., Michels, L. C., & Lautenschlager, G. J. (2007). Are Internet and paper-and-pencil personality
tests truly comparable? An experimental design measurement invariance study. Organizational Research
Methods, 10, 322–345.
Meredith, W. (1993). Measurement invariance: Factor analysis and factorial invariance. Psychometrika, 58, 525–543.
Mulaik, S. A. (1972). The foundations of factor analysis. New York, NY: McGraw-Hill.
Myors, B., Lievens, F., Schollaert, E., van Hoye, G., Cronshaw, S.F., Mladinic, A., et al. (2008). International
perspectives on the legal environment for selection. Industrial and Organizational Psychology:
Perspectives on Science and Practice, 1, 206–246.
Offerman, L. R., & Gowing, M. K. (1993). Personnel selection in the future: The impact of changing
demographics and the nature of work. In N. Schmitt & W.C. Borman (Eds.), Personnel selection in
organizations (pp. 385–417). San Francisco, CA: Jossey-Bass.
Pulakos, E. D., Arad, S., Donovan, M. A., & Plamondon, K. E. (2000). Adaptability in the workplace:
Development of a taxonomy of adaptive performance. Journal of Applied Psychology, 85, 612–624.
Raju, N. S., & Ellis, B. B. (2002). Differential item and test functioning. In F. Drasgow & N. Schmitt (Eds.),
Measuring and analyzing behavior in organizations: Advances in measurement and data analysis
(pp. 156–188). San Francisco, CA: Jossey-Bass.
Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based
on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87, 517–529.
Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two
approaches for exploring measurement invariance. Psychological Bulletin, 114, 552–566.
Rensvold, R. B., & Cheung, G. W. (1998). Testing measurement models for factorial invariance: A systematic
approach. Educational and Psychological Measurement, 58, 1017–1034.
Robie, C., Schmit, M. J., Ryan, A. M., & Zickar, M. J. (2000). Effects of item context specificity on the mea-
surement equivalence of a personality inventory. Organizational Research Methods, 3, 348–365.
Roznowski, M., & Hanisch, K.A. (1990). Building systematic heterogeneity into work attitudes and behavior
measures. Journal of Vocational Behavior, 36, 361–375.
Schmitt, N., & Mills, A. E. (2001). Traditional tests and job simulations: Minority and majority performance
and test validities. Journal of Applied Psychology, 86, 451–458.
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. Thousand Oaks, CA: Sage.
Society for Industrial-Organizational Psychology. (2003). Principles for the validation and use of personnel
selection procedures (4th ed.). Bowling Green, OH: Author.
Sternberg, R. J., Wagner, R. K., Williams, W. M., & Horvath, J. A. (1995). Testing common sense. American
Psychologist, 50, 912–927.
Steenkamp, J. E., & Baumgartner, H. (1998). Assessing measurement invariance in cross-national consumer
research. Journal of Consumer Research, 25, 78–90.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature:
Suggestions, practices, and recommendations for organizational research. Organizational Research
Methods, 3, 4–70.
Wang, M., & Russell, S. S. (2005). Measurement equivalence of the job descriptive index across Chinese and
American workers: Results from confirmatory factor analysis and item response theory. Educational and
Psychological Measurement, 65, 709–732.
Wechsler, D. (1939). The measurement of adult intelligence. Baltimore, MD: Williams & Wilkins.
Williamson, L. G., Campion, J. E., Malos, S. B., Roehling, M. V., & Campion, M. A. (1997). Employment
interview on trial: Linking interview structure with litigation outcomes. Journal of Applied Psychology,
82, 900–912.
Woehr, D. J., Sheehan, M. K., & Bennett, W. (2005). Assessing measurement equivalence across rating sources:
A multitrait-multirater approach. Journal of Applied Psychology, 90, 592–600.
Zenderland, L. (1998). Measuring minds: Henry Herbert Goddard and the origins of American intelligence
testing. Cambridge, England: Cambridge University Press.
20 Assessment Feedback
Manuel London and Lynn A. McFarland

This chapter explores the role of feedback in the assessment process. The content of assessment
feedback can range anywhere from a pass/fail statement to a detailed, competency-based report
delivered in-person and followed up with suggestions for development. Unlike feedback in other con-
texts, such as performance appraisal, selection feedback is not primarily about how applicants can
improve their performance in the future. Indeed, many tools are used in selection to measure areas
where we do not believe people can easily improve, such as personality characteristics and general
cognitive ability. The purpose of feedback may be only to explain the selection decision. However,
feedback may influence applicants’ perceptions of fairness, their self-image, and their reactions to
the organization. This in turn may affect applicants’ behavior, such as whether or not to accept a job
or recommend the organization to another prospective employee. Also, although not necessarily the
intended purpose, selection feedback may be useful to guide applicants’ development—whether to
help them next time they apply for a position, repeat a similar test for the same job, or accept a job
offer and need or want further training to enhance their job performance. Further, organizations and
human resource (HR) professionals who are responsible for selection may view giving feedback as a
professional and ethical obligation. Assessment feedback also can affect the organization’s reputa-
tion, for instance, as a respectful, development-oriented employer.
Here we consider the benefits and drawbacks of providing feedback from the standpoint of the
organization and the candidate. We review the literature on test-givers’ obligations to provide feed-
back, candidates’ reactions to feedback, and the potential costs and benefits of feedback to the
recipients and the organization. Finally, we consider implications for practice and areas for research
to better understand the role of feedback from individual and organizational perspectives.

SOME CASE EXAMPLES


Most organizations do not provide feedback beyond the selection decision itself. Some organiza-
tions (e.g., civil services) are required by law to provide more specific information on standing
or scores, but this is not the norm outside of those contexts. Consider some examples of deciding
whether or not to provide postselection feedback to candidates. The first example involves an online
assessment center (AC); the second, an objective preemployment test; and the third, an individual
assessment for executive selection.

AC FEEDBACK
A large, national management consulting firm is hiring 15 recent MBA graduates for entry-level
support positions. The jobs require analytic skills, client relations, high work standards, ability to
work well in teams, and motivation. Organizational psychologists on the HR staff develop a selec-
tion method that includes a battery of psychological tests, a measure of cognitive ability, a biodata
form, two business exercises to produce work samples (simulations that ask candidates to set pri-
orities, write e-mails, and respond to phone calls), and a test that asks how the candidate would


handle certain situations. The assessment is conducted online. Participants can log in from remote
locations. The firm’s recruiters conduct on-campus interviews and identify top candidates to invite
to participate in the assessment. The assessment process provides a wealth of information that helps
the firm decide who to hire and, for those hired, what development and job experiences they need
to improve their chances for success. The information would also be valuable to candidates who are
not selected to help them guide their own early career decisions.
The HR staff considered the following: The online assessment allowed giving feedback immedi-
ately after the completion of testing. The ability to deliver feedback quickly and at little additional
cost has created a dilemma. In the past, such an assessment would be conducted in person with
small groups of participants. Specific feedback would be time-consuming and costly to deliver.
Applicants would need to return to the testing site, and individualized feedback would be provided.
Computerization of assessments makes this no longer difficult to do. Now the decision not to deliver
feedback is not as justifiable on economic or practical grounds. So, the HR staff wondered, should
they give all candidates feedback immediately after the assessment? Should they wait until a selec-
tion decision is made, inform the candidates, and then invite them to return for feedback? Does this
risk divulging proprietary information about the testing process and its validity? Does this put the
firm at risk if candidates question the selection process’ fairness or accuracy? Should only those
who are hired be offered feedback? Should the firm require that those who are hired receive feed-
back and use the information to establish a development plan? Who should present the feedback to
the candidates? Should a psychologist meet with each candidate to explain the results? This would
be expensive. Should a report be prepared for each candidate and sent to a new hires’ immediate
supervisor to review with the candidate?

TEST FEEDBACK
A restaurant is hiring food service personnel. It uses a biodata form, an integrity test to assess
self-reports of honesty, and an interview. Hundreds of people take the test each year, administered
by the managers of the company’s restaurants across the country. What type of feedback should
the restaurant provide to applicants who are selected and those who are not, other than to inform
them of the decision?

INDIVIDUAL ASSESSMENT FOR EXECUTIVE SELECTION


A multinational consumer products company is hiring a marketing vice president from outside of
the organization. The company HR department works with the CEO to hire an executive search firm
to help identify candidates. A personnel psychologist working for the search firm meets with the
CEO and others in the organization to determine the job demands and expectations and formulate
a set of desired characteristics for the successful candidate, including knowledge, experience, moti-
vation, and interpersonal skills. Also, the psychologist develops a screening technique to identify
candidates for further analysis and ultimately formulates an individual assessment consisting of a
battery of personality tests along with background and situational interview questions for the top
candidates. Three candidates make it to the final stage and agree to complete the selection tests.
The psychologist writes a detailed report about each candidate for the hiring CEO to review before
interviewing the candidates, talking to references who know the candidates well, and making a final
decision. This process raises several questions about feedback: Should the reports be available to
the candidates? Who should deliver the reports? Should it be the personnel psychologist, who can help the
candidates interpret the information in relation to what the organization wanted (e.g., perhaps the
results showed that the fit would not be good despite the stellar qualifications of the finalists)? Might
the results be used to support the development of the candidates? Should the organization simply
hand the report to the candidates? Should the candidates not receive any feedback other than not
being offered the job?
The questions about feedback in these examples deal with how feedback fits within the selection
process. Should applicants be told more than whether or not they were selected? If they were not
chosen, should they receive more specific information that might help them in the future? More gen-
erally, how does feedback fit within the assessment process? We can begin to answer these questions
by turning to professional testing standards for guidance.

FEEDBACK AND PROFESSIONAL STANDARDS


Although feedback is a neglected aspect of testing, professional standards for test development and
administration provide some guidance about whether to provide test feedback to applicants and
how specific that feedback should be. The American Psychological Association (APA)’s Ethical
Principles of Psychologists specifies that applicants have the right to a full explanation of the nature
and purpose of an assessment technique in language they can understand, unless the candidate has
explicitly waived that right, and establish a procedure for ensuring the adequacy of the explanation
(APA, 2002). Specifically, the Principles state:

Regardless of whether the scoring and interpretation are done by psychologists, by employees or
assistants, or by automated or other outside services, psychologists take reasonable steps to ensure that
explanations of results are given to the individual or designated representative unless the nature of the
relationship precludes provision of an explanation of results (such as in some organizational consulting,
pre-employment or security screenings, and forensic evaluations) and this has been clearly explained to
the person being assessed in advance. (Principle 9.10, p. 14)

The APA’s The Rights and Responsibilities of Test Takers: Guidelines for Testing Professionals
specifies that applicants receive a written or oral explanation of their test results within a reason-
able amount of time after testing and in commonly understood terms (APA, 1998). The document
emphasizes that the “rights and responsibilities” are neither legally based nor inalienable rights, but
they represent good professional practice.
Pope (1992) summarized psychologists’ responsibilities in providing psychological test feedback
to the organization and to job candidates. He viewed feedback as a process that includes clarifying
tasks and roles of the test-giver and taker, ensuring that the test-taker has given informed consent
(or refusal) before taking the test, framing the feedback, acknowledging its fallibility, guarding
against misuse and misinterpretation of the information, guarding the test results, and assessing and
understanding important reactions. He emphasized that applicants and the organization have a right
to understand the purpose and use for the assessment, the procedures involved, and the feedback
they can expect and from whom.
The rights of test-takers were controversial when they were first created. They seemed to ignore
some of the realities of large-scale employment testing. For instance, the right expressed in the
APA Guidelines, “have their test administered and their results interpreted by appropriately trained
individuals who follow professional codes of ethics” (pt. 6.0) (which is not APA policy), may not
be possible when test administration personnel have no code of ethics or when feedback is deliv-
ered electronically to many people. These guidelines need to be contrasted with practice. In our
experience, most organizations do not follow the guidelines when it comes to selection tests. When
it comes to preemployment tests of cognitive ability and personality or interviews, applicants are
likely to be told little more than whether they were offered the job.
Feedback tends to be integral to the operation of ACs used for selection. The Guidelines and
Ethical Considerations for Assessment Center Operations were developed and endorsed by practi-
tioners to delineate the key components and activities of an AC, including feedback (International
Task Force on Assessment Center Guidelines, 2000). ACs combine a host of qualitative and quan-
titative data about candidates and are used for selection, promotion, and development for general
management positions and for specific functions and various industries, such as manufacturing,
banking, and sales (Spychalski, Quiñones, Gaugler, & Pohley, 1997). Often, line managers are
trained as assessors and may also be charged with giving feedback to candidates. The Guidelines
defines AC feedback as “information comparing actual performance to a standard or desired level
of performance” (International Task Force on Assessment Center Guidelines, 2000, p. 10), indicat-
ing that applicants must receive information to help them understand their results. The organization
using the AC should establish and publish a policy statement about the use of the data (e.g., who
will receive reports, restrictions on access to information, planned uses for research and program
evaluation, and feedback procedures). Assessor training should include “thorough knowledge and
understanding of feedback procedures” as well as a “demonstrated ability to give accurate oral and
written feedback, when the assessor’s role is to give feedback” (p. 10). Further, guidelines for use of
the data suggest the following:

1. Assessees should receive feedback on their AC performance and should be informed of
any recommendations made. Assessees who are members of the organization have a right
to read any formal summary written reports concerning their own performance and rec-
ommendations that are prepared and made available to management.
2. Applicants to an organization should be provided with, at a minimum, what the final
recommendation is and, if possible and if requested by the applicant, the reason for the
recommendation.
3. For reasons of test security, AC exercises and assessor reports on performance in particu-
lar exercises are exempted from disclosure, but the rationale and validity data concerning
ratings of dimensions and the resulting recommendations should be made available upon
request of the individual.
4. The organization should inform the assessee what records and data are being collected,
maintained, used, and disseminated.
5. If the organization decides to use assessment results for purposes other than those originally
announced and that can impact the assessee, the assessee must be informed and consent
obtained. (p. 9)

The various guidelines reviewed above recognize selection feedback as valuable to the applicant,
and indeed couch it in ethical terms—that applicants deserve to know the meaning of the informa-
tion collected about them and how the information was used to make decisions about them. There is
a growing body of research on applicants’ reactions to feedback that suggests the potential value of
feedback to the applicant and the organization. We examine this literature next.

APPLICANTS’ REACTIONS TO FEEDBACK


Understanding how applicants react to feedback can help HR managers design feedback that will be
beneficial to applicants and the organization. For instance, if administered appropriately, feedback
may help applicants make decisions about whether to accept an offer, prepare them to re-apply,
or make it more likely they will recommend the organization to other qualified applicants. Also,
feedback may reinforce or enhance applicants’ self-image, increase their self-awareness (accurately
recognizing strengths and weaknesses), direct their development goals, and, at least, not do harm by
damaging an individual’s self-image.
Table 20.1 summarizes research on applicant reactions to feedback and implications for selection
system design. Ryan and Ployhart (2000) examined the literature on applicant perceptions of
selection procedures, including the consequences of selection results and feedback. They found
various factors that may influence reactions to the selection procedure, such as test type, HR policy,
and the behavior of HR personnel, as well as selection results. They suggested that selection results
and feedback operate within the larger context of conditions that affect applicant reactions. Ryan and
Ployhart (2000) proposed that these conditions include personal characteristics (e.g., experience), job
characteristics (e.g., attractiveness), procedural characteristics (e.g., reasonableness of explanation; job relatedness of the selection methods), and organizational context (e.g., selection ratio/competition).

TABLE 20.1
Applicants’ Reactions to Feedback and Implications for Practice
Examples of applicants’ perceptions and reactions
• Applicants’ reactions to the test and the testing process may change after receiving feedback about whether they
passed or failed the test (Marcus, 2003); feedback reactions are influenced by the context of the process (e.g.,
information about competition for the position; Ryan & Ployhart, 2000).
• Applicants who pass a test evaluate the organization and testing process more favorably than those who fail
(Bauer, Maertz, Dolen, & Campion, 1998; Kluger & Rothstein, 1993). Receiving negative feedback leads applicants
to perceive that the testing process was unfair (Lounsbury, Bobrow, & Jensen, 1989; Smither, Reilly, Millsap,
Pearlman, & Stoffey, 1993; Schinkel, van Dierendonck, & Anderson, 2004).
• Failing leads to lower self-image and self-perceptions of incompetence (Anderson & Goltsi, 2006; Fletcher, 1991;
Maertz et al., 2005; McFarland & Ross, 1982; Ployhart, Ryan, & Bennett, 1999; Ployhart & Harold, 2004).
• Feedback helps applicants have an accurate perception of their performance and benefit from the selection process
regardless of whether or not they are selected (Nicol & Macfarlane-Dick, 2006).
• E-mail produces fewer social context cues than voicemail, so e-mail increases the negative content of feedback,
filtering out the affect of the sender and receiver (Watts, 2007; Mishra, 2006).
Implications for selection system design
• Provide information about the fairness of the testing and decision-making process.
• Focus on implications of the results for behaviors, not personal characteristics that threaten self-image.
• Tie feedback to job requirements and developmental opportunities.
• Precede feedback with information about the selection test to explain its job relevance and fairness.
• Accompany feedback with behavior-based standards in relation to job knowledge, abilities, and experiences
required for success on the job.
• Explain why the test was fair to avoid applicants creating their own judgments about fairness.
• Convey personal and procedural information sensitively to generate more positive perceptions of the test and
organization, whether the applicant was hired or not.
• Explain reasons for test methods and results to promote accurate attributions and judgments of test fairness and
validity.
• Protect applicants’ self-image (e.g., remind them of the difficulty of the test, the tough competition, and the value of
knowing now that the job is not right for them).
• Recognize that machine-delivered feedback may require more explanation than face-to-face feedback to convey the intended meaning.

These conditions affect applicants’ perceptions about the selection process and outcome, depending on
applicants’ expectations, the perceived desirability of the job and organization, available alternative
jobs, and social support. Applicants’ reactions to feedback need to be understood in relation to
their perceptions and feelings about the context and process before, during, and after testing and
feedback of results. The effects of assessment feedback may depend on conditions, such as applicants
perceiving that there was strong competition or applicants already having excellent jobs, thereby
providing an attribution beyond their own ability (Ryan & Ployhart, 2000). Recognizing that the
selection process involves evaluation, Ryan and Ployhart (2000) commented that reactions to tests
are not merely reactions to the process but reactions to being evaluated.
Research supports this line of thinking. Job candidates’ reactions to a test, what Schuler (1993)
called “social validity,” depend, in part, on the feedback they receive about their test performance.
Thus, how test results are presented will influence how candidates interpret the meaning of the
results and what they decide to do as a result, if anything (Marcus, 2003; Pope, 1992). Candidates’
reactions to a test and the testing process may change after receiving feedback about whether they
passed or failed the test. Bauer, Maertz, Dolen, and Campion (1998) assessed applicants’ reac-
tions before testing, after testing, and again after feedback about whether they passed or failed.
They found that applicants who passed the test evaluated the organization and testing process more
favorably than those who failed. Wiechmann and Ryan (2003) found that feedback about the selec-
tion decision had a significant effect on reactions to the test, consistent with a self-serving bias.
Candidates who were selected preferred to think that the test was fair and valid because this helped
them maintain their self-image. Those who were not selected expressed a negative perception about
the test because this diverted blame for the outcome from their own ability to the situation. In a
laboratory study, applicants who failed to meet hiring standards had a more negative view of the
organization than those who had passed (Kluger & Rothstein, 1993). In a study of actual applicants,
those who performed more poorly rated the entire selection process more negatively than those who
performed well (Macan, Avedon, Paese, & Smith, 1994). Thus, those who receive negative feedback
on test performance may automatically blame the test to protect their self-perceptions. However, if
a test is perceived to be procedurally fair before receiving feedback, but applicants receive feedback
that they did not perform well on the test, this may cause applicants to feel incompetent and thereby
lower their expectations of being able to perform on such tests in the future (Ployhart, Ryan, &
Bennett, 1999).
Ryan, Brutus, Greguras, and Hakel (2000) examined managers’ reactions to feedback from a
battery of tests and surveys collected for development purposes. Although this was not for selec-
tion, the study tells us how people react to assessment feedback. In this study, feedback content and
feedback sessions were standardized and assessors received extensive training for giving feedback.
In addition, the feedback was both positive and negative. Receptivity was measured by self-report,
observers’ ratings, and behavioral manifestations of receptivity (e.g., whether there were displays
of anger or denial). Individuals who were high in self-awareness and who were acquainted with the
feedback provider were more receptive to developmental feedback; older individuals and those who
were demographically different from the feedback provider were less receptive to it.

APPLICANTS’ FAIRNESS PERCEPTIONS


Much of the research on applicant reactions to feedback uses the justice framework to understand
reactions to testing and feedback. Distributive justice refers to perceptions of the fairness (equity)
of the outcomes of a selection process, whereas procedural justice refers to perceptions of the
fairness of the process itself (Folger & Greenberg, 1985; Gilliland, 1993; Thibaut & Walker,
1975). Applicants who receive negative feedback tend to rate the test as less fair than those who receive positive feedback (Lounsbury, Bobrow, & Jensen, 1989; Schleicher, Venkataramani,
Morgeson, & Campion, 2006; Smither, Reilly, Millsap, Pearlman, & Stoffey, 1993). Procedural
justice may mediate the effects of favorability on outcomes. Several important organizational
outcomes may be related to procedural justice perceptions of selection processes (Bauer et al.,
2001). These include the attractiveness of the organization to applicants, applicant intentions
toward the organization (e.g., recommending the company to others), and deciding to accept the
organization’s job offer. However, there is still some debate regarding the importance of feedback
in test reactions. In their Annual Review article, Sackett and Lievens (2008) concluded, based on
the meta-analysis of Hausknecht, Day, and Thomas (2004), that there was little evidence for a
relationship between applicant reactions to the selection process, including feedback, and actual
behavioral outcomes.
Feedback and justice perceptions may interact such that those who perceive the testing process
to be fair will have lowered self-esteem if they fail and elevated self-esteem if they pass (Bauer
et al., 1998; Brockner & Wiesenfeld, 1996; McFarland & Ross, 1982; Ployhart & Harold, 2004).
For example, Gilliland (1994) found that perceptions of job-relatedness were negatively related to
test-taking self-efficacy for applicants who failed but positively related to test-taking self-efficacy
for those who passed. Bauer et al. (1998) found that feedback was more important than procedural
justice perceptions as a determinant of organizational outcomes. However, those who failed the
test and reported greater perceived information about the test and better treatment increased their
test-taking self-efficacy, perhaps because these individuals attributed their failure to external causes
outside of their control or because they resolved to do better in the future as a way of compensating
for their lower performance.
In a laboratory study, Gilliland (2004) found that student applicants who were selected on the basis of their test results perceived the selection process as fairer than did rejected students, especially when they had high expectations of being hired. Applicants who were rejected were more
likely to recommend application to others when they were offered an explanation for the use of
the selection procedure. Feedback influenced the factors that were important in perceiving that the
process was fair (Van Vianen et al., 2004). Explaining the selection decision to applicants increased
their perceptions of a fair selection process (Gilliland et al., 2001; Ployhart, Ryan, & Bennett, 1999).
Other research found that applicants are likely to perceive the selection process and its results as fairer when they are given information about the process before being evaluated (Truxillo, Bauer, Campion, & Paronto, 2002).
Because feedback can contribute to unfavorable perceptions of the selection process and out-
comes, understanding the effects of such perceptions is important. Truxillo, Steiner, and Gilliland
(2004) reviewed the literature on the effects of selection fairness on organizational outcomes. They
found that feelings of unfair treatment affect such “soft” outcomes as satisfaction with the selection
process and “hard” outcomes such as applicant withdrawal. Fairness perceptions matter particularly
in contexts of high uncertainty. Uncertainty is threatening, and applicants may construct fairness
perceptions to make things seem more certain, comprehensible, or tolerable. Hiring is by its nature an
uncertain situation because applicants do not know their chances of succeeding, and they have little
information about how well the job would work out if they were selected (Gilliland & Chan, 2001).
Organizations will want to be sure that applicants are clear about the purpose for the test and
treat all applicants in a friendly and respectful manner, including the process of giving feedback.
Procedural justice perceptions predicted organizational attractiveness and intention related to the
organization prior to receiving pass-fail feedback; however, this effect was diminished after con-
trolling for the pass-fail result (Maertz, Bauer, Mosley, Posthuma, & Campion, 2004). Procedural
justice perceptions seemed to be influenced by how well applicants thought they performed on the
test, and this in turn influenced their orientation toward the organization. However, when control-
ling for initial outcome levels, information about the test predicted organizational attractiveness,
and treatment at the testing site predicted intention toward the organization. Applicants who felt
they were informed about the test saw the organization as more attractive, at least temporarily.
Similarly, feeling well treated at the test site increased applicants’ intentions to recommend the
organization to other applicants.
In summary, passing or failing a test contributes strongly to applicants’ subsequent reactions (Bauer
et al., 1998; Thorsteinson & Ryan, 1997): the more positive the results, the more favorable the reaction.
Conveying personal and procedural information sensitively contributes to more positive perceptions
of the test and organization, whether the applicant was hired or not. Understanding the procedures
may limit the negative effects of failing a test on self-perceptions.

FEEDBACK AND SELF-IMAGE


Generally, people do not perceive themselves accurately, or at least as others see them (London,
2003). Feedback helps them have an accurate perception of their performance and benefit from
the selection process regardless of whether they are selected or not. Feedback supports candidates’
self-learning by clarifying what good performance is, encouraging dialogue between testers and
test-takers, and suggesting ways that test-takers can improve their performance in the future (Nicol & Macfarlane-Dick, 2006). Job candidates are likely to perceive feedback as being accurate when the
feedback is clear and objective, not based on ambiguous or subjective data (e.g., one assessor’s or
interviewer’s opinions).
People tend to evaluate themselves positively to maintain or increase their self-image (Harris &
Schaubroeck, 1988). They interpret feedback through their own lens; for instance, potentially
attributing poor test results to factors outside of their control, such as an unfair testing procedure
(Ployhart & Harold, 2004). In contrast, as we noted earlier, applicants may lower their self-image
when they receive a low score on a test that they perceive to be fair (Ployhart, Ryan, & Bennett,
1999). Nicol and Macfarlane-Dick (2006) suggested that candidates evaluate their own perfor-
mance during the testing process and create their own feedback in line with their self-image when
feedback is not given. For instance, if they do not receive specific information about the results
of a test and they were not chosen, they will rationalize that their rejection was not because of
their test performance but because of other factors beyond their control. This allows them to main-
tain their self-image, but it may also create erroneous impressions that could be damaging to the
organization—for instance, that the organization has unfair hiring or promotion practices. When
the candidate is offered the job, he or she is likely to attribute the outcome to good test performance, but this attribution is less certain without feedback. The candidate with low self-esteem may errone-
ously conclude that the positive outcome was due to luck or other factors beyond his or her ability.
In addition, the successful candidate could benefit from test results that suggest ways to improve
his or her skills and knowledge to be even more valuable to the organization and increase his or
her chances of job and career success. Hence, feedback should be provided to influence applicants’
perceptions about the testing process and organization.
Social identity theory suggests how applicants’ social identities interact with their perceptions
of selection experiences to predict their withdrawal from the process. Herriot (2004) argued that
applicants are likely to withdraw from the selection process when there is incongruence between
their current perceptions of the organization’s identity and their own self-identities that are salient
during specific elements of the selection process. Social identities are individuals’ beliefs about
their membership in social categories (e.g., their gender, ethnicity, occupation, family, and reli-
gion), in contrast to their personal identities, which are beliefs about their own characteristics (e.g.,
their strengths and weaknesses). Social identities are associated with a range of beliefs, values, and
norms of behavior, and may incorporate prototypes, or beliefs about the typical or ideal member of
a category. Organizational identities are subsets of social identities. Applicants develop perceptions
about their organizational identity as they participate in the selection process and receive feedback.
The effects of degree of congruence on leaving the process may be moderated by the applicants’
perceptions of the probability of obtaining another job. People who believe there are plenty of other
opportunities will have a lower threshold for incongruence. Those who believe that job opportuni-
ties are scarce are more likely to tolerate a higher level of incongruence.
Schinkel, van Dierendonck, and Anderson (2004) studied the role of feedback in minimizing
the psychological effect of a negative selection decision on job applicants. Student subjects com-
pleted two tests and then received a rejection message. Half received just the rejection message and
half received the rejection message and bogus performance feedback (the percentile, how they did
relative to others). Core self-evaluations and affective well being of the rejected students receiving
performance feedback significantly decreased from before to after the testing and feedback com-
pared with that of students in the rejection message-alone condition. Core self-evaluations actually
increased for those who were rejected but were not given performance feedback, particularly if they
saw the procedure as unfair. Procedural fairness (candidates’ perceptions that they had a chance to
demonstrate their performance and that the test was related to the job function) interacted with feed-
back to affect core self-evaluation; distributive fairness (the perception that the selection decision
was correct) interacted with feedback to affect well being. The authors suggested an attribution the-
oretic explanation for these results. The students may have shown a self-serving bias after receiving
negative outcomes, attributing the negative outcome to external causes (e.g., unfair procedures that
were not under their control), thereby maintaining their self-perceptions. Consistent with DeNisi and Kluger’s (2000) concept that feedback entails a comparison, the comparative performance information made reducing the negative state following rejection even more important.
Maertz, Bauer, Mosley, Posthuma, and Campion (2005) measured applicants’ self-efficacy for
cognitive ability testing before and immediately after the test and again after pass/fail feedback.
The applicants were applying for a position at a utility company. Self-efficacy for the test prior to
actually taking it was higher for men, applicants who had prior successful experiences with cogni-
tive ability tests, those who perceived the test to be valid and fair, and those with higher general
self-efficacy. Self-efficacy for the test increased for those who passed it and decreased for those
who failed. Failing had a greater negative impact on subsequent self-efficacy for the test for women
and Whites and a lower negative effect for those who had previously been hired based on ability
tests. An implication is that those who fail the test and have significant decreases in self-efficacy
for cognitive tests might tell others that the test was particularly difficult and discourage possible
applicants.
Maertz et al. (2005) suggested that organizations consider attribution-related or other interven-
tions to bolster self-efficacy or to increase perceptions of fairness and test validity. Also, test admin-
istrators should follow procedural justice rules and emphasize in pretest preparation sessions the
proven validity of the test for predicting job performance. These interventions may also enhance
applicants’ attraction to the organization. Derous, Born, and de Witte (2004) found that applicants
valued and expected feedback. They argued that although applicants should not be deprived of their
right to performance scores, perhaps performance measures after selection should not be provided
or should be provided in ways that protect applicants’ self-image; for instance, reminding them of
the low selection ratio or that the position is not right for everyone and that knowing now is better
than being unhappy later.
Anderson and Goltsi (2006) formulated the construct of negative psychological effects (NPEs) of
selection and assessment methods upon applicants. They defined NPE as follows:

Declines in applicant psychological well-being, general mental health, or core self-esteem that are
inveterate, measurable, and statistically demonstrable, and that occur as a result of exposure to rejection
decisions, violations of applicant rights, or unprofessional feedback given to applicants by recruiters, at
any stage during organizational selection or promotion assessment procedures. (p. 237)

They also defined positive psychological effects (PPEs) as increases in applicant psychological
well being, general mental health, or core self-esteem that result from acceptance decisions,
perceived respect for applicant rights, or complimentary feedback. Previous research had found
that applicants participating in an AC experienced negative psychological effects (Fletcher, 1991).
Anderson and Goltsi (2006) suggested that NPEs may be present for several weeks and months
after receiving a negative selection decision. They noted that much of the research on applicant
reactions to selection methods has focused on applicants’ immediate reactions to, and preferences for, different predictors. This study investigated the long-term effects and outcomes of
applicants’ exposure to specific selection methods on candidate decision-making, attitudes toward
the organization, and psychological health and well being. For instance, measures included self-
esteem, mental health, positive and negative affect, and career exploration behavior. One hundred
seven applicants participating in an AC completed measures just before participating in the center,
immediately afterward but before they were told the outcome, and 6 months after the assessment.
All applicants received detailed feedback regardless of whether they were accepted or not. Rejected
applicants did not differ significantly from accepted applicants on the indices of NPEs. Accepted
applicants rated feedback dimensions more favorably than rejected applicants. Rejected applicants
seemed to attribute the negative decision to a lack of accuracy in the assessment process. The
authors suggested that NPEs may not have emerged for unsuccessful candidates because the selection ratio in this organization was so competitive, which may have moderated applicants’ negative feelings about rejection. An implication is that providing detailed feedback to unsuccessful candidates may be dysfunctional. Rejected internal applicants remain in the organization, and NPEs could affect their job performance. This could also apply to successful
applicants who received inappropriately negative feedback or felt that they were not treated fairly
in some way.
In summary, unfavorable feedback may have a negative effect on applicants’ self-image. Organizations should investigate the costs of potential NPEs in terms of reduced performance and at least consider the possible long-term negative consequences of rejection.

BENEFITS AND COSTS OF FEEDBACK


People generally do not react positively to feedback. They are naturally apprehensive about being
evaluated and are concerned about what others think of them (London, 2003). However, feedback
can direct behavior by helping people set and recalibrate goals and determine what they need
to do to achieve their goals. Feedback can be motivating, by giving people a sense of what they
have accomplished and a feeling of reward for their achievements. Feedback from selection tests
can inform candidates about whether their ambitions were realistic and what they need to do to
increase their preparedness in the future. As such, feedback can contribute to development and
career planning.
The dilemma for the organization is deciding the level of specificity for feedback to maximize the
benefits and minimize the costs to the organization and the individual. Here we consider the costs
and benefits of feedback from the perspectives of the organization and the individual candidate.

ORGANIZATION’S PERSPECTIVE
Potential benefits of feedback for the organization include the following:

• Feedback informs the candidates’ decision-making. Candidates who are selected will have
a better understanding of why they were chosen and of the value the organization believes they will bring to
their positions.
• Knowing that candidates will receive feedback requires the organization to maintain focus
on the skills and knowledge needed by the successful candidate(s). These characteristics
may reflect skills needed to do the job today and/or needed for future career advancement
in the organization. Maintaining this focus is especially important when decisions are
based on qualitative information, such as interviews and supervisory opinions.
• Feedback is a way to maximize value from assessment dollars. Considerable time and
money may be spent evaluating candidates. This information can be useful not only for
making the decision but also for guiding candidates’ future development.
• Feedback may guard against illegal discrimination and defend against claims of such in
that feedback recipients will know and understand the reason for the selection process and
decision, see its job relevance, and recognize that it was not arbitrary or based on some-
thing other than bona fide job requirements.

Regarding potential costs of feedback, the organization needs to consider the following:

• Of course, the organization incurs the cost of delivering feedback. This may include the
cost of personnel, such as a coach, who meets with the candidates after the selection pro-
cess is concluded to explain the decision and provide the feedback.
• Feedback may create difficulties in maintaining the reliability and validity of the assess-
ment methods. For instance, if candidates know the test content, they may communicate
this to others, giving future candidates an unfair advantage, or causing them to respond in
certain ways to create impressions they feel the organization wants, and thereby limiting
the accuracy of the information.
• Test security is costly, and feedback that goes beyond test outcomes may make security
difficult. Having alternate forms of the selection process may eliminate this worry but adds
a further cost to create and validate these forms.
• The organization may be obliged to go beyond feedback to include advice for development.
Such advice needs to be given in a way that does not lead the candidates to have expecta-
tions about future opportunities; for instance, implying that if they follow the advice, they
will be promoted.
• Guarding candidates’ confidentiality is also a cost. Candidates should be told about who
has access to the assessment results, how the information will be used by the organization,
how long it will be retained, and how identifying information will be secured.
• Giving feedback imposes obligations to follow up, especially with internal employees. The
employees may want career advice in the future. Such feedback and career counseling
can be linked to a career development function in the organization. Such a function may
be valuable but imposes costs and commitment to maintain such a process, including the
professional personnel and systems to track individuals. Organizations may want to do this
in any case, and as suggested above, will thereby be taking maximum advantage of the
assessment data for talent management.
• The issue of longevity of assessment results needs to be examined by the organization.
Feedback indicates the value of the information for development, implying that the orga-
nization recognizes that people grow and develop over time. However, reassessment for
future career opportunities is costly. The organization will want to study changes in assess-
ment performance over time when varying degrees of development have occurred; for
instance, differences in performance between candidates who have received feedback and
those who have not and differences in performance between candidates who subsequently
participated in various developmental experiences compared to those who did not. Such
research is an additional but worthwhile cost in increasing the value of the assessment.

CANDIDATE’S PERSPECTIVE
Potential benefits of feedback for the candidate include the following:

• In general, the benefit of feedback is to increase self-awareness of strengths and weaknesses, identify competencies and experiences needed to increase competitiveness for future positions,
and learn areas needing development to be more effective once on the job.
• Feedback may help candidates who are offered positions understand the nature of the work and the organization’s expectations. Detailed assessment results will show them the characteristics that the organization values for the job and/or for their careers. Such detailed feedback, if provided, would describe strengths and weaknesses and suggest areas for improvement even though a job has been offered. The way candidates are treated, the information they are given about themselves and the selection process, and their conclusions about the fairness and validity of the process will help them evaluate the organization. The feedback may suggest that the organization cares about and is aware of their abilities and will support their continuous learning. For individuals who were rejected, the information explains why, supports their belief that the decision was made fairly and on the basis of relevant information, and helps them recognize their strengths and weaknesses so they can focus their future job search and development.
• Assessment feedback should include not only information about results but also ideas for
development of weaknesses and ways to build on one’s strengths. The feedback recipi-
ents should understand differences between performance areas that can be developed and
those that are difficult to develop. Some areas, such as decision-making, are harder to
understand and may take years to develop. Other areas, such as organizing skills, are
easier to understand and improve. For performance areas that are difficult to develop,
candidates might value suggestions about how to avoid behaviors and responsibilities that
require ability in these areas.
Potential costs of feedback for the individual candidate include the following:

• Perhaps the major cost of feedback from the individual’s perspective, particularly for those
who are not offered positions, is losing face and declining self-confidence. Generally, less
specific information will be less threatening but of less value.
• Another potential cost is processing the information. This takes time, energy, and moti-
vation—sometimes more than the individual cares to give. Perhaps the applicant was not
highly motivated to apply for the position to begin with. Or perhaps the individual knew
from the start that this was a long shot. This person does not need to hear more than he or
she was not offered the job. However, perhaps more information may redirect the candi-
date’s time and attention toward more fruitful opportunities in the future.
• The results may lead the candidate to make a disappointing career decision. Candidates
who get positive results may be flattered by the job offer and not pay attention to details
of the feedback—or hear just about their strengths and ignore or give less attention to any
weaknesses. Also, they may not pay sufficient attention to the nature of the job, instead
focusing on their own competence. As a result, they may accept a position that may have
requirements (e.g., travel) that they really did not want. Conversely, candidates who are
rejected may focus on the rejection decision and not be able to process feedback about their
personal characteristics mindfully.

DIFFERENCES BETWEEN INTERNAL AND EXTERNAL CANDIDATES


The costs and benefits may differ depending on whether the candidates are internal or external to
the organization. If internal, the organization wants to maintain the loyalty and motivation of the
employee and enhance the individual’s career development with the organization. Moreover, com-
municating reasons for selecting certain candidates and bypassing others lets the candidates and
other stakeholders (internal and external) know what is important to the organization. Internal candi-
dates who were not selected will expect a rationale and may appreciate and take advantage of advice
to direct their development and improve their future career development opportunities within the
organization. External candidates may not need as much information. Still, as described above, their
reactions to the fairness and thoroughness of the selection process and the validity of the decision
may affect their impressions of the organization and their subsequent relationships with the organi-
zation. As they communicate their perceptions to others, this may affect the organization’s reputation
as an employer and the organization’s ability to recruit qualified candidates in the future.

FEEDBACK OPPORTUNITIES FOR DIFFERENT ASSESSMENT METHODS


Assessment methods vary in the nature of the data they collect and in how easily the results can be interpreted as feedback and applied for development. They differ in whether the results are quantitative or qualitative, in the face validity of the methods, and in their usefulness for improving performance and future selection prospects. Consider the following methods:

• Interviews: Interview results are difficult to quantify. Situational judgment interviews (SJIs) may provide sample “normative” responses for comparison with individual candidates’ answers.
• Cognitive and personality tests: These produce objective results that can be explained by
subject area, percentiles, and norms within the company and with national samples.
• Integrity tests: Honest feedback on integrity tests could generate strongly negative reactions. Test administrators may need to justify the use of integrity tests to candidates or,
more simply, say in a tactful way that the results did not conform to the pattern that the
organization wanted without divulging the nature of the test.
• Test batteries: Test batteries produce complex results and may require professional input
to integrate and summarize, although automatic scoring and prewritten interpretations can
be developed for electronic feedback.
• Biodata: Past behavior is an excellent predictor of future behavior. Biodata results can be
justified in terms of experience needed. This method may be valuable in guiding failed
candidates to better preparatory experiences.
• AC measures of management and leadership skills: ACs produce complex results and
allow in-depth feedback, focusing candidates’ attention on weaknesses that can be cor-
rected and strengths that suggest alternative career directions. These can be delivered to
the candidate online immediately or soon after an online AC.
• Multisource (360 degree) feedback survey results: This method of collecting performance
ratings from subordinates, supervisors, peers, and/or customers as well as self-ratings
is often used alone for development. When used for decision-making, the results can be
related to other performance data and/or to changes in performance ratings from different
sources over time. An executive coach may assist in providing the feedback and encourag-
ing the recipient to interpret it accurately.
• Supervisor nominations and evaluations: When nominations and performance evaluations are used as the sole input for a placement or promotion decision, candidates may perceive them as unfair. This may increase candidates’ political behavior and impression management in hopes of winning favor in future decisions.
• Performance in a management/leadership development program: Participation in a man-
agement development program may indicate performance capabilities in areas in which
the candidates may not have had a chance to demonstrate their abilities on the job. The
participants would likely perceive the use of such results for making decisions about them
as unfair unless this purpose was evident before they participated in the program. Also,
the experiential situations in such a program may be viewed as artificial and not a good
representation of actual job performance.

COMPUTERIZED ASSESSMENTS AND FEEDBACK


Online assessments can be cost effective and flexible. They can be customized to assess various
abilities and behaviors. For instance, they can present realistic scenarios to candidates via full-
motion video and ask candidates how they would respond to the different situations (Drasgow,
Olsen, Keenan, Moberg, & Mead, 1993; Wiechmann & Ryan, 2003). Test-takers’ posttest,
postfeedback reactions to the tests are not affected by mode of administration (computer vs.
paper-and-pencil test; Wiechmann & Ryan, 2003). Just as assessments can be computerized,
so can the feedback. This may be less threatening than receiving feedback from an individual.
However, it also makes it easier for recipients to ignore the results, which is harder to do with in-person feedback. Computerized feedback can be given at different points
of time during the assessment or at the end, along with information about alternative response
possibilities. Although these are simulations, they are realistic, standardized, and easy to
administer. Of course, the computerized feedback can be combined with in-person feedback to
help the candidate use the information.
There is a growing body of research comparing receiving feedback via computer to receiving
feedback face-to-face. For instance, Watts (2007) noted that computer-mediated communication
makes delivering evaluative feedback immediate and detailed. She compared the effects of feed-
back via e-mail with voicemail from the perspective of the sender and receiver in a study of evening
MBA students delivering feedback that they generated themselves in relation to fellow students’ par-
ticipation in a group project. E-mail produced fewer social context cues than voicemail, so e-mail
increased the negative content of feedback, filtering out the affect of the sender and receiver. E-mail
senders viewed the negative feedback they gave as more negative than the receivers viewed the
feedback. This was not the case for voicemail senders. However, voicemail senders, but not e-mail
senders, were less comfortable with the negative feedback than receivers. Media conditions did not
influence feedback effectiveness (e.g., the perception that the feedback motivated the receiver to
work harder next time).
Mishra (2006) addressed whether people respond similarly to computer feedback about their
abilities as they do to human feedback. Students participated in a laboratory study in which they
were assigned randomly to one of four experimental conditions: scored test versus nonscored test
crossed with praise for success on easy task and no blame for failure on a difficult task versus
no praise for success on an easy task and blame for failure on a difficult task. Mishra reported
that participants took feedback at “face value” and seemed unwilling to commit to the same
level of “deep psychological processing” about the intention of the feedback that they seemed to apply to face-to-face feedback. This was contrary to the position that people learn to respond to computers as social actors with whom they interact in real time using natural language and who fulfill traditional social roles. When feedback comes from humans, people are more interpretive and try
to comprehend the context of the feedback. They do not do this when working with computers.
Psychological aspects of person-machine interaction are complex and difficult to understand.
Although students responded to the affective content of the computer feedback, they seemed to disregard the con-
text within which the feedback was offered. They saw praise from the computer as being positive,
regardless of whether or not they thought that their ability level was known or whether the task
was easy or difficult.
People may not be willing to process inferences as deeply when they receive computer feedback
as when they receive feedback from other people (Mishra, 2006). When receiving feedback from a
human, praise for success on an easy task is discounted and may even reduce motivation, especially when recipients believe that their ability level is known. Receiving blame for failure on a difficult task can lead to positive affect because the recipient infers that the evaluator thinks highly of his or her ability and believes that he or she can do better.
computer feedback about their test scores, this could thwart the goals of the feedback; for instance,
to motivate higher achievement next time or to take the difficulty of the task into account when
receiving praise. However, research is needed to determine if providing feedback recipients with a
social script that suggests that the computer is capable of sophisticated inferences; for instance, that
the computer “respects” the subject’s intelligence because the computer has a record of informa-
tion about the subject and takes that into account in providing feedback. Ferdig and Mishra (2004)
explored the technology as a social actor, finding that people exhibited emotional responses (e.g.,
anger and spite) when they felt that the computer had treated them unfairly in an ultimatum bargain-
ing game.

FEEDBACK, TEST PREPARATION, AND COACHING


Test feedback provides opportunities for career development. More specifically, it focuses the candi-
dates’ attention on ways to increase opportunities and also avoid errors or failures in areas that were
critical to the selection process. Feedback’s main value is correcting errors. Providing test feedback
suggests ways that the feedback recipients can increase their knowledge and avoid similar errors in
the future. Feedback that is directed at the retrieval and application of specific knowledge stimulates recipients to correct errors when they are mindfully focused on correcting or avoiding these errors in similar testing situations (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991).
However, feedback may have a deleterious effect if it focuses candidates’ attention on areas that will
not be as important in future situations. For instance, what was important for a given position in one
organization may be different than what is needed for another position in a different organization,
even if the positions are similar. Also, feedback may focus attention on areas that detract from cur-
rent performance. For instance, the candidates may concentrate on improving their weaknesses or
maximizing strengths that were important in the selection situation (e.g., for a promotion), but may
not be as important in their current situation. They may behave as if they were promoted or were
working on a different job and ignore current job requirements.

IMPLICATIONS FOR PRACTICE


Practitioners need to make fundamental decisions about giving feedback, recognizing that feedback
may affect applicants’ reactions to the testing situation, the organization, and their self-image. Also,
in divulging test results, practitioners worry about guarding the security of the test and minimizing
costs relative to the gains from feedback. Specifically, practitioners need to determine if and when
feedback should be given, by whom, and by what medium (face-to-face, letter,
e-mail). They also need to consider how much detail should be given and the form of score reporting
(e.g., raw score, percentiles, standard scores, etc.). Other questions involve the relevant comparison
groups (e.g., other candidates, other people with similar ability and background) and what resources
should be provided to assist applicants in interpreting the feedback and using the information for
their development.
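To make the score-reporting decision concrete, the following sketch illustrates how a single raw score can be expressed in the forms mentioned above; the norm values used here (mean, standard deviation, and the assumption of an approximately normal comparison group) are purely illustrative and are not drawn from any study cited in this chapter:

z = \frac{X - \mu}{\sigma}, \qquad \text{percentile} \approx 100 \times \Phi(z)

For example, a raw score of X = 112 on a test normed at mean 100 and standard deviation 15 corresponds to a standard score of z = 0.80, or roughly the 79th percentile of that norm group. Reporting the same raw score against a different comparison group (e.g., other applicants for similar positions rather than a general norm sample) can shift the percentile considerably, which is why the choice of comparison group deserves as much attention as the score format itself.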
Overall, HR practitioners need to foster applicants’ perceptions that the selection process and
outcome are fair, guard against erroneous attributions about the organization, and protect appli-
cants’ self-image. In addition, practitioners need to guard the security of the test and minimize cost
as they make feedback a positive part of the selection process. Also, candidates are not accountable
for using the feedback; they simply need to be made aware that it is available. The organization then
needs to provide a setting that is conducive to delivering feedback, including the format for the feed-
back. The decision about format, setting, and feedback specificity depends on the test developer’s
and administrator’s conclusions about their ethical obligation to provide feedback and to do so in
a manner that does not do harm to the candidates, at the very least, and hopefully benefits them.
The dilemma is how to maximize the benefits and minimize the costs to the organization and the
recipients. This is likely to be a balance between candor and confidentiality. It may also require
customizing the feedback to suit the candidates. Some may welcome feedback; others may avoid
it. Those who want more detail can be given the information. Precautions should be taken to guard
the assessment information to protect its usefulness to the organization (e.g., do not hand the test
and scores to applicants) as well as deliver the results in a sensitive way that takes into account the
recipient’s ability to comprehend the information. This may require hiring and training an assessor
or coach to convey and discuss the results with the applicant.
The organization must also determine its obligation to follow up after the feedback is delivered.
Follow-up questions can benefit the individual by ensuring that harm was not done and possibly
providing further coaching or career guidance. Follow-up can benefit the organization by asking
candidates for their perceptions of the fairness of the selection process and their feelings about the
organization. Perceptions of unfairness, whether because candidates were unhappy with the out-
come or because they truly felt that the process was discriminatory or unfair in some way, will
suggest ways to change the perception for future candidates.
Organizations that routinely provide assessment feedback to internal candidates are likely to fos-
ter a continuous learning culture that includes accepting and understanding performance feedback
and seeking areas for performance improvement and career development. Feedback recipients learn
to evaluate the feedback results for themselves and share them with others, perhaps their coworkers, as
a way of validating the results and seeking ways to apply the information for their continued profes-
sional growth. Clear communication about the assessment method and how the results are used is
important.
Professionals responsible for deciding what and how assessment results are fed back to appli-
cants need to consider not only the cost of divulging information about the test and results from
the standpoint of the organization but also the individual’s ability to understand and benefit from
the information. Organizations should track changes in performance over time at the individual
and organizational level. Also, organizations can collect data to show the added value of selection
feedback and its joint effects with other interventions, such as coaching, training, career paths,
online developmental resources, etc.
Returning to the three cases that we introduced at the outset of this chapter, here is how the
organizations answered the questions about whether to provide feedback, how much, and in what
form.

AC FEEDBACK
The consulting firm that used an AC to help select recent MBA graduates for entry-level positions
decided to inform the candidates that feedback would be given 1 week after the online assess-
ment. Although some test results would be available to the firm immediately, on the basis of com-
puter scoring, the exercises would provide qualitative results that the firm wanted to examine and
integrate with all the information about the candidates. Observers who reviewed the transactions
would record some feedback on each exercise. The feedback would be available to candidates who
passed the assessment and those who did not, although the nature of the feedback and tenor of the
conversation would differ. Those who passed and were offered jobs were told about developmental
experiences that would help them use the feedback for development in relation to the nature of the
work they would be doing and the organization’s performance expectations—essentially, a realistic
job preview. Those who were not offered positions were given the information in a way that did not
damage their self-image (e.g., suggested that this job may not have been right for them) and that
pointed to behaviors they could develop.
The firm asked the candidates not to discuss the details of the selection process with others, although they might want to share the results with people who could be helpful in their job search (e.g.,
their academic advisor or a friend or career mentor). The feedback included information about how
the data would be retained and protected. For selected candidates who accepted a position, this
included making the assessment results available to the individual’s new manager and the HR direc-
tor who would track the individual’s career and developmental assignments. For candidates who
failed the assessment, the data would be retained for a year under lock and key in case the individual
wanted to apply for another position with the firm.

TEST FEEDBACK
The restaurant hiring food service personnel using several evaluation sources (a biodata form, an
integrity test, and an interview) provided a written report for individuals who were offered jobs.
Those who were not hired were given information about the nature of the assessment to help them
realize that the selection methods were fair and related to the position. They were also given a sum-
mary statement of results written in relation to job requirements (e.g., “This job requires a person
to keep records accurately.”) without providing the individual’s actual results. Pains were taken not
to divulge the nature or purpose of specific aspects of the selection process, such as the integrity
test, or to provide results that would impugn the candidate in any way (e.g., indicate the potential
for dishonesty).

INDIVIDUAL ASSESSMENT FOR EXECUTIVE SELECTION


The consumer products company hiring a marketing vice president asked the personnel psy-
chologist who created and administered the assessment process to prepare separate feedback reports
for the company and the candidate. The candidate had the option of requesting the feedback
report, which would be delivered in person by the psychologist. The feedback reports would be
available only after the decision was made and a candidate accepted the job offer. The success-
ful candidate would receive a more detailed, career-oriented report that would be the start of an
ongoing developmental coaching experience with the psychologist or another external coach at
the discretion of the new vice president.

IMPLICATIONS FOR RESEARCH


Chan and Schmitt (2004) offered an agenda for future research on applicant reactions to selection
procedures. They suggested the need for research on such topics as dimensions of applicant reac-
tions (e.g., perceptions of fairness, predictive validity, and face validity), changes in reactions over
time (assessing such changes requires establishing measurement invariance), determinants of reactions (e.g., justice
perceptions and self-serving biases that induce negative reactions to the test when performance out-
comes are poor), reactions in relation to test constructs (e.g., perceptions of face validity for different
types of tests, such as personality compared to cognitive measures), outcomes of applicant reactions
(e.g., withdrawal from the selection process, job performance, and perceptions of the organization),
reactions to new technology in testing (online testing and feedback), and various methodological
and data analytic issues (e.g., an overreliance on correlational and concurrent study designs rather
than the more difficult longitudinal and predictive study designs). With reference to study design,
they recommended meta-analyses and longitudinal research. “The ideal project would be one in
which data on process issues are collected during the selection process and both those who are
accepted and those rejected by the procedure are followed over a longer period of time, perhaps a
year or two” (p. 21). Further, they believe that the most critical research initiative in this field should
be to incorporate test reactions in models of recruitment, selection, and newcomer orientation. This
would force researchers and eventually practitioners to recognize the determinants and immediate
and distal outcomes associated with reactions to selection processes.
We believe that applied research is needed to evaluate selection feedback. The effects of feed-
back should be evaluated along with the effects of other aspects of the selection process such as
explaining the process itself to demonstrate its fairness and relevance to the position and provid-
ing realistic job previews before and/or after the assessment to guide candidates’ decisions. As
noted above, reactions to the feedback can be tracked to determine their effects on decisions about
the organization (e.g., whether or not to continue with the selection process if further hurdles are
involved, accept a job offer, apply for another job, or recommend the organization to others). Also,
the effects of feedback on candidates’ development can be evaluated. The effects of feedback on
continued reliability and validity of the assessment process should also be determined. These data
can be used for ongoing program improvement.
Basic research should explore the effects of anticipated and actual feedback on candidates’
perceptions, test performance, and career decisions. This should include studying candidates’ reactions to feedback as an impression-making opportunity. The effects of feedback on later job
applications, participation in development, and assessment performance should be studied. Other
research should examine the extent to which the expectation of receiving feedback influences
assessment performance. Reactions to feedback that can be measured include candidates’ affective
reactions (liking the process, the test administrators, the organization), setting development goals
and participating in developmental experiences, and telling others about the selection process
and the organization. Relationships between assessment feedback and retention can be studied
to determine if the feedback provided information to help the accepted candidates make bet-
ter decisions about the company. Generally, we need to study how people process positive and
negative assessment feedback and the effects of the feedback on their making of career decisions.
Other areas for investigation include understanding how assessment feedback interacts with can-
didates’ demographic characteristics (age, gender, minority status, cultural background, career
stage), organizational characteristics (size, growth history, reputation for treating employees), and
the nature of the assessment data (qualitative or quantitative, detailed or general, and accompanied
by coaching and availability of developmental resources such as training programs for internal
candidates).
CONCLUSIONS
Feedback of assessment results is a complex process involving issues of information security and
professional accountability as well as development. Professional standards suggest that developers
of selection tests and other assessment methods are obligated to provide some feedback, if only to
explain the selection method and rationale for the outcome. Feedback also affects candidates’ reac-
tions to the selection process, which in turn may affect their decisions about the organization and
their development goals. Feedback can benefit the candidates and the organization, but precautions
must be taken to guard the confidentiality of the information and protect the self-image of the feed-
back recipient. The organization must determine the level of feedback specificity that is in the best
interests of the organization and the candidates. Internal and external candidates may be treated
differently, providing more details and developmental support to internal candidates and offering
external candidates optional feedback and help interpreting the results. To be constructive, feedback
can focus on implications of the results for behaviors, not personal characteristics that threaten
self-image. Moreover, feedback can be tied to job requirements and developmental opportunities.
Feedback can be preceded by information about the selection test when possible to explain its job
relevance and fairness. Furthermore, feedback can be accompanied by behavior-based standards,
not merely comparisons to others but standards in relation to job knowledge, abilities, and experi-
ences that are required for the job and that predict success on the job.
Feedback should not be given without adequate support for using the information and ensuring
that the candidate is not harmed by the information. Although assessment feedback has potential
costs to the candidate and the organization, the benefit of assessment results can be maximized by
recognizing its value for selection and development.

REFERENCES
American Psychological Association. (1998). The rights and responsibilities of test takers: Guidelines and
expectations. Test Taker Rights and Responsibilities Working Group of the Joint Committee on Testing
Practices. Washington, DC: Author. http://www.apa.org/science/ttrr.html
American Psychological Association. (2002). Ethical principles of psychologists and code of conduct.
Washington, DC: Author.
Anderson, N., & Goltsi, V. (2006). Negative psychological effects of selection methods: Construct formula-
tion and an empirical investigation into an assessment center. International Journal of Selection and
Assessment, 14, 236–255.
Bangert-Drowns, R. L., Kulik, C. L. C., Kulik, J. A., & Morgan, M. T. (1991). The instructional effect of feed-
back in test-like events. Review of Educational Research, 61, 213–238.
Bauer, T. N., Maertz, C. P., Jr., Dolen, M. R., & Campion, M. A. (1998). Longitudinal assessment of appli-
cant reactions to employment testing and test outcome feedback. Journal of Applied Psychology, 83,
892–903.
Bauer, T. N., Truxillo, D. M., Sanchez, R. J., Craig, J. M., Ferrara, P., & Campion, M. A. (2001). Applicant
reactions to selection: Development of the selection procedural justice scale. Personnel Psychology, 54,
387–418.
Brockner, J., & Wiesenfeld, B. M. (1996). An integrative framework for explaining reactions to decision:
Interactive effects of outcomes and procedures. Psychological Bulletin, 120, 189–208.
Chan, D., & Schmitt, N. (2004). An agenda for future research on applicant reactions to selection procedures:
A construct-oriented approach. International Journal of Selection and Assessment, 12, 9–23.
DeNisi, A. S., & Kluger, A. N. (2000). Feedback effectiveness: Can 360-degree appraisals be improved?
Academy of Management Executive, 14, 129–139.
Derous, E., Born, M. P., & DeWitte, K. (2004). How applicants want and expect to be treated: Applicants’
selection treatment beliefs and the development of the social process questionnaire on selection.
International Journal of Selection and Assessment, 12, 99–119.
Drasgow, F., Olson, J. B., Keenan, P. A., Moberg, P., & Mead, A. D. (1993). Computerized assessment. Research
in Personnel and Human Resources Management, 11, 163–206.
Ferdig, R. E., & Mishra, P. (2004). Emotional responses to computers: Experiences in unfairness, anger, and
spite. Journal of Educational Multimedia and Hypermedia, 13, 143–161.
Fletcher, C. (1991). Candidates’ reactions to assessment centres and their outcomes: A longitudinal study.
Journal of Occupational Psychology, 64, 117–127.
Folger, R., & Greenberg, J. (1985). Procedural justice: An interpretive analysis of personnel systems. Research
in Personnel and Human Resources Management, 3, 141–183.
Gilliland, S. W. (1993). The perceived fairness of selection systems: An organizational justice perspective.
Academy of Management Review, 18, 694–734.
Gilliland, S. W. (1994). Effects of procedural and distributive justice on reactions to a selection system. Journal
of Applied Psychology, 79, 691–701.
Gilliland, S. W., & Chan, D. (2001). Justice in organizations: Theory, methods, and applications. In
N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.), International Handbook of Work and
Organizational Psychology (pp. 143–165). London, England: Sage.
Gilliland, S. W., Groth, M., Baker, R. C., Dew, A. F., Polly, L. M., & Langdon, J. C. (2001). Improving
applicants’ reactions to rejection letters: An application of fairness theory. Personnel Psychology,
54, 669–703.
Harris, M. M., & Schaubroeck, J. (1988). A meta-analysis of self-manager, self-peer, and peer-manager ratings.
Personnel Psychology, 41, 43–62.
Hausknecht, J. P., Day, D. V., & Thomas, S. C. (2004). Applicant reactions to selection procedures: An updated
model and meta-analysis. Personnel Psychology, 57, 639–683.
Herriot, P. (2004). Social identities and applicant reactions. International Journal of Selection and Assessment,
12, 75–83.
Hough, L. M., & Oswald, F. L. (2000). Personnel selection: Looking toward the future—Remembering the
past. Annual Review of Psychology, 51, 631–664.
International Task Force on Assessment Center Guidelines (1989). Guidelines and ethical considerations for
assessment center operations. Public Personnel Management, 18, 457–470.
International Task Force on Assessment Center Guidelines (2000). Guidelines and ethical considerations for
assessment center operations. Bridgeville, PA: Development Dimensions International.
Kluger, A. N., & Rothstein, H. R. (1993). The influence of selection test type on applicant reactions to employment
testing. Journal of Business and Psychology, 8, 3–25.
London, M. (2003). Job feedback: Giving, seeking, and using feedback for performance improvement (2nd ed.).
Mahwah, NJ: Lawrence Erlbaum.
Macan, T. H., Avedon, M. J., Paese, M., & Smith, D. E. (1994). The effects of applicants’ reactions to cognitive
ability tests and an assessment center. Personnel Psychology, 47, 715–738.
Maertz, C. P., Jr., Bauer, T. N., Mosley, D. C., Jr., Posthuma, R. A., & Campion, M. A. (2004). Do procedural
justice perceptions in a selection testing context predict applicant attraction and intention toward the
organization? Journal of Applied Social Psychology, 34, 125–145.
Maertz, C. P., Jr., Bauer, T. N., Mosley, D. C., Jr., Posthuma, R. A., & Campion, M. A. (2005). Predictors of
self-efficacy for cognitive ability employment testing. Journal of Business Research, 58, 160–167.
Marcus, B. (2003). Attitudes towards personnel selection methods: A partial replication and extension in a
German sample. Applied Psychology: An International Review, 52, 515–532.
McFarland, C., & Ross, M. (1982). Impact of causal attributions on affective reactions to success and failure.
Journal of Personality and Social Psychology, 43, 937–946.
Mishra, P. (2006). Affective feedback from computers and its effect on perceived ability and affect: A test
of the computers as social actor hypothesis. Journal of Educational Multimedia and Hypermedia, 15,
107–131.
Nicol, D. J., & Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and
seven principles of good feedback practice. Studies in Higher Education, 31, 199–218.
Ployhart, R. E., & Harold, C. M. (2004). The applicant attribution-reaction theory (AART): An integrative
theory of applicant attributional processing. International Journal of Selection and Assessment, 12,
84–98.
Ployhart, R. E., Ryan, A. M., & Bennett, M. (1999). Explanations for selection decisions: Applicants’ reactions to
informational and sensitivity features of explanations. Journal of Applied Psychology, 84, 87–106.
Pope, K. S. (1992). Responsibilities in providing psychological test feedback to clients. Psychological
Assessment: Clinical & Forensic, 4, 268–271.
Ryan, A. M., Brutus, S., Greguras, G. J., & Hakel, M. D. (2000). Receptivity to assessment-based feedback for
management development. Journal of Management Development, 19, 252–276.
Ryan, A. M., & Ployhart, R. E. (2000). Applicants’ perceptions of selection procedures and decisions: A critical
review and agenda for the future. Journal of Management, 26, 565–606.
Sackett, P. R., & Lievens, F. (2008). Personnel selection. Annual Review of Psychology, 59, 419–450.
Schinkel, S., van Dierendonck, D., & Anderson, N. (2004). The impact of selection encounters on applicants:
An experimental study into feedback effects after a negative selection decision. International Journal of
Selection and Assessment, 12, 197–205.
Schleicher, D. J., Venkataramani, V., Morgeson, F. P., & Campion, M. A. (2006). So you didn’t get the job …
now what do you think? Examining opportunity-to-perform fairness perceptions. Personnel Psychology,
59, 559–590.
Schuler, H. (1993). Social validity of selection situations: A concept and some empirical results. In H. Schuler, J. L. Farr, & M. Smith (Eds.), Personnel selection and assessment: Individual and organizational perspectives (pp. 11–26). Hillsdale, NJ: Lawrence Erlbaum.
Smither, J. W., Reilly, R. R., Millsap, R. E., Pearlman, K., & Stoffey, R. W. (1993). Applicant reactions to selection
procedures. Personnel Psychology, 46, 49–76.
Spychalski, A. C., Quiñones, M. A., Gaugler, B. B., & Pohley, K. (1997). A survey of assessment center practices
in organizations in the United States. Personnel Psychology, 50, 71–90.
Thibaut, J., & Walker, L. (1975). Procedural justice: A psychological analysis. Hillsdale, NJ: Lawrence
Erlbaum.
Thorsteinson, T. J., & Ryan, A. M. (1997). The effect of selection ratio on perceptions of the fairness of a selec-
tion test battery. International Journal of Selection and Assessment, 5, 159–168.
Truxillo, D. M., Bauer, T. N., Campion, M. A., & Paronto, M. E. (2002). Selection fairness information and appli-
cant reactions: A longitudinal field study. Journal of Applied Psychology, 87, 1020–1031.
Truxillo, D. M., Steiner, D. D., & Gilliland, S. (2004). The importance of organizational justice in person-
nel selection: Defining when selection fairness really matters. International Journal of Selection and
Assessment, 12, 39–53.
Van Vianen, A. E. M., Taris, R., Scholten, E., & Schinkel, S. (2004). Perceived fairness in personnel selection:
Determinants and outcomes in different stages of the assessment procedure. International Journal of
Selection and Assessment, 12, 149–159.
Watts, S. A. (2007). Evaluative feedback perspectives on media effects. Journal of Computer-Mediated
Communication, 12, 384–411.
Wiechmann, D., & Ryan, A. M. (2003). Reactions to computerized testing in selection contexts. International
Journal of Selection and Assessment, 11, 215–229.
Part 5
Criterion Constructs in Employee Selection
Kevin R. Murphy and Elaine D. Pulakos, Section Editors
21 The Measurement of Task Performance as Criteria in Selection Research

Walter C. Borman, Rebecca H. Bryant, and Jay Dorio

This chapter is about measuring task performance (i.e., the technical proficiency part of job per-
formance) in personnel selection research. When the validity of selection tests and procedures is evaluated, the accuracy of the resulting validity estimates depends in turn on the accuracy of criterion performance measurement. Accordingly, there is considerable motivation in selection research to
obtain reliable and accurate criterion scores for job incumbents participating in the research. Our
chapter covers the task performance criterion “space.” Chapter 22, this volume, describes citizen-
ship performance criteria. Specific topics covered in this chapter are (a) relatively objective criterion
measures, such as work samples, job knowledge tests, and production rates; (b) subjective mea-
sures (i.e., ratings of performance), including different rating formats and rater training strategies;
(c) dimensionality of job performance; and (d) validity estimates against task performance for sev-
eral predictor constructs (e.g., ability and personality).

OBJECTIVE CRITERIA
At first glance, we might assume that objective criterion measures should be preferred over subjec-
tive ratings of performance. However, “objective” may be at least in part a misnomer in that judg-
ment often enters into the use of objective criteria. Also, objective measures are notoriously deficient
as criteria because they usually tap into only a small part of the total criterion space. Contamination can also be a serious problem with objective measures. For example, factors beyond the con-
trol of the job incumbent can influence objective criterion measures. Nonetheless, when they are
relevant to important performance requirements and are reasonably reliable and uncontaminated
(or when corrections can be made to reduce contamination), objective measures can be useful for
measuring some criterion dimensions. In other words, the deficiency issue may not be a problem
with objective criteria if we measure well the task performance part of job performance and have
other criterion measures evaluating performance in other aspects of the job.

PRODUCTION RATES
For jobs that have observable, countable products that result from individual performance (e.g.,
military recruiters or patrol officers who are assigned traffic enforcement duties), a production rate
criterion is a compelling bottom-line index of performance. However, as often noted (e.g., Borman,
1991; Guion, 1965), considerable care must be taken in gathering and interpreting production data.
For example, work-related dependencies on other employees or on equipment for determining
production rates may create bias in these rates. Also, production standards and quota systems (e.g.,
in call center jobs) can create problems for criterion measurement.
Instability of production rates is another potential problem. Rothe’s (1978) extensive research
on production workers showed that week-to-week production rates are only moderately reliable.
Correlations between successive weeks’ production average .75 with incentives and .53 without
incentives (Rothe, 1978). Longer periods for data collection may be necessary to ensure stable
criterion production rates. Most importantly, researchers attempting to derive production criteria
should pay special attention to possible contaminating influences whereby employees have unequal
opportunities to produce at the same rate.
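To make the point about longer data-collection windows concrete, the Spearman-Brown prophecy formula can be used to project the reliability of a production criterion averaged over several weeks from the week-to-week correlations Rothe reported. The short sketch below is purely illustrative; only the .75 and .53 values come from the text, and the choice of aggregation periods is arbitrary.

def spearman_brown(weekly_r, k):
    # Projected reliability of a production criterion averaged over k weeks,
    # given the average week-to-week correlation (Spearman-Brown prophecy formula)
    return (k * weekly_r) / (1 + (k - 1) * weekly_r)

# Week-to-week correlations reported by Rothe (1978); the aggregation periods are arbitrary
for label, r in (("with incentives", 0.75), ("without incentives", 0.53)):
    for weeks in (1, 4, 8):
        print(f"{label}: {weeks} week(s) -> projected reliability = {spearman_brown(r, weeks):.2f}")

Under these assumptions, averaging even four weeks of production pushes the projected reliability above .80 in both conditions, which is consistent with the recommendation to lengthen the data-collection period.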

SALES
Initially, sales jobs may seem ideally suited for the use of objective criteria as performance mea-
sures. Number of sales per unit time, or some similar index of bottom-line sales volume, appears
compelling as global, overall performance measures. However, upon closer inspection, significant
criterion contamination issues are evident for objective sales criteria.
First, summary sales volume measures are a function both of individual skill and effort and of environmental factors beyond the control of the salesperson. In the context of the Campbell, Dunnette,
Lawler, and Weick (1970) behavior-performance-effectiveness model, objective sales volume is an
effectiveness measure. Where environmental influences are substantial and unequal in their effect
on sales, criterion measurement will be contaminated.
One way to remove contamination is to adjust sales data for factors such as market potential (e.g.,
Kanfer & Borman, 1987). A practical strategy for making these adjustments is to create norms for
stores, sales territories, or for whatever organizational unit provides the appropriate comparison.
Then criterion scores for each salesperson can be compared to scores for other salespersons with
roughly the same selling-related environment and thus similar opportunities to produce sales.
Unfortunately, an inherent problem with this approach has to do with the norming process itself. For
example, if large sales territories with many salespersons are used to accomplish the norming, there may be
meaningful differences within territories with respect to opportunity to perform. If smaller territories are
used, then the norms tend to be unstable because the mean sales performance comparison indices are based
on too few salespersons. Thus, how one does the adjusting may be as important as whether or not to adjust.
As with most other objective performance measures, sales criteria suffer from problems of defi-
ciency in that global measures of sales volume will often fail to tap important parts of the job. For
example, identifying new customers and maintaining good relations with existing customers are
important aspects of sales but would not be directly indexed by objective sales measures.

WORK SAMPLES
Work sample or performance tests are sometimes developed to provide criteria for selection research.
Some argue that work sample tests have the highest fidelity for measuring criterion performance. In
a sense, the argument is compelling: What could be fairer than to assess employees’ performance
on a job by having them actually perform some of the most important tasks associated with it? Yet,
evaluation of work samples as criteria is not quite so simple, and their use involves several issues.
One issue in test scoring is whether to evaluate the products or the process of work sample performance. In general, tasks associated with products (e.g., troubleshooting a problem with a radio)
can be oriented toward either product or process; tasks with no resulting products (e.g., interviewing
a job candidate) must be scored according to process considerations. An advantage to scoring prod-
ucts over process is that assessment is typically more objective. However, if the procedures taken to
arrive at the product are also important, process assessment is clearly necessary.
Several other issues are relevant to scoring work samples. Difficult-to-score process steps
are to be avoided. For example, checking and inspecting steps are difficult, if not impossible, to
observe. Ill-defined steps and complex steps where an employee can do well on one part of the step
but poorly on another should also be avoided.
Still another issue with scoring work samples concerns the relative merits of pass-fail marks versus performance-level ratings on test steps. Guion (1978) argued for test step performance ratings because they provide more information. Indeed, many steps seem amenable to a continuous performance scale where such judgments as “more skillful,” “faster,” and “more efficient” may have meaning for
evaluating performance. For certain very simple task steps, pass-fail may suffice, but it will usually
be desirable to develop continuous performance scales for use in work sample testing.
A major issue with work sample tests is that researchers may treat them as ultimate criteria; that
is, these tests are sometimes considered the criterion of choice for accurately assessing performance
in certain jobs, especially those that require complex motor skills. Work samples should not be
thought of in this light. First, they are clearly maximum performance rather than typical perfor-
mance measures. As such, they tap the “can-do” more than the “will-do” performance-over-time
aspects of effectiveness. Yet will-do longer-term performance is certainly important for assessing
effectiveness in jobs. Accordingly, these measures are deficient when used exclusively in measur-
ing performance. In sum, inherent shortcomings of work samples for measuring some aspects of
performance, as well as practical limitations such as time and equipment constraints, argue against
relying on such tests to provide a comprehensive index of overall performance.

JOB KNOWLEDGE TESTS


Another category of criterion measures is the job knowledge test. Once the target tasks are identi-
fied, items can be prepared—typically in a multiple-choice format, although other kinds of items
such as the essay type are of course possible. Just as in writing any other multiple-choice items, care
should be taken to ensure that the item stems and response alternatives are clearly stated and that
distractor responses are definitely wrong but plausible.
An issue in job knowledge test development is when the paper-and-pencil knowledge test medium is appropriate for evaluating job performance. When a task is procedural, requiring primarily
knowledge about steps to complete it, and not complex motor skills for performing each step, a job
knowledge format seems clearly to be as appropriate as a work sample format. Tasks requiring cer-
tain skills and operations are probably not amenable to job knowledge testing. Such tasks include (a)
those that require finely tuned acts of physical coordination (e.g., a police marksmanship task), (b)
those that require quick reaction (e.g., typing a letter under time pressure), and (c) those that require
complex time-sharing psychomotor performance (e.g., aircraft cockpit simulator tasks).

SUBJECTIVE CRITERIA
Subjective criteria will typically be synonymous with performance ratings. The notion of supervisors
or peers providing numerical scores for employees on job-relevant performance areas is an interest-
ing idea. Ideally, it provides well-informed observers with a means of quantifying their perceptions
of individuals’ job performance. This is preferable to verbal descriptions of performances because
individuals can now be compared in a reasonably straightforward way. The notion can be viewed as
analogous to developing structured job analysis questionnaires to take the place of verbal job descrip-
tions for purposes of comparing jobs (McCormick, 1976). In each case, quantification of perceptions
clears the way for scientific study of an area that could not be previously studied in this manner.
The emphasis in this section will be on ratings gathered for research purposes only, as criteria in selection research applications. Although ratings can be generated for purposes of salary administration,
promotion decisions, or employee feedback and development, and although performance appraisal
systems to address these management functions are extremely important to individual and orga-
nizational effectiveness (e.g., DeVries, Morrison, Shullman, & Gerlach, 1981), they are not very
relevant to personnel selection research.
Performance ratings are indeed the most often used criterion measure in industrial and organi-
zational psychology. Landy and Farr (1980) referred to several surveys intended to assess how fre-
quently ratings are used as criterion measures in research reports. The percentages reach 75% and
higher, suggesting that considerable attention should be paid to this criterion measurement method.
Issues in using ratings as performance criteria include (a) design of the rating form to be used and
(b) type of training to provide to raters.

RATING FORMATS
Behaviorally Anchored Rating Scales
Smith and Kendall (1963) extended the notion of critical incidents (Flanagan, 1954) by designing
a rating format they referred to as behavioral expectation scales, now generally labeled behavior-
ally anchored rating scales (BARS). Smith and Kendall reasoned that different effectiveness lev-
els on job performance rating scales might be anchored using behavioral examples of incumbent performance. Accordingly, they developed performance rating dimensions, with scaled behavioral examples anchoring the appropriate effectiveness levels on the dimensions (see Table 21.1 for examples of all of the formats discussed here).

TABLE 21.1
Sample Rating Formats

Graphic Rating Scale
Administrative skills: Planning ahead; organizing time efficiently; completing paperwork accurately and on time; keeping track of appointments; not wasting time.
Numerical scale from 7 (Effective) down to 1 (Ineffective), with no behavioral anchors.

Behaviorally Anchored Rating Scale
Organizational skills: A good constructional order of material slides smoothly from one topic to another; design of course optimizes interest; students can easily follow organizational strategy; course outline followed.
Numerical scale from 10 (high) down to 1 (low), with behavioral anchors at selected points, for example:
Near the top of the scale: This instructor could be expected to assimilate the previous lecture into the present one before beginning the lecture.
Upper middle of the scale: This instructor could be expected to announce at the end of each lecture the material that will be covered during the next class period.
Lower middle of the scale: This instructor could be expected to be sidetracked at least once a week in lecture and not cover the intended material.
Near the bottom of the scale: This instructor could be expected to lecture a good deal of the time about subjects other than the subject she or he is supposed to lecture on.

Behavior Summary Scale
Establishing and maintaining good relationships in the community: Contacting and working effectively with high school counselors, newspaper editors, radio and TV personnel, and others capable of helping recruiters to enlist prospects; building a good reputation for the Navy by developing positive relationships with parents and family of prospects; presenting a good Navy image in the community.
9 or 10—Extremely effective performance: Is exceptionally adept at cultivating and maintaining excellent relationships with school counselors, teachers, principals, police, news media persons, and other persons who are important for getting referrals and free advertising; is innovative in informing the public about the Navy; actively promotes the Navy and makes friends for the Navy while doing it; always distributes the most current Navy information; volunteers off-duty time to work on community projects, celebrations, parades, etc.
6, 7, or 8—Effective performance: Spends productive time with individuals such as police, city government, or school officials; may lunch with them, and/or distribute calendars, appointment books, buttons, etc., to them, and/or invite them for cocktails; arranges for interested persons such Navy activities as trips to the Naval Academy; keeps relevant persons informed of Navy activities; encourages principals, counselors, and other persons important to a prospect to call if they have any questions about the Navy.
3, 4, or 5—Marginal performance: Contacts school officials only sporadically; keeps them waiting for information they want; relationships with counselors, teachers, etc., and persons important to an applicant or recruit are distant and undeveloped; is not alert to opportunities to promote the Navy; rarely volunteers off-duty time to promote the Navy and is unenthusiastic when approached to do something for the community; rarely accepts speaking invitations; is, at times, discourteous to persons in the community (for example, sends form letters to persons who assisted him or other Navy recruiters); is not always alert to the family’s desire for more information about the Navy and the program in which their son or daughter enlisted.
1 or 2—Ineffective performance: Does not contact high school counselors; does not accept speaking engagements; drives around in car instead of getting out and meeting people; alienates persons in community or persons important to an applicant or recruit by ignoring them, not answering their questions, responding rudely, demanding information, or encouraging high school students to drop out of school; sometimes does not appear at recruiting presentations for which he or she is scheduled; presents negative image of the Navy by doing things like driving while intoxicated or speeding and honking impatiently at other drivers; may express dislike for the Navy or recruiting.

Behavior Observation Scale
Reviews previous work performance. Each statement is rated on a frequency scale from 1 (Almost never) to 12 (Almost always), and the item ratings are summed to a total score.
1. Communicates mistakes in job activities to subordinates
2. Praises subordinates for good work behavior
3. Discusses hindrances in completing projects
4. Inspects quality of output materials
5. Reviews inventory of necessary parts and equipment
Note: Statements 4–10 in the Behavioral Observation Scale were omitted because of space restrictions.

Source note: Graphic rating scale from Berk, R. A. (Ed.), Performance assessment: Methods and applications, The Johns Hopkins University Press, Baltimore, 1986; behaviorally anchored rating scale from Bernardin, H. J., & Beatty, R. W., Performance appraisal: Assessing human behavior at work, Kent, Boston, 1984; behavior observation scale from Gatewood, R. D., & Field, H. S., Human resource selection (5th ed.), Harcourt College, Fort Worth, TX, 2001.
Essentially, the rater’s task is to compare observed job behaviors of the ratee with the behavioral
anchors on the scale to assign a rating on that dimension. This was seen as preferable to evaluating
a ratee without guidance regarding the effectiveness levels of different scale points. The BARS idea
is more than a format; it is a system, or even a philosophy (Bernardin & Smith, 1981). For example,
ideally raters should record examples of employee work behavior in preparation for assigning per-
formance ratings.
Another positive feature of BARS is that users of the system typically participate in scale devel-
opment, enhancing the credibility of the format. Further, from a domain sampling perspective,
BARS development steps provide an excellent methodology to aid in identifying all important
dimensions for a job.

Behavior Summary Scales


In response to a difficulty some raters have had with BARS, that of matching observed ratee performance with the often very specific, low-base-rate behaviors serving as anchors on the scale, Borman
(1979) developed the behavior summary scales (BSS) format. With this format, behavioral inci-
dents are first generated targeting a wide range of levels of effectiveness on each dimension, as
with BARS. Second, the incidents are retranslated according to dimension membership and level
of effectiveness, also as is done with BARS. Finally, the content of all incidents reliably retrans-
lated into the high-, mid-, and low-effectiveness levels, respectively, is summarized, resulting in the
summary scale anchors. These summary anchors sometimes represent four or five effectiveness levels, but the main point is that, rather than following the BARS practice of using individual incidents as scale anchors, BSS uses summary anchors that capture the behavioral content of several individual incidents at each level of effectiveness for each dimension.
Regarding raters’ use of the BSS, the most important potential advantage is that the behavioral
construct underlying each aspect of job performance is made more evident to the rater. Raters do
not need to infer the dimensionality from a series of highly specific incidents. The inferential step is
accomplished in scale development, in which the behavioral essence from several specific incidents
is distilled in each behavior summary statement.
Accordingly, this approach should increase the probability that raters can match observed ratee
behavior directly with scaled behavior. That is, by increasing the scope of behavior representing
various performance levels on a scale, chances are greater that one of the anchors will accurately
describe a ratee’s performance on that dimension.
This argument makes good conceptual sense, but in the one format comparison study pitting
BARS against a BSS format, there were no consistent differences between these format types with
respect to psychometric error or accuracy (Borman, 1979). Thus, the seeming conceptual advantage
of BSS may not make any difference in the actual use of the scale.

Behavior Observation Scales


Latham and Wexley (1981) developed the behavior observation scales (BOS) format with favor-
ably worded behavioral statements that the rater responds to by indicating how frequently the ratee
exhibits each of these behaviors. They provided a list of advantages of BOS, including (a) BOS are
developed from a systematic job analysis, (b) the content of the explicit behavioral items provides
an excellent listing of the job’s performance requirements in concrete behavioral terms, and (c) item
analysis and factor analytic procedures can be more readily applied to BOS ratings than to BARS
or BSS data. To these should be added that BOS items appear to cut down on the complexity of
inferences necessary to make a rating, although a study by Murphy, Martin, and Garcia (1982) cast
some doubt on this point.

Computerized Adaptive Rating Scales


Each of these behavior-based rating format ideas had appealing features. However, the following
question arose: Does format make a difference relative to rating errors or the reliability and validity
of the ratings generated by raters using the different formats? Not all of the relevant format com-
parison studies have been conducted, but the studies that have been completed generally show small
differences between formats in terms of level of rater errors, reliability, validity, or accuracy. For
example, early reviews of format comparison studies (Landy & Farr, 1983; Schwab, Heneman, &
DeCotiis, 1975) concluded that the psychometric properties of the BARS format are probably not
much better than the psychometric properties of graphic rating scales (GRS, or scales with numeri-
cal rather than behavioral anchors). Borman (1979) found only small differences in halo, interrater
reliability, and validity for BARS, the BSS, and a graphic rating format. Landy and Farr (1980) went
so far as to estimate that the variance accounted for in psychometric quality by rating format was
as little as 4%. In fact, they called for a “moratorium” on rating format research, citing the largely
negative results.
For the next 20 years, Landy and Farr’s suggestion was followed for the most part (Farr & Levy,
2007). And yet, it still seems compelling to explore rating format ideas that might result in more
reliable and valid judgments about work performance. Small adjustments made to present formats
are unlikely to result in higher reliabilities and validities; however, it still seems important to experi-
ment with formats fundamentally different from those currently used in hopes of developing a
format more in alignment with raters’ cognitive processes or that somehow calibrates raters’ percep-
tions to help them make more precise judgments about observed performance.
One possible idea in the direction of a different rating measurement method started with consid-
eration of Thurstone’s (1927) law of comparative judgment in the context of the performance rating
process. Thurstone developed a method for scaling stimuli on the basis of paired-comparison judg-
ments. Arguably, his approach places stimuli on an interval scale. In the context of rating job per-
formance, Borman et al. (2001a) reasoned that pairs of behavioral statements might be presented to
the rater with instructions to pick the statement that is more descriptive of the ratee. If interval-scale
judgments of ratee performance levels can be achieved with this method, the paired-comparison
judgments may provide ratings that are more precise than those generated by other rating formats
that use a linear numerical scale, which arguably provide only ordinal-level measurement. Another
idea that might make the paired-comparison format even more effective is to apply an item response
theory (IRT) adaptive testing orientation to the method. For example, the rater could be presented
with a series of behavioral statement pairs such that responses to each successive pair provide a
more precise estimate of ratee performance.
Accordingly, our notion for computerized adaptive rating scales (CARS) was to develop a paired-
comparison rating task that used adaptive testing principles to help raters estimate a ratee’s performance
level through an iterative paired-comparison rating process. The idea was to initially present two behav-
ioral statements associated with a dimension—one reflecting somewhat below average performance
and the other reflecting somewhat above average performance. Depending on which statement the rater
indicated was more descriptive of the ratee, the rating algorithm, developed subsequently by Stark and
Drasgow (2002), selected two additional behavioral statements—one with a scaled effectiveness level
somewhat above the effectiveness value of the statement picked first as the more descriptive, and the
other with a scaled effectiveness level somewhat below the effectiveness value of that initially chosen
statement. The rater’s selection of the more descriptive statement for the second paired comparison
then revised the initial estimated ratee effectiveness level, and, as before, the algorithm selected two
more statements with effectiveness values bracketing the revised estimated performance level. Thus,
analogous to adaptive testing, a ratee’s “true” effectiveness level was to be estimated in an IRT sense
by this iterative paired-comparison rating task that presents in sequence item pairs that maximize the
amount of information about performance derived from each choice of an item.
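The sketch below illustrates the general iterative logic just described. It is not the Stark and Drasgow (2002) algorithm: the statement pool, its scaled effectiveness values, the bracketing offsets, and the simple averaging update are all invented for illustration, whereas the actual procedure selects each successive item pair to maximize IRT information about the ratee.

# Hypothetical pool of behavioral statements for one dimension, each paired with
# a scaled effectiveness value on a 1-7 continuum (values invented for illustration)
pool = [(f"statement_{i}", v) for i, v in enumerate(
    [1.2, 1.8, 2.4, 3.0, 3.4, 3.8, 4.2, 4.6, 5.0, 5.6, 6.2, 6.8])]

def closest(target, exclude):
    # Unused statement whose scaled effectiveness value is closest to the target
    return min((s for s in pool if s[0] not in exclude), key=lambda s: abs(s[1] - target))

def cars_rating(pick_more_descriptive, n_pairs=3, start=4.0, offset=1.0):
    # Iteratively estimate a ratee's effectiveness level from paired comparisons.
    # pick_more_descriptive(stmt_a, stmt_b) plays the role of the rater and returns
    # whichever statement is more descriptive of the ratee.
    estimate, used = start, set()
    for _ in range(n_pairs):
        low = closest(estimate - offset, used)
        high = closest(estimate + offset, used | {low[0]})
        used.update({low[0], high[0]})
        chosen = pick_more_descriptive(low, high)
        estimate = (estimate + chosen[1]) / 2   # move toward the chosen statement
        offset *= 0.7                           # tighten the bracket each round
    return estimate

# Simulated rater whose ratee's "true" effectiveness is 5.3 on the 1-7 continuum
simulated_rater = lambda a, b: min((a, b), key=lambda s: abs(s[1] - 5.3))
print(round(cars_rating(simulated_rater), 2))

Run with the simulated rater shown, the procedure moves the estimate from the scale midpoint toward the ratee's programmed effectiveness level after only a few paired comparisons.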
In a laboratory study to evaluate selected psychometric properties of CARS compared to two
other formats, videotapes of six office workers were prepared depicting prescripted levels of per-
formance on three dimensions, and subjects rated these vignettes using the CARS format and one
of the other competing formats (graphic or behaviorally anchored rating scales). Results showed
23–37% lower standard errors of measurement for the CARS format. In addition, validity was sig-
nificantly higher for the CARS format (d = .18). Accordingly, in a laboratory study, CARS showed
promising results (Borman et al., 2001a).
One last point about formats—although different format ideas may not make very large differ-
ences related to psychometric properties, well-articulated performance standards for communicating
expectations and providing feedback in operational performance management systems can be quite
useful. Thus, BARS and BSS in particular can serve this important purpose in organizations.

RATER TRAINING
Rater training provides a promising approach to improving the quality of performance ratings.
Two general kinds of training programs have emerged to help raters generate more error-free and
accurate ratings (Bernardin & Buckley, 1981; Smith, 1986; Woehr & Huffcutt, 1994). Rater error
training seeks simply to alert raters to certain psychometric or perceptual errors such as leniency/
severity, halo, restriction-in-range, and similar-to-me effects. Training often takes the form of a
brief lecture on or demonstration of each error and training to avoid such errors when making per-
formance ratings (Bernardin & Buckley, 1981).
Frame-of-reference training (Bernardin & Pence, 1980) attempts to convey to raters that per-
formance is multidimensional and to thoroughly familiarize them with the actual content of each
performance dimension. Regarding familiarization, examples of different levels of performance on individual dimensions are typically reviewed with raters, along with the “correct” or actual perfor-
mance levels the examples reflect (e.g., Pulakos, 1984). Practice and feedback for trainees typically
rating videotaped performances are important components of this type of training.
Researchers have conducted studies comparing the psychometric properties and accuracy of
ratings made by raters trained using one of the approaches just discussed and ratings generated by
untrained raters. Results suggest the following conclusions: (a) error training is usually successful
in reducing the target psychometric error (Pulakos, 1984), (b) error training does not improve the
quality of ratings when interrater reliability or accuracy is used as a criterion (e.g., Borman, 1979),
and (c) frame-of-reference training increases rating accuracy (Noonan & Sulsky, 2001; Woehr &
Huffcutt, 1994).
A useful observation was offered by Bernardin and Pence (1980): Rater error training is suc-
cessful in reducing the target psychometric response set or error (e.g., halo), but essentially new
response sets are imposed on raters (e.g., to eliminate halo, spread out your ratings across dimen-
sions), resulting in no change in accuracy or a reduction in it. Similarly, Borman (1979) suggested
that to direct persons to adjust their rating distributions in some manner is relatively easy for train-
ing to accomplish; it is much more difficult to train raters to be more accurate. Frame-of-reference
training appears to be the best bet to attain this worthwhile goal.

DIMENSIONALITY OF JOB PERFORMANCE


Almost no one doubts that job performance is a multidimensional construct (Campbell, 1990b;
Ghiselli, 1956). To identify these multiple categories of performance for a job, industrial-organi-
zational (I-O) psychologists will typically use task analysis from which clusters of tasks may be
derived to define the performance dimensions (McCormick, Jeanneret, & Mecham, 1972) or criti-
cal incidents analysis (Flanagan, 1954) that can also result in a set of performance dimensions for
a job. With these approaches to identifying performance categories, the dimension sets are likely
to be different across target jobs. At one level, this is how it should be. Jobs are often different.
However, there is considerable motivation to identify a set of dimensions that represent the perfor-
mance requirements in common across jobs. Over the past 15 years or so, at least six attempts have
been made to develop such dimension sets. In this section of the chapter, we review the methodolo-
gies used in these efforts and present the dimension sets. Then, we summarize all of the dimension
content and review similarities and differences between dimension sets.

HUNT (1996)
Hunt’s (1996) intention was to develop a dimension set reflecting the important behavioral performance requirements of entry-level jobs. Using a critical incidents approach, Hunt
(1996) derived an eight-dimension taxonomy of generic work behaviors focusing on non-job-specific
aspects of performance. In this investigation, Hunt used factor analysis to empirically derive dimen-
sions of generic work behaviors from supervisor ratings of employee behaviors. However, contrary
to the typical approach in which a single job family or single organization is used, Hunt obtained
supervisory ratings for nearly 19,000 employees in 52 different job settings across 36 different
companies. Because of the nature of these data (i.e., each data set included a slightly different com-
bination of the behaviors assessed), multiple factor analyses of the dimension structure could be
conducted. First, a sample of data sets was subjected to factor analysis, resulting in several initial
dimension structures. Similarities across these initial taxonomies were then cross-validated through
the use of additional factor analyses (using hold-out data sets) and subject matter expert (SME) rat-
ings of agreement.
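The hold-out logic can be illustrated with a small simulation. The sketch below is loosely in the spirit of that cross-validation step and is not a reconstruction of Hunt's analyses: the rating data are synthetic, a two-factor structure is assumed, and scikit-learn's FactorAnalysis with varimax rotation (available in scikit-learn 0.24 or later) together with Tucker's congruence coefficient are illustrative choices only.

import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic supervisor ratings with a known two-factor structure: items 1-6
# load on one behavioral factor and items 7-12 on another (all values invented)
rng = np.random.default_rng(1)
n = 1000
factors = rng.normal(size=(n, 2))
loadings = np.zeros((12, 2))
loadings[:6, 0] = 0.7
loadings[6:, 1] = 0.7
ratings = factors @ loadings.T + rng.normal(scale=0.5, size=(n, 12))

# Derivation half and hold-out half, factor analyzed separately
derive, holdout = ratings[:500], ratings[500:]
fa = lambda X: FactorAnalysis(n_components=2, rotation="varimax",
                              random_state=0).fit(X).components_.T
L_derive, L_holdout = fa(derive), fa(holdout)

# Tucker's congruence coefficient between a derivation factor and its
# best-matching hold-out factor (factor order and sign may differ across halves)
def congruence(a, b):
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

for k in range(2):
    phi = max(abs(congruence(L_derive[:, k], L_holdout[:, m])) for m in range(2))
    print(f"Derivation factor {k + 1}: congruence with hold-out solution = {phi:.2f}")

High congruence between the two solutions is the kind of evidence that supports treating a dimension structure as stable rather than sample specific.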
These analyses resulted in an eight-dimension taxonomy of generic work behavior, including
task and citizenship behaviors. Specifically, the dimension structure consisted of two higher-order
dimensions and eight second-order dimensions. The higher-order dimension of required perfor-
mance behaviors included those behaviors required of an employee for continued employment,
including the second-order dimensions of attendance, off-task behavior (i.e., effort expended toward
non-job-related activities while at work; e.g., goofing off), and employee deviance (a combination
of unruliness, theft, and drug misuse). The second higher-order dimension, organizational citizenship behaviors, comprises the second-order dimensions of schedule flexibility and work ethic
(a combination of industriousness and thoroughness). Although Hunt identified a ninth specific
dimension, adherence to confrontational rules, this factor was posited to be primarily relevant to
cash register work, so Hunt omitted it from his model of generic work behavior. In addition to
dimensions generated from the supervisory ratings, Hunt also identified four dimensions of generic
work behavior through a review of the literature, including teamwork, problem solving, safety, and
personal appearance.

VISWESVARAN (1993)
Viswesvaran (1993) built upon the lexical hypothesis of Galton (Goldberg, 1993) to develop a taxon-
omy of general job performance. In this investigation, Viswesvaran compiled and sorted measures
of job performance from the literature into summary categories, resulting in 25 conceptually dis-
tinct dimensions. Next, correlations were obtained from studies utilizing these dimensions. These
correlations were used in a meta-analysis to determine the true score correlations between the
dimensions. Finally, factor analysis was used to analyze these true score correlations and to derive
a set of ten job performance categories.
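The "true score correlation" step amounts to the familiar correction for attenuation, in which an observed correlation between two performance dimensions is divided by the square root of the product of the two measures' reliabilities. The numbers in the sketch below are hypothetical and are not taken from Viswesvaran's meta-analysis.

from math import sqrt

def true_score_r(observed_r, rel_x, rel_y):
    # Correct an observed correlation for unreliability in both measures
    return observed_r / sqrt(rel_x * rel_y)

# Hypothetical observed correlation between two rated dimensions and the
# reliabilities of the two sets of ratings (not values from the actual study)
print(round(true_score_r(observed_r=0.40, rel_x=0.60, rel_y=0.55), 2))

Applied across all pairs of dimensions, corrections of this sort produce the true score correlation matrix that is then factor analyzed.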
The dimensions identified in this investigation were intended to summarize overall job perfor-
mance. Dimensions identified by Viswesvaran (1993) included interpersonal competence, adminis-
trative competence, quality, productivity, effort, job knowledge, leadership, compliance/acceptance
of authority, communications competence, and an overall job performance dimension.

BORMAN AND BRUSH (1993)


In a third approach, focusing on managerial performance, Borman and Brush (1993) inductively
derived an 18-dimension taxonomy of performance categories from existing dimension sets taken
from empirical studies of managerial performance. In this project, 26 existing sets of manage-
rial job performance dimensions were gathered from published and unpublished empirical stud-
ies, resulting in a total of 187 independent dimensions. These dimensions were then sorted into
categories by 25 SMEs on the basis of the similarity of the content of each dimension. Utilizing a
method described by Rosenberg and Sedlak (1972), we first computed for each pair of dimensions
the proportion of SMEs who sorted both behavioral dimensions into the same category. Thus, if
15 of the 25 SMEs sorted two dimensions into the same category, the cell in the matrix for that
pair of dimensions was assigned a .60. This computation was accomplished for all pairs of the 187
dimensions. Then, an indirect similarity index was derived for each dimension pair by computing
the correlation of these proportions for each of the two dimensions and the 185 other dimensions.
This index is referred to as an indirect similarity index because it indicates for any dimension pair
the degree of correspondence between each of these two dimensions’ patterns of similarity with all
other dimensions (Borman & Brush, 1993).
To clarify, when one dimension’s pattern of similarities with other dimensions corresponds
closely to a second dimension’s pattern of similarities with these same other dimensions, then the
indirect similarity correlation between these two dimensions will be high. When this correspon-
dence between two dimensions’ similarities with the other dimensions is lower, then the indirect
similarity correlation will be lower. In this manner, correlations between all pairs of the 187 dimen-
sions were generated. Finally, factor analysis was used to analyze the resulting pooled indirect simi-
larity correlations and to derive a set of managerial job performance “megadimensions.”
Borman and Brush (1993) identified 18 megadimensions of managerial job performance divided
into four groupings. The first grouping, interpersonal dealing and communication, consists of those
megadimensions involving communication skills, maintaining good interpersonal relationships at
work, representing the organization to others, and selling/influencing behaviors. Second, the leader-
ship and supervision grouping includes those megadimensions related to guiding, directing, moti-
vating, training, coaching, developing, and coordinating subordinates, as well as providing feedback
as needed. Third, technical activities and the “mechanics of management” involve megadimensions
pertaining to the technical proficiency required for a job, but also those related to managerial tasks
such as planning, organizing, decision-making, staffing, monitoring, and delegating. Finally, the
last grouping, useful personal behavior and skills, consists of megadimensions involving persis-
tence, handling stress, and organizational commitment. This system, at the 18 “megadimension”
level, is at a relatively high level of specificity, especially because it covers only management jobs.

BORMAN, ACKERMAN, AND KUBISIAK (1994)


Borman, Ackerman, and Kubisiak (1994) incorporated elements of personal construct theory in
developing a 12-dimension taxonomy of performance dimensions arguably relevant to all nonman-
agement jobs in the U.S. economy. Briefly, personal construct theory posits that, on the basis of
their experiences over time, individuals develop categories or dimensions that they use to interpret
and make judgments about events or objects, especially other people. Personal construct theorists
believe that these categories represent the natural way that people think about their world, again,
especially regarding other people (e.g., Adams-Webber, 1979). The Repertory Grid protocol has
provided a method for individuals to generate their personal constructs by contrasting different role
persons (e.g., mother, best friend). In this application, we were asking supervisor participants to gen-
erate their personal constructs related to job performance, what have been referred to as personal
work constructs, or “folk theories” of performance (Borman, 1987).
In particular, 81 supervisors, representing many different types of jobs and industries (e.g., sales,
manufacturing, service sector) generated the names of several effective workers they had worked
with and several relatively ineffective workers. The supervisor sample was instructed to select certain
pairs of effective and ineffective employees and generate a performance dimension that differentiated
the two employees. Sample members prepared a dimension label and a definition of the dimension.
The supervisors generated a total of 730 dimensions and definitions. Because there was overlap
in the content across these dimensions, we reduced the number to 176 reasonably nonredundant
dimensions and definitions, and similar to the Borman and Brush (1993) research, 12 I-O psycholo-
gists sorted these dimensions into categories according to similarity in the performance areas rep-
resented. The Borman and Brush (1993) procedure for generating a proportion matrix for every pair
of dimensions and subsequently an indirect similarity correlation matrix (176 × 176) was executed
in this research, as well. Finally, a factor analysis of this matrix revealed a highly interpretable
12-factor solution. The resulting dimension set might be organized hierarchically, similar to Borman
and Brush (1993).
First, a grouping of interpersonal and communication dimensions was evident, consisting of the
dimensions of communication and cooperation. Next, technical activities related to the job were
represented in the dimensions of job knowledge, task proficiency, productivity, and judgment and
problem solving. Finally, useful personal behavior and skills included the dimensions of depend-
ability, integrity and professionalism, initiative, adaptability, organization, and safety.

CAMPBELL, MCCLOY, OPPLER, AND SAGER (1993)


Campbell, McCloy, Oppler, and Sager (1993) posited eight latent performance categories that sum-
marize the performance requirements of all jobs. The notion was that not every job has all eight
dimensions as performance requirements, but that for any single job, a subset of these factors (or
all eight) are sufficient for describing its performance requirements. Several of these constructs
emerged in factor analyses of the performance measures administered in the Project A research (a
large-scale selection and classification study conducted in the U.S. Army; Campbell, 1990a) across
the many jobs studied in that program. As examples, for first-tour soldiers in 19 different jobs, tech-
nical proficiency, personal discipline, and effort were consistently represented in factor analyses of
performance criterion measures. For second-tour noncommissioned officer jobs, a leadership factor
was added to the mix of factors emerging. Accordingly, the Campbell et al. (1993) taxonomy has
been largely confirmed for several categories using data that are highly appropriate for testing its
generality across a wide variety of supervisory and nonsupervisory jobs.
Thus, the Campbell et al. (1993) dimension system includes these eight dimensions: (a) job-
specific technical proficiency, (b) non-job-specific technical proficiency, (c) written and oral com-
munication, (d) demonstrating effort, (e) maintaining personal discipline, (f) facilitating peer and
team performance, (g) supervision/leadership, and (h) management/administration. Parts or all of
Dimensions 1, 2, 4–7, and 8 were confirmed in Project A using multiple methods to measure job
performance, including hands-on performance tests; job knowledge tests; supervisor and peer rat-
ings on multiple dimensions of performance; administrative measures such as disciplinary cases,
awards and commendations, etc.; and a supervisory situational judgment test (SJT) criterion mea-
sure. Importantly, all of these criterion measures were developed and then administered to super-
visory and nonsupervisory people. The wide variety of criterion measures used to evaluate job
performance constructs ensured the criterion space was comprehensively reflected in the coverage
of the performance domain. Accordingly, the system seems quite generalizable across different
types of jobs and supervisory and nonsupervisory jobs.

PETERSON, MUMFORD, BORMAN, JEANNERET, AND FLEISHMAN (1999)


O*NET™’s generalized work activities (GWAs; Borman, Jeanneret, Kubisiak, & Hanson, 1996)
provide a broad-level overview of job behaviors that are applicable to a wide range of jobs. The
GWA framework contains 42 lower-order dimensions that have been summarized into four ”high-
est” order dimensions. Information Input describes those GWAs that focus on how and where infor-
mation is acquired as part of the job, including looking for, receiving, identifying, and evaluating
job-related information. Mental Processes summarizes those GWAs that involve information and
data processing, as well as reasoning and decision-making. Work Output describes physical activi-
ties that get performed on the job, including manual activities and complex and technical activities
requiring coordinated movements. Finally, Interacting With Others summarizes GWAs that involve
interactions with others or supervisory functions including communicating, interacting, coordinat-
ing, developing, managing, advising, and administering.
Between the 4 and 42 levels of GWA dimensions, there is a nine-dimension system that we focus
on here. This system is supported by factor analytic studies of the Position Analysis Questionnaire
(PAQ) and other job analysis instruments for nonsupervisory jobs, and the Borman and Brush (1993)
behavioral dimensions (in turn derived in part from other factor analyses involving management jobs)
for supervisory jobs. The nine-dimension system includes (a) looking for and receiving job-related
information, (b) identifying and evaluating job-related information, (c) information/data processing,
(d) reasoning/decision-making, (e) performing physical and manual work activities, (f) performing
complex/technical activities, (g) communicating/interacting, (h) coordinating/developing/manag-
ing/advising others, and (i) administering (see Chapter 41, this volume, for more on O*NET).

INTEGRATING THE JOB PERFORMANCE DIMENSION TAXONOMIES


Clearly, important similarities exist across the dimension taxonomies discussed that allow an inte-
gration of the dimension systems, but also point to “outlier” dimensions in some of the taxonomies
that are worth noting. Table 21.2 presents a crosswalk of the six dimensional systems, indicating the commonalities and differences across the systems, along with a summary label for each row that reflects the common content where it is evident. The rows of Table 21.2 are ordered such that the first row represents the most commonality across systems, the second row the next most commonality, and so on.

TABLE 21.2
Summary of Six Performance Taxonomies

Communicating and interacting
• Hunt (1996): Teamwork
• Viswesvaran (1993): Communication competence; Interpersonal competence
• Borman & Brush (1993): Communicating effectively and keeping others informed; Maintaining good working relationships; Selling/influencing; Representing the organization to customers and the public
• Borman, Ackerman, & Kubisiak (1994): Communication; Cooperation
• Campbell, McCloy, Oppler, & Sager (1993): Communication; Facilitating peer and team performance
• Peterson, Mumford, Borman, Jeanneret, & Fleishman (1999): Communicating and interacting

Productivity and proficiency
• Hunt (1996): Thoroughness
• Viswesvaran (1993): Productivity; Job knowledge; Quality; Effort
• Borman & Brush (1993): Technical proficiency
• Borman, Ackerman, & Kubisiak (1994): Effort and productivity; Job knowledge; Task proficiency
• Campbell, McCloy, Oppler, & Sager (1993): Job-specific technical proficiency; Non-job-specific technical proficiency
• Peterson, Mumford, Borman, Jeanneret, & Fleishman (1999): Performing complex and technical activities; Performing physical and manual work activities

Useful personal qualities
• Hunt (1996): Industriousness; Adherence to confrontational rules; Safety; Schedule flexibility
• Viswesvaran (1993): Compliance/acceptance of authority
• Borman & Brush (1993): Persisting to reach goals; Handling crises and stress
• Borman, Ackerman, & Kubisiak (1994): Initiative; Adaptability; Safety; Dependability; Integrity and professionalism
• Campbell, McCloy, Oppler, & Sager (1993): Demonstrating effort; Maintaining personal discipline

Problem solving
• Hunt (1996): Problem solving
• Borman & Brush (1993): Decision-making/problem solving
• Borman, Ackerman, & Kubisiak (1994): Judgment and problem solving
• Peterson, Mumford, Borman, Jeanneret, & Fleishman (1999): Reasoning and decision-making

Organizing and planning
• Viswesvaran (1993): Administrative competence
• Borman & Brush (1993): Administration and paperwork; Planning and organizing; Monitoring and controlling resources; Staffing
• Borman, Ackerman, & Kubisiak (1994): Organization
• Peterson, Mumford, Borman, Jeanneret, & Fleishman (1999): Administering

Leadership and supervision
• Viswesvaran (1993): Leadership
• Borman & Brush (1993): Coordinating subordinates and other resources to get the job done; Guiding, directing, and motivating subordinates and providing feedback; Training, coaching, and developing subordinates; Delegating
• Campbell, McCloy, Oppler, & Sager (1993): Supervision/leadership; Management/administration
• Peterson, Mumford, Borman, Jeanneret, & Fleishman (1999): Coordinating, developing, managing, and advising

Information processing
• Borman & Brush (1993): Collecting and interpreting data
• Peterson, Mumford, Borman, Jeanneret, & Fleishman (1999): Information and data processing; Identifying and evaluating job-relevant information; Looking for and receiving job-related information

Counterproductive work behaviors
• Hunt (1996): Off-task behavior; Unruliness; Attendance; Drug misuse; Theft
All six dimension sets have content involving communicating and interacting with others,
although Hunt (1996) has only a teamwork dimension, so communicating is not explicitly represented in his framework. In Viswesvaran (1993), Borman and Brush (1993), and Borman et al.
(1994), communicating and the interpersonal component are represented separately; in Peterson
et al. (1999), at the nine-dimension level, the two constructs are combined in a single dimension.
Productivity and proficiency are likewise reflected in all six dimension sets, although the config-
uration of performance dimension content for this construct is somewhat different across the dimen-
sion sets. For example, Viswesvaran (1993) has four of his nine dimensions related to this construct (productivity, job knowledge, quality, and effort); Borman et al. (1994) have three of their 12 dimensions (effort and productivity, job knowledge, and task proficiency) in this category; and Peterson et al.
(1999) divided the construct into performing complex/technical and physical/manual activities.
The third summary construct, useful personal qualities and skills, is more heterogeneous, but
five of the six dimension sets are represented in some fashion. The content varies from Hunt’s (1996)
industriousness and adherence to rules to Borman and Brush’s (1993) persisting to reach goals and
handling crises, to Borman et al.’s (1994) initiative, adaptability, and safety and Campbell et al.’s
(1993) personal discipline and effort (this effort dimension is defined less like productivity and
more like a personal quality compared with the other two effort dimensions). Viswesvaran’s (1993)
compliance dimension is similar to Hunt’s adherence to rules.
Problem solving draws on content from four of the six dimension sets. Three of these four (Borman
& Brush, 1993; Borman et al., 1994; and Peterson et al., 1999) include elements of decision-making
in addition to problem solving; Hunt’s (1996) system defined problem solving more narrowly.
The fifth construct, organizing and planning, also has representation by four of the dimension
sets. Because this construct can be seen as in part management-oriented, it is not surprising that
Borman and Brush’s (1993) managerial taxonomy has several dimensions in this category (i.e.,
administration and paperwork, planning and organizing, monitoring and controlling resources,
and staffing). Viswesvaran (1993), Borman et al. (1994), and Peterson et al. (1999) have a single
administering or organizing dimension. Finally, Campbell et al.’s (1993) management/administra-
tion dimension is broader than organizing and planning but does contain elements relevant to this
construct.
The sixth summary construct is leadership and supervision and is also represented in four of the
dimension sets. Again, as might be expected, the Borman and Brush (1993) managerial taxonomy
has multiple dimensions in this category (coordinating subordinates; guiding, directing, and moti-
vating subordinates; training, coaching, and developing subordinates; and a delegating dimension).
Campbell et al. (1993) have two leadership-related dimensions (supervision/leadership and at least
part of management/administration). We should note that the Hunt (1996) and Borman et al. (1994)
taxonomies were intended for entry-level and nonmanagement jobs, respectively, and thus would
not be expected to contain supervisory or managerial dimensions.
A seventh construct, information processing, had representation from only two systems: (a) infor-
mation and data processing, identifying and evaluating job-relevant information, and looking for
and receiving job-related information from Peterson et al. (1999) and (b) collecting and interpreting
data from Borman and Brush (1993). Hunt’s (1996) dimension set had several dimensions that could
be classified as counterproductive work behaviors. These included off-task behavior, unruliness,
drug misuse, and theft.
We are not advocating our six (or seven or eight) summary construct system as a preferred
dimension set. All of the six dimension sets reviewed have important strengths. What we are advo-
cating, following Campbell et al. (1993) and Campbell, Gasser, and Oswald (1996), is that the field
move toward some performance taxonomy that can be used in personnel selection research to more
systematically study empirical links between individual differences and individual performance
constructs as represented by a job performance taxonomy. A science of personnel selection could
benefit greatly from research using a common set of performance constructs to map individual
difference (e.g., abilities, personality, vocational interests)–job performance relations (Borman, Hanson, & Hedge, 1997; Campbell et al., 1996). This approach moves us beyond studying correlations between individual differences and overall job performance. These efforts to conceptually differentiate criterion constructs and, most importantly, to measure them separately are prerequisites for systematically studying links between individual predictor and criterion variables in a selection context.

PREDICTORS OF TASK PERFORMANCE DIMENSIONS


Although a great deal of research has examined relationships between individual differences and
job performance, most studies suffer from an important limitation: conceptual or measurement
ambiguity of the criterion domain. Commonly referred to as “the criterion problem” (see Austin &
Villanova, 1992, for a review), it is often unclear which specific aspect of job performance is the
target criterion or, if the criterion is overall job performance, what the criterion measure is actually
tapping. For example, regarding the latter case, Podsakoff, MacKenzie, and Hui (1993) presented several theoretical reasons why, when supervisory ratings represent the criterion measure, supervisors may consider both task and citizenship performance when assessing an employee's overall job
performance. Empirically, the literature supports this proposition; across field and experimental
studies, researchers have found that citizenship behaviors account for as much or sometimes more
variability in overall performance evaluations compared to task performance (e.g., Borman, White,
& Dorsey, 1995; Motowidlo & Van Scotter, 1994; Werner, 1994; see Podsakoff, MacKenzie, Paine,
& Bachrach, 2000, for a review). Because the criteria in selection research are often conceptually
ambiguous, specifying the relationship between predictor variables and task performance is more
difficult than one might expect.
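To make concrete what it means for citizenship behaviors to "account for variability" in overall evaluations, the sketch below regresses simulated overall ratings on task and citizenship ratings and compares their standardized weights. This is a minimal illustration with invented data and arbitrary effect sizes, not a reproduction of any cited study's design or analysis (those studies used their own methods, such as usefulness analysis or relative weights).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300  # hypothetical sample of ratees

# Simulated ratee scores: overall ratings depend on both task and
# citizenship performance (the weights here are arbitrary illustrations)
task = rng.normal(size=n)
citizenship = 0.4 * task + rng.normal(scale=0.9, size=n)
overall = 0.5 * task + 0.5 * citizenship + rng.normal(scale=0.7, size=n)

# Standardize the variables and estimate regression weights via least squares
X = np.column_stack([task, citizenship])
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = (overall - overall.mean()) / overall.std()
betas, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["task", "citizenship"], betas.round(2))))
```

In a regression of this kind, a citizenship weight comparable to (or larger than) the task weight is the pattern reported in the studies cited above.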
The necessity of differentiating between task performance and citizenship performance is high-
lighted by the empirical finding that they have different antecedents. Specifically, researchers have
hypothesized that cognitive ability is more likely to predict task performance, whereas person-
ality variables are more likely to predict citizenship performance (Borman & Motowidlo, 1993;
Motowidlo, Borman, & Schmit, 1997). The few studies that have explicitly tested these relationships
offer some support that personality correlates more highly with citizenship performance than with
task performance (Borman, Penner, Allen, & Motowidlo, 2001b; Motowidlo & Van Scotter, 1994).
Thus, when examining the relationship between predictors and task performance, it is essential to
identify studies that clearly operationalize performance as such.
Although few studies clearly distinguish between task and citizenship performance, Project A
represents an important exception. Also known as the U.S. Army’s Selection and Classification
Project, Project A was a large-scale test validation effort conducted by the Army Research Institute
(ARI) and three research firms (Campbell, 1990a; Campbell & Zook, 1990; see also Chapter 40,
this volume). The 7-year effort included both cross-sectional and longitudinal research, and data
were collected from thousands of participants across a wide range of military occupational spe-
cialties (MOS). In addition to its large sample size, Project A offers several other advantages. For
example, the predictor variables were chosen with careful consideration of the predictor construct
space, the target population of jobs for which the predictors would be used, and the specific criterion
constructs identified for these jobs. On the basis of a literature review and several types of expert
judgments, various predictors were developed. Twenty-four predictor composite scores were col-
lected for each participant, representing six major constructs: general cognitive ability, spatial abil-
ity, perceptual/psychomotor ability, temperament, vocational interests, and job reward preferences.
Thus, can-do (i.e., ability) and will-do (i.e., personality) factors were well represented.
Similarly, and as mentioned, development and measurement of the criteria reflected a great deal
of research, time, and effort. Criteria were carefully developed based on a literature review, the
critical incidents technique, and a clear explication of the task domain. Performance was measured
using multiple indices, including hands-on job sample tests, multiple-choice knowledge tests, and supervisor/peer ratings of performance on behavior summary scales (BSS). Army-wide and MOS-specific scales were devel-
oped, and administrative/archival records were also examined. On the basis of exploratory and
confirmatory factor analytic results, five dimensions of performance were specified: core technical
proficiency, general soldiering proficiency, effort and leadership, personal discipline, and physical
fitness and military bearing. The first two factors—core technical proficiency and general soldier-
ing proficiency—clearly represent task performance constructs; substantial loadings were evident
for hands-on performance tests, the job knowledge tests, and supervisor/peer ratings on some of the
technical performance dimensions. Thus, this section focuses on these two performance dimen-
sions. By differentiating between task and citizenship aspects of performance, measuring repre-
sentative and carefully developed predictor variables, and utilizing a large sample size, Project A
provides relatively accurate point estimates of predictor construct-task performance relationships.
The remainder of the chapter will summarize the correlations obtained in the concurrent valida-
tion study of Project A (Campbell & Zook, 1990). Mean validities are based on data from 4,039
incumbents in nine diverse MOS, including infantryman, cannon crewmember, armor crewman,
single channel radio operator, light wheel vehicle mechanic, motor transport operator, administra-
tive specialist, medical specialist, and military police. Validity estimates were corrected for range
restriction and criterion unreliability. Despite the exemplary aspects of Project A, one might ques-
tion the generalizability of the results, given that an Army sample was used. Thus, for each type of
predictor, relevant research conducted with other samples is also discussed.
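For readers less familiar with these psychometric adjustments, the sketch below shows the standard textbook corrections for criterion unreliability (disattenuation) and direct range restriction (Thorndike's Case II). It is a simplified illustration only; the specific artifact values, procedures, and order of corrections actually used in Project A are documented in Campbell and Zook (1990) and may differ from this sketch, and the example numbers are hypothetical.

```python
import math

def corrected_validity(r_obs, criterion_reliability, u):
    """Apply two standard psychometric corrections to an observed validity.

    r_obs                 -- observed predictor-criterion correlation
    criterion_reliability -- reliability of the criterion measure (r_yy)
    u                     -- ratio of restricted to unrestricted predictor SDs (s/S)

    Illustrative sequence only (disattenuation, then Thorndike Case II);
    not the exact Project A procedure.
    """
    # Correct for unreliability in the criterion (disattenuation)
    r = r_obs / math.sqrt(criterion_reliability)
    # Correct for direct range restriction on the predictor (Thorndike Case II)
    big_u = 1.0 / u  # unrestricted-to-restricted SD ratio
    return (big_u * r) / math.sqrt(1.0 + (big_u ** 2 - 1.0) * r ** 2)

# Hypothetical values: observed r = .40, criterion reliability = .70,
# restricted/unrestricted SD ratio = .70
print(round(corrected_validity(0.40, 0.70, 0.70), 2))
```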

GENERAL COGNITIVE ABILITY


A large body of literature has examined the link between general cognitive ability and job perfor-
mance, and findings indicate that it is one of the most robust predictors of performance (Schmidt
& Hunter, 1998; see also Chapter 12, this volume). In Project A, general cognitive ability was
measured with the Armed Services Vocational Aptitude Battery (ASVAB), in which nine subtests
combined to form four composite scores: technical, quantitative, verbal, and speed. Similar to other
research in the selection field, the Project A relationships between general cognitive ability and task
performance dimensions were strong, with a mean validity of .63 between cognitive ability and core
technical proficiency and .65 between cognitive ability and general soldiering proficiency.
A substantial body of research has also examined the relationship between cognitive ability
and job performance using nonmilitary samples. Literally thousands of studies have investigated
this research question, finding strong correlations between general cognitive ability and job perfor-
mance across various jobs, companies, and criteria (e.g., Hunter, 1986; Hunter & Schmidt, 1996;
Schmidt & Hunter, 1981). Although research conducted on civilian populations report high valid-
ity coefficients between job performance and cognitive ability, they are not quite as high as those
reported in Project A. For example, Hunter and Hunter (1984) summarized the results of 515 vali-
dation studies conducted by the Department of Labor, with more than 32,000 employees in 512
diverse civilian jobs (Hunter, 1980). On the basis of this large-scale meta-analysis, Hunter and
Hunter reported validities of .40, .51, and .58 between general cognitive ability and job proficiency
for low-, medium-, and high-complexity jobs, respectively. The .51 estimate for medium complexity
jobs was cited in Schmidt and Hunter's (1998) seminal article and is frequently referenced in the
literature as a point estimate for the relationship between cognitive ability and job performance.
More recently, researchers have conducted meta-analyses on the basis of studies conducted in dif-
ferent countries (e.g., the United Kingdom and Germany), reporting relationships of similar magni-
tude (Bertua, Anderson, & Salgado, 2005; Hülsheger, Maier, & Stumpp, 2007).
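As a point of reference for how such meta-analytic mean validities are typically formed, the sketch below computes a bare-bones sample-size-weighted mean correlation across studies. The study values are hypothetical, and the sketch omits the artifact corrections and variance decomposition of a full psychometric (Hunter–Schmidt style) meta-analysis.

```python
def weighted_mean_r(correlations, sample_sizes):
    """Sample-size-weighted mean correlation across studies.

    A bare-bones building block of psychometric meta-analysis; real analyses
    also correct for artifacts (unreliability, range restriction) and examine
    the variability of correlations across studies.
    """
    total_n = sum(sample_sizes)
    return sum(r * n for r, n in zip(correlations, sample_sizes)) / total_n

# Hypothetical validity coefficients and sample sizes from five studies
rs = [0.35, 0.48, 0.52, 0.41, 0.55]
ns = [120, 340, 95, 210, 400]
print(round(weighted_mean_r(rs, ns), 2))
```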
Although it is unclear why the Project A validities between general cognitive ability and job
performance are somewhat higher than those reported elsewhere, one possibility is the different
conceptualizations of performance across the various studies. As previously mentioned, Borman
and Motowidlo (1993) suggested that the relationship between cognitive ability and performance is
likely to be stronger for task versus citizenship performance. Thus, because the Project A validity
estimates against the task proficiency criteria represent task performance alone, whereas the validity
estimates reported in other meta-analyses represent validity against overall performance, it follows
that the Project A validities might be higher. In support of this prediction, the relationships between general cognitive ability and the other three performance factors examined in Project A (which are more aptly described as indices of citizenship performance) were significantly lower. Specifically,
general cognitive ability correlated .25 with effort and leadership, .16 with personal discipline, and
.20 with physical fitness and military bearing. Although this explanation is somewhat speculative,
one conclusion is clear: General cognitive ability is a robust predictor of task performance.

SPATIAL AND PERCEPTUAL/PSYCHOMOTOR ABILITY


In addition to general cognitive ability, Project A examined the relationship between spatial and
perceptual/psychomotor and task performance. Spatial ability was measured with the Spatial Test
Battery, comprised of six paper-and-pencil tests. The six tests—assembling objects, object rotation,
mazes, orientation, map, and figural reasoning—were combined to form an overall composite score.
Perceptual/psychomotor ability was assessed in a computerized battery of 20 tests, which formed
six composite scores: psychomotor, complex perceptual speed, complex perceptual accuracy, num-
ber speed and accuracy, simple reaction speed, and simple reaction accuracy. Sample tests include
target identification, cannon shoot, and target tracking.
Although lower than with general cognitive ability, the relationships between spatial and per-
ceptual/psychomotor ability and task performance were high. The correlations with core techni-
cal proficiency and general soldiering proficiency were .56 and .63 for spatial ability and .53 and
.57 for perceptual/psychomotor ability. These mean validities are substantially higher than those
reported in other studies, in which task and citizenship performance were not explicitly distin-
guished. Several meta-analytic studies have examined these relationships, focusing on such specific
industries as aviation (Hunter & Burke, 1994; Martinussen, 1996) and craft jobs in the utility field
(Levine, Spector, Menon, Narayanan, & Cannon-Bowers, 1996). Across 68 studies on pilot selec-
tion, Hunter and Burke (1994) reported mean validities of .19 for spatial ability, .32 for gross dexter-
ity, .10 for fine dexterity, and .20 for perceptual speed, correcting for sampling error only. Similarly,
Martinussen (1996) reported a mean relationship of .20 between psychomotor/information process-
ing and pilot performance. Martinussen’s meta-analysis was based on 50 studies conducted in 11
countries. Finally, Levine et al. (1996) conducted a meta-analysis of 80 studies that sampled craft
jobs in the utility industry across six job families. The weighted average of correlation coefficients
was .20 between perceptual ability (which included spatial ability tests) and overall performance
and .21 between psychomotor ability and performance.
Thus, although Project A reported mean relationships of .53 to .63 between spatial and percep-
tual/psychomotor ability and task performance, meta-analytic studies examining overall perfor-
mance report validities closer to .20. Similar to the findings for general cognitive ability, the Project
A relationships between these specific abilities and the citizenship dimensions of performance were
much lower, ranging from .10 to .25 for spatial ability, and .11 to .26 for perceptual/psychomotor
ability.
Another noteworthy research question with regard to the predictive validity of spatial, percep-
tual, and psychomotor abilities is whether such specific aptitudes account for variance over and
above general cognitive ability. Although multiple aptitude theory suggests that the prediction of
performance should be optimized by differentially weighting aptitudes depending on the job under
investigation, researchers have repeatedly found that this approach does not add predictive validity
(e.g., Hunter, 1986; Jensen, 1986; Olea & Ree, 1994; Ree & Earles, 1992; Ree, Earles, & Teachout,
1994; and Schmidt, Ones, & Hunter, 1992), with few exceptions (e.g., Mount, Oh, & Burns, 2008).
On the contrary, researchers have suggested that specific aptitude tests are primarily measuring
general cognitive ability and that this overlap is one potential explanation for the moderate relation-
ships between spatial, perceptual, and psychomotor abilities and performance criteria. Additionally,
in comparison to general cognitive ability, the validity coefficients for specific aptitudes tend to be
lower and much more variable (Schmidt, 2002), highlighting the utility of using general cognitive
ability rather than specific aptitudes in selection research.

PERSONALITY
Over the past 2 decades, research on the utility of personality in the selection field has received a
great deal of attention (see Chapter 14, this volume). Although early estimates of the relationship
between personality and performance were quite low, more recent results have been more optimis-
tic. Personality researchers generally credit the advent of a well-accepted taxonomy (i.e., the Big
Five) and the increased use of validity generalization techniques (e.g., Barrick & Mount, 1991) for
the recent positive findings.
Although Project A did not utilize the Five-Factor Model in the measurement of personality, it
did find moderate correlations between personality and task performance. Using the Assessment of Background and Life Experiences (ABLE), soldiers completed 11 scales (emotional stability, self-esteem,
cooperativeness, conscientiousness, nondelinquency, traditional values, work orientation, internal
control, energy level, dominance, and physical condition). Seven of the scales combined to form
four composite scores: adjustment, dependability, achievement orientation, and physical condition.
Overall, the mean validities for personality were .25 for both dimensions of task performance. This
is similar to the relationship of .27 reported by Barrick, Mount, and Judge (2001) between conscien-
tiousness and overall job performance in their meta-analysis of 15 earlier meta-analyses, including
such seminal articles as Barrick and Mount (1991) and Tett, Jackson, and Rothstein (1991). Project
A also examined relationships between personality dimensions and performance on the other three
more citizenship oriented criteria, reporting mean correlations of .33, .32, and .37, respectively, for
effort and leadership, personal discipline, and physical fitness and military bearing.
Hurtz and Donovan (2000) and Borman et al. (2001b) conducted other meta-analyses worthy
of mention. Hurtz and Donovan partitioned the criterion domain into three dimensions: task per-
formance, job dedication, and interpersonal facilitation. Their findings indicate the following rela-
tionships between the Big Five and task performance: .15 for conscientiousness, .13 for emotional
stability, -.01 for openness to experience, .07 for agreeableness, and .06 for extraversion. Furthermore,
they found that partitioning the criterion domain into various dimensions had minimal impact on
the magnitude of the validity coefficients. This contrasts with the results of Project A, which found
higher mean validities for the citizenship performance dimensions (.32 to .37) compared to the task
performance dimensions (.25). Similarly, in a review of 20 studies, Borman et al. (2001b) reported
that the relationship between conscientiousness and citizenship performance was higher than the
relationship between conscientiousness and task performance. Despite such findings, researchers
tend to agree that dispositional factors other than conscientiousness are not strongly related to citi-
zenship performance after controlling for common method variance (Podsakoff et al., 2000). Thus,
despite Borman and Motowidlo’s (1993) proposition that personality variables are more likely to
relate to contextual performance than task performance, researchers examining this question have
reported mixed findings.

VOCATIONAL INTERESTS
Another set of predictors investigated in Project A was vocational interests. On the basis of Holland’s
(1966) Basic Interests Constructs, as well as six different organizational climate scales, Project A
researchers developed the Army Vocational Interest Career Examination (AVOICE). Twenty-two
scales make up AVOICE, forming six composite scores: skilled technical, structural/machines,
combat-related, audiovisual arts, food service, and protective services. Across the six composites,
vocational interests related to core technical proficiency (.35) and general soldiering proficiency
(.34). Conversely, Schmidt and Hunter (1998), citing Holland (1986), commented that there is gener-
ally no relationship between interests and job performance. Although they considered this a some-
what surprising finding, they hypothesized that interests may affect one’s choice of jobs, but once
the job is selected, interests do not affect performance.
More recently, Morris (2003) conducted a meta-analysis of 93 studies, reporting a mean cor-
rected correlation of .29 between vocational interests and job performance. Interestingly, larger
effect sizes were observed when studies used task performance as the criterion. The reason for this
finding is unclear, but it does mirror the Project A results. Specifically, the correlations between
vocational interests and performance were higher for task performance dimensions (.34 to .35)
compared with citizenship performance dimensions (.12 to .24).
Overall, Project A research supports quite strong relationships between general cognitive ability,
spatial ability, and perceptual/psychomotor ability and task performance. Importantly, these cor-
relations are higher with task performance than when the criterion is overall performance, the cri-
terion almost always used in meta-analyses of these predictors’ validities against job performance.
The explanation we offered for this finding is the consistent trend in the literature of combining
task and citizenship performance into one overall factor, thus reducing the validities of these pre-
dictors. Relations for personality and vocational interests with task performance are more modest
but still far from trivial (mid .20s for personality and mid .30s for vocational interests). Thus, the
largest mean validities were observed for the maximum performance can-do factors (i.e., general cognitive ability,
spatial ability, and perceptual/psychomotor ability), whereas the typical performance or will-do
factors (i.e., personality and vocational interests) exhibited more modest validities against task
performance.
Because of the careful explication of the predictor and criterion variables, as well as the large
sample size, we can be confident that Project A provides reasonable point estimates of the relation-
ships between these constructs. Although the generalizability of the results is questionable, meta-
analyses examining more diverse samples also support the significance of these relationships, albeit
with generally smaller magnitudes.

REFERENCES
Adams-Webber, J. (1979). Intersubject agreement concerning relationships between the positive and negative
poles of constructs in repertory grid tests. British Journal of Medical Psychology, 52, 197–199.
Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917–1992. Journal of Applied Psychology, 77,
836–874.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and performance at the beginning of the new
millennium: What do we know and where do we go next? International Journal of Selection and Assessment, 9, 9–30.
Bernardin, H. J., & Beatty, R. W. (1984). Performance appraisal: Assessing human behavior at work. Boston,
MA: Kent.
Bernardin, H. J., & Buckley, M. R. (1981). Strategies in rater training. Academy of Management Review,
6, 205–212.
Bernardin, H. J., & Pence, E. C. (1980). Effects of rater training: Creating new response sets and decreasing
accuracy. Journal of Applied Psychology, 65, 60–66.
Bernardin, H. J., & Smith, P. C. (1981). A clarification of some issues regarding the development and use of
behaviorally anchored rating scales (BARS). Journal of Applied Psychology, 66, 458–463.
Bertua, C., Anderson, N., & Salgado, J. F. (2005). The predictive validity of cognitive ability tests: A UK meta-
analysis. Journal of Occupational and Organizational Psychology, 78, 387–409.
Borman, W. C. (1979). Format and training effects on rating accuracy and rater errors. Journal of Applied
Psychology, 64, 410–421.
Borman, W. C. (1987). Personal constructs, performance schemata, and “folk theories” of subordinate effective-
ness: Explorations in an Army officer sample. Organizational Behavior and Human Decision Processes,
40, 307–322.
Borman, W. C. (1991). Job behavior, performance, and effectiveness. In M. D. Dunnette & L. M. Hough (Eds.),
Handbook of industrial and organizational psychology (Vol. 2, pp. 271–326). Palo Alto, CA: Consulting
Psychologists Press.
Borman, W. C., Ackerman, L. D., & Kubisiak, U. C. (1994). Development of a performance rating program in
support of Department of Labor test validation research. Unpublished manuscript.
Borman, W. C., & Brush, D. H. (1993). More progress toward a taxonomy of managerial performance require-
ments. Human Performance, 6, 1–21.
Borman, W. C., Buck, D. E., Hanson, M. A., Motowidlo, S. J., Stark, S., & Drasgow, F. (2001a). An examina-
tion of the comparative reliability, validity, and accuracy of performance ratings made using computer-
ized adaptive rating scales. Journal of Applied Psychology, 86, 965–973.
Borman, W. C., Hanson, M. A., & Hedge, J. W. (1997). Personnel selection. In J. T. Spence, J. M. Darley,
& D. J. Foss (Eds.), Annual review of psychology (Vol. 48, pp. 299–337). Palo Alto, CA: Annual Reviews.
Borman, W. C., Jeanneret, P. R., Kubisiak, U. C., & Hanson, M. A. (1996). Generalized work activities:
Evidence for the reliability and validity of the measures. In N. G. Peterson, M. D. Mumford, W. C.
Borman, P. R. Jeanneret, & E. A. Fleishman (Eds.), O*NET final technical report. Salt Lake City, UT:
Utah Department of Employment Security.
Borman, W. C., & Motowidlo, S. J. (1993). Expanding the criterion domain to include elements of contextual
performance. In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 71–98).
San Francisco, CA: Jossey-Bass.
Borman, W. C., Penner, L. A., Allen, T. D., & Motowidlo, S. J. (2001b). Personality predictors of citizenship
performance. International Journal of Selection and Assessment, 9, 52–69.
Borman, W. C., White, L. A., & Dorsey, D. W. (1995). Effects of ratee task performance and interpersonal fac-
tors on supervisor and peer performance ratings. Journal of Applied Psychology, 80, 168–177.
Campbell, J. P. (1990a). An overview of the Army Selection and Classification Project (Project A). Personnel
Psychology, 43, 231–239.
Campbell, J. P. (1990b). Modeling the performance prediction problem in industrial and organizational psy-
chology. In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychol-
ogy (2nd ed., Vol. 1, pp. 687–732). Palo Alto, CA: Consulting Psychologists Press.
Campbell, J. P., Dunnette, M. D., Lawler, E. E., & Weick, K. E. (1970). Managerial behavior, performance, and
effectiveness. New York: McGraw Hill.
Campbell, J. P., Gasser, M. B., & Oswald, F. L. (1996). The substantive nature of performance variability. In
K. R. Murphy (Ed.), Individual differences and behavior in organizations. San Francisco, CA: Jossey-
Bass.
Campbell, J. P., McCloy, R. A., Oppler, S. H., & Sager, C. E. (1993). A theory of performance. In N. Schmitt &
W. C. Borman (Eds.), Personnel selection in organizations (pp. 35–70). San Francisco, CA: Jossey-Bass.
Campbell, J. P., & Zook, L. M. (Eds.). (1990). Improving the selection, classification, and utilization of Army
enlisted personnel: Final report on Project A (ARI Research Report 1597). Alexandria, VA: U.S. Army
Research Institute for the Behavioral and Social Sciences.
DeVries, D. L., Morrison, A. M., Shullman, S. L., & Gerlach, M. L. (1981). Performance appraisal on the line.
New York, NY: Wiley.
Farr, J. L., & Levy, P. E. (2007). Performance appraisal. In L. L. Koppes (Ed.), Historical perspectives in indus-
trial and organizational psychology (pp. 311–327). Mahwah, NJ: Lawrence Erlbaum.
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51, 327–358.
Gatewood, R. D., & Feild, H. S. (2001). Human resource selection (5th ed.). Fort Worth, TX: Harcourt College
Publishers.
Ghiselli, E. E. (1956). Dimensional problems of criteria. Journal of Applied Psychology, 40, 1–4.
Goldberg, L. R. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34.
Guion, R. M. (1965). Personnel testing. New York, NY: McGraw-Hill.
Guion, R. M. (1978). Principles of work sample testing: III. Construction and evaluation of work sample tests.
Alexandria, VA: U.S. Army Research Institute.
Holland, J. (1986). New directions for interest testing. In B. S. Plake & J. C. Witt (Eds.), The future of testing
(pp. 245–267). Hillsdale, NJ: Lawrence Erlbaum.
Holland, J. L. (1966). The psychology of vocational choice: A theory of personality types and model environ-
ments. Oxford, England: Blaisdell.
Hülsheger, U. R., Maier, G. W., & Stumpp, T. (2007). Validity of general mental ability for the prediction of
job performance and training success in Germany: A meta-analysis. International Journal of Selection
and Assessment, 15, 3–18.
Hunt, S. T. (1996). Generic work behavior: An investigation into the dimensions of entry-level, hourly job
performance. Personnel Psychology, 49, 51–83.
Hunter, D. R., & Burke, E. F. (1994). Predicting aircraft pilot training success: A meta-analysis of published
research. The International Journal of Aviation Psychology, 4, 297–313.
Hunter, J. E. (1980). Validity generalization for 12,000 jobs: An application of synthetic validity and validity
generalization to the General Aptitude Test Battery (GATB). Washington, DC: U.S. Department of Labor,
Employment Service.
Hunter, J. E. (1986). Cognitive ability, cognitive aptitudes, job knowledge, and job performance. Journal of
Vocational Behavior, 29, 340–362.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin, 96, 72–98.
Hunter, J. E., & Schmidt, F. L. (1996). Intelligence and job performance: Economic and social implications.
Psychology, Public Policy, and Law, 2, 447–472.
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of
Applied Psychology, 85, 869–879.
Jensen, A. R. (1986). g: Artifact or reality? Journal of Vocational Behavior, 29, 301–331.
Kanfer, R., & Borman, W. C. (1987). Predicting salesperson performance: A review of the literature (Research
Note 87-13). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Landy, F. J., & Farr, J. L. (1980). Performance rating. Psychological Bulletin, 87, 72–107.
Landy, F. J., & Farr, J. L. (1983). The measurement of work performance: Methods, theory and applications.
New York, NY: Academic Press.
Latham, G. P., & Wexley, K. N. (1981). Increasing productivity through performance appraisal. Reading, MA:
Addison-Wesley.
Levine, E. L., Spector, P. E., Menon, S., Narayanan, L., & Cannon-Bowers, J. (1996). Validity generalization
for cognitive, psychomotor, and perceptual tests for craft jobs in the utility industry. Human Performance,
9, 1–22.
Martinussen, M. (1996). Psychological measures as predictors of pilot performance: A meta-analysis. The
International Journal of Aviation Psychology, 6, 1–20.
McCormick, E. J. (1976). Job and task analysis. In M. D. Dunnette (Ed.), Handbook of industrial and organi-
zational psychology (pp. 651–696). Chicago, IL: Rand McNally.
McCormick, E. J., Jeanneret, P. R., & Mecham, R. C. (1972). A study of job characteristics and job dimensions
as based on the position analysis questionnaire (PAQ). Journal of Applied Psychology, 56, 347–368.
Morris, M. A. (2003). A meta-analytic investigation of vocational interest-based job fit, and its relationship to
job satisfaction, performance, and turnover. Unpublished doctoral dissertation. University of Houston,
Houston, TX.
Motowidlo, S. J., Borman, W. C., & Schmit, M. J. (1997). A theory of individual differences in task and contextual
performance. Human Performance, 10(2), 71–83.
Motowidlo, S. J., & Van Scotter, J. R. (1994). Evidence that task performance should be distinguished from
contextual performance. Journal of Applied Psychology, 79, 475–480.
Mount, M. K., Oh, I., & Burns, M. (2008). Incremental validity of perceptual speed and accuracy over general
mental ability. Personnel Psychology, 61, 113–139.
Murphy, K. R., Martin, C., & Garcia, M. (1982). Do behavioral observation scales measure observation?
Journal of Applied Psychology, 67, 562–567.
Noonan, L., & Sulsky, L. M. (2001). Examination of frame-of-reference and behavioral observation training on
alternative training effectiveness criteria in a Canadian military sample. Human Performance, 14, 3–26.
Olea, M. M., & Ree, M. J. (1994). Predicting pilot and navigator criteria: Not much more than g. Journal of
Applied Psychology, 79, 845–851.
Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., & Fleishman, E. A. (Eds.). (1999). An occupational information system for the 21st century: The development of O*NET. Washington, DC: American Psychological Association.
Podsakoff, P. M., MacKenzie, S. B., & Hui, C. (1993). Organizational citizenship behaviors and managerial
evaluations of employee performance: A review and suggestions for future research. In G. R. Ferris &
K. M. Rowland (Eds.), Research in Personnel and Human Resources Management (Vol. 11, pp. 1–40).
Greenwich, CT: JAI Press.
Podsakoff, P. M., MacKenzie, S. B., Paine, J. B., & Bachrach, D. G. (2000). Organizational citizenship behaviors:
A critical review of the theoretical and empirical literature and suggestions for future research. Journal of
Management, 26, 513–563.
Pulakos, E. D. (1984). A comparison of rater training programs: Error training and accuracy training. Journal
of Applied Psychology, 69, 581–588.
Ree, M. J., & Earles, J. A. (1992). Intelligence is the best predictor of job performance. Current Directions in
Psychological Science, 1, 86–89.
Ree, M. J., Earles, J. A., & Teachout, M. (1994). Predicting job performance: Not much more than g. Journal of
Applied Psychology, 79, 518–524.
Rosenberg, S., & Sedlak, A. (1972). Structural representations of perceived personality trait relationships. In
A. K. Romney, R. N. Shepard, & S. B. Nerlove (Eds.), Multidimensional scaling (pp. 134–162). New
York, NY: Seminar.
Rothe, H. F. (1978). Output rates among industrial employees. Journal of Applied Psychology, 63, 40–46.
Schmidt, F. L. (2002). The role of general cognitive ability and job performance: Why there cannot be a debate.
Human Performance, 15, 187–210.
Schmidt, F. L., & Hunter, J. E. (1981). Employment testing: Old theories and new research findings. American
Psychologist, 36, 1128–1137.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology:
Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. Annual Review of Psychology, 43,
627–670.
Schwab, D. P., Heneman, H. G., & DeCotiis, T. (1975). Behaviorally anchored rating scales: A review of the
literature. Personnel Psychology, 28, 549–562.
Smith, D. E. (1986). Training programs for performance appraisal: A review. Academy of Management Review,
11, 22–40.
Smith, P. C., & Kendall, L. M. (1963). Retranslation of expectations: An approach to the construction of
unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149–155.
Stark, S., & Drasgow, F. (2002). An EM approach to parameter estimation for the Zinnes and Griggs paired
comparison IRT model. Applied Psychological Measurement, 26, 208–227.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A
meta-analytic review. Personnel Psychology, 44, 703–742.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Viswesvaran, C. (1993). Modeling job performance: Is there a general factor? Unpublished doctoral dissertation,
University of Iowa, Iowa City, IA.
Werner, J. M. (1994). Dimensions that make a difference: Examining the impact of in-role and extra-role
behaviors on supervisory ratings. Journal of Applied Psychology, 79, 98–107.
Woehr, D. J., & Huffcutt, A. I. (1994). Rater training for performance appraisal: A quantitative review. Journal
of Occupational and Organizational Psychology, 67, 189–205.
22 Adaptive and
Citizenship-Related
Behaviors at Work

David W. Dorsey, Jose M. Cortina, and Joseph Luchman

CONCEPTUALIZATION
Macro-level trends such as globalization, technology, demographic shifts, and alternative work
structures have led researchers and practitioners to challenge traditional definitions of individ-
ual work performance (Ilgen & Pulakos, 1999). Two major ways in which these definitions have shifted involve performing in increasingly interdependent work contexts and performing in increasingly uncertain work contexts (Griffin, Neal, & Parker, 2007). Thus, a complete understanding of performance must address how individuals adapt
to ever-changing situations and how individuals perform within given social settings and social
systems. In this chapter, we explore such expanded definitions of work performance by considering
what we know (and what we do not know) about adaptive and organizational citizenship-related
behaviors and how this knowledge might be used to inform selection.
Implicit in our effort to highlight adaptive and citizenship behavior is the assumption that such
behaviors are in some ways unique from traditional task performance. Although we argue in vari-
ous ways throughout this chapter that this is true, we acknowledge that the boundaries among such
performance components are fuzzy. It has been argued that neither adaptive nor citizenship performance is mutually exclusive of task performance, and some conceptual and empirical overlap should be expected (Griffin et al., 2007; Johnson, 2003; Schmitt, Cortina, Ingerick, & Wiechmann, 2003). Moreover, it has been demonstrated that differences in specific job requirements can
drive the relative importance (and profile) of various performance components (Pulakos, Arad,
Donovan, & Plamondon, 2000). See Chapter 21, this volume, for a detailed discussion about task
performance.
For the purposes of a selection volume, it is sufficient to observe that one of the reasons for dis-
tinguishing adaptive performance and citizenship performance from task performance is that they
have different determinants.

ADAPTIVE BEHAVIOR DEFINED


Because of the highly dynamic and changing nature of modern work environments, demand is
high for employees with various types and levels of versatility, flexibility, and adaptability (Pulakos
et al., 2000). Consequently, adaptability has become a core construct of interest to practitioners and
researchers in applied psychology. This interest has led to a host of theories and models of adapt-
ability that span multiple levels of analysis (e.g., individuals, teams, organizations; Burke, Pierce, &
Salas, 2006).


Interest in adaptability at the individual level has generated several important streams of research.
First, researchers have proposed that adaptive performance is a unique performance construct,
distinguishable from the well-documented components of task and contextual performance
(Campbell, 1999; Hesketh & Neal, 1999; Pulakos et al., 2000). Second, in attempting to
understand (and predict) adaptive performance, researchers have begun investigating the individual
differences that enable various types of adaptive proficiency and performance (Pulakos, Schmitt,
Dorsey, Hedge, & Borman, 2002). Such research has produced an eight-dimension taxonomy of
adaptive performance (Pulakos et al., 2000) and some initial candidates for important individual
difference determinants (Kozlowski et al., 2001; LePine, Colquitt, & Erez, 2000; Pulakos et al.,
2002; Stewart & Nandkeolyar, 2006). However, this body of research is still in its infancy, and a
host of critical unanswered questions remains. For example, it is unclear whether all dimensions
of adaptive performance are in fact distinct from elements of task and contextual performance
(Johnson, 2001).
For the purpose of this chapter, we broadly define adaptive work-related behavior as an effective
change in response to an altered situation (White et al., 2005). Consequently, adaptive performance
entails the capability to do the following:

• Maintain situational awareness and recognize when an adjustment in behavior is needed—


either in response to a change in the environment (reactive) or as an attempt to shape the
environment (proactive)
• Change behavior in a manner that leads to more effective functioning
• Evaluate the outcome of this change and make further adjustments, as needed, to achieve
the desired result

We note briefly two points associated with this definition. First, in the case of proactive adapta-
tion, the altered situation is perceptual rather than environmental. A person perceives that desired
goals could not be met given current circumstances, so the environment must be shaped to achieve
intended objectives. Second, in our view, performing adaptively, in a manner consistent with this
definition, is meaningful at levels of analysis beyond individual performers. As suggested through-
out the literature (e.g., Burke, Pierce, & Salas, 2006; Chen, Thomas, & Wallace, 2005; Griffin
et al., 2007), the capacity to perform adaptively is a multilevel organizational process or phenom-
enon, which is enabled or facilitated by a host of organizational factors and interventions. Here, we
focus on the critical importance of selecting (hiring) individuals with the necessary knowledge,
skills, abilities, and other attributes (KSAOs) that enable adaptive performance.
In terms of criterion definition, Pulakos et al. (2000) developed a taxonomy of adaptive job
performance that expands upon previous models of the performance domain (e.g., Campbell, McCloy,
Oppler, & Sager, 1993). Two studies were conducted to develop and refine this taxonomy. First,
more than 1,000 critical incidents from 21 different jobs (military and nonmilitary) were content-
analyzed, yielding an eight-dimension taxonomy of adaptive performance. Second, this taxonomy
was investigated empirically via the development and administration of a Job Adaptability Inventory
(JAI)—an instrument designed to describe the adaptability requirements of jobs. Exploratory factor
analyses of JAI data from 1,619 respondents across 24 jobs yielded an eight-factor solution that
mirrored the hypothesized eight-dimension taxonomy. Subsequent confirmatory factor analysis
(using a separate subsample) indicated a good fit for the eight-factor model. The eight dimensions of
adaptive performance are as follows (this research is further highlighted in Pulakos et al., 2000):

1. Handling emergencies or crisis situations


2. Learning work tasks, technologies, and procedures
3. Handling work stress
4. Demonstrating interpersonal adaptability
5. Displaying cultural adaptability
6. Solving problems creatively


7. Dealing effectively with unpredictable or changing work situations
8. Demonstrating physically oriented adaptability
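To make the factor-analytic step described above more concrete, the sketch below shows how an eight-factor exploratory solution might be extracted from item-level adaptability-requirement ratings. The data are simulated, the item count is a placeholder, and scikit-learn's FactorAnalysis stands in for the analytic choices of Pulakos et al. (2000); the confirmatory step on a separate subsample would typically be run in dedicated structural equation modeling software.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Simulated stand-in for JAI-style ratings: 1,619 respondents x 68 items
# (the respondent count matches the study described above; the item count
# is a placeholder, not the actual JAI length).
rng = np.random.default_rng(0)
ratings = rng.normal(size=(1619, 68))

# Extract an eight-factor exploratory solution with varimax rotation
efa = FactorAnalysis(n_components=8, rotation="varimax", random_state=0)
efa.fit(ratings)

# Loadings matrix (items x factors); in practice these loadings would be
# inspected to see whether items cluster into the eight hypothesized
# adaptability dimensions.
loadings = efa.components_.T
print(loadings.shape)  # (68, 8)
```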

This initial research highlighted two particularly important points, namely that (a) adaptive perfor-
mance is a multidimensional construct and (b) individual jobs or organizational roles have unique
profiles of adaptability requirements, which vary predictably along the eight adaptability dimen-
sions. Both of these findings hold implications for personnel selection.
Note also that the eight dimensions subsume other constructs such as creativity/innovation, learn-
ing, and handling stress. Consistent with our overarching definition of adaptability (an effective
change in response to an altered situation), the eight dimensions in essence point out categories of
responses (e.g., learning) and categories of situations (e.g., intercultural dilemmas) that may appear
in specific job requirements or criterion definitions.
Some authors have argued that most of the dimensions in the Pulakos et al. (2000) eight-dimension
model are, in essence, dimensions of technical and contextual performance (Johnson, 2003). Thus,
some researchers have narrowed the definition of adaptive performance to focus primarily on
Pulakos et al.’s (2000) concept of “dealing with uncertain work situations” as a core element of this
construct (Griffin et al., 2007; Johnson, 2003). The primary definition that we provide above fits
with this line of thinking; dealing with an uncertain situation is roughly the same as our overarching
definition of an effective change in response to an altered situation.
Such debates about the substantive nature of adaptive performance raise the question "How
specifically is adaptive performance different from typical task performance?” Although some
aspects of adaptability may look similar to routine technical performance, adaptation may involve
doing the same activity to a greater degree, with greater intensity, or in a substantially different way.
Thus, distinctions regarding adaptive and technical performance may center on definitions of typical
versus maximal performance (DuBois, Sackett, Zedeck, & Fogli, 1993; Ployhart & Bliese, 2006).
In addition, as pointed out above, we view the eight dimensions as characterizing types of responses
and/or types of situations that may be relevant to various performance component definitions (which
may be technical or contextual in nature). In line with this view, Ployhart and Bliese (2006) suggested
that adaptability is a characteristic driven by the environment, not inherent in the criterion construct
domain, and that any task can have an adaptive component. The question is whether existing criterion
measures sufficiently capture such dynamic and adaptive performance requirements.

CITIZENSHIP DEFINED
Citizenship performance traces its conceptual lineage back to Barnard (1938), Katz (1964), and,
more recently, Organ and colleagues, who first coined the term “organizational citizenship behav-
ior,” or OCB (Bateman & Organ, 1983; Smith, Organ, & Near, 1983). Since its introduction, over 30
potential variants of OCB have arisen (Podsakoff, MacKenzie, Paine, & Bachrach, 2000), includ-
ing umbrella terms such as “contextual performance,” recently renamed “citizenship performance”
(Borman & Penner, 2001; Borman, Penner, Allen, & Motowidlo, 2001; Coleman & Borman, 2000);
prosocial organizational behavior (Brief & Motowidlo, 1986); contextual performance (Borman &
Motowidlo, 1993); and organizational participation (Graham, 1991). Many other variables, although
not variants of citizenship, are clearly linked to its dimensions. For example, organizational iden-
tification overlaps with the representing, loyalty, and compliance dimensions of citizenship, and
socialization is linked with helping and compliance.
Although most studies of citizenship refer to citizenship behavior, we prefer the term citizen-
ship performance because it emphasizes the notion that there is an aspect of quality to citizenship
behavior such that some citizenship behaviors are more successful than others. The notion of qual-
ity in citizenship is necessary for the recognition of the importance of knowledge and skill in the
prediction of citizenship.
Consistent with Borman and Motowidlo (1993), we define citizenship performance as activi-
ties that support the broader environment in which an organization’s technical core must func-
tion. Citizenship performance has many subdimensions, and there have been varied attempts
to identify them (e.g., Borman & Motowidlo, 1993; Smith et al., 1983; Williams & Anderson,
1991). In this chapter, we use the most detailed of these—that of Borman et al. (2001a). We
made this choice recognizing that there is considerable overlap among some of the subdimen-
sions in this model. However, we felt that reliance on a model with subtle distinctions would be
more appropriate because it would allow us to suggest potential differences in phenomenology.
If subsequent research suggests that there is little value in these distinctions, then future reviews
can collapse them.
The Borman et al. (2001a) model contained three subdimensions of citizenship performance, each of which can be broken down further into facets. Table 22.1 contains detailed definitions of the facets of the model. The three subdimensions are Personal Support, Organizational Support, and Conscientious Initiative.

TABLE 22.1
Model Facets of Citizenship Performance
Personal Support
Helping: Helping others by offering suggestions about their work, showing them how to accomplish difficult tasks, teaching them useful knowledge or skills, directly performing some of their tasks, and providing emotional support for personal problems
Cooperating: Cooperating with others by accepting their suggestions, following their lead, putting team objectives over personal interests, and informing others of events or requirements that are likely to affect them
Courtesy: Showing consideration, courtesy, and tact in relations with others
Motivating: Motivating others by applauding their achievements and successes, cheering them on in times of adversity, showing confidence in their ability to succeed, and helping them overcome setbacks

Organizational Support
Representing: Representing one's organization favorably to outsiders by defending it when others criticize, promoting its achievements and positive attributes, and expressing own satisfaction with the organization
Loyalty: Showing loyalty by staying with one's organization despite temporary hardships, tolerating occasional difficulties, handling adversity cheerfully and without complaining, and publicly endorsing and supporting the organization's mission and objectives
Compliance: Complying with organizational rules and procedures, encouraging others to comply with organizational rules and procedures, and suggesting procedural, administrative, or organizational improvements

Conscientious Initiative
Self-development: Developing own knowledge and skills by taking courses on own time, volunteering for training and development opportunities offered within the organization, and trying to learn new knowledge and skills on the job from others or through new job assignments
Initiative: Taking the initiative to do all that is necessary to accomplish team or organizational objectives even if not typically a part of own duties, correcting nonstandard conditions whenever encountered, and finding additional work to perform when own duties are completed
Persistence: Persisting with extra effort despite difficult conditions and setbacks, accomplishing goals that are more difficult and challenging than normal, completing work on time despite unusually short deadlines, and performing at a level of excellence that is significantly beyond normal expectations

Source: Adapted from Borman, W. C., Buck, D. E., Hanson, M. A., Motowidlo, S. J., Stark, S., & Drasgow, F., Journal of
Applied Psychology, 86, 965–973, 2001.
The first subdimension, Personal Support, has similarities with OCB-individual (Williams & Anderson, 1991), social participation (Van Dyne, Graham, & Dienesch, 1994), and interpersonal facilitation (Van Scotter & Motowidlo, 1996), and it incorporates Organ's (1988) altruism and courtesy dimensions (Coleman & Borman, 2000). Personal Support is comprised of four facets: helping, cooperating, showing courtesy and consideration, and motivating others.
The second dimension, Organizational Support, is similar to OCB-organizational (Williams &
Anderson, 1991) and incorporates sportsmanship and civic virtue (Organ, 1988) and loyalty and
obedience (Van Dyne et al., 1994). It is comprised of three facets: representing the organization
favorably, showing loyalty, and complying with organizational rules and procedures.
The final dimension, Conscientious Initiative, is similar to functional participation (Van Dyne
et al., 1994) and job dedication (Van Scotter & Motowidlo, 1996). It is comprised of three facets:
persisting with extra effort, showing initiative, and engaging in self-development.
Clearly, there is overlap among these facets. Indeed, a recent meta-analysis by Hoffman, Blair,
Meriac, and Woehr (2007) suggested that the citizenship domain is best characterized as a single
higher-order factor. Specifically, they suggested that the subdimensions of citizenship (i.e., coop-
eration, courtesy, etc.) are indicators of a more general citizenship construct. On the other hand,
these dimensions seem to be conceptually distinct (see Table 22.1). For example, Machiavellianism
often involves being courteous without being helpful or cooperative. These facets can be further
distinguished by the fact that they have different consequences. However, most important for our
purposes is the fact that they have different individual and situational determinants. It remains to
be seen whether the covariance between subdimensions suggested by Hoffman et al. (2007) results from halo or common method effects or from the true nature of citizenship dimensions as reflections of a higher-order construct (see also LePine, Erez, & Johnson, 2002). Before discussing determinants and conse-
quences, a discussion of the nature of citizenship is in order.
Most empirical studies of citizenship, and of performance generally, assign a single score to each
participant in the same way that studies of cognitive ability assign a single ability score to each par-
ticipant. The implication is that a single value can represent the standing of a person on the stable
construct, citizenship.
Ilies, Scott, and Judge (2006) used event sampling to generate daily reports of job attitudes
and citizenship and found that much of the variance in citizenship was within person. As will be
explained later, these authors found that this within-person variance was explained by other within-
person factors. For the moment, we merely wish to point out that the Ilies et al. (2006) findings cast
doubt on the practice of using a single value to represent citizenship and on the conceptualization
that this practice implies.
Predictors of performance often do not have simple cause-effect relationships with performance.
Rather, some predictors can serve as “boundary conditions” or can “act through” other variables.
These two conditions are known as “moderation” (boundary condition) and “mediation” (acting
through).
As we discuss in a later section, Ilies et al. (2006) found that stable individual differences “mod-
erated” within-person variability in citizenship behavior. In this case, levels of a stable trait act as
boundary conditions such that within-person variability is only important when an employee does
not have high levels of this trait. Practically speaking, this would suggest that hiring individuals
with higher levels of this trait should make for more consistent citizenship performance (as opposed
to more citizenship performance). This is because at this level of the trait, within-person causes
(e.g., affect, attitudes) are less relevant.
Mediation, on the other hand, is somewhat simpler. Mediation means that, essentially, a third variable "speaks for" the relation between the other two. For instance, a relationship between age and performance may exist not because age causes performance, but because age is related to relevant job-related experiences, which in turn cause performance. Thus, the relationship of age with performance acts through its relationship with experience. For practice, this suggests that one can use a variable whose effect is mediated as a proxy for the mediator variable of interest. In so doing, one should remember that proxies are not the variable of direct interest, and their use may have legal implications (e.g., one would be ill advised to select for age on the presumption that it is a sufficient proxy for relevant job experience).
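To make the distinction concrete, here is a minimal sketch in Python that simulates both patterns and recovers them with ordinary regression. It is an illustration only: the variable names (trait, affect, age, experience, performance, citizenship) and the effect sizes are hypothetical placeholders rather than estimates from any study cited in this chapter, and statsmodels is simply one convenient way to fit the models.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 500

# Moderation: a stable trait dampens the effect of daily affect on citizenship,
# so the trait acts as a boundary condition on the affect-citizenship relation.
trait = rng.normal(size=n)
affect = rng.normal(size=n)
citizenship = 0.5 * affect - 0.4 * trait * affect + rng.normal(size=n)
df_mod = pd.DataFrame({"trait": trait, "affect": affect, "citizenship": citizenship})
mod_fit = smf.ols("citizenship ~ affect * trait", data=df_mod).fit()
print(mod_fit.params["affect:trait"])  # nonzero interaction term signals moderation

# Mediation: age relates to performance only through job-related experience,
# so the age-performance relation "acts through" experience.
age = rng.normal(size=n)
experience = 0.6 * age + rng.normal(size=n)
performance = 0.7 * experience + rng.normal(size=n)
df_med = pd.DataFrame({"age": age, "experience": experience, "performance": performance})
a = smf.ols("experience ~ age", data=df_med).fit().params["age"]
b = smf.ols("performance ~ experience + age", data=df_med).fit().params["experience"]
print(a * b)  # product-of-coefficients estimate of the indirect (mediated) effect

In the second model, the direct effect of age (the age coefficient in the last regression) should be near zero, which is what full mediation implies.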

INDIVIDUAL DIFFERENCE PREDICTORS


There are many potential predictors of adaptability and citizenship. Rather than attempt a compre-
hensive review, we offer those predictors that have been most prominent in the recent literature.
Table 22.2 summarizes the predictors on which we focus.

TABLE 22.2
Predictors of Adaptability and Citizenship

Adaptability: Distal predictors
Cognitive ability: High-g people have more cognitive resources to devote to adaptive situations, allowing them to determine appropriate courses of action. (Primary source: Pulakos et al., 2002)
Conscientiousness/resiliency: High-conscientiousness people are prepared to struggle through uncertainty to achieve. They are more likely to engage in task pursuit. On the other hand, they may be less willing to revise their approach. (Primary sources: LePine et al., 2000; Pulakos et al., 2002; Stewart & Nandkeolyar, 2006)
Openness to experience: High-openness people are better prepared to revise initial approaches on the basis of experience. On the other hand, they may be less inclined to pursue formal objectives in the first place. (Primary sources: LePine et al., 2000; Pulakos et al., 2002; Stewart & Nandkeolyar, 2006)
Age: The role of age is unclear. Although physical adaptability may be more difficult for older workers, their experience may help them to appreciate the need for other kinds of adaptability. (Primary source: DeArmond et al., 2006)

Citizenship: Distal predictors
Conscientiousness: High-conscientiousness people are more likely to recognize the need for and follow through on citizenship behaviors. "Duty" is positively related to "taking charge," whereas achievement striving is negatively related. (Primary sources: Borman et al., 2001b; Moon et al., 2008; Organ & Ryan, 1995)
Prosocial personality: Prosocial people are more empathic and more interested in engaging in activities that benefit others or the organization. (Primary source: Borman et al., 2001b)
Collectivism: People high in collectivism feel shared fate with their groups and are more concerned with group outcomes. (Primary source: Jackson et al., 2006)
Narcissism: Causes one to engage in less citizenship and to inflate self-ratings of citizenship. (Primary source: Judge et al., 2006)
Motives: Prosocial value motives and empathy predict OCB-I, whereas organizational concern motives and reciprocation wariness predict OCB-O. (Primary sources: Kamdar et al., 2006; Rioux & Penner, 2001)
Concern over future consequences: People high in CFC are less likely to engage in OCB if term of employment is unclear. (Primary source: Joireman et al., 2006)
Social networks: Relational ties, direct and indirect, lead to OCB. The structure and degree of ties are also relevant. (Primary sources: Bowler & Brass, 2006; Venkatramani & Dalal, 2007)
Informational privacy: Privacy of personal information leads to empowerment, which increases OCB. (Primary source: Alge et al., 2006)

Adaptability: Proximal predictors
Situational knowledge: Because adaptability is situational, practical intelligence in the form of situational knowledge should increase it. (Primary source: Schmitt & Chan, 2006)
Regulatory processes: Motivational components (e.g., goal choice and goal striving) and other processes (e.g., strategy selection) transmit the effects of abilities and traits. (Primary sources: Chen et al., 2005; Mitchell & Daniels, 2003; Ployhart & Bliese, 2006)

Citizenship: Proximal predictors
Attitudes: Satisfaction, commitment, and other attitudes explain between-person and within-person variance in OCB. (Primary sources: Ilies et al., 2006; Organ & Ryan, 1995)
Knowledge and skill: Various person- and organization-related forms of knowledge influence the success of OCB attempts. (Primary sources: Dudley & Cortina, 2008; Motowidlo et al., 1997)
Leadership: Leader guidance, abuse, and LMX have been linked to OCB-I, although job characteristics may mediate these relationships. (Primary sources: Dineen et al., 2006; Ilies et al., 2007; Piccolo & Colquitt, 2006)
Social exchange: Climate for feedback and complementarity lead to positive social exchange, which engenders OCB. (Primary sources: Glomb & Welsh, 2005; Rosen et al., 2006)

DISTAL INDIVIDUAL DIFFERENCES—ADAPTABILITY


The most important task for a selection chapter such as this is to identify the attributes that distin-
guish desirable from undesirable candidates. Attributes such as personality and cognitive ability
are considered distal in that they are not direct causes of adaptive performance. Rather, personality
and mental ability lie farther back in the causal chain. For instance, cognitive ability may predispose an individual to think more complexly about a situation and to generate and subsequently consider alternative courses of action, which may then lead to better performance. Cognitive ability does not directly cause adaptive performance, but it does cause consideration of alternatives. Nevertheless, cognitive
ability, as a reliably measured stable trait, would be highly relevant for selection purposes.
In one of the few studies conducted with a field sample and real-world performance measures,
Pulakos et al. (2002) found that cognitive ability, some aspects of personality (in particular achieve-
ment motivation), and respondents’ past experiences in situations requiring the eight different types
of adaptability (experience) were all predictive of adaptive performance. Another interesting result
from this study was a confirmatory factor analysis of the predictor measures that yielded support for
the Pulakos et al. eight-dimension model of adaptability.
Other work by LePine et al. (2000) and Stewart and Nandkeolyar (2006) suggested a
more complex view concerning the role of personality factors in adaptive performance, focusing
specifically on the role of conscientiousness and openness to experience. These authors suggested
that conscientiousness may be helpful for task pursuit (i.e., pursuing new yet formally prescribed
objectives), whereas openness to experience may actually deter task pursuit. In contrast, task revision
(pursuing activities that go well beyond typical formal objectives) may be bolstered by higher
levels of openness to experience and impeded by higher levels of conscientiousness. As this line of
research continues to unfold, it is likely that other personality facets (e.g., emotional stability) will
be implicated in effective adaptive performance. For example, the broader literature around coping
may suggest a host of variables that play a role in adaptive behavior (Ployhart & Bliese, 2006).
It is also noteworthy that some authors have posited an entirely different view of adaptability-
related individual differences. For example, Ployhart and Bliese (2006) offered a theory of adapt-
ability in which adaptability is viewed as a higher-order compound trait, determined by distal
KSAOs. To date, there is limited research on these issues.
Before moving on, one factor worth noting is the role of demographic factors such as age.
Although we are unaware of any large-scale studies that definitively link age to adaptive performance,
perceptions exist that older workers are less adaptable (e.g., see DeArmond et al., 2006). Although
there are certainly some task environments in which adaptability requirements may bound adaptive
performance among older workers (e.g., physical adaptability requirements), it is equally likely that
older workers excel on other dimensions of adaptive performance because of the wealth of experi-
ence that they have acquired. Possible reasons for increased adaptability among older workers on
some adaptive performance dimensions include (a) an increased probability of having acquired
coping skills, styles, and mechanisms to deal with stress, frustration, and change; and (b) a set of acquired experiences that provides possible adaptive responses from which one can draw. As with most of the
areas of predictor research mentioned above, more research is needed to further our understanding
of the determinants of adaptive performance.

DISTAL INDIVIDUAL DIFFERENCES—CITIZENSHIP


Previous research on distal citizenship performance predictors has focused almost entirely on moti-
vational and dispositional variables. Organ and Ryan (1995) reviewed the research on dispositional
(conscientiousness, agreeableness, negative affectivity, and positive affectivity) predictors of OCB
and found that conscientiousness was the strongest predictor. Borman et al. (2001b) examined the
following predictors of citizenship: extroversion, locus of control, collectivism, personal initiative,
and prosocial personality. They found that conscientiousness and the two dimensions of prosocial
personality, other-oriented empathy and helpfulness, were the strongest predictors of citizenship
performance.
More recently, Jackson, Colquitt, Wesson, and Zapata-Phelan (2006) decomposed the collectivism domain into five subdimensions, finding that the elements of "reliance" (a feeling of shared fate with the group) and "concern" (concern about outcomes for group members) related most strongly to within-group citizenship behavior in teams.
In a similar vein, Moon, Kamdar, Meyer, and Takeuchi (2008) sought to resolve inconsistent
findings related to conscientiousness and “taking charge” citizenship or voluntary behaviors that
are intended to effect functional organizational change. They decomposed the conscientiousness
trait into “duty” and “achievement striving” and found that “duty” had a positive relationship with
taking charge and “achievement striving” had a negative relationship. Moon et al. attributed this
to the fact that duty is “other-oriented,” or centered on the benefit of others, whereas achievement
striving is not.
Judge, LePine, and Rich (2006) found that narcissism, or delusional beliefs about one's influence and personal specialness, is related to inflated self-ratings of citizenship performance and to lower ratings of these same behaviors by others, a pattern attributed to delusions about superiority/persuasiveness, a sense of entitlement, and self-admiration. Taken collectively, this research suggests that having a
strong individual orientation can inhibit citizenship performance, particularly those dimensions of
citizenship that are other-oriented.
Recent research has also investigated whether personal motives (e.g., prosocial values, organiza-
tional concern, and impression management motives) relate to citizenship performance (Chandler,
2004; Rioux & Penner, 2001). Rioux and Penner (2001) found that prosocial values motives were
most strongly associated with citizenship directed toward individuals, whereas organizational con-
cern motives were most strongly related to citizenship directed toward the organization.
Similarly, Kamdar, McAllister, and Turban (2006) found that empathy (defined as empathetic
concern and perspective taking) was a better predictor of individual-level citizenship behaviors, whereas reciprocation wariness was a better predictor of citizenship toward the organization. These
latter findings reflect an effort to understand citizenship through motives related to social exchange
theory (SET; Blau, 1956; Cropanzano & Mitchell, 2005). In SET, an individual’s motives are thought
to be contingent upon relational obligations and duties formed through exchanges of resources with
other individuals or groups. SET posits that two exchanging parties are likely to build high-quality
relationships characterized by trust and commitment to one another. From this perspective, citizen-
ship is a form of resource exchange with an organization or other individual. Accordingly, Kamdar
et al. suggested that empathetic individuals tend to define their job role in terms of citizenship and
are willing to exchange their citizenship with coworkers and the organization alike, whereas workers
who are more concerned about reciprocation from their exchange partner will only exchange with
the more powerful and resource-rich organization where there is a higher likelihood of reward.
Also from a social exchange perspective, Joireman, Kamdar, Daniels, and Duell (2006) found
that “concern over future consequences” (CFC) also predicts citizenship. Because the benefits of
engaging in citizenship behavior are longer term, individuals who are not concerned with their
own benefits in the long term are more likely to engage in citizenship behavior irrespective of their
expected length of tenure with an employer. Conversely, individuals who have higher CFC will
withhold citizenship in cases where their expected tenure is short because of lack of perceived ben-
efits. The importance of this finding for selection is that screening for citizenship will be ineffective
for high-CFC applicants if term of employment is unclear or known to be short.
Bowler and Brass (2006) found that individuals were more likely to both give and receive OCB
from those to whom they had a relational tie or were connected through a friend (third-party ties)
in a social network. Workers in powerful structural positions (e.g., individuals who mediated links
between groups of individuals) were also more likely to receive OCB from others. These findings
were bolstered by a later study by Venkatramani and Dalal (2007) who found that having a direct or
indirect (third party) positive affective tie led to increased instances of interpersonal helping. This
relationship was also affected by the amount of contact. Thus, the more frequently the individuals
were exposed to one another, the more likely they were to help. Taken together, these studies sug-
gest that friendship, direct and indirect, can increase the likelihood of giving and receiving helping
behavior from coworkers. In fostering a genial workplace, an organization may then reap the ben-
efits of citizenship performance.
Informational privacy has also been related to increased citizenship behavior (Alge, Ballinger,
Tangirala, & Oakley, 2006). Increased privacy of personal information leads to psychological
empowerment (or intrinsic motivation) because information control is expected to lead to feelings
of self-determination and value of membership in the organization. Informational privacy is then
related to citizenship, but empowerment mediates this relationship. Thus, establishing policy that
maintains confidentiality of sensitive employee information is a way in which an organization can
increase feelings of employee empowerment.

IMMEDIATE/PROXIMAL DETERMINANTS—ADAPTABILITY
If basic distal individual characteristics such as abilities, personality traits, and past experience
serve as determinants of adaptive performance, one can ask “What are more immediate or proxi-
mal determinants?” In contrast to distal predictors, we would expect proximal predictors to have
a more direct influence on performance behavior. Here, extant literature suggests that factors such
as knowledge, self-efficacy, and skill all predict adaptive performance (e.g., Kozlowski et al., 2001).
To understand selection of those who are high in adaptive performance and good citizens, we must
understand these more proximal predictors. Moreover, there is evidence that such proximal pre-
dictors produce incremental validity over distal factors (Ford, Smith, Weissbein, Gully, & Salas,
1998; Kozlowski et al., 2001). Thus, including variables that are “closer” on the causal chain to
performance with distal predictors can improve our predictions of performance. This is noteworthy
in that the distal predictors themselves are thought to be causes of the proximal predictors (e.g.,
cognitive ability causing knowledge acquisition, which in turn causes performance).
In addition to skills and attitudinal changes (e.g., self-efficacy), it is likely that knowledge acqui-
sition plays a key role in adaptive performance. In particular, for individuals to perform adaptively,
they must adequately perceive and interpret various situations and possess knowledge concerning
how to operate adaptively in the given contexts. This assumption is supported by several theoretical
propositions and empirical findings from existing research, as described throughout this chapter. For
example, structural models of job performance have shown that cognitive ability exhibits a strong
indirect effect on ratings of overall performance through the mediating variables of job knowl-
edge (Borman, White, Pulakos, & Oppler, 1991; Borman, White, & Dorsey, 1995; Hunter, 1983;
Schmidt, Hunter, & Outerbridge, 1986) and technical proficiency (Borman et al., 1995; Schmidt
et al., 1986). Moreover, given the situational nature of adaptability, it is likely that some type of situ-
ational knowledge (conceptually close to the concept of practical intelligence, see Schmitt & Chan,
2006) plays an important role.
Another important avenue of current research is the literature regarding self-regulatory skills.
Regulatory processes have been implicated as mediational mechanisms that lead to better adaptive
performance (e.g., Chen et al., 2005). Regulatory mechanisms include motivational components
such as goal choice and goal striving (Kanfer, 1990; Mitchell & Daniels, 2003). Consistent with the
literature on goal-setting, goal choice and goal striving affect performance through the allocation
of attention to and persistence in achieving goals (Locke & Latham, 1990; Chen et al., 2005). The
notion of goal choice is similar to other proximal predictors such as strategy selection that have
been proposed as determinants of adaptive performance (Ployhart & Bliese, 2006). In summarizing
possible predictors of adaptive performance, it is important to highlight that not all of these factors
are strictly malleable (in the sense of being trainable). Thus, from a selection perspective, choosing
predictors that are more likely to be relatively fixed in individuals may be beneficial. Also, careful
consideration of which skills or attributes are needed at entry is warranted. However, we should also
mention that trainability does not necessarily imply that training is superior to selection. Training
programs can be very expensive to develop and administer. As a result, it can be more cost effective
to buy, rather than build, even the most malleable characteristics.

IMMEDIATE/PROXIMAL DETERMINANTS—CITIZENSHIP
Although stable individual differences are clearly important for the prediction of citizenship perfor-
mance, the better part of its variance remains unexplained. One likely reason for the modest cor-
relations between stable characteristics and citizenship is that their influence is transmitted by more
proximal variables. Three categories of predictors that hold great promise are attitudes, knowledge,
and skills.
In their seminal meta-analysis, Organ and Ryan (1995) found that, with the exception of consci-
entiousness, attitudinal variables (job satisfaction, perceived fairness, organizational commitment,
and leader supportiveness) were stronger predictors of OCB than dispositional variables, with mean
uncorrected correlations that were 5–15 points higher. Ilies et al. (2006) found that within-person
variance in citizenship could be explained by within-person variance in positive and negative affect
and job satisfaction.
Until recently, researchers had only hinted at knowledge and skills that might predict citizen-
ship performance. Motowidlo et al. (1997) provided general examples of citizenship performance
knowledge and skills when they stated that, “Examples of contextual knowledge include knowing
how to cooperate with a diverse group of people; knowing how to calm an upset worker; knowing
how to work productively with difficult peers, supervisors, and subordinates, and so forth” (p. 80).
Similarly, they suggested that examples of contextual skill include “skill in actually carrying out
actions known to be effective for handling situations that call for helping and coordinating with
others; following organizational rules and procedures; endorsing, supporting, and defending orga-
nizational objectives; persisting; and volunteering” (p. 81).
Much of the empirical work has asserted a particular knowledge and/or skill as relevant for effec-
tive performance in a specific domain (e.g., Bergman, Donovan, & Drasgow, 2001, interpersonal
skill predicting leadership; Morgeson, Reider, & Campion, 2005, interpersonal and teamwork skills
predicting team performance; Motowidlo, Brownless, & Schmit, 1998, and Schmit, Motowidlo,
Degroot, Cross, & Kiker, 1996, customer service knowledge and skill predicting customer service
performance). Other research has worked backward to determine knowledge and skills (e.g., Bess,
2001; Schneider & Johnson, 2001). For example, a participant is presented with an interpersonal
situation that may be encountered on the job for which they are applying. Subject matter experts
(SMEs) then decide which items measure overall job knowledge relevant to citizenship performance
dimensions. Thus, no particular citizenship performance knowledge and skill is identified.
To our knowledge, only two studies have tested specific knowledge in the prediction of citizen-
ship performance. Bettencourt, Gwinner, and Meuter (2001) found that two customer-knowledge
antecedents (trait richness, or knowledge of customer characteristics and types, and strategy rich-
ness, or knowledge of strategies for dealing with customer needs and situations) explained unique
variance in service-oriented OCBs after controlling for attitudinal and personality variables.
Dudley and Cortina (2009) examined knowledge and skill predictors of helping performance
(one of the facets of Personal Support) and found that interpersonal construct knowledge, strategy
richness, means-end knowledge, emotional support skill, and emotion perception skill predicted
supervisor ratings of person-related and task-related helping performance over and above various
dispositional variables.
Although relatively little empirical work has been conducted to uncover knowledge and skill pre-
dictors of citizenship, indirect evidence has been found for the effect of interpersonal skills on the
fostering of helping in creative tasks. Porath and Erez (2007) found that incidents of rudeness were
related to decreased levels of helping toward the rude person and toward unrelated others. This suggests
that rudeness has wide-ranging effects in interdependent contexts, such as team performance, where
helping can contribute substantially to performance.
Dudley and Cortina (2008) developed a conceptual model linking specific knowledge and skill
variables to the Personal Support facets of citizenship performance. Among the most prominent
knowledge-based predictors were strategy richness, emotional knowledge, knowledge influence tac-
tics, and organizational norm knowledge. Among the most prominent skill-based predictors were
emotion support skills, emotion management skills, facework skills, behavioral flexibility, social
perceptiveness, and perspective-taking skills. We refer the reader to Dudley and Cortina (2008) for
the bases of the linkages themselves.
Leader characteristics and practices have been found to be predictive of citizenship behavior
consistent with social exchange. For instance, leader guidance to followers on appropriate types of behaviors (i.e., citizenship), when bolstered by behavioral integrity (word-behavior consistency on the part of the leader), leads to follower OCB (Dineen, Lewicki, & Tomlinson, 2006). Abusive supervision has also been linked to lower perceptions of organizational justice, which acts as a mediator to predict levels of citizenship (Aryee, Chen, Sun, & Debrah, 2007). Positive leader-
member exchange (LMX) has also been found to relate to several positive organizational outcomes,
one of which is supervisor-oriented OCB-I (Ilies, Nahrgang, & Morgeson, 2007).
Transformational leadership has also been linked to citizenship behaviors. However, a recent
study found that Hackman and Oldham’s (1974) job characteristics theory constructs mediated the
transformational leadership-citizenship relation (Piccolo & Colquitt, 2006). Specifically, transfor-
mational leadership was found to influence the "meaning" ascribed by employees to elements of their
jobs (such as their perceptions of skill variety and identity). This in turn led to intrinsic motivation,
which mediated the relation between job characteristics constructs and citizenship behavior. Others
suggest that the transformational leadership-citizenship relation is mediated by LMX (Wang, Law,
Hackett, Wang, & Chen, 2006). In this case, transformational leadership is a cue to a follower that
the leader is a trustworthy exchange partner. This leads to greater investment by the follower in the
relationship and then stronger positive LMX perceptions.
Although not framed in terms of SET, Rosen, Levy, and Hall's (2006) study can also be understood from a social exchange perspective. Rosen et al. found that fostering a climate in which feedback is provided to subordinates reduces perceptions of organizational politics and increases employee morale. High-feedback environments signal that the outcomes individuals enjoy are determined in a procedurally fair manner and are based not on political skill but on contributions to organizational success, which leads to citizenship. This suggests that resources devoted
to selecting good citizens may not be well spent unless the environment is conducive to citizenship.
Additionally, complementarity in control-related personality characteristics (the leader having a more controlling personality and subordinates having less controlling ones) was posited to lead to positive social exchanges (Glomb & Welsh, 2005). The complementarity hypothesis was not supported. Rather, there was a main effect for subordinate control, suggesting that individuals need a sense of control
over their work (consistent with some evidence outlined above) to exhibit OCB.
In closing, we should mention that most research conducted on predictors of citizenship and
adaptive performances has used job incumbents as research participants. To increase the con-
fidence that the above predictors are in fact the causes of the observed effects on performance,
predictive validity studies using applicant populations need to be conducted.

MEASUREMENT
MEASURING ADAPTIVE PERFORMANCE
In considering the measurement of adaptive performance, we highlight two primary measurement
issues and two primary methods. The first issue is the importance of clearly defining the types of
adaptive behavior or performance of interest. As suggested above, there is a host of ways of defining
and conceptualizing adaptive behavior. Following the overarching definition of adaptability offered
above, measures of adaptive performance must focus on the way(s) in which task requirements have
shifted in such a manner as to require a change in response. The specific nature of the changing task
requirements and the required responses will drive the relative predictive power (and weighting)
of various distal and proximal predictors (e.g., interpersonal adaptability requirements will relate
more to interpersonal predictors such as communication, negotiation, and persuasion skills). Thus,
in practice, it is essential that a job analysis is conducted to establish appropriate predictor-criterion
linkages. Assuming that all dimensions of adaptive or citizenship performances are de facto rel-
evant (or irrelevant) can lead to measures that are either deficient or contaminated.
This leads to a second important point: Given the possible “maximal” nature of adaptive perfor-
mance, in some contexts it may be hard to measure adaptive behavior under typical work conditions
or in laboratory settings. For example, handling severe crises, tolerating extremes of stress, or meet-
ing extreme physical demands requires observing behavior under nonroutine, possibly low-base-
rate, and possibly hard-to-replicate conditions.
Given the challenges in measuring adaptive performance, how would one go about it? First,
most of the existing techniques for gathering ratings-based performance measures apply to adaptive
performance. Thus, one can gather supervisor, peer, subordinate, or customer ratings using various
behavior-based rating formats. Pulakos et al. (2002) used this approach. Specifically, Pulakos et al.
gathered two types of ratings-based measures. The first measure was a set of eight behavior-based
rating scales constructed to reflect each of the eight adaptability dimensions highlighted earlier.
A second ratings measure asked supervisors to rate how effective each subordinate was in handling
each of 24 situations (3 situations for each of the eight dimensions); each of the situations was based
on job-related critical incidents that had been identified as requiring adaptive responses.
In other situations in which ratings-based measures might be deemed insufficient, simulations
or work samples might be used to measure adaptive behavior. As just one example, the U.S. Army
Special Forces created a computer game simulation as part of the Adaptive Thinking & Leadership
(ATL) program to train Special Forces team leaders (Raybourn, Deagle, Mendini, & Heneghan,
2005). This simulation challenges trainees to function effectively in an immersive multicultural
environment where adaptability-related skills in negotiation, consensus building, communication,
judgment, self-awareness, and innovation are required. Simulation-based measures are also popular
in applications in which adaptive performance is conceptualized in terms of training transfer. For
a good example of this approach see Chen, Thomas, and Wallace (2005), in which training simula-
tions are used to measure adaptive performance at the individual and team levels.

MEASURING CITIZENSHIP-RELATED VARIABLES


There are four issues relating to the measurement of citizenship that we wish to discuss. First,
although citizenship is often measured with global indices, few jobs require equal amounts of all
dimensions. Thus, global measures are contaminated for almost all settings. Organizations would do
well to identify the citizenship dimensions in which they are particularly interested and to develop
or purchase valid measures of only those dimensions.
Second, measures of citizenship itself vary in the degree to which they reflect activity rather than
performance. Self-report measures of citizenship invariably emphasize activity (e.g., “I often help
my coworkers when they have personal problems”) or attempt (e.g., “I often try to help my cowork-
ers when they have personal problems”) rather than emphasizing the degree to which activities and
attempts are successful. Measures from external sources such as supervisors are much more likely
to reflect success. For this reason, external evaluation of citizenship is even more important than it is
for other factors. Another way of characterizing this difference is to say that it is likely that a differ-
ent model is tested if a self-report measure of citizenship is used than if an external measure is used.
For example, this is not to say that supervisors cannot be asked about citizenship activity as opposed
to citizenship performance. Rather, whereas an external source might be expected to produce an
unbiased evaluation of citizenship quality, it is not reasonable to expect an unbiased evaluation of
quality from the subject him/herself.
To address the third issue, we return to the Ilies et al. (2006) study. These authors showed that
there was tremendous within-person variability in citizenship such that the level of citizenship var-
ied considerably from one day to the next. Episodic measurement of citizenship affords two types
of citizenship measurement. In addition to treating episodic measurements separately, they can also
be averaged. The values based on the individual episodes themselves can be used to test within-
person models such as those tested in Ilies et al. (2006) and Judge et al. (2006). The averages of these
episodic measurements are conceptually similar to the single-shot measures that are typical of
citizenship research and can be used to test individual difference models.
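As a rough numerical sketch of what episodic measurement buys, the simulated example below (in Python, with hypothetical values; the names person and ocb and the variance figures are placeholders, not data from Ilies et al., 2006) splits daily citizenship reports into between-person and within-person variance and then averages the episodes into person-level scores of the familiar single-shot kind.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_people, n_days = 100, 10

person = np.repeat(np.arange(n_people), n_days)
person_level = rng.normal(3.5, 0.5, size=n_people)        # stable between-person standing
daily_dev = rng.normal(0.0, 0.7, size=n_people * n_days)  # day-to-day fluctuation
ocb = person_level[person] + daily_dev

df = pd.DataFrame({"person": person, "ocb": ocb})

# Person-level averages: the analog of a single-shot citizenship score,
# usable in ordinary individual difference models.
person_scores = df.groupby("person")["ocb"].mean()

# Crude variance decomposition; a random-intercept model would separate
# the components more cleanly, but the idea is the same.
between_var = person_scores.var(ddof=1)
within_var = df.groupby("person")["ocb"].var(ddof=1).mean()
print(f"between-person variance: {between_var:.2f}")
print(f"within-person variance:  {within_var:.2f}")
print(f"share of variance between persons: {between_var / (between_var + within_var):.2f}")

When the within-person share is large, as Ilies et al. (2006) reported, a single score per person discards much of what episodic measurement reveals.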
Fourth, in describing the model of citizenship on which we would rely in this chapter, we men-
tioned that there is a good deal of conceptual overlap among the various facets of citizenship. This
overlap creates serious measurement problems. Consider the Personal Support facets helping and
cooperation. Dudley and Cortina (2009) used items such as the following to measure helping:

• Tries to talk fellow cadets through personal problems, helping them come up with
solutions
• Tries to help fellow cadets with difficult assignments, even when assistance is not directly
requested

Brooks-Shesler, LaPort, Dudley, and Cortina (2009) used items such as the following to measure
cooperation:

• Follows others even when he or she would prefer to lead


• Implements coworker suggestions about how he/she performs job tasks
These are not items from the original list of items, but rather they represent heavily modified items.
The original helping items were developed not only to be consistent with the conceptual nature of
their respective construct, but also to be distinct from cooperation, among other things. The origi-
nal cooperation items were developed not only to be consistent with the conceptual nature of their
respective construct, but also to be distinct from helping. Despite these efforts, a group of SMEs had
difficulty in categorizing these items accurately. This is consistent with the Hoffman et al. (2007)
finding that raters typically have difficulty in discriminating between these subdimensions without
the aid of “frame-of-reference” training. In addition, the SMEs in the Dudley and Cortina (2008)
and Brooks-Shesler et al. (2009) studies had difficulty in distinguishing citizenship items not only
from one another, but also from items measuring knowledge and skills. In these studies, the items
were modified so that their distinctiveness was more apparent. This sort of iterative process is not
uncommon in scale development. Our point is that the process is particularly important and dif-
ficult in measuring the facets of citizenship because of the difficulty in demonstrating discriminant
validity.
It should also be mentioned that, just as there are unique difficulties in the measurement of citi-
zenship, there are unique difficulties in the measurement of the knowledge and skills that predict
citizenship. We mentioned in the previous paragraph that dimensions of citizenship are difficult to
measure in part because they are difficult to distinguish from the knowledge and skills that predict
them. This problem cuts both ways in that knowledge and skills are difficult to measure because
they are difficult to distinguish from the dimension that they are intended to predict.

MODERATORS AND MEDIATORS—ADAPTABILITY


No selection system is complete without an understanding of the factors that might compromise,
enhance, or transmit the predictive power of the system. Moderator variables change the magni-
tude, or even the direction, of relationships (e.g., the conscientiousness-performance relationship is
stronger when employees have autonomy than when employees do not have autonomy). Mediating
variables transmit the effect of causal variables (e.g., skills mediate the relationship between train-
ing and performance such that the reason that training is related to performance is that training
increases skills, which, in turn, increase performance). Depending on the perspective taken, sev-
eral variables highlighted above might be specified as moderators or mediators (e.g., regulatory
skills moderate or mediate effects of basic abilities and traits). In this section, we look beyond the
variables already mentioned to consider additional possible moderators or mediators implicated in
understanding and predicting adaptive performance.
First, as evident in our definition of adaptive performance, characteristics of the task domain
itself play a large role in affecting variables related to adaptability. For example, task complexity can
affect motivational components involving goal-setting, with the relationship between goal-setting
and performance diminishing as task complexity increases (Wood & Locke, 1990). Note that adap-
tive requirements and task/job complexity are not necessarily the same thing. Many elements of
complex jobs (e.g., deep specialized technical procedures) do not necessarily require adaptation.
However, it is a bit harder to conceive of adaptive performance requirements that do not increase
complexity. Also, as we mention elsewhere, the base rate of adaptive performance behaviors may
be low but still present in some jobs. For example, many police jobs contain adaptive elements (e.g.,
creative problem solving, handling crisis situations), yet many of these jobs have large administra-
tive components, limiting the base rate of actual adaptive performance behaviors.
Moreover, for tasks that involve leadership, additional factors likely come into play. It is not
enough for leaders to be individually adaptable. Leaders must also develop adaptability in others by
encouraging and rewarding adaptive behavior and by ensuring cooperation and coordination. The
role of the leader in encouraging adaptability suggests at least two broad performance dimensions of
leader adaptability: developing the adaptive capabilities of others and creating a climate that fosters
adaptability (Zaccaro & Banks, 2004). Second, being embedded in groups or teams involves a
host of group process factors that impact adaptability at team and individual levels (Burke, Stagl,
Salas, Pierce, & Kendall, 2006; Chen et al., 2005; Kozlowski et al., 2001). In some circumstances,
team members may adapt team roles or internal structures to align with the environment (Moon
et al., 2004).
In addition to the nature of tasks and possible team or group factors, the larger organizational
environment also likely moderates predictors of adaptability. As W. Edwards Deming once observed, if you put a good person in a bad system, the bad system wins every time. Thus, organizational norms,
cultures, climates, and systems can constrain or enhance adaptability. For example, organizational
communication patterns might enable more or less adaptive performance (Entin, Diedrich, &
Rubineau, 2003). Moreover, organizational phenomena such as mergers and acquisitions may
require additional adaptation on the part of individuals (Griffin et al., 2007). These “boundary con-
ditions” must be taken into consideration when an organization is attempting to enhance adaptation
in its employees or when diagnosing why adaptive performance interventions may not be having
the intended effect.

MODERATORS AND MEDIATORS—CITIZENSHIP


Regardless of the predictors on which one chooses to focus, there are many variables that might act
as moderators of the predictive power of those predictors. We consider here a small subset of these
variables.
There are various environmental variables that might act as moderators in the prediction of
citizenship. Much can be learned from research on other criteria. Barrick and Mount (1993) found
that autonomy moderated the degree to which job performance was predicted by conscientiousness,
extraversion, and agreeableness. Because citizenship is more susceptible to individual choices than
is job performance, it seems likely that autonomy would moderate the relationship between most
predictors and citizenship. Colbert, Mount, Harter, Witt, and Barrick (2004) found that perceptions
of the developmental environment and perceived organizational support moderated the relationship
between personality and workplace deviance. This is consistent with the Ilies et al. (2006) finding
that attitudes and personality interact to predict citizenship. Although citizenship can be predicted
by factors such as personality traits (e.g., conscientiousness) and job attitudes (i.e., job satisfaction),
research is beginning to show that focusing only on factors predictive of task performance may
result in decreased ability to explain citizenship.
Citizenship is highly interpersonal in nature and, as was outlined above, is beginning to be
understood from a SET perspective. Accounting for the effects of both personality and social exchange on citizenship performance, Kamdar and Van Dyne (2007) used conscientiousness, agreeableness, LMX, and TMX
(team-member exchange) to predict citizenship. Consistent with prior research, both personality
traits predicted citizenship toward supervisors and coworkers. However, LMX and TMX were also
able to predict citizenship above and beyond personality (for supervisors and coworkers, respec-
tively). Further, agreeableness was found to moderate the relationship between quality of LMX
and citizenship such that individuals with high levels of agreeableness do not need high-quality
exchanges to engage in citizenship behavior.
In sum, Kamdar and Van Dyne’s (2007) findings suggested that when we fail to account for
nontraditional predictors such as exchange relationship quality, our ability to predict citizenship is
diminished, and agreeableness appears to have a more consistent relationship with citizenship than
it really does (in reality, it appears to change depending on exchange relationship quality).
The organizational justice literature has begun to explore the role of moderators in the relation-
ship between justice and citizenship. For instance, Kamdar et al. (2006) found that procedural justice only predicts citizenship when employees have job roles that do not involve OCB. Thus, individuals who define their job role as involving citizenship will engage in these behaviors irre-
spective of whether they experience procedural justice at work or not. Those who do not will essen-
tially “withhold” citizenship when not fairly treated. Procedural justice has also been found to
be more strongly related to "taking charge" citizenship (beneficial, organizational-change-related
behaviors) when perceived discretion over the demonstration of these behaviors is low (i.e., they are
role prescribed; McAllister, Kamdar, Morrison, & Turban, 2007).
Interestingly, this same study found that altruistic/interpersonal citizenship was more strongly related to perceived discretion over citizenship when procedural justice was low (McAllister
et al., 2007). This finding is consistent with research by Halbesleben and Bowler (2007), in which
interpersonal citizenship was found to be used as a social support coping mechanism when condi-
tions at work are stressful.
Findings related to procedural justice climate (PJC) are consistent with some of the research
outlined above. For instance, Yang, Mossholder, and Peng (2007) found that average group levels
of “power distance,” or the extent to which individuals defer to decisions of powerful individuals
and accept power imbalances, moderate the relationship between PJC and citizenship. Groups
with high average levels of power distance will not “withhold” citizenship toward the organization
when faced with unfair treatment at the group level because they do not feel that arbitrary decisions
made by leaders justify such a reaction. Other instances of multilevel research suggest that attitude
targets moderate relationships across levels. Thus, group-level justice perceptions are more strongly
related to “higher-level” recipients of citizenship. For instance, procedural and informational justice
climate (created by independently averaging individual-level perceptions of each justice type) predicted citizenship better as the target of the citizenship "moved up" in level toward "the organization" (Liao & Rupp, 2005).
The relationship between perceived union support (union analog of perceived organizational
support, an indicator of exchange relationship quality) and citizenship was shown to become more
negative with increases in exchange ideology (the extent to which exchanges’ benefits shape behav-
ior; sensitivity to exchange quality) directed toward labor unions (Redman & Snape, 2005). This
was posited to be the result of a lack of salience in individual exchanges relative to collective soli-
darity (group-level concerns).
Other leader characteristics and practices have also been found to moderate relations between
personality and contextual variables and citizenship behavior. For instance, charismatic leadership
has been found to interact with feelings of follower belongingness to the group such that charisma
is less important in cases in which follower belongingness is high, because such leaders can supplement belongingness by making group identity salient and thereby increase citizenship (Den Hartog, De Hoogh, & Keegan, 2007). The effect of leader influence tactics on subordinates' citizenship performance has also been found to be contingent upon the quality of the relationship between leader and follower (i.e., LMX).
For instance, inspirational techniques are negatively related to citizenship for followers with poor-
quality LMX because these appeals reinforce value incongruence between the leader and the fol-
lower (Sparrowe, Soetjipto, & Kraimer, 2006). However, followers higher in LMX were encouraged to engage in more citizenship by exchange appeals, in which resources are exchanged between leader and subordinate, likely because such appeals were construed as "just another exchange" among many already positive exchanges between the leader and follower (Sparrowe et al., 2006).
As a whole, the justice and SET-related research above suggests that to the extent that some
other individual difference predictor (personal dispositions toward citizenship, defining one's job role as including helping, etc.) is not leading an employee to engage in citizenship, being treated well by the organization can compensate. Thus, high levels of certain individual differences (i.e., some people do not need procedural justice to help the organization) bound the ability of justice and SET to predict
citizenship.
In addition to SET as an explanation of citizenship behavior, researchers are beginning to rec-
ognize the role of self-enhancement as a motive for citizenship (e.g., Bowler & Brass, 2006; Yun,
Takeuchi & Liu, 2007). Research has shown that in cases where an employee’s role is ambiguous,
employees will engage in more citizenship performance toward the organization to make up for their
inability to determine which behaviors are valued (Yun et al., 2007). This relationship only holds for employees perceived as having high levels of affective organizational commitment; otherwise, their
citizenship motives are transparent and recognized as being self-interested (Yun et al., 2007).
One recent study has suggested that commitment may be less predictive than the configurations
of differing types of commitment (Sinclair, Tucker, Cullen, & Wright, 2005). The purpose of this
study was to tease apart how different profiles or levels of affective and continuance commitment
within persons predicted citizenship performance between persons. "Devoted" employees (high affective and continuance commitment) were found to have consistently higher citizenship than other profiles, and "free agents" (moderate continuance, low affective commitment) were found to have consistently
low citizenship.
Finally, we return once again to Ilies et al. (2006). These authors first demonstrated that there is
meaningful within-person variance in citizenship. Using an event sampling approach, these authors
showed that a sizable percentage of the total variance in citizenship was within-person variance.
These authors then showed that within-person variance had a good deal of overlap with within-
person variance in job attitudes such as job satisfaction. Finally, these authors found that stable
individual difference variables such as agreeableness moderated this within-person relationship
(i.e., agreeableness was a significant predictor in a level-2 random coefficient model equation). By
treating citizenship as a within-person variable, Ilies et al. cast doubt on much of the previous work
on the topic.
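For readers who want to see what such a cross-level test looks like in practice, the following sketch fits a random coefficient model to simulated daily data. Everything in it is assumed for illustration: the names (person, satis, agree, ocb), the effect sizes, and the choice of statsmodels' mixed linear model are not taken from Ilies et al. (2006); the sketch simply mirrors the logic of a level-2 trait moderating a within-person slope.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_people, n_days = 80, 12
person = np.repeat(np.arange(n_people), n_days)

agree = rng.normal(size=n_people)            # level-2 trait, one value per person
satis = rng.normal(size=n_people * n_days)   # level-1 daily job satisfaction
# Daily satisfaction matters less for highly agreeable people (cross-level moderation).
slope = 0.6 - 0.4 * agree
ocb = slope[person] * satis + rng.normal(scale=0.5, size=n_people * n_days)

df = pd.DataFrame({"person": person, "agree": agree[person], "satis": satis, "ocb": ocb})

# Random intercept and random satisfaction slope per person; the satis:agree
# fixed effect is the cross-level moderation term.
model = smf.mixedlm("ocb ~ satis * agree", data=df, groups=df["person"], re_formula="~satis")
result = model.fit()
print(result.fe_params["satis:agree"])  # negative estimate: agreeableness dampens the within-person slope

A conceptually similar model, with attitudes at level 1 and traits at level 2, is what allows a stable individual difference to be tested as a moderator of day-to-day citizenship.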
In short, there are task and organizational variables that suppress adaptability or citizenship,
change their components, muffle the influence of determinants, or transmit the effects of those
determinants. If one is to maximize the utility of an adaptability-based or citizenship-based selec-
tion system, then these variables and their impact must be recognized.

IMPACT ON ORGANIZATIONAL OUTCOMES


IMPACT OF ADAPTABILITY
How can selecting individuals who are more likely to engage in adaptive behavior and adaptive
performance impact organizational bottom-line results? One answer to this question is to posit that
having more adaptive individuals makes for more adaptive organizations. This line of thinking
views organizational adaptability as an emergent phenomenon driven by the adaptive capabilities
of organizational members (Kozlowski, Watola, Nowakowski, Kim, & Botero, 2008). Still, one can
ask what such adaptability looks like at the level of organizational outcomes. Reviewing the existing
literature on all of the ways in which organizations adapt to become more effective is well beyond
the scope of this chapter. Here, we propose (based on educated speculation) three ways in which
individual-level adaptive performance might aggregate to affect or link to organizational outcomes;
namely, (a) managing change, (b) organizational learning, and (c) maintaining customer focus.
The first of these items suggests that organizations with higher levels of individual adaptive
capacity might manage change better. As suggested earlier, modern organizations merge, grow,
shrink, or expand (often globally), thus requiring adaptation on the part of their members. If members
are better able to tolerate, manage, and leverage such changes, organizations are likely to be more
effective. Research literature supports the contention that variables such as openness to change
serve as moderators of important organizational outcomes (e.g., satisfaction, turnover; Wanberg &
Banas, 2000).
In addition, constant change from technologies, globalization, restructuring, and the like requires organizational members at various levels of aggregation (individuals, teams/units, entire organizations)
to learn new skills, tasks, and technologies. Thus, the popular notion of a “learning organization”
may depend largely on the adaptive capacity of its constituent members (Redding, 1997). Lastly,
as markets, environments, and missions change, organizations and their members must refocus on
what customers want, value, and need. Thus, we highlight maintaining a focus on customers as a
final potential organizational outcome related to adaptive performance. As individual performers
seek to adaptively sense and respond to customer demands, organizational effectiveness is likely
enhanced.

IMPACT OF CITIZENSHIP
Citizenship performance does indeed contribute to organizational effectiveness (e.g., George &
Bettenhausen, 1990; Karambayya, 1990, as cited in Podsakoff et al., 2000; Koys, 2001; Podsakoff,
Ahearne, & MacKenzie, 1997; Podsakoff & MacKenzie, 1994; Podsakoff et al., 2000). This is espe-
cially true in cases in which work tasks are interdependent in nature as highly interdependent tasks
are facilitated by helping behaviors and cooperation (Bachrach, Powell, Collins, & Richey, 2006b).
Bachrach et al. (2006b) found that high levels of interdependence led to higher ratings of group-level performance when moderate levels of OCB were observed. Other research has confirmed that fostering citizenship leads to positive organizational outcomes. Specifically, service-related citizenship partially mediated the relationship between social-exchange-informed HR practices and organizational productivity and turnover (Sun, Aryee, & Law, 2007).
Payne and Webber (2006) found that service-oriented citizenship (i.e., toward customers) is related
to customer attitudes such as satisfaction, loyalty intentions, relationship tenure, positive word-of-
mouth, and reduced complaining. Similar to Sun et al. (2007), these results suggested that positive
social exchanges and the resultant attitudes (i.e., employee job satisfaction) predict citizenship.
Research has demonstrated that citizenship performance is important in supervisory ratings of
performance (Borman et al., 1995; Conway, 1996; MacKenzie, Podsakoff, & Fetter, 1993; Rotundo & Sackett, 2002; Werner, 1994). Group task interdependence has been found to moderate the
effects of OCB on performance appraisals (Bachrach, Powell, Bendoly, & Richey, 2006). Although
supervisors may typically ignore or deemphasize the centrality of OCB to overall job performance,
when tasks are highly interdependent, the need for cooperation and helping is more difficult to
disregard. Thus, with higher levels of interdependence, the influence of citizenship on performance
appraisal is more pronounced. However, other research suggests that the influence of citizenship on
performance appraisal is affected by social comparison. If an employee’s workgroup exhibits high
average levels of citizenship, any given employee’s levels are comparatively lower and tend to have
weaker associations with appraisal outcomes than employees in workgroups with lower average
levels of citizenship (Bommer, Dierdorff, & Rubin, 2007).
Whiting, Podsakoff, and Pierce (2008) decomposed the citizenship domain into “helping” or
altruistic citizenship, “voice” (similar to “taking charge”), and “loyalty” to gauge their independent
effects on performance appraisal outcomes. This study found that independent of task performance,
all three citizenship dimensions predicted appraisals, with loyalty having the strongest association.
Interestingly, a three-way interaction was found such that when helping is low and task performance
high, voice loses its association with positive appraisals. Because voice is more confrontational than
the other forms of citizenship, when an employee is not contributing in other areas of their job, their
challenges to organizational routine are undervalued.
Ferrin, Dirks, and Shah (2006) found that OCB-Is, or interpersonal citizenship behaviors, were predictive of ratings of interpersonal trust, especially in cases where the social networks of the rater and ratee were similar. This is because interpersonal citizenship behaviors are explicitly altruistic in nature and are therefore taken to be indicative of the trustworthiness of the person engaging in them.

TOO MUCH OF A GOOD THING?


TOO MUCH ADAPTABILITY?
In some performance environments, the study of excessive, unpredictable, and/or ineffective
changes in response to perceptions of altered situations may prove useful. For example, in military
settings, it may be important to research the nature of shifts between following standard operating
procedures (e.g., doctrine) and engaging in nonroutine acts of adaptive performance. It is possible
that an overemphasis on adaptability can lead to individuals, teams, or organizations that are out
of control or fail to institutionalize effective practices. Although we are not aware of research that
directly addresses the boundary conditions under which adaptive performance becomes too adap-
tive, it should be one consideration in studying adaptive performance. In addition, there is little
extant evidence regarding subgroup differences in adaptive performance. Future research should
address this gap.

TOO MUCH CITIZENSHIP?


Research has shown that there are smaller subgroup differences on citizenship than on technical per-
formance (Hattrup, Rock, & Scalia, 1997; Murphy & Shiarella, 1997; Ployhart & Holtz, 2008). This
would suggest that a criterion space composed of a larger percentage of citizenship should yield smaller differences among demographic subgroups. The story is not so simple. Heilman and
Chen (2005) found that there are gender differences in expectations regarding the ratio of technical
to citizenship performance such that women are expected to engage in more and better citizenship
than are men. Specifically, they found that the positive effects of engaging in altruistic OCB were
observed in the performance appraisals of men but not of women. Conversely, women were penal-
ized by raters for not engaging in citizenship. The authors attributed this to gender differences in
role expectations. When women do not engage in altruistic citizenship, they are seen as failing to
fulfill their roles. When men do engage in citizenship, they are rewarded because their behavior is
seen as “extra-role.” This creates various problems for selection. First, it may skew validation results
by introducing criterion inflation or deflation depending on the gender of the incumbent. Second, it
creates a need for different weighting schemes for selection tests depending on gender, a need that cannot legally be met.
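As a rough, hypothetical illustration of the earlier point about criterion composition (a worked example of our own, not a result from the studies cited), the standardized subgroup difference on a unit-weighted composite of k criterion dimensions with component differences \(d_i\) and average intercorrelation \(\bar{\rho}\) is

\[
d_{\mathrm{composite}} = \frac{\sum_{i=1}^{k} d_i}{\sqrt{k + k(k-1)\,\bar{\rho}}}.
\]

With illustrative values of \(d = 0.6\) for technical performance, \(d = 0.2\) for citizenship, and \(\bar{\rho} = .40\), the composite difference is \(0.8/\sqrt{2.8} \approx 0.48\), smaller than the difference on technical performance alone; gendered rating expectations of the kind Heilman and Chen (2005) describe would distort the citizenship \(d\) and therefore the composite.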
Citizenship has also been linked to employee race. Jones and Schaubroeck (2004) found that rela-
tive to White employees, non-White employees tended to have lower self- and supervisor-reported
OCB. However, this relationship was partially mediated by job satisfaction, negative affectivity,
and coworker social support. Further, citizenship has been found to be related to employee age. In a
meta-analysis by Ng and Feldman (2008), age was found to have several nonzero correlations with
self- and supervisor-rated dimensions of citizenship.
Citizenship behavior has also been linked to increased amounts of work-family conflict and
stress/strain, especially when an employee’s individual initiative is high. If an individual strives not only to do their job well but also to be a good organizational citizen (i.e., to engage in OCB-Os, which often consume time otherwise given to family), they are likely to experience role conflict with their family life, an effect that is especially strong for working women (Bolino & Turnley, 2005).

CONCLUSIONS
The purpose of this chapter was to review recent research on two performance dimensions that
represent departures from traditional job performance models: adaptive performance and citizen-
ship performance. For each of these dimensions, we began by offering definitions that clarify their
nature and their distinctiveness. We then reviewed research on distal and proximal predictors of
these dimensions. Next, we discussed variables that might moderate the relationships between these
dimensions and other variables. Finally, we discussed the consequences of these dimensions.
This chapter should make clear that adaptive and citizenship performance are important. One
reason that they are important is that they relate to variables that are of great interest to organi-
zations, such as performance appraisal, group effectiveness, change management, and stressors.
Another reason is that they are distinct from alternative dimensions, conceptually and nomologi-
cally. If, as seems to be the case, these dimensions are underrepresented in performance appraisals,
then the weighting of dimensions in those systems is suboptimal. In addition to the obvious conse-
quences for organizational productivity, this would also result in discrimination against protected
groups to the degree that these dimensions create smaller subgroup differences than do the dimen-
sions that are commonly included in appraisal instruments.

Assuming that these dimensions are important, more work must be done to determine how
important they are (i.e., optimal weighting) and how relative importance varies with situational
or organizational characteristics. More research must also be done to identify the determinants of
these dimensions. Because many of these determinants are malleable (e.g., skills, leader behaviors),
research must also evaluate interventions designed to increase adaptive performance and citizen-
ship through the improvement of these determinants.
In closing, we would point out that there is no reason to stop here. If performance is to be under-
stood, then adaptive performance and citizenship must receive specific attention. However, there
are bound to be other dimensions of performance that should also be added to the mix. If it can be
demonstrated that a new dimension is conceptually distinct from existing dimensions, has impor-
tant consequences, and has a set of determinants that differ from those of other dimensions, then
that new dimension should also be added to our models of performance.

REFERENCES
Alge, B. J., Ballinger, G. A., Tangirala, S., & Oakley, J. L. (2006). Information privacy in organizations:
Empowering creative and extrarole performance. Journal of Applied Psychology, 91, 221–232.
Aryee, S., Chen, Z. X., Sun, L.-Y., & Debrah, Y. A. (2007). Antecedents and outcomes of abusive supervision:
Test of a trickle-down model. Journal of Applied Psychology, 92, 191–201.
Bachrach, D. G., Powell, B. C., Bendoly, E., & Richey, R. G. (2006a). Organizational citizenship behavior and
performance evaluations: Exploring the impact of task interdependence. Journal of Applied Psychology,
91, 193–201.
Bachrach, D. G., Powell, B. C., Collins, B. J., & Richey, R. G. (2006b). Effects of task interdependence on
the relationship between helping behavior and group performance. Journal of Applied Psychology, 91,
1396–1405.
Barnard, C. I. (1938). The functions of the executive. Cambridge, MA: Harvard University Press.
Bateman, T. S., & Organ, D. W. (1983). Job satisfaction and the good soldier: The relationship between affect
and employee “citizenship.” Academy of Management Journal, 26, 587–595.
Bergman, M. E., Donovan, M. A., & Drasgow, F. (2001, April). A framework for assessing contextual perfor-
mance. Paper presented at the sixteenth annual conference of the Society of Industrial and Organizational
Psychology, San Diego, CA.
Bettencourt, L. A., Gwinner, K. P., & Meuter, M. L. (2001). A comparison of attitude, personality, and knowledge
predictors of service-oriented organizational citizenship behaviors. Journal of Applied Psychology, 86,
29–41.
Blau, P. M. (1964). Exchange and power in social life. New York, NY: Wiley.
Bolino, M. C., & Turnley, W. H. (2005). The personal costs of citizenship behavior: The relationship between
individual initiative and role overload, job stress, and work-family conflict. Journal of Applied Psychology,
90, 740–748.
Bommer, W. H., Dierdorff, E. C., & Rubin, R. S. (2007). Does prevalence mitigate relevance? The mod-
erating effect of group-level OCB on employee performance. Academy of Management Journal, 50,
1481–1494.
Borman, W. C., Buck, D. E., Hanson, M. A., Motowidlo, S. J., Stark, S., & Drasgow, F. (2001a). An examina-
tion of the comparative reliability, validity, and accuracy of performance ratings made using computer-
ized adaptive rating scales. Journal of Applied Psychology, 86, 965–973.
Borman, W. C., & Motowidlo, S. J. (1993). Expanding the criterion domain to include elements of contextual
performance. In N. Schmitt & W. C. Borman (Eds.), Personnel selection in organizations (pp. 71–98).
San Francisco, CA: Jossey-Bass.
Borman, W. C., & Penner, L. A. (2001). Citizenship performance: Its nature, antecedents, and motives. In
B. W. Roberts & R. Hogan (Eds.), Personality psychology in the workplace. Decade of behavior (pp.
45–61). Washington, DC: American Psychological Association.
Borman, W. C., Penner, L. A., Allen, T. D., & Motowidlo, S. J. (2001b). Personality predictors of citizenship
performance. International Journal of Selection and Assessment, 9, 52–69.
Borman, W. C., White, L. A., & Dorsey, D. W. (1995). Effects of ratee task performance and interpersonal factors on supervisor and peer performance ratings. Journal of Applied Psychology, 80, 168–177.
Borman, W. C., White, L. A., Pulakos, E. D., & Oppler, S. H. (1991). Models of supervisory job performance
ratings. Journal of Applied Psychology, 76, 863–872.
Bowler, W. M., & Brass, D. J. (2006). Relational correlates of interpersonal citizenship behavior: A social net-
work perspective. Journal of Applied Psychology, 91, 70–82.
Brief, A. P., & Motowidlo, S. J. (1986). Prosocial organizational behaviors. Academy of Management Review,
11, 710–725.
Brooks-Shesler, L., LaPort, K., Dudley, N. M., & Cortina, J. M. (2009). Knowing how to get along: Knowledge
and skill predictors of cooperative performance. Manuscript submitted for publication.
Burke, S., Pierce, L., & Salas, E. (2006). Understanding adaptability: A prerequisite for effective performance
within complex environments. Cambridge, MA: Elsevier Science.
Burke, C. S., Stagl, K. C., Salas, E., Pierce, L., & Kendall, L. (2006). Understanding team adaptation: A con-
ceptual analysis and model. Journal of Applied Psychology, 91, 1189–1207.
Campbell, J. P. (1999). The definition and measurement of performance in the new age. In D. R. Ilgen, & E. D.
Pulakos (Eds.), The changing nature of performance. Implications for staffing, motivation, and develop-
ment (pp. 21–55). San Francisco, CA: Jossey-Bass.
Campbell, J. P., McCloy, R. A., Oppler, S. H., & Sager, C. E. (1993). A theory of performance. In N. Schmitt, W. C.
Borman, & Associates (Eds.) Personnel selection in organizations (pp. 35–70). San Francisco: Jossey-Bass.
Chen, G., Thomas, B. A., & Wallace, J. C. (2005). A multilevel examination of the relationships among training
outcomes, mediating regulatory processes, and adaptive performance. Journal of Applied Psychology,
90, 827–841.
Colbert, A. E., Mount, M. K., Harter, J. K., Witt, L. A., & Barrick, M. R. (2004). Interactive effects of person-
ality and perceptions of the work situation on workplace deviance. Journal of Applied Psychology, 89,
599–609.
Coleman, V. I., & Borman, W. C. (2000). Investigating the underlying structure of the citizenship performance
domain. Human Resource Management Review, 10, 25–44.
Conway, J. M. (1996). Additional construct validity evidence for the task-contextual performance distinction.
Human Performance, 9, 309–329.
Cropanzano, R., & Mitchell, M. S. (2005). Social exchange theory: An interdisciplinary review. Journal of
Management, 31, 874–900.
DeArmond, S., Tye, M., Chen, P. Y., Krauss, A., Rogers, D. A., & Sintek, E. (2006). Age and gender stereo-
types: New challenges in a changing workplace and workforce. Journal of Applied Social Psychology,
36, 2184–2214.
Den Hartog, D. N., De Hoogh, A. H. B., & Keegan, A. E. (2007). The interactive effects of belongingness and
charisma on helping and compliance. Journal of Applied Psychology, 92, 1131–1139.
Dineen, B. R., Lewicki, R. J., & Tomlinson, E. C. (2006). Supervisory guidance and behavioral integrity:
Relationships with employee citizenship and deviant behavior. Journal of Applied Psychology, 91,
622–635.
DuBois, C. L. Z., Sackett, P. R., Zedeck, S., & Fogli, L. (1993). Further exploration of typical and maximum
performance criteria: Definitional issues, prediction, and White-Black differences. Journal of Applied
Psychology, 78, 205–211.
Dudley, N. M., & Cortina, J. M. (2008). Knowledge and skills that facilitate the personal support dimension of
citizenship. Journal of Applied Psychology, 93, 1249–1270.
Dudley, N. M., & Cortina, J. M. (2009). Knowledge and skills that predict helping. Manuscript submitted for
publication.
Entin, E. E., Diedrich, F. J., & Rubineau, B. (2003). Adaptive communication patterns in different organiza-
tional structures. In Proceedings of the Human Factors and Ergonomics Society (HFES) 47th Annual
Meeting (pp. 405–409). Santa Monica, CA: HFES.
Ferrin, D. L., Dirks, K. T., & Shah, P. P. (2006). Direct and indirect effects of third-party relationships on inter-
personal trust. Journal of Applied Psychology, 91, 870–883.
Ford, J. K., Smith, E. M., Weissbein, D. A., Gully, S. M., & Salas, E. (1998). Relationships of goal orientation,
metacognitive activity, and practice strategies with learning outcomes and transfer. Journal of Applied
Psychology, 83, 218–233.
George, J. M., & Bettenhausen, K. (1990). Understanding prosocial behaviour, sales performance, and turn-
over: A group-level analysis in a service context. Journal of Applied Psychology, 75(6), 698–709.
Glomb, T. M., & Welsh, E. T. (2005). Can opposites attract? Personality heterogeneity in supervisor-subordi-
nate dyads as a predictor of subordinate outcomes. Journal of Applied Psychology, 90, 749–757.
Graham, J.W. (1991). An essay on organizational citizenship behavior. Employee Responsibilities and Rights
Journal, 4, 249–270.
Griffin, M. A., Neal, A., & Parker, S. K. (2007). A new model of work role performance: Positive behavior in
uncertain and interdependent contexts. Academy of Management Journal, 50, 327–347.
Hackman, J. R., & Oldham, G. R. (1980). Work redesign. Reading, MA: Addison-Wesley.
Halbesleben, J. R. B., & Bowler, W. M. (2007). Emotional exhaustion and job performance: The mediating role
of motivation. Journal of Applied Psychology, 92, 93–106.
Heilman, M. E., & Chen, J. J. (2005). Same behavior, different consequences: Reactions to men’s and women’s
altruistic citizenship behavior. Journal of Applied Psychology, 90, 431–441.
Hesketh, B., & Neal, A. (1999). Technology and performance. In D. R. Ilgen & E. D. Pulakos (Eds.), The
changing nature of performance. Implications for staffing, motivation, and development (pp. 21–55). San
Francisco: Jossey-Bass.
Hoffman, B. J., Blair, C. A., Meriac, J. P., & Woehr, D. J. (2007). Expanding the criterion domain? A quantita-
tive review of the OCB literature. Journal of Applied Psychology, 92, 555–566.
Hunter, J. E. (1983). A causal model of cognitive ability, job knowledge, job performance, and supervisor
ratings. In F. J. Landy, S. Zedeck, & J. Cleveland (Eds.), Performance measurement and theory (pp.
257–266). Hillsdale, NJ: Lawrence Erlbaum.
Ilgen, D. R., & Pulakos, E. D. (1999). Employee performance in today’s organizations. In D. R. Ilgen & E. D.
Pulakos (Eds.), The changing nature of performance: Implications for staffing, motivation, and develop-
ment. San Francisco, CA: Jossey-Bass.
Ilies, R., Nahrgang, J. D., & Morgeson, F. P. (2007). Leader-member exchange and citizenship behaviors: A
meta-analysis. Journal of Applied Psychology, 92, 269–277.
Ilies, R., Scott, B. A., & Judge, T. A. (2006). The interactive effects of personal traits and experienced states on
intraindividual patterns of citizenship behavior. Academy of Management Journal, 49, 561–575.
Jackson, C. L., Colquitt, J. A., Wesson, M. J., & Zapata-Phelan, C. P. (2006). Psychological collectivism: A
measurement validation and linkage to group member performance. Journal of Applied Psychology, 91,
884–899.
Johnson, J. W. (2001). The relative importance of task and contextual performance dimensions to supervisor
judgments of overall performance. Journal of Applied Psychology, 86, 984–996.
Johnson, J. W. (2003). Toward a better understanding of the relationship between personality and individual
job performance. In M. R. Barrick & A. M. Ryan (Eds.), Personality and work: Reconsidering the role of
personality in organizations (pp. 83–120). San Francisco, CA: Jossey-Bass.
Joireman, J., Kamdar, D., Daniels, D., & Duell, B. (2006). Good citizens to the end? It depends: Empathy and
concern with future consequences moderate the impact of a short-term time horizon on organizational
citizenship behaviors. Journal of Applied Psychology, 91, 1307–1320.
Judge, T. A., LePine, J. A., & Rich, B. L. (2006). Loving yourself abundantly: Relationship of the narcissistic
personality to self- and other perceptions of workplace deviance, leadership, and task and contextual
performance. Journal of Applied Psychology, 91, 762–776.
Kamdar, D., McAllister, D. J., & Turban, D. B. (2006). “All in a day’s work”: How follower individual differ-
ences and justice perceptions predict OCB role definitions and behavior. Journal of Applied Psychology,
91, 841–855.
Kamdar, D., & Van Dyne, L. (2007). The joint effects of personality and workplace social exchange relation-
ships in predicting task performance and citizenship performance. Journal of Applied Psychology, 92,
1286–1298.
Kanfer, R. (1990). Motivation theory and industrial and organizational psychology. In M. D. Dunnette & L. M.
Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 1, pp. 75–170). Palo
Alto, CA: Consulting Psychologists Press.
Karambayya, R. (1990). Contexts for organizational citizenship behavior: Do high performing and satisfy-
ing units have better ‘citizens’. York University working paper. See Podsakoff, MacKenzie, Paine, &
Bachrach (2000).
Katz, D. (1964). Motivational basis of organizational behavior. Behavioral Science, 9, 131–146.
Koys, D. J. (2001). The effects of employee satisfaction, organizational citizenship behavior, and turnover on
organizational effectiveness: A unit-level, longitudinal study. Personnel Psychology, 54, 101–114.
Kozlowski, S. W. J., Gully, S. M., Brown, K. G., Salas, E., Smith, E. M., & Nason, E. R. (2001). Effects of
training goals and goal orientation traits on multi-dimensional training outcomes and performance adapt-
ability. Organizational Behavior and Human Decision Processes, 85, 1–31.
Kozlowski, S. W. J., Watola, D., Nowakowski, J. M., Kim, B., & Botero, I. (2008). Developing adaptive teams:
A theory of dynamic team leadership. In E. Salas, G. F. Goodwin, & C. S. Burke (Eds.), Team effective-
ness in complex organizations: Cross-disciplinary perspectives and approaches (pp. 113–155). Mahwah,
NJ: Lawrence Erlbaum.
LePine, J. A., Colquitt, J. A., & Erez, A. (2000). Adaptability to changing task contexts: Effects of general cog-
nitive ability, conscientiousness, and openness to experience. Personnel Psychology, 53, 563–593.
LePine, J. A., Erez, A., & Johnson, D. E. (2002). The nature and dimensionality of organizational citizenship
behavior: A critical review and meta-analysis. Journal of Applied Psychology, 87, 52–65.
Liao, H., & Rupp, D. E. (2005). The impact of justice climate and justice orientation on work outcomes:
A cross-level multifoci framework. Journal of Applied Psychology, 90, 242–256.
Locke, E. A., & Latham, G. P. (1990). A theory of goal setting and task performance. Englewood Cliffs, NJ:
Prentice Hall.
MacKenzie, S. B., Podsakoff, P. M., & Fetter, R. (1993). The impact of organizational citizenship behavior on
evaluations of salesperson performance. Journal of Marketing, 57, 70–80.
McAllister, D. J., Kamdar, D., Morrison, E. W., & Turban, D. B. (2007). Disentangling role perceptions: How
perceived role breadth, discretion, instrumentality, and efficacy relate to helping and taking charge.
Journal of Applied Psychology, 92, 1200–1211.
Mitchell, T. R., & Daniels, E. (2003). Motivation. In W. C. Borman, D. R., Ilgen, & R. J. Klimoski (Eds.),
Comprehensive handbook of psychology: Vol. 12. Industrial and organizational psychology (pp. 225–
254). New York, NY: Wiley.
Moon, H., Hollenbeck, J. R., Humphrey, S. E., Ilgen, D. R., West, B. J., Ellis, A. P. J., & Porter, C. O. L. H.
(2004). Asymmetric adaptability: Dynamic team structures as one-way streets. Academy of Management
Journal, 47, 681–695.
Moon, H., Kamdar, D., Mayer, D. M., & Takeuchi, R. (2008). Me or we? The role of personality and justice as
other-centered antecedents to innovative citizenship behaviors within organizations. Journal of Applied
Psychology, 93, 84–94.
Morgeson, F. P., Reider, M. H., & Campion, M. A. (2005). Selecting individuals in team settings: The impor-
tance of social skills, personality characteristics, and teamwork knowledge. Personnel Psychology, 58,
583–611.
Motowidlo, S. J., Borman, W. C., & Schmit, M. J. (1997). A theory of individual differences in task and contextual performance. Human Performance, 10, 71–83.
Mueller-Hanson, R. A., White, S. S., Dorsey, D. W., & Pulakos, E. D. (2005). Training adaptable leaders:
Lessons from research and practice (ARI Research Report 1844). Arlington, VA: U.S. Army Research
Institute for the Behavioral and Social Sciences.
Organ, D. W. (1988). A restatement of the satisfaction-performance hypothesis. Journal of Management, 14,
547–557.
Organ, D. W., & Ryan, K. (1995). A meta-analytic review of attitudinal and dispositional predictors of organi-
zational citizenship behavior. Personnel Psychology, 48, 775–802.
Payne, S. C., & Webber, S. S. (2006). Effects of service provider attitudes and employment status on citi-
zenship behaviors and customers’ attitudes and loyalty behavior. Journal of Applied Psychology, 91,
365–378.
Piccolo, R. F., & Colquitt, J. A. (2006). Transformational leadership and job behaviors: The mediating role of
core job characteristics. Academy of Management Journal, 49, 327–340.
Ployhart, R. E., & Bliese, P. D. (2006). Individual adaptability (I-ADAPT) theory: Conceptualizing the anteced-
ents, consequences, and measurement of individual differences in adaptability. In S. Burke, L. Pierce, &
E. Salas (Eds.), Understanding Adaptability: A Prerequisite for Effective Performance within Complex
Environments (pp. 3–39). New York, NY: Elsevier.
Podsakoff, P. M., Ahearne, M., & MacKenzie, S. B. (1997). Organizational citizenship behavior and the quan-
tity and quality of work group performance. Journal of Applied Psychology, 82, 262–270.
Podsakoff, P. M., & MacKenzie, S. B. (1997). Impact of organizational citizenship behavior on organizational
performance: A review and suggestions for future research. Human Performance, 10, 133–151.
Podsakoff, P. M., MacKenzie, S. B., Paine, J. B., & Bachrach, D. G. (2000). Organizational citizenship behav-
iors: A critical review of the theoretical and empirical literature and suggestions for future research.
Journal of Management, 26, 513–563.
Porath, C. L., & Erez, A. (2007). Does rudeness really matter? The effects of rudeness on task performance and
helpfulness. Academy of Management Journal, 50, 1181–1197.
Pulakos, E. D., Arad, S., Donovan, M. A., & Plamondon, K. E. (2000). Adaptability in the work
place: Development of a taxonomy of adaptive performance. Journal of Applied Psychology, 85,
612–624.
Pulakos, E. D., Schmitt, N., Dorsey, D. W., Hedge, J. W., & Borman, W. C. (2002). Predicting adaptive perfor-
mance: Further tests of a model of adaptability. Human Performance, 15, 299–323.
Raybourn, E. M., Deagle, E., Mendini, K., & Heneghan, J. (2005). Adaptive thinking and leadership simulation game training for Special Forces officers (I/ITSEC 2005). Proceedings, Interservice/Industry Training, Simulation and Education Conference, November 28–December 1, Orlando, FL.
Redding, J. (1997). Hardwiring the learning organization. Training & Development, 51, 61–67.
Redman, T., & Snape, E. (2005). Exchange ideology and member-union relationships: An evaluation of mod-
eration effects. Journal of Applied Psychology, 90, 765–773.
Rioux, S. M., & Penner, L. A. (2001). The causes of organizational citizenship behavior: A motivational analy-
sis. Journal of Applied Psychology, 86, 1306–1314.
Rosen, C. C., Levy, P. E., & Hall, R. J. (2006). Placing perceptions of politics in the context of the feed-
back environment, employee attitudes, and job performance. Journal of Applied Psychology, 91,
211–220.
Rotundo, M., & Sackett, P. R. (2002). The relative importance of task, citizenship, and counterproductive
performance to global ratings of job performance: A policy-capturing approach. Journal of Applied
Psychology, 87, 66–80.
Schmidt, F. L., Hunter, J. E., & Outerbridge, A. N. (1986). Impact of job experience and ability on job
knowledge, work sample performance, and supervisory ratings of job performance. Journal of Applied
Psychology, 71, 432–439.
Schmit, M. J., Motowidlo, S. J., Degroot, T., Cross, T., & Kiker, D. S. (1996, April). Explaining the relation-
ship between personality and job performance. Paper presented at the annual meeting of the Society for
Industrial and Organizational Psychology, San Diego, CA.
Schmitt, N., & Chan, D. (2006). Situational judgment tests: Method or construct? In J. Weekley & R. E. Ployhart (Eds.), Situational judgment tests (pp. 135–156). Mahwah, NJ: Lawrence Erlbaum.
Schmitt, N., Cortina, J. M., Ingerick, M. J., & Wiechmann, D. (2003). Personnel selection and employee perfor-
mance. Handbook of psychology: Industrial and organizational psychology (Vol. 12, pp. 77–105). New
York, NY: John Wiley & Sons, Inc.
Sinclair, R. R., Tucker, J. S., Cullen, J. C., & Wright, C. (2005). Performance differences among four organiza-
tional commitment profiles. Journal of Applied Psychology, 90, 1280–1287.
Smith, C. A., Organ, D. W., & Near, J. P. (1983). Organizational citizenship behavior: Its nature and anteced-
ents. Journal of Applied Psychology, 68, 653–663.
Sparrowe, R. T., Soetjipto, B. W., & Kraimer, M. L. (2006). Do leaders’ influence tactics relate to members’ helping behavior? It depends on the quality of the relationship. Academy of Management Journal, 49, 1194–1208.
Stewart, G. L., & Nandkeolyar, A. (2006). Adaptability and intraindividual variation in sales outcomes:
Exploring the interactive effects of personality and environmental opportunity. Personnel Psychology,
59, 307–332.
Sun, L.-Y., Aryee, S., & Law, K. S. (2007). High-performance human resource practices, citizenship behav-
ior, and organizational performance: A relational perspective. Academy of Management Journal, 50,
558–577.
Van Scotter, J. R., & Motowidlo, S. J. (1996). Interpersonal facilitation and job dedication as separate facets of
contextual performance. Journal of Applied Psychology, 81, 525–531.
Venkataramani, V., & Dalal, R. S. (2007). Who helps and harms whom? Relational antecedents of interpersonal
helping and harming in organizations. Journal of Applied Psychology, 92, 952–966.
Wanberg, C.R., & Banas, J. (2000). Predictors and outcomes of openness to changes in a reorganizing work-
place. Journal of Applied Psychology, 85, 132–142.
Wang, H., Law, K. S., Hackett, R. D., Wang, D., & Chen, Z. X. (2005). Leader-member exchange as a mediator
of the relationship between transformational leadership and followers’ performance and organizational
citizenship behavior. Academy of Management Journal, 48, 420–432.
Werner, J. M. (1994). Dimensions that make a difference: Examining the impact of in-role and extra-role
behaviors on supervisory ratings. Journal of Applied Psychology, 79, 98–107.
White, S. S., Mueller-Hanson, R. A., Dorsey, D. W., Pulakos, E. D., Wisecarver, M. M., Deagle, E. A., &
Mendini, K. G. (2005). Developing adaptive proficiency in Special Forces officers. (ARI Research Report
1831). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Whiting, S. W., Podsakoff, P. M., & Pierce, J. R. (2008). Effects of task performance, helping, voice, and orga-
nizational loyalty on performance appraisal ratings. Journal of Applied Psychology, 93, 125–139.
Williams, L. J., & Anderson, S. E. (1991). Job satisfaction and organizational commitment as predictors of
organizational citizenship and in-role behaviors. Journal of Management, 17, 601–617.
Wood, R. E., & Locke, E. A. (1990). Goal setting and strategy effects on complex tasks. In B. Staw & L. L.
Cummings (Eds.), Research in organizational behavior (Vol. 12, pp. 73–110). Greenwich, CT: JAI
Press.
Yang, J., Mossholder, K. W., & Peng, T. K. (2007). Procedural justice climate and group power distance: An
examination of cross-level interaction effects. Journal of Applied Psychology, 92, 681–692.
Yun, S., Takeuchi, R., & Liu, W. (2007). Employee self-enhancement motives and job performance behaviors:
Investigating the moderating effects of employee role ambiguity and managerial perceptions of employee
commitment. Journal of Applied Psychology, 92, 745–756.
Zaccaro, S. J., & Banks, D. (2004). Leader visioning and adaptability: Bridging the gap between research and practice on developing the ability to manage change. Human Resource Management, 43, 367–380.
23 Counterproductive Work Behavior and Withdrawal

Maria Rotundo and Paul E. Spector

Counterproductive work behavior (CWB) is an umbrella term that refers to a wide range of acts
conducted by employees that are harmful to organizations and their stakeholders. Whereas some
specific acts of CWB, most notably withdrawal behaviors such as absence and turnover, have been
investigated for decades, the emergence of the study of CWB as a broad class of behaviors is a
recent development. Unfortunately, the literature on behaviors that can be classified as CWB is
broad and disjointed and is in need of better integration. As we will note, there have been several
terms used to refer to conceptually distinct but operationally overlapping if not identical constructs
that are often studied in isolation from one another.
Our goal in this handbook chapter is to provide an integrative overview that links the various forms of CWB that have been studied in the literature. We will begin with a historical overview of the different concepts that can be subsumed under the CWB term.
Second, we will discuss measurement issues and how CWB has been studied. Third, we will dis-
cuss potential antecedents of CWB that arise from the work environment and the person. Fourth,
we will discuss potential consequences of CWB to organizations and stakeholders, including indi-
vidual employees, groups, and customers/clients. Finally, we will take a look forward and suggest
areas that need attention by CWB researchers.

NATURE OF CWB
DEFINITIONS
CWB is defined as intentional behavior by employees that harms or intends to harm organizations
and people in organizations, including employees and customers/clients (Spector & Fox, 2005).
Excluded are behaviors that are accidental or not under a person’s control. Having an accident would
not be CWB unless it occurred because of willful failure to follow required work protocols (e.g.,
failure to wear safety equipment), nor would poor performance be CWB if it were due to lack of
ability to perform job tasks despite efforts to do so. This distinguishes CWB from task performance
itself.
The CWB term reflects a category of disparate behaviors that share the characteristic of being
harmful. This includes nasty interpersonal behavior (e.g., insults and yelling at someone), behaviors
directed toward inanimate objects (e.g., sabotage and theft), purposely doing work incorrectly, and
withdrawal from the organization (e.g., absence or lateness). These varied behaviors fit within the
framework of CWB, but they are not all manifestations of the same thing. Furthermore, these vari-
ous classes of behavior have been studied in isolation and have their own literatures distinct from
CWB and related terms.
Research interest in these various behaviors was sporadic until at least 1980, with relatively few
empirical articles being found before that time. A keyword check of the PsycINFO database from 1865 until 2007 for the earliest empirical journal article on withdrawal finds a turnover study in
1925 (Bills) and an absence study in 1971 (Chadwick-Jones, Brown, Nicholson, & Sheppard). The
earliest theft study in our journal literature was Rosenbaum (1976). Entering the keyword “employee
absenteeism” returned 43 hits between 1865 and 1974, but 1,115 hits from 1975 to 2007; for turnover
the numbers of hits were 70 and 1910, respectively, for these time periods. Clearly there has been an
acceleration of interest in these topics after the mid-1970s.
The idea of combining disparate acts into a single construct is fairly recent. Perhaps the earliest
empirical example was Spector (1975), who created a checklist of CWBs that he termed “organi-
zational aggression.” Basing his work on the social psychological aggression research, he argued
that these behaviors were a response to frustration at work. Hollinger and Clark (1982) applied the
sociological concept of deviance to the workplace, studying behavior that represented deviations
from the norms of organizations or society. Included were many of the same behaviors we define as
CWB. Few other researchers showed interest in these behaviors until the mid-1990s. Of particular
relevance is Robinson and Bennett’s (1995) expanded discussion of the deviance concept, suggesting that it is meaningful to divide these behaviors by target: the organization (subsequently referred to as CWBO), which includes behaviors such as calling in sick when not ill or destroying organizational property, versus people (subsequently referred to as CWBP), which includes making nasty comments to coworkers or physical violence. Several researchers following Robinson and Bennett
adopted the deviance terminology in their work on behaviors that we classify here as CWB.
From 1995 on there was an explosion of interest in the harmful acts by employees and a pro-
liferation of terms and perspectives (see Pearson, Andersson, & Porath, 2000, and Spector & Fox,
2005, for contrasts among the various terms). Furthermore, whereas the early researchers focused
on actors, some of the new generation of researchers focused on the targets of these behaviors.
On the actor side, of particular note are the concepts of retaliation (Skarlicki & Folger, 1997),
which includes harmful acts in response to injustice, and revenge (Bies & Tripp, 1996), which
includes provoked behaviors intended to even the score. Other terms that can be found in the lit-
erature include antisocial behavior (Giacalone & Greenberg, 1997), counterproductive job perfor-
mance (Collins & Griffin, 1998), counterproductive employee behavior (Duffy, Ganster, & Shaw,
1998), counterproductive performance (Rotundo & Sackett, 2002), dysfunctional behavior (Griffin,
O’Leary-Kelly, & Collins, 1998), employee delinquency (Hogan & Hogan, 1989), and misbehavior
(Vardi & Weitz, 2002). A similar phenomenon has been studied as employee dishonesty (Jones,
1981) or integrity (Brown, Jones, Terris, & Steffy, 1987) from the perspective of employee selection
and for the purposes of weeding out job applicants who have the potential to engage in a variety of
CWBs. The term CWB itself seems to be catching on recently as a generic term that encompasses
most of these other terms. CWB first shows up in two places at the same time from independent
perspectives. Sackett and DeVore (2001) referred to counterproductive workplace behavior that they
view as an aspect of job performance; specifically, it consists of intentional acts that run counter to
an organization’s legitimate interests. Fox, Spector, and Miles (2001) took an employee perspective
in defining CWB as acts that harm organizations or people, whether or not they have implications for performance.
On the target side, work in this area first began in Europe with research concerning abusive and
harmful behaviors that individuals endure, including bullying (Einarsen, Raknes, Matthiesen, &
Hellesoy, 1996) and mobbing (Knorz & Zapf, 1996). Although there are similarities in both con-
structs, the major difference is that bullying concerns the acts of a single actor directed toward one
or more targets, whereas mobbing concerns the acts of one or more actors directed toward a single
target (Zapf & Einarsen, 2005). In the United States, Tepper (2000) was among the first to study
abusive supervision, whereas at the same time Pearson, Andersson, and Porath (2000) introduced
the concept of workplace incivility, which is a milder and more ambiguous form of nasty interper-
sonal behavior.
Although the various terms for CWB represent conceptually distinct concepts, those distinctions
are not always maintained in researchers’ operationalizations. Studies taking various perspectives
provide respondents with checklists of behaviors that they have performed or endured. Absent from
these checklists in most cases is reference to intentions, motives, or norms. A notable exception is
Skarlicki and Folger (1997), who specifically asked about acts of retaliation. Spector and Fox (2005,
p. 157) illustrated the overlap in behaviors by providing example items from measures of aggres-
sion, deviance, and retaliation that were highly similar. For example, each scale had an item about
stealing, starting and/or spreading rumors, speaking poorly about the employer, and calling in sick
when not ill. The extent to which individual acts might represent aggression, deviance, or retaliation
is not always clear.

DIMENSIONS OF CWB
Most studies of CWB and related global constructs (e.g., deviance or retaliation) have used mea-
sures of overall CWB. A handful of studies have followed Robinson and Bennett’s (1995) lead in
dividing these behaviors according to the target of organization versus person (e.g., Fox & Spector,
1999; Penney & Spector, 2005). On the target side, most studies of abuse, bullying, incivility, and
mobbing have used a single index rather than devising subdimensions. More recently Spector, Fox,
Penney, Bruursema, Goh, and Kessler (2006) argued that the individual behaviors subsumed by
CWB were too diverse to be reduced to a single construct. They divided an overall CWB scale
into five dimensions of abuse (nasty acts directed toward people), production deviance (purposely
doing work incorrectly or working slowly), sabotage (physical destruction/defacing of property),
theft (stealing property from organizations or people), and withdrawal (working fewer hours than
required). They showed that the various dimensions had different patterns of relationships with
other variables, suggesting they should be studied independently.
Sackett and DeVore (2001) listed an even finer-grained and expanded 11-category classification of CWB identified by Gruys (1999). The categories are theft, property destruction, misuse of information, misuse of resources, unsafe behavior, withdrawal, poor-quality work, alcohol use, drug use, inappropriate verbal actions, and inappropriate physical actions.

ASSESSMENT OF CWB
Except for studies of specific withdrawal behaviors such as absence or turnover, the predominant
method for studying CWB has been the self-report survey. Researchers have devised checklists of
behaviors that ask participants to indicate if or how often they engage in each of several specific
acts, or how often they have either observed or endured a list of acts. In most cases, individual
groups of researchers have devised their own checklists, although the specific items are quite over-
lapping (cf. Spector & Fox, 2005, p. 157). Some articles list the items used, making it possible to
recreate the measures (e.g., Skarlicki & Folger, 1997). Bennett and Robinson (2000) described the
development of their scale to assess CWB (which they termed deviance) directed toward organiza-
tions versus people. Spector et al. (2006) provided a scale to assess the five dimensions of CWB
noted in the prior section.
Several researchers have used alternative sources of data on CWB, typically coworkers. For
example, Skarlicki and Folger (1997) asked coworkers to report on the participant’s CWB. Spector
and colleagues have done a series of studies in which participant self-reports were contrasted with
coworker reports of the participant’s CWB divided into organization versus person target (e.g.,
Bruk-Lee & Spector, 2006; Penney & Spector, 2005; Fox, Spector, Goh, & Bruursema, 2007). In all
but one case (organization-targeted CWB in Fox et al., 2007), correlations were significant between
the two sources, with a mean across studies of .29.
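As a minimal sketch of how such checklist data are typically handled (an illustration on our part; the file name, column names, and scoring choices below are hypothetical and not drawn from any of the studies cited), scale scores can be formed by summing item frequencies within each source and then correlating the totals across sources:

```python
# Illustrative sketch only: scoring CWB behavior checklists from self- and
# coworker reports and estimating their convergence. The CSV file and the
# column-naming conventions are hypothetical.
import pandas as pd

# One row per focal employee; item columns hold frequency ratings for specific acts.
df = pd.read_csv("cwb_ratings.csv")

self_items = [c for c in df.columns if c.startswith("self_cwb_")]
coworker_items = [c for c in df.columns if c.startswith("cw_cwb_")]

# Checklist measures are commonly scored as the sum (or mean) of item frequencies.
df["self_cwb_total"] = df[self_items].sum(axis=1)
df["cw_cwb_total"] = df[coworker_items].sum(axis=1)

# Cross-source convergence, analogous to the self-coworker correlations
# summarized above (a mean r of about .29 across studies).
print(df["self_cwb_total"].corr(df["cw_cwb_total"]))
```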
For withdrawal behaviors studied separately, it is common to use organizational records on
absence and turnover, although some researchers have used self-reports. There is some evidence
suggesting, at least for absence, that under some conditions there is reasonable convergence between
self-reports and records (Johns, 1994). Self-reports become more attractive if absence is divided
into types, such as absence due to illness versus other reasons. When absence and other forms of
withdrawal are studied as forms of CWB, self-reports are typically used so individuals can report
on instances in which they purposely avoided work, such as calling in sick when not ill or taking
overly long breaks.

POTENTIAL ANTECEDENTS OF CWB


There are several variables that have been studied as possible precursors to CWB. Although the
nature of most studies does not allow confident conclusions that such variables are in fact causal, they fit into proposed theoretical frameworks suggesting that the combination of individual differences and environmental conditions leads to these behaviors. Figure 23.1, which is based on Spector and Fox (2005), is characteristic of such models. It suggests that the effects of environmental conditions that might trigger such behaviors and of individual differences that might serve as predisposing factors are mediated by emotions and perhaps attitudes. Furthermore, individual
differences might well interact with environmental conditions, suggesting that there are moderating
and not just additive effects between these classes of variables. Finally, it is possible that there are
environmental moderators, most notably control.
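In regression terms, a minimal rendering of the framework in Figure 23.1 (a sketch of the general logic, not a specific published model; intercepts omitted) treats attitudes and emotion as a mediator and adds a product term for the proposed person-by-situation interaction:

\[
\begin{aligned}
\mathrm{Emotion}_i &= a_1\,\mathrm{Env}_i + a_2\,\mathrm{IndDiff}_i + e_{1i} \\
\mathrm{CWB}_i &= b\,\mathrm{Emotion}_i + c_1\,\mathrm{Env}_i + c_2\,\mathrm{IndDiff}_i + c_3\,(\mathrm{Env}_i \times \mathrm{IndDiff}_i) + e_{2i}
\end{aligned}
\]

Mediation corresponds to the indirect paths \(a_1 b\) and \(a_2 b\), and the moderating (rather than purely additive) effects described above correspond to a nonzero \(c_3\); an environmental moderator such as control could be represented with a further product term.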

INDIVIDUAL DIFFERENCES
There has been considerable work showing a link between individual difference variables and
CWB. One stream of research has concerned the development and validation of integrity tests that
can be used to screen out CWB-prone individuals before they are hired. This work shows a link between
integrity, as defined by the tests, and CWB, but most of this literature does not identify specific
personality traits that might give insights into personality processes. Rather, test developers choose
disparate items based empirically on their relationships with CWB criteria. The conclusion from
this literature is that individual differences can predict CWB. A second stream is work linking spe-
cific personality characteristics to CWB. One substream links the five categories of the Five-Factor
Model to CWB, whereas another investigates individual personality traits. Finally, some studies
have included demographic characteristics to determine their relationship to CWB. A summary of
effect sizes reported in meta-analyses of antecedents can be found in Table 23.1 for CWB and Table
23.2 for absence and withdrawal behaviors.
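For readers less familiar with the entries in these tables, the “mean sample-size-weighted correlation” reported by several of the meta-analyses is simply

\[
\bar{r} = \frac{\sum_{i=1}^{k} N_i\, r_i}{\sum_{i=1}^{k} N_i},
\]

where \(r_i\) and \(N_i\) are the correlation and sample size from study \(i\) of the \(k\) studies included. Corrected values (e.g., the weighted average corrected correlations from Meyer et al., 2002, in Table 23.2) additionally adjust such estimates for statistical artifacts such as measurement unreliability.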

FIGURE 23.1 General framework depicting various antecedents of CWB and withdrawal. (The figure links environment and individual differences to CWB through attitudes and emotion.)

TABLE 23.1
Summary of Effect Sizes Reported in Meta-Analyses of Antecedents of CWBs

Integrity tests
  Overt: CWB .39 (Ones, Viswesvaran, & Schmidt, 1993; mean observed correlation)
  Personality-based: CWB .22 (Ones, Viswesvaran, & Schmidt, 1993; mean observed correlation)
Personality
  Conscientiousness: CWB −.18 (Salgado, 2000; mean observed correlation); CWB −.29 (Dalal, 2005; mean sample-size-weighted correlation); CWBP −.19, CWBO −.34 (Berry, Ones, & Sackett, 2007; mean sample-size-weighted correlations)
  Agreeableness: CWB −.13 (Salgado, 2000; mean observed correlation); CWBP −.36, CWBO −.25 (Berry, Ones, & Sackett, 2007; mean sample-size-weighted correlations)
  Emotional stability: CWB −.04 (Salgado, 2000; mean observed correlation); CWBP −.20, CWBO −.19 (Berry, Ones, & Sackett, 2007; mean sample-size-weighted correlations)
  Openness: CWB −.10 (Salgado, 2000; mean observed correlation); CWBP −.07, CWBO −.03 (Berry, Ones, & Sackett, 2007; mean sample-size-weighted correlations)
  Extraversion: CWB −.01 (Salgado, 2000; mean observed correlation); CWBP .02, CWBO −.07 (Berry, Ones, & Sackett, 2007; mean sample-size-weighted correlations)
Attitudes
  Job satisfaction: CWB −.29 (Dalal, 2005; mean sample-size-weighted correlation); CWBP −.14, CWBO −.31 (Hershcovis et al., 2007; mean uncorrected correlations)
  Organizational commitment: CWB −.28 (Dalal, 2005; mean sample-size-weighted correlation)
  Negative affect: CWB .34 (Dalal, 2005; mean sample-size-weighted correlation); CWBP .22, CWBO .24 (Hershcovis et al., 2007; mean uncorrected correlations)
  Trait anger: CWBP .37, CWBO .28 (Hershcovis et al., 2007; mean uncorrected correlations)
  Positive affect: CWB −.28 (Dalal, 2005; mean sample-size-weighted correlation)
Stressors
  Organizational constraints: CWBP .26, CWBO .31 (Hershcovis et al., 2007; mean uncorrected correlations)
  Interpersonal conflict: CWBP .40, CWBO .33 (Hershcovis et al., 2007; mean uncorrected correlations)
Justice
  Organizational: CWB −.25 (Dalal, 2005; mean sample-size-weighted correlation)
  Distributive: CWB −.22 (Cohen-Charash & Spector, 2001; mean sample-size-weighted correlation); CWBP .12, CWBO .12 (Hershcovis et al., 2007; mean uncorrected correlations); CWBP .12, CWBO .10 (Berry, Ones, & Sackett, 2007; mean sample-size-weighted correlations)
  Procedural: CWB −.28 (Cohen-Charash & Spector, 2001; mean sample-size-weighted correlation); CWBP −.18, CWBO −.18 (Hershcovis et al., 2007; mean uncorrected correlations); CWBP −.19, CWBO −.18 (Berry, Ones, & Sackett, 2007; mean sample-size-weighted correlations)
  Interactional: CWBP −.22, CWBO −.18 (Berry, Ones, & Sackett, 2007; mean sample-size-weighted correlations)
  Interpersonal: CWBP −.17, CWBO −.06 (Berry, Ones, & Sackett, 2007; mean sample-size-weighted correlations)

Note: CWB = overall counterproductive work behavior; CWBP = CWB targeting people; CWBO = CWB targeting the organization.

Integrity Tests
There are several key papers/resources on the ability of integrity tests to predict CWB and with-
drawal behaviors. Ones, Viswesvaran, and Schmidt (1993) conducted the most comprehensive
meta-analysis of integrity test validities for predicting CWB and job performance, and Ones,
Viswesvaran, and Schmidt’s (2003) meta-analysis focused on predicting absenteeism. Sackett and
colleagues reviewed the state of research on integrity tests in two separate reviews and a book chap-
ter (Berry, Sackett, & Wiemann, 2007; Sackett & DeVore, 2001; Sackett & Wanek, 1996). As noted
in these reviews, larger validity coefficients for integrity tests are reported for self-report measures of CWB and for concurrent rather than predictive validation designs (Sackett & DeVore, 2001). Integrity tests can be designed to assess personality traits that relate to various CWBs (labeled personality-based integrity tests) or to assess attitudes toward various forms of CWB, such as theft, and admissions of such behaviors (labeled overt integrity tests; Berry et al., 2007). Personality-based (.22) and overt integrity tests (.39) show meaningful mean observed correlations with the broad construct of CWB (Ones et al., 1993; Sackett & DeVore, 2001) and absence criteria
(.25 for personality-based tests) (Ones et al., 2003). A summary of these relationships can be found
in Tables 23.1 and 23.2.
New research relating integrity tests to CWB involves laboratory work in settings where the CWB
of participants can be observed and recorded (Berry et al., 2007). This approach seeks to improve
on some of the limitations of traditional methods that involve self-reports of CWB and the potential
bias and unwillingness to admit to behaviors associated with these approaches (Berry et al., 2007).
As noted in Berry et al. (2007), the results of this research are mixed in that some studies reported
significant relationships between integrity test scores and CWB (Mikulay & Goffin, 1998; Nicol &
Paunonen, 2002), whereas other studies found no relationship (Hollwitz, 1998; Horn, Nelson, &
Brannick, 2004). Concerns about the external validity of these types of research designs and also
the fidelity of the behaviors/settings have been raised (Berry et al., 2007).

Personality: Five-Factor Model


Substantial research attention has been devoted to relating the Five-Factor Model (FFM) dimen-
sions to CWB and withdrawal. Three separate meta-analyses and the Sackett and DeVore (2001)
book chapter form the main pieces of research that summarize these relationships. Salgado’s (2000)
meta-analysis focused on personality correlates of various forms of CWB, whereas Dalal’s (2005)
meta-analysis centered on overall CWB, and Berry, Ones, and Sackett (2007) meta-analyzed corre-
lates separately for interpersonal and organizational forms of CWB (they used the term deviance).
Among the FFM dimensions, the largest correlations have been found for conscientiousness,
agreeableness, and emotional stability (Berry et al., 2007; Dalal, 2005; Salgado, 2000). These
results are summarized in Table 23.1. Conscientiousness demonstrated a stronger relationship (mean sample-size-weighted correlation) with CWBO (−.34) compared with CWBP (−.19), whereas agreeableness demonstrated the opposite pattern—stronger relationship with CWBP (−.36) compared with CWBO (−.25) (Berry et al., 2007). The relationships for emotional stability did not differ for CWBP (−.20) and CWBO (−.19). Less support was found for openness or extraversion. Thus, from several meta-analytic findings, it can be concluded that conscientiousness, agreeableness, and emotional stability tend to demonstrate the largest correlations with CWB.

TABLE 23.2
Summary of Effect Sizes Reported in Meta-Analyses of Antecedents of Withdrawal Behaviors

Integrity tests
  All tests: Absence .14 (Ones, Viswesvaran, & Schmidt, 2003; mean observed correlation)
  Overt: Absence .06 (Ones, Viswesvaran, & Schmidt, 2003; mean observed correlation)
  Personality-based: Absence .25 (Ones, Viswesvaran, & Schmidt, 2003; mean observed correlation)
Stressors
  Hindrance: Turnover .18, Withdrawal .17 (Podsakoff, LePine, & LePine, 2007; mean sample-size-weighted correlations)
  Challenge: Turnover .04 (ns), Withdrawal .06 (ns) (Podsakoff, LePine, & LePine, 2007; mean sample-size-weighted correlations)
Organizational commitment
  Affective: Turnover −.17, Absence −.15, Withdrawal −.56 (Meyer et al., 2002; weighted average corrected correlations)
  Normative: Turnover −.16, Absence .05 (ns), Withdrawal −.33 (Meyer et al., 2002; weighted average corrected correlations)
  Continuance: Turnover −.10, Absence .06, Withdrawal −.18 (Meyer et al., 2002; weighted average corrected correlations)
Justice
  Distributive: Withdrawal −.41 (Colquitt, Conlon, Wesson, Porter, & Ng, 2001; mean uncorrected population correlation)
  Procedural: Withdrawal −.36 (Colquitt, Conlon, Wesson, Porter, & Ng, 2001; mean uncorrected population correlation)
It is also useful to note that research has shown nonnegligible correlations between integrity test scores and each of the FFM traits of conscientiousness, agreeableness, and emotional stability, leading some researchers to suggest that integrity test scores reflect a compound variable composed of conscientiousness, agreeableness, and emotional stability (Ones & Viswesvaran, 1998). Hence, it is not surprising to find that integrity tests and these three personality traits correlate significantly with CWB.
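One way to see why such a compound matters for prediction (a standard psychometric identity offered as an illustration, not an analysis taken from Ones & Viswesvaran, 1998) is the validity of a unit-weighted composite of \(k\) standardized traits with average criterion validity \(\bar{r}_{xy}\) and average intercorrelation \(\bar{r}_{xx}\):

\[
r_{\mathrm{composite},\,y} = \frac{k\,\bar{r}_{xy}}{\sqrt{k + k(k-1)\,\bar{r}_{xx}}}
\]

Because conscientiousness, agreeableness, and emotional stability are only modestly intercorrelated, a composite of the three can correlate more strongly with CWB than any one trait does, which is consistent with integrity tests behaving like such a compound.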
Although the preponderance of personality research on CWB over the years has focused on the FFM, Lee et al. (2005a, 2005b) recently considered the role of another factor of personality labeled “honesty-humility.” This factor was not originally part of the FFM and has emerged as a sixth factor
in some lexical studies of personality. Honesty-humility includes adjectives such as honest, frank,
truthful, sly, and calculating (Ashton, Lee, & Son, 2000). Research indicates that it relates to CWB
with observed correlations ranging from −.34 to −.55 and explains incremental variance over the
FFM and integrity tests in predicting antisocial behavior and workplace delinquency (Lee, Ashton, &
de Vries, 2005; Lee, Ashton, & Shin, 2005; Marcus, Lee, & Ashton, 2007).

Individual Personality Traits


Several individual personality traits outside of the FFM framework have been studied in the litera-
ture. Some of these traits are affective dispositions, most notably trait anger and trait anxiety (also
conceptualized as negative affectivity, or NA). Others are more cognitive in nature, such as
locus of control or hostile attribution bias.
Because models of aggression and CWB have linked these behaviors to negative emotions (e.g.,
Martinko, Gundlach, & Douglas, 2002; Spector & Fox, 2002), it would be expected that affective
dispositions would relate to CWB. Indeed, studies have shown significant correlations between over-
all CWB and trait anxiety/NA (the tendency or predisposition to experience negative emotions; e.g.,
Goh, 2007; Penney & Spector, 2005) and trait anger (the tendency or predisposition to experience
situations as annoying; e.g., Goh, 2007; Miles, Borman, Spector, & Fox, 2002; Penney & Spector,
2005). In their meta-analysis, Hershcovis et al. (2007) reported similar relationships between NA
and CWBO (r = .24) versus CWBP (r = .22). However, trait anger related more strongly to CWBP
(mean r = .37) than CWBO (mean r = .28). A single study has linked CWB to the personality trait
of boredom proneness (Bruursema, 2007), and a single study has linked CWB to dispositional envy
(Cohen-Charash & Mueller, 2007). In all of these studies, the tendency to experience high levels of
negative emotion was associated with more frequent CWB.
Work locus of control is the tendency to believe that one does (internality) or does not (externality)
have control over work outcomes (Spector, 1988). Externality has been linked to overall CWB
(Storms & Spector, 1987) as well as CWBO (r = .32) and CWBP (r = .20) (Fox & Spector, 1999).
The connection between CWB and narcissism has been explored in two studies, one of which
found a significant relationship (r = .27; Penney & Spector, 2002), and one that did not, although the
correlation was in the same direction (r = .12; Judge, LePine, & Rich, 2006). Machiavellianism has
been studied in relation to CWB in at least one study. Goh (2007) found that individuals high on
this personality trait reported lower levels of overall CWB (r = −.23), although the correlation with
coworker reports of CWB was nonsignificant but in the same direction (r = −.08).
Aggression has been linked to attributions for the cause of negative events (Martinko et al.,
2002). Individual differences in the tendency to attribute hostile motives to others have been linked
to CWB in the expected direction, with individuals high in hostile attribution bias reporting more CWB
(Douglas & Martinko, 2001; Goh, 2007; Hepworth & Towler, 2004). Goh (2007) found that those
high on this trait were also reported as having engaged in more CWB by coworkers.
Related to the domain of personality and how it impacts CWB is the notion of an aggressive
personality. Rather than identifying specific personality traits that relate to CWB, this approach
assesses aggressiveness itself as a trait. Research has considered the differential impact of implicit
(unconscious cognitions) and explicit components (conscious cognitions) of an aggressive personal-
ity on CWB. Bing et al. (2007) found support for the interactive impact of implicit and explicit
aggressive personality on various forms of CWB. More specifically, for individuals high in implicit
aggressiveness, CWB increased as explicit aggressiveness increased, whereas for individuals low in
implicit aggressiveness, CWB decreased as explicit aggressiveness increased.

Cognitive Ability
Although the category of individual differences has been dominated by research on personal-
ity, research has recently considered cognitive ability as a correlate. In a predictive validity study
involving law enforcement job applicants, Dilchert, Ones, Davis, and Rostow (2007) reported nega-
tive relationships ranging in size from r = −.11 to r = −.20. On the other hand, Marcus and Schuler
(2004) were unable to find a significant correlation between cognitive ability and CWB.

Demographic and Background Variables


Research in criminology and deviance traditionally has reported that young males have a greater
likelihood of engaging in various forms of workplace deviance or crime in general (e.g., Hickman
& Piquero, 2001; Hollinger, Slora, & Terris, 1992; Smith & Visher, 1980; Steffensmeier & Allan,
1996), although the gender difference is smaller for less serious forms of crime (Steffensmeier &
Allan, 1996). Recent meta-analyses in the psychology and workplace deviance literatures found
support for some of these relationships and indicate that various demographic characteristics
relate to CWB. Specifically, older individuals, females, and individuals with more tenure and work
experience engage in less CWB (Berry et al., 2007). However, the relationships are not strong.
Furthermore, Hershcovis et al. (2007) found that females engaged in significantly less CWBP (mean
r = −.19) than males, but not significantly less CWBO (mean r = −.11). Research also indicates that factors
assessed during adolescence, such as childhood conduct disorder (r = .17), predict adult CWB (Roberts,
Harms, Caspi, & Moffitt, 2007).
Some research has also considered the fit between an individual’s current job and his or her career
goals and how this fit, or lack of it, may help us understand individuals’ tendencies toward CWB.
For example, Huiras, Uggen, and McMorris (2000) found support for the idea that the greater the fit
between a person’s current job and his or her long-term career goals, the lower the CWB. Marcus and
Wagner (2007) reported that being in a preferred vocation has a negative and significant relationship
with CWB (r = −.19).
Research in the criminology and psychology literatures has shown that prior experience engag-
ing in criminal, aggressive, or deviant activities relates positively and significantly to CWB and
workplace errors. More specifically, Greenberg and Barling (1999) found that employees who had
a history of aggressive behavior were more likely to act aggressively toward coworkers. In an assess-
ment of 281 workplace violence incidents at a police department (ranging from obscene phone calls
to physical assaults), Scalora, Washington, Casady, and Newell (2003) reported that perpetrators in
the reported incidents who had prior criminal records were more likely to assault another employee.
Lastly, in a study of alcohol abuse among airline pilots, McFadden (1997) reported that pilots with
prior convictions for driving while intoxicated had a greater risk of pilot-error accidents.
Evidence showing a positive relationship between prior history of deviant behavior and sub-
sequent CWB would support the use of background checks as part of employee selection sys-
tems, especially because the cost to organizations of negligent hiring is high. Organizations
can be sued for negligent hiring if they hire someone who has a record of criminal or violent
behavior into a position that might put the public (especially those who are vulnerable such
as children or elderly), other employees, or the organization at risk (e.g., of theft or assault;
Connerley, Arvey, & Bernardy, 2001). The scope of background checks can vary from confirm-
ing prior employment or education to drug testing, criminal record checks, or motor vehicle
record checks. Research on the use of background checks in personnel selection is limited, and
the research that does exist raises questions about their reliability and validity. For exam-
ple, there is variability in the content of data included by different states in criminal records
(e.g., whether or not less serious offenses are recorded) and on how complete and up to date
the records are (Harris & Keller, 2005). Hence, more research is needed on the psychometric
properties of background checks.

Conclusions Regarding Individual Differences


This review indicates that a large body of research has emerged over the last several decades on
the role of personality and integrity tests in explaining and predicting CWB. From the perspective
of the FFM, we can say that three of the five factors—namely conscientiousness, agreeableness,
and emotional stability—demonstrate medium to large relationships with CWB and that honesty-
humility shows some preliminary support. Personality-based integrity tests also have been shown
to be related significantly to CWB. In terms of individual personality traits, aggressiveness, trait
anxiety/NA, trait anger, hostile attribution bias, work locus of control, Machiavellianism, and
narcissism have all been linked to CWB. The wide range of personality variables that are related
to CWB suggests the possibility that there are various factors and motives that might be impor-
tant precursors. For example, conscientious individuals might be motivated to perform well and
avoid engaging in negative behaviors that might detract from achieving their task goals at work.
On the other hand, those high in trait anger are likely to respond emotionally to even minor
provocations and may have trouble controlling associated impulses to strike out at the organiza-
tion or other people.
Demographic characteristics have a smaller relationship with CWB, whereas prior history or
experience engaging in various forms of deviance or criminal activities show positive and sig-
nificant relationships with CWB. Furthermore, the fit between a person’s current job and career
goals can be helpful in explaining CWB. Moving forward, researchers should move beyond simply
assessing linear relationships between these individual difference characteristics and CWB. More
fruitful research would investigate the conditions under which individuals who are predisposed to
engage in CWB/withdrawal actually do engage in these behaviors.

ATTITUDES AND EMOTIONS


Job attitudes have also been linked to CWB, including withdrawal. The notion that satisfied and
committed employees may engage in less CWB and may be more likely to stay with the organiza-
tion has been supported in meta-analytic work. With respect to CWB, uncorrected sample-size-
weighted correlations of −.29 have been reported for job satisfaction and −.28 for organizational
commitment (Dalal, 2005). Hershcovis et al. (2007) reported stronger mean correlations (uncor-
rected) in their meta-analysis for job satisfaction on CWBO (mean r = −.31) than CWBP (mean
r = −.14), with these two correlations being significantly different from one another. With respect to
withdrawal behaviors, the largest coefficients (weighted average corrected correlations) are
reported for affective (−.56) and normative commitment (−.33) on withdrawal cognitions (Meyer,
Stanley, Herscovitch, & Topolnytsky, 2002). Smaller relationships have been reported for affec-
tive (−.17) and normative commitment (−.16) on actual turnover, affective commitment (−.15) on
absence, and continuance commitment (−.18) on withdrawal cognitions (Meyer et al., 2002; see
Table 23.2).
As noted earlier, emotions have played a prominent role in research and thinking about aggres-
sion and CWB. In particular, anger (Neuman & Baron, 1997) and feelings of frustration (Spector,
1975) have been emphasized, although more recent work suggests that a wider range of negative
emotions may be important (Spector & Fox, 2005). Empirical studies have shown that overall CWB
relates significantly to frustration (Marcus & Schuler, 2004) and a composite measure of negative
emotions (Bruursema & Spector, 2005; Miles et al., 2002). Studies that distinguish CWBO from
CWBP have uniformly found larger correlations for the former (e.g., Bruk-Lee & Spector, 2006; Fox
et al., 2001; Spector, Fox, Penney, Bruursema, Goh, & Kessler, 2006). For example, Fox, Spector,
Goh, and Bruursema (2007) found a correlation of .39 for CWBO and .19 for CWBP, suggesting
that behavior in response to negative emotions is more likely to be directed toward the inanimate
organization than toward other people.
Several studies have shown links between CWB and positive emotions, with correlations that are
opposite in sign and smaller in magnitude than with negative emotions. For example, Bruursema
and Spector (2005) found a correlation between overall CWB and positive emotion of −.15 versus
a correlation between overall CWB and negative emotion of .53. Studies that have distinguished
CWBO from CWBP have found significant negative correlations for the former, but not the latter
(Bruursema & Spector, 2005; Fox et al., 2007; Fox et al., 2001).
Models of the CWB process have suggested that negative emotions mediate the relationship
between environmental/individual factors and the behavior itself (e.g., Martinko et al., 2002). There
is some supporting empirical evidence for the mediation role of negative emotion, although there
have been few tests reported and results for the available tests have been somewhat inconsistent. For
example, Fox and Spector (1999) found support using structural equation modeling for a model in
which frustration served as a mediator of the relationship between organizational constraints and
CWB. Fox et al. (2001) found evidence for mediation with some combinations of job stressors and
CWB, but not others.

Conclusions Regarding Attitudes and Emotions


Taken as a whole, research has shown a consistent link between CWB and attitudes and nega-
tive emotions. Individuals who have favorable attitudes and those who report infrequent negative
emotions at work tend to report infrequent CWB. Although most studies have relied entirely on
self-reports, there are a few studies showing these linkages with coworker reports of employee
CWB. For example, Bruk-Lee and Spector (2006) found significant correlations between negative
emotions and CWBO and CWBP. Although the correlation for CWBO was smaller for coworker
than self-reports (r = .24 vs. r = .41, respectively), the opposite was the case for CWBP (r = .25 vs.
r = .21, respectively). Likewise, Fox et al. (2007) found that job satisfaction was significantly related
to CWBO and CWBP reported by coworkers.

IMPLICATIONS FOR EMPLOYEE SELECTION


Various individual difference characteristics have been shown to be important for understanding
and explaining CWB. However, to what extent should any of them be included as predictors in
employee selection? The answer to this question requires a consideration of the predictors of the
broader domain of job performance, although a comprehensive discussion of these predictors is
beyond the scope of this chapter. A substantial amount of research has studied the criterion-related
validity of cognitive ability, personality, and integrity tests for predicting job performance. The
preponderance of this research, summarized in various articles, indicates that cognitive ability and
integrity tests are significant and meaningful predictors of job performance (e.g., Ones et al., 1993;
Schmidt & Hunter, 1998). Of the five factors of personality, the best predictor of job performance
across various jobs is conscientiousness, whereas the remaining four factors demonstrate criterion-
related validity within specific occupations (e.g., Barrick & Mount, 1991; Hough, Eaton, Dunnette,
Kamp, & McCloy, 1990). Furthermore, research indicates that integrity tests and conscientious-
ness have incremental validity over cognitive ability in predicting job performance (e.g., Ackerman
& Heggestad, 1997; Schmidt & Hunter, 1998). Given this pattern of findings combined with the
research support for these same individual difference characteristics predicting CWB, employers
may want to consider including integrity tests or a composite of conscientiousness, agreeableness,
and emotional stability in selection systems because these factors have been shown to predict the
domain of overall job performance and CWB. Employers may also want to consider including cog-
nitive ability because it has been shown to predict job performance and does not correlate highly
with integrity tests or the personality factors (Salgado, Viswesvaran, & Ones, 2002). Furthermore,
given the preliminary evidence of the importance of history of prior deviant behavior as an indicator
of future deviant behavior and the costs of negligent hiring for various stakeholders, it might prove
useful to include background checks in selection systems, especially for jobs that may put cowork-
ers or vulnerable populations at risk (e.g., elderly, children). However, more research is needed on
the reliability and validity of background checks. Nevertheless, even if employers try to decrease
the likelihood of CWB by keeping people who are more predisposed to engage in these behaviors
out of the workplace through valid selection systems, the workplace itself may present challenges or
triggers that drive employees to engage in CWB.
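To make the notion of incremental validity concrete, the sketch below estimates the gain in explained criterion variance (ΔR²) when an integrity/personality composite is added to cognitive ability in a regression predicting overall job performance. All correlations, sample sizes, and variable names are hypothetical placeholders chosen for illustration; they are not values taken from the studies cited above.

```python
# Illustrative sketch (hypothetical values only): incremental validity of an
# integrity/personality composite over cognitive ability when predicting
# overall job performance, expressed as the gain in squared multiple
# correlation (Delta R^2).
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Assumed (hypothetical) population correlations among standardized scores:
# cognitive ability (g), integrity/personality composite (comp), performance (perf).
corr = np.array([
    [1.00, 0.10, 0.50],
    [0.10, 1.00, 0.35],
    [0.50, 0.35, 1.00],
])
g, comp, perf = rng.multivariate_normal(np.zeros(3), corr, size=n).T


def r_squared(y, X):
    """Squared multiple correlation from an ordinary least squares fit."""
    X = np.column_stack([np.ones(len(y)), X])      # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    return 1.0 - residuals.var() / y.var()


r2_ability = r_squared(perf, g[:, None])                  # ability alone
r2_both = r_squared(perf, np.column_stack([g, comp]))     # ability + composite

print(f"R^2, ability only:        {r2_ability:.3f}")
print(f"R^2, ability + composite: {r2_both:.3f}")
print(f"Incremental validity (Delta R^2): {r2_both - r2_ability:.3f}")
```

Under these assumed inputs, the composite adds explained variance beyond ability largely because it correlates only modestly with ability, which parallels the argument for combining the two types of predictors in a selection system.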

ENVIRONMENT
There are several environmental conditions that have been linked to CWB. Some of the origi-
nal work on human aggression suggested that frustrating situations or things that interfere with
ongoing behavior and future goals are important precursors to aggression, although more recent
work has expanded to include various other experiences that are stressful, represent attacks on the
individual, or violate norms (Neuman & Baron, 1997). In the following sections, we will
discuss environmental conditions that have been linked to CWB including job stressors, justice, and
leadership.

STRESSORS
Stress has become an increasingly important topic of research over the years because of its poten-
tially negative consequences for employees and organizations. It has also been considered a fac-
tor that explains CWB and withdrawal behaviors (Neuman & Baron, 1997; Spector & Fox, 2005).
Included are interpersonal stressors such as conflict, poor relationships with supervisors/coworkers,
and workplace abuse and incivility. Also important are organizational or task-related stressors such
as perceptions of job insecurity, organizational change, poor working conditions, organizational
constraints, role conflict, role ambiguity, and work overload. The preponderance of research has
reported that various stressors do relate to increased CWB (see Table 23.1 for a
summary).
The most studied interpersonal stressor in relation to CWB has been interpersonal conflict among
employees. Hershcovis et al. (2007) conducted a meta-analysis that showed a .33 correlation (uncor-
rected) between conflict and CWBO and a significantly larger .40 correlation between conflict and
CWBP. The latter relationship should not be surprising because some of the behaviors people often
engage in during conflicts are forms of CWBP, such as insulting or undermining others. However,
measures of conflict used in these studies focus more on what the person has experienced from
others than on what he or she has done. More surprising is the possibility that conflict
at work results in CWB directed toward the organization itself. A study by Bruk-Lee and Spector
(2006) may help explain this phenomenon. They found that the person with whom employees have
conflict might determine the target of their CWB. Conflict with coworkers related more strongly
with CWBP than CWBO (r = .45 vs. r = .23, respectively), whereas the opposite was the case for
conflict with supervisors (r = .14 vs. r = .21, respectively). These authors suggested that employees
are likely to direct their CWB toward organizations when having conflicts with supervisors because
the supervisor is seen as an agent of the organization. It is also likely that inanimate targets are seen
as safer, because CWB directed toward the supervisor might not go unpunished.
Whereas conflicts involve two-way interactions among individuals, abusive behavior and inci-
vility are one-way exchanges in which the employee is subject to stressful mistreatment by others.
At times this might result in conflict, but in many cases, the employee is the almost passive target
of mistreatment. Abusive supervisors might yell, scream, or use power to mistreat and intimidate
employees (Ashforth, 1997; Keashly, 1998). Research has shown that abusive supervision has a more
pronounced effect on supervisor-directed CWB, particularly for individuals who score high on negative
reciprocity (Mitchell & Ambrose, 2007). Other research found that abusive supervision and managerial
oversight increase CWB (Detert, Treviño, Burris, & Andiappan, 2007).
rude and unpleasant behavior that might in some cases be unintentional, has also been linked to
CWBO and CWBP (Goh, 2007; Penney & Spector, 2005).
The organizational stressor that comes closest to the classic definition of environmental frustra-
tion is organizational constraints, which are things in the work environment that interfere with task
performance. Hershcovis et al.’s (2007) meta-analysis found that organizational constraints corre-
lated significantly more strongly with CWBO (mean r = .31) than CWBP (mean r = .26). Workload
has been shown to correlate significantly with overall CWB (Goh, 2007; Miles et al., 2002) but
not CWBP (Chen & Spector, 1992). However, Galperin and Burke (2006) found that workaholic
employees who were highly involved in their jobs and felt driven to achieve high performance
actually engaged in less CWB (they used the term deviance). Role ambiguity and role
conflict are significantly related to CWB (Chen & Spector, 1992).
Organizational restructuring and mergers appear to define the organizational landscape, making
perceptions of job insecurity and organizational politics common organizational stressors. Research
has shown a negative relationship between job insecurity and CWB. Individuals who perceived
more insecurity reported less CWB (−.21); that is, the more people perceive their job to be at risk,
the less likely they are to engage in CWB (Probst, Stewart, Gruys, & Tierney, 2007). Although this
finding may appear counterintuitive, it could be the case that individuals who fear that they may
lose their job as a result of being downsized or outsourced may avoid engaging in negative behaviors
that might increase their chances of being the ones who lose their jobs. A study of part-time work-
ers found that organizational politics related positively to psychological withdrawal (.26; e.g., daydreaming,
chatting with coworkers about non-work-related subjects) and antagonistic work behaviors (.23; e.g., arguing
with coworkers, gossiping) (Cropanzano, Howes, Grandey, & Toth, 1997). However, politics was no
longer significant after controlling for organizational support.

ORGANIZATIONAL JUSTICE
Perceptions of injustice have also been prominent in research on CWB. The negative consequences
of employee perceptions of injustice have been the focus of a great deal of justice research, leading
to four meta-analyses in which organizational justice-CWB/withdrawal coefficients were analyzed
(Berry et al., 2007; Cohen-Charash & Spector, 2001; Dalal, 2005; Hershcovis et al., 2007). Strong
support was reported for organizational justice (−.25, sample-size-weighted correlation; Dalal,
2005), procedural justice (−.28), and distributive justice (−.22) in explaining CWB (Cohen-Charash &
Spector, 2001). The relationships of procedural and distributive justice with CWB did not differ for
CWBP and CWBO. However, the relationship for interpersonal justice was lower for CWBO (−.06)
compared with CWBP (−.17), as was the relationship for interactional justice and CWBO (−.18)
compared with CWBP (−.22) (Berry et al., 2007; Hershcovis et al., 2007). Meta-analytic findings
also indicate that distributive (r = −.41) and procedural justice (r = −.36) relate to withdrawal behav-
iors (Colquitt, Conlon, Wesson, Porter, & Ng, 2001). These results are summarized in Tables 23.1
and 23.2.
Some researchers have also studied mediators or moderators of the relationship between stres-
sors, organizational justice, and CWB. In a laboratory study, Colquitt, Scott, Judge, and Shaw
(2006) considered trust propensity, risk aversion, and morality as moderators of the relationship
between justice and CWB and reported significant interaction effects for trust propensity and risk
aversion. Cohen-Charash and Mueller (2007) considered envy and self-esteem as moderators and
reported that high perceived unfairness combined with high envy triggered more CWB, whereas
envy did not seem to matter in conditions of low perceived unfairness. In a second study, they found
support for a three-way interaction in which more CWB occurred when perceived unfairness, envy,
and self-esteem were all high.

LEADERSHIP
One might expect that the behavior of leaders would have an impact on their followers’ CWB. The
most studied leadership variable in relation to CWB has been charisma. Most studies have shown
that leaders who are perceived to be high on charisma have subordinates who report low frequen-
cies of CWB (e.g., Brown & Treviño, 2006; Hepworth & Towler, 2004; Judge et al., 2006). One
exception is Marcus and Schuler (2004), who failed to find this relationship. Bruursema and Spector
(2005) found that transformational and transactional leadership related oppositely to CWB, with
transactional leadership being associated with more frequent CWB. They also found that cowork-
er-reported transactional but not transformational leadership was significantly related to CWB of
the study’s subjects. Kwok, Au, and Ho (2005) found that when supervisors reacted negatively to
employee CWB (which they labeled formal normative control), employee CWB decreased even
when the number of employees in the group was large.

GROUP INFLUENCE
Individual behavior can be greatly affected by the behavior of coworkers, often transmitted through
norms for appropriate and inappropriate behavior. This possibility is supported by Robinson and
O’Leary-Kelly (1998), who showed that individuals in workgroups were more likely to engage in
CWB if coworkers were engaging in those behaviors. Dunlop and Lee (2004) investigated the fre-
quency of CWB in workgroups, finding more between- than within-workgroup variance, suggesting
that CWB can be a group-level phenomenon.
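One common way to index how much of the variance in CWB lies between rather than within workgroups (offered here as a general illustration, not necessarily the analysis Dunlop and Lee performed) is the intraclass correlation ICC(1) from a one-way analysis of variance,

ICC(1) = \frac{MS_{between} - MS_{within}}{MS_{between} + (k - 1)\, MS_{within}},

where k is the (average) group size; values meaningfully above zero indicate that a nontrivial share of the variance in the behavior is attributable to group membership.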

CONCLUSIONS REGARDING THE ENVIRONMENT


There are many features of the environment in organizations that influence employee behaviors.
To date, the major focus has been on stressors and perceptions of injustice as they relate to CWB.
Less attention has been paid to the impact of leadership and group process, although the findings
to date are promising. Insufficient research has studied the underlying mechanisms by which envi-
ronmental conditions are appraised as stressful and how those perceptions result in behavior. Even
less research has dealt with boundary conditions that might lessen the impact of the environment on
CWB. There has been some attention paid to the interaction of environment and person, which we
will discuss in the next section.

PERSON AND ENVIRONMENT


The preceding discussion highlighted the independent impact of various individual difference char-
acteristics, attitudes, emotions, or environmental antecedents on employee CWB and withdrawal
behaviors. We recognize that rarely do individuals act irrespective of the environment, and so it is
important to consider the interaction of the person and the environment.
There have been several studies that have explored the possible moderating effects of personality
on the relationship between environmental conditions and CWB. For example, Colbert, Mount,
Harter, Witt, and Barrick (2004) considered the interaction of personality with perceptions of
the developmental environment and with perceived organizational support. They found that withholding
effort increased for individuals high on conscientiousness or emotional stability when
their perceptions of the developmental environment were low or when agreeableness and perceived
organizational support were low.
mediated the relationship between agreeableness and CWB.
Marcus and Schuler (2004) simultaneously considered various individual and situational anteced-
ents in the prediction of what they labeled generalized counterproductive behavior. They organized
the various antecedents using a taxonomy in which the dimensions reflected person (disposition)/
situation approaches and motivation/control approaches (Marcus, 2001). A motivation approach
assumes that individuals are motivated to engage in CWB by either an external force or an internal
drive, whereas a control approach emphasizes the importance of preventing CWB via restraints
or external factors that inhibit this behavior. The two dimensions present four possible categories:
triggers (situation/motivation), opportunity (situation/control), internal control (disposition/control),
and propensity (disposition/motivation). In a sample of German employees, they reported that CWB
was most prevalent when internal and external control were low or when internal control was low and
triggers were high. In a subsequent study, Marcus and Wagner (2007) reported that job satisfaction
mediated the relationship between triggers and CWB.
Some studies have investigated the moderating effect of individual personality variables. Penney
and Spector (2005) found that NA moderated the relationship between incivility and CWB such
that CWB was higher for individuals high on this personality variable. Fox et al. (2001) found that
trait anxiety moderated the relationship of interpersonal conflict and organizational constraints
with CWBP but not CWBO. Trait anger moderated the relationship between interpersonal conflict
and CWBO but not CWBP. Similarly, Goh (2007) found that hostile attribution bias moderated
the relationships of interpersonal conflict and organizational constraints with overall CWB.
The relationships were stronger for those high on these personality variables. Penney and Spector
(2002) reported a significant moderator effect of narcissism on the organizational constraint-CWB
relationship, with those high on the personality construct showing a stronger positive relationship.
Finally, locus of control was shown to moderate the relationship between experienced frustration
and CWB (Storms & Spector, 1987).
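Statistically, moderation findings of this kind are typically tested by adding a product term to a regression model of the general form

CWB = b_0 + b_1(\text{Stressor}) + b_2(\text{Trait}) + b_3(\text{Stressor} \times \text{Trait}) + e,

where a significant b_3 indicates that the stressor-CWB relationship depends on the level of the trait (a schematic representation, not the exact specification used in any one of the studies above).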

IMPLICATIONS FOR EMPLOYEE SELECTION


The situation in which employees perform their jobs also plays a role in explaining their tendencies
toward CWB. Regardless of whether or not individuals are predisposed toward CWB, an environ-
ment in which interpersonal conflict, organizational constraints, or perceptions of injustice are pres-
ent increases the likelihood that employees engage in negative behavior. Furthermore, individual
differences such as personality interact with the environment to increase these tendencies even
more. If individuals who are predisposed to engage in CWB are placed in an environment that also
triggers CWB, they are even more likely to engage in CWB. Thus, selection would be most impor-
tant for jobs that are inherently stressful, such as police work.
hand, there is a danger in assuming that selection is a sufficient solution to the problem of CWB.
Steps should also be taken to reduce triggers as much as possible; for example, by reducing unneces-
sary stressors, treating employees fairly, and mediating disputes among coworkers.

CONSEQUENCES
Very limited research has considered the consequences of CWB, probably because it is assumed
that the outcomes of CWB are negative regardless of the type of CWB that occurs. Nevertheless, it
is useful to study these outcomes because CWBs can impact various stakeholders, including organi-
zations, employees, coworkers, and customers, and the consequences can be severe. For example, in
the food industry it has been shown that employee CWB can affect organization-level outcomes. In
a study of staff and supervisors from an Australian fast-food chain, Dunlop and Lee (2004) reported
that CWBs engaged in by staff members resulted in lower supervisor-rated business unit perfor-
mance, longer drive-through service time, and higher unexplained food figures. Detert, Treviño,
Burris, and Andiappan (2007) found that CWB in restaurants (measured as food loss) negatively
impacted restaurant profitability and customer satisfaction.
Employee theft continues to cost organizations and even consumers billions of dollars. For some
organizations it may even result in bankruptcy, and for consumers it may result in higher prices on
goods and services (Langton & Hollinger, 2005; Payne & Gainey, 2004). Popular strategies that are
believed to reduce theft are not always effective. Research indicates that firms with higher shrink-
age are more likely than firms with lower shrinkage to use security tags, closed-circuit television,
observation mirrors, newsletters, anonymous telephone lines, and loss-prevention surveys, even
though these devices are believed to deter theft. Firms with low shrinkage are more likely
to employ criminal background checks, driving history checks, and drug screening (Langton &
Hollinger, 2005). These firms also report lower turnover among sales associates and management
and have fewer employees who work part-time (Langton & Hollinger, 2005). Thus, strategies that
seek to prevent high-risk employees from joining the firm appear to be associated with low shrink-
age rates, whereas asset control strategies appear to be less effective.
Service sabotage is a form of sabotage committed by employees who work in service settings.
It can consist of deliberate negative behavior directed at customers such as taking revenge, slow-
ing down the service, or ignoring rules (Harris & Ogbonna, 2002). Although it would appear that
this type of sabotage is likely to negatively impact the quality of the service that customers receive
and eventually firm performance, it has been shown that service sabotage can have positive effects
on the employees who engage in it because it raises their status, self-esteem, and job satisfaction
and even lowers their stress (Harris & Ogbonna, 2002). Research found support for these relation-
ships and also showed that employees who engage in this type of sabotage report lower perceptions
of customer service, lower perceptions of employee-customer rapport, and lower perceptions of
company performance (Harris & Ogbonna, 2006). Hence, future research may want to study more
directly what employees perceive to be the benefits of engaging in various forms of CWB. Whereas
in the case of merchandise theft the rewards may be obvious, other forms of CWB may present
psychological or emotional benefits. For example, the research on revenge and retaliation (Bies &
Tripp, 1996; Skarlicki & Folger, 1997) suggests that there are some benefits (e.g., evening the score),
but there may be others to consider.
As noted earlier, withdrawal can take many forms, including psychological withdrawal, late-
ness, absence, and actual turnover. Traditionally, measurement of the consequences of withdrawal
has focused primarily on the financial costs associated with the lost productivity of the absent
employee. However, recent research has considered a wider range of consequences that span
economic, financial, and noneconomic domains. More specifically, the indirect costs associated with how
the withdrawal behavior affects coworkers and their propensity to engage in similar forms of
withdrawal behaviors have been considered along with the indirect costs that arise when the
withdrawal behavior escalates to other more serious forms of withdrawal and eventually turnover
(Birati & Tziner, 1996). Sagie, Birati, and Tziner (2002) computed the total combined cost of dif-
ferent forms of withdrawal across employees in a medium-sized, Israeli high-tech company and
reported an estimate of $2.8 million (U.S.). This estimate includes the costs of turnover, absence,
lateness, replacement costs, and withholding effort, as well as psychological withdrawal in the
form of low performance.
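Cost estimates of this kind are essentially a roll-up of direct and indirect component costs. The fragment below is a minimal sketch of such an aggregation; every figure and category label is hypothetical and chosen for illustration, not drawn from Sagie et al. (2002).

```python
# Minimal sketch of aggregating withdrawal-related costs into one annual
# estimate; all figures below are hypothetical and for illustration only.
withdrawal_costs = {
    "turnover and replacement": 1_200_000,
    "absence": 650_000,
    "lateness": 150_000,
    "withholding effort": 400_000,
    "psychological withdrawal (low performance)": 300_000,
}

total_cost = sum(withdrawal_costs.values())
for component, cost in withdrawal_costs.items():
    print(f"{component:<45}${cost:>12,}")
print(f"{'Total estimated annual cost':<45}${total_cost:>12,}")
```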
Several of the antecedents of CWB that were summarized in the preceding sections may also
become consequences of CWB. As noted earlier, the causal direction of the relationship between
various antecedents of CWB and outcomes is not always clear. Organizational commitment, job
satisfaction, and perceptions of injustice may not only influence the extent to which individuals
engage in negative behaviors but these attitudes and perceptions may also change as a result of the
CWBs experienced by employees. That is, a highly satisfied and committed employee may become
less satisfied and committed if he or she suddenly or continuously is a victim of or observes CWB
by others, especially if the CWB is not addressed or dealt with by the leadership. Research has only
begun to consider these potential outcomes and indicates that employees who have been physically
attacked, been threatened, or experienced abusive supervision in the workplace report lower levels of
job or life satisfaction, lower commitment, greater job stress, lower physical and psychological well-
being, and are more likely to consider changing jobs or leaving the organization (Budd, Arvey, &
Lawless, 1996; Rogers & Kelloway, 1997; Tepper, 2000). Lapierre, Spector, and Leck (2005) con-
sidered the impact of sexual and nonsexual aggression on job satisfaction in a meta-analysis and
reported sizeable mean sample-size-weighted correlations ranging from −.29 (nonsexual aggression
male-only sample) to −.41 (nonsexual aggression female-only sample). Employees who reported
experiencing a personal offense were more likely to seek revenge if they blamed the offender, if the
offender’s status was lower than their own, and in general for lower- compared with higher-status
employees (Aquino, Tripp, & Bies, 2001).
In addition to negatively impacting the job attitudes of employees who are victims of coworker
CWB, employees who engage in CWB also run the risk of receiving lower ratings of overall job per-
formance by their managers. Rotundo and Sackett (2002) reported that employee CWBs decreased
managers’ global ratings of job performance and explained, on average, 30% of the variation in these rat-
ings. These results generalized partially to Chinese managers, where on average 17% of the varia-
tion in ratings of overall job performance provided by managers was explained by the employee’s
CWB (Rotundo & Xie, 2008).
Employees who work for institutions that are suspected of and fined for white-collar crime
(as a result of the CWB of key executives or leaders) suffer numerous negative consequences. In
a case study of the closure of a bank that was suspected of false accounting, money laundering,
fraud, and conspiracy, employees who worked at the bank reported significant financial, emo-
tional, and psychological effects of the bank’s closure (Spalek, 2001). For example, employees
lost their jobs, access to their own funds and mortgage accounts, and experienced depression,
anxiety, and anger as a result of the crimes and subsequent bank closure (Spalek, 2001). Some
employees even found it difficult to find employment because of the stigma of being associated
with the bank (Spalek, 2001).
Individuals may have a history of engaging in negative behaviors over the course of their
lifespan, and a question that arises is the extent to which early life experiences of rule-breaking
behavior predict outcomes later in life. In a longitudinal twin study, Avolio, Rotundo, and
Walumbwa (2009) found that serious rule-breaking behavior (e.g., being picked up by police, theft, drug
use) engaged in before high school had a significant and negative relationship with holding leadership
roles in adulthood, whereas modest rule-breaking (e.g., intentionally breaking a window, skipping school
without parental permission) had a significant and positive relationship with leadership roles.
These preliminary results suggest that individuals who question and challenge the boundaries of
various institutions early in life, to the extent that they break rules but do not go beyond the limits of
the law, may not deprive themselves of important career opportunities or leadership posi-
tions later in life.

FUTURE DIRECTIONS
Interest in the study of CWB is a relatively recent phenomenon, with absenteeism, turnover, and
theft attracting the earliest attention. More recently, aggression, deviance, and dishonesty along
with various related behaviors and constructs have been added to the mix and can be grouped under
the broad umbrella of CWB. The previous sections indicate that a significant amount of research
has emerged that seeks to clarify the definition and dimensionality of this construct as well as envi-
ronmental and individual antecedents of CWB. Less research has focused on understanding the
impact that these behaviors have on organizations, employees, coworkers, and customers and the
mechanisms through which CWBs have this effect. Almost all of this research is cross-sectional
in nature, in which antecedents and outcomes are assessed at the same time (with the exception of
some integrity test validation studies), limiting conclusions about the causal order of these variables.
It is well known that employees’ perceptions, attitudes, and behaviors are continuously evolving and
are influenced by daily events at work and their history with the employer. Hence, future research
needs to capture these dynamics in the measurement of CWB and its various antecedents and
consequences.
The global economy adds diversity of cultures to the workplace and raises questions about the
generalizability of many of the relationships summarized in this chapter that come almost exclu-
sively from North America. Some research has begun to study CWBs in other cultures (e.g., Lee
et al., 2005a; Lee et al., 2005b; Marcus et al., 2007; Rotundo & Xie, 2008). More specifically, Rotundo
and Xie (2008) investigated the dimensionality of CWBs in China using samples of Chinese manag-
ers and professionals. They reported a two-dimensional solution, in which one dimension is defined
by the task relevance of the CWB and the second dimension by the interpersonal/organizational
target of the CWB. These results are similar to what has been reported in prior research conducted
in Western cultures (Bennett & Robinson, 2000; Gruys & Sackett, 2003).
As noted throughout, the targets of CWB that have been studied so far include individuals
(in the form of CWBP) or organizations (in the form of CWBO). However, groups continue to
serve an important function in the workplace, and a question that arises is whether workgroups (or
individual members of workgroups) can become targets of CWB, which we label
“intergroup CWB.” An important avenue for future research on CWB is to consider factors that
influence group members to engage in CWB directed at another workgroup or its members. Enns
(2007) provided a theoretical account of intergroup CWB engaged in by employees on the basis of
their identification with and membership in a workgroup. Enns and Rotundo (2007) reported that
participants who are members of workgroups express a greater willingness to engage in intergroup
CWB in conditions of realistic conflict and relative deprivation and that self-categorization moder-
ates this relationship. Future research can continue to study the antecedents and consequences of
intergroup CWB.

CONCLUSIONS
Given the paucity of research on CWB prior to the 1980s, it is difficult to know the extent to which it
is a growing problem. However, what is certain is that awareness of CWB has grown in the research
literature and popular press. Terms referring to subsets of CWB, such as “desk rage,” can be seen
in the media today. Articles on the topic can be found in all of the major journals with increasing
frequency.
What we have learned is that CWB is associated with many negative experiences at work,
including injustice and job stressors such as interpersonal conflict and organizational constraints.
Furthermore, there are considerable individual differences in employee tendencies to engage in
such behavior. For example, individuals high on hostile attribution bias, trait anger, and trait anxi-
ety are prone to CWB. The limited work that has been done shows connections between CWB
and negative consequences for employees and organizations. The body of work that has emerged
over the decades provides important insight into likely antecedents and consequences of CWB.
Nevertheless, numerous questions remain unanswered and the behaviors still occur. Thus, the need
for research continues.

REFERENCES
Ackerman, P. L., & Heggestad, E. D. (1997). Intelligence, personality, and interests: Evidence for overlapping
traits. Psychological Bulletin, 121, 219–245.
Aquino, K., Tripp, T. M., & Bies, R. J. (2001). How employees respond to personal offense: The effects of
blame attribution, victim status, and offender status on revenge and reconciliation in the workplace.
Journal of Applied Psychology, 86, 52–59.
Ashforth, B. E. (1997). Petty tyranny in organizations: A preliminary examination of antecedents and conse-
quences. Canadian Journal of Administrative Sciences, 14, 126–140.
Ashton, M. C., Lee, K., & Son, C. (2000). Honesty as the sixth factor of personality: Correlations with
Machiavellianism, primary psychopathy, and social adroitness. European Journal of Personality, 14,
359–369.
Avolio, B. J., Rotundo, M., & Walumbwa, F. (2009). Early life experiences as determinants of leadership role occu-
pancy: The importance of parental influence and rule breaking behavior. Leadership Quarterly, 20, 329–342.
Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-analysis.
Personnel Psychology, 44, 1–26.
Bennett, R. J., & Robinson, S. L. (2000). Development of a measure of workplace deviance. Journal of Applied
Psychology, 85, 349–360.
Berry, C. M., Ones, D. S., & Sackett, P. R. (2007). Interpersonal deviance, organizational deviance, and their
common correlates: A review and meta-analysis. Journal of Applied Psychology, 92, 410–424.
Berry, C. M., Sackett, P. R., & Wiemann, S. (2007). A review of recent developments in integrity test research.
Personnel Psychology, 60, 271–301.
Bies, R. J., & Tripp, T. M. (1996). Beyond distrust: “Getting even” and the need for revenge. In R. M. Kramer, &
T. R. Tyler (Eds.), Trust in organizations: Frontiers of theory and research (pp. 246–260). Thousand
Oaks, CA: Sage.
Bills, M. A. (1925). Social status of the clerical worker and his permanence on the job. Journal of Applied
Psychology, 9, 424–427.
Bing, M. N., Stewart, S. M., Davison, H. K., Green, P. D., McIntyre, M. D., & James, L. R. (2007). An integra-
tive typology of personality assessment for aggression: Implications for predicting counterproductive
workplace behavior. Journal of Applied Psychology, 92, 722–744.
Birati, A., & Tziner, A. (1996). Withdrawal behavior and withholding efforts at work (WBWEW): Assessing
the financial cost. Human Resource Management Review, 6, 305–314.
Brown, M. E., & Treviño, L. K. (2006). Socialized charismatic leadership, values congruence, and deviance in
work groups. Journal of Applied Psychology, 91, 954–962.
Brown, T. S., Jones, J. W., Terris, W., & Steffy, B. D. (1987). The impact of pre-employment integrity testing on
employee turnover and inventory shrinkage losses. Journal of Business and Psychology, 2, 136–149.
Bruk-Lee, V., & Spector, P. E. (2006). The social stressors-counterproductive work behaviors link: Are conflicts
with supervisors and coworkers the same? Journal of Occupational Health Psychology, 11, 145–156.
Bruursema, K. (2007). How individual values and trait boredom interface with job characteristics and job bore-
dom in their effects on counterproductive work behavior. ProQuest Information & Learning. Dissertation
Abstracts International: Section B: The Sciences and Engineering, 68 (4-B) (Electronic; Print)
Bruursema, K., & Spector, P. E. (2005, April). Leadership style and the link with counterproductive work
behavior (CWB). Paper presented at the 20th Annual Conference of the Society for Industrial and
Organizational Psychology, Los Angeles, CA.
Budd, J. W., Arvey, R. D., & Lawless, P. (1996). Correlates and consequences of workplace violence. Journal
of Occupational Health Psychology, 1, 197–210.
Chadwick-Jones, J. K., Brown, C. A., Nicholson, N., & Sheppard, C. (1971). Absence measures: Their reliability
and stability in an industrial setting. Personnel Psychology, 24, 463–470.
Chen, P. Y., & Spector, P. E. (1992). Relationships of work stressors with aggression, withdrawal, theft and
substance use: An exploratory study. Journal of Occupational and Organizational Psychology, 65(3),
177–184.
Cohen-Charash, Y., & Mueller, J. S. (2007). Does perceived unfairness exacerbate or mitigate interpersonal
counterproductive work behaviors related to envy? Journal of Applied Psychology, 92, 666–680.
Cohen-Charash, Y., & Spector, P. E. (2001). The role of justice in organizations: A meta-analysis. Organizational
Behavior and Human Decision Processes, 86, 278–321.
Colbert, A. E., Mount, M. K., Harter, J. K., Witt, L. A., & Barrick, M. R. (2004). Interactive effects of person-
ality and perceptions of the work situation on workplace deviance. Journal of Applied Psychology, 89,
599–609.
Collins, J. M., & Griffin, R. W. (1998). The psychology of counterproductive job performance. In R. W. Griffin,
A. O’Leary-Kelly & J. M. Collins (Eds.), Dysfunctional behavior in organizations: Violent and deviant
behavior (pp. 219–242). New York, NY: Elsevier Science/JAI Press.
Colquitt, J. A., Conlon, D. E., Wesson, M. J., Porter, C. O. L. H., & Ng, K. Y. (2001). Justice at the millennium:
A meta-analytic review of 25 years of organizational justice research. Journal of Applied Psychology,
86, 425–445.
Colquitt, J. A., Scott, B. A., Judge, T. A., & Shaw, J. C. (2006). Justice and personality: Using integrative theories
to derive moderators of justice effects. Organizational Behavior and Human Decision Processes, 100,
110–127.
Connerley, M. L., Arvey, R. D., & Bernardy, C. J. (2001). Criminal background checks for prospective and
current employees: Current practices among municipal agencies. Public Personnel Management, 30,
173–183.
Cropanzano, R., Howes, J. C., Grandey, A. A., & Toth, P. (1997). The relationship of organizational politics and
support to work behaviors, attitudes, and stress. Journal of Organizational Behavior, 18, 159–180.
Dalal, R. S. (2005). A meta-analysis of the relationship between organizational citizenship behavior and
counterproductive work behavior. Journal of Applied Psychology. Special Section: Theoretical Models
and Conceptual Analyses—Second Installment, 90, 1241–1255.
Detert, J. R., Treviño, L. K., Burris, E. R., & Andiappan, M. (2007). Managerial modes of influence and
counterproductivity in organizations: A longitudinal business-unit-level investigation. Journal of Applied
Psychology, 92, 993–1005.
Dilchert, S., Ones, D. S., Davis, R. D., & Rostow, C. D. (2007). Cognitive ability predicts objectively measured
counterproductive work behaviors. Journal of Applied Psychology, 92, 616–627.
Douglas, S. C., & Martinko, M. J. (2001). Exploring the role of individual differences in the prediction of
workplace aggression. Journal of Applied Psychology, 86, 547–559.
Duffy, M. K., Ganster, D. C., & Shaw, J. D. (1998). Positive affectivity and negative outcomes: The role of
tenure and job satisfaction. Journal of Applied Psychology, 83, 950–959.
Dunlop, P. D., & Lee, K. (2004). Workplace deviance, organizational citizenship behavior, and business unit
performance: The bad apples do spoil the whole barrel. Journal of Organizational Behavior, 25, 67–80.
Einarsen, S., Raknes, B. I., Matthiesen, S. B., & Hellesoy, O. H. (1996). Helsemessige aspekter ved mobbing
i arbeidslivet. modererende effekter av sosial stotte og personlighet. [The health-related aspects of
bullying in the workplace: The moderating effects of social support and personality.] Nordisk Psykologi,
48, 116–137.
Enns, J. R. (2007). The roles of realistic conflict and relative deprivation in explaining counterproductive
work behavior. ProQuest Information & Learning. Dissertation Abstracts International: Section B: The
Sciences and Engineering, 68 (1-B) (Electronic; Print)
Enns, J. R., & Rotundo, M. (2007). Intergroup counterproductive work behavior: Effects of conflict and
self-categorization. Paper presented at the 67th Annual Meeting of the Academy of Management, Phil-
adelphia, PA.
Fox, S., & Spector, P. E. (1999). A model of work frustration-aggression. Journal of Organizational Behavior,
20, 915–931.
Fox, S., & Spector, P. E. (Eds.). (2005). Counterproductive work behavior: Investigations of actors and targets.
Washington, DC: American Psychological Association.
Fox, S., Spector, P. E., Goh, A., & Bruursema, K. (2007). Does your coworker know what you’re doing?
Convergence of self- and peer-reports of counterproductive work behavior. International Journal of
Stress Management, 14, 41–60.
Fox, S., Spector, P. E., & Miles, D. (2001). Counterproductive work behavior (CWB) in response to job stressors
and organizational justice: Some mediator and moderator tests for autonomy and emotions. Journal of
Vocational Behavior, 59, 291–309.
Galperin, B. L., & Burke, R. J. (2006). Uncovering the relationship between workaholism and workplace
destructive and constructive deviance: An exploratory study. International Journal of Human Resource
Management, 17, 331–347.
Giacalone, R. A., & Greenberg, J. (Eds.). (1997). Antisocial behavior in organizations. Thousand Oaks, CA:
Sage.
Goh, A. (2007). An attributional analysis of counterproductive work behavior (CWB) in response to occupa-
tional stress. ProQuest Information & Learning. Dissertation Abstracts International: Section B: The
Sciences and Engineering, 68 (4-B) (Electronic; Print).
Greenberg, L., & Barling, J. (1999). Predicting employee aggression against coworkers, subordinates and
supervisors: The roles of person behaviors and perceived workplace factors. Journal of Organizational
Behavior, 20, 897–913.
Griffin, R. W., O’Leary-Kelly, A., & Collins, J. M. (Eds.). (1998). Dysfunctional behavior in organizations:
Violent and deviant behavior. New York, NY: Elsevier Science/JAI Press.
Gruys, M. L. (2000). The dimensionality of deviant employee behavior in the workplace. ProQuest
Information & Learning. Dissertation Abstracts International: Section B: The Sciences and Engineering,
61 (5-B) (Electronic; Print).
Gruys, M. L., & Sackett, P. R. (2003). Investigating the dimensionality of counterproductive work behavior.
International Journal of Selection and Assessment, 11, 30–42.
Harris, L. C., & Ogbonna, E. (2002). Exploring service sabotage: The antecedents, types and consequences of
frontline, deviant, antiservice behaviors. Journal of Service Research, 4(3), 163–183.
Harris, L. C., & Ogbonna, E. (2006). Service sabotage: A study of antecedents and consequences. Journal of
the Academy of Marketing Science, 34, 543–558.
Harris, P. M., & Keller, K. S. (2005). Ex-offenders need not apply: The criminal background check in hiring
decisions. Journal of Contemporary Criminal Justice, 21, 6–30.
Hepworth, W., & Towler, A. (2004). The effects of individual differences and charismatic leadership on workplace
aggression. Journal of Occupational Health Psychology, 9, 176–185.
Hershcovis, M. S., Turner, N., Barling, J., Arnold, K. A., Dupré, K. E., Inness, M., et al. (2007). Predicting
workplace aggression: A meta-analysis. Journal of Applied Psychology, 92, 228–238.
Hickman, M., & Piquero, A. (2001). Exploring the relationships between gender, control balance, and deviance.
Deviant Behavior, 22, 323–351.
Hogan, J., & Hogan, R. (1989). How to measure employee reliability. Journal of Applied Psychology, 74,
273–279.
Hollinger, R. C., & Clark, J. P. (1982). Formal and informal social controls of employee deviance. Sociological
Quarterly, 23, 333–343.
Hollinger, R. C., Slora, K. B., & Terris, W. (1992). Deviance in the fast-food restaurant: Correlates of employee
theft, altruism, and counterproductivity. Deviant Behavior, 13, 155–184.
Hollwitz, J. C. (1999). Investigations of a structured interview for pre-employment integrity screening. ProQuest
Information & Learning. Dissertation Abstracts International: Section B: The Sciences and Engineering,
60 (1-B).
Horn, J., Nelson, C. E., & Brannick, M. T. (2004). Integrity, conscientiousness, and honesty. Psychological
Reports, 95, 27–38.
Hough, L. M., Eaton, N. K., Dunnette, M. D., Kamp, J. D., & McCloy, R. A. (1990). Criterion-related validi-
ties of personality constructs and the effect of response distortion on those validities. Journal of Applied
Psychology, 75, 581–595.
Huiras, J., Uggen, C., & McMorris, B. (2000). Career jobs, survival jobs, and employee deviance: A social
investment model of workplace misconduct. Sociological Quarterly, 41, 245–263.
Johns, G. (1994). How often were you absent? A review of the use of self-reported absence data. Journal of
Applied Psychology, 79, 574–591.
Counterproductive Work Behavior and Withdrawal 509

Jones, J. W. (1981). Dishonesty, burnout, and unauthorized work break extensions. Personality and Social
Psychology Bulletin, 7, 406–409.
Judge, T. A., LePine, J. A., & Rich, B. L. (2006). Loving yourself abundantly: Relationship of the narcissistic
personality to self- and other perceptions of workplace deviance, leadership, and task and contextual
performance. Journal of Applied Psychology, 91, 762–776.
Keashly, L. (1997). Emotional abuse in the workplace: Conceptual and empirical issues. Journal of Emotional
Abuse, 1, 85–117.
Knorz, C., & Zapf, D. (1996). Mobbing—eine extreme form sozialer stressoren am arbeitsplatz. [Mobbing:
A severe form of social stressors at work.] Zeitschrift Für Arbeits- Und Organisationspsychologie, 40,
12–21.
Kwok, C., Au, W. T., & Ho, J. M. C. (2005). Normative controls and self-reported counterproductive behaviors
in the workplace in china. Applied Psychology: An International Review, 54, 456–475.
Langton, L., & Hollinger, R. C. (2005). Correlates of crime losses in the retail industry. Security Journal, 18(3),
27–44.
Lapierre, L. M., Spector, P. E., & Leck, J. D. (2005). Sexual versus nonsexual workplace aggression and
victims’ overall job satisfaction: A meta-analysis. Journal of Occupational Health Psychology, 10,
155–169.
Lee, K., Ashton, M. C., & de Vries, R. E. (2005a). Predicting workplace delinquency and integrity with the
HEXACO and five-factor models of personality structure. Human Performance, 18(2), 179–197.
Lee, K., Ashton, M. C., & Shin, K. (2005b). Personality correlates of workplace anti-social behavior. Applied
Psychology: An International Review, 54, 81–98.
Marcus, B. (2001). Approaches to the explanation of counterproductive behaviors in organizations. In R.K.
Silbereisen (Ed.), Psychologie 2000 (pp. 414–425). Lengerich, Germany: Pabst.
Marcus, B., Lee, K., & Ashton, M. C. (2007). Personality dimensions explaining relationships between integ-
rity tests and counterproductive behavior: Big five, or one in addition? Personnel Psychology, 60, 1–34.
Marcus, B., & Schuler, H. (2004). Antecedents of counterproductive behavior at work: A general perspective.
Journal of Applied Psychology, 89, 647–660.
Marcus, B., & Wagner, U. (2007). Combining dispositions and evaluations of vocation and job to account
for counterproductive work behavior in adolescent job apprentices. Journal of Occupational Health
Psychology, 12(2), 161–176.
Martinko, M. J., Gundlach, M. J., & Douglas, S. C. (2002). Toward an integrative theory of counterproductive
workplace behavior: A causal reasoning perspective. International Journal of Selection and Assessment.
Special Issue: Counterproductive Behaviors at Work, 10, 36–50.
McFadden, K. L. (1997). Policy improvements for prevention of alcohol misuse by airline pilots. Human
Factors, 39, 1–8.
Meyer, J. P., Stanley, D. J., Herscovitch, L., & Topolnytsky, L. (2002). Affective, continuance, and normative
commitment to the organization: A meta-analysis of antecedents, correlates, and consequences. Journal
of Vocational Behavior, 61, 20–52.
Mikulay, S. M., & Goffin, R. D. (1998). Measuring and predicting counterproductivity in the laboratory using
integrity and personality testing. Educational and Psychological Measurement, 58, 768–790.
Miles, D. E., Borman, W. C., Spector, P. E., & Fox, S. (2002). Building an integrative model of extra role work
behaviors: A comparison of counterproductive work behavior with organizational citizenship behavior.
International Journal of Selection and Assessment. Special Issue: Counterproductive Behaviors at Work,
10, 51–57.
Mitchell, M. S., & Ambrose, M. L. (2007). Abusive supervision and workplace deviance and the moderating
effects of negative reciprocity beliefs. Journal of Applied Psychology, 92, 1159–1168.
Mount, M., Ilies, R., & Johnson, E. (2006). Relationship of personality traits and counterproductive work
behaviors: The mediating effects of job satisfaction. Personnel Psychology, 59, 591–622.
Neuman, J. H., & Baron, R. A. (1997). Aggression in the workplace. In R. A. Giacalone, & J. Greenberg (Eds.),
Antisocial behavior in organizations (pp. 37–67). Thousand Oaks, CA: Sage.
Nicol, A. A. M., & Paunonen, S. V. (2002). Overt honesty measures predicting admissions: An index of validity
or reliability. Psychological Reports, 90, 105–115.
Ones, D. S., & Viswesvaran, C. (1998). Integrity testing in organizations. In R. W. Griffin, A. O’Leary-Kelly
& J. M. Collins (Eds.), Dysfunctional behavior in organizations: Violent and deviant behavior (pp. 243–
276). New York, NY: Elsevier Science/JAI Press.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities:
Findings and implications for personnel selection and theories of job performance. Journal of Applied
Psychology, 78, 679–703.
510 Handbook of Employee Selection

Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (2003). Personality and absenteeism: A meta-analysis of
integrity tests. European Journal of Personality. Special Issue: Personality and Industrial, Work and
Organizational Applications, 17(Suppl. 1), S19-S38.
Payne, B. K., & Gainey, R. R. (2004). Ancillary consequences of employee theft. Journal of Criminal Justice,
32, 63–73.
Pearson, C. M., Andersson, L. M., & Porath, C. L. (2000). Assessing and attacking workplace incivility.
Organizational Dynamics, 29(2), 123–137.
Penney, L. M., & Spector, P. E. (2002). Narcissism and counterproductive work behavior: Do bigger egos mean
bigger problems? International Journal of Selection and Assessment. Special Issue: Counterproductive
Behaviors at Work, 10, 126–134.
Penney, L. M., & Spector, P. E. (2005). Job stress, incivility, and counterproductive work behavior (CWB): The
moderating role of negative affectivity. Journal of Organizational Behavior, 26, 777–796.
Probst, T. M., Stewart, S. M., Gruys, M. L., & Tierney, B. W. (2007). Productivity, counterproductivity and
creativity: The ups and downs of job insecurity. Journal of Occupational and Organizational Psychology,
80, 479–497.
Roberts, B. W., Harms, P. D., Caspi, A., & Moffitt, T. E. (2007). Predicting the counterproductive employee in
a child-to-adult prospective study. Journal of Applied Psychology, 92, 1427–1436.
Robinson, S. L., & Bennett, R. J. (1995). A typology of deviant workplace behaviors: A multidimensional
scaling study. Academy of Management Journal, 38, 555–572.
Robinson, S. L., & O’Leary-Kelly, A. M. (1998). Monkey see, monkey do: The influence of work groups on the
antisocial behavior of employees. Academy of Management Journal, 41, 658–672.
Rogers, K., & Kelloway, E. K. (1997). Violence at work: Personal and organizational outcomes. Journal of
Occupational Health Psychology, 2, 63–71.
Rosenbaum, R. W. (1976). Predictability of employee theft using weighted application blanks. Journal of
Applied Psychology, 61, 94–98.
Rotundo, M., & Sackett, P. R. (2002). The relative importance of task, citizenship, and counterproductive
performance to global ratings of job performance: A policy-capturing approach. Journal of Applied
Psychology, 87, 66–80.
Rotundo, M., & Xie, J. L. (2008). Understanding the domain of counterproductive work behavior in China.
International Journal of Human Resource Management, 19, 856–877.
Sackett, P. R., & DeVore, C. J. (2002). Counterproductive behaviors at work. In N. Anderson, D. S. Ones, H. K.
Sinangil, & C. Viswesvaran (Eds.), Handbook of industrial, work and organizational psychology, volume
1: Personnel psychology (pp. 145–164). Thousand Oaks, CA: Sage.
Sackett, P. R., & Wanek, J. E. (1996). New developments in the use of measures of honesty, integrity, conscien-
tiousness, dependability, trustworthiness, and reliability for personnel selection. Personnel Psychology,
49, 787–829.
Sagie, A., Birati, A., & Tziner, A. (2002). Assessing the costs of behavioral and psychological withdrawal: A
new model and an empirical illustration. Applied Psychology: An International Review. Special Issue:
Challenges of Applied Psychology for the Third Millennium, 51, 67–89.
Salgado, J. F. (2000). The Big Five personality dimensions as predictors of alternative criteria. Paper presented at the
15th Annual Conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.
Salgado, J. F., Viswesvaran, C., & Ones, D. S. (2002). Predictors used for personnel selection: An overview
of constructs, methods and techniques. In N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran
(Eds.), Handbook of industrial, work and organizational psychology, volume 1: Personnel psychology
(pp. 165–199). Thousand Oaks, CA: Sage.
Scalora, M. J., Washington, D. O., Casady, T., & Newell, S. P. (2003). Nonfatal workplace violence risk factors:
Data from a police contact sample. Journal of Interpersonal Violence, 18, 310–327.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psycho-
logy: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Skarlicki, D. P., & Folger, R. (1997). Retaliation in the workplace: The roles of distributive, procedural, and
interactional justice. Journal of Applied Psychology, 82(3), 434–443.
Smith, D. A., & Visher, C. A. (1980). Sex and involvement in deviance/crime: A quantitative review of the empiri-
cal literature. American Sociological Review, 45, 691–701.
Spalek, B. (2001). Regulation, white-collar crime and the Bank of Credit and Commerce International. The
Howard Journal, 40, 166–179.
Spector, P. E. (1975). Relationships of organizational frustration with reported behavioral reactions of employees.
Journal of Applied Psychology, 60, 635–637.
Counterproductive Work Behavior and Withdrawal 511

Spector, P. E. (1988). Development of the work locus of control scale. Journal of Occupational Psychology,
61(4), 335–340.
Spector, P. E., & Fox, S. (2002). An emotion-centered model of voluntary work behavior: Some parallels
between counterproductive work behavior and organizational citizenship behavior. Human Resource
Management Review, 12, 269–292.
Spector, P. E., & Fox, S. (2005). The stressor-emotion model of counterproductive work behavior. In S. Fox &
P. E. Spector (Eds.). Counterproductive workplace behavior: Investigations of actors and targets
(pp. 151–174). Washington, DC: American Psychological Association.
Spector, P. E., Fox, S., Penney, L. M., Bruursema, K., Goh, A., & Kessler, S. (2006). The dimensionality of
counterproductivity: Are all counterproductive behaviors created equal? Journal of Vocational Behavior,
68, 446–460.
Steffensmeier, D., & Allan, E. (1996). Gender and crime: Toward a gendered theory of female offending.
Annual Review of Sociology, 22, 459–487.
Storms, P. L., & Spector, P. E. (1987). Relationships of organizational frustration with reported behavioural
reactions: The moderating effect of locus of control. Journal of Occupational Psychology, 60(3),
227–234.
Tepper, B. J. (2000). Consequences of abusive supervision. Academy of Management Journal, 43, 178–190.
Vardi, Y., & Weitz, E. (2002). Using the theory of reasoned action to predict organizational misbehavior.
Psychological Reports, 91, 1027–1040.
Zapf, D., & Einarsen, S. (2005). Mobbing at work: Escalated conflicts in organizations. In S. Fox, & P. E.
Spector (Eds.), Counterproductive work behavior: Investigations of actors and targets (pp. 237–270).
Washington, DC: American Psychological Association.
This page intentionally left blank
24 Defining and Measuring Results of Workplace Behavior

Elaine D. Pulakos and Ryan S. O’Leary

The previous chapters in this section focused on the measurement of task performance, construc-
tive personal behavior (citizenship and adaptability), and counterproductive behavior and how
these fit in the context of conducting selection research. Each of these represents a conceptually
distinct content area within the performance domain, and all consist of reasonably well-defined
constructs that have been reliably and validly measured in the past and successfully used as cri-
teria in validation research, albeit some more so than others. Alternatively, this chapter focuses
on measuring results—the actual end products, outcomes, or deliverables one produces on a job.
Unlike other criterion constructs, discussions of a “results” construct are relatively rare in the
industrial and organizational (I-O) psychology literature. Likewise, results measures have not
been as well defined and researched as other types of performance measures (e.g., task, citizen-
ship, adaptive, etc.). Thus, we know less about their reliability, validity, accuracy, and fairness
compared with other, more commonly used performance measures. We also know less about
how to develop effective results measures that will possess adequate psychometric properties and
validity.
Given that we already have several conceptually distinct, well-defined, and psychometrically
sound performance measures that appear to comprehensively cover the criterion domain, one might
reasonably question why we should bother adding results measures to the mix. The answer is that
many organizations today are focusing on defining work in terms of the results employees are
expected to achieve, and likewise, they are evaluating and rewarding staff on the extent to which
they have delivered tangible outcomes that are important to the organization’s success. Thus, if a
situation arises in which we must conduct validation research using criterion measures that are
available, chances are that we will increasingly encounter measures of results. In addition, opera-
tional performance measures are sometimes used as predictors in making selection decisions; for
example, performance ratings are often one input used in making promotion decisions. Here, again,
such predictors are increasingly likely to include measures of results.
Many of the performance measures used in validation research have focused on measuring
work behavior, which is important to ensure job relevance. Behavioral measures have also been
used extensively in the past as a basis for performance management. These measures focus on how
employees get the job done; for example, how they contribute to a team, communicate, plan and
organize work, and so forth. Irrespective of how productive employees may be, we are all familiar
with the problems and disruptions they can cause if they are difficult to work with, unhelpful, or
exhibit maladaptive behavior. Thus, evaluating workplace behavior is important. We are all also
familiar with employees who are extremely helpful, communicate well, and are nice to everyone;
yet never seem to get anything done. This is why considering the results that an employee achieves
is also an important part of overall performance measurement, and as mentioned, it is one that orga-
nizations are increasingly emphasizing.
Although a choice can be made to assess results or behavior, it may be important to include both
types of measures when comprehensive performance measurement is the goal (Landy & Trumbo,
1980; Pulakos, 2008), as would be the case when the measures are used as criteria for validation
research or as predictors in selection processes. Because earlier chapters have discussed behavioral
performance measurement in detail, our focus here is on how to obtain useful and meaningful
measures of results. However, because relatively little research attention has been directed to mea-
suring results, there is not an extensive literature to draw on that speaks directly to the quality and
utility of results measures or how they relate to other, more commonly used predictors and criteria.
Accordingly, we draw on related research to propose methods for developing results measures that
should maximize their reliability, validity, and fairness.
We begin below by reviewing the debate that has surrounded measuring workplace behavior
versus results and discuss why the measurement of results has become increasingly popular
today. We then propose methods for developing results measures and challenges associated with
these. We review the concept of cascading goals and provide guidelines for developing individual
objectives, which are thought to be an important precursor to achieving organizationally relevant
results. We then discuss evaluation methods that should facilitate accurate and fair measure-
ment of the results employees achieve, using a combination of objective and subjective mea-
sures. Finally, we discuss individual difference constructs that are likely to predict performance
results.

MEASURING WORKPLACE BEHAVIOR VERSUS RESULTS


There have been long-standing differences of opinion about what aspects of employee performance
should be measured—behavior, results, or both (see Bernardin, Hagan, Kane, & Villanova, 1998;
Feldman, 1992; Latham, 1986; Murphy & Cleveland, 1991; Olian & Rynes, 1991). The measure-
ment of each offers unique advantages and corresponding disadvantages. In this section, we briefly
discuss these as well as the reasons for the increasingly popular trend of measuring employee per-
formance in terms of results.
Many industrial and organizational psychologists have argued against measuring results,
advocating instead for a focus on behavior. They argue that there are too many measurement
problems associated with results-based criteria that undermine their usefulness (Dunnette, 1966;
Guion, 1965). First, there are some jobs for which results measures are nonexistent (e.g., artistic
and creative jobs, many research and development jobs), making it impossible for job perfor-
mance to be evaluated in these terms. Second, the assessment of results is problematic because it
can be impacted by factors outside an employee’s direct control or be the result of team efforts.
Indeed, it is likely that many of the nontrivial results that an individual achieves are at least
somewhat a function of factors outside of his or her complete control. Consequently, the measure-
ment of important results may inherently suffer from some amount of criterion contamination
(Borman, 1991). Finally, an exclusive focus on results can yield deficient performance mea-
surement because consideration is not given to how employees achieve their results. Although
workers can achieve impressive results, overall performance is not effective if employees have
a “results-at-any-cost” mentality and achieve outcomes in ways that are detrimental to others or
the organization (Cardy, 1998).
To address these issues, job performance has typically been evaluated by measuring work behav-
iors via the use of subjective rating scales. One important advantage of using subjective ratings is
that all of a job’s performance requirements can be described on a set of rating scales, thereby miti-
gating the deficiency problems that often plague results-based measurement (Borman, 1987). Also,
by focusing on behaviors that lead to effective performance, criterion contamination resulting from
situational factors outside of the employee’s control is mitigated.
Although there are clearly challenges inherent in measuring results, the evaluation of behavior
is not without issues of its own. First and foremost, the common practice of using subjective ratings
to assess behavioral performance (see Chapter 21, this volume) yields measures with notoriously
attenuated variance. This is particularly true when these ratings are collected for operational pur-
poses (e.g., pay, promotion), circumstances in which a large proportion of employees are rated at
the highest levels of the rating scale (Pulakos, 2004). This lack of discrimination among employees
renders the measures virtually useless for validation research or for use as selection measures.
Although for-research-only ratings collected in validation studies tend to be more variable, lack of
discrimination is a chronic problem with subjective ratings, undermining their reliability, validity,
and utility.
Second, advocates of results-based measurement argue that a focus exclusively on
behavior misses what is most important, namely whether or not an employee actually delivered
important bottom-line results. Although an employee can engage in highly effective behaviors, they
are of little value if they do not result in organization-relevant outcomes (Bernardin et al., 1998).
To that end, it has been suggested that behaviors should be measured only if they can be linked to
outcomes that drive organizational success. In addition, research has shown that employees perform
more effectively when they have specific goals and expectations so that they know what they are
accountable for delivering (e.g., Locke, Shaw, Saari, & Latham, 1981). Defining and measuring the
results each employee is expected to achieve and aligning those to organizational performance helps
everyone work toward a common set of important goals.
Despite the difficulties associated with measuring results (e.g., criterion contamination and defi-
ciency), there has been an increasingly popular trend over the last decade for organizations to adopt
a results-focus in the measurement of job performance. This is largely because business leaders and
organizational consultants have become convinced that an exclusive focus on behaviors is remiss
in not sufficiently emphasizing the importance of delivering meaningful results that are critical to
organizational success. This orientation has likely been driven by intensified pressure from stock-
holders and increasingly formidable national and international competition. It is noteworthy that
two other chapters in this volume share the perspective that results criteria are important indices of
selection-related value. Chapter 11, this volume, makes this point in discussing the business value of
selection as it relates to system and organizational level outcomes, whereas Chapter 9, this volume,
discusses this in relation to multilevel issues.
Even public sector and not-for-profit organizations that have not traditionally driven toward
results have adopted this focus to demonstrate their value. In the late 1990s, the Internal Revenue
Service (IRS), the Federal Aviation Administration (FAA), and the Government Accountability
Office (GAO) all initiated pay-for-performance systems, which focused on measuring and reward-
ing results. More recently, the U.S. Departments of Defense (DoD) and Homeland Security (DHS)
have begun to develop similar programs. This results-focus has become so pervasive that the U.S.
Office of Personnel Management (OPM) codified procedures that require Federal government agen-
cies to develop performance management systems for executives that link their performance to
results-oriented goals and to explicitly evaluate results. Similar systems are also being implemented
at lower organizational levels.
With results-oriented performance measurement increasingly emerging as a significant trend,
the remainder of this chapter is devoted to methods for defining and evaluating results in a man-
ner that will yield the highest quality measures possible. One important caveat to point out is that
results have been fairly narrowly defined in the past to include only those outcomes that could be
evaluated using bottom-line, highly objective criteria, such as dollar volume of sales and number
of products produced. More recent operationalizations of results continue to emphasize objective
measurement, but there has also been recognition that it may not be possible to translate every
important aspect of a result into a bottom-line, objective metric. This has opened the door for the
use of some subjective (i.e., judgmental) measures along with objective measures in assessing the
quality of results.

DEFINING INDIVIDUAL PERFORMANCE OBJECTIVES


Measuring results relies on identifying performance objectives that state the outcomes an employee
is expected to achieve in sufficient, measurable detail such that it is clear whether or not the objec-
tives have been met. An important goal in many organizations today is ensuring that employ-
ees focus on achieving results that contribute to important organizational goals. For example, if
improved teaming with strategic partners is a key organizational goal, the objectives set for employ-
ees should hold them accountable for seeking out and formalizing these relationships. The value
of developing and linking goals at different levels has been written about extensively in the man-
agement by objectives (MBO) literature (Rodgers & Hunter, 1991). Linking organizational goals
to individual goals not only helps to focus employees’ attention on the most important things to
achieve but also shows how their achievements support the organization’s mission. Additionally, by
showing how the work performed across the organization is related, it is more likely that everyone
will be working in alignment to support the organization’s strategic direction and critical priorities
(Hillgren & Cheatham, 2000; Schneier, Shaw, & Beatty, 1991). Moreover, as strategies and priori-
ties change, the shift in emphasis can be readily communicated to all levels via the linking of goals
(Banks & May, 1999).
To ensure alignment of goals across levels, organizations frequently implement the concept of
cascading goals, in which the organization’s strategic goals are cascaded down from level to level
until they ultimately reach individual employees. In such a system, each employee is accountable for
accomplishing specific objectives that are related to higher-level goals, thus providing obvious and
transparent connections between what an employee does on his or her job and the organization’s key
goals (Hillgren & Cheatham, 2000; Banks & May, 1999).
Figure 24.1 presents an example of linking four levels of organizational goals. Looking at the
connecting symbols, not every goal applies to all levels. For example, only two of the five organiza-
tional goals apply to Administrative Division. Likewise, only two of the Administrative Division’s
goals apply to the Accounting and Finance Department. Finally, in this example, the person’s indi-
vidual performance objectives support only one of the department’s goals. It is extremely unlikely
that an individual’s performance objectives will relate to every goal at every level in the organiza-
tion. What is shown in the example is much more typical, in which an individual’s objectives will
support only a subset of higher-level goals.
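As a concrete companion to the cascading-goals example in Figure 24.1 below, the short Python sketch that follows represents each level's goals as simple records that point to the higher-level goals they support and then traces an individual objective upward. All goal names, identifiers, and the choice of representation are invented for illustration; they are not part of the chapter's framework.

# Minimal sketch of cascaded goals; names and identifiers are illustrative only.
org_goals = {"O1": "Grow market share", "O2": "Improve operating efficiency"}

division_goals = {
    "D1": {"text": "Reduce administrative overhead", "supports": ["O2"]},
}

department_goals = {
    "F1": {"text": "Cut transaction-processing costs by 5%", "supports": ["D1"]},
}

individual_objective = {
    "text": "Automate travel-reimbursement processing by the end of Q3",
    "supports": ["F1"],
}

def traces_to_org_goals(objective):
    """Follow the 'supports' links upward to find the strategic goals an objective serves."""
    division_ids = {d for dept in objective["supports"]
                    for d in department_goals[dept]["supports"]}
    org_ids = {o for div in division_ids for o in division_goals[div]["supports"]}
    return sorted(org_ids)

print(traces_to_org_goals(individual_objective))  # ['O2'] -- only a subset of the strategic goals

As in the figure, the individual objective links to only one department goal and, through it, to only one of the organization's strategic goals.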
FIGURE 24.1 Example of cascaded goals. (The figure depicts four linked levels: the organization's strategic goals, the administrative division's goals, the accounting and finance department's goals, and a finance professional's individual goals.)

Although the value of developing and linking individual and organizational objectives makes a great deal of sense in theory, practical implementations of this strategy have revealed some significant challenges that make the process much easier said than done. First and foremost, it is absolutely
critical for organizations to set goals and objectives in a thoughtful and realistic way to ensure that
mistakes made in setting the highest level goals do not cascade down throughout the entire orga-
nization. For this reason, goals set by the top leadership of the organization need the most critical
scrutiny and opportunities for correction or revision, things that do not always occur to the degree
they should.
Assuming the highest-level goals are well thought through and realistic, one of the challenges in
cascading goals is that it is sometimes difficult to see direct relationships between high-level goals
and what an individual does on the job. This is why organizational goals need to be translated or
cascaded into increasingly refined goals at the division, department, and individual levels. The
process of developing cascading goals usually requires several meetings in which organizational
leaders first develop division goals that align with the organizational goals. Then, mid-level manag-
ers develop unit goals that align with division goals. Then, managers develop group goals that align
with unit goals, and so on until the organizational goals are cascaded down to individual employees.
The process of cascading goals thoughtfully and meaningfully is quite time-consuming and chal-
lenging, especially if managers are not accustomed to thinking about the linkages between goals at
different organizational levels.
On average, organizations spend less than 10 hours per year on performance management activi-
ties for each employee (Bretz, Milkovich, & Read, 1992). However, the process of cascading goals
requires considerably more time. In fact, it is not uncommon for organizations that are initiating
cascading goals to take until the end of the second quarter of the operating year to complete the
process. This poses difficulties, because half of the rating period may have passed before individual
goals and expectations are set for employees, leaving little time for goal attainment. However, as
organizations gain more experience with cascading goals, efficiencies are realized. Training man-
agers and employees in how to identify cascading goals that are meaningful, useful, and show clear
relationships between levels also facilitates the process. The bottom line is that the implementation
of cascading goals requires time, effort, and considerable handholding, at least initially, to ensure
that the process is done well.
Once goals have been successfully cascaded down to the level just above the individual, there are
two ways individual goals can be linked to these higher-level goals:

1. Start with performance objectives and work upward to link them to higher-level goals.
2. Start with higher-level goals that are relevant to an employee’s job and work downward to
develop individual performance objectives.

The decision to link upward or downward is a matter of personal preference. Some find it easier to start
with something concrete from their job and work upward to a less-tangible concept. Others find it
easier to start with a higher-level goal and develop something they can do on their job that relates
to that goal. Figure 24.2 shows an example of how department goals could be cascaded down to
individual objectives for a human resources (HR) professional. Note that the individual objectives
are related to only one of the department goals. Objectives can be related to more than one goal at
the next higher level, but as mentioned previously, it is unlikely that goals at one level will relate to
all of the goals at the next level.
FIGURE 24.2 Example of individual goals cascaded from departmental goals.
Human resources department goals:
• Develop a process to tie employee performance to rewards by the end of the year.
• Train all managers on the new performance management system by December 31.
• Reduce the average cost of processing retirement claims by 5% by the end of the year.
• Conduct a competency gap analysis for critical occupations by the end of the third quarter.
Individual objectives for the human resources professional:
• Design and obtain leadership approval for a new employee incentive system by March.
• Implement an employee incentive system for all full-time employees, with payouts to employees beginning in January.

There are several guidelines that should be followed when developing individual performance objectives. Many of these are a direct outgrowth of the well-established principles found in the goal-setting literature (Locke & Latham, 1990). Following these guidelines will help to ensure that the objectives are clear, employees know what is expected, and they are motivated to achieve success; an illustrative sketch of how such an objective might be recorded appears after the list.

• Objectives must be specific: Objectives must be clearly defined, identifying the end results employees are expected to achieve. Ambiguity is reduced by specifying the outcomes, products, or services in terms of quality, quantity, and timeliness expectations. Despite the fact that research has continually found that well-defined objectives are associated with higher levels of performance, reviews of results-based performance measurement systems have shown that objectives are frequently not sufficiently defined or well written to clearly communicate what is expected of the employee.
• Objectives must be measurable: To the extent possible, objectives should be defined in
terms of measurable outcomes relating to quality, quantity, and timeliness standards so that
both managers and employees know when and whether they have been achieved. However,
to comprehensively measure what is most important, it may be necessary to go beyond
objective measures and allow for some subjective judgment (e.g., quality is sometimes
difficult to operationalize in terms of concrete metrics). Later in the chapter, we discuss
evaluation of objectives in detail and provide examples of objective and subjective criteria
that can be used to measure results.
• Objectives must be difficult but achievable: The goal-setting literature has consistently
shown that difficult but attainable objectives lead to more effective performance than mod-
erately difficult goals (Locke & Latham, 1990). Goals that are perceived as challenging,
but realistic, have been found to have the strongest impact on motivating employees to
perform. Related to this idea is that the objective must be sufficiently within an employee’s
control to achieve and not overly depend on outside factors.
• Objectives must be job relevant: Objectives should have a direct and obvious link to the
employee’s job and important organizational success factors. As discussed, the use of cas-
cading goals and objectives helps to ensure individual and organizational goals are aligned.
We discuss below how to use job analytic information as a basis for developing objectives,
thus helping to ensure their content validity.
• Ideally, no more than three to five objectives should be set: Performance objectives should
reflect significant products or outcomes that employees are expected to deliver. The recom-
mendation to limit objectives to three to five is based on the fact that most employees will be
unlikely to achieve more than this number of significant and important results in a year’s time.
Consequently, establishing more than this number of objectives could be overwhelming and
only serve to demotivate employees. Although it is usually possible to set subgoals for major
objectives, and employees should do this to guide their own performance, it is not recom-
mended that the objectives recorded in an employee’s performance plan contain this level of
detail. Recording many narrowly defined subgoals in one’s formal performance plan can make
it impractically time-consuming to maintain. This is because changes to formal performance
plans often require review and approval from others (e.g., supervisors, second-line managers,
and HR). In most circumstances, it will not make sense to include very detailed subgoals that
may change regularly as the work evolves and require ongoing formal revision of the plan.
• Employees must be committed to the objectives: A key aspect of commitment that we have
already discussed is that employees must feel that they can reach their objectives, or they
will be demotivated to try. The best way to facilitate employees accepting their objectives
is to make them an active part of the objective-setting process and work with them to arrive
at objectives that are challenging yet achievable. Once managers and employees have come
to agreement on the employee’s objectives in principle, asking employees to prepare the
written description of their objectives helps enhance their ownership of them.
• Managers must show their commitment to the objectives: It is important for managers to
show their support by providing guidance and resources as well as removing obstacles to
goal achievement. The literature clearly shows that management commitment is critical
to successful achievement of objectives (Rodgers, Hunter, & Rogers, 1993).
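As noted before the list, the sketch below illustrates one possible way to record an individual objective so that these guidelines are easy to audit. The field names, the sample content, and the simple checks are assumptions made for the example rather than requirements of the chapter.

from dataclasses import dataclass, field

@dataclass
class PerformanceObjective:
    outcome: str                                   # the specific end result to be delivered
    measures: dict = field(default_factory=dict)   # quality, quantity, timeliness, or financial criteria
    linked_goal: str = ""                          # higher-level goal supported (job relevance)

def check_plan(objectives):
    """Flag two guideline violations: missing measurable criteria and too few/many objectives."""
    warnings = []
    if not 3 <= len(objectives) <= 5:
        warnings.append("Plans should normally contain three to five significant objectives.")
    for obj in objectives:
        if not obj.measures:
            warnings.append(f"No measurable criteria recorded for: {obj.outcome!r}")
    return warnings

plan = [PerformanceObjective(
    outcome="Implement an employee incentive system for all full-time employees",
    measures={"timeliness": "payouts to employees begin in January"},
    linked_goal="Tie employee performance to rewards")]
print(check_plan(plan))  # warns that the plan contains fewer than three objectives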

CHALLENGES ASSOCIATED WITH DEVELOPING INDIVIDUAL OBJECTIVES AND MITIGATION STRATEGIES

Although it may be intuitively appealing to develop individual employee objectives that link to orga-
nizational goals, there are several challenges associated with developing fair and effective objec-
tives that result in reliable and valid performance measurement. In this section, we discuss seven
major challenges inherent in identifying and setting objectives, along with recommendations for
mitigating these.

CHALLENGE 1: TRAINING MANAGERS AND STAFF TO WRITE EFFECTIVE OBJECTIVES


Managers and employees are not typically accustomed to developing objectives and therefore find it
challenging to identify and clearly define them. One reason is that organizational members seem to
naturally think in terms of the work behaviors that employees perform on the job rather than tangible,
well-defined outcomes. This may be because the materials they tend to review (e.g., job descriptions
or vacancy announcements) typically contain work behaviors or job tasks. Identifying performance
objectives requires going beyond tasks and defining the specific products, services, or outcomes that
result from work activities. Training is necessary to help managers and employees understand what
performance objectives are and how to write them in a clear and unambiguous manner.
However, even after attending training, the quality of the objectives produced by different man-
agers and employees varies greatly. It is especially helpful in the initial implementation process for
individuals who know how to write effective objectives (e.g., trained HR staff or higher-level man-
agers) to review the objectives for each employee and provide feedback on their appropriateness,
clarity, and fairness. One advantage of a higher-level review is that it enables the objectives devel-
oped for similarly situated employees to be assessed for comparability and revised, if necessary.
The process of receiving feedback from higher-level reviews further trains managers and employees
how to write more effective objectives.

CHALLENGE 2: ENSURING OBJECTIVES ARE JOB RELEVANT


In more routine, standard, and predictable jobs, it is often possible to predefine a set of objectives that
apply uniformly to all employees at a given level using standard job analytic procedures. This not
only saves time that would otherwise be spent by each manager and employee developing individual
objectives, but it also ensures that all employees in the same job are held accountable for delivering
the same results. Standardized objectives are not only the most fair for employees, they also allow
straightforward comparisons to be made between employees in terms of the results they delivered.
In a similar vein, future research should investigate whether there are sufficient similarities in
the objectives and structure of similar types of organizations that could serve as the foundation for
developing broad taxonomies of organizational and unit goals that would cascade from them. If it
were possible to develop such taxonomies, efficiencies would be gained from alleviating the need
for every organization in a similar line of business to devote time and energy to re-inventing the
wheel over and over again.
In more unique jobs and situations, it may be impossible to predefine objectives that apply across
positions, jobs, or organizations. Although a group of employees may occupy a given job, the spe-
cific results each individual is expected to achieve may vary depending on the nature of his or her
assignments. For example, some organizational consultants may have production or sales results,
others in essentially the same job may be responsible for developing and implementing systems, oth-
ers may have specific levels of customer satisfaction outcomes they are expected to meet, and still
others may have employee development or team-leadership goals. To the extent that people holding
similar jobs have different goals and objectives, evaluating and comparing their performance in a
fair and standardized manner becomes increasingly challenging.
Under these circumstances, we recommend developing individual objectives that further define
critical tasks from a comprehensive job analysis. This helps to ensure that a common base of job-
relevant information is used to develop objectives. Objectives derived in this manner will contain
more specific information than the tasks or work behavior statements, such as what specific proj-
ect, customer, product, etc. the employee is responsible for and what specific quality, quantity, and
timeliness criteria will be measured. Two examples of how objectives can be developed by further
specifying validated work behaviors appear in Table 24.1. The first task is to evaluate and monitor
the quality of product information supplied to potential customers. A reasonable objective for this
task would be to monitor the specific channels that are used to provide information to customers and
evaluate the accuracy and timeliness of the information supplied according to measurable criteria.
The second task is to design, administer, analyze, and interpret surveys. Specifying what type of
survey a given employee is responsible for and the timeline required for its completion allows this
work task to be transformed into an individual objective.

TABLE 24.1
Transforming Work Tasks Into Performance Objectives
Work task: Evaluate and monitor the quality of information provided to potential customers.
Performance objectives: Monitor calls to the company call center and provide feedback to staff as necessary to ensure 95% accuracy of the product information provided. Monitor responses to e-mail inquiries to ensure that responses are made within 24 hours and that the accuracy of the information provided is at least 95%.
Work task: Design, administer, analyze, and evaluate surveys.
Performance objective: Develop items, select a vendor, and administer the survey by January; analyze the data, conduct focus groups to further understand the survey results, and write a report with clear, actionable, and feasible recommendations that requires no grammatical editing and minimal substantive editing by July.

CHALLENGE 3: HELPING MANAGERS DEVELOP COMPARABLE AND FAIR OBJECTIVES FOR EMPLOYEES

A problem that occurs when different managers set objectives for employees who occupy the same job is that natural inconsistencies among them can result in objectives that are too easy, unattainable, or unsystematic across employees (Jamieson, 1973; Strauss, 1972). This often results in employees in the same job being evaluated against objectives that vary significantly in their difficulty and complexity. For example, assume one employee's objective is to perform a simple information-cataloguing project, whereas another in the same job and level is given the objective of managing the design and implementation of a complex information management system. If the value of these different objectives is not established and there is no mechanism in place to review objectives for fairness and consistency across employees, both of these employees could be considered performing equally
well if they both achieved their stated objectives. Yet, the employee who managed the design and
implementation of the information management system would have undertaken a much more diffi-
cult and complex assignment and contributed substantially more. Thus, evaluating employee results
cannot merely take into account whether each individual simply met or did not meet the established
objectives. This would not only undermine accuracy of performance measurement but could also
rightly be viewed as unfair, with a consequential negative impact on employee acceptance of the
measurement process (e.g., Dipboye & de Pontbriand, 1981; Greenberg, 1986).
We recommend several strategies to mitigate this problem. First, the training provided to man-
agers and employees needs to focus on teaching them how to develop objectives that are of similar
difficulty and complexity for individuals in the same or similar jobs. This process is similar to
frame-of-reference training described by Bernardin and Buckley (1981), in which review and dis-
cussion of example objectives helps to calibrate trainees to apply similar standards. As a supplement
to training, especially in the early stages of implementation, having managers meet to review the
objectives for staff in the same job helps ensure that similarly difficult and complex objectives are
set for similarly situated employees. Such meetings also serve to reinforce the development of a common frame of reference among managers for setting objectives.
A third recommendation to facilitate the quality and consistency of individual objectives is to
retain them in a searchable database organized by job and level. These can be used again verbatim
or refined and edited over time to develop future objectives.
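A minimal sketch of such a repository appears below, using Python's built-in sqlite3 module. The schema and the sample row are hypothetical; any real implementation would reflect the organization's own job structure and systems.

import sqlite3

# Hypothetical schema: one row per reusable objective, keyed by job title and level.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objectives (job TEXT, level INTEGER, objective TEXT)")
conn.execute(
    "INSERT INTO objectives VALUES (?, ?, ?)",
    ("HR Specialist", 2,
     "Design and obtain leadership approval for a new employee incentive system by March"))
conn.commit()

# A manager writing next year's plan can retrieve prior objectives for the same job and level.
rows = conn.execute(
    "SELECT objective FROM objectives WHERE job = ? AND level = ?",
    ("HR Specialist", 2)).fetchall()
for (text,) in rows:
    print(text)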
Finally, even if an objective appears appropriate for the job and level and is comparable to those
for similarly situated others, a project, program, or goal will sometimes turn out to be much more or
less difficult than anticipated. For this reason, we feel it is important to evaluate employees not only
on the extent to which they achieved or exceeded their stated results but also on the difficulty and
complexity of what they delivered relative to what would be expected for their job. Although this
involves a subjective judgment, it provides a fairer and more accurate assessment of the employee’s
performance overall and a systematic basis for making meaningful comparisons between employ-
ees who may have achieved different types of results. We further discuss and provide examples of
subjective rating criteria to evaluate results later in this chapter.
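Purely as an illustration of how the two judgments might be combined into a single index, consider the sketch below. The 1-5 scale, the labels, and the equal weighting are arbitrary assumptions of the example, not a scoring model recommended in this chapter.

def overall_result_score(achievement, difficulty, weight_difficulty=0.5):
    """Blend a 1-5 rating of goal attainment with a 1-5 rating of the difficulty/complexity
    of what was delivered relative to job expectations; equal weighting is arbitrary."""
    for rating in (achievement, difficulty):
        if not 1 <= rating <= 5:
            raise ValueError("Ratings are assumed to be on a 1-5 scale.")
    return (1 - weight_difficulty) * achievement + weight_difficulty * difficulty

# An employee who fully met a simple objective versus one who nearly met a far more complex one.
print(overall_result_score(achievement=5, difficulty=2))  # 3.5
print(overall_result_score(achievement=4, difficulty=5))  # 4.5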

CHALLENGE 4: ENSURING OBJECTIVES ARE WITHIN AN EMPLOYEE’S CONTROL


When one is developing individual objectives, care must be taken to ensure that they are largely within
the employee’s control and not overly dependent on things he or she cannot control. As discussed above,
differences in the results achieved may not be a function of differences in individual motivation, effort, or
ability, but instead, differences in the opportunities available to different employees. For example,
one employee may have more modern equipment than another and thus be able to produce a higher
volume of product, irrespective of how hard either individual works. In a similar classic example, one
employee may have a sales territory in Wyoming and another in New York City. On the basis of volume
and proximity of potential customers, the individual in New York City should have more opportunity
to make sales than the one in Wyoming. Clearly, circumstances beyond an employee’s control can
have a significant impact on the results achieved (Kane, 1986) and an employee’s motivation.

CHALLENGE 5: HANDLING OBJECTIVES THAT ARE PARTIALLY ATTRIBUTABLE TO OTHERS


A related challenge in setting objectives occurs when outcomes cannot easily be associated with
a specific person’s effort, because the work is team-focused or involves significant interdependen-
cies. For example, in the design and production of a new automobile, the quality of the product is
dependent on the design engineering group and the production group (Cascio, 1998). The results
of team-based jobs are a function of the coordination and seamless performance of the group, not
simply the sum of the individual team member contributions.
When the work is team-focused or requires significant interdependencies, objectives should be set
at the level where the key work products are produced. If jobs are so intertwined or dependent on a
team, it may not be practical or even appropriate to set individual objectives. In such circumstances,
individual objectives should be abandoned and replaced with objectives set at the higher group or team
level (Lawler, 1994). Ployhart and Weekley (Chapter 9, this volume) similarly make the point that task
and result interdependencies may make individual-level performance and results impossible to measure
well, if at all, and only aggregated performance/results may be measurable in any reasonable way.

CHALLENGE 6: SETTING OBJECTIVES IN FLUID SITUATIONS


Setting specific objectives in advance may be extremely difficult for some jobs (Cascio, 1998;
Levinson, 2005). Jobs that best lend themselves to setting objectives have relatively static perfor-
mance requirements and definable productivity metrics, both of which are uncommon in many of
today’s jobs. As the economy continues to transform from a manufacturing focus to a knowledge
and service focus, jobs are increasingly becoming more fluid and unpredictable, which makes set-
ting objectives more difficult (Pulakos, Hanson, & O’Leary, 2007). Imagine the challenge of devel-
oping specific objectives for research and development jobs, in which it is impossible to predict
when and what meaningful discoveries will occur.
For jobs that are fluid and unpredictable, or in situations where unforeseen circumstances regularly
interfere with attaining objectives, it may be necessary to alter or completely revise an employee’s objec-
tives during the rating period. Managers and employees need to be prepared to make changes to the
objectives as the situation or priorities change. Obviously, to the extent that a situation is chronically vol-
atile, requirements for constant changes to the formal performance plan may prove to be impractically
time-consuming. An alternative strategy for jobs that are in flux is to set shorter-term objectives that are
more predictable. Feedback can be given during the rating period as employees meet key milestones.
In fact, given the fluid nature of many work environments and jobs, some experts have argued against
setting longer-term objectives and instead recommend setting shorter-term goals as the work evolves.

CHALLENGE 7: ENSURING OBJECTIVES FOCUS ON IMPORTANT ASPECTS OF PERFORMANCE


Measuring important aspects of performance is necessary to obtain valid and useful measures.
Consider the job of an electrician. Although the number of projects completed within budget may
be a useful indicator of performance effectiveness, the ability to complete projects within budget is
only one aspect of the job. There are other, more important contributors to overall performance that
should be assessed, such as whether the work is competently performed according to code.
Ensuring that nontrivial aspects of performance are measured relies on two things. The first is that
careful consideration be given to what types of performance measures are most critical to assessing
effectiveness (e.g., quality, quantity, timeliness, etc.) and appropriately incorporating these factors
into performance measurement. The second is understanding that although some people advocate
using only quantifiable measures (e.g., average call time, sales volume, etc.) to evaluate objectives,
results-based performance measurement does not require consideration of only bottom-line, objec-
tive measures. Instead, the results of some objectives may need to be judged subjectively (e.g., to
evaluate the quality of work produced). Including evaluation of objective metrics and subjective
factors, where appropriate, will help mitigate the problem of only focusing on those results that can
be easily measured rather than those that represent the most important aspects of performance. We
now turn to a more comprehensive discussion of evaluating results.

MEASURING RESULTS OF PERFORMANCE OBJECTIVES


Once individual objectives have been established, employee performance related to those objec-
tives must be evaluated. There are four types of measures that are commonly used for this purpose:
timeliness, quality, quantity, and financial metrics.
Timeliness refers to the timeframe in which the work was performed. Examples of timeliness
metrics include responding to customer complaints within 24 hours and providing statistical reports
on a quarterly basis that summarize progress toward affirmative action goals.
Quality refers to the effectiveness of the result. Examples of quality metrics include improving
the layout for navigating a website to make it more user-friendly as indicated by a 10% improve-
ment in user survey satisfaction results, independently creating a report containing relevant and
concise information on program operations that required no revisions, and developing an online
training program in which trainees successfully learned 85% of the materials. Although it is use-
ful to develop quantifiable metrics of quality where it is possible to do so, quality assessments will
sometimes require subjective judgments (e.g., how relevant and concise the information contained
in a report actually was). Providing predefined rating criteria to guide subjective judgments helps
ensure employees are fairly evaluated against uniform standards.
Quantity refers to how much work is performed. Examples of quantity metrics include respond-
ing to 95% of requests, providing computer training to 90% of employees, and conducting two onsite reviews each month to assess compliance with regulations.
Finally, financial metrics relate to the efficient use of funds, revenues, profits, or savings. Examples
of financial metrics include budgeting operations to achieve a 10% cost savings compared to last
year and convincing customers to increase expenditures for service by 15% more than last year.
Although there are four primary ways to measure results, the different types of measures can be
used together, which usually improves the clarity of expectations. For example:

• Processed 99% of candidate job applications within 1 week of receiving them (quantity
and timeliness)
• Developed an online training course that taught 90% of employees how to use automated
transactional systems and reduced training costs by $500/employee (quantity, quality, and
financial metrics)
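To show how several metric types can be attached to a single objective and evaluated together, here is a small hypothetical sketch based loosely on the training-course example above; the targets and observed values are invented for illustration.

# Each objective carries several metrics, each tagged with one of the four measure types.
objective = {
    "outcome": "Deliver an online training course on the automated transactional systems",
    "metrics": [
        {"type": "quantity",  "target": 0.90, "observed": 0.93},  # proportion of employees trained
        {"type": "quality",   "target": 0.85, "observed": 0.88},  # proportion of material learned
        {"type": "financial", "target": 500,  "observed": 520},   # training cost savings per employee ($)
    ],
}

met = all(m["observed"] >= m["target"] for m in objective["metrics"])
print("Objective met on all stated metrics:", met)  # True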

Table 24.2 presents examples of well-defined objectives that specify timeliness, quality, quantity,
and/or financial metrics and examples of poorly defined objectives that fail to specify measurable
criteria. As can be seen by reviewing the first set of objectives (i.e., well-defined) in the table,
articulating expected results in terms of the four types of measures is likely to increase understand-
ing and agreement about whether or not the objectives were achieved. Alternatively, the second set
of objectives (poorly defined) is vague and nonspecific, which could easily lead to differing opinions
about the extent to which they were met.

TABLE 24.2
Example Performance Objectives
Well-defined objectives
• By June 30, develop a plan that allows for 90% of general inquiries to the company website to be responded to within 72 hours.
• By the end of the operating year, implement a self-service benefits system that reduces processing costs by 10%.
• By June 30, draft and submit to the HR vice president a plan and timeline that is accepted without revision for expanding telework options to at least 70% of full-time employees.
• Reduce average cost of processing travel reimbursements by 5% by end of year.
Poorly defined performance objectives
• Provide effective customer service.
• Coordinate with the legal department to revise the company's HR policy.
• Promote volunteering in the local community.
• Reduce operating costs of company fleet program.

CHALLENGES ASSOCIATED WITH MEASURING RESULTS AND MITIGATION STRATEGIES
Because our focus in this chapter is on measures that will be used as criteria in validation studies or
as predictors for making selection decisions, reliability, validity, accuracy, and fairness of measure-
ment are essential, as we have discussed. In the previous section, we described four types of mea-
sures that are most commonly used to evaluate results. Although we feel that these are useful and
should be incorporated in measuring results, they have some inherent limitations that are important
to address. To appreciate these limitations fully, it is important to understand the two primary fac-
tors that have driven a results focus in organizations. That is, organizational leaders want to do the
following:

• Drive achievement of important results from all employees that contribute to the organiza-
tion’s success.
• Reward employees on the basis of their performance, which requires accurate performance
measurement. Architects of pay-for-performance systems felt this could be best achieved
by defining results in terms of concrete, objective measures, thus mitigating the chronic
inflation that characterizes subjective ratings.

With this as background, we now discuss three challenges inherent in measuring results and recom-
mendations for addressing these.

CHALLENGE 1: ENSURING THE MEASURES SELECTED ARE THE IMPORTANT ONES


Managers must decide which measures are most important for assessing employee performance on
each objective. They are encouraged to quantify these measures so there is no disagreement about
the extent to which an objective has been met. On the surface, selecting the most appropriate mea-
sures may seem easy and straightforward. But consider the following questions:

• Did the employee who produced the most pieces also produce the highest quality pieces?
• Did the website redesign that was completed on time and within budget actually improve
usability?
• Was the driver who made the most deliveries speeding and endangering others?

The reality is that even when measuring performance on objectives seems straightforward, it is
important to consider the consequences of the measures selected because employees will direct their effort toward
those measures. For example, quantity measures are usually easier to define than quality measures.
However, if only quantity metrics are used, employees will focus on production, possibly to the det-
riment of quality. It is also important not to fall prey to measuring peripheral aspects of an objective
that may be easy to measure but are unimportant. For example, meeting a deadline is easy to mea-
sure but improving customer service may be what is important. Researchers and practitioners have
long argued against using convenience criteria because they are often unrelated to the most critical
aspects of job performance (e.g., Smith, 1976).
Despite the limitations associated with use of subjective criteria, inclusion of some subjective
judgment in the measurement of results increases the likelihood that the most important aspects of
performance will be measured. However, we also recommend that uniform standards be provided
to guide raters in making these judgments fairly and systematically across employees. Also, incor-
porating standardized criteria on which ratings are made provides a mechanism for making direct
comparisons between employees who may have delivered different types of results. Shown in Table
24.3 are example criteria with a five-point rating scale that could be used to evaluate the quality of
different results.

TABLE 24.3
Performance Standards for Evaluating Quality of Results
Ratings are made on a 5-point scale anchored at Low (1), High, and Exceptional (5).
Low: The product, service, or other deliverable had significant problems, did not meet minimum quality standards, and
fell well short of expectations. There were many or very significant errors or mistakes, and substantial revision or
reworking was needed.
High: The product, service, or other deliverable possessed high quality and fully met expectations. There were only
minor errors or mistakes that were easily corrected and inconsequential.
Exceptional: The product, service, or other deliverable possessed flawless and impeccable quality that met the highest
possible standards and surpassed expectations. There were no errors or mistakes and no revision or reworking was
needed.

CHALLENGE 2: MEASURING A REASONABLE AND SUSTAINABLE NUMBER OF CRITERIA


Although many different types of measures can be used to evaluate results, there is the very practi-
cal issue of which and how many of these can be reliably and accurately measured without creating
systems and processes that are so burdensome that they die of their own weight. Developing and
collecting meaningful performance measures in organizations can have significant resource impli-
cations and thus, careful consideration must be given to the number and types of metrics that will be
collected. To implement and maintain an effective and sustainable results-based process over time,
any measures that require implementation of special or additional processes or systems for collec-
tion should be judiciously selected.

CHALLENGE 3: ENSURING USEFUL AND HIGH-QUALITY EVALUATION INFORMATION


One of the most challenging problems in measuring results occurs when employees have collec-
tively delivered a myriad of different results and it is difficult to differentiate between them in
terms of their overall contribution to the organization (Graves, 1986). For example, how should a
cost-savings result be evaluated and rewarded as compared to a leadership result? Given that some
employees deliver higher impact results than others, it would not be fair or accurate to assume that
all employees who achieved their objectives were performing with equal effectiveness. Related to
this, some employees consistently deliver results above the expectations for their job level, whereas
others consistently deliver below their level. Thus, although it is useful to know whether or not an
employee achieved his or her objectives, this does not always provide useful information for dis-
criminating between the most and least effective performers for validation research or operational
selection/promotion decisions.
An effective strategy that has been used in public and private sector organizations to address the
above issues is, again, to introduce scaled criteria or standards that enable evaluation of the relative
contribution and level of difficulty associated with different results. By using such standards as a
part of the results evaluation process, managers are able to more accurately and reliably measure the
contribution and value of the results delivered by different employees. The use of individual perfor-
mance objectives without scaled evaluation criteria to assess their relative contribution can result
in a system that fails to differentiate between employees who are contributing more or less and thus
provides no basis for rewarding them differentially (Muczyk, 1979). Examples of standards to evaluate three different
aspects of a result (e.g., extent to which objective was met, level of result achieved, and contribution
of result) appear in Tables 24.4, 24.5, and 24.6, respectively. It is important to note that ratings on
these criteria can easily be combined into a composite results measure, the psychometric properties
of which can be readily assessed.
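
To make the idea of a composite results measure concrete, the sketch below averages ratings on the three scaled criteria (extent to which the objective was met, level of result achieved, and contribution of result) into a single score and estimates internal consistency with Cronbach's alpha. It is a minimal illustration under assumed data, not a prescription for how such composites must be formed; the ratings shown are hypothetical.

```python
# Minimal sketch: combining the three scaled results criteria into a
# composite score and estimating internal consistency (Cronbach's alpha).
# The ratings are hypothetical; each row is one employee, and the columns
# are 1-5 ratings on objective attainment, level, and contribution.

import numpy as np

ratings = np.array([
    [4, 3, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [4, 5, 4],
])

# Composite results score: unit-weighted mean of the three criteria.
composite = ratings.mean(axis=1)

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (employees x criteria) matrix of ratings."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

print("Composite scores:", np.round(composite, 2))
print("Cronbach's alpha:", round(cronbach_alpha(ratings), 2))
```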

TABLE 24.4
Extent to Which Objective Was Met
Ratings are made on a 5-point scale anchored at Not Met (1), Met, and Exceeded (5).
Not Met: Several of the quality, quantity, timeliness, or financial measures established for this objective were not met.
Met: All of the quality, quantity, timeliness, and financial measures established for this objective were met.
Exceeded: The quality, quantity, timeliness, or financial measures established for this objective were significantly
exceeded.

INDIVIDUAL DIFFERENCE PREDICTORS OF RESULTS


As Chapter 21, this volume, discusses, job performance criteria in selection research are often
conceptually ambiguous, which makes specifying relationships between predictors and criterion
measures difficult. In the case of results measures, the problem is compounded by the fact that the
results achieved across different jobs may not reflect conceptually homogeneous content or con-
structs to the extent that other performance measures do. For example, considerable research evi-
dence supports the existence of two major and conceptually distinct performance constructs, task
and citizenship performance, each of which has been found to account for significant variance in
overall job performance and be associated with different antecedents (Borman, White, & Dorsey,
1995; Motowidlo & Van Scotter, 1994; Podsakoff, MacKenzie, Paine, & Bachrach, 2000). Because
results measures reflect major outcomes or deliverables that relate to higher-level goals, they likely
capture variance that is predominantly overlapping with task performance. However, depending on
their nature, some results may be more reflective of citizenship performance, whereas others may
be a combination of both.
Because research has not been conducted to understand the underlying dimensionality of results
measures, coupled with conceptual ambiguity about what underlies achieving results, we can only
speculate about what constructs may be most useful for predicting this aspect of job performance.
Two constructs that have been shown to consistently predict performance across jobs also seem
highly relevant for predicting results. First, cognitive ability, which has been found to be one of the
strongest predictors of job performance in general (Hunter, 1980; Schmidt & Hunter, 1998), is likely to
be a strong predictor of results, especially to the extent that the results measures share variance with
task performance measures.
It also seems reasonable that conscientiousness, one of the Big Five personality constructs
(Barrick & Mount, 1991; Chapter 14, this volume), would be associated with an overall predisposition
to achieve results. The two major components of conscientiousness are achievement motivation and
dependability. Achievement motivation, in particular, which refers to one’s desire to achieve results
and master tasks beyond others’ expectations, may be particularly relevant to predicting results.
Although the Big Five are rarely broken down into their component parts, Hough (1992) and Hough
and Dilchert (Chapter 14, this volume) have argued for and shown potential advantages of examin-
ing lower-level personality constructs in the prediction of job performance. Because of the direct
conceptual similarity between achievement motivation and achieving results, this may be a circum-
stance in which examining the validity of the component personality constructs may prove fruitful.

TABLE 24.5
Level of Result Achieved
Ratings are made on a 5-point scale anchored at Did Not Meet (1), Met, and Exceeded (5).
Did Not Meet: The result achieved fell far below the difficulty and complexity of work expected for this job level.
Met: At this level, work is moderately complex and difficult such that critical analysis, integration of multiple sources
of information, and analyzing pros and cons of multiple solutions are required. Work is performed with minimal
supervision and guidance. The result achieved was consistent with the difficulty and complexity of work expected for
this job level.
Exceeded: The result achieved far exceeded the difficulty and complexity of work expected for this job level.

TABLE 24.6
Contribution of Result
Ratings are made on a 5-point scale anchored at Low (1), Moderate, and High (5).
Low: The efficiency or effectiveness of operations remained the same or was only minimally improved. The quality of
products or services remained the same or was only minimally improved.
Moderate: The efficiency or effectiveness of operations was improved, consistent with what was expected. Product or
service quality showed expected improvements.
High: The efficiency and effectiveness of operations was improved tremendously, far surpassing expectations. The
quality of products or services was improved tremendously.

CONCLUSIONS
The development of individual performance objectives, linked to key organizational goals and pri-
orities, has been hypothesized to drive important results. Given the pervasive use of results measures
in today’s organizations, future research should investigate the relationships between these perfor-
mance measures and more commonly used predictors and criteria. Many practitioners and organi-
zational leaders certainly believe that results measures account for unique variance beyond
other types of performance measures. Because this belief has led to the implementation of complex
and time-consuming results-based systems, it is important to know if the added effort associated
with these systems is, in fact, producing different or better information than other, less demanding
performance measurement approaches.
Research should also be conducted to evaluate the psychometric properties of results measures to
assess whether or not they possess sufficient reliability, validity, and fairness to be used in validation
research and for making selection decisions. Research is also needed to investigate the underlying
dimensionality of results measures as well as predictors of them. Throughout this chapter, we drew
from the literature to propose methods for identifying objectives and evaluating results that should
maximize the likelihood of obtaining measures with adequate measurement properties, validity,
and utility. However, data need to be collected using these methods to evaluate their efficacy.
Competent development of fair, job-relevant, and useful objectives is difficult, resource-in-
tensive, and time-consuming, requiring considerable training and effort on the part of managers,
employees, and HR staff. If organizational members are not committed to developing effective
objectives, doing this consistently for all employees and devoting the time that is needed to yield
high-quality measures, we recommend that results measures not be collected or included in perfor-
mance measurement processes. This is because poorly developed objectives will neither motivate
employees nor will they provide useful criterion measures for validation research or operational
selection decisions. However, if organizational members are willing to devote the time, energy,
and resources necessary to overcome the inherent challenges involved in developing objectives
and monitoring their effectiveness and completion, results-based measures may hold considerable
promise. Research and practice have certainly suggested that defining and measuring results can
have a profoundly positive effect on individual and organizational performance (Locke & Latham,
1990; Rodgers & Hunter, 1991).

REFERENCES
Banks, C. G., & May, K. E. (1999). Performance management: The real glue in organizations. In A. I. Kraut &
A. K. Korman (Eds.), Evolving practices in human resource management (pp. 118–145). San Francisco,
CA: Jossey-Bass.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Bernardin, H. J., & Buckley, M. R. (1981). Strategies in rater training. Academy of Management Review, 6,
205–213.
Bernardin, H. J., Hagan, C. M., Kane, J. S., & Villanova, P. (1998). Effective performance management: A
focus on precision, customers, and situational constraints. In J. W. Smither (Ed.), Performance appraisal:
State of the art in practice (pp. 3–48). San Francisco, CA: Jossey-Bass.
Borman, W. C. (1987). Behavior-based rating scales. In R. A. Berk (Ed.), Performance assessment: Methods
and application. Baltimore, MD: Johns Hopkins University Press.
Borman, W. C. (1991). Job behavior, performance, and effectiveness. In M. D. Dunnette & L. M. Hough (Eds.),
Handbook of industrial and organizational psychology (Vol. 2, pp. 271–326). Palo Alto, CA: Consulting
Psychologists Press.
Borman, W. C., White, L. A., & Dorsey, D. W. (1995). Effects of ratee task performance and interpersonal fac-
tors on supervisor and peer performance ratings. Journal of Applied Psychology, 80, 168–177.
Bretz, R. D., Milkovich, G. T., & Read, W. (1992). The current state of performance appraisal research and
practice: Concerns, directions, and implications. Journal of Management, 18, 321–352.
Cardy, R. L. (1998). Performance appraisal in a quality context: A new look at old problems. In J. W. Smither
(Ed.), Performance appraisal: State of the art in practice (pp. 132–162). San Francisco, CA: Jossey-Bass.
Cardy, R. L., & Dobbins, G. H. (1994). Performance appraisal: Alternative perspectives. Cincinnati, OH:
South-Western.
Cascio, W. F. (1998). Applied psychology in human resource management. Upper Saddle River, NJ: Prentice
Hall.
Dipboye, R. L., & de Pontbriand, R. (1981). Correlates of employee reactions to performance appraisals and
appraisal systems. Journal of Applied Psychology, 66, 248–251.
Dunnette, M. D. (1966). Personnel selection and placement. Belmont, CA: Wadsworth.
Feldman, D. (1992). The case for non-analytic performance appraisal. Human Resource Management Review,
2, 9–35.
Graves, J. P. (1986). Let’s put appraisal back in performance appraisal: Part 1. Personnel Journal, 61,
844–849.
Greenberg, J. (1986). Determinants of perceived fairness of performance evaluations. Journal of Applied Psy-
chology, 71, 340–342.
Guion, R. M. (1965). Personnel testing. New York, NY: McGraw-Hill.
Hillgren, J. S., & Cheatham, D. W. (2000). Understanding performance measures: An approach to linking
rewards to the achievement of organizational objectives. Scottsdale, AZ: WorldatWork.
Hough, L. M. (1992). The “Big Five” personality variables—Construct confusion: Description versus predic-
tion. Human Performance, 5, 139–155.
Hunter, J. E. (1980). Validity generalization for 12,000 jobs: An application of synthetic validity and validity
generalization to the General Aptitude Test Battery (GATB). Washington, DC: U.S. Department of Labor,
Employment Service.
Jamieson, B. D. (1973). Behavioral problems with management by objective. Academy of Management Review,
16, 496–505.
Kane, J. S. (1986). Performance distribution assessment. In R. A. Berk (Ed.), Performance assessment: Methods
and applications (pp. 237–274). Baltimore, MD: Johns Hopkins University Press.
Landy, F. J., & Trumbo, D. A. (1980). The psychology of work behavior. Homewood, IL: Dorsey Press.
Latham, G. P. (1986). Job performance and appraisal. In C. L. Cooper & I. Robertson (Eds.), International
review of industrial and organizational psychology (pp. 117–155). New York, NY: Wiley.
Lawler, E. E. (1994). Performance management: The next generation. Compensation and Benefits Review, 26,
16–20.

Levinson, H. (2005). Management by whose objectives? In Harvard Business Review on Appraising Employee
Performance. Boston, MA: Harvard Business School Publishing.
Locke, E. A., & Latham, G. P. (1990). A theory of goal setting and task performance. Englewood Cliffs, NJ:
Prentice-Hall.
Locke, E. A., Shaw, K. N., Saari, L. M., & Latham, G. P. (1981). Goal setting and task performance, 1969–
1980. Psychological Bulletin, 90, 125–152.
Motowidlo, S. J., & Van Scotter, J. R. (1994). Evidence that task performance should be distinguished from
contextual performance. Journal of Applied Psychology, 79, 475–480.
Muczyk, J. P. (1979). Dynamics and hazards of MBO application. Personnel Administrator, 24, 51–61.
Murphy, K. R., & Cleveland, J. N. (1991). Performance appraisal: An organizational perspective. Needham
Heights, MA: Allyn & Bacon.
Olian, R. L., & Rynes, S. L. (1991). Making total quality work: Aligning organizational processes, performance
measures, and stakeholders. Human Resource Management, 30, 303–330.
Podsakoff, P. M., MacKenzie, S. B., Paine, J. B., & Bachrach, D. G. (2000). Organizational citizenship behav-
iors: A critical review of the theoretical and empirical literature and suggestions for future research.
Journal of Management, 26, 513–563.
Pulakos, E. D. (2004). Performance management: A roadmap for developing, implementing, and evaluating
performance management systems. Alexandria, VA: Society for Human Resources Management.
Pulakos, E. D. (2008). Performance management: How you can achieve important business results. Oxford,
England: Blackwell.
Pulakos, E. D., Hanson, R. A., & O’Leary, R. S. (2007). Performance management in the United States.
In A. Varma, P. Budhwar, & A. DeNisi (Eds.), Global performance management. London, England:
Routledge.
Rodgers, R., & Hunter, J. E. (1991). Impact of management by objectives on organizational productivity.
Journal of Applied Psychology, 76, 322–336.
Rodgers, R., Hunter, J. E., & Rogers, D. L. (1993). Influence of top management commitment on management
process success. Journal of Applied Psychology, 78, 151–155.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology:
Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Schneier, C. E., Shaw, D. G., & Beatty, R. W. (1991). Performance measurement and management: A tool for
strategy execution. Human Resource Management, 30, 279–301.
Smith, P. C. (1976). Behaviors, results, and organizational effectiveness: The problem of criteria. In M. D.
Dunnette (Ed.), Handbook of industrial and organizational psychology (pp. 239–287). Chicago, IL:
Rand McNally.
Strauss, G. (1972). Management by objectives: A critical review. Training and Development Journal, 26,
10–15.
25 Employee Work-Related Health, Stress, and Safety

Lois E. Tetrick, Pamela L. Perrewé, and Mark Griffin

Organizations are increasingly concerned with the health and safety of their employees. There are
several factors contributing to this concern. First, the legal environment in many countries stipu-
lates that employers are responsible for the safety and health of their employees, at least while at
work. For example, the U.S. Occupational Safety and Health Act of 1970 mandates that employers
provide a safe and healthy work environment for employees, and the European Agency for Safety
and Health at Work has issued several Council Directives that protect the health and safety of work-
ers throughout the European Union. It also has been argued that there are growing concerns and
expectations among the general public over health protection from communicable diseases and noncom-
municable environmental hazards, including those in the work environment (Nicoll & Murray, 2002).
Secondly, the cost of healthcare continues to climb. In the United States, health insurance cover-
age of employees is a direct expense to employers and continues to increase. Although many compa-
nies have shifted more of the healthcare costs to employees (Kaiser Network, 2006), there is still a
potential savings to organizations to have healthy employees because health insurance premiums in
the United States are often based on claims experience from illnesses and injuries. In the European
Union and other countries around the world, the cost of healthcare is more of a social, public health
responsibility than an organizational one, based in part on differences in the funding
of healthcare systems.
Thirdly, employee health is related to productivity and organizational effectiveness. Recent
research on the association of health risks and on-the-job productivity estimated the annual cost
of lost productivity in one organization was between $1,392 and $2,592 per employee, based on
self-reported health risk factors (Burton et al., 2005). In another study of chronic health conditions,
Collins et al. (2005) found that the cost associated with lost productivity from chronic health condi-
tions exceeded the combined costs of absenteeism and medical treatment. Therefore, the concern
over employees’ health and safety is not limited to healthcare costs but also includes lost productivity, and
this concern is increasingly a global issue (World Health Organization, 2008).
The purpose of this chapter is to examine potential mechanisms that organizations may use to
maintain and promote healthy employees. These mechanisms might be the selection of “healthy”
workers, modifications to the work environment to reduce work stressors and increase safety, and
employee training. Each mechanism has its pros and cons with respect to maintaining healthy
employees as well as a safe and healthy work environment. We examine the importance of having
healthy workers and the role of stress and safety in the workplace on organizational effectiveness.
Figure 25.1 is an illustration of the chapter overview.

HEALTHY WORKERS
In this section, we first examine the rising costs of healthcare. Next, we discuss how organizations
can develop and possibly obtain healthy workers through wellness programs and selection.


[Figure 25.1 depicts the conceptual overview of the chapter: selection, the work environment, and training influence
employee safety, health, and well being, which in turn contribute to organizational effectiveness (performance, lower
healthcare costs, fewer accidents, and less turnover and absenteeism).]

FIGURE 25.1 Conceptual overview of chapter.

HEALTHCARE COSTS
Healthcare spending is rising faster than incomes in most developed countries, with the United States
spending more per capita on healthcare than other countries. On the basis of the Organization for
Economic Cooperation and Development (OECD) data, a report by the Kaiser Family Foundation
(January 2007) indicated that in the United States per capita healthcare spending as a proportion
of gross domestic product (GDP) rose from 8.8% in 1980 to 15.2% in 2003 compared with other
OECD countries such as Finland, where the share of GDP spent on healthcare grew from 7.0% in
1980 to 7.4% in 2003 and Switzerland where the share of GDP spent on healthcare grew from 7.4%
in 1980 to 11.5% in 2003.
International comparisons of individual and family health spending are difficult given differ-
ences in healthcare systems. A recent Kaiser Daily Health Report (2008) indicated that medical
care for the typical insured family of four in the United States was $13,382 in 2006—an increase
of 9.6% from 2005, with employers paying 62% or $8,362 per family in 2006. According to
another report by the Kaiser Family Foundation (March 2008), health insurance premiums had
a cumulative growth of 78% between 2001 and 2007, with much of this increase in health insur-
ance cost being borne by employers. On the basis of data from the National Compensation Survey
conducted by the U.S. Bureau of Labor Statistics, the mean health insurance costs per worker hour
for employees with access to coverage rose from $1.81 in 2004 to $2.37 in 2009 (Bureau of Labor
Statistics, 2009). Although the numbers differ based on industry, size of organization, and exact
definitions, it appears that healthcare costs and health insurance premiums for individuals and
organizations will continue to increase.
Analyses of healthcare expenditures find that healthcare costs are often related to modifiable
lifestyle behaviors. Anderson et al. (2000) found that modifiable health risks were associated with
25% of the total healthcare expenditures among a sample of 46,026 employees. Uncontrolled stress,
smoking, and obesity were the three most costly risk factors based on healthcare expenditures in
this study. In a larger study of over 300,000 employees from six companies, Goetzel, Hawkins,
Ozminkowski, and Wang (2003) found that physical health problems cost a total of $3,524 per eli-
gible employee for medical care, absenteeism, and short-term disability program use, and mental
health problems cost only $179 on average. On the basis of an earlier study, Goetzel et al. (1998)
reported that employees with depression had 70% higher healthcare expenditures than those indi-
viduals who were not depressed. In addition, they found that individuals with uncontrolled stress
had 46% greater medical costs than those who were not stressed and the third most costly risk factor
was high blood glucose. Employees with high blood glucose had 35% greater medical expenses than
those with normal blood glucose. Other costly risk factors were obesity, tobacco use, high blood
pressure, and poor exercise habits. Somewhat surprisingly, excessive alcohol consumption was not
associated with increased medical costs, although Goetzel et al. suggested this might be reflective
of individuals with drinking problems tending to avoid the healthcare system.
These studies, as well as others, highlight the effects of modifiable health risk factors on overall
healthcare costs. It is not surprising then that organizations have implemented wellness programs
that focus on exercise, nutrition, smoking cessation, and stress management as attempts to enhance
the health of their workforce.

ORGANIZATIONAL WELLNESS PROGRAMS


Rothstein (1983) suggested that many organizations initiated organizational wellness programs
beginning in the 1970s. Organizational wellness programs typically have focused on the modifiable
health risk factors associated with lifestyle such as being overweight, lack of physical activity, poor
diet, smoking, and alcohol use. These programs often include educational and training components;
financial incentives or disincentives; disease management programs; health risk assessments; health
screenings; and special programs for medical management such as flu shots, health fairs, on-site
fitness facilities, and fitness center discounts (Shurtz, 2005). Organizational wellness programs seek
to increase employee health, productivity, and morale while decreasing absenteeism and reducing
healthcare expenditures. Shurtz (2005) reported that 80% of worksites with more than 50 employees
offered some form of wellness program and almost all employers with more than 750 employees
had some form of wellness program. As might be expected, these programs vary considerably, ranging
from those focusing on only a single risk factor, such as lack of physical fitness, to others with multiple
components and prevention programs, which Parks and Steelman (2008) referred to as “compre-
hensive programs.”
A recent meta-analysis of organizational wellness programs (Parks & Steelman, 2008) found
that participation in a wellness program was related to decreased absenteeism and improved job
satisfaction supporting the effectiveness of wellness programs, although direct measures of health
were not included. Shurtz (2005), citing a review of large employers, reported that the return on
investments for their wellness programs based on healthcare expenditures ranged from $1.49 to
$4.91 with a median of $3.14 per employee. Evidence as to the effectiveness of organizational well-
ness programs in improving employees’ health is still limited. In a recent quasi-experimental design
study, Mills, Kessler, Cooper, and Sullivan (2007) found that participation in a multicomponent
organizational wellness program, relative to a comparison group, resulted in reductions in several self-
reported health risk factors, a decrease of 4.3 days in annualized absenteeism, and an increase in
productivity of 7.9%. The convergence of evidence suggests that wellness programs can increase
productivity and morale as well as reduce absences and healthcare costs in a cost-efficient manner.
That being said, wellness programs are not without some associated downsides. One challenge has
traditionally been getting those with the most risk to participate in the programs. For example, many
organizational fitness programs have not engaged those individuals who most need to increase their
activity levels. One approach to increase participation has been the use of incentives. The use of incentive
programs may actually create other concerns because incentives/rewards may be found to be discrimi-
natory under the Health Insurance Portability and Accountability Act (HIPAA), the Americans with
Disabilities Act (ADA), or state insurance laws (Simon, Bruno, Grossman, & Stamm, 2006; Simon,
Traw, McGeoch, & Bruno, 2007).
Wellness programs need to be designed and implemented such that they are compliant with
employment law (Kendall & Ventura, 2005; Simon et al., 2006; Shurtz, 2005). HIPAA bars health-
care plans from discriminating against individuals because of medical conditions and health status.
Whether a particular wellness plan is considered a healthcare plan and subject to HIPAA depends
on several factors, including what components are included in the program (Simon et al., 2006;
Simon et al., 2007). In addition to whether a specific wellness program is considered a health plan,
the use of incentives needs to be considered such that they are not construed as discriminatory
toward individuals under HIPAA; in other words, that the incentive does not depend on the health
status of the individual and rewards must be available to all similarly situated individuals or at least
provide a reasonable alternative standard for attaining the reward. Simon et al. (2006) and Kendall
and Ventura (2005) suggested that although a particular wellness program may not be discrimina-
tory on the basis of HIPAA, it may still be counter to the ADA and/or state insurance plans. For
example:

If an employer’s wellness program offers lower health plan premiums to employees who complete a
healthy lifestyles course, the ADA may require the employer to ensure that a blind or deaf employee can
access the classes and course materials. In addition, if a disability prevents an employee from earning a
wellness incentive, the ADA may require the employer to work with the employee to develop alternative
incentives that are within the employee’s ability. (Simon et al., 2006, p. 57)

Therefore, organizational wellness programs need to be carefully designed to avoid discrimina-
tion against individuals on the basis of health status and disability.

SELECTING HEALTHY WORKERS


Organizational wellness programs are one mechanism for enhancing the health of workers. An
alternative mechanism for enhancing the health of an organization’s workforce is the selection of
“healthy workers.”
As mentioned above regarding wellness programs, selection based on health status and disabil-
ity may run counter to the ADA (Rothstein, 1983). Under the ADA, it would be unlawful to base
a selection decision on a disability, which is defined as a condition that substantially limits one
or more major life activities, when the individual can perform the essential functions of the job. For
example, obesity is not considered a disability under the ADA unless the cause of the obesity is a
physiological disorder/impairment or the obesity substantially limits a major life activity, which
might be the case with morbidly obese individuals. Therefore, if an individual is morbidly obese and
can perform the essential functions of the job, denial of employment may be deemed discrimina-
tory under the ADA. In addition, there may be other employment laws that apply. For example, in
Michigan it is illegal to discriminate based on weight. Interestingly, smoking does appear to be one
health risk factor that does not have any protections under employment law. Increasingly, employ-
ers are not only restricting smoking while at work but also declining to hire individuals who smoke
(Smerd, 2007).
It may be possible to build a case that selection based on at least certain risk factors is a business
necessity. As indicated above, the cost of healthcare insurance, absenteeism, and lower levels of
productivity associated with many health risk factors and ill-health conditions might be regarded as
a business necessity, which is one justification that has been used in implementing smoking bans at
work as well as outside of work.
Regardless of legal issues, the decision to select employees on the basis of health or risk factors
has several complications. First, the fundamental principle for selection is that selection criteria
should be job-related or consistent with business necessity. Secondly, selection criteria are generally
considered to be relatively stable characteristics of an individual that predict how well an individual
will be able to perform a job, such as job-relevant knowledge, skills, and abilities. Use of health
and health risk factors may move away from relatively stable selection factors, especially if modifi-
able health risk factors such as smoking, weight, and lack of physical fitness are being considered
as selection criteria. The selection system would then be dealing with a dynamic predictor and a
dynamic criterion. Therefore, one would expect the predictive validities to be lower than when the
predictors and/or criteria are relatively stable. Further, as Rothstein (1983) indicated, the use of
health data as predictors requires that the measurement of these predictors have sufficient sensitiv-
ity (i.e., the measure is accurate in identifying people correctly with the condition being assessed)
and specificity (i.e., the measure is accurate in identifying people who do not have the condition
being assessed). This is a concern for traditional selection factors such as cognitive ability tests but
is equally a concern for health and health risk factors.
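
For readers less familiar with these screening terms, the short sketch below computes sensitivity and specificity from a hypothetical 2 x 2 classification of a health measure against the true condition. The counts are invented purely for illustration and do not come from any study cited here.

```python
# Minimal sketch: sensitivity and specificity of a hypothetical health
# screening measure. Counts are invented for illustration only.

true_positives = 80   # measure flags the condition; person has it
false_negatives = 20  # measure misses the condition; person has it
true_negatives = 850  # measure clears the person; person does not have it
false_positives = 50  # measure flags the condition; person does not have it

sensitivity = true_positives / (true_positives + false_negatives)
specificity = true_negatives / (true_negatives + false_positives)

print(f"Sensitivity: {sensitivity:.2f}")  # 0.80: proportion of true cases detected
print(f"Specificity: {specificity:.2f}")  # 0.94: proportion of non-cases correctly cleared
```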
Given the legal implications of using health risk factors for selection and the potentially chang-
ing levels of many health risk factors before and after hiring, the advisability of using selection for
creating and maintaining a healthy workforce seems weak. The empirical evidence suggests that use
of organizational wellness programs may be more efficient. Future research may determine which
health factors in interaction with which elements of the work environment and organizational well-
ness programs are most effective and which may be appropriate for use in selection systems.

WORK STRESS
The above section highlighted the cost of stress in healthcare expenditures. Considering the results
of Anderson et al. (2000) and Goetzel et al. (2004) that psychosocial risk factors, especially depres-
sion and stress, are prevalent in organizations and account for significant proportions of disabilities,
absences, and healthcare expenditures, this section will focus on stress in the workplace.
Stress in the workplace is associated with numerous health problems, including headaches,
weight control problems, sleeplessness, gastrointestinal problems, heart disease, compromised
immune systems, difficulties with memory, and psychological disorders. It has been estimated that
stress annually costs organizations billions of dollars in disability claims, absenteeism, and lost pro-
ductivity (e.g., Xie & Schaubroeck, 2001; Ryan & Watson, 2004). Various reviews of the extensive
stress literature have generally concluded that prolonged exposure to certain job demands can have
debilitating consequences for employees (Tetrick, 2002).
In this first section, we examine organizational-, job-, interpersonal-, and personal-level predic-
tors of experienced work stress. At the organizational level, we focus on work hours and various
work schedule arrangements, including a discussion of the pros and cons of using realistic
job previews as a recruiting tool. At the job level, we examine role ambiguity and conflict, job
demands, personal control at work, and adaptive performance. At the interpersonal level, we exam-
ine a lack of social support, abusive supervision, organizational politics, and political skill. Finally,
we examine several personality types and individual-level demographic predictors that include age
and gender.

ORGANIZATIONAL-LEVEL STRESSORS


The widely held assumption that long work hours inevitably lead to negative health and quality-
of-life outcomes is highly questionable. Barnett (2006) argued that long work hours appear to be a
weak predictor of outcomes because the absolute number of work hours fails to take into account
the distribution of those hours. Arguably, the distribution of work hours has greater implications
for outcomes than does the number of work hours per se. Over the past 2 decades, the stereotypical
workweek and work schedules have begun to vanish. Typical or standard work is often assumed to
be working during the day on the basis of a Monday through Friday schedule. Interestingly, most of
us do not fit the assumed typical workweek (Barling, Inness, & Gallagher, 2002a). In fact, Fenwick
and Tausig (2001) found that less than one-third of the workforce in the United States and Canada
is employed in jobs that fit the Monday through Friday, full-time day cycle.
In recent years, the presence of contingent workers and flexible work schedules has grown because
of an increasingly competitive market and the availability of new information technology. For many
organizations, employing workers on a more temporary basis provides a way to maximize flexibility
and minimize costs, especially when faced with seasonal work demands. Further, flexibility in work
schedules has been seen as a way to not only help organizations to remain competitive but to offer
employees more control over their own work schedules. Even full-time employment can be flexible,
such as shift work. Full-time shift work might involve working 35–40 hours during the week, but
the work may be performed at night or early mornings, such as the “graveyard shifts.” This can
benefit the organization by allowing services or production to be on a continual basis, but this can
also help the employees by allowing flexibility in their work schedules so that they best meet their
own needs. For example, dual career couples with small children may like the idea of working dif-
ferent shifts because this might aid in their ability to care for their children. Flexible time schedules,
job sharing (e.g., two employees working part-time but they share one job), temporary employment,
home-based work, and teleworking (e.g., working from home, hotels, or other remote work sites)
have all become more popular in recent years (Barling et al., 2002a). However, the issue is not whether
these types of arrangements aid in flexibility for the organization and the employee, but rather whether
these arrangements are reducing experienced stress for employees and are consistent with a healthy
workforce. The following section examines the consequences of alternative work arrangements on
the well being of employees.
Research on the psychological well being of part-time workers versus full-time workers has not
demonstrated significant differences in terms of employee job attitudes or well being (Barling &
Gallagher, 1996). What appears to be the most important factor differentiating full- from part-time
workers regarding their well being is whether working part-time is voluntary. Voluntary part-time
employment actually has been shown to be beneficial in terms of job satisfaction and general well
being if the part-time employee has a sense of control over work scheduling (Krausz, Sagie, &
Biderman, 2000). Finally, many part-time workers are only working part-time
because of a preexisting health concern (Mykletun & Mykletun, 1999).
Health concerns may become even more pronounced when coupled with rotating shifts (Jamal &
Baba, 1997). Shiftwork, especially nightwork, has been found to be a risk factor for cardiovascular
disease (Boggild & Knutsson, 1999). Parkes (2003) found that, in general, dayworkers reported
more favorable perceptions of their work environment than shiftworkers. However, she also found
that differences in the work environment (i.e., onshore vs. offshore) between dayworkers and shift-
workers were a moderator in these relationships. She argued that the organizational setting in which
work routines are similar for dayworkers and shiftworkers, and in which all resources are available
to both groups, might reduce the negative perceptions associated with shiftworkers. Several factors
may explain the relationship between shiftwork and health concerns, including the employee’s abil-
ity to adjust to differing schedules and the supportive nature of the employee’s family. Additional
research that can separate out these effects is needed before we can make a clear statement about
the relationship between full-time versus part-time workers and working shifts on employee stress
and health. One factor that does appear to be important in promoting health in employees is whether
the work arrangements are voluntary.
In a review of the research on work arrangements, Barling and colleagues examined several
important work arrangements, including temporary workers, job sharing, shiftwork, full-time ver-
sus part-time work, and seasonal and migrant employment (Barling et al., 2002a). They concluded
that psychological well being depends less on the nature of the work arrangement and more on
whether the arrangement was voluntary or not. Being able to choose or have some control over work
arrangements is a very important factor in the ability to handle job stressors and the health and well
being of employees.
Given that organizational work hours and schedules have the potential to be stressful to many
workers, perhaps recruiting individuals who are comfortable with less traditional schedules might
help to ensure a long and effective employment relationship. One way to recruit workers who have
an understanding of the employment environment is through realistic job previews (RJPs). The
basic argument is that job applicants will be better able to make informed decisions about whether
or not to pursue a job opportunity if they have a clear idea about the job and job environment. RJPs
give applicants a realistic sense of the positive and negative aspects of the job and job environment
that (a) might reduce the number of applicants who remain interested in the job but (b) increase the
retention of those workers who are hired (Wanous, 1980). However, some empirical research (i.e.,
Bretz & Judge, 1998) suggests that RJPs may have too many opportunity costs for the organization
because the highest quality applicants may be less willing to pursue jobs for which negative infor-
mation has been presented. Clearly, organizations need to be honest about the actual job; how-
ever, emphasizing the negative aspects of the job may hurt recruiting, especially with high-quality
applicants.

JOB-LEVEL STRESSORS
Job and role stressors such as role conflict, role ambiguity, and role overload (Kahn, Wolfe, Quinn,
Snoek, & Rosenthal, 1964) have long been known to contribute to the stress experience. Role conflict
occurs when employees’ expectations are incongruent with those expressed by their role senders.
Communication by those in authority of work expectations that are largely incompatible with those
understood and internalized by employees may increase the likelihood of burnout. Role ambiguity
is related to the amount of predictability in the work environment; thus, experienced stress may be
more prevalent in situations in which employees are uncertain about their work goals and the means
available for accomplishing them. Role overload, qualitative and quantitative, can contribute to the
experience of burnout. Workers may experience qualitative overload if they feel deficient in the basic
skills necessary for effective task completion. Quantitative overload is characterized by the belief
that one’s work cannot be completed in the time allotted. Employees may experience stress if they
perceive that they cannot successfully complete their work because of lack of skill, lack of time,
or both. Vandenberg and colleagues have argued that occupational stress research has consistently
demonstrated the deleterious effects of role stressors on occupational strain outcomes such as burn-
out, dissatisfaction, decreased performance, and psychophysiological responses such as increased
heart rate and blood pressure (Vandenberg, Park, DeJoy, Wilson, & Griffin-Blake, 2002).
Perhaps one of the most well-known conceptualizations of job stress is that of Karasek’s (1979)
Demands-Control model. Karasek suggested that heavy job demands, coupled with a lack of con-
trol, are associated with strain and job dissatisfaction. This is because control provides individuals
with the confidence and efficacy to perceive and interpret their task environment in nonthreatening
ways, thereby neutralizing the potentially dysfunctional effects of job demands (Theorell, 2004).
Accordingly, we must distinguish between the job demands placed on the individual employee, as
well as the discretion permitted the worker in deciding how to cope with these heightened expec-
tations. Implicitly, demands can increase with little or no increase in the individual’s psychological
strain as long as adequate levels of job control are maintained (Mauno, Kinnunen, &
Ruokolainen, 2007).
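
As a concrete illustration of how this buffering idea is typically tested, the sketch below fits a moderated regression in which strain is predicted from demands, control, and their interaction; a negative interaction coefficient would be consistent with control buffering the demands-strain relationship. The simulated data and variable names are hypothetical and are not drawn from the studies cited here.

```python
# Minimal sketch: testing the Demands-Control buffering hypothesis with a
# moderated regression (strain ~ demands + control + demands x control).
# The data are simulated solely for illustration.

import numpy as np

rng = np.random.default_rng(0)
n = 500
demands = rng.normal(size=n)
control = rng.normal(size=n)

# Simulate strain so that control weakens the demands-strain relationship.
strain = 0.5 * demands - 0.3 * control - 0.2 * demands * control + rng.normal(size=n)

# Design matrix: intercept, demands, control, and the interaction term.
X = np.column_stack([np.ones(n), demands, control, demands * control])
coefs, *_ = np.linalg.lstsq(X, strain, rcond=None)

labels = ["intercept", "demands", "control", "demands x control"]
for label, b in zip(labels, coefs):
    print(f"{label:>18}: {b: .2f}")
# A negative 'demands x control' coefficient is consistent with buffering.
```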
Finally, the need for adaptive workers has become increasingly important as today’s organiza-
tions are characterized by changing, dynamic, and sometimes, turbulent environments (Ilgen &
Pulakos, 1999). Employees need to be adaptable, flexible, and tolerant of uncertainty to perform
effectively (Pulakos, Arad, Donovan, & Plamondon, 2000). Employee adaptability encompasses a
wide variety of behaviors including handling emergencies or crisis situations, handling work stress,
dealing with uncertainty, and learning new technologies and procedures (Pulakos et al., 2000). The
question is how can managers select adaptable workers or train workers to be adaptable?
Given the various types of adaptable behaviors, it might not be possible to select or train workers
to be adaptable on all aspects of their performance; however, we may be able to offer some general
guidelines for selection and training. First, research has shown some evidence that certain person-
alities might be more (or less) adaptive. For example, LePine, Colquitt, and Erez (2000) examined
the effects of conscientiousness on decision-making before and after unforeseen changes in a task
context and found individuals higher in conscientiousness do not adapt quickly to change. Much
more research is needed on personality profiles (i.e., examining several personality dimensions in
conjunction with one another) before selecting employees based on personality is warranted. This
will be discussed in more detail in a later section.
Second, managers may want to consider prior experience in adaptability as a selection crite-
rion. Research has long demonstrated that one of the best predictors of future performance is
past performance (e.g., Wernimont & Campbell, 1968). Biodata instruments that emphasize prior
experiences with crises and emergencies may prove to be an effective means of selection (cf.
Pulakos et al., 2000). Finally, exposing employees in a training setting to the kinds of unpredictable
situations they might be expected to encounter at work may prepare them to be more adaptive
and creative.

INTERPERSONAL RELATIONSHIPS
Employees require specific job resources (e.g., social support) to successfully navigate stressful
work environments while maintaining psychological health. To date, social support has
attracted the most extensive amount of investigation in the interpersonal domain, and findings
consistently support the idea that a lack of support from coworkers and supervisors is highly
correlated with increases in occupational stress and burnout (Maslach, Schaufeli, & Leiter, 2001).
Work environments that fail to support emotional exchange and instrumental assistance may
exacerbate strain by isolating employees from each other and discouraging socially supportive
interactions. Workplaces characterized by conflict, frustration, and hostility may have the same
effect. Besides a general lack of social support, we focus on two additional types of interpersonal
stressors commonly found in the workplace—abusive supervision and perceptions of politics. We
also examine how having political skill can help to alleviate some of the negative effects from
interpersonal stressors.
Abusive supervision is one of the most detrimental interpersonal stressors found in the work-
place. Abusive supervision reflects subordinates’ perceptions of negative and hostile verbal and
nonverbal leader behaviors (Tepper, 2007). Behaviors include public criticism, yelling, rudeness,
bullying, coercion, and blaming subordinates for mistakes they did not make (Burton & Hoobler,
2006; Tepper, Duffy, & Shaw, 2001).
Research indicates that abused subordinates are less satisfied with their jobs, less committed
to their organizations, and more likely to display turnover intentions than nonabused subordinates
(Schat, Desmarais, & Kelloway, 2006). Employees consider abusive supervision to be a source of
stress and injustice in the workplace that has the potential to influence their attitudes, psychologi-
cal distress, and physical well being (Aryee, Chen, Sun, & Debrah, 2007; Grandey, Kern, & Frone,
2007; Tepper, 2007).
If employees believe their behaviors have no bearing on the accrual of desired outcomes, their
sense of volition is weakened (Greenberger & Strasser, 1986), and many researchers believe that
perceptions are more powerful predictors of functioning than actual control (Burger, 1989). This
distinction is critical because individuals’ perceived control influences their behaviors and emo-
tions, regardless of the actual control conditions contributing to these perceptions. Work environ-
ment factors such as regulated administration, available help, and feedback influence perceived
control. Not surprisingly, recent work suggests that supervisors may engage in abuse to protect their
own sense of power and control over work situations (Tepper, Duffy, Henle, & Lambert, 2006), thus
limiting that of their employees. Abusive supervisor behavior toward subordinates
has been found to lead to more experienced stress (Tepper, 2007) and reduced psychological and
physical well being for employees (Grandey et al., 2007).
Another well-researched interpersonal level stressor is organizational politics. Organizations
have long been considered political arenas, and the study of organizational politics has been a
popular topic for many years. Mintzberg (1983) defined politics as an individual or group behavior
that is typically disruptive, illegitimate, and not approved of by formal authority, accepted ideol-
ogy, or certified expertise. Organizations, indeed, can be viewed as political arenas, where informal
negotiation and bargaining, deal-making, favor-doing, quid-pro-quo interactions, and coalition and
alliance building characterize the way things really get done. Environmental circumstances, such
as perceptions of organizational politics, can be thought of as work demands, which are potential
sources of stress because they threaten or cause a depletion of the resources individuals possess.
Furthermore, research indicates that workplace politics are a significant concern and source of
stress for many workers.
What we know less about are the characteristics that enable one to exercise influence in ways that
lead to success in political work environments. Some have referred to such qualities as interpersonal
style, “savvy,” “street smarts,” and “political skill” (e.g., Reardon, 2000). Research has demon-
strated how different forms of personal control (e.g., interpersonal social skill or political skill) can
mitigate the negative effects of job stressors. Political skill is the ability to effectively understand
others at work and to use such knowledge to influence others to act in ways that enhance one’s per-
sonal and/or organizational goals (Ferris et al., 2007).
Politically skilled individuals are socially astute and keenly aware of the need to deal differently
with different situations and people. Therefore, they reflect the capacity to adjust their behavior
to different and changing situational demands (i.e., self-monitoring) in a sincere and trustworthy
manner. It has been suggested that political skill generates an increased sense of self-confidence
and personal security because politically skilled individuals experience a greater degree of inter-
personal control, or control over activities that take place in social interactions at work (Paulhus &
Christie, 1981; Perrewé et al., 2005). Furthermore, greater self-confidence and control lead indi-
viduals to interpret workplace stressors in different ways, resulting in such individuals experiencing
significantly less strain/anxiety at work (Kanter, 2004). Consistent with this argument, Perrewé
et al. (2004) found that political skill neutralized the negative effects of role conflict on psychological anxiety, somatic complaints, and physiological strain (i.e., heart rate, systolic and diastolic blood pressure) and that political skill moderated the role overload-strain relationship in a similar manner (Perrewé et al., 2005). The important message is that personal control, such as believing
one has the political skill to successfully navigate his or her work environment, appears to play a
fairly significant role in buffering the negative effects of work stressors. On the other hand, a lack
of personal control may exacerbate the stressor-strain relationship or it may even be perceived as a
stressor itself.
Given the importance of political skill, we recommend organizations consider this an important
attribute in employees and encourage the development of political skill. Today’s training, more
than ever before, needs to be compelling, realistic, practical, relevant, and lasting. In addition,
training should encourage risk-taking and facilitate improved awareness and behavioral flexibility.
Assessment centers that include simulations and drama-based training may be viable options for
political skill development (see Ferris, Davidson, & Perrewé, 2005, for a more in-depth discussion
of developing political skill). Drama-based training is a training model that includes lifelike simu-
lations for participants to practice managing complex human interactions in a safe and controlled
learning environment (St. George, Schwager, & Canavan, 2000), and, as such, it provides a useful
vehicle to shape and develop various social skills.
Further, assigning individuals to work with skilled mentors is another important way to develop
influence skills. Individuals can observe professionals in real work situations as they exercise influ-
ence in meetings with subordinates and peers. Language, facial expressions, body posture, and
gestures will convey messages to observers as to how influence is best exercised. The key is to be
sure that individuals are assigned to talented and understanding mentors who engage in many social influence interactions and that they are given ample opportunities to discuss the various social influence interactions they encounter.

PERSONAL CHARACTERISTICS
Although various aspects of the external environment play a critical role in the experience of stress
and burnout, specific personal characteristics may lead some individuals to be more likely to expe-
rience strain than others in the same environment. The evidence on individual differences, such
as personality differences, suggests that certain individuals are more prone to strain than others.
The Five-Factor Model or “Big Five” model of personality has been extensively examined in the
organizational context over the past decade. Although some disagreement exists over the appro-
priate names for the five factors, most would agree that the Five-Factor Model consists of five
broad dimensions of personality: extraversion, neuroticism, conscientiousness, agreeableness, and
openness to experience. Proponents suggest that these five factors are the basic building blocks of
personality (McCrae & Costa, 1999). Research using this typology indicates that individuals high
in neuroticism are more likely to experience stress and burnout (Zellars & Perrewé, 2001). Further,
individuals high in extraversion, agreeableness, and openness to experience are less likely to experience stress (Zellars,
Perrewé, & Hochwarter, 2000). In a review of the role of personality in organizations, Perrewé and
Spector (2002) discussed how Type A behavior pattern and negative affectivity have been shown
to have positive associations with experienced stress and negative associations with health and well
being. In addition, individuals with a high internal locus of control experience less strain than individuals with a high external locus of control.
Individuals high in conscientiousness are described as efficient, diligent, thorough, hardwork-
ing, persevering, and ambitious (e.g., McCrae & John, 1992). Conscientiousness has been posi-
tively related to organizational citizenship behaviors (Borman & Penner, 2001), job performance
(Barrick & Mount, 1991), workplace safety performance (Wallace & Vodanovich, 2003), intrinsic
and extrinsic career success (Judge, Higgins, Thoresen, & Barrick, 1999), and negatively related to
absence (Judge, Martocchio, & Thoresen, 1997). Taken as a whole, this evidence suggests that conscientious
employees possess several positive work attributes.
However, more research is needed on personality profiles before selecting employees based on
personality is warranted. For example, conscientiousness had a negative relationship with decision
quality after an unanticipated change, which suggests that conscientious people do not adapt quickly
to change (LePine et al., 2000). Thus, we do not recommend selecting (or not selecting) employees
on the basis of one personality dimension (e.g., conscientiousness) alone. For example, research has
found that conscientiousness, when coupled with positive affectivity (i.e., the dispositional tendency
to experience positive emotions across situations and time), resulted in the lowest levels of reported
job tension (Zellars, Perrewé, Hochwarter, & Anderson, 2006). Further, the individual difference
variables of perceived control, optimistic orientation, and self-esteem are highly correlated vari-
ables and, together, form a hardy or “resilient personality” (Major, Richards, Cooper, Cozzarelli,
& Zubek, 1998) that can help workers to adapt to change and cope with work stressors. Although a
comprehensive examination of personality is beyond the scope of this chapter, personality clearly
has the potential to be a powerful selection tool. However, additional research is critical before
confident predictions about workers' ability to handle stressors and to perform adaptively can be made.
In addition to personality characteristics, simple demographic differences have been shown to
have an association with experienced stress. We focus on two demographic characteristics that have
been found to have some relation with occupational stress—specifically, age and gender. Research
has demonstrated that younger employees consistently report a higher level of burnout (Maslach
et al., 2001). Some researchers suggest that older employees experience lower levels of burnout
because they have shifted their own expectations to fit reality on the basis of their personal experi-
ences (Cordes & Dougherty, 1993). These findings suggest that older, more experienced employees
are better able to handle the demands of stressful work environments. Alternatively, the findings
regarding older workers may reflect that they have passed some critical threshold of burnout that
would trigger turnover; that is, they may handle stressful environments by altering their percep-
tions and reducing their expectations of what is possible in terms of career accomplishment or
satisfaction. High expectations and unmet expectations can encourage increased levels of burnout
(Cordes & Dougherty, 1993). Younger employees tend to be more idealistic and thus may react more
intensely when their overly optimistic career expectations are shattered.
On the other hand, younger workers have been shown to respond more positively to some work-
place stressors—specifically, perceptions of organizational politics. Results across three studies
demonstrated that increases in politics perceptions were associated with decreased job performance
for older employees and that younger employees achieved higher performance scores when percep-
tions of politics were high (Treadway et al., 2005).
In regard to gender, stress can result from feelings of discrimination in a male-dominated work
environment (Sullivan & Mainiero, 2007). There is much literature suggesting that work-
ing women are more likely to experience stress on the basis of being female (see Powell & Graves,
2003). Being employed in a male-dominated work environment is a cause for stress, because the
norms, values, and expectations of the male-dominated culture are uniquely different (Maier, 1999).
Further, women in male-dominated environments are more likely to face certain stressors such as
sexual harassment and discrimination (Nelson & Burke, 2000).
Gender does not appear to be a strong predictor of preferred work hours (Jacobs & Gerson,
2004). Family circumstances are more important than gender in predicting preferred work hours
(Barnett, 2006); specifically, women and men with young children want more time away from work
than do other groups. Although women with young children cut back on their time at paid work
more so than men, they do so to a smaller extent than in previous generations. Jacobs and Gerson
(2004) found little support for the popular belief that married women with young children are the
primary group wishing to work less; they state that "About half of married men and women
across a range of family situations express such a desire” (p. 73).
Selecting employees on the basis of personality, gender, or age is not recommended. What is
encouraged is setting realistic expectations for career advancement, allowing flexibility and control in work schedules, and providing training opportunities for all employees.

SUMMARY OF WORK STRESS


In this section, we examined organizational, job, interpersonal, and personal level predictors of
experienced work stress. Although we examined several personal characteristics that have been
associated with higher levels of experienced stress, the selection of “strain-resistant” individuals
into organizations is not necessarily recommended. Just as environmental conditions can affect
employees, employees can adapt to and change their environments to make the work situation less
stressful. Given the complexity and reciprocal effects of individuals and their environments, we do
not have enough empirical findings to be confident that certain individuals are strain-resistant in
all situations. Further, efforts to recruit and select strain-resistant individuals do little to help exist-
ing employees. For most organizations, strain prevention programs, such as the wellness programs
discussed earlier, may be useful. Such programs can be used to teach individuals how to identify
stressors and modify their coping strategies. Specific training strategies include setting goals to provide more realistic expectations of work, improving time management, facilitating social support, using simulation and drama-based training, and developing social networking skills through
mentoring.

OCCUPATIONAL SAFETY
In this section, we review the role selection plays in supporting safer workplaces. Accidents at work
are costly for individuals and organizations and avoiding severe accidents is an essential priority
for all organizations. Therefore, it is not surprising that considerable attention is paid to factors that
might influence whether an individual is involved in an accident. Sanders and McCormick (1987)
identified selection as one of the key strategies used to reduce human error in addition to training
and job design.
Despite the popularity of selection systems to manage safety, there is limited evidence for the
effectiveness of selection for improving organizational safety. Guastello (1993) conducted a meta-
analytic review of accident prevention programs and found that personnel selection programs had
a relatively weak relationship with accident rates. He found that although individual selection prac-
tices were the most common type of accident reduction program used, they had the least
effective outcomes among the ten types of intervention compared. Sanders and McCormick (1987) con-
sidered work design to be a more effective approach to improving safety compared with selection
because it requires less ongoing maintenance and support. Moreover, they argued it is easier to limit
the occurrence of human errors by making them impossible, difficult, or inconsequential through
work design rather than relying on changing and managing individuals.
There is substantial agreement among researchers from various disciplines that safety needs to be
considered from a systemic perspective that includes factors at the individual, micro-organizational,
and macro-organizational level (Hofmann, Jacobs, & Landy, 1995). Vredenburgh (2002) found that
selection practices were effective as part of a broader proactive strategy of recruiting and training
safety conscious individuals. However, focusing solely on individual causes is neither sufficient nor
useful for understanding safety at work. Some researchers suggest that selection practices designed
to improve safety will be less important than training and design interventions, which have a more
systemic and wide ranging impact on safety (Lawton & Parker, 1998).
Figure 25.2 depicts how selection processes can be situated within a larger systemic framework
that includes other systems (e.g., training) while focusing on individual differences and behavior.
The goal of selection is to identify individual characteristics that might influence individual behav-
ior and cognition, which, in turn, might influence consequences such as accidents in an organiza-
tion. The figure also shows that systemic factors might shape any aspect of the implied causal chain
from individual differences to consequences. In a safety context, these systemic factors have been
conceptualized in terms such as “latent failures.” They represent the impact of managerial and
organizational processes on safety outcomes.
With the above considerations in mind, our review will focus on the role of selection within a
broader context. We begin by looking more closely at the meaning of safety as a criterion construct.

SAFETY CRITERIA AND SAFETY SYSTEMS


As with other topics in this chapter, the criterion of interest is complex and conceptually problematic.
Safe working can be viewed as the presence of safe behaviors (e.g., following correct procedures)
or the absence of unsafe ones (e.g., avoiding errors). In addition, the criterion of safe behavior is
often not clearly distinguished from its consequences, such as personal injury. For example, much
of the research investigating individual differences and safety focuses on the prediction of accidents
reported by the organization. However, accidents might be determined by a range of situational
factors beyond the proximal behavior of the individual. We emphasize the distinction between acci-
dents and more proximal behaviors in Figure 25.2. These proximal behaviors include individual
actions such as slips that might lead to accident and injury as well as positive behaviors such as
using safety equipment that might reduce accidents and injury. Next, we review the literature pre-
dicting accidents at work and then consider the prediction of individual behavior more proximally
associated with accidents.

PREDICTING WORKPLACE ACCIDENTS


It is estimated that at least 80% of accidents are the result of some kind of human error (Hale
& Glendon, 1987). Accidents have been the main focus of much safety research, including that
related to selection. However, the notion of accidents is both a broad and a limiting criterion for selection. Accidents are a broad criterion because they encompass events ranging from minor falls to events that result in death. Accidents are a limiting criterion because they do not include major events, such as near misses in which a serious injury is narrowly averted. Accidents are also constrained as a
criterion because of problems in recording and reporting these events as discussed later in this
section.
Despite these concerns with accidents and injury as criteria for evaluating selection practices, they
remain the most commonly used measure of safety outcomes. Therefore, we first review evidence
for selection methods and measures that reduce accident outcomes.

FIGURE 25.2 Proximal safety behaviors and consequences. [Figure depicts selection processes and system factors (e.g., training, risk) in relation to individual motivational and cognitive predictors (e.g., personality, ability), proximal behaviors and cognitions (e.g., rule following, lapses), and consequences (e.g., accidents).]

Many reviews of safety also
include road accidents and injuries as part of their measurement. However, we exclude studies of
road safety unless they specifically incorporate the work context.
There is a long history of research seeking to identify factors that predict whether individuals
will experience a work accident. The most often-studied attribute—and perhaps least successful—
has been the search for an “accident-prone” individual. Despite the many studies and the popular-
ity of this idea, there is little consistency in the findings from this kind of research. Overall, there
is little evidence that an accident-prone personality can be identified or that systematic individual
differences can distinguish employees who have accidents from those who do not (Lawton &
Parker, 1998).
Glendon and McKenna (1995) argued that it is impossible to define an overall stable profile that
identifies an accident-prone individual. Although the concept of accident proneness does not match
the evidence, researchers continue to identify personality characteristics that might be linked to
accidents (Lawton & Parker, 1998).
A wide range of personality dimensions have been investigated as potential antecedents of acci-
dents and injury. Dimensions of the Big Five have received considerable attention, and Clarke and
Robertson (2005) were able to conduct a meta-analysis of studies involving workplace and motor
vehicle accidents. Results from the workplace studies showed that low conscientiousness (nine stud-
ies) and low agreeableness (seven studies) were associated with more individual accidents.
Trait affectivity has also been considered as a broad personality dimension that might predict
accidents and injury. Iverson and Erwin (1997) found trait positive affectivity and trait negative
affectivity were related to accidents 1 year later after controlling for a range of job conditions.
Although they did not control for stability in these characteristics, the design was stronger than
many in this area. They suggested that extraversion factors such as overconfidence and intolerance
were associated with risk-taking, and neuroticism factors such as anxiety and indecision were asso-
ciated with task distractibility. Frone (1998) found negative affectivity but not rebelliousness and
impulsivity to be related to accidents. However, this relationship disappeared after taking account
of physical hazards and workload.
Outside of the Big Five and trait affectivity, Guastello (1993) found that predictors associated with maladjustment showed a positive relationship with accident reduction. Two types of maladjust-
ment were considered. Personal maladjustment was based on measures such as distractibility and
tension. Social maladjustment was based on measures such as safety locus of control. Studies of
impulsivity, alcohol use, and drug use showed no significant relationships with accidents. Liao,
Arvey, Butler, and Nutting (2001) found psychopathic deviant and conversion hysteria scales of the
Minnesota Multiphasic Personality Inventory (MMPI) were associated with frequency of injuries in
a prospective study of firefighters. They also found social introversion to be associated with injury
rates (see Liao et al., 2001, p. 231, for a review of MMPI types). Conversion hysteria was based on
patients who exhibited some sensory or motor disorder. Psychopathic deviants were more likely to
act on impulse or ignore rules. Finally, locus of control has been identified in some studies as being
related to accidents; however, research in this area is inconsistent and inconclusive (see Lawton &
Parker, 1998).
A range of demographic factors and job types have been linked to accidents. Adolescents repre-
sent the age group with the highest risk for nonfatal accident and injury (Frone, 1998). A concurrent
study of adolescents found work injuries were associated with gender, negative affectivity, job ten-
ure, exposure to physical hazards, excessive workloads, job boredom, poor physical health, and
on-the-job substance abuse (Frone, 1998). Female firefighters experienced more injuries (Liao et al.,
2001). Studies of general mental ability have been contradictory (see Hansen, 1989, for a review).
Physical abilities such as strength and flexibility can be valid predictors of performance in hazard-
ous work environments and so might be used to predict safety outcomes (Hogan & Lesser, 1996).

PREDICTING SAFETY BEHAVIORS


Beyond accidents as a criterion, it is important to consider how selection procedures might predict
the specific safety behaviors that precede accidents and near misses, or that increase the potential
for accidents to occur. To review this area, we build on a distinction between safety compliance and
safety participation that has been developed in the general area of occupational health and safety
(Griffin & Neal, 2000). Safety compliance refers to behaviors such as using correct safety proce-
dures and equipment and complying with safety regulation. These behaviors contribute to safety
outcomes associated with an employee’s core task activities. Safety participation refers to behaviors
such as participating in safety meetings, communicating safety issues to others, and suggesting
ideas for improving safety in the organization. These behaviors support the broader organizational context of
safety rather than the safety of the individual or the specific task.
Cognitive and motivational antecedents influence safety compliance and safety participation.
Cognitive processes include the knowledge to carry out the tasks, understanding the consequences
of actions, and attending to important events. Motivational processes describe the willingness to
engage in a specific behavior. It is possible that cognitive processes are more important for safety
compliance whereas motivational processes are more important for safety participation (Motowidlo,
Borman, & Schmit, 1997). However, there is little empirical evidence for this proposition at this
stage, and cognitive and motivational predictors should be considered for both safety compliance and
safety participation. We review some of the cognitive and motivational predictors that might be use-
ful for selection of safety compliance and safety participation next.

SAFETY COMPLIANCE
Behaviors associated with safety compliance include vigilance, perseverance, and accurately
following procedures. Tests for vigilance can provide information about the extent to which
individuals are able to maintain attention. For example, in critical medical contexts, the ability
to maintain vigilant scanning might be an important element of safety behavior (Subramanian,
Kumar, & Yauger, 1994).
Persevering with safety compliance requires consistent production and maintenance of effort
over time. Conscientiousness is a predictor of effort that has been validated for general job per-
formance and linked to accident outcomes (Barrick & Mount, 1991; Clarke & Robertson, 2005).
Conscientiousness should play an important role in sustaining safety compliance. From a different
perspective, distractibility or neuroticism can reduce an individual’s ability to maintain consistent
effort over time (Hansen, 1989).
Avoiding errors and mistakes is important for safety compliance. Errors of execution and
action (e.g., slips, lapses, trips, and fumbles) and procedural mistakes are more likely to arise
from attention failures. Several cognitive processes have been linked to attention failures and the
situational awareness required for scanning and responding to the work environment (Carretta,
Perry, & Ree, 1996). Cognitive failure (Simpson, Wadsworth, Moss, & Smith, 2005) and safety
consciousness (Westaby & Lee, 2003) describe the way individuals pay attention to the safety
requirements of the work environment, and selection activities can assess the degree to which
individuals are able to demonstrate these capacities. On the other hand, knowledge-based mistakes
occur when an individual lacks the appropriate information to perform correctly. For these types
of mistakes, safety knowledge is likely to be a more important predictor (Hofmann et al., 1995).
Finally, it is important to consider deliberate noncompliance with safety requirements. Integrity
tests have shown validity for predicting organizationally counterproductive behaviors such as rule-
breaking and dishonesty (Sackett & Wanek, 1996).

SAFETY PARTICIPATION
The behaviors that comprise safety participation have received less attention than safety compli-
ance behaviors from the perspective of personnel selection. By definition, these behaviors often
go beyond individual core task requirements and may be discretionary in some jobs. Standard job
analysis practices that focus on individual task performance are therefore less likely to articulate the
behaviors that are important for safety participation.
Selection for these behaviors requires consideration of the broader context and its constraints.
For example, the ability to communicate safety concerns with others and encourage safety com-
pliance of team members might be critical where teams work in high-risk environments. Validity
evidence from personality testing suggests that extraversion can predict performance in jobs
requiring social interaction (Barrick & Mount, 1991). To date, research has focused on contextual
factors that motivate these behaviors such as leadership (Barling, Loughlin, & Kelloway, 2002b)
and job conditions (Probst & Brubaker, 2001). Organizations that can articulate the nature of
safety participation in their specific context will be better able to identify potential individual
predictors of these activities.

SUMMARY OF SAFETY
In summary, our review suggests that selection can play a part in a safer work environment but its role
is complex. Many of the attributes required are trainable or are strongly influenced by the organi-
zational environment. Methodological limitations, such as the use of concurrent designs, reduce the
ability of many safety studies to inform selection systems. However, an equally important concern
is the degree to which theory is used to explain the behaviors that constitute the criterion domain
of work safety. Further theoretical development about the way individual behaviors contribute to a
safety system will enhance the role that can be played by selection procedures.

CONCLUSIONS
In the three sections above, we have examined correlates of employee health, work stress, and
safety. On the basis of the literature, there appear to be consistent findings that workplace factors
can enhance the health and safety of employees. Also, there are some relatively stable individual
characteristics that have been found to be related to stress, resilience, safety compliance, and safety
participation. Unfortunately, the empirical literature has not generally considered workplace factors
and individual characteristics together to examine potential interactions between person characteristics and
environmental factors or the relative contribution of each in predicting health, stress, and safety.
Many of the theoretical perspectives relevant to occupational safety and health, including stress, have not specifically taken an interactional perspective and tend to focus on either situational factors or per-
sonal characteristics.
Although there is support for the effects of some relatively stable individual characteristics
that might be useful for selection purposes in creating and maintaining a healthy workforce and
a healthy work environment, the current empirical evidence is not strong and there are potential
legal ramifications in using some of these characteristics in selection systems. It is possible that
given certain contexts, selection based on individual characteristics may have utility. However, the
literature as a whole currently appears to favor workplace interventions as more effective than selection.

REFERENCES
Anderson, D. R., Whitmer, R. W., Goetzel, R. Z., Ozminkowski, R. J., Dunn, R. L., Wasserman, J., & Serxner,
S. (2000). The relationship between modifiable health risks and group-level health care expenditures.
American Journal of Health Promotion, 15, 45–52.
Aryee, S., Chen, Z., Sun, L., & Debrah, Y. A. (2007). Antecedents and outcomes of abusive supervision: Test
of a trickle-down model. Journal of Applied Psychology, 92, 191–201.
Barling, J., Inness, M., & Gallagher, D. G., (2002a). Alternative work arrangements and employee well being.
In P. L. Perrewé & D. C. Ganster (Eds.), Historical and current perspectives on stress and health:
Research in occupational stress and well being (Vol. 2, pp. 183–216). Oxford, England: JAI Press/
Elsevier Science.
Barling, J., Loughlin, C., & Kelloway, E. K. (2002b). Development and test of a model linking safety-specific
transformational leadership and occupational safety. Journal of Applied Psychology, 87, 488–496.
Barnett, R. C. (2006). Relationship of the number and distribution of work hours to health and quality-of-life
outcomes. In P. L. Perrewé & D. C. Ganster (Eds.), Employee health, coping and methodologies: Research
in occupational stress and well being (Vol. 5, pp. 99–138). Oxford, England: JAI Press/Elsevier Science.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Beehr, T. A., & Glazer, S. (2001). A cultural perspective of social support in relation to occupational stress. In
P. L. Perrewé & D. C. Ganster (Eds.), Exploring theoretical mechanisms and perspectives: Research in
occupational stress and well being (Vol. 1, pp. 97–142). Oxford, England: JAI Press/Elsevier Science.
Boggild, H., & Knutsson, A. (1999). Shift work, risk factors, and cardiovascular disease. Scandinavian Journal
of Work, Environment & Health, 25, 85–99.
Borman, W. C., & Penner, L. A. (2001). Citizenship performance: Its nature, antecedents, and motives. In B.
W. Roberts & R. Hogan (Eds.), Personality in the workplace (pp. 45–61). Washington, D.C.: American
Psychological Association.
Burton, J., & Hoobler, J. (2006). Subordinate self-esteem and abusive supervision. Journal of Managerial
Issues, 18, 340–355.
Burton, W. N., Chen, C., Conti, D., Schultz, A., Pransky, G., & Edington, D. (2005). The association of health
risks with on-the-job productivity. Journal of Occupational and Environmental Medicine, 47, 769–777.
Bureau of Labor Statistics. (2009). Health insurance costs for civilians. Retrieved from http://data.bls.gov/
cgi-bin/surveymost
Carretta, T. R., Perry, D. C., & Ree, M. J. (1996). Prediction of situational awareness in F-15 pilots. The
International Journal of Aviation Psychology, 6, 21–41.
Clarke, S., & Robertson, I. T. (2005). A meta-analytic review of the Big Five personality factors and accident
involvement in occupational and non-occupational settings. Journal of Occupational and Organizational
Psychology, 78, 355–376.
Collins, J. J., Baase, C. M., Sharda, C. E., Ozminkowski, R. J., Nicholson, S., Billotti, G. M., Turpin, R. S.,
Olson, M., & Berger, J. L. (2005). The assessment of chronic health conditions on work performance,
absence, and total economic impact for employers. Journal of Occupational and Environmental Medicine,
47, 547–557.
Cordes, C. L., & Dougherty, T. W. (1993). A review and integration of research on job burnout. Academy of
Management Review, 18, 621–656.
Fenwick, R., & Tausig, M. (2001). Scheduling stress: Family and health outcomes of shift work and schedule
control. The American Behavioral Scientist, 44, 1179–1198.
Ferris, G. R., Davidson, S. L., & Perrewé, P. L. (2005). Political skill at work. Mountain View, CA: Davies-Black/CPP.
Ferris, G. R., Treadway, D. C., Perrewé, P. L., Brouer, R. L., Douglas, C., & Lux, S. (2007). Political skill in
organizations. Journal of Management, 33, 290–320.
Frone, M. R. (1998). Predictors of work injuries among employed adolescents. Journal of Applied Psychology,
83, 565–576.
Goetzel, R. Z., Anderson, D. R., Whitmer, R. W., Ozminkowski, R. J., Dunn, R. L., & Wasserman, J. (1998).
The relationship between modifiable health risks and health care expenditures. An analysis of the multi-
employer HERO health risk and cost database. Journal of Occupational and Environmental Medicine,
40, 843–854.
Goetzel, R. Z., Hawkins, K., Ozminkowski, R. J., & Wang, S. (2003). The health and productivity cost burden
of the “top 10” physical and mental health conditions affecting six large U.S. employers in 1999. Journal
of Occupational And Environmental Medicine, 45, 5–14.
Goetzel, R. Z., Long, S. R., Ozminkowski, R. J., Hawkins, K., Wang, S., & Lynch, W. (2004). Health, absence,
disability, and presenteeism cost estimates of certain physical and mental health conditions affecting U.S.
employers. Journal of Occupational and Environmental Medicine, 46, 398–412.
Grandey, A. A., Kern, J., & Frone, M. (2007). Verbal abuse from outsiders versus insiders: Comparing fre-
quency, impact on emotional exhaustion, and the role of emotional labor. Journal of Occupational Health
Psychology, 12, 63–79.
Greenberger, D. B., & Strasser, S. (1986). Development and application of a model of personal control in orga-
nizations. The Academy of Management Review, 11, 164–177.
Griffin, M. A., & Neal, A. (2000). Perceptions of safety at work: A framework for linking safety climate to safety
performance, knowledge, and motivation. Journal of Occupational Health Psychology, 5(3), 347–358.
Guastello, S. J. (1993). Do we really know how well our occupational accident prevention programs work? Safety Science, 16, 445–463.
Hale, A. R., & Glendon, A. I. (1987). Individual behaviour in the control of danger. New York, NY: Elsevier.
Hansen, C. P. (1989). A causal model of the relationship among accidents, biodata, personality, and cognitive
factors. Journal of Applied Psychology, 74, 81–90.
Hofmann, D. A., Jacobs, R., & Landy, F. (1995). High reliability process industries: Individual, micro, and
macro organizational influences on safety performance. Journal of Safety Research, 26(3), 131–149.
Hogan, J., & Lesser, M. (1996). Selection of personnel for hazardous performance. Stress and Human
Performance, 195–222.
Ilgen, D.R., & Pulakos, E.D. (1999). Employee performance in today’s organizations. In D. R. Ilgen & E. D.
Pulakos (Eds.), The changing nature of work performance: Implications for staffing, motivation, and
development (pp. 1–20). San Francisco, CA: Jossey-Bass.
Iverson, R., & Erwin, P. (1997). Predicting occupational injury: The role of affectivity. Journal of Occupational
and Organizational Psychology, 70, 113–128.
Jacobs, J. A., & Gerson, K. (2004). The time divide: Work, family and gender inequality. Cambridge, MA:
Harvard University Press.
Jamal, M., & Baba, V. V. (1997). Shift work, burnout and well-being: A study of Canadian nurses. International
Journal of Stress Management, 4, 197–204.
Judge, T. A., Higgins, C. A., Thoresen, C. J., & Barrick, M. R. (1999). The Big Five personality traits, general
mental ability, and career success across the life span. Personnel Psychology, 52, 621–652.
Judge, T. A., Martocchio, J. J., & Thoresen, C. J. (1997). Five-Factor model of personality and employee
absence. Journal of Applied Psychology, 82, 745–755.
Kahn, R. L., Wolfe, D. M., Quinn, R. P., Snoek, J. D., & Rosenthal, R. A. (1964). Organizational stress: Studies
in role conflict and ambiguity. New York, NY: Wiley.
Kaiser Daily Health Report. (2008, January). U.S. health care spending reaches $2.1T in 2006, increasing
6.7%. Retrieved from http://www.kaiserhealthnews.org/Daily-Reports/2008/January/08/dr00049709.
aspx?referrer=search
Kaiser Family Foundation. (2007). Health care spending in the United States and OECD countries. Retrieved
from http://www.kff.org/insurance/snapshot/chcm010307oth.cfm
Kaiser Family Foundation. (2008). Employer health insurance costs and worker compensation. Retrieved from
http://www.kff.org/insurance/snapshot/chcm030808oth.cfm
Kaiser Network (2006). Companies shift more medical costs to workers. Retrieved from http://www.kaisernetwork.
org./daily_reports/rep_index.cfm
Kanter, R. M. (2004). Confidence. New York, NY: Crown Business.
Karasek, R. A. (1979). Job demands, job decision latitude, and mental strain: Implications for job redesign.
Administrative Science Quarterly, 24, 285–308.
Kendall, J., & Ventura, P. L. (2005). A stumbling block on the road to wellness: The ADA disability-related
inquiry and medical examination rules and employer wellness incentive programs. Benefits Law Journal,
18, 57–76.
Krausz, M., Sagie, A., & Biderman, Y. (2000). Actual and preferred work schedules and scheduling control as
determinants of job-related attitudes. Journal of Vocational Behavior, 56, 1–11.
Lawton, R., & Parker, D. (1998). Individual differences in accident liability: A review and integrative approach.
Human Factors: The Journal of the Human Factors and Ergonomics Society, 40, 655–671.
LePine, J. A., Colquitt, J. A., & Erez, A. (2000). Adaptability to changing task contexts: Effects of general cog-
nitive ability, conscientiousness, and openness to experience. Personnel Psychology, 53, 563–593.
Liao, H., Arvey, R. D., Butler, R. J., & Nutting, S. M. (2001). Correlates of work injury frequency and duration
among firefighters. Journal of Occupational Health Psychology, 6(3), 229–242.
Maier, M. (1999). On the gendered substructure of organization: Dimensions and dilemmas of corporate
masculinity. In G. N. Powell, (Ed.), Handbook of gender and work (pp. 69–93). Thousand Oaks, CA:
Sage.
Major, B., Richards, C., Cooper, M. L., Cozzarelli, C., & Zubek, J. (1998). Personal resilience, cognitive
appraisals, and coping: An integrative model of adjustment to abortion. Journal of Personality and Social
Psychology, 74, 735–752.
Maslach, C., Schaufeli, W. B., & Leiter, M. P. (2001). Job burnout. Annual Review of Psychology, 52,
397–422.
McCrae, R. R., & Costa, P. T. (1999). A five-factor theory of personality. In L. Pervin & O. P. John (Eds.),
Handbook of personality (2nd ed., pp. 139–153). New York, NY: Guilford Press.
McCrae, R. R., & John, O. P. (1992). An introduction to the Five-Factor model and its applications. Journal of
Personality, 60, 175–215.
Mills, P. R., Kessler, R. C., Cooper, J., & Sullivan, S. (2007). Impact of a health promotion program on employee
health risks and work productivity. American Journal of Health Promotion, 22, 45–53.
Mintzberg, H. (1983). Power in and around organizations. Englewood Cliffs, NJ: Prentice-Hall.
Motowidlo, S. J., Borman, W. C., & Schmit, M. J. (1997). A theory of individual differences in task and
contextual performance. Human Performance, 10(2), 71–83.
Mykletun, R. J., & Mykletun, A. (1999). Comprehensive schoolteachers at risk of early exit from work.
Experimental Aging Research, 25, 359–365.
Nelson, D. L., & Burke, R. J. (2000). Women executives: Health, stress and success. Academy of Management
Executive, 14, 107–121.
Nicoll, A., & Murray, V. (2002). Health protection—A strategy and a national agency. Public Health, 116,
129–137.
Parkes, K. R. (2003). Shiftwork and environment as interactive predictors of work perceptions. Journal of
Occupational Health Psychology, 8, 266–281.
Parks, K. M., & Steelman, L. A. (2008). Organizational wellness programs: A meta-analysis. Journal of
Occupational Health Psychology, 13, 58–68.
Paulhus, D., & Christie, R. (1981). Spheres of control: An interactionist approach to assessment of perceived
control. In H. M. Lefcourt (Ed.), Research with the locus of control construct, 1: Assessment methods (pp.
161–188). New York, NY: Academic Press.
Perrewé, P. L., & Spector, P. E. (2002). Personality research in the organizational sciences. In G. R. Ferris &
J. J. Martocchio (Eds.), Research in personnel and human resources management (Vol. 21, pp. 1–85),
Oxford, England: JAI Press/Elsevier Science.
Perrewé, P. L., Zellars, K. L., Ferris, G. R., Rossi, A. M., Kacmar, C. J., & Ralston, D. A. (2004). Neutralizing
job stressors: Political skill as an antidote to the dysfunctional consequences of role conflict stressors.
Academy of Management Journal, 47, 141–152.
Perrewé, P. L., Zellars, K. L., Rossi, A. M., Ferris, G. R., Kacmar, C. J., Liu, Y., Zinko, R., & Hochwarter, W. A.
(2005). Political skill: An antidote in the role overload—Strain relationship. Journal of Occupational
Health Psychology, 10, 239–250.
Powell, G. N., & Graves, L. M. (2003). Women and men in management. Thousand Oaks, CA: Sage.
Probst, T. M., & Brubaker, T. L. (2001). The effects of job insecurity on employee safety outcomes: Cross-
sectional and longitudinal explorations. Journal of Occupational Health Psychology, 6(2), 139–159.
Pulakos, E. D., Arad, S., Donovan, M. A., & Plamondon, K.E. (2000). Adaptability in the workplace:
Development of a taxonomy of adaptive performance. Journal of Applied Psychology, 85, 612–624.
Reardon, K. K. (2000). The secret handshake: Mastering the politics of the business inner circle. New York,
NY: Doubleday.
Rothstein, M. A. (1983). Occupational safety and health law. St. Paul, MN: West Group.
Ryan, D., & Watson, R. (2004). A healthier future. Occupational Health, 56(7), 20–21.
Sackett, P. R., & Wanek, J. E. (1996). New developments in the use of measures of honesty, integrity, conscien-
tiousness, trustworthiness, and reliability for personnel selection. Personnel Psychology, 49, 787–829.
Sanders, M. S., & McCormick, E. J. (1987). Human factors in engineering and design (6th ed.). New York, NY: McGraw-Hill.
Schat, A. C. H., Desmarais, S., & Kelloway, E. K. (2006). Exposure to workplace aggression from multiple
sources: Validation of a measure and test of a model. Unpublished manuscript, McMaster University,
Hamilton, Canada.
Shurtz, R. D. (2005). Reining health care costs with wellness programs: Frequently overlooked legal issues.
Benefits Law Journal, 18, 31–60.
Simon, T. M., Bruno, F., Grossman, N., & Stamm, C. (2006). Designing compliant wellness programs: HIPAA,
ADA, and state insurance laws. Benefits Law Journal, 19, 46–59.
Simon, T. M., Traw, K., McGeoch, B., & Bruno, F. (2007). How the final HIPAA nondiscrimination regulations
affect wellness programs. Benefits Law Journal, 20, 40–44.
Simpson, S. A., Wadsworth, E. J. K., Moss, S. C., & Smith, A. P. (2005). Minor injuries, cognitive failures and
accidents at work: Incidence and associated features. Occupational Medicine, 55, 99–108.
Smerd, J. (2007). Smoker? Can’t work here more firms say. Workforce Management, October 22. Retrieved
on June 12, 2008, from http://www.lexisnexis.com/us/lnacademic/frame.do?tokenKey=rsh-20.312685
.587091969
St. George, J., Schwager, S., & Canavan, F. (2000). A guide to drama-based training. National Productivity
Review, Autumn, 15–19.
Subramanian, R., Kumar, K., & Yauger, C. (1994). The scanning of task environments in hospitals: An empirical
study. Journal of Applied Business Research, 10, 104–115.
Sullivan, S. E., & Mainiero, L. (2007). Women’s kaleidoscope careers: A new framework for examining women’s
stress across the lifespan. In P. L. Perrewé & D. C. Ganster (Eds.), Exploring the work and non-work inter-
face: Research in occupational stress and well being (Vol. 6, pp. 205–238). Oxford, England: JAI Press/
Elsevier Science.
Tepper, B. (2007). Abusive supervision in formal organizations: Review, synthesis and research agenda. Journal
of Management, 33, 261–289.
Tepper, B. J., Duffy, M. K., & Shaw, J. D. (2001). Personality moderators of the relationship between abusive
supervision and subordinates’ resistance. Journal of Applied Psychology, 86, 974–983.
Tetrick, L. E. (2002). Understanding individual health, organizational health, and the linkage between the
two from both a positive health and an ill health perspective. In P. L. Perrewé & D. C. Ganster (Eds.),
Historical and current perspectives on stress and health: Research in occupational stress and well being
(Vol. 2, pp. 117–141). Oxford, England: JAI Press/Elsevier Science.
Theorell, T. (2004). Democracy at work and its relationship to health. In P. L. Perrewé & D. C. Ganster (Eds.),
Research in occupational stress and well being (Vol. 3, pp. 323–357). Oxford, England: JAI Press/
Elsevier Science
Treadway, D. C., Ferris, G. R., Hochwarter, W. A., Perrewé, P. L., Witt, L. A., & Goodman, J. M. (2005). The
role of age in the perceptions of politics—job performance relationship: A three-study constructive repli-
cation. Journal of Applied Psychology, 90, 872–881.
Vandenberg, R. J., Park, K., DeJoy, D. M., Wilson, M. G., & Griffin-Blake, C. S. (2002). The healthy work
organization model: Expanding the view of individual health and well being in the workplace. In P. L.
Perrewé & D. C. Ganster (Eds.), Historical and current perspectives on stress and health: Research in
occupational stress and well being (Vol. 2, pp. 57–115). Oxford, England: JAI Press/Elsevier Science.
Vredenburgh, A. G. (2002). Organizational safety: Which management practices are most effective in reducing
employee injury rates. Journal of Safety Research, 33(2), 259–276.
Wallace, J. C., & Vodanovich, S. J. (2003). Workplace safety performance: Conscientiousness, cognitive fail-
ure, and their interaction. Journal of Occupational Health Psychology, 8, 316–327.
World Health Organization. (2008). Workplace health promotion. Retrieved from http://www.who.int/
occupational_health/topics/workplace/en
Xie, J. L., & Schaubroeck, J. (2001). Bridging approaches and findings across diverse disciplines to improve
job stress research. In P. L. Perrewé & D. C. Ganster (Eds.), Exploring theoretical mechanisms and per-
spectives: Research in occupational stress and well being (Vol. 1, pp. 1–53). Oxford, England: JAI Press/
Elsevier Science.
Zellars, K. L., & Perrewé, P. L. (2001). Affective personality and the content of emotional social support:
Coping in organizations. Journal of Applied Psychology, 86, 459–467.
Zellars, K. L., Perrewé, P. L., & Hochwarter, W. A. (2000). Burnout in healthcare: The role of the five factors
of personality. Journal of Applied Social Psychology, 30, 1570–1598.
Zellars, K. L., Perrewé, P. L., Hochwarter, W. A., & Anderson, K. S. (2006). The interactive effects of positive
affect and conscientiousness on strain. Journal of Occupational Health Psychology, 11, 281–289.
26 Criterion Validity and
Criterion Deficiency
What We Measure Well
and What We Ignore
Jeanette N. Cleveland and Adrienne Colella

Work psychologists have had a long-standing interest in the criterion problem and have been par-
ticularly concerned with determining how to measure job performance and success at work. Many
notable industrial and organizational (I-O) psychologists have urged researchers to develop theo-
ries of employee performance (e.g., Campbell, 1990). Within the last 2 decades, we have made
progress in the articulation and measurement of required tasks and behaviors at work (Campbell,
1990; Chapter 21, this volume) and the identification of contingent, discretionary behaviors that are
important for person-team success (e.g., Borman & Motowidlo, 1993; Chapter 22, this volume).
However, the approach to the criterion problem followed by most researchers continues to be gener-
ally narrow and reinforces the status quo in terms of what is defined as success or successful work
behaviors in organizations.
The definitions of what represents success in organizations at the individual level (e.g., job perfor-
mance) and the organizational level (e.g., organizational effectiveness) have not changed fundamen-
tally over the years. There have been advances in understanding particular aspects of performance
and success (e.g., contextual performance), but there have not been substantial changes in the way
we think about the criteria that are used to evaluate personnel selection, training, or other interven-
tions in organizations. Our thinking has not changed, but the context in which work occurs certainly
has (Cascio & Aguinis, 2008).
The boundaries between the spheres of work and nonwork, as well as between local, national, and global or international domains, have steadily eroded, and these domains increasingly over-
lap. The world of work is becoming increasingly complex and intrusive (e.g., it is common for
employees to take their work with them when they go home or on vacations), and the definition
of success in the workplace is constantly evolving. This implies the need for a broader view of
the criterion domain. Several previous chapters in this volume (e.g., Chapters 21 and 22) pro-
vide excellent reviews and discussions about specific aspects or facets of workplace behavior and
performance domains. Each of these performance aspects is important to the effectiveness and
health of employees and organizations within the 21st-century workplace. In the current chapter,
we argue that both traditional and emerging facets of the performance domain must be incorpo-
rated as part of the foundation for an integrated and long-term focused human resources (HR) sys-
tem, and, importantly, that issues concerning the larger context, especially the interface between
work-nonwork issues, must be incorporated into our criterion models to more fully capture the
increasing complexities of the workplace and the diversity of the workforce. Finally, as Ployhart
and Weekley so eloquently articulate in Chapter 9, this volume, using a multilevel lens, we may be
better able to link individual-level HR systems to more macro indices of organizational productiv-
ity and sustainability.


WORK AND WORKFORCE IN THE 21ST CENTURY: OUTMODED ASSUMPTIONS AND BASES FOR CHANGE
The design of work and the definition of success in the workplace continue to be built around the
assumption that most or all employees have an adult working at home in the role of “caregiver”
(Smolensky & Gootman, 2003). That is, our models for defining successful job performance, a suc-
cessful career, and a successful organization (Murphy, 1998) begin with the outmoded assumptions
that each worker can and should devote a great deal of time, attention, and loyalty to the organiza-
tion; that there will be someone at home to take care of the other needs; that the demands of the work
and nonwork sides of life are distinct and nonoverlapping; and that the costs associated with work
interfering with nonwork can be ignored (or at least are not the concern of the organization) when-
ever the organization places demands on its members. The way psychologists and managers have
defined and measured success, in general, and work performance, in particular, makes a good deal
of sense if you start with a homogeneous (e.g., male, White, traditional family structure) and local
(e.g., American workers) workforce, but it is not necessarily sensible in the current environment.
Given the changing nature of the workforce both within the United States and globally, it is now
time to think more broadly about the conceptualization of our criteria within I-O psychology. Job
performance is not the same as success. We need to clearly distinguish between job performance
and success, a broader construct that might be assessed and defined across multiple levels of analy-
sis and might be defined differently depending on whether the focus is on the short or the long term.
Further, both constructs need to be considered in relation to their costs. That is, the headlong pursuit
of performance in an organization might have several costs to the organization (e.g., short-term
focus) and to the community (e.g., work-family conflict); different definitions of success in organiza-
tions might push employees to engage in a range of behaviors that have personal and societal costs
(e.g., workaholism).
Why should we examine how success is measured in organizations? We argue that (a) success
is a much broader and encompassing construct with content that spills over from work to nonwork
domains; (b) success and performance must be understood within a multilevel context, recognizing
that for some organizational problems and decisions, we can focus on understanding performance
at a given level but that what occurs at one level may not reverberate at other levels in a similar way;
and (c) we need to reexamine the costs associated with a short-term focus in the way we use HR
and develop ways to incorporate a longer-term focus. I-O psychology has made significant progress
in specific facets of criterion theory and measurement as shown by in-depth review chapters in
this volume (see Chapters 21, 22, 23, and 25). In the following section, the concepts of ultimate or
conceptual criterion and actual criteria (and the subsequent criterion relevance, contamination, and
deficiency) are used to describe how I-O psychologists have contributed to the understanding of
one of the most important psychological outcomes—performance success. Briefly, we review the
development of task performance theory, context performance, adaptive performance, and counter-
productive work behaviors. Using the notion of criterion deficiency, we identify where our current
conceptualizations of success are likely to be narrow, outmoded, and deficient.

CRITERION PROBLEM IN I-O PSYCHOLOGY


The legacy of 60 years of scientific research on criteria between 1917 and 1976 was the identification
of the “criterion problem” (e.g., Austin & Villanova, 1992; Flanagan, 1954). The term denotes the
difficulty involved in the conceptualization and measurement of performance constructs, particu-
larly when performance measures are multidimensional and used for different purposes.

DEFINITION AND ASSUMPTIONS OF CRITERION PROBLEM


Bingham (1926) was perhaps the first to use the word criterion in one of the two ways that it is fre-
quently used today, as “something which may be used as a measuring stick for gauging a worker’s
relative success or failure” (p. 1). Success was recognized as nearly always multidimensional in
nature, suggesting that its sources of variability are complex. The choice of dimensions to represent
or define performance depends on how broadly or narrowly one interprets the meaning of success
(i.e., conceptual criterion; Nagle, 1953; Toops, 1944).
Traditionally, discussions of the criterion problem have started with the assumption that the
conceptual or ultimate criterion of success is reasonably well defined and that the major problem
involves the shift from conceptualizing or defining success to its actual measurement. When this
shift is made, a gap is likely to develop between the “ideal” conceptualization of success and its
practical or actual measurement. The relationship between conceptual and practical measurement
of success is depicted using two general notions: conceptual criterion and actual criteria. The term
“conceptual,” “theoretical,” or “ultimate criterion” (Thorndike, 1949) describes the full domain of
everything that ultimately defines success (Cascio, 2000). Because the ultimate criterion is strictly
conceptual, it cannot be measured or directly observed. It embodies the notion of “true,” “total,”
“long-term,” or “ultimate worth” to the employing organization (Cascio, 2000).
Implicit in this model is the questionable assumption that we all know and agree about the
conceptual definition of success (i.e., the idea that the ultimate criterion is obvious and noncontro-
versial). Yet, key performance stakeholders (e.g., employees, organizations, families, society, and
the environment) do not necessarily know or agree on the conceptual definition and content of the
ultimate criterion. In short, an ultimate criterion is important because the relevance or linkage of
any operational or measurable criterion is better understood if the conceptual stage is clearly and
thoroughly documented (Astin, 1964). I-O psychology can and does measure some facets of success
very well, but these may reflect a small, narrow proportion of the ultimate criterion.
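
To make the gap between conceptual and actual criteria concrete, the short simulation below is a minimal sketch of our own, not an analysis from this chapter; the facet names, the correlations, and the equal weighting of facets are purely illustrative assumptions. It shows how a predictor can look quite valid against a narrow actual criterion while relating more weakly to a broader composite conception of success, which is one way a deficient criterion can flatter a selection measure.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

predictor = rng.normal(size=n)  # e.g., a standardized selection test score

def facet(r):
    # Simulate a success facet with population correlation r with the predictor
    return r * predictor + np.sqrt(1 - r**2) * rng.normal(size=n)

task = facet(0.50)               # facet the test predicts well (assumed value)
citizenship = facet(0.20)        # facet the test predicts modestly (assumed value)
family_well_being = facet(0.00)  # facet the test does not predict (assumed value)

# "Ultimate" (conceptual) criterion: an equally weighted composite of all facets
ultimate = (task + citizenship + family_well_being) / 3
# "Actual" criterion: a deficient measure that taps only the task facet
actual = task

print(round(np.corrcoef(predictor, actual)[0, 1], 2))    # about .50
print(round(np.corrcoef(predictor, ultimate)[0, 1], 2))  # about .39

In this hypothetical setup, the same test appears more valid against the deficient, task-only criterion than against the broader composite; the difference is one simple way to quantify what criterion deficiency can cost.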

WHAT IS SUCCESS AS DEFINED BY I-O PSYCHOLOGISTS?


Performance is one of the most important outcome variables within psychology (Campbell,
McCloy, Oppler, & Sager, 1993), and I-O psychologists have been successful in constructing actual
measures of performance that theoretically overlap with the ultimate or true criterion as much
as possible. Yet, until recently, little attention has been given to the explication of the content or
domain of performance (exceptions are Chapters 21–25, this volume). One reason for this is the
Classic Model of performance, which has dominated thinking in applied research. The model
states that performance is one general factor that accounts for most of the variation among
different measures. Therefore, the objective with performance measures is to develop the best pos-
sible measure of the general factor. However, throughout most of the history of I-O psychology,
the adequacy of this “ultimate criterion” has rarely been questioned or debated. In recent decades,
Campbell and others (Borman & Motowidlo, 1993; Cleveland, 2005; Johnson, 2003) have sug-
gested that the notion of an ultimate criterion or single general performance factor has no meaning
and is not the best representation of the performance construct, but the ultimate-actual criterion
distinction as shown in Figure 26.1 is still a useful heuristic for understanding the nature of the
criterion problem.

Task Performance
Campbell et al.'s (1993) model of performance focuses on required worker behaviors in a given job
and attempts to delineate the dimensions of job performance. This model guides researchers and
managers in assessing and preventing criterion contamination and deficiency by articulating the
eight most important aspects of job performance. The eight factors (i.e., job-specific task proficiency, non-
job-specific task proficiency, written and oral communication proficiency, demonstrating effort,
maintaining personal discipline, facilitating peer and team performance, supervision/leadership,
and management/administration) are assumed to be the highest-order factors that are sufficient
to describe the latent hierarchy among all jobs. That is, the construct of performance cannot be
meaningfully understood by combining these factors into a smaller subset or one general factor.
Although the content of the factors may vary slightly across jobs, the focus of each is in terms of the
observable and behavioral things that people do which are under their control.

FIGURE 26.1 Criterion components: relationships among criterion relevance, contamination, and deficiency.
(The figure contrasts the ultimate or conceptual criterion with the actual criterion, "success"; their shared
and unshared portions correspond to criterion relevance, criterion contamination, and criterion deficiency.
It also contrasts the traditional content of the ultimate criterion, comprising task efficiency, written
communication, and oral communication, with an expanded content that adds work-nonwork conflict, spousal
stress, and family health and well-being.)
There is a good deal of value to articulating what task performance actually means. However,
the specific performance components articulated by Campbell et al. (1993) and others address work
success from what is arguably a very narrow perspective. In particular, this model defines perfor-
mance in the workplace as a set of behaviors that is independent of behavior associated with our
nonwork lives, or at least it assumes that nonwork factors are not relevant for defining success at work. From
this perspective, the flow back and forth between the work and nonwork spheres of life is at best a
form of criterion contamination.

Contextual Performance (Organizational Citizenship Behavior)


Within the last 2 decades, several researchers have noted that job performance involves more than
task performance (Borman & Motowidlo, 1993; Organ, 1988). For example, Borman and Motowidlo
(1993) proposed a model of performance with two components at the highest level: task perfor-
mance, as we have already discussed, and contextual performance. Smith, Organ, and Near (1983)
labeled a similar construct organizational citizenship behavior (OCB).
Whereas task performance consists of the required behaviors of a job that either directly produce
the organization's goods and services or service and maintain the technical core,
contextual performance consists of behaviors that support the broader environment in which
the required tasks or technical core must operate (Borman & Motowidlo, 1993). Contextual
performance (or OCB) includes behaviors such as volunteering for tasks not formally part of the
job, demonstrating effort, helping and cooperating with others, following organizational rules, and
supporting organizational objectives (Borman & Motowidlo, 1993). A number of these behaviors
would fall under a subset of components identified by Campbell et al. (1993). Borman et al. (2001)
found that the structure of citizenship behaviors could be described using three categories: personal
support (behaviors benefiting individuals in the organization including helping, motivating,
cooperating with, and showing consideration), organizational support (behaviors benefiting the
organization including representing the organization favorably, showing loyalty, and complying
with organizational rules and procedures), and conscientious initiative (behaviors benefiting the job
or task including persisting with extra effort to complete tasks, taking initiative, and engaging in self-
development activities; Borman et al., 2001; Johnson, 2003).
Although we do not have consensus on the dimensionality of OCBs, the articulation of contextual
performance challenges traditional definitions of individual work performance (Ilgen & Pulakos,
1999). Further, as discussed in greater detail in Chapter 22, this volume, contextual performance/
OCBs reflect an initial shift toward broadening work performance criteria to include performing in
interdependent and uncertain work contexts (Neal & Hesketh, 2002).

Adaptive Performance
A third component of job performance, adaptive performance, is distinct from task and contex-
tual performance (Hesketh & Neal, 1999). Adaptive performance is the proficiency with which a
person alters his or her behavior to meet the demands of the environment, an event, or a new situation
(Pulakos, Arad, Donovan, & Plamondon, 2000), or an effective change in response to an altered
situation (White et al., 2005). Although some dimensions of adaptive performance overlap with
task or contextual performance, the dimension of addressing uncertain and unpredictable work
situations may be distinct from task and citizenship performance (Johnson, 2003). Related to the
construct of adaptive performance, recent conceptualizations of successful aging describe it as
successfully adjusting to developmental change (Baltes & Baltes, 1990) or
as competently adapting or adjusting (Featherman, 1992; Abraham & Hansson, 1995; Hansson,
DeKoekkoek, Neece, & Patterson, 1997). Adaptive performance is reviewed in more detail in
Chapter 22, this volume.

Organizational Deviant Behaviors


Finally, organizationally deviant behaviors that have negative value for organizational effectiveness
have been proposed as a fourth distinct component of job performance (Sackett & Wanek, 1996;
Viswesvaran & Ones, 2000). This component is also known as counterproductive work behavior,
and an excellent discussion of it is presented in Chapter 23, this volume. Organizationally deviant
behavior is defined as voluntary behavior that violates organizational norms and also threatens
the viability and well being of the organization and/or its members (Robinson & Bennett, 1995).
Currently, there is little consensus regarding the dimensionality of counterproductivity. For exam-
ple, some researchers have identified property damage, substance abuse, and violence on the job
as facets of counterproductivity (Sackett & Wanek, 1996), whereas withdrawal behaviors such as tardiness,
absenteeism, and turnover, and even social loafing or withholding of effort, are included in other defini-
tions of this aspect of job performance (Kidwell & Bennett, 1993).

CRITERION DEFICIENCY: WHAT HAVE WE IGNORED?


Modern organizations are becoming more and more concerned with the notion of sustainability
rather than focusing solely on profit (Senge, Smith, Kruschwitz, Laur, & Schley, 2008). The term is
usually used in conjunction with the sustaining of natural resources and processes. However, sus-
tainability can also generalize to the management of HR. Traditional criterion measures focus on
aspects of short-term performance, ignoring the influence that behavior has on other stakeholders
and the long-term consequences over time. This is analogous to focusing solely on profit. We need to
be aware of how current measures of success impact the future ability of employees to remain with
the organization and to continue to perform in a manner that is beneficial to the organization, them-
selves, and society. Considering the sustainability of HR requires taking a longer-term perspective
than is usually the case. Furthermore, given current trends in criterion measurement and the typical
failure to consider multiple stakeholders, our criteria of “success” continue to be deficient in at least
two ways. First, we need to expand the notion of criteria to include aspects of individual functioning
outside of the work context. Employee health and well-being, stress, marital quality, and parental
performance are all potential aspects of an emerging performance domain within the larger context
of our lives and are inextricably linked with work organizations (Cleveland, 2005). Behavior at work
affects behavior away from work and vice versa, and a truly comprehensive definition of effective-
ness and success in an organization is likely to include facets (e.g., health and well-being) that have
traditionally not been thought of as part of the performance domain.
Second, the content of our criteria should continue to be broadened to explicitly recognize the
multilevel implications of the construct (Cleveland, 2005; DeNisi, 2000; Murphy & Cleveland,
1995). We need to more explicitly link conceptions of individual performance and success to
definitions of effectiveness and sustainability at the group, organizational, and societal level. The
same behaviors and outcomes that contribute to success as traditionally defined at the individual
level (e.g., high level of competitiveness, high level of involvement in work) might sow the seeds
of failure at other levels (e.g., by building destructive conflict within organizations, by contribut-
ing to the failure of community institutions that compete with the workplace for employees’ time
and effort). These are themes that are echoed in Cascio and Aguinis (2008) and Chapter 9, this
volume.
Recognition of the multilevel nature of performance is important for several reasons (DeNisi,
2000). Notably, it provides one way that we can examine how our definitions and measures of
success at one level are linked with potential costs associated with or incurred at another level.
Broadening the definition of the criterion space to include extra-work functioning and multilevel
effects leads us to consider domains that have been given little emphasis by traditional definitions
of individual performance and organizational success. Two such domains are individual well-being
and organizational health.

HEALTH AND WELL-BEING


At the individual level, health is not simply the absence of ill health (e.g., Jahoda, 1958). Within
Western societies, the concept of mental health also includes aspiring to learn, being reasonably
independent, and possessing confidence (Karasek & Theorell, 1990).

Individual Health
Drawing on Warr’s framework (1987, 1994a), variations in mental health reflect different relative
emphases on ill health and good health. Mental or psychological health can be described using six
dimensions: subjective or affective well-being, positive self-regard, competence, aspiration, auton-
omy, and integrated functioning (Warr, 2005). Well-being is the most commonly investigated facet
of mental health and, according to Warr (1987), it includes two orthogonal dimensions: pleasure
(feeling bad to feeling good) and level of arousal (low to high). He identified three assessable aspects
of well-being that can be viewed in terms of their location on these dimensions: specifically, the hor-
izontal axis of pleasure or displeasure, which is measured in terms of satisfaction or happiness; an
axis from anxiety (high arousal, low pleasure) to comfort (low arousal, high pleasure); and an axis
from depression (low arousal, low pleasure) to enthusiasm (high arousal, high pleasure). Indicators
of well-being that emphasize the detection of ill-health rather than good health assess anxiety,
depression, burnout, psychological distress, and physiological or psychosomatic symptoms. On the
other hand, indicators of well-being that emphasize positive mental health assess high arousal-high
pleasure states, such as enthusiasm. Job satisfaction is considered to be either an indicator of ill
health (i.e., dissatisfaction) or of positive health (i.e., satisfaction). Either way, it is thought to be a
relatively passive form of mental health because, although it assesses the degree of pleasure/displea-
sure about a job, it does not assess arousal (Parker, Turner, & Griffin, 2003; Warr, 1997).
In addition to affective well-being, Warr (1987) identified five other components of mental
health: competence (e.g., effective coping), aspiration (e.g., goal directedness), autonomy/indepen-
dence (e.g., proactivity), positive self-regard (e.g., high self-esteem), and integrated functioning
(i.e., states involving balance, harmony, and inner relatedness). These are important components
of mental health in their own right because (a) they are potentially more enduring than affec-
tive well-being and (b) competence, aspiration, and autonomy/independence represent more active
states and behaviors than most measures of well-being that reflect passive contentment (e.g., job
satisfaction).
How does individual well-being fit as part of a definition of effectiveness or success? First,
there is considerable evidence that workplace stresses are an important source of physical and
mental health problems. Warr (1987, 1999) has developed a framework that identifies key fea-
tures of an environment that have been shown to be related to mental health. The ten features are
described in Table 26.1 in positive terms, but low values are viewed as stressful (Warr, 2005). This
table suggests that the design of jobs (e.g., variety, opportunities for skill use), workplaces (e.g.,
physical security), reward systems (e.g., availability of money), leadership training and develop-
ment systems (e.g., supportive supervision), and design of personnel recruitment selection sys-
tems (e.g., valued social position) could all have mental health implications for the workforce. For
example, given the shrinking, aging, and increasingly diverse global workforce, organizations
need to rethink the primary objectives of recruitment and selection systems. Organizations may
increasingly face the situation of having more job vacancies than qualified individuals to fill them.
Selection systems may need to be retooled to reflect more “recruitment selection.” Selection tests
or measures may need not only to assess how well applicants can perform across various work
contexts over a period of time, but also to convey to applicants what range of situations they are
likely to encounter and what resources the organization can provide to sustain their performance
and work-life health. It can certainly be argued that individuals who perform well in an environ-
ment that has adverse effects on their physical or mental health should not necessarily be described
as successful.
Second, health effects today are likely to have performance effects tomorrow. That is, an
employee whose health is impaired by the workplace will probably make a smaller contribution to
the organization, the family, and the community over the long run than one whose employment is a
source of well-being. Thus, employee well-being and health are important components to sustaining
an organization's human capital. Indeed, successful organizations already are aware of the link
between health and well-being, performance, and sustainability, as documented in Chapter 25,
this volume. For example, IBM's corporate policy on employee well-being has led the organization
to develop a myriad of programs to ensure employee health, well-being, and family balance.
Furthermore, IBM ties these programs to criteria such as work-related injury and lost workdays (IBM
Corporate Responsibility Report, 2008).

TABLE 26.1
Job Characteristics Related to Mental Health
Opportunity for personal control
Opportunity for skill use
Externally generated goals
Variety
Environmental clarity
Availability of money
Physical security
Supportive supervision
Opportunity for interpersonal contact
Valued social position

Adapted from Warr, P., Work, well-being and mental health, in J. Barling, E. K. Kelloway, & M. R. Frone, Eds., Handbook
of work stress, 547–574, Sage, Thousand Oaks, CA, 2005.

Organizational Health
A healthy organization is one that is competitive within the marketplace and also has low rates of
injury, illness, and disability (Hofmann & Tetrick, 2003). Individual health outcomes are distin-
guished from organizational-level outcomes, but both are likely to be related to individual behavior
at work. Together, individual and organizational effectiveness constitute the health of an organiza-
tion (Parker, Turner, & Griffin, 2003). That is, a healthy organization is one that accomplishes the
business-related goals that define traditional financial success and the human goals of advancing the
health and welfare of the organization’s members.
It is possible to move this discussion one step further and define a healthy organization as involv-
ing three dimensions: (a) competitive within the marketplace; (b) low rates of injury, illness, and
disability (lack of negative outcomes); and (c) promoting long-term sustainability and well being of
its constituents (e.g., work that increases the success of constituents in terms of competence, aspira-
tion, autonomy, and balance).

Integrating Health and Well-Being Into the Criterion Space


One reason why it is useful to distinguish between performance and success is that a narrow focus
on performance forces one to search for similarly narrow reasons for including factors such as health
in the criterion domain. It is certainly possible to do so; unhealthy individuals and unhealthy orga-
nizations are not likely to maintain any notable level of performance over the long run. On the other
hand, a focus on success does not require one to locate some performance-related pretext for includ-
ing health as part of the ultimate criterion. Rather, the promotion of individual and organizational
health is likely to be a valued outcome in and of itself (i.e., valued by at least some stakeholders) and
does not require justification in terms of some other set of criteria (e.g., profitability). We argue that
employees, their families, and their communities all have a vested interest in workplaces that promote
physical and mental health and all have a vested interest in minimizing a range of negative outcomes
(e.g., spillover of work-related conflicts) that might be associated with unhealthy organizations.

MULTILEVEL ISSUES IN DEFINING PERFORMANCE AND SUCCESS


Performance and success both occur at the individual, group, and organizational levels (Campbell,
Dunnette, Lawler, & Weick, 1970; DeNisi, 2000); also, they can be defined within the larger context
(level) of society and environment. Performance and success are not only defined at many levels
of analysis, they can also be defined in terms of multiple units of time. Perhaps the most serious
deficiency in many definitions of individual performance and success is the lack of awareness or
concern with the relationship between choices in defining the domain at one level (e.g., Is “face
time” an important part of performance and success?) and effects felt at other levels of the system
(e.g., If “face time” at work is viewed as important, time spent in family or community activities is
likely to decline). According to DeNisi (2000), when we acknowledge that performance is a multi-
level phenomenon, several important implications follow:

1. We assess and develop individual employee performance with the intent of ultimately
affecting the performance of the team or the whole organization.
2. Individuals and teams perform in ways to allow the organization to achieve outcomes
referred to as “organizational performance.”
3. Performance at higher levels of analysis is more than just the simple sum of performance at
lower levels; that is, it is not always sufficient to change individual performance to change
team or organization performance.
4. Variables at higher levels of analysis (e.g., organizational structure or climate) can serve as
constraints on (or facilitators of) the performance of individuals and teams. Therefore, we
must understand the organizational context in order to fully understand the performance
of individuals or teams.

In particular, thinking about performance and success from a multilevel perspective might help us
understand how and why the ultimate criterion should be expanded. For example, we tradition-
ally construct and validate personnel selection systems as if the only objective of those systems
was to predict future performance at the individual level (e.g., virtually all validation studies use
measures of individual job performance as the criterion of choice). Yet it is clear that the goals of
a personnel selection system are not solely to predict future performance; the goals are to help the
organization make better strategic decisions, be profitable, and sustain productivity. Consistent
with the message conveyed in Chapter 9, this volume, it is critical that the criteria are linked
with unit or organizational strategy. Therefore, our criteria may include unit-, organizational-,
and societal-level assessments, as well as individual-level performance assessments, to be most
consistent with a firm’s strategy. One plausible reason that a validated selection system does not
translate into better unit performance may be the narrowly defined criteria used. There are usu-
ally real and legitimate differences in different stakeholders' definitions of "better decisions." For
example, an organization may use a class of tests in personnel selection that results in predicted
increases in individual performance but also results in adverse impact, in conflict between super-
visors and subordinates, and in negative images of the organization. This might not be thought of
as a success, even if the validity coefficients are all large and positive (Murphy, 2010). Therefore,
the logic that Ployhart and Weekley develop in Chapter 9, this volume, to link individual-level
selection tests to organizational business strategy should also be applied to the re-examination
and development of the criterion domain. That is, relevant macro work context and nonwork fac-
tors should be included within the articulation and domain of success. Cascio and Aguinis (2008)
make similar recommendations using the emerging construct they label "in situ performance,"
which refers to the situational, contextual, strategic, and environmental effects that may influ-
ence individual, team, or organizational performance. By integrating or specifying these effects,
we develop a “richer, fuller, context-embedded description of the criterion space that we wish to
predict” (Cascio & Aguinis, 2008, p. 146). With the changing nature of work and the workforce,
such criterion evolution can more fully capture how work is done in the 21st century (Cascio &
Aguinis, 2008).
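
As a purely hypothetical illustration of the earlier point that stakeholders can legitimately differ over what counts as a "better decision" (the two selection systems, their outcome profiles, and the stakeholder weights below are invented for this sketch and are not drawn from the chapter), consider the following:

import numpy as np

# Labels documenting the order of the four outcome columns used below
criteria = ["individual performance gain", "low adverse impact",
            "low supervisor-subordinate conflict", "community/nonwork impact"]

# Standardized outcome profiles for two hypothetical selection systems
system_a = np.array([0.9, -0.6, -0.4, -0.3])  # high validity, more negative side effects
system_b = np.array([0.6,  0.2,  0.1,  0.3])  # moderate validity, fewer side effects

# Two hypothetical stakeholder weightings over the same four criteria
weightings = {
    "shareholder-focused":    np.array([0.8, 0.1, 0.1, 0.0]),
    "broad multistakeholder": np.array([0.4, 0.2, 0.2, 0.2]),
}

for name, w in weightings.items():
    print(name, "- system A:", round(float(system_a @ w), 2),
          "system B:", round(float(system_b @ w), 2))

Under the shareholder-focused weighting, system A scores higher (0.62 vs. 0.51); under the broader weighting, system B does (0.36 vs. 0.10). Nothing about either system changed; only whose definition of success was used to evaluate them.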
I-O psychologists devote a great deal of time and effort to helping organizations make high-
stakes decisions about people (e.g., whom to hire, where to place them, and what behaviors to
reward and sanction). A multi-level perspective suggests that these decisions can and probably
should be evaluated in terms of their effects on individuals, work groups, organizations, and
families and communities, and that short- and long-term perspectives should be considered.
To be sure, there are many difficult issues that have to be addressed to put such a program
of criterion development in place. Whose perspectives should be considered and how much
weight should be given to each stakeholder in defining individual or organizational success?
How should conflicts between stakeholders be addressed (e.g., it might benefit organizations but
harm the communities that support them if many employees put in 80-hour weeks)? There are
no simple answers to these questions, but I-O psychologists do have experience dealing with
the multilevel issues in several other domains, and we may be able to draw from this research
and this experience to gain insights into developing more inclusive definitions of what it means
to be a success in the workplace. In particular, there is much to be learned from research on
work-family conflict.

WORK-FAMILY CONFLICT AS A MULTILEVEL CRITERION PROBLEM


Research on work-family conflict provides an example of the implications of thinking about perfor-
mance and success from the perspectives of multiple stakeholders. Given I-O psychologists’ interest
in the work context, the work side of the work-family interface has been more focal in I-O research
(Major & Cleveland, 2005). Research in work-family conflict has typically emphasized the experi-
ences of managers and professionals, as opposed to other types of workers (e.g., laborers), and has
typically focused on the individual employee and his or her performance at work. Although some
I-O studies have examined outcomes for employed couples (e.g., Hammer, Allen, & Grigsby, 1997),
these are few and far between, and research that includes or acknowledges children is sparse indeed.
Nevertheless, the field of work-family conflict can be viewed as one of the most successful examples
of multilevel, multiperspective thinking, particularly if we recast some of the traditional areas of
work-family conflict research in a slightly different light.
I-O psychologists have been particularly interested in the effects of work-family conflict on
employee job-related attitudes. They have usually not thought of work-family conflict as a mea-
sure of success (or lack thereof), but rather as a criterion contaminant. However, it is reasonable
to argue that work-family conflict should be part of the definition of success, particularly when we
define success at the organizational level. That is, an organization that frequently places demands
on employees that interfere with their ability to function well as spouses, parents, caregivers, etc.,
should be considered as less successful than similar organizations that find a way to minimize their
encroachment on the family roles of their employees. The decision not to include work-family bal-
ance in the scorecard used to evaluate organizations may make sense from the perspective of some
stakeholders (e.g., investors, or executives with stay-at-home spouses), but it is not likely to be in
the interest of families, children, and perhaps even the larger society that provides the customers,
infrastructure, employees, and support that is necessary for the organization’s survival. Although
I-O psychologists often ignore work-family balance as a criterion of success, some organizations,
such as IBM, do not. IBM has given $213 million to dependent care services globally because its
management has acknowledged that it is in the best interest of employees and the organization
(http://www.ibm.com/ibm/responsibility).
Why should organizations care about work-family conflict? First, work-family conflict has
been linked to organizational commitment, turnover intentions (e.g., Lyness & Thompson, 1999;
Netemeyer, Boles, & McMurrian, 1996), turnover (Greenhaus, Collins, Singh, & Parasuraman,
1997), and stress and health (Frone, 2000; Frone, Russell, & Cooper, 1997). Second, some
studies have found a negative relationship between work-family conflict and job performance
(Aryee, 1992; Frone et al., 1997), particularly when performance is defined as task performance.
By revealing links to outcomes that traditionally matter to business (e.g., turnover), this research
illustrates that attending to work-family concerns is not simply a “moral imperative” or the
“right thing” to do, but also makes good business sense. That is, a reasonable case can be made
that work-family conflict is harmful to an organization’s bottom line, especially over the long
term.
A multilevel perspective suggests that it is not necessary (although it is likely to be desirable)
to focus on the links between work-family conflict and the bottom line to justify including this
conflict as a facet of success. Rather, there are important stakeholders (e.g., employees, their families,
their communities) who have a legitimate stake in wanting to minimize work-family conflict,
regardless of whether or not it affects the bottom line of the organization. This multilevel per-
spective is particularly important because it has been consistently found that work-to-family con-
flict is more likely to occur than family-to-work conflict (Eagle, Miles, & Icenogle, 1997; Gutek,
Searle, & Klepa, 1991; Netemeyer et al., 1996). Organizational demands on the time and energy of
employees appear to be more compelling than those of the family because of the economic contri-
bution of work to the well being of the family (Gutek et al., 1991). Employees are often afraid to
be away from the workplace and “presenteeism” takes its toll (Lewis & Cooper, 1999; Simpson,
1998). Workers are spending more time in the workplace in response to job insecurity, workplace
demands, perceived career needs, and financial pressure. That is, the most compelling finding in
the domain of work-family conflict is not that family interferes with work but that work interferes
with family. If we, as I-O psychologists, focus only on outcomes that directly affect the employer’s
interests (particularly employers’ short-term interests), we are likely to dismiss the most important
aspect of work-family conflict (i.e., the way work can adversely affect families) as outside of the
boundaries of the criterion domain. If we consider the interests of employees, their families and
their communities as a legitimate part of the definition of the ultimate criterion space, we are
less likely to dismiss this important set of findings as being largely irrelevant, or at least as being
someone else’s problem.
Women and men in the United States increased their annual working hours by an average of 233
and 100 hours, respectively, between 1976 and 1993 (Bureau of Labor Statistics, 1997). In 1999, the
average weekly full-time hours over all industries were 43.5–45.2 for certain professional groups
and executives (Bureau of Labor Statistics, 1999). Many employees work longer hours, and dual-
earner couples may work unusual hours or shifts. In the United States and the United Kingdom,
workers feel they need to put in substantial “face time” to demonstrate their commitment (Bailyn,
1993) and many in low-wage occupations work in more than one job. Despite the increasing time
and effort devoted to work, employees are feeling increasing levels of job insecurity (Burchell,
Felstead, & Green, 1997; Reynolds, 1997). From the perspective of multilevel systems, this increas-
ing focus on face time, long hours, and increased insecurity is arguably evidence that organizations
are increasingly unhealthy, and, therefore, increasingly unsuccessful.
Similarly, we can think of research on workers’ experiences with family-friendly work policies
(e.g., parental leave, flextime) differently if we broaden our definitions of performance, effective-
ness, and success. For example, family-friendly policies are of limited value without a secure job,
and there is evidence that many qualified employees decline opportunities to participate in these
programs (Lewis et al., 1998). One way of evaluating the success of an organization would be to pay
attention to the uptake rates for policies such as these. If employees report stress and dissatisfac-
tion as a result of work-family conflict but are unwilling or unable to take advantage of workplace
policies designed to reduce these stresses, this can be considered evidence that the organization is
failing its stakeholders, regardless of what the balance sheet says.
Few of the studies examining the effects of family-friendly policies focus on the couple or the
family as the unit of analysis. In addition, such factors as marital well being and healthy family
relations are rarely assessed. Finally, few studies in I-O psychology or management tap spousal or
children’s perceptions of work-family conflict and employee or parental behaviors. As a result, we
know little about how family-friendly policies actually affect families. A multilevel perspective in
defining success suggests that we should try to learn more about all of these issues.
Although they are studied far less frequently than work-related outcomes, outcomes in the family
domain have not been completely neglected by psychological research (Major & Cleveland, 2007). Numerous empir-
ical studies demonstrate a negative relationship between work-family conflict and life satisfaction
(e.g., Adams, King, & King, 1996; Netemeyer et al., 1996); the results of two meta-analyses (Allen,
Herst, Bruck, & Sutton, 2000; Kossek & Ozeki, 1998) reinforce this conclusion. The results are
similar for work-family conflict and marital functioning and/or satisfaction (e.g., Duxbury, Higgins,
& Thomas, 1996; Netemeyer et al., 1996) and family satisfaction (e.g., Parasuraman, Purohit,
Godshalk, & Beutell, 1996). Yet again, often this research taps only the perceptions of the employed
worker and does not collect information from spouses or children.
Children are virtually absent from I-O research on the work-family interface (Major & Cleveland,
2007), and when they are included, it is typically as demographic control variables (i.e., number of
children, age of youngest child) in studies of an employed parent’s family demands (see Rothausen,
1999 for a review). With few exceptions (e.g., Barling, Dupre, & Hepburn, 1998), children’s out-
comes are seldom considered in I-O work-family research. Moreover, I-O research lacks a rich
treatment of how children and other family variables influence employee behavior (cf. Eby,
Casper, Lockwood, Bordeaux, & Brinley, 2005) or, importantly, how workplace characteristics
and the employment/parental behaviors of both working parents influence the well being and work
attitudes of their children. Further, current measures of success are deficient and lack consideration
of children's well-being. If we think about the family as one of the important sets of stakeholders
in defining what we mean by success, we will be more likely to consider the reciprocal effects of
work and family in deciding whether our HR systems (e.g., personnel selection) are indeed leading
to better decisions.
In the traditional model of success, in which the ultimate criterion is entirely focused on what is
good (often in the short term) for the organization, including measures of work-family conflict in
evaluations of careers, organizations, etc., would probably be dismissed as criterion contamination.
If we recognize that the worlds of work and nonwork are inextricably intertwined, we are likely to
reach a very different conclusion; that is, that the failure to include variables such as work-family
conflict in our definitions of success has led to conceptions of the ultimate criterion that are them-
selves deficient.

“CLOSING IN” ON CRITERION DEFICIENCY: ONE APPROACH TO
BRIDGING HR SYSTEMS WITH BUSINESS UNIT STRATEGY

Scholars in management and applied psychology have often worked from the assumption that work
could and should be analyzed and understood as a separate domain from our nonwork lives. This
probably made a good deal of sense for the workplace of the late 19th and early 20th centuries (a forma-
tive period for work organizations and for I-O psychology) when White males were the predominant
members of the workforce, with unpaid wives at home tending to children and nonwork needs.
This characterization is increasingly inaccurate for workers in the 21st century and for their
families. Families are more diverse in structure, and it is more likely that all adult family
members are paid employees working outside of the home.
With the changing demographic composition of the workforce and working families, and the
changing demands and technology within organizations, the way success is defined and measured
must undergo transformation as well. We argue that this transformation in evaluation at work needs
to reflect the following. First, the domain of success must encompass a more inclusive set of content,
including individual employee well-being, marital and family well-being, and traditional indicators
of task and citizenship behaviors. Second, the domain of success must reflect multiple levels of anal-
ysis, including individual employees, couples, families, teams, work units, organizational productivity,
and community quality. Further, the multiple levels of analysis may involve varying units of time,
ranging from the short term (up to about 1 year) to the long term (up to decades). For example,
children often leave home at 18 years of age, and the balance between work and nonwork that is
best for the employee, the child, the spouse, the organization, and the community might constantly
shift during those 18 years. Some employees might attempt to maximize their career advancement
before starting a family, whereas others might reenter a career after childrearing is completed. The
definition of the employees’ behaviors that are most desirable will probably vary over employees,
over time, and over stakeholders.
Third, the set of stakeholders who have a legitimate interest in defining what behaviors should
occur in the workplace are not found only at work (e.g., employees, coworkers, customers). Our defi-
nition of stakeholders must include nonworking and working spouses/partners and children. Finally,
our nonwork lives should not be viewed as contaminants of job performance or success but rather as
part of the ultimate criterion of success, and therefore very relevant and appropriate to assess.

IMPLICATIONS FOR SELECTION


We do not suggest that organizations measure employee marital satisfaction or fire employees when
they divorce or have problematic children, nor that they use health status as a selection criterion (see
Chapter 25, this volume, for a discussion of work-related health, stress, and safety). Rather, just
as many organizations collect and monitor various safety criteria at the organizational level (e.g.,
accident rates), an organization can monitor the work and nonwork health of its workforce at an
aggregate level. To ensure privacy for employees, information on nonwork issues such as marital
health and family relationships can be collected at the group or organizational level of analysis,
not from individual employees. However, information on work performance using task and
citizenship behaviors can be collected at individual and aggregated levels. Further, it is important
that organizations tap not only perceptions of individual employees, coworkers, supervisors, and so
forth, but also the perceptions of employees’ partners/spouses and children. Just as 360º performance
feedback programs have gained some popularity in management circles (Bracken, Timmreck, &
Church, 2001), organizations should also receive feedback from nonwork sources (Shellenbarger,
2002). Using a type of family 360º review may provide useful feedback to the employee.
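
A minimal sketch of what aggregate-level, privacy-preserving monitoring of nonwork criteria could look like follows; the unit names, the survey scale, and the minimum reporting size are illustrative assumptions rather than recommendations from this chapter.

from collections import defaultdict
from statistics import mean

MIN_GROUP_SIZE = 10  # suppress any unit with fewer anonymous responses than this

# (unit, work-family conflict score) pairs from an anonymous survey on a 1-5 scale
responses = [("Plant A", 3.8), ("Plant A", 4.1), ("Plant B", 2.2), ("Plant B", 2.5)]
# ... in practice, many more rows would be collected

by_unit = defaultdict(list)
for unit, score in responses:
    by_unit[unit].append(score)

for unit, scores in sorted(by_unit.items()):
    if len(scores) < MIN_GROUP_SIZE:
        print(f"{unit}: suppressed (n = {len(scores)} < {MIN_GROUP_SIZE})")
    else:
        print(f"{unit}: mean work-family conflict = {mean(scores):.2f} (n = {len(scores)})")

Reporting only unit-level means, and only when enough responses exist, keeps the nonwork information useful for monitoring organizational health without attaching it to any individual employee.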
Adopting a broader, more heterogeneous conceptualization of worker success would have impor-
tant implications for the way we evaluate the validity and adequacy of our criteria and for the
conclusions we reach about the validity and value of many of the systems psychologists develop
for organizations. A broader concept of success may have considerable appeal for employees and
their families and could even be thought of as a competitive advantage for organizations (i.e., orga-
nizations that think more broadly about defining success may be better positioned to recruit and
retain particular employees) and enhance the sustainability of the organization. Perhaps one basis
for worker dissatisfaction with performance appraisal is that what employees value as success is
not reflected in the organization's evaluation process. Taking a broader perspective may also pro-
vide the organization with a strategic advantage in the public's eye. In addition, organizations
would gain essential insight into potential HR challenges facing working families that can provide
the basis for innovative and effective interventions. Not only would I-O psychologists and managers
have more actual measures to tap success, but they would also have more sources of performance
information. Finally, using a multilevel orientation to tap multisource information, we plausibly can
begin to (a) link our HR systems with business strategy (as discussed in Chapter 9, this volume) and
(b) develop selection tools that predict in situ performance and more fully reflect individual success
and well-being as well as organizational sustainability.

CONCLUSIONS
The way we go about predicting and understanding success in organizations (and designing per-
sonnel selection systems that will maximize success) depends largely on how we define success.
Researchers and practitioners increasingly question the adequacy of traditional definitions of job
performance, promotions, salary, job title, organizational level, and so forth as indicators of success.
These are all important and relevant, but success almost certainly should be defined more broadly
and comprehensively. As the permeability of the boundaries between work and nonwork domains
increases in the 21st century, our definition of what it means to the organization, the individual, and
the broader society to be a success or a failure in the workplace is likely to change.
We have argued in this chapter that criteria such as marital and family well-being are of legiti-
mate concern to responsible organizations and are part of the ultimate criterion. The wealth of
evidence shows that employees place family as their number one priority (Lewis & Cooper, 1999)
and that employees' work demands regularly interfere with their ability to meet family demands;
to a lesser degree, there is also some evidence that employees' family demands interfere with
their ability to carry out work demands (cf. Greenhaus & Parasuraman, 1999). Business strategies
that emphasize promoting long-term sustainability and concern with the construct of in situ perfor-
mance (Cascio & Aguinis, 2008) will necessarily be concerned with determining how work and
nonwork domains affect one another and with how nonwork criteria such as family well being are
likely to influence the viability of organizations. The literature includes constant calls for aligning
HR practices with business strategy (Chapter 9, this volume) to promote the long-term benefit of
organizations, and it is likely that understanding the effects of work and organizational demands on
the quality of nonwork life will be an important factor in building and sustaining healthy organiza-
tions. Our current criteria for success (and theories of performance) are arguably deficient because
we ignore the facets and structures of work that affect nonwork areas of our lives.
For example, suppose that a new performance management system led employees to book more
hours at work but also led to increased stress at home. It might be reasonable to ask whether the
organization should consider its new system a success or a failure. It may not be easy to determine
the best balance between the positive and negative effects of this system, but it seems reasonable to
at least ask the question of how interventions that have what seem like beneficial effects at one level
of analysis might have negative effects at other levels. Our current narrow focus on what is good for
the organization may lead us to miss the effects of what happens in the workplace on any number
of domains other than work.
What happens at work does not always stay at work; the workplace affects our nonwork lives,
and our nonwork lives affect the workplace. It is important to more fully appreciate the reciprocal
relationships between work and nonwork and to recognize the larger developmental and cultural
context in which work behaviors unfold. Including nonwork factors in our evaluations of careers,
jobs, and organizations is not a source of criterion contamination. Rather, failure to consider these
factors in defining success should be thought of as a source of criterion deficiency. There are many
challenges in determining what to measure, how to measure it, and how to use that information, but
the case seems clear—we need to take a broader (and richer) approach to defining performance and
success for individuals and organizations.

REFERENCES
Abraham, J. D., & Hansson, R. O. (1995). Successful aging at work: An applied study of selection, optimiza-
tion, and compensation through impression management. Journal of Gerontology, 50, 94–103.
Adams, G. A., King, L. A., & King, D. W. (1996). Relationships of job and family involvement, family social
support, and work-family conflict with job and life satisfaction. Journal of Applied Psychology, 81,
411–420.
Allen, T. D., Herst, D. E., Bruck, C. S., & Sutton, M. (2000). Consequences associated with work-to-family
conflict: A review and agenda for future research. Journal of Occupational Health Psychology, 5,
278–308.
Aryee, S. (1992). Antecedents and outcomes of work-family conflict among married professional women:
Evidence from Singapore. Human Relations, 45, 813–837.
Astin, A.W. (1964). Criterion-centered research. Educational and Psychological Measurement, 24, 807–822.
Austin, J. T., & Villanova, P. (1992). The criterion problem: 1917–1992. Journal of Applied Psychology, 77,
836–874.
Bailyn, L. (1993). Breaking the mold: Women, men and time in the new corporate world. New York, NY: Free
Press.
Baltes, P. B., & Baltes, M. M. (1990). Psychological perspectives on successful aging: The model of selective
optimization with compensation. In P. B. Baltes & M. M. Baltes (Eds.), Successful aging: Perspectives
from the behavioral sciences (pp. 1–33). Cambridge, England: Cambridge University Press.
Barling, J., Dupre, K. E., & Hepburn, C. G. (1998). Effects of parents’ job insecurity on children’s work beliefs
and attitudes. Journal of Applied Psychology, 83, 112–118.
Bingham, W. V. (1926). Measures of occupational success. Harvard Business Review, 5, 1–10.
Borman, W. C. (1993). Job behavior, performance, and effectiveness. In M. D. Dunnette & L. M. Hough (Eds.),
Handbook of industrial and organizational psychology (Vol. 2, pp. 271–326). Palo Alto, CA: Consulting
Psychologists Press.
Borman, W., & Motowidlo, S. (1993). Expanding the criterion domain to include elements of contextual
performance. In N. Schmitt & W.C. Borman (Eds.), Personnel selection in organizations. (pp.71–98).
San Francisco, CA: Jossey-Bass.
Borman, W. C., Penner, L. A., Allen, T. D., & Motowidlo, S. J. (2001). Personality predictors of citizenship
performance. International Journal of Selection and Assessment, 9, 52–69.
Bracken, D., Timmreck, C., & Church, A. (2001). The handbook of multisource feedback: The comprehensive
resource for designing and implementing MSF processes. San Francisco, CA: Jossey-Bass.
Burchell, B., Felstead, A., & Green, F. (1997, September). The age of the worried workers: Extent, pattern
and determinants of insecurity in Britain over the last decade. Paper presented at the 12th Annual
Employment Research Unit Conference, Cardiff, Wales.
Bureau of Labor Statistics. (1997). Workers are on the job more hours over the course of a year (issues in Labor
Statistics, Summary 97-3). Washington, DC: U.S. Department of Labor.
Bureau of Labor Statistics. (1999, April). Household data. Washington, DC: U.S. Department of Labor.
Campbell, J. P. (1990). Modeling the performance prediction problem in industrial and organizational psychology.
In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology, (2nd ed.,
Vol. 1, pp. 687–732). Palo Alto, CA: Consulting Psychologists Press.
Campbell, J. P., Dunnette, M. D., Lawler, E. E., & Weick, K. (1970). Managerial behavior, performance and
effectiveness. New York, NY: McGraw-Hill.
Campbell, J. P., McCloy, R. A., Oppler, S. H., & Sager, C. E. (1993). A theory of performance. In N. Schmitt &
W. C. Borman (Eds.), Personnel selection in organizations (pp. 35–70). San Francisco, CA: Jossey-
Bass.
Cascio, W. F. (2000). Managing human resources: Productivity, quality of work life and profits. New York, NY:
McGraw-Hill.
Cascio, W., & Aguinis, H. (2008). Staffing twenty-first-century organizations. In J. P. Walsh & A. P. Brief
(Eds.). Academy of management annals (pp. 133–165). Mahwah, NJ: Lawrence Erlbaum.
Cleveland, J. N. (2005). What is success? Who defines it? Perspectives on the criterion problems as it relates
to work and family. In E. E. Kossek & S. J. Lambert (Eds.), Work and life integration: Organizational,
cultural and individual perspectives (pp 319–346). Mahwah, NJ: Lawrence Erlbaum.
DeNisi, A.S. (2000). Performance appraisal and performance management: A multilevel analysis. In K. J. Klein
& S. J. W. Kozlowski (Eds.), Multilevel theory, research, and methods in organizations: Foundations,
extensions, and new directions (pp 121–156). San Francisco, CA: Jossey-Bass.
Duxbury, L. E., Higgins, C. A., & Thomas, D. R. (1996). Work and family environments and the adoption of
computer-supported supplemental work-at-home. Journal of Vocational Behavior, 49, 1–23.
Eagle, B. W., Miles, E. W., & Icenogle, M. L. (1997). Interrole conflicts and the permeability of work and fam-
ily domains: Are there gender differences? Journal of Vocational Behavior, 50, 168–184.
Eby, L. T., Casper, W. J., Lockwood, A., Bordeaux, C., & Brinley, A. (2005). A twenty-year retrospective on
work and family research in IO/OB: A content analysis and review of the literature. [Monograph] Journal
of Vocational Behavior, 66, 124–197.
Featherman, D. L. (1992). Development of reserves for adaptation to old age: Personal and societal agendas. In
E. Cutler, D. W. Gregg, & M. P. Lawton (Eds.), Aging, money, and life satisfaction: Aspects of financial
gerontology (pp. 135–168). New York, NY: Springer.
Flanagan, J. C. (1954). The critical incidents technique. Psychological Bulletin, 51, 327–358.
Friedman, S. D., & Greenhaus, J. H. (2000). Work and family—Allies or enemies? What happens when business
professionals confront life choices. New York, NY: Oxford University Press.
Frone, M. R. (2000). Work-family conflict and employee psychiatric disorders: The national comorbidity survey.
Journal of Applied Psychology, 85, 888–895.
Frone, M. R., Russell, M., & Cooper, M. L. (1997). Relation of work-family conflict to health outcomes: A four-
year longitudinal study of employed parents. Journal of Occupational & Organizational Psychology, 70,
325–335.
Greenhaus, J. H., Collins, K. M., Singh, R., & Parasuraman, S. (1997). Work and family influences on departure
from public accounting. Journal of Vocational Behavior, 50, 249–270.
Greenhaus, J. H., & Parasuraman, S. (1999). Research on work, family and gender: Current status and future direc-
tions. In G. N. Powell (Ed.), Handbook of gender and work (pp. 391–412). Thousand Oaks, CA: Sage.
Griffin, B., & Hesketh, B. (2003). Adaptable behaviours for successful work and career adjustment. Australian
Journal of Psychology, 55, 65–73.
Gutek, B.A., Searle, S., & Klepa, L. (1991). Rational versus gender role explanations for work-family conflict.
Journal of Applied Psychology, 76, 560–568.
Hammer, L. B., Allen, E., & Grigsby, T. D. (1997). Work-family conflict in dual-earner couples: Within-individual
and crossover effects of work and family. Journal of Vocational Behavior, 50, 185–203.
Hansson, R. O., DeKoekkoek, P. D., Neece, W. M., & Patterson, D. W. (1997). Successful aging at work: Annual
Review, 1992–1996: The older worker and transitions to retirement. Journal of Vocational Behavior, 51,
202–233.
Hesketh, B., & Neal, A (1999). Technology and performance. In D. R. Ilgen & E. D. Pulakos (Eds.), The
changing nature of performance: Implication for staffing, motivation, and development (pp. 21–55). San
Francisco, CA: Jossey-Bass.
Hofmann, D. A. & Tetrick, L. E. (2003). The etiology of the concept of health: Implications for “organizing”
individual and organizational health. In D. A. Hofmann & L. E. Tetrick (Eds.), Health and safety in orga-
nizations: A multilevel perspective (pp. 1–26). San Francisco, CA: Jossey-Bass.
Johnson, J. W. (2003). Toward a better understanding of the relationship between personality and individual
job performance. In M. R. Barrick & A. M. Ryan (Eds.), Personality and work: Reconsidering the role of
personality in organizations (pp. 83–120). San Francisco, CA: Jossey-Bass.
Karasek, R. A., & Theorell, T. (1990). Healthy work: Stress, productivity, and the reconstruction of working life.
New York, NY: Basic Books.
Kossek, E. E., & Ozeki, C. (1998). Work-family conflict, policies, and the job-life satisfaction relationship:
A review and directions for organizational behavior-human resources research. Journal of Applied
Psychology, 83, 139–149.
Lewis, S., & Cooper, C. L. (1999). The work-family research agenda in changing contexts. Journal of
Occupational Health Psychology, 4, 382–393.
Lewis, S., Smithson, J., Brannen, J., Das Dores Guerreiro, M., Kugelberg, C., Nilsen, A., & O’Connor, P.
(1998). Futures on hold: Young Europeans talk about combining work and family. London, England:
Work-Life Research Centre.
Lyness, K. S., & Thompson, D. E. (1997). Above the glass ceiling? A comparison of matched samples of
female and male executives. Journal of Applied Psychology, 82, 359–375.
Major, D. A. & Cleveland, J. N. (2005). Psychological perspectives on the work-family interface. In S. Bianchi, L. Casper,
& R. King (Eds.), Work, family, health and well-being (pp. 169–186). Mahwah, NJ: Lawrence Erlbaum.
Major, D. A. & Cleveland, J. N. (2007). Reducing work-family conflict through the application of industrial/
organizational psychology. International Review of Industrial and Organizational Psychology, 22,
111–140.
Murphy, K. R. (1998). In search of success: Everyone’s criterion problem. SIOP Presidential Address at the
Annual Conference for Industrial and Organizational Psychology, Dallas, TX.
Murphy, K. R. (2010). How a broader definition of the criterion domain changes our thinking about adverse
impact. In J. Outtz (Ed.), Adverse impact (pp. 137–160). San Francisco: Jossey-Bass.
Murphy, K. R., & Cleveland, J. N. (1995). Understanding performance appraisal: Social, organizational and
goal-oriented perspectives. Newbury Park, CA: Sage.
Nagle, B. F. (1953). Criterion development. Personnel Psychology, 6, 271–289.
Neal, A., & Hesketh, B. (2002). Productivity in organizations. In N. Anderson, D. S. Ones, H. K. Sinangil, &
C. Viswesvaran, (Eds.), Handbook of industrial, work and organizational psychology, Vol. 2:
Organizational psychology (pp. 7–24). Thousand Oaks, CA: Sage.
Netemeyer, R. G., Boles, J. S., & McMurrian, R. (1996). Development and validation of work-family conflict
and family-work conflict scales. Journal of Applied Psychology, 81, 400–410.
Organ, D. W. (1988). Organizational citizenship behavior. Lexington, MA: D. C. Heath.
Parasuraman, S., Purohit, Y. S., Godshalk, V. M., & Beutell, N. J. (1996). Work and family variables, entrepre-
neurial career success and psychological well-being. Journal of Vocational Behavior, 48, 275–300.
Parker. S. K., Turner, N., & Griffin, M. A. (2003) Designing healthy work. In D. A. Hofmann & L. E. Tetrick
(Eds.), Health and safety in organizations: A multilevel perspective (pp. 91–130). San Francisco, CA:
Jossey-Bass.
Pulakos, E. D., Arad, S., Donovan, M. A., & Plamondon, K. E. (2000) Adaptability in the workplace:
Development of a taxonomy of adaptive performance. Journal of Applied Psychology, 85, 612–624.
Reynolds, J. R. (1997). The effects of industrial employment conditions on job related distress. Journal of
Health and Social Behaviour, 38, 105–116.
Robinson, S. L., & Bennett, R. J. (1995). A typology of deviant workplace behaviors: A multidimensional scaling
study. Academy of Management Journal, 38, 555–572.
Rothausen, T. J. (1999). “Family” in organizational research: A review and comparison of definitions and
measures. Journal of Organizational Behavior, 20, 817–836.
Sackett, P. R., & Wanek, J. E, (1996). New development in the use of measures of honesty, integrity, conscien-
tiousness, dependability, trustworthiness and reliability for personnel selection. Personnel Psychology,
49, 787–830.
Senge, P. M., Smith, B., Kruschwitz, N., Laur, J., & Schley, S. (2008). The necessary revolution: How individuals
and organizations are working to create a sustainable world. New York, NY: Random House.
Shellenbarger, S. (2002). Executive dad asks family for a 360 review. Wall Street Journal. Retrieved June 12,
2002, from http://www.careerjournal.com
Simpson, R. (1998). Organisational restructuring and presenteeism: The impact of long hours on the working
lives of managers in the UK. Management Research News, 21, 19.
Criterion Validity and Criterion Deficiency 567

Smith, C. A., Organ, D. W., & Near, J. P. (1983). Organizational citizenship behavior: Its nature and antecedents.
Journal of Applied Psychology, 68, 655–663.
Smolensky, E., & Gootman, J. A. (2003). Working families and growing kids: Caring for children and adoles-
cents. Washington, DC: The National Academies Press.
Thorndike, R. L. (1949). Personnel selection: Test and measurement techniques. New York, NY: Wiley.
Toops, H. A. (1944). The criterion. Educational and Psychological Measurement, 4, 271–297.
Viswesvaran, C., & Ones, D. S. (2000). Perspectives on models of job performance. International Journal of
Selection and Assessment, 8, 216–226.
Warr, P. B. (1987). Work, unemployment and mental health. Oxford, England: Clarendon Press.
Warr, P. B. (1994a). A conceptual framework for the study of work and mental health. Work and Stress, 8,
84–97.
Warr, P. (1994b). Age and employment. In H. Triandis, M. Dunnette, & L. Hough (Eds.), Handbook of industrial
and organizational psychology (2nd ed., Vol. 4, pp. 485–550). Palo Alto, CA: Consulting Psychologists
Press.
Warr, P. (2005). Work, well-being and mental health. In J. Barling, E. K. Kelloway, & M. R. Frone (Eds.).
Handbook of work stress. (pp. 547–574). Thousand Oaks, CA: Sage.
Warr, P. B. (1997). Age, work and mental health. In K. W. Schaie & C. Schoder (Eds.), The impact of work on
older adults (pp. 252–296). New York, NY: Springer.
Warr, P. B. (1999). Well-being in the workplace. In D. Kahneman, E. Diener & N. Schwarz (Eds.), Well-being:
The foundation of hedonic psychology (pp. 392–412). New York, NY: Russell Sage Foundation.
This page intentionally left blank
Part 6
Legal and Ethical Issues in
Employee Selection
P. Richard Jeanneret and Paul R. Sackett,
Section Editors
27 Ethics of Employee Selection
Joel Lefkowitz and Rodney L. Lowman

Each of the chapters in this handbook focuses on determinants of how the organizational human
resource (HR) practice of employee selection can be done well. That is, the contents are aimed at
providing the guidance needed to develop selection and promotion procedures that are accurate,
valid, and useful for organizations. In this chapter we suggest another standard. In addition to doing
selection well, we add a concern for doing it right. Hence, added to the technical and procedural
knowledge and empirical criteria that guide employee selection, this chapter emphasizes the nor-
mative or moral standards associated with notions of the good, right, fair, or just. We suggest that
doing selection well (i.e., technical competence) is inextricably bound up with doing it right. This
approach also opens to reflection the implicit values and moral justification underlying the practice
itself, in addition to considering the manner in which its constituent activities are implemented. In
other words, the ethics of employee selection are as relevant as the ethics in employee selection.

SOME META-ISSUES
THE INEXTRICABLE MIX OF COMPETENCE, ETHICS, JUDGMENT, AND VALUES
In the selection enterprise, industrial-organizational (I-O) psychologists work at the intersection of no
fewer than four domains that ostensibly are conceptually distinct but which have ambiguous, uncer-
tain, and probably overlapping boundaries. We make decisions that reflect simultaneously varying
aspects and degrees of (a) technical competence, (b) ethical considerations, and (c) differences in
professional judgment. Moreover, interpretations and conclusions regarding the substantive matters
at hand also reflect (d) the individual I-O psychologist’s views regarding such questions as “Whose
interests matter?”, “Who is to benefit?”, or “What is the right thing to do?”, as well as other personal
beliefs, attitudes, assumptions, and social values. For example, the choices and decisions made to
estimate population parameters from single-sample or mean validity coefficients involve the gener-
ally unrecognized melding of technical, normative, and values issues in which it may be difficult to
disentangle one’s professional judgment from one’s personal preferences. One sometimes encounters
parameter estimates that are based on national rather than local labor pool measures of predictor
variability in the applicant population or that use low archival estimates of the reliability of supervi-
sor criterion ratings when actual reliability data may be accessible. And perhaps most dramatically,
the economic utility of selection tests based on the prediction of individual-level subjective criteria
(supervisor ratings) may be extrapolated to estimates of organizational-level firm financial perfor-
mance in the absence of data justifying such causal inferences, particularly at that level of analysis.1

1 A recent meta-analysis of the between-organizations effects of “high-performance work practices” (HPWPs) on organizational-
level performance uncovered just 15 studies that investigated the impact of selection (Combs, Liu, Hall, & Ketchen, 2006).
They yielded a mean validity coefficient of only .11 (.14 corrected for measurement error). Moreover, most such studies
have used “postdictive” designs in which a claim that the HPWP has had a causal influence on the organizational outcomes
is not warranted (Wright, Gardner, Moynihan, & Allen, 2005). Perhaps more importantly, we are not aware of any within-
organization studies documenting the effect of selection systems on overall financial performance of the firm.

The point of the illustration is not to denigrate the attempt to better understand the validity
and utility of selection systems but to point out the underlying nature of the estimation procedure
and our critique. They both inextricably entail decisions reflective of not only technical knowl-
edge and experience, but also ethical considerations of appropriateness and professional judgment.
Moreover, all of those actions are shaped in the context of motives that reflect personal, societal,
and/or professional interests and values. Is it coincidental that the effect of each of the choices
and common practices mentioned above is to maximally increase the numeric value of estimated
validity and utility? As noted below, one of the customary “gut checks” for those who consciously
wrestle with ethical dilemmas is to look with suspicion on one’s tendency to opt for a solution that
just happens to be self-serving.
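To make the estimation issue concrete, the following minimal sketch in Python shows the two standard psychometric corrections implicated in the choices just described: disattenuation for criterion unreliability and the Thorndike Case II correction for range restriction. Every numerical value is hypothetical and chosen only to illustrate how far the reported estimate can move with the analyst's choice of reliability figure and variability benchmark; it is not a reconstruction of any particular study.

import math

def correct_for_unreliability(r_xy, r_yy):
    # Disattenuate an observed validity coefficient for unreliability in the
    # criterion measure (classical correction for attenuation).
    return r_xy / math.sqrt(r_yy)

def correct_for_range_restriction(r_xy, u):
    # Thorndike Case II correction for direct range restriction on the
    # predictor; u = SD of the predictor among incumbents divided by its SD
    # in the chosen reference (applicant) population.
    big_u = 1.0 / u
    return (r_xy * big_u) / math.sqrt(1.0 + r_xy ** 2 * (big_u ** 2 - 1.0))

r_obs = 0.25  # hypothetical observed validity in an incumbent sample

# Choice of criterion reliability estimate: a low archival figure vs. a
# locally estimated one (both values illustrative only).
print(round(correct_for_unreliability(r_obs, 0.52), 2))  # ~0.35
print(round(correct_for_unreliability(r_obs, 0.75), 2))  # ~0.29

# Choice of variability benchmark: a national applicant pool implying more
# restriction (u = .60) vs. the local labor pool (u = .80).
print(round(correct_for_range_restriction(r_obs, 0.60), 2))  # ~0.40
print(round(correct_for_range_restriction(r_obs, 0.80), 2))  # ~0.31

Each pairing is defensible on some technical grounds, which is precisely why the selection among them is as much a matter of judgment and values as of psychometrics.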
Those who have given some thought to the matter have identified the values structure of I-O
psychology as representing primarily managerial or corporate interests; historically, even at times
to the extent of having an anti-labor bias (Baritz, 1960; Katzell & Austin, 1992; Lefkowitz, 1990,
2003, 2004, 2005, 2008; Zickar, 2001). The viewpoint that informs this chapter differs in that we
are greatly influenced by three values positions that are at variance with such a unitary perspective.
We will have more to say about them later, but we highlight them briefly here so that the reader
may be clear about our values position and how it may agree with or differ from the reader’s own.
First is the universalist principle in moral philosophy that suggests that no one’s interests warrant
a priori preference over anyone else’s—although there may be factual reasons in specific instances
that justify doing so (Rachels, 1993; Singer, 1995). Second, and commensurate with universalism,
is the normative version of the prominent business ethics model of multiple stakeholder manage-
ment (Freeman, 1984; Freeman & Phillips, 2002), which asserts that it is right and just that powerful
organizations that have enormous impact on society should recognize the legitimacy of the interests
of those affected by it. (The instrumental version of the model holds that an organization’s actual
success is dependent on how well it manages its relationships with all its key stakeholder groups.)
And third, complementing the first two, is the so-called “professional ideal” (Kimball, 1992) or
“professional model” (Hall, 1975), which asserts that the power and authority granted by society to
a profession, such as I-O psychology, entail reciprocal responsibilities of the profession extending
beyond its direct clients to the society at large.
With respect to the practice of employee selection, there are at least ten discernable groups of
people who have a stake, directly or indirectly, in the process and/or its outcomes. They include
qualified job candidates who are recognized as such by the selection system and thus hired; quali-
fied candidates who are misidentified by the system and rejected; unqualified candidates correctly
identified and so not hired; unqualified candidates who are misidentified and hired; coworkers of
the successful candidates, and other employees, whose own work is in some way impacted by them;
their direct supervisors, whose own success may be dependent on the work performance of the new-
hires; higher-level supervisors and managers of superordinate work units whose success also may
be contingent on the performance of the newcomers; the owners or shareholders of the company,
whose investments depend on the overall performance of the organization; the company’s clients or
customers, who purchase the goods or services produced by it; and the local community from which
the job applicants are drawn, which may be affected in various ways by the company’s actions and
success. The nature of their interests or “stake” in the selection system differs for many of them, as
does the extent of its impact on them. But they all potentially have some legitimate claim to have
their interests considered.

AN UNDERAPPRECIATED CONSTITUENCY: THE PARTICIPANTS IN VALIDATION RESEARCH AND SELECTION PROGRAMS
There is one ethically relevant matter that underlies research with human participants in the biologi-
cal, social, and behavioral sciences that is often overlooked, including in selection contexts. It is that
such research, with a few exceptions, is generally not aimed at directly benefiting the people who
participate in it as subjects (Lefkowitz, 2007a,b). This is not to deny that research participants may
ultimately benefit from the application of research findings (contingent on positive study outcomes)
through the development of new drug treatments, more effective teaching strategies, more reward-
ing and satisfying jobs, or by not being placed in an ill-fitting job. But most of the applied research
conducted by I-O psychologists is driven by the intentions of senior organizational policy-makers in
the service of organizational objectives or by the theoretical interests, curiosity, or ambitions of the
researcher. For example, the development and validation of employee selection methods is generally
not aimed explicitly at benefiting members of the validation sample(s). Whether the participants are
current employees or job applicants, the results are applied to subsequent groups of job candidates
to improve organizational effectiveness. Admittedly, however, those selected may benefit indirectly
by having more competent coworkers.
Because research participants are often, in this sense, used by us for testing and validation
research in which they may have no personal interest in the outcome, we are ethically duty-bound
to seriously consider issues such as the voluntary nature of their participation in the study, the
extent of potential subjects’ obligation to participate, obtaining their participation through use of
various inducements or implicit coercion, providing informed consent, examinees’ rights to access
their own test data, and providing feedback. And when a testing program becomes operational,
additional matters arise, including whether to provide an opportunity for retesting rejected appli-
cants and the confidentiality and use of assessment data from incumbents. In the United States,
researchers’ obligations in some of these areas are incorporated in regulations promulgated by
the Federal Office for Human Research Protection (OHRP) of the U.S. Department of Health and
Human Services (OHRP, 1991).2
However, the circumstances under which I-O psychologists conduct employee selection are generally recognized to provide us with somewhat greater ethical latitude. For example,
(a) informed consent for testing is ordinarily not required of educational or employment applicants
because they are deemed to have implicitly given consent by virtue of having applied (American
Educational Research Association, American Psychological Association, National Council on
Measurement in Education Standards for Educational and Psychological Testing, 1999 [hereafter,
APA Standards, 1999], Standard 8.4; American Psychological Association, Ethical Principles of
Psychologists and Code of Conduct, 2002 [hereafter, APA Code], Ethical Standard 9.03[a]); (b)
job applicants may acceptably be asked to waive (or be assumed to have waived) access to their
test results and so not receive any feedback (APA Standards, 1999, Standard 8.9 & 11.6; APA Code
2002, Ethical Standard 9.10); and (c) providing an opportunity for job candidates to be retested is
not obligatory (APA Standards, 1999, Standard 12.10). On the other hand, some ethical require-
ments are viewed as virtually universal, even in the operational employment setting, such as safe-
guarding to the extent feasible the confidentiality of test data and protecting against their misuse
and informing employees or applicants beforehand of any limitations on confidentiality (APA Code,
Ethical Standards 1.01, 3.11, 4.01, 4.02, 4.05, 9.04).
Moreover, we argue there are some very good reasons why we ought not always avail ourselves
of all the legitimate ethical exceptions that have been ceded to the practice of I-O psychology and
should instead behave as if the more stringent ethical research environment pertained. As one of us
has noted previously:

A corollary of that advantage, or right, we enjoy as a consequence of employees’ obligations [to
cooperate with legitimate, non-threatening research] is the duty to see that their obligation is not
abused or experienced as coercive. There is, obviously, an inherent conflict between the principle that
all research participation should be explicitly voluntary and the existence of a relatively open-ended
implicit obligation of workers to participate in legitimate organizational research. Notwithstanding
the implied obligation, adherence to the moral principle of respect for persons requires that we treat
research participation as genuinely voluntary and volitional to avoid even the semblance of coercion.
(Lefkowitz, 2003, p. 336)

2 The regulations pertain to all research with human participants whether supported by government funds or not. And
research is defined as “a systematic investigation, including testing and evaluation, designed to develop or contribute to
generalizable knowledge” (§46.102[d]). Special note should be taken that “contribut[ing] to generalizable knowledge” is
often operationalized as seeking to publish or otherwise make public the findings of the study (such as at a professional
conference). The regulations are available at http://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm#46.102

To the moral arguments regarding respect for the worth and dignity of all people, we add some
pragmatic considerations. To function effectively in an organization, I-O psychologists depend on
the good will and cooperation of organization members, which in turn depend on the psychologist’s
reputation and the general reputation of the profession. Treating people cavalierly by taking their
cooperation for granted is likely to produce adverse effects on future interactions—including those
initiated by other I-O researchers and practitioners in that organization. In other words, it is in our
own self-interest and in the interests of the profession to always treat those with whom we work
honestly and with deference. A good principle to follow stems from the suggestion of the social
psychologist Robert Rosenthal (1994), which is that we ought to think of our potential participants
as a “granting agency” to which we must apply for necessary resources to implement our proposed
investigations. Obviously, most of the selection work we do could not be accomplished without the
input and cooperation of applicants, employees, and/or other subject matter experts (SMEs).

THE UNIVERSALIST, MULTIPLE STAKEHOLDER, PROFESSIONAL PERSPECTIVE


An implicit attribute on which all normative moral theories can be arrayed is the extent to which they
are egoistic or universalist in nature. This meta-issue pertains to whose interests should be consid-
ered in understanding what is the good or right thing to do—only one’s own, or also those of others
(typically, all those affected by the actions contemplated)? Although the meta-theoretical position
of unqualified ethical egoism has not been represented frequently among moral philosophers since
the time of Aristotle, it should be understood at least as a point of departure. For Aristotle, the
ultimate aim of human behavior (i.e., the ultimate good) is one’s own happiness. (In the Greek it is
eudaimonia—generally thought to connote personal fulfillment and actualization as well as simply
feeling happy.) For Aristotle, happiness results from acting in accord with all of the human virtues,
even the altruistic ones such as beneficence and sympathy. So for him there was no contradiction
between self-interest and a broader-based, more altruistic conception of morality. Needless to say,
however, contemporary ethical debacles in the world of business and elsewhere have displayed the
ugly side of an unqualified pursuit of self-interest. Modern philosophers such as Rachels (1993)
have outlined two arguments that seem to repudiate unrestricted ethical egoism as a basis for moral
theory. First, if one accepts that a major objective of the ethical enterprise is to provide moral guid-
ance that serves to reduce conflict and enhance cooperation among members of a society, it is clear
that the unqualified pursuit of self-interest is counterproductive of these aims (Samuelson, 1993).
Second, unrestricted egoism can be classified as one of a family of moral perspectives that makes a
priori distinctions between people and justifies treating them differently on the basis of those puta-
tive differences (as with racism, sexism, anti-Semitism). In this instance, the distinction is simply
between oneself and everyone else. But “We can justify treating people differently only if we can
show that there is some factual difference between them that is relevant to justifying the difference
in treatment” (Rachels, 1993, p. 88). (In this context, the process of negatively stereotyping a minor-
ity group can be understood as an attempt to manufacture such “differences” as justifications war-
ranting prejudicial treatment.) Singer (1995), in complementary fashion, observes “Self-interested
acts must be shown to be compatible with more broadly based ethical principles if they are to be
ethically defensible, for the notion of ethics carries with it the idea of something bigger than the
individual” (p. 10). (And, indeed, in the next section we turn to a discussion of those bigger ideas.)
In other words, in the universalist tradition no one’s interests and welfare, including one’s own, have
a greater a priori moral claim than anyone else’s. Or, expressed affirmatively, the interests and rights
of all those affected by an action are to be considered equal with respect to moral judgments regard-
ing the action, unless good and reasonable arguments to the contrary can be made.
The best-known reflection of moral universalism in the field of business ethics and the social
responsibility of business institutions is the normative version of the multiple stakeholder perspec-
tive (Freeman, 1984; Freeman & Phillips, 2002). Instrumental stakeholder theory is merely descrip-
tive. Normative stakeholder models are prescriptive and stem from recognition of the enormous
power, size, and widespread societal impact of corporations. From those observations it is con-
cluded that they have an obligation to take into account the interests of the many constituencies that
are impacted by their actions and with whom they may be thought of as having implicit social con-
tracts. I-O psychologists are probably ahead of our colleagues in other subfields of psychology who
apparently are only now anticipating the likelihood of finding themselves “increasingly drawn into
situations where a multitude of social and political interests apply across hierarchies of individuals
to whom we owe various degrees of professional duties” (Koocher, 2007, p. 381). But Koocher’s
advice in such situations is well taken. Among other things, he recommended:

By focusing on the welfare and best interests of the most vulnerable party in the chain of individuals
to whom one owes a legitimate professional duty, psychologists optimize the likelihood of a good
outcome. At times, the most vulnerable party may be the public at large. (p. 381)

All of this suggests that in evaluating a selection system we ought to consider not only its effec-
tiveness but, from an ethical perspective, its impact on all of those affected. We always have an
instrumental concern for the extent of productivity improvement our client company or employer
can expect from cohorts of job applicants hired on the basis of validated predictor measures. That
is, in the language of selection classification, we anticipate significantly increasing the proportion
of those selected who are successful on the job (“true positives”) in relation to those hired who are
unsuccessful (“false positives”). But we should also be concerned about the proportion of incorrectly
rejected applicants who have been denied employment (“false negatives”) because of the imperfect
validity of those predictors. In other words, enhancing the interests of the organization adversely
and arguably unfairly impacts a substantial proportion of the applicant population. In addition, cus-
tomary attempts to further increase productivity improvement by means of more restrictive hiring
(decreasing the selection ratio) can generally be expected to exacerbate the harm by increasing the
proportion of false negatives among those rejected—to a greater extent than the decrease in the pro-
portion of false positives. The structure of this situation is that of a classic ethical dilemma—actions
that benefit some directly hurt innocent others—yet to our knowledge it has never been addressed
seriously in the literature of I-O psychology. We surmise that the reason is that I-O psychology, at
least in the context of employee selection, tends to view the organization itself as the primary (or
only) relevant stakeholder. The interests of applicants, especially rejected applicants who are not
and will not be members of the organization, are generally not considered.
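Because the trade-off just described is ultimately arithmetic, a small simulation can make it visible. The sketch below, in Python, models predictor and criterion scores as bivariate normal; the validity, base rate, and selection ratios are hypothetical values chosen only for illustration, and each "proportion" is counted as a share of the full applicant pool. Under assumptions like these, tightening the selection ratio from .50 to .20 raises the false-negative share (roughly .18 to .35) by more than it lowers the false-positive share (roughly .18 to .05).

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
validity = 0.40   # hypothetical predictor-criterion correlation
base_rate = 0.50  # hypothetical share of applicants who would succeed if hired

# Simulate predictor (x) and criterion (y) scores for a large applicant pool.
x, y = rng.multivariate_normal([0.0, 0.0],
                               [[1.0, validity], [validity, 1.0]],
                               size=n).T
would_succeed = y > np.quantile(y, 1.0 - base_rate)

for selection_ratio in (0.50, 0.20):
    hired = x > np.quantile(x, 1.0 - selection_ratio)
    false_negative_share = np.mean(~hired & would_succeed)  # qualified but rejected
    false_positive_share = np.mean(hired & ~would_succeed)  # unqualified but hired
    print(selection_ratio,
          round(float(false_negative_share), 2),
          round(float(false_positive_share), 2))

The particular figures matter less than the structural point: more restrictive hiring shifts classification errors from the organization's side of the ledger (false positives) onto rejected applicants (false negatives).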
But that is an inappropriately narrow view for professionals to hold. Professions are character-
ized by attributes that distinguish them from other occupations (Haber, 1991), and among the more
salient of those attributes is a sense of responsibility and obligation that extends beyond the paying
client to segments of the broader society. This is generally thought to constitute a quid pro quo for
the considerable amount of power, influence, and respect afforded by society to professions and their
members. This broader perspective has been referred to as a “true professional ideal” (Kimball,
1992, p. 303) or “the professional model” (Hall, 1975, p. 72). In sum, the universalist tradition in
moral philosophy; the multiple stakeholder approach from the study of management, business, and
society; and the professional model from the sociological study of occupations all coalesce around
the notion that ethical evaluations of our selection programs require that their impact on all of those
affected by them be considered. This could mean assuring that rejected candidates are afforded an
opportunity for retesting and perhaps even, when circumstances and budgets allow, utilizing rela-
tively low cut-scores and relying on probationary employment as a selection device.

ETHICAL PRINCIPLES AND DILEMMAS


How does one know when he or she is faced with an ethical dilemma, as opposed to a mere
technical, procedural, administrative, or professional problem? (They are not mutually exclusive.
Ethical dilemmas may occur in any of those realms.) The study of moral thought has yielded three
succinct criteria by which to answer the question (Wittmer, 2001). The problem will involve (a) the
expression of fundamental moral or ethical principles like those discussed below and articulated in
formal ethical codes such as that of the APA (2002), and the individual will be faced with (b) having
to make a decision that (c) has significant impact on others.

ETHICAL PRINCIPLES
Although space constraints preclude delving into the origins of the ethical principles presented here,
it should be noted that they emerge from a long history of moral philosophy and more recent work
in moral psychology (see Lefkowitz, 2003, for a review). They are reflected in various normative
ethical theories that are generally either deontological or consequentialist in nature. Deontological
theories are concerned with right and wrong per se; they hold that the rightness or wrongness of
an action is intrinsic to the nature of the act on the basis of whether it violates a moral principle.
Deontologists are concerned with principled expressions of rights, duties, responsibilities, virtue,
fairness, and justice. Some deontological principles are expressed positively in terms of affirmative
duties (e.g., treat job applicants with respect; protect the confidentiality of test data), but many are
expressed negatively in terms of actions that are disallowed versus what it is permissible to do (e.g.,
do not exaggerate or make “hyperclaims” to organization decision-makers about the likely benefits
of your proposed selection system). If one of us fails to protect the confidentiality of employee test
data, the deontologist will view that person as having wronged those individuals even if there is no
evidence of their actually having been harmed. Note, however, that harm may have been caused to
the reputation of the profession.
Consequentialists, on the other hand, define right and wrong in terms of the good and bad (or
benefits and harms) that will result from an action. The relative morality of alternative actions is
directly reflected in the net amounts of goodness that can be expected to result from each. The
option that leads to the greatest amount of net good or the least amount of net harm (considering
all those to be impacted) is the morally imperative choice, not merely a permissible one, as with
deontological approaches. Neither system of thought is free from legitimate criticism by proponents
of the other perspective, and some situations seem more amenable to analysis by one rather than the
other of the two strategies, so that the prudent professional should be familiar with both approaches
to ethical analysis and decision-making.

Respect
People have the right to be treated with dignity and respect and allowed to exercise their rights to
privacy, confidentiality, freedom, autonomy, and self-expression. These rights are universalizable
(i.e., applicable as much to anyone else as to oneself) and bounded by reciprocal obligations. For
example, our right to pursue our research objectives should not supersede an employee’s right to
not participate in the study. With regard to the principle of respect, psychologists are obligated in
particular to be aware of and to eliminate the effects of prejudice and bias related to “age, gender,
gender identity, race, ethnicity, culture, national origin, religion, sexual orientation, disability, lan-
guage, and socioeconomic status” (APA Code, 2002, Principle E).

Fairness and Justice


The notion of justice can be among the more nebulous ethical principles to conceptualize and imple-
ment. For example, it can be defined deontologically as each person having a fair balance of rights
and duties, or in consequentialist fashion as each receiving a fair proportion of the available benefits
and burdens associated with membership in a social system (e.g., organization or nation). However,
alternative criteria of fairness represent social and moral values positions and so are influenced
greatly by macro-level political and economic systems (e.g., the marked preference in the American
free-enterprise system for the distributive justice criterion of equity, or “merit” as opposed to equal-
ity or need).
On the face of it, equality appears to be a straightforward and uncomplicated standard of fairness.
If a metaphorical apple pie represents the sum benefits and rewards provided by a social system such
as a work organization or an entire society, and there are ten members of the system, it would seem a
simple matter to cut ten equal pieces. But even this simple criterion may be difficult to implement if
there are procedural and/or technological difficulties in dividing the pie into exactly equal portions.
For example (a) some resources may be easily divided, such as money, others less so (e.g., there may
be fewer promotions available than the number of eligible and deserving employees); (b) it may be
difficult to equate portions if the benefits consist of multiple nonfungible rewards such as an apple
pie and a steak; moreover (c), because equality does not consider likely differences in people’s pref-
erences for different benefits, numerical equality may not correspond with perceived equality.
Now if we add the further need to specify members’ inputs or contributions to the social
system—as we must to achieve the criterion of equity—things get really complicated. Equity has
to do with the equivalence of people’s ratio of inputs to outcomes or as it is frequently expressed,
“benefits and burdens” or “earned rewards,” and this is the normative hallmark of fairness in the
United States. All of the potential administrative and measurement problems of achieving equality
are multiplied by having also to consider people’s contributions to the system and their relation-
ships to rewards, in an attempt to reach equity! Suppose that some of our ten members each spent
a couple of hours in the kitchen baking that apple pie and grilling that steak, whereas some spent
the past year tending to the orchard that produced the apples and raising the cow that yielded the
steak, whereas a couple of people are merely members of the family that owns the farm. How are
their contributions to be assessed? And we have not yet considered the criterion of need. What if all
of the above “facts” are true, but in addition we know that some of the ten members have been and
continue to be well-fed, some have not eaten since yesterday, and some are near-starvation? Why
should we not recognize need as the relevant criterion of fairness? And if we were to do so, would it
matter why some of them are so hungry? Lastly, the importance of the criterion chosen is suggested
by the observation that there is no reason to suppose any substantial correlation among the three—
meaning that the rank ordering of people in terms of benefits received will likely be different under
equality, equity, and need.

Caring: Beneficence
The origins of this principle are in consequentialist theory and the “ethics of care” in moral psychol-
ogy. It is reflected in the traditional service ideal of the professions: “Psychologists are committed
to increasing scientific and professional knowledge of behavior … and to the use of such knowledge
to improve the condition of individuals, organizations, and society” (APA Code, 2002, p. 1062), and
“providers of I-O psychological services are guided primarily by the principle of promoting human
welfare” (APA, 1981, p. 668). Although this principle is generally interpreted within the context of
the universalist meta-principle that the interests of all those concerned are to be considered equal, it
is also generally recognized that most of us justifiably care more for some people than others,
and/or there may be some to whom we owe a special obligation or duty (e.g., family, friends, neigh-
bors, colleagues, clients, employer). Therefore, it is usually not viewed as unethical per se to act on
those special concerns and obligations. However, there may be occasions when such actions slide
impermissibly far down the slippery slope of favoritism, prejudice, bias, or nepotism.

Caring: Nonmaleficence
This principle—refraining from unjustifiably harming others—is the moral principle about
which there seems to be the greatest consensus among ethicists. It is not in all respects simply the
opposite of beneficence. In contrast to the equivocation regarding the extensiveness of the domain
of beneficence as a function of our social identity and role relationships, the obligation to not cause
unjustifiable harm is generally thought to apply equally to all others, even strangers. It is espe-
cially pertinent with regard to those who are in potentially vulnerable positions (e.g., employees,
students, job candidates, research participants). The primacy of nonmaleficence is indicated in the
APA Code (2002):

When conflicts occur among psychologists’ obligations or concerns, they attempt to resolve these
conflicts in a responsible fashion that avoids or minimizes harm … and [they] guard against personal,
financial, social, organizational, or political factors that might lead to misuse of their influence.
(p. 1062)

Moral Character
Many ethical treatises and professional codes of conduct include discussions of personal attributes
having to do with the character of the person potentially faced with a moral dilemma rather than
on the process of his or her ethical decision-making. For example, the APA’s (2002) ethical code
is explicit about the importance of fidelity and responsibility (to those with whom we work, to
our communities, and to society) and of integrity (accuracy, honesty, truthfulness, and promise-
keeping) (Principles B and C, respectively, p. 1062).
In recent years technological advances (e.g., web-based job applications and employment test-
ing), changes in the nature and conditions of work (e.g., home-based work, increased use of teams),
and other dramatic changes such as the globalization of organizations have impacted the way in
which ethical problems in employee selection (and other professional domains) are manifested.
Notwithstanding those changes in the manifest circumstances of contemporary work life, the
importance of the five sets of fundamental moral principles noted above is reflected in the following
observation:

The paradigmatic forms taken by those problems, the character traits and motives needed to recognize
them as such, the ethical reasoning used to address them, as well as the substance of the ethical
principles on which such reasoning is based are all essentially unaffected and still pertain. (Lefkowitz,
2006, p. 245)

We turn now to a consideration of those paradigmatic forms.

FORMS OF ETHICAL DILEMMAS


Hoffman’s (1988) theory of moral development included three ideal types of moral dilemma from
which the internalized sense of morality develops. Lefkowitz (2003, 2006) has elaborated and
extended those to four forms or paradigms of ethical challenges that seem to represent a comprehen-
sive taxonomy (with the understanding that there may be combinations of two or more of them).

Paradigm I. Preventing Harm: Possessing Foreknowledge of Someone to Be Harmed or Wronged
HR managers and organizational consultants frequently are privy to impending company policy
decisions or personnel actions that may entail some harms or wrongdoing. For example, a manager
may intend to promote someone generally known to be less qualified than other candidates. A
senior HR administrator may be intent on implementing the latest selection test fad that you know is
likely to have adverse impact on minority applicants and has virtually no credible validity evidence.
Failing to act to prevent an impending harm or wrong may sometimes be motivated primarily by a
sense of organizational loyalty rather than by self-serving objectives; but revealing, challenging, or
resisting a contemplated action by a superior might also entail some personal risk, hence exacerbat-
ing the dilemma.

Suppose you are an internal consultant in the management development section of the HR depart-
ment of a large corporation and you are beginning to train a group of high-level managers to serve as
assessors for a planned assessment center. When leading an evaluation discussion following a mock
exercise engaged in by actors playing hypothetical promotion candidates, a senior executive—the
apparent informal leader of the group—makes a demeaning sexist remark about the one female
candidate being discussed, and all of the other managers laugh appreciatively. Responding appro-
priately and ethically may require an abundance of courage and social skill.

Paradigm II. Temptation: Contemplating a Self-Serving Action That Would Be Unjust, Deceitful, or Potentially Cause Harm to Another
Recent notorious examples of this sort of unethical action in the corporate world are well known.
Other examples may be less extreme instances of acquiescing to inappropriate peer (or superior)
expectations to “get along.” Of particular relevance to organizational life are instances in which
one’s potential unethical actions serve the explicit or implicit policies, directives, or aims of the
organization, rather than one’s own personal interests. Even so, given the prevalence of employees’
psychological identification with the organization, formal performance-based reward systems, and
the informal recognition to be gained by accomplishing company goals and objectives, such (mis)
behavior nevertheless might also be readily construed as self-serving.

Paradigm III. Role Conflict: Having Competing Obligations or Responsibilities to Two or More Persons or Other Entities Such That Fulfilling One Might Mean Risking the Other(s)
This type of dilemma is almost inevitable, given the universalist, multiple-stakeholder, professional
perspective that acknowledges responsibility to several (perhaps conflicting) constituencies. Role conflict
is especially salient for employees who are in internal boundary-spanning positions with responsibili-
ties to multiple departments. It is also pertinent for those who operate at the external boundaries of the
organization, such as salespersons and purchasing agents who may have considerable loyalty to long-
standing customers, clients, or suppliers as well as to their employer, or such as professionals who
acknowledge their responsibilities to society and the common good as well as to the organization.
Consultants who are afforded the opportunity to work with multiple (i.e., competing) firms in
the same industry should also be familiar with this form of potential dilemma. Relevant matters to
be considered include the consultant’s general representations regarding knowledge gained from
working with previous clients/competitors; each party’s understanding of the expectations of client
#1 with respect to the consultant’s prospective work with competitors; what useful information,
whether proprietary or not, was garnered from working with client #1 that might be useful in
working with client #2 and by extension improve their competitive position; client #2’s expectations
regarding accessibility of the consultant’s cumulative knowledge of the policies of other firms in the
industry; etc. For example, suppose a portion of the criterion-related selection test validation project
paid for by client #1 consisted of the time-consuming development of a complex criterion measure
based on an empirically derived composite of job performance indicators. Such sophisticated job
knowledge, if shared, could be a persuasive part of the consultant’s “sales pitch” for conducting a
selection study for client #2. Is that appropriate? These are all matters best discussed with client #1
before beginning that project.

Paradigm IV. Values Conflict: Facing Conflicting and Relatively Equally Important
Personal Values So That Expressing One Entails Denying the Other(s)
At a somewhat macro-level, this is the battlefield on which conflicts play out between the objectives
of shareholder value and corporate financial performance (CFP) on one side versus the putative
societal obligations of business reflected in the corporation’s “social performance” (CSP; Lefkowitz,
2007c). At the level of specific HR systems such as selection, it gets reflected in the attempt to bal-
ance ostensibly competing objectives such as increasing economic utility and decreasing adverse
impact on minorities (De Corte, Lievens, & Sackett, 2007). It is on a note of optimism that we point
out that the most recent accumulations of evidence suggest that CFP and CSP may be entirely com-
patible or even complementary (Guenster, Derwall, Bauer, & Koedijk, 2005; Orlitzky, Schmidt, &
Rynes, 2003).
None of the four paradigms explicitly mention matters of technical competence. Is competence
an ethical issue? In fact, the APA Code (2002) contained six enforceable standards in the section
headed “Competence”; for example, “Psychologists’ work is based upon established scientific and
professional knowledge of the discipline” (Standard 2.04), and “Psychologists undertake ongoing
efforts to develop and maintain their competence” (Standard 2.03). Suppose an I-O psychologist
conducts a well-done validation study for a client and reports a statistically significant validity coef-
ficient for a set of predictors but fails to report several other nonsignificant coefficients with respect
to other relevant criteria investigated. If the psychologist is ignorant of the stricture against exag-
gerating validation findings by capitalizing on chance relationships, he/she is not competent. Now,
what if the psychologist was not ignorant of these psychometric matters, but had struggled with
temptation and ultimately, and with some misgivings, omitted the negative findings out of concern
for disappointing the client? Or worse still, in a third variant, suppose he/she freely and deceitfully
chose to distort the nature of the findings to justify having guaranteed favorable results in advance?
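A back-of-the-envelope calculation helps show why reporting only the significant coefficient amounts to capitalizing on chance. The sketch below is a simplification that assumes independent significance tests at alpha = .05 and zero true validity throughout; validity coefficients computed on the same sample are of course not independent, but the point about how quickly spurious "findings" accumulate still holds qualitatively.

def chance_of_spurious_validity(n_criteria, alpha=0.05):
    # Probability that at least one of n_criteria independent significance
    # tests comes out "significant" when every true validity is zero.
    return 1.0 - (1.0 - alpha) ** n_criteria

for k in (1, 5, 10):
    print(k, round(chance_of_spurious_validity(k), 2))  # ~0.05, ~0.23, ~0.40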
Ethical analyses invariably include a focus on the “bottom line” of an action taken or actions
contemplated (i.e., the consequences of the act(s) on all those affected). Each of these three sce-
narios represents an ethical transgression because of the potential harm to be caused the client and
job applicants by using an ineffective selection system and ultimate harm to the reputation of the
profession when the ineffectiveness becomes apparent. However, the motives of the psychologist
are rather different in each scenario, and in ethical analyses it is also true that motives matter. So,
in what way do they matter? In each of the scenarios the psychologist is portrayed as increasingly
venal: from merely inexcusably ignorant and failing to live up to one’s professional obligations,
to defensively self-protective and disrespectful of a client’s rights, to premeditatedly self-serving
and deceitful. Two observations are warranted. First, these different motives—implying different
moral characters—make little or no difference in terms of the consequences of the incomplete data
reporting. That is why “mere” incompetence is an ethical matter. Second, it is likely that the reader
feels somewhat differently about each of our three hypothetical transgressors—perhaps feels that
their degree of venality is related directly to their degree of culpability. That may, depending on
circumstances, appropriately lead to differences in what we view as the suitable degree of penalty
or opprobrium for each of our three transgressors for the “same” offense.

ROLE OF ETHICAL CODES IN PROFESSIONAL PRACTICE: HISTORICAL AND CURRENT PERSPECTIVES
Whether licensed or not, professional psychologists are expected to follow the ethics code of the
APA (APA Code, 2002). A brief history of professional ethics in psychology reveals the absence of
a code for the first 50 years of the APA (Pope & Vetter, 1992), its initial empirical start based on
critical incidents, and its evolution over the last 60 years. Over the last decade, greater attention has
been paid to I-O issues so that the current code even applies to selection work.
Professional codes of conduct typically derive from the practice of a profession; behaviors that
arouse concerns about appropriate and inappropriate behavior work their way into a code of conduct
over time. They also often inductively work themselves backward to a philosophical basis rather
than starting that way. For example, consider the famous Hippocratic Oath for medicine, one trans-
lation (Edelstein, 1967) of which, thought to date back to the 5th century BC, is as follows:

I swear … that I will fulfill according to my ability and judgment this oath and this covenant:
I will apply dietetic measures for the benefit of the sick according to my ability and judgment; I will
keep them from harm and injustice.
I will neither give a deadly drug to anybody who asked for it, nor will I make a suggestion to this
effect.
I will not use the knife … but will withdraw in favor of such men as are engaged in this work.
Whatever houses I may visit, I will come for the benefit of the sick, remaining free of all intentional
injustice, of all mischief . . . .
What I may see or hear in the course of the treatment or even outside of the treatment in regard to
the life of men, which on no account one must spread abroad, I will keep to myself, holding such things
shameful to be spoken about . . . . (p. 6)

Note that the Hippocratic Oath is not particularly abstractly philosophical. It does not emphasize
moral principles underlying the ethical admonitions despite having been created in a golden era of
philosophical and ethical considerations. It imposed on those taking the oath very specific obliga-
tions to behave in a certain way and not to behave in other ways. Some of its tenets are readily
interpretable in terms of modern professional ethical standards and others would be irrelevant or
considered inappropriate in today’s world. Another aspect of the oath that is relevant to contempo-
rary I-O psychologists is, despite its pragmatic orientation, its explicit recognition of the broader
professional and moral context within which the specific obligations are taken on. The doctor is
not only to be technically competent (providing good dietary recommendations, not performing
surgery), but is to be pure, prevent harm and injustice, and protect confidentiality.
Some professions favor a narrow and explicit approach to ethical practice—i.e., if it is not explic-
itly prohibited (e.g., by an ethics code) then it is not unethical. (An extension of this approach is
the view that any action that is not clearly illegal is morally permissible.) Others see the need for
a broader more proactive approach in which moral principles and the values to which they give
rise deserve expression, even when no specific ethical “violation” is identified. In its enforceable
Standards, the APA Code (2002) is an example of the former pragmatic approach; in its General
Principles, it exemplifies the latter. It bears reminding that for more than the first half century of
its existence, the APA had no ethics code at all. This case of apparent arrested development reflects
the historical growth of a field that for the early part of its existence was not as concerned with the
practice of psychology as with the establishment of the field as a science. Only with the burgeoning
growth of psychological practice around the time of World War II did the need for a formal code of
ethics for psychology become more intensely apparent.
The initial code of ethics for psychologists emerged from an empirical rather than an a priori
theoretical or philosophical base (cf. Pope & Vetter, 1992). Members of the association were polled
about incidents they had encountered that raised ethical issues, and from those data, an initial code
of ethics was written. The field of I-O psychology is relatively new as an applied area of training and
practice (cf. Lowman, Kantor, & Perloff, 2006). As a result, until the 2002 revision, the code did not include much to suggest that it was written with applied practice in I-O psychology in
mind. A partial exception was the area of testing, a domain that is included in many types of applied
practice, and so has had a long-time focus in the various editions of the code. However, more atten-
tion is paid in the code to issues associated with test construction and with applications in individual
assessment contexts than explicitly to the mass testing often associated with employee selection.
Nevertheless, the 2002 code did take modest steps to address how the ethics principles and standards
applied to I-O and organizational consulting psychology. The code added several references to
consulting and psychological work in organizations and includes “organizational clients” in most
categories of service. For example, ethics Standard 3.11 explicitly concerns psychological services
delivered to or through organizations. It clarifies the issues involved in working with individuals
versus organizations and the responsibilities of psychologists to the individual organization mem-
bers with whom they work when they are not themselves the defined client.

3.11 Psychological Services Delivered To or Through Organizations


(a) Psychologists delivering services to or through organizations provide information beforehand to
clients and when appropriate those directly affected by the services about (1) the nature and objectives
of the services, (2) the intended recipients, (3) which of the individuals are clients, (4) the relationship
the psychologist will have with each person and the organization, (5) the probable uses of services
provided and information obtained, (6) who will have access to the information, and (7) limits of
confidentiality. As soon as feasible, they provide information about the results and conclusions of such
services to appropriate persons.
(b) If psychologists will be precluded by law or by organizational roles from providing such information
to particular individuals or groups, they so inform those individuals or groups at the outset of the
service. (APA Code, 2002)

The entire ninth standard, which is on assessment, has direct applicability to most employee
selection psychology. It encompasses individual assessments and those done in the context of groups
such as with applicant selection, and it indicates that consent may ethically be implied.

9.03 Informed Consent in Assessments


(a) Psychologists obtain informed consent for assessments, evaluations, or diagnostic services,
as described in Standard 3.10, Informed Consent, except when … (2) informed consent is implied
because testing is conducted as a routine educational, institutional, or organizational activity (e.g., when
participants voluntarily agree to assessment when applying for a job); or (3) one purpose of the testing
is to evaluate decisional capacity. (APA Code, 2002)

This section of the code also identifies the requirements for test construction, issues related to out-
dated assessments, and issues related to feedback on the results of tests. It also deals with release of
information about tests, obligations of psychologists concerning test security, and situations involv-
ing obsolete tests.
However, it can be argued that the code says very little per se about the common situation in
which psychologists who administer large testing programs in industry or government work for
non-psychologists, and the decisions about testing programs are made by persons with little psycho-
logical training. A section of the code does cover such situations generically.

1.02 Conflicts Between Ethics and Law, Regulations, or Other Governing Legal Authority
If psychologists’ ethical responsibilities conflict with law, regulations, or other governing legal authority,
psychologists make known their commitment to the Ethics Code and take steps to resolve the conflict.
If the conflict is unresolvable via such means, psychologists may adhere to the requirements of the law,
regulations, or other governing legal authority. (APA Code, 2002)

There is also the following ethics standard which imposes an ethical obligation to take appropri-
ate action in response to misuse of one’s work:

1.01 Misuse of Psychologists’ Work


If psychologists learn of misuse or misrepresentation of their work, they take reasonable steps to correct
or minimize the misuse or misrepresentation. (APA Code, 2002)

SOME SPECIFIC ISSUES AND SOURCES OF ETHICAL PROBLEMS3


In this section we present several specific illustrative ethical issues in the practice of employee
selection and indicate the sections of the APA Code that provide some guidance.

3 The authors are grateful for input on this section from Robert Hogan, Joel Moses, George C. Thornton III, John G. Veres
III, and Michael J. Zickar.

GENERIC ETHICAL ISSUES IN SELECTION


The Basics: Issues of Validity
As the reader is likely to be well aware, the overwhelmingly most important matter in selection—
from technical, professional, and ethical perspectives—is the appropriate justification of the person-
nel actions taken, that is, the validity of the measures on which those decisions to hire or promote
people (or to decline to do so) are based. Validity is inherently an ethical issue because it reflects
the relative accuracy of selection decisions by which some are selected/hired and some are rejected;
the absence of validity can result in serious harm to both applicants and employers. (Many of the
ethical issues that seem associated with particular selection procedures represent manifestations of
this generic issue.) That validity is a fundamental ethical requirement is suggested, among other
Standards, by the following:

2.04 Bases for Scientific and Professional Judgments


Psychologists’ work is based upon established scientific and professional knowledge of the discipline.
(APA Code, 2002)

Professional Competence
As noted earlier, competence in the conduct of one’s profession is an important ethical issue for
many reasons (cf. Ethical Standards 2.01–2.06, APA Code, 2002). The issue of competence of
course overlaps that of validity but it also requires that psychologists base their practice on mastery
of the relevant technical knowledge base associated with their area of psychology; the consequences
of inappropriate practice by psychologists in this area can be considerable, as indicated by the following standard:

2.01 Boundaries of Competence


(a) Psychologists provide services, teach and conduct research with populations and in areas only
within the boundaries of their competence, based on their education, training, supervised experience,
consultation, study, or professional experience. (APA Code, 2002)

Test Security
Psychologists are mandated by Ethical Standard 9.11 to maintain test security, viz.:

9.11 Maintaining Test Security


The term test materials refers to manuals, instruments, protocols, and test questions or stimuli and does
not include test data as defined in Standard 9.04, Release of Test Data. Psychologists make reasonable
efforts to maintain the integrity and security of test materials and other assessment techniques consistent
with law and contractual obligations, and in a manner that permits adherence to this Ethics Code. (APA
Code, 2002)

In the case of a psychologist administering intelligence tests in the context of a private practice
or school system, the issues of maintaining test security may be straightforward. However, in
today’s employee selection context, tests may be administered to hundreds of thousands of appli-
cants, tests may be administered electronically with no oversight of the test-taking circumstances,
and a team of psychologists and nonpsychologists may help to create and validate a test with little
direct control by a psychologist of the security of the process. Although the psychologist’s ethi-
cal mandate to protect test security is conceptually clear, the practical realities of corporate and
government testing contexts are often far more complicated and difficult than the code may have
contemplated.

Multiple Responsibilities: Who Is the Client?


As noted earlier, an important hallmark of a true profession is the recognition by its practitioners
of responsibilities that extend beyond the paying client. In the case of employee selection in orga-
nizations, those responsibilities extend in two directions: within the organization to individual job
applicants and promotional candidates and beyond the organization to the community that depends
on the continued success of the organization and that is impacted by its actions.

SOME ISSUES RELATING TO PARTICULAR SELECTION METHODS


Individual-Level Assessments
There are many ethical issues associated particularly with selection or evaluation at the individual
level (see Jeanneret, 1998, for a review). Although standards of practice are well defined at the level
of applying individual tests to clinical practice (e.g., assessing parents and children in the context
of fitness-for-parenting in divorce proceedings or prisoners’ fitness to stand trial), the literature on
individual assessments in selection contexts is far less developed. Issues of validity for individual
instruments such as the selection interview (Fletcher, 1992) and, particularly, the proper metric for
combining across domains of testing, such as in the domains of occupational interests, abilities, and
personality characteristics (cf. Lowman, 1991), suggest that there is much work still to be done for
valid conclusions to be drawn reliably. The use of multiple types of psychological assessment data
and the translation of such data into predictions that have psychological validity entail at least three
very significant issues: (a) whether all of the data can be quantified, and, if so, the relative weights to
be given each of the sources of information in arriving at a composite evaluation or prediction; (b)
if not, how to meaningfully integrate qualitative and quantitative information about the candidates;
and (c) what role the specific organizational context should play in any recommendations based on
the assessments (e.g., factoring in what is known about the supervisor of the targeted position and
the culture and expectations of the organization).
Additional ethical issues particularly relevant to the process of individual assessment include
maintaining confidentiality, recognizing that the assessee and the client organization are both cli-
ents of the assessor, the qualifications and proper training of those administering and interpreting
examinations, assuring that the client organization understands the limitations of the assessment
process, and providing adequate feedback to the candidates (Jeanneret, 1998; Prien, Schippmann, &
Prien, 2003).

Assessment Centers
Assessment centers (ACs) seem to have attributes that attract ethical challenges. Their notable
early and well-publicized success has led to a faddish proliferation beyond the resources of those
actually trained and skilled in their development, implementation, and administration. (Refer to
Chapter 37, this volume, for a description of the seminal AT&T Management Progress Study.)
For example, Caldwell, Thornton III, and Gruys (2003) have itemized ten “classic errors” in this
area of practice, most of which have ethical implications (poor planning, shoddy exercise devel-
opment, no pretesting of exercises, using unqualified assessors, etc.). Colloquially, experienced
AC practitioners also have observed problems such as promotion by consultants of the utility of
their AC entirely on the basis of the general research literature, which has little to do with the
consultants’ specific proposed procedures; use of unprofessional assessors who receive virtually
no training in behavioral observation, rating, and evaluation and who may be unfamiliar with the
particular exercises used; subsequent pressure on AC staff or consultants to use the data from an
earlier developmental AC for personnel actions (e.g., retention decisions during a reduction-in-
force, sometimes exacerbated by how old the data are); widely disseminating individual assessment
data in the organization for various unintended purposes; and using generic exercises that have
little or no demonstrable relationship to the target jobs. In addition, various administrative gaffes
have been noted, such as failing to maintain the security of measures for days 2 and 3 of a multi-day
AC; allowing nonassessors (e.g., senior managers) to influence assessment evaluations; and fail-
ing to provide appropriate feedback to candidates or providing inappropriate feedback such as
implying organizational actions to be taken (“you clearly have the talent to be promoted soon”),
etc. All of these matters (and others, such as “rights of the participant”) are discussed carefully
in Guidelines and Ethical Considerations for Assessment Center Operations (International Task
Force on Assessment Center Guidelines, 2000).

Computer- and Web-Based Testing4


The administrative and financial efficiencies of computerized and web-based job application and
selection testing procedures are often considerable. It is not surprising that the practice is grow-
ing. However, as is often the case with new technologies or procedures, the incidence of usage has
probably outstripped careful consideration of potential problems in implementation. We see three
broad sets of problems to be considered. The first is largely pragmatic and has to do with adminis-
trative and technical problems associated with a computerized delivery system (e.g., provision of an
adequate number of computer consoles). The second set of problems has more professional and ethi-
cal overtones—having to do with the equivalence of test results obtained by traditional means of
testing with the results of the same tests administered via computer (Potosky & Bobko, 2004). That
is, to what extent is empirical validity evidence from traditional test administrations to be taken as
wholly applicable to web-based administration? It is at least possible, if not likely, that degrees of
equivalence will vary as a function of the domain tested (e.g., cognitive ability vs. personality attri-
butes), type of test (e.g., timed vs. untimed), response format (e.g., short-answer vs. open-ended),
examinee attributes (e.g., facility with computers, degree of self-efficacy, etc.), and other factors.
Psychometricians may be sanguine that the degree of correlation between paper-and-pencil and
computer-delivered test administrations is high enough to conclude that the same measurement
objectives are being met (r ~ .6 – .8). From a fairness perspective, however, the same pass/fail cut
score on the two forms of administration will include and exclude some different examinees, as may
rank-ordered selection. The third set of problems is associated with web-based assessment, inde-
pendent of the equivalence issue. These include concern for test security, the possibility of cheating
when the testing is unproctored, differential access to the Internet for different groups of potential
applicants, etc. (cf. Tippins et al., 2006).
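To make the equivalence and fairness point concrete, the following is a minimal simulation sketch, not drawn from any study cited here. The assumed cross-mode correlation of .70, the standard-normal score metric, and the cut score at z = 0 are illustrative values only; the sketch simply shows how two administration modes that correlate substantially can still classify some examinees differently when the same cut score is applied to each.

```python
# Minimal simulation sketch (illustrative only): even when scores from two
# administration modes correlate substantially, applying the same cut score
# to each mode classifies some examinees differently.
# Assumptions (not from the chapter): standard-normal scores, r = .70, cut at z = 0.

import math
import random

random.seed(1)

R = 0.70          # assumed cross-mode correlation
CUT = 0.0         # assumed common pass/fail cut score (z-score metric)
N = 100_000       # simulated examinees

disagree = 0
for _ in range(N):
    paper = random.gauss(0.0, 1.0)
    # computer-mode score constructed to correlate with the paper-mode score at R
    computer = R * paper + math.sqrt(1.0 - R * R) * random.gauss(0.0, 1.0)
    if (paper >= CUT) != (computer >= CUT):
        disagree += 1

print(f"Examinees classified differently by the two modes: {disagree / N:.1%}")
```

Under these assumed values, roughly a quarter of the simulated examinees pass under one mode and fail under the other, which is the fairness concern noted above.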

SOME ISSUES RELATING TO SITUATIONAL OR CONTEXTUAL ORGANIZATIONAL ISSUES


Selection in the Context of a Unionized Organization
There seem to us to be at least four important matters to be considered:

1. The potential difficulties one might encounter in this regard in implementing a selection
study are likely to be as much or more influenced by the history of union/management rela-
tions in the organization as by the attributes of the proposed program, so that one should
become familiar with that history.
2. The parameters of the project may be set or limited by terms of the collective bargaining
agreement, so one also needs to be knowledgeable about that.
3. Even if it were legal (by virtue of recognized management prerogative in the union con-
tract) to implement a selection program for union members without obtaining prior union
agreement, the prudent I-O psychologist (i.e., one who would like the project to succeed)
would be well-advised to proceed as if such approval were necessary—and to not proceed

4 See Chapter 8, this volume, concerning technology and employee selection for a more comprehensive presentation.
until some acceptable arrangement was achieved, preferably even actively involving the
union in the project from its outset.
4. Because the topic of unions tends to be a volatile one on which people hold strong opin-
ions, and because most I-O psychologists tend to view themselves as representatives of
management, it is advisable to consider the extent to which one’s personal values in that
regard might conflict with a more universalistic set of professional values including respect
for all relevant stakeholders, including union members.

Ethical Issues Regarding Setting Cut Scores5


The purpose of a cut (or passing) score is to segment examinees into two groups: one deemed “unac-
ceptable,” hence rejected for employment or promotion, and another thought to be “acceptable,”
hence hired/promoted or deemed eligible for further screening. Consequently, the primary ethical
issue is inextricably bound up with the technical psychometric issues having to do with the accuracy
of those classification decisions. The generally preferred method of setting a cut score on a predic-
tor is to do so empirically by predicting a specified minimum criterion score, which first needs its
own justification (Green, 1996). Such predictor cut-off scores are affected by issues of criterion rel-
evance, extent of predictor validity, measurement error around the predictor score, and the error of
estimate associated with that prediction. Oftentimes, those sources of variability are not considered.
Alternatively, when criterion-related validation is not technically feasible, cut scores sometimes are
determined nonempirically by one of several subjective rating or judgmental methods using SMEs’
knowledge about the content domain of the examination (Mills & Melican, 1987). In the absence
of any criterion information, the issue of classification errors (particularly “false rejects”) is exac-
erbated by virtue of there being no way to assess their extent, and generally no attempt is made to
assess the accuracy of the classifications. The I-O psychologist may experience a dilemma when,
perhaps for reasons of cost, criterion-related validation is not done although feasible. The resulting
ignorance of classification (in)accuracy was potentially avoidable.
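The following Monte Carlo sketch is offered only as an illustration of this point; the validity coefficient, the predictor cut, and the minimum acceptable criterion level are hypothetical values, not recommendations. It shows how, under a bivariate-normal model, the accuracy of the accept/reject classification follows directly from the predictor's validity and the placement of the cut score.

```python
# Minimal Monte Carlo sketch (illustrative only) of how predictor validity and
# cut-score placement translate into classification errors.
# Assumptions (not from the chapter): standardized predictor and criterion,
# validity r = .40, predictor cut at z = +0.5, minimum acceptable criterion z = 0.

import math
import random

random.seed(2)

VALIDITY = 0.40       # assumed predictor-criterion correlation
PREDICTOR_CUT = 0.5   # assumed predictor cut score (z units)
CRITERION_MIN = 0.0   # assumed minimum acceptable criterion level (z units)
N = 200_000

false_rejects = false_accepts = 0
for _ in range(N):
    x = random.gauss(0.0, 1.0)  # predictor score
    # criterion score constructed to correlate with the predictor at VALIDITY
    y = VALIDITY * x + math.sqrt(1 - VALIDITY**2) * random.gauss(0.0, 1.0)
    rejected = x < PREDICTOR_CUT
    acceptable = y >= CRITERION_MIN
    if rejected and acceptable:
        false_rejects += 1
    elif not rejected and not acceptable:
        false_accepts += 1

print(f"False rejects (would have performed acceptably): {false_rejects / N:.1%}")
print(f"False accepts (hired but below the criterion minimum): {false_accepts / N:.1%}")
```

Raising the predictor cut or increasing the validity coefficient in the sketch shifts the balance between the two error types, which is the trade-off that the cut-score decision ultimately reflects.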

Ethical Issues Regarding Retesting


To acknowledge that even the most valid selection system entails classification errors, particularly
applicants who are rejected incorrectly, suggests (by virtue of the ethical principles of fairness and
nonmaleficence) that unsuccessful candidates should be allowed the opportunity for reexamination
if it is feasible. However, it is known that people do tend to improve their performance on abil-
ity tests when retested—on average, by about one-quarter standard deviation for cognitive abili-
ties (Hausknecht, Halpert, DiPaolo, & Moriarty Gerrard, 2007). Consequently, to allow those who
request it to be retested raises an additional ethical issue with respect to unfairness to those unsuc-
cessful candidates who have not requested retesting and under some circumstances even to those
who passed initially (e.g., when all of those who pass are to be rank-ordered). However, a reasonably
satisfactory solution seems attainable. First, the practice effect is reduced if an alternate form of the
examination is used for retesting. Second, the effect is enhanced when accompanied by coaching,
but that is not likely to be the case in employment testing; and third, it declines with the length of
time before retesting. Therefore, if it is financially feasible to develop alternative forms and to pro-
vide the administrative resources for retesting, and, if the opportunity to request retesting is known
and available to all candidates, retesting seems feasible and fair. To reduce the possible practice
effects, most organizations that adopt the policy generally specify a minimum waiting period.
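A back-of-the-envelope calculation illustrates the size of this effect at a fixed cut score. The normal-distribution assumption and the cut at z = +0.5 are hypothetical; only the quarter-standard-deviation average gain comes from the retesting literature cited above.

```python
# Back-of-the-envelope sketch (illustrative only) of how a practice effect of
# about one-quarter standard deviation can change pass rates at a fixed cut score.
# Assumptions (not from the chapter): normally distributed scores, cut at z = +0.5.

import math

def pass_rate(cut_z: float, mean_shift: float = 0.0) -> float:
    """Proportion of a standard-normal score distribution at or above cut_z
    after shifting the mean upward by mean_shift (in SD units)."""
    z = cut_z - mean_shift
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))

CUT = 0.5               # assumed cut score in z units
PRACTICE_EFFECT = 0.25  # average retest gain reported by Hausknecht et al. (2007)

print(f"Pass rate, first attempt: {pass_rate(CUT):.1%}")
print(f"Pass rate, retest (+0.25 SD): {pass_rate(CUT, PRACTICE_EFFECT):.1%}")
```

Under these assumptions the pass rate rises from roughly 31% to roughly 40% on retest, which is why alternate forms and minimum waiting periods are commonly recommended.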

Organizational Pressures for the Misuse of Test Data


Pressures on psychologists to use data in ways inconsistent with their ethical standards can be con-
siderable, particularly for psychologists employed in industry and government. Such pressures arise
from various sources, including the reality that most of those involved in leadership roles in such

5 See Chapter 7, this volume, on the use of test scores for a more thorough treatment of this issue.
settings are not psychologists and may not understand the ethical constraints on psychologists or the
complexities of standards associated with employee selection research. For example, the adequacy
of sample sizes for establishing reliable validity coefficients may seem like an academic concern to
an impatient manager eager to get on with implementation. Psychologists do have an ethical obliga-
tion in such circumstances, but, in recognition of the salience of organizational realities, it is not
absolute.

1.03 Conflicts Between Ethics and Organizational Demands


If the demands of an organization with which psychologists are affiliated or for whom they are working
conflict with this Ethics Code, psychologists clarify the nature of the conflict, make known their
commitment to the Ethics Code, and to the extent feasible, resolve the conflict in a way that permits
adherence to the Ethics Code. (APA Code, 2002)

CONCLUSIONS
It was not our intention to present an exhaustive catalogue of potential ethical problems in
selection—nor, in all likelihood, would it have been possible if we had intended to do so. But it is
our hope that the reader comes away with an enhanced understanding of the following:

• The nature of moral and ethical issues and the general forms in which they may be
manifested
• How ethical issues differ and can be differentiated from other sorts of problems
• The complexity of moral dilemmas (i.e., their interconnectedness with matters of technical
competence, professional judgment, and personal values)
• How ethical problems may arise in the realm of employee selection
• Some ways to judge the relative appropriateness of alternative solutions to ethical dilemmas
• The guidance offered by and limitations of both the more specific, but narrow, formal-
istic approach to ethical compliance represented by specifically prescribed and (mostly)
proscribed actions that are contained in codes of conduct, and the more general approach
that emphasizes the expression of well-established moral principles and the values they
engender
• Some of the typical ethical issues potentially associated with various facets of employee
selection in organizations, including contextual and organizational influences as well as
those particularly associated with specific measures and/or procedures

Following all of that descriptive and analytic input, it seems most appropriate, if not necessary, to
conclude by focusing on application and solution, that is, what to do. On one hand, general ethical
principles and written sources such as APA’s Code of Conduct and the Society for Industrial and
Organizational Psychology (SIOP)’s casebook (Lowman, 2006) are readily available but may not
explicitly include one’s particular problem(s). On the other hand, specific potential ethical issues—
even within a limited domain such as employee selection—are innumerable, not entirely predict-
able, and so cannot all be itemized a priori. The best we can hope to do, aside from noting some
particularly common examples as we have done, is to present a general scheme emphasizing preven-
tion (cf. Pryzwansky & Wendt, 1987), that is, highlighting the importance of trying to anticipate and
prevent problems before they arise. Largely on the basis of the work of Canter, Bennett, Jones, and
Nagy (1994), as well as Pryor (1989), we offer the following six-step plan.

1. BE FAMILIAR WITH APPLICABLE ETHICAL CODES AND PROFESSIONAL STANDARDS


Ethical guidelines are available from the APA (2002), the Canadian Psychological Association
(2000), the Academy of Management (2002), the International Personnel Management Association
(1990), the Society for Human Resource Management (1990), the International Task Force on
Assessment Center Guidelines (2000), and other relevant organizations. Gaining familiarity with
them can help one to avoid blundering into ethical indiscretions because of sheer ignorance, which
is important because “lack of awareness or misunderstanding of an ethical standard is not itself a
defense to a charge of unethical conduct” (APA Code, 2002, p. 1061). Indispensable sources of pro-
fessional information include the APA Standards (1999), SIOP Principles (2003), and knowledge
concerning how tests are often misused (Moreland, Eyde, Robertson, Primoff, & Most, 1995).

2. BE FAMILIAR WITH RELEVANT FEDERAL, STATE, AND LOCAL LAWS AND REGULATIONS
These pertain to rules regarding conducting research with human participants (OHRP, 1991), one’s
particular (U.S.) state laws regulating the licensing of psychologists, and federal and state laws
governing employment practices such as the Civil Rights Acts of 1964 and 1991, the Americans
With Disabilities Act of 1990, the Age Discrimination in Employment Act of 1967, the Equal Pay
Act of 1963, and the Uniform Guidelines on Employee Selection Procedures (Equal Employment
Opportunity Commission, Civil Service Commission, U.S. Department of Labor, and U.S.
Department of Justice, 1978). (Refer to Chapters 28, 29 and 30, this volume, for in-depth treatments
of these professional and legal standards.)

3. KNOW THE RULES AND REGULATIONS OF THE ORGANIZATION IN WHICH YOU WORK AND/OR THOSE OF YOUR CLIENT

Having that knowledge serves at least two purposes. First, it helps assure competent and appropriate
professional practice by incorporating and meeting organizational expectations regarding proce-
dures and outcomes. The second purpose is more problematic. It pertains to the possible conflict
between organizational practices or objectives versus our professional ethical and/or legal stan-
dards (e.g., some I-O psychologists have been directed to use confidential research or test data for
purposes not originally intended or consented to; some have been told that the organization will
not provide test feedback to employees who were candidates for promotion). As quoted earlier, but
worth the reminder, the more “I-O friendly” revision of the APA ethical principles includes enforce-
able Standard 1.03, which requires I-O psychologists to “clarify the nature of the conflict, make
known their commitment to the Ethics Code, and to the extent feasible, resolve the conflict in a way
that permits adherence to the Ethics Code” (APA, 2002).

4. PARTICIPATE REGULARLY IN CONTINUING EDUCATION IN ETHICS AND IN PROFESSIONAL/TECHNICAL ISSUES AFFECTING COMPETENCE

This admonition is obviously not entirely necessary for you, the reader. Attending courses, work-
shops, and professional conference presentations and seminars; subscribing to journals; and reading
books that focus on ethical, professional, and technical matters are some of the means of keeping
abreast of new technical developments and honing one’s ethical sensitivities and decision-making
skills. Conferring with colleagues is often indispensable—most especially when one is in the throes
of an uncomfortable ethical dilemma or sees the potential for one developing. In addition to our
national association, SIOP, the regularly published newsletters of several local organizations of
applied psychologists have proven to be consistently reliable sources of information, including the
Metropolitan New York Association for Applied Psychology (Metro); the Personnel Testing Council
of Metropolitan Washington, DC (PTC/MW); and the Personnel Testing Council of Southern
California (PTC/SC).6

6 Information may be obtained from the following websites: http://www.siop.org, http://www.metroapppsych.com, http://
www.ptcmw.org, and http://www.ipmaac.org/ptcsc

5. MAINTAIN A MINDSET OF ETHICAL WATCHFULNESS AND IDENTIFY POTENTIAL ETHICAL PROBLEMS

To a considerable degree, the purpose of this entire chapter is to promote one’s ability to do just
this. If we have been at all successful, it will have been by increasing the salience and the reader’s
knowledge of ethical principles; the way in which those moral issues are enmeshed with matters of
personal values, professional judgment, and technical competence; the typical forms or structures
of ethical dilemmas; the role to be played by formal ethical guidelines; and some particular and
somewhat predictable ethical problems associated with particular selection practices. Hopefully,
this will help to avoid ethically ambiguous situations or to clarify them early on. We believe that
such moral sensitivities (Rest, 1994) are learned attributes and can be enhanced with practice. All
in all, we hope to have contributed to I-O psychologists’ “staying ethically fit” (Jeanneret, 1998),
which leads to the last item.

6. LEARN SOME METHOD(S) FOR ANALYZING ETHICAL SITUATIONS AND MAKING ETHICAL DECISIONS IN COMPLEX SOCIAL SITUATIONS

Space does not permit delving into this process in this chapter. Fortunately, however, others have done so. Several ethical decision-making models and procedures have been reviewed by Wittmer (2001) and by Pryzwansky and Wendt (1999). We have (unsurprisingly) found one decision-making
model to be helpful that was synthesized with I-O psychology in mind (Lefkowitz, 2003), even
though such models have been criticized with some justification as being simplistic (Ladenson, in
Gellerman, Frankel, & Ladenson, 1990, p. 90), that is, as not matching the complexities of many
ethical dilemmas. However, their value may lie in the psychologist becoming accustomed to the
general process of ethical reasoning they promote, rather than adhering to specific decision-making
steps. Moreover, as first emphasized by Koocher and Keith-Spiegel (1998), such analytic decision-
aids may have longer-range value by helping to “fine-tune and shape appropriate responses” (p. 12)
by repeated use.

REFERENCES
Academy of Management. (2002). Academy of Management code of ethical conduct. Academy of Management
Journal, 45, 291–294.
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (1999). Standards for educational and psychological testing. Washington,
DC: American Educational Research Association.
American Psychological Association. (1981). Specialty guidelines for the delivery of services by I/O psycholo-
gists. American Psychologist, 36, 664–669.
American Psychological Association. (2002). Ethical principles of psychologists and code of conduct. American
Psychologist, 57, 1060–1073.
Baritz, L. (1960). The servants of power: A history of social science in American industry. Westport, CT:
Greenwood.
Caldwell, C., Thornton, G. C., III, & Gruys, M. L. (2003). Ten classic assessment center errors: Challenges to
selection validity. Public Personnel Management, 32, 73–88.
Canadian Psychological Association. (2000). Canadian code of ethics for psychologists (3rd ed.). Ottawa,
Ontario: Author.
Canter, M. B., Bennett, B. E., Jones, S. E., & Nagy, T. F. (1994). Ethics for psychologists: A commentary on the
APA ethics code. Washington, DC: American Psychological Association.
Combs, J., Liu, Y., Hall, A., & Ketchen, D. (2006). How much do high performance work practices matter?
A meta-analysis of their effects on organizational performance. Personnel Psychology, 59(3), 501–528.
Cooper, T. L. (Ed.). (2001). Handbook of administrative ethics. New York, NY: Marcel Dekker.
De Corte, W., Lievens, F., & Sackett, P. R. (2007). Combining predictors to achieve optimal trade-offs between
selection quality and adverse impact. Journal of Applied Psychology, 92, 1380–1393.
Edelstein, L. (1967). The Hippocratic Oath: Text, translation and interpretation. In O. Temkin & C. L. Temkin
(Eds.), Ludwig Edelstein. Ancient medicine: Selected papers of Ludwig Edelstein (pp. 3–64). Baltimore,
MD: Johns Hopkins University Press.
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department
of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166),
38290–38315.
Fletcher, C. (1992). Ethical issues in the selection interview. Journal of Business Ethics, 11, 361–367.
Freeman, R. E. (1984). Strategic management: A stakeholder approach. Boston, MA: Pitman.
Freeman, R. E., & Phillips, R. A. (2002). Stakeholder theory: A libertarian defense. Business Ethics Quarterly,
12, 331–349.
Gellermann, W., Frankel, M. S., & Ladenson, R. F. (1990). Values and ethics in organization and human sys-
tems development: Responding to dilemmas in professional life. San Francisco, CA: Jossey-Bass.
Green, B. F. (1996). Setting performance standards: Content, goals, and individual differences. The second
annual William H. Angoff Memorial Lecture. Princeton, NJ: Educational Testing Service.
Guenster, N., Derwall, J., Bauer, R., & Koedijk, K. (2005, July). The economic value of corporate eco-effi-
ciency. Paper presented at the Academy of Management Conference. Honolulu, HI.
Haber, S. (1991). The quest for authority and honor in the American professions, 1750–1900. Chicago, IL:
University of Chicago Press.
Hall, R. T. (1975). Occupations and the social structure (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Hausknecht, J. R., Halpert, J. A., DiPaolo, N. T., & Moriarty Gerrard, M. O. (2007). Retesting in selection: A
meta-analysis of coaching and practice effects for tests of cognitive ability. Journal of Applied Psychology,
92, 373–385.
International Personnel Management Association. (1990). IPMA code of ethics. Alexandria, VA: Author.
International Task Force on Assessment Center Guidelines. (2000). Guidelines and ethical considerations for
assessment center operations. Pittsburgh, PA: Development Dimensions International.
Jeanneret, P. R. (1998). Ethical, legal, and professional issues for individual assessment. In R. Jeanneret &
R. Silzer (Eds.). Individual psychological assessment: Predicting behavior in organizational settings (pp.
88–131). San Francisco, CA: Jossey-Bass.
Katzell, R. A., & Austin, J. T. (1992). From then to now: The development of industrial-organizational psychol-
ogy in the United States. Journal of Applied Psychology, 77, 803–835.
Kimball, B. A. (1992). The “true professional ideal” in America. Cambridge, MA: Blackwell.
Koocher, G. P. (2007). APA Presidential Address. Twenty-first century ethical challenges for psychology.
American Psychologist, 62, 375–384.
Koocher, G., & Keith-Spiegel, P. (1998). Ethics in psychology: Professional standards and cases (2nd ed.).
New York, NY: Oxford University Press.
Lefkowitz, J. (1990). The scientist-practitioner model is not enough. The Industrial-Organizational Psychologist,
28(1), 47–52.
Lefkowitz, J. (2003). Ethics and values in industrial-organizational psychology. Mahwah, NJ: Lawrence
Erlbaum.
Lefkowitz, J. (2004). Contemporary cases of corporate corruption: Any relevance for I-O psychology? The
Industrial-Organizational Psychologist, 42(2), 21–29.
Lefkowitz, J. (2005). The values of industrial-organizational psychology: Who are we? The Industrial-
Organizational Psychologist, 43(2), 13–20.
Lefkowitz, J. (2006). The constancy of ethics amidst the changing world of work. Human Resource Management
Review, 16, 245–268.
Lefkowitz, J. (2007a). Ethics in industrial-organizational psychology research. In S. Rogelberg (Ed.), The ency-
clopedia of industrial and organizational psychology (Vol. 1, pp. 218–222). Thousand Oaks, CA: Sage.
Lefkowitz, J. (2007b). Ethics in industrial-organizational psychology practice. In S. Rogelberg (Ed.), The
encyclopedia of industrial and organizational psychology (Vol. 1, pp. 215–218). Thousand Oaks, CA:
Sage.
Lefkowitz, J. (2007c). Corporate social responsibility. In S. Rogelberg (Ed.), The encyclopedia of industrial
and organizational psychology (Vol. 1, pp. 114–118). Thousand Oaks, CA: Sage.
Lefkowitz, J. (2008). In order to prosper the field of organizational psychology should … expand its values to
match the quality of its ethics (Special issue). Journal of Organizational Behavior, 29, 439–453.
Lowman, R. L. (1991). The clinical practice of career assessment: Interests, abilities, and personality.
Washington, DC: American Psychological Association.
Lowman, R. L. (Ed.). (2006). The ethical practice of psychology in organizations (2nd ed.). Washington, DC:
American Psychological Association/Society for Industrial-Organizational Psychology.
Lowman, R. L., Kantor, J., & Perloff, R. (2006). History of I-O psychology educational programs in the United
States. In L. L. Koppes (Ed.), Historical perspectives in industrial and organizational psychology (pp.
111–137). Mahwah, NJ: Lawrence Erlbaum.
Mills, C. N., & Melican, G. J. (1987). A preliminary investigation of three compromise methods for establishing
cut-off scores (Report No. RR-87-14). Princeton, NJ: Educational Testing Service.
Moreland, K. L., Eyde, L. D., Robertson, G. J., Primoff, E. S., & Most, R. B. (1995). Assessment of test user
qualifications: A research-based measurement procedure. American Psychologist, 50, 14–23.
Office for Human Research Protections, Department of Health and Human Services (1991). Protection of
human subjects. Code of Federal Regulations, Title 45, Public Welfare. June 18.
Orlitzky, M., Schmidt, F. L., & Rynes, S. L. (2003). Corporate social and financial performance: A meta-
analysis. Organization Studies, 24, 403–441.
Pope, K. S., & Vetter, V. A. (1992). Ethical dilemmas encountered by members of the American Psychological
Association: A national survey. American Psychologist, 47, 397–411.
Potosky, D., & Bobko, P. (2004). Selection testing via the Internet: Practical considerations and exploratory
empirical findings. Personnel Psychology, 57, 1003–1034.
Prien, E. P., Schippmann, J. S., & Prien, K. O. (2003). Individual assessment as practiced in industry and con-
sulting. Mahwah, NJ: Lawrence Erlbaum.
Pryor, R. G. L. (1989). Conflicting responsibilities: A case study of an ethical dilemma for psychologists work-
ing in organisations. Australian Psychologist, 24, 293–305.
Pryzwansky, W. B., & Wendt, R. N. (1999). Professional and ethical issues in psychology: Foundations of
practice. New York, NY: W.W. Norton & Co.
Rachels, J. (1993). The elements of moral philosophy (2nd ed.). New York, NY: McGraw-Hill.
Rest, J. R. (1994). Background: Theory and research. In J. R. Rest & D. Narvaez (Eds.). Moral development in
the professions (pp. 1–26). Hillsdale, NJ: Lawrence Erlbaum.
Rosenthal, R. (1994). Science and ethics in conducting, analyzing and reporting psychological research.
Psychological Science, 5, 127–134.
Samuelson, P. A. (1993). Altruism as a problem involving group versus individual selection in economics and
biology. American Economic Review, 83, 143–148.
Society for Human Resource Management. (1990). Code of ethics. Alexandria, VA: Author.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of person-
nel selection procedures (4th ed.). Bowling Green, OH: Author.
Tippins, N. T., Beaty, J., Drasgow, F., Gibson, W. M., Pearlman, K., Segall, D. O., & Shepherd, W. (2006).
Unproctored Internet testing in employment settings. Personnel Psychology, 59, 189–225.
Wittmer, D. P. (2001). Ethical decision-making. In T. L. Cooper (Ed.), Handbook of administrative ethics (2nd
ed., pp. 481–507). New York, NY: Marcel Dekker.
Wright, P. M., Gardner, T. M., Moynihan, L. M., & Allen, M. R. (2005). The relationship between HR practices
and firm performance: Examining causal order. Personnel Psychology, 58, 409–446.
Zickar, M. J. (2001). Using personality inventories to identify thugs and agitators: Applied psychology’s con-
tribution to the war against labor. Journal of Vocational Behavior, 59, 149–164.
28 Professional Guidelines/Standards

P. Richard Jeanneret and Sheldon Zedeck

INTRODUCTION 1
There are three sources of authoritative information and guidance that can be relied upon in the
development, validation, and implementation of an employment selection procedure. The sources
are the Standards for Educational and Psychological Testing (1999; Standards), the Principles
for the Validation and Use of Personnel Selection Procedures (2003; Principles), and the Uniform
Guidelines on Employee Selection Procedures (1978; Uniform Guidelines). The term selection pro-
cedure in this instance should be interpreted broadly to be inclusive of any process or information
used in personnel decision-making. Selection procedures would include (but not be limited to) all
forms and types of tests (e.g., cognitive, personality, work samples, and assessment centers), job
performance appraisals, and measures of potential. These procedures may be administered, scored,
and interpreted as paper-and-pencil or computer-based instruments and/or by individuals internal
or external to the organization. This broad view is consistent with the interpretations expressed by
the authoritative sources. Oftentimes the term “test” is used in one of the sources. For the purposes
of this chapter, a test is synonymous with a selection procedure.

PURPOSE AND CHAPTER FLOW


The purpose of this chapter is to describe the history and substance of each of the three authoritative
sources, compare and contrast their technical content, and provide some guidance as to how they
might be particularly useful to those directly associated with employment selection procedures.
Each of the three authoritative sources will be discussed separately in chronological order defined
by the date of their initial publication. The discussion will begin with the purpose and brief his-
tory of the authoritative document. Then information will be presented that describes the content
relevant to employment decision-making. After describing each document, the three sources will
undergo comparisons with indications as to where there are inconsistencies and how they might be
resolved. Finally, suggestions are made as to what additions or changes would be appropriate given
the current state of relevant research.

APPLICATION TO EMPLOYMENT SELECTION ONLY


The Standards in particular and the Principles perhaps to a somewhat lesser extent have poten-
tial relevance to settings outside employment selection. Such venues include forensic, academic,

1 This chapter reproduces with some modifications the major part of “Professional and Technical Authorities and Guide-
lines” by P. Richard Jeanneret, which is found in Landy, F. J. (2005). Employment discrimination litigation: Behavioral,
quantitative and legal perspectives. San Francisco, CA: John Wiley & Sons.


counseling, program evaluation, and publishing that involve psychological instruments and
measurements. This chapter does not address these applications. The focus is strictly on orga-
nizational settings and employment related selection decisions.

IMPORTANCE OF THE AUTHORITIES


For the most part, the authorities are retrospective rather than prospective. By necessity they must
rely on the state of knowledge in the fields of measurement and applied psychology. Reality, of
course, is that knowledge changes as research in the field develops more information about the strat-
egies and psychometrics of employment selection procedures. Therefore, the authoritative sources
become outdated and include either guidance that is no longer relevant or do not offer guidance that
is very important in current times. Nevertheless, there are several reasons why the three authorita-
tive sources are valuable resources that can be relied upon by individuals associated with employ-
ment selection.

1. The study of employment-related psychometrics (and particularly the validity of selection


procedures) has been taking place for about 100 years. Accordingly, there is a body of
knowledge that is stable, well researched, and directly relevant to understanding the mea-
surement properties of an employment-based selection procedure. Much of this knowl-
edge, with varying degrees of specificity, is embedded in all three authorities with little,
if any, contradiction. Consequently, the authoritative sources are able to provide accurate
information about the state of the science, at least at the time they were written, that can
support the proper development and use of an employment selection procedure.
2. The three documents describe and discuss several specific concepts and terms associated
with the psychometric qualities of a selection procedure. Although not intended as teach-
ing documents per se, they do frequently summarize bodies of research that are otherwise
buried in textbooks and research journal articles.
3. The current editions of the Standards and the Principles have undergone extensive profes-
sional peer review. Although the initial preparations of the documents were accomplished
by committees of experts in the field, both documents were open for comment by the
membership of the American Psychological Association (APA) and, especially in the case
of the Principles, the document was subject to review by the entire membership of the
Society for Industrial and Organizational Psychology (SIOP), a division of APA. Although
the Standards was authored jointly by three professional organizations, the Principles
was authored by a committee of SIOP members. The Standards and the Principles were
adopted as policy by APA and hence have formal professional status. Accordingly, there
were much greater levels of scrutiny and approval of the scientific content of the Standards
and Principles than typically occurs for a textbook or journal article.
4. The Uniform Guidelines was authored by the Equal Employment Opportunity
Commission (EEOC), the Civil Service Commission (CSC), the Department of Labor
(DoL), and the Department of Justice (DoJ). The preparation of the Uniform Guidelines
also relied upon input from several individuals with expertise in psychological mea-
surement, but others (e.g., attorneys) were influential in creating the document as well.
Given this complement of authors, it is understandable that there was less psychometric
content and greater emphasis on the documentation of validity evidence that would be
satisfactory in a judicial proceeding. Interestingly, when the Uniform Guidelines was
under development and when the U.S. House of Representatives was holding hearings
on revisions to the Uniform Guidelines, the APA and SIOP submitted information that
was, for the most part, not incorporated into the final document. Subsequently, in 1985,
an APA representative gave congressional testimony identifying four technical issues on which psychologists disagreed with the treatment of these topics in the Uniform
Guidelines: (a) validity generalization, (b) utility analysis, (c) differential prediction, and
(d) validity requirements and their documentation. Similarly, SIOP believed the Uniform
Guidelines was incorrect with respect to requiring fairness studies, the definition of
construct validity, and how validity generalization and utility analyses were considered
(Camara, 1996). Nevertheless, the EEOC and the Office of Federal Contract Compliance
Programs (OFCCP) currently rely on the Uniform Guidelines to determine whether or
not a selection procedure is discriminatory.
5. For those who are involved in the judicial process (particularly judges and lawyers) the
authoritative sources are alternative reference sources to case law and other judicial
writings. The three sources have been relied upon by experts in the fields of personnel,
industrial, organizational, and measurement psychology when formulating opinions about
selection procedures. In such instances, the authoritative sources have become benchmarks
that help define sound professional practice in the employment setting. Unfortunately the
apparent use of the three sources is rather limited as indicated by the judicial interviews in
Chapter 15 of Landy (2005).

STANDARDS FOR EDUCATIONAL AND PSYCHOLOGICAL TESTING


BRIEF HISTORY
The Standards has a history dating back more than 50 years. The first edition was titled Technical
Recommendations for Psychological Tests and Diagnostic Techniques and was authored by a com-
mittee of APA members and published in 1954. A similar publication was prepared by a com-
mittee comprised of members from the American Educational Research Association (AERA) and
the National Council on Measurements Used in Education (NCMUE). The document was titled,
Technical Recommendations for Achievement Tests and was published in 1955 by the National
Education Association.
In 1966 the two separate documents were revised and combined into a single document: the
Standards for Educational and Psychological Tests and Manuals. Authorship was by a committee
representing APA, AERA, and the National Council on Measurement in Education (NCME). These
three organizations have continued to jointly publish revisions ever since. A revision completed by
another joint committee comprised of AERA, APA, and NCME members was published in 1974,
and the document title was changed to Standards for Educational and Psychological Tests. The
1966 document delineated about 160 standards, and this number was increased to over 225 stan-
dards in 1974. However, the number of standards declined to about 180 in 1985 after a revision and
publication of the Standards for Educational and Psychological Testing (Standards). This title has
remained with the subsequent 1999 revision.
In 1991, APA began an initiative to revise the 1985 Standards. In 1993 a joint AERA, APA,
and NCME committee was formed, and after 6 years of effort the final document was published. It
incorporates 264 standards and was adopted as APA policy. The Standards is intended to be pro-
scriptive but does not have any associated enforcement mechanisms. More so than with past versions,
the 1999 Standards devotes considerable attention to fairness; testing individuals with disabilities;
scales, norms, and score comparability; reliability; and the responsibilities of test users. Currently the
Standards is undergoing another revision with an expected release of 2010.

APPLICATION
The 1999 Standards is applicable to the entire domain of educational and psychological measure-
ment. In fact, there is one chapter devoted completely to educational testing and assessment, as well
as one on testing in employment and credentialing. Because the Standards provides a comprehen-
sive wealth of information on psychological measurement, it is not possible to adequately discuss all
of the standards in their entirety. Consequently, this review will focus on just those components of
the Standards that are most applicable to psychometric issues in employment selection.

PURPOSE OF THE STANDARDS


As stated in the document introduction, “The intent of the Standards is to promote the sound and
ethical use of tests and to provide a basis for evaluating the quality of testing practices.” (p. 1). It
is further emphasized that the evaluation of a test or its application should rely heavily on profes-
sional judgment and that the Standards provides a set of references or benchmarks to support the
evaluation process. Finally, the Standards is not intended to respond to public policy questions that
are raised about testing; however, the psychometric information embedded in the Standards may
be very useful to informing those involved in debates and decisions regarding testing from a public
policy perspective. This is so because the Standards preceded and was, in part, the foundation for
the Uniform Guidelines and the Principles.

VALIDITY DEFINED
A key term that will be appearing throughout this chapter and other chapters is “validity” or one
of its derivatives (e.g., validation process). In several respects the Standards has established the
most current thinking regarding validity and provides the definition that should be accepted by all
professionals concerned with the psychometrics of a selection procedure.
According to the Standards:

Validity refers to the degree to which evidence and theory support the interpretations of test scores
entailed by proposed uses of tests. Validity is, therefore, the most fundamental consideration in
developing and evaluating tests. The process of validation involves accumulating evidence to provide a
sound scientific basis for the proposed score interpretation. It is the interpretation of test scores required
by the proposed uses that are evaluated, not the test itself. When test scores are used or interpreted in
more than one way, each intended interpretation must be validated. (p. 9)

Validity is a unitary concept and can be considered an argument based on scientific evidence
that supports the intended interpretation of a selection procedure score (Binning & Barrett, 1989;
Cronbach & Meehl, 1955; McDonald, 1999; Messick, 1980, 1989; Wainer & Braun, 1988). There are
24 specific standards regarding validity incorporated in the 1999 document.
Generally speaking, if a test does not have validity for a particular purpose, it also will not
have utility. Utility is an estimate of the gain in productivity or other practical value that might
be achieved by use of a selection procedure. There are several measures used to estimate utility,
including increases in job proficiency, reduced accidents, savings in employee costs, reduction in
turnover, training success, and others (Cascio, 2000; Cronbach & Gleser, 1965; Hunter & Hunter,
1984; Naylor & Shine, 1965; Schmidt, Hunter, McKenzie, & Muldrow, 1979). Consequently, it
will be an unusual situation whereby an organization would want to use a test that lacked validity
and utility. Furthermore, if a test lacks validity it is possible that some unintended consequence
might result from its use. Thus, reliance on test scores that are not valid will not yield results
intended by the selection process and may in fact yield outcomes that are detrimental to the
organization.
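One widely cited expression of this idea is the Brogden-Cronbach-Gleser linear utility model treated in the utility literature referenced above (e.g., Cascio, 2000; Cronbach & Gleser, 1965). The sketch below uses hypothetical values throughout and is intended only to show how the validity coefficient enters the estimate of practical gain; it is not a prescribed procedure.

```python
# Minimal sketch (illustrative, with hypothetical values) of the
# Brogden-Cronbach-Gleser utility model, one common way of estimating the
# practical gain from using a valid selection procedure:
#   gain = n_selected * validity * SDy * mean_z_of_selectees - testing costs

n_selected = 50          # hypothetical number hired per year
validity = 0.40          # hypothetical operational validity of the procedure
sd_y = 15_000.0          # hypothetical SD of job performance in dollar terms
mean_z_selectees = 0.8   # hypothetical average predictor z-score of those hired
cost_per_applicant = 25.0
n_applicants = 500       # hypothetical applicant pool

gain = n_selected * validity * sd_y * mean_z_selectees
costs = cost_per_applicant * n_applicants
print(f"Estimated annual utility: ${gain - costs:,.0f}")
```

Because the estimated gain scales directly with the validity coefficient, a procedure with no validity contributes nothing beyond its costs, which is the point made above.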

APPLICATION TO SELECTION DECISION-MAKING


Of the three authoritative sources, the Standards offers the greatest amount of detail regarding
the psychometric properties and use of selection procedures. Terms are clearly defined. However,
all standards are not necessarily equally important in a given situation, but no attempt is made to
categorize some standards as “primary” and others as “secondary” as occurred in earlier versions.
An entire chapter that incorporates 12 standards is focused on fairness; another chapter devotes
13 standards to the rights and responsibilities of test-takers; and still another chapter is focused on
testing individuals with diverse linguistic backgrounds (11 standards). This level of description is
oftentimes less precise in the Principles and Uniform Guidelines.

CAUTIONS OFFERED BY THE STANDARDS


The Standards (p. 4) sets forth five cautions that are intended to prevent misinterpretations:

1. Evaluation of a selection procedure is not just a matter of checking-off (or not) one stan-
dard after the other to determine compliance. Rather the evaluation process must consider
(a) professional judgment, (b) satisfaction of the intent of a relevant standard, (c) alternate
selection procedures that are readily available, and (d) feasibility of complying with the
standard given past experience and research knowledge.
2. The Standards offers guidance to the expert in a legal proceeding, but professional judg-
ment determines the relevance of a standard to the situation.
3. Blanket statements about conformance with the Standards should not be made without
supporting evidence. Otherwise, care should be exercised in any assertions about compli-
ance with the Standards.
4. Research is ongoing and knowledge in the field will continue to change. Accordingly, the
Standards will be revised over time.
5. The Standards is not intended to mandate use of specific methodologies. The use of a
“generally accepted equivalent” is always understood with regard to any method provided
in the Standards.

SOURCES OF VALIDITY EVIDENCE


There are multiple ways in which validity evidence might be assembled for a selection procedure,
and no one method is necessarily superior to another. Rather, the validation strategy should be con-
sistent with the nature and intended use of the selection procedure.
Briefly described below are the five validation strategies or sources of validity evidence set forth
in the Standards (pp. 11–17).

1. Content: This evidence is based on an analysis of the relationship between the content of a
selection procedure and the construct it is intended to measure. This construct is typically
an important job requirement [knowledge, skill, ability, and/or other personal character-
istics (KSAOs)], job behavior, or job performance (e.g., performing a task or duty). Often,
reference is made in employment settings to the job content domain. A content study is typ-
ically completed by relying on logical judgments made by subject matter experts (SMEs)
regarding the inferences that can be made linking the test content and scores to the content
domain of the job.
2. Response processes: Studying the processes followed by individuals in responding to a
selection procedure can provide validity evidence regarding the fit between the construct
being measured and the behavior actually engaged in by respondents. For example, if a
work sample test is given to select individuals for an electronics technician job, the exam-
iner may not only want to know whether the candidate ultimately identified the correct
solution, but also what analytical processes were used to fi nd the solution. And, the exam-
iner may want to know if differences in scores among applicants are due to knowledge dif-
ferences or reading comprehension differences.
3. Internal structure: Empirical analysis (e.g., factor analysis, differential item functioning)
of the internal structure of a selection procedure can provide evidence regarding the extent
to which the relationships between test items conform to the construct that is the founda-
tion for test score interpretations (McDonald, 1999). For example, a test may be expected to
measure a single construct (e.g., numerical ability) or several constructs (e.g., five different
dimensions of personality). The extent to which the test item relationships conform to the
hypothesized dimensionality is evidence of validity. Another aspect of internal structure
is the extent to which test items function differently for different subgroups of applicants
(differential item functioning). This does not necessarily represent a problem, but rather
may be due to multidimensionality in the test.
4. Relations to other variables: This source of evidence has typically been labeled “criterion-
related validity.” However, it also includes what was once called “construct validity.” It is
based on the empirical relationship(s) between scores on a selection procedure and some
external variable(s). If the relationship is between a test and a measure of job performance
or some other job-related outcome (e.g., turnover), this has been called criterion-related
validity evidence. If the relationship is between a selection procedure and some other
measure(s) of the same or different construct(s) this was labeled construct validity in the
past. The essential point is that a hypothesis is set forth as to the nature of the relation-
ship that is tested using appropriate statistical methods, and empirical relationships are
established between the selection procedure of interest and one or more external vari-
ables. Criterion-related validity evidence can be collected in two ways: concurrently and
predictively. In a concurrent study, the selection procedure scores and measurement(s)
of the criterion/criteria are obtained at approximately the same time (concurrently) for
an incumbent group. The data are analyzed to determine the accuracy with which the
selection procedure scores predict criterion performance. In a predictive criterion-related
validity study, selection procedure scores are obtained but not used in decision-making
for an applicant group. A portion of the applicants is hired, and at some later point in time
(after a period of time whereby the applicant can reliably demonstrate his/her perfor-
mance) the criterion measure is obtained and the accuracy of the selection procedure is
evaluated. Generally, concurrent and predictive studies reach the same conclusions about
the validity of inferences from selection procedure scores, at least for cognitive abil-
ity measures (Barrett, Phillips, & Alexander, 1981; Bemis, 1968; Pearlman, Schmidt, &
Hunter, 1980).

The evidence based on relations to other variables is further explained in terms of two strategies
that can be used to understand those relationships: convergent/discriminant validity and validity
generalization. These two strategies are briefly discussed next.

Convergent and Discriminant Validity


When selection procedure scores and other measures of the same or similar constructs are
correlated, this is convergent validity evidence. When selection procedure scores are not correlated
with measures of purportedly different constructs, this is evidence of discriminant validity (Campbell &
Fiske, 1959; McDonald, 1999). Although both types of evidence are useful, the more frequent
study examines convergent validity. For example, in the typical criterion-related validity study the
relationship between a cognitive selection procedure and a measure of job performance (such as
decision-making) is purportedly concerned with the same or very similar constructs (i.e., convergent
validity). However, if a selection procedure was comprised of a cognitive measure and a test of
interpersonal skills, and there were two job performance indices (decision-making and teamwork)
the lack of relationship (or low relationship) between the cognitive measure and teamwork (or the low
correlation between the interpersonal skills test and decision-making) would provide discriminant
evidence. Convergent and discriminant evidence can be equally valuable.
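As a concrete illustration of checking convergent and discriminant patterns, the following sketch simulates scores for the four measures named above and inspects the relevant entries of the correlation matrix. The population correlations, sample size, and variable names are invented for illustration and are not taken from the chapter.

```python
import numpy as np

# Illustrative population correlation matrix: the cognitive test converges with the
# decision-making criterion and the interpersonal test with the teamwork criterion,
# while the cross-construct correlations are low.
rho = np.array([
    [1.00, 0.10, 0.50, 0.05],   # cognitive test
    [0.10, 1.00, 0.05, 0.45],   # interpersonal skills test
    [0.50, 0.05, 1.00, 0.15],   # decision-making rating
    [0.05, 0.45, 0.15, 1.00],   # teamwork rating
])

rng = np.random.default_rng(0)
scores = rng.multivariate_normal(mean=np.zeros(4), cov=rho, size=300)
obs = np.corrcoef(scores, rowvar=False)

# Convergent evidence: same-construct predictor-criterion correlations should be sizable.
print("cognitive vs. decision-making:", round(obs[0, 2], 2))
print("interpersonal vs. teamwork:   ", round(obs[1, 3], 2))
# Discriminant evidence: cross-construct correlations should be comparatively low.
print("cognitive vs. teamwork:       ", round(obs[0, 3], 2))
print("interpersonal vs. decision:   ", round(obs[1, 2], 2))
```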

Validity Generalization
The issue is whether or not validity evidence obtained in one situation can be generalized to a new
situation without further study of the validity of that procedure in the new setting. When criterion-
related validity evidence has been accumulated for a selection procedure, meta-analysis has been
a useful statistical method for studying the validity generalization question. There are numerous
methodological and statistical issues associated with meta-analytic studies, and these matters are
too lengthy to be addressed here. The interested reader is referred to Hunter and Schmidt (1990),
Rosenthal (1991), or Chapter 42, this volume.
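As a rough illustration of the underlying logic only, the sketch below performs a "bare-bones" meta-analysis: a sample-size-weighted mean validity, the observed variance of the coefficients, and the variance expected from sampling error alone. The study values are invented, and a full Hunter and Schmidt (1990) analysis would also correct for artifacts such as range restriction and measurement error.

```python
import numpy as np

# Hypothetical local studies: (observed validity coefficient, sample size).
studies = [(0.22, 68), (0.31, 120), (0.18, 45), (0.27, 210), (0.35, 90)]
r = np.array([s[0] for s in studies])
n = np.array([s[1] for s in studies])

# Sample-size-weighted mean validity and the observed variance across studies.
r_bar = np.sum(n * r) / np.sum(n)
var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)

# Variance expected from sampling error alone ("bare-bones" estimate).
var_error = np.sum(n * (1 - r_bar**2) ** 2 / (n - 1)) / np.sum(n)

print(f"weighted mean r                = {r_bar:.3f}")
print(f"observed variance of r         = {var_obs:.4f}")
print(f"sampling-error variance        = {var_error:.4f}")
print(f"% variance due to sampling err = {100 * var_error / var_obs:.0f}%")
```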

5. Consequences of testing: This is probably the most recent source of validity evidence to
be addressed in the literature regarding selection procedures. The question being asked
is whether or not the specific benefits expected from the use of a selection procedure are
being realized, and are there other consequences that are not expected or desired. In the
employment context, an unintended consequence might be differential pass rates for vari-
ous subgroups of job candidates. This alone does not detract from the validity of test score
interpretations. Only if the subgroup differences can be linked to a specific source of bias
or contamination in the selection procedure will the negative consequences detract from
the validity of score interpretations. Said differently, if the selection procedure is assessing
an important skill underlying job performance and any subgroup differences are attributed
to unequal distributions of that skill in the applicant pool, then the observation of those
differences does not undermine the validity of the inferences that are drawn from the selec-
tion procedure. The Standards is very clear on this point and explicitly states that there is a
difference between issues of psychometric validity and issues of social policy, and that the Standards
addresses only validity.

INTEGRATING VALIDITY EVIDENCE


A comprehensive and sound validity argument is made by assembling the available evidence
indicating that interpretations of scores from a well-developed selection procedure can accurately
predict the criterion of interest. Although the various sources of validity evidence discussed above
are directly relevant, there are many other valuable information sources, including information
obtained from prior research; reliability indices; information on scoring, scaling, norming, and
equating data; standard settings (e.g., cut scores); and fairness information. All of these informa-
tion sources, when available, contribute to the final validity argument and decision regarding the
use of a selection procedure.

VALIDITY STANDARDS
There are 24 specific standards presented in the validity chapter of the Standards. Although all
24 standards are important, there are certain themes that are particularly relevant in the context of
employment selection. A brief summary of these themes follows.

• The rationale and intended interpretation of selection procedure scores should be set forth
at the outset of a validity study. When new interpretations or intended uses are contem-
plated, they should be supported by new validity evidence.
• Descriptions of individuals participating in validation studies should be as detailed as is
practical. If SMEs are used, their qualifications and the procedures they followed in devel-
oping validation evidence should be documented.
• When criterion-related validity studies are completed, information about the quality and
relevance of the criterion should be reported.
• When several variables are combined to predict a criterion, multiple regression should be used to
evaluate increments in the predictive accuracy achieved by each variable. Results from the
analyses of multiple variables should be verified by cross-validation whenever feasible.
• If statistical adjustments (e.g., the correction of correlations for restriction in range) are
made, the unadjusted and adjusted correlations and the procedures followed in making the
adjustments should be documented.
• If meta-analyses are relied upon as criterion-related validity evidence, the comparability
between the meta-analytic variables (predictors and criteria) and the specific situation of
interest should be determined to support the applicability of the meta-analytic findings
to the local setting. All assumptions and clearly described procedures for conducting the
meta-analytic study should be reported.

RELIABILITY AND MEASUREMENT ERRORS


Chapter 2 of the Standards describes reliability and errors of measurement and sets forth 20 standards
related to the topic. The chapter is concerned with understanding the degree to which a selection pro-
cedure score is free from error. To the degree that a score is contaminated, the contamination reflects errors of
measurement that are usually assumed to be unpredictable and random in occurrence. There are two
sources of error: (a) within individuals subject to the selection procedure and (b) conditions external to
the individuals, such as the testing environment or mistakes in scoring the selection procedure.
Reliability is an index indicating the degree to which selection procedure scores are measured
consistently across one or more sources of error such as time, test forms, or administrative settings.
Reliability has an impact on validity in that to the extent the selection procedure is not reliable it will
be more difficult to make accurate predictions from the selection procedure scores. Excellent treat-
ments of reliability may be found in McDonald (1999), Nunnally and Bernstein (1994), Pedhazur
and Schmelkin (1991), and Traub (1994).
The reliability chapter of the Standards develops many of the basic concepts embedded in reli-
ability theory. It is important to note that there is no single index of reliability that measures all the
variables that influence the accuracy of measurement. The two major theoretical positions regarding
the meaning of reliability are classical reliability theory and generalizability theory. Under classical
reliability theory, a test score is comprised of two components: a true score and error. Hypothetically,
a true score can be obtained by calculating the average of numerous repeated applications of the
same test. The comparable concept under generalizability theory is a universe score. The hypotheti-
cal difference between an observed score and a universe score is measurement error, which is random
and unpredictable. A generalizability coefficient is similar to a reliability coefficient in that both are
defined as the ratio of true (or universe) score variance to observed score variance. The major difference
is that generalizability theory uses analysis of variance techniques to partition observed score variance
into universe score variance and one or more sources of error variance.
What is important is that the method used to determine reliability be appropriate to the data and
setting at hand and that all procedures be clearly reported. Furthermore, various reliability indices
(e.g., test-retest, internal consistency) are not equivalent and should not be interpreted as being inter-
changeable; accordingly, one should not state that the “reliability of test X is …”, but rather should
state “the test-retest reliability of test X is …”. Finally, the reliability of selection procedure scoring
by examiners does not imply high candidate consistency in responding to one item versus the next
item that is embedded in a selection procedure. In other words, just because the scoring of a test is
reliable does not mean that the test itself is reliable.
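To illustrate why different reliability indices should not be treated as interchangeable, the sketch below computes two of them, coefficient alpha (an internal-consistency estimate) and a test-retest correlation (a stability estimate), for the same hypothetical test; the simulated data and their properties are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 examinees answer the same 10-item test on two occasions.
# Scores contain a stable true-score component, an occasion-specific component,
# and item-level noise, so internal consistency and stability need not agree.
n_people, n_items = 200, 10
true_score = rng.normal(size=(n_people, 1))
occasion1 = rng.normal(scale=0.5, size=(n_people, 1))
occasion2 = rng.normal(scale=0.5, size=(n_people, 1))
time1 = true_score + occasion1 + rng.normal(scale=0.8, size=(n_people, n_items))
time2 = true_score + occasion2 + rng.normal(scale=0.8, size=(n_people, n_items))

def cronbach_alpha(items):
    """Internal-consistency reliability from a persons-by-items score matrix."""
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var / total_var)

alpha = cronbach_alpha(time1)
test_retest = np.corrcoef(time1.sum(axis=1), time2.sum(axis=1))[0, 1]

# The two indices answer different questions and are not interchangeable.
print(f"coefficient alpha (occasion 1): {alpha:.2f}")
print(f"test-retest correlation:        {test_retest:.2f}")
```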

STANDARDS FOR EMPLOYMENT AND CREDENTIALING TESTS


Chapter 14 of the Standards is devoted to testing used for employment, licensure, and certification.
In the employment setting, tests are most frequently used for selection, placement and promotion.

There are 17 standards set forth in Chapter 14. They address the collection and interpretation
of validity evidence, the use of selection procedure scores, and the importance of reliability
information regarding selection procedure scores. The chapter’s introduction emphasizes that
the contents of many other chapters in the Standards also are relevant to employment testing.
One point of emphasis in Chapter 14 is the influence of context on the use of a selection proce-
dure. There are nine contextual features identified, which by their labels are self-explanatory (see
Standards, pp. 151–153).

• Internal versus external candidate pool
• Untrained versus specialized jobs
• Short-term versus long-term focus
• Screen-in versus screen-out
• Mechanical versus judgmental decision-making (when interpreting test scores)
• Ongoing versus one-time test use
• Fixed applicant pool versus continuous flow
• Small versus large sample sizes
• Size of applicant pool, relative to number of job openings (selection ratio)

The Standards emphasizes that the validation process in employment settings is usually grounded
in only two of the five sources of validity evidence: relations to other variables and content. One or
both types of evidence can be used to evaluate how well a selection procedure (predictor) predicts
a relevant outcome (criterion). However, there is no preferred or more correct
method of establishing validity; rather, the selection situation and professional judgment should
determine which source(s) of evidence are appropriate.

EVALUATING VALIDITY EVIDENCE


Perfect prediction does not occur, and the evaluation of validity evidence is often completed on
a comparative basis (e.g., how an observed validity coefficient compares to coefficients reported
in the literature for the same or similar constructs). Consideration may be given to available
and valid alternative selection procedures, utility, concerns about applicant reactions, social
issues, and organizational values. Any or all of these considerations could influence the final
conclusions drawn about the validity evidence as well as the implementation of the selection
procedure.

PROFESSIONAL AND OCCUPATIONAL CREDENTIALING


In Chapter 14, the Standards also address the specific instance of credentialing or licensing
procedures that are intended to confirm that individuals (e.g., medical doctors or nuclear power
plant operators) possess the knowledge or skills to the degree that they can safely and/or effectively
perform certain important occupational activities. The credentialing or licensing procedures
are intended to be strict to provide the public as well as governmental and regulatory agencies
with sound information regarding the capabilities of practitioners. The procedures are designed
to have a gate-keeping role and often include a written examination as well as other specific
qualifications (e.g., education or supervised experience). Content validity evidence is usually
obtained to support the use of the credentialing procedures, because criterion information is gen-
erally not available. Establishing a cut score is a critical component of the validation process
and is usually determined by SMEs. Arbitrary passing scores, such as 70% correct, typically
are not very useful and may not define the level of credentialing procedure success equivalent
to acceptable job performance and the assurance that there will be no resultant harm to the
public.

REVIEW OF THE 17 STANDARDS IN CHAPTER 14


Although there is no intention to discuss all 17 standards in detail, there are certain standards that
deserve some emphasis because they bear directly on issues that have emerged as especially impor-
tant in employment selection decision-making. A brief discussion of these standards follows.

• The objective of the employment selection procedure should be set forth, and an indication
of how well that objective has been met should be determined.
• Decisions regarding the conduct of validation studies should take into consideration prior
relevant research, technical feasibility, and the conditions that could influence prior and
contemplated validation efforts.
• When used, the representativeness of the criterion (which could be important work behaviors,
work output, or job-relevant training) should be documented.
• Inference about the content validity of a selection procedure for use in a new situation is
only appropriate when “critical job content factors are substantially the same (as deter-
mined by a job analysis), the reading level of the test material does not exceed that appro-
priate for the new job, and there are no discernable features of the new situation that would
substantially change the original meaning of the test material.” (Standards, p. 160)
• When multiple sources of information are available to decision-makers in an employment
process, the use of each informational component should be supported by validity evidence.
Furthermore, the role played by each component as it is integrated into a final decision should
be clearly explained. In credentialing situations, the rules and procedures followed when com-
bining scores from multiple information sources should be made available to candidates.
• Cut scores for credentialing tests should be determined on the basis of the skill or knowledge
level necessary for acceptable job performance and not on the basis of the number or propor-
tion of candidates passing.

FAIRNESS
For the first time an entire chapter in the Standards (pp. 73–84) has been devoted to fairness. In the
1985 edition there were no standards related to fairness; in 1999 there is considerable background
information plus 12 standards devoted to the topic. Fairness is defined in the Standards as “the
principle that every test-taker should be assessed in an equitable way” (p. 175). There are several
viewpoints as to exactly what fairness means, and several of these perspectives are addressed in the
chapter as described below.

• Candidates of equal standing on the construct being measured should, on average, obtain
the same test score, irrespective of group membership. The requirement that the overall
passing rates be comparable or equal across groups is not a definition that is accepted in
the professional literature.
• There is agreement that equality of treatment for all candidates is mandatory and is one
acceptable definition of fairness.
• The absence of bias in the test instrument itself or how it is used is also an acceptable
definition of fairness.
• The fourth definition discussed in the Standards gives consideration to the opportunity to
learn the material assessed by a test and is specific to educational achievement testing.

In employment testing, the issue of fairness is typically addressed by statistically examining test
results for evidence of bias. It is not simply a matter of whether or not test score averages differ by
majority and minority groups, but whether or not there are differences in test score predictions by
subgroup. If the predictions are equivalent (i.e., no difference in the slopes or intercepts), then there is
no bias. Another statistical perspective is that of differential item functioning (DIF). In this instance
if there is bias, candidates of equal ability differ in their responses to a specific item according to their
group membership. Unfortunately, the underlying reason for DIF, when it has been observed, has not
been apparent; oftentimes items work to the favor of one group relative to another for no explainable
reason such as item content. Use of sensitivity review panels that are comprised of individuals repre-
sentative of the subgroups of interest has been one mechanism that is intended to prevent item content
being relevant for one group but not another. Thus, members of the review panel would be expected to
flag items that were potentially unfair to a subgroup. However, there has not been research evidence
indicating that sensitivity review panels find a great deal to alter in test item content.
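One common statistical screen for uniform DIF, sketched below with simulated data, is to regress responses to a single item on total test score and group membership; a reliable group effect after conditioning on total score flags the item for review. This is a simplified illustration of DIF screening in general (operational programs often use procedures such as Mantel-Haenszel), and the data, sample sizes, and effect sizes are invented.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulated data: responses to one item (0/1), a total test score used as the
# conditioning variable, and a group code (0 = reference, 1 = focal).  The item is
# built to be harder for the focal group at the same ability, i.e., uniform DIF.
n = 1000
group = rng.integers(0, 2, size=n)
ability = rng.normal(size=n)
total_score = np.round(20 + 5 * ability + rng.normal(scale=2, size=n))
prob_correct = 1 / (1 + np.exp(-(1.2 * ability - 0.6 * group)))
item = rng.binomial(1, prob_correct)

# Logistic regression DIF screen: item response regressed on total score and group.
X = sm.add_constant(np.column_stack([total_score, group]))
fit = sm.Logit(item, X).fit(disp=False)

# A group coefficient reliably different from zero, holding total score constant,
# would flag the item for content/sensitivity review.
print("coefficients [intercept, total_score, group]:", np.round(fit.params, 3))
print("p-values:                                    ", np.round(fit.pvalues, 4))
```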

SELECTION PROCEDURE DEVELOPMENT AND ADMINISTRATION


Chapters 3–6 in the Standards are concerned with the development, implementation, and documen-
tation of selection procedures. For the most part, the discussions and standards are very technical in
nature and will not be reviewed in this chapter. However, a couple of topics that are very relevant to
employment selection will be mentioned.

• A cut score is used to partition candidates into two groups: one passing or successful and
the other not passing or not successful. There is no single or best method for setting a cut
score. Further, because selection procedures are not perfect, there will be errors—some
candidates will pass who do not truly have adequate skills (false positives), and some will
fail when in fact they do have adequate skills (false negatives). Changing a cut score to cor-
rect for one concern will increase the occurrence of the other. Thus, professional judgment
always must be relied upon when setting a cut score.
• Normative data should be clearly documented in terms of demographics, sampling proce-
dures, descriptive statistics, and the precision of the norms.
• The psychometric characteristics of different forms of the same test should be clearly doc-
umented, and the rationale for any claim of equivalency in using test scores from different
test forms must be reported.
• Standardization in the administration procedures is extremely important, and all instruc-
tions should be clear.
• Selection procedures and results (including individual scores) should be treated as confi-
dential and kept in a secure manner.
• Documentation for a selection procedure typically includes information about intended
purpose; prior research evidence; the development process; technical information regard-
ing validity, reliability, fairness, score interpretation, scaling, or norming, if appropriate;
administration; and use of the results (e.g., pass/fail).

RIGHTS AND RESPONSIBILITIES


There are four chapters in the Standards (Chapters 8–11) that discuss such matters as test user and
test-taker rights and responsibilities, testing individuals with diverse linguistic backgrounds and
assessing individuals with disabilities. The standards set forth in these chapters are for the most part
concerned with policy and administrative issues. Generally, these matters become more relevant in
specialized circumstances (e.g., an applicant with a verified disability who needs an accommoda-
tion to a selection procedure). Professional judgment is typically required because of the individual-
ized nature of the conditions.

SUMMARY
The 1999 Standards reflects the state of the science and much of the most current professional
knowledge available regarding psychological testing. As in the past, the Standards will be revised in
the future and that process is now underway. Nevertheless, the Standards is extremely informative
about the requirements associated with the development and application of a selection procedure
in an employment setting. The document has been published to promote the professionally sound
and ethical use of selection procedures, and to provide a set of standards that can be the basis for
developing and implementing a new selection procedure, or for evaluating the quality of an existing
selection procedure and practice.

PRINCIPLES FOR THE VALIDATION AND USE OF


PERSONNEL SELECTION PROCEDURES
BRIEF HISTORY
The first edition of the Principles was published in 1975 in response to the growing concern about
the need for professional standards for validation research that was identified by the leadership of
Division 14 of the American Psychological Association. Furthermore, because early versions of
what became the Uniform Guidelines were being prepared by various governmental organizations,
Division 14 representatives wanted to set forth the perspective of industrial and organizational (I-O)
psychology, particularly with regard to validation studies. The second edition was published 5 years
later and, for the first and only time, cited specific references regarding equal employment opportu-
nity and associated litigation. Because of continuing changes in employment case law, subsequent
editions have not attempted to stay current with them. Further, it has not been the purpose of the
Principles to interpret these cases in terms of the science of I-O psychology.
In 1987 the third edition of the Principles was published by SIOP. This edition consisted of
36 pages of text and 64 citations to published research to support the various principles contained in
the document. There was an appended glossary that defined 76 terms used in the Principles.
The fourth edition of the Principles was published by SIOP and adopted as policy by the APA in
2003. This edition consists of 45 pages of text and an appended glossary of 126 terms. There are 65
research literature citations that support the scientific findings and professional practices that under-
lie the principles for conducting validation research and using selection procedures in the employ-
ment setting. The increase in glossary terms reflects some of the more recent scientific findings and
thinking related to such topics as generalized evidence of validity, work analysis, internal structure
validity evidence, models of reliability and fairness, and test development and implementation.

PURPOSE OF THE PRINCIPLES


The Principles establishes ideals and sets forth expectations for the validation process and the pro-
fessional administration of selection procedures. The document also can inform those responsible
for authorizing the implementation of a validation study and/or selection procedure. The Principles
does not attempt to interpret federal, state, or local statutes, regulations, or case law related to mat-
ters of employment discrimination. However, the Principles is intended to inform decision-making in
employment administration and litigation and offers technical and professional guidance that can
help others (human resource professionals, judges, and lawyers) understand and reach conclusions
about the validation and use of employment selection processes.

PRINCIPLES VERSUS THE STANDARDS


The Principles was revised in 2003 with the full understanding that the document would be consis-
tent with the Standards. This is especially true with regard to the psychometric topics of validity,
reliability, and bias. Both documents are grounded in research and express a consensus of profes-
sional opinion regarding knowledge and practice in personnel selection. However, there are also
some important differences between the two documents.

First, the Principles does not enumerate a list of specific principles in the
manner in which the Standards sets forth its 264 standards. Consequently, the Principles is more aspi-
rational and facilitative in content, whereas the Standards is more authoritative in nature.
Second, the Standards is much broader than the Principles with respect to psychological mea-
surement. For example, although many of the concepts expressed in the Principles could be relevant
to the field of educational testing, the Standards directly addresses the topic. The same is true for
such topics as testing in program evaluation and public policy.
Third, the Standards gives somewhat more attention to the rights and responsibilities of test-
takers, whereas the Principles focuses more on the responsibilities of selection procedure develop-
ers and users. This is no doubt at least partially due to the fact that the Principles places most of
the responsibility for proper selection processes on the employer rather than the candidate, whereas
the Standards considers a much wider group of test-takers to include students, patients, counselees,
and applicants.
Fourth, and finally, the Principles provides more guidance on how to plan a validation effort
and collect validity evidence within the context of an employment setting. Consequently, there is
more discussion of such topics as feasibility of a validation study; strategies for collecting informa-
tion about the work and work requirements, as well as about job applicants or incumbents and their
capabilities; analyzing data including such topics as multiple hurdles versus compensatory models,
cut-off scores, rank orders, and banding; and information to be included in an administrative guide
for selection procedure users.

APPLICATION TO LITIGATION
The Principles offers the most up-to-date and relevant information and guidance regarding person-
nel selection procedures that might be the subject of litigation. Although the document is not written
in absolute terms, it provides a wealth of information that defines best practices in the validation
and implementation processes required to properly use a selection procedure. When examining the
qualities of a validation study or the implementation of a selection procedure, a decision-maker in
litigation proceedings might find that one or more expectations set forth in the Principles were not
met and ask why. Sound and logical explanations might be forthcoming; if not, then the unexplained
issues could be strong indicators that the processes being scrutinized were not established in accord
with accepted professional expectations.

ANALYSIS OF WORK
Given that the Principles is focused on selection procedures in the employment setting, there is
a particular emphasis on the analysis of work. Such an analysis establishes the foundation for
collecting validity evidence in two ways. First, information from the analysis of work defines
relevant worker requirements and determines the KSAOs needed by a worker to perform
successfully in a work setting. Second, the work analysis defines the criterion measures that, when
appropriate for the validation strategy being used, indicate when employees have successfully
accomplished relevant work objectives and organizational goals.
Historically, the analysis of work was labeled “job analysis,” and that term is still frequently
used. The Principles expanded the term to the analysis of work in recognition that the concept
of a traditional job is changing. Furthermore, the “analysis” should
incorporate the collection of data about the workers, the organization, and the work environment, as
well as the specific job or some future job if that is relevant to the study. As implied by the various
permutations that might be considered, there is no one preferred method or universal approach that
is appropriate for completing an analysis of work.
The Principles encourages the development of a strategy and a sampling plan to guide an analy-
sis of work. Further, the analysis of work should be conducted at a level of detail consistent with
the intended use and availability of the work information. Any method used and outcomes obtained
should be well documented in a written report.

VALIDATION
The Principles adopts the same definition of validity as given in the Standards. Validity is a unitary
construct and there are different sources of evidence that can contribute to the degree to which there
is scientific support for the interpretation of selection procedure scores for their proposed purpose.
If a selection procedure is found to yield valid interpretations, it can be said to be job-related.
The Principles recognizes the five sources of evidence discussed in the Standards. However, the
Principles places more emphasis on the two sources of evidence most frequently relied upon when
studying validity in the employment context—criterion-related and content validity.

Criterion-Related Validity Evidence


The Principles emphasizes several issues related to obtaining criterion-related validity evidence.

• Feasibility: Is it technically feasible to conduct the study in terms of measures, sample
sizes, and other factors that might unduly influence the outcomes?
• Design: Is a concurrent or predictive design most appropriate?
• Criterion: Is the criterion relevant, sufficient, uncontaminated, and reliable?
• Construct equivalence: Is the predictor measuring the same construct underlying the
criterion?
• Predictor: Is the selection procedure theoretically sound, uncontaminated, and reliable?
• Participants: Is the sample of individuals in the study representative, and will it support
the generalization of results?
• Analyses: Are the analytical methods to be used appropriate for the data collected?
• Strength of relationships: What effect size and statistical significance or confidence intervals
were hypothesized and observed?
• Adjustments: What adjustments are necessary, given that uncorrected observed valid-
ity relationships typically underestimate the predictor-criterion relationship? It may be
appropriate to adjust for restriction in range and for unreliability in the criterion (a brief
sketch of these corrections follows this list).
• Combining predictors/criteria: How are predictor and/or criteria scores weighted if
combined?
• Cross-validation: Should the estimates of validity be cross-validated to avoid capitalization
on chance? Typically, when regression analyses are used and the sample is small, adjustments
should be made using a shrinkage formula or a cross-validation design.
• Interpretation: Are the results observed consistent with theory and past research
findings?
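As referenced in the Adjustments item above, the following sketch shows the standard corrections for direct range restriction and for criterion unreliability, along with Wherry's shrinkage-adjusted R-squared relevant to the Cross-validation item. All numeric inputs are invented for illustration, and in practice the corrections are combined following established procedures (e.g., Hunter & Schmidt, 1990).

```python
import math

def correct_range_restriction(r, sd_unrestricted, sd_restricted):
    """Correction for direct range restriction on the predictor (Thorndike Case II)."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1 - r**2 + (r**2) * (u**2))

def correct_criterion_unreliability(r, criterion_reliability):
    """Correction of the validity coefficient for unreliability in the criterion only."""
    return r / math.sqrt(criterion_reliability)

def wherry_adjusted_r2(r2, n, k):
    """Shrinkage-adjusted R-squared for a regression with k predictors and n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical inputs: observed r = .25 in an incumbent sample whose predictor SD is 8,
# versus an applicant-pool SD of 12, and a criterion reliability of .60.  Each correction
# is applied separately here for clarity.
r_obs = 0.25
print(f"corrected for range restriction:       {correct_range_restriction(r_obs, 12, 8):.3f}")
print(f"corrected for criterion unreliability: {correct_criterion_unreliability(r_obs, 0.60):.3f}")

# Cross-validation/shrinkage illustration: R-squared of .20 from 4 predictors and 80 cases.
print(f"Wherry-adjusted R-squared:             {wherry_adjusted_r2(0.20, n=80, k=4):.3f}")
```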

Content Validity Evidence


The Principles also emphasizes several issues related to obtaining content validity evidence.

• Feasibility: Are the job determinant conditions (e.g., is the work stable or constantly chang-
ing), worker-related variables (e.g., are past experiences no longer relevant for the current
work), or contextual matters (e.g., are the work conditions extremely different from the
testing environment) that might influence the outcome of the validity study under sufficient
control so as to not contaminate the study?
• Design: Has an adequate sample of important work behaviors and/or worker KSAOs been
obtained and analyzed?
• Content domain: Has the work content domain been accurately and thoroughly defined
and linked to the selection procedure?
• Selection procedure: Does the selection procedure adequately represent the work content
domain? The fidelity of this relationship is the basis for the validity inference.
• Sampling: Is there a sound rationale for the sampling of the work content domain?
• Specificity: Has the level of specificity necessary in the work analysis and selection proce-
dure been described in advance?
• Administrative procedures: Are there adequate guidelines established for administering and
scoring the selection procedure that will maintain the integrity of the validity evidence?

The Principles also recognizes internal structure validity evidence. The Principles points out
that evidence based on the structure of a selection procedure is not sufficient alone to establish the
validity of the procedure for predicting future work performance or other work-related behaviors
(e.g., attendance, turnover). However, consideration of the internal structure can be very helpful dur-
ing the design of a selection procedure.

GENERALIZING VALIDITY EVIDENCE


The Principles provides considerably more detail regarding the generalization of validity evidence
in comparison to the Standards. There are at least three strategies for generalizing evidence, known
as transportability, synthetic validity/job component validity, and meta-analysis. The Standards
discusses meta-analysis in some detail, but not the other two strategies.

Transportability
This strategy refers to relying on existing validity evidence to support the use of a selection pro-
cedure in a very similar but new situation. The important consideration underlying the transport
argument is work/job comparability in terms of content and requirements. Also, similarity in work
context and candidate groups may be relevant to documenting the transport argument (Gibson &
Caplinger, 2007).

Synthetic/Job Component Validity


This type of generalization relies upon the demonstrated validity of selection procedure scores for one
or more domains or components of work. The work domains or components may occur within a job or
across different jobs. If a sound relationship between a selection procedure and a work component has
been established for one or more jobs, then the validity of the procedure can be generalized to another
job that has a comparable component. As in the transportability argument, the comparability of work
content on the basis of comprehensive information is essential to the synthetic/job component
validity process (Hoffman, Rashkovsky, & D’Egidio, 2007; Johnson, 2007).

Meta-Analysis
The information on meta-analysis in the Standards and Principles is very similar. In the Principles,
meta-analysis is acknowledged as the technique that serves as the foundation for validity gener-
alization. Both documents point out that meta-analytic findings may be useful but not sufficient
to reach a conclusion about the use of a selection procedure in a specific situation. Rather, a local
validation study may be more appropriate. Both sources also emphasize that professional judg-
ment is necessary to evaluate the quality of the meta-analytic findings and their relevance to the
specific situation of interest. The general conclusion in the Principles is that meta-analytic find-
ings for cognitive tests indicate that much of the difference in validity coefficients found from
one study to the next can be attributed to statistical artifacts and sampling error (Callendar &
Osburn, 1981; Hartigan & Wigdor, 1989; Hunter & Hunter, 1984). Similar but less conclusive
evidence is accumulating for noncognitive measures (Barrick & Mount, 1991; Barrick, Mount, &
Judge, 2001; Hurtz & Donovan, 2000). Furthermore, the strength of the validity evidence may be lower for
noncognitive tests (Morgeson et al., 2007; Hogan, Davies, & Hogan, 2007).

The Principles adds to the literature by discussing the appropriateness of the technique and its
interpretation in specific situations. In general, reliance on meta-analytic results is most appropriate
when the studies contributing to the meta-analysis focus on constructs. In such instances, the find-
ings reflect the degree to which the measures of the constructs are measuring the same construct.
In contrast, when the studies in the meta-analysis focus on methods (e.g., the interview) instead of
constructs, several interpretational difficulties arise. Because different interviews measure different con-
structs, it is difficult to generalize about the interview as a method unless the features of
the interview method “are clearly understood, if the content of the procedures and meaning of the
scores are relevant for the intended purpose, and if generalization is limited to other applications
of the method that include those features” (Principles, p. 30). Generalizing from a meta-analysis
of “the” interview method to a new interview method measuring different constructs or to a new
interview that addresses a new situation is problematic when constructs do not serve as the founda-
tion of the analysis.

FAIRNESS AND BIAS


As presented in the Standards, the topics of fairness and bias are also prominent in the Principles.
The Principles endorses the definitions and positions taken by the Standards. Further, the Principles
is somewhat more precise than the Standards with regard to defining predictive bias versus mea-
surement bias.

Predictive Bias
An alternative term is differential prediction, but in either case bias has occurred if consistent, non-
zero errors of prediction are made for individuals in a particular subgroup. Multiple regression is the
typical method used to assess predictive bias, which is indicated if there are slope and/or intercept
differences observed in the model. Research on cognitive ability measures has typically supported
the conclusion that there is no predictive bias for African-American or Hispanic groups relative to
Whites, and when predictive differences are observed they indicate overprediction of the perfor-
mance of the minority group. It is also important to note that there can be mean score differences
on a selection procedure for minority versus majority subgroups but no predictive bias.
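A minimal sketch of the slope/intercept analysis described above, using simulated data in which the groups differ in mean predictor score but share a single regression line (i.e., no predictive bias); the variable names, sample size, and statsmodels-based approach are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)

# Simulated data: the minority group scores lower on the predictor on average, but the
# criterion is generated from one regression line common to both groups (no bias).
n = 400
group = rng.integers(0, 2, size=n)               # 0 = majority, 1 = minority
score = rng.normal(50, 10, size=n) - 3 * group   # mean score difference only
criterion = 0.05 * score + rng.normal(scale=1.0, size=n)
data = pd.DataFrame({"score": score, "group": group, "criterion": criterion})

# Moderated multiple regression: the score-by-group term tests for slope differences,
# and the group term (given score) tests for intercept differences.
model = smf.ols("criterion ~ score + group + score:group", data=data).fit()
print(model.params.round(3))
print(model.pvalues.round(3))
```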

Measurement Bias
This form of bias is associated with one or more irrelevant sources of variance contaminating a
predictor or criterion measure. There are not well-established approaches to assessing measurement
bias, as is the case for predictive bias. However, DIF and item sensitivity analyses are suggested as
options in the Principles, but considerable caution in the value of such analyses is also mentioned.
As noted by Sackett, Schmitt, Ellingson, and Kabin (2001), the research results indicate that item
effect is often very small and there is no consistent pattern of items that favor one group of individu-
als relative to another group.

OPERATIONAL CONSIDERATIONS
Almost one half of the Principles is devoted to operational considerations. The issues discussed
are related to initiating and designing a validation effort; analysis of work; selecting predictors, a
validation strategy, and criterion measures; data collection and analyses; implementation; recom-
mendations and reports (technical and administrative); and other circumstances that may influence
the validation effort (e.g., organizational changes; candidates with disabilities; and responsibilities
of selection procedure developers, researchers, and users).
A few of the topics discussed in the operational considerations section of the Principles could be
critical issues in the development and implementation of an employment selection procedure;
these are addressed in the following subsections.

Combining Selection Procedures


If selection procedure scores are combined in some manner, the validity of the inferences derived
from the composite is most important. In other words, it is not sufficient to simply report the validity
index for each procedure as a stand-alone predictor; rather, an index of validity should be reported
for the combined selection procedure score that is used for decision-making.

Multiple Hurdle Versus Compensatory Models


There is no definitive guidance as to which model is most appropriate; rather, each situation must
be evaluated on its own merits. It is important to realize that combining scores into a compensatory
sum may affect the overall reliability and validity of the process. When multiple predictors (with
different reliabilities and validities) are combined into a single weighted composite score, the result
is a single-stage selection decision. How each predictor is weighted will influ-
ence the psychometric characteristics of the compensatory selection procedure score, and the final
reliability and validity indices may be lower than if the predictors were used individually in a multistage
selection process (Sackett & Roth, 1996).
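The effect of weighting on a compensatory composite can be illustrated with the standard formula for the validity of a weighted sum of standardized predictors; the intercorrelations, validities, and weights below are invented for illustration.

```python
import numpy as np

# Hypothetical inputs for three standardized predictors.
R = np.array([            # predictor intercorrelation matrix
    [1.00, 0.30, 0.20],
    [0.30, 1.00, 0.25],
    [0.20, 0.25, 1.00],
])
validity = np.array([0.35, 0.25, 0.20])   # each predictor's correlation with the criterion
weights = np.array([0.5, 0.3, 0.2])       # weights used to form the compensatory composite

def composite_validity(weights, R, validity):
    """Correlation between a weighted sum of standardized predictors and the criterion."""
    return (weights @ validity) / np.sqrt(weights @ R @ weights)

# The composite score, not the individual predictors, drives the selection decision,
# so the composite's validity is what should be evaluated and documented.
print(f"validity of the weighted composite: {composite_validity(weights, R, validity):.3f}")
print("individual predictor validities:   ", validity)
```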

Cut-Off Scores Versus Rank Order


A cut-off score may be set as high or low as needed relative to the requirements of the using orga-
nization, provided that the selection procedure demonstrates linearity or monotonicity across the range of
prediction (i.e., it is valid). For cognitive predictors, the linear relationship is typically found using a
criterion-related validity model and is assumed with a content validity process. Under these circum-
stances, using a rank-order (top-down) process will maximize expected performance on the criterion.
Whether this same premise holds true for noncognitive measures has not been determined.
In a rank-order model, the score of the last person selected is the lower bound cut-off score.
A cut-off score set otherwise usually defines the score on the selection procedure below which
applicants are rejected. Professional judgments that consider KSAOs required, expectancy of suc-
cess versus failure, the cost-benefit ratio, consequences of failure, the number of openings, the
selection ratio, and organizational diversity objectives are important to setting a cut-off score. In the
case of organizational diversity objectives, using lower cut-off scores could result in higher propor-
tions of minority candidates passing some valid initial hurdle, with the expectation that subsequent
hurdles might have less adverse impact. In such instances, cut-off scores are set low with the realiza-
tion there will be a corresponding reduction in job performance and selection procedure utility, but
that the tradeoffs regarding minority hiring are sufficient to overcome such reductions.
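Purely to illustrate the tradeoff just described, the sketch below assumes normally distributed scores, a valid and monotonically related predictor, and a one-standard-deviation subgroup mean difference (all assumptions for illustration, not statements about any particular procedure). It shows how lowering the cut score raises both groups' pass rates and their ratio while lowering the expected score of those who pass.

```python
from scipy.stats import norm

# Illustrative assumptions: majority scores ~ N(0, 1), minority scores ~ N(-1, 1),
# i.e., a standardized mean difference of 1.0 on the selection procedure.
for cut in (1.0, 0.5, 0.0, -0.5):
    pass_majority = norm.sf(cut, loc=0, scale=1)
    pass_minority = norm.sf(cut, loc=-1, scale=1)
    ratio = pass_minority / pass_majority
    # Mean standardized score among majority-group candidates who pass: a rough proxy
    # for the expected criterion payoff under a linear predictor-criterion relationship.
    mean_passing_z = norm.pdf(cut) / norm.sf(cut)
    print(f"cut = {cut:+.1f}: pass rates {pass_majority:.2f} vs. {pass_minority:.2f}, "
          f"ratio = {ratio:.2f}, mean passing z = {mean_passing_z:.2f}")
```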

Utility
Gains in productivity, reductions in undesirable outcomes (e.g., accidents, manufacturing rejects, or absentee-
ism), or comparisons among alternate selection procedures can be estimated by utility computa-
tions. Typically, several assumptions must be made with considerable uncertainty to satisfy the
computational requirements of the utility models. Thus, caution should be observed in relying upon
such utility estimates.
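The most widely cited utility model is the Brogden-Cronbach-Gleser formulation; the sketch below applies it under assumed values for validity, the dollar value of a standard deviation of performance, selection ratio, tenure, and cost, which are exactly the kinds of uncertain inputs the Principles cautions about.

```python
from scipy.stats import norm

def bcg_utility(n_selected, tenure_years, validity, sd_y, selection_ratio, cost_per_applicant):
    """Brogden-Cronbach-Gleser utility estimate for top-down selection on a valid predictor."""
    z_cut = norm.isf(selection_ratio)                     # cut point implied by the selection ratio
    mean_z_selected = norm.pdf(z_cut) / selection_ratio   # mean standardized score of those selected
    n_applicants = n_selected / selection_ratio
    gain = n_selected * tenure_years * validity * sd_y * mean_z_selected
    return gain - n_applicants * cost_per_applicant

# Hypothetical inputs, for illustration only.
estimate = bcg_utility(n_selected=50, tenure_years=3, validity=0.30,
                       sd_y=12000, selection_ratio=0.25, cost_per_applicant=200)
print(f"estimated utility gain: ${estimate:,.0f}")
```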

Bands
When a range of selection procedure scores is established that considers all candidates within the
range to be equivalent, a band results. Banding will typically lower expected criterion outcomes
and selection utility when compared with top-down, rank-order selection, but these consequences may
be balanced by increased administrative ease and the possibility of increased workforce diversity.
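One frequently cited banding approach derives the band width from the standard error of the difference between two scores; a minimal sketch follows, with an assumed test standard deviation and reliability.

```python
import math

def sed_band_width(sd, reliability, confidence_z=1.96):
    """Band width based on the standard error of the difference between two scores."""
    sem = sd * math.sqrt(1 - reliability)       # standard error of measurement
    return confidence_z * sem * math.sqrt(2)    # standard error of the difference

# Assumed values for illustration: a test SD of 10 and a reliability of .85.
width = sed_band_width(sd=10, reliability=0.85)
print(f"scores within {width:.1f} points of the top score would be treated as equivalent")
```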

Technical Validation Report Requirements


Every validation study should be documented with a technical report that contains sufficient infor-
mation to allow an independent researcher to replicate the study. Such a report should accurately
present all findings, conclusions, and recommendations. In particular, the technical report should
give clear information regarding the research sample and the statistical analyses conducted, as well
as recommendations on implementation and the interpretation of the selection procedure scores.

SUMMARY
The Principles offers a very current and comprehensive resource for use by decision-makers when
developing and implementing employment selection procedures. Because the Principles is focused
specifically on the employment setting, there is frequently more guidance offered on matters that
arise in the development and use of selection procedures than will be found in the Standards.
Nevertheless, the two documents are very compatible and not at all contradictory. The Principles
has undergone substantial professional peer review and represents the official policy of the SIOP
and APA.

UNIFORM GUIDELINES ON EMPLOYEE SELECTION PROCEDURES


When the U.S. Congress passed the Equal Employment Opportunity (EEO) Act of 1972, it created
the Equal Opportunity Coordinating Council, which was comprised of the Directors/Secretaries
of the EEOC, the CSC, the Civil Rights Commission (CRC), DoJ, and the DoL. The Council was
given the mandate to develop and implement policies, practices, and agreements that would be
consistent across the agencies responsible for enforcing EEO legislation. In 1977 the Council began
developing the Uniform Guidelines document, and it was adopted on August 25, 1978 by the EEOC,
the CSC, the DoJ, and the DoL’s OFCCP with an effective date of September 25, 1978. On March 2,
1979, the EEOC, Office of Personnel Management (OPM), DoJ, DoL, and Department of Treasury
published the Questions and Answers (the Q & As) to clarify and provide a common interpretation
of the Uniform Guidelines on Employee Selection Procedures. The change in agencies adopting
the Q & As was because OPM, and to some degree the Office of Revenue Sharing of the Treasury
Department, had succeeded the CSC.
Although some psychologists participated in the development of the Uniform Guidelines, there
was not consensus from the professional associations (e.g., SIOP, APA) that the document reflected
the state of the scientific knowledge regarding the validation and use of employee selection pro-
cedures. Ad hoc committees of psychologists from SIOP and APA reviewed draft versions of the
Uniform Guidelines and offered considerable input, but most of the suggestions were not incorpo-
rated (Camera, 1996). When Congress considered revising the Uniform Guidelines in 1985, APA
offered testimony that the document was deficient with respect to differential prediction, validity
generalization, utility analysis, and validity requirements and documentation. SIOP concurred with
the APA’s concerns and further argued that the Uniform Guidelines was in error in defining con-
struct validity and in determining the acceptable types of validity evidence.

PURPOSE
The Uniform Guidelines is intended to do the following:

[I]ncorporate a single set of principles which are designed to assist employers, labor organizations,
employment agencies, and licensing and certification boards to comply with requirements of Federal
Law prohibiting employment practices which discriminate on grounds of race, color, religion, sex, and
national origin. They are designed to provide a framework for determining the proper use of tests and
other selection procedures. These guidelines do not require a user to conduct validity studies of selection
procedures where no adverse impact results. However, all users are encouraged to use selection procedures
which are valid, especially users operating under merit principles. (Section 1.B 29C.F.R.1607)

The Q & As was prepared “to interpret and clarify, but not to modify, the provisions of the Uniform
Guidelines.” (Introduction, Federal Register 43, 166, 11996–12009, March, 1979)

All subsequent references in this chapter to the Uniform Guidelines should be considered to be
inclusive of the Q & As.

APPLICATION AND LIMITATIONS


The Uniform Guidelines applies to Title VII of the Civil Rights Act, Executive Order 11246, and
EEO laws regarding race, color, religion, sex, and national origin. They do not apply to the Age
Discrimination in Employment Act of 1967, nor to sections 501, 503, and 504 of the Rehabilitation
Act of 1973, which prohibit discrimination on the basis of disability. Because the Americans
with Disabilities Act (ADA) was not enacted until 1990, the Uniform Guidelines was not able to
address this legislation and the protection it affords people with disabilities. Generally, the Uniform
Guidelines applies to most public and private sector employers.

SELECTION PROCEDURES/EMPLOYMENT DECISIONS


In general, the Uniform Guidelines defines selection procedures and employment decisions in a
manner similar to the Standards and the Principles. Thus, processes related to hiring, promotion,
retention, and certification are covered. These processes would include tests, assessment centers,
interview protocols, scored applications, physical ability measures, work samples, and performance
evaluations. Further, the Uniform Guidelines applies to any intermediate process (e.g., having to
complete a certification program to be eligible for a promotion) that leads to a covered employ-
ment decision. Two practices are exempt or are not considered selection procedures: recruitment
(excluded to protect the affirmative recruitment of minorities and women) and bona fide seniority
systems.

DISCRIMINATION/ADVERSE IMPACT
The Uniform Guidelines explicitly defines discrimination and introduces the term “adverse impact.”
In essence, discrimination occurs when a selection procedure results in unjustifiable adverse impact.
Adverse impact occurs when the selection rate for a protected group is less than four-fifths (80%)
of the rate for the group with the highest rate (typically the nonprotected group). To illustrate, if the
passing rate for the majority group is 60% whereas the passing rate for a protected group is 40%,
then 40/60 yields 67%, which is less than 80%, and consequently the conclusion may be adverse
impact. If, on the other hand, the passing rate of the protected group were 50%, then 50/60 yields
83%, and the conclusion would be no adverse impact. This “rule of thumb” is not intended as a legal
definition, and for good reason, because it is problematic from a couple of perspectives. First, it is
highly influenced by sample size. For example, if there are 50 male and 50 female applicants and
20 open positions, the only way a selection process will not violate the 80% rule is to hire at least
9 females (a selection rate of 9/50 = 18%); with 11 males hired, the male selection rate is 22% (11/50),
and 18/22 = 82%, which satisfies the rule. Note that if there were 500 male and 500 female applicants,
the same selection rates of 22% and 18% would yield 110 males and 90 females hired; despite the
much larger absolute difference of 20 hires, the 80% rule would still indicate no adverse impact.
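A minimal sketch of the four-fifths computation, reproducing the 50-applicants-per-group example above; the calculation is a screening rule of thumb, not a statistical test.

```python
def impact_ratio(selected_focal, applicants_focal, selected_reference, applicants_reference):
    """Selection-rate ratio used in the four-fifths (80%) rule of thumb."""
    rate_focal = selected_focal / applicants_focal
    rate_reference = selected_reference / applicants_reference
    return rate_focal / rate_reference

# 20 openings, 50 male and 50 female applicants: 9 females and 11 males hired.
ratio = impact_ratio(9, 50, 11, 50)
print(f"9 of 50 vs. 11 of 50 hired: ratio = {ratio:.2f} "
      f"({'no adverse impact' if ratio >= 0.80 else 'adverse impact'} under the rule of thumb)")

# One fewer female hire (8 vs. 12) drops the ratio below the four-fifths threshold.
print(f"8 of 50 vs. 12 of 50 hired: ratio = {impact_ratio(8, 50, 12, 50):.2f}")
```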
Second, and perhaps most important, the 80% rule of thumb is not a statistical test. The null
hypothesis is not stated, and there is no estimate of the likelihood of any difference observed being
because of chance. Such hypothesis testing is accomplished using binomial or hypergeometric prob-
ability models. Typically the .05 level of statistical significance under a two-tailed test (e.g., 1.96
standard deviation units) is considered the threshold of significance. Although the 80% value has no
standing in the scientific literature, the .05 level of significance is well accepted in social sciences
research as indicating statistical significance. But this test also has its practical limitation because
statistical significance is a function of sample size. A difference of 5 percentage points in selection rates between two groups
would be statistically significant if the total sample were in the thousands, but would not be statistically
significant if the total sample were small (e.g., 30 in total). Although the Uniform Guidelines acknowledges in
Section 3D that statistical significance is affected by “small numbers,” it
does not provide guidance as to which strategy is favored:
the 80% rule or statistical difference. This issue then becomes a point of argument in litigation.
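To illustrate the sample-size dependence just described, the sketch below applies a two-sided two-proportion z-test to the same 5-percentage-point difference in selection rates at two sample sizes; the specific rates and group sizes are assumptions chosen to mirror the text.

```python
import math
from scipy.stats import norm

def two_proportion_z(rate1, rate2, n1, n2):
    """Two-sided two-proportion z-test on a difference in selection rates."""
    pooled = (rate1 * n1 + rate2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (rate1 - rate2) / se
    return z, 2 * norm.sf(abs(z))

# The same 5-point difference in selection rates (60% vs. 55%) at two sample sizes.
for n_per_group in (1000, 15):
    z, p = two_proportion_z(0.60, 0.55, n_per_group, n_per_group)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n_per_group:4d}: z = {z:.2f}, p = {p:.3f} ({verdict} at .05, two-tailed)")
```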

FAIRNESS
This concept is introduced in the discussion of criterion-related validity [see Sec. 7.B (3) and Sec.
14.B (8)]. The Uniform Guidelines requires that a fairness investigation of a selection procedure
be conducted if technically feasible before applying validity evidence from one situation to a new
situation. Further, if adverse impact is observed and data from a criterion-related validation study
are available, the user is expected to conduct a fairness analysis. Unfairness occurs when lower
minority scores on a selection procedure are not reflected in lower scores on the criterion or index
of job performance. The Standards and Principles consider this a matter of predictive bias, and it
is found when consistent nonzero errors of prediction occur for a protected subgroup. Moderated
multiple regression is the accepted statistical method for examining predictive bias, which occurs if
there are slope and/or intercept differences between subgroups. As previously mentioned, there is
no consistent research evidence supporting predictive bias on cognitive tests for African Americans
or Hispanics relative to Whites. Research studies of noncognitive test results by subgroups have not
yet been conducted in sufficient numbers to draw any definitive conclusions.

CUT-OFF SCORES
Cut-off scores are discussed first in the Uniform Guidelines as part of the general standards for
validity studies (Sec. 5. H.) and then in the Technical standards section [Sec. 14. B. (6)]. According
to the Uniform Guidelines, “Where cutoff scores are used, they should normally be set so as to
be reasonable and consistent with normal expectations of acceptable proficiency within the work
force” (Sec. 5. H.).
This definition seems to imply the need for professional judgment in setting a cut-off score, and
such a stance is consistent with the Standards and the Principles.

BOTTOM LINE
Another concept introduced by the Uniform Guidelines when trying to assess adverse impact or
discrimination is the bottom-line approach. If there are multiple components to a selection proce-
dure the final decision point is evaluated for adverse impact. According to the Uniform Guidelines,
only if the adverse impact occurs at the bottom line must the individual components of a selection
procedure be evaluated. However, this concept was struck down by the U.S. Supreme Court in
Connecticut vs. Teal (1982). Hence, today it is typical that all components of a selection procedure
are evaluated in terms of adverse impact if they can be examined individually.

ALTERNATIVE SELECTION PROCEDURE


The Uniform Guidelines introduced the concept that if two or more selection procedures are available
that serve the user’s interest and have substantially equal validity, then the procedure demonstrating
the lesser amount of adverse impact should be used. Although conceptually the alternative selection
procedure is understandable, it is difficult to contend with in practice. There is no clear definition
of “substantially equal validity.” Although there are many alternatives, which ones might have lesser
amounts of adverse impact in a given situation is not easily discerned. Oftentimes, the degree of
adverse impact observed is very specific to the numbers and qualifications of applicants at a particular
point in time. There is also the question of what constitutes “lesser adverse impact.” Finally,
many selection procedures are available, “which serve the user’s legitimate interest in efficient and
trustworthy workmanship” but still may not be a feasible alternative (see 3.B.). Examples of concerns
include faking or response distortions of personality and biodata inventories, costs of development
and implementation, and the ability to assess very large numbers of applicants at the same time.
Also of note is the general application of the “alternative selection procedure” section of the
Uniform Guidelines, Section 3B. Whereas most of the attention in the literature and litigation has
focused on alternative procedures, the Uniform Guidelines also considers “an investigation of …
suitable alternative methods of using the selection procedure which have as little adverse impact as
possible.” Thus, application of a particular method in a given situation might be used as pass-fail
instead of as top-down selection.

JOB-RELATEDNESS/BUSINESS NECESSITY
An employment selection procedure that has adverse impact may be justified in two ways: (a) show-
ing that the procedure is job-related or (b) showing that the procedure is justified by business neces-
sity. Job-relatedness is demonstrated by the validation process. Business necessity is demonstrated
when a selection procedure is necessary for the safe and efficient operation of the business entity.
Oftentimes there are relevant statutes and regulations that define the business necessity argument
(e.g., legislation regarding public safety job requirements); other times, information from the analysis
of work will demonstrate the business necessity of a selection procedure.

VALIDITY
The Uniform Guidelines sets forth what are considered acceptable types of validity studies by
the enforcement agencies and identifies three types: criterion-related, content, and construct. The
document notes that new validation strategies “will be evaluated as they become accepted by the
psychological profession” (see 5.A.). The Uniform Guidelines also states that the validation provi-
sions “are intended to be consistent with generally accepted professional standards … such as those
described in the Standards for Educational and Psychological Tests … and standard textbooks and
journals in the field of personnel selection” (see 5.C). Of course the Standards being referred to
were published in 1974, and there have been two major revisions published in 1985 and 1999. The
Uniform Guidelines makes no specific reference to the Principles although the first edition was pub-
lished in 1975. Consequently, it is easy to understand how the treatment of validity by the Uniform
Guidelines is not particularly consistent with the state of the scientific knowledge as set forth in the
current editions of the Standards and the Principles.
When introducing validity, the Uniform Guidelines offers several warnings or conditions.

• Do not select on the basis of knowledge, skills, and abilities (KSAs) that can be learned on
the job during orientation.
• The degree of adverse impact should influence how a selection procedure is implemented,
and evidence sufficient to justify a pass/fail strategy may be insufficient for rank order.
• A selection procedure can be designed for higher level jobs if most employees can be
expected to progress to those jobs in about 5 years.
• An employer can use a selection procedure if there is substantial validity evidence from
other applications and if the employer has in progress, if technically feasible, a validity
study that will be completed in a reasonable period of time.
• Validity studies should be reviewed for currency, particularly if there may be available
alternative procedures with equal validity but less adverse impact.
• There are no substitutes for validity evidence and no assumptions of validity based on
general representation, promotional material, testimony, and the like.
• Employment agencies are subject to the guidelines in the same manner as employers.

Criterion-Related Validity
The Uniform Guidelines’ position on criterion-related validity is very consistent with the informa-
tion set forth in the Standards and Principles. Job analysis is important for decisions regarding
grouping jobs together and selecting and developing criterion measures. An overall measure of job
performance may be used as a criterion if justified by the job analysis (the Principles and Standards
emphasize the need for construct equivalence for predictor and criterion measures and typically
there is a greater degree of construct specificity than “overall performance” developed from the job
analysis); success in training also can be used as a criterion. Concurrent and predictive designs are
recognized, and emphasis is placed on the representativeness of the sample of individuals partici-
pating in the validity study, regardless of its design.
Criterion-related validity evidence should be examined using acceptable statistical procedures,
and the Uniform Guidelines establishes the .05 level of statistical significance as the threshold for
concluding that there is a relationship between a predictor and a criterion. Usually the relationship
is expressed as a correlation coefficient, which must be assessed in the particular situation: "There
are no minimum correlation coefficients applicable to all employment situations" [see 14.B.(6)].
Additionally, care must be taken to not overstate validity findings.
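As a concrete illustration of the statistical expectation described above (not an example drawn from the Uniform Guidelines or the other sources), the brief sketch below estimates a predictor-criterion correlation on hypothetical data and tests it against the .05 threshold.

```python
# A minimal sketch of the kind of analysis implied here: estimate the predictor-criterion
# correlation for a validation sample and test it at the .05 significance level.
# The data, sample size, and variable names are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 150                                               # validation sample size
predictor = rng.normal(0, 1, n)                       # selection procedure scores
criterion = 0.30 * predictor + rng.normal(0, 1, n)    # job performance measure

r, p = stats.pearsonr(predictor, criterion)
print(f"observed validity r = {r:.2f}, p = {p:.4f}")
if p < .05:
    print("Relationship is statistically significant at the .05 level.")
else:
    print("No statistically significant predictor-criterion relationship.")
```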
Content Validity
The technical standards for content validity studies begin by focusing on the appropriateness of
such a study. A selection procedure must be a representative sample of the job content or purport
to measure KSAs that are required for successful job performance. Selection procedures based on
inferences about mental abilities or that purport to measure traits such as intelligence, common
sense, or leadership cannot be supported only on the basis of content validity. Solid job analysis
information that is representative of the jobs, and when necessary is operationally defined, is critical
to a content validity argument.
The Uniform Guidelines provides for the ranking of candidates assessed by a content-valid selec-
tion procedure, given that the procedure is measuring one or more capabilities that differentiate
between levels of job performance. This is generally compatible with the guidance offered by the
Principles, although the Q & As to the Uniform Guidelines gives more examples as to when it is, or
is not, appropriate to use rank ordering.
Construct Validity
This form of validity is defined in Section 14.D (1) of the Uniform Guidelines as “a series of research
studies, which include criterion related and which may include content validity studies.” In Section
14.D (1) and (3), it is stated that a “construct” is the intermediary between the selection procedure
on the one hand and job performance on the other. A job analysis is required, and one or more constructs
that are expected to influence successful performance of important work behaviors should be identified
and defined. A selection procedure is then developed to measure the constructs of interest. To accomplish
a construct validity study, it must be empirically demonstrated "that the selection procedure is validly
related to the construct and that the construct is validly related to the performance of critical or
important work behaviors" [14.D (3)]. (This is the definition that drew the objections of APA and SIOP.)
Documentation Required
The Uniform Guidelines sets forth many documentation requirements for a validity study, and many
of these requirements are labeled “essential.” Generally speaking, the information expected as part
of the documentation effort is very consistent with the material presented in each of the various sec-
tions of the Uniform Guidelines.
Utility
One term used in the Uniform Guidelines is left undefined and could have many interpretations:
"utility." It is found in the sections dealing with the uses and applications
of a selection procedure that has been evaluated by a criterion-related validity study. Specifically,
the documentation of the methods considered for using a procedure "should include the rationale for
choosing the method of operational use, and the evidence of validity and utility of the procedure as
it is to be used (essential)” [see 15.B. (10)]. Identical sentences appear in the uses and applications
sections for content and construct validity. Furthermore, in Section 5.G. the Uniform Guidelines
states:

If a user decides to use a selection procedure on a ranking basis, and that method of use has a greater
adverse impact than use of an appropriate pass/fail basis …, the user should have sufficient evidence of
validity and utility to support the use on a ranking basis.
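Because the term is left undefined, any illustration necessarily imports an interpretation from outside the Uniform Guidelines. The sketch below adopts one common reading from the utility literature cited in this chapter's references (e.g., Cronbach & Gleser, 1965; Schmidt, Hunter, McKenzie, & Muldrow, 1979): a Brogden-Cronbach-Gleser style estimate of the dollar gain from using a valid procedure with top-down selection. All input values are hypothetical.

```python
# A hedged sketch of one plausible reading of "utility": the Brogden-Cronbach-Gleser
# estimate of the dollar gain from valid, top-down selection. The formula is drawn from
# the utility literature cited in this chapter's references, not from the Uniform
# Guidelines; all inputs are hypothetical.
from scipy import stats

n_selected = 50          # number of applicants hired per year
validity = 0.35          # operational validity of the procedure (r_xy)
sd_y = 20000.0           # SD of job performance in dollars (SD_y), often judgmentally estimated
selection_ratio = 0.20   # proportion of applicants hired
cost_per_applicant = 75.0
n_applicants = n_selected / selection_ratio

# Mean standardized predictor score of those selected under top-down selection:
# ordinate of the normal curve at the cutoff, divided by the selection ratio.
cutoff = stats.norm.ppf(1 - selection_ratio)
mean_z_selected = stats.norm.pdf(cutoff) / selection_ratio

gain = n_selected * validity * sd_y * mean_z_selected
total_cost = n_applicants * cost_per_applicant
print(f"Estimated annual utility: ${gain - total_cost:,.0f}")
```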

COMPARISONS AMONG THE THREE AUTHORITIES


Given their different authorships, purposes, and dates of adoption, it is useful to compare the three
authorities to identify areas of agreement and disagreement. Such information might be particularly
valuable to a user who is deciding whether to rely on one or more of the authorities, or who has relied
on one authority without realizing what the other authorities have to say on the topic of interest.
The common themes across the three authorities are matters of validation and psychometric mea-
surement. To facilitate this discussion, Table 28.1 has been prepared to compare the three authorities
on several concepts or terms and their respective definitions or explanations. Before discussing any
of the specifics, it is readily apparent that there are many terms without definitions or explanations
under the Uniform Guidelines column. There are, no doubt, several reasons for this situation, and
two explanations follow.

• The Uniform Guidelines is some 20 years older than the Standards and 25 years older than
the Principles. The latter two documents have undergone one and two revisions, respec-
tively, since the Uniform Guidelines was published, but the Uniform Guidelines has never
been revised or brought up to date.
• The Uniform Guidelines was written to guide the enforcement of civil rights legislation.
The Standards and Principles were written to guide research and professional practice and
to inform decision-making in applicable areas of employment selection. Hence, the latter
two documents have more of a scientific focus and rely heavily on the current research
literature; the Uniform Guidelines was intended to be consistent with generally accepted
professional standards set forth in the 1974 version of the Standards, but was not necessar-
ily research-based at the time of its preparation.

STANDARDS VERSUS PRINCIPLES


There are no areas of disagreement between the Standards and the Principles. In some areas the
Standards offers more information and guidance than the Principles. Examples include (a) discus-
sions of validity evidence based on response processes, internal structure, and the consequences
of testing; (b) discussions of reliability and errors of measurement; (c) the test development and
revision process; (d) scales, norms, and score comparability; and (e) the rights and responsibilities
of test-takers. There also are a few topics that are more broadly considered in the Principles than
is true for the Standards. Examples include (a) the concept of the analysis of work (to incorporate
the work context and organizational setting) rather than job analysis; (b) clarifying that the gener-
alization of validity evidence can be accomplished by several methods, including transportability
and synthetic/job component validity, as well as being supported by meta-analysis; and (c) certain
operational considerations associated with the conduct of a validation study in an organizational
setting (e.g., communications, organizational needs and constraints, quality control and security,
implementation models, and utility).

TABLE 28.1
Validation and Psychometric Terminology Comparison

Validity (unitary concept)
• Standards: The degree to which accumulated evidence and theory support specific interpretations of test scores.
• Principles: The degree to which accumulated evidence and theory support specific interpretations of scores from a selection procedure entailed by the proposed uses of that selection procedure.
• Uniform Guidelines: Not defined.

Sources of validity evidence

(a) Relations to other variables/criterion-related
• Standards: The relationship of test scores to variables external to the test such as measures of some criteria that the test is expected to predict.
• Principles: The statistical relationship between scores on a predictor and scores on a criterion measure.
• Uniform Guidelines: Empirical data showing that the selection procedure is predictive of or significantly correlated with important elements of work behavior.

(b) Content
• Standards: The linkage between a predictor and one or more aspects of a criterion construct domain.
• Principles: The extent to which content of a selection procedure is a representative sample of work-related personal characteristics, work performance, or other work activities or outcomes.
• Uniform Guidelines: Data showing that the content of a selection procedure is representative of important aspects of performance on the job.

(c) Internal structure
• Standards: The extent to which the relationships between test items conform to the construct that is the foundation for test score interpretation.
• Principles: The degree to which psychometric and statistical relationships among items, scales, or other components within a selection procedure are consistent with the intended meanings of scores on the selection procedure.
• Uniform Guidelines: Not defined.

(d) Response process
• Standards: The study of the cognitive account of some behavior, such as making a selection procedure item response.
• Principles: The study of the cognitive account of some behavior, such as making a selection procedure item response.
• Uniform Guidelines: Not defined.

(e) Consequences of testing
• Standards: Whether or not the specific benefits expected from the use of a selection procedure are being realized.
• Principles: Evidence that consequences of selection procedure use are consistent with the intended meaning or interpretation of the selection procedure.
• Uniform Guidelines: Not defined.

Construct validity
• Standards: An indication that test scores are to be interpreted as indicating the test-taker's standing on the psychological construct measured by the test. The term construct is redundant with validity. The validity argument establishes the construct validity of a test.
• Principles: Evidence that scores on two or more selection procedures are highly related and consistent with the underlying construct; can provide convergent evidence in support of the proposed interpretation of test scores as representing a candidate's standing on the construct of interest.
• Uniform Guidelines: Data showing that the selection procedure measures the degree to which candidates have identifiable characteristics that have been determined to be important for successful job performance.

Convergent validity
• Standards: Evidence based on the relationship between test scores and other measures of the same constructs.
• Principles: Evidence of a relationship between measures intended to represent the same construct.
• Uniform Guidelines: Not defined.

Discriminant validity
• Standards: Evidence based on the relationship between test scores and measures of different constructs.
• Principles: Evidence of a lack of a relationship between measures intended to represent different constructs.
• Uniform Guidelines: Not defined.

Validity generalization
• Standards: Applying validity evidence obtained in one or more situations to other similar situations on the basis of simultaneous estimations, meta-analysis, or synthetic validation arguments.
• Principles: Evidence of validity that generalizes to setting(s) other than the setting(s) in which the original validation evidence was documented. Generalized evidence is accumulated through such strategies as transportability, synthetic/job component validity, and meta-analysis.
• Uniform Guidelines: Not defined.

Transport of validity
• Standards: Reliance on a previous study of the predictor-criterion relationship done under favorable conditions (i.e., large sample size and a relevant criterion) and the current local situation corresponds closely to the previous situation (i.e., the job requirements or underlying psychological constructs are substantially the same) as determined by a job analysis, and that the predictor is substantially the same.
• Principles: A strategy for generalizing evidence of validity in which demonstration of important similarities between different work settings is used to infer that validation evidence for a selection procedure accumulated in one work setting generalizes to another work setting.
• Uniform Guidelines: Using evidence from another study when the job incumbents from both situations perform substantially the same major work behaviors as shown by appropriate job analyses; the study should also include an evaluation of test fairness for each race, sex, and ethnic group that constitutes a significant factor in the labor market for the job(s) in question within the labor force of the organization desiring to rely on the transported evidence.

Synthetic/job component validity
• Standards: Not defined.
• Principles: Generalized evidence of validity based on previous demonstration of the validity of inferences from scores on the selection procedure or battery with respect to one or more domains of work (job components).
• Uniform Guidelines: Not defined.

Meta-analysis
• Standards: A statistical method of research in which the results from several independent, comparable studies are combined to determine the size of an overall effect or the degree of relationship between two variables.
• Principles: A statistical method of research in which results from several independent studies of comparable phenomena are combined to estimate a parameter or the degree of relationship between variables.
• Uniform Guidelines: Not defined.

Reliability
• Standards: The degree to which test scores for a group of test-takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable; the degree to which scores are free of errors of measurement for a given group.
• Principles: The degree to which scores for a group of assessees are consistent over one or more potential sources of error (e.g., time, raters, items, conditions of measurement, etc.) in the application of a measurement procedure.
• Uniform Guidelines: The term is not defined, but the reliability of selection procedures, particularly those used in a content validity study, should be of concern to the user.

Fairness/unfairness
• Standards: The principle that every test-taker should be assessed in an equitable way. There is no single technical meaning; in the employment setting fairness can be defined as an absence of bias and that all persons are treated equally in the testing process.
• Principles: There are multiple perspectives on fairness. There is agreement that issues of equitable treatment, predictive bias, and scrutiny for possible bias when subgroup differences are observed are important concerns in personnel selection; however, there is not agreement that the term "fairness" can be uniquely defined in terms of any of these issues.
• Uniform Guidelines: When members of one race, sex, or ethnic group characteristically obtain lower scores on a selection procedure than members of another group and the differences in scores are not reflected in differences in a measure of job performance.

Predictive bias
• Standards: The systematic under- or overprediction of criterion performance for people belonging to groups differentiated by characteristics not relevant to criterion performance.
• Principles: The systematic under- or overprediction of criterion performance for people belonging to groups differentiated by characteristics not relevant to criterion performance.
• Uniform Guidelines: Not defined, but see "Fairness/unfairness" above.

Cut score/cut-off score
• Standards: A specific point on a score scale such that scores at or above that point are interpreted or acted upon differently from scores below that point.
• Principles: A score at or above which applicants are selected for further consideration in the selection procedure. The cut-off score may be established on the basis of several considerations (e.g., labor market, organizational constraints, normative information). Cut-off scores are not necessarily criterion referenced, and different organizations may establish different cut-off scores on the same selection procedure on the basis of their needs.
• Uniform Guidelines: Cut-off scores should normally be set so as to be reasonable and consistent with normal expectations of acceptable proficiency within the workforce.

The above definitions or explanations are taken verbatim from the glossaries or definition sections of the authoritative sources whenever possible. Otherwise, the definitions were extracted from document text on the subject.

STANDARDS/PRINCIPLES VERSUS UNIFORM GUIDELINES


The comparisons set forth in Table 28.1 delineate many of the typical areas where the first two
authoritative sources do not agree with the third. In several instances, the term or concept is sim-
ply not defined or considered by the Uniform Guidelines; in other instances, there is disagreement
regarding the meaning or methods to be followed to satisfy a term or concept. A discussion of each
of these points in the order they appear in Table 28.1 follows.

Validity (Unitary Concept)


The Standards and the Principles view validity as a unitary concept, whereas the Uniform Guidelines
partitions validity into three types: criterion-related, content, and construct. This partitioning of
validity was the thinking 40 years ago, but is clearly out of date now.

Sources of Validity Evidence


(a) Relations to other variables/criterion-related: The Uniform Guidelines’ focus on work
behavior as a criterion excludes potential studies of the relationships between a selection
procedure of interest and other tests hypothesized to measure the same or different con-
structs (i.e., other external variables).
(b) Content: All three authorities agree that content validity is dependent on a sound deter-
mination that the selection procedure is a representative sample of work-related behavior.
The analysis of work (or the job) is fundamental to establishing the predictor-criterion
linkage. The Uniform Guidelines confines job requirements to a study of KSAs; the
Standards and Principles provide for the study of KSAOs and would include “O” vari-
ables in a selection procedure subject to a content validity study. The Uniform Guidelines
precludes use of a content strategy to study the validity of traits or constructs such as spa-
tial ability, common sense, judgment, or leadership. Although it is important to describe
the relevant work behavior or KSAO at a level of specificity so there is no misunderstand-
ing about what is being measured, it is not wise to reject content validity evidence simply
because it is concerned with linking an ability or personal characteristic (e.g., leadership)
to the domain of job performance. Many constructs can be defined in terms of specific
work behaviors although they have broad labels. Further, there are many situations in
which content validity may be the only option. If leadership capabilities are critical to
job performance, for example, there may not be adequate numbers of candidates or
incumbents to conduct a criterion-related study, and there may not be a sufficient and
reliable criterion to measure. Consequently, a content validity study may be the only
viable approach to evaluating the validity of a construct of interest.
(c) Internal structure/response processes/consequences of testing: These three lines of evi-
dence for a validity argument were not developed at the time the Uniform Guidelines was
written and hence are not discussed.

Construct Validity
The Uniform Guidelines treats construct validity as a separate type of validity. In the Standards and
Principles, all selection procedure scores or outcomes are viewed as measures of some construct.
Consequently, any evaluation of validity is a “construct validity” study.

Convergent and Discriminant Validity


Although these terms and their implications were well-established at the time the Uniform
Guidelines was prepared, there was no discussion about the value of these types of evidence in the
document.

Validity Generalization
The concept was known at the time the Uniform Guidelines was prepared but was not specifically
used in the document. Many have interpreted Section 7.B of the Uniform Guidelines as providing
for validity generalization arguments. The provisions of that section are described under transport
of validity evidence in Table 28.1.

Transport of Validity
The three authoritative sources agree that a work or job analysis is necessary to support the
transport of validity. However, the Uniform Guidelines goes further and requires that there be
an existing criterion-related validity study and a fairness study of the selection procedure for
relevant protected subgroups. Yet there is no guidance as to the acceptability of transporting the
validity of a selection procedure that has some demonstrated unfairness. Furthermore, in many,
if not most, situations, sample sizes preclude adequate fairness analyses.
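For readers who want a sense of what such a fairness analysis typically involves, the sketch below illustrates the moderated-regression approach to predictive bias summarized in Table 28.1: regress the criterion on test scores, a subgroup indicator, and their interaction, and examine the subgroup and interaction terms for intercept and slope differences. The data are simulated, and the example is offered as an illustration, not as a requirement of any of the three sources.

```python
# A hedged sketch of a moderated-regression (regression-model) test for predictive bias:
# criterion ~ test + group + test:group. Nonsignificant group and test:group terms are
# consistent with an absence of intercept and slope differences. All variable names and
# the simulated data are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
group = rng.integers(0, 2, n)                  # 0 = reference group, 1 = focal group
test = rng.normal(50, 10, n)                   # predictor scores
criterion = 0.4 * test + rng.normal(0, 8, n)   # common regression line for both groups

df = pd.DataFrame({"criterion": criterion, "test": test, "group": group})

# "test * group" expands to test + group + test:group
model = smf.ols("criterion ~ test * group", data=df).fit()
print(model.summary())

# As the text notes, small subgroup samples leave tests like these with little power.
```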

Synthetic/Job Component Validity


This validity generalization strategy has been known for more than 40 years but has not received
much attention in validation research conducted outside of the employment arena. Neither the
Standards nor the Uniform Guidelines has defined this strategy of validity generalization.

Meta-Analysis
Again, given the document date, the authors of the Uniform Guidelines did not have knowledge of
the research findings that have emerged from meta-analytic research. This, unfortunately, is another
void, and a significant amount of research is available today that might not be considered to be
within the scope of validation strategies acceptable under the Uniform Guidelines.
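To give a sense of what such meta-analytic evidence looks like, the following is a rough sketch of a bare-bones analysis in the spirit of Hunter and Schmidt (1990), cited in the references: compute a sample-size-weighted mean validity and then ask how much of the between-study variability could be produced by sampling error alone. The study values are hypothetical, and the artifact corrections a full analysis would include are omitted.

```python
# A rough, hedged sketch of a "bare-bones" psychometric meta-analysis: weighted mean
# validity, observed variance across studies, and expected sampling-error variance.
# Study data are hypothetical; corrections for range restriction and criterion
# unreliability are omitted for brevity.
import numpy as np

r = np.array([0.22, 0.31, 0.18, 0.27, 0.35])   # observed validities from k studies
n = np.array([120, 85, 200, 150, 95])          # sample sizes

r_bar = np.sum(n * r) / np.sum(n)                           # weighted mean validity
var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)          # observed variance of r
var_sampling = (1 - r_bar ** 2) ** 2 / (np.mean(n) - 1)     # expected sampling-error variance
var_residual = max(var_obs - var_sampling, 0.0)

print(f"Mean validity: {r_bar:.3f}")
print(f"Observed variance: {var_obs:.4f}; sampling-error variance: {var_sampling:.4f}")
print(f"Residual variance (a first hint about generalizability): {var_residual:.4f}")
```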

Reliability
The term is not defined in the Uniform Guidelines as it is in the other two authoritative sources
but is considered to be important for selection procedures that have been supported with a content
validity strategy. The Standards and Principles emphasize that the reliability of any measurement
be considered whenever it is technically feasible to do so.

Fairness/Unfairness and Bias


The Standards and the Principles consider fairness to be a very broad concept with many facets.
In contrast, the two sources consider bias to be a very specific term concerned with under- or over-
prediction of subgroup performance. This is basically the same interpretation that the Uniform
Guidelines, relying on the 1974 version of the Standards, gives to the term unfairness.

Cut Score/Cut-Off Score


The Principles gives more detailed attention to the issues underlying the setting of a cut-off score
than do the other two authoritative sources. However, there does not seem to be any significant
disagreement across the three documents as to how a cut-off score functions or the intent that a
cut-off score screen out those who will not achieve acceptable levels of job performance.

SUMMARY
There are some levels of consistency or agreement across the three authoritative sources but also
consequential areas of disagreement. It is very likely that the advances in selection procedure
research and scholarly thinking regarding validity that have occurred over the last 35 years account
for these differences. Although the Uniform Guidelines is the document that seems most deficient
in terms of knowledge of the field, it is also the first document of the three compared in terms of its
adoption. Accordingly, its deficiencies can be largely attributed to its age. But, as noted earlier in
this chapter, the Uniform Guidelines itself allowed for newer strategies and professional standards to take precedence.
Sections 5.A and 5.C acknowledge, respectively, that “New strategies for showing the validity of
selection procedures will be evaluated as they become accepted by the psychological profession”
and that “The provisions of these guidelines … are intended to be consistent with generally accepted
professional standards … and standard textbooks and journals in the field of personnel selection.”
These clauses can be interpreted to suggest that deference should be given to the Principles and
Standards where they disagree with the Uniform Guidelines. Nevertheless, sometime in the near
future it will be important for the Uniform Guidelines to be revised to reflect the current state of
the science. Until that time, the decision-maker involved in employment selection should look to the
Standards and Principles for guidance on the many issues that the Uniform Guidelines either treats
incorrectly or does not address.

FINAL THOUGHTS
SCIENCE VERSUS LITIGATION VERSUS TECHNICAL AUTHORITIES/GUIDELINES
There are, at this time, some significant inconsistencies between the technical information provided
by the Standards and Principles, on the one hand, and the Uniform Guidelines, on the other, and these
differences can be extremely important in the event of litigation regarding a selection procedure.
These differences can be resolved. Unfortunately, until a revision of the Uniform Guidelines is
forthcoming, when more than one authority is introduced in litigation in support of only one side of
an argument, resolving the differences that appear in print will need to be part of the judicial
decision-making process. In this regard, it is incumbent upon those who rely on any of the
authoritative sources during the course of litigation to be clear about the relevance and currency of
the source(s) providing guidance for their opinions.

CONCLUSIONS
In closing, there are several broad as well as some specific issues that we want to note. We will
start with the broader issues. First, what deference should be given to the Uniform Guidelines,
Principles, and Standards in guiding psychologists as they make decisions in employment settings?
We ask this question given that the three documents are in many ways static, whereas the field is
dynamic. That is, research is constantly being conducted that provides new knowledge and/or influ-
ences how we interpret behavioral phenomena. For example, it is a commonly accepted fact that the
validity of cognitive ability tests generalizes across situations and jobs (Hunter & Hunter, 1984).
Yet this was not always the accepted view; in the 1960s, validity was described as "situation
specific" (Ghiselli, 1966). If there had been three sets of sources promulgated by various agencies in
the 1960s, they most likely would have advocated "situation specificity," and the expected practice
would have been to validate tests in every situation for every job. The point of this example is
that perhaps the current sources—Uniform Guidelines, Principles, and Standards—should not be
viewed as authoritative regarding knowledge, but rather as primers for “how to conduct research”
and what factors to consider when evaluating the validity of a test.
Reliance on the sources for factual information may hamper the field. The documents are not
“living” and thus cannot account for changes due to new research. However, the practitioner or
researcher can rely on the sources with regard to how to establish the validity of a test and what
information is needed as part of the research.
Acceptance of the above premise brings us to the second broad issue. Given that the sources
are relied upon in litigation, whether introduced directly in courtroom testimony or as authoritative
references when explaining to lawyers what research we conduct and how we conduct it, the question
becomes “How sound are the sources as authoritative documents in court proceedings?” This brings
us to the utility of the sources in our practice.
We believe that psychologists recognize science when they see it; it is a method of inquiry
and not necessarily a category of knowledge. But we cannot go into court and pronounce conclu-
sions from our "science" without being required to pass what are known as "Daubert thresholds."
Daubert thresholds, or criteria, address what constitutes admissible scientific evidence. The thresholds
came from the Supreme Court's ruling in Daubert v. Merrell Dow Pharmaceuticals, Inc. (1993)
pertaining to expert testimony and scientific evidence, and they have been codified in the Federal
Rules of Evidence.

Rule 702: If scientific, technical, or other specialized knowledge will assist the trier of fact to understand
the evidence or to determine a fact in issue, a witness qualified as an expert by knowledge, skill,
experience, training, or education, may testify thereto in the form of an opinion or otherwise, if (1) the
testimony is based upon sufficient facts or data, (2) the testimony is the product of reliable principles
and methods, and (3) the witness has applied the principles and methods reliably to the facts of the
case.

Essentially, the courts have determined that our expert opinions must be reliable and valid. But note
two issues: (a) we are not certain the courts are using "reliable" and "valid" as we use them in psychology,
and (b) there is considerable room for interpretation (e.g., "sufficient facts," "reliable principles and
methods," and "applied … reliably").
But the Daubert decision provided four factors for the Courts to use when assessing the scientific
validity underlying expert testimony.

1. Testing—adequate testing; can be or has been tested
2. Has a known or potential error rate
3. Has been subjected to peer review and publication
4. Has gained general acceptance in a relevant scientific community—a proxy for validity?

The above is not a checklist nor is it exhaustive. Different courts have used it differently. The focus
has been on the expert’s principles and methodology and not necessarily on the conclusions that
the principles and methodology generate. This position reinforces our view that the three sources
should be relied upon for the procedural and methodological content and not necessarily for the
knowledge being espoused when the source was written.
Can we work with the Daubert factors? Before answering the question, let us present some con-
clusions from a survey of about 300 judges reported by Landy (2005) that focused on reported
problems with expert testimony, that is, areas where judges experienced discomfort with experts.
The results, reported in descending order of magnitude, show the following problems:

1. Experts were not objective and acted as advocates.
2. Experts were expensive.
3. Expert testimony was of questionable validity or reliability.
4. The disagreements between opposing experts defied reason.
5. There were often dramatic disparities in the level of competence of opposing experts.
6. The expert testimony was not comprehensible.
7. The expert testimony was comprehensible but useless.
8. Information about the experts was withheld by the party retaining the expert.

Given that nearly all of the cases underlying the above conclusions involved experts who were
familiar with one or more of the sources covered in this chapter, why would there be such conclu-
sions? One answer is that the sources were simply ignored by the experts. Second, the court may
have discounted the value or authority of the source. Third, the courts may not have gained an
appreciation for the technical details. Finally, the courts may prefer to rely more on common sense
than I-O knowledge. Regardless of the reason, we need to do a better job of conveying our “prin-
ciples” and “standards.”
Given the reality that we will continue to have revisions of authoritative sources, what
should those revisions address? The general headings and categories in the sources are appropriate.
What needs to be addressed are specific issues and considerations. For example, the global economy
is changing the way in which we work and with whom we work; accordingly, future sources should
address cultural issues and the changing nature of work. Among the other specific needs are the following:

1. The need to consider assessment of individuals with diverse linguistic backgrounds as well
as the need to accommodate test-takers whose first language is not English.
2. The need to consider electronic, Internet, and web-based technology and the fact that the
next generation of workers will likely have not been exposed to the same methods of train-
ing, operating, and performing at work as the current generation. Advanced technology
should provide for greater opportunity to capture actual samples or simulations of job
behaviors than are garnered in paper-and-pencil multiple-choice formats.
3. The need to identify criteria that are relatively focused on more short-term gains than those
that have been used in the past (e.g., tenure in the position for at least 1 year). Businesses
want to cut their losses (such as incorrect “hires”) much more quickly than was common in
the past.
4. The need to recognize that current tests explain, at most, approximately 25% of the variance
in criteria (see the brief illustration following this list). Although it is appropriate to concern
ourselves with searching for additional predictors, we need to consider ways in which to
broaden the criterion space and how to combine the criteria in such a fashion as to provide a
"comprehensive" picture of the worker. That is, although we can predict to a reasonable
degree (15–25% of the variance) how well entering college students may perform as
represented by the criterion of final grade point average, we need to examine other factors
that measure success in college and how these additional factors can be combined to
represent success in the "college experience."
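As a point of reference for the percentages cited in the final point above (round numbers only, not results from any particular study), the proportion of criterion variance explained is simply the square of the validity coefficient:

$$ R^{2} = r_{xy}^{2}; \qquad r_{xy} = .50 \Rightarrow R^{2} = .25 \;(25\%), \qquad r_{xy} \approx .39 \Rightarrow R^{2} \approx .15 \;(15\%). $$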

Authoritative sources that incorporate principles, guidelines, and standards have a valuable role
to play in the science of employment selection. However, the limitations inherent in such sources
must be openly recognized, and to the degree that there are disagreements or conflicts among the
sources, they should be revealed before the sources attain a stature that creates a disservice to
employees, employers, and I-O psychology.

REFERENCES
American Educational Research Association, American Psychological Association, & National Council on
Measurement in Education. (1999). Standards for educational and psychological testing. Washington,
DC: American Educational Research Association.
Barrett, G. V., Phillips, J. S., & Alexander, R. A. (1981). Concurrent and predictive validity designs: A critical
reanalysis. Journal of Applied Psychology, 66, 1–6.
Barrick, M. R., & Mount, M. K. (1991). The big five personality dimensions and job performance: A meta-analysis.
Personnel Psychology, 44, 1–26.
Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). The FFM personality dimensions and jobs: Meta-analysis
of meta-analysis. International Journal of Selection and Assessment, 9, 9–30.
Bemis, S. E. (1968). Occupational validity of the General Aptitude Test Battery. Journal of Applied Psychology,
52, 240–249.
Binning, J. F., & Barrett, G.V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential
evidential bases. Journal of Applied Psychology, 74, 478–494.
Callender, J. C., & Osburn, H. G. (1981). Testing the constancy of validity with computer-generated sampling
distributions of the multiplicative model variance method estimate: Results for petroleum industry
validation research. Journal of Applied Psychology, 66, 274–281.
Camara, W. J. (1996). Fairness and public policy in employment testing: Influences from a professional asso-
ciation. In R. S. Barrett (Ed.), Fair employment strategies in human resource management. Westport, CT:
Quorum Books.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validity by the multitrait-multimethod
motive. Psychological Bulletin, 56, 81–105.
Cascio, W. F. (2000). Costing human resources: The financial impact of behavior in organizations (4th ed.).
Cincinnati, OH: Southwestern.
Connecticut v. Teal, 457 U.S. 440 (1982).
Cronbach, L. J., & Gleser, G. C. (1965). Psychological tests and personnel decisions (2nd ed.). Urbana, IL:
University of Illinois.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52,
281–302.
Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993).
Equal Employment Opportunity Commission, Civil Service Commission, Department of Justice, & Department
of Labor. (1978). Uniform guidelines on employee selection procedures. 29 CFR, 1607.
Equal Employment Opportunity Commission, Office of Personnel Management, Department of Justice,
Department of Labor, & Department of Treasury. (1979). Questions and answers to clarify and provide
a common interpretation of the Uniform Guidelines on Employee Selection Procedures. 44FR, No. 43,
March 2, 1979.
Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York, NY: John Wiley.
Gibson, W. M., & Caplinger, J. A. (2007). Transportation of validation results. In S. M. McPhail (Ed.),
Alternative validation strategies: Developing new and leveraging existing validity evidence (pp. 29–81).
San Francisco, CA: John Wiley and Sons.
Hartigan, J. A., & Wigdor, A. K. (Eds.). (1989). Fairness in employment testing. Washington, DC: National
Academy Press.
Hoffman, C. C., Rashkovsky, B., & D’Egidio, E. (2007). Job component validity: Background, current research,
and applications. In S. M. McPhail (Ed.), Alternative validation strategies: Developing new and leverag-
ing existing validity evidence (pp. 82–121). San Francisco, CA: John Wiley and Sons.
Hogan, J., Davies, S., & Hogan, R. (2007). Generalizing personality-based validity evidence. In S. M. McPhail
(Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence (pp.
181–229). San Francisco, CA: John Wiley and Sons.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin, 96, 72–88.
Hunter, J. E., & Schmidt, F. L. (1990). Methods of meta-analysis. Newbury Park, CA: Sage.
Hurtz, G. M., & Donovan, J. J. (2000). Personality and job performance: The Big Five revisited. Journal of
Applied Psychology, 85, 869–879.
Johnson, J. W. (2007). Synthetic validity: A technique of use (finally). In S. M. McPhail (Ed.), Alternative
validation strategies: Developing new and leveraging existing validity evidence (pp. 122–158). San
Francisco, CA: John Wiley and Sons.
Landy, F. J. (2005). A judge's view: Interviews with federal judges about expert witness testimony. In F. J. Landy
(Ed.), Employment discrimination litigation: Behavioral, quantitative and legal perspectives (pp. 503–
572). San Francisco, CA: Jossey-Bass.
McDonald, R. P. (1999). Test theory: Unified treatment. Mahwah, NJ: Lawrence Erlbaum.
Messick, S. (1980). Test validity and the ethics of assessment. American Psychologist, 35, 1012–1027.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.). New York, NY: Macmillan.
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007).
Reconsidering the use of personality tests in personnel selection contexts. Personnel Psychology, 60,
683–729.
Naylor, J. C., & Shine, L.C. (1965). A table for determining the increase in mean criterion scores obtained by
using a selection device. Journal of Industrial Psychology, 3, 33–42.
Nunnally, J.C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York, NY: McGraw Hill.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests used to predict job
proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373–406.
Pedhazur, E. J., & Schmelkin, L. P. (1991). Measurement, design, and analysis: An integrated approach.
Hillsdale, NJ: Lawrence Erlbaum.
Rosenthal, R. (1991). Meta-analytic procedures for social research. Newbury Park, CA: Sage.
Sackett, P. R., & Roth, L. (1996). Multi-stage selection strategies: A Monte Carlo investigation of effects on
performance and minority hiring. Personnel Psychology, 49, 549–572.
Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment, creden-
tialing, and higher education. American Psychologist, 56, 302–318.
Schmidt, F. L., Hunter, J. E., McKenzie, R. C., & Muldrow, T. W. (1979). Impact of valid selection procedures
on work-force productivity. Journal of Applied Psychology, 64, 609–626.
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of person-
nel selection procedures (4th ed.). Bowling Green, OH: Author.
Traub, R. E. (1994). Reliability for the social sciences: Theory and applications. Thousand Oaks, CA: Sage.
Wainer, H., & Braun, H. I. (Eds.). (1988). Test validity. Hillsdale, NJ: Lawrence Erlbaum.
29 A Sampler of Legal Principles in Employment Selection

Frank J. Landy,1 Arthur Gutman, and James L. Outtz

INTRODUCTION
We are very pleased that we have been asked to contribute a chapter on legal principles for this
volume. For almost 40 years, principles of employee selection—practical and legal—have been illu-
minated through employment litigation opinions. Granted, judges write opinions, but these judges
depend heavily on industrial-organizational (I-O) psychologists in framing these opinions and are
generous in their citations to our work and our testimony. So we see a clear role for employment liti-
gation knowledge bases in this volume. In the context of this role, we have been asked to identify
some selection practices that are commonly at issue in employment discrimination cases. Further,
we have been asked to formulate some “principles” that the practitioner might keep in mind when
attempting to minimize legal risk when using such practices. Finally, we have been asked to identify
exemplar court cases related to each principle we have formulated. Here it is important to make a
distinction between our current use of the term “principle” as simply a basic and generally accepted
combination of science and practice and “the” Principles promulgated by the Society for Industrial
and Organizational Psychology (SIOP; 2003), which are currently in their fourth edition. The SIOP
Principles are not intended to illuminate exclusively legal issues, whereas our principles are. The
SIOP Principles do not ignore legal foundation but have much broader applications in mind and are
considerably greater in scope than the current “legal” principles. Neither are we intending to address
the “best” practice from a theoretical or empirical perspective. Instead, we are simply articulating
what might be seen as a “defensible” practice from the legal perspective. Nevertheless, the principles
we present are general rather than specific and are well documented through years of litigation.
Like other chapter authors in this handbook, we are constrained by allotted pages. Therefore,
we had to be selective in formulating principles and providing underlying case law. As a result, we
do not consider these principles to be exhaustive, or possibly even the most important, from a psy-
chometric or theoretical sense. Although we would consider these principles more rather than less
important than others that might have been chosen, we acknowledge that there may be other equally
important principles that we might have chosen to articulate: so many practices, so few pages.
The principles we have chosen to present either deal with a general practice (e.g., work analysis,
adverse impact), a statute (e.g., ADA, CRA, ADEA), a protected group (e.g., gender, race, age, dis-
ability), or some combination of those factors. We have not included principles that directly address
some statutes (e.g., the Equal Pay Act, Fair Labor Standards regulations); again, not because they are
trivial, unimportant, or not within the domain of I-O psychology, but because of page limitations
and distance from the general focus of this book—selection decisions. As a matter of convention
for this chapter, we have chosen to use the term "work analysis" rather than the more historically
common term "job analysis." This is in keeping with the evolution of the study of work behavior and
compatible with the structure and content of Chapter 4, this volume.

1 We are saddened by the loss of our good friend and colleague Frank J. Landy. He was a great mentor to us and a major
contributor to our field who will be missed more than words can possibly say. We dedicate this chapter to his memory. He
will live forever in our hearts.

The employment selection practices we have chosen to address in the 14 principles below repre-
sent “terms and conditions” of employment as defined in key federal laws. Terms and conditions are
broadly defined to include recruitment, hiring, training, promotion, benefits, and termination. The
term “law” is broadly defined as including statutes, executive orders, and constitutional amendments.
Broader coverage of these Equal Employment Opportunity (or EEO) laws is provided by Gutman
(2000), Landy (2005), and Outtz (2009). As a group, these EEO laws proscribe discrimination in the
workplace on the basis of race or color, religion, sex, national origin, age, and/or disability.
Title VII of the Civil Rights Act of 1964 is the most critical of the statutes. It includes proscrip-
tions relating to the largest number of protected classes (race or color, religion, sex, and national
origin) and the full gamut of terms and conditions of work. Title VII also provided a critical model
for later statutes, including the Age Discrimination in Employment Act of 1967 (or ADEA) and the Americans
with Disabilities Act of 1990 (or ADA), which address the full gamut of terms and conditions for
age and disability, respectively. The ADA updates the Vocational Rehabilitation Act of 1973 (or
Rehab-73). Finally, the Equal Pay Act of 1963 (or EPA) is a more surgical statute that proscribes
only wage discrimination on the basis of sex, which is also proscribed in Title VII.
Among other relevant laws, Executive Order 11246 (or EO 11246) contains rules for contrac-
tors doing business with the federal government. These contractors must agree to provide affirma-
tive action for minorities and women if there are discrepancies between their representation in the
employer’s workforce and qualified workers in the labor pool. It is critical to note that the primary
requirement for affirmative action in EO 11246 is recruitment and outreach, and there is no require-
ment (in fact, it is illegal under Title VII and other laws) to make actual selection decisions only on
the basis of race or gender. The key constitutional amendments governing employment decisions are
the 5th and 14th Amendments, which apply to federal and state employers, respectively, and which
have been used extensively in so-called “reverse discrimination” lawsuits on the basis of race or sex
and provide overlapping coverage with Title VII. Additional overlapping coverage with Title VII
is represented in the 13th Amendment, which applies to the full gamut of terms and conditions of
employment as in Title VII, with the exception of adverse impact challenges. Critically, although
each of these amendments preceded Title VII, no one ever thought to use them in workplace dis-
crimination cases until after Title VII, not only for overlapping coverage, but also for gaps in cover-
age in Title VII that do not apply to constitutional claims, such as the minimal number of employees
required in Title VII (N = 15); no such minimum workforce size exists for constitutional claims.
Some final but important points to note are that each of the aforementioned statutes, except for
Rehab-73, requires that claims of discrimination be processed by the Equal Employment Opportunity
Commission (EEOC). In comparison, constitutional claims can go directly to federal district court.
Additionally, the Office of Federal Contract Compliance Programs (OFCCP) of the Department of Labor
regulates EO 11246 in the nonfederal sector, the EEOC regulates EO 11246 in the federal sector,
and the EEOC and Merit Systems Protection Board (MSPB) share responsibilities for federal claims
in Title VII and the ADEA.
Because this entire volume is dedicated to the concept of “selection,” some of our principles may
appear to stray from that mark. We construe selection to encompass many personnel actions. These
include but are not limited to applicant screening, hiring, promotion, selection to training experi-
ences that are gateways to promotion, and layoffs (or deselection). We believe that selection should
be broadly and not narrowly construed because in each case a personnel decision is being made that
implies a direct or indirect movement (or lack thereof) of an applicant or a current employee. Finally,
in the context of a selection or promotion decision, we will occasionally comment on a co-relevant
issue such as compensation or training. It is not that we intend to turn the focus to compensation, but
that often compensation decisions are made at the same time as selection or promotion decisions and
deserve comment.

PRINCIPLES AND EXEMPLAR CASE LAW


1. Companies using Internet recruitment should understand the definition of “Applicant” and
be prepared to defend qualifications listed in the job description as well as procedures used
to screen applicants.
After publication of the Uniform Guidelines on Employee Selection Procedures (UGESP) in
1978, the EEOC answered related questions in 1979 (44 FR 11998). Among them, Q15 defined the
term “Applicant” as follows:

The precise definition of the term “applicant” depends upon the user’s recruitment and selection
procedures. The concept of an applicant is that of a person who has indicated an interest in being
considered for hiring, promotion, or other employment opportunities.

This definition was rendered obsolete by recent developments in Internet recruitment, because it
is easy to indicate interest in many jobs at the same time. Therefore, the EEOC appended Q15 on
March 4, 2004 (FR Doc 04-4090) to require more than an expression of interest. Additionally, the
OFCCP issued separate guidance on March 29, 2004 (41 CFR Part 60-1).
The OFCCP focuses on record keeping for EO 11246 (emphasizing recruitment), whereas the
UGESP focuses on selection (emphasizing adverse impact in Title VII). Because our focus is also
on selection, the following discussion relates to the EEOC modification to Q15. Readers should note
that SIOP’s Professional Practice Committee has provided a detailed evaluation of the EEOC and
OFCCP documents (Reynolds, 2004).
The EEOC answered five new questions. Two of the answers clarify that Internet methodolo-
gies are covered by laws such as Title VII and by the UGESP, and two others clarify that Internet
search criteria are subject to adverse impact rules and that tests administered online are subject to
the UGESP. The last and most critical of the five answers defines the term “Applicant” using the
following three prongs:

1. The employer has acted to fill a particular position.
2. The individual has followed the employer's standard procedures for submitting
applications.
3. The individual has indicated an interest in the particular position.

For prong 1, the EEOC cites a company seeking two hires from among 200 recruits in a database. If
100 recruits respond to employer inquiries and 25 are interviewed, all 100 responders are applicants
and all 100 nonresponders are not applicants.
For prong 2, the EEOC cites two examples: (a) if employers require completion of an “online
profile,” only those completing the profile are applicants; and (b) if employers e-mail job seekers
requesting applications, only those who respond are applicants.
The prong 3 answer clarifies that individuals are not applicants if they (a) only post a resume, (b)
express interest in several potential jobs, or (c) follow prong 2 for a job other than those the employer
acts on (prong 1).

EXEMPLAR CASE LAW


Although there is yet no case law citing the new definition of “Applicant,” there is a relevant case
decided under prior rules (Parker v. University of Pennsylvania, 2004). Parker, a White male, filed
a Title VII reverse-discrimination claim for failure to consider his application for various jobs at
the university. Parker submitted his resume online. The university’s website advised individuals
to submit their resumes into the database and, if they so chose, to apply for specific job postings.
Parker received a letter stating that if appropriate, he would receive a letter within 30 days notifying
him of an interview, but that “otherwise this will be your only communication from us.” Summary
judgment was granted to the university because Parker never expressed interest in a specific posted
job. Parker also claimed adverse impact, but it was dismissed because he could provide no basis for
assuming he was harmed. It should be noted that, as a standard university policy, recruiters routinely
searched the database and forwarded to hiring officers the resumes of individuals who expressed
interest in a posted job and who met the minimum qualifications (MQs) for that job.
Interestingly, the district court judge ruled that Parker satisfied the four-prong prima facie test
for disparate treatment from McDonnell Douglas v. Green (1973) in that he (a) was a protected class
member, (b) applied for a position for which he was qualified, (c) was not hired, and (d) positions
remained open for which Parker was qualified. However, the judge also ruled that the university
articulated a legitimate nondiscriminatory reason for not hiring him and that Parker could not prove
that the articulation was a pretext for discrimination. Accordingly:

Defendant claims to have failed to consider Parker’s application because of its resume reviewing
procedures: The reason Penn did not hire Parker is because he had not applied for any specific position,
and it did not conduct any search of the resumes of non-applicants for positions being filled. (Parker v.
University of Pennsylvania, 2004)

There are several things to note about this ruling. Although, under prior rules, Parker would
have been considered an applicant, under the new rules he should not be considered an
"Applicant" and therefore should lose at the prima facie level. Nevertheless, even nonapplicants may
file valid adverse impact claims if they are “chilled” by MQs such as education requirements that
disproportionately exclude minorities (Griggs v. Duke Power, 1971; Albemarle v. Moody, 1975) or
physical requirements such as height and weight that disproportionately exclude females (Dothard
v. Rawlinson, 1977).

2. Adverse impact may appear at any or all stages of a multistage selection or promotion
decision process. It should be examined at every point at which it may appear. If there is no
adverse impact at a particular step, there is no requirement that a selection procedure repre-
sented in that step be job-related. MQs represent a “stage” in a selection process and should
be considered as vulnerable to challenge as a traditional predictor would be.
The employment process is often more complex and staged than it might appear at first glance.
Consider the following steps that might characterize a hire:

1. Potential applicant reviews an application blank for minimal qualifications.
2. Completes and submits an application blank for a position opening.
3. Completes some preliminary screening tests.
4. Completes a more detailed series of tests.
5. Participates in interviews.
6. Accepts or rejects a job offer.

Each of these process steps implies a decision on the part of the employer. The MQs review assumes
that the applicant will self-select in or out of the subsequent process. The application blanks are
reviewed and a subset of applicants is invited to complete some preliminary screening tests. A
subset of those applicants is invited to complete a more extensive assessment test. A subset of those
applicants is invited to participate in interviews. A subset of those applicants receives job offers.
What has been referred to as a “bottom-line” analysis of adverse impact would compare those
individuals offered a position (step 6) with those individuals who applied (step 2). Nevertheless,
adverse impact could occur as early in the process as the consideration of MQs (i.e., the self-
selection step). If the MQs are not job related, then a case could be made that adverse impact should
be evaluated at step 1 (although it would be difficult to identify those potential applicants who
self-selected out of the process). Therefore, in a multistage hiring process (also known as a multiple
hurdle process), adverse impact should be considered at each stage of the multistage process and not
simply at the last step.
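The arithmetic behind such a stage-by-stage review is simple to automate. The Python sketch below is an illustrative assumption rather than a prescribed procedure: the function name impact_ratio and the applicant counts are hypothetical, and the four-fifths benchmark is used only as the familiar rule-of-thumb flag. The sketch computes group selection rates and the ratio of those rates at each hurdle and at the bottom line.

# Illustrative sketch only: hypothetical counts, not data from any case or from the UGESP.
def impact_ratio(sel_focal, n_focal, sel_ref, n_ref):
    """Selection rate of the focal group divided by that of the reference group."""
    rate_focal = sel_focal / n_focal
    rate_ref = sel_ref / n_ref
    return rate_focal / rate_ref, rate_focal, rate_ref

# stage name -> ((focal passed, focal entered), (reference passed, reference entered))
stages = {
    "written test": ((27, 50), (96, 120)),   # hypothetical hurdle 1
    "interview":    ((15, 27), (24, 96)),    # hypothetical hurdle 2
}

for name, ((sf, nf), (sr, nr)) in stages.items():
    ratio, rf, rr = impact_ratio(sf, nf, sr, nr)
    flag = "below 4/5ths" if ratio < 0.80 else "at or above 4/5ths"
    print(f"{name}: rates {rf:.2f} vs {rr:.2f}, impact ratio {ratio:.2f} ({flag})")

# "Bottom line": final selections relative to everyone who entered the process.
ratio, rf, rr = impact_ratio(15, 50, 24, 120)
print(f"bottom line: rates {rf:.2f} vs {rr:.2f}, impact ratio {ratio:.2f}")

With these made-up numbers the bottom line shows no adverse impact (the focal group’s final rate is actually higher), yet the first hurdle falls well below four-fifths; that is precisely the pattern at issue in Connecticut v. Teal (1982), discussed below.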

EXEMPLAR CASE LAW


This principle is supported by Connecticut v. Teal (1982), a landmark Supreme Court ruling. In
Teal, provisional promotions for four Black candidates to the position of welfare eligibility supervi-
sor were rescinded after they failed a written test. This written test was the first of several hurdles.
After all hurdles were completed, a higher percentage of Blacks (22.9%) than Whites (13.5%) were
promoted. In calculating adverse impact, the defendant excluded from the calculation the Blacks
who had failed the written examination. However, the passing rate for Blacks on the written test was
only 68% of the rate for Whites. The defendant appealed to Sec. 1607.4(C) of the UGESP, which, in
part, states that if “the total selection process does not have an adverse impact, the … [EEOC] …
will not expect a user to evaluate the individual components for adverse impact, or to validate such
individual components.” The defendant argued that the “bottom line” (i.e., final promotions) showed
no evidence of adverse impact. However, a slim 5-to-4 majority of the Supreme Court justices ruled
that Title VII provides for individual rights, stating, “It is clear that Congress never intended to
give an employer license to discriminate against some employees on the basis of race or sex merely
because he favorably treats other members of the employee’s group.”
Teal parallels Furnco v. Waters (1978), a disparate treatment case in which the Supreme Court
ruled against a company that stopped hiring Black applicants after its affirmative action goals were
achieved.
It should be noted that a plurality of four justices in Watson v. Fort Worth (1988) and then a
majority of five justices in Wards Cove v. Atonio (1989) interpreted Teal to mean that plaintiffs are
responsible for identifying the cause(s) of adverse impact. This “Identification” rule was codified in
Sec. 105(B)(i) of the Civil Rights Act of 1991 (CRA-91), which states:

The complaining party shall demonstrate that each particular challenged employment practice causes
disparate impact, except that if the complaining party can demonstrate to the court that the elements
of a respondent’s decision making processes are not capable of separation for analysis, the decision
making process may be analyzed as one employment practice.

In summary, when combined with subsequent case law and CRA-91, Teal implies that employ-
ers should not only examine individual components of a multiple hurdle process for adverse impact but
also offer transparency on the components of the selection process. If they do not adopt this
transparency, they may be faced with the prospect of defending an entire selection process if there
is bottom-line adverse impact.

3. The search for alternatives to procedures that might result in or have resulted in lesser
adverse impact is split between defendants and plaintiffs. At the outset of a selection project,
defendants should consider alternatives that will meet business needs, be job-related and will
minimize adverse impact. After a demonstration of job-relatedness, the plaintiffs have the
burden of demonstrating that there is an alternative procedure that might retain the same
levels or similar levels of job-relatedness but shows lesser adverse impact.
In years past, for both of these stages (initial and postjob relatedness), the search for alternatives
may have been considered an onerous and overly burdensome task. However, the I-O literature has
expanded significantly over the past decade with regard to decision-making tools and strategies for
assessing the tradeoffs between validity and adverse impact. As an example, De Corte, Lievens,
and Sackett (2007) investigated the possibility of combining predictors to achieve optimal tradeoffs
between selection quality and adverse impact. This is a follow-up to earlier research in this area.
De Corte (1999) and Sackett and Ellingson (1997) investigated the effects of forming predictor
composites on adverse impact. Other researchers have used meta-analytic techniques to forecast
the likely outcomes of combining high and low adverse impact predictors in a selection proce-
dure (Schmitt, Rogers, Chan, Sheppard, & Jennings, 1997; Bobko, Roth, & Potosky, 1999). Thus,
employers and their consultants (in-house or external) should become familiar with this literature to
meet their burden of demonstrating a search for alternatives at the outset of a selection project.
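To make the validity–adverse impact tradeoff concrete, the brief Python sketch below applies the standard formula for the standardized subgroup difference (d) of an equally weighted, unit-variance predictor composite, in the spirit of the Sackett and Ellingson (1997) analyses. The function name composite_d and the example values (a cognitive measure with d of about 1.0, a second predictor with d of about 0.3, and an average intercorrelation of .25) are illustrative assumptions, not results taken from that article.

import math

def composite_d(ds, avg_intercorrelation):
    """Standardized subgroup difference of an equally weighted composite of
    unit-variance predictors with the given ds and average intercorrelation."""
    k = len(ds)
    return sum(ds) / math.sqrt(k + k * (k - 1) * avg_intercorrelation)

print(composite_d([1.0], 0.0))          # ~1.00: the cognitive measure used alone
print(composite_d([1.0, 0.3], 0.25))    # ~0.82: the composite shows a smaller subgroup difference

Of course, the composite’s validity also changes when predictors are combined, which is why the literature cited above frames the problem as a tradeoff rather than as a cost-free reduction in adverse impact.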
After the fact, once adverse impact has been shown and defendants have adequately made the
job-related argument, it is the plaintiffs’ burden to demonstrate a feasible alternative to that job-
related process that would have lesser adverse impact while retaining job relatedness. One of the
interesting aspects of this requirement is that plaintiffs can meet their burden by showing that there
is an alternative method of using the selection procedure (e.g., an alternative method of weighting
predictor components) that would have less adverse impact without affecting validity. Section 5(G)
of the UGESP states the alternative method of use requirement:
The evidence of the validity and utility of a selection procedure should support the method the
user chooses for operational use of the procedure, if that method of use has a greater adverse impact
than another method of use.
This principle was established in Albemarle v. Moody (1975). The Supreme Court ruled that if
job-relatedness is proven, the plaintiff may prove pretext by showing that “other tests or selection
devices, without a similarly undesirable racial effect, would also serve the employer’s legitimate
interest in ‘efficient and trustworthy workmanship’.” This principle was subsequently written into
the UGESP and later codified in CRA-91.

EXEMPLAR CASE LAW


Over the years, there have been unsuccessful attempts to prove job-related alternatives with lesser
adverse impact (e.g., Bridgeport Guardians v. City of Bridgeport, 1991). However, there are three
2006 district court rulings in which the principle was supported. These rulings are discussed in
detail by Outtz (2007) and are briefly summarized below.
In Ricci v. Destefano (2006), the New Haven Civil Service Board (CSB) refused to certify pro-
motional tests for firefighters to lieutenant and captain because they adversely impacted minorities,
seeking instead input on alternative methods (i.e., assessment centers) with lesser adverse impact.
The plaintiffs (17 Whites, 1 Hispanic) charged disparate treatment (race as the sole reason for not
certifying the tests). The judge ruled that employers are not “required to await a lawsuit before
voluntarily implementing measures with less discriminatory impact.” The judge also ruled that all
applicants were equally treated irrespective of their race and that all applicants would be required to
participate in a new test or selection procedure.
In Bradley v. City of Lynn (2006), a cognitive test served as the sole basis for selecting entry-
level firefighters. Adverse impact was proven and the judge ruled there was insufficient evidence of
job-relatedness. The judge also ruled that there were several equally valid alternatives with lesser
impact, most notably the combination of cognitive and physical abilities (e.g., Brunet v. City of
Columbus, 1995), as well as personality (work style) and biodata devices, which when combined
with cognitive tests should produce less adverse impact than cognitive tests alone. The judge ruled,
“while none of these approaches alone provides the silver bullet, these other noncognitive tests
operate to reduce the disparate impact of the written cognitive examination.”
In Johnson v. City of Memphis (2006), the judge ruled that a promotion exam to police sergeant
was job-related, but ruled for the plaintiffs because of the existence of valid alternatives with lesser
impact. Critical to this case was that the city had previously used a valid promotion test in 1996
and deviated from its prior procedure in the challenged test. The judge ruled, “It is of considerable
significance that the City had achieved a successful promotional program in 1996 and yet failed to
build upon that success.”
In short, Ricci implies that employers are free to seek alternatives when they find adverse impact,
and Lynn implies that a combination of abilities can be as predictive of job success as, if not more
predictive than, a single ability alone. Johnson is the most intriguing ruling because it is the only one
in which alternatives decided the case. As a caveat, all three are district court rulings, and readers
should stay abreast of these and related rulings as they are appealed.

4. Work analysis should precede any selection process. The shelf life of a work analysis
will normally run 5–8 years, in the absence of any information that the job has changed
substantially.
This is one point about which there is little disagreement. The SIOP Principles, the APA
Standards, and the UGESP all clearly indicate that a demonstration of the validity of a selection
device begins with a thorough analysis of the job if the criterion of interest is job performance.
Nevertheless, if the criterion of interest is turnover or absenteeism, a full work analysis is not
necessary.
Work analysis refers to a detailed examination of a job, which may include the determination of
what is done, how it is done, the context in which it is done (including the strategic importance of
that job and its essential functions to organizational success), as well as the knowledge, skills, abili-
ties and other characteristics (KSAOs) required to perform the job successfully. The most common
“O” characteristic is some measure of personality. Work analysis is also important in demonstrating
criterion-related validity because it establishes the importance or relevance of the criterion measure.
Finally, work analysis is important for demonstrating content validity because it establishes fidelity
of the test content to the job content.
Proper work analysis is a critical requirement to meet professional standards, particularly for
content-oriented validation. Work analysis should establish the linkage between important func-
tions/characteristics of the job and KSAOs, and it should form the basis for linking a selection
device to those KSAOs, thus linking the selection device to the job. The importance of work analy-
sis to validation, particularly content validation, is clearly established in the UGESP. Section 14(A)
provides the following guidance regarding the importance of work analysis:

Validity studies should be based on review of information about the job. Any validity study should be
based upon a review of information about the job for which the selection procedure is to be used. The
review should include a work analysis.

Section 1607.14(C)(2), which addresses work analysis for content validity, is also instructive:

There should be a work analysis which includes an analysis of the important work behavior(s) required
for successful performance and their relative importance and, if the behavior results in work product(s),
an analysis of the work product(s). Any work analysis should focus on the work behavior(s) and the
tasks associated with them. If work behavior(s) are not observable, the work analysis should identify
and analyze those aspects of the behavior(s) that can be observed and the observed work products. The
work behavior(s) selected for measurement should be critical work behavior(s) and/or important work
behavior(s) constituting most of the job.

EXEMPLAR CASE LAW


The role of work analysis in criterion validity was established in Griggs v. Duke Power (1971)
and Albemarle v. Moody (1975). Duke Power attempted to defend cognitive tests on grounds
that they were “professionally developed,” yet they conducted no validity study. On the basis
of the 1966 EEOC Guidelines, the Supreme Court ruled that a professionally developed test must be
one which:

[F]airly measures the knowledge or skills required by the particular job or class of jobs which the
applicant seeks, or which fairly affords the employer a chance to measure the applicant’s ability to
perform a particular job or class of jobs. (Griggs v. Duke Power, 1971, footnote 9)

The Albemarle Paper Company attempted to defend cognitive tests with a deficient criterion
validity study conducted one month prior to trial. The most critical deficiency was the absence of a
work analysis. In Moody v. Albemarle (1973), the 4th Circuit ruled:

In developing criteria of job performance by which to ascertain the validity of its tests, Albemarle failed
to engage in any work analysis. Instead, test results were compared with possibly subjective ratings of
supervisors who were given a vague standard by which to judge job performance. Other courts have
expressed skepticism about the value of such ill-defined supervisor appraisals.

The Supreme Court affirmed the 4th Circuit ruling, and the Griggs and Albemarle rulings formed
the basis of the 1978 UGESP.
The role of work analysis in content validity was established in Guardians v. Civil Service (1980)
and has since been followed by every circuit that has ruled on content validity. Relevant cases for
content validity are discussed under Principle 5. On the basis of Guardians and other cases, it is
axiomatic that content validity cannot be established absent work analysis.
Case law also reveals that work analysis is critical in establishing MQs. For example, in
Brunet v. City of Columbus (1993), criterion validity data showed that cognitive skills were
more predictive of firefighter performance than physical skills. Only physical skills produced
adverse impact (for females). Nevertheless, physical skills were justified based on work analysis
data showing that they distinguished between average and superior firefighters, whereas cognitive
skills did not. Additionally, as we will discuss under Principle 5, work analysis is critical for
establishing transportability, and as we will discuss under Principle 8, work analysis is also critical
for establishing cutoff scores.
Finally, the concept of a “shelf-life” of a work analysis was endorsed in Bradley v. City of Lynn
(2006), in which a 1992 work analysis was offered to support a 2004 criterion validity study. The
judge rejected the defendant’s argument that the work analysis was sufficiently current to inform
test development. Experts for the plaintiffs and defendants agreed that a work analysis should be
at least revisited every 5–8 years. One reason for revisiting a work analysis every 5–8 years is that
although the basic structure of a job (e.g., the major duties and responsibilities) may not change over
that period, the relative emphasis placed on specific responsibilities and related work behaviors
may change.

5. Validation evidence can come in many forms and is not limited to one particular approach.
In general, the greater number of converging sources of validation evidence, the better.
In the period immediately following passage of the Civil Rights Act of 1964, many considered
criterion-related validity the most appropriate design for determining job-relatedness of assessment
devices or processes. This was mainly due to the 1970 EEOC Guidelines requiring that employ-
ment tests be “predictive of or significantly correlated with important elements of work behavior.”
Subsequently, the 1978 UGESP added content and construct-related validation to the arsenal of
tools for demonstrating job-relatedness, thus finally cementing the “Trinitarian” view of validation,
which had been introduced in the psychometric literature in the 1950s. However, Sec. 1607.14(C)(1) contained
the following warning on “appropriateness of content validity studies”:

A selection procedure based on inferences about mental processes cannot be supported solely or
primarily on the basis of content validity. Thus, a content strategy is not appropriate for demonstrating
the validity of selection procedures that purport to measure traits or constructs such as intelligence,
aptitude, personality, common sense, judgment, leadership, and spatial ability.

For a short time, this passage perpetuated the myth that criterion-related validity is superior
to content-related validity. By the mid-1980s there was a growing consensus that discussions of
“acceptable” models for validation were inappropriate because virtually any validation evidence,
regardless of how gathered, is possible evidence of job relatedness on its merits rather than by
its name. This “Unitarian” view is now widely accepted by I-O psychologists and the courts.
Furthermore, in recent years, there have been several new and powerful techniques within and
beyond the “Trinity,” including:

• Validity transport: Transporting criterion-related validity to a novel setting where the jobs
in question can be seen to be highly similar
• Synthetic validity: Aggregation of data from different settings that address subsets of
predictor and/or criterion variables
• Construct validation: The use of powerful statistical techniques such as structural equation
modeling or path analysis to test theories of predictor-criterion associations
• Meta-analysis: Collecting validity coefficients from many settings and correcting for
statistical artifacts and possible moderator variables to estimate the mean and variance
of the validity coefficients after correction for relevant artifacts
• Validity generalization: Demonstrating that the confidence interval for the true validity
coefficient does not include r = .00 by assembling data from a wide variety of job titles,
variations on a predictor type, and criteria

Therefore, when evaluating job-relatedness of assessment devices or processes, it is best to col-
lect evidence from as many different sources or designs as feasible, without considering one particu-
lar design to be the best or only design. In general, the more evidence that is collected, the greater
will be one’s confidence in proving job-relatedness.

EXEMPLAR CASE LAW


The 1966 EEOC Guidelines were supported by the Supreme Court in Griggs v. Duke Power (1971)
and Albemarle v. Moody (1975). However, after the UGESP in 1978, courts immediately supported
content validity for inferences about mental processes. The gold standard was established by the
2nd Circuit in Guardians v. Civil Service (1980), which outlined five steps for content validity:

1. Suitable work analysis
2. Reasonable competence in test construction
3. Test content related to job content
4. Test content representative of job content
5. Scoring systems selecting applicants who are likely to be better job performers

In Guardians, although the defendants could not support a cut score or rank ordering on the
basis of weakness in steps 2 and 3 (see Principle 8 below), they were able to use a content validation
strategy for demonstrating job relatedness. In Gillespie v. Wisconsin (1985), the 7th Circuit ruled
“neither the Uniform Guidelines nor the psychological literature express a blanket preference for
criterion-related validity” and in Police Officers v. Columbus, Ohio (1990), the 6th Circuit, citing
the 1987 SIOP Principles, ruled that it is critical that “selection instruments measure a substantial
and important part of the job reliably, and provide adequate discrimination in the score ranges
involved.” Content validity has been supported in many subsequent cases, including the 2nd Circuit
in Gulino v. NY State (2006).
Attempts to support job-relatedness on the basis of meta-analysis alone have often failed (EEOC
v. Atlas, 1989 and Lewis v. Chicago, 2006). Nevertheless, meta-analysis has been credited as sup-
plemental to local validity studies (Adams v. Chicago, 1996, and Williams v. Ford Motor, 1999).
Also, transportability and upward score adjustments were supported in Bernard v. Gulf Oil (1989),
in which criterion validity was found for two of five jobs, and the 5th Circuit found “sufficient simi-
larity in the skills required” for all five jobs. The Court also ruled “the adjusted figures … are better
estimates of validity” and uncorrected coefficients “underestimate” validity of the tests.

In summary, although any one of the “Trinity” approaches alone is likely sufficient for job-
relatedness, the body of case law as a whole suggests the “Unitarian” approach may have greater
acceptance in court and that new methods of supporting validity are at least acceptable supplements
to traditional methods.

6. In a selection context, criterion information, particularly in the form of performance
ratings, is as important in validation as is predictor information.
For well over 50 years, I-O psychology has recognized the importance of the criterion in a human
resource (HR) system. The criterion issue is invoked at the preliminary stages of selection in the
form of establishing MQs or desired qualifications for a position in question. MQs imply acceptable
performance either in the form of a lower bound for issues such as licensing or certification or at the
upper end of a desired performance distribution with respect to an organization’s view of contem-
poraneous challenges to organizational viability.
Criterion information also represents the second “half” of the criterion-related validation model.
Most often, criterion information appears in the form of supervisor ratings of performance. As is
the case with predictors, the credibility of criterion information should also be established through
empirical analysis, content-related analysis, or both when possible.
Performance evaluations often play a significant role in nonentry selection decisions such as
promotions, training assignments, job changes, and reductions in force. To the extent that criterion
information becomes predictor information (e.g., performance ratings are at least a partial foundation
for a personnel decision such as promotion or downsizing), then that criterion information should be
considered part of the selection process and analyzed in such a way that permits the inference that
this performance information was job-related, psychometrically credible, and fair. Examinations
of criterion information often involve not only substance issues, but also process issues such as the
right of an employee to “appeal” information as well as the extent to which those providing criterion
information are knowledgeable and competent judges of the employee’s performance.

EXEMPLAR CASE LAW


Performance appraisal was a key feature in two landmark Supreme Court rulings on adverse impact:
Albemarle v. Moody (1975), a hiring case, and Watson v. Fort Worth Bank (1988). It was also a key
feature in Meacham v. Knolls (2006). As noted elsewhere in this chapter, Albemarle occurred in the
wake of Griggs v. Duke Power (1971) (Principle 5). Watson was a precursor to Wards Cove v. Atonio
(1989) and the Civil Rights Act of 1991 (Principle 11).
There were multiple deficiencies in the criterion validity study conducted by the defendant in
Albemarle v. Moody (1975). Chief among them was the failure to establish a reliable and valid crite-
rion against which test scores were compared. In the words of the Supreme Court, “The study com-
pared test scores with subjective supervisorial rankings.” Although they allow the use of supervisorial
rankings in test validation, the Guidelines quite plainly contemplate that the rankings will be elicited
with far more care than was demonstrated here. Albemarle’s supervisors were asked to rank employ-
ees by a “standard” that was extremely vague and fatally open to divergent interpretations. As previ-
ously noted, each “job grouping” contained several different jobs, and the supervisors were asked, in
each grouping, to “determine which ones [employees] they felt irrespective of the job that they were
actually doing, but in their respective jobs, did a better job than the person they were rating against.”
In Watson, the main challenge was to subjective ratings of job performance. Clara Watson also
challenged subjective ratings of interview performance and past experience. The main issue was
whether there can be subjective causes of adverse impact; the Supreme Court ruled unanimously
(8–0) that there can be.
In Meacham, 30 of 31 employees laid off in an involuntary reduction in force (IRIF) were over age 40. The
layoffs were based entirely on performance appraisal. In the words of the 2nd Circuit, the key RIF principles
were “subjective assessments of criticality and flexibility” of employee skills (Meacham v. Knolls, 2006).

In summary, irrespective of whether the issue is hiring, promotion, termination, or the criterion
in a criterion validity study, the Albemarle, Watson, and Meacham rulings carry inherent warnings
that performance appraisals should be based on work analysis and that the methodology used to
appraise worker performance should meet acceptable psychometric principles.

7. The relevant standard for the “significance” of a criterion-related validity coefficient is
p = .05 and/or a confidence interval that does not include r = .00 within its boundaries.
It should be noted that satisfying one or both of the criteria above does not in and of itself consti-
tute a necessary and sufficient condition for establishing the validity of a predictor. Several factors
must be considered. First, statistical significance does not necessarily constitute practical signifi-
cance. A common argument used by plaintiffs with regard to weak (low) validity coefficients that
are statistically significant is that almost any validity coefficient will be statistically significant if the
sample size is sufficiently large. However, courts will generally accept the .05 level of significance
as indicating some level of criterion-related validity. The issue then becomes whether, all things
being equal, the level of the relationship justifies the operational use of the predictor. Section 14(B)(6)
of the UGESP is quite clear on this point:
Users should evaluate each selection procedure to ensure that it is appropriate for operational
use, including establishment of cut-off scores or rank-ordering. Generally, if other factors remain
the same, the greater the relationship (e.g., correlation coefficient) between performance on a selec-
tion procedure and one or more criteria of performance on the job and the greater the importance
and number of aspects of job performance covered by the criteria, the more likely it is that the pro-
cedure will be appropriate for use. Reliance upon a selection procedure that is significantly related
to a criterion measure, but based on a study involving a large number of subjects and has a low cor-
relation coefficient will be subject to close review if it has a large adverse impact.
In essence, the magnitude of the validity coefficient must be considered in light of the level of
adverse impact and the nature of the criterion measure to establish legal defensibility.
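For readers who want to check these criteria against their own data, the short Python sketch below is a minimal illustration under stated assumptions: it assumes SciPy is available, and the function name validity_significance and the sample values (r = .25, n = 150) are hypothetical. It computes the two-tailed p-value for a validity coefficient and a Fisher-z confidence interval that can be inspected for whether it includes r = .00.

import math
from scipy import stats

def validity_significance(r, n, alpha=0.05):
    """Two-tailed test of H0: rho = 0 and a Fisher-z confidence interval for r."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    se = 1 / math.sqrt(n - 3)
    lo = math.tanh(math.atanh(r) - z_crit * se)
    hi = math.tanh(math.atanh(r) + z_crit * se)
    return p, (lo, hi)

p, ci = validity_significance(r=0.25, n=150)   # hypothetical local validity study
print(f"p = {p:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")

Note that a very small coefficient will clear p = .05 when n is large, which is precisely why the UGESP passage above warns that low correlations from large-sample studies will receive close review when adverse impact is large.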

EXEMPLAR CASE LAW


The principle that validity coefficients must be statistically significant was established by the
Supreme Court in Griggs v. Duke Power (1971) and Albemarle v. Moody (1975). In both cases,
the Supreme Court cited Section 1607.35 of the 1970 EEOC Guidelines, noting, for example, in
Griggs:

These guidelines demand that employers using tests have available data demonstrating that the test is
predictive of or significantly correlated with important elements of work behavior which comprise or
are relevant to the job or jobs for which candidates are being evaluated.

However, the Supreme Court did not specify an alpha level for significance, leaving this issue to the
lower courts.
The 5th Circuit was the first appeals court to pin down the alpha = .05 level for validity coeffi-
cients in U.S. v. Georgia Power (1973). Citing the “95% Guideline” from the 1970 EEOC Guidelines,
the 5th Circuit ruled that “this guideline requires it be shown that there was no more than 1 chance
out of 20 that job relatedness appeared by coincidence.” The 5th Circuit struck down the chal-
lenged tests because the expert’s validation study was deemed a product of “chance not science.”
Subsequently, in Boston NAACP v. Beecher (1974), the 1st Circuit struck down a nonsignificant
validity coefficient of r = +.078, ruling that “were this statistically significant, it would indicate
that the test grade ‘explained only approximately 0.6% of all observed variance in fire fighters’
on-the-job performance.’” Because the value was nonsignificant, it was not necessary for the Court
to address any corrections for unreliability or range restriction. All courts have since generally
accepted alpha = .05 as a minimum requirement for the validity coefficient.

The Beecher Court was also the first appeals court to distinguish between statistical significance
(“likely to occur by chance in fewer than five of one hundred similar cases”) and practical signifi-
cance (“correlation of ± 0.3 or higher, thus explaining 9% or more of the observed variation”). Similar
distinctions were made by the 5th Circuit in Ensley NAACP v. Seibels (1980), by the 9th Circuit in
Clady v. Los Angeles (1985), and the 11th Circuit in Hamer v. Atlanta (1989). The Clady Court also
noted that “the greater the test’s adverse impact, the higher the correlation which will be required.”
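The practical-significance figures the courts cite follow directly from squaring the validity coefficient, as the two-line check below illustrates.

# r-squared converts a validity coefficient into the proportion of criterion variance explained.
for r in (0.078, 0.30):
    print(f"r = {r:.3f} -> variance explained = {r ** 2:.1%}")
# 0.078 -> ~0.6%, the figure the Beecher court treated as trivial;
# 0.30  -> 9.0%, the rough benchmark several circuits have cited.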
More recently, in Lanning v. Septa (1998), the Eastern District Court of Pennsylvania accepted
expert testimony that “practical significance is found through regression analysis and expectancy
tables” as opposed to corrected correlations that exceed the r = .30 standard. Using a regression
analysis with multiple predictors and statistical controls, the expert demonstrated that scores on an
aerobic test above the minimum cutoff were associated with a 10% increase in arrests for serious
crimes and a 4% increase in arrests for all crimes for transit authority police officers. As noted
under Principle 8 (immediately below), the 3rd Circuit ultimately affirmed the district court in
Lanning, but in doing so, it established a new standard for cut-off scores.

8. Cut scores should be based on a rational foundation that may or may not include empirical
analyses.
There are several types of cut scores. The first is a nominal cut score. This is a value often
established as an arbitrary pass-fail score in a multiple hurdle system. It designates the threshold
for continuing to the next stage in the process. The second is known as the effective cut score. This
is the score below which no one is hired or appointed. It is not predetermined but simply identi-
fied after the fact. This is most often seen in strict rank-order appointment where candidates are
appointed from the top scorer down until all positions are filled. The score of the individual filling
the last opening is the effective cut score. The third type of cut score is the critical cut score. This is
a value that has been chosen to represent the score below which an applicant is thought to fall below
a minimal standard for effective job performance. Often, a critical cut score is tied to the safety or
well-being of the candidate or the public/customer base.
Nominal cut scores are often set by combining practical and theoretical issues. A practical con-
sideration might be the cost of testing. Assume there are 1,000 applicants for 20 openings and there
will be a multistage assessment process. The employer might want to limit the cost of testing by
eliminating many individuals at the first stage of the process. In this case, the cut score can be set
by looking at the selection ratio and deriving some estimate of acceptable performance expectations
for candidates.
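A minimal sketch of that kind of practically driven nominal cut score is shown below. The function name nominal_cut_score and the simulated score distribution are illustrative assumptions standing in for real first-stage data; the sketch is not a recommended procedure.

import random

def nominal_cut_score(scores, pass_rate):
    """Score at which approximately pass_rate of applicants advance to the next stage."""
    ranked = sorted(scores, reverse=True)
    n_pass = max(1, round(len(ranked) * pass_rate))
    return ranked[n_pass - 1]

# 1,000 hypothetical first-stage scores; advance roughly the top 20% to contain testing costs.
random.seed(0)
scores = [random.gauss(50, 10) for _ in range(1000)]
print(f"nominal cut score ~ {nominal_cut_score(scores, pass_rate=0.20):.1f}")

In practice, the chosen rate should also reflect the expected yield at later stages and some judgment about acceptable performance expectations, not testing cost alone.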
There is no need to “set” an effective cut score. It is an axiomatic score determined solely by
the score of the last person hired or appointed in a rank-ordered list of candidates. With respect
to a critical cut score, there are several options available, but all require some consideration of a
criterion level that distinguishes between competent and incompetent or minimally qualified and
less-than-minimally qualified. Subject matter experts can estimate requisite predictor performance
and associated criterion performance. This, of course, would benefit greatly from a comprehensive
and accurate work analysis. Incumbent populations can be used to identify predictor scores associ-
ated with minimally acceptable performance. Regardless of which techniques are used to set the
critical cut score, there should be a rational foundation for its choice, or an empirical one based on
a work analysis. Although it is axiomatic that a single score (i.e., a cut score) cannot be “validated,”
it is possible to produce evidence that provides some confidence that a critical cut score is related to
an anticipated real-world outcome.

EXEMPLAR CASE LAW


Until the 3rd Circuit’s ruling in Lanning v. Septa (1999), all courts relied on Section 1607.5(H) of the
UGESP, which states, “Where cut-off scores are used, they should normally be set so as to be reason-
able and consistent with normal expectations of acceptable proficiency within the work force.”

For example, in Guardians v. Civil Service (1980) and Gillespie v. Wisconsin (1985), two early
cases discussed under Principle 5, the 2nd and 7th Circuits ruled that

An employer may establish a justifiable reason for a cut-off score by, for example, using a professional
estimate of the requisite ability levels, or, at the very least by analyzing the test results to locate a logical
break-point in the distribution of scores.

Both studies employed content validity strategies. However, the defendants lost on the cut-off score
issue in Guardians but won in Gillespie.
In Guardians, the defendants made selections in strict rank order and defined the cutoff as the point
at which the last applicant was selected (an effective cut score). The 2nd Circuit ruled:

If it had been shown that the exam measures ability with sufficient differentiating power to justify
rank-ordering, it would have been valid to set the cutoff score at the point where rank-ordering filled
the City’s needs . . . . But the City can make no such claim, since it never established a valid basis for
rank-ordering.

However, in Gillespie, the 7th Circuit ruled in favor of the defendant on the basis of two factors:
(a) establishment of interrater reliability and (b) the cutoff was selected to permit interviewing “as
many minority candidates as possible while at the same time assuring that the candidates possessed
the minimum skills necessary to perform” the job. On rank ordering, the 7th Circuit cited verbatim
from Q62 of the “Questions and Answers” of the UGESP.

Use of a selection procedure on a ranking basis may be supported by content validity if there is evidence
from work analysis or other empirical data that what is measured by the selection procedure is associated
with differences in levels of job performance.

Therefore, up until the Lanning ruling, there was little dispute among the circuit courts on either
cutoff scores or rank ordering.
In Lanning, the 3rd Circuit interpreted the terms “job-related” and “consistent with business
necessity” from the Civil Rights Act of 1991 as implying separate standards and that the “business
necessity” part implied proof that the cut-off score “measures the minimum qualifications neces-
sary for successful performance of the job in question.” This interpretation was explicitly rejected
in Bew v. City of Chicago (2001), where the 7th Circuit ruled, “Griggs does not distinguish business
necessity and job relatedness as two separate standards.” Although most of the other circuits have
continued to follow precedents from Guardians and Gillespie, in one recent case (Isabel v. City of
Memphis, 2005), the 6th Circuit adopted the Lanning standard.
In summary, the basis for establishing cut-off scores is evolving, and interested readers need to
keep abreast of rulings in this area.

9. An optimal balance between job-relatedness and reduction of adverse impact should be
struck when possible.
The key point here is the definition of optimal. Several researchers have shown that optimum
validity depends upon the manner in which job performance is defined or the specific aspect(s) of
performance that are most important to an employer (Hattrup, Rock, & Scalia, 1997; Murphy &
Shiarella, 1997). Murphy and Shiarella (1997) proposed that weighting of predictors and criteria
provides a better understanding of the relationship between selection and job performance. They
showed that the validity of a predictor composite can vary substantially depending upon the weight
given to predictors and criterion measures. The 95% confidence interval for the validity coefficients
for various weightings varied widely from as low as .20 to as high as .78.
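The dependence of composite validity on weighting choices can be illustrated with the standard composite-correlation formula. The sketch below is a hypothetical numerical example: the matrices, the predictor and criterion labels, and the function name composite_validity are assumptions made for illustration, not values taken from Murphy and Shiarella (1997).

import numpy as np

def composite_validity(wx, wy, Rxx, Ryy, Rxy):
    """Correlation between a weighted predictor composite and a weighted
    criterion composite (standard composite-correlation formula)."""
    wx, wy = np.asarray(wx, float), np.asarray(wy, float)
    num = wx @ Rxy @ wy
    den = np.sqrt(wx @ Rxx @ wx) * np.sqrt(wy @ Ryy @ wy)
    return num / den

# Illustrative (made-up) values: predictors = cognitive ability, conscientiousness;
# criteria = task performance, contextual performance.
Rxx = np.array([[1.0, 0.1], [0.1, 1.0]])
Ryy = np.array([[1.0, 0.4], [0.4, 1.0]])
Rxy = np.array([[0.50, 0.15],   # cognitive ability with task, contextual
                [0.20, 0.35]])  # conscientiousness with task, contextual

for wy in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0]):   # how the criterion is weighted
    r = composite_validity([0.5, 0.5], wy, Rxx, Ryy, Rxy)
    print(f"criterion weights {wy}: composite validity = {r:.2f}")

Even with fixed predictor weights, shifting the criterion weights from task performance toward contextual performance moves the composite validity noticeably, which is the essence of the argument that the definition of job performance drives what “optimal” means.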
Another challenging aspect of balancing job-relatedness and adverse impact is the fact that job
relatedness can be defined via different levels of analysis. When individual productivity and task
performance are the focus, cognitive ability tests (instruments that typically have high adverse impact)
result in the highest validity for a single predictor. However, if overall organizational effectiveness is
the objective, factors such as legal defensibility, strategic positioning within the marketplace, employee
performance, workforce diversity, and corporate social responsibility must all be considered.
The bottom line is that the organization’s mission and values dictate what constitutes acceptable
performance and, ultimately, the best methods of achieving that performance. This argument was
made quite forcefully in the University of Michigan Law School admission case. The University of
Michigan took the position that its objective was to admit a first-year class that collectively advanced
the Law School’s overall mission as opposed to simply admitting each student with the highest prob-
ability of achieving a given law school grade point average (GPA). Whether one agrees with the
school’s stated mission or not, once formulated, that mission basically defines job-relatedness. It
also drives the types of strategies that are most likely to achieve an optimum balance between job
relatedness and reduction of adverse impact.

EXEMPLAR CASE LAW


Two methods of reducing adverse impact have found favor among the courts: (a) eliminating test
battery components that are most likely to produce adverse impact and (b) using other factors (e.g.,
diversity) in the selection process that are more likely to benefit minorities. Hayden v. County of
Nassau (1999) illustrates the first method and Grutter v. Bollinger (2003) illustrates the second.
After losing in prior litigation, and facing a consent decree, Nassau County, New York, was moti-
vated to develop a hiring exam for police officers that reduced or eliminated adverse impact. A
25-component test battery was initially administered to 25,000 candidates. The goal of reducing
(but not eliminating) adverse impact was accomplished by eliminating scores on 16 of the 25 com-
ponents. Another configuration with even less adverse impact (and even fewer components) was
rejected because it had lower validity. This process was challenged by 68 unsuccessful nonminor-
ity candidates who would have benefited had all 25 components been maintained. The 2nd Circuit
favored Nassau County, ruling, “the intent to remedy the disparate impact of the prior exams is not
equivalent to an intent to discriminate against non-minority applicants.”
In Grutter, the Supreme Court ruled (under the 14th Amendment) that (a) diversity is a compelling
government interest and (b) the method used by the law school was narrowly tailored to that interest.
At the same time, the Supreme Court struck down the Michigan undergraduate diversity plan for not
being narrowly tailored (Gratz v. Bollinger, 2003). The Grutter ruling was based on Justice Powell’s
1978 ruling in Regents v. Bakke. Between Bakke and Grutter, several courts upheld preference for
minorities based on diversity in police forces, including Detroit Police v. Young (1979) and Talbert
v. City of Richmond (1981). After Grutter, the 7th Circuit upheld preference for Black police officers
in a promotion process, ruling:

It seems to us that there is an even more compelling need for diversity in a large metropolitan police
force charged with protecting a racially and ethnically divided major American city like Chicago.
Under the Grutter standards, we hold, the city of Chicago has set out a compelling operational need for
a diverse police department.

The 7th Circuit then upheld out-of-rank promotions of minority applicants on the grounds that the
practice was narrowly tailored.
It should be noted that the key ingredient in a successful diversity defense is proving to the
satisfaction of the courts that diversity is an important job-related factor that furthers the
mission of the organization. Diversity for diversity’s sake, therefore, will not work. For example,
in Lomack v. City of Newark (2006), a newly elected mayor transferred firefighters to and from
various posts so that all 108 fire stations were racially diverse. The mayor felt there were “educa-
tional and sociological” benefits for such a “rainbow,” but the 3rd Circuit saw it as “outright racial
balancing.” Similar rulings were rendered in Biondo v. City of Chicago (2004) and Rudin v. Lincoln
Land Community College (2005).

10. Many of the same practices that define responsible selection define responsible downsizing
efforts.
Downsizing (also known as RIFs, or reductions-in-force) represents a special instance of selec-
tion. Individuals are selected to “stay” in an organization (or conversely to “leave” an organization).
It might be thought of as “deselection.” It represents the mirror image of the selection scenario.
However, there are some unique aspects to downsizing. In the context of litigation, the employer
must be prepared to show that the reduction in force was not a pretext for simply eliminating mem-
bers of a protected group (e.g., female, minority, and/or older employees). Thus, the reduction in
force should be tied to a larger business plan that documents the need for the reduction. This busi-
ness plan should also support why certain jobs, departments, or divisions have been targeted for the
force reduction.
As is the case in many selection/promotion scenarios, the employer should identify the par-
ticular knowledge, skills, abilities, or other personal characteristics that are central to the decision
about who to lay off. These variables might include abilities and skills needed for future vitality of
the organization, critical experience bases, customer contacts, past performance, and future output
responsibilities. Many of these variables will be illuminated by a current or future-oriented work
analysis. In addition, rating scales used to evaluate current employees (e.g., performance ratings,
skill ratings, knowledge ratings) should conform to accepted professional and scientific standards
for the use of such rating devices.
Unlike selection scenarios, it is difficult to “validate” a downsizing process because there is no
obvious criterion for success beyond simple organizational survival. Nevertheless, just as in the case
of selection, it is important to develop a theory of the downsizing process and its desired results.
This theory would include some statement of requisite KSAOs for going forward as well as the
organizational need for downsizing.

EXEMPLAR CASE LAW


Most RIF cases are age-based and involve disparate treatment charges. However, some are race-
based (e.g., Jackson v. FedEx, 2008). Furthermore, after the Supreme Court’s ruling in Smith v. City
of Jackson (2005), which clarified that adverse impact is a valid ADEA claim, we can expect more
age-based adverse impact cases even in RIFs (e.g., Meacham v. Knolls, 2006). In a meta-analysis of
115 district court cases involving disparate treatment based on age, Wingate, Thornton, McIntyre,
and Frame (2003) found that 73% of the rulings were summary judgment for defendants (SJDs).
Factors associated with SJD were use of (a) performance appraisal, (b) organizational review, (c)
employee assessment and selection methods, and (d) a concrete layoff plan. As a general principle,
employers establishing and following sound concrete layoff policies will likely prevail.
Cases with favorable rulings for plaintiffs reveal key mistakes made by employers. The most
obvious mistake is weakness in the layoff plan. For example, in Zuniga v. Boeing (2005), the defen-
dant had a concrete layoff plan that relied heavily on performance evaluations. However, the plaintiff
defeated SJD by proving that the evaluations he received during the layoff process were inconsistent
with performance evaluations he received shortly before the plan was established.
Employers have made mistakes in reassignment after the RIF. For example, in Berndt v. Kaiser
(1986), Berndt could not compete for other jobs in which he was arguably more qualified than
younger employees who were afforded this opportunity. Similarly, in Zaccagnini v. Levy (2003),
older truck drivers were not considered for newly available jobs that younger drivers received.
There are also cases where reassignment is part of the RIF plan, but the definition of “similarly
situated” older versus younger employees is narrowly construed. For example, in Ercegovich v.
Goodyear (1998), three HR positions were eliminated, and the oldest employee was not reassigned.

The defendant argued that the three employees were not similarly situated because they performed
different job functions, but the 6th Circuit ruled for the plaintiff because the three eliminated jobs
involved common knowledge, skills, and abilities.
Plaintiffs have also successfully used direct evidence of stray remarks by supervisors. For exam-
ple, in Starceski v. Westinghouse (1995), a supervisor admitted he stated that “it was actually a fact
that older engineers … were going to be let go” (see also Madel v. FCI Marketing, 1997).
Finally, employers must strictly follow eight explicit requirements in the Older Workers Benefit
Protection Act of 1990 if they offer enhanced benefits to older employees who accept early retire-
ment in exchange for waiver of the right to sue. These requirements are that (a) the waiver docu-
ment is clearly written and easily understood, (b) the waiver document cites the ADEA, (c) it only
affects rights prior to the effective date of the waiver, (d) it offers enhanced benefits, (e) it advises
employees of their right to seek counsel, (f) it provides 21 days for individuals and 45 days for
groups to make a decision, (g) it is revocable within 7 days of signing, and (h) it provides extensive
information about who is affected if a group is involved. In Oubre v. Entergy (1998), the Supreme
Court made it clear it will strictly construe these eight requirements. Therefore, employers should
know them.

11. In the employment context, disparate treatment may appear in both obvious and nuanced
ways. In addition, disparate treatment based on race and gender may appear very differently
than disparate treatment based on age or disability.
A now classic mechanism by which disparate treatment based on race or gender may occur is in
steering racial minorities or women into dead-end, lower-paying positions outside of the primary
functions within the organization.
Subtle instances of disparate treatment based on gender can also occur in the initial salary offer
made to female applicants when compared to similarly situated males. In such a scenario, a female
applicant with the same qualifications as a male counterpart may be started out in a lower pay
grade. This difference can then carry over throughout the employee’s tenure, although salary raises
and job promotion may be similar from that point on. Unfortunately, the damage may already have
been done despite the fact that the two employees appear to have been treated fairly after hire.
Job titles, advancement rates, and salary growth are often used by organizations as surrogates for
“promotion potential” of employees or applicants, thus representing an indirect effect on selection
or promotion.
In some downsizing situations (i.e., deselection), an older worker may be the victim of disparate
treatment with regard to a performance appraisal rating. The older employee is rated lower sup-
posedly because he or she does not have the ability to adapt or adjust to future conditions in the
marketplace. It has been alleged by older plaintiffs that performance appraisals may be arbitrary
and excessively subjective assessments that represent a pretext for age discrimination. Without
appropriate rating scale development and construct definition, the ability may be ill-defined and
may lend itself to biased evaluations.
Similarly, an employee with a disability may be assumed to be unable to carry out some physical
task associated with the job simply because the employer failed to recognize alternative ways of per-
forming the task or subtle ways in which the task could be restructured. In other words, the employer
ignored the responsibility of providing reasonable accommodations to the disabled employee.

EXEMPLAR CASE LAW


Wards Cove v. Atonio (1989) illustrates the pitfalls of steering a subset of applicants into lower-
paying jobs. Two companies used different procedures for hiring unskilled salmon packers (primar-
ily Eskimos and Filipinos) and skilled workers (primarily Whites, who also had better pay, food, and
housing). The Supreme Court issued an adverse impact ruling favoring the defendants in Wards Cove
(later overturned by the Civil Rights Act of 1991). Interestingly, the 9th Circuit, whose ruling in
Wards Cove the Supreme Court reviewed, had previously found the challenged Wards Cove practices
(word-of-mouth recruitment, walk-in hiring, nepotism, and “vague subjective hiring criteria”) to be
more indicative of disparate treatment than adverse impact in Domingo v. New England Fish (1984).
Similarly, post-Wards Cove cases with parallel charges have also implied
disparate treatment (e.g., Thomas v. Washington County, 1990, and EEOC v. O&G, 1991).
Initial gender-based pay differences are illustrated in EEOC v. Aetna (1980) and EEOC v. Ligget
Myers (1982). In Aetna, Barratt, a female, was hired at the same time as Garrett, a male. Barratt learned
that Garrett was paid more only after Garrett was fired, and Aetna settled with Barratt. In Ligget
Myers, male supervisors were paid more than their female counterparts as a result of prepromotion sal-
aries. Although it used a sex-blind procedure for all promotions to supervisor (absolute dollar amount
increases), the company could not explain why prepromotion salaries were higher for the males.
“Adaptability” in downsizing is illustrated in Meacham v. Knolls (2006), in which 30 of 31 laid
off employees were over age 40. The plaintiffs proved adverse impact, but the defendants, using the
statutory RFOA (reasonable factors other than age) defense, articulated that the basis for determin-
ing layoffs was whether employees were “flexible” and “retrainable” for alternative assignments.
A divided panel of the 2nd Circuit ruled for Knolls because the plaintiffs could not prove this
employer strategy was unreasonable. However, the dissenting judge argued that the defendant’s
burden is to factually prove (rather than simply articulate) that its criteria are reasonable, and it is
not the plaintiff’s burden to prove otherwise. In 2008, the Supreme Court ruled that the employer
actually had the burden of proof, and the case has been remanded for further trial proceedings in
which the employer will be required to present that proof.
Finally, job restructuring is not required under the ADA if it entails elimination of essential job
functions. However, employers are free to restructure if they so desire. For example, in Barth v.
Gelb (1993), a diabetic applied unsuccessfully for an overseas job at the Voice of America (VOA).
VOA required rotation among its sites, and medical facilities were not available at each site. VOA
routinely permitted such selective assignments for injured employees. However, distinguishing
between “friends” and “strangers,” the DC Circuit ruled it was legal to so restructure jobs for existing
employees without being obligated to do likewise for applicants.

12. Employers should take proactive measures to prevent, detect, and correct EEO violations
involving procedural unfairness to workers or violations inferred from analysis of workforce
data.
The model for preventing, detecting, and correcting EEO violations involving unfair treatment
of employees is provided in exemplar form by EEOC’s approach to sexual harassment (SH). In
EEOC Policy Guidance 915.050 (June 1999; http://www.eeoc.gov/policy/docs/currentissues.html),
the EEOC distinguishes between minimum requirements and best practices. The minimum require-
ments are as follows:

• A clear explanation of prohibited conduct
• Assurance that employees who make complaints of harassment or provide information
related to such complaints will be protected against retaliation
• A clearly described complaint process that provides accessible avenues of complaint
• Assurance that the employer will protect the confidentiality of harassment complaints to
the extent possible
• A complaint process that provides a prompt, thorough, and impartial investigation
• Assurance that the employer will take immediate and appropriate corrective action
when it determines that harassment has occurred

An aggressive response requires additional actions, including training employees to understand
all employer policies, a dedicated “EEO Officer” to handle complaints, and an employee handbook
that summarizes the rights and privileges for all employees.

Although Policy Guidance 915.050 was expressly targeted at SH, it applies equally well to
broader actions of any sort. For example, employees may feel that their performance is being
improperly appraised or that they are not receiving training or certification opportunities necessary
for advancement.
Workforce data may include confidential survey information obtained by a trained EEO Officer
or statistical data relating to performance of different groups of employees on selection tests, com-
position of the workforce in relation to the appropriate labor pool, or the funneling of different
groups into different jobs (e.g., females offered jobs as cashiers vs. males offered jobs as assistant
managers). Properly analyzed data could lead to the detection of a potential adverse impact or pat-
tern and practice violation, thus enabling employers to take action before expensive and disruptive
litigation while simultaneously increasing employee perceptions of fairness.
Finally, employers need a good antiretaliation policy. EEOC statistics reveal that retaliation com-
plaints are increasing although EEO claims in general have stabilized and even decreased (Zink &
Gutman, 2005). Adding to this caution, the Supreme Court recently lightened the burden on plain-
tiffs to prove retaliation, making it important for employers to educate their HR professionals and managers on
what to do when individuals complain about a workplace policy or file formal charges. It is not
unusual for plaintiff employees to complain about routine selection decisions such as lost training
opportunities that might presage promotions, or failure to promote per se. It is critically important
for employers to prevent reprisals against those who do complain.

EXEMPLAR CASE LAW


EEOC Policy Guidance 915.050 interprets the Supreme Court’s 1998 rulings in Burlington v. Ellerth
and Faragher v. Boca Raton. The Court ruled in both cases that when “no tangible employment
action is taken” the employer has vicarious liability for supervisors, but may affirmatively defend
itself by proving with evidence (a) that it exercised “reasonable care to prevent and correct promptly any
sexually harassing behavior” and (b) that “the … employee unreasonably failed to take advantage of any
preventive or corrective opportunities provided by the employer to avoid harm otherwise.”
Ellerth applies to private entities and Faragher to public entities. Both rulings clarify there is
(a) no defense for quid pro quo SH (strict liability), (b) an affirmative (see above) defense for hos-
tile harassment by supervisors, and (c) employers must know or have a basis for knowing that SH
occurred among coworkers or they will be considered guilty of reckless disregard. Examples of
employer policies that succeeded in affirmative defenses are in Coates v. Sundor Brands (1998) and
Shaw v. AutoZone (1999), and examples of employer failures to affirmatively defend are in Baty v.
Willamette (1999), Dees v. Johnson Controls (1999), and Gentry v. Export Packaging (2001).
There are two major Supreme Court rulings on retaliation. In Robinson v. Shell Oil (1997),
the Court unanimously ruled that retaliation applies to actions of a former employer who wrote a
negative letter of reference for a previously fired employee. Subsequently, the EEOC issued policy
guidance (915.003) in May 1998 outlining three steps for proving retaliation: (a) opposing an
employer policy (opposition) or filing a legal claim (participation), (b) suffering an adverse action,
and (c) causally connecting opposition or participation to the adverse action. At the same time,
the EEOC defined “adverse action” as any action reasonably likely to deter charging parties (or
others) from engaging in a protected activity.
More recently, in BNSF v. White (2006), the Supreme Court endorsed the EEOC’s definition of
“adverse action” over two earlier but heavier standards. One of those heavier standards required
“adverse action” to include ultimate employment consequences such as hiring, discharge, promotion,
or compensation. The other required proof of interference with terms and conditions of employment.
By endorsing a lighter EEOC standard, the Supreme Court made it possible for otherwise legal
employer actions to constitute retaliation. For example, in Moore v. Philadelphia (2006), White
police officers opposed harassment against fellow Black police officers, and in Hare v. Potter (2007),
a female employee cited ten incidents in which her life was made miserable after she filed an EEOC
complaint. These actions were deemed insufficient to prove racial harassment or SH, but were deemed
sufficient to prove retaliation against the complainants, although they were not original plaintiffs.
Finally, the risks employers face when altering selection processes to avoid adverse impact are
illustrated in Ricci v. Destefano (2008) and Hayden v. County of Nassau (1999). In Ricci, where
exams were discarded, there was a weak basis for the New Haven CSB to believe it would lose an
adverse impact challenge by minority applicants. However, in Hayden, where Nassau County (New
York) eliminated portions of the test to reduce the adverse impact, there was a much stronger basis
for that fear since the county was under a consent decree to create a valid test with the least amount
of adverse impact.

13. Using statistics to demonstrate age discrimination is more challenging than using similar
statistics to argue for racial or gender discrimination because many variables that would not
be expected to be correlated with race or gender (e.g. career progress, promotions, compensa-
tion) are commonly correlated with age.
Race and gender are immutable personal characteristics. Age is not, because we all age (although
some less willingly than others). When considering race and gender in the calculation of adverse
impact, it is common to compare differences in the relative rates of “success” of men and women,
or Whites and Blacks. When holding relevant variables constant (e.g., education, experience, per-
formance), in a “fair” work environment, one would not expect to find differences in those success
rates when considering race or gender. In contrast, age changes over time for an individual and is
often correlated with variables of interest such as experience and stage of career development. It is
widely recognized that early in one’s career, promotions occur more rapidly, as do percentage
increases in compensation. Thus, it is common to find that older individuals tend to receive fewer
promotions and lower percentage increases in compensation. This is tempered by the fact that these
same older individuals are typically more highly paid than their younger counterparts in absolute
dollars and that these older individuals also tend to be at higher levels in the organization.
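To make the relative-rates comparison described above concrete, the following is a minimal Python sketch (not drawn from this chapter; the applicant and selection counts are hypothetical) that computes each group’s selection rate and the impact ratio commonly compared against the four-fifths (80%) rule of thumb.

# Illustrative sketch with hypothetical counts: comparing relative "success" rates
# for two groups and applying the four-fifths (80%) rule of thumb.
def adverse_impact_ratio(selected_a, applicants_a, selected_b, applicants_b):
    """Return each group's selection rate and the ratio of the lower rate to the higher rate."""
    rate_a = selected_a / applicants_a
    rate_b = selected_b / applicants_b
    return rate_a, rate_b, min(rate_a, rate_b) / max(rate_a, rate_b)

# Hypothetical example: 48 of 80 male applicants and 30 of 70 female applicants are selected.
rate_m, rate_f, ratio = adverse_impact_ratio(48, 80, 30, 70)
print(f"male rate = {rate_m:.2f}, female rate = {rate_f:.2f}, impact ratio = {ratio:.2f}")
# An impact ratio below 0.80 is commonly treated as preliminary evidence of adverse impact.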
In age discrimination cases, the plaintiff will often present statistics demonstrating a decrease
in the rate of promotions for older employees as a way of addressing the issue of adverse impact.
Similarly, plaintiffs may produce evidence of statistically significant differences in performance
ratings between older and younger employees. To the extent that performance ratings play a role in
decisions regarding promotions and/or layoffs, plaintiffs argue that performance ratings are biased
against older employees and represent the foundation for palpable harm to them.
For any informative discussion of the possibility of adverse impact in age cases, one should intro-
duce controls related to organizational and job tenure, work-related experience, and knowledge/
skill/ability variables. Longitudinal analyses are superior to cross-sectional analyses when dealing
with age. With respect to differences in performance ratings, two phenomena should be recognized.
The first might be called the “frog pond” or “the all-state-quarterback effect.” To the extent that
performance ratings represent comparisons between an individual and peers, as an individual pro-
gresses up the organizational ladder (and, at the same time, ages), he or she is being compared with
an increasingly competent cohort. Thus, the individual arrives at “ponds” with increasingly bigger
frogs. The second phenomenon can be termed the “speed of success” factor. This means that an
individual who moves rapidly up in an organization might be considered more capable and effective
than a colleague who spends longer periods of time in a given job title.
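As a hedged illustration of the controls discussed above (the data file, variable names, and use of the statsmodels package are hypothetical, not part of the original text), the sketch below estimates the relationship between age and promotion with and without controls for tenure and job level; if the age coefficient shrinks once the controls are added, the apparent age disparity may reflect those correlated variables rather than age itself.

# Illustrative sketch only: hypothetical employee-level data with one row per person.
import pandas as pd
import statsmodels.formula.api as smf

# Assumed columns: promoted (0/1), age (years), tenure (years), job_level (ordinal)
df = pd.read_csv("employee_decisions.csv")  # hypothetical file name

uncontrolled = smf.logit("promoted ~ age", data=df).fit()
controlled = smf.logit("promoted ~ age + tenure + job_level", data=df).fit()

# Compare the age coefficients before and after adding the controls.
print(uncontrolled.params["age"], controlled.params["age"])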

EXEMPLAR CASE LAW


In the 1980s, courts treated adverse impact in age cases with Title VII rules. For example, cost-
cutting defenses in Geller v. Markham (1980) (hiring at the lowest of six steps) and Leftwich v.
Harris-Stowe (1983) (termination of tenured faculty) failed because neither was deemed job-related
in accordance with then existing Department of Labor regulations (subsequently adopted by the
EEOC).

Then in Hazen v. Biggins (1993), a 62-year-old was terminated shortly before eligibility for
pension vestment, a clear-cut ERISA violation. However, the lower courts also upheld a disparate
treatment claim because age and years of service are correlated. The Supreme Court reversed on disparate
treatment, ruling unanimously that employer decisions may be motivated by “factors other than age
… even if the motivating factor is correlated with age.” Additionally, three justices opined that it is
“improper to carry over disparate impact analysis from Title VII to the ADEA.” After Hazen, three
circuit courts continued to entertain age-based adverse impact claims, but seven circuit courts found
adverse impact inapplicable in the ADEA as a matter of law. Disputes among circuits often lead to
Supreme Court decisions intended to add consistency across circuits.
Then, in Smith v. City of Jackson (2005), police officers and dispatchers with less than 5 years
of experience received higher percentage compensation increases. The lower courts ruled adverse
impact was unavailable in the ADEA, but the Supreme Court ruled that Hazen does not preclude
such claims. However, the Supreme Court affirmed Hazen that factors correlated with age (e.g.,
years of service) do not qualify and, where adverse impact is proven, defendants may use the statu-
tory RFOA in lieu of proving job-relatedness. The plaintiffs ultimately lost the prima facie case
(failure to prove adverse impact) and the City had a valid RFOA (the need to compete with neigh-
boring municipalities for filling lower-level positions by increasing the compensation for those posi-
tions, although the entry-level positions were often filled by younger applicants).
Two subsequent lower court rulings in which adverse impact was proven are worth noting. As
noted in Principle 11 (above), in Meacham v. Knolls (2006), 30 of 31 laid-off employees were over
age 40. Additionally, in EEOC v. Allstate (2006), over 90% of employees subject to a “reorganiza-
tion plan” (making them ineligible for rehire) were over age 40. As noted above, in the Meacham
case, the Supreme Court ruled that the employer has the burden of proof when invoking an RFOA
defense.

14. All disability-related decisions should be made on a case-by-case basis, including deter-
mining (a) whether an applicant or employee is disabled, (b) whether accommodations are needed
for performing essential job functions, and/or (c) how to assess the KSAOs deemed necessary to
perform essential job functions.
Under the ADA of 1990 (and the prior Rehabilitation Act of 1973), there is no such thing
as “disability as a matter of law.” The general requirements for being disabled under the law
include: (a) a physical or mental impairment that (b) interferes with a major life function. In addi-
tion to prongs (a) and (b), the individual must (c) be able to perform all essential job functions
with or without accommodations. This is true regardless of whether the impairment is current
or past or if the employer mistakenly believes an individual is disabled. As a result, disabilities
fall within an interval between prongs (b) and (c) in which individuals with minor or temporary
impairments cannot demonstrate interference with major life functions, and individuals with
extremely severe impairments may not be able to perform all essential job functions, even with
accommodations.
The first step in determining if a disabled person requires accommodations is determining
whether there is a nexus between a physical or mental impairment and essential job functions.
A person with one leg is clearly disabled under the ADA, but it is unlikely that accommoda-
tions beyond access to the workplace are required if the job is computer programmer. If there
is a nexus, the employer should next meet with the applicant or employee and together explore
potential accommodations to overcome the barrier implied by the disability. If there are no such
accommodations possible, the individual, unfortunately, faces an insurmountable yet legal barrier
to employment.
If assessment is involved in the selection process, it is important to accommodate applicants
with special needs. For example, if a paper-and-pencil or computer-presented format is used and the
construct of interest is “good judgment,” it may be important to allow applicants with limited vision
to have the test questions read to them. More generally, it is good practice to incorporate KSAOs
needed to perform essential job functions in the testing process itself. For example, if a job or criti-
cal tasks can be performed without undue concern for the passage of time, it may be inappropriate
to use demanding time limits for a test to be taken by an individual who claims a learning disability
related to speed of reading or processing.

EXEMPLAR CASE LAW


The individual approach to defining disability was affirmed in three 1999 Supreme Court rulings:
Sutton v. UAL, Murphy v. UPS, and Albertsons v. Kirkingburg. For example, in Kirkingburg, the
Supreme Court ruled:

This is not to suggest that monocular individuals have an onerous burden in trying to show that they
are disabled . . . . We simply hold that the Act requires monocular individuals … to prove a disability by
offering evidence that the extent of the limitation in terms of their own experience, as in loss of depth
perception and visual field, is substantial.

In other words, Kirkingburg could have proven he was disabled within the meaning of the law but
did not, instead assuming that amblyopia is a disability as a matter of law.
In the other two cases, the Supreme Court ruled that impairments must be evaluated with mitigation
(eyeglasses for visual impairments in Sutton and high blood pressure medication for hypertension in
Murphy). In an extension of the ruling in Kirkingburg, the Murphy Court ruled:

Murphy could have claimed he was substantially limited in spite of the medication. Instead, like
Kirkingburg, [he] falsely assumed that his impairment was a disability as a matter of law.

Taking advantage of this “advice,” plaintiffs subsequently proved disability despite medication, as
for example, in EEOC v. Routh (2001) (seizures only partially controlled with epilepsy medication)
and Lawson v. CSX (2001) (debilitating side effects of insulin medication for diabetics). It should be
noted that in the ADA Amendments Act of 2008, Congress changed the rules for mitigating mea-
sures (except for eyeglasses), thus reversing the Albertsons and Kirkingburg rulings. This does not,
however, alter the principle of assessing impairment on a case-by-case basis.
Examples of insurmountable barriers include Southeastern v. Davis (1979) (a deaf woman excluded
from nursing school), Treadwell v. Alexander (1983) (a heart patient who could not perform all-day
foot patrols excluded from a park ranger job), and Miller v. Illinois (1996) (a blind person who could
perform some, but not the essential, functions of a corrections officer). However, it is critical that the
job functions in question are essential, as, for example, in Stone v. Mt. Vernon (1997) (a paraplegic
former firefighter refused a desk job because he could not fight fires in emergencies).
When accommodations are possible, plaintiffs have a duty to inform and both parties have
a duty to “flexibly interact” to seek accommodations. Examples of failure to notify include
Hedberg v. Indiana Bell (1995) (notification of fatigue syndrome after termination) and Taylor v.
Principal Financial (1997) (notification of bipolar disorder after a poor performance evaluation).
Examples of employee failures to flexibly interact include Beck v. University of Wisconsin (1996)
(the employee refused a request for medical records to identify accommodations) and Grenier
v. Cyanamid (1995) (the employee refused a request for psychiatric information). Examples of
employer failures to flexibly interact include Bultmeyer v. Fort Wayne (1996) (the employer
ignored a request by a psychiatrist for reassignment), Feliberty v. Kemper (1996) (the employer
falsely assumed that a medical doctor can design his own accommodations), Whiteback v. Vital
Signs (1997) (the employer ignored a request for a motorized cart because “it wouldn’t look right”),
and Dalton v. Subaru-Isuzu (1998) (the employer ignored a request for step stools and guard rails
without discussion).
Mistakes in assessment include Stutts v. Freeman (1983), in which a dyslexic applicant failed the
General Aptitude Test Battery (GATB) exam for a job (heavy truck operation) that did not require

reading skills. On the other hand, in Fink v. New York City (1995), the city was not liable when
accommodations for blind applicants (readers and interpreters) did not result in passing scores on a
civil service exam.

CONCLUSIONS
The term “law” in the selection context has two separate but related meanings. There is statutory
law embodied in the Civil Rights Acts, ADA, ADEA, and similar federal statutes that we have dis-
cussed above. There is also “case law,” which is embodied in the opinions of various levels of the
federal judiciary (trial, appeals, and Supreme Courts). The latter interprets the former from the legal
perspective. Selection practitioners must be aware of both aspects of “the law.” They must be aware
of the statutory requirements as well as how judges have interpreted these requirements. Statutory
statements of the law seldom recognize the specific contributions of I-O psychology to selection
practices. Judges, on the other hand, often cite the testimony of I-O psychologists and the standards
by which selection practice is evaluated (e.g. UGESP, SIOP Principles). In this chapter, we have
attempted to bring together practice, statutory law, and case law as a way of educating practitioners.
Other chapters provide more detailed descriptions of practices. We provide a legal context for many
of those practices.

REFERENCES
Adams v. City of Chicago, 469 F.3d 609 (CA7 2006).
Albemarle Paper Co. v. Moody, 422 U.S. 405 (1975).
Albertsons v. Kirkingburg, 527 U.S. 555 (1999).
Barth v. Gelb, 2 F.3d 1180 (DC Cir. 1993).
Baty v. Willamette Industries, 172 F.3d 1232 (CA10 1999).
Beck v. University of Wisconsin Bd. of Regents, 75 F.3d 1130 (CA7 1996).
Bernard v. Gulf Oil, 890 F.2d 735 (CA5 1989).
Bew v. City of Chicago, 252 F.3d 891 (CA7 2001).
Biondo v. City of Chicago, 383 F.3d 680 (CA7 2004).
BNSF v. White, 126 S.Ct. 240 (2006).
Bobko, P., Roth, P. L., & Potosky, D. (1999). Derivation and implications of a meta-analytic matrix incorporat-
ing cognitive ability, alternative predictors, and job performance. Personnel Psychology, 52, 561–590.
Boston NAACP v. Beecher, 504 F.2d 1017 (CA1 1974).
Bradley v. City of Lynn, 443 F. Supp. 2d. 145 (D.Mass 2006).
Bridgeport Guardians, Inc. v. City of Bridgeport, 933 F.2d 1140 (CA2 1991).
Brunet v. City of Columbus, 58 F.2d 251 (CA6 1995).
Bultmeyer v. Fort Wayne School, 100 F.3d 1281 (CA7 1996).
Burlington Industries, Inc. v. Ellerth, 524 U.S. 742 (1998).
Clady v. Los Angeles, 770 F.2d 1421 (CA9 1985).
Coates v. Sundor Brands, 164 F.3d 1361 (CA11 1998).
Connecticut v. Teal, 457 U.S. 440 (1982).
Dalton v. Subaru-Isuzu Automotive, Inc., 141 F.3d 667 (CA7 1998).
De Corte, W. (1999). Weighing job performance predictors to both maximize the quality of the selected work-
force and control the level of adverse impact. Journal of Applied Psychology, 84, 695–702.
De Corte, W., Lievens, F., & Sackett, P. R. (2007). Combining predictors to achieve optimal trade-offs between
selection quality and adverse impact. Journal of Applied Psychology, 92, 1380–1393.
Dees v. Johnson Controls World Services, Inc., 168 F.3d 417 (CA11 1999).
Detroit Police Officers Association v. Young, 999 F.2d. 225 (CA6 1979).
Domingo v. New England Fish, 727 F.2d 1429 (CA9 1984).
Dothard v. Rawlinson, 433 U.S. 321 (1977).
EEOC v. Aetna Insurance Co., 616 F.2d 719 (CA4 1980).
EEOC v. Allstate, 458 F. Supp. 2d 980 (ED Missouri 2006).
EEOC v. Atlas Paper, 868 F.2d. 487 (CA6 1989).
EEOC v. Liggett & Myers, Inc., 690 F.2d 1072 (CA4 1982).

EEOC v. O&G, 383 F.3d 872 (CA7 1991).
EEOC v. Routh, 246 F.3d 850 (CA6 2001).
Ensley NAACP v. Seibels, 616 F.2d 812 (CA5 1980).
Ercegovich v. Goodyear Tire & Rubber Co., 154 F.3d 344 (CA6 1998).
Faragher v. City of Boca Raton, 524 U.S. 775 (1998).
Feliberty v. Kemper, 98 F.3d 274 (CA7 1996).
Fink v. NYC, 53 F.3d 565 (CA2 1995).
Furnco Construction Corp. v. Waters, 438 U.S. 567 (1978).
Geller v. Markham, 635 F.2d 1027 (CA2 1980).
Gentry v. Export Packaging Co., 238 F.3d 842 (CA7 2001).
Gillespie v. State of Wisconsin, 771 F.2d 1035 (CA7 1985).
Gratz v. Bollinger, 539 U.S. 244 (2003).
Grenier v. Cyanamid, 703 F.3d 667 (CA1 1995).
Griggs v. Duke Power Co., 401 U.S. 424 (1971).
Grutter v. Bollinger, 539 U.S. 306 (2003).
Guardians v. Civil Service, 630 F.2d 79 (CA2 1980).
Gulino v. State Education Department, 461 F.3d 134 (CA2 2006).
Gutman, A. (2000). EEO law and personnel practices (2nd ed.). Newbury Park, CA: Sage.
Hamer v. Atlanta, 872 F.2d 1521 (CA11 1989).
Hare v. Potter, WL 841031 (CA3 2007).
Hattrup, K., Rock, J., & Scalia, C. (1997). The effects of varying conceptualizations of job performance on adverse
impact, minority hiring, and predicted performance. Journal of Applied Psychology, 82, 656–664.
Hayden v. Nassau, 180 F.3d 42 (CA2 1999).
Hazen v. Biggins, 507 U.S. 604 (1993).
Hedberg v. Indiana Bell Tel. Co., 47 F.3d 928 (CA7 1995).
Isabel v. City of Memphis, 404 F.3d 404 (CA6 2005).
Jackson v. Fedex, U.S. App. LEXIS 4802 (CA6 2008).
Johnson v. City of Memphis, WL 3827481 (W.D. Tenn 2006).
Landy, F. J. (Ed.). (2005). Employment discrimination litigation: Behavioral, quantitative, and legal perspec-
tives. San Francisco, CA: Jossey Bass.
Lanning v. SEPTA, No. 97-1161 (ED PA 1998).
Lanning v. SEPTA, 181 F.3d 478 (CA3 1999).
Lawson v. CSX, 245 F.3d 916 (CA7 2001).
Leftwich v. Harris-Stowe State College, 702 F.2d 686 (CA8 1983).
Lewis v. Chicago, 299 F.Supp 1357 (ND ILL 2005).
Lomack v. City of Newark, 463 F.3d 303 (CA3 2006).
Madel v. FCI Marketing, 116 F.3d 1247 (CA8 1997).
Meacham v. Knolls Atomic Power Laboratory, 461 F.3d 134 (CA2 2006).
Miller v. Illinois, 107 F.3d 483 (CA7 1996).
Moody v. Albemarle, 474 F.2d 134 (CA4 1973).
Moore v. Philadelphia, 461 F.3d 331 (CA3 2006).
Murphy v. United Parcel Service, 527 U.S. 516 (1999).
Murphy, K. R., & Shiarella, A. H. (1997). Implications of the multidimensional nature of job performance for
the validity of selection tests. Personnel Psychology, 50, 823–854.
Oubre v. Entergy Operations, 118 S.Ct. 838 (1998).
Outtz, J. L. (2007). Less adverse alternatives: Making progress and avoiding red herrings. The Industrial-
Organizational Psychologist, 15(2), 23–27.
Outtz, J. L. (Ed.). (2009). Adverse impact: Implications for organizational staffing and high stakes selection.
New York, NY: Psychology Press.
Parker v. University of Pennsylvania, 2004 U.S. Dist. Lexis 17423 (2004).
Petit v. City of Chicago, 352 F.3d 1111 (CA7 2003).
Police officer v. Columbus, Ohio, 916 F.2d 1092 (CA6 1990).
Regents of University of California v. Bakke, 438 U.S. 265 (1978).
Reynolds, D. H. (2004). EEOC and OFCCP guidance on defining a job applicant in the Internet age: SIOP’s
response. The Industrial-Organizational Psychologist, 42(2), 127–138.
Ricci v. Destefano, Civil No. 3:04cv1109 (JBA) (2006).
Robinson v. Shell Oil, 519 U.S. 337 (1997).
Rudin v. Lincoln Land Community College, 420 F.3d 712 (CA7 2005).

Sackett, P. R., & Ellingson, J. E. (1997). The effects of forming multi-predictor composites on group differences
and adverse impact. Personnel Psychology, 50, 707–722.
Schmitt, N., Rogers, W., Chan, D., Sheppard, L., & Jennings, D. (1997). Adverse impact and predictive efficiency
of various predictor combinations. Journal of Applied Psychology, 82, 719–730.
Shaw v. AutoZone, Inc., 180 F.3d 806 (CA7 1999).
Smith v. City of Jackson, 544 U.S. 228 (2005).
Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel
selection procedures. Bowling Green, OH: Author.
Southeastern Community College v. Davis, 442 U.S. 397 (1979).
Starceski v. Westinghouse, 54 F.3d 1089 (CA3 1995).
Stone v. City of Mount Vernon, 118 F.3d 92 (CA2 1997).
Stutts v. Freeman, 694 F.2d 666 (CA11 1983).
Sutton v. United Air Lines, 527 U.S. 471 (1999).
Talbert v. City of Richmond, 648 F.2d 925 (CA4 1981).
Taylor v. Principal Financial Group, 93 F.3d 155 (CA5 1997).
Thomas v. Washington County, 915 F.2d 922 (CA4 1990).
Treadwell v. Alexander, 707 F.2d 473 (CA11 1983).
US v. Georgia Power, 474 F.2d 906 (CA5 1973).
Watson v. Fort Worth Bank & Trust, 487 U.S. 977 (1988).
Wards Cove Packing Company v. Atonio, 490 U.S. 642 (1989).
Whiteback v. Vital Signs, 116 F.3d 588 (CA DC 1997).
Wingate, P. H., Thornton, G. C., III, McIntyre, K. S., & Frame, J. H. (2003). Organizational downsizing and
age discrimination litigation: The influence of personnel practices and statistical evidence on litigation
outcomes. Law and Human Behavior, 27, 87–108.
Zaccaginini v. Levy, 338 F.3d 672 (CA7 2003).
Zink, D. L., & Gutman, A. (2005). Statistical trends in private sector employment discrimination suits. In
F. J. Landy (Ed.), Employment discrimination litigation: Behavioral, quantitative, and legal perspectives
(pp. 101–131). San Francisco, CA: Jossey Bass.
30 Perspectives From Twenty-Two Countries on the Legal Environment for Selection

Paul R. Sackett, Winny Shen, Brett Myors, Filip Lievens,
Eveline Schollaert, Greet Van Hoye, Steven F. Cronshaw,
Betty Onyura, Antonio Mladinic, Viviana Rodríguez,
Dirk D. Steiner, Florence Rolland, Heinz Schuler, Andreas
Frintrup, Ioannis Nikolaou, Maria Tomprou, S. Subramony,
Shabu B. Raj, Shay Tzafrir, Peter Bamberger, Marilena
Bertolino, Marco Mariani, Franco Fraccaroli, Tomoki
Sekiguchi, Hyuckseung Yang, Neil R. Anderson, Arne Evers,
Oleksandr Chernyshenko, Paul Englert, Hennie J. Kriek,
Tina Joubert, Jesús F. Salgado, Cornelius J. König, Larissa
A. Thommen, Aichia Chuang, Handan Kepir Sinangil,
Mahmut Bayazit, Mark Cook, and Herman Aguinis1

In the United States, the legal context plays a major role in how psychologists approach selection
system development. Psychologists know well the set of protected groups, the approaches to making
an a priori case of discrimination (e.g., differential treatment vs. adverse impact), the key court cases
influencing selection, and the prohibitions against preferential treatment (e.g., the 1991 ban on score
adjustment or within-group norming). Selection texts (e.g., Guion, 1998) and human resource man-
agement texts (e.g., Cascio & Aguinis, 2008) give prominent treatment to the legal context. In recent
years, there has been a growing internationalization of industrial-organizational (I-O) psychology
such that psychologists from all over the world work with clients in other countries and contribute
to our journals and to our conferences. Test publishers and consulting firms establish offices all
over the world. As this internationalization continues to increase, it becomes increasingly useful to
take a broader look at the legal environment for selection, examining similarities and differences
in various countries. For example, consider a U.S. firm with operations in several other countries.
Although U.S. fair employment law applies only to those overseas employees who are U.S. citizens,
the employment by U.S. firms of host country nationals or third country nationals is subject to the
legal environment of the host country.

1 All authors contributed equally to this chapter. Paul R. Sackett and Winny Shen integrated the text materials provided
by each author. Portions of this chapter were drawn from an article by the same set of authors: Myors, B., Lievens, F.,
Schollaert, E., Van Hoye, G., Cronshaw, S. F., Mladinic, A., et al. (2008). International perspectives on the legal environ-
ment for selection. Industrial and Organizational Psychology: Perspectives on Science and Practice, 1, 200–256. Used
by permission of the Society for Industrial and Organizational Psychology and Wiley Blackwell.

DATA COLLECTION METHODOLOGY


To compare and contrast the legal environment for selection in various countries, the senior author
prepared a set of questions about the legal environment for selection, prepared model answers
describing the legal environment in the United States, and contacted psychologists in various coun-
tries, asking them to prepare a document responding to each question and describing the legal
environment in their country. They were also invited to suggest additional project participants in
other countries. Some invitees declined; some initially agreed, but subsequently did not participate.
The goal was to obtain a range of perspectives by sampling about 20 countries; thus, this is by no
means a complete catalog of the legal environment around the world. Researchers and practitioners
who are experts on the topic of selection participated from the following 22 countries: Australia,
Belgium, Canada, Chile, France, Germany, Greece, India, Israel, Italy, Japan, Kenya, Korea, The
Netherlands, New Zealand, South Africa, Spain, Switzerland, Taiwan, Turkey, the United Kingdom,
and the United States. As the list indicates, the countries covered do broadly sample the world.
Because of space constraints, the write-up for each country was subsequently summarized and
organized by issue (e.g., what groups are protected; is preferential treatment of minority group
members permitted) rather than by country to create this chapter. For more context on the legal,
social, and political environment of the countries surveyed, see Myors et al. (2008). Contributing
authors from each of the 22 countries responded to several questions, nine of which are addressed
in turn in this chapter.

Question 1: Are there racial/ethnic/religious subgroups such that some are viewed as “advan-
taged” and others as “disadvantaged”?
Table 30.1 identifies the major groups viewed as “disadvantaged” in each country (note that gen-
der is treated separately in the next section, and specific legal protections for disadvantaged groups
are treated under Question 4). As Table 30.1 indicates, the disadvantaged groups differ on several
dimensions. First, the basis for disadvantaged status varies: (a) native/aboriginal people in a setting
where colonizers became the majority group (e.g., Native Americans in the United States, Maori in
New Zealand, First Nations Peoples in Canada), (b) recent immigrants (e.g., many European coun-
tries), (c) racial groups either native to or with long histories in the country (e.g., African Americans
in the United States; Blacks, colored individuals, and Indians in South Africa), (d) religious groups
(e.g., India), and (e) language groups (e.g., Francophones in Canada, Rhaeto-Romanic speakers in
Switzerland). Second, the size of the minority population varies, from a very small percentage of
the population in some countries to the South African extreme of a previously disadvantaged Black
majority. These findings illustrate that there is considerable variability from country to country in
what constitutes a disadvantaged group.

Question 2: What is the general picture regarding women in the workplace (e.g., historical
trends regarding employment for women; current data on percentage of women in the work-
force; and current status regarding occupational segregation, such as gender representation
in various job classes and at various organizational levels)?
Among the countries surveyed, women make up a substantial portion of the workforce. In gen-
eral, women make up from over one quarter to slightly less than one half of the working population
(see Table 30.2). Women’s participation in the workforce has increased across nearly all of the
countries surveyed; the exception is Turkey, which reports a slight decline in recent years (from
34% in the early 1990s to 25.4% in 2004; State Institute of Statistics, 2006). There is substantial
variability among countries in terms of the percentage of women who

TABLE 30.1
Disadvantaged Groups Within Each Country
Country Group Percentage of Population
Australia Indigenous Australians 2.5
Belgium Non-Western immigrants
Moroccan 0.8
Turkish 0.4
Canada Immigrants 18.4
Visible minorities 13.4
First Nations peoples 2.1
Francophones 15.7
Chile Recent immigrants
Argentinean
Peruvian 1.2
Bolivian
Ecuadorian
France Immigrant groups 7.4
European 3.33
North African 2.22
Other African 0.67
Asian 0.96
Germany Migrant workers/immigrants
Turkish 3.7
Southern European countries
Reimmigrants (Volga-Germans) 2.8
Greece Immigrants 7.0
Albanian
Bulgarian
Georgian
Romanians
India Within Hindu Castesa
Scheduled castes 15.06
Scheduled tribes 7.51
Other backward classes 43.70
Muslims 13.0
Israel Palestinian Arabs 22.0
Druze 2.0
Sephardic Jews 31.0
Iraq
Iran
Morocco
Ethiopia
Italy Albanian 1.0
Rumanian 0.9
Moroccan 0.9
Ukrainian 0.4
Chinese
Japan North and South Korean 0.5
Chinese 0.4
Brazilians 0.2
Philippines 0.1

Kenya Foreigners 1.5
Asians
Europeans
Muslims 7.0
Less populous Kenyan tribes 51.5
(Swahili, Kalenjin, Kamba,
Kisii, Ameru, Embu, Maasai, Somali,
Turkana, Taita, and Samburu)
Korea Foreigners 0.8
The Netherlands Non-Western immigrants 10.5
Turkish 2.2
Moroccan 2.0
Surinamese 2.0
Antillean/Aruban 0.8
New Zealand Pacific peoples 6.4
Maori 13.5
South Africa Black (disadvantaged majority)
African 79.5
Colored (mixed race) 8.9
Indian 2.5
Spain Immigrant groups 9.25
Moroccan 1.16
Ecuadorian 1.01
Rumanian 0.89
Colombian 0.59
Argentinean 0.43
Bolivian 0.31
Chinese 0.22
Peruvian 0.21
Switzerland Immigrant groups 21.9
Ex-Yugoslavia 4.7
Italians 4.1
Portuguese 2.5
Germans 2.4
Rhaeto-Romanic-speaking 0.5
Swiss
Taiwan Taiwanese aborigines 2.0
Turkey Religious minorities
Alevi 20.0
Christian and Jewish 0.3
Kurdish 11.0
Arabic 1.5
Other 1.8
Armenian
Greek
Jewish
United Kingdom Indian 1.78
Pakistani 1.26
Black Caribbean 0.95
Black African 0.82
Bangladeshi 0.48
Chinese 0.41
Other 2.1
United States Black/African American 12.3
Hispanic/Hispanic American 12.5
Native American and Alaskan Native 0.9

a The Hindu caste system differentiates between “forward” (advantaged) and “backward” (disadvantaged) groups. A national
“schedule” or classification of castes differentiates between scheduled castes (previously “untouchable” castes), scheduled
tribal groups, and other backward castes.

participate in the workforce, ranging from approximately one quarter of women in Turkey (State
Institute of Statistics, 2006) to between 60 and 70% in France (Attal-Toubert & Lavergne, 2006),
Kenya (primarily due to the high involvement of women in small-scale farming and pastoralism),
New Zealand, and the United Kingdom. These differences are undoubtedly at least partially due
to the multitude of differences among countries including those in history, culture and values, eco-
nomic conditions, and political conditions. It is interesting to note that in no instance is the female
participation rate higher than the male participation rate; this may partially reflect the traditional
division of labor between men and women, such that women are more likely to have household and
childcare duties.
Although women are less likely to participate in the workforce than their male counterparts, it
appears that there tend to be no or small differences in the unemployment rate for men and women
(usually within 1 or 2 percentage points). In fact, in recent years in Taiwan, the unemployment
rate for women has been lower than that for men. Exceptions to this trend include Greece (where
the unemployment rate for women is often two to three times that for men), Kenya, and Switzerland,
where women are still substantially more likely to be unemployed than male workers. However, it must
be noted that even small changes in the unemployment rate may have strong repercussions for the
economic, political, and social situation of a country.
Among all nations surveyed, there is still gender disparity in pay, and this disparity continues to
be substantial in magnitude. Among all countries where gender disparity information was available,
women earned between 11 and 34% less than men. However, this figure may be lower or higher
among countries where we currently do not have the information available. Although it is unclear as
to whether these estimates take into account factors such as differences in occupations, differences
in full- versus part-time work, differences in educational attainment, etc., other research has shown
that even taking into account some of these extraneous factors, women still earn less than their male
counterparts (although the magnitude does decrease slightly). The U.S. General Accounting Office
(2003) reported that women still only earn 80% of what men earn (compared to 75% when not tak-
ing into account differences) in 2000 after taking into account occupation, industry, race, marital
status, and job tenure. Currently, the most positive outlook for women’s earning are in Belgium,
France, Israel, New Zealand, Switzerland, and the United Kingdom, where women earn 80 cents or
more for every dollar earned by men (Equal Opportunities Commission, 2004).
There continues to be occupational segregation to some extent in all 22 countries. Across the
board, women are still more likely to be found in clerical or secretarial, retail or sales, healthcare

TABLE 30.2
Women’s Status in the Workplace Within Each Country
Country   Percentage of Workforce Population   Percentage of Men Participation in Workforce   Percentage of Women Participation in Workforce   Male Unemployment Rate   Female Unemployment Rate   Wage Differential (a)
Australia 72.0 57.0 66.0
Belgium 73.6 58.3 7.6 9.6 85.0
Canada 64.0
Chile 38.2
France 46.4 74.5 63.8 81.0
Germany 47.0
Greece 64.1–65.0 38.9–42.7 5.1–8.2 13.0–18.6
India 30.0
Israel 81.6
Italy 69.7 45.3
Japan 30.0b 73.2–79.4 48.0–50.0 67.1
Kenya 29.0d 74.7 72.6 25.0c 38.0 c
Korea 41.7 74.4 50.0 3.8 3.1 66.2
The Netherlands 70.0–77.0 54.0 4.5 6.8
New Zealand 61.2 81.0–87.0
Spain 42.2 3.5 4.8
South Africa 45.7
Switzerland 72.8 56.9 79.0–89.0
Taiwan 67.6 48.1 4.3 3.9 76.9
Turkey 36.3 72.3 25.4
United Kingdom 78.0 69.0 83.0
United States 46.4 74.0 59.0 4.7 4.5 77.0

The authors representing the various countries have undertaken to report the most recent data available from their country; there may be slight discrepancies between the years reported for
each country.
a Percent of women’s salary compared to men’s salary (men’s salary = 100%).

b Percent of full-time workforce.

c In urban areas.

d Within the modern wage sector.



(e.g., nursing, childcare services), education (e.g., elementary school teachers), public services, or
small-scale agricultural farming occupations (e.g., Kenya and Turkey) than their male counterparts.
Furthermore, the occupations that women are most heavily concentrated in tend to be in the lower
income segment. Women remain underrepresented in business and management positions, particu-
larly higher levels of management. In most countries, women continue to lag behind in representa-
tion for technical and scientific positions, professional jobs, higher-level governmental positions
(e.g., judges, cabinet members, etc.), and most higher-level jobs across sectors.
Authors for several countries note that women are more likely to join the workforce as part-time
workers (e.g., Belgium, France, Germany, Japan, Switzerland, and the United Kingdom) to bet-
ter balance work and family demands or leave the workforce because of childcare demands (e.g.,
Japan, Korea, and the United States). The latter trend is particularly pronounced in Japan, where
the participation ratio by age groups shows an M-shaped curve, because labor force participation
rate declines in women’s early 30s because of childcare responsibilities. During the period of 1970–
2004, the valley section of this M-curve has shifted northeastward due in part to the trend of late
marriage and late childbirth. In addition, both peaks of this M-curve have become higher, indicating
that women’s workforce participation has substantially increased in their 20s and late-30s or older
(Japan Institute of Labor Policy and Training, 2007). However, some countries also indicated that
the wage gap between men and women may be even more pronounced among part-time workers.
For example, in the United Kingdom, women are paid 17% less than men in full-time work and 38%
less in part-time work (Equal Opportunities Commission, 2004).

Question 3: Is there research documenting mean differences between groups identified above
on individual difference measures relevant to job performance?
Mean differences on ability and personality measures are commonly examined in the United
States, with enough data for large-scale meta-analytic summaries. Mean differences on tests of
developed abilities of roughly 1.00 standard deviation (SD) between Whites and African Americans
and roughly 0.67 SD between Whites and Hispanics have been consistently reported. The largest-
scale summary of this literature is a meta-analysis by Roth, Bevier, Bobko, Switzer, and Tyler
(2001). This abundance of data proves to be in marked contrast to the pattern of findings in the
countries examined here. In fact, for most countries, the authors reported finding either no research
or research with samples so small that they refrained from drawing conclusions (i.e., Chile, France,
Greece, Italy, Japan, Korea, Spain, Switzerland, Turkey, and the United Kingdom). Although lim-
ited, there are some data on group differences in some countries.
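For clarity (this definition is standard and is not taken from the chapter itself), the d values reported in this section are standardized mean differences, computed as

d = \frac{\bar{X}_{\text{majority}} - \bar{X}_{\text{minority}}}{SD_{\text{pooled}}},
\qquad
SD_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,SD_1^2 + (n_2 - 1)\,SD_2^2}{n_1 + n_2 - 2}},

so that a positive d indicates a higher mean score for the advantaged or majority group.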
Two countries (Australia and Taiwan) report research on cognitive ability differences between
aborigines and the advantaged group. The lower cognitive ability scores for Australian aborigi-
nes may reflect differences in language and culture. Aborigines in Taiwan, who typically have
lower educational attainment (Council of Indigenous Peoples, 2002), also score lower than non-
aborigines on several cognitive ability tests. Data from the United Arrangement Commission for
college entrance examinations in Taiwan in 2006 showed d values between 0.44 and 0.68 in favor
of nonaborigines, depending on the particular test subject (A. Chuang, personal communication,
May 1, 2007).
Cognitive ability mean score differences of d = 1.39 between Turkish/Moroccan immigrants and
Dutch test-takers and d = 1.08 between Surinamese/Antillean and Dutch test-takers have been
reported, in both cases favoring the majority group (te Nijenhuis, de Jong, Evers, & van der Flier,
2004). Language differences appear to contribute to these findings because higher scores are found
for second-generation than first-generation immigrants. Studies in Belgium also report mean differ-
ences of about 1.00 SD on cognitive tests between Belgians and Turkish and Moroccan immigrants
in samples of children (Fontaine, Schittekatte, Groenvynck, & De Clercq, 2006).
In South Africa, mean score differences on cognitive tests between Black and White groups are
normally larger than those found in U.S. studies and have d values of approximately 1.00–1.50, with
Whites obtaining the higher mean scores. In a study performed in a South African financial services
organization, d values of 0.99 for a verbal ability test, 1.03 for a numerical ability test, and 1.14 for
a diagrammatic ability
test were found (SHL, 2006). In South Africa, these differences are largely ascribed to the differ-
ences in the educational level of the racial groups. In the 2001 census, it was determined that 22.3%
of Africans, 8.3% of Colored (mixed race), 5.3% of Indians, and 1.4% of Whites had no schooling
(Statistics South Africa, 2001).
Limited data report lower scores for Arabs than Jews in Israel (Zeidner, 1986), for Canadian
First Nations people than for Whites, for New Zealand Maori than for Whites (Chernyshenko,
2005; Guenole, Englert, & Taylor, 2003), and differences between individuals in various provinces
in Kenya (Kinyungu, 2006). Data on personality measures are even more limited than for cogni-
tive ability, with authors reporting personality data from only two countries: a large-scale study of
Black-White differences in South Africa (Kriek, 2006) showing small differences and several stud-
ies of Dutch-immigrant differences in the Netherlands showing much larger differences (van Leest,
1997; te Nijenhuis, van der Flier, & van Leeuwen, 1997, 2003).
Overall, several findings of interest emerge. First, it is clear that gathering data and reporting
mean differences by group is generally far more common in the United States than in virtually all
of the countries contributing to this report. This is likely the result of the legal scrutiny to which
tests are held in the United States. The Uniform Guidelines on Employee Selection Procedures
(U.S. Equal Employment Opportunity Commission, 1978) use adverse impact computations as the
basis for a prima facie case of discrimination, and thus, adverse impact resulting from test use is
routinely examined, with mean differences between groups and the method of test use (e.g., a high
or a low cutoff) functioning as key determinants of adverse impact. Second, although data tend to
be more sparse than in the United States, group differences are studied and observed in various
settings involving different types of disadvantaged groups (e.g., immigrant groups in Belgium and
The Netherlands; native peoples in Australia, New Zealand, and Canada; tribal and provincial dif-
ferences in Kenya; the native Black population in South Africa; and Arab groups in Israel). Third,
as in the United States, there is interest not only in whether there are group differences, but also
in understanding the basis for these differences. Language, culture, and differences in educational
access and attainment are seen as key concerns in understanding differences in test scores across
groups.
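To illustrate how a mean difference and the method of test use jointly determine adverse impact, the following Python sketch (hypothetical values; it simply assumes normally distributed scores within each group, which is an assumption rather than a claim from the chapter) compares passing rates and the resulting impact ratio at a low and a high cutoff.

# Illustrative sketch: passing rates for two groups whose score distributions differ by d,
# evaluated at a low and a high cutoff. All numbers are hypothetical.
from scipy.stats import norm

d = 1.0  # hypothetical standardized mean difference favoring the majority group

def impact_ratio(cutoff_z, d):
    """Passing rates when the majority mean is 0 and the minority mean is -d (SD = 1 for both)."""
    pass_majority = norm.sf(cutoff_z, loc=0.0, scale=1.0)
    pass_minority = norm.sf(cutoff_z, loc=-d, scale=1.0)
    return pass_majority, pass_minority, pass_minority / pass_majority

for cutoff in (-1.0, 1.0):  # a low and a high cutoff, in majority-group SD units
    p_maj, p_min, ratio = impact_ratio(cutoff, d)
    print(f"cutoff z = {cutoff:+.1f}: majority pass = {p_maj:.2f}, "
          f"minority pass = {p_min:.2f}, impact ratio = {ratio:.2f}")
# The same d yields a much lower impact ratio at the high cutoff than at the low one.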
In the United States, disparate impact is the basis for a prima facie case of discrimination. The
implicit assumption is that various groups are expected to obtain similar mean scores absent bias in
the measure. Our data suggest that many European countries target certain groups as immigrants
to meet specific labor shortages. Thus, immigrants might have higher or lower abilities, depending on
whether a country tried to attract highly skilled people (e.g., recent immigrants into Switzerland
from northern and western Europe) or tried to attract people with low skills (e.g., Turkish immi-
grants to Germany). In other words, even if one has a general expectation of no group differences
at the population level, a finding of differences between locals and immigrants would be expected
given this targeted immigration.

Question 4: Are there laws prohibiting discrimination against specific groups and/or man-
dating fair treatment of such groups? Which groups are protected? Which employers are
covered? Which employment practices are covered (e.g., selection, promotion, dismissal)?
Table 30.3 presents summary information addressing the above questions for each country.
Several findings emerge. First, there is some basis for legal protections for members of specified
groups in all countries. The bases for these protections vary widely. In many cases the national
constitution provides general, or at times specific, protections. This may be seen as analogous to the
5th and 14th Amendments to the U.S. Constitution, which respectively state that “no person shall …
be deprived of life, liberty, or property without due process of law,” and that “no state shall …
deny to any person within its jurisdiction the equal protection of the laws.” However, in virtually all
cases there are also specific laws defining specified protected classes, specifying which employment
practices are covered and which employers are required to comply. The intent here is to identify the
TABLE 30.3
International Laws and Practices
Country Law Employers Covered Employment Practices Covered
Australia The Crimes Act 1914 All employers; EOWW of 1999 refers to All stages of the employment relationship including but not
Racial Discrimination Act 1975 organizations of 100+ limited to recruitment, selection, termination, training, and
Sex Discrimination Act 1984
Human Rights and Equal Opportunity Commission
Act 1986
Disability Discrimination Act 1992
Workplace Relations Act 1996
Equal Opportunity for Women in the Workplace Act
1999
Age Discrimination Act 2004
Belgium Belgian Constitution of 1994 Article 10, 11, 191 All employers Most employment practices including selection and
Law Equality of Men-Women of 1978 appointment, promotions, employment opportunities,
Antidiscrimination law of 2003 labor conditions, dismissal, and wages.
Canada Canadian Human Rights Code of 1985 Federal government departments, crown Most employment practices including selection,
Section 15 of the Charter of Rights and Freedoms corporations, and other federally regulated performance appraisal, termination, and compensation.
(1982) agencies and organizations
Federal Employment Equity Act (2004)
Federal Contractors Program
Pay equity legislation (federal and some provinces)
Chile Constitution, Chapter 3 (Rights and Duties), article 19 All employers The Constitution establishes the general nondiscrimination
Nº 16 (Freedom of Work and its protection) and principle on the basis of race, color, sex, age, marital
Work Code, Article 2º status, union membership status, religion, political
(2002) opinions, nationality, and national or social origin. In
March 2008, a new law took effect (law #
20,087). This new law defines discrimination as any action
that is against the equal opportunity for all workers. A
new regulation will specify the practices that are covered
by the law.

France French Constitution of 1958 All employers Many employment practices including selection, access to
International convention of the United training, pay, layoffs, transfers, and job classification.
Nations (1965) ratified in 1971
International convention of the International Labor
Organization (1958) ratified in 1981
“The law concerning the fight against
racism” of 1972
“The law concerning worker’s liberties in
organizations” of 1982
Treaty of Amsterdam of 1997
L. 122-45 from Labor Law
225-1 and 225-2 from the Penal Code
Germany Allgemeines Gleichbehandlungsgesetz: General Equal All employers, except tendency organizations All stages of the employment relationship including placing
Opportunity Law (e.g. religious organizations) a job ad, hiring and selection, definition of payment,
performance appraisal and promotion, job-related training
and job counseling, corporate health services, design of
working conditions, social services, and dismissal.
Greece Greek Law 3304 of 2005, equal treatment All employers Conditions for access to employment, to self-employment, or to
Greek Law 3488 of 2006, on equal treatment between occupation, including selection criteria and recruitment
people in the labor market conditions; promotion; access to all types and to all levels of
vocational guidance, vocational training, advanced vocational
training and retraining, including practical work experience,
employment and working conditions; dismissals, pay,
membership, and involvement in an organization of workers
or employers, or any organization whose members carry on a
particular profession, including the benefits provided for by
such organizations; social protection, including social
insurance and sanitary relief; social provisions; education; and
access to disposal and to provision of benefits, which are
provided to the public, including housing.
India Indian Constitution Government entities, public sector organizations, Selection; previously promotion.
Article 15. Prohibition of discrimination on grounds and organizations receiving government
of religion, race, caste, sex, or place of birth funding
Article 16. Equality of opportunity in matters of
public employment
Article 39
Article 46
Article 335
Israel Basic Law on Human Dignity and Liberty All employers Compensation, staffing, conditions of employment,
Basic Law on the Freedom of Occupation All employers 6+ promotion, training and development, dismissal, severance
Women’s Equal Rights Law of 1951 pay, retirement benefits.
Equal Pay Law of 1996
Equal Employment Opportunity of 1988
Italy Italian Constitution of 1948 All employers Recruitment, selection, promotion, employment agencies,
Article 3 outplacement procedures, training, working conditions.
Legislative decree 216 of 2003
Japan Labour Standards Law of 1947 All employers Wages, working hours, other working conditions.
Law on Securing Equal Opportunity and Treatment All employers Recruitment and hiring, assignment, promotion, demotion,
between Men and Women in Employment of 1972 training, fringe benefits, change in job type and
employment status, encouragement of retirement,
mandatory retirement age, dismissal and renewal of
employment contract.
Law for Employment Promotion, etc. of the Disabled All employers Recruitment and hiring.
of 1960
Law Concerning Stabilization of Employment of All employers Mandatory retirement.
Older Persons of 1971
Kenya Kenyan Constitution Chapter 5, Section 82 All employment practices.
HIV and AIDS Prevention and Control Act 14
The Persons with Disabilities Act 14 of 2003
Korea. National Human Rights Commission Act of 2001 (employers covered not specified): recruitment, hiring, training, placement, promotion, compensation, loans, mandatory retirement age, retirement, and dismissal. Equal Employment Act of 1987 (all employers; employers of 500+ workers for the affirmative action clause): recruitment, selection, compensation, education, training, job placement, promotions, setting a mandatory retirement age, retirement, and dismissal. The Act of Employment Promotion and Vocational Rehabilitation for the Disabled of 1990 (employers with 50+ workers and government employees): hiring, promotion, transfer, education, and training. The Aged Employment Promotion Act of 1991 (employers with 300+ employees): recruitment, hiring, and dismissal. The Basic Employment Policy Act of 1993 (employers covered not specified): recruitment and hiring.

The Netherlands. Laws: Constitution, Article 1 of 2003; General Law Equal Treatment of 1994. Employers covered: all employers (except religious, philosophical, or political organizations). Employment practices covered: recruitment, selection, employment agencies, dismissal, labor agreements, education before and during employment, promotion, and working conditions.

New Zealand. Laws: Human Rights Act of 1993. Employers covered: all employers. Employment practices covered: refusal of employment, less favorable employment, conditions of work, superannuation, fringe benefits, training, promotion, transfer, termination, retirement, and resignation.

South Africa. Laws: Constitution of the Republic of South Africa of 1996; Labour Relations Act, Act 66, of 1995; Employment Equity Act, No. 55, of 1998. Employers covered: all employers except the National Defense Force, National Intelligence Agency, and South African Secret Service. Employment practices covered: includes, but is not limited to, recruitment procedures, advertising, selection criteria, appointment and the appointment process, job classification and grading, remuneration, employment benefits, terms and conditions of employment, job assignments, working environment and facilities, training and development, performance evaluation systems, promotion, transfer, demotion, disciplinary measures other than dismissal, and dismissal.
Spain. Laws: Spanish Constitution, Article 14 of 1978; Law of Worker's Statute of 1980, 2005, Articles 4.2 and 17; Organic Law for Effective Equality between Women and Men of 2007, Articles 1, 3, 4, 5, and 6; Law of Basic Statute of Public Employee of 2005, Article 14.i. Employers covered: all employers. Employment practices covered: recruitment, selection, promotion, compensation, training, temporary employment companies, employment agencies, dismissal, labor agreements, collective bargaining, education before and during employment, health programs, and working conditions.

Switzerland. Bundesverfassung of 1999 (Swiss Federal Constitution). Bundesgesetz über die Beseitigung von Benachteiligungen von Menschen mit Behinderungen of 2002 (Federal Law for the Equal Treatment of People with Disabilities; public employers): includes pre- (particularly), during-, and postemployment practices. Bundesgesetz über die Gleichstellung von Mann und Frau of 1995 (Federal Law for the Equal Treatment of Men and Women; all employers): includes pre-, during-, and postemployment practices (i.e., recruitment, sexual harassment, earnings, promotions, etc.). Schweizerisches Zivilgesetzbuch of 1907 (Swiss Civil Code). Bundesgesetz betreffend die Ergänzung des Schweizerischen Zivilgesetzbuches (Obligationenrecht) of 1912 (Swiss Code of Obligations; all employers): protection of employee personality and personal data throughout all stages of the employment process.

Taiwan. Article 5 of the Employment Services Act of 1992 (all employers): staffing. Gender Equality in Employment Law of 2002 (all employers): recruitment, selection, promotion, job allocation, performance evaluation, training, compensation, benefits, retirement, dismissal, and quit. Equal Employment Opportunity for Aborigines Act of 2001 (public employers and private employers that are government contractors with 100+ domestic employees): staffing.

Turkey. Republic of Turkey Constitution of 1982, Articles 10, 49, 50, and 70 (all employers): Article 70 specifically covers selection for public institutions; other practices, including pay, promotion, and dismissal, are implicitly covered in other articles. Labor Law, Article 5 of 2003 (all employers except sea transportation, air transport, agricultural and forestry operations with fewer than 50 employees, home services, internships, professional athletes, rehabilitation workers, businesses with 3 workers, handmade art jobs done at home, and journalists): performance appraisal, pay, promotion, and termination practices are implicitly covered; selection is not covered because the law only covers private-sector employees who are already employed. UN Convention on the Elimination of All Forms of Discrimination Against Women, Article 11 (all employers): generally all employment practices, including selection, promotion, termination, pay, performance appraisal, access to training, and treatment. Prime Minister's office circular of 2004 (public employers): selection.

United Kingdom. Laws: Race Relations Act of 1976; Sex Discrimination Act of 1975; Employment Equality (Age) Regulations 2006 (covering workers of all ages, young and old); Equal Pay Act of 1970; Disability Discrimination Act 1995; European Community Directives. Employers covered: all employers, trade unions, professional bodies, and employment agencies. Employment practices covered: generally all employment practices: selection, promotion, termination, pay, performance appraisal, access to training, and treatment.

United States. Civil Rights Act of 1964, Title VII (amended 1972, 1991) (all public employers and private employers with 15+ employees): range of employment decisions including hiring, compensation, terms, conditions, and privileges of employment. Age Discrimination in Employment Act of 1967 (private employers with 20+ employees, state and local governments): prohibits discrimination against individuals age 40 or older. Americans with Disabilities Act of 1990 and Rehabilitation Act of 1973 (the ADA covers private employers and state and local governments; the Rehabilitation Act covers the federal government): prohibit discrimination against individuals with disabilities in the full range of employment decisions. Equal Pay Act of 1963 (virtually all employers): prohibits discrimination against women in pay decisions.

major contemporary federal laws and government decrees, and as such it is not a complete record
of all historical employment regulations. For example, in the United States a specialist can rightly
note that the Civil Rights Acts of 1866 and 1871 are still relied upon on occasion, although these are
not listed in the table. Also, several states and cities have additional statutes, offering protection to
groups beyond those covered by federal law.
Second, the protections offered are generally quite sweeping in terms of the types of employers
covered. In most cases all employers are covered. Some laws are restricted to government employ-
ees, and in some cases, coverage is restricted to larger employers, with the coverage threshold
varying quite widely for some statutes (e.g., more than 6 employees in Israel, 15 in the U.S., 100 in
Taiwan, and 300 in Korea).
Third, it is typical for a broad range of employment practices to be included. Employee selection is specifically included in all countries except Chile, which has the least developed set of employment rights regulations of the countries examined here. Chile's Constitution does prohibit discrimination based on race, color, sex, age, marital status, union membership, status, religion, political opinions, nationality, and national or social origin, but it has yet to specify which employment practices are covered.
Fourth, there is considerable commonality and variation in the classes that receive protection in
each country. Table 30.4 identifies the most common protected classes and indicates whether those
classes are covered in each of the contributing countries. The classes covered in U.S. Civil Rights
law emerge as commonly covered across countries: race, color, religion, gender, national ori-
gin, age, and disability status. Three categories not protected by federal statute in the United States
are protected in most countries: political opinion, sexual orientation, and marital/family status.
Several protected classes are covered in only a few countries or are unique to particular countries; Table
30.5 identifies these less common protected classes. Examples include language, physical appear-
ance, union membership, socioeconomic status, and HIV status.

Question 5: What is required as prima facie evidence of discrimination? What is required to refute a claim of discrimination?
In most countries, direct (e.g., differential treatment) and indirect (e.g., disparate impact) prima
facie evidence of discrimination are acknowledged. In India, disparate impact is necessary but
not sufficient to prove a case of discrimination; underrepresentation must be shown to be due to
historical social or religious discrimination toward a particular group. Only two countries require
evidence of the intent to discriminate, Taiwan and Turkey, thus ruling out a disparate impact theory
of discrimination.
However, although disparate impact evidence can be used in most countries, the highly specific evidentiary rules used in the United States (e.g., the four-fifths rule and tests of the statistical significance of the difference between passing rates for various groups) are generally not in use (Canada is an exception, because U.S. cases applying the four-fifths rule have been used to argue for a similar standard). Commentators note that in most countries there are few or no cases involving disparate treatment challenges to predictors commonly used by psychologists,
four-fifths rule in the United States derives from guidelines issued by enforcement agencies, and
the use of significance testing derives from case law; neither the concept of disparate impact nor
the mechanisms for identifying its presence are contained in statute. Absent a history of challenges
resulting in case law, it is not surprising to see the lack of specificity as to evidentiary standards.
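Because the four-fifths rule and significance tests of passing-rate differences come up repeatedly in this discussion, a brief numerical illustration may be helpful. The Python sketch below uses invented applicant counts to compute an adverse impact ratio and a simple two-proportion z statistic; it is offered only as an illustration of the arithmetic, not as a statement of any jurisdiction's evidentiary standard.

# Minimal sketch of the U.S. four-fifths (80%) rule using hypothetical applicant counts.
# The rule flags potential adverse impact when a group's selection rate is less than
# 80% of the rate for the most favorably treated group.
from math import sqrt

def selection_rate(hired, applied):
    """Proportion of applicants who were selected."""
    return hired / applied

def four_fifths_check(rate_focal, rate_reference):
    """Return the impact ratio and whether it falls below the 0.80 threshold."""
    impact_ratio = rate_focal / rate_reference
    return impact_ratio, impact_ratio < 0.80

def two_proportion_z(hired_a, n_a, hired_b, n_b):
    """Simple two-proportion z statistic for the difference in passing rates."""
    p_a, p_b = hired_a / n_a, hired_b / n_b
    p_pooled = (hired_a + hired_b) / (n_a + n_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

if __name__ == "__main__":
    # Hypothetical numbers: 48 of 120 focal-group applicants hired vs. 90 of 150 reference-group applicants.
    focal_rate = selection_rate(48, 120)        # 0.40
    reference_rate = selection_rate(90, 150)    # 0.60
    ratio, flagged = four_fifths_check(focal_rate, reference_rate)
    print(f"Impact ratio: {ratio:.2f}, flagged under four-fifths rule: {flagged}")
    print(f"Two-proportion z: {two_proportion_z(48, 120, 90, 150):.2f}")

With these invented counts, the impact ratio of 0.67 falls below the 0.80 threshold and the z statistic is roughly -3.3, so both indicators would flag the difference.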
A similar lack of specificity applies to the question of what is required to refute a claim of dis-
crimination. Table 30.6 summarizes information across countries. In general, there is some version
of the shifting burden of proof model in countries where disparate impact evidence is permissible.
After a prima facie showing, the burden to justify the use of the employment practice shifts to the
employer in all countries except Switzerland, where the burden of showing that the practice is not job-related is only partially reduced or remains with the plaintiff. There is a general notion that the employer should present evidence to support the job relatedness of the employment practice in question, but rarely is the required form of such evidence specified. The identification of validity evidence as a mechanism for establishing job relatedness is rare.

TABLE 30.4
Most Common Protected Classes
Common Protected Classes
Country, followed by an X for each covered class: Race, Sex, National/Ethnic Origin, Color, Age, Religion, Disability, Political Opinion, Sexual Orientation, Marital/Family Status
Australia X X X X X X X
Belgium X X X X X X X X X X
Canada X X X X X X X X X
Chile X X X X X X X X
France X X X X X X X X X
Germany X X X X X X
Greece X X X X X
India X X
Israel X X X X X X X X
Italy X X X X X X X
Japan X X X X X X
Kenya X X X X X X X
Korea X X X X X X X X X X
The Netherlands X X X X X X X X X
New Zealand X X X X X X X X X X
South Africa X X X X X X X X X X
Spain X X X X X X X X X
Switzerland X X X X X X X
Taiwan X X X X X X X
Turkey X X X X X
United Kingdom X X X X X X X
United States X X X X X X X

TABLE 30.5
Other Protected Classes by Country
Country Other Protected Classes
Australia Breastfeeding, family or career responsibilities, irrelevant criminal record, physical features,
potential pregnancy, trade union or employer association activity, sexual harassment, pregnancy and
transgender status
Belgium Union membership, membership of other organizations, health, and any other personal characteristic
Chile Union membership status
France Moral principles, genetic characteristics, union activities or activities in a “mutuelle,” physical
appearance, family name, and health
Germany Philosophy of life, sexual harassment
India Scheduled castes, scheduled tribes, and other backward classes
Israel Personal status and military service
Italy Personal and social conditions and language
Japan Social status
Kenya Tribe, local connection, and HIV/AIDS status
Korea Social status, region of birth, appearance, criminal record after punishment has been served,
academic background, medical history, pregnancy, and physical conditions (e.g. appearance, height,
weight)
The Netherlands Philosophy of life, chronic disease, full-time/part-time work, and type of contract
New Zealand Ethical belief, employment status, and sexual and racial harassment
South Africa HIV status, conscience, belief, culture, birth, pregnancy, and language
Spain Social condition and membership to a labor union
Switzerland Socioeconomic status, way of life, and language
Taiwan Thought, provincial origin, appearance, facial features, union membership, status, and language
Turkey Philosophical belief, sect, and language
United Kingdom Persons who have undergone gender reassignment or intend to
United States Pregnancy


Question 6: What are the consequences of violation of the laws?


Table 30.6 also summarizes possible consequences of violation in each participating country.
There is considerable variation in the array of possible remedies. As a point of reference, note that in
the United States the focus is on compensatory or “make-whole” remedies, with punitive damages
reserved for instances of intentional discrimination. Similarly, make-whole remedies are part of the
landscape in all countries for which information could be obtained. Several countries also provide
fines and punitive damages (e.g., Switzerland and Turkey), and several include imprisonment as a
possible consequence (e.g., Belgium, France, and Greece).

Question 7: Are particular selection methods limited or banned as a result of legislation or court rulings?
There are relatively few restrictions on specific selection methods. As a point of reference, U.S.
law regulates the use of the polygraph, prohibiting its use for most private employers; several other
countries restrict polygraph use as well (e.g., Germany, Israel, and Turkey). The only selection
method specifically mentioned in U.S. law is the reference in the Tower amendment to Title VII
of the Civil Rights Act of 1964 (U.S. Code, 1964) to the permissibility of professionally developed ability tests, provided that such tests are not designed, intended, or used to discriminate. Additional instances reported of restrictions on specific selection methods in participating countries include a prohibition against comprehensive personality assessment in Switzerland and a restriction on the use of certain Minnesota Multiphasic Personality Inventory (MMPI) and California Psychological Inventory (CPI) items in Spain.
The most strikingly different approach to regulating selection practices is found in South Africa. Rather than the common approach of a presumptive right of an employer to use a particular method absent a successful challenge by a plaintiff, South African law puts the burden immediately on the employer. According to the Employment Equity Act of 1998 (Government Gazette, 1999), psychological testing and other similar assessments are prohibited unless the test is proven to be scientifically valid and reliable, can be applied fairly to all employees, and is not biased against any employee or group. The Society for Industrial and Organisational Psychology in South Africa (SIOPSA) published "Guidelines for the Validation and Use of Assessment Procedures for the Workplace" in 2005 to help practitioners in the field of I-O psychology ensure that their assessment instruments and practices comply with scientific requirements and international best practices. These guidelines were largely based on the American SIOP guidelines.

TABLE 30.6
Evidence Needed to Refute a Discrimination Claim, Consequences of Violation, and Permissibility of Preferential Treatment by Country
Country Evidence Needed to Refute a Claim Consequences of Violation Permissibility of Preferential Treatment
Australia. Evidence needed to refute a claim: inherent requirements of the job, existence of special measures to eliminate discrimination, occupational requirements, actions required by law, employment within small organizations, consistent beliefs (e.g., religious organizations or educational institutes). The statutes make no reference to the psychological concept of validity, nor has it arisen in case law. Consequences of violation: injunction to stop the act, award of damages, order to the organization to redress the situation, variation or cancellation of a contract or agreement that violates the law. Preferential treatment: within-group norming is not banned and is used by some psychological testers as a means of complying with legislation (Myors, 2003). Targets may be used in some EEO plans, but explicit quotas are avoided.

Belgium. Evidence needed to refute a claim: statistical data or practical tests can be used as evidence. Consequences of violation: mediation or binding judgment from a civil court; imprisonment and/or fines. Preferential treatment: permitted to remedy historical discrimination against a group. Quotas are permitted but seldom utilized. Some organizations also utilize target numbers.

Canada. Evidence needed to refute a claim: the employer must demonstrate that the employment policy, practice, or procedure that is challenged is a bona fide occupational requirement. Tribunals and courts are quite liberal in the evidence that they will accept from employers in defense of their employment practices. Empirical and statistical evidence generated by I-O psychologists (e.g., local validation studies) may be useful in defending employment practices, but courts and tribunals often lack the sophistication to make full use of such detailed and complex technical information. Consequences of violation: fines, payment for lost wages, reinstatement, and ordering of special programs. Preferential treatment: permitted (mainly in the public sector).

Chile. Evidence needed to refute a claim: unclear, unless for sexual harassment or unionization suits; empirical evidence not required. Consequences of violation: unknown; currently, sexual harassment suits may result in monetary compensation and up to 3 years imprisonment. Preferential treatment: the government has enacted an informal quota for women in minister positions; however, this has not crossed over into the private sector.
France. Evidence needed to refute a claim: vague; the employer should present any information showing the decision is legitimate, nondiscriminatory, and based on objective information. Consequences of violation: three years imprisonment and/or a fine for conviction in a criminal court; a discriminatory act is annulled in a civil court, possibly with financial compensation. Preferential treatment: considerable discussion about this; politically, preferential treatment is seen as undesirable. However, there are settings where it is used. When parties present lists of candidates for regional and senatorial elections, they are required to have an equal number of men and women. Also, there are quotas in one setting: at least 6% of the workforce needs to be handicapped for organizations with more than 20 employees.

Germany. Evidence needed to refute a claim: needs to be based on job requirements. Consequences of violation: the employee has the right to refuse to work while on payroll and to sue employers for damages. Preferential treatment: no formalization, but public authorities are to give preference to women and handicapped persons.

Greece. Evidence needed to refute a claim: the employer must show that there has been no breach of the principle of equal treatment. Consequences of violation: an employer who infringes the laws about equal treatment on the grounds of racial or ethnic origin, religion or belief, disability, age, or sex is punished by imprisonment of 6 months up to 3 years, together with a penalty of 1,000 up to 5,000 euros. Preferential treatment: permitted to prevent or compensate for disadvantages linked to any of the protected classes.

India. Evidence needed to refute a claim: at the discretion of the judge. Preferential treatment: permitted in the form of a relaxation of qualifying scores for protected groups in external recruitment; however, a common standard is required for promotion. Not all members of protected groups are equally eligible; eligibility also depends on social/economic status. Government positions also use quotas.

Israel. Evidence needed to refute a claim: evidence of test reliability and validity, which can be based on validity generalization. In addition, the National Labor Court recently ruled that employers seeking to prove their innocence will be subject to less severe tests of selection validity to the extent that they are accused of discriminating against internal as opposed to external candidates, the logic being that employers typically have far greater information upon which to base a selection decision when choosing among internal candidates. Consequences of violation: small fines; hiring, reinstatement, or career advancement of the plaintiff; payment of back wages. Preferential treatment: required by public organizations and state-owned enterprises for women and minorities; permitted in the private sector.

Italy. Evidence needed to refute a claim: validity evidence not requested; evidence to refute a claim is currently unclear. Consequences of violation: unknown. Preferential treatment: permitted for women.

Japan. Consequences of violation: administrative advice; remedy by following recommendations of the Ministry of Health, Labour, and Welfare; possible public announcement of the violation; civil fine of a maximum of 200,000 yen ($2,400 U.S.). Preferential treatment: permitted and supported by the government; quotas required for the disabled.

Kenya. Evidence needed to refute a claim: must show that decisions were based on applicant aptitudes and abilities; empirical validity evidence not required. Preferential treatment: different cut-off scores are set for members from different ethnic groups to ensure that some members from each group will be selected. There are required quotas of 5% in the private and public sector for disabled individuals.
Korea. Evidence needed to refute a claim: show job relatedness, but the specific method is unclear. Consequences of violation: the National Human Rights Commission will make a binding conciliation resolution; fines. Preferential treatment: quotas required for the disabled; preferential treatment for women, although firms with over 50% women in the workforce are exempt.

The Netherlands. Evidence needed to refute a claim: generally no validity evidence is requested because the validity of common psychological tests, such as tests for cognitive abilities, personality inventories, and assessment center exercises, is taken for granted. Most claims concern direct discrimination or treatment discrimination (Commissie Gelijke Behandeling, 2006). Exceptions are clear-cut cases of indirect discrimination in which inappropriate job requirements were set. Consequences of violation: nonbinding judgment by the Commission of Equal Treatment and possibly judgment referral to a civil court. Preferential treatment: permitted for women and ethnic minorities (who do not have to be equally qualified).

New Zealand. Evidence needed to refute a claim: unclear, because few cases make it to court; genuine occupational qualification (GOQ). Consequences of violation: apology, payment or compensation, assurance that the discriminatory act will not be repeated, or referral to a Human Rights Tribunal for further judgment. Preferential treatment: this is currently being explored. Preferential treatment appears to be permitted (and may soon be applied to the Maori population).

South Africa. Evidence needed to refute a claim: qualitative and empirical data can be brought to bear to support validity. Consequences of violation: fines; possible cancellation of government contracts. Preferential treatment: permitted and applied. Racial quotas are legal and practiced by many large employers. The practical implication is that it is legal in the South African context to use race norming, or within-group top-down selection strategies, to address the affirmative action needs of organizations.
Spain. Evidence needed to refute a claim: recent laws may lead to a greater focus on empirical evidence; up until now, the validity of tests was taken for granted. Consequences of violation: compensation; rejection of the decision and subsequent application of the court decision; repetition of the selection process with new procedures. Preferential treatment: for women in some cases.

Switzerland. Evidence needed to refute a claim: empirical evidence not generally presented or required. Consequences of violation: courts can award damages, including payment of owed earnings and payment of compensation and satisfaction. Preferential treatment: permitted but not required.

Taiwan. Evidence needed to refute a claim: provide evidence of job relatedness. Consequences of violation: fines. Preferential treatment: quotas required for aborigines (at least 1% of private organizations' workforce).

Turkey. Consequences of violation: reinstatement, back pay, and/or monetary damages. Preferential treatment: not required or permitted; it is actually forbidden.

United Kingdom. Evidence needed to refute a claim: show that the requirement is justified. The employer can show that they took all "reasonable" steps to prevent discrimination. No impact cases involving tests have reached the stage of a court decision, so there is as yet no requirement of validity evidence. Consequences of violation: the court has discretion; compensation to the plaintiff; formal investigation by governing bodies that can recommend changes in procedures. Preferential treatment: not permitted, but "positive action," such as encouraging certain groups to apply or offering training to these groups, is allowed.

United States. Evidence needed to refute a claim: evidence that the challenged practice is job-related for the position in question and consistent with business necessity (largely through validity studies). Consequences of violation: upon a finding of discrimination, a judge can specify "make whole" remedies, such as back pay, hiring, or reinstatement; there are no punitive damages absent a finding of intentional discrimination. Preferential treatment: 1991 amendments to Title VII of the Civil Rights Act prohibit preferential treatment, specifically in the form of adjusting scores or using separate norms for minority group members. Preferential treatment is permitted after a finding of discrimination as part of a judicially ordered remedy.


Question 8: What is the legal status of preferential treatment of members of minority groups
(e.g., quotas or softer forms of preference)?
To set the stage, note that the term “affirmative action” is used in various contexts, only some of
which involve preferential treatment for protected groups. Some forms of affirmative action involve
outreach efforts to publicize openings and to encourage applications from members of protected
groups. However, there is no preferential treatment given once an individual is in the applicant pool.
Approaches involving preferential treatment fall into two main classes: (a) those that set differing
standards for protected and nonprotected groups without setting aside a specified number or propor-
tion of openings for members of protected groups (e.g., using different cut-off scores, using within-
group norming) and (b) quota approaches that set aside a fixed number or proportion of openings
for members of protected groups.
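Because within-group norming is mentioned several times here and in Table 30.6, a small sketch may clarify what the technique involves: a raw score is re-expressed as a percentile computed only within the applicant's own group. The scores below are invented for illustration; the code is a conceptual sketch, not a reproduction of any country's or employer's actual procedure.

# Illustrative sketch of within-group norming: each applicant's raw test score is
# converted to a percentile computed only within his or her own group, so groups
# are ranked against their own distributions rather than a common one.
# All scores below are invented for illustration.
from bisect import bisect_right

def within_group_percentile(score, group_scores):
    """Percentage of the applicant's own group scoring at or below `score`."""
    ordered = sorted(group_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)

if __name__ == "__main__":
    group_a = [45, 52, 58, 61, 70, 74, 80, 88]
    group_b = [38, 42, 50, 55, 57, 60, 66, 72]
    # The same raw score of 61 yields different normed scores in each group.
    print(within_group_percentile(61, group_a))  # 50.0
    print(within_group_percentile(61, group_b))  # 75.0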
Table 30.6 summarizes the status of preferential treatment in the participating countries. Preferential
treatment is a domain in which the United States emerges as a clear outlier. Preferential treatment in
terms of differing score cutoffs or separate norming of tests within group is prohibited by the U.S.
Civil Rights Act of 1991 (U.S. Code, 1991), and the use of quotas is restricted to very limited set-
tings, such as a court-ordered remedy following a finding of discrimination. In contrast, in only two
countries do commentators report a prohibition against minority preference (Turkey and the United
Kingdom). The types of preference permitted and the settings in which it is used do vary widely. The
status of quotas varies, from prohibited (Australia), to permitted but rarely used (Belgium), to permit-
ted and widely used (South Africa), to used in government sectors (backward classes in India and
women in Chile), to required for certain groups (e.g., aborigines in Taiwan, individuals with disabili-
ties in France, Japan, Kenya, and Korea). Several commentators note that applying lower standards to
protected groups (e.g., different cutoffs or within-group norming) is used (Australia, India, and South
Africa). In India, lower qualifying scores for protected groups are permitted for external selection,
but not for promotion.

Question 9: How have laws and the legal environment affected the practice of science-based
employee selection in this country?
In only a few countries (Canada, South Africa, and the United States) is the legal environment
seen as having a large effect on science-based employee selection. In general, this can partially be
attributed to the much more amorphous legal standards and consequences with regard to employ-
ment discrimination in most countries surveyed. The reciprocal relationship between science-based
selection and the legal environment will need to be continually monitored because many countries

are still in the process of developing legal statutes and requirements or establishing guidelines for
the prosecution and rulings on employment discrimination.
Overall, most employers in the countries surveyed have great latitude in choosing what selec-
tion procedures to utilize. However, most employers are aware of the social and political nature of
selection procedures and seem to err on the side of mainstream, popular, and usually well-validated
selection methods. The most common types of selection procedures do vary by country. It is com-
mon to see reports of increased use of the tools and techniques of science-based selection, but the
driving forces are more commonly the presence of multinational firms and consulting firms that
import these techniques into the country.

DISCUSSION
In this section we offer 35 broad summary statements about the patterns emerging from the narra-
tives from the various countries.

DISADVANTAGED GROUPS
1. Disadvantaged groups could be divided into four main categories: immigrants or foreign residents, religious minorities, racial/ethnic minorities, and language minorities (groups whose primary language differs from that of the majority).
2. Many European (especially European Union) nations have disadvantaged groups who are
immigrants or foreign workers. The groups that are disadvantaged are usually Eastern
European or African.
3. Many Asian countries also have disadvantaged groups who are immigrants or foreign
workers.
4. Many of the racial/ethnic minorities are indigenous people (e.g., Australia, Canada, New
Zealand, Taiwan, and the United States).
5. Most disadvantaged groups are a relatively small proportion of the population, most below
the 20% “breaking point” specified in research on tokenism (Kanter, 1977).
6. Disadvantaged groups can constitute the majority of the population (e.g., South Africa).

WOMEN IN THE WORKPLACE


7. Women are now well represented in the workforce; in most countries, women make up between one quarter and approximately one half of the workforce.
8. Women have generally substantially increased their participation rate in the workforce in the last decade. However, men's rate of participation in the workforce continues to greatly outstrip that of women.
9. Women are still underrepresented in management and professional positions. European nations and the United States have a sizeable representation of women in lower- and middle-management positions, but all countries have very few women in top- and senior-management positions.
10. Wage differentials are still sizeable between men and women; women generally earn
60–80 cents to the dollar compared with men.
11. Considerable occupational segregation remains for women, such that women tend to be
heavily concentrated in lower-income-segment occupations. These include clerical/secre-
tarial jobs, service jobs, nursing and childcare services, and primary education.
12. Women tend to engage in more part-time work (partly because of childcare
responsibilities).

SUBGROUP MEAN DIFFERENCES


13. Very few countries have research exploring potential mean differences in cognitive ability,
personality, or job performance. In terms of cognitive ability, findings usually favor the
advantaged group and/or men.
14. Mean differences between local and immigrant populations are affected by immigration
policies. Targeting either high- or low-skill immigrants can affect the magnitude and direc-
tion of mean differences.

DISCRIMINATION LAWS
15. Every country has a law or directive that prevents discrimination on the basis of sex or
race/ethnic origin and many other personal characteristics and beliefs.
16. Most discrimination cases seem to be settled by special commissions and/or courts rather
than by juries (which do not exist in several countries).
17. In many countries, few cases are actually filed and/or brought to trial, not because discrimination does not occur, but because workers do not understand their rights, are not used to protecting these rights (e.g., because of a collectivistic orientation), or do not see much benefit in going to court.
18. Punishment is generally rather light (e.g., a minimal to moderate fine, reinstatement, or payment of back wages).
19. Concerns about privacy are very prominent in Europe. In many European countries, these concerns are strong enough that data on race or gender are not collected.

MAKING AND REFUTING A CLAIM OF DISCRIMINATION


20. For many countries, although there are laws in place, there is very little clarity about how to establish discrimination and/or what kind of evidence is required.
21. Intent to discriminate is not required in most countries (exceptions are Taiwan and
Turkey).
22. Most discrimination cases are handled on a case-by-case basis and are based on treating
people differently on the basis of group membership (direct discrimination) rather than a
procedure or test that systematically disadvantages a group (indirect discrimination). In
most countries surveyed, both are illegal.
23. Few actual cases outside of the United States challenging the adverse impact or discrimi-
natory nature of formal tests (cognitive ability or personality) exist, and therefore most
countries do not really use validity evidence to refute discrimination.
24. Most countries do not require validity evidence. In many places the empirical validity of
formal tests (e.g., cognitive ability, personality) is implicitly assumed.
25. Most countries do not use relevant workforce comparisons as a basis for establishing discrimination, although this information is sometimes taken into consideration in certain countries.
26. The evidence to refute a claim of discrimination is usually some qualitative evidence of
job-relatedness or bona fide occupational requirement.

MINORITY PREFERENCE
27. Minority preference is permitted (and even recommended) in most countries. This is more
likely to be true for women or those with disabilities than for racial groups.
28. It is more common for government entities than for private-sector firms to engage in practices involving preferential treatment.

29. Forms of affirmative action vary, ranging from active recruitment and training of women
or racial groups that have been traditionally disadvantaged to lower standards for these
groups.
30. Quotas are relatively rare but are present in several countries; for example, India (lower
castes), Taiwan (aborigines), Korea and France (disabled), and South Africa (race and
gender).
31. Explicitly forbidding preferential treatment is rare (e.g., Turkey).

SPECIFIC SCIENCE-BASED SELECTION TOOLS


32. Generally, science-based tools are not explicitly referenced in laws or in common legal
practices (exceptions include South Africa, Switzerland, and the United Kingdom).
33. Generally, although firms are free to use whatever selection methods they desire, large
firms tend to be aware of social and business pressures for effective selection.
34. The selection method that is most limited/banned is the polygraph.
35. Selection practice tends to be influenced more by the presence of multinational corpora-
tions and consulting firms than by legal pressures (with the exception of the United States,
Canada, and South Africa).

We fully anticipate that some readers may question the value of knowing the legal environment
of countries other than their own, because they are inevitably bound by the legal constraints of the
country they operate in. We have several responses. First, in today’s global world, more and more
firms engage in business that extends across national boundaries. Second, there is value in extend-
ing one’s framework beyond the national setting with which one is most familiar. Discovering that
the same issue is treated differently elsewhere breaks the mold of viewing a certain set of circum-
stances as inevitable. Third, documenting these differences sets the stage for comparative research asking why certain variations are found. For example, why is preferential treatment generally prohibited and viewed so negatively by the public in the United States but not in many other countries? Why are some groups protected in some countries but not others?
In conclusion, we hope this compilation of information about perspectives from a wide range of
countries is useful to students, researchers, and practitioners around the globe. We encourage inter-
national collaborations on other workplace issues, and hope this project provides a useful model.

AUTHORS’ NOTE
This research was conducted while Antonio Mladinic was on leave from the Pontificia Universidad
Católica de Chile and holding a visiting appointment at the University of Texas at El Paso and while
Herman Aguinis was on sabbatical leave from the University of Colorado Denver and holding a
visiting appointment at the University of Salamanca (Spain). Oleksandr Chernyshenko is now at the
Nanyang Business School, Nanyang Technological University (Singapore).

REFERENCES
Attal-Toubert, K., & Lavergne, H. (2006). Premiers résultats de l’enquête sur l’emploi 2005. [Initial results
from the 2005 employment survey]. INSEE Première, number 1070. Paris: INSEE. Retrieved April 15,
2007, from http://www.insee.fr/fr/ffc/ipweb/ip1070/ip1070.pdf
Cascio, W. F., & Aguinis, H. (2008). Applied psychology in human resource management (6th ed.). Upper
Saddle River, NJ: Pearson Education.
Chernyshenko, O. S. (2005). Report on psychometric evaluation of the general reasoning test (GRT2) for
the New Zealand Police: Measurement equivalence across ethnic and gender groups. Auckland, New
Zealand: OPRA Consulting Group.

Council of Indigenous Peoples. (2002). Yearbook of Taiwanese aborigines statistics. Taipei, Taiwan: Executive
Yuan.
Equal Opportunities Commission. (2004). Sex and power: Who runs Britain. Manchester, England: Author.
Fontaine, J. R. J., Schittekatte, M., Groenvynck, H., & De Clercq, S. (2006). Acculturation and intelligence
among Turkish and Moroccan adolescents in Belgium. Unpublished manuscript, Ghent University,
Ghent, Belgium.
Government Gazette. (1999). Employment Equity Act, 1998 (Act No. 55 of 1998), R 1360.
Guenole, N., Englert, P., & Taylor, P. (2003). Ethnic group differences in cognitive ability test scores within a
New Zealand applicant sample. New Zealand Journal of Psychology, 23, 39–54.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ: Lawrence
Erlbaum.
Japan Institute of Labor Policy and Training. (2007). Labor situation in Japan and analysis 2006/2007. Retrieved
June 5, 2007, from http://www.jil.go.jp/english/laborinfo/library/documents/Labor2006_2007.pdf
Kanter, R. M. (1977). Men and women of the corporation. New York, NY: Basic Books.
Kinyungu, C. (2006). KCPE: Public schools feel the heat. Retrieved January 31, 2007, from http://www.
eastandard.net/archives/cl/hm_news/news.php?articleid=1143963072
Kriek, H. J. (2006). Personality assessment: Group differences, language proficiency and fairness. Presented at
the Society of Industrial and Organizational Psychology Conference, May 2006, Dallas, TX.
Myors, B. (2003). Within-group norming: Just because it’s illegal in America, doesn’t mean we can’t do it here.
Paper presented at the 5th Australian Conference on Industrial/Organisational Psychology, Melbourne,
Australia.
Myors, B., Lievens, F., Schollaert, E., Van Hoye, G., Cronshaw, S. F., Mladinic, A., et al. (2008). International
perspectives on the legal environment for selection. Industrial and Organizational Psychology:
Perspectives on Science and Practice, 1, 206–256.
Roth, P. L., Bevier, C. A., Bobko, P., Switzer, F. S., & Tyler, P. (2001). Ethnic group differences in cognitive
ability in employment and educational settings: A meta-analysis. Personnel Psychology, 54, 297–330.
SHL. (2006). Validity study V036. Retrieved on June 7, 2007, from http://www.shl.com/globallocations/pages/
southafrica.aspx
Society for Industrial and Organisational Psychology in South Africa. (2005). Guidelines for the validation
and use of assessment procedures for the workplace. Retrieved on June 7, 2007, from http://www.siopsa.
org.za
State Institute of Statistics. (2006). Census of population: Social and economic characteristics. Ankara, Turkey:
Author.
Statistics South Africa. (2001). Census 2001: Primary tables South Africa, Census 1996 and 2001 compared.
Johannesburg, South Africa: Author.
te Nijenhuis, J., de Jong, M., Evers, A., & van der Flier, H. (2004). Are cognitive differences between immi-
grant and majority groups diminishing? European Journal of Personality, 18, 405–434.
te Nijenhuis, J., van der Flier, H., & van Leeuwen, L. (1997). Comparability of personality test scores for
immigrants and majority group members: Some Dutch findings. Personality and Individual Differences,
23, 849–859.
te Nijenhuis, J., van der Flier, H., & van Leeuwen, L. (2003). The use of a test for neuroticism, extraver-
sion, and rigidity for Dutch immigrant job-applicants. Applied Psychology: An International Review, 52,
630–647.
U.S. Code. (1964). Pub. L. 88-352.
U.S. Code. (1991). Pub. L. 102-166.
U.S. Equal Employment Opportunity Commission. (1978). Uniform guidelines on employee selection proce-
dure. 29 CFR 1607.1. Washington, DC: Author.
U.S. General Accounting Office. (2003). Women's earnings: Work patterns explain difference between men
and women’s earning. Retrieved on June 20, 2008, from http://usgovinfo.about.com/gi/dynamic/offsite.
htm?zi=1/XJ&sdn=usgovinfo&cdn=newsissues&tm=110&gps=64_261_1276_825&f=00&tt=2&bt=
0&bts=0&zu=http%3A//www.gao.gov/new.items/d0435.pdf
van Leest, P.F. (1997). Persoonlijkheidsmeting bij allochtonen [Assessment of personality for ethnic minori-
ties]. Lisse, The Netherlands: Swets & Zeitlinger.
Zeidner, M. (1986). Are scholastic aptitude tests in Israel biased toward Arab student candidates? Higher
Education, 15, 507–522.
Part 7
Employee Selection in Specific Organizational Contexts
Rick Jacobs and Ann Howard, Section Editors
31 Selection and Classification in the U.S. Military

Wayne S. Sellman, Dana H. Born, William J. Strickland, and Jason J. Ross

The quality of a military, or any workforce for that matter, depends upon the quality of its people.
Successful attainment of military missions requires a force composed of dedicated, knowledge-
able, and competent members. When an organization can hire persons with prior experience, an
evaluation of past performance can serve as the primary criterion for selection and assignment
into jobs. Other characteristics such as aptitudes and education assume less importance. However,
when organizations hire young people without job experience, it becomes important to evaluate
aptitudes, education, interests, and other characteristics known to predict success in jobs sought by
the applicants.
The U.S. Department of Defense (DoD) is the world’s largest employer of young people.
Depending on personnel requirements, the DoD screens hundreds of thousands of youth for enlist-
ment annually. During the late 1970s, the DoD screened approximately 1 million applicants each
year; that number declined to “only” about 500,000 during the first years of the 21st century
(Sellman, 2001). As noted above, the military’s task in screening potential recruits is complicated
by the fact that the available personnel pool is composed predominately of young men and women
who have never held a permanent full-time job. Consequently, the services must depend mainly on
indicators of potential performance such as aptitude and levels of education.

MILITARY PERSONNEL SYSTEM


At its most basic division, the military separates its personnel into two or three categories: enlisted
personnel, commissioned officers, and (for all services except the Air Force) warrant officers. Each
military service uniquely recruits, trains, and professionally develops its members.
Comprising approximately 85% of the entire military, the enlisted force consists of (a) the basic
level (e.g., entry-level soldiers, sailors, airmen, or marines), (b) noncommissioned officers and
petty officers (NCOs), and (c) senior noncommissioned officers and senior petty officers (Senior
NCOs). These levels correspond to different levels of experience, training, education, and leader-
ship. Individuals at the basic level have typically just entered the military, are in training, or have
achieved initial competency in their occupational specialties. NCOs are technical experts in their
primary jobs and serve as first-line supervisors, who teach, train, and supervise basic-level person-
nel. Finally, Senior NCOs are seasoned individuals who have experienced a myriad of technical
jobs, held numerous first-line supervisory positions, and have performed at a higher level than many
of their contemporaries.
Commissioned officers are the senior leadership and management of the military. Similar to the
enlisted force, the commissioned officer force is divided into three subgroups: (a) company-grade
officers, (b) field-grade officers, and (c) general or flag officers. Company-grade officers are the


military’s action officers and are largely involved in the tactical level of the military organization.
Field-grade officers typically fill many operational-level positions and most command and staff
assignments. Lastly, the general or flag officers are the service executives and are primarily engaged
in strategic, policy-making decisions that affect the organization in the long-term.
There are four principal paths that can be taken to become a commissioned officer. Two of
the primary officer commissioning programs, the service academies and the Reserve Officers
Training Corps (ROTC), are administered in conjunction with an individual’s undergraduate
academic studies. The two remaining principal commissioning programs, Officer Candidate/
Training School (OCS/OTS) and Direct Commissioning, are designed almost exclusively for
individuals who already possess at least a baccalaureate degree (U.S. Department of Defense,
2006).
Creating a new officer through either the service academies or ROTC is a 4-year process. The
services use OCS/OTS as a source for a specific portion of their new officers annually. In addi-
tion, in times of growth, OCS/OTS provides a quick-reaction surge capability that the longer-term
programs cannot match. Direct commissioning is normally reserved for people with professional
credentials (e.g., physicians, attorneys).
With the sole exception of the Air Force, the Army, Navy, and Marine Corps have warrant
officers who fill highly specialized leadership positions. Unlike their commissioned officer coun-
terparts whose experiences are broad and service-encompassing, warrant officers are employed in
positions that require highly specialized technical or tactical skills (e.g., helicopter pilots). Selection
as a warrant officer is highly competitive and only available to those who meet rank and length-of-
service requirements in the enlisted force.
Distinct from the civilian sector, the military is a completely closed personnel system; this
means that the services fill personnel vacancies with members already employed within their ranks.
American military leaders are “grown” from the junior ranks; the services do not hire military indi-
viduals to enter mid- or senior-level ranks. Because it takes years to successfully replace a member
who leaves the military, attracting officer and enlisted candidates is a high priority for military
policy-makers. Selecting the correct number of high-quality individuals each year is essential to
sustain a flow of seasoned leaders for the future.
Table 31.1 shows the names of the officer ranks as well as the number of officers in each service
as of December 31, 2007. The rank structure is analogous to a pyramid, with junior individuals serv-
ing as the base of the pyramid and outnumbering those individuals in the increasingly senior ranks.
The table also shows the same information for the enlisted ranks (highest is E-9 and the lowest is
E-1) but does not identify them by name because each service uses its own nomenclature.
Within the services, there are literally hundreds of military occupations. Although many are
similar to civilian jobs, there also are large numbers of occupations that are unique to the military.
The services categorize the plethora of job specialties into several broad occupational areas as
shown in Tables 31.2 and 31.3. Because of the large number of military enlistees (about 350,000
annually in the active and reserve components) who must be assigned into a large number of
military occupations, the services, unlike most civilian employers, must be proficient at job clas-
sification as well as personnel selection. However, classification to military occupations depends
on eligibility, individual preference, and availability of openings (U.S. Department of Defense,
2006b; Campbell & Knapp, 2001). With an enormous diversity of occupations, a vast number of
openings at specific positions, and a variety of individual skills, the challenge of classification is
appreciable.

INDICATORS OF RECRUIT QUALITY


The DoD and the services use aptitude and educational achievement as indices of recruit quality
(Sellman, 1997; Sellman & Valentine, 1981). These “quality” indices are used in lieu of evaluating
past work experience—a criterion that rarely exists for military applicants, who are for the most part recent high school graduates.

TABLE 31.1
U.S. Department of Defense Current Officer, Warrant Officer,
Enlisted, and Cadet/Midshipmen Numbers by Servicea
Rank/Grade—All Services
Army Navy Marine Corps Air Force Total
General-Admiral 11 11 3 15 40
Lieutenant General-Vice Admiral 52 34 15 31 132
Major General-Rear Admiral (U) 92 69 27 95 283
Brigadier General-Rear Admiral (L) 151 110 40 146 447
Colonel-Captain 4,084 3,128 699 3,385 11,296
Lieutenant Colonel-Commander 9,127 6,713 1,842 9,928 27,610
Major-Lieutenant Commander 15,436 10,324 3,633 14,723 44,116
Captain-Lieutenant 25,006 17,061 5,572 22,418 70,057
1st Lieutenant-Lieutenant (JG) 6,862 5,975 2,812 7,565 23,214
2nd Lieutenant-Ensign 9,944 6,239 3,019 7,104 26,306
Chief Warrant Officer W-5 456 61 84 601
Chief Warrant Officer W-4 2,382 257 268 2,907
Chief Warrant Officer W-3 3,369 780 531 4,680
Chief Warrant Officer W-2 4,493 503 758 5,754
Warrant Officer W-1 3,233 232 3,465
Total officers 84,698 51,265 19,535 65,410 220,908
E-9 3,580 2,844 1,555 2,688 10,667
E-8 11,498 7,122 3,591 5,148 27,359
E-7 39,119 23,632 8,121 26,112 96,984
E-6 61,332 49,654 13,725 43,209 167,920
E-5 81,674 68,861 28,351 69,288 248,174
E-4 118,117 51,928 37,147 52,330 259,522
E-3 63,316 43,855 38,170 47,348 192,989
E-2 33,037 18,193 19,867 6,397 77,494
E-1 21,327 14,476 16,147 10,340 62,290
Total enlisted 433,300 280,565 166,674 262,860 1,143,399
Cadets-midshipmen 4,390 4,384 0 4,393 13,167
Grand total 522,388 336,214 186,209 332,663 1,377,474

a The most recent figures for this table can be obtained from the website noted in the source line below.
Source: Adapted from http://siadapp.dmdc.osd.mil/personnel/MILITARY/rg0712.pdf

For enlisted selection and classification, the Armed Services
Vocational Aptitude Battery (ASVAB) is the single test used to determine enlistment eligibility
of applicants for the Army, Navy, Air Force, and Marine Corps as well as their respective reserve
components. In addition, ASVAB is used to assign successful applicants to military occupations.
Although a part of the U.S. Department of Homeland Security, the Coast Guard also uses ASVAB
for personnel selection and job classification.
ASVAB is administered in computer adaptive and paper-and-pencil versions (Sands, Waters, &
McBride, 1997). The computer adaptive version is administered to about 70% of applicants at 65
Military Entrance Processing Stations across the country. The remaining 30% of applicants receive
the paper-and-pencil form at 650 remote, satellite testing sites (Sellman, 2004).
ASVAB is a battery comprising ten tests that measure verbal, mathematics, and science/techni-
cal skills and knowledge. The Armed Forces Qualification Test (AFQT), a composite of ASVAB
tests, measures verbal (word knowledge and paragraph comprehension) and mathematics (arithmetic reasoning and mathematics knowledge) abilities.

TABLE 31.2
FY2006 Active Component Enlisted Corps by Occupational Area and Service
Occupational Area
Service | Infantry, Gun Crews, and Seamanship | Electronics | Communications | Medical | Other Technical | Administrators | Electrical | Craftsman | Supply | Nonoccupational(a) | Total
Army 110,778 23,649 49,199 34,208 15,626 59,520 58,472 9,300 56,642 2,771 420,165
Navy 24,605 35,817 26,147 25,185 2,752 35,698 86,549 17,362 21,000 17,988 293,103
Marine Corps 39,982 10,187 11,626 0 4,129 25,016 25,561 3,870 18,786 22,120 161,277
Air Force 27,945 23,970 23,434 20,688 11,083 56,331 64,963 11,765 13,939 19,872 273,990

DoD total 203,310 93,623 110,406 80,081 33,590 176,565 235,545 42,297 110,367 62,751 1,148,535

a Nonoccupational includes patients, students, those with unassigned duties, and unknowns.
TABLE 31.3
FY2006 Active Component Officer Corps by Occupational Area and Service
Occupational Area
Service | General Officers | Tactical Operations | Intelligence | Engineering and Maintenance | Scientists and Professionals | Health Care | Administration | Supply, Procurement, and Allied | Nonoccupational(a) | Total
Army 318 23,515 4,582 10,592 5,082 14,043 4,776 5,894 808 69,610
Navy 225 20,458 2,183 5,049 2,102 10,033 2,267 2,518 5,574 50,409
Marine Corps 83 8,343 848 1,345 454 0 982 2,115 2,272 16,442
Air Force 287 23,856 3,859 10,952 4,628 11,153 4,666 6,026 4,746 70,173

DoD Total 913 76,172 11,472 27,938 12,266 35,229 12,691 16,553 13,400 206,634

a Nonoccupational includes patients, students, those with unassigned duties, and unknowns.
Source: U.S. Department of Defense, Office of the Under Secretary of Defense for Personnel and Readiness, Population representation in the military services FY2006, Washington, DC,
2008.

AFQT1 is the primary enlistment screen for all
services and is the DoD’s first index of recruit quality. The tests of science/technical knowledge
include general science, electronics information, mechanical comprehension, auto information,
shop information, and assembling objects.
On the basis of statistical validity analyses, the services combine the various ASVAB tests into
“aptitude area” composites, which are used to assign new recruits to military occupations. Each ser-
vice computes and uses its own set of composites for job classification, giving each service a degree
of flexibility to stipulate the necessary skills required to fill each job position (Diaz, Ingerick, &
Lightfoot, 2004; Lightfoot, Diaz, Heggestad, Darby, & Alley, 1999; Rumsey, Walker, & Harris,
1994; Waters, Laurence, & Camara, 1987). Although there has been some research to explore the
relationship between AFQT and service composite scores and performance in the second term of
enlistment (generally from 4 to 8 years of service), ASVAB validity is usually established using
entry-level training or first-term job performance as criteria (Oppler, McCloy, & Campbell, 2001).
Such service differences in composites make sense, even for what appear to be virtually identical
occupations (e.g., electronic repair specialists, motor mechanics, cooks, supply technicians, clerks).
The services have distinctly different functions that affect how they fulfill their respective mis-
sions. For example, the Army and Marine Corps have extensive ground combat responsibilities
that are quite different from most Navy and Air Force activities. Certainly, a ship’s environment is
very different from that of an aircraft or tank. Consequently, for what is ostensibly the “same” job,
the particular equipment used by personnel in the different services may dictate a different mix of
abilities (Waters et al., 1987).
ASVAB is normed against a nationally representative sample of young people ages 18 to 23 who were tested in 1997 as part of the Bureau of Labor Statistics' National Longitudinal Survey of Youth
(Segall, 2004). Such norms allow the comparison of applicant and recruit aptitude levels with those
of the contemporary civilian youth population from which they come. AFQT scores are expressed
on a percentile scale and grouped into five categories for reporting purposes. Table 31.4 shows the
percentile score ranges and percentage of civilian youth that correspond to each AFQT category.
Persons who score in Categories I and II tend to be above average in cognitive ability; those in
Category III, average; those in Category IV, below average; and those in Category V, markedly
below average. (Category III is divided at the 50th percentile into subcategories A and B. This
facilitates reporting the proportion of scores above and below the mean of the AFQT distribution.)
By law, Category V applicants and those in Category IV who have not graduated from high school
are not eligible for enlistment.
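For readers who want to see the category logic in executable form, the sketch below maps an AFQT percentile to its category using the ranges reported in Table 31.4 (with Category I treated as 93 and above) and applies the statutory rule that Category V applicants, and Category IV applicants without a high school diploma, are ineligible. The function names are ours, and the sketch ignores the many other screens described later in the chapter.

    def afqt_category(percentile):
        """Map an AFQT percentile to its category (ranges from Table 31.4)."""
        if percentile >= 93:
            return "I"
        if percentile >= 65:
            return "II"
        if percentile >= 50:
            return "IIIA"
        if percentile >= 31:
            return "IIIB"
        if percentile >= 10:
            return "IV"
        return "V"

    def statutorily_eligible(percentile, has_hs_diploma):
        """Category V, and Category IV without a diploma, are barred by law."""
        category = afqt_category(percentile)
        if category == "V":
            return False
        if category == "IV" and not has_hs_diploma:
            return False
        return True

    print(afqt_category(55), statutorily_eligible(55, True))   # IIIA True
    print(afqt_category(20), statutorily_eligible(20, False))  # IV False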
The best single predictor of successful adjustment to military life is possession of a high school
diploma. Consequently, the services also value recruits with high school diplomas because they are
more likely to complete an initial tour of duty than are enlistees with alternative credentials or non-
graduates. About 80% of high school diploma graduates complete their first 3 years of service, com-
pared to only 50% of high school dropouts (U.S. Department of Defense, 1996). Completion rates
for enlistees holding an alternative credential such as a General Education Development (GED)
certificate fall in between the high school diploma graduate and nongraduate rates (Elster & Flyer,
1981; Flyer, 1959; Laurence, 1984, 1997). Thus, educational achievement is the DoD’s second index
of recruit quality.
Over the past 25 years, there has been a proliferation of education credentials in the United
States. In addition to earning a regular high school diploma, young people can receive credentials
through adult education programs and home schooling, through experiential learning, and by taking
high school equivalency tests. The DoD uses a three-tier system to classify education credentials.
The system was developed after research indicated a strong relationship between level of education

1 The AFQT, as either a stand-alone test or a composite of ASVAB tests, has been in use for personnel selection since 1950.
Although its content has changed somewhat over the years, the AFQT has always been a measure of “g” with verbal and
math components.

TABLE 31.4
AFQT Categories by Corresponding Percentile Score Ranges
and Percentage of Civilian Youth Population
AFQT Categories Percentile Score Range Percentage of Civilian Youth
I 93–100 8
II 65–92 28
IIIA 50–64 15
IIIB 31–49 19
IV 10–30 21
V 1–9 9

and successful completion of the first term of military service (Laurence, 1997; U.S. Department
of Defense, 1996). Tier 1 includes regular high school diploma graduates, adult diploma holders,
and nongraduates with at least 15 hours of college credit. Tier 2 comprises alternative credential
holders such as those with GED diplomas or certificates of completion or attendance, and Tier 3 is
composed of non-high-school graduates.
The services prefer to enlist people in Tier 1 (high school diploma graduates) because they have
a higher likelihood of completing a first term of service than do individuals in Tiers 2 and 3 (e.g.,
GED holders or high school dropouts). Consequently, education standards refer to the application of
progressively higher aptitude test score minimum requirements for high school diploma graduates,
equivalency credential holders, and nongraduates, respectively (Laurence, 1984). The rationale for
this policy is based on the differential attrition rates of these three education groups. That is, mem-
bers of Tiers 2 and 3 are about twice as likely to leave service prematurely as those in Tier 1. Higher
aptitude requirements for Tiers 2 and 3 are used to accept only the “best” from the statistically less
successful and thus less preferred group of applicants (U.S. Department of Defense, 1996).
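The logic of progressively higher aptitude minimums for lower education tiers can be sketched as follows. The three thresholds are hypothetical placeholders chosen only to illustrate the pattern; actual operational minimums are set by each service, exceed the statutory floors discussed later in the chapter, and shift with recruiting conditions.

    # Hypothetical tier-based minimum AFQT percentiles; placeholders for illustration only.
    MIN_AFQT_BY_TIER = {1: 31, 2: 50, 3: 65}

    def meets_education_aptitude_standard(tier, afqt_percentile):
        """Apply a progressively higher aptitude floor to lower education tiers."""
        return afqt_percentile >= MIN_AFQT_BY_TIER[tier]

    print(meets_education_aptitude_standard(1, 45))  # True: Tier 1 (diploma graduate)
    print(meets_education_aptitude_standard(2, 45))  # False: Tier 2 needs a higher score in this sketch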

NEED FOR MILITARY SELECTION


Military recruiting is a supply and demand phenomenon (Sellman, 1998, 1999) that is influenced by
the costs of recruiting qualified individuals for enlistment. When recruiting prospers, the services
raise their enlistment standards. When times are bad, the services sometimes lower their standards and access2 somewhat lower-quality recruits in order to meet their recruiting goals. Military recruiting, assignment, and training of young,
unskilled people is an investment; the underlying purpose of the personnel selection and job clas-
sification process is to reduce the risk that an investment will be made in persons who are unable
(or unwilling) to perform their duties. There also are costs associated with recruit quality levels; it
is more difficult and costly to recruit high-quality youth (high school graduates with above average
aptitude) than their lower-quality peers. Thus, recruit quality standards directly influence recruiting
resource requirements (Sellman, 1999).
Once admitted into service, recruits are expected to progress through training, to perform their
duties competently, and to observe military order and discipline. Unfortunately, not all enlistees get
through basic training and job skill training and, even for those who do, not all manage to avoid
disciplinary problems. Still others may play by the rules but may perform well below par on the
job for reasons not related to low aptitude but rather to lack of motivation. The consequences for
substandard performance may include slow promotion progress, reassignments, various forms of
punishment from reprimands to incarceration, and in many cases an early exit from service.

2 “Access” is a term used by the U.S. military to indicate “entrance” into service. Thus, an “accession” is an entering service
member.

The most analyzed indicator of maladjustment to the military is first-term attrition, the failure to
complete an obligated period of service. According to the U.S. Government Accountability Office
(GAO), it cost $40,000 in 1997 to replace (recruit, train, and equip) each individual who failed to
successfully complete a first tour of duty (U.S. Government Accountability Office, 1997). Given
the substantial increase in recruiting resources associated with recruiting challenges brought on
by the war in Iraq, today that number is considerably higher (Stewart, 2005). There also are non-
pecuniary or indirect costs, which include force instability, lowered morale, and lack of readiness.
Individuals also may pay a personal price: Failure in military service may significantly affect their
future employment opportunities and earning potential. Consequently, it is in the interest of recruits
and the services to reduce first-term attrition (Strickland, 2005).
Attrition of newly commissioned officers during their initial service commitment is generally not
a problem; however, officer retention beyond that initial service commitment is a constant concern.
Because a service academy education represents a substantial investment (with cost estimates rang-
ing as high as $403,000 per graduate), the services need many of those officers to remain in service
far past their initial obligation.

SHORT HISTORY OF MILITARY PERSONNEL TESTING (PRE-ALL VOLUNTEER FORCE)

Although current testing methods are codified in U.S. law, they have not always been in place. With the advent of new weaponry in World War I (tanks, airplanes,
chemicals, etc.), the American military started using tests to screen people for service and assign
them to a military occupation. In 1917–1918, the Army Alpha and Army Beta tests were developed so
commanders could have some measure of the ability of their men (Waters, 1997). The Army Alpha
was a verbal, group-administered test that measured verbal ability, numerical ability, ability to follow
directions, and information. The Army Beta was a nonverbal, group-administered counterpart to the
Army Alpha. It was used to evaluate the aptitude of illiterate, unschooled, or non-English speaking
inductees (Yerkes, 1921). Both tests are recognized as prototypes for subsequent group-administered
cognitive ability tests.
Building on the foundation of the Army Alpha and Beta tests, the Army General Classification Test (AGCT) replaced them during World War II. The AGCT’s intent was similar to that of the Alpha and Beta tests in that it was designed as a general learning test used for job placement. Although it served the services successfully throughout the World War II years, at the war’s conclusion, each
service developed its own aptitude test for service entry. Eitelberg, Laurence, and Waters (1984)
noted, “Though different in structure, primarily with respect to qualifying scores, the service tests
were essentially the same with respect to content area, relying on the time-honored items of vocabu-
lary, arithmetic, and spatial relationships.”
In 1950, the military returned to a single test, the AFQT, to be used in conjunction with the
Selective Service System draft. The AGCT served as the model for the AFQT, which measured essentially the same constructs as the AGCT and the earlier Army Alpha and Beta tests; unlike its predecessors, however, the AFQT was specifically designed as a screening device (Karpinos, 1966). Thus, the AFQT was established for the purpose of (a) measuring examinees’ general ability to absorb military training and (b) providing a uniform measure of examinees’ potential usefulness in the service, if qualified (Maier, 1993; Uhlaner & Bolanovich, 1952).

MOVING TO AN ALL-VOLUNTEER FORCE


Throughout most of American history, the U.S. military has been composed of volunteers. However,
conscription was the primary means of obtaining military personnel during World Wars I and II and
the Korean Conflict to the point that its renewal became perfunctory. The decision to move to an

all-volunteer military evolved from criticism of the inequities of conscription during the Vietnam
War—who shall serve when not all serve? In the late 1960s, President Richard Nixon established
a commission to develop a comprehensive plan for eliminating conscription and moving toward an
all-volunteer force. The commission built a case for a volunteer military by pointing out the unfair-
ness of conscription, establishing the feasibility of a volunteer force on economic grounds, and
refuting all major arguments against ending conscription and relying totally on volunteers (Lee &
Parker, 1977; Gates, 1970).
The commission believed that sufficient numbers of qualified youth could be persuaded to
volunteer by increasing military pay to levels more competitive with civilian wages. They dis-
puted claims that total reliance on volunteers would lead to a mercenary force consisting mainly
of minorities, the poor, and the uneducated, and loss of civilian control. After much debate within
the Administration and Congress and across the country, it was decided that an all-volunteer force
was feasible, affordable, and would not jeopardize the nation’s security (Rostker, 2006; Defense
Manpower Commission, 1976). Thus, the authority for conscription was allowed to lapse on July 1,
1973, and the last conscript entered the Army in December 1972.
With adequate resources and support to attract and retain the brightest personnel, conscription is
not needed to meet future military personnel requirements (Bicksler & Nolan, 2006). An all-volunteer
force is more expensive than a conscription force in terms of military compensation and funds for
advertising and enlistment incentives. However, a voluntary military is less expensive in overall costs
(Fredland, Gilroy, Little, & Sellman, 1996; Lee & McKenzie, 1992; Warner & Asch, 1996). It is
more stable and career-oriented, thereby leading to extra performance and experience with reduced
training and other turnover costs (Oi, 1967). During conscription, 10% of new inductees reenlisted;
today’s new recruits reenlist at a 50% rate (Rostker, 2006). In short, military service is an economi-
cally rational choice for high-quality men and women looking for an edge on life. The military also
is a good choice for people who want to serve a greater cause (Bicksler, Gilroy, & Warner, 2004).
During the first years of the all-volunteer force, the AFQT was used to identify individuals who
had a reasonable probability of success in service, and other service-specific tests were required for
job classification. The Army Classification Battery, the Navy Basic Test Battery, and the Airman
Qualifying Examination, just to name a few, were used from the late 1950s to the mid-1970s
(Waters, 1997). During this period, the AFQT was administered to military applicants (including
draft inductees) at Armed Forces Examining and Entrance Stations (AFEES) across the country for
selection purposes. Because women were not subject to the draft, a different aptitude test was used
for female applicants for enlistment. The Armed Forces Women’s Selection Test was administered
to female applicants in lieu of the AFQT from 1956 to 1974. If individuals successfully “passed” the
AFQT and were accepted for service, they were sent to basic training, although the specific occu-
pation to which they would be assigned had not yet been determined. During basic training, new
enlistees were administered their service’s classification tests and were assigned to their appropriate
military occupations.
During the mid-1970s, DoD determined that a single test would be used for both aptitude screening and job placement, resulting in the development and implementation of the ASVAB, which is still in use today (Sellman & Valentine, 1981). The ASVAB enabled DoD to screen applicants, match them with job positions, reserve job skill training for those who qualified, and rank all applicants on a uniform measure. This was a departure from previous procedures in which selection testing
was conducted at AFEES during the entrance process (for either enlistment volunteers or draft
inductees) and classification testing was accomplished at service basic training centers preparatory
to assigning new enlistees to military occupations and sending them for job-skill training.
Combining selection and classification testing at the AFEES made the testing process more expedient for the newly implemented all-volunteer military. Young people volunteering
for enlistment would take one test and come away from the AFEES knowing not only if they quali-
fied for enlistment, but, if qualified, also the military occupation to which they would be assigned.

Thus, the new testing process enabled the services to improve the matching of applicants with
available occupations before they actually reported for duty and allowed job guarantees for individuals
qualified for enlistment.
With the end of conscription and the advent of the all-volunteer force, there has been a
significant change in the composition of new recruit cohorts (Sellman, Carr, & Lindsley, 1996). The
percentage of female accessions has more than tripled, rising from 5% in 1973 (Goldman, 1973)
to approximately 17% in 2006 among nonprior service members (Manning & Griffith, 1998; U.S.
Department of Defense, 2008). Although the services have increased their proportions of women,
youth propensity polls show that young women are still approximately 50% less likely than young men to indicate an interest in joining the military (Ramsberger, 1993; Sackett & Mavor, 2004;
U.S. Department of Defense, 2008).
The percentage of Black enlisted accessions also rose, with some fluctuation, following the end
of the draft (MacGregor, 1981). Increases in the proportion of Black accessions coincided with the
ASVAB misnorming, which led to erroneous enlistment of many low-scoring applicants. Thus,
representation of Blacks—whose test scores are generally lower than those of Whites—increased
during the misnorming period. In the early 1980s, revised standards corrected the ASVAB scoring
error. As young Black men and women increasingly viewed the military as an opportunity for upward
mobility, a gradual increase in Black accessions ensued through the early 1990s. Black representation in the active component enlisted force has remained relatively stable at around 20% into the 21st century
(U.S. Department of Defense, 2008).
Hispanics make up a much smaller but growing proportion of the military services than Blacks.
Enlisted Hispanics comprised just over 1% in the early 1970s, but by the late 1980s, that percent-
age had increased to nearly 5%. There has been a steady increase in enlisting men and women
of Hispanic descent ever since. However, with 11% of active duty enlisted members counted as
Hispanic in 2006, this group remained underrepresented relative to the growing comparable civil-
ian population (17%) (U.S. Department of Defense, 2008).

ASVAB MISNORMING AND JOB PERFORMANCE MEASUREMENT PROJECT


In 1980, the DoD announced that the ASVAB in use since 1976 had been misnormed with the result
that scores in the lower ranges were artificially inflated (Jaeger, Linn, & Novick, 1980; Boldt, 1980;
Maier & Grafton, 1980; Sims & Truss, 1978, 1979, 1980). In other words, in developing norms for
the ASVAB, an error was made in the sample and method used to convert raw scores to percentile
scores. As a result, approximately 360,000 young men and women, who had entered service during
the period 1976–1980, would have been unable otherwise to meet enlistment standards (Eitelberg,
1988). About one out of every four male recruits across all services in those years would have been
disqualified under the aptitude standards the services intended to apply. Black young men appear
to have been the biggest beneficiaries of the misnorming. Over 40% of Black recruits during this
period had test scores that ordinarily would have kept them out of the military. Hispanics, too,
benefited greatly from the misnormed ASVAB. Almost 33% would have been considered ineligible
under the correct aptitude standards (Eitelberg, 1988). The quality of Army recruits fell to an all-
time low during this period, even lower than during the period of heavy mobilization for World War
II (U.S. Department of Defense, 1985).
The ASVAB misnorming episode turned out to be a natural experiment with large numbers of
new recruits entering service “unselected.” The misnorming presented a unique opportunity to
study, on a large scale, the validity of selection standards in an unrestricted population. The people
who were admitted to the military with aptitude scores below the cut-off points were assumed
by their supervisors to have had scores above the enlistment standards. Individuals with legiti-
mately qualifying scores did appreciably better than their lower-scoring peers in terms of training
performance, promotions, disciplinary problems, and attrition. At the same time, the low-aptitude
recruits were able to successfully perform in low- and medium-demand occupations (Greenberg,

1980; Means, Nigam, & Heisey, 1985; Shields & Grafton, 1983). As a consequence of the misnorm-
ing, members of Congress and policy-makers in DoD became interested in the methods used to set
enlistment standards and to establish recruit quality requirements.
In the congressional view, the fact that the ASVAB traditionally had been validated against suc-
cess in training rather than on-the-job performance was potentially problematic. Supporting studies
regarding the relationship between recruit quality and military performance lacked persuasive power
because proxy measures (e.g., attrition, promotion rates, or reenlistment eligibility) were used rather
than actual measures of job performance. Congressional scrutiny of the ASVAB misnorming and
surrounding issues of recruit quality and entry standards led to the Joint-Service Job Performance
Measurement/Enlistment Standards Project—hereafter referred to as the JPM Project.
The JPM Project comprised three phases: (a) determine the feasibility of measuring hands-on
job performance; (b) if feasible, validate ASVAB against on-the-job performance; and (c) develop
an enlistment standards cost/performance trade-off model that linked recruit quality, recruiting
resources, and job performance. The overall project strategy called for each service to develop and
demonstrate various job performance measurement approaches that could be used to link enlistment
standards to job performance (U.S. Department of Defense, 1991; Wigdor & Green, 1986, 1991).
Each service developed and demonstrated hands-on job performance measures in several mili-
tary occupations. These job performance measures were used to evaluate certain surrogate measures
of performance (less expensive, easier to administer tests or existing performance information) as
substitutes for the more expensive, labor-intensive, hands-on job performance tests (Armor & Roll,
1984; Green, Wing, & Wigdor, 1988). The performance tests consisted of tasks selected from the
domain of tasks in selected military occupations, on which examinees (job incumbents) were evalu-
ated. These measures were designed to replicate actual job performance yet provide objective evalu-
ation of the performance demonstrated.
Integration of the different service research efforts into a joint service product was accomplished
through development of a common data analysis plan. These analyses (a) described the distributions
of hands-on performance test scores, aptitude scores, job experience, and educational attainment;
(b) assessed the reliability of the hands-on performance test scores; and (c) measured the degree of
relationship (i.e., correlation) between the performance test scores and other variables of interest.
These tests were administered to 8,000 incumbent, first-term soldiers, sailors, airmen, and
marines assigned to 24 different occupations (U.S. Department of Defense, 1991). The occupations
were selected to be representative of all military occupations, with large numbers of recruits enter-
ing job skill training (McCloy, 1994). The examinees averaged 25.1 months in service, and the aver-
age AFQT score was 55.1 on a 100-point percentile scale (U.S. Department of Defense, 1991).
The average reliability coefficient for the performance tests across all 24 occupations in the JPM
Project was .72 (U.S. Department of Defense, 1991). The measures of reliability showed an accept-
able degree of consistency in the performance test scores, and the services concluded that a reliable benchmark measure had been developed against which to compare the various surrogate measures of job performance (U.S. Department of Defense, 1991). Those surrogate
measures could be used in subsequent selection and classification research.
The correlation between AFQT and hands-on performance tests, corrected for restriction in
range, yielded an average validity coefficient of .40 (U.S. Department of Defense, 1991). This level
of validity is of interest because the AFQT is a test of general aptitude, whereas the performance test
scores reflected observable performance in different types of occupations. Thus, the JPM project
established the link between measured aptitude for performing a job and the demonstration of doing
it. Considering the nature of the performance test criterion, a validity coefficient of .40 compared
well with other military validity studies (Armor & Sackett, 2004).
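For readers interested in the mechanics of such corrections, the sketch below implements the standard univariate (Thorndike Case II) adjustment for direct range restriction on the predictor, which estimates what an observed correlation among selected incumbents would have been in the unrestricted applicant population. It is offered only as an illustration of the type of adjustment involved; the JPM analyses applied their own correction procedures appropriate to those data.

    import math

    def correct_for_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
        """Thorndike Case II correction for direct range restriction on the predictor."""
        u = sd_unrestricted / sd_restricted  # SD ratio: applicant pool vs. incumbents
        r = r_restricted
        return (r * u) / math.sqrt(1 + r**2 * (u**2 - 1))

    # Example: an observed validity of .30 among incumbents, where the predictor's SD
    # in the applicant pool is 1.5 times its SD in the selected group.
    print(round(correct_for_range_restriction(0.30, 1.5, 1.0), 2))  # 0.43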
The job performance measurement research performed by the services provided performance
measures that closely replicated actual job performance. Rather than assessing, via a paper-and-
pencil test, what enlisted personnel might know about calibrating a piece of precision avionics
equipment or operating a weapon’s targeting system, the services were able to assess how well

enlisted job incumbents did such tasks. Although the two are related, knowledge about a job is
not the same thing as being able to do the job. Typically, the (corrected) validities of military apti-
tude tests for predicting training success or supervisor ratings have ranged between .30 and .60
(Hartigan & Wigdor, 1989).
Research shows a strong relation between ASVAB (including AFQT) scores and success in
military job skill training and hands-on job performance across a range of occupations (Campbell,
1990; Claudy & Steel, 1990; Dunbar & Novick, 1988; Earles & Ree, 1992; Holmgren & Dalldorf,
1993; Hunter, Crosson, & Friedman, 1985; Mayberry & Carey, 1997; Welsh, Kucinkas, & Curran,
1990; Wigdor & Green, 1991). The services value recruits with above-average aptitude because they
are more trainable and their job performance is superior to that of their lower-scoring peers. Even
with on-the-job experience, enlistees with lower aptitude continued to lag behind those with higher
aptitude. As is shown in Figure 31.1, below-average (AFQT Category IV) recruits require more than
3 years of experience to attain the level of performance at which the higher aptitude recruits (AFQT
Categories I-II) begin (Armor & Roll, 1994; Armor & Sackett, 2004; U.S. Department of Defense,
1991). Higher-aptitude personnel also experience fewer disciplinary problems.
The information shown in Figure 31.1 came from the JPM project (U.S. Department of Defense,
1991). Although collected more than a decade ago, these job performance data continue to be the
best source of information about the job performance of enlisted personnel. For one thing, research
has consistently demonstrated that cognitive ability, such as is measured by AFQT, is a strong
predictor of job performance across a variety of occupations (Campbell, 1990; Hunter & Hunter,
1984; Schmitt, Gooding, Noe, & Kirsch, 1984; Welsh, Watson, & Ree, 1990). In addition, recent
interviews with military training specialists responsible for the occupations used in the research
reported that the occupations had changed little since the original job performance data were col-
lected. Thus, it is safe to generalize from these data and to conclude that the relation between apti-
tude, experience, and job performance is still pertinent.
One of the major objectives of the JPM project was development of a mathematical model to
link recruit quality, recruiting resources, and job performance. Working with the National Research
Council, in 1991 the DoD used that model to establish the DoD recruit quality benchmarks (Sellman,
1997). In general, enlistment standards are based on judgments by service policy-makers as to the
level of job performance required. However, standards should be guided by empirical evidence

[Figure 31.1 here: line graph titled “Higher aptitude means better performance.” Mean hands-on performance test scores (y-axis) are plotted against job experience in months (x-axis: 0–12, 13–24, 25–36, 37+) for AFQT Categories I–II, IIIA, IIIB, and IV. Note: AFQT percentiles: I (93–99); II (65–92); IIIA (50–64); IIIB (31–49); IV (10–30).]

FIGURE 31.1 Hands-on job performance scores as a function of aptitude and experience.

of the relationship between recruit quality and the required level of performance. Although it is
extremely difficult to specify an absolute level of performance sufficient to guarantee successful military mission accomplishment, the research performed within the JPM project developed reliable and valid measures of individual job performance that became the
basis for the linkage model.
For years, industrial psychologists contended that job performance was the ultimate criterion for
validating selection tests. In fact, S. Rains Wallace (1965), an eminent psychologist, once called it
the holy grail of industrial psychology. Measuring job performance is a very expensive proposition.
With the support of Congress and the DoD’s effort to recover from the embarrassing misnorm-
ing episode, $40 million was made available for the JPM project. Another aspect that made this research effort unique was its longevity: widely recognized as a project of great merit, it lasted for over 15 years, spanning five presidential administrations, both Democratic and
Republican.

ENLISTED SELECTION AND CLASSIFICATION IN TODAY’S MILITARY


Currently, the U.S. military recruits nearly 200,000 young people annually into full-time, active
duty service and another 150,000 into the reserve components (U.S. Department of Defense, 2009).
Standards for enlistment are established under the authority of Title X of the U.S. Code (January
2009). Enlistment criteria are based on the needs of the services and are designed to ensure those
individuals accepted are qualified for general military duties. These individuals must be able to
cope successfully with a wide range of demands occurring in a military situation such as expo-
sure to danger, emotional stress, harsh environments, and the handling or operation of dangerous
equipment. Further, the services require all military members to be available for worldwide duty
24 hours a day without restriction or delay. Frequently, this duty is in remote areas devoid of normal
outside support.
Operating at the service-wide level are several mechanisms that probably do more to deter-
mine the character of entering recruits than do formal enlistment standards. The most important of
these is the general recruiting environment—the ever-varying willingness of high-aptitude youth
with high school diplomas to enter the military. This willingness cannot be considered part of a
service’s enlistment standards, but it sometimes directly affects the standards that a service sets.
For example, during good recruiting times, a service may stop accepting nongraduates in AFQT
Category IIIB (percentiles 31–49) even though they satisfy the entrance standards codified in Title
X of the U.S. Code.
Each service attempts to assign the highest quality recruit possible into the various military
occupations. Consequently, composite cut scores for occupational classification become a compromise between service ideals and fluctuating supply/demand pressures. Service officials set cut
scores on the basis of personnel requirements, equipment used, training curricula, retention, the
economy, and the availability of recruits with various composite aptitudes.
Because ASVAB is used to determine enlistment eligibility and job placement, it is important to
DoD and the services that the test be fair and equitable for all military applicants, no matter their
gender or race/ethnicity. Over the years, military personnel researchers have devoted considerable
effort to ensure that ASVAB is a valid predictor of job training success and performance on the
job and to minimize adverse impact for the various subgroups. Results indicate that ASVAB is
valid for minorities and women. Equations for prediction of final school grades from ASVAB were
essentially the same for Whites and minorities and men and women (Held, Fedak, Crookenden,
& Blanco, 2002; Mayberry, 1997; Wise et al., 1992). Where differences in prediction of school
grades were observed, technical training performance of minorities was overpredicted by ASVAB.
For women, ASVAB slightly overpredicted technical training performance in nontraditional career
fields. No differences were found for traditional military occupations. The Office of the Secretary
of Defense asked the Defense Advisory Committee on Military Personnel Testing to review the

Wise et al. research, which looked at applicants across all services. In responding, the chair of that
committee noted:

The conclusions from the analyses—that the ASVAB technical composites are fair and sensitive—are
clear and compelling, and the use of the same enlistment standards and qualification scores for military
occupations for all young people is justified. (Drasgow, 1992, p. 2)

ENLISTMENT PROCESS
Young men and women interested in joining the military enter the enlistment process by contacting service recruiters. In addition to providing information about service life, opportunities, and
benefits, recruiters also begin the initial screening of applicants. Most prospects take an enlistment-
screening test at a recruiting office. This enlistment-screening test is used to predict the likelihood
of “passing” the AFQT. Estimates are that 10–20% of prospects do not continue beyond this point
(U.S. Department of Defense, 2004).
There are multiple requirements that must be met before applicants are selected for service. After
recruiters have completed the preliminary screening and prospects have decided to enlist, they can
go either to a Military Entrance Processing Station (MEPS3) or a military entrance test (MET) site
to take the ASVAB. The military and civilian staffs at MEPS evaluate applicants’ medical4 qualifications, aptitude, and moral character against standards predetermined by the services. Some services also require a test of physical ability at the MEPS.
If an applicant achieves qualifying ASVAB scores and wants to continue the application pro-
cess, a physical examination and background review is conducted at the MEPS. The physical exam
assesses medical fitness for military service and includes the measurement of blood pressure, pulse,
visual acuity and hearing; blood testing and urinalysis; drug and HIV testing; and medical his-
tory. If a correctable or temporary medical problem is detected, applicants may be required to get
treatment before proceeding. Other applicants may require a service waiver for some disqualifying
medical conditions before being allowed to enlist (Sackett & Mavor, 2006).
Furthermore, applicants must meet rigorous moral character standards. Applicants undergo
detailed interviews covering any involvement with civil law enforcement (e.g., arrests, convictions)
and some undergo a financial check or computerized search for criminal records. Some types of
criminal activity are immediately disqualifying; other cases may offer the possibility of a waiver
of the rule, wherein the services examine applicants’ circumstances and make an individual deter-
mination of qualification (Putka, Noble, Becker, & Ramsberger, 2004). Moreover, applicants with
existing financial problems are not likely to overcome those difficulties on junior enlisted pay.
Consequently, credit histories may be considered as part of the enlistment decision.
If the applicant’s ASVAB score, education credentials, medical fitness, and moral character
qualify for entry, the applicant meets with a service classification counselor at the MEPS to
discuss options for enlistment (Sackett & Mavor, 2003). The counselor considers the appli-
cant’s qualifications along with service training or skill openings, schedules, and enlistment
incentives. In this classification process, high-scoring recruits are discouraged from choosing
jobs that require only low aptitude, and recruits who want to enter jobs for which they barely
meet the standard but who have high aptitudes in other areas, are encouraged to choose jobs for
which they are better qualified. Each service has incorporated its algorithms into computerized job reservation systems that counselors at MEPS use to match individuals’ desires with the needs of the services; one component of those needs is how well recruits’ ASVAB scores suit them for the various jobs.
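The matching logic just described can be illustrated with a toy reservation sketch in which each open occupation carries a composite cut score and a count of remaining training seats, and the counselor surfaces the qualifying openings among the applicant's preferences. The occupation names, composites, cut scores, and seat counts below are invented; the operational systems weigh many additional factors such as ship dates, fill priorities, and incentives.

    # Toy job reservation sketch; all occupations, cut scores, and seat counts are hypothetical.
    OPENINGS = {
        "electronics_repair": {"composite": "EL", "cut_score": 95, "seats": 2},
        "motor_mechanic":     {"composite": "MM", "cut_score": 90, "seats": 0},
        "supply_technician":  {"composite": "CL", "cut_score": 85, "seats": 5},
    }

    def qualifying_offers(applicant_composites, preferences):
        """Return preferred occupations that still have seats and whose cut score is met."""
        offers = []
        for job in preferences:
            info = OPENINGS[job]
            if info["seats"] > 0 and applicant_composites[info["composite"]] >= info["cut_score"]:
                offers.append(job)
        return offers

    applicant = {"EL": 102, "MM": 88, "CL": 97}
    print(qualifying_offers(applicant, ["motor_mechanic", "electronics_repair", "supply_technician"]))
    # ['electronics_repair', 'supply_technician']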

3 MEPS is the current name given for the 65 enlistment processing centers located across the country; MEPS replaces the
earlier term Armed Forces Examining and Entrance Stations.
4 Although the Americans with Disabilities Act (ADA) generally precludes employers from administering any type of
medical exam before making a job offer, the military services are explicitly exempt from that requirement.

Generally, those who score higher on tests will have more occupational options. Although the
process differs by service, specific skills and occupational grouping are arranged similarly to an
airline reservation system, with the training “seat” and time of travel (to recruit training) based on
the school or the field unit position openings. Using enlistment incentives (cash bonuses or extra
money that can be used to cover college costs), recruiters may encourage the applicant to choose
hard-to-fill occupational specialties. Ultimately, it is the applicant’s decision to accept or reject the offer; some discuss the options with family and friends before deciding, and others decide not to enlist (Sackett & Mavor, 2006).

RECRUIT QUALITY BENCHMARKS AND ENLISTMENT STANDARDS


How does the U.S. military decide how many high school diploma graduate and above-average apti-
tude recruits to enlist? The goal is to maximize recruit quality (aptitude and education) while mini-
mizing recruiting, training, and attrition costs. In conjunction with the National Research Council,
and based on the results of the JPM project discussed earlier, DoD developed a mathematical model
that links job performance to recruit quality and recruiting resources; this model specifies the num-
ber of high-quality recruits who will provide the desired level of job performance for the least cost
(Harris et al., 1991; McCloy, 1994; Smith & Hogan, 1994; Wise, 1994). Scores from the JPM project
define the job performance variable (Green & Mavor, 1994; Wigdor & Green, 1991). Costs reflect
training costs, compensation costs, and recruiting costs (e.g., recruiter compensation and money for
advertising, education benefits, and enlistment bonuses). Using these relations, the model allows
“what-if” analyses to examine how changes in one or more of these variables affect the other vari-
ables. For example, the model could answer how decreasing the DoD advertising budget by $20
million would affect recruit quality and job performance.
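A toy version of such a what-if analysis appears below. Every functional form and coefficient in it is invented purely to show how a linkage model can trade off recruit quality, recruiting resources, and expected performance; the actual DoD model was estimated from the JPM performance data and recruiting cost data.

    # Hypothetical linkage sketch; all coefficients are invented for illustration.
    def expected_performance(pct_high_quality):
        """Expected cohort performance rises with the share of high-quality recruits."""
        return 50 + 0.15 * pct_high_quality

    def recruiting_cost(pct_high_quality, advertising_millions):
        """High-quality recruits cost more to attract; advertising offsets some of that cost."""
        return 800 + 12 * pct_high_quality - 1.5 * advertising_millions

    def cheapest_mix(required_performance, advertising_millions):
        """Find the lowest-cost quality mix that still meets the performance target."""
        feasible = [q for q in range(0, 101)
                    if expected_performance(q) >= required_performance]
        if not feasible:
            return None
        q = min(feasible, key=lambda q: recruiting_cost(q, advertising_millions))
        return q, recruiting_cost(q, advertising_millions)

    # "What if" the advertising budget were cut by $20 million? In this toy model the
    # same quality mix remains optimal, but it costs more to recruit.
    print(cheapest_mix(59, advertising_millions=100))  # (60, 1370.0)
    print(cheapest_mix(59, advertising_millions=80))   # (60, 1400.0)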
What should be the desired level of performance? Recruit quality benchmarks are used to help
ensure that recruit performance is sufficient to complete military missions. The model cannot esti-
mate how much quality is enough; rather, recruiting policy analysts within DoD set the desired level of performance. Nevertheless, the model can help specify a cohort of recruits that
will provide the desired level of performance for the lowest cost.
The performance level identified by the policy analyst is a minimally acceptable value. DoD
has chosen the level of performance provided by the 1990–1991 enlisted cohort (the cohort in ser-
vice during Operations Desert Shield and Desert Storm). Specifying this level of desired perfor-
mance resulted in recruit quality benchmarks that call for 60% of recruits to score above the 50th
percentile on the AFQT (i.e., to be in Categories I-IIIA) and 90% to have high school diplomas
(Sellman, 1994). These benchmarks are not enlistment standards that the services use to establish
entrance eligibility. Rather, they are recruiting goals that the services strive to meet to maximize
performance and minimize recruiting costs. The standards codified in Title X of the U.S. Code are
considerably lower (i.e., AFQT scores at the 10th and 31st percentiles for high school diploma grad-
uates and nongraduates, respectively) than the standards actually used by the services for enlistment
purposes (Sellman, 2004).
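Checking a recruit cohort against these two benchmarks is straightforward; the sketch below assumes each recruit record carries an AFQT percentile and a diploma flag (the field names are ours).

    def meets_quality_benchmarks(cohort):
        """DoD benchmarks: 60% in AFQT Categories I-IIIA (percentile >= 50)
        and 90% holding regular high school diplomas."""
        n = len(cohort)
        pct_high_aptitude = sum(r["afqt_percentile"] >= 50 for r in cohort) / n
        pct_diploma = sum(r["hs_diploma"] for r in cohort) / n
        return pct_high_aptitude >= 0.60 and pct_diploma >= 0.90

    cohort = [{"afqt_percentile": 72, "hs_diploma": True},
              {"afqt_percentile": 45, "hs_diploma": True},
              {"afqt_percentile": 58, "hs_diploma": False}]
    print(meets_quality_benchmarks(cohort))  # False: only two of three hold diplomas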

SELECTION FOR OFFICER COMMISSIONING PROGRAMS


Up to this point, we have focused largely on the accession of enlisted members. However, officers
are recruited quite differently from enlisted personnel. As mentioned earlier, there are five principal
ways to join the U.S. military as an officer: 4-year service academies (Army, Navy, Air Force, Coast
Guard, and Merchant Marine), ROTC, OCS/OTS, and direct commissioning (Thirtle, 2001).
Various aptitude and academic criteria are used to screen officer candidates. Generally, the service
academies and ROTC scholarship programs evaluate the candidates using a “whole person” approach.
Factors such as Scholastic Assessment Test (SAT) or American College Test (ACT) scores, leader-
ship experience, athletic participation, teacher recommendations, high school grade point average
(GPA) and class rank, and extracurricular activities may all be weighed together to derive an applicant’s
numerical score. The service academies draw on highly selective national pools of high school gradu-
ates, and their classes look very much like students entering highly competitive civilian universities,
with comparable GPA and SAT/ACT scores. Most Army and Air Force cadets and Navy midshipmen
are nominated by their local U.S. senator or congressional representative (Segal & Segal, 2004).
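The “whole person” scoring idea can be sketched as a weighted composite of the factors listed above. The weights and the 0–1 scaling are hypothetical; each academy and scholarship program applies its own weighting scheme.

    # Hypothetical "whole person" composite; weights and 0-1 normalization are illustrative only.
    WEIGHTS = {"test_scores": 0.40, "hs_gpa_rank": 0.30,
               "leadership": 0.20, "athletics": 0.10}

    def whole_person_score(candidate):
        """Weighted sum of normalized factors (each expected on a 0-1 scale)."""
        return sum(WEIGHTS[factor] * candidate[factor] for factor in WEIGHTS)

    candidate = {"test_scores": 0.85, "hs_gpa_rank": 0.90,
                 "leadership": 0.70, "athletics": 0.60}
    print(round(whole_person_score(candidate), 2))  # 0.81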
Selection for nonscholarship ROTC varies by service; candidates may be selected on the basis of service-specific test scores in conjunction with other academic, physical fitness, and experience-based criteria. Factors considered for OCS/OTS include the Physical Aptitude
Examination, college GPA, letters of recommendation, college major, an interview by a selec-
tion board, and scores on service-specific officer selection measures (e.g., Army Officer Aptitude
Rating, Air Force Officer Qualifying Test) (Eitelberg, Laurence, & Brown, 1992). Individuals enter-
ing a direct commissioning program have completed graduate programs and are not subject to
selection testing. Instead, they are evaluated by an entrance board to ensure their adherence to the
DoD quality standards for members of their professions (attorneys, physicians, and other healthcare
professionals).
Most individuals interested in attending one of the service academies must first receive a congres-
sional appointment.5 It is not necessary for individuals to personally know the member of Congress,
because most Congressional offices use a strictly competitive process based on college admission test
scores, academic grades, and leadership performance in high school and other organizations to make
their appointments. Additionally, applicants may not be married or have any dependents to qualify to
attend one of the service academies. Individuals who are married or have a dependent and wish to become an officer have the option of completing ROTC or OCS/OTS to obtain a commission.
Once accepted to a service academy, individuals become cadets/midshipmen and receive a
Bachelor of Science degree at the completion of their 4-year program. While attending the academy,
the cadets/midshipmen are provided free room and board, tuition, medical and dental care, and a
monthly allowance. Further, a cadet/midshipman’s academy life typically revolves around educa-
tion, military training, and physical fitness.
In conjunction with their academic studies, the cadets/midshipmen participate in military train-
ing in leader and follower roles. As they advance through the academy, they are exposed to a mul-
titude of leadership activities designed to hone their leadership skills in preparation for active duty.
Additionally, there is a significant emphasis on physical education, whereby they continually train
in intercollegiate sports, intramural sports, clubs, or physical fitness classes. Upon completion of
their degree, cadets/midshipmen receive a commission in the U.S. military with an active duty
service commitment of 5 years, plus 3 years in the active or inactive reserve. Selection for further
training after commissioning may incur an additional active duty or reserve commitment.
The largest source of commissioned officers is ROTC. Historically, ROTC was designed for
officers who would enter into reserve status. However, following World War II, officers receiving
their commissions via ROTC had the option to enter active duty upon graduation, and in the decades
that followed, larger percentages did so. Today, only the Army continues to offer the opportunity to
commission directly into the reserve components from ROTC; the majority of Army cadets and all
other service cadets/midshipmen commission directly onto active duty. To participate in ROTC,
individuals must register for and complete military/naval science classes in addition to normal aca-
demic coursework through the university. ROTC classes range from 2 to 3 hours each week in addi-
tion to a weekly leadership laboratory.
The ROTC classroom activities include various instructional methods designed to educate
cadets/midshipmen on the military culture. Leadership laboratories provide the environment for
cadets/midshipmen to hone their leadership skills as the upperclassmen assume military leadership
positions within the ROTC unit and train the underclassmen. In addition, cadets/midshipmen must

5 Some applicants (e.g., children of military personnel who have been awarded the Medal of Honor, military enlisted
personnel on active duty) compete for appointments separately from the Congressional category.

attend various summer training programs throughout their college years: summer cruises for mid-
shipmen and various levels of field training for Army and Air Force cadets. Successful completion
of summer training programs is mandatory for all ROTC students.
While attending ROTC, only scholarship cadets/midshipmen receive some portion of room and board and/or tuition. All contracted cadets/midshipmen (which can include non-
scholarship students who are at least sophomores) are eligible for medical and dental benefits
and a monthly allowance. Upon graduation, the cadets/midshipmen enter active duty with anywhere from a 2- to 4-year active duty service obligation, depending on whether, and what type of, scholarship they received. As with academy cadets/midshipmen, ROTC graduates receiving
further training after commissioning (e.g., pilots or nuclear officers) may incur additional active
duty obligations.
The most flexible commissioning program is OCS/OTS. This program is designed to accept
applicants who currently possess at least a bachelor’s degree and is designed to select those with
particular skills or aptitudes to fill service-manning requirements not met through the academies or
ROTC. Training occurs during one or two short (usually 2 to 4 months) training cycles. This training is unique to each service’s culture and requirements but is generally similar to basic training for enlisted personnel, with the added aspect of leadership training. Active duty ser-
vice commitments upon commissioning range from 2 to 4 years, depending on the service and other
factors. Because of the shorter time span required for this commissioning option, OCS/OTS serve
as the short-term officer accession valve, in which the services can increase or decrease enrollment
on the basis of officer requirements and the projected number of academy and ROTC graduates to
ensure that the congressionally authorized supply of commissioned officers is maintained.
The smallest and most specialized commissioning method is through direct appointment. This
program is designed for those individuals who currently possess an advanced degree and wish to enter the military in the fields of medicine, dentistry, law, or the chaplaincy. Upon selection,
individuals are immediately commissioned and subsequently attend a short training course to pre-
pare them for the military. These officers receive wages comparable to civilian professionals by
commissioning into the military at a higher rank commensurate with their expertise and experi-
ence. Once on active duty, there are additional bonus pays of varying levels offered to healthcare
professionals on the basis of their particular specialty. Further, certain health professionals may apply to the Uniformed Services University of the Health Sciences, which offers a salary and free
tuition in a program leading to a doctor-of-medicine (MD) degree. In return, graduates owe a 7-year
commitment to the military or the U.S. Public Health Service.

OFFICER RETENTION AND ATTRITION


Upon entering active duty or the reserve component, officers’ attitudes toward military service
are continually monitored by the DoD to ensure that personnel will be available to fill senior-level
positions in the future. In recent years, the U.S. Navy, Air Force, and Marine Corps have generally
succeeded in retaining the desired numbers of officers but have experienced some challenges in
specific career occupations such as medical officers (GAO, 2007). On the other hand, the Army has
not retained the desired numbers of officers, projecting a shortage of 3,000 or more officers annu-
ally through Fiscal Year (FY) 2013 (GAO, 2007). The assumption is that continued deployments for
operations in Iraq and Afghanistan are the cause of the shortfall.
A recent GAO study examined officers who were in their 3rd, 4th, 5th, and 10th years of
service to assess retention attitudes among the services. These year groups were chosen because
they are typical points at which retention decisions are made (GAO, 2007). This study concluded
that retention rates for graduates from the U.S. Military Academy (i.e., West Point) and Army
ROTC were 62%, which is 20–30 points below normal. To combat these lower retention rates,
the Army instituted three principal measures. In 2007, the Army reduced the amount of time to
promote First Lieutenants to Captains from 42 months to 38 months. The second measure was

to implement a higher promotion rate. For 2007, the promotion rate to Captain was 98% and the
promotion rate to Major was 97%, which are both substantially higher than the Defense Officer
Personnel Management Act’s goals of 90% and 80%, respectively (U.S. Government Accountability
Office, 2007). Lastly, the Army began offering eligible Captains a menu of retention incentives in exchange for additional service, including graduate school or military school opportunities, branch or post of choice, or a Critical Skills Retention Bonus (CSRB).

OFFICER EXECUTIVE DEVELOPMENT


As previously mentioned, the American military personnel system is a closed system; therefore,
the services must grow tomorrow’s leaders today. As a consequence, the military sets as a priority
the opportunity for lower ranking officers to receive developmental education in preparation for
future senior leadership positions.
Although the services promote their people at slightly different times in their careers, each mili-
tary rank has relatively similar command or responsibility levels (Moore & Trout, 1978). At the
company-grade level, officers are put in charge of smaller projects/groups of personnel. During this
phase, promotions are granted at regular intervals with very little competition.
At the field-grade level, officers are responsible for larger projects, more equipment (e.g., tanks,
ships, aircraft) and more personnel. It is during the field-grade years that individuals typically
receive command authority and are put to their ultimate leadership tests: where the mission and
people collide. Promotion at the field-grade level becomes more competitive and fewer individu-
als are advanced to the next higher rank. Most officers, if serving 20 years in the military, will
achieve the rank of Lieutenant Colonel/Commander. However, only a select few will be promoted
to Colonel/Captain and even fewer will move on to the General/Flag officer category and senior
leadership positions.
The general or flag officers typically are responsible for large groups of people with their accom-
panying equipment and policy creation. These individuals set their sights on the future and establish
policy to ensure that the services are capable of executing the missions of today and tomorrow.
Promotion at the general/flag officer level is very competitive and comes with assigned senior com-
mand or staff positions. Once assigned to a position, individuals will be promoted to the appropriate
rank if they have not already been elevated to that rank.

COMMAND SELECTION AND CAREER BROADENING EXPERIENCES


The first point where the services start to identify tomorrow’s senior leaders is command selec-
tion and career broadening. As officers move from the company-grade officer level to the field-
grade officer level, they are considered for command positions. In collaboration between officers,
their supervisors, and the service personnel managers, officers’ records are submitted to develop-
ment teams; if approved at that level, they become eligible for command. If selected, individuals
then assume command of a unit. This first command opportunity is often perceived by the respec-
tive services as a “command test” of whether the officer is capable of leading at higher levels of command and is worthy of the next promotion and its accompanying increased responsibilities.
Another way officers can stand out among their peers is through career broadening.
This program is not solely designed for field-grade officers because many company-grade officers
compete and are selected for career broadening. Once selected, these officers leave their career field
and fill other positions in nonrelated occupations that are oftentimes considered intern positions.
By taking advantage of these opportunities early in their career, officers are exposed to a broader
picture of how the services operate at higher levels. Officers typically reach these career-broadening positions through a highly competitive process and are groomed to be future senior leaders.

DEFENSE TRANSFORMATION IN MILITARY SELECTION


The ASVAB has undergone several revisions since 1976 when it became the official Joint-Service
selection and classification battery for all services. However, the last comprehensive review of
ASVAB content was completed during the early 1990s. Since that review, the ASVAB has under-
gone two major methodological changes. The first change was transforming the paper-and-pencil
form of the ASVAB to a computerized adaptive testing (CAT) version, which was implemented
at all 65 MEPS in 1997 (Sands et al., 1997). The second major change was the implementation of
item response theory scoring techniques—a more advanced psychometric procedure to develop and
tailor test items to an individual examinee’s ability level (e.g., McBride, Wetzel, & Hetter, 1997).
These methodological changes to military enlistment testing presented the opportunity to add new ASVAB content and/or new item formats that could increase the battery’s predictive validity for military occupations.
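To give a flavor of the item response theory machinery behind CAT-ASVAB, the sketch below implements the three-parameter logistic (3PL) model and the maximum-information rule that textbooks commonly use to pick the next item in an adaptive test. It is a classroom illustration, not the operational CAT-ASVAB algorithm, and the item parameters are invented.

    import math

    def p_correct_3pl(theta, a, b, c):
        """3PL model: probability that a person of ability theta answers the item correctly."""
        return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

    def item_information(theta, a, b, c):
        """Fisher information of a 3PL item at ability theta."""
        p = p_correct_3pl(theta, a, b, c)
        return (a ** 2) * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

    # Invented item bank: (discrimination a, difficulty b, guessing c).
    bank = [(1.2, -0.5, 0.20), (0.9, 0.0, 0.25), (1.6, 0.8, 0.15)]

    theta_estimate = 0.6  # current provisional ability estimate
    next_item = max(bank, key=lambda item: item_information(theta_estimate, *item))
    print(next_item)  # the most informative item near theta = 0.6, here (1.6, 0.8, 0.15)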
In addition to changes over the past 25 years in the testing environment, there have been changes
in the nature of military service (e.g., more diverse missions, more complex organizations and sys-
tems, and enhanced technology) that affect the nature of military work and the prerequisite charac-
teristics of military personnel (Levy et al., 2001). Consequently, in 2005, the DoD, in conjunction
with the military services, initiated a review of ASVAB by a panel of experts in the areas of person-
nel selection, job classification, psychometrics, and cognitive psychology to determine if revisions
in content and testing methodology were warranted (Drasgow, Embretson, Kyllonen, & Schmitt,
2006).
The panel made several recommendations for ways to streamline the military personnel selection
and job classification process. Among others, these included such revisions to the ASVAB system
as linking ASVAB test content directly to military job analytic information and training curricula
and updating ASVAB content by including nonverbal reasoning tests, information technology/com-
munication literacy tests, and noncognitive measures to enhance selection and classification effi-
ciency. DoD also is considering a proctored Internet application of CAT-ASVAB. Since completing
the technical review of the ASVAB in 2006, DoD and the services have pursued implementation of
the panel’s recommendations (Sellman, Shaw, Waters, & Geimer, 2007) after prioritizing them on the
basis of (a) anticipated impact on improving the enlistment process, (b) sufficient research to support
the recommendation, (c) cost of additional research, (d) time to implement the recommendation, and
(e) cost to implement the recommendation (Brown, Stawarski, Sellman, & Warthen, 2008).
The U.S. military is undertaking fundamental changes to transform itself in response to changes
in the world environment. For example, the Army is developing and fielding new combat systems,
organizations, and doctrine intended to address global terrorism and more traditional warfare.
Fortunately, Army leadership recognizes the importance of its people—soldiers—to the effective-
ness of transformation and has begun a series of interrelated research efforts to address the human side of transformation. The Army's approach continues long-standing research programs covering
(a) the selection of individuals into the Army, (b) classification into the correct Army jobs, (c) subse-
quent promotion to positions of increasing rank and leadership, and (d) the assessment of skills and
performance at selected career points.
Recent projects under this approach have focused these programs on occupations, jobs, and
organizational structures that do not yet exist. For example, the Army has developed a process for
conducting future-oriented job analysis, tied to Army transformation plans. As part of this process,
the Army is determining future knowledge areas, skills, and aptitudes (KSAs) for various career
points and different (sometimes emerging) Army jobs. The Army is also examining criteria for
promotion to leadership positions and the development of a model assessment program for soldiers
in Army-wide and job-specific technical KSAs.
Several innovative products and procedures have characterized these projects. For example, the
Army has developed several applications of situational judgment tests (SJTs) for use as predictors
and criteria and in competency and promotion assessments. There has also been extensive predictor
development including computer-based, faking-resistant personality instruments. An overarching
theme has been the development and use of computer-based measures and web-based data collec-
tion. This has allowed not only exploration of new testing techniques and capabilities but also the
transformation of the approach to field data collection and analysis. These projects have involved
criterion-related validation (concurrent and longitudinal) in addition to pilot administrations of
assessment measures under operational conditions.
The mission of the three service academies is to educate, train, and inspire men and women
to become military officers of character. Selection for service academy admission has tradition-
ally relied primarily on high school academic performance, standardized test scores, activities and
honors, and athletics/fitness. The instruments used to assess academic prowess and physical fitness
have proven to be good predictors of cadets'/midshipmen's subsequent academic and physical performance. However, in keep-
ing with the mission of the academies there is a new selection component emerging—character and
leadership. Although such a component is clearly in line with the mission of the academies, several
questions remain. For example, what instrument could be used to predict character and leadership
development, and how are those traits demographically distributed among American youth?
In 2005, the Air Force initiated work to review the admission practices at all academies, as
well as their character and leadership development programs. This effort also included a review of
recruiting and interviewing procedures at civilian institutions in an attempt to identify an existing
instrument to assess character and leadership. The results of the U.S. Department of the Air Force
(2005) study highlighted the similarities and differences among the admissions programs of the
three service academies but concluded that there was no viable instrument for assessing character
and leadership. Consequently, the Air Force undertook research to develop and validate an instru-
ment that will measure that construct. If such a device can be developed, it could result in (a) an
increase in the number of cadets/midshipmen and subsequent academy graduates innately possess-
ing a high level of character and leadership, (b) an improved recruiting and admissions process, and
(c) a higher level of officership among cadets/midshipmen at each academy as well as among newly
commissioned officers.

CONCLUSIONS
Since the advent of the Army Alpha and Beta in 1917, the U.S. military has been on the cutting
edge regarding personnel selection, and, later, job classification. During World War II, many
eminent psychologists participated in the Army Air Corps psychology program, focusing on
aircrew selection and supporting measurement techniques (Flanagan, 1948). In the 1980s, the
DoD sponsored research to refine item response theory and to develop groundbreaking techniques
to calibrate paper-and-pencil tests to computer adaptive versions. This led to implementation of
the Computer Adaptive ASVAB—the first and certainly the largest adaptive “employment” test
program in use today. The job performance measurement project, conducted over a 15-year period
in the 1980s and early 1990s, demonstrated that ASVAB was a valid predictor of hands-on job
performance and provided the foundation for the DoD model linking recruit quality and recruiting
resources to job performance. This model is used to set recruit quality benchmarks and to establish
and defend DoD’s recruiting budget. That the model has been widely accepted by Congressional
staffers and analysts at the Office of Management and Budget is testimony to the quality of the
science underpinning the effort.
Implementation of the new selection and classification procedures should affect the military in four
significant ways: (a) increase personnel productivity, (b) increase job satisfaction and commitment,
(c) reduce first-term attrition, and (d) reduce adverse impact against women and minorities. To empha-
size the significance of the potential benefits of enhancing military selection and classification, con-
sider that approximately 30% of those who entered the military during FY 2008 will fail to complete
their first 3-year term of service. The GAO estimated in 1997 that it cost DoD about $40,000 to recruit,
train, and equip each replacement for an individual who prematurely leaves the service (U.S. General
Accounting Office, 1997). That figure is probably approaching $45,000 today. Thus, any increase in
job performance or decrease in attrition will improve military readiness and save valuable resources.
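As a rough illustration of the arithmetic behind these figures, the sketch below (Python) computes the cost of replacing first-term losses and the savings from a modest reduction in attrition; the cohort size is hypothetical, whereas the 30% attrition rate and the approximately $45,000 replacement cost are the figures cited above.

```python
def attrition_cost(cohort_size, attrition_rate, replacement_cost):
    """Total cost of replacing recruits who fail to complete their first term."""
    return cohort_size * attrition_rate * replacement_cost

def savings_from_reduction(cohort_size, base_rate, improved_rate, replacement_cost):
    """Dollars saved if improved selection and classification lower the attrition rate."""
    return (attrition_cost(cohort_size, base_rate, replacement_cost)
            - attrition_cost(cohort_size, improved_rate, replacement_cost))

# Hypothetical cohort of 150,000 accessions: even a 3-percentage-point drop in
# attrition (30% to 27%) would save roughly $200 million at $45,000 per replacement.
print(savings_from_reduction(150_000, 0.30, 0.27, 45_000))  # 202500000.0
```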
In summary, as the United States military transforms itself in the 21st century, military selection
and classification methods will undoubtedly change as well to meet the needs of the all-volunteer
force. Today, as it was several decades ago, the ASVAB continues to serve as the principal screening
tool for selection and job classification for enlisted personnel. With the end of conscription and the
inception of the all-volunteer force, each new recruit represents an investment; with millions of dollars and national security at stake, those whom we select today will represent us in the future as leaders of the military services.

REFERENCES
Armor, D. J., & Roll, C. R. (1994). Military manpower quality: Past, present, and future. In B. F. Green &
A. S. Mavor (Eds.), Modeling cost and performance for military enlistment (pp. 13–34). Washington,
DC: National Academy Press.
Armor, D. J., & Sackett, P. R. (2004). Manpower quality in the all-volunteer force. In B. A. Bicksler, C. L.
Gilroy, & J. T. Warner (Eds.), The all-volunteer force: Thirty years of service (pp. 90–108). Washington,
DC: Brassey’s.
Bicksler, B. A., Gilroy, C. L., & Warner, J. T. (2004). The all-volunteer forces: Thirty years of service.
Washington, DC: Brassey’s.
Bicksler, B. A., & Nolan, L. G. (2006). Recruiting the all-volunteer force: The need for sustained investment in
recruiting resources. Policy Perspectives, 1, 1–27.
Boldt, R. F. (1980). Check scaling of the AFQT 7C portion of the Armed Services Vocational Aptitude Battery
Form 7, and General Classification Test Form 1C to the Armed Forces Qualification test scale. Princeton,
NJ: Educational Testing Service.
Brown, D. G., Stawarski, C. A., Sellman, W. S., & Warthen, M. S. (2008). Using a Delphi procedure to establish
ASVAB research priorities. Alexandria, VA: Human Resources Research Organization.
Campbell, J. P. (1990). Modeling the performance prediction problem in industrial and organizational psychology.
In M. D. Dunnette & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (2nd ed., Vol. 1, pp. 687–732). Palo Alto, CA: Consulting Psychologists Press.
Campbell, J. P., & Knapp, D. J. (2001). Exploring the limits in personnel selection and classification. Mahwah,
NJ: Lawrence Erlbaum.
Claudy, J. G., & Steel, L. (1990). Armed Services Vocational Aptitude Battery (ASVAB): Validation for civilian
occupations using National Longitudinal Survey of Youth data (AFHRL-TR-90-29). Brooks AFB, TX:
Manpower and Personnel Division, Air Force Human Resources Laboratory.
Defense Manpower Commission. (1976). Defense manpower: The keystone of national security. Report to the
President and the Congress. Washington, DC: Author.
Diaz, T. E., Ingerick, M. J., & Lightfoot, M. A. (2004). New Army aptitude areas: Evaluation of new compos-
ites and job families for Army classification (FR-04-29). Alexandria, VA: Human Resources Research
Organization.
Drasgow, F. (1992, September). Review of sensitivity and fairness of the Armed Services Vocational Aptitude
Battery technical composites. Letter from the Chairman, Defense Advisory Committee on Military
Personnel Testing to the Director for Accession Policy, Office of the Assistant Secretary of Defense
(Force Management and Personnel). Champaign-Urbana: University of Illinois.
Drasgow, F., Embretson, S. E., Kyllonen, P. C., & Schmitt, N. (2006). Technical review of the Armed Services
Vocational Aptitude Battery (ASVAB) (FR 06-25). Alexandria, VA: Human Resources Research Organization.
Dunbar, S. B., & Novick, M. R. (1988). On predicting success in training for men and women: Examples from
Marine Corps clerical specialties. Journal of Applied Psychology, 73, 545–550.
Earles, J. A., & Ree, M. J. (1992). The predictive validity of the ASVAB for training grades. Educational and
Psychological Measurement, 52, 721–725.
Eitelberg, M. J. (1988). Manpower for military occupations. Washington, DC: Office of the Assistant Secretary
of Defense (Force Management and Personnel). Human Resources Research Organization.
Eitelberg, M. J., Laurence, J. H., & Brown, D. C. (1992). Becoming brass: Issues in the testing, recruiting, and
selection of American Military Officers. In B. R. Gifford & L. C. Wing (Eds.), Test policy in defense:
Lessons from the military for education, training and employment (pp. 79–219). Boston, MA: National
Commission on Testing and Public Policy.
Eitelberg, M. J., Laurence, J. H., & Waters, B. K. (with Perelman, L. S.). (1984). Screening for service: Aptitude
and education criteria for military entry. Washington, DC: Office of the Assistant Secretary of Defense
(Manpower, Installations, and Logistics). Human Resources Research Organization.
Elster, R. E., & Flyer, E. S. (1981). A study of the relationship between educational credentials and military
performance criteria. Monterey, CA: Naval Postgraduate School.
Fairbank, B. A., Welsh, J. R., & Sawin, L. L. (1990). Armed Services Vocational Aptitude Battery (ASVAB):
Validity of ASVAB form 14 for the prediction of high school course grades (AFHRL-TR-90-48). Brooks
AFB, TX: Manpower and Personnel Division, Air Force Human Resources Laboratory.
Flanagan, J. C. (Ed.). (1948). The aviation psychology program in the Army Air Forces (Report No. 1).
Washington, DC: U.S. Printing Office.
Flyer, E. S. (1959). Factors relating to discharge for unsuitability among 1956 airmen accessions to the Air
Force (WADC-TN-59-201). Lackland AFB, TX: Air Force Personnel Research Laboratory.
Fredland, J. E., Gilroy, C. L., Little, R. D., & Sellman, W. S. (1996). Professionals on the front line: Two
decades of the all-volunteer force. Washington, DC: Brassey’s.
Gates, T. S. (1970). Report of the President’s Commission on an All-Volunteer Armed Force. Washington, DC:
U.S. Government Printing Office.
Goldman, N. (1973). The changing role of women in the armed forces. The American Journal of Sociology,
78, 892–911.
Green, B. F., & Mavor, A. S. (Eds.). (1994). Modeling cost and performance for military enlistment. Washing-
ton, DC: National Academy Press.
Green, B. F., & Wigdor, A. K. (1988). Measuring job competency. In B. F. Green, H. Wing, & A. K. Wigdor
(Eds.), Linking military enlistment standards to job performance (pp. 23–44). Washington, DC: National
Academy Press.
Green, B. F., Wing, H., & Wigdor, A. K. (Eds.). (1988). Linking military enlistment standards to job performance.
Washington, DC: National Academy Press.
Greenberg, I. M. (1980). Mental standards for enlistment performance of Army personnel related to AFQT/
ASVAB scores (MGA-0180). Monterey, CA: McFann-Gray.
Harris, D. A., McCloy, R. A., Dempsey, J. R., Roth, C., Sackett, P. R., Hedges, et al. (1991). Determining the
relationship between recruit characteristics and job performance: A methodology and a model (FR-PRD-
90-17). Alexandria, VA: Human Resources Research Organization.
Hartigan, J., & Wigdor, A. K. (Eds.). (1989). Fairness in employment testing: Validity generalization, minority
issues, and the General Aptitude Test Battery. Washington, DC: National Academy Press.
Held, J. D., Fedak, G. E., Crookenden, M. P., & Blanco, T. A. (2002). Test evaluation for augmenting the Armed
Services Vocational Aptitude Battery. Paper presented at the 44th Annual Conference of the International
Military Testing Association, Ottawa, Canada.
Hogan, P. F., Curtis, J. S., & Warner, J. T. (2004). Sustaining the force in an era of transformation. In B. A. Bicksler, C.
L. Gilroy, & J. T. Warner, (Eds.), The all-volunteer force: Thirty years of service. Washington DC: Brassey’s.
Holmgren, R. L., & Dalldorf, M. R. (1993). A validation of the ASVAB against supervisors’ ratings in the
General Aptitude Test Battery (GATB). Washington, DC: U.S. Employment Service.
Hunter, J. E., Crosson, J. S., & Friedman, D. H. (1985). The validity of the Armed Services Vocational Aptitude
Battery (ASVAB) for civilian and military job performance. Washington, DC: Office of the Assistant
Secretary of Defense (Force Management and Personnel).
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance. Psychological
Bulletin, 98, 72–98.
Jaeger, R. M., Linn, R. L., & Novick, M. R. (1980). A review and analysis of score calibration for the Armed
Services Vocational Aptitude Battery. Washington, DC: Committee Commissioned by the Office of the
Secretary of Defense.
Karpinos, B. D. (1966). The mental qualification of American youth for military service and its relationship to
educational attainment. Proceedings of the American Statistical Association. Alexandria, VA: American
Statistical Association.
Laurence, J. H. (1984). Education standards for military enlistment and the search for successful recruits
(FR-PRD-84-4). Alexandria, VA: Human Resources Research Organization.
Laurence, J. H. (1993). Education standards and military selection: From the beginning. In T. Trent & J. H.
Laurence (Eds.), Adaptability screening for the Armed Forces (pp. 1–40). Washington, DC: Office of the
Assistant Secretary of Defense (Force Management and Personnel).
Laurence, J. H. (1997). Does the education credential still predict attrition? In J.M. Arabian (Chair), Everything
old is new again: Current research issues in accession policy. Symposium conducted at the 105th Annual
Convention of the American Psychological Association, Chicago, IL.
Laurence, J. H., & Waters, B. K. (1993). Biodata: What’s it all about? In T. Trent & J. H. Laurence (Eds.),
Adaptability screening for the Armed Forces (pp. 41–70). Washington, DC: Office of the Assistant
Secretary of Defense (Force Management and Personnel).
Lee, D., & McKenzie, R. (1992). Reexamination of the relative efficiency of the draft and the all-volunteer
army. Southern Economic Journal, 59, 640–654.
Lee, G. C., & Parker, G. Y. (1977). Ending the draft: The story of the all-volunteer force (FR-77-1). Alexandria,
VA: Human Resources Research Organization.
Levy, D. G., Thie, H. J., Robbert, A. A., Naftel, S., Cannon, C., Ehrenberg, R. G., & Gershwin, M. (2001).
Characterizing the future defense workforce (MR-304-OSD) Santa Monica, CA: The RAND
Corporation.
Lightfoot, M. A., Diaz, T. E., Heggestad, E. D., Darby, M. M., & Alley, W. E. (1999). New Air Force classi-
fication composites for USAF/DPX (FR-WATSD-99-17). Alexandria, VA: Human Resources Research
Organization.
MacGregor, M. (1981). Integration of the Armed Forces: 1940–1965. Washington, DC: Center of Military
History.
Maier, M. H. (1993). Military aptitude testing: The past 50 years (DMDC-TR-93-007). Monterey, CA: Defense
Manpower Data Center.
Maier, M. H., & Grafton, F. C. (1980). Renorming ASVAB 6 and 7 at Armed Forces examining and entrance
stations. Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Manning, L., & Griffith, J. E. (1998). Women in the military: Where they stand (2nd ed.). Washington, DC:
Women’s Research and Education Institute.
Mayberry, P. W. (1997). Competing criteria in the formation of aptitude composites (CAB-97-03). Alexandria,
VA: Center for Naval Analyses.
Mayberry, P. W., & Carey, N. B. (1997). The effect of aptitude and experience on mechanical job performance.
Educational and Psychological Measurement, 57, 131–149.
McBride, J. R., Wetzel, C. D., & Hetter, R. D. (1997). Preliminary psychometric research for CAT-ASVAB:
Selecting an adaptive testing strategy. In W.A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized
adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association.
McCloy, R. A. (1994). Predicting job performance scores for jobs without performance data. In B. F. Green &
A. S. Mavor (Eds.), Modeling cost and performance for military enlistment (pp. 61–99). Washington,
DC: National Academy Press.
Means, B. M., Nigan, A., & Heisey, J. G. (1985, October). When low-aptitude recruits succeed. Exceptional
recruits: A look at high and low-aptitude personnel. Symposium conducted at the 27th Annual
Conference of the Military Testing Association, San Diego, CA.
Moore, D. W., & Trout, B. (1978). Military advancement: The visibility theory of promotion. The American
Political Science Review, 72, 452–468.
Oi, W. Y. (1967). The economic cost of the draft. American Economic Review, 57, 39–62.
Oppler, S. H., McCloy, R. A., & Campbell, J. P. (2001). The prediction of supervisory and leadership
performance. In J. P. Campbell & D. J. Knapp (Eds.), Exploring the limits in personnel selection
and classification (pp. 389–410). Mahwah, NJ: Lawrence Erlbaum.
Putka, D. J., Noble, C. L., Becker, D. E., & Ramsberger, P. F. (2004). Evaluating moral character waiver
policy against servicemember attrition and in-service deviance through the first 18 months of service
(FR-03-96). Alexandria, VA: Human Resources Research Organization.
Ramsberger, P. F. (1993). Influences on the military enlistment decision-making process: Findings from the
1991 youth attitude tracking study (FR-PRD-93-06). Alexandria, VA: Human Resources Research
Organization.
Rostker, B. (2006). I want you: The evolution of the all-volunteer force. Santa Monica, CA: The RAND
Corporation.
Rumsey, M. G., Walker, C. B., & Harris, J. H. (Eds.). (1994). Personnel selection and classification. Hillsdale,
NJ: Lawrence Erlbaum.
Sackett, P. R., & Mavor, A. S. (Eds.). (2003). Attitudes, aptitudes, and aspirations of American youth:
Implications for military recruitment (pp. 70–96). Washington, DC: National Academy Press.
Sackett, P. R., & Mavor, A. S. (Eds.). (2004). Evaluating military advertising and recruiting: Theory and
methodology (pp. 40–67). Washington, DC: National Academy Press.
Sackett, P. R., & Mavor, A. S. (Eds.). (2006). Assessing fitness for military enlistment: Physical, medical, and
mental health standards (pp. 21–40). Washington, DC: National Academy Press.
Sands, W. A., Waters, B. K., & McBride, J. R. (Eds.). (1997). Computerized adaptive testing: From inquiry to
operation. Washington, DC: American Psychological Association.
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta analysis of validity studies published
between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37, 407–422.
Segall, D. O. (2004). Development and evaluation of the 1997 ASVAB score scale. Monterey, CA: Defense
Manpower Data Center.
Sellman, W. S. (1986). Military adaptability screening: A manpower management perspective. In A. R. Lancaster
(Chair), Recent developments in military suitability research. Symposium conducted at the 28th Annual
Conference of the Military Testing Association, Munich, Germany.
Sellman, W. S. (1994). Job performance measurement: The nexus between science and policy. In B. F. Green &
A. S. Mavor (Eds.), Modeling cost and performance for military enlistment (pp. 1–6). Washington, DC:
National Academy Press.
Sellman, W. S. (1997). Public policy implications for military entrance standards. Keynote address presented
at the 39th Annual Conference of the International Military Testing Association, Sydney, Australia.
Sellman, W. S. (1999). Military recruiting: The ethics of science in a practical world. Invited address to the
Division of Military Psychology, 107th Annual Convention of the American Psychological Association,
Boston, MA.
Sellman, W. S. (2001). Reinventing DoD corporate marketing. Paper presented at the International Workshop
on Military Recruitment and Retention in the 21st Century. Sponsored by the Belgian Defense Staff,
Royal Netherlands Army, and U.S. Office of Naval Research, The Hague, Netherlands.
Sellman, W. S. (2004). Predicting readiness for military service: How enlistment standards are established.
Commissioned paper prepared for the National Assessment Governing Board. Washington, DC: U.S.
Department of Education.
Sellman, W. S., Carr, W. K., & Lindsley, D. H. (1996). Shaping tomorrow’s military: The National agenda and
youth attitudes. Proceedings of the 15th Biennial Behavioral Sciences Symposium. Fort Collins, CO: U.S.
Air Force Academy.
Sellman, W. S., Shaw, M. N., Waters, S. D., & Geimer, J. L. (2007). Research and implementation plan:
Addressing recommendations for enhancing ASVAB and the DoD enlisted personnel selection and job
classification system (FR 07-46). Alexandria, VA: Human Resources Research Organization.
Sellman, W. S., & Valentine, L. D. (1981). Aptitude testing, enlistment standards, and recruit quality. Paper
presented at the 89th Annual Convention of the American Psychological Association, Los Angeles, CA.
Shields, J. L., & Grafton, F. C. (1983). A natural experiment: Analysis of an almost unselected Army population.
Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Sims, W. H., & Truss, A. (1978). An analysis of the normalization and verification of the Armed Services
Vocational Aptitude Battery (ASVAB) forms 6 and 7 (CNA 1115). Alexandria, VA: Center for Naval
Analyses.
Sims, W. H., & Truss, A. (1979). A reexamination of the normalization of the Armed Services Vocational
Aptitude Battery (ASVAB) (CNA 79-3059). Alexandria, VA: Center for Naval Analyses.
Sims, W. H., & Truss, A. (1980). A reexamination of the normalization of the Armed Services Vocational
Aptitude Battery (ASVAB) forms 6, 7, 6E, and 7E (CNA 1152, MCOAG). Alexandria, VA: Center for
Naval Analyses.
Smith, D. A., & Hogan, P. F. (1994). The accession quality cost/performance trade-off model. In B. F. Green &
A. S. Mavor (Eds.), Modeling cost and performance for military enlistment (pp. 105–128). Washington,
DC: National Academy Press.
Strickland, W. J. (2003). Trends in youth qualification and enlistment standards. In P. R. Sackett & A. S. Mavor
(Eds.), Attitudes, aptitudes, and aspirations of American youth: Implications for military recruitment
(pp. 70–96). Washington, DC: National Academy Press.
Strickland, W. J. (Ed.). (2004). A longitudinal study of first-term attrition and reenlistment among FY 1999
enlisted accessions (FR-04-14). Alexandria, VA: Human Resources Research Organization.
Strickland, W. J. (Ed.). (2005). Longitudinal examination of first term attrition and reenlistment among
FY1999 enlisted accessions (Technical Report 1172). Arlington, VA: U.S. Army Research Institute for
the Behavioral and Social Sciences.
Thirtle, M. R. (2001). Educational benefits and officer commissioning opportunities available to U.S. Military
Servicemembers (MR-981-OSD). Santa Monica, CA: The RAND Corporation.
Uhlaner, J. E., & Bolanovich, D. J. (1952). Development of the Armed Forces Qualification Test and predeces-
sor Army screening tests, 1946–1950 (PRS Report 976). Washington, DC: Personnel Research Section,
Department of the Army.
U.S. Code. (2009, January). Title X - Armed Forces, Subtitle - General Military, Part II - Personnel, Chapter
31 - Enlistments, Section 520 - Limitation on enlistment and induction of persons whose score on the
Armed Forces Qualification Test is below a prescribed level. Washington, DC: Author.
U.S. Department of the Air Force, Office of the Assistant Secretary of the Air Force (Manpower and Reserve
Affairs). (2005). Developing leaders of character at the U.S. Air Force Academy: From first contact to
commissioning. Washington, DC: Author.
U.S. Department of Defense. (2000). Report of the Defense Science Board Task Force on Human Resources
Strategy. Washington DC: Office of the Under Secretary of Defense (Acquisition, Technology, and
Logistics).
U.S. Department of Defense, Defense Human Resources Activity. (2007, June). Youth propensity and influ-
encer attitudes: Youth poll and advertising tracking findings. Washington, DC: Author.
U.S. Department of Defense, Office of the Assistant Secretary of Defense (Manpower, Installations, and
Logistics). (1985). Defense manpower quality (Vol. 1–3). Report to the Senate Committee on Armed
Services. Washington, DC: Author.
U.S. Department of Defense, Office of the Assistant Secretary of Defense (Force Management and Personnel).
(1991). Joint-service efforts to link military enlistment standards to job performance. Report to the House
Committee on Appropriations. Washington, DC: Author.
U.S. Department of Defense, Office of the Assistant Secretary of Defense (Force Management and Personnel).
(1993). Population representation in the military services: Fiscal year 1992. Washington, DC: Author.
U.S. Department of Defense, Office of the Assistant Secretary of Defense (Force Management Policy).
(1996). Educational enlistment standards: Recruiting equity for GED certificates. Report to Congress.
Washington, DC: Author.
U.S. Department of Defense, Office of the Under Secretary of Defense (Personnel and Readiness). (2004).
Population representation in the military services: Fiscal year 2002. Washington, DC: Author.
U.S. Department of Defense, Office of the Under Secretary of Defense (Personnel and Readiness). (2006).
Population representation in the military services: Fiscal year 2004. Washington, DC: Author.
U.S. Department of Defense, Office of the Under Secretary of Defense (Personnel and Readiness). (2007).
Population representation in the military services: Fiscal year 2005. Washington, DC: Author.
U.S. Department of Defense, Office of the Under Secretary of Defense (Personnel and Readiness). (2008).
Population representation in the military services: Fiscal year 2006. Washington, DC: Author.
U.S. Department of Defense, Office of the Under Secretary of Defense (Personnel and Readiness). (2009).
Population representation in the military services: Fiscal year 2007. Washington, DC: Author.
U.S. General Accounting Office. (1997, January). Military attrition: DoD could save millions by better
screening enlisted personnel (NSIAD-97-39). Washington, DC: Author.
U.S. Government Accountability Office. (2007, January). Military personnel: Strategic plan needed to address
Army emerging officer accession and retention challenges. Report to the Committee on Armed Services,
House of Representatives. Washington, DC: Author.
Wallace, S. R. (1965). Criteria for what? American Psychologist, 20, 411–417.
Wamsley, G. L. (1972). Contrasting institutions of Air Force socialization: Happenstance or bellwether? The
American Journal of Sociology, 78, 399–417.
Waters, B. K. (1997). Army Alpha to CAT-ASVAB: Four score years of military selection and classification
testing. In R.F. Dillon (Ed.), Handbook on testing. Westport, CT: Greenwood Press.
Waters, B. K., Laurence, J. H., & Camara, W. J. (1987). Personnel enlistment and classification procedures in
the U.S. military. Washington, DC: National Academy Press.
Waters, B. K., & Lindsley, D. H. (1996). Ability of military recruits: 1950 to 1994 and beyond. Proceedings of
the 15th Biennial Behavioral Sciences Symposium, Fort Collins, CO: U.S. Air Force Academy.
Warner, J. T., & Asch, B. J. (1996). The economic theory of a military draft reconsidered. Defense and Peace
Economics, 7, 297–312.
Welsh, J. R., Kucinkas, S. K., & Curran, L. T. (1990). Armed Services Vocational Aptitude Battery (ASVAB):
Integrative review of validity studies (AFHRL-TR-90-22). Brooks AFB, TX: Manpower and Personnel
Division, Air Force Human Resources Laboratory.
Welsh, J. R., Watson, T. W., & Ree, M. J. (1990). Armed Services Vocational Aptitude Battery (ASVAB):
Predicting military criteria from general and specific abilities (AFHRL-TR-90- 63). Brooks AFB, TX:
Manpower and Personnel Division, Air Force Human Resources Laboratory.
Wigdor, A. K., & Green, B. F. (Eds.). (1986). Assessing the performance of enlisted personnel. Washington,
DC: National Academy Press.
Wigdor, A. K., & Green, B. F. (1991). Performance assessment in the workplace (Vol. 1 & 2). Washington, DC:
National Academy Press.
Wise, L. L. (1984). Setting performance goals for the DoD linkage model. In B. F. Green & A. S. Mavor (Eds.),
Modeling cost and performance for military enlistment (pp. 37–60). Washington, DC: National Academy
Press.
Wise, L. L., Welsh, J. R., Grafton, F., Foley, P., Earles, J. A., Sawin, L. L., & Divgi, D. R. (1992). Sensitivity
and fairness of the Armed Services Vocational Aptitude Battery (ASVAB) technical composites (DMDC
Technical Report 92-02). Monterey, CA: Defense Manpower Data Center.
Yerkes, R. M. (1921). Psychological examining in the United States Army. Memoirs of the National Academy
of Sciences, Vol. XV. Washington, DC: U.S. Government Printing Office.
32 Public Sector Employment
Rick Jacobs and Donna L. Denning

Historians often cite the origin of civil service or public sector testing as far back as 2200 BC, when
a Chinese emperor used a process of systematic assessment to determine if his officials were fit for
office (DuBois, 1970; Frank, 1963). In these early times, individuals were assessed with what might
now be labeled job-relevant work samples; they included tests of specific skills such as horseman-
ship and archery. The Han Dynasty (202 BC to 200 AD) is credited with moving testing from the
actual actions required on the job to a surrogate, written format that included five areas of knowl-
edge: civil law, military affairs, agriculture, revenue, and geography (Gregory, 1996). Candidates
who were successful in meeting rigorous cut-off scores on local examinations were deemed appro-
priate to continue with the process of testing at regional and higher levels in the overall process,
much like what is seen today in multiple hurdle testing. Thus, in many respects, these ancient tests
were prototypes of what has become known generically as civil service examinations or, more gen-
erally, public sector testing.
This brief historical description depicts the genesis of public sector testing and illustrates that it
shares some similarities with current practices, but important differences exist. Most noteworthy,
use of these early tests was deficient in terms of systematic evaluation of outcomes or, more specifi-
cally, demonstration of their predictive validity. Further, the tests were conducted under extreme
conditions that required candidates to spend long hours in confined spaces that would never be
tolerated today, and they routinely had failure rates well in excess of 90%, considerably higher than would be viable today.
If we move forward a few thousand years, from China (200 AD) to France (1791), England (1833),
and finally the United States (1883), we see the more immediate historical roots of modern-day pub-
lic sector testing (Graham & Lily, 1984). In these systems, tests were used to select individuals for
government positions in a way that was intended to be free of patronage and fair to all candidates.
These tests were each designed to identify the individuals most likely to succeed in a given position
on the basis of specific subject matter that made up the content of the tests, a precursor to what is
now routinely labeled as validity based on test content. Although much has been done over the years
to improve the characteristics of these assessments, such as more carefully matching test materials
to job requirements, further standardizing testing processes, and evaluating predictive efficiencies
by validation studies, the basic ideas underlying civil service examining have a long and rich his-
tory, in fact, one that predates emergence of the discipline of industrial psychology.
This chapter will provide details of the important characteristics of testing in the public sector.
It starts with the process of identifying the positions that are part of the competitive examination
process and then moves on to discuss the development and administration of examinations. This is followed by a section on the key principle of validity, that is, linking tests to jobs. The next section addresses
recruitment of candidates; optimal selection decisions require maximizing the number of individuals
competing for the job. Part five of this chapter moves the discussion from entry-level testing to testing
for promotional opportunities. Next is a discussion of legal considerations surrounding testing in
the public sector. The chapter concludes with a summary of how public sector testing has evolved
through the past century and a view on where it might be moving in the 21st century.


POSITION CLASSIFICATION IN THE PUBLIC SECTOR


To fully appreciate the extent to which use of formal civil service examinations is entrenched in
public sector employee selection, the role of position classification in the public sector must be
considered. In this context, a “position” is the segment of work to be performed by one person.
Classification of positions involves documentation and analysis of the work of each position, then
grouping the positions with sufficiently similar work into a “class” of positions. More formally, a
“class” may be defined as follows:

a group of positions … sufficiently similar in respect to the duties, responsibilities, and authority thereof
that the same descriptive title may be used with clarity to designate each position allocated to the class,
that the same requirements as to education, experience, capacity, knowledge, proficiency, ability, and
other qualifications should be required of the incumbents, that the same tests of fitness may be used to
choose qualified employees, and that the same schedule of compensation may be used. (Committee on
Position-Classification and Pay Plans in the Public Service, 1941, p. 45, italics added)

Historically, this has been a judgmental exercise, but more recently it may include use of formal
job analytic methods. Subsequent to the creation of a class, additional positions are allocated to the
class when an additional need for work is determined and documented and the additional work is
deemed sufficiently similar to the work performed by incumbents in the class to warrant inclusion
of the position to the existing class. Similarly, an existing class is abolished when the need for the
work performed by those in the class no longer exists.
A description of the work performed by incumbents in a class and the qualifications neces-
sary for performing this work are then documented in a “class specification.” The “position
classification plan” of the organization, then, “consists of (1) the system of classes and class
specifications and (2) a code of formal fundamental rules for installation and maintenance of the
classification plan” (Committee on Position-Classification and Pay Plans in the Public Service,
1941, p. 47).
The classification of positions is a formal process that provides the underlying rationale for
assessment. With the specification of position qualifications and requirements, organizations can
seek to identify existing tests or construct new tests that match this information. What we have in
this approach are the roots of a content-based validation strategy, which may then stand on its own
or may be supplemented by additional validation information such as criterion-related evidence of
validity.

CIVIL SERVICE EXAMINATIONS


The provision in the definition of “class” that all positions in it require comparable qualification
led to regulatory provisions regarding the evaluation of qualification. The U.S. Code (Section 2301,
Title 5), which governs the U.S. federal civil service system and serves as a model for many other
government agencies, stipulates that “selection and advancement should be determined solely on
the basis of relative ability, knowledge and skills after … competition” (i.e., competitive exami-
nation). The official City of Los Angeles charter is even more explicit on this point, stating that
“Examinations shall … test the relative capacity of the persons examined to discharge the duties of
the class” (The City of Los Angeles, 2009, p. 69).
A separate civil service examination (either a single test or often a series of tests, the scores on
which are combined in a specific way to form a final examination score) is typically conducted for
selection into each class. Results of the examination appear as a list of candidates who successfully
completed all portions of the examination, ranked in descending order by their score. This list is
variously referred to as an “eligible list” or “register of eligibles,” indicating that all persons on it are
eligible for employment in the class on the basis of their having demonstrated in the civil service
examination appropriate qualification to occupy a position in the class.
Adoption of the eligible list/register of eligibles, usually by a civil service commission or the
head of the department responsible for examining, marks the end point of the examination; but the
selection process has not concluded, because no one has yet been hired or promoted. This final step
is accomplished by the department with the vacancy requesting a “certification” of the list. Then,
in accordance with specific, strict rules, a designated number of candidates’ names are provided to
the department for their final hiring consideration (technically, they are provided to the “appointing
authority,” who is typically the department head and is the only person who can legally fill a posi-
tion in a civil service class). This certification rule has many variants and can range, for example,
from a “rule of one” (the highest scorer only, who is then hired unless there is reason not to do so, in
which case the second highest scorer is considered, and so forth), a “rule of three” (the three highest
scorers), a “rule of 2N + 1” (the 2N + 1 highest scorers, where N is the number of vacancies), to the
very liberal “rule of the list” (all persons on the list may receive final consideration for selection).
Evaluation of candidates for this final selection decision is to be based on additional job-related
criteria not found in the testing process. These can be difficult to identify when a thorough exami-
nation has been given.
In 1983, voters in Los Angeles approved a City Charter amendment for use of a rule of “Three
Whole Scores” for certification selection. This rule accomplished two things: First, the rounding of
scores to whole numbers eliminated the minuscule, decimal-point differences in scores that had pre-
viously separated the ranks of candidates, and, second, the hiring department was able to consider
an expanded pool of candidates for final selection. In all instances described, all candidates tied at
a given score are treated the same (either certified for final hiring consideration or not), so rounding
scores to whole numbers has a greater impact than might be expected in grouping candidates at a
given score (rank) and thus expanding the pool from which a selection can be made.
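To make these certification rules concrete, the sketch below (with hypothetical candidate names and scores) applies a generic rule of N and the "Three Whole Scores" rule to a ranked eligible list; consistent with the text, candidates tied at a certified score are all treated the same, and rounding to whole scores visibly expands the certified pool.

```python
def certify_rule_of_n(eligible_list, n):
    """Certify the N highest scorers, extended to include anyone tied at the cutoff score."""
    ranked = sorted(eligible_list, key=lambda c: c[1], reverse=True)
    if len(ranked) <= n:
        return [name for name, _ in ranked]
    cutoff = ranked[n - 1][1]
    return [name for name, score in ranked if score >= cutoff]

def certify_three_whole_scores(eligible_list):
    """Round final scores to whole numbers, then certify everyone whose rounded
    score falls in the three highest whole-score groups."""
    rounded = [(name, round(score)) for name, score in eligible_list]
    top_three = sorted({s for _, s in rounded}, reverse=True)[:3]
    return [name for name, s in rounded if s in top_three]

eligible = [("Avery", 97.42), ("Blake", 97.38), ("Casey", 96.51),
            ("Drew", 95.49), ("Ellis", 95.02), ("Flynn", 93.77)]
print(certify_rule_of_n(eligible, 3))        # the three highest scorers
print(certify_three_whole_scores(eligible))  # rounded scores of 97, 97, 97, 95, 95, 94
```

With these scores, the rule of three certifies only the top three candidates, whereas rounding groups all six candidates into three whole-score ranks and certifies the entire list, which is the pool-expanding effect described above.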
Civil service examinations are seen as embodying “merit principles” in selection in that they are
based on job-related criteria and provide a ranking of candidates in terms of their relative degree of
qualification. Position classification invariably results in a class specification document that at the
very least provides a starting point for construction of a job relevant examination. Often the descrip-
tion of the job in the class specification document is supplemented by a more detailed job analysis.
Job analysis procedures may vary but have the common objective of specifying the work performed
(tasks) on the job. Additionally, and especially relevant for the purpose of developing selection test-
ing, the job analysis also often provides identification of the knowledge, skills, abilities, and pos-
sibly other personal characteristics needed to perform these tasks. This information then allows for
designation of the most appropriate types of tests for use and their content.
Once again, these provisions require that examinations are based on job requirements or the
ability to “discharge the duties of the class,” which logically results in a content-based test construc-
tion strategy. This, coupled with classification practices that are based on extreme similarity of the
work required of all positions in a class, results in nearly universal reliance on content-based testing.
And not incidentally, the narrowness in scope of the work performed by incumbents in a class thus
defined results in a proliferation of classes, relative to the number of employees, which precludes
local empirically based test validation strategies.

TESTING FOR A MULTIPLICITY OF JOBS


The statutory requirement that an objective assessment program be in place for each job (class) and
ensuing mandates that examinations be tailored to the unique demands of each are major challenges
facing public sector organizations; usually these organizations have a very large number of classes rela-
tive to the number of employees. As an example, in one county in the state of Ohio, the public library
service employs just over 1,000 individuals, and these 1,000 employees are classified into over 110 civil
service classes. The turnover rate in this organization is approximately 6% annually, indicating that in
any given year there may be about 60 job openings, and these 60 openings may span nearly as many
classes, each requiring a different civil service examination (State of Ohio, personal communication,
February 2008). Similarly, in a medium-sized city in Pennsylvania, the Civil Service Commission must
monitor staffing for over 400 classes. Although some testing for larger classes is predictable and regular,
many jobs may have a single vacancy only occasionally (and unpredictably), and the organization must
be ready to examine candidates for each and every one of them at any point in time. In the City of Los
Angeles, the Personnel Department is responsible for testing nearly 1,000 classes. The sheer number of
jobs and requisites of the civil service system creates a situation in which tailoring selection programs
very specifically to each job can make timely completion of all examinations extremely challenging.
Another factor contributing to the volume of civil service examinations that must be developed
is the reluctance within many agencies to reuse tests. Test security reigns supreme, given the high
stakes of these examinations and the need to preserve the integrity of the process, to the extent that
considerable caution is exercised even with respect to repeat exposure of test material. And this
caution is well founded; incidents of candidates colluding to reproduce a test (by each memorizing
a specific set of items) have been repeatedly encountered.

DEFINING TEST CONTENT


One approach that helps meet the demand for a separate examination for each job, given the multiplicity of jobs within an organization, is systematic analysis across jobs that focuses on their commonalities. This helps organizations bring order to jobs in terms of their similarities
and, potentially, to assessment tools and processes. Such commonalities are identified by analyzing indi-
vidual jobs and then comparing them for patterns of similar tasks, duties, responsibilities, and/or, most
importantly, knowledge, skills and abilities (KSAs). Public sector organizations that must select for many
classes can help reduce the burden of creating a complete examination unique to each class by construct-
ing assessment procedures and processes for use across multiple classes on the basis of their similarities.
This not only makes the development of selection systems more efficient, but such a process can also
result in the compilation of normative information for a much larger sample of individuals, which, in turn,
can improve the understanding of the tests used and the applicants being considered for employment.
Implementation of such a process requires use of job analysis procedures that are consistent
across jobs and that yield results that allow for comparison of jobs. Once this is accomplished, tests
that meet the needs of multiple jobs may be created and any modifications for individual jobs can
be made. This approach can also simultaneously facilitate consideration of a candidate for multiple
jobs through administration of a comprehensive battery of tests. From this perspective, either the
candidate, through application to multiple positions with similar requirements, or the organization,
via evaluation of candidates for multiple positions simultaneously, can benefit from knowing the
relationship among jobs. As earlier stated, for many public sector organizations, the number of jobs
considered distinct (i.e., classes) is daunting, and the use of common testing across jobs can help
make far more attainable the ultimate goal of timely administration of a formal examination for
each class with a vacancy (or to always have an eligible list available for each class).
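As a simple illustration of comparing jobs on common requirements, the sketch below computes cosine similarity between hypothetical KSA importance profiles for three classes; the KSA list and the ratings are invented for illustration and are not drawn from an actual job analysis.

```python
import numpy as np

# Hypothetical importance ratings (1 = low, 5 = high) on four KSAs rated with a
# common job analysis instrument across classes.
ksas = ["attention to detail", "public contact", "physical stamina", "record keeping"]
jobs = {
    "library clerical worker": np.array([5, 3, 1, 5]),
    "meter reader":            np.array([5, 2, 3, 4]),
    "lifeguard":               np.array([2, 4, 5, 1]),
}

def cosine_similarity(u, v):
    """Cosine similarity between two KSA importance profiles (1.0 = identical pattern)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

names = list(jobs)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} vs. {b}: {cosine_similarity(jobs[a], jobs[b]):.2f}")
```

Classes whose profiles are highly similar are candidates for a shared test or common examination component, whereas dissimilar profiles signal the need for separate instruments.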
Although many public sector organizations continue to use traditional job description and job
analysis procedures to define jobs, the past decade has seen a rise in the use of competency model-
ing as an underlying process for identifying job requirements, parallel to its use in the private sector.
A competency model may be constructed for higher-level jobs, especially those considered leader-
ship positions, or for all jobs in the organization. In both cases, the competencies identified form the
basis of the examination plan.

LINKING TESTS TO JOBS: VALIDATION AND ITS MANY FORMS


Any examination used for employee selection, whether it takes advantage of similarities identified
across jobs or not, must have a logical framework demonstrating how the tests are linked to the
job, or, more generally, an evaluation of test validity. Note that regardless of the validity evidence
used to support the tests included in the selection process, the examiner (or examination analyst,
as they are often called) must engage in developing an examination plan. Examination plans link
the information about the job to the types of assessments that are included in the selection process.
Examination plans provide not only a logical underpinning to the use of a given type of test and its
content but also to the weight that each test receives in the final examination score. As an example,
in police officer selection, there has been a movement to establish a more broad-based assessment
consisting of not only cognitive ability but also personality characteristics that lead to effective
policing. An examination plan for the class of police officer would likely include multiple types
of tests with specific assessment dimensions for each and instructions as to how these tests are to
be considered (pass/fail or weighted) in the final composite score on which candidates are ranked.
Following this logic, it is not hard to see that very different jobs (e.g., library clerical worker, meter
reader, lifeguard, and purchasing agent) would have examination plans that differ from police offi-
cer and from one another, given the nature of each job and the knowledge, skills, abilities, and other
characteristics (KSAOs) necessary to perform in the position.
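The sketch below shows, with entirely hypothetical weights, hurdles, and scores, how an examination plan's instructions might be applied to produce the final score on which candidates are ranked; it is not an actual examination plan for any class.

```python
def final_examination_score(part_scores, weights, hurdles):
    """Combine test-part scores (0-100) into a final examination score.
    Parts listed in `hurdles` are scored pass/fail against a minimum; parts listed
    in `weights` contribute to the weighted composite (weights should sum to 1).
    Returns None if any pass/fail hurdle is failed (candidate is not ranked)."""
    for part, minimum in hurdles.items():
        if part_scores[part] < minimum:
            return None
    return sum(part_scores[part] * weight for part, weight in weights.items())

# Hypothetical police officer examination plan: written test weighted 60%,
# structured interview 40%, physical ability scored pass/fail at 70.
weights = {"written": 0.60, "interview": 0.40}
hurdles = {"physical": 70}
print(final_examination_score({"written": 88, "interview": 92, "physical": 75},
                              weights, hurdles))  # 89.6
```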
Public sector employment covers a very wide range of jobs and thus requires use of a correspond-
ingly wide range of assessment tools. Although the final examination may differ for various posi-
tions, a similar process is used for examination development for the vast array of public sector jobs.
First an analysis of the job is undertaken. Then, based on the results of a job analysis, an examina-
tion plan is developed that identifies the optimal (and feasible) type(s) of test(s) necessary to assess
the knowledge, skills, and/or abilities and aptitudes critical to performance of the job. For library
clerical workers and meter readers, tests might focus on attention to detail, whereas for lifeguards
the certification of successful completion of a first aid course might be supplemented with a physical
abilities test that includes water rescue.

MINIMUM QUALIFICATIONS
Threshold requirements, usually in the form of training, experience, or certification attained, are often established as minimum qualifications for potential applicants, and public sector organizations rely heavily on them as an initial step in the employment process. Minimum qualifications (alternatively referred to as “requirements”) must be met before an applicant may participate in the competitive examination. In reality, they are the first test in the examination, because they consist of carefully established job-related criteria that each applicant must satisfy to proceed further in the examination process. These criteria
are clearly communicated to applicants (so those lacking can self-select out at the earliest possible
time), consistently and rigidly applied, and are often subject to verification. Their use is completely
consistent with the content-based approach to testing so prevalent in the public sector, in that the cri-
teria individuals must meet to participate in the examination for a given class are those that indicate
a reasonable likelihood that they will have acquired the knowledge, skills, and abilities that will be
subjected to more refined assessment through the remainder of the examination.

IDENTIFYING POTENTIAL SELECTION TOOLS


Once this (preliminarily) qualified pool of applicants is established, public sector testing person-
nel identify or construct selection instruments that can be used for more refined assessment in
the remainder of the examination. They may search test publisher catalogues and/or professional
volumes that review tests, such as Tests in Print and Mental Measurements Yearbook (Murphey,
Plake, & Spies, 2006; Spies, Plake, & Geisinger, 2007). At times, this search process results in
identification of instruments that are appropriate and sufficient in their coverage to comprise the
entire examination used for selecting the most qualified applicants. However, even when this is the case, test security concerns may make a readily available test inappropriate because some candidates may gain access to it whereas others cannot. Moreover, in many instances, certain features of the job or the need to address specific issues of job content
require the creation of new tests; in fact, this is often a primary responsibility of the public sector
testing professional. Clearly the development of new assessment tools requires a great deal of time,
effort, and skill, and when that is multiplied by the number of jobs in the organization, the workload
can become overwhelming. Many public sector organizations pursue another option in some cases
by outsourcing to individuals or consulting firms specializing in instrument development. This is
especially true for high-stakes positions in which many jobs are being filled and the likelihood of
follow-up objections and legal action on the part of candidates is high.

ROLE OF THE INTERVIEW


As in the private sector, interviews are an extremely common type of test used in the public sector.
As with other types of tests, public sector organizations most often use interviews that are carefully
tailored to the job. For many jobs, formal written or actual work sample tests may not be a viable
alternative, at times simply because there are very few candidates and the cost of developing a test-
ing program does not warrant the effort. In these instances, an interview may be the only test in the
examination (except the minimum qualifications). For other jobs for which there are many steps in
the examination, those responsible for examining have the added obligation of creating and imple-
menting an interview procedure that is well integrated with other tests in the process. Interview
materials are typically developed directly from information contained in the job analysis. A viable
set of questions for the interview and scoring criteria must be established. In addition, the most
effective interview programs include careful standardization of interview administration, a writ-
ten guide to conducting the interview, and a training session for interviewers. In the public sector,
an interview panel is virtually always used as opposed to a single interviewer (or even sequential
interviews).
The American Public Transportation Association (APTA) has developed the Bus Operator Selection System (BOSS), which includes a 75-item survey of attitudes, beliefs, and experi-
ences, followed by a multifaceted interview designed to be conducted by a panel of three interview-
ers, each representing a different perspective on the job: operations, training, and human resources
(HR). This system is being used in about 30 transit organizations and has been administered to over
100,000 candidates nationwide. The original work documenting the system is described in Jacobs,
Conte, Day, Silva, and Harris (1996).

ALTERNATIVE MEASURES
All employers should identify appropriate predictors for use in employee selection and, in addition,
are required to search for alternative predictors for any original predictor that demonstrates a marked
difference in pass rates on the basis of designated candidate demographic/cultural group membership.
For example, cognitive ability tests are effective predictors of subsequent job performance. However,
the use of cognitive ability tests alone will also usually result in large racial/ethnic group dif-
ferences in pass rates, with majority group members passing at a higher rate than members of most
minority groups. In this case, employers are required to seek out other predictors that can be used in
conjunction with or in place of the predictor(s) that result in large group difference(s) in pass rates.
The search for alternatives may take the form of identifying additional dimensions upon which to
assess candidates. This approach is reflected in the previously mentioned police officer selection
example, in which, for many decades, police candidates were given a cognitive ability test for the
initial identification of qualified candidates and then further vetted via interviews, physical ability
tests, background investigations, and medical/psychological evaluations. More recently, systems for
initial screening commonly include a cognitive test with additional areas measured, such as per-
sonality or biographical data, and expand alternative testing formats to include video-based tests
or job simulations. Both approaches to identifying alternatives, modifying test content and modifying test format, should be considered when meeting the mandate to search for alternative tests.

RISKS AND LEGAL CHALLENGES


Ultimately any testing system must have a formal evaluation regarding its ability to accurately
select individuals. Public sector testing programs are often the first to be challenged because they
rely on formalized testing and affect large numbers of applicants to jobs that are
so visible and pervasive in our society. The Equal Employment Opportunity Commission (EEOC),
the U.S. Justice Department, and state and local fair employment agencies often view the selection
process for public sector jobs as a good mechanism for making policy statements by challenging the
validation evidence for these highly visible testing programs. Particularly when applying to police
or fire departments, many unsuccessful candidates adamantly disagree with their failure to attain
employment. These disappointed candidates may become enthusiastic plaintiffs, with no shortage
of lawyers and public action organizations willing to assist them in their attempt to prove the testing
process wrong.
In 1988, an external consulting firm worked with a major city to develop and administer a test for
selecting firefighters. Three days after the administration of the written test, there was an advertise-
ment in the local paper by an attorney soliciting individuals who had taken the test and did not think
the test was fair. Notably, the advertisement appeared well before the results of
the test were available. What was clear from this episode is that any test used for selecting a few suc-
cessful candidates from a large number of applicants is likely to be challenged, and those respon-
sible for the testing process must have some way of demonstrating its links to the job, its ability to
select the right people, and, possibly, why this test was used whereas others (alternatives) were not.
This sets a high standard, and one that is required by law. It also demands a strong logical basis for
decisions made in developing the testing process, and, ultimately, leads to higher-quality tests and
more capable individuals being selected. Even with all of these objectives met, those who believe
the results were unfair can challenge a testing program; such challenges should further encourage
agencies to document all they have done to promote fairness.

CREATING A TALENT PIPELINE: RECRUITING CANDIDATES


No selection program can be successful unless the number of candidates exceeds the number of
positions available. This has been an operating principle of employee selection for years and was
first codified in the work of Taylor and Russell (1939). The challenge that faces public sector employ-
ers is two-fold: (a) to create efficient recruitment and selection systems for all jobs in high demand so
as to maximize the number of qualified applicants and (b) to expend extra effort to find candidates
for jobs where demand has outstripped supply. With respect to the first, although a larger number of
candidates is generally seen as a positive in terms of selection utility, assessing very large numbers
of candidates can result in much higher selection costs, thereby driving down the overall utility of
the program. A case in point is a large police force in New York State. The agency charged with
selecting new police officers administers a test once every four years and hires approximately 100
new officers per year, or 400 over the life of the list of eligible candidates. Because this is a particu-
larly attractive job in terms of prestige, location, and salary, the number of candidates is very high. In
1999, over 31,000 individuals applied to take the test and over 26,000 actually appeared for testing.
In 2007, there were in excess of 23,000 test-takers. Clearly a ratio of 65 to 1 or even 57 to 1 is well
beyond what is needed for effective selection and simply increases the cost of testing. In this case,
the testing process required use of over 160 schools and over 3,000 test monitors and administrators,
and its total cost exceeded $2 million (EB Jacobs, LLC, personal communication, June 2007).
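To make the arithmetic behind this example concrete, the brief sketch below is an illustration of ours using only the approximate figures quoted above; the per-candidate and per-hire costs are simply derived from those figures (and are lower bounds, given that the reported total "exceeded" $2 million).

```python
# Illustrative arithmetic only; the inputs are the approximate figures cited in the text.
test_takers = {"1999": 26_000, "2007": 23_000}   # candidates who appeared for testing
hires_per_list = 400                             # ~100 hires per year over a 4-year eligibility list
total_cost = 2_000_000                           # reported lower bound on total testing cost (dollars)

for year, takers in test_takers.items():
    print(f"{year}: applicant-to-hire ratio ~ {takers / hires_per_list:.1f} to 1")

# Rough cost figures for the 2007 administration, given the reported total.
print(f"Cost per test-taker: ${total_cost / test_takers['2007']:,.2f}")
print(f"Cost per hire      : ${total_cost / hires_per_list:,.2f}")
```

The calculation simply restates the ratios cited above and shows that testing cost scales with the number of candidates, not with the number of positions to be filled.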
In contrast, finding sufficient applicants is the major issue for several public sector positions.
For some jobs, there may be as many openings as candidates, and, in some cases, fewer qualified
individuals than positions available. When this occurs, there is no selection program that can be
effective, and the focus must turn to recruiting. From the standpoint of the agency responsible for
hiring new employees, the job analysis and historical records offer valuable information. The job
analysis identifies the KSAs of people to attempt to attract, making it possible to target the recruit-
ing effort. As an example, although there are always many men who want to be firefighters, finding
women for the job can often be a problem for many agencies. Because a firefighter’s job is very
physically demanding, finding women who are physically capable of not only lifting and carry-
ing heavy objects, including other people, but doing so with about 46 lb of equipment presents an
additional challenge. Clearly, just going out and recruiting any woman is not a good strategy in light
of our additional knowledge about the job. The focus in this example and many others must be on
recruiting individuals who are interested in and likely to be qualified for the job.
Historical data can also be of great value by pointing agency personnel to past successful
recruits and helping to identify schools, vocational programs, and other training grounds where
those likely to be successful in the job may be found. Establishing partnerships with educa-
tional institutions may also be considered. Further to the point regarding targeted recruitment of
firefighters, establishing physical pretraining programs targeted at women is not uncommon. In
one particularly good demonstration of effective recruiting, a city agency responsible for hiring
firefighters concluded that the major hurdle in hiring women was not an overall lack of appli-
cants, but a lack of female applicants who could pass their physical abilities test. This conclu-
sion resulted in an adjustment in recruitment strategy to focus on recruitment of women from
places where previously successful female applicants had been found. These female firefighters
belonged to local gyms, fitness clubs, and local athletic teams. The logic proved to be accurate
in that the targeted recruitment identified more female applicants who successfully passed the
physical abilities test. The strategy was successful, and the city increased its percentage of
female firefighters.
Not all recruiting efforts return such positive results. Recruiting without information on the
KSAs required to be successful is likely to be a waste of time and effort. In some cases “random
recruiting” may actually detract from the goal of more effective testing. When recruits who are
poorly prepared or not truly interested in the job are solicited and become candidates, the potential
for changes in the passing rates of the various demographic applicant groups increases, and the
result can be an increase in adverse impact, no doubt the opposite of the desired outcome.
Indeed, effective recruitment is vital to creating an effective selection program, but the pro-
cess must be guided by the knowledge of what is required by the job, what have historically been
successful avenues for finding the needed talent, and what new approaches (e.g., pretraining pro-
grams and educational partnerships) may prove viable in the future.

PROMOTIONAL PROCESSES: USING WHAT WE KNOW ABOUT PEOPLE AND THEIR CAPABILITIES TO OUR ADVANTAGE
A word about some differences between public sector entry-level testing and public sector promo-
tional testing is important to fully understand the various approaches to selection that are required.
For many entry-level positions, a formal training program exists, and those selected for the position
will be placed in that program once they formally accept the job. In many public sector promotional
systems, the individual who was at a lower-level job on Monday may find herself at a higher-level
job on Tuesday with little or no training prior to moving up the organizational ladder. This distinc-
tion has important implications for testing. Because training will occur for the first example and
not for the second, testing for the entry-level position should not include the knowl-
edge, skills, and expertise that will be taught in training before the new hire assumes the position. In the promotion
situation, the requisite information, skills, and expertise are needed on day one of the higher-level
position incumbency, so it is legitimate to test for all. In practice, what this often means is that entry-
level tests focus on underlying abilities requisite for learning the job, whereas promotional tests are
more closely linked to actual job requirements.
There are also distinctions in terms of the validation processes most often encountered when
it comes to entry-level versus promotional testing. A content strategy for validation is likely to be
used for either entry or promotional testing. As stated earlier, this method of validation establishes
logical links between test requirements and job requirements, often times supported with judgment
data from subject matter experts (SMEs). In many programs of validation for entry-level testing, this
strategy is supplemented with a criterion validity study. However, it is rare that a criterion-related
study is part of the validation process for promotional exams. This is the case for various reasons, including
relatively small sample sizes for many promotable positions, issues of test security that arise as a
function of administering a test to a group of incumbents and then using it again for candidates,
difficulties in getting a sample of incumbents to do the necessary level of preparation to take the
test, and lack of incumbents’ motivation to participate in an activity that is seen as not relevant to
their job.
The number and variety of promotional examinations are also daunting, and the task of creating
and implementing job-related examinations is equally difficult. In the vast majority of instances,
promotional examinations include minimum qualifications that specify the lower-level class or
classes from which promotion to a given class must be made, as well as the required number of
years of service in the lower-level class or classes. Most public sector organizations maintain these rules
and regulations not only to specify job-relevant experience, but also as part of a mandate to prevent
patronage or favoritism from supplanting the identification of the most talented individuals for the job.
The process of developing a promotion examination is similar to that for entry-level examinations,
with one very important difference: Most agencies emphasize promotion from within (again, often
legally mandated). As such, the amount of information known about the candidates greatly exceeds
what is known about entry-level applicants. This could be a tremendous advantage in the identifica-
tion of talent if properly tapped.

DEVELOPING PROMOTIONAL TESTS


Promotion examinations are developed based on the concept that those in lower-level jobs acquire
KSAs required for the next job in the hierarchy. Similar to entry examinations, potential appli-
cants for promotional examinations are prepared for testing by informing them about (a) the
duties and responsibilities of the job, (b) the required knowledge base, and (c) the underlying
skills, abilities, and other characteristics required by the job. This information is conveyed to
candidates via a test announcement, or “bulletin”, which outlines the types of tests, and often
their content and scoring, as well as hurdles (decision points for progression in the examination)
that candidates will encounter. For some positions, this may require very little preparation, but
for others (e.g., police sergeant or fire captain) agencies often announce the examination 6 months
or more in advance to give candidates adequate time to prepare for the various tests that make up
the promotion process.

APPRAISING PAST PERFORMANCE


One frequently missing element in promotional processes is the assessment of past performance.
Although this has the potential to be the most important single indicator of future performance,
its rare use in promotional processes stems from a lack of confidence that performance ratings have
been or will be consistent and accurate. More generally, it is typically believed that performance
ratings lack the psychometric rigor required for any formal testing process. This is a true chal-
lenge for many organizations, because it speaks to a dilemma that can best be summarized as,
“We have potentially very valuable information about individuals that we cannot extract and
use because of a lack of confidence in (or acceptance of) the ratings given by supervisors or
others.”
Indeed, one clear opportunity for improving promotion processes is the more effective use
of past performance for determining who will move up in the organization. To this end, several
assessment techniques, some of which have been used in private sector selection and, especially,
in employee development programs, have been devised for the measurement of past performance.
Behavioral accomplishment records (Hough, 1984), ratings of promotability, career review boards,
and behavior-based interviews have all been seen as additions to the overall promotional processes
used in public sector testing. It remains the task of testing professionals to further enhance promo-
tional processes by continuing to improve these techniques that capture prior job-relevant perfor-
mance. It is further the responsibility of the professional testing community to convince candidates
and their agencies to more frequently incorporate these improved testing practices into existing
assessment programs.

PERSONNEL DECISION-MAKING AND LEGAL JEOPARDY


As noted above, the promotion process (as well as the selection of new employees) in public sector
organizations can often lead to legal disputes and challenges by individuals, groups, and govern-
ment entities, such as the U.S. Department of Justice. Most of the time what is at issue is disparate
impact, in which the results of the selection or promotion systems appear to disadvantage one or
more demographic/cultural groups. When this occurs, as for private employers, the public sector
agency is required to demonstrate the validity of the process. This demonstration can take many
forms, and it is not unusual to provide multiple sources of validity evidence ranging from the most
common form, content-based evidence of validity, to extensive documentation of criterion-related
validity, which may be based on research conducted internally or by external consultants for the
organization, and/or generalized evidence of test validity.

UNIQUE COMPETITIVE PROCESSES


The stakes in public sector promotional testing can be very high. As stated above, in many pub-
lic sector jobs, the only way to advance is by having served in one or more specific job(s) for
a minimum number of years, sometimes additionally having successfully completed specialized
education/training or other formal certification, and by successfully competing in the promotional
process. In these competitive promotional examinations, some candidates succeed, but a larger
number of candidates do not. This competition among peers can have negative consequences for the
individuals involved and for the organization. It is not uncommon for the entire process to be chal-
lenged by individuals who did not do well enough to be promoted and concluded that the process
was flawed and unfair. When this happens, colleagues find themselves on opposite sides of a legal
battle, in which the candidates successful during the testing process hope the test results will be
upheld, and those who did not do sufficiently well on the examination to be promoted work to dis-
credit the process. Unfortunately, at this point in the promotion process, all candidates have invested
a great deal of preparation time and energy, and many feel frustrated by the delay in implementing
the results. It is not unusual for these types of challenges to stretch out for years, thereby creating
problems for all participating entities: the candidates, the HR professionals, and management of the
agency wishing to promote its employees.
Another factor that affects the tendency for legal challenges of selection processes within
the public sector, in contrast to much of the private sector, is that these processes are by design
open and visible; unquestionably, the examination is “responsible for” selection outcomes.
This provides disappointed candidates an obvious target for pursuit of litigation. Furthermore,
because civil service systems still very frequently have mandated candidate appeal or “pro-
test” rights, filing suit may seem nothing more than an obvious extension of a right they are
already afforded. In point of fact, formal and stringent requirements govern what constitutes an appeal or
protest and how it is adjudicated, but these rights at times seem to be misinterpreted
simply as a right to register complaints. Once administrative remedies have been exhausted
and the outcome remains negative to the candidate’s interest, it may seem a natural next step
to pursue litigation.

NEGATIVE CONSEQUENCES FOR INDIVIDUALS AND ORGANIZATIONS


Frequently, the legal challenges that confront public sector testing create a crisis of confidence
not only in testing, but, more generally, in the promotional process moving forward. Individuals
question the ability of the people responsible for testing and speculate that the system has come
under the control of the legal system without regard for identification of the top candidates. As
these cases drag on, temporary appointments may be made, which further complicate the situa-
tion. When order is finally restored, another problem can occur with respect to what to do with
those who were placed in the higher-level job as a provisional appointment. When a new testing
program is instituted and someone who has been in the job for many months or even years does
not achieve a successful score, the immediate question that arises is “How valid could the test
be with respect to predicting performance?” The underlying logic here is, “How could the test
be relevant to the job if someone who has managed to do the job successfully for the past few
months/years cannot pass it?” This problem is sometimes resolved by making permanent the
provisional appointments, thereby further complicating the testing/promotional process by hav-
ing one set of individuals in the job on the basis of one system and others getting there through
an alternative route.

BALANCING VALIDITY AND DIVERSITY


Public sector agencies are in a constant struggle to simultaneously increase the validity of their
selection and promotion processes and to improve the diversity of the group that is selected. This
is a complex task that may involve actions that result in focus on one at the expense of the other
(Aguinis & Smith, 2007 ; DeCorte, Lievens, & Sackett, 2007; Ployhart & Holtz, 2008; Pyburn,
Ployhart, & Kravitz, 2008; Sackett & Lievens, 2008). Many of the selection procedures used
to predict future performance show large differences among various groups. With respect to a
variety of measures of cognitive ability and knowledge-based multiple-choice tests, both popular
with public sector agencies because of their clear right-and-wrong response format, Caucasian
candidates consistently outperform Black and Hispanic candidates. When the selection proce-
dure switches from cognitive ability to physical ability, women typically score lower than men.
Agencies take steps to minimize these differences, and although some approaches may be help-
ful (e.g., study guides, training sessions, and practice testing), none eliminates the group differ-
ences completely.
Recently, many public sector agencies, although still acknowledging the need for some component
of the process to test for cognitive ability, have created examinations that include noncognitive
measures such as personality tests or biographical information. These types of tests are sometimes
met with protest from applicants, unions, and other interested parties on several grounds, but, at
least when it comes to selection of new candidates (versus promotional testing), such testing has
been implemented. In some instances, the inclusion of different types of instruments has reduced
group differences that are observed when testing only for cognitive ability and has also enhanced
overall validity, but they have not eliminated adverse impact (Sackett & Lievens, 2008). One of
the reasons for this failure to remove adverse impact is a low but consequential correlation between
cognitive-ability-oriented predictors and less traditional selection tools such as personality indica-
tors. Although many personality scales show no difference between minority and majority group
members, numerous scales do show some difference, and the difference is also in favor of majority
test takers in a way believed to be linked to the positive correlation between these personality mea-
sures and cognitive ability (Cascio, Jacobs, & Silva, 2010).
This problem is made even more difficult by the fact that for many public sector jobs, the selec-
tion ratio is quite favorable for the organization (i.e., many candidates and few individuals selected).
As the selection rate gets smaller and smaller (more candidates relative to the number of positions
to be filled), the impact of any group difference grows quickly. Even small group differences can
cause large levels of adverse impact when selection ratios drop below .20. This further complicates
the situation for the agency, because one goal is to make the jobs widely available, but doing so can
have a negative consequence on diversity.
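The interaction between group differences and the selection ratio can be illustrated with a simple calculation. The sketch below is not based on data from this chapter; it assumes normally distributed scores in two groups separated by a modest standardized difference (d = 0.3) and shows how the ratio of the two groups' selection rates (the impact ratio compared against the four-fifths benchmark) shrinks as the cutoff becomes more selective.

```python
from statistics import NormalDist

# Assumption for illustration only: normally distributed scores, with the minority-group
# mean 0.3 standard deviations below the majority-group mean.
majority = NormalDist(mu=0.0, sigma=1.0)
minority = NormalDist(mu=-0.3, sigma=1.0)

for majority_rate in (0.50, 0.30, 0.20, 0.10, 0.05):    # progressively lower selection ratios
    cutoff = majority.inv_cdf(1.0 - majority_rate)       # score required to be selected
    minority_rate = 1.0 - minority.cdf(cutoff)
    impact_ratio = minority_rate / majority_rate         # compared against the 0.80 benchmark
    print(f"majority rate {majority_rate:.2f} | minority rate {minority_rate:.3f} | "
          f"impact ratio {impact_ratio:.2f}")
```

Under these assumptions, even this modest score difference pushes the impact ratio below the conventional four-fifths benchmark at ordinary cutoffs, and the disparity keeps growing as the selection ratio falls toward and below .20.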

DEFENSIBILITY OF PROCESS
Ultimately, a public sector agency must make its employee selection systems (entry and promo-
tional) defensible. To do this, there are steps that must be taken, and these steps must not only con-
form to the laws and guidelines governing selection, they must also be meticulously documented
(Guion, 1998).
In entry and promotional programs there are winners and losers. The winners get what they
desired, the job, and those less fortunate walk away either without a job (entry) or in the same job
they were in prior to studying and preparing for the promotional process. At times, those in the latter
category seem to decide that challenging the test on any number of grounds is a good strategy for
obtaining a second chance. In some situations, the tests may actually be poorly prepared, lacking in
job relevance, undocumented with respect to how they were created and/or linked back to the job, or
simply administered without regard to accepted testing practices. However, in other cases, the alle-
gations about the test may be a disingenuous attempt to vacate the results and provide all test-takers
with another opportunity for success. When a test is challenged, it does not automatically mean the
test was deficient or that the process violated the laws and guidelines that prevail. Challenging the
test is a right of any candidate who can establish an underlying legal basis, most often in the form
of adverse impact. Once adverse impact is established, it becomes the responsibility of those using
the test to establish the validity of the process.
Given the above, it is important to consider what must be present to make a hiring or promotional
system defensible. Below we provide further details on the following critical factors for defending
a testing process:

• Job analysis
• Links between test elements and aspects of the job
• Logic behind the combination of multiple scores into a final composite
• Test administration details
• Scoring processes
• Documentation

There is unanimous agreement that a fair test is only possible with confirmation that those responsible
for the test understand the job. In the context of public sector testing, this means that the job in question
has been defined and documented, which occurs at the inception of a job class through the creation
of a class specification, and then is often supplemented with a more detailed job analysis and/or dur-
ing examination development with the assistance of job experts. The results are widely accepted by
incumbents, supervisors, HR specialists, and potential applicants as reflecting the important features
of the job. To be useful as underlying test development documents, the job analysis and job description
must reflect not only the tasks and responsibilities of the job, but also the knowledge base required by
the job and those skills, abilities, and other personal characteristics that facilitate job performance.
A second important element of providing the necessary information for defense of a test is evi-
dence that links the test items, work samples, and other components of the examination to the actual
requirements of the job. This linking process often takes the form of surveys that identify the mate-
rials underlying the test questions and ask SMEs to identify the degree to which these materials aid
in completion of various work tasks and responsibilities. This process is commonly accepted as a
means of establishing validity on the basis of test content and takes many forms. At the root of any
successful demonstration of validity is a clear listing of the test requirements, the job requirements,
and how those two sets of requirements are associated. Critical to this approach to the demonstra-
tion of validity based on test content is an appropriate sampling of incumbents, supervisors, and/or
other job experts, along with clear instructions to those who are providing the responses. Although
surveys are often used, this process can also be accomplished with review meetings involving job
experts in which the material is analyzed and discussed and consensus judgments of the experts
are documented.
In most contemporary entry and promotional processes, a single test does not represent the full
range of job requirements; so multiple tests are used for selection or promotion. When this is the
case, a third requirement comes into play. Because all selection and promotional processes require
a single, final score in the examination, the manner in which that score is calculated becomes an
important consideration. As an example, in police officer selection and in firefighter selection there
often is a written and a physical test. Combining these two scores becomes an issue because the
weight assigned to each score will determine, in part, its impact on final score. Years of job analysis
for both of these jobs have yielded consistent results. Although both jobs require physical capabil-
ity, the firefighter job is more physically demanding. Results from our own job analysis work across
various police and fire departments have shown that the job of a firefighter is between 40% and 60%
physical with the remainder requiring cognitive abilities, whereas the job of a police officer is often
reported by incumbents to be between 20% and 30% physical and 70–80% cognitive. The two jobs
clearly need different weights for the physical test when it comes to creating a selection composite.
Like the evidence for linking the test elements to the job requirements, a rationale for the weight-
ing used to form the final test score is necessary. This is often based on input from job experts and
professionals in the testing area. The important point is that there is a rationale for the weights that
are being used and that the rationale is tied to requirements of the job. It should also be noted that
there is no one best way to establish these weights; moreover, in many testing situations the compo-
nents of the examination are correlated with one another. When components are correlated, the
weights become less consequential because small variations in them do not substantially change the
overall results. As the correlations increase, differential weighting matters even less, and at some
point simply weighting the test components equally works about as well as elaborately defining a
very precise set of differential weights.
On the other hand, some would argue for use of equal weights simply because of their demonstrated
robustness in prediction (Schmidt, 1971).
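A small simulation can make this point about correlated components concrete. The sketch below is purely illustrative; the correlation level, weights, and sample size are assumptions of ours rather than values drawn from any actual examination.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000  # simulated candidates

# Assumed data-generating model: written and physical scores share a common factor,
# which induces a positive correlation between the two components.
common = rng.normal(size=n)
written = 0.6 * common + 0.8 * rng.normal(size=n)
physical = 0.6 * common + 0.8 * rng.normal(size=n)

def composite(w_written, w_physical):
    # Standardize each component before applying weights, as is typical practice.
    z_written = (written - written.mean()) / written.std()
    z_physical = (physical - physical.mean()) / physical.std()
    return w_written * z_written + w_physical * z_physical

equal = composite(0.5, 0.5)          # equal weights
differential = composite(0.7, 0.3)   # a heavier weight on the written component

print("correlation between components:", round(np.corrcoef(written, physical)[0, 1], 2))
print("correlation between composites:", round(np.corrcoef(equal, differential)[0, 1], 3))

# Overlap between the top 10% of candidates identified under each weighting scheme.
k = n // 10
top_equal = set(np.argsort(equal)[-k:])
top_differential = set(np.argsort(differential)[-k:])
print("overlap in top 10% selected:", len(top_equal & top_differential) / k)
```

Even with only a moderate correlation between the components, the two composites correlate very highly and identify largely the same candidates at the top of the list; as the component correlation rises, the choice of weights matters less still.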
A fourth area for defensibility is in the actual administration of the tests. The best-developed
tests, the ones with the highest degree of validity evidence and the strongest rationale for weight-
ing of components, can become useless if test administration processes are deficient. Threats to
administration can come in various forms, ranging from failure to protect testing materials before
the actual test date to administering the test in a room with poor lighting, loud outside noises, or
missing pages in test booklets. Although this seems to be the least difficult part of defending a test
and the easiest to achieve, it is often the Achilles heel of a testing process. Care must be given to all
phases of test administration; for example, materials such as instructions to candidates, information
about the testing locations and facilities, and any irregularities in the actual administration of the
test all must be well documented. Nothing is worse than doing all of the right things when it comes
to test development and then compromising it all during test administration.
All test materials must be scored, and the scoring process represents a fifth area in which threats
to the defense of a test can occur. Many modern tests are administered via paper and pencil and
scored by scanning machines or taken online and scored automatically. Scoring integrity must be
demonstrated in the form of getting the correct outcome for each candidate. In the case of scanned
answer sheets, this means that all answer sheets must be reviewed for irregularities. It is a good
idea to scan each test sheet twice and to compare scores for any differences in the two scans; any
differences indicate scanner problems or simple “hiccups” in the scanning process. Another way to
ensure accuracy is to compare candidates’ hand-scored tests with their scanned scores (usually for a
sample of candidates). For online testing, periodically sending through a “phantom candidate” with
a known score to make sure that the algorithm is generating the correct score is a useful step. With
respect to other types of potential test scoring problems, demonstrations of interrater agreement and
other forms of reliability help to substantiate the appropriateness of scoring protocols. Although a
discussion of reliability is not consistent with the goals of this chapter, any and all steps that can be
taken to show the consistency of test results will be of great assistance in addressing any challenges
to the scoring process.
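As a sketch of what the double-scan comparison and phantom-candidate check described above might look like in code, consider the following; the answer key, candidate identifiers, and response records are hypothetical and included only to illustrate the logic.

```python
# Hypothetical illustration of two scoring-integrity checks described in the text.
ANSWER_KEY = {1: "B", 2: "D", 3: "A", 4: "C"}   # assumed 4-item key, for brevity

def score(responses):
    """Number-correct score for one candidate's multiple-choice responses."""
    return sum(1 for item, key in ANSWER_KEY.items() if responses.get(item) == key)

def compare_scan_passes(scan1, scan2):
    """Return candidate IDs whose two independent scan passes yield different scores."""
    return [cid for cid in scan1 if score(scan1[cid]) != score(scan2.get(cid, {}))]

def check_phantom(phantom_responses, expected_score):
    """Verify that a 'phantom candidate' with a known answer pattern receives the expected score."""
    return score(phantom_responses) == expected_score

scan_pass_1 = {"C001": {1: "B", 2: "D", 3: "A", 4: "C"},
               "C002": {1: "B", 2: "A", 3: "A", 4: "C"}}
scan_pass_2 = {"C001": {1: "B", 2: "D", 3: "A", 4: "C"},
               "C002": {1: "B", 2: "D", 3: "A", 4: "C"}}   # item 2 read differently on the second pass

print("Sheets needing review:", compare_scan_passes(scan_pass_1, scan_pass_2))
print("Phantom candidate OK :", check_phantom({1: "B", 2: "D", 3: "X", 4: "C"}, expected_score=3))
```

The same logic applies whether scores come from scanned answer sheets or an online scoring algorithm; the essential step is an independent second path to the score and a known-answer case that must reproduce its expected result.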
A final step in the defensibility of a testing program is the documentation of all steps in the pro-
cess. This includes specification of how each test was developed, how each test was administered
and scored, and how the final examination score was calculated for all candidates. Creating the
paper trail of your work not only allows everyone to see the steps taken but also memorializes the
process. In many challenges to public sector testing, legal proceedings take place years after the
examination was developed and administered. Relying on memory and randomly filed memos of
what happened will never provide the information necessary to successfully support the conten-
tion of adequacy of the process. Public sector testing is best completed by the compilation of a
final report or file that details the project from start to finish. This documentation should be clear
and it should contain all of the necessary surveys, instructions, and tests used in the examination.
There is no better way to defend a test than to have it well documented. In situations in which
a challenge is presented to an agency regarding the testing process, the agency can provide the
potential plaintiff with a copy of the report. On more than one occasion, this has ended the chal-
lenge to the test.
Doing everything correctly, following the steps outlined above, does not guarantee an immediate
determination of validity or the ability to use the results of the examination. Public
sector testing has a long and interesting history of legal challenges and decisions. Not all deci-
sions follow what would logically flow from the various perspectives presented in a case. Public
sector testing is just that: it is public, and much of what is decided in a specific case is best
understood from the historical perspective of the selection problems for that city, state, or
agency. Like any other understanding of a complex issue that has developed over time, context
is critical.

CONCLUSIONS
Public sector testing has evolved over the past two centuries in terms of test content and test format.
We have seen the movement from tests based solely on memory and other cognitive abilities to
the inclusion of social judgment, personality, and biographical information. We have seen simple
paper-and-pencil testing transition to testing formats that are computer-based and inclusive of video
stimulus materials. With respect to the identification of critical underlying job requirements, we
have seen public sector testing programs expand their use of systematic job analytic techniques that
approach not only single jobs under study, but also classes of jobs, so that the inherent interrelation-
ships among jobs can be identified to better take advantage of opportunities to use a common testing
system across jobs. With respect to the legal arena, it is public sector testing that is often singled out
as the test case for looking at the defensibility of specific test formats and test content as well as the
way in which test scores are used in the decisions made about people and jobs. Clearly, as the chal-
lenges to the fairness of various types of testing programs move forward, public sector applications
will be part of the landscape.
Unlike their private sector counterparts, public sector employees are less susceptible, although still not
immune, to layoffs or downsizing, though hiring freezes are common. This fiscal reality translates to the fact
that most cities and public agencies, even in their toughest financial times, continue to require sub-
stantial levels of staffing. Therefore, the enormous demand for testing programs for the hiring and
promotion of public sector employees will continue, and the need for accomplished and creative test
development professionals will offer tremendous opportunities to further develop the way in which
we measure candidates against job requirements.

REFERENCES
Aguinis, H., & Smith M. A. (2007). Understanding the impact of test validity and bias on selection errors and
adverse impact in human resource selection. Personnel Psychology, 60, 165–199.
Baruch, I. (Chair). (1941). Committee on Position-Classification and Pay Plans in the Public Service. Position-
classification in the public service. Chicago, IL: Public Personnel Association.
Cascio, W. F., Jacobs, R. R., & Silva, J. (2010). Validity, utility and adverse impact: Practical implications from
30 years of data. In J. Outtz (Ed.), Adverse impact: Implications for organizational staffing and high
stakes selection (pp. 271–288). New York, NY: Psychology Press.
City of Los Angeles. (2009). Official city of Los Angeles charter. American Legal Publishing Corporation.
Retrieved November 16, 2009, from http://www.amlegal.com/nxt/gateway.dll?f=templates&fn=default.
htm&vid=amlegal:laac_ca
De Corte, W., Lievens, F., & Sackett, P. R. (2007). Combining predictors to achieve optimal trade-offs between
selection quality and adverse impact. Journal of Applied Psychology, 92, 1380–1393.
DuBois, P. H. (1970). A history of psychological testing. Boston, MA: Allyn & Bacon.
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department
of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43,
38290–38315.
Franke, W. (1963). The reform and abolition of the traditional Chinese examination system. Cambridge, MA:
Harvard University Press.
Graham J. R., & Lily, R. S. (1984). Psychological testing. Englewood Cliffs, NJ: Prentice Hall.
Gregory, R. J. (1996). Psychological testing: History, principles and applications (2nd ed.) Boston, MA: Allyn
& Bacon.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ:
Lawrence Erlbaum.
Hough, L. M. (1984). Development and evaluation of the Accomplishment Record: Method of selecting and
promoting professionals. Journal of Applied Psychology, 69, 135–146.
Jacobs, R. R., Conte, J. M., Day, D. V., Silva, J. M., & Harris, R. (1996). Selecting bus drivers: Multiple perspectives
on validity and multiple estimates of validity. Human Performance, 9, 199–218.
Murphy, L. L., Plake, B. S., & Spies, R. A. (Eds.). (2006). Tests in print VII. Lincoln, NE: Buros Institute of
Mental Measurement.
Ployhart, R. E., & Holtz, B. C. (2008). The diversity-validity dilemma: Strategies for reducing racioethnic and
sex subgroup differences and adverse impact in selection. Personnel Psychology, 61, 153–172.
Pyburn, K. M., Jr., Ployhart, R. E., & Kravitz, D. A. (2008). The diversity-validity dilemma: Overview and legal
context. Personnel Psychology, 61, 143–151.
Sackett, P., & Lievens, F. (2008). Personnel selection. Annual Review of Psychology, 59, 419–450.
Schmidt, F. L. (1971). The relative efficiency of regression and simple unit predictor weights in applied
differential psychology. Educational and Psychological Measurement, 31, 699–714.
Spies, R. A., Plake, B. S., & Geisinger, K. F. (Eds.). (2007). The seventeenth mental measurements yearbook.
Lincoln, NE: Buros Institute of Mental Measurement.
Taylor, H. C., & Russell, J. T. (1939). The relationship of validity coefficients to the practical effectiveness of
tests in selection. Journal of Applied Psychology, 23, 565–578.
33 Selection Methods and Desired Outcomes
Integrating Assessment Content and Technology to Improve Entry- and Mid-Level Leadership Performance
Scott C. Erker, Charles J. Cosentino, and Kevin B. Tamanini

The importance of selection to modern organizations cannot be overstated. By now, it is safe to
assume that most private organizations use some form of structured selection tool, method, or
process to make decisions about people—who to hire, promote, and/or accelerate toward future
leadership roles. Organizations have paid the price for unstructured selection procedures at leader-
ship levels. Lack of consistency in selection of leaders can lead to poor motivational and skills fit
between the individual and the job, as well as ineffective leader-follower relationships, resulting
in low performance, unmet expectations, and high turnover. This ultimately leads to low organi-
zational productivity, inconsistent customer service, and low employee engagement. As modern
organizations must frequently respond to new market demands, competitors, and technologies,
unstructured selection methods that rely on “gut feel” of hiring managers pose a greater risk to
organizational growth and survival. Rapid advancements in technology, increasing sophistication
among consumers, and competition in global markets make the ability to adapt and change orga-
nizational strategies and tactics critical to sustained success. The requirements for leaders and the
selection methods used to identify those who are the most likely to excel in these demanding roles
must keep up with the rapid change in business. Today, organizations benefit from decades of sci-
ence and practice that provide guidance for how to maximize the likelihood that an informed deci-
sion can be made about whom to hire/promote for a particular job or role. When you consider the
direct costs (e.g., hiring and training costs, compensation, and benefits) of employing an individual
across their potential tenure in an organization, each hire/promotion over time is an investment of
millions of dollars. The financial investment organizations make in their leaders is significant. The
utilization of accurate information for making hiring decisions significantly increases the likeli-
hood of positive return on investment in those leaders. When looked at from an aggregate level,
the cumulative effect of effective selection decisions can lead to extraordinary performance and
becomes a true competitive advantage.
Personnel selection has been one of the central topics in the study of work behavior (Guion,
1998) and ultimately aims to identify the individuals who will comprise the workforce in any given
organization (Salgado, Viswesvaran, & Ones, 2001). As noted by Howard (2006), effective systems
boost organizational performance and allow individuals to excel by engaging in work they enjoy
because the organization gets the right people into the right jobs. Although much of the research lit-
erature has focused on selection issues associated with entry-level jobs (e.g., Campbell, McHenry, &
Wise, 1990), the selection systems at higher levels within organizations are just as, if not more,
critical. The variation in job performance is greater for those in leadership roles than for other positions
(Schmidt & Hunter, 1998). From a utility perspective, selecting a superior manager will result in
48% more output than an average manager (as compared to 32% for skilled workers and 19% for
lower-level jobs; Hunter, Schmidt, & Judiesch, 1990). Getting the right leaders into the top posi-
tions will stimulate organizations to grow and prosper (Howard, 2006). Indeed, the financial health
of an organization is predicated on the optimal selection and placement of employees (Hunter,
Schmidt, & Judiesch, 1990). The bottom line is that it pays to have an effective selection system,
especially when dealing with leader-level positions.
This chapter focuses on the selection of entry- and mid-level leaders within private sector orga-
nizations. From this point forward, the authors refer to this as the selection of leaders or leader-
ship selection. A review of contemporary organizational challenges unique to entry- and mid-level
leadership sets the stage for a discussion of themes and strategies for enhancing leadership effec-
tiveness and bench strength through improved leadership selection and assessment practice. Next,
a comprehensive set of leadership assessment tools and procedures are described. Real-world case
studies are used to illustrate the application and results achieved from enhanced selection programs.
Finally, we overview common business scenarios in which assessment is used to guide leadership
selection in the private sector.
Recently, Development Dimensions International (DDI), a consulting firm that aids organiza-
tions in hiring and promoting better employees and developing leadership talent, conducted a lead-
ership forecast study among 4,559 leaders and 944 human resources (HR) representatives across
the management hierarchy (17% first-level, 38% mid-level, 34% higher-level, and 10% senior level;
Bernthal & Wellins, 2005). They found that regardless of management level, respondents had
more confidence in senior leaders than either mid-level or low-level leaders (55% rated their confi-
dence high as compared to 39% and 37%, respectively). In another DDI study, HR leaders in 348
organizations around the world anticipated greater difficulty filling leadership positions that can
meet the challenge of a global economy in the future (Bernthal & Erker, 2005). Specifically, 52%
of respondents expected problems filling mid-level leadership positions with qualified candidates
and 28% anticipated problems filling first-level leader positions with qualified candidates. These
statistics are alarming in that they suggest considerable variance/gaps in the selection of entry- and
mid-level leaders—a job that is increasingly important as global growth, service expectations, and
technology complexity have made the quality of leaders at this level more critical to sustained orga-
nizational success. This situation is exacerbated by the shortage of qualified candidates, especially
in developing economies. Urgency in filling a position with few candidates is one of the most com-
mon sources for selection errors. Although poor leader selection at the higher levels (e.g., senior
executives and CEOs) has more serious consequences monetarily, as well as in terms of organiza-
tional effectiveness, poor selection at lower leader levels (i.e., entry- and mid-level) is considered
more pervasive (Howard, 2006).

CURRENT BUSINESS TRENDS AFFECTING LEADERSHIP SELECTION AND DEVELOPMENT
There are several current business trends that have exacerbated the difficulty in selecting and pro-
actively preparing individuals for leadership positions.

TREND 1. FLATTER AND LEANER ORGANIZATIONS HAVE LIMITED CRITICAL ON-THE-JOB DEVELOPMENT EXPERIENCES TO PREPARE LEADERS FOR HIGH-LEVEL ASSIGNMENTS
The delayering of organizations has diminished the number of opportunities people have to develop
and practice their leadership skills. In the 1980s and earlier, extensive management trainee programs,
with development opportunities and exposure to more senior leaders while occupying lower-level
positions (e.g., assistant managers), were effective means for identifying and developing future lead-
ers. Since then, organizations have reduced leadership levels and eliminated assistant manager posi-
tions. This has reduced the organizations’ ability to identify and develop those individuals with the
greatest leadership potential. Reduced organizational levels have made each transition into a new
level of leadership responsibility more difficult for newly promoted leaders. This has increased the
importance of other strategies to identify, select, and accelerate leaders.

TREND 2. BABY BOOMERS ARE RETIRING, LEAVING ORGANIZATIONS IN SEARCH OF EXPERIENCED LEADERS TO TAKE THEIR PLACE

When organizations have long-tenured employees, they have the benefit of an experienced work-
force, but they are at risk of mass retirements. Many organizations that we work with, from chemi-
cal processing to transportation companies to financial service organizations, expect up to 70%
of their senior team to retire over the next 5 years. Many organizations are totally unprepared to
manage this challenge. The challenge has been made more difficult if there is no systematic pro-
cess in place to select and develop the next generation of leaders. Succession planning, especially
at the mid-level, is largely absent as a strategy for managing the tremendous change in workforce
demographics. Couple this with the new demands associated with leading in a global economy and
the “perfect storm” is created. Underprepared leaders can be faced with an impossible situation.
Organizations do not have the time they had in the past to grow leaders through trial and error.
Changing business realities have increased the use of more visionary job analysis methods that
study the new leadership challenges facing the organization and select competencies most closely
associated with addressing those changes. The assessments are tailored to select and develop the
leaders against a new competency profile that is better aligned with the new business and cultural
strategies. Organizations have less confidence in the judgment of incumbent leaders as the sole
method of making decisions about who will become the new generation of leaders, especially dur-
ing times of change. Incumbent leader judgment is seen as skewed by strategies and skills that were
important in the past.

TREND 3. GLOBALIZATION AND AN INCREASE IN THE DIVERSITY OF FOLLOWERS HAVE CHANGED LEADERSHIP REQUIREMENTS
Whether leaders are working in their home countries or are taking an assignment in another part of
the world, the effect of globalization over the last decade has been profound. The Internet, technol-
ogy advancements, and global competitive pressure for resources have made the job of information
management, collaboration, managing in a matrixed structure, and decision-making for first- and
mid-level leaders much more complex. The ability to adapt one’s leadership behaviors to peers
from a different culture is a considerable challenge for leaders at this level. A growing number of
companies are using the assessment center method to assess leaders’ abilities (especially expatriate
and sales leaders) to meet challenges of effective leadership behaviors in a global economy. In these
contemporary virtual assessment centers, managerial candidates interact with trained assessors
from the local culture who role-play direct reports, peers, and customers. They must respond online
to critical business and HR issues. Performance in this process helps predict who will succeed in
jobs that require influencing and leading individuals from a different culture and managing the
complexity of making decisions in a global context. Participants are given detailed feedback on the
likely impact of leadership and managerial behaviors on associates from the culture in which they
currently work, or in which they will work in the future. For those who choose to take an extended
assignment in another country, the job challenges are compounded by challenges with adapting to
a different culture. It has been reported that 28% of expatriate leaders leave their assignment early
because of family concerns (Society for Human Resource Management, 2009). In addition, the
attrition rate among expatriates once they have returned to their home country is significant (27%
leave within the first year compared with 13% attrition for leaders who do not take an expatriate
assignment; Society for Human Resource Management, 2009) because they are challenged with
reassimilating into their companies as organizations struggle to fully utilize and recognize these
leaders’ new-found skills.

TREND 4. ORGANIZATIONAL COMMITMENT IS DECLINING


The frequent downsizing, mergers, and acquisitions over the last few decades have reduced employees’
commitment and trust in organizations (Kramer, 1999; Burke & Cooper, 2000). In these situations
the challenge of identifying the right leader is redefined as finding leaders who are engaged in their
role because morale and feelings of mistrust are pervasive among viable internal candidates. This
distrust can result in greater focus on managing one’s own career through “job hopping” and working
independently within small companies with an ownership share. Employees no longer have a strong
belief that the company will develop them for the next level. This trend has substantial impact on mil-
lennials (people who are entering the workforce after the turn of the century), who commonly have
lower trust in large companies’ development and promotion practices and the companies’ long-term
viability in the marketplace (Marston, 2007). Without immediate growth and concrete indications of
promotion, they are disposed to seek job alternatives more frequently than similar aged employees
in the past. Knowing that there is a shortage of talent at their level also makes them more likely to
job-hop. This change in employee expectations creates a dilemma for organizations. Organizations
must balance the need to take action to accelerate the development of high potentials with the risk
associated with the newly developed leaders becoming a retention problem if not rewarded (e.g.,
greater responsibility and compensation desired almost immediately). Organizations need to follow
through on raised expectations and explicit or implicit promises to leaders or risk their investment in
development because it can be quickly lost through turnover. Well-thought-out and well-planned leader-
ship identification and development programs, communicated clearly through explicit career plan-
ning processes, are critical to the engagement and retention of highly talented leaders.

TREND 5. LEADER READINESS FOR PROMOTION IS LOW, AND PROGRAMS DESIGNED TO INCREASE "SPEED TO PRODUCTIVITY" ARE BEING DEFINED AS A CRITICAL BUSINESS PROCESS
When new leaders do not have the skills to engage their team, the impact is damaging at two levels.
First, senior leaders fail to realize the results they had planned in terms of goals being achieved,
and second, the employees managed by these ineffective leaders become frustrated and lose focus,
resulting in lost workforce productivity and turnover. The complexity and rapid growth of business
today, and truncated efforts to prepare leaders for new roles, leaves leaders with far less time and
support to achieve mastery in their roles. Leadership mistakes are costly at a personal and organi-
zational level. Lack of confidence in leading a new team can result in apprehension or avoidance
in handling difficult leader challenges, requests to return to former positions, or micromanagement
of direct reports. These effects are very apparent in various organizational settings and industries.
For example, they appear in newly promoted leaders in a service industry who ask to return to the
crew because they lack the confidence, motivation, or skills to manage former peers' perfor-
mance. Ineffective leader behaviors are also apparent in technology and financial companies when
new leaders focus on directing others’ technical activities rather than coaching and building a suc-
cessful team.
This behavior is shaped by their “comfort zone”—that is, their greater confidence in their techni-
cal rather than leadership skills. It impacts the new sales manager whose "coaching" consists of
expounding upon what has worked for him or her in the past. This often results in sales associates
working around their leaders and developing their own, often unproductive, strategy to address
a new competitor or market reality. Consider a mid-level sales leader who has accountability for
$10 million of business. When this leader retires or is promoted, who will take her place? Will it be
a highly effective leader or a mediocre one? If we assume a 10% difference in the revenue results
achieved between the highly effective and mediocre leaders, even a small difference in revenue
performance can add up to the difference between success and failure. There is growing recogni-
tion that speed to productivity, enhanced through effective new leader selection, on-boarding, and
development programs, is an important lead measure of success in meeting and exceeding sales
goals. When direct sales costs and lost business opportunities are calculated across a large sales
organization, the dollar figure can run into the tens of millions of dollars. In this context, unwanted
turnover, failure to coach weak performing sales associates, ineffective sales tactics, and lost busi-
ness opportunities can be attributed to average versus highly effective leaders.
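To make the revenue arithmetic in this example explicit, the short sketch below uses the $10 million figure and 10% performance gap quoted above; the number of leaders is an assumption added solely to show how the gap aggregates across a large sales organization.

```python
# Illustrative figures: the first two come from the example in the text; the leader count is assumed.
revenue_per_leader = 10_000_000   # annual revenue accountability of one mid-level sales leader
performance_gap = 0.10            # assumed revenue difference between a highly effective and a mediocre leader
n_leaders = 50                    # assumed number of such leaders across a large sales organization

gap_per_leader = revenue_per_leader * performance_gap
print(f"Revenue at stake per leader per year : ${gap_per_leader:,.0f}")
print(f"Revenue at stake across the org      : ${gap_per_leader * n_leaders:,.0f}")
```

Under these assumptions the gap is roughly $1 million per leader per year, which aggregates to tens of millions of dollars across the organization, consistent with the figures cited in the text.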
In summary, the five trends outlined above have contributed to an increase in the pressure and
difficulty organizations face when defining and managing the applicant pool for entry- and mid-
level leader positions. Although this list is not exhaustive, it does highlight some of the more severe
contextual challenges that must be taken into account when designing a sustainable leadership
selection program. Our contention is that those organizations that can best anticipate the impact
of these trends and then take action to implement programs that select the right leaders will be
the organizations that are prepared with the right people in place to meet their future business
challenges.

CHANGING BEHAVIORAL REQUIREMENTS FOR LEADERS


There has been a shift in the behavioral requirements of entry- and mid-level leaders. Table 33.1 gives the results of job analyses targeting managerial positions in large, multinational Fortune 1,000 companies before and after 2000 (the job analyses were conducted by a large HR consulting firm). These data were drawn from several hundred job analyses conducted with these employers over the past 25 years. The 11 competencies most commonly identified as critical to success across a broad array of industries are given in Table 33.1. Each competency was examined for the percentage of job analyses in which it was identified as important for job success; competencies at the top of the list were identified as important more often than those at the bottom, and the rankings are shown separately for job analyses conducted before and after 2000. It should be noted that many other competencies were identified as important for leaders; the list represents only the 11 most often chosen by job content experts.

TABLE 33.1
Rank Order of Most Common Leadership Competencies for Major Global
Companies
1980–2000 2000–2008
1. Communications 1. Decision-making
2. Decision-making 2. Communication
3. Planning and organizing 3. Adaptability
4. Stress tolerance 4. Coaching
5. Gaining commitment 5. Gaining commitment
6. Adaptability 6. Planning and organizing
7. Job fit motivation 7. Building trust
8. Technical knowledge 8. Customer focus
9. Coaching 9. Delegation
10. Formal presentation 10. Building effective work relationships
11. Initiative 11. Innovation

The leaders before 2000 were required to be technical experts and good managers (making plan-
ning and organizing, decision-making, and formal presentation more likely to be identified) as well
as able to handle stress and adapt to change. After 2000, the priority of behaviors required for suc-
cess has changed. Although decision-making, communication, and adaptability remained high on
the list, coaching, building trust, delegation, and building effective work relationships were given
more weight. Today, effective managers are seen as balancing technical knowledge and experience
with interpersonal skills, and contemporary selection programs measure both technical and social ability. Customer focus and innovation are also valued as important in leadership roles.
Today, when it comes to human capital management, many managers and executives are becoming increasingly sophisticated. They recognize the importance of objective HR assessment, the need to establish the return on investment (ROI) of their HR programs, and the fact that information collected, compiled, and applied from assessment practices can have a tremendous impact on organizational success. High performance in leadership selection is best achieved when the specific requirements for people in leader roles are understood and when systematic methods of assessment are used as inputs to selection decisions. In this section, we outline common pitfalls that undermine selection and assessment; demonstrate the essential role that assessment plays in developing world-class leaders; provide an overview of assessment tools; and show how best to use assessment tools, systems, and processes to build a competitive advantage through people.
Quality leaders are essential drivers of business strategy and critical differentiators for com-
panies. Every day, organizations make critical business decisions, many of which are centered on
leaders, such as the following:

• Strengthening current leadership talent (e.g., through steady-state hiring, or developing current talent for future roles)
• Rapidly building workforce capacity (e.g., expanding a sales force, starting up a new
facility)
• Supporting a shift in strategic direction (e.g., moving from transaction-based selling to
consultative selling organization)
• Identifying high-potential leader candidates (e.g., following a merger, choosing among
internal or external candidates for key leadership positions)
• Growing an organizational culture rooted in strategically important competencies

ASSESSMENT PRINCIPLES FOR LEADERSHIP SKILLS AND POTENTIAL


To make the best possible entry- and mid-level leadership decisions, many organizations turn to var-
ious forms of assessment. Assessment helps organizations gather and organize information about
their current and potential future leaders. When assessments are used in the context of hiring, promotion, or development, better decisions are made. The best assessment techniques are aligned not only with leaders' job challenges but also with business and cultural strategies.
Specifically, effective leadership assessment (a) increases the probability that an individual who is
chosen has the skills, motivations, and experiences needed to succeed in a leadership position and
(b) provides insights into leadership potential and readiness that can guide an individual’s develop-
ment and deployment. To build selection criteria and tools that will assess leaders accurately, we
believe it is important to understand the role of the leader in ensuring an organization’s success.
Understanding the leader’s role shapes the selection criteria and approach needed to make accurate
predictions of candidates’ potential and performance. Fundamentally, from a psychological per-
spective, leaders maintain group cohesiveness, manage conflicts, sustain the group’s value to the
broader organization, and, most importantly, manage external events that may threaten the group’s
value to the organization or customers it serves. Leaders provide the structure through which pri-
orities are set and norms are established to ensure the group’s value to a broader organization is
sustained (Katz & Kahn, 1978).
Leadership in this deeper sense cannot be bestowed on an individual by an organization.
Although formal leaders can be given status and authority, leaders need to earn the role described
above. If leaders fail to gain personal influence, their role is limited to becoming the “enforcer”
who monitors compliance with rules. To add value, leaders need to provide more value and benefits than others do. According to Hollander (2006, 2008), doing so earns the leader idiosyncrasy credits. This bank of earned credit, or perceived greater value, gives leaders the influence to change established procedures, behaviors, or decisions that do not add value for internal or external customers. This enhanced power and credibility enables greater control over decisions, greater receptivity to their ideas, and rewards in the form of greater respect and monetary incentives.
Leaders need strong skills in influencing and engaging others to achieve important business and
people objectives and in making sound decisions and plans. In private sector organizations, the
value that leaders bring to the organization is translated into higher productivity, customer satisfac-
tion, and the effective management of competitive threats. Leaders ensure that, for every member, the net benefits of staying with the group outweigh those associated with leaving (Bandura, 2006; Hollander, 2008).
The interpersonal and social skills needed to succeed as leaders in a deeper sense are not fully
understood by organizations. Failing to understand their importance, companies erroneously use
leadership positions as a reward for past contribution and loyalty. They make these selection deci-
sions on the basis of reputation and past technical successes and accomplishments rather than
selecting candidates who have the skills and motivation to ensure their teams add value and stay
engaged. Candidates for leadership positions also often fail to understand the nature of the leader's role and seek these positions for status, power, and money. Such leaders frequently become a source of frustration for direct reports, exert little influence over direct reports' behaviors and attitudes, or become disillusioned and disengaged from the role of leading others.
To truly maximize the predictive power of entry- and mid-level leadership selection, a number of
important assessment principles should be taken into account.

ASSESSMENT PRINCIPLE 1. MULTIPLE SELECTION TECHNIQUES CREATE BETTER PREDICTION AND MITIGATE THE RISK OF SELECTION ERROR
Past performance and sales results achieved as an individual contributor have limited power for predicting future leadership performance when the uniqueness and complexity of the leadership role are significant and when there are substantial differences between the skill sets required of leaders and of individual contributors. Screening assessments of various types (e.g., application forms, experience and action benchmarking items) are very effective when used to identify the more qualified candidates. For these candidates, multiple selection methods (situational tests, personality and cognitive ability tests, batteries that more comprehensively screen candidates on dispositions, abilities, and past experience, as well as interviews and behavioral simulations) provide a more comprehensive view of potential and readiness. Simulations and tests are particularly important when entry-level leader candidates have little previous leadership experience. There are no silver bullets in leadership selection. When practitioners are trying to mitigate the risk of selection error, some level of redundancy is important and may even be valued. Given all of the sources of error variance (e.g., methods and evaluators) and the rather low correlations between many selection tools and job performance, it is beneficial to have multiple processes in place. Like an engineer designing a fail-safe system, the practitioner can use a multiple-hurdle selection process to help ensure that only the best candidates are selected.
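As a rough illustration of why such redundancy helps, the sketch below assumes, purely for the sake of the example, that each of two hurdles used alone would pass 20% of unqualified candidates and that the hurdles' errors are independent; under those assumptions the combined false-positive rate is roughly the product of the individual rates. Because real selection tools are positively correlated, the actual reduction would be smaller.

    # A minimal sketch under assumed, illustrative error rates.
    # false_pass = probability that an unqualified candidate clears a hurdle.
    hurdle_false_pass = [0.20, 0.20]   # assumed rates for two independent hurdles

    combined_false_pass = 1.0
    for rate in hurdle_false_pass:
        combined_false_pass *= rate    # independence assumption: multiply the rates

    print(f"Single hurdle: {hurdle_false_pass[0]:.0%} of unqualified candidates pass")
    print(f"Both hurdles:  {combined_false_pass:.0%} of unqualified candidates pass")   # 4%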

ASSESSMENT PRINCIPLE 2. LEADERSHIP SELECTION AND DEVELOPMENT, WHEN LEVERAGED TOGETHER, CAN HAVE SIGNIFICANT IMPACT
In a well-designed and implemented leadership succession process (hiring, promotion, and suc-
cession management), assessment should focus on all elements of the job requirements, whereas
development focuses on trainable elements. Not all leader requirements are equally developable.
Training and development can enhance knowledge and build proficiency in skill-based competen-
cies. However, less developable personal attributes such as mastery of complexity, self-development
orientation, and leadership motivation lend themselves to identification through assessment. A well-
designed assessment program will examine both nontrainable and trainable dimensions of success. This design recommendation is especially important for entry-level leader selection, when there is often little information about an individual contributor's standing on less trainable, positive leadership characteristics.

ASSESSMENT PRINCIPLE 3. TRANSPARENCY ABOUT ASSESSMENT RESULTS AND THEIR IMPACT ON CAREERS IS PARTICULARLY IMPORTANT WHEN SELECTING LEADERS
The best candidates for entry-level leadership positions are often the best and most valued indi-
vidual contributors, as well as external candidates who are often highly sought after by other com-
panies. Most individuals are resistant to evaluation, especially when they are uncertain of how it will affect their employment possibilities or careers. Explaining the importance of the leadership role and of objective assessment, both to the company and to the candidates' own career decisions, reduces the natural resistance to being evaluated and produces greater acceptance of the process and its results. Internal candidates also should know who will review the results and how the results will be used and will affect their careers. Having alternative career paths for valued employees who are not successful in the leadership selection process is critical to reducing the potential negative impact of failure.
All personnel selection program implementations begin by identifying what incumbents do and
know to be successful in the target job. To successfully obtain this information, we need to (a)
understand job challenges, and what incumbents do to be successful in managing those challenges
and for what purposes; and (b) understand how success in that position contributes to organiza-
tional success/business drivers. Through job analysis, we can develop success profiles (e.g., specific
behavioral competencies, knowledge, experience, and personal attributes such as motivation and
personality characteristics) that determine success in a target job or job family. Once the success
profiles are developed for entry- and mid-level leaders, there are several methods to assess the suc-
cess profile elements. In the next section, we discuss assessment tools and specific recommendations for enhancing selection implementations for leaders.

ASSESSMENT TOOLS AND TECHNIQUES


An ideal selection (and on-boarding) process for leadership positions consists of multiple hurdles.
Using multiple methods across multiple hurdles is a common way to screen out less qualified candidates efficiently and to evaluate the most qualified in greater depth, especially when efficiency and productivity are of critical importance. The process often begins with a screening of
candidates on the basis of an evaluation of relevant knowledge and experience, and then tests and
inventories are used to provide more information about skills, potential, and attributes. Remaining
candidates can be put through more in-depth assessments that can include simulations and inter-
views. Once candidates are hired, development plans are built upon their selection results and are
incorporated into the on-boarding process. This ensures that new hires are brought up to speed
quickly, thereby reducing time to contribution.
Various tools may be utilized to effectively evaluate a candidate’s capabilities for each of the
success profile components. Some methods (e.g., tests) assess basic psychological constructs or job
knowledge, whereas other methods (e.g., work samples, simulations, and interviews) are more con-
textual and directly measure critical job challenges and competencies. These methods may also
be placed along a continuum that ranges from measuring signs of behavior to samples of behavior
(Wernimont & Campbell, 1968). Signs of behavior include an individual’s personality or disposi-
tions and motivations related to job success, whereas samples of behavior refer to the demonstration
of behaviors related to job success. Thus, methods may also be categorized as those that provide
inferences about behavior (e.g., personality tests, cognitive tests), assess descriptions of work behav-
ior (e.g., biodata, interviews), or demonstrate behavior (e.g., job simulations) (Howard, 2006). This
is an important difference for organizations because the use of different methods requires different
validation strategies. Effective entry- and mid-level leadership assessment programs use various
assessment tools.
Whether selection methods measure constructs or focus on job content—that is, depict signs
(inferences) of behavior or samples (descriptions or demonstrations of behavior)—some have been
shown to be better predictors of leader performance than others. Although there is an abundance
of literature on the validity of selection predictors across jobs (mainly relying on entry-level jobs;
e.g., Hunter & Hunter, 1984), there is much less that has focused primarily on entry- and mid-level
leader selection. Considerable research is needed to confirm these inferences. The unique nature of
leadership positions will impact the results of studies targeted at this level.

SCREENING METHODS
Biographical Data
Biographical data or biodata measures are empirically developed instruments that quantify descriptions of past
activities and accomplishments, such as life experiences, hobbies, and other pursuits. According to
Mumford, Stokes, and Owens (1990), studying patterns of life history sheds light on the ecology
of human individuality. Although that may be true, Schmidt, Ones, and Hunter (1992) noted that
by the late 1980s and early 1990s in the United States, only 6.8% of firms had ever used biodata
in employment decisions, and, similarly, Shackleton and Newell (1997) estimated that 6% of firms
used biodata in all employment decisions and 11% in some cases. We believe that the use of biodata
for leadership selection may be on the rise. Biodata typically produces relatively little adverse impact and has the benefit of being verifiable through interviews or reference checks. Reilly and
Chao (1982) found that personal history correlated .38 with success in management positions, and
Stricker and Rock (1998) found that personality-based biodata measures effectively predicted lead-
ership potential. Rothstein and colleagues (1990) demonstrated a validity coefficient of .32 across
organizations with a sample of 11,000 first-line supervisors, which generalized across race, gender,
and age groups along with levels of education, work experience, and tenure with the company.
Similarly, Carlson, Scullen, Schmidt, Rothstein, and Erwin (1999) found that the validity for a bio-
data inventory predicting managerial success generalized across organizations (ρ = .53), and, again,
was predictive for men and women as well as for managers of all age groups, tenure with company,
and education levels. Further research on the predictive validity of life experiences for early success
as a leader is needed to substantiate these findings.

Behavioral Consistency Method


The behavioral consistency method of evaluating training and experience is a type of biodata
evaluation. Although some have categorized the behavioral consistency method as biodata (i.e.,
Hough & Oswald, 2000), most others have differentiated the two types of measures (e.g., Howard,
2006; Robertson & Smith, 2001; Schmidt & Hunter, 1998). Also called individual achievement
records/career achievement records/career achievement profiles, this method is based on the well-
established principle that the best predictor of future performance is past performance, and accord-
ing to Howard (2006) is a useful tool for leader selection. Applicants are asked to describe their past
achievements or experiences, either in writing or orally. Managers then score these achievements with the aid of anchored rating scales. This works well for mid-level leadership selection but is problematic when individuals have no formal leadership experience and are applying for an entry-level leadership job; in that case there are few relevant past behaviors to document, giving the method limited practical utility. Research has also shown that contemporary items (current or ongoing behaviors/
experiences) tend to be more valid than hypothetical/future (potential behaviors) or historical items
(past experiences), and items that ask respondents about others' opinions of them are more valid
than direct self-report items (Lefkowitz, Gebbia, Balsam, & Dunn, 1999). Although the behavioral
consistency method is time-consuming and costly to construct, Schmidt and Hunter (1998) noted
that the method is well worth the cost and effort for higher-level jobs, such as entry- and mid-level
leaders. There have been no empirical studies that have directly examined the predictive validity of
achievement profiles for entry- to mid-level leaders.

TESTS AND INVENTORIES


Cognitive Ability Tests
Since the earliest research on personnel selection, cognitive ability measures have been one of the
major methods used to differentiate among candidates. Specifically, various cog-
nitive ability tests (e.g., verbal, numerical, and spatial tests) intercorrelate, and the common variance
often operationalizes a general cognitive ability factor, often called g (Hough & Oswald, 2000).
Among the various measures that might be used for personnel selection, cognitive ability (g) is one
predictor that has demonstrated strong validity across most jobs. Interestingly, the main factor that
moderates the validity of g as a predictor of performance is the complexity of the job. Hence, tests
that measure g have their highest validity for complex jobs. General cognitive ability is an excellent predictor of academic achievement and professional expertise, but it may not predict as well the interpersonal complexity of leading in a business setting. Leadership complexity often centers on mastering ambiguous business situations and dealing with difficult social interaction and persuasion, and it differs somewhat from the complexity found in other professional positions such as engineering and finance. Nevertheless, Schmidt and Hunter (1998) reported
an adjusted correlation of .58 with performance for managers. These results are based on the large-
scale meta-analytic study conducted for the U.S. Department of Labor (Hunter, 1980; Hunter &
Hunter, 1984). However, it is important to note that there is likely to be a restriction of range in
cognitive ability as leaders move up the management hierarchy (Howard, 2006). Although cogni-
tive ability tests are unquestionably valid, they are commonly found to demonstrate considerable
adverse impact and do not measure all of the elements of leadership success. For this reason, many
practitioners (in the United States) have avoided using cognitive tests as the sole screening tool early
in the selection process (Sackett & Wilk, 1994).
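As a brief illustration of the range-restriction issue noted above, the sketch below applies the standard Thorndike Case II correction for direct range restriction. The observed validity of .30 and the assumed ratio of applicant-pool to incumbent standard deviations (1.5) are illustrative values chosen only to show the mechanics; they are not taken from the studies cited here.

    import math

    def correct_for_range_restriction(r_observed: float, u: float) -> float:
        """Thorndike Case II correction for direct range restriction.
        r_observed: validity in the restricted (e.g., incumbent) sample.
        u: ratio of unrestricted to restricted predictor standard deviations (u > 1)."""
        return (r_observed * u) / math.sqrt(1 + r_observed ** 2 * (u ** 2 - 1))

    # Assumed, illustrative values only.
    r_restricted = 0.30   # validity observed among already-selected leaders
    sd_ratio = 1.5        # applicant SD assumed to be 1.5 times the incumbent SD

    print(f"Corrected validity: {correct_for_range_restriction(r_restricted, sd_ratio):.2f}")  # about .43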
Another concept that, like practical intelligence, is related to but different from cognitive ability is emotional intelligence. Specifically, emotional intelligence (Goleman, 1996) refers to the ways in which people perceive, understand, and manage emotion. Although the term (and concept) is widespread among practitioners, there is little, if any, scientific literature that has demonstrated criterion-related validity for any leader position. Over time and with better assessment methods, we believe that emotional or social intelligence may prove to be as important as, or more important than, cognitive ability for predicting entry-level leadership success.

Personality Measures
Personality measurement has been extensively researched, and practitioners continue to explore the
practical value of personality for predicting leadership success. Within personnel selection, personality
predictors can be roughly divided into two categories: (a) general measures of adult personality (e.g., NEO-PI, 16PF, HPI) that are used to describe individual differences in behavior and (b) criterion-focused personality measures (e.g., integrity tests, violence scales, drug and alcohol scales) that are used to predict specific work outcomes (Salgado et al., 2001). Despite the extensive research on the Big Five
for predicting job performance (e.g., Barrick & Mount, 1991) and relatively high validity coefficients for
both conscientiousness (.31; Schmidt & Hunter, 1998) and integrity tests (.41; Ones, Viswesvaran, &
Schmidt, 1993), these measures may have low validity for management jobs depending upon how the
construct (e.g., conscientiousness) is defined (Hough & Oswald, 2000). For example, Hogan and Ones
(1997) defined conscientiousness as conformity and socially prescribed impulse control. On the basis
of this definition, Hough and Oswald believed that conscientiousness would not predict performance in jobs in which creativity and innovation are highly important (characteristics that are aspects of many leadership positions). Although Ones and Viswesvaran (1996) argued that broad personality domains are
better for predicting performance across job levels than narrow domains, others have shown that con-
scientiousness was not a valid predictor of managerial performance (Robertson, Barron, Gibbons,
MacIver, & Nyfield, 2000). This study suggests that practitioners should conduct concurrent validity studies before recommending personality tests as core elements of a selection process.
In contrast to Robertson et al. (2000), Bartram (2004) indicated that scores on scales of the
Occupational Personality Questionnaire (OPQ) and ratings of work behavior on the Inventory of
Management Competencies showed an average uncorrected validity of .48, with a range of .29 to .69
(zero-order correlations). Additionally, personality measures have also been shown to predict lead-
ership style (Hogan & Kaiser, 2005). More recently, Hogan, Davies, and Hogan (2007) proposed
a conceptual model that links certain personality variables to workplace behaviors. They outlined
various strategies for utilizing the validity evidence from prior research to apply to other positions,
and they used research from managerial jobs as examples.
Although there is some debate as to the level of analysis that should be used (e.g., Robertson &
Smith, 2001) and there have been some conflicting findings regarding the validity for leader selec-
tion, personality measures (whether conscientiousness or integrity) add a degree of validity (i.e.,
incremental validity) over and beyond cognitive ability. An advantage of personality measures
over cognitive ability measures is that personality measures do not demonstrate adverse impact
to the same extent as other measures (Hogan & Hogan, 1995). As with cognitive ability tests,
there are various group differences that tend to be associated with personality measures; however, these differences tend to involve sex rather than racial differences. Indeed, as noted previously, personality tests tend not to show significant group differences (i.e., adverse impact) with regard to racial groups. For example, Ones and Viswesvaran (1998) compared the scores of African Americans, Hispanics, Native Americans, Asians, and Whites and found trivial differences. They went on to note that group differences of these “trivial magnitudes” are not likely to cause any discernible adverse impact. With regard to sex differences on the Big Five factor of conscientiousness, women tend to score higher than men (Feingold, 1994). Hough, Oswald, and Ployhart (2001) note that women tend to score higher on “dependability” scales whereas men tend to score higher
on “achievement” scales. Similarly, Ones and Viswesvaran (1998) found that women tended to
score higher than men on overt integrity tests. Overall, the use of personality measures for making
employment decisions is accepted, and the validity evidence for certain scales is growing (Hogan,
Hogan, & Roberts, 1996). Indeed, in a survey of organizations from 20 countries and 959 organiza-
tions, Ryan, McFarland, Baron, and Page (1999) found two general trends: (a) personality test usage
around the globe is increasing and (b) personality measures are being used more frequently for
managerial and professional selection.
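Because adverse impact is a recurring concern in this discussion, the sketch below shows one common way practitioners quantify it: the four-fifths (80%) rule from the Uniform Guidelines, computed here on hypothetical applicant and hire counts invented purely for illustration.

    # Hypothetical applicant-flow data, invented only to illustrate the computation.
    applicants = {"group_a": 200, "group_b": 100}
    hires      = {"group_a": 60,  "group_b": 21}

    selection_rates = {g: hires[g] / applicants[g] for g in applicants}  # 0.30 and 0.21
    highest_rate = max(selection_rates.values())

    for group, rate in selection_rates.items():
        impact_ratio = rate / highest_rate
        flag = "potential adverse impact" if impact_ratio < 0.80 else "passes the four-fifths rule"
        print(f"{group}: selection rate {rate:.2f}, impact ratio {impact_ratio:.2f} ({flag})")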

Situational Judgment Tests


Situational judgment tests (SJTs) are characterized by items that provide a work-related scenario and
then ask test-takers to choose among a list of actions that respond to the scenario. These tests of deci-
sion-making and judgment in work settings can be constructed as low-fidelity job simulations (Salgado et al., 2001) and are used primarily at lower levels of management (Howard, 2006). Action benchmarking items (i.e., leadership challenges), in which respondents are presented with a leadership situation and asked to rate the effectiveness or appropriateness of various actions, are another type of SJT that is growing in use. Indeed, McDaniel, Morgeson, Finnegan, Campion, and Braverman (2001)
estimated the population validity of SJTs at .34 with job performance for leader and nonleader jobs.

Assessment Centers
In this review, assessment centers (ACs) are the only method developed specifically to measure
leadership and managerial performance. An AC is an evaluation method, or process, that usually
includes multiple simulations designed to assess dimensions (i.e., competencies, or categories of behavior) associated with success or failure in the target position and to simulate critical managerial job activities (Bray, 1982). An AC simulates critical and representative job challenges. It
may include written simulations (e.g., in-basket, fact finding, analysis, and planning exercises) and
interactive simulations (e.g., role-plays, presentation, group discussion, and business game; Howard,
2006). The groundbreaking work with ACs was the Management Progress Study at AT&T, which
led to the use of ACs as an aid in selecting first-line supervisors (Bray & Howard, 1983).
ACs are mainly used for managerial selection and have an impressive record of predictive
validity (.37; Schmidt & Hunter, 1998). Thornton and Rupp (2006) indicated that the estimates of
the relationship between AC ratings and management success range from .31 to .43, and Gaugler,
Rosenthal, Thornton, and Bentson (1987) found in their meta-analysis that there was an upper bound
of .63 under “optimal” conditions. AC researchers and practitioners are in conflict about the appro-
priate means to approach AC research. Most practitioners agree that competencies are categories
of behavior related to job success and not psychological constructs. Most research, however, treats competencies as constructs, and factor-analytic studies indicate that the key factors that emerge from an analysis of AC data are related to exercises rather than to the competencies that are the assessment target (Robertson & Smith, 2001). Lance (2008) recommended that dimensions should not be the
focus of the AC process, but that the exercises themselves should be rated. However, the influence
of exercises may have been exaggerated. Indeed, Arthur, Woehr, and Maldegen (2000) showed that
60% of the variance can be attributed to individuals and dimensions and only 11% to assessors and
exercises. This is in line with Howard’s (2008) assertion that an “integrative” perspective is more
appropriate. Specifically, Howard noted why dimensions (i.e., competencies/behaviors) are essential
to ACs and that focusing solely on exercises (i.e., the situations) is a mistake. Although the type of
exercise, or situation, does influence how individuals are judged, the assessors are primarily dif-
ferentiating people on the competencies of interest (Howard, 2006). Also, exercises tend to focus on
different elements of the key behaviors that constitute a competency, which reduces redundancy across exercises and lowers the correlations of the same competency across exercises.
Particular attention has been given to group differences associated with ACs. Goldstein, Riley,
and Yusko (1999) examined group mean differences on an exercise level and found that the dif-
ferences varied by exercise. Specifically, the mean differences appeared to be a function of the
cognitive component of the exercise. They found that those exercise components that required inter-
personal skills resulted in less adverse impact for African Americans than exercise components
that focused on cognitive abilities, on which Whites tended to score higher. Indeed, Bobrow and
Leonards (1997) developed an AC for first-line supervisors in a customer service division (i.e.,
requiring substantial interpersonal skills) and found no differences between Whites and minorities.
Overall, Hoffman and Thornton (1997) have reported that, although Whites tend to score higher for
overall AC ratings, the differences are typically lower than those found with cognitive ability. As
noted by Hough, Oswald, and Ployhart (2001), these racial differences appear to be associated with
measuring cognitive components. In addition to racial differences, Clapham and Fulford (1997)
examined age differences in overall AC ratings. They found that older adults tended to score lower
than younger workers and noted that this was a surprising result, particularly if experience, which
presumably comes with age, is important for developing managerial skills.

INTERVIEWS
Interviews are the most frequently used procedures in personnel selection across all countries, jobs,
and levels (McDaniel, Whetzel, Schmidt, & Maurer, 1994; Salgado et al., 2001) and likely the
most frequently utilized method for leadership selection. Indeed, it is estimated that practically 100% of selection processes use one or more interviews, although not all types of interviews are considered as valid, or as useful, as others. The employment interview has been the target of significant research (e.g., McDaniel et al., 1994). Krajewski and colleagues (2006) compared the
validity of situational versus past experience interviews for predicting managerial performance.
Using a sample of 157 applicants to managerial positions, they found that the experience-based
interview significantly predicted overall performance (.32), whereas the situational interview did
not (.09). Additionally, in an examination of the construct differences between the two interview
types, Krajewski and colleagues also showed that the experience-based interviews were highly
related to manager-relevant work sample measures (i.e., AC exercises), cognitive ability facets, and
personality traits.
Ultimately, the most effective selection system will use various methods, and this is especially
true for entry- and mid-level leadership jobs in which the job requires balance among experience,
knowledge, interpersonal competencies, leadership motivation, and personality. On the basis of
Schmidt and Hunter (1998), “incremental validity” can be translated into increases in utility (i.e.,
practical value). The issue of the incremental validity provided by different methods is useful for
assessing the extent to which combinations of methods are useful, or by contrast overly redun-
dant. For example, personality tests and assessment simulations measure different job requirements.
Using both methods together should produce incremental validity, thereby leading to a stronger
relationship with leader performance.
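To illustrate how incremental validity might be gauged, the sketch below combines two predictors with the standard two-predictor multiple correlation formula. The validities of .37 (a simulation-based assessment) and .31 (conscientiousness) echo figures cited earlier in the chapter, but the assumed predictor intercorrelation of .10 is an illustrative input, not a result from any specific study.

    import math

    def multiple_r(r1: float, r2: float, r12: float) -> float:
        """Multiple correlation of a criterion with two predictors, given each
        predictor's validity (r1, r2) and the predictor intercorrelation (r12)."""
        r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
        return math.sqrt(r_squared)

    simulation_validity = 0.37   # assessment center validity cited above
    personality_validity = 0.31  # conscientiousness validity cited above
    intercorrelation = 0.10      # assumed modest overlap between the two predictors

    combined = multiple_r(simulation_validity, personality_validity, intercorrelation)
    print(f"Best single predictor: .37; combined validity: {combined:.2f}")  # roughly .46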

CASE STUDIES OF LEADERSHIP SELECTION


The final section of this chapter will examine common, high-stakes organizational contexts in
which selection systems are likely to be deployed. Two cases are described, one that illustrates high-
velocity hiring of leaders for an organization start-up and a second that illustrates a promotional
process for leadership succession.

CASE 1. HIGH-VELOCITY HIRING FOR ENTRY- AND MID-LEVEL LEADERSHIP POSITIONS


Most private sector organizations, if successful, eventually face the prospect of starting up a new facility, plant, or store. This is a welcome outcome of success and growth in the business, but in these instances the pressure for continued success is very high. Senior leaders are under pressure to invest wisely. They must choose the right site, pick the right product for manufacturing, distributing, or selling, and install the right technology, all while facing deadlines for achieving a profitable ROI. In this complex mix of business issues is
the chance to hire the right people the first time. For “greenfield” facility start-ups, a new system of
operation can be deployed and an operating culture can be engineered because everything is new.
This situation can be contrasted with “brownfield” or retrofit work, in which existing facilities and
incumbent employees are pointed in a new direction. In this less enviable situation, current opera-
tions must overcome the natural inertia caused by years, if not decades, of work conducted in the
older operating style. It is much easier to start a new facility and hire a new workforce than to try
to change an existing one.
The authors have worked on many greenfield and brownfield facility start-ups. In our experi-
ence, these capital-intensive projects are entered into with a high degree of hope for return as well
as incredible pressure for the people involved to be successful. One start-up in particular had this
mix of factors at play from the start. A U.S.-based automotive manufacturer was entering into a
joint venture with two Japanese auto manufacturers to build a new engine manufacturing facility in
the Midwest. The goal was to build a mid-size automobile engine with better quality and less cost
than one that could be imported from international facilities. The pressure was high given that most
high-quality engines at the time were built at international plants. If quality and cost goals could not
be achieved, the plant would be seen as a failure.
Early in the planning for the plant, it was recognized that a new style of manufacturing would
be required to achieve the goals. The two Japanese partners were primarily providing manufactur-
ing technology for the plant. The work style or culture of the plant was to be determined by the
American company. With this in mind, the plant start-up team set out to specifically define working
requirements. In the new plant, lean operating procedures would be the core concept that defined
requirements for people and teams. Identification and elimination of waste is a core concept of lean
manufacturing. This requires everyone in the plant to work together to follow established operat-
ing procedures and to put improvements in place on a fast, continuous basis. Older manufacturing practices defined by hierarchical decision-making, a command-and-control management style, and adversarial manager-worker relationships would not work.
The core management team set out to define the requirements for leaders and team members.
The definition of success emphasized higher levels of teamwork, empowerment, coaching, and initiative than had been expected in the past. The management team struggled to break away
from past practices to define this new working culture. A job analysis was conducted with the new
management team. The job analysis could not rely on incumbent interviews or job observation as
the new jobs did not exist within older plants. A series of “visionary job analysis” discussions were
conducted with the plant manager and his direct reports, the functional managers within operations,
engineering, maintenance, and HR. Targets were set for behavior, motivation, and technical skills
that defined the type of leader that would be successful in the new facility. The selection system
design for leaders was especially critical. It was recognized that front- and mid-level leaders would
be critical for carrying out the day-to-day production tasks and for following the operating model
that the senior leaders had established. A rigorous program was required to identify those leaders
who would accept and excel in this progressive manufacturing environment. The recruitment effort
was complicated by the fact that leader candidates would be selected from existing manufacturing
facilities (where older manufacturing practices were the norm).
The first step in the hiring process was a comprehensive application blank that covered work
experience and technical skill. Special care was taken to fill the selection funnel with a broad pool
of leader applicants to achieve a high number of people with the potential to display the right skill
set and diversity mix. The next step involved a comprehensive test battery that targeted behavioral,
personality, and motivational targets that were consistent with a lean manufacturing environment.
Candidates were prioritized for the next step according to “fit” (as measured by the test battery) with
the defined roles leaders would play in the plant. A third step employed a day-in-the-life
AC. This simulation-based set of exercises involved prework about the fictitious company used for the simulation (its market, competitors, structure, and culture); an in-basket exercise that challenged the candidate on issues ranging from planning the production schedule to dealing with HR issues; a coaching exercise requiring the candidate to role-play with an assessor acting as a team member whose performance had declined over the last few weeks; and a peer exercise requiring the candidate to negotiate operating procedures and schedules with an assessor.
The AC was designed to reflect the operating environment of the new plant while giving the can-
didate the opportunity to display behaviors required in the leadership roles. Successful candidates
used behaviors that were consistent with the new start-up’s desired culture. The benefit of the AC
was realized in two ways. First, candidates had the chance to experience a realistic preview of the
leadership job for which they were applying. Second, hiring managers had the chance to see how
candidates performed in exercises that were very similar to the requirements of the job. The final
step in the selection process was a behavior-based interview during which candidates described how
they had performed in past jobs. Hiring managers asked questions that provided candidates with the
opportunity to describe their previous work and the results they had achieved in these situations.
Answers were evaluated against the target job requirements with relevancy, recency, and similarity
to the target job used as guiding evaluation criteria. At the end of the process, all of the data were
integrated, and successful candidates were given a contingent job offer (candidates needed to pass a reference check and drug screen to get the job).
Many organizations would not have taken such extensive steps to select leaders for the new facil-
ity. Each step in a hiring process takes time and investment of resources. In this case, the plant man-
agement team recognized the importance of selecting the right leaders to the eventual success of the
facility. The selection of the first leaders to come on board in a new facility is especially critical as
they play multiple roles early in the start-up such as interviewing the team member job candidates,
playing a significant coaching role in the development of the team members, and setting the tone
for the culture right from the beginning. For this plant start-up, new leaders were supported with
additional training on the concepts of lean manufacturing, coaching and team skills, as well as how
to conduct a successful interview. The results at this plant have been impressive. To date, the new
engine manufacturing facility is on target to reach its production goals and has started up on time
and on budget due in part to the successful hiring of its first wave of leaders. The workforce is mea-
sured on safety (number of safety violations and on-the-job injuries), quality (number of defects and
achievement of quality goals), and engagement (workforce survey conducted yearly). Engagement
levels at the new plant are the highest in the network of plants operated by the organization. The plant management team attributes the high engagement level of the workforce to
the quality of the front-line and mid-level leaders. This benchmark facility is held up as an example
for how a new start-up should be implemented and as an example of the culture of the future.

CASE 2. A LEADERSHIP PIPELINE APPROACH FOR ENTRY- AND MID-LEVEL LEADERSHIP POSITIONS
More and more organizations are executing talent management strategies to close the leadership
gap at entry- and mid-level leader positions and to have individuals in place and ready to drive the
results needed at this level. A robust pipeline for entry-level leaders encourages promotion from
within (which can be less risky than hiring from the outside because successful internal associates
have assimilated to the culture and have built a network of work relationships) and demonstrates to
employees with potential that the company supports a “grow from within” strategy.
The pipeline concept received considerable attention as a result of the book The Leadership
Pipeline: How to Build the Leadership Powered Company (Charan, Drotter, & Noel, 2000) and
was later expanded upon in Grow Your Own Leaders (Byham, Smith, & Paese, 2002). According
to Byham, Concelman, and Cosentino (2007), the leadership pipeline can be defined as “a critical
business process that provides organizations with a sustainable supply of quality leaders (at all lev-
els) to meet the challenges of today and tomorrow” (p. 3). A strong pipeline is an integrated assess-
ment and development approach that must be supported by senior management. It is not a single
program or tool; rather, it is a process that provides the right quantity and quality of leaders in time
to step up and meet pressing business challenges.
Many organizations apply a pipeline approach at senior levels of leadership through the use of
succession management systems where high-potentials are handpicked, groomed, developed, and
promoted. Traditional methods of succession management tend to fall apart when applied to lower
organizational levels because of the sheer number of incumbent leaders that fill lower-level leader-
ship ranks. A scalable pipeline strategy is needed at first- and mid-level leadership because leader-
ship assessment and development processes need to be applied to potentially large numbers of first- and second-level leaders. In addition, the efforts must be aligned with the wide-ranging expectations
of high-potential employees within the organization and those who are entering the workforce or
looking for new opportunities from a competitor organization. We believe that an effective pipeline
approach must (a) focus on early identification of leadership potential and readiness for a leadership
position and (b) provide individuals with accelerated development prior to their promotion so they
are confident in their leadership skills on day one.
Practitioners responsible for entry- and mid-level career management look to mitigate the risk of early leadership failure by integrating assessment and development solutions. The pressure to dem-
onstrate payback to the company for the expense of these programs in terms of time and money is
significant, and ROI analyses are critical for sustained implementations.
The company described below took a programmatic approach to leadership pipeline manage-
ment. This Fortune 500 technology company was interested in identifying individuals in its sales
force who had the motivation and potential to be effective district managers, a first-level leader
position. The company was committed to a “grow your own leadership” strategy because it recognized that internally promoted leaders performed more effectively than those hired from the outside. The organization's primary business involved the implementation of a complex set of product
lines. Leaders with minimal technical and sales experience specific to the company lacked cred-
ibility with the sales associates and were not effective coaches. Unfortunately, individuals promoted
from within (under the current process) were only marginally more effective as leaders. This was
disconcerting to HR leaders because they spent considerable resources in training internal leader
candidates prior to their promotion. Candidates for this leadership training program were picked
based on top management’s judgment. Internal and external selection strategies were not working
as the company faced new market opportunities, new competitors, and an increasingly complex set
of market offerings.
The first step taken to improve the program was to develop a success profile for leaders. The
selection criteria for new leaders had not changed for many years whereas the business challenges
for sales leaders had changed substantially. Working with senior sales leaders, the authors identified
the business drivers for sales leadership positions. Business drivers represent the critical activi-
ties on which incumbents must focus to successfully implement the company’s sales strategy. In
this case, they included identifying the best client opportunities in their districts and establishing
broader customer networks in these strategic accounts, coaching others to sell value rather than
focus on product features and benefits, and to engage client line executives in discussions about
their business needs and then relate them to technology solutions. The insights from the visionary
job analysis formed the base for the success profile for the position. It included specific competen-
cies, experiences, knowledge, and personal attributes needed to address these leadership challenges.
These insights were confirmed by successful mid-level managers and incumbent sales leaders. Key
job activities related to each performance area were documented and confirmed by a representative
sample of subject matter experts.
On the basis of this success profile, tools were developed and a process was designed to support
a new promotion process. The program included the following:

1. Realistic job previews and motivational profile: Current sales associates who had performed above average for 3 years had access to an online leadership career site. On the
site, they obtained information that provided a balanced view of a career in sales leader-
ship. This site included insights from current successful sales leaders about the challenges
they faced in transitioning into a leader role. Without identifying themselves, users had
access to a motivational fit inventory in which they documented the degree to which they
liked various work activities. Their responses were computer-scored, and they were given
immediate and confidential access to a profile of motivational matches and mismatches
with a leadership career. The associates were encouraged, but not required, to share the
results with their manager so that they (the associates) had the data and insights they needed
to make a sound decision about pursuing a leadership career.
2. Leadership insight inventory: If they decided to continue in the process, the associates
documented their relevant experience and knowledge online and completed a leadership
insights inventory that was predictive of the many leadership competencies and personal
attributes needed to be successful in that leadership position. The inventory consisted of
a battery of item types including situational judgment items, biodata items, and personal-
ity items. The items were grouped into subscales that mapped back to the target success
profile dimensions. Only managers of candidates had access to the results. These managers
received training in interpreting the results and provided feedback to candidates. Managers
were required to have a feedback discussion with candidates. Managers were to try to influ-
ence candidates’ career choice (some candidates were identified as ready for a leadership
job while others were not), but the final decision to proceed was left to the job candidate.
3. Online training: Candidates had access to online leadership courses that they completed
at their own pace and on their own time. After candidates completed the coursework, they
were encouraged to discuss their training results with their manager and decide jointly if
they were ready for the next step in the process.
4. Competency assessment: Candidates who decided to proceed in the process had access to
an online assessment of the new sales leader competencies. The online assessment asked
candidates to respond to a series of leadership challenges by rating the effectiveness of
various actions to address each challenge. The leadership challenges provided to candi-
dates were tailored to challenges identified by the job analysis/success profiling process.
Responses were computer-scored, and results were provided to a promotional panel who
conducted a behavioral interview to further evaluate motivation fit and skills/competency
readiness. The promotional panels, which decided whether or not to place candidates in the promotion pool, integrated the interview and assessment results. Candidates were
provided with feedback on the decisions. If the decision was not to proceed, there was a
discussion about other career paths.
5. Leadership training: Candidates placed in a promotion pool had access to more in-depth
leadership training.

In a concurrent validity study, 153 randomly selected incumbent managers completed the com-
petency assessment. Ratings of the leadership competencies of the participating managers were
made by their direct supervisors. The correlation between the predictor (the competency assess-
ment) and criterion (the ratings of supervisors) was .56. The validity of the competency assessment
appears to be one of the reasons why the new program is showing immediate positive results. Under
the new process, satisfaction with the slate of candidates and success rates in the final promotional
interview were much higher. There was no increase in turnover among sales associates who did
not succeed in the final steps of the promotional process, suggesting that those not selected saw the
process as fair. Performance in prepromotional training was substantially better than before the
process redesign. Plans are in place to measure longer-term productivity and success of the newly
promoted leaders and their teams.
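For readers who want to reproduce this kind of concurrent validation analysis, the sketch below computes a Pearson validity coefficient from paired predictor and criterion scores. The six score pairs are fabricated solely to show the mechanics and bear no relation to the study described above.

    from statistics import correlation  # available in Python 3.10 and later

    # Fabricated example data: competency assessment scores and supervisor ratings.
    assessment_scores = [72, 85, 60, 91, 78, 66]
    supervisor_ratings = [3.1, 4.2, 2.8, 4.5, 3.9, 3.0]

    validity = correlation(assessment_scores, supervisor_ratings)
    print(f"Concurrent validity coefficient: {validity:.2f}")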

CONCLUSIONS
Current trends in business suggest that the demand for high-quality leaders is high, the complexity of
their jobs has increased, and the process for readying future leaders is more difficult for organizations
to implement. Selection processes that have multiple phases and methods have the greatest likelihood
of success. These processes include well-developed and validated screening tools (e.g., cognitive tests,
biodata instruments, personality and/or situational tests) accompanied by more in-depth evaluations,
such as simulations, ACs, and structured interviews. Sound leadership selection processes that are tied
to development have the greatest impact on performance, especially when there is a sound implemen-
tation strategy (e.g., the way the need for assessment is communicated to participants; transparency
of results with those impacted by the assessment; clear accountability for the participants, managers
and HR; alignment with other HR systems; and success metrics that can be used to demonstrate ROI).
As the case studies demonstrate, differing organizational needs and contexts, such as a facility start-up or a leadership pipeline, have differing demand characteristics that impact the tools and processes used
and the implementation strategy. Other organizational contexts, such as mergers and acquisitions and
the desire to improve employee and customer engagement, also have differential impact on the assess-
ment targets, as well as implications for how they are measured and implemented.
It is clear that private organizations are not all in the same place when it comes to improving the
performance of leaders. Although there are bright spots that can be pointed to as examples that oth-
ers should follow, the lack of a systematic approach to identifying, selecting, and developing leaders
in many organizations leaves considerable opportunity for improvement. Leadership is a topic that has been written about extensively
in the academic and popular business press, and there is no lack of theory or advice on defining
leadership or conceptualizing what steps should be taken to improve leadership performance. There
is, however, a lack of agreement on the best way to assess leadership potential and performance and
how to get individuals ready for entry- and mid-level leadership roles. To be useful, future practice
and research should seek to evaluate the specific tools, processes, and implementation strategies that
will create the best ROI within specific organizational contexts. Modern, successful selection sys-
tems balance the science of leadership selection with practical realities of the business environment.
Sustainable, long-term impact is achieved by taking a holistic and practical approach to interpret-
ing organizational context, weighing the potential impact of various selection tools, and rigorously
executing the implementation plan.

REFERENCES
Arthur, W., Woehr, D. J., & Maldegen, R. (2000). Convergent and discriminant validity of assessment center
dimensions: A conceptual and empirical re-examination of the assessment center construct-related para-
dox. Journal of Management, 26, 813–835.
Bandura, A. (2006). Social cognitive theory. In S. Rogelberg (Ed.), Encyclopedia of industrial/organizational psychology. Beverly Hills, CA: Sage.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Bartram, D. (2004). Assessment in organizations. Applied Psychology: An International Review, 53, 237–259.
Bernthal, P. R., & Erker, S. (2005). Selection forecast: Recruiting and hiring talent. Pittsburgh, PA: Development
Dimensions International.
Bernthal, P. R., & Wellins, R. S. (2005). Leadership forecast 2005/2006: Best practices for tomorrow’s global
leaders. Pittsburgh, PA: Development Dimensions International.
Bray, D. W. (1982). The assessment center and the study of lives. American Psychologist, 37, 180–189.
Bray, D. W., & Howard, A. (1983). The AT&T Longitudinal Studies of Managers. New York, NY: The Guilford
Press.
Burke, R. J., & Cooper, C. L. (2000). The organization in crisis: Downsizing, restructuring, and privatization.
Hoboken, NJ: Blackwell.
Byham, T. M., Concelman, J., & Cosentino, C. (2007). Optimizing your leadership pipeline. Pittsburgh, PA:
Development Dimensions International.
Byham, W. C., Smith, A. B., & Paese, M. J. (2002). Grow your own leaders. Upper Saddle River, NJ: Prentice
Hall.
Campbell, J. P., McHenry, J. J., & Wise, L. L. (1990). Modeling job performance in a population of jobs.
Personnel Psychology, 43, 313–333.
Carlson, K. D., Scullen, S. E., Schmidt, F. L., Rothstein, H., & Erwin, F. (1999). Generalizable biographical data
validity can be achieved without multi-organizational development and keying. Personnel Psychology,
52, 731–755.
Charan, R., Drotter, S., & Noel, J. (2000). The leadership pipeline: How to build the leadership powered com-
pany. Hoboken, NJ: Jossey-Bass.
Feingold, A. (1994). Gender differences in personality: A meta-analysis. Psychological Bulletin, 116, 429–456.
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., & Bentson, C. (1987). Meta-analysis of assessment center
validity. Journal of Applied Psychology, 72, 493–511.
Goleman, D. (1996). Emotional intelligence. London, England: Bloomsbury.
Goldstein, H., Riley, Y., & Yusko, K. P. (1999). Exploration of black-white subgroup differences on inter-
personal constructs. In B. Smith (Chair), Subgroup differences in employment testing. Symposium
conducted at the 14th annual conference of the Society for Industrial and Organizational Psychology,
Atlanta, Georgia.
Guion, R. M. (1998). Assessment, measurement, and prediction for personnel decisions. Mahwah, NJ:
Lawrence Erlbaum.
Hoffman, C. C., & Thornton, G. C. (1997). Examining selection utility where competing predictors differ in
adverse impact. Personnel Psychology, 50, 455–470.
Hogan, J., Davies, S., & Hogan, R. (2007). Generalizing personality-based validity evidence. In S. M. McPhail
(Ed.), Alternative validation strategies: Developing new and leveraging existing validity evidence
(pp. 181–229). Hoboken, NJ: John Wiley & Sons.
Hogan, J., & Ones, D. S. (1997). Conscientiousness and integrity at work. The Handbook of Personality
Psychology. New York, NY: Academic Press.
Hogan, R., & Hogan, J. (1995). Hogan Personality Inventory manual. Tulsa, OK: Hogan Assessment
Systems.
Hogan, R., Hogan, J., & Roberts, B. W. (1996). Personality measurement and employment decisions: Questions
and answers. American Psychologist, 51, 469–477.
Hogan, R., & Kaiser, R. B. (2005). What we know about leadership. Review of General Psychology, 9,
169–180.
Hollander, E. (2006). Influence processes in leadership-followership: Inclusion and the Idiosyncrasy Credit
Model. In D. A. Hantula (Ed.), Theoretical & methodological advances in social & organizational psy-
chology: A tribute to Ralph Rosnow. Mahwah, NJ: Lawrence Erlbaum.
Hollander, E. (2008). Inclusive leadership and leader-follower relations: Concepts, research, and applications.
New York, NY: Routledge/Psychology Press.
Hough, L. M., & Oswald, F. L. (2000). Personnel selection: Looking toward the future—remembering the past.
Annual Review of Psychology, 51, 631–664.
Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants, detections and amelioration of
adverse impact in personnel selection procedures: Issues, evidence and lessons learned. International
Journal of Selection and Assessment, 9, 152–194.
Howard, A. (2006). Best practices in leader selection. In J. A. Conger & R. E. Riggio (Eds.), The practice
of leadership: Developing the next generation of leaders. San Francisco, CA: Jossey-Bass.
Hunter, J. E. (1980). Validity generalization for 12,000 jobs: An application of synthetic validity and validity
generalization to the general aptitude test battery (GATB). Washington, DC: U.S. Department of Labor:
U.S. Employment Service.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin, 96, 72–98.
Hunter, J. E., Schmidt, F. L., & Judiesch, M. K. (1990). Individual differences in output variability as a function
of job complexity. Journal of Applied Psychology, 75, 28–42.
Katz, D., & Kahn, R. L. (1978). The social psychology of organizations. Hoboken, NJ: John Wiley &
Sons, Inc.
Krajewski, H. T., Goffin, R. D., McCarthy, J. M., Rothstein, M. G., & Johnston, N. (2006). Comparing the
validity of structured interviews for managerial-level employees: Should we look to the past or focus on
the future? Journal of Occupational and Organizational Psychology, 79, 411–432.
Kramer, R. M. (1999). Trust and distrust in organizations: Emerging perspectives, enduring questions. Annual
Reviews in Psychology, 50, 569–598.
Lefkowitz, J., Gebbia, M. I., Balsam, T., & Dunn, L. (1999). Dimensions of biodata items and their relation-
ships to item validity. Journal of Occupational and Organizational Psychology, 72, 331–350.
Marston, C. (2007). Motivating the “What’s in it for me?” workforce. Hoboken, NJ: John Wiley & Sons.
McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. P. (2001). Use of situ-
ational judgment tests to predict job performance: A clarification of the literature. Journal of Applied
Psychology, 86, 730–740.
McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment inter-
views: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79, 599–616.
Mumford, M. D., Stokes, G. S., & Owens, W. A. (1990). Patterns of life history: The ecology of human indi-
viduality. Hillsdale, NJ: Lawrence Erlbaum.
Ones, D. S., & Viswesvaran, C. (1996). What do pre-employment customer service scales measure? Explorations
in construct validity and implications for personnel selection. Presented at the Annual Meeting for the
Society of Industrial and Organizational Psychology, San Diego, CA.
Ones, D. S., & Viswesvaran, C. (1998). Gender, age and race differences on overt integrity tests: Analyses
across four large-scale applicant data sets. Journal of Applied Psychology, 83, 35–42.
Robertson, I. T., Barron, H., Gibbons, P., MacIver, R., & Nyfield, G. (2000). Conscientiousness and managerial
performance. Journal of Occupational and Organizational Psychology, 66, 225–244.
Robertson, I. T., & Smith, M. (2001). Personnel selection. Journal of Occupational and Organizational
Psychology, 74, 441–472.
Rothstein, H. R., Schmidt, F. L., Erwin, F. W., Owens, W. A., & Sparks, C. P. (1990). Biographical data in
employment selection: Can validities be made generalizable? Journal of Applied Psychology, 75,
175–184.
Ryan, A. M., McFarland, L., Baron, H., & Page, R. (1999). An international look at selection practices: Nation
and culture as explanations for variability in practice. Personnel Psychology, 52, 359–391.
Sackett, P. R., & Wilk, S. L. (1994). Within-group norming and other forms of score adjustment in preemploy-
ment testing. American Psychologist, 49, 929–954.
Salgado, J. F., Viswesvaran, C., & Ones, D. S. (2001). Predictors used for personnel selection: An overview
of constructs, methods, and techniques. In N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran
(Eds.), Handbook of industrial, work, and organizational psychology. Vol. 1, Personnel psychology.
Thousand Oaks, CA: Sage.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology:
Practical and theoretical implications of 85 years of research findings. Psychological Bulletin,
124, 262–274.
Schmidt, F. L., Ones, D. S., & Hunter, J. E. (1992). Personnel selection. Annual Review of Psychology, 43,
671–710.
Shackleton, V., & Newell, S. (1997). International assessment and selection. In N. Anderson & P. Herriot
(Eds.), International handbook of selection and assessment. Chichester, England: Wiley.
Society for Human Resource Management. (2009). SHRM’s 2009 HR trend book. Alexandria, VA: Author.
Stricker, L. J., & Rock, D. A. (1998). Assessing leadership potential with a biographical measure of personality
traits. International Journal of Selection and Assessment, 6, 162–184.
Thornton, G. C., & Rupp, D. E. (2006). Assessment centers in human resource management: Strategies for
prediction, diagnosis, and development. Mahwah, NJ: Lawrence Erlbaum.
Wernimont, P. F., & Campbell, J. P. (1968). Signs, samples, and criteria. Journal of Applied Psychology, 52,
372–376.
34 Blue-Collar Selection in
Private Sector Organizations

Wanda J. Campbell and Robert A. Ramos

This chapter focuses on the issues that distinguish blue-collar/technical selection from other
selection procedures. The chapter begins with a definition of blue-collar/technical jobs and pro-
vides a comprehensive survey of the environment in which blue-collar jobs occur. Thereafter,
issues and challenges associated with planning and developing, implementing, and maintaining
selection procedures are addressed. Finally, recruitment and employee development are discussed
as positive measures that may increase the number of qualified, diverse candidates and provide
political support for selection procedures.

DEFINITION OF BLUE-COLLAR JOBS


The term “blue collar” was used in the chapter title to provide clarity for the reader. Because “blue
collar” can have a negative connotation, the preferred designation of “technical” jobs is used in the
remainder of this chapter.
Technical jobs refer to a wide array of jobs that range from entry-level positions that rely heavily
on physical exertion (e.g., laborers) to skilled trades jobs that require specialized knowledge, skills,
and abilities (e.g., electrician, mechanic, machinist, carpenter, welder, and pipefitter). Oftentimes,
the jobs occur in work environments that may be noisy, dirty, and subject to temperature extremes.
Skilled technical jobs are often associated with formalized apprenticeship programs, which may
provide for 3 or more years of structured classroom and on-the-job training. Very often, individuals
are hired into entry-level jobs and over time, along with adequate performance in training and on
the job, can be promoted to skilled jobs within a line of progression. For example, in the electric
utility industry, a new employee is typically hired into a nonskilled job such as a laborer position.
In a period of 3–5 years, most of the laborers are promoted to an apprenticeship program (e.g., plant
operator, maintenance, lineworker) culminating in the promotion to journeyman status, which sig-
nifies mastery of the work encompassed by the particular job progression.
Mastery tests and work samples are often used in the apprenticeship programs to ensure that
adequate job knowledge is accompanied by skill in performing job tasks. Frequently, job knowledge
and work sample tests must be passed in order to advance in the apprenticeship program. Those
apprentices who do not pass the training tests may be removed from the apprenticeship program and
required to seek other positions in order to remain with the organization.

ENVIRONMENT
Understanding the environment for technical jobs requires recognition of the distinctions from
management selection, the effect of recent economic changes, the perspectives of management and
labor organizations, and the history of distrust.

DISTINCTIONS BETWEEN TECHNICAL AND MANAGEMENT SELECTION


Large corporations generally have separate organizations responsible for management and techni-
cal staffing. Management staffing is frequently focused on college hires whereas technical staffing
typically relies on high school and technical/trade school graduates. One example of this approach
is provided by the military. Officers, especially at higher ranks, are typically products of military
academies or college ROTC programs whereas enlisted personnel are generally recruited out of
high school.
One potential negative consequence of this distinction is a pervasive lack of appreciation for the
knowledge, skills, and abilities that are required for technical jobs. The farther removed manage-
ment is from the work being done, the more inclined they are to minimize the qualifications required
for successful job performance. By contrast, line management and technical trainers whose work
brings them into much closer contact with technical workers have a much greater appreciation for
the skills that the jobs require. As an example, it is not unusual for corporate leaders to question
the necessity for technical workers to read, speak, and understand English when trying to address
shortfalls in the availability of skilled workers. The presumption is that the jobs are so simple that
it is not necessary for the incumbents to be conversant in English. By contrast, discussions with
line managers and technical trainers make clear that fluency in the English language is required for
effective job performance and employee safety.
The belief system surrounding the qualifications required to perform technical jobs permeates
all decisions regarding the planning and development of the selection procedure. Whereas testing
for management positions may be justified for a full day or more in some organizations, most selec-
tion programs for technical workers are limited to two or three hours of testing time (or less). If
jobs are perceived to be low level, then intensive screening is viewed as an unnecessary waste of
recruitment resources. The large volume of candidates that historically have sought technical posi-
tions, combined with sometimes inadequate prescreening, also supports the use of relatively short
testing procedures.

IMPACT OF ECONOMIC CHANGES ON SELECTION


Employee selection, like everything else, is subject to the economic cycle and issues of supply
and demand. The joint economic pressures of the aging workforce (Hedge, Borman, & Lammlein,
2006) combined with the relative scarcity of skilled labor may alter management’s perception of
the importance of technical jobs. This is especially true when the work performed by the techni-
cal employees represents the product or services that generate revenue. As an example, within the
electric utility industry as many as 50% of journeyman lineworkers are projected to retire within
the next 5 years. Lineworkers are instrumental in the reliable delivery of electricity, which is criti-
cal to the mission of the industry. The importance of the lineworker job justifies the labor-intensive
work samples that supplement cognitive ability tests as part of the selection process in many orga-
nizations. In a few cases, however, the demand is so high that organizations are willing to set aside
selection procedures to get enough people into the 3+-year apprenticeship program.
Given the sheer number of individuals required in technical jobs, the extensive training required
for apprentices to progress to the journeyman level (three or more years), and the shortage of quali-
fied individuals who are interested in pursuing these types of careers, practitioners have learned to
be careful to avoid even the appearance of disparaging jobs that are so critical to the accomplish-
ment of organizational goals.

WORKING WITH LINE MANAGEMENT


Line managers are extremely powerful people. They are directly responsible for the productivity
that underlies the organization’s profits, and this fact is never far from anyone’s mind. In many
industries, workforces have been cut to the bone, and line managers are resistant to any activity that
threatens their ability to achieve production goals; this includes the development of a selection pro-
cedure. When employees are completing job analysis questionnaires or taking experimental tests,
they are not performing their jobs. It is not always easy for line management to secure replacement
workers and doing so may involve payment of overtime wages to the available workers.

ROLE OF LABOR ORGANIZATIONS


Technical jobs may or may not be represented by a labor organization. Because the threat of a union
organizing and the repercussions of a union election are never far from the minds of those relying
on the efforts of technical workers, it is important to understand how labor organizations operate
and the implications of their presence on the selection process. Furthermore, those organizations
that operate outside of a union environment often face many of the same issues described in this
chapter. The chief difference between a represented and unrepresented workforce is the flexibility
and control that management has in dealing with the employment issues.
Labor leaders are charged with looking after the welfare of their members. Key issues for labor
organizations include job security, safety, wages, and benefits. Labor leaders take these responsibilities
very seriously, and one of the most effective ways of safeguarding their members’ interests is to gain
as much control as possible over policies and procedures that affect the represented workforce.
Obviously, selection procedures are of great interest to labor leaders. From the union perspective,
seniority is the preferred basis for selecting one represented employee over another in employment
decisions. Positive events such as promotions favor the employee with the most seniority, and nega-
tive events such as demotions or reductions in force have the most impact on those employees with
the least seniority. Any selection procedure that diminishes the impact of seniority systems will
receive close scrutiny by labor leaders.

TRADITION OF DISTRUST
The history of the labor movement is long, and the relationship between management and labor has
often been turbulent. There has been violence on both sides, and it takes considerable effort to over-
come historical difficulties, animosity, and distrust. It is convenient to brush away these problems as
ancient history, but the truth is that the problems persist. Management is not above using reductions
in force to enhance short-term profitability, and labor is not above calling strikes to obtain conces-
sions that organizations cannot long afford.

WORKING WITH LABOR ORGANIZATIONS


Like line management, labor organizations exert tremendous power and influence over the development
and implementation of selection procedures. The success of an attempt to introduce a selection pro-
cedure will be governed to a great extent by the history of relations between the organization and the
union. If the relationship has been an acrimonious one, then there may be little that can be done in the
short run to gain the acceptance of the labor leaders and their members (Bownas, 2000). By contrast, if
the relationship has been relatively positive, then the introduction of the concept of a selection procedure
would follow the same procedure as introducing change to any other organization.

SUMMARY
In most organizations, employee selection for technical jobs tends to be housed in a separate area
from management selection. Candidates for technical jobs tend to be graduates of high school or
technical schools as opposed to graduates of 4-year colleges. Organizations provide much of the job
training, frequently through formalized apprenticeship programs.
Corporate management sometimes underestimates the knowledge, skills, and abilities required
for technical jobs, and this belief system permeates all aspects of the planning and development of
the selection procedure. Perceptions of the importance of technical jobs may be modified as the
nation faces the joint economic pressures of an aging workforce combined with the relative scarcity
of skilled labor.
Understanding the perspectives of management and labor organizations and the tradition of
distrust is essential when working on selection procedures for technical jobs. As David Bownas
(2000) noted, management and labor have a great deal in common. Both have attained positions of
power and influence, and neither is likely to question their own judgment in matters of personnel
selection. Support from labor and management is essential to the development of a successful selec-
tion procedure.

PLANNING AND DEVELOPING THE SELECTION PROCESS: PSYCHOMETRIC AND PRACTICAL CONSIDERATIONS
Economic considerations are paramount in organizational decisions regarding whether to pro-
ceed with the development of a selection procedure. Once this decision has been made, two
critical factors considered when choosing a selection procedure are validity and subgroup differ-
ences. Other issues of importance include balancing the fidelity of the assessment with efficiency,
using cut-off scores versus rank-ordering, and becoming familiar with the jobs covered by the
labor contracts.

ECONOMIC CONSIDERATIONS
Before embarking on what could be a very expensive validity study, it is important to determine that
the need faced by the organization can be addressed by a new selection procedure. For example, a
turnover problem may be due to perceptions of inadequate compensation levels or poor management
practices.
It is necessary to consider the number of individuals who will flow through the new selection
process as well as the company’s investment in the employees selected. If the number of employees
selected is small and the investment in development is nominal, a cost benefit analysis might lead
to the conclusion that the expense of a new selection process is not worth the perceived benefit. In
general, the larger the number of individuals who flow through a selection process or the greater the
expenses in time and development, the greater the potential benefit of a selection procedure to the
corporation. As an example, the electric utility industry typically invests approximately $250,000 or
more in the training of each nuclear plant operator. Thus, the decision to invest in a comprehensive
selection process is a prudent business decision.
The resources that organizations are willing to invest in a selection procedure are often governed
by their perception of the value of the job. As indicated earlier, sophisticated, time-consuming, and
expensive processes that may be acceptable for high level or mission critical jobs are very much
different from what is acceptable for lower level jobs. In one large corporation, the service repre-
sentative position, a relatively low-level position, was viewed as a mission critical job because it
was the primary interface with customers. As a consequence, the company was willing to expend
considerable resources on this nonmanagement job.
One of the advantages of valid selection procedures is the economic benefit associated with the
selective identification of individuals placed into positions. A valid selection procedure provides
the opportunity to identify candidates that can increase the overall level of job performance
for the organization. The increase in average job performance can be translated into economic
values or other significant organizational outcomes such as the reduction of training costs due to
turnover. As the number of people selected, the magnitude of the validity, and the variability of
job performance increases, so does the economic benefit due to selection (Cascio, 1982; Cascio &
Ramos, 1986).
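To make this relationship concrete, the sketch below applies the standard Brogden-Cronbach-Gleser utility equation: the annual gain equals the number selected, times the validity, times the dollar value of one standard deviation of job performance (SDy), times the mean standardized predictor score of those selected, minus total testing costs. All inputs (number hired, validity, SDy, selection ratio, and per-candidate testing cost) are hypothetical values chosen only for illustration and would need to be replaced with estimates from the organization in question.

from math import exp, pi, sqrt
from statistics import NormalDist

def selection_utility(n_hired, validity, sd_y, selection_ratio, cost_per_candidate):
    # Mean standardized predictor score of those hired under top-down selection:
    # the normal-curve ordinate at the cut point divided by the selection ratio.
    cut = NormalDist().inv_cdf(1 - selection_ratio)
    ordinate = exp(-cut ** 2 / 2) / sqrt(2 * pi)
    mean_z_hired = ordinate / selection_ratio

    n_tested = n_hired / selection_ratio
    performance_gain = n_hired * validity * sd_y * mean_z_hired
    testing_cost = n_tested * cost_per_candidate
    return performance_gain - testing_cost

# Hypothetical inputs: 50 apprentices hired from 500 candidates (selection ratio .10),
# an operational validity of .50, SDy of $20,000, and $100 of testing cost per candidate.
print(round(selection_utility(50, 0.50, 20_000, 0.10, 100)))

With these illustrative inputs, the estimated gain is on the order of $800,000 per year, which is why the calculation tends to favor rigorous selection whenever many candidates flow through the process or the organization's investment in each hire is large.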

VALIDITY OF TECHNICAL SELECTION PROCEDURES


There have not been many published meta-analyses of technical validity studies. Fortunately, we
have several classic reviews and meta-analyses that provide guidance on the significant predictors
of technical job performance.
With respect to trades and crafts jobs, Ghiselli (1966) found that tests of intellectual abilities
and tests of mechanical and spatial abilities were predictive of training criteria, with correlations
on the order of .40. Tests of perceptual accuracy had correlations in the low .30s with training
criteria. Motor abilities and personality tests were associated with correlations slightly lower
than .20 for training criteria. Similar results were found for mechanical repairmen and electrical
workers. The results for unskilled and semi-skilled industrial workers showed a similar pattern
for training with somewhat lower correlations. One exception was that motor abilities were much
more predictive of training success in unskilled and semi-skilled jobs than they were for more
highly skilled jobs. Prediction of job performance was less impressive for the trades and crafts
jobs, with all five types of tests having correlations ranging from the high .10s to the mid-.20s.
The correlations for unskilled and semi-skilled jobs were also clustered in a narrow band, but at
a somewhat lower level.
Since the time of Ghiselli, there have been substantial advancements in the area of meta-analysis
(Hunter, Schmidt, & Le, 2006; Schmidt, Hunter, & Pearlman, 1981). A meta-analysis performed for
construction and skilled trades for the electric utility industry (Jones & Gottschalk, 1988) collected
141 studies and included 2,062 coefficients, 97% of which involved training or job proficiency
criteria. Tables 34.1 and 34.2 present the meta-analysis summaries corrected for sampling error and
attenuation in the criterion for training criteria and job proficiency criteria validities, respectively.

TABLE 34.1
Meta-Analysis Summary Correcting for Sampling Error, Criteria and Predictor
Attenuation, and Range Restriction

Predictor Type                          Studies   Coefficients   Total Sample   Mean       95% CI   95% CI   90% Credibility
                                                  Reviewed       Represented    Validity   Lower    Upper    Value
Biographical information                   2           11            1,990        .23       .13      .33         .16
General mental ability                     9           12            1,492        .66       .66      .66         .66
Mechanical ability                        27           48           35,453        .76       .76      .76         .76
Memory                                     3           10            2,998        .55       .55      .55         .55
Motor ability                             20          150           29,587        .40       .40      .40         .40
Perceptual speed                          28           81           33,308        .65       .65      .65         .65
Personality and interest inventories       7           37            8,375        .22       .22      .22         .22
Quantitative ability                      39           87           57,309        .77       .77      .77         .77
Reasoning ability                          5           18            1,598        .64       .64      .64         .64
Spatial ability                           39          122           41,838        .70       .70      .70         .70
Verbal ability                            31           70           14,419        .68       .68      .68         .68

Source: From Jones, D. P., & Gottschalk, R. J., Validation of selection procedures for electric utility construction and
skilled trades occupations: Literature review and meta-analysis of related validation studies, Edison Electric
Institute, Washington, DC, 1988. With permission.
TABLE 34.2
Meta-Analysis Summary Corrected for Sampling Error and Attenuation in the
Job Proficiency Criterion

Predictor Type                          Studies   Coefficients   Total Sample   Mean       95% CI   95% CI   90% Credibility
                                                  Reviewed       Represented    Validity   Lower    Upper    Value
Biographical information                   2           11            1,870       −.02      −.02     −.02        −.02
General mental ability                    16           37            2,274        .33       .33      .33         .33
Mechanical ability                        45           90           10,363        .53       .53      .53         .53
Memory                                     3            9              700        .30       .30      .30         .30
Motor ability                             31          200           29,701        .39       .39      .39         .39
Perceptual speed                          33           99           12,260        .41       .41      .41         .41
Personality and interest inventories       7           52            5,398        .04       .04      .04         .04
Quantitative ability                      53          119           17,032        .45       .45      .45         .45
Reasoning ability                          6           15            1,813        .34       .34      .34         .34
Spatial ability                           52          198           25,762        .49       .49      .49         .49
Verbal ability                            44           92            8,996        .37       .37      .37         .37

Source: From Jones, D. P., & Gottschalk, R. J., Validation of selection procedures for electric utility construction and
skilled trades occupations: Literature review and meta-analysis of related validation studies, Edison Electric
Institute, Washington, DC, 1988. With permission.

Consistent with previous research, correlations for training criteria were generally higher
than those for job proficiency criteria. The four highest mean validity coefficients for training
criteria were for mechanical ability (.76), quantitative ability (.77), spatial ability (.70), and verbal
ability (.68). The three lowest mean validity coefficients for training criteria were for biographical
information (.23), personality and interest inventories (.22), and motor ability (.40). The four high-
est mean validity coefficients for job proficiency criteria were for mechanical ability (.53), spatial
ability (.49), quantitative ability (.45), and perceptual speed (.41). The two lowest correlation coeffi-
cients for job proficiency criteria were biographical information (−.02) and personality and interest
inventories (.04).
The Jones and Gottschalk (1988) meta-analyses presented somewhat lower “corrected validities”
than past meta-analytic reviews. They cited the following reasons for the reduced corrected validi-
ties (p. 60):

• Relatively conservative coding rules (e.g., coding reported nonsignificant correlations as zero)
• Relatively conservative assumptions about the mean level of criterion reliability (e.g., .70)
and the amount of variance in criterion reliability among studies
• A focus on the validity of single tests rather than overall test batteries
• A focus on coding the validity of a predictor against the “overall criterion” measure
available in a given validity study, as opposed to focusing on correlations with individual
criterion ratings
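
To show what such corrections involve at the most basic level, the following sketch computes a sample-size-weighted mean validity for a handful of hypothetical studies, estimates how much of the observed variance is attributable to sampling error alone, and disattenuates the mean validity for criterion unreliability using the .70 mean criterion reliability noted above. The study values are invented for illustration, and the range-restriction correction used in the actual meta-analysis is omitted.

from math import sqrt

# Hypothetical (observed correlation, sample size) pairs -- not real study data.
studies = [(0.30, 120), (0.42, 250), (0.25, 80), (0.38, 400)]

n_total = sum(n for _, n in studies)
mean_r = sum(r * n for r, n in studies) / n_total            # sample-size-weighted mean

# Observed variance of the correlations and the portion expected from sampling error.
var_obs = sum(n * (r - mean_r) ** 2 for r, n in studies) / n_total
avg_n = n_total / len(studies)
var_sampling = (1 - mean_r ** 2) ** 2 / (avg_n - 1)
residual_sd = sqrt(max(var_obs - var_sampling, 0))

# Correct the mean validity for unreliability in the criterion (r_yy assumed to be .70).
r_yy = 0.70
corrected_validity = mean_r / sqrt(r_yy)

print(f"weighted mean r = {mean_r:.2f}, residual SD = {residual_sd:.3f}, "
      f"validity corrected for criterion unreliability = {corrected_validity:.2f}")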

Schmidt, Hunter, Pearlman, and Shane (1979) presented validity generalization data (mean
correlations) on three technical jobs: mechanical repairman, r = .78 with mechanical principles
tests; bench workers, r = .39 with finger dexterity tests; and machine tenders, r = .05 with spatial
ability tests.
Schmidt et al. (1981) reported average validities (mean correlations) for 35 mostly technical jobs
obtained with the Army Classification Battery. Validities for two samples were as follows:

• Vocabulary: .37 and .35
• Arithmetic reasoning: .41 and .40
• Spatial aptitude: .34 and .37
• Mechanical aptitude: .38 and .36
• Clerical speed: .26 and .21
• Radiocode aptitude: .29 and .29
• Shop mechanics: .31 and .32
• Automotive information: .22 and .27
• Electronics information: .32 and .27
• Radio information: .17 and .15

A common thread running through the results presented in this section is that cognitive ability
tests of verbal, math, spatial, and mechanical comprehension are valid predictors of job performance
for the broad spectrum of technical jobs. Biographical, personality, and interest inventories—all
self-report measures—provided near-zero validities for job performance.

SUBGROUP DIFFERENCES ON TEST PERFORMANCE


As indicated above, cognitive predictors are the most valid predictors of job performance for
technical jobs. Hunter and Hunter (1984) stated that cognitive ability tests are considered the
most valid predictors for nearly all jobs. However, cognitive tests typically result in group
differences with Black-White mean differences of about one standard deviation and Hispanic-
White mean differences of about .60 standard deviations. Since the late 1970s, research has
explored the use of alternative selection measures with less adverse impact in response to the
encouragement provided by the federal Uniform Guidelines on Employee Selection Procedures
(1978).
The focus of the search for alternative selection procedures has typically been on noncog-
nitive self-report measures such as biodata, personality, and interest measures. The goal of
this research has been to identify alternative selection measures that would allow organiza-
tions to select more minorities and increase organizational diversity (Hough, 1998). However,
self-report measures for technical jobs have been found to have low reliability and validity.
As a consequence, their use in technical selection batteries conflicts with the goal of maximizing utility.
Further, research shows that the combination of cognitive and noncognitive predictors results
in no or minimal change in adverse impact. Noncognitive predictors do not compensate for
the differences due to cognitive ability tests (Pulakos & Schmitt, 1996; Ryan, Ployhart, &
Friedel, 1998).
A major assumption of research on self-report measures is that they have little or no adverse
impact on minorities. However, in our own research and that of other colleagues
(personal communications), White applicants often score substantially higher than minorities on
self-report measures. In particular, women, Hispanics, and Asian Americans may score well below
Whites in a technical selection environment. Thus, the use of these measures can contribute addi-
tional adverse impact to a selection process. Before introducing a self-report measure into a selec-
tion battery in order to reduce adverse impact, it would be prudent to determine empirically that the
measure does, in fact, produce the desired effect.
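Whether a particular measure actually reduces adverse impact can be checked directly from local applicant data. The sketch below computes the standardized subgroup difference (d) and the four-fifths impact ratio at a proposed cut score; the score lists and the cut score are hypothetical placeholders for whatever data the organization has available.

from math import sqrt
from statistics import mean, stdev

def subgroup_check(majority, minority, cut_score):
    # Standardized mean difference (d) using the pooled within-group standard deviation.
    n1, n2 = len(majority), len(minority)
    s1, s2 = stdev(majority), stdev(minority)
    pooled_sd = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    d = (mean(majority) - mean(minority)) / pooled_sd

    # Four-fifths (adverse impact) ratio of passing rates at the proposed cut score.
    pass_majority = sum(s >= cut_score for s in majority) / n1
    pass_minority = sum(s >= cut_score for s in minority) / n2
    impact_ratio = pass_minority / pass_majority if pass_majority else float("nan")
    return d, impact_ratio

# Hypothetical applicant scores for two groups and a proposed cut score of 50.
majority_scores = [62, 55, 48, 71, 53, 59, 66, 44, 57, 61]
minority_scores = [51, 43, 58, 39, 47, 55, 42, 49, 46, 52]
d, ratio = subgroup_check(majority_scores, minority_scores, cut_score=50)
print(f"d = {d:.2f}, four-fifths ratio = {ratio:.2f}")

An impact ratio below .80 is the conventional flag for adverse impact under the Uniform Guidelines (1978).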
Another major problem associated with noncognitive measures is the tendency (and ability) to
“fake good” on tests used for employment purposes. In general, the faking issue is associated with
any self-report measure used in employment testing whose responses are not verified. One of the
experimental designs often used to detect faking on noncognitive measures is to compare the scores
of job incumbents from the original validity study with job applicants. Presumably, job incumbents
participating in a validity study will respond honestly on noncognitive measures because there are
no negative consequences associated with their test performance. Applicants, on the other hand,
are highly motivated to present themselves as positively as they can in order to obtain employment.
One would expect job incumbents to generally score higher on valid employment measures because
successful job performance is associated with having or developing increasing levels of skills and
abilities required by the job.
Research examining the differences in responses to personality measures by applicants ver-
sus nonapplicant groups typically show significant and substantially higher scores in the applicant
group. In a meta-analysis of job applicant faking on personality measures, Birkeland, Manson,
Kisamore, Brannick, and Smith (2006) found, across all jobs, that applicants scored significantly
higher than nonapplicants on extraversion, emotional stability, conscientiousness, and openness. In
addition, this study showed that job applicants distorted their responses on personality dimensions
that were viewed as particularly job relevant. These results show that intentional faking is the norm
for a highly motivated group in a high stakes testing environment such as employment testing.
Finally, the research evidence shows that faking behavior on the part of applicants can have a
negative impact on the predictive and construct validity of noncognitive measures. When compared
to the original validation sample, Hough (1998) has found that validity is often lower in applicant
groups. Douglas, McDaniel, and Snell (1996) presented evidence that criterion-related validity may
decrease or disappear for applicant groups. Douglas et al. also reported that the factor structure of
personality tests differs between groups instructed to fake and groups responding honestly.
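The incumbent-versus-applicant design described above amounts to computing a per-scale effect size. The fragment below standardizes the applicant-minus-incumbent difference in incumbent standard deviation units and flags a scale when that difference exceeds a chosen threshold; the scale scores and the .50 SD threshold are hypothetical and serve only to illustrate the comparison.

from statistics import mean, stdev

def inflation_check(incumbent_scores, applicant_scores, flag_at=0.50):
    # Applicant-minus-incumbent difference expressed in incumbent SD units.
    difference = mean(applicant_scores) - mean(incumbent_scores)
    d = difference / stdev(incumbent_scores)
    return d, d > flag_at

# Hypothetical conscientiousness scale scores from a validity-study sample and an applicant pool.
incumbents = [3.6, 3.9, 3.4, 4.0, 3.7, 3.5, 3.8, 3.6]
applicants = [3.9, 4.1, 3.7, 4.2, 3.8, 4.0, 3.9, 3.8]
d, flagged = inflation_check(incumbents, applicants)
print(f"applicant score inflation = {d:.2f} SD, flagged = {flagged}")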

BALANCING FIDELITY WITH EFFICIENCY


Given the validity evidence presented earlier, there seems to be little incentive to include noncogni-
tive components in test batteries for technical positions. A dilemma faced by practitioners working
with technical jobs is balancing the desire for tests that resemble aspects of the jobs with the desire
to have an efficient test that may be used to predict job performance across a range of related jobs.
Work samples are sometimes perceived to have lower mean group differences than cognitive ability
tests. However, if work samples are used exclusively, it may be necessary to develop several simula-
tions to adequately cover the important job duties. Computer-based work samples do not require
highly paid supervisors or trainers to serve as raters but may be rather expensive to develop or
modify. Thus, cost considerations may limit the viability of work samples for many jobs. One way
to take advantage of the cost effectiveness of cognitive ability tests and utilize the realism of work
samples is to combine the two. For example, in the electric utility industry, cognitive ability tests
are used initially to screen candidates for lineworker positions. Those candidates who possess the
cognitive abilities required for the job are then trained and tested on their ability to climb poles.

CUT-OFF SCORES VERSUS TOP-DOWN SELECTION


Once the selection procedure has been agreed upon, the next step is establishing the standard for
successful test performance. Top-down selection has greater utility than the use of cut-off scores,
but this utility comes at the price of higher validity requirements based on the federal Uniform
Guidelines on Employee Selection Procedures (1978). Top-down selection is practical when test-
ing is done periodically for large groups of candidates; however, in a continuous testing environ-
ment, maintaining updated databases is cumbersome, and providing feedback is more complicated.
Furthermore, the use of cut-off scores results in less adverse impact than top-down selection. Lastly,
the use of cut-off scores provides a compromise position in represented environments: The most
senior employee who passes the selection test is chosen.
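The two decision rules can be stated compactly, as in the sketch below, in which each candidate is a hypothetical record containing a test score and years of seniority. Top-down selection simply takes the highest scorers, whereas the compromise rule takes the most senior candidates from among those who meet or exceed the cut score.

# Each candidate: (name, test score, years of seniority) -- hypothetical records.
candidates = [
    ("A", 82, 4), ("B", 74, 11), ("C", 69, 15),
    ("D", 91, 2), ("E", 77, 8),  ("F", 58, 20),
]

def top_down(pool, openings):
    # Rank by test score and take the highest scorers.
    return sorted(pool, key=lambda c: c[1], reverse=True)[:openings]

def senior_qualified(pool, openings, cut_score):
    # Screen on the cut score, then award the openings by seniority.
    qualified = [c for c in pool if c[1] >= cut_score]
    return sorted(qualified, key=lambda c: c[2], reverse=True)[:openings]

print("top-down:        ", [c[0] for c in top_down(candidates, 2)])
print("senior qualified:", [c[0] for c in senior_qualified(candidates, 2, cut_score=70)])

Under the compromise rule, seniority still drives the final choice, but only among candidates who have demonstrated the minimum level of tested ability.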
KNOWING WHICH JOBS ARE COVERED BY THE LABOR CONTRACTS


It is imperative that the psychologist become acquainted with the jobs covered by labor contracts
and the labor organizations involved prior to beginning work on test development. Sometimes orga-
nizations interact with more than one union, and it is quite possible for the same job to be covered by
more than one labor organization based on geography and the history of mergers and acquisitions.
Once the psychologist is familiar with the provisions of the relevant labor contracts, it is neces-
sary to understand the constituent needs that can be met by the selection procedure as well as the
sources of resistance for management and labor organizations. Selection procedures require the
cooperation of line management and labor. Gaining the acceptance and cooperation of multiple
labor organizations is particularly challenging. Oftentimes, the tests are developed with the assis-
tance of one or two unions and transported to the sites of other units at a later point.

SUMMARY
Selection procedures should only be developed if doing so provides a solution to the organization’s
problem. Once this is determined, the selection procedure’s validity and subgroup differences need
to be examined. A review of the sparse published literature on technical jobs revealed that cogni-
tive ability tests of verbal, math, spatial, and mechanical comprehension are valid predictors of job
performance for a broad range of technical jobs. Biographical, personality, and interest invento-
ries provided near-zero validities for job performance. Furthermore, in a technical environment,
women, Hispanics, and Asian Americans scored substantially below White males on noncognitive
measures.
Efficiency is generally preferred over fidelity because of the wide variety of jobs that can be
subsumed under one selection procedure. Top-down selection has greater utility than the use of
cut-off scores, but this utility comes at the price of higher validity requirements, and the process is
cumbersome in a continuous testing environment. The use of cut-off scores results in less adverse
impact than top-down selection and serves as a compromise in represented environments. The
employee with the most seniority who passes the test is selected.

PLANNING AND DEVELOPING THE SELECTION PROCESS: THE CONSTITUENTS, THEIR ISSUES, AND PREFERENCES
The issue of control of the selection process must be addressed in a represented environment.
Management and labor organizations have their preferences with regard to the characteristics of
the selection procedures. It is essential to understand the needs and resistances of management and
labor organizations to persuade both to participate in the development of the selection procedure.

CONTROL OF THE SELECTION PROCESS


Management has traditionally controlled the decisions regarding the types of selection procedures
that will be developed and implemented. This power equation shifts with the existence of a labor
organization. The successful development and implementation of a selection procedure is dependent
on the historic relationship between management and workers as well as the concessions made by
each over time.
In some cases, management retains both the right to determine qualifications for jobs and the
upper hand when making decisions regarding the selection procedures to be used and determining
selection standards. Employee selection thus falls within management’s “reserved rights,” which are
defined as management’s authority in all matters except those it expressly conceded in the collec-
tive bargaining agreement or which are restricted by law (Prasow & Peters, 1970). In other cases,
management has relinquished some of this control and in some instances the collective bargaining
agreement specifies not only the selection procedure(s) to be used but also the cut-off scores and the
nature of the feedback that must be provided to their members.

DESIRED CHARACTERISTICS OF SELECTION PROCEDURES FOR TECHNICAL JOBS


Management’s preference for short, cost-effective selection procedures supports the use of cognitive
ability tests, personality tests, and biographical inventories. Work samples are viewed as effective,
but often are too expensive when hiring entry-level employees. The exception is jobs in which large
numbers of employees are unable to pass an expensive training program. In the latter case, the high
cost of training, combined with the high failure rate, exceed the cost of developing, implementing,
and maintaining a job simulation.
Technical employees and labor organizations have a strong preference for objective tests that
look like the job. Work samples satisfy the job fidelity issues, but have drawbacks in terms of the
potential subjectivity in the evaluations.
The resulting compromise is often a battery of cognitive ability tests that incorporate as many
technical jobs as possible. Labor organizations’ involvement with apprenticeship programs has
taught them that certain knowledge, skills, and abilities are required for effective job performance.
Noncognitive predictors are favored by management because of the desire to assess constructs
such as conscientiousness and interpersonal skills. However, union members have a high regard for
their individual privacy and often react negatively to personality tests and biographical inventories.
Labor organizations tend to believe that their members’ personalities and personal experiences are
none of management’s business. Anecdotal evidence from colleagues’ and the authors’ experience
indicates that labor organizations will often oppose the use of noncognitive selection tools even
when agreeing to the use of cognitive ability tests. In several cases, labor organizations were able
to prevent the implementation of selection procedures until management agreed to the removal of
the noncognitive components. In other cases, the noncognitive components have been at the heart of
arbitrations challenging the validity of the selection procedures. Thus, within the technical environ-
ment, the costs of noncognitive measures in terms of poor validity, subgroup differences, and the
negative reaction of labor organizations outweigh the benefits.

WORKING WITH LINE MANAGEMENT


As stated previously, much of management’s resistance comes from the difficulty in meeting pro-
duction goals when workers are taken off the job to participate in the development of a selection
procedure. If replacement workers are available, it may be at the price of overtime, which will come
out of management’s budgets. Lastly, the adoption of a selection procedure means further loss of
control of some aspects of the selection process.
Resistance to the labor requirements associated with the development of a selection procedure is
reduced to the extent that management is the catalyst for change. Managers are aware that a great
deal of time, energy, and resources are expended dealing with marginal and incompetent workers.
Early on, the least capable employees slow down the progress of training programs. On the job,
productivity suffers when less skilled employees reside in jobs that could be filled by more compe-
tent employees. Finally, remediation, counseling, and disciplinary actions take a tremendous toll on
the organization’s resources. Managers who have successfully terminated incompetent workers are
among those best able to appreciate the benefits of a valid selection procedure.
The fact that someone in line management has requested the research does not ensure coopera-
tion at all levels. Thus, everything that we know about communicating the benefits of the research,
explaining the process and providing detailed descriptions of the resources required is essential for
a successful project. Working with line management to determine the best time to take workers off
the job and minimize the disruption will go a long way in ensuring their cooperation during the
validation process.
Line managers frequently reach their positions by one of two paths: a degree in engineering or
advancement through a range of technical jobs such as those they now supervise. Line managers
work closely with the technical workers and generally have an understanding of the skills and abili-
ties required for successful performance. Thus, line management is a valuable resource as well as
a powerful ally.
A sometimes overlooked patron of selection procedures is the technical training staff. As the
resident experts on the tasks and the ways in which they are to be performed, the trainers have
tremendous influence with line management. Trainers are well-versed on the knowledge, skills,
and abilities required for successful performance on the job, and many of them are keenly aware
that training cannot cure all deficiencies. Technical trainers often understand more readily than
line management what information the psychologists require and can contribute enormously to the
development of task lists and job requirements in the job analysis stage as well as the identification
of criterion measures.
A final note on the value of technical trainers is that they are the first people in the organi-
zation to see the benefits of a newly implemented selection procedure and the first to see the
decline in abilities when valid selection procedures are discontinued. Thus, forming an alliance
with trainers is well worth the effort and pays dividends long after the selection procedure has
been implemented.

WORKING WITH LABOR ORGANIZATIONS


In a represented environment, technical employees are not inclined to participate in any aspect
of the development of the selection process without the approval of the labor leaders. A difficulty
encountered with regard to the introduction of a selection procedure in a union environment is
that such systems are at odds with the time honored concept of seniority. Since the beginning of
the labor movement, seniority has served as the basis for awarding attractive job opportunities. As
Bownas (2000) indicated, seniority has the aura of democracy in that, over time, everyone who
remains with the organization will have the opportunity to advance. Seniority also enables the union
to avoid having to favor one member over another. The most senior employee who applies for a job is
selected and provided with an opportunity to learn the job. From the union’s perspective, employees
who are unable to learn a new job should be permitted to return to their former job. Obviously, this
is a very expensive approach to selection and one that has little likelihood of consistently yielding
positive results for management. Thus, it is not difficult to understand why labor organizations view
the traditional seniority system as the ideal selection procedure and why management often is eager
to supplement it with a valid selection program. The selection model that addresses the needs of
both labor and management is to select the senior qualified candidate, where qualified is defined as
being successful on the selection procedure.
Labor leaders, like management, need to be educated concerning the benefits associated with the
introduction of a selection procedure. Fortunately, some labor organizations are beginning to under-
stand that their long-term survival depends on the financial well-being of the employer. Factors that
may be used to persuade labor leaders of the advantages of supporting a selection procedure and
participating in its development include employee safety, a reduction in adverse employment deci-
sions, and the inherent fairness of the selection procedure.

SELLING SELECTION PROCEDURES TO MANAGEMENT AND LABOR


Employee Safety
Safety is a concern to management and technical workers. In addition to management’s concern for
the well-being of its employees, there are also external motivators in the form of the Department
of Labor’s Occupational Safety and Health Administration (OSHA) and Office of Workers’
Compensation Programs (OWCP).
Employee safety, like job security, is part of the very fabric of the labor movement. Thus, for
those occupations where the safety of employees is dependent upon their competence, employee
safety can be a powerful argument for the introduction of a selection procedure. The authors are
familiar with organizations where the represented employees, because of concerns regarding their
personal safety, have refused to work with employees who were hired during a period when the
selection procedure was temporarily suspended. In another case, a labor organization refused to
countenance the removal of a selection procedure for promotion because of their fears regarding the
impact of the removal on member safety.

Reduction in Adverse Employment Decisions


Disciplinary actions are extremely time-consuming for management, and the involvement of a labor
organization elongates the process. Collective bargaining agreements provide for a grievance process
to examine the merit of adverse employment decisions affecting employees. The process begins with
the filing of a grievance and continues with labor and management meetings involving successively
higher levels of representatives of management and labor. Unresolved issues may be submitted for
arbitration, and arbitration decisions may be appealed to the National Labor Relations Board.
Incompetent represented workers also pose a real dilemma for labor organizations. Technical
employees have little patience for their less skilled coworkers. At the very least, competent employ-
ees often must assume additional work to make up for production deficiencies, and they are justifi-
ably annoyed and vocal about the situation. The union is responsible for addressing these disputes
with management. To the extent that a valid selection procedure minimizes the number of future
members who cannot perform their work competently, this reduces the demands on the union to
deal with conflicts internal to the labor organization. In addition, the grievance and arbitration
procedures also have costs for the labor organization. Reducing the number of employees who are
facing legitimate disciplinary actions saves resources for disputes that the union views as more
meritorious.

Fairness of the Selection Procedure


All parties concerned are very interested in the objectivity and fairness of selection procedures.
Explanations regarding the process for developing the selection procedure and its proposed opera-
tion are often positively received and are necessary to ensure cooperation from line management
and labor organizations.

ESTABLISHING THE PARAMETERS AROUND TESTING


Decisions regarding the intended use of selection procedures are best made well in advance of their
development. Developing a test in a represented environment means that many of the implementa-
tion decisions must be made up front in order to obtain union support. Two issues that have tremen-
dous impact on the viability of developing a selection procedure are (a) whether it will be used for
applicants versus current employees and (b) what provisions will be made for grandparenting, a
procedure that exempts certain groups of employees from the selection procedure.

Applicants Versus Current Employees


Current employees will be much more responsive to requests for their assistance in the development
of the selection procedure if it will have no effect on their current positions and limited impact on
their ability to progress in their career.
Labor organizations generally have few issues with selection procedures that are put in place to
select candidates from outside of the organization. The one exception would be if there are union
members outside of the organization to which they wish to extend preferential consideration. The
greatest concern is when the selection procedure is to be used to promote current employees into
higher level jobs or to restrict lateral movement into other job progressions.
The promotion of current employees is generally an issue when employees in entry-level posi-
tions apply for promotion to apprenticeship positions. Many organizations will give preferential
consideration to current employees over external applicants. This is particularly true in a repre-
sented environment. Thus, entry-level jobs are typically the only positions in an organization where
recruitment is almost exclusively external.
Resistance to testing current employees has resulted in many organizations choosing to extend
job qualifications for higher level jobs to entry level positions. This approach is limited to those
situations where a majority of the employees hired into the entry level position move up in the job
progression within a 5-year period (Uniform Guidelines, 1978). It is important to note that the 50%
majority applies only to those employees who have remained in the organization.
Imposing higher-level job requirements on entry level employees has several benefits. External
applicants are less likely to challenge selection procedures than current employees and are less
likely to have access to the information required to wage an effective attack on the selection pro-
cedure. External candidates also lack the standing to grieve and arbitrate adverse selection deci-
sions. Those candidates who are hired are able to progress upward in the job progression without
additional testing. This approach reduces morale problems among lower-level employees and the
attendant grievances and arbitrations. It also guarantees a pipeline of workers for the more demand-
ing jobs. To the extent that the selection procedure has adverse impact, imposing the selection
procedure as an entry level requirement may avoid the appearance of a segregated workforce, with
minority candidates in the lowest level position.

Grandparenting
As indicated previously, grandparenting refers to the practice of exempting certain groups of
employees from the selection procedure. It is standard procedure in most organizations to grandpar-
ent employees in their current job. The question arises concerning the breadth of the grandparenting
provisions. Does grandparenting cover only those jobs in a given line of progression or can employ-
ees move from one line of progression to another that requires the same basic skills and abilities?
As an example, cognitive ability tests such as reading, math, mechanical ability, and spatial ability
are relevant for a wide range of technical jobs. Thus, the question could be raised whether a current
employee working as a mechanic is able to move to an electrician apprenticeship program that uses
the same test battery without passing the test. Disputes regarding the breadth of the grandparenting
clauses are fertile ground for arbitrations. Furthermore, arbitrations addressing the extent of cover-
age of grandparenting provisions invariably also question the validity of the selection procedure.
Thus, great care should be taken to clearly establish and communicate the parameters under which
grandparenting will apply.
It is also important to address the impact of demotions and separations with respect to testing
requirements. For instance, if an employee takes a demotion as a result of a reduction in force, which
job progressions are available without requiring the selection procedure? Are supervisors who are
demoted back into the ranks required to qualify on a test that is used to select their former subordi-
nates? What happens to employees who leave the organization either voluntarily or involuntarily?
This situation may be addressed by including in the grandparenting policy a requirement of a cer-
tain period of time in the job or the job progression and evidence of satisfactory job performance.
In reality, the latter requirement is often more a matter of the existence of evidence that documents
unsatisfactory job performance. In the absence of such documentation, satisfactory performance is
assumed.

SUMMARY
The presence of a labor organization may reduce management’s control of the selection pro-
cedure, depending on the nature and extent of the concessions made by management in their
contract negotiations. Management’s resistance to selection procedures has at least three
components: even greater loss of control of the selection system, the need to take employees off
the job during test development, and the potential need to pay replacement workers overtime.
Labor organizations may resist selection procedures because they diminish the role of seniority
in selection decisions.
Selection procedures benefit management and labor organizations on issues such as employee
safety, reduction in adverse employment decisions, and test fairness. Finally, several operational
selection policies such as the jobs covered by the selection procedure and the grandparenting
provisions must normally be agreed to by both parties in the planning stage in order for develop-
ment to take place.

IMPLEMENTING A SELECTION PROCEDURE


There are at least three aspects to the implementation of a selection procedure: establishing the
policies surrounding its use; influencing constituents such as line management, labor organizations
and their members, labor relations and staffing personnel; and training testing professionals.

SELECTION POLICIES
Selection policies include such things as the specification of the jobs covered by the testing
procedure, grandparenting, duration of test qualifications over time, retest policies, test security
provisions, the nature of testing information provided to the selecting manager, and the nature of
feedback provided to the candidates.
When a selection procedure is developed in a union environment, policies such as the jobs cov-
ered by the selection procedure and the grandparenting provisions are generally established during
the planning stage to ensure worker participation. Other provisions such as the duration of test
qualifications over time, retest policies, test security provisions, the nature of testing information
provided to the selecting manager, and the nature of feedback provided to the candidates are gener-
ally decided at the time the selection procedure is implemented.
Decisions regarding the duration of test qualifications over time are influenced by the nature of
the test and data storage considerations. Test scores on cognitive ability tests have a relatively long
shelf life compared to performance on skills tests such as keyboarding skills, which may degrade
over time. It is not unusual to consider passing scores on cognitive ability tests as having an unlim-
ited life, particularly because large amounts of data may be stored relatively easily on computer
systems. Scores on skills tests are viewed as less stable by some organizations, whereas others
assume that individuals will be able to refresh their skills adequately upon assuming a position that
requires them.
Retest policies involve considerations of test security and the cost considerations related to test
administration. The interval between test administrations should be sufficiently long to avoid prac-
tice effects. The labor costs associated with retesting and the cost of the test materials themselves
influence retest policies, particularly when candidate scores vary little across test administrations.
Organizations that implement selection procedures must make provisions to ensure the security
of their investment. The physical storage of paper tests is relatively easy so long as access is limited
to those directly involved in the operation of the selection procedure. Storage of computer-based
tests is made more challenging in that the testing staff is often dependent on the technical skills of
information technology (IT) personnel. The most difficult of the security policies deals with access
by management. Management is naturally curious about the contents of the selection procedures,
and their knowledge of the content may jeopardize the integrity of the selection program. Managers
have been known to provide assistance to candidates whom they favor. As an example, a manager
who had access to tests in one organization provided forged answer sheets to the central office for
scoring. The “passed” prototype was used for favored candidates, and the “failed” prototype was
used for those that the manager did not want to employ.
A related issue is the amount of information provided to the selecting manager regarding test
performance. Providing test scores may result in the de facto use of a rank-ordering of candidates.
Another serious concern with providing test scores to managers is that the scores may influence sub-
sequent employment decisions that go well beyond the validation study. For instance, supervisors
may be influenced by test scores when providing work assignments, developmental opportunities,
or deciding which employees will be affected by a reduction in force.
Organizations must also make decisions regarding the nature of the feedback provided to job
candidates. Candidate feedback can be as simple as a form letter or as complex as a feedback inter-
view. Form letters may be limited to whether the candidate was successful on a test or may provide
more detailed information such as performance on each component of the selection procedure.
Establishing policies and influencing constituents are interactive processes in that policies must
sometimes be adjusted to gain the support of affected parties.

INFLUENCING CONSTITUENTS
The introduction of a selection procedure by definition reduces the control of the selecting man-
ager by limiting the candidates that are advanced for consideration. Likewise, the introduction
of the selection procedure constrains the influence of seniority for labor organizations. Line
management’s and labor’s acceptance of a selection procedure in the development stage of the
process is no guarantee of their support at the implementation stage. During the development
stage, the selection procedure is an abstraction. At the implementation stage, the selection proce-
dure has taken form and its potential impact is fully recognized. This recognition may result in
the need to revisit selection policies. As an example, one organization initially placed a limit on
the number of retests permitted on a well-developed selection procedure, believing that retests
would provide little value in increasing the pool of qualified candidates. The backlash from the
labor organization, and by extension labor relations personnel, resulted in the removal of the
upper limit on retests as a condition of implementation.
One source of resistance to selection procedures is that labor leaders and labor organizations
themselves are subject to the whims of their membership. Labor leaders may be voted out of office,
and the need for the union itself may be called into question. Thus, the labor leaders and the labor
organization are keenly aware of the need to keep their membership satisfied. Furthermore, the
union can be sued for failure to represent the interests of its members. This concern often lies at the
heart of apparently frivolous grievances and arbitrations.

NEGOTIATING THE SELECTION PROCEDURE


In a represented environment, most organizations will confer with the labor organization on the
selection procedure to obtain their cooperation in the development and implementation regardless
of whether employee selection is reserved as a right of management.

TRAINING TESTING PROFESSIONALS


All testing professionals must be educated on the selection policies. In addition, test administrators
must be trained on the appropriate steps in administering and scoring a selection procedure as well
as the security procedures that are to be followed at each step in the selection process.

SUMMARY
Selection policies at the implementation stage tend to cover the duration of test qualifications over
time, retest policies, test security provisions, test information provided to the selecting manager,
and the nature of feedback to the candidates. Efforts to persuade management and labor of the
benefits of the selection procedure must continue at this stage because the abstract concept has now
become a reality. Testing professionals must be trained on the selection policies and the standard-
ized procedures for administering and scoring the selection procedure.

MAINTAINING A SELECTION PROCEDURE


The maintenance of a selection procedure is largely an issue of protecting it against challenges.
Challenges generally take the form of questioning the validity of the selection procedure or the
standards that distinguish successful from unsuccessful candidates. Other times, the selection poli-
cies themselves (e.g., retest periods, feedback) are under attack.

VALIDITY
Most challenges, whether the source is a government agency, a disgruntled applicant or employee,
line management, or a labor organization, will be waged against the validity of the selection procedure.
Carefully developed selection procedures are, by definition, valid at the time they are developed.
Over time, however, some selection procedures can become less effective. Challenges frequently allege
that the jobs have changed and that the selection procedures no longer measure the knowledge, skills,
and abilities required for successful job performance. There are certainly
cases where this allegation is true, and it is incumbent upon the psychologist to examine changes in
the job. However, the experience in the electric utility industry is that technical jobs have changed
little over time with regard to the basic job requirements. Most of the job changes have been minor
and involved ancillary responsibilities. For example, an employee might obtain job assignments
from a computer in the company truck rather than receiving them personally from a supervisor.
Abilities in areas such as basic mathematics, reading, and understanding mechanical concepts con-
tinue to be relevant for very long periods of time. Line management and labor organizations do not
always share this opinion when they have problems of their own.
Most of the challenges coming from line management are related in some fashion to meeting
immediate staffing requirements.

MAINTAINING ADEQUATE STAFFING LEVELS


Labor-intensive organizations are keenly aware of the need to maintain adequate staff. Anything
that hinders or blocks line management’s ability to maintain full employment will be the focus
of intense political pressure. It is difficult for even the most enlightened line manager to patiently
wait for qualified candidates to be identified when upper management is hammering him/her with
production quotas. Therefore, it is essential that line management understand the utility associated
with hiring better employees and recognize the selection procedure as a valuable tool in reaching
their goals.
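A common way to express that utility in dollar terms is the Brogden-Cronbach-Gleser model (see Cascio, 1982), which estimates the gain from selecting with a valid predictor rather than at random. The sketch below shows only the arithmetic; the validity, SDy, selection ratio, tenure, and cost figures are invented for illustration and are not drawn from any study cited in this chapter.

```python
from scipy.stats import norm

def utility_gain(n_hired, tenure_years, validity, sd_y,
                 selection_ratio, n_applicants, cost_per_applicant):
    """Brogden-Cronbach-Gleser estimate of the dollar gain from top-down
    selection on a valid predictor relative to random selection."""
    z_cut = norm.ppf(1.0 - selection_ratio)              # cut score in z units
    mean_z_selected = norm.pdf(z_cut) / selection_ratio  # mean z of those hired
    gain = n_hired * tenure_years * validity * sd_y * mean_z_selected
    return gain - n_applicants * cost_per_applicant

# All figures below are assumed for illustration only.
print(utility_gain(n_hired=50, tenure_years=5, validity=0.35, sd_y=12_000,
                   selection_ratio=0.25, n_applicants=200,
                   cost_per_applicant=100))
```

Even with these deliberately modest assumed inputs, the projected gain dwarfs the testing cost, which is the kind of argument line managers need to hear when staffing pressure mounts.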
When there is a shortage of qualified candidates, the blame will generally be placed on the selec-
tion procedure and/or the standards established. The common refrain is that "the test is screening
out qualified candidates." The real problem may be that candidates are not properly screened,
effective recruitment sources are not being tapped, or the candidate of choice was not successful.
Working with recruiting staff to clarify the knowledge, skills, abilities, and other characteristics
necessary for successful job performance can improve the pass rates. Developing relationships with
technical schools can provide a source of talented candidates who have an interest in the available
jobs. Perhaps the only solution to the candidate of choice not being selected is to revert to the days
prior to the use of effective selection procedures, and a line manager sometimes suggests this.
Educating high-level line managers on the benefits of the selection procedure and enlisting the sup-
port of technical trainers can short-circuit some of the challenges before they have an opportunity
to spread.
Line managers often come up with their own solutions to staffing problems. These generally take
the form of contractors and temporary assignments of current employees, lowering test standards,
reductions in retest periods, and providing more detailed feedback and developmental opportunities
for unsuccessful candidates.

CONTRACTORS
Contractors have long been used to meet short-term staffing needs caused by one-time planned
activities, unexpected events, or emergencies. The shortage of skilled workers has resulted in many
situations where the contractors have been retained for years and sometimes are performing the
same work as the employees.
This poses several legal difficulties, not the least of which is a threat to the selection procedure.
Most often the selection requirements of the contracting firms differ from those of the hiring orga-
nization. This creates a problem when the company’s selection procedures are required for current
employees but are not required for contractors who perform the same work. If the selection procedure
employed by the hiring organization has adverse impact, then the use of contractors who perform
the same work as current employees threatens the business necessity of the selection procedure. This
problem is made all the more difficult when the contractor who has been performing the same work as
the employees applies for a job with the company and is not successful on the selection procedure.
Two solutions that have been implemented to deal with these problems include limiting the range
of work performed by the contractors or requiring that contractors retained beyond a specified
period of time be required to meet the employee selection requirements. Restricting the work that
can be done by a contractor has the advantage of ensuring that only qualified employees perform the
more complex work. This approach has withstood legal challenges in the electric utility industry.
The second approach, requiring that long-term contractors be successful on the selection pro-
cedure, can be difficult to manage if the organization chooses to track employment time. Many
employers who adopt this approach require that all contractors qualify on the selection procedure
prior to stepping foot on company property. The benefit is that the company has some assurance that
the contractors hired have the ability to do the work and will be eligible for hire when they apply.
The disadvantage is that sometimes the need for workers exceeds the number of contractors who
have been successful on the selection procedure.
An additional concern frequently raised is the perception that testing contractors creates
co-employment issues. Co-employment refers to the sharing of employment responsibilities
(including compliance with employment laws) by the staffing agency and the company. For
example, the company and the staffing agency might jointly be held responsible for violations of
Title VII of the Civil Rights Act.

TEMPORARY ASSIGNMENTS
This situation is very similar to the contractor problem described above, except that the person
performing the work is a current employee. When the employee is denied a promotion to the job
performed for some period of time, the company may face a host of legal challenges, including
grievances, arbitrations, and lawsuits.
The most common solution is to impose the higher-level job standards on individuals at the point they
enter the job progression. However, this is only possible when more than 50% of the employees hired into
the lower-level position move to the higher-level job within 5 years.

CHANGING SELECTION POLICIES AND ADDING DEVELOPMENTAL OPPORTUNITIES


A common solution proposed by some line managers who are unable to meet their staffing levels
is to reduce the standards on the selection procedure. There are times when this is appropriate,
but it should be a last resort because it will be accompanied by a corresponding reduction in job
performance. Other times, line managers who are dissatisfied with the level of job performance of
their current workforce may request that the standards be raised. Careful review of pass rate infor-
mation in comparison to job openings and job performance information must precede changes in
the standards regardless of the direction.
Reducing the period between retests is also a popular idea but doing so increases the costs asso-
ciated with administering and scoring the selection procedure. Furthermore, the reliability valued
by psychologists is not always appreciated by line management and labor organizations when can-
didates are repeatedly unsuccessful. By contrast, organizations may choose to lengthen the retest
period when labor markets are more favorable.
Despite the best efforts of all concerned, occasions do occur when labor relations personnel
unwittingly negotiate problematic agreements. For instance, the authors are familiar with a cir-
cumstance where the wrong test was negotiated in a collective bargaining agreement. The situ-
ation was resolved by negotiating an addendum to the collective bargaining agreement for the
use of the correct test battery; the alternative could well have been a concession to eliminate the
testing requirement. Thus, the moral of this story is that labor relations personnel must be edu-
cated on testing issues to avoid inconvenient and counterproductive concessions during contract
negotiations.
Providing more detailed feedback, diagnostic testing, and practice tests will be described later
in the chapter.

REDUCTIONS IN FORCE
Reducing the number of employees is probably one of the most unpleasant activities of management
in that the affected employees often have done nothing to warrant this action. If there is a collective
bargaining agreement, then the displacement is governed by seniority. The employees with lower
seniority may be able to bump into other jobs in order to maintain their employment. Regardless of
the presence or absence of a labor organization, the reduction in force often impacts testing when
the job reassignment requires that the displaced employee pass an employment test before moving
to the position of choice.
The development of a single test battery for multiple jobs with similar requirements can alleviate
this problem. Oftentimes, the job a displaced employee wants to move to is covered by the same test
battery or a test battery measuring similar abilities.
When the reduction in force is substantial, senior employees who had previously been grandpar-
ented into their positions may have to test to move into another job progression. This depends on the
parameters of the grandparenting policy and, in the case of represented employees, the collective
bargaining agreement.

PLACEMENT OF EMPLOYEES UNABLE TO CONTINUE PERFORMING DEMANDING JOBS


Accidents, injuries, and the aging process may impact the ability of current employees to continue
to perform physically demanding jobs. Organizations are sensitive to the need to place such employ-
ees in jobs that capitalize on their expertise while minimizing the physical demands. The motiva-
tion is greatest for long-term employees or those injured on the job. Two difficulties are created by
this situation. First, the less physically demanding jobs are often viewed as attractive to many of
the employees, and the most senior employees generally are awarded such positions. Second, the
less physically demanding jobs may require skills and abilities that differ from those required in the
employee’s previous job.
Labor organizations generally will attempt to accommodate members who are no longer
able to perform their previous job because of physical limitations. The second problem is
more difficult in that the range of jobs that require the same abilities but are not physically
demanding is limited. Administrative jobs are attractive alternatives, but they often vary
considerably in the skills and abilities required for successful performance. Refusing to lower
the standards for entry into these less demanding administrative jobs is a political nightmare;
however, making such an exception is often even more perilous because it threatens the
defensibility of the selection procedure in situations where other candidates with similarly
limited abilities have been refused employment.

MERGERS AND ACQUISITIONS


The employment selection procedure probably seems like the least of management’s concerns when
negotiating a merger or acquisition, but it does not remain out of the organization’s consciousness
for long. The trend is for organizations to attempt to achieve uniform selection procedures
throughout the combined organization. This is a Herculean task under the best of circumstances. It is even more difficult when there
are different selection procedures in effect in the two organizations or when multiple labor organi-
zations are affected.
Most often, multiple systems remain in place while the organization decides which of the selec-
tion procedures it prefers to retain. In a represented environment, this decision may be influenced
by the organization’s beliefs regarding its ability to negotiate a change with the unions. In these
situations, it is critical that the psychologist develop and maintain a close relationship with the
labor relations group to ensure that future negotiations are in compliance with testing and legal
standards.

SUMMARY
Maintaining a selection procedure is primarily an issue of protecting it against a variety of chal-
lenges. Most of the challenges will be waged against the validity of the selection procedure and/
or the standards that differentiate successful candidates from less successful ones. Although many
technical jobs change very little over long periods of time, job changes must be monitored to assess
the impact on the validity of the selection procedure. The trigger point for management challenges
to the selection procedure generally involves difficulties in maintaining adequate staffing levels.
Interim solutions provided by line management include the use of contractors and temporary assign-
ment of current employees to different jobs. Both of these solutions have the potential of threatening
the business necessity of the selection procedure.
Some policies may need to be modified over time to preserve the selection procedure. The
standards for distinguishing successful from unsuccessful candidates may need to be adjusted,
either upward or downward based on the labor market and job performance. Retest periods are
also subject to changes depending on the labor market.
It is necessary that labor relations personnel be educated on testing issues to avoid inconvenient
and counterproductive concessions during contract negotiations. Reductions in force and the place-
ment of employees unable to continue performing their jobs can create havoc on existing selection
procedures when labor relations personnel are uninformed of testing requirements.

RECRUITMENT AND EMPLOYEE DEVELOPMENT


The ultimate value of a selection procedure is dependent on the ability of line management to staff
vacant positions with qualified candidates. Locating qualified staff is becoming increasingly
difficult regardless of candidate race, gender, or ethnicity. Historically, the field has focused on identifying ways
to reduce adverse impact on selection procedures. An alternative is to deal with the problem on the
supply side by upgrading the skills of candidates who have not fully developed their abilities. Three
methods include the development of working relationships with technical schools, providing diag-
nostic testing, and making practice tests available to candidates.

RELATIONSHIPS WITH TECHNICAL SCHOOLS


Developing relationships with technical schools can dramatically improve the supply of qualified
candidates. The technical schools are extremely eager to develop such relationships and respond
well to guidance concerning the subjects that need to be emphasized. Often, the instructors in the
technical schools have previously worked in industry and have an understanding of what is required
to perform successfully on the job.
Preliminary research in the electric utility industry indicates that the pass rates of students grad-
uating from rigorous technical schools can be as much as six times higher than the pass rates from
other recruitment sources. When technical programs were not in existence in particular geographic
areas, companies partnered with community colleges to create them and provided instructors to
teach some of the classes. Technical programs developed to invigorate underdeveloped urban areas
are reported as having produced large numbers of qualified minority candidates who have become
valuable employees. These organizational initiatives are still in their infancy, but they show consid-
erable promise. Companies in the electric utility industry are convinced that working with the tech-
nical schools enhances the supply of qualified candidates, reduces the amount of training required
by the organization, and serves a public relations function.

DIAGNOSTIC TESTING
Most selection procedures are designed to deliver accurate recommendations on whether a candi-
date should proceed with the remainder of the selection process at a relatively low cost in the least
amount of time possible. These cost and time criteria do not lend themselves to providing in-depth
diagnostic information regarding the specific needs for improvement. Therefore, the quality of the
feedback available to candidates is somewhat limited.
Companies that have combined the diagnostic feedback with concerted remedial opportunities
at local community colleges are reporting positive results on pass rates. These efforts certainly war-
rant closer investigation to determine their efficacy.
At the very least, the development of diagnostic instruments to address the need for feedback can
be a powerful political force for improving employee reactions to the selection procedure and the
organization. One goal of the diagnostic test is to enhance retention of lower-level employees in the
organization. Labor leaders have embraced the process and negotiated its implementation in a num-
ber of their collective bargaining agreements. Providing diagnostic feedback to candidates, many
of whom are currently employed, shifts the burden to the employee to do something to address
developmental needs. Even if the employees do nothing to improve, their enhanced self-efficacy
does much to alleviate employee relations problems.
Whereas the diagnostic assessments and feedback have been concentrated on current
employees, efforts have also been directed to candidates outside of the organizations in the form
of practice tests.

PRACTICE TESTS
Many organizations that use pre-employment tests provide candidates with brochures describing
the testing process. Although testing brochures can help alleviate candidate anxiety, supplementing
the brochures with practice tests that provide information on how to solve the sample problems is
likely to increase pass rates.
The basis for believing that practice tests may be effective lies in the research of Clause, Delbridge,
Schmitt, Chan, and Jennings (2001). These researchers examined the relationship between test
preparation activities and employment test performance in situations where applicants were tested
on material to be learned during a given period of time. They found that self-efficacy and motiva-
tion were associated with greater efforts to plan and monitor test preparation activities, to engage
learning strategies that required deeper information processing, and to devote more effort to the test
preparation process.
There is certainly room for debate regarding whether increased test scores following practice are
the result of candidates becoming familiar with test content or reflect actual enhancements in candidate
ability levels. In a meta-analysis of retest scores on cognitive ability tests, Hausknecht, Halpert, Di Paolo,
and Moriarty Gerrard (2007) found practice effects of about one quarter of a standard deviation between
the first and second test administration. In addition, these researchers found that coaching was associ-
ated with larger practice effects but that the effects of coaching deteriorated with time. Retests using the
identical test form were associated with greater practice effects than the use of an alternate form, and
shorter intervals between retests of the identical form were associated with larger practice effects.
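To give a sense of the practical magnitude of a quarter-standard-deviation practice effect, the sketch below translates it into a change in pass rates under deliberately simplified assumptions: normally distributed scores, a hypothetical cut score half a standard deviation above the applicant mean, and a uniform 0.25 SD gain for every retaker. None of these figures comes from Hausknecht et al. (2007); they are chosen only to illustrate the conversion.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def pass_rate(cut_z, mean_shift_sd=0.0):
    """Proportion of candidates at or above a cut score (in SD units)
    after shifting the score distribution upward by mean_shift_sd."""
    return 1.0 - normal_cdf(cut_z - mean_shift_sd)

cut = 0.5  # hypothetical cut score, 0.5 SD above the applicant mean
print(f"First attempt:     {pass_rate(cut):.2f}")        # about 0.31
print(f"Retest (+0.25 SD): {pass_rate(cut, 0.25):.2f}")  # about 0.40
```

Under these assumptions, roughly nine more candidates per hundred would pass on retest, which helps explain why retest policies attract so much attention from candidates and labor organizations.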
The most important issue for organizations is the effect of practice tests on the ability of the test
to predict criteria of interest. Hausknecht, Trevor, and Farr (2002) investigated the effect of retests
on posthire training performance and turnover among candidates hired into law enforcement posi-
tions. After controlling for ability, a consistent, though small, effect size was found favoring the
training performance of candidates who were hired after retesting. Those candidates who were
hired after retesting also were more likely to remain with the organization than those candidates
who were hired after their first test. Candidates who retested to obtain positions were viewed as
being highly motivated and once hired, invested in their jobs. In an academic setting, Lievens,
Buyse, and Sackett (2005) found that the validity of a cognitive ability test in predicting grade point
average was unaffected by whether the first or second test score was used.
Our contention is that the model proposed and supported by Clause et al. (2001) has application
to cognitive abilities testing when practice tests are made available. The effects of the test prepara-
tion are likely to be moderated by the varying levels of ability at the time the practice tests are made
available. High ability candidates will probably pass regardless of the availability of the resource,
and some low ability candidates may never be able to improve their abilities to the extent necessary
to be successful either on the test or the job. However, neither of these groups is really the focus of
the practice tests. The real goal is to provide an opportunity for individuals with modest abilities to
develop to the level necessary to pass the tests and to competently perform the jobs. These individu-
als may simply need to refresh abilities not recently used or may never have fully developed their
potential. The only thing that really matters is that those individuals who are motivated and willing
to put forth the required effort may be able to improve their abilities sufficiently to be successful on
the selection procedure and the job. The opportunity to prepare for test performance also becomes
an indirect measure of the candidates' motivation. Although it would be desirable to disentangle the
effects of initial ability from those of motivation, this is not always viable in applied settings. Anonymity
has been promised to candidates in the electric utility industry to encourage the use of the practice
tests and make it impossible for organizations to use the results for selection. A variety of research
designs are under discussion to attempt to quantify the positive reports from the field.
Even if providing practice tests has no impact on actual test performance, the practice continues
to serve a public relations function. For candidates, practice tests may lead to enhanced self-efficacy
and perceptions of the test process as being fair. In the case of employees, labor leaders, and line
management, providing practice tests seems like the right thing to do. There are few things that
practitioners in technical environments do that result in this level of satisfaction by this broad a
swath of their constituents.
Anecdotal reports from staffing departments generally indicate improved pass rates on the selec-
tion procedure after the implementation of diagnostic feedback and the practice tests, and there have
been no corresponding complaints of decrements in the job performance of candidates hired after
these initiatives were implemented. Although the absence of complaints from line managers does
not constitute definitive evidence that the diagnostic tests and practice tests are functioning effec-
tively as developmental interventions, these managers have never been reticent about voicing dis-
satisfaction over the effectiveness of selection procedures. One of the benefits of these approaches is
that individuals and the organization share the responsibility for development.

SUMMARY
The ultimate value of a selection procedure is dependent on the ability of line management to staff
vacant positions with qualified candidates. Locating qualified staff is becoming increasingly difficult
regardless of candidate race, gender, or ethnicity. The field has traditionally focused on identifying
ways to reduce adverse impact on selection procedures. An alternative is to deal with the problem on
the supply side by upgrading the skills of candidates who have not fully developed their abilities.
Three methods that may be used to enhance talent acquisition include the development of working
relationships with technical schools, providing diagnostic testing, and making practice tests available
to candidates. Developing relationships with rigorous technical schools has increased pass rates
substantially. In addition, diagnostic tests and practice tests can be provided to help candidates
identify and address developmental needs. Practice tests are not without controversy; however, the
focus of these practice tests has been on skill acquisition rather than on test-taking strategies. The
candidates who are motivated to use the practice tests and are subsequently hired may, as suggested
by the research of Hausknecht et al. (2002), go on to display other behaviors that are valued
by organizations. Anecdotal evidence suggests that diagnostic feedback and practice tests have been
uniformly associated with positive responses from recruiters, labor relations and staffing staff,
diversity committees, and labor organizations; these reports warrant formal investigation.

REFERENCES
Birkeland, S. A., Manson, T. M., Kisamore, J. L., Brannick, M. T., & Smith, M. A. (2006). A meta-analytic
investigation of job applicant faking on personality measures. International Journal of Selection and
Assessment, 14, 317–335.
Bownas, D. (2000). Selection programs in a union environment: A commentary. In J. Kehoe (Ed.), Managing
selection in changing organizations (pp. 197–209). San Francisco, CA: Jossey-Bass.
Cascio, W. F. (1982). Costing human resources: The financial impact of behavior in organizations. Boston,
MA: Kent.
Cascio, W. F., & Ramos, R. A. (1986). Development and application of a new method for assessing job performance
in behavioral/economic terms. Journal of Applied Psychology, 71, 20–28.
Clause, C. S., Delbridge, K., Schmitt, N., Chan, D., & Jennings, D. (2001). Test preparation activities and
employment test performance. Human Performance, 14, 149–168.
Douglas, E. F., McDaniel, M. A., & Snell, A. (1996). The validity of noncognitive measures decays when applicants
fake. Paper presented at the annual conference of the Academy of Management, Cincinnati, OH.
Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York, NY: John Wiley & Sons.
Hausknecht, J. P., Halpert, J. A., Di Paolo, N. T., & Moriarty Gerrard, M. O. (2007). Retesting in selection:
A meta-analysis of coaching and practice effects for tests of cognitive ability. Journal of Applied
Psychology, 92, 373–385.
Hausknecht, J. P., Trevor, C. O., & Farr, J. L. (2002). Retaking ability tests in a selection setting: Implications
for practice effects, training performance, and turnover. Journal of Applied Psychology, 87, 243–254.
Hedge, J. W., Borman, W. C., & Lammlein, S. E. (2006). The aging workforce: Realities, myths, and implica-
tions for organizations. Washington, DC: American Psychological Association.
Hough, L. M. (1998). Effects of intentional distortion in personality measurement and evaluation of suggested
palliatives. Human Performance, 11, 209–244.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin, 96, 72–98.
Hunter, J. E., Schmidt, F. L., & Le, H. (2006). Implications of direct and indirect range restriction for meta-
analysis methods and findings. Journal of Applied Psychology, 91, 594–612.
Jones, D. P., & Gottschalk, R. J. (1988). Validation of selection procedures for electric utility construction and
skilled trades occupations: Literature review and meta-analysis of related validation studies. Washington,
DC: Edison Electric Institute.
Lievens, F., Buyse, T., & Sackett, P. R. (2005). Retest effects in operational selection settings: Development and
test of a framework. Personnel Psychology, 58, 981–1007.
Prasow, P., & Peters, E. (1970). Arbitration and collective bargaining: Conflict resolution in labor relations.
New York, NY: McGraw-Hill.
Pulakos, E. D., & Schmitt, N. (1996). An evaluation of two strategies for reducing adverse impact and their
effects on criterion related validity. Human Performance, 9, 241–258.
Ryan, A., Ployhart, R. E., & Friedel, L. A. (1998). Using personality testing to reduce adverse impact: A
cautionary note. Journal of Applied Psychology, 83, 298–307.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. (1981). Task differences as moderators of aptitude test validity in
selection: A red herring. Journal of Applied Psychology, 66, 166–185.
Schmidt, F. L., Hunter, J. E., Pearlman, K., & Shane, G. S. (1979). Further tests of the Schmidt-Hunter Bayesian
Validity Generalization Model. Personnel Psychology, 32, 257–281.
Uniform Guidelines on Employee Selection Procedures. 29 CFR § 1607 et seq. (1978).
35 Selection for Service
and Sales Jobs

John P. Hausknecht and Angela M. Langevin

According to data from the Bureau of Labor Statistics (BLS), U.S. organizations currently employ
over 30 million workers in service and sales occupations (U.S. Department of Labor, 2007). Although
annual turnover rates can exceed 100% for some jobs in services and sales, even a conservative esti-
mate of 20% turnover reveals that U.S. organizations select over 6 million service and sales work-
ers each year. As such, many organizations have adopted formal assessment methods to improve
hiring decisions and ultimately increase organizational effectiveness. Research shows that the use
of validated selection tools as part of a broader, strategic approach to human resource (HR) manage-
ment is associated with higher productivity, lower employee turnover, and better corporate financial
performance (Huselid, 1995; Terpstra & Rozell, 1993). However, it is clear that not all selection
methods are equally effective, nor do research findings apply uniformly to all occupations.
This chapter provides a review of selection research for service and sales occupations and is
organized into three major sections. First, we describe the nature of service and sales work and
define the competencies that underlie success in these jobs. Second, we summarize past research
concerning the methods that have been used to select service and sales employees with attention to
issues of validity, applicant reactions, and adverse impact. Finally, we discuss the implications of
this body of work for practice and future research, highlighting several important but often over-
looked issues concerning selection system design for this critical segment of today’s workforce.

NATURE OF SERVICE AND SALES WORK


Companies rely upon their core service and sales workers to execute service-driven strategies and
place the organization’s products and services in the hands of customers and clients (Vinchur,
Schippmann, Switzer, & Roth, 1998). Service and sales jobs share many similarities because
service- and sales-related tasks can be found in both types of occupations and there is a large degree
of competency overlap (Frei & McDaniel, 1998). As detailed below, many of the similarities are
attributable to the high degree of interpersonal interaction with clients or customers that is required
in these jobs (Mount, Barrick, & Stewart, 1998).

MAJOR DUTIES AND RESPONSIBILITIES


Broadly defined, service work involves relational processes between service providers and custom-
ers. Unlike goods, services are relatively intangible, cannot be stored or transported, require the
participation of the customer, and because of changing situational demands, tend to be less stan-
dardized (Bruhn & Georgi, 2006; Schneider & White, 2004). BLS data show that service workers
have come to dominate the U.S. economy, as over 80% of jobs involve at least some aspect of service
provision as opposed to goods production. Some of the most common job titles for service workers
in the United States include customer service representative (approximately 2.2 million employees)
and waiter/waitress (2.4 million). Table 35.1 provides a sampling of these and other job titles
commonly found within the service sector.

TABLE 35.1
Job Titles for Common Occupations in Services and Sales

Services                                Sales
Flight attendants                       Retail salespersons
Customer service representatives        Real estate sales agents
Ticket agents and travel clerks         Sales representatives
Tellers                                 Telemarketers
Hotel, motel, and resort desk clerks    Insurance sales agents
Waiters and waitresses                  Travel agents
Gaming service workers                  Advertising sales agents
Concierges                              Cashiers

Source: Information obtained from the O*NET database.

Occupational information from the O*NET database (http://www.onetcenter.org) reveals that
the core activities of service workers often involve (a) interacting directly with the public (i.e.,
customers), (b) processing customer requests (e.g., billing inquiries, food orders, bank deposits), (c)
soliciting sales of new products and services, and (d) routinely dealing with unpleasant and angry
people, such as when resolving complaints.
In contrast, the general nature of most sales work involves selling products and services to
customers, clients, or businesses. Approximately 14 million people held sales and related occu-
pations in 2007 (see Table 35.1 for a sample of common sales-related job titles). This group con-
sists largely of retail salespersons (4.4 million), cashiers (3.5 million), and sales representatives
(2.5 million). On the basis of O*NET information, the core activities of sales workers include
(a) locating new clients or customers, (b) determining customers’ needs, (c) providing informa-
tion about products or services (e.g., features, benefits, pricing), (d) convincing customers to
purchase products or services, (e) negotiating sale prices and terms, and (f) providing follow-up
services.

COMPETENCIES REQUIRED FOR SUCCESS


O*NET data reveal several competencies (i.e., knowledge, skills, abilities, and other characteristics
[KSAOs]) that underlie successful performance in common service and sales occupations. These
competencies and their O*NET definitions are summarized in Table 35.2.
For the knowledge dimension, understanding basic customer and personal service principles
and processes is necessary for both types of jobs, but importance ratings for this dimension are
generally higher for service occupations than for sales occupations. In contrast, knowledge of sales
and marketing concepts is essential for many sales jobs but is rated as much less important for
service positions. In terms of required skills, speaking, active listening, service orientation, and
social perceptiveness are critical for service and sales occupations. Time management and persua-
sion tend to be rated high in importance only for sales jobs. Analysis of the ability requirements
reveals that both types of occupations require high levels of oral expression and oral comprehension
ability. Examination of O*NET importance ratings for the final dimension—work styles—reveals
that conscientiousness and adjustment are rated highly for both types of occupations. Interpersonal
orientation is rated higher for service occupations, whereas achievement orientation is rated higher
for sales jobs.

TABLE 35.2
Important Worker Requirements and Characteristics for Service and Sales Occupations

Worker Requirements

Knowledge
Customer and personal service (a): Knowledge of customer-service principles and processes (e.g.,
  customer needs assessment, quality service standards, evaluating customer satisfaction)
Sales and marketing (b): Knowledge of principles and methods for promoting and selling products
  and services (e.g., marketing strategies, product demonstrations, sales techniques)

Skills
Speaking: Talking to others to convey information effectively
Active listening: Giving full attention to what others are saying, taking time to understand points
  made, and asking questions as appropriate
Service orientation: Actively looking for ways to help people
Social perceptiveness: Maintaining an awareness of others' reactions and understanding why they
  react as they do
Time management (b): Managing one's own time and the time of others
Persuasion (b): Persuading others to change their minds or behavior

Worker Characteristics

Abilities
Oral comprehension: The ability to listen to and understand information and ideas presented through
  spoken words and sentences
Oral expression: The ability to communicate information and ideas in speaking so that others will
  understand

Work styles
Conscientiousness: Being dependable, reliable, attentive to detail, and trustworthy
Adjustment: Poise, flexibility, maintaining composure, and dealing calmly with high-stress situations
Interpersonal orientation (a): Being pleasant, cooperative, sensitive to others, and preferring to
  associate with other organizational members
Achievement orientation (b): Setting personal goals, persisting in the face of obstacles, and willing
  to take on responsibilities and challenges

Note: According to O*NET, worker requirements are defined as "descriptors referring to work-related
  attributes acquired and/or developed through experience and education." Worker characteristics are
  defined as "enduring characteristics that may influence both work performance and the capacity to
  acquire knowledge and skills required for effective work performance."
(a) Rated as more important for service-related occupations.
(b) Rated as more important for sales-related occupations.
Source: Information obtained from the O*NET database.

CONTRASTING SERVICE AND SALES JOBS


Although there are many similarities between service and sales occupations, closer examination
of O*NET data reveals several notable differences in the degree to which certain characteris-
tics are deemed critical to successful job performance. When compared to service occupations,
sales employees must possess higher levels of initiative, persistence, persuasiveness, negotiation,
and time management. In contrast, service work requires higher levels of interpersonal orienta-
tion and greater knowledge of customer service principles, and the importance of sales and mar-
keting knowledge is somewhat diminished. More broadly, sales workers are rewarded differently
(e.g., commission-based pay) and tend to operate independent of supervision (Vinchur et al., 1998).
Despite these differences, the selection systems ultimately adopted for service and sales workers are
very similar. In the research review presented below, we do not make strong distinctions between
the two unless warranted. Instead, we organize the review around the competencies that have been
routinely assessed in past research.

RESEARCH ON SELECTION FOR SERVICE AND SALES WORKERS


It is clear from our review of selection research published over the last 50 years or so that there are
no simple solutions when it comes to designing selection systems for service and sales workers that
are valid, fair, legally defensible, and relatively simple to administer. The review emphasizes valid-
ity evidence to reflect the focus of past research and concludes with information regarding applicant
perceptions and adverse impact considerations.

SELECTION RESEARCH ON PERSONALITY AND PERSONALITY-RELATED CHARACTERISTICS


By far, most of the published literature on selection for service and sales workers involves per-
sonality assessment. This is perhaps not surprising given the interpersonal and motivational skills
required for success in these occupations (see Table 35.2). Although there are exceptions, most of the
published work in this area concerns assessment of the “Big Five” dimensions of personality using
self-report, paper-and-pencil inventories. A smaller number of studies examine personality dimen-
sions that are more narrowly defined or evaluate personality-related constructs that are developed
specifically for service or sales occupations. Although we generally restrict the focus to personality
measures used in service and sales domains, a broader discussion of personality and selection can
be found in Chapter 14, this volume.

Big Five Personality Dimensions


The dimensions of the Big Five (or Five-Factor Model) include agreeableness, conscientiousness,
emotional stability, extraversion, and openness to experience. Agreeableness is generally defined as
being flexible, trusting, cooperative, forgiving, and tolerant (Barrick & Mount, 1991; Vinchur et al.,
1998). Conscientiousness refers to one’s level of dependability, achievement-orientation, and perse-
verance (Barrick & Mount, 1991). Emotional stability, also referred to as neuroticism, encompasses
traits such as anxiousness, depression, anger, embarrassment, or insecurity (Barrick & Mount, 1991),
whereas extraversion assesses interpersonal interaction, tapping such traits as assertiveness
and sociability (Vinchur et al., 1998). Finally, openness to experience refers to one’s propensity to
be imaginative, curious, intelligent, or artistically sensitive (Barrick & Mount, 1991). Many scales,
often containing several hundred items, have been developed to assess the Big Five.
Associations between Big Five personality dimensions and performance in sales jobs have been
summarized using meta-analysis. Vinchur et al. (1998) found average unadjusted correlations of
.03 (agreeableness), .11 (conscientiousness), .05 (emotional stability), .09 (extraversion), and .06
(openness to experience) when supervisor-provided ratings were the performance criterion. Effects
were somewhat larger after corrections for criterion unreliability and range restriction were applied
(r = .03 to .12). When examining objective sales performance as the criterion, average unadjusted
correlations of -.02 (agreeableness), .17 (conscientiousness), -.07 (emotional stability), .12 (extraver-
sion), and .03 (openness to experience) were found. Values were generally larger once corrected for
range restriction, particularly in the case of conscientiousness (.31) and extraversion (.22). Vinchur
et al. also reported relatively larger effects for those studies that used an alternative taxonomy
of personality dimensions. In particular, unadjusted validity coefficients for achievement (defined
as a subdimension of conscientiousness) and potency (subdimension of extraversion) as predictors
of supervisor ratings were .14 and .15, respectively (corrected values were .25 and .28, respec-
tively). When considering objective sales criteria, unadjusted validity estimates for achievement and
potency were .23 and .15, respectively (corrected values were .41 and .26). In service contexts, doz-
ens of studies (e.g., Avis, Kudisch, & Fortunato, 2002; Hurley, 1998; Hunthausen, Truxillo, Bauer, &
Hammer, 2003; Liao & Chuang, 2004; Mount et al., 1998) reveal correlations with job performance
ratings ranging from .09 to .20 (agreeableness), .11 to .33 (conscientiousness), .09 to .21 (emotional
stability), .07 to .26 (extraversion), and .09 to .20 (openness to experience). Differences in types of
jobs studied, the rating criteria adopted, and other study characteristics may explain the variability
in effect-size estimates reported in these studies, but these moderators have not been empirically
evaluated to date.
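The corrections referred to above follow standard psychometric formulas: disattenuation for criterion unreliability (dividing the observed correlation by the square root of the criterion reliability) and the Thorndike Case II adjustment for direct range restriction. The sketch below simply shows the arithmetic; the reliability and restriction values are assumptions chosen for illustration, not figures reported by Vinchur et al. (1998).

```python
from math import sqrt

def disattenuate(r_obs, ryy):
    """Correct an observed validity for unreliability in the criterion."""
    return r_obs / sqrt(ryy)

def correct_range_restriction(r_obs, u):
    """Thorndike Case II correction for direct range restriction,
    where u = restricted SD / unrestricted SD of the predictor."""
    return (r_obs / u) / sqrt(1.0 + r_obs**2 * (1.0 / u**2 - 1.0))

# Illustrative (assumed) inputs: an unadjusted validity of .17, criterion
# reliability of .52, and a restricted-to-unrestricted SD ratio of .70.
r_corrected = correct_range_restriction(disattenuate(0.17, 0.52), 0.70)
print(f"Corrected validity: {r_corrected:.2f}")  # roughly .33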
In some studies, interactive effects among personality dimensions, moderating contextual
influences, and other design considerations have been found to account for an additional 2–9% of
the variance in performance ratings. Brown, Mowen, Donovan, and Licata (2002) studied frontline
restaurant service workers and found that customer orientation partially mediated the relationship
between certain personality traits (emotional stability, agreeableness, need for activity) and self- and
supervisor-provided performance ratings. The results indicated that customer orientation accounted
for an additional 2% of the variance in supervisor-reported performance and an additional 9% of the
variance in self-reported performance. In a selection context, such results show the potential value
of assessing certain traits (i.e., customer service orientation) in conjunction with more traditional
personality characteristics.
Research has also found that certain cognitive-motivational work orientations, specifically
accomplishment striving and status-striving, may mediate the relationship between certain person-
ality traits (i.e., conscientiousness and extraversion) and supervisor-rated job performance (Barrick,
Stewart, & Piotrowski, 2002). Barrick et al. sampled telemarketing sales representatives and found
that an individual’s orientation toward status-striving mediated the relationship between extraver-
sion and job performance such that individuals scoring higher on extraversion were more likely
to strive for status, which in turn resulted in higher supervisor ratings of effectiveness. Similarly,
individuals high in conscientiousness were more likely to strive for accomplishment, which led to
higher effectiveness ratings indirectly through status-striving.
Goal-setting behavior is another motivational variable that has been found to mediate the rela-
tionship between personality and job performance. Looking specifically at the personality trait
of conscientiousness, Barrick, Mount, and Strauss (1993) studied sales representatives of a large
appliance manufacturer and found that the relationship between conscientiousness and supervisor-
rated job performance was mediated by goal commitment and autonomous goal-setting, such that
individuals scoring high in conscientiousness were more likely to set and commit to goals, which
then led to increased job performance. The above studies illustrate the potential value of assessing
motivational variables in the process of selection, because they demonstrate how such variables may
impact the relationship between personality and job performance.
In terms of design, researchers have found that using supervisor, coworker, and customer ratings
of employee personality (rather than self-ratings alone) increases the total explained variance in
performance ratings by an additional 11–20% (Mount, Barrick, & Strauss, 1994). In addition, when
job performance is measured using more specific versus general job criteria, personality characteris-
tics appear to more accurately predict job performance ratings (Hogan & Holland, 2003). Regarding
personality measurement, Hunthausen et al. (2003) studied entry-level customer service managers
at a major airline and found that using an “at-work” frame of reference (i.e., asking respondents
to think about how they behave at work when responding to survey questions) resulted in stronger
relationships between two dimensions of the Big Five (extraversion and openness to experience) and
supervisory ratings of performance when controlling for cognitive ability.

Narrow Personality Traits


Although a large amount of research centers on broad measures of personality such as the Big Five,
researchers have also examined relationships between specific or narrow traits of personality and
job performance. In general, there is debate concerning whether broad or narrow measures of per-
sonality are best for predicting job performance. Although some contend that broad measures are
more successful at predicting overall performance (Ones & Viswesvaran, 1996), others maintain
that narrow measures account for more variance and argue that researchers should use narrow per-
sonality traits to predict specific aspects of job performance (Schneider, Hough, & Dunnette, 1996).
In doing so, criterion-related validity may be improved because the predictors (traits) are more
closely attuned to the criterion (job performance).
Although not as plentiful as the research involving broad traits, there is evidence supporting a
narrow-traits approach to studying job performance. A meta-analysis conducted by Dudley, Orvis,
Lebiecki, and Cortina (2006) found (in their overall analysis, which included all types of jobs)
that four narrow traits of conscientiousness (dependability, cautiousness, achievement, and order)
have incremental validity over the global conscientiousness construct in predicting performance.
Specifically, the narrow traits explained an additional 3.7% of variance in overall performance.
Breaking performance into more specific criteria, narrow traits explained an additional 5–26% of
the variance in specific aspects of job performance, such as task performance (4.6%), job dedication
(25.9%), interpersonal facilitation (5.8%), and counterproductive work behaviors (13.6%).
In addition, Dudley et al. (2006) examined the incremental validity of narrow traits of con-
scientiousness on the basis of occupational type. Jobs were divided into four occupation types:
sales, customer service, managerial, and skilled/semi-skilled. Across all occupational categories,
narrow conscientiousness traits were found to have incremental validity of 1–24% over the global
dimension. Although the incremental validity of narrow traits over the global trait was relatively
small for the customer service occupational group (1.2%), it rose to 5.4% for the sales group. The
managerial occupational group showed a 9.3% increase in variance explained, whereas the skilled/
semi-skilled group posted the largest increase at 24%. On the basis of these results, the authors
note that the degree of prediction offered by narrow traits depends in large part on the type of job
and the aspect of performance under study (Dudley et al., 2006). In the context of sales and service
selection, such results suggest that although the assessment of narrow conscientiousness traits may
be useful for selection of salespeople, such assessment may have less utility for those positions
with a customer service focus. Although further research is necessary to examine the utility of a
narrow traits approach to personality assessment (particularly for other personality dimensions),
initial results suggest the assessment of narrow traits may be useful in predicting performance for
certain jobs.
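As an illustration of how such incremental validity is typically quantified, the sketch below (Python, simulated data; the facet labels and effect sizes are assumptions for illustration only) compares the variance in a performance criterion explained by a global conscientiousness score with that explained by its narrow facets, in the spirit of the hierarchical analyses reported by Dudley et al. (2006).

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400
# Hypothetical facet scores: dependability, order, achievement, cautiousness.
facets = rng.normal(size=(n, 4))
global_conscientiousness = facets.mean(axis=1)
# Simulated criterion driven mostly by achievement and dependability.
performance = 0.30 * facets[:, 2] + 0.15 * facets[:, 0] + rng.normal(size=n)

r2_global = sm.OLS(performance, sm.add_constant(global_conscientiousness)).fit().rsquared
r2_facets = sm.OLS(performance, sm.add_constant(facets)).fit().rsquared
print(f"R-squared, global trait:        {r2_global:.3f}")
print(f"R-squared, narrow facets:       {r2_facets:.3f}")
print(f"incremental variance explained: {r2_facets - r2_global:.3f}")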

Service/Customer/Sales Orientation
Given the distinctive features of service and sales work, researchers have developed composite scales
to assess candidates’ dispositions toward customers, service, and/or sales. Sometimes referred to as
“criterion-focused occupational personality scales” (Ones & Viswesvaran, 2001), these self-report,
noncognitive composite measures typically assess a pattern of personality characteristics thought to
underlie successful performance in service and sales domains. Service orientation is one such con-
struct, and it is defined as a set of basic predispositions to provide helpful customer service, includ-
ing dimensions such as friendliness, reliability, responsiveness, courteousness, and cooperativeness
(Cran, 1994; Frei & McDaniel, 1998; Hennig-Thurau, 2004; Hogan, Hogan, & Busch, 1984).
Meta-analysis findings provide evidence of validity for service orientation measures. In a review
of 41 studies, and with supervisory performance ratings serving as the criterion, Frei and McDaniel
(1998) reported an unadjusted validity coefficient of .24. They also showed that service orientation
was moderately correlated (approximately .30 to .40) with several Big Five personality constructs
(agreeableness, emotional stability, and conscientiousness), sales drive, and social vocational inter-
ests. Service orientation was generally unrelated to extraversion, openness to experience, cognitive
ability, or other vocational interests. One caveat noted by Frei and McDaniel is that most of the coef-
ficients summarized in the meta-analysis were drawn from unpublished studies that were produced
by the test vendor. More recently, McDaniel, Rothstein, and Whetzel (2006) conducted a case study
of test vendor technical reports and found evidence of “moderate-to-severe publication bias” such
that two of the four test vendors studied showed a greater likelihood of reporting only statistically
significant validity coefficients for particular scales. A second concern is that researchers have
found that service orientation measures fare no better than general personality dimensions in pre-
dicting performance and do not predict service-focused criteria any better than they predict broader
criteria such as overall performance or counterproductive work behaviors (Ones & Viswesvaran,
2001; Rosse, Miller, & Barnes, 1991).
Several measures have been developed to evaluate sales potential, customer-oriented sell-
ing orientation, or sales ability. These scales variously reflect composite measures of per-
sonality facets that are important for success in sales occupations (e.g., Hakstian, Scratchley,
MacLeod, Tweed, & Siddarth, 1997; Hogan, Hogan, & Gregory, 1992; Li & Wang, 2007),
self-assessments of behaviors taken when selling (Saxe & Weitz, 1982), or knowledge of basic
selling principles (Bruce, 1953, 1971, as cited in Vinchur et al., 1998). These studies gener-
ally find that sales potential is predictive of supervisory ratings and objective sales (Farrell &
Hakstian, 2001; Hogan et al., 1992; Li & Wang, 2007). Regarding selling/customer orienta-
tion, meta-analytic evidence from 19 studies reveals unadjusted validity coefficients of .17
for subjective performance measures and .06 for objective performance indicators, although
confidence intervals for the two criteria overlap (Jaramillo, Ladik, Marshall, & Mulki, 2007).
Vinchur et al. (1998) summarized the predictive validity of sales ability measures using meta-
analysis and reported unadjusted average correlations of .26 (supervisory performance ratings)
and .21 (objective sales).

SELECTION RESEARCH ON BACKGROUND, EXPERIENCE, INTERESTS, AND OTHER LIFE HISTORY DIMENSIONS

In addition to personality testing, the other dominant approach to the selection of service and sales
workers involves systematic assessment of candidates’ personal histories using biodata inventories.
The most common approach has been to develop paper-and-pencil questionnaires that ask can-
didates about various domains such as work history, experience, interests, values, attitudes, and
leadership activities (e.g., Allworth & Hesketh, 2000; Jacobs, Conte, Day, Silva & Harris, 1996;
McManus & Kelly, 1999; Ployhart, Weekley, Holtz, & Kemp, 2003; Schoenfeldt, 1999; Stokes, Toth,
Searcy, Stroupe, & Carter, 1999). Regarding sales occupations, meta-analysis evidence reveals an
average unadjusted correlation of .31 between biodata and job performance ratings and .17 between
biodata and objective sales (Vinchur et al., 1998). Dalessio and Silverhart (1994) found that bio-
data predicted 12-month survival and first-year commissions among life insurance sales agents,
although effects tended to be smaller than those typically found for performance rating criteria.
Research also supports biodata as a predictor in customer service contexts. Allworth and Hesketh
(2000) found that a biodata inventory that measured experience with tasks and behaviors required
in service jobs provided incremental validity beyond cognitive ability and personality measures in
explaining supervisory performance ratings.
Although biodata inventories encompass multiple aspects of an applicant’s background, work
experience is one element of such inventories that deserves more detailed examination. Organizations
routinely advertise that “previous experience is required” for many service and sales jobs, but expe-
rience is rarely addressed in most validation studies. Two broader meta-analyses that included
(but were not limited to) sales and service jobs reveal some support for work experience
as a predictor of performance. Schmidt and Hunter (1998) reported an adjusted correlation of .18
between previous work experience (in years) and job performance. When work experience mea-
sures were categorized according to their level of specificity (task, job, and organization) and mea-
surement mode (amount, time, and type), researchers found adjusted correlations with performance
ranging from .16 to .43 (Quinones, Ford, & Teachout, 1995) and suggested that validity can be
maximized by measuring the amount of work experience and tailoring measures to the task level.
Although neither study was conducted with an exclusive focus on sales or service settings, other
research demonstrates the potential of assessing an applicant’s previous work experience in these
contexts. Allworth and Hesketh (2000) approached the construct of work experience by collecting
job requirements biodata from incumbents at an international hotel. This type of biodata asked
participants to gauge how much their previous or current jobs required them to enlist certain cus-
tomer service behaviors. Overall, the authors found that job requirements biodata accounted for
7.6% of unique variance in job performance. Further validation studies by Weekley and Jones
(1997, 1999) in multiple service contexts found correlations between previous work experience and
future performance that ranged from .14 to .19. Work experience was assessed using a multidimen-
sional measure that asked participants to report their total full-time work experience, maximum
tenure with any single organization, retail-specific work experience, number of different employers,
and tenure in last job.

SELECTION RESEARCH ON COGNITIVE ABILITY


Cognitive ability testing is somewhat of an enigma in the context of service and sales occupations.
Although cognitive ability is a strong predictor of performance for a wide range of jobs (Hunter &
Hunter, 1984; see also Chapter 12, this volume), research that is specific to service and sales occu-
pations yields mixed results. Some studies report finding no relationship between cognitive ability
and performance (Jacobs et al., 1996; Robie, Brown, & Shepherd, 2005), whereas others have found
statistically significant effects, with validity coefficients generally ranging from .10 to .25 (Allworth
& Hesketh, 2000; Avis et al., 2002; Cellar, DeGrendel, Klawsky, & Miller, 1996; Hakstian et al.,
1997; Miner, 1962; Rosse et al., 1991; Stokes, Hogan, & Snell, 1993; Weekley & Jones, 1997, 1999).
A meta-analysis of the cognitive ability-performance relationship for sales jobs in particular may
help explain these discrepant findings. Vinchur et al. (1998) found an unadjusted validity coef-
ficient of .23 for general cognitive ability when the criterion was supervisory ratings of job perfor-
mance (based on 22 studies) but only .02 when the criterion was objective sales volume (12 studies).
Unadjusted validity coefficients involving verbal ability and quantitative ability (two facets of gen-
eral cognitive ability) were generally low (-.17 to .08) and were largely based on a small number of
studies. Thus, variance in performance criteria, predictor dimensions, and sample characteristics
may account for the differences in effect sizes observed across studies. One final consideration is
that O*NET data for common sales and service occupations reveal importance ratings for problem-
solving and critical thinking skills that are comparatively lower than those for social skills, which may
also explain why cognitive ability is not a stronger predictor of performance in service and sales
contexts. On the other hand, certain service and sales jobs may indeed require fairly high levels of
critical thinking and problem-solving, such as those that require consultative selling and ongoing
relationship management (e.g., pharmaceutical sales; see Ahearne, Bhattacharya, & Gruen, 2005).

SELECTION RESEARCH ON SITUATIONAL JUDGMENT


Situational judgment tests (SJTs) present candidates with various job-related scenarios and ask how
they would respond to each situation (McDaniel, Hartman, Whetzel, & Grubb, 2007; Weekley &
Jones, 1997). For example, candidates for service-related positions may be asked how they would
respond when a customer requests an item that the store does not carry (Weekley & Jones, 1999).
On the basis of scoring guidelines established during test development, responses are weighted
based on how well they match the judgment exercised by high-performing incumbents. Research
shows unadjusted validity coefficients averaging in the mid-.20s when SJTs are used to predict job
performance (McDaniel et al., 2007). Although this meta-analysis was not restricted to service and
sales research, the findings are consistent with individual studies conducted in service contexts that
have used a video-based mode of administration rather than paper and pencil (Cellar et al., 1996;
Weekley & Jones, 1997, 1999). These latter studies also show that SJTs offer incremental validity
over cognitive ability as a predictor of performance.
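The keyed-scoring logic described above can be illustrated with a minimal sketch: each response option carries a weight reflecting how closely it matches the judgment of high-performing incumbents, and a candidate's score is the sum of the weights for the options chosen. The items, options, and weights below are hypothetical, not drawn from any published SJT.

SCORING_KEY = {
    "item_1": {"a": 1.0, "b": 0.5, "c": 0.0, "d": -0.5},
    "item_2": {"a": 0.0, "b": 1.0, "c": -0.5, "d": 0.5},
}

def score_sjt(responses):
    """Sum the keyed weights for the options a candidate selected."""
    return sum(SCORING_KEY[item][choice] for item, choice in responses.items())

candidate_responses = {"item_1": "a", "item_2": "c"}
print(score_sjt(candidate_responses))  # 1.0 + (-0.5) = 0.5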

APPLICANT REACTIONS
In addition to validity concerns, it is important to consider how applicants will respond to differ-
ent selection procedures. Broadly speaking, research on applicant reactions involves understand-
ing candidates’ perceptions of the fairness and job relatedness of different selection procedures.
The general arguments put forth in this area suggest that candidates who hold negative perceptions
of the selection process will be less attracted to the company, less likely to recommend the company
to others, and perhaps even less likely to perform well or remain on the job (Gilliland, 1993). Recent
reviews and meta-analytic evidence confirm many of these propositions with the exception of the
hypothesized performance and retention outcomes, which have yet to be systematically addressed
(Hausknecht, Day, & Thomas, 2004; Ryan & Ployhart, 2000).
When compared with a list of other possible selection methods, participants have among the least
favorable reactions to personality inventories and biodata, whereas reactions to cognitive ability testing
tend to be somewhat more positive but not as favorable as they are to interviews or work samples
(Hausknecht et al., 2004). We are not aware of any published work on applicants’ reactions to occupation-
specific inventories. Given their strong association with personality inventories, one might expect
reactions to be somewhat negative. However, because these tests have been designed for particular
applications in service and sales contexts, fairness and job relatedness perceptions may improve because
of the close connections to relevant aspects of the job. Smither and colleagues found that applicants’
perceptions were more positive for item types that were less abstract, suggesting that occupation-specific
predictors may fare somewhat better on this dimension (Smither, Reilly, Millsap, Pearlman, & Stoffey,
1993). Applicant reactions to SJTs have been studied infrequently, but evidence from Chan and Schmitt
(1997) indicated that reactions to a video-based SJT were favorable and comparable in magnitude to
those found for work sample tests in the Hausknecht et al. (2004) meta-analysis. Bauer and Truxillo
(2006) noted that reactions to SJTs may be dependent on the stimulus and response formats used (i.e.,
written vs. video, multiple-choice vs. open-ended) but suggested that reactions to SJTs overall should be
more favorable than reactions to selection procedures with more abstract content.

ADVERSE IMPACT
Given the legal context of selection and employment testing, concerns about adverse impact must
be given due consideration in selection system design and administration. Although a detailed
treatment of adverse impact research is beyond the scope of this chapter (see Hough, Oswald, &
Ployhart, 2001), several findings are summarized here concerning subgroup differences in test
scores for the predictor classes reviewed above. We note upfront that even small subgroup differ-
ences can produce adverse impact (as defined by the four-fifths rule), particularly as organizations
become more selective (Sackett & Ellingson, 1997). Further, adverse impact calculations involving a
small number of hires and/or low selection ratios tend to produce higher numbers of false positives,
meaning that adverse impact can be found even though subgroup differences are not statistically
significant (Roth, Bobko, & Switzer, 2006). Finally, it is important to point out that many of the esti-
mates reported below are based on reviews that include, but are not limited to, samples drawn from
service and sales domains. At this point in the literature, there are simply too few published studies
available to make definitive conclusions concerning adverse impact in sales and service settings.
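Because the four-fifths rule and the selectivity point made by Sackett and Ellingson (1997) are easy to state but easy to misread, the minimal Python sketch below computes an adverse impact ratio from hypothetical applicant counts and then uses a normal approximation to show how a fixed subgroup difference of d = 0.4 produces progressively more severe adverse impact as the cutoff score rises; all numbers are illustrative assumptions.

import math

def ai_ratio(hired_minority, applicants_minority, hired_majority, applicants_majority):
    """Adverse impact ratio: minority selection rate divided by majority selection rate."""
    return (hired_minority / applicants_minority) / (hired_majority / applicants_majority)

print(ai_ratio(12, 60, 45, 150))  # about 0.67 < 0.80, so the four-fifths rule flags the outcome

def passing_rate(cutoff_z, group_mean=0.0):
    """Proportion of a normally distributed group scoring above a z-score cutoff."""
    return 1 - 0.5 * (1 + math.erf((cutoff_z - group_mean) / math.sqrt(2)))

d = 0.4  # hypothetical subgroup difference in standard deviation units
for cutoff in (0.0, 0.5, 1.0, 1.5):
    ratio = passing_rate(cutoff, group_mean=-d) / passing_rate(cutoff, group_mean=0.0)
    print(f"cutoff z = {cutoff:.1f}: adverse impact ratio approx. {ratio:.2f}")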
Generally speaking, subgroup differences based on ethnic/cultural background, gender, and age
for Big Five personality measures tend to be minimal and, when found, are typically less than one-
tenth of a standard deviation. The largest effects have been found for measures of agreeableness
(women tend to score about four-tenths of a standard deviation higher than men) and emotional
stability (men tend to score about one-quarter of a standard deviation higher than women; Hough
et al., 2001). Subgroup differences have not been comprehensively assessed for measures of service/
sales/customer orientation, although the large overlap with personality constructs suggests that dif-
ferences would be relatively small. Hogan et al. (1992) examined archival data for a personality-
based sales potential inventory and found no differences when comparing scores across ethnic/
cultural and gender-based subgroups. Published data concerning subgroup differences for biodata
inventories in sales and service contexts are limited, although broader reviews find that the aver-
age performance for Whites is about one-third of a standard deviation higher than that for Blacks
(Bobko, Roth, & Potosky, 1999).
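The standardized differences cited throughout this section (e.g., "one-third of a standard deviation") are typically computed as the difference between subgroup means divided by the pooled standard deviation. A minimal sketch with simulated scores follows; the group labels and the 0.4-SD difference are illustrative assumptions.

import numpy as np

def standardized_difference(group1, group2):
    """Mean difference divided by the pooled standard deviation (Cohen's d)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * np.var(group1, ddof=1) +
                  (n2 - 1) * np.var(group2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(group1) - np.mean(group2)) / np.sqrt(pooled_var)

rng = np.random.default_rng(2)
scores_a = rng.normal(loc=0.4, scale=1.0, size=300)  # hypothetical subgroup A
scores_b = rng.normal(loc=0.0, scale=1.0, size=300)  # hypothetical subgroup B
print(round(standardized_difference(scores_a, scores_b), 2))  # approximately 0.4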
For measures of cognitive ability, the cumulative evidence (across all types of occupations) indi-
cates that Whites score approximately one standard deviation higher than Blacks, over one-half of a
standard deviation higher than Hispanics, and approximately two-tenths of a standard deviation lower
than Asians (Hough et al., 2001; Roth, Bevier, Bobko, Switzer, & Tyler, 2001). These estimates are
moderated by job complexity such that subgroup differences tend to be larger for less complex jobs.
Thus, given that many service and sales occupations are relatively low in complexity (see O*NET),
subgroup differences may be somewhat larger in these domains. Regarding age and gender differ-
ences, research shows that age and cognitive ability test scores tend to be negatively related, whereas
cognitive ability test performance does not generally differ between males and females (Hough et al.,
2001). Finally, research on subgroup differences for video-based and written SJTs shows that Whites
tend to score about four-tenths of a standard deviation higher than members of other ethnic/cultural
groups, whereas women tend to score slightly higher (approximately one-tenth of a standard devia-
tion) than men (Nguyen, McDaniel, & Whetzel, 2005, cited in Ployhart & Holtz, 2008). Potential
age-based differences for SJTs have not been reported in the published literature.
In summary, validity and adverse impact considerations often represent tradeoffs. Selection meth-
ods with strong evidence of predictive validity often share variance with cognitive ability, and cog-
nitively loaded measures tend to produce the highest levels of adverse impact. Pyburn, Ployhart, and
Kravitz (2008) termed this situation the “diversity-validity dilemma.” From a practical standpoint,
there are many strategies available to selection specialists who must balance diversity and validity
concerns, and the interested reader is directed to several recent papers that provide valuable critiques
of these various approaches (Aguinis & Smith, 2007; De Corte, Lievens, & Sackett, 2007; Kravitz,
2008; Ployhart & Holtz, 2008). One common conclusion from this line of research is that, to date,
there are no universal solutions that successfully maximize validity and eliminate adverse impact.

IMPLICATIONS FOR PRACTICE AND FUTURE RESEARCH


Despite the wealth of information available concerning service and sales selection, several oppor-
tunities remain to enhance our understanding of the factors that contribute to effective selection
in these domains. We raise several issues with regard to past research in terms of (a) the criteria
adopted, (b) the range of predictors studied, (c) the temporal perspectives addressed, and (d) the
levels of analysis considered.

CRITERION ISSUES
Much of the research reviewed here has included supervisory performance ratings as the sole cri-
terion. Although these ratings serve many important functions, a pessimistic view is that organi-
zations are not as interested in boosting the performance appraisal ratings of their members as they
are in increasing sales volume and service quality perceptions among customers. Objective sales
figures have obvious implications for an organization’s bottom line, and customer perceptions of
service quality are an important leading indicator of future sales and repeat business (Bowman
& Narayandas, 2004). Further, in the sales domain in particular, research has shown that validity
coefficients vary considerably across different criteria (Vinchur et al., 1998). Thus, organizations
that use cognitive ability tests, for example, may see no benefit in terms of enhanced sales volume
among new hires (but would identify candidates who will be rated highly by supervisors). Despite
the appeal of using objective sales criteria, such validation work requires adequate consideration
of situational opportunities that may influence performance (Stewart & Nandkeolyar, 2006). For
example, researchers have advocated for controlling geographic or territorial constraints such as
market potential, workload, company presence in a particular area, local economic conditions, and
other region-specific factors (Cravens & Woodruff, 1973; McManus & Brown, 1995).
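One simple way to implement such adjustments, consistent with (though not identical to) the approaches in the cited work, is to residualize raw sales figures on territory-level covariates before using them as criteria. The sketch below uses simulated data; the covariates, the effect sizes, and the stand-in variable true_skill are assumptions for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
market_potential = rng.normal(size=n)
workload = rng.normal(size=n)
true_skill = rng.normal(size=n)  # stands in for the salesperson characteristic to be predicted
sales = 2.0 * market_potential + 1.0 * workload + 1.5 * true_skill + rng.normal(size=n)

# Regress raw sales on territory covariates; the residual is a criterion with
# situational opportunity partialled out.
covariates = sm.add_constant(np.column_stack([market_potential, workload]))
adjusted_sales = sm.OLS(sales, covariates).fit().resid

print(round(np.corrcoef(true_skill, sales)[0, 1], 2))           # attenuated by territory factors
print(round(np.corrcoef(true_skill, adjusted_sales)[0, 1], 2))  # higher after adjustment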
In addition to considering alternative measures of job performance, researchers might broaden
the types of criteria examined in future research. Only a handful of studies reviewed here examined
withdrawal behaviors or counterproductive work behaviors (e.g., Dalessio & Silverhart, 1994; Jacobs
et al., 1996; Ones & Viswesvaran, 2001). Given the significant costs associated with these outcomes,
it would be useful to broaden the scope of selection research by incorporating these criteria into
validity studies whenever possible.

PREDICTOR ISSUES
Almost exclusively, published research in this area tends to feature self-report, paper-and-pencil per-
sonality tests or biodata inventories. This work is valuable, but research must also respond to new and
different forms of assessment. For example, Winkler (2006) estimated that about 5% of organiza-
tions (e.g., Toyota, SunTrust Bank) are using technology to assess important competencies via online
job simulations. These interactive assessments place candidates in a virtual environment that mirrors
the work that they would be doing on the job and allows companies to assess important competencies
while providing a realistic preview of the work itself. Other forms of capturing live behavior (e.g.,
assessment centers) may also be appropriate for assessing service and sales candidates, although
little work has been published in this area (see Burroughs & White, 1996, for an exception).
The format of predictors is another important consideration, particularly as organizations consider
how to leverage technology when building selection systems. Technology-based selection measures
differ from their paper-and-pencil counterparts in several ways (Weekley & Jones, 1997, 1999; see
also Chapter 8, this volume), and suggest a different profile of considerations for organizations in terms
of costs, applicant reactions, administrative ease, and so forth. Until additional research examines
these alternative approaches in the context of what we already know, it is unclear what (if any) effect
these alternative forms of assessment have on selection outcomes in service and sales contexts.

TEMPORAL ISSUES
Another issue raised by this analysis is that we currently know very little about the role of time in
the selection process. Much of the research reviewed here uses concurrent (i.e., cross-sectional)
designs or time-lagged predictive designs with a fairly short temporal window (e.g., 6-month per-
formance review). Yet, recent explorations into predictors of performance trends suggest that past
findings may not readily generalize across time (Ployhart & Hakel, 1998; Stewart & Nandkeolyar,
2006; Thoresen, Bradley, Bliese, & Thoresen, 2004). For example, in a study of insurance sales per-
sonnel, Hofmann, Jacobs, and Baratta (1993) found that the performance of sales agents followed a
quadratic trend over time such that performance growth was initially positive and approximately linear
and then leveled off asymptotically. The authors suggested that different skills and abilities may be predictive
of performance at early and later stages of the sales agents’ careers. Goal orientation was advanced
as a potential determinant of intraindividual performance trends, such that highly goal-oriented
individuals may be better equipped to learn from unsuccessful sales calls over time and more likely
to engage in the self-development activities that ultimately lead to improved performance.
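A minimal sketch of this kind of intraindividual trend analysis is shown below, using simulated monthly data for three hypothetical agents; it is not a reproduction of the cited analyses, and the trajectory parameters are assumptions chosen only to produce early growth that levels off.

import numpy as np

rng = np.random.default_rng(4)
months = np.arange(12)

def simulate_agent_performance():
    # Hypothetical trajectory: early growth that gradually levels off, plus noise.
    return 2.0 + 0.8 * months - 0.05 * months ** 2 + rng.normal(scale=0.5, size=len(months))

# Fit a quadratic trend to each simulated agent's monthly performance record.
for agent in range(3):
    performance = simulate_agent_performance()
    quadratic, linear, intercept = np.polyfit(months, performance, deg=2)
    print(f"agent {agent}: intercept={intercept:.2f}, linear={linear:.2f}, quadratic={quadratic:.2f}")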
Other researchers have shown that conclusions about personality-performance relationships dif-
fer when comparing cross-sectional and longitudinal designs such that certain personality charac-
teristics are more predictive of performance trends than they are of initial performance (Thoresen
et al., 2004), whereas others moderate the effect of situational opportunities on performance over
time (Stewart & Nandkeolyar, 2006). Conclusions about the predictive validity of cognitive ability
measures are also likely to be time-dependent in service and sales contexts. Keil and Cortina (2001)
found that validity coefficients decline with time, and although their review was not limited to sales
and service contexts, Cascio and Aguinis (2005) argued that task performance should be dynamic
in service contexts (thus making it more difficult to predict over time) because service workers
often have to adapt to new work processes as new products or services are introduced. These stud-
ies demonstrate that by focusing more closely on temporal dynamics, organizations can not only
select candidates who are likely to perform well soon after hire, but also identify those who have
the capacity to increase their performance over time or reach proficiency in a shorter period, both
of which are critically important to long-term organizational success.

LEVELS ISSUES
A final consideration is that nearly all of the studies reviewed here focus on selection at the indi-
vidual level of analysis. This reflects a long tradition in psychology of examining individual differ-
ence characteristics that might relate to individual job performance. However, selection researchers
have argued that the field needs to examine relationships at higher levels of analysis such as the
group, work unit, or organization (Ployhart, 2004, 2006). In one recent empirical example, Ployhart,
Weekley, and Baughman (2006) found unique personality-performance associations at individual,
job, and organizational levels and concluded that higher-level relationships may occur because cer-
tain personality factors are related to the teamwork and coordination behaviors critical for success
in service work.
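The flavor of such multilevel analyses can be conveyed with a short sketch: a random-intercept model in which an individual-level trait predicts individual performance while allowing work units to differ systematically, estimated here with the MixedLM routine in the statsmodels library. The data, effect sizes, and variable names are simulated assumptions, not the cited studies' data or models.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_units, n_per_unit = 30, 20
unit = np.repeat(np.arange(n_units), n_per_unit)
unit_effect = rng.normal(scale=0.5, size=n_units)[unit]  # shared unit-level variance
conscientiousness = rng.normal(size=n_units * n_per_unit)
performance = 0.3 * conscientiousness + unit_effect + rng.normal(size=n_units * n_per_unit)

data = pd.DataFrame({"unit": unit, "consc": conscientiousness, "perf": performance})
# Random-intercept model: individual-level trait effect plus a unit-level variance component.
model = smf.mixedlm("perf ~ consc", data, groups=data["unit"]).fit()
print(model.summary())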
Another multilevel study examining the impact of managerial personality traits and service
quality orientation on service climate found that a manager’s personality may play a role in shap-
ing service climate (Salvaggio et al., 2007). Core self-evaluations were administered to managers,
in which participants were asked to rate themselves on certain personality traits (i.e., self-esteem,
self-efficacy, neuroticism, etc.). Results indicated that managers with more positive self-evaluations
had higher service quality orientations, which in turn led to more positive service climates. As the
authors note, these results demonstrate the potential impact that individual managers’ personality
traits may have on the overall workplace service climate. Considering that service climate is posi-
tively related to sales volume via customer-focused citizenship behaviors and customer satisfaction
(Schneider, Ehrhart, Mayer, Saltz, & Niles-Jolly, 2005), such findings show that careful attention to
employee selection may be useful not only in predicting individual performance, but also more dis-
tal indicators of success for the work unit or organization. Taken together, these studies demonstrate
that multilevel approaches are valuable for addressing the long-standing question of how to improve
organizational effectiveness through selection (see also Chapter 9, this volume).

CONCLUSIONS
Service and sales workers represent a significant portion of the global workforce, and the eco-
nomic success of many organizations hinges upon their performance. Although much remains to be
learned, the research reviewed here shows that careful attention to selection system design provides
organizations with an opportunity to improve the overall quality of hiring decisions for service and
sales employees. Results clearly indicate that investments in formal selection methods improve the
odds of finding service and sales workers who will perform well on the job. The validity coefficients
discussed here are not large; however, they can translate into substantial benefits in terms of reduced
hiring and training costs, increased sales productivity, and better service quality. Combining the
results of these individual-level studies with what we are beginning to learn about similar relation-
ships at higher levels and over time shows that effective selection is a viable means by which orga-
nizations can generate an advantage over competitor firms.

REFERENCES
Aguinis, H. & Smith, M. A. (2007). Understanding the impact of test validity and bias on selection errors and
adverse impact in human resource selection. Personnel Psychology, 60, 165–199.
Ahearne, M., Bhattacharya, C. B., & Gruen, T. (2005). Antecedents and consequences of customer-
company identification: Expanding the role of relationship marketing. Journal of Applied Psychology,
90, 574–585.
Allworth, E., & Hesketh, B. (2000). Job requirements biodata as a predictor of performance in customer service
roles. International Journal of Selection and Assessment, 8, 137–147.
Avis, J. M., Kudisch, J. D., & Fortunato, V. J. (2002). Examining the incremental validity and adverse impact
of cognitive ability and conscientiousness on job performance. Journal of Business and Psychology, 17,
87–105.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Barrick, M. R., Mount, M. K., & Strauss, J. P. (1993). Conscientiousness and performance of sales representa-
tives—Test of the mediating effects of goal-setting. Journal of Applied Psychology, 78, 715–722.
Barrick, M. R., Stewart, G. L., & Piotrowski, M. (2002). Personality and job performance: Test of the mediating
effects of motivation among sales representatives. Journal of Applied Psychology, 87, 43–51.
Bauer, T. N., & Truxillo, D. M. (2006). Applicant reactions to situational judgment tests: Research and related
practical issues. In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measure-
ment, and applications (pp. 233–249). Mahwah, NJ: Erlbaum.
Bobko, P., Roth, P. L., & Potosky, D. (1999). Derivation and implications of a meta-analytic matrix incorporat-
ing cognitive ability, alternative predictors, and job performance. Personnel Psychology, 52, 561–589.
Bowman, D., & Narayandas, D. (2004). Linking customer management effort to customer profitability in
business markets. Journal of Marketing Research, 41, 433–447.
Brown, T. J., Mowen, J. C., Donovan, D. T., & Licata, J. W. (2002). The customer orientation of service work-
ers: Personality trait effects on self- and supervisor performance ratings. Journal of Marketing Research,
39, 110–119.
Bruhn, M., & Georgi, D. (2006). Services marketing: Managing the service value chain. Essex, England:
Pearson.
Burroughs, W. A., & White, L. L. (1996). Predicting sales performance. Journal of Business and Psychology,
11, 73–84.
Cascio, W. F., & Aguinis, H. (2005). Test development and use: New twists on old questions. Human Resource
Management, 44, 219–235.
Cellar, D. F., DeGrendel, D. J. D., Klawsky, J. D., & Miller, M. L. (1996). The validity of personality, service
orientation, and reading comprehension measures as predictors of flight attendant training performance.
Journal of Business and Psychology, 11, 43–54.
Chan, D., & Schmitt, N. (1997). Video-based versus paper-and-pencil method of assessment in situational judg-
ment tests: Subgroup differences in test performance and face validity. Journal of Applied Psychology,
82, 143–159.
Cran, D. J. (1994). Towards validation of the service orientation construct. Service Industries Journal, 14,
34–44.
Cravens, D. W., & Woodruff, R. B. (1973). An approach for determining criteria of sales performance. Journal
of Applied Psychology, 57, 242–247.
Dalessio, A. T., & Silverhart, T. A. (1994). Combining biodata test and interview information: Predicting deci-
sions and performance criteria. Personnel Psychology, 47, 303–315.
De Corte, W., Lievens, F., & Sackett, P. R. (2007). Combining predictors to achieve optimal trade-offs between
selection quality and adverse impact. Journal of Applied Psychology, 92, 1380–1393.
Dudley, N. M., Orvis, K. A., Lebiecki, J. E., & Cortina, J. M. (2006). A meta-analytic investigation of consci-
entiousness in the prediction of job performance: Examining the intercorrelations and the incremental
validity of narrow traits. Journal of Applied Psychology, 91, 40–57.
Farrell, S., & Hakstian, A. R. (2001). Improving salesforce performance: A meta-analytic investigation of
the effectiveness and utility of personnel selection procedures and training interventions. Psychology &
Marketing, 18, 281–316.
Frei, R. L., & McDaniel, M. A. (1998). Validity of customer service measures in personnel selection: A review
of criterion and construct evidence. Human Performance, 11, 1–27.
Gilliland, S. W. (1993). The perceived fairness of selection systems: An organizational justice perspective.
Academy of Management Review, 18, 694–734.
Hakstian, A. R., Scratchley, L. S., MacLeod, A. A., Tweed, R. G., & Siddarth, S. (1997). Selection of telemar-
keting employees by standardized assessment procedures. Psychology & Marketing, 14, 703–726.
Hausknecht, J. P., Day, D. V., & Thomas, S. C. (2004). Applicant reactions to selection procedures: An updated
model and meta-analysis. Personnel Psychology, 57, 639–683.
Hennig-Thurau, T. (2004). Customer orientation of service employees—Its impact on customer satis-
faction, commitment, and retention. International Journal of Service Industry Management, 15,
460–478.
Hofmann, D. A., Jacobs, R., & Baratta, J. E. (1993). Dynamic criteria and the measurement of change. Journal
of Applied Psychology, 78, 194–204.
Hogan, J., Hogan, R., & Busch, C. M. (1984). How to measure service orientation. Journal of Applied
Psychology, 69, 167–173.
Hogan, J., Hogan, R., & Gregory, S. (1992). Validation of a sales representative selection inventory. Journal of
Business and Psychology, 7, 161–171.
Hogan, J., & Holland, B. (2003). Using theory to evaluate personality and job-performance relations: A socio-
analytic perspective. Journal of Applied Psychology, 88, 100–112.
Hough, L. M., Oswald, F. L., & Ployhart, R. E. (2001). Determinants, detection and amelioration of adverse
impact in personnel selection procedures: Issues, evidence and lessons learned. International Journal of
Selection and Assessment, 9, 152–194.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin, 96, 72–98.
Hunthausen, J. M., Truxillo, D. M., Bauer, T. N., & Hammer, L. B. (2003). A field study of frame-of-reference
effects on personality test validity. Journal of Applied Psychology, 88, 545–551.
Hurley, R. F. (1998). Customer service behavior in retail settings: A study of the effect of service provider
personality. Journal of the Academy of Marketing Science, 26, 115–127.
Huselid, M. A. (1995). The impact of human resource management practices on turnover, productivity, and
corporate financial performance. Academy of Management Journal, 38, 635–672.
Jacobs, R. R., Conte, J. M., Day, D. V., Silva, J. M., & Harris, R. (1996). Selecting bus drivers: Multiple
predictors, multiple perspectives on validity, and multiple estimates of utility. Human Performance, 9,
199–217.
Jaramillo, F., Ladik, D. M., Marshall, G. W., & Mulki, F. P. (2007). A meta-analysis of the relationship between
sales orientation-customer orientation (SOCO) and salesperson job performance. Journal of Business
and Industrial Marketing, 22, 302–310.
Keil, C. T., & Cortina, J. M. (2001). Degradation of validity over time: A test and extension of Ackerman’s
model. Psychological Bulletin, 127, 673–697.
Kravitz, D. A. (2008). The diversity-validity dilemma: Beyond selection—the role of affirmative action.
Personnel Psychology, 61, 173–193.
Li, L., & Wang, L. (2007). Development and validation of the salespeople forced choice behavioral style test in
the information technology industry. Personality and Individual Differences, 42, 99–110.
Liao, H., & Chuang, A. (2004). A multilevel investigation of factors influencing employee service performance
and customer outcomes. Academy of Management Journal, 47, 41–58.
McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L. (2007). Situational judgment tests, response
instructions, and validity: A meta-analysis. Personnel Psychology, 60, 63–91.
McDaniel, M. A., Rothstein, H. R., & Whetzel, D. L. (2006). Publication bias: A case study of four test vendors.
Personnel Psychology, 59, 927–953.
McManus, M. A., & Brown, S. H. (1995). Adjusting sales results measures for use as criteria. Personnel
Psychology, 48, 391–400.
McManus, M. A., & Kelly, M. L. (1999). Personality measures and biodata: Evidence regarding their incre-
mental predictive value in the life insurance industry. Personnel Psychology, 52, 137–148.
Miner, J. B. (1962). Personality and ability factors in sales performance. Journal of Applied Psychology, 46,
6–13.
Mount, M. K., Barrick, M. R., & Stewart, G. L. (1998). Five-factor model of personality and performance in
jobs involving interpersonal interactions. Human Performance, 11, 145–165.
Mount, M. K., Barrick, M. R., & Strauss, J. P. (1994). Validity of observer ratings of the Big 5 personality-
factors. Journal of Applied Psychology, 79, 272–280.
Ones, D. S., & Viswesvaran, C. (1996). Bandwidth-fidelity dilemma in personality measurement for personnel
selection. Journal of Organizational Behavior, 17, 609–626.
Ones, D. S., & Viswesvaran, C. (2001). Integrity tests and other criterion-focused occupational personal-
ity scales (COPS) used in personnel selection. International Journal of Selection and Assessment, 9,
31–39.
Ployhart, R. E. (2004). Organizational staffing: A multilevel review, synthesis, and model. Research in
Personnel and Human Resource Management, 23, 121–176.
Ployhart, R. E. (2006). Staffing in the 21st century: New challenges and strategic opportunities. Journal of
Management, 32, 868–897.
Ployhart, R. E., & Hakel, M. D. (1998). The substantive nature of performance variability: Predicting interindi-
vidual differences in intraindividual performance. Personnel Psychology, 51, 859–901.
Ployhart, R. E., & Holtz, B. C. (2008). The diversity-validity dilemma: Strategies for reducing racioethnic and
sex subgroup differences and adverse impact in selection. Personnel Psychology, 61, 153–172.
Ployhart, R. E., Weekley, J. A., & Baughman, K. (2006). The structure and function of human capital emer-
gence: A multilevel examination of the attraction-selection-attrition model. Academy of Management
Journal, 49, 661–677.
Ployhart, R. E., Weekley, J. A., Holtz, B. C., & Kemp, C. (2003). Web-based and paper-and-pencil testing of
applicants in a proctored setting: Are personality, biodata, and situational judgment tests comparable?
Personnel Psychology, 56, 733–752.
Pyburn, K. M., Ployhart, R. E., & Kravitz, D. A. (2008). The diversity-validity dilemma: Overview and legal
context. Personnel Psychology, 61, 143–151.
Quinones, M. A., Ford, J. K., & Teachout, M. S. (1995). The relationship between work experience and job
performance: A conceptual and meta-analytic review. Personnel Psychology, 48, 887–910.
Robie, C., Brown, D. J., & Shepherd, W. J. (2005). Interdependence as a moderator of the relationship between com-
petitiveness and objective sales performance. International Journal of Selection and Assessment, 13, 274–281.
Rosse, J. G., Miller, H. E., & Barnes, L. K. (1991). Combining personality and cognitive ability predictors for
hiring service-oriented employees. Journal of Business and Psychology, 5, 431–445.
Roth, P. L., Bevier, C. A., Bobko, P., Switzer, F. S., III, & Tyler, P. (2001). Ethnic group differences in cognitive
ability in employment and educational settings: A meta-analysis. Personnel Psychology, 54, 297–330.
Roth, P. L., Bobko, P., & Switzer, F. S., III. (2006). Modeling the behavior of the 4/5ths rule for determining
adverse impact: Reasons for caution. Journal of Applied Psychology, 91, 507–522.
Ryan, A. M., & Ployhart, R. E. (2000). Applicants’ perceptions of selection procedures and decisions: A critical
review and agenda for the future. Journal of Management, 26, 565–606.
Sackett, P. R., & Ellingson, J. E. (1997). The effects of forming multi-predictor composites on group differ-
ences and adverse impact. Personnel Psychology, 50, 707–721.
Salvaggio, A. N., Schneider, B., Nishii, L. H., Mayer, D. M., Ramesh, A., & Lyon, J. S. (2007). Manager per-
sonality, manager service quality orientation, and service climate: Test of a model. Journal of Applied
Psychology, 92, 1741–1750.
Saxe, R., & Weitz, B. A. (1982). The SOCO scale: A measure of the customer orientation of salespeople.
Journal of Marketing Research, 19, 343–351.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology:
Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262–274.
Schneider, B., Ehrhart, M. G., Mayer, D. M., Saltz, J. L., & Niles-Jolly, K. (2005). Understanding organization-
customer links in service settings. Academy of Management Journal, 48, 1017–1032.
Schneider, B., & White, S. S. (2004). Service quality: Research perspectives. Thousand Oaks, CA: Sage.
Schneider, R. J., Hough, L. M., & Dunnette, M. D. (1996). Broadsided by broad traits: How to sink science in
five dimensions or less. Journal of Organizational Behavior, 17, 639–655.
Schoenfeldt, L. F. (1999). From dust bowl empiricism to rational constructs in biographical data. Human
Resource Management Review, 9, 147–167.
Smither, J. W., Reilly, R. R., Millsap, R. E., Pearlman, K., & Stoffey, R. W. (1993). Applicant reactions to selec-
tion procedures. Personnel Psychology, 46, 49–76.
Stewart, G. L., & Nandkeolyar, A. K. (2006). Adaptation and intraindividual variation in sales outcomes:
Exploring the interactive effects of personality and environmental opportunity.
Personnel Psychology, 59, 307–332.
Stokes, G. S., Hogan, J. B., & Snell, A. F. (1993). Comparability of incumbent and applicant samples for the
development of biodata keys—The influence of social desirability. Personnel Psychology, 46, 739–762.
Stokes, G. S., Toth, C. S., Searcy, C. A., Stroupe, J. P., & Carter, G. W. (1999). Construct/rational biodata
dimensions to predict salesperson performance: Report on the U.S. Department of Labor sales study.
Human Resource Management Review, 9, 185–218.
Terpstra, D. E., & Rozell, E. J. (1993). The relationship of staffing practices and organizational level measures
of performance. Personnel Psychology, 46, 27–48.
Thoresen, C. J., Bradley, J. C., Bliese, P. D., & Thoresen, J. D. (2004). The big five personality traits and indi-
vidual job performance growth trajectories in maintenance and transitional job stages. Journal of Applied
Psychology, 89, 835–853.
U.S. Department of Labor. (2007). Occupational employment and wages, 2007. Washington, DC: Author.
Vinchur, A. J., Schippmann, J. S., Switzer, F. S., & Roth, P. L. (1998). A meta-analytic review of predictors of
job performance for salespeople. Journal of Applied Psychology, 83, 586–597.
Weekley, J. A., & Jones, C. (1997). Video-based situational testing. Personnel Psychology, 50, 25–49.
Weekley, J. A., & Jones, C. (1999). Further studies of situational tests. Personnel Psychology, 52, 679–700.
Winkler, C. (2006). Job tryouts go virtual: Online job simulations provide sophisticated candidate assessments.
HR Magazine, 51, 131–134.
36 Selection in Multinational Organizations

Paula Caligiuri and Karen B. Paul

Corporations’ competitiveness on a global scale is largely contingent on firms’ abilities to strategi-
cally adapt, reconfigure, and acquire the resources needed for the ever-changing global market-
place. Given that it is the people within organizations who sell and market, develop products, make
decisions, and implement programs, human resources (HR) are vital to the success of an organiza-
tion. The selection of human talent worldwide should be congruent with the organization’s strategic
plans, a manifestation of the firm’s strategic capabilities and values, and a means to facilitate the
successful implementation of the firm’s global business strategies.
This chapter is divided into three major sections. The first section begins with a discussion of
the fit between a firm’s business strategy and its employee selection systems. Three approaches to
employee selection systems in multinational organizations will be reviewed: centralized systems (for
greater global integration), localized systems (for greater local responsiveness) and synergistic sys-
tems (Bartlett & Ghoshal, 1989; Prahalad & Doz, 1987). In the second section, the major challenges
in assessing and selecting employees worldwide will be discussed from both cultural and national
systems perspectives. This section will emphasize the importance of the cross-cultural context with
respect to the effect of national culture on method of selection and assessment, culture’s influence on
the candidates’ reactions, and cross-national differences in HR systems affecting employee selec-
tion methods used (e.g., discrimination and privacy laws, unemployment rates, education systems).
Although the first two sections emphasize selection systems across countries within multinational
organizations, the third section focuses on the selection and assessment for individuals who work
globally, namely international assignees. In this last section, we discuss the various selection and
assessment methods that can be used for identifying international assignees.

EMPLOYEE SELECTION IN MULTINATIONAL ORGANIZATIONS


Multinational firms’ operations differ from the operations of domestic firms in terms of two complex
dimensions: geographic dispersion and multiculturalism (Adler, 2001). Geographic dispersion is the
extent to which a firm is operating across borders and must coordinate operations across borders in
order to be effective. Multiculturalism is the extent to which the workers, customers, suppliers, etc.,
are from diverse cultural backgrounds and the extent to which the organization must coordinate the
activities of people from diverse cultures in order to be effective. In leveraging both geographic dis-
persion and multiculturalism, transnational organizations must achieve a dynamic balance between
the need to be centralized, or tightly controlled by headquarters, and the need to be decentralized,
or operating differently across diverse locations (Bartlett & Ghoshal, 1989). The achievement of this
balance between centralization and decentralization can happen in various ways.
As a strategy, extreme centralization can provide an organization with a variety of competi-
tive benefits such as economies of scale (and the associated cost controls), improved value chain
linkages, product/service standardization, and global branding. Extreme decentralization, however,
can also be highly strategic, enabling a firm to modify products or services to fully meet local cus-
tomer needs, respond to local competition, remain compliant with various governments’ regulations
in different countries of operation, readily attract local employees, and penetrate local business net-
works. These two countervailing forces, centralization and decentralization, are also labeled global
integration and local responsiveness respectively (Prahalad & Doz, 1987).
To be successful, every multinational firm should adopt a strategy that “fits” the complexity of
its environment (Ghoshal & Nohria, 1993) and realities of its business. Although many compa-
nies pursue more integrated strategies, the realities of business operations often require more local ones.
Research has found that each diverse business strategy implies a different approach to managing
HR (Caligiuri & Colakoglu, 2008; Caligiuri & Stroh, 1995). Three general multinational strategies
will be discussed with respect to worldwide employee selection activities.

• Global—worldwide coordination and control
• Multidomestic—local responsiveness and flexibility
• Transnational—leveraging synergy worldwide

STANDARDIZATION OF EMPLOYEE SELECTION PRACTICES FOR WORLDWIDE COORDINATION AND CONTROL
Organizations pursuing a global strategy face strong pressure for worldwide integration but face
weak pressure for local responsiveness (Ghoshal & Nohria, 1993). These firms organize their opera-
tions into worldwide lines of business that are managed and controlled by headquarters. This cen-
tral control is often critical because, in part, their basis for competition is a systematic delivery
of products and services around the world (Bartlett & Ghoshal, 1989). To maintain this strategic
capability of integration or standardization, key functions and tasks are managed and controlled
by headquarters, often because customers expect consistency, such as outstanding quality (e.g., Sony),
a luxury fashion image (e.g., Louis Vuitton), or uniform standards for fast food (e.g., McDonald’s).
Production workers at Sony must maintain worldwide quality standards, regardless of where they are
in the world. Sales agents at Louis Vuitton must provide world-class customer service. Food preparation
staff at McDonald’s must prepare food to the company’s famous global standards, and janitorial staff
must clean restrooms to a global standard of sanitation and hygiene. In all of these cases, the standard is
set by corporate headquarters, and the uniformity is a competitive advantage.
Given the emphasis on standards and consistency, global organizations will tend to have
centrally developed dimensions to be included in assessments, or possibly even centrally developed
selection systems. For example, a global fast-food restaurant chain is competitive, in part, by deliv-
ering consistency to its customers in terms of food, service, cleanliness, and restaurant appearance.
It follows that this same fast-food restaurant chain would include friendliness and personal hygiene
in its selection systems, regardless of country.
In technically oriented roles, in which international consistency is needed, selection dimensions
are more objective and relatively easy to maintain across cultures. At 3M, for example, a global
prehire test for sales representatives has been developed and is currently implemented in 20 countries,
with plans to use this common test globally to hire sales representatives in the future. The test was
developed in the local languages and is administered online so that it is available regardless of
applicants’ time zones. The idea was that 3M should develop one test and retain the rights to it,
thereby avoiding intellectual property issues in test publishing, such as the difficulty of obtaining
permission to translate an existing test into another language or move it to a different system. As a
result, part of 3M’s solution was to create its own test using the sales competencies that were jointly
developed with 3M Sales
and Marketing. The competency model for sales representatives globally (see Table 36.1) has been
TABLE 36.1
Core Sales Competencies for 3M Sales Representatives
Competencies Skills
Core sales skills Selling process management
Account planning/account plan management
Opportunity identification
Solution creation
Negotiation persuasiveness
Industry and market knowledge
Business acumen skills Customer knowledge
Product and technical knowledge
Channel acumen
Coaching
Individual effectiveness skills Resource management
Problem solving/decision-making
Communication

integrated into 3M’s selection system for potential new hires and also into the training and development
programs for incumbent sales representatives.
In developing this competency model as the basis for the common test globally, an international
job analysis was conducted to assess whether the content domain of the 3M sales representative
position was similar around the world. A job analysis questionnaire (JAQ) was administered to sales
representatives and sales subject matter experts in ten countries. The JAQ assessed work behaviors
from the content domain for 3M sales representatives shown in Table 36.1. In 2006, 3M sales rep-
resentatives from Brazil, Russia, India, China, and Poland (labeled BRICP) completed the JAQ. In
2007, 3M sales representatives in Australia, Singapore, Taiwan, Japan, and Korea (labeled APAC)
completed the JAQ. Of the 13 work behavior dimensions shown in Table 36.1, the top 7 in terms of
importance are presented in Table 36.2. For each of the seven most important work behaviors, the
average importance rating is shown for both country sets. As Table 36.2 illustrates, the results of
this global job analysis indicated that the job content was essentially the same around the world.

TABLE 36.2
Mean Importance Ratings for Work Behavior Dimensions: 3M Sales Representatives' Job Analysis Results Across Two Sets of Countries

Work Behavior Dimensions                              APAC   BRICP
Conduct sales and follow-up                            3.7     3.7
Work with others                                       3.4     3.7
Provide information to customers and distributors      3.7     3.6
Plan, organize, and prioritize                         3.5     3.6
Maintain knowledge and skills                          3.4     3.5
Negotiate and persuade                                 3.4     3.5
Document work: keep records                            3.3     3.5

Note: The APAC country set includes Australia, Japan, Korea, Singapore, and Taiwan. The BRICP country
set includes Brazil, Russia, India, China, and Poland. Importance ratings were obtained
on a five-point scale (5 = highly important).
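
For readers who want to quantify this kind of cross-national comparison, the short sketch below (in Python) reproduces the Table 36.2 means and computes two simple profile-similarity summaries. The choice of indices (profile correlation and mean absolute difference) is an illustrative assumption, not the analysis 3M reported.

    # Illustrative comparison of the Table 36.2 importance profiles.
    # The ratings are the published means; the similarity indices used
    # here are assumptions, not 3M's documented procedure.
    import numpy as np

    dimensions = [
        "Conduct sales and follow-up",
        "Work with others",
        "Provide information to customers and distributors",
        "Plan, organize, and prioritize",
        "Maintain knowledge and skills",
        "Negotiate and persuade",
        "Document work: keep records",
    ]
    apac = np.array([3.7, 3.4, 3.7, 3.5, 3.4, 3.4, 3.3])
    bricp = np.array([3.7, 3.7, 3.6, 3.6, 3.5, 3.5, 3.5])

    # Per-dimension absolute differences on the 5-point scale.
    for dim, a, b in zip(dimensions, apac, bricp):
        print(f"{dim}: |APAC - BRICP| = {abs(a - b):.1f}")

    # Profile similarity: correlation of the two importance profiles and
    # the average absolute rating difference across dimensions.
    profile_r = np.corrcoef(apac, bricp)[0, 1]
    mean_abs_diff = np.abs(apac - bricp).mean()
    print(f"Profile correlation: {profile_r:.2f}")
    print(f"Mean absolute difference: {mean_abs_diff:.2f} scale points")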
The greater challenge for global organizations is maintaining consistency on more subjective
dimensions of the type generally found in critical leadership roles (e.g., a global firm's top
management team). It is critical for organizations to select leaders who have integrity, can work well
in teams, are committed, and are results-oriented. However, the interpretation of these dimensions
can vary greatly depending on the culture of the subsidiary location (Caligiuri & Day, 2000). There
are also practical challenges with headquarters-controlled international selection systems. Using the
same test across countries may be difficult for reasons ranging from culture-based interpretations
that lower the validity of the test to the basic logistics of testing. The challenge of maintaining
consistency in employee selection is discussed later in this chapter.

DIFFERENTIATION OF EMPLOYEE SELECTION PRACTICES FOR LOCAL RESPONSIVENESS AND FLEXIBILITY

Organizations pursuing a multidomestic strategy face strong pressures for local responsiveness
and weak pressures for worldwide integration. Their structure resembles a decentralized group
of local firms and their HR policies differ across countries, conforming to local market demands.
Multidomestic organizations transfer almost all HR practices, including employee selection, to the
host national subsidiaries. Decisions are made at the subsidiary level regarding the way in which
employees should be selected, the method to be used in selection, the criteria for evaluation, and
the like.
The benefit of this strategy is that global firms are able to compete locally and with local
knowledge, which may be especially important when a country has a unique infrastructure, market,
client base, governmental role, and the like. It follows that the localization of selection systems is best for
positions where a localization strategy is being deployed. The weakness of this strategy at the company
level is that companies lose economies of scale, the ability to maintain consistency
and standards around the world, and the possibility of global talent management. In
selection, for example, multiple selection tests would need to be validated, and cross-national
comparability of candidates would be impossible.

INTEGRATION OF EMPLOYEE SELECTION PRACTICES TO LEVERAGE THE BEST FROM AROUND THE WORLD

Organizations pursuing a transnational strategy face strong pressures for worldwide integration
and for local responsiveness. These organizations attempt to develop an interdependent global net-
work of subsidiaries that combines the benefits of worldwide integration and local responsiveness.
Subsidiaries are neither satellites of nor independent from headquarters, but rather are integral
parts of a whole system with both global and local objectives. Each subsidiary is believed to make
a unique contribution to this synergistic system. Decisions are a result of a collaborative process
between headquarters and subsidiaries. As such, employee selection systems in transnational organizations
seek consistency around the world—based on strategic necessity—while also being culturally
acceptable across the participating countries. Many multinational organizations aspire to (or believe
themselves to follow) a transnational business strategy. Consequently, there is increased pressure to
develop HR systems (and employee selection systems in particular) that are acceptable and synergistic
across cultures.
In the case of 3M's prehire sales test, the solution was a hybrid—to standardize the test (on the
basis of the common competency model outlined in Table 36.1) but allow each country's HR department
the freedom to vary when and how the test was given. For example, in some countries it did
not make sense to offer the test at the beginning of the selection process but rather somewhat later,
such as when the job market was particularly competitive or the location was more remote. By working
pragmatically, 3M developed a variety of approaches to implementing the online
test, to make sure that the process was genuinely advancing the interests of the country and the company
rather than being something prescribed and imposed from corporate headquarters. In the end, 3M's solution to global
testing was implemented, and the prehire test for sales representatives was rolled out globally.
There are three challenges when developing candidate selection systems from a transnational
or synergistic perspective. The first challenge is determining selection dimensions that would be
applicable for candidates for the same positions across subsidiaries, regardless of country (Davis,
1998; Ployhart, Wiechmann, Schmitt, Sacco, & Rogg, 2003). This means that the content domain
is comparable across cultures within positions and that the selection systems based on the com-
mon content domain would have validity coefficients generalizable across countries (Lievens,
2007; Salgado & Anderson, 2002). Once the common content domain is determined, creating
conceptual equivalence in the assessment tools is the second challenge. This may include
everything from language comparability in selection tests to developing the behavioral indices of
various selection dimensions so that raters (e.g., interviewers, assessors in assessment centers) can
make cross-culturally comparable ratings or possibly even changing cut scores and norms within
countries to appropriate levels. The third challenge is determining the selection and assessment
methods that can be successfully implemented across cultures. This challenge includes gaining an
understanding of the candidates’ perspectives and how their reactions to the selection system may
vary depending upon their cultural perspectives.
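
To make the first challenge concrete, a bare-bones validity generalization check in the spirit of Hunter and Schmidt can be sketched as follows (Python). The country labels and validity coefficients are hypothetical, and corrections for unreliability and range restriction are omitted; the point is simply to show the logic of asking whether between-country variation in validities exceeds what sampling error alone would produce.

    # Hypothetical per-country validity coefficients for the same
    # predictor-criterion pair; a bare-bones check of whether validity
    # appears to generalize across countries (sketch only).
    import numpy as np

    country_samples = {            # country: (observed r, sample size)
        "Brazil": (0.28, 140),
        "Poland": (0.33, 95),
        "Japan": (0.25, 120),
        "India": (0.31, 210),
    }
    r = np.array([v[0] for v in country_samples.values()])
    n = np.array([v[1] for v in country_samples.values()])

    r_bar = np.sum(n * r) / np.sum(n)                    # weighted mean validity
    var_obs = np.sum(n * (r - r_bar) ** 2) / np.sum(n)   # observed variance of r
    var_err = (1 - r_bar ** 2) ** 2 / (n.mean() - 1)     # expected sampling-error variance
    pct_error = min(var_err / var_obs, 1.0) if var_obs > 0 else 1.0

    print(f"Weighted mean validity: {r_bar:.3f}")
    print(f"Share of observed variance attributable to sampling error: {pct_error:.0%}")

If most of the between-country variance in validities is attributable to sampling error, the common selection system's validity can reasonably be treated as generalizing across the countries examined.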
These three challenges make the transnational strategy very difficult to maintain with respect to
selection and assessment. Organizations that purport to have HR systems following from
a transnational business strategy tend, in practice, to operate a hybrid model. They either have
substantial influence from headquarters in setting and developing a selection system (acting much
like global organizations) or they allow a significant amount of cross-national adaptation within
a given country's system (acting much like multidomestic organizations). Given that many multinational
organizations desire to be in a synergistic state with respect to their HR systems (i.e., to follow
a transnational strategy), it is useful to discuss each of the three challenges in greater detail.

CHALLENGES OF CROSS-NATIONAL SELECTION AND ASSESSMENT


The previous section identified three challenges when developing international selection systems.
Although a thorough review of all of the measurement, methodological, and cultural issues embed-
ded in these challenges is beyond the scope of this chapter, the goal for this section is to highlight
some of the key issues applied psychologists and HR practitioners should consider when developing
international selection systems.

DETERMINING THE SELECTION CONSTRUCTS APPLICABLE ACROSS CULTURES


As with the development of selection systems in the domestic context, the first step is to deter-
mine the broad content domain for a given position—repeating this step across countries for the
same position to determine whether the jobs are, in fact, comparable. In validity language, the
selection systems (predictors) would need to tap the same performance domain across countries.
For example, Asian managers may emphasize cooperation and teamwork when assessing leaders,
whereas American managers may emphasize assertiveness and independence for the same leader-
ship position. In this example, the content domain for a given leadership role may not be conceptu-
ally equivalent across cultures.
In firms operating from either a global or a transnational perspective and transferring people
across borders, the conceptual equivalence and validity generalization challenge may be further
exacerbated. If a candidate is selected in one country (predicting a country-specific performance
domain) and transferred to another country, in the same role, where the performance domain may
differ (as in the leadership example above), the validity of the original selection system will be
lowered (Lievens, 2007). This issue is addressed again in the last section of the chapter when inter-
national assignments are discussed.
Many transnational firms have driving corporate cultural values that appear in managerial selec-
tion systems around the world. These corporate values may include dimensions such as managing
with integrity, taking appropriate risks, being customer-focused, being results-oriented, and the
like. After these broad performance dimensions are named, the challenge turns to creating concep-
tual equivalence for each dimension across cultures. Once this equivalence is established, selection
systems to assess candidates against these dimensions are created.

CREATING CONCEPTUAL EQUIVALENCE ACROSS CULTURES


Cultural values are socialized in each individual through various agents such as nationality, religion,
family, education, company, and profession. This foundation of individuals’ culture can influence the
sphere of work. Thus, individuals’ work-related values are formed when their overarching cultural
values are applied to the work situation (England & Harpaz, 1983; Hofstede, 1980). Comparative man-
agement researchers have found that individuals within one country have more values in common
than individuals from different countries (Hofstede, 1980), especially when corporate or
professional cultures are weak. In the context of this chapter, culturally laden work values can affect
the weight that one places on a particular selection dimension or the actual interpretation of the appli-
cants’ behaviors, creating a challenge for assessing candidates through a single cultural lens. Applied
psychologists and HR practitioners working internationally have been grappling with the challenge of
developing assessment and measurement methods that are conceptually comparable across cultures—
beyond a mere translation of words (e.g., Davis, 1998; Erez, 1994; Harpaz, 1996; Vandenberg & Lance,
2000; Ployhart et al., 2003; Ryan, Horvath, Ployhart, Schmitt, & Slade, 2000; Ryan, Chan, Ployhart,
& Slade, 1999; Riordan & Vandenberg, 1994; Vandenberg, 2002). In this context, the goal is to create
enough conceptual equivalence for comparisons of candidates to be meaningful.
Many researchers have examined methods to assess cross-cultural conceptual equivalence (e.g.,
Harpaz, 1996; Vandenberg & Lance, 2000; Riordan & Vandenberg, 1994; Vandenberg, 2002). By
definition, conceptual equivalence occurs when constructs have similar meanings across cultures
(Harpaz, 1996; Ryan et al., 1999). For example, customer service orientation may translate into
“complete attention to customers’ needs” in Japan where anticipating needs is important. However,
in Italy, where shopkeepers with exquisite taste are highly valued, customer service may mean
“providing honest feedback.” In this example, “customer service orientation” lacks conceptual
equivalence. However, in both Japan and Italy, the construct “expending effort for clients” may be
defined as working hard to find a desired item or to help a client resolve a problem. In this example,
“expending effort for clients” does possess conceptual equivalence. Maximizing conceptual equivalence
may be especially problematic when the constructs in the content domain are subjective rather
than objective.
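
The rigorous approach to establishing this kind of equivalence is multigroup measurement invariance testing of the sort reviewed by Vandenberg and Lance (2000). As a much simpler first screen, the sketch below (Python; the data file, item names, and the 0.20 flagging threshold are hypothetical) compares corrected item-total correlations for a customer service orientation scale across country samples and flags items whose behavior diverges enough to warrant closer review.

    # Simplified cross-country equivalence screen (not a substitute for
    # formal measurement invariance testing). The file name, item columns,
    # and the 0.20 divergence threshold are hypothetical.
    import pandas as pd

    df = pd.read_csv("service_orientation_items.csv")   # columns: country, item1..item6
    items = [c for c in df.columns if c.startswith("item")]

    def item_total_correlations(data: pd.DataFrame) -> pd.Series:
        """Corrected item-total correlation for each item."""
        total = data[items].sum(axis=1)
        return pd.Series({i: data[i].corr(total - data[i]) for i in items})

    by_country = {c: item_total_correlations(g) for c, g in df.groupby("country")}
    report = pd.DataFrame(by_country)

    # Flag items whose item-total correlations differ sharply across countries.
    report["range"] = report.max(axis=1) - report.min(axis=1)
    flagged = report[report["range"] > 0.20]
    print(report.round(2))
    print("Items needing review for conceptual equivalence:", list(flagged.index))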
Challenges of conceptual equivalence also occur at the item level. For example, one item written
through the lens of the 3M HR team in the United States included a response alternative involving
inviting a new client to lunch. Taking a new client to lunch falls within acceptable standard operating
procedure for most U.S. sales representatives—yet in a different cultural context, the same activity
conveys a level of familiarity that is inconsistent with establishing a new relationship, making the
response option less viable cross-culturally. In countries such as Brazil, inviting a person to lunch
implies a deeper relationship than has yet been established between the potential new client and the
sales representative. The option would not be selected as written and was ultimately rewritten to
reflect a universally appropriate response.

DEVELOPING CROSS-CULTURALLY ACCEPTABLE METHODS FOR ASSESSING AND SELECTING CANDIDATES

Once the dimensions to be included in the selection system have been established, the next cross-
cultural concern would be the appropriateness of the assessment method and the logistics of those
methods in a given cross-cultural context. With respect to logistics, testing assumptions need to be
questioned cross-culturally. For example, when 3M was rolling out its prehire sales test globally,
one of the basic assumptions was that testing would be done in a fairly controlled environment with
multiple computers so that several applicants could take the online test simultaneously. As it turned
out, this was easier said than done. First, many of the 3M subsidiaries around the world did not have
an available testing room (i.e., an empty room with multiple computers, each with an internet
connection). Second, some of the subsidiaries had sales territories that covered vast regions. If 3M
was looking for sales representatives for a given region, it needed to be able to reach candidates in
remote locations; in Russia, for example, 3M needed to reach candidates in places as remote as
Siberia. Practically, decisions had to be made regarding how far a candidate could reasonably be
expected to travel just to take the prehire test. Third, as 3M learned, the idea of having multiple
applicants take the test simultaneously was flawed in some countries. In some cultures, and in highly
competitive job markets, it was undesirable and discouraging for applicants to see how many people
were competing; in other cultures, this kind of group testing is simply unacceptable. Even the idea of
a controlled testing room with a closed door raised cross-national challenges in some small
subsidiaries and in countries with predominantly open floor plans, such as Japan.
With respect to testing methods, some methods are perceived more favorably than others by applicants
around the globe. Applicants across Western cultures perceive job-related interviews, résumés, and
work samples more favorably than testing (e.g., personality tests, cognitive ability tests) and reference
checking (Steiner & Gilliland, 2001). Only one study has been conducted in the Asian context:
Comparing perceptions of selection methods in Singapore and the United States, Phillips and
Gully (2002) found results similar to those obtained in Western contexts (reviewed by Steiner
& Gilliland, 2001). However, they did find that Singaporeans rated personality tests more favorably
than Americans did.
Although applicant reactions to selection methods may be generally similar across countries,
their usage is not. Multicountry survey-based studies have found that countries vary significantly in
the employee selection procedures they use (Ryan, McFarland, Baron, & Page, 1999; Shackleton &
Newell, 1997). Ryan et al. (1999) found that national-level cultural values, such as uncertainty avoidance,
predicted which selection procedures were more likely to be used across countries; countries
higher in uncertainty avoidance were more likely to rely heavily on interviews and testing, presumably
as a way of reducing hiring risks. Further research on cross-cultural differences in
the use and acceptance of selection methods is important for furthering understanding of global employee
selection and, ideally, for reducing resistance to such methods (for a review, see Lievens, 2007).
In the context of cross-cultural acceptability, it is important to consider how the selection is being
made and who is conducting the evaluation. Selection is most often conducted through supervi-
sors’ interviews of candidates. Interviews are especially challenging when supervisors from one
culture are rating candidates from another culture. In these cross-cultural rater-ratee dyads, ratings
could be biased by the degree of rater-ratee similarity measured in terms of demographic similarity
(e.g., Judge & Ferris, 1993; Tsui & O’Reilly, 1989). The degree of similarity between members of
a dyad—also referred to as relational demography—has been shown to be positively related to per-
ceived similarity and supervisors’ reported liking of a ratee (Wayne & Liden, 1995). Both of these
effects have the potential to favorably bias ratings.
In the context of performance evaluation (with comparable implications for candidate assess-
ment) in cross-national work settings, national similarity is a type of relational demography that
could affect ratings. Similarity of managers and their supervisors in terms of national background
has been shown to influence performance ratings (Caligiuri & Day, 2000). Caligiuri and Day (2000)
found that the bias of nationality might affect the performance dimensions differently—depending
on the type of dimension. They tested task and contextual performance dimensions and found that
rater nationality influenced the ratings of the more subjective contextual performance dimensions,
but not the objective task-based performance dimensions. This finding is consistent with research
indicating that less ambiguous performance standards increase rater-ratee agreement regarding
performance ratings (Schrader & Steiner, 1996). Given that the assessor's cultural lens is a potential
source of bias, supervisors should be trained on behavioral indicators for assessing employees'
performance. A training intervention, such as frame-of-reference rater training (Bernardin & Buckley,
1981), could be highly effective at reducing the idiosyncratic standards of raters (Day & Sulsky,
1995). Frame-of-reference rater training would clarify standards across dimensions and across
cultures—especially for the more ambiguous, subjective dimensions in the content domain.

NATIONAL DIFFERENCES IN SYSTEMS AFFECTING EMPLOYEE SELECTION


HR systems vary from country to country along some relatively fixed dimensions, including
a given country's work systems (Begin, 1992). These country-level factors may affect the practice
of employee selection because they affect employment laws, workforce competence, and the
availability of talent. Although not intended to be comprehensive, this section offers
some illustrative examples of the ways in which countries' work systems affect employee selection
internationally.
Countries differ with respect to laws governing the practice of employee selection. (See Chapter
30, this volume, for more details about national differences in legal issues concerning employee
selection.) For example, the United States has a body of laws stemming from the initial fair,
nondiscriminatory employment legislation of Title VII of the Civil Rights Act of 1964, the Age
Discrimination in Employment Act, and the Americans with Disabilities Act. As in the United
States, in almost every country laws exist that define the types of firms that must abide by a given
law prohibiting discrimination (e.g., size of the firm, public or private sector) and define who is
considered protected under that law (e.g., race, sex, age, sexual orientation). In India, for example,
Article 15 of the Indian Constitution prohibits discrimination on the grounds of caste. Most of these
laws around the world state that selection systems cannot discriminate against the protected groups
they cover; however, how it is established whether discrimination has occurred, and what the penalty
is for violating the law, vary greatly from country to country.
Another legal issue affecting international selection is data privacy. For example, the European
Union (EU) Directive on Data Protection prohibits the transfer of personal information from Europe
to other countries unless adequate protection of privacy, notice, and consent are provided. This EU
Directive affects selection practices globally in the way data are collected and shared. Countries
also have their own privacy laws, as illustrated in the example of 3M in Poland. To implement the
prehire sales representative test in Poland, 3M faced some added challenges. The Polish
Labor Code limits, in Article 22, the personal data that may be required by an employer from a
candidate for employment; those data are limited mainly to such items as name, surname, date of
birth, the candidate's education, and history of previous employment. In order not to be even remotely
viewed as risking a violation, 3M chose not to require candidates to provide personal data other than
those specifically outlined in Article 22 of the Polish Labor Code. To comply with the Polish
Act on Personal Data Protection, additional adjustments were made regarding the demographic
information collected. For example, given that some information would reside on a
U.S.-based server, names needed to be removed from the information collected. Further, because
the test was processed on a U.S. server, changes were required, such as written (rather than electronic)
informed consent forms signed and collected before each applicant began testing. These
steps, among others, are examples of how cross-national differences in laws may affect the logistics
of the testing situation.
Countries vary in terms of their workforce competence, which, in turn, influences the competence
and readiness of candidates. Organizations such as the U.N. Educational, Scientific, and
Cultural Organization (UNESCO) and the International Archive of Education Data (IAED) report
large differences in literacy rates, education levels, and test scores across countries, which, in turn,
have implications for the quality of a given country's workforce. Germany, for example, is considered to have
one of the best-trained workforces in the world, with an extensive apprenticeship program in which
employers help train students in their intended trades and professions.
Within-country selection systems rely on an ample supply of qualified talent relative to the
organization's demand for talent for a given job. Because countries differ in labor economics, the
availability of talent will influence selection ratios, making selection systems more or less effective
within a country's overall workforce strategy. The general labor economics of a given
country or city affect the size and quality of applicant pools. The supply of talent also affects how
concerned companies will be about candidate reactions to their (even well-validated) selection methods.
For example, in both India and Poland, skilled labor is in high demand. Often a hiring manager
just wants someone to fill a position, without the extra hurdle of giving applicants a test, which
increases the time needed to make a hiring decision and could result in losing some viable candidates.
One way 3M accommodated this high demand for skilled labor in Poland and
India was to move testing to a later hurdle in the selection process. The goal was to keep more
qualified candidates in the pipeline for the interpersonally interactive aspects of the selection system,
such as the interview, and not turn them off with the testing process. Testing was conducted after
a relationship with 3M had been built, which also helped ensure that top talent was selected.
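
The practical force of the selection ratio can be made concrete with a standard top-down selection result from classical selection theory: when the predictor is normally distributed, the expected standardized criterion score of those hired equals the operational validity multiplied by the mean standardized predictor score of the selected group. The sketch below (Python; the validity of .30 and the selection ratios are hypothetical illustrations, not values from this chapter) shows how the same test yields different expected gains as local labor-market conditions change the selection ratio.

    # Expected mean criterion performance of hires (in SD units) under
    # top-down selection from a normal applicant pool:
    #   mean z of selected = validity * phi(z_cut) / selection_ratio
    # The validity of .30 is a hypothetical illustration.
    from scipy.stats import norm

    def expected_gain(validity: float, selection_ratio: float) -> float:
        z_cut = norm.ppf(1 - selection_ratio)          # predictor cutoff
        mean_z_selected = norm.pdf(z_cut) / selection_ratio
        return validity * mean_z_selected

    for sr in (0.05, 0.20, 0.50, 0.90):
        print(f"Selection ratio {sr:.2f}: expected gain "
              f"{expected_gain(0.30, sr):.2f} criterion SDs")

In a tight labor market where nearly every applicant must be hired, even a valid test yields little expected gain, which is part of why local hiring managers resist the added hurdle.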

INTERNATIONAL ASSIGNMENTS: IMPLICATIONS FOR EMPLOYEE SELECTION

This chapter thus far has focused on employee selection systems and the challenges present in
multinational organizations—namely in developing worldwide selection systems consistent with
business strategy and the cultural context. We now shift from examining cross-national influences
on multinational companies’ selection systems to multinational companies’ selection systems for
those who must work in a cross-national context; namely, international assignees. This section of
the chapter focuses on the selection of international assignees, employees within a multinational
organization who are living and working in a foreign or cross-national context.
There are many challenges when developing selection systems for international assignee candi-
dates who will be living and working outside of their own national borders. International assignees
are nationals of one country who are sent by a parent organization to live and work in another
country. The definition of international assignees, for the purpose of this chapter, will be limited to
those who are sent for an assignment (rather than a self-initiated relocation) to another country for
at least 1 year. This section will accomplish three things. First, it will describe the various types of
international assignees found in most multinational firms; second, it will describe the individual-
level antecedents most important for inclusion in international assignee selection systems; and third,
it will cover process issues for international assignee candidate selection.

TYPES OF INTERNATIONAL ASSIGNMENTS


International assignments are as diverse as the people who occupy them. For a given company,
a British vice president of operations working in Brussels, a Swiss systems engineer working in
Singapore, a French management trainee working in Canada, and a German electrical technician
working in China are all on international assignments. However, their job descriptions are very
different in terms of the technical skills and language skills needed to perform their jobs, their
hierarchical levels within the organization, their educational backgrounds, the assignment dura-
tions, and the like. The differences among international assignments have implications for the way
organizations select and place these individuals.
International assignments vary on two major continua. The first is the extent to which intercul-
tural effectiveness is needed in order to be effective on the assignment. The second continuum is the
extent to which a given global assignments is expected to be developmental. With respect to the first
continuum, some assignments will require extensive intercultural effectiveness, achieved through
successful communication with host nationals, in order to be deemed successful; examples include
product managers (for whom knowledge of the local market through collaboration with host
nationals is critical) and regional managers (whose direct reports are mostly host nationals). Others
may require substantially less intercultural effectiveness and rely on more technically based skills,
as in positions involving information technology, engineering, or physical plant.
With respect to the second continuum, multinational organizations send some employees on
global assignments for the primary purpose of developing global competence or a global orientation,
such as building their knowledge of the worldwide market, developing cross-cultural understanding,
or developing the ability to speak a second language at work. The extent to which such
development is formally recognized as a primary purpose or expected outcome of a global
assignment will vary greatly depending on the type of assignment.
These two continua form a basic classification system resulting in the following three general
categories of global assignments: (a) technical or functional assignments, (b) developmental or high-
potential assignments, and (c) strategic or executive assignments.

Technical and Functional Assignments


Technical and functional assignees generally are the most common global assignees in a typi-
cal transnational organization and represent almost all functional areas within the organization
(Caligiuri & Lazarova, 2001). Functional assignees are placed in international assignments whenever
technical expertise is lacking in the host country and they are needed to fill a skill
gap. For example, these individuals may be technical experts who are members of implementation
teams, operations managers who are overseeing manufacturing facilities, or quality engineers man-
aging supply chains. Given that they are primarily sent to fill a technical need, their professional
development is not the intended primary goal for the assignment.

Developmental or High Potential Assignments


For some firms, sending employees to another country to develop global competencies is consistent
with their overall strategic HR plan, in the context of managerial development. These programs are
often rotational—with one of the rotations being in another country. While on this type of assign-
ment, the worker’s goal is to develop professional, technical, and intercultural competencies. These
rotational assignments, often a part of a global leadership development program, include a very
structured series of developmental experiences, such as moving across functional areas, product
lines, business units, and geographical regions.

Strategic or Executive Assignments


These strategic assignments are usually filled by senior leaders (directors, vice presidents, general
managers) who are being developed for progressively higher-level executive positions. They are also
sent to fill specific needs in the organization, which may include entering new markets, managing joint
ventures, running a function within a foreign subsidiary, and the like. These individuals often need
a high level of intercultural sensitivity in order to be successful on their global assignments.

INDIVIDUAL-LEVEL ANTECEDENTS FOR INTERNATIONAL ASSIGNMENT CANDIDATE SELECTION

As the previous section described, there are a variety of jobs (with corresponding diverse content
domains) that can be categorized as international assignments. Therefore, when thinking about
international assignee selection, unlike traditional selection, we are considering ways to predict suc-
cess within the job context (i.e., working in a foreign country), rather than job content in the tradi-
tional sense. In the research literature on international assignees, cross-cultural adjustment is most
often considered an important dependent variable when considering selection across assignee types
given that adjustment (psychological comfort living and working in another country) is important
for almost all expatriates.
In a meta-analysis of the antecedents and consequences of expatriate adjustment, Bhaskar-Shrinivas,
Harrison, Shaffer, and Luk (2005) found that language ability, previous overseas experience, withdrawal
cognitions, job satisfaction, and spousal adjustment were predictors of cross-cultural adjustment. In
another meta-analysis, Hechanova, Beehr, and Christiansen (2003) found that self-efficacy, frequency
of interaction with host nationals, and family support were predictors of cross-cultural adjustment.
These meta-analyses also suggest that greater cross-cultural adjustment in international assignees generally
predicted greater job satisfaction, less strain, and higher levels of organizational commitment.
Another meta-analysis, examining personality characteristics as predictors of expatriate performance (Mol, Born,
Willemsen, & Van Der Molen, 2005), found that extraversion, emotional stability, agreeableness,
and conscientiousness were predictive of expatriate performance. The same meta-analysis also
found cultural sensitivity and local language ability to be predictive.
Across these meta-analyses, three categories of individual-level antecedents emerge as
predictors of cross-cultural adjustment that lend themselves to international assignee selection
systems: personality characteristics, language skills, and prior experience living in
a different country (see Caligiuri & Tarique, 2006, for a review). On the basis of the domestic
literature, cognitive ability would seem to be another potential category; however, as a
predictor, cognitive ability “has barely been investigated in the expatriate context” (Mol et al., 2005,
p. 614).
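
As a point of interpretation for the effect sizes reported in this section, the ρ values are meta-analytic estimates of the true-score correlation: a sample-size-weighted mean of the observed correlations, corrected for measurement artifacts. The exact artifact corrections differ across the meta-analyses cited, but the basic logic can be sketched with the classical corrections (the reliability terms below are generic placeholders, not values reported by these authors):

    \bar{r} = \frac{\sum_i N_i r_i}{\sum_i N_i}, \qquad
    \hat{\rho} = \frac{\bar{r}}{\sqrt{\bar{r}_{xx}\,\bar{r}_{yy}}}

where r_i and N_i are the observed correlation and sample size in study i, and \bar{r}_{xx} and \bar{r}_{yy} are the average reliabilities of the predictor and criterion measures. A corrected estimate such as ρ = .17 therefore implies somewhat smaller observed correlations in any single, imperfectly measured sample.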

Personality Characteristics
Extensive research has found that well-adjusted and high-performing international assignees tend
to share certain personality traits (e.g., Black, 1990; Caligiuri, 2000a, 2000b; Church, 1982; Dalton &
Wilson, 2000; Mendenhall & Oddou, 1985; Mol et al., 2005; Shaffer, Harrison, Gregersen,
Black, & Ferzandi, 2006). Personality characteristics enable international assignees to be open
and receptive to learning the norms of new cultures, to initiate contact with host nationals, to
gather cultural information, and to handle the higher amounts of stress associated with the ambi-
guity of their new environments (Black, 1990; Church, 1982; Mendenhall & Oddou, 1985; Shaffer
et al., 2006)—all important for international assignee success. Research has found that five factors
provide a useful typology or taxonomy for classifying personality characteristics (Digman, 1990;
Goldberg, 1992, 1993; McCrae & Costa, 1987, 1989; McCrae & John, 1992), labeled “the Big Five.”
The Big Five personality factors are (a) extroversion, (b) agreeableness, (c) conscientiousness, (d)
emotional stability, and (e) openness or intellect.
Each of the Big Five personality characteristics relates to international assignee success in
a unique way (Caligiuri, 2000a, 2000b; Ones & Viswesvaran, 1997, 1999; Shaffer et al., 2006)
and should be included in a selection system for international assignees for different reasons (Van
Vianen, De Pater, & Caligiuri, 2005). For example, employees higher in conscientiousness are more
likely to become leaders, gain status, get promoted, earn higher salaries, and so forth. Studies
have found a positive relationship between conscientiousness and work performance among
professionals in the domestic work context (e.g., Barrick & Mount, 1991; Day & Silverman, 1989),
a relationship that was later generalized to the international assignment context (Caligiuri, 2000a; Ones &
Viswesvaran, 1997). On the basis of the meta-analysis conducted by Mol et al. (2005), the estimated
true population effect size for the relationship between conscientiousness and international assignee
success is positive (ρ = .17).
Personality characteristics related to relational skills (extroversion and agreeableness) enable
international assignees to form stronger interpersonal bonds with both host nationals and other
international assignees. On the basis of a meta-analysis, Bhaskar-Shrinivas et al. (2005) found that
relational skills are positively related to cross-cultural adjustment (ρ = .32). Extroverted individuals
are able to more effectively learn the social culture of the host country through their relationships
with local nationals and, in turn, report higher cross-cultural adjustment (Abe & Wiseman, 1983;
Black, 1990; Caligiuri, 2000a, 2000b; Mendenhall & Oddou, 1985, 1988; Searle & Ward, 1990).
More agreeable international assignees tend to deal with conflict collaboratively, to strive for mutual
understanding, and to be less competitive. They report greater cross-cultural adjustment and greater
success on the assignment (Bhaskar-Shrinivas et al., 2005; Caligiuri, 2000a, 2000b; Mol et al., 2005;
Ones & Viswesvaran, 1997; Black, 1990; Tung, 1981). Meta-analytic results (Mol et al., 2005) found
the estimated true population effect size for the relationship of international assignee success and
extroversion and agreeableness to be positive (ρ = .17 and .11, respectively).
Emotional stability is also important for international assignee success. Emotional stability is
the universal adaptive mechanism enabling humans to cope with stress in their environment (Buss,
1991). Given that stress is often associated with living and working in an ambiguous and unfamiliar
environment (Richards, 1996; Stahl & Caligiuri, 2005), emotional stability is an important person-
ality characteristic for international assignees’ adjustment to the host country (Abe & Wiseman,
1983; Black, 1988; Gudykunst, 1988; Gudykunst & Hammer, 1984; Mendenhall & Oddou, 1985)
and completion of an international assignment (Ones & Viswesvaran, 1997). From the
meta-analysis conducted by Mol et al. (2005), the estimated true population effect size for the rela-
tionship between emotional stability and international assignee success is positive (ρ = .10).
Openness would seem to be the most intuitively necessary personality characteristic relating to international
assignee success. For an international assignee, the ability to correctly assess the
social environment is more complicated given that the host country may provide ambiguous or
uninterpretable social cues (Caligiuri & Day, 2000). Successful international assignees must possess
cognitive complexity, openness, and intuitive perceptual acuity to accurately perceive and interpret
the host culture (Caligiuri, Jacobs, & Farr, 2000; Finney & Von Glinow, 1987; Ones & Viswesvaran,
1997). Openness should be related to international assignee success because individuals higher in
this personality characteristic will have fewer rigid views of appropriate and inappropriate contextual
behavior and are more likely to be accepting of the new culture (e.g., Abe & Wiseman, 1983; Black,
1990; Cui & van den Berg, 1991; Hammer, Gudykunst, & Wiseman, 1978). On the basis of the meta-
analysis conducted by Mol et al. (2005), the estimated true population effect size for the relationship
between openness and international assignee success is positive (ρ = .06); however, this relationship
was not significant, as the confidence interval included zero. The authors noted that “moderated
support was found for the relationship of openness” (p. 608), which is consistent with other
research. For example, Caligiuri (2000b) found moderated support for openness as a personality
characteristic relating to expatriate adjustment, such that greater contact with host nationals was
positively related to cross-cultural adjustment when an individual possesses the personality trait
of openness. In addition, the same meta-analysis (Mol et al., 2005) did find a positive relationship
between international assignee performance and cultural sensitivity (r = .24). Perhaps, depending
on how it is measured, cultural sensitivity assesses a facet of openness that does have a linear,
positive relationship with assignee success.
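
A moderated relationship of this kind is typically examined with an interaction term in a regression model. The sketch below (Python, using statsmodels; the data file and variable names are hypothetical) shows the general form of such a test for the contact-by-openness interaction described above; it is an illustration, not a re-analysis of Caligiuri (2000b).

    # Illustrative moderated regression: does host-national contact predict
    # adjustment more strongly for assignees higher in openness?
    # The data file and variable names are hypothetical.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("assignee_survey.csv")   # columns: adjustment, contact, openness

    # Center the predictors so the lower-order terms are interpretable.
    for col in ("contact", "openness"):
        df[col + "_c"] = df[col] - df[col].mean()

    model = smf.ols("adjustment ~ contact_c * openness_c", data=df).fit()
    print(model.summary())

    # A positive, significant contact_c:openness_c coefficient is the pattern
    # consistent with the moderation described in the text.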
Collectively, these personality characteristics should be included in any selection program for
international assignees (Van Vianen et al., 2005). It is important to note that this type of employee
assessment would predict those who will do well adjusting to a cross-cultural job context. However,
this assessment does not predict success in the actual job tasks. Likewise, the level of
each personality characteristic required may be contingent upon the type of international assignment under
consideration. For example, relational skills might be important for all international assignees,
but the necessary level might be higher for senior executives who must network with, persuade, and
influence host nationals to be successful, compared with technical assignees, who may interact with
host nationals mostly around tasks with computer systems or equipment.

Language Skills
Many have noted a positive relationship between language skills and international assignee success
(Abe & Wiseman, 1983; Church, 1982; Cui & van den Berg, 1991; Shaffer, Harrison, & Gilley,
1999). In their meta-analytic studies, Mol et al. (2005) and Bhaskar-Shrinivas et al. (2005) found
that local language ability is a positive predictor of international assignee success (as generally
defined by adjustment; ρ = .19 and .22). Mol et al. (2005) further noted that “more research may be
needed on the moderators of this relationship” (p. 609) between language skills and international
assignee performance. For example, some researchers suggest that language skills, which are neces-
sary for communication, are critical for cross-cultural adjustment. Others (e.g., Cui & van den Berg,
1991) suggest that there may be an interactive influence of language fluency (Shaffer et al., 1999):
Individual differences such as openness may interact with language fluency to positively influence
international assignee success (Cui & van den Berg, 1991). In other words, one could speak the
host language fluently and know the “correct” behaviors to display, and yet only superficially be
immersed in the host culture (Cui & van den Berg, 1991). Because it would be difficult for the oppo-
site to be true (i.e., that one could be immersed in a culture without language skills), basic language
skills should, at the very least, be considered helpful. At a minimum, an attempt should be made to find
a qualified international assignee with language skills; for some positions, language skills may
be more critical than for others.

Prior International Experience


From a social learning perspective, the more contact international assignees have with host nation-
als and the host culture, the greater their cross-cultural adjustment (Bochner, Hutnik, & Furnham,
1986; Bochner, McLeod, & Lin, 1977; Brislin, 1981). For example, past research has found that
having friendships with host nationals greatly improves international assignees’ ability to learn cul-
turally appropriate social skills and behaviors (Searle & Ward, 1990). From this perspective, more
prior experience with the host culture should produce greater cross-cultural adjustment.
On the other hand, the social cognitive theorists contend that prior foreign experience with the
host culture is positively related to adjustment provided that the experience does not serve to rein-
force previously held stereotypical beliefs or foster negative, unrealistic expectations of the for-
eign culture. Social cognitive proponents agree that there is a direct relationship between foreign
experience and cross-cultural adjustment when the experience provides an accurate and realistic
representation of the host countries’ norms, customs, values, etc. Bhaskar-Shrinivas et al.’s meta-
analytic results (2005) found that prior international experience was a weak but positive predictor of
interaction adjustment and work adjustment (ρ = .13 and .06, respectively). Further research should
examine whether a moderator exists, such as the quality of the prior international experience.

PRACTICES IN INTERNATIONAL ASSIGNEE SELECTION


There are three unique practices in the research literature regarding international assignee selec-
tion (Caligiuri & Tarique, 2006). The first is the application of realistic previews to international
assignments to help create realistic expectations during (or prior to) selection. The second is the
concept of self-selection, which enables international assignee candidates to determine whether the
assignment is right for their personal situation, family situation, career stage, and so forth. The third
is traditional candidate assessment, which would include many of the dimensions identified in the
previous section (personality, language skills, and past experience) in a structured organizational
selection program. Each of these three international assignment selection practices is discussed in
greater detail below.

Realistic Previews for International Assignments


Accurate expectations formed prior to an international assignment have been shown to
influence the international assignment in many important ways (Caligiuri & Phillips, 2003; Searle &
Ward, 1990; Weissman & Furnham, 1987). Studies comparing international assignees' expec-
tations prior to going abroad and their actual experience after relocation suggest that having
moderately accurate expectations facilitates cross-cultural adjustment (Searle & Ward, 1990;
Weissman & Furnham, 1987). Caligiuri and Phillips (2003) found that providing realistic
previews prior to international assignments did not change candidates’ interest in possible
assignments, but did increase candidates’ self-efficacy for an international assignment. This
self-efficacy, in turn, could influence the outcome of the international assignment.
Research and practice suggest that in the selection phase (or prior to it) it is useful for firms to
provide some information to assist candidates in making realistic decisions on whether an assign-
ment is right for them and to help them form realistic expectations about a possible international
assignment (Black, Gregersen, & Mendenhall, 1992; Caligiuri & Phillips, 2003; Tung, 1988). Many
firms have preselection programs that pair repatriates with international assignee candidates to
give international assignees the opportunity to form realistic expectations (Black, Gregersen, &
Mendenhall, 1992; Tung, 1988). Caligiuri and Phillips (2003) have found that self-directed realistic
previews are also highly effective in helping international assignee candidates form accurate per-
ceptions of the possible assignment.

Self-Selection
Given that the demographic profiles and personal situations of the international assignee candidates
will vary, self-assessment (or self-selection) has been found to be an effective method for sharing
realistic assessments in a tailored way (Caligiuri & Phillips, 2003). For example, an unmarried
person who is a candidate for an international assignment might have a different set of concerns,
compared with a married candidate with a family (Caligiuri, Hyland, Joshi, & Bross, 1998). Self-
assessment has been found to be useful because global assignment candidates actively self-assess
their fit with the personality and lifestyle requirements of the assignment (Caligiuri & Phillips,
2003). Effective self-selection tools enable international assignee candidates to critically evaluate
themselves on three critical dimensions: (a) personality and individual characteristics, (b) career
issues, and (c) family issues (including issues of spouses and children). Self-selection instruments,
acting as a realistic preview of the assignment, help employees make a thoroughly informed and
realistic decision about a global assignment (Caligiuri & Phillips, 2003). Many firms have found that
this self-assessment step fosters the creation of a candidate pool of potential international assignees.
This candidate pool can be organized to include the following pieces of information: the availability
of the employee (when and to what countries), languages the employee speaks, countries preferred,
technical knowledge, skills, and abilities, etc.
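
If such a candidate pool were maintained as a simple database, each record might look like the following sketch (a Python dataclass; the field names are illustrative, not a prescribed schema).

    # Illustrative record for a self-selected international assignee
    # candidate pool; the fields mirror the information listed above.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class AssigneeCandidate:
        employee_id: str
        available_from: str               # e.g., "2026-01"
        available_countries: List[str]    # where the employee is willing to go
        languages: List[str]
        preferred_countries: List[str]
        technical_skills: List[str] = field(default_factory=list)

    pool = [
        AssigneeCandidate(
            employee_id="E1047",
            available_from="2026-06",
            available_countries=["Brazil", "Poland"],
            languages=["English", "Portuguese"],
            preferred_countries=["Brazil"],
            technical_skills=["supply chain", "quality engineering"],
        ),
    ]
    # Example query: candidates available for an assignment in Brazil.
    print([c.employee_id for c in pool if "Brazil" in c.available_countries])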

Candidate Assessment
Once the requirements of a given international assignment have been determined, many possibilities
exist to assess the candidates on job-related dimensions. Because international assignments
are job contexts rather than job descriptions, different international assignments will require
different levels of the relevant attributes (e.g., language fluency, openness).
For example, greater emphasis would be placed on personality characteristics (such as sociability
and openness) when assessing a candidate for a developmental or strategic assignment requiring
much more host national contact, compared to a more technical international assignment (Caligiuri,
2000a, 2000b; Caligiuri & Tarique, 2006). In the best case, a thorough assessment can be conducted
through a variety of valid formal selection methods: paper-and-pencil tests, assessment centers,
interviews, behavioral observations, and the like. However, the reality is that most international
assignee selection generally happens using the most informal methods—recommendations of peers
or supervisors (Brewster & Harris, 1999).
Looking forward to best practice, two aspects of the international assignee selection process have
shown promise but warrant further investigation. The first is to better understand ways to engage
employees early—even before an international assignment is available. The best candidates can
build their efficacy for the assignment when their decision-making processes are engaged before a
position becomes available (Caligiuri & Phillips, 2003). The second is to better understand ways to
effectively involve the family as early as possible in the selection process. Research has concluded
that each family member will positively or negatively influence the assignment (Caligiuri et al.,
1998), so their influence should not be disregarded in the assessment phase. It is accepted that the
best selection decision will be mutual among the employee, the organization, and the family.
Although the best case for international assignee selection is understood, the dynamic interplay
among employees, families, and organizations—in terms of international assignment selection
decisions—is not yet thoroughly understood and warrants further research.
Most multinational companies acknowledge that the wrong person in an expatriate assignment
can result in poor job performance, early repatriation, anxiety or other emotional problems, and
personal and professional upheaval for accompanying family members. With the risks so high,
expatriate selection (designed to identify who will have the greater likelihood of success) is critical.
The efficacy of expatriate selection programs is challenged when transnational firms report (as they
often do) that there are not enough people to fill current expatriate assignments. The natural reac-
tion, in this case, is to believe that expatriate selection would not apply. However, ignoring proper
selection is extremely shortsighted given the risks to the firm and the individual if the global assign-
ment is unsuccessful. This reaction is especially limited given that when selection is thorough,
firms cast wider nets for possible candidates and generally find multiple candidates with a higher
probability of success. These comprehensive selection systems generally have four distinct phases
including self-assessment, the creation of a candidate pool, technical and managerial selection,
and placement. The placement in a host country will be most successful when agreement is mutual
among the candidate, the candidate’s family, the sending unit, and the host national unit.

CONCLUSIONS
This chapter covered the many challenges of developing international selection systems, namely
the transnational strategies affecting selection systems, the challenges of construct development
with respect to cross-cultural comparability, and selection issues for international assignees. As the
need for strategically oriented and conceptually equivalent international selection systems contin-
ues to grow, so do the demands on HR professionals and applied psychologists to respond to this
complex need.
There are many dynamic changes happening today that will increase the need for and the ease of
adopting internationally integrated selection systems. For example, increasingly strong worldwide
corporate cultures, where employees globally share values and norms, may diminish the influ-
ence of national cultures. Strong global corporate cultures create a common frame-of-reference for
more subjective constructs and ease the integration of international selection systems. Subjective
constructs such as “integrity,” “teamwork,” and “trust” will have a company-driven understanding,
leveling any nationally driven cultural differences. This move to stronger corporate cultures will
increasingly ease the integration of international selection systems.
Although the technical issues of employee selection are important, the implementation of selec-
tion systems globally requires more than merely validating employee selection tests in different
countries. Employee selection tests are created and adopted by HR professionals located around the
world. These HR professionals, from different cultures and with different levels of knowledge of
the science and practice of employee selection, ultimately affect whether a given selection system
can be integrated globally. As described in this chapter, the concept of testing—and the very idea
of individual differences—varies from country to country. Likewise, the science of testing and
the level of acceptance of U.S.-oriented industrial-organizational psychology standards for practice
also vary from country to country. In some cultures testing is rooted in education (not industrial-
organizational psychology), where teachers create and give tests, assigning grades accordingly. Test
validation, in these cultures, would seem like a burdensome and unnecessary process. Creating
standards of practice for a company's HR professionals globally is an important step toward developing
selection systems that can be validated and accepted globally.
The future success of international employee selection may also be reliant on headquarters-based
HR professionals’ and industrial/organizational psychologists’ abilities to manage relationships
cross-nationally. Developing relationships with in-country HR leaders and line managers is critical
for the successful integration of selection systems. The in-country HR professionals will likely be
the first to identify any country-specific problems and ways to eventually solve those problems. Because
this willingness to help relies on the goodwill of in-country HR professionals (some of whom may
initially need to be convinced that testing is appropriate), the ability of headquarters-based testing
professionals to develop respectful, collegial, and lasting relationships is critical.
Lastly, the future success of international employee selection may be determined by whether
the employee selection systems are integrated as part of a whole strategic HR system (or high-
performance work system). HR professionals would be addressing only part of the picture if
they developed employee selection systems in isolation. Ideally, selection and assessment should
be integrated with training and development, performance management, and reward systems.
Collectively, when these systems globally reinforce the predictors of performance in a comprehensive
manner, the needle moves much more quickly toward a high-performing, globally competitive
organization.

REFERENCES
Abe, H., & Wiseman, R. (1983). A cross-cultural confirmation of the dimensions of intercultural effectiveness.
International Journal of Intercultural Relations, 7, 5–67.
Adler, N. J. (2001). International dimensions of organizational behavior (4th ed.). Cincinnati, OH:
Southwestern.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Bartlett, C. A., & Ghoshal, S. (1989). Managing across borders: The transnational solution. Boston, MA:
Harvard Business School Press.
Begin, J. P. (1992). Comparative human resource management (HRM): A systems perspective. International
Journal of Human Resource Management, 3(3), 379–408.
Bernardin, H. J., & Buckley, M. R. (1981). Strategies in rater training. Academy of Management Review, 6,
205–212.
Bhaskar-Shrinivas, P., Harrison, D. A., Shaffer, M., & Luk, D. M. (2005). Input-based and time-based models
of international adjustment: Meta-analytic evidence and theoretical extensions. Academy of Management
Journal, 48(2), 257–281.
Black, J. (1990). The relationship of personal characteristics with adjustment of Japanese expatriate managers.
Management International Review, 30, 119–134.
Black, J. S., Gregersen, H. B., & Mendenhall, M. E. (1992). Global assignments: Successfully expatriating and
repatriating international managers. San Francisco, CA: Jossey-Bass.
Bochner, S., Hutnik, N., & Furnham, A. (1986). The friendship patterns of overseas students and host students
in an Oxford student resident hall. Journal of Social Psychology, 125, 689–694.
Bochner, S., McLeod, B. M., & Lin, A. (1977). Friendship patterns of overseas students: A functional model.
International Journal of Psychology, 12, 277–294.
Brewster, C., & Harris, H. (1999). International HRM: Contemporary issues in Europe. London, England:
Routledge.
Caligiuri, P. (2000a). The Big Five personality characteristics as predictors of expatriate success. Personnel
Psychology, 53, 67–88.
Caligiuri, P. (2000b). Selecting expatriates for personality characteristics: A moderating effect of personality on
the relationship between host national contact and cross-cultural adjustment. Management International
Review, 40, 61–80.
Caligiuri, P. (2006). Performance measurement in a cross-national context: Evaluating the success of global
assignments. In W. Bennett, D. Woehr, & C. Lance (Eds.), Performance measurement: Current perspec-
tives and future challenges (pp. 227–245). Mahwah, NJ: Lawrence Erlbaum.
Caligiuri, P., & Colakoglu, S. (2008). A strategic contingency approach to expatriate assignment management.
Human Resource Management Journal, 17, 393–410.
Caligiuri, P., & Day, D. (2000). Effects of self-monitoring on technical, contextual, and assignment-specific
performance: A study of cross-national work performance ratings. Group and Organization Management,
25, 154–175.
Selection in Multinational Organizations 797

Caligiuri, P., Hyland, M., Joshi, A., & Bross, A. (1998). A theoretical framework for examining the relationship
between family adjustment and expatriate adjustment to working in the host country. Journal of Applied
Psychology, 83, 598–614.
Caligiuri, P., Jacobs, R., & Farr, J. (2000). The attitudinal and behavioral openness scale: Scale development
and construct validation. International Journal of Intercultural Relations, 24, 27–46.
Caligiuri, P., & Lazarova, M. (2001). Strategic repatriation policies to enhance global leadership development. In
M. Mendenhall, T. Kuehlmann, & G. Stahl (Eds.), Developing global business leaders: Policies, processes,
and innovations, (pp. 243–256). Westport, CT: Quorum Books.
Caligiuri, P., & Phillips, J. (2003). An application of self-assessment realistic job previews to expatriate assign-
ments. International Journal of Human Resource Management, 14, 1102–1116.
Caligiuri, P., & Stroh, L. K. (1995). Multinational corporation management strategies and international human
resource practices: Bringing IHRM to the bottom line. International Journal of Human Resource
Management, 6, 494–507.
Caligiuri, P., & Tarique, I. (2006). International assignee selection and cross-cultural training and development.
In I. Björkman & G. Stahl (Eds.), Handbook of research in international human resource management
(pp. 302–322). London, England: Edward Elgar.
Church, A. (1982). Sojourner adjustment. Psychological Bulletin, 9, 540–572.
Cui, G., & Van den Berg, S. A. (1991). Testing the construct validity of intercultural effectiveness. International
Journal of Intercultural Relations, 15, 227–241.
Dalton, M., & Wilson, M. (2000). The relationship of the five-factor model of personality to job performance
for a group of Middle Eastern international assignee managers. Journal of Cross-Cultural Psychology,
18, 250–258.
Davis, D. D. (1998). International performance measurement and management. In J. W. Smither (Ed.),
Performance appraisal: State of the art in practice (pp. 95–131). San Francisco, CA: Jossey-Bass.
Day, D. V., & Silverman, S. (1989). Personality and job performance: Evidence of incremental validity.
Personnel Psychology, 42, 25–36.
Day, D. V., & Sulsky, L. M. (1995). Effects of frame-of-reference training and information configuration on
memory organization and rating accuracy. Journal of Applied Psychology, 80, 158–167.
Digman, J. (1990). Personality structure: The emergence of the Five-Factor model. Annual Review of
Psychology, 41, 417–440.
England, G., & Harpaz, I. (1983). Some methodological and analytic considerations in cross-national compara-
tive research. Journal of International Business Studies, Fall, 49–59.
Erez, M. (1994). Toward a model of cross-cultural industrial and organizational psychology. In H. C. Triandis,
M. D. Dunnette, & L. M. Hough (Eds.), Handbook of industrial and organizational psychology (Vol. 4,
pp. 559–608). Palo Alto, CA: Consulting Psychologists Press.
Ghoshal, S., & Nohria, N. (1993). Horses for courses: Organizational forms for multinational corporations.
Sloan Management Review, 2, 23–35.
Goldberg, L. (1992). The development of markers for the Big-Five factor structure. Psychological Assessment,
4, 26–42.
Goldberg, L. (1993). The structure of phenotypic personality traits. American Psychologist, 48, 26–34.
Gudykunst, W. (1988). Uncertainty and anxiety. In Y. Kim & W. B. Gudykunst (Eds.), Theories in intercultural
communication (pp. 123–156). Newbury Park, CA: Sage.
Gudykunst, W., & Hammer, M. (1984). Dimensions of intercultural effectiveness: Culture specific or cultural
general? International Journal of Intercultural Relations, 8, 1–10.
Hammer, M. R., Gudykunst, W. B., & Wiseman, R. L. (1978). Dimensions of intercultural effectiveness: An
exploratory study. International Journal of Intercultural Relations, 2, 382–393.
Harpaz, I. (1996). International management survey research. In B. J. Punnett & O. Shenkar (Eds.), Handbook
for international management research (pp. 37–62). Cambridge, MA: Blackwell.
Hechanova, R., Beehr, T. A., & Christiansen, N. D. (2003). Antecedents and consequences of employees’
adjustment to overseas assignment: A meta-analytic review. Applied Psychology: An International
Review, 52(2), 213–236.
Hofstede, G. (1980). Cultures consequences: International differences in work-related values. Thousand Oaks,
CA: Sage.
Judge, T. A., & Ferris, G. R. (1993). Social context of performance evaluation decisions. Academy of
Management Journal, 36, 80–105.
Lievens, F. (2007). Research on selection in an international context: Current status and future directions. In
M. M. Harris (Ed.), Handbook of research in international human resource management (pp. 107–123).
Mahwah, NJ: Lawrence Erlbaum.
798 Handbook of Employee Selection

McCrae, R., & Costa, P. (1987). Validation of the five-factor model of personality across instruments and
observers. Journal of Personality and Social Psychology, 52, 81–90.
McCrae, R., & Costa, P. (1989). More reasons to adopt the five-factor model. American Psychologist, 44,
451–452.
McCrae, R., & John, O. (1992). An introduction to the Five-Factor model and its applications. Journal of
Personality, 60, 175–216.
Mendenhall, M., & Oddou, G. (1985). The dimensions of expatriate acculturation. Academy of Management
Review, 10, 39–47.
Mendenhall, M., & Oddou, G. (1988). The overseas assignment: A practical look. Business Horizons, 31,
78–84.
Mol, S. T., Born, M. P., Willemsen, M. E., & Van Der Molen, H. T. (2005). Predicting expatriate job performance
for selection purposes: A quantitative review. Journal of Cross-Cultural Psychology, 36, 590–620.
Ones, D. S., & Viswesvaran, C. (1997). Personality determinants in the prediction of aspects of expatriate job
success. In Z. Aycan (Ed.), New approaches to employee management, (Vol. 4, pp. 63–92). Greenwich,
CT: JAI Press.
Ones, D. S., & Viswesvaran, C. (1999). Relative importance of personality dimensions for international assignee
selection: A policy capturing study. Human Performance, 12, 275–294.
Phillips, J. M., & Gully, S. M. (2002). Fairness reactions to personnel selection techniques in Singapore and the
United States. International Journal of Human Resource Management, 13, 1186–1205.
Ployhart, R. E., Wiechmann, D., Schmitt, N., Sacco, J. M., & Rogg, K. L. (2003). The cross-cultural equiva-
lence of job performance ratings. Human Performance, 16, 49–79.
Prahalad, C. K., & Doz, Y. L. (1987). The multinational mission: Balancing local demands and global vision.
New York, NY: Free Press.
Riordan, C. M., & Vandenberg, R. J. (1994). A central question in cross-cultural research: Do employees of
different cultures interpret work-related measures in an equivalent manner? Journal of Management, 20,
643–671.
Ryan, A. M., Chan, D., Ployhart, R. E., & Slade, L. A. (1999). Employee attitude surveys in a multinational
organization: Considering language and culture in assessing measurement equivalence. Personnel
Psychology, 52, 37–58.
Ryan, A. M., Horvath, M., Ployhart, R. E., Schmitt, N., & Slade, L. A. (2000). Hypothesizing differential item
functioning in global employee opinion surveys. Personnel Psychology, 53, 531–562.
Ryan, A. M., McFarland, L., Baron, H., & Page, R. (1999). An international look at selection practices: Nation
and Culture as Explanations for variability in practice. Personnel Psychology, 52, 359–391.
Salgado, J. F., & Anderson, N. R. (2002). Cognitive and GMA testing in the European Community: Issues and
evidence. Human Performance, 15, 75–96.
Schrader, B. W., & Steiner, D. D. (1996). Common comparison standards: An approach to improving agreement
between self and supervisory performance ratings. Journal of Applied Psychology, 81, 813–820.
Searle, W., & Ward, C. (1990). The prediction of psychological and sociocultural adjustment during cross-cultural
transitions. International Journal of Intercultural Relations, 14, 449–464.
Shackleton, V., & Newell, S. (1997). International assessment and selection. In N. Anderson & P. Herriot
(Eds.), International handbook of selection and assessment (pp. 82–95). New York, NY: Wiley.
Shaffer, M. A., Harrison, D. A., & Gilley, K. M. (1999). Dimensions, determinants and differences in the expatriate
adjustment process. Journal of International Business Studies, 30, 557–581.
Shaffer, M. A., Harrison, D. A., Gregersen, H., Black, J. S., & Ferzandi, L. A. (2006). You can take it with you:
Individual differences and expatriate effectiveness. Journal of Applied Psychology, 91, 109–115.
Stahl, G., & Caligiuri, P. M. (2005). The relationship between expatriate coping strategies and expatriate adjust-
ment, Journal of Applied Psychology, 90, 603–616.
Steiner, D. D., & Gilliland, S. W. (2001). Procedural justice in personnel selection: International and cross-cultural
perspectives. International Journal of Selection and Assessment, 9, 124–137.
Tsui, A. S., & O’Reilly, C. A., III. (1989). Beyond simple demographic effects: The importance of relational
demography in superior-subordinate dyads. Academy of Management Journal, 32, 402–423.
Tung, R. (1981). Selection and training of personnel for overseas assignments. Columbia Journal of World
Business, 16, 21–25.
Tung, R. L. (1988). Career issues in international assignments. Academy of Management Executive, 2,
241–244.
Van Vianen, A. E. M., De Pater, I. E., & Caligiuri, P. M. (2005). Expatriate selection: A process. In A. Evers,
O. Smit-Voskuyl, & N. Anderson (Eds.), The handbook of personnel selection (pp. 458–475). Oxford,
England: Blackwell.
Selection in Multinational Organizations 799

Vandenberg, R. J. (2002). Toward a further understanding of an improvement in measurement invariance


methods and procedures. Organizational Research Methods, 5, 139–158.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature:
Suggestions, practices, and recommendations for organizational research. Organizational Research
Methods, 3, 4–69.
Wayne, S. J., & Liden, R. C. (1995). Effects of impression management on performance ratings: A longitudinal
study. Academy of Management Journal, 38, 232–260.
Weissman, D., & Furnham, A. (1987). The expectations and experiences of a sojourning temporary resident
abroad: A preliminary study. Human Relations, 40, 313–326.
This page intentionally left blank
37 Selection for Team Membership

A Contingency and Multilevel Perspective
Susan Mohammed, Jan Cannon-Bowers,
and Su Chuen Foo

Working in a team environment calls for a different set of skills and motivations than those required
in traditional workplaces. A premium is placed on the ability to learn, teamwork and collaboration, a
continuous improvement ethic, high initiative, a focus on the customer, and problem-solving skills….
Making sure you have the best people on board will dramatically increase your chances for a successful
team implementation.
Wellins, Byham, & Dixon
Inside Teams (1994, p. 322)

Why do some teams succeed and others fail? This fundamentally important question has been
investigated for decades (e.g., Heslin, 1964; Hackman, 1987; Kozlowski & Ilgen, 2006), and the
evolving answer is multifaceted and complex, involving a wide range of constructs and their inter-
actions. Nevertheless, team scholars generally agree that selecting the right team members is a key
variable in the team effectiveness equation. Indeed, most team performance models highlight the
critical role of member characteristics (e.g., Gladstein, 1984; Hackman, 1987; Tannenbaum, Salas, &
Cannon-Bowers, 1996), and the popular Input-Process-Outcome framework of team studies begins
with the expertise, abilities, and personality traits that members bring to the group (McGrath, 1984).
Selection interventions are expected to enhance team effectiveness because they identify employees
with superior taskwork and teamwork skills and detect those with a better fit to work in team envi-
ronments (Tannenbaum et al., 1996). However, despite the importance of team selection, significant
knowledge gaps remain regarding how to distinguish “team players” from “team inhibitors” and
how to create teams whose members have the right mix of competencies. Ironically, despite a wealth
of accumulated knowledge about how to select individuals to fit jobs and a burgeoning team litera-
ture, relatively little of this research has systematically focused on team selection issues (e.g., Jones,
Stevens, & Fischer, 2000; McClough & Rogelberg, 2003). In addition, teamwork measurement for
selection has been described as “primitive” (O’Neil, Wang, Lee, Mulkey, & Baker, 2003, p. 283).
The purpose of this chapter is to review and integrate what is currently known about team selec-
tion with the goals of identifying gaps in current knowledge and underscoring promising avenues for
future research. In doing so, we adopt a contingency as well as a multilevel perspective. With respect
to contingency, one of the overarching themes of the present work is that selection approaches will
differ for diverse types of teams and tasks. Because the nature of the team and why it exists plays
such a prominent role in determining what member characteristics are needed, selection systems
must clearly define the purposes and conditions for which group members are chosen. Therefore, we
feature recent developments in team task analysis, which extend traditional job analysis procedures
by addressing the coordinative behaviors needed for effective team performance.

In addition to a contingency approach, we also acknowledge the multilevel nature of team
functioning. Although the individual has been the unit of analysis in virtually all personnel selec-
tion research, including the work on team selection, choosing team members based on individual
competencies alone is not sufficient to ensure team success (Klimoski & Jones, 1995). According to
Salas, Cannon-Bowers, and Johnston (1997), a collection of experts does not automatically consti-
tute an expert team. Rather, it is important to consider the configuration of members with regard to
knowledge, skills, abilities, and other factors (KSAOs) such as personality traits and experience lev-
els. Regarding team selection, McClough and Rogelberg (2003) concluded, “future research needs
to integrate the team composition literature and develop insights into finding effective combinations
of people for a variety of team situations” (p. 65). In other words, mechanisms must be developed to
determine how a potential employee will “fit” into a particular team.

CONCEPTUAL FRAMEWORK FOR UNDERSTANDING SELECTION FOR TEAM MEMBERSHIP
Figure 37.1 presents a conceptual framework that captures both the contingency and multilevel
approaches to team selection. The central focus is the individual- as well as team-level competen-
cies that are required for team effectiveness. However, task demands, team type, and team staffing
considerations play a critical role in determining the specific KSAOs that will be relevant in particu-
lar contexts. Each component of Figure 37.1 is discussed below.

CORE TEAMWORK COMPETENCIES


The first step in selection for team membership is to garner a thorough understanding of the
KSAOs needed for effective team performance. Team selection subsumes the requirements of tra-
ditional selection such as ensuring that individuals possess technical competence and maximizing
the fit between the person and job. However, in addition to the taskwork skills required to perform

FIGURE 37.1 Conceptual framework for understanding selection for team membership. [Figure: the
contingency perspective (team type, e.g., executive, command and control, negotiation, design; and task
demands, e.g., interdependence, coordination demands, behavioral discretion, role specialization, formal
and informal structure, level of autonomy) and the multilevel perspective (team-level staffing
considerations such as compatibility/fit among members, balance/coverage of competencies,
homogeneity/heterogeneity of competencies, internal/external leadership, current and optimal size, and
climate/norms) surround core/generic and contingent teamwork competencies (knowledge, skills,
abilities, attitudes) at the individual and team levels.]

individually, team members must also possess teamwork skills that enable interdependent work
(e.g., interpersonal, conflict management, communication, collaborative problem-solving; Stevens
& Campion, 1994). Because taskwork skills are not unique to the team context, we focus on two
types of teamwork competencies: core (general teamwork behaviors common to all team tasks)
and contingent (dependent on the task and the team’s configuration). Similar to other researchers
(e.g., Cannon-Bowers, Tannenbaum, Salas, & Volpe, 1995; Kichuk & Wiesner, 1998), we posit
that core teamwork competencies are generic or transportable—that is, they are important regard-
less of the particular task or team at hand. Examples of such competencies include interpersonal
skills, knowledge of teamwork, communication skills, preference for teamwork, and agreeable-
ness. Further, we propose that these attributes are held by individuals and can therefore be mea-
sured at the individual level.

CONTINGENT TEAMWORK COMPETENCIES


In contrast to core competencies, contingent teamwork competencies are particular to the team and
task for which an individual is being selected and must therefore consider team-level attributes.
Because of the dynamic nature of teams, particular needs may change as a function of the team’s
changing structure, configuration, size, and/or life cycle. A culmination of the other categories of
variables presented in Figure 37.1, contingent teamwork competencies are influenced by team type,
task demands, and team staffing variables, which are described below.

Team Type
The key determinants of team effectiveness differ depending on team type. Integrating the various
taxonomies of team types that have been proposed in the literature, Devine (2002) derived two
broad classifications of teams: knowledge and physical. Knowledge work teams (e.g., executive,
command, negotiation, design, advisory) involve mental processing as the core task and informa-
tion as the main outcome. In contrast, physical work teams (e.g., service, production, medical,
military, sports) produce a tangible product as the main outcome.
Because knowledge and physical work teams vary with regard to their fundamental work cycle,
temporal duration, technological constraints, and health risks (Devine, 2002), selection require-
ments will clearly be different for diverse team types. In addition, consideration should be given to
whether selection is for new teams (all new members), transition teams (familiar members being
reassigned to new team tasks), or intact teams (new members being placed in existing teams; Jones
et al., 2000).

Task Demands
Influenced in large part by team type, the nature of the task that is being performed is a central
driver of team performance. Past researchers have couched this discussion from multiple perspec-
tives, including descriptions of different types of interdependence (Saavedra, Earley, & van Dyne,
1993) and the behavioral requirements of members during performance (McGrath, 1984). What
these conceptualizations have in common is the notion that the task affects the way the team func-
tions as well as the types of members needed. Other drivers of team functioning include coordina-
tion demands (the manner in which team members must organize and allocate their resources in
order to be successful; Bowers, Morgan, Salas, & Prince, 1993), behavioral discretion (the degree
of control team members have in performing the task as dictated by the level of proceduralization;
Cannon-Bowers, Salas, & Blickensderfer, 1998), role specialization (how roles are defined in the
team; Kilduff, Angelmar, & Mehra, 2000), structure (the nature of the formal organization and
communication channels; Price, Harrison, & Gavin, 2006), level of autonomy (the degree to which
the team manages itself; Langfred, 2007), and environmental stability (the stability of demands
placed on the team from external forces; Keck & Tushman, 1993). Given the importance of task
demands, team researchers have developed team task analysis methods.

Team Task Analysis


Analogous to job analysis for individuals (see Chapter 4, this volume), team task analysis (TTA) is
a mechanism for capturing team task features through the systematic study and description of team
member jobs (Lorenzet, Eddy, & Klein, 2003), and in particular, to elucidate the features of the team
task discussed above. Regarded as the first step for all team human resource management functions
(e.g., selection, training, and performance measurement), TTA involves a comprehensive under-
standing of the nature of the team and the key skills necessary to function effectively as a collective
unit (Baker, Salas, & Cannon-Bowers, 1998). Specifically, team competencies, job characteristics,
and cognitive demands are three categories of information gathered during TTA (Lorenzet et al.,
2003). Whereas job analysis techniques focus on the individual and taskwork, TTA emphasizes the
team and teamwork, including coordination and interdependence requirements (Morgan & Lassiter,
1992). Nevertheless, because of the lack of validated TTA techniques, traditional job analysis meth-
ods are often used for teams, violating multilevel principles (Lorenzet et al., 2003), and overlooking
many potentially important factors.
TTA data collection methods include questionnaires, interviews, observation, and critical inci-
dents, with multiple techniques recommended for thorough assessment (Baker et al., 1998; Lorenzet
et al., 2003). Recently, Arthur, Edwards, Bell, Villado, and Bennett (2005) developed and validated
three generic task analysis scales measuring team relatedness (extent to which tasks cannot be per-
formed by a single individual), team workflow (paths through which information flows throughout
the team), and team-task ratio (ratio of the number of tasks that cannot be performed alone to the
total number of tasks). In addition, groupware task analysis has been proposed as a method for
studying group activities, which involves modeling structure, workflow, artifacts, and the work
environment (van Welie & van der Veer, 2003). Furthermore, cognitive TTA investigates the cogni-
tive components underlying teamwork processes, including knowledge of goals, task procedures,
sequences, timing, roles, and teammate characteristics (Blickensderfer, Cannon-Bowers, Salas, &
Baker, 2000). Despite these promising developments, additional research is needed to validate
existing TTA methodologies and to develop new tools.

Team Staffing Considerations


Perhaps the fundamental difference between selection for individual jobs and team jobs is that in
team situations, the level of focus must include the team as a whole, and not simply individual mem-
bers. Indeed, the level of complexity is substantially increased by the need to consider an additional
set of team-relevant KSAOs, and to navigate multiple levels of analysis.
Although taskwork and teamwork skills are generally measured individually, “the major goal of
team staffing is to combine individual level KSAOs in some form to best enhance team-level medi-
ating processes and ultimately performance” (Ployhart, 2004, p. 139). Therefore, when considering
team selection systems, it is crucial to consider the mix of attributes across members, as well as issues
like size, team lifecycle, current staffing levels, member compatibility, and the team’s climate.
Another potential difference between traditional and team selection involves the locus of respon-
sibility for staffing. Although normally ascribed to management, some autonomous work groups are
tasked with member recruitment, testing, and hiring (Hackman, 2002; Wellins, Byham, & Wilson,
1991). With the rising popularity of self-managing teams, member initiated team selection may
become increasingly common in the years to come (Barsness, Tenbrunsel, Michael, & Lawson,
2002).

INDIVIDUAL-LEVEL CONSIDERATIONS
INDIVIDUAL ATTRIBUTES THAT CONTRIBUTE TO EFFECTIVE TEAMWORK
In this section, we focus on describing individual-level attributes that contribute to effective team
performance as a basis to develop selection systems for teams. From a purely practical perspective,
organizations typically hire employees individually even if they are going to work as part of a team.
For this reason, it behooves team selection researchers to attempt to identify teamwork competen-
cies that can predict as much variance in team performance as possible. Because “interdependence
creates special job requirements” (Klimoski & Jones, 1995, p. 309), team tasks place additional
demands on employees that are not found in typical individual tasks. Further, these “teamwork”
attributes are separate and distinct from “taskwork” attributes, which are defined as the more tech-
nical competencies necessary to perform the task (Morgan, Salas, & Glickman, 1993).
Table 37.1 provides a summary of the knowledge, skills, attitudes, and personality traits impor-
tant for team selection, although we do not claim to be exhaustive. In a few cases, the variables
displayed here have been studied, and even validated, in a selection context. However, in other cases
we have made the link to selection by extrapolating from the broader team performance literature.
In other words, we have reasoned that variables which have been demonstrated to be important to
team functioning should be useful in selection, particularly if the attribute is difficult to train.

MEASUREMENT AND VALIDATION


Much of the measurement-related work on team selection has been approached from a content
validity perspective. That is, after the content domain of interest is defined, items are developed
to represent the domain and then compiled in a testing instrument (Binning & Barrett, 1989). In
addition, some research has addressed criterion-oriented validity in which the relationship between
the predictor and criterion (e.g., team performance) is assessed. Both content and criterion-oriented
validation efforts are described below.

Paper-and-Pencil Measures
Because of ease of administration and relatively low cost, paper-and-pencil measures are a popular
means of assessing KSAOs and personality traits for team member selection. Reflecting knowledge
of how to act in team situations, one promising measurement tool is the Teamwork KSA test, which
consists of 35 situational judgment items answered in a multiple choice format (Stevens & Campion,
1999). On the basis of the conceptual model of teamwork requirements developed by Stevens and
Campion (1994), the test captures both interpersonal (conflict resolution, collaborative problem-
solving, communication) and self-management (goal-setting, performance management, planning,
and coordination) KSAs. In addition to being more face-valid for team selection than cognitive ability
tests, the test yields answers that are not as easily faked as those on personality measures, and its items
are written generally enough to be applicable across a variety of industry contexts. Validation studies
have shown that the Teamwork KSA test
correlates with supervisory ratings of teamwork and taskwork performance (Leach, Wall, Rogelberg,
& Jackson, 2005; Stevens & Campion, 1999), peer nominations of teamwork (Stevens & Campion,
1999), team task proficiency (Hirschfeld, Jordan, Field, Giles, & Armenakis, 2006), observed ratings
of effective teamwork (Hirschfeld et al., 2006), and contextual performance (Morgeson, Reider, &
Campion, 2005) in organizational and military samples. Moreover, higher scores on the Teamwork
KSA test yielded higher observable teamwork behavior scores and peer ratings of individual effec-
tiveness in a student sample (McClough & Rogelberg, 2003). Although one sample revealed incre-
mental criterion-related validity beyond employment aptitude tests (Stevens & Campion, 1999), a
cautionary note is that strong correlations have raised the issue of redundancy with cognitive ability.
Another situational judgment test (SJT) that has recently been developed and validated for team
member selection is the Team Role Test (Mumford, van Iddekinge, Morgeson, & Campion, 2008),
which assesses declarative and procedural knowledge of team role types and the situational con-
tingencies needed for role adaptability. The Team Role Test consists of nine team scenarios, each
requiring one appropriate role, ten items per scenario. In academic and work team samples, the
Team Role Test was positively related with peer ratings of team role performance (Mumford et al.,
2008). Furthermore, the SJT demonstrated incremental validity beyond cognitive ability and Big
Five traits in predicting role performance (Mumford et al., 2008).

TABLE 37.1
Examples of the KSAOs and Personality Traits Important for Team Selection

Knowledge

Knowledge of teamwork skills
  Definition: Understanding of the necessary underpinnings and behavioral requirements of effective
    team performance.
  Related/subsidiary constructs: Understanding teamwork, familiarity with teamwork, knowledge of
    teamwork KSAs.
  Validation/measurement issues: Assessed via the Teamwork KSA test. Validation data show that this
    variable predicts effective teamwork (Hirschfeld et al., 2006; McClough & Rogelberg, 2003;
    Stevens & Campion, 1999).

Knowledge of team roles
  Definition: Knowledge of team roles and their situational contingencies.
  Validation/measurement issues: Assessed via the Team Role Test. Validation data show that this test
    predicts role performance (Mumford et al., 2008).

Skills

Adaptability
  Definition: Ability of team members to adjust their strategies in response to task demands by
    reallocating team resources.
  Related/subsidiary constructs: Compensatory behavior, backing-up behavior, dynamic reallocation of
    function, mutual adjustment, workload balancing.
  Validation/measurement issues: Best assessed in a work sample or other simulation. Most likely a
    function of past experience (so not easily trained). Some data suggest that adaptability improves
    teamwork (Salas, Nichols, & Driskell, 2007).

Interpersonal
  Definition: Ability of team members to optimize the quality of team member interactions through
    resolution of dissent, motivational reinforcement, and cooperative behaviors.
  Related/subsidiary constructs: Morale building, conflict resolution, negotiation, cooperation,
    consulting with others, interpersonal trust, social perception, persuasion, helping others.
  Validation/measurement issues: May be assessed through a combination of paper-and-pencil and
    behavioral measures. Some validation data suggest that interpersonal skills predict teamwork
    (e.g., see Morgeson et al., 2005).

Team management/leadership
  Definition: Ability of team members to direct and coordinate activities; assign tasks; organize
    workflow among members; and plan, organize, and establish a positive climate.
  Related/subsidiary constructs: Task motivation, goal-setting, planning and task coordination,
    establishing roles and expectations, instructing others, planning, organizing.
  Validation/measurement issues: Best assessed in a work sample or other simulation, although
    paper-and-pencil instruments may add value. Not easily trained because it probably develops with
    experience. Some validation indicates that individual leadership skills are associated with teamwork
    effectiveness (e.g., Burke et al., 2006).

Assertiveness
  Definition: Capacity of team members to communicate effectively by sharing ideas clearly and
    directly in interpersonal situations.
  Related/subsidiary constructs: Task-related assertiveness, component of extraversion.
  Validation/measurement issues: Can be assessed via paper-and-pencil tests, but behavioral measures
    are better. Some validation data exist (Pearsall & Ellis, 2006; Smith-Jentsch, Salas, & Baker, 1996).

Mutual performance monitoring
  Definition: Ability of team members to accurately monitor and assess the work of others; ability to
    give, seek, and receive task-clarifying feedback in a constructive manner and to offer advice.
  Related/subsidiary constructs: Accepting suggestions/criticism; giving suggestions/criticism;
    intrateam feedback; monitoring and giving feedback; cross-checking; error correction; team
    maintenance.
  Validation/measurement issues: Best assessed through a combination of paper-and-pencil and
    behavioral measures. Has been linked to team performance (e.g., Marks & Panzer, 2004).

Communication
  Definition: Ability to clearly and accurately articulate and exchange information among team
    members using accepted terminology; acknowledge receipt of information; clarify messages when
    needed.
  Related/subsidiary constructs: Active listening; information exchange; closed-loop communication;
    information sharing; open exchange; consulting with others.
  Validation/measurement issues: Best assessed through a combination of paper-and-pencil and
    behavioral measures. Closed-loop communication has been shown to predict teamwork (e.g., see
    Bowers, Pharmer, & Salas, 2000).

Cross-boundary
  Definition: External, task-related actions directed to other teams or the larger organizational context.
  Related/subsidiary constructs: Organizational awareness, organizational resourcefulness, building
    relationships with other teams.
  Validation/measurement issues: Paper-and-pencil measure developed by Druskat and Kayes (1999).

Attitudes

Preference for teamwork
  Definition: Inclination and desire to be part of a team; willingness to engage with other people in
    pursuit of task success; appreciation for the importance of teamwork in accomplishing challenging
    tasks.
  Related/subsidiary constructs: Team/collective orientation; importance of teamwork; appreciation for
    teamwork; desire to work in a team; collectiveness; preference for teamwork.
  Validation/measurement issues: Has been assessed with paper-and-pencil measures. Some evidence
    suggests that a collective orientation leads to better teamwork (Driskell & Salas, 1992) and that
    those who enjoy working in a team engage in less social loafing (Stark, Shaw, & Duffy, 2007) and
    have better team performance (Helmreich & Foushee, 1993).

Self-efficacy for teamwork
  Definition: Degree to which individuals believe that they have the requisite knowledge, skills, and
    other attributes to be a successful team member.
  Related/subsidiary constructs: Teamwork self-efficacy.
  Validation/measurement issues: Has been measured with paper-and-pencil measures (e.g., McClough
    & Rogelberg, 2003). Some data support the link to effective teamwork (e.g., Tasa, Taggar, & Seijts,
    2007).

Other Characteristics (Personality)

Conscientiousness
  Definition: Extent to which a person is self-disciplined and organized.
  Related/subsidiary constructs: Need for achievement, ambition, responsible, dependable.
  Validation/measurement issues: Has been assessed with paper-and-pencil measures. Positively related
    to contextual performance in team settings (Morgeson et al., 2005).

Extraversion
  Definition: Extent to which an individual is social, outgoing, and talkative.
  Related/subsidiary constructs: Enthusiasm, optimism, assertiveness, dominance, gregariousness.
  Validation/measurement issues: Has been assessed with paper-and-pencil measures. Positively related
    to contextual performance in team settings (Morgeson et al., 2005).

Agreeableness
  Definition: Extent to which an individual is gentle and cooperative.
  Related/subsidiary constructs: Likeability, interpersonal facilitation, trustworthy, tolerance,
    courteousness.
  Validation/measurement issues: Has been assessed with paper-and-pencil measures. Positively related
    to contextual performance in team settings (Morgeson et al., 2005).

Emotional stability
  Definition: Extent to which an individual is calm and poised.
  Related/subsidiary constructs: Neuroticism (negative relationship), adjustment, lack of nervous
    tendencies, not anxious, security.
  Validation/measurement issues: Has been assessed with paper-and-pencil measures. Positively (but
    only marginally) related to contextual performance in team settings (Morgeson et al., 2005).

Work Sample and Interview Measures


Clearly, one of the complications incurred by working at the team level is that many of the vari-
ables of interest are difficult to capture in a typical paper-and-pencil format. In fact, most team
researchers agree that team performance is best measured behaviorally and that systematic obser-
vation is unavoidable (Baker & Salas, 1992; Salas, Burke, Fowlkes, & Priest, 2004). This poses a
particular challenge to team selection researchers and practitioners because instituting behaviorally
based measures that place the applicant in a realistic team situation is more difficult and expen-
sive to employ. Nevertheless, team-oriented assessment centers utilizing team consensus exercises
have been successfully implemented (e.g., Kirksey & Zawacki, 1994; Wellins, Byham, & Dixon,
1994). Moreover, interviews have been shown to effectively measure interpersonal skills (Huffcutt,
Conway, Roth, & Stone, 2001). Indeed, a recent study investigating the selection of individuals in
organizational teams found that social skills, as measured by a structured interview, predicted con-
textual performance beyond Big Five traits and the Teamwork KSA test (Morgeson et al., 2005).
Emerging technologies such as intelligent video-based systems may also prove useful in providing a
realistic context in which to assess team skills (e.g., Cannon-Bowers, Bowers, & Sanchez, 2007).

TEAM-LEVEL CONSIDERATIONS
Thus far, we have discussed the individual-level KSAOs needed for team functioning, which
assumes that teams whose members score higher on taskwork and teamwork competencies will
perform better. However, “when individuals form groups the effects of a valid selection procedure
can be nullified by any lack of cooperation within groups and by bottlenecks, shirking, and social
loafing” (Schneider, Smith, & Sipe, 2000, p. 99). Therefore, it is critical that the overall team con-
text be considered in selection for team membership. In the sections below, we discuss team size,
person-group fit, and team composition.

TEAM SIZE
Because too few members can result in unreasonable work demands and too many members can
produce unnecessary redundancy, an important consideration in team selection involves determin-
ing an appropriate team size. Although larger teams are generally advantaged in terms of resources
(e.g., information, skills, division of labor), they are disadvantaged by heightened coordination dif-
ficulties (e.g., Steiner, 1972). Therefore, one prescription is to staff teams with the smallest number
required to do the work, but determining the optimal figure is contingent on team and task type
(Steiner, 1972; Sundstrom, DeMeuse, & Futrell, 1990). Different team responsibilities across stud-
ies may account for the mixed pattern of findings regarding team size (e.g., Campion, Medsker, &
Higgs, 1993; Mullen, Symons, Hu, & Salas, 1989). To illustrate, a meta-analysis by Stewart (2006)
found that the overall relationship between team size and performance was very small, but modera-
tion effects revealed stronger positive results for project and management teams as compared to
production teams. Because project and management teams involve unstructured tasks and interac-
tion with external constituencies, more team members may be desirable when the environment is
complex (Stewart, 2006).

PERSON-GROUP FIT
Subsumed under the broad, multilevel construct of person-environment (PE) fit, person-group
(PG) or person-team fit refers to the compatibility between members and their groups (Werbel &
Johnson, 2001). Compared with the other types of fit (e.g., person-job, person-organization, and
person-supervisor), PG fit has received the least research attention, but interest has been growing
in recent years (Kristof-Brown, Zimmerman, & Johnson, 2005b). Two general categories of PG fit
have been identified. Supplementary PG fit occurs when the individual and the workgroup share
similar personality, goals, values and abilities. For example, a person with an engineering back-
ground may join a team of other engineers. In contrast, complementary PG fit occurs when mem-
bers have different competencies, offsetting others’ weaknesses and offering resources that support
each other (Werbel & Johnson, 2001). To illustrate, a member with a marketing background may fill
a gap in a team comprised of engineers. Although examining the match between individuals and
groups, fit research examines individual-level dependent variables.
Existing studies on PG congruence have looked at fit on a variety of content domains such as
values congruence (e.g., Adkins, Ravlin, & Meglino, 1996; DeRue & Morgeson, 2007), goal con-
gruence (e.g., Kristof-Brown & Stevens, 2001), and personality traits (e.g., Kristof-Brown, Barrick,
& Stevens, 2005a). Among these various content dimensions, PG value congruence (supplementary
fit) appears to have the strongest correlations with various outcomes given the relative constancy of
value systems (Kristof-Brown et al., 2005b). With respect to complementary fit on personality traits,
there is some evidence that extraverts are more attracted to teams of introverts, whereas introverts
are more attracted to teams of extraverts (Kristof-Brown et al., 2005a).
Current research points to various advantages of PG fit for the individual and team. For exam-
ple, PG value congruence contributed to increased satisfaction with work and social relation-
ships, improved performance on interpersonal dimensions, and reduced tardiness and absenteeism
(Adkins et al., 1996). Additionally, similarity between the individual and team on perceived self
and team mastery goals as well as self and team performance goals led to increased interpersonal
contributions to the workgroup (Kristof-Brown & Stevens, 2001). Self-team performance goal con-
gruence also improved satisfaction with work and the team (Kristof-Brown & Stevens, 2001). The
PG fit-outcome relationship can be characterized as reciprocal and cyclical, in that improved PG
fit enhances individual and group outcomes, which then results in better perceived PG fit (DeRue
& Morgeson, 2007). It is important for researchers to measure the perceptions of team members in
assessing PG fit, as studies have shown the greater salience of perceived PG fit as opposed to actual
PG fit in determining individual outcomes (Kristof-Brown & Stevens, 2001).
Demonstrating its importance, a recent meta-analysis by Kristof-Brown et al. (2005b) established
that PG fit (broadly defined) taps an independent conceptual domain distinct from other types of fit.
Interestingly, PG fit predicted outcomes such as work satisfaction and overall performance equally
well as more established dimensions of fit (Kristof-Brown, Jansen, & Colbert, 2002; Kristof-Brown
et al., 2005b). Specifically, the meta-analytic results revealed that PG fit was positively correlated
with job satisfaction, organizational commitment, supervisor satisfaction, overall performance, and
contextual performance, but negatively correlated with intention to quit. Coworker satisfaction and
group cohesion exhibited particularly strong relationships with PG fit. Because of the limited num-
ber of studies examining PG fit, the unique relationships of supplementary and complementary fit
could not be examined in this meta-analysis (Kristof-Brown et al., 2005b).
Individual member characteristics have been shown to be important predictors of PG fit. In
particular, individual performance and growth satisfaction of team members were found to posi-
tively predict person-team congruence on values and person-role demands-abilities fit (DeRue &
Morgeson, 2007). In addition, the number of previous companies worked for (job switching) and
the length of overall work experience influenced the effects of supplementary PG goal congruence on satisfaction
with work and the team (Kristof-Brown et al., 2002). Specifically, individuals who worked in many
companies in the past placed greater emphasis on person-organization fit, whereas individuals with
longer working experience prioritized person-job fit more, deflating the significance of PG fit when
evaluating satisfaction with work and team.
Hollenbeck (2000) discussed the various ways in which individual personal traits can be matched
with team type to improve team performance. For example, to achieve internal person-team fit, it
is recommended that researchers and practitioners select individuals high on cognitive ability for
teams characterized by broad and undefined roles, but select individuals who are relatively high
on openness to experience for teams that constantly need to change and adapt to the environment
(Hollenbeck, 2000). Additionally, functional team structures, which are defined by roles that are
narrow and low in scope, require agreeable members, whereas self-managing teams are better suited
for highly conscientious members. Finally, misaligned team structures, which occur when the team
structure is not well matched to the environment, need emotionally stable individuals to handle the
stress of associated problems (Hollenbeck, 2000).
Given the advantages gleaned from PG fit, it is important for managers and practitioners to
consider the match between individuals and the groups to which they are assigned. By focusing
only on individual KSAs when forming a team or when selecting a member into a pre-existing
team (Baker & Salas, 1992; Leach et al., 2005), organizations neglect the overall team context.
Measuring individual-level teamwork skills (Baker & Salas, 1992; O’Neil et al., 2003) is necessary,
but not sufficient, for team selection, as the interaction between individual characteristics and the
team environment must be taken into account. One available tool is the Team Selection Inventory,
which assesses an individual’s preferred team climate as compared to the team’s current climate to
determine person-team fit (Burch & Anderson, 2004).

TEAM COMPOSITION
Because team effectiveness depends, in part, on the compatibility of team members, it is important
to seek not only the best individuals for the team but also the best combination of individuals in
terms of the mix of skills, personality traits, ability, and experience levels. Indeed, a recent meta-
analysis concluded, “who is included in the team matters…individual characteristics emerge to
form a collective construct that links with higher team-level performance” (Stewart, 2006, p. 44).
Composition is a broad term referring to configurations of attributes within small groups (Levine
& Moreland, 1990). Deriving from distinct literatures and utilizing different methodologies, it
is important to differentiate between PG fit and team composition. Whereas the PG fit literature
examines individual-level criteria, team composition studies aggregate member characteristics to
the group level and investigate their impact on group-level outcomes (Kristof-Brown et al., 2005b).
Although team composition has been recognized as a significant predictor of team effectiveness
for years (e.g., Mann, 1959; Hackman, 1987), there is growing acknowledgement that the relation-
ship between member characteristics and team performance is complex and multifaceted. Similar to
the contingent teamwork competencies in Figure 37.1, the emerging conceptual framework reflects
a contingency perspective by suggesting that how and why composition variables influence team
outcomes will depend on a multiplicity of factors. Below, we discuss the aggregation method used,
the individual differences assessed, the particular outcomes studied, and the nature of the team
task as contingency factors in team composition (e.g., Argote & McGrath, 1993; Jackson, May, &
Whitney, 1995).

Aggregation Method
Team composition research is complicated by the various ways that individual scores can be com-
bined to arrive at a group score. The most popular and straightforward approach is to simply aver-
age each member’s responses. More complex than aggregating individual characteristics in a linear
fashion, diversity is concerned with the “distribution of differences among the members of a unit
with respect to a common attribute” (Harrison & Klein, 2007, p. 1200). The standard deviation of
individual-level scores is one common team-level operationalization of diversity (e.g., Harrison & Sin,
2006). Additional methods for operationalizing team composition include selecting the maximum
or minimum individual team member score (e.g., Chan, 1998; Kozlowski & Klein, 2000). Studies
have demonstrated that results differ, depending on the type of aggregation used, and that each
captures a unique aspect of team composition (e.g., Barrick, Stewart, Neubert, & Mount, 1998; Day,
Arthur, Miyashiro, Edwards, & Hanson, 2004).
Steiner’s (1972) task typology provided guidance on which aggregation method to use and
has been the most commonly used approach to specifying the appropriate operationalization in
the team composition literature. According to Steiner (1972), mean aggregation is best suited
for additive tasks, in which group performance is the sum of each member’s contribution (e.g.,
shoveling snow). Minimum scores are deemed appropriate for conjunctive tasks where the weakest
member determines team performance (e.g., mountain climbing), and maximum scores are deemed
appropriate for disjunctive tasks where the most competent member determines team performance
(e.g., problem-solving; Steiner, 1972). However, recent studies have been critical of this rationale
(e.g., Day et al., 2004), and a meta-analysis found that stronger effects were not observed when the
operationalization matched the task type of Steiner’s typology (Bell, 2007).
Because Steiner’s (1972) task taxonomy focused exclusively on the way in which group members’
contributions combine into a team outcome, additional variables must be considered in determining
the appropriate method of aggregation, including the predictor and outcome variables being assessed
as well as team and task type. For example, in a sample of business student teams, Mohammed and
Angell (2003) found that diversity on agreeableness, neuroticism, and extraversion affected oral
presentation scores, but mean cognitive ability positively affected written reports. Reflecting these
findings, a recent meta-analysis concluded that the best aggregation method depended on the com-
position variable of interest and that no single operationalization emerged as superior for all compo-
sition variables (Bell, 2007). To illustrate, the strongest relationships with team performance were
observed when conscientiousness was operationalized as the team mean, but when agreeableness
was operationalized as the team minimum (one disagreeable member was enough to be a disruptive
force) (Bell, 2007).

Individual Differences
Clearly, many types of individual differences can exist in a team, including demographic (e.g.,
gender, age, ethnicity), task-related (e.g., ability, experience, functional background, tenure in the
field), and psychological (e.g., attitudes, personality, values). Therefore, in addition to aggregation
method, the member characteristics being assessed figure prominently in the contingency frame-
work describing how and why composition variables influence team outcomes. On the basis of a
TTA, it is important to assess which differences are likely to account for the greatest amount of
variance in predicting team performance. We briefly review the major findings for the individual
differences most researched in the team composition literature: demographics, ability, expertise,
and personality.
Given the current and projected demographic changes in the American workforce (Fullerton &
Toossi, 2001), there has been sustained research interest in readily detectable, social-category dif-
ferences in ethnicity, gender, and age (e.g., Mannix & Neale, 2005). This area of research focuses
exclusively on variability as an aggregation method. Reflecting the fact that diversity is often viewed
as a “double-edged sword,” competing views of team diversity have been espoused (e.g., Milliken &
Martins, 1996). Pessimistically, member heterogeneity is predicted to adversely impact commu-
nication, cohesion, and ultimately team performance because members are motivated to maintain
their social identities and prefer similarity in their interactions (e.g., Byrne, 1971; Tajfel, 1978).
Optimistically, member heterogeneity promotes creativity and quality decision-making because of
the various perspectives brought to a problem and opportunities for knowledge sharing (e.g., Cox,
Lobel, & McLeod, 1991).
Rather than resolve this debate, the results of five published meta-analyses have consistently
found that there is virtually no relationship between heterogeneity on demographic variables and
team performance (Bell, 2007; Bowers, Pharmer, & Salas, 2000; Horowitz & Horowitz, 2007;
Stewart, 2006; Webber & Donahue, 2001). Nevertheless, diversity scholars have shown surpris-
ingly high agreement in their recommendations concerning how to move this area of study forward.
For example, qualitative reviews have recommended extending beyond demographic differences
to incorporate more task-relevant forms of diversity that relate to the fundamental purposes of the
team (e.g., Mannix & Neale, 2005; van Knippenberg & Schippers, 2007). In addition, researchers
have been strongly advised to explore moderating influences rather than focus solely on main effects
(e.g., Webber & Donahue, 2001; Williams & O’Reilly, 1998). Time is one moderator that has proved
fruitful in explaining some of the inconsistent research findings in this literature. Specifically,
the effects of demographic diversity on team processes have been shown to weaken over time (or
with greater group tenure), whereas the effects of deep-level diversity (e.g., job-related attitudes)
strengthen over time (e.g., Harrison, Price, Gavin, & Florey, 2002). Similarly, a recent meta-analysis
found that the negative effects of race and sex diversity on team performance diminished the longer
a team was together (Bell, Villado, Lukasik, Briggs, & Belau, 2007).
Whereas much of the research has primarily focused on a single demographic characteristic at a
time (e.g., gender), individuals have multiple identities simultaneously (e.g., Hispanic female under 30).
Receiving increased attention in the diversity literature, faultline theory explores the hypothetical
dividing lines that may split a group into subgroups (Lau & Murnighan, 1998; Thatcher, Jehn, &
Zanutto, 2003). Recent evidence shows that the more differences converge with each other (e.g.,
all male members of a work group are over 45, while all female members are under 30), the more
groups experience intragroup conflict and lower satisfaction (Lau & Murnighan, 2005).
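To illustrate what converging differences look like computationally, the sketch below implements a Fau-style faultline strength index in the spirit of Thatcher et al. (2003): the largest share of total attribute variance that any split of the team into two subgroups can account for. The attribute coding, example team, and function name are illustrative assumptions, and published faultline measures involve additional decisions (e.g., how to scale attributes measured in different units) that this simplified sketch omits.

    from itertools import combinations

    def faultline_strength(members):
        """Largest proportion of total attribute variance explained by any split
        of the team into two subgroups (a simplified, Fau-style index).
        `members` is a list of equal-length, numerically coded attribute vectors."""
        n, n_attrs = len(members), len(members[0])
        grand = [sum(m[j] for m in members) / n for j in range(n_attrs)]
        total_ss = sum((m[j] - grand[j]) ** 2
                       for m in members for j in range(n_attrs))
        if total_ss == 0:
            return 0.0  # a perfectly homogeneous team has no faultline

        best = 0.0
        for size in range(1, n // 2 + 1):
            for group_a in combinations(range(n), size):
                group_b = [i for i in range(n) if i not in group_a]
                between_ss = 0.0
                for subgroup in (list(group_a), group_b):
                    k = len(subgroup)
                    for j in range(n_attrs):
                        sub_mean = sum(members[i][j] for i in subgroup) / k
                        between_ss += k * (sub_mean - grand[j]) ** 2
                best = max(best, between_ss / total_ss)
        return best

    # Hypothetical four-person team coded on [age in years, gender coded 0/1]:
    # the two older members are men and the two younger members are women,
    # so the demographic differences align into a single strong faultline.
    team = [[52, 1], [49, 1], [27, 0], [29, 0]]
    print(round(faultline_strength(team), 2))  # ~0.99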
Whereas demographic findings have generally been mixed and inconclusive, cognitive ability
has yielded the most robust results in team composition research, replicating across field mainte-
nance teams (Barrick et al., 1998), student laboratory groups (Day et al., 2004), human resource
teams (Neuman & Wright, 1999), military tank crews (Tziner & Eden, 1985), and hierarchical
decision-making teams (LePine, Hollenbeck, Ilgen, & Hedlund, 1997). Isomorphic to the strong
positive relationship between general mental ability and individual-level performance (Schmidt,
2002), several meta-analyses have concluded that teams with smarter members do better (Bell,
2007; Devine & Philips, 2001; Stewart, 2006). When different operationalizations of cognitive
ability are compared (e.g., mean, maximum, minimum, variance), the mean has emerged as the
strongest predictor of team performance across several task types (e.g., Day et al., 2004; Devine &
Philips, 2001).
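As a hypothetical illustration of how these operationalizations differ, the brief Python sketch below computes team-level scores for a single individual attribute under the mean, maximum, minimum, and variance aggregation rules; the member scores and function name are invented for this example and are not drawn from the studies cited above.

# Hypothetical member scores; the aggregation rules mirror those named in the text.
from statistics import mean, pvariance

def team_composition_scores(member_scores):
    """Return several team-level operationalizations of one individual attribute."""
    return {
        "mean": mean(member_scores),           # additive, "more is better" models
        "maximum": max(member_scores),         # best-member (disjunctive) tasks
        "minimum": min(member_scores),         # weakest-link (conjunctive) tasks
        "variance": pvariance(member_scores),  # heterogeneity or diversity models
    }

print(team_composition_scores([98, 105, 110, 121, 134]))

The same aggregation choices arise for the personality and attitudinal variables discussed below.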
Expanding beyond general mental ability, research has also investigated expertise as a task-
relevant composition variable, which focuses on specific types of knowledge and experience that
members bring to the team, as well as educational specialization and functional background.
Although the results for cognitive ability were notably stronger, a recent meta-analysis found a
small positive relationship between expertise (mean-aggregated member experience and education)
and team performance (Stewart, 2006). However, most of the work in this area focuses on exper-
tise diversity, which can result in either increased team learning or conflict (Milliken & Martins,
1996; Williams & O’Reilly, 1998). Moderating factors such as collective team identification (van
der Vegt & Bunderson, 2005) and member debate and dialogue (Simons, Pelled, & Smith, 1999)
have been shown to promote positive outcomes from expertise diversity. Although Webber and
Donahue (2001) found that highly job-related diversity (functional, educational, and industry back-
ground) was not related to team outcomes, a more recent meta-analysis found a positive relationship
with both the quality and quantity of team performance (Horowitz & Horowitz, 2007). Similarly,
a meta-analysis by Bell et al. (2007) established that functional background and organizational
tenure diversity (but not educational diversity) were positively associated with team performance
(Bell et al., 2007).
In addition to cognitive ability and expertise, there is growing interest in the role of personal-
ity as a form of deep-level psychological diversity in the team composition literature. Much of the
existing work has focused on the Five-Factor Model (conscientiousness, extraversion, agreeable-
ness, neuroticism, and openness to experience) and examines the aggregate mean level of traits
(e.g., Mohammed, Mathieu, & Bartlett, 2002; Neuman & Wright, 1999). Meta-analytic results con-
sistently conclude that teams composed of conscientious and agreeable members perform better
(Bell, 2007; Peeters, van Tuijl, Rutte, & Reymen, 2006; Stewart, 2006). In addition, a meta-analysis
by Bell (2007) found that extraversion and openness to experience also positively predicted team
performance. Not surprisingly, these personality traits generally exhibited stronger relationships
with performance for organizational teams as compared to laboratory groups (Bell, 2007; Peeters
et al., 2006).
Whereas mean aggregation assumes that more of a trait is always better, there are individual
differences for which a balanced pattern across members may be more desirable. Of the research
focusing on personality diversity, extraversion (tendency to be sociable, talkative, and dominant;
Costa & McCrae, 1992) has received the most attention (e.g., Humphrey, Hollenbeck, Meyer, &
Ilgen, 2007; Neuman, Wagner, & Christiansen, 1999). Although too few extraverts within a group
may result in inadequate levels of intrateam communication, too many may pursue social interac-
tions at the expense of task demands and experience power struggles for leadership. Therefore,
diversity on extraversion may lead to more positive outcomes because roles are complementary, with
some members talking/leading and others listening/following. Several studies have found favorable
results for variability on extraversion (e.g., Barry & Stewart, 1997; Mohammed & Angell, 2003;
Neuman et al., 1999), but meta-analytic results have not been supportive (Bell, 2007; Peeters et al.,
2006). In general, meta-analyses investigating member heterogeneity on personality characteristics
have not yielded strong findings (e.g., Bell, 2007; Stewart, 2006).
Several researchers have begun to expand beyond Big Five personality traits to consider how the
composition of variables such as time urgency (Mohammed & Angell, 2004), task-related attitudes
(e.g., Harrison et al., 2002), collectivism (e.g., Tyran & Gibson, 2008), and preference for teamwork
(e.g., Mohammed & Angell, 2003) affect team functioning. Although the number of studies is quite
limited, meta-analytic results suggest that composing teams with members who enjoy working in
teams and have a collectivistic orientation is positively related to team performance (Bell, 2007).

Outcome Variables
In addition to aggregation method and individual differences, the type of outcome variable
being assessed also affects the relationship between team composition and team performance.
Demonstrating that differences in outcomes reflect a major source of variation across teams,
Mohammed et al. (2002) found that different predictors emerged for each of the three types of team
performance measured. Specifically, ability was significantly related to technical-administrative
task performance, and extraversion, neuroticism, and ability were related to leadership task perfor-
mance. Agreeableness predicted contextual performance. Thus, a single optimal composition for
work teams did not emerge.
Team member heterogeneity may prove to be a liability or an asset, depending on the dependent
variable being assessed. According to Mannix and Neale (2005), diverse teams are more appro-
priate for experimenting, problem-solving, and innovating, whereas homogeneous teams are more
desirable for producing, executing, and convergent thinking. Supportive of these predictions, meta-
analytic results revealed that homogeneous teams outperformed heterogeneous teams for physical
work tasks requiring low cognitive demand (Bowers et al., 2000). In addition, although the over-
all relationship between team member heterogeneity and performance was near zero in the meta-
analysis by Stewart (2006), moderator tests found some support for the notion that heterogeneous
teams were better suited for creative tasks than production tasks. Similarly, functional background
and educational diversity yielded stronger effects with performance when innovation was the crite-
rion compared to efficiency as the criterion (Bell et al., 2007). Thus, the potentially positive effects
of work group diversity on group performance are more likely to emerge in teams performing
relatively complex tasks that require information processing, creativity, and collaborative decision-
making where the exchange and integration of diverse task-related information may stimulate thor-
ough consideration of ideas (van Knippenberg, De Dreu, & Homan, 2004).

Team and Task Type


The nature of the task has been strongly implicated as a moderator variable that accounts for the
inconsistency in research findings concerning the effect of composition variables on team outcomes.
Without exception, each of the existing team composition/diversity meta-analyses has emphasized
the importance of considering team and task types (e.g., Bell, 2007; Bell et al., 2007; Bowers et al.,
2000; Devine & Philips, 2001; Horowitz & Horowitz, 2007; Peeters et al., 2006; Stewart, 2006;
Webber & Donahue, 2001). For example, Bowers et al. (2000) stated, “It appears that the signifi-
cant effects found in many of the included studies can be attributed to the type and difficulty of
the task used in the investigation” (p. 305). Bell et al. (2007) found that the diversity (functional
background and organizational tenure)-performance relationship was strengthened for design and
cross-functional teams compared to other types of teams. However, some meta-analyses have failed
to find evidence for task/team moderated effects (e.g., Horowitz & Horowitz, 2007; Webber &
Donahue, 2001), and others have been hindered in the attempt because of sparse research. To illus-
trate, Horowitz and Horowitz (2007) could not test whether interdependence was a contingency
factor in the team diversity-team outcome link because of the lack of low interdependence teams
across studies. In addition, although there was clear support for moderation, Bell (2007) was unable
to distinguish between study setting and team type because knowledge work teams were mostly
studied in the laboratory, whereas physical work teams were mostly investigated in the field.

DISCUSSION
IMPLICATIONS FOR RESEARCH
TTA
Despite widespread consensus among team researchers regarding its importance (e.g., Baker et al.,
1998; Blickensderfer et al., 2000; Lorenzet et al., 2003), the paucity of research on TTA is indeed
surprising. Research has been deficient in offering empirically tested methodologies for practitioner
use. Since Baker et al. (1998) referred to TTA as “lost but hopefully not forgotten” (p. 79), there has been
little follow-up research activity. Therefore, efforts to examine untested teamwork taxono-
mies, develop new tools, and determine which methods are most appropriate for collecting specific
information top the research agenda in this area (Lorenzet et al., 2003).

Individual-Level Considerations
Although there are many theoretically derived variables that have been hypothesized and investi-
gated as important contributors to team effectiveness, few studies have been conducted to validate
the predictive power of these attributes in a selection context. Moreover, studies that assess the
combinatorial contributions of individual and team-level factors are required in order to optimize
the prediction of effective teamwork. Because many aspects of team functioning cannot be easily
captured via paper-and-pencil measures, efforts to develop behaviorally based assessment tools to
capture observed team actions objectively and reliably are also sorely needed. Fortunately, new
technologies are beginning to emerge as candidates for simulating team environments realistically,
including role players, video-based systems, and virtual world technologies (e.g., Cannon-Bowers
et al., 2007), but they must be validated for team selection.

Team-Level Considerations
Although PG fit is the most nascent type of fit, a small body of work has already demonstrated its positive
impact on a number of individual-level outcomes (e.g., Kristof-Brown et al., 2005b). In
addition, given that meta-analytic moderator analyses could not be conducted because of
too few studies (Kristof-Brown et al., 2005b), further PG fit research is clearly warranted.
Still in a formative stage of development, team composition has consistently been identified as a
promising avenue for future studies (e.g., Ilgen, Hollenbeck, Johnson, & Jundt, 2005; Kozlowski &
Bell, 2003; Kozlowski & Ilgen, 2006). Although meta-analytic results have been straightforward
regarding mean-aggregated characteristics (e.g., Bell, 2007; Stewart, 2006), results have been
far less conclusive regarding how to improve the mix of competencies in a team or how to select
new team members while considering existing team member KSAOs. In addition, the question
of whether heterogeneity or homogeneity of member traits is superior has not been satisfactorily
addressed (e.g., Bowers et al., 2000; Webber & Donahue, 2001).

“Diversity should not be discussed using ‘blanket’ statements suggesting that it is either good, bad, or
unrelated to team performance without reference to the specific diversity variable of interest, type of
team, type of performance assessed, and the amount of time the team has been together.” (Bell et al.,
2007, p. 14)

Because the team composition literature has been criticized as “conceptually scattered” (McGrath, 1984, p. 256) and
“atheoretical” (Levine & Moreland, 1990, p. 594), well-developed models adopting a contingency and multilevel perspective
are needed to help clarify the complex patterns of variables deemed important in this
area. Still a rarity in research practice, a comprehensive “meso” approach to team staffing
involves not only multiple levels, but also cross-level interactions (Ployhart, 2004). The meso para-
digm concerns the simultaneous investigation of two levels of analysis where the processes between
levels are articulated (House, Rousseau, & Thomas-Hunt, 1995).
According to Stewart (2006), “some traits matter more than others … yet our understanding
of which traits are most beneficial for collective performance of teams is still evolving” (p. 45).
Therefore researchers should be encouraged to explore new task-relevant constructs and expand
beyond examining the traditionally considered characteristics of demographic variables, cognitive
ability, expertise, and Big Five traits. In addition, research simultaneously examining multiple char-
acteristics, as is done in faultline studies, should increase. Future work should also seek to specify
the team and task types, as well as contextual factors, that affect optimal team size.

IMPLICATIONS FOR PRACTICE


In many organizations, team assignments are handled casually, based on availability and func-
tional background, with little else considered. For example, with the exception of rank, military
personnel are essentially randomly assigned to many long-term Army teams. Assignment to
Special Forces teams is largely ad hoc, but military occupational specialty is taken into account
(personal communication, Dr. Jay Goodwin, Army Research Institute, June 9, 2008). Thus, as
compared to the range of predictors investigated by researchers (e.g., demographics, personality,
attitudes, abilities, experience), it appears that practitioners formally consider a narrower subset of
variables in team assignments.
Clearly, the starting point in selection for team membership should be a team-based task analy-
sis that specifies the nature of the team and the purposes for which it exists. Based on the need to
account for both individual member performance as well as team performance as a whole, we sug-
gest that a multi-phase procedure be utilized for team selection. In the first stage, generic team and
task competencies would be assessed, including cognitive ability, conscientiousness, agreeableness,
preference for teamwork, and interpersonal KSAs. In the second stage, a contingency framework
would be adopted to examine the synergy of several factors, including the type of team and the
outcomes that are important, task-specific and team-specific competencies, and the capability and
personality compatibility of members. Assessment tools such as the person-team fit inventory
(Burch & Anderson, 2004), which matches an individual’s desired climate with a team’s actual
climate, could provide valuable information when placing members into existing teams or when
team members initiate selection. Group-role analysis, which identifies the nature of
group norms, group-specific task roles, maintenance roles, and role interactions, should also be
leveraged in the process of identifying the complementary and supplementary needs of the team
(Werbel & Johnson, 2001).
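As a hypothetical sketch of how a climate-matching tool of this kind might be scored (this is not the actual scoring procedure of the Burch and Anderson, 2004, inventory), the Python lines below compute a simple distance-based index between an applicant’s desired team climate and a team’s actual climate; the climate dimensions and ratings are invented.

# Hypothetical profile-similarity index; not the Burch & Anderson (2004) scoring key.
import math

def climate_misfit(desired, actual):
    """Euclidean distance across climate dimensions; smaller values indicate closer fit."""
    dims = desired.keys() & actual.keys()
    return math.sqrt(sum((desired[d] - actual[d]) ** 2 for d in dims))

applicant_desired = {"participation": 4.5, "support": 4.0, "task_focus": 3.0, "innovation": 4.0}
team_actual = {"participation": 3.5, "support": 4.0, "task_focus": 4.5, "innovation": 2.5}
print(f"Misfit (distance) = {climate_misfit(applicant_desired, team_actual):.2f}")

Distance-based indices are only one option; correlation-based profile similarity and polynomial regression approaches are common alternatives in the person-environment fit literature.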
Halfhill, Huff, Sundstrom, and Nielsen (2003) recommended involving team members in staff-
ing decisions and soliciting their views on personality composition issues. Team members from
pre-existing groups can provide valuable information on group values, beliefs, and goals that can
be used to assess the fit of potentially incoming members of the group. In the case of self-managed
teams, members may be entirely responsible for selecting new members and would therefore need
to be properly trained.

Individual-Level Considerations
Evaluating potential team members on generic team KSA requirements and personality factors
has shown success as a selection tool (e.g., Hirschfeld et al., 2006; McClough & Rogelberg, 2003;
Morgeson et al., 2005; Mumford et al., 2008; Stevens & Campion, 1999). Because these paper-and-
pencil tests are relatively easy to administer and add significantly to the prediction of team member
effectiveness, we would recommend including them in a team selection context. However, given that
many of the attributes that best predict team effectiveness are not easily captured via paper-and-
pencil tests (e.g., communication skills, adaptability), we also advise implementing work sample
tests that place the applicant in a realistic team setting when feasible. Obviously, this is a more costly
and complicated process, making it justified only for high-stakes hires.

Team-Level Considerations
Convergent meta-analytic results offer some guidance to practitioners in their quest to staff teams
effectively in organizations. Team composition has been shown to strongly influence team perfor-
mance across many studies, with some effects rivaling those found for process variables such as
conflict and cohesion (e.g., Bell, 2007; Stewart, 2006). Therefore, when possible, practitioners should
utilize team composition as a tool to increase team effectiveness. With regard to what characteris-
tics to focus on, both taskwork and teamwork competencies have been shown to contribute unique
variance as predictors of team performance (Bell, 2007). Specifically, multiple meta-analyses have
confirmed that teams with smart, conscientious, and agreeable members perform better (e.g., Bell,
2007; Peeters et al., 2006; Stewart, 2006). Individual meta-analyses have also found that higher
mean levels of expertise (Stewart, 2006), openness to experience, and extraversion (Bell, 2007) are
related to higher team performance.
Although it is recommended that the heterogeneity/homogeneity of member characteristics be
explored in team selection (e.g., Klimoski & Jones, 1995; McClough & Rogelberg, 2003), the incon-
sistency and complexity of current research findings preclude the kind of straightforward prescrip-
tions that are appealing to practitioners. Whereas moderated results are attractive to researchers in
specifying the conditions under which diversity will aid or hinder team performance, the number of
contingencies to be considered significantly complicates the feasibility of interventions to compose
teams. In addition, composition research has not adequately addressed many of the individual dif-
ference variables that may be of substantial import in work settings. For example, when forming
military teams, rank or status must be considered to avoid violating political protocols and creating
interpersonal mismatches. Therefore, a significant disconnect remains between the current state of
research and practitioners’ ability to optimally assign members to teams based on the accumulated
body of knowledge. Bridging this gap is a key agenda item for future work.
The legal issues underlying selection for team membership must also be considered. Interestingly,
there may be an inherent conflict between Equal Employment Opportunity laws written to protect
the individual and team selection procedures designed to improve group outcomes. Whereas the legal
perspective emphasizes standardization and the importance of evaluating all applicants according to
a common set of metrics, the team contingency perspective emphasizes customization and the value
of member compatibility as well as skill heterogeneity. Is it legally defensible for an employer to reject
a candidate who has the same competencies as other team members and select another candidate with
different competencies? What are the legal ramifications when selection for team membership is seen
as promotion or special placement? These questions have yet to be fully explored and resolved.¹
¹ The authors would like to thank Nancy T. Tippins for this addition.

CONCLUSIONS
Understanding how to form superior teams is the key to harnessing selection as a tool for improving
team performance. Given the importance of teams in many modern organizations, it is surprising
that the state of the science and practice in team selection is still fairly rudimentary. Although
there is no shortage of variables that have been hypothesized to affect team performance, specific
studies validating predictors of team effectiveness in a selection context are relatively rare. In addi-
tion, developing mechanisms to determine how a potential employee will fit into a particular team
requires a level of precision that has yet to be achieved in terms of what competencies are needed
within the team and how to measure them in applicants. Therefore, more work is needed regarding
the categories of attributes that are necessary to optimize team functioning—those that are held
by individual members and those that transcend individual members and exist at the team level.
Despite the difficulties, we hope that team researchers will focus on identifying and empirically
validating the competencies needed for selection for team membership.

REFERENCES
Adkins, C. L., Ravlin, E. C., & Meglino, B. M. (1996). Value congruence between co-workers and its relation-
ship to work outcomes. Group & Organization Studies, 21, 439.
Argote, L., & McGrath, J. E. (1993). Group processes in organizations: Continuity and change. International
Review of Industrial and Organizational Psychology, 8, 333–389.
Arthur, W., Jr., Edwards, B. D., Bell, S. T., Villado, A. J., & Bennett, W., Jr. (2005). Team task analysis:
Identifying tasks and jobs that are team-based. Human Factors, 47(3), 654–669.
Baker, D. P., & Salas, E. (1992). Principles for measuring teamwork skills. Human Factors, 34(4), 469–475.
Baker, D. P., Salas, E., & Cannon-Bowers, J. (1998). Team task analysis: Lost but hopefully not forgotten. The
Industrial and Organizational Psychologist, 35(3), 79–83.
Barrick, M. R., Stewart, G. L., Neubert, M. J., & Mount, M. K. (1998). Relating member ability and personality
to work-team processes and team effectiveness. Journal of Applied Psychology, 83(3), 377–391.
Barry, B., & Stewart, G. L. (1997). Composition, process, and performance in self-managed groups: The role
of personality. Journal of Applied Psychology, 82(1), 62–78.
Barsness, Z. I., Tenbrunsel, A. E., Michael, J. H., & Lawson, L. (2002). Why am I here? The influence of group
and relational attributes on member-initiated team selection. In H. Sondak (Ed.), Toward phenomenology
of groups and group membership (pp. 141–171). New York, NY: Elsevier Science.
Bell, S. T. (2007). Deep-level composition variables as predictors of team performance: A meta-analysis.
Journal of Applied Psychology, 92(3), 595–615.
Bell, S. T., Villado, A. J., Lukasik, M. A., Briggs, A., & Belau, L. (2007). Revisiting the team demographic
diversity and performance relationship: A meta-analysis. Paper presented at the annual conference of the
Society for Industrial/Organizational Psychology, New York, NY.
Binning, J. F., & Barrett, G. V. (1989). Validity of personnel decisions: A conceptual analysis of the inferential
and evidential bases. Journal of Applied Psychology, 74, 478–494.
Blickensderfer, E., Cannon-Bowers, J. A., Salas, E., & Baker, D. P. (2000). Analyzing knowledge require-
ments in team tasks. In J. M. Schraagen, S. F. Chipman, & V. L. Shalin (Eds.), Cognitive task analysis
(pp. 431–450). Philadelphia, PA: Lawrence Erlbaum.
Bowers, C. A., Morgan, B. B., Jr., Salas, E., & Prince, C. (1993). Assessment of coordination demand for air-
crew coordination training. Military Psychology, 5, 95–112.
Bowers, C. A., Pharmer, J. A., & Salas, E. (2000). When member homogeneity is needed in work teams: A
meta-analysis. Small Group Research, 31(3), 305.
Burch, G. S. J., & Anderson, N. (2004). Measuring person-team fit: Development and validation of the team
selection inventory. Journal of Managerial Psychology, 19(4), 406.
Burke, C. S., Stagl, K. C., Klein, C., Goodwin, G. F., Salas, E., & Halpin, S. (2006). What type of leadership
behaviors are functional in teams? A meta-analysis. The Leadership Quarterly, 17, 288–307.
Byrne, D. E. (1971). The attraction paradigm. New York, NY: Academic Press.
Campion, M. A., Medsker, G. J., & Higgs, A. C. (1993). Relations between work group characteristics and
effectiveness: Implications for designing effective work groups. Personnel Psychology, 46(4), 823–847.
Cannon-Bowers, J. A., Bowers, C. A., & Sanchez, A. (2007). Using synthetic learning environments to train
teams. In V. I. Sessa & M. London (Eds.), Work group learning: Understanding, improving, and assess-
ing how groups learn in organizations (pp. 315–347). Mahwah, NJ: Lawrence Erlbaum.
Cannon-Bowers, J. A., Salas, E., & Blickensderfer, E. (1998, April). On training crews. In R. J. Klimoski
(Chair), When is a work team a crew—and does it matter? Paper presented at the 13th annual meeting of
the Society of Industrial and Organizational Psychology, Dallas, TX.
Cannon-Bowers, J. A., Tannenbaum, S. I., Salas, E., & Volpe, C. E. (1995). Defining competencies and estab-
lishing team training requirements. In R. A. Guzzo, E. Salas, et al. (Eds.), Team effectiveness and deci-
sion making in organizations (pp. 333–380). San Francisco, CA: Jossey-Bass.
Chan, D. (1998). Functional relations among constructs in the same content domain at different levels of
analysis: A typology of composition models. Journal of Applied Psychology, 83(2), 234–246.
Costa, P. T., & McCrae, R. R. (1992). Revised NEO Personality Inventory (NEO PI-R) and NEO Five-Factor
Inventory (NEO-FFI): Professional manual. Lutz, FL: Psychological Assessment Resources.
Cox, T. H., Lobel, S. A., & McLeod, P. L. (1991). Effects of ethnic group cultural differences on coop-
erative and competitive behavior on a group task. The Academy of Management Journal, 34(4),
827–847.
Day, E. A., Arthur, W., Miyashiro, B., Edwards, B. D., & Hanson, T. (2004). Criterion-related validity of sta-
tistical operationalizations of group general cognitive ability as a function of task type: Comparing the
mean, maximum, and minimum. Journal of Applied Social Psychology, 34(7), 1521–1549.
DeRue, D. S., & Morgeson, F. P. (2007). Stability and change in person-team and person-role fit over time: The
effects of growth satisfaction, performance, and general self-efficacy. Journal of Applied Psychology,
92(5), 1242–1253.
Devine, D. J. (2002). A review and integration of classification systems relevant to teams in organizations.
Group Dynamics: Theory, Research, and Practice, 6, 291–310.
Devine, D. J., & Philips, J. L. (2001). Do smarter teams do better: A meta-analysis of cognitive ability and team
performance. Small Group Research, 32(5), 507.
Driskell, J. E., & Salas, E. (1992). Collective behavior and team performance. Human Factors, 34, 277–288.
Druskat, V. U., & Kayes, D. C. (1999). The antecedents of competence: Toward a fine-grained model
of self-managing team effectiveness. In E. A. Mannix, M. A. Neale, & R. Wageman (Eds.),
Research on managing groups and teams: Groups in context (Vol. 2, pp. 201–231). Greenwich,
CT: JAI Press.
Fullerton, H. N., Jr., & Toossi, M. (2001). Labor force projections to 2010: Steady growth and changing com-
position. Monthly Labor Review, 124(11), 21–38.
Gladstein, D. L. (1984). Groups in context: A model of task group effectiveness. Administrative Science
Quarterly, 29(4), 499–517.
Hackman, J. R. (1987). The design of work teams. In J. Lorsch (Ed.), Handbook of organizational behavior
(pp. 315–342). Englewood Cliffs, NJ: Prentice-Hall.
Hackman, J. R. (2002). Leading teams: Setting the stage for great performances. Boston, MA: Harvard
Business School Press.
Halfhill, T. R., Huff, J. W., Sundstrom, E., & Nielsen, T. M. (2003). Group personality composition and work
team effectiveness: Key factor in staffing the team-based organization. Advances in Interdisciplinary
Studies of Work Teams, 9, 147–168.
Harrison, D. A., & Klein, K. J. (2007). What’s the difference? Diversity constructs as separation, variety, or
disparity in organizations. The Academy of Management Review, 32(4), 1199–1228.
Harrison, D. A., Price, K. H., Gavin, J. H., & Florey, A. T. (2002). Time, teams, and task performance: Changing
effects of surface-and deep-level diversity on group functioning. Academy of Management Journal,
45(5), 1029–1045.
Harrison, D. A., & Sin, H. (2006). What is diversity and how should it be measured? In A. M. Konrad, P. Prasad,
& J. K. Pringle (Eds.), Handbook of workplace diversity (pp. 191–216). Thousand Oaks, CA: Sage.
Helmreich, R. L., & Foushee, H. C. (1993). Why crew resource management? Empirical and theoretical bases
of human factors in training and aviation. In E. Wiener, B. G. Kanki, & R. L. Helmreich (Eds), Cockpit
resource management (pp. 3–45). San Diego, CA: Academic Press.
Heslin, R. (1964). Predicting group task effectiveness from member characteristics. Psychological Bulletin, 62,
248–256.
Hirschfeld, R. R., Jordan, M. H., Feild, H. S., Giles, W. F., & Armenakis, A. A. (2006). Becoming team players:
Team members’ mastery of teamwork knowledge as a predictor of team task proficiency and observed
teamwork effectiveness. Journal of Applied Psychology, 91(2), 467–474.
Hollenbeck, J. R. (2000). A structural approach to external and internal person-team fit. Applied Psychology:
An International Review, 49(3), 534–549.
Horowitz, S. K., & Horowitz, I. B. (2007). The effects of team diversity on team outcomes: A meta-analytic
review of team demography. Journal of Management, 33(6), 987–1015.
House, R., Rousseau, D. M., & Thomas-Hunt, M. (1995). The meso paradigm: A framework for the inte-
gration of micro and macro organizational behavior. Research in Organizational Behavior, 17,
71–114.
Huffcutt, A. L., Conway, J. M., Roth, P. L., & Stone, N. J. (2001). Identification and meta-analytic assessment
of psychological constructs measured in employment interviews. Journal of Applied Psychology, 86(5),
897–913.
Humphrey, S. E., Hollenbeck, J. R., Meyer, C. J., & Ilgen, D. R. (2007). Trait configurations in self-managed
teams: A conceptual examination of the use of seeding for maximizing and minimizing trait variance in
teams. Journal of Applied Psychology, 92(3), 885.
Ilgen, D. R., Hollenbeck, J. R., Johnson, M., & Jundt, D. (2005). Teams in organizations: From input-
process-output models to IMOI models. Annual Review of Psychology, 56(1), 517–543.
Jackson, S. E., May, K. E., & Whitney, K. (1995). Understanding the dynamics of diversity in decision-making
teams. In R. A. Guzzo, E. Salas, & Associates (Eds.), Team effectiveness and decision making in organi-
zations (pp. 204–261) San Francisco, CA: Jossey Bass.
Jones, R. G., Stevens M. J., & Fischer, D. L. (2000). Selection in team contexts. In J. F. Kehoe (Ed.), Managing
selection in changing organizations (pp. 210–241). San Francisco, CA: Jossey-Bass.
Keck, S., & Tushman, M. (1993). Environmental and organizational context and executive team structure.
Academy of Management Journal, 36(6), 1314–1344.
Kichuk, S. L., & Wiesner, W. H. (1998). Work teams: Selecting members for optimal performance. Canadian
Psychology, 39(1–2), 23–32.
Kilduff, M., Angelmar, R., & Mehra, A. (2000, January). Top management-team diversity and firm perfor-
mance: Examining the role of cognitions. Organization Science, 11, 21–34.
Kirksey, J., &. Zawacki, R. A. (1994). Assessment center helps find team-oriented candidates. Personnel
Journal, 73(5), 92.
Klimoski, R., & Jones, R. G. (1995). Staffing for effective group decision-making: Key issues in matching
people and teams. In R. A. Guzzo, E. Salas, & Associates (Eds.), Team effectiveness and decision making
in organizations (pp. 291–332) San Francisco, CA: Jossey Bass.
Kozlowski, S. W. J., & Bell, B. S. (2003). Work groups and teams in organizations. In W. C. Borman,
D. R. Ilgen, & R. J. Klimoski (Eds.), Handbook of psychology: Industrial and organizational psychology
(pp. 333–375). London, England: Wiley.
Kozlowski, S. W. J., & Ilgen, D. R. (2006). Enhancing the effectiveness of work groups and teams. Psychological
Science in the Public Interest, 7(3), 77–124.
Kozlowski, S. W. J., & Klein, K. J. (2000). A multilevel approach to theory and research in organizations:
Contextual, temporal, and emergent processes. In S. W. Kozlowski & K. J. Klein (Eds.), Multilevel the-
ory, research, and methods in organizations: Foundations, extensions, and new directions (pp. 3–90).
London, England: Wiley.
Kristof-Brown, A., Barrick, M. R., & Stevens, C. K. (2005a). When opposites attract: A multi-sample demon-
stration of complementary person-team fit on extraversion. Journal of Personality, 73, 935–958.
Kristof-Brown, A. L., Jansen, K. J., & Colbert, A. E. (2002). A policy-capturing study of the simultaneous
effects of fit with jobs, groups, and organizations. Journal of Applied Psychology, 87(5), 985–993.
Kristof-Brown, A. L., & Stevens, C. K. (2001). Goal congruence in project teams: Does the fit between
members’ personal mastery and performance goals matter? Journal of Applied Psychology, 86(6),
1083–1095.
Kristof-Brown, A. L., Zimmerman, R. D., & Johnson, E. C. (2005b). Consequences of individual’s fit at work:
A meta-analysis of person-job, person-organization, person-group, and person-supervisor fit. Personnel
Psychology, 58(2), 281–342.
Langfred, C. (2007). The downside of self-management: A longitudinal study of the effects of conflict on
trust, autonomy, and task interdependence in self-managing teams. Academy of Management Journal,
50, 885–900.
Lau, D. C., & Murnighan, J. K. (1998). Demographic diversity and faultlines: The compositional dynamics of
organizational groups. The Academy of Management Review, 23(2), 325–340.
Lau, D., & Murnighan, J. K. (2005). Interactions with groups and subgroups: The effects of demographic fault-
lines. Academy of Management Journal, 48, 645–659.
Leach, D. J., Wall, T. D., Rogelberg, S. G., & Jackson, P. R. (2005). Team autonomy, performance, and mem-
ber job strain: Uncovering the teamwork KSA link. Applied Psychology: An International Review, 54,
1–24.
Lepine, J. A., Hollenbeck, J. R., Ilgen, D. R., & Hedlund, J. (1997). Effects of individual differences on the
performance of hierarchical decision-making teams: Much more than g. Journal of Applied Psychology,
82(5), 803–811.
Levine, J. M., & Moreland, R. L. (1990). Progress in small group research. Annual Review of Psychology,
41(1), 585–634.
Lorenzet, S. J., Eddy, E. R., & Klein, G. D. (2003). The importance of team task analysis for team human
resource management. In M. M. Beyerlein, D. A. Johnson, & S. T. Beyerlein (Eds.), Team-based organiz-
ing (Vol. 9, pp. 113–145). New York, NY: Elsevier Science.
Mann, R. D. (1959). A review of the relationships between personality and performance in small groups.
Psychological Bulletin, 56(4), 241.
Mannix, E., & Neale, M. A. (2005). What differences make a difference? Psychological Science in the Public
Interest, 6(2), 31–55.
Marks, M. A., & Panzer, F. J. (2004). The influence of team monitoring on team processes and performance.
Human Performance, 17, 25–41.
McClough, A. C., & Rogelberg, S. G. (2003). Selection in teams: An exploration of the teamwork knowledge,
skills, and ability test. International Journal of Selection and Assessment, 11, 56–66.
McGrath, J. E. (1984). Groups: Interaction and performance. Englewood Cliffs, NJ: Prentice-Hall.
Milliken, F. J., & Martins, L. L. (1996). Searching for common threads: Understanding the multiple effects of
diversity in organizational groups. The Academy of Management Review, 21(2), 402–433.
Mohammed, S., & Angell, L. C. (2003). Personality heterogeneity in teams: Which differences make a differ-
ence for team performance? Small Group Research, 34(6), 651.
Mohammed, S., & Angell, L. (2004). Surface- and deep-level diversity in workgroups: Examining the mod-
erating effects of team orientation and team process on relationship conflict. Journal of Organizational
Behavior, 25, 1015–1039.
Mohammed, S., Mathieu, J. E., & Bartlett, A. L. (2002). Technical-administrative task performance, leader-
ship task performance, and contextual performance: Considering the influence of team- and task-related
composition variables. Journal of Organizational Behavior, 23(7), 795–814.
Morgan, B. B., & Lassiter, D. L. (1992). Team composition and staffing. In R. W. Swezey & E. Salas (Eds.),
Teams: Their training and performance (pp. 75–100). Westport, CT: Ablex.
Morgan, B., Salas, E., & Glickman, A. (1993, July). An analysis of team evolution and maturation. Journal of
General Psychology, 120(3), 277–291.
Morgeson, F. P., Reider, M. H., & Campion, M. A. (2005). Selecting individuals in team settings: The impor-
tance of social skills, personality characteristics, and teamwork knowledge. Personnel Psychology, 58(3),
583–611.
Mullen, B., Simons, C., Hu, L. T., & Salas, E. (1989). Group size, leadership behavior, and subordinate satisfac-
tion. The Journal of General Psychology, 116(2), 155.
Mumford, T. V., van Iddekinge, C. H., Morgeson, F. P., & Campion, M. A. (2008). The team role test:
Development and validation of a team role knowledge situational judgment test. Journal of Applied
Psychology, 93, 250–267.
Neuman, G. A., Wagner, S. H., & Christiansen, N. D. (1999). The relationship between work-team personality
composition and the job performance of teams. Group and Organization Management, 24, 28–45.
Neuman, G. A., & Wright, J. (1999). Team effectiveness: Beyond skills and cognitive ability. Journal of Applied
Psychology, 84(3), 376–389.
O’Neil, H. F., Wang, S., Lee, C., Mulkey, J., & Baker, E. L. (2003). Assessment of teamwork skills via a
teamwork questionnaire. In H. F. O’Neil & R. S. Perez (Eds.), Technology applications in education: A
learning view (pp. 283–304). Mahwah, NJ: Lawrence Erlbaum.
Pearsall, M. J., & Ellis, A. P. J. (2006). The effects of critical team member assertiveness on team performance
and satisfaction. Journal of Management, 32, 575–594.
Peeters, M. A. G., van Tuijl, H., Rutte, C. G., & Reymen, I. (2006). Personality and team performance: A meta-
analysis. European Journal of Personality, 20(5), 377–396.
Ployhart, R. E. (2004). Organizational staffing: A multilevel review, synthesis, and model. Research in Personnel
and Human Resources Management, 23, 121–176.
Price, K., Harrison, D., & Gavin, J. (2006). Withholding inputs in team contexts: Member composition,
interaction processes, evaluation structure, and social loafing. Journal of Applied Psychology, 91(6),
1375–1384.
Saavedra, R., Earley, P. C., & van Dyne, L. (1993). Complex interdependence in task performing groups.
Journal of Applied Psychology, 78, 61–72.
Salas, E., Burke, C. S., Fowlkes, J. E., & Priest, H. A. (2004). On measuring teamwork skills. In J. Thomas
(Ed.), Comprehensive handbook of psychological assessment. Vol. 4: Industrial/organizational assess-
ment (pp. 427–442). Hoboken, NJ: Wiley.
Salas, E., Cannon-Bowers, J. A., & Johnston, J. H. (1997). How can you turn a team of experts into an expert
team? Emerging training strategies. In C. E. Zsambok & G. Klein (Eds.), Naturalistic decision
making (pp. 359–370). Mahwah, NJ: Lawrence Erlbaum.
Salas, E., Nichols, D. R., & Driskell, J. E. (2007). Testing three team training strategies in intact teams: A meta-
analysis. Small Group Research, 38, 471–488.
Schmidt, F. L. (2002). The role of general cognitive ability in job performance: Why there cannot be a debate.
Human Performance, 15, 187–210.
Schneider, B., Smith, D. B., & Sipe, W. P. (2000). Personnel selection psychology. In S. W. Kozlowski &
K. J. Klein (Eds.), Multilevel theory, research, and methods in organizations: Foundations, extensions,
and new directions (pp. 91–120). London, England: Wiley.
Simons, T., Pelled, L. H., & Smith, K. A. (1999). Making use of difference: Diversity, debate, and decision
comprehensiveness in top management teams. Academy of Management Journal, 42(6), 662–673.
Smith-Jentsch, K. A., Salas, E., & Baker, D. P. (1996). Training team performance-related assertiveness.
Personnel Psychology, 49, 909–936.
Stark, E. M., Shaw, J. D., & Duffy, M. K. (2007). Preference for group work, winning orientation, and social
loafing behavior in groups. Group and Organization Management, 32(6), 699–723.
Steiner, I. D. (1972). Group processes and productivity. New York, NY: Academic Press.
Stevens, M. J., & Campion, M. A. (1994). The knowledge, skill, and ability requirements for teamwork:
Implications for human resource management. Journal of Management, 20(2), 503.
Stevens, M. J., & Campion, M. A. (1999). Staffing work teams: Development and validation of a selection test
for teamwork settings. Journal of Management, 25(2), 207.
Stewart, G. L. (2006). A meta-analytic review of relationships between team design features and team perfor-
mance. Journal of Management, 32(1), 29.
Sundstrom, E., De Meuse, K. P., & Futrell, D. (1990). Work teams: Applications and effectiveness. American
Psychologist, 45(2), 120–133.
Tajfel, H. (1978). Differentiation between social groups: Studies in the social psychology of intergroup rela-
tions. New York, NY: Academic Press.
Tannenbaum, S. I., Salas, E., & Cannon-Bowers, J. A. (1996). Promoting team effectiveness. In M. A. West
(Ed.), Handbook of work group psychology (pp. 503–529). Chichester, England: Wiley.
Tasa, K., Taggar, S., & Seijts, G. H. (2007). The development of collective efficacy in teams: A multilevel and
longitudinal perspective. Journal of Applied Psychology, 92(1), 17–27.
Thatcher, S. M. B., Jehn, K. A., & Zanutto, E. (2003). Cracks in diversity research: The effects of diversity
faultlines on conflict and performance. Group Decision and Negotiation, 12(3), 217–241.
Tyran, K. L., & Gibson, C. B. (2008). Is what you see, what you get?: The relationship among surface-and
deep-level heterogeneity characteristics, group efficacy, and team reputation. Group & Organization
Management, 33(1), 46.
Tziner, A., & Eden, D. (1985). Effects of crew composition on crew performance: Does the whole equal the
sum of its parts? Journal of Applied Psychology, 70(1), 85–93.
van der Vegt, G., & Bunderson, J. S. (2005). Learning and performance in multidisciplinary teams: The impor-
tance of collective team identification. The Academy of Management Journal, 48(3), 532–547.
van Knippenberg, D., De Dreu, C. K., & Homan, A. C. (2004). Work group diversity and group performance:
An integrative model and research agenda. Journal of Applied Psychology, 89(6), 1008–1022.
van Knippenberg, D., & Schippers, M. C. (2007). Work group diversity. Annual Review of Psychology, 58,
515–541.
van Welie, M., & van der Veer, G. C. (2003). Pattern languages in interaction design: Structure and organiza-
tion. Proceedings of Interact, 3, 1–5.
Webber, S. S., & Donahue, L. M. (2001). Impact of highly and less job-related diversity on work group cohe-
sion and performance: A meta-analysis. Journal of Management, 27(2), 141.
Wellins, R. S., Byham, W. C., & Dixon, G. R. (1994). Inside teams: How 20 world-class organizations are
winning through teamwork. San Francisco, CA: Jossey-Bass.
Wellins, R. S., Byham, W. C., & Wilson, J. M. (1991). Empowered teams, creating self managing working
groups and the improvement of productivity and participation. San Francisco, CA: Jossey-Bass.
Werbel, J. D., & Johnson, D. J. (2001). The use of person-group fit for employment selection: A missing link
in person-environment fit. Human Resource Management, 40(3), 227–240.
Williams, K. Y., & O’Reilly, C. A. (1998). Demography and diversity in organizations: A review of 40 years of
research. Research in Organizational Behavior, 20, 77–140.
38 Selecting Leaders
Executives and High Potentials
George C. Thornton III, George P. Hollenbeck,
and Stefanie K. Johnson

Despite the importance of leaders at the top levels of organizations, there has been relatively little
research on executive selection. There is a large literature on leadership concepts (Den Hartog &
Koopman, 2001) and leadership development (Day, 2001), but surprisingly little empirical research
on selecting leaders into top-level positions. Recent books on selection (Evers, Anderson, &
Voskuijl, 2005) provide little or no guidance on the selection of executive leaders, despite the fact
that selection of personnel is a major area of practice in industrial-organizational (I-O) psychology,
and executive selection is extremely important in any organization.
The majority of our understanding of executive selection has come from applied research and
experience. For example, Hollenbeck (1994) summarized observations from his experience and
eight books on the selection of chief executive officers (CEOs). Sessa and Taylor (2000) summarized
the results of a series of studies conducted at the Center for Creative Leadership in the 1990s using
simulations and surveys of executives. The purpose of this chapter is to review research and practice
in the selection of executive leaders and those who have high potential for these positions and to
comment on these developments over the past decades on the basis of our observations.
We begin the chapter by defining our focal group of executives and high potentials. To clarify
our focus, we make distinctions between leadership behaviors, leaders, and management. Next we
describe a few of the attributes thought to be important for the effectiveness of top leaders and
review some techniques to assess these attributes in high potentials and executive candidates. Then,
we describe the importance of an integrated process of recruitment, development, and management
of high potentials and executives in the context of an organization’s internal culture and exter-
nal environment. Next, mechanisms for integrating evaluations of performance and potential are
described. We then take up special topics such as the fit of executive leaders with the organization’s
needs and culture. We discuss what may be the most difficult and challenging issue: an evaluation
of whether leader selection methods work. We conclude with a review of past and present
executive selection research and a discussion of the actual and potential roles of I-O psychologists in executive
selection.

EXECUTIVES AND HIGH POTENTIALS: WHO ARE THEY?


WHAT DO THEY DO?
Definitions of “executives” and “high potentials” are as elusive as a definition of “leadership.”
Organizations, and those who write about them, seem to follow Humpty Dumpty: “When I use a
word, it means just what I choose it to mean, neither more nor less.” Following Mr. Dumpty, we use
“executive” to mean those at the top of the hierarchy in organizations, those who carry responsibil-
ity for major organizational units or who occupy key jobs that are essential to the purpose of the
organization (e.g., chief scientist). “High potential” here refers to one deemed to be capable, with the
right development, of occupying one of these executive positions at some time in the relatively near
future. Thus, we do not include lower-level managers who may have long-range potential.

Executives, defined by level in the organization and by participation in the company’s execu-
tive compensation plan, generally make up less than 2% of the total employee population in large
organizations. According to Silzer (2002), executives include general managers, corporate offi-
cers, and heads of major organizational functions and business units; senior executives are in the
top tier—corporate officers, executive committee members, and CEOs. For example, at General
Electric in 2007, among its 320,000 employees, 5,000 were executives, 600 were senior executives,
and 185 were corporate officers (http://gecapsol.com/cms/servlet/cmsview/GE_Capital_Solutions/
prod/en/index.html). Highly decentralized organizations may have many company presidents (e.g.,
Johnson and Johnson with about 180 companies; S. Peters, personal communication, July 2, 2007).
In smaller organizations, the half dozen or so executives who have responsibility for major segments
of the business are, of course, a much smaller percentage of the employee population.
High potential for executive jobs is also organization-specific, but the term is perhaps used more narrowly
than executive. Classifying employees as high potential grows out of an organization’s efforts
to ensure continuity in its supply of executives through succession planning, replacement
planning for key positions, and talent management. High potentials are typically high-
performing managers and individual contributors. At Royal Dutch Shell, “potential is the
estimate of the highest job that an individual will be able to perform in his or her future career”
(Podratz, Hungerford, Schneider, & Bui, 2007). High potentials are those thought to be able
to reach executive jobs. Depending on the resources devoted by the organization to leadership
development, high potentials may be identified quite early in their careers, or when they have had
a chance to demonstrate potential.
Executive jobs have changed dramatically since the 1950s when management was associated
with large, stable organizations and consisted of the classic functions of planning, organizing, and
controlling. Hemphill’s classic studies of executive behavior arrived at ten dimensions of manage-
ment at the executive level (Campbell, Dunnette, Lawler, & Weick, 1970) that only faintly resemble
the way executive work is described today. Now, those classic management functions must be aug-
mented with complex, diverse, and situation-specific leadership behaviors required by dynamic,
global, competitive business environments. Bass (1990) captured the essence of the distinction
between manager and leader: “Leaders manage and managers lead, but the two activities are not
synonymous” (p. 383). To manage, the executive carries out the classic functions; to lead, the execu-
tive behaves in ways that inspire and influence the behavior of others.
Members throughout the organization may carry out leadership behaviors, and a full
description of a modern understanding of leadership behaviors is beyond the scope of this chapter
(see Salancik, Calder, Rowland, Leblebici, & Conway, 1975, for the distinction between leadership
and leader). Today, the simplest answer to the question of “What do executives do?” may be
“Whatever it takes.” Lengthy executive position descriptions have given way to outcome-oriented
objectives, relating to what the executive is expected to contribute to the strategic mission of the
organization.
Today, organizations and positions are seen as dynamic and rapidly changing. Executives are
expected to change the jobs they are in, and expected to be changed by these jobs. What executives
do to achieve the required outcomes is less a function of a written job description and more about
how what they do fits into the climate and culture of the organization. The higher the level of execu-
tive position, the more the incumbents shape the position to their preferences, talents, and abilities
to advance the mission of the organization. Selecting an executive has changed from asking, “What
must the executive do?” to “What must get done?” “What does it take to get that done?” is typi-
cally defined by a list of competencies or human attributes believed to underlie executive success.

EXECUTIVE COMPETENCIES AND ATTRIBUTES


Terminology used to define the characteristics needed for executive success varies considerably
across organizations and situations. These characteristics are often expressed as a mixture of
competencies and human attributes, but this distinction is not always precise. Competencies are
often stated in terms of broad organizational objectives (e.g., global perspective), but some com-
petency lists include human attributes (e.g., adaptability). Most organizations have competency
lists or models for their executive positions (e.g., see the 17-competency Executive Skills Profile,
Frisch, 1998). Lists may vary from few (6–8) to many (60–70). A typical set of competencies
that has been used in talent management at Johnson and Johnson includes: integrity and credo-
based actions, strategic thinking, big-picture orientation with attention to detail, ability to inspire
and motivate, intellectual curiosity, collaboration and teaming, sense of urgency, prudent risk tak-
ing, self-awareness and adaptability, results and performance-driven.
doing selection is to translate these competencies into operationally defined personal attributes that
are related to leadership effectiveness, make a difference in performance of executives, and can
be assessed. The process of translating organizational competencies to measurable human attri-
butes is carried out by various methods traditionally known as job analysis techniques (Brannick &
Levine, 2002). Subject matter experts may be asked to rate the relevance of a list of attributes needed
to complete job tasks and responsibilities, to rate the importance of behaviors known to be
examples of attributes, and/or to infer directly the importance of human attributes to successful
leadership performance.
In the next section, we discuss some of the attributes thought to be associated with successful
executives and high potential leaders. The scores of human attributes that have been associated with
effective leadership literally range from “a to z” (Bass, 1990). Our review suggests that five
of these have been emphasized recently and bear further discussion: intelligence, social/emotional
intelligence, personality, self-efficacy, and integrity.

INTELLIGENCE
Executives must have a fairly high level of intelligence to run complex organizations. The well-
documented relationship of job complexity to cognitive abilities (Ones, Viswesvaran, & Dilchert,
2005) suggests that intelligence is important for executive success. But, the type of intelligence
and the relationship of intelligence to leader effectiveness have been debated for decades. Korman
(1968) concluded that verbal intelligence predicts performance of supervisors but not higher-level
managers. Menkes (2005) found that cognitive skills, such as analyzing and anticipating business
trends, differentiate star executives from their peers. Some have argued that whereas crystallized
intelligence (i.e., knowledge of facts) is important at lower levels, fluid intelligence (akin to creativ-
ity) is important at executive levels. Although a certain (probably considerable) amount of cognitive
ability is important, additional levels may not be related to executive performance. In fact, some
evidence suggests a curvilinear relation: both very low and very high levels of intelligence may be detrimental
to leadership success (Bass, 1990). The search for elements of cognitive ability important to execu-
tive leadership (and other areas of effective performance) has led to specification of different types
of intelligence.

SOCIAL/EMOTIONAL INTELLIGENCE
Emotional intelligence may be linked to leadership success (Goleman, 2000). Emotional intelli-
gence is defined as a person’s ability to successfully manage his or her own emotions and under-
stand the emotions of others (Salovey & Mayer, 1990). Goleman (1998) suggested that individuals
with high levels of emotional intelligence are more likely to be identified as star performers, and
McClelland (1996) found that managers with higher levels of emotional intelligence outperform
those with lower levels of emotional intelligence by 20%. Similarly, social skill (Riggio, Riggio,
Salinas, & Cole, 2003) and social intelligence (Riggio, Murphy, & Pirozzolo, 2002; Zaccaro, 2001)
have been related to leadership success. Considerable disagreement exists over the definition of
these types of intelligence.

PERSONALITY
The Five-Factor Model has become one of the most widely used and popular conceptualizations of per-
sonality (e.g., McCrae & Costa, 1989). Sometimes known as “The Big Five,” these attributes include
conscientiousness, agreeableness, openness to experience, neuroticism (sometimes called anxiety),
and extraversion. Variables in the model, especially extraversion and conscientiousness, have been
associated with leadership effectiveness (Judge, Bono, Ilies, & Gerhardt, 2002; Lord, DeVader, &
Alliger, 1986). Personality traits may be particularly predictive of success for top-level leaders
given the amount of autonomy that is characteristic of such positions (Barrick & Mount, 1991).
Indeed, researchers have demonstrated the importance of personality to top-level leaders in orga-
nizations, such as Sears (Bentz, 1990). On the flip side, Hogan and Hogan (2001) suggested that
“dark side” personality traits (e.g., paranoia and passive aggressiveness) can also be used to predict
leadership failure.

SELF-EFFICACY
Self-efficacy (Smith & Foti, 1998; Chemers, Watson, & May, 2000) and the related construct of
self-esteem (e.g., Atwater, Dionne, Avolio, Camobreco, & Lau, 1999) are also important for leader
success. Self-esteem is seen as a more generalized perception of the self, whereas self-efficacy is
usually conceptualized as one’s confidence in his or her ability to perform well in a specific domain.
In the case of leadership, the most relevant type of self-efficacy would be self-efficacy for one’s
ability to lead. Chemers et al. (2000) showed that leadership efficacy was related to leadership ratings made by superior officers, peers, and trained observers. Further, individuals' leadership efficacy
may impact team performance through its effects on followers’ self-efficacy beliefs (Hoyt, Murphy,
Halverson, & Watson, 2003).

INTEGRITY
Integrity also deserves mention. The relevance of integrity to leadership has been highlighted in
recent years with corporate scandals such as Enron and WorldCom. Meta-analytic research has
supported the relationship between leader integrity and leadership performance (Judge & Bono,
2001). The importance of integrity to leadership has also given rise to a new model of leadership
called Authentic Leadership (Avolio & Gardner, 2005). The construct of ethical leadership is also
related to leader integrity (Brown & Trevino, 2006; Brown, Trevino, & Harrison, 2005). These
authors define ethical leadership as “the demonstration of normatively appropriate conduct through
personal actions and interpersonal relationships” (Brown et al., 2005, p. 120). They demonstrate that
ethical leadership is related to perceptions of leader honesty and fairness, trust in leader, and other
perceptions of effective leadership. Moreover, they suggest that ethical leadership can also influence
follower ethical behavior insofar as leaders are role models for their followers (Brown & Trevino,
2006; Brown et al., 2005).

ASSESSMENT TECHNIQUES
Methods to assess these and other attributes range from using internal performance appraisal data
to elaborate testing and assessment. In the following section, we summarize research on several
different types of assessment techniques and, where data exist, discuss their use in executive selec-
tion. The strength of the relationship between a specific attribute measured by a given assessment
technique and a specific criterion of leadership effectiveness is typically only moderate. That is, the
correlation is seldom over .35 to .40. Combinations of multiple measures of multiple attributes often
yield correlations in the range of .50 to .60. In other words, approximately 15–35% (i.e., .35² to .60²) of the variation in leadership effectiveness is predicted from one or more human attributes. Readers seeking more information on specific tests, questionnaires, and assessment methods in I-O psychology will find the handbook by Thomas (2004) quite useful.
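As a worked illustration of the squared-correlation arithmetic behind these percentages (a sketch using the endpoints of the ranges cited above, not values from any single study), the proportion of criterion variance accounted for is the square of the validity coefficient:

\[
r = .35 \Rightarrow r^{2} = .12, \qquad r = .40 \Rightarrow r^{2} = .16, \qquad r = .50 \Rightarrow r^{2} = .25, \qquad r = .60 \Rightarrow r^{2} = .36 .
\]

Thus a single attribute measured by one technique typically accounts for roughly 12–16% of the variance in a leadership criterion, whereas a composite of multiple measures may account for roughly 25–36%.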

COGNITIVE ABILITY TESTS


Although there is little question that executives must have relatively high cognitive ability, there is
mixed evidence that cognitive ability tests are widely used, valid, or useful for selection into top
ranks of the organization. Cognitive ability tests are commonly used in individual psychological
assessment (as described later in this chapter); in a survey of 628 staffing directors, Howard, Erker,
and Bruce (2007) found that approximately 50% of organizations surveyed used ability tests at high
managerial levels. Cognitive ability tests (in comparison with other measures) have been shown to
have some of the highest validity correlations with performance throughout the managerial ranks,
but Fiedler (1995) pointed out that measures of an individual’s cognitive abilities have been margin-
ally successful in predicting how a leader will perform in a particular job. At the highest execu-
tive levels, marked restriction of range in test scores may provide little meaningful differentiation
among candidates and may severely restrict correlation coefficients.

BIODATA
There is little question that one’s experiences during childhood, adolescence, education, military
training, and initial work up to the time of being considered to have high potential as an executive,
or being chosen for an executive position, are relevant to later success. Considerable research has
shown that systematic measures of biographical experiences can be highly predictive of perfor-
mance in various jobs (Stokes, Mumford, & Owens, 1994), including supervisors and managers
(Stokes & Cooper, 1994). Individual experiences that predict different performance criteria vary
considerably, but commonly predictive generic dimensions of life history include academic achieve-
ment, family background, economic stability, and financial responsibility. However, at the executive
level, systematically gathering information about individuals’ early life experiences can be prob-
lematic because the relevance may not be apparent. Formally scored biodata questionnaires are not
used frequently, although their prevalence as a selection device may be increasing. In a 2006 survey
of staffing directors, 60% reported they used application forms and 23% (up from 13% in 2004)
used a biographical data form (Howard et al., 2007).

PERSONALITY QUESTIONNAIRES
In 1991, Tett, Jackson, and Rothstein reported that personality tests, and in particular measures
of the Five-Factor Model of personality, are frequently used in selection contexts. Furthermore,
research has demonstrated that personality tests that are more specific than the Big Five are more
predictive than the Big Five (Pulakos, Borman, & Hough, 1988). Yet, despite the relevance of per-
sonality characteristics to leadership, personality questionnaires are rarely used to select executives
(Hogan & Kaiser, 2005). In Howard et al.’s (2007) survey of organizations, 65% of the organiza-
tions had never used a personality inventory for selection.
Considerable controversy exists over the extent to which responses to self-report personality
tests are influenced by contaminants such as self-enhancement biases and faking. Contrasting opin-
ions on these matters are expressed in a series of articles in the autumn and winter 2007 issues of
Personnel Psychology. Morgeson et al. (2007) raised questions over the utility of self-report per-
sonality questionnaires for selection purposes given their low correlation with job performance and
the inherent flaws of self-report data. In response to this article, other authors argue for the utility
of personality measures in selection, particularly when more sophisticated weighting schemes and
conceptualizations of personality are used (Ones, Dilchert, Viswesvaran, & Judge, 2007; Tett &
Christiansen, 2007).

Integrity measures, both personality-based scales and overt scales, have demonstrated high levels of
validity and are used with some frequency in selection contexts (Ones, Viswesvaran, & Schmidt,
1993), although 85% of Howard et al.’s (2007) respondents had never used integrity tests for execu-
tive selection. Yet, there is some evidence that integrity tests could be useful at the executive level.
Ones et al.’s (1993) meta-analysis demonstrated that integrity tests were equally good at predicting
job performance for jobs ranging from low to medium to high complexity, and integrity tests were
better at predicting counterproductive work behaviors for jobs that were high in complexity.
Although related to job performance, measures of self-efficacy (Judge & Bono, 2001) and
emotional intelligence (Law, Wong, & Song, 2004) have had little use in selection contexts, pos-
sibly because of the transparency of these measures. There is some evidence supporting the use
of measures of social skills, a construct related to emotional/social intelligence, for selection (e.g.,
Morgeson, Reider, & Campion, 2005; Riggio & Taylor, 2000), but additional research is needed
before use in selection of top-level leaders. In particular, the move toward ability-based measures of
emotional intelligence could be fruitful (e.g., Daus & Ashkanasy, 2005). Ability measures of emo-
tional intelligence are psychometrically similar to general intelligence tests and assess skills using
performance tasks rather than self-report measures of behavior.

INDIVIDUAL PSYCHOLOGICAL ASSESSMENT


Individual psychological assessment typically includes some combination of tests of cognitive abili-
ties, personality, and interviews. This process may assess many of the same variables assessed
by cognitive and personality tests. It has been a well-known part of the toolkit of psychologist-
practitioners in most consulting firms for decades. The popularity of individual assessment over the
years is due, in part, to its flexibility. It can be used to assess individual executives on the spur of
the moment, and it can serve various purposes (see Jeanneret & Silzer, 1998). Although definition
and practice vary, the essence of individual psychological assessment is that an individual assessor
integrates information about job requirements and data on an individual assessee to make judg-
ments about fit with the position. The job analysis is typically as informal as a discussion with the
client organization about what the job must accomplish and what competencies are required. Data
gathering instruments typically include cognitive tests, biographical information/career reviews,
situation exercises/simulations, and almost always a semi-structured interview conducted by the
assessor (Ryan & Sackett, 1987).
Because of the idiosyncratic nature of individual assessments, their effectiveness has been and
is a subject of disagreement. On the positive side, reviews over 50 years have concluded that studies
of individual assessment have shown moderate to good predictive validity (Korman, 1968; Prien,
Schippmann, & Prien, 2003). Highhouse (2002) presented a contrasting review, concluding,
“The holistic approach to judgment and prediction has simply not held up to scientific scrutiny”
(p. 391). He speculated that individual psychological assessment, like psychotherapy before it, has
achieved “functional autonomy [that] has enabled individual psychological assessment to survive
and flourish” (p. 391). Support for this conclusion comes from our personal communication with
a sample of psychologists in leading consulting firms1 who estimated that the number of individ-
ual assessments done by psychologists each year (2007) in the United States ranges from 15,000 to 50,000. Piotrowski (2007) provided an informative description of an individual psychological
assessment program run at Hartford using a cadre of outside psychologists in which over 300 man-
agers are assessed per year. Although individual psychological assessments may be used below the
executive level, they are more frequently used with high potentials after they have been identified as high
potential rather than as part of the high potential selection decision.

1 Thanks to the following for their generous sharing of time and knowledge about current practice: Seymour Adler (AON),
Steward Crandell (PDI), P. Richard Jeanneret (Valtera), and Matt Pease (DDI).

MULTISOURCE APPRAISAL
Multisource or 360º performance appraisals are often used for selection and development of high
potentials; they are also used for screening of executive candidates. As the name implies, managers
are rated on a questionnaire by their supervisors, subordinates, peers, and selves, and even inter-
nal customers, external customers, vendors, or suppliers (Dalessio, 1998). Estimates of the usage
of 360º appraisals range from 12% to 29% of all organizations (Church, 2000) and up to 90% of
Fortune 500 companies (Atwater & Waldman, 1998). Although 360º appraisals have primarily been
used as a feedback and development tool for managers (Goodstone & Diamante, 1998), Tornow
(1993) suggested that the method can also be used for appraisal, selection, and promotion. Indeed,
Halverson, Tonidandel, Barlow, and Dipboye (2005) found that ratings and self-other agreement on
360º ratings predicted promotion rate throughout one’s career in the United States Air Force.
Despite the trend toward using 360º appraisals for administrative purposes (Dalton, 1996;
London & Smither, 1995), there is controversy over the application of 360º appraisals to selection
(Craig & Hannum, 2006; Toegel & Conger, 2003). Specifically, writers have expressed several
concerns: self-enhancement by the manager who has a strong motivation to convey that he or
she has been performing well (Craig & Hannum, 2006); raters who know the assessment is for
administrative purposes may not wish to negatively impact the focal manager (DeNisi & Kluger,
2000); and employment decisions based on ratings from unknown and minimally trained raters may
not be legally defensible. In fact, Morgeson, Mumford, and Campion (2005) noted that one of the
leading organizational consultants in the use of 360º appraisals, the Center for Creative Leadership,
restricts their use to developmental purposes.

ASSESSMENT CENTERS
The assessment center (AC) method has been used for evaluating executive potential for over
50 years (Thornton & Byham, 1982; Thornton & Rupp, 2006). Originally validated with longitudinal
studies as an indicator of early management potential of men and women (Howard & Bray, 1988),
the method has been used to assess executives in numerous industries and countries. The unique
aspect of the AC method is the combination of features that involve observation of overt behavior
in multiple simulations of organizational challenges by multiple trained observers who integrate
evaluations in consensus discussion, statistical formulae, or a combination of both. Some ACs involve
the consideration of information from other techniques such as cognitive ability tests, personality
questionnaires, multisource feedback, or a background interview. Older and recent surveys show
that large numbers of organizations use ACs for selecting executives and high potentials (Thornton
& Krause, 2009). Howard (1997) reported that Development Dimensions International assessed
over 350 executives in 1996, and the number is surely larger in recent years. Executive ACs involve
dimensions such as global awareness and strategic vision, calling for strategic decisions such as
launching joint ventures, managing the talent pool, or promoting a turnaround. A recent study in
Germany found that an AC added incremental validity over cognitive ability tests in predicting
executive success in a public organization (Krause, Kersting, Heggestad, & Thornton, 2006).
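To make the "statistical formulae" option for integrating assessor evaluations more concrete, a minimal sketch of one common mechanical approach (the dimensions and weights here are illustrative assumptions, not those of any particular AC) is to average a candidate's ratings on each dimension across exercises and assessors and then form a weighted composite overall assessment rating (OAR):

\[
\mathrm{OAR} = \sum_{k=1}^{K} w_{k}\,\bar{r}_{k}, \qquad \sum_{k=1}^{K} w_{k} = 1,
\]

where \(\bar{r}_{k}\) is the candidate's mean rating on dimension \(k\) (e.g., strategic vision) and \(w_{k}\) is the weight given to that dimension on the basis of job analysis or expert judgment. Consensus discussion, by contrast, replaces these fixed weights with assessors' integrative judgment, and combined approaches use the statistical composite as a starting point for discussion.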

LEADERSHIP QUESTIONNAIRES
The reader may be surprised that we have not reviewed leadership behavior and style questionnaires
(Clark & Clark, 1990). To be sure, there are scores of self-report questionnaires developed over the years,
such as the Leader Behavior Description Questionnaire and the Leadership Opinion Questionnaire.
The respondent is typically asked to indicate how often he or she does certain behaviors, such as
directing the work of subordinates or providing them support. In addition, some general self-report
personality questionnaires have scales measuring leadership potential. Although these instruments
have been useful in research, in helping individuals gain insight into their leadership styles, and in counseling and training settings, they have not been applied extensively in executive selection. The
older questionnaires typically do not cover the broader set of leader characteristics deemed important
in recent leadership studies, and they all suffer the potential biasing effects of self-enhancement.

EXECUTIVE SELECTION IN AN HRM SYSTEM


Although this chapter focuses on selection, the final screening procedure is only one phase of a long-
term, complex process. The executive leader “pipeline” includes recruitment, selection, development,
and performance management. Each phase must be executed effectively and also must be integrated
within a total human resource (HR) management system. In our observation, the more effective orga-
nizations systematically integrate these processes of selecting and developing high potentials and
executives. Byham, Smith, and Paese (2002) provided an example of an integrated system.
These processes take place in two important contexts: the climate and culture of the organiza-
tion and the environment external to the organization. The organizational context must support the
leader selection process, and the selection process must have credibility in the organization. The
external environment also can affect the process. Succession planning may produce executives quite
competent in some contexts (e.g., traditional and stable business climates), but not in others (e.g.,
turbulent times). And in a tight labor market, well-qualified applicants for top leadership positions
may be less willing to undergo an onerous screening process, and high potential internal candidates
may choose to leave unless they are allowed to fill the available executive positions.
A committee of higher-level executives typically evaluates all the information about candidates’
strengths and organizational needs. Placement into a final pool of high potentials or promotion to
executive ranks is described next.

PERFORMANCE REVIEWS: THE ROLE OF TOP EXECUTIVES AND BOARDS
Thornton and Byham (1982) pointed out that, despite their well-known limitations, supervisory
judgments were the most commonly used practice for predicting managerial effectiveness. The
same may be said today with regard to executive selection (Sessa, 2001) and for perhaps the same
reasons: Performance reviews by supervisors are practical and widely accepted by those in the
organization. More formal and complex systems of performance management may have replaced
simple performance appraisal programs, and common practice today includes review and participa-
tion by higher-level management and HR specialists, but the performance evaluation by an employ-
ee’s immediate supervisor remains the starting point for promotions into a high potential pool and
higher-level positions.
A strong performance evaluation in one’s current position has become “the admission price for
future growth and development” (Charan, Drotter, & Noel, 2001, p. 166). For example, the five-
level assessment pyramid at Coca-Cola includes a foundation of a track record of performance, followed by potential, experience and competencies, leadership behaviors, and fit for the role. Similarly, Royal
Dutch Shell evaluates thinking capacity, achievement in fulfilling current roles, and relationships
with people on the basis of examples from past performance on the job (Podratz et al., 2007).
A performance/potential grid is an integral part of talent management processes today. Placement
on the grid begins with the immediate manager’s evaluation, reviewed by the next level manager.
Grid ratings then become a part of the discussion at talent management meetings. Other inputs to
the placement decision may include the range of assessment techniques discussed in this chapter,
but performance review is the foundation. The grid decision, usually made annually, has conse-
quences in terms of the development activities and positions available to the employee (Charan
et al., 2001).
Although lower-level employees are most often evaluated by their direct supervisors, as they
advance within an organization their reviewers become a broader group, typically more senior executives, the CEO, and the board. The reviewers ordinarily have a number of prior reviews of
performance and potential along with assessments of development needs available. The board of
directors of an organization has become increasingly involved in the selection (as
well as compensation and evaluation) of the CEO as a result of the recent raft of corporate scandals
and “a subsequent stream of regulations and guidelines” (Nadler, Behan, & Nadler, 2006, p. 174).
Boards are involved in reviewing the performance of not only the CEO and senior executives, but
also current high potentials and staff who are in the pipeline to become executives in the next sev-
eral years. Board involvement in executive selection and talent management is equally characteristic
in not-for-profit and private sectors, although not-for-profits are typically smaller organizations with
less extensive and less formal review processes.
Another example of the integration of evaluations of performance and potential is General Electric’s
widely copied process called Session C. This is a formal annual review that integrates past accomplishments in delivering quantifiable results (including financial results), input from 360-degree feedback questionnaires, and observations from mentoring relationships and developmental programs to arrive
at judgments of performance and potential. Integration of these data is carried out by committees
including the CEO, top executives, and HR staff (http://gecapsol.com/cms/servlet/cmsview/GE_
Capital_Solutions/prod/en/index.html), typically ascending the organization with review sessions
beginning at the division levels and extending up to the executive committees and board level.
Corporate responses to the well-documented limitations of performance reviews (e.g., Dipboye,
Smith, & Howell, 1994) have improved the process, but those limitations still exist and have not
decreased the popularity of the process. Arguably, changes in the 21st century organization (e.g.,
globalization, multiple managers, greater spans of control, increased use of teams, virtual reporting
relationships) have made effective performance reviews even more difficult. Reviews of perfor-
mance and potential may be supplemented by other techniques, but they remain today, as 25 years
ago, the core of the selection process.

SUMMARY
The evidence for the validity, relevance, and accessibility of these techniques for the selection
of executives and high potentials is mixed. Moreover, perhaps the most widely used process for
selecting executives from outside the organization involves and is managed by external executive
recruiters (Howard et al., 2007), a process not discussed in detail here. Executive recruiters
typically use interviews and references as the data gathering methods, provide descriptions of
candidates, and make their own recommendations to the organization. Wackerle (2001) described this process in detail. When there are internal and external candidates for an executive position in
an organization, there may well be much more information available about the internal employee
(e.g., performance appraisals, test results). The differences raise questions of the comparability and
fairness of the selection decisions. Equity may be introduced if an external source assesses both
internal and external candidates, a process being carried out more frequently (Howard, 2001).
There does not appear to be any one best method for executive selection, and evidence of the
prevalence of one selection technique or the lack of use of another does not provide evidence of
the measure’s validity or utility. Each of the measures discussed here may be useful for selecting
executives. Organizations must examine the qualities they are looking for, their staffing strategy
and philosophy, and their past success with different measures when choosing their selection plan.
In addition, consideration must be given to a leader’s fit with the organization.

FIT: INDIVIDUALS, TEAM, OUTCOME, FOLLOWERS, AND CULTURE
Most cases of executive failure are accompanied by a statement that “there wasn’t a good fit.” What
does this mean? The traditional selection paradigm matches individuals with jobs. At executive levels, a broader array of characteristics of individuals and jobs is essential; fit becomes multidi-
mensional on both sides. Fit has dominated writing on executive selection in recent years. Fitting the
executive to the strategic demands of a position (e.g., turnaround, start-up, harvesting) has become
standard practice. Increasingly, students of organizational success have recognized that success
depends on well-functioning teams rather than a single, dominant leader.

TEAM
Therefore, finding executives who fit the team is essential for the success of the executive and the
organization. At the CEO level, Hollenbeck (1994) argued that successful selection depends upon
the fit among three sets of variables: those of the individual, the organization, and the external envi-
ronment or strategic demands. Sessa and Taylor (2000) discussed characteristics of the candidate,
the organization, and its strategy. Moses and Eggebeen (1999) argued that executive selection is
influenced by the notion of fit between the individual and the needs and climate of the organization
as they change over time (e.g., from a large, stable organization such as the earlier AT&T or IBM to
a faster-paced, versatile, constantly evolving organization such as the later AT&T or IBM).
Although our earlier discussion of what executives do suggests they may face different demands
across situations and time, Sternberg and Vroom (2002) suggested that the leader, the situation, and
the leader by situation interaction are all likely to impact leadership success. Fiedler’s early work on
Contingency Theory (Fiedler, 1967) suggests that the extent to which task-oriented or relationship-
oriented leaders will be successful in a particular situation depends on situation favorability: posi-
tive leader member relations, a structured task, and position power. Path-goal theory (House, 1971)
suggests that leaders should alter their behavior on the basis of characteristics of followers and the
situation (task, authority system, workgroup).

OUTCOME
The type of performance outcome that is required for the task may also be an important situational
contingency. For example, charismatic and transformational leadership may lead to greater creativity,
but not overall productivity (Johnson & Dipboye, 2008; Sosik, 1997). Another variable that has been
of interest is the level of stress or uncertainty in the situation. Fiedler and Macaulay’s (1999) review of the literature on Cognitive Resource Theory (Fiedler, 1995) suggests that experience is
more important than intelligence for leader success under stressful situations. Similarly, Yun, Faraj,
and Sims (2005) found that empowering leadership was more important in low-stress situations and
when team members were more experienced, whereas directive leadership was more effective in
high-stress situations and when team members were less experienced. Charismatic leadership theory
also suggests that leaders who express a vision, demonstrate sensitivity, and exhibit unconventional
behavior (e.g., self-sacrifice) are particularly effective during crisis situations (Halverson, Holladay,
Kazama, & Quiñones, 2004; Halverson, Murphy, & Riggio, 2004), when leading organizational
change initiatives (Zaccaro & Banks, 2004), and when promoting organizational growth in
entrepreneurial firms (Baum, Locke, & Kirkpatrick, 1998).2

2 However, it should be noted that doubt has been cast on the importance of charismatic leadership. For example, in his popular book, Good to Great, Collins (2001) suggested that the effects of charismatic leadership on organizations are short lived and do not lead to overall organizational success.

FOLLOWERS
Follower characteristics may also impact what type of leader will be effective in a given situa-
tion. Followers’ satisfaction and perceptions of their abilities (Hersey & Blanchard, 1988); need for
autonomy (Yun, Cox, & Sims, 2006); openness to experience (Groves, 2005); motives (Wofford, Whittington, & Goodwin, 2001); and achievement orientation, self-esteem, and need for structure
(Ehrhart & Klein, 2001) have been shown to moderate the effectiveness of leaders’ behavior. From a
slightly different point of view, rather than matching a leader to situational demands, organizations
might seek to hire leaders who can adapt to any situational demand, highlighting the importance of
adaptability (Pulakos, Arad, Donovan, & Plamondon, 2000) and social/emotional intelligence. For
example, Goleman (2000) suggested that leaders should change their style on the basis of situational
demands and suggested that leaders high in emotional intelligence are better able to do so than
leaders with less emotional intelligence. Similarly, Hooijberg and Schneider (2001) suggested that
executive leaders high in social intelligence may be better able to adapt to differences in followers’
attitudes, personalities, and motives.
Identifying high potentials is explicitly an exercise in fit within a given organization. When placing
high potentials in positions, consideration should be given to how much the position will increase the
leadership capability of the individual and the organization (McCall, 1998), in addition to how well
the high potential will perform. These findings provide guidance for analyzing the situation in the organi-
zation that executives are being chosen to lead. Whereas traditional job analysis focused on respon-
sibilities to be accomplished, competencies to attain, and human attributes of leaders, additional
efforts to understand situational demands may improve selection of executives and high potentials.
Taxonomies and measures of situational demands are fertile areas for research by I-O psychologists.

CULTURE
In addition to differences in the variables just reviewed, the culture in which a leader’s organization
is embedded can also impact leadership effectiveness. That is to say, certain leadership traits and
behaviors will be perceived more positively in certain cultures than others. Considerable evidence
for cross-cultural differences in leadership effectiveness has come from the work on the GLOBE
project (e.g., Dickson, Den Hartog, & Mitchelson, 2003; Javidan, Dorfman, de Luque, & House,
2006). For example, Javidan et al. (2006) reported that more individualistic cultures rate contingent-
reward leaders higher than they are rated in collectivistic cultures. Collectivistic cultures also prefer
leaders who avoid conflict and promote harmony. Despite the long list of differences, there are also
some universally positive leader characteristics including integrity, planning, and team building
skills. Nevertheless, the national culture should have an influence on what types of
leaders are selected for an organization. Moreover, the type of selection practices used to hire execu-
tives should reflect the national culture from which an organization is hiring (Dipboye & Johnson,
in press).

DOES IT WORK?
In the previous sections, we reviewed several techniques used in executive selection. Mixed
amounts and levels of relevant published, supportive evidence were noted. This raises the question:
Does executive selection work? This question is particularly important given the marked increase
in executive turnover since the 1990s (Lucier, Spiegel, & Schuyt, 2002) and the increase in the
use of outside selection rather than internal promotion (Murphy & Zabojnik, 2004). In a unique
study, Russell (2001) tracked the performance of 98 top-level executives longitudinally. Information from interviews and questionnaires was integrated into ratings on several performance
dimensions by teams of executives and researchers via a process akin to the wrap-up discussion in
an AC. Competency ratings in problem-solving and people orientation predicted subsequent fiscal
and nonfiscal performance trends. This sort of research is difficult for several reasons. First, as
Hollenbeck (1994) pointed out, there are several inherent difficulties of CEO selection itself: each
CEO position is unique and may be changing in an uncertain future, the selection decision is unique,
the decision-makers may never have made such a decision and are probably not trained to do so, the
process is probably not completely open, and outside forces may come into play.

Second, it is difficult to conduct a good study to determine if the process was successful.
Hollenbeck (1994) offered a partial list of explanations: the long time involved to select one per-
son, high secrecy surrounding this high-stakes choice, and difficulty for credible researchers to
get access to the expensive process. There are also technical research problems precluding classic
criterion validation studies: small sample sizes, restricted range in measures of key variables such as intelligence, resistance of candidates to submit to onerous and sensitive assessment procedures,
inherent limitations (e.g., faking, biased rating by self and others) of some promising constructs,
difficulty of accessing a comparison group of individuals not selected, and complexity of any cri-
terion measure. The difficulty of finding a meaningful criterion of effectiveness of selecting high
potentials and executive leaders bedevils researchers and practitioners. Appealing as it may appear,
an index of organizational performance as a complex criterion measure has proven to be a conten-
tious topic in the leadership literature. The lack of published empirical studies of the accuracy of
executive selection procedures may be lamentable, but hardly surprising.
Furthermore, there has been a debate over the extent to which leadership impacts organiza-
tional performance. For example, Pfeffer and Salancik (1978) and others (e.g., Meindl, Ehrlich, &
Dukerich, 1985) have argued that leadership has a minimal impact on organizational performance.
Lieberson and O’Connor (1972) found that leadership had a smaller impact on organizational per-
formance than the factors of industry or company. Despite these arguments, other researchers have
demonstrated the enormous potential for leaders to affect organizational performance (see Day &
Lord, 1988, for a review). Weiner and Mahoney (1981) found that CEO change accounted for 43.9%
of the variance in organizational profitability and 47% of the variance in stock prices. As Svensson
and Wood (2006) suggested, organizational performance can be based on leader skillfulness or
serendipity.
However, the relationship between leadership and organizational performance (or lack thereof)
does not suggest that leader selection efforts are necessarily successful (or unsuccessful). The CEO’s
team may play a large role in success or failure. If all leader selection efforts were successful, then
there would be no variance in the impact of leadership selection on performance. In fact, evidence
of the effect of leader succession and change on organizational performance suggests just the oppo-
site. Many organizations are making bad leadership selection decisions, just as others are making
good selection decisions. There is a definite need for research on the effects of different selection meth-
ods on organizational and leadership performance to further address this issue.

CONCLUSIONS
There are many ways executives get the job done. There is no agreed-upon list of executive com-
petencies or attributes. Competencies that are commonly listed include strategic global perspec-
tive, strategic thinking, performance orientation, and emphasis on people development. To get the
job done, the executive will have a pattern of human attributes needed by the organization at the
point in time of selection. No single attribute or simple profile of attributes is related to executive
effectiveness. These attributes form a unique profile that includes some forms of intelligence, per-
sonality characteristics and values, as well as experience, knowledge, and effective interpersonal
skills. Organizations use various methods to assess these attributes. The quality of tests, personal-
ity questionnaires, and interviews has improved over the years. But, these procedures are used in
different ways at each stage of the process: They are used in more formal systematic quantitative
ways when screening candidates into pools of high potentials, but in more informal and variable ways
during the integration of information at time of selection into executive ranks. The most common
method of selecting executives remains the performance review process by higher-level executives
and the board of directors. In larger companies, a process patterned after GE’s Session C has
become common.
The rigor of these final steps of leader selection varies considerably. Observers of these pro-
cesses have lamented the lack of consistency and sophistication shown by many organizations.

Suggestions have been made for more systematic processes of determining organization needs,
assessing competencies in candidates, and matching competencies to needs, but little research has
been conducted to evaluate these methods. I-O psychologists have helped articulate and evaluate
the attributes related to leader effectiveness and have been involved in designing programs to screen
candidates into high-profile pools and to develop leadership and managerial skills. In addition, I-O
psychologists have specialized in individual assessment of external candidates. But, with the exception of a few psychologists who consult with CEOs and boards, they have not played extensive roles in
the final stages of executive selection among internal candidates.

RESEARCH OPPORTUNITIES
Future involvement of I-O psychologists could include further articulation of the competencies and
attributes needed for diverse organizational challenges (e.g., defining and assessing characteristics
related to long-term success, such as character); specification of organizational and environmental
variables that need to be considered in determining fit; understanding how the complex process of
executive selection will be done differently in every job and every organization; and training CEOs,
top executive teams, and boards of directors in processes of matching candidates to demands of
positions. Consulting firms are becoming more involved in assessing internal and external candi-
dates for CEO positions, and these assignments may provide opportunities for I-O psychologists
to apply more scientific methods to the selection of CEOs (A. Howard, personal communication,
March 3, 2008).
On the basis of our observations of the field of executive selection, we note that there was much
systematic research in the 1960s to 1980s on early identification of management potential and exec-
utive selection, but not as much recent published research. Various assessment techniques (e.g., bio-
data, ACs, and cognitive ability tests) were evaluated for selection, but emphasis over the past two decades
has been placed on development. Considering the noted scandals in executive ranks, selection may
be gaining renewed importance as the cost of executive failure becomes higher. The concern for
fit is probably the most significant development in executive selection in the last 20 years. More
research is needed into the judgmental processes that are needed to combine complex patterns of
information about candidates on the one hand with the complex changing patterns of organizational
and situational demands on the other hand. At the risk of appearing nihilistic, we suggest that the
traditional statistical methods used by I-O psychologists to study relationships of predictor scores
and criterion measures may not be up to the task of understanding the processes of executive selec-
tion at the highest levels of organizations. Studies of these complex processes of executive selection
may call for different research methods to study executive selection, including clinical methods of
judgment, policy capturing with executive recruiters, and a return to dormant complex statistical
validation strategies such as synthetic validation (McPhail, 2007).

REFERENCES
Atwater, L. E., Dionne, S. D., Avolio, B., Camobreco, F. E., & Lau, A. W. (1999). A longitudinal study of
the leadership development process: Individual differences predicting leadership effectiveness. Human
Relations, 52, 1543–1562.
Atwater, L. E., & Waldman, D. (1998). Accountability in 360 degree feedback. HR Magazine, 43, 96–104.
Avolio, B., & Gardner, W. (2005). Authentic leadership development: Getting to the root of positive forms of
leadership. The Leadership Quarterly, 10, 181–217.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A
meta-analysis. Personnel Psychology, 44, 1–26.
Bass, B. M. (1990). Bass and Stogdill’s handbook of leadership: Theory, research, and managerial applica-
tions (3rd ed.). New York, NY: Free Press.
Baum, R. J., Locke, E. A., & Kirkpatrick, S. A. (1998). A longitudinal study of the relation of vision and vision
communication to venture growth in entrepreneurial firms. Journal of Applied Psychology, 83, 43–54.

Bentz, V. J. (1990). Contextual issues in predicting high-level leadership performance. In K. E. Clark & M. B.
Clark (Eds.), Measures of leadership (pp. 131–143). West Orange, NJ: Leadership Library of America.
Brannik, M. R., & Levine, E. L. (2002). Job analysis. Thousand Oaks, CA: Sage.
Brown, M. E., & Trevino, L. K. (2006). Ethical leadership: A review and future directions. The Leadership
Quarterly, 17, 595–616.
Brown, M. E., Trevino, L. K., & Harrison, D. A. (2005). Ethical leadership: A social learning perspective
for construct development and testing. Organizational Behavior and Human Decision Processes, 97,
117–134.
Byham, W. C., Smith, A. B., & Paese, M. J. (2002). Grow your own leaders. Pittsburgh, PA: DDI Press.
Campbell, J. P., Dunnette, M. D., Lawler, E. F., III, & Weick, K. E., Jr. (1970). Managerial behavior, perfor-
mance, and effectiveness. New York, NY: McGraw-Hill.
Charan, R., Drotter, S., & Noel, J. (2001). The leadership pipeline: How to build the leadership-powered
company. San Francisco, CA: Jossey-Bass.
Chemers, M. M., Watson, C. B., & May, S. (2000). Dispositional affect and leadership effectiveness: A
comparison of self-esteem, optimism and efficacy. Personality and Social Psychology Bulletin, 26,
267–277.
Church, A. H. (2000). Do higher performing managers actually receive better ratings? A validation of multi-
rater assessment methodology. Consulting Psychology Journal: Practice and Research, 52, 99–116.
Clark, K. E., & Clark, M. B. (1990). Measures of leadership. West Orange, NJ: Leadership Library of
America.
Collins, J. (2001). Good to great: Why some companies make the leap . . . and others don’t. New York, NY:
HarperCollins.
Craig, S. B., & Hannum, K. (2006). Research update: 360-degree performance assessment. Consulting
Psychology Journal: Practice and Research, 58, 117–122.
Dalessio, A. T. (1998). Using multi-source feedback for employee development and personnel decisions. In
J. W. Smither (Ed.), Performance appraisals: State-of-the-art in practice (pp. 278–330). San Francisco,
CA: Jossey-Bass.
Dalton, M. (1996). Multirater feedback and conditions for change. Consulting Psychology Journal: Practice
and Research, 48, 12–16.
Daus, C. S., & Ashkanasy, N. M. (2005). The case for the ability-based model of emotional intelligence in
organizational behavior. Journal of Organizational Behavior, 26, 453–466.
Day, D. V. (2001). Leadership development: A review in context. The Leadership Quarterly, 11, 581–613.
Day, D. V., & Lord, R. G. (1988). Executive leadership and organizational performance: Suggestions for a new
theory and methodology. Journal of Management, 14, 453–464.
Den Hartog, D. N., & Koopman, P. L. (2001). Leadership in organizations. In N. Anderson, D. S. Ones, H. H.
Sinangil, & C. Viswesvaren (Eds.), Handbook of industrial, work, and organizational psychology. Vol. 2:
Organizational Psychology (pp. 167–187). Thousand Oaks, CA: Sage.
DeNisi, A. S., & Kluger, A. N. (2000). Feedback effectiveness: Can 360-degree appraisals be improved?
Academy of Management Executive, 14, 129–139.
Dipboye, R. L., & Johnson, S. K. (in press). A cross-cultural perspective on employee selection. In D. Stone,
E. F., Stone-Romero, & E. Salas (Eds.), The influence of cultural diversity on human resources practices.
Mahwah, NJ: Lawrence Erlbaum.
Dipboye, R. L., Smith, C. S., & Howell, W. C. (1994). Understanding industrial and organizational psychol-
ogy: An integrated approach. Orlando, FL: Harcourt Brace.
Dickson, M. W., Den Hartog, D. N., & Mitchelson, J. (2003). Research on leadership in a cross-cultural context:
Making progress and raising new questions. The Leadership Quarterly, 14, 729–768.
Ehrhart, M. G., & Klein, K. J. (2001). Predicting followers’ preferences for charismatic leadership: The influ-
ence of follower values and personality. The Leadership Quarterly, 12, 153–179.
Evers, A., Anderson, N., & Voskuijl, O. (2005). Handbook of personnel selection. Malden, MA: Blackwell.
Fiedler, F. (1967). A theory of leadership effectiveness. New York, NY: McGraw-Hill.
Fiedler, F. (1995). Cognitive resources and leadership performance. Applied Psychology: An International
Review, 44, 5–28.
Fiedler, F. E., & Macaulay, J. L. (1999). The leadership situation: A missing factor in selecting and training
managers. Human Resource Management Review, 8, 335–350.
Frisch, M. H. (1998). Designing the individual assessment process. In R. Jeanneret & R. Silzer (Eds.), Individual
psychological assessment (pp. 135–177). San Francisco, CA: Jossey-Bass.
Goleman, D. (1998). Working with emotional intelligence. New York, NY: Bantam Books.
Goleman, D. (2000). Leadership that gets results. Harvard Business Review, 78, 78–90.

Goodstone, M. S., & Diamante, T. (1998). Organizational use of therapeutic change strengthening multisource
feedback systems through interdisciplinary coaching. Consulting Psychology Journal: Practice and
Research, 50, 152–163.
Groves, K. S. (2005). Linking leader skills, follower attitudes, and contextual variables via an integrated model
of charismatic leadership. Journal of Management, 31, 255–277.
Halverson, S. K., Holladay, C. L., Kazama, S. K., & Quiñones, M. A. (2004). Self-sacrificial behavior in crisis
situations: The competing roles of behavioral and situational factors. The Leadership Quarterly, 15,
263–275.
Halverson, S. K., Murphy, S. E., & Riggio, R. E. (2004). Charismatic leadership in crisis situations: A labora-
tory investigation of stress and crisis. Small Group Research, 35, 495–514.
Halverson, S. K., Tonidandel, S., Barlow, C., & Dipboye, R. L. (2005). Self-other agreement on a 360-degree
leadership evaluation. In S. Reddy (Ed.), Perspectives on multirater performance assessment (pp. 125–
144). Nagarjuna Hills, Hyderabad, India: ICFAI Books.
Hersey, P., & Blanchard, K. H. (1984). Management of organizational behaviour (4th ed.). Englewood Cliffs,
NJ: Prentice-Hall.
Highhouse, S. (2002). Assessing the candidate as a whole: A historical and critical analysis of individual psy-
chological assessment for personnel decision making. Personnel Psychology, 55, 363–396.
Hogan, R., & Hogan, J. (2001). Assessing leadership: A view from the dark side. International Journal of
Selection and Assessment, 9, 40–51.
Hogan, R., & Kaiser, R. B. (2005). What we know about leadership. Review of General Psychology, 9,
169–180.
Hollenbeck, G. P. (1994). CEO selection: A street-smart review. Greensboro, NC: Center for Creative
Leadership.
Hooijberg, R., & Schneider, M. (2001). Behavioral complexity and social intelligence: How executive leaders
use stakeholders to form a systems perspective. In S. J. Zaccaro & R. J. Klimoski (Eds.), The nature of
organizational leadership (pp. 104–131). San Francisco, CA: Jossey-Bass.
House, R. J. (1971). A path-goal theory of leader effectiveness. Administrative Science Quarterly, 16,
321–339.
Howard, A. (1997). A reassessment of assessment centers: Challenges for the 21st century. Journal of Social
Behavior and Personality, 12, 13–52.
Howard, A. (2001). Identifying, assessing, and selecting senior leaders. In S. J. Zaccaro & R. Klimoski (Eds.),
The nature and context of organizational leadership (pp. 305–346). San Francisco, CA: Jossey-Bass.
Howard, A., & Bray, D. W. (1988). Managerial lives in transition: Advancing age and changing times. New
York, NY: Guilford Press.
Howard, A., Erker, S., & Bruce, N. (2007). Selection forecast 2006/2007. Pittsburgh, PA: Development
Dimensions International.
Hoyt, C. L., Murphy, S. E., Halverson, S. K., & Watson, C. B. (2003). Group leadership: Efficacy and effective-
ness. Group Dynamics: Theory, Research, & Practice, 7, 259–274.
Javidan, M., Dorfman, P. W., de Luque, M. S., & House, R. J. (2006). In the eye of the beholder: Cross cultural
lessons in leadership from project GLOBE. Academy of Management Perspectives, 20, 67–90.
Jeanneret, P. R., & Silzer, R. (Eds.). (1998). Individual psychological assessment. San Francisco, CA: Jossey-
Bass.
Johnson, S. K., & Dipboye, R. L. (2008). Effects of task charisma conduciveness on the effectiveness of char-
ismatic leadership. Group & Organization Management, 33, 77–106.
Judge, T. A., & Bono, J. E. (2001). Relationship of core self-evaluation traits—self-esteem, generalized self-
efficacy, locus of control, and emotional stability—with job satisfaction and job performance. Journal of
Applied Psychology, 86, 80–92.
Judge, T. A., Bono, J. E., Ilies, R., & Gerhardt, M. W. (2002). Personality and leadership: A quantitative and
qualitative review. Journal of Applied Psychology, 87, 765–780.
Korman, A. K. (1968). The prediction of managerial performance: A review. Personnel Psychology, 21,
295–322.
Krause, D. E., Kersting, M., Heggestad, E. D., & Thornton, G. C., III. (2006). Incremental validity of assess-
ment center ratings over cognitive ability tests: A study at the executive management level. International
Journal of Selection and Assessment, 14, 360–371.
Law, K. S., Wong, C., & Song, L. J. (2004). The construct and criterion validity of emotional intelligence and
its potential utility for management studies. Journal of Applied Psychology, 89, 483–496.
Lieberson, S., & O’Connor, J. F. (1972). Leadership and organizational performance: A study of large corpora-
tions. American Sociological Review, 37, 117–130.

London, M., & Smither, J. W. (1995). Can multi-source feedback change perceptions of goal accomplish-
ment, self-evaluations, and performance related outcomes? Theory-based applications and directions for
research. Personnel Psychology, 48, 803–839.
Lord, R. G., DeVader, C. L., & Alliger, G. M. (1986). A meta-analysis of the relation between personality traits
and leadership perceptions: An application of validity generalization procedures. Journal of Applied
Psychology, 71, 402–410.
Lucier, C., Spiegel, E., & Schuyt, R. (2002). Why CEOs fall: The causes and consequences of turnover at the top.
Retrieved December 17, 2007, from http://www.boozallen.com/media/file/CEO-Turnover-Study-sb28.pdf
McCall, M. W. (1998). High flyers: Developing the next generation of leaders. Boston, MA: Harvard Business
School Press.
McCall, M. W., & Hollenbeck, G. P. (2001). Developing global leaders. Boston, MA: Harvard Business Review
Press.
McClelland, D. C. (1996). Does the field of psychology have a future? Journal of Research in Personality, 30,
429–434.
McCrae, R. R., & Costa, P. T. (1989). The structure of interpersonal traits: Wiggins’s circumplex and the Five-
Factor model. Journal of Personality and Social Psychology, 56, 586–595.
McPhail, S. M. (2007). Alternative validation strategies: Developing new and leveraging existing evidence.
San Francisco, CA: Jossey-Bass.
Meindl, J. R., Ehrlich, S. B., & Dukerich, J. M. (1985). The romance of leadership. Administrative Science
Quarterly, 30, 78–102.
Menkes, J. (2005). Executive intelligence: What all great leaders have. New York, NY: HarperCollins.
Morgeson, F. P., Campion, M. A., Dipboye, R. L., Hollenbeck, J. R., Murphy, K., & Schmitt, N. (2007).
Reconsidering the use of personality tests in selection contexts. Personnel Psychology, 60, 683–729.
Morgeson, F. P., Mumford, T. V., & Campion, M. A. (2005). Coming full circle: Using research and practice
to address 27 questions about 360-degree feedback programs. Consulting Psychology Journal: Practice
and Research, 57, 196–209.
Morgeson, F. P., Reider, M. H., & Campion, M. A. (2005). Selecting individuals in team settings: The impor-
tance of social skills, personality characteristics, and teamwork knowledge. Personnel Psychology, 58,
583–611.
Moses, J. L., & Eggebeen, S. L. (1999). Building room at the top. In A. Kraut & A. K. Korman (Eds.), Evolving
practices in human resource management: Responses to a changing world of work (pp. 201–225). San
Francisco, CA: Jossey-Bass.
Murphy, K. J., & Zabojnik, J. (2004). CEO pay and appointments: A market-based explanation for recent
trends. American Economic Review, 94, 192–196.
Nadler, D. A., Behan, B. A., & Nadler, M. B. (Eds.). (2006). Building better boards: A blueprint for effective
governance. San Francisco, CA: Jossey-Bass.
Ones, D. S., Dilchert, S., Viswesvaran, C., & Judge, T. A. (2007). In support of personality assessment in orga-
nizational settings. Personnel Psychology, 60, 999–1027.
Ones, D. S., Viswesvaran, C., & Dilchert, S. (2005). Cognitive ability in personnel selection decisions. In A.
Evers, N. Anderson, & O. Voskuijl (Eds.), Handbook of personnel selection (pp. 143–173). Malden, MA:
Blackwell.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities:
Findings and implications for personnel selection and theories of job performance. Journal of Applied
Psychology, 78, 679–703.
Pfeffer, J., & Salancik, G. R. (1978). The external control of organizations: A resource dependence perspective.
New York, NY: Harper & Row.
Piotrowski, M. (2007, April). Individual assessment today: What works, and what doesn’t work! Presentation at
the Annual Meeting of the Society for Industrial and Organizational Psychology, New York, NY.
Podratz, L. T., Hungerford, M. K., Schneider, J. R., & Bui, T. (2007, April) Global high-potential assessment:
Supporting a global talent pipeline. Practice Forum presented at the 22nd Annual Conference of the
Society for Industrial and Organizational Psychology, New York, NY.
Prien, E. P., Schippmann, J. S., & Prien, K. O. (2003). Individual assessment as practiced in industry and con-
sulting. Mahwah, NJ: Lawrence Erlbaum.
Pulakos, E. D., Arad, S., Donovan, M. A., & Plamondon, K. E. (2000). Adaptability in the workplace:
Development of a taxonomy of adaptive performance. Journal of Applied Psychology, 85, 612–624.
Pulakos, E. D., Borman, W. C., & Hough, L. M. (1988). Test validation for scientific understanding: Two
demonstrations of an approach for studying predictor-criterion linkages. Personnel Psychology, 41,
703–716.

Riggio, R. E., Murphy, S. E., & Pirozzolo, F. J. (2002). Multiple intelligences and leadership. Mahwah, NJ:
Lawrence Erlbaum.
Riggio, R. E., Riggio, H. R., Salinas, C., & Cole, E. J. (2003). The role of social and emotional communica-
tion skills in leader emergence and effectiveness. Group Dynamics: Theory, Research, and Practice, 7,
83–103.
Riggio, R. E., & Taylor S. J. (2000). Personality and communication skills as predictors of hospice nurse per-
formance. Journal of Business and Psychology, 15, 351–359.
Russell, C. J. (2001). A longitudinal study of top-level executive performance. Journal of Applied Psychology,
86, 560–573.
Ryan, A. M., & Sackett, P. R. (1987). A survey of individual assessment practices by I-O psychologists.
Personnel Psychology, 40, 455–488.
Salancik, G. R., Calder, B. J., Rowland, K. M., Leblebici, H., & Conway, M. (1975). Leadership as an out-
come of social structure and process: A multidimensional analysis. In J. G. Hunt & L. L. Larson (Eds.),
Leadership frontiers (pp. 81–101). Kent, OH: Kent State University.
Salovey, P., & Mayer, J. D. (1990). Emotional intelligence. Imagination, Cognition, and Personality, 9,
185–211.
Sessa, V. I. (2001). Executive promotion and selection. In M. London (Ed.), How people evaluate others in
organizations (pp. 91–110). Mahwah, NJ: Lawrence Erlbaum.
Sessa, V. I., & Taylor, J. J. (2000). Executive selection. San Francisco, CA: Jossey-Bass.
Silzer, R. (2002). The 21st century executive: Innovative practices for building leadership at the top. San
Francisco, CA: Jossey-Bass.
Smith, J. A., & Foti, R. J. (1998). A pattern approach to the study of leader emergence. The Leadership
Quarterly, 9, 147–160.
Sosik, J. J. (1997). Effects of transformational leadership and anonymity on idea generation in computer-
mediated groups. Group & Organization Management, 22, 460–485.
Sternberg, R. J., & Vroom, V. (2002). Theoretical letters: The person versus the situation in leadership. The
Leadership Quarterly, 13, 301–323.
Stokes, G. S., & Cooper, L. A. (1994). Selection using biodata: Old notions revisited. In G. S. Stokes, M. D.
Mumford, & W. A. Owens (Eds.), Biodata handbook (pp. 311–349). Palo Alto, CA: CPP Books.
Stokes, G. S., Mumford, M. D., & Owens, W. A. (Eds.). (1995). Biodata handbook. Palo Alto, CA: CPP
Books.
Svensson, G., & Wood, G. (2006). Sustainable components of leadership effectiveness in organizational perfor-
mance. Journal of Management Development, 25, 522–534.
Tett, R. P., & Christiansen, N. D. (2007). Personality tests at the crossroads: A response to Morgeson, Campion,
Dipboye, Hollenbeck, Murphy, & Schmitt (2007). Personnel Psychology, 60, 967–993.
Tett, R. P., Jackson, D. N., & Rothstein, M. (1991). Personality measures as predictors of job performance: A
meta-analytic review. Personnel Psychology, 44, 703–742.
Thomas, J. C. (2004). Comprehensive handbook of psychological assessment. Vol. 4: Industrial and organiza-
tional assessment. Hoboken, NJ: Wiley.
Thornton, G. C., III, & Byham, W. C. (1982). Assessment centers and managerial performance. New York,
NY: Academic Press.
Thornton, G. C., III, & Krause, D. E. (2009). Comparison of practices in selection vs. development assess-
ment centers: An international survey. International Journal of Human Resource Management, 20,
478–498.
Thornton, G. C., III, & Rupp, D. R. (2006). Assessment centers in human resource management: Strategies for
prediction, diagnosis, and development. Mahwah, NJ: Lawrence Erlbaum.
Toegel, G., & Conger, J. A. (2003). 360-degree assessment: Time for reinvention. Academy of Management
Executive, 12, 86–94.
Tornow, W. W. (1993). Perceptions or reality: Is multi-perspective measurement a means or an end? Human
Resource Management, 32, 211–219.
Wackerle, F. W. (2001). The right CEO: Straight talk about making tough CEO selection decisions. San
Francisco, CA: Jossey-Bass.
Weiner, N., & Mahoney, T. A. (1981). A model of corporate performance as a function of environmental, orga-
nizational, and leadership processes. Academy of Management Journal, 24, 453–470.
Wofford, J. C., Whittington, J. L., & Goodwin, V. L. (2001). Follower motive patterns as situational moderators
for transformational leadership effectiveness. Journal of Managerial Issues, 13, 196–211.
Yun, S., Cox, J., & Sims, H. P. (2006). The forgotten follower: A contingency model of leadership and follower
self-leadership. Journal of Managerial Psychology, 21, 374–388.
Yun, S., Faraj, S., & Sims, H. (2005). Contingent leadership and effectiveness of trauma resuscitation teams.
Journal of Applied Psychology, 90, 1288–1296.
Zaccaro, S. J. (2001). The nature of executive leadership: A conceptual and empirical analysis of success.
Washington, DC: APA Books.
Zaccaro, S. J., & Banks, D. (2004). Leader visioning and adaptability: Building the gap between research and
practice on developing the ability to manage change. Human Resource Management, 43, 367–380.
Part 8
Milestones in Employee Selection
Walter C. Borman and Benjamin Schneider,
Section Editors
39 The Management Progress Study and Its Legacy for Selection

Ann Howard

The Management Progress Study (MPS) is a longitudinal study of Bell System telephone company
managers that actively collected data from the mid-1950s to the mid-1980s. AT&T’s primary moti-
vation for initiating the study was to generate knowledge about how managers’ lives and careers
develop over time. This long-term pursuit of basic human resources research in a corporate setting
remains a unique and outstanding achievement. However, it was founding director Douglas W.
Bray’s primary research method—the management assessment center—that made the study a clas-
sic in personnel selection. The foundations for this method came primarily from the military, but
Bray adapted it to organizational settings in a way that has endured for more than 50 years.
This chapter describes the beginnings of MPS, its methodology, and major findings that bear
on selection, including a brief review of how successful managers change and develop over time.
A discussion of what has been preserved from the MPS assessment center model, what has been
enhanced, and what has been sidetracked or overlooked helps frame the MPS legacy for selection.
The chapter closes with hypotheses about the durability of specific research findings.

THE FOUNDATIONS OF THE ASSESSMENT CENTER


The assessment center (AC) method is distinguished from other selection techniques by its use
of multiple methods, including simulations, to elicit behavior that is observed and evaluated by
multiple assessors. Although performance tests requiring overt behavior were used by early psy-
chologists, such as Alfred Binet in his studies of children’s intelligence, problems with observation,
scoring, and other issues led psychologists to rely on paper-and-pencil testing during the 1930s and
1940s. Highlights of the evolution of ACs are presented here; for a more complete treatment see
Thornton and Byham (1982).

EARLY DEVELOPMENTS
The roots of the AC method are usually traced to the testing and observation of German mili-
tary officers in World War I. Air Force officer candidates participated in tests, interviews, physical
exercises, discussion groups, and simulations, such as explaining a problem to a group of soldiers.
Psychologists observed the candidates and provided evaluation data to a senior military officer, who
made the final selection among candidates (Martin, 1941).
The German program was among the first to use both multiple assessment techniques and mul-
tiple assessors. It also made some attempts to obtain behavioral samples and brought in the idea
of holistic assessment rather than trait measurement. However, the simulations were crude and

inconsistent across applications, and there was a lack of standardization in administration and
recording observations. Moreover, the target of measurement (overall character of leaders) and
some of the measures used (handwriting analysis, facial expression) were misguided. The validity
of the program was not examined (Thornton & Byham, 1982).
The idea of bringing together multiple techniques and multiple assessors also grew among personality
researchers at the Harvard Psychological Clinic in the 1930s. Dr. Henry Murray, a physician turned
psychologist, thought that better diagnoses of normal personality would result from bringing diverse
specialists together to make a holistic evaluation. Murray and his colleagues also developed unique
instruments to study personality such as the Thematic Apperception Test, or TAT. However, their
evaluations of these early techniques focused on the extent to which they elicited rich material for
interpretation of personality rather than their relationship to external criteria (Murray, 1938).

ACS IN WORLD WAR II


When World War II erupted in Europe in 1939, the British soon discovered that their process for
selecting Army officers was ineffective. Their selection method, a 20-minute interview with gradu-
ates from prestigious schools, was clearly biased toward the upper classes and unable to identify the
most successful officers. To help meet the need for more and better quality officers, Major General
A.F.A.N. Thorne developed a new selection system on the basis of the German program, which he
had witnessed while a British military attaché in Berlin (Vernon & Parry, 1949).
The resulting War Office Selection Boards (WOSB) typically brought officer candidates to an
estate for 3 days. To minimize class or other biases, candidates wore officer uniforms and used num-
bers instead of their names. Assessment methods included psychiatric interviews, psychological
tests, physical activities, and simulations. Although the WOSB program initially used simulations
from the German model, problems with these generally unstructured and unscorable exercises led
them to develop their own techniques. They made extensive use of group exercises to evaluate lead-
ership. Although there is little evidence to validate the AC against performance in the field, cadets
selected by the WOSB process were more likely to earn above-average ratings in training than those
selected by the old interview method. More specifically, a study conducted during the transition
found that among cadets selected by the interview method, 22% were rated above average in train-
ing and 37% were rated below average or failed; among cadets selected by the assessment method,
35% were above average and 25% below average or failed (Vernon & Parry, 1949).
The AC served a different need after the United States entered the war. America wanted better
intelligence on its adversaries and decided to send people to work with resistance groups behind
enemy lines. To accomplish this mission, the government formed a new agency called the Office of
Strategic Services, or OSS. After some of the intelligence agents suffered breakdowns in the field
or exhibited other failings that might have been detected earlier, the OSS looked for new ways of
selecting intelligence personnel.
Captain John Gardner, a psychologist, saw the AC as a way around the limitations of the paper-
and-pencil test and sold the idea to the OSS staff. Most of the OSS assessments took place at the
Willard estate in Fairfax, Virginia, which became known as Station S. The OSS recruited Dr.
Henry Murray to direct the ACs. Murray brought with him some of the exercises he had developed
at Harvard, including the TAT. However, the staff designed most of the simulations specifically for
the OSS mission.
One of the unique aspects of the OSS assessment was requiring the participants to take on a role.
When they arrived at the site, candidates were issued Army fatigues and boots and told to concoct
a false story about themselves. They were not to reveal their true identity except under defined
conditions. A stress interview tested whether the recruits could maintain their false identity. In
another simulation, the construction exercise, the candidates had to direct two difficult role players
to put a structure together. Another exercise had participants briefly observe a previously occupied
room and then answer detailed questions about the occupant. The OSS exercises demonstrated
the potential of simulations for eliciting behavior in a wide range of situations, and all methods
contributed significantly to one or more final dimension ratings. The overall effectiveness rating from assessments at two locations correlated .45 on average with a field performance appraisal by overseas staff (.37 at Station S and .53 at Station W, each corrected for range restriction; OSS Assessment Staff, 1948).
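To make "corrected for range restriction" concrete: because only selected candidates went into the field, the observed correlation understates the relationship that would hold in the full candidate pool, and a statistical correction inflates it accordingly. The chapter does not report which correction the OSS staff applied or the degree of restriction involved, so the sketch below (in Python, with entirely hypothetical numbers) simply illustrates the common Thorndike Case II adjustment:

# Illustrative only: Thorndike Case II correction for direct range restriction.
# Both the observed correlation and the SD ratio below are hypothetical values,
# not figures from the OSS validation work.

def correct_range_restriction(r_obs: float, u: float) -> float:
    """Estimate the unrestricted correlation from a restricted one.

    r_obs -- correlation observed in the selected group
    u     -- SD of the predictor in the full pool divided by its SD in the
             selected group (u >= 1 when selection occurred on the predictor)
    """
    return (r_obs * u) / ((1 - r_obs ** 2 + (r_obs ** 2) * (u ** 2)) ** 0.5)

if __name__ == "__main__":
    # An observed r of .30 with the pool SD 1.5 times the selected-group SD
    print(round(correct_range_restriction(0.30, 1.5), 2))  # about .43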
In 1948, while finishing his doctorate at Yale, Douglas W. Bray read Assessment of Men, the
book describing the OSS experience, and became excited by the possibility of evaluating people by
observing their live behavior. He had already recognized the power of watching people in action
from his own World War II experience evaluating the performance of bombardiers. In his first
post-PhD position on Princeton’s Study of Education, he proposed using the AC method to study
students, but the university was reluctant to implement such a bold initiative. It was not until 1956
that Bray finally got the opportunity to put an AC into action when he accepted a new position at
AT&T.

THE BEGINNINGS OF MANAGERIAL ASSESSMENT


As organizations grew in the postwar economic boom, they created multiple layers of management
to coordinate the enterprise (Howard, 1995). AT&T, then the largest U.S. corporation, struggled to
fill its many middle-management positions, particularly those in its 23 operating telephone compa-
nies. Robert K. Greenleaf, AT&T’s director of management development, had a deep intellectual
curiosity about behavioral science and had already arranged for seminars for AT&T officers fea-
turing prominent speakers such as Carl Rogers, William and Karl Menninger, and Carl Hovland
(Bray, 1990). Greenleaf decided to start a longitudinal study of managers to better understand the
course of life as a Bell System manager. Little was known at the time about managers’ lives and
careers, although there was much public speculation that mid-century work and life was changing
the American character toward conformity (Riesman, Glazer, & Denney, 1950), particularly among
those in large organizations (Whyte, 1956).
Greenleaf apparently informed Carl Hovland about his plans, because Hovland invited his for-
mer doctoral student at Yale, Douglas W. Bray, to have lunch at AT&T. Bray had spent the previous
5 years with Eli Ginzberg on the Conservation of Human Resources Project at Columbia University.
While there, he initiated a study of World War II’s ineffective soldiers, a project sparked by for-
mer U.S. President Dwight Eisenhower, then President of Columbia. Despite significant recognition
Bray was receiving for publications about this work—including occasional citations in The New
York Times—when the AT&T lunch led to a job offer, he jumped at the chance to do applied orga-
nizational research.
Bray indicated up front that he wanted to use an AC to collect the primary research data for
the longitudinal study. Fortunately, Greenleaf was “enthusiastically encouraging and laissez-faire”
(Bray, 1990), which gave Bray considerable license to run the study as he saw fit. Bray named the
longitudinal research the Management Progress Study.
Because MPS was intended to study progress within the managerial ranks, participants needed
to have a realistic chance of being promoted into middle management or higher, that is, to at least
the third rung in a seven-level telephone company hierarchy. In the 1950s, this inevitably meant
being a White male. Although having a college degree could be considered an advantage, it was
not a prerequisite. The initial MPS sample consisted of 422 men from six telephone companies
(Howard & Bray, 1988): 274 college graduates hired into the first level of management and 148
high school graduates who had been promoted into management before they were 32 years of age.
The average college graduate, at age 24.5 years, was 5 years younger than the average noncollege
participant (age 29.5 years).
The first data collection was a 3.5-day AC, conducted at different telephone company locations
in the summers of 1956–1960. Following the OSS tradition, the first AC took place in a converted
mansion: the Barlum House on the St. Clair River in Michigan. However, the MPS AC departed
widely from its military predecessors in terms of what was measured and the methods used. Instead
of enduring physical activities or keeping one’s identity as a spy under duress, participants engaged
in activities directly related to the challenges of a middle manager in a large organization.
The variables measured were qualities considered important to managerial success. Bray con-
solidated a list of 25 qualities from three sources: the management and psychological literature,
behavioral scientists outside of the Bell System, and senior personnel executives inside of the
System (Bray, Campbell, & Grant, 1974). These variables, later called dimensions, clustered into
seven factors common to both samples (Bray & Grant, 1966): administrative skills (e.g., organizing
and planning), interpersonal skills (e.g., leadership skills), intellectual ability (e.g., general mental
ability), stability of performance (e.g., tolerance of uncertainty), work involvement (e.g., primacy of
work), advancement motivation (e.g., need for advancement), and independence (e.g., low need for
superior approval).
In addition to simulations, the exercises included background interviews, a personal history
questionnaire, biographical essays, an expectations inventory, cognitive tests (School and College
Ability Test, Critical Thinking in Social Science Test, Contemporary Affairs Test), personality
and motivation inventories (Edwards Personal Preference Schedule, Guilford-Martin Inventory,
Sarnoff’s Survey of Attitudes Toward Life, Q sort, Bass version of California F-scale), and projec-
tive tests (Murray’s TAT, Rotter Incomplete Sentences Blank, Katkovsky’s Management Incomplete
Sentences Test).
However, it was the business simulations that drew the most attention. There were three of these:
an in-basket, a business game, and a leaderless group discussion. The in-basket technique was first
developed by the Educational Testing Service to measure curriculum effectiveness at the U.S. Air
Force Air College (Frederiksen, Saunders, & Wand, 1957). The MPS assessee received memos,
reports, and records of telephone calls that might well accumulate on the desk of a mid-level man-
ager. In responding to these items, the assessee had to initiate meetings, delegate tasks, make deci-
sions, and engage in planning.
Also new was the business game, which used Tinker Toys to simulate a manufacturing situa-
tion. A team bought parts and assembled them to make one of five toys, which they sold back to the
assessors. The prices of the raw materials and the completed toys fluctuated throughout the game.
In a group discussion exercise, six assessees had to decide which of their direct reports should be
promoted to fill a vacancy. Each participant made a short presentation recommending his candidate.
Then the group had to discuss and agree on which person to promote.
Assessors, most of whom were psychologists, wrote reports that described individual partici-
pants’ behavior in each simulation, summarized interview information, and interpreted the
projective tests. They read their reports aloud in an integration session at which test scores and
all other information on the candidate were also revealed. Each assessor individually rated the
25 dimensions for each participant and then reviewed each rating with the group. If assessors’
ratings disagreed by at least two points on a five-point scale, they discussed the evidence until they
came to a consensus rating.
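The integration session itself rested on trained human judgment, but the two-point disagreement rule is mechanical enough to illustrate with a brief Python sketch (hypothetical dimension names and ratings; the study, of course, used no such software):

# Hypothetical illustration of the MPS integration-session rule: any dimension on
# which the assessors' independent ratings span two or more points on the
# five-point scale is flagged for discussion until a consensus rating is reached.

from typing import Dict, List

def flag_for_discussion(ratings: Dict[str, List[int]]) -> List[str]:
    """Return dimensions whose assessor ratings differ by two or more points."""
    return [dim for dim, scores in ratings.items() if max(scores) - min(scores) >= 2]

if __name__ == "__main__":
    example = {
        "organizing and planning": [4, 4, 3],  # within one point: rating stands
        "need for advancement": [2, 4, 3],     # two-point spread: discuss to consensus
    }
    print(flag_for_discussion(example))  # ['need for advancement']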
The assessment staff then made predictions about each man’s future with the Bell System. Three
types of predictions were made: Will this person be advanced to the entry level of middle manage-
ment (third level) within 10 years? Should he be? And will he stay with the Bell System for the rest
of his career? The “will” prediction took into account not only latent potential, but how decision-
makers in the company would react to the participant. Thus, “will” was the best prediction of
whether the person would be promoted, whereas “should” was the truer measure of management
potential.
The extensive array of data collected on each participant enabled deep insight into each man’s
background, lifestyle, work interests, personality, motivation, and abilities. Differences between the
college and noncollege samples were pronounced from the start (Howard & Bray, 1988). For exam-
ple, each background interview was scored on nine life themes, such as marital-familial, financial-
acquisitive, religious-humanism, and occupational (Rychlak & Bray, 1967). The more experienced
and mostly married noncollege men were more invested in their marital family and in their work
life, whereas the college men were more likely to focus on improving their minds and bodies (ego-
functional life theme) and on providing service to the community. Personality and motivation factors
found the college men considerably more ambitious and impulsive than their noncollege counter-
parts. Although the two groups were similar in administrative and interpersonal skills, the college
men far surpassed the noncollege in cognitive ability. Factor scores computed from the assessors’
dimension ratings showed the college men much higher on the intellectual ability and advancement
motivation factors. The noncollege men scored higher on the independence factor, meaning that
they were freer of others’ influence and better able to stand on their own.
The assessors predicted that more college than noncollege men had true potential for middle
management, which was not surprising in light of the differences in intellectual ability and advance-
ment motivation. However, they found nearly half of the college men (47%) and more than two-
thirds of the noncollege men (69%) as lacking middle-management potential, and they rated still
more as questionable. These findings indicated that the Bell System was in great need of a better
way of selecting higher-level managers (Howard & Bray, 1988).
The predictions about rising into middle management were for 10 years after the assessment;
thus, evidence to verify the accuracy of the predictions accumulated slowly. Although individual
results of the assessment were kept confidential, the activities of the center were not a secret. When
Michigan Bell executives heard about the AC, they asked if the method could be used to help select
foremen. Bray understood that operational use would require disengaging the method from its psy-
chological underpinnings and making it accessible to laypeople. He removed all of the personality,
motivational, and other psychological techniques and left only the mental ability test to complement
the simulations. Bray and his staff then trained the managers to conduct the assessment, observe
behavior in the simulations, write reports, make judgments about the relevant dimensions, and rate
candidates’ potential performance as a foreman. The first operational center opened in Michigan
Bell in 1958.
The AC method quickly spread throughout the Bell System and was adapted to additional man-
agement positions. The centers were staffed entirely by managers, who went through assessor
training programs. The orientation of the assessment switched from clinical judgments to a more
behavioral approach; assessors were encouraged to use the behavior observed in the exercises to
predict future behavior on the job, not to try to be amateur psychologists. The dimensions sought
were determined by job analyses of target jobs rather than educated guesses, although many of the
original dimensions endured.

PREDICTING MANAGERIAL SUCCESS


Fortunately, Donald Grant, a key manager in Bray’s group at AT&T who had also served as
an MPS assessor, initiated the early research with the data. This included factor analyzing the
dimensions, relating the exercises to assessor judgments, and comparing the assessment results to
early progress in management (Bray & Grant, 1966). The fundamental question of whether the
AC could predict advancement into middle management was clearly answered 8 years after the
original assessments. Figure 39.1 shows the relationship between the “will make middle manage-
ment” prediction from the initial assessment and actual attainment of third-level management
8 years later. For both the college and noncollege groups there was a highly significant relation-
ship between assessment staff predictions and progress in management. Assessment results for
these men were never available to anyone outside of the research team and played no part in the
participants’ advancement.
Although these data assured the predictive validity of the AC method, MPS yielded consider-
ably more information about the progress of the managers’ careers that helped satisfy the original
purpose of the longitudinal study. In the early years of the study, researchers interviewed the
participants annually about their perceptions of and reactions to the company, their jobs, career
FIGURE 39.1 AC validity after 8 years. [Bar chart: percentage reaching middle management for participants given a “yes” versus a “no (or ?)” prediction, shown separately for the noncollege and college samples.]

expectations, and other aspects of their work and personal lives. The interview program also
included those who had left the Bell System. In conjunction with the interviews, participants filled
out questionnaires about their expectations and attitudes toward management. The research-
ers also gathered others’ views about the participants’ work assignments and job performance.
Personnel representatives provided this information initially, but the responsibility eventually
shifted to each participant’s immediate supervisor. The supervisor interviews were coded for
various work conditions but were less fruitful as validation criteria than was progression in
management.
By the time 8 years had passed, it became clear that the interviews could not fully evaluate
how the men’s lives and careers had changed as they fulfilled their destiny as business managers.
AT&T therefore decided to put those participants still employed in the Bell System (about 75% of
the total group) through another AC duplicating or paralleling the one they had undergone 8 years
previously. Richard Campbell served as administrator of the data collection while Bray continued
as director of MPS and Donald Grant supervised the data analyses. Following this second AC, the
interview program continued on a 3-year rather than an annual basis.
As the 20th year of the study approached, the participants were in their 40s and 50s, many
careers had plateaued, and thoughts of retirement had begun to stir. It was an ideal time to gather
in-depth information about the later stages of the MPS participants’ careers. Doug Bray once again
determined that an AC was the best way to develop the deepest understanding, and he hired me as a
new MPS collaborator. My lucky break, which came as I was completing my PhD at the University
of Maryland, stemmed from an article about ACs that I originally wrote for a graduate class and was
published in the Academy of Management Journal (Howard, 1974), thanks to the encouragement of
Benjamin Schneider, one of my professors. The article stimulated an invitation to an AC conference
at the Center for Creative Leadership, where I became acquainted with Doug, who had also read the
article. When there was an unexpected job opening at AT&T a few months later, Doug recruited me,
and I began my new position in August 1975.
The third MPS AC, which began in 1976, repeated only about one-third of the tests and question-
naires given at the previous two centers. The rest of the assessment included various new exercises
focused on midcareer and midlife that Doug Bray and I developed together (Bray & Howard, 1983).
Bray remained the overall MPS director while I assumed the roles of administrator and researcher;
Don Grant had by now retired from AT&T and taken an academic position.
One more round of MPS interviews was conducted at year 25 before the breakup of the Bell
System ended data collection for the study. Table 39.1 shows the 25-year design and the sample sizes
for each AC.
TABLE 39.1
MPS Design and Sample

Year | Design | College N | Noncollege N | Total N
0 | MPS:0 assessment | 274 | 148 | 422
1–7 | Annual interviews with participants and company representatives; annual expectations and attitude questionnaires by participants; terminator interviews | | |
8 | MPS:8 reassessment (parallel to original) | 167 | 142 | 309
9–19 | Triannual interviews with participants, bosses, and terminators; participant questionnaires | | |
20 | MPS:20 reassessment (midlife and midcareer) | 137 | 129 | 266
25 | Interviews with participants, bosses, and terminators; participant questionnaires | | |

SUSTAINABILITY OF ASSESSMENT PREDICTIONS


An important question for selection is how long AC results will continue to have predictive power.
Given that people are expected to change and develop over time, at what point should assessment
data be considered obsolete?
Table 39.2 shows the management levels the participants had obtained by the 25th year of the
study. By this time, many of the noncollege men had retired, so their level reflects their last promo-
tion while still employed.
Predictions made from the initial assessment continued to relate to advancement over long peri-
ods of time. Many in the noncollege sample had plateaued by year 8. As a result, validity coefficients
showed only a small decline (62% of the “yes” prediction group had made middle management
compared with 21% of the “no or ?” group).
The college graduates had a very different experience. Validity coefficients plummeted
between years 8 and 20, primarily because many in the “no or ?” prediction group began to be
advanced into middle management—a phenomenon the researchers dubbed “the upward seep
of mediocrity.” Unlike the noncollege participants, the college graduates had been hired with
middle management as a goal; apparently if they stayed around long enough, many would reach
that goal regardless of merit. Raising the criterion bar to the fourth level of management for the

TABLE 39.2
MPS Management Levels at Year 25 (includes retirees)

Level | College N | College % | Noncollege N | Noncollege % | Total N | Total %
7 | 1 | 1 | | | 1 | 0
6 | 4 | 3 | | | 4 | 1
5 | 10 | 7 | | | 10 | 4
4 | 32 | 23 | 5 | 4 | 37 | 13
3 | 62 | 45 | 40 | 29 | 102 | 37
2 | 24 | 17 | 61 | 44 | 85 | 31
1 | 6 | 4 | 32 | 23 | 38 | 14
Total | 139 | 100 | 138 | 100 | 277 | 100
college group once again demonstrated successful prediction from the original assessment: 44%
of the “yes” group attained the fourth level of management compared to 25% of the “no or ?”
prediction group.
Although the original assessment ratings remained valid across time, an open question was whether a later assessment would have even greater validity. The year-8 reassessment data confirmed that it did. Figure 39.2 illustrates why greater prediction occurred with time. The
graph shows average ratings on a higher-order general effectiveness factor, which comprised all but
a few of the dimensions. The results are shown by management level attained by year 20.
The pattern of results shown in Figure 39.2 illustrates the cliché, “The rich get richer, and the
poor get poorer” (Howard & Bray, 1988). The men who reached the fifth level or higher gained
significantly in managerial effectiveness between years 0 and 8, the fourth- and third-level men
showed little change, but the men who remained in lower levels showed significant declines. The
sharper slope of the line from the year-8 reassessment illustrates the gains for prediction of later
advancement.
The lesson for selection is that assessment information needs to be updated periodically to
account for individual change. Moreover, it should not be assumed that changes over time are due
simply to managers growing at differential rates. They are also likely to suffer differential declines
in ability and motivation as bad habits become ingrained and the flames of youthful ambition are
extinguished by the wet blanket of reality.

LONG-TERM PREDICTIVE ELEMENTS


Although the overall AC predictions are the key to the validity of the AC, the rich panoply of MPS
data lent itself to examination of the contributions of the different assessment elements to assessor
judgments and their relationships to later success. Early MPS research evaluated the contributions
of techniques such as the interview (Grant & Bray, 1969), projective tests (Grant, Katkovsky, &
Bray, 1967), and other methods to assessors’ judgments (Grant & Bray, 1966).
Later research investigated the techniques that related most strongly to long-term advancement
(Howard & Bray, 1988; Howard & Bray, 1990). Among the dimension factors, all three concerning
ability (administrative, interpersonal, and intellectual) showed significant relationships with later
promotions as did both motivational factors (advancement motivation and work involvement). The
two dimension factors related to personality (stability of performance and independence) showed
very little relationship to advancement 20 years later.

FIGURE 39.2 General effectiveness factor over time by management level at year 20. [Line graph: transformed general effectiveness factor scores at MPS:0 and MPS:8, plotted by level attained at MPS:20 (1 through 5&6).]
Within these broad categories were more specific strong predictors. For example, the mental
ability tests had more predictive power than the test of contemporary affairs. Among the interper-
sonal skills, oral communications was particularly useful in predicting later advancement.
Ambition was by far the most significant motivational characteristic bearing on later advance-
ment, and goal-setting was a key indicator. Goals were coded from assessment interviews in which
participants were asked to state the highest management level they wanted to attain in their career.
These goals could range from 1 (the lowest level of management) to 7 (a telephone company presi-
dent). The participants’ goals showed a substantive correlation with level attained by year 25 (r = .54),
even when the influence of cognitive ability was controlled (r = .43). Correlations with advancement
were significant for the college and noncollege groups separately, so that these groups’ differing
expectations of success did not fully explain the results. Perhaps the final career goal represents a
personal sizing up of one’s potential in relation to the organizational environment (Howard, 2005).
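Controlling for cognitive ability here refers to a first-order partial correlation. The chapter reports only the zero-order (r = .54) and partialed (r = .43) values, not the underlying intercorrelations with cognitive ability, so the Python sketch below shows the computation in general form with placeholder inputs rather than reproducing the study's figures:

# Generic first-order partial correlation: the goal-attainment relationship (x with y)
# after partialing out cognitive ability (z). The three input correlations are
# placeholders, not values reported for the MPS sample.

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))"""
    return (r_xy - r_xz * r_yz) / (((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5)

if __name__ == "__main__":
    # Placeholder inputs chosen only to show the direction of the adjustment.
    print(round(partial_corr(r_xy=0.54, r_xz=0.30, r_yz=0.40), 2))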
Lifestyles also showed some relationship to later advancement. For both educational groups,
those destined for high-level management were concerned about developing and improving them-
selves, mentally and physically. They were also more involved in their work.
Among the college graduates, academic success in high school and college was a good predictor
of advancement, as was attending a college of better quality. College major also made a difference;
the higher-level men were more likely to have been humanities or social science majors than busi-
ness or technical majors (Howard, 1986). Apparently the verbal and interpersonal skills, breadth of
interests, and inclinations toward leadership that accompany a liberal arts education trumped the
technical competence acquired in more specialized degrees.

PREDICTING TURNOVER
A precursor to success in Bell System management—or in any organization—is whether or not a
manager stays on the payroll. In the 1950s, men joining large organizations were typically expected
to stay until retirement. Nevertheless, nearly half of the college graduates (128 of 274) had left the
Bell System by the time of the MPS:20 assessments. Most of the terminations occurred in the first
few years of employment: 25% were gone within 4 years and nearly 40% within 10. Terminations
among the noncollege participants were negligible, given that they had been employed by the com-
pany more than 9 years on average before attending the AC. But the MPS:20 staff was able to suc-
cessfully predict which of the noncollege men were likely to retire early (Howard, 1988).
The staff at the original AC predicted that 70% of the college recruits would stay with the com-
pany, an overestimate of the 53% who actually did stay. Yet the assessors were still able to differen-
tiate those who would stay or leave far better than chance. Figure 39.3 shows their batting average
(in each case p < .01) for all terminations as well as by termination type. The subgroups included
56% that were discharged or induced to leave (forced terminators) and 44% who left voluntarily
(Howard & Bray, 1988).
The forced terminators’ lower scores on administrative and cognitive skills differentiated them
from the voluntary terminators and the stayers. They were rated significantly lower on decision-
making and planning and organizing in the in-basket, and they had lower mental ability test scores.
In addition, they showed unrealistic expectations about their future, lower resistance to stress, and
less endurance on the Edwards Personal Preference Schedule. Thus, the forced terminators were not
only less capable than those who stayed or left voluntarily, but they also lacked the resources that
would help them cope with a difficult situation.
The voluntary terminators showed equal abilities to those who stayed, but they were significantly
different from stayers and forced terminators in motivational characteristics. Dimension ratings
identified them as higher on need for advancement, but their ambition had an insistent and rigid
quality, accompanied by lower ratings on the ability to delay gratification and on behavior flexibil-
ity. Their high need for status was not matched by an intrinsic interest in the work itself, and they
had lower standards for the quality of their work. The voluntary terminators had lower interview
FIGURE 39.3 MPS:0 predictions and terminations—college sample. [Bar chart: percentage terminated among those predicted to leave (or ?) versus those predicted to stay, shown for all terminations, voluntary terminations, and forced terminations.]

ratings on need for security and Bell System value orientation, which flagged less loyalty to the
company. When rapid advancement did not materialize, these men, being relatively uncaring about
their work or their company, departed.

SUCCESSFUL MANAGERS’ DEVELOPMENT OVER TIME


A full account of the study’s contributions to the fields of adult and career development is beyond the
scope of this chapter; interested readers should consult Managerial Lives in Transition (Howard &
Bray, 1988). However, knowledge of how managers are likely to change and develop over time can
be helpful when deliberating the pros and cons of a selection decision. This section provides a brief
review of how the more and less successful managers evolved over time.

MANAGERIAL ABILITIES
Evaluation of changes in managerial abilities must rely on data between MPS:0 and MPS:8 because
some key exercises were lacking at MPS:20. Figure 39.4 shows changes in three ability factors
(based on specific scores or ratings rather than dimensions), separated by management level at
MPS:20.
There was a surprising difference among the level groups in the administrative ability factor.
Those who ended up at year 20 in the lowest two management levels showed declines in administra-
tive ability in their early years, whereas those at the fifth and sixth levels showed a marked increase.
The future executives had developed their administrative capabilities over the first 8 years to a
rather extraordinary degree compared with their cohorts.
None of the level groups demonstrated better interpersonal ability at MPS:8 than they had
at the beginning, and most were rated more poorly. The best that could be said for the future
executives was that they held their own. Although motivation might be a key to the declines, the
results suggest that organizations need to continually reinforce interpersonal skills in development
planning.
All level groups gained on the cognitive ability factor, although those who would ultimately
reach third level and higher gained more. In this case, continuing one’s self-education in worldly
affairs was mostly a function of having gone to college.
FIGURE 39.4 Ability factors MPS:0 to MPS:8 by MPS:20 level. [Line graphs: administrative, interpersonal, and cognitive ability factor scores at year 0 and year 8, plotted by level attained at MPS:20 (1 through 5).]

The longitudinal ability data suggest that organizations should not make selection decisions with
the assumption that managers will likely improve their interpersonal skills over time; unless an
intervention is planned, they are likely to get worse. Although some managers might get better at
administrative skills, this is most likely to occur as they move up the ladder and take on greater
responsibilities. As a matter of course, only cognitive ability can be expected to increase as manag-
ers continue to expand their knowledge.

MOTIVATION FOR ADVANCEMENT


Ambition continued to sharply differentiate the men by management level, even after 20 years;
however, all groups declined over time, as Figure 39.5 shows. Although many men had given up on
further promotions, the higher a man’s level, the more likely he was to cling to the idea of another
step up the ladder.

FIGURE 39.5 Need for advancement over time by MPS:20 level. [Line graph: assessor ratings of need for advancement at years 0, 8, and 20, plotted separately for levels 1 through 5+ at MPS:20.]


The selection message in these data is that ambition is more characteristic of young managers
than older, experienced ones. Although setting high goals at the beginning of one’s career
was a powerful predictor of later advancement, it is likely to lose its potency as time goes by.
However, reinforcing the drive to advance with promotions along the way helps to maintain its
importance.

MANAGEMENT STYLE
The higher-level men at MPS:20 showed a clear preference for leadership and advancement, but this
did not necessarily mean they were heavy-handed in executing their roles. On the Bass version of
the California F-scale, shown in Figure 39.6, the lower-level men grew into greater authoritarianism
over time. Perhaps they insisted on doing things by the book with strict enforcement of the rules.
By contrast, the men at the fourth level and higher ended up lowest on authoritarianism by MPS:20.
They had likely developed a broader perspective and learned to operate with more uncertainty and
greater flexibility.
Over the same period of time, the lower-level men became more nurturant on the Edwards
Personal Preference Schedule, whereas the higher-level men became less so. Figure 39.7 shows that
on the Bell System norms, the first-level men were at the 81st percentile by MPS:20, whereas the
fifth- and sixth-level men were only at the 34th percentile. Apparently the lower-level men were
more likely to offer sympathetic understanding and help to others, whereas the top executives were
much less likely to do so.
The combination of low authoritarianism and low nurturance suggests that the highest-level men
wanted to be flexible and objective in their leadership style but avoid emotional involvement, a style
I previously described as “cool at the top.”

They are like steely-eyed Solomons, attempting to size up situations in a rational way without the
distractions of emotional entanglements. They achieve grace not through sympathy and compassion,
but through a determination to be open to all the alternatives and to make decisions that are ultimately
objective, practical, and fair. (Howard, 1984, p. 21)

When selecting executives, then, it is important to recognize that overly nurturant behavior can be
a liability rather than an asset. The job often requires tough decisions, and executives need to be
able to sleep at night.

FIGURE 39.6 Authoritarianism over time by MPS:20 level. [Line graph: percentile scores at MPS:0 and MPS:20, plotted by level attained at MPS:20 (1 through 5&6).]


FIGURE 39.7 Nurturance over time by MPS:20 level. [Line graph: percentile scores at MPS:0 and MPS:20, plotted by level attained at MPS:20 (1 through 5&6).]

AC ADVANTAGES AND DISADVANTAGES


In the early years of MPS, as Bray talked to his psychologist colleagues, he encouraged the spread
of the AC method outside of the Bell System. However, ACs remained the province of only a few
companies until the 1970s, when consulting firms began to provide organizations with the needed
methodology and training. It soon became clear that ACs offered other advantages as selection tools
beyond being a valid predictor of later managerial success.

FAIRNESS AND ADVERSE IMPACT


ACs are typically viewed as fair by participants because they are so clearly job relevant and much
more personal than paper-and-pencil tests. That ACs could evaluate men and women with no
adverse impact was well established in early implementations at AT&T. The lower-level operational
centers that sprang up soon after MPS began (the Personnel Assessment Program) assessed 100,000
employees between 1958 and 1974. Among these candidates, 33% of the men and 35% of the women
were recommended for first-level management. Moreover, the same dimensions were related to suc-
cess for men and women (Bray, 1976).
The fairness of ACs eventually impressed the Equal Employment Opportunity Commission
(EEOC), which was formed to enforce the Civil Rights Act of 1964. In a landmark case, the EEOC
charged that AT&T, the largest U. S. employer of women, was actually an oppressor of women. It
was true that many jobs in the telephone companies were segregated by gender: men served in the
plant department installing equipment whereas women staffed the traffic department as telephone
operators. Whereas men had a chance to move into middle management, women stalled at the first
level of management.
AT&T signed a consent decree with the EEOC and other federal agencies in 1973. At AT&T’s
recommendation, the decree specified that an AC would be used to evaluate recent female college
hires and determine whether they should be put on the previously male fast track to middle manage-
ment (Wallace, 1976). Among nearly 1700 women assessed, 42% were recommended for middle
management—a proportion similar to that for men undergoing the same assessment program (the
Management Assessment Program; Bray, 1976).
Several years later the EEOC further confirmed its confidence in the fairness and accuracy of
the AC. The agency had Development Dimensions International use the method to help them select
district directors for their 22 field offices.
In the late 1970s, as work on MPS:20 was in progress, the question arose as to whether the
MPS data could be generalized beyond its White male population. As a result of discussions with
AT&T executives, Doug Bray and I began research parallel to MPS that I named the Management
Continuity Study, or MCS. As before, participants were eligible only if they were considered viable
candidates for middle management and beyond. More than 20 years after the start of MPS, times
had indeed changed. When the telephone companies provided their recent hires as participants,
one-half were women and one-third represented ethnic minority groups. Table 39.3 shows the pri-
mary differences in the composition of the two samples.
As with previous AC results, there were no gender differences in the overall ratings of man-
agement potential of the MCS participants. Assessors predicted that 51% of the men and 53% of
the women “will” make middle management within 10 years and 39% of the men and 45% of the
women “should” do so. They thought 11% of the men and 13% of the women should rise beyond
the third level of management. None of these gender differences was statistically significant. The
termination predictions also did not differ by gender, although the reason for leaving the company
did. Assessors believed that significantly more of the men (23% of those leaving) would be forced
to do so compared with only 4% of the women.
Although overall management potential was the same for the MCS men and women, their constel-
lations of abilities, interests, motivations, and management style showed many interesting nuances
that can be only briefly summarized here (for details, see Howard & Bray, 1988). Table 39.4 gives a
snapshot of some of these differences. What is important to note is that values, interests, personality,
and motivation best differentiated the men and women (in a discriminant function analysis), but the
sexes were seen as equal on the major abilities and in psychological adjustment.
There were ethnic group differences in overall management potential, although interpreta-
tion was complicated by interactions with gender and small sample sizes of some ethnic groups.
Only the Black sample (N = 79) was large enough to compare with the Whites (N = 237). It is
not known to what extent affirmative action created selection disparities between these two
groups.
Blacks were at the greatest disadvantage on cognitive abilities, which relied on paper-and-pencil
tests. On the simulations, they lagged somewhat behind Whites in administrative skills, but the two
groups were equal on interpersonal skills. There were no overall differences between the groups on
the motivation factors. Thus, if only cognitive tests had been used for selection, the adverse impact
would have been considerably greater than it would be in an AC, which looks at a broader range of
abilities and other characteristics. Unfortunately, the breakup of the Bell System precluded collect-
ing criterion data that would have allowed examination of the validity—and potential differential
validity—of the MCS assessment.

PRACTICAL ADVANTAGES
ACs, particularly those that rely primarily on simulations, offer organizations several additional
practical advantages as selection tools (Howard, 2006).

TABLE 39.3
AT&T Longitudinal Research Samples

MPS | MCS
Assessed 1956–1960 | Assessed 1977–1982
6 telephone companies | 18 telephone companies
All White males | One-half female; one-third ethnic minority
College and noncollege | All college graduates
N = 422 | N = 344
TABLE 39.4
Management Continuity Study Gender Differences

Women Higher | Men Higher
Verbal skills | Quantitative skills
Interpersonal sensitivity | Assertiveness
Work standards | Desire to advance
Poise | General information
Self-objectivity | Expectations of success
Flexibility | Authority orientation

1. Live behavior: Because participants demonstrate live behavior, there is little opportunity
for them to fake the demonstration of skills. Decision-makers also do not need to speculate
from test scores about what participants’ behavior would be like.
2. Diagnosis for development: A significant advantage of simulations over other selection
methods is their usefulness for development. The dimensions are primarily aimed at skills
and abilities that can be learned, and demonstrations of live behavior in simulations provide
excellent material for credible feedback. If relatively stable personality and motivation char-
acteristics are also measured in the assessment, participants can learn how they manifest
these qualities in different situations and how to keep them from getting out of control.
3. Future orientation: Simulations can be directed at future jobs or challenges, producing
information that can only be speculated about from current job performance.
4. Realistic job preview: Because they simulate real-life challenges representative of a par-
ticular position or role, ACs form a realistic job preview for candidates. Gaining this per-
spective can encourage those who are not a good fit to opt out of the selection process. This
experience is especially important for advancement to the first level of management, where
candidates might not realize what leadership really entails. It can also be a revelation to
senior-level candidates who do not appreciate the difference between operational and stra-
tegic leadership.
5. Plasticity: AC simulations are fundamentally plastic; that is, they can be designed to rep-
resent various organizational challenges and measure specific dimensions of interest.

DISADVANTAGES OF ACS
The major drawback of ACs is the effort and expense involved in conducting them. Although mod-
ern middle-management ACs are typically much shorter (1 day) than the MPS model (3.5 days)
and take advantage of computer technology, they still entail considerable administrative and staff-
ing costs. This drawback has meant that organizations are more selective about when to use ACs.
Measurement of live behavior is particularly advantageous in circumstances such as the following:

• Skills are complex and cannot easily be measured in other ways.


• Development and not just selection is a goal.
• Behavior has a large impact (as in executive roles).
• Participants’ concerns for fairness are crucial.
• People are at hard transition points in their careers (such as operational to strategic
leadership).
• The new position will be significantly different from the present one.

Another disadvantage of simulations is that because scoring is a matter of holistic judgment,


interrater reliability needs to be constantly monitored. Moreover, not every competency can be
simulated, particularly those requiring long-term, cumulative actions such as networking.
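As a minimal illustration of what such monitoring can involve (one simple index among many; the chapter does not prescribe a particular statistic), two assessors' ratings on a dimension can be correlated across participants:

# Hypothetical 1-5 ratings from two assessors on one dimension across eight
# participants; the correlation serves as a rough interrater consistency check.

import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / ((var_x * var_y) ** 0.5)

if __name__ == "__main__":
    assessor_1 = [3, 4, 2, 5, 3, 4, 1, 4]
    assessor_2 = [3, 5, 2, 4, 3, 4, 2, 5]
    print(round(pearson(assessor_1, assessor_2), 2))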
THE MPS SELECTION LEGACY


The legacy of MPS for selection can be viewed in several ways. Some aspects of the methodology
remain unchanged; others have been favorably enhanced. In still other respects, the science or prac-
tice that the study pioneered has become sidetracked or essential elements overlooked.

PRESERVED FROM MPS


The basic methodology of an AC has been codified by an International Task Force on Assessment
Center Guidelines (2000). These guidelines specify that an AC must evaluate dimensions or competen-
cies based on a job analysis of the target position. Multiple measurement methods are required, some
of which must be simulations of the position or role to be filled. Ratings of the competencies are made
by multiple, trained assessors who observe and record behavioral data and pool their judgments.
In addition to consolidating and spreading the basic methodology of assessment, MPS identified
some fundamental dimensions that have lasting relevance. For example, it is hard to imagine lead-
ership roles that don’t require proficiency in decision-making, organizing and planning, oral com-
munications, or resistance to stress. Although new dimensions, such as change leadership or global
acumen, are needed to reflect evolving business conditions, the MPS classic dimensions remain
essential to an evaluation of management potential.
Several of the methods assembled and refined in MPS have also become AC classics. The basic
challenges of the in-basket exercise for making decisions, planning sequences of events, and del-
egating to subordinates are still essential to managerial roles. The leaderless group discussion also
remains popular as a way of identifying emergent leadership, even though the topics discussed
might vary from center to center.
Also confirmed many times over is the AC’s predictive validity. As ACs became entrenched,
a great deal of research and several meta-analyses demonstrated the criterion-related validity
of overall assessment ratings with career progress, potential ratings, training, and performance.
Estimates of the relationship between AC ratings and success in management range from .31 to .43
(Thornton & Rupp, 2006) with an upper bound of .63 under optimal conditions (Gaugler, Rosenthal,
Thornton, & Bentson, 1987).
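One way to read coefficients in this range is in terms of expected hit rates. The Monte Carlo sketch below assumes bivariate-normal assessment and success scores and an arbitrary top-quartile cutoff on each; the numbers are simulated for illustration and are not drawn from MPS or the meta-analyses cited above:

# Illustrative Monte Carlo: practical meaning of a validity coefficient of .37.
# All values are simulated under a bivariate-normal assumption.

import numpy as np

rng = np.random.default_rng(0)
r = 0.37
n = 200_000

ac_rating = rng.standard_normal(n)
criterion = r * ac_rating + np.sqrt(1 - r ** 2) * rng.standard_normal(n)

top_ac = ac_rating >= np.quantile(ac_rating, 0.75)    # top quartile of AC ratings
success = criterion >= np.quantile(criterion, 0.75)   # top quartile on the criterion

print(f"Base rate of 'success': {success.mean():.2f}")
print(f"Among top-quartile AC scorers: {success[top_ac].mean():.2f}")
print(f"Among the rest: {success[~top_ac].mean():.2f}")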

ENHANCED FROM MPS


Whereas the AT&T assessments were designed for a large, hierarchical business organization, the
method was later adapted to many other types of settings. For example, governments, schools, fire
departments, hospitals, and religious organizations have all made use of ACs. Different levels of
participants are also assessed, ranging from empowered team members to CEOs.
As the uses of ACs expanded, so did their dimensions and exercises. New types of simulations
are now in use, including the following (Howard, 2006).

Interaction Role-Plays
MPS had only one brief role-play: the in-basket interview was conducted by a role player act-
ing as the participant’s boss. However, role-plays have become an important staple in AC practice.
Participants review background information on a peer, internal or external customer, direct report,
prospective client, or other important contact. They then meet with a trained role player to resolve
a problem or gain commitment to a course of action. Role-plays are often structured to bring out a
primary competency such as coaching, influencing others, or managing conflict.

Analysis and Presentation


In these exercises, participants analyze quantitative and narrative data and make short- and/or long-
term recommendations to improve matters such as productivity, quality, profitability, organizational
structure, or morale. The analysis is often followed by a presentation of the results to one or more
role players or external observers.

Media Interview
Typically confined to executive assessments, these exercises provide participants with background
information about a situation that has attracted media attention. Candidates then meet with a media
representative (a trained role player) to answer probing questions and defend their organization’s
actions.

Visionary Address
Participants make an inspirational talk on some selected problem or issue.

Fact-Finding Exercise
The participant is given a brief description of a hypothetical situation. The task is to seek informa-
tion from a resource person and make a decision about that situation within a limited period.

Each simulation in MPS and early operational centers had its own unique setting. However, the
modern assessment is more likely to be of a “day in the life” variety, in which the participant takes
on one organizational role and faces numerous challenges and tasks related to that role. Such centers
are more realistic for participants and also more efficient, given that participants have to understand
only one background situation before proceeding.
The context in which ACs take place has also changed. Delivery of assessment materials has
advanced from papers wrapped in large envelopes to online messages in e-mail, voice mail, and video
mail. In a simulated desktop environment, participants can schedule appointments, create to-do lists,
and bring new documents into the system. Computer-enhanced ACs also facilitate other efficiencies.
For example, they shorten administration time and facilitate assessor scoring and reporting. Although some participants might still come to an established setting (more likely an office than an estate), others can avoid traveling to a site altogether and participate by computer in a virtual AC.
AC professionals have also improved the dimensions in order to get reliable ratings on reason-
ably independent characteristics. Definitions of dimensions are more precise and often include key
behaviors. For example, the MPS definition of “decision-making” was simply “How ready is this
person to make decisions, and how good are the decisions that are made?” A modern definition
would include more detailed guidance on what to look for, such as identifying issues, problems, and
opportunities; gathering, comparing, and interpreting data to draw conclusions; using relevant deci-
sion criteria to choose an effective option; and taking action within a reasonable time. The number
of dimensions for an individual AC has been reduced, and the ones that remain are often tied to
specific business challenges.
In summary, ACs have been enhanced in many ways since their beginnings in MPS, but this
is as it should be. The fundamental plasticity of ACs is what keeps them alive and relevant to
modern organizational challenges. What is missing is convincing research to demonstrate that the
various innovations work successfully and do not compromise the method’s validity. For example,
although we might assume that more precisely defined dimensions will increase assessor reliability
and therefore enhance validity, research might reveal instead that a broadly interpreted predictor
will better match a broadly interpreted criterion.

SIDETRACKED FROM MPS


AC research and practice have unfortunately deviated from the MPS model in at least two unpro-
ductive ways. A problem for practice is organizations’ homemade competencies. These are often
armchair driven, poorly defined, and extraordinarily messy in terms of identifying unique behaviors
that can be evaluated.
Another sidetrack has been the research demonstrating that AC ratings vary more by exercise
than by dimension, with the resulting recommendation that ACs should be focused on exercises
or tasks rather than dimensions. This research stemmed from use of a multitrait-multimethod
(MTMM) analysis of AC data, with dimensions equated to traits and exercises equated to
methods (cf. Lievens & Conway, 2001). These models could not have been constructed from MPS or its immediate successors because assessors rated dimensions only after all information from all exercises was reviewed in an integration session (within-dimension method). Later
centers had assessors rate dimensions by exercise as an additional step before the final
integration (within-exercise method); data from these centers were used to fill the cells of the
MTMM matrix.
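To make the MTMM framing concrete, the following sketch (Python, with entirely hypothetical simulated ratings rather than MPS or any operational AC data) generates within-exercise dimension ratings for three dimensions crossed with three exercises and then compares the average correlation between the same dimension rated in different exercises with the average correlation between different dimensions rated within the same exercise. The simulated effect sizes are assumptions chosen only to reproduce the commonly reported pattern in which exercise variance dominates.

    import numpy as np

    rng = np.random.default_rng(0)
    n_participants, n_dims, n_exercises = 200, 3, 3

    # Simulate ratings with a large exercise (method) component and a smaller
    # dimension (trait) component, mimicking the pattern reported in MTMM studies.
    exercise_effect = rng.normal(0, 1.0, (n_participants, n_exercises))
    dimension_effect = rng.normal(0, 0.5, (n_participants, n_dims))
    ratings = np.empty((n_participants, n_dims, n_exercises))
    for d in range(n_dims):
        for e in range(n_exercises):
            ratings[:, d, e] = (exercise_effect[:, e] + dimension_effect[:, d]
                                + rng.normal(0, 0.5, n_participants))

    # Correlate every dimension-within-exercise rating with every other one.
    flat = ratings.reshape(n_participants, n_dims * n_exercises)  # column order: (d0e0, d0e1, ...)
    corr = np.corrcoef(flat, rowvar=False)

    same_dim_diff_ex, diff_dim_same_ex = [], []
    for i in range(n_dims * n_exercises):
        for j in range(i + 1, n_dims * n_exercises):
            d_i, e_i = i // n_exercises, i % n_exercises
            d_j, e_j = j // n_exercises, j % n_exercises
            if d_i == d_j and e_i != e_j:
                same_dim_diff_ex.append(corr[i, j])   # convergent: same dimension, different exercises
            elif d_i != d_j and e_i == e_j:
                diff_dim_same_ex.append(corr[i, j])   # within-exercise: different dimensions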
As I have argued extensively elsewhere (Howard, 2008), a center in which each exercise is used to rate all or nearly all dimensions does not constitute good measurement and is not representative of AC practice. Bray selected techniques for MPS believing that each would add a different perspective and
facilitate rating specific dimensions; he did not expect that each technique would be equally adept
at measuring nearly every dimension. Simulations are designed to bring out behaviors relative to
particular dimensions. For example, the in-basket elicits decision-making, planning and organizing,
and delegation, but the leaderless group discussion brings out emergent leadership and interpersonal
skills. The trouble begins when assessors are asked to rate too many dimensions from one exercise.
For example, if the in-basket is used to measure interpersonal skills, where the evidence is bound
to be weak at best, it is no wonder the assessors take their cues from other dimensions they can rate
well from the exercise.
Other findings in psychology suggest that situations (exercises) and individual characteristics
(dimensions) interact in determining behavior. Indeed, where methods other than MTMM have
been used to study the elements of ACs, exercises and dimensions have been shown to account for
meaningful variance in AC performance. One generalizability analysis showed that 60% of the
variance in participants’ AC performance could be attributed to individuals and constructs and only
11% to assessors and exercises (Arthur, Woehr, & Maldegen, 2000).
Aside from the research issues, the recommendation to rate exercises instead of dimensions in an
AC makes no practical sense. Many tasks and jobs change quickly in today’s economy, which would
quickly render task-based assessment results obsolete. More importantly, exercise ratings cannot be
generalized to other situations. Because ACs are often used to simulate untried roles—including
some that may not yet exist—and build a leadership pipeline for as yet undetermined positions, it
is important to be able to generalize about the human qualities needed for various organizational
challenges.
An important principle lost in this sidetrack is that behaviors, not exercises, are the currency
of ACs; exercises serve only to elicit behaviors. Dimensions are logical clusters of behaviors; they
organize the behavioral information so that conclusions can be drawn about participants’ readiness
for future positions and responsibilities.

OVERLOOKED FROM MPS


Simulations have become the hallmark of ACs, and many, if not most, modern centers use them
exclusively. However, this was not the MPS model, which included an extensive array of tools and
techniques. It is true that these measures were often aimed at psychological characteristics that
went well beyond the key abilities and motivations for a middle-management role. Part of the MPS
inclusiveness was because the study, as basic research, sought to understand the full context of the
managers’ lives and not just the key elements shaping their job performance. It was only fitting,
then, that most of these psychological elements were discarded when Bray designed the fi rst opera-
tional assessment for Michigan Bell.
However, the tide has now turned and professional assessors, often with at least some psycho-
logical training, once again prevail in ACs rather than operational managers, who usually cannot
be spared from their primary jobs. This turnaround opens the door to incorporating other measure-
ment techniques into the AC. Figure 39.8 illustrates (with multiple regression data) how different
techniques used at the original MPS assessment contributed to the prediction of management level
20 years later.

FIGURE 39.8 Assessment exercises predicting MPS:20 level. Cognitive ability (test), .64; ambition
(interview, personality test), .62; interpersonal skills (simulations), .60; educational level (biodata), .54.
Although interpersonal skills measured in simulations were highly predictive of advancement in
management, that prediction was enhanced by including educational level (a biodata item), ambition
(comprised of information from the interview and a personality test), and cognitive ability (from a
mental ability test). Instead of being competitive about their favorite method (e.g., “my test works
better than your simulation”) researchers and practitioners should remember that different tech-
niques have advantages for measuring different human characteristics. Wise psychologists should
become better aware of how techniques work together to select the best candidates for organiza-
tional purposes.
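As a rough illustration of how such techniques combine, the sketch below uses simulated data (the predictor labels echo Figure 39.8, but the weights, correlations, and sample size are assumptions, not the MPS regression results) and reports the multiple correlation with a hypothetical criterion as each measurement method is added to the simulation-based ratings.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000

    # Hypothetical standardized predictor scores (not MPS data).
    interpersonal = rng.normal(size=n)                     # ratings from simulations
    education = 0.3 * interpersonal + rng.normal(size=n)   # biodata item
    ambition = rng.normal(size=n)                          # interview + personality composite
    cognitive = rng.normal(size=n)                         # mental ability test

    # Hypothetical criterion: management level reached 20 years later.
    level = (0.35 * interpersonal + 0.20 * education + 0.25 * ambition
             + 0.30 * cognitive + rng.normal(scale=0.8, size=n))

    def multiple_r(predictors, y):
        """Multiple correlation from an ordinary least-squares fit."""
        X = np.column_stack([np.ones(len(y))] + predictors)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return np.corrcoef(X @ beta, y)[0, 1]

    steps = [("simulations only", [interpersonal]),
             ("+ biodata", [interpersonal, education]),
             ("+ interview/personality", [interpersonal, education, ambition]),
             ("+ cognitive test", [interpersonal, education, ambition, cognitive])]
    for label, preds in steps:
        print(f"{label:24s} R = {multiple_r(preds, level):.2f}")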
Another technique from MPS that has been overlooked is the use of staff specialists. MPS was
privileged to have two clinical psychologists—Joseph F. Rychlak and Walter Katkovsky—who
stayed with the study throughout its three assessments and also participated in the MCS assess-
ments. Rychlak was a specialist in interviewing and coding life themes from those discussions;
Katkovsky was a specialist in interpreting the various projective measures. By keeping these
two aimed at their specialty assignments, MPS and MCS gained reliability and accuracy that
would not be available if they had spread the tasks among a variety of assessors. Other ACs often
have assessors who are particularly interested in or adept at evaluating certain exercises (e.g., in-
baskets) or performing certain role-plays (e.g., a media interview). Although specialists can reduce
administrative flexibility in some situations, the advantages of cultivating them can far exceed the
disadvantages.
Another practice that today’s ACs should take advantage of is the strategic use of cross-situational
dimensions (Howard, 2008). Because MPS made no attempt to rate dimensions after exercises,
assessors could draw information from any technique to support their ratings. ACs that have asses-
sors rate dimensions by exercise—the large majority of those used in practice today—often over-
look the opportunity to build a dimension rating from diverse behaviors elicited from different
exercises. Yet, some complex dimensions cannot be measured well within a single exercise.
Cross-situational dimensions have key behaviors that are best elicited by different types
of stimuli. For example, in the dimension “accelerating change,” a key behavior, “identifies
opportunities for improvement,” is best elicited by covertly planting opportunity cues among
various exercises in the design. Another key behavior, “catalyzes change,” is best measured by
an exercise requiring an action plan; whereas a third key behavior, “addresses change resis-
tance,” is best measured in a role-play. Even ACs using the within-exercise method can emulate
the more free-wheeling within-dimension MPS design by having assessors make key behavior
ratings from the relevant exercises and rolling them up into a cross-situational dimension rat-
ing at the integration session. Such a design will add precision to the measurement of complex
dimensions.
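A minimal sketch of such a roll-up, assuming a 1–5 rating scale and simple averaging as the combination rule (the behavior-to-exercise mapping follows the example above, but the averaging rule and the numbers are illustrative assumptions rather than a prescribed AC procedure):

    from statistics import mean

    # Key behaviors for the dimension "accelerating change," each rated (1-5)
    # in the exercise designed to elicit it.
    key_behavior_ratings = {
        "identifies opportunities for improvement": ("in-basket with planted cues", 4),
        "catalyzes change": ("action-planning exercise", 3),
        "addresses change resistance": ("role-play", 5),
    }

    def roll_up(behavior_ratings):
        """Combine key-behavior ratings into one cross-situational dimension rating."""
        return mean(score for _exercise, score in behavior_ratings.values())

    print("accelerating change rating:", roll_up(key_behavior_ratings))  # prints 4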
DURABILITY OF MPS FINDINGS


The MPS legacy for selection will primarily rest in the methodology that it initiated with the man-
agement AC. Yet the question can also be asked whether its many research findings are still relevant more
than 50 years later. After all, work and organizational life changed dramatically between the slow-
moving, bureaucratic organizations of the 1950s and the rapidly changing, globally competitive
organizations of the 21st century (Howard, 1988).
Answering this question was a primary driver for beginning MCS in the 1970s. The intent of that
study was to parallel MPS, not only with an initial assessment that could compare college recruits
across two generations, but also with longitudinal data collection and at least one more AC to mea-
sure changes that occur with age and experience.
Unfortunately, the government-ordered breakup of the Bell System in 1984 dashed those ambi-
tious plans. A direct comparison of the two groups, MPS and MCS, at the original assessments
offers some important clues about generational and cultural differences. However, the impact of age
and experience must remain a hypothesis until another sample can be measured with similar instru-
ments across a significant segment of time.
Our hypotheses about the durability of MPS findings rest on a direct comparison of the MPS
college and MCS groups at their original assessment. To check whether the results were unique
to the Bell System, in 1981 and 1982 we initiated the Inter-Organizational Testing Study (ITS),
a cross-sectional study of ten organizations outside of the Bell System. The organizations varied
from a railroad and a bank to an architectural and engineering consulting firm, a news organiza-
tion, and two government agencies; organizational sizes ranged from a few thousand employees to
over 25,000. Samples included nearly 400 middle-aged managers, comparable to the MPS:20 group
in age, and about the same number of young managers, comparable to the MCS group. The par-
ticipants went through a half-day of testing on the same instruments used in the longitudinal study
samples. On nearly every score, the differences between the two generations directly paralleled the
findings with the two AT&T samples and pointed toward the same explanatory models (Howard &
Bray, 1988).

• Aging and experience: We hypothesized that aging and experience accounted for the find-
ings when the MPS sample changed between years 0 and 20 and the MCS sample at year
0 resembled the MPS group at year 0. These changes included an increase in verbal skills
and motivational shifts. Whereas the MPS men’s intrinsic motivation increased over time
(higher need for achievement and work standards), their promotion orientation declined.
That is, as time went by they were more willing to delay promotions and had more realistic
expectations. The personalities of the MPS men seemed to toughen over time; they showed
more resistance to stress, less restraint on hostility, and less inclination to take the blame
when something went wrong. The men also underwent lifestyle changes; the role of friends
declined in favor of more involvement with the marital family and recreational activities.
• Culture changes: A culture change was indicated when the MPS participants changed over
20 years and the MCS group resembled them at year 20 rather than at year 0. A primary
theme here was less respect for institutions. The MPS:20 and MCS:0 participants showed
less deference toward people in authority, lower needs for superior approval, and less reli-
gious involvement. There was also less identification with “masculine” qualities among
the men.
• Both aging and culture changes: We hypothesized that aging and culture changes were at
work when the MPS participants changed between years 0 and 20 but the MCS partici-
pants differed from them at both ages. Factors falling within this hypothesis often involved
having increased control over one’s life. That is, compared with the MPS:0 findings, the
MPS:20 participants, and to a lesser extent the MCS:0 participants, had greater need for
autonomy, less need for peer approval, less involvement with the parental family, lower
needs for security, and less interest in service to the community. Both groups also showed
a tendency toward less effective interpersonal skills and less behavior flexibility. Greater
pessimism about their future with the organization also appeared to be a culture change
and a characteristic that grew with maturity. The MPS:20 and MCS:0 groups had a lower
motivation for advancement, less forward-striving, and lower career expectations.
The MPS:20 group also showed declines on various scales of a Bell System question-
naire of attitudes toward work and the organization. The MCS:0 participants generally
scored lower than the MPS:0 sample but were inconsistent in their average scores relative
to the group at MPS:20. However, these attitudinal data might be artifacts of the Bell
System questionnaire; other research has typically found higher levels of job satisfaction
for older workers (Hedge, Borman, & Lammlein, 2006).
• Generation differences: Where the MPS group changed little over 20 years but the MCS
group looked significantly different, we hypothesized a generation effect. That is, being
young at a particular point in time might have created qualities unique to a generation.
One prominent but worrisome difference was the younger generation’s reduced interest in
leadership. They were less identified with management, less interested in leading others,
had poorer communication skills, and were more nonconforming. At the same time they
had a greater need for emotional support, on the giving end (nurturance) and the receiving
end (succorance).
The AT&T studies compared the post-World War II generation (MPS) and the Baby
Boom generation (MCS). By now, another generation or two have ushered in new values
and motivations, and more cultural changes have taken place. The results that might be
enduring, then, are those that we isolated as due in whole or in part to aging and experi-
ence. If these findings hold up under the scrutiny of additional research, they have strong
implications for how to develop managers and what to be aware of when selecting from
different age groups.
Some key facts to remember are that the average manager does not improve spontane-
ously; organizations need to work at developing the skills they want. Moreover, there is a
tendency for the rich to get richer and the poor to get poorer. Developing and promoting
your best management candidates will not be enough; organizations also need to attend
to managers lower on the totem pole lest they fall into bad habits or lose their desire to
excel.

In retrospect, the MPS stimulated more than 50 years of AC practice and research with compel-
ling implications for personnel selection. At the moment, there seems to be no reason why it cannot
encourage 50 more.

REFERENCES
Arthur, W. A., Jr., Woehr, D. J., & Maldegen, R. (2000). Convergent and discriminant validity of assessment
center dimensions: A conceptual and empirical re-examination of the assessment center construct-related
validity paradox. Journal of Management, 26, 813–835.
Bray, D. W. (1976, March–April). Identifying managerial talent in women. Atlanta Economic Review, 38–43.
Bray, D. W. (1990). Reminiscing in tempo. Autobiography prepared for the Society of Industrial and
Organizational Psychology. Available online at http://www.siop.org/Presidents/Bray.aspx
Bray, D. W., Campbell, R. J., & Grant, D. L. (1974). Formative years in business: A long-term AT&T study of
managerial lives. New York, NY: Wiley.
Bray, D. W., & Grant, D. L. (1966). The assessment center in the measurement of potential for business man-
agement. Psychological Monographs, 80 (17, Whole No. 625).
Bray, D. W., & Howard, A. (1983). The AT&T longitudinal studies of managers. In K. W. Schaie (Ed.),
Longitudinal studies of adult psychological development. New York, NY: Guilford.
Frederiksen, N., Saunders, D. R., & Wand, B. (1957). The in-basket test. Psychological Monographs, 71
(Whole No. 438).
Gaugler, B. B., Rosenthal, D. B., Thornton, G. C., III, & Bentson, C. (1987). Meta-analysis of assessment
center validity. Journal of Applied Psychology, 72, 493–511.
Grant, D. L., & Bray, D. W. (1969). Contributions of the interview to assessment of managerial potential.
Journal of Applied Psychology, 53, 24–34.
Grant, D. L., Katkovsky, W., & Bray, D. W. (1967). Contributions of projective techniques to assessment of
managerial potential. Journal of Applied Psychology, 51, 226–232.
Hedge, J. W., Borman, W. C., & Lammlein, S. E. (2006). The aging workforce: Realities, myths, and implica-
tions for organizations. Washington, DC: American Psychological Association.
Howard, A. (1974). An assessment of assessment centers. Academy of Management Journal, 17, 115–134.
Howard, A. (1984, August). Cool at the top: Personality characteristics of successful executives. In P. D. Lifton
(Chair), Industrial assessment and personality psychology. Symposium conducted at the meeting of the
American Psychological Association, Toronto, Canada.
Howard, A. (1986). College experiences and managerial performance (Monograph). Journal of Applied
Psychology, 71, 530–552.
Howard, A. (1988). Who reaches for the golden handshake? The Academy of Management Executive, 11(2),
133–144.
Howard, A. (1995). A framework for work change. In A. Howard (Ed.), The changing nature of work (pp. 3–44).
San Francisco, CA: Jossey-Bass.
Howard, A. (2005). Subconscious and conscious motives in long-term managerial success. In G. P. Latham
(Chair), The effects of subconscious trait and state motivation on performance. Symposium presented
at the 20th annual conference of the Society for Industrial and Organizational Psychology, Los Angeles,
CA.
Howard, A. (2006). Best practices in leader selection. In J. A. Conger & R. E. Riggio (Eds.), The practice of
leadership: Developing the next generation of leaders. San Francisco, CA: Jossey-Bass.
Howard, A. (2008). Making assessment centers work the way they are supposed to. Industrial and Organizational
Psychology, 1, 101–107.
Howard, A., & Bray, D. W. (1988). Managerial lives in transition: Advancing age and changing times.
New York, NY: Guilford.
Howard, A., & Bray, D. W. (1990). Predictions of managerial success over long periods of time: Lessons
from the management progress study. In K. W. Clark & M. B. Clark (Eds.), Measures of leadership
(pp. 113–130). West Orange, NJ: Leadership Library of America.
International Task Force on Assessment Center Guidelines. (2000). Guidelines and ethical considerations for
assessment center operations. Public Personnel Management, 28, 315–331.
Lievens, F., & Conway, J. M. (2001). Dimensions and exercise variance in assessment center scores: A large-
scale evaluation of multitrait-multimethod studies. Journal of Applied Psychology, 86, 1202–1222.
Martin, A. H. (1941). Examination of applicants for commissioned rank. In L. Farago & L. F. Gittler (Eds.),
German psychological warfare. New York, NY: Committee for National Morale.
Murray, H. A. (1938). Explorations in personality. New York, NY: Oxford University Press.
Office of Strategic Services Assessment Staff. (1948). Assessment of men. New York, NY: Holt, Rinehart &
Winston.
Riesman, D., Glazer, N., & Denney, R. (1950). The lonely crowd: A study of the changing American character.
New Haven, CT: Yale University Press.
Rychlak, J. F., & Bray, D. W. (1967). A life-theme method for scoring of interviews in the longitudinal study of
young business managers. Psychological Reports (Monograph Supplement, 1-V21).
Thornton, G. C., III, & Byham, W. C. (1982). Assessment centers and managerial performance. New York,
NY: Academic Press.
Thornton, G. C., III, & Rupp, D. E. (2006). Assessment centers in human resource management: Strategies for
prediction, diagnosis, and development. Mahwah, NJ: Lawrence Erlbaum.
U. S. Bureau of the Census. (1975). Historical statistics of the United States: Colonial times to 1970. Washington,
DC: U.S. Department of Commerce.
Vernon, P. E., & Parry, J. B. (1949). Personnel selection in the British forces. London, England: University of
London Press.
Wallace, P. A. (Ed.). (1976). Equal employment opportunity and the AT&T case. Cambridge, MA: MIT Press.
Whyte, W. H., Jr. (1956). The organization man. New York, NY: Simon & Schuster.
40 Project A
12 Years of R & D
John P. Campbell and Deirdre J. Knapp

This chapter1 is about personnel selection and classification research on a scale never before
attempted in terms of (a) the types and variety of information collected, (b) the number of jobs that
were considered simultaneously, (c) the size of the samples, and (d) the length of time that individu-
als were followed as they progressed through the organization.
The effort, commonly known as Project A, was sponsored by the U.S. Army Research Institute
for the Behavioral and Social Sciences (ARI). For contract management reasons the research pro-
gram was conducted as two sequential projects: Project A (1982–1989) and Career Force (1990–
1994), which worked from a single overall design (described subsequently).
Collectively, these projects attempted to evaluate the selection validity and classification effi-
ciency of systematically sampled domains of prediction information for different selection and clas-
sification goals for the entire enlisted personnel system of the U.S. Army, using various alternative
decision rules (i.e., “models”). Pursuing such ambitious objectives required the development of a
comprehensive battery of new tests and inventories, the development of a wide variety of training
and job performance measures for each job in the sample, four major worldwide data collections
involving thousands of Army enlisted job incumbents for one to two days each, and the design and
maintenance of the resulting database.
The truly difficult part was the never-ending need to develop a consensus among all of the project
participants regarding literally hundreds of choices among measurement procedures, analysis meth-
ods, and data collection design strategies. Although many such decisions were made in the original
design stage, many more occurred continuously as the projects moved forward, driven by the target
dates for the major data collections, which absolutely could not be missed. The fact that all major
parts of the projects were completed within the prescribed time frames and according to the speci-
fied research design was a source of wonder for all who participated.

ORIGINS OF PROJECT A
Project A resulted from pressing organizational needs from within the Department of Defense
(DoD), and not from unsolicited proposals for research grants. The post-Vietnam development of
the all volunteer force, the comprehensive force modernization programs of the services during the
1970s, periods of high attrition, and the fallout from the misnorming of forms 6/7 of the Armed
Services Vocational Aptitude Battery (ASVAB) all created great pressure on the military services
to make certain that their selection and classification procedures were as valid as possible. The
culmination was a 1980 Congressional mandate to conduct a more thorough validation of ASVAB
for selection and classification purposes. Prior to 1980, virtually all validation evidence was based
on training grades as criteria. The DoD's formal response was the Joint-Service Job Performance
Measurement/Enlistment Standards Project (JPM).

1 Much of this chapter was drawn from Campbell and Knapp (2001) and from the technical reports generated by Project
A and Career Force. We are indebted to all of the individuals from the Army Research Institute (ARI), the Human
Resources Research Organization (HumRRO), the American Institutes for Research (AIR), the Personnel Decisions
Research Institute (PDRI), and the University of Minnesota who contributed their technical expertise, initiative and
effort, personal discipline, and peer leadership to these projects. It was the experience of a lifetime.
Project A was the Army’s contribution to JPM. Moreover, the Army viewed the Congressional
mandate as an opportunity to address a much larger set of personnel research questions than just
the validation of ASVAB against job performance. In September 1982, a contract for Project A was
signed with the Human Resources Research Organization (HumRRO) and its subcontractors, the
American Institutes for Research (AIR) and Personnel Decisions Research Institute, Inc. (PDRI).
The overall design of the Project A program was intended to be fundamentally different from
the conventional paradigm that dominated personnel research from 1906 to 1982, which consisted
of computing the correlation between a single predictor score, or a predictor composite score, and a
single criterion measure of performance obtained from a sample of job incumbents. Literally thou-
sands of such estimates have been generated over the past 100+ years (e.g., Ghiselli, 1973; Hunter
& Hunter, 1984; Nathan & Alexander, 1988; Schmidt, 1988; Schmidt & Hunter, 1998; Schmitt,
Gooding, Noe, & Kirsch, 1984).
There are probably legitimate reasons why the single investigator working to generate one bivariate
distribution at a time has served as the dominant paradigm through most of our history. For one,
the recurring problem of how best to select individuals for a particular job in a particular organiza-
tion is a very real one, and a rational management will devote resources to solving such a problem.
Also, certain structural and technological factors have worked against the establishment of long-
term coordinated research projects that dealt with large parts of the personnel system at one time.
For example, the field of industrial and organizational psychology is not very large and the supply
of research labor is limited. When the basic outline of Project A/Career Force was proposed, there
was no single organization or university group that had the resources necessary to carry it out.
Coalitions of organizations had to form. Also, until fairly recently, there were no means available
for coordinating the efforts of researchers who are geographically scattered. Neither was there a
technology for acquiring and maintaining a large central database that could be accessed and ana-
lyzed efficiently from remote locations.
In general, the dominant paradigm came to be so because of the constraints imposed by technol-
ogy, the structural characteristics of the research enterprise itself, and the contingencies built into
the reward structures for individual investigators.

ENABLING OF PROJECT A
Fortunately, along with the Army’s need to address enlisted selection and classification as a system,
there were concomitant developments in the structure and technology of the personnel research
enterprise. For example, advances in computerized database management and electronic communi-
cation made it possible to design, create, edit, update, and maintain a very large database that could
be accessed from anywhere. What is routine now was new and liberating in 1982.
Advances in computerization also permitted use of new testing technologies, and the develop-
ment of powerful linear programming algorithms made the estimation of classification efficiency
and the comparison of alternative selection/classification strategies using the entire Army database
a very manageable analytic problem. Certainly, the development of confirmatory techniques within
the general domain of multivariate analysis models opened up several powerful strategies for gen-
eralizing research findings from a sample of jobs to a population of jobs and from the specific mea-
sures that were used to a latent structure of constructs.
Finally, the realization in industrial and organizational psychology during the 1970s that one of
our fundamental tasks is to learn things about an appropriately defined population, and not to learn
more and more specific things about specific samples, changed the field’s approach to the estimation
of selection validity and classification efficiency. Meta-analysis and corrections for attenuation and
restriction of range were no longer novel games to play. They were a necessary part of statistical
estimation.
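For readers less familiar with these corrections, the standard textbook formulas are brief: the correction for attenuation divides an observed validity coefficient by the square root of the relevant reliabilities, and Thorndike's Case II formula adjusts for direct range restriction on the predictor. The sketch below applies both to purely illustrative numbers; the values are assumptions, not Project A estimates.

    from math import sqrt

    def correct_for_attenuation(r_xy, r_yy, r_xx=1.0):
        """Classical correction for unreliability: divide by sqrt of the reliabilities."""
        return r_xy / sqrt(r_xx * r_yy)

    def correct_for_range_restriction(r, u):
        """Thorndike Case II: u = SD(unrestricted) / SD(restricted) on the predictor."""
        return (u * r) / sqrt(1 + r * r * (u * u - 1))

    observed = 0.25                                          # hypothetical observed validity
    step1 = correct_for_range_restriction(observed, u=1.5)   # ~0.36
    step2 = correct_for_attenuation(step1, r_yy=0.70)        # ~0.43
    print(round(step1, 2), round(step2, 2))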
In sum, the intent was to design a research program that would be directly useful for meeting the
system’s needs, both as they existed initially and as changes took place. Simultaneously, everyone
hoped that by considering an entire system and population of jobs at once, and by developing mea-
sures from a theoretical/taxonomic base, the science of industrial and organizational psychology
would also be served.

SPECIFIC RESEARCH OBJECTIVES


The objectives were ambitious and spanned a continuum from operational/applied concerns to more
theoretical interests. They are summarized as follows:

1. Identify the constructs that constitute the universe of information available for selection/
classification into entry-level skilled jobs, given no prior job experience, and develop pre-
dictor measures for those constructs identified as “best bets.”
2. Develop multiple measures of entry-level and noncommissioned officer (NCO) job
performance.
3. Develop a general model of performance for entry-level skilled jobs and for NCO jobs.
4. Validate existing selection measures (i.e., ASVAB) against training and job performance.
5. On the basis of the “best bet” constructs, validate a battery of new predictor measures.
6. Estimate validity and incremental validity of the new predictors.
7. Estimate the degree of differential prediction across (a) major domains of predictor infor-
mation, (b) major factors of job performance, and (c) different types of jobs.
8. Develop new analytic methods to evaluate optimal selection and classification.
9. Compare the marginal utility of performance across jobs.
10. Develop a fully functional research database that includes all archival research data on the
three cohorts of new Army accessions included in the research program.

OVERALL RESEARCH DESIGN


The first 6 months of the project were spent developing a final detailed version of the operative
research plan, which was published in 1983 as ARI Research Report No. 1332, Improving the
Selection, Classification, and Utilization of Army Enlisted Personnel: Project A Research Plan.

SAMPLING JOBS (MOS)


In 1982 the population of enlisted jobs included approximately 275 different Military Occupational
Specialties (MOS), and the entire enlisted force was approximately 800,000. Because data could
not be collected from all of them, MOS were sampled representatively after considering the tradeoff
between the number of jobs to be researched and the number of individuals sampled from each
job. Cost considerations dictated that 18–20 MOS could be studied if the initial goal was 500 job
incumbents per job, and this assumed that a full array of job-specific performance measures would
be developed for only a subset of those MOS.
An initial sample of 19 MOS was drawn on the basis of the following considerations:

1. High-density jobs that would provide sufficient sample sizes


2. Representation of the Career Management Fields (CMF) into which MOS are organized
3. Representation of the jobs judged most crucial to the Army’s missions

The initial set of 19 MOS included only 5% of Army jobs but represented 44% of the soldiers
recruited in FY81. An independent (i.e., without considering the CMF designation) cluster analysis
of MOS (based on task content similarity) was carried out via Army subject matter expert (SME)
judgments to evaluate the representativeness of the sample of 19 and to make changes in the
composition of the sample if it was judged appropriate to do so. Two jobs were added, and Table 40.1
shows the resulting MOS (n = 21) that were studied over the course of Project A. “Batch A” MOS
received the most attention in that soldiers in these jobs were administered a full array of first- and
second-tour job performance measures, including hands-on work sample tests, written job knowledge
tests, and Army-wide and MOS-specific ratings. Soldiers in “Batch Z” were not measured as
extensively with regard to the job performance criterion measures.

TABLE 40.1
Sampled Military Occupational Specialties (MOS)
Batch A: 11B Infantryman; 13B Cannon crewman; 19E/K Armor tank crewman; 31C Single-channel
radio operator; 63B Light-wheel vehicle mechanic; 71L Administrative specialist; 88M Motor
transport operator; 91A/B Medical specialist/medical NCO; 95B Military police
Batch Z: 12B Combat engineer; 16S MANPADS crewman; 27E Tow/dragon repairer; 29E
Communications-electronics radio repairer; 51B Carpentry/masonry specialist; 54B NBC specialist;
55B Ammunition specialist; 67N Utility helicopter repairer; 76Y Unit supply specialist; 94B Food
service specialist; 96B Intelligence analyst

DATA COLLECTION DESIGN


The basic design framework and major samples are depicted in Figure 40.1. The design encom-
passed two major cohorts, each of which was followed into its second tour of duty (i.e., enlistment
term); collectively, the two cohorts produced six major research samples.
terion measures administered during the major phases of this research involved dozens of additional
smaller data collection efforts as well, for purposes of pilot and field-testing. Each of the six major
data collections is briefly characterized below.

Concurrent Validation (CVI) Sample


This sample was drawn from soldiers who had entered the Army between July 1983 and June 1984
and had served 18–24 months. Data were collected from soldiers and their supervisors at 13 posts
in the United States and at multiple locations in Germany. Batch A soldiers (see Table 40.1) were
assessed for 1.5 days on the first-tour job performance measures and for a half-day on the new
predictor measures. Batch Z soldiers were tested for a half-day on a subset of the performance mea-
sures and a half-day on the new predictors.

Longitudinal Validation Predictor (LVP) Sample


Virtually all new recruits who entered the Army into one of the sampled MOS from August 1986
through November 1987 were assessed on the 4-hour Experimental Battery (to be described) within
2 days of first arriving.

Longitudinal Validation End-of-Training (LVT) Sample


End-of-training performance measures were administered to those individuals in the LVP sample
who completed advanced individual training (AIT), which could take from 2–6 months, depending
on the MOS.
FIGURE 40.1 Project A: Overall data collection design and major research samples. The figure charts the
1983–1993 timeline for the cohorts studied: file data analyses of the 81/82 cohort; the 83/84 concurrent
cohort, assessed with the trial predictor battery and first-tour performance measures (CVI, N = 9,500) and
later with second-tour performance measures (CVII, N = 1,000); and the 86/87 longitudinal cohort, assessed
with the Experimental Battery (LVP, N = 45,000), end-of-training performance measures (LVT, N = 30,000),
first-tour performance measures (LVI, N = 10,000), and second-tour performance measures (LVII, N = 1,500).

Longitudinal Validation First-Tour (LVI) Sample


Individuals in the 86/87 cohort who were measured with the Experimental Battery, completed train-
ing, and remained in the Army were assessed with the first-tour job performance measures after
serving 18–24 months. Data were collected from 13 posts in the United States and multiple loca-
tions in Europe.

• Concurrent validation second-tour (CVII) sample: The same teams that administered the
first-tour performance measures to the LVI sample administered second-tour performance
measures at the same location and during the same time periods to a sample of junior
NCOs from the 83/84 cohort who were in their second tour of duty (4–5 years of service).
• Longitudinal validation second-tour (LVII) sample: This sample included members of the
86/87 cohort from the Batch A MOS who were part of the LVP (predictors), LVT (training
performance measures), and LVI (first-tour job performance measures) samples and who
reenlisted for a second tour (6–7 years of total service). The second-tour (NCO) perfor-
mance measures were administered at 15 U.S. posts, multiple locations in Germany, and
two locations in Korea.

RESEARCH INSTRUMENT DEVELOPMENT: PREDICTORS


A major objective was to develop an experimental battery of new tests that had maximum poten-
tial for enhancing selection and classification decisions for the entire enlisted personnel system.
Consequently, rather than basing the selection of predictor constructs on job analyses of the
specific occupations in question (although we subsequently conducted such analyses for purposes of
criterion development), the general strategy was to identify a universe of potential predictor constructs for
the population of enlisted MOS and then sample appropriately from it. The appropriate constructs
were those that were judged to maximize the expected linkages with the population of perfor-
mance components, not just the performance components encompassed by the 21 MOS in the
sample. The next step was to construct measures for each sampled construct that was above a
threshold of criticality, given the measurement purposes. Use of available commercial measures
was not considered given the requirements of the military enlistment testing system, although
some such tests (e.g., subtests from the Employee Aptitude Survey, Differential Aptitude Test)
were used as marker measures in the development of the new measures. Accordingly, the intent
was to develop a predictor battery that was maximally useful for selection and classification into
an entire population of jobs, and that provided maximal incremental information beyond that
provided by the ASVAB.

CRITICAL CONSTRUCTS
After a thorough literature review (including available meta-analyses), the research team identified
a list of 53 potentially useful predictor variables. A sample of 35 personnel selection experts was
then asked to estimate the expected correlations between each predictor construct and an array of
potential performance factors. All of the information was then used to arrive at a final set of vari-
ables (k = 25) for which new measures would be constructed.
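The sketch below gives a simplified picture of how such expert linkage judgments might be aggregated to flag "best bet" constructs. The construct names, performance factor labels, judged values, and the mean-and-maximum summary rule are hypothetical conveniences for illustration; they are not the actual Project A judgment data or decision rule.

    import numpy as np

    # Rows: candidate predictor constructs; columns: performance factors.
    # Cell = mean expected validity across the expert panel (illustrative values only).
    constructs = ["spatial ability", "achievement orientation", "realistic interests"]
    factors = ["technical proficiency", "effort and leadership", "personal discipline"]
    expected_r = np.array([[0.35, 0.10, 0.05],
                           [0.15, 0.30, 0.25],
                           [0.20, 0.10, 0.05]])

    # Summarize each construct by its average and its strongest single linkage, then rank;
    # constructs clearing some criticality threshold would become "best bets."
    mean_link = expected_r.mean(axis=1)
    max_link = expected_r.max(axis=1)
    for name, m, mx in sorted(zip(constructs, mean_link, max_link), key=lambda t: -t[1]):
        print(f"{name:24s} mean = {m:.2f}   max = {mx:.2f}")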
The measure development process included (a) a large-scale field test that also administered a set
of established marker tests for several of the constructs (see Peterson et al., 1990); (b) development
of software for a computerized battery of perceptual/psychomotor tests, as well as a portable test-
ing station and a special response pedestal permitting various responses (e.g., one-hand tracking,
two-hand coordination); (c) several paper-and-pencil cognitive tests; and (d) two inventories. One
inventory assessed relevant vocational interests, and the second focused on major dimensions of
personality and biographical history. This process resulted in the following experimental predictor
battery that was first used in the Concurrent Validation (CVI) in 1985.

Experimental Cognitive Ability Tests: Paper and Pencil


This section presents a description of the tests grouped by the construct they were intended to
measure.

Spatial visualization—rotation
Assembling objects test: Select the alternative that depicts the components assembled
correctly.
Object rotation test: Is the figure represented the same as the test object, only rotated?
Spatial visualization—scanning
Maze test: Determine which of the four entrances leads to a correct pathway.
Spatial orientation
Orientation test: Rotate the frame to match correct orientation of a picture.
Map test: From directions to only one landmark infer directions to another.
Reasoning test: Given a series of figures, identify the figure that should appear next.

Experimental Cognitive Ability Tests: Computer-Based


All data for these tests were collected using the custom fabricated response pedestal.

Reaction Time (Processing Efficiency)


Simple reaction time 1: Mean decision time (the time between the appearance of the stimulus
and the removal of the subject’s hand from the home button) and movement time (total time
to strike the response button minus decision time) were computed; a brief scoring sketch
follows this list of computer-based tests.
Choice reaction time 2: Making correct choices from two alternatives.


Short-term memory
Memory search test: Memory for previous displays of letters or numbers.
Perceptual speed and accuracy
Perceptual speed and accuracy test: Percent correct and mean decision time for compari-
son of two visual stimuli presented simultaneously (same or different?).
Target identification test: Item shows a target at the top of the screen and three color-
labeled stimuli near the bottom. Identify which stimulus represents the same object as
the “target.” Percent correct and mean decision times are computed.

Psychomotor Precision
Psychomotor precision encompasses two of the ability constructs identified by Fleishman and his
associates: control precision and rate control (Fleishman, 1967).

Target tracking test 1: As a target moves at a constant rate along a path consisting of horizontal
and vertical lines, a single joystick is used to keep crosshairs centered on the target. The mean
distance from the crosshairs to the center of the target, computed several times each second,
constitutes overall accuracy.
Target shoot test: A target moves in an unpredictable manner. A joystick is used to move the
crosshairs into the center of the target and “fire.” Scored on accuracy and “time to fire.”
Multilimb coordination
Target tracking test 2: The subject manipulates two sliding resistors to control movement of
the crosshairs: one in the horizontal plane and the other in the vertical plane.
Number operations
Number memory test: Successive arithmetic operations appear on the screen until a solution is
presented and the subject must indicate whether the solution presented is correct.
Movement judgment
Cannon shoot test: A “cannon” appears and can “fire” a shell, which travels at a constant
speed, at a moving target. Subject must fire so that the shell intersects the target.
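As noted above for the reaction time tests, decision time and movement time are simple arithmetic on the trial timestamps. The sketch below computes both from hypothetical trials (stimulus onset, release of the home button, press of the response button); the millisecond values are invented for illustration.

    from statistics import mean

    # Each trial: (stimulus_onset, home_button_release, response_button_press), in milliseconds.
    trials = [(0, 260, 410), (0, 245, 400), (0, 280, 455)]

    decision_times = [release - onset for onset, release, _press in trials]
    movement_times = [press - release for _onset, release, press in trials]  # total time minus decision time

    print("mean decision time (ms):", round(mean(decision_times)))  # 262
    print("mean movement time (ms):", round(mean(movement_times)))  # 160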

Personality/Temperament and Biographical Measures


The biographical and temperament/personality variables were incorporated in an inventory titled
the Assessment of Background and Life Experiences (ABLE). The specific scales and the
composites into which they are grouped are shown in Table 40.2.

Interest (AVOICE) Factors/Scales


The Vocational Interest Career Examination was originally developed by the Air Force. That inven-
tory served as the starting point for the AVOICE (Army Vocational Interest Career Examination).
The intent for the AVOICE was to sample content from all six of the constructs identified in
Holland’s (1966) hexagonal model of interests, as well as to provide coverage of the vocational
areas most important in the Army. The 22 scales assessed by the AVOICE, and the composites into
which they are grouped, are shown in Table 40.2. The scales can also be sorted into the Holland
factors.

Measurement of Outcome Preferences


On the basis of the extensive literature on job outcomes provided by studies of job satisfaction and
work motivation, an inventory was developed that asked the respondent to rate the strength of
his or her preferences for certain job outcomes (e.g., rewards) on a seven-point scale. The final form
of the inventory was titled the Job Orientation Blank (JOB). Factor analyses of the field test data
suggested that the JOB’s 29 items could be grouped into six factors, which were grouped into three
composites, as shown in Table 40.2.
TABLE 40.2
Experimental Battery: Composite Scores and Constituent Basic Scores

ASVAB Composites (4)
   Quantitative: Math knowledge; Arithmetic reasoning
   Technical: Auto shop; Mechanical comprehension; Electronics information
   Speed: Coding speed; Number operations
   Verbal: Word knowledge; Paragraph comprehension; General science

Spatial Test Composite (1)
   Assembling objects test; Object rotation test; Maze test; Orientation test; Map test; Reasoning test

Computer-Administered Test Composites (8)
   Psychomotor: Target tracking 1 distance; Target tracking 2 distance; Cannon shoot time score; Target shoot distance
   Movement time: Pooled movement time
   Perceptual speed: Perceptual speed & accuracy (DT); Target identification (DT)
   Basic speed: Simple reaction time (DT); Choice reaction time (DT)
   Perceptual accuracy: Perceptual speed & accuracy (PC); Target identification (PC)
   Basic accuracy: Simple reaction time (PC); Choice reaction time (PC)
   Number speed and accuracy: Number speed (operation DT); Number memory (PC)
   Short-term memory: Short-term memory (PC); Short-term memory (DT)

ABLE Composites (7)
   Achievement orientation: Self-esteem; Work orientation; Energy level
   Leadership potential: Dominance
   Dependability: Traditional values; Conscientiousness; Nondelinquency
   Adjustment: Emotional stability
   Cooperativeness: Cooperativeness
   Internal control: Internal control
   Physical condition: Physical condition

AVOICE Composites (8)
   Rugged outdoors: Combat; Rugged individualism; Firearms enthusiast
   Audiovisual arts: Drafting; Audiographics; Aesthetics
   Interpersonal: Medical services; Leadership guidance
   Skilled/technical: Science/chemical; Computers; Mathematics; Electronic communication
   Administrative: Clerical/administrative; Warehouse/shipping
   Food service: Food service professional; Food service employee
   Protective services: Fire protection; Law enforcement
   Structural/machines: Mechanics; Heavy construction; Electronics; Vehicle operator

JOB Composites (3)
   High job expectations: Pride; Job security; Serving others; Ambition
   Job routine: Routine
   Job autonomy: Autonomy

Note: DT, decision time; PC, proportion correct.

Basic Predictor Composite Scores


The ASVAB together with the experimental tests produced a set of 72 scores. This number was
too large for validation analyses, such as multiple regression, that capitalize on idiosyncratic
sample characteristics. Therefore, a series of analyses was conducted to determine a smaller set of
predictor composite scores that would preserve the heterogeneity of the full set of basic scores to
the greatest extent possible. These analyses included exploratory factor analyses and confirmatory
factor analyses guided by considerable prior theory and empirical evidence (McHenry, Hough,
Toquam, Hanson, & Ashworth, 1990; Peterson et al., 1990). A final set of 31 composites was identi-
fied and is shown in Table 40.2.
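The general logic of that reduction can be shown in a small sketch: standardize each basic score and unit-weight the standardized scores within a factor-analysis-guided grouping. The data are simulated, the grouping reproduces only a fragment of Table 40.2, and unit weighting is an illustrative assumption rather than a claim about the exact scoring rules adopted in Project A.

    import numpy as np

    rng = np.random.default_rng(2)
    n_soldiers = 500

    # Hypothetical basic scores (names follow two of the Table 40.2 groupings).
    basic = {name: rng.normal(size=n_soldiers)
             for name in ["target_tracking_1", "target_tracking_2", "cannon_shoot_time",
                          "target_shoot", "simple_rt_dt", "choice_rt_dt"]}

    groups = {"psychomotor": ["target_tracking_1", "target_tracking_2",
                              "cannon_shoot_time", "target_shoot"],
              "basic_speed": ["simple_rt_dt", "choice_rt_dt"]}

    def zscore(x):
        return (x - x.mean()) / x.std()

    # Unit-weighted composite = mean of the standardized constituent scores.
    composites = {name: np.mean([zscore(basic[b]) for b in members], axis=0)
                  for name, members in groups.items()}
    print({name: scores.shape for name, scores in composites.items()})  # one score per soldier per composite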
Collectively, the Experimental Battery and the ASVAB were intended to be a comprehensive
and representative sample of predictor measures from the population of individual differences that
are relevant for personnel selection and classification, and which can be measured in a standardized
fashion at the time of organizational entry.

JOB ANALYSES AND CRITERION DEVELOPMENT


JOB ANALYSES
In contrast to the predictors, virtually all criterion development in Project A/Career Force was based
on extensive job analyses, including task descriptions, critical incident analysis, and interviews with
Army SMEs. Relevant job manuals and available Army Occupational Survey results were used
to enumerate the complete population of major tasks (100–150) for each MOS. The tasks for each
MOS were then grouped into clusters and rated for criticality and difficulty by panels of SMEs.
Additional panels of Army SMEs were used in a workshop format to generate approximately 700
to 800 critical incidents of effective and ineffective performance per MOS that were specific to each
MOS and approximately 1,100 critical incidents that could apply to any MOS. For the MOS-specific
and Army-wide critical incidents, a retranslation procedure was carried out to establish dimensions
of performance.
Together, the task descriptions and critical incident analyses of MOS-specific and Army-wide
performance were intended to produce a detailed content description of the major components of
performance in each MOS and to provide the basis for the development of the performance criterion
measures.
The job analysis goals for the second tour included the description of the major differences in
technical task content between first and second tour and the description of the leadership/supervi-
sion components of the junior NCO position. The task analysis and critical incident steps used for
first tour were also used for second tour. In addition, a special 46-item job analysis instrument, the
Supervisory Description Questionnaire, was constructed and used to collect item criticality judg-
ments from SMEs. Consequently, the supervisory/leadership tasks judged to be critical for an MOS
became part of the population of tasks for that MOS.

PERFORMANCE CRITERIA
The general goals of training performance and job performance measurement were to define, or
model, the total domain of performance in some reasonable way and then develop reliable and valid
measures of each major factor. The general procedure for criterion development followed a basic
cycle of a comprehensive literature review, initial instrument construction based on the job analyses
previously described, pilot testing, instrument revision, field-testing, and proponent (i.e., manage-
ment) review. The specific measurement goals were as follows:

1. Develop standardized measures of training achievement for the purpose of determining the
relationship between training performance and job performance.
2. Make a state-of-the-art attempt to develop job sample or “hands-on” measures of job
task proficiency.
3. Develop written proceduralized knowledge measures of job task proficiency.
4. Develop rating scale measures of performance factors that are common to all first-tour
enlisted MOS (Army-wide measures), as well as for factors that are specific to each MOS.
5. Compare hands-on measurement to paper-and-pencil tests and rating measures of proficiency
on the same tasks (i.e., a multitrait, multimethod approach).
6. Evaluate existing administrative records as possible indicators of job performance.

INITIAL THEORY
Criterion development efforts were guided by a model that viewed performance as truly multidi-
mensional. For the population of Army entry-level enlisted positions, there were two major types
of performance components: (a) those that reflect specific technical tasks or specific job behaviors
that are not required for other jobs and (b) components that are defined and measured in the same
way for every job (i.e., Army-wide), such as first aid and basic weapons proficiency, contributions to
teamwork, continual self-development, support for the norms and customs of the organization, and
perseverance in the face of adversity. The working model of total performance with which Project
A began viewed performance as multidimensional within these two broad categories.

TRAINING PERFORMANCE MEASURES


Because a major program objective was to determine the relationships between training perfor-
mance and job performance, a comprehensive training achievement test was constructed for each
MOS on the basis of matching the previously determined content of the critical job tasks for each
MOS to the program of instruction (POI). For the POI content judged to be reflective of critical job
tasks, items were written to represent the proceduralized knowledge of how to do a task.
After pilot testing, revision, and review, the result was a 150- to 200-item training achievement test
for each MOS. Rating scales were also developed for completion by peers and drill instructors at
the end-of-training (EOT).

First-Tour (Entry-Level) Performance Measures


For first-tour performance criteria development, the task-based information was used to develop
standardized hands-on job samples, paper-and-pencil job knowledge tests, and rating scales for
each Batch A MOS. Roughly 30 critical tasks per MOS were covered by the written job knowledge
tests and rating scales, and about one-half of those tasks were also tested using a hands-on format.
For the hands-on simulations, each examinee passed through a testing station for each of the 15 ±
2 major job tasks and was asked to perform a standardized simulation of the task, using real equip-
ment, if at all possible.
From the critical incident analyses, a modified behaviorally anchored rating scale procedure
was used to construct six to nine rating scales for performance factors specific to a particular job
and also for ten performance factors that were defined in the same way and relevant for all jobs.
The critical incident procedure was also used with workshops of combat veterans to develop rating
scales of predicted combat effectiveness because, except in the late stages of the project, soldiers in
our samples did not have combat experience. Ratings were gathered from peers and supervisors of
first-tour soldiers. Data collection activities included a comprehensive rater training program.
The final category of criterion measures was produced by a search of the Army’s administrative
records for potential performance measures, which yielded several promising indicators that were
collected largely through self-report.

Second-Tour (NCO) Measures


The job analyses of the second-tour jobs indicated considerable overlap between first- and second-
tour technical task content, although NCOs were expected to perform at somewhat higher levels.
Consequently, although there were some differences in technical tasks selected for testing, the first-
tour technical performance measures were generally used to measure second-tour performance as
well. The more substantive differences occur because during their second tour, soldiers begin to
take on substantial and critical leadership responsibilities.
The second-tour job analysis results identified six additional MOS-specific leadership dimen-
sions and three Army-wide leadership dimensions. A set of supervisory performance rating scales
was created to measure the following dimensions: acting as a role model, communication, per-
sonal counseling, monitoring subordinate performance, organizing missions/operations, personnel
administration, and performance counseling/correcting. Because it proved infeasible to collect peer
ratings from second-tour soldiers in CVII, only supervisor ratings were collected in LVII.
On the basis of a review of the literature and consideration of feasibility, two additional methods
were developed for assessing NCO performance. The first was a set of assessment center-like role-
play exercises, and the second was a written situational judgment test.

Supervisory Role-Play Exercises


Role-play exercises were developed to simulate three of the critical and distinct NCO supervisory
tasks.

1. Counseling a subordinate with personal problems that affect performance
2. Counseling a subordinate with a disciplinary problem
3. Conducting one-on-one remedial training

In each simulation, the examinee played the role of a supervisor. A trained confederate played the subordinate and also scored the examinee's performance on several specific dimensions within each of the three major tasks.

Situational Judgment Test


The purpose of the situational judgment test (SJT) was to evaluate the effectiveness of examinees' judgments about how to react in typical supervisory problem situations. A critical incident method-
ology was used to generate the situations, and response options were developed through input from
senior NCO SMEs and from examinees during the field tests. Independent groups of SMEs scaled
the response options in terms of effectiveness, and examinees selected the options they believed
would be most and least effective.
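
The chapter does not report the scoring algorithm, but a common way to score most/least-effective SJT responses, consistent with the description above, can be sketched as follows. This is an illustrative sketch only; the items, option effectiveness values, and the specific scoring rule are hypothetical rather than Project A's operational procedure.

```python
# Illustrative sketch (not Project A's operational scoring): score a most/least SJT
# item as the SME-scaled effectiveness of the "most effective" pick minus that of
# the "least effective" pick, summed across items.
from typing import Dict, List

def score_sjt(item_effectiveness: List[Dict[str, float]],
              responses: List[Dict[str, str]]) -> float:
    """Return a total SJT score across items.

    item_effectiveness: one dict per item mapping option label -> mean SME
        effectiveness rating (hypothetical values below).
    responses: one dict per item with the examinee's 'most' and 'least' picks.
    """
    total = 0.0
    for options, resp in zip(item_effectiveness, responses):
        total += options[resp["most"]] - options[resp["least"]]
    return total

# Hypothetical example: two items with four response options each.
effectiveness = [
    {"A": 4.6, "B": 3.1, "C": 2.2, "D": 1.4},
    {"A": 1.8, "B": 4.2, "C": 3.5, "D": 2.6},
]
answers = [{"most": "A", "least": "D"}, {"most": "C", "least": "A"}]
print(round(score_sjt(effectiveness, answers), 1))  # 3.2 + 1.7 = 4.9
```
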

MODELING THE LATENT STRUCTURE OF PERFORMANCE


As detailed above, there were three distinct performance domains—training performance, first-tour
job performance, and second-tour job performance—and there were many more individual scores
for each person than there were on the predictor side (e.g., 150+ scores for first-tour performance).
Depending on the instrument, either expert judgment or exploratory factor analysis/cluster analysis
was used to identify “basic” composite scores that reduced the number of specific individual scores
but attempted to minimize the loss of information. These analyses resulted in 24–28 basic criterion
scores for each job, which was still too many for validation purposes.
The next step used all available expert judgment to postulate a set of alternative a priori factor
models of the latent structure underlying the covariances among the basic scores. These alternative
models were then subjected to a confirmatory analysis using LISREL. The first confirmatory test
used the covariance matrix estimated on the CVI sample to evaluate the relative accuracy of fit of
the alternative models proposed by the psychologist SMEs. The model that best fit the concurrent
sample data was evaluated again by fitting it to the LVI sample data. This was a true confirmatory
test. The same procedure was used to determine the best fitting model in the longitudinal sample
(LVI), from among a new set of a priori alternatives proposed by SMEs, and then evaluate it again
on the CVI data. A similar kind of double cross-validation procedure was followed to test the fit of
alternative factor models to the basic criterion score covariances estimated from the concurrent and

longitudinal second-tour samples (CVII and LVII). For both the first-tour (entry-level enlisted) and second-tour (junior NCO) analyses, the best-fitting models determined independently in the concurrent and longitudinal samples were virtually identical. Also, the best-fitting model in one sample (i.e., concurrent or longitudinal) fit the data equally well in the other sample.
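
For readers who want a concrete picture of the double cross-validation logic, the sketch below mimics it using the third-party semopy package as a rough stand-in for the LISREL analyses. The variable names, the two candidate models, and the synthetic data are hypothetical and are not the Project A scores or models.

```python
# Minimal sketch of the double cross-validation logic using semopy (a Python SEM
# package) in place of LISREL. Variable names, candidate models, and data are
# hypothetical; Project A fit its models to covariances among basic criterion scores.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(0)

def simulate_scores(n):
    """Generate toy 'basic criterion scores' with an underlying two-factor structure."""
    tech, effort = rng.normal(size=n), rng.normal(size=n)
    return pd.DataFrame({
        "hands_on": tech + rng.normal(scale=0.6, size=n),
        "job_know": tech + rng.normal(scale=0.6, size=n),
        "awards":   effort + rng.normal(scale=0.8, size=n),
        "ratings":  effort + rng.normal(scale=0.8, size=n),
    })

cvi, lvi = simulate_scores(400), simulate_scores(400)  # stand-ins for the CVI and LVI samples

candidate_models = {
    "one_factor": "perf =~ hands_on + job_know + awards + ratings",
    "two_factor": ("tech =~ hands_on + job_know\n"
                   "effort =~ awards + ratings"),
}

def fit_stats(desc, data):
    model = semopy.Model(desc)
    model.fit(data)
    return semopy.calc_stats(model)  # chi-square, CFI, RMSEA, and other fit indices

# Step 1: compare the a priori alternative models in the "concurrent" sample.
for name, desc in candidate_models.items():
    print(name, "CVI fit:\n", fit_stats(desc, cvi).T, "\n")

# Step 2: the best-fitting model is then refit, unchanged, to the "longitudinal"
# sample (and vice versa) -- the double cross-validation described above.
print("two_factor LVI fit:\n", fit_stats(candidate_models["two_factor"], lvi).T)
```
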
Because there were far fewer criterion measures, a similar confirmatory procedure was not used
to model training performance. Instead, expert judgment was used to group the training perfor-
mance criteria into composites that paralleled the latent factors in the first-tour and second-tour
performance models. The expert judgment based factors were then checked against the criterion
intercorrelation matrix estimated from the training validation (LVT) sample. The prescribed model
fit the data better than any alternative.
In summary, the modeling analyses produced a specification of the factor scores (i.e., latent
variables) that defined the latent structure of performance at each of three organizational levels, or
career stages: EOT performance, first-tour job performance, and second-tour job performance. It
was the performance factor scores at each of these three points that constituted the criterion scores
for all subsequent validation analyses.

A MODEL OF TRAINING PERFORMANCE


As noted previously, the EOT performance measures were intended to parallel the Army-wide rating
scales and job knowledge tests used for first-term job incumbents in the same MOS. Consequently,
the training achievement tests constructed for each MOS covered Army-wide basic training content
and MOS-specific technical content. Of the ten Army-wide first-term rating scales, three had no
EOT counterpart. The remaining seven were grouped into clusters to parallel the first-tour ratings
factors. A leadership potential scale was added. The six scores obtained from the EOT measures
are shown in Figure 40.2.

A MODEL OF FIRST-TOUR JOB PERFORMANCE


The result of the confirmatory factor analyses described earlier was a five-factor model of first-term (entry-level) performance that was very robust across samples. The definitions of the factors are provided below, and the basic criterion scores that compose them are shown in Figure 40.3.

1. Core Technical Proficiency (CTP): Represents the proficiency with which the soldier per-
forms the tasks that are “central” to the MOS. These tasks represent the core of the job
and are its primary definers. This construct does not include the individual’s willingness to
perform the task.
2. General Soldiering Proficiency (GSP): In addition to the technical content specific to an
MOS, individuals in every MOS also are responsible for being able to perform a variety of
Army-wide tasks (e.g., first aid, land navigation).
3. Effort and Leadership (ELS): This construct reflects the degree to which the individual
exerts effort over the full range of job tasks, perseveres under adverse or dangerous condi-
tions, and demonstrates leadership and support toward peers.

From the EOT achievement test


1. Technical content score (TECH)
2. Army-wide basic training content total score (BASC)
From the EOT rating scales
3. Technical achievement and effort (ETS)
4. Maintaining personal discipline (MPD)
5. Physical fitness and military bearing (PFB)
6. Leadership potential (LDR)

FIGURE 40.2 A model of training performance and associated scores.



First-tour job performance

1. Core technical proficiency (CTP)
   • Hands-on test–MOS-specific tasks
   • Job knowledge test–MOS-specific tasks
2. General soldiering proficiency (GSP)
   • Hands-on test–common tasks
   • Job knowledge test–common tasks
3. Effort and leadership (ELS)
   • Admin: number of awards and certificates
   • Army-wide rating scales: overall effectiveness rating scale
   • Army-wide rating scales: effort/leadership ratings factor
   • Average of MOS-specific rating scales
4. Maintaining personal discipline (MPD)
   • Admin: number of disciplinary actions
   • Admin: promotion rate score
   • Army-wide rating scales: personal discipline ratings factor
5. Physical fitness and military bearing (PFB)
   • Admin: physical readiness test score
   • Army-wide rating scales: physical fitness/bearing ratings factor

Second-tour (NCO) job performance

1. Core technical proficiency (CTP)
   • Hands-on test–MOS-specific tasks
   • Job knowledge test–MOS-specific tasks
2. General soldiering proficiency (GSP)
   • Hands-on test–common tasks
   • Job knowledge test–common tasks
3. Achievement and effort (AE)
   • Admin: number of awards and certificates
   • Army-wide rating scales: overall effectiveness rating scale
   • Army-wide rating scales: technical skill/effort ratings
   • Average of MOS-specific rating scales
   • Average of combat performance prediction rating scales
4. Maintaining personal discipline (MPD)
   • Admin: number of disciplinary actions
   • Army-wide rating scales: personal discipline ratings factor
5. Physical fitness and military bearing (PFB)
   • Admin: physical readiness test score
   • Army-wide rating scales: physical fitness/bearing ratings factor
6. Leadership (LDR)
   • Admin: promotion rate score
   • Army-wide rating scales: leading/supervising ratings factor
   • Individual scores from each of the three role plays
   • Situational judgment test–total score

FIGURE 40.3 Models of first- and second-tour job performance and associated scores.

4. Maintaining Personal Discipline (MPD): MPD reflects the degree to which the individual
adheres to Army regulations and traditions, exercises personal self-control, demonstrates
integrity in day-to-day behavior, and does not create disciplinary problems.
5. Physical Fitness and Military Bearing (PFB): PFB represents the degree to which the
individual maintains an appropriate military appearance and stays in good physical
condition.

Note that the first two factors are represented by the hands-on work sample test and the job knowl-
edge tests, whereas the last three factors are each represented by rating scales and administrative
measures. Again, this factor solution represented the best fitting a priori model in the concurrent
and longitudinal cohort samples and was cross-validated from one cohort to the other with no loss
in the accuracy of fit. It was a very stable representation. It also led to further developments in per-
formance modeling, most notably the eight-factor model described by J. P. Campbell and his col-
leagues (e.g., Campbell, McCloy, Oppler, & Sager, 1993) and the core technical versus contextual
performance model proposed by Borman and Motowidlo (1993).

A MODEL OF SECOND-TOUR PERFORMANCE


A confirmatory factor analysis procedure similar to that used for the first-tour analysis yielded
a six-factor model of NCO performance. The sixth factor represents the leadership supervisory

component. The other five factors are very similar in content to the first-term model, as shown in Figure 40.3, which presents the factors and associated basic criterion scores for the first- and second-term models. Elements of the two models that differ can be identified by comparing the corresponding factors across the two models.

CORRELATIONS OF PAST PERFORMANCE WITH FUTURE PERFORMANCE


The longitudinal component of the Project A design provided an opportunity to collect performance
data on the same people at three points in time: (a) at the end of training (LVT), (b) during the first
tour of duty (LVI), and (c) during the second tour of duty (LVII). It was virtually an unparalleled
opportunity to examine the consistencies in performance over time from the vantage point of mul-
tiple jobs, multiple measures, and a substantive model of performance itself.
The question of whether past performance predicts future performance encompasses at least two specific issues. First, the degree to which individual dif-
ferences in future performance can be predicted from individual differences in past performance
is a function of the relative stability of performance across time. Do the true scores for individuals
change at different rates even when all individuals are operating under the same “treatment” condi-
tions? The arguments over this question sometimes become a bit heated (Ackerman, 1989; Austin,
Humphreys, & Hulin, 1989; Barrett & Alexander, 1989; Barrett, Caldwell, & Alexander, 1985;
Henry & Hulin, 1987; Hulin, Henry, & Noon, 1990).
The second issue concerns whether the current and future jobs possess enough commonality in
their knowledge, skill, or other attribute requirements to produce valid predictions of future perfor-
mance from past performance. Perhaps the determinants of performance on the new job are simply
too different. For example, the degree to which “managers” should possess domain-specific exper-
tise has long been argued. Just as an army should not be equipped and trained to fight only the last
war, the promotion system should not try to maximize performance in the previous job. The data
from Project A permit some of the above issues to be addressed, and the models of performance for
training, first-tour performance, and second-tour provide some clear predictions about the pattern
of convergent and divergent relationships.
The LVT × LVI, LVI × LVII, and LVT × LVII intercorrelations, corrected and uncorrected for
attenuation, are reported in Reynolds, Bayless, and Campbell (2001). Only the correlations between
first- and second-tour performance are shown here (Table 40.3). Three values are shown for each relationship. The top value is the mean correlation across MOS, corrected for restriction of range (using the training sample as the population) but not for attenuation. The first value in the parentheses is the same correlation after correction for unreliability in the measure of "future" performance (i.e., the criterion variable when the context is the prediction of future performance from past performance). The second value in the parentheses is the mean correlation after correction for unreliability in both the measure of "current" performance and the measure of future performance. The reliability estimates used in these corrections were the median values of the individual MOS reliabilities.
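
The correction formulas themselves are standard psychometric ones. The short sketch below shows the classical disattenuation formula and, for completeness, the univariate (Thorndike Case 2) range-restriction correction mentioned in the table note. The reliability values used in the example are hypothetical numbers chosen only to roughly reproduce one entry in Table 40.3; they are not the actual Project A estimates.

```python
# Sketch of the standard corrections described above; the reliabilities below are
# hypothetical values chosen to roughly reproduce one Table 40.3 entry (.45 -> .49/.53),
# not the actual Project A estimates.
import math

def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    """Thorndike Case 2 correction for direct range restriction on the predictor."""
    u = sd_unrestricted / sd_restricted
    return (r * u) / math.sqrt(1 - r**2 + (r**2) * (u**2))

def correct_for_attenuation(r, rel_future, rel_current=None):
    """Disattenuate r for unreliability in the future (and optionally the current) measure."""
    denom = math.sqrt(rel_future) if rel_current is None else math.sqrt(rel_future * rel_current)
    return r / denom

r_obs = 0.45                            # mean LVI:ELS x LVII:EA correlation, already range corrected
rel_future, rel_current = 0.84, 0.86    # hypothetical median reliabilities

print(round(correct_for_attenuation(r_obs, rel_future), 2))               # about .49
print(round(correct_for_attenuation(r_obs, rel_future, rel_current), 2))  # about .53
```
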
The pattern of correlations in Table 40.3 exhibits considerable convergent and divergent properties.
The most interesting exception concerns the prediction of second-tour leadership performance.
Virtually all components of previous performance are predictive of future leadership performance,
which has important implications for modeling the determinants of leadership. For example, on
the basis of the evidence in Table 40.3, one might infer that effective leadership is a function of
being a high scorer on virtually all facets of performance. The least critical determinant is military
bearing and physical fitness, which some might call “looking like a leader.” Project A provides the
only existing data set for examining these issues. A surprisingly similar pattern of relationships
was found when training performance was used as a predictor of first-tour performance and of
NCO performance. The average corrected correlation between the rating of leadership potential for
trainees and rated leadership potential for first-term individuals was .58. In general, the results were
consistent, meaningful, and stronger than anyone expected.

TABLE 40.3
Zero-Order Correlations of First-Tour Job Performance (LVI) Variables With
Second-Tour Job Performance (LVII) Variables: Weighted Average Across MOS
LVII Variable                              LVI:CTP           LVI:GSP           LVI:ELS          LVI:MPD          LVI:PFB            LVI:NCOP
Core technical proficiency (CTP)           .44 (.55/.59)     .41 (.49/.55)     .25 (.30/.33)    .08 (.10/.11)    .02 (.02/.03)      .22 (.26/.29)
General soldiering proficiency (GSP)       .51 (.60/.68)     .57 (.67/.76)     .22 (.26/.29)    .09 (.11/.12)    −.01 (−.01/−.01)   .19 (.22/.25)
Effort and achievement (EA)                .10 (.11/.12)     .17 (.18/.20)     .45 (.49/.53)    .28 (.30/.33)    .32 (.35/.38)      .43 (.46/.50)
Leadership (LEAD)                          .36 (.39/.42)     .41 (.44/.47)     .38 (.41/.45)    .27 (.29/.32)    .17 (.18/.20)      .41 (.44/.48)
Maintain personal discipline (MPD)         −.04 (−.04/−.05)  .04 (.04/.05)     .12 (.13/.15)    .26 (.29/.32)    .17 (.19/.21)      .16 (.18/.20)
Physical fitness and bearing (PFB)         −.03 (−.03/−.04)  −.01 (−.01/−.01)  .22 (.24/.27)    .14 (.15/.17)    .46 (.51/.56)      .30 (.33/.36)
Rating of overall effectiveness (EFFR)     .11 (.14/.16)     .15 (.19/.22)     .35 (.45/.49)    .25 (.32/.36)    .31 (.40/.44)      .41 (.53/.68)

Total pairwise N values range from 333 to 413. Correlations corrected for range restriction. Correlations between matching
variables are in bold. Leftmost coefficients in parentheses are corrected for attenuation in the future criterion. Rightmost
coefficients in parentheses are corrected for attenuation in both criteria. ELS, effort and leadership; NCOP, NCO potential—a
single scale.

CRITERION-RELATED VALIDATION
TYPES OF INFORMATION
Project A/Career Force produced a very large archive of validation results. The following is a basic
list of the major categories of information.

1. For the CVI and LVI samples (i.e., the prediction of first-tour performance), the basic
data set consisted of the zero-order correlations of each basic predictor score with each of
the five performance factors. This intercorrelation matrix was computed for each MOS in
each of the CVI and LVI samples. Then for each sample, the full prediction equation (i.e.,
each predictor variable was included) was evaluated for each criterion factor, using several
different kinds of predictor weights. For comparative purposes, the estimates were cor-
rected for restriction of range and for criterion unreliability.
2. The incremental validity of each predictor domain over ASVAB was evaluated for pre-
dicting each of the five performance factors in CVI and LVI.
3. The same basic validities and the incremental validities were estimated for the CVII and
LVII samples, except that the six factors in the second-tour (NCO) performance model
were used as criteria.
4. For the LVII sample, the ASVAB, the Experimental Battery, and the assessment of performance during the first tour were weighted via hierarchical regression to estimate the incremental validities over ASVAB produced by adding the Experimental Battery first and then the first-tour performance assessment to the prediction of performance in the second tour (an illustrative sketch of this hierarchical logic follows this list).

5. The LVI sample data were used to estimate the overall classification gains (compared to
gains from selection only) when particular predictor batteries were used to make optimal
assignments to MOS.
6. Using a nine-factor measure of the perceived leadership environment for enlisted person-
nel that was administered during CVI, the moderator effects of the perceived leadership
environment on the relationship between cognitive ability and first-tour performance and
the relationship between personality and first-tour performance were estimated.
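
A bare-bones illustration of the hierarchical regression logic in item 4 appears below. The predictor blocks and the synthetic data are hypothetical stand-ins for the ASVAB, the Experimental Battery, and the first-tour performance composites, and the shrinkage adjustment shown is the ordinary adjusted-R formula rather than the specific estimator used in the project.

```python
# Sketch of incremental validity via hierarchical regression on synthetic data.
# Blocks A (ASVAB), X (Experimental Battery), and P1 (first-tour performance) are
# hypothetical stand-ins; shrinkage adjustment uses the ordinary adjusted-R formula.
import numpy as np

rng = np.random.default_rng(1)
n = 500
A  = rng.normal(size=(n, 4))                    # four ASVAB factor scores
X  = 0.5 * A[:, :1] + rng.normal(size=(n, 3))   # Experimental Battery scores
P1 = 0.4 * A[:, :1] + rng.normal(size=(n, 1))   # first-tour performance composite
y  = 0.5 * A[:, 0] + 0.3 * X[:, 0] + 0.3 * P1[:, 0] + rng.normal(size=n)

def adjusted_R(y, *blocks):
    """Multiple R adjusted for the number of predictors (shrinkage)."""
    Z = np.column_stack([np.ones(len(y))] + list(blocks))
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - resid.var() / y.var()
    k = Z.shape[1] - 1
    adj = 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)
    return np.sqrt(max(adj, 0.0))

r_a   = adjusted_R(y, A)            # ASVAB only
r_ax  = adjusted_R(y, A, X)         # plus Experimental Battery
r_axp = adjusted_R(y, A, X, P1)     # plus first-tour performance
print(f"A: {r_a:.2f}  A+X: {r_ax:.2f}  A+X+1: {r_axp:.2f}")
```
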

SELECTED HIGHLIGHTS OF RESULTS


The Project A/Career Force data archive has been used in literally hundreds of different analy-
ses pertaining to many kinds of research questions. Only a few of the highlights are discussed
here. For more detail, the reader should consult the special summer issue of Personnel Psychology
(Campbell, 1990) and the Project A “book” (Campbell & Knapp, 2001), as well as the many journal
articles and technical reports referenced in those sources.
ASVAB validities were estimated twice for each major factor of first-tour performance and twice
for each major factor of second-tour performance. As shown in Table 40.4, the profiles of validity
estimates (i.e., across performance factors) were very similar for each of the samples (e.g., .62 to
.65, on the average across MOS, for predicting the Core Technical Proficiency factor). Correcting
for unreliability in the criterion pushes the estimates (not shown) close to .70. ASVAB predicts job
performance in the Army as well as it does training performance, and the estimated validities are
quite high.
In general, ASVAB tends to be the best predictor of each of the performance factors in each of
the major data sets, although the spatial tests, the computer-based cognitive tests, and the personal-
ity and interest measures have substantial correlations as well. The personality scales tend to have
slightly higher correlations with the personal discipline factors. The relatively high correlation of
the interest scales with the technical performance factors and with the leadership related factors was
somewhat unexpected, given the extant literature.
Incremental validities were primarily concentrated in the prediction of the peer leadership and personal discipline factors by the ABLE scales.

TABLE 40.4
Comparison of Multiple Correlations Averaged Across Batch A MOS When the
ASVAB Factors, the Spatial, Cognitive Computer-Based Scores, ABLE Composites,
and AVOICE Composites Are Used as Predictors
             ASVAB Factors    Spatial Composite    Computer-Based Scores    ABLE-Based Scores    AVOICE Basic Scores
Criterion    LV   CV   LVII   LV   CV   LVII       LV   CV   LVII          LV   CV   LVII       LV   CV   LVII
CTP 63 63 64 57 55 57 50 53 53 27 26 24 41 35 41
GSP 67 65 63 64 63 58 57 57 48 29 25 19 40 34 29
ELS (AE) 39 31 29 32 25 27 34 26 09 20 33 13 25 24 09
MPD 22 16 15 14 12 15 15 12 12 22 32 06 11 13 06
PFB 21 20 16 10 10 13 17 11 03 31 37 17 15 12 09
LDR 63 55 49 34 35

Results corrected for range restriction and adjusted for shrinkage. Decimals omitted. CTP, core technical proficiency; CV,
concurrent validation; ELS (AE), effort and leadership (achievement and effort); GSP, general soldiering proficiency; LDR,
leadership; LV, longitudinal validation (first-term); LVII, longitudinal validation (second-term); MPD, maintain personal
discipline; PFB, physical fitness and bearing.

At the MOS level, specific psychomotor tests incremented ASVAB for core technical components that involved firing hand-operated weapons (e.g., the TOW missile) (Walker & Rumsey, 2001).
An estimate of the maximum validity attainable from ASVAB plus the Experimental Battery for
predicting Core Technical Proficiency is shown in Table 40.5. The "reduced equation" consisted of the four ASVAB factors plus six to eight scores (depending on the MOS) from the Experimental Battery that were chosen a priori by the psychologist SMEs as the most likely to increment ASVAB.
The results are shown for unit weights, zero-order validity coefficients as weights, and multiple
regression weights with the R values adjusted for shrinkage and corrected for unreliability in Core
Technical Proficiency.
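
The comparison of weighting schemes in Table 40.5 rests on standard composite-validity algebra: the validity of a weighted composite is w'v / sqrt(w'Rw), where v holds the predictor-criterion validities and R the predictor intercorrelations, and the multiple correlation under regression weights is sqrt(v'R^(-1)v). The sketch below works through that arithmetic with made-up values for three predictors; it does not use the Project A data.

```python
# Composite validity under different predictor weights (made-up values, 3 predictors).
import numpy as np

R = np.array([[1.0, 0.5, 0.3],      # predictor intercorrelation matrix
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
v = np.array([0.45, 0.35, 0.25])    # zero-order validities against the criterion

def composite_validity(w, R, v):
    """Correlation of the weighted predictor composite with the criterion."""
    return float(w @ v / np.sqrt(w @ R @ w))

unit     = composite_validity(np.ones(3), R, v)        # unit weights
validity = composite_validity(v, R, v)                 # zero-order validities as weights
optimal  = float(np.sqrt(v @ np.linalg.solve(R, v)))   # multiple R (regression weights)
print(f"unit {unit:.3f}  validity-weighted {validity:.3f}  regression {optimal:.3f}")
```
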
For the prediction of second-tour performance, as shown in Table 40.4, the estimated validi-
ties of the cognitive ability tests for predicting Core Technical Proficiency and General Soldiering
Proficiency were virtually identical for first tour (2–3 years after enlistment) and for second tour
(6–7 years after enlistment). Overall, the validities did not degrade, as some have speculated they
should (e.g., Henry & Hulin, 1987). The only real change was for predicting Effort and Leadership/
Effort and Achievement, but the nature of this performance dimension in fact changed between first
tour and second tour. For second-tour, the leadership components formed a separate factor.
As noted previously, the first-tour performance factors had substantial correlations with the sec-
ond-tour performance factors and displayed considerable convergent and divergent validity. The
assessment of first-tour performance (i.e., in LVI) also provided considerable incremental validity
over ASVAB and the Experimental Battery for predicting the Effort/Achievement and Leadership
performance factors in the second tour, but not for the Core Technical Proficiency factor, as shown in
Table 40.6. This reflects, in part, the increased importance of these factors for the NCO position.
Estimating potential classification gains from new predictors is a complex issue that depends on
a number of contextual parameters (e.g., quotas, assignment priorities, selection ratio, differential
recruitment costs across MOS, and variability in selection validities across jobs). The Project A
database for LVI was used to model a multivariate normal population, and Monte Carlo procedures
were then used to generate estimates of potential classification gains from using various forms of
the Experimental Battery plus ASVAB. Quotas for the nine MOS were set proportional to 1993
accessions.
There are two cross-validation issues in estimating classification gains. One concerns the weights
used for the predictor equation for each MOS (e.g., ordinary least-squares weights are sample specific
to some degree), and the second concerns the sample specificity of the differential assignments
themselves. During Project A/Career Force, a new index of classification efficiency labeled “mean
average performance” (MAP) was developed and Monte Carlo methods were used to provide an
unbiased estimate of the gains in aggregate performance resulting from classification, as compared
with selection alone (Rosse, Campbell, & Peterson, 2001).

TABLE 40.5
Comparison of LVI Estimates of Maximum Predictive Validity, Averaged Over
Batch A MOS, When Unit Weights, Zero-Order Validities, or Multiple Regression
Weights Are Used to Weight Predictors (Criterion Is Core Technical Proficiency)
Comparison                          Unit Weights    Validity Weights    Adjusted R    Corrected Ra
Full equation (all predictors)      .57             .70                 .70           (.77)
Reduced equation (for selection)    .67             .72                 .72           (.79)

All estimates corrected for restriction of range.


a Corrected for criterion unreliability.

TABLE 40.6
Multiple Correlations for Predicting Second-Tour Job Performance (LVII)
Criteria From ASVAB and Various Combinations of ASVAB, Selected Experimental
Battery Predictors, and First-Tour (LVI) Performance Measures: Corrected for
Restriction of Range and Criterion Unreliability
                                                 Predictor Composite
LVII Criterion                Type of Estimate     A     A+X    A+X+1
Core technical proficiency    Adjusted R          64     69     68
                              Unit weight         52     39     42
Effort/achievement            Adjusted R          00     00     38
                              Unit weight         16     13     21
Leadership                    Adjusted R          36     43     65
                              Unit weight         40     43     50

Adjusted R values from Rozeboom (1978; formula 8). Decimals omitted. A, ASVAB factors (quantitative,
speed, technical, verbal). X, experimental battery (spatial, rugged/outdoors interests from AVOICE, achieve-
ment orientation, adjustment, physical condition, internal control, cooperativeness, dependability, and leader-
ship from ABLE). 1, the LVI “can do” composite (CTP + GSP) for CTP and “will do” composite (ELS +
MPD + PFB) for effort/achievement and leadership. See Table 40.3 for LVI performance factor labels.

Using a test battery for each MOS that was selected, on a priori grounds, to be potentially useful for classification purposes, the aggregate
gain in MAP for Core Technical Proficiency was .14 standard deviation (SD) units if all accessions
must be classified, and .22 SD units if 5% could remain unassigned. In an organization the size of
the Army, such gains would have enormous utility.
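
To make the selection-versus-classification comparison concrete, the following Monte Carlo sketch assigns a synthetic applicant pool to three jobs either at random (selection without classification) or optimally via a linear-assignment solver, and compares mean assigned performance. The number of jobs, quotas, correlation structure, and the use of scipy's linear_sum_assignment are all illustrative choices, not the Project A/EPAS methodology.

```python
# Monte Carlo sketch of classification gain: mean assigned performance under optimal
# assignment vs. random assignment to quota-limited jobs. All parameters are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(2)
n_people, quotas = 300, [120, 100, 80]          # three hypothetical MOS with quotas

# Predicted performance of each person in each MOS; correlations below 1 across MOS
# are what make differential assignment pay off.
cov = np.array([[1.0, 0.6, 0.5],
                [0.6, 1.0, 0.6],
                [0.5, 0.6, 1.0]])
pred = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=n_people)

# Expand each MOS into 'quota' slots so the assignment is one person per slot.
slot_mos = np.repeat(np.arange(3), quotas)       # 300 slots in total
cost = -pred[:, slot_mos]                        # maximize performance = minimize its negative

rows, cols = linear_sum_assignment(cost)
map_optimal = pred[rows, slot_mos[cols]].mean()

# Baseline: random assignment of the same people to the same slots.
random_people = rng.permutation(n_people)
map_random = pred[random_people, slot_mos].mean()

print(f"MAP gain from classification: {map_optimal - map_random:.2f} SD units")
```
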
Parallel to Project A, ARI sponsored Project B, which developed the Enlisted Personnel Assignment System (EPAS). EPAS uses linear programming strategies to make optimal MOS assignments that maximize a specified objective function (e.g., aggregate performance, training cost savings, or the number of individuals above a performance minimum) given a set of constraints to be accommodated (e.g., MOS quotas, priorities, and selection ratios). Together, the Project A database and the EPAS algo-
rithm provided an unparalleled test bed for estimating the effects of various selection and classifica-
tion strategies (Konieczny, Brown, Hutton, & Stewart, 1990).
In addition to the above, many additional data collections and analyses were carried out pertain-
ing to such issues as (a) estimating the differential utility of performance gains across jobs, (b) the
differential criticality of specific performance factors across jobs, (c) the prediction of attrition,
(d) the influence of reward preferences on performance prediction, and (e) the influence of race and
gender on performance assessments. The reader must consult Campbell and Knapp (2001) and the
articles and technical reports they referenced for the details.

SOME BROADER IMPLICATIONS


It is all well and good that the project did what it proposed to do on time, that it was a rewarding
experience for the participants, and that it provided massive evidence for the validity of ASVAB, but
what are the broader implications of its substantive outcomes? We list a critical few.

JOB AND OCCUPATIONAL ANALYSIS


For purposes of developing measures of individual performance, the strong conclusion must be that
one method of job analysis is not enough. Different methods (e.g., task analysis, critical incidents)
give somewhat different, but complementary, pictures of performance requirements. There is prob-
ably no personnel research purpose that would not be better served by multiple methods.

IMPORTANCE OF TAXONOMIC THEORY


The necessity of thinking in terms of the latent structure soon became apparent to everyone. Even
the diehards were pushed in this direction because the specific MOS in the sample were not the
primary interest.

IMPLICATIONS FOR PERFORMANCE MEASUREMENT


Project A presented the first real opportunity to investigate the latent structure of performance in
this way. We believe it helped change the way industrial-organizational psychologists think about
the “criterion problem” (or at least how they should think about it) and about performance measure-
ment in general, regardless of the purpose (Knapp, 2006).

RATING METHOD
The rating method represents a complex process of information processing and social cognition that is rife with opportunities for biased and error-filled judgments (e.g., Morgeson & Campion, 1997).
It has a bad press. However, Project A rating measures yielded reasonable distributional properties,
had reasonable single-rater reliabilities across cohorts, and produced a factor structure that was highly
replicable (virtually to the second decimal place). One conclusion might be that ratings are a valu-
able measurement method if (a) the dimensions to be rated are carefully defined and meaningful to
the raters, (b) there is considerable rater training, (c) the setting ensures that raters will give sufficient
time and attention to the rating task, and (d) the goals of the rater are commensurate with the goals
of the researchers. That is, the measurement is for research purposes, not operational performance
appraisal, and the raters accept the goal of doing their best to assess performance on the dimensions as
defined by the researchers. Although minimal, such conditions probably go far beyond those of most studies.

ROLE OF THE MEASUREMENT GOAL


A frequently asked question concerns which type of criterion measure is “best,” which implies there
must be a near-ultimate criterion lurking someplace. For example, the National Research Council
panel (Wigdor & Green, 1991) took the position (we think in error) that the hands-on job sample
simulation is the preferred criterion measure, always. The intent of Project A was to counter the
argument that there is always one preferred measurement method. Different measurement methods
permit different sources of variation to operate (McCloy, Campbell, & Cudeck, 1994), and the
choice of measurement method depends on the potential sources of variation that the investigator or
practitioner wants to capture, not on being more or less ultimate.

GENERAL FACTOR VERSUS SPECIFIC FACTORS


Because of the generally positive manifold in the intercorrelation matrix for any set of job per-
formance measures, even when method variance and unreliability are controlled (Viswesvaran,
Schmidt, & Ones, 1993), there will always be a general factor. However, the general factor is not
there because of only one general performance requirement. It arises most likely because individual
differences in general mental ability (GMA) and individual differences in the predisposition toward
conscientiousness and effort are determinants of performance on virtually all aspects of most jobs, even
for performance requirements that entail very different content (e.g., electronic troubleshooting vs.
rewarding subordinates appropriately). However, a general factor does not preclude either the exis-
tence or the importance of specific factors for selection and classification. The naïve use of the term
“overall performance” should become a thing of the past. There is no substantive construct that can
be labeled as general, or overall, performance. Anyone who tries to provide specifications for such a

construct simply cannot do it. They must resort to combining the verbal specifications for the indi-
vidual specific factors. Now, anyone can add up the “scores” on the specific factors to obtain a total
score; and a weighted sum may indeed be necessary for certain specific decision purposes (Schmidt
& Kaplan, 1971). MacKenzie, Podsakoff, and Jarvis (2005) referred to such a score as a formative
score in contrast to a reflective score, which is intended to represent a substantive latent variable. For
confirmatory and reliability estimation purposes, the two require a different measurement model
(see Chapter 2, this volume), and MacKenzie et al. (2005) discussed how model misspecifications
can lead to faulty inference.

ROLE OF PERSONALITY
In retrospect, the development and validation of the ABLE was one of the primary reasons for
the resurgence of research on personality for selection purposes, along with the subsequent meta-
analysis reported by Barrick and Mount (1991) that covered nonmilitary occupations. However, the
ABLE results, which differed in substantive ways between the concurrent and longitudinal valida-
tions, also introduce cautions about the reactivity to experience of some types of items and the
subsequent use of concurrent designs for validating personality measures.

GMA VERSUS DIFFERENTIAL PREDICTION


There is certainly no denying the dominant role of GMA in the prediction equation for virtually any
job. However, the principal lessons from Project A are that the degree to which incremental validity
and/or differential validity are possible is influenced significantly by the component of performance
being predicted and the range of predictor variables that can be used.

ESTIMATING CLASSIFICATION EFFICIENCY


The Project A database also provided a rare opportunity to estimate classification gains under a variety
of conditions, without having to assume the simplifying conditions required by the Brogden-type esti-
mator. Zeidner and Johnson and their colleagues (e.g., Scholarios, Johnson, & Zeidner, 1994; Zeidner,
Johnson, & Scholarios, 1997) carried out an extensive series of Monte Carlo simulation studies using
the Project A database and showed that small but operationally significant classification gains could
be realized using only a battery of ability tests. Our own analyses showed that larger estimated gains
could be obtained if the entire Project A Experimental Battery, plus ASVAB, could be used.

BEYOND THE ARMY


The original objectives set by the sponsor were met. But what about the Army as a unique orga-
nization and the generalization of any findings to the civilian sector? Certainly in some respects
the Army is a unique organization. No civilian organization has a similar mission, and some of
the components of individual performance have unique aspects. However, the bulk of the perfor-
mance domain for the enlisted occupational structure has civilian counterparts and the personnel
corps is reasonably representative of the civilian labor force in similar jobs with similar levels of
experience. It is our firm belief that the major implications of the project’s methods and results are
not constrained by the uniqueness of the Army as an organization and have broad applicability to
understanding the world of work.

CONCLUSIONS
The experiences of Project A argue again and again for the necessity of placing the measures and
variables used in a particular study within a model of the relevant latent structure that is specified as

well as possible. It facilitates the interpretation of results, the integration of findings across studies,
the identification of future research needs, the use of the findings for unanticipated applications, and
the generalization of findings to other settings.
Continually trying to improve our models of relevant domains, as well as the interrelationships among them, is as critical for (good) practice as it is for (good) science. We began Project A with high
respect for our practice and our science. The respect for both and the appreciation for how they are
so strongly interrelated were even greater at the end.

REFERENCES
Ackerman, P. L. (1989). Within-task intercorrelations of skilled performance: Implications for predicting individual differences? (A commentary on Henry & Hulin, 1987). Journal of Applied Psychology, 74, 360–364.
Austin, J. T., Humphreys, L. G., & Hulin, C. L. (1989). A critical reanalysis of Barrett et al. Personnel
Psychology, 42, 583–596.
Barrett, G. V., & Alexander, R. A. (1989). Rejoinder to Austin, Humphreys, and Hulin: Critical reanalysis of
Barrett, Caldwell, and Alexander. Personnel Psychology, 42, 597–612.
Barrett, G. V., Caldwell, M. S., & Alexander, R. A. (1985). The concept of dynamic criteria: A crucial reanaly-
sis. Personnel Psychology, 38, 41–56.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44, 1–26.
Borman, W. C., & Motowidlo, S. J. (1993). Expanding the criterion domain to include elements of contextual
performance. In N. Schmitt, W. C. Borman, & Associates (Eds.), Personnel selection in organizations
(pp. 71–98). San Francisco, CA: Jossey-Bass.
Brogden, H. E. (1946). An approach to the problem of differential prediction. Psychometrika, 11, 139–154.
Campbell, J. P. (1990). An overview of the Army selection and classification project (Project A). Personnel
Psychology, 43, 231–239.
Campbell, J. P., & Knapp, D. J. (2001). Exploring the limits in personnel selection and classification. Mahwah,
NJ: Lawrence Erlbaum.
Campbell, J. P., McCloy, R. A., Oppler, S. H., & Sager, C. E. (1993). A theory of performance. In N. Schmitt &
W. C. Borman (Eds.), Frontiers in industrial/organizational psychology: Personnel selection and clas-
sification (pp. 35–71). San Francisco, CA: Jossey-Bass.
Fleishman, E. A. (1967). Performance assessment based on an empirically derived task taxonomy. Human
Factors, 9, 349–366.
Ghiselli, E. E. (1973). The validity of aptitude tests in personnel selection. Personnel Psychology, 26,
461–477.
Henry, R. A., & Hulin, C. L. (1987). Stability of skilled performance across time: Some generalizations and
limitations on utilities. Journal of Applied Psychology, 72, 457–462.
Holland, J. L. (1966). The psychology of vocational choice. Waltham, MA: Blaisdell.
Hulin, C. L., Henry, R. A., & Noon, S. L. (1990). Adding a dimension: Time as a factor in the generalizability
of predictive relationships. Psychological Bulletin, 107, 328–340.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin, 96, 72–98.
Knapp, D. J. (2006). The U.S. Joint-Service Job Performance Measurement Project. In W. Bennett,
C. E. Lance, & D. J. Woehr (Eds.), Performance measurement: Current perspectives and future chal-
lenges (pp. 113–140). Mahwah, NJ: Lawrence Erlbaum.
Konieczny, F. B., Brown, G. N., Hutton, J., & Stewart, J. E. (1990). Enlisted Personnel Allocation System:
Final report (ARI Technical Report 902). Alexandria, VA: U.S. Army Research Institute for the
Behavioral and Social Sciences.
MacKenzie, S. B., Podsakoff, P. M., & Jarvis, C. D. (2005). The problem of measurement model misspecifi-
cation in behavioral and organizational research and some recommended solutions. Journal of Applied
Psychology, 90, 710–730.
McCloy, R. A., Campbell, J. P., & Cudeck, R. (1994). A confirmatory test of a model of performance determi-
nants. Journal of Applied Psychology, 79, 493–504.
McHenry, J. J., Hough, L. M., Toquam, J. L., Hanson, M. A., & Ashworth, S. (1990). Project A validity results:
The relationship between predictor and criterion domains. Personnel Psychology, 43, 335–354.

Morgeson, F. P., & Campion, M. A. (1997). Social and cognitive sources of potential inaccuracy in job analysis. Journal of Applied Psychology, 82, 627–655.
Nathan, B. R., & Alexander, R. A. (1988). A comparison of criteria for test validation: A meta-analytic investi-
gation. Personnel Psychology, 41, 517–536.
Peterson, N. G., Hough, L. M., Dunnette, M. D., Rosse, R. L., Houston, J. S., Toquam, J. L., & Wing, H. (1990).
Project A: Specification of the predictor domain and development of new selection/classification tests.
Personnel Psychology, 43, 247–276.
Reynolds, D. H., Bayless, A., & Campbell, J. P. (2001). Criterion reliability and the prediction of future perfor-
mance from prior performance. In J. P. Campbell & D. J. Knapp (Eds.), Exploring the limits in personnel
selection and classification. Mahwah, NJ: Lawrence Erlbaum.
Rosse, R. L., Campbell, J. P., & Peterson, N. G. (2001). Personnel classification and differential job assign-
ments: Estimating classification gains. In J. P. Campbell & D. J. Knapp (Eds.), Exploring the limits in
personnel selection and classification. Mahwah, NJ: Lawrence Erlbaum.
Schmidt, F. L. (1988). Validity generalization and the future of criterion related validity. In H. Wainer &
H. Braun (Eds.), Test validity. Hillsdale, NJ: Lawrence Erlbaum.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychol-
ogy: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Schmidt, F. L., & Kaplan, L. B. (1971). Composite vs. multiple criteria: A review and resolution of the contro-
versy. Personnel Psychology, 24, 419–434.
Schmitt, N., Gooding, R. Z., Noe, R. A., & Kirsch, M. (1984). Meta-analyses of validity studies published
between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 37,
407–422.
Scholarios, T. M., Johnson, C. D., & Zeidner, J. (1994). Selecting predictors for maximizing the classification
efficiency of a battery. Journal of Applied Psychology, 79, 412–424.
Viswesvaran, C., Schmidt, F. L., & Ones, D. S. (1993, April). Theoretical implications of a general factor in
job performance criteria. Paper presented at the 8th Annual Conference of the Society of Industrial and
Organizational Psychology. San Francisco, CA.
Walker, C. B., & Rumsey, M. G. (2001). Application of findings: ASVAB, new aptitude tests, and personnel
classification. In J. P. Campbell & D. J. Knapp (Eds.), Exploring the limits in personnel selection and
classification. Mahwah, NJ: Lawrence Erlbaum.
Wigdor, A. K., & Green, B. F. (1991). Performance assessment for the workplace (Vols. I-II). Washington, DC:
National Academy Press.
Zeidner, J., Johnson, C. D., & Scholarios, D. (1997). Evaluating military selection and classification systems in
the multiple job context. Military Psychology, 9, 169–186.
41 The Dictionary of Occupational Titles and the Occupational Information Network

Norman Peterson and Christopher E. Sager

MILESTONES
The Dictionary of Occupational Titles (DOT; U.S. Department of Labor [DoL], 1991a, 1991b) and
its successor, the Occupational Information Network (O*NET™; Peterson, Borman, Mumford, Jeanneret, & Fleishman, 1999), have played a unique role for the United States, providing a single, standard system for analyzing and presenting information about occupations for a wide variety of applied uses. Although not perfect (Miller, Treiman, Cain, & Roos, 1980; Committee on Techniques for the Enhancement of Human Performance: Occupational Analysis, 1999), these systems are both milestones of behavioral science achievement. Designed at significantly different periods of time, with different technologies, and in response to different workplace dynamics, the DOT and the O*NET have been simultaneously shaped by the nation's societal needs and by the state of the art in occupational analysis in response to those needs.
played a major, but not the only, role in the development of these systems. In this chapter we discuss
the development and implementation of these systems.

ORGANIZING FRAMEWORK
I-O psychologists most often approach occupational analysis from the framework of obtaining
information about a single occupation or a fairly homogeneous set of occupations within one or
a few organizations and for one or a few specific applications, such as employee selection, perfor-
mance appraisal, or compensation. National occupational systems necessarily take a much broader
view, both from the point of view of the population of occupations to be included and the purposes
to be served by the system. The DOT and the O*NET have as their intended population all of the
occupations in the U.S. economy, and their intended uses are more universal in nature; for example,
the provision of job descriptions in narrative terms for general lay use, the development and mainte-
nance of an organizing framework for all occupations, and use in matching applicants with jobs at
state employment agencies across the nation.
The DOT evolved fairly gradually over its history, and its successor, the O*NET, seemingly
represents a sharp departure from the DOT. However, if occupations are viewed as multidimen-
sional variables subject to a broad range of independent variables (e.g., technological, demographic,
cultural, and legal), and occupational analysis as the science and art of describing occupations,
then the DOT and O*NET do not appear so different. Figure 41.1 shows one way of depicting


[Figure 41.1 diagram elements: laws, demographics, markets, and technology; organizational restructuring; attributes of persons; content and structure of work; outcomes; occupational analysis.]
FIGURE 41.1 Framework for conceptualizing the changing nature of work and occupational analysis. (From
the National Academy of Sciences, The changing nature of work, National Academies Press, Washington,
DC, 1999. With permission.)

Figure 41.1 shows one way of depicting occupations and occupational analysis in a larger causal network (from Committee on Techniques
for the Enhancement of Human Performance: Occupational Analysis, 1999, p. 15). The independent
variables shown there (law, demographics, markets, technology) wax and wane in terms of their
effect on the nature and structure of occupations and occupational analysis systems. Research on
and experience in the conduct of occupational analysis itself obviously more directly affect the
nature of occupational analysis systems.
A matrix is a convenient device to conceptualize occupational analysis in general. Figure 41.2
shows a matrix with rows representing single occupations or categories of more than one occu-
pation (e.g., file clerk, police officer, retail sales representative) and columns representing vari-
ous attributes of occupations (e.g., tasks, abilities, skills, education required). The main focus
of I-O psychologists has been to define and measure the various types of attributes best used to
describe occupations—tasks, skills, and abilities. An example of this might be the Position Analysis
Questionnaire (McCormick, Jeanneret, & Mecham, 1972). Economists, sociologists, and national
government agencies like the DoL tend to focus at a more macro level on the larger structure of
occupations. Here the focus is on the number of occupations, their titles, and the best ways to group

Occupational    Occupational attributes (e.g., activities, skills)
categories        A        B        C       ...      ZZZ
1                 Xa1      Xb1      Xc1     ...      Xzzz1
2                 Xa2
3                 Xa3
...
999               Xa999                     ...      Xzzz999

FIGURE 41.2 Matrix representation of occupational analysis. (Adapted from the National Academy of
Sciences, The changing nature of work, National Academies Press, Washington, DC, 1999. With permission.)

the occupations for statistical summaries; for example, the Standard Occupational Classification
(U.S. Department of Commerce, 1980). The cell entries in the matrix represent “scores” for each
occupational category on each occupational attribute, however they might be obtained. Finally, note
that if hierarchical arrangements are imposed on the rows or columns, or both, then various aggre-
gations of the occupational information can be obtained and used for different purposes.
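
One way to picture this matrix, and the hierarchical aggregation just mentioned, is as a small data table. The occupations, attributes, and scores in the sketch below are invented purely for illustration; they are not DOT or O*NET values.

```python
# Toy version of the occupation-by-attribute matrix with a hierarchical row index;
# occupations, attributes, and scores are invented for illustration only.
import pandas as pd

data = pd.DataFrame(
    {"data_complexity": [5, 4, 2, 1],
     "people_complexity": [3, 6, 2, 5],
     "physical_demands": [1, 1, 4, 3]},
    index=pd.MultiIndex.from_tuples(
        [("Professional", "Counselor"),
         ("Professional", "Accountant"),
         ("Machine trades", "Machinist"),
         ("Service", "Police officer")],
        names=["occupational_category", "occupation"]),
)

print(data)                                                 # the full occupation x attribute matrix
print(data.groupby(level="occupational_category").mean())   # aggregation up the hierarchy
```
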
As noted above, the DOT and the O*NET intentionally incorporate the population of the rows
and the columns of this matrix. Although many of the occupational analyses conducted by individual
I-O psychologists for research or applied purposes cover the population of the columns (attributes) or
a large part of that population, almost none cover the population of the rows (occupations). Instead,
they focus on a single row or a few such rows of this matrix. The extent to which an occupational
analysis system (e.g., the DOT or the O*NET) adequately includes, defines, and measures the
elements in the populations of rows and columns ultimately determines its value in meeting societal
needs.

DOT
In the 1930s, the United States was in the throes of the Great Depression. One of the governmental
responses to this crisis was the Wagner-Peyser Act of 1933, an act “to provide for the establishment
of a national employment system and for cooperation with the States in the promotion of such
system, and for other purposes.” This act established the U.S. Employment Service (USES) to “assist
in coordinating the State public employment services throughout the country and in increasing their
usefulness by developing and prescribing minimum standards of efficiency … [and] promoting
uniformity in their administrative and statistical procedure.” Funds allocated were to be used, among
other things, for “job search and placement services to job seekers including counseling, testing,
occupational and labor market information, assessment, and referral to employers” (Section 3(a)).
Each state service pursued these ends in its own way, creating difficulties in communication
between state services and between the states and the USES (Droege, 1988). In reaction, a panel
including several prominent psychologists was formed to oversee the development of an occupational
research program. In the five years from 1934 to 1939, over 50,000 individual job analyses were
carried out using procedures initially developed at the Minnesota Stabilization Research Institute
(Paterson & Darley, 1936). These analyses were conducted in work sites via job observation and
interview of job holders and supervisors.
The first edition of the DOT (DoL, 1939), including 17,500 occupations, was primarily based
on those analyses. The content of these job analysis reports, and derivative ratings of other occu-
pational attributes based on those reports, constituted the DOT attributes, or columns, of our con-
ceptual matrix shown in Figure 41.2. The initial set of DOT “rows” in the idealized occupational
matrix were the 17,500 occupations organized into 550 occupational groups that “reflected occupa-
tional relationships based on work performed, job content, and skill level” (Droege, 1988, p. 993).
In essence, this method of (a) conducting on-site job analyses employing observation and interview
techniques and resulting in a (b) written job analysis report (often referred to as a JAR) which was
the basis for (c) additional ratings of job attributes by job analysts (who may or may not have been
involved in the initial job analysis) remained constant over the entire history of the DOT.
Droege (1988) summarized the somewhat checkered organizational history of the occupational
research program at the USES and the state employment services up to the mid 1980s. Cutbacks
in national office staff after World War II, in the early 1950s, and in the early 1980s led to vari-
ous forms of Federal-state cooperative organizations responsible for conducting the occupational
research necessary to update and maintain the DOT. An important component of these organiza-
tional systems was the Occupational Analysis Field Centers (OAFCs), each based in a State, (the
number of these centers varied over time, ranging from 4 to 12) which were primarily responsible
for DOT updating activities. Over this time period the second (DoL, 1949), third (DoL, 1965), and
fourth (DoL, 1980) editions of the DOT were produced as well as various supplements and other

publications. In 1991, a revision of the fourth edition was published (DoL, 1991a). Each of these edi-
tions had changes in the occupational groupings and associated coding systems and in the informa-
tion or “attributes” about the occupations; these changes were primarily incremental in nature.

STRUCTURE AND CONTENT OF THE DOT


Depending on which version of the DOT is described, the structure and content of the information
will differ.1 For our purposes, we will summarize the content of the fourth edition (DoL, 1980) pri-
marily because (a) it was the last edition published (although a revised fourth edition was published
in 1991), (b) it is neatly summarized in Droege (1988), and (c) it was the subject of an extensive
critique by the National Research Council (NRC; Miller et al., 1980).
The DOT is a dictionary. As such, its basic entry has that form. It opens with the occupational
code, a nine-digit number. The first three digits plus the last three digits place an occupation in a
unique “row” embedded in a hierarchical structure. The first digit places it in one of nine broad
occupational categories (e.g., professional or machine trades), the first two digits place it in one of 82 occupational divisions, and those divisions are further divided into 559 occupational groups (identified by the first three digits). The last three digits serve to uniquely identify the occupation within these strata and have no substantive reference. Digits 4–6 are the worker function ratings for the occupation (see below), reflecting the occupation's level of complexity with respect to
interactions with data, people, and things. The code is followed by a textual occupational title, an
industry designation, and any alternate titles that apply. The body of the definition then contains a
“lead statement” that summarizes the occupation in terms of the specific actions carried out by the
worker, the work fields (what gets done on the job), the machines, tools, and work aids used on the
job, and the products of the job (see below for more detail). This lead statement is then followed by
task element statements that more specifically describe the job.
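
A small parsing sketch may help fix the code structure in mind. Published DOT codes are typically printed in the form XXX.XXX-XXX; the example code below is hypothetical, and the function simply splits the digits according to the description above.

```python
# Parse a DOT occupational code (printed form XXX.XXX-XXX) into the components
# described above. The example code is hypothetical.
def parse_dot_code(code: str) -> dict:
    digits = [c for c in code if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("A DOT code contains nine digits")
    return {
        "category": digits[0],                 # 1 of 9 broad occupational categories
        "division": "".join(digits[:2]),       # 1 of 82 occupational divisions
        "group": "".join(digits[:3]),          # 1 of 559 occupational groups
        "worker_functions": {                  # digits 4-6: data/people/things complexity
            "data": digits[3],
            "people": digits[4],
            "things": digits[5],
        },
        "serial": "".join(digits[6:]),         # uniquely identifies the occupation within its group
    }

print(parse_dot_code("123.456-789"))   # hypothetical code, for illustration only
```
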
In addition to this basic code and definition, a large amount of additional information is pro-
vided. Droege depicted the DOT content as organized into two major areas: “work performed” and
“worker characteristics,” with the former including components that “relate to the work activities
of a job” whereas the latter reflect “worker attributes that contribute to successful job performance”
(Droege, 1988, p. 995).
Included in work activities were the following:

• Worker functions: A set of three ratings depicting the worker’s level of involvement with
the famous “data,” “people,” and “things” (Fine, 1968a, 1968b, 1988; DoL, 1972)
• Work fields: One chosen from a set of 100 technological and socioeconomic objectives,
such as “material moving,” “cooking-food preparing,” and “researching” with associated
definitions and verbs
• Work devices: A list of machines, tools, equipment, and work aids used on the job
• Materials, products, subject matter, and services (MPSMS): One or more of these chosen
from an organization of 48 groups with 328 constituent categories, such as a “marine life”
group with “finfish,” “shellfish,” and “other marine life” categories

Included in worker characteristics were the following:

• General educational development: Three ratings, on six-point scales, of the reasoning,


mathematical, and language development required on the job
• Special vocational preparation: Amount of training required using a time scale (“short
demonstration only” through “over 10 years”)

1 Notably, the number of occupations will vary considerably because of the deletion, combining, or adding of occupations.

• Aptitudes: Estimates of levels of 11 abilities “predictive” of an individual’s job perfor-


mance, based largely on the General Aptitude Test Battery components with two additions,
Eye-Hand-Foot Coordination and Color Discrimination
• Interests: Selection of a primary interest factor from a defined list of 12
• Temperaments: Selection of the primary temperaments (e.g., adaptability requirements of
a job) from a list of 11
• Physical demands: A single rating of the strenuousness of the job, on a five-point scale, and
estimates of the importance and frequency of 28 specific physical demands
• Environmental conditions: Ratings of the importance of 14 environmental conditions,
with seven of these considered as hazards

Collectively then, the hierarchical, occupational category system embedded in the DOT code and
the definitions and ratings of the occupational attributes constitute the actual “rows” and “columns”
for the DOT when it is thought about in terms of our idealized occupational analysis matrix. For the
fourth edition, there were 12,099 rows or occupations and about 100 “occupational attribute” scores,
depending on how you count, in addition to the textual definition of the occupation itself.

COLLECTION OF JOB ANALYSIS DATA


The structure and content of the DOT, described just above, were the products of considerable
data collection and refinement efforts. As noted earlier, the prototypical way in which the basic job
analysis data were collected was via field observation and interviews of incumbents and supervisors
by analysts based in the state occupational analysis field centers. Droege (1988) described the basic
methodology used by these analysts (pp. 1003–1014). This was a very labor-intensive process. Here
are Droege’s headings for the steps required for a job analysis.

• Preparing for job analysis studies: Researching the industry, selecting establishments,2
contacting establishments, initial meeting with establishment officials, arranging for an
authorized study, obtaining establishment data to facilitate job analysis, preparing a pre-
liminary staffing table, taking the establishment tour
• Conducting job analysis studies: Establishing a job analysis study schedule, the
observation-interview job analysis method, determining number of jobs, meeting with
establishment official at end of study
• Writing job description: Parts of a job description, determining detail needed in job and
task descriptions, breaking a job down into tasks, flag statements, estimating time percent-
ages of tasks, preparing job summaries

The primary product of this process was the Job Analysis Report (JAR). As Droege (1988) stated,
“The job analysis report serves the dual purpose of structuring the job analysis and recording the
resulting data” (p. 1007). He also provided an example of a completed JAR. In it, the analyst docu-
mented the basic identifying information (title, industry, Standard Occupational Classification code,
and a single-sentence summary of the job); the work devices used; the MPSMS provided; informa-
tion about the type and length of training, licenses required, methods of promotion, and supervision
on the job; and the data collection methods used in the job analysis. With this basic information, the
analyst then made ratings of all those attributes described above under structure and content (worker
functions, GED, aptitudes, etc.).
2 It appears that establishments were not selected with any formal sampling technique. Droege (1988) indicated the job analysis studies were done in “a number of establishments representing key segments of all industries” (p. 1003), and these were selected from lists drawn up by job analysts on the basis of their experience in industry research, industrial directories, and classified pages of telephone directories. Factors that influenced the actual selection of an establishment included type of product or service, number of employees, location, and policy or history of the establishment in allowing studies.

USE OF THE DOT


Extensive use was made of the DOT by governmental agencies and others. The NRC performed a
thorough review (Miller et al., 1980) of the uses of the fourth edition of the DOT in the late 1970s.
They surveyed purchasers of the DOT, interviewed major federal users, surveyed selected state
users, compiled a bibliography of its use in social science research, interviewed staff in national
and state offices of the USES (the intended primary users), and visited local Employment Service
offices. Their basic conclusion was “there is an important and continuing need for a comprehensive,
reliable catalogue of occupations in the U.S. economy as well as descriptions of the work performed
in each occupation.” (p. 3).
USES Use
The NRC found that the USES staff primarily used the job titles and definitions of the DOT for their
job placement activities, with heavy use also made of the DOT coding structure. The worker func-
tions, worker trait information, and industry designations were used less, although 75% found that
information useful for helping clients “explore vocational and occupational options” (p. 43). The
testing and labor certification divisions of the USES also made extensive use of the DOT, the former
for producing, validating, and norming the tests as well as making use of the tests for counseling
and placement, and the latter for making determinations about the “certification” of occupations for
foreign labor (foreign workers were allowed to enter only for jobs that required skills U.S. workers
could not provide). All of this “certification” was conducted within the DOT occupational coding
system.

Nongovernmental Use
The NRC report noted that the fourth edition of the DOT had sold over 100,000 copies in the first
fourteen months after its publication in 1977, compared to over 40,000 copies of the third edition
in the comparable time period. (The third edition sold almost 150,000 copies in total.) These were
copies directly purchased from the DoL, and were over and above the approximately 35,000 copies
distributed free of charge to all government agencies. In their survey of purchasers, they found that
37% were in career or vocational counseling work, 18% did library work, and 15% did management/
compensation work. Eight percent were in employment placement whereas 7% each were in voca-
tional education, rehabilitation counseling, and work having to do with projections/occupational
information dissemination. Five percent were purchased by teachers/researchers (5000 copies!).
(Figures are from Table 4-2, Miller et al., 1980, p. 52). It is clear that counselors, librarians, and
placement workers heavily used the DOT to get their work done, and it also played a major role for
personnel workers in business.
Most users (88%) reported that discontinuance of the DOT would have at least a minor disrup-
tive effect on their work, and 50% or more of teacher/researchers (67%), occupational information
disseminators (58%), vocational educators (50%), and rehabilitation counselors (50%) reported that
it would have a major disruptive effect (Table 4-5, Miller et al., 1980, p. 56).

Other Government Agency Use


Naturally enough, extensive use was made of the DOT by other government agencies. Some of these
uses mirror those described just above; others were quite different. They included the following:

• Employment training and production of occupational information (Bureau of Apprenticeship and Training): DOT codes used for record-keeping and statistical reporting, such as reviewing and registering apprenticeship programs for skilled trades.
• Rehabilitation and employment counseling (Veterans Administration): DOT descriptions
and worker trait groups (abilities, interests, temperaments) used by counselors for various
purposes such as determining transferability of skills, identifying occupational objectives,
and for occupational exploration.

• Vocational and occupational education: Used by vocational educators, primarily at the state level, for program planning, counseling, and curriculum development.
• State government agencies: State Occupational Information Coordinating Committees
(SOICCs) identified government DOT users within their states, which were then surveyed
by the NRC in a manner similar to the survey reported above for the nongovernmental
users. These within-state users were found in educational institutions (49%) and within
state and county/local governments (20% each). Most of the work they were doing was
in three areas: (a) career, vocational, or rehabilitation counseling; (b) vocational educa-
tion; and (c) projections/occupational information dissemination. About half (49%) of these
users indicated that discontinuance of the DOT would cause major inconvenience to or
seriously disrupt their work.
• Social science researchers: Sociologists, psychologists, and economists used the DOT in a
broad range of research activities; for example, classification or identification of samples by
use of DOT codes and the use of titles and definitions in vocational guidance tools. (An appen-
dix of their report has an annotated list of these publications, which is interesting to peruse.)

This brief review demonstrates the degree of penetration of the DOT into the U.S. occupational
information realm by 1980. Still, despite its widespread use, it was not without criticism, and these
were well documented by the NRC.

CRITICISMS OF THE DOT


The DOT received a careful review and critique from the NRC in 1980 (Miller et al., 1980). Aside
from making two general conclusions, that “there is a strong and continuing need both within and out-
side the U.S. Employment Service for the kind of information provided by the DOT” and “substantial
improvements in the procedures and products of the occupational analysis program are required in
order to meet the national need for occupational information,” they made several recommendations.
Chief among these were the following (from Miller et al., 1980, Chapter 9, pp. 214–234):

• The occupational analysis program should concentrate on the fundamental activity of job
analysis and research and development strategies for improving procedures, monitoring
changes in job content, and identifying new methodologies.
• A permanent, professional research unit of high quality should be established to conduct
such research and improvement efforts.
• An outside advisory committee should be established, representing employers, unions,
relevant academic disciplines, and the public, to review and make recommendations about
occupational analysis program activities.

Note that these recommendations are primarily organizational in nature. As briefly mentioned ear-
lier, the way in which the DOT job analysis information had been collected over the years was
continually subject to the vagaries of funding fluctuations, which led to organizational changes,
especially in the responsibility for collecting job analysis information. These recommendations
seem responsive to that history. Unfortunately, the recommendations were not followed as assidu-
ously as they might have been.
More specific recommendations were made in the areas of data collection procedures, measure-
ment of occupational characteristics, classification, other research, and organization and adminis-
tration (a total of nineteen such recommendations). A few of these are paraphrased below to give
the flavor of the full set.

• Continue on-site observation and interviews as the major mode of data collection, but
experiment with other methods.

• Select establishments and work activities for job analysis according to a general sampling
plan tailored to occupational analysis. (As mentioned in a footnote above, selection of
establishments was done primarily by analysts within industries assigned to the different
field centers with virtually no overall guidance on how to study or select establishments
within industries.)
• Devise procedures to monitor and identify changes in the job content of the economy.
• Review and replace, if appropriate, the worker trait and worker function scales. (The
committee argued that these measures did not reflect “conceptually central aspects of
occupational content,” they were not developed in accord with current psychometric
standards and had lower than desirable reliability, and some of the scales had limited
use.)
• Investigate cross-occupational linkages indicating possible transferability of skills or
experience.
• Use a standard classification, compatible with or identical to the standard system
implemented by the Office of Federal Statistical Policy and Standards (or its successor
developers).
• Give research priority to developing criteria for defining occupations—the aggregation
problem (i.e., methods for determining the homogeneity or heterogeneity of a target occu-
pation, and, therefore, the number and/or sources of individual job analyses that it might be
necessary to study and “aggregate” to adequately describe an occupation).
• Collection and dissemination of occupational information should be continuous, not epi-
sodic (coinciding with production of new editions).
• DOT data collection and development procedures should be documented and publicly
described, and its data made available to the public.

No doubt all of these recommendations were seriously received, and some were at least partially
implemented. However, some of these issues were not squarely faced until the development of a new
kind of occupational information system was seriously contemplated.

DOT TO O*NET TRANSITION


In addition to the DOT’s problems, as documented by the NRC’s critique (Miller et al., 1980), it
became apparent by the late 1980s that the nature of work itself was changing and that theoretical
and technical advances relevant to the description of occupations were making the DOT increas-
ingly less optimal (e.g., Dunnette, 1999; Dye & Silver, 1999). Influences affecting the world of
work included changes in technology, demographics, the structure of organizations, and the actual
content of work (e.g., Committee on Techniques for the Enhancement of Human Performance:
Occupational Analysis, 1999). “Jobs” were becoming broader and their content was changing.
These influences made it less desirable to describe jobs primarily in terms of static occupation-
specific “tasks” (Brannick, Levine, & Morgeson, 2007). Partly in response to this trend, since the
inception of the DOT, applied researchers had developed cross-occupational descriptors of occu-
pations that address other requirements and characteristics of work. The resulting taxonomies
eventually represented improvements relative to DOT’s worker characteristics and work activities.
One example is a comprehensive taxonomy of abilities developed by Fleishman (1975); another
is the Position Analysis Questionnaire (PAQ; McCormick, Mecham, & Jeanneret, 1989) that was
designed to assess work activities performed in many occupations. In addition, advances in infor-
mation technology had developed better ways to present and access information than the dictionary/
book approach. These and other developments led the DoL to constitute the Advisory Panel for the
Dictionary of Occupational Titles (APDOT).
The APDOT recommended a comprehensive redesign of the DOT (APDOT, 1993), specifying
particular characteristics toward the goal of a system that promotes

the effective education, training, counseling and employment of the American workforce. The DOT
should be restructured to accomplish its purpose by providing a database system that identifies, defines,
classifies and describes occupations in the economy in an accessible and flexible manner. Moreover,
the DOT should serve as a national benchmark that provides a common language for all users of
occupational information. (APDOT, 1993, p. 13)

APDOT’s specifications for a new occupational analysis system included the following themes:

• Use of a common occupational classification system that facilitates linkage to other occu-
pational and labor-market databases and represents collapsing, merging, and restructuring
of the workplace with fewer than the 12,000 DOT occupations
• Multiple windows of job descriptors intended for different user needs
• A “common language” of descriptors useful across occupations in their definition and
scaling
• Improved data collection, with more systematic sampling and use of structured job analy-
sis questionnaires
• Hierarchical organization so that users could ease into the system from the top down or
aggregate from the bottom up
• A relational electronic database that could be queried dynamically in a manner that facili-
tates access for a variety of users

DEVELOPMENT AND TESTING OF A PROTOTYPE O*NET


The O*NET prototype was developed and tested in response to the goals and specifications outlined
by the APDOT. Peterson et al. (1999) described this process in considerable detail. The prototype
settled on describing 1,122 occupations primarily on the basis of the 1984 Bureau of Labor Statistics
(BLS) Occupational Employment Statistics (OES) classification. Initial development focused on
collecting data for 80 of these occupations.

O*NET CONTENT MODEL


A number of the remaining APDOT specifications were addressed by the prototype’s development
of a content model that maps out all of the descriptors to be assessed for each occupation (National
Center for O*NET Development, DoL, 2007). Figure 41.3 shows O*NET’s current content model
that is very similar to the prototype’s content model and reasonably similar to the content model
proposed by APDOT. The figure shows six domains: (a) worker characteristics, (b) worker require-
ments, (c) experience requirements, (d) occupational requirements, (e) workforce characteristics,
and (f) occupation-specific information. This figure also makes the point that some of these descrip-
tors are cross-occupation, supporting comparisons across occupations via common scales; whereas
some are occupation-specific, providing richer information regarding particular characteristics of
each occupation.

Worker Characteristics
These cross-occupation descriptors were conceptualized to represent the enduring characteristics
of individuals that are relevant to job performance and/or the capacity to acquire knowledge and
skills necessary for work (National Center for O*NET Development DoL, 2007). The abilities for
the prototype and the current version of O*NET include 52 descriptors that address enduring human
capabilities that are not substantially affected by experience (e.g., Oral Comprehension, Deductive
Reasoning, Spatial Orientation, and Near Vision) originating from work by Fleishman (1975). The
occupational interests are six descriptors based on Holland’s (1976) taxonomy of interests (i.e.,
Realistic, Investigative, Artistic, Social, Enterprising, and Conventional) that represent preferences

[Figure 41.3, not reproduced here, arrays the content-model domains along two axes, worker-oriented versus job-oriented and cross-occupation versus occupation-specific: worker characteristics (abilities, occupational interests, work values, work styles); worker requirements (skills, knowledge, education); experience requirements (experience and training, entry-requirement skills, licensing); occupational requirements (generalized work activities, detailed work activities, organizational context, work context); occupation-specific information (tasks, tools and technology); and labor market characteristics (labor market information, occupational outlook).]

FIGURE 41.3 Occupational Information Network (O*NET) content model. (Adapted from the O*NET Center Online.)

for work environments and outcomes. The work values are 21 descriptors consisting of specific
needs that the work can satisfy originating from Dawis and Lofquist’s (1984) Theory of Work
Adjustment. They include occupational reinforcers such as Ability Utilization, Recognition, and
Autonomy.

Worker Requirements
These cross-occupation descriptors are work-related attributes that can be developed by educa-
tion, training, and experience (National Center for O*NET Development DoL, 2007). Basic skills
are 10 developed capabilities that facilitate acquisition of information and learning (e.g., Writing,
Mathematics, Critical Thinking). Cross-functional skills are currently 25 developed capacities rele-
vant to the performance of activities occurring across occupations. The prototype included 36 cross-
functional skills that were restructured for the current version of O*NET (Mumford, Peterson, &
Childs, 1999). The knowledge descriptors are 33 organized sets of principles and facts that apply to general domains of knowledge (e.g., Economics and Accounting, Mechanical, Biology, and Law and Government). Education covers educational experiences required to perform an occupation in terms of level (e.g., high school, vocational school, college) and subjects (e.g., English/Language Arts, Basic Math, Humanities). The educational portion of the current O*NET is fairly similar to its prototype version. The work styles are 16 descriptors that represent personal charac-
teristics relevant to how well a person performs work, traditionally referred to as personality and/or
temperament variables. The prototype included a list of 17 descriptors that underwent only minor
modifications in the operational version.

Experience Requirements
These descriptors are both cross-occupation and occupation-specific (National Center for O*NET
Development DoL, 2007). They begin with an indication of the amount of different types of experi-
ence required to be hired (e.g., related work experience, on-the-job training, apprenticeship). Next

are the basic and cross-functional skills required to be hired for the occupation. These are the same
skills referred to in worker requirements. Finally, this part of the content model includes licenses,
certificates, and registrations relevant to the occupation.

Occupational Requirements
These descriptors are work-oriented descriptors that describe the requirements of each occupation
(National Center for O*NET Development DoL, 2007). The generalized work activities (GWAs) in
the current version of O*NET include 41 descriptors that address general work behaviors that occur
across occupations (e.g., Getting Information, Analyzing Data or Information, Performing General
Physical Activities, and Resolving Conflicts and Negotiating with Others). The GWA taxonomy was
substantially influenced, for nonsupervisory jobs, by the theory behind and content of the PAQ (McCormick et al., 1989) and, for supervisory and management jobs, by several managerial dimension systems summarized in Borman and Brush (1993). The prototype GWAs included 42 descriptors that are not very different from the current version (Jeanneret, Borman, Kubisiak, & Hanson, 1999).3
Organizational context descriptors reflect characteristics of organizations in which the occupa-
tions are embedded. They cover a variety of areas such as the level of employee empowerment,
skill variety, reward systems, and organizational culture. There were some adjustments between the
prototype and current version of organizational context items (Arad, Hanson, & Schneider, 1999).
Finally, the work context descriptors cover physical and social factors that influence the work envi-
ronment. The current version has 57 work context descriptors. They are organized somewhat differently than in the prototype version but are otherwise fairly similar (Strong, Jeanneret, McPhail, Blakely, & D’Egidio, 1999).

Workforce Characteristics
This part of the content model contains information from the BLS regarding wages, level of employ-
ment, and employment outlook for each occupation (National Center for O*NET Development DoL,
2007).

Occupation-Specific Information
This portion of the content model serves as the primary location of occupation-specific information
(National Center for O*NET Development DoL, 2007). It includes tasks, tools, and technology rel-
evant to each occupation. The tasks are work behaviors more specific than GWAs and detailed work
activities. The O*NET prototype included tasks, but not tools and technology.

Hierarchical Structure
It is important to note that the descriptors in each domain of the content model represent the level at
which data are collected and are at the most specific level of a hierarchical structure. For example,
the ability Deductive Reasoning is one of the “Idea Generation and Reasoning” abilities that belong
to the “Cognitive” abilities, which in turn belong to the “Abilities” domain. Finally, the abilities are
a domain of descriptors belonging to the “Worker Characteristics” (see Figure 41.3).
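A minimal sketch of this nesting, using only the example named in the text (the data structure and helper function below are our own illustration, not an O*NET artifact), might look like the following.

```python
# Fragment of the descriptor hierarchy: domain -> group -> subgroup ->
# data-level descriptor, the level at which ratings are actually collected.
content_model_fragment = {
    "Worker Characteristics": {
        "Abilities": {
            "Cognitive": {
                "Idea Generation and Reasoning": ["Deductive Reasoning"],
            },
        },
    },
}

def path_to(descriptor, tree, trail=()):
    """Return the chain of levels leading down to a data-level descriptor."""
    for name, child in tree.items():
        if isinstance(child, dict):
            found = path_to(descriptor, child, trail + (name,))
            if found:
                return found
        elif descriptor in child:
            return trail + (name, descriptor)
    return None

# path_to("Deductive Reasoning", content_model_fragment) returns
# ('Worker Characteristics', 'Abilities', 'Cognitive',
#  'Idea Generation and Reasoning', 'Deductive Reasoning')
```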

O*NET PROTOTYPE DATA COLLECTION


Methods
The O*NET prototype departed from the DOT in other significant ways beyond its extensive content model. Consistent with developments in relevant areas of applied research, the primary data collection approach became structured questionnaires administered to incumbents (e.g., Fleishman & Mumford, 1988), and a sampling strategy was employed in an attempt to achieve a representative sample of incumbents in each occupation rather than a sample of convenience.
3 The current version of the O*NET content model also includes detailed work activities consisting of statements describing work within particular GWAs.
For most descriptors, the relevant questionnaire began with the title and definition of the con-
struct (e.g., “Reading Comprehension: Understanding written sentences and paragraphs in work
related documents” (Peterson et al., 1999, p. 58)). This was generally followed by two to three scales
per descriptor. For example, the Basic and Cross-functional Skills questionnaire presented a level,
importance, and job entry requirement scale for each skill; and the GWAs questionnaire presented
a level, importance, and frequency scale for each GWA. The 7-point level scales included behavioral
anchors, and 5-point importance scales had anchors that went from “Not Important” to “Extremely
Important.”
The sampling strategy for the prototype started with the 80 occupations referred to above. Next,
organizations of different sizes likely to employ individuals in these occupations were identi-
fied (Peterson et al., 1999). Organizations, called “establishments” in the sampling strategy, were
screened via phone, negotiations with human resource representatives were conducted, and an organizational-level interview was conducted. This was followed by mailing incumbent questionnaires to each organization’s point-of-contact. Overall response rates for this approach were poor, although 60% of incumbents who actually received a questionnaire did complete it. The primary problem was the decision by many establishments not to distribute questionnaires after they had received them in the mail (Peterson et al., 1999). A sufficient number of incumbents to support analysis responded for only 30 of the targeted 80 occupations. In addition to the incumbent ratings, analysts
(i.e., DoL Occupational Analysis Field Center analysts and I-O psychology graduate students) rated
each of the 1,122 occupations on some of the descriptor domains after reviewing a compiled list of
10 to 20 tasks for each occupation.

Prototype Results and Evaluation


Despite the overall low response rate for the incumbent data collection, reliability and validity
analyses yielded positive results (Peterson et al., 1999; Peterson et al., 2001). Within domains of
the content model, interrater reliability estimates were at least .70 (based on an average of 10 rat-
ers). Descriptor ratings were able to distinguish among different occupations in expected ways.
Examination of correlations (including factor analyses) among descriptors within and across domains
of the content model showed results consistent with expectations. For example, a factor analysis of
the Basic and Cross-Functional skills showed three theoretically consistent factors: (a) cognitive
skills, (b) technical skills, and (c) organizational skills (Mumford et al., 1999). Additionally, GWAs
focusing on manual and physical activities correlated strongly with similar abilities (e.g., strength
and psychomotor abilities; Peterson et al., 2001).
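As a point of interpretation, the reliability of an averaged rating depends on the number of raters. A standard way to express this relation (offered here as a general note; it is not necessarily the exact estimator used in the analyses cited above) is the Spearman-Brown formula for the reliability of the mean of k raters, given the single-rater reliability:

\[
r_{kk} = \frac{k\, r_{11}}{1 + (k - 1)\, r_{11}}.
\]

With 10 raters, for example, a single-rater reliability of about .19 already yields an aggregate reliability of about .70.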
The Committee on Techniques for the Enhancement of Human Performance: Occupational
Analysis (1999) outlined a number of advances achieved by the O*NET prototype relative to previous
systems. First, it brought together current occupational classification (category) and occupational
descriptor (enumerative) systems into a single database that allows extensive description of
occupations and comparisons among occupations. The content model represents a “theoretically
informed and initially validated” (p. 195) appreciation of the ways in which occupations can be
usefully described. The resulting database is accessible via multiple windows. The same occupations
can be described and compared according to descriptors as varied as GWAs, cognitive ability
requirements, and interests. Furthermore, the Committee on Techniques for the Enhancement
of Human Performance: Occupational Analysis (1999) noted that work was becoming more (a)
cognitively demanding because of advances in information technology, (b) team-based because
of organizational restructuring, and (c) customer-oriented. These authors noted that the O*NET
prototype’s content model had the breadth and depth of descriptors to capture these trends.
Peterson et al. (2001) enumerated some specific limitations of the O*NET prototype. Primary
among these was the poor response rate. This led the Committee on Techniques for the Enhancement

of Human Performance: Occupational Analysis (1999) to concerns about the representativeness of


the incumbent sample. The operational O*NET data collection took a number of steps to address
this issue. Additionally, a number of potential sources of rating inaccuracy were discussed (Peterson
et al., 2001) including inflation of ratings by incumbents, social desirability, and information over-
load. Peterson et al. (2001) pointed out, however, that these are potential inaccuracies that are hardly
unique to O*NET and that there is no strong evidence, in the analyses of O*NET prototype data,
supporting them.
Other challenges are relevant to the unit of analysis (Peterson et al., 2001). For example, orga-
nizational context information is generally associated with individual organizations. It is possible
that such descriptors vary so much across organizations that it is not informative to associate a
mean level of a particular organizational context variable (e.g., autonomy) with an occupation. The
O*NET prototype also assumes that there is a true score for an occupation on a particular descriptor
and that it can be estimated by calculating the mean across a number of ratings (i.e., aggregation). This assumption may not perfectly fit some occupations; for example, junior incumbents may perform work differently than senior incumbents, and work environments can vary considerably across organizations within an occupation. These issues may never be completely understood or solved, but they can be minimized and/or managed with data collection monitoring.

O*NET BECOMES OPERATIONAL


The operational database has gone through a number of revisions. The first version, O*NET 98,
consisted of analyst ratings of 1,122 BLS OES-based occupations (Tsacoumis, 2007). Next, the
O*NET occupational structure was modified to be more consistent with a new system that included
900+ occupations. This second version was titled “O*NET 4.0” and is referred to as the “analyst
ratings only” data. These data are currently being periodically updated with incumbent ratings
collected by an improved data collection system. The first update, 2003, included new information
(i.e., new incumbent ratings) for 54 occupations. As of June 2008, the most recent version, O*NET 13.0, contains updated data for 809 of 812 of O*NET’s current “occupations” (O*NET Resource Center, 2008).

A STANDARD OCCUPATIONAL STRUCTURE


Recall that the APDOT (1993) recommended a consistent occupational structure that allowed occu-
pational descriptive information to be linked to labor market databases (e.g., BLS’s labor statis-
tics). Consistent with this recommendation, in 2000, the Office of Management and Budget (OMB)
directed all government agencies to collect such data on the basis of the Standard Occupational
Classification System (SOC; DoL BLS, 2004). The current O*NET SOC 2006 occupational struc-
ture includes 949 occupational titles; data are being collected for 812 of these, referred to as data
level occupations (National Center for O*NET Development, 2006; Tsacoumis, 2007). The remain-
ing 137 titles include (a) military occupations, (b) catch all titles (i.e., “all other” occupations in a
particular area), and (c) SOC titles that are covered in more detail by one or more of the 812 data
level occupations. This new occupational structure is important not only because it matches the way
employment data are collected. It also represents a structure with less than one-tenth the data col-
lection burden, in terms of number of occupations, compared with the DOT (i.e., 812 O*NET SOC
vs. approximately 12,000 DOT occupations).

OTHER REVISIONS
Other revisions to the O*NET’s prototype involved adjustments to the rating scales (i.e., data collec-
tion instruments) and the content model itself on the basis of postprototype reviews and additional
research (e.g., Hubbard et al., 2000). Significant changes to the content model are discussed in the

content model discussion above. On the basis of analyses of the prototype data and other research,
the basic structure of the descriptor rating scales was modified. For example, the prototype GWAs
questionnaire presented a level, importance, and frequency scale for each GWA. This was changed so that the importance rating comes first and a “level” rating is made only if the importance rating is greater than 1 (“Not Important”). Finally, the frequency scale was deleted. Figure 41.4
shows these scales for the first GWA in the current questionnaire. These changes were designed to
improve the reliability and validity of ratings partly by making the rating process more straightfor-
ward and reducing the raters’ burden.
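A minimal sketch of the resulting two-step item flow (our own illustration of the logic described above, not O*NET’s data collection software):

```python
def collect_gwa_rating(importance, level=None):
    """Record one GWA rating: importance first (1-5); a level rating (1-7)
    is kept only when importance exceeds 1, 'Not Important'."""
    if importance not in range(1, 6):
        raise ValueError("importance is rated on a 1-5 scale")
    if importance == 1:
        return {"importance": 1, "level": None}   # level item is skipped
    if level not in range(1, 8):
        raise ValueError("a 1-7 level rating is required when importance > 1")
    return {"importance": importance, "level": level}
```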

DATA COLLECTION AND SCREENING


The primary data collection method for the operational O*NET is administering questionnaires
to incumbents. Respondents are identified by first randomly sampling establishments likely to use
incumbents in targeted occupations and then randomly sampling workers in those occupations
within those establishments (DoL Employment and Training Administration (ETA), 2005). Only a
subset of the questionnaires is administered to an individual respondent. Because of several proce-
dural adjustments relative to the prototype, in the period between June 2001 and September 2004,
the establishment response rate was 70% (a vast improvement over the prototype’s result) and the employee response rate was 65% (similar to the prototype’s result). These adjustments improved
the representativeness of the data. Several analyses are used to ensure the quality of the data. For
example, the value in the O*NET database for a particular descriptor (e.g., Deductive Reasoning)
on a particular scale (e.g., Level) for a particular occupation (e.g., plumber) is based on the mean of
ratings made by multiple incumbents. Mean values for scales are flagged for suppression if sample
size is too small (n < 10) or if there is too little agreement across raters (i.e., a large standard
error). Additionally, interrater reliabilities are calculated within each descriptor domain (e.g., GWAs
or work styles).
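A simplified sketch of this kind of screening logic appears below. The n < 10 rule comes from the text; the standard-error cutoff is an illustrative placeholder of our own, not O*NET’s published criterion.

```python
import math
from statistics import mean, stdev

MIN_N = 10                 # minimum sample size stated in the text
MAX_STANDARD_ERROR = 0.5   # illustrative placeholder threshold only

def screen_scale_mean(ratings):
    """Return (mean, status) for one descriptor-scale-occupation cell, flagging
    the mean for suppression when the sample is too small or rater agreement is
    too low (i.e., a large standard error of the mean)."""
    n = len(ratings)
    if n < MIN_N:
        return None, "suppressed: n < {}".format(MIN_N)
    se = stdev(ratings) / math.sqrt(n)
    if se > MAX_STANDARD_ERROR:
        return None, "suppressed: low agreement (SE = {:.2f})".format(se)
    return mean(ratings), "retained"
```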
Occupational analysts make ratings for the abilities after reviewing incumbent ratings of task,
GWA, and work context descriptors (Donsbach, Tsacoumis, Sager, & Updegraff, 2003; Tsacoumis
& Byrum, 2006). Using analyst ratings of abilities instead of incumbent ratings is based on the
rationale that analysts are likely to understand some ability constructs (e.g., Category Flexibility
and Speed of Closure) better than incumbents. Eight analysts rate each descriptor. Similar to the

[Figure 41.4, not reproduced here, shows the questionnaire item for the first GWA, “1. Getting information: Observing, receiving, and otherwise obtaining information from all relevant sources.” Part A asks, “How important is GETTING INFORMATION to the performance of the occupation?” on a 1-5 scale running from “Not important” to “Extremely important,” with an instruction to skip the level item if “Not important” is marked. Part B asks, “What level of GETTING INFORMATION is needed to perform the occupation?” on a 1-7 scale with behavioral anchors such as “Follow a standard blueprint,” “Review a budget,” and “Study international tax laws.”]

FIGURE 41.4 The first GWA from the current O*NET incumbent data collection questionnaires. (Adapted from the O*NET Data Collection Program: Office of Management and Budget Clearance Package Supporting Statement and Data Collection Instruments, 2008.)

incumbent ratings, the mean level rating for a particular descriptor on a particular scale for a par-
ticular occupation is flagged for suppression if the mean importance rating is too low. Also, mean
importance and level ratings are flagged if there is too little agreement among raters. With these ratings collected for 480 of the 812 occupations, the median importance and level interrater reliabilities were .87 and .90, respectively (i.e., on the basis of eight raters per occupation).4
Soon the update phase of O*NET’s data collection efforts will be complete and a database, con-
sisting of incumbent ratings for most of the descriptors (analyst ratings for the abilities and skills), for 812 O*NET SOC occupations will be available (O*NET Center, 2007a). Another planned effort
includes collecting incumbent/analyst data on occupations that are economically important, pro-
jected to grow in employment, or are being substantially affected by technology and innovation
(Tsacoumis, 2007). Additionally, data are being collected from occupational experts on new and
emerging occupations.

O*NET DATABASE
The online database is a publicly available, free, and very user-friendly tool (O*NET Center, 2007b;
see http://online.onetcenter.org). It can be accessed in several different ways. For example, O*NET
Online supports searching for job descriptions by name, O*NET SOC number, or job family. Once
an occupation is identified, summary, detailed, or customized descriptive reports can easily be gen-
erated depending on desired detail or the domains of the content model on which the user wishes to
focus. Additional services and products offered include self-assessment tools on interests and abili-
ties that individuals can use to match themselves with potential occupations and a downloadable
file version of the O*NET database for use by human resource, counseling, and other professionals
and related researchers.
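As an illustration of the downloadable-file route, the sketch below reads a tab-delimited descriptor file and keeps the rows for one occupation. The file name, column header, and delimiter are assumptions about the distribution format rather than guarantees about the O*NET Center’s files, and the SOC code shown is a dummy value.

```python
import csv

def rows_for_occupation(path, soc_code):
    """Read a tab-delimited O*NET-style descriptor file and return the rows
    for a single occupation (assumes a column named 'O*NET-SOC Code')."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [row for row in reader if row.get("O*NET-SOC Code") == soc_code]

# Hypothetical usage with a dummy code:
# ability_rows = rows_for_occupation("Abilities.txt", "99-9999.00")
```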

APPLICATIONS OF O*NET
Frequency of Use
In 2008 the DoL ETA reported information regarding usage of O*NET products. At that time,
O*NET Online (http://online.onetcenter.org) was being visited an average of 775,000 times per
month. As of September 2007, 2,179 individuals and organizations provided certifying information
indicating their intent to use O*NET data and other products (DoL ETA, 2008). These users repre-
sented categories such as educational services, computer system design and program services, and
government and public administration. At that time, 5,309 websites were linked to O*NET Online.
Between January 2002 and September 2007 there were 59,232 downloads of the O*NET database.
O*NET has also been referred to extensively in the literature. As of September 2007, DoL ETA (2008) showed 58 journal publications and 35 books and book chapters referencing O*NET.

General Application
There are several general descriptions of O*NET discussing its characteristics and potential useful-
ness. Early descriptions were written by, or with contributions from, researchers who worked on O*NET’s development (e.g., Committee on Techniques for the Enhancement of Human Performance: Occupational
Analysis, 1999; Peterson et al., 1999; Peterson et al., 2001) and focused to a considerable extent on
how well O*NET addressed APDOT’s original mandate. Other authors discussing the overall area
of job analysis also gave O*NET considerable attention (e.g., Sackett & Laczo, 2003). In a job analy-
sis textbook, Brannick et al. (2007) provided a multi-page description of O*NET mentioning its
advantages relative to matching people to jobs and comparing jobs on cross-occupational attributes

4 On the basis of (a) a study conducted by Tsacoumis & Van Iddekinge (2006) supporting the validity of analyst skill
ratings (collected via the same method as analyst ability ratings), (b) the idea that some skills might also be better under-
stood by analysts, and (c) practical considerations (i.e., cost), the U.S. Department of Labor ETA (2008) has decided to
use analyst instead of incumbent skill ratings.

TABLE 41.1
A Comparison of Some Occupations in Terms of the Level of GWA Performing General Physical Activities (see note a)
O*NET-SOC Occupation O*NET Level Rating
Tree trimmers and pruners 88
Light or delivery service truck drivers 84
Carpet installers 77
Pest control workers 66
Short-order cooks 54
Home health aides 44
Childcare workers 37
Security guards 24
Telemarketers 16

O*NET-SOC titles and level ratings from O*NET Online 12.0 (2008).
a Performing general physical activities is defined in the O*NET content model as performing physical activities that require considerable use of your arms and legs and moving your whole body, such as climbing, lifting, balancing, walking, stooping, and handling materials.

(i.e., descriptors). These authors also indicate that the O*NET content model and database could be
useful for (a) providing information for developing competency models, (b) generating descriptions of jobs for the purpose of identifying characteristics to assess with selection instruments, and (c) serving as an early step in the development of more detailed job analysis information.
Tables 41.1 and 41.2 show two examples of how O*NET data might be presented and used
for different purposes (O*NET Online 12.0, 2008). Table 41.1 shows the level of the GWA
Performing General Physical Activities (on a scale from 0 to 100; see footnote 5) needed for the performance of nine occupations that a high school graduate might consider. A career counselor might use such information when advising a client at this educational level with physical fatigue issues. Table 41.1 shows that some occupations require much less physical activity than others. Table 41.2 shows the O*NET work style descriptors with importance ratings of 50 or greater, on a scale from 0 to 100 (see footnote 6), for childcare workers. It shows the work styles rated to be most important to performance in this
occupation. A human resources (HR) professional might use this information to identify potential
commercially offered personality instruments that could be helpful as part of a selection system for
this occupation.
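The 0-to-100 values in these tables are rescaled from the underlying rating scales (see footnotes 5 and 6). The sketch below assumes a simple min-max rescaling consistent with those scale ranges (our assumption, not necessarily the published transformation) and then applies the kind of screen a counselor might run against the Table 41.1 values; the cutoff of 40 is arbitrary and purely illustrative.

```python
def rescale(value, low, high):
    """Min-max rescale a raw rating to a 0-100 metric (assumed transformation)."""
    return 100.0 * (value - low) / (high - low)

# Example: a level rating of 4 on the 1-7 GWA level scale maps to 50.0.
assert rescale(4, 1, 7) == 50.0

# Screening Table 41.1 occupations for relatively low physical demands:
table_41_1 = {
    "Tree trimmers and pruners": 88,
    "Light or delivery service truck drivers": 84,
    "Carpet installers": 77,
    "Pest control workers": 66,
    "Short-order cooks": 54,
    "Home health aides": 44,
    "Childcare workers": 37,
    "Security guards": 24,
    "Telemarketers": 16,
}
low_physical = {occ: lvl for occ, lvl in table_41_1.items() if lvl <= 40}
# -> {'Childcare workers': 37, 'Security guards': 24, 'Telemarketers': 16}
```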

Specific Research Applications


Like the DOT, the O*NET has also been used in specific research applications. Some are in expected
areas. For example, Converse, Oswald, Gillespie, Field, and Bizot (2004) used O*NET’s ability rat-
ings to develop a person-occupation matching system for career guidance. More specifically, the
authors addressed one of the criticisms of O*NET ratings.
Harvey and Wilson (2000) have criticized the types of rating instruments used to collect O*NET data,
arguing it is unlikely that high quality data will be produced in situations where abstract constructs
are measured using single-item scales that include only a few general activities as anchors. However,
although perhaps not ideal for some purposes (e.g., situations in which more concrete ability require-
ments information is needed), the ability data contained within O*NET is useful for our person–
occupation matching purposes. On the basis of the analyses presented here and expert judgments of
career counselors at the Ball Foundation, the O*NET ability scores did produce reasonable aptitude
profiles for occupations. (Converse et al., 2004, p. 484)

5 O*NET GWA level ratings are made on a 7-point scale and transformed to a 100-point scale for presentation.
6 O*NET work styles importance ratings are made on a 5-point scale and transformed to a 100-point scale for presentation.

TABLE 41.2
Important Work Styles for Childcare Workers With Importance Ratings of 50 or Higher
Each entry gives the title of the O*NET work style, its O*NET importance rating (in parentheses), and its definition.
Dependability (92): Job requires being reliable, responsible, and dependable, and fulfilling obligations.
Self-control (89): Job requires maintaining composure, keeping emotions in check, controlling anger, and avoiding aggressive behavior, even in very difficult situations.
Concern for others (86): Job requires being sensitive to others’ needs and feelings and being understanding and helpful to others on the job.
Integrity (86): Job requires being honest and ethical.
Cooperation (82): Job requires being pleasant with others on the job and displaying a good-natured, cooperative attitude.
Stress tolerance (81): Job requires accepting criticism and dealing calmly and effectively with high-stress situations.
Social orientation (78): Job requires preferring to work with others rather than alone, and being personally connected with others on the job.
Adaptability/flexibility (74): Job requires being open to change (positive or negative) and to considerable variety in the workplace.
Attention to detail (69): Job requires being careful about details and thorough in completing tasks.
Leadership (69): Job requires a willingness to lead, take charge, and offer opinions and direction.
Initiative (68): Job requires a willingness to take on responsibilities and challenges.
Independence (65): Job requires developing one’s own ways of doing things, guiding oneself with little or no supervision, and depending on oneself to get things done.
Persistence (58): Job requires persistence in the face of obstacles.
Innovation (57): Job requires creativity and alternative thinking to develop new ideas for and answers to work-related problems.
Achievement/effort (54): Job requires establishing and maintaining personally challenging achievement goals and exerting effort toward mastering tasks.
Analytical thinking (50): Job requires analyzing information and using logic to address work-related issues and problems.
O*NET descriptors and importance ratings from O*NET Online 12.0 (2008).

Reiter-Palmon, Brown, Sandall, Buboltz, and Nimps (2006) described use of the O*NET content
model to structure and develop a web-based job analysis process that collects specific tasks and
links them to O*NET abilities and skills. Taylor, Li, Shi, and Borman (2008) examined the trans-
portability of job information across countries. They compared data collected in the United States
(i.e., from the O*NET database) to responses to the scales on the O*NET GWAs, basic and skill
functional skills, and work styles questionnaires from incumbents in three jobs (i.e., first-line super-
visor, office clerk, and computer programmer) in three other countries (i.e., New Zealand, China,
and Hong Kong). Results showed only modest differences suggesting that generally this type of job
information is transportable across countries.
O*NET has also been used in research applications that may have been somewhat less expected.
For Glomb, Kammeyer-Mueller, and Rotundo (2004), O*NET data was one of the sources of
information used to compare occupations in terms of emotional and cognitive demands and to assess
these demands as factors related to compensation. Crouter, Lanza, Pirretti, Goodman, and Neebe
(2006) demonstrated how O*NET data could be used to understand the work life circumstances
of parents in the family research context. Liu, Spector, and Jex (2005) used O*NET data for the
descriptor Autonomy in the work values domain of the content model as one source of information

to investigate the relationship between job stressors and physical well-being and psychological
strain. In a related effort, Dierdorff and Ellington (2008) showed that significant variance in work-
family conflict can be accounted for by two characteristics of an individual’s occupation (i.e.,
interdependence and responsibility for others). Data from scales in the work context domain of the
content model were used to generate scores on each characteristic for the occupations included in
their analyses.
One particular research effort demonstrates O*NET’s content model as a useful taxonomy of
occupational descriptors. While developing a system for selecting individuals for team settings,
Morgeson, Reider, and Campion (2005) used selected domains from the content model to con-
duct a job analysis and to develop questions for a structured selection interview. With the goal of
identifying the knowledge, skills, abilities, and other attributes (KSAOs) required for successful
performance in team settings, job incumbents completed O*NET skill, ability, and work styles
surveys regarding the level and importance of these descriptors to their work. Responses were used
as part of the job analysis effort to identify KSAOs important to performance in this study’s work
context (i.e., a Midwestern steel corporation). This job analysis information was used to develop
questions for a structured selection interview. Content for a number of these questions was taken
from the O*NET skills and work styles domains (e.g., Active Listening, Social Perceptiveness, Time
Management, Cooperation, and Stress Tolerance).
Another study demonstrates the usefulness of the populated O*NET database. LaPolice, Carter,
and Johnson (2008) used O*NET data in the component validation context to show that O*NET
descriptors can be used to predict literacy scores for O*NET-SOC occupations. Researchers used
mean level ratings for these occupations from four content domains (i.e., abilities, skills, knowledge,
and GWAs). Within each domain they identified descriptors relevant to three kinds of literacy identi-
fied in the National Adult Literacy Survey (NALS): (a) prose (involving text in sources such as editorials, news stories, poems, and fiction), (b) document (involving text in sources such as job applications, payroll forms, maps, tables, etc.), and (c) quantitative (involving arithmetic operations, numbers in printed materials, etc.). Literacy scores for each occupation were calculated from responses on the
NALS for individuals who identified their occupation in the survey’s background questionnaire.
Results showed that mean NALS prose, document, and quantitative literacy scores for these occupa-
tions were predicted by descriptor level ratings in the O*NET database with multiple correlation
coefficients, corrected for shrinkage, of .81, .79, and .80, respectively. This study also demonstrates
how O*NET data can be used at a level above that represented by the descriptors in each content
domain. LaPolice, Carter, and Johnson (2008) identified a practically useful level of aggregation they
referred to as “literacy” derived from four content domains. Other possibilities in this direction might
include aggregating O*NET descriptors to address important broad competencies other than literacy
(e.g., Interpersonal Skills, Cognitive Flexibility, Technological Awareness, and Leadership).
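For readers unfamiliar with the shrinkage correction mentioned above, a commonly used adjustment is Wherry’s formula, shown here as a general illustration (LaPolice et al. may have used a different estimator), which shrinks a squared multiple correlation obtained from N cases and k predictors:

\[
\hat{R}^{2} = 1 - \bigl(1 - R^{2}\bigr)\,\frac{N - 1}{N - k - 1}.
\]

Corrections of this kind adjust the reported values downward for the capitalization on chance that occurs when many predictors are fit to a finite sample.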

Other Uses
Beyond the direct uses of O*NET data and associated tools that the O*NET Center makes available,
the DoL ETA (2007) keeps track of organizations that use O*NET data to design, implement, and
sustain a variety of practical applications that are described in the “into action” section of its website.
They focus on matching people to work in different environments from different perspectives.

• Several programs are using O*NET to help students understand and prepare for the world
of work. Toward this end it helps them identify occupations that match their interests and
capabilities and evaluate the educational, training, and practical experiences required
to prepare for those occupations (e.g., Boys & Girls Club of America and New York
CareerZone).
• Other programs focus on assisting individuals to find work matching their skills and other
characteristics. They include interventions that focus on older workers, some of whom
are reentering the workforce, disabled workers, and workers seeking new employment

voluntarily or involuntarily (i.e., because of events such as large layoffs and plant closures).
Typically, O*NET occupation descriptions are combined with national or local labor mar-
ket information to help individuals identify job opportunities and/or potential alternative
occupations (e.g., Alabama’s Comprehensive Labor Market Information System, Maryland
Division of Rehabilitation Services, and Catholic Community Services in Baton Rouge).
• Lehigh Carbon Community College used O*NET and state and national labor market data,
among other tools, to help a curriculum design effort to understand the needs of business
and industry.
• A large staffing service used O*NET occupation descriptions to help understand the needs
of its corporate customers in terms of in-demand skill-sets.

These research and applied examples demonstrate that O*NET is being used. Additionally, it appears
to have reinvigorated research made possible by a complete, up-to-date taxonomy of occupations
described by a complete, up-to-date taxonomy of occupational descriptors.

CONCLUSIONS
O*NET and the DOT are two large-scale efforts to provide comprehensive descriptions of all occu-
pations in the U.S. economy. Three questions are relevant to evaluating their success. First, is the
taxonomy of occupations complete and reasonably valid? Second, is the taxonomy of occupational
descriptors complete and are the methods of assessing occupations on those descriptors reasonably
valid? Finally, is the information presented in a manner that is accessible to its users?
As noted at the beginning of this chapter, the DOT and the O*NET were developed in different
eras in response to different societal needs and with different technologies. The DOT was, in our
opinion, a highly successful response to the challenges of its era. Its shortcomings with regard to
these three questions became apparent only as the times changed and are embodied in the content
of the report produced by APDOT (APDOT, 1993), the panel that was specifically charged with
identifying the desired characteristics of a system to replace the DOT. Whether the current version
of O*NET represents an affirmative answer to these questions can be captured by the extent to
which O*NET satisfies APDOT’s charge for a redesigned system. APDOT’s first requirement was
for a new occupation classification system that could be efficiently linked to other occupational and
labor market databases. OMB’s establishment of the SOC (i.e., 949 occupations; 812 of which will
be described in O*NET) as the across government standard substantially addresses the need for a
comprehensive taxonomy of occupations. A big advantage is that the Bureau of Labor Statistics will
track employment for the same occupations that O*NET describes. The assessment of agreement in
ratings across incumbents within an occupation could and should be used as one way of monitoring
whether an occupational category is and has remained a single “occupation.”
The APDOT also stipulated that the new system should include an improved sampling approach
and modern and efficient methods of data collection. The current O*NET procedures that focus on
collecting most descriptor data via incumbent questionnaires and identifying incumbents from a
representative sample of organizations satisfies these requirements. The current system’s favorable
data collection response rates also support the representativeness of its sample.
APDOT emphasized the need for a comprehensive taxonomy of cross-occupation descriptors
that is hierarchically organized and describes occupations from multiple windows (i.e., with dif-
ferent content domains). The extensive O*NET content model represents substantial evidence that
these requirements have been met.
Finally, O*NET Online, a completely free-of-charge database accessible through the Internet,
substantially achieves the APDOT vision that the data should be presented in a relational electronic
database that could be queried dynamically in a manner that facilitates access for a variety of users.
The O*NET usage information presented above also documents the high degree of access that has
been achieved.
However, there are two questions that have not been completely addressed. First, can or will
O*NET be sufficiently sensitive to identify the emergence and disappearance of occupations?
Monitoring labor market information and actively seeking to develop descriptions of emerging occu-
pations can address this question. O*NET is currently involved in the latter. The second question is
validity. A great deal of evidence supporting the validity of O*NET data collection instruments was
generated during the prototype development and data collection. Subsequent development efforts
have enhanced its validity. However, more research comparing O*NET data to other sources of
occupational information will further the investigation of its validity and likely identify opportuni-
ties for its improvement.
The O*NET was conceived as an open system, continuously evolving to reflect changes in the
occupational structure and in the ways that occupations can be usefully described. As long as this
principle is followed, there is ample reason to believe that a national occupational database will
continue to stand as a milestone of I-O psychology’s achievements.

REFERENCES
Advisory Panel for the Dictionary of Occupational Titles (APDOT). (1993). The new DOT: A database of occu-
pational titles for the twenty-first century (Final Report). Washington, DC: Employment Service, U.S.
Department of Labor Employment and Training Administration.
Arad, S., Hanson, M. A., & Schneider, R. J. (1999). Organizational context. In N. G. Peterson, M. D. Mumford,
P. R. Jeanneret, & E. A. Fleishman (Eds.), An occupational information system for the 21st century: The
development of O*NET (pp. 147–174). Washington, DC: American Psychological Association.
Borman, W. C., & Brush, D. H. (1993). More progress toward a taxonomy of managerial performance require-
ments. Human Performance, 6, 1–21.
Brannick, M. T., Levine, E. L., & Morgeson, F. P. (2007). Job and work analysis: Methods, research, and appli-
cations for human resource management. Thousand Oaks, CA: Sage.
Byrum, C. N., & Tsacoumis, S. (2006). O*NET analyst occupational abilities ratings: Analysis Cycle 623
results. Alexandria, VA: Human Resources Research Organization.
Committee on Techniques for the Enhancement of Human Performance: Occupational Analysis. (1999). The
changing nature of work: Implications for occupational analysis. Washington, DC: National Academy
Press.
Converse, P. D., Oswald, F. L., Gillespie, M. A., Field, K. A., & Bizot, E. B. (2004). Matching individuals
to occupations using abilities and the O*NET: Issues and an application in career guidance. Personnel
Psychology, 57, 451–487.
Crouter, A. C., Lanza, S. T., Pirretti, A., Goodman, W. B., & Neebe, E. (2006). The O*NET jobs classification
system: A primer for family researchers. Family Relations, 55, 461–472.
Dawis, R. V., & Lofquist, L. H. (1984). A psychological theory of work adjustment: An individual-differences
model and its applications. Minneapolis, MN: University of Minnesota Press.
Dierdorff, E. C., & Ellington, J. K. (2008). It’s the nature of work: Examining behavior-based sources of work-
family conflict across occupations. Journal of Applied Psychology, 93 (4), 883–892.
Donsbach, J., Tsacoumis, S., Sager, C., & Updegraff, J. (2003). O*NET analyst occupational abilities ratings:
Procedures (DFR-03-22). Alexandria, VA: Human Resources Research Organization.
Droege, R. C. (1988). Department of Labor job analysis methodology. In S. Gael (Ed.), The job analysis hand-
book for business, industry, and government (Vol. 2, pp. 993–1018). New York, NY: Wiley.
Dunnette, M. D. (1999). Introduction. In N. G. Peterson, M. D. Mumford, P. R. Jeanneret, & E. A. Fleishman
(Eds.), An occupational information system for the 21st century: The development of O*NET (pp. 3–7).
Washington, DC: American Psychological Association.
Dye, D., & Silver, M. (1999). The origins of O*NET. In N. G. Peterson, M. D. Mumford, P. R. Jeanneret, &
E. A. Fleishman (Eds.), An occupational information system for the 21st century: The development of
O*NET (pp. 9–19). Washington, DC: American Psychological Association.
Fine, S. A. (1968a). The use of the Dictionary of Occupational Titles as a source of estimates of educational and
training requirements. Journal of Human Resources, 3, 363–375.
Fine, S. A. (1968b). The 1965 Third Edition of the Dictionary of Occupational Titles—Content, Contrasts, and
Critique. Kalamazoo, MI: Upjohn Institute for Employment Research.
Fine, S. A. (1988). Functional job analysis scales: A desk aid. Milwaukee, WI: Author.
Fleishman, E. A. (1975). Manual for Ability Requirement Scales (MARS). Bethesda, MD: Management
Research Institute.
Fleishman, E. A. (1988). The ability requirements scales. In S. Gael (Ed.), The job analysis handbook for busi-
ness, industry, and government (pp. 917–925). New York, NY: Wiley.
Glomb, T. M., Kammeyer-Mueller, J. D., & Rotundo, M. (2004). Emotional labor demands and compensating
wage differentials. Journal of Applied Psychology, 89(4), 700–714.
Harvey, R. J., & Wilson, M. A. (2000). Yes Virginia, there is an objective reality in job analysis. Journal of
Organizational Behavior, 20, 829–854.
Holland, J. L. (1976). Vocational preferences. In M. D. Dunnette (Ed.), Handbook of industrial and organiza-
tional psychology (pp. 521–570). Chicago, IL: Rand McNally.
Hubbard, M., McCloy, R., Campbell, J., Nottingham, J., Lewis, P., Rivkin, D., & Levine, J. (2000). Revision
of O*NET data collection instruments (Revised Version). Raleigh, NC: National O*NET Consortium;
National Center for O*NET Development; Employment Security Commission.
Jeanneret, P. R., Borman, W. C., Kubisiak, U. C., & Hanson, M. A. (1999). Generalized work activities. In
N. G. Peterson, M. D. Mumford, P. R. Jeanneret, & E. A. Fleishman (Eds.), An occupational information
system for the 21st century: The development of O*NET (pp. 105–125). Washington, DC: American
Psychological Association.
Liu, C., Spector, P. E., & Jex, S. M. (2005). The relation of job control with job strains: A comparison of mul-
tiple data sources. Journal of Occupational and Organizational Psychology, 78, 325–336.
McCormick, E. J., Jeanneret, P. R., & Mecham, R. C. (1972). A study of job characteristics and job dimensions as based
on the Position Analysis Questionnaire (PAQ). Journal of Applied Psychology Monograph, 56, 347–368.
McCormick, E. J., Mecham, R. C., & Jeanneret, P. R. (1989). Technical manual for the Position Analysis
Questionnaire (2nd ed.). Palo Alto, CA: Consulting Psychologist Press.
Miller, A. R., Treiman, D. J., Cain, P. S., & Roos, P. A. (Eds.). (1980). Work, jobs and occupations: A critical
review of the Dictionary of Occupational Titles. Washington, DC: National Academy Press.
Morgeson, F. P., Reider, M. H., & Campion, M. A. (2005). The importance of social skills, personality charac-
teristics, and teamwork knowledge. Personnel Psychology, 58, 583–611.
Mumford, M. D., Peterson, N. G., & Childs, R. A. (1999). Basic and cross-functional skills. In N. G. Peterson, M.
D. Mumford, P. R. Jeanneret, & E. A. Fleishman (Eds.), An occupational information system for the 21st
century: The development of O*NET (pp. 49–69). Washington, DC: American Psychological Association.
National Center for O*NET Development. (2006). Updating the O*NET-SOC taxonomy: Summary and impli-
cations. Retrieved July 23, 2008, from http://www.onetcenter.org/reports/UpdatingTaxonomy.html
National Center for O*NET Development, U.S. Department of Labor. (2007). The O*NET® content model
detailed outline with descriptions. Retrieved January 30, 2008, from http://www.onetcenter.org/dl_files/
ContentModel_DetailedDesc.pdf
O*NET Center (2007a). O*NET center data collection. Retrieved January 30, 2008, from http://www.
onetcenter.org/dataPublication.html
O*NET Center (2007b). O*NET Resource center. Retrieved January 30, 2008, from http://www.onetcenter.org
O*NET Online 12.0. O*NET Center. Retrieved July 30, 2008, from http://online.onetcenter.org/find
O*NET Resource Center. Data publication schedule. Retrieved July 31, 2008, from http://www.onetcenter.org/
dataPublication.html
Paterson, D. G., & Darley, J. G. (1936). Men, women, and jobs. Minneapolis, MN: University of Minnesota Press.
Peterson, N. G., Borman, W. C., Mumford, M. D., Jeanneret, P. R., & Fleishman, E. A. (Eds.). (1999). An
occupational information system for the 21st century: The development of O*NET. Washington, DC:
American Psychological Association.
Peterson, N. G., Mumford, M. D., Borman, W. C., Jeanneret, P. R., Fleishman, E. A., Levin, K. Y., Campion, M. A.
et al. (2001). Understanding work using the occupational information network (O*NET): Implications
for practice and research. Personnel Psychology, 54, 451–492.
Reiter-Palmon, R., Brown, M., Sandall, D. L., Buboltz, C., & Nimps, T. (2006). Development of an O*NET
web-based job analysis and its implementation in the U.S. Navy: Lessons learned. Human Resources
Management Review, 16, 294–309.
Sackett, P. R., & Laczo, R. M. (2003). Job and work analysis. In W. C. Borman, D. R. Ilgen, & R. J. Klimoski
(Eds.), Handbook of psychology: Industrial and organizational psychology (Vol. 12, pp. 21–37).
Hoboken, NJ: John Wiley & Sons.
Strong, M. H., Jeanneret, P. R., McPhail, S. M., Blakely, B. B., & D’Egidio, E. L. (1999). Work context:
Taxonomy and measurement of the work environment. In N. G. Peterson, M. D. Mumford, P. R.
Jeanneret, & E. A. Fleishman (Eds.), An occupational information system for the 21st century: The
development of O*NET (pp. 127–145). Washington, DC: American Psychological Association.
Tsacoumis, S. (2007, May). The feasibility of using O*NET to study skill changes. Paper presented at the
Workshop on Research Evidence Related to Future Skills Demands sponsored by Center for Education
National Research Council, Washington, DC.
Tsacoumis, S., & Van Iddekinge, C. H. (2006). A comparison of incumbent and analyst ratings of O*NET
skills. Alexandria, VA: Human Resources Research Organization.
U.S. Department of Commerce. (1980). Standard Occupational Classification manual. Washington, DC: Author.
U.S. Department of Labor. (1939). Dictionary of occupational titles. Washington, DC: U.S. Government
Printing Office.
U.S. Department of Labor. (1949). Dictionary of occupational titles. (2nd ed.). Washington, DC: U.S. Government
Printing Office.
U.S. Department of Labor. (1965). Dictionary of occupational titles. (3rd ed.). Washington, DC: U.S. Government
Printing Office.
U.S. Department of Labor. (1972). Handbook for analyzing jobs. Washington, DC: U.S. Government Printing
Office.
U.S. Department of Labor. (1980). Dictionary of occupational titles. (4th ed.). Washington, DC: U.S. Government
Printing Office.
U.S. Department of Labor. (1991a). Dictionary of occupational titles. (4th. ed., rev.). Washington, DC: U.S.
Government Printing Office.
U.S. Department of Labor. (1991b). The revised handbook for analyzing jobs. Washington, DC: U.S. Government
Printing Office.
U.S. Department of Labor, Bureau of Labor Statistics. (2004, December). Standard Occupational Classification
(SOC) user guide. Retrieved January 31, 2008, from http://www.bls.gov/soc/socguide.htm
U.S. Department of Labor, Employment and Training Administration. (2005, September). O*NET data col-
lection program: Office of management and budget clearance package supporting statement. Retrieved
January 31, 2008, from http://www.onetcenter.org/dl_files/ContentModel_DetailedDesc.pdf
U.S. Department of Labor, Employment and Training Administration. (2007, February). O*NET in action.
Retrieved December 11, 2007, from http://www.doleta.gov/programs/onet/oina.cfm
Wagner-Peyser Act of 1933, P.L. 73-30, 48 Stat. 113, 73d Cong. (1933).
42 Situational Specificity and Validity Generalization

Lawrence R. James and Heather H. McIntyre

Most investigators would agree that a well-designed test will have nonzero validity against a well-
measured criterion in work contexts that require whatever the test measures for success. If “validity
generalization” were limited to this inference, then there would be no reason for this chapter. Indeed,
the authors of this chapter subscribe to this inference. But, validity generalization is not limited to this
inference. Validity generalization (VG) is often in the business of slaying the situational specificity
dragon. A somewhat famous quote from the developers of VG illustrates this point:

The evidence from these two studies appear to be the last nail required for the coffin of the situational
specificity hypothesis. (Schmidt, Hunter, Pearlman, & Rothstein-Hirsch, 1985, p. 758)

In our opinion, the situational specificity hypothesis continues to thrive. It is in support of that opinion
that this chapter is written.
There is little question that soundly measured knowledge, skills, and abilities (KSAs) and
personality generally predict soundly measured, organizationally relevant criteria. In this sense,
validity can be said to generalize. Whether the validity for a given type of predictor (e.g., critical
intellectual skills) against a given class of criterion (e.g., job performance) is generally the same—or
cross-situationally consistent—is another issue. The cross-situational consistency hypothesis has
endured a long history of theoretical and empirical debate, the roots of which can be found in
dialectics such as the person-situation debate (e.g., Epstein, Mischel, & Peake, 1982). The emergence
of meta-analysis as a popular method for testing the consistency of predictive validities across a set
of separate studies accelerated and transformed the debate into one of a more quantitative and
methodological nature.
Basically, meta-analysis made it possible for organizational researchers to apply increasingly
sophisticated quantitative maneuvers so as to accurately assess the predictive validity of test scores
and, more importantly, the consistency of these estimates over studies. Perhaps the most well-established
product of these efforts is VG analysis in which validity coefficients for the same, or similar, predictor-
criterion variable pairs from different study samples are assessed in terms of cross-situational
(organizational) consistency (i.e., Pearlman, Schmidt, & Hunter, 1980; Hunter & Schmidt, 1998;
Schmidt & Hunter, 1977; Schmidt, Hunter, & Raju, 1988).
The VG approach attempts to control statistical artifacts that distort validity coefficient esti-
mates, thus enhancing the comparability of these estimates across situations (Hunter & Schmidt,
1998; Schmidt & Hunter, 1977; Schmidt et al., 1993). However, critics argue that VG paints an
incomplete picture of the cross-situational consistency of predictor-criterion pairs (see James,
Demaree, & Mulaik, 1986; James, Demaree, Mulaik, & Ladd, 1992; Algera, Jansen, Roe, & Vijn,
1984). This claim is made in light of two observations: (a) VG analysis rarely accounts for all of the
between-sample variance in validity coefficient estimates, sometimes accounting for very little variance
for certain predictor-criterion pairs and/or job types (i.e., Murphy, 2000; Ones, Viswesvaran, &
Schmidt, 2003; Salgado, Anderson, Moscoso et al., 2003); and (b) the formulas used in VG analyses
fail to explicitly (statistically) incorporate measured situational variables (e.g., authority structure,
interpersonal interactions, social climate), despite extensive histories of situational influence on
many of the predictor-criterion pairs frequently cited in VG literature (i.e., Ghiselli, 1959, 1966,
1973; Peters, Fisher, & O’Connor, 1982). It is further noteworthy that closer inspection of VG equa-
tions reveals hidden assumptions and perhaps the potential for more than just a trace or hint of
cross-situational variability (James et al., 1992).
The following discussion poses philosophical and empirical questions of relevance to the
situational specificity-cross-situational consistency debate in the context of VG. For example, is
situational specificity truly a prerequisite for understanding the relationship between KSA (or
personality) scores and job-related behaviors? Are statistical corrections sufficient? In other words,
what are the scientific and practitioner-oriented advantages to including situational factors in the
estimation of cross-situational consistency in predictor-criterion relationships? The purpose of this
chapter is to address these questions by tracing the development of predictive validity estimation
from the early situational-specificity dominated approach to the new contender, cross-situational
consistency. This discussion includes a comparison of conceptual and methodological assumptions
underlying situational specificity and cross-situational consistency. Ultimately, we seek to identify
the advantages and disadvantages of each approach and critically assess their collaborative strength
in illuminating patterns of behavior in organizational settings.

SITUATIONAL SPECIFICITY
Validity studies conducted in the mid- to late-20th century offered modest hope for the utility of
personality and KSAs as predictors of crucial outcome variables (i.e., job performance) in applied
settings. Of particular interest were validity coefficients for cognitive ability tests, which tended to
be small to modest and inconsistent across job types (Ghiselli, 1959, 1966, 1973). As a result of
inconsistency, the situational specificity hypothesis (SSH) drew a large following and dominated
the literature for several decades (i.e., Murphy, 2000; Schmidt & Hunter, 1998). The adoption of the
SSH in fact reflected a more general academic movement toward situational specificity of behavior
(Mischel, 1968; Epstein et al., 1982). However, in the context of meta-analysis, the SSH posits
that the underlying population validities for any particular predictor-criterion pair will vary across
populations (studies) because of the differential impact of certain situational attributes on the (typi-
cally linear) relationship between the predictor and criterion (i.e., James et al., 1986, 1992; Murphy,
2000; Schmidt et al., 1993).
Support for the SSH is easily found in the literature. For example, several studies have revealed
the moderating role of job complexity on validity estimates of cognitive ability and KSAs (i.e.,
Hunter & Hunter, 1984; Levine, Spector, Menon, Narayanan, & Cannon-Bowers, 1996; Russell,
2001; Salgado et al., 2003; Schmidt et al., 1993). Hunter and Hunter (1984) found that cognitive abil-
ity demonstrated a higher validity for predicting job performance and training success for occupa-
tions involving greater task complexity. Salgado et al. (2003) found that the empirical validity for
general mental ability ranged from .12 for police to .34 for sales occupations when predicting supervi-
sor ratings of job performance. Schmidt et al. (1993) demonstrated that the variability of validity
estimates for various measures of cognitive ability (i.e., general, verbal, quantitative, reasoning,
perceptual speed, memory, and spatial and mechanical) was at least partially a function of job type.
For example, the standard deviation of the validity estimates (i.e., SDr) for reasoning ability predict-
ing job performance was .04 for jobs involving stenography, typing, and filing and .19 for production
and stock clerks.
In regard to personality and empirical validities, more extensive evidence of situational specificity
can be found, especially in the case of team-oriented organizations (Barrick & Mount, 1991, 1993;
Mount et al., 1998; Stewart, 1996; Stewart & Carson, 1995). For example, Barrick and Mount (1993)
found that validity coefficients representing relationships between key personality traits (conscien-
tiousness and extraversion) and job performance (as rated by supervisors) varied over managers as a
function of managers’ perceived level of autonomy on the job. Validities tended to increase in pro-
portion with the amount of perceived autonomy. Additionally, Mount et al. (1998) showed that some
Big-Five traits were more valid than other Big-Five traits, but the dominant trait varied as a function
of the degree of social and interpersonal interaction required at work. To illustrate, agreeableness and
extraversion had stronger validities against performance for employees engaged in team-oriented
jobs (e.g., highest mean validities were .24 and .20, respectively) relative to employees engaged in
clerical and “cubicle” jobs in which only dyadic interactions occur (i.e., newspaper employees in the
circulation department; banking employees in loan operations; telemarketing representatives) (e.g.,
highest mean validities were .16 and .16, respectively). In contrast, the opposite was true for consci-
entiousness. Specifically, dyadic jobs yielded greater validity estimates (e.g., highest mean validity
was .29) than team-oriented jobs (e.g., highest mean validity was .19). Moreover, even when validi-
ties are examined for one specific job type (e.g., sales), validities for the extraversion-sales effective-
ness relationship vary across organizations with only 54% of their variance being accounted for by
statistical artifacts (Barrick & Mount, 1991; Stewart, 1996).
The illustrative studies above suggest active involvement of situational moderators in relationships
between personality or KSA predictors and job performance. Perhaps even more important is the
fact that a considerable number of prominent situational attributes have not been examined, at least
extensively, as potential moderators of predictor-criterion relationships in meta-analyses. Included
here are environmental, organizational, climate, and culture variables such as authority structure,
standardization of job tasks and procedures, reward processes, leadership styles, organizational
norms and values, and stress. We address the issue of potential situational moderators later in this
chapter.
The critical point for now is that findings supporting the SSH led to certain conclusions for
scientists and practitioners interested in the utility of personality and cognitive ability scores and
a convenient method of assessing predictive validity. To ascertain the efficacy with which scores
obtained from a particular type of measure (i.e., general mental ability) predict a particular criterion
(i.e., job performance), a local validity estimate is warranted. In other words, for every organization
interested in employing general mental ability measures for the purposes of selection testing, pro-
motion, etc., a separate validity study, yielding an individualized validity coefficient, is advisable.
Indeed, it was this point of view that dominated validity research for most of the 1960s and 1970s.
As will be discussed later, this opinion has not been steadfast.
The reservoir of situational factors yet to be taken into account may seem overwhelming and
difficult to incorporate into studies. This problem is exacerbated by the fact that very little effort has
been made to cultivate conceptual models that could interpolate situational variables as explanations
of specificity of validity coefficient estimates (James et al., 1992). Therefore it is not surprising that
some prefer to question the viability of a situational specificity approach in terms of its ability to
effectively explain the situational complexities in meta-analytic results. Indeed, rather than address
these issues directly by assessing potential situational moderators, a more indirect approach was
adopted by a series of ongoing research studies (Hunter & Hunter, 1984). This approach asked the
questions “To what extent does the situation need to be statistically controlled in order to attain a
reasonable level of comparability across organizations and samples?” This question prompted other
questions, such as “Are there other between-sample sources of variance that can be localized and
removed in a parsimonious and systematic manner?” It was then thought that perhaps identifying
and partitioning variance attributable to these nonsituational factors would help to elucidate the
amount of variance left unaccounted for and, therefore, potentially attributable to situational factors. In
other words, rather than directly exploring the impact of situational factors on the stability of validities
through explanatory and statistical models, the presence of these factors is merely inferred based on
the amount of variance remaining. This indirect, process-of-elimination technique emerged as what
we now know as VG analysis (James et al., 1992).
VG
VG developed out of a need to increase the precision of validity coefficient estimates for similar
or identical predictor-criterion pairs—a need that had been unfulfilled during the critical period of
situational specificity. VG analysis is essentially a special version of meta-analysis and has been
described as an inferential version of its more descriptive counterpart (Murphy, 2000). That is,
whereas meta-analysis provides an overall depiction of the relationship between a predictor and
criterion for a set of samples, VG goes one step further to ascertain the degree to which certain
factors contribute to the consistency of this relationship across samples. Typically, these estimates
are computed separately for different job types (i.e., clerical, mechanical, managerial) (Schmidt &
Hunter, 1977). The factors to be considered are statistical artifacts such as unreliability of predictor and
criterion scores, range restriction in predictor scores, and sampling error (i.e., Schmidt & Hunter,
1977; Hunter & Schmidt, 1998; Schmidt et al., 1988; Schmidt et al., 1993).
A VG analysis begins with the standard meta-analytic procedure of calculating the mean (weighted
by sample size), over studies, of validity coefficients for a particular predictor-criterion pair and the
variance among validities around these respective means. Each study represents one sample from
its respective population (i.e., organization, situation), and it is assumed that (a) the true validity for a
particular predictor-criterion pair may be equal across populations but that (b) differences among studies
in statistical biases (e.g., predictor and/or criterion reliability, range restriction, and sampling error)
may distort and restrict the size of the observed validity. In an attempt to identify and effectively
model the impact of these biases on estimates of true validity, the following structural equation—in
which Greek symbols represent population parameters—is generally used in VG analysis:

$$ r_k = \rho_k\,\alpha_k^{1/2}\,\phi_k^{1/2}\,\xi_k + e_k \qquad (42.1) $$

where $\rho_k$ is the population correlation between the unrestricted true scores for the predictor and
criterion in situation $k$ (i.e., the true validity); $r_k$ represents the observed validity coefficient (i.e.,
the correlation between a predictor $X_k$ and a criterion $Y_k$ for a random sample of $n_k$ individuals
from population [i.e., organization, situation] $k$); $\alpha_k$ is the unrestricted population reliability for the
criterion in situation $k$; $\phi_k$ is the unrestricted population reliability for the predictor in situation $k$;
$\xi_k$ reflects the degree of range restriction in the predictor in situation $k$; and $e_k$ is the sampling error
inherent in $r_k$.
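To make the attenuation logic of Equation 42.1 concrete, the following Python sketch works through a single hypothetical situation k, showing how a true validity shrinks once criterion unreliability, predictor unreliability, and range restriction are applied. All numeric values and variable names are invented for illustration and are not drawn from any published VG analysis.

```python
import numpy as np

# Minimal sketch of the attenuation model in Equation 42.1 for one hypothetical
# situation k; every numeric value below is invented for illustration.
rho_k = 0.50     # true (unattenuated, unrestricted) validity
alpha_k = 0.60   # criterion reliability in situation k
phi_k = 0.80     # predictor reliability in situation k
xi_k = 0.85      # range-restriction factor for the predictor (1.0 = no restriction)
n_k = 100        # sample size drawn from situation k

# Systematic attenuation: expected observed validity before sampling error
expected_r_k = rho_k * np.sqrt(alpha_k) * np.sqrt(phi_k) * xi_k

# Rough sampling variability of an observed correlation around that expectation
sampling_sd = (1 - expected_r_k ** 2) / np.sqrt(n_k - 1)

print(f"Expected observed validity r_k: {expected_r_k:.3f}")
print(f"Approximate sampling SD of r_k: {sampling_sd:.3f}")
```

With these particular values, a true validity of .50 is attenuated to an expected observed validity of roughly .29 before sampling error is even considered.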
Once the statistical artifact population estimates are inserted in the equation and ρk is estimated
for each k, the next step in the VG analysis is to estimate the variance among the ρk, referred to as
V(ρ), and determine whether or not this variance is small enough to justify generalization of the
validity to all situations. V(ρ) is calculated based on the following estimation equation (see James
et al., 1992):

$$ V(\rho) = \frac{V(r) - V(r')}{\Pi} \qquad (42.2) $$

where $V(\rho)$ refers to the variance in population (true) validities; $V(r)$ is the between-situation variance in
the observed validities; $V(r')$ is the expected between-situation variance in validities due to statistical
artifacts; and $\Pi$ is an additional correction for mean reliabilities and range restriction across situations.
In essence, the amount of variance attributable to statistical artifacts is subtracted from the total
observed variance, and the remaining variance, termed “residual variance,” represents true variance
in validities that is unaccounted for (i.e., by statistical artifacts).
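As a rough illustration of this subtraction logic, the sketch below implements a stripped-down version of the decomposition in which sampling error is treated as the only artifact and the Π correction is omitted. The validity coefficients and sample sizes are invented, so the sketch should be read as a schematic of the arithmetic rather than a reproduction of any published VG procedure.

```python
import numpy as np

# Stripped-down sketch of the subtraction in Equation 42.2, treating sampling
# error as the only statistical artifact and omitting the Pi correction.
# Validity coefficients and sample sizes are hypothetical.
r = np.array([0.18, 0.25, 0.31, 0.12, 0.28, 0.22])   # observed validities across studies
n = np.array([85, 120, 60, 200, 95, 150])            # study sample sizes

r_bar = np.average(r, weights=n)                     # sample-size-weighted mean validity
var_r = np.average((r - r_bar) ** 2, weights=n)      # observed between-study variance, V(r)

# Expected sampling-error variance of each study's r, then averaged across studies
var_e = np.average(((1 - r_bar ** 2) ** 2) / (n - 1), weights=n)   # plays the role of V(r')

residual_var = max(var_r - var_e, 0.0)               # variance left over for possible moderators
pct_artifact = 100 * var_e / var_r

print(f"Mean validity: {r_bar:.3f}")
print(f"Observed variance V(r): {var_r:.4f}; artifact-expected V(r'): {var_e:.4f}")
print(f"Residual variance: {residual_var:.4f} "
      f"({100 - pct_artifact:.1f}% of V(r) left unexplained)")
```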
A primary step of a VG analysis is to determine whether or not cross-situational consistency in
validities has been achieved. Basically, if the estimate of V(ρ) is approximately equal to 0, the ρk are
deemed equal across situations (i.e., generalizable), whereas V(ρ) > 0 suggests a potential situational
moderator. To this end, two rules have emerged that elaborate on the term “approximately equal”
by imposing predetermined, theoretically justified critical values that
V(ρ) must not exceed in order for cross-situational consistency to be established. One example is the
“75% rule,” in which at least 75% of the total variance in observed validity estimates, V(r), must be accounted
for by statistical artifacts to effectively rule out the SSH. The remaining 25% is presumed to be
caused by unmeasured artifacts (i.e., clerical and programming errors; Hermelin & Robertson,
2001; Schmidt & Hunter, 1977). This rule has been highly debated in the literature for being insensi-
tive to situational moderators (e.g., James et al., 1986).
An additional technique for assessing cross-situational consistency is the 90% confidence rule,
in which the lower-bound of the validity distribution (e.g., 10th percentile) is compared to a
minimal validity coefficient value (e.g., 0, .01, .1). If the value at the 10th percentile is greater than
the minimal value, one can say with 90% confidence that the validity of the scores will generalize to
other populations. The 90% rule, in essence, is simply an acknowledgement from cross-situational
advocates that variance in validities is typically nonzero and, therefore, as long as 90% of the
true validities lie in the positive range, validities can be said to at least minimally generalize across
situations. It has been pointed out that this latter rule is more lenient and, therefore, oriented toward
detecting cross-situational consistency, whereas the more strict 75% rule is directed toward testing
and (potentially) ruling out situational specificity (James et al., 1992; Murphy, 2000).
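The difference in orientation between the two rules can be seen in a small numerical sketch. The quantities below are invented for illustration; with these particular numbers, artifacts account for about two thirds of the observed variance, so the 75% rule signals situational specificity, while the 10th-percentile credibility value remains well above zero, so the 90% rule simultaneously signals generalizability, a tension taken up in the next section.

```python
import numpy as np

# Hypothetical values chosen to show the two decision rules pulling in
# opposite directions; none of these numbers come from a real meta-analysis.
var_r = 0.0045        # observed between-study variance in validities, V(r)
var_artifact = 0.0030 # variance expected from statistical artifacts, V(r')
rho_bar = 0.30        # mean corrected (true) validity estimate

pct_explained = 100 * var_artifact / var_r
sd_rho = np.sqrt(max(var_r - var_artifact, 0.0))      # SD of estimated true validities

# 75% rule: artifacts must account for at least 75% of V(r) to rule out the SSH
passes_75 = pct_explained >= 75

# 90% credibility rule: the 10th percentile of the true-validity distribution
# (mean minus 1.28 SD) must exceed a minimal value, here zero
lower_bound = rho_bar - 1.28 * sd_rho
passes_90 = lower_bound > 0

print(f"Variance explained by artifacts: {pct_explained:.1f}%  -> 75% rule satisfied: {passes_75}")
print(f"10th-percentile credibility value: {lower_bound:.3f} -> 90% rule satisfied: {passes_90}")
```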

EVALUATING VG
On the basis of the equations above, it can be deduced that there are no substantive situational variables
explicitly taken into account in VG analysis. Only basic statistical properties of the measurement
scores are corrected. There are no situational variables measured and incorporated into the model
to ascertain which situational variables influence a particular predictor-criterion relationship and the
validity coefficient that corresponds to this relationship. Furthermore, it is worth noting that a logical
assumption underlying this residualized approach to SSH testing is that situational factors and
statistical artifacts are independent in the population, an assumption that has only recently received
critical attention (James et al., 1992; Raju, Anselmi, Goodman, & Thomas, 1998; Thomas & Raju,
2004). This assumption and its implications are elaborated later in the chapter.
Several investigators would say that the VG approach has been a success (e.g., Schmidt et al.,
1993). First of all, consider that the criterion-related validity estimates of individual traits, such
as general mental ability and personality, for predicting job performance were typically cited in
the 1950s to late 1970s as being in the .20–.30 range, maxing out at .40 (Ghiselli, 1966). With the
advent of the statistical corrections applied in VG analyses, these estimates have increased as much
as 120%. This is a result of a shift in the maximum percentage of variance in criterion (true) scores
accounted for by predictor (true) scores from 16% to upwards of 36% (Hunter & Hunter, 1984).
Similar trends have been observed for other types of predictor-criterion pairs such as integrity tests
predicting counterproductive work behaviors, which demonstrates a mean estimated population
validity coefficient of .41 (i.e., Ones et al., 1993).
But how successful has VG been in determining the extent to which these estimates are
generalizable across populations (i.e., organizations)? Strong proponents of VG posit that essentially
all of the variance of validity coefficients can be accounted for by the statistical artifacts incorporated
into the VG model such that, for example, a general mental ability measure will be equally predictive
of job performance for a particular job type in all organizations (Schmidt et al., 1988; Schmidt et al.,
1993). Indeed, VG corrections, particularly sampling error, have accounted for up to 100% of the
variance (sometimes more than 100%) in validity estimates across studies for certain job types
(i.e., Salgado et al., 2003; Schmidt et al., 1993). On the other hand, corrections have accounted for
as little as 16% in the variance in validities in other meta-analytic samples (Ones et al., 2003). For
the SSH to be disconfirmed, there should be a negligible amount of residual variance after the variance
originating from statistical artifacts has been removed. According to the 75% rule, there should be
no more than 25% of the total variance remaining, which would indicate situational specificity in
the latter example. Nonetheless, as previously mentioned, proponents of cross-situational consistency
generally use the 90% confidence rule to frame results in terms of the degree to which validities
are applicable across studies. Thus, the presence of any residual variance, even as high as 25%, is
perceived by these investigators as conveying little practical significance. Instead, they argue that
the predictive utility of the KSA or trait score applies, at least to some degree, to all organizations
(i.e., Murphy, 2000; Schmidt et al., 1993). In other words, despite evidence in support of situational
specificity (i.e., violation of the 75% rule), simultaneous and yet conceptually conflicting support for
cross-situational consistency can be obtained (i.e., via the 90% confidence rule) with any residual
variance being dismissed as arising from methodological or statistical issues. Indeed, the extent to
which situational specificity is explored ends here. A salient and distinguishing feature of the cross-
situational approach in VG analysis, and meta-analytic validity studies in general, is that situational
causes of meaningful differences in validities among populations are not fully addressed, and it is
this issue on which much of the following concerns are based.

SITUATIONAL SPECIFICITY RESPONSE TO VG


It might be concluded from the preceding section that there is little to no room allowed for situational
specificity in a VG analysis when focus is placed on cross-situational consistency of validities.
Situational variables are excluded from statistical modeling in VG, thereby requiring an assumption of
independence between statistical artifacts and situational factors to (indirectly) estimate the presence
of situational specificity. Furthermore, even when residual variance does accumulate, this phenomenon
is typically written off by VG advocates as originating from additional, unmeasured statistical artifacts
rather than being attributed, even in small part, to situational variables. The VG
estimation approach based on residualization is disconcerting to proponents of the SSH, who argue
that situational variables should be measured and tested as moderators before these variables are
rejected as sources of situational specificity. The decision to disregard this step of the analysis brings
about several problematic implications for the interpretability of the aggregate validity coefficients
produced by VG analyses. Of note are concerns, raised by VG critics, that target the rigor of VG
analysis with respect to cross-situational consistency hypothesis testing and the implications of these
practices for the SSH. These concerns are as follows: (a) insufficient power to detect the presence of
situational factors in meta-analysis, (b) the implications of conceptualizing validity as discrete (i.e.,
a value is either valid or invalid) versus continuous in nature, (c) the consequences of violating the
assumption of independence of situational influences from statistical artifacts, and (d) the need for
more substantive investigations of situational specificity (i.e., suggestions for future endeavors).

INSUFFICIENT POWER TO DETECT MODERATORS


The first concern with VG analysis involves the power to detect situational specificity. Originally,
adequate sample size was presented as crucial to supporting the VG hypothesis (Schmidt & Hunter,
1977). Yet, it was later suggested that if sample size is too large, situational factors are more easily
detectable (Schmidt, Hunter, & Pearlman, 1982). Indeed, inherent in several VG empirical findings
is a negative relationship between the amount of variance explained by statistical artifacts and
the sample size (number of participants in each sample) and meta-analytic sample size (number
of samples; i.e., organizations), respectively (i.e., Salgado et al., 2003; Schmidt et al., 1993). For
example, Salgado et al. (2003) found that general electricians and telephone technicians were the
only two populations in which 100% of the variance in validity estimates for cognitive ability was
resolved by VG corrections. These two job-type samples also happened to have the smallest sample
sizes (k = 7 and 6, respectively; N = 413 and 351, respectively).
Similar patterns emerged for other predictor-criterion relationships, such as that between cognitive
ability and training effectiveness. Schmidt et al. (1993) found that the percent of variance explained by
artifacts was roughly inversely related to the within-study and meta-analytic sample sizes, such that the smaller the
sample sizes, the larger the proportion of variance accounted for by statistical artifacts. Specifically,
in a VG analysis of validities for general mental ability predicting job performance, there were 50
more samples (3,394 more employees) in typing and filing occupations than in samples engaged in
production and stock occupations. Yet, the proportions of variance in validities explained by VG
corrections are 377% for production and stock clerks (the smaller sample) and 76% for typing and
filing clerks (the larger sample). Similar findings emerged for other predictors of performance such
as verbal ability, quantitative ability, and perceptual speed. These findings suggest that situational
factors may exert impact on validity estimates but that most VG studies based on small samples
have insufficient power to detect these effects, a concern which has been voiced elsewhere (i.e.,
James et al., 1986; James, Demaree, Mulaik, & Mumford, 1988; Murphy, 2000). Thus, adequate
power (i.e., sufficiently large sample sizes) is advised when attempting to test the SSH. Insufficient
power to detect moderation should preclude an interpretation in favor of or against the SSH.
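The power problem can be illustrated with a small, hypothetical Monte Carlo sketch in which true validities genuinely vary across situations (an SD of .10) and yet, when only a handful of studies is available, sampling error alone frequently accounts for 75% or more of the observed variance. The parameter values, the fixed per-study sample size, and the simulation itself are ours and purely illustrative.

```python
import numpy as np

# Hypothetical Monte Carlo sketch: true validities differ across situations
# (SD_rho = .10), yet with few studies the observed variance is often fully
# "explained" by sampling error. All parameters are illustrative only.
rng = np.random.default_rng(0)

def percent_explained(k_studies, n_per_study, rho_mean=0.25, rho_sd=0.10):
    """Simulate one meta-analysis; return % of V(r) attributed to sampling error."""
    rho = rng.normal(rho_mean, rho_sd, k_studies)         # situation-specific true validities
    se = (1 - rho ** 2) / np.sqrt(n_per_study - 1)        # approximate sampling SD of each r
    r = rng.normal(rho, se)                               # observed validities
    var_r = r.var()                                       # observed between-study variance
    var_e = ((1 - r.mean() ** 2) ** 2) / (n_per_study - 1)  # expected sampling-error variance
    return 100 * var_e / var_r

for k in (6, 25, 100):
    sims = [percent_explained(k, n_per_study=80) for _ in range(2000)]
    share = np.mean([s >= 75 for s in sims])
    print(f"k = {k:3d} studies: the 75% rule is (wrongly) satisfied in "
          f"{100 * share:.0f}% of simulated meta-analyses")
```

In this sketch the rate at which the 75% rule is satisfied, despite a real moderator being present, falls sharply as the number of studies grows, which is the sense in which small meta-analytic samples lack power to detect situational specificity.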

NATURE OF VALIDITY: DISCRETE VERSUS CONTINUOUS


An important point made by Murphy (2000) is that early VG research coincided with a time when
validity was generally regarded as discrete and dichotomous in nature. Applied researchers and
laymen (e.g., lawyers; judges involved in adverse impact cases) typically ended their evaluations of
a validity coefficient with a verdict of “valid” versus “nonvalid.” Very little attention was given
to gradations in validity. As a result, VG-corrected validities have typically been reported as
being generalizable or not. In particular, cross-situational consistency in validities for a particular
predictor-criterion pair was, and continues to be, decided by comparing the amount of corrected
variance in validities with a critical, cut-off value. As previously noted, this comparison of observed
validities with a critical value typically corresponds to the 90% confidence rule or 75% rule. If the
amount of residual variance exceeds 25% or if less than 90% of the validity distribution lies above
a lower cutoff (i.e., ρk = .00 or .10), then the validity is said to be situationally specific; otherwise,
the validity is said to be generalizable across situations. A range of cross-situational consistency,
including the degree to which a minimum cutoff is exceeded or approached by observed data (i.e.,
computations of ρk and V(ρ)), is not considered in VG analysis.
Strict adherence in VG to a dichotomous interpretation of validities may create further obstacles
to detecting situational specificity. For example, as previously mentioned, researchers would generally
agree that the size of validities for any particular trait (e.g., KSA or personality trait) against a
particular criterion (e.g., job performance) will differ as a function of job type such that a different
set or pattern of KSAs and personality traits is optimal for different job roles, given the cognitive
and motivational demands of the job duties. Furthermore, the between-situation (i.e., between
organizational populations) variance in validities for the predictor-criterion pairs associated with a certain job is calculated,
and the proportion of this variance attributable to statistical artifacts is compared to a prespecified
cut-off value (e.g., 75%, 90% confidence). However, what is not addressed in VG studies are the
differences among job types in the exact amount of between-situation variance attributable to statistical
artifacts. Indeed, although one may posit that 75% of the variance in validities must be resolved by
statistical artifact corrections, no inferences are made regarding differences among job types in the
amount of variance explained.
This phenomenon is largely overlooked despite evidence of its presence in previously published
VG findings (i.e., Salgado et al., 2003; Schmidt et al., 1993). Salgado et al. (2003) found that the
variance in validity estimates for the general mental ability-job performance relationship was fully
accounted for by artifacts in several job types (e.g., engineer, sales, skilled worker) (100% variance
explained). In contrast, at least three job types (manager, police, and typing) were left with 53–69%
of the variance in validities unexplained (i.e., not due to artifacts).
Schmidt et al. (1993) showed that the amount of variance in validities for general mental
ability-job performance relationships attributable to artifacts was strikingly different for computing
and accounting occupations (54%) and production and stock clerks (377%). Similar patterns emerged
among different occupations for other predictors such as verbal ability, quantitative ability, and
perceptual speed. Therefore, it appears that for each predictor-criterion pair, there is a range of
validities that do not necessarily represent only sampling error and methodological differences
among studies; rather, one natural and substantive dimension on which these validities differ
is job type. A reasonable question to pose, then, is “Why?” Perhaps the origin of these
varying validities, after statistical artifacts have been removed, lies in situational factors such as interpersonal
communication, supervisory versus subordinate status, and reward structure, which
inherently vary across job types. However, certain jobs may be more or less vulnerable to situational
factors on the basis of the degree of complexity in one’s social and interpersonal environment (i.e.,
little communication required, self-employed, small group, steepness of organizational hierarchy).
Further examination of validity and cross-situational consistency as a gradient may illuminate the
moderating impact of situational factors, serving as a more sensitive and informative “test” than the
dichotomous, all-or-none 75% rule.

ASSUMPTION OF INDEPENDENCE IN VG MODEL


Several investigators have now addressed the implicit assumption in VG that the effects of situational
variables and statistical artifacts on validity coefficients must be independent (Burke, Rupinski,
Dunlap, & Davison, 1996; James et al., 1986; James et al., 1992; Raju et al., 1998; Thomas & Raju,
2004). James et al. (1992) made the argument that variations in the restrictiveness of climate would
likely engender variations in criterion reliability. Restrictiveness of climate encompasses various
environmental factors such as authority structure, standardization of job tasks and procedures, and
reward structure (James et al., 1992). A highly restrictive climate (i.e., strict rules, guidelines, steep
hierarchical structure, reward system not based on individual merit) would likely contribute to a
decreased expression of individual differences amongst employees on performance because of a
tendency toward compliance and conformity. This should, in turn, attenuate criterion reliability
and any relationship between individual differences and job functioning (i.e., true validity) that might have
been extant in, say, a less restrictive climate (i.e., open communication, fewer restrictions and rules,
reward system based on individual merit).
If a situational variable such as restrictiveness of climate jointly affects the magnitudes of validities
and criterion reliabilities, then the VG model is likely to include a covariance between validities and
the reliabilities. Covariation between validities and a statistical artifact such as criterion reliability
challenges the assumption of independence between these factors, which is an assumption that most
VG estimation equations rely on. A covariation between validities and criterion reliabilities implies
that removing variance in validities associated with variance in criterion reliabilities quite possibly
also entails removing variance due to situational factors such as restrictiveness of climate. This is
because variation in the situational variable (climate) serves as a common cause for variation in
validities and criterion reliabilities. To remove variance due to reliability is to remove variance due
to its causes—the situational variable. It follows that one is likely to erroneously reject the SSH.
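The joint-impact argument can be illustrated with a short, hypothetical simulation in which a single climate variable drives both criterion reliability and true validity across situations. The only point of the sketch is that such a common cause produces a covariance between validities and a statistical artifact, which is exactly the independence assumption that standard VG estimation equations require; the functional forms and coefficients are invented rather than taken from James et al. (1992) or any empirical study.

```python
import numpy as np

# Hypothetical simulation of the "common cause" argument: a single situational
# variable (restrictiveness of climate) simultaneously dampens criterion
# reliability and true validity across organizations, inducing a covariance
# between the two. All functional forms and coefficients are invented.
rng = np.random.default_rng(1)

k = 1000                                       # number of situations (organizations)
restrictiveness = rng.uniform(0.0, 1.0, k)     # the situational common cause

# Both quantities decline as the climate becomes more restrictive (plus noise)
criterion_rel = np.clip(0.85 - 0.30 * restrictiveness + rng.normal(0, 0.03, k), 0.05, 0.99)
true_validity = np.clip(0.45 - 0.25 * restrictiveness + rng.normal(0, 0.03, k), 0.00, 0.99)

# Observed validities are attenuated by the square root of criterion reliability
observed_validity = true_validity * np.sqrt(criterion_rel)

corr = np.corrcoef(true_validity, criterion_rel)[0, 1]
print(f"Correlation between true validities and criterion reliabilities: {corr:.2f}")
print("A nonzero value violates the independence assumption of standard VG equations.")
print(f"Between-situation variance of observed validities: {observed_validity.var():.4f}")
```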
In response to concerns about interdependencies among validities and statistical artifacts, two alter-
native statistical models were introduced (James et al., 1992; Raju et al., 1998). Although James
et al.’s (1992) proposed model addressed the former assumption of independence directly, Raju
et al.’s (1991) model attempted to circumvent the problem engendered by lack of independence.
Specifically, James et al.’s (1992) model deleted the assumption of independence by including cova-
riance terms for covariances between validities and all of the statistical artifacts included in the
VG correction. Raju et al.’s (1991) model corrects for unreliability, attenuation, and sampling error
within each individual sample before averaging observed rxy values across studies. Therefore, violation
of the assumption within studies is no longer an issue, although violation of the assumption across
studies remains unresolved (Thomas & Raju, 2004).
Thomas and Raju (2004) tested and compared the accuracy of these two approaches. Although
no comparison was made between results obtained by application of the James et al. (1992) model
versus the traditional VG estimation equations, the Raju et al. (1998)
model was shown to surpass the traditional VG model in accuracy. Furthermore, the Raju et al. (1998) model
demonstrated properties comparable to those of the James et al. (1992) model in accurately estimating
validity coefficients, albeit with slightly more stable estimates (i.e., lower variance in estimates
across samples). This latter finding can be read as bolstering support for the SSH in James
et al.’s estimation: not only does their procedure provide levels of estimate accuracy similar to
other methods that have exceeded traditional VG techniques in accuracy, but the residual variances
of these estimates, which can be interpreted as arising from situational influences, increase.
Of course, neither model identifies which situational variables moderate validities, nor in what
way they do so.
Although inclusion of a conceptual model in VG analysis has not been attempted, a study by
Burke, Rupinski, Dunlap, and Davison (1996) addressed the substantive nature of the indepen-
dence assumption violation inherent in VG analysis by testing James et al.’s (1992) proposition that
restrictiveness of climate contributes to violation of the assumption via its dual impact on statistical
artifacts. Burke et al. (1996) examined the impact of organizational climate on the range restrictions
and internal consistencies (coefficient alpha) for the predictor (job satisfaction) and criterion (job
performance), respectively, across 537 sales organizations. Organizational climate—specifically,
restrictiveness-of-climate—was measured via aggregated (store-level) employee perceptions (self-report)
on five first-order factors (goal emphasis, management support, nonmonetary reward orientation,
means emphasis general, means emphasis specific) of the higher-order concern for employees factor
of an organizational climate measure (Burke, Borucki, & Hurley, 1992).
Contrary to expectations, Burke et al. (1996) found that a more restrictive climate was associated
with more, rather than less, variability in the predictor and criterion scores, although only a small
subset of these correlations was significant. Furthermore, restrictiveness of climate demonstrated
only three significant correlations with internal consistency (i.e., coefficient alpha values) for the
predictor and criterion measurement scores, and one of these was opposite to the proposed
direction. It is worth noting that, for internal consistency, the two significant correlations in the
hypothesized direction both occurred for one particular first-order factor—management support.
This factor, relative to the remaining factors, most closely matched the restrictiveness-of-climate
conceptualization provided by James et al. (1992) in which trust and encouragement from manage-
ment serve as the integral forces behind the level of restrictiveness and, subsequently, the range of
individual differences manifested in the workplace. Furthermore, the predictor included in the analysis
was job satisfaction. This is perhaps not the most appropriate predictor to use in selection testing,
which usually relies on KSAs or personality. Finally, the sample was limited to sales organizations.
Therefore, although this study provides a first step toward examining the magnitude of influence that
situational factors exert on statistical artifacts and validities, additional studies are warranted.

SUBSTANTIVE TESTS OF SITUATIONAL SPECIFICITY


Stronger tests of the SSH are needed. In particular, effort needs to be given to explicitly measuring
situational variables to identify those variables that impact validities, as well as to determine their
relative strength of impact. As reviewed, support for the SSH has been found in meta-analyses in
which several situational variables moderate the validities of predictor-criterion pairs. These mod-
erators include job complexity, level of autonomy, and whether or not employees worked in a team
setting. An examination of the patterns of results engendered by these moderators for cognitive
ability, KSAs, and personality reveals the following. For cognitive ability and KSAs predicting job
performance, the predominant moderator is complexity of job duties and tasks (e.g., Salgado et al.,
2003). Alternatively, elements of social environment (e.g., team-orientation of organization or job,
level of autonomy) appear to play a larger role in modifying validities for certain personality-job
performance relationships (e.g., Barrick & Mount, 1991, 1993; Mount et al., 1998; Stewart, 1996;
Stewart & Carson, 1995). Otherwise, no further trends have been uncovered and no theoretical
framework for interpreting these findings has been offered.
A theoretical framework that has been proposed for situational specificity pertains to restrictiveness
of climate (James et al., 1992). Although this framework was introduced earlier in this review, it was
largely for the purpose of discussing potential assumption violations in VG. However, its possible
merit as an explanatory framework rests on the implication that the predictive validities of individual-level
traits (KSAs and personality) are affected by the restrictiveness of the organiza-
tional climate. An example is the conscientiousness-job performance relationship. For organizations
with nonrestrictive climates, in which reward structures are typically built around merit and effort,
it may be that the mean relationship between conscientiousness and job performance tends toward the
upper range of validities typically found in this type of research. On the other hand, these values
may drop to near zero, or at least nonsignificance, in a highly restrictive organization because of an
emphasis on tenure and seniority, rather than just hard work, in the organization’s reward structure.
If there is a substantial range in restrictiveness in the general population of organizations included
in a meta-analysis, then there may also be a similarly wide range in validities across studies. If the
variance in these validities can be statistically accounted for by variations in scores on an organiza-
tional-level measure of restrictiveness of climate, then the variance in validities may be statistically
controlled. All that is necessary is to include the situational variable(s) in the VG equations and
analyses. This, of course, requires a revised VG model, but such a model is easily developed.
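As one illustration of what such a revised model might look like in practice, the sketch below regresses study-level validities on an organization-level restrictiveness-of-climate score using weights proportional to sample size, so that the situational variable is modeled explicitly rather than inferred from residual variance. The data, variable names, and the simple weighted least-squares form are all assumptions of ours, not the model proposed by James et al. (1992).

```python
import numpy as np

# Hypothetical sketch of a "revised" analysis that includes a measured
# situational variable: study-level validities are regressed on an
# organization-level restrictiveness-of-climate score via weighted least
# squares, with weights proportional to study sample size. All data are invented.
validities = np.array([0.34, 0.29, 0.22, 0.15, 0.31, 0.18, 0.26, 0.12])  # observed r per study
climate = np.array([1.2, 1.8, 2.9, 3.8, 1.5, 3.3, 2.2, 4.1])             # restrictiveness scores
n = np.array([120, 95, 150, 80, 200, 110, 90, 130])                      # study sample sizes

# Weighted least squares: validity_i = b0 + b1 * climate_i + error_i
W = np.diag(n)
X = np.column_stack([np.ones_like(climate), climate])
b = np.linalg.solve(X.T @ W @ X, X.T @ W @ validities)

# Share of the weighted between-study variance associated with the climate score
w_mean = np.average(validities, weights=n)
ss_total = (validities - w_mean) @ W @ (validities - w_mean)
ss_resid = (validities - X @ b) @ W @ (validities - X @ b)

print(f"Intercept: {b[0]:.3f}, slope for restrictiveness: {b[1]:.3f}")
print(f"Proportion of weighted variance tied to the climate score: {1 - ss_resid / ss_total:.2f}")
```

In a fuller treatment, the artifact corrections of Equation 42.1 would be applied before, or jointly with, such a regression, but even this bare version shows how a measured situational variable can be given an explicit share of the between-study variance.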

CONCLUSIONS
In responding to the critiques of VG posed by James et al. (1986), Schmidt et al. (1988, p. 666)
stated, “they [James et al., 1986] did not recognize that it is the higher level pattern of findings across
numerous VG analyses that is important in revealing the underlying reality.” Indeed, replications of
findings are crucial to building a theory or stable explanation concerning a particular behavioral
phenomenon or set of phenomena. In this sense, many studies have demonstrated that the VG approach is effective
in improving cross-situational consistency in validity coefficients through statistical corrections for
measurable artifacts (i.e., reliability, range restriction, and sampling error) (i.e., Schmidt et al., 1993;
Hunter & Schmidt, 1998). However, numerous VG studies also demonstrate a pattern of inconsistency,
particularly across jobs, in the degree to which adjustments in statistical artifacts improve validities.
Moreover, in these studies, it is often the case that little variance is accounted for by corrections for
statistical artifacts. Specifically, examination of individual meta-analyses reveals that there is a size-
able amount of unexplained variance in validities after controlling for statistical artifacts and that the
amount of variance accounted for is also a function of job type (Algera et al., 1984; Salgado et al.,
2003; Schmidt et al., 1993). These patterns, when taken together, suggest that variation in statistical
artifacts is likely responsible for only a portion of the total fluctuation in validity estimates across
studies. Additional factors remain unmeasured and need specification, particularly contextual (i.e.,
situational) factors. At this point, research suggests that differences in job type and task complexity
moderate the relationship between personality/cognitive ability and job behaviors (Barrick & Mount,
1991, 1993; Mount et al., 1998; Russell, 2001; Salgado et al., 2003; Schmidt et al., 1993; Stewart,
1996; Stewart & Carson, 1995). It is likely that many more situational variables are operative.
Unfortunately, up to this point, the impact of situational factors on validities tends to be examined
indirectly. VG studies typically direct their attention toward what is considered to be more influential
causes of variability of validity estimates—namely, statistical artifacts. The degree of situational
specificity is addressed by examining the residual variance after removing what is alleged to be
attributable to the statistical artifacts. It seems plausible that the results of VG analyses, as helpful as
they may be for illuminating the impact of statistical artifacts on validity estimates, are incomplete
and oftentimes misleading because situational variables are not explicitly included in the
analyses and because they affect validity mean and variance estimates only indirectly. Situational
moderators at the meta-analytic level are not only possible but also likely.
However, until situational factors are explicitly measured and included in VG analyses, conclusions
regarding situational specificity cannot be safely and soundly made.

REFERENCES
Algera, J. A., Jansen, P. G. W., Roe, R. A., & Vijn, P. (1984). Validity generalization: Some critical remarks on
the Schmidt-Hunter procedure. Journal of Occupational Psychology, 57, 197–210.
Barrick, M. R., & Mount, M. K. (1991). The Big Five personality dimensions and job performance: A meta-
analysis. Personnel Psychology, 44, 1–26.
Barrick, M. R., & Mount, M. K. (1993). Autonomy as a moderator of the relationships between the big five
personality dimensions and job performance. Journal of Applied Psychology, 78, 111–118.
Burke, M. J., Borucki, C. C., & Hurley, A. E. (1992). Reconceptualizing psychological climate in a retail
service environment: A multiple-stakeholder perspective. Journal of Applied Psychology, 77, 717–729.
Burke, M. J., Rupinski, M. T., Dunlap, W. P., & Davison, H. K. (1996). Do situational variables act as substan-
tive causes of relationships between individual difference variables? Two large-scale tests of “common
cause” models. Personnel Psychology, 49, 573–598.
Epstein, M. W., Mischel, W., & Peake, P. K. (1982). Beyond déjà vu in the search for cross-situational consis-
tency. Psychological Review, 89, 730–755.
Ghiselli, E. E. (1959). The generalization of validity. Personnel Psychology, 12, 397–402.
Ghiselli, E. E. (1966). The validity of occupational aptitude tests. New York, NY: Wiley.
Ghiselli, E. E. (1973). The validity of aptitude tests. Personnel Psychology, 26, 461–477.
Hermelin, E., & Robertson, I. T. (2001). A critique and standardization of meta-analytic validity coefficients in
personnel selection. Journal of Occupational and Organizational Psychology, 74, 253–277.
Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternative predictors of job performance.
Psychological Bulletin, 96, 72–98.
Hunter, J. E., & Schmidt, F. L. (2000). Fixed effects vs. random effects meta-analysis models: Implications for
cumulative research knowledge. International Journal of Selection and Assessment, 8, 275–292.
James, L. R., Demaree, R. G., & Mulaik, S. A. (1986). A note on validity generalization procedures. Journal of
Applied Psychology, 71, 440–450.
James, L. R., Demaree, R. G., Mulaik, S. A., & Ladd, R. T. (1992). Validity generalization in the context of
situational models. Journal of Applied Psychology, 77, 5–14.
James, L. R., Demaree, R. G., Mulaik, S. A., & Mumford, M. D. (1988). Validity generalization: A rejoinder to
Schmidt, Hunter, and Raju (1988). Journal of Applied Psychology, 73, 673–678.
Levine, E. L., Spector, P. E., Menon, S., Narayanan, L., & Cannon-Bowers, J. (1996). Validity generalization
for cognitive, psychomotor, and perceptual tests for craft jobs in the utility industry. Human Performance,
9, 1–22.
Mischel, W. (1968). Personality and assessment. New York, NY: John Wiley.
Mount, M. K., Barrick, M. R., & Stewart, G. L. (1998). Five-factor model of personality and performance in
jobs involving interpersonal interactions. Human Performance, 11, 145–165.
Murphy, K. R. (2000). Impact of assessments of validity generalization and situational specificity on the science
and practice of personnel selection. International Journal of Selection and Assessment, 8, 194–206.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validi-
ties: Findings and implications for personnel selection and theories of job performance. Journal of
Applied Psychology, 78, 679–703.
Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (2003). Personality and absenteeism: A meta-analysis of integ-
rity tests. European Journal of Personality, 17, S19–S38.
Pearlman, K., Schmidt, F. L., & Hunter, J. E. (1980). Validity generalization results for tests used to predict job
proficiency and training success in clerical occupations. Journal of Applied Psychology, 65, 373–406.
Peters, L. H., Fisher, C. D., & O’Connor, E. J. (1982). The moderating effect of situational control of per-
formance variance on the relationships between individual differences and performance. Personnel
Psychology, 35, 609–621.
Raju, N. S., Anselmi, T. V., Goodman, J. S., & Thomas, A. (1998). The effect of correlated artifacts and true
validity on the accuracy of parameter estimation in validity generalization. Personnel Psychology, 51,
453–465.
Raju, N. S., & Burke, M. J. (1983). Two new procedures for studying validity generalization. Journal of Applied
Psychology, 68, 382–395.
Raju, N. S., Burke, M. J., Normand, J., & Langlois, G. M. (1991). A new meta-analytic approach. Journal of
Applied Psychology, 76, 432–446.
Russell, C. J. (2001). A longitudinal study of top-level executive performance. Journal of Applied Psychology,
86, 560–573.

Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., de Fruyt, F., & Rolland, J. P. (2003). Criterion validity
of General Mental Ability measures for different occupations in the European community. Journal of
Applied Psychology, 88, 1068–1081.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychol-
ogy: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124,
262–274.
Schmidt, F. L., & Hunter, J. E. (1977). Development of a general solution to the problem of validity generaliza-
tion. Journal of Applied Psychology, 529–540.
Schmidt, F. L., Hunter, J. E., & Pearlman, K. (1982). Progress in validity generalization: Comments on Callender
and Osburn and further developments. Journal of Applied Psychology, 67, 835–845.
Schmidt, F. L., Hunter, J. E., Pearlman, K., & Rothstein-Hirsh, H. (1985). Forty questions about validity gen-
eralization and meta-analysis. Personnel Psychology, 38, 697–798.
Schmidt, F. L., Hunter, J. E., & Raju, N. S. (1988). Validity generalization and situational specificity: A second
look at the 75% rule and Fisher’s z Transformation. Journal of Applied Psychology, 73, 665–672.
Schmidt, F. L., Law, K., Hunter, J. E., Rothstein, H. R., Pearlman, K., & McDaniel, M. (1993). Refinements
in validity generalization methods: Implications for the situational specificity hypothesis. Journal of
Applied Psychology, 78, 3–12.
Stewart, G. L. (1996). Reward structure as a moderator of the relationship between extraversion and sales
performance. Journal of Applied Psychology, 81, 619–627.
Stewart, G. L., & Carson, K. P. (1995). Personality dimensions and domains of service performance: A field
investigation. Journal of Business and Psychology, 9, 365–378.
Thomas, A., & Raju, N. S. (2004). An evaluation of James et al.’s (1992) VG estimation procedure when artifacts
and true validity are correlated. International Journal of Selection and Assessment, 12, 299–311.
43 Employee Selection in Europe: Psychotechnics and the Forgotten History of Modern Scientific Employee Selection

Jesús F. Salgado, Neil R. Anderson, and Ute R. Hülsheger

INTRODUCTION
First of all, we must commence with a seemingly naïve clarification. Europe, as a whole, is a con-
tinent consisting of dozens of independent countries, with many different languages, legal regula-
tions, cultures, histories, and, indeed, employee selection frameworks (Shimmin, 1989; see also Myors
et al., 2008). Just for illustrative purposes, it must be taken into account that Europe has more coun-
tries than North, Central, and South America combined. The situation, therefore, is totally different
from that in the United States in relation to the cultural, legal, and national influences upon employee
selection procedures (Shimmin, 1989). Nevertheless, in spite of this heterogeneity, there was an
important exchange of ideas across Europe during the 20th century. Three basic mechanisms were
used for this exchange. The Congresses of the International Psychotechnics Association (later called
the International Association of Applied Psychology) were the first and most important mechanism. The
exchange of national journals was the second mechanism. Finally, research visits to the psychotech-
nical laboratories across Europe were constant during the first 30 years of the 20th century.
To present a complete and comprehensive account of the history of employee selection in Europe
is a titanic job beyond the scope of the present chapter. Consequently, in the following pages, we
will present a general but limited review of the main historical milestones of European employee
selection during the 20th and into the 21st century. We will divide the chapter into three parts:
(a) The European Origin of Scientific Employee Selection: The Years of Success (1900–1945);
(b) The Fall of the European Employee Selection Enterprise After World War II: The Years of Decline and
Stagnation (1945–1975); and (c) The Resurgence of European Employee Selection: The Years of
Optimism and Growth (Since 1975).

THE EUROPEAN ORIGIN OF SCIENTIFIC EMPLOYEE SELECTION: THE YEARS OF SUCCESS (1900–1945)
Employee selection research has a long tradition in Europe, and it is not an error to argue that
scientific personnel selection was born in Europe and brought to the United States by Hugo
Münsterberg, although not all researchers or practitioners internationally would agree with the last
part of this affirmation. For example, it is frequently mentioned in handbooks and manuals that the
German psychologist Hugo Münsterberg, a professor of psychology at Harvard University between
1892 and 1916, with the exception of the important period 1911–1912 at the University of Berlin, is
the founding father of employee selection (e.g., Spector, 2000; Chmiel, 2000). Some writers even
go as far as to suggest that employee selection is American in its inception (e.g., see Landy, 1992).
However, this last point of view neglects the fundamental European contribution in the historical
process of the development of the scientific and practical discipline of employee selection. It seems
that the sad destiny of the work of the first European contributors to employee selection is obscurity.
For example, two recent and very well documented historical accounts of employee selection and
I-O psychology outside of the United States ignore practically all of the European contributions
between 1900 and 1940 (Vinchur, 2007; Warr, 2007). Reading these accounts, one might conclude that the
European contributions were little more than an anecdote. However, the reality was very different.
Some years before Münsterberg began his experiments on employee selection, researchers had
already conducted personnel examinations in several European countries. For example, starting in
1901, the Italian psychologist Ugo Pizzoli carried out professional examinations of the apprentices
in Modena (Italy) (Baumgarten, 1949; Carroll, 1951). In France, scientific employee selection began
in 1900 when Eduard Toulouse, director of the Experimental Psychology Laboratory at the École
des Hautes Études (School of Advanced Studies), proposed a project for professional selection to the
Ministry of Work and asked Jean Marie Lahy to conduct it. Lahy began his task in 1903, and
the first articles were published in 1905 in the Revue Scientifique and in 1910 in the Bulletin de
l’Inspection du Travail (Lahy 1922; Toulouse, 1927). Between 1905 and 1908, Lahy provided the
first steps toward developing a job analysis method and carried out preliminary experiments on the
selection of streetcar operators. In the same years, Imbert, professor at the University of Montpellier
(France), was developing a method for selecting engineers and reported his proposals during the
Congress of Psychology at Strasbourg in 1909 (Lahy, 1922). Lahy also proposed a general method for
employee selection that is, in essence, very similar to the current method for validating employee
selection procedures (later termed “orthodox personnel selection” by Guion, 1976). The first step
of the method consisted of determining the psychophysiological features characteristic of highly
proficient workers. To do so, two groups of workers should be examined: the first group consists
of the elite and the second one consists of mediocre workers. Both groups should complete tests
measuring the psychological functions and aptitudes demanded by the occupation
in order to obtain group norms. The second step of the method, following the suggestion by the Russian
psychologist Rossolimo, consists of applying the same tests to the job candidates and classifying
them according to the norms in order to determine their "psychological profile." Lahy applied this
method for selecting typists in 1912 and for selecting army gunners in 1917 during World War I
(Lahy, 1922). In 1907, the Zeitschrift für angewandte Psychologie [Journal of Applied Psychology],
edited by Otto Lipmann and William Stern, appeared in Germany. This was the first journal in
the world (together with the Rivista di Psicologia Applicata edited since 1905 by G. C. Ferrari in
Italy) devoted to the application of psychological knowledge, with special focus on work-related
issues. In summary, a complete program for employee selection, and more generally for work and
organizational psychology, had been operating in many European countries since 1907, five years before
Münsterberg began his experiments.
In effect, Münsterberg first described his studies on the selection of drivers and his program
for industrial psychology—the so-called psychotechnics—in 1912 when he was lecturing as an
exchange professor at the University of Berlin. In April 1912, during the Fifth German Congress of
Experimental Psychology in Berlin, Münsterberg briefly mentioned his experience with two tramway
drivers, two telephone employees, two civil servants, and with other workers from various industries
as a comment after Karl Marbe’s presentation (Münsterberg, 1912a; Menzerath, 1913). The same
year, Münsterberg published in Leipzig (Germany) the book Psychologie und Wirtschaftsleben: Ein
Beitrag zur angewandten Experimental Psychologie (Münsterberg, 1912b), which was subsequently
translated into English under the title Psychology and Industrial Efficiency and was first published
in Boston in 1913. His points of view were subsequently developed in his book Grundzüge der
Psychotechnik [Compendium of Psychotechnics], which appeared in 1914. Therefore, Germany was
the first country to receive Münsterberg's work on employee selection, and German was the
first language of his studies; it was from European, and particularly German, applied psychologists
that Münsterberg received his first feedback, comments, and support. Moreover, the work by William
Stern on individual differences formed the basis of the main ideas of Münsterberg’s vocational
selection. Stern (1903) was also the creator of the term “psychotechnics” and published the seminal
book Die differentielle Psychologie in ihren methodischen Grundlagen [Differential Psychology in
its Methodological Foundations] (1911), which served as the basis for Münsterberg's program. William
Stern proposed that four basic tasks were the object of differential psychology: (a) the variation
theory, testing the variation of features within many individuals; (b) the correlation theory, testing
the correlation between various features; (c) psychography, investigating individuals' characteristic
features; and (d) the comparison theory for comparing individuals.
The complete European psychotechnical model was in use from the 1920s onwards (Giese, 1925;
Chleusebairgue, 1934). According to Giese (1925), two types of psychotechnics could be distin-
guished: subject psychotechnics and object psychotechnics. Subject psychotechnics consisted of
selection and placement of employees, training, vocational and career counseling, and personnel
advising and guidance. Object psychotechnics, on the other hand, consisted of studies on fatigue,
time and motion, accident prevention, and advertising.
Regarding the use of psychological procedures for examining individuals, Lahy began to use
cognitive tests for selecting drivers in France from 1908 onwards. Walter Moede and Curt Piorkowski, Otto
Lipmann, and William Stern used similar tests in Germany from 1914 onwards (Grundlach, 1999).
Agostino Gemelli used psychophysiological measures for selecting military pilots during World
War I in Italy (Lahy, 1922; Baumgarten, 1949). Emilio Mira used attention and perceptual tests
for selecting drivers in Spain from 1918 (Mira, 1922). Pierre Bovet carried out personnel selection
programs for mechanics and dressmakers (Lahy, 1922) and Walther (1929) used tests for selecting
machine tending and manual workers (wrappers and packers) in Switzerland. Cyril Burt, the mem-
bers of the National Institute of Industrial Psychology (NIIP; e.g., Winifred Spielman, G. Miles), and
the members of the National Board of Health (e.g., Eric Farmer) developed and used many cogni-
tive tests in the United Kingdom as early as 1914 (see Kwiatkowski, Duncan, & Shimmin, 2006;
Shimmin & Wallis, 1994, for a comprehensive account of the NIIP). Many more examples from
other European countries could be cited, but our central point remains the same—that the founda-
tions of scientific employee selection are European in origin and not North American, and these
principles, which remain in common use today, have been around for over a century now.
A consequence of the popularity of psychological procedures for employee selection was that
hundreds of criterion validity studies were carried out and reported during the first half of the
20th century in European countries, and they were largely used in civil and military contexts (e.g.,
see the journals Industrielle Psychotechnik, Le Travail Humain, Journal of National Institute of
Industrial Psychology, and the Proceedings of the International Association of Psychotechnics,
to mention just a few). Three criteria were generally used in those validity studies: job
performance, training proficiency, and accidents.
In 1920, following an idea by Bovet, Director of the Jean-Jacques Rousseau Institute of Geneva, the
first International Conference of Psychotechnics was held in Geneva (Switzerland) under Eduard
Claparède’s presidency (see the comprehensive collection of the 13 volumes edited by Grundlach,
1998). Claparède was also the first president of the International Association of Psychotechnics
founded that same year. Later, in 1952, this association was renamed as the current International
Association of Applied Psychology (IAAP). In this early period, psychotechnics was primarily
associated with the use of apparatus tests aimed at assessing individual characteristics. In the early
1920s, various European companies started selling psychotechnical testing equipment while lead-
ing psychotechnicians constructed their own devices on the basis of their own ideas regarding the
psychological requirements of various vocations and the most suitable instruments for measuring
them. The use of this sophisticated equipment contributed significantly to the reputation of psycho-
technics as a solid and scientific enterprise. In the early years, the most well-known psychotechni-
cians considered the development of instruments that would enable them to assess vocational aptitude
as precisely as possible to be one of their primary tasks. This is strikingly close to the present-day role
of test and product developers in psychological consultancies. A list of further relevant European
precursors in the field of employee selection is presented in Table 43.1.

TABLE 43.1
List of Representative Psychologists in Europe Working on Personnel Selection Before 1980

Belgium: Christiaens, Arthur G.; Brabant, V.; Decroly, Ovide J.; Piret, Roger; Coetsier, Leo; Biegel, Rebekka A.
Czechoslovakia: Forster, William; Seracky, Frank; Vana, Joseph
Denmark: Eriksen, Erik C.
France: Bacqueyrisse, Louis; Bonnardel, Raymond; Camus, Pierre; Faverge, Jean Marie; Goguelin, Pierre; Lahy, Jean Marie; Levy-Leboyer, Claude; Montmollin, Marie; Pacaud, Suzanne; Palmade, Guy
Germany: Amthauer, Rudolf; Arnold, Willen; Couvé, Raymond; Giese, Fritz; Klemm, Otto; Lipmann, Otto; Moede, Walter; Piorkowski, Curt; Poppelreuter, Walther; Rupp, Hans; Schuler, Heinz; Selz, Otto; Simoneit, Max; Stern, Eric; Stern, William; Tramm, Karl
Italy: Boganelli, Eleuterio; Bonaventura, Enzo; Diez Gasca, Maria; Faneli, Carlo; Ferrari, Gulio C.; Filipinni, Azeglio; Gemelli, Agostino; Pizzoli, Ugo; Ponzo, Mario; Sanctis, Sante de; Spaltro, Enzo
Latvia: Möller, Johan
The Netherlands: Algera, Jan; De Wolff, Charles; Drenth, Peter; Roe, Robert A.
Poland: Wojciechowski, Jean; Joteiko, Josephine; Lehmann, Alfred; Studencki, Stanislas; Suchorzewski, Henri
Romania: Nestor, Jacob-Marius
Soviet Union: Kolodnaja, A.; Levitof, N.; Luria, A. R.; Rossolimo, G.; Spielrein, I.; Toltchinsky, A.
Switzerland: Baumgarten, Franzisca; Bovet, Pierre; Claparede, Eduard; Dachler, Peter; Fontegne, Julien; Suter, Jules; Walther, Leon
United Kingdom: Anstey, Edgar; Barlett, Frederic C.; Bartram, Dave; Burt, Cyril; Farmer, Eric; Fletcher, Clive; Frisby, Roger; Herriot, Peter; Myers, C.S.; Parry, John; Robertson, Ivan T.; Saville, Peter; Schackleton, Viv; Spielman (later Raphael), Winifred; Vernon, Philip E.; Warr, Peter

A good example of the advances in psychotechnics for employee selection purposes was the proposal
by the German applied psychologist Lipmann (1922), who suggested that occupations varied according
to their cognitive demands and provided a scheme for classifying jobs. Lipmann’s proposal was very
popular in Europe during the 1920s and was well known amongst American industrial psychologists of
that era (e.g., Bingham, Kornhauser, Viteles). According to Lipmann, differences between occupations
were not solely due to different mental functions but also due to the different intensity with which these
were used. Lipmann proposed a classification of occupations on the basis of the nature of the object on
which the work is performed. Consequently, occupations could be divided into three groups depending
on whether the work acts on things, people, or concepts (ideas). In essence, this is similar to Fine's
scheme of functional job analysis and is thus a very early precursor of the scheme for classifying
occupations used by The Dictionary of Occupational Titles (DOT). Examples of occupations based
on things are carpenters and watchmakers; professions involving people include physicians and
attorneys; and occupations involving concepts include philosophers and mathematicians. The Spanish
psychologist Mira expanded this classification by incorporating the relation between these dimensions
and cognitive abilities. Thus, according to Mira, things are related to spatial intelligence, people are
related to verbal intelligence, and concepts are related to abstract intelligence.
A cursory review of the main advances in European employee selection in its first years suggests
that the following should be mentioned as critical events of the period.

1. According to the exhaustive review by Grundlach (1999), in 1911 a captain of the German
army named Meyer wrote an article in the journal Archiv für Psychologie [Archives of
Psychology] in which he suggested using psychological examinations for recruiting soldiers.
In 1915, Moede and Piorkowski established the first laboratory in which
psychologists examined truck drivers for the Preussische Gardekorps [Prussian Guard
Corps] in Berlin. At the end of the war, there were 17 laboratories with tens of thousands
of individuals examined (see Table 43.2 for a summary), and Moede had been appointed as
the chief of the laboratories for the psychological assessment of the truck reserve sections
within the inspection of motor troops (Leiter der psychologischen Prüfungslaboratorien
der Kraftfahr-Ersatz-Abteilungen bei der Inspektion der Kraftfahrtruppen). Two simula-
tion tests were used for assessing the drivers. Together with the selection of drivers, psy-
chological assessment was used for selecting pilots, radiotelegraphers, observers, and those
responsible for sound-localization services from 1916 onwards.
Together with Moede and Piorkowski, other well-known psychologists such as Eric Stern,
Klemm, Lipmann, Selz, and Wertheimer participated in the army selection program. From
1916 onwards during World War I, William Stern conducted a series of experiments for
selecting female tramway drivers (Stern, 1917; see also Sachs, 1920). He used tests of gen-
eral intelligence, reaction time, concentrated attention, and speed of response. In Germany,
psychotechnical laboratories and testing rooms existed in large organizations such as the Berlin
Post Office, the State Railways, A.E.G., Siemens, and Osram from 1922 onwards. Selection tests
were used for occupations as different as salesmen, telephonists, and railway workers

TABLE 43.2
Some Results of the Aptitude Testing Program for Assessing
Drivers in the German Army Between 1915 and 1918
Columns: Laboratory; Total Individuals; Highly Qualified (%); Moderately Qualified (%); Tentatively Qualified (%); Sum of Qualified (%); Unqualified (%); Withdrawal of Driver's License (%)
Berlin 1,650 8 76 10 94 6 5
Stettin 983 10 62 25 97 3 9
Posen 134 10 43 29 82 18 35
Breslau 1,657 4 66 20 90 10 6
Düsseldorf 2,063 13 65 15 93 7 10
Köln 1,194 6 56 25 87 13 12
Hannover 350 22 56 12 90 10 11
Apolda 248 3 59 27 89 11 27
Mannheim 998 20 54 15 89 11 8
Danzig 158 7 52 29 88 12 4
Frankfurt 816 29 53 10 92 8 8
Total 10,251 11.7 62.7 17 91.4 8.6 11.4

Source: Adapted from Moede, W., Lehrbuch der Psychotechnik [Textbook of Psychotechnics], p. 433, Springer, Berlin, 1932.

(engine drivers, plate-layers, supervisors, etc.; Miles, 1922). Therefore, the German army
used scientific psychological employee selection at least 2 years before the American army
started its famous project with the Alpha and Beta tests (it must be remembered that this
project really began in 1918). Furthermore, German psychologists used multiple proce-
dures for employee selection such as intelligence tests, psychomotor tests, simulations, and
sensorial tests, rather than mainly concentrating on intelligence testing as was the case
in the American army.
It is interesting to note that at the Conference of Applied Psychology, held in Berlin in
October 1922, apart from discussions about statistical methods, training in psychological
testing, measurement of personality, and cooperation between researchers and practitio-
ners, it was also discussed

what should be done with those who were rejected by the tests and the conclusion was
that for the interests of the workers and of the community it appears equally essential that
those candidates who are rejected should receive further help and advice. (Miles, 1922,
p. 192)

It is obvious that this discussion is a precursor of the current pragmatic concerns and
research into fairness, applicant reactions, and organizational justice in employee selection
procedures.
In 1913, Jules Suter, professor at the University of Zürich (Switzerland), intro-
duced psychotechnical examinations at the Bally Manufacturing Company (Heller,
1929; Baumgarten, 1949). Between 1916 and 1920, psychotechnic examinations using
Münsterberg’s and William Stern’s procedures for selecting drivers were conducted by
Claparède and Fontégne in Geneva (Switzerland), by Tramm in Berlin (Germany), by
Mira in Barcelona (Spain), by Sbowoda in Vienna (Austria), and by Houdmont, Defailly, and
Chopinet in Brussels (Belgium) (see Perez-Creus, 1947, for a more complete description
of the studies with drivers, and also Sollier, 1927).

2. Pilot selection during World War I is another relevant example of the early contribution
of European psychologists. For example, Gemelli conducted psychophysiological exami-
nations for selecting military pilots in Italy as early as 1915, as did H. G. Anderson in
Great Britain, Camus in France, Selz, W. Benary, and Kronfeld in Germany, and
Lehmann in Denmark (Eriksen, 1927; Ferrari, 1939; Grundlach, 1999; Lahy, 1922; Perez-
Creus, 1947). The typical European model consisted of (a) an examination of the emotional
responses and their influence on the respiratory and cardiac rhythm, (b) an examination
of the psychomotor reaction times, (c) a measure of the intensity and duration of the emo-
tional responses, and (d) a measure of fatigability. Two approaches were used in this case:

• A global laboratory examination in which the candidate had to perform the job of a pilot
• An examination of specific functions required for doing the job

For comparative purposes, it should be remembered that the first studies with
American pilots were conducted in June 1917 when the U.S. “National Research Council
Committee on Aviation tried out tests reported as promising by French and Italian psy-
chologists, and other tests as well, on 75 candidates at the Massachusetts Institute of
Technology Ground School” as acknowledged by Hennon, one of those responsible for the
American aviation program (Hennon, 1919, p. 105). According to Damos (2007):

The United States had no airlines, no civil airports, and no civil pilots prior to 1917, and was
ranked 14th internationally in terms of aviation … The first American-made aircraft did
not see service in Europe until August, 1918. Thus, America has little in terms of aircraft,
production facilities, flying training schools, or pilots at the beginning of the war. … In early
1917, the U.S. Army decides to adopt modified version of the Canadian and British flying
training programs (Damos, 2007, p. 12).

Thus, European psychologists actually conducted these selection processes 3 years
prior, and they served as an example for the American army. Interestingly, the selection of
civil pilots (e.g., postal aviation) was carried out in Belgium by Brabant, director of the Belgian
Aeronautic Laboratory, who conducted a study on psychological selection and air acci-
dents. Before the introduction of psychological selection, 2% of the accidents were due to
the observers, 8% to the aircraft, and 80% to the pilots. After the selection, the first
two figures did not vary, but the third figure was reduced from 80% to 20% (see Lahy,
1922, for a more detailed description of the study).
3. The analysis of functions and tasks within different occupations was another line of work
of the forefathers of European employee selection. For example, Tramm analyzed the
use of the brake system in tramways in Berlin. In Paris, Lahy and Bienemann analyzed
the job of typists. It is very interesting that, around 1914, Lipmann created the first job
analysis questionnaire, consisting of 104 items. The Dutch psychologist van Gineken
(1918) subsequently extended the questionnaire by adding 26 items to the original
version and conducted many studies of job analysis (see Perez-Creus, 1947, for a
complete version of the Lipmann-van Gineken questionnaire). In Barcelona (Spain), the
Instituto d’Orientacio Profesional (Professional Guidance Institute), directed by Mira,
used a job analysis questionnaire to be filled out by workers and technicians (Lahy, 1922).
Occupations were also classified according to the degree of aptitudes required. In Germany,
Piorkowski (1915) classified occupations in four groups: (a) nonskilled jobs, which do not
require any special ability; (b) semiskilled jobs, requiring some special aptitudes, mainly
attention and reaction but only a medium degree of intelligence; (c) occupations demanding
some degree of intelligence plus a combination of specific aptitudes; and (d) occupations
demanding initiative, organization, research, creativity, commanding skills, and the ability
to distinguish between highly important and less important aspects. In Spain, Mallart
(1925) classified aptitudes according to the main professional groups.
4. An important characteristic of personnel selection under the psychotechnic model was
that a large network of laboratories, institutes and, most importantly, personnel was cre-
ated for supporting the efforts of these researchers. Thus, by 1926 there were up to 250
psychotechnic institutes in Germany; 11 in Czechoslovakia; 5 in Austria; 6 in Switzerland
and Hungary; 5 in Russia; 3 in Spain; and a similar number in Belgium, The Netherlands,
Poland, Hungary, Finland, Italy, the Scandinavian countries, the United Kingdom, and the
Baltic Republics (i.e., Latvia, Estonia, Lithuania). Seven international conferences (con-
gresses) of psychotechnics were held during the next 14 years after the foundation of the
International Association of Psychotechnics in 1920 in Geneva.
5. Notable advances in employee selection were made all over Europe and not only in the
more wealthy countries. For example, Duck conducted studies on engineers, clerical workers,
and construction workers in Innsbruck (Austria) (Perez-Creus, 1947). In Czechoslovakia,
between the two World Wars, work on employee selection was conducted at the Institute
of Psychotechnics in Prague, and the best-known Czech psychotechnician in those years
was William Forster, who designed a reaction time apparatus for assessing distributed
attention in 1928. This apparatus received remarkable interest and it was in use for many
years afterwards (e.g., in Spain) for assessing drivers. The members of the Institute
mainly focused on conducting employee selection for industry, the police, the army, and public
administration (Paulik, 2004). They also conducted selection processes for screening
students at secondary schools and universities. Together with the Institute of Psychotechnics
in Prague, other psychotechnic laboratories were founded in private industries mainly for
personnel selection purposes. Examples include the psychotechnic laboratory of the Prague
electrical company (founded in 1925), the laboratory of the Prague company Koh-i-noor
(founded in 1935), the laboratory of the textile industrial school at Liberec (founded in
1923), and the laboratory at the Vitkovice ironworks (founded in 1923).
Moreover, psychotechnic tests were used in large companies such as Bat’a in Zlin (dur-
ing 1937–1939) (see Paulik, 2004, for a detailed account). In general, the objective of this
employee selection effort was to improve productivity and to reduce injuries and accidents
by assessing the applicants' aptitude. The success of this assessment program is illus-
trated in a report given in 1932 at the Vitkovice Ironworks, documenting a 61% decrease
in workplace injuries (Paulik, 2004). According to an article published in the Journal of
the National Institute of Industrial Psychology (Anonymous, 1926a), entitled “Industrial
Psychology in Europe," psychological tests were used by various large firms and were also
used to select employees on the state railways in Austria. In Finland, from 1922 onwards, rail-
way employees were selected on the basis of tests, and airmen were given psychological
tests in the army laboratories (Anonymous, 1926a). In Italy, the Psychological Laboratory
of the Catholic University of Milan, founded by Gemelli in 1921, carried out different
validity studies and developed tests for the selection of spinners in artificial silk mills,
women workers in wool industries, and staff in two large shoe factories, and research was also
done on the selection of airplane pilots (Ferrari, 1939). Also in Italy, Maria Diez-Gasca
conducted studies on the selection of women for millinery, and Marzi conducted studies
for selecting smiths, welders, turners, mechanics, motor drivers, electricians, fitters, and
switchboard operators (Ferrari, 1939). In Riga (Latvia), firemen were selected using tests
(Anonymous, 1926a). In Warsaw (Poland), there was a special testing center for engine
drivers (see Wojciechowski, 1927). Tests were particularly used on the European railways;
for instance, Nestor (1933a, 1933b) reported that as early as 1917 tests for engine drivers
were introduced in Dresden by Ullbricht, Schreiber, and Glasel, and that in 1925, Couvé
and Heydt used tests taken by the whole staff, including office workers (by the end of
1930, 75,000 persons had been tested; see also Anonymous, 1926b). Nestor (1933b; see
also Viteles, 1938, for Russia) reported that in 1924, Karner introduced cognitive tests for
apprentices on the Austrian Federal Railways; in 1925, the Swiss Federal Railways intro-
duced ability tests for workshop apprentices. Also in 1925, the selection of engine drivers
and stokers was introduced in Russia (by Kolodnaja), and the Dresden Method together
with Rossolimo's tests was used. In 1926, tests were introduced in Czechoslovakia for
selecting engine drivers, motor-trolley drivers, and office workers at various railway com-
panies. From 1927 onwards, cognitive and psychomotor ability tests were introduced in Latvia (by
Möller), Poland (by Wojciechowski), Finland (by Hjelt, Vuolle, and Petterson), Yugoslavia,
Italy (by Filippini), Belgium, Sweden (Anderberg, 1936), Hungary, Norway, France (by
Lahy), and Romania (by Nestor).
6. Together with many paper-and-pencil tests, European psychotechnicians designed and
created many sets of apparatus and devices for assessing mental, psychomotor, and
sensorial aptitudes, and a robust industry was created around the production and sale
of this apparatus. Also, film technology was used to achieve a more naturalistic setting
(e.g., see Erisman & Moers, 1926; Giese, 1925, for illustrations and photographs of many
apparatus). During the 1930s, motor coordination tests received a lot of attention in
Europe. Hopkins (1939, 1944), reporting on his visit to Berlin in 1936, mentioned that the
German air force used complex coordination tests and reaction tests for selecting pilots;
this information was later confirmed by Fitts (1946). Hopkins (1939) also mentioned that
84 psychologists were employed in the 15 army psychological laboratories that existed
in Germany. In the United Kingdom, Barlett and Craik (1939) developed a psychomotor
test called the “Reid machine” for the Royal Air Force that later became known as the
Sensory Motor Apparatus (SMA). Williams (1940) reported the validity of the SMA as
being moderately high.
7. The examination of leaders and administrators was carried out as long ago as 1914 in France
and 1925 in the former USSR (Henry, 1914; Jurowskaja, 1925). Pryns Hopkins described
a very interesting test used by the Hungarian army since 1933 in a paper presented at a
meeting of the British Psychological Society on April 3, 1938, and subsequently published
as an article (Hopkins, 1939). The tests

were conducted in a hall which resembled a gymnasium. Here ladders, bars, boards, building
blocks, ropes, etc., were lying about, and two officers who were examining a candidate for
commissioned rank were requiring him to instruct three men to use this material for the
construction of a bridge suitable for them to cross. The candidate was first left alone to
consider the situation. He then informed the two officers how he would proceed with his
instructions, and listened to their comments. Thereupon he summoned his three men and,
just as if he were their officer, gave them the orders and the supervision necessary for the
execution of their task. (Hopkins, 1939, p. 60)

This is pointedly not far removed at all from modern-day assessment center exercises still in use
by the British Army for officer selection (Anderson, Lievens, van Dam, & Born, 2006).
Subsequently, the two officers judged the candidate’s suitability for promotion. Another
test for selecting sergeants was administered in

a hall equipped with desks at which the candidates sat for a group test before a screen, on which
were thrown pictures of an air raid, anti-aircrafts guns going into action, the extinguishing
of fires, care for the wounded, etc. After viewing any one of such films, the candidates had to
give a report of what they had seen and to answer questions upon it. (Hopkins, 1939, p. 61)

Many other interesting tests and simulations used by the Hungarian army were described
in Hopkins’ article, and it is obvious again that the modern assessment center method is a
descendent of these procedures.

8. Many important journals were devoted to industrial psychology or applied psychology
during the early years of European employee selection, and their volumes contained dozens
of articles reporting validity coefficients and descriptions of tests, apparatus, and other
methods (e.g., simulations, graphology, interview): Zeitschrift für angewandte Psychologie
(Germany, 1907–1934, Stern and Lipmann, founders); Praktische Psychologie (Germany,
1919–1923, Piorkowski and Moede, founders); Industrielle Psychotechnik (Germany,
1924–1944, Moede, founder); Psychotechnische Zeitschrift (Germany, 1925–1936, Rupp,
founder); Journal of the National Institute of Industrial Psychology, later Human
Factor, later Occupational Psychology, later Journal of Occupational Psychology and
now Journal of Occupational and Organizational Psychology (United Kingdom, since
1922, Myers, founder); Le Travail Humain (France, since 1934, Lahy, founder); Annals
de l'Instituto d'Orientacio Profesional (Spain, 1920–1926, Mira, founder); Rivista di
Psicologia Applicata (Italy, since 1908, G. C. Ferrari, founder); Psychotechnika (Poland,
1927, Studencki, founder); and Revue de Physiologie et de Psychologie du travail (Poland).
The ten-volume Handbuch der Arbeitswissenschaften [Encyclopedia of Work Science]
published in 1930 must especially be mentioned. Indeed, the first two volumes of this
encyclopedia contained 5,321 pages!

European leadership in employee selection during the first three decades of the 20th century
was acknowledged by the most reputed American scholars such as Viteles (1926, 1928, 1930, 1932),
Kornhauser (1929–1930), Kitson (1922), and Bingham (1919). In their work they reviewed the
extensive work carried out in European countries and compared it with the developmental level of
American employee selection at that time.
However, a few years after these reviews, European employee selection started to dramatically
lose its leading role. Indeed, it has never quite recovered since that time, and the indisputable cur-
rent leader of employee selection research is certainly the United States. However, it is also fair
to acknowledge that researchers such as Bingham, Viteles, Kitson, and Kornhauser had special
characteristics: they read research published in several languages, including German, French, and
Russian (Viteles); they visited the European research centers; and they were close colleagues of
the leading European psychologists of the time, such as Lahy, Moede, Lipmann, Baumgarten, Mira,
Myers, and Spielrein, to mention just a few.
Several political reasons can be put forward to explain the decline of European personnel
selection at the end of the 1930s. In the USSR, psychological testing was forbidden as capitalistic
ideology, psychotechnic offices were closed, and many of its industrial psychologists disappeared
in Stalin's purges. In Germany, Austria, Poland, and Czechoslovakia, many research
institutes were closed, and the most relevant researchers and professors were condemned to exile or
worse—for example, Stern, Lewin, Baumgarten, and many others during the Nazi period. Some of
them preferred suicide, including Klemm and Lipmann, the latter of whom ended his life when the
Nazis closed his institute and he was fired (Baumgarten, 1933, 1940). Similarly, Rebecca Biegel,
director of the psychotechnical laboratory of the Postal Office in Holland, chose to take her own
life rather than die in a concentration camp (Baumgarten, 1949).
Such personal histories make for salutary reading, and it is perhaps somewhat shocking for
readers of this handbook to discover the tragic fates of some of these individuals who were
undoubtedly persecuted under different regimes merely for being scientists or practitioners in
I-O psychology. In Italy, as another example, Enzo Bonaventura was persecuted because of his
Jewish origin and religious beliefs, and he had to abandon the country and flee to Palestine.
Bonaventura (1934) had been one of the first Italian contributors to the examination of emotions
for personnel selection purposes. In Spain, the fascist rebellion against the Spanish Republic and
its eventual victory meant that the best reputed Spanish industrial psychologists such as Mira,
Rodrigo, and Germain had no choice but to disappear into exile whilst others were fired from
their jobs (e.g., Mallart).

The selection system for soldiers and officers of the Spanish Republican army (1937–1939) was
described by Mira (1943), who was Chief of the Psychological and Psychiatric Division of the Spanish
Republican Army. The use of cognitive and psychomotor tests as well as personality questionnaires
and interviews formed the basis for the selection decisions.
The beginning of World War II produced a near paralysis of research efforts in most
continental European countries and ended the work of legendary individuals such as Lahy, who died in Paris dur-
ing the Nazi occupation of the city. Consequently, the period between 1935 and 1945 practically marked the
end of the successful era in several European countries (with perhaps the only exception being
Great Britain).
However, this panoramic view should not suggest that German personnel selection during the
Nazi period was inefficient or nonexistent. This is simply not true. Two important figures con-
tinued their research and practice under Nazi support. One of them was Moede, probably the
most important German industrial psychologist between 1915 and 1945. Moede was the edi-
tor of the journal Industrielle Psychotechnik between 1924 and 1944. The second important
German researcher under the Third Reich was Max Simoneit. Mira (1943) considered the book
by Simoneit (1938) as the main source of information about military officer selection in the Nazi
army. Incidentally, Simoneit was a member of the group that plotted the July 20, 1944, attempt
on Hitler’s life. According to several authors (Ansbacher, 1941; Fitts, 1946; Mira, 1943; Simoneit,
1938), the most important key in the selection and promotion of officers in the German army
was the assessment of personality and character. A selection team consisting of two officers, a
military physician, three psychologists, and a psychiatrist was assigned to each candidate. The
team collected information regarding biographical background, expression analysis, psychological
testing, behavior efficiency assessment, and a final examination. Biographical information was
collected using interviews. Expression analysis consisted of the analysis of nonverbal behavior
and communication, facial expression, verbal content, and graphological (handwriting) analysis.
The psychological testing consisted of a job simulation task in which the candidate could consult
books, ask for supplementary information, and then choose what he thought was the best method for
solving a task-related problem. After finishing the task, the candidate was asked about the solving
process and about the reasoning process. On the basis of those explanations, his performance on
each task was evaluated. The assessment of behavior efficiency was done by placing the candidate
in an experimental situation with some degree of risk and adventure. The final examination, called
Führerprobe [Command Test], a test of leadership capabilities, was the most complex procedure.
The assessment lasted for 2 days, and in this time, the candidate was subjected to interrogations,
athletic tests, and manual and mental work.
An important part was a task in which the candidate had to lead a group of unknown soldiers,
ordering them to carry out displeasing tasks. In this last part of the selection process, the team made notes not only about the
reactions of the future officer but also about the reactions of the soldiers, because the German directorate
assumed that soldier reactions were a good signal of the officer's leadership ability. Finally, the can-
didate was confronted with future colleagues to informally discuss various topics under the obser-
vation of the examining team. By putting together all of this information and the test results, a diagnosis
of the candidate's abilities was made. As can be seen, this assessment procedure is once again very
similar to our current assessment centers, and the British and U.S. armies very soon imitated it (see
Highhouse, 2002, for further details).

THE FALL OF THE EUROPEAN EMPLOYEE SELECTION ENTERPRISE AFTER WORLD WAR II: THE YEARS OF DECLINE AND STAGNATION (1945–1975)
The years immediately after World War II showed a renewed interest in employee selection in
the western European countries, and important research was conducted until 1960 in the United
Kingdom, France, Spain, West Germany, Sweden, and other countries. However, this recovery
began slowly and the contacts among researchers were understandably few, infrequent, and very
difficult to maintain. At the first Congress of the International Association of Psychotechnics (IAP)
held after World War II, Baumgarten (see Grundlach, 1998) described a devastating pan-
orama: Dunajervski, Syrkin, Spielrein, Gastew, and Kolodnaja were dead or imprisoned in Russia;
Stern, Lipmann, Giese, Poppelreuter, and Klemm in Germany; Lahy and Weinberg in France; and
Studencki, Segal, Lipszycowa, and Przedborska in Poland. Claparède and Myers were also dead.
Out of 27 members of the Director Committee of the IAP, 11 were dead and 2 had been expelled
from the Director Committee for ideological reasons. Another source of difficulties for improv-
ing the situation was the problem of obtaining publications, especially scientific journals.
Information on books was also difficult to obtain, let alone the books themselves, which few could afford to buy. In
fact, these kinds of difficulties were present until the 1990s. In this regard, the Congresses of the
IAP were perhaps the best opportunity for many personnel selection researchers to share their find-
ings and to get to know what was being researched across the world, especially across European
countries. Consequently, an examination of the proceedings of the IAP gives a fascinating insight
into developments during this period in European selection psychology.
A cursory review shows that between 1949 and 1958 four congresses were held in Bern, Paris,
Gothenburg, and Rome, and that, for example:

• Studies on the selection of railway workers and drivers were made in France and Italy.
• German and Italian psychologists reported the status of personnel selection in their
countries.
• The classification of occupations was another issue of interest for Belgian, French, and
German researchers.
• Personality in personnel selection was the subject of several studies conducted by British,
French, and Spanish psychologists.
• The measurement of criteria and methodological advances in employee selection was
another theme of interest for psychologists from Belgium, France, Italy, Spain, and the
United Kingdom.
• Other issues of interest were the validity of interviews, the selection of engineers, military
pilot selection, and aptitude tests and their relation with accidents at work.

Of crucial importance in this period were collaborations between the American armed forces
and their European counterparts (e.g., Belgian, British, Dutch, German, Norwegian, Spanish, and
Swedish armies, air forces, and navies). In Holland, Buiten and Hofstee (1971), Fokkema (1958),
and Helbing (1964) conducted validity studies for the selection of pilots, fighter control officers, and
soldiers using cognitive and personality tests. In Spain, a series of validity studies was conducted
with the U.S. Air Force Cognitive Battery for selecting pilots and drivers (e.g., Germain, Pinillos,
Ramo, & Pascual 1958, 1959). Riggs (1958) in the Royal Norwegian Air Force and Anttila (1968) in
Finland conducted studies for selecting pilots and air mechanics, respectively.
These collaborative efforts were frequent during the 1950s and 1960s and continued in the next
decades. However, and rather curiously, there was practically no contact with American selection
researchers working at universities or in industry. In other words, this collaboration was limited to
psychologists working in military settings in Europe and the United States. The reputed American
I-O psychologist, Walter Bingham, wrote in 1951:

During the third week of July the Thirteenth International Congress of Psychology was convened in
Stockholm; and following it, in Gothenburg was held the Tenth International Congress of Applied
Psychology (or call it “Psychotechnique” if you are speaking French). It was this latter congress in
which I had an active role and in which you who look at Personnel Psychology might be expected to
have a special interest.
As to your real interest in such international affairs I am not certain since almost none of you
attended; or, I ought to say, almost none who live in the USA. Readers of this journal in countries
scattered over six continents took part; but our homeland, which furnishes employment to half of the
world’s psychotechnologists and practitioners of psychological profession, was conspicuously under-


represented. Among two hundred registrants we have only four! From Brazil came five times that
number although transportation from Rio de Janeiro cost more than from New York … As the sessions
progressed, it was encouraging to observe how some of these familiar topics had here and there newly
studied by resorting to improved experimental design and ingenious statistical controls. Even more
significant were a few papers boldly opening vistas of inquiry into which American psychotechnolo-
gists have not yet looked. It is hoped that our inadequate part in this congress can be explained without
presupposing the spread among us of professional complacency, a creeping distemper which provokes
the ludicrous posture of looking down one’s nose.” (Bingham, 1951, pp. 430–431)

A few further relevant contributions that appeared in this period must also be mentioned.
Probably the most important was the book by Vernon and Parry (1949), in which these two British
psychologists gave an account of the employee selection program conducted in the British armed
forces during the war, the largest research program conducted in Europe. Even today,
it is a model of good practice and research. Together with this book, Vernon, Parry, Burt, and other
British psychologists published a series of articles in which they reported the essence of the work
done and the validity studies conducted (e.g., see Parry, 1947; Vernon, 1947a). They had
also made important contributions to methodology, including developing formulas for correcting
validity coefficients for range restriction in the predictor and the criterion, which, unfortunately, were widely
unknown in those years (Burt, 1942a, 1942b; Vernon, 1946). Much of this work was subsequently
transferred to civilian organizations and, especially, to the British Civil Service (see Anstey, 1977).
For example, assessment center technology was introduced in the Civil Service and for selecting
police officers. The origins of these procedures remain visible in assessment centers conducted
today in both of these contexts (Feltham, 1988).
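As a point of reference only, the now-standard univariate correction for direct range restriction on the predictor is sketched below; it is not reproduced from Burt's or Vernon's papers, whose own formulas also dealt with restriction in the criterion, and U is simply the ratio of the unrestricted to the restricted standard deviation of the predictor:

% Thorndike Case II correction for direct range restriction on the predictor x:
\[ \rho_{c} \;=\; \frac{U\,r}{\sqrt{1 + r^{2}\,(U^{2}-1)}}, \qquad U = \frac{SD_{\mathrm{unrestricted}}}{SD_{\mathrm{restricted}}} \]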
In France, Bonnardel, the leading French personnel selection psychologist of the period,
conducted many validity studies and developed new tests (see Salgado et al., 2003, for a large list of
Bonnardel‘s validity studies). Other French psychologists with relevant contributions to personnel
selection were Pacaud, Goguelin, and Palmade (see Pacaud, 1971; Palmade 1948). Studies on
the selection of engineers, supervisors, drivers, and mechanics were conducted, new tests were
developed, and new laboratories were created (e.g., the psychological laboratory at Peugeot) by this
group of highly active French researchers. The selection of railway workers was another focus of
employee selection research in this period, and studies were conducted in France, Spain,
Belgium, and other European countries. For example, in Romania, Craciunescu (1969) conducted a
validity study of cognitive tests for selecting railway mechanics.
In Belgium, studies were done by Miret-Alsina (1958) at the Sabena Company for selecting civil-
ian pilots. In the Scandinavian countries, Trankell (1959) also developed a procedure for selecting
pilots for the Scandinavian Airlines System (SAS). The procedure required the individual to hold a
pencil in each hand and to place one dot in each of a series of circles connected by straight lines. One
series of circles was used for the right hand and the other series for the left hand. The individual
had to alternately move the left and right hands from circle to circle following the lines. The test was
validated using the results of a training course. This selection process for SAS included other tests
such as verbal intelligence, maturity, and tact. In the 1960s and 1970s, a projective test, the Defense
Mechanism Test (DMT), developed by Kragh at the University of Lund (Sweden), received a great
deal of attention. Several studies were published reporting its validity (see Kragh, 1960, for a com-
plete description of the test and the validity studies). The DMT displays several pictures through
a tachistoscope, and the length of exposure is gradually increased. The individual must provide a
description of what she/he has seen. The responses are assessed in terms of the defense mechanisms
revealed.
The 1960s and 1970s can best be conceptualized as the obscure period of European employee
selection, because apparently little research was being done in European countries and notably little
interest in this research was shown in the universities. For example, only a couple of presentations
were made at the 16th and 17th International Congresses of Applied Psychology, held in Holland
(1968) and Belgium (1971). Admittedly, this is a generalization, and a few exceptions can be found in
different countries. The most important exceptions were the research-
ers of the University of Amsterdam (Holland), who carried out a relatively large number of doctoral
dissertations and internal reports on the validity of psychological tests (see Salgado et al., 2003, for
a list of these contributions), with Drenth and Roe as the most significant researchers. In Germany,
Amthauer (1973) conducted a large series of validity studies with the I-S-T 70 intelligence test.
The consequences of this lack of research and interest in employee selection were that the
discipline disappeared from many university psychology programs, few students looked for careers
in this occupational field, there was practically no interchange between researchers and practitio-
ners, and, most importantly, few jobs and opportunities were offered to psychologists working as
practitioners in the field of employee selection (Warr, 2007).

THE RESURGENCE OF EUROPEAN EMPLOYEE SELECTION: THE YEARS OF OPTIMISM AND GROWTH (SINCE 1975)
From about 1975 onwards, employee selection began a period of resurgence and renaissance in
Europe mainly because of the efforts of researchers who began their career some years before.
Researchers like Peter Drenth, Paul Jansen, Jan Algera, Robert Roe, and Charles De Wolff in The
Netherlands; Peter Herriot, Alan Jones, Victor Dulewicz, Ivan Robertson, Mike Smith, Viv Shackleton,
Dave Bartram, Peter Saville, and Peter Warr in Great Britain; Peter Dachler in Switzerland, José
Forteza, Nicolas Seisdedos, and Jose Maria Prieto in Spain; Karel De Witte in Belgium; Claude
Levy-Leboyer, Marie Montmollin, and Jean Pierre Rolland in France; and Herman Branstätter
and Heinz Schuler in Germany, among others, must be mentioned as the senior leading European
researchers. This group of senior researchers conducted many of the primary validity studies used in
recent European meta-analyses of selection procedure validity. They developed a European model
of selection in which the applicant perspective is the core characteristic, which predated the current
interest on the justice framework and the applicant reactions to the selection procedures (De Wolff &
van den Bosch, 1983; Herriot, 1989). They proposed new concepts for the validation of selection
processes such as the social validity concept (Schuler, 1993) or social desirability as an intelligent
adaptation of applicants to personality assessments (Seisdedos, 1993). They created a large series
of tests with hundreds of validity studies (e.g., Saville & Holdsworth in the United Kingdom; see
SHL, 1996) or adapted many others from other European countries or from the United States,
as was done, for example, by Seisdedos and Prieto in Spain, or the researchers of the Centre de
Psychologie Appliquée de Paris. They critically examined the new methodological advances, such
as meta-analysis and validity generalization procedures (e.g., Algera, Jansen, Roe, & Vijn, 1984).
Furthermore, they conducted many surveys on recruitment and selection in European countries
(e.g., Altink, Roe, & Greuter, 1991; Bruchon-Schweitzer & Ferrieux, 1991; Dany & Torchy, 1994;
Levy-Leboyer, 1994; Prieto, Blasco, & Quintanilla, 1991; Robertson & Makin, 1986; Schuler, Frier,
& Kauffmann, 1991; Shackleton & Newell, 1991; Smith, 1991; Smith & Abrahansen, 1992) to have
a good overview of employee selection in Europe and to identify similarities and differences among
the European countries. They were pioneers in the use of computer technology for developing cog-
nitive tests (e.g., Bartram, 1987). As can be seen from the previous list of contributions, European
researchers have a plurality of interests. They are interested not only in the criterion-related validity of
selection methods, but also in construct validity; the uses of the various procedures for recruit-
ment and selection; applicant reactions to selection procedures; and the social, ethical, and legal
perspectives on employee selection. This is not, of course, to say that North American researchers
and practitioners have not pursued these lines of research, but interest in several of these topics has
been more recent in North America.
Perhaps the beginning of this quest toward the recovery of the status of European employee
selection can be situated in the Congress of the International Association of Applied Psychology
held in Edinburgh (Great Britain) in July 1982. At this congress, Böhrer presented a
validity study of the selection system used in the Belgian armed forces; Busch-Jensen from
Denmark addressed the question of values associated with selection procedures; and Bartram and
Dale reported a validity study on personality measures for predicting training proficiency among
British military pilots. In another presentation, Bartram described "Micropat," a computerized
test battery for pilots; Horia Pitaru presented a study on the validity of selection tests for pre-
dicting job performance in data processing occupations in Romania; Poppleton from the United
Kingdom reported on the development and validation of a test for the measurement of sales
aptitude; and Schuler and Stehle presented an empirical study on the validity of the assessment
center method (see IAAP, 1982).
Today, European employee selection researchers contribute to the main journals in the world,
such as Journal of Applied Psychology, Personnel Psychology, Human Performance, Applied
Psychology: An International Review, and some of them serve on the editorial boards
or even as editors. In Europe, employee selection researchers publish articles in jour-
nals such as International Journal of Selection and Assessment, Journal of Occupational and
Organizational Psychology, Zeitschrift für Personalpsychologie, European Journal of Work
and Organizational Psychology, and Journal of Organizational Behavior. In 1993, a European
researcher (Neil R. Anderson) founded the first journal in the world devoted entirely to employee
selection research, the International Journal of Selection and Assessment, today among the lead-
ing journals in the world for this discipline according to the Thomson Institute for Scientific
Information (ISI).
Employee selection is currently an important research area in European work and organizational
psychology (Roe & van den Berg, 2003), and examples of active researchers include Derous, de
Corte, de Fruyt, De Witte, and Lievens in Belgium; Rolland and Steiner in France; Höft, Hülsheger,
Hornke, Kanning, Kersting, Krause, Marcus, Moser, and certainly Schuler in Germany; Aramburu-
Zabala, Garcia-Izquierdo, Gorriti, Moscoso, Osca, Sáez, and Salgado in Spain; Anderson, Bartram,
Cunningham-Snell, Robertson, Silvester, Smith, and Warr in the United Kingdom; Bertolino in
Italy; Nicolaou in Greece; Kleinmann, König, and Klehe in Switzerland; and Born, Evers, Roe, te
Nijenhuis, van Dam, van den Berg, van Vianen, and Voskuijl in The Netherlands. They have participated
in symposia and congresses and published their research with such well-known and respected American
scholars as Wally Borman, James L. Farr, Milton Hakel, Michael Harris, Scott Highhouse, Tim
Judge, Rich Klimoski, Frank J. Landy, Gary Latham, Kevin R. Murphy, Deniz S. Ones, Mitchell
Rothstein, Paul R. Sackett, Juan I. Sanchez, Frank Schmidt, Paul E. Spector, Dirk D. Steiner, George
Thornton, and Vish Viswesvaran, to mention just a few.
Among the most recent contributions of European employee selection researchers, several edited books
must be mentioned, including International Handbook of Selection and Assessment (Anderson &
Herriot, 1997), Handbook of Industrial, Work and Organizational Psychology (Anderson, Ones,
Sinangil, & Viswesvaran, 2001), Handbook of Personnel Selection (Evers, Anderson, & Voskuijl,
2005), and La Psychologie Appliquée à la Gestion des Ressources Humaines (Levy-Leboyer, Huteau,
Louche, & Rolland, 2001). Relevant contributions made by European researchers include studies
on the multimodal interview (Schuler, 2002; Schuler & Funke, 1989), on the construct validity of
assessment centers (Lievens & Van Keer, 2001; Scholz & Schuler, 1993), on interview validity and
constructs assessed by interviews (Salgado & Moscoso, 2002; Schuler, 2002), the meta-analyses
of the validity of cognitive ability testing (Bertua et al., 2005; Hülsheger, Maier, & Stumpp, 2007;
Salgado & Anderson, 2003; Salgado et al., 2003), the examination of the validity of situational
tests (Lievens, Buyse, & Sackett, 2005; Lievens & Sackett, 2006), Internet-based recruitment and
testing (Bartram, 2000; Lievens & Harris, 2003), the contributions on fairness reactions to employee
selection procedures (see the special issue of the International Journal of Selection and Assessment
edited by Neil R. Anderson, 2004), the meta-analyses of the validity of personality measures
(Salgado, 1997, 2002, 2003), the studies on typical and maximum performance (see the special
issue of Human Performance edited by Ute-Christine Klehe, Neil R. Anderson, & Chockalingam
Viswesvaran, 2007), and the studies on the adverse impact of selection procedures (De Corte,
Lievens, & Sackett, 2006; see also Annual Review articles by Hough & Oswald, 2000, and Sackett
& Lievens, 2008, for more comprehensive reviews of the European contributions).
In summary, in our opinion, the common characteristics of the current personnel selection field
across the European countries are the following:

1. A large cultural diversity that is expressed in the use of different techniques and tools, with
different frequencies of use across the European countries. For example, cognitive tests are
more often used in the United Kingdom and Spain than in Germany or Italy.
2. An international generalization of the validity of cognitive ability tests across the European
Union (EU).
3. Great interest in the use of personality testing and studies showing validity coefficients
similar to those found in the United States.
4. Great interest in applicant reactions and the ethics of personnel selection. The social con-
text of the selection process is viewed as highly relevant.
5. A view that the construct validity of personnel selection procedures is crucial.
6. Growing interest in possible employment discrimination due to various selection tools
and procedures.
7. A confluence of the national employment laws within the framework of the EU (currently
consisting of 25 countries with over 400 million individuals), with an emphasis on the
rights of the nationals of the EU countries.
8. A deeply held concern for the researcher-practitioner divide.
9. Concern for the “turbulent” socioeconomic environment, which implies frequent and
quick changes in jobs and organizations. In this context, competency models are viewed
as a possible line of advance together with new conceptualizations of personnel selection
as a “theater model” (i.e., this model “aims at developing competences between selection,
learning in practice, and direction,” and putting “the emphasis on the principle of employ-
ment and career opportunity"; Roe & van den Berg, 2003, p. 275).
10. Finally, Roe and van den Berg (2003) identified six principles that provide direction and
legitimacy to selection practices in Europe: (a) meritocracy, (b) risk avoidance, (c) employ-
ment and career opportunity, (d) fair chance, (e) two-sidedness, and (f) involvement.

CONCLUSIONS
In this chapter we have summarized three main periods of development over the history of
European personnel selection research and practice. In so doing, we have brought to light several
vitally important contributions by scholars who seem to have been unfairly underacknowledged
within Europe but especially in North America. Our retrospective review is, of course, necessar-
ily a narrative one and one influenced by the interpretations of historic precedent by us as authors.
However, we suspect that many North American colleagues may never have even heard of some
of the names cited in this chapter, let alone been aware of some of the persecution and hardships
endured by many European I-O psychologists in the past. These personal histories make salu-
tary reading for us all. We believe the early establishment of the fundamental bases of employee
selection theory, methods, and techniques in Europe some 100 years ago remains impactful today.
We have argued that modern-day methods can be directly traced back to these historic forebears
and, indeed, that the principles of scientific personnel selection were established during this
period. For a science that relies so heavily on the measurement of past behavior to predict future
behavior, it is curious and ironic that present-day accounts of selection in many of the popular
textbooks fail to take heed of these historic contributions. Despite periods of growth, difficult
periods, and periods of stagnation, our overall conclusion is that European employee selection
is alive and well. It is experiencing a second youth, and we look forward to future developments
with keen interest.

ACKNOWLEDGMENTS
Jesús F. Salgado’s work for this chapter was partially supported by grant SEB1098-2005 from
Ministerio de Educación y Ciencia (Spain) and grant PSI2008-03592/PSIC from Ministerio de
Ciencia e Innovación (Spain). We thank Nancy T. Tippins, James L. Farr, and Wally Borman for
their valuable comments on an earlier version of this chapter.
A chapter on European employee selection coauthored by a Spaniard, an Englishman, and a
German would have been unthinkable 60 years ago. We dedicate this chapter to the bravery, forti-
tude, and honor of former colleagues who suffered incalculably for the science and practice of I-O
psychology and employee selection. We also dedicate the chapter to our American and European
colleagues who are contributing to the progress of our science.

REFERENCES
Algera, J., Jansen, P. G. W., Roe, R. A., & Vijn, P. (1984). Validity generalization: Some critical remarks at the
Schmidt-Hunter procedure. Journal of Occupational Psychology, 57, 197–210.
Altink, W. M., Roe, R. A., & Greuter, M. A. (1991). Recruitment and selection in The Netherlands. European
Review of Applied Psychology, 41, 35–45.
Amthauer, R. (1970). Intelligenz-Struktur-Test (I-S-T 70). Handanweisung für die Durchführung und
Auswertung [Intelligence-structure-test (I-S-T 70). Manual]. Göttingen, Germany: Hogrefe.
Anderberg, R. (1936). Psychotechnische Rekrutierungsmethoden bei den schwedischen Staatsbahnen.
[Psychotechnical methods for recruitment in Swedish railways]. Industrielle Psychotechnik, 13,
353–383.
Anderson, N., & Herriot, P. (Eds.). (1997). International handbook of selection and assessment. Chichester, England:
Wiley.
Anderson, N., Lievens, F., van Dam, K., & Born, M. (2006). A construct-driven investigation of gender differ-
ences in a leadership-role assessment center. Journal of Applied Psychology, 91, 555–566.
Anderson, N., Ones, D. S., Sinangil, H. K., & Viswesvaran, C. (Eds.). (2001). International handbook of indus-
trial, work and organizational psychology. London, England: Sage.
Anonymous. (1926a). Industrial psychology in Europe. Journal of the National Institute of Industrial
Psychology, 3, 264–267.
Anonymous. (1926b). Selection tests on the German railways. Journal of the National Institute of Industrial
Psychology, 3, 201–204 [This paper is based on a lecture given by Dr. Glasel to the Railway Student’s
Association at Dresden in May 1926].
Ansbacher, H. L. (1941). German military psychology. Psychological Bulletin, 38, 370–392.
Anstey, E. (1977). A 30-year follow-up of the CSSB procedure, with lessons for the future. Journal of
Occupational Psychology, 50, 149–159.
Anttila, H. (1968). Studies in the selection of air mechanics. In IAAP Proceedings of the 17th International
Congress of Applied Psychology, July 18–22, Amsterdam, The Netherlands.
Bartlett, F. C., & Craik, K. J. (1939). Report on the Reid Machine (Report Nº 59). London, England: Flying
Personnel Research Committee, Royal Air Force, Ministry of Defense.
Bartram, D. (1987). The development of an automated testing system for pilot selection: The Micropat Project.
Applied Psychology: An International Journal, 36, 279–298.
Bartram. D. (2000) Internet recruitment and selection: Kissing frogs to find princes. International Journal of
Selection and Assessment, 8, 261–274.
Baumgarten, F. (1933). Otto Lipmann—Psychologist. Personnel Journal, 12, 324–327.
Baumgarten, F. (1940). Otto Klemm, professeur de psychologie appliqué à l’Université de Leipzig (1884–1937)
[Otto Klemm, professor of applied psychology at the University of Leipzig (1884–1937)]. Le Travail
Humain, 8, 96–98.
Bingham, W. V. (1919). Army personnel work. With some implications for education and industry. Journal of
Applied Psychology, 3, 1–10.
Bingham, W. V. (1951). Today … and yesterday. Nº16. In Scandinavia. Personnel Psychology, 4, 429–438.
Bonaventura, E. (1934). L'examen de l'émotivité dans la sélection des conducteurs des véhicules rapides [The
sensitivity test in the selection of drivers of fast motor vehicles]. In H. Grundlach (Ed.). (1998). Applied
Psychology: The first-thirteenth Congress Proceedings of the International Association of Applied
Psychology, (Vol. 8, pp. 225–229). London, England: Routledge.
Bruchon-Schweitzer, M., & Ferrieux, D. (1991). Une enquête sur le recrutement en France [An investigation of
recruitment in France]. European Review of Applied Psychology, 41, 9–17.
Buiten, B., & Hofstee, W. K. B. (1971). The prediction of adjustment to military life. In R. Piret (Ed.),
Proceedings of the 17th International Congress of Applied Psychology, July 18–22 (pp. 253–258).
Brussels, Belgium.
Burt, C. (1942a). The value of statistics in vocational psychology (I). Occupational Psychology, 16, 164–174.
Burt, C. (1942b). The value of statistics in vocational psychology (II). Occupational Psychology, 17, 25–33.
Carroll, J. B. (1951). Personnel psychology abroad. The International Association of Psychotechnology.
Personnel Psychology, 4, 127–133.
Chleusebairgue, A. (1934). Psicología del trabajo professional [Psychology of professional work]. Barcelona,
Spain: Labor.
Chmiel, N. (2000). History and context for work and organizational psychology. In C. Chmiel (Ed.),
Introduction to work and organizational psychology. A European perspective (pp. 1–19). Oxford,
England: Blackwell.
Craciunescu, R. (1969). Validarea examenului psihologic in transportul feroviar [The validation of psychological
examination in the railways transport]. Revista de Psihologie, 15, 405–415.
Damos, D. L. (2007). Foundations of military pilot selection systems: World War I. (Technical Report 1210).
Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Dany, F., & Torchy, V. (1994). Recruitment and selection in Europe: Policies, practices and methods. In
C. Brewster & A. Hegewisch (Eds.), Policy and practice in European human resource management: The
Price-Waterhouse-Cranfield Survey, London, England: Routledge.
De Corte, W., Lievens, F., & Sackett, P. R. (2006). Predicting adverse impact and multistage mean criterion
performance in selection. Journal of Applied Psychology, 91, 523–537.
De Wolff, C. J., & van den Bosch, G. (1984). Personnel selection. In P. J. D. Drenth, H. Thierry, H. Willems, & C. J.
de Wolf (Eds.), Handbook of work and organizational psychology (Vol. 1). Chichester, England: Wiley.
Eriksen, E. C. (1927). New psychological test for pilots. In H. Grundlach (Ed.). (1998). Applied psychology:
The first-thirteenth Congress Proceedings of the International Association of Applied Psychology (Vol. 4,
pp. 353–369). London, England: Routledge.
Erismann, T., & Moers, M. (1926). Psicología del trabajo professional [Professional work psychology].
Barcelona, Spain: Labor.
Evers, A., Anderson, N., & Voskuijl, O. (Eds.). (2005). International handbook of personnel selection. London,
England: Blackwell.
Feltham, R. (1988). Validity of a police assessment center: A 1–19 year follow-up. Journal of Occupational
Psychology, 61, 129–144.
Ferrari, C. A. (1939). Industrial psychology in Italy. Journal of Occupational Psychology, 13, 141–151.
Fitts, P. M. (1946). German applied psychology during World War II. American Psychologist, 1, 151–161.
Fokkema, S. D. (1958). Aviation psychology in The Netherlands. In J. B. Parry & S. D. Fokkema (Eds.),
Aviation psychology in western Europe (pp. 58–69). Amsterdam, The Netherlands: Swets & Zeitlinger.
Gemelli, A. (1917, August). The application of psychophysiological methods to the examination of candidates
for aviation service. Rivista di Psicologia.
Germain, J., Pinillos, J. L., Ramo, M., & Pascual, M. (1958). Estudios sobre la selección de conductores en el
ejército del aire [Studies on selection of drivers in air army]. Revista de Psicología General y Aplicada,
13, 777–790.
Germain, J., Pinillos, J. L., Ramo, M., & Pascual, M. (1959). Selección de pilotos en el ejército del aire español
[Selection of pilots in the Spanish air army]. Revista de Psicología General y Aplicada, 14, 75–114.
Giese, F. (1925). Handbuch psychotechnischer Eignungsprüfungen [Handbook of psychotechnical tests of
ability]. Halle, Germany: C. Marhold.
Grundlach, H. (Ed.). (1998). Applied psychology: The first-thirteenth Congress Proceedings of the International
Association of Applied Psychology (Vol. 1–13). London, England: Routledge.
Grundlach, H. (1999). El factor humano y el ingreso de la psicología y la psicotecnia en la guerra [The human
factor and the introduction of psychology and psychotechnics in the war]. Persona, 2, 163–179.
Guion, R. M. (1976). Recruiting, selection and job placement. In M. D. Dunnette (Ed.), Handbook of industrial
and organizational psychology. Chicago, IL: Rand McNally.
Helbing, J. C. (1964). Results of the selection of R.N.A.F air traffic and fighter control officers. In A. Cassie,
S. D. Fokkema, & J. B. Parry (Eds.), Aviation psychology. Studies on accident liability, proficiency criteria
and personnel selection. The Hague, The Netherlands: Mouton.
Heller, W. J. (1929). Industrial psychology and its development in Switzerland. Personnel Journal, 8,
435–441.
Henmon, V. A. C. (1919). Air service tests of aptitude for flying. Journal of Applied Psychology, 3, 103–109.
Henry, R. A. (1914). Le socialisme et l’art de commander a l’industrie. Paris, France: Gauthier-Villars.
Herriot, P. (1989). Selection as a social process. In M. Smith & I. T. Robertson (Eds.), Advances in selection
and assessment methods (pp. 171–187). Chichester, England: Wiley.
Highhouse, S. (2002). Assessing the candidate as a whole: A historical and critical analysis of individual
psychological assessment for personnel decision-making. Personnel Psychology, 55, 363–396.
Hopkins, P. (1939). Psychological tests in the army and air force of foreign countries. Occupational Psychology,
13, 59–63.
Hopkins, P. (1944). Observations on army and air force selection and classification procedures in Tokyo,
Budapest, and Berlin. Journal of Psychology, 17, 31–37.
Hough, L. M., & Oswald, F. L. (2000). Personnel selection: Looking toward the future—remembering the past.
Annual Review of Psychology, 51, 631–664.
Hülsheger, U. R., Maier, G. W., & Stumpp, T. (2007). Validity of general mental ability for the prediction of
job performance and training success in Germany: A meta-analysis. International Journal of Selection
and Assessment, 15, 3–18.
IAAP. (1982). 20th International Congress of Applied Psychology. Book of Abstracts. Edinburgh, Scotland:
July 25–31.
Jurowsjaja, M. A. (1925). Psichologieeskij profil administratora [The psychological profile of managers]. In
Sammelband “Inteligentnyj Trud” [Monograph “Works on Intelligence”]. Moscow, Russia.
Kitson, H. C. (1922). Second international conference of psychotechnics applied to vocational guidance and to
scientific management. Journal of Applied Psychology, 6, 418–424.
Kornhauser, A.W. (1929). Industrial psychology in England, Germany and the United States. Personnel Journal,
8, 421–434.
Kragh, U. (1960). The defense mechanism test: A new method of diagnosis and personnel selection. Journal of
Applied Psychology, 44, 303–309.
Kwiatkowski, R., Duncan, D. C., & Shimmin, S. (2006). What have we forgotten—and why? Journal of
Occupational and Organizational Psychology, 79, 183–201.
Lahy, J. M. (1922). La conférence psychotechnique de Genève [The conference on psychotechnics in Geneva].
Journal de Psychologie Normale et Pathologique, 19, 65–79.
Landy, F. (1992). Hugo Münsterberg: Victim or visionary? Journal of Applied Psychology, 77, 787–802.
Levy-Leboyer, C. (1994). Selection and assessment in Europe. In H. C. Triandis, M. D. Dunnette, & L. M. Hough
(Eds.), Handbook of industrial and organizational psychology (Vol. 4., 2nd ed.). Palo Alto, CA: Consulting
Psychologist Press.
Levy-Leboyer, C., Huteau, M., Louche, C., & Rolland, J. P. (Eds.). (2001). Psychologie appliquée à la gestion
des ressources humaines [Applied psychology and human resources management]. Paris, France: Editions
d’Organization.
Lievens, F., Buyse, T., & Sackett, P. R. (2005). The operational validity of a video-based situational judgment
test for medical college admissions: Illustrating the importance of matching predictor and criterion con-
struct domains. Journal of Applied Psychology, 90, 442–452.
Lievens, F., & Harris, M. M. (2003). Research on Internet recruitment and testing: Current status and future
directions. In C. L. Cooper & I. T. Robertson (Eds.), International review of industrial and organiza-
tional psychology (Vol. 18, pp. 131–165). Chichester, England: Wiley.
Lievens, F., & Sackett, P. R. (2006). Video-based vs. Written situational judgment tests: A comparison in terms
of predictive validity. Journal of Applied Psychology, 91, 181–188.
Mallart, J. (1925). Problemas fundamentales de la investigación psicotécnica [Fundamental problems of psy-
chotechnical research]. Madrid, Spain: Instituto de Reeducación Profesional.
Menzerath, P. (1913). Le Ve Congrès de Psychologie Expérimentale (Berlin, 16–19 avril 1912) [The Fifth
Congress of Experimental Psychology (Berlin, 16–19 April 1912)]. L'Année Psychologique, 19,
236–256.
Miles, G. H. (1922). The Berlin conference in applied psychology. Journal of the National Institute of Industrial
Psychology, 1, 190–192.
Mira, E. (1943). Psychiatry in war. New York, NY: Norton.
Miret-Alsina, F. (1958). Aviation psychology in the Sabena. In A. Cassie, S. D. Fokkema, & J. B. Parry
(Eds.), Aviation psychology. Studies on accident liability, proficiency criteria and personnel selection
(pp. 22–35). The Hague, The Netherlands: Mouton.
Münsterberg, H. (1912a). Comment. In F. Schumman (Ed.), Bericht über den V. Kongress für experiment-
elle Psychologie in Berlin von 16. bis 20. April 1912 [Proceedings of the V Congress of Experimental
Psychology in Berlin, from 16–20 April, 1912] (pp. 115–116). Leipzig, Germany: J. A. Barth.
Münsterberg, H. (1912b). Psychologie und Wirtschaftsleben. Ein Beitrag zur angewandten Experimental-
psychologie [Psychology and business. A contribution to applied experimental psychology]. Leipzig,
Germany: J. A. Barth.
Münsterberg, H. (1913). Psychology and industrial efficiency. Boston, MA: Houghton Mifflin.
Münsterberg, H. (1914). Grundzüge der Psychotechnik [Compendium of psychotechnics]. Leipzig, Germany:
J. A. Barth.
Myors, B., Lievens, F., Schollaert, E., van Hoye, G., Cronshaw, S., Mladinic, A., et al. (2008). International
perspectives on the legal environment for selection. Industrial and Organizational Psychology:
Perspectives on Science and Practice, 1, 206–246.
Nestor, J. M. (1933a). Vocational tests on the European railways. I. Human Factor, 7, 11–23.
Nestor, J. M. (1933b). Vocational tests on the European railways. II. Human Factor, 7, 51–58.
Pacaud, S. (1971). Le diagnostic du potentiel individuel: I. Le problème général. Le personnel d'exécution. In
M. Reuchlin (Ed.), Traité de psychologie appliquée (Vol. 4, pp. 5–66) [Treatise on applied psychology].
Paris, France: Presses Universitaires de France.
Palmade, G. (1948). La psychotechnique [Psychotechnics]. Paris, France: Press Universitaires de France.
Parry, J. B. (1947). The selection and classification of R.A.F. air crew. Occupational Psychology, 21,
158–169.
Paulik, K. (2004). The history of the psychology of work and organization in Czech and Slovak industry.
European Psychologist, 9, 170–179.
Perez-Creus, J. (1947). Orientación y selección profesional [Orientation and professional selection]. Madrid,
Spain: La Abeja.
Piorkowski, C. (1915). Beiträge zur psychologischen Methodologie der wissenschaftlichen Berufsberatung
[Psychological methods of scientific career counseling]. In Beihefte zur Zeitschrift für angewandte
Psychologie [Supplement of the Journal of Applied Psychology] (Vol. 11). Leipzig, Germany: J. A. Barth.
Prieto, J. M., Blasco, R., & Quintanilla, I. (1991). Recrutement et selection du personnel en Espagne [Recruitment
and personnel selection in Spain]. European Review of Applied Psychology, 42, 47–62.
Riggs, E. (1958). Aviation psychology in the Royal Norwegian Air Force. In J. B. Parry & S. D. Fokkema
(Eds.), Aviation Psychology in western Europe. Amsterdam, The Netherlands: Swets & Zeitlinger.
Robertson, I. T., & Makin, P. J. (1986). Management selection in Britain: A survey and critique. Journal of
Occupational Psychology, 59, 45–57.
Roe, R., & Van den Berg, P. T. (2003). Selection in Europe: Context, development and research agenda.
European Journal of Work and Organizational Psychology, 12, 257–287.
Sachs, H. (1920). Studien zur Eignungsprüfung der Strassenbahnfahrer [Studies on aptitude tests of tram
drivers]. In Zeitschrift für angewandte Psychologie [Journal of Applied Psychology] (Vol. 17). Leipzig,
Germany: J. A. Barth.
Sackett, P. R., & Lievens, F. (2008). Personnel selection. Annual Review of Psychology, 59, 387–417.
Salgado, J. F. (1997). The Five Factor Model of personality and job performance in the European Community
(EC). Journal of Applied Psychology, 82, 30–43.
Salgado, J. F. (2002). The Big Five personality dimensions and counterproductive behaviors. International
Journal of Selection and Assessment, 10, 117–125.
Salgado, J. F. (2003). Predicting job performance by FFM and non-FFM personality measures. Journal of
Occupational and Organizational Psychology, 76, 323–346.
Salgado, J. F., & Anderson, N. (2003). Validity generalization of GMA tests across the European Community
Countries. European Journal of Work and Organizational Psychology, 12, 1–17.
Salgado, J. F., Anderson, N., Moscoso, S., Bertua, C., De Fruyt, F., & Rolland, J. P. (2003). GMA measures
and the prediction of Job Performance and training success for different occupations in the European
Community. Journal of Applied Psychology, 88, 1068–1081.
Salgado, J. F., & Moscoso, S. (2002). Comprehensive meta-analysis of the construct validity of the employment
interview. European Journal of Work and Organizational Psychology, 11, 299–324.
Scholz, G., & Schuler, H. (1993). Das nomologische Netzwerk des Assessment Centers: Eine Metaanalyse
[The nomological network of the assessment center: A meta-analysis]. Zeitschrift für Arbeits- und
Organisationspsychologie, 37, 73–85.
Schuler, H. (1993). Social validity of selection situations: a concept and some empirical results. In H. Schuler,
J. L. Farr, & M. Smith (Eds.), Personnel selection and assessment: Individual and organizational per-
spectives (pp. 11–26). Mahwah, NJ: Lawrence Erlbaum.
Schuler, H. (2002). Das Einstellungs-interview. Göttingen, Germany: Hogrefe.
Schuler, H., & Funke, U. (1989). The interview as a multi-modal procedure. In R. W. Eder & G. F. Ferris (Eds.),
The employment interview (pp. 183–193). Newbury Park, CA: Sage.
Seisdedos, N. (1993). Personnel selection, questionnaires, and motivational distortion: an intelligent attitude of
adaptation. In H. Schuler, J. L. Farr, & M. Smith (Eds.), Personnel selection and assessment: Industrial
and organizational perspectives. Hillsdale, NJ: Lawrence Erlbaum.
Shackleton, V., & Newell, S. (1991). Management selection: A comparative survey of methods used in top
British and French companies. Journal of Occupational Psychology, 64, 23–96.
Shimmin, S. (1989). Selection in a European context. In P. Herriot (Ed.), Assessment and selection in organiza-
tions (pp. 109–118). Chichester, England: Wiley.
Shimmin, S., & Wallis, D. (1994). Fifty years of occupational psychology in Britain. Leicester, England: British
Psychological Society.
SHL. (1989). Validation review. Surrey, England: Saville & Holdsworth.
SHL. (1996). Validation review II. Surrey, England: Saville & Holdsworth.
Simoneit, M. (1938). Leitgedanken über die psychologische Untersuchung des Offizier-Nachwuchses in der
Wehrmacht [Guiding principles for the psychological examination of officer recruits in the armed forces]. Berlin,
Germany: Bernard & Graefe.
Smith, M. (1991). Recruitment and selection in the UK with some data on Norway. European Review of Applied
Psychology, 41, 27–34.
Smith, M., & Abrahansen, M. (1992). Patterns of selection in six countries. The Psychologist, 205–207.
Spector, P. E. (2000). Industrial and organizational psychology. Research and practice (2nd ed.). New York,
NY: Wiley.
Stern, W. (1900). Über Psychologie der individuellen Differenzen [On the Psychology of Individual
Differences]. Leipzig, Germany: J.A. Barth.
Stern, W. (1903). Angewandte Psychologie. Beiträge zur Psychologie der Aussage. Mit besonderer
Berücksichtigung von Problemen der Rechtspflege, Pädagogik, Psychiatrie und Geschichtsforschung,
1, 4–45.
Stern, W. (1911). Die differentielle Psychologie in ihren methodischen Grundlagen [Differential psychology
and its methodological foundations]. Leipzig, Germany: J. A. Barth
Stern, W. (1917). Über eine psychologische Eignungsprüfung für Strassenbahnfahrerinnen [On the psychologi-
cal aptitude test of female tramway drivers]. Zeitschrift für angewandte Psychologie (Vol. 31). Leipzig,
Germany: J. A. Barth.
Toulouse, E. (1927). Discours [Speech]. In H. Grundlach (Ed.), (1998), Applied psychology: The first-thirteenth
Congress Proceedings of the International Association of Applied Psychology (Vol. 4, pp. 21–27).
London, England: Routledge.
Trankell, A. (1959). The psychologist as an instrument of prediction. Journal of Applied Psychology, 43,
170–175.
Van Gineken, M. (1918). De rechte man op de rechte plaats [The right man in the right place]. Amsterdam, The
Netherlands: R. K. Boek-Centrale.
Vernon, P. E. (1946). Statistical methods in the selection of navy and army personnel. Journal of the Royal
Statistical Society, 8, 139–153.
Vernon, P. E. (1947). Research on personnel selection in the Royal Navy and the British army. American
Psychologist, 2, 35–51.
Vernon, P. E., & Parry, J. B. (1949). Personnel selection in the British forces. London, England: University of
London Press.
Vinchur, A. J. (2007). A history of psychology applied to employee selection. In L. L. Koppes (Ed.), Historical
perspectives in industrial and organizational psychology (pp. 193–218). Mahwah, NJ: Lawrence
Erlbaum.
Viteles, M. (1926). Psychology in industry. Psychological Bulletin, 23, 631–681.
Viteles, M. (1928). Psychology in industry. Psychological Bulletin, 25, 309–340.
Viteles, M. (1930). Psychology in industry. Psychological Bulletin, 27, 567–635.
Viteles, M. (1932). Industrial psychology. New York, NY: Norton.
Viteles, M. S. (1938). Industrial psychology in Russia. Occupational Psychology, 12, 85–103.
Walther, L. (1929). Technopsychology in a Swiss industry. Personnel Journal, 8, 1–18.
Warr, P. (2007). Some historical developments in I-O psychology outside the United States. In L. L. Koppes
(Ed.), Historical perspectives in industrial and organizational psychology (pp. 81–107). Mahwah, NJ:
Lawrence Erlbaum.
Williams, G. O. (1940). Flying aptitude tests (Report Nº 152). London, England: Flying Personnel Research
Committee, Royal Air Force, Ministry of Defense.
44 Employee Selection: Musings About Its Past, Present, and Future
Robert M. Guion

A friend went on an overseas tour with a group of people he had not known before. One man in the
group made himself a tad obnoxious by starting nearly every statement with, “Ya know what I don’t
like?” In an edited book like this, many people (including perhaps the editors) anticipate the content
of the final chapter to be a listing of things liked or, with potentially greater entertainment value,
things not liked in the chapters preceding it. This is not the role I’m taking.
These chapters offer a lot to think about. Some of them, at some point, have led me to put
a chapter down, stare off into space, and consider some implications that may never have been
intended. Some of the resulting notions and questions may be trivially semantic; others seem (to
me) important to real-life employment practice and its underlying logic. This chapter has become
a collection of musings. Some of them were stimulated by the content, even by offhand paren-
thetic remarks, in the various chapters. Some of them were things I started musing about, without
a lot of awareness that I was doing so, as I went about working on a revision of my own book on
assessment for personnel decisions—things that began to come into clearer (or at least less foggy)
focus on changes (or needs for changes) in the last few years in the thinking of some people in the
field of personnel selection. I hope they stimulate other musings, maybe even some answers to my
questions and uncertainties.

PSYCHOMETRIC MUSINGS
WHAT DO WE MEAN BY “MEASUREMENT” OR “PSYCHOMETRICS”?
In most disciplines using psychological measurement, and most of the accompanying college
coursework, the term psychometrics refers to the theory of test construction and to the statis-
tical evaluation of the result and its use. I suspect that more attention is given to the statistics
than to the theory, and I think the term should include a broader array of assessment procedures.
Maybe it should also apply to the act of measurement. As a matter of fact, in some organiza-
tions only the people who administer tests, especially individually administered tests, are called
psychometricians.
This book seems unique in that several of its chapters describe or comment on actual selec-
tion procedures, most notably Chapter 18, this volume. However, even these chapters seem to
describe overall processes of selection, of which test administration is surely a part, rather than
focusing on principles of test administration. Any exceptions seem to concentrate on electronic
means of administration, not on more general principles. Some principles are clearly stated or
at least inescapably implied; they include well-known things like standardization of procedures,
instructions, scoring, and normative interpretation. I’ve seen much that is new about theory and
statistics in these chapters, but I have not picked up any new principles of testing as such, even
where technological changes have made some of the old ones less certain than they used to be.


DO CHANGES IN WORK REQUIRE CHANGES IN MEASUREMENT PRINCIPLES?


Changes in jobs, work, and in the workplace can dramatically influence personnel selection require-
ments and redefine some old practices. Does that imply a real need for new psychometric principles?
Is change uniform? Does it affect all levels of responsibility, all occupations, all industries, or all
organizations within industries? (In a multiple-choice test those questions, with their repetition of
the word “all,” would be a dead giveaway that the answer to such items is “no.”) Change surely
occurs in some settings sooner or more suddenly or more disruptively than in others. For the others
where change is not a big deal, the old rules or principles (e.g., rules of standardization of every-
thing) are probably not affected. Those with big changes need reconsideration, and I don’t recall
finding enough discussion of reconsidered principle in these chapters or elsewhere. The biggest
psychometric changes came with the advent of computers in testing. With computerized item gen-
eration procedures capable of producing dozens of essentially parallel test forms, when computer-
ized adaptive testing requires some people to answer more questions than others do, how should we
reconceptualize the notion of standardization? What about standardization of “the interview,” over
the many degrees of structure? (I get tired of that phrase. Article after article refers to “the validity
of the interview,” as if one interview, conducted by one interviewer, were like any other.)
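As a toy illustration of why adaptive testing leaves different examinees answering different numbers of items, the sketch below chooses each item to be maximally informative at the current ability estimate and stops once a target precision is reached. The two-parameter logistic item bank, the stopping rule, and the grid-search scoring are illustrative assumptions, not a description of any operational system.

    import numpy as np

    # Hypothetical 2PL item bank: (discrimination a, difficulty b) for each item.
    rng = np.random.default_rng(0)
    bank = [(a, b) for a, b in zip(rng.uniform(0.8, 2.0, 50), rng.normal(0.0, 1.0, 50))]

    def p_correct(theta, a, b):
        # Probability of a correct response under the 2PL model.
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    def item_information(theta, a, b):
        p = p_correct(theta, a, b)
        return a**2 * p * (1.0 - p)

    def administer_cat(true_theta, max_items=30, se_target=0.35):
        theta, administered, responses = 0.0, [], []
        while len(administered) < max_items:
            # Pick the unused item that is most informative at the current estimate.
            remaining = [i for i in range(len(bank)) if i not in administered]
            next_item = max(remaining, key=lambda i: item_information(theta, *bank[i]))
            administered.append(next_item)
            # Simulate the examinee's response to that item.
            responses.append(rng.random() < p_correct(true_theta, *bank[next_item]))
            # Re-estimate ability by a coarse grid search on the likelihood.
            grid = np.linspace(-4.0, 4.0, 161)
            loglik = np.zeros_like(grid)
            for i, r in zip(administered, responses):
                p = p_correct(grid, *bank[i])
                loglik += np.log(p) if r else np.log(1.0 - p)
            theta = grid[np.argmax(loglik)]
            # Stop early once the estimated standard error is small enough.
            info = sum(item_information(theta, *bank[i]) for i in administered)
            if info > 0 and 1.0 / np.sqrt(info) < se_target:
                break
        return theta, len(administered)

    print(administer_cat(true_theta=1.2))   # may stop well before max_items
    print(administer_cat(true_theta=-0.3))  # a different examinee may get a different test length

Two simulated examinees can thus finish with different numbers of items and different item sets, which is precisely what strains the classical notion of a standardized test form.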
For a given test, for a given intended purpose, how should scores be interpreted? Is the distinction
between domain-referenced and norm-referenced interpretation of interest only in educational mea-
surement? Gebhardt and Baker (Chapter 13) don’t think so. For selection based on physical abilities,
they advocate domain-referenced test interpretations; domain referencing is frequent for testing
when cut scores are appropriate, as in licensure or certification, but wider use might be considered,
even in combination with norm-referencing. Suppose, for example, that a minimally acceptable
performance level, or perhaps some standard of excellence, could be established with some reason-
able degree of objectivity. One such value could be assigned a value of zero (implying neutrality,
not absence), and test scores might be transformed to the criterion scale to reflect how far above or
below zero a candidate might be expected to perform.
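A minimal, purely hypothetical sketch of such a transformation, assuming a simple linear regression of measured performance on test scores and a judgmentally set minimum performance standard (all numbers below are invented for illustration):

    import numpy as np

    # Hypothetical calibration data: test scores and measured job performance.
    test_scores = np.array([12, 18, 22, 25, 30, 34, 40, 45])
    performance = np.array([2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.8, 5.1])

    # Minimally acceptable performance level, assumed to be set by expert judgment.
    min_acceptable = 3.5

    # Simple linear regression of performance on test score.
    slope, intercept = np.polyfit(test_scores, performance, 1)

    def domain_referenced_score(x):
        """Expected performance expressed as distance above (+) or below (-)
        the minimally acceptable level, which is anchored at zero."""
        return (slope * x + intercept) - min_acceptable

    print(domain_referenced_score(28))  # positive: expected to exceed the minimum
    print(domain_referenced_score(15))  # negative: expected to fall short

The same idea could be anchored instead on a standard of excellence, or combined with norm-referencing by also reporting where a candidate's score falls in the applicant distribution.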
Hough and Dilchert (Chapter 14) don’t like ipsative scores for personality tests. However, for
ability testing, I can think of two kinds of testing situations where ipsative measurement makes
sense. One of these is in selecting people for rather specific kinds of training; knowing a candidate’s
relative strengths and weaknesses might be useful if a profile of trait requirements based on work
analysis can be compared to a candidate’s profile. Even without profile matching, an instructor can
use the knowledge of relative strengths and weaknesses, without necessary comparisons to other
people, to individualize instruction. The other situation is where the organization seeks diversity in
traits (not demographic diversity). A candidate whose main strengths include a trait not common
enough in the existing workgroup might be more valuable than one who is, according to exist-
ing norms, stronger in that trait—if for no reason other than the tendency to capitalize on one’s
strengths. In another application, Chan in Chapter 15 argued that profiles of personal values may be
more informative than single, isolated values. I agree; I consider it premature to declare that ipsative
scoring is out of bounds for selection procedures.
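A minimal sketch of the profile-comparison idea, assuming hypothetical trait-requirement weights from a work analysis; here, ipsatizing simply means expressing each score as a deviation from the person's own mean, and the similarity index (negative Euclidean distance between the two ipsatized profiles) is only one of several reasonable choices:

    import numpy as np

    # Hypothetical profiles on four trait facets (all values invented).
    job_requirements = np.array([4.0, 2.0, 5.0, 3.0])      # from a work analysis
    candidate_scores = np.array([30.0, 22.0, 34.0, 26.0])  # candidate's scale scores

    def ipsatize(profile):
        """Express each score as a deviation from the profile's own mean,
        so that only relative strengths and weaknesses remain."""
        return profile - profile.mean()

    def profile_similarity(requirements, candidate):
        """Higher (less negative) values indicate a closer match in profile shape."""
        return -np.linalg.norm(ipsatize(requirements) - ipsatize(candidate))

    print(profile_similarity(job_requirements, candidate_scores))

Because the two profiles are on different raw scales, a fuller version would standardize each facet before comparing shapes; the point here is only that relative, within-person information can be put to systematic use.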
It is neither premature nor too late to expand the repertoire of characteristics to assess.
Achievement motivation should be considered more often than it is, especially to predict perfor-
mance in the maintenance phase of work (Murphy, 1989; Helmreich, Sawin, & Carsrud, 1986).
Chan in Chapter 15 sees fear of failure as a part of the need for achievement, and it may be, but what
would be found if we separate the measurement of two facets of achievement motivation? Chan
associates fear of failure with risk avoidance. I think there is a different facet, one in which beating
the odds in risky situations is a form of delightful achievement—I don’t really know that, but I think
I’ve seen occasional examples.
There’s more. Cognitive style encompasses a whole class of constructs that received a lot of
research attention a generation or so ago but seem to have gone out of style; I don’t recall much use
of the research in actual selection settings, and that’s a shame. According to Chan, they are neither
adaptive nor maladaptive traits—merely “preferences or habits.” Selection people should recognize
that work-related habits may be strong influences on the way—and on how well—work gets done.
Constructs with similar influences are precisely what are needed for effective selection. Maybe the
advice to physicians (heal thyself) would apply to us; we may be too strongly influenced by our own
acquired habit of meeting requirements for job-relatedness (as defined by validity coefficients) to
recognize that good selection can require “preferences or habits” more specifically related to work
in general than to specific jobs. Or is that a heresy?

WHAT IF WE ABANDONED TALK ABOUT CONSTRUCTS (TRAITS) AND SHIFTED TO A FOCUS ON WHAT PEOPLE ARE EXPECTED TO DO IN A GIVEN WORK SETTING?

This musing began while reading Lievens and Chan (Chapter 16) and revisiting the common sense
but scientifically challenged material on various adjectives preceding the word “intelligence,” any
of them likely to be composed of multiple constructs. It led me to wonder what would happen if
we were to choose people on the basis of behavioral samples instead of signs—if we were finally to
take seriously the advice of Wernimont and Campbell (1968). I see several possible consequences,
some of them desirable. We would have to identify in somewhat general terms the behaviors to
be sampled and establish a sensible taxonomy covering the full array. We would need to develop
another new array of assessment tools that sample, at least abstractly, those behaviors of interest.
Some contemporary competency measures might fall into that array, but I suspect that eight catego-
ries of them (Bartram, 2005) would not cover it all. Another effect would be a change in the concept
of job-relatedness from construct matching to content sampling. Another personally undesirable
consequence is that I would have to write a whole new book abandoning the central theme in Guion
(1998) about forming predictive hypotheses. Sections on validity in the Standards and Principles
would have to be entirely new, not just new steps along the evolutionary path. With these changes,
would selection decisions be any better?

IS THE EVOLUTION OF “VALIDITY” COMPLETE?


That may be a stupid question. Is evolution ever complete? Probably not. The concept of validity
may evolve further or it may become an extinct historical fact for the psychometric trivia games.
Extinction is about the only real completion of the process, and maybe completing one aspect of the
topic by letting it drift into extinction merely starts a new evolutionary line.

ARE WE FINALLY THROUGH WITH THE TRINITARIAN METAPHOR?


When I wrote “On trinitarian doctrines of validity,” nearly 30 years ago (Guion, 1980), it began as a
tongue-in-cheek metaphor, maybe with some sarcasm but no intent to put a new term in the psycho-
metric lexicon. The intent was to poke fun at the fervor with which the three “aspects” of validity in
the 1966 Standards were reified and deified by many employment testers as three distinct “kinds” of
validity. The article pointed to construct validity and to constructs as the proper focus of measure-
ment and as the source of conceptual unity, but the theological metaphor and sarcasm in the original
manuscript were reduced at the behest of the issue editor. But mea culpa!
The new metaphor of choice is evolution. (I’ve used it, too.) The evolution of the validity notion
has been noted in several chapters in this volume, preeminent among them Chapter 2 by Putka and
Sackett. Some chapters depose psychometric trinitarianism and now refer to a new “unitarian”
definition of validity. Does this bring us to the end of the evolutionary path? Not likely. Trace the
evolution of the validity concept through a sequence of some definitions that have or might have
been published: Validity is the extent to which (a) a test measures what it is supposed to measure,
(b) scores on a test correlate with a criterion measure (i.e., a validity coefficient), (c) inferences from
scores reflect a specified construct rather than a different one, and (d) an intended interpretation or
inference from scores is supported by appropriate evidence and theory. This end point,
the “new definition,” is often considered a “unitarian” view, but is it? It’s not very unitary. Many
different inferences can be drawn from the same scores—as many as the different interpretations
intended by different test users. Validities of the inferences surely differ in the kinds and levels of
evidence and theory available to support them—hence, many validities. These many validities can-
not be enthroned on a single restricting adjective, nor can the many sources of validity evidence be
grouped with just a few adjectives. The number of potentially intended interpretations or inferences
from scores, and the number of validities fitting them, may be less than infinite, but it is surely more
than a few. I hope the trinitarian metaphor and the damage it has done will soon be interred, but
I don’t hold much hope that the unitarian metaphor will be any more viable.

IS THE NEW DEFINITION WIDELY ACCEPTED YET?


No, but I think it will be, largely because it is highly practical, properly and conveniently amorphous,
and because it makes good psychometric sense. It is not yet central in the thinking of people engaged
somehow in employment testing, whether journeymen at the trade or the so-called experts. Even in
this book, the vestiges of trinitarianism hang on, at least in verbal habits. The new definition includes
validity and reliability as components of the unitary concept of validity, and I surely agree that the
intended interpretations of scores should be reliable. An opposing tradition hanging on tightly is that
these are independent topics and deserve independent chapters in psychometrics books.
Validity coefficients are still too central in too many people’s ideas of validity for the new defini-
tion to be widely noticed. These coefficients are not validity; they are merely insufficient evidence
of it, even when they are means of distributions of validity coefficients, perhaps corrected for many kinds of
correlation-reducing errors (but hardly ever for correlation-enhancing errors) and called "true valid-
ity." Validity generalization (VG) and meta-analysis in general have been major advances for employ-
ment testing, but they offer only one kind of evidence of validity—under the new definition.
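As a simplified illustration of what "corrected" means here (operational validity generalization analyses use more elaborate, artifact-distribution versions of these corrections), the classic correction of an observed validity coefficient r_{xy} for unreliability in the criterion, with r_{yy} the criterion reliability, is

    \rho = \frac{r_{xy}}{\sqrt{r_{yy}}} .

An observed coefficient of .25 with a criterion reliability of .60 would thus be reported as a corrected value of about .25 / .77, or roughly .32; corrections for range restriction are usually applied as well.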
Incidentally, it seems unwise to get too comfortable with VG. Its twin is the denial of situational
specificity. When Kehoe and Murphy in Chapter 5 refer to “local generalizations” and give six
examples of decisions about the design of a local selection system that can influence validity, they
seem to be skating quite close to a new, different, and improved version of situational specificity.
Tett and his associates (referenced in Chapter 14; see especially Tett & Burnett, 2003) call attention
to different degrees of trait activation in different environments; highly trait-activating situations
could be intentional (not erroneous) enhancements of correlation. These things will not, I think, hurt
VG findings very much, but they should temper the VG comfort levels and lead to a more systematic
search for correlation-enhancing actions.

WILL THE NEW DEFINITION MUTE CONCERNS ABOUT CRITERIA?


It’s not likely. Even those who still revere criterion-related validity coefficients as the be-all-and-
end-all of validity end up using whatever criterion they can find. This is hardly new. Between World
Wars I and II, “psychologists in general tended to accept the tacit assumption that criteria were
either given of God or just to be found lying about” (Jenkins, 1946, p. 93; an article that should be
read for fun and for its continuing significance). In chapters dealing with criteria, here and else-
where, I see little evidence that the psychometric evolution has happened in the conceptualization
and measurement of behavior at work or of its results.
Moreover, criterion-related correlations, whether the criterion is construct-related or job-related,
still constitute important data for the evaluation of intended predictive inferences. One concern
about job-related performance ratings is whether they are collected for research purposes or for
administrative purposes. Administrative use, more often than not, seems to carry with it a lot of
baggage that is likely to produce systematic error.
In Chapter 21 by Borman, Bryant, and Dorio, it was properly noted that BARS is a system, “even
a philosophy.” Pat Smith and I developed a BARS system for our local police department. It was
designed as a system; all command officers took part in its development and continued use. The
intended purposes were, first, use by sergeants in performance observation and evaluation; second,
training and development at the patrol officer and supervisory levels; and third, to help supervising
sergeants provide performance feedback. It got used for other personnel activities as well. It contin-
ued to be used, properly and without professional monitoring, for 10 years—until the presence of
computers in every patrol car required some new anchoring behaviors and subsequent rescaling. It
was never intended to serve as a criterion for validating selection procedures. With its developmen-
tal and administrative baggage, it would not have served that function very well.

WOULDN'T A GIGANTIC MATRIX OF PREDICTOR AND CRITERION CLASSES AND THEIR CORRELATIONS BE NICE?
Chapter 22 by Dorsey, Cortina, and Luchman on adaptive behavior, especially after other chapters
describing various taxonomies, led to really fanciful musings. Taxonomies of predictor constructs
exist and are being developed; maybe they could be consolidated and expanded into one generally
accepted taxonomic system. Maybe it could consist of categories, mentioned in several chapters,
like compound constructs or mid-level general factors. These would fall neatly between the nearly
infinite variety of precise trait facets and the inherent ambiguity of some very broad constructs.
A corresponding overall taxonomy of things worth predicting (a.k.a. criteria) could also be devel-
oped. Now imagine enough predictor-criterion correlations in each matrix cell for a substantial
meta-analysis so that each cell had a generalizable validity coefficient. (I said this was fanciful.)
Imagine how helpful such a matrix could be in choosing appropriate assessment constructs for par-
ticular work categories.1 Imagine the help it would offer when one is faced with unusual combina-
tions of criteria to predict and many different personal assets of people applying for such work. The
work reported by Dorsey et al. is a beginning for one column of criteria—adaptive work behavior—
and some cells in it. It seems much more systematic than the practice of isolated researchers follow-
ing individual interests fitting no grand plan.

1 Oops! I’d better muse about a three-dimensional matrix; a criterion such as quality of work would not be predicted by the same variables for welders and bookkeepers. Or so I think. Maybe later, such as 10 years later.

HOW SHOULD TESTS THEMSELVES BE EVALUATED?


Putka and Sackett in Chapter 2 said that asking about the validity of a test is not useful, and they
are right; the useful questions concern the validity of the inferences from scores. However, they
seem to skip over the question of whether the test itself is good enough to bother getting scores on.
(They don’t, really; they surely agree with the 1999 Standards saying that test content and internal
structure are considerations in evaluating possible inferences.) At first I thought Zickar, Cortina,
and Carter (Chapter 19) were going to answer this question with their concept of “inferential appro-
priateness,” but they, too, referred more to scores than to the instruments.
To answer the question, I go back to Anne Anastasi (1986) and her insistence that validity must
be built into a test early in its development. That is not to suggest that a test intended to serve
one purpose cannot provide scores valid for another purpose; she suggested—demanded—that test
development have a purpose, that its purpose be clear, and that its structure and components be
developed to serve that purpose as well as possible. In some respects, I think she may have antici-
pated something like inferential appropriateness of structure, administration, and individual items
that are well written, well standardized, unambiguous, and definitively keyed.
Content matters, according to Kehoe and Murphy (Chapter 5), and it should be evaluated. They
(and authors of other chapters) emphasized the primary importance of defining content in terms of

important behaviors. (This leads me to offer a new but widely applicable aphorism: “Decide what
you want to do before you start doing it.”) This is an obvious requirement in developing simulations
or other work samples, and it is, I think, just as important for describing the behavioral content by
which latent traits are reflected. Content might be described in terms of intended uses or methods of
measurement. In employment testing, the description begins in work or job analysis as knowledge,
skill, ability, or other characteristics (KSAOs), which might be described as important behaviors
or perhaps constructs defined with varying precision.2 The point, for me, of Kehoe and Murphy’s
Chapter 5 is that someone evaluating a test needs to identify a clear target it is supposed to hit. The
clarity of the target depends on how well it is described or defined, and the evaluation of the test
depends on how well the target is sampled or replicated. If targets are job behaviors, sampling can
probably be straightforward enough that agreement on its adequacy or appropriateness is an easy
call. If the targets are constructs, sampling may be harder to do, but it is still necessary. When the
attributes being measured are said to be compound constructs, the problem usually gets more dif-
ficult because the literature is sparser.

2 It is fascinating—and instructive—to note that most of these chapters referred to the primary importance of work or job analysis for all sorts of purposes.

The special contribution of Chapter 5 is its emphasis on justifying decisions made in a particu-
lar situation (a local situation) by generalizing from previous research. I think this means research
done anywhere, locally or somewhere else. Kehoe and Murphy refer to these justifications as local
generalizations—needed to justify local decisions, not about candidates, but about procedures to be
used in making candidate choices. Local generalization, insofar as it focuses on assessment proce-
dures, seems to be based on some empirical evidence and also on judgment.
Judgment matters, too. Kehoe and Murphy are joined by many other chapter authors in referring
often to the importance of judgment in employment testing. Such testing is typically dependent on
the judgments of job experts in defining the work to be done and the personal attributes required in
doing it. What is less typical, and may signal a vital change in the way validity is viewed, is tying
that dependency to the concept of local generalization. I wonder if we can go so far as to suggest
that, no matter what sort of evidence is used to support claims of validity, it must be interpreted in
local terms where the assessment is done and the employment decisions are made. If that is a rea-
sonable interpretation, and it seems reasonable to me, it also raises new questions about the kinds of
trait-work generalizations we should seek.
I see a practical implication in tying the new definition to local generalization. The threat, at least
the possibility, of litigation is always with us. Cronbach’s (1988) notion of a validity “argument”
(p. 3) really anticipated the new definition by calling for the organization and integration of all rel-
evant evidence and theory to try to convince colleagues, clients, and courts that the use of a specific
test for a specific purpose in a specific setting is a good thing (i.e., valid).

ARE RELIABILITY AND VALIDITY DIFFERENT?


They aren’t terribly different, a notion that appears often in these chapters. Both are concerned
with the relative freedom from error, whether error that happens consistently or error that does
not. Remember Murphy’s Law—if anything can go wrong, it will. It should always be remembered
when validity is defined in terms of planned interpretations. A recently encountered saying (maybe
on a bumper sticker) is fitting: “If you want to make God laugh, make plans.” Intended interpreta-
tions will surely be accompanied by consistent and inconsistent error, and the consistent errors can
be considered contaminants interfering with the clarity and validity of the intended interpretations.
Maybe that is why some chapters emphasize generalizability theory.
Generalizability theory is an extension of classical reliability; some have told me that it has noth-
ing to say about validity. Nuts! A strong generalizability coefficient says much more about valid-
ity than an equally strong validity coefficient does, and I think most of the authors in this volume
would agree. Classical notions of reliability are mainly concerned with inconsistent error across
replications; validity has been more concerned about contamination by consistent or systematic
errors. That’s a well-established way of thinking, but it’s not very useful. Whether error is consistent
or inconsistent depends too much on the procedures used in estimating it. Generalizability analysis
examines sources of variance without categorizing them.
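To make the contrast concrete, here is a minimal sketch of a one-facet generalizability analysis: persons crossed with raters, with variance partitioned into person, rater, and residual components and a generalizability coefficient computed for the mean of the raters actually used. The data, the design, and all names in the code are hypothetical illustrations, not anything drawn from these chapters.

```python
# A minimal, hypothetical sketch of a persons x raters (p x r) G-study.
# Variance is simply partitioned into person, rater, and residual components
# from the ANOVA mean squares; nothing is labeled "consistent" or
# "inconsistent" error in advance.
import numpy as np

scores = np.array([      # rows = persons, columns = raters (invented ratings)
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 2, 2],
    [4, 4, 5],
], dtype=float)

n_p, n_r = scores.shape
grand = scores.mean()
person_means = scores.mean(axis=1)
rater_means = scores.mean(axis=0)

# Sums of squares and mean squares for a two-way crossed design,
# one observation per cell
ss_p = n_r * np.sum((person_means - grand) ** 2)
ss_r = n_p * np.sum((rater_means - grand) ** 2)
ss_res = np.sum((scores - grand) ** 2) - ss_p - ss_r
ms_p = ss_p / (n_p - 1)
ms_r = ss_r / (n_r - 1)
ms_res = ss_res / ((n_p - 1) * (n_r - 1))

# Estimated variance components (negative estimates set to zero)
var_res = ms_res
var_p = max((ms_p - ms_res) / n_r, 0.0)
var_r = max((ms_r - ms_res) / n_p, 0.0)

# Generalizability (relative) coefficient for the mean of n_r raters
g_coef = var_p / (var_p + var_res / n_r)
print(var_p, var_r, var_res, round(g_coef, 3))
```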
Including either generalizability theory or classical reliability as part of the new validity has
some elements of “déjà vu all over again.” A slightly older validity concept also held that validity is
a property of the inferences to be made from the scores, not a property of a test. Scores, of course,
are based on responses of examinees to items in the test. Thorndike (1949) listed four categories of
individual differences in responses that potentially lead to inconsistency and measurement error;
the nature of the measuring instrument was not his concern. Stanley (1971) moved Thorndike’s con-
tribution further by adding measurement error attributable to poor test administration or to scoring
errors. Zickar, Cortina, and Carter (Chapter 19) have pared the list down to three, but that was not
just condensing it. They included errors attributable to the items used, those due to raters or scorers
(and probably those administering the tests), and those due to chance factors related to timing—that
is, to events unlikely to have happened if testing had been done at a different time. They elaborated
these conditions, but the first needs no elaboration to make the point: at least to some limited extent,
unreliability is a function of the test itself. Musing about the implications of that one sentence could
fill a chapter.

IS THERE AN EVOLUTIONARY POINT OF DIMINISHING RETURNS?


The broadening of the sources of error from Thorndike to Zickar raises a quite different question:
Has the evolution of the validity concept become so complex, demanding so much judgment and
such complex analyses, that it may collapse under its own weight? I concede happily that the “good
ole days” of running a quick correlation coefficient and calling it test validity seem to be nearly over
at last. The downside of that happy conclusion is that adequacy in evaluating measurement is so far
removed from desk calculators and simple arithmetic that few organizations can do the evaluative
work required by the new definition. The highly paid time to think through the construct definition
or modeling, the numbers of cases needed for the multivariate analyses that someone will surely
consider necessary for investigating contaminating sources of variance, the mathematical sophis-
tication of some of the less familiar analytic tools, the technological controls needed to detect or
to control problems associated with the technology increasingly used in modern assessment—all
of this and still more “value-added” recommendations render it unlikely that very many employers
can do a totally complete evaluation of their tests, testing practices, and score interpretations. I’ll
bet that only very large corporations, large commercial consulting firms (probably working only on
their own proprietary tests), and large consortia within industries will be able to do justice to the
demands of the new definition. And will they be willing to do so for general edification?

DOES ANY MEASUREMENT THEORY OFFER MORE THAN AN EXTENSION OF CLASSICAL PSYCHOMETRICS?

In the book Quantification (Woolf, 1961), several scientists, in many disciplines, were asked to
describe the quantification of concepts in their disciplines; E. G. Boring (1961) contributed a chapter
for psychology. He acknowledged quantification (i.e., measurement) in psychophysics, reaction time
analysis, early studies of learning, and individual differences. Only “mental testing” and scaling
rooted in psychophysics are now considered part of psychometric theory. Boring said that both
of these “exhibit the persistence of an ancient habit of thought [referring respectively to standard
scores and the just-noticeable difference] that, in spite of demonstrated error, continues by its accep-
tance to hinder progress” (p. 123).
Is progress today hindered by the narrow conception of psychometrics? I think so. I once asked
Ledyard Tucker, arguably one of the great influences in psychometrics, to distinguish the concept of
test theory from other measurement theories. His answer was an uncertain, “Er, do you mean scaling?”
(L. Tucker, personal communications, late 1960s/early 1970s). No, I did not, but I couldn’t then articulate
my meaning very well. I should have answered that scaling, especially the scaling of attitudes and values
by methods created by Thurstone and Likert, falls within the boundaries of classical test theory with its
true scores and error scores, and I wanted advice on finding alternatives beyond those boundaries.
I still do. Alternative measurement theories seem to have been around. Most of them involve
mathematical modeling, highly endorsed in many of these chapters, including information theory
and its measures of structure and uncertainty, signal detection theory, and models of sequential pro-
cesses (Coombs, Dawes, & Tversky, 1970). Although signal detection theory expanded psychophys-
ics, and although structure and uncertainty bear some resemblance to consistent and inconsistent
variance, these approaches to measuring psychological variables regrettably seem outside of the
typical psychometric boundary. So are measurement and classification procedures now used in
neural and brain physiology research.
Do these various approaches to quantification differ so much from each other that their underlying
assumptions and rationale constitute different measurement theories? If we try to fit all of them under
the classical test theory umbrella (as this handbook may be doing), are we routinely overlooking pos-
sibilities for new or expanded approaches? Is classical test theory just one psychometric theory among
many, or is it essentially complete, with testing and scaling methods kept pretty much to itself?
I didn’t get answers to questions like these from Woolf’s (1961) volume, and I didn’t find them in
these chapters or in the larger literature. The quantification approaches chronicled in these chapters
and in that literature always seem to be marked by some consistencies that match or recall various
definitions of true scores, and they are always plagued by inconsistencies that seem close to the
concept of error scores. Maybe classical test theory, as expanded, is the only measurement theory
that is alive, well, and growing. The concept of validity has evolved, and so has the whole of classi-
cal test theory. Gulliksen (1950) took it further than Spearman, and it has continued to change since
then. Yet I still wonder: Do those of us in psychometrics have something to learn from the way other
psychologists measure their variables?

MUSINGS ON THE PERPETUAL DISCONNECT: RESEARCH AND PRACTICE


I admit my bias when it comes to the role of a psychologist working in an organization. I have long
been convinced, now more than ever, that the so-called scientist-practitioner model is essential to
career success and to the success of an applied discipline. Industrial psychologists (as they were
known in the days of the American Association for Applied Psychology) had broad professional and
scientific interests, and they were scientist-practitioners before the term was invented.
Long before I ever heard the term, I was especially impressed by a psychologist in a pharmaceutical
company who epitomized the concept for me. In our first conversation, he described three projects he
was working on. One was a fairly routine test validation study. Another was a study to test a hypoth-
esis he had derived for a training program he was to develop from learning theory. The third was a
classical psychophysics study to determine just noticeable differences in shades of pink. The first was
a conventional personnel research project. The second borrowed from experimental psychology. The
third mined the history of psychology for a product development project in a company manufacturing
“pink pills for pale people” that had to be able to distinguish at a glance one pill from another. His job
was not a management job. He was a research psychologist hired to be useful to the company when-
ever company problems could be explained or solved by psychological knowledge and methods.

WHAT HAS HAPPENED TO THE SCIENTIST-PRACTITIONER MODEL IN INDUSTRIAL-ORGANIZATIONAL (I-O) PSYCHOLOGY?

Has it decayed? It almost seems so. At best, it has shrunk. The idea of the company psychologist doing
psychological research the company needs to have done does not appear prominently in these chapters.
From general observation, I think the closest many company psychologists come to research is moni-
toring the work of outside consultants hired to do it. That research seems (I have no systematically
acquired data) to be limited to topics in human resources—no psychophysics, and I don’t recall even
the currently popular neurosciences having much impact on either practice or research in employee
selection or other aspects of the human resource domain, let alone organizational problems in marketing
or product development or other areas outside of that domain. This restriction of scientific interest, of
course, is the pattern in the larger discipline of psychological science, in the larger galaxy of “the sci-
ences,” and in the scholarly disciplines in general. Specializing starts in graduate school if not earlier,
and collegiality seems limited to people with similar specialties. It isn’t only with meteorologists that
psychologists talk about the weather; that’s all they have in common with scholars in other specialties
as well. Although the phenomenon of specialization is everywhere, it seems to be a rare company
psychologist who combines research and its application even within a narrowly prescribed field.
Academic researchers in “applied” psychology seem also to avoid the practitioner role. Some of
them in earlier times would take on consulting or even summer adjunct jobs to stay familiar with issues,
problems, and outlooks in actual employing organizations. Their goals were to do “practical” research.
Even avowedly applied journals are now filled, sometimes exclusively, with articles stemming from or
contributing to theories of interest to the researcher if not to organizational decision-makers.

WILL THIS HANDBOOK HELP CLOSE THE GAP?


I think it can. The foregoing is clearly the ranting, without apology, of an octogenarian. The happier
news is that the harangue must be somewhat modified by this book. The chapters here hold out a great
deal of hope that company psychologists and academic researchers can indeed play collaborative
roles, even if not within the same skin. The book contains important research and theoretical contri-
butions. It offers explicit help and information for practitioners (and for academics who are aware of
the need to stay in touch). The book as a whole offers the kind of mix that is important to both sides
of the scientist-practitioner hyphen. The model is not yet dead, and this book may offer life support.

HABIT TRUMPS PROGRESS


Putka and Sackett (Chapter 2) said, “perspectives on reliability and its estimation have evolved greatly
since Gulliksen’s 1950 codification of CTT, yet these advances have been slow to disseminate into
personnel selection research and practice” (pp. 9–10, italics mine). I think personnel selection research
lags, in areas other than reliability, in knowledge and understanding of psychometric theory. What
proportion of I-O psychologists still think of psychometric evaluation only in the language of the 1966
tripartite division of aspects (or, horrors, of kinds) of validity? I’m sure the proportion of informed
readers is growing, but I’m equally sure it is not as high as their former professors would like.
Common words given special technical meanings may account for some of the lag. I doubt that
Spearman ever intended his basic equation (X = t + e) to mean that an obtained score equals the sum of a score
component consistent over replications and an inconsistent component, X = c + ic. I like the distinc-
tion between the error concept in reliability and the corresponding error concept in validity, but
until the Putka-Sackett definition becomes the definition of error score (and therefore of true score
in the Spearman formula) the true score concept will remain ambiguous, subject to each reader’s
unique if unexpressed idea of what the word “true” means.
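For readers who want the notation side by side, the following is a small restatement of the two readings of the same equation discussed above; the last line is simply the familiar classical-test-theory definition of reliability as a variance ratio, added here only for orientation.

```latex
% Two readings of the same score decomposition, plus the classical
% reliability ratio (requires amsmath for align*)
\begin{align*}
X &= t + e   && \text{(Spearman: observed = true score + error)}\\
X &= c + ic  && \text{(observed = component consistent over replications + inconsistent component)}\\
\rho_{XX'} &= \frac{\sigma^2_t}{\sigma^2_X} = \frac{\sigma^2_t}{\sigma^2_t + \sigma^2_e}
            && \text{(reliability as the proportion of true-score variance)}
\end{align*}
```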
Habits rooted in the distant past impede progress in practice and in research. I’ve railed unsuc-
cessfully against some other habits for over half of a century. One pernicious habit is insistence
on a single criterion, combining any available sort of stuff into a single index number that doesn’t
measure anything. Another is the unwitting and automatic assumption that all regression is linear
and therefore precludes consideration of logistic or inverted U parabolic regressions or growth tra-
jectories over time, and that linear multiple regression is so inherent in nature that profile matching
is not worth considering. I’m pleased that Schmitt, Arnold, and Nieminen (Chapter 3) report that
multidimensional criteria are becoming more widely accepted, and I’m happy to note that several
chapters at least show some willingness to consider nonlinearity and profile matching.
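As one small illustration of what loosening the linearity habit involves, the sketch below simply adds a quadratic term to an ordinary least-squares fit and compares the variance accounted for; the data are simulated with a built-in inverted-U relation, so everything here is hypothetical rather than taken from any study cited in these chapters.

```python
# A hypothetical comparison of a purely linear fit with one that allows
# an inverted-U (quadratic) relation between predictor and criterion.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)                              # predictor scores
y = 2 + 1.5 * x - 0.12 * x ** 2 + rng.normal(0, 1, 200)  # inverted-U criterion

def r_squared(design: np.ndarray, y: np.ndarray) -> float:
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1.0 - resid.var() / y.var()

linear = np.column_stack([np.ones_like(x), x])
quadratic = np.column_stack([np.ones_like(x), x, x ** 2])

print("linear R^2:   ", round(r_squared(linear, y), 3))
print("quadratic R^2:", round(r_squared(quadratic, y), 3))
```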

WHAT WOULD HAPPEN IF WE BROKE OLD HABITS?


We might get some really good ideas. Within organizations, we might apply various branches of
psychology to solve organizational problems. We might toss out habitual terms, like validity, in
favor of more clearly descriptive terms like evaluation. We might even look to measurement models
outside of those of classical psychometrics, models used in other branches of psychology, and even
models from other disciplines, such as biology or meteorology.

HOW DO PSYCHOLOGISTS DIFFER FROM MANAGERS?


What is the proper role of the company psychologist in relation to managers? It would be admirable, I
think, if the company psychologist were perceived as an in-house consultant. Problems and roadblocks
happen in organizations, and many of them are at least partly psychological in nature. Some of them may
be noticed by the professional psychologist who (better than an external consultant) can bring them to
management attention; some of them may be noticed by management and brought to the attention of
the internal consultant. In either direction, the company psychologist should be enough of a generalist
to recognize various psychological implications beyond his or her own special interests—and to know
where to go (including external consultants) to find expertise related to those implications.
A consultant role implies mutual respect among those who consult and those who are consulted.
Those who seek consultation (the managers) should respect the consultant enough to consider seri-
ously the suggestions, advice, or findings the consultant reports and to ask searching questions
about any of it they do not clearly understand. The consultant should respect the managers enough
to recommend actions, including proposed research, and to explain the recommendations thor-
oughly enough to show respect for the managers’ abilities to understand and absorb and use tech-
nical or complex information. The fact that this is a hard role to play successfully is apparent in
several chapters, but that does not argue against the effort.
The company psychologist who aspires to a scientist-practitioner role may face two opposing
pressures, one from the demands of the science and the other from the demands (or, worse, the lack
of concern or interest) of the boss. For example, the pressure to do good science calls for clean data;
management may say that new data collection is unnecessary because data exist in the files. If the
keepers of the files have taken care to be sure that file data are not dirty or rancid, it may not matter;
typically, however, the files have not had that level of care.
Accepted statistical procedures for scientific research, as Schmitt, Arnold, and Nieminen (Chapter 3)
made clear about utility analysis in particular, seem arcane and downright misleading in the percep-
tions of most managers. I’m not sure that the chapter authors have shown how structural equation or
multilevel models fare in most companies, but I suspect no more convincingly than utility analysis.
Still more difficult are situations in which research evidence is solid but management doesn’t
want to bother with it. Suppose research shows substantial advantages for considering differences
in test scores throughout the range of a distribution. This might suggest top-down decisions, or
maybe banding or multiple-category expectancy charts, to guide selection decisions, in contrast to
management’s call for cut scores in its quest for speed and simplicity. We must somehow find ways
to minimize such differences in perspective.

HOW CAN WE GET BETTER ASSESSMENT-BASED DECISIONS? WHAT INFORMATION DO DECISION MAKERS NEED WHEN THEY MAKE ASSESSMENT-BASED DECISIONS?

Many chapters emphasize the importance of clearly defined goals and values. It’s good advice but
not easily followed when different people have different views, as they typically do. I wonder what
would happen if the conference method of management were resurrected. The method, used in the
1940s and 1950s, brought together people with different views about something—such as different
ideas on how to use test scores in making selection decisions—with a conference leader trained and
experienced in moving people toward a consensus. Retrieving this old notion might not only reduce
the tensions between selection researchers and decision-makers, but also those among decision-
makers. It might lead over the long term to less confrontational conversations on other issues. It
would be more productive than muttering about getting one’s own way without considering the qual-
ity of other points of view. Remember the Carl Rogers prescription for conflict reduction: Express the
other person’s ideas, in your own words, to his or her satisfaction (Rogers & Roethlisberger, 1952).
Suppose some people supported top-down decisions, and others objected that such a procedure is
too mechanical and leaves no room for discretion. An alternative might be to use relatively narrow
score bands for a “top-band down” policy. Perhaps bands in an expectancy chart could be used, as
Tippins, Papinchock, and Solberg (Chapter 17) said, so that “a person in one range of scores may be
hired without any other education or experience credentials while another person with a score in a
lower range may be hired only if he or she has certain kinds of experience.” (That seems to me to be
a good idea.) Or should the scores simply be given to decision-makers as “one source of job-relevant
information that they use according to their professional judgment” (an alternative often suggested,
although I believe the internal consultant should offer more guidance than that)? Clearly, different
ways to use test scores require different levels of judgment from decision-makers, from no room at
all for judgment (as in strict top-down selection) to a minor role for assessments.
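A hypothetical sketch of the kind of band-plus-credentials rule quoted above from Tippins, Papinchock, and Solberg follows; the band boundaries, the experience requirement, and all names in the code are invented for illustration and are not taken from their chapter.

```python
# An invented example of a "top-band down" rule: the top band is eligible on
# the score alone, a middle band is eligible only with certain experience,
# and the lowest band is screened out.
from dataclasses import dataclass

@dataclass
class Applicant:
    score: float             # selection test score (hypothetical scale)
    years_experience: float  # credential considered for the middle band

def band_decision(applicant: Applicant) -> str:
    if applicant.score >= 85:        # top band: score alone suffices
        return "eligible"
    if applicant.score >= 70:        # middle band: needs the credential
        return "eligible" if applicant.years_experience >= 3 else "not eligible"
    return "not eligible"            # lowest band

print(band_decision(Applicant(score=88, years_experience=0)))  # eligible
print(band_decision(Applicant(score=75, years_experience=5)))  # eligible
print(band_decision(Applicant(score=75, years_experience=1)))  # not eligible
```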

TO JUDGE OR NOT TO JUDGE—THAT IS NOT QUITE THE QUESTION


The dichotomy, to judge or not, is silly. Even selecting top-down from scores can invoke a ques-
tion of the integrity of the higher scores (did this applicant or a surrogate actually take the test?).
Judgments happen, but to what degree should decisions be based on judgments rather than scores?
Managerial discretion is rarely considered in standard validation research, but it is more common
than validation designs acknowledge, even in the newly evolved notions of validity. In many civil service
jurisdictions, something like the “rule of three” sends the three or so highest-scoring candidates to
the decision-making manager, who (probably after interviewing all three) makes an offer to one of
them, often with no requirement to give reasons for the choice. A looser version exists in private busi-
ness; the number considered might be more or less than three, depending on what is in the application
files, but the reasons for the choice of the group to be considered, and for the final selection, had better
be clearly placed in the record. As Tippins et al. (and others) have pointed out, in the United States (at
least) failure to consider the potential for litigation can lead to a lot of avoidable trouble.
However, I’m more concerned with being able to justify (i.e., to validate) the interpretations
made in selecting one candidate instead of another. To whatever extent judgment enters the decision
process, judgment must somehow become a part of the validation exercise. Maybe I’m referring
here to a distinction between validating interpretations of test scores and validating decisions. We
don’t often do the latter, although Kehoe, Mol, and Anderson (Chapter 10) came pretty close with
their suggestion about fitting together the development of a selection system and its delivery. That, I
think, requires the consultant who develops the system to give the decision-maker the information
relevant to the decisions, to help the decision-maker understand and learn how to use the informa-
tion, and then to permit her or him to use some intelligence and accept some responsibility for the
quality of the decisions made. Is that a psychometric heresy?

WHAT IF THE PRACTICE OF PERSONNEL ASSESSMENT BECAME TOTALLY DOMINATED BY A FEW VERY LARGE CONSULTING ORGANIZATIONS?

That is, no individual consultants operating alone, no consultants from test publishing houses (which
might even disappear), no in-house testing specialists. Would assessment tools become exclusively
the proprietary assessment procedures of the large firms? Would their technical merits (or demerits)
be known and understood by client organizations? Would professional organizations or regulatory
agencies insist (successfully) that the proprietary tests meet the same standards of excellence that
researchers (in or out of large research organizations or publishing houses) are expected to meet?
Would the very size of these consulting organizations make possible and facilitate the large-scale
research that could solve many currently intractable psychometric questions? If so, would they fol-
low professional scholarly traditions and share the results or hold them close and proprietary for
the sake of competitive advantage? These questions flowed freely as I read parts of Chapter 17 by
Tippins et al.

WILL LOCAL RESEARCH BE USEFUL FOR PREDICTING PERFORMANCE OR ITS OUTCOMES?


I believe Kehoe et al. (Chapter 10) would say “yes.” They contend that internal selection calls for
different procedures than those designed for hiring from the outside. Much more is known about
internal candidates, and there seems to be no good reason to ignore it. There is a risk of letting
personal relationships or bygone events weigh too heavily in the decision, so this may be an area in
which policy-capturing research could lead to ways to evaluate various kinds of prior information.
Of course, essential skills should be assessed as objectively as possible for internal and external
candidates.
Quite apart from that, the usual alternative is reliance on meta-analyses, and that scares me.
What kinds of meta-analysis can be expected in the future if no new local studies are done? They
would have to be based on old data, and date of research has been reported as a moderator in several
meta-analyses. Another problem is that the new concept of validity requires consideration of more
evidence for evaluating the validities of test score interpretations than validity coefficients alone. Is
meta-analysis alone evidence enough for evaluating the validity of interpretations other than predic-
tive ones? At best I am wary of long-term reliance on meta-analysis without local research. However,
Kehoe and Murphy (Chapter 5) seemed persuaded that meta-analysis, and synthetic validity as well,
do offer generalizations that can be extended to a local situation.

IS EDUCATION REALLY BROADENING?


It has been said that education is broadening. It seems that we have educated ourselves, and it
seems to have led to broader, more general concepts in preference to narrower, more specific ones. I
thought Pat Smith (1985) had settled the issue when she said that if we want to hit the side of a barn,
we use a shotgun, but to hit a bullseye painted on it, we need a rifle. Abandoning metaphors, if we
want to predict broad, general criteria, we use broad, general predictors; to predict a highly specific
criterion, we need measures that focus on highly specific traits. Many chapters seem to settle it all
over again—but differently. Doesn’t the general trend in these chapters treat the question as part of
a more general, evolutionary trend? Work itself seems to be evolving. We used to think of the “work
one does” and the “job one has” as redundant expressions. Now the boundaries of one’s job are per-
meable, so that people shift regularly from one set of tasks to another. In our local hospital kitchen,
I’ve seen chefs washing dishes and floors, and I’ve seen the man hired to do the heavy work making
out the food orders for the coming week. In research and practice, the trend seems to be moving in
the direction of ever broader, more inclusive job descriptions. Where, I wonder, will the broadening
trend, in this and other concepts, take us?

WILL OUR CONSTRUCTS BE BROADER AND MORE GENERAL OR MORE NARROWLY FOCUSED?
The trend, clearly reflected in most of these chapters, is toward broader, less fractionated constructs.
Roughly half a century ago, we sought ever-smaller factors of human attributes; now we seek com-
petencies that are as atheoretical as factor analyzing any matrix of correlations we can put together.
They are, nevertheless, large hunks of what people must be able to do if they are to do their work
well. For example, for sales clerks a necessary competence may require getting things done fast for
customers, although speed and customer satisfaction with the service may not be compatible. Fifty
years ago we wanted factors to be small and orthogonal; now we move toward such inclusive com-
petencies that they overlap conceptually and are therefore correlated.
Even within factor-speak, hierarchical factor structures deny orthogonality. Carroll’s (1993)
comprehensive three-stratum theory of cognitive abilities is invoked in enough of these chapters to
show that leaders in employment testing are really taking it seriously. At the bottom are narrow, cor-
related abilities. The intermediate level consists of broader, more general abilities, each broad enough to
encompass some of the abilities in the lowest stratum but not quite broad enough to encompass them
all. These broader constructs are correlated well enough to allow a single general level of cognitive
ability to emerge. Carroll didn’t specify any particular number of factors at the low or intermediate
levels, but he extended the variety of intermediate factor content far beyond that of earlier hierarchi-
cal models, and his highest level of generality is far broader than the older concept of g.
Ones, Dilchert, Viswesvaran, and Salgado (Chapter 12) are committed to the importance of
general mental ability (a construct probably somewhat less inclusive than Carroll’s third stratum),
but they acknowledged that people at the same level of general mental ability differ in more specific
abilities “due to differential ‘investment’ of their cognitive capacity (guided by other personal char-
acteristics as well as idiosyncratic developmental and educational experiences)” (my italics). I would
add work experience to the idiosyncratic experiences. Among them, this thought may return us to a
narrower focus on kinds or categories, as well as degrees, of individual differences.
Without reliance on factors, it appears that some traits that seem undiluted (homogeneous in mea-
surement) are being combined more often to form compound traits. This was explicit in the chapter
by Hough and Dilchert on personality (Chapter 14), in which such examples as integrity or customer
service orientation were cited. It was at least implicit in discussions of work-related competencies or
cognitive abilities such as creativity. Psychomotor abilities may combine some basic physical traits
(e.g., static strength, dynamic strength, and stamina) in a compound trait simply headed strength.
Compound traits, defined as composites of homogeneous traits that may not be noticeably corre-
lated, may be better predictors than any of their component parts. (But stop to muse: Isn’t that what
we’ve done all along with multiple regression but without the assumption of optimal weights?)
In short, although concepts of predictor traits have become broader, they still include the foun-
dation of narrower, more homogeneous parts. I think Pat Smith had it right. Compared with the
past, the trend leads to a wider range of choices between excessively broad and excessively narrow
options, and we can choose as the problem at hand directs us. That is good. The availability of many
options frees us from the constraints imposed by having just a few, even if it befuddles us.

WHAT HAVE WE LEARNED ABOUT ASSESSMENT AND MAKING SELECTION DECISIONS?


We have learned to identify the attributes to be assessed more clearly through better targeted
analyses of work performed (Pearlman & Sanchez, Chapter 4). We have learned much more (or
maybe just nailed it down better) about forms of evidence needed in evaluating the interpretations
test users make from the scores (Putka & Sackett, Chapter 2; Schmitt et al. Chapter 3; Kehoe &
Murphy, Chapter 5; and several other chapters). That sentence itself shows important new learning
about the precise use of psychometric language; we used to speak ambiguously of evaluating (vali-
dating) tests, but we speak more accurately now with what I’ve been calling the new definition. We
have learned that generalization is a two-way street; not only do we generalize to the wider world
(i.e., population) from local studies, but we take our broad generalizations and turn them into local
generalization (Kehoe & Murphy, Chapter 5). We are learning how to learn about assessment’s
organizational values (Cascio & Fogli, Chapter 11; Ployhart & Weekley, Chapter 9). After reading
the many chapters that refer to generalizability as the way to do a thorough reliability study, more
of us will learn to do those studies. We are in the process of learning how to cope with and use new
electronic means of assessment (Reynolds & Dickter, Chapter 8), and if new electronic tools keep
begetting new ones, we’ll never run out of new things to learn.
Despite all this learning, we have not seemed to learn much about integrating effective assess-
ment with effective decision-making. Too many decision-makers make employment decisions only
now and then, and they find the assessment information unfamiliar and unfathomable. Too many
assessment experts accept the limitations of the occasional makers of selection decisions and try to
make their lives easier (a purposefully ambiguous reference: Whose lives?). Once again, what would
happen if testers made a concerted effort to be sure the decision makers genuinely understood the
implications of score differences? The answer, I suspect, is that decisions would themselves be more
valid because the decision makers will have considered more fully the implications of effective assessment and the
scores it produces.

DOES THE NOTION OF SYSTEMS MOVE US TO BROADER CONCEPTS?


It seems that Gestalt psychology is alive and well, although we now use more syllables, referring to
integrated systems. (Six syllables instead of two shows great progress.) Unfortunately, much of what
I read referring to systems seems pretty superficial, and I wonder if superficiality in the design of a
system may not do more harm than good.
The idea of a multilevel personnel selection system is, so far, only a fascinating set of possibili-
ties. The amount of research needed is a lot greater than the amount of relevant knowledge at hand;
witness the “mays” and the “mights” in Chapter 9 by Ployhart and Weekley. Chapter 10 by Kehoe
et al. gives legs to this musing: “There has been far more speculative approach than clear signs of
having arrived in terms of the focus being upon the generation of theoretical models and conceptual
think-piece papers rather than the publication of robust empirical studies into multilevel selection
efforts.” They say further that it is more likely that decision-makers will act “in a noticeably ad hoc
manner, will give different weights to different within- and cross-level variables based on some
notional ‘rules of thumb’ known only to themselves, and will be prone to a gamut of errors brought
on by information overload, imperfect information processing, and satisficing in their decision-
making strategies.” Wow! I wish I’d said that.
In any case, multilevel selection systems exist mainly in the writing of academic scholars. To
develop practical systems, researchers need to expand their mission to include explicitly the integra-
tion of selection systems in organizational routine. They’ll need the practitioners in those organiza-
tions to help them. They need more than psychometric competence or competence in experimental
design or model building. They need to manage change, at least in the sense of overcoming manage-
rial resistance to change, and maybe managerial resistance to learning challenging new ideas. They
need to practice what we preach about participation and involvement in the development of selection
procedures. All of this means that system development in this context requires becoming experts in
the use of the tools our present expertise has provided and evaluated.

A FINAL MUSING
I had not realized before trying to pin them down how many questions I have about employee selec-
tion that I can’t answer. I’ve offered tentative answers to many of them. Some of these answers may
be based on accumulated experience and reading, but some of them may be based on my own cog-
nitive habits. Many of them I have no answers for at all and, indeed, many of the questions may be
unanswerable. That is for research and practice over the next decade or so to find out.

REFERENCES
Anastasi, A. (1986). Evolving concepts of test validation. Annual Review of Psychology, 37, 1–15.
Bartram, D. (2000). Internet recruitment and selection: Kissing frogs to find princes. International Journal of
Selection and Assessment, 8, 261–274.
Boring, E. G. (1961). The beginning and growth of measurement in psychology. In H. Woolf (Ed.), Quantification:
A history of the meaning of measurement in the natural and social sciences (pp. 108–127). Indianapolis,
IN: Bobbs-Merrill.
Carroll, J. B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York, NY: Cambridge
University Press.
Coombs, C. H., Dawes, R. M., & Tversky, A. (1970). Mathematical psychology: An elementary introduction.
Englewood Cliffs, NJ: Prentice-Hall.
Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer & H. I. Braun (Eds.), Test validity
(pp. 3–17). Hillsdale, NJ: Lawrence Erlbaum Associates.
Guion, R. M. (1980). On trinitarian doctrines of validity. Professional Psychology, 11, 385–398.
Gulliksen, H. (1950). Theory of mental tests. New York, NY: Wiley.
Helmreich, R. L., Sawin, L. L., & Carsrud, A. L. (1986). The honeymoon effect in job performance: Temporal
increases in the predictive power of achievement motivation. Journal of Applied Psychology, 71,
185–188.
Jenkins, J. G. (1946). Validity for what? Journal of Consulting Psychology, 10, 93–98.
Murphy, K. R. (1989). Is the relationship between cognitive ability and job performance stable over time?
Human Performance, 2, 183–200.
Rogers, C. R., & Roethlisberger, F. J. (1952). Barriers and gateways to communication. Harvard Business
Review, 30, 46–52.
Smith, P. C. (1985). Global measures: Do we need them? Division 14 Scientific Contribution Award address,
American Psychological Association, Los Angeles, CA.
Spearman, C. (1927). The abilities of man. New York, NY: Macmillan.
Stanley, J. C. (1971). Reliability. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 356–442).
Washington, DC: American Council on Education.
Tett, R. P., & Burnett, D. D. (2003). A personality trait-based interactionist model of job performance. Journal
of Applied Psychology, 88, 500–517.
Thorndike, R. L. (1949). Personnel selection: Test and measurement techniques. New York, NY: Wiley.
Wernimont, P. F., & Campbell, J. P. (1968). Signs, samples, and criteria. Journal of Applied Psychology, 52,
372–376.
Woolf, H. (Ed.). (1961). Quantification: A history of the meaning of measurement in the natural and social
sciences. Indianapolis, IN: Bobbs-Merrill.
This page intentionally left blank
Author Index
A Angell, L. C., 812, 814
Angelmar, R., 803
Abe, H., 791, 792 Angoff, W. H., 99, 165
Abraham, J. D., 555 Ansbacher, H. L., 931
Abrahams, N., 161 Anselmi, T. V., 913
Abrahansen, M., 934 Anstey, E., 933
Ackerman, L. D., 449, 451, 452 Anttila, H., 932
Ackerman, P. L., 58, 300, 343, 499, 878 Aquino, K., 504
Adams, G. A., 561 Arabian, J., 57, 104
Adams-Webber, J., 449 Arad, S., 239, 402, 463, 537, 555, 833, 897
Adkins, C. L., 322, 810 Archer, D., 346
Adler, N. J., 781 Argote, L., 811
Aguinis, H., 23, 62, 205, 235, 236, 237, 238, 248, 551, 556, Armenakis, A. A., 805
559, 563, 651, 715, 774, 775 Armor, D. J., 689, 690
Ahearne, M., 480, 772 Armstead, P., 383, 385
Aiken, L. S., 10, 37 Arneson, J. J., 264
Aiman-Smith, L., 144 Arnold, J. D., 51
Al-Olayan, F., 131 Arthur, W., Jr., 63, 165, 166, 305, 308, 349, 351, 387,
Albright, L., 304 732, 804, 811
Alexander, R. A., 58, 62, 64, 66, 103, 262, 288, 598, Arvey, R. D., 55, 86, 236, 284, 285, 286, 497, 504,
866, 878 543
Alge, B. J., 469, 471 Aryee, S., 473, 480, 538, 560
Algera, J. A., 909, 918, 934 Asch, B. J., 687
Algina, J., 411 Ash, S. R., 133, 134
Allan, E., 497 Ashe, R. L, Jr., 242
Allen, E., 560 Ashford, S. J., 349
Allen, M. R., 198, 571 Ashforth, B. E., 500
Allen, T. D., 309, 454, 457, 465, 561 Ashkanasy, N. M., 342, 346, 829
Alley, W. E., 684 Ashton, M. C., 303, 495, 505
Alliger, G. M., 55, 66, 305, 826 Ashworth, S., 103, 310, 351, 873
Allworth, E., 771, 772 Ask, T., 284
Almurshidee, K. A., 131 Astin, A. W., 553
Alonso, A., 263 Astrand, P., 279, 280, 281, 282
Altink, W. M., 934 Attal-Toubert, K., 655
Alvarez, S. A., 202 Attenweiler, W. J., 33, 306
Ambrose, M. L., 500 Atwater, L. E., 826, 829
Amburgey, R., 385 Au, W. T., 501
Amelang, M., 351 Aurbach, H. A., 256
Amendola, A. A., 294 Austin, E. J., 344
Amthauer, R., 934 Austin, J. T., 58, 107, 454, 552, 572, 878
Anastasi, A., 947 Avedon, M. J., 422
Anderberg, R., 929 Averill, J. R., 300
Anderson, A., 128 Avery, D. R., 130, 131, 132, 135, 137, 139, 141
Anderson, C. K., 286 Avis, J. M., 37, 311, 768, 772
Anderson, D., 158 Avolio, B. J., 267, 505, 826
Anderson, D. R., 532, 535
Anderson, K. S., 540 B
Anderson, L. E., 86
Anderson, N. R., 73, 103, 140, 141, 197, 213, 215, Baack, D. W., 131
216, 237, 240, 261, 263, 264, 421, 424, Baba, V. V., 536
425, 455, 785, 811, 816, 823, 910, 921, Bachrach, D. G., 454, 457, 465, 480, 526
929, 935 Bagozzi, R. P., 407
Anderson, S. E., 466, 467 Bailyn, L., 561
Andersson, L. M., 490 Baird, B. M., 157
Andiappan, M., 500, 503 Baker, D. P., 804, 807, 809, 811, 815
Andrews, M. C., 205 Baker, E. L., 801

959
960 Author Index

Baker, T. A., 277, 279, 281, 282, 283, 284, 285, 287, 290, Beatty, R. W., 444, 516
293, 294 Beaty, J., 388
Ballinger, G. A., 471 Beaubien, J. M., 335
Balma, M. J., 244 Beaupré, M. G., 346
Balsam, T., 730 Becker, B. E., 88, 92, 197, 198
Baltes, M. M., 555 Becker, D. E., 692
Baltes, P. B., 555 Becker, G. S., 199, 205
Balzer, W. K., 157 Bedwell, S., 347
Banas, J., 479 Beehr, T. A., 791
Bandura, A., 727 Begin, J. P., 788
Bangert-Drowns, R. L., 430 Behan, B. A., 831
Banks, C. G., 516 Behling, O., 236
Banks, D., 476, 860 Belau, L., 813
Banse, R., 346 Bell, B. S., 136, 801, 815
Bar-On, R., 341, 344 Bell, J. F., 27
Baranowski, L. E., 86 Bell, S. T., 165, 804, 812, 813, 814, 815, 816, 817
Baratta, J. E., 775 Bemis, S. E., 598
Barber, A. E., 127, 128, 129, 135, 142, 143, 145 Benbow, C. P., 264
Barchard, K. A., 346, 351 Bendoly, E., 480
Barge, B. N., 304 Bennett, B. E., 587
Baritz, L., 572 Bennett, C. E., 265
Barlett, F. C., 929 Bennett, M., 421, 422, 423, 424
Barling, J., 497, 535, 536, 545, 561 Bennett, R. J., 490, 491, 505, 555
Barlow, C., 829 Bennett, W., Jr., 93, 401, 804
Barnard, C. I., 465 Benson, G. S., 88
Barnes, J. D., 303 Benson, M. J., 311
Barnes, L. K., 770 Bentson, C., 63, 732, 858
Barnett, R. C., 535, 541 Bentz, V. J., 826
Barney, J. B., 198, 201 Bergman, M. E., 473
Barney, M. F., 81, 82, 84, 90, 91, 236, 237, 247 Bernardin, H. J., 444, 446, 447, 514, 515, 521, 788
Baron, H., 207, 256, 305, 731, 787 Bernardy, C. J., 236, 497
Baron, R. A., 498, 499, 500 Bernstein, I. H., 600
Baron, R. M., 59 Bernthal, P. R., 722
Barrett, G. V., 9, 38, 41, 57, 58, 59, 62, 64, 90, 103, 107, Berry, C. M., 236, 260, 264, 493, 494, 495, 497, 501
164, 222, 235, 262, 288, 596, 598, 805, 878 Bertolino, M., 257
Barrick, M. R., 74, 103, 204, 245, 299, 304, 306, 308, 309, Bertua, C., 103, 261, 263, 264, 455, 910
353, 457, 477, 499, 502, 526, 540, 544, 545, 607, Bettencourt, L. A., 473
730, 765, 768, 769, 791, 810, 811, 813, 826, 884, Bettenhausen, K., 480
910, 911, 917, 918 Beutell, N. J., 561
Barron, H., 731 Bevier, C. A., 116, 160, 265, 657, 774
Barry, B., 814 Bhaskar-Shrinivas, P., 791, 792, 793
Barsade, S. G., 341 Bhattacharya, C. B., 772
Barsness, Z. I., 804 Bicksler, B. A., 687, 699
Bartlett, A. L., 813 Biderman, M. D., 348
Bartlett, C. A., 781, 782 Biderman, Y., 536
Bartlett, C. J., 287 Bierman, L., 205
Bartram, D., 57, 79, 104, 115, 119, 177, 300, 310, 731, 934, Bies, R. J., 490, 504
935, 945 Biga, A. M., 180
Bateman, T. S., 465 Bills, M. A., 490
Bauer, D. J., 158 Bilsky, W., 322
Bauer, R., 580 Bilzon, J. L., 279
Bauer, T. N., 132, 141, 144, 163, 421, 422, 423, 424, 768, Bing, M. N., 496
773 Bingham, W. V., 2, 552, 930, 932
Baughman, K., 204, 776 Binning, J. F., 9, 38, 41, 57, 58, 59, 105, 107, 222, 235, 596,
Baum, R. J., 832 805
Baumeister, R. F., 300 Birati, A., 504
Baumgarten, F., 926, 930 Birkeland, S. A., 748
Baumgartner, H., 409, 410 Birkland, A. S., 300
Bax, E. H., 205 Bisqueret, C., 352
Bayless, A., 878 Bizot, E. B., 902
Beach, L. R., 142 Bjork, R. A., 326
Beadle, D., 129 Black, J. S., 791, 792, 794
Bearden, R. M., 303 Blackburn, M. A., 383
Beatty, A. S., 256 Blackman, M. C., 306
Author Index 961

Blair, C. A., 467 Brashers, D. E., 13, 18, 26


Blakely, B. B., 897 Brass, D. J., 469, 471, 478
Blakely, B. R., 281, 284, 285 Braun, H. I., 596
Blanchard, K. H., 826 Braverman, E. P., 347, 731
Blanco, T. A., 691 Bray, D. W., 732, 829, 843, 845, 846, 847, 848, 850, 851,
Blasco, R., 934 852, 854, 855, 856, 860, 862
Blau, P. M., 471 Breaugh, J., 127, 134, 136, 137
Blickensderfer, E., 803, 804 Brennan, R. L., 10, 11, 14, 15, 16, 17, 18, 19, 20, 22, 23, 24,
Bliese, P. D., 36, 202, 354, 465, 469, 470, 472, 775 25, 27, 29, 31, 32, 33, 34, 35, 37, 102
Block, J., 300 Brentz, R. D., 517
Bobko, P., 116, 160, 162, 177, 236, 265, 269, 287, 329, 351, Bretz, R. D., 128, 129, 130, 138, 141
583, 632, 657, 773, 774 Brewster, C., 218, 794
Bochner, S., 793 Bridges, W., 82
Boehm, V. R., 57 Brief, A. P., 465
Boggild, H., 536 Briggs, A., 813
Bolanovich, D. J., 686 Bright, A., 80
Boldt, R. F., 688 Brinley, A., 562
Boles, J. S., 560 Brockner, J., 422
Bolgar, C., 248 Brogden, H. E., 56, 884
Bolino, M. C., 481 Brogden, H. F., 392
Bollen, K. A., 412 Brooks, M. E., 128, 129
Bommer, W. H., 480 Brooks-Shesler, L., 475, 476
Bonache, J., 218 Bross, A., 794
Bonaventura, E., 930 Brown, A., 177
Bono, J. E., 826, 828 Brown, C. A., 490
Boon, C., 88 Brown, C. W., 61
Bordeaux, C., 562 Brown, D. C., 694
Boring, E. G., 949 Brown, D. G., 697
Borkenau, P., 346 Brown, D. J., 130, 133, 134, 135, 772
Borman, W. C., 65, 77, 82, 86, 180, 242, 303, 309, 439, Brown, G. N., 882
440, 444, 445, 446, 447, 448, 449, 450, 451, Brown, K. G., 197, 240, 245, 371
452, 453, 454, 456, 457, 464, 465, 466, 467, Brown, M. E., 501, 826, 903
468, 470, 472, 480, 496, 514, 526, 540, 544, Brown, S. H., 774
551, 553, 554, 555, 742, 827, 863, 877, 887, 897 Brown, S. P., 329
Born, D. H., 679 Brown, T. A., 35
Born, M. P., 425, 791, 929 Brown, T. J., 769
Borneman, M. J., 258 Brown, T. S., 490
Borsboom, D., 9, 10, 11, 13 Brown, W., 11, 12, 23, 24
Borucki, C. C., 917 Brubaker, P. H., 279
Boselie, P., 88 Brubaker, T. L., 545
Bost, J. E., 29 Bruce, N., 827
Boswell, W. R., 197, 203 Bruchon-Schweitzer, M., 934
Botero, I., 479 Bruck, C. S., 561
Bouchard, T. J., 259 Bruhn, M., 765
Boudreau, J. W., 66, 135, 239, 240, 241 Bruk-Lee, V., 491, 498, 499, 500
Bowen, C. C., 303 Bruno, F., 533
Bowen, D. E., 200, 204 Brush, D. H., 448, 449, 450, 451, 452, 453, 897
Bowers, C. A., 803, 807, 809, 812, 814, 815 Brutus, S., 422
Bowler, W. M., 469, 471, 478 Bruursema, K., 491, 496, 498, 501
Bowman, D., 774 Bryant, R. H., 439
Bowman, D. B., 339, 340, 344 Bryden, M., 266
Bownas, D. A., 93, 743, 744, 751 Bryk, A. S., 37
Boyatzis, R. E., 341 Buboltz, C., 903
Boyce, A. S., 140, 267 Buchholz, R. L., 284
Bracken, D., 563 Buck, D. E., 466
Brackett, M. A., 342 Buckley, M. R., 446, 521, 788
Braddy, P. W., 133 Budd, J. W., 504
Bradley, J. C., 354, 775 Buffardi, L. C., 136
Brand, C. R., 259 Bui, T., 824
Brandel, M., 131, 144 Buiten, B., 932
Brandt, C. J., 86 Bunderson, J. S., 813
Brannick, M. T., 74, 77, 82, 84, 115, 116, 119, 162, 494, Burch, G. S. J., 215, 811, 816
748, 894, 897 Burchell, B., 561
Brannik, M. R., 825 Burke, C. S., 477, 806, 809
962 Author Index

Burke, E. F., 159, 261, 378, 380, 456 Carroll, J. B., 154, 258, 259, 922, 955
Burke, M. J., 916, 917 Carroll, J. M., 300
Burke, R. J., 500, 541, 724 Carroll, S. A., 136
Burke, S., 463, 464 Carson, K. P., 66, 910, 917, 918
Burnett, D. D., 89, 310, 946 Carsrud, A. L., 944
Burns, M., 456 Carter, G. W., 308, 771
Burris, E. R., 500, 503 Carter, N., 399
Burroughs, W. A., 775 Carver, C. B., 291
Burse, R. L., 280 Casady, T., 497
Burt, C., 933 Cascio, W. F., 60, 62, 88, 164, 205, 235, 236, 237, 238, 239,
Burton, J., 538 240, 241, 242, 246, 247, 248, 287, 288, 521, 522,
Burton, W. N., 531 551, 553, 556, 559, 563, 596, 651, 715, 745, 775
Busch, C. M., 770 Casella, G., 20
Buster, M. A., 162 Casper, W. J., 136, 562
Butler, R. J., 543 Caspi, A., 309, 497
Butterfield, K. D., 380, 384, 385 Catano, V. M., 266, 351
Button, S. B., 329 Cattell, R. B., 17, 259
Buyse, T., 158, 351, 761, 935 Cellar, D. F., 772
Byham, T. M., 735 Cerone, S., 205
Byham, W. C., 735, 801, 804, 809, 829, 830, 843, 844 Chadwick, C., 198
Byrne, B., 406 Chadwick-Jones, J. K., 490
Byrne, B. M., 406 Chaffin, D. B., 281
Byrne, D. E., 812 Chalmers, C., 204
Chan, D., 55, 154, 157, 160, 257, 260, 300, 302, 306, 321,
C 324, 326, 327, 328, 330, 332, 333, 335, 339, 342,
346, 347, 348, 349, 350, 351, 352, 353, 354, 355,
Cable, D. M., 127, 128, 129, 130, 133, 134, 136, 137, 142, 423, 433, 469, 472, 632, 760, 773, 786, 811
144 Chang, H. H., 156
Cabrera, E. F., 241 Chao, G. R., 304
Cain, P. S., 887 Chapman, D. S., 129, 133, 136, 137, 141, 142, 236
Calder, B. J., 824 Charam, R., 735, 830
Caldwell, C., 584 Charles, N., 143
Caldwell, M. S., 58, 878 Chasteen, C. S., 109
Caligiuri, P., 781, 782, 784, 787, 790, 791, 792, 793, 794 Chatman, J. A., 215
Callahan, C. M., 90 Cheatham, D. W., 516
Callender, J. C., 607 Chemers, M. M., 826
Camara, W. J., 595, 610, 684 Chen, C. B., 134
Camobreco, F. E., 826 Chen, G., 469, 472, 475, 477
Campbell, D. T., 54, 59, 99, 108, 598 Chen, J. J., 481
Campbell, J. P., 23, 58, 65, 66, 67, 160, 258, 269, 305, 311, Chen, P. Y., 500
327, 351, 440, 447, 449, 450, 451, 452, 453, 454, Chen, Z. X., 473, 538
457, 464, 551, 553, 554, 555, 558, 680, 684, 690, Chenoweth, D., 294
721, 728, 824, 865, 877, 878, 880, 881, 882, Chernyshenko, O. S., 158, 180, 300, 303, 304, 658
883, 945 Cherry, B., 182, 186
Campbell, R. J., 846 Cheung, G. W., 406
Campbell, W. J., 157 Cheung, N., 346
Campion, J. E., 137, 404 Childs, R. A., 896
Campion, M. A., 62, 83, 85, 87, 90, 164, 215, 347, 404, Chipman, S. F., 80
421, 422, 423, 424, 473, 731, 803, 805, 806, Chleusebairgue, A., 923
809, 814, 817, 828, 829, 883, 904 Chmiel, N., 921
Canavan, F., 539 Choi, S. W., 155
Cannon-Bowers, J. A., 456, 801, 802, 803, 804, 809, 815, Christal, R. E., 79, 299
910 Christensen, F. G. W., 256
Canter, M. B., 587 Christiansen, N. D., 109, 303, 308, 791, 827
Caplinger, J. A., 54, 607 Christie, R., 539
Cappelli, P., 182, 198 Chuah, S. C., 347
Cardy, R. L., 514 Chuang, A., 768
Carey, N. B., 690 Church, A. H., 563, 791, 792, 829
Carless, S. A., 136 Cizek, G. J., 63, 377, 380
Carlson, K. D., 135, 729 Clark, J. P., 490
Carr, L. S., 90, 239 Clark, K. E., 829
Carr, W. K., 688 Clark, M. B., 829
Carretta, T., 162 Clarke, S., 543, 544
Carretta, T. R., 259, 270, 545 Claudy, J. G., 690
Author Index 963

Clause, C. S., 760, 761 Cronshaw, S. F., 66, 305


Cleary, T. A., 287 Crookenden, M. P., 691
Cleveland, J. N., 107, 514, 551, 553, 556, 560, 561 Cropanzano, R., 55, 471, 501
Coates, J. F., 82 Cross, T., 473
Cober, A. B., 133 Crosson, J. S., 690
Cober, R. T., 130, 133, 134, 135 Crouter, A. C., 903
Cohen, J., 59, 162, 241, 265 Crump, C. E., 279, 280, 282, 286, 290, 294
Cohen-Charash, Y., 493, 494, 496, 501 Cudeck, R., 258, 883
Colakoglu, S., 782 Cui, G., 792, 793
Colbert, A. E., 197, 240, 245, 371, 477, 502, 810 Cullen, J. C., 479
Colbert, B. A., 197 Cunningham, J. W., 82, 90
Cole, E. J., 825 Cunningham-Snell, N., 141
Colella, A., 329, 551 Cureton, E. E., 38
Coleman, V. I., 465, 467 Curran, L. T., 690
Collins, B. J., 480 Curry, J. E., 281
Collins, C. J., 127, 130, 196
Collins, J., 832 D
Collins, J. J., 531
Collins, J. M., 260, 490 Dahl, H. A., 279
Collins, K. M., 560 Dalal, R. S., 469, 471, 493, 494, 498, 501
Colquitt, J. A., 202, 464, 469, 470, 473, 495, 501, 537 Dalessio, A. T., 771, 775, 829
Combs, J., 198, 571 Dalldorf, M. R., 690
Concelman, J., 735 Dalton, M., 791, 829
Conger, J. A., 829 Damos, D. L., 927
Congleton, J. J., 294 Dana, J., 391, 393
Conlon, D. E., 135, 495, 501 Daniel, M. H., 259
Connelly, B. S., 258, 300, 304 Daniels, D., 321, 471
Connerley, M. L., 135, 141, 236, 497 Daniels, E., 472
Connolly, J. J., 304 Dany, F., 934
Conte, J. M., 82, 288, 710, 771 Darandari, E., 31
Converse, P. D., 902 Darby, M. M., 684
Conway, J. M., 33, 305, 347, 349, 480, 809, 860 Darley, J. G., 889
Conway, M., 824 Darling, R. W., 162
Conway, N., 88 Daus, C. S., 342, 828
Cook, T. D., 54, 59, 108 Davidshofer, C. O., 105, 107, 367
Cool, K., 199 Davidson, S. L., 539
Coombs, C. H., 950 Davies, M., 300, 342
Cooper, C. L., 560, 563, 724 Davies, S. E., 80, 115, 236, 300, 607, 731
Cooper, J., 533 Davis, D. D., 785, 786
Cooper, L. A., 305, 827 Davis, J., 388
Cooper, M. L., 540, 560 Davis, J. E., 59
Cordes, C. L., 540 Davis, P. O., 282
Cortina, J. M., 60, 301, 306, 399, 463, 469, 473, 475, 770, Davis, R. D., 263, 496
775 Davison, H. K., 306, 916, 917
Cosentino, C. J., 721, 735 Dawes, R. H., 950
Costa, P. T., 300, 306, 540, 791, 814, 826 Dawes, R. M., 64, 391
Costanzo, M., 346 Dawis, R. V., 80, 330, 896
Côté, S., 353 Day, A. L., 351
Coward, W. M., 62, 264, 311 Day, D. V., 138, 183, 257, 311, 422, 710, 771, 773, 784, 787,
Cox, J., 832 788, 791, 792, 823, 834
Cox, T. H., 812 Day, E. A., 63, 109, 308, 349, 811, 812, 813
Cozzarelli, C., 540 Day, J. D., 343
Craciunescu, R., 933 De Clercq, S., 657
Craig, B. N., 294 De Corte, W., 90, 161, 164, 265, 269, 580, 631, 715, 774,
Craig, S. B., 413, 829 935
Craik, K. J., 929 De Dreu, C. K., 814
Cran, D. J., 770 de Fruyt, F., 103, 261, 263, 264, 910
Cranny, C. J., 103 De Hoogh, A. H. B., 478
Cravens, D. W., 774 de Jong, M., 657
Crawford, M. S., 281 de Luque, M. S., 833
Crocker, L. M., 99, 411 De Meuse, K. P., 809
Cron, W. L., 329 De Pater, I. E., 791
Cronbach, L. J., 12, 13, 14, 16, 17, 21, 22, 23, 24, 25, 26, 27, de Pontbraind, R., 521
31, 32, 36, 37, 38, 101, 102, 157, 245, 596, 948 De Raad, B., 342
de Vries, R. E., 495 Dixon, G. R., 801, 809
De Wolff, C. J., 934 Do, B.-R., 156, 388
Deagle, E., 475 Dobson, P., 89
DeArmond, S., 468, 470 Dolen, M. R., 421
Deary, I. J., 255, 268 Donahue, L. M., 812, 813, 815
Debrah, Y. A., 473, 538 Donnellan, M. B., 157
DeCorte, W., 381 Donoghue, J. R., 180
DeCotiis, T., 445 Donovan, D. T., 769
D’Egidio, E. L., 109, 607, 897 Donovan, J. J., 309, 457, 607
DeGrendel, D. J. D., 772 Donovan, M. A., 239, 402, 463, 473, 537, 555, 833
DeGroot, T., 310, 473 Donsbach, J., 900
DeJoy, D. M., 537 Dorans, N. J., 155
DeKoekkoek, P. D., 555 Dorfman, P. W., 833
Delaney-Klinger, K., 90 Dorio, J., 439
Delany, T., 127 Dorsey, D. W., 454, 463, 464, 472, 526
Delbridge, K., 760 Dotson, C. O., 282
Delery, J. E., 197, 205 Doty, D. H., 197
Deller, J., 302 Dougherty, T. W., 137, 540
DelVecchio, D., 133 Douglas, C., 343
Demaree, R. G., 909, 915 Douglas, E. F., 748
deMille, R., 346 Douglas, S. C., 496
Den Hartog, D. N., 478, 823, 833 Doverspike, D., 165
DeNisi, A. S., 128, 197, 424, 556, 558, 829 Dowling, P. J., 220
Denney, R., 845 Doz, Y. L., 781, 782
Denning, D. L., 705 Dragesund, T., 284
Derous, E., 425 Drasgow, F., 156, 158, 176, 177, 178, 180, 260, 270, 303,
DeRue, D. S., 810 388, 404, 429, 446, 466, 473, 692, 697
Derwall, J., 580 Driskell, J. E., 806, 807
DeShon, R. P., 11, 13, 17, 20, 22, 23, 26, 27, 33, 36, 53, 66, Driskill, W. E., 77, 94
101, 102 Droege, R. C., 78, 889, 890, 891
Desmarais, S., 538 Drollinger, S., 53
Dess, G. G., 205 Drotter, S., 735, 830
Detert, J. R., 500, 503 Druckman, D., 326
Detterman, D. K., 259 Druskat, V., 341, 350
DeVader, C. L., 826 Druskat, V. U., 807
Devine, D. J., 803, 813, 814 DuBois, C. L. Z., 465
DeVore, C. J., 490, 491, 494 DuBois, P. H., 705
DeVries, D. L., 441 Dudley, N. M., 301, 309, 469, 473, 475, 770
DeWall, C., 300 Duehr, E. E., 301
Dewe, P., 88 Duell, B., 471
DeWitte, K., 425 Duffy, M. K., 199, 490, 538, 807
Di Paolo, N. T., 111, 159, 230 Dukerich, J. M., 834
Diamante, T., 829 Dunbar, S. B., 33, 690
Diaz, T. E., 21, 684 Duncan, D., 133
Dickson, M. W., 833 Duncan, D. C., 923
Dickter, D. N., 171, 186 Dunford, B. D., 195, 198
Diedrich, F. J., 477 Dunlap, W. P., 60, 916, 917
Diefendorff, J. M., 413 Dunlop, P. D., 502, 503
Diehl, M., 347 Dunn, L., 730
Dierdorff, E. C., 81, 480, 904 Dunnette, M. D., 23, 58, 300, 308, 440, 499, 514, 558, 769,
Dierickx, I., 199 824, 894
Dietz, G., 88 Dupre, K. E., 561
Digman, J., 791 Duxbury, L. E., 561
Digman, J. M., 300 Dweck, C. S., 328
Dilchert, S., 57, 255, 260, 261, 262, 263, 267, 269, 299, Dwight, S. A., 179
300, 302, 308, 310, 340, 496, 825, 827 Dye, D., 894
DiMaggio, P. J., 214 Dyke, R. B., 325
DiMatteo, M. R., 346 Dzieweczynski, J. L., 308
Dimitrov, D. M., 101
Dineen, B. R., 133, 134, 135, 469, 473 E
Dionne, S. D., 826
DiPaolo, N. T., 586, 761 Eagle, B. W., 560
Dipboye, R. L., 521, 829, 831, 832, 833 Earles, J. A., 103, 162, 456, 690
Dirks, K. T., 480 Earley, P. C., 803
Eastabrook, J. M., 351 Feltham, R., 933
Eaton, N. K., 300, 499 Fennema, E., 266
Ebel, R. L., 26 Fenwick, J. W., 284
Eby, L. T., 561 Fenwick, R., 535
Eddy, E. R., 804 Ferdig, R. E., 430
Eden, D., 813 Ferrara, P., 90
Edelstein, L., 580 Ferrari, C. A., 927, 928
Edens, P. S., 63, 308, 349 Ferrieux, D., 934
Edwards, B. D., 804, 811 Ferrin, D. L., 480
Edwards, J. R., 63, 166, 407 Ferris, G. R., 343, 345, 353, 539, 787
Eggebeen, S. L., 832 Ferzandi, L. A., 791
Ehrhart, M. G., 307, 776, 833 Festinger, L., 394
Ehrlich, S. B., 834 Fetter, R., 480
Eid, M., 33 Fidell, L. S., 287
Eidson, C. E., Jr., 33, 37, 306 Fiedler, F. E., 827, 832
Einarsen, S., 490 Field, H. S., 444
Eitelberg, M. J., 686, 688, 694 Field, K. A., 902
Ellingson, J. E., 61, 160, 162, 269, 344, 608, 631, 773 Fine, S. A., 77, 78, 87, 156, 388, 890
Ellington, J. K., 904 Finnegan, E. B., 347, 731
Elliot, A. J., 329 Fischer, D. L., 82, 801
Elliott, E. S., 328 Fisher, A., 245
Ellis, A. P. J., 807 Fisher, C. D., 910
Ellis, B. B., 405 Fisher, R. A., 11, 17, 26
Elster, R. E., 684 Fiske, D. W., 58, 59, 99, 598
Embretson, S. E., 180, 697 Fitts, P. M., 929, 931
England, G. W., 322, 786 Fitzpatrick, A. R., 14
Englehard, G., 37 Flanagan, J. C., 78, 442, 447, 552, 698
Englert, P., 658 Fleetwood, S., 143
Enns, J. R., 505 Fleishman, E. A., 77, 79, 86, 278, 280, 281, 286, 451, 452,
Entin, E. E., 477 453, 871, 887, 894, 895
Epstein, L., 88 Fleiss, J. L., 21, 23, 26, 286
Epstein, M. W., 909, 910 Fletcher, C., 421, 425, 584
Erez, A., 464, 467, 473, 537 Fletcher, P. A. K., 173
Erez, M., 786 Fletcher, R., 131
Eriksen, E. C., 927 Flinders, K., 134
Erisman, T., 929 Florey, A. T., 813
Erker, S. C., 721, 722, 827 Flyer, E. S., 684
Erwin, F., 729 Fogli, L., 82, 235, 237, 246, 247, 465
Erwin, P., 543 Fokkema, S. D., 932
Evers, A., 657, 823, 935 Foldes, H. J., 301, 310
Eyde, L. D., 588 Folger, R., 422, 490, 491, 504
Eyring, A. R., 137 Fontaine, J. R. J., 657
Ford, J. K., 471, 771
F Ford, M. E., 343
Forret, M. L., 130
Fabrigar, L. R., 407 Fortunato, V. J., 768
Facteau, J. D., 413 Foster, D. F., 261, 379
Faley, R. H., 99, 101 Foster, M. R., 53
Fallon, J., 388 Foti, R. J., 826
Fan, X., 156 Foulke, J. A., 281
Faraj, S., 832 Foushee, H. C., 807
Farmer, W., 303 Fowlkes, J. E., 809
Farr, J. L., 1, 329, 442, 445, 761, 792 Fox, H. C., 255
Farrell, S., 771 Fox, K. E., 342
Faterson, H. F., 325 Fox, S., 135, 347, 489, 490, 491, 492, 496, 498, 499, 500, 502
Faulkner, C., 385 Frame, J. H., 641
Featherman, D. L., 555 Franke, W., 705
Fedak, G. E., 691 Frankel, M. S., 589
Feigelson, M. E., 179 Franklyn-Stokes, A., 383, 385
Feild, H. S., 74, 805 Frase, M. J., 236
Feingold, A., 731 Fraser, S. L., 86
Feldman, D. C., 84, 514 Frederiksen, N., 89, 846
Feldt, L. S., 10, 11, 15, 16, 17, 19, 22, 24, 29, 31 Fredland, J. E., 687
Felstead, A., 561 Freeman, R. E., 572, 575
Frei, R. L., 765, 770 Giacalone, R. A., 490
Freyd, M., 2 Gialluca, K. A., 86
Friede, A. J., 154 Gibb, A., 141
Friedel, L. A., 747 Gibbons, P., 731
Friedman, D. H., 690 Gibby, R. E., 140, 180, 261, 267
Frisbie, D. A., 28, 33 Gibson, C. B., 88, 814
Frisch, M. H., 825 Gibson, W. M., 54, 155, 607
Fritzsche, B. A., 190 Giese, F., 929
Frone, M. R., 543, 538, 544, 560 Giles, W. F., 805
Fullerton, H. N., Jr., 812 Gillespie, M. A., 902
Funder, D. C., 306, 346 Gilley, K. M., 792
Funke, U., 935 Gilliam, T., 282, 294
Furnham, A., 300, 302, 309, 310, 341, 344, 345, 793 Gilliland, S. W., 182, 186, 306, 371, 404, 422, 423, 773,
Futrell, D., 809 787
Gilmore, D., 162
G Gilroy, C. L., 687, 699
Giluk, T. L., 240
Gael, S., 77 Gladstein, D. L., 801
Gaertner, S., 304 Glaze, R. M., 387
Gailliot, M., 300 Glazer, N., 845
Gaines, W. G., 294 Glebbeek, A. C., 205
Gainey, R. R., 503 Gledhill, N., 279
Galante, S. P., 198 Glendon, A. I., 542
Gallagher, D. G., 535 Gleser, G. C., 13, 14, 101, 596
Gallo, A., 129 Glickman, A., 805
Galperin, B. L., 500 Glomb, T. M., 469, 474, 903
Ganster, D. C., 204, 490 Godshalk, V. M., 561
Ganzach, Y., 392 Goetzel, R. Z., 532, 535
Garcia, M., 445 Goff, M., 13
Gardner, R. W., 326 Goff, S., 262
Gardner, T. M., 198, 571 Goffin, R. D., 301, 309, 494
Gardner, W., 826 Goh, A., 491, 496, 498, 500, 502
Gardner, W. L., 245 Gohm, C. L., 342
Gasser, M. B., 65, 453 Goiffin, R. D., 86
Gatchel, R., 284 Goldberg, L. R., 299, 300, 309, 448, 791
Gates, T. S., 687 Golding, L. A., 282
Gatewood, R. D., 74, 77, 82, 86, 444 Goldman, B. M., 55
Gaugler, B. B., 63, 419, 732, 858 Goldman, D., 730
Gavin, J., 803 Goldman, N., 688
Gavin, J. H., 813 Goldman, R. F., 280
Gebbia, M. I., 730 Goldman, S., 344
Gebhardt, D. L., 277, 278, 279, 280, 281, 282, 283, 284, Goldstein, H. W., 117, 732
285, 286, 287, 288, 290, 293, 294 Goldstein, I. L., 86, 90, 106, 161, 164, 287, 365
Geher, G., 346 Goldstein, N. B., 306
Geimer, J. L., 697 Goleman, D., 341, 344, 825, 833
Geisinger, K. F., 709 Goltsi, V., 421, 425
Geisser, M. E., 284 Goodenough, D. R., 325
Gellermann, W., 589 Gooding, R. Z., 65, 103, 304, 690, 866
Gemelli, A., 928 Goodman, J. S., 913
Gentry, W. A., 53 Goodman, W. B., 903
George, C. E., 177, 178 Goodstone, M. S., 829
George, J. M., 480 Goodwin, V. L., 833
Georgi, D., 765 Gootman, J. A., 552
Gerbing, D. W., 407 Gottfredson, L. S., 104, 255, 258, 260, 265, 340
Gerhardt, M. W., 826 Gottschalk, R. J., 745, 746
Gerhart, B., 130, 196, 197, 198, 203 Gough, H. G., 300, 305
Gerlach, M. L., 441 Gowing, M. K., 80, 82, 348, 404
Germain, J., 932 Grafton, F. C., 688, 689
Gerson, K., 541 Graham, J. M., 101
Gewin, A. G., 33, 349 Graham, J. R., 705
Geys, H., 37 Graham, J. W., 465
Ghiselli, E. E., 29, 51, 61, 264, 304, 447, 621, 745, 866, Grandey, A. A., 501, 538
910, 913 Grant, D. L., 846, 847, 848, 850
Ghoshal, S., 199, 781, 782 Grant, R. M., 199
Grauer, E., 388 Haertel, E. H., 9, 10, 11, 15, 22, 23, 24, 27, 32, 35
Graves, J. P., 525 Hagan, C. M., 514
Graves, L. M., 541 Haggerty, J. J., 196, 198, 203
Grayson, D., 33, 34, 35 Haig, A. J., 284
Green, B. F., 161, 286, 586, 689, 690, 693, 883 Haig, B. D., 407
Green, F., 561 Hakel, M. D., 422, 775
Green, S. B., 30, 33 Hakstian, A. R., 771, 772
Greenberg, I. M., 688 Haladyna, T., 164
Greenberg, J., 422, 490, 521 Halbesleben, J. R. B., 478
Greenberg, L., 497 Hale, A. R., 542
Greenberger, D. B., 538 Halfhill, T. R., 816
Greenhaus, J. H., 560, 563 Hall, A., 198, 571
Gregersen, H. B., 791, 794 Hall, J. A., 346
Gregory, D., 80 Hall, R. J., 474, 572, 575
Gregory, R. J., 705 Hallier, J., 135
Gregory, S., 771 Halpert, J. A., 111, 159, 230, 586, 761
Greguras, G. J., 27, 260, 413, 422 Halverson, S. K., 826, 829, 832
Greuter, M. A., 934 Hamel, G., 201
Griffeth, R. W., 304 Hamilton, J. G., 407
Griffin, M. A., 463, 464, 465, 477, 531, 544, 557, 558 Hammer, L. B., 560, 768
Griffin, R. W., 490 Hammer, M. R., 792
Griffin-Blake, C. S., 537 Han, J., 130
Griffith, J. E., 688 Hanges, P. J., 226
Grigsby, T. D., 560 Hanisch, K. A., 400
Groenvynck, H., 657 Hannan, R., 287
Grossman, N., 533 Hannum, K., 829
Grossnickle, W. F., 133 Hansen, C. P., 544
Grove, W. M., 391, 392, 393 Hansen, F., 128, 143
Groves, K. S., 832 Hanson, M. A., 86, 103, 310, 351, 450, 454, 466, 873, 897
Grubb, W. L., 260, 306, 342, 348, 772 Hanson, R. A., 522
Gruen, T., 772 Hanson, T., 811
Grundlach, H., 923, 925, 927, 932 Hansson, R. O., 555
Gruys, M. L., 264, 491, 500, 505, 584 Harackiewicz, J. M., 329
Guastello, S. J., 541, 543 Harms, P. D., 497
Gudykunst, W. B., 792 Harold, C. M., 305, 421, 422, 424
Guenole, N., 658 Harpaz, I., 786
Guenster, N., 580 Harris, C., 143
Guest, D., 88 Harris, D. A., 693
Gueutal, H. G., 191 Harris, H., 218, 794
Guilford, J. P., 346 Harris, J. H., 684
Guion, R. M., 1, 23, 38, 58, 60, 80, 99, 101, 103, 105, 106, Harris, L. C., 503
107, 162, 200, 280, 311, 367, 439, 441, 514, 651, Harris, M. M., 133, 141, 352, 423, 935
716, 721, 922, 943, 945 Harris, P. M., 497
Gulliksen, H., 12, 23, 951 Harris, R., 710, 771
Gully, S. M., 471, 787 Harrison, D. A., 132, 135, 791, 792, 803, 811, 813,
Gundlach, M. J., 496 814, 826
Gupta, N., 205 Hartel, C. E., 346
Gupta, V., 226 Harter, J. K., 477, 502
Gust, J., 388 Hartigan, J. A., 244, 607, 690
Gustafsson, J. E., 28, 29, 35, 259 Hartman, M. S., 143
Gutek, B. A., 55, 560 Hartman, N. S., 260, 306, 348, 772
Guterman, H. A., 310 Harvey, O. J., 326, 327
Guthrie, J., 88 Harvey, R. J., 75, 87, 902
Gutman, A., 293, 627, 628, 644 Hastie, R., 64
Guttman, L. A., 24 Hattori, Y., 284
Gwinner, K. P., 473 Hattrup, K., 639
Haugland, S. N., 306
H Hausdorf, P. A., 133
Hause, E. L., 129
Haar, J. M., 236 Hausknecht, J. P., 111, 116, 138, 141, 159, 183, 230, 257,
Haber, S., 575 422, 586, 761, 762, 765, 773
Hackett, R. D., 473 Hawk, J., 264
Hackman, J. R., 79, 88, 473, 801, 804, 811 Hawkins, K., 532
Haddleton, E., 141 Hazard, R. G., 284
Hazer, J. T., 392 Hogan, J. C., 55, 80, 115, 236, 277, 278, 280, 281, 282, 285,
Heckman, R. J., 135 Hogan, M. J., 351
Hedge, J. W., 454, 464, 742, 863 Hogan, P. F., 693
Hedlund, J., 215, 339, 349, 813 Hogan, R. T., 80, 115, 236, 300, 308, 343, 490, 607, 731,
Heffner, T. S., 340 770, 771, 827
Heggestad, E. D., 111, 116, 159, 180, 230, 303, 499, 684 Holiday-Wayne, J., 137
Heilman, M. E., 481 Holladay, C. L., 832
Heisey, J. G., 689 Holland, B., 300, 309, 769
Helbing, J. C., 932 Holland, J. L., 79, 330, 457, 458, 871, 895
Held, J. D., 161, 691 Holland, P. W., 155
Heli, W., 204 Hollander, E., 727
Heller, D., 307 Hollenbeck, G. P., 823, 832, 833, 834
Heller, W. J., 926 Hollenbeck, J. R., 215, 810, 811, 813, 814, 815
Hellervik, L., 162 Hollinger, R. C., 490, 496, 503
Hellesoy, O. H., 490 Hollwitz, J. C., 494
Helmreich, R. L., 807 Holmgren, R. L., 690
Hendrickson, C. L., 130 Holtz, B. C., 162, 177, 265, 269, 715, 771, 774
Heneghan, J., 475 Hom, P. W., 304
Heneman, H. G., 127, 445 Homan, A. C., 814
Henle, C. A., 23 Hoobler, J., 538
Hennig-Thurau, T., 770 Hooijberg, R., 833
Henry, R. A., 58, 878, 881, 929 Hooper, G. S., 346
Hepburn, C. G., 561 Hoover, H. D., 155
Hepworth, W., 496, 501 Hopkins, P., 929
Hermeking, M., 131 Horn, J., 494
Hermelin, E., 913 Horn, J. L., 407
Hernandez, M., 132 Horowitz, I. B., 812, 813, 814, 815
Herrin, G. D., 281 Horowitz, S. K., 812, 813, 814, 815
Herriot, P., 204, 215, 424, 934, 935 Horvath, J. A., 339, 402
Herrnstein, R. J., 104 Horvath, M., 786
Herscovitch, L., 498 Hough, L. M., 103, 265, 269, 299, 300, 301, 302, 304,
Hersey, P., 832 305, 308, 309, 310, 351, 393, 499, 527, 714,
Hershberger, S. L., 30 729, 730, 731, 732, 747, 748, 769, 773, 774,
Hershcovis, M. S., 493, 494, 496, 497, 498, 500, 501 827, 873, 936
Herst, D. E., 561 House, R. J., 816, 832, 833
Hesketh, B., 464, 555, 771, 772 Houston, J. S., 303
Heslin, R., 801 Howard, A., 190, 721, 722, 729, 730, 731, 732, 827, 828,
Hess, R., 164 829, 831, 835, 845, 846, 847, 848, 850, 851, 852,
Hess, U., 346 854, 856, 858, 860, 861, 862
Hetter, R. D., 697 Howe, V., 128
Hewer, A., 300 Howell, W. C., 831
Hezlett, S. A., 33, 255 Howes, J. C., 501
Hickman, M., 496 Hox, J. J., 37
Hicks, L. E., 302 Hoyt, C., 24
Higgins, C. A., 540, 561 Hoyt, C. L., 826
Higgs, A. C., 90, 93, 239, 249, 809 Hu, C., 134
Highhouse, S., 128, 129, 381, 391, 392, 931 Hu, L. T., 809
Hillgren, J. S., 516 Hubbard, M., 899
Hirschfeld, R. R., 805, 806, 817 Huff, J. W., 816
Hitt, M. A., 197, 205 Huff, K., 378
Ho, J. M. C., 501 Huffcutt, A. I., 104, 305, 347, 446, 447, 809
Hochwarter, W. A., 343, 540 Hughes, D., 378, 380, 383, 384, 385
Hocking, R. R., 33, 35 Hughes, J. M. C., 384
Hodgdon, J. A., 287 Hui, C., 454
Hoepener, R., 342 Huiras, J., 497
Hoffman, B. J., 476 Hulin, C. L., 58, 878, 881
Hoffman, C. C., 109, 607, 732 Hull, C. L., 51, 255
Hoffman, R. G., 57, 104 Hülsheger, U. R., 263, 455, 921, 935
Hofmann, D. A., 329, 542, 545, 558, 775 Humphrey, S. E., 814
Hofstede, G., 786 Humphreys, L. G., 265, 878
Hofstee, W. K. B., 932 Hungerford, M. K., 830
Hogan, J. B., 305, 772 Hunt, D. E., 326
Hunt, S. T., 303, 447, 451, 452, 453 Jehn, K. A., 813
Hunter, A. E., 305 Jenkins, J. G., 946
Hunter, D. R., 456 Jennings, D., 55, 160, 632, 760
Hunter, J. E., 11, 15, 22, 24, 36, 51, 56, 57, 61, 64, 65, 66, Jensen, A. R., 104, 348, 456
75, 103, 104, 119, 163, 201, 236, 241, 245, 255, Jex, S. M., 903
258, 261, 262, 268, 269, 304, 307, 311, 371, 455, Jingo, M., 400
456, 458, 472, 499, 516, 519, 526, 528, 596, 598, Joensen, J., 284
599, 607, 621, 690, 722, 729, 730, 732, 733, 745, John, O. P., 540, 791
746, 747, 771, 772, 866, 909, 910, 911, 912, Johns, G., 89, 391, 392, 393, 491
913, 914, 918 Johnson, C. D., 884
Hunter, R. F., 258, 261, 268, 304, 455, 596, 607, 621, 690, Johnson, D. E., 467
729, 730, 747, 772, 866, 910, 911, 913 Johnson, D. J., 809, 810, 816
Hunthausen, J. M., 768, 769 Johnson, E. C., 134, 165, 502, 809
Hurley, A. E., 407, 917 Johnson, H. M., 346, 348
Hurley, R. F., 768 Johnson, J. L., 199
Hurtz, G. M., 309, 457, 607 Johnson, J. W., 151, 161, 162, 463, 464, 465, 553, 555, 607
Huselid, M. A., 88, 151, 196, 197, 198, 765 Johnson, M., 815
Huteau, M., 935 Johnson, M. S., 37
Hutnik, N., 793 Johnson, S. K., 832, 833
Hutton, J., 882 Johnson, W., 259
Hyde, J. S., 266 Johnston, J. H., 802
Hyland, M., 794 Johnston, W. B., 82
Joireman, J., 300, 468, 471
I Jones, C., 157, 181, 771, 772, 775
Jones, D. A., 136
Icenogle, M. L., 560 Jones, D. P., 745, 746
Ilgen, D. R., 65, 215, 463, 537, 813, 814, 815 Jones, J. R., 204
Ilies, R., 26, 307, 469, 473, 475, 477, 479, 502, 826 Jones, J. W., 490
Ingerick, M. J., 37, 463, 684 Jones, K., 343
Inness, M., 535 Jones, R. G., 82, 801, 802, 803, 805, 817
Irwin, J. L., 180 Jones, S. E., 587
Irwing, P., 266 Jordan, M., 204
Ispas, D., 141 Jordan, M. H., 805
Iverson, R., 543 Jordan, P. J., 341, 346, 350
Joreskog, K. G., 15, 16, 28, 29, 32, 404, 405, 406
J Joshi, A., 794
Judd, C. M., 25, 33
Jackson, A. S., 281, 287, 288 Judge, T. A., 33, 128, 129, 134, 137, 138, 141, 240, 257,
Jackson, C. L., 468, 470 300, 307, 309, 457, 467, 468, 470, 472, 475, 496,
Jackson, D. N., 166, 300, 301, 303, 457, 827 501, 540, 607, 787, 826, 827, 828
Jackson, P. R., 805 Judiesch, M. K., 371, 722
Jackson, S., 13, 18, 26 Jundt, D., 815
Jackson, S. E., 151, 197, 198, 811 Jung, C., 326
Jacobs, J. A., 541 Junker, B. W., 37
Jacobs, R. R., 57, 104, 115, 119, 542, 705, 710, 715, 771,
772, 775, 792 K
Jaeger, A. J., 351
Jaeger, R. M., 688 Kabin, M. B., 61, 162, 269, 344, 608
Jago, I. A., 281 Kacmar, K. M., 205
Jamal, M., 536 Kahn, R. L., 537, 726
James, L. R., 909, 910, 911, 912, 913, 915, 916, 917, 918 Kahneman, D., 264
Jamieson, B. D., 520 Kaiser, R. B., 731, 827
Jamnik, V. K., 279 Kamata, A., 31
Janovics, J., 388 Kamdar, D., 468, 470, 471, 477, 478
Jansen, K. J., 810 Kammeyer-Mueller, J., 104, 144, 903
Jansen, P. G. W., 354, 909, 934 Kamp, J. D., 300, 499
Janz, T., 162 Kane, H. D., 259
Jaramillo, F., 771 Kane, J. S., 514, 521
Jarratt, J., 82 Kane, M. T., 38
Jarvis, C. B., 118 Kanfer, R., 84, 300, 343, 440, 472
Javidan, M., 833 Kanter, R. M., 539, 673
Jayne, M. E. A., 239, 242 Kantor, J., 581
Jeanneret, P. R., 63, 77, 79, 447, 450, 451, 452, 453, 584, Kantrowitz, T. M., 84
589, 593, 828, 887, 888, 894, 897 Kaplan, L. B., 884
Karambayya, R., 480 Klayman, N., 392
Karande, K., 131 Klehe, U. C., 213
Karasek, R. A., 537, 556 Kleiman, L. S., 99, 101
Karp, S. A., 325 Klein, G. D., 804
Karpinos, B. D., 686 Klein, K. J., 202, 203, 811, 833
Karwowski, W., 282 Kleinmuntz, B., 391, 393, 394
Kashy, D. A., 33, 304 Klepa, L., 560
Kastello, G. M., 279 Klimoski, R., 802, 805, 817
Katch, F. I., 279 Kline, R. B., 407
Katch, V. L., 279 Kluemper, D., 310
Katkovsky, W., 850, 861 Kluger, A. N., 256, 392, 397, 421, 422, 424, 829
Katz, A., 322 Knapik, J. J., 290, 294
Katz, D., 465, 726 Knapp, D. J., 66, 67, 680, 865, 880, 882, 883
Katzell, R. A., 572 Knorz, C., 490
Kavanagh, E. J., 304 Knutsson, A., 536
Kayes, D. C., 807 Kochhar, R., 205
Kazama, S. K., 832 Koedijk, K., 580
Keashly, L., 500 Kogan, N., 326
Keating, D. P., 342, 343 Kohlberg, L., 300
Keck, S., 803 Kolen, M. J., 155
Keegan, A. E., 478 Komaroff, E., 17, 29, 30, 31
Keenan, P. A., 429 Konieczny, F. B., 882
Keeping, L. M., 130, 133, 134, 135 Koocher, G. P., 575, 589
Kehoe, J. F., 164, 165, 186, 265, 293 Koopman, P. L., 823
Kehoe, J. R., 99 Koopman, R. F., 162
Keil, C. T., 775 Korman, A. K., 237, 829
Keith-Spiegel, P., 589 Kornhauser, A. W., 2, 930
Keller, K. S., 497 Kossek, E. E., 561
Kelloway, E. K., 504, 538, 545 Kouzes, J. M., 322
Kelly, M. L., 310, 771 Koys, D. J., 480
Kemp, C., 177, 771 Kozlowski, S. W. J., 202, 203, 464, 471, 479, 801,
Kendall, J., 533, 534 811, 815
Kendall, L. M., 442, 477 Krach, U., 933
Kenny, D. A., 25, 33, 59, 304, 346 Kraemer, W. J., 290
Kerk, C. J., 294 Kraft, M., 300
Kern, J., 538 Kraiger, K., 102
Kessler, R. C., 533 Kraimer, M. L., 478
Kessler, S., 491, 498 Krajewski, H. T., 732
Ketchen, D., 198, 571 Kramer, R. M., 724
Kethley, R. B., 55, 291, 349 Krause, D. E., 308
Keyserling, W. M., 281 Krausz, M., 536
Kichuk, S. L., 803 Kraut, A. I., 237
Kidron, A., 322 Kravitz, D. A., 132, 162, 164, 265, 715, 774
Kiesler, S., 177 Kriek, H. J., 658
Kiker, D. S., 473 Kriska, S. D., 138
Kilcullen, R. N., 305 Kristof-Brown, A. L., 134, 165, 166, 809, 810, 811, 815
Kilduff, M., 143, 803 Kruschwitz, N., 555
Kim, B., 479 Kubisiak, U. C., 86, 449, 450, 451, 452, 897
Kim, B. H., 154 Kucinkas, S. K., 690
Kim, D.-I., 155 Kuder, G. F., 24
Kimball, B. A., 572, 575 Kudisch, J. D., 768
King, D. W., 561 Kuhlman, D. M., 300
King, J., 134 Kuhn, T. S., 9, 37
King, L. A., 561 Kulik, C. L. C., 430
Kingsbury, F. A., 2 Kulik, J. A., 430
Kinyungu, C., 658 Kuljanin, G., 158
Kirkpatrick, S. A., 832 Kumar, K., 544
Kirksey, J., 809 Kunce, C., 305
Kirsch, M., 65, 103, 304, 690, 866 Kuncel, N. R., 255, 309
Kirton, M. J., 326, 327 Kurosawa, A., 400
Kisamore, J. L., 748 Kwiatkowski, R., 923
Kitayama, S., 330 Kwok, C., 501
Kitson, H. C., 930 Kwok, O., 158
Klawsky, J. D., 772 Kyllonen, P. C., 347, 697
L Lee, T. W., 128
Lefkowitz, J., 571, 572, 573, 574, 576, 578, 579, 589, 730
Laczo, R. M., 901 Leggett, E. L., 328
Ladd, R. T., 909 Legree, P. J., 143, 340
Ladenson, R. F., 589 Leiter, M. P., 538
Ladik, D. M., 771 Lent, R. H., 256
Laenen, A., 37 Lepak, D. P., 134, 204, 205
Laffitte, L. J., 406 LePine, J. A., 215, 467, 468, 469, 470, 496, 537, 540, 813
LaHuis, D. M., 37, 311 Leslie, L. M., 132
Lahy, J. M., 922, 923, 927 Lesser, M., 544
LaLonde, C., 29 Lev-Arey, D., 132
Lam, H., 257 Levesque, L. L., 141
Lamazor, A., 388 Levin, L. S., 256
Lambert, T. A., 33, 349 Levine, C., 300
Lammlein, S. E., 742, 863 Levine, E. L., 74, 82, 87, 456, 825, 894, 910
Lamon, S. J., 266 Levine, J. M., 811, 816
Lance, C. E., 158, 236, 349, 406, 407, 409, 410, 411, 786 Levine, M. S., 287
Lance, C. L., 22, 33, 34, 53 Levinson, H., 522
Landauer, T., 190 Levy, D. G., 697
Landers, R. N., 260 Levy, P. E., 130, 133, 134, 135, 445, 474
Landis, R. S., 133 Levy-Leboyer, C., 934, 935
Landon, T. E., 55, 284 Lewicki, R. J., 473
Landy, F. J., 9, 38, 54, 59, 79, 82, 99, 190, 281, 288, 342, Lewis, C., 24, 29
344, 346, 351, 352, 402, 442, 445, 514, 542, Lewis, K., 55
593, 595, 622, 627, 628, 921 Lewis, R. E., 135
Lane, R. D., 346 Lewis, S., 560, 561, 563
Lange, S. R., 388 Li, L., 771
Langenfeld, T. E., 99 Liao, H., 144, 478, 543, 544, 768
Langevin, A. M., 765 Libby, R., 340
Langfred, C., 803 Licata, J. W., 769
Langton, L., 503 Liden, R. C., 787
Lankford, J. S., 177 Lieberson, S., 834
Lanza, S. T., 903 Liebler, A., 346
Lapierre, L. M., 504 Lievens, F., 33, 90, 92, 109, 111, 116, 129, 130, 133, 140,
LaPort, K., 475 141, 154, 158, 159, 161, 164, 207, 215, 230, 237,
Lassiter, D. L., 804 265, 269, 310, 339, 348, 349, 351, 352, 381, 422,
Latham, G. P., 196, 240, 392, 445, 472, 514, 515, 517, 518 580, 631, 715, 761, 774, 785, 787, 860, 929, 935,
Lau, A. W., 826 936
Lau, D. C., 813 Lightfoot, M. A., 684
Laughery, K. R., 288 Likert, R., 303
Laughlin, J. E., 180 Lilienfeld, S. O., 305
Laur, J., 555 Lily, R. S., 705
Laurence, J. H., 684, 685, 686, 694 Lin, A., 793
Lautenschlager, G. J., 178, 412 Lin, L., 183
Lavergne, H., 655 Lindell, M. K., 86
Law, K. S., 342, 345, 346, 473, 480, 828 Lindsley, D. H., 688
Lawler, E. E., III, 88, 440, 522, 558, 824 Ling, J., 133
Lawless, P., 504 Linn, M. C., 266
Lawshe, C. H., 99, 101, 104, 244 Linn, R. L., 22, 29, 688
Lawson, L., 804 Lischetzke, T., 33
Lawton, R., 542, 543, 544 Littell, R. C., 27, 28
Lazarova, M., 790 Little, I. S., 129
Le, H., 21, 23, 26, 241, 745 Little, R. D., 687
Leach, D. J., 805, 811 Liu, C., 903
Lebiecki, J. E., 301, 770 Liu, W., 478
Leblebici, H., 824 Liu, Y., 571
Lebow, B. S., 391 Livingston, S. A., 62, 155
Leck, J. D., 131, 504 Lobel, S. A., 812
Lee, C., 801 Locke, E. A., 342, 472, 476, 515, 517, 518, 832
Lee, D., 687 Lockhart, D. E., 199
Lee, G., 28, 33, 155 Lockwood, A., 562
Lee, G. C., 687 Loevinger, J., 38, 39
Lee, K., 495, 502, 503, 505 Lofquist, L. H., 896
Lee, R., 322 London, M., 417, 423, 426, 829
Lord, F. M., 11, 13, 15 Marsh, H. W., 33, 34, 35
Lord, R. G., 826, 834 Marshall, G. W., 771
Lorenzet, S. J., 804, 815 Marston, C., 724
Louche, C., 935 Martin, A. H., 843
Loughlin, C., 545 Martin, B. A., 303
Lowman, R. L., 571, 581, 584, 587 Martin, C., 445
Lubinski, D., 255, 258, 259, 264, 270 Martin, D. E., 340
Lucas, R. E., 157 Martin, N. R., 311
Luchman, J., 463 Martinko, M. J., 245, 496, 498
Lucier, C., 833 Martinussen, M., 456
Luecht, R. M., 29 Martocchio, J. J., 540
Lueke, S. B., 164 Maslach, C., 538, 540
Luk, D. M., 791 Mathieu, J. E., 329, 813
Lukasik, M. A., 813 Matsumoto, D., 131, 346
Lumsden, J., 13 Matthews, G., 340, 341, 342, 352
Lund, S. J., 282, 294 Matthiesen, S. B., 490
Luo, Z., 377 Maurer, S. D., 116, 128, 129, 305, 732
Lygren, H., 284 Mavor, A. S., 294, 688, 692, 693
Lyness, K. S., 560 Maxwell, A. E., 29
Lynn, R., 266 Maxwell, S. E., 284, 343
May, K. E., 516, 811
M May, S., 826
Mayberry, P. W., 690, 691
Macan, T. H., 391, 422 Mayer, D. M., 132, 470, 776
Macaulay, J. L., 832 Mayer, J. D., 340, 341, 342, 344, 345, 346, 825
MacCallum, R. C., 407 Mayer, T., 284
MacCann, C., 347 Mayfield, M. S., 90
MacDonald, P., 156 McAllister, D. J., 471, 478
MacDuffie, J. P., 88 McArdle, J. J., 407
Macfarlane-Dick, D., 421, 423, 424 McArdle, W. D., 279, 280, 281, 282, 290
MacGregor, M., 688 McBride, J. R., 681, 697
MacIver, R., 731 McCabe, D. L., 380, 384, 385, 389
Mack, H., 305 McCall, M. W., 833
Mack, M. J., 61, 163 McCallum, K., 281
MacKenzie, S. B., 118, 454, 457, 465, 480, 526, 884 McCaulley, M. H., 326
MacLennan, W. J., 268 McClelland, D. C., 329, 330, 825
MacLeod, A. A., 771 McClough, A. C., 801, 802, 805, 806, 807, 817
Maertz, C. P., Jr., 421, 423, 424, 425 McCloy, R. A., 21, 27, 37, 58, 180, 258, 300, 303, 351, 449,
Mahaffie, J. B., 82 451, 452, 464, 499, 553, 684, 689, 693, 877, 883
Mahoney, T. A., 834 McCormick, E. J., 76, 79, 86, 116, 441, 447, 541, 542, 888,
Maier, G. W., 263, 455, 935 894, 897
Maier, M., 541 McCrae, R. R., 300, 306, 540, 791, 814, 826
Maier, M. H., 60, 61, 686, 688 McCulloch, C. E., 20
Mainiero, L., 541 McCulloch, M. C., 165, 166
Maio, G. R., 322 McDaniel, M. A., 54, 116, 260, 269, 305, 306, 307, 340,
Major, B., 540 342, 347, 348, 731, 732, 748, 765, 770, 772
Major, D. A., 560, 561 McDonald, R. P., 28, 35, 100, 101, 102, 596, 598, 600
Makin, P. J., 934 McFadden, K. L., 497
Maldegen, R., 732 McFarland, C., 417, 422
Mallart, J., 928 McFarland, L. A., 138, 207, 236, 256, 305, 731, 787
Malloy, T. E., 304 McGeoch, B., 533
Malos, S. B., 404 McGinnis, P. M., 282
Mann, R. D., 811 McGrath, J. E., 801, 803, 811, 816
Manning, L., 688 McGraw, K. O., 21, 24, 26
Mannix, E., 812, 814 McHenry, J. J., 58, 103, 310, 351, 721, 873
Mano, H., 141 McIntyre, H. H., 909
Manson, T. M., 748 McIntyre, K. S., 641
Marcoulides, G. A., 26, 27, 33, 35 McKay, P. F., 116, 130, 131, 132, 135, 137, 139, 141
Marcus, B., 421, 495, 496, 497, 498, 501, 502, 505 McKenzie, R., 687
Mariano, L. T., 37 McKenzie, R. C., 311, 596
Markham, P. M., 339 McKie, D., 31
Marks, M. A., 807 McLagan, P. A., 91
Markus, H., 330 McLarty, J. R., 79
Marlowe, H. A., 343 McLelland, D. C., 90
McLeod, B. M., 793 Mitchell, M. S., 471, 500
McLeod, P. L., 812 Mitchell, T. R., 321, 472
McManus, M. A., 310, 771, 774 Mitchelson, J., 833
McMorris, B., 497 Mitroff, I. I., 381, 392
McMurrian, R., 560 Miyashiro, B., 811
McNelly, T. L., 63, 308, 349 Moberg, P., 429
McPhail, S. M., 9, 74, 75, 835, 897 Moe-Nilssen, R., 284
Mead, A. D., 176, 177, 178, 260, 429 Moers, M., 929
Mead, A. W., 178 Moffitt, T. E., 497
Meade, A. W., 158, 236, 412 Mohamed, A. A., 291, 349
Means, B. M., 688 Mohammed, S., 812, 813, 814
Meara, N. M., 343 Mohr, D., 129
Mecham, R. C., 447, 888, 894 Mohrman, S. A., 88
Mecham, R. L., 135 Mol, S. T., 213, 221, 791, 792, 793
Mecham, R. M., 79 Molenberghs, G., 37
Medland, M. E., 143 Moon, H., 468, 470, 477
Medsker, G. J., 340, 809 Mooney, V., 284
Meehl, P. E., 38, 101, 245, 391, 393, 596 Moore, D. W., 696
Meehl, R. J., 63, 64 Moreland, K. L., 588
Meglino, B. G., 245 Moreland, R. L., 811, 816
Meglino, B. M., 128, 129, 322, 810 Morgan, B. B., Jr., 803, 804, 805
Mehra, A., 803 Morgan, M. T., 430
Meindl, J. R., 834 Morgeson, F. P., 74, 81, 83, 85, 87, 90, 215, 236, 307, 308,
Melican, G. J., 584 347, 422, 473, 607, 731, 805, 806, 808, 809, 810,
Mellenbergh, G. J., 11, 13 817, 827, 828, 829, 883, 894, 904
Mendenhall, M. E., 218, 791, 792, 794 Moriarty Gerrard, M. O., 111, 159, 230, 586, 761
Mendini, K., 475 Morris, J. A., 84
Menkes, J., 825 Morris, M. A., 458
Menon, S., 456, 910 Morrison, A. M., 441
Menzerath, P., 922 Morrison, E. W., 478
Meredith, W., 407, 409 Morrison, M., 303
Merenda, P. F., 10, 23, 37 Moscoso, S., 103, 177, 178, 261, 263, 264, 305, 910, 935
Meriac, J. P., 467 Moses, J. L., 832
Messick, S., 38, 44, 53, 54, 104, 326, 596 Mosier, C. I., 32
Meuter, M. L., 473 Mosier, S. B., 287
Meyer, C. J., 814 Mosley, D. C., Jr., 423, 424
Meyer, J. P., 388, 495, 498 Moss, S. C., 545
Michael, J. H., 804 Mossholder, K. W., 478
Michels, L. C., 178, 412 Most, R. B., 588
Mikulay, S. M., 494 Motowidlo, S. J., 65, 82, 240, 308, 309, 454, 456, 457, 465,
Miles, D. E., 490, 496, 498, 500 466, 467, 469, 472, 473, 526, 551, 553, 554, 555,
Miles, E. W., 560 877
Miles, G. H., 926 Mount, M. K., 13, 33, 103, 245, 299, 304, 307, 308, 309,
Miles, J. A., 141 353, 456, 457, 477, 499, 502, 526, 540, 544, 545,
Milkovich, G. T., 517 607, 730, 765, 768, 769, 791, 811, 824, 825, 884,
Miller, A. R., 887, 890, 892, 893, 894 910, 911, 917, 918
Miller, H. E., 770 Mowen, J. C., 769
Miller, L., 129 Moynihan, L. M., 198, 571
Miller, M. L., 772 Muczyk, J. P., 525
Milliken, G. A., 27 Mueller, J. S., 496, 501
Mills, A. E., 412 Mueller, L., 164, 165
Mills, C. N., 584 Mulaik, S. A., 410, 909, 915
Mills, P. R., 533 Mulatu, M. S., 268
Millsap, R. E., 10, 158, 421, 422, 773 Muldrow, T. W., 311, 596
Miner, J. B., 772 Mulkey, J., 801
Miners, C., 353 Mulki, F. P., 771
Mintzberg, H., 538 Mullen, B., 809
Mira, E., 923, 931 Mullich, J., 134, 140, 145
Miret-Alsina, F., 933 Mullins, M. E., 28
Mischel, W., 909, 910 Mumford, M. D., 77, 305, 451, 452, 453, 729, 827, 887,
Mishra, P., 421, 430 896, 898, 915
Mital, A., 282 Mumford, T. V., 215, 805, 806, 817, 829
Mitchell, J. L., 77, 94 Münsterberg, H., 922
Mitchell, K. E., 305 Murnighan, J. K., 813
Murphy, K. J., 833 Nigan, A., 689
Murphy, K. R., 11, 13, 36, 53, 61, 66, 99, 101, 102, 103, Nikolaou, I., 257
105, 107, 163, 241, 308, 354, 367, 445, 514, 552, Niles-Jolly, K., 776
556, 559, 639, 910, 912, 913, 914, 915, 944 Nilsen, D., 304
Murphy, L. L., 709 Nimps, T., 903
Murphy, S. E., 825, 826, 832 Nindl, B. C., 290
Murray, C., 104 Noble, C. L., 692
Murray, H. A., 844, 846 Noe, R. A., 65, 103, 134, 202, 304, 690, 866
Murray, V., 531 Noel, J., 735, 830
Myers, D. C., 280, 284, 286 Nohria, N., 782
Myers, I. B., 326 Nolan, L. G., 687
Mykletun, A., 536 Noon, S. L., 878
Mykletun, R. J., 536 Noonan, L., 447
Myors, B., 132, 241, 266, 404, 409, 652, 668 Nord, W. R., 135
Norris, D., 164
N Novick, M. R., 13, 15, 24, 29, 688, 690
Nowakowski, J. M., 479
Nadler, D. A., 831 Nowicki, S., 346
Nadler, M. B., 831 Nunnally, J. C., 10, 13, 101, 102, 600
Nagle, B. F., 553 Nussbeck, F. W., 33
Naglieri, J. A., 175, 178, 378 Nutting, S. M., 55, 284, 543
Nagy, T. F., 587 Nye, C. D., 156, 388
Nahapiet, J., 199 Nyfield, G., 731
Nahrgang, J. D., 473
Nakache, P., 135 O
Nanda, H., 13, 101
Nandkeolyar, A. K., 464, 468, 469, 774, 775 Oakland, T. D., 259
Narayanan, L., 456, 910 Oakley, J. L., 471
Narayandas, D., 774 Oaten, M., 300
Nathan, B. R., 103, 262, 866 Oates, G., 268
Nathanson, C., 380, 383, 385, 389 O’Connor, E. J., 910
Naylor, J. C., 596 O’Connor, J. F., 834
Neal, A., 463, 464, 544, 555 Oddou, G., 218, 791, 792
Neal, T. M., 143 Offerman, L. R., 82, 404
Neale, M. A., 812, 814 Ogbonna, E., 503
Near, J. P., 465, 554 Oh, I., 241, 456
Nebel, K. L., 349 Oi, W. Y., 687
Neebe, E., 903 Oke, A., 351
Neece, W. M., 555 Oldham, G. R., 79, 88, 199, 202, 473
Nelson, C., 391 Olea, M. M., 259, 456
Nelson, C. E, 494 O’Leary, R. S., 513, 522
Nelson, D. L., 541 O’Leary-Kelly, A. M., 490, 501
Nemeth, Y. M., 53 Olian, R. L., 514
Nestor, J. M., 928 Oliver, C., 214
Netemeyer, R. G., 560, 561 Olson, A., 164, 165, 293
Neubert, M. J., 811 Olson, J. B., 429
Neuman, G. A., 813 Oltman, P. K., 325
Neuman, J. H., 498, 499, 500 O’Neil, H. F., 801, 811
Neustel, S., 158 Ones, D. S., 11, 25, 57, 66, 101, 103, 116, 155, 246, 255,
Newell, S., 729, 787, 934 256, 260, 261, 262, 263, 267, 270, 277, 299,
Newell, S. P., 497 300, 301, 302, 307, 308, 309, 310, 340, 456,
Newman, D. A., 57, 104, 115, 119 493, 494, 495, 496, 499, 555, 721, 729, 730, 731,
Newsome, S., 351 769, 770, 775, 791, 792, 825, 827, 828, 883, 910,
Newstead, S. E., 383, 385 913, 935
Ng, K. T., 13, 36 Oppler, S. H., 58, 164, 351, 449, 451, 452, 464, 472, 553,
Ng, K. Y., 495, 501 684, 877
Nguyen, N. T., 348 Orbe, M. P., 131
Nichols, D. R., 806 O’Reilly, C. A., III, 787, 812, 813
Nicholson, N., 490 Organ, D. W., 465, 467, 468, 469, 470, 472, 554
Nicol, A. A. M., 494 Orlitzky, M., 580
Nicola, D. J., 421, 423, 424 Orvis, K. A., 301, 770
Nicoll, A., 531 Osburn, H. G., 288, 607
Nielsen, T. M., 816 Ostgaard, D. J., 66, 264
Nieminen, L., 51 Ostroff, C., 23, 73, 200, 204, 215
O’Sullivan, M., 342, 346, 347 Peng, T. K., 478
Oswald, F. L., 65, 151, 154, 155, 157, 164, 265, 300, 302, Penner, L. A., 309, 454, 457, 465, 468, 470, 540
308, 309, 310, 382, 453, 729, 730, 731, 732, 773, Penney, L. M., 491, 496, 498, 500, 502
902, 936 Pérez, J. C., 345
Otto, R. M., 279 Perez-Creus, J., 926, 927, 928
Outerbridge, A. N., 66, 258, 262, 472 Perkins, L. A., 132
Outtz, J. L., 164, 181, 287, 627, 628, 632 Perloff, R., 581
Oviedo-Garcia, M., 305 Perrewé, P. L., 531, 539, 540
Owens, W. A., 68, 729, 827 Perrewé, P. M., 343
Ozeki, C., 561 Perry, D. C., 545
Ozminkowski, R. J., 532 Peters, E., 749
Peters, L. H., 910
P Petersen, A. C., 266
Peterson, N. G., 57, 77, 80, 93, 104, 451, 452, 453, 870,
Pacaud, S., 933 873, 881, 887, 895, 896, 898, 899, 901
Packer, A. E., 82 Peterson, N. S., 155
Paese, M., 422 Petrides, K. V., 341, 344, 345
Paese, M. J., 735, 830 Pfeffer, J., 1, 395, 834
Page, R., 207, 256, 305, 731, 787 Pharmer, J. A., 807, 812
Paik, Y., 139 Philips, J. L., 813, 814
Paine, J. B., 454, 457, 465, 526 Phillips, J., 793, 794
Palfai, T., 344 Phillips, J. M., 128, 787
Palmade, G., 933 Phillips, J. S., 64, 103, 262, 598
Palmer, B., 345 Phillips, R. A., 572, 575
Palmer, E. M., 128 Phoenix, A., 135
Pandolf, K. B., 280 Picano, J. J., 55
Panzer, F. J., 807 Piccolo, R. F., 469, 473
Papinchock, J. M., 363 Pierce, J. R., 480
Papper, E. M., 90, 239 Pierce, L., 463, 464, 477
Paquet, S. L., 182 Pinillos, J. L., 932
Parasuraman, S., 560, 561, 563 Piorkowski, C., 927
Park, K., 537 Piotrowski, M., 769, 828
Parker, D., 542, 543, 544 Piquero, A., 496
Parker, G. Y., 687 Pirozzolo, F. J., 825
Parker, J. D., 351 Pirretti, A., 903
Parker, S. K., 463, 557, 558 Plake, B. S., 709
Parkes, K. R., 536 Plamondon, K. E., 239, 402, 463, 537, 555, 833
Parks, K. M., 533 Plasentin, K. A., 136
Parks, L., 353 Plomin, R., 255
Paronto, M. E., 423 Ployhart, R. E., 23, 60, 66, 82, 127, 138, 162, 164, 177,
Parry, J. B., 844, 933 183, 195, 196, 197, 200, 201, 202, 203, 204,
Pascual, M., 932 206, 207, 208, 215, 216, 265, 269, 307, 310,
Patel, R., 80 420, 421, 422, 423, 424, 465, 469, 470, 472,
Paterson, D. G., 889 715, 731, 732, 747, 771, 773, 774, 775, 776,
Patterson, D. W., 555 785, 786, 804, 816
Patton, G. K., 306 Pluta, P., 341
Patz, R. J., 37 Podratz, L. T., 824, 830
Paul, M. C., 204 Podsakoff, P. M., 118, 454, 465, 480, 526
Paulhus, D., 539 Pommerich, M., 155
Paulhus, D. L., 380, 385 Pope, K. S., 419, 421, 580
Paulik, K., 928 Porath, C. L., 88, 473, 490
Paullin, C., 305 Porter, C. O. L. H., 135, 495, 501
Paunonen, S. V., 156, 300, 494 Posner, B. Z., 322
Payne, B. K., 503 Posthuma, R. A., 423, 424
Payne, S. C., 306, 335, 480 Potosky, D., 160, 177, 265, 269, 352, 583, 632, 773
Peake, P. K., 909 Powell, B. C., 480
Pearlman, K., 73, 75, 82, 92, 236, 237, 247, 268, 421, 422, Powell, G. N., 541
598, 745, 746, 773, 909, 914 Powell, W. W., 214
Pearsall, M. J., 807 Prahalad, C. K., 201, 781, 782
Pearson, C. M., 490 Prasow, P., 749
Pedhazur, E. J., 287, 600 Pratt, A. K., 180
Peeters, M. A. G., 813, 814, 817 Premack, S. L., 128
Pelled, L. H., 813 Price, K. H., 803, 813
Pence, E. C., 446, 447 Prien, E. P., 584, 828
Prien, K. O., 584, 828 Reilly, M. E., 157
Priest, H. A., 809 Reilly, R. R., 304, 305, 397, 421, 422, 773
Prieto, J. M., 934 Reise, S. P., 407
Primoff, E. S., 77, 79, 87, 588 Reiss, A. D., 246
Prince, C., 803 Reiter-Palmon, R., 903
Pritchard, R. D., 327 Rensvold, R. B., 406
Probst, T. M., 500, 545 Rest, J. R., 589
Pryor, R. G. L., 587 Reuterberg, S. E., 28, 29, 35
Pryzwansky, W. B., 587 Reymen, I., 813
Psotka, J., 340 Reynolds, D. H., 171, 182, 183, 190, 629, 878
Pugh, R. H., 407 Reynolds, J. R., 561
Pulakos, E. D., 65, 239, 402, 447, 463, 464, 465, 468, 469, Rhodes, M. G., 268
472, 474, 513, 514, 515, 522, 537, 538, 555, 747, Rich, B. L., 470, 496
827, 833 Richards, C., 540
Purohit, Y. S., 561 Richardson, M. W., 24
Putka, D. J., 9, 21, 23, 25, 26, 27, 33, 35, 37, 692 Richey, R. G., 480
Pyburn, K. M., Jr., 164, 715, 774 Richman, W. L., 177, 178, 179
Riesman, D., 845
Q Riggio, H. R., 825
Riggio, R. E., 343, 345, 825, 828, 832
Quaintance, M. K., 79, 86, 280, 281 Riggs, E., 932
Qualls-Payne, A. L., 23 Riley, Y., 732
Quigley, A. M., 55, 290, 291, 292 Ringenbach, K. L., 329
Quinlan, D. M., 346 Riordan, C. M., 137, 786
Quinn, R. P., 537 Rioux, S. M., 468, 470
Quiñones, M. A., 281, 419, 771, 832 Roberts, B. W., 300, 301, 303, 309, 497, 731
Quintanilla, I., 934 Roberts, J. S., 180
Roberts, R. D., 300, 339, 341, 342, 347, 352
R Roberts, R. R., 340
Robertson, D. U., 54
Rachels, J., 572, 574 Robertson, G. J., 588
Rae, G., 28, 32 Robertson, I. T., 543, 544, 729, 731, 732, 913, 934
Rajaratnam, N., 13, 14, 101 Robie, C., 27, 311, 386, 412, 772
Raju, N. S., 115, 116, 119, 241, 405, 406, 909, 913, 916, 917 Robins, G., 343
Raknes, B. I., 490 Robinson, E., 385
Ramo, M., 932 Robinson, S. L., 490, 491, 501, 505, 555
Ramos, R. A., 745 Rock, D. A., 349, 729
Ramsay, H., 240 Rock, J., 639
Ramsay, L. J., 154 Rodahl, K., 279
Ramsberger, P. F., 688, 692 Rodgers, R., 516, 519, 528
Ramsey, J., 206 Rodriguez, D., 80
Ramstad, P. M., 66, 135, 241 Roe, R. A., 213, 909, 934, 935, 936
Rashkovsky, B., 607 Roehling, M. V., 128, 129, 135, 404
Rashovsky, B., 109 Roese, N., 322
Raskin, E., 325 Roethlisberger, F. J., 953
Raudenbush, S. W., 37 Rogelberg, S. G., 801, 802, 805, 806, 807, 817
Rauschenberger, J. M., 239, 242 Rogers, C. R., 953
Ravenhill, N. A., 279 Rogers, D. L., 519
Ravlin, E. C., 128, 129, 245, 322, 810 Rogers, K., 504
Raybourn, E. M., 475 Rogers, P. L., 346
Raykov, T., 17, 29, 30, 31 Rogers, W. M., 28, 55, 160, 632
Raymark, P. H., 33, 37, 80, 306, 347 Rogg, K. L., 785
Raymond, M. R., 158, 159 Rokeach, M., 321, 322
Rayson, M. P., 279 Roland, R. R., 55
Read, W., 517 Roll, C. R., 689, 690
Reardon, K. K., 539 Rolland, J. P., 910, 935
Redding, J., 479 Roos, P. A., 143, 887
Reddon, J. R., 166, 301 Rooy, D. L. V., 205
Redman, T., 478 Rosen, C. C., 469, 474
Ree, M. J., 103, 162, 259, 270, 456, 545, 690 Rosenbaum, R. W., 490
Reeve, C. L., 111, 116, 128, 159, 180, 230, 257, 303 Rosenberg, S., 448
Reeves, T. J., 265 Rosenthal, D. B., 63, 732, 858
Reeves, V., 284 Rosenthal, R. A., 346, 537, 574, 599
Reider, M. H., 215, 473, 805, 828, 904 Ross, J. J., 679
Ross, M., 422 Saklofske, D. H., 344
Rosse, J. G., 770, 772 Saks, A. M., 127, 131, 196
Rosse, R. L., 881 Sala, F., 341, 344
Rostker, B., 687 Salancik, G. R., 824, 834
Rostow, C. D., 263, 496 Salas, E., 86, 463, 464, 471, 477, 801, 802, 803, 804, 805,
Roth, L., 62, 164, 264, 609 806, 807, 809, 811, 812
Roth, P. L., 116, 160, 162, 236, 265, 269, 305, 306, 347, Sales, T., 55
352, 632, 657, 765, 773, 774, 809 Salgado, J. F., 103, 177, 178, 255, 256, 261, 263, 264, 277,
Rothausen, T. J., 215, 561 309, 455, 493, 494, 499, 721, 730, 731, 732, 785,
Rothe, H. F., 440 910, 913, 914, 915, 917, 918, 921, 933, 934, 935
Rothstein, H. R., 54, 256, 269, 421, 422, 424, 729, 770 Salinas, C., 825
Rothstein, M., 166, 301, 457, 827 Salovey, P., 340, 344, 345, 825
Rothstein, M. A., 291, 533, 534 Saltz, J. L., 776
Rothstein, M. G., 301, 309 Salvaggio, A. N., 776
Rothstein-Hirsh, H., 909 Samuelson, P. A., 574
Rotundo, M., 269, 480, 489, 490, 504, 505, 903 Sanchez, A., 809
Rounds, J. B., 330 Sanchez, J. I., 73, 82, 83, 86, 87, 90, 92, 263
Rousseau, D. M., 816 Sandall, D. L., 903
Rowe, P. M., 141 Sanders, M. S., 541, 542
Rowland, K. M., 824 Sands, W. A., 681
Rozeboom, W. W., 36 Sanchez, A., 809
Rozell, E. J., 66, 196, 381, 765 Saunders, D. M., 131
Roznowski, M., 400 Saunders, D. R., 846
Ruan, C. M., 284 Sawin, L. L., 944
Rubin, R. S., 480 Saxe, R., 771
Rubineau, B., 477 Scalia, C., 639
Ruiz, G., 136, 141 Scalora, M. J., 497
Rumsey, M. G., 303, 305, 684, 881 Scarpello, E. G., 279
Rupinski, M. T., 916, 917 Schaie, K. W., 268, 347
Rupp, D. E., 190, 308, 478, 732, 858 Schakel, L., 346
Rupp, D. R., 829 Schat, A. C. H., 538
Russell, C. J., 397, 833, 910, 918 Schaubroeck, J., 204, 423, 535
Russell, D. P., 130, 186 Schaufeli, W. B., 538
Russell, J. A., 300 Schemmer, F. M., 282
Russell, J. T., 241, 711 Scherbaum, C. A., 54, 57
Russell, M., 560 Scherer, K. R., 346
Russell, S., 129 Schinkel, S., 421, 424
Russell, S. S., 406, 407, 408, 410 Schippers, M. C., 812
Rutte, C. G., 813 Schippmann, J. S., 81, 90, 91, 237, 584, 765, 828
Ryan, A. M., 63, 127, 136, 138, 140, 183, 207, 215, 237, Schittekatte, M., 657
256, 260, 305, 311, 381, 412, 420, 421, 422, 423, Schleicher, D. J., 422
424, 429, 731, 747, 773, 786, 787, 828 Schley, S., 555
Ryan, D., 535 Schmelkin, L. P., 600
Ryan, K., 468, 469, 470, 472 Schmidt, F. L., 11, 15, 21, 22, 24, 25, 26, 27, 28, 33, 35, 36,
Rychlak, J. F., 846, 861 51, 56, 57, 61, 64, 65, 66, 75, 101, 103, 104, 115,
Rynes, S. L., 127, 128, 130, 133, 136, 137, 138, 141, 142, 116, 119, 161, 163, 201, 236, 241, 245, 246, 255,
143, 145, 197, 240, 245, 371, 514, 580 258, 261, 262, 268, 269, 305, 307, 309, 311, 371,
472, 493, 494, 499, 526, 580, 596, 598, 599,
S 722, 729, 730, 732, 733, 745, 746, 75, 455, 456,
457, 458, 747, 771, 813, 828, 866, 883, 884, 909,
Saari, L. M., 515 910, 912, 913, 914, 915, 918
Saavedra, R., 803 Schmidt, W. H., 322
Sacco, J. M., 138, 186, 205, 785 Schmit, M. J., 80, 347, 412, 454, 473, 544
Sachs, H., 925 Schmitt, N., 9, 23, 28, 32, 51, 55, 60, 61, 65, 66, 82, 103,
Sackett, P. R., 9, 61, 62, 63, 66, 67, 75, 107, 154, 158, 160, 154, 157, 158, 160, 162, 164, 205, 236, 242, 257,
161, 162, 164, 236, 258, 260, 264, 265, 266, 260, 269, 304, 305, 306, 328, 330, 332, 333,
269, 294, 311, 344, 348, 351, 422, 465, 480, 342, 344, 348, 349, 350, 352, 355, 382, 412,
490, 491, 493, 494, 504, 505, 545, 555, 580, 433, 463, 464, 469, 472, 608, 632, 690, 697,
608, 609, 631, 688, 689, 690, 692, 693, 715, 747, 760, 773, 785, 786, 866
730, 761, 773, 774, 828, 901, 935, 936 Schneider, B., 23, 60, 66, 82, 86, 106, 161, 196, 200, 203,
Safrit, M. J., 288 204, 215, 216, 365, 765, 776, 809
Sager, C. E., 58, 86, 351, 449, 451, 452, 464, 553, 877, 900 Schneider, J. R., 824, 830
Sagie, A., 504, 536 Schneider, M., 833
Sagiv, L., 323 Schneider, R. J., 299, 300, 301, 343, 345, 769, 897
Schneier, C. E., 516 Sheppard, L., 55, 160, 632
Schoen, R. A., 326 Sheppard, V. A., 279, 281, 282, 283
Schoenfeldt, L. F., 771 Shiarella, A. H., 105, 639
Scholarios, D., 240, 884 Shields, J. L., 689
Scholarios, T. M., 884 Shimizu, K., 205
Scholz, G., 308, 935 Shimmin, S., 921, 923
Schoneman, P., 31 Shin, K., 495, 505
Schooler, C., 268 Shine, L. C., 596
Schraagen, J. M., 80 Shiner, R., 309
Schrader, B. W., 788 Shoben, E. W., 291
Schroder, H. M., 326 Shotland, A., 55
Schroeder, E. P., 291 Shrivastava, P., 381, 392
Schuler, H., 308, 421, 496, 498, 501, 502, 934, 935 Shrout, P. E., 21, 23, 26, 31, 37, 286
Schuler, R. S., 151, 198 Shullman, S. L., 441
Schulze, R., 341, 347 Shurtz, R. D., 533
Schutte, N. S., 345 Siddarth, S., 771
Schuyt, R., 833 Siem, F. M., 190
Schwab, D. P., 127, 445 Silbey, V., 60
Schwager, S., 539 Silva, J. M., 710, 715, 771
Schwartz, G. E., 346 Silver, M., 894
Schwartz, S. H., 322, 323 Silverhart, T. A., 771, 775
Scott, B. A., 467, 472, 501 Silverman, S. B., 311, 413, 791
Scott, W. D., 51 Silvester, J., 141
Scott, W. R., 214 Silzer, R., 63, 828
Scratchley, L. S., 771 Simmering, M. J., 202
Scullen, S. E., 13, 22, 30, 33, 729 Simon, T. M., 533, 534
Searcy, C. A., 771 Simoneit, M., 931
Searle, S., 560 Simons, C., 809
Searle, S. R., 20, 26 Simons, T., 813
Searle, W., 792, 793 Simpson, R., 560
Sedlak, A., 448 Simpson, S. A., 545
Segall, D. O., 156, 684 Sims, H. P., 832
Seijts, G. H., 807 Sims, W. H., 688
Seisdedos, N., 934 Sin, H., 811
Sekula, B. K., 281 Sinangil, H. K., 935
Seligman, C., 322 Sinar, E. F., 157, 182
Sellman, W. S., 679, 680, 681, 685, 687, 688, 690, Sinclair, R. R., 479
693, 697 Singh, N., 131
Selzer, B. K., 256 Singh, R., 560
Semadar, A., 343 Sipe, W. P., 66, 196, 215, 809
Senge, P. M., 555 Sisco, H., 305
Sessa, V. I., 823, 830, 832 Skarlicki, D. P., 240, 490, 491, 504
Shackleton, N. J., 934 Slade, L. A., 786
Shackleton, V., 729, 787 Slocum, J. W., Jr., 329
Shaffer, J., 241 Slora, K. B., 496
Shaffer, M. A., 791, 792, 793 Smallwood, N., 145, 206
Shah, P. P., 480 Smerd, J., 534
Shalin, V. L., 80 Smith, A. B., 735, 830
Shane, G. S., 746 Smith, A. P., 545
Shanker, T., 143 Smith, B., 555
Sharf, J. C., 293 Smith, C. A., 465, 466, 554
Shavelson, R. J., 12, 16, 17, 21, 22, 24, 25, 26, 27, 32, 37, Smith, C. S., 831
402 Smith, C. V., 279
Shaw, D. G., 516 Smith, D., 66
Shaw, J. C., 501 Smith, D. A., 496, 693
Shaw, J. D., 199, 205, 490, 538, 807 Smith, D. B., 196, 215, 809
Shaw, K. N., 515 Smith, D. E., 422, 446
Shaw, M. N., 697 Smith, E. M., 471
Sheehan, M. K., 401 Smith, J. A., 826
Shellenbarger, S., 563 Smith, K. A., 813
Shelton, D., 343 Smith, M., 729, 731, 732, 934
Shen, W., 266 Smith, M. A., 715, 748, 774
Shepherd, W. J., 772 Smith, P. C., 157, 442, 444, 524, 954
Sheppard, C., 490 Smith, P. L., 29
Smith-Jentsch, K. A., 807 Stewart, S. M., 500
Smither, J. W., 421, 422, 773, 829 Stoffey, R. W., 421, 422, 773
Smolensky, E., 552 Stokes, G. S., 305, 729, 771, 772, 827
Snape, E., 478 Stone, D. L., 191
Snell, A. F., 305, 748, 772 Stone, N. J., 305, 347, 809
Snell, S. A., 89, 195, 198, 204, 205 Stoop, B. A. M., 354
Snitz, B. E., 391 Storms, P. L., 496, 503
Snoek, J. D., 537 Stough, C., 345
Snow, C. C., 89 Strahan, E. J., 407
Soetjipto, B. W., 478 Strasser, S., 538
Solberg, E. C., 363 Straus, S. G., 141
Son, C. ., 495 Strauss, G., 520
Song, L. J., 342, 828 Strauss, J. P., 304, 769
Sorbom, D., 29, 404, 405, 406 Stricker, L. J., 349, 729
Sosik, J. J., 832 Strickland, W. J., 59, 679, 686
Sothmann, M. S., 279, 286, 287, 288, 289, 290 Strober, M. H., 199, 205
Spalek, B., 504, 505 Stroh, L. K., 782
Sparrowe, R. T., 478 Stromme, S. G., 279
Spearman, C., 9, 11, 12, 14, 15, 23, 24, 36, 43, 255, 258, Strong, M. H., 897
950, 951 Stroup, W. W., 27
Spector, P. E., 346, 347, 348, 456, 489, 490, 491, 492, 493, Stroupe, J. P., 771
494, 496, 498, 499, 500, 501, 502, 503, 504, Stumpp, T., 263, 455, 935
540, 903, 910, 921 Sturman, M. C., 240, 243
Spell, C. S., 236 Su, H., 134
Spencer, L. M., 90 Subramanian, R., 544
Spencer, S., 91 Sullivan, S., 533
Spiegel, E., 833 Sullivan, S. E., 541
Spies, R. A., 709 Sulsky, L. M., 447, 788
Spinath, F. M., 255 Sun, L., 538
Spreitzer, G. M., 88 Sun, L.-Y., 473, 480
Spychalski, A. C., 419 Sundstrom, E., 809, 816
Stagl, K. C., 477 Süss, H.-M., 343, 344
Stahl, G., 792 Sussman, M., 54
Stamm, C., 533 Sutton, M., 561
Stankov, L., 300, 342 Sutton, R. I., 1, 395
Stanley, D. J., 498 Suutari, V., 218
Stanley, J. C., 949 Svensson, G., 834
Stanton, J. M., 128, 157 Swank, E., 385
Stark, E. M., 807 Switzer, F. S., III, 116, 160, 265, 657, 765, 773, 774
Stark, S., 158, 180, 300, 303, 446, 466 Sytsma, M. R., 33
Starke, M., 127, 134, 136, 137
Starr, J. M., 255, 268 T
Staw, B. M., 88
Stawarski, C. A., 697 Tabachnick, B. G., 287
Steel, L., 690 Taggar, S., 807
Steel, P. G., 104 Tajfel, H., 812
Steelman, L. A., 533 Takeuchi, R., 204, 470, 478
Steenkamp, J. E., 409, 410 Tamanini, K. B., 721
Steffensmeier, D., 497 Tan, H. T., 340
Steffy, B. D., 490 Tangirala, S., 471
Steilberg, R. C., 205 Tannenbaum, S. I., 801, 803
Stein, J. H., 55 Tarique, I., 791, 793, 794
Steindl, J. R., 270 Tasa, K., 807
Steiner, D. D., 257, 404, 423, 787, 788 Tate, L., 378, 380, 383, 384, 385
Steiner, I. D., 809, 811, 812 Tausig, M., 535
Steinmayr, R., 351 Taylor, E. K., 392
Stelly, D. J., 117 Taylor, G. A., 132
Stern, W., 258, 923, 925 Taylor, H. C., 241, 711
Sternberg, R. J., 339, 347, 349, 402, 832 Taylor, J. E., 387
Stevens, C. K., 130, 810 Taylor, J. J., 824
Stevens, M. J., 82, 215, 801, 803, 805, 806, 817 Taylor, M. S., 127, 196
Stewart, G. L., 464, 468, 469, 765, 769, 774, 775, 809, 811, Taylor, P., 658
812, 813, 814, 815, 816, 817, 910, 911, 917, 918 Taylor, S., 135
Stewart, J. E., 882 Taylor, S. J., 828
te Nijenhuis, J., 259, 657, 658 Treadway, D. C., 541
Teachout, M. S., 102, 456, 771 Treiman, D. J., 887
Templer, K. J., 388 Treviño, L. K., 380, 384, 385, 500, 501, 503, 826
Tenbrunsel, A. E., 804 Trevor, C. O., 761
Tenopyr, M. L., 305 Triandis, H. C., 139
Tepper, B. J., 490, 504, 538 Trierweiler, L. I., 33
Terpstra, D. A., 291 Trigg, M. K., 143
Terpstra, D. E., 55, 66, 196, 349, 381, 765 Tripp, T. M., 490, 504
Terris, W., 490, 496 Trout, B., 696
Teta, P., 300 Trumbo, D. A., 514
Tetrick, L. E., 531, 535, 558 Truss, A., 688
Tett, R. P., 89, 166, 301, 308, 310, 342, 345, 457, 827, 946 Truxillo, D. M., 163, 423, 768, 773
Thatcher, S. M. B., 813 Tryon, R. C., 13
Thayer, P. W., 58, 59 Tsacoumis, S., 899, 900, 901
Theorell, T., 537, 556 Tsui, A. S., 787
Thibaut, J., 422 Tucker, J. S., 479
Thijs, M., 346 Tung, R. L., 792, 794
Thirtle, M. R., 693 Tupes, E. C., 299
Thomas, A., 913, 916 Turban, D. B., 127, 129, 130, 136, 137, 165, 166, 471, 478
Thomas, B. A., 464, 475 Turhan, A., 31
Thomas, D. R., 561 Turner, N., 557, 558
Thomas, J. C., 827 Turnley, W. H., 481
Thomas, K. M., 132 Turvey, C., 344
Thomas, L. L., 377 Tushman, M., 803
Thomas, R., 391, 393 Tversky, A., 950
Thomas, S. C., 138, 183, 257, 422, 773 Tweed, R. G., 771
Thomas-Hunt, M., 816 Tyler, P., 116, 160, 265, 657, 774
Thompson, D. E., 560 Tyran, K. L., 814
Thompson, L. F., 133 Tziner, A., 504, 813
Thoresen, C. J., 354, 540, 775
Thoresen, J. D., 53, 354, 775 U
Thornbury, E. E., 129
Thorndike, E. L., 51, 342 Ugelow, R. S., 289
Thorndike, R. L., 9, 553, 949 Uggen, C., 497
Thornton, G. C., III, 63, 308, 584, 641, 732, 829, 830, 843, Uggerslev, K. L., 129, 136, 141
844, 858 Uhlaner, J. E., 686
Thorsteinson, T. J., 128, 423 Ulrich, D., 88, 145, 206
Thune, A., 279 Um, K. R., 155
Thurstone, L. L., 303, 446 Underhill, C. M., 303
Tierney, B. W., 500 Ungerford, M., 824
Tiffin, J., 264 Updegraff, J., 900
Timmreck, C., 563 Urry, V. W., 65
Tippins, N. T., 1, 156, 157, 178, 213, 225, 363, 369, 378,
379, 380, 381, 386, 388, 585 V
Tisak, M. S., 343
Toegel, G., 829 Valentine, L. D., 680, 687
Tomlinson, E. C., 473 Valenzi, E. R., 60
Tonidandel, S., 829 van Dam, K., 215, 237, 929
Toops, H. A., 553 Van de Walle, D., 329
Toossi, M., 812 van den Berg, P. T., 935, 936
Topolnytsky, L., 498 Van den Berg, S. A., 792, 793
Toquam, J. L., 103, 310, 351, 873 van den Bosch, G., 934
Torchy, V., 934 van der Flier, H., 657, 658
Tornow, W. W., 829 van der Linden, W. J., 155
Toth, C. S., 771 Van Der Molen, H. T., 791
Toth, P., 501 van der Veer, G. C., 804
Toulouse, E., 922 van der Vegt, G., 813
Towler, A., 496, 501 Van der Zee, K., 346
Tracey, T. J., 330 van Dierendonck, D., 421, 424
Trankell, A., 933 Van Dyne, L., 477, 803
Trapnell, P. D., 299 van Gineken, M., 927
Trattner, M. H., 66 Van Hoye, G., 130, 140, 141
Traub, R. E., 600 Van Iddekinge, C. H., 33, 37, 215, 306, 347, 805, 901
Traw, K., 533 Van Keer, E., 352
van Knippenberg, D., 812, 814 Wand, B., 846
van Leest, P. F., 658 Wanek, J. E., 494, 545, 555
van Leeuwen, L., 658 Wang, A., 342
Van Rooy, D. L., 299, 340, 341 Wang, D., 473
Van Scotter, J. R., 454, 467, 526 Wang, H., 473
van Tuijl, H., 813 Wang, L., 771
Van Vianen, A. E. M., 423, 791, 792 Wang, M., 406, 407, 408, 410
van Welie, M., 804 Wang, S., 532, 801
Vance, C. M., 139 Wanous, J. P., 128
Vandenberg, R. J., 158, 406, 407, 409, 410, 411, 537, 786 Ward, C., 792, 793
Vanderpool, M., 266 Warner, J. T., 687, 699
Vangeneugden, T., 37 Warr, P. B., 556, 557, 922, 934
Vansickle, T. R., 79 Warthen, M. S., 697
Vardi, Y., 490 Washington, D. O., 497
Vaubel, K. P., 288 Waters, B. K., 681, 684, 686, 687
Vaughn, D. S., 93 Waters, S. D., 697
Vecchio, R. P., 128 Watola, D., 479
Venkataramani, V., 422, 469, 471 Watson, C. B., 826
Ventura, P. L., 533, 534 Watson, R., 535
Verhaeghen, P., 268 Watson, T. W., 690
Vernon, P. E., 844, 933 Watts, S. A., 421, 429
Vetter, V. A., 580 Wayne, S. J., 787
Viano, K., 135 Webb, N. M., 26, 27, 32, 402
Vijn, P., 909, 934 Webber, S. S., 480, 812, 813, 815
Villado, A. J., 165, 351, 387, 804, 813 Webster, J., 129, 133, 141
Villanova, P., 58, 107, 454, 514, 552 Wechsler, D., 399
Vincent, N. L., 264 Weekley, J. A., 157, 181, 195, 204, 206, 305, 771, 772,
Vinchur, A. J., 1, 2, 765, 767, 768, 771, 772, 774, 922 775, 776
Visher, C. A., 496 Weekly, J. A., 177
Viswesvaran, C., 11, 25, 43, 57, 66, 101, 116, 155, 246, 255, Wegener, D. T., 407
256, 261, 263, 270, 277, 299, 300, 301, 302, Weick, K. E., 440, 558, 824
304, 307, 309, 340, 341, 448, 451, 452, 453, 493, Weiner, J. A., 155
494, 495, 499, 555, 721, 731, 769, 770, 775, 791, Weiner, N., 834
792, 825, 827, 828, 883, 910, 913, 935 Weis, S., 343, 344
Viteles, M. S., 51, 930 Weisband, S., 177
Vodanovich, S. J., 540 Weiss, H. M., 322
Volpe, C. E., 803 Weiss, J. R., 187, 188
Voskuijl, O., 823, 935 Weissbein, D. A., 471
Voyer, D., 266 Weissman, D., 793
Voyer, S., 266 Weissmuller, J. J., 79
Vredenburgh, A. G., 542 Weitz, B. A., 771
Vroom, V., 832 Weitz, E., 490
Welch, D. E., 220
W Welch, J., 235
Wellins, R. S., 722, 801, 804, 809
Wackerle, F. W., 831 Welsh, E. T., 469, 474
Wadsworth, E. J. K., 545 Welsh, J. R., 690
Wagner, R. K., 339, 340, 402 Wendt, R. N., 587
Wagner, U., 497, 502 Werbel, J. D., 809, 810, 816
Wai, J., 264 Werner, J. M., 454, 480
Wainer, H., 596 Wernimont, P. F., 305, 728, 945
Waldman, D. A., 267, 829 Werts, C. E., 29
Walker, C. B., 684, 881 Wesson, M. J., 470, 495, 501
Walker, L., 422 West, S. G., 10
Walker, P. A., 346 Wetzel, C. D., 697
Wall, T. D., 805 Wexley, K. N., 445
Wallace, J. C., 464, 475, 540 Whaley, M. H., 279, 282, 291
Wallace, P. A., 855 Whalley, L. J., 255
Wallace, S. R., 691 Whetzel, D. L., 116, 260, 305, 306, 340, 348, 732,
Wallbott, H. G., 346 770, 772
Wallis, D., 923 White, L. A., 303, 305, 310, 454, 472, 526
Walther, L., 923 White, L. L., 775
Walumbwa, F., 505 White, S. S., 204, 464, 765
Wanberg, C. R., 479 Whiteman, M. C., 255
Whiting, S. W., 480 Woolf, H., 949, 950
Whitney, K., 82, 811 Woycheshin, D. E., 86
Whittington, J. L., 833 Wright, C., 479
Whyte, G., 196, 240 Wright, J., 813
Whyte, W. H., Jr., 845 Wright, P. M., 195, 196, 197, 198, 201, 203, 571
Widaman, K. F., 33, 407 Wright, T. A., 55
Wiechmann, D., 136, 422, 429, 463, 785 Wroblewski, V. R., 303
Wiemann, S., 236, 494 Wuensch, K. L., 133
Wiesenfeld, B. M., 422 Wulff, C., 128
Wiesner, W. H., 305, 803 Wunder, R. S., 377
Wigdor, A. K., 244, 286, 607, 689, 690, 693, 883
Wiggins, J. S., 299 X
Wiley, W. W., 78
Wilhelm, O., 352 Xie, J. L., 504, 505, 535
Wilhem, O., 347
Wilk, S. L., 61, 730 Y
Willemsen, M. E., 791
Williams, A. P. O., 89 Yamakawa, K., 284
Williams, B. A., 180, 303 Yang, J., 478
Williams, G. O., 929 Yauger, C., 544
Williams, K. M., 380, 385 Yen, W. M., 14
Williams, K. Y., 812, 813 Yerkes, R. M., 686
Williams, L. J., 466, 467 Ying, Z., 156
Williams, M. L., 132 Yongmei, L., 198
Williams, R. H., 36 Young, C. E., 88
Williams, T. J., 55 Young, M. C., 303, 305
Williams, W. M., 339, 402 Youngcourt, S. S., 335
Williamson, I. O., 134 Younger, J., 145
Williamson, L. G., 404 Yu, K. Y. T., 130, 133
Willis, S. L., 347 Yuce, P., 129
Wilson, J. M., 804 Yun, S., 478, 832
Wilson, M., 791 Yusko, K. P., 732
Wilson, M. A., 77, 87, 902
Wilson, M. G., 537 Z
Wilson, S. E., 177
Wilt, J. M., 349 Zabojnik, J., 833
Wing, H., 689 Zaccaro, S. J., 476, 825, 832
Wingate, P. H., 641 Zajac, D. M., 329
Winkler, C., 775 Zald, D. H., 391
Winters, D., 135 Zanutto, E., 813
Wintle, J., 136 Zapata-Phelan, C. P., 470
Wise, L. L., 57, 58, 104, 691, 692, 693, 722 Zapf, D., 490
Wiseman, R. L., 791, 792 Zawacki, R. A., 809
Witkin, H. A., 325 Zedeck, S., 86, 106, 161, 164, 287, 365, 465, 593
Witt, L. A., 343, 353, 477, 502 Zeidner, J., 884
Wittmer, D. P., 576 Zeidner, M., 340, 341, 342, 658
Witvliet, C., 140 Zeitlin, S. B., 346
Woehr, D. J., 236, 401, 446, 447, 467, 732 Zellars, K. L., 540
Wofford, J. C., 832 Zenderland, L., 399
Wolfe, D. M., 537 Zickar, M. J., 386, 399, 412, 572
Wolfinger, R. D., 27 Zimmerman, D. W., 29, 36
Wong, C. M., 343 Zimmerman, R. D., 134, 165, 205, 809
Wong, C. S., 342, 345, 828 Zink, D. L., 644
Wong, S. P., 21, 24, 26 Zook, L. M., 454
Wood, G., 834 Zottoli, M. A., 128
Wood, L. M., 351 Zubek, J., 540
Wood, R. E., 476 Zuckerman, M., 300
Wood, T. M., 288 Zumbo, B. D., 29
Wood, W., 394 Zusman, R. R., 133
Woodruff, R. B., 774 Zweig, D. I., 236
Subject Index
3M physical performance testing, 285
sales representatives, core sales competencies for, recruits in U.S. Military, 688
782–783 Age
work behavior dimensions, 783 and counterproductive work behavior, 497
disparate treatment based on, 642, 645–646
A and occupational safety, 544
and work stress, 539–540
Ability emotional intelligence model, 340, 341, 342; see Age Discrimination in Employment Act of 1967, 292–294,
also Emotional intelligence; Intelligence 788
Ability Requirement Scales (ARS), 79 Aggression, 496
Aborigines, in research on cognitive ability, 657–658 Albemarle v. Moody, 632, 634, 635, 636
Abusive supervision, 538 Albertsons v. Kirkingburg, 647
Academic versus social intelligence, 343; see also Alternate forms of test security, 153–156
Intelligence Alternate test formats, 157–158
Access, to selection tools, 188 American College of Sports Medicine (ACSM), 278
Accident proneness, 543 physical activity intensity, categories of, 279
Achievement motivation, 527; see also Motivation American Educational Research Association (AERA),
construct of, 329–330 9, 403
Adaptability, 537–538, 643; see also Adaptive American Institutes for Research (AIR), 866
performance American Psychological Association (APA), 9, 403, 419
Adaptive performance, 555; see also Adaptability American Public Transportation Association (APTA), 710
definition of, 463–465 Americans with Disabilities Act of 1990 (ADA), 291, 534,
dimensions of, 464–465 788
distal individual differences, 469–470 Analysis of variance (ANOVA), 11
excessive changes, 480–481 Angoff method, 62
immediate/proximal determinants, 471–472 Applicant(s), 628
impact of, 479 knowledge of company, 129
measurement, 474–475 reactions, to feedback, 420
moderators and mediators, 476–477 fairness perceptions, 422–423
predictors of, 468–469 feedback and self-image, 423–426
Adaptive Thinking & Leadership (ATL) program, 475 screening tools, 174–175
Adaptive work-related behavior, definition of, 464 versus current employees, 752–753
Administering assessments and decision-making, 377; see Applicant tracking systems (ATS), 173–175
also Decision-making Arduous jobs, job analysis for, 277; see also Work
candidate information, combination of, 390 analysis
clinical versus mechanical combination, 390–392 biomechanical analysis, 278–279
decision-makers, influences affecting, 392–395 environmental working conditions, identification of,
unproctored internet testing (UIT), 377 279–280
examinee identity authentication and cheating, ergonomic analysis, 278–279
379–380 physical abilities, identification of, 280–281
perceived value of testing, 380–382 physically demanding tasks, 277–278
response to, 389–390 physiological analysis, 278–279
simulation, conducting, 387–388 Aristotle, 574
test security, 380 Armed Forces Qualification Test (AFQT), 681, 684–685,
testing environment, standardization of, 378 686, 687
UIT effects, 382–387 Armed Services Vocational Aptitude Battery (ASVAB),
UIT simulation, 388–389 60–61, 455, 681, 684, 687, 691, 697, 865, 866,
Administrative versus technical innovation, 391 872–873
Adverse employment decisions, 611–612, 630–631, 644 misnorming, 688–689
reduction in, 640, 752 revision of, 697
Advisory Panel for the Dictionary of Occupational Titles Army Alpha test, 686
(APDOT), 894–905 Army Beta test, 686
Aerobic capacity tests, 282 Army Classification Battery, 747
Affirmative action, 672 Army General Classification Test (AGCT), 686
African Americans Army Vocational Interest Career Examination (AVOICE),
cognitive ability testing, 265–269 457, 871, 872
Assessment center (AC) method, 584–585, 723, 731–732, Blue collar jobs, 741; see also Technical jobs, in private
847–848 sector organizations
advantages and disadvantages, 855–857 BNSF v. White, 644
early developments, 843–844 Boston NAACP v. Beecher, 637–638
in executive selection, 829 Bradley v. City of Lynn, 632
feedback, 417–418, 420 Brunet v. City of Columbus, 634
multitrait-multimethod (MTMM) analysis of, 860 Bullying, 490
for personality measurement, 307–308 Bultmeyer v. Fort Wayne, 647
in World War II, 844–845 Burlington v. Ellerth, 644
Assessment of Background and Life Experiences (ABLE), Business case, making, 186
457, 871, 872, 884 Business game, 846
Assessment tools and techniques, for leadership Business trends
screening methods baby boomers retirement
behavioral consistency method, 729–730 and experienced leaders, searching of, 723
biographical data, 729 declining organizational commitment, 724
tests and inventories globalization and diversity of followers, 723–724
assessment centers method, 731–732 leader readiness, 724–725
cognitive ability tests, 730 on-the-job development experiences, 722
interviews, 732–733 Business value, of personnel selection, 196, 235
personality measures, 730–731 challenges, 236–237
situational judgment tests, 731 dynamic and contemporary approach, 237
AT&T, 845 dynamic change and change management, 237–239
Authoritarianism, 854 hiring and HR management, selection beyond,
Authority and accountability alignment principle, 239–240
230–231 multiple organizational stakeholders, expectations
Automated testing, 175–176 of, 239
computerized adaptive rating scales (CARS), 445–446 social context and interpersonal processes, importance
computerized and web-based testing, 585 of, 240
computerized assessments and feedback, 429–430 benchmarking current practices, 246–247
Automation, 188 business value of employee selection, rationale for
Avoid performance goal orientation (APGO), 329 evaluating, 240–241
employee selection, outcomes of, 241–245
B managers knowledge, about employee selection,
245–246
Baby boomers, retirement of, 723 selection processes, need for, 247–248
Banding, 62 traditional model, 235–236
Barth v. Gelb, 643 valuation of success, reasons for, 248
Basic ability tests, 281, 282, 283 recommendations and future directions, 249–250
Bayesian models, 104
Beck v. University of Wisconsin, 647 C
Behavioral consistency method, 729–730
Behavioral indicators (BIs), 91 California F-scale, Bass version, 846
Behaviorally anchored rating scales (BARS), 442, California Psychological Inventory (CPI), 672
444, 947 Candidate’s perspective
Behavioral requirements, for leaders, 725–728 costs of, 428
assessment principles feedback, benefits of, 427
multiple selection technique, 727 Career achievement profiles, see Behavioral consistency
selection and development, 727–728 method
skills and potentials, 726–727 Career Force, 865, 866
transparency, 728 CEO Briefing, 246
Behavioral social intelligence versus cognitive social Challenges to business value, of employee selection,
intelligence, 343 236–237
Behavior observation scales (BOS) format, 445 Cheating studies, incidence of, 384–385
Behavior summary scales (BSS) format, 444–445 Citizenship performance
Bell System, 846, 847 definition of, 465–466
Bernard v. Gulf Oil, 635 distal individual differences, 470–471
Berndt v. Kaiser, 641 excessive changes, 481
Bew v. City of Chicago, 639 facets of, 466
Big Five personality, dimensions of, see Five-Factor immediate/proximal determinants, 472–474
Model (FFM) impact of, 480
Biodata measurement, 475–476
in executive selection, 827 moderators and mediators, 477–479
measures for personality measurement, 304–305 predictors of, 468–469
Biographical data, 382, 710, 729; see also Biodata and task performance, 454
Civil Rights Act of 1991 (CRA-91), 631 sales work, 766
Civil Rights Act of 1964, 292–294, 788 and strategic work analysis, 89–92
Clady v. Los Angeles, 638 systems thinking, 91
Class Compilation, 203
defined by, 706 Composition, 203
specification, 706 Computer adaptive testing (CAT), 156, 179–180
Classic Model of performance, 553 Computerized adaptive rating scales (CARS), 445–446
Classical reliability theory, 600 Computerized and web-based testing, 585
Classical test theory (CTT), 9, 14–16, 102, 400 assessments and feedback, 429–430
Classical tradition, 24 Concern over future consequences (CFC), 471
Clinical judgment, 63–64 Concurrent validation designs, 64–65
Code of conduct, 580 Confidentiality, 573
Coefficient of equivalence and stability (CES), 25 Configural invariance, 407–408
Cognitive ability Confirmatory factor analysis (CFA), 10, 404, 405
in employee selection, 255 and linear structural relations (LISREL), 405–407
acceptability of, 255–258 Confirmatory Factor Analytic Tradition, 27–32
age differences in, 267 criteria, 412–413
criterion-related validity evidence, 261–264 criteria, 412–413
current usage, 255–258 predictors, 412
definitions and theoretical underpinnings, 258 multifaceted replicates and noncrossed measurement
future challenges, 269–270 designs in, 32–34
group differences on, 264–269 Connecticut v. Teal, 631
history, 255–258 Conscientiousness, 540, 544
measurement, 259–261 in result measurement, 526–527
race and ethnic group differences, 265, 266 Conscription, 686–687
sex differences, 265, 266 Consequential validity, 53, 104; see also Validation;
structure, 258–259 Validity
use in 18 countries, 257 Consequentialist theory, 576
and job performance, 455–456 Consistency, 13
mean difference on, 657–658 Construct validity, 39, 51, 101, 402, 614, 635; see also
in result measurement, 526 Validation; Validity
and safety behaviors, 544, 545 Contamination, 403
versus cognitive styles, 325 confirmatory factor analytic approaches to, 404–405
Cognitive ability testing, 730, 731, 754 Contemporary Affairs Test, 846
in executive selection, 827 Content validity, 39, 51, 106–107, 402, 614; see also
meta-analyses of, 761 Validation; Validity
for sales and service operations selections, 772 characteristic features, 117
Cognitive versus behavioral social intelligence, 343 and job relatedness, 635
Cognitive styles, in employee selection research, 324 relationship to local generalizations, 118
versus cognitive abilities, 325 and work analysis, 633, 634
concerns and emerging issues, 326–327 Contextual performance, 554–555
future research directions, strategic agenda for, Contractors, 757
334–335 Convergent validity, 598; see also Validation; Validity
versus personality traits, 325 Conversion hysteria, 543–544
practical considerations in employee selection, Corporate and unit interests management
332–333 approval roles, 219–220
versus strategies, 324–325 expatriate employment, 220
studying, reasons for, 325 funding, 219
varieties of, 325–326 Counterproductive work behavior (CWB), 489–506, 555
Cognitive task analysis, 80 actor-based research, 490
Collective bargaining agreements, 752, 758 assessment of, 491–492
Commissioned officers, categories of, 679–680 consequences of, 503–505
company-grade officers, 679–680 definitions, 489–490
field-grade officers, 680 dimensions of, 491
flag officers, 680 environmental conditions, 499–502
general officers, 680 and person, 502–503
promotion of, 696 future directions, 505
Competence, 203, 322, 580, 708 potential antecedents of, 492–499
contingent teamwork, 803 emotions, 498–499
core teamwork, 802–803 individual differences, 492–498
emotional intelligence, 341 job attitudes, 498, 499
entry-level competencies, for global companies, 725 target-based research, 490
of executives, 824–826 Criterion deficiency, 555–556
professional, 583 “closing in” on, 562–563
Criterion-focused occupational personality scales, 770 test administration, 717
Criterion-oriented methods test elements versus job requirements, relationship,
characteristic features, 114 716–717
and local generalizations, 114–115 scoring process, 717–718
Criterion-related validity, 39, 51, 402, 598, 614; see also Demands-control model, 537
Validation; Validity Deontological theory, 576
evidence, 261 Department of Labor’s Office of Safety and Health
job knowledge, acquisition of, 261 Administration (OSHA), 751
job performance, 261, 262 Developing and selecting assessment tools, decisions in,
learning, cognitive ability tests for, 261 363
moderators of cognitive ability test validity, 262 constructs, measuring, 364
multivariate model for, 105 administrative concerns, 368–370
variables affecting, 309–310 alternatives, consideration of, 367–368
Critical business process, 724–725 applicant reactions, 370–371
Critical incident technique, 78, 92 feasibility of, 365
Critical mass, 208 group differences in test score, 366
Critical Thinking in Social Science Test, 846 importance and need at entry to job, 364–365
Cross-situational consistency hypothesis, of predictor- KSAOs to measure, number of, 365–366
criterion pairs, 909–910 legal ramifications, consideration of, 368
Crystallized versus fluid form of social intelligence, 343 organizational goals, criteria relative to, 366
Cultural differences, in employee selection, 333 organizational reactions, 370
Culture poor hiring decisions, consequences of, 370
and recruitment, 131 selection decisions, adverse impact in, 366
Current employees timing, 366
applicants versus, 752–753 scores, using, 373
Cut score, 62–63, 586, 603 combining scores, 374
critical, 638 reported test score, calculation and form of,
effective, 638 373–374
nominal, 638 test scores, use of, 374–375
set in U.S. Military, 691 validity evidence, gathering, 371
setting, 164–165 appropriateness and feasibility, 371–372
Cut-off scores, 609, 612 test development, cost of, 372–373
versus top-down selection, 748 test, validity of, 371
validation studies and utility of selection program,
D 372–373
Development Dimensions International (DDI), 722
Dalton v. Subaru-Isuzu, 647 Deviance, 490
Data-based employee selection, 1–3 Diagnostic Analysis of Nonverbal Accuracy (DANVA2),
Data cleaning, 60–61 346
Data storage, 188 The Dictionary of Occupational Titles (DOT), 887–894,
Daubert thresholds, 622 925
Decision-maker, role of criticisms of, 893–894
“broken leg” case, 393–394 job analysis data, collection of, 891
clinical versus mechanical combination, 390–392 structure and content of, 890–891
mechanical combination process, involvement in, 393 use of, 892–893
self-discovery, 394–395 Differential item functioning (DIF), 603
technical merits, of mechanical combination, 393 Differential prediction, see Predictive bias
Decision-making, 596–597; see also Administering Disability
assessments and decision-making decisions for cases related to, 646–648
definition of, 859 disparate treatment based on, 642
modes of, 61 Disadvantaged groups, 652, 653–655
Angoff method, 62 Discriminant validity, 599; see also Validation; Validity
banding, 62 Disparate impact evidence, 665
clinical judgment, 63–64 Distal individual differences
cut scores, 62–63 adaptive performance, 469–470
scores, profiles of, 63 citizenship performance, 470–471
statistical judgment, 63–64 Distributive justice, 422
top-down selection using test scores, 61 Downsizing, 641–642
Decisions to make before collection, in test scores, Drama-based training, 539
151–153; see also Decision-making Dynamic and contemporary approach to employee
Defensibility of process, 716–718 selection, 237
documentation, 718 dynamic change and change management, 237–239
job analysis, 716 hiring and HR management, selection beyond,
multiple versus composite score, relationship, 717 239–240
multiple organizational stakeholders, expectations of, qualitative (behavioral) approaches, 245
239 utility analyses, 241–243
Dynamic change and change management psychometric musings on, 943–950
in selection processes, 237–239 constructs and job-relatedness, 945
contaminating sources of variance, 949
E context and internal structure, evaluation of,
947–948
Economist Intelligence Unit (EIU), 246 measurement or psychometrics, 943
Education, in U.S. Military selection, 684 new psychometric principles, need for,
Edwards Personal Preference Schedule, 846 944–945
EEOC v. Aetna, 643 predictor constructs, taxonomies of, 947
EEOC v. Allstate, 646 quantification in, 950–954
EEOC v. Ligget Myers, 643 reliability and validity, difference between,
EEOC v. Routh, 647 948–949
Effective Practice Guidelines, 246 selection procedures, criteria for validating,
Eligible list, 706–707 946–947
Embedded Figures Test, 325 trinitarian doctrines of, 945–946
Emergence, forms of, 203 true validity, 946
compilation, 203 validity, evolution of, 945
composition, 203 science-based, 1–3
Emotional Accuracy Scale (EARS), 346 self-report data, problems of, 333
Emotional Competence Inventory (ECI), 344 Employee skill strategy, 217–219
Emotional intelligence, 730 Employee theft, 593
ability emotional intelligence model, 340, 341, 342 Employment Litigation Section (ELS), 289
competence, definition of, 341 Enlistment-screening test, 692
conceptual framework, 349–351 Ensley NAACP v. Seibels, 638
definition, 340 Enterprise resource planning (ERP), 189
future research, strategies for Entry-level leadership; see also Leadership, selection
bivariate relationships, 352–353 methods and desired outcomes; Mid-level
disentangling methods and constructs, 351–352 leadership
longitudinal validation designs, 353–354 assessment, 726–727
multilevel perspective, adopting, 354 tools and techniques, 728–733
predictor and criterion, conceptual matching case studies of, 733–737
between, 351 changing behavioral requirements, 725–728
measurement approaches, 344 competencies, for global companies, 725
assessment center exercises, 345, 348–349 current business trends affecting, 722–725
interviews, 345, 347 for high-level assignments, 722–723
other-reports, 345, 346 globalization and diversity of followers, 723–724
performance-based tests, 345, 346–347 high-velocity hiring for, 733–735
self-report approach, 344–346 integrating assessment content and technology,
situational judgment test, 345, 347–348 721–722
trait emotional intelligence model, 340, 341 leadership pipeline approach for, 735–737
Employee safety, 751–752 skills and potentials, assessment principles for, 726
Employee selection Environment, in blue collar selection
education, branding, 954–956 economic changes, impact on, 742
in Europe labor organization
after World War II (1945–1980), 931–934 role of, 743
resurgence of (since 1980), 934–936 working with, 743
scientific (1900–1945), 921–931 line management, working with, 742–743
knowledge of managers in, 245–246 technical and management staffing, 742
legal and social issues in, 332–333 tradition of distrust, 743
musings on perpetual disconnect in, 950–954 Environmental working conditions, identification of,
assessment-based decisions, 952 279–280
judgment, 953 EQ-I, 344
nonlinearity and profile matching, 951 Equal Employment Opportunity Commission (EEOC),
personal assessment by consulting organizations, 629
practice of, 953–954 Equal Employment Opportunity laws, 817
psychologists and managers, difference between, Equality, as criterion of fairness, 577
952 Equating, 155
scientist-practitioner model, in I-O psychology, Equity, as criterion of fairness, 577
950–951 Equivalence measurement, 177–178
outcomes of, 241–245 Ercegovich v. Goodyear, 641–642
expectancy charts and performance differences, Ethical egoism, 574
243–245 Ethical Principles of Psychologists, 419
Ethics, of employee selection, 571–589 Expectation, 12–13
assessment centers, 584–585 Expert-only strategy, 221–222
computerized and web-based testing, 585 Expert-owner strategy, 222–223
dilemmas, 578–580
harm prevention, 578–579 F
role conflict, 579–580
temptation, 579 Factor analysis, 100
values conflict, 579–580 Factor variance-covariance (FVC) invariance, 410
issues in selection, 583–584 Fairness, 602
validity, 583 need, as criterion of, 577
professional competence, 583 Family-friendly policies, 561
test security, 583 Faragher v. Boca Raton, 644
multiple responsibilities, 584 Feedback
general guidelines, 587–589 in assessment process, 417
individual level assessments, 584 applicants’ reactions to feedback, 420–426
multiple stakeholder perspective, 575 assessment center feedback, 417–418
participants in validation research, 572–574 benefits and costs of, 426–431
principles, 576–578 executive selection, individual assessment for,
beneficence, 577 418–419
fairness and justice, 576–577 implications, 431–433
moral character, 578 and professional standards, 419–420
nonmaleficence, 577–578 test feedback, 418
respect, 576 opportunities, for different assessment methods,
in professional practice, 580–582 428–429
situational/contextual organizational issues, Feliberty v. Kemper, 647
585–587 Field independence, 325
organizational pressure to misuse test data, Final test battery, selection of, 286
586–587 canonical correlation, 287
in retesting, 586 logistic regression, 287
in setting cut scores, 586 Financial metrics, 523
in unionized organization, 585–586 Fink v. New York City, 647
universalist perspective, 574–575 Five-Factor Model (FFM), 299, 300, 301, 494–495,
values position, 572 730–731, 768–769; see also Personality
Ethics Resource Center (ERC), 379 questionnaires
Eudaimonia, 574 in leader selection, 826, 827
Examination plan, 709 and work stress, 539–540
Executive(s), 823–835 Flexibility and equilibrium tests, in employee selection,
assessment techniques, 826 282
assessment centers, 829 Fluid form of social intelligence, see Crystallized versus
biodata, 827 fluid form of social intelligence
cognitive ability tests, 827 Forced-choice formats, for personality measurement,
individual psychological assessment, 828 302–303
leadership questionnaires, 829–830 Frame-of-reference training, 446–447, 521
multisource appraisal, 829 Functional job analysis, 78; see also Work analysis
personality questionnaires, 827–828 Furnco v. Waters, 631
competencies and attributes of, 824–826
integrity, 826 G
intelligence, 825
personality, 826 Geller v. Markham, 645
self-efficacy, 826 Gender
social/emotional intelligence, 825 and counterproductive work behavior, 497
definition of, 823–824 disparate treatment based on, 642
fitting to strategic demands of position and occupational safety, 544
culture, 833 and work stress, 540
followers, 832–833 General aptitude test battery (GATB), 177
outcome, 832 General mental ability (GMA), 114
team, 832 Generalizability theory (G-theory), 16–19, 99, 102, 600
impact on organizational performance, 833–834 dependency of σ2T and σ2E
performance and potential, evaluation of, 830–831 on characteristics of the measurement procedure,
selection, in HRM system, 830 21–22
Executive Order 11246, 628 on desired generalizations, 19–20
Executive selection, individual assessment for, 418–419 on intended use of scores, 20–21
Expectancy charts and performance differences local generalizations, 108–109
between high and low scorers, 243–245 from test scores, 101–102
Geographic dispersion, 781 I
Gillespie v. Wisconsin, 635, 639
Global branding, 131 Ideal point response methods, for personality
Globalization and diversity of followers measurement, 303–304
changing leadership requirements, 723–724 “Identification” rule, 631
Goal cascading, 516–517 Immediate/proximal determinants
Goal-setting behavior, 769 adaptive performance, 471–472
Grade point average (GPA), 154 citizenship performance, 472–474
Grandparenting, 753 In situ performance, 559
Graveyard shifts, 535–536 Inadequate data reporting, 65–66
Grenier v. Cyanamid, 647 In-basket technique, 846, 858
Griggs v. Duke Power, 633, 635, 637 Incentives
Grow Your Own Leaders, 735 for healthcare programs participation, 533, 534
Growth markets, 131 Inconsistency, 13
Grutter v. Bollinger, 640 Index of Vocal Emotion Recognition (Vocal-I), 346
Guardians v. Civil Service, 634, 639 Individual differences
Guilford-Martin Inventory, 846 and counterproductive work behavior, 492–498
background factors, 497
H cognitive ability, 496
demographic factors, 496–497
Hamer v. Atlanta, 638 Five-Factor Model, 494–495
Han Dynasty, 705 individual personality traits, 496
Hare v. Potter, 644–645 integrity tests, 494
Hayden v. County of Nassau, 640, 645 Individual expectancy chart, 243–245
Hazen v. Biggins, 646 Individual performance objectives
Health Insurance Portability and Accountability Act cascading goals and, 516–517
(HIPAA), 533–534 challenges associated with, 519–522
Health insurance premiums controllable objectives, 521
growth rate of, 532 developing comparable and fair objectives,
Healthy workers, 531–535 520–521
healthcare costs, 532–533 fluid jobs, 522
modifiable health risk factors on, 532–533 job-relevant objectives, 519–520
organizational wellness programs, 533–534 measuring important aspects, 522
comprehensive, 533 team-based jobs, 521–522
selection of, 534–535, complications in, 534–535 training, 519
Hedberg v. Indiana Bell, 647 guidelines for developing, 517–519
Hierarchical linear models (HLMs), 37 and higher-level objectives, linkage of, 517
High-fidelity item presentation, 180–182 measuring results of, 522–523
High-involvement organizations, see High-performance and success, multilevel issues, 558–562
organizations (HPOs) work-family conflict, 560–562
High-performance organizations (HPOs), 88 Individual psychological assessment
High-performance work practices (HPWPs), 571 in executive selection, 828
High potentials, definition of, 823–824 Industrial-organizational (I-O) psychology, 1
High-stakes decisions, 559 criterion problem in, 552–558
High-velocity hiring criterion deficiency, 555–556
for entry- and mid-level leadership, 733–735 definition of success, 553–555
Hippocratic Oath, 581 definitions and assumptions of, 552–553
Hispanics health and well-being integration, 558
cognitive ability testing, 265–269 individual health, 556–558
physical performance testing, 285 organizational health, 558
recruits in U.S. Military, 688 occupational analysis by, 887–889
Honesty-humility, 495 research on children, 561–562
Host country nationals (HCNs), 220 scientist-practitioner model in, 950–951
HR-XML consortium, 187 values of, 572
HTML (hypertext markup language), 187 Informed consent, 582
Human capital management, 726 in validation research, 573
Human capital theory, 199 Inimitable resources, 198
Human resource management (HRM) Institute of Psychotechnics, 928
executive selection in, 830 Instrumental stakeholder theory, 575
hiring and, 239–240 Instrumental values versus terminal values, 322–323
leaders in, 722 Integrity of leadership, 826
strategy, 216–217 Intelligence, 825, 945
Human Resources Research Organization (HumRRO), academic, 343
866 definition of, 258
emotional, 825, 828 inducements, 143
social, 343, 825, 828 pay and promotion opportunities, 142
Interest measures, motivational construct of, 330–331 and talent management, 145
Internal and external candidates, differences between, 428 and technology, 144
Internal structure evidence timeliness, 142
characteristic features, 118 work-life balance, 143
relationship to local generalizations, 118 potential applicants, reaching, 127
International Archive of Education Data (IAED), 788 knowledge of company, 129
Internet applicant recruitment activities, 130
case law, 185–186 recruitment information source, 128–129
data privacy and protection, 184–185 sourcing globally, 130–132
definition, 183–184 strategic talent management, 134–136
Interpersonal Competency Inventory (ICI), 349 technological innovation, 133–134
Interpersonal conflict Job component validity (JCV), 109
and counterproductive work behavior, 500 Job demands, 537
Interpersonal Perception Task-15 (IPT-15), 346 Job Diagnostic Survey (JDS), 79
Interpersonal stressors, 500 Job element method, 79
Interview(s), 732–733 Job hopping, 724
assessing cognitive ability, 260 Job knowledge tests, 441
construct validity of, 305–306 Job Orientation Blank (JOB), 871, 872
cost to organization, 368–369 Job performance
cultural differences, 787 dimensionality of, 447–450
error in rating, 400–401 measured in U.S. military, 689–691
and measuring Job relatedness, 707, 945
emotional intelligence, 347 Job simulation/work sample tests, 281, 283
personality, 305–306 advantages and disadvantages of, 283
practical intelligence, 347 Johnson v. City of Memphis, 632
social intelligence, 347 Joint-Service Job Performance Measurement/Enlistment
public sector, role of, 710 Standards Project (JPM), 689–691, 866
tools, 176 Journal of Business and Psychology, 413
Invariance Justice perceptions, and feedback, 422–423
meaning of, 404
sources of K
configural, 407–408
factor variance-covariance, 410 Katkovsky’s Management Incomplete Sentences Test,
metric, 409–410 846
uniqueness, 410–411 Knowledge, skills, and abilities (KSAs), 57
I-O psychology, scientist-practitioner model in, 950–951 Knowledge, skills, abilities, and other attributes (KSAOs),
Isoinertial tests, 284 73, 200, 464, 806–808
Isokinetic test, 282
Isometric/static strength tests, 281 L
Isotonic tests, 281–282
Item response theory (IRT), 14n, 23n, 37, 100, 155, 180, Labor leaders, 743, 751
301, 303, 386, 405, 446 Labor organizations, 750, 758
role of, 743–744
J working with, 743, 751
Lanning v. SEPTA, 638
Japanese and Caucasian Brief Affect Recognition Test Latent failures, 542
(JACBART), 346 Lawson v. CSX, 647
Job Adaptability Inventory (JAI), 464 Leader Behavioral Description Questionnaire, 829
Job analysis, see Work analysis Leaderless group discussion, 846–847
Job candidates, attracting to organizations, 127 Leader member exchange (LMX), 473
interest maintaining, 136 Leaders, 727; see also Leadership
around globe, 139–140 changing behavioral requirements for, 725–728
and hiring process, 138–139 skills and potentials, assessment principles for, 726
recruiter demographics, 137 competencies, for global companies, 725
selection procedures, 138 mistakes by, 724
site visits, 137 in private sector organization, 727
of targeted talent, 141–142 skills and potentials, assessment principles for, 726
technology, role of, 140–141 skills needed for, 727
treatment of applicants by recruiters, 136 Leadership, 823–835; see also Entry-level leadership;
offers, accepting, 142 Leaders; Mid-level leadership
around globe, 143–144 assessment techniques, 826
family and friends, 143 assessment centers, 829
biodata, 827 disparate treatment, 642–643
cognitive ability tests, 827 downsizing, 641–642
individual psychological assessment, 828 job-related alternatives, 631–633
leadership questionnaires, 829–830 optimal balance between job-relatedness and adverse
multisource appraisal, 829 impact, 639–641
personality questionnaires, 827–828 prevention of EEO violations, 643–645
competencies and attributes, 824–826 significance of criterion-related validity coefficient,
integrity, 826 637–638
intelligence, 825 validation evidence, 634–636
personality, 826 work analysis, 633–634
self-efficacy, 826 Levels of Emotional Awareness Scale (LEAS), 346
social/emotional intelligence, 825 Line management
ethical leadership, definition of, 825 working with, 742–743, 750–751, 757
fitting to strategic demands of position Linear mixed models (LMMs), 37
culture, 833 Linear structural relations (LISREL), 404
followers, 832–833 and confirmatory factor analysis (CFA), 405–407
outcome, 832 Lineworkers, 742
team, 832 Linkage methods, 106
impact on organizational performance, 833–834 Local generalizations, 108–109, 946, 948
performance and potential, evaluation of, 830–831 and content-oriented evidence, 117–118
pipeline approach, for entry- and mid-level leadership, and criterion-oriented methods, 114–115
735–737 and internal structure evidence, 118
questionnaires, for executive selection, 829–830 and meta-analysis, 115–116
selection methods and desired outcomes professional considerations, 118–120
assessment, 726–727 synthetic validation, 116–117
case studies of, 733–737 and types of evidence, 112–114
changing behavioral requirements for, 725–728 Lomack v. City of Newark, 640–641
competencies, for global companies, 725 Low-fidelity simulations, see Situational judgment test
current business trends affecting, 722–725
globalization and diversity of followers, 723–724 M
for high-level assignments, 722–723
integrating assessment content and technology, Maintenance, 189
721–722 Maladjustment
in private sector organization, 727 personal, 543
skills and potentials, 726 social, 543
skills needed for, 7 Management and labor, selling selection procedures to
Leadership Opinion Questionnaire (LOQ), 829 adverse employment decisions, reduction in, 752
Learning goal orientation (LGO), 329 employee safety, 751–752
Learning goals versus performance goals, 328–329 selection procedure, fairness of, 752
Leftwitch v. Harris Stowe, 645 Management by objectives (MBO), 516
Legal compliance, of selection policies, 230 Management Progress Study (MPS), 843–863
Legal environment, in various countries, 651–675 assessment center method
data collection methodology, 652–673 advantages and disadvantages, 855–857
discrimination, 665, 667, 674 early developments in, 843–844
evidence for, 665 multitrait-multimethod analysis of, 860
refutation of, 665, 667, 668–671 in World War II, 844–845
discussion, 673–675 managerial assessment, beginnings of, 845–847
laws protecting specific groups, 658–665, 666–667, 674 managerial success, predicting, 847–852
law violation, consequences, 667, 668–671 design and sample, 849
mean difference between subgroups, 657–658, 674 long-term predictive elements, 850–851
preferential treatment for minority groups, 668–671, sustainability, 849–850
672, 674–675 turnover, 851–852
science-based selection tools, 672–673, 675 selection legacy, 858–863
selection methods, banning, 667, 672 successful managers’ development over time
status of disadvantaged subgroups, 652, 653–655, 673 management style, 854–855
status of women in workplace, 652, 655–657, 673 managerial abilities, evaluation of changes in,
Legal principles, 627–648 852–853
adverse impact in multistage hiring process, 630–631 motivation for advancement, 853–854
age discrimination, 645–646 Management staffing, 742
“applicant” definition in Internet recruitment, 629–630 Managers and psychologists, difference between, 952
criterion information in performance appraisal, Mayer-Salovey-Caruso Emotional Intelligence Test
636–637 (MSCEIT), 346; see also Emotional
cut scores, 638–639 intelligence; Intelligence
disability decisions, 646–648 Meacham v. Knolls, 636, 643, 646
Measurement bias, 608 Moral character standards, in U.S. military enlistment,
Measurement error, 9, 10 692
emerging perspectives on, 36–37 Moral values, 322
Measurement invariance/equivalence (MI/E), 404 Motivation
Measures, evaluation of, 399 definition of, 327
CFA MI/E in selection research, 411 and safety behaviors, 544
criteria, 412–413 Motivational constructs, in employee selection research,
predictors, 412 327
invariance, sources of concerns and emerging issues, 331–332
configural, 407–408 examples of, 328
factor variance-covariance, 410 achievement motivations, 329–330
metric, 409–410 interest measures, 330–331
uniqueness, 410–411 trait goal orientations, 328–329
LISREL and CFA, 405–407 future research directions, strategic agenda for
reliability, 400 adaptive value, 334–335
momentary time-limited factors, error due to, dimensionality, 334
401–402 interconstruct relationships, 335
raters, error due to, 400–401 person-environment fit, 335
sources of error, 402 specificity, level of, 334
uniqueness of individual items, error due to, practical considerations, in employee selection
400 cultural differences, 333
validity, 402 legal and social constraints, 332–333
contamination, 403–405 self-report data, problems with, 333
sufficiency, 402–403 subgroup differences, 333
Measure-to-construct relationships, 107 studying, reasons for, 328
Measure-to-measure relationships, 107 Multiculturalism, 781
Mediation, meaning of, 467–468 Multidimensional Emotional Intelligence Assessment
Medical model, 115 (MEIA), 345; see also Emotional intelligence;
Mental health, 556–557 Intelligence
Meta-analysis, 607–608, 635 Multilevel issues, consideration of, 66
characteristic features of, 115–116 Multilevel personal selection system, 956
relationship to local generalizations, 116 Multinational organizations, selection in, 781–796
Metric invariance, 409–410 cross-national selection and assessment, challenges of
Mid-level leadership; see also Entry-level leadership; conceptual equivalence across cultures, creating,
Leadership 786
assessment, 726–727 cross-culturally acceptable methods, developing,
tools and techniques, 728–733 786–788
case studies of, 733–737 selection constructs applicable across cultures,
changing behavioral requirements, 725–728 determining, 785–786
competencies, for global companies, 725 systems affecting employee selection, national
current business trends affecting, 722–725 differences in, 788–789
globalization and diversity of followers, 723–724 employee selection practices
for high-level assignments, 722–723 to leverage, integration of, 784–785
high-velocity hiring for, 733–735 for local responsiveness and flexibility,
integrating assessment content and technology, differentiation of, 784
721–722 for worldwide coordination and control,
leadership pipeline approach for, 735–737 standardization of, 782–784
skills and potentials, assessment principles for, 726 implications for employee selection
Miller v. Illinois, 647 international assignment candidate selection,
Minimum qualification (MQ) individual-level antecedents for, 790–793
in public sector employment, 709 international assignments, types of, 789–790
in selection process, 630–631 practices in international assignee selection,
Minnesota Job Description Questionnaire (MJDQ), 80 793–795
Minnesota Multiphasic Personality Inventory (MMPI), Multiple regression, 608, 612
672 Multiple stakeholder management, 572
Minority population, 652 Multipurpose Occupational Systems Analysis Inventory-
Mobbing, 490 Close-Ended (MOSAIC), 80
Moderation, meaning of, 467 Multipurpose work analysis systems, 92
Moderators and mediators Multisource (360º) appraisal, in executive selection, 829
adaptive performance, 476–477 Multi-trait multi-method (MTMM), 59, 860
citizenship performance, 477–479 Murphy v. UPS, 647
Montreal Set of Facial Displays of Emotion (MSFDE), Muscular endurance/dynamic strength tests, 282
346 Muscular strength tests, 281
Moore v. Philadelphia, 644 Myers-Briggs Type Indicator (MBTI), 326
N Online testing, 156; see also Computerized and web-based
testing
National Council for Measurement in Education (NCME), Organizational aggression, 490
9, 403 Organizational citizenship behavior (OCB), 465
National Labor Relations Board, 752 Organizational commitment, 724
Navy Computer Adaptive Personality Scales (NCAPS), Organizational deviant behavior, 555
303 Organizational expectancy chart, 243–245
Negative psychological effects (NPEs), 425 Organizational performance, 558
Nested modeling approach, 406 Organizational politics, 538–539
Nixon, Richard, 687 Organizational stressors, 500–501
Nonipsative forced choice, for personality measurement, Organization development (OD), 90
303 Organization for Economic Cooperation and Development
Normative stakeholder theory, 575 (OECD), 532
Norm-referenced testing, 289 Organization purposes, 215–216
Organization’s perspective feedback
O benefits of, 426
costs of, 426–427
Obesity, 534 Orthodox personnel selection, 922
Occupational analysis, see Work analysis Oubre v. Entergy, 642
Occupational Information Network (O*NETTM), 80,
887–889, 894–905 P
application of, 901–905
content model Paper and pencil questionnaire
experience requirements, 896–897 for KSAOs assessment, 805
hierarchical structure, 897 Project A, 870
occupational requirements, 897 for sales and service operations, 771
occupation-specific information, 897 Parent country nationals (PCNs), 220
worker characteristics, 895–896 Parker v. University of Pennsylvania, 629–630
worker requirements, 896 Part-time employment
workforce characteristics, 897 women’s wish in, 657
job titles for common occupations in services and Part-time versus full-time workers, 536
sales, 766 Passing scores, for employment decisions, 288–289
as operational database Pattern-matching approach, 59
data collection and screening, 900–901 Pay and promotion opportunities, 142
occupational structure, 899 Performance appraisal, 636
other revisions, 899–900 Performance goals versus learning goals, 328–329
prototype data collection Performance Improvement Characteristic (PIC) checklist, 80
methods, 897–898 Perseverance, 544
results and evaluation, 898–899 Personal control, 539
service and sales occupations, worker requirements Personality, 2
and characteristics for, 767 and task performance, 457
work styles for childcare workers, 903 measures, 730–731
Occupational Personality Questionnaire, 731 mean difference on, 658
Occupational safety, 541–545 Personality questionnaires, 302; see also Five-Factor
adolescents and, 544 Model (FFM)
behaviors, 544 in executive selection, 827–828
compliance, 544–545 measuring leadership, 829–830
participation, 545 Personality traits, 806–808
systems and criteria, 542 versus cognitive styles, 325
workplace accidents, 542–544 versus social skills, 343
demographic factors, 544 Personality variables, 299
personal characteristics, 543 criterion scores and personality, relationship between,
Offer acceptance, 142 311
around globe, 143–144 measurement methods, 301
family and friends, 143 biodata measures, 304–305
inducements, 143 computer-adaptive, IRT, nonipsative forced choice,
pay and promotion opportunities, 142 303
and talent management, 145 forced-choice item response formats, 302–303
and technology, 144 ideal point response methods, 303–304
timeliness, 142 interview, 305–306
work-life balance, 143 other-reports of individuals’ personality, 304
Office of Workman’s Compensation Programs (OWCP), self-report questionnaire measures, 301–302
751 simulations and assessment centers, 307–308
Officer Candidate/Training School (OCS/OTS), 680, 695 situational judgment test, 306–307
nonlinear personality-criterion relationships, 311 interviews, 345, 347
personality-performance relationships, 311 other-reports, 345, 346
structure of, 299–301 performance-based tests, 345, 346–347
validity and factors affecting usefulness, 308 self-report approach, 344–346
adverse impact, 310 situational judgment test, 345, 347–348
criterion-related validity, variables affecting, Practice tests, 760–761
309–310 Predictive bias, 608
incremental validity, 310 Predictor composites, creating
predictor-criterion relationships, nature of, 310–312 choosing predictors, 160–161
Personality-Related Position Requirements Form (PPRF), weighting predictors, 161–162
80 Predictor-criterion relationships, nature of, 310–312
Personal values, 322 Predictor-measure-to-criterion construct relationships,
Person-environment fit, 809 107
Person-group fit, 809–811 Preferential treatment, 672
Personnel Decisions Research Institute, Inc. (PDRI), 866 Principles for the Validation and Use of Personnel
Personnel selection, business-unit-level value, 196–197 Selection Procedures, 364, 604–610
Person-team fit, 809 analysis on work, 605–606
Physical abilities, identification of, 280–281 application to litigation, 605
Physical examination, in U.S. military enlistment, 692 fairness and bias, 608
Physical performance tests, 277, 281 history of, 604
arduous jobs, job analysis for, 277 operational considerations, 608–610
environmental conditions, identification of, bands, 609
279–280 combining selection procedures, 609
ergonomic/biomechanical/physiological analysis, cut-off scores versus rank order, 609
278–279 multiple hurdle versus compensatory models, 609
physical abilities, identification of, 280–281 technical validation report, 609–610
physically demanding tasks, 277–278 utility, 609
benefits of, 294 purpose of, 604
legal issues, 291 versus Standards, comparison, 604–605
Age Discrimination in Employment Act of 1967, validation, 606–607
292–294 content validity, 606–607
Americans with Disabilities Act of 1990, 291 criterion-related, 606
Civil Rights Act of 1964, 292–294 internal structure, 607
test development/selection, 284–285 validity evidence, generalization, 607–608
test scoring and administration meta-analysis, 607–608
passing scores, establishment of, 288–289 synthetic/job component validity, 607
physical performance test preparation, 289 transportability, 607
scoring, types of, 287–288 Private sector organizations, blue collar selection in
test administration, 289 environment, 741
validity of, 285 economic changes, impact on, 742
final test battery, selection of, 286–287 labor organization, working with, 743
Police Officers v. Columbus, Ohio, 635 line management, working with, 742–743
Political skill, 539 tradition of distrust, 743
Polygraph, 667 recruitment and employee development, 759–762
Position Analysis Questionnaire (PAQ), 79, 894, 897 selection process, planning and developing
Position classification plan, 706 constituents, issues, and preferences, 749–754
Positive psychological effects (PPEs), 425 implementing, 754–756
Potential applicants, reaching, 127 maintaining, 756–759
knowledge of company, 129 psychometric and practical considerations, 744–749
recruitment activities, 130 Procedural justice, 422
recruitment information source, 128–129 Production rates, 439–440
Power analysis, 59 Professional ideal, 572, 574
Practical intelligence Professional model, see Professional ideal
conceptual framework, 349–351 Professional standards, and feedback, 419–420
definition, 339–340 Professional standards/guidelines
future research, strategies for importance of authorities, 594
bivariate relationships, 352–353 Principles for the Validation and Use of Personnel
disentangling methods and constructs, 351–352 Selection Procedures, 604–610
longitudinal validation designs, 353–354 reliance of, 621–622
multilevel perspective, adopting, 354 Standards for Educational and Psychological Testing,
predictor and criterion, conceptual matching 595–604
between, 351 Standards/Principles versus Uniform Guidelines,
measurement approaches, 344 616–620
assessment center exercises, 345, 348–349 Standards versus Principles, 615, 616–618
technical information versus litigation, 621 Quality, 523
Uniform Guidelines on Employee Selection Quantity, 523
Procedures, 610–615
Profile of Nonverbal Sensitivity (PONS), 346 R
Project A, 865–885
criterion-related validation, 879–882 Race
enabling of, 866–867 and downsizing, 641
implications of, 882–884 recruitment based on, 631
initial theory, 874 Random-Effects Model Tradition, 26–27
job analyses, 873 Rashomon, 400
latent structure of performance, modeling, 875–878 Rater error, 400–401
first-tour job performance, 876–877 Rater error training, 447
second-tour job performance, 877–878 Rater training, 446–447
objectives of, 867 Rating formats, 442
origins of, 865–866 behaviorally anchored rating scales (BARS), 442,
past performance with future performance, 444
correlations of, 878–879 behavior observation scales (BOS) format, 445
performance criteria, 873–874 behavior summary scales (BSS) format, 444–445
research design, 867–869 computerized adaptive rating scales (CARS),
data collection design, 868–869 445–446
Sampled Military Occupational Specialties (MOS), Realistic job previews (RJPs), 536–537
867–868 Recruitment
research instrument development, 869–873 activities, 130
training performance measures and employee development, 759–762
first-tour (entry-level) performance measures, 874 diagnostic testing, 760
second-tour (NCO) measures, 874–875 practice tests, 760–761
Prove performance goal orientation (PPGO), 329 technical school, relationships with, 760
Psychological services, 581–582 information source, 128–129
Psychologists and managers, difference between, 952 Reductions-in-force, see Downsizing
Psychometrics, 594, 943 Reliability, 9, 100, 600
converging trends in validity, 100 consistency, 13
Psychomotor precision, Project A, 871 estimation of, 22–24
Psychopathic deviants, 543–544 classical tradition, 24
Psychotechnics, 922–925 confirmatory factor analytic tradition, 27–32
object, 923 random-effects model tradition, 26–27
subject, 923 expectation, 12–13
Public sector employment, 705–718 inconsistency, 13
candidate recruitment, 711–712 measurement models, role of
civil service examinations, 706–708 classical test theory, 14–16
multiplicity of jobs, 707–708 generalizability theory, 16–22
test content, defined, 708 replication, 11–12
personnel decision-making and legal jeopardy, and validity, difference between, 948–949
714–718 Reliability theory, 400
balancing validity and diversity, 715–716 momentary time-limited factors, error due to, 401–402
defensibility, 716–718 raters, error due to, 400–401
negative consequences, 715 sources of error, 402
unique competitive process, 714 uniqueness of individual items, error due to, 400
position classification in, 706 Replication, 11–12
promotional process, 712–714 Reporting test scores, 162–163
past performance appraisal, 713–714 Request for information (RFI), 186
promotional tests, development, 713 Reserve Officers Training Corps (ROTC), 694–695
validation, 708–711 Results, 513–528
alternative measures, 710 individual difference predictors of, 526–527
minimum qualification, 709 individual performance objectives and, 516–519
potential selection tools, identification, 709–710 challenges associated with, 519–522
risks and legal challenges, 711 measurement of
role of interview, 710 challenges associated with, 524–526
performance objectives, 522–523
Q and workplace behavior measurement, 514–515
Resume builders, 174
Q sort, 846 Resume parsing tools, 174
Qualitative (behavioral) approaches Resume search tools, 174
for assessing outcomes of employee-selection Resume storage tools, 174
programs, 245 Retaliation, 644
Retention, 204–205 decision-making, 163
Retesting, 158–159 cut scores setting, 164–165
Retest policies, 754 profile selection, 165–166
Return on investment (ROI), 726 selection methods, 163–164
Reverse discrimination, 628 and diversity, 207
Ricci v. DeStefano, 632, 645 economical change impact on, 742
The Rights and Responsibilities of Test Takers, 419 policy, 226–230
Robinson v. Shell Oil, 644 for technical jobs, 754–755
Rod and Frames Test, 325 procedure, criteria for validating, 946–947
Rokeach Value Survey, 321 process, management of, 231–233; see also Selection
Role ambiguity, 537 process, planning and developing
and counterproductive work behavior, 500 relationship with HR activities, 208–209
Role conflict, 537 and talent segmentation, 207–208
and counterproductive work behavior, 500 Selection process, planning and developing
Role overload, 537 constituents, issues, and preferences, 749–754
Role-plays, 858 control of, 749–750
Rotter Incomplete Sentences Blank, 846 line management, working with, 750–751
Rule of Three Whole Scores, 707 line organization, working with, 751
multiple organizational stakeholders, expectations
S of, 239
parameters around testing, 752–753
Salary selling to management and labor, 751–752
disparity for women, 655 for technical jobs, characteristics, 750
Sales work, 440 fairness of, 752
implications for practice and future research implementation, 754–756
criterion issues, 774–775 influencing constituents, 755
levels issues, 776 selection policies, 754–755
predictor issues, 775 training testing professionals, 755
temporal issues, 775–776 maintenance, 756–759
nature of adequate staffing levels, 756–757
duties and responsibilities, 765–766 contractors, 757
competencies, 766 employee placement, in demanding jobs, 758–759
and service, similarities between, 767 mergers and acquisition, 759
selection for, 768–774 policies and developmental opportunities, 757–758
adverse impact, 773–774 reduction in force, 758
applicant reactions, 772–773 temporary assignments, 757
on background, experience, interests, and other life validity, 756
history dimensions, 771–772 negotiation, 755
on cognitive ability, 772 psychometric and practical considerations, 744–749
Big Five personality dimensions, 768–769 balancing fidelity, with efficiency, 748
narrow personality traits, 769–770 cut-off scores versus top-down selection, 748
service/customer/sales orientation, 770–771 economic consideration, 744
on situational judgment, 772 labor contracts, jobs covered by, 749
worker requirements and characteristics for, 767 technical selection procedure, 745–747
Sarnoff’s Survey of Attitudes Toward Life, 846 test performance, subgroup difference on, 747–748
School and College Ability Test, 846 Selection system governance, 225
Schutte Self-Report Emotional Intelligence Test Self-efficacy, 826
(SREIT), 345; see also Emotional intelligence; Self-esteem, 826
Intelligence Self-image, and feedback, 423–426
Schwartz Value Survey, 323 Self-interest, 574
Scientist-practitioner model, in I-O psychology, 950–951 Self-report questionnaires, for personality measurement,
Scores, profiles of, 63 301–302
Screening methods Self-report survey, 491
behavioral consistency method, 729–730 Service
biographical data, 729 nature of
Selection, 195 competencies, 766
alignment of, 200–203 duties and responsibilities, 765–766
business unit value, potential contribution to, 203 sabotage of, 503
global considerations, 206–207 and sales work, similarities between, 767
as lever for change, 206 selection research for, 768–774
multilevel selection, 203–204 adverse impact, 773–774
selection and retention, 204–205 applicant reactions, 772–773
talent as assets versus costs, 205–206 on background, experience, interests, and other life
critical mass, developing, 208 history dimensions, 771–772
on Big Five personality dimensions, 768–769 Social support, 538
on cognitive ability, 772 Social values, 322
implications for practice and future research Society for Human Resource Management (SHRM)
criterion issues, 774–775 Foundation, 246
levels issues, 776 Society for Industrial and Organizational Psychology
on narrow personality traits, 769–770 (SIOP), 249, 285, 672
predictor issues, 775 Sourcing globally, 130
service/customer/sales orientation, 770–771 Southeastern v. Davis, 647
on situational judgment, 772 Spatial and perceptual/psychomotor ability, and task
temporal issues, 775–776 performance, 456–457
worker requirements and characteristics for, 767 Speed of success, 646
Simulations, for personality measurement, 307–308 SPSS, 27
Situational judgment test (SJT), 154, 177, 306–307, 697, Staff training, 189
731, 805 Standard job analysis, 519–520, 525; see also Work
Project A, 875 analysis
for sales and service operations, 772, 773 Standards for Educational and Psychological Testing,
Situational specificity, 909, 910–911 364, 403, 594, 595–604
substantive tests of, 917–918 application of, 595–596
Situational specificity hypothesis (SSH), 910, 911, 913, cautions provided by, 597
914, 915, 917 credentialing procedures, 601
Situational Test of Emotion Management (STEM), 347 development and administration of selection
Situational Test of Emotional Understanding (STEU), 347 procedures, 603
Small sample sizes, 65 employment test standards, 600–601, 602
Smith v. City of Jackson, 641, 646 fairness, 602–603
Smoking, 534 history of, 595
Social capital theory, 199 purpose of, 596
Social complexity, 198 reliability and measurement error, 600
Social context and interpersonal processes, in employee rights and responsibilities, 603
selection, 240 selection decision-making in, 596–597
actual selection decisions, 240 validity, defined by, 596
benchmarking current practices, 246–247 validity evidence, evaluation, 601
business value of employee selection, rationale for validity evidence, integration, 599
evaluating, 240–241 validity evidence, sources, 597–599
employee selection, knowledge of managers, 245–246 content, 597
employee selection, outcomes of, 241–245 internal structure, 598
rational selection data, 240 response processes, 597
selection processes, need for, 247–248 testing consequences, 599
traditional psychometric paradigm of selection, 240 variable relationship, 598–599
Social effectiveness constructs, 343 validity standards, 599–600
Social exchange theory (SET), 471 Starceski v. Westinghouse, 642
Social identity theory, 424 State Occupational Information Coordinating Committees
Social intelligence; see also Intelligence (SOICCs), 893
versus academic intelligence, 343 Statistical judgment, 63–64
cognitive versus behavioral social intelligence, 343 Stone v. Mt. Vernon, 647
conceptual framework, 349–351 Strategic Job Modeling (SJM), 81
definition, 342–343 Strategic talent management, 134–136
fluid form versus crystallized social intelligence, 343 Strategic work analysis (SWA), 89, 90; see also Work
future research, strategies for analysis
bivariate relationships, 352–353 and competency modeling, 89–92
disentangling methods and constructs, 351–352 Strategic Human Resource Management (SHRM), 197
longitudinal validation designs, 353–354 firm, resource-based view, 198–199
multilevel perspective, adopting, 354 human and social capital theories, 199
predictor and criterion, conceptual matching Strategy, 195
between, 351 alignment of, 200–203
measurement approaches, 344 styles versus cognitive styles, 324–325
assessment center exercises, 345, 348–349 Strong Vocational Interest Blank, 330
interviews, 345, 347 Structural modeling, 100
other-reports, 345, 346 Stutz v. Freeman, 647–648
performance-based tests, 345, 346–347 Subgroup differences, in employee selection, 333
self-report approach, 344–346 Subject matter experts (SMEs), 75, 473
situational judgment test, 345, 347–348 Subjective rating scale, 514, 515, 521, 524
social versus academic intelligence, 343 Success
social effectiveness constructs, 343 I-O definition of, 553–555, 562
social skills versus personality traits, 343 adaptive performance, 555
contextual performance, 554–555 implications for research
organizational deviant behavior, 555 individual-level considerations, 815
task performance, 553–554 team task analysis, 815
and job performance, differentiation, 552 team-level considerations, 815–816
nonwork factors and, 562 individual-level considerations, 804–809
and performance, multilevel issues, 558–562 to effective team performance, 804–805
Summary judgment for defendants (SJDs), 641 measurement and validation, 805, 809
Supervisory role-play exercises, Project A, 875; see also team-level considerations
Role-plays person-group fit, 809–811
Sustainable selection programs, managing, 213 team composition, 811–815
HR technology team size, 809
guiding principles, 225–226 Team Role Test, 805
selection policy, 226–231 Team staffing, 804
selection process management, 231–233 Team task analysis (TTA), 804, 815
selection system governance, 225 Team type, 803
organization context for selection, 213–214 Teamwork KSA test, 805, 809
selection system sustainability defining, 214 Technical innovation versus administrative innovation,
corporate and unit interests, 219–221 391
employee skill strategy, 217–219 Technical jobs, in private sector organizations
HR strategy, 216–217 with Army Classification Battery, 747
professional selection expertise role, 221–223 definition of, 741
purposes, 215–216 desired characteristic of
selection development and delivery, 223–224 selection procedures for, 750
Sustained competitive advantage, 195 environment for, 741
Sutton v. UAL, 647 economic changes, impact on, 742
Swinburne University Emotional Intelligence Test labor organization, working with, 743
(SUEIT), 345; see also Emotional intelligence; labor organization role, 743
Intelligence line management, working with, 742–743
Synthetic validity, 104, 607, 635; see also Validation; technical and management selection, 742
Validity tradition of distrust, 743
characteristic features, 116 recruitment and employee development, 759–762
relationship to local generalization, 117 selection process, planning and developing
Systems thinking competency, 91 constituents, issues, and preferences, 749–754
implementation, 754–756
T maintenance, 756–759
psychometric and practical considerations, 744–749
Tacit knowledge, definition of, 339–340 Technical schools
Talent developing relationships with, 760
as assets versus costs, 205–206 Technical staffing, 742
management and offer acceptance, 145 Technical support, 189
segmentation, 207–208 Technical trainers, 751
Task performance, 553–554 Technological innovation, in recruitment, 133–134
measurement of, 439 Technology and employee selection, 171
and citizenship performance, 454 critical issues, 176
job performance, dimensionality of, computer adaptive testing, 179–180
447–454 demographic considerations, 182
objective criteria, 439–441 equivalence measurement, 177–178
predictors of, 454–458 high-fidelity item presentation, 180–182
subjective criteria, 441–447 legal considerations, 183–186
Taylor v. Principle Financial, 647 reactions, 182–183
Team composition, 811–815 unproctored internet testing, 178–179
aggregation method, 811–812 implementation issues, 186–189
individual differences, 812–814 industrial-organizational psychology and IT
outcome variables, 814 technology-based systems, 173–176
team and task type, 814–815 Technology and offer acceptance, 144
Team membership, selection for, 801–818 Temporary assignments, 757
contingent teamwork competencies Terminal versus instrumental values, 322–323
task demands, 803–804 Test administration and test scores, 151
team staffing, 804 Test feedback, 418
team type, 803 Testing time, 157
core teamwork competencies, 802–803 Test scores/scoring
implications for practice, 816 and administration, for physical performance test
individual-level considerations, 817 compensatory approach, 287–288
team-level considerations, 817 multiple hurdle approach, 287–288
passing scores, establishment of, 288–289 discrimination/adverse impact, 611–612
physical performance test preparation, 289 fairness, 612
test administration, 289 history of, 610
alternate test formats, 157–158 job relatedness, 613
decisions to make before collection, 151–153 purpose of, 610–611
predictor composites, creating selection procedures/employment decisions, 611
choosing predictors, 160–161 validity, 613–615
weighting predictors, 161–162 construct validity, 614
reporting, 162–163 criterion-related, 614
retesting, 158–159 documentation, 614
selection decisions making, 163 utility, 614–615
cut scores setting, 164–165 Uniqueness invariance, 410–411
profile selection, 165–166 Universalist principle, 572
selection methods, 163–164 Unproctored internet testing (UIT), 178, 377
testing time, 157 cheating, 178–179
Test security, 153–156 examinee identification, 178
alternate forms, 153–156 examinee identity authentication and cheating,
computer adaptive tests, 156 379–380
online testing, 156 perceived value of testing, 380–382
Test validation, 2 response to, 389–390
Thematic Apperception Test (TAT), 844, 846 simulation, conducting, 387–388
Third country nationals (TCNs), 220 test environment, standardization of, 378
TI/CODAP (Task Inventory/Comprehensive Occupational test security, 178, 380
Data Analysis Programs), 79 test standardization, 179
Time compression diseconomies, 199 UIT effects, simulating, 382–387
Time-limited error, 401–402 UIT simulation, 388–389
Timeliness, 142, 523 Unproctored web-based cognitive ability assessment,
Tinker Toys, 846 260–261
Title VII of the Civil Rights Act of 1964, 628, 757 U.S. Air Force
Top-down selection transformation in, 698
cut-off scores versus, 748 U.S. Army
selection using test scores, 61 transformation in, 697
Traditional model, of employee selection, 235–236 U.S. Department of Defense (DoD), 679, 687
Training, 519 U.S. Employment Service (USES), 889, 893
Trait affectivity, 543 use of The Dictionary of Occupational Titles, 892
Trait anxiety, 496, 502 U.S. Military, selection and classification in,
Trait emotional intelligence model, 340, 341; see also 679–699
Emotional intelligence; Intelligence all-volunteer force, 686–688
Trait Emotional Intelligence Questionnaire (TEIQue), 344; ASVAB misnorming, 688–689
see also Emotional intelligence; Intelligence career broadening, 696
Trait goal orientations, motivational construct of, 328–329 command selection, 696
Trait Meta-Mood Scale (TMMS), 344 current enlisted selection, 691–692
Transportability, 607 defense transformation, 697–698
Treadwell v. Alexander, 647 enlistment process, 692–693
Trinitarianism, 945–946 job performance measurement project, 689–691
Two-group configural invariance model, 408 military personnel testing, history of, 686
Two-group metric invariance model, 411 need for, 685–686
occupations in, 680, 682–683
U officer commissioning programs selection, 693–695
direct commissioning, 695
UIT simulation, 388–389 marital status, 694
Ultimate criterion, 553 OCS/OTS in, 695
U.N. Educational, Scientific, and Cultural Organization ROTC in, 694–695
(UNESCO), 788 service academies, 694
Unfairness, 612 officer executive development, 696
Unidimensional criterion versus multidimensional officer retention and attrition, 695–696
perspectives, 65 personnel system, 679–680
Uniform Guidelines on Employee Selection Procedures, rank structure in, 680, 681
364, 610–615, 747, 748 recruit quality benchmarks and enlistment standards,
alternative selection procedure, 612–613 693
application and limitations, 611 recruit quality indicators, 680, 681, 684–685
bottom line approach, 612 service differences in, 684
business necessity, 613 U.S. v. Georgia Power, 637
cut-off scores, 612 Utility, 241–243, 596, 609, 614–615
V Values
in employee selection research, 321
Validation, 88, 402; see also Validity concerns and emerging issues, 323–324
contamination, 403–405 future research directions, strategic agenda for,
content-based approaches, 105–106 334–335
differences between entry-level and promotional practical considerations in employee selection,
testing, 712–713 332–333
evidence structure of, 322–323
criterion-related, 598, 614 studying, reasons for, 322
evaluation, 601 of I-O psychology, 572
generalization, 607–608 Vigilance, 544
integration, 599 Vocational and occupational education
sources, 597–599 use of The Dictionary of Occupational Titles, 893
integrated framework for, 107 Vocational interests, and task performance, 457–458
strategies, 51
data cleaning, 60–61 W
in different contexts, 53
limitations of, 64–67 Wagner-Peyser Act of 1933, 889
long-term or scientific perspective, 66–67 Wards Cove v. Atonio, 642–643
process, 102–108 Warrant officer, 680
sufficiency, 402–403 Watson v. Fort Worth Bank, 636
tests and criteria, 102–103 Well-being, 556–557
from test scores, 101–102 Whiteback v. Vital signs, 647
and work analysis, 633 Withdrawal
Validity, 9, 38, 51, 99, 100, 583, 584, 596, 606 assessment of, 491–492
broader consequences of test score use, 44–45 consequences of, 504
consequential, 53, 104 Women
construct, 39, 51, 101, 402, 614, 635 in firefighter recruitment, 712
content, 39, 51, 106–107, 117, 118, 402, 614, 633, 634, occupational segregation of, 656, 657
635 recruits in U.S. Military, 688
convergent, 598 salary disparity compared with men, 655
criterion-related, see Criterion-related validity status in workforce, 652, 655–657
data collection, different approaches, 52 unemployment of, 655
discriminant, 599 Wong Law Emotional Intelligence Scale (WLEIS), 345;
error score, 951 see also Emotional intelligence; Intelligence
evolution of, 945 Work analysis, 73, 605–606, 614, 633–634, 707, 711–712,
generalization, 103–104 825
of inference versus test, 38–39 data collection issues, 83
limiter of, 42–44 data sources, 83
multiple inferences, in validation research, 59 data types and analysis level, 84
nomological net, 57–58 qualitative methods, 85
predictive inference versus evidence, 41–42 quantitative methods, 85
predictor-criterion relationship versus broader frontiers of, 88–93
conceptualization, 38 inferential leaps and linkages, 86–87
psychometrics, converging trends in, 100 information framework, 75–77
and reliability, difference between, 946–949 by I-O psychologists, 887–889
situational factors, 55–57 quality evaluation, 87–88
as strategy for job relatedness, 40–41 practical implications, 81–82
strong versus weak inferences, 54–55 specific methods, 77–81
synthetic, 104, 116, 117, 607, 635 traditional selection-related application, 73–74
from test scores, 101–102 for criterion development, 74
transport, 635 for predictor development, 74
trinitarian approach, 51, 945–946 for validity evidence development, 74–75
true, 946 validity evidence extension, 75
unitarian approach, 52 Work design, 542
versus validity evidence, 39–40 Work environment
Validity generalization (VG), 116, 256, 599, 635, 909, and counterproductive work behavior, 499–502
912–917, 946 group influence, 501–502
evaluation of, 913–914 leadership, 501
situational specificity-cross-situational consistency in, organizational justice, 501
914–917 stressors, 500–501
assumption of independence in, 916–917 Worker-attribute categories, model of, 238
discrete versus continuous, 915–916 Work-family conflict, 560–562
insufficient power to detect moderators, 914–915 Workforce, in 21st century, 552
Work hours, 535, 541 organizational level stressors, 535–537
Work-life balance, 143 personal characteristics, 539–540
Workload, and counterproductive work behavior, 500 World Values Survey, 323
Work locus of control, 496
Workplace behavior, measurement, versus results, 514–515 X
Workplace stress, and mental health, 557–558
Work sample/performance tests, 440–441 XML (extensible markup language), 187–188
Work schedule, 535–536
Work stress, 535–541 Z
demographic characteristics, 540–541
interpersonal relationships, 538–539 Zaccagnini v. Levy, 641
job-level stressors, 537–538 Zuniga v. Boeing, 641