This book highlights the principles of psychological assessment to help researchers and clinicians better develop, evaluate, administer, score, integrate, and interpret psychological assessments. It discusses psychometrics (reliability and validity), the assessment of various psychological domains (behavior, personality, intellectual functioning), various measurement methods (e.g., questionnaires, observations, interviews, biopsychological assessments, performance-based assessments), and emerging analytical frameworks to evaluate and improve assessment, including generalizability theory, structural equation modeling, item response theory, and signal detection theory. It also discusses ethics, test bias, and cultural and individual diversity.
Key Features
• Gives analysis examples using free software
• Helps readers apply principles to research and practice
• Provides text, analysis code/syntax, R output, figures, and interpretations integrated to guide readers
• Uses the freely available petersenlab package for R
Principles of Psychological Assessment: With Applied Examples in R is intended for use by graduate students, faculty, researchers, and practicing psychologists.
Dr. Isaac T. Petersen is an assistant professor at the University of Iowa. He completed his Bachelor of Arts in psychology and French at the University of Texas, his PhD in psychology at Indiana University, and his clinical psychology internship at Western Psychiatric Hospital at the University of Pittsburgh Medical Center.
Dr. Petersen is a licensed clinical psychologist with expertise in developmental psychopathology. His clinical expertise is in training parents to manage challenging child behavior. He is interested in how children develop individual differences in adjustment, including behavior problems as well as competencies, so that more effective intervention and prevention approaches can be developed and implemented. He is particularly interested in the development of externalizing behavior problems (e.g., ADHD, conduct problems, and aggression) and underlying self-regulation difficulties. Dr. Petersen’s primary interests include how children develop self-regulation as a function of bio-psycho-social processes, including brain functioning, genetics, parenting, temperament, language, and sleep, and how self-regulation influences adjustment. A special emphasis of his work examines neural processes underlying the development of self-regulation and externalizing problems, using electroencephalography (EEG) and event-related potentials (ERPs). He uses longitudinal designs, advanced quantitative methods, and multiple levels of analysis, including bio-psycho-social processes, to elucidate mechanisms in the development of externalizing problems. His work considers multiple levels of analysis simultaneously, in interaction, and over lengthy spans of development in ways that identify people’s change in behavior problems over time while accounting for the changing manifestation of behavior problems across development (heterotypic continuity).
Chapman & Hall/CRC
Statistics in the Social and Behavioral Sciences Series
Series Editors
Jeff Gill, Steven Heeringa, Wim J. van der Linden, Tom Snijders
Isaac T. Petersen
First edition published 2024
by CRC Press
2385 Executive Center Drive, Suite 320, Boca Raton, FL 33431, U.S.A.
Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have
attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders
if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please
write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or
utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission
from the publishers.
For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the
Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are
not available on CCC please contact mpkbookspermissions@tandf.co.uk
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.
DOI: 10.1201/9781003357421
Publisher’s note: This book has been prepared from camera-ready copy provided by the authors.
To our daughter, Maisie.
Taylor & Francis
Taylor & Francis Group
http://taylorandfrancis.com
Contents
Acknowledgments xxiii
Introduction 1
2 Constructs 21
2.1 Types of Constructs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2 Differences in Measurement Expectations . . . . . . . . . . . . . . . . . . . 24
2.3 Practical Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 How to Estimate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.5 Latent Variable Modeling: IRT, SEM, and CFA . . . . . . . . . . . . . . . 26
2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.7 Suggested Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3 Reliability 27
3.1 Classical Test Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2 Measurement Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Overview of Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.4 Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.5 Types of Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.6 Applied Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.7 Standard Error of Measurement . . . . . . . . . . . . . . . . . . . . . . . . 67
3.8 Influences of Measurement Error on Test–Retest Reliability . . . . . . . . . 68
3.9 Effect of Measurement Error on Associations . . . . . . . . . . . . . . . . . 70
3.10 Method Bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.11 Generalizability Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.12 Item Response Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.13 The Problem of Low Reliability . . . . . . . . . . . . . . . . . . . . . . . . 76
3.14 Ways to Increase Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.15 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.16 Suggested Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4 Validity 79
4.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2 Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.3 Types of Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.4 Validity Is a Process, Not an Outcome . . . . . . . . . . . . . . . . . . . . . 108
4.5 Reliability Versus Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
4.6 Effect of Measurement Error on Associations . . . . . . . . . . . . . . . . . 110
4.7 Generalizability Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.8 Ways to Increase Validity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.9 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
4.10 Suggested Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
9 Prediction 265
9.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
9.2 Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 286
9.3 Receiver Operating Characteristic Curve . . . . . . . . . . . . . . . . . . . 290
9.4 Prediction Accuracy Across Cutoffs . . . . . . . . . . . . . . . . . . . . . . 293
9.5 Prediction Accuracy at a Given Cutoff . . . . . . . . . . . . . . . . . . . . 308
9.6 Optimal Cutoff Specification . . . . . . . . . . . . . . . . . . . . . . . . . . 326
9.7 Accuracy at Every Possible Cutoff . . . . . . . . . . . . . . . . . . . . . . . 328
9.8 Regression for Prediction of Continuous Outcomes . . . . . . . . . . . . . . 330
9.9 Pseudo-Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
9.10 Ways to Improve Prediction Accuracy . . . . . . . . . . . . . . . . . . . . . 333
9.11 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
9.12 Suggested Readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
References 583
Index 611
List of Figures
8.6 Item Characteristic Curves of an Item with Low Difficulty Versus High
Difficulty. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223
8.7 Item Characteristic Curves of an Item with Low Discrimination Versus High
Discrimination. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
8.8 Item Characteristic Curve of an Item from a True/False Exam, Where Test
Takers Get the Item Correct at Least 50% of the Time. . . . . . . . . . . 225
8.9 Item Characteristic Curve of an Item from a 4-Option Multiple Choice
Exam, Where Test Takers Get the Item Correct at Least 25% of the Time. 226
8.10 Item Characteristic Curve of an Item Where the Probability of Getting an
Item Correct Never Exceeds .85. . . . . . . . . . . . . . . . . . . . . . . . 227
8.11 One-Parameter Logistic Model in Item Response Theory. . . . . . . . . . . 228
8.12 Empirical Item Characteristic Curves of the Probability of Endorsement of
a Given Item as a Function of the Person’s Sum Score. . . . . . . . . . . . 229
8.13 Two-Parameter Logistic Model in Item Response Theory. . . . . . . . . . 230
8.14 Three-Parameter Logistic Model in Item Response Theory. . . . . . . . . . 230
8.15 Four-Parameter Logistic Model in Item Response Theory. . . . . . . . . . . 231
8.16 Item Boundary Characteristic Curves from Two-Parameter Graded Response
Model in Item Response Theory. . . . . . . . . . . . . . . . . . . . . . . . 232
8.17 Item Response Category Characteristic Curves from Two-Parameter Graded
Response Model in Item Response Theory. . . . . . . . . . . . . . . . . . . 233
8.18 Item Characteristic Curves from Two-Parameter Logistic Model in Item
Response Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235
8.19 Item Information from Two-Parameter Logistic Model in Item Response
Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
8.20 Test Information Curve from Two-Parameter Logistic Model in Item Re-
sponse Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
8.21 Test Standard Error of Measurement from Two-Parameter Logistic Model
in Item Response Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
8.22 Test Reliability from Two-Parameter Logistic Model in Item Response
Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
8.23 Visual Representation of an Efficient Assessment Based on Item Character-
istic Curves from Two-Parameter Logistic Model in Item Response Theory. 240
8.24 Visual Representation of a Bad Measure Based on Item Characteristic Curves
of Items from a Bad Measure Estimated from Two-Parameter Logistic Model
in Item Response Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
8.25 Visual Representation of a Bad Measure Based on the Test Information
Curve. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
8.26 Visual Representation of a Good Measure Based on Item Characteristic
Curves of Items from a Good Measure Estimated from Two-Parameter
Logistic Model in Item Response Theory. . . . . . . . . . . . . . . . . . . . 243
8.27 Visual Representation of a Good Measure (for Distinguishing Clinical-Range
Versus Sub-clinical Range) Based on the Test Information Curve. . . . . . 244
8.28 Test Characteristic Curve from Rasch Item Response Theory Model. . . . 246
8.29 Test Information Curve from Rasch Item Response Theory Model. . . . . 247
8.30 Test Reliability from Rasch Item Response Theory Model. . . . . . . . . . 248
8.31 Test Standard Error of Measurement from Rasch Item Response Theory
Model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
8.32 Test Information Curve and Standard Error of Measurement from Rasch
Item Response Theory Model. . . . . . . . . . . . . . . . . . . . . . . . . . 250
8.33 Item Characteristic Curves from Rasch Item Response Theory Model. . . . 251
8.34 Item Information Curves from Rasch Item Response Theory Model. . . . . 251
24.1 Question Asking About One’s Hispanic Origin in the 2020 U.S. Census. . 566
24.2 Question Asking About One’s Race in the 2020 U.S. Census. . . . . . . . 567
List of Tables
7.1 Criteria for Acceptable and Good Fit of Structural Equation Models Based
on Fit Indices. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
7.2 Modification Indices from Confirmatory Factor Analysis Model. . . . . . . 202
7.3 Modification Indices from Structural Equation Model. . . . . . . . . . . . 213
This book was supported by a grant from the University of Iowa Libraries.
This book would not be possible without the help of others. Much of the content of this
book was inspired by Richard Viken’s course in psychological assessment that I took as
a graduate student. I thank W. Joel Schneider who provided several examples that were
adapted for this book. I thank Danielle Szabreath, Samar Haddad, and Michele Dumont for
help in copyediting. I acknowledge my wife, Alyssa Varner,¹ who helped design several of the graphics used in this book, in addition to all of her support throughout the process.
¹ https://alyssajovarner.com
Introduction
Why R?
R is free, open source, open platform, and widely used. Unlike proprietary software used for
data analysis, R is not a black box. You can examine the code for any function or computation
you perform. You can even modify and improve these functions by changing the code, and
you can create your own functions. R also has advanced capabilities for data wrangling and
has many packages available for advanced statistical analysis and graphing. In addition,
there are strong resources available for creating your analyses in R so they are reproducible
by others (Gandrud, 2020).
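As a brief illustration of this openness, the following sketch (a hypothetical example, not code from the book; the function name meanCenter is made up for illustration) shows how you can inspect a function's source code and write your own function in R:

```r
# Because R is open source, you can view the code of any function by
# typing its name without parentheses:
sd  # prints the definition of the standard deviation function

# You can also define your own functions:
meanCenter <- function(x) {
  x - mean(x, na.rm = TRUE)  # subtract the mean, ignoring missing values
}

meanCenter(c(1, 2, 3, 6))  # returns -2 -1 0 3
```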
What Is Assessment?
Assessment is the gathering of information about a person, group, setting, or context. In psychological assessment, we are interested in gathering information about people’s psychological functioning, including their thoughts, emotions, and behaviors. Psychological assessment can also consider biological and physiological processes that are linked to people’s thoughts, emotions, and behaviors. Many assessment approaches can be used to assess people’s thoughts, emotions, and behaviors, including self-report questionnaires, questionnaires reported by others (e.g., spouse, parent, teacher, or friend), interviews, observations, biopsychological assessments (e.g., cortisol, heart rate, brain imaging), performance-based assessments, archival approaches (e.g., chart review), and combinations of these.
Psychological assessments inform many consequential decisions: whether a child is removed from their abusive home, whether a person is deemed competent to stand trial, whether a prisoner is released on parole, and whether an applicant is admitted
to graduate school. These important assessment-related decisions should be made using the
best available science.
The problem is that there has been a proliferation of pseudo-science in assessment and
treatment. There are widely used psychological assessments and treatments that we know
are inaccurate, do not work, or in some cases, that we know to be harmful. Lists of harmful
psychological treatments (e.g., Lilienfeld, 2007) and inaccurate assessments (e.g., Hunsley
et al., 2015) have been published, but these treatments and assessments are still used by
professional providers to this day. Practice using such techniques violates the aphorism,
“First, do no harm.” This would be inconceivable for other applied sciences, such as chemistry,
engineering, and medicine. For instance, the prescription of a particular medication for a
particular purpose requires approval by the U.S. Food and Drug Administration (FDA).
Psychological assessments and treatments do not have the same level of oversight.
The gap between what we know based on science and what is implemented in practice
(the science–practice gap) motivated McFall’s (1991) “Manifesto for a Science of Clinical
Psychology,” which he later expanded (McFall, 2000). The Manifesto has one cardinal principle and four corollaries.
The Manifesto orients you to the scientific perspective from which we will be examining
psychological assessment techniques in this book.
However, difficulties with replication could exist even if researchers have the best of intentions,
engage in ethical research practices, and are transparent about all of the methods they used
and decisions they made. The replication crisis could owe, in part, to noisy (imprecise and
inaccurate) measures. The field has paid insufficient attention to measurement unreliability
as a key culprit in the replication crisis. As Loken & Gelman (2017) demonstrated, measurement error generally weakens (attenuates) the observed association between measures.
But when using noisy measures and selecting what to publish based on statistical significance,
measurement error can make the association appear stronger than it is. This is what Loken
& Gelman (2017) describe as the statistical significance filter: In a study with noisy measures
and a small or moderate sample size, statistically significant estimates are likely to have a
stronger effect size than the actual effect size—the “true” underlying effects could be small
or nonexistent. The statistical significance filter exists because, with a small sample size,
the effect size will need to be larger in order to detect it as statistically significant due to
larger standard errors. That is, when researchers publish a statistically significant effect
with a small or moderate sample size and noisy measures, the effect size will necessarily be
large enough to detect it (and likely larger than the true effect). However, the effect of noise
(measurement error) diminishes as the sample size increases. So, the goal should be to use
less noisy measures with larger sample sizes. And, as discussed in Chapter 13 on ethical
considerations in psychological assessment, the use of pre-registration could be useful to
control researcher degrees of freedom.
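The statistical significance filter can be demonstrated with a small simulation. The following sketch (a hypothetical illustration, not code from the book) generates many studies with a small true effect, a small sample, and noisy measures; among the studies whose results pass the significance filter, the average estimated effect is far larger than the true effect:

```r
set.seed(1234)
trueEffect <- 0.15  # true correlation between the two constructs
n <- 30             # small sample size per study
numSims <- 2000     # number of simulated studies

significantEstimates <- numeric(0)
for (i in 1:numSims) {
  x <- rnorm(n)
  y <- trueEffect * x + rnorm(n, sd = sqrt(1 - trueEffect^2))
  # Add measurement error (noise) to both observed measures
  xObserved <- x + rnorm(n)
  yObserved <- y + rnorm(n)
  result <- cor.test(xObserved, yObserved)
  if (result$p.value < .05) {
    significantEstimates <- c(significantEstimates, unname(result$estimate))
  }
}

trueEffect                       # the true effect size (0.15)
mean(abs(significantEstimates))  # average "published" (significant) effect size
                                 # is considerably larger than the true effect
```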
The lack of replicability of findings has the potential to negatively impact the people we
study through misinformed assessment, treatment, and policy decisions. Therefore, it is
crucial to use assessments with strong psychometric properties and/or to develop better
assessments. Psychometrics refers to the reliability and validity of measures. These concepts
are described in greater detail in Chapters 3 and 4, but for now, think about reliability as
consistency of measurement and validity as accuracy of measurement.
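As a rough preview, the following simulated sketch (a hypothetical illustration, not code from the book) shows reliability as consistency, estimated here by correlating a measure with itself across two occasions, and validity as accuracy, estimated here by correlating the measure with an external criterion:

```r
set.seed(12345)
n <- 200
trueScore <- rnorm(n)                       # the construct we intend to measure
time1 <- trueScore + rnorm(n, sd = 0.5)     # observed score at time 1
time2 <- trueScore + rnorm(n, sd = 0.5)     # observed score at time 2
criterion <- trueScore + rnorm(n, sd = 0.5) # an external criterion measure

cor(time1, time2)     # test–retest reliability: consistency across occasions
cor(time1, criterion) # criterion-related validity: accuracy against the criterion
```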
1. Risky hypotheses are posed that are falsifiable. The hypotheses can be shown to
be wrong.
2. Findings can be replicated independently by different research groups and different
methods. Evidence converges across studies and methods.
3. Potential alternative explanations for findings are specified and examined empiri-
cally (with data).
4. Steps are taken to guard against the undue influence of personal beliefs and biases.
5. The strength of claims reflects the strength of evidence. Findings and the ability to
make judgments or predictions are not overstated. For instance, it is important to
present the degree of uncertainty from assessments with error bars or confidence
intervals.
Prerequisites
Applied examples in R are provided throughout the book. Each chapter that has R examples
has a section on “Getting Started,” which provides the code to load relevant libraries, load
data files, simulate data, add missing data (for realism), perform calculations, and more.
The data files used for the examples are available on the Open Science Framework (OSF):
https://osf.io/3pwza.
Most of the R packages used in this book can be installed from the Comprehensive R Archive
Network (CRAN) using the following command:
install.packages("INSERT_PACKAGE_NAME_HERE")
Several of the packages are hosted on GitHub repositories, including uroc (Gneiting & Walz,
2021), dmacs (Dueber, 2019), and petersenlab (Petersen, 2024).
You can install the uroc and dmacs packages using the following code:
install.packages("remotes")
remotes::install_github("evwalz/uroc")
remotes::install_github("ddueber/dmacs")
Many of the R functions used in this book are available from the petersenlab package
(Petersen, 2024): https://github.com/DevPsyLab/petersenlab. You can install the
petersenlab package (Petersen, 2024) using the following code:
install.packages("remotes")
remotes::install_github("DevPsyLab/petersenlab")