Research in Social and Administrative Pharmacy 16 (2020) 1127–1130


Updating conceptions of validity and reliability


Michael J. Peeters a,∗, Spencer E. Harpe b

a University of Toledo College of Pharmacy & Pharmaceutical Sciences, 3000 Arlington Ave, MS1013, Toledo, OH, 43614, United States
b Midwestern University Chicago College of Pharmacy, Department of Pharmacy Practice, 555 31st Street, Downers Grove, IL, 60515, United States

Keywords: Validity; Reliability; Validation

Abstract

Measurement validity is important when conducting research. This is as true for sociobehavioral research as for clinical research. Although the importance of validity is not new, its conceptualization has changed substantially in the past few decades. In the literature, there is a lack of consistency in how validity is presented. This may stem from a lack of awareness of the relatively recent changes in conceptualization of validity, the continued use of a historical framework in some educational texts, and/or the continued use of a historical framework in some training programs. This article presents a brief history of the conceptualization of validity, including the progression from a perspective of related concepts of reliability and validity, to multiple types of validity, to a view of validity as a unitary concept supported by different types of evidence. This article closes by raising some important considerations about promoting use of a contemporary validity framework and associated terminology in current research, as well as in the education of future health-sciences researchers.

Introduction

Measurement validity has a peculiar place in biomedical research and health-sciences education. Many clinicians and health sciences researchers appear to conflate the two conceptual frameworks currently in use—a “traditional” framework and a “contemporary” framework. The “traditional” framework is dominant in clinical medicine, where many clinicians understand and tend to apply this conceptual framework when engaging in discussions of validity.

Meanwhile, in educational and psychological testing, validity has evolved from this earlier conception. This “contemporary” framework is more nuanced. While helpful primers on this “contemporary” framework have been published elsewhere,1,2 our experience suggests that more is needed—perhaps a review more specific to a readership of sociobehavioral researchers in the health sciences, and possibly one that compares and contrasts a “traditional” framework of validity and reliability with a “contemporary” framework of validity, within which reliability is included.

This commentary seeks to challenge understandings of validity and reliability among health sciences researchers. Health sciences researchers need an improved understanding of the contemporary conceptualization of validity, as well as more contemporary language regarding validity and validation aligned with current thinking in the area. Our hope is that this commentary will introduce readers to changes in the conceptualization of validity, help researchers update their understanding of the concept, and promote uptake of the “contemporary” framework.

Traditional framework

In the “traditional” framework, reliability and validity are seen as related but distinct concepts. When espousing the traditional framework, current textbook editions in general clinical research,3 epidemiology,4–6 and online sources7 often describe validity and reliability using a dartboard analogy. As shown in Fig. 1, validity is presented as accuracy, or how close the darts are to the bulls-eye regardless of their proximity to one another. Meanwhile, reliability is seen as precision, or how closely the darts group or cluster together on the dartboard regardless of their proximity to the bulls-eye. The optimal or ideal situation is one in which both reliability and validity are high and the darts group together as well as fall near the bulls-eye. Just as accuracy and precision are different from one another, this framework has validity and reliability as related yet separate concepts.

Importantly, this analogy poses some philosophical challenges. One overly simplistic interpretation is that a measurement approach could be accurate without being precise (i.e., valid but not reliable). Few would argue that this “valid but not reliable” combination is a tenable situation, but the language surrounding the traditional concept of validity that is typically associated with the dartboard analogy tends to support this idea.


∗ Corresponding author.
E-mail address: michael.peeters@utoledo.edu (M.J. Peeters).

https://doi.org/10.1016/j.sapharm.2019.11.017
Received 9 July 2019; Received in revised form 23 November 2019; Accepted 29 November 2019
1551-7411/ © 2019 Elsevier Inc. All rights reserved.

Fig. 1. Dartboard analogy of reliability and validity.

This dartboard analogy may have a place in introductions to describe a relationship between validity and reliability—but it should not be the final word. It is not the entire story. Educators must also acknowledge current understanding of the validity concept. Limiting instruction to this analogy can risk perpetuating an idea that validity and reliability are distinct concepts that should be handled separately from a methodological perspective, rather than being viewed as different facets of a singular concept.
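To make the dartboard analogy concrete, the short Python sketch below (an illustration added here, not part of the original commentary) simulates repeated dart throws under four invented scenarios and summarizes each with two quantities: the distance of the mean landing point from the bulls-eye (the analogy's accuracy, or “validity”) and the average spread of throws around their own mean (the analogy's precision, or “reliability”).

import numpy as np

rng = np.random.default_rng(0)

def summarize_throws(center, spread, n=500):
    # Simulate n dart throws aimed at 'center' with a given spread.
    throws = rng.normal(loc=center, scale=spread, size=(n, 2))
    mean_point = throws.mean(axis=0)
    # Distance of the mean landing point from the bulls-eye at (0, 0): "accuracy"/"validity" in the analogy.
    bias = float(np.linalg.norm(mean_point))
    # Average distance of throws from their own mean: "precision"/"reliability" in the analogy.
    scatter = float(np.linalg.norm(throws - mean_point, axis=1).mean())
    return bias, scatter

# Hypothetical scenarios: (aim offset from the bulls-eye, spread of throws)
scenarios = {
    "reliable and 'valid'":        ((0.0, 0.0), 0.3),
    "reliable, not 'valid'":       ((2.0, 2.0), 0.3),
    "not reliable, 'valid' on average": ((0.0, 0.0), 2.0),
    "neither reliable nor 'valid'": ((2.0, 2.0), 2.0),
}
for label, (center, spread) in scenarios.items():
    bias, scatter = summarize_throws(np.array(center), spread)
    print(f"{label:36s} bias={bias:.2f} scatter={scatter:.2f}")

The third scenario is the philosophically awkward “valid but not reliable” combination noted above: the throws are centered on the bulls-eye on average, yet individual throws are inconsistent.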
Contemporary framework

A second framework, termed the “contemporary” framework herein, takes the validity and reliability concepts that were previously treated as distinct and merges them into one unitary concept of validity. This unitary validity is supported by different sources of evidence. Similar to the traditional framework, validity is not dichotomous but is on a continuum. There can be stronger or weaker evidence supporting validity of a given measurement. Another important concept within the contemporary framework is that validity is not a property of the measurement instrument itself. Instead, validity lies in the interpretation of the resulting measurements or scores from an instrument's use within a specific group and for a particular purpose.

Kane8 describes validation in terms of drawing inferences from a particular test score or measurement value, which may be a helpful way to consider the context- or use-specific nature of validity. For example, the Pharmacy College Admissions Test (PCAT) itself is neither “valid” nor “invalid.” A particular use and interpretation of PCAT scores may have stronger or weaker validity evidence depending on the specifics of the situation. For example, using PCAT scores for pharmacy college admissions appears to be supported by good validity evidence. If the PCAT were used to license physicians, there would be poor validity evidence supporting that use. In Kane's terminology,8 there would be very limited evidence supporting the inference from PCAT score to physician licensure. The PCAT appears helpful for one use but not for the other.

History of validity

The evolution of validity theory, from the traditional criterion-based validity to the contemporary unified framework, has been described in detail by Kane.8 A brief overview is provided here.

A century ago, the concept of criterion validity had emerged as the dominant way to describe validity. In this conception, validity was an external relationship between a “real” attribute and an instrument's score. Even now, this criterion validity concept closely follows the traditional framework of validity as an association with or indication of reality, such as with diagnostic tests being associated with disease or no disease, or a score on a licensure exam indicating practice readiness. This traditional conception works as long as there is an external standard that can be used as a basis for evaluating some measure. For example, criterion validity can be ascertained for hemoglobin A1c as a diagnostic test in Type 2 diabetes mellitus. This test result can be compared with later incidence of Type 2 diabetes mellitus by other diagnostic criteria (with associated sensitivity, specificity, and positive/negative predictive values).
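To make that criterion-based vocabulary concrete, the sketch below computes sensitivity, specificity, and predictive values from a hypothetical 2 x 2 table comparing a screening test (for example, an HbA1c cut-off) against a later reference diagnosis. The counts are invented for illustration only; they are not data from this article.

def criterion_validity_stats(tp, fp, fn, tn):
    # Return sensitivity, specificity, PPV, and NPV from 2x2 counts.
    return {
        "sensitivity": tp / (tp + fn),  # probability the test is positive given disease
        "specificity": tn / (tn + fp),  # probability the test is negative given no disease
        "ppv":         tp / (tp + fp),  # probability of disease given a positive test
        "npv":         tn / (tn + fn),  # probability of no disease given a negative test
    }

# Hypothetical counts: screening test result vs. later confirmed diagnosis
stats = criterion_validity_stats(tp=45, fp=30, fn=5, tn=920)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")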
This criterion-based validity framework is lacking, however, when attempting to measure phenomena that are not able to be measured directly or indirectly in the physical (or “real”) world, such as self-efficacy, knowledge, or health-related quality of life. This poses important challenges for issues commonly of interest in sociobehavioral research in the health sciences, where there may be relatively few external criteria against which to compare our constructed instruments.

Without external criteria for comparison, researchers turned their focus to a different approach to determining what an instrument was measuring. This gave rise to content validity, which was initially more a conceptual approach than a statistical one. Subject matter experts assessed content validity by providing input on the extent to which the items in an instrument truly reflected what the instrument was intended to measure. If the subject matter experts concluded that the items “contained” or “covered” the phenomenon of interest, then the instrument was said to have content validity. (A recent review, using a traditional framework, focused on some methods for establishing content validity of an instrument.9)
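The commentary does not prescribe any particular statistic for such expert review, but one simple quantification often used in this context, an item-level content validity index, can be sketched as follows. The ratings below are hypothetical and the 1–4 relevance scale is an assumption for illustration.

# Hypothetical relevance ratings from 5 subject-matter experts for 4 items,
# on a 1-4 scale (3 or 4 = "relevant" to the construct being measured).
ratings = {
    "item_1": [4, 4, 3, 4, 4],
    "item_2": [3, 4, 4, 3, 4],
    "item_3": [2, 3, 2, 1, 2],  # experts doubt this item reflects the construct
    "item_4": [4, 3, 4, 4, 3],
}

def item_cvi(item_ratings):
    # Item-level content validity index: proportion of experts rating the item relevant.
    relevant = sum(1 for r in item_ratings if r >= 3)
    return relevant / len(item_ratings)

for item, r in ratings.items():
    print(f"{item}: I-CVI = {item_cvi(r):.2f}")

# A scale-level summary (average of the item-level values) is one common report.
print("S-CVI/Ave =", round(sum(item_cvi(r) for r in ratings.values()) / len(ratings), 2))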
By the mid-20th century, researchers noted increasing challenges with the limited criterion/content conceptualization of validity; thus, the field shifted further to include construct validity. This type of validity focused on whether an instrument measured some underlying trait or phenomenon of interest. This went beyond the idea of content validity, which could be somewhat subjective. Various statistical methods based on item response theory or factor analysis were used to identify one or more underlying latent variables based on responses to a series of questions or items. Evidence of construct validity for a measurement instrument was generated by identifying the underlying latent variable(s).
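As a minimal sketch of that latent-variable idea (assuming simulated item responses and using scikit-learn's FactorAnalysis, rather than any specific method named by the authors), the code below generates five items driven by a single hypothetical trait and checks whether a one-factor model recovers loadings on that trait.

import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Hypothetical data: 300 respondents, one latent trait (e.g., self-efficacy),
# and five items that each reflect the trait plus item-specific noise.
n_respondents = 300
trait = rng.normal(size=n_respondents)
true_loadings = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
items = trait[:, None] * true_loadings + rng.normal(scale=0.5, size=(n_respondents, 5))

# Fit a one-factor model; sizable loadings on all items (sign is arbitrary) would be
# one piece of internal-structure evidence that the items reflect a common latent variable.
fa = FactorAnalysis(n_components=1, random_state=0).fit(items)
print("estimated loadings:", np.round(fa.components_.ravel(), 2))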
At this point in the progression of validity theory, there were at least three types of measurement validity: criterion, content, and construct. We say “at least” here, since these three types of validity are sometimes subdivided, such as criterion validity into predictive and concurrent validity, or construct validity into convergent and divergent validity. Some research methods texts continue to use this validity framework for their discussions of survey research.6,10,11 This includes, unfortunately, an otherwise helpful review of some methods for “content validity” (i.e., building validity evidence related to content).9 Each type of validity had its own methods of assessment. To add confusion, multiple types of validity may have been determined for a particular instrument. Since these were often reported alongside reliability estimates in instrument development studies, validity and reliability came to be viewed as characteristics of the instrument itself in the traditional view of validity. The lack of a general framework describing how the types of validity fit together seemingly reinforced the idea that these were indeed separate concepts. As a result, researchers, educators, and program evaluators were left wondering which “type” of validity was most important. It was unclear which type of validity should receive the most focus or weight in validation processes.

With the 1999 update to the Standards for Educational and Psychological Testing,12 the conceptual framework for validity and reliability changed substantially.1,2 In the fields of educational and psychological testing and assessment, changes in the conceptualizations of validity and reliability had been taking place, reflecting shifts in social and psychological science in the preceding decades.13 The culmination of these shifts resulted in the adoption of a single unified theory of validity where validity came to be viewed as a unitary concept supported by multiple sources of evidence (Fig. 2). More generally, all validity is captured within a unitary construct validity framework with multiple evidence sources, not multiple types of validity.1,14

Fig. 2. Contemporary validity framework using terminology from the current Standards for Educational and Psychological Testing.14


Validity in the most recent Testing Standards continues to be described as “the most fundamental consideration in developing and evaluating tests.”15 Within the contemporary framework, reliability has become a sine qua non of validity rather than an issue to be viewed as external to validity. As one source of evidence for validity's internal structure, the reliability of scores from an instrument within a sample of individuals provides evidence towards valid inference with those scores from that instrument's particular use (i.e., from a specific sample of individuals in that particular context of use).1,13 Reliability is one source of evidence for validity; however, reliability should not be the only source of validity evidence. For instance, test scores from applicants on the PCAT should be very reliable, but reliability alone is not a sufficient marker of validity. Other sources of evidence supporting validity would be needed if a medical school were considering accepting PCAT scores for medical school admissions.
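As one concrete (and entirely hypothetical) example of score reliability as internal-structure evidence, the sketch below computes coefficient (Cronbach's) alpha for simulated item scores from a single sample. The authors do not single out this statistic; it is used here only because it is a familiar score-reliability estimate, and the estimate applies to these scores, in this sample and context of use, rather than to the instrument in the abstract.

import numpy as np

def cronbach_alpha(item_scores):
    # Coefficient alpha for an (n_respondents x n_items) score matrix:
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]
    item_vars = item_scores.var(axis=0, ddof=1)
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Simulated 10-item test for one sample of 200 examinees.
rng = np.random.default_rng(1)
ability = rng.normal(size=200)
scores = ability[:, None] + rng.normal(scale=1.0, size=(200, 10))
print(f"coefficient alpha = {cronbach_alpha(scores):.2f}")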
Additionally, reliability is specific to the circumstance. Test scores that are acceptably reliable for one context may differ from another location and/or use. For instance, motivation can affect responses. Differences have been shown when standardized test scores were used in a high-stakes environment compared with standardized test scores used with low-stakes implications.16 As mentioned previously, the context of use matters when discussing validity and its sources of evidence.

Potential gaps in understanding and their consequences

A number of health professions education sources very aptly describe this contemporary perspective of validity and have updated terminology to align with the contemporary framework.1,17–20 Unfortunately, terminology aligned with the traditional view of validity, such as discussing content, construct, and criterion as different types of validity, continues to appear in references,4,9,10 including research methods texts for surveys.11 Given the lack of consistency in current textbooks and scholarly literature in the health sciences, a gap appears between modern thinking on validity and present approaches to the topic in both health-oriented sociobehavioral research and health professions education. To the best of our knowledge, the traditional framework is generally taught in health professions education programs, which aligns with how reliability and validity are presented in most applied biomedical textbooks,3–6,21,22 including formal continuing education for health professions.9,23,24 Continued use of the traditional framework does not indicate to learners and practicing health professionals that the conceptualization of validity has changed over time.

While some health professions education specialists already communicate a contemporary framework,1,2,17–20 many clinician colleagues as well as collaborators from other health professions do not appear to have made this transition. Perhaps the most likely contributing factor is a lack of awareness or understanding. It is not uncommon to hear some colleagues mention a “valid” or “validated” measurement instrument without stating the particular use and context (i.e., that there is validity evidence supporting an instrument's use and interpretation in a particular context). Describing validity in terms of use and interpretation rather than as a property of the instrument itself may seem cumbersome, but it communicates a message consistent with a contemporary framework for validity.

Working with colleagues who only understand a traditional framework of validity and are unfamiliar with the evolution to the contemporary framework poses multiple problems. First, it can be more difficult to communicate, integrate, and build the literature without suitable understanding within a common language.13,25,26 This results in the literature appearing disjointed and divided. Second, some clinical areas rely heavily on patient-reported outcomes (e.g., quality of life) or other individual-level psychological or educational phenomena (e.g., self-efficacy) that would benefit from the contemporary validity framework. In an unfortunate example, recent guidance on developing “valid” patient-reported outcome measures mentioned different types of validity, including content, convergent, and divergent validity.27–29 Third, when validity, validation, and reliability are mentioned in accreditation standards for health professions education, the contemporary framework is assumed. For instance, the Accreditation Council for Pharmacy Education has used “validity”, “reliability”, and “validating” 17 times in their most recent PharmD program accreditation documents.30,31

Next steps

The continued use of two validity frameworks only adds to confusion around this important topic. Previous publications have raised awareness of the contemporary view,1,2,8,15,17–20 though now it becomes important to acknowledge that the predominant, traditional view of validity currently taught and used by many is not consistent with the contemporary understanding of validity. With sufficient awareness and acknowledgement of the misalignment between the traditional view of validity and contemporary validity theory, the next logical step is incorporating the contemporary framework into current scholarship and training. This raises important considerations related to what we should be teaching learners about validity and reliability.

One option is for educators to teach validity and reliability using a stepped approach. The traditional perspective could be introduced first, starting with validity and reliability of diagnostic tests, since this application is more concrete and readily applicable to patient care. Then, teaching could progress to presenting validity in a contemporary perspective for more abstract applications needing further validity evidence (e.g., psychiatric diagnoses, health-related quality of life). Alternatively, educators could focus on the more traditional perspective in entry-level professional training and intersperse instruction aligned with the contemporary validity framework during post-professional training (e.g., residencies or fellowships) or graduate training, which would seem to align better with the cognitive load32 of these more mature learners. While there are advantages and disadvantages to either of these approaches, as well as many other potential approaches, it is important for the contemporary validity framework to be introduced at some point in training.


Suggested best practices

In sum, research in the health sciences should use and build on the contemporary theory of validity, which has been included as the current standard of practice in the most recent version of The Standards for Educational and Psychological Testing.12,15 Below are some suggested best practices for consideration:

1. Language surrounding validity needs to change. Authors, peer-reviewers, and journal editors are implored to prevent outdated language for validity from finding its way into published articles. It is simply incorrect to describe an educational or psychological test as “valid” or “validated”. Test scores from a certain group can have evidence for validity when that test is used for a specific purpose.

2. Validation is the process of generating evidence for validity of a learning assessment or program of assessment.8,15,33 Educators and administrators may infer and make decisions based on that evidence. The higher the stakes of inferences and/or decisions, the stronger the validation evidence must be.

3. Researchers already using the contemporary framework must understand that collaborators, especially clinician colleagues, have very likely been taught validity in only a traditional framework. This may require extra work and additional patience in collaborative efforts in order to help their colleagues understand the changes in the conception of validity.

4. Health sciences education, both clinical and research, should include the updated framework. Students should learn the updated contemporary framework so they can use appropriate language in the future and understand how the contemporary framework builds off and extends the traditional framework.

Conclusion

While validity is a foundational aspect of all research, it is especially important in sociobehavioral research. The traditional framework of validity and reliability is insufficient. Continued education to update perspectives from the traditional framework of validity and reliability to the contemporary framework of validity will be needed. Updating language and conceptions of validity in research in the health sciences and its education will not be an easy endeavor; however, this research community has the capabilities, tools, and resources to move ahead if desired. Sociobehavioral researchers in the health sciences can also help to lead clearly-needed change among collaborators, students, and other health professionals.

Res Social Adm Pharm category

Commentary.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Ethical approval

Not applicable.

Declaration of competing interest

None.

Acknowledgements

The authors thank Paul Rega, MD, and Ken Cor, PhD, for their valuable insights during revisions of this manuscript.

References

1. Downing SM. Validity: on the meaningful interpretation of assessment data. Med Educ. 2003;37:830–837.
2. Sullivan GM. A primer on the validity of assessment instruments. J Grad Med Educ. 2011;3:119–120.
3. Hulley SB, Cummings SR, Browner WS, Grady DG, Newman TB. Designing Clinical Research. third ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2007.
4. Dawson B, Trapp RG. Basic and Clinical Biostatistics. fourth ed. New York, NY: McGraw-Hill; 2004.
5. Gordis L. Epidemiology. fifth ed. Philadelphia, PA: Elsevier Saunders; 2014.
6. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. third ed. Philadelphia, PA: Lippincott Williams & Wilkins; 2008.
7. Statistics How To: statistics for the rest of us! http://www.statisticshowto.com/reliability-validity-definitions-examples Accessed 11.13.19.
8. Kane MT. Validating the interpretations and uses of test scores. J Educ Meas. 2013;50:1–73.
9. Almanasreh E, Moles R, Chen TF. Evaluation of methods used for estimating content validity. Res Soc Adm Pharm. 2019;15:214–221.
10. Kimberlin CL, Winterstein AG. Validity and reliability of measurement instruments used in research. Am J Health Syst Pharm. 2008;65:2276–2284.
11. Burns SK, Gray JR, Grove N. Understanding Nursing Research: Building an Evidence-Based Practice. sixth ed. St. Louis, MO: Elsevier Saunders; 2015.
12. American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association; 1999.
13. Messick S. Validity. In: Linn RL, ed. Educational Measurement. third ed. New York, NY: American Council on Education/Macmillan; 1989:13–103.
14. Newton PE, Shaw SD. Standards for talking and thinking about validity. Psychol Methods. 2013;18:301–319.
15. American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association; 2014.
16. Waskiewicz RA. Pharmacy students' test-taking motivation-effort on a low-stakes standardized test. Am J Pharmaceut Educ. 2011;75: Article 41.
17. Streiner DL, Norman GR, Cairney J. Health Measurement Scales. fifth ed. New York, NY: Oxford University Press; 2015.
18. Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: theory and application. Am J Med. 2006;119:166.e7–e16.
19. Schmitz CC. Your intergalactic decoder ring has arrived: “Reliability” and “validity” defined. https://www.facs.org/education/division-of-education/publications/rise/articles/rap-archive/your-intergalactic-decoder-ring-has-arrived-reliability-and-validity-defined Accessed 11.13.19.
20. Cook DA, Lineberry M. Consequences validity evidence: evaluating the impact of educational assessments. Acad Med. 2016;91:785–795.
21. Bender DA, Varghese J, Jacob M, Murray RK. Clinical biochemistry. In: Rodwell VW, Bender DA, Botham KM, Kennelly PJ, Well P, eds. Harper's Illustrated Biochemistry. 30th ed. New York, NY: McGraw-Hill; 2015.
22. Walker JS, Roback HB, Welch L. Psychological and neuropsychological assessment. Chapter 6. In: Ebert MH, Loosen PT, Nurcombe B, Leckman JF, eds. CURRENT Diagnosis & Treatment: Psychiatry. second ed. New York, NY: McGraw-Hill; 2008.
23. Hecker K, Violato C. Validity, reliability, and defensibility of assessments in veterinary education. J Vet Med Educ. 2009;36:271–275.
24. Applegate KE, Crewson PE. An introduction to biostatistics. Radiology. 2002;25:318–322.
25. Bordage G. Conceptual frameworks to illuminate and magnify. Med Educ. 2009;43:312–319.
26. Peeters MJ, Garavalia LS. Teachable Moments Matter for: an analysis of the use of pharmacy curriculum outcomes assessment (PCOA) scores within one professional program. Curr Pharm Teach Learn. 2017;9:175–177.
27. US Department of Health and Human Services Food and Drug Administration. Guidance for Industry: Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims. Silver Spring, MD: Food and Drug Administration; 2009.
28. Gabriel SE, Normand SLT. Getting the methods right - the foundation of patient-centered outcomes research. N Engl J Med. 2007;367:787–790.
29. Rothrock NE, Kaiser KA, Cella D. Developing a valid patient-reported outcome measure. Clin Pharmacol Ther. 2011;90:737–742.
30. Accreditation Council for Pharmacy Education. Accreditation Standards and Key Elements for the Professional Program in Pharmacy Leading to the Doctor of Pharmacy Degree (“Standards 2016”). February 2015. Available at https://www.acpe-accredit.org/pdf/Standards2016FINAL.pdf Accessed 11.13.19.
31. Accreditation Council for Pharmacy Education. Guidance for the Accreditation Standards and Key Elements for the Professional Program in Pharmacy Leading to the Doctor of Pharmacy Degree (“Guidance for the Standards 2016”). February 2015. Available at https://www.acpe-accredit.org/pdf/GuidanceforStandards2016FINAL.pdf Accessed 11.13.19.
32. van Merrienboer JJG, Sweller J. Cognitive load theory in health professional education: design principles and strategies. Med Educ. 2010;44:85–93.
33. Peeters MJ, Martin BA. Validation of learning assessments: a primer. Curr Pharm Teach Learn. 2017;9:925–933.

