
Measurement Validity: A Shared Standard for Qualitative and Quantitative Research

Author(s): Robert Adcock and David Collier


Source: The American Political Science Review , Sep., 2001, Vol. 95, No. 3 (Sep., 2001), pp.
529-546
Published by: American Political Science Association

Stable URL: https://www.jstor.org/stable/3118231



Measurement Validity: A Shared Standard for Qualitative and Quantitative Research

ROBERT ADCOCK and DAVID COLLIER   University of California, Berkeley

[Abstract truncated in the source scan. The surviving line openings indicate that it summarizes the article's four themes: establishing a shared standard by which quantitative and qualitative scholars can more effectively assess, and communicate about, measurement validity; drawing a clear distinction between measurement issues and disputes about the meaning of concepts; addressing the contextual specificity of measurement and strategies that combine generality and validity; and clarifying the terms for the alternative measurement validation procedures most relevant to political science.]

Robert Adcock (adcockr@uclink4.berkeley.edu) is a Ph.D. candidate, Department of Political Science, and David Collier (dcollier@socrates.berkeley.edu) is Professor of Political Science, University of California, Berkeley, CA 94720-1950. Among the many colleagues who have provided helpful comments on this article, we especially thank Christopher Achen, Kenneth Bollen, Henry Brady, Edward Carmines, Rubette Cowan, Paul Dosh, Zachary Elkins, John Gerring, Kenneth Greene, Ernst Haas, Edward Haertel, Peter Houtzager, Diana Kapiszewski, Gary King, Marcus Kurtz, James Mahoney, Sebastian Mazzuca, Doug McAdam, Gerardo Munck, Charles Ragin, Sally Roever, Eric Schickler, Jason Seawright, Jeff Sluyter, Richard Snyder, Ruth Stanley, Laura Stoker, and three anonymous reviewers. The usual caveats apply. Robert Adcock's work on this project was supported by a National Science Foundation Graduate Fellowship.

Researchers routinely make complex choices about linking concepts to observations, that is, about connecting ideas with facts. These choices raise the basic question of measurement validity: Do the observations meaningfully capture the ideas contained in the concepts? We will explore the meaning of this question as well as procedures for answering it. In the process we seek to formulate a methodological standard that can be applied in both qualitative and quantitative research.

Measurement validity is specifically concerned with whether operationalization and the scoring of cases adequately reflect the concept the researcher seeks to measure. This is one aspect of the broader set of analytic tasks that King, Keohane, and Verba (1994, chap. 2) call "descriptive inference," which also encompasses, for example, inferences from samples to populations. Measurement validity is distinct from the validity of "causal inference" (chap. 3), which Cook and Campbell (1979) further differentiate into internal and external validity.1 Although measurement validity is interconnected with causal inference, it stands as an important methodological topic in its own right.

1. These involve, respectively, the validity of causal inferences about the cases being studied, and the generalizability of causal inferences to a broader set of cases (Cook and Campbell 1979, 50-9, 70-80).

New attention to measurement validity is overdue in political science. While there has been an ongoing concern with applying various tools of measurement validation (Berry et al. 1998; Bollen 1993; Elkins 2000; Hill, Hanna, and Shafqat 1997; Schrodt and Gerner 1994), no major statement on this topic has appeared since Zeller and Carmines (1980) and Bollen (1989). Although King, Keohane, and Verba (1994, 25, 152-5) cover many topics with remarkable thoroughness, they devote only brief attention to measurement validity. New thinking about measurement, such as the idea of measurement as theory testing (Jacoby 1991, 1999), has not been framed in terms of validity.

Four important problems in political science research can be addressed through renewed attention to measurement validity. The first is the challenge of establishing shared standards for quantitative and qualitative scholars, a topic that has been widely discussed (King, Keohane, and Verba 1994; see also Brady and Collier 2001; George and Bennett n.d.). We believe the skepticism with which qualitative and quantitative researchers sometimes view each other's measurement tools does not arise from irreconcilable methodological differences. Indeed, substantial progress can be made in formulating shared standards for assessing measurement validity. The literature on this topic has focused almost entirely on quantitative research, however, rather than on integrating the two traditions. We propose a framework that yields standards for measurement validation and we illustrate how these apply to both approaches. Many of our quantitative and qualitative examples are drawn from recent comparative work on democracy, a literature in which both groups of researchers have addressed similar issues. This literature provides an opportunity to identify parallel concerns about validity as well as differences in specific practices.

A second problem concerns the relation between measurement validity and disputes about the meaning of concepts. The clarification and refinement of concepts is a fundamental task in political science, and carefully developed concepts are, in turn, a major prerequisite for meaningful discussions of measurement validity. Yet, we argue that disputes about concepts involve different issues from disputes about measurement validity. Our framework seeks to make this distinction clear, and we illustrate both types of disputes.

A third problem concerns the contextual specificity


of measurement validity-an issue that arises when a measure that is valid in one context is invalid in another. We explore several responses to this problem that seek a middle ground between a universalizing tendency, which is inattentive to contextual differences, and a particularizing approach, which is skeptical about the feasibility of constructing measures that transcend specific contexts. The responses we explore seek to incorporate sensitivity to context as a strategy for establishing equivalence across diverse settings.

A fourth problem concerns the frequently confusing language used to discuss alternative procedures for measurement validation. These procedures have often been framed in terms of different "types of validity," among which content, criterion, convergent, and construct validity are the best known. Numerous other labels for alternative types have also been coined, and we have found 37 different adjectives that have been attached to the noun "validity" by scholars wrestling with issues of conceptualization and measurement.2 The situation sometimes becomes further confused, given contrasting views on the interrelations among different types of validation. For example, in recent validation studies in political science, one valuable analysis (Hill, Hanna, and Shafqat 1997) treats "convergent" validation as providing evidence for "construct" validation, whereas another (Berry et al. 1998) treats these as distinct types. In the psychometrics tradition (i.e., in the literature on psychological and educational testing) such problems have spurred a theoretically productive reconceptualization. This literature has emphasized that the various procedures for assessing measurement validity must be seen, not as establishing multiple independent types of validity, but rather as providing different types of evidence for validity. In light of this reconceptualization, we differentiate between "validity" and "validation." We use validity to refer only to the overall idea of measurement validity, and we discuss alternative procedures for assessing validity as different "types of validation." In the final part of this article we offer an overview of three main types of validation, seeking to emphasize how procedures associated with each can be applied by both quantitative and qualitative researchers.

2. We have found the following adjectives attached to validity in discussions of conceptualization and measurement: a priori, apparent, assumption, common-sense, conceptual, concurrent, congruent, consensual, consequential, construct, content, convergent, criterion-related, curricular, definitional, differential, discriminant, empirical, face, factorial, incremental, instrumental, intrinsic, linguistic, logical, nomological, postdictive, practical, pragmatic, predictive, rational, response, sampling, status, substantive, theoretical, and trait. A parallel proliferation of adjectives, in relation to the concept of democracy, is discussed in Collier and Levitsky 1997.

In the first section of this article we introduce a framework for discussing conceptualization, measurement, and validity. We then situate questions of validity in relation to broader concerns about the meaning of concepts. Next, we address contextual specificity and equivalence, followed by a review of the evolving discussion of types of validation. Finally, we focus on three specific types of validation that merit central attention in political science: content, convergent/discriminant, and nomological/construct validation.

OVERVIEW OF MEASUREMENT VALIDITY

Measurement validity should be understood in relation to issues that arise in moving between concepts and observations.

Levels and Tasks

We depict the relationship between concepts and observations in terms of four levels, as shown in Figure 1. At the broadest level is the background concept, which encompasses the constellation of potentially diverse meanings associated with a given concept. Next is the systematized concept, the specific formulation of a concept adopted by a particular researcher or group of researchers. It is usually formulated in terms of an explicit definition. At the third level are indicators, which are also routinely called measures. This level includes any systematic scoring procedure, ranging from simple measures to complex aggregated indexes. It encompasses not only quantitative indicators but also the classification procedures employed in qualitative research. At the fourth level are scores for cases, which include both numerical scores and the results of qualitative classification.

Downward and upward movement in Figure 1 can be understood as a series of research tasks. On the left-hand side, conceptualization is the movement from the background concept to the systematized concept. Operationalization moves from the systematized concept to indicators, and the scoring of cases applies indicators to produce scores. Moving up on the right-hand side, indicators may be refined in light of scores, and systematized concepts may be fine-tuned in light of knowledge about scores and indicators. Insights derived from these levels may lead to revisiting the background concept, which may include assessing alternative formulations of the theory in which a particular systematized concept is embedded. Finally, we define a key overarching term: "measurement" involves the interaction among levels 2 to 4.
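To make the framework concrete, the following minimal sketch (ours, not the authors'; all names and the toy scoring rule are hypothetical) expresses the four levels as a simple pipeline: a systematized concept is paired with an indicator, that is, a scoring procedure, which is applied to cases to produce scores.

```python
# Minimal sketch (not from the article): Figure 1's four levels as a pipeline.
# The concept names, attributes, and scoring rule are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class SystematizedConcept:          # Level 2: an explicit formulation chosen from the background concept (Level 1)
    name: str
    definition: str

Indicator = Callable[[dict], float]  # Level 3: any systematic scoring procedure

def toy_democracy_indicator(country: dict) -> float:
    """Hypothetical indicator: counts which procedural attributes are present."""
    attributes = ["free_elections", "broad_suffrage", "civil_liberties"]
    return sum(country.get(a, 0) for a in attributes)

def score_cases(indicator: Indicator, cases: Dict[str, dict]) -> Dict[str, float]:
    """Level 4: applying the indicator to cases yields scores."""
    return {name: indicator(data) for name, data in cases.items()}

concept = SystematizedConcept("procedural democracy",
                              "contested elections with broad suffrage and civil liberties")
cases = {"A": {"free_elections": 1, "broad_suffrage": 1, "civil_liberties": 1},
         "B": {"free_elections": 1, "broad_suffrage": 0, "civil_liberties": 0}}
print(score_cases(toy_democracy_indicator, cases))   # {'A': 3, 'B': 1}
# Measurement validation, in these terms, asks whether such scores can
# meaningfully be interpreted in terms of the systematized concept.
```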
Defining Measurement Validity

Valid measurement is achieved when scores (including the results of qualitative classification) meaningfully capture the ideas contained in the corresponding concept. This definition parallels that of Bollen (1989, 184), who treats validity as "concerned with whether a variable measures what it is supposed to measure." King, Keohane, and Verba (1994, 25) give essentially the same definition.

If the idea of measurement validity is to do serious methodological work, however, its focus must be further specified, as emphasized by Bollen (1989, 197). Our specification involves both ends of the connection between concepts and scores shown in Figure 1. At the concept end, our basic point (explored in detail below) is that measurement validation should focus on the


FIGURE 1. Conceptualization and Measurement: Levels and Tasks

Level 1. Background Concept: The broad constellation of meanings and understandings associated with a given concept.
- Downward task, Conceptualization: Formulating a systematized concept through reasoning about the background concept, in light of the goals of research.
- Upward task, Revisiting the Background Concept: Exploring broader issues concerning the background concept in light of insights about scores, indicators, and the systematized concept.

Level 2. Systematized Concept: A specific formulation of a concept used by a given scholar or group of scholars; commonly involves an explicit definition.
- Downward task, Operationalization: Developing, on the basis of a systematized concept, one or more indicators for scoring/classifying cases.
- Upward task, Modifying the Systematized Concept: Fine-tuning the systematized concept, or possibly extensively revising it, in light of insights about scores and indicators.

Level 3. Indicators: Also referred to as "measures" and "operationalizations." In qualitative research, these are the operational definitions employed in classifying cases.
- Downward task, Scoring Cases: Applying these indicators to produce scores for the cases being analyzed.
- Upward task, Refining Indicators: Modifying indicators, or potentially creating new indicators, in light of observed scores.

Level 4. Scores for Cases: The scores for cases generated by a particular indicator. These include both numerical scores and the results of qualitative classification.

relation between observations and the systematized concept; any potential disputes about the background concept should be set aside as an important but separate issue. With regard to scores, an obvious but crucial point must be stressed: Scores are never examined in isolation; rather, they are interpreted and given meaning in relation to the systematized concept.

In sum, measurement is valid when the scores (level 4 in Figure 1), derived from a given indicator (level 3), can meaningfully be interpreted in terms of the systematized concept (level 2) that the indicator seeks to operationalize. It would be cumbersome to refer repeatedly to all these elements, but the appropriate focus of measurement validation is on the conjunction of these components.

Measurement Error, Reliability, and Validity

Validity is often discussed in connection with measurement error and reliability. Measurement error may be systematic-in which case it is called bias-or random. Random error, which occurs when repeated applications of a given measurement procedure yield inconsistent results, is conventionally labeled a problem of reliability. Methodologists offer two accounts of the relation between reliability and validity. (1) Validity is sometimes understood as exclusively involving bias, that is, error that takes a consistent direction or form. From this perspective, validity involves systematic error, whereas reliability involves random error (Carmines and Zeller 1979, 14-5; see also Babbie 2001,


144-5). Therefore, unreliable scores may still be correct "on average" and in this sense valid. (2) Alternatively, some scholars hesitate to view scores as valid if they contain large amounts of random error. They believe validity requires the absence of both types of error. Therefore, they view reliability as a necessary but not sufficient condition of measurement validity (Kirk and Miller 1986, 20; Shively 1998, 45).

Our goal is not to adjudicate between these accounts but to state them clearly and to specify our own focus, namely, the systematic error that arises when the links among systematized concepts, indicators, and scores are poorly developed. This involves validity in the first sense stated above. Of course, the random error that routinely arises in scoring cases is also important, but it is not our primary concern.

A final point should be emphasized. Because error is a pervasive threat to measurement, it is essential to view the interpretations of scores in relation to systematized concepts as falsifiable claims (Messick 1989, 13-4). Scholars should treat these claims just as they would any causal hypothesis, that is, as tentative statements that require supporting evidence. Validity assessment is the search for this evidence.
MEASUREMENT VALIDITY AND CHOICES ABOUT CONCEPTS

A growing body of work considers the systematic analysis of concepts an important component of political science methodology.3 How should we understand the relation between issues of measurement validity and broader choices about concepts, which are a central focus of this literature?

3. Examples of earlier work in this tradition are Sartori 1970, 1984 and Sartori, Riggs, and Teune 1975. More recent studies include Collier and Levitsky 1997; Collier and Mahon 1993; Gerring 1997, 1999, 2001; Gould 1999; Kurtz 2000; Levitsky 1998; Schaffer 1998. Important work in political theory includes Bevir 1999; Freeden 1996; Gallie 1956; Pitkin 1967, 1987.

Conceptual Choices: Forming the Systematized Concept

We view systematized concepts as the point of departure for assessing measurement validity. How do scholars form such concepts? Because background concepts routinely include a variety of meanings, the formation of systematized concepts often involves choosing among them. The number of feasible options varies greatly. At one extreme are concepts such as triangle, which are routinely understood in terms of a single conceptual systematization; at the other extreme are "contested concepts" (Gallie 1956), such as democracy. A careful examination of diverse meanings helps clarify the options, but ultimately choices must be made.

These choices are deeply intertwined with issues of theory, as emphasized in Kaplan's (1964, 53) paradox of conceptualization: "Proper concepts are needed to formulate a good theory, but we need a good theory to arrive at the proper concepts.... The paradox is resolved by a process of approximation: the better our concepts, the better the theory we can formulate with them, and in turn, the better the concepts available for the next, improved theory." Various examples of this intertwining are explored in recent analyses of important concepts, such as Laitin's (2000) treatment of language community and Kurtz's (2000) discussion of peasant. Fearon and Laitin's (2000) analysis of ethnic conflict, in which they begin with their hypothesis and ask what operationalization is needed to capture the conceptions of ethnic group and ethnic conflict entailed in this hypothesis, further illustrates the interaction of theory and concepts.

In dealing with the choices that arise in establishing the systematized concept, researchers must avoid three common traps. First, they should not misconstrue the flexibility inherent in these choices as suggesting that everything is up for grabs. This is rarely, if ever, the case. In any field of inquiry, scholars commonly associate a matrix of potential meanings with the background concept. This matrix limits the range of plausible options, and the researcher who strays outside it runs the risk of being dismissed or misunderstood. We do not mean to imply that the background concept is entirely fixed. It evolves over time, as new understandings are developed and old ones are revised or fall from use. At a given time, however, the background concept usually provides a relatively stable matrix. It is essential to recognize that a real choice is being made, but it is no less essential to recognize that this is a limited choice.

Second, scholars should avoid claiming too much in defending their choice of a given systematized concept. It is not productive to treat other options as self-evidently ruled out by the background concept. For example, in the controversy over whether democracy versus nondemocracy should be treated as a dichotomy or in terms of gradations, there is too much reliance on claims that the background concept of democracy inherently rules out one approach or the other (Collier and Adcock 1999, 546-50). It is more productive to recognize that scholars routinely emphasize different aspects of a background concept in developing systematized concepts, each of which is potentially plausible. Rather than make sweeping claims about what the background concept "really" means, scholars should present specific arguments, linked to the goals and context of their research, that justify their particular choices.

A third problem occurs when scholars stop short of providing a fleshed-out account of their systematized concepts. This requires not just a one-sentence definition, but a broader specification of the meaning and entailments of the systematized concept. Within the psychometrics literature, Shepard (1993, 417) summarizes what is required: "both an internal model of interrelated dimensions or subdomains" of the systematized concept, and "an external model depicting its relationship to other [concepts]." An example is Bollen's (1990, 9-12; see also Bollen 1980) treatment of political democracy, which distinguishes the two dimensions of "political rights" and "political liberties," clarifies these by contrasting them with the dimensions


[Passage garbled in the source scan. The recoverable fragments indicate that it continues the discussion of Bollen's treatment of political democracy and its relation to neighboring concepts such as social or economic democracy, refers to Sartori (1984) and the relevant semantic field, and anticipates the convergent validation procedures discussed below.]

Measurement Validity and the Systematized Versus Background Concept

We stated earlier that the systematized concept, rather than the background concept, should be the focus in measurement validation. Consider an example. A researcher may ask: "Is it appropriate that Mexico, prior to the year 2000 (when the previously dominant party handed over power after losing the presidential election), be assigned a score of 5 out of 10 on an indicator of democracy? Does this score really capture how 'democratic' Mexico was compared to other countries?" Such a question remains underspecified until we know whether "democratic" refers to a particular systematized concept of democracy, or whether this researcher is concerned more broadly with the background concept of democracy. Scholars who question Mexico's score should distinguish two issues: (1) a concern about measurement-whether the indicator employed produces scores that can be interpreted as adequately capturing the systematized concept used in a given study and (2) a conceptual concern-whether the systematized concept employed in creating the indicator is appropriate vis-a-vis the background concept of democracy.

We believe validation should focus on the first issue, whereas the second is outside the realm of measurement validity. This distinction seems especially appropriate in view of the large number of contested concepts in political science. The more complex and contested the background concept, the more important it is to distinguish issues of measurement from fundamental conceptual disputes. To pose the question of validity we need a specific conceptual referent against which to assess the adequacy of a given measure. A systematized concept provides that referent. By contrast, if analysts seek to establish measurement validity in relation to a background concept with multiple competing meanings, they may find a different answer to the validity question for each meaning.

By restricting the focus of measurement validation to the systematized concept, we do not suggest that political scientists should ignore basic conceptual issues. Rather, arguments about the background concept and those about validity can be addressed adequately only when each is engaged on its own terms, rather than conflated into one overly broad issue. Consider Schumpeter's (1947, chap. 21) procedural definition of democracy. This definition explicitly rules out elements of the background concept, such as the concern with policy outcomes, that had been central to what he calls the classical theory of democracy. Although Schumpeter's conceptualization has been very influential in political science, some scholars (Harding and Petras 1988; Mouffe 1992) have called for a revised conception that encompasses other concerns, such as social and economic outcomes. This important debate exemplifies the kind of conceptual dispute that should be placed outside the realm of measurement validity.

Recognizing that a given conceptual choice does not involve an issue of measurement validity should not preclude considered arguments about this choice. An example is the argument that minimal definitions can facilitate causal assessment (Alvarez et al. 1996, 4; Karl 1990, 1-2; Linz 1975, 181-2; Sartori 1975, 34). For instance, in the debate about a procedural definition of democracy, a pragmatic argument can be made that if analysts wish to study the causal relationship between democracy and socioeconomic equality, then the latter must be excluded from the systematization of the former. The point is that such arguments can effectively justify certain conceptual choices, but they involve issues that are different from the concerns of measurement validation.

Fine-Tuning the Systematized Concept with Friendly Amendments

We define measurement validity as concerned with the relation among scores, indicators, and the systematized concept, but we do not rule out the introduction of new conceptual ideas during the validation process. Key here is the back-and-forth, iterative nature of research emphasized in Figure 1. Preliminary empirical work may help in the initial formulation of concepts. Later, even after conceptualization appears complete, the application of a proposed indicator may produce unexpected observations that lead scholars to modify their systematized concepts. These "friendly amendments" occur when a scholar, out of a concern with validity, engages in further conceptual work to suggest refinements or make explicit earlier implicit assumptions. These amendments are friendly because they do not fundamentally challenge a systematized concept but instead push analysts to capture more adequately the ideas contained in it.

A friendly amendment is illustrated by the emergence of the "expanded procedural minimum" definition of democracy (Collier and Levitsky 1997, 442-4). Scholars noted that, despite free or relatively free


elections, some civilian governments in Central and South America to varying degrees lacked effective power to govern. A basic concern was the persistence of "reserved domains" of military power over which elected governments had little authority (Valenzuela 1992, 70). Because procedural definitions of democracy did not explicitly address this issue, measures based upon them could result in a high democracy score for these countries, but it appeared invalid to view them as democratic. Some scholars therefore amended their systematized concept of democracy to add the differentiating attribute that the elected government must to a reasonable degree have the power to rule (Karl 1990, 2; Loveman 1994, 108-13; Valenzuela 1992, 70). Debate persists over the scoring of specific cases (Rabkin 1992, 165), but this innovation is widely accepted among scholars in the procedural tradition (Huntington 1991, 10; Mainwaring, Brinks, and Perez-Linan 2001; Markoff 1996, 102-4). As a result of this friendly amendment, analysts did a better job of capturing, for these new cases, the underlying idea of procedural minimum democracy.

VALIDITY, CONTEXTUAL SPECIFICITY, AND EQUIVALENCE

Contextual specificity is a fundamental concern that arises when differences in context potentially threaten the validity of measurement. This is a central topic in psychometrics, the field that has produced the most innovative work on validity theory. This literature emphasizes that the same score on an indicator may have different meanings in different contexts (Moss 1992, 236-8; see also Messick 1989, 15). Hence, the validation of an interpretation of scores generated in one context does not imply that the same interpretation is valid for scores generated in another context. In political science, this concern with context can arise when scholars are making comparisons across different world regions or distinct historical periods. It can also arise in comparisons within a national (or other) unit, given that different subunits, regions, or subgroups may constitute very different political, social, or cultural contexts.

The potential difficulty that context poses for valid measurement, and the related task of establishing measurement equivalence across diverse units, deserve more attention in political science. In a period when the quest for generality is a powerful impulse in the social sciences, scholars such as Elster (1999, chap. 1) have strongly challenged the plausibility of seeking general, law-like explanations of political phenomena. A parallel constraint on the generality of findings may be imposed by the contextual specificity of measurement validity. We are not arguing that the quest for generality be abandoned. Rather, we believe greater sensitivity to context may help scholars develop measures that can be validly applied across diverse contexts. This goal requires concerted attention to the issue of equivalence.

Contextual Specificity in Political Research

Contextual specificity affects many areas of political science. It has long been a problem in cross-national survey research (Sicinski 1970; Verba 1971; Verba, Nie, and Kim 1978, 32-40; Verba et al. 1987, Appendix). An example concerning features of national context is Cain and Ferejohn's (1981) discussion of how the differing structure of party systems in the United States and Great Britain should be taken into account when comparing party identification. Context is also a concern for survey researchers working within a single nation, who wrestle with the dilemma of "inter-personally incomparable responses" (Brady 1985). For example, scholars debate whether a given survey item has the same meaning for different population subgroups-which could be defined, for example, by region, gender, class, or race. One specific concern is whether population subgroups differ systematically in their "response style" (also called "response sets"). Some groups may be more disposed to give extreme answers, and others may tend toward moderate answers (Greenleaf 1992). Bachman and O'Malley (1984) show that response style varies consistently with race. They argue that apparently important differences across racial groups may in part reflect only a different manner of answering questions. Contextual specificity also can be a problem in survey comparisons over time, as Baumgartner and Walker (1990) point out in discussing group membership in the United States.

The issue of contextual specificity of course also arises in macro-level research in international and comparative studies (Bollen, Entwisle, and Anderson 1993, 345). Examples from the field of comparative politics are discussed below. In international relations, attention to context, and particularly a concern with "historicizing the concept of structure," is central to "constructivism" (Ruggie 1998, 875). Constructivists argue that modern international relations rest upon "constitutive rules" that differ fundamentally from those of both medieval Christendom and the classical Greek world (p. 873). Although they recognize that sovereignty is an organizing principle applicable across diverse settings, the constructivists emphasize that the "meaning and behavioral implications of this principle vary from one historical context to another" (Reus-Smit 1997, 567). On the other side of this debate, neorealists such as Fischer (1993, 493) offer a general warning: If pushed to an extreme, the "claim to context dependency" threatens to "make impossible the collective pursuit of empirical knowledge." He also offers specific historical support for the basic neorealist position that the behavior of actors in international politics follows consistent patterns. Fischer (1992, 463, 465) concludes that "the structural logic of action under anarchy has the character of an objective law," which is grounded in "an unchanging essence of human nature."

The recurring tension in social research between particularizing and universalizing tendencies reflects in part the contrasting degrees of concern with contextual specificity.


[Passage garbled in the source scan. The recoverable fragments indicate that the paragraph continues by noting that the approaches to establishing equivalence discussed below recognize the importance of context without abandoning the quest for generality; that empirical assessment of measurement validity is necessarily based on a particular set of cases, to which validity claims should at least initially refer; and that extending such claims to additional, heterogeneous cases calls for the procedures introduced next.]

Establishing Equivalence: Context-Specific Domains of Observation

One important means of establishing equivalence across diverse contexts is careful reasoning, in the initial stages of operationalization, about the specific domains to which a systematized concept applies. Well before thinking about particular scoring procedures, scholars may need to make context-sensitive choices regarding the parts of the broader polity, economy, or society to which they will apply their concept. Equivalent observations may require, in different contexts, a focus on what at a concrete level might be seen as distinct types of phenomena.

Some time ago, Verba (1967) called attention to the importance of context-specific domains of observation. In comparative research on political opposition in stable democracies, a standard focus is on political parties and legislative politics, but Verba (pp. 122-3) notes that this may overlook an analytically equivalent form of opposition that crystallizes, in some countries, in the domain of interest group politics. Skocpol (1992, 6) makes a parallel argument in questioning the claim that the United States was a "welfare laggard" because social provision was not launched on a large scale until the New Deal. This claim is based on the absence of standard welfare programs of the kind that emerged earlier in Europe but fails to recognize the distinctive forms of social provision in the United States, such as veterans' benefits and support for mothers and children. Skocpol argues that the welfare laggard characterization resulted from looking in the wrong place, that is, in the wrong domain of policy.

Locke and Thelen (1995, 1998) have extended this approach in their discussion of "contextualized comparison." They argue that scholars who study national responses to external pressure for economic decentralization and "flexibilization" routinely focus on the points at which conflict emerges over this economic transformation. Yet, these "sticking points" may be located in different parts of the economic and political system. With regard to labor politics in different countries, such conflicts may arise over wage equity, hours of employment, workforce reduction, or shop-floor reorganization. These different domains of labor relations must be examined in order to gather analytically equivalent observations that adequately tap the concept of sticking point. Scholars who look only at wage conflicts run the risk of omitting, for some national contexts, domains of conflict that are highly relevant to the concept they seek to measure. By allowing the empirical domain to which a systematized concept is applied to vary across the units being compared, analysts may take a productive step toward establishing equivalence among diverse contexts. This practice must be carefully justified, but under some circumstances it can make an important contribution to establishing valid measurement.

Establishing Equivalence: Context-Specific Indicators and Adjusted Common Indicators

Two other ways of establishing equivalence involve careful work at the level of indicators. We will discuss context-specific indicators,4 and what we call adjusted common indicators. In this second approach, the same indicator is applied to all cases but is weighted to compensate for contextual differences.

4. This approach was originally used by Przeworski and Teune (1970, chap. 6), who employed the label "system-specific indicator."

An example of context-specific indicators is found in Nie, Powell, and Prewitt's (1969, 377) five-country study of political participation. For all the countries, they analyze four relatively standard attributes of participation. Regarding a fifth attribute-membership in a political party-they observe that in four of the countries party membership has a roughly equivalent meaning, but in the United States it has a different form and meaning. The authors conclude that involvement in U.S. electoral campaigns reflects an equivalent form of political participation. Nie, Powell, and Prewitt thus focus on a context-specific domain of observation (the procedure just discussed above) by shifting their attention, for the U.S. context, from party membership to campaign participation. They then take the further step of incorporating within their overall index of political participation context-specific indicators that for each case generate a score for what they see as the appropriate domain. Specifically, the overall index for the United States includes a measure of campaign participation rather than party membership.
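The logic can be sketched as follows (our illustration, not Nie, Powell, and Prewitt's actual index; the component names and scoring are invented): the overall participation index is built the same way for every country, except that the fifth component is campaign participation in the United States and party membership elsewhere.

```python
# Schematic sketch (not the authors' actual index): a context-specific
# indicator swaps one component of a common index depending on context.
def participation_index(respondent: dict, country: str) -> float:
    # Four common components (names invented for illustration).
    common = ["voting", "contacting_officials", "community_activity", "political_discussion"]
    score = sum(respondent.get(item, 0) for item in common)
    # Fifth component is context-specific, following the logic described above.
    if country == "United States":
        score += respondent.get("campaign_participation", 0)
    else:
        score += respondent.get("party_membership", 0)
    return score

print(participation_index({"voting": 1, "campaign_participation": 1}, "United States"))  # 2
print(participation_index({"voting": 1, "party_membership": 1}, "Austria"))              # 2
```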
A different example of context-specific indicators is found in comparative-historical research, in the effort to establish a meaningful threshold for the onset of democracy in the nineteenth and early twentieth century, as opposed to the late twentieth century. This effort in turn lays a foundation for the comparative analysis of transitions to democracy. One problem in establishing equivalence across these two eras lies in the fact that the plausible agenda of "full" democratization has changed dramatically over time. "Full" by the standards of an earlier period is incomplete by later standards. For example, by the late twentieth century, universal suffrage and the protection of civil rights for the entire national population had come to be considered essential features of democracy, but in the nine-


teenth century they were not (Huntington 1991, 7, 16). Yet, if the more recent standard is applied to the earlier period, cases are eliminated that have long been considered classic examples of nascent democratization in Europe. One solution is to compare regimes with respect to a systematized concept of full democratization that is operationalized according to the norms of the respective periods (Collier 1999, chap. 1; Russett 1993, 15; see also Johnson 1999, 118). Thus, a different scoring procedure-a context-specific indicator-is employed in each period in order to produce scores that are comparable with respect to this systematized concept.5

5. A well-known example of applying different standards for democracy in making comparisons across international regions is Lipset (1959, 73-4).

Adjusted common indicators are another way to establish equivalence. An example is found in Moene and Wallerstein's (2000) quantitative study of social policy in advanced industrial societies, which focuses specifically on public social expenditures for individuals outside the labor force. One component of their measure is public spending on health care. The authors argue that in the United States such health care expenditures largely target those who are not members of the labor force. By contrast, in other countries health expenditures are allocated without respect to employment status. Because U.S. policy is distinctive, the authors multiply health care expenditures in the other countries by a coefficient that lowers their scores on this variable. Their scores are thereby made roughly equivalent-as part of a measure of public expenditures on individuals outside the labor force-to the U.S. score. A parallel effort to establish equivalence in the analysis of economic indicators is provided by Zeitsch, Lawrence, and Salernian (1994, 169), who use an adjustment technique in estimating total factor productivity to take account of the different operating environments, and hence the different context, of the industries they compare. Expressing indicators in per-capita terms is also an example of adjusting indicators in light of context. Overall, this practice is used to address both very specific problems of equivalence, as with the Moene and Wallerstein example, as well as more familiar concerns, such as standardizing by population.
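The adjustment can be sketched as follows (our illustration; the spending figures and the coefficient are invented, not Moene and Wallerstein's estimates): the same raw indicator is collected for every country, and a context-specific coefficient scales it down where health spending is not targeted at those outside the labor force, restoring rough equivalence with the U.S. figure.

```python
# Illustrative sketch (not Moene and Wallerstein's actual data or coefficient):
# an adjusted common indicator multiplies the same raw indicator by a
# context-specific coefficient to compensate for contextual differences.
health_spending_pct_gdp = {"United States": 0.9, "Sweden": 6.5, "Germany": 6.0}  # invented figures

# Hypothetical share of untargeted health spending that reaches people outside
# the labor force; U.S. spending is treated as already targeted, so unadjusted.
ADJUSTMENT = 0.35

def adjusted_health_component(country: str) -> float:
    raw = health_spending_pct_gdp[country]
    return raw if country == "United States" else raw * ADJUSTMENT

for c in health_spending_pct_gdp:
    print(c, round(adjusted_health_component(c), 2))
```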
Context-specific indicators and adjusted common indicators are not always a step forward, and some scholars have self-consciously avoided them. The use of such indicators should match the analytic goal of the researcher. For example, many who study democratization in the late twentieth century deliberately adopt a minimum definition of democracy in order to concentrate on a limited set of formal procedures. They do this out of a conviction that these formal procedures are important, even though they may have different meanings in particular settings. Even a scholar such as O'Donnell (1993, 1355), who has devoted great attention to contextualizing the meaning of democracy, insists on the importance of also retaining a minimal definition of "political democracy" that focuses on basic formal procedures. Thus, for certain purposes, it can be analytically productive to adopt a standard definition that ignores nuances of context and apply the same indicator to all cases.

In conclusion, we note that although Przeworski and Teune's (1970) and Verba's arguments about equivalence are well known, issues of contextual specificity and equivalence have not received adequate attention in political science. We have identified three tools-context-specific domains of observation, context-specific indicators, and adjusted common indicators-for addressing these issues, and we encourage their wider use. We also advocate greater attention to justifying their use. Claims about the appropriateness of contextual adjustments should not simply be asserted; their validity needs to be carefully defended. Later, we explore three types of validation that may be fruitfully applied in assessing proposals for context-sensitive measurement. In particular, content validation, which focuses on whether operationalization captures the ideas contained in the systematized concept, is central to determining whether and how measurement needs to be adjusted in particular contexts.

ALTERNATIVE PERSPECTIVES ON TYPES OF VALIDATION

Discussions of measurement validity are confounded by the proliferation of different types of validation, and by an even greater number of labels for them. In this section we review the emergence of a unified conception of measurement validity in the field of psychometrics, propose revisions in the terminology for talking about validity, and examine the important treatments of validation in political analysis offered by Carmines and Zeller, and by Bollen.

Evolving Understandings of Validity

In the psychometric tradition, current thinking about measurement validity developed in two phases. In the first phase, scholars wrote about "types of validity" in a way that often led researchers to treat each type as if it independently established a distinct form of validity. In discussing this literature we follow its terminology by referring to types of "validity." As noted above, in the rest of this article we refer instead to types of "validation."

The first pivotal development in the emergence of a unified approach occurred in the 1950s and 1960s, when a threefold typology of content, criterion, and construct validity was officially established in reaction to the confusion generated by the earlier proliferation of types.6 Other labels continued to appear in other disciplines, but this typology became an orthodoxy in

6. The second of these is often called criterion-related validity. Regarding these official standards, see American Psychological Association 1954, 1966; Angoff 1988, 25; Messick 1989, 16-7; Shultz, Riggs, and Kottke 1998, 267-9. The 1954 standards initially presented four types of validity, which became the threefold typology in 1966 when "predictive" and "concurrent" validity were combined as "criterion-related" validity.


psychology. A recurring metaphor characterized the three types as "something of a holy trinity representing three different roads to psychometric salvation" (Guion 1980, 386). The three types were defined as follows.

* Content validity assesses the degree to which an indicator represents the universe of content entailed in the systematized concept being measured.
* Criterion validity assesses whether the scores produced by an indicator are empirically associated with scores for other variables, called criterion variables, which are considered direct measures of the phenomenon of concern.
* Construct validity has had a range of meanings. One central focus has been on assessing whether a given indicator is empirically associated with other indicators in a way that conforms to theoretical expectations about their interrelationship.

These labels remain very influential and are still the centerpiece in some discussions of measurement validity, as in the latest edition of Babbie's (2001, 143-4) widely used methods textbook for undergraduates.

The second phase grew out of increasing dissatisfaction with the "trinity" and led to a "unitarian" approach (Shultz, Riggs, and Kottke 1998, 269-71). A basic problem identified by Guion (1980, 386) and others was that the threefold typology was too often taken to mean that any one type was sufficient to establish validity (Angoff 1988, 25). Scholars increasingly argued that the different types should be subsumed under a single concept. Hence, to continue with the prior metaphor, the earlier trinity came to be seen "in a monotheistic mode as the three aspects of a unitary psychometric divinity" (p. 25).

Much of the second phase involved a reconceptualization of construct validity and its relation to content and criterion validity. A central argument was that the latter two may each be necessary to establish validity, but neither is sufficient. They should be understood as part of a larger process of validation that integrates "multiple sources of evidence" and requires the combination of "logical argument and empirical evidence" (Shepard 1993, 406). Alongside this development, a reconceptualization of construct validity led to "a more comprehensive and theory-based view that subsumed other more limited perspectives" (Shultz, Riggs, and Kottke 1998, 270). This broader understanding of construct validity as the overarching goal of a single, integrated process of measurement validation is widely endorsed by psychometricians. Moss (1995, 6) states "there is a close to universal consensus among validity theorists" that "content- and criterion-related evidence of validity are simply two of many types of evidence that support construct validity."

Thus, in the psychometric literature (e.g., Messick 1980, 1015), the term "construct validity" has become essentially a synonym for what we call measurement validity. We have adopted measurement validity as the name for the overall topic of this article, in part because in political science the label construct validity commonly refers to specific procedures rather than to the general idea of valid measurement. These specific procedures generally do not encompass content validation and have in common the practice of assessing measurement validity by taking as a point of reference established conceptual and/or theoretical relationships. We find it helpful to group these procedures into two types according to the kind of theoretical or conceptual relationship that serves as the point of reference. Specifically, these types are based on the heuristic distinction between description and explanation.7 First, some procedures rely on "descriptive" expectations concerning whether given attributes are understood as facets of the same phenomenon. This is the focus of what we label "convergent/discriminant validation." Second, other procedures rely on relatively well-established "explanatory" causal relations as a baseline against which measurement validity is assessed. In labeling this second group of procedures we draw on Campbell's (1960, 547) helpful term, "nomological" validation, which evokes the idea of assessment in relation to well-established causal hypotheses or law-like relationships. This second type is often called construct validity in political research (Berry et al. 1998; Elkins 2000).8 Out of deference to this usage, in the headings and summary statements below we will refer to nomological/construct validation.

7. Description and explanation are of course intertwined, but we find this distinction invaluable for exploring contrasts among validation procedures. While these procedures do not always fit in sharply bounded categories, many do indeed focus on either descriptive or explanatory relations and hence are productively differentiated by our typology.

8. See also the main examples of construct validation presented in the major statements by Carmines and Zeller 1979, 23, and Bollen 1989, 189-90.
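A toy numerical sketch (ours, with fabricated scores for seven hypothetical cases) shows how the two groups of procedures differ in the relationships they take as a point of reference: convergent/discriminant validation expects strong association with an alternative indicator of the same systematized concept and weak association with an indicator of a distinct concept, while nomological/construct validation asks whether the indicator reproduces a well-established causal association.

```python
# Toy sketch (fabricated scores, not real data): correlation-based checks that
# parallel convergent/discriminant and nomological/construct validation.
from statistics import correlation  # Pearson's r; available in Python 3.10+

dem_indicator_a = [1, 2, 4, 5, 7, 8, 9]   # indicator being validated
dem_indicator_b = [2, 2, 5, 4, 7, 9, 9]   # alternative indicator of the same concept
press_freedom   = [1, 3, 4, 6, 6, 8, 9]   # variable linked by a well-established relationship
land_area       = [9, 1, 7, 2, 8, 3, 5]   # indicator of an unrelated concept

print("convergent  :", round(correlation(dem_indicator_a, dem_indicator_b), 2))  # expect high
print("discriminant:", round(correlation(dem_indicator_a, land_area), 2))        # expect low
print("nomological :", round(correlation(dem_indicator_a, press_freedom), 2))    # expect to fit theory
```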
Types of Validation in Political Analysis

A baseline for the revised discussion of validation presented below is provided in work by Carmines and Zeller, and by Bollen. Carmines and Zeller (1979, 26; Zeller and Carmines 1980, 78-80) argue that content validation and criterion validation are of limited utility in fields such as political science. While recognizing that content validation is important in psychology and education, they argue that evaluating it "has proved to be exceedingly difficult with respect to measures of the more abstract phenomena that tend to characterize the social sciences" (Carmines and Zeller 1979, 22). For criterion validation, these authors emphasize that in many social sciences, few "criterion" variables are available that can serve as "real" measures of the phenomena under investigation, against which scholars can evaluate alternative measures (pp. 19-20). Hence, for many purposes it is simply not a relevant procedure. Although Carmines and Zeller call for the use of multiple sources of evidence, their emphasis on the limitations of the first two types of validation leads them to give a predominant role to nomological/construct validation.

In relation to Carmines and Zeller, Bollen (1989, 185-6, 190-4) adds convergent/discriminant validation


to their three types and emphasizes content validation, which he sees as both viable and fundamental. He also raises general concerns about correlation-based approaches to convergent and nomological/construct validation, and he offers an alternative approach based on structural equation modeling with latent variables (pp. 192-206). Bollen shares the concern of Carmines and Zeller that, for most social research, "true" measures do not exist against which criterion validation can be carried out, so he likewise sees this as a less relevant type (p. 188).

These valuable contributions can be extended in several respects. First, with reference to Carmines and Zeller's critique of content validation, we recognize that this procedure is harder to use if concepts are abstract and complex. Moreover, it often does not lend itself to the kind of systematic, quantitative analysis routinely applied in some other kinds of validation. Yet, like Bollen (1989, 185-6, 194), we are convinced it is possible to lay a secure foundation for content validation that will make it a viable, and indeed essential, procedure. Our discussion of this task below derives from our distinction between the background and the systematized concept.

Second, we share the conviction of Carmines and Zeller that nomological/construct validation is important, yet given our emphasis on content and convergent/discriminant validation, we do not privilege it to the degree they do. Our discussion will seek to clarify some aspects of how this procedure actually works and will address the skeptical reaction of many scholars to it.

Third, we have a twofold response to the critique of criterion validation as irrelevant to most forms of social research. On the one hand, in some domains criterion validation is important, and this must be recognized. For example, the literature on response validity in survey research seeks to evaluate individual responses to questions, such as whether a person voted in a particular election, by comparing them to official voting records (Anderson and Silver 1986; Clausen 1968; Katosh and Traugott 1980). Similarly, in panel studies it is possible to evaluate the adequacy of "recall" (i.e., whether respondents remember their own earlier opinions, dispositions, and behavior) through comparison with responses in earlier studies (Niemi, Katz, and Newman 1980). On the other hand, this is not one of the most generally applicable types of validation, and we favor treating it as one subtype within the broader category of convergent validation. As discussed below, convergent validation compares a given indicator with one or more other indicators of the concept-in which the analyst may or may not have a higher level of confidence. Even if these other indicators are as fallible as the indicator being evaluated, the comparison provides greater leverage than does looking only at one of them in isolation. To the extent that a well-established, direct measure of the phenomenon under study is available, convergent validation is essentially the same as criterion validation.
9 Some readers may think of these questions as raising issues of "face
Finally, in contrast both to Carmines and Zeller and
validity." We have found so many different definitions of face validity
to Bollen, we will discuss the application of the differ-that we prefer not to use this label.

538
This content downloaded from
132.174.255.116 on Tue, 21 Jun 2022 13:26:22 UTC
All use subject to https://about.jstor.org/terms
Examples of Content Validation. Within the psychometric tradition (Angoff 1988; Shultz, Riggs, and Kottke 1998, 267-8), content validation is understood as focusing on the relation between the indicator (level 3) and the systematized concept (level 2), without reference to the scores of specific cases. We will first present two examples from political research that adopt this focus. We then examine a different, "case-oriented" procedure (Ragin 1987, chap. 3), identified with qualitative research, in which the examination of scores for specific cases plays a central role in content validation.

Two examples from political research illustrate, respectively, the problems of omission of key elements from the indicator and inclusion of inappropriate elements. Paxton's (2000) article on democracy focuses on the first problem. Her analysis is particularly salient for scholars in the qualitative tradition, given its focus on choices about the dichotomous classification of cases. Paxton contrasts the systematized concepts of democracy offered by several prominent scholars-Bollen, Gurr, Huntington, Lipset, Muller, and Rueschemeyer, Stephens, and Stephens-with the actual content of the indicators they propose. She takes their systematized concepts as given, which establishes common conceptual ground. She observes that these scholars include universal suffrage in what is in effect their systematized concept of democracy, but the indicators they employ in operationalizing the concept consider only male suffrage. Paxton thus focuses on the problem that an important component of the systematized concept is omitted from the indicator.

The debate on Vanhanen's (1979, 1990) quantitative indicator of democracy illustrates the alternative problem that the indicator incorporates elements that correspond to a concept other than the systematized concept of concern. Vanhanen seeks to capture the idea of political competition that is part of his systematized concept of democracy by including, as a component of his scale, the percentage of votes won by parties other than the largest party. Bollen (1990, 13, 15) and Coppedge (1997, 6) both question this measure of democracy, arguing that it incorporates elements drawn from a distinct concept, the structure of the party system.

Case-Oriented Content Validation. Researchers engaged in the qualitative classification of cases routinely carry out a somewhat different procedure for content validation, based on the relation between conceptual meaning and choices about scoring particular cases. In the vocabulary of Sartori (1970, 1040-6), this concerns the relation between the "intension" (meaning) and "extension" (set of positive cases) of the concept. For Sartori, an essential aspect of concept formation is the procedure of adjusting this relation between cases and concept. In the framework of Figure 1, this procedure involves revising the indicator (i.e., the scoring procedure) in order to sort cases in a way that better fits conceptual expectations, and potentially fine-tuning the systematized concept to better fit the cases. Ragin (1994, 98) terms this process of mutual adjustment "double fitting." This procedure avoids conceptual stretching (Collier and Mahon 1993; Sartori 1970), that is, a mismatch between a systematized concept and the scoring of cases, which is clearly an issue of validity.

An example of case-oriented content validation is found in O'Donnell's (1996) discussion of democratic consolidation. Some scholars suggest that one indicator of this consolidation is the capacity of a democratic regime to withstand severe crises. O'Donnell argues that by this standard, some Latin American democracies would be considered more consolidated than those in southern Europe. He finds this an implausible classification because the standard leads to a "reductio ad absurdum" (p. 43). This example shows how attention to specific cases can spur recognition of dilemmas in adequacy of content and can be a productive tool in content validation.

In sum, for case-oriented content validation, upward movement in Figure 1 is especially important. It can lead to both refining the indicator in light of scores and fine-tuning the systematized concept. In addition, although the systematized concept being measured is usually relatively stable, this form of validation may lead to friendly amendments that modify the systematized concept by drawing ideas from the background concept. To put this another way, in this form of validation both an "inductive" component and conceptual innovation are especially important.

Limitations of Content Validation. Content validation makes an important contribution to the assessment of measurement validity, but alone it is incomplete, for two reasons. First, although a necessary condition, the findings of content validation are not a sufficient condition for establishing validity (Shepard 1993, 414-5; Sireci 1998, 112). The key point is that an indicator with valid content may still produce scores with low overall measurement validity, because further threats to validity can be introduced in the coding of cases. A second reason concerns the trade-off between parsimony and completeness that arises because indicators routinely fail to capture the full content of a systematized concept. Capturing this content may require a complex indicator that is hard to use and adds greatly to the time and cost of completing the research. It is a matter of judgment for scholars to decide when efforts to further improve the adequacy of content may become counterproductive.

It is useful to complement the conceptual criticism of indicators by examining whether particular modifications in an indicator make a difference in the scoring of cases. To the extent that such modifications have little influence on scores, their contribution to improving validity is more modest. An example in which their contribution is shown to be substantial is provided by Paxton (2000). She develops an alternative indicator of democracy that takes female suffrage into account, compares the scores it produces with those produced by the indicators she originally criticized, and shows that her revised indicator yields substantially different findings. Her content validation argument stands on conceptual grounds alone, but her information about scoring demonstrates the substantive importance of her concerns. This comparison of indicators in a sense introduces us to convergent/discriminant validation, to which we now turn.
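The score comparison that Paxton carries out can be illustrated with a short sketch. The data and coding rules below are hypothetical (invented countries and suffrage variables, not Paxton's data set); the point is only to show how one might check whether a modification to an indicator actually changes the classification of cases.

```python
import pandas as pd

# Hypothetical country data; all columns are invented for illustration.
cases = pd.DataFrame({
    "country":        ["A", "B", "C", "D", "E"],
    "male_suffrage":  [True, True, True, True, False],
    "full_suffrage":  [True, False, False, True, False],  # includes women
    "free_elections": [True, True, True, False, True],
})

# Original indicator: democracy coded using male suffrage only.
original = cases["male_suffrage"] & cases["free_elections"]

# Modified indicator: the same rule, but requiring universal suffrage.
revised = cases["full_suffrage"] & cases["free_elections"]

# Content validation asks whether the omitted element matters in practice:
# tabulate how the two coding rules sort the same cases.
comparison = pd.crosstab(original, revised,
                         rownames=["male-suffrage rule"],
                         colnames=["universal-suffrage rule"])
print(comparison)
print("Cases reclassified:", (original != revised).sum())
```

If such a comparison shows few or no reclassified cases, the proposed modification matters less for validity in practice, which is exactly the judgment about parsimony and completeness discussed above.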
Convergent/Discriminant Validation

Basic Question. Are the scores (level 4) produced by alternative indicators (level 3) of a given systematized concept (level 2) empirically associated and thus convergent? Furthermore, do these indicators have a weaker association with indicators of a second, different systematized concept, thus discriminating this second group of indicators from the first? Stronger associations constitute evidence that supports interpreting indicators as measuring the same systematized concept-thus providing convergent validation; whereas weaker associations support the claim that they measure different concepts-thus providing discriminant validation. The special case of convergent validation in which one indicator is taken as a standard of reference, and is used to evaluate one or more alternative indicators, is called criterion validation, as discussed above.

Discussion. Carefully defined systematized concepts, and the availability of two or more alternative indicators of these concepts, are the starting point for convergent/discriminant validation. They lay the groundwork for arguments that particular indicators measure the same or different concepts, which in turn create expectations about how the indicators may be empirically associated. To the extent that empirical findings match these "descriptive" expectations, they provide support for validity.

Empirical associations are crucial to convergent/discriminant validation, but they are often simply the point of departure for an iterative process. What initially appears to be negative evidence can spur refinements that ultimately enhance validity. That is, the failure to find expected convergence may encourage a return to the conceptual and logical analysis of indicators, which may lead to their modification. Alternatively, researchers may conclude that divergence suggests the indicators measure different systematized concepts and may reevaluate the conceptualization that led them to expect convergence. This process illustrates the intertwining of convergent and discriminant validation.

Examples of Convergent/Discriminant Validation. Scholars who develop measures of democracy frequently use convergent validation. Thus, analysts who create a new indicator commonly report its correlation with previously established indicators (Bollen 1980, 380-2; Coppedge and Reinicke 1990, 61; Mainwaring et al. 2001, 52; Przeworski et al. 1996, 52). This is a valuable procedure, but it should not be employed atheoretically. Scholars should have specific conceptual reasons for expecting convergence if it is to constitute evidence for validity. Let us suppose a proposed indicator is meant to capture a facet of democracy overlooked by existing measures; then too high a correlation is in fact negative evidence regarding validity, for it suggests that nothing new is being captured.

An example of discriminant validation is provided by Bollen's (1980, 373-4) analysis of voter turnout. As in the studies just noted, different measures of democracy are compared, but in this instance the goal is to find empirical support for divergence. Bollen claims, based on content validation, that voter turnout is an indicator of a concept distinct from the systematized concept of political democracy. The low correlation of voter turnout with other proposed indicators of democracy provides discriminant evidence for this claim. Bollen concludes that turnout is best understood as an indicator of political participation, which should be conceptualized as distinct from political democracy.

Although qualitative researchers routinely lack the data necessary for the kind of statistical analysis performed by Bollen, convergent/discriminant validation is by no means irrelevant for them. They often assess whether the scores for alternative indicators converge or diverge. Paxton, in the example discussed above, in effect uses discriminant validation when she compares alternative qualitative indicators of democracy in order to show that recommendations derived from her assessment of content validation make a difference empirically. This comparison, based on the assessment of scores, "discriminates" among alternative indicators.

Convergent/discriminant validation is also employed when qualitative researchers use a multimethod approach involving "triangulation" among multiple indicators based on different kinds of data sources (Brewer and Hunter 1989; Campbell and Fiske 1959; Webb et al. 1966). Orum, Feagin, and Sjoberg (1991, 19) specifically argue that one of the great strengths of the case study tradition is its use of triangulation for enhancing validity. In general, the basic ideas of convergent/discriminant validation are at work in qualitative research whenever scholars compare alternative indicators.

Concerns about Convergent/Discriminant Validation. A first concern here is that scholars might think that in convergent/discriminant validation empirical findings always dictate conceptual choices. This frames the issue too narrowly. For example, Bollen (1993, 1208-9, 1217) analyzes four indicators that he takes as components of the concept of political liberties and four indicators that he understands as aspects of democratic rule. An examination of Bollen's covariance matrix reveals that these do not emerge as two separate empirical dimensions. Convergent/discriminant validation, mechanically applied, might lead to a decision to eliminate this conceptual distinction. Bollen does not take that approach. He combines the two clusters of indicators into an overall empirical measure, but he also maintains the conceptual distinction. Given the conceptual congruence between the two sets of indicators, he treats them as components of a single measure of liberal democracy, while preserving the two distinct conceptual labels.
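In quantitative applications, the descriptive expectations behind convergent/discriminant validation are often checked by inspecting a correlation matrix and asking whether indicators cluster as the conceptual argument predicts. The following sketch is a minimal illustration of that logic using simulated scores; the variable names and data are invented for this purpose and are not Bollen's indicators.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Hypothetical latent scores for two conceptually distinct phenomena.
democracy = rng.normal(size=n)
participation = rng.normal(size=n)

scores = pd.DataFrame({
    # Three proposed indicators of democracy (share the same latent signal).
    "free_elections": democracy + rng.normal(scale=0.5, size=n),
    "press_freedom":  democracy + rng.normal(scale=0.5, size=n),
    "competition":    democracy + rng.normal(scale=0.5, size=n),
    # An indicator hypothesized to tap a different concept (cf. turnout).
    "turnout":        participation + rng.normal(scale=0.5, size=n),
})

corr = scores.corr()
print(corr.round(2))

# Convergent evidence: high correlations within the democracy cluster.
# Discriminant evidence: weaker correlations between that cluster and turnout.
within = corr.loc["free_elections":"competition", "free_elections":"competition"]
cross = corr.loc["free_elections":"competition", "turnout"]
print("mean within-cluster r:", within.values[np.triu_indices(3, k=1)].mean().round(2))
print("mean cross-cluster r: ", cross.mean().round(2))
```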
Another concern arises over the interpretation of low correlations among indicators. Analysts who lack a "true" measure against which to assess validity must base convergent validation on a set of indicators, none of which may be a very good measure of the systematized concept. The result may be low correlations among indicators, even though they have shared variance that measures the concept. One possible solution is to focus on this shared variance, even though the overall correlations are low. Standard statistical techniques may be used to tap this shared variance.

The opposite problem also is a concern: the limitations of inferring validity from a high correlation among indicators. Such a correlation may reflect factors other than valid measurement. For example, two indicators may be strongly correlated because they both measure some other concept; or they may measure different concepts, one of which causes the other. A plausible response is to think through, and seek to rule out, alternative reasons for the high correlation.10

10 On the appropriate size of the correlation, see Bollen and Lennox 1991, 305-7.

Although framing these concerns in the language of high and low correlations appears to orient the discussion toward quantitative researchers, qualitative researchers face parallel issues. Specifically, these issues arise when qualitative researchers analyze the sorting of cases produced by alternative classification procedures that represent different ways of operationalizing either a given concept (i.e., convergent validation) or two or more concepts that are presumed to be distinct (i.e., discriminant validation). Given that these scholars are probably working with a small N, they may be able to draw on their knowledge of cases to assess alternative explanations for convergences and divergences among the sorting of cases yielded by different classification procedures. In this way, they can make valuable inferences about validity. Quantitative researchers, by contrast, have other tools for making these inferences, to which we now turn.

Convergent Validation and Structural Equation Models with Latent Variables. In quantitative research, an important means of responding to the limitations of simple correlational procedures for convergent/discriminant validation is offered by structural equation models with latent variables (also called LISREL-type models). Some treatments of such models, to the extent that they discuss measurement error, focus their attention on random error, that is, on reliability (Hayduk 1987, e.g., 118-24; 1996).11 However, Bollen has made systematic error, which is the concern of the present article, a central focus in his major methodological statement on this approach (1989, 190-206). He demonstrates, for example, its distinctive contribution for a scholar concerned with convergent/discriminant validation who is dealing with a data set with high correlations among alternative indicators. In this case, structural equations with latent variables can be used to estimate the degree to which these high correlations derive from shared systematic bias, rather than reflect the valid measurement of an underlying concept.12

11 To take a political science application, Green and Palmquist's (1990) study also reflects this focus on random error. By contrast, Green (1991) goes farther by considering both random and systematic error. Like the work by Bollen discussed below, these articles are an impressive demonstration of how LISREL-type models can incorporate a concern with measurement error into conventional statistical analysis, and how this can in turn lead to a major reevaluation of substantive findings-in this case concerning party identification (Green 1991, 67-71).

12 Two points about structural equation models with latent variables should be underscored. First, as noted below, these models can also be used in nomological/construct validation, and hence should not be associated exclusively with convergent/discriminant validation, which is the application discussed here. Second, we have emphasized that convergent/discriminant validation focuses on "descriptive" relations among concepts and their components. Within this framework, it merits emphasis that the indicators that measure a given latent variable (i.e., concept) in these models are conventionally interpreted as "effects" of this latent variable (Bollen 1989, 65; Bollen and Lennox 1991, 305-6). These effects, however, do not involve causal interactions among distinct phenomena. Such interactions, which in structural equation models involve causal relations among different latent variables, are the centerpiece of the conventional understanding of "explanation." By contrast, the links between one latent variable and its indicators are productively understood as involving a "descriptive" relationship.

This approach is illustrated by Bollen (1993) and Bollen and Paxton's (1998, 2000) evaluation of eight indicators of democracy taken from data sets developed by Banks, Gastil, and Sussman.13 For each indicator, Bollen and Paxton estimate the percent of total variance that validly measures democracy, as opposed to reflecting systematic and random error. The sources of systematic error are then explored. Bollen and Paxton conclude, for example, that Gastil's indicators have "conservative" bias, giving higher scores to countries that are Catholic, that have traditional monarchies, and that are not Marxist-Leninist (Bollen 1993, 1221; Bollen and Paxton 2000, 73). This line of research is an outstanding example of the sophisticated use of convergent/discriminant validation to identify potential problems of political bias.

13 See, for example, Banks 1979; Gastil 1988; Sussman 1982.

In discussing Bollen's treatment and application of structural equation models we would like to note both similarities, and a key contrast, in relation to the practice of qualitative researchers. Bollen certainly shares the concern with careful attention to concepts, and with knowledge of cases, that we have emphasized above, and that is characteristic of case-oriented content validation as practiced by qualitative researchers. He insists that complex quantitative techniques cannot replace careful conceptual and theoretical reasoning; rather they presuppose it. Furthermore, "structural equation models are not very helpful if you have little idea about the subject matter" (Bollen 1989, vi; see also 194). Qualitative researchers, carrying out a case-by-case assessment of the scores on different indicators, could of course reach some of the same conclusions about validity and political bias reached by Bollen. A structural equation approach, however, does offer a fundamentally different procedure that allows scholars to assess carefully the magnitude and sources of measurement error for large numbers of cases and to summarize this assessment systematically and concisely.
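The variance-decomposition logic behind this use of latent-variable models can be sketched in a few lines. The example below is not Bollen and Paxton's structural equation specification; it is a deliberately simplified stand-in (a single-factor model fit to simulated indicator scores) that shows the kind of quantity being estimated: for each indicator, how much of its variance is attributable to the shared underlying concept and how much is unique error.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
n = 500

# Invented data: four indicators driven by one latent concept, with
# differing amounts of indicator-specific (error) variance.
latent = rng.normal(size=n)
loadings_true = np.array([0.9, 0.8, 0.6, 0.4])
noise_sd = np.array([0.4, 0.6, 0.8, 0.9])
X = latent[:, None] * loadings_true + rng.normal(size=(n, 4)) * noise_sd

# Standardize so that each indicator has unit variance.
X = (X - X.mean(axis=0)) / X.std(axis=0)

# One common factor: in this toy setup, shared variance plays the role of
# "valid" variance and the residual plays the role of error.
fa = FactorAnalysis(n_components=1).fit(X)
loadings = fa.components_.ravel()
shared = loadings ** 2          # variance explained by the common factor
unique = fa.noise_variance_     # indicator-specific (error) variance

for name, s, u in zip(["ind1", "ind2", "ind3", "ind4"], shared, unique):
    print(f"{name}: shared variance = {s:.2f}, unique/error variance = {u:.2f}")
```

A full structural equation treatment goes further, for example by modeling correlated method factors so that shared systematic bias can be separated from validly shared variance, which a single common factor cannot do.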
Nomological/Construct Validation

Basic Question. In a domain of research in which a given causal hypothesis is reasonably well established, we ask: Is this hypothesis again confirmed when the cases are scored (level 4) with the proposed indicator (level 3) for a systematized concept (level 2) that is one of the variables in the hypothesis? Confirmation is treated as evidence for validity.

Discussion. We should first reiterate that because the term "construct validity" has become synonymous in the psychometric literature with the broader notion of measurement validity, to reduce confusion we use Campbell's term "nomological" validation for procedures that address this basic question. Yet, given common usage in political science, in headings and summary statements we call this nomological/construct validation. We also propose an acronym that vividly captures the underlying idea: AHEM validation; that is, "Assume the Hypothesis, Evaluate the Measure."

Nomological validation assesses the performance of indicators in relation to causal hypotheses in order to gain leverage in evaluating measurement validity. Whereas convergent validation focuses on multiple indicators of the same systematized concept, and discriminant validation focuses on indicators of different concepts that stand in a "descriptive" relation to one another, nomological validation focuses on indicators of different concepts that are understood to stand in an explanatory, "causal" relation with one another. Although these contrasts are not sharply presented in most definitions of nomological validation, they are essential in identifying this type as distinct from convergent/discriminant validation. In practice the contrast between description and explanation depends on the researcher's theoretical framework, but the distinction is fundamental to the contemporary practice of political science.

The underlying idea of nomological validation is that scores which can validly be claimed to measure a systematized concept should fit well-established expectations derived from causal hypotheses that involve this concept. The first step is to take as given a reasonably well-established causal hypothesis, one variable in which corresponds to the systematized concept of concern. The scholar then examines the association of the proposed indicator with indicators of the other concepts in the causal hypothesis. If the assessment produces an association that the causal hypothesis leads us to expect, then this is positive evidence for validity.

Nomological validation provides additional leverage in assessing measurement validity. If other types of validation raise concerns about the validity of a given indicator and the scores it produces, then analysts probably do not need to employ nomological validation. When other approaches yield positive evidence, however, then nomological validation is valuable in teasing out potentially important differences that may not be detected by other types of validation. Specifically, alternative indicators of a systematized concept may be strongly correlated and yet perform very differently when employed in causal assessment. Bollen (1980, 383-4) shows this, for example, in his assessment of whether regime stability should be a component of measures of democracy.

Examples of Nomological/Construct Validation. Lijphart's (1996) analysis of democracy and conflict management in India provides a qualitative example of nomological validation, which he uses to justify his classification of India as a consociational democracy. Lijphart first draws on his systematized concept of consociationalism to identify descriptive criteria for classifying any given case as consociational. He then uses nomological validation to further justify his scoring of India (pp. 262-4). Lijphart identifies a series of causal factors that he argues are routinely understood to produce consociational regimes, and he observes that these factors are present in India. Hence, classifying India as consociational is consistent with an established causal relationship, which reinforces the plausibility of his descriptive conclusion that India is a case of consociationalism.

Another qualitative example of nomological validation is found in a classic study in the tradition of comparative-historical analysis, Perry Anderson's Lineages of the Absolutist State.14 Anderson (1974, 413-5) is concerned with whether it is appropriate to classify as "feudalism" the political and economic system that emerged in Japan beginning roughly in the fourteenth century, which would place Japan in the same analytic category as European feudalism. His argument is partly descriptive, in that he asserts that "the fundamental resemblance between the two historical configurations as a whole [is] unmistakable" (p. 414). He validates his classification by observing that Japan's subsequent development, like that of postfeudal Europe, followed an economic trajectory that his theory explains as the historical legacy of a feudal state. "The basic parallelism of the two great experiences of feudalism, at the opposite ends of Eurasia, was ultimately to receive its most arresting confirmation of all, in the posterior destiny of each zone" (p. 414). Thus, he uses evidence concerning an expected explanatory relationship to increase confidence in his descriptive characterization of Japan as feudal. Anderson, like Lijphart, thus follows the two-step procedure of making a descriptive claim about one or two cases, and then offering evidence for the validity of this claim by observing that it is consistent with an explanatory claim in which he has confidence.

14 Sebastian Mazzuca suggested this example.

A quantitative example of nomological validation is found in Elkins's evaluation of the proposal that democracy versus nondemocracy should be treated as a dichotomy, rather than in terms of gradations.
One potential defense of the dichotomous approach rests on convergent validation: Alvarez et al. (1996, 21) show that their dichotomous measure is strongly associated with graded measures of democracy. Elkins (2000), turning to nomological validation, finds that despite this association, the choice of a dichotomous measure makes a difference for causal assessment. He compares tests of the democratic peace hypothesis using both dichotomous and graded measures. According to the hypothesis, democracies are in general as conflict prone as nondemocracies but do not fight one another. The key finding from the standpoint of nomological validation is that this claim is strongly supported using a graded measure, whereas there is no statistically significant support using the dichotomous measure. These findings give nomological evidence for the greater validity of the graded measure, because they better fit the overall expectations of the accepted causal hypothesis. Elkins's approach is certainly more complex than the two-step procedure followed by Lijphart and Anderson, but the basic idea is the same.

Skepticism about Nomological Validation. Many scholars are skeptical about nomological validation. One concern is the potential problem of circularity. If one assumes the hypothesis in order to validate the indicator, then the indicator cannot be used to evaluate the same hypothesis. Hence, it is important to specify that any subsequent hypothesis-testing should involve hypotheses different from those used in nomological validation.

A second concern is that, in addition to taking the hypothesis as given, nomological validation also presupposes the valid measurement of the other systematized concept involved in the hypothesis. Bollen (1989, 188-90) notes that problems in the measurement of the second indicator can undermine this approach to assessing validity, especially when scholars rely on simple correlational procedures. Obviously, researchers need evidence about the validity of the second indicator. Structural equation models with latent variables offer a quantitative approach to addressing such difficulties because, in addition to evaluating the hypothesis, these models can be specified so as to provide an estimate of the validity of the second indicator. In small-N, qualitative analysis, the researcher has the resource of detailed case knowledge to help evaluate this second indicator. Thus, both qualitative and quantitative researchers have a means for making inferences about whether this important presupposition of nomological validation is indeed met.

A third problem is that, in many domains in which political scientists work, there may not be a sufficiently well-established hypothesis to make this a viable approach to validation. In such domains, it may be plausible to assume the measure and evaluate the hypothesis, but not the other way around. Nomological validation therefore simply may not be viable. Yet, it is helpful to recognize that nomological validation need not be restricted to a dichotomous understanding in which the hypothesis either is or is not reconfirmed when the proposed indicator is used. Rather, nomological validation may focus, as it does in Elkins (2000; see also Hill, Hanna, and Shafqat 1997), on comparing two different indicators of the same systematized concept, and on asking which better fits causal expectations. A tentative hypothesis may not provide an adequate standard for rejecting claims of measurement validity outright, but it may serve as a point of reference for comparing the performance of two indicators and thereby gaining evidence relevant to choosing between them.

Another response to the concern that causal hypotheses may be too tentative a ground for measurement validation is to recognize that neither measurement claims nor causal claims are inherently more epistemologically secure. Both types of claims should be seen as falsifiable hypotheses. To take a causal hypothesis as given for the sake of measurement validation is not to say that the hypothesis is set in stone. It may be subject to critical assessment at a later point. Campbell (1977/1988, 477) expresses this point metaphorically: "We are like sailors who must repair a rotting ship at sea. We trust the great bulk of the timbers while we replace a particularly weak plank. Each of the timbers we now trust we may in turn replace. The proportion of the planks we are replacing to those we treat as sound must always be small."
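The comparison Elkins draws can be made concrete with a small simulation. The sketch below is purely illustrative: the dyad-year data, variable names, and cutpoint are invented rather than taken from Elkins's analysis. It simply fits the same logistic regression of conflict on joint democracy twice, once with a graded measure and once with a dichotomized version, so that the coefficients and significance levels produced by the two operationalizations can be compared side by side.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 3000  # hypothetical dyad-years

# Graded democracy score for each state in the dyad (0 = autocracy, 1 = full democracy).
dem_a = rng.uniform(0, 1, n)
dem_b = rng.uniform(0, 1, n)
joint_dem = np.minimum(dem_a, dem_b)                  # "weak-link" joint democracy, graded
joint_dem_dichot = (joint_dem > 0.7).astype(float)    # arbitrary cutpoint for the dichotomy

# Simulated world in which conflict declines smoothly with joint democracy.
p_conflict = 1 / (1 + np.exp(-(-1.0 - 2.5 * joint_dem)))
conflict = rng.binomial(1, p_conflict)

# Same hypothesis, two operationalizations of the democracy concept.
graded_fit = sm.Logit(conflict, sm.add_constant(joint_dem)).fit(disp=0)
dichot_fit = sm.Logit(conflict, sm.add_constant(joint_dem_dichot)).fit(disp=0)

print("graded measure:      coef = %.2f, p = %.3f" %
      (graded_fit.params[1], graded_fit.pvalues[1]))
print("dichotomous measure: coef = %.2f, p = %.3f" %
      (dichot_fit.params[1], dichot_fit.pvalues[1]))
```

In Elkins's actual analysis the graded measure supports the democratic peace expectation while the dichotomous measure does not; the sketch shows only the mechanics of such a comparison.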
CONCLUSION

In conclusion, we return to the four underlying issues that frame our discussion. First, we have offered a new account of different types of validation. We have viewed these types in the framework of a unified conception of measurement validity. None of the specific types of validation alone establishes validity; rather, each provides one kind of evidence to be integrated into an overall process of assessment. Content validation makes the indispensable contribution of assessing what we call the adequacy of content of indicators. Convergent/discriminant validation-taking as a baseline descriptive understandings of the relationship among concepts, and of their relation to indicators-focuses on shared and nonshared variance among indicators that the scholar is evaluating. This approach uses empirical evidence to supplement and temper content validation. Nomological/construct validation-taking as a baseline an established causal hypothesis-adds a further tool that can tease out additional facets of measurement validity not addressed by convergent/discriminant validation.

We are convinced that it is useful to carefully differentiate these types. It helps to overcome the confusion deriving from the proliferation of distinct types of validation, and also of terms for these types. Furthermore, in relation to methods such as structural equation models with latent variables-which provide sophisticated tools for simultaneously evaluating both measurement validity and explanatory hypotheses-the delineation of types serves as a useful reminder that validation is a multifaceted process. Even with these models, this process must also incorporate the careful use of content validation, as Bollen emphasizes.

Second, we have encouraged scholars to distinguish between issues of measurement validity and broader conceptual disputes. Building on the contrast between the background concept and the systematized concept (Figure 1), we have explored how validity issues and conceptual issues can be separated. We believe that this separation is essential if scholars are to give a consistent focus to the idea of measurement validity, and particularly to the practice of content validation.

Third, we examined alternative procedures for adapting operationalization to specific contexts: context-specific domains of observation, context-specific indicators, and adjusted common indicators. These procedures make it easier to take a middle position between universalizing and particularizing tendencies. Yet, we also emphasize that the decision to pursue context-specific approaches should be carefully considered and justified.

Fourth, we have presented an understanding of measurement validation that can plausibly be applied in both quantitative and qualitative research. Although most discussions of validation focus on quantitative research, we have formulated each type in terms of basic questions intended to clarify the relevance to both quantitative and qualitative analysis. We have also given examples of how these questions can be addressed by scholars from within both traditions. These examples also illustrate, however, that while they may be addressing the same questions, quantitative and qualitative scholars often employ different tools in finding answers.

Within this framework, qualitative and quantitative researchers can learn from these differences. Qualitative researchers could benefit from self-consciously applying the validation procedures that to some degree they may already be employing implicitly and, in particular, from developing and comparing alternative indicators of a given systematized concept. They should also recognize that nomological validation can be important in qualitative research, as illustrated by the Lijphart and Anderson examples above. Quantitative researchers, in turn, could benefit from more frequently supplementing other tools for validation by employing a case-oriented approach, using the close examination of specific cases to identify threats to measurement validity.

REFERENCES

Alvarez, Michael, Jose Antonio Cheibub, Fernando Limongi, and Adam Przeworski. 1996. "Classifying Political Regimes." Studies in Comparative International Development 31 (Summer): 3-36.
American Psychological Association. 1954. "Technical Recommendations for Psychological Tests and Diagnostic Techniques." Psychological Bulletin 51 (2, Part 2): 201-38.
American Psychological Association. 1966. Standards for Educational and Psychological Tests and Manuals. Washington, DC: American Psychological Association.
Anderson, Barbara A., and Brian D. Silver. 1986. "Measurement and Mismeasurement of the Validity of the Self-Reported Vote." American Journal of Political Science 30 (November): 771-85.
Anderson, Perry. 1974. Lineages of the Absolutist State. London: Verso.
Angoff, William H. 1988. "Validity: An Evolving Concept." In Test Validity, ed. Howard Wainer and Henry I. Braun. Hillsdale, NJ: Lawrence Erlbaum. Pp. 19-32.
Babbie, Earl R. 2001. The Practice of Social Research, 9th ed. Belmont, CA: Wadsworth.
Bachman, Jerald G., and Patrick M. O'Malley. 1984. "Yea-Saying, Nay-Saying, and Going to Extremes: Black-White Differences in Response Styles." Public Opinion Quarterly 48 (Summer): 491-509.
Banks, Arthur S. 1979. Cross-National Time-Series Data Archive User's Manual. Binghamton: Center for Social Analysis, State University of New York at Binghamton.
Baumgartner, Frank R., and Jack L. Walker. 1990. "Response to Smith's 'Trends in Voluntary Group Membership: Comments on Baumgartner and Walker': Measurement Validity and the Continuity of Research in Survey Research." American Journal of Political Science 34 (August): 662-70.
Berry, William D., Evan J. Ringquist, Richard C. Fording, and Russell L. Hanson. 1998. "Measuring Citizen and Government Ideology in the American States, 1960-93." American Journal of Political Science 42 (January): 327-48.
Bevir, Mark. 1999. The Logic of the History of Ideas. Cambridge: Cambridge University Press.
Bollen, Kenneth A. 1980. "Issues in the Comparative Measurement of Political Democracy." American Sociological Review 45 (June): 370-90.
Bollen, Kenneth A. 1989. Structural Equations with Latent Variables. New York: Wiley.
Bollen, Kenneth A. 1990. "Political Democracy: Conceptual and Measurement Traps." Studies in Comparative International Development 25 (Spring): 7-24.
Bollen, Kenneth A. 1993. "Liberal Democracy: Validity and Method Factors in Cross-National Measures." American Journal of Political Science 37 (November): 1207-30.
Bollen, Kenneth A., Barbara Entwisle, and Arthur S. Anderson. 1993. "Macrocomparative Research Methods." Annual Review of Sociology 19: 321-51.
Bollen, Kenneth A., and Richard Lennox. 1991. "Conventional Wisdom on Measurement: A Structural Equation Perspective." Psychological Bulletin 110 (September): 305-14.
Bollen, Kenneth A., and Pamela Paxton. 1998. "Detection and Determinants of Bias in Subjective Measures." American Sociological Review 63 (June): 465-78.
Bollen, Kenneth A., and Pamela Paxton. 2000. "Subjective Measures of Liberal Democracy." Comparative Political Studies 33 (February): 58-86.
Brady, Henry. 1985. "The Perils of Survey Research: Inter-Personally Incomparable Responses." Political Methodology 11 (3-4): 269-91.
Brady, Henry, and David Collier, eds. 2001. Rethinking Social Inquiry: Diverse Tools, Shared Standards. Berkeley: Berkeley Public Policy Press, University of California, and Boulder, CO: Rowman & Littlefield.
Brewer, John, and Albert Hunter. 1989. Multimethod Research: A Synthesis of Styles. Newbury Park, CA: Sage.
Cain, Bruce E., and John Ferejohn. 1981. "Party Identification in the United States and Great Britain." Comparative Political Studies 14 (April): 31-47.
Campbell, Donald T. 1960. "Recommendations for APA Test Standards Regarding Construct, Trait, or Discriminant Validity." American Psychologist 15 (August): 546-53.
Campbell, Donald T. 1977/1988. "Descriptive Epistemology: Psychological, Sociological, and Evolutionary." In Methodology and Epistemology for Social Science: Selected Papers. Chicago: University of Chicago Press.
Campbell, Donald T., and Donald W. Fiske. 1959. "Convergent and Discriminant Validation by the Multitrait-Multimethod Matrix." Psychological Bulletin 56 (March): 81-105.
Carmines, Edward G., and Richard A. Zeller. 1979. Reliability and Validity Assessment. Beverly Hills, CA: Sage.
Clausen, Aage. 1968. "Response Validity: Vote Report." Public Opinion Quarterly 41 (Winter): 588-606.
Collier, David. 1995. "Trajectory of a Concept: 'Corporatism' in the Study of Latin American Politics." In Latin America in Comparative Perspective: Issues and Methods, ed. Peter H. Smith. Boulder, CO: Westview. Pp. 135-62.
Collier, David, and Robert Adcock. 1999. "Democracy and Dichotomies: A Pragmatic Approach to Choices about Concepts." Annual Review of Political Science 2: 537-65.
Collier, David, and Steven Levitsky. 1997. "Democracy with Adjectives: Conceptual Innovation in Comparative Research." World Politics 49 (April): 430-51.
Collier, David, and James E. Mahon, Jr. 1993. "Conceptual 'Stretching' Revisited: Adapting Categories in Comparative Analysis." American Political Science Review 87 (December): 845-55.
Collier, Ruth Berins. 1999. Paths Toward Democracy: The Working Class and Elites in Western Europe and South America. Cambridge: Cambridge University Press.
Cook, Thomas D., and Donald T. Campbell. 1979. Quasi-Experimentation. Boston: Houghton Mifflin.
Coppedge, Michael. 1997. "How a Large N Could Complement the Small in Democratization Research." Paper presented at the annual meeting of the American Political Science Association, Washington, DC.
Coppedge, Michael, and Wolfgang H. Reinicke. 1990. "Measuring Polyarchy." Studies in Comparative International Development 25 (Spring): 51-72.
Cronbach, Lee J., and Paul E. Meehl. 1955. "Construct Validity in Psychological Tests." Psychological Bulletin 52 (July): 281-302.
Dahl, Robert A. 1956. A Preface to Democratic Theory. Chicago: University of Chicago Press.
Elkins, Zachary. 2000. "Gradations of Democracy? Empirical Tests of Alternative Conceptualizations." American Journal of Political Science 44 (April): 293-300.
Elster, Jon. 1999. Alchemies of the Mind: Rationality and the Emotions. Cambridge: Cambridge University Press.
Fearon, James, and David D. Laitin. 2000. "Ordinary Language and External Validity: Specifying Concepts in the Study of Ethnicity." Paper presented at the annual meeting of the American Political Science Association, Washington, DC.
Fischer, Markus. 1992. "Feudal Europe, 800-1300: Communal Discourse and Conflictual Practices." International Organization 46 (Spring): 427-66.
Fischer, Markus. 1993. "On Context, Facts, and Norms." International Organization 47 (Summer): 493-500.
Freeden, Michael. 1996. Ideologies and Political Theory: A Conceptual Approach. Oxford: Oxford University Press.
Gallie, W. B. 1956. "Essentially Contested Concepts." Proceedings of the Aristotelian Society 51: 167-98.
Gastil, Raymond D. 1988. Freedom in the World: Political Rights and Civil Liberties, 1987-1988. New York: Freedom House.
George, Alexander L., and Andrew Bennett. N.d. Case Studies and Theory Development. Cambridge, MA: MIT Press. Forthcoming.
Gerring, John. 1997. "Ideology: A Definitional Analysis." Political Research Quarterly 50 (December): 957-94.
Gerring, John. 1999. "What Makes a Concept Good? A Criterial Framework for Understanding Concept Formation in the Social Sciences." Polity 31 (Spring): 357-93.
Gerring, John. 2001. Practical Knowledge: Social Science Methodology. New York: Cambridge University Press.
Gould, Andrew C. 1999. "Conflicting Imperatives and Concept Formation." Review of Politics 61 (Summer): 439-63.
Green, Donald Philip. 1991. "The Effects of Measurement Error on Two-Stage Least-Squares Estimates." In Political Analysis, ed. James A. Stimson. Ann Arbor: University of Michigan Press. Pp. 57-74.
Green, Donald Philip, and Bradley L. Palmquist. 1990. "Of Artifacts and Partisan Instability." American Journal of Political Science 34 (August): 872-902.
Greenleaf, Eric A. 1992. "Measuring Extreme Response Style." Public Opinion Quarterly 56 (Autumn): 328-51.
Guion, Robert M. 1980. "On Trinitarian Doctrines of Validity." Professional Psychology 11 (June): 385-98.
Harding, Timothy, and James Petras. 1988. "Introduction: Democratization and the Class Struggle." Latin American Perspectives 15 (Summer): 3-17.
Hayduk, Leslie A. 1987. Structural Equation Modeling with LISREL. Baltimore, MD: Johns Hopkins University Press.
Hayduk, Leslie A. 1996. LISREL: Issues, Debates, and Strategies. Baltimore, MD: Johns Hopkins University Press.
Hill, Kim Q., Stephen Hanna, and Sahar Shafqat. 1997. "The Liberal-Conservative Ideology of U.S. Senators: A New Measure." American Journal of Political Science 41 (October): 1395-413.
Huntington, Samuel P. 1991. The Third Wave: Democratization in the Late Twentieth Century. Norman: University of Oklahoma Press.
Jacoby, William G. 1991. Data Theory and Dimensional Analysis. Newbury Park, CA: Sage.
Jacoby, William G. 1999. "Levels of Measurement and Political Research: An Optimistic View." American Journal of Political Science 43 (January): 271-301.
Johnson, Ollie A. 1999. "Pluralist Authoritarianism in Comparative Perspective: White Supremacy, Male Supremacy, and Regime Classification." National Political Science Review 7: 116-36.
Kaplan, Abraham. 1964. The Conduct of Inquiry. San Francisco: Chandler.
Karl, Terry Lynn. 1990. "Dilemmas of Democratization in Latin America." Comparative Politics 22 (October): 1-21.
Katosh, John P., and Michael W. Traugott. 1980. "The Consequences of Validated and Self-Reported Voting Measures." Public Opinion Quarterly 45 (Winter): 519-35.
King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton, NJ: Princeton University Press.
Kirk, Jerome, and Marc L. Miller. 1986. Reliability and Validity in Qualitative Research. Beverly Hills, CA: Sage.
Kurtz, Marcus J. 2000. "Understanding Peasant Revolution: From Concept to Theory and Case." Theory and Society 29 (February): 93-124.
Laitin, David D. 2000. "What Is a Language Community?" American Journal of Political Science 44 (January): 142-55.
Levitsky, Steven. 1998. "Institutionalization and Peronism: The Concept, the Case and the Case for Unpacking the Concept." Party Politics 4 (1): 77-92.
Lijphart, Arend. 1984. Democracies: Patterns of Majoritarian and Consensus Government in Twenty-One Countries. New Haven, CT: Yale University Press.
Lijphart, Arend. 1996. "The Puzzle of Indian Democracy: A Consociational Interpretation." American Political Science Review 90 (June): 258-68.
Linz, Juan J. 1975. "Totalitarian and Authoritarian Regimes." In Handbook of Political Science, vol. 3, ed. Fred I. Greenstein and Nelson W. Polsby. Reading, MA: Addison-Wesley. Pp. 175-411.
Lipset, Seymour M. 1959. "Some Social Requisites of Democracy: Economic Development and Political Legitimacy." American Political Science Review 53 (March): 69-105.
Locke, Richard, and Kathleen Thelen. 1995. "Apples and Oranges Revisited: Contextualized Comparisons and the Study of Comparative Labor Politics." Politics and Society 23 (September): 337-67.
Locke, Richard, and Kathleen Thelen. 1998. "Problems of Equivalence in Comparative Politics: Apples and Oranges, Again." Newsletter of the APSA Organized Section in Comparative Politics 9 (Winter): 9-12.
Loveman, Brian. 1994. "'Protected Democracies' and Military Guardianship: Political Transitions in Latin America, 1979-1993." Journal of Interamerican Studies and World Affairs 36 (Summer): 105-89.
Mainwaring, Scott, Daniel Brinks, and Anibal Perez-Linan. 2001. "Classifying Political Regimes in Latin America, 1945-1999." Studies in Comparative International Development 36 (1): 37-64.
Markoff, John. 1996. Waves of Democracy. Thousand Oaks, CA: Pine Forge.
Messick, Samuel. 1980. "Test Validity and the Ethics of Assessment." American Psychologist 35 (November): 1012-27.
Messick, Samuel. 1989. "Validity." In Educational Measurement, ed. Robert L. Linn. New York: Macmillan. Pp. 13-103.
Moene, Karl Ove, and Michael Wallerstein. 2000. "Inequality, Social Insurance and Redistribution." Working Paper No. 144, Juan March Institute, Madrid, Spain.
Moss, Pamela A. 1992. "Shifting Conceptions of Validity in Educational Measurement: Implications for Performance Assessment." Review of Educational Research 62 (Fall): 229-58.
Moss, Pamela A. 1995. "Themes and Variations in Validity Theory." Educational Measurement: Issues and Practice 14 (Summer): 5-13.
Mouffe, Chantal. 1992. Dimensions of Radical Democracy. London: Verso.
Nie, Norman H., G. Bingham Powell, Jr., and Kenneth Prewitt. 1969. "Social Structure, and Political Participation: Developmental Relationships, Part I." American Political Science Review 63 (June): 361-78.
"Social Structure, and Political Participation: Developmental Re-Schmitter, Philippe C., and Terry Lynn Karl. 1992. "The Types of
lationships, Part I." American Political Science Review 63 (June): Democracy Emerging in Southern and Eastern Europe and South
361-78. and Central America." In Bound to Change: Consolidating Democ-
Niemi, Richard, Richard Katz, and David Newman. 1980. "Recon-racy in East Central Europe, ed. Peter M. E. Volten. New York:
structing Past Partisanship: The Failure of the Party IdentificationInstitute for EastWest Studies. Pp. 42-68.
Recall Questions." American Journal of Political Science 24 (No- Schrodt, Philip A., and Deborah J. Gerner. 1994. "Validity Assess-
vember): 633-51. ment of a Machine-Coded Event Data Set for the Middle East,
O'Donnell, Guillermo. 1993. "On the State, Democratization and
1982-92." American Journal of Political Science 38 (August):
Some Conceptual Problems." World Development 27 (8): 1355-69.825-54.
O'Donnell, Guillermo. 1996. "Illusions about Consolidation." Jour- Schumpeter, Joseph. 1947. Capitalism, Socialism and Democracy.
nal of Democracy 7 (April): 34-51. New York: Harper.
Orum, Anthony M., Joe R. Feagin, and Gideon Sjoberg. 1991. Shepard, Lorrie. 1993. "Evaluating Test Validity." Review of Research
"Introduction: The Nature of the Case Study." In A Case for thein Education 19: 405-50.
Case Study, ed. Joe R. Feagin, Anthony M. Orum, and Gideon Shively, W. Phillips. 1998. The Craft of Political Research. Upper
Sjoberg. Chapel Hill: University of North Carolina Press. Pp. 1-26.Saddle River, NJ: Prentice Hall.
Paxton, Pamela. 2000. "Women in the Measurement of Democracy: Shultz, Kenneth S., Matt L. Riggs, and Janet L. Kottke. 1998. "The
Problems of Operationalization." Studies in Comparative Intera-Need for an Evolving Concept of Validity in Industrial and
tional Development 35 (Fall): 92-111. Personnel Psychology: Psychometric, Legal, and Emerging Issues."
Pitkin, Hanna Fenichel. 1967. The Concept of Representation. Berke-Current Psychology 17 (Winter): 265-86.
ley: University of California Press. Sicinski, Andrzej. 1970. "'Don't Know' Answers in Cross-National
Pitkin, Hanna Fenichel. 1987. "Rethinking Reification." Theory and Surveys." Public Opinion Quarterly 34 (Spring): 126-9.
Society 16 (March): 263-93. Sireci, Stephen G. 1998. "The Construct of Content Validity." Social
Przeworski, Adam, Michael Alvarez, Jose Antonio Cheibub, and Indicators Research 45 (November): 83-117.
Fernando Limongi. 1996. "What Makes Democracies Endure?" Skocpol, Theda. 1992. Protecting Soldiers and Mothers. Cambridge,
Journal of Democracy 7 (January): 39-55. MA: Harvard University Press.
Przeworski, Adam, and Henry Teune. 1970. Logic of Comparative Sussman, Leonard R. 1982. "The Continuing Struggle for Freedom
Social Inquiry. New York: John Wiley. of Information." In Freedom in the World, ed. Raymond D. Gastil.
Rabkin, Rhoda. 1992. "The Aylwin Government and 'Tutelary'Westport, CT: Greenwood. Pp. 101-19.
Democracy: A Concept in Search of a Case?" Journal of Inter- Valenzuela, J. Samuel. 1992. "Democratic Consolidation in Post-
american Studies and World Affairs 34 (Winter): 119-94. Transitional Settings: Notion, Process, and Facilitating Condi-
Ragin, Charles C. 1987. The Comparative Method. Berkeley: Univer- tions." In Issues in Democratic Consolidation, ed. Scott Mainwar-
sity of California Press. ing, Guillermo O'Donnell, and J. Samuel Valenzuela. Notre
Ragin, Charles C. 1994. Constructing Social Research. ThousandDame, IN: University of Notre Dame Press. Pp. 57-104.
Oaks, CA: Pine Forge. Vanhanen, Tatu. 1979. Power and the Means to Power. Ann Arbor,
Reus-Smit, Christian. 1997. "The Constitutional Structure of Inter- MI: University Microfilms International.
national Society and the Nature of Fundamental Institutions." Vanhanen, Tatu. 1990. The Process of Democratization. New York:
International Organization 51 (Autumn): 555-89. Crane Russak.
Ruggie, John G. 1998. "What Makes the World Hang Together? Verba, Sidney. 1967. "Some Dilemmas in Comparative Research."
Neo-Utilitarianism and the Social Constructivist Challenge." In-World Politics 20 (October): 111-27.
ternational Organization 52 (Autumn): 855-85. Verba, Sidney. 1971. "Cross-National Survey Research: The Problem
Russett, Bruce. 1993. Grasping the Democratic Peace. Princeton, NJ:of Credibility." In Comparative Methods in Sociology, ed. Ivan
Princeton University Press. Vallier. Berkeley: University of California Press. Pp. 309-56.
Sartori, Giovanni. 1970. "Concept Misformation in Comparative Verba, Sidney, Steven Kelman, Gary R. Orren, Ichiro Miyake, Joji
Research." American Political Science Review 64 (December):Watanuki, Ikuo Kabashima, and G. Donald Ferree, Jr. 1987. Elites
1033-53. and the Idea of Equality. Cambridge, MA: Harvard University
Sartori, Giovanni. 1975. "The Tower of Babel." In Tower of Babel, Press.
ed. Giovanni Sartori, Fred W. Riggs, and Henry Teune. Interna-Verba, Sidney, Norman Nie, and Jae-On Kim. 1978. Participation
tional Studies Association, University of Pittsburgh. Pp. 7-37. and Political Equality. Cambridge: Cambridge University Press.
Sartori, Giovanni, ed. 1984. Social Science Concepts: A SystematicWebb, Eugene J., Donald T. Campbell, Richard D. Schwartz, and
Analysis. Beverly Hills, CA: Sage. Lee Sechrest. 1966. Unobtrusive Measures: Nonreactive Research in
Sartori, Giovanni, Fred W. Riggs, and Henry Teune. 1975. Tower of the Social Sciences. Chicago: Rand McNally.
Babel: On the Definition and Analysis of Concepts in the SocialZeitsch, John, Denis Lawrence, and John Salernian. 1994. "Compar-
Sciences. International Studies Association, University of Pitts- ing Like with Like in Productivity Studies: Apples, Oranges and
burgh. Electricity." Economic Record 70 (June): 162-70.
Schaffer, Frederic Charles. 1998. Democracy in Translation: Under- Zeller, Richard A., and Edward G. Carmines. 1980. Measurement in
standing Politics in an Unfamiliar Culture. Ithaca, NY: Cornell the Social Sciences: The Link between Theory and Data. Cambridge:
University Press. Cambridge University Press.

546
This content downloaded from
132.174.255.116 on Tue, 21 Jun 2022 13:26:22 UTC
All use subject to https://about.jstor.org/terms

You might also like