

This is an Accepted Manuscript of an article published by Taylor & Francis in Justice System
Journal in 2016, available online:

http://www.tandfonline.com/10.1080/0098261X.2015.1084249

On the Measurement of Judicial Ideology

Christopher D. Johnston
Department of Political Science
Duke University
cdj19@duke.edu

Maxwell Mak
Department of Political Science
John Jay College of Criminal Justice
mmak@jjay.cuny.edu

&

Andrew H. Sidman
Department of Political Science
John Jay College of Criminal Justice
asidman@jjay.cuny.edu
Abstract

Researchers cannot assess the importance of ideology to judicial behavior without good

measures of ideology, and great effort has been spent developing measures that are valid and

precise. A few of these have become commonly used in studies of judicial behavior. An

emphasis has naturally been placed on developing continuous measures of ideology, like those

that exist for other institutions. There are, however, concerns with using continuous measures

because they rest on two assumptions that may be untenable when examining judicial

decision-making: that the level of precision assumed by these measures captures true

ideological distinctions between judges, and that the effects of ideology are uniform across its

levels. We examine these assumptions using different specifications of ideology, finding

that categorical measures are more valid and better depict the impact of ideology on judicial

decision-making at the U.S. Courts of Appeals, but not the Supreme Court.

Key Words

Appellate courts; Interaction with ideology; Judicial behavior; Judicial decision making; Political

methodology; Supreme Court


Theory testing requires that we measure important concepts as precisely as is appropriate.

In the area of judicial politics, the importance of ideology to decision-making cannot be

overstated (Segal and Spaeth 2002), and much work has been devoted to the measurement of

ideology with an emphasis on the development of continuous measures. For Supreme Court

justices, Segal-Cover scores (Segal and Cover 1989) are a frequently used measure.1 For Courts

of Appeals judges, the measurement of ideology is equally important. To test hypotheses at this

level, researchers have commonly used three measures of judge ideology: (1) the party of the

appointing president, (2) the scores derived from the Giles, Hettinger, and Peppers (2001) coding

strategy (GHP scores), and (3) the scores developed by David Nixon presented as part of the

Political Ideology Measurement Project2 (PIMP scores) and discussed in Howard and Nixon

(2003). The first measure has been utilized in many examinations of Courts of Appeals behavior

(e.g., Songer 1987; Songer and Sheehan 1990), but it has been criticized as, “a poor proxy for

judicial preferences. It fails to account for the reality that presidents of the same party vary in

their preferences and the clear evidence in the selection literature that senators of the president’s

party can constrain the president's choice” (Giles 2008, 53). The latter two measures, especially

GHP scores, have become frequently used measures in more recent statistical examinations of

circuit judge decision-making.3

Discussing GHP scores, Epstein, Martin, Segal, and Westerland (2005) note that the

measure does have, “face, convergent, and construct validity and outperforms other common

1. Martin and Quinn scores (Martin and Quinn 2002) are another commonly used measure of Supreme Court
ideology. We do not include them in this discussion, nor do we analyze them in the manner we analyze Segal-Cover
scores, because of our preference for measures of ideology that are not based on the voting behavior of judges and
justices we are attempting to explain and predict. We do, however, use Martin and Quinn scores as a control
variable in analyses of Court of Appeals judges.
2. The Political Ideology Measurement Project is housed at <http://www2.hawaii.edu/~dnixon/PIMP>.
3. Google Scholar indicates that Giles, Hettinger, and Peppers (2001) has been cited in 327 works at the time of our
writing this manuscript. Giles (2008) notes that 40% of the citations he examined used the scores as measures of
judicial preferences. Davis (2006) and Steigerwalt, Vining, and Stricko (2013) are examples of published works
employing Nixon’s measure.


measures, such as the party of the appointing President, or the ideology of the state from which

the judge is selected” (2006, 4). We agree that GHP, as well as PIMP and Segal-Cover scores,

have face validity. GHP and PIMP scores, which utilize common space scores (Poole 1998), are

based on a commonly used measure of ideology. Moreover, we do not empirically question the

content and convergent validity of these measures, questions that are beyond the scope of this

examination. We do, however, question the degree to which these scores have construct and

predictive validity.4

As scholarship on the U.S. Courts of Appeals grows and, therefore, inclusion of

specifications of judicial ideology becomes more common, we believe it is important to

reevaluate the role and nature of ideology in the decision-calculus of circuit court judges.

Specifically, we argue that continuous measures, and specifically the scores created by Giles,

Hettinger, and Peppers (2001), may serve as a more appropriate basis for categorizing the

ideology of a particular lower court jurist, but treating ideology as a continuous measure in

models of circuit judge voting behavior is inappropriate. We are not arguing that all measures of

judicial ideology are invalid. To the contrary, we find that Segal-Cover scores provide the most

appropriate measure of justice ideology. We do, however, argue that, in models of circuit judge

voting behavior, the continuous measures of ideology we examine here make an assumption that

is not met, namely that the effect of ideology is uniform across its range, for example, that a shift

from -0.9 to -0.8 produces the same change in behavior as a shift from 0.6 to 0.7. Also, if

continuous ideological measures are interacted with legal contexts (e.g., Bartels 2009) or other

4. The level of content validity is difficult to ascertain for any “measure” of ideology. Theoretically, ideology can be
expressed on a number of meaningful dimensions (e.g., social issues and economic/spending issues, Best 1999).
Even the common space scores on which GHP and PIMP are based have two dimensions. Given that most studies
employing these or similar measures use only the first dimension scores, we do not question whether these measures
are capturing all of the relevant aspects of ideology. We do assume that these measures reflect a high degree of
convergent validity. Looking within levels of the judiciary, there are high, significant correlations between each
measure (r = 0.77 for GHP and PIMP; r = -.6 for Segal-Cover and PIMP) supporting the argument that each pair of
scores is measuring the same concept.


constraints, the assumption requires that those constraints have the same impact on ideology

across levels of the continuous measure. In other words, a contextual factor will influence a

liberal judge to the same degree as a conservative one. If this assumption is not

met, instead of providing additional information, as is typically the case when one moves from

the ordinal to the interval level of measurement, these measures are providing additional noise,

masking what could be non-monotonic effects of ideology.

To support this argument, we first analyze the measurement properties of two continuous

measures of ideology for both the Courts of Appeals and Supreme Court using alternating least

squares optimal scaling. Second, we estimate models of vote choice using each of the

continuous measures and a categorical measure of ideology, comparing various model

performance statistics across specifications. Third, we estimate models interacting ideology with

the broad issue area addressed in the case to ascertain the extent to which different measurement

strategies lead to different conclusions regarding the conditioning effects of ideology on those

factors and vice versa. Briefly, we find that (1) all of the continuous measures of ideology we

examine evince some degree of clustering around particular values, violating the assumption

made by interval-level measures, (2) using a nominal measure of ideology produces better

performing models of Courts of Appeals judge behavior, but not Supreme Court justice behavior,

and (3) level of measurement significantly affects the conclusions drawn regarding the effects of

ideology in the Courts of Appeals models when ideology is interacted with issue areas.

Measures of Judicial Ideology

Before delving into the properties of interval-level measures and the assumptions

researchers make when operationalizing concepts at this level, it is worth taking some time to

present the measures that are examined in this study. One measure for which we do not present


analyses is the party of the appointing president.5 In addition to the criticism of the measure

noted in the introduction, the party of the appointing president is, put simply, not a measure of

ideology. If one is explicitly testing hypotheses regarding the differences between Democratic

and Republican appointees, this variable is completely appropriate. As a measure of ideology, as

noted earlier, it fails to address nuances in ideological differences by lumping all judges

appointed by presidents of the same party together (Giles 2008). Presidential partisanship does

not capture the qualitative or quantitative differences among conservatives, liberals, and moderates;

this is especially concerning when we test whether and to what degree case-level factors

accentuate or attenuate the role of ideology. While parsimonious in specification, the party

of the appointing president is inadequate for testing conditionality and variation in the role of

judicial ideology. Later in this work, we present a three-category measure of ideology based on

GHP scores. One might argue that the three-category measure we employ is little better than the

dichotomous party of the appointing president. We disagree. The measure we present here is

based on a measure of ideology. It, therefore, does not count all appointees from a given party

the same. While our measure adds only one more category, it allows us to distinguish

meaningful behavioral differences between so-identified liberal, moderate, and conservative

judges, labels that comport with our basic understanding of ideology itself.

Of the continuous measures examined here, the scores developed by Segal and Cover (1989)

came first. These scores measure the preferences of Supreme Court nominees,


5. While we do not present analyses using the party of the appointing president as a proxy for ideology, we did
conduct those analyses. As a dichotomous measure, it is unsuitable for the optimal scaling analysis. When included
in the Courts of Appeals and Supreme Court models, we reach the following conclusions. First, the models
employing presidential party generally perform worse than models including the other specifications of ideology
according to the model performance statistics we discuss later. Second, in the models with and without interactions
between ideology and issue area, we can generally conclude that Democratic appointees are more liberal in their
voting behavior and Republican appointees are more conservative, but we observe none of the complexities of
ideology. Thus in civil rights and liberties cases and economic cases, we would draw the same flawed conclusions
about the role of ideology as we do when using continuous measures.


and by extension justices, based on the coding of editorials published in four nationally-focused

newspapers, two of which tend to provide liberal commentary and two of which tend to provide

conservative commentary.6 The scores range from zero to one and increase with perceived

liberalism. Given their relatively small numbers and relatively high salience, particularly at

confirmation, there is more readily available information about Supreme Court justices that one

could use to construct a measure of ideology. Thus, an important distinguishing characteristic of

Segal-Cover scores is that they are based directly on information about the perceived policy

and/or legal preferences of the nominee.

There is typically far less information about lower court nominees and, therefore, far less

information on which one could base a measure of ideology. Giles, Hettinger, and Peppers

(2001) work around this problem by leveraging aspects of the appointment process for lower

court judges and making use of common space scores (Poole 1998), which are widely recognized

as valid measures of ideology for presidents and members of Congress. Giles, Hettinger and

Peppers (2001) place assumptions—arguably heavy ones—on the nature of selection of Article

III jurists to lower federal courts. The GHP score for a lower court judge takes on the value of

the nominating president’s common space score if senatorial courtesy is inactive. If senatorial

courtesy is operative, a given judge’s ideology takes on the value of the home-state senator of the

president’s party; if both home-state senators share the same party affiliation as the president, the

judge’s ideology is measured as the average of the senators’ common space scores. The

underlying assumption is that either the president is able to select a nominee exactly at his ideal

point, a single senator can exercise the same influence, or that two home-state senators will

select a nominee equidistant between their ideal points. Giles (2008) discusses some of the


6. Segal and Cover (1989) use the New York Times and Washington Post as their liberal sources and the Chicago
Tribune and The Los Angeles Times as their conservative sources.


criticisms of the measure, particularly the assignment of a home state senator’s common space

score to a judge when senatorial courtesy is operative. Nixon (2004) critiques the measure from

the other direction, noting that senatorial courtesy is inoperative in most appointments. Giles,

Hettinger, and Peppers (2001), therefore, assume that in this context presidents are, “completely

unrestrained in their appointment choices” (Nixon 2004, 1). Both critiques raise concerns for us

regarding the construct validity of the measure. It is safe to assume that the relevant players will

look for nominees proximate to their own positions. That judges selected by presidents (or

senators) with first dimension common space scores of 0.2 and 0.25, for example, evince the

same difference in ideological position is far less tenable. The implications of this are addressed

in the next section.
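The assignment rule described above can be sketched as a short function. This is our own illustration of the Giles, Hettinger, and Peppers (2001) coding strategy, not the authors' code; the names and inputs are ours:

```python
def ghp_score(pres_score, senator_scores, pres_party, senator_parties):
    """Illustrative sketch of the GHP assignment rule for a circuit-court
    nominee. All scores are first-dimension common space scores."""
    # home-state senators of the president's party activate senatorial courtesy
    same_party = [s for s, p in zip(senator_scores, senator_parties)
                  if p == pres_party]
    if not same_party:
        # courtesy inactive: the judge takes the president's score
        return pres_score
    # courtesy operative: one senator's score, or the average of both
    return sum(same_party) / len(same_party)
```

With both home-state senators in the president's party, the judge is scored at the midpoint of the two senators, which embodies the equidistance assumption discussed in the text.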

The scores developed by David Nixon (Howard and Nixon 2003; Nixon 2004) are also

based on common space scores. Rather than rely on assumptions about how presidential and

senatorial preferences impact appointment, Nixon’s method makes use of “bridging” judges:

appointees to the federal bench that have served in Congress and, therefore, have a common

space score based on their roll-call voting behavior. Nixon identifies 63 such individuals, using

them as observations in a regression of common space scores on several independent variables.

The resulting parameter estimates are then used to impute the common space scores of

appointees that have not served in Congress. A benefit of this method is that it can be used to

derive scores for appointees at all levels of the federal judiciary, including the Supreme Court,

thus allowing for direct comparisons between judges at different levels, as well as between

judges and legislators or presidents, and across time. These last two characteristics are shared

with GHP scores. Compared to GHP scores, PIMP scores appear to have an “informational”

advantage in that judicial ideology is a function of information about the judges themselves.


This information is, however, limited to the judge’s party identification and whether the judge is

a Southern Democrat or Northeastern Republican. There is no direct information on the policy

preferences of the judges.

We choose to focus our attention on these three measurement strategies for three reasons.

First, all three strategies produce widely used judicial ideology scores. Second, all three

strategies produce scores that are assumed to be and treated as interval-level measures. Third, as

none of these measures are based on judges’ voting behavior on the bench, they are all

potentially ideal for use as independent variables in explanatory models using votes as the

dependent variable.

Measurement as Theory Testing

As Jacoby notes, “all measurement is theory testing. Therefore, measurement always

constitutes a tentative statement about the nature of reality” (1999, 271). When we claim that a

given variable is operationalized at some level of measurement, whether it be nominal, ordinal,

interval, or ratio, we are not making a statement about some immutable reality, but rather are

choosing a theoretical model for how units are mapped from categories to numerical values. In

other words, the level of measurement for a variable is a choice made by the researcher, either

explicitly on the basis of theory and/or empirical evidence, or implicitly through the

unconsidered assigning of values to categories. It is important to recognize that this theoretical

aspect to levels of measurement applies to all variables, even those which we may feel are

intrinsically of a certain level. For example, religious identification may seem inherently

nominal, but within the context of a given model we may wish to operationalize this variable as

ordinal vis-à-vis degree of orthodoxy (Jacoby 1999).


In the present context, models of judicial decision-making assume that ideology is an

interval-level variable. While a general understanding of what this means is widespread, it is

worthwhile to consider it in a bit more detail. It is useful to consider levels of measurement as

functions (representing underlying models) which assign values to categories of a given variable,

and thus values to the units in the population of interest (Jacoby 1999). The implicit model

underlying the interval-level measurement of ideology in the present context is as follows:

(1) M(s_i) = β_0 + β_1 s_i ,

where M(s_i) is the function mapping the set of observations S into the set of values of the

independent variable denoted M. As seen, the current operationalization assumes a linear

relationship. In other words, it is assumed, a priori, that the difference between values assigned

to units (M_i − M_j) is proportional (β_1) to the difference between them in the underlying

characteristic (s_i − s_j), in this case, ideological liberalism. Practically speaking, this implies that a

unit change at any point on the measured ideology scale corresponds with a substantively

identical increase in the underlying trait.
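The proportionality built into equation (1) can be made concrete with the numbers used earlier: a 0.1 shift anywhere on the scale is forced to represent an identical ideological difference (β_1 here is an arbitrary illustrative value):

```python
def mapped(s, b0=0.0, b1=2.0):
    # linear measurement model from equation (1): M(s_i) = b0 + b1 * s_i
    return b0 + b1 * s

# under the interval-level model, the shift from -0.9 to -0.8 and the shift
# from 0.6 to 0.7 must map to the same difference in measured ideology
low_shift = mapped(-0.8) - mapped(-0.9)
high_shift = mapped(0.7) - mapped(0.6)
```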

While it is possible that this assumption is approximately true, for the reasons discussed

above we believe that some degree of skepticism is warranted, and thus empirical testing of the

underlying measurement model would be fruitful. Before we discuss our empirical approach,

however, we note again that this is more than a simple empirical exercise. There are real

consequences for ignoring measurement issues with respect to the conclusions we draw from our

analyses. We consider two important potential consequences for the present field here.

First, the appropriate specification of ideology can have serious implications for the

interpretation of the role of ideology. If ideological distinctions are in fact coarser, using an

interval-level measure of ideology would overstate the influence of ideology,


suggesting significant differences across levels of ideology where no such distinctions actually

exist. In other words, we would treat judges at different levels as substantively different when

they are not, misconstruing the nature of ideology and its relationship to voting. It is also

possible that the interval level mischaracterizes the size of differences between clusters of judges

at different points on the scale. For example, judges near the midpoint are assumed under the

linear model to be equidistant in terms of the underlying trait from both extremes. There may,

however, be nonlinear relationships between ideological groups and voting such that

conservatives and moderates, or moderates and liberals are more similar in their behavior under

particular conditions.

Second, if a lower level of measurement is warranted, this implies the possibility of

substantively interesting, qualitative distinctions between categories of ideologues. In other

words, ideological distinctions may be more complex conceptually than implied by a

unidimensional continuum. If this is the case, we may expect that moderating factors (e.g., case-

level factors; see Bartels 2009) differentially affect different groups of judges. Just as a lower

level of measurement may imply non-linear main effects on the dependent variable, it may also

imply non-linear conditional effects. To the extent that such qualitative differences do indeed

exist between ideological groups, this constitutes a fruitful avenue for future research in the field.

To date, work at all levels of judicial decision-making has more often than not treated ideology

as a unidimensional, interval-level continuum. This assumption has proved too simple in other

subfields of political science (e.g., Feldman and Johnston 2014), and it is one that deserves

examination in the present field.


Data and Variables

In order to test these propositions, we employ the Original U.S. Courts of Appeals

Database7, compiled by Donald Songer, the update to this database, compiled by Ashlyn

Kuersten and Susan Haire, and the U.S. Supreme Court Database, originally compiled by Harold

Spaeth.8 To keep the time periods consistent, we analyze judicial behavior from 1947, the

earliest year available for Supreme Court data, through 2002, the latest year available for Courts

of Appeals data. These data are an ideal setting for examining the role and nature of ideology at the circuit

courts; not only do they constitute a large collection of cases spanning several decades, but they are also the data

employed in many previous examinations of circuit court decision-making (e.g., Calvin, Collins

and Eshbaugh-Soha 2011; Hettinger, Lindquist and Martinek 2006; Kaheny, Haire and Benesh

2008) and the data employed in the Giles, Hettinger, and Peppers (2001) examination. For all of

the analyses, the dependent variable is whether a judge or justice voted in a liberal direction on

the merits of a given case. We only include votes coded as liberal (1) or conservative (0),

excluding unclassified or mixed votes. As the dependent variable is dichotomous, we estimate

all models, except for the alternating least squares analyses, as logit models.9

Ideology

As described previously, we include two continuous measures of ideology for the Courts

of Appeals judges: GHP scores and PIMP scores. We also include two continuous measures of

ideology for Supreme Court justices: Segal-Cover scores and PIMP scores. At both levels of the

judiciary, performance of the continuous measures is compared to a nominal, categorical


7. Both the Original U.S. Appeals Courts Database (1925-1996) and the Update to the Appeals Court Database
(1997-2002) were obtained from the website of the Judicial Research Initiative at the University of South Carolina
<http://artsandsciences.sc.edu/poli/juri/appct.htm>.
8. Our analyses use the 2013 Release 1 justice-centered version of the Supreme Court Database
<http://supremecourtdatabase.org/index.php>.
9. Data and replication files for all of the analyses we present can be found at www.andrewsidman.com.


measure of ideology. For the Courts of Appeals, we employ a three-category measure of

ideology using a tertile split of GHP scores. The analyses include two dummy variables,

Moderate, which is scored 1 for the middle-third of judges, and Conservative, which is scored 1

for the upper-third of judges. Liberal judges comprise the excluded category. For the Supreme

Court, we follow the same coding strategy using Segal-Cover scores: Moderate is scored 1 for

the middle-third of justices, Conservative is scored 1 for the lower-third10, and liberal justices are

the excluded category.
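The tertile coding can be sketched as follows; the quantile cutpoints and tie handling are our assumptions, not necessarily the exact procedure used in the analyses:

```python
import numpy as np

def tertile_dummies(scores):
    """Split continuous ideology scores into thirds and return the Moderate
    and Conservative dummies (the liberal third is the excluded category).
    Assumes higher scores are more conservative, as with GHP scores."""
    lo, hi = np.quantile(scores, [1 / 3, 2 / 3])
    moderate = ((scores > lo) & (scores <= hi)).astype(int)
    conservative = (scores > hi).astype(int)
    return moderate, conservative
```

For Segal-Cover scores, which increase with perceived liberalism, the Conservative dummy would instead flag the lower third.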

Control Variables

The analyses that follow include several control variables thought to explain judicial

voting behavior. The following variables are included in all models. First, we include dummy

variables for three different issue areas: civil rights and liberties, economic, and criminal; other

issue areas comprise the excluded category.11 Second, we use a dummy variable to control for

the direction of the decision of the lower court being reviewed (liberal decisions are coded as 1,

conservative as 0). Given the tendency of appellate courts to affirm district court decisions, we

expect this variable to have a positive effect on the likelihood of a liberal vote by circuit judges.

We expect the opposite effect on the behavior of Supreme Court justices, as the Court tends

to follow a reversal strategy when reviewing decisions. Third, all models include a dummy

variable coded 1 if the respondent is the federal government and fourth, the interaction between

the lower court decision and whether the respondent is the federal government. Given the


10. Segal-Cover scores increase with perceived liberalism; both GHP and PIMP scores, which are based on
common space scores, increase with perceived conservatism.
11. We generate the issue area dummies using GENISS in the Courts of Appeals data (1=criminal, 2 through 5=civil
rights and liberties, and 7=economic) and issueArea in the Supreme Court database (1=criminal, 2 through 5=civil
rights and liberties, and 8=economic).


observed advantages the federal government has in court12, we expect the interaction to lower the

magnitude of the effects of the lower court decision.

Two variables are included only in the Courts of Appeals models. We control for panel

effects at the U.S. Courts of Appeals by including the number of other judges on the panel

appointed by a Democratic president. We expect that an increase in the

number of Democratic appointees on the panel will increase the likelihood of a liberal vote.

Additionally, we include the one-year lagged median ideology score of the Court as measured by

Martin and Quinn (2002) scores to account for circuit court responsiveness to the Supreme

Court.13 Given that these scores increase with perceived liberalism, we expect a positive

relationship with the likelihood of a liberal vote. The lagged Supreme Court median is,

naturally, only included in the Courts of Appeals models. For a final set of control variables, the

models all include fixed effects. Courts of Appeals models include fixed effects for year and for

circuit. Supreme Court models include fixed effects for terms of the Court.14 Table 1 presents

summary statistics for all of these variables.

[Table 1 about here]

Analysis of the Measurement Properties of Ideology


12. At the Supreme Court, the Solicitor General’s Office and the federal government as a whole have more experience
appearing before the Supreme Court and achieve greater degrees of success (McGuire 1998; Segal 1990; Sheehan,
Mishler and Songer 1992). Songer and Sheehan (1992) note the same federal government advantage in the United
States Court of Appeals. Much like Sheehan, et al. (1992), Songer and Haire (1992) in looking at obscenity cases at
the Court of Appeals, find varying degrees of success for litigants, but also a distinct advantage for the federal
government. They find that the predicted probability of a vote supporting a defendant when opposed by the federal
government decreases to 6.7%, which is daunting when compared to the 25.3% likelihood when a defendant faces a
local government opponent (977-978). The explanation for federal government success is deference, on the part of
the justices, to a coordinate branch of government. With neither the power of the purse nor the sword, the Court is
reliant on the other branches and overall should be more willing to support a majoritarian, popularly-elected branch.
Thus, this effect should trickle down to the lower federal courts.
13. While we generally prefer non-vote based measures of ideology, Martin and Quinn scores are not being used here
to predict the votes of justices and have the advantage of being dynamic. Given their construction, the use of Martin
and Quinn scores in this context allow the circuit court models to account for judges changing their behavior in
response to the shifting voting behavior of the median justice (or changing median justices).
14. Judge and justice fixed effects cannot be included because the measures of ideology employed here are constant
within each judge.


In a typical regression analysis, the values of the variables are taken to be fixed with the

respective regression coefficients as parameters to be estimated from the data. A generalization

of this framework allows the values of the variables, and in turn, the level of measurement on

which the variables are operationalized, to be estimated as additional model parameters. In other

words, the level of measurement becomes another aspect of the model to be determined from the

empirical observations. Observations which are initially assigned to the same category remain in

the same category, but the values assigned to those categories are estimated.15 More specifically,

values are assigned which, along with the estimated regression coefficients, jointly maximize the

value of R². Thus, the “best-fitting” model is found where fit varies along both structural and

measurement dimensions as opposed to only the former in traditional regression analyses

(Jacoby 1999).

Within the linear regression context, alternating least squares (ALS) can be utilized to

obtain these two sets of parameter estimates. The ALS procedure begins with the original

category values, and estimates the vector of regression coefficients via ordinary least squares

(OLS).16 The regression coefficients are then treated as fixed, and values of the variables are

found which minimize the sum of squared errors. These new values are then treated as fixed,

and OLS is utilized to derive updated regression coefficients. This procedure is repeated until


15. As a basic example, consider party identification measured using a seven-point scale ranging from 0 (strong
Democrat) to 6 (strong Republican). In explaining preferences for defense spending, it may be that the preferences
of strong Democrats and weak Democrats do not differ. An optimal scaling procedure would suggest that party
identification is best measured using six categories, not seven, giving strong and weak Democrats the same category
value.
16. Judge votes are obviously dichotomous. Our utilization of ALS is thus equivalent to the linear probability model for binary dependent variables. As we are presently interested in the measurement characteristics of the ideology variable and not significance tests on the structural parameters of the model, and given that the linear probability model provides unbiased estimates of such parameters (Long 1997), this is unproblematic.


improvement in model fit halts (hence the name alternating least squares).17 We estimate the OLS

regression using judge or justice votes as the dependent variable and ideology18, issue areas, the

direction of the lower court decision, whether the federal government is the respondent, and, for

the Courts of Appeals only, panel effects and the lagged Supreme Court median as independent

variables. Ideology, measured using GHP scores, PIMP scores, and Segal-Cover scores, is the

only variable to which we apply the optimal scaling procedure.19
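The alternating steps just described can be illustrated with a short sketch. The paper's own analyses use the lm and optiscale packages in R; the Python function below is our illustrative reimplementation for a single optimally scaled predictor, applied to hypothetical data, and should not be read as the authors' code:

```python
import numpy as np

def optimal_scale_nominal(y, x_cat, X_other, tol=1e-3, max_iter=50):
    """Alternating least squares with nominal-level optimal scaling of one
    predictor: fit OLS with the current category values, then, holding the
    coefficients fixed, re-estimate the category values; repeat until the
    improvement in R-squared falls below tol.

    y       : outcome vector
    x_cat   : integer category codes for the variable being scaled
    X_other : 2-D matrix of the other predictors (values held fixed)
    """
    cats = np.unique(x_cat)
    vals = {c: float(c) for c in cats}   # start from the original category values
    r2_old = -np.inf
    for _ in range(max_iter):
        x = np.array([vals[c] for c in x_cat])
        X = np.column_stack([np.ones(len(y)), x, X_other])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r2 = 1 - ((y - X @ beta) ** 2).sum() / ((y - y.mean()) ** 2).sum()
        if r2 - r2_old < tol:            # fit has stopped improving
            break
        r2_old = r2
        # Optimal scaling step: with beta fixed, the SSE-minimizing value for
        # each category is the category mean of y net of all other model terms,
        # rescaled by the slope (assumes the slope is not numerically zero).
        partial = y - X @ beta + beta[1] * x
        for c in cats:
            vals[c] = partial[x_cat == c].mean() / beta[1]
    return vals, beta, r2
```

Observations sharing a category keep a common value throughout, as the procedure requires; only the value assigned to each category changes. Because judge votes are dichotomous, running such a routine on vote data amounts to the linear probability model discussed in footnote 16.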

Prior to the ALS procedure, the researcher specifies constraints upon the value

transformations which are allowed for each variable in the model. In other words, the researcher

may opt to restrict value transformations in line with theory regarding level of measurement, or

allow values to be estimated independent of any such restrictions (other than category integrity).

For example, the researcher may choose to place an ordinal restriction on a given variable during

the ALS estimation. This would imply that any value transformations occurring during the

estimation must maintain the original category ordering from the untransformed variable. If the

measurement level is assumed to be nominal, which is a far weaker assumption, the values of the

transformed variable need not preserve the order of the original values.
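One standard way to impose the ordinal restriction, pooling any adjacent category values that would violate the original ordering, is the pool-adjacent-violators algorithm. The sketch below is our own illustration; the internal routine used by the optiscale package may differ:

```python
def pool_adjacent_violators(values, weights):
    """Return the non-decreasing sequence closest (in weighted least squares)
    to `values`: adjacent categories whose estimated values violate the
    original ordering are merged and assigned their weighted mean."""
    blocks = []   # each block: [pooled value, total weight, number of categories]
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # merge backwards while the ordering is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(v1 * w1 + v2 * w2) / wt, wt, n1 + n2])
    out = []
    for v, w, n in blocks:
        out.extend([v] * n)   # pooled categories share one value
    return out
```

For example, pool_adjacent_violators([0.1, 0.5, 0.3, 0.9], [10, 10, 10, 10]) merges the out-of-order middle pair into a common value of 0.4, so the transformed values respect the original category ordering.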

Beyond the theoretical justifications for imposing (or not) an ordinal level of

measurement, there are practical benefits and drawbacks associated with assuming a nominal or

an ordinal level of measurement. Restricting transformations to an ordinal level can uncover

whether the effects of the original variable are proportional to the change in the underlying units,

the key assumption of interval-level measures. Allowing the transformed variable to be



17. An example of an optimal scaling routine from Jacoby (N.d.) suggests alternating the OLS and optimal scaling until improvement in R2 is less than 0.001. In our analyses, we only had to scale ideology once per model. R2 improved between the first and second OLS estimations, but not thereafter.
18. We round the ideology variables to two decimal places to reduce the number of “categories” of the original variable. For each measure, we retain the following number of unique values: GHP, 95 out of 191 values; PIMP (Courts of Appeals), 57 out of 182 values; Segal-Cover, 21 out of 23 values; PIMP (Supreme Court), 17 out of 24 values.
19. We use the lm and optiscale packages in R to conduct the ALS analyses.


measured as nominal enables the researcher to observe whether the effects of the original

variable are proportional to the change in the underlying units and whether the effects are

actually monotonic. Unrestricting the ALS procedure in this way, however, can be impractical

with a large number of original values. For GHP scores, for example, the resulting graph of

original against optimally scaled values includes ninety-five points, one for each unique, original

value of ideology, arrayed such that it is difficult to discern patterns in the results. We present the

unrestricted (nominal) transformations first, and then results for the Courts of Appeals including

the ordinal restriction.

If ideology as measured by GHP, PIMP, or Segal-Cover scores meets all of the

assumptions applied to interval-level variables, the original values and optimally scaled values

should be the same. That is, the values of ideology that maximize R2, and best explain variation

in judicial voting behavior, would be those of the untransformed variable. As Figure 1

demonstrates, this is not the case. For all three measures, we observe a general positive

relationship; increases in the transformed values correspond to increases in the original values.

Yet, practically every transformed value (looking horizontally across the graphs) includes a

range of values of the original measure. We do not observe the point-for-point relationship we

should if these scores were truly interval-level measures.

[Figure 1 about here]

Each panel of Figure 1 presents the optimal scaling results from one model of judicial

voting behavior. Panels A and B present models for Courts of Appeals judges; C and D present

models for Supreme Court justices. In all panels, values of the untransformed variable are

presented along the horizontal axis. Optimally scaled values, which are the values of ideology

that maximize R2, are presented along the vertical axis. If a given variable were best measured at


the interval level, the points of the scatter plot would fall on the 45-degree line, presented in each

panel as a dashed, gray line. Looking across the panels, for none of the measures does it appear

safe to assume that a one-unit increase in ideology translates into a corresponding, meaningful

increase or decrease in the likelihood of voting liberal. Furthermore, the optimally scaled

category values are not evenly spaced, which is a central assumption of interval-level measures.

As noted earlier, the unrestricted transformations can produce difficult-to-read graphs, especially

for the Courts of Appeals models given the larger number of unique original values. Figure 2

presents the same optimal scaling for the Courts of Appeals, but with the imposition of an

ordinal restriction on the transformed values. Here, one can more clearly see the clustering of

original values around particular transformed values. For example, in models of voting behavior,

GHP scores ranging from roughly -0.35 to -0.1, a range of 0.25, are best coded as a single value,

-0.2.

[Figure 2 about here]

To further support our argument, we conduct additional ALS analyses of the measures of

circuit judge ideology, both of which had a much larger number of unique, original values as

compared to the measures of Supreme Court ideology. In these analyses, we reduce GHP and

PIMP scores to ten, evenly spaced categories. For GHP scores, we begin with the minimum

value of -0.7 and set category ranges equal to 0.131. For PIMP scores, we begin with -0.34 and

set category ranges of 0.075. The ALS analyses presented in Figure 3 allow for unrestricted, that is, nominal, transformation of the original values. Again, if the measures should truly be treated

as interval-level, one would observe a roughly unit-for-unit increase between the original and

optimally scaled values. Figure 3 casts doubt on whether these variables should even be treated

as ordinal. For GHP scores (Panel A), the most liberal voting behavior is observed for judges in


the penultimate liberal category (-0.569 to -0.438).20 Likewise, the most conservative voting

behavior is observed for judges in the penultimate conservative category (0.348 to 0.479). We

also observe further clustering in three of the “moderate” categories (original scores ranging

from -0.307 to 0.086). PIMP scores (Panel B) appear even less ordinal than GHP scores.

Original and optimally scaled values generally increase from the first through the fifth

categories. For subsequent categories, the relationship between the two sets of scores is

approximately parabolic with conservative behavior reaching a local maximum in the fifth

category, decreasing through the eighth category, and increasing again through the tenth.

[Figure 3 about here]
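The ten-category reduction described above is simple fixed-width binning. A sketch (ours, not the authors' code), using the start values and widths given in the text:

```python
import numpy as np

def bin_scores(scores, start, width, n_bins=10):
    """Assign each score to one of n_bins evenly spaced categories.
    Per the text: GHP scores start at -0.7 with width 0.131; PIMP scores
    start at -0.34 with width 0.075. Scores at the top endpoint are
    clipped into the last bin."""
    idx = np.floor((np.asarray(scores, dtype=float) - start) / width).astype(int)
    return np.clip(idx, 0, n_bins - 1)
```

For GHP scores, for example, bin_scores([-0.7, -0.5, 0.0, 0.6], -0.7, 0.131) places those judges in categories 0, 1, 5, and 9.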

The evidence from the optimal scaling analysis suggests a categorical representation of

ideology may be more appropriate for explaining judicial voting behavior. In and of themselves,

the results do not provide sufficient evidence of the inadequacy of these measures in models of

judicial voting behavior. For that, we turn to more appropriate models of behavior and,

importantly, statistical evaluation of model performance using various operationalizations of

ideology. We recognize that it may be impractical for researchers to optimally scale their

preferred continuous measure of ideology before every examination. Furthermore, measures of

ideology using different values for and definitions of categories, as could result from using different subsets of the datasets used here or new datasets altogether, reduce the ability of

researchers to compare their findings. As a practical solution, we compare the performance of

these continuous measures to an easily created categorical measure based on GHP scores for

Courts of Appeals judges and Segal-Cover scores for Supreme Court justices, as described in the

Data and Variables section.
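Constructing such a categorical measure is a one-line recode. In the sketch below the cutpoints are placeholders of our own; the paper's actual cutoffs are defined in its Data and Variables section and are not reproduced here:

```python
import numpy as np

def three_category(scores, lo, hi):
    """Collapse a continuous ideology score into liberal / moderate /
    conservative. The cutpoints lo and hi are hypothetical placeholders,
    not the cutoffs used in the paper."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores < lo, "liberal",
                    np.where(scores > hi, "conservative", "moderate"))
```

With illustrative cutpoints of -0.2 and 0.2, three_category([-0.5, 0.0, 0.5], -0.2, 0.2) returns liberal, moderate, and conservative, respectively.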


20. Since GHP (and PIMP) scores increase with conservatism, and the structural model will estimate one parameter for the effect of ideology, lower values will be associated with a greater propensity to vote in a liberal direction.


Comparison of Explanatory Models

[Table 2 about here]

We estimate three logit specifications, one for each operationalization of ideology, using all of the independent variables described in the Data and Variables section. Again, all of the specifications include year fixed effects and Courts of

Appeals models also include circuit fixed effects.21 We provide a brief discussion of the effects of

the independent variables, but our present emphasis is on the model performance statistics

included at the bottom of Tables 2 and 3. We present five different statistics: (1) the Bayesian

information criterion (BIC’), using a version less sensitive to the number of parameters in the

model22, (2) Akaike’s information criterion (AIC), (3) the area under the receiver operating

characteristic (AUROC) curve, (4) McFadden’s R2, which is the pseudo-R2 reported by Stata,

and (5) the percent correctly predicted. Models are assumed to be “better” when (1) BIC’ and

(2) AIC are lower and when (3) AUROC, (4) McFadden’s R2, and (5) percent correctly predicted

are larger.
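For reference, the textbook forms of the first two criteria, together with Long and Freese's (2006) rough guide for interpreting BIC differences, can be sketched as follows. Note that the BIC' reported in the tables is a variant less sensitive to the parameter count, and the "weak" label for gaps under 2 points is our own placeholder:

```python
import math

def aic(loglik, k):
    """Akaike information criterion: -2*logL + 2k (lower is better)."""
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    """Standard Bayesian information criterion: -2*logL + k*ln(n)."""
    return -2.0 * loglik + k * math.log(n)

def bic_support(delta):
    """Long and Freese's (2006) guide for an absolute BIC difference:
    the model with the lower BIC is preferred, with the strength of that
    preference graded by the size of the gap (2-6 positive, 6-10 strong,
    over 10 very strong; the label for gaps under 2 is our addition)."""
    d = abs(delta)
    if d < 2:
        return "weak"
    if d < 6:
        return "positive"
    if d < 10:
        return "strong"
    return "very strong"
```

By this guide, a gap such as the 13.599 points discussed for Table 2 counts as very strong support for the lower-BIC' model.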

[Table 3 about here]

As Table 2 demonstrates, the model using a categorical specification of ideology

performs better than the nearest competitor, the model using GHP scores, on four of the five

statistics. The categorical specification results in a BIC’ that is 13.599 points lower, which

indicates a very strong preference for that model. The categorical specification boasts a larger

area under the ROC curve, a larger McFadden’s R2, and a slightly larger percent correctly


21. Year and circuit fixed effects are not presented in the tables, but are available upon request.
22. We opt for this calculation of the BIC because the categorical specification would be penalized for the extra parameters. Long and Freese (2006) state there are no important differences between this and the more standard calculation of BIC. As a means of comparing BIC for any set of models, Long and Freese (2006) note that differences of between 2 and 6 points represent positive support for selecting the model with the lowest BIC. Differences between 6 and 10 points represent strong support and differences greater than 10, very strong support.


predicted. The categorical specification and the model using GHP scores have an equal AIC. In

general, the model using PIMP scores is a very close third with respect to performance statistics.

The differences in model performance are not striking across the three specifications. Generally,

however, it is argued that continuous measures are preferable to categorical measures because

the latter “throw away” information. In the context of the voting behavior of circuit court judges,

far from losing information, the categorical specification of ideology produces a model whose explanations, and therefore predictions, of circuit judge voting behavior are at least as good, if not better. For the Supreme Court, however, a continuous measure, Segal-Cover scores,

clearly produces the best performing model. The model in Table 3 using Segal-Cover scores

excels in all five measures, including a BIC’ difference of 958.717 points between itself and the

model using a categorical specification of ideology.

[Figure 4 about here]

In addition to presenting the improved performance of a model using a categorical

specification of circuit judge ideology, we demonstrate the potential empirical pitfalls of

continuous measures through Figure 4. Figure 4 plots the predicted probabilities of voting

liberal for circuit judges against the actual behavior of judges at those values of ideology for the

ten categories used in the third ALS analysis. For example, the first category for GHP scores

ranged from -0.7 to -0.569. The first curve in each panel of Figure 4 plots the predicted

probability of a liberal vote for GHP scores from -0.7 to 0.61, increasing in increments of 0.131

(Panel A) and for PIMP scores ranging from -0.34 to 0.41, increasing in increments of 0.075

(Panel B).23 The second curve plots the proportion of votes that are liberal for all judges whose


23. Except for the measures allowed to vary as noted in the text, predicted probabilities are calculated holding GHP and Segal-Cover scores at their respective means (-0.020 for GHP, 0.547 for Segal-Cover). Dichotomous variables are kept at their base categories (“other” issue area, conservative lower court decision, federal government is not the respondent), panel effects at their median (one Democratic appointee on the circuit court panel), and the lagged


ideology scores are within the category ranges from Figure 3. While we do not expect a perfect

relationship, we should observe a monotonically decreasing proportion of liberal votes as

categories (conservatism) increase, if the interval level were appropriate. That, clearly, is not the

case. The curves representing actual behavior are essentially the mirror images of the curves

presented in Figure 3, the third ALS analysis. We add the predicted probabilities curves to make

clear that the assumption of behavioral changes that are proportional to unit increases is simply

not met for either GHP scores or PIMP scores.

Control Variables

The control variables have the same substantive effects within each level of the judiciary

across specifications.24 The interpretations that follow use the results from the GHP scores model

and the Segal-Cover scores model. Looking first at the effects of different issue areas, relative to

other issue areas, the probability of a liberal vote by a judge decreases by 0.044 in civil rights

and liberties cases, by 0.028 in economic cases, and by 0.177 in criminal cases. For Supreme

Court justices, relative to other issue areas, the probability of a liberal vote increases by 0.113 in

civil rights and liberties cases, 0.111 in economic cases, and 0.039 in criminal cases. The

direction of the lower court decision, combined with the federal government as respondent, has

significant effects on judicial behavior. In general, circuit judges are most likely to vote liberal

when the lower court decision is liberal and the federal government is the respondent (predicted

probability, p, equals 0.763). Circuit judges are least likely to vote liberal when the lower court decision is conservative and the federal government is the respondent (p equals 0.362). Taken

together, the results suggest a high rate of affirmance and deference to the federal government.

23. (continued) Supreme Court median is held at its mean (0.386). For the Courts of Appeals, marginal effects are calculated for the 7th Circuit, holding the year at 1980. For the Supreme Court, term is held at 1993. None of those effects were significant and all three were very close to zero.
24. Not only do the control variables have the same effects in the three basic models presented in Tables 2 and 3, but they also have the same effects in the models that interact ideology with issue areas (Tables 4 and 5).


We observe a similar dynamic for the Supreme Court with respect to deference for the federal

government, but we also see preferences for reversing as opposed to affirming lower court

rulings; when the federal government is not the respondent, there is a significantly higher

probability of a justice voting liberal when the lower court decision was in a conservative

direction as opposed to liberal one. We observe significant “panel” effects at the Courts of

Appeals, with a greater number of Democratic appointees increasing the likelihood that a

judge casts a liberal vote. An increase from zero to two Democrat-appointed judges on the panel

increases the probability of a liberal vote by 0.059. Lastly, circuit court judges exhibit

responsiveness to the Supreme Court. An increase of one standard deviation (0.55) from the

mean of the lagged Supreme Court median, which represents a liberal change in Court

preferences, causes the probability of a liberal vote to increase by 0.083.
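The predicted probabilities and changes reported above are obtained by passing the fitted logit linear predictor through the inverse-logit transform; schematically (the linear-predictor values in any example are hypothetical, not estimates from the paper):

```python
import math

def logit_prob(xb):
    """Predicted probability from a logit model's linear predictor xb."""
    return 1.0 / (1.0 + math.exp(-xb))

def first_difference(xb_base, xb_new):
    """Change in predicted probability between two covariate scenarios,
    e.g. moving one predictor from its mean to one standard deviation
    above it while holding all others fixed."""
    return logit_prob(xb_new) - logit_prob(xb_base)
```

Quantities like the 0.083 increase reported above are first differences of this kind, computed at the covariate values described in footnote 23.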

Ideology and Interactions

[Table 4 about here]

As a final illustration of the caution that needs to be observed when measuring judicial

ideology, we present Tables 4 and 5, which estimate the same models for the Courts of Appeals

and Supreme Court as presented in Tables 2 and 3, adding interactions between ideology and

issue areas. Looking first at model performance, Tables 4 and 5 tell a story similar to Tables 2

and 3. The model using the categorical specification of circuit judge ideology performs at least

as well as models using continuous measures. The categorical specification has the lowest BIC’

(by 11.303 points, indicating very strong support for this specification), the highest area under the

ROC curve, the highest McFadden’s R2, the lowest AIC, and the highest percent correctly

predicted. Again, the differences in model performance are not spectacular, but all five statistics

suggest that the categorical specification produces models comparable to those using a


continuous specification of ideology. Turning to the Supreme Court, again, the model using

Segal-Cover scores outperforms the other two models on all five statistics.

[Table 5 about here]

More important than relative model performance, the conditional effects of ideology on

voting behavior provide, in our view, strong support for the argument that researchers need to consider

the advantages of categorical measures over continuous ones in this context. We begin this time

with the straightforward presentation of conditional effects in models of Supreme Court

decision-making. All three models expect the likelihood of voting liberal to decrease as

perceived conservatism increases (Segal-Cover scores decrease and PIMP scores increase) across

all issue areas. The interactions between ideology and issue area all have the effect of

accentuating the impact of ideology relative to its effects in other, unspecified issue areas (the

base category). Liberal justices become more likely to vote liberal, conservative justices become

more likely to vote conservative, and moderate justices become slightly more likely to vote

conservative, although this result for moderates is statistically weaker for economic cases.

Importantly, we observe the same substantive relationships for all three specifications of

ideology.

The same cannot be said of the Courts of Appeals models. Recall that an assumption of

interval-level measures is that a constant distance between sets of values of the independent

variable will produce proportionally constant responses in the dependent variable. Consider, for

example, the interaction between civil rights and liberties cases and PIMP scores (the second

substantive column of Table 4). The coefficients suggest that liberal judges become more liberal

when deciding civil rights and liberties cases, moderate judges become slightly more liberal, and

conservative judges become more conservative. Thus, we draw a similar conclusion to the


effects of ideology on Supreme Court behavior. Liberal judges are more likely to vote liberal

than moderate judges, and moderate judges are more likely to vote liberal than conservative

judges. The coefficients from the last column of Table 4, however, tell a different story. The

effect of Moderate is not statistically significant, suggesting that in civil rights and liberties

cases, there are no statistical differences in the voting behavior of moderate judges and liberal

judges, ceteris paribus.

[Figure 5 about here]

Figure 5 graphs the predicted probabilities of a liberal vote in each issue area when

continuous measures equal their minimum value, 25th percentile, median (50th percentile), 75th

percentile, and maximum value. Categories of the nominal-level measure are plotted alongside the

minimum, median, and maximum values for ease of presentation. In the bottom two panels,

depicting criminal cases and other issue areas, all three measures produce substantively similar

conclusions: the propensity to vote liberal decreases with conservatism. In the top two panels,

depicting civil rights and liberties cases and economic cases, the continuous measures, because

they assume (and therefore force) proportional changes in the response, tell a misleading story

about the role of ideology in decision-making. In civil rights and liberties cases (Panel A), while

moderates are more likely to vote conservative than liberals, the difference between these two

types of judges (0.042) is much smaller than the difference between moderates and conservatives

(0.107). Such differences are even sharper in economic cases where moderate judges are, on

average, more liberal in their voting behavior than liberal judges.25 Conservative judges are,

once again, far more conservative in their behavior than the other two groups. Thus, for these

two important types of cases, reliance on continuous measures of ideology would lead the


25. The coefficient of the interaction between moderate judges and economic cases is not listed as statistically significant, but it does come close to conventional levels of significance (p = 0.060).


researcher to conclude that increasing conservatism leads to decreases in the likelihood of voting

liberal. In actuality, “moderate” judges are much closer to their liberal colleagues in both civil

rights and liberties and economic cases. Coupled with the model performance statistics, the

results raise serious doubts about the conclusions drawn from analyses using continuous

measures of ideology in the current context.

Conclusion

The measurement strategies presented in Giles, Hettinger, and Peppers (2001) and

Howard and Nixon (2003) are a welcome departure from using the party of the appointing

president as a proxy for judicial preferences. Rather than paint all nominees with the same broad

brush, GHP and PIMP scores leverage the selection process, variation in presidential or

senatorial preferences, or basic characteristics of the nominees to better distinguish liberal,

moderate, and conservative judges. Keeping these measures at the interval level, however,

places a heavy assumption on the manner by which ideology affects behavior. Specifically, it is

assumed that unit increases in these variables cause proportional changes in the likelihood judges

cast liberal votes; that a change from -0.5 to -0.45 in a given ideology scale not only produces a

meaningful change in voting behavior, but also that the same change in behavior results from a

change from 0.45 to 0.5. We argue that this assumption is untenable and our argument is

supported in the context of voting behavior at the Courts of Appeals.

We recognize that quantitative scholars generally prefer continuous measures over

categorical ones. Continuous measures provide increased variation and greater precision. These

benefits are only realized, however, when the assumptions of these levels of measurement are

met. This is not to say that the field should abandon GHP or PIMP scores as measures of circuit

judge ideology. The methods of their creation place both measures on the same scale as actors in


the coordinate branches of government, making them well-suited to inter-branch ideological

comparisons.

In studies of judicial behavior that do not include such explicit ideological comparisons,

we find issues with these continuous measures that can be resolved by lowering the level of

measurement. Different research contexts, and different hypotheses, may call for different types

of measures. Ideology can be measured using any number of categories appropriate to testing

these arguments. We present a three-category measure, which, while simple, brings a number of

benefits beyond dropping the unmet assumptions of the interval level. Particularly in models of

circuit judge voting behavior, especially when ideology is interacted with contextual factors, the

categorical measure: (1) is easy to construct assuming the researcher already has GHP scores, (2)

is easy to interpret, (3) produces models that perform just as well as, if not better than, models

using the common continuous alternatives, and (4) leads to more appropriate conclusions

regarding the effects of ideology.

We present three sets of analyses for the Courts of Appeals and Supreme Court. In the

first, we demonstrated that all four continuous measures of judicial ideology evince some

degree of clustering around particular values when used in models of vote choice. Thus, model

fit improves when these continuous measures are reduced to sets of categories. Second, we

present logit models of vote choice using the continuous measures and a simple, categorical

construction, based on GHP and Segal-Cover scores, identifying liberal, moderate, and

conservative judges and justices. On five different statistics, the Courts of Appeals model

employing the categorical measure performed slightly better than the models using continuous

measures. Thus, there is no information loss in dropping from the interval to the nominal level and

drastically decreasing the number of categories into which judges are placed. Interestingly,


Segal-Cover scores, and not the categorical measure based on them, provide better explanations

of justice voting behavior. We speculate that the reason for this is that Segal-Cover scores are

based on information about the nominees’ preferences directly. GHP and PIMP scores, even

those for Supreme Court justices, make assumptions about nominee preferences based on

presidential/senatorial preferences and nominee party identification/region respectively. It

appears that the information used by Segal and Cover (1989), which unfortunately is not

available for lower court judges, lends itself to a more precise measure of ideology. Third, we

find for the Courts of Appeals that a categorical measure of ideology produces different

substantive conclusions regarding the effects of ideology when ideology is interacted with the

issue area addressed by a case. Such qualitative differences in the effect of a moderating variable

can only emerge by relaxing the measurement constraints of GHP and PIMP.

Beyond questions of measurement, our analyses also demonstrate the importance of

estimating well-specified models. While ideology is an important predictor of judicial behavior,

other factors exert strong influences over the votes of judges and justices. The substantive issue

area of the case and the presence of the federal government as a party in particular are powerful

predictors of judicial behavior regardless of how ideology is measured in the context we study

here.

One final thought we wish to reiterate is that the context in which judges are studied

is important. Our argument should not be construed as cautioning against the use of continuous

measures of ideology for circuit judges in all research, even for the two measures we examine

here. Furthermore, we do not argue that using the party of the appointing president is

necessarily poor. The results we present, however, demonstrate that what are seemingly greater

levels of precision are not automatically better measures in the context of ideology and voting


behavior for the data examined here. For other research questions, for other datasets, continuous

measures of circuit judge ideology may be perfectly appropriate. We do argue that researchers

should be particularly mindful of the measures of ideology they employ in empirical

examinations of judicial voting behavior. Moreover, future examinations should be especially

cautious when testing the conditional effects of ideology. We hope that our

discussion serves to make researchers more aware of the assumptions and consequences of those

measurement choices, and to help researchers make those decisions conscientiously.


References

Bartels, Brandon L. 2009. “The Constraining Capacity of Legal Doctrine on the US Supreme

Court.” American Political Science Review 103: 474-495.

Best, Samuel J. 1999. “The Sampling Problem in Measuring Policy Mood: An Alternative

Solution.” Journal of Politics 61: 721-740.

Calvin, Bryan, Paul M. Collins, Jr., and Matthew Eshbaugh-Soha. 2011. “On the Relationship

between Public Opinion and Decision Making in the U.S. Courts of Appeals.” Political

Research Quarterly 64: 736-748.

Davis, Jeffrey. 2006. “Justice Without Borders: Human Rights Cases in U.S. Courts.” Law and

Policy 28: 60-82.

Epstein, Lee, Andrew D. Martin, Jeffrey A. Segal, and Chad Westerland. 2005. “The Judicial

Common Space.” Journal of Law, Economics, & Organization 23: 303-325.

Feldman, Stanley, and Christopher Johnston. 2014. “Understanding the Determinants of Political

Ideology: Implications of Structural Complexity.” Political Psychology 35: 337-358.

Giles, Michael W. 2008. “Commentary on ‘Picking Federal Judges: A Note on Policy and

Partisan Selection Agendas.’” Political Research Quarterly 61: 53-55.

Giles, Michael W., Virginia A. Hettinger, and Todd C. Peppers. 2001. “Picking Federal Judges:

A Note on Policy and Partisan Selection Agendas.” Political Research Quarterly 54: 623-

641.

Hettinger, Virginia A., Stefanie A. Lindquist, and Wendy L. Martinek. 2006. Judging on a

Collegial Court: Influence on Federal Appellate Decision-Making. Charlottesville: University of Virginia Press.


Howard, Robert M. and David C. Nixon. 2003. “Local Control of the Bureaucracy: Federal

Appeals Courts, Ideology, and the Internal Revenue Service.” Journal of Law and Policy 13:

233-256.

Jacoby, William G. 1999. “Levels of Measurement and Political Research: An Optimistic View.”

American Journal of Political Science 43: 271-301.

Jacoby, William G. N.d. “opscale: A Function for Optimal Scaling.”

http://polisci.msu.edu/jacoby/icpsr/scaling/computing/alsos/Jacoby,%20opscale%20MS.pdf

(February 24, 2015).

Kaheny, Erin B., Susan B. Haire, and Sara C. Benesh. 2008. “Change over Tenure: Voting,

Variance, and Decision Making on the U.S. Courts of Appeals.” American Journal of

Political Science 52: 490-503.

Long, J. Scott. 1997. Regression Models for Limited and Categorical Dependent Variables.

Thousand Oaks, CA: Sage.

Long, J. Scott and Jeremy Freese. 2006. Regression Models for Categorical Dependent

Variables Using Stata, 2nd Edition. College Station, TX: Stata Press.

Martin, Andrew D., and Kevin M. Quinn. 2002. “Dynamic Ideal Point Estimation via Markov

Chain Monte Carlo for the U.S. Supreme Court, 1953-1999.” Political Analysis 10: 134–53.

McGuire, Kevin T. 1998. “Explaining Executive Success in the U.S. Supreme Court.” Political

Research Quarterly 51: 505-526.

Nixon, David C. 2004. “Appendix A: Ideology Scores for Judicial Appointees.” November 23.

http://www2.hawaii.edu/~dnixon/PIMP/judicial.pdf (February 16, 2015).

Poole, Keith T. 1998. “Recovering a Basic Space from a Set of Issue Scales.” American Journal

of Political Science 42: 954-993.

Segal, Jeffrey A. 1990. “Supreme Court Support for the Solicitor General: The Effect of

Presidential Appointments.” Western Political Quarterly 43: 137-152.

Segal, Jeffrey A. and Albert D. Cover. 1989. “Ideological Values and the Votes of U.S. Supreme

Court Justices.” American Political Science Review 83: 557-565.

Segal, Jeffrey A. and Harold J. Spaeth. 2002. The Supreme Court and the Attitudinal Model

Revisited. Cambridge, UK: Cambridge University Press.

Sheehan, Reginald S., William Mishler, and Donald R. Songer. 1992. “Ideology, Status, and the

Differential Success of Direct Parties Before the Supreme Court.” American Political Science

Review 86: 464-471.

Songer, Donald R. 1987. “The Impact of the Supreme Court on Trends in Economic Policy

Making in the United States Courts of Appeals.” Journal of Politics 49: 830-41.

Songer, Donald R., and Susan Haire. 1992. “Integrating Alternative Approaches to the Study of

Judicial Voting: Obscenity Cases in the U.S. Courts of Appeals.” American Journal of

Political Science 36: 963-82.

Songer, Donald R., and Reginald S. Sheehan. 1990. “Supreme Court Impact on Compliance and

Outcomes: Miranda and New York Times in the United States Courts of Appeals.” Western

Political Quarterly 43: 297-319.

Songer, Donald R., and Reginald S. Sheehan. 1992. “Who Wins on Appeal? Upperdogs and

Underdogs in the United States Court of Appeals.” American Journal of Political Science 36:

235-258.

Steigerwalt, Amy, Richard L. Vining, Jr., and Tara W. Stricko. 2013. “Minority Representation,

the Electoral Connection, and the Confirmation Vote of Sonia Sotomayor.” Justice System

Journal 34: 189-207.

Figure 1.
Optimal Scaling of Continuous Measures of Ideology: Nominal Transformations
Note: Each panel presents the scatter plot of the original values of each measure (on the horizontal axis) against
the optimally scaled values (on the vertical axis), derived from alternating least squares analyses as described in
the text. For the purpose of comparison, the “45-degree” line, on which the points would fall if the original and
optimally scaled values were equal, is presented as a dashed, gray line. Note also that Panel A, the optimal
scaling of GHP scores, does not include one extreme point. A GHP score of 0.4 has an optimally scaled value
of -3.38. Inclusion of this point would have distorted the rest of the graph.
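The alternating least squares procedure behind these optimally scaled values can be illustrated with a toy sketch. This is a simplified single-predictor version written for this note, not the opscale routine used in the analysis: it alternates an OLS fit with a nominal-level re-scoring step that assigns each category the mean of its members' least-squares targets, and the recovered scores are identified only up to a linear transformation.

```python
import numpy as np

def nominal_opscale(x_cat, y, n_iter=25):
    """Toy single-predictor alternating least squares with a nominal
    restriction: alternate (1) OLS of y on the current category scores
    and (2) re-scoring each category with the mean least-squares target
    of its members. Scores are identified only up to a linear transform."""
    cats = list(np.unique(x_cat))
    scores = {c: float(i) for i, c in enumerate(cats)}  # arbitrary start values
    for _ in range(n_iter):
        x = np.array([scores[c] for c in x_cat])
        slope, intercept = np.polyfit(x, y, 1)   # OLS step
        target = (y - intercept) / slope         # per-observation optimal x
        for c in cats:                           # nominal scaling step
            scores[c] = float(target[x_cat == c].mean())
    return scores

# Simulated data: three unordered categories with known effects.
rng = np.random.default_rng(0)
x_cat = rng.choice(np.array(list("abc")), size=300)
true = {"a": -1.0, "b": 0.5, "c": 2.0}
y = np.array([true[c] for c in x_cat]) + rng.normal(0, 0.3, size=300)
scores = nominal_opscale(x_cat, y)
print(scores)
```

The recovered scores preserve the ordering and relative spacing of the true category effects, which is all the nominal transformation demands.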

[Figure 1 appears here: two scatter plots, Panel A (Courts of Appeals: GHP Scores) and Panel B (Courts of Appeals: PIMP Scores), each plotting Original Values on the horizontal axis against Optimally Scaled Values on the vertical axis.]

Figure 2.
Optimal Scaling of Continuous
Measures of Ideology: Ordinal Transformations
Note: This figure is similar to Figure 1 except that the optimally scaled values
assume an ordinal relationship between values in the original measure. Again,
the 45-degree line is presented for comparison purposes.

[Figure 2 appears here: two scatter plots, Panel A (Courts of Appeals: GHP Scores) and Panel B (Courts of Appeals: PIMP Scores), each plotting Original Value on the horizontal axis against Optimally Scaled Value on the vertical axis.]

Figure 3.
Nominal Transformation of 10-Category GHP and PIMP Scores
Note: This figure presents the optimal scaling analyses of GHP and PIMP scores both
measured originally as ten-point scales. Each score was transformed into a ten-point scale
using equally sized categories (0.131 for GHP and 0.075 for PIMP) prior to analysis. As in
the analyses presented in Figure 1, the analyses here do not force the optimally scaled
values to maintain the original order of the categories.

[Figure 4 appears here: two line graphs, Panel A (Predicted v. Actual Liberal Voting by GHP Scores) and Panel B (Predicted v. Actual Liberal Voting by PIMP Scores), each plotting Probability / Proportion on the vertical axis against the GHP or PIMP score range on the horizontal axis, with separate curves for P(Lib. Vote) and % Lib. Votes.]

Figure 4.
Predicted and Actual Liberal Voting at the Courts of Appeals
Note: Each panel presents two curves. The first is the predicted probability of a liberal vote
evaluated at the lower bound of each range of GHP scores (Panel A) or PIMP scores (Panel B).
The second is the actual proportion of liberal votes cast by judges whose GHP or PIMP scores
fall in the given range.
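The comparison the note describes, model-based probabilities set against observed proportions within score ranges, is a binned calibration check. A toy sketch with simulated data follows; the score distribution and logit coefficients below are invented for illustration and are not the paper's estimates.

```python
import numpy as np

# Simulate votes from a known logit model, then bin judges by an
# ideology score and compare the model's mean predicted probability
# of a liberal vote with the observed proportion in each bin.
rng = np.random.default_rng(1)
score = rng.uniform(-0.7, 0.6, size=5000)        # stand-in for GHP-like scores
p_true = 1 / (1 + np.exp(0.5 + 0.47 * score))    # liberal voting falls as scores rise
vote = rng.binomial(1, p_true)

edges = np.linspace(-0.7, 0.6, 8)                # seven equal-width bins
which = np.digitize(score, edges[1:-1])          # bin index 0..6 per observation
for k in range(len(edges) - 1):
    m = which == k
    print(f"[{edges[k]:+.2f}, {edges[k+1]:+.2f}): "
          f"pred={p_true[m].mean():.3f}  actual={vote[m].mean():.3f}")
```

With a correctly specified model, the predicted and actual columns should track each other closely within sampling error, which is the pattern Figure 4 is designed to reveal (or refute).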

Figure 5.
The Effects of Ideology Across Issue Areas
Note: Predicted probabilities are generated from the coefficients in Table 4 holding the lower court decision and
the federal government as respondent at their base values, panel effects at their median, and the lagged Supreme
Court median at its mean. For ease of presentation, value labels of the categorical variable are placed at the
minimum, median, and maximum values of the continuous measures to which they are compared.
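Predicted probabilities of this kind come from the standard logistic transform of the linear predictor, p = 1 / (1 + exp(-Xb)). A sketch follows; the values are loosely patterned on the GHP column of Table 2 but omit the issue-area dummies and circuit/year fixed effects of the actual model, so the resulting probability is illustrative only.

```python
import math

def logit_prob(intercept, terms):
    """Predicted probability from a logit model: p = 1 / (1 + exp(-Xb)),
    where Xb is the intercept plus the sum of coefficient * value pairs."""
    xb = intercept + sum(coef * value for coef, value in terms)
    return 1.0 / (1.0 + math.exp(-xb))

# Illustrative covariate profile (not the paper's full specification):
p = logit_prob(
    intercept=-0.756,
    terms=[
        (-0.471, 0.608),  # GHP score at its sample maximum
        (0.995, 0),       # liberal district court decision at base value
        (-0.144, 0),      # federal government respondent at base value
        (0.123, 1),       # Democratic appointees on panel at median
        (0.617, 0.588),   # lagged Supreme Court median at its median
    ],
)
print(round(p, 3))
```

Varying one covariate (here, the ideology score) while holding the rest at fixed values traces out the curves plotted in Figure 5.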

Table 1.
Summary Statistics
Variable Obs. Mean SD Min Max Median
Courts of Appeals
Liberal Vote 32,884 0.382 0.486 0 1 0
GHP Scores 32,884 -0.020 0.347 -0.699 0.608 -0.007
PIMP Scores 32,606 0.055 0.253 -0.336 0.409 0.029
Liberal District Court 32,884 0.309 0.462 0 1 0
Federal Gov. Respondent 32,884 0.484 0.500 0 1 0
# Dem. Appointees on Panel 32,884 0.954 0.944 0 10 1
Lagged S.C. Median 32,884 0.386 0.550 -0.969 1.122 0.588
Issue Area 32,884 Freq. Percent
Civil Rights & Liberties 4,902 14.9%
Economic 13,469 41.0%
Criminal 11,537 35.1%
Other (Base Category) 2,976 9.0%

Supreme Court
Liberal Vote 74,789 0.533 0.499 0 1 1
Segal-Cover Scores 74,789 0.547 0.330 0 1 0.5
PIMP Scores 74,789 0.052 0.255 -0.312 0.366 -0.016
Liberal Lower Court 74,789 0.425 0.494 0 1 0
Federal Gov. Respondent 74,789 0.193 0.394 0 1 0
# Liberal Votes 74,789 4.098 2.802 0 8 4
Issue Area 74,789 Freq. Percent
Civil Rights & Liberties 22,622 30.3%
Economic 15,485 20.7%
Criminal 15,741 21.0%
Other (Base Category) 20,941 28.0%

Table 2.
Logit Models of Court of Appeals Voting Behavior
Variables GHP PIMP Categorical
Coef. SE Coef. SE Coef. SE
Ideology
GHP -0.471* 0.039
PIMP -0.601* 0.053
Moderate -0.118* 0.031
Conservative -0.452* 0.035

Control Variables
Issue Area
Civil Rights & Liberties -0.188* 0.051 -0.188* 0.051 -0.187* 0.051
Economic -0.119* 0.044 -0.124* 0.044 -0.118* 0.044
Criminal -0.852* 0.050 -0.850* 0.050 -0.852* 0.050
Liberal District Court 0.995* 0.034 0.990* 0.034 0.998* 0.034
Federal Gov. Respondent -0.144* 0.033 -0.150* 0.035 -0.143* 0.035
× Liberal Lower Court 0.738* 0.058 0.740* 0.058 0.739* 0.058
# Dem. Appointees on Panel 0.123* 0.014 0.121* 0.014 0.126* 0.014
Lagged USSC Median 0.617.. 0.350 0.832* 0.361 0.702* 0.351
Intercept -0.756* 0.167 -0.829* 0.172 -0.603* 0.167

Model Evaluation Statistics


BIC' -4,722.185 -4,638.933 -4,735.784 (13.599)
AIC 1.168 1.169 1.168
AUROC 0.7335 0.7327 0.7340
McFadden’s R2 0.1256 0.1247 0.1261
% Correctly Predicted 70.8% 70.8% 70.9%
Observations 32,884 32,606 32,884
* p < 0.05
Note: Model evaluation statistics in bold type indicate the best performing model by that statistic. The
number in parentheses is the absolute difference between the BIC’ of the model using the categorical
measure and the model with the next lowest BIC’ score (using GHP scores). All models were estimated
with circuit and year fixed effects. Those coefficients are available upon request.
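BIC’ as reported here follows the convention in Long (1997) and Long and Freese (2006): the negative of the likelihood-ratio statistic against the intercept-only null plus a degrees-of-freedom penalty, so more negative values indicate stronger support, and a gap of ten or more between models is conventionally read as very strong evidence. A sketch follows; the log-likelihoods and parameter counts below are invented for illustration.

```python
import math

def bic_prime(ll_model, ll_null, n_params, n_obs):
    """BIC' in the Long & Freese sense: the negative likelihood-ratio
    statistic versus the intercept-only null, plus a penalty of
    (number of parameters) * ln(number of observations)."""
    lr = 2.0 * (ll_model - ll_null)       # likelihood-ratio statistic
    return -lr + n_params * math.log(n_obs)

# Hypothetical log-likelihoods for two competing logit specifications
# fit to the same n = 32,884 votes (numbers invented for illustration):
ll_null = -21900.0
bic_a = bic_prime(ll_model=-19600.0, ll_null=ll_null, n_params=30, n_obs=32884)
bic_b = bic_prime(ll_model=-19610.0, ll_null=ll_null, n_params=29, n_obs=32884)
print(bic_a, bic_b, abs(bic_a - bic_b))
```

Because both models share the same null and sample, the difference in BIC’ equals the difference in ordinary BIC, so the model with the lower (more negative) BIC’ is preferred.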

Table 3.
Logit Models of Supreme Court Voting Behavior
Variables Segal-Cover PIMP Categorical
Coef. SE Coef. SE Coef. SE
Ideology
Segal-Cover 1.441* 0.029
PIMP -1.146* 0.033
Moderate -0.516* 0.020
Conservative -0.945* 0.023

Control Variables
Issue Area
Civil Rights & Liberties 0.459* 0.021 0.453* 0.021 0.455* 0.021
Economic 0.451* 0.023 0.446* 0.023 0.447* 0.023
Criminal 0.157* 0.023 0.155* 0.023 0.157* 0.023
Liberal Lower Court -0.951* 0.018 -0.931* 0.018 -0.938* 0.018
Federal Gov. Respondent -0.472* 0.025 -0.459* 0.025 -0.466* 0.025
× Liberal Lower Court 1.446* 0.043 1.412* 0.043 1.427* 0.043
Intercept -0.833* 0.058 0.088* 0.054 0.624* 0.055

Model Evaluation Statistics


BIC' -7,623.435 (958.717) -6,193.584 -6,664.718
AIC 1.273 1.292 1.285
AUROC 0.6880 0.6713 0.6782
McFadden’s R2 0.0805 0.0667 0.0713
% Correctly Predicted 64.2% 62.3% 63.7%
Observations 74,789 74,789 74,789
* p < 0.05
Note: Model evaluation statistics in bold type indicate the best performing model by that statistic. The number
in parentheses is the absolute difference between the BIC’ of the model using Segal-Cover scores and the
model with the next lowest BIC’ score (using the categorical measure). All models were estimated with term
fixed effects. Those coefficients are available upon request.

Table 4.
Courts of Appeals Models with Ideology-Issue Area Interactions
Variables GHP PIMP Categorical
Coef. SE Coef. SE Coef. SE
Ideology
GHP -0.500* 0.115
PIMP -0.536* 0.156
Moderate -0.171.. 0.091
Conservative -0.402* 0.105

Ideology-Issue Interactions
Civil Rights & Liberties -0.191* 0.052 -0.158* 0.052 -0.113.. 0.084
× Ideology -0.226.. 0.143 -0.465* 0.198 M: -0.002.. 0.116
C: -0.260* 0.130
Economic -0.112* 0.044 -0.133* 0.044 -0.231* 0.073
× Ideology 0.247* 0.126 0.179.. 0.171 M: 0.188.. 0.100
C: 0.143.. 0.115
Criminal -0.856* 0.050 -0.837* 0.050 -0.752* 0.078
× Ideology -0.157.. 0.133 -0.236.. 0.180 M: -0.106.. 0.105
C: -0.232.. 0.121

Control Variables
Liberal District Court 0.996* 0.034 0.990* 0.034 0.999* 0.034
Federal Gov. Respondent -0.148* 0.035 -0.154* 0.036 -0.147* 0.035
× Liberal Lower Court 0.740* 0.058 0.744* 0.058 0.743* 0.058
# Dem. Appointees on Panel 0.123* 0.014 0.121* 0.014 0.127* 0.014
Lagged USSC Median 0.594.. 0.350 0.807* 0.360 0.673.. 0.351
Intercept -0.751* 0.167 -0.822* 0.172 -0.584* 0.174

Model Evaluation Statistics


BIC' -4,723.647 -4,632.682 -4,712.344 (11.303)
AIC 1.168 1.169 1.167
AUROC 0.7342 0.7331 0.7349
McFadden’s R2 0.1263 0.1253 0.1270
% Correctly Predicted 70.7% 70.78% 70.82%
Observations 32,884 32,606 32,884
* p < 0.05
Note: Model evaluation statistics in bold type indicate the best performing model by that statistic. The
number in parentheses is the absolute difference between the BIC’ of the model using the categorical
measure and the model with the next lowest BIC’ score (using GHP scores). All models were estimated
with circuit and year fixed effects. Those coefficients are available upon request. For the interaction
coefficients of the categorical specification, “M” denotes the interactions between issue areas and the
“moderate” dummy and “C” denotes the interactions between issue areas and the “conservative” dummy.

Table 5.
Supreme Court Models with Ideology-Issue Area Interactions
Variables Segal-Cover PIMP Categorical
Coef. SE Coef. SE Coef. SE
Ideology
Segal-Cover 0.688* 0.047
PIMP -0.566* 0.058
Moderate -0.205* 0.036
Conservative -0.398* 0.040

Ideology-Issue Interactions
Civil Rights & Liberties -0.194* 0.039 0.508* 0.021 1.038* 0.043
× Ideology 1.229* 0.063 -0.924* 0.080 M: -0.581* 0.053
C: -0.955* 0.054
Economic 0.281* 0.045 0.471* 0.023 0.558* 0.044
× Ideology 0.305* 0.069 -0.560* 0.088 M: -0.094.. 0.056
C: -0.221* 0.059
Criminal -0.637* 0.045 0.202* 0.024 0.784* 0.046
× Ideology 1.463* 0.070 -0.861* 0.087 M: -0.649* 0.057
C: -1.032* 0.059

Control Variables
Liberal Lower Court -0.940* 0.018 -0.928* 0.018 -0.931* 0.018
Federal Gov. Respondent -0.503* 0.025 -0.467* 0.025 -0.483* 0.025
× Liberal Lower Court 1.467* 0.043 1.419* 0.043 1.438* 0.043
Intercept -0.371* 0.062 0.084.. 0.054 0.342* 0.058

Model Evaluation Statistics


BIC' -8,240.629 (1,154.416) -6,321.152 -7,086.213
AIC 1.264 1.290 1.279
AUROC 0.6945 0.6733 0.6845
McFadden’s R2 0.0868 0.0682 0.0761
% Correctly Predicted 64.6% 62.9% 64.3%
Observations 74,789 74,789 74,789
* p < 0.05
Note: Model evaluation statistics in bold type indicate the best performing model by that statistic. The number in
parentheses is the absolute difference between the BIC’ of the model using Segal-Cover scores and the model with
the next lowest BIC’ score (using the categorical measure). All models were estimated with term fixed effects.
Those coefficients are available upon request. For the interaction coefficients of the categorical specification, “M”
denotes the interactions between issue areas and the “moderate” dummy and “C” denotes the interactions between
issue areas and the “conservative” dummy.
