Professional Documents
Culture Documents
Mckelvey 1975
Mckelvey 1975
To cite this article: Richard D. McKelvey & William Zavoina (1975) A statistical model for the analysis of ordinal level
dependent variables, The Journal of Mathematical Sociology, 4:1, 103-120, DOI: 10.1080/0022250X.1975.9989847
Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the
publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations
or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any
opinions and views expressed in this publication are the opinions and views of the authors, and are not the
views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be
independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses,
actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever
caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.
This article may be used for research, teaching, and private study purposes. Any substantial or systematic
reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any
form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://
www.tandfonline.com/page/terms-and-conditions
Journal of Mathematical Sociology © Gordon and Breach Science Publishers
1975, Vol. 4, pp 103-120 Printed in Birkenhead, England
RICHARD D. McKELVEY
University of Rochester
Downloaded by [University of Illinois at Urbana-Champaign] at 21:12 08 August 2013
and
WILLIAM ZAVOINA
Florida State University
This paper develops a model, with assumptions similar to those of the linear model, for use when
the observed dependent variable is ordinal. This model is an extension of the dichotomous probit
model, and assumes that the ordinal nature of the observed dependent variable is due to methodo-
logical limitations in collecting the data, which force the researcher to lump together and identify
various portions of an (otherwise) interval level variable. The model assumes a linear eflect of each
independent variable as well as a series of break points between categories for the dependent variable.
Maximum likelihood estimators are found for these parameters, along with their asymptotic
sampling distributions, and an analogue of R2 (the coefficient of determination in regression
analysis) is defined to measure goodness of fit. The use of the model is illustrated with an analysis
of Congressional voting on the 1965 Medicare Bill.
1. INTRODUCTION
The assumptions underlying the multivariate linear model require interval level
measurement of the dependent variable. Because of this, the linear model is not
appropriate for many social science applications. In general, even if the dependent
variable of theoretical interest is appropriately conceptualized as interval level,
measurement theory in the social sciences is simply not refined enough to generate
an interval level operationalization of this variable. The best that can be hoped for,
in most cases, is a rather crude ordinal scale which purports to represent this true
underlying variable.
The specific concern of this paper is the development of a statistical model, similar
to the multivariate linear model, for utilization in situations when the observed
dependent variable is of the above sort. The model investigated here is an ordinal
extension of the dichotomous probit model. This model, originally developed by
biometricians for the study of quantal assay (especially Finney, 1947), has also had
limited application in economics and the social sciences. The probit model has been
extended for polychotomous use by Aitchison and Silvey (1957) for ordinal dependent
variables, and by Aitchison and Bennet (1970) for categorical variables, but these
extensions are only in the context of a 2 variable model, and the assumptions are
somewhat different than those to be used here. A substantial literature has recently
103
104 R. D. MCKELVEY AND W. ZAVOINA
developed treating the same problem through the extension of binary logit analysis
(Grissle, 1971 ; McFadden, 1974; Theil, 1969,1970). However, this literature generally
assumes strictly categorical rather than ordinal data.
The organization of this paper is threefold. First, we illustrate the theoretical
problems associated with the use of regression analysis with ordinal data, and
elaborate on the types of assumptions from which an alternative model might
proceed. Second, we present the maximum likelihood estimates derived from these
assumptions. Finally, using some examples on legislative voting behavior, we
compare the substantive conclusions drawn from regression analysis with those drawn
Downloaded by [University of Illinois at Urbana-Champaign] at 21:12 08 August 2013
from the ordinal probit model. Differences in substantive conclusions can be con-
siderable.
Y=Xß+u, (2.1)
where
i U- h : i
rj \\xln...xKn\
ß 'X'Y (2.3)
FIGURÉ I
not met. Thus, for example, in Figure 1, we have a trichotomous dependent variable,
and the data indicate that there is a definite relationship between the independent
and dependent variables. Yet there does not seem to be any possible linear model
which could have generated the data and maintained an error term with mean zero
and constant variance. The least squares line, Lit has positive errors for small JS^
and negative errors for large Xlt so it will not do. In fact, if E(Y\Xj) is nonlinear,
as drawn, no linear model with E(u) = 0 would fit the data. To account for these
data, one must assume either a nonlinear model or a different error structure.
Actually, regardless of the form of the relation assumed between X1 and Y, a
different error structure seems indicated, as it will generally be the case that the
error term does not have constant variance. It is clear that for these particular data,
the variance of the error term is at a minimum at the extreme values of X\, and at
a maximum for moderate values, f
Given the above difficulties, an alternative model is clearly appropriate. In this
paper, it is assumed that the above problems are due to incomplete data on the
dependent variable. A distinction is made between the dependent variable of
theoretical interest, Y, and the observed dependent variable, Z. We assume that
the variable of theoretical interest is interval level and would, if we could measure
it, satisfy a linear model. Due to the inadequate measurement techniques, we only
observe an ordinal version of Y, namely Z, for which the linear model is not satisfied.
More formally, we assume, as above, that the variable of theoretical interest, Y,
satisfies :
Y=Xß+u (2.4)
but we make stronger assumptions on the error term, namely we assume that
(2.5)
†
In the binary case, where Y is a dummy variable taking on the values 0 and 1, if E(Y/X) = p,
the Var (Y|X)=p(1-p).
106 R. D. MCKELVEY AND W. ZAVOINA
I.e., u is multívariate normal with expectation 0 and variance covariance matrix (T2/.
Further, we assume that Z is a categorical variable with M response categories
Ru..., RM, which arises from the unobserved variable, Y, as follows: We assume
there are M +1 extended real numbers, /x 0 ,¡x u ...,y. M> with ¡i0 = — oo, [iM = + oo
and with ¡i0 < pt < . . . s ftu, such that
farlgjzM.
Since Z is ordinal, it can also be represented as a series of dummy variables. In
Downloaded by [University of Illinois at Urbana-Champaign] at 21:12 08 August 2013
Puffing together the assumptions in (2.4), (2.5), and (2.6), we can write directly
the probability function of the observed dependent variable, Z, as follows: For
any 1 < / s M, and 1 < j < n, we have, from (2.4) and (2.6)
É ft3Ttf
2 (2.7)
i-i- Z
(-0
= 1] = Pr[Zj (2.8)
where <5>(t) represents the cumulative standard normal density function. I.e.,
f (2.9)
generality, that /¿j = 0 and a = 1. We get, then, the model which we will investigate:
L = L(Z\ß0 ßnPzi-'^Pit-u
L* = log L
The log likelihood function, L*, then, is a function of (ß0 ßK, \i2,..., /¿M_ t ) and
we wish to locate a maximum to L* subject to the constraint that fit < (i2 ^ • • • ^ Í^M- I •
This is simply a constrained maximization problem over a convex set, which we
proceed to solve.
Letting NJJc = —j~- e~rlk'2 for 1 ^ k < M and 1 <j <. n, and letting htJ be the
Kroneker delta, i.e.,
108 R. D. MCKELVEY AND W. ZAVOINA
it follows that
N
J* \j* YJ = -Nj.kXuJ for 0 < u < K
(3.6)
^ ^ . . = - Yjjfjjfi.* for 2 < u £ M - 1
Equations (3.5) and (3.6) can be used to calculate the Q partial derivatives of (3.4)
with respect to the unknown parameters, as well as the QxQ matrix of second
partials. We get, for the first partial derivatives:
eL
* v vz
for 0 < « <: ÍT, and (3.7)
z
for2<u<M-l.
Further, the second partials are:
S2L*
L
ORDINAL LEVEL DEPENDENT VARIABLES 109
To obtain the MLE, then, the Q equations in (3.7) are set to zero and solved for
the Q unknowns, f In order to insure that the solution is a maximum, the matrix
of second partíais, whose entries are given by (3^8), evaluated at the solution point
should be negative definite. Unfortunately, the equations generated by (3.7) are not
linear in the unknowns, so they cannot be solved by ordinary methods for simul-
taneous linear equations. Rather, an iterative method of solution, the Newton
Raphson method for Q dimensions is used.J One of the authors has written a
computer program, hereinafter referred to as NPROBIT, which uses the Newton
Raphson algorithm to iterate towards a root of the above equations. § Results from
the use of this program are presented in the final section of this paper.
Downloaded by [University of Illinois at Urbana-Champaign] at 21:12 08 August 2013
There are several difficulties with the above procedure. First, there is no guarantee
of convergence of the iterative procedure. In practice, this has not been a problem,
the procedure has always converged fairly rapidly, with the convergence taking
between four and ten iterations, depending on the number of parameters to be
estimated.
A more important problem, which is common to such nonlinear problems, is that
there is no guarantee that the process converges to a global, as opposed to a relative
maximum, since there may be multiple roots to the equations generated by (3.7).
A proof that the matrix of second partíais is everywhere negative definite, or a
proof that L* is a quasi concave function would eliminate the possibility of multiple
roots. But, in the absence of such proofs, the possibility of convergence to a relative
maximum cannot be excluded.
Testing Hypotheses
One of the advantages of using maximum likelihood estimates is the nice statistical
properties which the MLE has. Under fairly general conditions the estimates are
consistent and asymptotically efficient, and their asymptotic sampling distribution
is known.U Also, hypotheses can be tested either by the use of this sampling
distribution or by the use of the likelihood ratio.
If V is the QxQ matrix of second partíais of the log likelihood function evaluated
at the MLE ê, and if V~l is the inverse of this matrix, then the MLE, Ô has an
asymptotic distribution v/hich is multivariate normal with means equal to the true
values of the parameters, and variance-covariance matrix estimated by — V~l.
† If the maximum lies on the boundary of the constraint set, then Lagrangian multipliers must
be introduced and these equations solved. In practice, as long as all categories of the dependent
variable have observations in them, the maxima seem to naturally occur in the interior of the
constraint set, so we have ignored the constraint. There do appear to be local maxima outside of
the constraint set, however. So one must be careful to keep the iterative procedure from crossing
into this region. We are grateful to Dr. Fritz Bedall for bringing to our attention some examples
illustrating this possibility.
‡ See, e.g., Draper and Smith (1966: 267-278), or Hildebrand (1974: 584 ff) for a description of
the Newton Raphson method.
§ Copies of this program, along with documentation and test data can be obtained from Richard
McKelvey, Political Science Department, University of Rochester, Rochester, N.Y. 14627. For a
brief description of the program, see McKelvey and Zavoina (1971).
¶ See, e.g., Goldberger (1964:131), or Wilks (1962: 358 ff).
110 R. D. MCKELVEY AND W. ZAVOINA
Thus, in large samples ß, will be normal with mean ß, and standard error
where — VfJ is the ij^1 entry of the variance covariance matrix. For any j8 (0 the
statistic
R-R. -
(3.10)
Downloaded by [University of Illinois at Urbana-Champaign] at 21:12 08 August 2013
To test the overall significance of the independent variables, one would let Ô' be
the MLE when ßi,...,ßK a re constrained to be 0. If rk denotes the total number
of observations with Z e Rk, i.e., if rk = £ Z M , then one would find Ö' to be
J=i
the point such that
ßo) = £ (3.13)
for 1 < k <, M
and the probability of the sample at this point, from (3.3) and (3.5) is
L=nir) (3-I4>
so
M
rk log rk-n log n. (3.15)
Ô is the MLE when all the variables are in the equation, so L*0) is simply the log
likelihood function evaluated at this point. Hence the likelihood ratio test for the
overall significance of the independent variables becomes
Goodness of Fit
There are several statistics which can be used to measure the overall fit of the model.
The most useful of these is the estimated R2, which gives an estimate of the R2 of
the underlying regression model. This is equivalent to the R2, or coefficient of
determination in regression analysis and has a similar interpretation, namely, it
measures the portion of the original variance of the dependent variable explained
by the probit analysis.
Although there is no way of knowing the variance of the dependent variable on
its underlying interval scale, it can be estimated as follows: the model assumes that
the dependent variable on its underlying interval scale satisfies a regression model.
Moreover, the dependent variable is normalized so that the variance around the regres-
sion line, a2, is unity. If ß=(ß0,. • • ,ßK) is the MLE for ß, we define t0 = ¿ ß,X,j,
i=1
and e¡ = {Y¡- tj). Then, since
§2 = n a2 = n. (3.18)
Next, the explained sum of squares S% can be computed using ordinary procedures
tobe
(3.19)
i= l
where 7 = ¿ ÎJn.
Thus, we can estimate the total sum of squares by summing (3.18) and (3.19) to
obtain
= Ê Pf+n. (3.20)
112 R. D. MCKELVEY AND W. ZAVO1NA
2
It follows that R can be computed, in the usual manner, to be
Ç2 Lu
A2=-£== '-'
The computer program we have written calculates A2, and as mentioned above,
this statistic can be interpreted much the same as the corresponding R2 for regression
Downloaded by [University of Illinois at Urbana-Champaign] at 21:12 08 August 2013
% = kj (3.23)
where k¡ is that value 1 < kj £ M which maximizes P^j. Both Z¡ and Z¡ are
ordinal variables, and hence any of a number of ordinal measures of strength of
association can be used to determine the strength of association between predicted
ORDINAL LEVEL DEPENDENT VARIABLES 113
and actual values of the dependent variables. NPROBIT displays both Z¡ and Z¡
(as well as each PkJ) for each observation and calculates the Spearman rank order
correlation between the two.
† For a more detailed discussion of the circumstances surrounding the passage of this bill, see
the Congressional Quarterly Almanac (1965: 236 ff).
H
114 R. D. MCKELVEY AND W. ZAVOINA
Variable Measurement
Vote Scale Score on Medicare:
0 if against (i.e., Y on CQ34, N on CQ35)
1 if weakly for (i.e., Y on CQ34, Y on CQ35)
2 if strongly for (i.e., N on CQ34, Y on CQ35)
Variable Measurement^
Party 0 if Democrat
1 if Republican
Region 0 if Non-South
1 if South
Employment Percent civilian unemployed in district
Old Percent over 65
Population Population density in thousands per square mile.
We consider four different models which hypothesize that vote (or the probability
of voting a particular way in the case of probit analysis) is a linear function of
various combinations of the above independent variables. In Model 1, vote is a
function only of the main party groupings (Party and Region). In Model 2, vote is
a function also of the constituency support variables (Old and Employment), and
in Models 3 and 4, we introduce the effect of urbanization (Population) into Models
1 and 2 respectively.
Table 1 represents the results of estimating the four different models, first with
regression analysis and then with probit analysis. One cannot directly compare the
ß coefficients in the two models because they represent different things in each
model. In the regression model, the ß represents the amount of change in the
observed value of the dependent variable which is brought about by a unit change
in the independent variable. Since the coding of the (ordinal level) dependent variable
is arbitrary, this value will depend on the particular coding which is chosen. In the
probit analysis, on the other hand, ß represents the amount of change in the
dependent variable on its (hypothesized) underlying scale which is brought about by
a unit change in the independent variable. (In terms of the observed data, this
translates into the increment in probability of being in a higher response category
brought about by a unit change in the independent variable.) Because of the ordinal
† Data for the dependent variable are from the Congressional Quarterly Almanac (3965). Data
for the independent variables are from the Congressional District Data Book, Districts of the 88th
Congress, and from supplements indicating changes for the 89th Congress.
ORDINAL LEVEL DEPENDENT VARIABLES 115
TABLE 1
Comparison of regression and probit analysis in four models explaining Congressional voting
behavior on 1965 Medicare Bill (n = 427)
•fThe F statistic has an F distribution with K and n—(K+l) d.f. Here K is the number of
independent variables. All the models above are significant at the .01 level.
% The A* statistic is x2 with K d.f. Again, all models are significant at the .01 level.
The results of Table 1 illustrate several points. The first thing to note is the general
tendency, which is borne out in these tables, of the regression analysis to give lower
estimates of fit (i.e., R2) than does the probit analysis. This makes sense in light of
our development above of the n-chotomous model. It is obvious, for example, that
a perfect fit to the probit model would translate, under the categorization of the
dependent variable, into a poor fit to the regression model. Such a possibility is
illustrated in Figure 2. Even when the original fit to the probit model is not perfect,
one would expect the categorization to introduce the same sort of tendency, but we
Ho"
i»m # »mm
1 •
(a)
FIGURE2
would expect the differences to become more extreme the stronger the true relation,
with regression always underestimating the fit. We again stress, however, that one
should avoid ovçrinterpreting the R2 of the n-chotomous probit analysis until its
sampling distribution is known. Thus, although it is clear that the R2 of the
regression is generally an underestimate of the fit of a model which is generated by
the n-chotomous probit assumptions, it is not clear how much better R2 is. For
example, it is possible that R2 may in general overestimate the true R2 of the under-
lying model. More work needs to be done on the properties of R2.
The second thing to note in Table 1 is the strange behavior of the variable
"Population". Thus, before the introduction of this variable (i.e., in Models 1 and 2),
we can see that although there are slight differences, the basic conclusions of the
probit and regression runs are similar. The relative strengths of the independent
ORDINAL LEVEL DEPENDENT VARIABLES 117
Downloaded by [University of Illinois at Urbana-Champaign] at 21:12 08 August 2013
10 20 30 40 50 - 60 70 80 SO 100
Population (in thousand/sq.ml.)
FIGURE 3
variables (as measured by the &*'s) are similar, and the results of the significance
tests on the coefficients are similar. However, with the introduction of the variable
"Population" into the models (in Models 3 and 4), we see that now the probit and
regression models differ more significantly in their conclusions. The most striking
difference can be found in a comparison of the standardized coeflScients (the &*'s).
Here, the regression analysis finds that the original party grouping variables retain
their importance, and population explains a very small portion of the variance. In
contrast, the n-chotomous probit analysis finds that the variable "Population" is
important, explaining a larger portion of the variance than either of the original
party grouping variables. Comparing the results of the significance tests, probit
would retain "Population" as significant at the .05 level, while the regression
analysis would not.
The characteristics of the variable "Population" which cause the different con-
clusions to be drawn by the two models can be illustrated by a look at the univariate
relation between Population and Vote. Thus, in Figure 3, we can see that this
variable has the property that all Congressmen from very densely populated areas
voted for Medicare. They seem all to be very clearly for Medicare, and end up in
category 2, most likely, for lack of a stronger category. But while probit analysis
interprets additional data in the circled region as additional evidence for a stronger
relation (as indeed it seems it should be interpreted), regression interprets such data
as additional evidence for a weaker relation. We thus see in this example an
illustration of the problems discussed from a theoretical point of view in the first
two sections. In other words, the correlation between error and regressor which is
introduced when using regression analysis to estimate the underlying relation causes
a bias which is dependent on the distribution of the independent variable. This bias
has the effect of causing the regression analysis at times to interpret data which are
indicative of a strong relation as demonstrating, instead, a weak relation.
Finally, before concluding, we illustrate the use of n-chotomous probit analysis
for scaling of the dependent variable and for predicting the probability of being in
each response category. In Table 1, we were interested primarily in a comparison of
118 R. D. MCKELVEY AND W. ZAVOINA
TABLE 2
Represents
Parameter Effect of MLE Standard Error z
ßo Constant 2.598 .175 14.82
ßi Party -2.490 .182 13.65
ßi Region -1.899 .181 10.51
Pi CQ35 0.0
V-i CQ34 .948 .088 10.71
Downloaded by [University of Illinois at Urbana-Champaign] at 21:12 08 August 2013
L* = -285.67
A* = 286.83 (x1 with 2 df.)
probit to regression analysis, so we have only reported the estimated /S's. In addition,
of course, the n-chotomous probit analysis estimates the cutting points of the scale,
or the ¡ik's as they have been labeled in the formal development.
We illustrate the above for Model 1. Thus, in Table 2, we give the complete
estimation of Model 1. It follows that 0 and .96 are the estimated positions of the
two stimuli (i.e., CQ35 and CQ34 respectively) on the underlying scale. (Note that
since there are only two stimuli, the normalization procedure automatically sets one
of these to 0.) It follows that respondents falling below a particular stimulus respond
negatively (i.e., against Medicare) to that stimulus while respondents falling above
the stimulus respond positively to it. Thus, respondents falling below /xt = 0 fall in
response category 0, and respond negatively to both stimuli. Thus, they vote for
recommittal and against passage. Those respondents falling between /^i = 0 and
/i 2 = .96 fall in response category 1 and respond positively to CQ35 and negatively
to CQ34. Thus, they vote for recommittal and for passage. Finally, those above
fi2 — .96 respond positively to both stimuli. They vote against recommittal and for
passage.
Further, by substituting into the probit equation, we can find the expectation of
any respondent on this underlying continuum and the estimated probability of his
ND
FIGURE 4
ORDINAL LEVEL DEPENDENT VARIABLES 119
falling in any of the response categories. Thus, for the four main party and regional
groupings, we have:
It follows that if the estimated model is the true model, one can compute the
theoretical probabilities of being in each category using equation (3.3), i.e., by
assuming that respondents are distributed around their expected score with unit
variance. Thus, in Figure 4, and Table 3, we have illustrated this procedure, and
have compared the predicted to the observed frequencies.
TABLE 3
(1) (57)
(35) (16)
0 .6 46.3
31.8 76.2 .5 24.3 46.1 96.4
(3) (50)
(31) (5)
1 1.7 28.2
40.7 23.8 4.5 35.5 34.1 3.1
(169) (44)
(16) (0)
2 97.7 40.0
13.0 0.0 95.1 40.2 19.8 0.3
(173) (110) (123) (21)
TOTALS 100% 100% 100% 100% 100% 100% 100% 100%
i The entries represent percentages of column totals. Figures in parentheses are the total in the
cell. Columns may not add to 100% due to rounding error.
CONCLUSION
We have argued above that regression analysis is an inappropriate technique to
apply to ordinal level dependent variables. Regression is inadequate not because of
the failure to model the true relation. Rather, we assume that the true relation ¿y
described by a linear model, and that the failure of the regression model to describe
the observed data is due to the inherent loss of information that is introduced when
the continuous dependent variable is measured by gross techniques which lump
together and identify various portions of the scale. The net effect is that if this is in
fact the process by which the data have been generated, there is a correlation between
error and regressor when regression is applied to the observed data. Consequently,
a bias is introduced into the estimate of ß which is dependent on the distribution of
the independent variable. This bias may, in some cases, have the undesirable effect
of causing regression analysis to severely underestimate the relative impact of certain
variables.
120 R. D. MCKELVEY AND W. ZAVOINA
REFERENCES
Hildebrand, F. B. (1974) Introduction to Numerical Analysis, 2nd edition. New York: McGraw Hill.
McFadden, D. (1974) Analysis of Qualitative Choice Behavior, in Frontiers of Econometrics,
P. Zaremba, ed. New York: Academic Press.
McKelvey, R. and W. Zavoina. An IBM Fortran IV Program to Perform N-Chotomous Multi-
variate Probit Analysis. Behavioral Science 16 (March, 1971) pp. 186-187.
Theil, H. (1969) A Multinominal Extension of the Linear Logit Model. International Economic
Review 10, 251-259.
Theil, H. (1970) On the Estimation of Relationships Involving Qualitative Variables. American
Journal of Sociology 76, 103-154.
Wilks, S. S. (1962) Mathematical Statistics. New York: John Wiley and Sons.