
Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It
Author(s): Carina Mood
Source: European Sociological Review, Vol. 26, No. 1 (February 2010), pp. 67-82
Published by: Oxford University Press

Stable URL: https://www.jstor.org/stable/40602478

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms

Oxford University Press is collaborating with JSTOR to digitize, preserve and extend access to
European Sociological Review

This content downloaded from


203.110.242.40 on Sat, 16 Mar 2024 06:56:26 +00:00
All use subject to https://about.jstor.org/terms
European Sociological Review volume 26 | number 1 | 2010 | 67-82
DOI:10.1093/esr/jcp006, available online at www.esr.oxfordjournals.org
Online publication 9 March 2009

Logistic Regression: Why We Cannot Do What We Think We Can Do, and What We Can Do About It

Carina Mood

Logistic regression estimates do not behave like linear regression estimates in one
important respect: They are affected by omitted variables, even when these variables are
unrelated to the independent variables in the model. This fact has important implications
that have gone largely unnoticed by sociologists. Importantly, we cannot straightforwardly
interpret log-odds ratios or odds ratios as effect measures, because they also reflect
the degree of unobserved heterogeneity in the model. In addition, we cannot compare
log-odds ratios or odds ratios for similar models across groups, samples, or time points,
or across models with different independent variables in a sample. This article discusses
these problems and possible ways of overcoming them.

Introduction

The use of logistic regression is routine in the social sciences when studying outcomes that are naturally or necessarily represented by binary variables. Examples are many in stratification research (educational transitions, promotion), demographic research (divorce, childbirth, nest-leaving), social medicine (diagnosis, mortality), research into social exclusion (unemployment, benefit take-up), and research about political behaviour (voting, participation in collective action). When facing a dichotomous dependent variable, sociologists almost automatically turn to logistic regression, and this practice is generally recommended in textbooks in quantitative methodology. However, our common ways of interpreting results from logistic regression have some important problems.1

The problems stem from unobservables, or the fact that we can seldom include in a model all variables that affect an outcome. Unobserved heterogeneity is the variation in the dependent variable that is caused by variables that are not observed (i.e. omitted variables).2 Many sociologists are familiar with the problems of bias in effect estimates that arise if omitted variables are correlated with the observed independent variables, as is the case in ordinary least squares (OLS) regression. However, few recognize that in logistic regression omitted variables affect coefficients also through another mechanism, which operates regardless of whether omitted variables are correlated to the independent variables or not. This article sheds light on the problem of unobserved heterogeneity in logistic regression and highlights three important but overlooked consequences:

(i) It is problematic to interpret log-odds ratios (LnOR) or odds ratios (OR) as substantive effects, because they also reflect unobserved heterogeneity.
© The Author 2009. Published by Oxford University Press. All rights reserved.
For permissions, please e-mail: journals.permissions@oxfordjournals.org

(ii) It is problematic to compare LnOR or OR across models with different independent variables, because the unobserved heterogeneity is likely to vary across models.
(iii) It is problematic to compare LnOR or OR across samples, across groups within samples, or over time - even when we use models with the same independent variables - because the unobserved heterogeneity can vary across the compared samples, groups, or points in time.

Though some alarms have been raised about these problems (Allison, 1999), these insights have not penetrated our research practice, and the interrelations between the problems have not been fully appreciated. Moreover, these problems are ignored or even misreported in commonly used methodology books. After introducing and exemplifying the behaviour of logistic regression effect estimates in light of unobserved heterogeneity, I outline these three problems. Then, I describe and discuss possible strategies for overcoming them, and I conclude with some recommendations for ordinary users.

The Root of the Problems: Unobserved Heterogeneity

In this section, I demonstrate the logic behind the effect of unobserved heterogeneity on logistic regression coefficients. I describe the problem in terms of an underlying latent variable, and I use a simple example with one dependent variable (y) and two independent variables (x1 and x2) to show that the LnOR or OR of x1 will be affected in two different ways by excluding x2 from the model: First, it will be biased upwards or downwards by a factor determined by (a) the correlation between x1 and x2 and (b) the correlation between x2 and y when controlling for x1. Second, it will be biased downwards by a factor determined by the difference in the residual variance between the model including x2 and the model excluding it. For readers less comfortable with algebra, verbal and graphical explanations of the problem and extensive examples are given in the following sections.

One can think of logistic regression as a way of modelling the dichotomous outcome (y) as the observed effect of an unobserved propensity (or latent variable) (y*), so that when y* > 0, y = 1, and when y* < 0, y = 0. The latent variable is in turn linearly related to the independent variables in the model (Long, 1997, section 3.2). For example, if our observed binary outcome is participation in some collective action (yes/no), we can imagine that among those who participate, there are some who are highly enthusiastic about doing so while others are less enthusiastic, and that among those not participating there are some who can easily tip over to participation and some who will never participate. Thus, the latent variable can in this case be perceived as a 'taste for participation' that underlies the choice of participation.

For simplicity, we assume only one independent variable. The latent variable model can then be written as:

y*_i = α + x1i β1 + ε_i    (1)

where y*_i is the unobserved individual propensity, x1i is the independent variable observed for individual i, α and β1 are parameters, and the errors ε_i are unobserved but assumed to be independent of x1i. To estimate this model, we must also assume a certain distribution of ε_i, and in logistic regression we assume a standard logistic distribution with a fixed variance of 3.29. This distribution is convenient because it results in predictions of the logit, which can be interpreted as the natural logarithm of the odds of having y = 1 versus y = 0. Thus, logistic regression assumes the logit to be linearly related to the independent variables:

ln[P / (1 − P)] = α + x1i b1    (2)

where P is the probability that y = 1. For purposes of interpretation, the logit may easily be transformed to odds or probabilities. The odds that y_i = 1 is obtained by exp(logit), and the probability by exp(logit) / [1 + exp(logit)]. The logit can vary from −∞ to +∞, but it always translates to probabilities above 0 and below 1. Transforming the logit reveals that we assume that y_i = 1 if the odds is above 1 or, which makes intuitive sense, the probability is above 0.5.

Results from logistic regression are commonly presented in terms of log-odds ratios (LnOR) or odds ratios (OR). Log-odds ratios correspond to b1 in Equation 2, and odds ratios are obtained by e^b1. LnOR gives the additive effect on the logit, while OR gives the multiplicative effect on the odds of y = 1 over y = 0. In other words, LnOR tells us how much the logit increases if x1 increases by one unit, and OR tells us how many times higher the odds of y = 1 is if x1 increases by one unit. While the effect on probabilities depends on the values of other independent variables in a model, LnOR and OR hold for any values of the other independent variables.

The total variance in y* in Equation 1 consists of explained and unexplained variance. When we use Equation 2 to estimate this underlying latent variable model we force the unexplained (residual) part of the variance to be fixed.
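The transformations between logit, odds, and probability described above are one-liners in code (a minimal illustration; the function names are my own, not from the article):

```python
import math

def logit_to_odds(logit):
    # odds = exp(logit)
    return math.exp(logit)

def logit_to_prob(logit):
    # P = exp(logit) / (1 + exp(logit))
    return math.exp(logit) / (1.0 + math.exp(logit))

# A logit of 0 corresponds to odds of 1 and a probability of 0.5,
# the threshold at which y = 1 is assumed in the latent-variable view.
print(logit_to_odds(0.0))  # 1.0
print(logit_to_prob(0.0))  # 0.5

# The assumed residual variance of the standard logistic
# distribution is pi^2 / 3, i.e. approximately 3.29.
print(round(math.pi ** 2 / 3, 2))  # 3.29
```

Any logit, however extreme, maps to odds above 0 and a probability strictly between 0 and 1, which is the sense in which the logit "always translates" as stated above.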

This means that any increase in the explained variance forces the total variance of the dependent variable, and hence its scale, to increase. When the scale of the dependent variable increases, b1 must also increase since it now expresses the change in the dependent variable in another metric. So because the residual variance is fixed, the coefficients in (2) estimate the effect on the dependent variable on a scale that is not fixed, but depends on the degree of unobserved heterogeneity. This means that the size of b1 reflects not only the effect of x1 but also the degree of unobserved heterogeneity in the model. This can be seen if we rewrite Equation 1 in the following way:

y*_i = α + x1i β1 + σε_i    (3)

where everything is as in Equation 1, except for the fact that the variance of ε_i is now fixed and the factor σ adjusts ε_i to reflect its true variance. σ is the ratio of the true standard deviation of the errors to the assumed standard deviation of the errors. Because we cannot observe σ, and because we force ε_i to have a fixed variance, b1 in the logistic regression model (Equation 2) estimates β1/σ and not β1 (cf. Gail, Wieand and Piantadosi, 1984; Wooldridge, 2002: pp. 470-472).3 In other words, we standardise the true coefficients β1 so that the residuals can have the variance of the standard logistic distribution (3.29).

To further clarify the consequences of unobserved heterogeneity in logistic regression, consider omitting the variable x2 when the true underlying model is

y*_i = α + x1i β1 + x2i β2 + σε_i    (4)

where ε_i is logistically distributed with a true variance of 3.29 (which means that σ = 1 in this case), and the relation between x2 and x1 is

x2i = γ0 + γ1 x1i + ν_i    (5)

where γ0 and γ1 are parameters to be estimated and ν_i is the error term (which is uncorrelated to ε_i in Equation 4). The omission of x2 from Equation 4 leads to two problems, one that is familiar from the linear regression case and one that is not. First, just as in linear regression, the effect of x1 is confounded with the effect of x2, so that when (5) is substituted into (4), the effect of x1 becomes β1 + β2γ1. That is, to the degree that x1 and x2 are correlated, β1 captures the effect of x2.

The second problem, which does not occur in linear regression, is that the residual variance increases. In Equation 4, σ = √3.29/√3.29, that is, the assumed variance equals the true variance and β1/σ is just β1. When omitting x2, the true residual variance becomes var(ε) + β2²var(ν), and as a consequence σ changes to √(3.29 + β2²var(ν))/√3.29.

Taken together, these two problems imply that if we exclude x2 from Equation 4, instead of β1 we estimate:4

b1 = (β1 + β2γ1) · √3.29 / √(3.29 + β2²var(ν))    (6)

If x1 and x2 are uncorrelated, Equation 6 collapses to

b1 = β1 · √3.29 / √(3.29 + β2²var(x2))    (7)

Therefore, the size of the unobserved heterogeneity depends on the variances of omitted variables and their effects on y, and LnOR and OR from logistic regression are affected by unobserved heterogeneity even when it is unrelated to the included independent variables. This is a fact that is very often overlooked. For example, one of the standard sociological references on logistic regression, Menard (1995), states that 'Omitting relevant variables from the equation in logistic regression results in biased coefficients for the independent variables, to the extent that the omitted variable is correlated with the included variables', and goes on to say that the direction and size of bias follows the same rules as in linear regression. This is clearly misleading, as the coefficients for the independent variables will as a matter of fact change when including other variables that are correlated with the dependent variable, even when these are unrelated to the original independent variables.

An Example Using Probabilities

The previous section has explained how logistic regression coefficients depend on unobserved heterogeneity. The logic behind this is perhaps most easily grasped when we express it from the perspective of the estimated probabilities of y = 1. Imagine that we are interested in the effect of x1 on y and that y is also affected by a dummy variable x2 defining two equal-sized groups. For concreteness, assume that x1 is intelligence (as measured by some IQ test), y is the transition to university studies, and x2 is sex (boy/girl). I construct a set of artificial data5 where (a) IQ is normally distributed and strongly related to the transition to university, (b) girls are much more likely to enter university than are boys, and (c) sex and IQ are uncorrelated (all these relations are realities in many developed countries).
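The attenuation factor in Equation 7 can be computed directly (a small sketch; the function name is my own). With β2 set to 1 and 2 it essentially reproduces the last row of Table 2 below, with small differences due to rounding of 3.29:

```python
import math

LOGISTIC_VAR = math.pi ** 2 / 3  # variance of the standard logistic distribution, ~3.29

def attenuated_lnor(beta1, beta2, var_x2=1.0):
    """Expected LnOR for x1 when an uncorrelated x2 with effect beta2
    is omitted from the model (Equation 7)."""
    return beta1 * math.sqrt(LOGISTIC_VAR / (LOGISTIC_VAR + beta2 ** 2 * var_x2))

# With a true beta1 of 1, omitting an uncorrelated x2 shrinks the
# estimated LnOR towards zero as beta2 grows:
for beta2 in (0.5, 1.0, 2.0):
    print(round(attenuated_lnor(1.0, beta2), 2))
```

Note that the shrinkage depends on β2 and var(x2) only through their product β2²var(x2), which is why the variance of the omitted variable matters as much as its effect.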

I estimate the following logistic models and the results are shown in Table 1:

Model 1: y*_i = α + x1i β1 + ε_i
Model 2: y*_i = α + x1i β1 + x2i β2 + ε_i

In Table 1, we can see that the LnOR and OR for IQ increase when we control for sex - even though IQ and sex are not correlated. This happens because we can explain the transition to university better with both IQ and sex than with only IQ, so in Model 2 the unobserved heterogeneity is less than in Model 1. The stronger the correlation between sex and the transition to university, the larger is the difference in the coefficient for IQ between Models 1 and 2.

Table 1 Logistic regression estimates

              Model 1           Model 2
           LnOR     OR       LnOR     OR
IQ         0.80    2.24      0.99    2.69
Sex                          2.00    7.36
Constant  -0.01             -1.01

The estimated odds of y = 1 versus y = 0 can be translated to probabilities (P) through the formula P(y = 1) = odds/(1 + odds). Figure 1 shows predicted probabilities from the models in Table 1, at different values of IQ, as the following curves: (1) the predicted transition probability from Model 1; (2) the predicted transition probability from Model 2, holding sex constant at its mean; (3) and (4) the predicted transition probabilities from Model 2 for boys and girls, respectively; and (5) the average for small intervals of IQ of the predicted transition probabilities for boys and girls.

In Figure 1, we can see that the curves are quite different. In addition, even though there is no interaction term in the model, the predicted transition probabilities at different values of IQ differ between boys and girls. The reason that curves (3) and (4) differ is that boys and girls have different average transition probabilities (i.e. different intercepts). Importantly, curves (2), (3), and (4) are parallel in the sense that for any point on the probability scale they have the same slope, which means that they represent the same OR and LnOR (recall that the LnOR and OR are constant over values of the independent variables even though the effect on the transition probability is not). However, curve (1) is more stretched out, and it represents a smaller OR and LnOR.

Figure 1 Predicted probabilities from Model 1 and Model 2
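The stretching of the population-averaged curve can be checked numerically from the Table 1 estimates (a minimal sketch; the variable and function names are my own):

```python
import math

def prob(logit):
    # logistic transformation: P = exp(logit) / (1 + exp(logit))
    return math.exp(logit) / (1.0 + math.exp(logit))

# Model 2 estimates from Table 1: constant -1.01, IQ 0.99, sex 2.00.
const, b_iq, b_sex = -1.01, 0.99, 2.00

def avg_prob(iq):
    # average the predicted probabilities for boys (sex = 0) and girls (sex = 1)
    return 0.5 * (prob(const + b_iq * iq) + prob(const + b_iq * iq + b_sex))

# Implied LnOR of the population-averaged curve between IQ = 0 and IQ = 1:
lo0 = math.log(avg_prob(0) / (1 - avg_prob(0)))
lo1 = math.log(avg_prob(1) / (1 - avg_prob(1)))
implied_lnor = lo1 - lo0
print(round(implied_lnor, 2))  # close to Model 1's 0.80, well below Model 2's 0.99
```

Averaging the boys' and girls' probability curves and reading off the slope in log-odds recovers roughly the Model 1 coefficient, which is exactly the mechanism the text describes: the averaged curve carries a smaller OR than any individual's curve.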

Recall that both Model 1 and Model 2 are estimated using the same observed combinations of IQ and transitions. Even though boys and girls have different transition probabilities at different values of IQ, the average predicted transition probability among all students at different values of IQ should be approximately the same in Model 1 as in Model 2 (since IQ and sex are uncorrelated). This is confirmed by curve (5), which is the average for small intervals of IQ of the predicted transition probabilities that are described by curves (3) and (4).6 So curve (1) approximately represents the average of curves (3) and (4) on the probability scale. However, the curve describing the average of the probabilities for boys and girls does not represent the average of the OR for boys and girls [curves (3) and (4)]. Curve (2), on the other hand, correctly represents the average of the OR of curves (3) and (4) (or rather, it represents the same OR as these curves). So Model 1 and curve (1) estimate transition probabilities at different values of IQ and the corresponding OR for the population, but for any single individual in the population the effects of IQ on the transition probability are better captured by Model 2 and curves (3) and (4). If expressed on the probability scale, the average of these curves [as represented by curve (1)] makes intuitive sense as an average in that it underestimates the transition probability for girls and overestimates it for boys. However, because curve (1) is more stretched out than curves (3) and (4), its OR is smaller, and this OR will therefore underestimate the OR for all individuals, i.e. for girls as well as for boys.7

Simulation of the Impact of Unobserved Heterogeneity

A fundamental question is how large effects we can expect from a given level of unobserved heterogeneity on the estimates in logistic regression. In order to show the impact of unobserved heterogeneity, I carry out 1,000 simulations of a sequence of logistic regressions. Each simulation constructs an artificial dataset (n = 10,000) where variables x1 and x2 are continuous variables drawn from normal distributions (command drawnorm in Stata 10.0). Both have a mean of 0 and a standard deviation of 1, and they are uncorrelated. y* is determined by Equation 4, where α is invariantly set to 0, β1 to 1, and ε_i follows a logistic distribution with a variance of 3.29. The size of the unobserved heterogeneity is varied by setting β2 (the LnOR for x2) to 0.5, 1.0, or 2.0, which are within the range of commonly observed effect sizes. Take note, however, that in this simulation, the unobserved heterogeneity is represented by a single variable, while in most real cases it will consist of a range of variables.

Table 2 Average estimate of β1 for different values of β2 and calculated estimate of β1

                  β2 = 0.5    β2 = 1    β2 = 2
β1, Model 1         0.95       0.84      0.61
                   (0.03)     (0.03)    (0.03)
β1, Model 2         1.00       1.00      1.00
                   (0.03)     (0.03)    (0.03)
β1, Equation 7      0.97       0.88      0.67

Standard deviations in parentheses. 1,000 replications.

Table 2 reports the average estimate of β1 from these simulations, where Model 1 is the model excluding x2 and Model 2 is the true model including x2. The last row gives the attenuated estimate of β1 as calculated by Equation 7.

We can see in Table 2 that the estimate of β1 is clearly biased towards zero when not accounting for x2, and this bias increases with the size of β2. The table demonstrates that bias occurs already at relatively modest, and common, effect sizes and may become a considerable obstacle to interpretation of effects when important predictors are left out of the model. Recall, however, that the bias actually depends not only on β2 but also on the variance in x2 (which is invariantly set to one in this example). Table 2 also reveals that β1 in this example is even more biased than Equation 7 would suggest. This is in line with the results in Cramer (2003), and the discrepancy can be explained by the misspecification of the shape of the error distribution in Model 1 (cf. Note 6).

Interpretation of Log-Odds Ratios and Odds Ratios as Effect Estimates

In Figure 1, we could see that curves (2), (3), and (4) had a common slope (i.e. their OR and LnOR were the same), but that curve (1) looked different.
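The simulation described in the previous section was run in Stata 10.0; a pure-Python sketch of a single replication with β2 = 2 might look as follows. The `solve` and `fit_logit` helpers are my own hand-rolled Newton-Raphson fitter, not the article's code:

```python
import math
import random

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [bv] for row, bv in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_logit(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.
    X: rows of features, first entry 1.0 for the intercept."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += w * row[a] * row[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return beta

# One replication with beta1 = 1 and beta2 = 2 (cf. Table 2).
random.seed(1)
n, beta1, beta2 = 10000, 1.0, 2.0
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
# Logistic errors with variance pi^2/3 (~3.29), drawn by inverse CDF.
eps = [math.log(u / (1.0 - u)) for u in (random.random() for _ in range(n))]
y = [1 if beta1 * a + beta2 * b + e > 0 else 0
     for a, b, e in zip(x1, x2, eps)]

b_model1 = fit_logit([[1.0, a] for a in x1], y)                 # x2 omitted
b_model2 = fit_logit([[1.0, a, b] for a, b in zip(x1, x2)], y)  # true model
print(round(b_model1[1], 2), round(b_model2[1], 2))
```

With x2 omitted, the coefficient on x1 comes out well below the true value of 1, in line with the 0.61 reported in Table 2, while the full model recovers it; averaging over 1,000 such replications gives the table's figures.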

That is, the LnOR and OR describing the relation between x1 and P(y = 1) with x2 set at fixed values [curves (2), (3), and (4)] are different from the LnOR and OR for the same relation averaged over the distribution of x2 [curve (1)]. None of these curves is inherently 'wrong', but they estimate different underlying quantities. Curve (1) represents the population-averaged effect of x1 on P(y = 1), while curves (2), (3), and (4) represent the effect of x1 on P(y = 1) conditional on having a certain value on x2. Hence, in the logistic regression without x2 we obtain the LnOR or OR corresponding to a population-averaged probability curve, while the logistic regression with x2 gives us the LnOR or OR corresponding to a conditional probability curve.

We can think of the conditional estimates as moving from the aggregate level in the direction of the individual-level effects, i.e. the effects that would occur for individuals upon a change in the independent variable. In terms of the above example, each individual must be either a boy or a girl, and hence the OR or LnOR for IQ conditional on sex comes closer to the individual-level effect than the OR or LnOR for IQ from the bivariate model. Controlling for any other variable that improves our prediction of the university transition moves the OR for IQ even closer to the individual-level effect. As a consequence, estimates of a variable's LnOR or OR from a model averaging P(y = 1) over the distribution of some characteristic may be poor approximations of the same variable's LnOR or OR from a model conditioning on the characteristic, even if this characteristic is unrelated to the independent variable.

As a consequence of the above, when using logistic regression, we should be even more cautious about interpreting our estimates as causal effects than we are when we use linear regression. It is difficult to control for all factors related to both independent and dependent variables, but it is of course even more difficult to control for all variables that are important for explaining the dependent variable. Considering the small degree of variance that we can usually explain, unobserved heterogeneity is almost always present. The problem need not be serious in all applications, but the risk that it is should be a cause for caution when interpreting results.

Recall that even if we do not know the size of the impact of unobserved heterogeneity unrelated to the independent variables, we always know the direction of the impact: it can only lead to an underestimation of the effect that one would estimate accounting for the unobserved heterogeneity. In addition, unobserved heterogeneity that is unrelated to the independent variables does not affect our possibilities to draw conclusions about the direction of an effect and the relative effect of different variables within a model (i.e. which variable has the largest effect) (Wooldridge, 2002, p. 470).

Comparing Log-Odds Ratios or Odds Ratios Across Models in a Sample

In Table 1 above, we saw that the LnOR and OR for one variable (x1) can change when we control for another variable (x2), even though x1 and x2 are not correlated. This has the consequence that the common practice of 'accounting for' effects by including new independent variables in a model may lead researchers astray. In OLS regression we can start with a bivariate model of, e.g. the sex wage gap, and control for a range of variables to see to what extent these variables account for the wage gap. If we swap the dependent variable from wages to promotion and use logistic regression, we cannot use the same procedure, as the changes in coefficients across models can depend also on changes in unobserved heterogeneity.

Using another example, say that we estimate regional differences in unemployment, starting with a bivariate model, and in the second step controlling for education. If we do this on data from a country where there are no differences in average level of education between regions, and if education is relevant in explaining unemployment, the LnOR or OR for the region variable will suggest a larger effect of region when controlling for education. In a linear regression, this would be interpreted as education being a suppressor variable, meaning that the regional difference was partially suppressed by a higher level of education in regions with more unemployment. In logistic regression, we cannot draw such a conclusion.

Now imagine the same analysis on data from a country where there are regional differences in education. The change in the estimated LnOR or OR for the region variable will then depend both on the relation between region and education and on the change of scale induced by the reduction of unobserved heterogeneity when including education in the model. Because less unobserved heterogeneity leads to larger effect estimates, a downward change in the LnOR (towards 0) or in the OR (towards 1) for the region variable when education is added to the model cannot be due to a change of scale. This means that if the LnOR or OR for the region variable decreases in strength when adding education to the model, we can conclude that this is due to the relation between region and education. However, the shrinkage of the coefficient can be partly offset by an increase in explained variance, i.e. the decrease would have been larger in the absence of a change of scale.

And if the coefficient is unchanged, we may mistakenly conclude that regional differences in education are of no relevance for regional differences in unemployment.

Comparing Log-Odds Ratios or Odds Ratios Over Samples, Groups, or Time-Points

The fact that LnOR and OR reflect effects of the independent variables as well as the size of the unobserved heterogeneity does not only affect our possibility to draw conclusions about substantive effects and to compare coefficients over models. It also means that we cannot compare LnOR and OR across samples, across groups within samples (Allison, 1999), or over time, without assuming that the unobserved heterogeneity is the same across the compared samples, groups, or points in time. In most cases, this is a very strong assumption. Even if the models include the same variables, they need not predict the outcome equally well in all the compared categories, so different ORs or LnORs in groups, samples, or points in time can reflect differences in effects, but also differences in unobserved heterogeneity. This is an important point because sociologists frequently compare effects across, e.g. sexes, ethnic groups, nations, surveys, or years. The problem extends to comparisons across groups, time-points, etc. within one logistic regression model in the form of interaction effects. For example, in a model where we want to study the difference in the effect of one variable between different ethnic groups, between men and women, or between different years, the estimate will depend on the extent to which the model predicts the outcome differently in the different categories.

For example, we might be interested in how the effect of school grades on the probability of transition to university, controlling for the student's sex, varies over time or between countries. A weakening of LnOR or OR for grades between two points in time can mean that the effect of grades is diminishing over time, but it can also mean that the importance of sex for educational choice has decreased over time. Similarly, if the LnOR or OR for grades is higher in one country than in another country, it can mean that the effect of grades is stronger, but it can also mean that sex is more important in explaining the educational choice in that country.

Proposed Solutions

A look in any sociological journal that publishes quantitative research confirms that the problem of unobserved heterogeneity has escaped the attention of the large majority of users of logistic regression. LnOR and OR are interpreted as substantive effects, and it is common practice to compare coefficients across models within samples, and across samples, groups, etc. just as in linear regression. This is obviously problematic, so what should we do instead?

The easiest solution to the problem discussed here - or rather a way of avoiding it - is to replace the latent continuous variable with an observed continuous one, if a reasonable such variable exists. For example, if the outcome in question is a binary measure of poverty (coded 1 if poor and 0 if non-poor), one might consider using a continuous measure of economic hardship instead. This, however, is not always a feasible option, because we may be interested in dependent variables that are fundamentally qualitative, and for which no alternative continuous variables exist, such as mortality, divorce, or educational transitions. Even so, there are ways around the problems, at least in some cases. These strategies can usefully be divided into those that concern odds ratios and log-odds ratios, and those that concern probability changes.

Solutions Using Odds Ratios and Log-Odds Ratios

The problem of comparing coefficients across models for the same sample but with different independent variables is discussed briefly by Winship and Mare (1984), who suggest that coefficients can be made comparable across models by dividing them by the estimated standard deviation of the latent variable (sdY*) for each model (y-standardization).8 The total estimated variance of y* is the sum of (1) the variance of the predicted logits and (2) the assumed variance of the error term (which is always 3.29, i.e. a standard deviation of 1.81); sdY* is the square root of this sum. Because (2) is assumed to be fixed, all variation in the estimated sdY* across models must depend on (1), which in turn can depend only on the included independent variables. As explained above, the standard deviation of the logit (and hence its scale) increases when we include variables that improve our predictions of y. Because y-standardization divides coefficients by the estimated standard deviation, it neutralizes this increase and rescales coefficients to express the standard-deviation-unit change in y* for a one-unit change in the independent variable.
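Winship and Mare's y-standardization amounts to a simple rescaling step, which can be sketched as follows (my own minimal implementation; the function name is an assumption):

```python
import math

LOGISTIC_VAR = 3.29  # assumed error variance of the logit model

def y_standardize(coefs, predicted_logits):
    """Divide logit coefficients by the estimated sd of the latent y*:
    sd(y*) = sqrt(var(predicted logits) + 3.29)."""
    n = len(predicted_logits)
    mean = sum(predicted_logits) / n
    var_logit = sum((v - mean) ** 2 for v in predicted_logits) / n
    sd_ystar = math.sqrt(var_logit + LOGISTIC_VAR)
    return [c / sd_ystar for c in coefs]

# With no explained variance, sd(y*) is sqrt(3.29) = 1.81, so a
# coefficient of 1.0 rescales to about 0.55 standard-deviation units:
print(round(y_standardize([1.0], [0.0, 0.0, 0.0])[0], 2))
```

Because the error variance term is fixed at 3.29 for every model, only the variance of the predicted logits differs across models, which is what makes the rescaled coefficients comparable across models fitted to the same sample.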

Note that this method does in no way retrieve the 'true' scale of y*: The size of the unobserved heterogeneity is still unknown, and the only thing we achieve is the possibility to compare a variable's coefficient between models estimated on the same sample with different control variables.

y-standardization works for comparisons across models estimated on the same sample because we know the size of the difference in unobserved heterogeneity across models. However, when comparing results for different groups, samples, points in time, etc., we do not know the differences between these in unobserved heterogeneity. What we do know is that unobserved heterogeneity that is unrelated to the independent variables affects all coefficients within a group etc. in the same way. Allison (1999) uses this fact to develop a procedure to test whether differences in coefficients across groups are due to unobserved heterogeneity, and his article contains a detailed description of this procedure using various programs. His procedure involves both a test of whether at least one coefficient differs across groups, and a test of whether a specific coefficient differs. As shown by Williams (2006a), the first test runs into problems if the effects of several variables in one group deviate in the same direction from the effects in the other group. In these cases, Allison's procedure can falsely ascribe group differences in effects to group differences in residual variation. In addition, the test of whether a specific coefficient differs requires an assumption that at least one of the underlying 'true' coefficients is the same in the compared groups. Of course, this assumption may be hard to justify, but if a researcher has strong theoretical reasons to make such an assumption, or has external evidence that supports it, and only wants to know whether an effect differs significantly across groups and in which direction, this test is likely to do the job.

The procedure proposed by Allison for group comparisons is criticized by Williams (2006a). He shows that it can be seen as one model in a larger family of so-called heterogeneous choice models, and argues that it will not be the right model for many situations. Instead, he proposes a more flexible use of the family of heterogeneous choice models to compare logit and probit coefficients across groups. Such models can be estimated in common statistical software, such as Stata [using the oglm command (Williams, 2006b)], SPSS and Limdep, and are based on the idea of estimating one 'choice equation' that models the effect of a range of variables on the outcome, and one 'variance equation' that models the effect of a range of variables on the variance in the outcome. Because the mean and variance of categorical variables cannot be separately identified, models of this kind cannot really solve the problems discussed here. Rather, they can give alternative estimates under what are essentially different assumptions about the functional form of the relation (Keele and Park, 2006). For ordinal dependent variables, however, these models are more likely to be identified (Keele and Park, 2006), so if it is possible to express the outcome in ordinal rather than nominal terms this can be a good option.

As noted by Allison (1999), we can get a rough indication of whether differences in LnOR and OR depend on unobserved heterogeneity by simple inspection: If coefficients are consistently higher in one group, sample, model etc. than in another, it is an indication that there is less unobserved heterogeneity. However, because the true effects can also differ, this is not a foolproof test.

Solutions Using Probability Changes

Though the proposed solutions described above may allow comparability across models and in some cases across groups, the question remains as to how we should interpret the substantive effects. Angrist (2001) argues that problems with effect estimation from nonlinear limited dependent variable models (including logistic regression models) are primarily a consequence of a misplaced focus on the underlying latent variables. Instead, he thinks that we should care about the effects on probabilities (see also Wooldridge, 2002, p. 458). But how do estimates in probability terms fare in the context of unobserved heterogeneity?

The link between the logit and the probability of an outcome is the logistic cumulative distribution function (CDF):

P(y_i = 1) = F(bx_i) = exp(bx_i) / [1 + exp(bx_i)]    (8)

where bx_i is the value of the logit (i.e. the linear combination of values on variables x and their estimated coefficients b) for observation i. The slope of the logistic CDF is the logistic probability distribution function (PDF), which is given by:

f(bx_i) = exp(bx_i) / [1 + exp(bx_i)]^2    (9)

The CDF gives the P(y_i = 1), and the PDF at a given P(y_i = 1) equals P(y_i = 1) x [1 - P(y_i = 1)]. Unlike the LnOR and OR, the effect of an independent variable on P(y_i = 1) is not an additive or multiplicative constant, so there is no self-evident way of reporting results in probability terms. There are various measures of changes in probability (Long, 1997, pp. 64-78), some of which are based on derivatives, which measure the slope at a particular point of the CDF, and others that measure the partial change in P(y = 1) for a discrete change in an independent variable from one specified value to another (e.g. a one-unit change). As demonstrated by Petersen (1985), the derivative does not equal the change in P(y = 1) for a one-unit change in x1, but it is often a decent approximation.

Marginal effects (MFX) at specified values (commonly means) of all independent variables take the value of the logistic PDF corresponding to the predicted logit for these specific values and multiply it by the estimated coefficient for x1. MFX thus expresses the effect of x1 on P(y = 1) conditional on having the specified characteristics. Another possibility is to evaluate the marginal effect at the logit corresponding to the average P(y = 1), i.e. conditional on having an average P(y = 1).9 To get a better sense for the non-linearity of the effect one can report marginal effects at different levels of independent variables or at different P(y = 1). When the specific values chosen for the point of evaluation are the means of the independent variables, MFX is given by:

bx1 f(bx)    (10)

where bx is the value of the logit when all variables take average values.

Another common measure based on derivatives is the average marginal effect (AME), which is given by:

(1/n) SUM_i bx1 f(bx_i)    (11)

where bx1 is the estimated LnOR for variable x1, bx_i is the value of the logit (i.e. the linear combination of values on variables x and their estimated coefficients b) for the i-th observation, and f(bx_i) is the PDF of the logistic distribution with regard to bx_i. In words, AME expresses the average effect of x1 on P(y = 1). It does so by taking the logistic PDF at each observation's estimated logit, multiplying this by the coefficient for x1, and averaging this product over all observations.

Average partial effects (APE) differ from AME by averaging the marginal effects bx1 f(bx_i) across the distribution of other variables at different given values of x1. While the AME gives one single estimate, APE varies by x1 and thus acknowledges the nonlinear shape of the relation. APE differs from MFX in that MFX gives the marginal effect at given values of all variables in the equation, while APE gives the average effect at a given value of x1 (or in a given interval of x1). APE is thus also given by Equation (11), but it is calculated for subgroups with a specific value, or within a specific range of values, on x1.

Wooldridge (2002, p. 471) shows for probit that if other variables are normally distributed and uncorrelated to x1, the expectation of APE at different values of x1 equals the MFX from the probit without these other variables. Similarly, under the assumption that omitted variables are unrelated to the included ones, the MFX at specified values of several included variables equals the APE (averaged over other unobserved or observed variables) among individuals who have this combination of values. For logistic regression Wooldridge offers no similar proof, but the intuition is the same: If other variables are uncorrelated to x1, the APE of x1 considering these variables and the MFX of x1 disregarding these variables will be similar, and any difference depends on how much the probability distribution that the APE corresponds to deviates from a logistic distribution (cf. Figure 1 and note 6: The slope of curve (5) approximates APE, while the slope of curve (1) is the MFX from the model without x2. Because curve (5) does not follow a logistic distribution, the curves are not identical).

Estimated changes in P(y = 1) for discrete changes in an independent variable (dP) (Petersen, 1985; Long, 1997) are normally evaluated taking the point of departure at a specific value of the logit, often a value corresponding to the averages of the independent variables. In contrast to MFX, dP measures the change in P(y = 1) for a substantively meaningful change in x1. For example, for a one-unit change in x1 the dP would commonly be estimated as:

F(bx + bx1) - F(bx)    (12)

where F(bx + bx1) and F(bx) is the CDF of the logistic distribution with regard to (bx + bx1) and (bx), respectively. As with MFX, one can evaluate dP at logits corresponding to different values of the independent variables to better understand the non-linearity of the relation.
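To make the four measures concrete, the Python sketch below computes Equations (10)-(12) under the data-generating process of note 10, with the true coefficients (1 and 2) standing in for estimates. Because nothing is re-estimated, the numbers only come close to the figures reported for these data in Table 3.

```python
import math
import random

def F(z):   # logistic CDF, Equation (8)
    return 1.0 / (1.0 + math.exp(-z))

def f(z):   # logistic PDF, Equation (9); equals P * (1 - P)
    p = F(z)
    return p * (1.0 - p)

# Data-generating process of note 10, true coefficients in place of estimates
random.seed(10)
n = 20_000
b1, b2 = 1.0, 2.0
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
logits = [b1 * a + b2 * b for a, b in zip(x1, x2)]

x1_bar = sum(x1) / n
x2_bar = sum(x2) / n
logit_at_means = b1 * x1_bar + b2 * x2_bar

mfx = b1 * f(logit_at_means)                    # Equation (10), at the means
ame = sum(b1 * f(z) for z in logits) / n        # Equation (11), full average
# APE: Equation (11) restricted to observations close to the mean of x1
near = [z for a, z in zip(x1, logits) if abs(a - x1_bar) < 0.01]
ape = b1 * sum(f(z) for z in near) / len(near)
# Equation (12): discrete change, half a unit below to half a unit above
dP = F(logit_at_means + 0.5 * b1) - F(logit_at_means - 0.5 * b1)

print(f"MFX={mfx:.3f} AME={ame:.3f} APE={ape:.3f} dP={dP:.3f}")
```

Note how MFX, evaluated at the means where the logistic PDF peaks, exceeds AME, which averages the derivative over the whole distribution of the logit.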


To get a better sense of these different measures, Table 3 shows results from two logistic regressions on a fictitious data set.10 AME is simply the mean of the individual derivatives bx1 f(bx_i), MFX is evaluated at the means of the independent variables (using the mfx command in Stata), APE is calculated as the mean of bx1 f(bx_i) for the 153 (out of 20,000) observations that are in an interval of 0.01 standard deviations around the mean of x1, and dP measures the change in the P(y = 1) for a one-unit change in x1 from 1/2 unit below average to 1/2 unit above average (using the prchange command in the Spost routine in Stata). The following models are estimated:

Model 1: y* = a + x1i b1 + e_i
Model 2: y* = a + x1i b1 + x2i b2 + e_i
R(x1, x2) = 0; P(y = 1) = 0.5

Table 3 Comparison of estimates of b1 with control for x2 (Model 2) and without (Model 1) [P(y = 1) = 0.50]

            LnOR     OR       AME      APE      MFX      dP
Model 1     0.626    1.870    0.143    0.157    0.157    0.155
Model 2     1.005    2.732    0.142    0.151    0.251    0.246

As can be seen in Table 3, MFX and dP follow the same pattern as the LnOR and OR: When controlling for x2, these estimates suggest stronger effects of x1 on y. AME and APE, however, are not affected more than marginally by controlling for x2. The reason is that MFX and dP are conditional on specific values of the observed variables, while AME and APE represent averages of the conditional effects. Thus AME and APE are roughly invariant to the exclusion of independent variables unrelated to the independent variables already in the model [cf. Figure 1, in which curve (5) - which approximates APE - is close to curve (1)]. Recall that AME and APE average the conditional effects, which means that they are not invariant to the exclusion of independent variables that are correlated to the independent variables in the model.

The change in MFX and dP between Models 1 and 2 is a more complex issue than it may seem. It will not always be the case that these measures change in a way similar to the LnOR and OR. Recall that MFX at the mean of the independent variables is given by bx1 f(bx), and dP by F(bx + bx1) - F(bx), which means that the magnitude and direction of the change in these measures for x1 when controlling for x2 depend not only on the change in bx1 (the LnOR for the effect of x1 on y) but also on the size of f(bx), i.e. the value of the logistic PDF at the point of evaluation (in this case, this point is at the average of the independent variables). This might appear clearer in Figure 2, which shows (1a) the predicted P(y = 1) for different values of x1, not controlling for x2 (the bold solid curve), (1b) the corresponding marginal effects at different values of x1 (the bold dashed curve), (2a) the predicted P(y = 1) for different values of x1, with x2 held constant at 0, which is its mean value (the thin solid curve), and (2b) the corresponding marginal effects at different values of x1 (the thin dashed curve). The MFX for x1 without controls for x2 (Model 1), the bold dashed curve, corresponds to the slope of the bold solid curve, while MFX for x1 controlling for x2 (i.e. Model 2), the thin dashed curve, corresponds to the slope of the thin solid curve. Because x1 and x2 are uncorrelated, the bold dashed curve also approximates APE. The solid curve changes shape when including x2, because the more variables we control for, the better we can predict the outcome. In Table 3, MFX was evaluated at the average of x1, which is 0. As we can see in Figure 2 (at the vertical line), this is where the slopes - the MFX - differ most. At other values of x1, however, the differences between the MFX are smaller, and for very high or very low values of x1 the MFX are actually larger when not controlling for x2.

Figure 2 Predicted probability of y = 1 by x1, and corresponding marginal effects. Average probability of y = 1 is 0.50

To understand how this happens, recall that the logistic PDF (i.e. the derivative of the CDF) has its peak when P(y = 1) is 0.5, which means that this is where the effect of an independent variable on P(y = 1) is strongest. The more unobserved heterogeneity we have, the more the scale of the logit shrinks towards zero. This means two things: (1) the coefficients shrink, and (2) the predicted logits shrink. Because a logit of 0 equals a probability of 0.5, the predicted probabilities move closer to 0.5 when the predicted logits shrink towards zero. As this is a move towards larger f(bx), unobserved heterogeneity simultaneously shrinks bx1 and increases f(bx). Though the shrinkage in bx1 is the same for all levels of x1, the increase in f(bx) varies with x1, so more unobserved heterogeneity can for some values of x1 lead to higher estimated MFX (Wooldridge, 2002, p. 471).

To further clarify the nature of MFX and dP, Table 4 gives results from a similar logistic regression of x1 and x2 on y, where P(y = 1) is 0.14 instead of 0.5.11 The following models are estimated:

Model 1: y* = a + x1i b1 + e_i
Model 2: y* = a + x1i b1 + x2i b2 + e_i
R(x1, x2) = 0; P(y = 1) = 0.14

Table 4 Comparison of estimates of b1 with control for x2 (Model 2) and without (Model 1) [P(y = 1) = 0.14]

            LnOR     OR       AME      APE      MFX      dP
Model 1     0.692    1.999    0.080    0.076    0.076    0.076
Model 2     1.002    2.723    0.078    0.077    0.044    0.045

In Table 4 we can see that MFX and dP decrease when controlling for x2. This is because these measures are now evaluated where the logistic PDF is further from its peak.

Figure 3 Predicted probability of y = 1 by x1, and corresponding marginal effects. Average probability of y = 1 is 0.14
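The mechanism summarized in Tables 3 and 4 can be reproduced on a freshly simulated data set (not the article's). The bare-bones Newton-Raphson fitter below is only a stand-in for standard software; the point is that omitting the uncorrelated x2 shrinks the estimated LnOR for x1 towards zero while leaving its AME almost unchanged.

```python
import math
import random

def solve(A, v):
    """Gauss-Jordan elimination for a small linear system A d = v."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                fct = M[r][c] / M[c][c]
                M[r] = [p - fct * q for p, q in zip(M[r], M[c])]
    return [M[i][-1] / M[i][i] for i in range(n)]

def fit_logit(X, y, iters=30):
    """Logistic-regression MLE via Newton-Raphson; X rows start with a constant."""
    k = len(X[0])
    b = [0.0] * k
    for _ in range(iters):
        g = [0.0] * k                       # score vector
        H = [[0.0] * k for _ in range(k)]   # negative Hessian (information)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(bj * v for bj, v in zip(b, xi))))
            w = p * (1.0 - p)
            for j in range(k):
                g[j] += (yi - p) * xi[j]
                for m in range(k):
                    H[j][m] += w * xi[j] * xi[m]
        step = solve(H, g)
        b = [bj + s for bj, s in zip(b, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    return b

def ame_of_x1(b, X):
    """Average marginal effect of the column-1 variable, Equation (11)."""
    tot = 0.0
    for xi in X:
        p = 1.0 / (1.0 + math.exp(-sum(bj * v for bj, v in zip(b, xi))))
        tot += b[1] * p * (1.0 - p)
    return tot / len(X)

random.seed(42)
n = 5_000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = []
for v1, v2 in zip(x1, x2):
    u = random.random()
    # latent y* = x1 + 2*x2 + logistic error; observed y = 1 if y* > 0
    y.append(1.0 if v1 + 2.0 * v2 + math.log(u / (1.0 - u)) > 0.0 else 0.0)

X_full = [[1.0, v1, v2] for v1, v2 in zip(x1, x2)]
X_red = [[1.0, v1] for v1 in x1]
b_full = fit_logit(X_full, y)
b_red = fit_logit(X_red, y)
ame_full = ame_of_x1(b_full, X_full)
ame_red = ame_of_x1(b_red, X_red)

print("LnOR x1:", round(b_full[1], 3), "->", round(b_red[1], 3))
print("AME  x1:", round(ame_full, 3), "->", round(ame_red, 3))
```

On a seeded run of this sketch the reduced-model LnOR lands well below the full-model value of roughly 1, while the two AMEs differ only marginally, mirroring the contrast between the LnOR and AME columns of Table 3.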


For the models in Table 4, Figure 3 shows the same quantities as in Figure 2, that is: (1a) the predicted P(y = 1) for different values of x1, not controlling for x2 (the thick solid curve), (1b) the corresponding marginal effects at different values of x1 (the thick dashed curve), (2a) the predicted P(y = 1) for different values of x1, with x2 held constant at 0, which is its mean value (the thin solid curve), and (2b) the corresponding marginal effects at different values of x1 (the thin dashed curve). As in Table 4, we see that at the point where MFX is evaluated (at x1 = 0), the MFX (which is the slope of the solid curve predicting the probability) is lower when we control for x2.

To conclude, AME and APE are not (at least not more than marginally) affected by unobserved heterogeneity that is unrelated to the independent variables in the model, and can thus be compared across models, groups, samples, years etc. However, these measures are population-averaged and represent the average of the conditional effects of x1 on P(y = 1). AME gives the overall average of the conditional slopes for x1, while APE corresponds to the average of the conditional slopes for different values of x1. Though population-averaged estimates in probability terms are more intuitive in that they measure the average change in P(y = 1) and do not uniformly underestimate conditional effects as LnOR or OR do, they still estimate an average effect. If one is interested in the effect on an aggregate level, AME or APE is normally sufficient. Nonetheless, many are interested in the change in probability that would occur for individuals upon a change in the independent variable, and then MFX or dP evaluated at different values of independent variables are more appropriate because they come closer to this - especially dP, as it measures the effect of a meaningful change in the independent variable. As noted above, these measures are affected by unobserved heterogeneity, and cannot be straightforwardly compared. In addition, they can suggest very different effects depending on the values chosen for the independent variable and the conditioning variables. Thus, a good option is often to report one estimate of probability change that is population-averaged, and one that is conditional.12

Linear Probability Models

Linear probability models (LPM), i.e. linear regression used with binary dependent variables, also yield results in terms of probability changes. In linear regression, we estimate the effects on the observed dependent variable, so coefficients are comparable over models, groups, etc. Using LPM is almost unthinkable in sociology, while it is common in economics. Generally, three problems are pointed out for LPMs:

(1) The possibility of predicted probabilities higher than 1 or lower than 0, i.e. out of range,
(2) Heteroscedastic and non-normal residuals, leading to inefficiency and invalid standard errors, and
(3) Misspecified functional form.

As noted by Long (1997), the occurrence of unrealistic predicted values as in (1) is also common in linear regression with non-binary dependent variables. This is not a serious problem unless many predicted values fall below 0 or above 1; and (2) can easily be corrected for.13 This leaves (3) as the critical issue. It is often theoretically plausible that binary outcome variables are related to the independent variables in a non-linear fashion, with smaller increments in the probability of the outcome at the extreme ends of the distribution.

As long as the misspecification of functional form does not alter (more than marginally) the substantive conclusions that are relevant to the questions asked, it is reasonable to choose LPM over logistic regression. Though LnOR or OR from logistic regression are valid for all values of the independent variable and acknowledge the non-linearity of the relation, these advantages are not exploited if substantive effects are reported only at a certain point of the distribution. For example, if we are only interested in the sign and significance of an effect, or in an average effect estimate (such as AME) and not in the non-linearity of the relation per se, a LPM is entirely appropriate, and deriving AME from logistic regression is just a complicated detour.

In fact, the LPM effect estimates are unbiased and consistent estimates of a variable's average effect on P(y = 1) (e.g. Wooldridge, 2002, p. 454). For the data in Tables 3 and 4, the LPM coefficients are 0.143 and 0.080, respectively, which is identical to the AME for the same models. To get a more general picture of the equality of AME and LPM coefficients, I ran 1,000 simulations that fitted logistic regression and LPM to fictitious datasets (n = 5,000) where x1 and x2 are continuous variables drawn from normal distributions, and y* is generated by a + 0.5x1i + 2x2i + e_i, where e_i follows a logistic distribution with a variance of 3.29, and a is set to vary between 0 and 6. Varying a in this way means that we obtain data with P(y = 1) ranging between 0.5 and 0.98 (because the logistic curve is symmetric, there is no need to consider P(y = 1) < 0.5). As before, Model 1 includes only x1 and Model 2 includes both x1 and x2.

The results (see Table 5) are strikingly easy to summarize: Regardless of the overall P(y = 1), AME and LPM coefficients are identical or as good as identical. So if one seeks an estimate of the average effect, there appears to be no need to use logistic regression.

Table 5 Average coefficients from LPM and AME from logistic regression

            a = 0    a = 1    a = 2    a = 3    a = 4    a = 5    a = 6
P(y = 1)    0.50     0.65     0.77     0.87     0.93     0.97     0.98

Model 1
AME x1      0.074    0.069    0.056    0.039    0.024    0.013    0.006
            (0.007)  (0.006)  (0.006)  (0.005)  (0.004)  (0.003)  (0.002)
LPM x1      0.074    0.069    0.056    0.039    0.024    0.013    0.006
            (0.007)  (0.006)  (0.006)  (0.005)  (0.004)  (0.003)  (0.002)
Model 2
AME x1      0.074    0.069    0.056    0.039    0.024    0.013    0.006
            (0.006)  (0.006)  (0.005)  (0.004)  (0.003)  (0.003)  (0.002)
LPM x1      0.074    0.069    0.056    0.039    0.024    0.013    0.006
            (0.006)  (0.005)  (0.005)  (0.004)  (0.003)  (0.003)  (0.002)
AME x2      0.297    0.276    0.223    0.157    0.096    0.053    0.026
            (0.004)  (0.004)  (0.005)  (0.005)  (0.004)  (0.004)  (0.003)
LPM x2      0.298    0.277    0.223    0.157    0.096    0.053    0.026
            (0.004)  (0.004)  (0.005)  (0.005)  (0.005)  (0.004)  (0.003)

Standard deviations in parentheses. 1,000 replications, n = 5,000.
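One cell of this simulation (a = 1) can be sketched as a single Python replication. Two simplifications are assumed: the AME is computed from the true coefficients rather than from a refitted logit, and the LPM is estimated by hand through the normal equations.

```python
import math
import random

random.seed(5)
n = 20_000
a0, b1, b2 = 1.0, 0.5, 2.0   # one cell of the simulation design: a = 1

x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
y = []
probs = []
for v1, v2 in zip(x1, x2):
    z = a0 + b1 * v1 + b2 * v2
    probs.append(1.0 / (1.0 + math.exp(-z)))
    u = random.random()
    # latent y* = z + logistic error; observed y = 1 if y* > 0
    y.append(1.0 if z + math.log(u / (1.0 - u)) > 0.0 else 0.0)

# AME of x1 implied by the true model: mean of b1 * P * (1 - P), Equation (11)
ame = sum(b1 * p * (1.0 - p) for p in probs) / n

# LPM: OLS of y on (1, x1, x2) through the normal equations X'X beta = X'y
X = [[1.0, v1, v2] for v1, v2 in zip(x1, x2)]
k = 3
XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]

M = [XtX[i][:] + [Xty[i]] for i in range(k)]   # Gauss-Jordan solve, 3x3 system
for c in range(k):
    piv = max(range(c, k), key=lambda r: abs(M[r][c]))
    M[c], M[piv] = M[piv], M[c]
    for r in range(k):
        if r != c:
            fct = M[r][c] / M[c][c]
            M[r] = [p - fct * q for p, q in zip(M[r], M[c])]
beta = [M[i][-1] / M[i][i] for i in range(k)]

print("AME x1:", round(ame, 3), " LPM x1:", round(beta[1], 3))
```

Both quantities land close to the 0.069 reported for a = 1 in Table 5, which is the sense in which deriving the AME through a logistic regression is, for this purpose, a detour.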

However, a problem with reporting one overall effect estimate is that it gives an impression that the relationship is linear. If the independent variables mainly vary over a range where the curve is roughly linear, this is of course not a problem, but if nonlinearity is substantively important, it may be misleading. If coefficients can be easily compared and appear easy to understand, but give a false impression of linearity, we have only exchanged one problem for another.

One argument sometimes levelled against LPMs is that their results as concerns interactions and non-linear transformations of the independent variables differ from those of logit and probit models for the same variables. This, however, should not be seen as a defect of LPMs, but as a simple reflection of the fact that interactive and non-linear transformations say something substantively different in a model where a nonlinear functional form is already inherently built into the model than in a model that assumes a linear relation. In other words, nonlinearities that are captured by the logistic functional form in the logit model can be captured by non-linear transformations of the independent variables in a LPM.

Conclusions

Logistic regression is more complex than sociologists usually think. Standard textbooks in quantitative methods do not correctly reflect this complexity, and researchers continuously misunderstand and misreport effect estimates derived from logistic regression. Because coefficients depend both on effect sizes and the magnitude of unobserved heterogeneity, we cannot straightforwardly interpret and compare coefficients as we do in linear regression. Although these problems are mentioned in some econometric textbooks (e.g. Wooldridge, 2002) and can be known by sociologists specialized in quantitative methodology, the knowledge has hardly spread at all to practicing sociologists. If we do not consider these issues, we run a great risk of drawing unwarranted conclusions from our analyses, and even of giving wrong advice to policy-makers. The aim of this article is to present and discuss these problems in the context of logistic regression, but the problem is similar in other applications, such as probit models (Wooldridge, 2002, pp. 470-472) or proportional hazard models (Gail, Wieand and Piantadosi, 1984), methods also commonly used by sociologists.

To minimize problems at the stage of analysis and reporting, it is important to be aware of these problems already at the study planning and data collection stage. First, one should avoid collection of data in terms of dichotomies and qualitative variables if continuous (or at least ordinal) alternatives exist. Second, if the intention is to use logistic regression or some similar model using a non-linear link function, one must be careful to collect information on variables that are likely to be important for the outcome, even if these are likely to be only weakly, or not at all, related to the independent variables of interest.

There are no simple all-purpose solutions to the problems of interpretability and comparison of effect estimates from logistic regression.

The situation is complicated by the fact that we often want estimates that simultaneously (i) capture the non-linearity of the relation, (ii) are comparable over groups, samples etc., (iii) are comparable over models, and (iv) indicate conditional effects. Because one estimate can normally not fulfil all these criteria, we need to carefully consider what is most relevant for our purposes and what we can estimate with available data. And, crucially, if our estimates do not fulfil all these criteria, we must report our results accordingly.

Table 6 provides a summary of the characteristics of the different effect estimates that have been discussed. Because different estimates fulfil different criteria, it is often advisable to report results using more than one type of estimate. I would also like to add a fifth criterion to the four above, namely that the estimates we report should be understandable to the reader. Many find log-odds ratios hard to grasp, and odds ratios are frequently misunderstood as relative risks, so it is often a good choice to present at least one effect estimate in terms of effects on probabilities.

Table 6 Characteristics of estimated effects on binary dependent variables

                                  Capture        Comparable across       Comparable       Conditional
                                  nonlinearity   groups, samples etc.    across models    effect estimate(a)
Measures based on odds and log-odds
  Odds ratio                      Yes            No                      No               Yes
  Log-odds ratio                  Yes            No                      No               Yes
  y-standardization               Yes            No                      Yes              No
  Allison's procedure             Yes            Yes(b)                  No               Yes
  Heterogeneous choice models     Yes            Yes(c)                  No               Yes
Measures based on percentages
  Average marginal effect         No             Yes                     Yes              No
  Average partial effect          Yes(d)         Yes                     Yes              No
  Marginal effect                 Yes(d)         No                      No               Yes
  dP                              Yes(d)         No                      No               Yes
  Linear probability model        No             Yes                     Yes              No(e)

(a) In a multivariate model.
(b) If the assumption that one variable has the same effect in the compared groups etc. is correct.
(c) If the assumption about the functional form of the relationship is correct.
(d) If estimated at several places in the distribution.
(e) If the true relationship is nonlinear.

Notes

1. For an accessible introduction to logistic regression, see Long (1997). I here discuss these problems in the context of dichotomous dependent variables, but they hold also for multinomial or ordinal logistic regression.

2. Unobserved heterogeneity is sometimes defined as unobserved differences between certain categories (e.g. men/women, treated/non-treated) or unobserved individual characteristics that are stable over time. In this article, I consider all variation that is caused by unobserved variables to be unobserved heterogeneity, and the problems that I discuss occur regardless of whether the unobserved variables are group-specific and/or stable over time.

3. The problems discussed also apply to coefficients from probit and most other models using non-linear link functions (Wooldridge, 2002, pp. 470-472; Gail, Wieand and Piantadosi, 1984). I concentrate on logistic regression here because it is used very frequently in sociology.

4. Equations 6 and 7 are only approximately true, because bx will be affected not only by the size but also by the shape of the residual variation, and this shape cannot be logistic both when including x2 and when excluding it (cf. note 6).


5. The dataset (n = 10,000) is constructed in Stata 10.0. IQ (x1) is a continuous variable (mean 0, sd 1) drawn from a normal distribution (command drawnorm) and sex (x2) is a dummy variable (mean 0.5, sd 0.5). y* is generated by -1 + 1x1i + 2x2i + e_i, where e_i follows a logistic distribution with a variance of 3.29.

6. As can be seen in Figure 1, the probability curve from the bivariate logistic model (curve 1) does not perfectly represent the average of the conditional probability curves (3) and (4), but deviates systematically at very high and very low values of x1. This is because the average of two logistic curves is not a logistic curve itself, but Model 1 is restricted by the assumption of the logistic distribution so that the predictions from it (curve 1) must take a logistic shape. This exemplifies an additional problem with logistic regression estimates: they can be affected not only by the size of the error but also by misspecification of the shape of the distribution. However, this problem appears minor relative to the one discussed in this article (cf. Cramer, 2003).

7. For the case of log-linear models and cross-tabular analysis, the conflict between averages on the probability and on the OR scale has been discussed in terms of the collapsibility of OR over partial tables (Whittemore, 1978; Ducharme and Lepage, 1986).

8. y-standardized coefficients can easily be obtained in Stata using the Spost package (Long and Freese, 2005).

9. This is in fact a convenient way to roughly gauge the substantive meaning of results reported in terms of LnOR. Because the logistic PDF is simply P(1 - P), one can calculate the marginal effect at average P(y = 1) by taking a variable's LnOR x average P x (1 - average P).

10. The dataset (n = 20,000) is constructed in Stata 10.0. x1 and x2 are continuous variables (mean 0, sd 1) that are uncorrelated and drawn from a normal distribution (command drawnorm). y* is generated by x1i + 2x2i + e_i, where e_i follows a logistic distribution with a variance of 3.29.

11. The dataset (n = 20,000) is constructed in Stata 10.0. x1 and x2 are uncorrelated continuous variables (mean 0, sd 1) drawn from a normal distribution (command drawnorm). y* is generated by -3 + x1i + 2x2i + e_i, where e_i follows a logistic distribution with a variance of 3.29.

12. As discussed in the section about LnOR and OR, the distinction between population-averaged and conditional estimates is really a matter of degrees. Estimates that are conditional on some variable are still averaged over the distribution of other variables. Thus, the MFX for x1 in Model 2 in Tables 3 and 4 are conditional on x2, but averaged over the distribution of variables that remain unobserved.

13. A simple way is to use heteroscedasticity-robust standard errors. A more efficient alternative is to estimate LPM by weighted least squares (WLS), where the observations with smaller residuals are given more weight. However, because the smaller residuals are for predicted values close to 0 or 1, which are the predictions most likely to be misleading in the LPM if the true relationship is nonlinear, WLS can be misleading when the true relationship is nonlinear and many predicted probabilities lie close to 0 or 1.

Acknowledgements

I thank Jan O. Jonsson, Martin Hällsten, Richard Breen, Frida Rudolphi, Robert Erikson, and two anonymous referees for helpful comments and advice. Financial support from the Swedish Council for Working Life and Social Research (FAS grant no 2006-0532) is gratefully acknowledged.

References

Allison, P. D. (1999). Comparing logit and probit coefficients across groups. Sociological Methods and Research, 28, 186-208.

Angrist, J. D. (2001). Estimation of limited dependent variable models with dummy endogenous regressors: Simple strategies for empirical practice. Journal of Business & Economic Statistics, 19, 2-28.

Cramer, J. S. (2003). Logit Models from Economics and Other Fields. Cambridge: Cambridge University Press.

Ducharme, G. R. and Lepage, Y. (1986). Testing collapsibility in contingency tables. Journal of the Royal Statistical Society, Series B (Methodological), 48, 197-205.

Gail, M. H., Wieand, S. and Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika, 71, 431-444.

Keele, L. and Park, D. K. (2006). Difficult Choices: An Evaluation of Heterogeneous Choice Models. Paper for the 2004 Meeting of the American Political Science Association.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Thousand Oaks: Sage.

Long, J. S. and Freese, J. (2005). Regression Models for Categorical Outcomes Using Stata. TX: Stata Press.

Menard, S. (1995). Applied Logistic Regression Analysis. Sage University Paper series on Quantitative Applications in the Social Sciences 07-106. Thousand Oaks, CA: Sage.

Petersen, T. (1985). A comment on presenting results from logit and probit models. American Sociological Review, 50, 130-131.

Whittemore, A. S. (1978). Collapsibility of multidimensional contingency tables. Journal of the Royal Statistical Society, Series B (Methodological), 40, 328-340.

Williams, R. W. (2006a). Using Heterogeneous Choice Models To Compare Logit and Probit Coefficients Across Groups. Working Paper, Notre Dame University.

Williams, R. W. (2006b). OGLM: Stata Module to Estimate Ordinal Generalized Linear Models. Working Paper, Notre Dame University.

Winship, C. and Mare, R. D. (1984). Regression models with ordinal variables. American Sociological Review, 49, 512-525.

Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge: MIT Press.

Author's Address

Swedish Institute for Social Research (SOFI), Stockholm University, SE-10691 Stockholm, Sweden. Email: carina.mood@sofi.su.se
