
Health Services & Outcomes Research Methodology 4: 5–18, 2003


© 2003 Kluwer Academic Publishers. Manufactured in The Netherlands.

Choosing Between and Interpreting the Heckit and Two-Part Models for Corner Solutions
WILLIAM H. DOW∗ will dow@unc.edu
EDWARD C. NORTON edward norton@unc.edu
Department of Health Policy and Administration, University of North Carolina at Chapel Hill, USA

Received November 20, 2002; Revised March 14, 2003; Accepted April 9, 2003

Abstract. This article addresses certain poor practices commonly seen in the applied health economics literature
regarding the use of the Heckit and the two-part model. First, many articles invoke the Heckit to solve a supposed
selection problem associated with masses of zero values in continuous variables, despite the fact that it has been
shown elsewhere that no such selection problem exists when modeling observed actual, as opposed to latent
potential, outcomes. Second, many applications incorrectly formulate the marginal effect tests in the Heckit and
two-part model, thus undermining central conclusions. Finally, many researchers use a t-test of the inverse Mills
coefficient to choose between the Heckit and two-part models despite its poor performance; we propose instead an
adapted empirical mean square error test.

Keywords: Heckman selection correction, two-part model, health care expenditures

1. Introduction

Health services researchers frequently analyze continuous variables, such as health expen-
ditures, that include a large fraction of zero values. Unfortunately, these researchers often
mis-apply Heckman selection models and two-part models to such corner solution out-
comes. Too often researchers invoke the Heckit model for inappropriate reasons, derive the
wrong marginal effects for their main tests of interest, and use an unsatisfactory test of the
Heckit versus the two-part model (2PM) (for a recent example of mis-interpretation and
confusion about the Heckit model, see the exchange by Zweifel et al. [22, 23] and Salas and
Raftery [18]). Although the “cake debates”1 of the 1980s discussed these concerns, certain
issues debated then still appear poorly understood by many researchers. We refer readers
to Jones [10] for an excellent history of the cake debates, although he does not attempt to
adjudicate between conflicting arguments about the Heckit and 2PM. Econometrics text-
books have also largely failed to help, because most ignore the 2PM, and mention only
latent variable applications of the Heckit (Wooldridge [21] is a welcome exception).
This article aims to clarify several issues regarding the relative merits and usage of the
Heckit and 2PM when applied to data with a large fraction of zeros. Our study makes three
main points. We first clarify confusion between two applications of the Heckit—one for
latent variable modeling of potential outcomes, and a second for corner-solution modeling

∗ Author to whom correspondence should be addressed.


6 DOW AND NORTON

of actual outcomes. The former is the traditional selection bias application for which the
Heckit was designed. Most health care applications are of the latter type, however, and we
emphasize that there is no selection bias problem when modeling actual outcomes. Second,
because papers reporting the wrong marginal effects continue to appear in the literature,
we explain how to compute the correct marginal effects of covariates on actual outcomes
in both the Heckit and 2PM. Third, to address problems with the usual t-test of the inverse
Mills coefficient for choosing between the Heckit and 2PM, we recommend that researchers
instead use an adaptation of the Toro-Vizcarrondo and Wallace [19] empirical mean square
error test of the difference in the marginal effects of interest. We illustrate our points
using Monte Carlo simulation examples. To focus attention we refer to health expenditure
models throughout, but the issues are equally important for many other applications, such
as cigarette demand and hospital utilization.

2. Actual versus potential outcomes

When analyzing continuous outcome variables with a large proportion of zeros, a common
stumbling block is confusion between actual outcomes and potential outcomes. (Rather
than the terms actual and potential, some authors prefer the terms conditional and uncon-
ditional. We feel the latter terms cause confusion, because the second part of the 2PM is
also sometimes referred to as the conditional equation, but is a very different outcome.)
The actual outcome is a fully-observed variable. Zero values for actual health expendi-
tures indicate that zero dollars were expended. We refer to these actual zero values as corner
solutions (using the language of constrained optimization), because individuals cannot have
negative health expenditures. If many observations have zero expenditures, then the econo-
metric challenge is to model these corner solutions. As long as the zero expenditures are
true zeros—not missing data—then there is no selection problem to address. Although the
Heckit can be used to estimate actual outcomes, this interpretation requires several extra
calculations beyond what is reported by most canned statistical packages. As an alternative
estimator, Duan and colleagues [3] proposed the 2PM, arguing that it often has lower mean
square error than other estimators such as the Heckit when analyzing actual outcomes.
In contrast, the potential outcome is a latent variable that is only partially observed. The
non-zero values are assumed to be true observations of the potential outcome, but zero
values indicate observations for which the potential outcome is missing (latent). The zeros
do not represent zero values for the potential outcome. If those with missing values differ
systematically from those with observed values of the potential outcome (after controlling
for other covariates), then models such as the 2PM will suffer from selection bias. This
is related to the point recently re-emphasized by Angrist [2] and Mullahy [17], that the
second part of the 2PM model cannot generally be used to make causal inferences about
“conditional-on-positive” effects. In such cases, a structural model with explicit assumptions
is required to model the population distribution of potential outcomes. Because the Heckit
was designed to address selection bias for analyzing potential outcomes, the Heckit estimator
incorporates features that make it often perform worse than the 2PM when analyzing actual
outcomes. We discuss these specific features further below, such as the stronger identification
assumptions required of the Heckit.
HECKIT AND TWO-PART MODELS FOR CORNER SOLUTIONS 7

Because the Heckit and the 2PM have different strengths and weaknesses when applied
to actual as opposed to potential outcomes, and because these models must be interpreted
differently when applied to these different outcomes, it is critical to clearly specify whether
the outcome of interest is the actual or the potential. For example, labor economists, who
developed the Heckit model, are generally interested in the potential wage. Observations
without positive wage outcomes do not imply that an individual worked for zero wages;
instead they indicate that the potential wage (the wage that an individual could earn if she
were to work) is unobserved. Because non-working people (with unobserved wages) are
likely to be systematically different from working people (with observed potential wages),
2PM models of potential wages are likely to suffer from selection bias. Therefore, labor
economists estimate structural models such as the Heckit.
However, the same logic does not necessarily hold for other outcomes. Consider health
expenditures. Although researchers could plausibly be interested in either the actual or
the potential outcomes, for most policy issues, researchers are interested in the public and
private budgetary implications of actual expenditures. In contrast, the concept of potential
health expenditures has rarely been discussed in the literature, and is not well defined.
Potential expenditures that are never incurred will not affect health care budgets. One
possible interpretation of potential health expenditures is that for a person with zero actual
expenditures, there is a latent positive expected expenditure that would have been incurred if
the person had sought any health care. For example, if a person with zero actual expenditures
had been examined by a doctor, the doctor’s perception of unmet health care needs could
have led to procedures costing a total of $1,000. In this case the latent potential expenditure
would be $1,000 instead of the actual expenditure of zero.
Unfortunately, many researchers automatically invoke the Heckit in order to solve a
supposed selection problem, regardless of whether they are interested in actual or potential
expenditures. The vast majority of health expenditure analyses are in fact concerned with
the determinants of actual expenditures. Because zero expenditures are true zeros in such
applications, there is no selection bias problem. The selection issue would only have to be
addressed for research questions about the determinants of potential health expenditures,
but those issues are rarely of interest. As a result, the Heckit is often used in applications
where the 2PM is likely a superior estimator.
The common misunderstanding of the difference between actual and potential outcomes
is not merely an academic exercise of choosing the right estimator. The choice of outcome
type and model has important implications for interpretation, magnitudes of marginal
effects, and significance of hypothesis tests. Too often researchers using the Heckit or 2PM
fail to calculate the correct actual marginal effects of interest—mistakenly focusing on the
potential marginal effects instead. In the next section we explicitly show how to calculate
the correct actual marginal effects in these models, and highlight the dangers of mistakenly
focusing on the potential marginal effects.

3. Actual outcomes in the 2PM and Heckit, and their marginal effects

To further clarify model choice and testing for actual outcomes, we consider three related
but distinct approaches, as adapted from Heckman [9]: single-equation models, Heckit, and
2PM. Denote y as the actual outcome, y ∗ as the potential outcome, X as a vector of observed
explanatory variables of interest with coefficients β, and errors ε. We initially assume
linearity in the continuous equations to abstract from retransformation issues discussed
elsewhere by Duan and colleagues [4], Manning [14], Mullahy [16], and Ai and Norton [1].
Both the Heckit and 2PM specify two separate equations, the probability of a positive
observed outcome Pr[y > 0 | X ], and the mean outcome conditional on being positive
E[y | y > 0, X ]. Although the Heckit was designed to estimate E[y ∗ | X ], our main out-
come of interest E[y | X ] can be recovered from either the Heckit or 2PM by using the
decomposition E[y | X ] = Pr[y > 0 | X ] × E[y | y > 0, X ]. Based on this decomposition,
for either a Heckit or 2PM the general formula for the marginal effect of a covariate xk on
expected actual expenditures E[y] is the sum of two terms (the condition on X is suppressed
for ease of notation):

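\begin{align*}
\frac{\partial E[y]}{\partial x_k}
&= \frac{\partial\,(\Pr[y > 0] \times E[y \mid y > 0])}{\partial x_k} \\
&= \left(\Pr[y > 0] \times \frac{\partial E[y \mid y > 0]}{\partial x_k}\right)
 + \left(E[y \mid y > 0] \times \frac{\partial \Pr[y > 0]}{\partial x_k}\right) \tag{1}
\end{align*}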

3.1. Single-equation model

The single-equation model does not treat zeros specially. Instead, Eq. (2) directly models
actual health care expenditures in the full sample.

\[ E[y \mid X] = f(X\beta_1, \varepsilon_1) \tag{2} \]

Variations of this model, as determined by the function f , include linear OLS, Box-Cox,
exponential specifications [16] and conditional density estimators [5]. The advantage of
Eq. (2) is that it allows direct computation of the primary effect of interest—the marginal
effect of one covariate xk on actual expenditures: ∂E[y]/∂ xk = β1 . The disadvantages are
that it does not allow modeling of two potentially distinct aspects of health care expenditures
(access and quantity), and that it may have higher mean square error [3].

3.2. Heckit model

The Heckit model (also known as the two-step selection model, the adjusted Tobit, or the
Limited Information Maximum Likelihood selection estimator) consists of two equations.
The first equation is a probit estimator of the probability of having a positive outcome (the
selection equation), and the second equation is an OLS estimator of expenditures among
the sub-sample with y > 0 (the conditional equation):

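\begin{align*}
\Pr[y > 0 \mid X] &= \Phi(X\beta_2, \varepsilon_2) \tag{3} \\
E[y \mid y > 0, X] &= X\beta_3 + E[\varepsilon_3 \mid y > 0, X] \tag{4} \\
&= X\beta_3 + \rho\sigma_3\,\lambda(X\beta_2) \tag{5}
\end{align*}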

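The inverse Mills ratio $\lambda(X\beta_2) = \phi(X\beta_2)/\Phi(X\beta_2)$ is used to estimate
$E[\varepsilon_3 \mid y > 0, X] = \rho\sigma_3\lambda(X\beta_2)$, where $(\varepsilon_2, \varepsilon_3) \sim N(0, \Sigma)$ and

\[ \Sigma = \begin{pmatrix} 1 & \rho\sigma_3 \\ \rho\sigma_3 & \sigma_3^2 \end{pmatrix}. \tag{6} \]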

The Heckit model assuming error normality has been the dominant selection model in the
literature. This is despite the fact that already by the early 1980s alternative non-normal
selection models had been proposed [11, 12].
In practice, using the Heckit model requires sufficient variation to identify the X coef-
ficient separately from the inverse Mills coefficient (ρσ3 ) in Eq. (4). If the model is not
well identified, the resulting multicollinearity in Eq. (4) will cause parameter instability
that leads to high mean square error, and it will also cause the t-test on the inverse Mills
term to perform poorly [13].
Identification of the Heckit model can come from two sources. One is the use of instru-
ments in the selection Eq. (3), i.e., imposing the exclusion restriction assumption that some
components of the X vector have coefficients β3 equal to zero in the conditional Eq. (4).
However, such exclusion assumptions are often unavailable or hard to defend, particularly
in health expenditure models for which the determinants of zero expenditures are often the
same as the determinants of the amount of positive expenditures. In the absence of exclu-
sion restrictions, the second potential source of identification is functional form. If the X
range is sufficiently wide, then Leung and Yu [13] indicate that the Heckit can perform well
with functional form identification. With a smaller X range, however, the model will not be
sufficiently identified, and the multicollinearity between X and the inverse Mills coefficient
will cause the Heckit model to perform poorly.
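
The functional-form identification problem can be made concrete by checking how close to linear the inverse Mills ratio is over a given x range. The following is a minimal sketch and is not taken from the article: it assumes a single covariate with a selection coefficient of 1 and a hypothetical intercept of -1.5, and the helper names (`inv_mills`, `corr_x_mills`) are illustrative only.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_mills(z):
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z)."""
    return norm_pdf(z) / norm_cdf(z)

def corr_x_mills(x_max, intercept=-1.5, slope=1.0, n=2001):
    """Pearson correlation between x and lambda(intercept + slope * x),
    evaluated on an evenly spaced grid over [0, x_max].
    The intercept and slope defaults are hypothetical."""
    xs = [x_max * i / (n - 1) for i in range(n)]
    ls = [inv_mills(intercept + slope * x) for x in xs]
    mx, ml = sum(xs) / n, sum(ls) / n
    sxy = sum((x - mx) * (l - ml) for x, l in zip(xs, ls))
    sxx = sum((x - mx) ** 2 for x in xs)
    sll = sum((l - ml) ** 2 for l in ls)
    return sxy / math.sqrt(sxx * sll)

narrow = corr_x_mills(3.0)   # x on [0, 3]
wide = corr_x_mills(10.0)    # x on [0, 10]
print(f"corr(x, lambda) on [0, 3]:  {narrow:.3f}")
print(f"corr(x, lambda) on [0, 10]: {wide:.3f}")
```

Under these assumed values the correlation is much closer to -1 on the narrow range than on the wide one, which is the sense in which a narrow X range leaves little independent variation to separate the inverse Mills term from X.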
Potential outcomes in Eq. (7) cannot be directly estimated because the censoring of y ∗
may cause selection bias (y ∗ = y if y > 0, y ∗ = unobserved otherwise).

\[ E[y^* \mid X] = X\beta^* \tag{7} \]

Under explicit error distributional assumptions, however, inclusion of the inverse Mills
term in Eq. (4) corrects this selection bias, allowing β3 to be interpreted as a consistent
estimate of β ∗ . In this case, β3k is the potential marginal effect of a change in xk on potential
expenditures, β3k = ∂E [y ∗ ] /∂ xk . When researchers focus attention on β3k only, they are
implicitly analyzing the potential marginal effect, not the actual.
Recovery of actual outcomes y from the Heckit requires a more complex calculation
involving both Eqs. (3) and (4):

\[ E[y \mid X] = \Phi(X\beta_2)\,[X\beta_3 + \rho\sigma_3\,\lambda(X\beta_2)] \tag{8} \]

For the Heckit model with a linear y in the main Eq. (4), the resulting actual marginal effect
is the derivative of Eq. (8):
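\begin{align*}
m_{\text{Heckit,linear}} &= \frac{\partial E[y]}{\partial x_k} \\
&= \beta_{3k}\,\Phi(X\beta_2) + \beta_{2k}\,\phi(X\beta_2)\,[X\beta_3 - \rho\sigma_3 X\beta_2] \tag{9}
\end{align*}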
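
As a hedged illustration of Eqs. (8) and (9), the sketch below evaluates the Heckit actual mean and actual marginal effect at a single covariate vector and checks the analytic derivative against a central finite difference. Here Φ and φ are taken to be the standard normal distribution and density functions. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(x, b):
    """Linear index X*beta for list-valued x and b."""
    return sum(xi * bi for xi, bi in zip(x, b))

def heckit_linear_mean(x, b2, b3, rho_sigma3):
    """Eq. (8): Phi(X b2) * [X b3 + rho*sigma3 * lambda(X b2)]."""
    z = dot(x, b2)
    lam = norm_pdf(z) / norm_cdf(z)
    return norm_cdf(z) * (dot(x, b3) + rho_sigma3 * lam)

def heckit_linear_me(x, b2, b3, rho_sigma3, k):
    """Eq. (9): actual marginal effect of x[k] on E[y]."""
    z = dot(x, b2)
    return (b3[k] * norm_cdf(z)
            + b2[k] * norm_pdf(z) * (dot(x, b3) - rho_sigma3 * z))

# Hypothetical values: a constant plus one continuous covariate.
x = [1.0, 1.5]
b2 = [-1.0, 1.0]
b3 = [-1.0, 1.0]
rho_sigma3 = 0.5
k = 1

analytic = heckit_linear_me(x, b2, b3, rho_sigma3, k)

# Central finite difference of Eq. (8) with respect to x[k].
h = 1e-6
x_up = [x[0], x[1] + h]
x_dn = [x[0], x[1] - h]
numeric = (heckit_linear_mean(x_up, b2, b3, rho_sigma3)
           - heckit_linear_mean(x_dn, b2, b3, rho_sigma3)) / (2.0 * h)

print(f"actual marginal effect, Eq. (9): {analytic:.6f}")
print(f"finite-difference check:         {numeric:.6f}")
print(f"potential marginal effect b3[k]: {b3[k]:.6f}")
```

With these assumed inputs the actual marginal effect is visibly smaller than the coefficient β3k, which is the distinction between actual and potential effects drawn in the text.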

Many applications specify the dependent variable in the main Eq. (4) as ln(y) rather than
y. For these log models, van de Ven and van Praag [20] have derived retransformed E[y] in
the case where the error term has a normal distribution and is homoskedastic (this problem
is far more complicated if the error term is not normal or not homoskedastic [1, 14]):

\[ E[y] = \Phi(X\beta_2 + \rho\sigma_3)\,\exp\!\left(X\beta_3 + 0.5\sigma_3^2\right) \tag{10} \]

From Eq. (10) the marginal effect can be derived in log models:

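\begin{align*}
m_{\text{Heckit,log}} &= \frac{\partial E[y]}{\partial x_k} \\
&= \beta_{3k}\,E[y] + \beta_{2k}\,\phi(X\beta_2 + \rho\sigma_3)\,\exp\!\left(X\beta_3 + 0.5\sigma_3^2\right) \\
&= [\beta_{3k} + \beta_{2k}\,\lambda(X\beta_2 + \rho\sigma_3)]\,E[y] \tag{11}
\end{align*}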

Some researchers prefer to instead report the elasticity η when using a log dependent variable
specification:

\[
\eta_{\text{Heckit,log}} = \frac{\partial E[y]}{\partial x_k} \times \frac{x_k}{E[y]}
= [\beta_{3k} + \beta_{2k}\,\lambda(X\beta_2 + \rho\sigma_3)]\,x_k \tag{12}
\]
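
The log-specification formulas in Eqs. (10) to (12) can be sketched in the same way. This is an illustrative sketch under the normal, homoskedastic error assumption stated above; the function names and parameter values are hypothetical. It also shows that the elasticity in Eq. (12) does not depend on the exponentiated variance term.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def inv_mills(z):
    """Inverse Mills ratio lambda(z) = phi(z) / Phi(z)."""
    return norm_pdf(z) / norm_cdf(z)

def dot(x, b):
    """Linear index X*beta for list-valued x and b."""
    return sum(xi * bi for xi, bi in zip(x, b))

def heckit_log_mean(x, b2, b3, rho_sigma3, sigma3):
    """Eq. (10): retransformed E[y] for a ln(y) outcome equation."""
    return (norm_cdf(dot(x, b2) + rho_sigma3)
            * math.exp(dot(x, b3) + 0.5 * sigma3 ** 2))

def heckit_log_me(x, b2, b3, rho_sigma3, sigma3, k):
    """Eq. (11): actual marginal effect of x[k] on E[y]."""
    z = dot(x, b2) + rho_sigma3
    return ((b3[k] + b2[k] * inv_mills(z))
            * heckit_log_mean(x, b2, b3, rho_sigma3, sigma3))

def heckit_log_elasticity(x, b2, b3, rho_sigma3, k):
    """Eq. (12): actual elasticity; sigma3^2 is not exponentiated here."""
    z = dot(x, b2) + rho_sigma3
    return (b3[k] + b2[k] * inv_mills(z)) * x[k]

# Hypothetical values: a constant plus one continuous covariate.
x, b2, b3 = [1.0, 1.5], [-1.0, 1.0], [-1.0, 1.0]
rho_sigma3, sigma3, k = 0.5, 1.0, 1

mean_y = heckit_log_mean(x, b2, b3, rho_sigma3, sigma3)
me = heckit_log_me(x, b2, b3, rho_sigma3, sigma3, k)
eta = heckit_log_elasticity(x, b2, b3, rho_sigma3, k)

# Central finite difference of Eq. (10) with respect to x[k].
h = 1e-6
numeric = (heckit_log_mean([x[0], x[1] + h], b2, b3, rho_sigma3, sigma3)
           - heckit_log_mean([x[0], x[1] - h], b2, b3, rho_sigma3, sigma3)) / (2.0 * h)

print(f"E[y], Eq. (10):            {mean_y:.6f}")
print(f"marginal effect, Eq. (11): {me:.6f} (finite difference {numeric:.6f})")
print(f"elasticity, Eq. (12):      {eta:.6f}")
```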

For each of these Heckit marginal effects and elasticities, Eq. (8) shows that xk affects
actual E [y] three ways: through its effect on the selection equation (captured by β2 in
Eq. (3)), through its direct effect in the conditional equation (captured by β3 in Eq. (4)),
and through its indirect effect through the inverse Mills ratio λ (captured by ρσ3 in Eq. (4)).
Correspondingly, the statistical significance of the marginal effect and elasticity depends
on the standard errors, variances, and covariances of all of these parameters. Because xk
and λ are highly collinear in many applications, their coefficients will be correlated, and as
a result β3 will often be less precisely estimated than the actual marginal effect or actual
elasticity. Researchers who focus only on the statistical significance and magnitude of β3
will fail to reject the null hypothesis too frequently when xk truly does affect E[y].

3.3. Two-part model

Like the Heckit, the 2PM has two equations, the first being on the entire sample, and the
second on a subset:

\begin{align*}
\Pr[y > 0 \mid X] &= \Phi(X\beta_2, \varepsilon_2) \tag{13} \\
E[y \mid y > 0, X] &= X\beta_4 + E[\varepsilon_4 \mid y > 0, X] = X\beta_4 \tag{14}
\end{align*}

Equation (13) is identical to the Heckit selection Eq. (3). Equation (14) has almost the
same right-hand side specification as the Heckit conditional Eq. (4), except the 2PM does
not include the inverse Mills term. As a result of excluding this inverse Mills term, the 2PM
is conceptually inappropriate for estimating potential outcomes; β4 will only equal β ∗ in
the special case where there is no selection bias (ρ = 0). The 2PM is appropriate, however,
for estimating actual outcomes:

\[ E[y \mid X] = \Phi(X\beta_2)\,[X\beta_4]. \tag{15} \]

Similar to the Heckit, estimation of ∂E[y]/∂ xk for the 2PM requires a derivative of the
combination of Eqs. (13) and (14), again using the decomposition in Eq. (1). When the
dependent variable in Eq. (14) is linear y, the equation for the marginal effect analogous to
Eq. (9) is:

\[
m_{\text{2PM,linear}} = \frac{\partial E[y]}{\partial x_k}
= \beta_{4k}\,\Phi(X\beta_2) + \beta_{2k}\,\phi(X\beta_2)\,[X\beta_4] \tag{16}
\]

For the corresponding 2PM with ln(y) and normal homoskedastic errors ε4 , actual outcomes
are predicted by (again, this is more complex if the error term is either not normal or not
homoskedastic):
 
\[ E[y] = \Phi(X\beta_2)\,\exp\!\left(X\beta_4 + 0.5\sigma_4^2\right) \tag{17} \]

and the marginal effect is:

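\begin{align*}
m_{\text{2PM,log}} = \frac{\partial E[y]}{\partial x_k}
&= \beta_{4k}\,E[y] + \beta_{2k}\,\phi(X\beta_2)\,\exp\!\left(X\beta_4 + 0.5\sigma_4^2\right) \\
&= [\beta_{4k} + \beta_{2k}\,\lambda(X\beta_2)]\,E[y] \tag{18}
\end{align*}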

Finally, the elasticity for the 2PM log specification is given by:

\[
\eta_{\text{2PM,log}} = \frac{\partial E[y]}{\partial x_k} \times \frac{x_k}{E[y]}
= [\beta_{4k} + \beta_{2k}\,\lambda(X\beta_2)]\,x_k \tag{19}
\]
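
The 2PM counterparts in Eqs. (15) to (19) follow the same pattern without the inverse Mills correction. The sketch below is illustrative only: it reads the exponent in Eq. (17) as the full linear index Xβ4 (an assumption made for consistency with Eq. (10)), and the function names and parameter values are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def dot(x, b):
    """Linear index X*beta for list-valued x and b."""
    return sum(xi * bi for xi, bi in zip(x, b))

def tpm_linear_mean(x, b2, b4):
    """Eq. (15): E[y | X] = Phi(X b2) * X b4."""
    return norm_cdf(dot(x, b2)) * dot(x, b4)

def tpm_linear_me(x, b2, b4, k):
    """Eq. (16): actual marginal effect, linear y."""
    z = dot(x, b2)
    return b4[k] * norm_cdf(z) + b2[k] * norm_pdf(z) * dot(x, b4)

def tpm_log_mean(x, b2, b4, sigma4):
    """Eq. (17): retransformed E[y] for ln(y), normal homoskedastic errors."""
    return norm_cdf(dot(x, b2)) * math.exp(dot(x, b4) + 0.5 * sigma4 ** 2)

def tpm_log_me(x, b2, b4, sigma4, k):
    """Eq. (18): actual marginal effect, ln(y) specification."""
    z = dot(x, b2)
    lam = norm_pdf(z) / norm_cdf(z)
    return (b4[k] + b2[k] * lam) * tpm_log_mean(x, b2, b4, sigma4)

def tpm_log_elasticity(x, b2, b4, k):
    """Eq. (19): actual elasticity, ln(y) specification."""
    z = dot(x, b2)
    lam = norm_pdf(z) / norm_cdf(z)
    return (b4[k] + b2[k] * lam) * x[k]

# Hypothetical values: a constant plus one continuous covariate.
x, b2, b4, sigma4, k = [1.0, 1.5], [-1.0, 1.0], [-1.0, 1.0], 1.0, 1

me_lin = tpm_linear_me(x, b2, b4, k)
me_log = tpm_log_me(x, b2, b4, sigma4, k)
eta_log = tpm_log_elasticity(x, b2, b4, k)

# Central finite differences of Eqs. (15) and (17) with respect to x[k].
h = 1e-6
up, dn = [x[0], x[1] + h], [x[0], x[1] - h]
fd_lin = (tpm_linear_mean(up, b2, b4) - tpm_linear_mean(dn, b2, b4)) / (2.0 * h)
fd_log = (tpm_log_mean(up, b2, b4, sigma4) - tpm_log_mean(dn, b2, b4, sigma4)) / (2.0 * h)

print(f"linear M.E., Eq. (16): {me_lin:.6f} (finite difference {fd_lin:.6f})")
print(f"log M.E., Eq. (18):    {me_log:.6f} (finite difference {fd_log:.6f})")
print(f"log elasticity, Eq. (19): {eta_log:.6f}")
```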

In empirical applications, the true data generating process is always unknown. If the
Heckit is the true model, then the 2PM will be mis-specified because it assumes linear
X in Eq. (14), and omits the non-linear X term represented by the inverse Mills ratio.
Comparison of the actual y formula in the Heckit Eq. (8) and the 2PM Eq. (15) reveals
that when modeling actual outcomes, this mis-specification is analogous to omitting a
single higher order term in X from a regression equation. In previous Monte Carlo work
by Leung and Yu [13], Manning et al. [15], and Hay et al. [7], bias in the 2PM when
the Heckit is the true model is frequently outweighed by the higher relative efficiency
of the 2PM. This relative efficiency arises from the fact that the inverse Mills term is not
included in the 2PM, and hence there is no multicollinearity problem. These previous Monte
Carlo studies, however, have not specifically analyzed the estimation of the marginal effect
for actual outcomes. In Section 4 we present additional Monte Carlo evidence that yields
important insights regarding the use of the 2PM and Heckit models to recover actual marginal
effects.

4. A Monte Carlo example

We illustrate the above issues with a Monte Carlo example based on the designs used
by Leung and Yu [13]. The data are drawn assuming that the Heckman sample selection
model (Eqs. (3) and (4)) is the true data generating process. Potential and actual marginal
effects and elasticities are reported for linear (Table 1) and log specifications (Table 2)
estimated by:
1. 2PM,
2. Heckit (2-step Limited Information Maximum Likelihood sample selection model),
3. FIML (Full Information Maximum Likelihood sample selection model).

Table 1. Actual and potential marginal effects from Monte Carlo simulation: Linear specification.

                                                    Estimated marginal effects (standard error)
Monte Carlo design                                                       Heckman selection models
True    x          Percent     True M.E.            2PM                  Heckit               FIML
model   range      censored    Potential Actual     Potential Actual     Potential Actual     Potential Actual

Heckit  U[0, 3]    75          1         .20        .64       .19        1.10      .20        .82       .19
                                                    (.09)     (.03)      (1.08)    (.03)      (.38)     (.03)
Heckit  U[0, 3]    25          1         .80        .82       .79        1.01      .80        .97       .80
                                                    (.04)     (.03)      (.17)     (.03)      (.13)     (.03)
Heckit  U[0, 10]   75          1         .25        .79       .24        1.01      .25        .97       .25
                                                    (.05)     (.02)      (.20)     (.02)      (.18)     (.02)
Heckit  U[0, 10]   25          1         .75        .95       .74        1.00      .75        1.00      .75
                                                    (.02)     (.01)      (.02)     (.01)      (.02)     (.01)

Notes: 1. M.E. = marginal effect. 2. There are 200 iterations for each design, N = 1000, ρ = .5.

Table 2. Actual marginal effects and elasticities from Monte Carlo simulation: Log specification.

Monte Carlo design             True actual         Estimated actual M.E.    Estimated actual elasticity
                                                   (standard errors)        (standard errors)
True    x          Percent
model   range      censored    M.E.    Elasticity  2PM     Heckit  FIML     2PM     Heckit  FIML

Heckit  U[0, 3]    75          .84     2.70        .80     25.64   .84      2.66    3.07    2.72
                                                   (.09)   (253)   (.12)    (.20)   (.58)   (.21)
Heckit  U[0, 3]    25          6.03    1.67        5.43    6.50    6.01     1.57    1.71    1.67
                                                   (.42)   (1.52)  (.64)    (.07)   (.15)   (.10)
Heckit  U[0, 10]   75          2.01    12.2        1.74    2.19    2.03     12.8    12.5    12.5
                                                   (.23)   (.73)   (.38)    (1.1)   (1.1)   (1.1)
Heckit  U[0, 10]   25          298     5.44        266     302     301      5.43    5.45    5.45
                                                   (26)    (33)    (32)     (.13)   (.13)   (.13)

Notes: 1. M.E. = marginal effect. 2. There are 200 iterations for each design, N = 1000, ρ = .5.

Although FIML is used less commonly in the literature than the Heckit, we include it here
because previous evidence has shown that it can significantly outperform the Heckit when
the sample selection model is the true data generating process. All of the Heckit formulas
apply to the FIML estimator as well.
Following Leung and Yu [13], our models have a single continuous explanatory variable
xk in addition to the constant term. We assume no exclusion restrictions, so the same
variables are used in both equations of each model. Model designs are estimated based on
two different ranges of the x-support: xk ∼ [0, 3] and xk ∼ [0, 10]. Leung and Yu [13] show
that the Heckit estimator performs poorly with xk ∼ [0, 3] because the narrow support of
xk is insufficient to allow identification of the inverse Mills ratio (a non-linear function of
xk ) separately from the xk variable in the second stage. Under xk ∼ [0, 10], however, there
is sufficient non-linear variation for the Heckit to be well identified by functional form with
no identifying exclusion restrictions.
The errors in Eqs. (3) and (4) are drawn from a bivariate normal distribution with corre-
lation ρ = 0.5:

     
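\[
\begin{pmatrix} \varepsilon_2 \\ \varepsilon_3 \end{pmatrix}
\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & 0.5 \\ 0.5 & 1 \end{pmatrix} \right) \tag{20}
\]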

The true coefficient for xk is set equal to one (βk = 1) in both equations of the model,
thus the true potential marginal effect is unity with linear y. The intercept is varied to allow
for either 75% or 25% censoring, but again is set to have the same value in both equations
of the model, following Leung and Yu [13]. We drew 1000 observations on ε2 , ε3 , and
xk for each of 200 iterations, and used these same draws to estimate each of the reported
designs. We estimate standard errors as the standard deviations of the 200 estimates of each
statistic.
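
The data generating process described above can be sketched as follows. This is a simplified illustration and not the authors' simulation code: it assumes x is uniform on [0, x_max], calibrates the common intercept by bisection so that the expected share of zeros matches the target, and records a separate indicator `d` for a positive (observed) outcome. The function names and the seed are hypothetical.

```python
import math
import random

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_prob_positive(intercept, x_max):
    """Average of Phi(intercept + x) for x ~ U[0, x_max], in closed form,
    using the antiderivative G(z) = z * Phi(z) + phi(z)."""
    def g(z):
        return z * norm_cdf(z) + norm_pdf(z)
    return (g(intercept + x_max) - g(intercept)) / x_max

def calibrate_intercept(share_censored, x_max):
    """Bisection for the intercept that yields the target share of zeros."""
    target = 1.0 - share_censored
    lo, hi = -30.0, 30.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_prob_positive(mid, x_max) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(n, x_max, share_censored, rho=0.5, beta=1.0, seed=0):
    """Draw (x, d, y): d = 1 if the selection index is positive, and y equals
    the potential outcome when d = 1 and zero otherwise (linear specification)."""
    rng = random.Random(seed)
    a = calibrate_intercept(share_censored, x_max)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, x_max)
        e2 = rng.gauss(0.0, 1.0)
        # e3 has unit variance and correlation rho with e2, as in Eq. (20).
        e3 = rho * e2 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        d = 1 if a + beta * x + e2 > 0 else 0
        y_star = a + beta * x + e3
        rows.append((x, d, y_star if d else 0.0))
    return rows

rows = simulate(n=20000, x_max=3.0, share_censored=0.75)
share_zero = 1.0 - sum(d for _, d, _ in rows) / len(rows)
print(f"calibrated intercept: {calibrate_intercept(0.75, 3.0):.3f}")
print(f"simulated share of zeros: {share_zero:.3f}")
```

The sketch uses 20,000 draws so that the simulated censoring share is stable; the article's designs use N = 1000 per iteration.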
A number of insights are apparent from this simulation. First, the magnitude of the
potential marginal effect is different from that of the actual marginal effect, in many cases
by several hundred percent (Table 1). Confusing potential with actual marginal effects can
result in large errors.
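
To see where the gap between potential and actual marginal effects comes from, the sketch below averages Eq. (9) over the x distribution for each design. It is a hedged illustration: it assumes the "true actual" marginal effect is the average of Eq. (9) over x uniform on [0, x_max], with unit coefficients, σ3 = 1, ρ = 0.5 and a common intercept calibrated to the censoring share. The function names are hypothetical.

```python
import math

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def calibrate_intercept(share_censored, x_max):
    """Bisection for the intercept a such that the average of Phi(a + x)
    over x ~ U[0, x_max] equals 1 - share_censored."""
    def g(z):
        return z * norm_cdf(z) + norm_pdf(z)   # antiderivative of Phi
    target = 1.0 - share_censored
    lo, hi = -30.0, 30.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if (g(mid + x_max) - g(mid)) / x_max < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def true_actual_me(x_max, share_censored, rho_sigma3=0.5, beta=1.0, n=20000):
    """Average of Eq. (9) over a midpoint grid on [0, x_max], for a design in
    which both equations share the same intercept and slope."""
    a = calibrate_intercept(share_censored, x_max)
    total = 0.0
    for i in range(n):
        x = x_max * (i + 0.5) / n
        z = a + beta * x                       # X*beta2 = X*beta3 here
        total += beta * norm_cdf(z) + beta * norm_pdf(z) * (z - rho_sigma3 * z)
    return total / n

results = {
    (x_max, share): true_actual_me(x_max, share)
    for x_max in (3.0, 10.0)
    for share in (0.75, 0.25)
}
for (x_max, share), me in results.items():
    print(f"x ~ U[0, {x_max:g}], {share:.0%} censored: "
          f"potential M.E. = 1, actual M.E. = {me:.3f}")
```

Under these assumptions the averaged actual effects come out close to the "True M.E. Actual" column of Table 1, while the potential effect is 1 in every design.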
Second, when estimating the Heckit model, the statistical significance of βk (the po-
tential marginal effect in the linear specification) can be very different from the statistical
significance of the properly calculated actual marginal effect. In our example this is most
apparent in the design for which the Heckit is most poorly identified: xk ∼ [0, 3] with 75%
censoring (Table 1). In this design βk is statistically insignificant ($\beta_k^{\text{Heckit,linear}}$ = 1.10,
standard error (s.e.) = 1.08), while the actual marginal effect of xk on y is highly significant
($m_k^{\text{Heckit,linear}}$ = 0.20, s.e. = 0.03). Failing to compute the correct actual marginal effect
would lead to the common mistake of inferring that xk does not influence y when it actually
does.
Third, the usual specification test of whether the inverse Mills coefficient is significant
may exacerbate the previous problem when $\beta_k^{\text{Heckit,linear}}$ is analyzed instead of $m_k^{\text{Heckit,linear}}$.
The large standard error on βk in this design arises from multicollinearity between xk and
the inverse Mills ratio, the result of the Heckit model being poorly identified. But such
multicollinearity causes parameter instability, with βk being highly correlated with the
inverse Mills coefficient. When this instability causes βk to be unusually small (and hence
appear insignificant), the inverse Mills coefficient will be unusually large. Thus the t-test
of the inverse Mills coefficient will incorrectly reject the 2PM in favor of the Heckit in
exactly those models in which the t-statistic on βk is unusually small. This is related to the
unsatisfactory performance of this t-test that was noted by Leung and Yu [13]. In Section 5
of this paper we offer an alternative model test that does not suffer from this problem.
Fourth, for linear y, the actual marginal effects and standard errors are virtually identical
across the estimators (Table 1). This is not surprising, because in comparison to the 2PM, the
selection estimators simply add one additional covariate (the inverse Mills ratio) that is just
a non-linear transform of the variables already in the model. (This explains the motivation
for the data-analytic two-part model also used by Manning et al. [15], in which higher order
terms of X are added to the 2PM based on Mallows’ Cp criterion of testing their statistical
significance at t = 1.41.) For linear models, weak identification of βk separately from the
inverse Mills coefficient in the Heckit model does not adversely affect the estimate of the
actual marginal effects.
Fifth, the actual marginal effects do differ across models when the dependent variable is
specified in log form (Table 2). For log models, the Heckit marginal effect has extremely
high variance when the model is poorly identified. For example, in the log design with
75% censoring and xk ∼ [0, 3], the Heckit marginal effect estimate ($m_k^{\text{Heckit,log}}$ = 25.64,
s.e. = 253) is orders of magnitude larger than in the 2PM ($m_k^{\text{2PM,log}}$ = 0.80, s.e. = 0.09) or
FIML ($m_k^{\text{FIML,log}}$ = 0.84, s.e. = 0.12). Further exploration indicates that this is primarily
due to high variance of the Heckit estimate of σ32 after re-transforming errors from log to
linear y space. Manning et al. [15] also noted this phenomenon. Recall from Eq. (4) that
estimated ε3 includes the inverse Mills term, thus instability of the inverse Mills coefficient
arising from poor identification leads to occasional large errors ε3 in log specifications.
Exponentiation of those large errors in the re-transformation to linear y is what causes the
high variance of the Heckit estimate of the marginal effect. By not attempting to explicitly
estimate the correlation between ε2 and ε3 , the 2PM circumvents this problem. As noted in
previous Monte Carlo simulations comparing 2PM to Heckman selection models [13, 15],
the FIML estimator also appears much more robust to this problem, with a more efficient
estimator of σ32 .
Sixth, the high variance in the Heckit log model is ameliorated when estimating the
elasticity rather than the marginal effect (Table 2). While the Heckit elasticity standard error
($\eta_k^{\text{Heckit,log}}$ = 3.07, s.e. = 0.58) is more than double that of the 2PM ($\eta_k^{\text{2PM,log}}$ = 2.66, s.e. = 0.20) or
FIML ($\eta_k^{\text{FIML,log}}$ = 2.72, s.e. = 0.21) for this poorly identified design, the Heckit elasticity
bias is less than 15% relative to the truth ($\eta_k^{\text{True,log}}$ = 2.70). The reason that Heckit performs
better at estimating the elasticity than at estimating the marginal effect in log models can be
seen by noting that the Heckman elasticity Eq. (12) does not require exponentiation of σ32 .
This suggests that when estimating the Heckit model with a log dependent variable, it may be
preferable to focus estimation on the elasticity rather than the marginal effect. It also suggests
the importance of choosing appropriately between the linear and log specifications. While
others have highlighted the danger of ignoring the skewness in variables such as positive
health care expenditures, our analysis indicates that there may also be a substantial cost to
incorrectly assuming log specifications.

5. Choosing between the Heckit and the 2PM

Having discussed theoretical and computational issues surrounding appropriate estimation
of the marginal effects of interest, we now turn to statistical tests to choose between the
Heckit and the 2PM. The standard specification test to choose between these models is
a t-test of the coefficient (ρσ3 ) on the inverse Mills ratio λ in Eq. (4) (e.g. [21]). If the
coefficient is zero then the Heckit reduces exactly to the 2PM. Duan and colleagues [4]
show that the converse is not necessarily true, i.e., the 2PM does not require that ρ = 0. The
models simply make different implicit distributional assumptions, and are not in general
nested. Nevertheless, as Leung and Yu [13] argue, a test of ρ = 0 in the Heckit can be used
to test the null hypothesis that the 2PM is correct against the alternative hypothesis that the
Heckit is correct, and thus the models are partially nested.
However, this t-test should be used with caution because it is sensitive to multicollinear-
ity. Leung and Yu [13] show that this t-test is not reliable when the inverse Mills ratio λ
is highly collinear with the independent variables X , such as when there are no identifying
instruments, as is common in health economics applications. If a researcher conducts the
standard t-test, the degree of multicollinearity between λ and X should be reported—Leung
and Yu recommend the condition number, but other measures may also be useful such as
variance inflation factors, or correlations from the variance-covariance matrix of the coeffi-
cient estimates. The drawback of relying on such multicollinearity indicators is that they do
not provide guidance regarding how much multicollinearity is too much, i.e., how in practice
to trade-off parameter instability with bias when choosing between the Heckit and 2PM.
In Monte Carlo studies, such as the one by Leung and Yu [13], an important statistical
criterion used to choose between the Heckit and 2PM is the mean square error (MSE) of the
parameter of interest. The MSE is the variance plus the square of the bias, and therefore its
calculation requires knowledge of the true parameter to compute the bias. Unfortunately,
this MSE criterion cannot be used in empirical applications because the true parameter
values are unknown. However, Toro-Vizcarrondo and Wallace [19] suggested a test based
on the MSE criterion for empirical applications with multicollinearity (see additional notes
on this test in Greene [6]). We refer to this as an Empirical MSE test.
The intuition is to calculate the Empirical MSE of both estimators of interest, assuming
that the Heckit model is consistent (and hence correct), and then to choose the estimator
with the lower Empirical MSE. Therefore, the Heckit is treated as the null estimator and
assumed to have zero bias. The original test statistic was derived for OLS models, but the
intuition can be extended to the Heckit and 2PM. We focus attention on the actual marginal
effects $m_{\text{Heckit}}$ and $m_{\text{2PM}}$ as defined earlier. Under these assumptions, the Empirical MSE
(EMSE) for the Heckit simplifies to the variance of the estimated actual marginal effect:

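$$\mathrm{EMSE}(m_{\text{Heckit}}) = \mathrm{Var}(m_{\text{Heckit}}) + (m_{\text{Heckit}} - m_{\text{Heckit}})^2 = \mathrm{Var}(m_{\text{Heckit}}) \qquad (21)$$

The Empirical MSE for the 2PM is the sum of the variance and the square of the bias,
assuming that the Heckit is correct:

$$\mathrm{EMSE}(m_{\text{2PM}}) = \mathrm{Var}(m_{\text{2PM}}) + (m_{\text{2PM}} - m_{\text{Heckit}})^2 \qquad (22)$$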
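
The decision rule implied by Eqs. (21) and (22) can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names and the numbers in the usage example are hypothetical, the variance estimates are taken as given (how they are obtained is outside this sketch), and resolving an exact tie in favor of the 2PM is an arbitrary assumption.

```python
def empirical_mse(m_hat, var_hat, m_heckit):
    """Empirical MSE: variance plus squared deviation from the Heckit estimate.

    The Heckit estimate plays the role of the unknown true value, so the
    Heckit's own empirical bias is zero by construction (Eq. 21).
    """
    return var_hat + (m_hat - m_heckit) ** 2


def choose_model(m_heckit, var_heckit, m_2pm, var_2pm):
    """Return the preferred model and both Empirical MSEs (Eqs. 21 and 22)."""
    emse_heckit = empirical_mse(m_heckit, var_heckit, m_heckit)
    emse_2pm = empirical_mse(m_2pm, var_2pm, m_heckit)
    # Assumption: an exact tie is resolved in favor of the 2PM.
    preferred = "Heckit" if emse_heckit < emse_2pm else "2PM"
    return preferred, emse_heckit, emse_2pm


# Hypothetical estimates of the actual marginal effect and its variance.
preferred, emse_heckit, emse_2pm = choose_model(
    m_heckit=0.30, var_heckit=0.04, m_2pm=0.25, var_2pm=0.01
)
print(preferred, emse_heckit, emse_2pm)
```

In this hypothetical case the 2PM estimate differs from the Heckit estimate by 0.05, which adds only 0.0025 to its Empirical MSE, so its lower variance dominates.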

Table 3. Monte Carlo evidence comparing model choice using True MSE of elasticity vs. Empirical MSE.

Monte Carlo design                                 Bias            MSE             Preferred model
True     x          Percent
model    range      censored  Estimator  Variance  True    Emp.    True    Emp.    TMSE     EMSE

Heckit   U[0, 3]    75        Heckit     .451      .291    0       .535    .451
                              2PM        .059      .037    −.254   .061    .124    2PM      2PM
                    50        Heckit     .041      .096    0       .050    .041
                              2PM        .015      .035    −.061   .016    .019    2PM      2PM
                    25        Heckit     .018      .046    0       .020    .018
                              2PM        .005      −.021   −.068   .006    .010    2PM      2PM
Heckit   U[0, 10]   75        Heckit     5.479     .328    0       5.586   5.479   Heckit   Heckit
                              2PM        4.284     1.466   1.139   6.434   5.580
                    50        Heckit     .270      .087    0       .277    .270    Heckit   Heckit
                              2PM        .195      1.064   .977    1.327   1.149
                    25        Heckit     .013      .006    0       .013    .013    Heckit   Heckit
                              2PM        .009      −.156   −.162   .033    .035
2PM      U[0, 10]   75        Heckit     5.171     .262    0       5.240   5.171
                              2PM        4.112     .240    −.022   4.170   4.113   2PM      2PM
                    50        Heckit     .326      .089    0       .334    .326
                              2PM        .146      .060    −.029   .150    .147    2PM      2PM
                    25        Heckit     .007      −.003   0       .007    .007
                              2PM        .006      −.006   −.003   .006    .006    2PM      2PM

Notes: 1. Emp. = Empirical, TMSE = True MSE, EMSE = Empirical MSE. The preferred model is marked on the row
of the preferred estimator. 2. The True Bias and True MSE are taken from Leung and Yu [13] Tables 1, 2, and 10.
All other quantities are derived from these based on the formula: MSE = Variance + Bias^2.

Assuming that the Heckit is consistent and has no bias does not mean that
$\mathrm{EMSE}(m_{\text{Heckit}}) < \mathrm{EMSE}(m_{\text{2PM}})$ is a foregone conclusion. In the Monte Carlo analysis by Leung
and Yu ([13], Table 10) the 2PM often had lower True MSE than the Heckit, even when the
Heckit was assumed to be true. In such cases, we assert that our proposed Empirical MSE
criterion should be useful for choosing between the two models.
To provide evidence on the performance of the Empirical MSE criterion, we compare
its values to the True MSE criterion using results reported in Leung and Yu [13]. In their
Tables 2, 3, and 10, Leung and Yu provide Monte Carlo evidence for several permutations
of the Heckit and 2PM, with a single uniformly distributed covariate X , and ρ = 0.5. We
present results derived from three of their reported simulations. The first two assume that
the Heckit is the true model—first the high multicollinearity model with X ∼ U [0, 3],
then the lower multicollinearity version with X ∼ U [0, 10]. The third simulation assumes
that the 2PM is the true model (they only report results with X ∼ U [0, 3]). We focus
on their “Elasticity Bias” and “Elasticity Square Error” because they are conceptually closest
to the marginal effects which we emphasize in this paper. In our Table 3 we reproduce their
estimates of these parameters, which we label the True Bias and True MSE. From these we
can calculate Variance = (True MSE) − (True Bias)^2, and then use our Eqs. (21) and (22)
to calculate “Empirical Bias” and “Empirical MSE.”

In virtually all of the models presented, the True MSE differences between the Heckit and
2PM are dominated by differences in the variances; the relative biases are comparatively
small. This implies that the Empirical MSE criterion is only mildly affected by assuming
that the Heckit is correct, even when the 2PM is the true model. As a consequence, in these
models our proposed test of the sign of the difference in Empirical MSEs appears to be a
reasonably good estimate of the sign of the difference in the True MSEs, and hence a useful
metric for choosing between the Heckit and 2PM. In fact, in all nine of the models presented
in our Table 3, the Empirical MSE test chooses the same model as does comparison of the
True MSEs.
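
The derivation behind Table 3 can be retraced with a short script. The sketch below is illustrative only: the inputs are the rounded True Bias and True MSE values as printed in Table 3, the variable and function names are assumptions, and the derived values can differ from the table in the third decimal because of rounding. The empirical bias of the 2PM is computed as the difference between the two true biases, since the unknown true value cancels in the difference of the two estimates.

```python
# Each design, as printed in Table 3:
# (true model, x range, percent censored,
#  Heckit true bias, Heckit true MSE, 2PM true bias, 2PM true MSE)
DESIGNS = [
    ("Heckit", "U[0, 3]", 75, 0.291, 0.535, 0.037, 0.061),
    ("Heckit", "U[0, 3]", 50, 0.096, 0.050, 0.035, 0.016),
    ("Heckit", "U[0, 3]", 25, 0.046, 0.020, -0.021, 0.006),
    ("Heckit", "U[0, 10]", 75, 0.328, 5.586, 1.466, 6.434),
    ("Heckit", "U[0, 10]", 50, 0.087, 0.277, 1.064, 1.327),
    ("Heckit", "U[0, 10]", 25, 0.006, 0.013, -0.156, 0.033),
    ("2PM", "U[0, 10]", 75, 0.262, 5.240, 0.240, 4.170),
    ("2PM", "U[0, 10]", 50, 0.089, 0.334, 0.060, 0.150),
    ("2PM", "U[0, 10]", 25, -0.003, 0.007, -0.006, 0.006),
]


def compare(h_bias, h_mse, p_bias, p_mse):
    """Return (choice by True MSE, choice by Empirical MSE, EMSE Heckit, EMSE 2PM)."""
    # Variance = True MSE - (True Bias)^2 for each estimator.
    var_h = h_mse - h_bias ** 2
    var_p = p_mse - p_bias ** 2
    # Empirical bias of the 2PM: m_2PM - m_Heckit (the true value cancels).
    emp_bias_p = p_bias - h_bias
    emse_h = var_h                      # Eq. (21)
    emse_p = var_p + emp_bias_p ** 2    # Eq. (22)
    by_true = "Heckit" if h_mse < p_mse else "2PM"
    by_emp = "Heckit" if emse_h < emse_p else "2PM"
    return by_true, by_emp, emse_h, emse_p


results = [compare(*design[3:]) for design in DESIGNS]
agreements = sum(by_true == by_emp for by_true, by_emp, _, _ in results)

for design, (by_true, by_emp, emse_h, emse_p) in zip(DESIGNS, results):
    print(design[:3], f"EMSE Heckit={emse_h:.3f}", f"EMSE 2PM={emse_p:.3f}",
          f"TMSE choice={by_true}", f"EMSE choice={by_emp}")
print("Designs where both criteria agree:", agreements, "of", len(DESIGNS))
```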
The Empirical MSE criterion can also be used to compare the Heckit and 2PM to other
models that are not nested. For example, Manning et al. [15] find that a data-analytic two-
part model (DA2PM) often performs substantially better in Monte Carlo studies than the
standard 2PM. The DA2PM is a generalization of the 2PM that adds higher order terms of
X to Eq. (14). The DA2PM provides a data-based method of specifying non-linearities in
X in the conditional equation, as opposed to the Heckit method based on a priori normality
distributional assumptions. The Heckit does not even partially nest the DA2PM, making
the t-test on the λ coefficient irrelevant for choosing between these models, but instead the
Empirical MSE test proposed here is likely to prove useful.

6. Discussion

A substantial amount of health economics research has been based on either the Heckit or
the 2PM. Unfortunately, the results of these studies have all too commonly been interpreted
incorrectly. Without presentation of additional tests and results, it is typically not possible
to know to what extent conclusions would be affected by improved model choice and
testing. The objective of this paper is to improve applied health economics research by
encouraging researchers to more clearly articulate their effect of interest, more carefully
derive the marginal effects of interest, and rely more on rigorous empirical tests to choose
the most appropriate statistical model.

Acknowledgments

Edward Norton was funded by National Institute on Aging grant R01-AG16600.

Note

1. The term “cake debates” refers to the title of an article by Hay and Olsen [8] as part of an exchange with Duan
and colleagues [3, 4] comparing the Heckit and 2PM.

References

1. Ai, C. and Norton, E.C., “Standard errors for the retransformation problem with heteroscedasticity,” Journal
of Health Economics 19(5), 697–718, 2000.
2. Angrist, J.D., “Estimations of limited dependent variable models with dummy endogenous regressors: Simple
strategies for empirical practice,” Journal of Business and Economic Statistics 19, 2–16, 2001.

3. Duan, N., Manning, W.G., Morris, C.N., and Newhouse, J.P., “A comparison of alternative models for the
demand for medical care,” Journal of Business and Economic Statistics 1, 115–126, 1983.
4. Duan, N., Manning, W.G., Morris, C.N., and Newhouse, J.P., “Choosing between the sample-selection model
and the multi-part model,” Journal of Business and Economic Statistics 2, 283–289, 1984.
5. Gilleskie, D.B. and Mroz, T.A., “Estimating the effects of covariates on health expenditures,” Working paper,
Department of Economics, University of North Carolina at Chapel Hill, 2000.
6. Greene, W.H., Econometric analysis, 4th edn., Prentice Hall, Inc., Upper Saddle River, 2000.
7. Hay, J.W., Leu, R., and Fohrer, P., “Ordinary least squares and sample-selection models of health-care demand,”
Journal of Business and Economic Statistics 5, 499–506, 1987.
8. Hay, J.W. and Olsen, R.J., “Let them eat cake: A note on comparing alternative models of the demand for
health care,” Journal of Business and Economic Statistics 2, 279–282, 1984.
9. Heckman, J.J., “What has been learned about labor supply in the past twenty years?” American Economic
Review 83(2), 116–121, 1993.
10. Jones, A.M., “Health econometrics,” in Handbook of Health Economics (A.J. Culyer and J.P. Newhouse, eds.),
North Holland, Amsterdam, 265–344, 2000.
11. Lee, L.-F., “Some approaches to the correction of selectivity bias,” Review of Economic Studies 49, 355–372,
1982.
12. Lee, L.-F., “Generalized econometric models with selectivity,” Econometrica 51, 507–512, 1983.
13. Leung, S.F. and Yu, S., “On the choice between sample selection and two-part models,” Journal of Econo-
metrics 72, 197–229, 1996.
14. Manning, W.G., “The logged dependent variable, heteroscedasticity, and the retransformation problem,”
Journal of Health Economics 17, 283–296, 1998.
15. Manning, W.G., Duan, N., and Rogers, W.H., “Monte Carlo evidence on the choice between sample selection
and two-part models,” Journal of Econometrics 35, 59–82, 1987.
16. Mullahy, J., “Much ado about two: Reconsidering retransformation and the two-part model in health econo-
metrics,” Journal of Health Economics 17, 247–282, 1998.
17. Mullahy, J., “Estimations of limited dependent variable models with dummy endogenous regressors: Simple
strategies for empirical practice: Comment,” Journal of Business & Economic Statistics 19, 23–25, 2001.
18. Salas, C. and Raftery, J.P., “Econometric issues in testing the age neutrality of health care expenditure,” Health
Economics 10, 669–671, 2001.
19. Toro-Vizcarrondo, C. and Wallace, T.D., “A test of the Mean Square Error criterion for restrictions in linear
regression,” Journal of the American Statistical Association 63, 558–572, 1968.
20. van de Ven, W.P. and van Praag, B.M., “Risk aversion of deductibles in private health insurance: Application
of an adjusted Tobit model to family health care expenditures,” in Health, Economics and Health Economics
(J. van der Gaag and M. Perlman, eds.), North Holland, Amsterdam, 125–148, 1981.
21. Wooldridge, J.M., Econometric analysis of cross section and panel data, MIT Press, Cambridge, MA,
2002.
22. Zweifel, P., Felder, S., and Meiers, M., “Ageing of population and health care expenditure: A red herring?”
Health Economics 8, 485–496, 1999.
23. Zweifel, P., Felder, S., and Meier, M., “Reply to: Econometric issues in testing the age neutrality of health
care expenditure,” Health Economics 10, 673–674, 2001.
