A Stochastic Frontier Model With Correction For Sample Selection

J Prod Anal (2010) 34:15–24
DOI 10.1007/s11123-009-0159-1

William Greene
Published online: 28 November 2009

© Springer Science+Business Media, LLC 2009
Abstract Heckman's (Ann Econ Soc Meas 4(5), 475–492, 1976; Econometrica 47, 153–161, 1979) sample selection model has been employed in three decades of applications of linear regression studies. This paper builds on this framework to obtain a sample selection correction for the stochastic frontier model. We first show a surprisingly simple way to estimate the familiar normal-half normal stochastic frontier model using maximum simulated likelihood. We then extend the technique to a stochastic frontier model with sample selection. In an application that seems superficially obvious, the method is used to revisit the World Health Organization data (WHO in The World Health Report, WHO, Geneva 2000; Tandon et al. in Measuring the overall health system performance for 191 countries, World Health Organization, 2000) where the sample partitioning is based on OECD membership. The original study pooled all 191 countries. The OECD members appear to be discretely different from the rest of the sample. We examine the difference in a sample selection framework.

Keywords Stochastic frontier · Sample selection · Simulation · Efficiency

JEL Classification C13 · C15 · C21

W. Greene (✉)
Department of Economics, Stern School of Business, New York University, 44 West 4th St., Rm. 7-78, New York, NY 10012, USA
e-mail: wgreene@stern.nyu.edu
URL: pages.stern.nyu.edu/~wgreene

1 Introduction

Heckman's (1976, 1979) sample selection model has been employed in three decades of applications of linear regression studies. Numerous applications have extended Heckman's approach to nonlinear settings such as the binary probit and Poisson regression models. The first is Wynand and van Praag's (1981) development of a probit model for insurance purchase. Among a number of other recent applications, Bradford et al. (2001) extended Heckman's method to a stochastic frontier model for hospital costs. The familiar approach in which a sample selection correction term is simply added to the model of interest (see (7) and (8)) is not appropriate for nonlinear models such as the stochastic frontier. In this study, we build on the maximum likelihood estimator of Heckman's sample selection corrected linear model and the extension to nonlinear models by Terza (1996, 2009) to obtain a counterpart for the stochastic frontier model. We first show a surprisingly simple way to estimate the familiar normal-half normal stochastic frontier model using maximum simulated likelihood. The next step is to extend the technique to a stochastic frontier model in the presence of sample selection.

The method is used to revisit the World Health Organization (2000) data (see also Tandon et al. 2000) where the sample partitioning is based on OECD membership. The original study pooled all 191 countries (in a panel, albeit one with negligible within groups variation). The OECD members appear to be discretely different from the rest of the sample. We examine the difference in a sample selection framework.
2 A selection corrected stochastic frontier model

The stochastic frontier model of Aigner et al. (1977) (ALS) is specified with

$$
\begin{aligned}
y_i &= \beta'x_i + v_i - u_i\\
\text{where } u_i &= |\sigma_u U_i| = \sigma_u|U_i|, \quad U_i \sim N[0,1]\\
v_i &= \sigma_v V_i, \quad V_i \sim N[0,1]
\end{aligned}
\tag{1}
$$

A vast literature has explored variations in the specification to accommodate, e.g., heteroscedasticity, panel data formulations, etc.¹ It will suffice for present purposes to work with the simplest form. Extensions will be considered later. The model can be estimated by modifications of ordinary least squares (e.g., Greene 2008a), the generalized method of moments (Kopp and Mullahy 1990) or, as is conventional in the recent literature, by maximum likelihood (ALS). (A spate of Bayesian applications has also appeared in the recent literature, e.g., Koop and Steel 2001.) In this study, we will suggest a fourth estimator, maximum simulated likelihood (MSL). The simulation based estimator merely replicates the conventional estimator for the base case, in which the closed form is already available. The log likelihood function for the sample selection model does not exist in closed form, so some approximation method, such as MSL, is necessary.

2.1 Maximum likelihood estimation of the stochastic frontier model

The log likelihood for the normal-half normal model for a sample of $N$ observations is

$$
\begin{aligned}
\log L(\beta,\sigma,\lambda) &= \sum_{i=1}^{N}\left[\tfrac{1}{2}\log\tfrac{2}{\pi} - \log\sigma - \tfrac{1}{2}(\varepsilon_i/\sigma)^2 + \log\Phi(-\lambda\varepsilon_i/\sigma)\right]\\
\text{where } \varepsilon_i &= y_i - \beta'x_i = v_i - u_i,\\
\lambda &= \sigma_u/\sigma_v,\\
\sigma &= \sqrt{\sigma_v^2 + \sigma_u^2}
\end{aligned}
\tag{2}
$$

and $\Phi(\cdot)$ denotes the standard normal cdf. The density satisfies the standard regularity conditions, and maximum likelihood estimation of the model is a conventional problem handled with familiar methods. Estimation is straightforward and has been installed in the menu of supported techniques in a variety of programs including LIMDEP, Stata and TSP.²

Conditioned on $u_i$, the central equation of the model in (1) would be a classical linear regression model with normally distributed disturbances. Thus,

$$
f(y_i \mid x_i, |U_i|) = \frac{\exp\left[-\tfrac{1}{2}(y_i - \beta'x_i + \sigma_u|U_i|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}. \tag{3}
$$

The unconditional log likelihood for the model is obtained by integrating the unobserved random variable, $|U_i|$, out of the conditional density. Thus,

$$
\begin{aligned}
f(y_i \mid x_i) &= \int_{|U_i|}\frac{\exp\left[-\tfrac{1}{2}(y_i - \beta'x_i + \sigma_u|U_i|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}\,p(|U_i|)\,d|U_i|,\\
\text{where } p(|U_i|) &= \frac{\phi(|U_i|)}{\Phi(0)} = \sqrt{\frac{2}{\pi}}\exp\left[-\tfrac{1}{2}|U_i|^2\right], \quad |U_i| \ge 0,\\
\text{then } \log L(\beta,\sigma_u,\sigma_v) &= \sum_{i=1}^{N}\log f(y_i \mid x_i),
\end{aligned}
\tag{4}
$$

where $\phi$ is the standard normal density and $\Phi$ is the standard normal cdf. The closed form of the integral appears in (2).³ Consider using simulation to approximate the integrals:

$$
f(y_i \mid x_i) \approx \frac{1}{R}\sum_{r=1}^{R}\frac{\exp\left[-\tfrac{1}{2}(y_i - \beta'x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}} \tag{5}
$$

where $U_{ir}$, $r = 1,\dots,R$, are $R$ random draws from the standard normal population. (There is no closed form for the extension of the model that appears below.) The simulated log likelihood is

$$
\log L_S(\beta,\sigma_u,\sigma_v) = \sum_{i=1}^{N}\log\left\{\frac{1}{R}\sum_{r=1}^{R}\frac{\exp\left[-\tfrac{1}{2}(y_i - \beta'x_i + \sigma_u|U_{ir}|)^2/\sigma_v^2\right]}{\sigma_v\sqrt{2\pi}}\right\} \tag{6}
$$

The maximum simulated likelihood estimators of the model parameters are obtained by maximizing this function with respect to the unknown parameters.⁴

Footnote 1: See Greene (2008a) for further development of the model and a survey of extensions and applications.
Footnote 2: Details on maximum likelihood estimation of the model can be found in ALS and elsewhere, e.g., Greene (2008b, Ch. 16).
Footnote 3: See Weinstein (1964).
Footnote 4: See Gourieroux and Monfort (1996), Train (2003), Econometric Software, Inc. (2007), Greene (2008b) and Greene and Misra (2004).

2.2 Sample selection in the linear model

Heckman's (1979) sample selection model for the linear regression case is specified as
$$
\begin{aligned}
d_i &= \mathbf{1}[\alpha'z_i + w_i > 0], \quad w_i \sim N[0,1]\\
y_i &= \beta'x_i + \varepsilon_i, \quad \varepsilon_i \sim N[0,\sigma_\varepsilon^2]\\
(w_i,\varepsilon_i) &\sim N_2\left[(0,0),(1,\rho\sigma_\varepsilon,\sigma_\varepsilon^2)\right]\\
(y_i,x_i) &\text{ observed only when } d_i = 1.
\end{aligned}
\tag{7}
$$

Two familiar methods have been developed for estimation of the model parameters. Heckman's (1979) two step, limited information method builds on the result

$$
\begin{aligned}
E[y_i \mid x_i, d_i = 1] &= \beta'x_i + E[\varepsilon_i \mid d_i = 1]\\
&= \beta'x_i + \rho\sigma_\varepsilon\,\phi(\alpha'z_i)/\Phi(\alpha'z_i)\\
&= \beta'x_i + \theta\lambda_i
\end{aligned}
\tag{8}
$$

where $\theta = \rho\sigma_\varepsilon$. In the first step, $\alpha$ in the probit equation is estimated by unconstrained single equation maximum likelihood and the inverse Mills ratio (IMR), $\hat\lambda_i = \phi(\hat\alpha'z_i)/\Phi(\hat\alpha'z_i)$, is computed for each observation. The second step in Heckman's procedure involves linear regression of $y_i$ on the augmented regressor vector, $x_i^* = (x_i, \hat\lambda_i)$, using the observed subsample, with a correction of the OLS standard errors to account for the fact that an estimate of $\alpha$ is used in the constructed regressor.

The full information maximum likelihood estimator for the model is developed in Heckman (1976) and Maddala (1983). The log likelihood function for the sample selection model is

$$
\begin{aligned}
\log L(\beta,\sigma_\varepsilon,\alpha,\rho) &= \sum_{i=1}^{N}\log\left[d_i\left\{\frac{\exp\left(-\tfrac{1}{2}(y_i-\beta'x_i)^2/\sigma_\varepsilon^2\right)}{\sigma_\varepsilon\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho(y_i-\beta'x_i)/\sigma_\varepsilon+\alpha'z_i}{\sqrt{1-\rho^2}}\right)\right\} + (1-d_i)\,\Phi(-\alpha'z_i)\right]\\
&= \sum_{i=1}^{N}\log\left[d_i\,\frac{1}{\sigma_\varepsilon}\,\phi\!\left(\frac{\varepsilon_i}{\sigma_\varepsilon}\right)\Phi\!\left(\frac{\rho\varepsilon_i/\sigma_\varepsilon+\alpha'z_i}{\sqrt{1-\rho^2}}\right) + (1-d_i)\,\Phi(-\alpha'z_i)\right]
\end{aligned}
\tag{9}
$$

This has become a conventional, if relatively less frequently used, estimator that is built into most contemporary software.

2.3 Estimating a stochastic frontier model with sample selection

The received literature contains many studies in which authors have extended Heckman's selectivity model to nonlinear settings, such as count data (e.g., Poisson regression; Greene 1994), nonlinear regression, and binary choice models. The first application of the sample selection treatment in a nonlinear setting was Wynand and van Praag's (1981) development of a probit model for binary choice. The typical approach taken to control for selection bias, motivated by (8), is to fit the probit model in (7), as in the first step of Heckman's two step estimator, then append $\hat\lambda_i$ (from (8)) to the linear index part of the nonlinear model wherever it happens to appear. The approach is inappropriate. The term $\hat\lambda_i$ in (8) arises from $E[\varepsilon_i \mid d_i = 1] = \theta\lambda_i$ in a linear model. The expectation of some nonlinear $g(\beta'x_i + \varepsilon_i)$ subject to selection will generally not produce the form $E[g(\beta'x_i + \varepsilon_i) \mid d_i = 1] = g(\beta'x_i + \theta\lambda_i)$, which could then be carried back into the otherwise unchanged nonlinear model. See, e.g., Terza (1994, 1996, 1998), who develops the result in detail for nonlinear regressions such as the exponential conditional mean case. Indeed, in some cases, such as the probit and count data models, the $\varepsilon_i$ for which the expectation given $d_i = 1$ is taken does not even appear in the original model; it is unclear, as such, what the correction is correcting.

The distribution of the observed random variable conditioned on the selection will generally not be what it was without the selection (with or without the addition of the inverse Mills ratio, $\lambda_i$, to the index function). Thus, the addition of $\lambda_i$ to the original likelihood function generally does not produce the appropriate log likelihood in the presence of the sample selection. This can be seen even for the linear case in (9). The least squares estimator of $\beta$ (with $\lambda_i$ added to the equation) is not the MLE in (9); it is merely a feasible consistent estimator. Two well worked out specific cases do appear in the literature. Maddala (1983) and Boyes et al. (1989) obtained the appropriate closed form log likelihood for a probit model subject to sample selection. The resulting formulation is a type of bivariate probit model, not a univariate probit model based on $(x_i, \lambda_i)$. Another well known example is the open form result for the Poisson regression model obtained by Terza (1996, 1998).⁵

The combination of efficiency estimation and sample selection appears in several studies. Bradford et al. (2001) studied patient specific costs for cardiac revascularization in a large hospital. They state "… the patients in this sample were not randomly assigned to each treatment group. Statistically, this implies that the data are subject to sample selection bias. Therefore, we utilize a standard Heckman two-stage sample-selection process, creating an IMR from a first-stage probit estimator of the likelihood of CABG or PTCA. This correction variable is included in the frontier estimate…" (page 306).⁶

Footnote 5: See also Winkelmann (1998).
Footnote 6: The authors opt for a GMM estimator based on Kopp and Mullahy's (1990) (KM) relaxation of the distributional assumptions in the standard frontier model. It is suggested that KM "find that the traditional maximum likelihood estimators tend to overestimate the average inefficiency." (Page 304) KM did not, in fact, make the latter argument, and we can find no evidence to support it in the literature received since. KM's support for the GMM estimator is based on
Sipiläinen and Oude Lansink (2005) have utilized a stochastic frontier, translog model to analyze technical efficiency for organic and conventional farms. They state "Possible selection bias between organic and conventional production can be taken into account [by] applying Heckman's (1979) two step procedure." (Page 169.) In this case, the inefficiency component in the stochastic frontier translog distance function is distributed as the truncation at zero of a $U_i$ with a heterogeneous mean.⁷ The IMR is added to the deterministic (production function) part of the frontier function.

Other authors have acknowledged the sample selection issue in stochastic frontier studies. Kaparakis et al. (1994) in an analysis of commercial banks and Collins and Harris (2005) in their study of UK chemical plants both suggested that "sample selection" was a potential issue in their analysis. Neither of these formally modified their stochastic frontier models to accommodate the result, however.

If, to motivate the sample selection treatment, we specify that the unobservables in the selection model are correlated with the noise in the stochastic frontier model, then the stochastic frontier model with sample selection can be cast as an extension of Heckman's specification for the linear regression model. The combination of the models in (1) and (7) is

$$
\begin{aligned}
d_i &= \mathbf{1}[\alpha'z_i + w_i > 0], \quad w_i \sim N[0,1]\\
y_i &= \beta'x_i + \varepsilon_i\\
(y_i,x_i) &\text{ observed only when } d_i = 1\\
\varepsilon_i &= v_i - u_i\\
u_i &= |\sigma_u U_i| = \sigma_u|U_i| \text{ where } U_i \sim N[0,1]\\
v_i &= \sigma_v V_i \text{ where } V_i \sim N[0,1]\\
(w_i,v_i) &\sim N_2\left[(0,0),(1,\rho\sigma_v,\sigma_v^2)\right]
\end{aligned}
\tag{10}
$$

The conditional density for an observation in this specification is

$$
f(y_i \mid x_i, |U_i|, z_i, d_i) = d_i\left[\frac{\exp\left(-\tfrac{1}{2}(y_i-\beta'x_i+\sigma_u|U_i|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho(y_i-\beta'x_i+\sigma_u|U_i|)/\sigma_v+\alpha'z_i}{\sqrt{1-\rho^2}}\right)\right] + (1-d_i)\,\Phi(-\alpha'z_i) \tag{11}
$$

Save for the appearance of the unobserved inefficiency term, $\sigma_u|U_i|$, (11) is the same as (9). Terza (1996, 2009) develops the log likelihood function for a generic extension of Heckman's result in (9) to nonlinear models. The result in (11) shows an application to the stochastic frontier case; see (34:SS) in Terza (2009).

Sample selection arises as a consequence of the correlation of the unobservables in the production or cost equation, $v_i$, with those in the sample selection equation, $w_i$. Two other applications of this general approach to modeling sample selection or endogenous switching in the stochastic frontier model have appeared in the recent literature. In Kumbhakar et al. (2009), the model framework is very similar to that in (10), but the selection mechanism is assumed to operate through $u_i$ rather than $v_i$. In particular, the disturbance in their counterpart to the equation for $d_i$ is $w_i + \delta u_i$; in essence, the inefficiency in the production process produces an "inclination" towards, in their case, organic farming. In Lai et al.'s (2009) application to a wage equation, the $w_i$ in the selection mechanism is correlated (through a copula function) with $\varepsilon_i$, not specifically with $v_i$ or $u_i$. In both of these cases, the log likelihood is substantially more complicated than the one used here. More importantly, the difference in the assumption of the impact of the selection effect is substantive.

The log likelihood for the model in (10) is formed by integrating out the unobserved $|U_i|$, then maximizing with respect to the unknown parameters. Thus, as in (4) and (5),

$$
\log L(\beta,\sigma_u,\sigma_v,\alpha,\rho) = \sum_{i=1}^{N}\log\int_{|U_i|} f(y_i \mid x_i, z_i, d_i, |U_i|)\,p(|U_i|)\,d|U_i|. \tag{12}
$$

The integral in (12) has no known closed form; it must be approximated. The simulated log likelihood function is

$$
\log L_S(\beta,\sigma_u,\sigma_v,\alpha,\rho) = \sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\left[d_i\left\{\frac{\exp\left(-\tfrac{1}{2}(y_i-\beta'x_i+\sigma_u|U_{ir}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho(y_i-\beta'x_i+\sigma_u|U_{ir}|)/\sigma_v+\alpha'z_i}{\sqrt{1-\rho^2}}\right)\right\} + (1-d_i)\,\Phi(-\alpha'z_i)\right] \tag{13}
$$

Footnote 6 (continued): its more general, distribution free specification. We do note that Newhouse (1994), whom Bradford et al. cite, has stridently argued against the stochastic frontier model as well, but not based on the properties of the MLE.
Footnote 7: See Battese and Coelli (1995).

To simplify the estimation, we will use a two step approach. The single equation MLE of $\alpha$ in the probit equation in (7) is consistent, albeit inefficient. For purposes of estimation of the parameters of the stochastic frontier model, however, $\alpha$ need not be reestimated. We take the estimates of $\alpha$ as given in the simulated log likelihood in (13), then use the Murphy and Topel (2002) correction to adjust the standard errors in essentially the same fashion as Heckman's correction of the canonical selection model in (8). Thus, the conditional simulated log likelihood function is
$$
\log L_{S,C}(\beta,\sigma_u,\sigma_v,\rho) = \sum_{i=1}^{N}\log\frac{1}{R}\sum_{r=1}^{R}\left[d_i\left\{\frac{\exp\left(-\tfrac{1}{2}(y_i-\beta'x_i+\sigma_u|U_{ir}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho(y_i-\beta'x_i+\sigma_u|U_{ir}|)/\sigma_v+a_i}{\sqrt{1-\rho^2}}\right)\right\} + (1-d_i)\,\Phi(-a_i)\right] \tag{14}
$$

where $a_i = \hat\alpha'z_i$. With this simplification, the nonselected observations (those with $d_i = 0$) do not contribute information about the parameters to the simulated log likelihood. Thus, the function we maximize becomes

$$
\log L_{S,C}(\beta,\sigma_u,\sigma_v,\rho) = \sum_{d_i=1}\log\frac{1}{R}\sum_{r=1}^{R}\left[\frac{\exp\left(-\tfrac{1}{2}(y_i-\beta'x_i+\sigma_u|U_{ir}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho(y_i-\beta'x_i+\sigma_u|U_{ir}|)/\sigma_v+a_i}{\sqrt{1-\rho^2}}\right)\right] \tag{15}
$$

The parameters of the model are estimated using a conventional gradient based approach, the BFGS method. We use the BHHH estimator to estimate the asymptotic standard errors for the parameter estimators. When $\rho$ equals zero, the maximand reduces (up to the additive constant $\sum_{d_i=1}\log\Phi(a_i)$) to that of the maximum simulated likelihood estimator of the basic frontier model shown earlier. This provides us with a method of testing the specification of the selectivity model against the simpler model using a (simulated) likelihood ratio test.

2.4 Estimating observation specific inefficiency

The end objective of the estimation process is to characterize the inefficiency in the sample, $u_i$, or the efficiency, $\exp(-u_i)$. Aggregate summary measures, such as the sample mean and variance, are often provided (e.g., Bradford et al. 2001, for hospital costs). Researchers also compute individual specific estimates of the conditional means based on the Jondrow et al. (1982) (JLMS) result,

$$
E[u_i \mid \varepsilon_i] = \frac{\sigma\lambda}{1+\lambda^2}\left[\mu_i + \frac{\phi(\mu_i)}{\Phi(\mu_i)}\right], \quad \mu_i = \frac{-\lambda\varepsilon_i}{\sigma}, \quad \varepsilon_i = y_i - \beta'x_i \tag{16}
$$

The standard approach computes this function after estimation based on the maximum likelihood estimates. In principle, we could repeat this computation with the maximum simulated likelihood estimates. An alternative approach takes advantage of the simulation of the values of $u_i$ during estimation. Using Bayes theorem, we can write

$$
p(u_i \mid \varepsilon_i) = \frac{p(u_i,\varepsilon_i)}{p(\varepsilon_i)} = \frac{p(\varepsilon_i \mid u_i)\,p(u_i)}{\int_{u_i} p(\varepsilon_i \mid u_i)\,p(u_i)\,du_i}. \tag{17}
$$

Recall $u_i = \sigma_u|U_i|$. Thus, equivalently,

$$
p[(\sigma_u|U_i|) \mid \varepsilon_i] = \frac{p[(\sigma_u|U_i|),\varepsilon_i]}{p(\varepsilon_i)} = \frac{p[\varepsilon_i \mid (\sigma_u|U_i|)]\,p(\sigma_u|U_i|)}{\int_{u_i} p[\varepsilon_i \mid (\sigma_u|U_i|)]\,p(\sigma_u|U_i|)\,d(\sigma_u|U_i|)} \tag{18}
$$

The desired expectation is, then,

$$
E[(\sigma_u|U_i|) \mid \varepsilon_i] = \frac{\int_{\sigma_u|U_i|}(\sigma_u|U_i|)\,p[\varepsilon_i \mid (\sigma_u|U_i|)]\,p(\sigma_u|U_i|)\,d(\sigma_u|U_i|)}{\int_{\sigma_u|U_i|} p[\varepsilon_i \mid (\sigma_u|U_i|)]\,p(\sigma_u|U_i|)\,d(\sigma_u|U_i|)} \tag{19}
$$

These are the terms that enter the simulated log likelihood for each observation. The simulated denominator would be

$$
\hat B_i = \frac{1}{R}\sum_{r=1}^{R}\left[\frac{\exp\left(-\tfrac{1}{2}(y_i-\hat\beta'x_i+\hat\sigma_u|U_{ir}|)^2/\hat\sigma_v^2\right)}{\hat\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\hat\rho(y_i-\hat\beta'x_i+\hat\sigma_u|U_{ir}|)/\hat\sigma_v+a_i}{\sqrt{1-\hat\rho^2}}\right)\right] = \frac{1}{R}\sum_{r=1}^{R}\hat f_{ir} \tag{20}
$$

while the numerator is simulated with $\hat A_i = \frac{1}{R}\sum_{r=1}^{R}(\hat\sigma_u|U_{ir}|)\,\hat f_{ir}$. The estimate of $E[u_i \mid \varepsilon_i]$ is then

$$
\hat A_i/\hat B_i = \sum_{r=1}^{R}\hat\sigma_u|U_{ir}|\,\hat c_{ir}, \quad \text{where } \hat c_{ir} = \frac{\hat f_{ir}}{\sum_{r=1}^{R}\hat f_{ir}}, \quad 0 < \hat c_{ir} < 1 \tag{21}
$$

These are computed for each observation using the estimated parameters, the raw data and the same pool of random draws as were used to do the estimation. As shown below, this gives a strikingly similar answer to the JLMS plug-in result suggested at the outset.

The immediate advantage of this alternative approach is only that the whole set of computations is done at once, during the estimation of the parameters. However, as noted below, the estimators in (15) and (21) can be employed with other distributions for which the JLMS result in (16) is not available. The simulation estimator suggested here can, in principle, be used with any inefficiency distribution that can be simulated.
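The claim in Sections 2.1 and 2.3 that maximum simulated likelihood merely replicates the closed form in the base case can be checked numerically. The sketch below is a minimal, standard-library Python illustration, not the author's implementation: it evaluates the normal-half normal log likelihood (2) and its simulated counterpart (6) at the same parameter values, taking the residuals $\varepsilon_i = y_i - \beta'x_i$ as given. All function names (`halton`, `half_normal_draws`, `loglik_closed_form`, `loglik_simulated`) are hypothetical. Halton draws are used, as in the application in Section 3. Giving each observation its own block of consecutive Halton points is an assumption of this sketch.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: provides pdf, cdf, inv_cdf


def halton(index: int, base: int = 2) -> float:
    """Radical-inverse (Halton) point in (0, 1) for a 1-based index."""
    f, h = 1.0, 0.0
    while index > 0:
        f /= base
        h += f * (index % base)
        index //= base
    return h


def half_normal_draws(i: int, R: int) -> list[float]:
    """|U_ir|, r = 1..R, for observation i (hypothetical helper).

    Observation i uses its own block of R consecutive Halton points, mapped to
    standard normal draws through the inverse cdf and then folded at zero.
    """
    return [abs(STD.inv_cdf(halton(i * R + r + 1))) for r in range(R)]


def loglik_closed_form(eps: list[float], sigma_u: float, sigma_v: float) -> float:
    """Normal-half normal log likelihood, eq. (2); eps_i = y_i - b'x_i."""
    sigma = math.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    return sum(
        0.5 * math.log(2.0 / math.pi) - math.log(sigma)
        - 0.5 * (e / sigma) ** 2 + math.log(STD.cdf(-lam * e / sigma))
        for e in eps
    )


def loglik_simulated(eps: list[float], sigma_u: float, sigma_v: float, R: int = 200) -> float:
    """Simulated log likelihood, eq. (6): average the conditional density (3) over draws."""
    total = 0.0
    for i, e in enumerate(eps):
        draws = half_normal_draws(i, R)
        # pdf((e + sigma_u|U_ir|)/sigma_v)/sigma_v is the N(0, sigma_v^2) density of v_i
        avg = sum(STD.pdf((e + sigma_u * a) / sigma_v) / sigma_v for a in draws) / R
        total += math.log(avg)
    return total


eps = [-0.20, -0.05, 0.00, 0.05]  # illustrative residuals, not the WHO data
print(loglik_closed_form(eps, 0.12, 0.05))
print(loglik_simulated(eps, 0.12, 0.05, R=200))
```

The two printed values should agree closely, and the gap shrinks as $R$ grows. In a full estimator, `loglik_simulated` would be maximized over $(\beta, \sigma_u, \sigma_v)$ with the draws held fixed across iterations.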
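The selection-corrected pieces can be sketched in the same way. The standard-library Python below is hedged and illustrative. It computes the per-draw term $\hat f_{ir}$ that appears in (15) and (20), the conditional simulated log likelihood (15) over the selected observations, and the simulation estimator $\hat A_i/\hat B_i$ of (21), which it sets beside the JLMS formula (16). It assumes that $a_i = \hat\alpha'z_i$ has already been computed from a first-step probit, and it scales the probability term by $\sigma_v$, the standard deviation of $v_i$ in (10). The function names are hypothetical. When $\rho = 0$, the probability term does not vary across draws and cancels from the weights $\hat c_{ir}$, so the simulated conditional mean should approximate the JLMS value. This gives a useful numerical check on an implementation.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal: provides pdf, cdf, inv_cdf


def halton(index: int, base: int = 2) -> float:
    """Radical-inverse (Halton) point in (0, 1) for a 1-based index."""
    f, h = 1.0, 0.0
    while index > 0:
        f /= base
        h += f * (index % base)
        index //= base
    return h


def half_normal_draws(i: int, R: int) -> list[float]:
    """|U_ir| for observation i from its own block of R Halton points (hypothetical)."""
    return [abs(STD.inv_cdf(halton(i * R + r + 1))) for r in range(R)]


def f_ir(eps, a_i, sigma_u, sigma_v, rho, abs_u):
    """One simulated term of (15)/(20) for draw |U_ir| = abs_u; eps = y_i - b'x_i."""
    v = eps + sigma_u * abs_u                   # implied v_i given u_i = sigma_u|U_ir|
    density = STD.pdf(v / sigma_v) / sigma_v    # N(0, sigma_v^2) density of v_i
    prob = STD.cdf((rho * v / sigma_v + a_i) / math.sqrt(1.0 - rho ** 2))
    return density * prob


def loglik_selected(eps, a, sigma_u, sigma_v, rho, R=200):
    """Conditional simulated log likelihood (15), summed over observations with d_i = 1."""
    total = 0.0
    for i, (e, a_i) in enumerate(zip(eps, a)):
        draws = half_normal_draws(i, R)
        total += math.log(sum(f_ir(e, a_i, sigma_u, sigma_v, rho, u) for u in draws) / R)
    return total


def e_u_simulated(eps, a_i, sigma_u, sigma_v, rho, draws):
    """Simulation estimator (21): sum_r sigma_u|U_ir| c_ir with c_ir = f_ir / sum_r f_ir."""
    f = [f_ir(eps, a_i, sigma_u, sigma_v, rho, u) for u in draws]
    return sum(sigma_u * u * fr for u, fr in zip(draws, f)) / sum(f)


def e_u_jlms(eps, sigma_u, sigma_v):
    """JLMS conditional mean (16) for the normal-half normal model."""
    sigma = math.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    mu = -lam * eps / sigma
    return sigma * lam / (1.0 + lam ** 2) * (mu + STD.pdf(mu) / STD.cdf(mu))


draws = half_normal_draws(0, 1000)
for e in (-0.20, 0.00, 0.05):  # illustrative residuals; a_i = 0.5 is arbitrary
    print(e, e_u_jlms(e, 0.12, 0.05), e_u_simulated(e, 0.5, 0.12, 0.05, 0.0, draws))
```

With $\rho \ne 0$, the same `e_u_simulated` call weights each draw by the selection probability as well, which is what makes (21) usable where (16) does not apply.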
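The (simulated) likelihood ratio test of $\rho = 0$, described at the end of Section 2.3, reduces to arithmetic on two maximized log likelihoods. The hedged sketch below recomputes it from the values reported in Table 2 (Section 3), together with the Wald (t ratio) statistics for $\rho$ and the derived quantities $\lambda = \sigma_u/\sigma_v$ and $\sigma = \sqrt{\sigma_u^2 + \sigma_v^2}$. It assumes that the parenthesized values in Table 2 are estimated standard errors. The p-values rely on the identity that the upper tail probability of a $\chi^2(1)$ variable at $x$ equals $\operatorname{erfc}(\sqrt{x/2})$.

```python
import math

# Values transcribed from Table 2. logL_sf imposes rho = 0 (stochastic frontier
# column); logL_sel, rho and se_rho come from the sample selection column;
# sigma_u and sigma_v are taken from the stochastic frontier column.
TABLE2 = {
    "non-OECD": {"logL_sf": 160.2753, "logL_sel": 161.0141, "rho": 0.63967,
                 "se_rho": 1.4626, "sigma_u": 0.12300, "sigma_v": 0.05075},
    "OECD": {"logL_sf": 62.96128, "logL_sel": 65.44358, "rho": -0.73001,
             "se_rho": 0.56945, "sigma_u": 0.02649, "sigma_v": 0.00547},
}


def chi2_1_pvalue(x: float) -> float:
    """Upper-tail probability of a chi-squared(1) variable: P(Z^2 > x) = erfc(sqrt(x/2))."""
    return math.erfc(math.sqrt(x / 2.0))


def selection_tests(row: dict) -> dict:
    lr = 2.0 * (row["logL_sel"] - row["logL_sf"])  # one restriction: rho = 0
    t = row["rho"] / row["se_rho"]                  # Wald t ratio for rho
    return {
        "LR": lr,
        "LR_p": chi2_1_pvalue(lr),
        "t": t,
        "t_p": chi2_1_pvalue(t ** 2),               # two-sided normal p-value
        "lambda": row["sigma_u"] / row["sigma_v"],
        "sigma": math.hypot(row["sigma_u"], row["sigma_v"]),
    }


results = {name: selection_tests(row) for name, row in TABLE2.items()}
for name, r in results.items():
    print(name, {k: round(v, 4) for k, v in r.items()})
```

The output shows the pattern described in Section 3. Both t ratios fall below 1.96 in absolute value, while only the OECD LR statistic exceeds the 5% critical value of 3.84. The recomputed $\lambda$ and $\sigma$ match Table 2 up to rounding of the reported $\sigma_u$ and $\sigma_v$.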
2.5 Panel data and other extensions

Replication of the Pitt and Lee (1981) random effects form of the model, again with any distribution from which draws can be simulated, is simple. The term $B_i$ defined in (20), which enters the log likelihood, becomes

$$
B_i = \frac{1}{R}\sum_{r=1}^{R}\prod_{t=1}^{T_i}\left[\frac{\exp\left(-\tfrac{1}{2}(y_{it}-\beta'x_{it}+\sigma_u|U_{ir}|)^2/\sigma_v^2\right)}{\sigma_v\sqrt{2\pi}}\,\Phi\!\left(\frac{\rho(y_{it}-\beta'x_{it}+\sigma_u|U_{ir}|)/\sigma_v+a_i}{\sqrt{1-\rho^2}}\right)\right] = \frac{1}{R}\sum_{r=1}^{R}\prod_{t=1}^{T_i} f_{irt} \tag{22}
$$

Further refinements, such as a counterpart to Battese and Coelli (1995) and Stevenson's (1980) truncation model, may be possible as well. This remains to be investigated.

3 Applications

In 2000, the World Health Organization published its millennium edition of the World Health Report (WHR) (WHO 2000). The report contained Tandon et al.'s (2000) (TMLE) frontier analysis of the efficiency of health care delivery for 191 countries. The frontier analysis attracted a surprising amount of attention in the popular press (given its small page length, minor role in the report and highly technical nature), notably for its assignment of a rank of 37 to the United States' health care system. (Seven years after its publication, the report still commanded attention, e.g., New York Times 2007.) The authors provided their data and methodology to numerous researchers who have subsequently analyzed, criticized, and extended the WHO study (e.g., Gravelle et al. 2002a,b; Hollingsworth and Wildman 2002; Greene 2004).

TMLE based their analysis on COMP, a new measure of health care attainment that they created. (The standard measure at the time was disability adjusted life expectancy, DALE.) "In order to assess overall efficiency, the first step was to combine the individual attainments on all five goals of the health system into a single number, which we call the composite index. The composite index is a weighted average of the five component goals specified above. First, country attainment on all five indicators (i.e., health, health inequality, responsiveness-level, responsiveness-distribution, and fair-financing) were rescaled restricting them to the [0, 1] interval. Then the following weights were used to construct the overall composite measure: 25% for health (DALE), 25% for health inequality, 12.5% for the level of responsiveness, 12.5% for the distribution of responsiveness, and 25% for fairness in financing. These weights are based on a survey carried out by WHO to elicit stated preferences of individuals in their relative valuations of the goals of the health system." (TMLE, page 4) (It is intriguing that in the public outcry over the results, it was never reported that the WHO study did not, in fact, rank countries by health care attainment, COMP, but rather by the efficiency with which countries attained their COMP. That is, countries were ranked by the difference between their COMP and a constructed country specific optimal COMP*.) In terms of COMP itself, the US ranked 15th in the study, not 37th, and France did not rank first, as widely reported; Japan did. The full set of results needed to reach these conclusions is contained in TMLE (2000).

Table 1 Descriptive statistics for WHO variables, 1997 observations

Variable      Non-OECD              OECD                  All
              Mean       SD         Mean       SD         Mean       SD
COMP          70.30      10.96      89.42      3.97       73.30      12.34
HEXP          249.17     315.11     1498.27    762.01     445.37     616.36
EDUC          5.44       2.38       9.04       1.53       6.00       2.62
GINI          0.399      0.0777     0.299      0.0636     0.383      0.0836
VOICE         -0.195     0.794      1.259      0.534      0.0331     0.926
GEFF          -0.312     0.643      1.166      0.625      -0.0799    0.835
TROPICS       0.596      0.492      0.0333     0.183      0.508      0.501
POPDEN        757.9      2816.3     454.56     1006.7     710.2      2616.5
PUBFIN        56.89      21.14      72.89      14.10      59.40      20.99
GDPC          4449.8     4717.7     18199.07   6978.0     6609.4     7614.8
Efficiency    0.5904     0.2012     0.8831     0.0783     0.6364     0.2155
Sample        161                   30                    191

Variables in the data set are as follows: COMP, WHO health care attainment measure; HEXP, per capita health expenditure in PPP units; EDUC, average years of formal education; GINI, World Bank measure of income inequality; VOICE, World Bank measure of democratization; GEFF, World Bank measure of government effectiveness; TROPICS, dummy variable for tropical location; POPDEN, population density in persons per square kilometer; PUBFIN, proportion of health expenditure paid by government; GDPC, per capita GDP in PPP units; Efficiency, TMLE estimated efficiency from fixed effects model.

The data set used by TMLE contained 5 years (1993–1997) of observations on the time varying variables COMP, per capita health care expenditure and average educational attainment, and time invariant, 1997 observations on the set of variables listed in Table 1. TMLE used a linear fixed effects translog production model,

$$
\begin{aligned}
\log \text{COMP}_{it} = {}& \beta_1 + \beta_2\log \text{HExp}_{it} + \beta_3\log \text{Educ}_{it} + \beta_4\log^2 \text{Educ}_{it} + \beta_5\log^2 \text{HExp}_{it}\\
& + \beta_6\log \text{HExp}_{it}\log \text{Educ}_{it} - u_i + v_{it}
\end{aligned}
\tag{23}
$$

in which health expenditure and education enter loglinearly and quadratically. (They ultimately dropped the last two terms in their specification.) Their estimates of $u_i$ were
computed from the estimated constant terms in the linear fixed effects regression. Since their analysis was based on the fixed effects regression, they did not use the time invariant variables in their regressions or subsequent analysis (see Greene 2004 for discussion). Their overall efficiency indexes for the 191 WHO member countries are published in the report (Table 1, pages 18–21) and used in the analysis below.

Table 1 lists descriptive statistics for the TMLE efficiencies and for the variables present in the WHO data base. The COMP, education and health expenditure are described for the 1997 observation. Although these variables are time varying, the amount of within group variation ranges from very small to trivial (see Gravelle et al. 2002a for discussion). The time invariant variables were not used in their analysis. The data in Table 1 are segmented by OECD membership. The OECD members are primarily 30 of the wealthiest countries (though not specifically the 30 wealthiest countries). The difference between OECD countries and the rest of the world is evident. Figure 1 plots the TMLE efficiency estimates versus per capita GDP, segmented by OECD membership. The figure is consistent with the values in Table 1. This suggests (but, of course, does not establish) that OECD membership may be a substantive selection mechanism. OECD membership is based on more than simply per capita GDP. The selectivity issue is whether other factors related to OECD membership are correlated with the stochastic element in the production function.

[Fig. 1 Efficiency scores related to per capita GDP. Larger points indicate OECD members.]

Figure 1 plots TMLE's estimated efficiency scores against per capita GDP for the 191 countries stratified by OECD membership. The difference is stark. The layer of points at the top of the figure for the OECD countries suggests that wealth produces efficiency in the outcome. The question for present purposes is whether the selection based on the observed GDP value is a complete explanation of the difference, or whether there are latent factors related to OECD membership that also impact the placement of the frontier function. We will use the sample selection model developed earlier to examine the issue. We note that it is not our intent here to replace the results of the WHO study. Rather, this provides a setting for demonstrating the selection model. Since we will be using a stochastic frontier model while they used a fixed effects linear regression, it will be difficult to make a direct comparison of the results (the issue is examined in detail in Greene 2004). TMLE also used an elaborate normalization based on a turn of the last century benchmark to anchor their efficiency estimates to a "minimal" level of health care. And, of course, they used a panel data (fixed effects) estimator whereas we have used a cross section. As such, it seems unlikely that the specific estimates of inefficiency would be very similar. We can, however, see whether general conclusions do hold up in the two settings. For example, if both approaches are addressing the same broad concept of efficiency relative to the production function in (23), then the rankings of countries might well be broadly similar. It is interesting to compare the rankings of countries produced by the two methodologies, though we will do so without naming names.

We have estimated the stochastic frontier models for the log COMP measure using TMLE's truncated specification of the translog model. Since the time invariant data are only observed for 1997, we have used the country means of the logs of the variables COMP, HExp and Educ in our estimation. Table 2 presents the maximum likelihood and maximum simulated likelihood estimates of the parameters of the frontier models. The MSL estimates are computed using 200 Halton draws for each observation for the simulation (see Greene 2008b or Train 2003 for discussion of Halton sequences). By using Halton draws rather than pseudorandom numbers, we can achieve replicability of the estimates. To test the specification of the selection model, we have fit the sample selection model while constraining $\rho$ to equal zero. The log likelihood functions can then be compared using the usual chi squared statistic. The results thus provide two statistics for the test: the Wald statistic (t ratio) associated with the estimate of $\rho$ and the likelihood ratio statistic. Both Wald statistics fail to reject the null hypothesis of no selection. For the LR statistics (with one degree of freedom), we do not reject the base model for the non-OECD countries, but we do for the OECD countries, in conflict with the t test. Since the OECD sample is only 30 observations, the standard normal and chi squared limiting distributions used for the test statistics may be suspect. We would conclude that the evidence does not strongly support the selection model. It would seem that the selection is dominated by the observables, presumably primarily by per capita income. Figure 2 plots the estimated efficiency scores from the stochastic frontier model versus those in
22 J Prod Anal (2010) 34:15–24
Table 2 Estimated stochastic frontier models

Non-OECD countries OECD countries
Stochastic frontier Sample selection Stochastic frontier Sample selection
Constant 3.76162 (0.05429) 3.74915 (0.05213) 3.10994 (1.15519) 3.38244 (1.42161)

LogHexp 0.08388 (0.01023) 0.08842 (0.010228) 0.04765 (0.006426) 0.04340 (0.008805)
LogEduc 0.09096 (0.075150) 0.09053 (0.073367) 1.00667 (1.06222) 0.77422 (1.2535)
2
Log Educ 0.00649 (0.02834) 0.00564 (0.02776) -0.23710 (0.24441) -0.18202 (0.28421)
ru 0.12300 0.12859 0.02649 0.01509
rv 0.05075 0.04735 0.00547 0.01354
k 2.42388 2.71549 4.84042 1.11413
r 0.13306 0.13703 0.02705 0.02027
q 0.0000 0.63967 (1.4626) 0.0000 -0.73001 (0.56945)
logL 160.2753 161.0141 62.96128 65.44358
LR test 1.4776 4.9646
N 161 30
The estimated probit model for OECD membership (with estimated standard errors in parentheses) is OECD = -8.2404 (3.369) ?
0.7388LogPerCapitaGDP (0.3820) ? 0.6098GovernmentEffectiveness (0.4388) ? 0.7291Voice (0.3171)
Fig. 2 Estimated efficiency scores (panel: Efficiencies from Selection SF Model vs. WHO Estimates; WHOEFF vs. EFFSEL)
Fig. 3 Alternative estimators of efficiency scores (panel: Simulation vs. Plug-in Efficiency Estimates; EFFJLMS vs. EFFSIM)
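Fig. 3 compares two estimators of inefficiency: the JLMS plug-in formula and the simulated E[u|ε] obtained during maximum simulated likelihood estimation. The Python sketch below illustrates both quantities for the base normal-half normal production frontier, ε = v - u. Given a draw u_r = σu|w_r|, the noise is v = ε + u_r, so the density f(ε) is simulated by averaging φ((ε + u_r)/σv)/σv over draws (summing the log of this average over observations gives a simulated log likelihood), and E[u|ε] is the same average weighted by u_r. Several assumptions apply: the sketch ignores the selection term (ρ = 0) and does not reproduce the selectivity-corrected version; it uses plain pseudo-random draws; the values of σu and σv are borrowed from the non-OECD selection column of Table 2 purely for illustration; and all function names are hypothetical. The closed-form densities and the JLMS formula are included only as checks.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.pdf and PHI.cdf


def half_normal_draws(sigma_u, n, rng):
    """u_r = sigma_u * |w_r|, w_r ~ N(0, 1): half-normal inefficiency draws."""
    return [sigma_u * abs(rng.gauss(0.0, 1.0)) for _ in range(n)]


def exponential_draws(sigma_u, n, rng):
    """Exponential inefficiency draws with mean sigma_u."""
    return [rng.expovariate(1.0 / sigma_u) for _ in range(n)]


def simulate(eps, sigma_v, u_draws):
    """Simulated f(eps) and E[u | eps] for the production frontier eps = v - u.

    Given u, v = eps + u ~ N(0, sigma_v^2), so f(eps) = E_u[phi((eps + u)/sigma_v)/sigma_v].
    """
    weights = [PHI.pdf((eps + u) / sigma_v) / sigma_v for u in u_draws]
    total = sum(weights)
    density = total / len(u_draws)
    cond_mean = sum(w * u for w, u in zip(weights, u_draws)) / total
    return density, cond_mean


def half_normal_density(eps, sigma_u, sigma_v):
    """Closed form: f(eps) = (2/sigma) phi(eps/sigma) Phi(-eps*lambda/sigma)."""
    sigma = math.hypot(sigma_u, sigma_v)
    lam = sigma_u / sigma_v
    return 2.0 / sigma * PHI.pdf(eps / sigma) * PHI.cdf(-eps * lam / sigma)


def exponential_density(eps, sigma_u, sigma_v):
    """Closed form for the normal-exponential model (u has mean sigma_u)."""
    c = sigma_v / sigma_u
    return math.exp(eps / sigma_u + 0.5 * c * c) * PHI.cdf(-eps / sigma_v - c) / sigma_u


def jlms(eps, sigma_u, sigma_v):
    """JLMS plug-in estimator of E[u | eps] for the normal-half normal model."""
    sigma2 = sigma_u ** 2 + sigma_v ** 2
    mu_star = -eps * sigma_u ** 2 / sigma2
    sigma_star = sigma_u * sigma_v / math.sqrt(sigma2)
    z = mu_star / sigma_star
    return mu_star + sigma_star * PHI.pdf(z) / PHI.cdf(z)


if __name__ == "__main__":
    SU, SV = 0.12859, 0.04735  # illustrative values only (Table 2)
    rng = random.Random(12345)
    hn = half_normal_draws(SU, 50_000, rng)
    for eps in (-0.10, -0.02, 0.05):
        f_sim, u_sim = simulate(eps, SV, hn)
        print(f"eps={eps:+.2f}  f: sim {f_sim:.4f} exact {half_normal_density(eps, SU, SV):.4f}"
              f"  E[u|eps]: sim {u_sim:.4f} JLMS {jlms(eps, SU, SV):.4f}")
```

Replacing half_normal_draws with exponential_draws changes only how u is simulated; simulate itself is untouched. This mirrors the remark in the Conclusions that moving from the half normal to another inefficiency distribution is a trivial modification of the simulation.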
We did not reestimate the TMLE values; those shown in Fig. 2 appear in the tables in the WHO report. As anticipated in Greene (2004), the fixed effects regression attributes to inefficiency effects that might be better explained by cross country heterogeneity. These effects would be picked up by the noise term in the frontier model. The heavy diagonal line in Fig. 2 shows the effect: save for the very largest values, the MSL estimates of E[u_i|ε_i] are well below their counterparts computed using the TMLE fixed effects estimator.

Figure 3 shows a plot of the two estimators of the inefficiency scores in the selectivity corrected frontier model: the JLMS estimator and the simulated values of E[u|ε] computed during the estimation. These are based on the parameters of the selectivity model in (11). As noted earlier, they are strikingly similar.

Finally, Fig. 4 shows a plot of the country ranks based on the stochastic frontier model versus the country ranks implicit in the WHO estimates for the non-OECD countries. The Spearman rank correlation of the two series is 0.66, which seems higher than the figure would suggest. The (visually) quite weak correlation in the two sets of results conflicts with our earlier suggestion. In sum, there is a long list of substantive differences between the MSL approach taken here and the one in TMLE. There are at least three sources of difference. First, TMLE used a fixed effects linear regression, whereas we have used a stochastic frontier model. Second, we have used the time invariant variables in Table 1 to control for cross country heterogeneity, whereas TMLE did not make use of these. Third, we have accounted for the nonrandom sample selection in the OECD and non-OECD subsamples. None of these, alone or together, should necessarily produce a change in
the rankings of observations. The impact of each source of variation might be the subject of fruitful further analysis. The TMLE study was ultimately focused on the ranks of the countries, not on the inefficiency levels themselves. The disparity in the ranks produced by the methods considered here should be of significant concern.

Fig. 4 Ranks of countries based on WHO and simulation efficiency estimates (RANKW vs. RANKS)

The analysis described here is essentially microeconomic, behavioral in nature. One might question the theoretical underpinnings of a behavioral model of optimization and efficiency applied to macroeconomic data such as these. Another recent study, Rahman et al. (2009), used the methods described in this paper to analyze the production efficiency of rice producers in Thailand. In this study, the authors analyzed the switch by Thai farmers from lower quality rice varieties to a higher quality Jasmine variety. Their sample included 207 farmers in the former group and 141 in the latter. They were able to examine the production process in much greater detail than we have here. In their results, the “correction” for selection into the high quality market produced quite marked differences in the estimated production frontier and a highly significant “selection effect.”

4 Conclusions

We have proposed a maximum simulated likelihood estimator for ALS's normal-half normal stochastic frontier model. The normal-exponential model, a normal-t model, or a normal-anything else model would all be trivial modifications: only the manner in which the values of u_i are simulated changes from one to the next, and the identical simulation based estimator of the inefficiencies is used as well. We note that in a few other cases, such as the t distribution, simulation (or MCMC) is the only feasible method of proceeding (see Tsionas et al. 2008). The model is then extended using Heckman's (1976) formulation for the linear model and Terza's (1986, 2009) extension to nonlinear models to produce a sample selection correction for the stochastic frontier model.

The assumption that the unobservables in the selection equation are correlated with the heterogeneity in the production function but uncorrelated with the inefficiency is an important feature of the model. It seems natural and appropriate in this setting; one might expect that observations are not selected into the sample based on their being inefficient to begin with. Nonetheless, that, as well, is an issue that might be further considered (note, again, the alternative approaches by Kumbhakar et al. 2009 and by Lai et al. 2009). A related question is whether it is reasonable to assume that the heterogeneity and the inefficiency in the production model are uncorrelated. Some progress has been made in this regard, e.g., in Smith (2003) and, by implication, Lai et al. (2009), but the analysis is tangential to the model considered here.

We have revisited the WHO (2000) study and found that the results vary greatly depending on the specification. However, it does appear that our expectation that ‘selection’ on OECD membership is an important element of the measured inefficiency in the data was not supported statistically. The results suggest that the obvious pattern in Fig. 1 that separates OECD from non-OECD members is explained by observables (such as per capita GDP) and not by unobservables, as would be implied by the sample selection model.

References

Aigner D, Lovell K, Schmidt P (1977) Formulation and estimation of stochastic frontier production function models. J Econ 6:21–37
Battese G, Coelli T (1995) A model for technical inefficiency effects in a stochastic frontier production function for panel data. Empir Econ 20:325–332
Boyes W, Hoffman S, Low S (1989) An econometric analysis of the bank credit scoring problem. J Econ 40:3–14
Bradford D, Kleit A, Krousel-Wood M, Re R (2001) Stochastic frontier estimation of cost models within the hospital. Rev Econ Stat 83(2):302–309
Collins A, Harris R (2005) The impact of foreign ownership and efficiency on pollution abatement expenditures by chemical plants: Some UK evidence. Scott J Political Econ 52(5):757–768
Econometric Software, Inc. (2007) LIMDEP version 9.0. Plainview, New York
Gourieroux C, Monfort A (1996) Simulation based econometric methods. Oxford University Press, Oxford
Gravelle H, Jacobs R, Street JA (2002a) Comparing the efficiency of national health systems: Econometric analysis should be handled with care. University of York, Health Economics Unit, UK Manuscript
Gravelle H, Jacobs R, Street JA (2002b) Comparing the efficiency of national health systems: A sensitivity approach. University of York, Health Economics Unit, UK Manuscript
Greene W (1994) Accounting for excess zeros and sample selection in Poisson and negative binomial regression models. Stern School of Business, NYU, Working Paper EC-94-10
Greene W (2004) Distinguishing between heterogeneity and inefficiency: Stochastic frontier analysis of the World Health Organization's panel data on national health care systems. Health Econ 13:959–980
Greene W (2008a) The econometric approach to efficiency analysis. In: Fried H, Lovell K, Schmidt S (eds) The measurement of efficiency. Oxford University Press, Oxford
Greene W (2008b) Econometric analysis, 6th edn. Prentice Hall, Englewood Cliffs
Greene W, Misra S (2004) Simulated maximum likelihood estimation of the stochastic frontier model. Manuscript, Department of Marketing, University of Rochester
Heckman J (1976) Discrete, qualitative and limited dependent variables. Ann Econ Soc Meas 4(5):475–492
Heckman J (1979) Sample selection bias as a specification error. Econometrica 47:153–161
Hollingsworth J, Wildman B (2002) The efficiency of health production: Re-estimating the WHO panel data using parametric and nonparametric approaches to provide additional information. Health Econ 11:1–11
Jondrow J, Lovell K, Materov I, Schmidt P (1982) On the estimation of technical inefficiency in the stochastic frontier production function model. J Econ 19:233–238
Kaparakis E, Miller S, Noulas A (1994) Short run cost inefficiency of commercial banks: A flexible stochastic frontier approach. J Money Credit Bank 26:21–28
Koop G, Steel M (2001) Bayesian analysis of stochastic frontier models. In: Baltagi B (ed) Companion to theoretical econometrics. Blackwell, Oxford
Kopp R, Mullahy J (1990) Moment-based estimation and testing of stochastic frontier models. J Econ 46(1/2):165–184
Kumbhakar S, Tsionas M, Sipilainen T (2009) Joint estimation of technology choice and technical efficiency: An application to organic and conventional dairy farming. J Prod Anal 31(3):151–162
Lai H, Polachek S, Wang H (2009) Estimation of a stochastic frontier model with a sample selection problem. Working Paper, Department of Economics, National Chung Cheng University, Taiwan
Maddala G (1983) Limited dependent and qualitative variables in econometrics. Cambridge University Press, Cambridge
Murphy K, Topel R (2002) Estimation and inference in two-step econometric models. J Bus Econ Stat 20:88–97
New York Times, Editorial: “World's Best Medical Care?” August 12, 2007
Newhouse J (1994) Frontier estimation: How useful a tool for health economics? J Health Econ 13:317–322
Pitt M, Lee L (1981) The measurement and sources of technical inefficiency in the Indonesian weaving industry. J Dev Econ 9:43–64
Rahman S, Wiboonpongse A, Sriboonchitta S, Chaovanapoonphol Y (2009) Production efficiency of jasmine rice producers in Northern and North-eastern Thailand. J Agric Econ, online, pp 1–17 (forthcoming)
Sipiläinen T, Oude Lansink A (2005) Learning in switching to organic farming. Nordic Association of Agricultural Scientists, NJF Report, Vol 1, Number 1, 2005. http://orgprints.org/5767/01/N369.pdf
Smith M (2003) Modeling sample selection using archimedean copulas. Econ J 6:99–123
Stevenson R (1980) Likelihood functions for generalized stochastic frontier estimation. J Econ 13:58–66
Tandon A, Murray C, Lauer J, Evans D (2000) Measuring the overall health system performance for 191 countries. World Health Organization, GPE Discussion Paper, EIP/GPE/EQC Number 30, 2000. http://www.who.int/entity/healthinfo/paper30.pdf
Terza J (1994) Dummy endogenous variables and endogenous switching in exponential conditional mean regression models. Manuscript, Department of Economics, Penn State University
Terza J (1996) FIML, method of moments and two stage method of moments estimators for nonlinear regression models with endogenous switching and sample selection. Working Paper, Department of Economics, Penn State University
Terza J (1998) Estimating count data models with endogenous switching: Sample selection and endogenous treatment effects. J Econ 84(1):129–154
Terza JV (2009) Parametric nonlinear regression with endogenous switching. Econ Rev 28:555–580
Train K (2003) Discrete choice methods with simulation. Cambridge University Press, Cambridge
Tsionas E, Kumbhakar S, Greene W (2008) Non-Gaussian stochastic frontier models. Manuscript, Department of Economics, University of Binghamton
Weinstein M (1964) The sum of values from a normal and a truncated normal distribution. Technometrics 6:104–105, 469–470
Winkelmann R (1998) Count data models with selectivity. Econ Rev 17(4):339–359
World Health Organization (2000) The World Health Report. WHO, Geneva
Wynand P, van Praag B (1981) The demand for deductibles in private health insurance: A probit model with sample selection. J Econ 17:229–252