Tema6 Punto4
Empirical Example
4. Sample selection Model : Heckman Model (Empirical Example)
[Figure: expenditure in beer, all households vs. buyers]
The outcome variable for the discrete choice model is the decision to buy or not to buy.
.
. * Dummy identifying the households in which beer is consumed
. gen byte consumidor=(gtot_cer>0 & gtot_cer<.)
. sum consumidor
tamamu
        2     -.175105   .0366985    -4.77   0.000   -.2470328   -.1031772
        3    -.1681606   .0398352    -4.22   0.000   -.2462362    -.090085
cons_vino     1.250868   .0206027    60.71   0.000    1.210487    1.291248
_cons        -1.296076   .1634957    -7.93   0.000   -1.616522   -.9756303
/mills
lambda        10.26995   .5403055    19.01   0.000    9.210974    11.32893
rho            0.53755
sigma        19.105058

... estimates whether we estimate the equations jointly using the heckman selection model or separately using OLS and logit.
4. Sample selection Model : Heckman Model : The case of the beer

. heckman gtot_cer $x, select(consumidor=$z) nolog twostep

Heckman selection model -- two-step estimates    Number of obs =  18,929
(regression model with sample selection)              Selected =   8,112
                                                   Nonselected =  10,817
                                                 Wald chi2(10) =  417.65
                                                   Prob > chi2 =  0.0000

            Coefficient  Std. err.       z   P>|z|   [95% conf. interval]
gtot_cer
renta          20.73475   1.717961    12.07   0.000    17.36761    24.10189
rentac        -6.812333   .7557411    -9.01   0.000   -8.293558   -5.331107
hombresp       2.760026   .4456736     6.19   0.000    1.886521     3.63353
adult          2.678986   .5132346     5.22   0.000    1.673065    3.684908
edadsp         .7575324   .1153621     6.57   0.000    .5314267    .9836381
edadsp2        -.007226   .0010042    -7.20   0.000   -.0091942   -.0052578
diaria0       -4.037678   1.028675    -3.93   0.000   -6.053844   -2.021511
diaria1        -1.42442      .8189    -1.74   0.082   -3.029435    .1805943
diaria2        .8537074   .6944884     1.23   0.219   -.5074648     2.21488
diaria3       -.6342487   .7540386    -0.84   0.400   -2.112137    .8436398
_cons         -23.20353   3.269747    -7.10   0.000   -29.61212   -16.79495

consumidor
renta          .5857992   .0820532     7.14   0.000    .4249779    .7466204
rentac        -.1898944   .0368576    -5.15   0.000    -.262134   -.1176549
hombresp       .0941907   .0216825     4.34   0.000    .0516938    .1366876
adult          .0854871    .027841     3.07   0.002    .0309197    .1400545
menores        .0265691   .0291151     0.91   0.361   -.0304955    .0836337
diaria0       -.4509204   .0508238    -8.87   0.000   -.5505332   -.3513075
diaria1       -.3098566   .0441941    -7.01   0.000   -.3964755   -.2232376
diaria2       -.1064416   .0375657    -2.83   0.005   -.1800691   -.0328141
diaria3       -.0300077   .0407837    -0.74   0.462   -.1099423    .0499269
edadsp          .026171    .005488     4.77   0.000    .0154146    .0369273
edadsp2       -.0002815   .0000473    -5.95   0.000   -.0003742   -.0001888
caprov        -.1564166   .0333308    -4.69   0.000   -.2217439   -.0910893
tamamu
        2      -.175105   .0366985    -4.77   0.000   -.2470328   -.1031772
        3     -.1681606   .0398352    -4.22   0.000   -.2462362    -.090085
        4     -.1482927   .0433588    -3.42   0.001   -.2332743   -.0633111
        5     -.1905596   .0374727    -5.09   0.000   -.2640046   -.1171145

/mills
lambda         10.26995   .5403055    19.01   0.000    9.210974    11.32893
rho             0.53755
sigma         19.105058

$\lambda_i = \dfrac{\varphi(Z_i\gamma)}{\Phi(Z_i\gamma)}$ : selection term

An example of analysis for covariates:
Income (renta) is present in both {X} and {Z}. It influences both the decision of whether to buy beer and the decision of how much beer to buy.
We can interpret the sign of the effect: in both equations it is positive and statistically significant. Hence our model states that, as family income grows, the probability of consuming beer at home increases, as does the amount of beer.
Can we say the same for the age of the reference person?
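The question about age can be approached from the signs of edadsp and edadsp2. A short worked calculation follows, assuming that edadsp2 is the square of edadsp (the variable names suggest this, but the slides do not state it). Under that assumption each index is a quadratic in age, with a turning point where the linear and squared terms offset each other:

```latex
\[
\text{age}^{\dagger} \;=\; -\,\frac{b_{\text{edadsp}}}{2\,b_{\text{edadsp2}}}
\]
\[
\text{selection equation: } \frac{0.026171}{2 \times 0.0002815} \approx 46.5
\qquad
\text{outcome equation: } \frac{0.7575324}{2 \times 0.007226} \approx 52.4
\]
```

So, unlike income, the sign of the age effect on each index is not constant: it is positive for younger reference persons and turns negative beyond these ages. These turning points refer to the linear indices $Z\gamma$ and $X\beta$ only, not to the conditional or unconditional mean.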
These “ycond” predictions are available for all observations in the dataset, but they are intended to compute $E[Q^{*} \mid X, y = 1]$ or, equivalently, $E[Q^{*} \mid X, Q > 0]$.
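A minimal sketch of how such a prediction could be generated after the heckman command. The variable name gcer_cond is the one used in the summarize command on the slides; that it was created exactly this way is an assumption.

```stata
* Conditional (truncated) mean: E(gtot_cer | gtot_cer observed)
* Run right after: heckman gtot_cer $x, select(consumidor=$z) nolog twostep
predict double gcer_cond, ycond
label variable gcer_cond "E(gtot_cer | observed), heckman"
```

The prediction is computed for every household, so any comparison with observed expenditure should be restricted to buyers.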
.
. sum gtot_cer gcer_cond if gtot_cer>0
$E[Q \mid X] = \Pr(y = 1 \mid X)\; E[Q^{*} \mid X, y = 1]$
What if we were interested in making predictions about mean consumption for all
households? Here the expected expenditure is 0 for those who are not expected to buy
good Q, with expected participation (buy=yes) determined by the selection equation.
These values can be obtained with the yexpected option of predict (“y” taken to be 0 where
unobserved)
. predict gcer_exp, yexpected
Again we note that the predictions from heckman are close to the observed mean
expenditure level for beer for all households.
Why might predictions using ycond and yexpected not equal their observed sample equivalents? For the Heckman model, unlike linear regression, the sample moments implied by the optimal solution to the model likelihood do not require that these predictions match the observed data. Properly accounting for the additional variation from the selection equation requires that the model use more information than just the sample moments of the observed expenditure.
$E[Q \mid X] = \Pr(y = 1 \mid X)\; E[Q^{*} \mid X, y = 1]$
Predict with the option “psel” calculates the probability of selection (or being observed).
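A short sketch of how the three predictions relate, assuming gcer_cond (from ycond) and gcer_exp (from yexpected) already exist. The names gcer_psel and gcer_chk are hypothetical and introduced here only for illustration.

```stata
* Probability of being observed (buying beer), from the selection equation
predict double gcer_psel, psel

* Censored mean rebuilt as Pr(observed) x truncated mean
generate double gcer_chk = gcer_psel * gcer_cond

* The two columns should agree up to rounding
summarize gcer_exp gcer_chk
```

This relies on the definition of yexpected as the probability of being observed times the expected value conditional on being observed.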
The issue of marginal effects
$E[Q^{*} \mid X, y = 1] = \alpha + X\beta + \rho\sigma\,\lambda(Z\gamma)$
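In the two-step output, the coefficient labelled lambda in the /mills block is the coefficient on the selection term, that is, $\rho\sigma$ multiplying the inverse Mills ratio $\lambda(Z\gamma)$. The reported values can be checked against each other:

```latex
\[
\hat\rho\,\hat\sigma \;=\; 0.53755 \times 19.105058 \;\approx\; 10.270
\;\approx\; \hat\lambda_{\text{/mills}} = 10.26995
\]
```

The small discrepancy comes from rho being reported with only five decimals.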
Marginal effects on the “latent” variable $Q^{*}$
The total marginal effect of the independent variables {Z (some X are also in Z)} on $Q^{*}$ in the observed sample may consist of two components.
First, there is the direct effect of the independent variable on the mean of $Q^{*}$, which is captured by $\beta$.
Second, there is an indirect effect if the independent variable also appears in the selection equation. This is because a change in some X changes not only the mean of $Q^{*}$ but also the probability that an observation is actually in the sample, i.e. it will affect $Q^{*}$ through $\lambda$:
$\dfrac{\partial E[Q^{*} \mid X, y = 1]}{\partial x_k} = \beta_k - \gamma_k\,\rho\sigma\,\lambda(Z\gamma)\,[\lambda(Z\gamma) + Z\gamma]$
The main point is that if $\rho \neq 0$ and the independent variable appears in both the selection and the outcome equation, then $\beta$ does NOT indicate the marginal effect of x on $Q^{*}$. It is quite possible for the magnitude, sign, and statistical significance of the marginal effect to all be different from the estimate of $\beta$. This point is often ignored. Thus, it is not sufficient to simply estimate the model and look at t-statistics to know if an independent variable (that appears in the selection and outcome equation) has an effect on $Q^{*}$.
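The indirect component can be derived in two steps. The sketch below assumes that $x_k$ enters both equations linearly, with coefficient $\beta_k$ in the outcome equation and $\gamma_k$ in the selection equation, and writes $z = Z\gamma$:

```latex
\[
\lambda(z) = \frac{\varphi(z)}{\Phi(z)}, \qquad
\lambda'(z) = \frac{-z\,\varphi(z)\,\Phi(z) - \varphi(z)^{2}}{\Phi(z)^{2}}
            = -\lambda(z)\,\bigl[z + \lambda(z)\bigr]
\]
\[
\frac{\partial}{\partial x_k}\Bigl[\alpha + X\beta + \rho\sigma\,\lambda(Z\gamma)\Bigr]
= \beta_k + \rho\sigma\,\lambda'(Z\gamma)\,\gamma_k
= \beta_k - \gamma_k\,\rho\sigma\,\lambda(Z\gamma)\,\bigl[\lambda(Z\gamma) + Z\gamma\bigr]
\]
```

Since $\lambda(z)\,[\lambda(z) + z]$ lies strictly between 0 and 1, the indirect term has the sign of $-\gamma_k\rho$. With $\rho > 0$ and $\gamma_k > 0$, the conditional marginal effect is therefore smaller than $\beta_k$.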
Z≠𝑋
[R] heckman postestimation: Postestimation tools for heckman
statistic            Description
xb                   linear prediction; the default
xbsel                linear prediction for selection equation
pr(a,b)              Pr(y | a < y < b)
e(a,b)               E(y | a < y < b)
ystar(a,b)           E(y*), y* = max{a,min(y,b)}
* ycond              E(y | y observed)
* yexpected          E(y*), y taken to be 0 where unobserved
nshazard or mills    nonselection hazard (also called inverse of Mills's ratio)
psel                 Pr(y observed)
stdp                 not allowed with margins
stdf                 not allowed with margins
stdpsel              not allowed with margins
* ycond and yexpected are not allowed with margins after heckman, twostep.

The partial effects on ycond are the partial effects on the truncated mean, i.e. only for those who actually have an observed value: $E[Q^{*} \mid X, Q > 0]$.
The partial effects on yexpected are the partial effects on the censored mean of the dependent variable: $E[Q^{*} \mid X, Q > 0] \cdot \Pr(Q^{*} > 0)$.
The censored mean is supposed to be equal to the probability of being observed times the truncated mean.
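The partial effect on the censored mean follows from the product rule. The sketch below assumes that $x_k$ enters both equations linearly and uses $\varphi'(z) = -z\,\varphi(z)$:

```latex
\[
E[Q \mid X] = \Phi(Z\gamma)\,\bigl[\alpha + X\beta + \rho\sigma\,\lambda(Z\gamma)\bigr]
            = \Phi(Z\gamma)\,(\alpha + X\beta) + \rho\sigma\,\varphi(Z\gamma)
\]
\[
\frac{\partial E[Q \mid X]}{\partial x_k}
= \Phi(Z\gamma)\,\beta_k
+ \gamma_k\,\varphi(Z\gamma)\,\bigl[\alpha + X\beta - \rho\sigma\,Z\gamma\bigr]
\]
```

The first term is the change in expenditure among buyers, weighted by the probability of buying. The second term is the change in the probability of buying.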
Some remarks
Running margins, dydx(*) after heckman estimation returns the coefficients of the value (outcome) equation for every regressor that enters linearly. Since no predict() option is specified, margins assumes that you want the marginal effects on the linear prediction of the value equation, i.e. the xb option is the default. For regressors entered with factor-variable notation, such as c.renta#c.renta, margins combines the level and squared terms, so dy/dx differs from the single coefficient.
There is nothing wrong with that, but if it is not what you want to estimate, you should say so by setting an appropriate predict() option.
See heckman postestimation to know which options are available and which one matches what you want.
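A hedged sketch of how the predict() option changes what margins reports. It reuses the macros $x and $z from the slides. The maximum likelihood re-estimation in the second half is an assumption made here, needed only because ycond and yexpected are not allowed with margins after the two-step estimator.

```stata
* Two-step estimates, as on the slides
quietly heckman gtot_cer $x, select(consumidor = $z) twostep

* Default: effect on the linear prediction xb of the outcome equation
margins, dydx(renta)

* Effect on the probability of being observed, Pr(consumidor = 1)
margins, dydx(renta) predict(psel)

* Assumption: re-estimate by maximum likelihood (no twostep option)
* so that ycond and yexpected become available to margins
quietly heckman gtot_cer $x, select(consumidor = $z)
margins, dydx(renta) predict(ycond)
margins, dydx(renta) predict(yexpected)
```

The maximum likelihood estimates will generally differ somewhat from the two-step ones, so the last two results are not directly comparable with the tables shown on the slides.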
. heckman gtot_cer renta c.renta#c.renta edadsp c.edadsp#c.edadsp $x1, select(consumidor=renta c.renta#c.renta edadsp c.edadsp#c.edadsp $z1) nolog twostep

                                                 Wald chi2(10) =  417.65
                                                   Prob > chi2 =  0.0000
First step
                    Coefficient  Std. err.       z   P>|z|   [95% conf. interval]
gtot_cer
renta                  20.73475   1.717961    12.07   0.000    17.36761    24.10189

consumidor
renta                  .5857992   .0820532     7.14   0.000    .4249779    .7466204
c.renta#c.renta       -.1898944   .0368576    -5.15   0.000    -.262134   -.1176549
edadsp                  .026171    .005488     4.77   0.000    .0154146    .0369273
c.edadsp#c.edadsp     -.0002815   .0000473    -5.95   0.000   -.0003742   -.0001888
tamamu
        2              -.175105   .0366985    -4.77   0.000   -.2470328   -.1031772
        3             -.1681606   .0398352    -4.22   0.000   -.2462362    -.090085
        4             -.1482927   .0433588    -3.42   0.001   -.2332743   -.0633111
        5             -.1905596   .0374727    -5.09   0.000   -.2640046   -.1171145

/mills
lambda                 10.26995   .5403055    19.01   0.000    9.210974    11.32893
rho                     0.53755
sigma                 19.105058

                                Delta-method
                          dy/dx   std. err.       z   P>|z|   [95% conf. interval]
renta                  10.04027   .6753112    14.87   0.000    8.716683    11.36385
edadsp                -.0558587   .0161448    -3.46   0.001   -.0875018   -.0242155
hombresp               2.760026   .4456736     6.19   0.000    1.886521     3.63353
adult                  2.678986   .5132346     5.22   0.000    1.673065    3.684908
First step
. qui heckman gtot_cer renta c.renta#c.renta edadsp c.edadsp#c.edadsp $x1, select(consumidor=renta c.renta#c.renta edadsp c.edadsp#c.edadsp $z
. *margins, dydx(*)
. margins, dydx(renta edad)
Delta-method
dy/dx std. err. z P>|z| [95% conf. interval]
1. Corner Solutions: Heckman Model
[ | ] ( )
Marginal effects: ,
An example for Income {this variable can also be observed for Q=0}
( )
β <marginal effect in this case since >0
Delta-method
dy/dx Std. Err. z P>|z| [95% Conf. Interval]