Microeconometrics, Chapter 3: Truncation, Censoring, and Selection Models
Théophile T. Azomahou
University Clermont Auvergne, CNRS, CERDI
Maastricht University, School of Business and Economics
Email: theophile.azomahou@uca.fr
Truncation
Censoring
Selection
Application to Impact Evaluation
Théophile T. Azomahou (CERDI), February 20-28, 2020
1. Introduction
Previously, we used frameworks in which all observations are available for all
variables of interest, in the sense that the sample is representative of the
population.
Here we consider regression when the dependent variable of interest is
incompletely observed and regression when the dependent variable is
completely observed but is observed in a selected sample that is not
representative of the population.
All these models share the common feature that even in the simplest case of
population conditional mean linear in regressors, OLS regression leads to
inconsistent parameter estimates because the sample is not representative of
the population.
This includes limited dependent variable models, latent variable models,
Tobit models and selection models.
Alternative estimation procedures, most relying on strong distributional
assumptions, are necessary to ensure consistent parameter estimation.
$$y^* = -2500 + 1000 \ln w + \varepsilon, \qquad \varepsilon \sim N(0, 1000^2), \qquad \ln w \sim N(2.7, 0.60^2). \tag{1}$$
The model implies that the wage elasticity is 1000/y ∗ , which equals, for
example, 0.5 for full-time work (2,000 hours). For each 1% increase in wage,
annual hours increase by 10 hours.
With censoring at zero, negative values of y ∗ are set to zero because people
with negative desired hours of work choose not to work. For this particular
sample this is the case for about 35% of the observations. This pushes up
the mean for low wages, since the many negative values of the y ∗ are shifted
up to zero. It has little impact for high wages, since then few observations
on y ∗ are zero.
With truncation at zero the 35% of the population with negative values of y ∗
are dropped altogether. This increases the mean above the censored mean,
since zero values are no longer included in the data used to form the mean.
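The ordering of the three means described above can be checked with a small simulation. The following is a minimal sketch that draws from model (1) using only the Python standard library; the sample size and seed are arbitrary assumptions, and the share of nonpositive values in a simulated sample need not match the 35% quoted above for the author's sample.

```python
import random
import statistics


def simulate_hours(n=20000, seed=0):
    """Draw desired hours y* from model (1); return latent, censored, truncated samples."""
    rng = random.Random(seed)
    y_star = []
    for _ in range(n):
        ln_w = rng.gauss(2.7, 0.60)       # log wage
        eps = rng.gauss(0.0, 1000.0)      # disturbance
        y_star.append(-2500.0 + 1000.0 * ln_w + eps)
    censored = [max(y, 0.0) for y in y_star]     # negative values set to zero
    truncated = [y for y in y_star if y > 0.0]   # negative values dropped
    return y_star, censored, truncated


y_star, censored, truncated = simulate_hours()
mean_latent = statistics.fmean(y_star)
mean_censored = statistics.fmean(censored)
mean_truncated = statistics.fmean(truncated)
share_dropped = 1.0 - len(truncated) / len(y_star)

print(f"mean of y*:        {mean_latent:8.1f}")
print(f"censored mean:     {mean_censored:8.1f}")
print(f"truncated mean:    {mean_truncated:8.1f}")
print(f"share with y* <= 0: {share_dropped:.3f}")
```

In any such draw, the latent mean should lie below the censored mean, which in turn lies below the truncated mean.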
2. Truncation
2.1 Definition and mechanism
Let y ∗ denote a variable that is incompletely observed.
For truncation from below, y ∗ is only observed if y ∗ exceeds a threshold.
For simplicity, let that threshold be zero. Then we observe y = y ∗ if y ∗ > 0.
Since negative values do not appear in the sample, the truncated mean
exceeds the mean of y ∗ .
Mechanism: Truncation entails a greater loss of information than censoring,
because all data on observations at or beyond the bound are lost. With
truncation from below we observe only
y = y ∗ if y ∗ > L (2)
For example, only consumers who purchased durable goods may be sampled
(L = 0). With truncation from above we observe only
y = y∗ if y ∗ < U (3)
$$f(x \mid x > a) = \frac{f(x)}{\operatorname{Prob}(x > a)}.$$
If x has a normal distribution with mean µ and standard deviation σ, then:
$$\operatorname{Prob}(x > a) = 1 - \Phi\left(\frac{a - \mu}{\sigma}\right) = 1 - \Phi(\alpha),$$
where α = (a − µ)/σ and Φ(.) is the standard normal cdf. The density of the
truncated normal distribution is then
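$$f(x \mid x > a) = \frac{f(x)}{1 - \Phi(\alpha)} = \frac{(2\pi\sigma^2)^{-1/2} e^{-(x-\mu)^2/(2\sigma^2)}}{1 - \Phi(\alpha)} = \frac{\frac{1}{\sigma}\,\phi\left(\frac{x - \mu}{\sigma}\right)}{1 - \Phi(\alpha)},$$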
where φ(.) is the standard normal pdf.
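As a quick sanity check on the truncated normal density, the sketch below evaluates it and verifies numerically that it integrates to one over x > a. The parameter values and the midpoint-rule integrator are illustrative assumptions, not part of the original slides.

```python
import math


def std_normal_pdf(z):
    """Standard normal pdf, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal cdf, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def truncated_normal_pdf(x, mu, sigma, a):
    """Density of x | x > a when x ~ N(mu, sigma^2); zero at or below a."""
    if x <= a:
        return 0.0
    alpha = (a - mu) / sigma
    return std_normal_pdf((x - mu) / sigma) / (sigma * (1.0 - std_normal_cdf(alpha)))


def integrate(f, lo, hi, steps=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


mu, sigma, a = 0.0, 1.0, -0.5   # illustrative values
total_mass = integrate(lambda x: truncated_normal_pdf(x, mu, sigma, a), a, mu + 10.0 * sigma)
print(f"integral of truncated density: {total_mass:.8f}")
```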
$$f(x) = 1, \quad 0 \le x \le 1.$$
For a variable distributed uniformly between $L$ and $U$, the variance is $(U - L)^2/12$.
Thus,
$$\operatorname{Var}\left[x \mid x > \tfrac{1}{3}\right] = \frac{(1 - 1/3)^2}{12} = \frac{1}{27}.$$
The mean and variance of the untruncated distribution are $\tfrac{1}{2}$ and $\tfrac{1}{12}$, respectively.
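The uniform example can be reproduced with exact arithmetic. This short sketch assumes, as in the variance above, truncation from below at 1/3, so that the truncated variable is uniform on [1/3, 1].

```python
from fractions import Fraction


def uniform_moments(lower, upper):
    """Mean and variance of a uniform distribution on [lower, upper]."""
    mean = (lower + upper) / 2
    variance = (upper - lower) ** 2 / 12
    return mean, variance


full_mean, full_var = uniform_moments(Fraction(0), Fraction(1))
trunc_mean, trunc_var = uniform_moments(Fraction(1, 3), Fraction(1))

print("untruncated:", full_mean, full_var)
print("truncated at 1/3:", trunc_mean, trunc_var)
```

Truncation from below raises the mean and reduces the variance, in line with the conclusions that follow.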
Example: conclusion
1 If the truncation is from below, then the mean of the truncated variable is
greater than the mean of the original one. If the truncation is from above,
then the mean of the truncated variable is smaller than the mean of the
original one.
2 Truncation reduces the variance compared with the variance in the
untruncated distribution. This shows that truncation is essentially a
characteristic of the distribution from which the sample data are
drawn.
Henceforth, we shall use the terms truncated mean and truncated variance to
refer to the mean and variance of the random variable with a truncated
distribution.
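Moments of the truncated normal distribution: if $x \sim N[\mu, \sigma^2]$ and $a$ is a constant, then
$$E[x \mid x > a] = \mu + \sigma\lambda(\alpha), \qquad \operatorname{Var}[x \mid x > a] = \sigma^2[1 - \delta(\alpha)],$$
where $\alpha = (a - \mu)/\sigma$, $\lambda(\alpha) = \phi(\alpha)/[1 - \Phi(\alpha)]$, and
$$\delta(\alpha) = \lambda(\alpha)[\lambda(\alpha) - \alpha]. \tag{4}$$
A result that we will use at several points below is $d\phi(\alpha)/d\alpha = -\alpha\phi(\alpha)$. The
function $\lambda(\alpha)$ is called the inverse Mills ratio. It is also the hazard
function for the standard normal distribution.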
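The quantities in this passage are easy to compute. The sketch below assumes truncation from below, so that λ(α) = φ(α)/[1 − Φ(α)]. It checks on a grid that δ(α) lies strictly between zero and one, and verifies dφ(α)/dα = −αφ(α) by a central difference. The grid and step size are arbitrary choices.

```python
import math


def std_normal_pdf(z):
    """Standard normal pdf, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal cdf, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(alpha):
    """lambda(alpha) = phi(alpha) / (1 - Phi(alpha)), truncation from below."""
    return std_normal_pdf(alpha) / (1.0 - std_normal_cdf(alpha))


def delta(alpha):
    """delta(alpha) = lambda(alpha) * (lambda(alpha) - alpha), as in (4)."""
    lam = inverse_mills(alpha)
    return lam * (lam - alpha)


grid = [k / 2.0 for k in range(-6, 7)]          # alpha from -3 to 3
deltas = [delta(alpha) for alpha in grid]

# Central-difference check of d phi / d alpha = -alpha * phi(alpha)
h = 1e-5
derivative_gaps = [
    abs((std_normal_pdf(al + h) - std_normal_pdf(al - h)) / (2.0 * h) + al * std_normal_pdf(al))
    for al in grid
]

for alpha, d in zip(grid, deltas):
    print(f"alpha={alpha:5.1f}  lambda={inverse_mills(alpha):7.4f}  delta={d:6.4f}")
```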
$$y_i = x_i'\beta + \varepsilon_i,$$
where
$$\varepsilon_i \mid x_i \sim N[0, \sigma^2],$$
so that
$$y_i \mid x_i \sim N[x_i'\beta, \sigma^2]. \tag{5}$$
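We are interested in the distribution of $y_i$ given that $y_i$ is greater than the
truncation point $a$. This is the result described in the theorem on the moments of
the truncated normal distribution. It follows that
$$E[y_i \mid y_i > a] = x_i'\beta + \sigma\,\frac{\phi[(a - x_i'\beta)/\sigma]}{1 - \Phi[(a - x_i'\beta)/\sigma]}. \tag{6}$$
The partial effects in this model in the subpopulation can be obtained by writing
$$E[y_i \mid y_i > a] = x_i'\beta + \sigma\lambda(\alpha_i), \tag{7}$$
where now $\alpha_i = (a - x_i'\beta)/\sigma$. For convenience, let $\lambda_i = \lambda(\alpha_i)$ and $\delta_i = \delta(\alpha_i)$.
Then
$$\begin{aligned}
\frac{\partial E[y_i \mid y_i > a]}{\partial x_i}
&= \beta + \sigma\,\frac{d\lambda_i}{d\alpha_i}\,\frac{\partial \alpha_i}{\partial x_i} \\
&= \beta + \sigma\left(\lambda_i^2 - \alpha_i\lambda_i\right)\left(-\beta/\sigma\right) \\
&= \beta\left(1 - \lambda_i^2 + \alpha_i\lambda_i\right) \\
&= \beta(1 - \delta_i).
\end{aligned} \tag{8}$$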
Note the appearance of the scale factor 1 − δi from the truncated variance.
Because (1 − δi ) is between zero and one, we conclude that for every element of
xi , the marginal effect is less than the corresponding coefficient. There is a similar
attenuation of the variance.
Whether the partial effect in (8) or the coefficient β itself is of interest depends
on the intended inferences of the study. If the analysis is to be confined to the
subpopulation, then (8) is of interest. If the study is intended to extend to the
entire population, however, then it is the coefficients β that are actually of
interest.
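To make the attenuation concrete, the following sketch computes the subpopulation partial effects β(1 − δ_i) for one observation and compares them with a finite-difference derivative of the truncated mean x_i′β + σλ(α_i). The coefficient vector, regressor values, σ, and truncation point are hypothetical numbers chosen only for illustration.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(alpha):
    """lambda(alpha) for truncation from below."""
    return std_normal_pdf(alpha) / (1.0 - std_normal_cdf(alpha))


def truncated_mean(beta, x, sigma, a):
    """E[y | y > a, x] = x'beta + sigma * lambda(alpha), alpha = (a - x'beta) / sigma."""
    xb = sum(b * v for b, v in zip(beta, x))
    return xb + sigma * inverse_mills((a - xb) / sigma)


def truncated_partial_effects(beta, x, sigma, a):
    """Partial effects beta * (1 - delta_i) in the truncated subpopulation."""
    xb = sum(b * v for b, v in zip(beta, x))
    alpha = (a - xb) / sigma
    lam = inverse_mills(alpha)
    delta_i = lam * (lam - alpha)
    return [b * (1.0 - delta_i) for b in beta]


# Hypothetical values: constant plus one regressor, truncation at zero
beta, x, sigma, a = [1.0, 0.5], [1.0, 2.0], 1.5, 0.0
effects = truncated_partial_effects(beta, x, sigma, a)

# Finite-difference derivative with respect to the second regressor
h = 1e-5
numeric_effect = (
    truncated_mean(beta, [x[0], x[1] + h], sigma, a)
    - truncated_mean(beta, [x[0], x[1] - h], sigma, a)
) / (2.0 * h)

print("coefficients:   ", beta)
print("partial effects:", [round(e, 4) for e in effects])
print("finite difference for regressor 2:", round(numeric_effect, 4))
```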
3. Censoring
3.1 Examples, definition and mechanism
Censoring of the dependent variable is a very common problem in
microeconomic data. When the dependent variable is censored, values in a certain
range are all transformed to (or reported as) a single value. Some studies in the
empirical literature:
1 Household purchases of durable goods [Tobin (1958)]
2 Number of extramarital affairs [Fair (1977, 1978)]
3 Number of hours worked by a woman [Quester and Greene (1982)]
4 Household expenditure on various commodity groups [Jarque (1987)]
Each of these studies analyzes a dependent variable that is zero for a significant
fraction of the observations. Conventional regression methods fail to account for
the qualitative difference between limit (zero) observations and nonlimit
(continuous) observations.
For example, all consumers may be sampled with some having positive
durable goods expenditures y ∗ > 0 and others having zero expenditures
y ∗ ≤ 0.
$$E[y \mid a = 0] = \Phi(\mu/\sigma)(\mu + \sigma\lambda), \quad \text{where } \lambda = \frac{\phi(\mu/\sigma)}{\Phi(\mu/\sigma)}.$$
For censoring of the upper part of the distribution instead of the lower, it is only
necessary to reverse the role of Φ and 1 − Φ and redefine λ as in Theorem on
moments of truncation.
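The censored mean for censoring at zero can be checked against a direct numerical integral of max(0, y*) over the normal density. This is a minimal sketch; the values of µ and σ and the midpoint-rule integrator are assumptions made for illustration.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_mean_at_zero(mu, sigma):
    """E[y] for y = max(0, y*), y* ~ N(mu, sigma^2): Phi(mu/sigma) * (mu + sigma * lambda)."""
    z = mu / sigma
    lam = std_normal_pdf(z) / std_normal_cdf(z)
    return std_normal_cdf(z) * (mu + sigma * lam)


def integrate(f, lo, hi, steps=20000):
    """Composite midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


mu, sigma = 1.0, 2.0   # illustrative values
formula_value = censored_mean_at_zero(mu, sigma)

# The mass at zero contributes nothing to the mean, so integrate y * f(y) over y > 0
numeric_value = integrate(
    lambda y: y * std_normal_pdf((y - mu) / sigma) / sigma, 0.0, mu + 10.0 * sigma
)

print(f"formula:   {formula_value:.6f}")
print(f"numerical: {numeric_value:.6f}")
```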
1. For the latent variable, $E[y_i^* \mid x_i]$ is $x_i'\beta$. If the data are always censored,
however, then this result will usually not be useful.
$$\frac{\partial E[y \mid x]}{\partial x} = \beta \times \operatorname{Prob}[a < y^* < b].$$
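The marginal-effect result can be verified numerically. The sketch below assumes a normal latent variable censored at both a lower bound a and an upper bound b (y = a if y* ≤ a, y = b if y* ≥ b, and y = y* otherwise), and compares β × Prob[a < y* < b] with a finite-difference derivative of E[y | x]. All numerical values are hypothetical.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_mean(beta, x, sigma, a, b):
    """E[y | x] for y* ~ N(x'beta, sigma^2) censored below at a and above at b."""
    xb = sum(bk * v for bk, v in zip(beta, x))
    za, zb = (a - xb) / sigma, (b - xb) / sigma
    p_low = std_normal_cdf(za)            # Prob[y* <= a]
    p_high = 1.0 - std_normal_cdf(zb)     # Prob[y* >= b]
    p_mid = std_normal_cdf(zb) - std_normal_cdf(za)
    # E[y* 1(a < y* < b)] = xb * p_mid + sigma * (phi(za) - phi(zb))
    interior = xb * p_mid + sigma * (std_normal_pdf(za) - std_normal_pdf(zb))
    return a * p_low + b * p_high + interior


def censored_partial_effects(beta, x, sigma, a, b):
    """beta scaled by the probability that y* is uncensored."""
    xb = sum(bk * v for bk, v in zip(beta, x))
    prob_mid = std_normal_cdf((b - xb) / sigma) - std_normal_cdf((a - xb) / sigma)
    return [bk * prob_mid for bk in beta]


# Hypothetical values
beta, x, sigma, a, b = [0.5, 1.0], [1.0, 0.8], 2.0, 0.0, 4.0
effects = censored_partial_effects(beta, x, sigma, a, b)

h = 1e-5
numeric_effect = (
    censored_mean(beta, [x[0], x[1] + h], sigma, a, b)
    - censored_mean(beta, [x[0], x[1] - h], sigma, a, b)
) / (2.0 * h)

print("partial effects:", [round(e, 4) for e in effects])
print("finite difference for regressor 2:", round(numeric_effect, 4))
```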
4. Estimation
The tobit model has become so routine and been incorporated in so many
computer packages that despite formidable obstacles in years past, estimation is
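now essentially on the level of ordinary linear regression. The log-likelihood for
the censored regression model is
$$\ln L = \sum_{y_i > 0} -\frac{1}{2}\left[\ln(2\pi) + \ln\sigma^2 + \frac{(y_i - x_i'\beta)^2}{\sigma^2}\right] + \sum_{y_i = 0} \ln\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right].$$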
The two parts correspond to the classical regression for the nonlimit observations
and the relevant probabilities for the limit observations, respectively. This
likelihood is a nonstandard type, because it is a mixture of discrete and
continuous distributions.
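A minimal sketch of the two-part log-likelihood follows, assuming censoring at zero and the standard tobit form: a normal log-density for each nonlimit observation and ln[1 − Φ(x_i′β/σ)] for each limit observation. The tiny data set and parameter values are made up for illustration, and the function only evaluates the likelihood; it does not maximize it.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(beta, sigma, X, y):
    """Tobit log-likelihood with censoring at zero (assumed standard form)."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, x_i))
        if y_i > 0.0:
            # Nonlimit observation: classical regression part
            resid = (y_i - xb) / sigma
            total += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma ** 2) + resid ** 2)
        else:
            # Limit observation: probability that y* <= 0
            total += math.log(1.0 - std_normal_cdf(xb / sigma))
    return total


# Hypothetical data: constant plus one regressor, last observation censored
X = [[1.0, 0.5], [1.0, 1.5], [1.0, -1.0]]
y = [1.2, 2.5, 0.0]
beta, sigma = [0.5, 1.0], 2.0

loglik_value = tobit_loglik(beta, sigma, X, y)
print(f"log-likelihood at the trial parameters: {loglik_value:.6f}")
```

In practice this function would be handed to a numerical optimizer over (β, σ); that step is omitted here.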
A. Framework
Selection equation: latent probit selection mechanism:
$$D_i^* = w_i'\alpha + u_i \quad \text{with } u_i \sim N[0, 1], \qquad D_i = 1[D_i^* > 0], \tag{14}$$
where $w_i$ stands for the selection controls, $D_i = 1$ if observation $i$ is selected,
and $u_i$ is the error term.
B. Estimation
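The log-likelihood is
$$\ln L = \sum_{i=1}^{n} \left\{ \ln \Phi\left[ (2D_i - 1)\,\frac{w_i'\alpha + (y_i - x_i'\beta - \delta D_i)(\rho/\sigma)}{\sqrt{1 - \rho^2}} \right] - \frac{1}{2}\left(\frac{y_i - x_i'\beta - \delta D_i}{\sigma}\right)^2 - \ln\left(\sigma\sqrt{2\pi}\right) \right\} \quad \text{for } D_i \in \{0, 1\}, \tag{17}$$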
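One observation's contribution to this log-likelihood can be sketched as follows. The code assumes the standard endogenous-treatment form: ln Φ of the signed selection index, adjusted for the outcome residual, plus the normal log-density of the residual y_i − x_i′β − δD_i. The function name, argument names, and numerical values are hypothetical. With ρ = 0 the contribution should reduce to a probit term plus a linear-regression term, which gives a simple check.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def treatment_loglik_i(y, d, xb, wa, delta, sigma, rho):
    """Log-likelihood contribution of one observation (assumed standard form).

    y: outcome, d: treatment indicator (0 or 1), xb: x'beta, wa: w'alpha,
    delta: treatment effect, sigma: outcome error s.d., rho: error correlation.
    """
    resid = y - xb - delta * d
    index = (wa + resid * rho / sigma) / math.sqrt(1.0 - rho ** 2)
    return (
        math.log(std_normal_cdf((2 * d - 1) * index))
        - 0.5 * (resid / sigma) ** 2
        - math.log(sigma * math.sqrt(2.0 * math.pi))
    )


# Hypothetical values for a treated and an untreated observation
ll_treated = treatment_loglik_i(y=1.0, d=1, xb=0.2, wa=0.3, delta=0.5, sigma=1.2, rho=0.4)
ll_control = treatment_loglik_i(y=1.0, d=0, xb=0.2, wa=0.3, delta=0.5, sigma=1.2, rho=0.4)

# With rho = 0 the contribution separates into probit and regression parts
ll_rho0 = treatment_loglik_i(y=1.0, d=1, xb=0.2, wa=0.3, delta=0.5, sigma=1.2, rho=0.0)
resid0 = (1.0 - 0.2 - 0.5) / 1.2
ll_separate = math.log(std_normal_cdf(0.3)) + math.log(std_normal_pdf(resid0) / 1.2)

print(f"treated: {ll_treated:.6f}, control: {ll_control:.6f}")
print(f"rho = 0: {ll_rho0:.6f} vs probit + regression: {ll_separate:.6f}")
```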
where Di∗ in equation (20) is a latent variable, the observed counterpart of which
is Di , and 1[ ] denotes the indicator function.
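We assume joint normality for the three disturbances:
$$\begin{pmatrix} \mu_i \\ \varepsilon_{i1} \\ \varepsilon_{i0} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_{\mu\varepsilon_1}\sigma_{\varepsilon_1} & \rho_{\mu\varepsilon_0}\sigma_{\varepsilon_0} \\ \rho_{\mu\varepsilon_1}\sigma_{\varepsilon_1} & \sigma^2_{\varepsilon_1} & 0 \\ \rho_{\mu\varepsilon_0}\sigma_{\varepsilon_0} & 0 & \sigma^2_{\varepsilon_0} \end{pmatrix} \right)$$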
B. Estimation
f2 (·) and f1 (·) are bivariate and univariate normal density functions. By
replacing in the population analogues:
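$$L = \prod_{i=1}^{N} \left[ \frac{1}{\sigma_{\varepsilon_0}}\,\phi(\zeta_0)\,\Phi\left( \frac{-\gamma' w_i - \rho_{\mu\varepsilon_0}\zeta_0}{\sqrt{1 - \rho^2_{\mu\varepsilon_0}}} \right) \right]^{1 - D_i} \left[ \frac{1}{\sigma_{\varepsilon_1}}\,\phi(\zeta_1)\,\Phi\left( \frac{\gamma' w_i + \rho_{\mu\varepsilon_1}\zeta_1}{\sqrt{1 - \rho^2_{\mu\varepsilon_1}}} \right) \right]^{D_i} \tag{24}$$
where $\zeta_k = (y_{ik} - \beta_k' x_i)/\sigma_{\varepsilon_k}$, with $k = (0, 1)$, and $\phi(\cdot)$ and $\Phi(\cdot)$ are respectively the
standard normal pdf and cdf.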
Because the second step uses estimates of the λ's rather than their true values,
the conventional standard errors are not valid and need to be corrected, for
example by simulation or the bootstrap.
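A nonparametric bootstrap for such a correction can be sketched generically: resample observations with replacement and rerun the whole procedure on each resample, so that the first-step estimation error in the λ's is carried into the spread of the second-step estimates. In the sketch below, `estimator` is a hypothetical placeholder for the full two-step routine; the sample mean stands in for it only so that the code runs and can be checked against a known standard error.

```python
import math
import random
import statistics


def bootstrap_se(data, estimator, n_boot=500, seed=0):
    """Bootstrap standard error of estimator(data) by resampling observations."""
    rng = random.Random(seed)
    n = len(data)
    draws = []
    for _ in range(n_boot):
        # Resample n observations with replacement, then rerun every estimation step
        resample = [data[rng.randrange(n)] for _ in range(n)]
        draws.append(estimator(resample))
    return statistics.stdev(draws)


# Stand-in example: the estimator is the sample mean (an assumption for illustration)
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 2.0) for _ in range(400)]

se_boot = bootstrap_se(sample, statistics.fmean)
se_analytic = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"bootstrap s.e.: {se_boot:.4f}")
print(f"analytic s.e.:  {se_analytic:.4f}")
```

For a two-step selection estimator, each observation in `data` would be a full record (outcome, regressors, selection indicator), and `estimator` would return the coefficient of interest.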
$$E(y_i \mid D_i = 1) = x_i'\beta + \delta + E(\varepsilon_i \mid D_i = 1) \quad \text{and} \quad E(y_i \mid D_i = 0) = x_i'\beta + E(\varepsilon_i \mid D_i = 0)$$
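These two conditional means imply that a naive comparison of treated and untreated outcomes recovers δ plus the gap E(ε_i | D_i = 1) − E(ε_i | D_i = 0). The simulation below illustrates this under simplifying assumptions that are not in the slides: no regressors (x_i′β = 0), a constant selection index c, and jointly normal errors with correlation ρ. Under those assumptions the gap equals ρσ[φ(c)/Φ(c) + φ(c)/(1 − Φ(c))].

```python
import math
import random


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def naive_gap(n=100000, delta=1.0, rho=0.5, sigma=1.0, c=0.0, seed=0):
    """Difference in mean outcomes between D = 1 and D = 0 in simulated data."""
    rng = random.Random(seed)
    sums = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                                   # selection error
        v = rng.gauss(0.0, 1.0)
        eps = sigma * (rho * u + math.sqrt(1.0 - rho ** 2) * v)   # corr(eps, u) = rho
        d = 1 if c + u > 0.0 else 0
        y = delta * d + eps
        sums[d] += y
        counts[d] += 1
    return sums[1] / counts[1] - sums[0] / counts[0]


def selection_gap(rho, sigma, c):
    """E(eps | D = 1) - E(eps | D = 0) under joint normality."""
    return rho * sigma * (
        std_normal_pdf(c) / std_normal_cdf(c) + std_normal_pdf(c) / (1.0 - std_normal_cdf(c))
    )


delta, rho, sigma, c = 1.0, 0.5, 1.0, 0.0
simulated = naive_gap(delta=delta, rho=rho, sigma=sigma, c=c)
predicted = delta + selection_gap(rho, sigma, c)

print(f"true delta:            {delta:.3f}")
print(f"naive difference:      {simulated:.3f}")
print(f"delta + selection gap: {predicted:.3f}")
```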
7. Empirical Application