Microeconometrie Chapitre3 TruncationSensoringSelectionModels

Chapter 3
Truncation, Censoring and Selection Models
Théophile T. Azomahou
University Clermont Auvergne, CNRS, CERDI
Maastricht University, School of Business and Economics
Email: theophile.azomahou@uca.fr
Truncation
Censoring
Selection
Application to Impact Evaluation
Théophile T. Azomahou (CERDI), February 20-28, 2020
Introduction
1. Introduction
Previously, we used frameworks in which observations on all variables of
interest are available and the sample is representative of the population.
Here we consider regression when the dependent variable of interest is
incompletely observed and regression when the dependent variable is
completely observed but is observed in a selected sample that is not
representative of the population.
All these models share the common feature that even in the simplest case of
population conditional mean linear in regressors, OLS regression leads to
inconsistent parameter estimates because the sample is not representative of
the population.
This includes limited dependent variable models, latent variable models,
Tobit models and selection models.
Alternative estimation procedures, most relying on strong distributional
assumptions, are necessary to ensure consistent parameter estimation.

Main causes of incompletely observed data are truncation and censoring:

For truncated data some observations on both the dependent variable
and regressors are lost. For example, income may be the dependent
variable and only low-income people are included in the sample.
For censored data information on the dependent variable is lost, but
not data on the regressors. For example, people of all income levels
may be included in the sample, but for confidentiality reasons the
income of high-income people may be top-coded and reported only as
exceeding, say, $100,000 per year.
Truncation entails greater information loss than does censoring. A leading
example of truncation and censoring is the Tobit model, named after Tobin
(1958).
Truncation is essentially a characteristic of the distribution from which the
sample data are drawn.
Censoring of a range of values of the variable of interest introduces a
distortion into conventional statistical results that is similar to that of
truncation.
Example: Tobit regression of hours on log wage

Let's consider the following labor supply example with simulated data. The
relationship between desired annual hours worked, $y^*$, and hourly wage, $w$,
is specified to be of linear-log form, with data-generating process (DGP)
$$
\begin{aligned}
y^* &= -2500 + 1000 \ln w + \varepsilon, \\
\varepsilon &\sim N(0, 1000^2), \\
\ln w &\sim N(2.7, 0.60^2).
\end{aligned}
\tag{1}
$$
The model implies that the wage elasticity is $1000/y^*$, which equals, for
example, 0.5 for full-time work (2,000 hours). Each 1% increase in the wage
raises annual hours by 10 hours.
With censoring at zero, negative values of $y^*$ are set to zero because people
with negative desired hours of work choose not to work. For this particular
sample this is the case for about 35% of the observations. This pushes up
the mean for low wages, since the many negative values of $y^*$ are shifted
up to zero. It has little impact for high wages, since then few observations
on $y^*$ are negative.

With truncation at zero, the 35% of observations with negative values of $y^*$
are dropped altogether. This increases the mean above the censored mean,
since zero values are no longer included in the data used to form the mean.
It is clear that censored and truncated conditional means are nonlinear in $x$
even if the underlying population mean is linear. OLS estimation using
truncated or censored data will lead to inconsistent estimation of the slope
parameter.
Clearly, sample means in truncated or censored samples cannot be used
without adjustment to estimate the original population mean.
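
The labor supply example can be reproduced approximately by simulation. The sketch below is an illustration rather than the code behind the slides: the sample size, the seed and the helper `ols_slope` are assumptions, and the share of non-positive $y^*$ in a large simulated sample need not match the 35% reported for the slides' particular sample.

```python
import random

def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def mean(values):
    return sum(values) / len(values)

random.seed(0)   # arbitrary seed, for reproducibility only
n = 50_000       # illustrative sample size (an assumption, not from the slides)

# DGP (1): ln w ~ N(2.7, 0.60^2), y* = -2500 + 1000 ln w + eps, eps ~ N(0, 1000^2)
lnw = [random.gauss(2.7, 0.60) for _ in range(n)]
ystar = [-2500 + 1000 * x + random.gauss(0, 1000) for x in lnw]

# Censoring at zero: every observation is kept, negative y* is recorded as 0.
y_cens = [max(v, 0.0) for v in ystar]

# Truncation at zero: observations with y* <= 0 are dropped, regressor included.
kept = [(x, v) for x, v in zip(lnw, ystar) if v > 0]
x_trunc = [x for x, _ in kept]
y_trunc = [v for _, v in kept]

share_nonpositive = 1 - len(kept) / n
mean_latent, mean_cens, mean_trunc = mean(ystar), mean(y_cens), mean(y_trunc)
slope_latent = ols_slope(lnw, ystar)
slope_cens = ols_slope(lnw, y_cens)
slope_trunc = ols_slope(x_trunc, y_trunc)

print(f"share with y* <= 0: {share_nonpositive:.3f}")
print(f"means  latent/censored/truncated: "
      f"{mean_latent:.0f} / {mean_cens:.0f} / {mean_trunc:.0f}")
print(f"slopes latent/censored/truncated: "
      f"{slope_latent:.0f} / {slope_cens:.0f} / {slope_trunc:.0f}")
```

The latent-sample slope should be close to 1000, while both the censored and the truncated slopes are attenuated, and the three means are ordered latent < censored < truncated.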

Truncation
2. Truncation
2.1 Definition and mechanism
Let $y^*$ denote a variable that is incompletely observed.
For truncation from below, $y^*$ is only observed if $y^*$ exceeds a threshold.
For simplicity, let that threshold be zero. Then we observe $y = y^*$ if
$y^* > 0$. Since negative values do not appear in the sample, the truncated
mean exceeds the mean of $y^*$.
Mechanism: Truncation entails additional information loss as all data on
observations at the bound are lost. With truncation from below we
observe only
$$ y = y^* \quad \text{if } y^* > L. \tag{2} $$
For example, only consumers who purchased durable goods may be sampled
($L = 0$). With truncation from above we observe only
$$ y = y^* \quad \text{if } y^* < U. \tag{3} $$
For example, only low-income individuals may be sampled.

2.2 Truncated Distributions

A truncated distribution is the part of an untruncated distribution that is above
or below some specified value.
Theorem (Density of a Truncated Random Variable)
If a continuous random variable $x$ has pdf $f(x)$ and $a$ is a constant, then
$$ f(x \mid x > a) = \frac{f(x)}{\mathrm{Prob}(x > a)}. $$
If $x$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$, then
$$ \mathrm{Prob}(x > a) = 1 - \Phi\!\left(\frac{a - \mu}{\sigma}\right) = 1 - \Phi(\alpha), $$
where $\alpha = (a - \mu)/\sigma$ and $\Phi(\cdot)$ is the standard normal cdf. The
density of the truncated normal distribution is then
$$
f(x \mid x > a) = \frac{f(x)}{1 - \Phi(\alpha)}
= \frac{(2\pi\sigma^2)^{-1/2} e^{-(x-\mu)^2/(2\sigma^2)}}{1 - \Phi(\alpha)}
= \frac{\frac{1}{\sigma}\,\phi\!\left(\frac{x - \mu}{\sigma}\right)}{1 - \Phi(\alpha)},
$$
where $\phi(\cdot)$ is the standard normal pdf.
Figure: Truncated Normal Distributions

Example: Truncated Uniform Distribution

If $x$ has a standard uniform distribution, denoted $U(0, 1)$, then
$$ f(x) = 1, \quad 0 \le x \le 1. $$
The distribution truncated at $x = 1/3$ is also uniform:
$$
f\!\left(x \mid x > \tfrac{1}{3}\right) = \frac{f(x)}{\mathrm{Prob}\!\left(x > \tfrac{1}{3}\right)}
= \frac{1}{2/3} = \frac{3}{2}, \quad \tfrac{1}{3} \le x \le 1.
$$
The expected value is
$$ E\!\left[x \mid x > \tfrac{1}{3}\right] = \int_{1/3}^{1} x \cdot \tfrac{3}{2}\, dx = \tfrac{2}{3}. $$
For a variable distributed uniformly between $L$ and $U$, the variance is
$(U - L)^2/12$. Thus,
$$ \mathrm{Var}\!\left[x \mid x > \tfrac{1}{3}\right] = \tfrac{1}{27}. $$
The mean and variance of the untruncated distribution are $1/2$ and $1/12$, respectively.
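
The arithmetic of this example can be checked with exact fractions. The helper name `truncated_uniform_moments` is hypothetical; it only uses the fact stated above that a uniform distribution truncated from below is again uniform on the remaining interval.

```python
from fractions import Fraction

def truncated_uniform_moments(lower, upper, cut):
    """Mean and variance of U(lower, upper) truncated from below at `cut`.

    The truncated variable is uniform on [max(lower, cut), upper], so the
    usual uniform formulas (midpoint, width^2 / 12) apply.
    """
    lo = max(Fraction(lower), Fraction(cut))
    hi = Fraction(upper)
    return (lo + hi) / 2, (hi - lo) ** 2 / 12

mean_t, var_t = truncated_uniform_moments(0, 1, Fraction(1, 3))
mean_u, var_u = truncated_uniform_moments(0, 1, 0)   # no effective truncation

print(mean_t, var_t)   # 2/3 1/27
print(mean_u, var_u)   # 1/2 1/12
```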

Example: conclusion
1. If the truncation is from below, then the mean of the truncated variable is
   greater than the mean of the original one. If the truncation is from above,
   then the mean of the truncated variable is smaller than the mean of the
   original one.
2. Truncation reduces the variance compared with the variance in the
   untruncated distribution. This shows that truncation is essentially a
   characteristic of the distribution from which the sample data are
   drawn.
Henceforth, we shall use the terms truncated mean and truncated variance to
refer to the mean and variance of the random variable with a truncated
distribution.

2.3 Moments of Truncated Distributions

We are usually interested in the mean and variance of the truncated random
variable. They are obtained from the general formula
$$ E[x \mid x > a] = \int_a^{\infty} x\, f(x \mid x > a)\, dx $$
for the mean, and likewise for the variance.

Theorem (Moments of the Truncated Normal Distribution)
If $x \sim N[\mu, \sigma^2]$ and $a$ is a constant, then
$$ E[x \mid \text{truncation}] = \mu + \sigma\lambda(\alpha), \tag{4} $$
$$ \mathrm{Var}[x \mid \text{truncation}] = \sigma^2[1 - \delta(\alpha)], \tag{5} $$
where $\alpha = (a - \mu)/\sigma$, $\phi(\alpha)$ is the standard normal density and
$$ \lambda(\alpha) = \phi(\alpha)/[1 - \Phi(\alpha)] \quad \text{if truncation is } x > a, \tag{3a} $$
$$ \lambda(\alpha) = -\phi(\alpha)/\Phi(\alpha) \quad \text{if truncation is } x < a, \tag{3b} $$
and
$$ \delta(\alpha) = \lambda(\alpha)[\lambda(\alpha) - \alpha]. \tag{4} $$
An important result is
$$ 0 < \delta(\alpha) < 1 \quad \text{for all values of } \alpha. $$
A result that we will use at several points below is
$d\phi(\alpha)/d\alpha = -\alpha\phi(\alpha)$. The function $\lambda(\alpha)$ is called
the inverse Mills ratio. The function in (3a) is also called the hazard function
for the standard normal distribution.
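
The theorem translates directly into a few lines of code. The following is a minimal sketch under assumed parameter values ($\mu = 1$, $\sigma = 2$, $a = 0.5$); the function names are hypothetical, and the Monte Carlo comparison is only a sanity check of the formulas.

```python
import math
import random

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def inverse_mills(alpha, from_below=True):
    """lambda(alpha): eq. (3a) for x > a, eq. (3b) for x < a."""
    if from_below:
        return phi(alpha) / (1.0 - Phi(alpha))
    return -phi(alpha) / Phi(alpha)

def truncated_normal_moments(mu, sigma, a, from_below=True):
    """Mean mu + sigma*lambda and variance sigma^2 * (1 - delta)."""
    alpha = (a - mu) / sigma
    lam = inverse_mills(alpha, from_below)
    delta = lam * (lam - alpha)
    return mu + sigma * lam, sigma ** 2 * (1.0 - delta), delta

mu, sigma, a = 1.0, 2.0, 0.5          # illustrative values (assumptions)
mean_th, var_th, delta = truncated_normal_moments(mu, sigma, a)
mean_above, var_above, _ = truncated_normal_moments(mu, sigma, a, from_below=False)

# Monte Carlo check: draw from N(mu, sigma^2) and keep only x > a.
random.seed(1)
draws = [x for x in (random.gauss(mu, sigma) for _ in range(200_000)) if x > a]
mean_mc = sum(draws) / len(draws)
var_mc = sum((x - mean_mc) ** 2 for x in draws) / len(draws)

print(f"theory: mean={mean_th:.3f} var={var_th:.3f} delta={delta:.3f}")
print(f"simulated: mean={mean_mc:.3f} var={var_mc:.3f}")
```

Truncation from below raises the mean above $\mu$, truncation from above lowers it, and in both cases the variance falls below $\sigma^2$.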

2.4 Truncated Regression Model
$$ y_i = x_i'\beta + \varepsilon_i, $$
where
$$ \varepsilon_i \mid x_i \sim N[0, \sigma^2], $$
so that
$$ y_i \mid x_i \sim N[x_i'\beta, \sigma^2]. \tag{5} $$
We are interested in the distribution of $y_i$ given that $y_i$ is greater than the
truncation point $a$. This is the result described in the theorem on moments of
the truncated normal distribution. It follows that
$$
E[y_i \mid y_i > a] = x_i'\beta + \sigma\,
\frac{\phi[(a - x_i'\beta)/\sigma]}{1 - \Phi[(a - x_i'\beta)/\sigma]}. \tag{6}
$$
The conditional mean is therefore a nonlinear function of $a$, $\sigma$, $x$, and $\beta$.

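The partial effects in this model in the subpopulation can be obtained by writing
$$ E[y_i \mid y_i > a] = x_i'\beta + \sigma\lambda(\alpha_i), \tag{7} $$
where now $\alpha_i = (a - x_i'\beta)/\sigma$. For convenience, let
$\lambda_i = \lambda(\alpha_i)$ and $\delta_i = \delta(\alpha_i)$. Using
$d\lambda_i/d\alpha_i = \lambda_i^2 - \alpha_i\lambda_i$ and
$\partial\alpha_i/\partial x_i = -\beta/\sigma$,
$$
\begin{aligned}
\frac{\partial E[y_i \mid y_i > a]}{\partial x_i}
&= \beta + \sigma\,\frac{d\lambda_i}{d\alpha_i}\,\frac{\partial \alpha_i}{\partial x_i} \\
&= \beta + \sigma\left(\lambda_i^2 - \alpha_i\lambda_i\right)\left(-\beta/\sigma\right) \\
&= \beta\left(1 - \lambda_i^2 + \alpha_i\lambda_i\right) \\
&= \beta(1 - \delta_i).
\end{aligned}
\tag{8}
$$
Note the appearance of the scale factor $1 - \delta_i$ from the truncated variance.
Because $(1 - \delta_i)$ is between zero and one, we conclude that for every element
of $x_i$, the marginal effect is smaller in absolute value than the corresponding
coefficient. There is a similar attenuation of the variance.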
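
The result $\partial E[y_i \mid y_i > a]/\partial x_i = \beta(1 - \delta_i)$ can be verified numerically. The sketch below assumes a single regressor plus a constant and arbitrary parameter values, and compares a central finite difference of the truncated mean with the analytic expression.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def truncated_mean(xb, sigma, a):
    """E[y | y > a] = x'b + sigma * lambda(alpha), alpha = (a - x'b) / sigma."""
    alpha = (a - xb) / sigma
    return xb + sigma * phi(alpha) / (1.0 - Phi(alpha))

def partial_effect(beta, xb, sigma, a):
    """Analytic partial effect beta * (1 - delta)."""
    alpha = (a - xb) / sigma
    lam = phi(alpha) / (1.0 - Phi(alpha))
    delta = lam * (lam - alpha)
    return beta * (1.0 - delta)

# Illustrative values (assumptions): y = b0 + b1 * x + eps, truncation at a = 0.
b0, b1, sigma, a = 1.0, 2.0, 1.5, 0.0
x, step = 0.3, 1e-5

numeric = (truncated_mean(b0 + b1 * (x + step), sigma, a)
           - truncated_mean(b0 + b1 * (x - step), sigma, a)) / (2 * step)
analytic = partial_effect(b1, b0 + b1 * x, sigma, a)

print(f"numeric={numeric:.6f} analytic={analytic:.6f} coefficient={b1}")
```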

In the subpopulation $y_i > a$, the regression variance is not $\sigma^2$ but
$$ \mathrm{Var}[y_i \mid y_i > a] = \sigma^2(1 - \delta_i). \tag{9} $$
Whether the partial effect in (8) or the coefficient $\beta$ itself is of interest
depends on the intended inferences of the study. If the analysis is to be confined
to the subpopulation, then (8) is of interest. If the study is intended to extend
to the entire population, however, then it is the coefficients $\beta$ that are
actually of interest.
One's first inclination might be to use OLS of $y$ on $X$ to estimate the
parameters of this regression model. Doing so omits a variable, the nonlinear
term $\lambda_i$, so all the biases that arise from an omitted variable can be
expected.

Censoring
3. Censoring
3.1 Examples, definition and mechanism
Censoring of the dependent variable is a very common problem in
microeconomic data. When the dependent variable is censored, values in a certain
range are all transformed to (or reported as) a single value. Some studies in the
empirical literature:
1. Household purchases of durable goods [Tobin (1958)]
2. Number of extramarital affairs [Fair (1977, 1978)]
3. Number of hours worked by a woman [Quester and Greene (1982)]
4. Household expenditure on various commodity groups [Jarque (1987)]
Each of these studies analyzes a dependent variable that is zero for a significant
fraction of the observations. Conventional regression methods fail to account for
the qualitative difference between limit (zero) observations and nonlimit
(continuous) observations.

Mechanism: With censoring we always observe the regressors $x$, completely
observe $y^*$ for a subset of the possible values of $y^*$, and incompletely
observe $y^*$ for the remaining possible values.
For censoring from below at zero, $y^*$ is not completely observed when
$y^* \le 0$, but it is known that $y^* \le 0$ and for simplicity $y$ is then set
to 0. Since negative values are set to zero, the censored mean also exceeds
the mean of $y^*$.
If censoring is from below (or from the left), we observe
$$
y = \begin{cases} y^* & \text{if } y^* > L \\ L & \text{if } y^* \le L \end{cases} \tag{10}
$$
For example, all consumers may be sampled with some having positive
durable goods expenditures ($y^* > 0$) and others having zero expenditures
($y^* \le 0$).

If censoring is from above (or from the right) we observe
$$
y = \begin{cases} y^* & \text{if } y^* < U \\ U & \text{if } y^* \ge U \end{cases} \tag{11}
$$
For example, annual income data may be top-coded at $U = \$100{,}000$.

The incompletely observed observations on $y^*$ are set to $L$ or $U$ for
simplicity. More generally, we require that for incompletely observed
observations $y^*$ is known to be missing (i.e., we observe that $y^*$ lies outside
the relevant bound) and regressors $x$ continue to be completely observed.

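3.2 Censored Normal Distribution
Theorem (Moments of the Censored Normal Variable)

If $y^* \sim N[\mu, \sigma^2]$ and $y = a$ if $y^* \le a$ or else $y = y^*$, then
$$ E[y] = \Phi a + (1 - \Phi)(\mu + \sigma\lambda), $$
and
$$ \mathrm{Var}[y] = \sigma^2(1 - \Phi)\left[(1 - \delta) + (\alpha - \lambda)^2\Phi\right], $$
where
$$ \Phi[(a - \mu)/\sigma] = \Phi(\alpha) = \mathrm{Prob}(y^* \le a) = \Phi, \quad \lambda = \phi/(1 - \Phi), $$
and
$$ \delta = \lambda^2 - \lambda\alpha. $$
Proof: For the mean,
$$
\begin{aligned}
E[y] &= \mathrm{Prob}(y = a) \times E[y \mid y = a] + \mathrm{Prob}(y > a) \times E[y \mid y > a] \\
&= \mathrm{Prob}(y^* \le a) \times a + \mathrm{Prob}(y^* > a) \times E[y^* \mid y^* > a] \\
&= \Phi a + (1 - \Phi)(\mu + \sigma\lambda),
\end{aligned}
$$
using the theorem on moments of the truncated normal distribution.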

For the variance, we use a counterpart to the decomposition result that
Var[y] = E[conditional variance] + Var[conditional mean], together with the
theorem on moments of the truncated normal distribution.
For the special case of $a = 0$, the mean simplifies to
$$
E[y \mid a = 0] = \Phi(\mu/\sigma)(\mu + \sigma\lambda), \quad \text{where }
\lambda = \frac{\phi(\mu/\sigma)}{\Phi(\mu/\sigma)}.
$$
For censoring of the upper part of the distribution instead of the lower, it is only
necessary to reverse the role of $\Phi$ and $1 - \Phi$ and redefine $\lambda$ as in the
theorem on moments of the truncated normal distribution.
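
The censored-normal moments can be coded and checked against simulated data. This is a minimal sketch with assumed values ($\mu = 1$, $\sigma = 2$, censoring from below at $a = 0$); the function name `censored_normal_moments` is hypothetical.

```python
import math
import random

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def censored_normal_moments(mu, sigma, a):
    """Mean and variance of y = max(y*, a) with y* ~ N(mu, sigma^2)."""
    alpha = (a - mu) / sigma
    P = Phi(alpha)                        # Prob(y* <= a)
    lam = phi(alpha) / (1.0 - P)
    delta = lam ** 2 - lam * alpha
    mean = P * a + (1.0 - P) * (mu + sigma * lam)
    var = sigma ** 2 * (1.0 - P) * ((1.0 - delta) + (alpha - lam) ** 2 * P)
    return mean, var

mu, sigma, a = 1.0, 2.0, 0.0              # illustrative values (assumptions)
mean_th, var_th = censored_normal_moments(mu, sigma, a)

# Monte Carlo check: censor simulated draws at a.
random.seed(5)
y = [max(random.gauss(mu, sigma), a) for _ in range(100_000)]
mean_mc = sum(y) / len(y)
var_mc = sum((v - mean_mc) ** 2 for v in y) / len(y)

print(f"theory: mean={mean_th:.3f} var={var_th:.3f}")
print(f"simulated: mean={mean_mc:.3f} var={var_mc:.3f}")
```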

3.4 Censored Regression (Tobit) Model
A regression model based on the preceding censoring is referred to as the censored
regression model or the tobit model, after Tobin (1958). The regression is obtained
by making the mean in the preceding correspond to a classical regression model.
The general formulation is usually given in terms of an index function,
$$
\begin{aligned}
y_i^* &= x_i'\beta + \varepsilon_i, \\
y_i &= 0 \quad \text{if } y_i^* \le 0, \\
y_i &= y_i^* \quad \text{if } y_i^* > 0.
\end{aligned}
$$
There are potentially three conditional mean functions to consider, depending on
the purpose of the study:
1. For the latent variable, $E[y_i^* \mid x_i]$ is $x_i'\beta$. If the data are always
   censored, however, then this result will usually not be useful.

2. Consistent with the theorem on moments of the censored normal variable, for an
   observation randomly drawn from the population, which may or may not be
   censored,
$$ E[y_i \mid x_i] = \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)(x_i'\beta + \sigma\lambda_i), $$
   where
$$
\lambda_i = \frac{\phi[(0 - x_i'\beta)/\sigma]}{1 - \Phi[(0 - x_i'\beta)/\sigma]}
= \frac{\phi(x_i'\beta/\sigma)}{\Phi(x_i'\beta/\sigma)}. \tag{12}
$$
3. Finally, if we intend to confine our attention to uncensored observations,
   then the results for the truncated regression model apply. The limit
   observations should not be discarded, however, because the truncated
   regression model is no more amenable to least squares than the censored
   data model.

Theorem (Partial Effects in the Censored Regression Model)

In the censored regression model with latent regression $y^* = x'\beta + \varepsilon$
and observed dependent variable $y = a$ if $y^* \le a$, $y = b$ if $y^* \ge b$, and
$y = y^*$ otherwise, where $a$ and $b$ are constants, let $f(\varepsilon)$ and
$F(\varepsilon)$ denote the density and cdf of $\varepsilon$. Assume that $\varepsilon$
is a continuous random variable with mean 0 and variance $\sigma^2$, and
$f(\varepsilon \mid x) = f(\varepsilon)$. Then
$$ \frac{\partial E[y \mid x]}{\partial x} = \beta \times \mathrm{Prob}[a < y^* < b]. $$
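
The theorem holds for any continuous error distribution. As an illustration only, the sketch below specializes it to normal errors (an assumption added here), for which $E[y \mid x]$ has a closed form, and compares a finite-difference derivative with $\beta \times \mathrm{Prob}[a < y^* < b]$. The parameter values and function names are hypothetical.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def censored_mean(xb, sigma, a, b):
    """E[y | x] for y* = x'b + eps, eps ~ N(0, sigma^2), y censored to [a, b]."""
    za, zb = (a - xb) / sigma, (b - xb) / sigma
    inside = Phi(zb) - Phi(za)                       # Prob(a < y* < b)
    return (a * Phi(za) + b * (1.0 - Phi(zb))
            + xb * inside + sigma * (phi(za) - phi(zb)))

def prob_uncensored(xb, sigma, a, b):
    return Phi((b - xb) / sigma) - Phi((a - xb) / sigma)

# Illustrative values (assumptions): one regressor plus a constant.
b0, b1, sigma = 0.5, 1.2, 2.0
a, b = 0.0, 5.0
x, step = 1.0, 1e-5

numeric = (censored_mean(b0 + b1 * (x + step), sigma, a, b)
           - censored_mean(b0 + b1 * (x - step), sigma, a, b)) / (2 * step)
analytic = b1 * prob_uncensored(b0 + b1 * x, sigma, a, b)

print(f"numeric={numeric:.6f} analytic={analytic:.6f}")
```

With $a = 0$ and no upper limit, the same expression reduces to $\beta\,\Phi(x'\beta/\sigma)$.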

Estimation
4. Estimation
The tobit model has become so routine and been incorporated in so many
computer packages that despite formidable obstacles in years past, estimation is
now essentially on the level of ordinary linear regression. The log-likelihood for
the censored regression model is
$$
\ln L = \sum_{y_i > 0} -\frac{1}{2}\left[\log(2\pi) + \ln\sigma^2
+ \frac{(y_i - x_i'\beta)^2}{\sigma^2}\right]
+ \sum_{y_i = 0} \ln\left[1 - \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\right]. \tag{13}
$$
The two parts correspond to the classical regression for the nonlimit observations
and the relevant probabilities for the limit observations, respectively. This
likelihood is a nonstandard type, because it is a mixture of discrete and
continuous distributions.
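
A direct transcription of the log-likelihood (13) is sketched below. It only evaluates the function on simulated data and does not maximize it; in practice a numerical optimizer or a packaged tobit routine would be used. The data-generating values and the name `tobit_loglik` are assumptions for illustration.

```python
import math
import random

def Phi(z):
    """Standard normal cdf via erfc, accurate in the lower tail."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def tobit_loglik(beta, sigma, X, y):
    """Log-likelihood (13) for censoring from below at zero."""
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = sum(b * v for b, v in zip(beta, xi))
        if yi > 0:
            # Nonlimit observation: normal density of y_i.
            ll += -0.5 * (math.log(2.0 * math.pi) + math.log(sigma ** 2)
                          + ((yi - xb) / sigma) ** 2)
        else:
            # Limit observation: Prob(y* <= 0) = 1 - Phi(xb/sigma) = Phi(-xb/sigma).
            ll += math.log(Phi(-xb / sigma))
    return ll

# Simulated data (assumed DGP): y* = 0.5 + 1.0 * x + eps, eps ~ N(0, 1).
random.seed(2)
beta_true, sigma_true = [0.5, 1.0], 1.0
X, y = [], []
for _ in range(3000):
    x1 = random.gauss(0, 1)
    ystar = beta_true[0] + beta_true[1] * x1 + random.gauss(0, sigma_true)
    X.append([1.0, x1])
    y.append(max(ystar, 0.0))

ll_true = tobit_loglik(beta_true, sigma_true, X, y)
ll_wrong = tobit_loglik([0.5, 0.5], sigma_true, X, y)   # deliberately wrong slope
print(f"log-likelihood at true parameters: {ll_true:.1f}")
print(f"log-likelihood at wrong slope:     {ll_wrong:.1f}")
```

The log-likelihood should be higher at the true parameters than at the perturbed slope, which is the property the MLE exploits.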

Sample Selection
5. Sample Selection Models

Observational studies are rarely based on pure random samples. Most often
exogenous sampling is used and the usual estimators can be applied. If
instead a sample, intentionally or unintentionally, is based in part on values
taken by a dependent variable, parameter estimates may be inconsistent
unless corrective measures are taken. Such samples can be broadly defined
as selected samples.
There are many selection models, since there are many ways that a
selected sample may be generated. Indeed it is very easy to be unaware that
a selected sample is being used.
For example, consider interpretation of average scores over time on an
achievement test such as the Scholastic Aptitude Test, when test taking is
voluntary. A decline over time may be due to real deterioration in student
knowledge. However, it may just reflect the selection effect that relatively
more students have been taking the test over time and the new test takers
are the relatively weaker students.

Selection may be due to self-selection, with the outcome of interest determined
in part by individual choice of whether or not to participate in the activity of
interest. It can also result from sample selection, with those who participate in
the activity of interest deliberately oversampled (an extreme case being sampling
only participants). In either case, similar issues arise and selection models are
usually called sample selection models.
A. Framework
Selection equation (latent probit selection mechanism):
$$
D_i^* = w_i'\alpha + u_i \quad \text{with } u_i \sim N[0, 1], \qquad
D_i = 1[D_i^* > 0], \tag{14}
$$
where $w_i$ stands for the selection controls, $D_i = 1$ if the unit is selected,
and $u_i$ is the error term.
Outcome equation (linear regression):
$$ y_i = x_i'\beta + \delta D_i + \varepsilon_i \tag{15} $$
with $\varepsilon_i$ i.i.d. $(0, \sigma^2)$ and $E(\varepsilon_i \mid D_i) \neq 0$
(endogeneity of selection).
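B. Estimation
B1. Full Information Maximum Likelihood (FIML): We assume that the errors of
equations (14) and (15) follow a bivariate normal distribution, with $\sigma$ the
standard deviation of $\varepsilon_i$ and $\rho$ its correlation with $u_i$:
$$
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix} \sim
N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix}\right] \tag{16}
$$
Let $\theta = (\beta, \alpha, \delta, \sigma, \rho)$ denote the set of parameters.
The FIML estimator $\hat\theta_{FIML}$ is obtained as
$$ \hat\theta_{FIML} = \arg\max_{\theta \in \Theta} \ln L, $$
where
$$
\ln L = \sum_{i=1}^{n} \left\{ \ln \Phi\!\left( (2D_i - 1)\,
\frac{w_i'\alpha + (y_i - x_i'\beta - \delta D_i)(\rho/\sigma)}{\sqrt{1 - \rho^2}} \right)
- \frac{1}{2}\,\frac{(y_i - x_i'\beta - \delta D_i)^2}{\sigma^2}
- \ln\!\left(\sigma\sqrt{2\pi}\right) \right\}
\quad \text{for } D_i \in \{0, 1\}. \tag{17}
$$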

B2. Two-step or control function approach: Heckman (1976, 1979)
Step 1: Probit estimation of the selection equation to obtain
$$ P(D_i = 1 \mid w_i) = \Phi(w_i'\alpha). $$
Step 2: OLS estimation of the modified conditional outcome equation in (15),
$$ E(y_i \mid x_i, D_i, h_i) = x_i'\beta + \delta D_i + \lambda h_i, \tag{18} $$
where $\lambda = \rho\sigma$ and $h_i$ is the control function or inverse Mills ratio:
$$
h_i = \begin{cases}
\dfrac{\phi(w_i'\hat\alpha)}{\Phi(w_i'\hat\alpha)} & \text{if } D_i = 1 \\[2ex]
\dfrac{-\phi(w_i'\hat\alpha)}{1 - \Phi(w_i'\hat\alpha)} & \text{if } D_i = 0
\end{cases} \tag{19}
$$
A correction of the covariance matrix of the coefficients in the second step is
needed, because $h_i$ is itself estimated.
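
The two steps can be sketched on simulated data as follows. This is an illustration under assumed parameter values, not the slides' implementation: the helpers `solve`, `ols` and `probit` are hypothetical pure-Python stand-ins for a statistics package, the probit is fitted by a plain Newton-Raphson iteration, and the second-step standard errors (which would need the correction mentioned above) are not computed.

```python
import math
import random

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def solve(A, b):
    """Solve the small linear system A x = b (Gaussian elimination, pivoting)."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def probit(W, D, iters=20):
    """Probit MLE by Newton-Raphson, starting from alpha = 0."""
    k = len(W[0])
    alpha = [0.0] * k
    for _ in range(iters):
        score = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for w, d in zip(W, D):
            z = sum(c * v for c, v in zip(alpha, w))
            q = 2 * d - 1
            g = q * phi(z) / Phi(q * z)            # generalized residual
            for i in range(k):
                score[i] += g * w[i]
                for j in range(k):
                    info[i][j] += g * (g + z) * w[i] * w[j]   # minus the Hessian
        alpha = [c + s for c, s in zip(alpha, solve(info, score))]
    return alpha

def control_function(z, d):
    """h_i of eq. (19), evaluated at the fitted index z = w'alpha_hat."""
    return phi(z) / Phi(z) if d == 1 else -phi(z) / Phi(-z)

# Simulated data (all values are assumptions): lambda = rho * sigma = 0.75.
random.seed(3)
alpha_true, beta_true, delta_true, rho, sigma = [0.2, 1.0], [1.0, 0.5], 2.0, 0.5, 1.5
W, X, D, y = [], [], [], []
for _ in range(20_000):
    z1, x1 = random.gauss(0, 1), random.gauss(0, 1)
    u, e = random.gauss(0, 1), random.gauss(0, 1)
    eps = sigma * (rho * u + math.sqrt(1 - rho ** 2) * e)     # corr(u, eps) = rho
    d = 1 if alpha_true[0] + alpha_true[1] * z1 + u > 0 else 0
    W.append([1.0, z1])
    X.append([1.0, x1])
    D.append(d)
    y.append(beta_true[0] + beta_true[1] * x1 + delta_true * d + eps)

# Step 1: probit of D on w, then the control function at the fitted index.
alpha_hat = probit(W, D)
h = [control_function(sum(c * v for c, v in zip(alpha_hat, w)), d)
     for w, d in zip(W, D)]

# Step 2: OLS of y on (x, D, h), compared with OLS that omits h.
b_naive = ols([x + [d] for x, d in zip(X, D)], y)
b_2step = ols([x + [d, hi] for x, d, hi in zip(X, D, h)], y)
delta_naive, delta_2step, lam_hat = b_naive[2], b_2step[2], b_2step[3]
print(f"delta: naive={delta_naive:.2f} two-step={delta_2step:.2f} (true {delta_true})")
print(f"lambda_hat={lam_hat:.2f} (true rho*sigma={rho * sigma})")
```

With positive $\rho$, the naive OLS coefficient on $D_i$ overstates $\delta$, while adding $h_i$ as a regressor brings the estimate back close to its true value.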

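The Roy (1951) model

A. Framework
Model with endogenous selection and different outcome equations:
$$ D_i^* = \gamma' w_i + \mu_i, \quad D_i = 1[D_i^* > 0], \quad i = 1, \cdots, N \tag{20} $$
$$ y_{i1} = \beta_1' x_i + \varepsilon_{i1}, \tag{21} $$
$$ y_{i0} = \beta_0' x_i + \varepsilon_{i0}, \tag{22} $$
where $D_i^*$ in equation (20) is a latent variable, the observed counterpart of which
is $D_i$, and $1[\cdot]$ denotes the indicator function.
We assume joint normality for the three disturbances (the covariance matrix is
symmetric, so only the lower triangle is shown):
$$
\begin{pmatrix} \mu_i \\ \varepsilon_{i1} \\ \varepsilon_{i0} \end{pmatrix} \sim
N\left[\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix}
1 & & \\
\rho_{\mu\varepsilon_1}\sigma_{\varepsilon_1} & \sigma^2_{\varepsilon_1} & \\
\rho_{\mu\varepsilon_0}\sigma_{\varepsilon_0} & 0 & \sigma^2_{\varepsilon_0}
\end{pmatrix}\right]
$$
Observe here that the zero entry corresponds to
$\mathrm{Cov}(\varepsilon_0, \varepsilon_1) = \rho_{\varepsilon_0\varepsilon_1}\sigma_{\varepsilon_0}\sigma_{\varepsilon_1}$.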

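B. Estimation
B1. Maximum likelihood estimation. The likelihood function is:
$$
\begin{aligned}
L &= \prod_{i=1}^{N}
\left[\int_{-\infty}^{-\gamma' w_i} f_2(\mu_i, y_{i0})\, d\mu_i\right]^{1 - D_i}
\left[\int_{-\gamma' w_i}^{\infty} f_2(\mu_i, y_{i1})\, d\mu_i\right]^{D_i} \\
&= \prod_{i=1}^{N}
\left[f_1(y_{i0}) \int_{-\infty}^{-\gamma' w_i} f_1(\mu_i \mid y_{i0})\, d\mu_i\right]^{1 - D_i}
\left[f_1(y_{i1}) \int_{-\gamma' w_i}^{\infty} f_1(\mu_i \mid y_{i1})\, d\mu_i\right]^{D_i}
\end{aligned}
\tag{23}
$$
where $f_2(\cdot)$ and $f_1(\cdot)$ are bivariate and univariate normal density
functions. Substituting the normal densities gives:
$$
L = \prod_{i=1}^{N}
\left[\frac{1}{\sigma_{\varepsilon_0}}\,\phi(\zeta_0)\,
\Phi\!\left(\frac{-\gamma' w_i - \rho_{\mu\varepsilon_0}\zeta_0}{\sqrt{1 - \rho^2_{\mu\varepsilon_0}}}\right)\right]^{1 - D_i}
\left[\frac{1}{\sigma_{\varepsilon_1}}\,\phi(\zeta_1)\,
\Phi\!\left(\frac{\gamma' w_i + \rho_{\mu\varepsilon_1}\zeta_1}{\sqrt{1 - \rho^2_{\mu\varepsilon_1}}}\right)\right]^{D_i}
\tag{24}
$$
where $\zeta_k = (y_{ik} - \beta_k' x_i)/\sigma_{\varepsilon_k}$, with $k \in \{0, 1\}$, and
$\phi(\cdot)$ and $\Phi(\cdot)$ are respectively the standard normal pdf and cdf.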

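B2. Two-step estimation

We write the regression function for each subpopulation as
$$ E(y_{i1} \mid D = 1, x, w) = \beta_1' x_i + \mathrm{Cov}(\varepsilon_{i1}, \mu_i)\,\lambda_1(\gamma' w_i) \tag{25} $$
$$ E(y_{i0} \mid D = 0, x, w) = \beta_0' x_i + \mathrm{Cov}(\varepsilon_{i0}, \mu_i)\,\lambda_0(\gamma' w_i), \tag{26} $$
where
$$
\lambda_1(\gamma' w_i) = \frac{\phi(\gamma' w_i)}{\Phi(\gamma' w_i)}
\quad \text{and} \quad
\lambda_0(\gamma' w_i) = \frac{-\phi(\gamma' w_i)}{\Phi(-\gamma' w_i)}
$$
are the inverse Mills ratios. In terms of parameters to be estimated, these
regressions can be rewritten as:
$$ y_{i1} = \beta_1' x_i + \rho_{\mu\varepsilon_1}\sigma_{\varepsilon_1}\lambda_1(\gamma' w_i) + \eta_{i1} \tag{27} $$
$$ y_{i0} = \beta_0' x_i + \rho_{\mu\varepsilon_0}\sigma_{\varepsilon_0}\lambda_0(\gamma' w_i) + \eta_{i0}, \tag{28} $$
where $E(\eta_{i1} \mid x_i, \lambda_1) = E(\eta_{i0} \mid x_i, \lambda_0) = 0$.

In relations (27) and (28), $\lambda_1(\gamma' w_i)$ and $\lambda_0(\gamma' w_i)$ enter as
additional controls, whose coefficients $\rho_{\mu\varepsilon_1}\sigma_{\varepsilon_1}$ and
$\rho_{\mu\varepsilon_0}\sigma_{\varepsilon_0}$ have to be estimated in addition to the
parameter vectors $\beta_1$ and $\beta_0$.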

Two-step estimation procedure:
1. Obtain consistent and efficient (under normality) estimates of $\gamma$ by
   estimating a probit using maximum likelihood. Compute $\lambda_1(\hat\gamma' w_i)$
   and $\lambda_0(\hat\gamma' w_i)$ from the fitted index.
2. Use $\lambda_1(\hat\gamma' w_i)$ and $\lambda_0(\hat\gamma' w_i)$ as additional controls
   alongside $x_i$ and apply OLS to equations (27) and (28).
Since the $\lambda$'s are estimated rather than observed, the conventional standard
errors are not valid and need to be corrected, for example by simulation or
bootstrap techniques.

Impact Evaluation: Treatment Effects
6. Application to Impact Evaluation: Treatment Effects for Non-experimental Data
Heckman model
The estimated effect of a program is obtained by computing the difference in
expected school performance between participants and nonparticipants,
$E(y_i \mid D_i = 1) - E(y_i \mid D_i = 0)$. We have
$$
E(y_i \mid D_i = 1) = x_i'\beta + \delta + E(\varepsilon_i \mid D_i = 1)
\quad \text{and} \quad
E(y_i \mid D_i = 0) = x_i'\beta + E(\varepsilon_i \mid D_i = 0).
$$
Taking the difference of the above terms leads to
$$
\begin{aligned}
E(y_i \mid D_i = 1) - E(y_i \mid D_i = 0)
&= \delta + E(\varepsilon_i \mid D_i = 1) - E(\varepsilon_i \mid D_i = 0) \\
&= \delta + E(\varepsilon_i \mid u_i > -w_i'\alpha) - E(\varepsilon_i \mid u_i < -w_i'\alpha) \\
&= \delta + \lambda\,\frac{\phi(-w_i'\alpha)}{1 - \Phi(-w_i'\alpha)}
+ \lambda\,\frac{\phi(-w_i'\alpha)}{\Phi(-w_i'\alpha)} \\
&= \delta + \lambda\,\frac{\phi(w_i'\alpha)}{\Phi(w_i'\alpha)\{1 - \Phi(w_i'\alpha)\}}
\end{aligned}
\tag{29}
$$
where $\lambda = \rho\sigma$. If the correlation coefficient $\rho$ is zero, then the
estimation procedure reduces to OLS. As a result, the difference in expected outcome
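
The selection term in (29) can be illustrated by simulation. The sketch below assumes no covariates and a common selection index $c = w_i'\alpha$ for all units (both simplifications introduced here), and compares the raw participant/nonparticipant gap in means with $\delta + \lambda\,\phi(c)/[\Phi(c)\{1 - \Phi(c)\}]$. All numerical values are assumptions.

```python
import math
import random

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def selection_term(c, rho, sigma):
    """lambda * phi(c) / (Phi(c) * (1 - Phi(c))) with lambda = rho * sigma."""
    return rho * sigma * phi(c) / (Phi(c) * (1.0 - Phi(c)))

random.seed(4)
c, rho, sigma, delta = 0.3, 0.5, 1.5, 2.0    # illustrative values (assumptions)

y_treated, y_untreated = [], []
for _ in range(100_000):
    u, e = random.gauss(0, 1), random.gauss(0, 1)
    eps = sigma * (rho * u + math.sqrt(1 - rho ** 2) * e)   # corr(u, eps) = rho
    if c + u > 0:                                           # D_i = 1
        y_treated.append(delta + eps)
    else:                                                   # D_i = 0
        y_untreated.append(eps)

naive_gap = (sum(y_treated) / len(y_treated)
             - sum(y_untreated) / len(y_untreated))
predicted_gap = delta + selection_term(c, rho, sigma)

print(f"true effect delta: {delta}")
print(f"raw difference in means: {naive_gap:.3f}")
print(f"delta + selection term:  {predicted_gap:.3f}")
```

With $\rho > 0$ the raw difference in means exceeds $\delta$; setting `rho = 0.0` makes the selection term vanish.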
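The Roy model

Parameter        Definition                                Under the model assumptions
ATE(x)           $E(y_{i1} - y_{i0} \mid x)$               $(\beta_1 - \beta_0)' x_i := \vartheta(x)$
ATET(x, w)       $E(y_{i1} - y_{i0} \mid D = 1, x)$        $\vartheta(x) + \mathrm{Cov}(\varepsilon_{i1} - \varepsilon_{i0}, \mu_i)\,\lambda_1$
ATENT(x, w)      $E(y_{i1} - y_{i0} \mid D = 0, x)$        $\vartheta(x) + \mathrm{Cov}(\varepsilon_{i1} - \varepsilon_{i0}, \mu_i)\,\lambda_0$

ATE: average treatment effect.
ATET: average treatment effect on the treated.
ATENT: average treatment effect on the untreated.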
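
Given parameter values, the three treatment parameters of the Roy model follow mechanically from the table. The function below is a hypothetical helper, and every numerical input is an assumption chosen for illustration; in an application the inputs would be the two-step or ML estimates.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def roy_treatment_effects(x, beta1, beta0, cov1, cov0, index):
    """Return (ATE, ATET, ATENT) at covariates x and selection index gamma'w.

    cov1 = Cov(eps_1, mu) and cov0 = Cov(eps_0, mu), so that
    Cov(eps_1 - eps_0, mu) = cov1 - cov0.
    """
    theta = sum(xj * (b1 - b0) for xj, b1, b0 in zip(x, beta1, beta0))
    lam1 = phi(index) / Phi(index)        # inverse Mills ratio for D = 1
    lam0 = -phi(index) / Phi(-index)      # inverse Mills ratio for D = 0
    cov_diff = cov1 - cov0
    return theta, theta + cov_diff * lam1, theta + cov_diff * lam0

# Illustrative inputs (assumptions): a constant and one covariate.
ate, atet, atent = roy_treatment_effects(
    x=[1.0, 2.0], beta1=[1.0, 0.5], beta0=[0.5, 0.2],
    cov1=0.6, cov0=0.2, index=0.3,
)
print(f"ATE={ate:.3f} ATET={atet:.3f} ATENT={atent:.3f}")
```

When $\mathrm{Cov}(\varepsilon_{i1} - \varepsilon_{i0}, \mu_i) > 0$, units select into treatment on their gains, so ATET exceeds ATE, which in turn exceeds ATENT.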

Empirical Application
7. Empirical Application
Innovation survey in Uruguay:
Do firm linkages with universities positively affect R&D of manufacturing firms?
