Sorbonne Econometrics
Hélène Huber
PSME
2015-2016
What if hypotheses do not hold?
The problem with OLS
How can we correct for these problems?
Generalized least squares
The GLS estimator
Using GLS
The FGLS estimator
Remarks
Heteroskedasticity
Re-weighting matrix
Ω = diag(a_1, a_2, ..., a_N), so Ω^{-1} = diag(1/a_1, 1/a_2, ..., 1/a_N)   (1)

And the re-weighting matrix is:

Ω^{-1/2} = diag(1/√a_1, 1/√a_2, ..., 1/√a_N)   (2)
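As a concrete illustration (not from the slides), here is a minimal numpy sketch of the re-weighting: assuming Var(u_i) = σ²·a_i with the a_i known, it applies Ω^{-1/2} to y and X and runs OLS on the transformed data, which is the GLS/WLS estimator.

```python
import numpy as np

def wls_known_weights(y, X, a):
    """GLS/WLS when Var(u_i) = sigma^2 * a_i with the a_i known.

    Premultiplying the model by Omega^(-1/2) = diag(1/sqrt(a_i))
    makes the transformed errors homoskedastic, so OLS on the
    transformed data is the GLS estimator.
    """
    w = 1.0 / np.sqrt(a)              # diagonal of Omega^(-1/2)
    y_star = w * y                    # transformed dependent variable
    X_star = X * w[:, None]           # each row of X scaled by 1/sqrt(a_i)
    b_hat, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
    return b_hat
```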
Testing for heteroskedasticity
Weighted least squares in practice (1)
Weighted least squares in practice (2)
What to do:
1. Estimate the original regression y = Xb + u with OLS, save the residuals, and compute their squares
2. Using plots or any other method, pick the right X's for the auxiliary regression: log(û²) = Xγ + constant
3. Compute the predicted squared residual from the auxiliary regression (we need to get back to levels from logs)
4. For each individual i, divide every term of the original model by the square root of that prediction
5. Run OLS on the transformed data
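A minimal numpy sketch of these five steps (the function name and the choice of auxiliary regressors Z_aux are illustrative, not from the slides):

```python
import numpy as np

def fgls(y, X, Z_aux):
    """Feasible WLS: estimate the skedastic function, then re-weight.

    y     : (N,) dependent variable
    X     : (N, k) regressors of the original model (incl. constant)
    Z_aux : (N, m) regressors of the auxiliary regression (incl. constant)
    """
    # 1. OLS on the original model, save the squared residuals
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    u2 = (y - X @ b_ols) ** 2

    # 2. Auxiliary regression: log(u^2) on the chosen Z_aux
    g, *_ = np.linalg.lstsq(Z_aux, np.log(u2), rcond=None)

    # 3. Predicted residual variance, back in levels
    a_hat = np.exp(Z_aux @ g)

    # 4.-5. Divide every term by sqrt(a_hat) and run OLS on the transformed data
    w = 1.0 / np.sqrt(a_hat)
    b_fgls, *_ = np.linalg.lstsq(X * w[:, None], w * y, rcond=None)
    return b_fgls
```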
Should we always use FGLS?
The White matrix
V̂_White(b̂_ols) = (X′X)^{-1} [ Σ_{t=1}^{N} û_t² x_t x_t′ ] (X′X)^{-1}

where the û_t's are the residuals from the OLS regression.
This matrix can be used as an estimate of the true variance of the OLS estimators, and thus for tests. The resulting standard errors are called "White" or "robust" standard errors (because they are robust to heteroskedasticity), and can be computed directly by most software (option "robust").
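A minimal numpy sketch of the computation (the function name is illustrative, not from the slides):

```python
import numpy as np

def white_robust_se(y, X):
    """OLS with heteroskedasticity-robust (White) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b_ols = XtX_inv @ X.T @ y
    u_hat = y - X @ b_ols
    # "White matrix": sum over t of u_t^2 * x_t x_t'
    S = (X * (u_hat ** 2)[:, None]).T @ X
    # Sandwich estimate of Var(b_ols)
    V_robust = XtX_inv @ S @ XtX_inv
    return b_ols, np.sqrt(np.diag(V_robust))
```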
Autocorrelation: a preview
First-order autocorrelation
Considering random explanatory variables
Hypotheses for estimation
1. E(u) = 0
2. ∀t, t′: X_t is random and uncorrelated with u_{t′}
3. Rk(X) = k
4. E(uu′) = σ²I_N
5. Error terms are iid(0, σ²)
6. plim(X′X/N) = V_X, which is a positive definite matrix (when N goes to infinity, the X variables always keep some variance)
Properties of OLS: unbiasedness
Properties of OLS: consistency
plim(b̂_ols) = b + [plim(X′X/N)]^{-1} · plim(X′u/N) = b + V_X^{-1} · 0 = b
We will see in the next section that the asymptotic distribution of b̂ is normal, as expected.
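A small Monte Carlo sketch (illustrative, not from the slides) of what consistency means in practice: the OLS estimate gets closer to the true b as N grows.

```python
import numpy as np

rng = np.random.default_rng(0)
b_true = np.array([1.0, 2.0])

for N in (100, 10_000, 1_000_000):
    X = np.column_stack([np.ones(N), rng.normal(size=N)])  # random, exogenous X
    u = rng.normal(size=N)                                  # iid errors, E(u) = 0
    y = X @ b_true + u
    b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(N, b_hat)   # estimates converge to (1, 2) as N grows
```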
Unconditional inference
Sketch of proof
Recall that for any parameter b_j, the usual Student t-statistic is the following:

t_j = b̂_j / σ̂_{b̂_j}
- If we assume that the X_i are iid and the u's are normal, then:
  - OLS estimators are asymptotically normal and consistent
  - So if the sample is large enough for this to hold, all the previous results on t- and F-tests remain the same
  - We can thus run the usual t- and F-tests
- If the u's are not normal but only iid, we will use the asymptotic versions of the usual t- and F-tests if the sample is large enough
  - In that case, the Normal replaces the Student and the χ² replaces the Fisher
We now wish to enlarge our framework to the case where the u's are only iid (and not necessarily normal) and where the X's can be random (and not necessarily fixed). In what follows, we will thus use the following asymptotic results, obtained under the usual model hypotheses:
plim(b̂_ols) = b + [plim(X′X/N)]^{-1} · plim(X′u/N) = b + V_X^{-1} · 0 = b

plim(σ̂²_ols) = σ²

√N (b̂_ols − b) →d N(0, σ² V_X^{-1})
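These results can be checked by simulation: over many replications, √N(b̂_ols − b) should be approximately normal with variance σ²V_X^{-1}. A minimal sketch (illustrative, not from the slides, with errors that are iid but not normal):

```python
import numpy as np

rng = np.random.default_rng(1)
b_true, sigma, N, R = np.array([1.0, 2.0]), 1.0, 500, 2000

draws = []
for _ in range(R):
    x = rng.normal(size=N)                                  # here V_X = I_2
    X = np.column_stack([np.ones(N), x])
    u = sigma * np.sqrt(3) * rng.uniform(-1, 1, size=N)     # iid but not normal, Var = sigma^2
    b_hat, *_ = np.linalg.lstsq(X, X @ b_true + u, rcond=None)
    draws.append(np.sqrt(N) * (b_hat - b_true))

# Empirical variance should be close to sigma^2 * V_X^{-1} = I_2,
# and each coordinate of the draws should look approximately normal.
print(np.cov(np.array(draws).T))
```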
Asymptotics: tests (1)
Asymptotics: tests (2)
Asymptotics: tests (3)
Non-exogenous explanatory variables

plim(b̂_ols) = b + [plim(X′X/N)]^{-1} · plim(X′u/N) ≠ b
Even if we increase the size of the sample, we won’t get the right
value for b
Exogeneity: definitions
Omitted variables
Let's assume the true model is the following:

y_t = X_t′b + w_t d + v_t

If we omit w_t and estimate the following model instead:

y_t = X_t′b + u_t

- w_t is included in u_t
- If w_t is correlated with X, then E(X_t′w_t) ≠ 0 and thus E(X_t′u_t) ≠ 0: OLS is biased
- Remark: in the fixed-X case, we would get the same result using the Frisch-Waugh theorem (see Dormont)
- When a variable that is correlated with the X's is omitted, estimators suffer from an omitted variable bias
- Estimators are inconsistent
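A small simulation sketch (illustrative, not from the slides) of the omitted variable bias: w is correlated with x, the model is estimated without w, and the OLS coefficient on x does not converge to its true value.

```python
import numpy as np

rng = np.random.default_rng(2)
N, b, d = 100_000, 1.0, 2.0

x = rng.normal(size=N)
w = 0.8 * x + rng.normal(size=N)         # omitted variable, correlated with x
y = b * x + d * w + rng.normal(size=N)   # true model

X_short = np.column_stack([np.ones(N), x])        # w omitted
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
print(b_short[1])   # converges to b + d*Cov(x, w)/Var(x) = 1 + 2*0.8 = 2.6, not 1
```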
Variables measured with error
Let's use a very simple one-variable theoretical model:

y_t* = x_t*·b + v_t

Assume that x is measured with error. To run the estimation, we have access to the values y_t and x_t: y_t = y_t* and x_t = x_t* + e_t, e_t being a white noise.

y_t = x_t·b + u_t

with u_t = v_t − b·e_t. We thus have the following:

E(x_t u_t) = E[(x_t* + e_t)(v_t − b·e_t)] = −b·σ_e² ≠ 0

so the regressor x_t is correlated with the error term: OLS is biased and inconsistent (attenuated towards zero).
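A quick simulation sketch (illustrative, not from the slides) of the resulting attenuation: the OLS coefficient converges to b·σ²_{x*} / (σ²_{x*} + σ²_e), not to b.

```python
import numpy as np

rng = np.random.default_rng(3)
N, b = 200_000, 1.0

x_star = rng.normal(size=N)                  # true regressor, Var = 1
e = rng.normal(scale=0.5, size=N)            # measurement error, Var = 0.25
x = x_star + e                               # observed, mismeasured regressor
y = b * x_star + rng.normal(size=N)

X = np.column_stack([np.ones(N), x])
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b_hat[1])   # close to 1 * 1/(1 + 0.25) = 0.8, attenuated towards zero
```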
Simultaneous equations cont’d
Autocorrelation with lagged dependent variable
Assume we use the following model:

y_t = b_0 + b_1 x_t + b_2 y_{t−1} + u_t

with:

u_t = ρ u_{t−1} + ε_t

We can then write:

- y_t = b_0 + b_1 x_t + b_2 y_{t−1} + ρ u_{t−1} + ε_t   (1)
- And at the same time:
- y_{t−1} = b_0 + b_1 x_{t−1} + b_2 y_{t−2} + u_{t−1}   (2)
- Equation (2) shows that y_{t−1} depends on u_{t−1}, which (when ρ ≠ 0) appears in the error term of equation (1): y_{t−1} is therefore correlated with the error term of the estimated model
- The lagged outcome variable is thus not exogenous in the estimated model
- It can be shown that it is exogenous if error terms are not autocorrelated
- How to test for this: Durbin's h statistic (see Dormont)
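A short simulation sketch (illustrative, not from the slides) showing the problem: when ρ ≠ 0, OLS on the model with the lagged dependent variable does not recover b_2.

```python
import numpy as np

rng = np.random.default_rng(4)
T, b0, b1, b2, rho = 200_000, 0.5, 1.0, 0.5, 0.6

x = rng.normal(size=T)
eps = rng.normal(size=T)
u = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    u[t] = rho * u[t - 1] + eps[t]                   # AR(1) errors
    y[t] = b0 + b1 * x[t] + b2 * y[t - 1] + u[t]     # model with lagged y

X = np.column_stack([np.ones(T - 1), x[1:], y[:-1]])
b_hat, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(b_hat[2])   # does not converge to b2 = 0.5: y_{t-1} is correlated with u_t
```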
General issue in policy evaluation
Instrumental variables
Let's use the following model, where (to make things simple) all X's are endogenous, except the constant (which can't be endogenous by definition):

y = X·b + u

Let's consider a set of variables Z, of dimension (N, p), different from the X's but including the constant, with the following properties:

- E(Z_t′ u_t) = 0: the variables Z are exogenous
- Rk(Z) = p
- plim(Z′X/N) = V_ZX, with V_ZX a non-null matrix of dimension (p, k) and rank k
- plim(Z′Z/N) = V_Z, with V_Z a finite positive definite matrix of dimension (p, p)

This means: the variables Z need to be both exogenous and correlated with X, and the number of instruments p must be at least the number of explanatory variables X (p ≥ k).
The estimator
The original model is:

y = X·b + u

- Assume we regress y, u and every X on the variables Z (k + 2 regressions)
- We compute the predictions, respectively ỹ, ũ and X̃
- We use these new elements in the model instead:

ỹ = X̃·b + ũ

- Let's call P_Z the orthogonal projection matrix that projects on L(Z): P_Z = Z(Z′Z)^{-1}Z′
- P_Z is symmetric (P_Z′ = P_Z) and idempotent (P_Z·P_Z = P_Z)
- It can be shown that ỹ = P_Z y, ũ = P_Z u and X̃ = P_Z X
- It's as if we had premultiplied the original model by P_Z
- The IV estimator is: b̂_iv = (X̃′X̃)^{-1} X̃′ỹ = (X′P_Z X)^{-1} X′P_Z y
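A minimal numpy sketch of this formula (the function name is illustrative, not from the slides):

```python
import numpy as np

def iv_estimator(y, X, Z):
    """IV estimator b_iv = (X' P_Z X)^{-1} X' P_Z y, with P_Z = Z(Z'Z)^{-1}Z'."""
    # Forming the N x N projection matrix is fine for moderate N (illustration only)
    PZ = Z @ np.linalg.inv(Z.T @ Z) @ Z.T      # orthogonal projection on L(Z)
    return np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)
```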
Remark
Intuition (1)
Remarks
Generalization (1)
Generalization (2)
Important: final notation
Properties of the IV estimator (1)
Properties of the IV estimator (2)
V̂_asympt(√N (b̂_iv − b)) = σ̂_iv² · (X′P_Z X / N)^{-1}

with σ̂_iv² = SSR_iv / N and û_iv = y − X·b̂_iv

- When OLS is consistent, IV is less precise than OLS
- Precision increases with the number of instruments
Tests on regression parameters
How to run IV estimation
Two-stage least squares (1)
Two-stage least squares (2)
Exogeneity test (1)
- We test H_0: E(X′u) = 0
- This is called the "Hausman test" or "Durbin-Wu-Hausman test"
- If H_0 is true, then both the OLS and IV estimators are consistent
- If H_0 is false, only the IV estimator is consistent
- The test is based on the difference between b̂_iv and b̂_ols
- They are asymptotically normal: if we compute the difference between the two, take its quadratic form and "divide" it by its variance matrix, we will get a χ² distribution
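A sketch of the statistic in its simple textbook form (illustrative, not from the slides; it uses a pseudo-inverse and takes the degrees of freedom as the rank of the variance difference, glossing over finite-sample issues with that difference):

```python
import numpy as np
from scipy.stats import chi2

def hausman_test(b_iv, V_iv, b_ols, V_ols):
    """Durbin-Wu-Hausman statistic: quadratic form of (b_iv - b_ols)
    in the (pseudo-)inverse of the difference of their variance matrices."""
    d = b_iv - b_ols
    V_diff = V_iv - V_ols
    stat = float(d @ np.linalg.pinv(V_diff) @ d)   # pinv: V_diff may be singular
    df = np.linalg.matrix_rank(V_diff)
    return stat, chi2.sf(stat, df)                 # statistic and p-value
```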
Exogeneity test (2)
The augmented regression
Proofs
Selecting convenient instruments
- Sargan test: H_0: E(Z′u) = 0
- Also called: test of overidentifying restrictions (Stata: overid command)

Under H_0:

û′P_Z û / s² → χ²_{p−k}

with û = y − X·b̂_iv and s² = û′û / N.

û′P_Z û is the sum of the squared predicted values from the regression of û on Z.

Remark: when p = k, the statistic is always zero because b̂_iv = (Z′X)^{-1}Z′y and Z′û = 0. So we cannot run the test with the minimum number of instruments.
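A minimal numpy/scipy sketch of the statistic (the function name is illustrative, not from the slides):

```python
import numpy as np
from scipy.stats import chi2

def sargan_test(y, X, Z):
    """Sargan test of overidentifying restrictions: u'P_Z u / s^2 ~ chi2(p - k)."""
    N, k = X.shape
    p = Z.shape[1]
    PZ = Z @ np.linalg.inv(Z.T @ Z) @ Z.T
    b_iv = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)
    u_hat = y - X @ b_iv
    s2 = u_hat @ u_hat / N
    stat = float(u_hat @ PZ @ u_hat / s2)
    return stat, chi2.sf(stat, p - k)    # only meaningful when p > k
```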
The problem with weak instruments
Quarter of birth and earnings
Results
Method
Are IV estimates reliable?

- The goal of IV estimation is to bring to light real causality, and not mere correlation, between the outcome y and the explanatory variables X
- In treatment evaluation, we want to assess the Average Treatment Effect
- So we instrument the dummy indicating whether the individual was assigned to the treatment group, because we suspect that choosing a treatment is not exogenous: it is likely to be correlated with unobserved heterogeneity
- However, it can be shown that in doing so, we estimate a Local Average Treatment Effect, i.e. we estimate the impact of the treatment only on the individuals for whom the instrument indeed has an impact (called compliers)
- In the Angrist study, compliers represent only 2% of the sample