OLS Estimation of Single Equation Models PDF
MODELS

Structural Model:

$$y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K + u = \mathbf{x}\boldsymbol{\beta} + u,$$

where $\mathbf{x}_{1\times K} = (x_1\; x_2\; \cdots\; x_K)$ with $x_1 = 1$ (intercept) and $\boldsymbol{\beta}_{K\times 1} = (\beta_1, \beta_2, \ldots, \beta_K)'$.
Assumptions

1. We can obtain a random sample from the population, where the sample observations $\{(\mathbf{x}_i, y_i) : i = 1, 2, \ldots, N\}$ are iid.

2. The population error has zero mean and is uncorrelated with the regressors: $E(u) = 0$ and $\mathrm{cov}(x_j, u) = 0$, $j = 1, 2, \ldots, K$. This can be stated as the orthogonality condition

$$E(\mathbf{x}'u) = \mathbf{0} \tag{1}$$

or strengthened to the zero-conditional-mean condition

$$E(u\,|\,\mathbf{x}) = 0. \tag{2}$$
[Note: If $x_K$ is not determined independently of $y$ (e.g., if it is codetermined with $y$), then $x_K$ and $u$ are generally correlated. E.g., if quantity supplied is the dependent variable and price is an explanatory variable, then the market-clearing (or equilibrium) phenomenon codetermines the values of quantity supplied and price as may be observed in data. In this case $x_K$ (or price) is likely to be endogenous.]
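The endogeneity problem can be illustrated numerically. Below is a minimal simulation sketch (not from the notes; numpy assumed, with a hypothetical DGP in which the regressor shares a random component with the error, as simultaneity would induce):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
beta = 2.0

# Endogenous regressor: x shares a common component v with the error u,
# so cov(x, u) != 0 and the orthogonality condition fails.
v = rng.normal(size=N)
x = rng.normal(size=N) + v
u = rng.normal(size=N) + v
y = beta * x + u

X = np.column_stack([np.ones(N), x])          # x1 = 1 (intercept)
b_ols = np.linalg.solve(X.T @ X, X.T @ y)     # (X'X)^{-1} X'y

# The slope is pulled away from beta = 2 toward
# beta + cov(x, u)/var(x) = 2 + 1/2 = 2.5.
print(b_ols[1])
```

With this DGP, OLS converges to the wrong value no matter how large the sample: the bias term $\mathrm{cov}(x,u)/\mathrm{var}(x)$ does not vanish.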
Assumption (2) is stronger than what is required for deriving the asymptotic properties of OLS. So we stick to (1):

$$E(\mathbf{x}'u) = \mathbf{0}, \qquad \text{[OLS 1]}$$

$$E\begin{pmatrix} x_1 u \\ x_2 u \\ \vdots \\ x_K u \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$

i.e. $E(x_j u) = 0 \;\forall\; j = 1, \ldots, K$.

Also, as $x_1 = 1$, $E(x_1 u) = 0 \Rightarrow E(u) = 0$. Thus $E\{[x_j - E(x_j)][u - E(u)]\} = E(x_j u) - E(x_j)E(u) = 0$ (remember $E(x_j)$ is a population moment, hence constant), so $\mathrm{cov}(x_j, u) = 0$, $j = 1, 2, \ldots, K$.
3. $E(\mathbf{x}'\mathbf{x})$ is nonsingular. [OLS 2]

$$E(\mathbf{x}'\mathbf{x}) = E\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_K \end{pmatrix}(x_1\; x_2\; \cdots\; x_K) = \begin{pmatrix} Ex_1^2 & Ex_1x_2 & \cdots & Ex_1x_K \\ Ex_2x_1 & Ex_2^2 & \cdots & Ex_2x_K \\ \vdots & \vdots & \ddots & \vdots \\ Ex_Kx_1 & Ex_Kx_2 & \cdots & Ex_K^2 \end{pmatrix}_{K\times K}$$
Method of Moments

Replace population moments $E(\mathbf{x}'\mathbf{x})$ and $E(\mathbf{x}'y)$ with corresponding sample moments to obtain the sample counterpart (or, estimator) of the population parameter. So,

$$\hat{\boldsymbol{\beta}}_{OLS} = \left(N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N \mathbf{x}_i' y_i\right)$$

$$= \begin{pmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} & \cdots & \sum x_{1i}x_{Ki} \\ \sum x_{2i}x_{1i} & \sum x_{2i}^2 & \cdots & \sum x_{2i}x_{Ki} \\ \vdots & \vdots & \ddots & \vdots \\ \sum x_{Ki}x_{1i} & \sum x_{Ki}x_{2i} & \cdots & \sum x_{Ki}^2 \end{pmatrix}^{-1}\begin{pmatrix} \sum x_{1i}y_i \\ \sum x_{2i}y_i \\ \vdots \\ \sum x_{Ki}y_i \end{pmatrix}$$

(the $N^{-1}$ factors cancel). Let

$$\mathbf{X}_{N\times K} = \begin{pmatrix} x_{11} & x_{21} & \cdots & x_{K1} \\ x_{12} & x_{22} & \cdots & x_{K2} \\ \vdots & \vdots & & \vdots \\ x_{1N} & x_{2N} & \cdots & x_{KN} \end{pmatrix}, \qquad \mathbf{y}_{N\times 1} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}$$

be the sample observations on the regressors (the $i$-th row of $\mathbf{X}$ is $\mathbf{x}_i$) and on the dependent variable $y$. Then $\hat{\boldsymbol{\beta}}_{OLS} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$.
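As a numerical sketch (numpy assumed; the data are simulated, not from the notes), the sample-moment form and the matrix form give the same estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 500, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # x1 = 1
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=N)

# Method-of-moments form: (N^{-1} sum x_i'x_i)^{-1} (N^{-1} sum x_i'y_i)
Sxx = X.T @ X / N
Sxy = X.T @ y / N
b_mom = np.linalg.solve(Sxx, Sxy)

# Matrix form: (X'X)^{-1} X'y -- the N^{-1} factors cancel.
b_mat = np.linalg.solve(X.T @ X, X.T @ y)

print(b_mom, b_mat)
```

Both coincide (and match `np.linalg.lstsq`), since multiplying numerator and denominator moments by $N^{-1}$ leaves the estimator unchanged.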
By the WLLN, $N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i \xrightarrow{p} E(\mathbf{x}'\mathbf{x})$ (the sample of size $N$ approaching the entire population causes the sample moment to converge to the corresponding population moment in probability). Hence $P\left[\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i, \text{ i.e. } \mathbf{X}'\mathbf{X}, \text{ is non-singular}\right] \to 1$ as $N \to \infty$.

Hence by Corollary 1 of Asymptotic Theory, $\mathrm{plim}\left[\left(N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}\right] = \mathbf{A}^{-1}$, where $\mathbf{A} = E(\mathbf{x}'\mathbf{x})$.
We have used the method of moments to derive this estimator. Why are we calling it Ordinary Least Squares then? Because the same $\hat{\boldsymbol{\beta}}$ solves the least-squares problem $\min_{\mathbf{b}} \sum_{i=1}^N (y_i - \mathbf{x}_i\mathbf{b})^2$: its first-order conditions, $\sum_{i=1}^N \mathbf{x}_i'(y_i - \mathbf{x}_i\hat{\boldsymbol{\beta}}) = \mathbf{0}$, are exactly the sample analogue of $E(\mathbf{x}'u) = \mathbf{0}$.
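The first-order conditions can be checked numerically (numpy assumed; simulated data): at the OLS solution the residuals are orthogonal to every regressor in the sample.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 300
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b

# Sample analogue of E(x'u) = 0: sum_i x_i' u_hat_i = 0 at the OLS solution.
moments = X.T @ u_hat
print(moments)   # numerically zero
```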
Consistency of OLS

$$\hat{\boldsymbol{\beta}}_{OLS} = \left(N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N \mathbf{x}_i' y_i\right) = \boldsymbol{\beta} + \left(N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}\left(N^{-1}\sum_{i=1}^N \mathbf{x}_i' u_i\right).$$

$N^{-1}\sum_{i=1}^N \mathbf{x}_i' u_i \xrightarrow{p} E(\mathbf{x}'u) = \mathbf{0}$ by OLS 1 (i.e. $\mathrm{plim}\left[N^{-1}\sum_{i=1}^N \mathbf{x}_i'u_i\right] = \mathbf{0}$). Hence $\mathrm{plim}\,\hat{\boldsymbol{\beta}}_{OLS} = \boldsymbol{\beta} + \mathbf{A}^{-1}\cdot\mathbf{0} = \boldsymbol{\beta}$: OLS is consistent.

In matrix form, $\hat{\boldsymbol{\beta}}_{OLS} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}$, where $\mathbf{u}_{N\times 1} = (u_1, u_2, \ldots, u_N)'$.

But $E(\hat{\boldsymbol{\beta}}_{OLS}) = \boldsymbol{\beta} + E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}] \neq \boldsymbol{\beta}$ in general. However, if we use the stronger assumption (2) instead of OLS 1, i.e., $E(u|\mathbf{x}) = 0$, then unbiasedness of $\hat{\boldsymbol{\beta}}_{OLS}$ may be retrieved:

$$\hat{\boldsymbol{\beta}}_{OLS} = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{u}.$$

Now $E(\hat{\boldsymbol{\beta}}_{OLS}|\mathbf{X}) = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\mathbf{u}|\mathbf{X}) = \boldsymbol{\beta} + \mathbf{0} = \boldsymbol{\beta}$, so that

$$E(\hat{\boldsymbol{\beta}}_{OLS}) = E[E(\hat{\boldsymbol{\beta}}_{OLS}|\mathbf{X})] = E(\boldsymbol{\beta}) = \boldsymbol{\beta}.$$
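A small simulation sketch of consistency (numpy assumed; illustrative DGP with exogenous regressors): the estimate tightens around the true $\beta$ as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([1.0, 2.0])

def ols(N):
    """One draw of the OLS estimate from a sample of size N."""
    x = rng.normal(size=N)
    X = np.column_stack([np.ones(N), x])
    y = X @ beta + rng.normal(size=N)   # E(u|x) = 0 holds in this DGP
    return np.linalg.solve(X.T @ X, X.T @ y)

# Deviation from the truth shrinks as the sample grows.
err_small = abs(ols(100)[1] - beta[1])
err_large = abs(ols(1_000_000)[1] - beta[1])
print(err_small, err_large)
```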
We know that $N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i \xrightarrow{p} \mathbf{A}$, so that $\left(N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1} \xrightarrow{p} \mathbf{A}^{-1}$, i.e.

$$\left(N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1} - \mathbf{A}^{-1} = o_p(1).$$
By the CLT for iid sequences,

$$N^{-1/2}\sum_{i=1}^N \mathbf{x}_i' u_i \xrightarrow{d} N(\mathbf{0}, \mathbf{B}), \qquad \text{where } \mathbf{B}_{K\times K} = \mathrm{var}(\mathbf{x}_i'u_i) = E(\mathbf{x}_i'u_iu_i\mathbf{x}_i) = E(u_i^2\mathbf{x}_i'\mathbf{x}_i) = E(u^2\mathbf{x}'\mathbf{x}) \text{ for any } i.$$

So $N^{-1/2}\sum_{i=1}^N \mathbf{x}_i'u_i = O_p(1)$, by Lemma 5. Therefore

$$\sqrt{N}(\hat{\boldsymbol{\beta}}_{OLS} - \boldsymbol{\beta}) = \left(N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}\left(N^{-1/2}\sum_{i=1}^N \mathbf{x}_i'u_i\right) = \mathbf{A}^{-1}\left(N^{-1/2}\sum_{i=1}^N \mathbf{x}_i'u_i\right) + o_p(1),$$

since the remainder is $\left[\left(N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1} - \mathbf{A}^{-1}\right]\left(N^{-1/2}\sum_{i=1}^N \mathbf{x}_i'u_i\right) = o_p(1)\cdot O_p(1) = o_p(1)$, by Lemma 2.
Assumption 4

4. $E(u^2\mathbf{x}'\mathbf{x}) = \sigma^2 E(\mathbf{x}'\mathbf{x})$, where $\sigma^2 = E(u^2)$ (homoskedasticity). [OLS 3]

From OLS 1 - OLS 3, and the fact that $\sqrt{N}(\hat{\boldsymbol{\beta}}_{OLS} - \boldsymbol{\beta}) = \mathbf{A}^{-1}\left(N^{-1/2}\sum_{i=1}^N \mathbf{x}_i'u_i\right) + o_p(1)$, it follows that $\sqrt{N}(\hat{\boldsymbol{\beta}}_{OLS} - \boldsymbol{\beta}) \stackrel{a}{\sim} N(\mathbf{0}, \sigma^2\mathbf{A}^{-1})$.

Proof: $N^{-1/2}\sum_{i=1}^N \mathbf{x}_i'u_i \xrightarrow{d} N(\mathbf{0}, \mathbf{B})$.

$$\mathbf{A}^{-1}\left(N^{-1/2}\sum_{i=1}^N \mathbf{x}_i'u_i\right) \xrightarrow{d} N(\mathbf{0}, \mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1}).$$

$$\sqrt{N}(\hat{\boldsymbol{\beta}}_{OLS} - \boldsymbol{\beta}) - \mathbf{A}^{-1}\left(N^{-1/2}\sum_{i=1}^N \mathbf{x}_i'u_i\right) \xrightarrow{p} \mathbf{0}.$$

Hence

$$\sqrt{N}(\hat{\boldsymbol{\beta}}_{OLS} - \boldsymbol{\beta}) \xrightarrow{d} N(\mathbf{0}, \mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1}).$$

But under OLS 3, $\mathbf{B} = \sigma^2 E(\mathbf{x}'\mathbf{x}) = \sigma^2\mathbf{A}$. Thus,

$$\sqrt{N}(\hat{\boldsymbol{\beta}}_{OLS} - \boldsymbol{\beta}) \stackrel{a}{\sim} N(\mathbf{0}, \sigma^2\mathbf{A}^{-1}).$$
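As a rough Monte Carlo check of this result (numpy assumed; simulated homoskedastic DGP), the sampling variance of $\sqrt{N}(\hat{\beta} - \beta)$ should be close to $\sigma^2\mathbf{A}^{-1}$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, R = 2_000, 2_000            # sample size, Monte Carlo replications
beta = np.array([1.0, 2.0])

draws = np.empty((R, 2))
for r in range(R):
    x = rng.normal(size=N)
    X = np.column_stack([np.ones(N), x])
    y = X @ beta + rng.normal(size=N)          # homoskedastic u, sigma^2 = 1
    b = np.linalg.solve(X.T @ X, X.T @ y)
    draws[r] = np.sqrt(N) * (b - beta)

# For this DGP, A = E(x'x) = I_2, so sigma^2 A^{-1} = I_2.
V_mc = np.cov(draws.T)
print(V_mc)   # approximately the 2x2 identity matrix
```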
Usual estimator of $\sigma^2$ is

$$\hat{\sigma}^2 = \frac{RSS}{N-K}, \qquad \text{where } RSS = \sum_{i=1}^N \hat{u}_i^2$$

(sum of squared OLS residuals: $\hat{u}_i = y_i - \mathbf{x}_i\hat{\boldsymbol{\beta}}_{OLS}$).
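A sketch of the classical variance estimate (numpy assumed; simulated data): $\hat{\sigma}^2 = RSS/(N-K)$ together with the usual varcov $\hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(4)
N, K = 400, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=N)   # true sigma^2 = 1

b = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b                       # OLS residuals
rss = u_hat @ u_hat
sigma2_hat = rss / (N - K)              # degrees-of-freedom corrected

V_classical = sigma2_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(V_classical))      # classical standard errors
print(sigma2_hat, se)
```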
Without OLS 3 (i.e., under heteroskedasticity), we only have

$$\sqrt{N}(\hat{\boldsymbol{\beta}}_{OLS} - \boldsymbol{\beta}) \xrightarrow{d} N(\mathbf{0}, \mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1}).$$

In other words, the asymptotic variance of $\hat{\boldsymbol{\beta}}_{OLS}$ is $\mathbf{A}^{-1}\mathbf{B}\mathbf{A}^{-1}/N$.

A consistent estimator of $\mathbf{A}$ is $N^{-1}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i$. If the $u_i$ were observed, then by the WLLN $N^{-1}\sum_{i=1}^N u_i^2\mathbf{x}_i'\mathbf{x}_i \xrightarrow{p} E(u^2\mathbf{x}'\mathbf{x}) = \mathbf{B}$. As $u_i$ cannot be observed, replace $u_i$ by the OLS residual $\hat{u}_i = y_i - \mathbf{x}_i\hat{\boldsymbol{\beta}}_{OLS}$. White (1980, Econometrica) proves the following.

White (1980): A consistent estimator of $\mathbf{B}$ is $\hat{\mathbf{B}} = N^{-1}\sum_{i=1}^N \hat{u}_i^2\mathbf{x}_i'\mathbf{x}_i$.
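A sketch of White's estimator (numpy assumed; simulated data under a hypothetical heteroskedastic DGP):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 1_000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
u = np.abs(x) * rng.normal(size=N)      # var(u|x) depends on x: heteroskedastic
y = X @ np.array([1.0, 2.0]) + u

b = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b

# White (1980): B_hat = N^{-1} sum_i u_hat_i^2 x_i' x_i
B_hat = (X * u_hat[:, None]**2).T @ X / N
print(B_hat)
```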
To see why, write $\hat{u}_i = y_i - \mathbf{x}_i\hat{\boldsymbol{\beta}} = u_i - \mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})$. Then

$$\mathbf{x}_i'\hat{u}_i = \mathbf{x}_i'u_i - \mathbf{x}_i'\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}). \tag{1}$$

Transposing,

$$\hat{u}_i\mathbf{x}_i = u_i\mathbf{x}_i - (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{x}_i'\mathbf{x}_i. \tag{2}$$

Multiplying (1) and (2),

$$\hat{u}_i^2\,\mathbf{x}_i'\mathbf{x}_i = u_i^2\,\mathbf{x}_i'\mathbf{x}_i - u_i\mathbf{x}_i'(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{x}_i'\mathbf{x}_i - \mathbf{x}_i'\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})u_i\mathbf{x}_i + \mathbf{x}_i'\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{x}_i'\mathbf{x}_i,$$

and averaging over $i$,

$$\frac{1}{N}\sum_{i=1}^N \hat{u}_i^2\,\mathbf{x}_i'\mathbf{x}_i = \frac{1}{N}\sum_{i=1}^N u_i^2\,\mathbf{x}_i'\mathbf{x}_i - \frac{1}{N}\sum_{i=1}^N u_i\mathbf{x}_i'(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{x}_i'\mathbf{x}_i - \frac{1}{N}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})u_i\mathbf{x}_i + \frac{1}{N}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{x}_i'\mathbf{x}_i. \tag{3}$$
Digression: the vec operator and the Kronecker product. The vec operator stacks the columns of a matrix:

$$\mathrm{vec}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} a_{11} \\ a_{21} \\ a_{12} \\ a_{22} \end{pmatrix}.$$

The Kronecker product of $\mathbf{A}_{K\times L}$ with $\mathbf{B}$ is

$$\mathbf{A}\otimes\mathbf{B} = \begin{pmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1L}\mathbf{B} \\ a_{21}\mathbf{B} & a_{22}\mathbf{B} & \cdots & a_{2L}\mathbf{B} \\ \vdots & \vdots & & \vdots \\ a_{K1}\mathbf{B} & a_{K2}\mathbf{B} & \cdots & a_{KL}\mathbf{B} \end{pmatrix}.$$

Key identity: $\mathrm{vec}(\mathbf{ABC}) = (\mathbf{C}'\otimes\mathbf{A})\,\mathrm{vec}(\mathbf{B})$.

Illustration. Let

$$\mathbf{A}_{2\times 3} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \quad \mathbf{B}_{3\times 1} = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}, \quad \mathbf{C}_{1\times 2} = (c_1\;\; c_2).$$

Then

$$\mathbf{ABC} = \begin{pmatrix} c_1(a_{11}b_1 + a_{12}b_2 + a_{13}b_3) & c_2(a_{11}b_1 + a_{12}b_2 + a_{13}b_3) \\ c_1(a_{21}b_1 + a_{22}b_2 + a_{23}b_3) & c_2(a_{21}b_1 + a_{22}b_2 + a_{23}b_3) \end{pmatrix},$$

so

$$\text{LHS} = \mathrm{vec}(\mathbf{ABC}) = \begin{pmatrix} c_1(a_{11}b_1 + a_{12}b_2 + a_{13}b_3) \\ c_1(a_{21}b_1 + a_{22}b_2 + a_{23}b_3) \\ c_2(a_{11}b_1 + a_{12}b_2 + a_{13}b_3) \\ c_2(a_{21}b_1 + a_{22}b_2 + a_{23}b_3) \end{pmatrix}.$$

Now

$$\text{RHS} = (\mathbf{C}'\otimes\mathbf{A})\,\mathrm{vec}(\mathbf{B}) = \begin{pmatrix} c_1a_{11} & c_1a_{12} & c_1a_{13} \\ c_1a_{21} & c_1a_{22} & c_1a_{23} \\ c_2a_{11} & c_2a_{12} & c_2a_{13} \\ c_2a_{21} & c_2a_{22} & c_2a_{23} \end{pmatrix}\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} = \begin{pmatrix} a_{11}b_1c_1 + a_{12}b_2c_1 + a_{13}b_3c_1 \\ a_{21}b_1c_1 + a_{22}b_2c_1 + a_{23}b_3c_1 \\ a_{11}b_1c_2 + a_{12}b_2c_2 + a_{13}b_3c_2 \\ a_{21}b_1c_2 + a_{22}b_2c_2 + a_{23}b_3c_2 \end{pmatrix}.$$

Hence LHS = RHS.
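The identity is also easy to verify numerically (numpy assumed; note that vec stacks columns, hence the Fortran order below):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(2, 3))
B = rng.normal(size=(3, 1))
C = rng.normal(size=(1, 2))

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)   # vec(ABC) = (C' kron A) vec(B)
print(np.allclose(lhs, rhs))
```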
Part III: Consider the 3rd term on the RHS of eqn (3).

$$\frac{1}{N}\sum_{i=1}^N \mathrm{vec}\big(u_i\,\mathbf{x}_i'\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\mathbf{x}_i\big) = \frac{1}{N}\sum_{i=1}^N \big(u_i\mathbf{x}_i'\otimes\mathbf{x}_i'\mathbf{x}_i\big)\,\mathrm{vec}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}).$$

[$u_i$ is a scalar. Treat $(\mathbf{x}_i'\mathbf{x}_i)_{K\times K} = \mathbf{A}$, $(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})_{K\times 1} = \mathbf{B}$ and $(\mathbf{x}_i)_{1\times K} = \mathbf{C}$ in the identity above.]

Now, clearly $(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \xrightarrow{p} \mathbf{0} \Rightarrow \mathrm{vec}(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \xrightarrow{p} \mathbf{0}$.

Again, $u_i\mathbf{x}_i'\otimes\mathbf{x}_i'\mathbf{x}_i$ is the $K^2\times K$ matrix obtained by stacking the blocks $u_ix_{ki}\,\mathbf{x}_i'\mathbf{x}_i$, $k = 1, \ldots, K$. The terms of the final $N^{-1}\sum_{i=1}^N (u_i\mathbf{x}_i'\otimes\mathbf{x}_i'\mathbf{x}_i)$ matrix are of the form $N^{-1}\sum_{i=1}^N u_ix_{ki}^3$, or $N^{-1}\sum_{i=1}^N u_ix_{ki}^2x_{ji}$, or $N^{-1}\sum_{i=1}^N u_ix_{ki}x_{ji}x_{li}$, $j, k, l = 1, \ldots, K$.

Assume the corresponding population moments, i.e., $E(x_j^3u)$, $E(x_k^2x_ju)$, or $E(x_jx_kx_lu)$, exist and are finite. Then by the WLLN, $N^{-1}\sum_{i=1}^N (u_i\mathbf{x}_i'\otimes\mathbf{x}_i'\mathbf{x}_i) \xrightarrow{p} E(u\,\mathbf{x}'\otimes\mathbf{x}'\mathbf{x})$, the corresponding population moment matrix, which is finite. Hence the 3rd term is $O_p(1)\cdot o_p(1) = o_p(1)$.
Similarly, for the 4th term on the RHS of eqn (3),

$$\frac{1}{N}\sum_{i=1}^N \mathrm{vec}\big(\mathbf{x}_i'\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{x}_i'\mathbf{x}_i\big) = \frac{1}{N}\sum_{i=1}^N \big(\mathbf{x}_i'\mathbf{x}_i\otimes\mathbf{x}_i'\mathbf{x}_i\big)\,\mathrm{vec}\big((\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\big).$$

Again, $(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta}) \xrightarrow{p} \mathbf{0}$, thus $\mathrm{vec}\big((\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\big) \xrightarrow{p} \mathbf{0}$, by Lemma 2.
Here $\mathbf{x}_i'\mathbf{x}_i\otimes\mathbf{x}_i'\mathbf{x}_i$ is the $K^2\times K^2$ matrix whose blocks are $x_{ji}x_{ki}\,\mathbf{x}_i'\mathbf{x}_i$, so its entries are fourth-order products of the form $x_{ji}x_{ki}x_{li}x_{mi}$, $j, k, l, m = 1, \ldots, K$.
Hence by the WLLN, $N^{-1}\sum_{i=1}^N (\mathbf{x}_i'\mathbf{x}_i\otimes\mathbf{x}_i'\mathbf{x}_i) \xrightarrow{p} E(\mathbf{x}'\mathbf{x}\otimes\mathbf{x}'\mathbf{x})$, the corresponding population moment matrix, assumed finite. So the 4th term is also $O_p(1)\cdot o_p(1) = o_p(1)$. (The 2nd term is the transpose of the 3rd, hence also $o_p(1)$.)

Note also, if $N^{-1}\sum_{i=1}^N \mathrm{vec}\big(u_i\mathbf{x}_i'\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\mathbf{x}_i\big)$ or $N^{-1}\sum_{i=1}^N \mathrm{vec}\big(\mathbf{x}_i'\mathbf{x}_i(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{x}_i'\mathbf{x}_i\big)$ are $o_p(1)$, then so also are the original expressions in the RHS of (3), since the vec operator does nothing but stack the columns of the original matrices into vectors.
PART IV: Finally, from equation (3),

$$\frac{1}{N}\sum_{i=1}^N \hat{u}_i^2\,\mathbf{x}_i'\mathbf{x}_i = \frac{1}{N}\sum_{i=1}^N u_i^2\,\mathbf{x}_i'\mathbf{x}_i + o_p(1).$$

So

$$\frac{1}{N}\sum_{i=1}^N \hat{u}_i^2\,\mathbf{x}_i'\mathbf{x}_i - \frac{1}{N}\sum_{i=1}^N u_i^2\,\mathbf{x}_i'\mathbf{x}_i \xrightarrow{p} \mathbf{0}.$$

Thus,

$$\frac{1}{N}\sum_{i=1}^N \hat{u}_i^2\,\mathbf{x}_i'\mathbf{x}_i \xrightarrow{p} \mathbf{B}.$$

Hence the heteroskedasticity-robust variance-covariance matrix of $\hat{\boldsymbol{\beta}}_{OLS}$ is:

$$\widehat{\mathrm{Avar}}(\hat{\boldsymbol{\beta}})_{robust} = \frac{\hat{\mathbf{A}}^{-1}\hat{\mathbf{B}}\hat{\mathbf{A}}^{-1}}{N} = \frac{1}{N}\left(\frac{1}{N}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^N \hat{u}_i^2\,\mathbf{x}_i'\mathbf{x}_i\right)\left(\frac{1}{N}\sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i\right)^{-1}$$

$$= (\mathbf{X}'\mathbf{X})^{-1}\left(\sum_{i=1}^N \hat{u}_i^2\,\mathbf{x}_i'\mathbf{x}_i\right)(\mathbf{X}'\mathbf{X})^{-1}, \qquad \text{where } \sum_{i=1}^N \mathbf{x}_i'\mathbf{x}_i = \mathbf{X}'\mathbf{X}.$$

This matrix is also often called the sandwich matrix because of its form.
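A sketch of the sandwich computation (numpy assumed; simulated heteroskedastic data as a hypothetical example):

```python
import numpy as np

rng = np.random.default_rng(7)
N = 2_000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([1.0, 2.0]) + np.abs(x) * rng.normal(size=N)  # heteroskedastic

b = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b

XtX_inv = np.linalg.inv(X.T @ X)
meat = (X * u_hat[:, None]**2).T @ X          # sum_i u_hat_i^2 x_i' x_i
V_robust = XtX_inv @ meat @ XtX_inv           # the "sandwich"
se_robust = np.sqrt(np.diag(V_robust))        # heteroskedasticity-robust SEs
print(se_robust)
```

This is the HC0 form of the robust variance; software packages often apply small-sample corrections (e.g. a factor $N/(N-K)$) on top of it.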
Once the heteroskedasticity-consistent varcov of $\hat{\boldsymbol{\beta}}_{OLS}$ is obtained, we can also get the heteroskedasticity-robust standard errors of $\hat{\boldsymbol{\beta}}_{OLS}$ by taking square roots of the diagonal terms of $\widehat{\mathrm{Avar}}(\hat{\boldsymbol{\beta}})_{robust}$. Once robust standard errors are obtained, t statistics can be computed in the usual way (robust t stats). However, under heteroskedasticity the usual F-stats are not valid even asymptotically. So Wald stats should be employed instead.
$$H_0: \mathbf{R}_{Q\times K}\,\boldsymbol{\beta}_{K\times 1} = \mathbf{r}_{Q\times 1}, \qquad \text{where } \mathrm{rank}(\mathbf{R}) = Q \le K.$$

The heteroskedasticity-robust Wald stat is

$$W = (\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})'(\mathbf{R}\hat{\mathbf{V}}\mathbf{R}')^{-1}(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r}), \qquad \text{where } \hat{\mathbf{V}} = \widehat{\mathrm{Avar}}(\hat{\boldsymbol{\beta}})_{robust}, \qquad W \stackrel{a}{\sim} \chi^2_Q.$$
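A sketch of the robust Wald test (numpy assumed; the restriction tested, $\beta_2 = 2$, is a hypothetical example and happens to be true in this simulated DGP):

```python
import numpy as np

rng = np.random.default_rng(8)
N = 2_000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = X @ np.array([1.0, 2.0]) + np.abs(x) * rng.normal(size=N)

b = np.linalg.solve(X.T @ X, X.T @ y)
u_hat = y - X @ b
XtX_inv = np.linalg.inv(X.T @ X)
V = XtX_inv @ ((X * u_hat[:, None]**2).T @ X) @ XtX_inv  # robust varcov

# H0: R beta = r, here the single restriction beta_2 = 2 (Q = 1).
R = np.array([[0.0, 1.0]])
r = np.array([2.0])
diff = R @ b - r
W = diff @ np.linalg.solve(R @ V @ R.T, diff)  # ~ chi^2_1 under H0
print(W)
```

Since $H_0$ is true here, $W$ behaves like a $\chi^2_1$ draw; comparing it with the $\chi^2_Q$ critical value gives the test decision.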
1. GLS leads to an efficiency gain only when the model for var(y|x) is correct. So it requires a lot of information.
2. Finite-sample properties of Feasible GLS are usually not known except in the simplest cases.
3. Under the weak exogeneity assumption [OLS 1], GLS is generally inconsistent if E(u|x) ≠ 0.

[See undergraduate notes for treatment of GLS.]