wk02 - GLS - Hand Written Notes 100822

Linear Models: OLS and GLS II
Yiran Xie
School of Economics, University of Sydney
August 10, 2022
Table of contents
1. Review
2. Distribution of OLS Estimator
3. GLS: Generalized Least Squares
4. Simulations: OLS Consistency and Asymptotic Normality
Review
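Linear Model:
\[ y = X\beta + u \]
where
\[
\underset{(N\times 1)}{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \quad
\underset{(N\times K)}{X} = \begin{pmatrix} x_1' \\ \vdots \\ x_N' \end{pmatrix}, \quad \text{and} \quad
\underset{(N\times 1)}{u} = \begin{pmatrix} u_1 \\ \vdots \\ u_N \end{pmatrix}.
\]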
The OLS estimator:
\[ \hat\beta_{OLS} = (X'X)^{-1} X'y, \]
which minimizes the sum of squared errors
\[ Q(\beta) = \sum_{i=1}^{N} u_i^2 = \sum_{i=1}^{N} (y_i - x_i'\beta)^2. \]
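The estimator and the objective Q(β) can be sketched numerically. The block below is a minimal illustration in Python with NumPy, not part of the original slides; the toy data and the function names `ols` and `ssr` are assumptions made for the example.

```python
import numpy as np

def ols(X, y):
    """Return the OLS coefficients by solving the normal equations X'X b = X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def ssr(beta, X, y):
    """Sum of squared errors Q(beta) = sum_i (y_i - x_i' beta)^2."""
    resid = y - X @ beta
    return float(resid @ resid)

# Hypothetical toy data: an intercept column plus one regressor.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.9, 5.2, 6.9])

beta_hat = ols(X, y)

# Q is minimized at beta_hat, so moving away from it cannot lower Q.
q_min = ssr(beta_hat, X, y)
q_other = ssr(beta_hat + np.array([0.1, -0.05]), X, y)
print(beta_hat, q_min, q_other)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse, which is usually the numerically safer choice.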
Review: OLS Properties
$\hat\beta_{OLS}$ is always estimable, provided rank[X] = K.

(Assume that N ≥ K: the data has at least as many observations as coefficients to be estimated.)
Proof.
We need to show that $X'X$ is invertible,
i.e. that for any K × 1 vector c, $(X'X)c = 0$ implies $c = 0$.
If $(X'X)c = 0$ then $(Xc)'(Xc) = c'(X'X)c = 0$, so $Xc = 0$.
Given that X has full column rank, c must be 0.
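The rank condition can be illustrated with two small matrices. This is a sketch in Python with NumPy; both matrices are hypothetical and chosen only to show a full-rank case and a rank-deficient case.

```python
import numpy as np

# Full column rank: the two columns are linearly independent.
X_full = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
# Rank deficient: the second column is exactly twice the first.
X_def = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])

rank_full = np.linalg.matrix_rank(X_full)
rank_def = np.linalg.matrix_rank(X_def)

# For the deficient matrix, c = (2, -1)' is nonzero yet X c = 0,
# so X'X has a nonzero null vector and cannot be inverted.
c = np.array([2.0, -1.0])
Xc = X_def @ c

det_full = np.linalg.det(X_full.T @ X_full)
det_def = np.linalg.det(X_def.T @ X_def)
print(rank_full, rank_def, Xc, det_full, det_def)
```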
Review: OLS Properties
If the d.g.p. is $y = X\beta + u$ then
\[
\hat\beta_{OLS} = \beta + \underbrace{(X'X)^{-1}X'}_{A}\, u
= \beta + \Big(\sum_{i=1}^{N} x_i x_i'\Big)^{-1} \sum_{i=1}^{N} x_i u_i .
\]
Essential result:
Finite sample properties: If $u \sim N[0, \Omega]$ then
\[
\hat\beta_{OLS} \sim N[A \cdot 0 + \beta,\ A\Omega A'] = N[\beta,\ (X'X)^{-1} X'\Omega X (X'X)^{-1}].
\]
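The finite-sample variance AΩA′ can be evaluated directly for a given X and Ω. The sketch below uses Python with NumPy; the design matrix and the error variances are hypothetical values picked for illustration.

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 2.0], [1.0, 3.0]])
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                      # A = (X'X)^{-1} X', so beta_hat = beta + A u

# Case 1: spherical errors, Omega = sigma^2 I (sigma^2 chosen arbitrarily).
sigma2 = 2.0
V_spherical = A @ (sigma2 * np.eye(4)) @ A.T

# Case 2: heteroskedastic errors with hypothetical variances on the diagonal.
Omega = np.diag([0.5, 1.0, 2.0, 4.0])
V_hetero = A @ Omega @ A.T

print(V_spherical)
print(V_hetero)
```

With Ω = σ²I the sandwich collapses to σ²(X′X)⁻¹, which is why the simpler formula is valid only under homoskedastic, uncorrelated errors.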
OLS Consistency:
plim β̂ = β
Review: OLS Consistency
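\begin{align*}
\operatorname{plim} \hat\beta
&= \operatorname{plim}\{\beta + (X'X)^{-1}X'u\} \\
&= \operatorname{plim}\beta + \operatorname{plim}\Big\{\Big(\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \frac{1}{N}\sum_i x_i u_i\Big\} \\
&\qquad \text{(N terms in each summation; divide them by N to use a LLN)} \\
&= \operatorname{plim}\beta + \Big(\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \times \operatorname{plim}\frac{1}{N}\sum_i x_i u_i \\
&= \beta + \Big(\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \times 0 \\
&= \beta
\end{align*}
$\operatorname{plim}\{A_N \times b_N\} = \operatorname{plim} A_N \times \operatorname{plim} b_N$ if the plims are constants.

The plims exist by laws of large numbers (the terms are averages).
For $\operatorname{plim}\frac{1}{N}\sum_i x_i u_i = 0$ the key assumption is $E[u_i|x_i] = 0$.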
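The role of the key assumption can be spelled out in one short derivation. This is a sketch that assumes a law of large numbers applies to the average (for example, under i.i.d. sampling with finite moments):

```latex
\begin{align*}
E[x_i u_i] &= E\big[\,E[x_i u_i \mid x_i]\,\big] && \text{(law of iterated expectations)} \\
           &= E\big[\,x_i\,E[u_i \mid x_i]\,\big] \\
           &= E[x_i \cdot 0] = 0, \\
\text{so}\quad \operatorname{plim}\frac{1}{N}\sum_i x_i u_i &= E[x_i u_i] = 0 .
\end{align*}
```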
Topics Today
Distribution of OLS Estimator

Generalized Least Squares (GLS)
Test of Linear Hypothesis (Wald tests)
Simulations: OLS Consistency and Asymptotic Normality
Stata commands
Appendix: OLS in matrix notation example
OLS Limit Distribution
$\hat\beta$ has a limit distribution with all mass at $\beta$ (since $\hat\beta \overset{p}{\to} \beta$).
To get a nondegenerate distribution, inflate $\hat\beta - \beta$ by $\sqrt{N}$.
This leads to a random variable that has nonzero yet finite variance asymptotically.
Then instead of
\[ \hat\beta_{OLS} - \beta = (X'X)^{-1}X'u, \]
we focus on
\[ \sqrt{N}(\hat\beta_{OLS} - \beta) = (N^{-1}X'X)^{-1} N^{-1/2} X'u. \]
Then the limit normal distribution follows from
\[
\sqrt{N}(\hat\beta - \beta) = \Big(\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \frac{1}{\sqrt{N}}\sum_i x_i u_i .
\]
By LLN: $\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'$ exists and we assume it is finite and invertible.
By CLT: $\frac{1}{\sqrt{N}}\sum_i x_i u_i \overset{d}{\to} N[0, B]$ for some B.
\begin{align*}
\Rightarrow \sqrt{N}(\hat\beta - \beta)
&\overset{d}{\to} \Big(\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \times N[0, B] \\
&\overset{d}{\to} N\Big[0,\ \Big(\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \times B \times \Big(\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'\Big)^{-1}\Big]
\end{align*}
(If $H_N \overset{p}{\to} H$ and $b_N \overset{d}{\to} N[\mu, \Omega]$ then $H_N b_N \overset{d}{\to} N[H\mu, H\Omega H']$.)
Question: What is B?
(Recall that $\frac{1}{\sqrt{N}}\sum_i x_i u_i \overset{d}{\to} N[0, B]$ for some B.)
B is the variance-covariance matrix of $\frac{1}{\sqrt{N}}\sum_i x_i u_i$ in the limit:
\[
B = \operatorname{plim}\Big(\frac{1}{\sqrt{N}}\sum_i x_i u_i\Big)\Big(\frac{1}{\sqrt{N}}\sum_j x_j u_j\Big)'
= \operatorname{plim}\frac{1}{N}\sum_i\sum_j u_i u_j x_i x_j' .
\]
OLS Asymptotic Distribution

Rescale from $\sqrt{N}(\hat\beta - \beta)$ to $\hat\beta$ for "friendlier" looking results. Recall that
\[
\sqrt{N}(\hat\beta - \beta) \overset{d}{\to} N\Big[0,\ \Big(\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \times B \times \Big(\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'\Big)^{-1}\Big].
\]
The so-called "asymptotic distribution" is
\[
\hat\beta \overset{a}{\sim} N\Big[\beta,\ \Big(\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'\Big)^{-1} \times \frac{B}{N} \times \Big(\operatorname{plim}\frac{1}{N}\sum_i x_i x_i'\Big)^{-1}\Big].
\]
To estimate the variance-covariance matrix, we drop the plims and replace B by a consistent estimate $\hat B$:
\[
\hat\beta \overset{a}{\sim} N\Big[\beta,\ \Big(\sum_i x_i x_i'\Big)^{-1} \times N\hat B \times \Big(\sum_i x_i x_i'\Big)^{-1}\Big].
\]
VCE (Variance-Covariance Matrix of the Estimator)

Default Estimate of VCE:
Independent homoskedastic errors: $V[u_i|x_i] = \sigma^2$
Then
\[
B = \operatorname{plim}\frac{1}{N}\sum_i\sum_j u_i u_j x_i x_j'
= \operatorname{plim}\frac{1}{N}\sum_i u_i^2 x_i x_i'
= \sigma^2 \operatorname{plim}\frac{1}{N}\sum_i x_i x_i' .
\]
We can use
\[
\hat B = s^2 \Big(\frac{1}{N}\sum_i x_i x_i'\Big), \quad \text{where } s^2 = \frac{1}{N-K}\sum_i \hat u_i^2 .
\]
Then
\[
\hat V_{Default}[\hat\beta] = \Big(\sum_i x_i x_i'\Big)^{-1} \times N\hat B \times \Big(\sum_i x_i x_i'\Big)^{-1}
= s^2 \Big(\sum_i x_i x_i'\Big)^{-1} .
\]
Robust Estimate of VCE:

Most often used: only requires data to be independent over i.
Independent heteroskedastic errors: $V[u_i|x_i] = \sigma_i^2$
In Stata this is option vce(robust)
Then $B = \operatorname{plim}\frac{1}{N}\sum_i\sum_j u_i u_j x_i x_j' = \operatorname{plim}\frac{1}{N}\sum_i u_i^2 x_i x_i'$.
White (1980) showed that we can use
\[
\hat B = \frac{1}{N}\sum_i \hat u_i^2 x_i x_i', \quad \text{where } \hat u_i = y_i - x_i'\hat\beta .
\]
This yields the heteroskedasticity-consistent estimate of the
variance-covariance matrix of the OLS estimator (VCE):
\[
\hat V_{robust}[\hat\beta] = \Big(\sum_i x_i x_i'\Big)^{-1} \sum_i \hat u_i^2 x_i x_i' \Big(\sum_i x_i x_i'\Big)^{-1} .
\]
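The robust sandwich formula can be sketched in a few lines. This is an illustration in Python with NumPy, not the Stata implementation; the function name `ols_robust_vce` and the simulated heteroskedastic d.g.p. are assumptions. It applies no finite-sample scaling, whereas Stata's vce(robust) is understood to multiply by an additional N/(N − K) factor, so reported values can differ slightly in small samples.

```python
import numpy as np

def ols_robust_vce(X, y):
    """OLS coefficients and White's heteroskedasticity-consistent VCE (unscaled)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    # Middle term: sum_i uhat_i^2 x_i x_i' = X' diag(uhat^2) X.
    meat = (X * resid[:, None] ** 2).T @ X
    return beta_hat, XtX_inv @ meat @ XtX_inv

# Hypothetical d.g.p. where the error standard deviation is proportional to x.
rng = np.random.default_rng(1)
N = 500
x = rng.uniform(1.0, 5.0, size=N)
X = np.column_stack([np.ones(N), x])
u = rng.normal(size=N) * x
y = 1.0 + 2.0 * x + u

beta_hat, V_robust = ols_robust_vce(X, y)
se_robust = np.sqrt(np.diag(V_robust))
print(beta_hat, se_robust)
```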
Robust Estimate of VCE: example

Example: N = 4 with (x, y) equal to (1, 1), (2, 3), (2, 4), and (3, 4).
Then y is 4 × 1 and X is 4 × 2 with
\[
y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 4 \\ 4 \end{pmatrix}; \quad
X = \begin{pmatrix} x_1' \\ x_2' \\ x_3' \\ x_4' \end{pmatrix}
= \begin{pmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \\ x_{14} & x_{24} \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}.
\]
So
\[
\hat\beta_{OLS} = (X'X)^{-1}X'y = \begin{pmatrix} 4 & 8 \\ 8 & 18 \end{pmatrix}^{-1} \begin{pmatrix} 12 \\ 27 \end{pmatrix} = \begin{pmatrix} 0 \\ 1.5 \end{pmatrix}.
\]
Then $\hat u = y - X\hat\beta = \begin{pmatrix} -0.5 & 0 & 1 & -0.5 \end{pmatrix}'$ and
\begin{align*}
\hat V_{robust}[\hat\beta]
&= \Big(\sum_i x_i x_i'\Big)^{-1} \sum_i \hat u_i^2 x_i x_i' \Big(\sum_i x_i x_i'\Big)^{-1} \\
&= \begin{pmatrix} 4 & 8 \\ 8 & 18 \end{pmatrix}^{-1} \sum_i \hat u_i^2 x_i x_i' \begin{pmatrix} 4 & 8 \\ 8 & 18 \end{pmatrix}^{-1}
\end{align*}
\[
\sum_i \hat u_i^2 x_i x_i'
= (-0.5)^2 \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}
+ 0^2 \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}
+ 1^2 \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}
+ (-0.5)^2 \begin{pmatrix} 1 & 3 \\ 3 & 9 \end{pmatrix}
= \frac{1}{4}\begin{pmatrix} 6 & 12 \\ 12 & 26 \end{pmatrix}
\]
\[
\Rightarrow \hat V_{robust}[\hat\beta] = \frac{1}{32}\begin{pmatrix} 19 & -8 \\ -8 & 4 \end{pmatrix}
\]
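A quick numerical check of this example is easy to run. The sketch below, in Python with NumPy, recomputes each quantity from the four stated data points so the hand calculation can be compared line by line; the variable names are illustrative.

```python
import numpy as np

# The four observations (x, y) = (1, 1), (2, 3), (2, 4), (3, 4), with an intercept column.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 4.0, 4.0])

XtX = X.T @ X
Xty = X.T @ y
XtX_inv = np.linalg.inv(XtX)

beta_hat = XtX_inv @ Xty                   # OLS coefficients
u_hat = y - X @ beta_hat                   # residuals
meat = (X * u_hat[:, None] ** 2).T @ X     # sum_i uhat_i^2 x_i x_i'
V_robust = XtX_inv @ meat @ XtX_inv        # unscaled White estimate

print("X'X =", XtX)
print("X'y =", Xty)
print("beta_hat =", beta_hat)
print("u_hat =", u_hat)
print("sum uhat^2 x x' =", meat)
print("32 * V_robust =", 32 * V_robust)
```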
GLS: Generalized Least Squares
Generalized least squares (GLS) Overview
OLS is efficient (best linear unbiased estimator) if errors are i.i.d. so that
$V[u|X] = \sigma^2 I$.
In practice errors are rarely i.i.d.
So we usually do OLS and obtain a robust VCE that permits $V[u|X] \neq \sigma^2 I$.
The more efficient feasible GLS (FGLS) assumes a model for $V[u|X]$:
it yields more precise estimates (smaller standard errors and bigger t-statistics),
but we then obtain a robust VCE that allows for a misspecified model for $V[u|X]$.
Generalized least squares (GLS)
Suppose $V[u|X] = \Omega$ where $\Omega$ is known,
and $y = X\beta + u$, $E[u|X] = 0$ as before.
The generalized least squares estimator is efficient:
\[ \hat\beta_{GLS} = (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}y . \]
Main idea: transform the original linear model with heteroskedastic errors into
a standard linear model with homoskedastic errors.
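Derivation:
Premultiply $y = X\beta + u$ by $\Omega^{-1/2}$ so
\[ \Omega^{-1/2}y = \Omega^{-1/2}X\beta + \Omega^{-1/2}u . \]
This model has homoskedastic, uncorrelated errors since
\[
V[\Omega^{-1/2}u|X] = E[(\Omega^{-1/2}u)(\Omega^{-1/2}u)'|X] = \Omega^{-1/2}\,\Omega\,\Omega^{-1/2} = I_N .
\]
Then GLS is OLS in this transformed model:
\begin{align*}
\hat\beta_{GLS} &= [(\Omega^{-1/2}X)'(\Omega^{-1/2}X)]^{-1} (\Omega^{-1/2}X)'(\Omega^{-1/2}y) \\
&= (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}y .
\end{align*}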
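The equivalence between the GLS formula and OLS on the transformed model can be checked numerically. The sketch below uses Python with NumPy and assumes a diagonal Ω with hypothetical known variances $V[u_i|x_i] = x_i^2$, so that premultiplying by $\Omega^{-1/2}$ simply divides row i by $\sqrt{\omega_{ii}}$; a non-diagonal Ω would need a matrix square root instead.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50
x = rng.uniform(1.0, 5.0, size=N)
X = np.column_stack([np.ones(N), x])

# Hypothetical known error variances (assumption for this example only).
omega_diag = x ** 2
Omega = np.diag(omega_diag)
y = 1.0 + 2.0 * x + rng.normal(size=N) * np.sqrt(omega_diag)

# Direct GLS formula: (X' Omega^{-1} X)^{-1} X' Omega^{-1} y.
Omega_inv = np.linalg.inv(Omega)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# OLS on the transformed model Omega^{-1/2} y = Omega^{-1/2} X beta + Omega^{-1/2} u.
w = 1.0 / np.sqrt(omega_diag)
X_star = X * w[:, None]
y_star = y * w
beta_transformed = np.linalg.lstsq(X_star, y_star, rcond=None)[0]

# Variance of the GLS estimator when Omega is known: (X' Omega^{-1} X)^{-1}.
V_gls = np.linalg.inv(X.T @ Omega_inv @ X)
print(beta_gls, beta_transformed)
```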
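Generalized least squares (GLS)
The variance-covariance matrix of $\hat\beta_{GLS}$ is
\[
V[\hat\beta_{GLS}] = \sigma^2 \big((\Omega^{-1/2}X)'(\Omega^{-1/2}X)\big)^{-1} = (X'\Omega^{-1}X)^{-1},
\]
where $\sigma^2 = 1$ because the transformed errors have variance $I_N$.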
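The same variance can be obtained directly from the estimator's sampling error, without passing through the transformed model. A short derivation, conditioning on X and using $V[u|X] = \Omega$:

```latex
\begin{align*}
\hat\beta_{GLS} - \beta &= (X'\Omega^{-1}X)^{-1} X'\Omega^{-1} u \\
V[\hat\beta_{GLS} \mid X]
  &= (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}\, V[u \mid X]\, \Omega^{-1} X (X'\Omega^{-1}X)^{-1} \\
  &= (X'\Omega^{-1}X)^{-1} X'\Omega^{-1}\, \Omega\, \Omega^{-1} X (X'\Omega^{-1}X)^{-1} \\
  &= (X'\Omega^{-1}X)^{-1} (X'\Omega^{-1}X) (X'\Omega^{-1}X)^{-1} \\
  &= (X'\Omega^{-1}X)^{-1}
\end{align*}
```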
Feasible generalized least squares (FGLS)
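The GLS estimator cannot be directly implemented because in practice $\Omega$ is
not known.
To implement GLS we need a consistent estimate of $\Omega$.
Assume a model for $\Omega = \Omega(\gamma)$, estimate $\hat\gamma \overset{p}{\to} \gamma$,
and form $\hat\Omega = \Omega(\hat\gamma) \overset{p}{\to} \Omega$.
The feasible GLS estimator (FGLS) is
\[ \hat\beta_{FGLS} = (X'\hat\Omega^{-1}X)^{-1} X'\hat\Omega^{-1}y , \]
and then
\[ \hat\beta_{FGLS} \overset{a}{\sim} N[\beta,\ (X'\hat\Omega^{-1}X)^{-1}] . \]
Example: $V[u_i|x_i] = E[u_i^2|x_i] = \exp(z_i'\gamma)$
We will discuss this in detail in Week 4.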
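Ahead of that discussion, one common way to make the exponential-variance example operational is a two-step procedure: run OLS, regress the log squared residuals on $z_i$ to estimate γ, then run weighted least squares. The sketch below, in Python with NumPy, is only one possible approach and is not taken from these notes; the function name `fgls_exponential`, the choice $z_i = (1, x_i)$, and the simulated d.g.p. are all assumptions.

```python
import numpy as np

def fgls_exponential(X, y, Z):
    """Two-step FGLS sketch assuming V[u_i|x_i] = exp(z_i' gamma)."""
    N, K = X.shape
    # Step 1: OLS residuals.
    beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    u_hat = y - X @ beta_ols
    # Step 2: regress log(uhat^2) on z. The intercept is shifted by a constant,
    # so Omega is only estimated up to a scale factor.
    gamma_hat = np.linalg.lstsq(Z, np.log(u_hat ** 2), rcond=None)[0]
    omega_hat = np.exp(Z @ gamma_hat)
    # Step 3: weighted least squares, i.e. OLS after dividing row i by sqrt(omega_hat_i).
    w = 1.0 / np.sqrt(omega_hat)
    Xs, ys = X * w[:, None], y * w
    beta_fgls = np.linalg.lstsq(Xs, ys, rcond=None)[0]
    # s2 absorbs the unknown scale of omega_hat in the variance estimate.
    e = ys - Xs @ beta_fgls
    s2 = e @ e / (N - K)
    V_fgls = s2 * np.linalg.inv(Xs.T @ Xs)
    return beta_fgls, V_fgls

# Hypothetical d.g.p.: y = 1 + 2x + u with V[u|x] = exp(0.2 + 1.0 x).
rng = np.random.default_rng(3)
N = 2000
x = rng.uniform(0.0, 2.0, size=N)
X = np.column_stack([np.ones(N), x])
Z = X                                       # assumption: z_i = (1, x_i)
u = rng.normal(size=N) * np.exp(0.5 * (0.2 + 1.0 * x))
y = 1.0 + 2.0 * x + u

beta_fgls, V_fgls = fgls_exponential(X, y, Z)
print(beta_fgls, np.sqrt(np.diag(V_fgls)))
```

The scale factor s² matters only for the variance estimate; the coefficient estimates are unchanged when all weights are multiplied by a constant.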
Simulations: OLS Consistency and Asymptotic Normality
Stata Simulation:
D.g.p.: $y_i = \beta_1 + \beta_2 x_i + u_i$ where $x_i \sim \chi^2(1)$ and $\beta_1 = 1$, $\beta_2 = 2$.
Error: $u_i \sim \chi^2(1) - 1$ is skewed with mean 0 and variance 2.
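The Stata code is not reproduced in these notes. As a stand-in, the following is a minimal Python/NumPy sketch of the same d.g.p.; the sample sizes, number of replications, seed, and the assumption that $u_i$ is drawn independently of $x_i$ are choices made for this illustration. Under that independence assumption, $\sqrt{N}(\hat\beta_2 - \beta_2)$ has limit variance $\sigma^2 / V[x_i] = 2/2 = 1$, so the standard deviation of the slope should be close to $1/\sqrt{N}$ in large samples.

```python
import numpy as np

def simulate_slopes(N, reps, rng):
    """Draw `reps` samples of size N from the d.g.p. and return the OLS slope estimates."""
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.chisquare(1, size=N)
        u = rng.chisquare(1, size=N) - 1.0     # skewed error, mean 0, variance 2
        y = 1.0 + 2.0 * x + u                  # beta_1 = 1, beta_2 = 2
        X = np.column_stack([np.ones(N), x])
        slopes[r] = np.linalg.lstsq(X, y, rcond=None)[0][1]
    return slopes

rng = np.random.default_rng(10101)
small = simulate_slopes(N=30, reps=1000, rng=rng)
large = simulate_slopes(N=3000, reps=1000, rng=rng)

# Consistency: the estimates concentrate around beta_2 = 2 as N grows.
for name, s in [("N=30", small), ("N=3000", large)]:
    print(f"{name}: mean = {s.mean():.3f}, sd = {s.std(ddof=1):.3f}")

# Asymptotic normality: the standardized slopes should look roughly N(0, 1) for large N.
z_large = np.sqrt(3000) * (large - 2.0)
print(f"sd of sqrt(N)(b2 - 2) at N=3000: {z_large.std(ddof=1):.3f}")
```

A histogram of `z_large` against the standard normal density would be the natural visual check, even though the errors themselves are skewed.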
