ECON F342 AE CH 12


ECON F342 : Applied Econometrics

N V M RAO

1 / 23
Reading from the Text Book

For today


Serial correlation and heteroskedasticity in time series
regressions.
Chapter 12 (pp. 376 – 404).

2 / 23
Properties of OLS with Serially Correlated
Errors: Unbiasedness and Consistency

OLS is unbiased under the first 3 Gauss-Markov
assumptions for time series regression.
But these assumptions say nothing about the serial correlation
that is often present in economic data.
As long as the explanatory variables are strictly exogenous, the
β̂OLS are unbiased, regardless of the degree of serial
correlation in the errors.
Last lecture, we also relaxed strict exogeneity to
E(ut |xt ) = 0 and, by assuming weak dependence of the
data, showed that the β̂OLS are still consistent
(although not necessarily unbiased).
But what about the assumption on serial correlation?

3 / 23
Properties of OLS with Serially Correlated
Errors: Efficiency and Inference

The Gauss-Markov theorem requires both homoskedasticity and
serially uncorrelated errors.
Thus, OLS is no longer BLUE in the presence of serial
correlation...
...and standard errors and test statistics are not valid.
Let's assume a model with AR(1) errors:

yt = β0 + β1 xt + ut,
ut = ρut−1 + εt,

for t = 1, 2, . . . , n, where |ρ| < 1 and the εt are uncorrelated
random variables with zero mean and variance σε².

4 / 23
Properties of OLS with Serially Correlated
Errors: Efficiency and Inference cont.

The OLS estimator is then:

$$\hat{\beta}_1 = \beta_1 + SST_x^{-1}\sum_{t=1}^{n} x_t u_t,$$

where $SST_x = \sum_{t=1}^{n} x_t^2$.

5 / 23
Properties of OLS with Serially Correlated
Errors: Efficiency and Inference cont.
Variance of β̂1 conditional on X is:
$$\mathrm{Var}(\hat{\beta}_1) = SST_x^{-2}\,\mathrm{Var}\!\left(\sum_{t=1}^{n} x_t u_t\right)
= SST_x^{-2}\left[\sum_{t=1}^{n} x_t^2\,\mathrm{Var}(u_t) + 2\sum_{t=1}^{n-1}\sum_{j=1}^{n-t} x_t x_{t+j}\,E(u_t u_{t+j})\right]$$

$$= \underbrace{\sigma^2/SST_x}_{\text{variance of }\hat{\beta}_1} + \underbrace{2(\sigma^2/SST_x^2)\sum_{t=1}^{n-1}\sum_{j=1}^{n-t}\rho^j x_t x_{t+j}}_{\text{bias}},$$

where σ² = Var(ut) and we used the fact from last lecture that
E(ut ut+j) = Cov(ut, ut+j) = ρ^j σ².
If we ignore the serial correlation and estimate the variance in
the usual way, the variance estimator will be biased (as ρ ≠ 0).

6 / 23
Properties of OLS with Serially Correlated
Errors: Efficiency and Inference cont.

Consequences:
In most economic applications, ρ > 0, and the usual OLS
variance formula underestimates the true variance of the OLS
estimator.
We tend to think that the OLS slope estimator is more precise
than it actually is.
The main consequence is that standard errors are invalid ⇒ t
statistics for testing single hypotheses are invalid ⇒
statistical inference is invalid.

7 / 23
Testing for AR(1) Serial Correlation

We need to be able to test for serial correlation in the error
terms of the multiple linear regression model:

yt = β0 + β1 xt1 + . . . + βk xtk + ut,

with ut = ρut−1 + εt, t = 1, 2, . . . , n.
The null hypothesis is that there is no serial correlation:
H0 : ρ = 0
With strictly exogenous regressors, the test is very
straightforward: simply regress the OLS residuals ût on the
lagged residuals ût−1.
The t statistic on the ρ̂ coefficient can be used to test
H0 : ρ = 0 against HA : ρ ≠ 0 (or sometimes even HA : ρ > 0).
A short code sketch follows this slide.

8 / 23
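A minimal sketch of this residual-based test in Python with statsmodels, on simulated data (the variable names and the simulated model are illustrative, not from the lecture):

```python
import numpy as np
import statsmodels.api as sm

# Simulate a regression with AR(1) errors (true rho = 0.5).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.5 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

# Step 1: OLS, save the residuals.
res = sm.OLS(y, sm.add_constant(x)).fit()
uhat = res.resid

# Step 2: regress residuals on their first lag; the t statistic on the
# lag coefficient tests H0: rho = 0.
aux = sm.OLS(uhat[1:], sm.add_constant(uhat[:-1])).fit()
print("rho-hat:", aux.params[1], "t statistic:", aux.tvalues[1])
```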
Testing for AR(1) Serial Correlation cont.

An alternative is the Durbin-Watson (DW) statistic:

$$DW = \frac{\sum_{t=2}^{n}(\hat{u}_t - \hat{u}_{t-1})^2}{\sum_{t=1}^{n}\hat{u}_t^2}.$$

DW ≈ 2(1 − ρ̂).
ρ̂ ≈ 0 ⇒ DW ≈ 2.
ρ̂ > 0 ⇒ DW < 2.
The DW test is a little problematic: we have 2 sets of critical
values, dL (lower) and dU (upper):
DW < dL ⇒ reject H0 : ρ = 0 in favor of HA : ρ > 0.
DW > dU ⇒ fail to reject H0 : ρ = 0.
dL ≤ DW ≤ dU ⇒ the test is inconclusive.

9 / 23
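The DW statistic is a one-liner in statsmodels; a sketch reusing uhat from the example above:

```python
from statsmodels.stats.stattools import durbin_watson

# DW near 2 suggests little evidence of AR(1) serial correlation;
# DW well below 2 suggests rho > 0. Compare against tabulated dL/dU.
print("DW:", durbin_watson(uhat))
```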
Testing for AR(1) Serial Correlation cont.

In case we do not have strictly exogenous regressors (one or
more xtj is correlated with ut−1), neither the t test nor the DW
test works.
In this case, we can regress ût on xt1, xt2, . . . , xtk, ût−1 for all
t = 2, . . . , n.
The t statistic on the ρ̂ coefficient of ût−1 can be used to test
the null of no serial correlation.
The inclusion of xt1, xt2, . . . , xtk explicitly allows each xtj to
be correlated with ut−1 ⇒ no need for strict exogeneity.

10 / 23
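A sketch of this augmented regression, reusing the simulated data above (with the single regressor x):

```python
# Regress residuals on the regressors AND the lagged residual, so each
# regressor may be correlated with u_{t-1}.
Z = np.column_stack([np.ones(n - 1), x[1:], uhat[:-1]])
aux2 = sm.OLS(uhat[1:], Z).fit()
print("t statistic on lagged residual:", aux2.tvalues[-1])
```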
Testing for Higher Order Serial Correlation

We can easily extend the test to second order (AR(2))
serial correlation.
In the model ut = ρ1 ut−1 + ρ2 ut−2 + εt, we test
H0 : ρ1 = 0, ρ2 = 0.
We regress ût on xt1, xt2, . . . , xtk, ût−1, ût−2 for all
t = 3, . . . , n,
...and obtain an F test for the joint significance of ût−1 and
ût−2. If they are jointly significant, we reject the null ⇒ the
errors are serially correlated of order two (see the sketch
after this slide).

11 / 23
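A sketch of the AR(2) F test on the simulated data above; the restriction matrix selects the two residual-lag coefficients (columns 2 and 3 of Z2):

```python
# Regress uhat_t on const, x_t, uhat_{t-1}, uhat_{t-2} for t = 3..n.
Z2 = np.column_stack([np.ones(n - 2), x[2:], uhat[1:-1], uhat[:-2]])
aux_ar2 = sm.OLS(uhat[2:], Z2).fit()

# F test of H0: both lag coefficients are zero.
R = np.zeros((2, 4))
R[0, 2] = 1.0  # coefficient on uhat_{t-1}
R[1, 3] = 1.0  # coefficient on uhat_{t-2}
print(aux_ar2.f_test(R))
```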
Testing for Higher Order Serial Correlation cont.

We can include q lags to test for higher order serial correlation:
Regress ût on xt1, xt2, . . . , xtk, ût−1, ût−2, . . . , ût−q for all
t = (q + 1), . . . , n.
Use an F test for the joint significance of ût−1, ût−2, . . . , ût−q.
Or use the LM version of the test – the Breusch-Godfrey test:

LM = (n − q)R²û,

where R²û is the usual R² from the regression above.
Under the null hypothesis, LM ∼ χ²q asymptotically.

12 / 23
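statsmodels ships this LM test as acorr_breusch_godfrey; a sketch applied to the OLS results object from the first example:

```python
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

# Breusch-Godfrey test with q = 4 lags; returns the LM statistic and its
# p-value, plus the F-test version.
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=4)
print("LM:", lm_stat, "p-value:", lm_pval)
```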
Correcting for Serial Correlation

When serial correlation is detected, we need to treat it.
We know that OLS may be inefficient.
So how do we obtain a BLUE estimator in the AR(1) setting?
We keep the first 4 Gauss-Markov assumptions, but we relax
Assumption 5 (no serial correlation) and assume the errors
follow an AR(1): ut = ρut−1 + εt, t = 1, 2, . . .
Then Var(ut) = σε²/(1 − ρ²).
We need to transform the regression equation so that we have
no serial correlation in the errors.

13 / 23
Correcting for Serial Correlation cont.
Consider the following regression:

yt = β0 + β1 xt + ut,
ut = ρut−1 + εt

For t ≥ 2, we can write:

yt−1 = β0 + β1 xt−1 + ut−1,
yt = β0 + β1 xt + ut

Multiplying the first equation by ρ and subtracting it from the
second, we get:

ỹt = (1 − ρ)β0 + β1 x̃t + εt,

where ỹt = yt − ρyt−1 and x̃t = xt − ρxt−1.

This is called quasi-differencing. BUT we never know
the value of ρ.
14 / 23
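A sketch of quasi-differencing by hand, pretending ρ were known (here set to the true value 0.5 from the simulation above):

```python
# Quasi-difference y and x, and transform the intercept regressor to (1 - rho).
rho = 0.5
y_tilde = y[1:] - rho * y[:-1]
x_tilde = x[1:] - rho * x[:-1]
const_tilde = np.full(n - 1, 1.0 - rho)

# OLS on the transformed equation recovers beta0 and beta1 with
# serially uncorrelated errors.
gls = sm.OLS(y_tilde, np.column_stack([const_tilde, x_tilde])).fit()
print("beta0, beta1:", gls.params)
```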
Feasible GLS Estimation with AR(1) Errors
The problem with this GLS estimator is that we never
know the value of ρ.
But we already know how to obtain an estimator of ρ:
simply regress the OLS residuals on their lagged values
and get ρ̂.
Feasible GLS (FGLS) Estimation with AR(1) Errors:
1. Run the OLS regression of yt on xt1, . . . , xtk and obtain the
residuals ût, t = 1, 2, . . . , n.
2. Run the regression of ût on ût−1 to obtain the estimate ρ̂.
3. Run the OLS regression

ỹt = β0 x̃t0 + β1 x̃t1 + . . . + βk x̃tk + errort,

where x̃t0 = (1 − ρ̂) and x̃tj = xtj − ρ̂xt−1,j for t ≥ 2, and
x̃t0 = (1 − ρ̂²)^(1/2) and x̃1j = (1 − ρ̂²)^(1/2) x1j for t = 1
(ỹt is transformed analogously).

15 / 23
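In statsmodels, GLSAR implements this FGLS idea, iterating between estimating ρ from the residuals and re-running the transformed regression (Cochrane-Orcutt-style); a sketch on the simulated data:

```python
# GLSAR with one autoregressive lag (rho=1 means AR order 1, not its value).
X1 = sm.add_constant(x)
fgls = sm.GLSAR(y, X1, rho=1)
fgls_res = fgls.iterative_fit(maxiter=8)
print("rho-hat:", fgls.rho, "coefficients:", fgls_res.params)
```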
Feasible GLS Estimation with AR(1) Errors
GLS is BLUE under Assumptions 1 – 5, and we can use the t
and F tests from the transformed equation for inference.
These tests are asymptotically valid if Assumptions 1 – 5 hold
in the transformed model (along with stationarity and weak
dependence in the original variables).
Distributions conditional on X are exact (with minimum
variance) if Assumption 6 holds for εt.
This FGLS estimator is called the Prais-Winsten estimator.
If we just omit the first equation (t = 1), it is called the
Cochrane-Orcutt estimator.
FGLS estimators are not unbiased, but they are consistent.
Asymptotically, both procedures are the same, and FGLS is
more efficient than OLS.
This method can be extended to higher order serial
correlation, AR(q), in the error term.
16 / 23
Serial Correlation-Robust Standard Errors

Problem: If the regressors are not strictly exogenous,
FGLS is no longer consistent.
If strict exogeneity does not hold, it is possible to calculate
serial correlation (and heteroskedasticity) robust standard
errors for the OLS estimates. We know that OLS will be
inefficient.
The idea is to scale the OLS standard errors to take the serial
correlation into account.

17 / 23
Serial Correlation-Robust Standard Errors cont.
Estimate the model with OLS to obtain the residuals ût, σ̂, and
the usual standard errors “se(β̂1)”, which are incorrect.
Run the auxiliary regression of xt1 on xt2, xt3, . . . , xtk (with a
constant) and get the residuals r̂t.
For a chosen integer g > 0 (typically the integer part of n^(1/4)):

$$\hat{\nu} = \sum_{t=1}^{n}\hat{a}_t^2 + 2\sum_{h=1}^{g}\left[1 - h/(g+1)\right]\left(\sum_{t=h+1}^{n}\hat{a}_t\hat{a}_{t-h}\right),$$

where ât = r̂t ût, t = 1, 2, . . . , n.

Serial Correlation-Robust Standard Error:

$$se(\hat{\beta}_1) = \left[\text{“}se(\hat{\beta}_1)\text{”}/\hat{\sigma}\right]^2\sqrt{\hat{\nu}}$$

Similarly for β̂j.
SC-robust standard errors can behave poorly in small samples
in the presence of large serial correlation.
18 / 23
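In practice this estimator corresponds to Newey-West (HAC) standard errors; a sketch reusing y, X1, and n from above, with g taken as the integer part of n^(1/4):

```python
# HAC (Newey-West) covariance; "maxlags" is the truncation lag g.
g = int(n ** 0.25)
hac_res = sm.OLS(y, X1).fit(cov_type="HAC", cov_kwds={"maxlags": g})
print("SC-robust standard errors:", hac_res.bse)
```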
Heteroskedasticity in Time Series Regressions
OLS estimators are unbiased (under Ass. 1-3) and
consistent (under Ass. 1A-3A).
OLS inference is invalid if Ass. 4 (homoskedasticity) fails.
Heteroskedasticity-robust statistics can easily be derived in
the same manner as for cross-sectional data (if Ass.
1A, 2A, 3A and 5A hold).
However, we know that in small samples these robust
standard errors may be large ⇒ we want to test for
heteroskedasticity.
We can use the same tests as in the cross-sectional case,
but we need to have no serial correlation in the errors.
Also, for the Breusch-Pagan test, where we specify
u²t = δ0 + δ1 xt1 + . . . + δk xtk + νt and test
H0 : δ1 = δ2 = . . . = δk = 0, we need νt to be
homoskedastic and serially uncorrelated.
If we find heteroskedasticity, we can use heteroskedasticity-
robust statistics.
19 / 23
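A sketch of the Breusch-Pagan test via statsmodels, reusing the OLS residuals and the design matrix X1 from above:

```python
from statsmodels.stats.diagnostic import het_breuschpagan

# Regresses squared residuals on the regressors; returns the LM statistic,
# its p-value, and the F-test version.
bp_lm, bp_lm_pval, bp_f, bp_f_pval = het_breuschpagan(res.resid, X1)
print("BP LM:", bp_lm, "p-value:", bp_lm_pval)
```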
Autoregressive Conditional Heteroskedasticity
Many times, we find a dynamic form of heteroskedasticity
in economic data.
We can have E[u²t |X] = Var(ut |X) = Var(ut) = σ², but
still:
E[u²t |X, ut−1, ut−2, . . .] = E[u²t |X, ut−1] = α0 + α1 u²t−1.
Thus u²t = α0 + α1 u²t−1 + νt, where
E[νt |X, ut−1, ut−2, . . .] = 0.
Engle (1982) suggested looking at the conditional variance of
ut given past errors - the autoregressive conditional
heteroskedasticity (ARCH) model.
So even when the errors are not correlated (Ass. 5 holds),
their squares can be correlated.
OLS is still BLUE with ARCH errors, and inference is valid if
Ass. 6 (normality) holds.
Even if normality does not hold, OLS inference is
asymptotically valid under Ass. 1A – 5A, and we can still have
ARCH effects.
20 / 23
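Engle's ARCH LM test (squared residuals regressed on their own lags) is available in statsmodels; a sketch with one lag, reusing the residuals from the first example:

```python
from statsmodels.stats.diagnostic import het_arch

# ARCH(1) LM test on the OLS residuals; returns the LM statistic, its
# p-value, and the F-test version.
arch_lm, arch_lm_pval, arch_f, arch_f_pval = het_arch(res.resid, nlags=1)
print("ARCH LM:", arch_lm, "p-value:", arch_lm_pval)
```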
Autoregressive Conditional Heteroskedasticity
cont.

So why do we need to care about ARCH errors?
Because we can obtain asymptotically more efficient
estimators than OLS.
Details will be provided in Master's-level courses, not at the
Bachelor's level.
ARCH models have become important in empirical finance, as
they capture time-varying volatility in stock markets.
Rob Engle received a Nobel Prize in 2003 for this work. An
example of stock market returns is on the next slide.

21 / 23
Autoregressive Conditional Heteroskedasticity
cont.
[Figure: top panel - prices of the DJI stock market index (2000-2011);
bottom panel - returns of the DJI stock market index (2000-2011).]

22 / 23
Thank you

23 / 23
