WESS Time Series Lectures

Alexander Karalis Isaac

July, 2015

Econometrics so far
We have studied the regression equation

$y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \dots, n$

where i indicates an individual in a sample of n observations.

We have been interested in
The effect of x on y: $dy/dx = \beta$
The predicted value of y, given x: $E[y_i \mid x_i] = \alpha + \beta x_i$
The fit of the model, e.g. the $R^2$ statistic
Can we do the same with a pair of time series?

$y_t = \alpha + \beta x_t + \varepsilon_t$

Not in general: only under special circumstances, and such a regression is only
ever part of the answer!

Two problems in time series (1)

With data $y_t$, $x_t$ the critical assumption $E[\varepsilon_t \mid x_t] = 0$ is difficult to
maintain. The y and x are often simultaneously determined in a system,
'the economy'.

$y_t = \alpha_y + \beta_y x_t + \varepsilon_{yt}$
$x_t = \alpha_x + \beta_x y_t + \varepsilon_{xt}$

Consider the regression

$y_t = a + b x_t + e_t$
$= a + b(\alpha_x + \beta_x y_t + \varepsilon_{xt}) + e_t$

The regression error $e_t$ is an estimate of the $y_t$ error $\varepsilon_{yt}$. It will be
correlated with the regressor $x_t$, as this regressor actually contains $y_t$, and
so contains $\varepsilon_{yt}$.

Two problems in time series (2)

Look at the estimates from a regression $y_t = \alpha + \beta x_t + \varepsilon_t$

param   estimate   t-value
a       7.9        (139)
b       2.9        (15.6)
R^2     0.75       -

Does this look like a good regression?

yt is U.S. output. xt is mean land sea temperature

Now what do you think about the regression?

This is not a regression about climate change!

It is called the spurious regression problem.
It is easy to do crap regressions with time series data!

In Part I our models will look like

$y_t = \alpha + \beta y_{t-1} + \varepsilon_t$ or $y_t = \alpha + \beta \varepsilon_{t-1} + \varepsilon_t$

so the explanatory variable is replaced by the previous value of the
dependent variable, or by previous errors.
Our primary interest will be prediction, $E[y_t \mid y_{t-1}] = \alpha + \beta y_{t-1}$.
In Part II we will learn how to estimate dynamic relationships

$y_t = \alpha_0 + \alpha_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t$

in a way that eliminates the two common problems with time series.

Time series data

Our sample data $\{y_t\}$, $\{x_t\}$ refer to observations on the same unit in
sequential time periods, t = 1, ..., T
Periods may be years, quarters, months, weeks or days, depending on
interest and data availability

years quarters months weeks days

finance finance finance
macro macro macro
growth growth

Time series data

There are three main types of time series data

Mean reverting ('stationary') series
Series with a trend ('trend stationary' series)
Series with permanent shocks ('integrated' series)

Part I of these notes deals with stationary series

Part II looks at testing for permanent shocks and dealing with integrated series

Mean reverting series

Trend stationary series

Integrated series

Know your data

The data file ‘macro vars.xls’ contains lots of U.S. data series
GDP is a good example

Plot U.S. GDP

What is the first thing to do to this data?

Take logs

The first thing we don’t like is the exponential shape.

Our regressions are linear regressions, so let's transform the data to make it
more linear

generate ln(GDP) and plot this

Look at your data. For data in levels, taking logs is often the first
step in applied work
Not if the data is already in % changes, or an interest rate!
Logarithms make percentage changes comparable by eye, which is
often more relevant.
Recall $g = (x_t - x_{t-1})/x_{t-1} \Rightarrow 1 + g = x_t/x_{t-1}$, so then
$\ln(x_t) - \ln(x_{t-1}) = \ln(1 + g) \approx g$ for small g.
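A quick numerical check of the log approximation may help here. The sketch below uses made-up levels (100 to 102, and 100 to 150) purely for illustration; the function names are hypothetical.

```python
import math


def growth_rate(x_prev, x_now):
    """Simple growth rate g = (x_t - x_{t-1}) / x_{t-1}."""
    return (x_now - x_prev) / x_prev


def log_difference(x_prev, x_now):
    """Log difference ln(x_t) - ln(x_{t-1}), which equals ln(1 + g)."""
    return math.log(x_now) - math.log(x_prev)


# Illustrative (made-up) levels: a 2% change and a 50% change.
for x_prev, x_now in [(100.0, 102.0), (100.0, 150.0)]:
    g = growth_rate(x_prev, x_now)
    d = log_difference(x_prev, x_now)
    print(f"g = {g:.4f}, log difference = {d:.4f}, gap = {g - d:.4f}")
```

The gap is negligible for the small change and clearly visible for the large one, which is why the approximation is stated for small g only.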

If the variable appears to be trending, as with $\ln GDP_t$, a safe thing to
do is take differences.

Generate the difference of ln(GDP) and plot this

What is the main difference compared to previous plots?

This now looks mean reverting. Later we will test formally whether a
series is mean reverting or integrated, but don’t forget it’s always
sensible to start by looking at your data.
$\Delta y_t = y_t - y_{t-1}$ defines the difference operator $\Delta$.

Know your data: prices

Consider the price data. Look at

log levels
difference of logs - inflation
the difference of inflation
Which would you be happy to consider mean reverting?

Part I: Modelling stationary time series

So far I have talked about mean reverting series, but we can be more precise.
We will model variables which are covariance stationary (stationary
for short):
The mean exists and does not depend on time, $E[y_t] = \mu$ for all t
(quick notation $\forall t$).
The variance exists and is independent of time, $\mathrm{var}(y_t) = \sigma_y^2$.
The autocovariance, $\mathrm{cov}(y_t, y_{t-k}) = \sigma_k^2$, is independent of time: it
depends only on k and not on t.

Modelling stationary time series: assumptions on errors

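Any model of a stationary series imposes two key assumptions on the errors

A1: $E[\varepsilon_t] = E[E[\varepsilon_t \mid y_{t-1}]] = 0$
A2: $E[\varepsilon_t \varepsilon_{t-s}] = 0 \;\forall s > 0$
There are also some technical assumptions
A3: $E[\varepsilon_t^2] = \mathrm{var}(\varepsilon_t) = \sigma_\varepsilon^2$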
A4: yt and yt−j become independent as j gets large
A5: Very large outliers are unlikely
These assumptions apply to the true model, and we have to replicate them
in our statistical model.

Discussion of assumptions

A1 This tells us that $\varepsilon_t$ is unpredictable given information about $y_{t-1}$,
available before period t begins. In the regression context, we require
that $\varepsilon_t$ is unpredictable given all our r.h.s. variables. We use
predetermined data $y_{t-1}, y_{t-2}, \varepsilon_{t-1}, \varepsilon_{t-2}, \dots$
A2 In cross-sections, this is a second order assumption determining the
standard error of b.
In time series it is a first order assumption, determining the
consistency of b. See exercise.
A3 Allows us to make calculations about variances, including confidence
intervals around parameter estimates and forecasts. It is implied by

Discussion of assumptions

A4 This is a technical requirement to derive the limiting behaviour of
estimators. It replaces the i.i.d. assumption used with cross-sectional data
A5 This says that our models are not suitable for certain types of very
wild randomness. You should worry about this if you do
high-frequency finance, but it’s generally not a problem with
macroeconomic data.
In applied work, spurious regressions and models with wrong/insufficient
dynamics tend to violate A2, so checking it is key. Also, check A2 if you
are evaluating someone else’s work!

Stationary time series: AR(1) model

Our first time series model for stationary data

$y_t = \alpha + \beta y_{t-1} + \varepsilon_t$ (1)

This replaces independent explanatory data with the past value of the
dependent variable.
It models the correlation between $y_t$ and its own past
Stationarity requires $|\beta| < 1$
Then the influence of past shocks dies away smoothly
Estimate the model by OLS

The AR(1) estimator

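The AR(1) regression is like a standard OLS regression

$b = \frac{\sum_{t=2}^{T} (y_t - \bar{y})(y_{t-1} - \bar{y})}{\sum_{t=2}^{T} (y_{t-1} - \bar{y})^2} \approx \frac{\mathrm{cov}(y_t, y_{t-1})}{\mathrm{var}(y_t)}$
$a = \bar{y} - b\bar{y} \;\Rightarrow\; \bar{y} = \frac{a}{1 - b}$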

The AR(1) estimator

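Variance of b follows standard OLS theory

$\mathrm{var}(b) = \hat\sigma_\varepsilon^2 \left[\sum_{t=2}^{T} (y_{t-1} - \bar{y})^2\right]^{-1} \approx \frac{\hat\sigma_\varepsilon^2}{T\,\widehat{\mathrm{var}}(y_t)}$
where $\hat\sigma_\varepsilon^2 = \frac{1}{T-1-k} \sum_t \hat\varepsilon_t^2$

Note we lose an extra DoF for every lag we include in the autoregression
Confidence testing as usual, given $|b| < 1$:
$\tau = (b - b_{H_0})/SE(b) \sim t_{\alpha/2, DoF}$
Expect low $R^2$ compared to cross sectional data.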
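The OLS formulas for the AR(1) can be tried out on simulated data. The following is a minimal sketch, not the course's own code: the helper names, the parameter values (alpha = 1, beta = 0.5) and the seed are all assumptions made for illustration. It uses separate sample means for $y_t$ and $y_{t-1}$, and divides the residual sum of squares by the number of observations minus the two estimated coefficients, which may differ slightly from the degrees-of-freedom convention above.

```python
import random


def simulate_ar1(alpha, beta, sigma, n, seed=0):
    """Simulate n observations of y_t = alpha + beta*y_{t-1} + eps_t."""
    rng = random.Random(seed)
    y = [alpha / (1.0 - beta)]  # start at the unconditional mean
    for _ in range(n - 1):
        y.append(alpha + beta * y[-1] + rng.gauss(0.0, sigma))
    return y


def fit_ar1(y):
    """OLS of y_t on a constant and y_{t-1}; returns (a, b, se_b)."""
    lagged, current = y[:-1], y[1:]
    n = len(current)
    lag_mean = sum(lagged) / n
    cur_mean = sum(current) / n
    sxx = sum((x - lag_mean) ** 2 for x in lagged)
    sxy = sum((x - lag_mean) * (z - cur_mean) for x, z in zip(lagged, current))
    b = sxy / sxx
    a = cur_mean - b * lag_mean
    ssr = sum((z - a - b * x) ** 2 for x, z in zip(lagged, current))
    sigma2_hat = ssr / (n - 2)  # assumption: DoF = observations minus 2 coefficients
    se_b = (sigma2_hat / sxx) ** 0.5
    return a, b, se_b


y = simulate_ar1(alpha=1.0, beta=0.5, sigma=1.0, n=5000, seed=1)
a, b, se_b = fit_ar1(y)
print(f"a = {a:.3f}, b = {b:.3f}, SE(b) = {se_b:.3f}, implied mean = {a / (1 - b):.3f}")
```

With a long simulated sample the estimate b should sit close to the true 0.5, and a/(1 - b) close to the unconditional mean of 2.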

Look at the series for ∆ln(GDP) and do an AR(1) estimation

param estimate tvalue

R2 -

Plot the residuals of the regression. Do you think they meet A1 - A3?
We will look at formal tests for these assumptions below.

General AR(p) model
One lag of $y_t$ may not be enough: an omitted variable bias
This shows up as $E[\varepsilon_t \varepsilon_{t-s}] \neq 0$
We find a model with enough lags to ensure $E[\varepsilon_t \varepsilon_{t-s}] = 0 \;\forall s$

$y_t = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \dots + \beta_p y_{t-p} + \varepsilon_t$

Post-estimation approximate F-test, Breusch-Godfrey test

$\hat\varepsilon_t = b_1 \hat\varepsilon_{t-1} + \dots + b_q \hat\varepsilon_{t-q} + \nu_t$
$H_0: b_1 = b_2 = \dots = b_q = 0$
$H_A: b_i \neq 0$ for some i
$\tau = nR^2 \sim \chi^2_q$

Inference on $\hat\beta_i$ as in standard multivariate OLS models

Alexander Karalis Isaac (Warwick) Time Series July, 2015 25 / 90

Model selection strategy

Should we begin small and add lags until A2 holds?

NO! Don't base your model selection algorithm on starting from
models that don't make any statistical sense
Start big and eliminate insignificant regressors, to find the smallest
model for which A2 still holds
Often start with p = f + 1 where f = number of observations per year.
Quarterly example:
Begin with p = 5
Re-estimate excluding the insignificant longer lags
Check $E[\varepsilon_t \varepsilon_{t-s}] = 0$, s = 1, ..., 4
Repeat until the model contains only significant terms
D. Hendry's 'PcGets' software automates this

Notes on examples
You do some examples: $\Delta GDP_t$, $\Delta Cons_t$, $\Delta Inv_t$, $\Delta Inf_t$

The MA(q) process

We noticed some series require very long AR models to capture all
the conditional correlation of the $y_t$ series.
This costs degrees of freedom, making estimates and forecasts less precise.
Is there a smaller model which could capture the dependency that AR
models struggle with?
This is the moving average process

The MA(1) process


$y_t = \alpha + \beta \varepsilon_{t-1} + \varepsilon_t$

Simple to analyse
Stationary for any β value
Harder to estimate: b determines $\{\hat\varepsilon_t\}_{t=1}^{T}$, but $\{\hat\varepsilon_t\}_{t=1}^{T}$ is the
regressor which determines b!
Solution: take an MLE approach (as in Probit)

MLE in the MA(1)

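$\varepsilon_t \mid y_{t-1} \sim N(0, \sigma_\varepsilon^2)$

$f(y_t \mid y_{t-1}) = \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}} \exp\left(-\frac{(y_t - \alpha - \beta\varepsilon_{t-1})^2}{2\sigma_\varepsilon^2}\right)$
$l(\alpha, \beta, \sigma_\varepsilon^2) = \sum_{t=1}^{T} \ln f(y_t \mid y_{t-1})$
$\max l(\cdot)$ w.r.t. $\alpha, \beta, \sigma_\varepsilon^2$

Technically this is also conditional on $\varepsilon_0$. A typical assumption is
$\varepsilon_0 = E[\varepsilon_t] = 0$, though there are other approaches.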
Inference follows standard maximum likelihood procedure
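To make the circularity concrete, here is a rough sketch of conditional maximum likelihood for the MA(1), assuming the initial error is zero. It is a deliberately crude illustration rather than a recommended estimator: alpha is fixed at the sample mean (a simplification), beta is found by a grid search instead of a numerical optimiser, and all function names and parameter values are hypothetical.

```python
import math
import random


def simulate_ma1(alpha, beta, sigma, n, seed=0):
    """Simulate y_t = alpha + beta*eps_{t-1} + eps_t with eps_0 = 0."""
    rng = random.Random(seed)
    eps_prev, y = 0.0, []
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        y.append(alpha + beta * eps_prev + eps)
        eps_prev = eps
    return y


def ma1_residuals(y, alpha, beta):
    """Recover eps_t recursively from eps_t = y_t - alpha - beta*eps_{t-1}, eps_0 = 0."""
    eps_prev, resid = 0.0, []
    for y_t in y:
        eps_prev = y_t - alpha - beta * eps_prev
        resid.append(eps_prev)
    return resid


def ma1_loglik(y, alpha, beta, sigma2):
    """Gaussian conditional log-likelihood of the MA(1)."""
    resid = ma1_residuals(y, alpha, beta)
    n = len(resid)
    return -0.5 * n * math.log(2.0 * math.pi * sigma2) - sum(e * e for e in resid) / (2.0 * sigma2)


def fit_ma1_grid(y, steps=200):
    """Grid search over beta in (-1, 1); sigma2 is set to its optimum for each beta."""
    alpha = sum(y) / len(y)  # simplification: alpha fixed at the sample mean
    best = None
    for j in range(1, steps):
        beta = -1.0 + 2.0 * j / steps
        resid = ma1_residuals(y, alpha, beta)
        sigma2 = sum(e * e for e in resid) / len(resid)
        ll = ma1_loglik(y, alpha, beta, sigma2)
        if best is None or ll > best[0]:
            best = (ll, beta, sigma2)
    return alpha, best[1], best[2]


y = simulate_ma1(alpha=0.0, beta=0.5, sigma=1.0, n=3000, seed=2)
alpha_hat, beta_hat, sigma2_hat = fit_ma1_grid(y)
print(f"alpha = {alpha_hat:.3f}, beta = {beta_hat:.2f}, sigma2 = {sigma2_hat:.3f}")
```

Each candidate beta implies its own residual series, and the likelihood then scores that series; this is how MLE sidesteps the problem that the regressor depends on the parameter.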

Information criteria

The MLE approach suggests another tool for tackling model selection
Minimise the expected information loss across potential models

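AIC: $-(2l(\hat\theta) - 2k)$: choose model with lowest AIC

BIC: $-(2l(\hat\theta) - k\ln(T))$: choose model with lowest BIC

Here $l(\hat\theta)$ is the maximised log-likelihood and k the number of estimated parameters.
BIC generally chooses smaller models, since its penalty $k\ln(T)$ exceeds $2k$ whenever $T \ge 8$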

Combine insights from significance tests and Info criteria to choose

parsimonious model. Always check A2 holds!
Information criteria are also relevant for AR models, which can be
placed within MLE theory
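The two criteria are simple to compute once a model's maximised log-likelihood is known. The sketch below uses made-up log-likelihood values for two candidate models, chosen only to show that the criteria can disagree; the function names are hypothetical.

```python
import math


def aic(loglik, k):
    """AIC = -2*l + 2*k, where k is the number of estimated parameters."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik, k, n_obs):
    """BIC = -2*l + k*ln(T), where T is the sample size."""
    return -2.0 * loglik + k * math.log(n_obs)


# Made-up values: (maximised log-likelihood, number of parameters), with T = 100.
candidates = {"AR(1)": (-150.0, 3), "AR(4)": (-146.0, 6)}
for name, (ll, k) in candidates.items():
    print(f"{name}: AIC = {aic(ll, k):.2f}, BIC = {bic(ll, k, 100):.2f}")
```

In this invented example AIC prefers the larger model while BIC prefers the smaller one, reflecting BIC's heavier penalty on extra parameters.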

Notes on examples
You do some MA(q) examples: $\Delta GDP_t$, $\Delta Cons_t$, $\Delta Inv_t$, $\Delta Inf_t$

Model evaluation: forecast performance
If your job is forecasting, choose the model with the best forecasts!
- In-sample forecasts:
Estimation period is 1 ... T and look at e.g. the 1-period ahead forecast
$E[y_{t+1} \mid y_t, \hat\theta_T]$
This is similar to in-sample fit where we compare $\hat y_t$ with $y_t$, but now
we are doing it 1-period ahead.
- Out-of-sample forecasts:
Estimation period is 1 ... N, and look at 1-period ahead forecasts
$E[y_{t+1} \mid y_t, \hat\theta_N]$ for t = N + 1, N + 2, N + 3 etc. up to the final data point T.
This is a tougher test as none of the information in the forecast period
contributed to the parameter estimation.
A simple criterion: minimum mean square error
$MSE = \frac{1}{n} \sum_{i=1}^{n} (\hat y_i - y_i)^2$

where $\hat y_i$ is the forecast, $y_i$ is the realisation.
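As a small illustration of the criterion, the sketch below computes the MSE of 1-period ahead forecasts from an AR(1) whose coefficients are held fixed over the evaluation window, as in the out-of-sample case. The series and the coefficients are made up, and the function names are hypothetical.

```python
def mse(forecasts, realisations):
    """Mean square error between forecasts and realised values."""
    n = len(forecasts)
    return sum((f - y) ** 2 for f, y in zip(forecasts, realisations)) / n


def ar1_one_step_forecasts(y, a, b, start):
    """Forecast y[t] from y[t-1] for t = start, ..., len(y)-1 with fixed (a, b)."""
    return [a + b * y[t - 1] for t in range(start, len(y))]


# Made-up series; pretend (a, b) were estimated on the first four observations.
y = [1.0, 1.4, 1.1, 1.6, 1.3, 1.7, 1.5]
a, b = 0.9, 0.3
forecasts = ar1_one_step_forecasts(y, a, b, start=4)
print("out-of-sample MSE:", mse(forecasts, y[4:]))
```

Setting `start=1` instead would score the same coefficients over the whole sample, which corresponds to the in-sample comparison.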

Empirical Example: In-sample forecast comparisons

Compare 1-step ahead forecasts from 4-lag and preferred AR, MA models

Variable Model MSE Model MSE

∆ GDP AR(4) MA(4)
AR( ) MA( )
∆ Cons AR(4) MA(4)
AR( ) MA( )
∆ Inv AR(4) MA(4)
AR( ) MA( )
∆ Inf AR(4) MA(4)
AR( ) MA( )

ARMA(p,q) models

We can combine the forecasting power of AR and MA components

$y_t = \alpha + \beta_1 y_{t-1} + \dots + \beta_p y_{t-p} + \gamma_1 \varepsilon_{t-1} + \dots + \gamma_q \varepsilon_{t-q} + \varepsilon_t$

A1, A2, A3 apply for a well specified model

Estimation is by maximum likelihood
Don’t do large ARMAs, in practice ARMA(2,1) is often a good
approximation for macroeconomic time series.

Forecasts from an ARMA(2,1)

Variable k=1 k=4 k=8
∆ Cons
∆ Inv
∆ Inf

Estimate the model to 2005. From 2003q1, produce static 1-period ahead
forecasts up to 2005, then dynamic 4 and 8 period ahead forecasts also
from 2003q1
What happens to the MSE as the forecast horizon increases?

Out of sample forecast example

Now estimate the model to 2007 and repeat the process using dynamic
out of sample forecasting up to 2011

Variable k=1 k=4 k=8
∆ Cons
∆ Inv
∆ Inf

This is the problem the BoE had (with a more sophisticated model) during
the crisis
The FED did less badly because its model updates the parameters, via the
Kalman filter, when it makes an error. Beyond the scope of this course!

More on forecast errors

We have used the MSFE to look at different models and the effect of
different time horizons
Out-of-sample forecast errors are larger than in-sample, because the
forecast error is really composed of two parts

$MSFE = E[(y_{T+1} - \hat y_{T+1|T})^2]$
$= \sigma_\varepsilon^2 + \mathrm{var}[(a - \alpha) + (b - \beta) y_T]$

The out-of-sample forecasts involve re-estimating the model, so give
an estimate of the likely performance of the model in real time

Deeper into time series: preliminaries

What does the AR part actually measure?

What does the MA part actually measure?
Why is their combination sometimes more useful?

Think about the way the influence of past shocks, $\varepsilon_{t-s}$, decays over time
To go deeper into time series we need to brush up our maths!
We will look at deriving the conditional and unconditional
expectations, variances and autocovariances for simple time-series models

Conditional Expectations

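The conditional expectation $E[y_{t+1} \mid y_t]$ follows from the conditional mean
equation we write down in the AR(1) or MA(1) model

AR(1):
$E[y_{t+1} \mid y_t] = E[\alpha + \beta y_t + \varepsilon_{t+1} \mid y_t]$
$= \alpha + \beta E[y_t \mid y_t] + E[\varepsilon_{t+1} \mid y_t]$
$= \alpha + \beta y_t$

MA(1):
$E[y_{t+1} \mid y_t] = E[\alpha + \beta \varepsilon_t + \varepsilon_{t+1} \mid y_t]$
$= \alpha + \beta E[\varepsilon_t \mid y_t]$
$= \alpha + \beta \varepsilon_t$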

Looking further ahead: iterative forecasts

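AR(1):
$E[y_{t+2} \mid y_t] = E[\alpha + \beta y_{t+1} + \varepsilon_{t+2} \mid y_t]$
$= \alpha + \beta E[y_{t+1} \mid y_t] + E[\varepsilon_{t+2} \mid y_t]$
$= \alpha + \beta(\alpha + \beta y_t)$
$= \alpha + \beta\alpha + \beta^2 y_t$
$E[y_{t+k} \mid y_t] = \sum_{i=0}^{k-1} \beta^i \alpha + \beta^k y_t$
$\lim_{k\to\infty} E[y_{t+k} \mid y_t] = \frac{\alpha}{1-\beta}$

MA(1):
$E[y_{t+k} \mid y_t] = \alpha \;\forall k \ge 2$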
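The iterative AR(1) forecast and its closed form can be checked against each other numerically. This is a minimal sketch with made-up parameter values (alpha = 1, beta = 0.5, a last observation of 10); the function names are hypothetical.

```python
def ar1_forecast_iterated(alpha, beta, y_t, k):
    """Apply E[y_{t+1}|y_t] = alpha + beta*y_t repeatedly, k times."""
    forecast = y_t
    for _ in range(k):
        forecast = alpha + beta * forecast
    return forecast


def ar1_forecast_closed_form(alpha, beta, y_t, k):
    """Closed form: alpha * sum_{i=0}^{k-1} beta^i + beta^k * y_t."""
    return alpha * sum(beta ** i for i in range(k)) + beta ** k * y_t


alpha, beta, y_t = 1.0, 0.5, 10.0
for k in (1, 2, 4, 8, 50):
    it = ar1_forecast_iterated(alpha, beta, y_t, k)
    cf = ar1_forecast_closed_form(alpha, beta, y_t, k)
    print(f"k = {k:2d}: iterated = {it:.6f}, closed form = {cf:.6f}")
print("unconditional mean alpha/(1-beta) =", alpha / (1.0 - beta))
```

As k grows, both versions approach alpha/(1 - beta), the long-run mean of the process.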

Unconditional Expectations

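If we know the process, but have no observations, what is our best
guess at a value $y_t$? Our best guess is the unconditional mean implied
by the process

AR(1):
$E[y_t] = \alpha + \beta E[y_{t-1}] + E[\varepsilon_t]$
$= \alpha + \beta E[y_t]$
$E[y_t] = \frac{\alpha}{1-\beta}$

MA(1):
$E[y_t] = \alpha + \beta E[\varepsilon_{t-1}] + E[\varepsilon_t] = \alpha$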

Uncertainty and variance AR(1)

Conditional variance

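$\mathrm{var}(y_{t+1} \mid y_t) = \mathrm{var}(\alpha + \beta y_t + \varepsilon_{t+1} \mid y_t)$
$= \mathrm{var}(\varepsilon_{t+1} \mid y_t) = \sigma_\varepsilon^2$
$\mathrm{var}(y_{t+2} \mid y_t) = \mathrm{var}(\alpha + \beta y_{t+1} + \varepsilon_{t+2} \mid y_t)$
$= \beta^2 \mathrm{var}(y_{t+1} \mid y_t) + \mathrm{var}(\varepsilon_{t+2} \mid y_t)$
$= (1 + \beta^2)\sigma_\varepsilon^2$
$\mathrm{var}(y_{t+k} \mid y_t) = \sum_{i=0}^{k-1} (\beta^2)^i \sigma_\varepsilon^2$
$\Rightarrow \lim_{k\to\infty} \mathrm{var}(y_{t+k} \mid y_t) = \frac{\sigma_\varepsilon^2}{1-\beta^2}$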

Uncertainty and variance AR(1)

Unconditional variance

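$\sigma_y^2 = \mathrm{var}(y_t) = \mathrm{var}(\alpha + \beta y_{t-1} + \varepsilon_t)$
$= \beta^2 \mathrm{var}(y_t) + \sigma_\varepsilon^2$
$\sigma_y^2 = \frac{\sigma_\varepsilon^2}{1 - \beta^2}$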
Compare this to the limit of the conditional variance

Uncertainty and Variance MA(1)

Conditional variance

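$\mathrm{var}(y_{t+1} \mid y_t) = \mathrm{var}(\alpha + \beta\varepsilon_t + \varepsilon_{t+1} \mid y_t)$
$= \sigma_\varepsilon^2$
$\mathrm{var}(y_{t+k} \mid y_t) = \mathrm{var}(\alpha + \beta\varepsilon_{t+k-1} + \varepsilon_{t+k} \mid y_t)$
$= (1 + \beta^2)\sigma_\varepsilon^2 \;\forall k \ge 2$

Unconditional variance

$\sigma_y^2 = \mathrm{var}(\alpha + \beta\varepsilon_{t-1} + \varepsilon_t)$
$= (1 + \beta^2)\sigma_\varepsilon^2$

So the conditional variance of the MA(1) returns to the unconditional
variance after 2 periods!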

Forecast error variance

We should include confidence intervals in our forecasts

Assume $\varepsilon_t \sim N(0, \sigma_\varepsilon^2)$
Then the 95% confidence intervals for $E[y_{t+k} \mid y_t]$ are
AR(1): $\hat y_{t+k|t} \pm 1.96\sqrt{\sum_{i=0}^{k-1} (\beta^2)^i \sigma_\varepsilon^2}$
MA(1): $\hat y_{t+k|t} \pm 1.96\sqrt{(1 + \beta^2)\sigma_\varepsilon^2} \;\forall k \ge 2$

In practice it is common to apply these formulas to forecasts
generated with estimates a, b, $\hat\sigma_\varepsilon^2$, ignoring the extra uncertainty
created by estimating parameters
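A short sketch of how such intervals could be computed for an AR(1) follows. It assumes normal errors and treats the parameters as known, so it ignores estimation uncertainty exactly as described above; the values (alpha = 0, beta = 0.5, unit error variance) and function names are illustrative only.

```python
def ar1_forecast_variance(beta, sigma2, k):
    """k-step forecast error variance: sigma2 * sum_{i=0}^{k-1} (beta^2)^i."""
    return sigma2 * sum((beta ** 2) ** i for i in range(k))


def ar1_forecast_interval(alpha, beta, sigma2, y_t, k, z=1.96):
    """Point forecast plus/minus z standard deviations of the forecast error."""
    point = y_t
    for _ in range(k):
        point = alpha + beta * point
    half_width = z * ar1_forecast_variance(beta, sigma2, k) ** 0.5
    return point - half_width, point + half_width


alpha, beta, sigma2, y_t = 0.0, 0.5, 1.0, 2.0
for k in (1, 2, 4, 8):
    lo, hi = ar1_forecast_interval(alpha, beta, sigma2, y_t, k)
    print(f"k = {k}: [{lo:.3f}, {hi:.3f}]")
print("limiting variance sigma2/(1-beta^2) =", sigma2 / (1.0 - beta ** 2))
```

The band widens with the horizon but levels off, because the forecast error variance converges to the unconditional variance of the series.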

Forecasts with confidence intervals


Deeper into time series: ACF

An important property is the correlation between $y_t$ and $y_{t-k}$

The autocovariance function is the set of numbers
$\mathrm{cov}(y_t, y_{t-k}) := \sigma_k^2$
The sample estimator of the autocovariance function is
$\hat\sigma_k^2 = \frac{1}{T-k-1} \sum_{t=k+1}^{T} \tilde y_t \tilde y_{t-k}$ where $\tilde y_t = y_t - \bar y$
The autocovariance function is normalised by the variance of y to
give the autocorrelation function ACF(k):

$\rho_k = \frac{\mathrm{cov}(y_t, y_{t-k})}{\mathrm{var}(y_t)}$
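The sample ACF is straightforward to compute by hand. The sketch below follows the estimator stated here, with the 1/(T - k - 1) normalisation, and applies it to a made-up alternating series; the function names are hypothetical.

```python
def sample_autocovariance(y, k):
    """Sample autocovariance at lag k with the 1/(T-k-1) normalisation."""
    T = len(y)
    y_bar = sum(y) / T
    dev = [v - y_bar for v in y]
    return sum(dev[t] * dev[t - k] for t in range(k, T)) / (T - k - 1)


def sample_acf(y, max_lag):
    """Autocovariances divided by the lag-0 autocovariance (the variance)."""
    gamma0 = sample_autocovariance(y, 0)
    return [sample_autocovariance(y, k) / gamma0 for k in range(max_lag + 1)]


# Made-up series that flips sign every period: strongly negative lag-1 correlation.
y = [1.0, -1.0] * 50
print([round(r, 3) for r in sample_acf(y, 4)])
```

Because each lag uses its own divisor, the estimated correlations can fall marginally outside [-1, 1] in short samples; many packages divide by T at every lag to avoid this.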

ACF for various stationary models


ACF: discussion

The ACF shows us how long it takes for the influence of past shocks
to die away, by measuring the correlation between yt and its own past
For stationary processes the ACF becomes statistically insignificant
after a finite number of periods.
Stationary processes have finite memory - the influence of a shock is temporary
[Figure: ACF of growth]

Deeper into time series: PACF

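Clearly autoregressions can capture correlation between $y_t$ and its
past, but how many lags do we need?
If $y_t = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t$, then we know from regression
analysis that $\beta_2$ is a measure of the conditional correlation between $y_t$
and $y_{t-2}$ after accounting for the correlation explained by $y_{t-1}$

$PACF(k) = \frac{\mathrm{cov}(y_t, y_{t-k} \mid y_{t-1}, y_{t-2}, \dots, y_{t-k+1})}{[\mathrm{var}(y_t \mid y_{t-1}, \dots, y_{t-k+1})\,\mathrm{var}(y_{t-k} \mid y_{t-1}, \dots, y_{t-k+1})]^{1/2}}$
e.g. $PACF(3) = \frac{\mathrm{cov}(y_t, y_{t-3} \mid y_{t-1}, y_{t-2})}{[\mathrm{var}(y_t \mid y_{t-1}, y_{t-2})\,\mathrm{var}(y_{t-3} \mid y_{t-1}, y_{t-2})]^{1/2}}$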

PACFs for stationary processes


Memory in AR(1)
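Let $y_t = \beta y_{t-1} + \varepsilon_t$, i.e. put $\alpha = 0 \Rightarrow \mu = 0$

$\mathrm{cov}(y_t, y_{t-1}) = E[(\beta y_{t-1} + \varepsilon_t) y_{t-1}]$
$= \beta E[y_{t-1}^2] = \beta\sigma_y^2$
$\Rightarrow \mathrm{corr}(y_t, y_{t-1}) = \beta$

$\mathrm{cov}(y_t, y_{t-2}) = E[(\beta y_{t-1} + \varepsilon_t) y_{t-2}]$
$= E[(\beta(\beta y_{t-2} + \varepsilon_{t-1}) + \varepsilon_t) y_{t-2}]$
$= E[\beta^2 y_{t-2}^2 + \beta\varepsilon_{t-1} y_{t-2} + \varepsilon_t y_{t-2}]$
$= \beta^2\sigma_y^2$
$\Rightarrow \mathrm{corr}(y_t, y_{t-2}) = \beta^2$

$\mathrm{corr}(y_t, y_{t-k}) = \beta^k$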

ACF for different AR(1) models


PACF AR models

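$y_t = \beta_1 y_{t-1} + \underbrace{\beta_2 y_{t-2}}_{\text{cond corr}} + \varepsilon_t$

The coefficient in the bracketed term is

$\frac{\mathrm{cov}(y_t, y_{t-2} \mid y_{t-1})}{[\mathrm{var}(y_t \mid y_{t-1})\,\mathrm{var}(y_{t-2} \mid y_{t-1})]^{1/2}}$

$PACF(p) = \beta_p$ in AR(p) models, and $PACF(k) = 0$ for $k > p$

So the PACF drops sharply to 0 after the final lagged term in the
AR(p) model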
This is an alternative way to think about how many lags to include

PACF various AR models

What do you notice about ACF vs. PACF in AR models?

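Consider the mean zero MA(1) $y_t = \beta\varepsilon_{t-1} + \varepsilon_t$

$\mathrm{cov}(y_t, y_{t-1}) = E[(\beta\varepsilon_{t-1} + \varepsilon_t)(\beta\varepsilon_{t-2} + \varepsilon_{t-1})]$
$= \beta\sigma_\varepsilon^2$
$\Rightarrow \mathrm{corr}(y_t, y_{t-1}) = \frac{\beta}{1 + \beta^2}$

$\mathrm{cov}(y_t, y_{t-2}) = E[(\beta\varepsilon_{t-1} + \varepsilon_t)(\beta\varepsilon_{t-3} + \varepsilon_{t-2})] = 0$

$ACF(k) = 0 \;\forall k \ge 2$

The ACF of an MA(q) process drops sharply to 0 after q lags, i.e. $ACF(k) = 0$ for $k \ge q + 1$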

PACF of MA(1)

To calculate the PACF directly is hard. Here’s a neat trick

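Assume $|\beta| < 1$, notice $\varepsilon_t = y_t - \beta\varepsilon_{t-1}$

$y_t = \beta(y_{t-1} - \beta\varepsilon_{t-2}) + \varepsilon_t$
$= \beta(y_{t-1} - \beta(y_{t-2} - \beta\varepsilon_{t-3})) + \varepsilon_t$
$= \beta y_{t-1} - \beta^2 y_{t-2} + \beta^3(y_{t-3} - \beta\varepsilon_{t-4}) + \varepsilon_t$
$y_t = \sum_{i=1}^{\infty} (-1)^{i+1} \beta^i y_{t-i} + \varepsilon_t$

which is an AR(∞), and is well defined given $|\beta| < 1$.

Using the earlier result, the PACF will decay geometrically as $\beta^i$ declines
to zero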

Box Jenkins model building method

Two famous statisticians suggested the ACF/PACF as a way of building
time series regressions

        AR(p)                 MA(q)                 ARMA(p,q)
ACF     Decays smoothly       Chops off at q lags   Decays smoothly
PACF    Chops off at p lags   Decays smoothly       Decays smoothly

Inspection of the empirical ACF and PACF can help suggest a sensible starting
ARMA(p,q) model.
Then test down to a small model using significance tests and information criteria
Always check A2 holds for your residuals

Empirical P/ACF

Genuine AR(1) process

∆ ln GDP

Empirical P/ACF

Genuine MA(1) process


We have dealt with finite memory processes where

- $ACF(k) \to 0$ as $k \to \infty$
- $PACF(k) \to 0$ as $k \to \infty$
- $E[y_t] = \mu \;\forall t$
- $\mathrm{var}(y_t) = \sigma_y^2 \;\forall t$
- $\mathrm{cov}(y_t, y_{t-k})$ depends only on k and not t
ARMA(p,q) models make decent forecasts for these series
But in economics, they are only approximate models

How do we deal with levels of series and model relationships between

dynamic economic variables?

PART II: Integrated processes

Processes with permanent shocks are called integrated processes

- A simple example shows our ideas of $\mu$ and $\sigma^2$ are not compatible with
permanent shocks
The first problem is to decide if a series is integrated
- Dickey-Fuller tests
We then have a choice
- Difference the series to make it stationary
- Look for cointegration between two or more integrated series

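Consider the random walk $y_t = y_{t-1} + \varepsilon_t$, $y_0 = 0$

$y_1 = y_0 + \varepsilon_1 = \varepsilon_1$
$y_2 = y_1 + \varepsilon_2 = \varepsilon_1 + \varepsilon_2$
$\dots$
$y_t = \varepsilon_1 + \varepsilon_2 + \dots + \varepsilon_t$

$\mathrm{var}(y_t) = \mathrm{var}\left(\sum_{i=0}^{t-1} \varepsilon_{t-i}\right)$
$= t\sigma_\varepsilon^2$
$\to \infty$ as $t \to \infty$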

The variance of this process grows without bound
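A simulation can make the growing variance visible. The sketch below generates many independent random walks (the number of paths, the horizon and the seed are arbitrary choices) and compares the variance across paths at two dates with the theoretical value of t times the error variance.

```python
import random


def random_walk_paths(n_paths, n_steps, sigma=1.0, seed=0):
    """Simulate independent random walks y_t = y_{t-1} + eps_t with y_0 = 0."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        level, path = 0.0, []
        for _ in range(n_steps):
            level += rng.gauss(0.0, sigma)
            path.append(level)
        paths.append(path)
    return paths


def cross_section_variance(paths, t):
    """Variance of y_t across paths, with t counted from 1."""
    values = [p[t - 1] for p in paths]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


paths = random_walk_paths(n_paths=2000, n_steps=100, seed=3)
for t in (10, 100):
    print(f"t = {t}: simulated variance = {cross_section_variance(paths, t):.1f}, theory = {t}")
```

With unit error variance the simulated figures should land near 10 and 100, growing in line with t.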

Permanent shocks

What about the mean?

Think about the random walk with drift $y_t = \alpha + y_{t-1} + \varepsilon_t$
This is an AR(1) with β = 1
Thus $E[y_t] = \frac{\alpha}{1-\beta}$ is undefined
The process has no unconditional mean
Conditional forecasts

$E[y_{t+k} \mid y_t] = k\alpha + y_t$

With an error variance that grows without bound

Regression analysis struggles with such data

Regressions with random walks

Regress the two uncorrelated random walks $y_t$, $x_t$ in the dataset on each other


param value tstat

R2 -

Breusch-Godfrey stat for serial corr up to order 4:

This is typical of a spurious regression
High $R^2$ combined with positive serial correlation in the residuals is a classic
warning sign of spurious regression
Now regress ∆y on ∆x. Is there any relationship?
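The exercise can also be mimicked by simulation. The sketch below is an illustration under assumed settings (200 replications of 200 observations, standard normal shocks, hypothetical function names): it regresses one independent random walk on another, first in levels and then in differences, and averages the resulting $R^2$.

```python
import random


def random_walk(n, rng):
    """One random walk of length n with standard normal increments."""
    level, out = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        out.append(level)
    return out


def r_squared(y, x):
    """R^2 from an OLS regression of y on a constant and x."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def difference(z):
    """First difference of a series."""
    return [z[t] - z[t - 1] for t in range(1, len(z))]


def average_r2(n_reps=200, n_obs=200, seed=0):
    """Average R^2 in levels and in differences for independent random walks."""
    rng = random.Random(seed)
    levels_total = diffs_total = 0.0
    for _ in range(n_reps):
        y, x = random_walk(n_obs, rng), random_walk(n_obs, rng)
        levels_total += r_squared(y, x)
        diffs_total += r_squared(difference(y), difference(x))
    return levels_total / n_reps, diffs_total / n_reps


levels_r2, diffs_r2 = average_r2()
print(f"average R^2 in levels: {levels_r2:.3f}, in differences: {diffs_r2:.3f}")
```

The series are unrelated by construction, so a sizeable average $R^2$ in levels is purely an artefact of the shared trending behaviour; it should all but vanish after differencing.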

Testing for unit roots: Dickey-Fuller test

The best way to avoid spurious regressions is to do regressions with
stationary series
To determine stationarity, we need to test β = 1 in the process
$y_t = \alpha + \beta y_{t-1} + \varepsilon_t$

$\Delta y_t = \alpha + (\beta - 1) y_{t-1} + \varepsilon_t$
$= \alpha + \rho y_{t-1} + \varepsilon_t$
$H_0: \rho = 0 \Rightarrow$ there is a unit root
$H_A: \rho < 0 \Rightarrow$ no unit root
$t_{DF} = \frac{\hat\rho}{SE(\hat\rho)}$

The test stat $t_{DF}$ follows the Dickey-Fuller distribution, which gives much
more negative critical values than the standard normal
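The test statistic itself is just a t-ratio from an OLS regression. The sketch below computes it for the simplest case (intercept, no trend, no lagged differences) on simulated data; function names, sample size and seeds are assumptions. It compares the result with -2.86, the approximate large-sample 5% Dickey-Fuller critical value for that specification; in applied work, use the table matching your own specification and sample size.

```python
import random


def simulate_ar1(alpha, beta, n, seed=0):
    """Simulate y_t = alpha + beta*y_{t-1} + eps_t with y_0 = 0 and unit-variance errors."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(alpha + beta * y[-1] + rng.gauss(0.0, 1.0))
    return y


def dickey_fuller_t(y):
    """t-ratio on rho in the regression dy_t = alpha + rho*y_{t-1} + eps_t."""
    lagged = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    x_bar, d_bar = sum(lagged) / n, sum(dy) / n
    sxx = sum((x - x_bar) ** 2 for x in lagged)
    sxy = sum((x - x_bar) * (d - d_bar) for x, d in zip(lagged, dy))
    rho_hat = sxy / sxx
    alpha_hat = d_bar - rho_hat * x_bar
    ssr = sum((d - alpha_hat - rho_hat * x) ** 2 for x, d in zip(lagged, dy))
    se_rho = (ssr / (n - 2) / sxx) ** 0.5
    return rho_hat / se_rho


CRITICAL_5PCT = -2.86  # approximate asymptotic value: intercept, no trend

stationary = simulate_ar1(alpha=0.0, beta=0.5, n=500, seed=5)
unit_root = simulate_ar1(alpha=0.0, beta=1.0, n=500, seed=6)
for name, series in (("AR(1), beta = 0.5", stationary), ("random walk", unit_root)):
    t_df = dickey_fuller_t(series)
    print(f"{name}: t_DF = {t_df:.2f}, reject unit root: {t_df < CRITICAL_5PCT}")
```

For the stationary series the statistic should be far below the critical value; for the random walk it will usually not be, although a rejection can still occur by chance about 5% of the time.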

Dickey Fuller distribution

The DF distribution is sensitive to specification of the test
- Inclusion of an intercept
- Inclusion of a trend
- Number of lags
- Sample size

The Augmented Dickey Fuller test

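It is essential that there is no serial correlation in the DF regression

If necessary add lagged differences of the dependent variable

$y_t = \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t$
$= \alpha + \beta_1 y_{t-1} + \beta_2 y_{t-1} - \beta_2 y_{t-1} + \beta_2 y_{t-2} + \varepsilon_t$
$= \alpha + (\beta_1 + \beta_2) y_{t-1} - \beta_2 \Delta y_{t-1} + \varepsilon_t$
$\Delta y_t = \alpha + (\beta_1 + \beta_2 - 1) y_{t-1} - \beta_2 \Delta y_{t-1} + \varepsilon_t$
$= \alpha + \rho y_{t-1} - \beta_2 \Delta y_{t-1} + \varepsilon_t$

Hypothesis, alternative and test statistic as on the previous slide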

Dealing with trends

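Include trends using the 'restricted trend' option if available, for
$g = \gamma/(1 - \beta)$

$y_t = \alpha + \gamma t + \beta y_{t-1} + \varepsilon_t$
$\Delta y_t = \alpha + (\beta - 1)(y_{t-1} - gt) + \varepsilon_t$
$= \alpha + \rho(y_{t-1} - gt) + \varepsilon_t$
$\Rightarrow \Delta y_t = \alpha + \varepsilon_t$ if ρ = 0 (2)
$\Rightarrow y_t = \alpha + \gamma t + \beta y_{t-1} + \varepsilon_t$ if ρ < 0 (3)

From (2), if the process has a unit root it is a random walk with drift

From (3), if the process does not have a unit root, it is trend stationary with |β| < 1.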

Dickey Fuller Tables

Notes on exercise

The order of integration, d, written $y_t \sim I(d)$, is the number of times
a series must be differenced in order to make the series $y_t$ stationary
Determine the order of integration of Output, Consumption,
Investment and Prices.
Do any series exhibit trend-stationary behaviour?

Cointegration: Random walks which Tango!

So far we have dealt with Integrated series by differencing to make

them stationary and modelling their (univariate) stationary behaviour.
There is an important case when we can work with two (or more)
Integrated series directly. This is when the series are cointegrated
- Economic behaviour creates long run - equilibrium - relationships
between series. E.g. output and consumption, investment and output,
house prices and earnings (?), stock prices and profits (?)
- The ratio of such series is a stationary series, even though the two
series are I(1)!
- Variables which cointegrate in this way adjust to dynamic shocks in
order to move back towards their equilibrium relationship

Output and Consumption

Plots of series

Cointegration: formal definition

If a linear combination of I(1) series is I(0) then the two series cointegrate

$x_t \sim I(1)$, $y_t \sim I(1)$
$y_t - \beta x_t \sim I(0)$

The 'cointegrating vector' is the pair of values (1, −β) which
(working in logs) give the stationary ratio between the series
Economic theory often suggests theoretical values for β, so it is
interesting to see if these are true in the data

Common stochastic trends
Cointegration occurs when two series share a common stochastic trend,
say $X_t$. Let $X_0 = 0$ and

$X_t = X_{t-1} + \varepsilon_t$
$\Rightarrow X_t = \sum_{i=1}^{t} \varepsilon_i$

Let $\tilde y_t$ and $\tilde x_t$ be independent I(0) processes and let

$y_t = \beta X_t + \tilde y_t$, $x_t = X_t + \tilde x_t$

$\Rightarrow y_t - \beta x_t = \beta X_t + \tilde y_t - \beta(X_t + \tilde x_t)$
$= \tilde y_t - \beta\tilde x_t \sim I(0)$

The common stochastic trend has been cancelled out. The pair (1, −β) is
called the cointegrating vector, as it gives the stationary linear combination
of y and x
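The cancellation of the common trend can be seen in a small simulation. In the sketch below the two stationary components are taken to be independent standard normal noise and β is set to 2; these are illustrative assumptions, as are the function names, not features of any dataset used in the course.

```python
import random


def simulate_cointegrated(n, beta, seed=0):
    """y_t = beta*X_t + noise and x_t = X_t + noise, with X_t a random walk."""
    rng = random.Random(seed)
    trend, y, x = 0.0, [], []
    for _ in range(n):
        trend += rng.gauss(0.0, 1.0)          # common stochastic trend X_t
        y.append(beta * trend + rng.gauss(0.0, 1.0))
        x.append(trend + rng.gauss(0.0, 1.0))
    return y, x


def variance(z):
    """Sample variance of a series."""
    mean = sum(z) / len(z)
    return sum((v - mean) ** 2 for v in z) / (len(z) - 1)


beta = 2.0
y, x = simulate_cointegrated(n=2000, beta=beta, seed=4)
combination = [yi - beta * xi for yi, xi in zip(y, x)]
print(f"variance of y:            {variance(y):.1f}")
print(f"variance of y - beta * x: {variance(combination):.2f}  (theory: {1 + beta ** 2})")
```

The level series wanders with the trend, so its sample variance is large and depends on the sample; the linear combination has a stable variance close to 1 + β².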
Output and Consumption

Plots of ratio and residual in superconsistent regression

Cointegration: long and short-run relationships

If an economically meaningful equilibrium relationship exists:

There must be dynamic adjustment in the short run in order to return
the variables towards equilibrium levels when shocks push them apart
Thus the long-run relationship makes predictions about short-run
adjustment dynamics
The levels of the series this period help us predict changes in the
series next period
We can represent both the long-run and the short-run behaviour of
cointegrated series through the error correction model

Error correction model

We have seen an estimate of the cointegrating relationship between
$cons_t$ and $output_t$

$c_t = \beta y_t + \varepsilon_t$

Encouragingly the residuals $\hat\varepsilon_t$ from this relationship were stationary
But look at the Breusch-Godfrey stat - XXX - the above model is not
dynamically well-specified; it does not meet A2.
A model with more general dynamics is

$c_t = \beta_1 y_t + \beta_2 y_{t-1} + \beta_3 c_{t-1} + \varepsilon_t$ (4)

This allows for the response of $c_t$ to its own past, and to current and lagged
values of $y_t$

Error correction model
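Although (4) is a more general dynamic specification, it consists of
I(1) variables, yet the $\varepsilon_t$ series should be I(0).
With a bit of algebra we can rewrite the model entirely in terms of
I(0) variables

$c_t = \beta_1 y_t + \beta_2 y_{t-1} + \beta_3 c_{t-1} + \varepsilon_t$
$= \beta_1 y_t - \beta_1 y_{t-1} + \beta_1 y_{t-1} + \beta_2 y_{t-1} + \beta_3 c_{t-1} + \varepsilon_t$
$= \beta_1 \Delta y_t + (\beta_1 + \beta_2) y_{t-1} + \beta_3 c_{t-1} + \varepsilon_t$
$\Delta c_t = \beta_1 \Delta y_t + (\beta_1 + \beta_2) y_{t-1} + (\beta_3 - 1) c_{t-1} + \varepsilon_t$
$= \beta_1 \Delta y_t + (\beta_3 - 1) \underbrace{\left[c_{t-1} - \frac{\beta_1 + \beta_2}{1 - \beta_3} y_{t-1}\right]}_{\text{E. C. term}} + \varepsilon_t$

$\Delta y_t$, $\Delta c_t$ are I(0) and, provided there is cointegration, so are the error
term and the equilibrium relationship in the large brackets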

Error correction model

We can re-write the final line of the ECM as

$\Delta c_t = \alpha_1 \Delta y_t + \alpha_2 (c_{t-1} - \beta y_{t-1}) + \varepsilon_t$ (5)

Cointegration imposes the restrictions $\alpha_2 = (\beta_3 - 1)$ and $\beta = \frac{\beta_1 + \beta_2}{1 - \beta_3}$
If there is a cointegrating relationship
- $c_{t-1} - \hat\beta y_{t-1} \sim I(0)$, and $\hat\varepsilon_t \sim I(0)$
- $\hat\alpha_2 < 0$
The $\hat\alpha_2 < 0$ requirement ensures $c_t$ adjusts to being above its
long-run level in period t − 1 by reducing in period t
To estimate such a model, we need an estimate of $c_{t-1} - \hat\beta y_{t-1}$

Estimation of the ECM
Engle and Granger (1987) propose a two-step procedure for estimating (5)
First we need an estimate of the cointegrating vector. Regress:

$c_t = \beta y_t + \nu_t$
$\Rightarrow \hat\nu_t = c_t - \hat\beta y_t$

The $\hat\nu_t$ is our estimate of deviations from the long-run equilibrium

Second, we estimate, by OLS

$\Delta c_t = \alpha_1 \Delta y_t + \alpha_2 \hat\nu_{t-1} + \varepsilon_t$

We can recover estimates of the parameters of the original dynamic
model (4) from the parameters of the estimated ECM, $\hat\alpha_1$, $\hat\alpha_2$ and $\hat\beta$
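The two steps can be sketched on simulated data. In the illustration below $y_t$ is a random walk and $c_t = \beta y_t + u_t$ with $u_t$ a stationary AR(1) with coefficient 0.5, so the implied ECM has $\alpha_1 = \beta$ and $\alpha_2 = -0.5$. The data-generating process, parameter values and function names are all assumptions made for the example, and both regressions are run without an intercept, as in the equations above.

```python
import random


def simulate_system(n, beta, phi=0.5, seed=0):
    """y_t is a random walk; c_t = beta*y_t + u_t with u_t = phi*u_{t-1} + e_t."""
    rng = random.Random(seed)
    level, u, c, y = 0.0, 0.0, [], []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        u = phi * u + rng.gauss(0.0, 1.0)
        y.append(level)
        c.append(beta * level + u)
    return c, y


def step_one(c, y):
    """Step 1: regress c_t on y_t (no intercept) and keep the residuals."""
    beta_hat = sum(ci * yi for ci, yi in zip(c, y)) / sum(yi * yi for yi in y)
    nu_hat = [ci - beta_hat * yi for ci, yi in zip(c, y)]
    return beta_hat, nu_hat


def step_two(c, y, nu_hat):
    """Step 2: OLS of dc_t on dy_t and nu_hat_{t-1} via the 2x2 normal equations."""
    dc = [c[t] - c[t - 1] for t in range(1, len(c))]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    ec = nu_hat[:-1]  # lagged equilibrium error, aligned with dc and dy
    s11 = sum(a * a for a in dy)
    s22 = sum(e * e for e in ec)
    s12 = sum(a * e for a, e in zip(dy, ec))
    r1 = sum(a * d for a, d in zip(dy, dc))
    r2 = sum(e * d for e, d in zip(ec, dc))
    det = s11 * s22 - s12 * s12
    alpha1 = (r1 * s22 - r2 * s12) / det
    alpha2 = (r2 * s11 - r1 * s12) / det
    return alpha1, alpha2


c, y = simulate_system(n=2000, beta=0.8, seed=7)
beta_hat, nu_hat = step_one(c, y)
alpha1_hat, alpha2_hat = step_two(c, y, nu_hat)
print(f"beta = {beta_hat:.3f}, alpha1 = {alpha1_hat:.3f}, alpha2 = {alpha2_hat:.3f}")
```

A negative estimate of $\alpha_2$ is the error-correction signature: when $c$ sits above its long-run level, it tends to fall in the next period.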

Testing for cointegration: EG procedure
The two-step estimation approach suggests a method for testing whether 2
series are actually cointegrated
Estimate the cointegrating relationship

$c_t = \beta y_t + \nu_t$

Save the $\hat\nu_t$ series and perform an ADF test with no intercept

$\Delta\hat\nu_t = \rho\hat\nu_{t-1} + \sum_i \gamma_i \Delta\hat\nu_{t-i} + u_t$
$H_0: \rho = 0 \Rightarrow \hat\nu_t$ is I(1) and there is no cointegration
$H_A: \rho < 0 \Rightarrow \hat\nu_t$ is I(0) and there may be cointegration
Critical values are MacKinnon's, which are more negative than the DF critical values

Testing for cointegration: EG procedure
...estimate the ECM

$\Delta c_t = \alpha_1 \Delta y_t + \alpha_2 \hat\nu_{t-1} + \varepsilon_t$

Test that there is a significant, negative change in $c_t$ whenever
$c_{t-1} > \hat\beta y_{t-1}$, in order to restore equilibrium

$H_0: \alpha_2 = 0 \Rightarrow$ no significant error correction
$H_A: \alpha_2 < 0 \Rightarrow$ error correction is significant
$\tau = \frac{\hat\alpha_2}{SE(\hat\alpha_2)} \sim t_{0.05, DoF}$

If the estimates pass these two tests, there is significant cointegration
and the ECM can be used to estimate the dynamic model
If not, then work with differences, i.e. transform the two series to
make them stationary.

EG procedure: discussion

The Engle-Granger procedure works well with two variables, but there are some problems:
The initial regression is misspecified, $\hat\nu_t$ is usually serially correlated
This two-step approach introduces more variance than a
dynamically well-specified 1-step procedure
Results, esp. with more than two variables, are sensitive to which
variable is taken as the left hand side variable
With more than two variables, there may be more than one
cointegrating relationship, and EG will estimate a linear combination
of these relationships, which has no real interpretation
These problems can be overcome by the Johansen procedure, which is a
vector-based approach to estimating cointegrating equations

Empirical examples

Series β̂ ν̂t ∼ I (0) α̂2 t-stat

(ct , yt )
(hpt , wt )
(SPt , Dt )

Forecast comparisons

Estimate your error correction models on 1960-2000

Estimate your preferred ARIMA on 1960-2000
Produce 1-step and 4-step ahead out of sample forecasts with each
model for 2001-2006
Compare the MSPE from each model

Summary: Work stream for applied time series

Graph your data. Think about:
- Is the series trending over time?
- Is the trend exponential or linear?
- Is the series mean reverting?
- Would the series look mean reverting in most subsamples?
- Are there several variables that seem to exhibit the same random trend?
Take logs of exponentially increasing variables
Begin Dickey-Fuller tests
- Decide about appropriate inclusion of trends and constants based on
visual inspection and inspection of DF regression results
- Include f + 1 lags in initial DF specification and remove insignificant
lags; check for serial correlation up to order f, ensure A2 is satisfied.
Using the preferred specification of the DF tests, decide on the order of
integration of the series

Summary: With the transformed stationary series

Build univariate ARMA models for forecasting
- Inspect ACF, PACF, decide on candidate AR, MA, ARMA specification
- Start with AR(f+1), MA(f+1) or ARMA (f/2,f/2)(?) specification and
test down by eliminating insignificant lags, minimizing AIC/BIC; ensure
A2 is satisfied in preferred model
Inspect forecast predictions vs. actual outcomes
- Do the forecast error bounds include 95% of actual outcomes?
- Are the forecast errors close to uncorrelated?
Test robustness by performing out of sample forecast exercise
- You will need to reserve part of your sample so will lose some
information from the estimation
- But you might find a model that performs better in practice, or at
least understand more about how your model is likely to perform as new
data comes in

Summary: Modelling cointegrating series
Plot the ratio of interest
Engle-Granger Procedure Step I
- Estimate the cointegrating relationship with appropriate constant/trend
- Save the residuals
- Perform Dickey-Fuller test on residuals
- No constant! MacKinnon p-values
- H0: no cointegration. If reject H0 go to...
Engle-Granger Procedure Step II
- Estimate ECM with appropriate lagged differences so that A2 holds
- Test $\hat\alpha_2 < 0$ by standard t-test
- H0: no cointegration ($\alpha_2 = 0$). If reject H0...
ECM is correct model. Recover parameters of restricted ARDL model
with appropriate transformations
Interpret cointegrating relationship
Make dynamic forecasts
The End!

