Time series
James A. Duffy
IV.1
The road behind: cross-sectional data
IV.2
The road ahead: time series data
IV.3
What is a time series?
IV.4
Time series: examples
IV.5
Time series: examples
[Figure: four panels: the USD/GBP exchange rate (USD per GBP; 1960–2020 and 1900–2020) and the yield on 10-year UK government bonds (per cent; 1960–2020 and 1750–2020)]
IV.5
Time series: examples
[Figure: four panels: the UK unemployment rate (per cent of labour force, 1960–2020), UK quarterly real GDP (billions of GBP, in 2015 prices, 1960–2020), and the UK real GDP growth rate (per cent per annum), plotted to 2019Q4 and to 2020Q4]
IV.5
What makes time series data ‘different’?
• Ordering matters:
• ‘arrow of time’: causation runs from the past to the present
• Yt may depend on Yt−1 , Yt−2 , . . . ; but Yt−1 does not depend on Yt
• [terminology: Yt−k is the ‘kth lag’ of Yt ]
• Dependence between observations:
• Yt is almost always correlated with Yt−1 , and with Yt−2 , etc.
• graphically evident in series with ‘smooth’ time plots
• Non-identical distributions:
• Yt may have different mean, variance, etc. from Ys
• trends and breaks; changes in behaviour over time
• Consequences?
• many potential pitfalls when applying statistical methods for time series
• just diving in and running regressions can lead to seriously misleading inferences
IV.6
Forecasting
• We observe {Yt }, t = 1, . . . , T (and perhaps {Xt }, t = 1, . . . , T , too)
• final observation(s) in period T
• want to predict – to forecast – YT +1 , YT +2 , etc., using that data
• Conceptually straightforward:
• only needs a good ‘descriptive’ model
• of how Yt can be ‘best predicted’ using Yt−1 , Yt−2 , . . . (and perhaps
Xt−1 , Xt−2 , . . .)
• a natural application of regression analysis
IV.7
Roadmap
IV.8
Section 13
IV.9
Subsection 13.i
IV.10
‘Stable’ time series
IV.11
‘Stable’ time series
[Figure: the UK real GDP growth rate to 2019Q4 (per cent per annum) and the UK unemployment rate, change from previous quarter (per cent of labour force), 1960–2020]
IV.11
‘Stable’ time series
[Figure: UK quarterly real GDP (billions of GBP, in 2015 prices) and the USD/GBP exchange rate (USD per GBP), 1960–2020]
IV.11
Time series: examples
IV.11
Why do we want ‘stable’ time series?
IV.12
Stationarity: heuristics
IV.13
Stationarity: heuristics
IV.14
Subsection 13.ii
IV.15
Weak stationarity: definition
IV.16
Weak stationarity: consequences
• Descriptive statistics have something fixed to estimate!
• sample mean and variance are consistent for µ and σY²
• sample autocovariance:

γ̂h := côv(Yt , Yt−h ) := (1/T) Σ_{t=h+1}^{T} (Yt − ȲT )(Yt−h − ȲT ) →p cov(Yt , Yt−h ) = γh

• sample autocorrelation:

ρ̂h := côrr(Yt , Yt−h ) := côv(Yt , Yt−h ) / v̂ar(Yt ) →p cov(Yt , Yt−h ) / var(Yt ) = ρh

• consistency follows from LLNs for stationary, ‘weakly dependent’ processes
• Note the slight peculiarities in the definitions:
• γ̂h divides the sum by T , rather than T − h or T − h − 1
• ρ̂h divides by v̂ar(Yt ), because var(Yt ) = var(Yt−h )
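A minimal numpy sketch of these two estimators (the function name and implementation are mine, not from the slides):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocorrelations rho_hat_h for h = 0, ..., max_lag.

    Follows the slides' conventions: the autocovariance sum runs from
    t = h+1 to T but is divided by T (not T - h), and every rho_hat_h
    uses var_hat(Y_t) in the denominator, since under stationarity
    var(Y_t) = var(Y_{t-h}).
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    d = y - y.mean()
    gamma = np.array([np.sum(d[h:] * d[:T - h]) / T
                      for h in range(max_lag + 1)])
    return gamma / gamma[0]   # rho_hat_h = gamma_hat_h / gamma_hat_0

# For white noise, rho_hat_0 = 1 exactly and the rest are near zero
rho = sample_acf(np.random.default_rng(0).standard_normal(1000), 6)
```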
IV.17
Weak stationarity: persistence
• implies ρh = ρ−h
• note that ρ0 = cov(Yt , Yt )/ var(Yt ) = 1
IV.18
Sample ACF: examples
UK unemployment rate: change from previous qtr (1960Q1−2016Q4)
[Figure: sample ACF ρ̂h plotted against lag h = 0, . . . , 6]
• Red dashed lines: interval that contains ρ̂h with 95% probability,
assuming an i.i.d. (i.e. uncorrelated) series
IV.19
Sample ACF: examples
UK real GDP qtrly growth rate (1982Q1−2019Q4)
[Figure: sample ACF ρ̂h plotted against lag h = 0, . . . , 6]
• Red dashed lines: interval that contains ρ̂h with 95% probability,
assuming an i.i.d. (i.e. uncorrelated) series
IV.20
Weak stationarity: persistence
IV.21
Stationarity and persistence
[Figure: two time series (1960–2010) and their sample ACFs, ρ̂h against lag h = 0, . . . , 6]
IV.22
Stationarity and persistence
[Figure: log UK real GDP (log £b, 2015 prices, 1990–2020) with its sample ACF, and the UK real GDP quarterly growth rate (per cent per annum, 1990–2020) with its sample ACF; lags h = 0, . . . , 6]
IV.23
Strict stationarity
• {Xt } and {Yt } are jointly strictly stationary if, for every k ≥ 0, the joint distribution of (Xt , Yt , . . . , Xt+k , Yt+k ) does not depend on t
IV.24
Stationarity
IV.25
How to tell if a time series is stationary?
IV.26
Subsection 13.iii
IV.27
What can we do with nonstationary time series?
IV.28
Transforming to stationarity
• Deterministic detrending: Yt − δ0 − δ1 t
• where δ0 and δ1 are estimated (by regression)
• Differencing:
• first difference: ∆Yt := Yt − Yt−1
• seasonal difference (quarterly data): ∆4 Yt = Yt − Yt−4
• Logarithms and growth rates:
• difference of logs: ∆ log Yt := log Yt − log Yt−1
• % growth rate: 100 × ∆ log Yt
• annualised % growth rate: 400 × ∆ log Yt , for quarterly data
• Why log differences, and not % changes?
• log differences are ‘symmetric’ in ± changes
• but % changes are not: ↓ 50% followed by ↑ 50% is a 25% decline!
• what I label a ‘% change’ in figures is always a difference of logs [as per
the usual practice in (macro)economics]
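A quick numerical sketch of these transforms, including the symmetry point about log differences (the function name is mine):

```python
import numpy as np

def growth_rates(y, freq=4):
    """Log-difference growth rates of a positive level series.

    Returns (pct, annualised): 100 * dlog(Y_t), and 100 * freq * dlog(Y_t)
    (so 400 * dlog(Y_t) for quarterly data, as on the slide).
    """
    dlog = np.diff(np.log(np.asarray(y, dtype=float)))
    return 100 * dlog, 100 * freq * dlog

# Symmetry: a fall from 100 to 50 and a rise from 50 to 100 have log
# differences of equal size and opposite sign (about -69.3 and +69.3),
# whereas the ordinary % changes are -50% and +100%.
down, _ = growth_rates([100.0, 50.0])
up, _ = growth_rates([50.0, 100.0])
```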
IV.28
Transforming to stationarity: US real GDP
[Figure: US real GDP (trillion USD, 2012 prices) and its logarithm with the fitted linear trend δ̂0 + δ̂1 t, 1940–2020, together with the sample ACFs of each series for lags h = 0, . . . , 12]
IV.29
Transforming to stationarity: US real GDP
[Figure: detrended log US annual real GDP and the change in log US annual real GDP, 1940–2020, with their sample ACFs for lags h = 0, . . . , 12]
IV.30
Transforming to stationarity
IV.31
Differencing twice to get to stationarity?
[Figure: log US CPI (1982−84 = 100) and its first difference (per cent), 1950–2020, with their sample ACFs for lags h = 0, . . . , 12]
IV.32
Differencing twice to get to stationarity?
[Figure: first and second differences of log US CPI (per cent), 1950–2020, with their sample ACFs for lags h = 0, . . . , 12]
IV.33
Subsection 13.iv
IV.34
Modelling stationary time series
• We’ve seen: macro time series that are plausibly stationary have
• (roughly) constant means and variances
• sample ACFs that decay ‘rapidly’ in h
• Next step: develop simple descriptive models that can
• match the pattern of serial correlation we see in these series
• ‘repackage’ this in a way useful for prediction – for forecasting!
• Autoregressive of order p models: AR(p)
• essential building block for more elaborate models
• [terminology: ‘auto’-regressive: regressed on own lags]
IV.35
AR(1) model
Yt = β0 + β1 Yt−1 + ut
IV.36
AR(1) model
IV.37
AR(1) model: is it fit for purpose?
IV.38
When is an AR(1) process stationary?
• Stationarity depends:
• crucially on β1 : regulates how persistent {Yt } is
• ‘technically’ on Y0 : how the process is ‘initialised’
• To derive requirements on β1 and Y0 : suppose {Yt } is stationary . . .
1. µ := EYt must satisfy µ = β0 + β1 µ, i.e. µ = β0 /(1 − β1 )
IV.39
When is an AR(1) process stationary?
EYt = µ = β0 /(1 − β1 )    var(Yt ) = σY² = σu² /(1 − β1²)

• implies |β1 | ≥ 1 is inconsistent with stationarity
• ‘solution’ for σY² is either ∞ or negative in such cases

EY0 = β0 /(1 − β1 )    var(Y0 ) = σu² /(1 − β1²)
• Preceding shows EYt and var(Yt ) are time invariant: what about
cov(Yt , Yt−h )?
IV.40
When is an AR(1) process stationary?
IV.41
Properties of a stationary AR(1) process
• Population ACF:

ρh := cov(Yt , Yt−h ) / var(Yt ) = β1^h σY² / σY² = β1^h

• equivalently, using β0 = (1 − β1 )µ: Yt = (1 − β1 )µ + β1 Yt−1 + ut
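The property ρh = β1^h is easy to check by simulation; a sketch, with illustrative parameter values (not from the slides):

```python
import numpy as np

def simulate_ar1(beta0, beta1, sigma_u, T, rng):
    """Simulate a stationary AR(1), Y_t = beta0 + beta1*Y_{t-1} + u_t,
    with Y_0 drawn from the stationary distribution (requires |beta1| < 1):
    mean beta0/(1-beta1), variance sigma_u^2/(1-beta1^2)."""
    mu = beta0 / (1 - beta1)
    y = np.empty(T)
    y[0] = mu + (sigma_u / np.sqrt(1 - beta1**2)) * rng.standard_normal()
    u = sigma_u * rng.standard_normal(T)
    for t in range(1, T):
        y[t] = beta0 + beta1 * y[t - 1] + u[t]
    return y

rng = np.random.default_rng(1)
y = simulate_ar1(0.5, 0.75, 1.0, 20_000, rng)
d = y - y.mean()
rho1 = np.sum(d[1:] * d[:-1]) / np.sum(d * d)   # sample rho_hat_1
rho2 = np.sum(d[2:] * d[:-2]) / np.sum(d * d)   # sample rho_hat_2
# rho1 should be near beta1 = 0.75, rho2 near beta1^2 = 0.5625
```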
IV.42
Population ACFs for AR(1) models
[Figure: population ACFs ρh (lags h = 0, . . . , 6) for AR(1) models with β1 = 0.25, 0.5, 0.75, 0.95, together with simulated trajectories for these values and for β1 = −0.75, −0.95; the ACFs for negative β1 alternate in sign]
IV.46
Forecasting: the general problem
• Observe Yt for t = 1, . . . , T ; want to forecast YT+h for h = 1, 2, . . .
• What is the best possible forecast we could make (in theory) of YT+1 , using the available past history {Yt }, t = 1, . . . , T ?

YT+h|T := E[YT+h | YT , YT−1 , . . . , Y1 ]

• also termed the MSFE-minimising forecast
IV.47
Forecasting: with an AR(1) model
• To operationalise: need to estimate YT +1|T := E[YT +1 | YT ]. How?
• If we assume {Yt } is an AR(1) process:
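Under the AR(1) assumption, β0 and β1 can be estimated by OLS of Yt on (1, Yt−1 ), and forecasts iterated forward. A sketch (my implementation, not the slides' code):

```python
import numpy as np

def fit_ar1_ols(y):
    """OLS estimates (b0, b1) in Y_t = b0 + b1*Y_{t-1} + u_t."""
    Y, L = y[1:], y[:-1]
    A = np.column_stack([np.ones(len(L)), L])
    (b0, b1), *_ = np.linalg.lstsq(A, Y, rcond=None)
    return b0, b1

def forecast_ar1(y_T, b0, b1, h):
    """Iterated forecasts Y_{T+1|T}, ..., Y_{T+h|T} from the last value."""
    out, last = [], y_T
    for _ in range(h):
        last = b0 + b1 * last
        out.append(last)
    return np.array(out)

# With b0 = 1, b1 = 0.5 and Y_T = 2, every forecast equals the
# stationary mean b0/(1 - b1) = 2, since Y_T already sits at it.
f = forecast_ar1(2.0, 1.0, 0.5, 3)

rng = np.random.default_rng(3)
y = np.empty(5000)
y[0] = 2.0
for t in range(1, 5000):          # simulate AR(1) with b0 = 1, b1 = 0.6
    y[t] = 1.0 + 0.6 * y[t - 1] + rng.standard_normal()
b0_hat, b1_hat = fit_ar1_ols(y)   # estimates near (1, 0.6)
```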
IV.50
Forecasting: GDP growth
[Figure: actual quarterly % change in GDP, YT+h , plotted against the forecasts ŶT+h|T ]

          YT+h    ŶT+h|T   êT+h
2016 Q1   0.80    0.80      0.00
2016 Q2   1.81    1.56      0.25
2016 Q3   1.22    1.95     −0.73
2016 Q4   2.38    2.16      0.22
2017 Q1   1.98    2.26     −0.29
2017 Q2   1.17    2.32     −1.15
IV.51
Forecast errors: in an AR(1) model
• Error made by the:
• (infeasible) optimal forecast:
IV.52
Forecast performance
• How can we measure the performance of our forecasts?
• use the same criterion as we are trying to minimise: the MSFE!
• for the theoretically optimal forecast:
• Main implications:
• MSFE(ŶT +1|T ) is always larger than MSFE(YT +1|T )
• residual variance σ̂u2 isn’t a good estimate of MSFE(ŶT +1|T )
• we’ll discuss how to estimate MSFE(ŶT +1|T ) in Section 14
IV.53
Subsection 13.vi
IV.54
AR(p) models
IV.55
Possible trajectories for AR(2) processes
[Figure: simulated trajectories and population ACFs (lags h = 0, . . . , 6) for AR(2) processes with (β1 , β2 ) = (0.15, 0.45), (0.45, 0.15), (1, −0.5) and (0.5, 0.45)]
IV.59
Why (linear) AR(p) models?
IV.60
Section 14
Forecasting
IV.61
Roadmap
IV.62
Subsection 14.i
Forecast evaluation
IV.63
Estimating the MSFE
IV.64
Pseudo out-of-sample forecasting
ς̂1² := M̂SFE(ŶT+1|T ) = (1/P) Σ_{s=T−P}^{T−1} ê²_{s+1|s}
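A sketch of the pseudo out-of-sample loop for an AR(1) forecaster (the helper names are mine; any forecasting model could be slotted in):

```python
import numpy as np

def _ar1_one_step(history):
    """Fit Y_t = b0 + b1*Y_{t-1} + u_t by OLS on `history` and return
    the one-step-ahead forecast of the next observation."""
    Y, L = history[1:], history[:-1]
    A = np.column_stack([np.ones(len(L)), L])
    (b0, b1), *_ = np.linalg.lstsq(A, Y, rcond=None)
    return b0 + b1 * history[-1]

def pseudo_oos_msfe(y, P):
    """As on the slide: for each of the last P periods, re-estimate using
    data up to s only, forecast s+1, and average the squared errors."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    errs = [(y[s + 1] - _ar1_one_step(y[: s + 1])) ** 2
            for s in range(n - 1 - P, n - 1)]
    return float(np.mean(errs))

rng = np.random.default_rng(2)
y = np.empty(300)
y[0] = 0.0
for t in range(1, 300):              # AR(1) data with unit-variance shocks
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()
msfe = pseudo_oos_msfe(y, P=50)      # should be near var(u) = 1
```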
IV.66
Example: forecasting GDP growth
• Compares with ŝd(grgdp) = 2.436 over the OOS period
IV.67
Subsection 14.ii
Model selection
IV.68
Which model should we forecast with?
p ∈ {0, 1, . . . , pmax }
IV.69
Fundamental bias–variance trade-off
IV.70
Model selection: formal approaches
1. Directly compare estimated forecast performance: M̂SFE
• forecasting-specific
• may be unreliable if a small P is used to estimate M̂SFE
2. Stepwise ‘testing down’ from a larger to a smaller model
• applicable only to ‘nested’ models
3. Information criteria: penalise fit by the number of model parameters
• very widely applicable, even to ‘non-nested’ models
IV.71
Stepwise ‘testing down’ procedure
IV.72
Information criteria
Akaike IC    AICm := log(SSRm /T ) + m · (2/T )
Bayesian IC  BICm := log(SSRm /T ) + m · (log T )/T
• ‘best’ model minimises one of these criteria
• depends on model fit (SSR), and a penalty that grows with m
• note that AR(p) model has m = p + 1 parameters
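A sketch of computing both criteria for AR(p) models fitted by OLS (helper names mine; note that for a strictly fair comparison all candidate p should share a common estimation sample, which this simple version does not enforce):

```python
import numpy as np

def ar_ssr(y, p):
    """SSR and effective sample size from OLS of Y_t on a constant
    and p own lags (rows t = p+1, ..., T)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    Y = y[p:]
    X = np.column_stack([np.ones(n - p)] +
                        [y[p - i:n - i] for i in range(1, p + 1)])
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return float(resid @ resid), len(Y)

def aic_bic(y, p):
    """AIC_m = log(SSR_m/T) + m*(2/T), BIC_m = log(SSR_m/T) + m*(log T)/T,
    with m = p + 1 parameters in an AR(p), as on the slide."""
    ssr, T = ar_ssr(y, p)
    m = p + 1
    base = np.log(ssr / T)
    return base + m * 2 / T, base + m * np.log(T) / T

rng = np.random.default_rng(4)
y = np.empty(500)
y[0] = 0.0
for t in range(1, 500):              # true model is AR(1)
    y[t] = 0.8 * y[t - 1] + rng.standard_normal()
aic0, bic0 = aic_bic(y, 0)
aic1, bic1 = aic_bic(y, 1)           # both criteria should prefer p = 1
```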
IV.73
Information criteria
• selecting the model with the smallest SER imposes a smaller per-parameter penalty, since

2 log SERm ≃ log(SSRm /T ) + m · (1/T )

and so tends to choose models with weaker forecasting performance
IV.74
Model selection: example
• Estimate AR(p) models for GDP growth and the change in the
unemployment rate (both 1986Q1–2018Q4):
IV.75
Why did we not allow ‘gaps’?
IV.76
Subsection 14.iii
Additional predictors:
forecasting and Granger causality
IV.77
Beyond the AR(p) model
IV.78
The ADL(p,q) model
• Take an AR(p) model and add in q lags of another predictor {Xt }:
IV.79
Forecasting with an ADL(p,q) model
Yt = β0 + Σ_{i=1}^{p} βi Yt−i + Σ_{i=1}^{q} δi Xt−i + ut ,    E[ut | Yt−1 , Xt−1 ] = 0

YT+1|T := E[YT+1 | YT , XT ]
        = E[ β0 + Σ_{i=1}^{p} βi YT+1−i + Σ_{i=1}^{q} δi XT+1−i + uT+1 | YT , XT ]
        = β0 + Σ_{i=1}^{p} βi YT+1−i + Σ_{i=1}^{q} δi XT+1−i
IV.80
Forecasting with an ADL(p,q) model
Yt = β0 + Σ_{i=1}^{p} βi Yt−i + Σ_{i=1}^{q} δi Xt−i + ut ,    E[ut | Yt−1 , Xt−1 ] = 0
IV.81
Forecasting with an ADL(p,q) model
• What about h-step ahead forecasting?
• suppose p = q = 1 for simplicity, i.e. Yt = β0 + β1 Yt−1 + δ1 Xt−1 + ut
• if h = 2: then
YT +2|T = E[YT +2 | YT , XT ]
= E[β0 + β1 YT +1 + δ1 XT +1 + uT +2 | YT , XT ]
= β0 + β1 E[YT +1 | YT , XT ] + δ1 E[XT +1 | YT , XT ]
= β0 + β1 YT +1|T + δ1 XT +1|T
IV.82
Forecasting with an ADL(p,q) model: example
• Suppose we are in 2016Q1:
• want to forecast quarterly UK GDP growth rate
(grgdp := 400∆ log GDP) for the next quarter
• additionally including the term spread (term; difference between the
10-year and 3-month bond rate)
ŶT +1|T = β̂0 + β̂1 YT + δ̂1 XT = 0.83 + 0.60 × 0.80 + 0.18 × 1.02 = 1.49
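The forecast arithmetic can be checked directly (coefficient and data values are taken from the slide):

```python
# One-step ADL(1,1) forecast: Y_hat_{T+1|T} = b0 + b1*Y_T + d1*X_T,
# with the slide's estimates b0 = 0.83, b1 = 0.60, d1 = 0.18, and
# Y_T = 0.80 (GDP growth, 2016Q1), X_T = 1.02 (term spread, 2016Q1)
b0_hat, b1_hat, d1_hat = 0.83, 0.60, 0.18
Y_T, X_T = 0.80, 1.02

y_forecast = b0_hat + b1_hat * Y_T + d1_hat * X_T
# 0.83 + 0.48 + 0.1836 = 1.4936, i.e. 1.49 after rounding
```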
IV.83
Does {Xt } help to forecast {Yt }?
• Referred to in econom(ics/etrics) as the problem of Granger causality
• We say Xt Granger causes Yt if:
• informally: lags of Xt improve forecasts of Yt made on the basis of its own lags
• formally: lags of Xt reduce the optimal forecast’s MSFE:
• or equivalently:

E[YT+1 | YT , XT ] ≠ E[YT+1 | YT ]
IV.84
Testing for Granger causality
Yt = β0 + Σ_{i=1}^{p} βi Yt−i + Σ_{i=1}^{p} δi Xt−i + ut ,    E[ut | Yt−1 , Xt−1 ] = 0

H0 : δ1 = δ2 = · · · = δp = 0

• a null of Granger non-causality
• a test of p linear restrictions: F is asymptotically Fp,∞ distributed
• unrestricted model: ADL(p, p), with k = 2p slope coefficients
• restricted model: AR(p)
• If reject H0 : conclude {Xt } Granger causes {Yt }
• Good practice to report F from several p’s, as a robustness check
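The homoskedasticity-only F statistic can be computed from the restricted (AR(p)) and unrestricted (ADL(p,p)) SSRs. A sketch (function names and simulated data are mine):

```python
import numpy as np

def _ssr(Y, X):
    """OLS sum of squared residuals of Y on X (X includes a constant)."""
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return float(resid @ resid)

def granger_f(y, x, p):
    """F statistic for H0: delta_1 = ... = delta_p = 0 in
    Y_t = b0 + sum_i b_i Y_{t-i} + sum_i d_i X_{t-i} + u_t."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    n = len(y)
    Y = y[p:]
    const = [np.ones(n - p)]
    ylags = [y[p - i:n - i] for i in range(1, p + 1)]
    xlags = [x[p - i:n - i] for i in range(1, p + 1)]
    ssr_r = _ssr(Y, np.column_stack(const + ylags))            # AR(p)
    ssr_u = _ssr(Y, np.column_stack(const + ylags + xlags))    # ADL(p,p)
    df = len(Y) - (2 * p + 1)            # residual df in unrestricted model
    return ((ssr_r - ssr_u) / p) / (ssr_u / df)

rng = np.random.default_rng(5)
x = rng.standard_normal(500)
y = np.empty(500)
y[0] = 0.0
for t in range(1, 500):                  # x genuinely leads y
    y[t] = 0.3 * y[t - 1] + 0.8 * x[t - 1] + rng.standard_normal()
f_xy = granger_f(y, x, p=2)   # large: x Granger-causes y
f_yx = granger_f(x, y, p=2)   # small: y does not Granger-cause x
```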
IV.85
Granger causality: examples
IV.86
Does output Granger-cause unemployment?
• Table reports estimates for ADL(p,p) models:
∆urt = β0 + Σ_{i=1}^{p} βi ∆urt−i + Σ_{i=1}^{p} δi grgdpt−i + ut
IV.87
Does unemployment Granger-cause output?
• Could also run in reverse: can unemployment help forecast output?
grgdpt = β0 + Σ_{i=1}^{p} βi grgdpt−i + Σ_{i=1}^{p} δi ∆urt−i + ut
IV.88
Granger causality: stock prices and dividends
IV.89
Stock prices and dividends: for the S&P 500
[Figure: four panels: the S&P 500 index and its dividends in levels, 1940–2020, and the corresponding index and dividend growth rates]
IV.90
Granger causality: stock prices and dividends
• Stock prices (pr) and dividends (div) are highly nonstationary, so
estimate model in growth rates
∆ log divt = β0 + Σ_{i=1}^{p} βi ∆ log divt−i + Σ_{i=1}^{p} δi ∆ log prt−i + ut
(Sample: US, 1930Q1–2018Q4; pr and div are from the S&P 500)
IV.91
Section 15
IV.92
Stationarity: revisited
IV.93
Subsection 15.i
IV.94
Parameter instability
IV.95
Example: UK real GDP growth
Real GDP: qtrly growth rate
[Figure: UK quarterly real GDP growth (per cent per annum), with subsample means and standard deviations reported, and sample ACFs (lags h = 0, . . . , 6) for two subsamples]
IV.96
‘Breaks’ and parameter instability
• ‘Break’: abrupt change in model parameters on (or very near) a
particular date; modelled using breakpoint dummies
• Suppose we think things changed on (or around) 1989Q4; we could
specify:
Yt = β0 + β1 Yt−1 + γ0 Dt (τ ) + γ1 Dt (τ ) · Yt−1 + ut
• Implications?
• intercept is: β0 + γ0 until 1989Q4, β0 thereafter
• coef. on Yt−1 is: β1 + γ1 until 1989Q4, β1 thereafter
• equivalent to estimating separate AR(1) models on 1960–1989 and
1990–2018 subsamples
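Both the dummy-interaction regression and the claimed equivalence to two subsample fits can be verified numerically. A sketch (my code; the convention Dt = 1{t ≤ τ} follows the slide's 'until 1989Q4' reading, and the data here are arbitrary since the equivalence holds for any sample):

```python
import numpy as np

def break_ar1_fit(y, tau):
    """OLS fit of Y_t = b0 + b1*Y_{t-1} + g0*D_t + g1*D_t*Y_{t-1} + u_t,
    with D_t = 1 for t <= tau. Returns (b0, b1, g0, g1)."""
    Y, L = y[1:], y[:-1]
    t = np.arange(1, len(y))
    D = (t <= tau).astype(float)
    X = np.column_stack([np.ones_like(L), L, D, D * L])
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef

rng = np.random.default_rng(6)
y = rng.standard_normal(201).cumsum() * 0.1 + rng.standard_normal(201)
b0, b1, g0, g1 = break_ar1_fit(y, tau=100)

# Equivalence check: the dummy model reproduces two separate AR(1) fits,
# with (b0+g0, b1+g1) on the pre-break sample and (b0, b1) after it.
Y, L = y[1:], y[:-1]
pre, post = slice(0, 100), slice(100, None)   # t = 1..100 and t = 101..200
a_pre, *_ = np.linalg.lstsq(
    np.column_stack([np.ones(100), L[pre]]), Y[pre], rcond=None)
a_post, *_ = np.linalg.lstsq(
    np.column_stack([np.ones(len(Y) - 100), L[post]]), Y[post], rcond=None)
```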
IV.97
Chow test for a break
IV.98
Testing for a break in UK GDP: at 1989Q4
• Easily reject the null: strong evidence of a break in the AR(1) model
describing GDP growth: at (or around) 1989Q4
IV.99
What if the break date is unknown?
IV.100
What if the break date is unknown?
• What is π?
• ‘trimming parameter’, to ensure enough pre- and post-break
observations
• π = 15% is a reasonable choice, in most macro applications
• QLR involves many, many tests:
• critical values larger than for the F test (see Table 14.5 in S&W)
• depends on: q = # parameters allowed to break; trimming π
• e.g. if q = 2 and π = 0.15, c0.05 = 5.86 (v. 3.00 for F test)
• Break date estimator? The maximiser τ̂ of these F statistics:
τ̂ := argmax F (τ )
τ0 ≤τ ≤τ1
IV.101
QLR test for a break in UK real GDP growth
[Figure: UK real GDP quarterly growth rate (per cent per annum), and the sequence of QLR F statistics across candidate break dates, with 1% and 5% critical values marked]
IV.104
Breaks: summary
IV.105
Subsection 15.ii
IV.106
Trends / ‘random wandering’
IV.107
Examples of trends / ‘random wandering’
[Figure: two randomly wandering series (per cent per annum, 1960–2020) and their sample ACFs, lags h = 0, . . . , 6]
IV.108
Examples of trends / ‘random wandering’
[Figure: two trending series (one in logs, one in per cent), 1960–2020, and their sample ACFs, lags h = 0, . . . , 6]
IV.109
AR models for nonstationary time series
β1 + β2 = (γ1 + 1) + (−γ1 ) = 1
IV.110
Unit root processes
{∆Yt } is AR(p − 1) ⇐⇒ {Yt } is AR(p) with Σ_{i=1}^{p} βi = 1

• such {Yt } are termed unit root models / processes
• So what are the properties of unit root AR models?
IV.111
Unit roots: AR(1) case
• Unit root AR(1) process has β1 = 1:
Yt = β0 + Yt−1 + ut
3. an initial value: Y0
IV.112
Unit roots: AR(1) case
Yt = β0 t + Σ_{s=1}^{t} us + Y0 =: β0 t + Ut + Y0

Ut+1 = Σ_{s=1}^{t+1} us = Σ_{s=1}^{t} us + ut+1 = Ut + ut+1
IV.113
Trajectories of unit root AR(1): driftless
[Figure: four simulated trajectories of driftless (β0 = 0) unit root AR(1) processes, 1960–2020; each wanders randomly with no tendency to revert to a fixed level]
IV.114
Trajectories of unit root AR(1): with drift
[Figure: simulated trajectories of unit root AR(1) processes with drift β0 = 0.2, 0.3, 0.4, 0.5, 1960–2020; each wanders around a linear trend with slope β0 ]
IV.115
Unit roots: AR(p) case
Yt = β0 t + Vt + Y0

1. β0 t: deterministic trend
2. Vt := Σ_{s=1}^{t} vs , for {vt } stationary: a stochastic trend
IV.116
Trajectories of unit root AR(p) processes
[Figure: simulated trajectories of unit root AR(4) processes (β0 = 0 in the top row), 1960–2020]
IV.117
Deterministic and stochastic trends
1. Deterministic trend: β0 t
• generates linear growth
2. Stochastic trend: Vt := Σ_{s=1}^{t} vs for {vt } stationary and lrvar(vt ) > 0
• generates (and is synonymous with) ‘random wandering’ behaviour
• special case: a random walk if {vt } is serially uncorrelated
IV.118
Unit root processes: do they match real series?
IV.119
Unit root processes: do they match real series?
[Figure: the UK unemployment rate (per cent of labour force) beside a simulated unit root series, 1960–2020, with their sample ACFs, lags h = 0, . . . , 6]
IV.120
Unit root processes: do they match real series?
[Figure: two further series, 1960–2020, with their sample ACFs, lags h = 0, . . . , 6]
IV.120
Unit root processes: do they match real series?
[Figure: log US annual real GDP (log $tr, 2012 prices, 1940–2020) and log UK quarterly real GDP (1960–2020), each with a fitted linear trend δ̂0 + δ̂1 t, together with the corresponding detrended series]
IV.121
Subsection 15.iii
IV.122
Handling unit roots in time series
IV.123
Pitfalls of unit roots
IV.124
Pitfalls of unit roots
• Conclusion? If we know {Yt } has a unit root, it is still preferable to fit
an AR (or ADL) model to {∆Yt }
• estimating the AR model for {∆Yt } implicitly imposes a unit root in
Yt : reduces estimation error / biases
• forecasts for Yt can be recovered as per ŶT+1|T = YT + ∆ŶT+1|T
IV.125
Testing for a unit root: AR(1) case
H0 : β1 = 1 v. H1 : β1 < 1
• Why the one-sided alternative?
• β1 < 1: consistent with stationarity (provided β1 > −1; not tested)
• β1 > 1: implies ‘explosive’ behaviour, which is usually implausible
• one-sided test interprets evidence for β1 > 1 as evidence against
stationarity (correctly)
• one-sided test means smaller critical value; a more powerful test!
IV.126
Explosive AR(1) processes: β1 > 1
[Figure: simulated trajectories of explosive AR(1) processes with β0 = 0 and β1 = 1.01, 1.02, 1.03, 1.04, 1960–2020; each diverges at an accelerating rate]
IV.127
Testing for a unit root: AR(1) case
H0 : δ = 0 v. H1 : δ < 0
IV.129
Example: is there a unit root in unemployment?
∆Yt = β0 + δYt−1 + ut
t = δ̂ / se(δ̂) = −0.00769/0.00595 = −1.29
∆Yt = β0 + δYt−1 + ut
• AR(1) model is very restrictive
• implies e.g. ∆Yt is serially uncorrelated (under H0 )
• may give misleading conclusions about presence of a unit root
• How to test for a unit root in the more general AR(p) setting?
• consider adding a lag of ∆Yt to the model above:
IV.131
Testing for a unit root: AR(p) case
• implies

β1 + β2 = (δ + γ1 + 1) + (−γ1 ) = δ + 1,   which is = 1 if δ = 0, and < 1 if δ < 0
IV.132
ADF test: constant only
• ‘Augmented’ Dickey–Fuller (ADF) test for a unit root
∆Yt = β0 + δYt−1 + Σ_{i=1}^{p} γi ∆Yt−i + ut

1. Hypotheses:
H0 : δ = 0 v. H1 : δ < 0
• null: {∆Yt } is AR(p), so {Yt } has a unit root
• alternative: {Yt } is stationary AR(p + 1)
2. Test statistic: homoskedasticity-only t statistic: t := δ̂ / se(δ̂) →d DFcn
3. Decision rule: reject if t < cα
4. Critical values: from tabulation of DFcn
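A sketch of the constant-only ADF regression and its t statistic in numpy (function name and simulated data mine; the resulting t should be compared with DFcn critical values such as −2.86, not normal ones):

```python
import numpy as np

def adf_tstat(y, p):
    """t statistic on delta in the constant-only ADF regression
    dY_t = b0 + delta*Y_{t-1} + sum_{i=1}^p g_i dY_{t-i} + u_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    n = len(dy)
    Y = dy[p:]
    cols = [np.ones(n - p), y[p:-1]]                    # constant, Y_{t-1}
    cols += [dy[p - i:n - i] for i in range(1, p + 1)]  # lagged differences
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (len(Y) - X.shape[1])          # homoskedastic s^2
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return float(beta[1] / se)

rng = np.random.default_rng(7)
u = rng.standard_normal(500)
y_stat = np.empty(500)
y_stat[0] = 0.0
for t in range(1, 500):                  # stationary AR(1), beta1 = 0.5
    y_stat[t] = 0.5 * y_stat[t - 1] + u[t]
y_rw = np.cumsum(rng.standard_normal(500))   # random walk (unit root)

t_stat = adf_tstat(y_stat, p=2)   # strongly negative: reject the unit root
t_rw = adf_tstat(y_rw, p=2)       # typically not below -2.86
```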
IV.133
Example: is there a unit root in unemployment?
t = δ̂ / se(δ̂) = −2.41 ≮ −2.86 = c0.05
• Still do not reject H0 (at α = 0.05): so unemployment has a unit root
• note: use of normal critical values (c0.05 = −1.64) would incorrectly
lead to a rejection of H0 here
IV.134
ADF test: a limitation
∆Yt = β0 + δYt−1 + Σ_{i=1}^{p} γi ∆Yt−i + ut

• under the null (δ = 0), Yt = β0 t + Vt + Y0
• has a deterministic trend (if β0 ≠ 0)
• but under the alternative, {Yt } is stationary: with no trend!
• Could lead to misleading conclusions, when applied to series with a
linear trend, e.g. log GDP
• may fail to reject the null, because only the null allows for a trend
• we need to allow for a linear trend under the alternative
• accommodated by augmenting model by a linear trend, αt
• E.g. an (otherwise) stationary AR(1) alternative (β1 < 1):
Yt = β0 + αt + β1 Yt−1 + ut
• Yt is stationary ‘around’ a linear trend: a trend stationary process
• linearly detrending Yt would yield a stationary process
IV.135
ADF test: constant and trend
∆Yt = β0 + αt + δYt−1 + Σ_{i=1}^{p} γi ∆Yt−i + ut

1. Hypotheses:
H0 : δ = 0 v. H1 : δ < 0
• null: {∆Yt } is AR(p), {Yt } has a unit root
• alternative: {Yt } is trend stationary AR(p + 1)
2. Test statistic: homoskedasticity-only t statistic: t := δ̂ / se(δ̂) →d DFtr
IV.136
The Dickey–Fuller distributions
[Figure: densities of the Dickey–Fuller distributions DFcn (constant only) and DFtr (constant and trend) against the N[0,1] density; both DF distributions are shifted to the left of the normal]
∆Yt = β0 + αt + δYt−1 + Σ_{i=1}^{p} γi ∆Yt−i + ut
IV.138
Example: is there a unit root in (log) real GDP?
∆Yt = β0 + αt + δYt−1 + Σ_{i=1}^{p} γi ∆Yt−i + ut
1960Q1–2018Q4 1990Q1–2018Q4
p AIC BIC tδ=0 AIC BIC tδ=0
IV.139
Subsection 15.iv
Orders of integration
IV.140
Orders of integration
∆Yt = β0 + δYt−1 + Σ_{i=1}^{p} γi ∆Yt−i + ut
• Suppose we accept H0 : δ = 0:
• conclude {Yt } has a unit root
• does this imply {∆Yt } is stationary?
• {∆Yt } follows an AR(p) model
∆Yt = β0 + Σ_{i=1}^{p} γi ∆Yt−i + ut

• but it is possible that Σ_{i=1}^{p} γi = 1: {∆Yt } may itself have a unit root!
• could it be ‘unit roots all the way down’?
IV.141
Orders of integration
IV.142
Orders of integration
• Most economic time series are (approximately) I (d) for d ∈ {0, 1}, a
very small number are I (2)
• I (0): stationary processes
• I (1): have stochastic trends, but differences are stationary
• I (2): have to be differenced twice to obtain a stationary process
• I (d), d ≥ 1 series are termed integrated processes
• I (2): are randomly wandering, but even more persistent (‘smoother’)
than I (1) processes
IV.143
Example: is the (log) price level I (2)?
• Previously, we noted might be necessary to difference US log CPI
(1950–2019) twice to obtain a stationary series
• Does this bear up to formal unit root tests?
log CPIt ∆ log CPIt ∆2 log CPIt
constant + trend constant only constant only
IV.144
Differencing twice to get to stationarity?
[Figure: log US CPI (1982−84 = 100) and its first difference (per cent), 1950–2020, with their sample ACFs for lags h = 0, . . . , 12]
IV.145
Differencing twice to get to stationarity?
[Figure: first and second differences of log US CPI (per cent), 1950–2020, with their sample ACFs for lags h = 0, . . . , 12]
IV.146
Beyond the AR model
IV.147
Subsection 15.v
IV.148
Handling nonstationarities
• Finally: distil what we’ve covered into a ‘recipe’ for forecasting with
nonstationary data
1. Plot the series and its sample ACF
• Does the series pass the ‘eyeball test’ for stationarity?
• or is a stochastic trend possibly present? =⇒ Step 2
• or is it stationary but for possible breaks? =⇒ Step 3
IV.149
Handling stochastic trends
IV.150
Handling breaks
IV.151
Section 16
Cointegration
IV.152
Regressions with integrated processes
IV.153
Subsection 16.i
Spurious regression
IV.154
Regression: with i.i.d. or stationary processes
Yt = β0 + β1 Xt + ut cov(Xt , ut ) = 0
IV.155
Regression: with stochastic trends
• Suppose {Xt } and {Yt } are independent random walks
Xt = Σ_{s=1}^{t} εxs        Yt = Σ_{s=1}^{t} εys

• we might expect: β̂1 →p 0, and t(0) = (β̂1 − 0)/se(β̂1 ) →d ξ
• but what actually happens is quite the opposite!
IV.156
Regression: with stochastic trends
P{|t(0)| > c} → 1
IV.157
Distribution of the t statistic
• Simulation:
• generate 1000 pairs of independent random walks {Xt }T T
t=1 and {Yt }t=1
• regress {Yt } on {Xt }, and compute the t statistic for H0 : β1 = 0
[Figure: simulated distribution of the t statistic for T = 50, 100, 200, with ±2.58 marked; substantial mass lies outside ±2.58, and the distribution spreads out further as T grows]
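The simulation described above is simple to reproduce. A sketch (200 replications rather than 1000, to keep it quick; the nominal level for |t| > 2.58 would be about 1%):

```python
import numpy as np

def spurious_rejection_rate(T, reps, crit=2.58, seed=0):
    """Fraction of replications with |t| > crit when one random walk is
    regressed on an independent one: Y_t = b0 + b1*X_t + u_t, H0: b1 = 0."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        x = np.cumsum(rng.standard_normal(T))
        y = np.cumsum(rng.standard_normal(T))
        X = np.column_stack([np.ones(T), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        s2 = resid @ resid / (T - 2)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
        hits += abs(beta[1]) / se > crit
    return hits / reps

rate = spurious_rejection_rate(T=100, reps=200)
# far above the nominal level: the 'significant' relationship is spurious
```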
IV.159
Spurious regression: example
• Does the (log of) US industrial production (IP) help to account for
unemployment in the UK?
β̂1 se tβ1 =0 R2
IV.160
Spurious regression: example
UK unemployment rate
[Figure: the UK unemployment rate (per cent) plotted alongside US industrial production]
IV.161
Spurious regression: and stochastic trends
IV.162
Spurious regression: diagnosis
IV.163
Spurious regression: residuals
UK unemployment rate
[Figure: the UK unemployment rate (per cent), US industrial production (index), and the residual from regressing one on the other]
IV.164
Subsection 16.ii
IV.165
Regressions with integrated processes
IV.166
Cointegration: definition and interpretation
IV.167
Example: I (1) processes that tend to co-move
UK 10−year and 3−month Treasury bond yields (annualised)
[Figure: the two yield series (per cent) plotted together, with their difference shown below]
IV.168
Example: I (1) processes that tend to co-move
UK, 1870–2011: real wages [1985=1] and real output per worker [£1985 prices]
[Figure: log real output per worker and log hourly real wages, 1870–2010, plotted on separate scales; the two series trend upward together]
IV.169
Cointegration: mathematical illustration
• Suppose vt , wyt and wxt are mean zero and I (0), and
Yt = Σ_{s=1}^{t} (µ + vs ) + wyt        Xt = (1/θ) Σ_{s=1}^{t} (µ + vs ) + wxt
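This construction is easy to simulate, and shows both the common stochastic trend and the stationarity of Yt − θXt (a sketch; θ = 2 and the noise variances are illustrative choices of mine):

```python
import numpy as np

theta = 2.0
rng = np.random.default_rng(8)
T = 2000
S = np.cumsum(0.1 + 0.5 * rng.standard_normal(T))  # common trend: sum of (mu + v_s)
y = S + rng.standard_normal(T)                     # Y_t = S_t + w_yt
x = S / theta + rng.standard_normal(T)             # X_t = S_t/theta + w_xt

resid = y - theta * x           # = w_yt - theta*w_xt: I(0) by construction
A = np.column_stack([np.ones(T), x])
theta_hat = np.linalg.lstsq(A, y, rcond=None)[0][1]
# theta_hat lands very close to 2 (OLS is consistent under cointegration),
# the levels have variance growing with T, and resid's variance stays bounded
```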
IV.170
Cointegration: implications for OLS
• Does OLS regression of Yt on Xt make sense if there is cointegration?
• OLS chooses (α̂, θ̂) to solve

min_{(a,c)} Σ_{t=1}^{T} (Yt − a − cXt )²

• Conclusion: if Xt and Yt are cointegrated, θ̂ →p θ
• but because Xt ∼ I (1), θ̂ has a non-standard limiting distribution
• other estimators more efficient than OLS (and asymptotically normal!):
beyond scope of this course
IV.171
Contrast with spurious regression
• What if Yt , Xt ∼ I (1) but do not have a common stochastic trend?
Yt = Σ_{s=1}^{t} vys + wyt        Xt = Σ_{s=1}^{t} vxs + wxt

• there is no linear combination that eliminates the trends, even if vys and vxs are (imperfectly) correlated
• for every γ,

Yt − γXt = Σ_{s=1}^{t} (vys − γvxs ) + (wyt − γwxt ) ∼ I(1)
• Implication?
• residuals from a spurious regression will appear I (1):
IV.173
Example: real wages and output per worker
• Yt = log real wages; Xt = log output per worker
1. Perform ADF tests on both series
• constant + trend on levels; constant only on differences
• with lag order selected by AIC in each case
Yt ∆Yt Xt ∆Xt
IV.175
Example: the term spread
ξt := Yt − θXt = Yt − Xt
IV.176
Cointegration: extensions
IV.177
Lessons learned
• If Xt , Yt ∼ I (1):
• regression of Yt on Xt consistently estimates the cointegrating
relationship, if there is one
• otherwise, estimates can spuriously indicate a significant relationship
• you have to check the order of integration before running time series
regression: often the ‘eyeball ADF test’ is sufficient
• and then check the order of integration of your residuals!
• Spurious regression only arises when regressions are ‘unbalanced’
• when the stochastic trend in Yt does not appear on the r.h.s.
• won’t arise in a regression with lagged Yt , e.g. in
Yt = β0 + β1 Yt−1 + γ2 Xt + ut
IV.178