
Stationarity, cointegration

Arnaud Chevalier
University College Dublin
January 2004
STATIONARITY

Typically, we only observe one set of realisations for any particular series.
However, if yt is stationary, the mean, variance and autocorrelations can
usually be well approximated by sufficiently long time averages based on a
single set of realisations.
A stochastic process having a finite mean and variance is covariance
stationary if for all t and t-s:

E(y_t) = E(y_{t-s}) = μ
E[(y_t - μ)^2] = E[(y_{t-s} - μ)^2] = σ_y^2
E[(y_t - μ)(y_{t-s} - μ)] = E[(y_{t-j} - μ)(y_{t-j-s} - μ)] = γ_s

A series is covariance stationary if its mean and all autocovariances are
unaffected by a change of time origin.
For a covariance stationary series, the autocorrelation between y_t and y_{t-s} is:

ρ_s = γ_s / γ_0

The autocorrelation is independent of time.
* Stationarity conditions for an AR(1) process

y_t = a_0 + a_1 y_{t-1} + ε_t

Suppose the process started in period zero, so y_0 is a deterministic
initial condition. Solving recursively:

y_t = a_0 Σ_{i=0}^{t-1} a_1^i + a_1^t y_0 + Σ_{i=0}^{t-1} a_1^i ε_{t-i}

E(y_t) = a_0 Σ_{i=0}^{t-1} a_1^i + a_1^t y_0

So the mean is time dependent, and this sequence is therefore not
stationary.
If |a_1| < 1, a_1^t y_0 converges to 0 as t → ∞.
Also, as t becomes large, a_0 Σ_{i=0}^{t-1} a_1^i → a_0 / (1 - a_1).

Thus lim y_t = a_0 / (1 - a_1) + Σ_{i=0}^∞ a_1^i ε_{t-i}

And for large t, E(y_t) = a_0 / (1 - a_1), which is finite and time independent.
The limit of the variance is:

E[(y_t - μ)^2] = E[(ε_t + a_1 ε_{t-1} + a_1^2 ε_{t-2} + ...)^2]
              = σ^2 [1 + a_1^2 + a_1^4 + ...] = σ^2 / (1 - a_1^2)

which is also finite and time-independent.
Similarly, the limiting values of all autocovariances are finite and time-independent:

E[(y_t - μ)(y_{t-s} - μ)] = E[(ε_t + a_1 ε_{t-1} + ...)(ε_{t-s} + a_1 ε_{t-s-1} + ...)]
                          = σ^2 a_1^s [1 + a_1^2 + a_1^4 + ...] = σ^2 a_1^s / (1 - a_1^2)

If |a_1| < 1 and t is large, y_t is a stationary process.
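A quick simulation can illustrate these limits. The sketch below uses illustrative parameter values (not from the text) and checks that the sample mean and variance of a long AR(1) path approach a_0/(1 - a_1) and σ^2/(1 - a_1^2):

```python
import numpy as np

# Sketch: simulate a stationary AR(1) y_t = a0 + a1*y_{t-1} + eps_t with |a1| < 1
# and check that sample moments approach the theoretical limits
# a0/(1 - a1) and sigma^2/(1 - a1^2). Parameter values are illustrative.
rng = np.random.default_rng(0)
a0, a1, sigma, T = 1.0, 0.5, 1.0, 200_000

y = np.empty(T)
y[0] = a0 / (1 - a1)             # start at the theoretical mean
eps = rng.normal(0.0, sigma, T)
for t in range(1, T):
    y[t] = a0 + a1 * y[t - 1] + eps[t]

print(y.mean())                  # ~ a0/(1-a1) = 2
print(y.var())                   # ~ sigma^2/(1-a1^2) = 4/3
```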


4 STATIONARITY RESTRICTIONS FOR AN ARMA(p,q)

First consider an ARMA(2,1):

y_t = a_1 y_{t-1} + a_2 y_{t-2} + β_1 ε_{t-1} + ε_t   (16)

y_t can also be written as an infinite MA process:

y_t = Σ_{i=0}^∞ α_i ε_{t-i}   (17)

For the two expressions to be equal we must have:

α_0 ε_t + α_1 ε_{t-1} + ... = a_1 (α_0 ε_{t-1} + α_1 ε_{t-2} + ...) + a_2 (α_0 ε_{t-2} + α_1 ε_{t-3} + ...) + β_1 ε_{t-1} + ε_t

This means:

α_0 = 1
α_1 = a_1 α_0 + β_1 = a_1 + β_1
α_i = a_1 α_{i-1} + a_2 α_{i-2}   for i ≥ 2

From (17) it is easy to see that E(y_t) = 0 and var(y_t) = σ^2 Σ_{i=0}^∞ α_i^2,
which are finite and time invariant.

The covariance between y_t and y_{t-s} is constant and independent of t:

cov(y_t, y_{t-s}) = σ^2 (α_s + α_{s+1} α_1 + α_{s+2} α_2 + ...)

These results can be generalised to an ARMA(p,q).

- A finite MA process will always be stationary.


5 AUTOCORRELATION FUNCTION

* AR(1) process: y_t = a_0 + a_1 y_{t-1} + ε_t

γ_0 = σ^2 / (1 - a_1^2)
γ_s = σ^2 a_1^s / (1 - a_1^2)

Thus the autocorrelations are ρ_s = γ_s / γ_0, hence ρ_0 = 1, ρ_1 = a_1, ρ_s = a_1^s.

For an AR(1) process, a necessary condition for stationarity is |a_1| < 1.


* AR(2) process: y_t = a_1 y_{t-1} + a_2 y_{t-2} + ε_t

For an AR(2) to be stationary, the roots of (1 - a_1 L - a_2 L^2) need to be
outside the unit circle.
We use the Yule-Walker technique to find the autocorrelations:
multiplying y_t by y_{t-s} for s = 0, 1, ... and taking expectations, we form:

E(y_t y_t) = a_1 E(y_{t-1} y_t) + a_2 E(y_{t-2} y_t) + E(ε_t y_t)
...
E(y_t y_{t-s}) = a_1 E(y_{t-1} y_{t-s}) + a_2 E(y_{t-2} y_{t-s}) + E(ε_t y_{t-s})

Thus,

γ_0 = a_1 γ_1 + a_2 γ_2 + σ^2
γ_s = a_1 γ_{s-1} + a_2 γ_{s-2}

Dividing γ_s by γ_0 yields:

ρ_s = a_1 ρ_{s-1} + a_2 ρ_{s-2}   (19)

and we know that ρ_0 = 1. For a stationary process, the characteristic roots of
the difference equation (19) lie inside the unit circle.
* Autocorrelation function of an MA(1) process: y_t = ε_t + β ε_{t-1}
By the Yule-Walker equations, we get:

γ_0 = E(y_t y_t) = E[(ε_t + β ε_{t-1})(ε_t + β ε_{t-1})] = (1 + β^2) σ^2
γ_1 = E(y_t y_{t-1}) = E[(ε_t + β ε_{t-1})(ε_{t-1} + β ε_{t-2})] = β σ^2
γ_s = E(y_t y_{t-s}) = E[(ε_t + β ε_{t-1})(ε_{t-s} + β ε_{t-s-1})] = 0 for s > 1

Thus we have ρ_0 = 1, ρ_1 = β / (1 + β^2), ρ_s = 0 for s > 1.
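This cutoff after lag 1 is easy to verify by simulation; a sketch with an illustrative β = 0.6 (not a value from the text):

```python
import numpy as np

# Sketch: the ACF of an MA(1) y_t = eps_t + beta*eps_{t-1} cuts off after lag 1.
# beta = 0.6 is an assumed, illustrative value.
rng = np.random.default_rng(1)
beta, T = 0.6, 100_000
eps = rng.normal(size=T + 1)
y = eps[1:] + beta * eps[:-1]

def sample_acf(x, s):
    """Sample autocorrelation at lag s."""
    x = x - x.mean()
    return (x[s:] * x[:-s]).sum() / (x * x).sum() if s > 0 else 1.0

rho1_theory = beta / (1 + beta**2)      # theoretical rho_1
print(sample_acf(y, 1), rho1_theory)    # close to each other
print(sample_acf(y, 2))                 # ~ 0: cutoff after lag 1
```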


* Autocorrelation of an ARMA(1,1) process: y_t = a_1 y_{t-1} + ε_t + β ε_{t-1}

By the Yule-Walker equations, we find that:

γ_0 = a_1 γ_1 + σ^2 + β (a_1 + β) σ^2   (20)
γ_1 = a_1 γ_0 + β σ^2   (21)
γ_2 = a_1 γ_1
γ_s = a_1 γ_{s-1}

Solving (20) and (21) simultaneously, we get:

γ_0 = (1 + β^2 + 2 a_1 β) σ^2 / (1 - a_1^2)
γ_1 = (1 + a_1 β)(a_1 + β) σ^2 / (1 - a_1^2)

Hence ρ_1 = (1 + a_1 β)(a_1 + β) / (1 + β^2 + 2 a_1 β),
and ρ_s = a_1 ρ_{s-1} for all s ≥ 2.
6 PARTIAL AUTOCORRELATION

In an AR(1) process, yt and yt-2 are correlated even though yt-2 does not
directly appear in the model. In contrast the partial autocorrelation between
yt and yt-s eliminates the effect of the intervening values. So, for an AR(1)
process, the partial autocorrelation between yt and yt-2 is 0.
The most direct way to obtain partial autocorrelations is to form the demeaned
series y_t* = y_t - μ.

The coefficient φ_11 in the following AR(1) is the partial autocorrelation
between y_t and y_{t-1}. Since there is no intervening value, this is also the
autocorrelation:

y_t* = φ_11 y*_{t-1} + e_t

Similarly, φ_22 gives the partial autocorrelation between y_t and y_{t-2}:

y_t* = φ_21 y*_{t-1} + φ_22 y*_{t-2} + e_t

Partial autocorrelations can also be found from the Yule-Walker equations:

φ_11 = ρ_1
φ_22 = (ρ_2 - ρ_1^2) / (1 - ρ_1^2)
φ_ss = [ρ_s - Σ_{j=1}^{s-1} φ_{s-1,j} ρ_{s-j}] / [1 - Σ_{j=1}^{s-1} φ_{s-1,j} ρ_j]

For an AR(p) process, there is no direct correlation between y_t and y_{t-s} for s > p.
An MA(1) process can be written as an AR(∞), so it always has partial
autocorrelations, which decay slowly over time.
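The Yule-Walker recursion can be implemented directly. A sketch: the update of the intermediate coefficients φ_{s,j} = φ_{s-1,j} - φ_{ss} φ_{s-1,s-j} is the standard Durbin-Levinson completing step, not written out above:

```python
import numpy as np

# Sketch: partial autocorrelations from the Yule-Walker recursion.
# `rho` holds rho_1, rho_2, ...; returns phi_11, phi_22, ...
def pacf_from_acf(rho):
    rho = np.asarray(rho, dtype=float)
    phi = {(1, 1): rho[0]}          # phi[(s, j)]
    out = [rho[0]]
    for s in range(2, len(rho) + 1):
        num = rho[s - 1] - sum(phi[(s - 1, j)] * rho[s - 1 - j] for j in range(1, s))
        den = 1.0 - sum(phi[(s - 1, j)] * rho[j - 1] for j in range(1, s))
        phi[(s, s)] = num / den
        # Durbin-Levinson update (the standard completing step)
        for j in range(1, s):
            phi[(s, j)] = phi[(s - 1, j)] - phi[(s, s)] * phi[(s - 1, s - j)]
        out.append(phi[(s, s)])
    return out

# AR(1) with a1 = 0.5 has rho_s = 0.5**s: PACF should be 0.5 at lag 1, 0 after.
print(pacf_from_acf([0.5 ** s for s in range(1, 5)]))
```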

Features of the autocorrelation and partial autocorrelation functions can be
used to determine the type of process.

For an ARMA(p,q), the PACF starts to decay after lag p.

To summarise:

Process       ACF                        PACF
White noise   All 0                      All 0
AR(p)         Decays towards 0           Spikes through lag p, 0 after
MA(1)         Spike at lag 1, 0 after    Decays
ARMA(1,1)     Decays after lag 1         Decays after lag 1
ARMA(p,q)     Decay beginning at lag q   Decay beginning at lag p
7.6 Determining the order of an autoregression

More lags means more information is used, but at the cost of additional
estimation uncertainty (estimating too many coefficients).

- F-statistics
Start with a model with a large number of lags and test whether the
coefficient on the last lag is significant; if not, reduce the number of
lags and repeat the process. When the true lag order is p, this test will
still select a model with more than p lags 5% of the time.
- Information criteria
Information criteria trade off the improvement in the fit of the model against
the number of estimated coefficients. The most popular are the Bayes
information criterion (BIC), also called the Schwarz information criterion, and
the Akaike information criterion (AIC):

BIC(p) = ln[SSR(p)/T] + (p + 1) ln(T)/T

AIC(p) = ln[SSR(p)/T] + (p + 1) 2/T

You choose the model minimizing the information criterion. The difference
between the AIC and BIC is that the term in ln T in the BIC is replaced by 2 in
the AIC, so the second term (the penalty for the number of lags) is not as
large: a smaller decrease in the SSR is needed for the AIC to justify including
an additional regressor. In large samples, the AIC will overestimate p with
non-zero probability.
Similarly, the optimal number of lags of any additional regressors needs to be
estimated. The same methods can be used; if the regression has K coefficients
including the intercept, then the BIC is:

BIC(K) = ln[SSR(K)/T] + K ln(T)/T

Important: all models should be estimated using the same sample,
so make sure to start with the model with the most lags and keep that
as your working sample for this test.
In practice, a convenient shortcut is to impose that all the regressors
have the same number of lags, which reduces the number of models that
need comparing.
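A sketch of BIC-based lag selection on a common sample (simulated data; the function name and parameter values are illustrative, not from the text):

```python
import numpy as np

# Sketch: choose the AR lag order by BIC. All candidate models are estimated
# on the same sample (dropping the first pmax observations), as stressed above.
def bic_lag_selection(y, pmax):
    T = len(y) - pmax                        # common sample size
    best_p, best_bic = None, np.inf
    for p in range(1, pmax + 1):
        # intercept plus lags 1..p, aligned on the common sample
        X = np.column_stack([np.ones(T)] +
                            [y[pmax - j:len(y) - j] for j in range(1, p + 1)])
        yy = y[pmax:]
        beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
        ssr = ((yy - X @ beta) ** 2).sum()
        bic = np.log(ssr / T) + (p + 1) * np.log(T) / T
        if bic < best_bic:
            best_p, best_bic = p, bic
    return best_p

# Illustrative check on a simulated AR(2)
rng = np.random.default_rng(3)
y = np.zeros(5000)
e = rng.normal(size=5000)
for t in range(2, 5000):
    y[t] = 0.5 * y[t - 1] + 0.3 * y[t - 2] + e[t]
print(bic_lag_selection(y, pmax=6))   # typically selects 2
```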
7.7 Nonstationarity I: trends
If the dependent variable and/or the regressors are nonstationary, then
hypothesis tests and forecasts will be unreliable. There are two common
types of nonstationarity, each with its own solutions.
7.7.1 What is a trend?
A trend is a persistent long-term movement of a variable over time.
- A deterministic trend is a non-random function of time (linear in
time, for example).
- A stochastic trend is random and varies over time. For example, a
stochastic trend in inflation may exhibit a prolonged period of increase
followed by a period of decrease.
Since economic series are the consequences of complex economic forces,
trends are usefully thought of as having a large unpredictable, random
component.
The random walk model of a trend.
The simplest model of a variable with a stochastic trend is the random
walk. A time series Yt is said to follow a random walk if the change in
Yt is iid:

Y_t = Y_{t-1} + u_t

where u_t has conditional mean zero: E(u_t | Y_{t-1}, Y_{t-2}, ...) = 0.
The basic idea of a random walk is that the value of the series
tomorrow is its value today plus an unpredictable component.
When a series has a tendency to increase or decrease, the random
walk can include a drift component:

Y_t = β_0 + Y_{t-1} + u_t = t β_0 + Σ_{j=1}^t u_j   (taking Y_0 = 0)
If Yt follows a random walk then it is not stationary: the variance of the
random walk increases over time, so the distribution of Yt changes over time.

var(Y_t) = var(Y_{t-1}) + var(u_t)

For Yt to be stationary, we must have var(Y_t) = var(Y_{t-1}), which imposes
var(u_t) = 0.
Alternatively, say Y_0 = 0; then Y_1 = u_1, Y_2 = u_1 + u_2 and, more generally,
Y_t = u_1 + u_2 + ... + u_t. Because the u_t are uncorrelated, var(Y_t) = t σ_u^2.
The variance of Yt depends on t and increases as t increases. Because the
variance of a random walk increases without bound, its population
autocorrelations are not defined.
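The linear growth of var(Y_t) can be seen by simulating many independent random walks and comparing the cross-section variance at two dates (a sketch with illustrative sizes):

```python
import numpy as np

# Sketch: var(Y_t) = t * sigma_u^2 for a random walk with Y_0 = 0.
# Simulate many paths and look at the cross-section variance at t = 100, 400.
rng = np.random.default_rng(4)
n_paths, T = 20_000, 400
u = rng.normal(size=(n_paths, T))
Y = u.cumsum(axis=1)              # Y_t = u_1 + ... + u_t

print(Y[:, 99].var())             # ~ 100
print(Y[:, 399].var())            # ~ 400
```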
The random walk can be seen as the special case of the AR(1) model in which
β = 1; Yt then contains a stochastic trend (and, if there is drift, a
non-stochastic trend as well). If |β| < 1, then Yt is stationary as long as
u_t is stationary.
For an AR(p) to be stationary, the roots of the following polynomial must
all be greater than 1 in absolute value. The roots are found by solving:

1 - β_1 z - β_2 z^2 - ... - β_p z^p = 0

In the special case of an AR(1), the polynomial is simply:

1 - β_1 z = 0  ⟹  z = 1/β_1

The condition that this root is greater than one in absolute value is
equivalent to |β_1| < 1.
If an AR(p) has a root equal to one, the series is said to have a unit
(autoregressive) root. If Yt has a unit root, it contains a stochastic trend
and is not stationary (the two terms can be used interchangeably).
If a series has a unit root, the estimator of the autoregressive coefficient
in an AR(p) is biased towards 0, the t-statistics have a non-normal
distribution, and two independent series may appear related.
1) Bias towards 0
Suppose that the true model is a random walk (Y_t = Y_{t-1} + u_t) but the
econometrician estimates an AR(1) (Y_t = β_1 Y_{t-1} + u_t).
Since the series is nonstationary, the OLS assumptions are not satisfied and
it can be shown that:
E(β̂_1) ≈ 1 - 5.3/T. So with 20 years of quarterly data (T = 80), you would
expect β̂_1 ≈ 0.934.
A Monte Carlo with 100 replications gives:

Variable | Obs Mean Std. Dev. Min Max
-------------+-----------------------------------------------------
RES1 | 100 .9270481 .0570009 .7792342 1.010915
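The same Monte Carlo is easy to reproduce; a sketch (2000 replications rather than 100, purely for a smoother average):

```python
import numpy as np

# Sketch: Monte Carlo illustration of the downward bias E(beta1_hat) ~ 1 - 5.3/T
# when an AR(1) with intercept is fitted by OLS to a pure random walk.
# T = 80 mimics 20 years of quarterly data; the replication count is illustrative.
rng = np.random.default_rng(5)
T, reps = 80, 2000
estimates = np.empty(reps)
for r in range(reps):
    y = rng.normal(size=T).cumsum()          # random walk, true coefficient = 1
    x, yy = y[:-1], y[1:]
    xd = x - x.mean()
    estimates[r] = (xd * (yy - yy.mean())).sum() / (xd * xd).sum()  # OLS slope

print(estimates.mean())   # well below 1; roughly 1 - 5.3/T = 0.934
```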

2) Non-normal distribution

If a regressor has a stochastic trend, then OLS t-statistics have a
non-normal distribution under the null hypothesis. One important
case in which it is possible to tabulate this distribution is in the
context of an AR with a unit root; we will come back to this.
3) Spurious regression
US inflation was rising from the mid-60s through the early 80s, and so was
Japanese GDP over the same period.

reg inflation gdp_jp if daten>=19651 & daten<=19814,robust


Regression with robust standard errors Number of obs = 68
F( 1, 66) = 113.38
Prob > F = 0.0000
R-squared = 0.5605
Root MSE = 2.2989
------------------------------------------------------------------------------
| Robust
inflation | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gdp_jp | .1871328 .0175741 10.65 0.000 .152045 .2222207
_cons | -2.938637 .7660354 -3.84 0.000 -4.468076 -1.409198
------------------------------------------------------------------------------

. reg inflation gdp_jp if daten>=19821 & daten<=19994,robust


Regression with robust standard errors Number of obs = 70
F( 1, 68) = 5.49
Prob > F = 0.0221
R-squared = 0.0797
Root MSE = 1.5262
------------------------------------------------------------------------------
| Robust
inflation | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gdp_jp | -.0304821 .0130145 -2.34 0.022 -.0564521 -.0045121
_cons | 6.30274 1.378873 4.57 0.000 3.551242 9.054239
------------------------------------------------------------------------------
[Figures: US inflation (top) and Japanese GDP (bottom) plotted against time (daten), 1960-2000.]
7.7.2 Testing for unit root
The most commonly used test in practice is the Dickey and Fuller test.

* Dickey Fuller in the AR(1) model


In the AR(1) case, we want to test whether β_1 = 1; if we cannot reject
the null hypothesis then Yt contains a unit root and is not stationary
(it contains a stochastic trend).
The test is best implemented by subtracting Y_{t-1} from both sides, so
that the regression becomes:

ΔY_t = β_0 + δ Y_{t-1} + u_t   where δ = β_1 - 1

H0: δ = 0 vs H1: δ < 0
The OLS t-statistic testing δ = 0 is called the Dickey-Fuller statistic.
Note: the test is one-sided because the relevant alternative is that the
series is stationary.
Regression with robust standard errors Number of obs = 68
F( 1, 66) = 4.47
Prob > F = 0.0383
R-squared = 0.0852
Root MSE = 1.7954

------------------------------------------------------------------------------
| Robust
dinf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
inf_1 | -.1559304 .0737577 -2.11 0.038 -.3031924 -.0086683
_cons | 1.07776 .4075892 2.64 0.010 .2639821 1.891538
------------------------------------------------------------------------------

The DF statistic does not have a normal distribution, so the critical values
are specific to the test.
Table 7.1 Critical values for the Augmented Dickey-Fuller test

                           10%      5%      1%
Intercept only            -2.57   -2.86   -3.43
Intercept and time trend  -3.12   -3.41   -3.96

So in the previous regression (t = -2.11) we cannot reject, at any standard
level of significance, that δ = 0: the series has a unit root and is not
stationary.
* Dickey-Fuller test in the AR(p) model

For an AR(p), the Dickey-Fuller test is based on the following regression:

ΔY_t = β_0 + δ Y_{t-1} + γ_1 ΔY_{t-1} + γ_2 ΔY_{t-2} + ... + γ_p ΔY_{t-p} + u_t   (7.7)

H0: δ = 0 vs H1: δ < 0
The ADF statistic is the OLS t-statistic testing δ = 0. If H0 is rejected, Yt
is stationary.
The number of lags p needed is unknown. Studies suggest that for the ADF it
is better to have too many lags rather than too few, so it is recommended
to use the AIC to determine the number of lags for the ADF.
* Dickey-Fuller test allowing for a linear trend
Some series have an obvious linear trend (Japanese GDP), so it would be
uninformative to test their stationarity without accounting for the trend.
If Yt is stationary around a deterministic linear trend, the trend must be
added to (7.7), which becomes:

ΔY_t = β_0 + α t + δ Y_{t-1} + γ_1 ΔY_{t-1} + γ_2 ΔY_{t-2} + ... + γ_p ΔY_{t-p} + u_t

If H0 is rejected, Yt is stationary around a deterministic time trend.

If the series is found to have a unit root, then the first difference of the
series does not have a trend. For example, if Y_t = β_0 + Y_{t-1} + u_t, then
ΔY_t = β_0 + u_t is stationary.
Remark: the power of a test is the probability of rejecting a false null
hypothesis (1 - prob. of a Type II error). Monte Carlo studies have shown
that unit root tests have low power: they cannot distinguish between a unit
root and a stationary near-unit-root process. Thus the test will often
indicate that a series contains a UR.

y_t = 1.1 y_{t-1} - 0.1 y_{t-2} + ε_t
z_t = 1.1 z_{t-1} - 0.15 z_{t-2} + ε_t

Checking for a UR with the first process, we solve
1 - 1.1 y + 0.1 y^2 = (y - 1)(0.1 y - 1) = 0, giving roots y = 1, y = 10.
With the second process, the inverse (characteristic) roots are z = 0.9405
and z = 0.1595, both inside the unit circle.
So the first process has a UR and the second is stationary, though its
dominant root is close to one.
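The root check for the two processes above can be sketched with numpy, working with the inverse (characteristic) roots, which must all be strictly inside the unit circle for stationarity:

```python
import numpy as np

# Sketch: for y_t = a1*y_{t-1} + a2*y_{t-2} + eps_t, the inverse roots solve
# lambda^2 - a1*lambda - a2 = 0; a root on the unit circle means a UR.
def inverse_roots(a1, a2):
    return np.roots([1.0, -a1, -a2])

print(inverse_roots(1.1, -0.10))   # one root equal to 1 -> unit root
print(inverse_roots(1.1, -0.15))   # both roots inside the unit circle -> stationary
```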
[Figure: simulated paths of y (unit root) and z (stationary) for t = 0-400; the two series are hard to tell apart visually.]
Similarly, it can be difficult to distinguish between a trend-stationary
process and a unit root process with drift:

w_t = 1 + 0.02 t + ε_t
x_t = 0.02 + x_{t-1} + ε_t / 3

[Figure: simulated paths of w and x for t = 0-400.]
In the short run, the forecasts from stationary and non-stationary models
will be close; however, the long-term forecasts will be quite different.
Also, the power of the unit root test is drastically affected by the data
generating process. If we inappropriately omit the intercept or time trend,
the power of the UR test can go to 0. For example, omitting the trend leads
to an upward bias in the estimated value of δ in:

ΔY_t = β_0 + α t + δ Y_{t-1} + γ_1 ΔY_{t-1} + γ_2 ΔY_{t-2} + ... + γ_p ΔY_{t-p} + u_t   (7.8)

Thus a procedure for UR testing can take the following form:
1 - Use the least restrictive model (7.8) to test for a UR.
UR tests have low power to reject H0, so if H0 is rejected there is no need
to proceed further. If not, go to step 2.
2 - Test α = 0; if α ≠ 0, use (7.8) to test for a UR → step 1.
If α = 0, use (7.7) to test for a UR; if H0 is rejected conclude no unit
root, if not, go to step 3.
3 - Test β_0 = 0; if β_0 ≠ 0, go back to step 2.
If β_0 = 0, use ΔY_t = δ Y_{t-1} + Σ_{j=1}^p γ_j ΔY_{t-j} to test for a UR.
7.8 Nonstationarity II: breaks
A second type of nonstationarity arises when the population regression
function changes over the course of the sample.
A break can arise either from a discrete change in the population
regression coefficients at a distinct date (a policy change) or from a
gradual evolution of the coefficients over a longer period of time
(a change in the structure of the economy).
If the break is not noticed, estimates will be based on the average
behaviour of the series over the period and not on the true relationship
at the end of the period, so forecasts will be poor.
7.8.1 Testing for breaks at a known date
To keep it simple, let's consider the ADL(1,1) model, and denote by τ the
period at which the break is supposed to have happened.
Create a dummy variable D_t taking the value 0 before τ and 1 after τ. D is
also interacted with Y_{t-1} and X_{t-1}:

Y_t = β_0 + β_1 Y_{t-1} + δ_1 X_{t-1} + γ_0 D_t + γ_1 (D_t × Y_{t-1}) + γ_2 (D_t × X_{t-1}) + u_t

Under the hypothesis of no break, γ_0 = γ_1 = γ_2 = 0 can be tested using an
F-test. Under the alternative of a break, at least one of these coefficients
will be different from 0. This is usually referred to as a Chow test.
This approach can be modified to check for a break in a subset of the
coefficients by including only the binary-variable interactions for the
subset of regressors of interest.
7.8.2 Testing for a break at an unknown date
Often the date of a possible break is unknown, but you may suspect the range
during which the break took place, say between τ_0 and τ_1. The Chow test is
then used to test for breaks at all dates between τ_0 and τ_1, and the largest
of the resulting F-statistics is used to test for a break at an unknown date.
This is often referred to as the Quandt likelihood ratio (QLR) statistic.
Since the QLR is the largest of a series of F-statistics, its distribution is
special and depends on the number of restrictions tested q (the number of
coefficients, including the intercept, allowed to break) and on τ_0 and τ_1,
expressed as fractions of the total sample size. For the large-sample
approximation to the distribution of the QLR to be a good one, τ_0 and τ_1
cannot be too close to the ends of the sample. For this reason, the QLR is
computed over a trimmed range, typically τ_0 = 0.15T and τ_1 = 0.85T.
The QLR test can detect a single discrete break, multiple discrete breaks
and/or slow evolution of the regression function. If there is a distinct
break in the regression function, the date at which the largest Chow
statistic occurs is an estimator of the break date.
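The sup-F idea can be sketched for the simplest possible case, a break in the mean (the text's application uses a richer regression; the function and data below are illustrative):

```python
import numpy as np

# Sketch: QLR (sup-F) statistic for a break at an unknown date in a
# mean-shift model, with 15% trimming at each end of the sample.
def qlr_stat(y):
    T = len(y)
    f_stats = []
    ssr_r = ((y - y.mean()) ** 2).sum()                      # restricted: no break
    for tau in range(int(0.15 * T), int(0.85 * T)):
        ssr_u = ((y[:tau] - y[:tau].mean()) ** 2).sum() \
              + ((y[tau:] - y[tau:].mean()) ** 2).sum()      # unrestricted
        q = 1                                                # one restriction
        f_stats.append(((ssr_r - ssr_u) / q) / (ssr_u / (T - 2)))
    return max(f_stats), int(0.15 * T) + int(np.argmax(f_stats))

rng = np.random.default_rng(7)
y = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 100)])  # break at t=100
stat, tau_hat = qlr_stat(y)
print(stat, tau_hat)   # large statistic; tau_hat near the true break date
```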
Say we want to check the stability of our estimates of the determinants of
inflation in the US over the 1962:1 to 1999:4 period. More specifically, we
are concerned that the intercept and the unemployment coefficients may have
changed over time. The first period at which we can check for a structural
break (0.15T into the sample) is 1967:4. So we create a dummy variable for
observations after 1967:4 and interact it with the unemployment variables:
Source | SS df MS Number of obs = 152
-------------+------------------------------ F( 13, 138) = 7.41
Model | 184.330595 13 14.1792765 Prob > F = 0.0000
Residual | 283.045198 138 1.91246756 R-squared = 0.3944
-------------+------------------------------ Adj R-squared = 0.3412
Total | 467.375793 151 2.90295524 Root MSE = 1.3829

------------------------------------------------------------------------------
dinf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dinf_1 | -.4009554 .0824812 -4.86 0.000 -.5639484 -.2379623
dinf_2 | -.3433158 .0892349 -3.85 0.000 -.5196549 -.1669767
dinf_3 | .0545284 .0850863 0.64 0.523 -.1136126 .2226693
dinf_4 | -.038809 .0754606 -0.51 0.608 -.1879284 .1103105
unemp_1 | -1.719641 1.254766 -1.37 0.173 -4.199214 .7599307
unemp_2 | 3.46834 2.364168 1.47 0.144 -1.203546 8.140225
unemp_3 | -3.370699 2.164944 -1.56 0.122 -7.648893 .9074963
unemp_4 | 1.666702 1.155521 1.44 0.151 -.6167486 3.950152
D | 1.775541 1.839904 0.97 0.336 -1.860335 5.411417
D_unemp_1 | -1.225527 1.351754 -0.91 0.366 -3.896758 1.445703
D_unemp_2 | .2032217 2.560099 0.08 0.937 -4.855847 5.26229
D_unemp_3 | 2.394236 2.370403 1.01 0.314 -2.28997 7.078442
D_unemp_4 | -1.668078 1.255425 -1.33 0.186 -4.148952 .8127955
_cons | -.2276938 1.757672 -0.13 0.897 -3.701068 3.245681
------------------------------------------------------------------------------

. testparm D-D_unemp_4
F( 5, 148) = 0.85
Prob > F = 0.5135
F = 0.85, so no break is detected at this date. We now re-estimate the model
with D = 1 if t ≥ 1968:1, and so on for each candidate break date until 1993:1.
For example, a break at 1981:4 leads to:
Regression with robust standard errors Number of obs = 152
F( 13, 138) = 8.42
Prob > F = 0.0000
R-squared = 0.4223
Root MSE = 1.367

------------------------------------------------------------------------------
| Robust
dinf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dinf_1 | -.4075559 .0932063 -4.37 0.000 -.591853 -.2232587
dinf_2 | -.3777853 .0977229 -3.87 0.000 -.5710131 -.1845574
dinf_3 | .0515292 .0798247 0.65 0.520 -.1063085 .2093669
dinf_4 | -.0260024 .0826179 -0.31 0.753 -.1893631 .1373584
unemp_1 | -2.705181 .6911244 -3.91 0.000 -4.071744 -1.338618
unemp_2 | 3.54704 1.300035 2.73 0.007 .9764752 6.117605
unemp_3 | -2.025859 1.188034 -1.71 0.090 -4.374964 .3232453
unemp_4 | .9846463 .5641419 1.75 0.083 -.1308334 2.100126
D | -.0729984 .9544203 -0.08 0.939 -1.960177 1.81418
D_unemp_1 | -.5718067 .8773241 -0.65 0.516 -2.306543 1.162929
D_unemp_2 | .1754026 1.576346 0.11 0.912 -2.941512 3.292317
D_unemp_3 | 2.79729 1.599601 1.75 0.083 -.3656069 5.960186
D_unemp_4 | -2.432152 .8388761 -2.90 0.004 -4.090865 -.7734395
_cons | 1.350888 .733964 1.84 0.068 -.100382 2.802157
------------------------------------------------------------------------------

. testparm D-D_unemp_4
F( 5, 138) = 3.31
Prob > F = 0.0074
7.8.3 Pseudo out-of-sample forecasts

1) Choose the number of observations P for which you will generate pseudo
out-of-sample forecasts, say P = 10% of T. Define s = T - P.
2) Estimate the regression on the shortened sample t = 1, ..., s.
3) Compute the forecast for the first period beyond the shortened sample: Ỹ_{s+1|s}.
4) Compute the forecast error: ũ_{s+1} = Y_{s+1} - Ỹ_{s+1|s}.
5) Repeat steps 2-4 for each date from T - P + 1 to T - 1 (re-estimating the
regression each time).
6) The pseudo forecast errors can be examined to see whether they are
consistent with a stable relationship.
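The steps above can be sketched for a simple AR(1) forecasting model (the text's application uses a richer inflation regression; the data here are simulated):

```python
import numpy as np

# Sketch: pseudo out-of-sample forecasting with an AR(1) model.
# For each of the last P dates, re-estimate on the data before that date,
# forecast one step ahead, and record the forecast error.
def pseudo_oos_errors(y, P):
    errors = []
    for s in range(len(y) - P, len(y)):          # forecast dates T-P+1, ..., T
        x, yy = y[:s - 1], y[1:s]                # estimation sample
        X = np.column_stack([np.ones(len(x)), x])
        beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
        forecast = beta[0] + beta[1] * y[s - 1]  # Y~_{s+1|s}
        errors.append(y[s] - forecast)           # u~_{s+1}
    return np.array(errors)

rng = np.random.default_rng(8)
y = np.zeros(240)
e = rng.normal(size=240)
for t in range(1, 240):
    y[t] = 0.5 * y[t - 1] + e[t]                 # stable AR(1): no break

err = pseudo_oos_errors(y, P=24)
print(err.mean())   # close to 0 for a stable model
```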
For example, going back to our prediction of inflation: using data up to
1993:4, we can predict inflation for 1994:1; doing so until 1999:4, we have
24 pseudo forecasts.
Regression with robust standard errors Number of obs = 128
F( 13, 114) = 7.37
Prob > F = 0.0000
R-squared = 0.4210
Root MSE = 1.4729

------------------------------------------------------------------------------
| Robust
dinf | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dinf_1 | -.4190169 .0998416 -4.20 0.000 -.6168024 -.2212315
dinf_2 | -.3961329 .1031673 -3.84 0.000 -.6005065 -.1917593
dinf_3 | .039491 .0844715 0.47 0.641 -.1278463 .2068283
dinf_4 | -.0449508 .0860523 -0.52 0.602 -.2154198 .1255181
unemp_1 | -2.679112 .6980463 -3.84 0.000 -4.061936 -1.296288
unemp_2 | 3.465039 1.325757 2.61 0.010 .8387247 6.091353
unemp_3 | -1.987951 1.22184 -1.63 0.106 -4.408407 .4325056
unemp_4 | .9924426 .5769953 1.72 0.088 -.1505805 2.135466
D | .4808356 1.389741 0.35 0.730 -2.27223 3.233901
D_unemp_1 | -.9707623 .9465191 -1.03 0.307 -2.845809 .9042847
D_unemp_2 | .6794326 1.700203 0.40 0.690 -2.688656 4.047521
D_unemp_3 | 2.716406 1.821819 1.49 0.139 -.8926028 6.325415
D_unemp_4 | -2.525234 .9671997 -2.61 0.010 -4.441249 -.6092183
_cons | 1.414308 .7407146 1.91 0.059 -.0530417 2.881658

The inflation rate is predicted to rise by 1.9 percentage points, but the true
value is 0.9, so our forecast error is -1 percentage point.
Doing this 24 times, we find that the average forecast error is -0.37, which is
significantly different from 0 (t = -2.71). This suggests that the forecasts
were biased over the period, systematically forecasting higher inflation,
which in turn suggests that the model has been unstable (a break).
7.9 Cointegration
7.9.1 Cointegration and error correction
Series can move together so closely over the long run that they appear to have
the same trend component: for example, the 3-month and 12-month US interest
rates.

[Figure: the two interest rate series FYFF and FYGM3 plotted against time (daten), 1959:1-2000:4.]

Moreover, the spread between the two series does not appear to have a trend.

[Figure: the spread between the two series, 1960-2000.]

The two series have a common stochastic trend; they are said to be cointegrated.
Suppose X_t and Y_t are integrated of order 1. If there exists a coefficient θ
such that Y_t - θ X_t is integrated of order 0 (stationary), then the two
series are said to be cointegrated with a cointegrating coefficient θ.

Unit root testing can be extended to test for cointegration. If X_t and Y_t
are cointegrated, then Y_t - θ X_t is I(0) (the null hypothesis of a unit root
is rejected); otherwise Y_t - θ X_t is I(1).

* Testing for cointegration when θ is known.
In some cases, economic theory suggests a value of θ. In this case a DF test
on the series z_t = Y_t - θ X_t is conducted.
In our example, let's assume that theory suggests θ = 1. There is no trend in
the spread, so we simply estimate:
. reg dspread spread_1 dspread_1 dspread_2 dspread_3 dspread_4

Source | SS df MS Number of obs = 163


-------------+------------------------------ F( 5, 157) = 11.73
Model | 20.0646226 5 4.01292452 Prob > F = 0.0000
Residual | 53.706531 157 .342079815 R-squared = 0.2720
-------------+------------------------------ Adj R-squared = 0.2488
Total | 73.7711536 162 .455377491 Root MSE = .58488

------------------------------------------------------------------------------
dspread | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
spread_1 | -.2506278 .0719562 -3.48 0.001 -.3927548 -.1085007
dspread_1 | -.283247 .091436 -3.10 0.002 -.4638504 -.1026437
dspread_2 | .0230289 .0910197 0.25 0.801 -.1567521 .20281
dspread_3 | -.0599991 .0895151 -0.67 0.504 -.2368085 .1168102
dspread_4 | .048277 .0791148 0.61 0.543 -.1079897 .2045436
_cons | .1548892 .063015 2.46 0.015 .0304227 .2793557
------------------------------------------------------------------------------

Lags AIC
4 -1.049
3 -1.059
2 -1.063
1 -1.072

The t-statistic on spread_1 is -3.48, which is more negative than the 1% ADF
critical value, so we reject the null hypothesis that δ = 0: the spread does
not have a unit root and is therefore I(0). The two interest rate series are
cointegrated.
* Testing for cointegration when θ is unknown.
In general θ is unknown and the cointegrating coefficient must be estimated
prior to testing for a unit root. This preliminary step makes it necessary to
use different critical values for the subsequent unit root test.
Step 1: estimate Y_t = α + θ X_t + ν_t   (7.12)
Step 2: a Dickey-Fuller t-test is used to test for a unit root in the
residuals from (7.12): ν̂_t.
This procedure is called the Engle-Granger Augmented Dickey-Fuller test.
Critical values for the EGADF are:

Nbr of X in (7.12)    10%      5%      1%
1                    -3.12   -3.41   -3.96
2                    -3.52   -3.80   -4.36
3                    -3.84   -4.16   -4.73
4                    -4.20   -4.49   -5.07
. reg dnu nu_1 dnu_1 dnu_2 dnu_3 dnu_4

Source | SS df MS Number of obs = 163


-------------+------------------------------ F( 5, 157) = 21.53
Model | 31.2052888 5 6.24105775 Prob > F = 0.0000
Residual | 45.5212134 157 .289944035 R-squared = 0.4067
-------------+------------------------------ Adj R-squared = 0.3878
Total | 76.7265022 162 .473620384 Root MSE = .53846

------------------------------------------------------------------------------
dnu | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
nu_1 | -.5739985 .1150186 -4.99 0.000 -.8011821 -.3468149
dnu_1 | -.1574595 .1139771 -1.38 0.169 -.3825858 .0676667
dnu_2 | .0752181 .1052652 0.71 0.476 -.1327006 .2831369
dnu_3 | .0053021 .0974368 0.05 0.957 -.1871541 .1977583
dnu_4 | .1237554 .0782992 1.58 0.116 -.0309003 .278411
_cons | .0016953 .0421806 0.04 0.968 -.0816193 .0850099

The t-statistic on nu_1 is -4.99, below the 1% EGADF critical value, so we
reject the null hypothesis of a unit root in the residuals: the two series
are cointegrated.
* Error correction model
If two series are cointegrated, then forecasts of ΔY_t and ΔX_t can be
improved by including an error correction term.

If X_t and Y_t are cointegrated, one way to eliminate the stochastic trend is
to compute the series Y_t - θ X_t, which is stationary and can be used for
analysis. The term Y_t - θ X_t is called the error correction term.

ΔY_t = β_0 + β_1 ΔY_{t-1} + ... + β_p ΔY_{t-p} + γ_1 ΔX_{t-1} + ... + γ_q ΔX_{t-q} + α_1 (Y_{t-1} - θ X_{t-1}) + u_t

Similarly, we also have:

ΔX_t = β_0 + β_1 ΔY_{t-1} + ... + β_p ΔY_{t-p} + γ_1 ΔX_{t-1} + ... + γ_q ΔX_{t-q} + α_2 (Y_{t-1} - θ X_{t-1}) + v_t

If θ is unknown, then the error correction models can be estimated using ν̂_{t-1}.


Interest rates change according to stochastic shocks and to the previous
period's deviation from the long-term equilibrium Y_t - θ X_t = 0. The alphas
can be interpreted as the speed of adjustment.
The absence of Granger causality for cointegrated variables requires that the
speed of adjustment is 0 as well as all the gammas (resp. all the betas). Of
course, at least one of the alphas has to be non-zero for the two series to
be cointegrated.
For ΔY_t to be I(0), Y_t - θ X_t needs to be I(0), since the error term and
all first-difference terms are I(0); hence the two series are cointegrated
C(1,1).
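A minimal version of such an error correction regression can be sketched on simulated data with a known θ = 1 (illustrative coefficients, not the interest rate application):

```python
import numpy as np

# Sketch: estimate dY_t = b0 + gamma*dX_{t-1} + alpha*(Y_{t-1} - X_{t-1}) + u_t
# on simulated cointegrated data (theta = 1 known). alpha is the speed of
# adjustment back towards the long-run equilibrium.
rng = np.random.default_rng(10)
T = 2000
x = rng.normal(size=T).cumsum()                  # I(1) driver
y = np.zeros(T)
for t in range(1, T):
    # y adjusts towards x at rate 0.3 each period
    y[t] = y[t - 1] - 0.3 * (y[t - 1] - x[t - 1]) + rng.normal()

dy = np.diff(y)                                  # dY_t
ecm = (y - x)[:-1]                               # Y_{t-1} - X_{t-1}
dx1 = np.concatenate([[0.0], np.diff(x)[:-1]])   # dX_{t-1}, first entry padded

X = np.column_stack([np.ones(T - 1), dx1, ecm])
beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
print(beta)    # adjustment coefficient (last entry) ~ -0.3
```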
reg dfyff dfyff_1-dfyff_4 dfy3m_1-dfy3m_4 spread_1

Source | SS df MS Number of obs = 163


-------------+------------------------------ F( 9, 153) = 4.74
Model | 69.5220297 9 7.72466996 Prob > F = 0.0000
Residual | 249.54597 153 1.63101941 R-squared = 0.2179
-------------+------------------------------ Adj R-squared = 0.1719
Total | 319.067999 162 1.96955555 Root MSE = 1.2771

------------------------------------------------------------------------------
dfyff | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
dfyff_1 | -.0014132 .2136881 -0.01 0.995 -.4235733 .4207468
dfyff_2 | -.0264828 .2208415 -0.12 0.905 -.4627751 .4098095
dfyff_3 | .1002626 .2129522 0.47 0.638 -.3204438 .5209689
dfyff_4 | .1444413 .1802188 0.80 0.424 -.2115972 .5004798
dfy3m_1 | .0068489 .2541142 0.03 0.979 -.4951767 .5088745
dfy3m_2 | -.1758844 .275382 -0.64 0.524 -.7199263 .3681576
dfy3m_3 | .2220654 .2653096 0.84 0.404 -.3020777 .7462086
dfy3m_4 | -.3159166 .2272404 -1.39 0.166 -.7648506 .1330174
spread_1 | -.4598352 .1585354 -2.90 0.004 -.7730361 -.1466342
_cons | .2955998 .1381308 2.14 0.034 .0227098 .5684897
------------------------------------------------------------------------------

The lagged spread does help to predict the change in the interest rate (the
coefficient on spread_1 is significant, t = -2.90).
