
DIFFERENCING AND UNIT ROOT TESTS

In the Box-Jenkins approach to analyzing time series, a key question is whether to difference the

data, i.e., to replace the raw data {yt } by the differenced series {yt −yt −1}. Experience indicates that

most economic time series tend to wander and are not stationary, but that differencing often yields a

stationary result. A key example, which often provides a fairly good description of actual data, is the

random walk, yt = yt −1 + εt , where {εt } is white noise, assumed here to be iid . We can think of this as

an AR (1) process, yt = ρyt −1 + εt with ρ = 1. But since it has ρ = 1, the random walk is not stationary.

Indeed, for an AR (1) to be stationary, it is necessary that all roots of the polynomial 1 − ρz have

modulus greater than 1 (i.e., the roots must all be outside the unit circle). Since the root of 1 − ρz is

z = 1/ρ, we see that the AR (1) is stationary if and only if −1 < ρ < 1. The first difference is stationary,

however, since yt − yt −1 = εt , a stationary white noise process.
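As a quick numerical check (a minimal numpy sketch; the seed and sample size are arbitrary assumptions), a random walk can be simulated by cumulatively summing white noise, and its first difference recovers the noise exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
eps = rng.standard_normal(n)   # iid white noise

# Random walk: y_t = y_{t-1} + eps_t, i.e. an AR(1) with rho = 1
y = np.cumsum(eps)

# Differencing recovers the stationary white noise: y_t - y_{t-1} = eps_t
print(np.allclose(np.diff(y), eps[1:]))  # True
```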

In general, we say that a time series {yt } is integrated of order 1 if {yt } is not stationary but the

first difference {yt − yt −1} is stationary. If {yt } is integrated of order 1, it is considered important to

difference the data, primarily because we can then use all of the methodologies developed for stationary

time series to build a model, or to otherwise analyze, the differenced series. This, in turn, improves our

understanding (e.g., provides better forecasts) of the original series, {yt }. For example, in the Box-

Jenkins ARIMA (p , 1 , q ) model, the differenced series is modeled as a stationary ARMA (p , q ) process.

In practice, then, we need to decide whether to build a stationary model for the raw data or for the

differenced data.

More generally, there is the question of how many times we need to difference the data. In the

ARIMA (p , d , q ) model, the d ’th difference is ARMA (p , q ). To obtain the d ’th difference, we

difference the data d times, where d is a nonnegative integer. The series is integrated of order d if

the series and all its differences up to the d −1’st are nonstationary, but the d ’th difference is stationary.

In practice, however, the only integer values of d which seem to occur frequently are 0 and 1. So here,

we will limit our discussion to the question of whether or not to difference the data one time.

If we fail to take a difference when the process is nonstationary, regressions on time will often

yield a spuriously significant linear trend, and our forecast intervals will be much too narrow

(optimistic) at long lead times. As an example of the latter problem, suppose that the series is actually

the random walk yt = yt−1 + εt , but we model it as yt = .9yt−1 + εt . Then we will think that the h -step

mean squared forecast error is E [yt+h − .9^h yt ]², which apparently tends to a finite constant, namely

σY² = σε²/(1−.9²) ≈ 5.3 σε². In fact, however, the best h -step predictor is yt , and the mean squared

forecast error is E [yt+h − yt ]² = E [εt+1 + . . . + εt+h ]² = h σε², which tends to ∞ with h .
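A small simulation (a sketch assuming σε = 1, so σε² = 1, with arbitrary seed and horizon) illustrates how badly the stationary model understates the long-horizon forecast error of a random walk:

```python
import numpy as np

rng = np.random.default_rng(1)
reps, h = 20000, 50

# For a random walk, the h-step forecast error of the best predictor y_t
# is eps_{t+1} + ... + eps_{t+h}, so its MSE is h * sigma_eps^2
errors = rng.standard_normal((reps, h)).sum(axis=1)
mse = np.mean(errors ** 2)

print(round(mse, 1))  # near h * sigma_eps^2 = 50, far above the constant ~5.3
```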

It is also undesirable to take a difference when the process is stationary. Problems arise here

because the difference of a stationary series is not invertible, i.e., cannot be represented as an AR (∞).

For example, if yt = .9yt −1 + εt , so that {yt } is really a stationary AR (1), then the first difference {zt } is

the non-invertible ARMA (1 , 1) process zt = .9zt −1 + εt − εt −1, which has more parameters than the origi-

nal process. Because of the non-invertibility of {zt }, its parameters will be difficult to estimate, and it

will be difficult to construct a forecast of zt +h . Consequently, taking an unnecessary difference will

tend to degrade the quality of forecasts.
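The ARMA (1,1) identity for the overdifferenced series can be verified numerically. The sketch below (assumed φ = .9, arbitrary seed and start value) differences a simulated AR (1) and checks that the result satisfies zt = .9zt−1 + εt − εt−1 exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
eps = rng.standard_normal(n)

# Stationary AR(1): y_t = .9 y_{t-1} + eps_t
y = np.empty(n)
y[0] = eps[0]
for t in range(1, n):
    y[t] = 0.9 * y[t - 1] + eps[t]

# The unnecessary difference z_t = y_t - y_{t-1} satisfies the
# non-invertible ARMA(1,1) equation z_t = .9 z_{t-1} + eps_t - eps_{t-1}
z = np.diff(y)
print(np.allclose(z[1:], 0.9 * z[:-1] + eps[2:] - eps[1:-1]))  # True
```

The MA polynomial 1 − B of the differenced process has its root exactly on the unit circle, which is the non-invertibility discussed above.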

Ideally, then, what we would like is a way to decide whether the series is stationary, or integrated

of order 1. A method in widespread use today is to declare the series nonstationary if the sample auto-

correlations decay slowly. If this pattern is observed, then the series is differenced and the autocorrela-

tions of the differenced series are examined to make sure that they decay rapidly, thereby indicating

that the differenced series is stationary. This method is somewhat ad hoc , however. What is really

needed is a more objective way of deciding between the two hypotheses, "stationary" and "integrated of

order 1", without making any further assumptions. Unfortunately, each of these hypotheses covers a vast

range of possibilities, and any classical approach to discriminate between them seems doomed to failure

unless we limit the scope of the hypotheses.

The Dickey-Fuller Test For AR(1)

A test involving much more narrowly-specified null and alternative hypotheses was proposed by

Dickey and Fuller in 1979. In its most basic form, the Dickey-Fuller test compares the null hypothesis

H 0 : yt = yt −1 + εt ,

i.e., that the series is a random walk without drift, against the alternative hypothesis

H 1 : yt = c + ρyt −1 + εt ,

where c and ρ are constants with | ρ | < 1. According to H 1, the process is a stationary AR (1) with

mean µ = c /(1−ρ). To see this, note that, under H 1, we can write

yt = µ(1−ρ) + ρyt −1 + εt ,

so that

yt − µ = ρ(yt −1 − µ) + εt .

Note that by making the random walk the null hypothesis, Dickey and Fuller are expressing a prefer-

ence for differencing the data unless a strong case can be made that the raw series is stationary. This is

consistent with the conventional wisdom that, most of the time, the data do require differencing. A

type I error corresponds to deciding the process is stationary when it actually isn’t. In this case, we will

fail to recognize that the data should be differenced, and will build a stationary model for our nonsta-

tionary series. A type II error corresponds to deciding the process is nonstationary when it is actually

stationary. Here, we will be inclined to difference the data, even though differencing is not desirable.

Given data y 1 , . . . , yn , define the n −1 dimensional vectors Yt = (y 2 , . . . , yn )′,

Yt −1 = (y 1 , . . . , yn −1)′, ε = (ε2 , . . . , εn )′, and the (n −1)×2 matrix X = (1 , Yt −1). Under both H 0 and H 1,

the data Yt obey the linear regression model

Yt = X β + ε ,

where β = (c , ρ)′, and H 0 corresponds to ρ = 1, c = 0. We can estimate β by least squares,

β̂ = (X ′X )−1X ′Yt .

Denote the t -statistic for ρ̂ by

τµ = (ρ̂ − 1)/s ρ̂ ,

where s ρ̂ is the estimated standard error for ρ̂. Note that τµ is easy to calculate, since ρ̂ and s ρ̂ can be

obtained directly from the output of the standard computer regression packages.
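For concreteness, the computation can be sketched in a few lines of numpy (simulated data with an assumed seed and sample size; in practice ρ̂ and s ρ̂ come straight from regression output):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
y = np.cumsum(rng.standard_normal(n))     # simulate under H0: a random walk

# Least squares regression of y_t on (1, y_{t-1})
Y = y[1:]
X = np.column_stack([np.ones(n - 1), y[:-1]])
beta = np.linalg.solve(X.T @ X, X.T @ Y)  # (c_hat, rho_hat)

resid = Y - X @ beta
s2 = resid @ resid / (n - 1 - 2)          # residual variance: n-1 obs, 2 params
se_rho = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

tau_mu = (beta[1] - 1.0) / se_rho
print(tau_mu)  # compare with tabulated Dickey-Fuller percentiles, not normal ones
```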

The statistic τµ can be used to test H 0 versus H 1. The percentiles of τµ under H 0 are given in

Fuller (1976, p. 371), and a shortened table is given in Dickey (p. 17, attached). The null hypothesis is

rejected if τµ is less than the tabled value. The tabulations for finite n were based on simulation, assum-

ing εt are iid Gaussian. The tabled values for the asymptotic distribution (n = ∞) are valid as long as

the εt are iid with finite variance. (No Gaussian assumption is needed here.) It should be noted that τµ

does not have a t distribution in finite samples, and does not have a standard normal distribution

asymptotically. In fact, the asymptotic distribution is longer-tailed than the standard normal. For exam-

ple, the asymptotic .01 percentage point of τµ is at −3.43, instead of −2.326 for a standard normal.

Thus, use of the standard normal table would result in an excess of spurious declarations of stationarity.

Of course, the null and alternative hypotheses H 0 and H 1 described above are too narrow to be

very useful in practice. Typically, we want to consider differencing the data because we hope the

difference may be stationary, but we do not want to commit ourselves to the assumption that the series

is (possibly nonstationary) AR (1). Fortunately, it is possible to generalize the Dickey-Fuller test to the

case where the series is a (possibly nonstationary) AR (p +1), where p ≥ 0 is known. To describe the

null and alternative hypotheses more formally, we need to discuss the "operator notation" for ARIMA

models.

The Backshift Operator, Differencing, and Unit Roots

Define the backshift operator B , which operates on a time series {xt }, by Bxt = xt −1. Note that B

shifts time back by one unit. Similarly,

B 2xt = B (Bxt ) = Bxt −1 = xt −2 .

In general, B k xt = xt −k , so that B k shifts time back by k units.

Although B is not a number, it can be treated like one in formal manipulations. In other words,

for many practical purposes, we can "pretend" that B is a number. For example,

(1−2B )(1+2B ) = 1−4B 2. This means that (1−2B )(1+2B ) is an operator, such that

(1−2B )(1+2B ) xt = (1−4B 2) xt = xt − 4xt −2 .

It also means that the linear filter which takes xt to xt − 4xt −2 can be realized by passing {xt } succes-

sively through two linear filters: The first takes xt to yt = xt − 2xt −1, and the second takes yt to yt + 2yt −1.
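This two-stage filtering claim is easy to verify numerically (a minimal numpy sketch on arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(10)

# Direct application of (1 - 4B^2): x_t - 4 x_{t-2}
direct = x[2:] - 4 * x[:-2]

# Cascade of the two factors: first (1 - 2B), then (1 + 2B)
y = x[1:] - 2 * x[:-1]          # y_t = x_t - 2 x_{t-1}
chained = y[1:] + 2 * y[:-1]    # z_t = y_t + 2 y_{t-1}

print(np.allclose(direct, chained))  # True
```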

The AR (1) model xt = φ xt −1 + εt can be written in operator notation as (1−φB ) xt = εt . To see

why operator notation is convenient, let’s try to formally solve for xt by multiplying both sides by

1/(1−φB ). We get xt = 1/(1−φB ) εt . By a formal power series expansion, we have

1/(1−φB ) = 1 + φB + φ²B² + . . .

Thus, we find that xt = [ Σk=0..∞ φ^k B^k ] εt , in other words, that xt = Σk=0..∞ φ^k εt−k . This is

actually the correct MA (∞) representation for {xt } if | φ | < 1.
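The expansion can be checked numerically. The sketch below (assumed φ = 0.5, arbitrary seed) starts the recursion at x0 = ε0, in which case the truncated sum Σk=0..t φ^k εt−k reproduces the AR (1) recursion exactly; the infinite sum is the stationary solution:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
eps = rng.standard_normal(n)
phi = 0.5

# AR(1) recursion started from x_0 = eps_0
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Truncated MA representation: x_t = sum_{k=0}^{t} phi^k eps_{t-k}
t = n - 1
ma = sum(phi ** k * eps[t - k] for k in range(t + 1))
print(np.isclose(x[t], ma))  # True
```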

In fact, the backshift operator has exactly the same effect as multiplication by e^{−iλ} inside the

integral in the spectral representation. To see this, recall the spectral representation

xt = ∫−π..π e^{iλt} dZX (λ) ,

and note that multiplication by e^{−iλ} inside the integral gives

∫−π..π e^{iλt} e^{−iλ} dZX (λ) = ∫−π..π e^{iλ(t−1)} dZX (λ) = xt−1 .

Thus, the backshift operator provides a notation by which we can manipulate linear filters without

explicitly using the spectral representation. To justify our formal manipulations with the backshift

operator, we simply need to verify that the analogous manipulations with the spectral representation are

permissible.

For example, the AR (1) process xt = φ xt −1 + εt , written in terms of the spectral representation, is

∫−π..π e^{iλt} (1 − φe^{−iλ}) dZX (λ) = ∫−π..π e^{iλt} dZε(λ) ,

which can be expressed as an MA (∞) by multiplying by 1/(1−φe^{−iλ}) under both integrals to give

∫−π..π e^{iλt} dZX (λ) = ∫−π..π e^{iλt} [ 1/(1−φe^{−iλ}) ] dZε(λ) .

Now, if | φ | < 1, we have the expansion

1/(1−φe^{−iλ}) = 1 + φe^{−iλ} + φ²e^{−2iλ} + . . . ,

so we get

xt = ∫−π..π e^{iλt} [ Σk=0..∞ φ^k e^{−iλk} ] dZε(λ) ,

in other words,

xt = Σk=0..∞ φ^k ∫−π..π e^{iλ(t−k)} dZε(λ) = Σk=0..∞ φ^k εt−k .

The steps in this derivation are all rigorously justifiable, and are analogous to the formal manipulations

made earlier with the backshift operator.

Using the backshift operator, we can now write an ARMA (p ,q ) process {yt } as

φ(B ) yt = θ(B ) εt ,

where φ(B ) = 1 − φ1B − . . . − φp B p , and θ(B ) = 1 − θ1B − . . . − θq B q . This is the notation originally used

by Box and Jenkins, and is in widespread use today. If θ(B ) has all roots outside the unit circle, then

the process is invertible. If φ(B ) has all roots outside the unit circle, then the process is stationary. We

say that a root of φ(B ) is a unit root if it lies on the unit circle. If φ(B ) has one or more unit roots,

while the other roots are outside the unit circle, then {yt } will be nonstationary but not explosive. Such a

process provides a useful model for economic time series.

Of special interest is the case where the unit roots are exactly 1. Note that if φ(B ) has exactly d

roots of 1, then {yt } is integrated of order d , since φ(B ) will have a factor of form (1−B )d . In particu-

lar, if {yt } is nonstationary ARMA but the first differences are stationary ARMA , then φ(B ) must have

exactly one root of 1. In the next section, we show how to test the null hypothesis that {yt } is

AR (p +1) with exactly one root of 1 against the alternative that {yt } is stationary AR (p +1).
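The factorization behind "d roots of 1" can be seen concretely. As an illustrative numpy sketch with an assumed AR (2) polynomial φ(z) = 1 − 1.5z + .5z² = (1−z)(1−.5z), which has exactly one root of 1:

```python
import numpy as np

# phi(z) = 1 - 1.5z + 0.5z^2 = (1 - z)(1 - 0.5z): one unit root, one root outside
# np.roots takes coefficients from the highest degree down: 0.5z^2 - 1.5z + 1
roots = np.roots([0.5, -1.5, 1.0])
print(sorted(np.abs(roots)))  # one root of modulus 1, one of modulus 2
```

A process with this AR polynomial is integrated of order 1: the factor (1−B) is removed by one difference, leaving the stationary factor (1−.5B).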

The Dickey-Fuller Test For AR(p+1)

We now assume that {yt } is the AR (p +1) process

φ(B )(1 − ρB )yt = c + εt ,

where φ(B ) = 1 − φ1B − . . . − φp B p has all roots outside the unit circle, p is known, {εt } is iid white

noise with zero mean, and c = φ(1)(1 − ρ)µ. (Prove this last equation for homework). Define

Yt = (yp +2 , . . . , yn )′. Define the differencing operator ∆ by ∆ = 1 − B , so that ∆ yt = yt − yt −1, and

∆ Yt = Yt − Yt −1.

The null hypothesis is that ρ = 1, in which case {yt } is nonstationary, with exactly one unit root,

and ∆ yt is a stationary AR (p ). The alternative hypothesis is that ρ < 1, in which case {yt } is stationary

AR (p +1). The τµ statistic is based on the coefficient of Yt −1 in a linear regression of ∆ Yt on 1, Yt −1,

∆ Yt −1 , . . . , ∆ Yt −p . Here, we present some motivation for this idea.

First, note that for any ρ,

c + εt = φ(B )(1−ρB )yt = (1 − φ1B − . . . − φp B p )( (1−B ) + (1−ρ)B )yt

= (1−B )yt − (φ1B + . . . + φp B p )(1−B )yt + φ(B )(1−ρ)yt−1 .

Therefore,

∆ yt = c + (φ1B + . . . + φp B p )∆ yt + φ(B )(ρ−1)yt −1 + [φ(1)(ρ−1)yt −1 − φ(1)(ρ−1)yt −1] + εt

= c + φ(1)(ρ−1)yt −1 + (φ1B + . . . + φp B p ) ∆ yt + (ρ−1)(φ(B ) − φ(1))yt −1 + εt . (1)

Now,

φ(B ) − φ(1) = 1 − Σk=1..p φk B^k − [1 − Σk=1..p φk ] = Σk=1..p φk (1−B^k ) .

The p ’th order polynomial φ(z ) − φ(1) = Σk=1..p φk (1−z^k ) clearly has a root at z = 1. Thus,

φ(z ) − φ(1) = (1−z ) γ(z ) for some polynomial γ(z ) of degree p −1. So

(φ(B ) − φ(1))yt −1 = (1−B ) γ(B )yt −1 = γ(B )∆ yt −1.

Thus from (1),

∆ yt = c + φ(1)(ρ−1)yt −1 + [Linear Combination of ∆ yt −1 , . . . , ∆yt −p ] + εt .

So if we regress ∆ Yt on 1, Yt −1, ∆ Yt −1 , . . . , ∆Yt −p then the coefficient β̂1 of Yt −1 will be a consistent

estimate of φ(1)(ρ−1). Since φ(1) > 0 (Homework), we can rewrite the null and alternative hypothesis as

H 0 : φ(1)(ρ−1) = 0

H 1 : φ(1)(ρ−1) < 0 .

We now define the test statistic as τµ = β̂1/s β̂1 . The asymptotic distribution of τµ under H 0 is the same

as for the case p = 0 described earlier. So, at least in large samples, the Dickey-Fuller test can still be

carried out in the case where {yt } is a (possibly nonstationary) AR (p +1), assuming that p is known. If

H 0 is true, then we also get estimates of the AR parameters: If ρ = 1, then (1) reduces to

∆ yt = (φ1B + . . . + φp B p )∆ yt + εt ,

so the coefficients of ∆ Yt −1 , . . . , ∆ Yt −p in the regression described above provide consistent estimates

of φ1 , . . . , φp .
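The whole procedure can be sketched in a few lines of numpy. Here adf_tau_mu is a hypothetical helper name, the data are simulated with an assumed seed, and the OLS details follow the regression described above:

```python
import numpy as np

def adf_tau_mu(y, p):
    """tau_mu: studentized coefficient of y_{t-1} in the regression of
    dy_t on (1, y_{t-1}, dy_{t-1}, ..., dy_{t-p})."""
    dy = np.diff(y)
    resp = dy[p:]                               # dy_t for t = p+1, ..., n-1
    cols = [np.ones(len(resp)), y[p:-1]]        # intercept and y_{t-1}
    cols += [dy[p - k: len(dy) - k] for k in range(1, p + 1)]  # dy_{t-k}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
    resid = resp - X @ beta
    s2 = resid @ resid / (len(resp) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se                         # studentized estimate of phi(1)(rho-1)

rng = np.random.default_rng(6)
y = np.cumsum(rng.standard_normal(500))         # H0 true: one unit root
print(adf_tau_mu(y, p=2))  # compare with the Dickey-Fuller table (5% point near -2.86)
```

Under H 0 the statistic fluctuates around the Dickey-Fuller null distribution, while for a stationary series it tends to a large negative value, leading to rejection.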
