
DIFFERENCING AND UNIT ROOT TESTS

In the Box-Jenkins approach to analyzing time series, a key question is whether to difference the

data, i.e., to replace the raw data {yt } by the differenced series {yt −yt −1}. Experience indicates that

most economic time series tend to wander and are not stationary, but that differencing often yields a

stationary result. A key example, which often provides a fairly good description of actual data, is the

random walk, yt = yt −1 + εt , where {εt } is white noise, assumed here to be iid . We can think of this as

an AR (1) process, yt = ρyt −1 + εt with ρ = 1. But since it has ρ = 1, the random walk is not stationary.

Indeed, for an AR (1) to be stationary, it is necessary that all roots of the polynomial 1 − ρz have

modulus greater than 1 (i.e., the roots must all be outside the unit circle). Since the root of 1 − ρz is

z = 1/ρ, we see that the AR (1) is stationary if and only if −1 < ρ < 1. The first difference is stationary,

however, since yt − yt −1 = εt , a stationary white noise process.
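As a quick numerical check (a minimal numpy sketch; the seed and sample size are arbitrary assumptions), a random walk can be simulated by cumulatively summing white noise, and its first difference recovers the noise exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
eps = rng.standard_normal(n)   # iid white noise

# Random walk: y_t = y_{t-1} + eps_t, i.e. an AR(1) with rho = 1
y = np.cumsum(eps)

# Differencing recovers the stationary white noise: y_t - y_{t-1} = eps_t
print(np.allclose(np.diff(y), eps[1:]))  # True
```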

In general, we say that a time series {yt } is integrated of order 1 if {yt } is not stationary but the

first difference {yt − yt −1} is stationary. If {yt } is integrated of order 1, it is considered important to

difference the data, primarily because we can then use all of the methodologies developed for stationary

time series to build a model, or to otherwise analyze, the differenced series. This, in turn, improves our

understanding (e.g., provides better forecasts) of the original series, {yt }. For example, in the Box-

Jenkins ARIMA (p , 1 , q ) model, the differenced series is modeled as a stationary ARMA (p , q ) process.

In practice, then, we need to decide whether to build a stationary model for the raw data or for the

differenced data.

More generally, there is the question of how many times we need to difference the data. In the

ARIMA (p , d , q ) model, the d ’th difference is ARMA (p , q ). To obtain the d ’th difference, we

difference the data d times, where d is a nonnegative integer. The series is integrated of order d if

the series and all its differences up to the d −1’st are nonstationary, but the d ’th difference is stationary.

In practice, however, the only integer values of d which seem to occur frequently are 0 and 1. So here,

we will limit our discussion to the question of whether or not to difference the data one time.

If we fail to take a difference when the process is nonstationary, regressions on time will often

yield a spuriously significant linear trend, and our forecast intervals will be much too narrow

(optimistic) at long lead times. As an example of the latter problem, suppose that the series is actually

the random walk yt = yt−1 + εt , but we model it as yt = .9yt−1 + εt . Then we will think that the h -step

mean squared forecast error is E [yt+h − .9^h yt ]², which apparently tends to a finite constant, namely

σY² = σε²/(1−.9²) ≈ 5.3 σε². In fact, however, the best h -step predictor is yt , and the mean squared

forecast error is E [yt+h − yt ]² = E [εt+1 + . . . + εt+h ]² = h σε², which tends to ∞ with h .
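A small simulation (a sketch assuming σε = 1, so σε² = 1, with arbitrary seed and horizon) illustrates how badly the stationary model understates the long-horizon forecast error of a random walk:

```python
import numpy as np

rng = np.random.default_rng(1)
reps, h = 20000, 50

# For a random walk, the h-step forecast error of the best predictor y_t
# is eps_{t+1} + ... + eps_{t+h}, so its MSE is h * sigma_eps^2
errors = rng.standard_normal((reps, h)).sum(axis=1)
mse = np.mean(errors ** 2)

print(round(mse, 1))  # near h * sigma_eps^2 = 50, far above the constant ~5.3
```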

It is also undesirable to take a difference when the process is stationary. Problems arise here

because the difference of a stationary series is not invertible, i.e., cannot be represented as an AR (∞).

For example, if yt = .9yt −1 + εt , so that {yt } is really a stationary AR (1), then the first difference {zt } is

the non-invertible ARMA (1 , 1) process zt = .9zt −1 + εt − εt −1, which has more parameters than the origi-

nal process. Because of the non-invertibility of {zt }, its parameters will be difficult to estimate, and it

will be difficult to construct a forecast of zt +h . Consequently, taking an unnecessary difference will

tend to degrade the quality of forecasts.
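The ARMA (1,1) identity for the overdifferenced series can be verified numerically. The sketch below (assumed φ = .9, arbitrary seed and start value) differences a simulated AR (1) and checks that the result satisfies zt = .9zt−1 + εt − εt−1 exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
eps = rng.standard_normal(n)

# Stationary AR(1): y_t = .9 y_{t-1} + eps_t
y = np.empty(n)
y[0] = eps[0]
for t in range(1, n):
    y[t] = 0.9 * y[t - 1] + eps[t]

# The unnecessary difference z_t = y_t - y_{t-1} satisfies the
# non-invertible ARMA(1,1) equation z_t = .9 z_{t-1} + eps_t - eps_{t-1}
z = np.diff(y)
print(np.allclose(z[1:], 0.9 * z[:-1] + eps[2:] - eps[1:-1]))  # True
```

The MA polynomial 1 − B of the differenced process has its root exactly on the unit circle, which is the non-invertibility discussed above.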

Ideally, then, what we would like is a way to decide whether the series is stationary, or integrated

of order 1. A method in widespread use today is to declare the series nonstationary if the sample auto-

correlations decay slowly. If this pattern is observed, then the series is differenced and the autocorrela-

tions of the differenced series are examined to make sure that they decay rapidly, thereby indicating

that the differenced series is stationary. This method is somewhat ad hoc , however. What is really

needed is a more objective way of deciding between the two hypotheses, "stationary" and "integrated of

order 1", without making any further assumptions. Unfortunately, each of these hypotheses covers a vast

range of possibilities, and any classical approach to discriminate between them seems doomed to failure

unless we limit the scope of the hypotheses.

The Dickey-Fuller Test For AR(1)

A test involving much more narrowly-specified null and alternative hypotheses was proposed by

Dickey and Fuller in 1979. In its most basic form, the Dickey-Fuller test compares the null hypothesis

H 0 : yt = yt −1 + εt ,

i.e., that the series is a random walk without drift, against the alternative hypothesis

H 1 : yt = c + ρyt −1 + εt ,

where c and ρ are constants with | ρ | < 1. According to H 1, the process is a stationary AR (1) with

mean µ = c /(1−ρ). To see this, note that, under H 1, we can write

yt = µ(1−ρ) + ρyt −1 + εt ,

so that

yt − µ = ρ(yt −1 − µ) + εt .

Note that by making the random walk the null hypothesis, Dickey and Fuller are expressing a prefer-

ence for differencing the data unless a strong case can be made that the raw series is stationary. This is

consistent with the conventional wisdom that, most of the time, the data do require differencing. A

type I error corresponds to deciding the process is stationary when it actually isn’t. In this case, we will

fail to recognize that the data should be differenced, and will build a stationary model for our nonsta-

tionary series. A type II error corresponds to deciding the process is nonstationary when it is actually

stationary. Here, we will be inclined to difference the data, even though differencing is not desirable.

Given data y 1 , . . . , yn , define the n −1 dimensional vectors Yt = (y 2 , . . . , yn )′,

Yt −1 = (y 1 , . . . , yn −1)′, ε = (ε2 , . . . , εn )′, and the (n −1)×2 matrix X = (1 , Yt −1). Under both H 0 and H 1,

the data Yt obey the linear regression model

Yt = X β + ε ,

where β = (c , ρ)′, and H 0 corresponds to ρ = 1, c = 0. We can estimate β by least squares,

β̂ = (X ′X )−1X ′Yt .

Denote the t -statistic for ρ̂ by

τµ = (ρ̂ − 1)/s ρ̂ ,

where s ρ̂ is the estimated standard error for ρ̂. Note that τµ is easy to calculate, since ρ̂ and s ρ̂ can be

obtained directly from the output of the standard computer regression packages.
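For concreteness, the computation can be sketched in a few lines of numpy (simulated data with an assumed seed and sample size; in practice ρ̂ and s ρ̂ come straight from regression output):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
y = np.cumsum(rng.standard_normal(n))     # simulate under H0: a random walk

# Least squares regression of y_t on (1, y_{t-1})
Y = y[1:]
X = np.column_stack([np.ones(n - 1), y[:-1]])
beta = np.linalg.solve(X.T @ X, X.T @ Y)  # (c_hat, rho_hat)

resid = Y - X @ beta
s2 = resid @ resid / (n - 1 - 2)          # residual variance: n-1 obs, 2 params
se_rho = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

tau_mu = (beta[1] - 1.0) / se_rho
print(tau_mu)  # compare with tabulated Dickey-Fuller percentiles, not normal ones
```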

The statistic τµ can be used to test H 0 versus H 1. The percentiles of τµ under H 0 are given in

Fuller (1976, p. 371), and a shortened table is given in Dickey (p. 17, attached). The null hypothesis is

rejected if τµ is less than the tabled value. The tabulations for finite n were based on simulation, assum-

ing εt are iid Gaussian. The tabled values for the asymptotic distribution (n = ∞) are valid as long as

the εt are iid with finite variance. (No Gaussian assumption is needed here.) It should be noted that τµ

does not have a t distribution in finite samples, and does not have a standard normal distribution

asymptotically. In fact, the asymptotic distribution is longer-tailed than the standard normal. For exam-

ple, the asymptotic .01 percentage point of τµ is at −3.43, instead of −2.326 for a standard normal.

Thus, use of the standard normal table would result in an excess of spurious declarations of stationarity.

Of course, the null and alternative hypotheses H 0 and H 1 described above are too narrow to be

very useful in practice. Typically, we want to consider differencing the data because we hope the

difference may be stationary, but we do not want to commit ourselves to the assumption that the series

is (possibly nonstationary) AR (1). Fortunately, it is possible to generalize the Dickey-Fuller test to the

case where the series is a (possibly nonstationary) AR (p +1), where p ≥ 0 is known. To describe the

null and alternative hypotheses more formally, we need to discuss the "operator notation" for ARIMA

models.

The Backshift Operator, Differencing, and Unit Roots

Define the backshift operator B , which operates on a time series {xt }, by Bxt = xt −1. Note that B

shifts time back by one unit. Similarly,

B 2xt = B (Bxt ) = Bxt −1 = xt −2 .

In general, B k xt = xt −k , so that B k shifts time back by k units.

Although B is not a number, it can be treated like one in formal manipulations. In other words,

for many practical purposes, we can "pretend" that B is a number. For example,

(1−2B )(1+2B ) = 1−4B 2. This means that (1−2B )(1+2B ) is an operator, such that

(1−2B )(1+2B ) xt = (1−4B 2) xt = xt − 4xt −2 .

It also means that the linear filter which takes xt to xt − 4xt −2 can be realized by passing {xt } succes-

sively through two linear filters: The first takes xt to yt = xt − 2xt −1, and the second takes yt to yt + 2yt −1.
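This two-stage filtering claim is easy to verify numerically (a minimal numpy sketch on arbitrary data):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(10)

# Direct application of (1 - 4B^2): x_t - 4 x_{t-2}
direct = x[2:] - 4 * x[:-2]

# Cascade of the two factors: first (1 - 2B), then (1 + 2B)
y = x[1:] - 2 * x[:-1]          # y_t = x_t - 2 x_{t-1}
chained = y[1:] + 2 * y[:-1]    # z_t = y_t + 2 y_{t-1}

print(np.allclose(direct, chained))  # True
```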

The AR (1) model xt = φ xt −1 + εt can be written in operator notation as (1−φB ) xt = εt . To see

why operator notation is convenient, let’s try to formally solve for xt by multiplying both sides by

1/(1−φB ). We get xt = 1/(1−φB ) εt . By a formal power series expansion, we have

1/(1−φB ) = 1 + φB + φ²B² + . . .

Thus, we find that xt = [ Σk=0..∞ φ^k B^k ] εt , in other words, that xt = Σk=0..∞ φ^k εt−k . This is

actually the correct MA (∞) representation for {xt } if | φ | < 1.
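The expansion can be checked numerically. The sketch below (assumed φ = 0.5, arbitrary seed) starts the recursion at x0 = ε0, in which case the truncated sum Σk=0..t φ^k εt−k reproduces the AR (1) recursion exactly; the infinite sum is the stationary solution:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 30
eps = rng.standard_normal(n)
phi = 0.5

# AR(1) recursion started from x_0 = eps_0
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Truncated MA representation: x_t = sum_{k=0}^{t} phi^k eps_{t-k}
t = n - 1
ma = sum(phi ** k * eps[t - k] for k in range(t + 1))
print(np.isclose(x[t], ma))  # True
```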

In fact, the backshift operator has exactly the same effect as multiplication by e^{−iλ} inside the

integral in the spectral representation. To see this, recall the spectral representation

xt = ∫−π..π e^{iλt} dZX (λ) ,

and note that multiplication by e^{−iλ} inside the integral gives

∫−π..π e^{iλt} e^{−iλ} dZX (λ) = ∫−π..π e^{iλ(t−1)} dZX (λ) = xt−1 .

Thus, the backshift operator provides a notation by which we can manipulate linear filters without

explicitly using the spectral representation. To justify our formal manipulations with the backshift

operator, we simply need to verify that the analogous manipulations with the spectral representation are

permissible.

For example, the AR (1) process xt = φ xt −1 + εt , written in terms of the spectral representation, is

∫−π..π e^{iλt} (1 − φe^{−iλ}) dZX (λ) = ∫−π..π e^{iλt} dZε(λ) ,

which can be expressed as an MA (∞) by multiplying by 1/(1−φe^{−iλ}) under both integrals to give

∫−π..π e^{iλt} dZX (λ) = ∫−π..π e^{iλt} [ 1/(1−φe^{−iλ}) ] dZε(λ) .

Now, if | φ | < 1, we have the expansion

1/(1−φe^{−iλ}) = 1 + φe^{−iλ} + φ²e^{−2iλ} + . . . ,

so we get

xt = ∫−π..π e^{iλt} [ Σk=0..∞ φ^k e^{−iλk} ] dZε(λ) ,

in other words,

xt = Σk=0..∞ φ^k ∫−π..π e^{iλ(t−k)} dZε(λ) = Σk=0..∞ φ^k εt−k .

The steps in this derivation are all rigorously justifiable, and are analogous to the formal manipulations

made earlier with the backshift operator.

Using the backshift operator, we can now write an ARMA (p ,q ) process {yt } as

φ(B ) yt = θ(B ) εt ,

where φ(B ) = 1 − φ1B − . . . − φp B p , and θ(B ) = 1 − θ1B − . . . − θq B q . This is the notation originally used

by Box and Jenkins, and is in widespread use today. If θ(B ) has all roots outside the unit circle, then

the process is invertible. If φ(B ) has all roots outside the unit circle, then the process is stationary. We

say that a root of φ(B ) is a unit root if it lies on the unit circle. If φ(B ) has one or more unit roots,

while the other roots are outside the unit circle, then {yt } will be nonstationary but not explosive. Such a

process provides a useful model for economic time series.

Of special interest is the case where the unit roots are exactly 1. Note that if φ(B ) has exactly d

roots of 1, then {yt } is integrated of order d , since φ(B ) will have a factor of form (1−B )d . In particu-

lar, if {yt } is nonstationary ARMA but the first differences are stationary ARMA , then φ(B ) must have

exactly one root of 1. In the next section, we show how to test the null hypothesis that {yt } is

AR (p +1) with exactly one root of 1 against the alternative that {yt } is stationary AR (p +1).
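The factorization behind "d roots of 1" can be seen concretely. As an illustrative numpy sketch with an assumed AR (2) polynomial φ(z) = 1 − 1.5z + .5z² = (1−z)(1−.5z), which has exactly one root of 1:

```python
import numpy as np

# phi(z) = 1 - 1.5z + 0.5z^2 = (1 - z)(1 - 0.5z): one unit root, one root outside
# np.roots takes coefficients from the highest degree down: 0.5z^2 - 1.5z + 1
roots = np.roots([0.5, -1.5, 1.0])
print(sorted(np.abs(roots)))  # one root of modulus 1, one of modulus 2
```

A process with this AR polynomial is integrated of order 1: the factor (1−B) is removed by one difference, leaving the stationary factor (1−.5B).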

The Dickey-Fuller Test For AR(p+1)

We now assume that {yt } is the AR (p +1) process

φ(B )(1 − ρB )yt = c + εt ,

where φ(B ) = 1 − φ1B − . . . − φp B p has all roots outside the unit circle, p is known, {εt } is iid white

noise with zero mean, and c = φ(1)(1 − ρ)µ. (Prove this last equation for homework). Define

Yt = (yp +2 , . . . , yn )′. Define the differencing operator ∆ by ∆ = 1 − B , so that ∆ yt = yt − yt −1, and

∆ Yt = Yt − Yt −1.

The null hypothesis is that ρ = 1, in which case {yt } is nonstationary, with exactly one unit root,

and ∆ yt is a stationary AR (p ). The alternative hypothesis is that ρ < 1, in which case {yt } is stationary

AR (p +1). The τµ statistic is based on the coefficient of Yt −1 in a linear regression of ∆ Yt on 1, Yt −1,

∆ Yt −1 , . . . , ∆ Yt −p . Here, we present some motivation for this idea.

First, note that for any ρ,

c + εt = φ(B )(1−ρB )yt = (1 − φ1B − . . . − φp B p )( (1−B ) + (1−ρ)B )yt

= (1−B )yt − (φ1B + . . . + φp B p )(1−B )yt + φ(B )(1−ρ)yt−1 .

Therefore,

∆ yt = c + (φ1B + . . . + φp B p )∆ yt + φ(B )(ρ−1)yt −1 + [φ(1)(ρ−1)yt −1 − φ(1)(ρ−1)yt −1] + εt

= c + φ(1)(ρ−1)yt −1 + (φ1B + . . . + φp B p ) ∆ yt + (ρ−1)(φ(B ) − φ(1))yt −1 + εt . (1)

Now,

φ(B ) − φ(1) = 1 − Σk=1..p φk B^k − [1 − Σk=1..p φk ] = Σk=1..p φk (1−B^k ) .

The p ’th order polynomial φ(z ) − φ(1) = Σk=1..p φk (1−z^k ) clearly has a root at z = 1. Thus,

φ(z ) − φ(1) = (1−z ) γ(z ) for some polynomial γ(z ) of degree p −1. So

(φ(B ) − φ(1))yt −1 = (1−B ) γ(B )yt −1 = γ(B )∆ yt −1.

Thus from (1),

∆ yt = c + φ(1)(ρ−1)yt −1 + [Linear Combination of ∆ yt −1 , . . . , ∆yt −p ] + εt .

So if we regress ∆ Yt on 1, Yt −1, ∆ Yt −1 , . . . , ∆Yt −p then the coefficient β̂1 of Yt −1 will be a consistent

estimate of φ(1)(ρ−1). Since φ(1) > 0 (Homework), we can rewrite the null and alternative hypothesis as

H 0 : φ(1)(ρ−1) = 0

H 1 : φ(1)(ρ−1) < 0 .

We now define the test statistic as τµ = β̂1/s β̂1 . The asymptotic distribution of τµ under H 0 is the same

as for the case p = 0 described earlier. So, at least in large samples, the Dickey-Fuller test can still be

carried out in the case where {yt } is a (possibly nonstationary) AR (p +1), assuming that p is known. If

H 0 is true, then we also get estimates of the AR parameters: If ρ = 1, then (1) reduces to

∆ yt = (φ1B + . . . + φp B p )∆ yt + εt ,

so the coefficients of ∆ Yt −1 , . . . , ∆ Yt −p in the regression described above provide consistent estimates

of φ1 , . . . , φp .
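The whole procedure can be sketched in a few lines of numpy. Here adf_tau_mu is a hypothetical helper name, the data are simulated with an assumed seed, and the OLS details follow the regression described above:

```python
import numpy as np

def adf_tau_mu(y, p):
    """tau_mu: studentized coefficient of y_{t-1} in the regression of
    dy_t on (1, y_{t-1}, dy_{t-1}, ..., dy_{t-p})."""
    dy = np.diff(y)
    resp = dy[p:]                               # dy_t for t = p+1, ..., n-1
    cols = [np.ones(len(resp)), y[p:-1]]        # intercept and y_{t-1}
    cols += [dy[p - k: len(dy) - k] for k in range(1, p + 1)]  # dy_{t-k}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
    resid = resp - X @ beta
    s2 = resid @ resid / (len(resp) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return beta[1] / se                         # studentized estimate of phi(1)(rho-1)

rng = np.random.default_rng(6)
y = np.cumsum(rng.standard_normal(500))         # H0 true: one unit root
print(adf_tau_mu(y, p=2))  # compare with the Dickey-Fuller table (5% point near -2.86)
```

Under H 0 the statistic fluctuates around the Dickey-Fuller null distribution, while for a stationary series it tends to a large negative value, leading to rejection.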
