
Chapter 6

Time series analysis

Applied Econometrics
WS 2020/21

Prof. Dr. Simone Maxand


Humboldt University Berlin

Contents
6.1 Introduction

6.2 Stochastic processes

6.2.1 Basic concepts

6.2.2 Stationarity and ergodicity

6.2.3 Linear processes

6.3 ARMA models

6.3.1 Autoregressive and other ARMA processes

6.3.2 Estimation and forecasting

6.3.3 The Box-Jenkins program

6.4 Nonstationary processes

6.4.1 Unit root processes

6.4.2 Unit root tests

6.4.3 An empirical application with R



6.1 Introduction
I Time Series (TS): sequence (set) of observations yt of a

random variable over time

. Values of a variable are observed at successive time points.

. Observation at time t ∈ T : yt
I Notation: Write {yt }t∈T
. Sometimes shortly: {yt } or yt (if obvious that the TS and not
the observation at time t is meant)

I Example: Monthly US industrial production index


I T discrete set ⇒ {yt }t∈T TS in discrete time

. Data per hour, day, week, month, quarter, year, etc.

. Special case: T finite, equidistant points in time, i.e.

T = {1, ..., T } ⇒ {yt }t∈T = {y1 , ..., yT }

I TS in continuous time: Observations are recorded continuously

over some time interval, e.g. T = [0, 1].


. We then use the notation y (t) rather than yt .

I In theory: We often assume that {yt }t∈T has started in the
(infinite) past (t ≤ 0) and continues into the (infinite) future
(t > T ), i.e. {yt }t=−∞..∞ .

. {yt }t=1..T is considered as a finite segment of that infinite series.


Typical characteristics of TS data


I yt is typically not independent of yt−1 !
. Strength of dependence is an essential characteristic of TS.

. Examples:
 Independence of yt for all t = 1, . . . , T .
 Dependence under stationarity: yt = φyt−1 + εt with |φ| < 1
 Integrated process (stochastic trend): yt = yt−1 + εt
 Deterministic (linear) trend: yt = β · t + εt

I TS data may have a time-varying variance.

I TS data are often governed by a trend (deterministic/stoch.?).

I TS data have seasonal/cyclic components.

I TS data may have structural breaks.


Goals of time series analysis


I Generally, TS data can be used to answer quantitative

questions for which cross-sectional data are inadequate.

I Description/estimation of dynamic properties

. to gain a better understanding of the DGP:

 Are there regularities or structures in the data?

. to check economic theory:

 E.g. Quantity Theory of Money: money supply has a direct


proportional relationship to the price level,

. to forecast the future development of an economic variable:

 What is next month's inflation rate, interest rate, stock price, etc.?


I (Dynamic) causal dependences between variables:


. How does yt depend on xt−1 ?
. How does xt depend on yt−1 ?
. How does xt depend on yt ?
. Example: What will be the present and future implications of a
change in income for consumption and investment?
. Require Multivariate Time Series Analysis (not in this course).

I Forecasts
. Predict yt based on yt−1 , yt−2 ,...
. Predict yt based on xt−1 , xt−2 ,...
. Example: Forecast of inflation rate y by means of its own past
 or ADL model: inflation rate y is additionally influenced by the
unemployment rate and its lagged values

. Forecasts make sense even without causal interpretation (e.g.


in case of omitted variables).


Time series plots


I A first impression of the behavior of the TS is provided by
a graphical representation (TS plot).

I A TS plot provides information about

. trends,

. seasonal patterns,

. structural breaks,

. conditional heteroscedasticity,

. outliers, etc.

I Note: When dealing with outliers, common sense is often

more important than statistical theory.


Quarterly German GDP and first differences

[Figure: Quarterly German log(real GDP) (left panel) and its first differences, i.e. income growth (right panel), 1983-2013.]


Daily exchange rate BRA-USD


Daily DAX returns (in %)

[Figure: Daily DAX returns in % (DAX, daily change in %), 2000-2009. Source: Thomson Reuters Datastream.]


Monthly car registrations in Germany


The lag operator


I Applying some operator to a TS (or sequence of (random)

variables) provides a new TS (or sequence).

I Lag (backshift) operator L is defined by:

Lyt := yt−1 (first lag of yt )

. Convention: L0 yt = yt

I Powers of L are defined in an obvious way (recursively):

Lj yt = L(Lj−1 yt ) (j ≥ 2)
⇒ Lj yt = yt−j , j ≥ 0 (j-th lag of yt )
I Obviously, for some constant c and integers j, k:
Lj c = c and Lj Lk yt = Lj+k yt
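
A quick illustration (not from the original slides): the lag operator corresponds to shifting a vector in R; lag_j below is a hypothetical helper, a minimal sketch:

# j-th lag of a numeric vector y (hypothetical helper)
lag_j <- function(y, j = 1) c(rep(NA, j), head(y, -j))
y <- c(5, 7, 6, 8, 9)
lag_j(y, 1)   # NA 5 7 6 8   (L y_t = y_{t-1})
lag_j(y, 2)   # NA NA 5 7 6  (L^2 y_t = y_{t-2})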

Lag polynomials
I The lag operator is a linear operator:

L(cxt + yt ) = cLxt + Lyt


I Lag polynomial: for some index set I ⊆ Z,

c(L) = Σj∈I cj Lj

I A lag polynomial describes a linear filter:

yt∗ := c(L)yt = Σj∈I cj Lj yt = Σj∈I cj yt−j

. Convention: c(1) = Σj∈I cj
I Algebra of lag polynomials is isomorphic to the algebra of

usual polynomials (in real or complex variables).


Difference operator
I The difference operator ∆ of first order is defined by:

∆yt = yt − yt−1 .
⇒ yt∗ = ∆yt = yt − yt−1 is a linear filter (difference filter).
⇒ ∆ = 1 − L, i.e. ∆yt = yt − yt−1 = (1 − L)yt

I The difference operator ∆p of order p ≥ 1 is defined recursively by

∆p yt := ∆(∆p−1 yt ) = ∆p−1 yt − ∆p−1 yt−1 ; ∆0 yt = yt .

I Polynomials in L and ∆ may be manipulated in the same way
as polynomials in real or complex variables, e.g.:

∆p = (1 − L)p = Σj=0..p (p choose j) (−1)j Lj .


Examples
I Difference operator of order 2:

∆2 yt = ∆(∆yt ) = (yt − yt−1 ) − (yt−1 − yt−2 )


= yt − 2yt−1 + yt−2
= (1 − 2L + L2 )yt = (1 − L)2 yt

I ∆p removes a polynomial of order p (degree p − 1)


I For the Example with p = 2:

∆(α + βt) = [α + βt] − [α + β(t − 1)] = β


⇒ ∆2 (α + βt) = ∆[∆(α + βt)] = ∆(β) = 0
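
In R these operators are available through diff(); a small sketch with toy numbers (illustration only):

y <- c(2, 5, 4, 8, 13)
diff(y)                                # first differences: 3 -1 4 5
diff(y, differences = 2)               # second differences Delta^2 y: -4 5 1
diff(2 + 3 * (1:10), differences = 2)  # Delta^2 removes a linear trend: all zeros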


I The first difference of the log TS describes the growth rate:

∆ ln(yt ) = ln(yt ) − ln(yt−1 ) = ln(yt /yt−1 )
= ln(1 + (yt − yt−1 )/yt−1 ) ≈ (yt − yt−1 )/yt−1 = ∆yt /yt−1 .

I Moving average filter (of order q):

yt∗ = (1/(2q+1)) Σ|j|≤q yt−j = c(L)yt , where
cj = 1/(2q+1) if |j| ≤ q, and cj = 0 otherwise.

I Seasonal difference filter (for seasonality s):

∆s yt = yt − yt−s = (1 − Ls )yt .
. For example, s = 4 in case of quarterly data.
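
A short R illustration of the three filters above (a sketch with arbitrary toy data; q = 2 and s = 4 are example choices):

set.seed(1)
y <- exp(cumsum(rnorm(40) / 10))                 # a positive toy series
growth <- diff(log(y))                           # growth rate Delta log(y)
ma <- stats::filter(y, rep(1/5, 5), sides = 2)   # moving average filter, q = 2
seas <- diff(y, lag = 4)                         # seasonal difference, s = 4 (quarterly)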


Classical decomposition
I Many economic TS exhibit trends and seasonal patterns that
are informative but often not of interest for the study.

I Classical decomposition: yt can be written as the sum of a

trend (tt ), seasonal (st ) and random (rt ) component:

yt = tt + st + rt
⇒ Detrended and deseasonalized time series:

rbt = yt − tbt − sbt .


I Estimate the trend parametrically (e.g. linear tt = α0 + α1 t) or by
filtering (see last slide); estimate seasonality by trigonometric
functions → seasonally adjusted data are often available.
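
In R, such an additive decomposition can be sketched with decompose() (using the built-in monthly co2 series purely as an illustration):

dec <- decompose(co2)   # additive decomposition: trend + seasonal + random
r_hat <- dec$random     # detrended and deseasonalized component
plot(dec)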


Autocovariance and autocorrelation


I Assume: yt is a realization of a real-valued random variable Yt .
I Autocovariance:

γ(t, s) := Cov[Yt , Ys ] = E[(Yt − E[Yt ])(Ys − E[Ys ])]

I Autocorrelation:

ρ(t, s) := Corr[Yt , Ys ] = Cov[Yt , Ys ] / (√V[Yt ] √V[Ys ]) = γ(t, s) / (√γ(t, t) √γ(s, s)),

where γ(t, t) = V[Yt ] := E[(Yt − E[Yt ])2 ].


I The description of the dependence of a variable between different
time points is a main issue in time series analysis.

I Autocovariance/autocorrelation describes linear dependence!

I Obviously it holds: γ(s, t) = γ(t, s).


I γ(s, t) = 0 ⇒ Ys and Yt are uncorrelated, but can nonetheless

(strongly) depend on each other.

I Special case: If (Yt , Ys )0 follow a bivariate normal distribution,

then

γ(s, t) = 0 ⇔ Ys and Yt are stochastically independent.


Sample moments
I Besides plots, sample moments may serve as exploratory tools.

. Meaningful in particular for so-called stationary time series

I Sample mean describes the central location of the series:

ȳ = (1/T) Σt=1..T yt .

I Sample variance describes the variation of the series:

σ̂2 = (1/T) Σt=1..T (yt − ȳ)2 .

I Sample standard deviation: σ̂


I Sample autocovariance function at lag h (h = 0, 1, . . . , T − 1):

γ̂h = (1/T) Σt=h+1..T (yt − ȳ)(yt−h − ȳ) or
γ̂h = (1/(T−h)) Σt=h+1..T (yt − ȳ)(yt−h − ȳ).

I Sample autocorrelation function (measures dependence over time):

ρ̂h = γ̂h / γ̂0 (|ρ̂h | ≤ 1).
I (Auto-)correlogram: plot ρ̂h against h.
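
In R, these sample moments and the correlogram are obtained directly (illustrative sketch with a simulated series):

set.seed(42)
y <- arima.sim(model = list(ar = 0.5), n = 200)  # toy stationary series
mean(y); var(y)                            # sample mean and variance
acf(y, lag.max = 25)                       # correlogram (sample ACF)
acf(y, lag.max = 25, type = "covariance")  # sample autocovariances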



6.2 Stochastic processes


6.2.1 Basic concepts
I The analysis of time series (TS) requires a suitable

mathematical model for the data.

I Each observation yt of the TS is considered as a realization of

a random variable (RV) Yt .


I The observed TS {yt }t∈T0 is a realization of the family of RVs

{Yt }t∈T0 .
I The observed TS is (part of) a realization of a stochastic
process {Yt }t∈T , T0 ⊆ T .


Stochastic process
I Definition. A stochastic process (SP) is a family of RVs
{Yt }t∈T defined on a probability space (Ω, A, P).

I Here: T = Z and T0 = {1, . . . , T }.

I {Yt }t∈T is defined on Ω × T , with values in some space E.
. In this course: E = R (univariate TS).
. Or: E = Rp (multivariate TS).

I For each fixed t ∈ T , Yt is a RV, i.e. Yt = Yt (·) : Ω → E .

I For each fixed ω ∈ Ω, Y· (ω) : T → E is a function of time.


SP, cont.
I The functions {Y· (ω)}ω∈Ω on T are called realizations (or

trajectories) of the SP {Yt }t∈T .

I The image space E of the SP {Yt }t∈T is called state (or

phase) space.

I Frequently:

. The term TS is used for both the data and the SP,

. there is no distinction in notation between the RV Yt and its


realization yt = Yt (ω), if meaning is clear from the context.


Example 1
I Let X ∼ N (0, 1) (defined on some space Ω) and define a SP
{Yt }t∈N by

Yt = (−1)t X

(i.e., more explicitly, Yt (ω) = (−1)t X (ω), ω ∈ Ω, t ∈ N).
I Realizations of this SP: functions of t obtained by fixing ω:

yt = (−1)t x, where x = X (ω)

I One realization of the SP: x = 0.45, t = 1, . . . , 20


Example 2 (Binary process)


I Let Yt , t = 1, 2, . . ., be a sequence of i.i.d. random variables with

P(Yt = 1) = P(Yt = −1) = 1/2 for all t.

⇒ It is not as obvious as in Example 1 that there exists a
probability space (Ω, A, P) with RVs Yt defined on Ω having
the required joint distributions, i.e. such that for all n ∈ N and
all i1 , . . . , in ∈ {−1, 1}:

P(Y1 = i1 , . . . , Yn = in ) = 1/2n

(finite-dimensional distributions). The existence of such a SP
is however guaranteed by Kolmogorov's theorem.


Finite-dimensional distributions
I An important characteristic of a (real-valued) SP is the
collection of its finite-dimensional distribution functions

Ft1 ,...,tn (·, . . . , ·),

which are defined for all t1 , . . . , tn with
t1 < t2 < . . . < tn , n = 1, 2, . . . by

Ft1 ,...,tn (y1 , . . . , yn ) = P(Yt1 ≤ y1 , . . . , Ytn ≤ yn ).

I In Example 1, we were able to define Yt (ω) quite explicitly for
each t and ω.
I In contrast, investigations frequently start by specifying the
collection of all finite-dimensional distributions, cp. Example 2.


Fundamental problem of time series analysis


I Aim: Inference about properties/characteristics of {Yt }.
. We try to explain a variable with regard to its own past (and
the history of random disturbances).

I But: We observe only one single trajectory of the TS.


. We cannot go back in time, let history run again, and observe
another realization of the TS!

I If we could do that several times:


⇒ For each year: several observations
⇒ Averaging over this cross-sectional dimension would provide
a consistent estimator of E[Yt ] for each year t.
. Similarly, other features (higher moments or the distribution
itself, etc.) of the process could be estimated for each year.


Solution
I A profound inference requires:

1. Yt 's have common (or similar) characteristics: If distribution of


Yt does not change over time, the observed values yt of
history can be viewed as realizations of the same distribution.
⇒ Concept of stationarity

2. Observations over time can be used to infer on properties of


each Yt (population properties): If the SP is not too
persistent, each observation yt contains information not
available from the other elements.
⇒ Concept of ergodicity
I 1. & 2. ⇒ The TS average over time, (1/T) Σt=1..T Yt , is a
consistent estimator of the population average E[Yt ].


6.2.2 Stationarity and ergodicity


Weak stationarity

I Definition. The time series {Yt }t∈Z is called weakly
stationary (or covariance stationary), if:

. E[|Yt |2 ] < ∞ ∀ t ∈ Z,
. E[Yt ] = µ ∀ t ∈ Z, and
. γ(s, t) = γ(s + r , t + r ) ∀ s, t, r ∈ Z.


Autocovariance function under stationarity


I If {Yt }t∈Z is weakly stationary, then

γ(s, t) = γ(t − s, 0) ∀s, t


= γ(h, 0) = γ(−h, 0) for h := t − s.
⇒ γ(s, t) depends only on |t − s|.
I Redefinition of the autocovariance function (at lag h):
γ(h) := γ(h, 0) = Cov[Yt , Yt−h ] ∀ t, h ∈ Z
with γ(−h) = γ(h) and V[Yt ] := γ(0) (for all t ∈ Z).
I Autocorrelation function (ACF): ρ(h) := γ(h)/γ(0).
. The ACF describes the short-term dynamics of a TS (in
contrast, the trend characterizes the long-run behavior).


Empirical autocorrelation function


ρ̂(h) = γ̂(h)/γ̂(0) = [Σt=h+1..T (yt − ȳ)(yt−h − ȳ)] / [Σt=1..T (yt − ȳ)2 ],

ȳ := (1/T) Σt=1..T yt ,

γ̂(0) = σ̂y2 := (1/T) Σt=1..T (yt − ȳ)2 .

I Compare the first definition given earlier (sample moments).

. Now, the notation has slightly changed.

. Under stationarity, the theoretical ACF is estimated!


Strict stationarity

I Definition. The TS {Yt }t∈Z is called strictly stationary, if for
all n ∈ N and for all t1 , ..., tn , h ∈ Z the (finite-dimensional
marginal) distributions of (Yt1 , . . . , Ytn )′ and
(Yt1 +h , . . . , Ytn +h )′ are identical, i.e.

(Yt1 , . . . , Ytn )′ =d (Yt1 +h , . . . , Ytn +h )′ .

. That is, the finite-dimensional marginal distributions of a
strictly stationary process are shift-invariant.


Relations between stationarity concepts

I If {Yt } is strictly stationary with E|Yt |2 < ∞, then {Yt } is

also weakly stationary.

I The converse of the above implication does not hold in

general, but it holds for Gaussian processes.

I {Yt } is called a Gaussian process if its finite-dimensional
distributions are multivariate normal (Gaussian) distributions.


Examples
I Let εt ∼ N (0, σ 2 ) i.i.d. Then:

1. Yt = µ + εt is (strictly and weakly) stationary.

2. Yt = βt + εt is not stationary (because of time trend).

3. Yt = Yt−1 + εt with initial condition Y0 = 0 is not stationary.

 Random walk

 E[Yt ] ≡ 0
 γ(s, t) = σ 2 min(s, t)

I In what follows, the focus will be on second order processes

{Yt } (i.e. with E|Yt |2 < ∞) and on weak stationarity.
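
The three examples can be visualized with a quick R sketch (parameter values are arbitrary):

set.seed(1)
T <- 200; eps <- rnorm(T)
y1 <- 5 + eps            # 1. stationary: Y_t = mu + eps_t
y2 <- 0.1 * (1:T) + eps  # 2. deterministic trend: Y_t = beta*t + eps_t
y3 <- cumsum(eps)        # 3. random walk: Y_t = Y_{t-1} + eps_t, Y_0 = 0
matplot(cbind(y1, y2, y3), type = "l", lty = 1, ylab = "Y")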


Ergodicity
I We have only one observation of the SP and thus only the
time average Ȳ = (1/T) Σt=1..T Yt .

⇒ When does this converge to µ = E[Yt ]?

I Definition. A weakly stationary SP {Yt }t∈Z is called ergodic
for the mean µ = E[Yt ], if

Ȳ = (1/T) Σt=1..T Yt →p µ.

. Requires that γ(h) goes to 0 as h → ∞.
. Sufficient condition:

Σh=0..∞ |γ(h)| < ∞. (1)


I Definition. A weakly stationary process {Yt } is called ergodic
for the second moments, if

(1/(T−h)) Σt=h+1..T (Yt − µ)(Yt−h − µ) →p γ(h) ∀ h.

I If {Yt } is a Gaussian process, then (1) implies ergodicity
for all moments.

I Often, stationarity and ergodicity have the same requirements
(sufficient conditions), but the notions are different!


Example
I Assume

Yit = µ + λi + νit , i = 1, . . . , I ; t = 1, . . . , T ,

where the λi and νit are independent for all i, t with

λi ∼ N (0, σλ2 ) i.i.d. and νit ∼ N (0, σν2 ) i.i.d.

I The process {Yit }t∈Z is weakly stationary, but it is not ergodic.
. For fixed i, the time average converges to µ + λi rather than to µ = E[Yit ].


6.2.3 Linear processes


White noise processes
I White noise processes are the basic building block for many other
processes.

1. A SP {εt } is said to be a white noise process (with mean zero
and variance σ2 ), written {εt } ∼ WN(0, σ2 ),

⇔ E[εt ] ≡ 0 and (2)
E[εt εs ] = σ2 if t = s, and 0 otherwise. (3)

. Obviously, a WN process is weakly stationary with
autocovariance function γ(h) given by (3) with h = t − s.

White noise processes, cont.


2. If, additionally to (2) and (3), εt and εs are independent for t ≠ s,
then {εt } is an independent white noise process, written

{εt } ∼ IWN(0, σ2 ).
3. If εt ∼ N (0, σ 2 ) i.i.d., then {εt } is a Gaussian white noise

process, written

{εt } ∼ GWN(0, σ 2 ) .
I Clearly,

{εt } ∼ GWN ⇒ {εt } ∼ IID ⇒ {εt } ∼ IWN ⇒ {εt } ∼ WN

I The designation white originates from the analogy with

white light: It indicates that all possible periodic oscillations

are present with equal strength.


Simulated GWN(0,1) process

[Figure: Time series plot of a simulated GWN(0,1) process, 700 observations.]
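
Such a series is generated in R as follows (sketch):

set.seed(123)
eps <- rnorm(700)   # GWN(0,1)
plot.ts(eps, ylab = "Observations", main = "GWN(0,1)")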


Linear processes
I Let {εt } ∼ WN(0, σ2 ).
I Let {cj }j∈Z be a sequence of real-valued, absolutely summable
coefficients, i.e.

Σj=−∞..∞ |cj | := Σj=0..∞ |cj | + Σj=1..∞ |c−j | < ∞ . (4)

I Then {cj }j∈Z , or the associated lag polynomial

c(L) = Σj=−∞..∞ cj Lj ,

is called an absolutely summable linear filter.


I Applying a linear filter to a WN process (and possibly adding
a constant) yields a general linear process:

Yt = (µ +) c(L)εt = (µ +) Σj=−∞..∞ cj εt−j
= (µ +) Σj=0..∞ cj εt−j + Σj=1..∞ c−j εt+j

I Linear filters could be defined for arbitrary processes {εt }t∈Z ;
but WN processes are of particular interest in applications.


Existence of a linear process


I Existence is no problem as long as cj ≠ 0 holds only for a
finite number of the coefficients.

I Otherwise (4) assures the existence, because then:

Σj=−∞..∞ |cj εt−j | < ∞ with probability one for t ∈ Z .

⇒ The sequence Σj=−n..n cj εt−j converges almost surely to the
corresponding limiting value Yt (resp. Yt − µ).
I This a.s. limit coincides with the mean square limit, which
exists even under the weaker condition

Σj=−∞..∞ |cj |2 < ∞.

Weak stationarity of linear processes


I Let
(i) {εt }t∈Z be weakly stationary with E[εt ] = µε and
autocovariance function γε , and
(ii) {cj }j∈Z be absolutely summable.

I Then

Yt = Σj=−∞..∞ cj εt−j = c(L)εt

is weakly stationary with

µY = E[Yt ] = (Σj cj ) µε = c(1)µε ,
γY (h) = Σj Σi ci cj γε (h + i − j).

Causality and ergodicity


I An absolutely summable filter {cj }j∈Z is called causal, if
cj = 0 ∀ j < 0.

I If {εt }t∈Z ∼ WN(0, σ2 ) and {cj }j∈Z is absolutely summable
and causal, then the SP {Yt }t∈Z defined by

Yt := c(L)εt = Σj=0..∞ cj εt−j

is weakly stationary and causal.

. Causality means that Yt depends on εt , εt−1 , . . ., but not on
εt+1 , εt+2 , . . ..
. More precisely, {Yt }t∈Z is causal w.r.t. {εt }t∈Z .


I Then it holds that

µY = E[Yt ] ≡ 0, and
γY (h) = E[Yt Yt−h ] = σ2 Σj=0..∞ cj cj+h = γ(−h) (h ∈ N).

I Ergodicity for the mean follows from the absolute summability
of the filter {cj }.

I Under (4), ergodicity for the second moments follows e.g. if
εt ∼ (0, σ2 ) i.i.d. and E[εt4 ] < ∞.

I The SP {Yt } is ergodic for all moments, if {εt } ∼ GWN(0, σ2 ).



6.3.1 AR and other ARMA processes


Autoregressive (AR) processes
I An autoregressive process {Yt }t∈Z of order p, denoted AR(p),
satisfies the following difference equation (for every t):

Yt = c + φ1 Yt−1 + . . . + φp Yt−p + εt , {εt } ∼ WN(0, σ2 ).
I Lag operator notation:

Φp (L)Yt = c + εt , where

Φp (L) = 1 − φ1 L − . . . − φp Lp
is the autoregressive (AR) polynomial.

I Usually, only a stationary solution to the AR(p) equations is
called an AR(p) process.


I The AR model relates a TS to its past values and a current shock.

I E.g., forecasting the inflation rate Xt is of interest for

. investors at the stock market (how much to pay for bonds?),
. central banks (to decide about monetary policy), or
. firms (to forecast sales of their products).

I Fitting an AR(1) model for quarterly changes Yt = ∆Xt of the
U.S. inflation rate (1962-2004, see Stock & Watson, 2007):

ŷt = 0.017 − 0.238yt−1 .

. Increase of the inflation rate in one quarter ⇒ decrease of the
inflation rate next quarter.
. xT = 3.5(%), xT−1 = 1.6 ⇒ yT = 1.9 (with T = 2004:4)
⇒ ŷT+1|T = 0.017 − 0.238yT = −0.43 ≈ −0.4
⇒ Forecast of XT+1 : x̂T+1|T = xT + ŷT+1|T = 3.5 − 0.4 = 3.1


Moving average (MA) processes


I A moving average process {Yt }t∈Z of order q, denoted
MA(q), is given by:

Yt = µ + εt + θ1 εt−1 + . . . + θq εt−q , {εt } ∼ WN(0, σ2 ).

I Lag operator notation:

Yt = µ + Θq (L)εt , where
Θq (L) = 1 + θ1 L + . . . + θq Lq

is the moving average (MA) polynomial.

. The value of the TS is influenced by current and past shocks.


Autoregressive-moving average (ARMA) processes
I An autoregressive-moving average process {Yt }t∈Z of order
(p, q), denoted ARMA(p, q), satisfies (for every t):

Yt = c + φ1 Yt−1 + . . . + φp Yt−p + εt + θ1 εt−1 + . . . + θq εt−q ,

where {εt } ∼ WN(0, σ2 ).
I Lag operator notation:

Φp (L)Yt = c + Θq (L)εt , where
Φp (L) = 1 − φ1 L − . . . − φp Lp ,
Θq (L) = 1 + θ1 L + . . . + θq Lq .

. Again, only a stationary solution to the ARMA(p, q) equations
is usually called an ARMA(p, q) process.
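
In R, AR, MA and ARMA processes can be simulated with arima.sim() (sketch; the parameter values are arbitrary):

set.seed(7)
ar1  <- arima.sim(model = list(ar = 0.5), n = 300)            # AR(1)
ma1  <- arima.sim(model = list(ma = 0.9), n = 300)            # MA(1)
arma <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 300)  # ARMA(1,1)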

Weak stationarity of an AR(1) process


I Is there a stationary process {Yt } satisfying the AR(1)
equations

Yt = c + φYt−1 + εt , {εt } ∼ WN(0, σ2 )? (5)

I If |φ| < 1 (stability condition), then there exists a unique,
weakly stationary and causal solution to (5):

Yt = µ + Σj=0..∞ φj εt−j , µ = c/(1 − φ)

I This is an MA(∞) representation of the process with
absolutely summable coefficients, which is obviously causal
(and mean ergodic, since the coefficients are abs. summable).

I The existence and uniqueness hold e.g. in the mean square
sense.

I For |φ| = 1, there is no stationary solution.

I If |φ| > 1 (explosive case), then there exists a unique,
weakly stationary solution to (5) given by

Yt = µ − Σj=1..∞ φ−j εt+j .

. But this solution is not causal!
. Do not confuse this solution with the nonstationary solution
obtained when starting with any RV Y0 which is uncorrelated
with the WN!


Moments of the AR(1) process


I For |φ| < 1:

E[Yt ] := µ = c Σj=0..∞ φj = c/(1 − φ) ∀ t,
V[Yt ] = σ2 Σj=0..∞ φ2j = σ2 /(1 − φ2 ) ∀ t,
γ(h) = σ2 φ|h| /(1 − φ2 ),
ρ(h) = γ(h)/γ(0) = φ|h| .

I Note: The ACF satisfies the difference equation (for h > 0):
ρ(h) = φρ(h − 1).

The partial autocorrelation function (PACF)


I Let {Yt } be a weakly stationary process.

I Its partial autocorrelation function (PACF) α(h) at lag h

. is the correlation between Yt and Yt+h adjusted for the
intermediate observations Z := (Yt+1 , . . . , Yt+h−1 )′ , i.e. for h ≥ 2:

α(h) = Corr[ Yt − Ê(Yt |Z ), Yt+h − Ê(Yt+h |Z ) ],

where Ê(Yt |Z ) is the best linear prediction of Yt based on Z,
. or, equivalently, the last coefficient in a linear projection of Yt
on its h most recent values.

⇒ α(h) = φhh (h = 1, 2, . . .) in the following AR(h) regression:

Yt = c + φ1h Yt−1 + . . . + φhh Yt−h + εt ,

which allows its estimation by OLS.
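
In R, pacf(y) computes the sample PACF directly; the OLS route via the AR(h) regression can be sketched as follows (h = 3 is an arbitrary example):

set.seed(3)
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = 500))
pacf(y, lag.max = 10)        # sample PACF
h <- 3
X <- embed(y, h + 1)         # columns: y_t, y_{t-1}, ..., y_{t-h}
fit <- lm(X[, 1] ~ X[, -1])  # AR(h) regression
coef(fit)[h + 1]             # last coefficient: estimate of alpha(h)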

Simulated AR(1) process with c = 0, φ = 0.5, {εt } ∼ GWN(0, 1)

[Figure: Time series plot of the simulated AR(1) process with φ = 0.5, 700 observations.]


Simulated AR(1) process: Correlogram

[Figure: Sample ACF and PACF of the simulated AR(1) series (φ = 0.5), lags 0-25.]

        ACF     PACF
lag.1   0.510   0.510
lag.2   0.209  -0.070
lag.3   0.073  -0.007
lag.4   0.036   0.019
lag.5   0.018  -0.003
lag.6   0.019   0.013
lag.7   0.016   0.002
lag.8   0.006  -0.006
lag.9  -0.002  -0.004
lag.10 -0.036  -0.044
lag.11 -0.021   0.022
lag.12 -0.042  -0.046
lag.13 -0.049  -0.015
lag.14 -0.005   0.046
lag.15 -0.017  -0.044


Simulated AR(1) process: Now φ = 0.99

[Figure: Time series plot of the simulated AR(1) process with φ = 0.99, 700 observations.]


Simulated AR(1) process: Correlogram

[Figure: Sample ACF and PACF of the simulated AR(1) series (φ = 0.99), lags 0-25.]

        ACF     PACF
lag.1   0.962   0.962
lag.2   0.927   0.018
lag.3   0.895   0.024
lag.4   0.863  -0.013
lag.5   0.832  -0.002
lag.6   0.802   0.005
lag.7   0.773  -0.011
lag.8   0.745   0.001
lag.9   0.712  -0.077
lag.10  0.678  -0.036
lag.11  0.643  -0.042
lag.12  0.608  -0.025
lag.13  0.575   0.005
lag.14  0.542  -0.014
lag.15  0.507  -0.053


Simulated AR(1) process: Now φ = −0.9

[Figure: Time series plot of the simulated AR(1) process with φ = −0.9, 700 observations.]


Simulated AR(1) process: Correlogram

[Figure: Sample ACF and PACF of the simulated AR(1) series (φ = −0.9), lags 0-25.]

        ACF     PACF
lag.1  -0.899  -0.899
lag.2   0.808   0.002
lag.3  -0.730  -0.018
lag.4   0.666   0.038
lag.5  -0.603   0.028
lag.6   0.542  -0.019
lag.7  -0.488  -0.008
lag.8   0.436  -0.025
lag.9  -0.381   0.041
lag.10  0.337   0.021
lag.11 -0.290   0.044
lag.12  0.244  -0.028
lag.13 -0.198   0.031
lag.14  0.158  -0.003
lag.15 -0.125   0.002


Weak stationarity of AR(p) processes


I Let {εt } ∼ WN(0, σ2 ). Then there is a unique, weakly
stationary and causal solution to the AR(p) equations:

Yt = c + φ1 Yt−1 + . . . + φp Yt−p + εt ,
Φp (L)Yt = c + εt , with Φp (L) = 1 − Σj=1..p φj Lj ,

if all roots z1 , . . . , zp of the (characteristic) AR polynomial

Φp (z) = (1 − φ1 z − . . . − φp z p ) (z ∈ C)

lie outside the unit circle, i.e. |zj | > 1 for j = 1, . . . , p.
. This condition is called the stability condition and can be
equivalently expressed as

Φp (z) ≠ 0 ∀ |z| ≤ 1.

I Stability condition ⇒ MA(∞) representation:

Yt = µ + Σj=0..∞ ψj εt−j , Σj=0..∞ |ψj | < ∞, µ = c/Φp (1).

I Ψ(L) = Σj=0..∞ ψj Lj is the inverse filter of Φp (L), i.e.
Φp (z)Ψ(z) = 1 for all |z| ≤ 1.

I Factorization of the characteristic polynomial:

Φp (z) = Πj=1..p (1 − z/zj )

I The process is nonstationary if |zj | = 1 for some j ∈ {1, . . . , p}.

I If |zj | ≠ 1 for all j = 1, . . . , p but |zk | < 1 for at least one k
⇒ the AR(p) equations have a weakly stationary, non-causal solution.

Stationarity check for AR(2) processes: Examples

1. Yt = 2 + (5/6)Yt−1 − (1/6)Yt−2 + εt , {εt } ∼ WN(0, σ2 ).

⇒ Φ2 (z) = 1 − (5/6)z + (1/6)z2 = 0 ⇔ z2 − 5z + 6 = 0
⇔ z1,2 = 5/2 ± √(25/4 − 6) = 5/2 ± 1/2 ⇔ z1 = 3, z2 = 2

⇒ {Yt } is weakly stationary and causal, since |zi | > 1 for i = 1, 2.

2. Yt = 1 + (9/4)Yt−1 − (1/2)Yt−2 + εt , {εt } ∼ WN(0, σ2 ).

⇒ Φ2 (z) = 1 − (9/4)z + (1/2)z2 = 0 ⇔ z2 − (9/2)z + 2 = 0
⇔ z1,2 = 9/4 ± √(81/16 − 2) = 9/4 ± 7/4 ⇔ z1 = 4, z2 = 1/2

⇒ {Yt } is weakly stationary (|zi | ≠ 1), but non-causal (|z2 | < 1).
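
These root computations can be verified numerically in R with polyroot(), which takes the coefficients in increasing order of powers (sketch):

Mod(polyroot(c(1, -5/6, 1/6)))  # Example 1: moduli 2 and 3 (stationary, causal)
Mod(polyroot(c(1, -9/4, 1/2)))  # Example 2: moduli 0.5 and 4 (one root inside unit circle)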


Moments of the AR(p) process

I Under the stability condition it holds:

E[Yt ] := µ = c/(1 − Σj=1..p φj ) = c/Φp (1)
V[Yt ] = σ2 + Σj=1..p φj γ(j)
γ(h) = Σj=1..p φj γ(h − j) for h > 0.

⇒ Yule-Walker equations (difference equations of order p):

ρ(h) = Σj=1..p φj ρ(h − j) for h ≥ 1 (6)

MA process
I MA(q) process: Yt = µ + Σj=1..q θj εt−j + εt = µ + Θq (L)εt
I Moments: E[Yt ] := µ and

γ(h) = σ2 Σk=0..q−|h| θk θk+|h| if |h| ≤ q (with θ0 := 1), and
γ(h) = 0 if |h| > q.

I Without any condition on the parameters, the process exists and is
weakly stationary, causal and ergodic for the mean.

I Special case q = 1:

γ(0) = V[Yt ] = (1 + θ2 )σ2 ,
γ(1) = θσ2 , γ(h) = 0 for h > 1,
ρ(1) = θ/(1 + θ2 ), −1/2 ≤ ρ(1) ≤ 1/2.

Simulated MA(1) process with µ = 0, θ = 0.9, {εt } ∼ GWN(0, 1)

[Figure: Time series plot of the simulated MA(1) process with θ = 0.9, 700 observations.]


Simulated MA(1) process: Correlogram

[Figure: Sample ACF and PACF of the simulated MA(1) series (θ = 0.9), lags 0-25.]

        ACF     PACF
lag.1   0.479   0.479
lag.2  -0.042  -0.352
lag.3  -0.024   0.254
lag.4   0.057  -0.109
lag.5   0.103   0.179
lag.6   0.044  -0.142
lag.7   0.043   0.202
lag.8   0.066  -0.124
lag.9   0.035   0.123
lag.10  0.049  -0.041
lag.11  0.048   0.060
lag.12  0.002  -0.084
lag.13 -0.039   0.016
lag.14 -0.044  -0.055
lag.15 -0.013   0.028


Invertibility of ARMA processes


I Stationary and causal AR(MA) processes have an MA(∞)
representation with absolutely summable coefficients.

I Invertible ARMA processes have an AR(∞) representation with
absolutely summable coefficients.

I Definition. The ARMA(p, q) process Φ(L)Yt = Θ(L)εt is invertible,
if there exists a sequence of constants {πj } with

Σj=0..∞ |πj | < ∞ (absolute summability) and
εt + ν = Σj=0..∞ πj Yt−j , π0 = 1 (ν is some constant)

(⇔ Yt = ν − Σj=1..∞ πj Yt−j + εt ).

One can show that...


I If the roots of the MA polynomial

Θq (z) = (1 + θ1 z + . . . + θq z q )
are outside the unit circle, then the ARMA(p, q ) process,

Yt = c + φ1 Yt−1 + . . . + φp Yt−p + εt + θ1 εt−1 + . . . + θq εt−q ,


is invertible.

I Moreover, the ARMA(p, q ) process is weakly stationary and

causal, if the roots of the AR polynomial

Φp (z) = (1 − φ1 z − . . . − φp z p )
lie outside the unit circle (stability condition).

. Weak stationarity follows if Φp (z) ≠ 0 ∀ |z| = 1.



6.3.2 Estimation and forecasting


I Consider a weakly stationary and causal AR(p) process:

yt = c + Σj=1..p φj yt−j + εt = xt′ β + εt , {εt } ∼ WN(0, σ2 ),

with xt = (1, yt−1 , . . . , yt−p )′ and β = (c, φ1 , . . . , φp )′ .

I The regressors yt−j , j = 1, . . . , p do not depend on εt , εt+1 , . . .
⇒ Regressors are not strictly exogenous but predetermined.
⇒ xt and εt are uncorrelated: E[xt εt ] = 0.
⇒ The OLS estimator of β is biased, but (under mild conditions)
consistent and asymptotically normal.

I Alternative estimator: Use Yule-Walker equations (6).
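
Both estimation routes are built into R; a sketch on simulated data:

set.seed(11)
y <- arima.sim(model = list(ar = c(0.5, 0.2)), n = 500)
ar.ols(y, order.max = 2, aic = FALSE)  # OLS estimation of the AR(2)
ar.yw(y, order.max = 2, aic = FALSE)   # Yule-Walker estimation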


On the consistency of the OLS estimator

I Let c = 0, xt = (yt−1 , . . . , yt−p )′ and φ = (φ1 , . . . , φp )′ :

yt = Σj=1..p φj yt−j + εt = xt′ φ + εt ⇔ y = X φ + ε

I Under mild conditions on εt , {yt } is ergodic for the 2nd moments
and zt = xt εt is stationary and (mean) ergodic.

⇒ X′X/T = (1/T) Σt=1..T xt xt′ →p E[xt xt′ ] = (γ(h − k))h,k=1..p =: Γp
and X′ε/T = (1/T) Σt=1..T xt εt →p E[xt εt ] = 0

⇒ φ̂ = (X′X)−1 X′y = φ + (X′X/T)−1 (X′ε/T) →p φ

Estimation of ARMA models


I ARMA(p, q) model (q ≥ 1): Maximum Likelihood (difficult)

I For an ARMA(p, q) model (q ≠ 0) the OLSE is not
completely implementable. In practice, one also uses the
following simple approach:

I Step 1: Approximate the ARMA(p, q) by an AR(r) (with
r >> max{p, q}), and apply OLS.

⇒ OLS residuals e1 , . . . , eT .
I Step 2: Use OLS to estimate the model:

yt = c + φ1 yt−1 + . . . + φp yt−p + θ1 et−1 + . . . + θq et−q + εt .

⇒ Consistent estimator of (c, φ1 , . . . , φp , θ1 , . . . , θq )
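
A minimal sketch of this two-step approach in R (ARMA(1,1) example; r = 12 is an arbitrary choice):

set.seed(21)
y <- as.numeric(arima.sim(model = list(ar = 0.5, ma = 0.4), n = 500))
e <- ar.ols(y, order.max = 12, aic = FALSE)$resid  # Step 1: long AR, residuals
d <- data.frame(y = y[3:500], y1 = y[2:499], e1 = e[2:499])
step2 <- lm(y ~ y1 + e1, data = d)  # Step 2: regress y_t on y_{t-1} and e_{t-1}
coef(step2)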


The general problem of prediction


I Assume that all (functions of) random variables have finite
second moments.

I Aim: prediction of Y on the basis of X = (X1 , ..., Xk )′ .

I Suppose that Y is predicted by Ŷ = Ŷ (X ).
⇒ Prediction error: Y − Ŷ
I Performance measure: Mean Squared Error of Prediction

MSEP(Ŷ ) = E(Y − Ŷ )2


Result 1: Best (mean square) prediction


I The (population) regression function f ∗ (X ) := E[Y |X ] is the
best (mean square) prediction of Y on the basis of X , i.e.

E(Y − E[Y |X ])2 = min{f : Ef (X )2 <∞} E[Y − f (X )]2 .

I Note that E[Y |X ] is an unbiased prediction, i.e.
E(Y − E[Y |X ]) = 0.

⇒ MSEP(E[Y |X ]) = E(Y − E[Y |X ])2 is just the variance of the
prediction error (Y − E[Y |X ]).


Result 2: Best linear prediction


I Linear predictions are easier to obtain than E[Y |X ].
I If V[X ] is nonsingular, then the linear (population) regression
function

ℓ∗ (X ) = Ê[Y |X ] := E[Y ] + Cov(Y , X )(V[X ])−1 (X − E[X ])

is the best linear (mean square) prediction of Y on the basis of X, i.e.

E(Y − Ê[Y |X ])2 = min{β0 ,...,βk } E(Y − β0 − Σj=1..k βj Xj )2 .

I Note that Ê[Y |X ] is also an unbiased prediction.


Forecasting with ARMA models


I Aim: Prediction of YT +h based on X = (Y1 , . . . , YT )′
. h denotes the forecast horizon.

I By Result 1, YT +h|T := E[YT +h |YT , . . . , Y1 ] is the best

h-step-ahead forecast.

I In practice, it is easier to derive best linear forecasts (Result 2).

I But in case of linear processes such as ARMA models, one can

often proceed as follows:

. We derive the best forecast under a stronger assumption such


as IWN error terms.

. If this forecast is linear, it must be the (unique!) best linear


forecast under the weaker assumption of WN errors.


Example: Forecasting an AR(p) process


I Let {Yt } ∼ AR(p) be weakly stationary and causal.

I Assume: {εt } ∼ IWN(0, σ2 ), h = 1, T ≥ p ⇒

YT+1|T = E[YT+1 |YT , . . . , Y1 ]
= E[c + Σj=1..p φj YT+1−j + εT+1 |YT , . . . , Y1 ]
= c + Σj=1..p φj E[YT+1−j |YT , . . . , Y1 ] + E[εT+1 |YT , . . . , Y1 ]
= c + φ1 YT + . . . + φp YT+1−p

⇒ This is the best linear 1-step-ahead forecast under WN errors.
. YT+h|T can be obtained recursively (h = 2, 3, . . .).
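
In R, predict() performs exactly this recursion for a fitted AR model (sketch):

set.seed(5)
y <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 300)
fit <- arima(y, order = c(2, 0, 0))
predict(fit, n.ahead = 4)$pred   # recursive 1- to 4-step-ahead forecasts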



Example: Forecasting an ARMA(1,1) process


I AR(∞) representation of an (invertible) ARMA(1,1) process:

Yt − µ = (φ + θ) Σi=1..∞ (−θ)i−1 (Yt−i − µ) + εt

I Assume again: {εt } ∼ IWN(0, σ2 ), h = 1

⇒ Optimal forecast of YT+1 based on the infinite past:

Y∗T+1|T := E[YT+1 |YT , YT−1 , . . .]
= µ + (φ + θ) Σi=1..∞ (−θ)i−1 (YT+1−i − µ).

I Approximate Y∗T+1|T by truncating the infinite sum at T.
I In practice, we replace the parameters by estimates ⇒ ŶT+h|T .


6.3.3 The Box-Jenkins program


(1) If necessary, transform the data, so that the assumption of

weak stationarity is reasonable (see also Section 6.4).

. Main tool of Box & Jenkins (1976): (seasonal) differencing

(2) Model identification: Propose suitable lag orders p and q for
the ARMA model.

(3) Estimate the parameters of the ARMA(p , q) model.

. see Section 6.3.2

(4) Model validation/diagnostic check: is the model consistent

with the observed features of the data?

(5) Forecasting: Prediction of future values of the process (see

Section 6.3.2) and forecast evaluation.


(2) Model identification

I Evaluate the empirical (P)ACF.

. The ACF of an MA(q) process dies out after lag q.
. The PACF of an AR(p) process dies out after lag p.

I Make an initial guess of small values for the lag orders p and q
of a suitable ARMA model.

I For specifying an appropriate model, you could also use an
information criterion (cp. step (4)).


(4) Model validation / diagnostic analysis

I Calculation of residuals:

et := (Φ̂p (L)/Θ̂q (L)) Yt − ĉ/Θ̂q (1).

Example: For AR(p): et = Yt − ĉ − φ̂1 Yt−1 − . . . − φ̂p Yt−p .

I Under the WN/IWN/GWN assumption: the et are approximately
uncorrelated/independent/independently normally distributed.

I Analysis of the ACF/PACF and, if necessary, Jarque-Bera test.

I Plot (standardized) residuals:
. Roughly 95% should be within ±1.96.


Portmanteau tests (for autocorrelation)

I Test for the nonexistence of autocorrelation among the errors εt :

H0 : ρε (1) = . . . = ρε (h) = 0
H1 : ρε (j) ≠ 0 for some j ∈ {1, . . . , h}

I Portmanteau statistic (Box and Pierce, 1970):

QBP (h) = T Σj=1..h ρ̂ε (j)2 .

I If the εt are i.i.d., then under H0 we have asymptotically

QBP (h) ∼a χ2(h−m) ,

where m (= p + q) is the number of parameters to be
estimated.

I Ljung and Box (1978) (Q-statistic):

QLB (h) = T (T + 2) Σj=1..h ρ̂ε (j)2 /(T − j) ∼a χ2(h−m) under H0

I Note that ρ̂ε (j) estimates the true correlations between the
εt 's by means of residuals after fitting the ARMA model.

I The Ljung-Box statistic has greater power in smaller samples.

I Reject H0 ⇔ QLB (h) > χ2(h−m),1−α

I Attention: Rejecting the null hypothesis can also be due to
nonlinear dependences (e.g. GARCH effects)!
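
Both portmanteau statistics are available in R via Box.test(), where fitdf plays the role of m (sketch on the built-in lh series):

fit <- arima(lh, order = c(1, 0, 0))
e <- residuals(fit)
Box.test(e, lag = 10, type = "Ljung-Box", fitdf = 1)   # Q_LB(10), m = p + q = 1
Box.test(e, lag = 10, type = "Box-Pierce", fitdf = 1)  # Q_BP(10)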


Breusch-Godfrey LM test for autocorrelation

I Assume, e.g., an AR(p) model for Yt :

Yt = φ1 Yt−1 + . . . + φp Yt−p + εt .

I Test H0 : {εt } ∼ WN against H1 : {εt } ∼ AR(r).

1. Perform the auxiliary regression for the residuals et (after fitting
the AR(p) model):

et = α1 Yt−1 + . . . + αp Yt−p + β1 et−1 + . . . + βr et−r + νt .

2. Calculate the test statistic

LM = T · R2 ∼a χ2r under H0 ,

where R2 is the coefficient of determination of the auxiliary regression.

3. Reject H0 ⇔ LM > χ2r,1−α .
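
A hand-rolled sketch of this LM test in R (AR(1) model, r = 2; purely illustrative):

set.seed(9)
y <- as.numeric(arima.sim(model = list(ar = 0.5), n = 400))
e <- residuals(arima(y, order = c(1, 0, 0)))
n <- length(y)
aux <- lm(e[4:n] ~ y[3:(n - 1)] + e[3:(n - 1)] + e[2:(n - 2)])  # e_t on y_{t-1}, e_{t-1}, e_{t-2}
LM <- (n - 3) * summary(aux)$r.squared
LM > qchisq(0.95, df = 2)   # TRUE would mean: reject H0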



Model comparison
I Akaike Information Criterion (AIC):

AIC := −2 ln[L(θ̂)] + 2m

I Bayes Information Criterion (BIC):

BIC := −2 ln[L(θ̂)] + m ln(T ),

where L(θ̂) is the likelihood function at the point θ̂ (MLE),
and m denotes the number of model parameters.
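
In R, both criteria can be read off a fitted arima object (sketch):

fit <- arima(lh, order = c(1, 0, 0))   # lh: built-in example series
AIC(fit)   # -2 logLik + 2m
BIC(fit)   # -2 logLik + m log(T)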


(5) Forecast evaluation

I Out-of-sample prediction:
. h-periods-ahead forecast of Yt : Ŷt+h|t
. Prediction error: et+h|t := Ŷt+h|t − Yt+h .

I Evaluation of T∗ forecasts for t ∈ T ∗ with |T ∗ | = T∗ :

. Mean Absolute Prediction Error (MAPE):

MAPE = (1/T∗ ) Σt∈T ∗ |Ŷt+h|t − Yt+h | = (1/T∗ ) Σt∈T ∗ |et+h|t |.

. Root Mean Square Prediction Error (RMSPE):

RMSPE = √[(1/T∗ ) Σt∈T ∗ (Ŷt+h|t − Yt+h )2 ] = √[(1/T∗ ) Σt∈T ∗ (et+h|t )2 ].

I Pseudo-out-of-sample: e.g. T ∗ = {T − h, . . . , T − h − T∗ + 1}
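
Given vectors of forecasts and realizations, both measures are one-liners in R (the numbers below are hypothetical):

ynew <- c(3.1, 2.8, 3.4, 3.0)   # realized values (hypothetical)
yhat <- c(3.0, 3.0, 3.2, 3.1)   # h-step forecasts (hypothetical)
err <- yhat - ynew              # prediction errors
mean(abs(err))                  # MAPE
sqrt(mean(err^2))               # RMSPE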


6.4 Nonstationary processes


6.4.1 Unit root processes
I Many economic TS show a trending behavior (e.g. German
GNP).

⇒ Stationarity assumption is unrealistic.

I Possible reasons for nonstationarity are, for example:

1. A nonstable mean due to deterministic trends, seasonality,


breaks in deterministic components etc. (deterministic
nonstationarity), or

2. a root of the AR-polynomial of the process that lies on the


unit circle (e.g. a unit root; stochastic nonstationarity).


Trend-stationary processes
I A trend-stationary process is given by

Yt = δ0 + δt + Ψ(L)εt , {εt } ∼ WN(0, σ2 ),

where

Ψ(L)εt = ψ0 εt + ψ1 εt−1 + ..., ψ0 = 1, Σj=0..∞ |ψj | < ∞.

I Ỹt := Yt − δ0 − δt = Ψ(L)εt is a weakly stationary and causal
process with zero mean.

. Any weakly stationary and causal ARMA process (with mean
zero) has such an MA(∞) representation!

⇒ E[Yt ] = δ0 + δt.


Example: Trend-stationary AR(1) process

I Let Xt be a zero-mean weakly stationary and causal AR(1) process:

Xt = φXt−1 + εt , |φ| < 1, {εt } ∼ WN(0, σ2 )
⇒ Xt = Ψ(L)εt = Σj=0..∞ φj εt−j

I Trend-stationary AR(1) process:

Yt = δ0 + δt + Xt
= δ0 + δt + φXt−1 + εt (with Xt−1 = Yt−1 − δ0 − δ(t−1))
= [δ0 (1 − φ) + δφ] + δ(1 − φ)t + φYt−1 + εt

I Last representation: AR(1) process around a linear trend


Simulated trend-stationary AR(1) process

Yt = −0.05 + 0.05t + Xt , Xt = 0.7Xt−1 + εt , t = 1, . . . , 700
⇔ Yt = 0.02 + 0.015t + 0.7Yt−1 + εt , {εt } ∼ GWN(0, 1)

[Figure: Time series plot of the simulated trend-stationary AR(1) process, 700 observations.]
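
This series can be reproduced along the following lines (sketch):

set.seed(2)
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = 700))
y <- -0.05 + 0.05 * (1:700) + x   # trend-stationary AR(1)
plot.ts(y)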

Simulated trend-stationary AR(1) process: Correlogram (Sample ACF/PACF)

[Figure: Sample ACF and PACF of the simulated trend-stationary AR(1) series, lags 0-25.]

        ACF     PACF
lag.1   0.990   0.990
lag.2   0.982   0.104
lag.3   0.974   0.021
lag.4   0.968   0.092
lag.5   0.963   0.071
lag.6   0.959   0.029
lag.7   0.954   0.028
lag.8   0.950   0.010
lag.9   0.946   0.054
lag.10  0.942  -0.001
lag.11  0.937  -0.038
lag.12  0.932  -0.012
lag.13  0.927  -0.042
lag.14  0.923   0.066
lag.15  0.919   0.007


Integrated processes
I Definition: For d ∈ N0 , a time series {Yt }t=−∞..∞ is called
integrated of order d, denoted {Yt } ∼ I (d), if {∆d Yt } is a
(weakly) stationary process, whereas {∆d−1 Yt } is not (trend)
stationary.

I Representation: ∆d Yt = c + Ψ(L)εt ,
. Σj=0..∞ |ψj | < ∞, ψ0 = 1, Ψ(1) ≠ 0

I Example: ARIMA(p, d, q) process:

Φp (L)(1 − L)d Yt = c + Θq (L)εt ,

where Wt ≡ (1 − L)d Yt is the d-th difference of Yt and
follows a (stationary) ARMA(p, q) process.


Simulated ARIMA(1,1,0) process: φ = 0.7

(Yt − Yt−1 ) = 0.7(Yt−1 − Yt−2 ) + εt , {εt } ∼ GWN(0, 1)

[Figure: Time series plot of the simulated ARIMA(1,1,0) process, 700 observations.]
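
arima.sim() simulates integrated processes directly via the order argument (sketch):

set.seed(4)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.7), n = 700)
plot.ts(y)   # ARIMA(1,1,0): cumulated AR(1) differences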


Simulated ARIMA(1,1,0) process: Correlogram

[Figure: Sample ACF and PACF of the simulated ARIMA(1,1,0) series, lags 0-25.]

        ACF     PACF
lag.1   0.995   0.995
lag.2   0.984  -0.692
lag.3   0.967  -0.056
lag.4   0.947  -0.037
lag.5   0.925   0.004
lag.6   0.901   0.086
lag.7   0.877  -0.036
lag.8   0.852  -0.017
lag.9   0.827  -0.020
lag.10  0.802  -0.003
lag.11  0.777  -0.026
lag.12  0.752  -0.040
lag.13  0.727  -0.022
lag.14  0.701   0.063
lag.15  0.676   0.023


Example: random walk (with drift c)

I Random walk with drift:

Yt = c + Yt−1 + εt , {εt } ∼ WN(0, σ2 ).

I With (constant) initial value Y0 : Yt = Y0 + ct + Σj=1..t εj

E[Yt ] = t · c + Y0
V[Yt ] = t · σ2
ρ(t, t − h) = √(1 − h/t) (for h ≥ 0).

I The drift c generates a linear trend!

I For c = 0, the process is called a random walk.

I Then: {Yt } ∼ ARIMA(0, 1, 0).



Simulated random walk (RW)

Yt = Yt−1 + εt = Y0 + Σj=1..t εj , t = 1, . . . , 700, {εt } ∼ GWN(0, 1)

[Figure: Time series plot of the simulated random walk, 700 observations.]
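
A random walk is simply the cumulative sum of white noise in R (sketch):

set.seed(6)
y  <- cumsum(rnorm(700))           # random walk, Y_0 = 0
yd <- cumsum(0.02 + rnorm(700))    # random walk with drift c = 0.02
plot.ts(cbind(y, yd))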

Simulated random walk: Correlogram

[Figure: Sample ACF and PACF of the simulated random walk, lags 0-25.]

        ACF     PACF
lag.1   0.987   0.987
lag.2   0.972  -0.054
lag.3   0.958   0.015
lag.4   0.944  -0.002
lag.5   0.930  -0.002
lag.6   0.914  -0.068
lag.7   0.900   0.023
lag.8   0.886   0.010
lag.9   0.873   0.059
lag.10  0.861  -0.004
lag.11  0.848  -0.037
lag.12  0.836   0.054
lag.13  0.826   0.037
lag.14  0.819   0.081
lag.15  0.810  -0.030


Simulated random walk with drift c = 0.02

Yt = 0.02 + Yt−1 + εt = Y0 + 0.02t + Σj=1..t εj , t = 1, . . . , 700

[Figure: Time series plot of the simulated random walk with drift, 700 observations.]

Simulated RW with drift: Correlogram

[Figure: Sample ACF and PACF of the simulated random walk with drift, lags 0-25.]

        ACF     PACF
lag.1   0.996   0.996
lag.2   0.991  -0.005
lag.3   0.987  -0.017
lag.4   0.982   0.017
lag.5   0.978  -0.019
lag.6   0.973   0.018
lag.7   0.969   0.008
lag.8   0.965  -0.014
lag.9   0.961   0.025
lag.10  0.956   0.007
lag.11  0.952  -0.015
lag.12  0.948   0.004
lag.13  0.944   0.021
lag.14  0.940  -0.012
lag.15  0.936   0.013


Integrated processes, I(0) vs I(1)

          I(0)                                I(1)
stationary process                  e.g. random walk
mean reverting                      wanders widely, stochastic trend
effect of error term is temporary   effect of error term is infinite

I Unit root process ≙ I(1) ⊆ nonstationary.

I 'Near-I(1)' processes: stationary but highly persistent, often
approximated by I(1) processes in empirical applications.


6.4.2 Unit root tests


I The Box-Jenkins program assumes that the data have been
transformed, if necessary, so that the stationarity assumption is
reasonable.

I Modelling of the trend: deterministic or stochastic trend?

I Stochastic trend ⇒ (in contrast to a deterministic trend):

. the series does not revert to a long-term trend line,
. innovations/shocks have a permanent (non-vanishing) effect,
. the forecast variance does not converge, but increases
(linearly) with the forecast horizon.

⇒ The integration order of the process is of great importance for the
analysis (economic interpretation).

⇒ Interest is in tests which allow one to detect a unit root in the AR
polynomial of an AR(MA) process.



The Dickey-Fuller (DF) test

I Consider an AR(1) model: yt = φ1 yt−1 + εt .
I DF regression:

∆yt = (φ1 − 1)yt−1 + εt or ∆yt = αyt−1 + εt .

I Test H0 : α = 0 (φ1 = 1) vs H1 : α < 0 (φ1 < 1).
I The asymptotic null distribution of the DF t-statistic is not
normal:

DF = (φ̂1 − 1)/√Var(φ̂1 ) = α̂/√Var(α̂) →d (1/2)(W (1)2 − 1) / √(∫0..1 W (r)2 dr),

where W is a Brownian motion and W (1) is its value at 1.

I The asymptotic distribution is called the Dickey-Fuller distribution.


The augmented Dickey-Fuller (ADF) test


I Consider an AR(p ) model for {Yt } without deterministic

terms:

(1 − φ1 L − . . . − φp Lp )Yt = εt ,

I Alternative formulation:

Yt = ρYt−1 + ζ1 ∆Yt−1 + . . . + ζp−1 ∆Yt−p+1 + εt , (7)

where ∆Yt := Yt − Yt−1 , ρ := φ1 + . . . + φp , and
ζj := −[φj+1 + . . . + φp ] for j = 1, 2, . . . , p − 1.


I If {Yt } ∼ I (1), then there exists a solution to the equation

Φp (z) = (1 − φ1 z − . . . − φp z p ) = 0

that is equal to one (and {Yt } is called a unit root process), i.e.

Φp (1) = 1 − φ1 − . . . − φp = 0 ⇔ ρ = 1.

I In this case (7) is not stationary; if however one unit root is
removed from Φp (L), then the resulting process is stationary, if
1 was the only root on the unit circle.

I Therefore the ADF test regression equation is given by:

∆Yt = αYt−1 + ζ1 ∆Yt−1 + . . . + ζp−1 ∆Yt−p+1 + εt , (8)

where α = ρ − 1 = φ1 + . . . + φp − 1 = −Φp (1).


I The test for a unit root is

H0 : α = 0 vs. H1 : α < 0,

and is performed by means of a simple t-test statistic for α.
. If {Yt } is stationary and causal, then α < 0.
. Under H1 , {Yt } might be nonstationary or stationary and
non-causal; but these cases are of little practical relevance.

I Because under H0 the regressor Yt−1 is not stationary, the resulting
t-statistic is neither t-distributed nor asymptotically normally
distributed; it follows a non-standard (Dickey-Fuller) distribution.

I Instead, adapted (simulated) critical values developed by
MacKinnon (1991) have to be used.

I Adding a constant or a linear trend to the ADF regression (8) leads
to different (non-standard) distributions of the t-statistic.


Deterministic terms in ADF regression


I Deterministic terms have different effects under H0 and H1 :
e.g., under H0 (unit root) a constant generates a linear trend
(in contrast to the trend-stationary case). ⇒ Proposed solution:

. If the data has a nonzero mean (and shows prolonged upward
and downward patterns but no clear overall trend direction),
then include a constant in the regression.

. If the series has a clear (linear) trend direction, then include a
linear trend term in the regression.

 Exception: e.g. interest rates (no economic theory for a trend).

⇒ The model is correctly specified under H1 .

 Under H0 : the corresponding parameter estimates should be
close to zero.


6.4.3 An empirical application with R


I Application of the Box-Jenkins Program to a data set:

I Data: Quarterly U.S. fixed investment, from 1947:1 until
1972:4 (xt , t = 1, . . . , T , T = 104), plotted below.

I It turns out, e.g. by unit root testing, that {xt } ∼ I (1) (unit
root process).

⇒ Select and estimate an appropriate model for the quarterly
changes of U.S. fixed investment, yt = ∆xt .

I Check the adequacy of the selected model by diagnostic tests
of the residuals.

I Forecast the future values (4 quarters) of xt (nonstationary).


Data: U.S. fixed investment (x)

I Quarterly observations, 1947:1-1972:4 (T = 104)

[Figure: Time series plot of U.S. fixed investment, 1950-1970.]


Sample ACF and PACF for U.S. fixed investment

[Figure: Sample ACF (top panel) and PACF (bottom panel) of the series x, lags 0-5.]

ADF test for U.S. fixed investment

I ADF regression with linear time trend since the time series shows an
upward trend:

> adfTest(x, type="ct")

Title:
Augmented Dickey-Fuller Test

Test Results:
PARAMETER:
Lag Order: 1
STATISTIC:
Dickey-Fuller: -2.4072
P VALUE:
0.4079

. With a p-value of 0.41, H0 of a unit root cannot be rejected for x.

Quarterly changes of U.S. fixed investment (y)

[Figure: Time series plot of the quarterly changes in U.S. fixed investment, 1950-1970.]


Sample ACF and PACF for quarterly changes of U.S. fixed investment

[Figure: Sample ACF (top panel) and PACF (bottom panel) of the series y, lags 0-5.]

ADF test for quarterly changes in U.S. fixed investment

I ADF regression with constant since the time series shows a nonzero mean:

> adfTest(y, type="c")

Title:
Augmented Dickey-Fuller Test

Test Results:
PARAMETER:
Lag Order: 1
STATISTIC:
Dickey-Fuller: -5.3516
P VALUE:
0.01

. The unit root hypothesis is rejected for y ⇒ {xt } ∼ I (1).

Estimation of an AR(1) model for y


> ar1<-arima(y,order=c(1,0,0))
> ar1

Call:
arima(x = y, order = c(1, 0, 0))

Coefficients:
ar1 intercept
0.5019 1.0885
s.e. 0.0899 0.4994

sigma^2 estimated as 6.308: log likelihood = -234.13,


aic = 474.25

Estimation of an AR(4) model for y


I Reasonable if PACF of Yt is nonzero.

> ar4<-arima(y,order=c(4,0,0))
> ar4

Call:
arima(x = y, order = c(4, 0, 0))

Coefficients:
ar1 ar2 ar3 ar4 intercept
0.5264 -0.0968 -0.0155 -0.2043 1.0125
s.e. 0.1015 0.1146 0.1149 0.1023 0.3085

sigma^2 estimated as 5.84: log likelihood = -230.4,


aic = 472.81

Estimation of a restricted AR(4) model for y


I Model: yt = c + ϕ1 yt−1 + ϕ4 yt−4 + εt (smallest AIC/BIC)

> ar4r<-arima(y,order=c(4,0,0),fixed=c(NA,0,0,NA,NA))
> ar4r

Call:
arima(x = y, order = c(4, 0, 0), fixed = c(NA, 0, 0, NA, NA))

Coefficients:
ar1 ar2 ar3 ar4 intercept
0.4750 0 0 -0.2292 1.0150
s.e. 0.0879 0 0 0.0889 0.3247

sigma^2 estimated as 5.903: log likelihood = -230.93,


aic = 469.86

Some diagnostics for the selected model

[Figure: Standardized residuals (top), ACF of the residuals (middle), and p-values for the Ljung-Box statistic at lags 1-20 (bottom).]
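
These three diagnostic panels are produced by tsdiag() applied to the fitted model (sketch):

tsdiag(ar4r, gof.lag = 20)  # standardized residuals, residual ACF, Ljung-Box p-values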

Distribution of residuals in the selected model

[Figure: Histogram of the residuals.]

Jarque-Bera test:
X-squared 2.8564686
df 2.0000000
p-value 0.2397318

. Normality of the residuals is not rejected (p ≈ 0.24).

Applied Econometrics  Chapter 6: Time series analysis


6.4 Nonstationary processes | 6.4.3 An empirical application with R 124 | 124

Forecasting the next 4 quarters of U.S. fixed investment

# Estimate the selected model in terms of x (not y): need linear trend to have constant for y!
> time<-seq(0,length(x)-1,1)
> ar4r_dx<-arima(x,order=c(4,1,0),xreg=time,fixed=c(NA,0,0,NA,NA))
> ar4r_dx
Call:
arima(x = x, order = c(4, 1, 0), xreg = time, fixed = c(NA, 0, 0, NA, NA))

Coefficients:
         ar1  ar2  ar3      ar4    time
      0.4750    0    0  -0.2292  1.0150
s.e.  0.0884    0    0   0.0894  0.3263

sigma^2 estimated as 5.963: log likelihood = -228.62, aic = 465.25

> time_new=seq(length(x),length(x)+3,1)

# Forecasting US fixed investment (x)
> predict(ar4r_dx,n.ahead=4,newxreg=time_new)
$pred
          Qtr1     Qtr2     Qtr3     Qtr4
1972           178.0686 179.0836 180.0986
1973 181.1136
$se
         Qtr2
1972 2.441928
# True observations in 1973, Qtr2-Qtr4: 176.1 178.2 186.7