EC2019 Econometrics II: Seminar 4—Solution

1. Consider the following process:

yt = φyt−1 + εt ;  |φ| < 1;  t = 0, ±1, ±2, · · · ,   (1)

where the error term εt satisfies the usual white-noise assumptions:

E(εt) = 0 ∀t
E(ε²t) = σ² ∀t
E(εt εt−s) = 0 ∀s ≠ 0

Let ŷt+h|t denote the optimal linear prediction of yt+h made at time t. Demonstrate that ŷt+h|t
approaches the unconditional mean of the process as h → ∞. Let eh denote the prediction error
such that
eh = yt+h − ŷt+h|t   (2)
Show that the prediction error variance approaches the unconditional variance of the process as
h → ∞.

Recall the unconditional mean and variance of an AR(1) with no deterministic components:

E(yt) = 0   (3)
V(yt) = σ² / (1 − φ²)   (4)
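
To see where (4) comes from, take the variance of both sides of (1). Since εt is uncorrelated with yt−1,

V(yt) = φ²V(yt−1) + σ²

and stationarity implies V(yt) = V(yt−1), so V(yt)(1 − φ²) = σ², which gives (4).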
Optimal linear predictions are derived by taking the conditional expectation of future values given the information available at time t.
For h = 1, the actual yt+1 is generated by

yt+1 = φyt + εt+1   (5)

Take conditional expectation given It , the information available at time t:

ŷt+1|t = E(yt+1 |It ) = φyt (6)

yt is known at time t and hence part of the information set It . However, εt+1 in (5) is a future error
term. Given that every εt is independent of εt−s , ∀s ≠ 0, E(εt+1 |It ) = E(εt+1 ) = 0—i.e. having
the past information does not help predict future random disturbances any better. The forecast error
e1 is the difference between the actual (5) and its prediction (6), which is obviously εt+1 .

For h = 2, the actual yt+2 is generated by:

yt+2 = φyt+1 + εt+2   (7)

and its conditional expectation given by:

ŷt+2|t = E(yt+2 |It )
       = φE(yt+1 |It ) + E(εt+2 |It )
       = φŷt+1|t        [substitute (6)]
       = φ²yt   (8)

For h = 3, yt+3 is generated by:


yt+3 = φyt+2 + εt+3   (9)

Taking conditional expectation at time t gives its prediction:

ŷt+3|t = E(yt+3 |It )
       = φE(yt+2 |It ) + E(εt+3 |It )
       = φŷt+2|t
       = φ³yt   (10)

It is clear that the general prediction formula for the process of (1) is given by:

ŷt+h|t = φŷt+h−1|t = φ^h yt   (11)

Since |φ| < 1, as h → ∞, ŷt+h|t → 0 = E(yt).
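
A quick numerical illustration of this convergence (a minimal sketch in Python; the values of φ and yt are arbitrary choices, not part of the question):

phi = 0.9   # assumed AR coefficient, |phi| < 1
y_t = 5.0   # assumed last observed value y_t

# AR(1) point forecast: y_hat(t+h|t) = phi**h * y_t, which decays
# geometrically towards the unconditional mean of zero as h grows.
for h in (1, 2, 3, 5, 10, 50):
    print(f"h = {h:2d}: forecast = {phi**h * y_t:.6f}")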

As for the prediction error and its variance, in order to compute the prediction error variance,
we need to express the forecast errors in terms of the εt 's, since this is the random variable whose
statistical properties we know—in this instance, by assumption. The one-step-ahead prediction error
is obviously the difference between (5) and (6), which is εt+1 . For h > 1, we need to express yt+h in
terms of future errors. For h = 2:

yt+2 = φyt+1 + εt+2        [substitute (5)]
     = φ²yt + φεt+1 + εt+2

For h = 3:

yt+3 = φyt+2 + εt+3
     = φ(φ²yt + φεt+1 + εt+2) + εt+3
     = φ³yt + φ²εt+1 + φεt+2 + εt+3

By following the same pattern, it is clear that the actual observation yt+h can be expressed as:

yt+h = φ^h yt + φ^(h−1) εt+1 + φ^(h−2) εt+2 + · · · + εt+h   (12)

Writing the actual and predicted side-by-side makes evaluating the prediction error much easier.

Actual yt+h                                              Prediction ŷt+h|t

yt+1 = φyt + εt+1                                        ŷt+1|t = φyt
yt+2 = φ²yt + φεt+1 + εt+2                               ŷt+2|t = φ²yt
yt+3 = φ³yt + φ²εt+1 + φεt+2 + εt+3                      ŷt+3|t = φ³yt
yt+4 = φ⁴yt + φ³εt+1 + φ²εt+2 + φεt+3 + εt+4             ŷt+4|t = φ⁴yt
  ⋮                                                        ⋮

The prediction errors are merely the difference between yt+h and ŷt+h|t , therefore:

e1 = εt+1
e2 = φεt+1 + εt+2
e3 = φ²εt+1 + φεt+2 + εt+3
e4 = φ³εt+1 + φ²εt+2 + φεt+3 + εt+4
  ⋮
eh = φ^(h−1) εt+1 + φ^(h−2) εt+2 + · · · + εt+h   (13)

Note that E(eh) = 0, ∀h. Then, V(eh) = E(e²h).

V(e1) = E(ε²t+1) = σ²
V(e2) = E[(φεt+1 + εt+2)²]
      = φ²E(ε²t+1) + E(ε²t+2)
      = σ²(1 + φ²)
V(e3) = E[(φ²εt+1 + φεt+2 + εt+3)²]
      = σ²(1 + φ² + φ⁴)
V(e4) = E[(φ³εt+1 + φ²εt+2 + φεt+3 + εt+4)²]
      = σ²(1 + φ² + φ⁴ + φ⁶)
  ⋮
V(eh) = σ²(1 + φ² + φ⁴ + · · · + φ^(2(h−1)))

The cross-product terms vanish in each expansion because E(εt+i εt+j) = 0 for i ≠ j.

The sum in brackets is a finite geometric series with common ratio φ². Since 0 ≤ φ² < 1 due to the stationarity condition,

V(eh) = σ²(1 − φ^(2h)) / (1 − φ²)

and therefore

lim_{h→∞} V(eh) = σ² / (1 − φ²) = V(yt)   (14)
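
The convergence can be verified numerically (a minimal sketch; φ = 0.8 and σ² = 1 are arbitrary illustrative values):

phi, sigma2 = 0.8, 1.0   # assumed parameter values

# V(e_h) = sigma^2 * (1 + phi^2 + ... + phi^(2(h-1))), a finite geometric
# sum that approaches the unconditional variance sigma^2 / (1 - phi^2).
limit = sigma2 / (1 - phi**2)
for h in (1, 2, 5, 10, 50):
    v_eh = sigma2 * sum(phi**(2 * j) for j in range(h))
    print(f"h = {h:2d}: V(e_h) = {v_eh:.6f}   (limit = {limit:.6f})")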

2. Consider the following process:

yt = φ1 yt−1 + φ2 yt−2 + εt ;  t = 0, ±1, ±2, · · · ,   (15)

and the error term εt follows a normal distribution with the following properties:

E(εt) = 0 ∀t
E(ε²t) = σ² ∀t
E(εt εt−s) = 0 ∀s ≠ 0

Show that the process of (15) can be equivalently expressed as:

∆yt = ρyt−1 − φ2 ∆yt−1 + εt ;   (16)

where ρ = φ1 + φ2 − 1. Demonstrate that ρ = 0 if the process yt has an AR unit root.

yt = φ1 yt−1 + φ2 yt−2 + εt        [subtract yt−1 from both sides]
yt − yt−1 = (φ1 − 1)yt−1 + φ2 yt−2 + εt        [add and subtract φ2 yt−1]
∆yt = (φ1 + φ2 − 1)yt−1 − φ2 (yt−1 − yt−2) + εt
∆yt = (φ1 + φ2 − 1)yt−1 − φ2 ∆yt−1 + εt
∆yt = ρyt−1 + π∆yt−1 + εt   (17)

(15) and (17) are mathematically equivalent with:

ρ = φ1 + φ2 − 1; π = −φ2 (18)

regardless of the presence/absence of a unit root in yt . Consider the case where the process yt
generated by (15) contains a unit root; so that one of the roots of the polynomial

φ(z) = 1 − φ1 z − φ2 z² = 0

is on the unit circle. Rearrange φ(z):

φ(z) = 1 − φ1 z − φ2 z²
     = (1 − λ1 z)(1 − λ2 z)
     = 1 − (λ1 + λ2)z + λ1 λ2 z²

This enables us to express the AR parameters in terms of the roots of the polynomial as:

φ1 = λ1 + λ2
φ2 = −λ1 λ2

Substituting them into (18) gives:

ρ = λ1 + λ2 − λ1 λ2 − 1

The roots of the polynomial φ(z) are λ1⁻¹ and λ2⁻¹, and if either one of them is to be 1 (we shall limit
ourselves to the analysis of real roots), then either λ1 or λ2 must be 1. Let λ1 = 1:

ρ = φ1 + φ2 − 1
= 1 + λ2 − λ2 − 1
=0

Now let λ2 = 1:
ρ = λ1 + 1 − λ1 − 1 = 0
Therefore, when either of the roots is 1, ρ = 0, and the AR(2) process of (17) becomes:

∆yt = π∆yt−1 + εt   (19)

which is a stationary AR(1) process in first differences: with λ1 = 1, π = −φ2 = λ1 λ2 = λ2 , and |λ2| < 1 since the remaining root lies outside the unit circle.
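
The algebra can be checked numerically (a minimal sketch; the inverse roots λ1 = 1, λ2 = 0.5 and the seed are arbitrary illustrative choices):

import numpy as np

rng = np.random.default_rng(0)            # arbitrary seed
lam1, lam2 = 1.0, 0.5                     # assumed inverse roots; lam1 = 1 imposes a unit root
phi1, phi2 = lam1 + lam2, -lam1 * lam2    # AR parameters implied by the factorisation
rho, pi_ = phi1 + phi2 - 1, -phi2         # reparameterised coefficients of (17)
print("rho =", rho)                       # 0 under the unit root

# Simulate the AR(2) of (15) and rebuild it from representation (17);
# the two recursions must coincide.
eps = rng.standard_normal(100)
y = np.zeros(102)
z = np.zeros(102)
for t in range(2, 102):
    y[t] = phi1 * y[t-1] + phi2 * y[t-2] + eps[t-2]
    dz = rho * z[t-1] + pi_ * (z[t-1] - z[t-2]) + eps[t-2]
    z[t] = z[t-1] + dz
print("max discrepancy:", np.max(np.abs(y - z)))   # ~0 up to rounding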

3. Consider the following process:

yt = c + yt−1 + εt − θεt−1 ;  t = 0, ±1, ±2, · · · ,   (20)

and the error term εt follows a normal distribution with the following properties:

E(εt) = 0 ∀t
E(ε²t) = σ² ∀t
E(εt εt−s) = 0 ∀s ≠ 0

Suppose you have drawn T observations, y1 , y2 , · · · , yT from this process. Discuss how you can
estimate the model of (20) and compute optimal linear forecasts for the level yT +h , h = 1, 2, 3.

The process of (20) is an I(1) process: it has an AR unit root. Recall from the discussion on spurious
regression that the parameters of a unit root process cannot be consistently estimated. However, we
know that the first difference of an I(1) process is stationary. In this case, just subtracting yt−1 from
both sides of (20) gives:
∆yt = c + εt − θεt−1   (21)
which is a stationary MA(1) in ∆yt . Therefore, using the conditional maximum likelihood estimation
method, the unknown parameters of the process (20), c and θ, can be consistently estimated provided
that the invertibility condition |θ| < 1 holds.
Following the procedure detailed in Chapter 3, Section 2 of my lecture notes, the conditional likelihood
function, conditional on the assumption that ε1 is a known constant, is given by:

f(∆y2 ∩ ∆y3 ∩ · · · ∩ ∆yT ) = ∏_{t=2}^{T} (1/√(2πσ²)) exp( −(∆yt − c + θεt−1)² / (2σ²) )   (22)

Note that given the set of observations y1, y2, · · · , yT as the question says, ∆y1 = y1 − y0 cannot be
computed since y0 obviously does not exist. Hence the first usable observation for ∆yt is ∆y2 . Then
the log-likelihood function is given by:
ll(θ) = −((T − 1)/2) ln(2π) − ((T − 1)/2) ln σ² − (1/(2σ²)) Σ_{t=2}^{T} (∆yt − c + θεt−1)²   (23)

where θ is a vector containing the unknown parameters of the process, in this case θ = (c, θ, σ²)′.


The ML estimate of θ is obtained by numerical optimisation, as no analytical solution exists.
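
A minimal sketch of this conditional ML estimation in Python (the simulated data, starting values, and optimiser choice are illustrative assumptions, not part of the question):

import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, dy):
    # Conditional log-likelihood of Delta y_t = c + eps_t - theta*eps_{t-1},
    # conditioning on the pre-sample error being a known constant (zero).
    c, theta, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)              # keeps the variance positive
    eps = np.zeros(len(dy) + 1)              # eps[0] = 0 plays the role of eps_1
    for t in range(len(dy)):
        eps[t + 1] = dy[t] - c + theta * eps[t]   # invert the MA(1) recursively
    n = len(dy)
    return 0.5 * n * np.log(2 * np.pi * sigma2) + eps[1:] @ eps[1:] / (2 * sigma2)

# Simulate from (20) in first differences to test the estimator.
rng = np.random.default_rng(1)
c_true, theta_true, T = 0.5, 0.4, 500        # arbitrary true values
e = rng.standard_normal(T + 1)
dy = c_true + e[1:] - theta_true * e[:-1]

res = minimize(neg_loglik, x0=np.zeros(3), args=(dy,), method="BFGS")
c_hat, theta_hat, sigma2_hat = res.x[0], res.x[1], np.exp(res.x[2])
print(c_hat, theta_hat, sigma2_hat)          # should be close to 0.5, 0.4, 1.0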

Let c̃, θ̃, and σ̃² denote the maximum likelihood estimates of c, θ, and σ² respectively. Using these,
we can compute forecasts for the level yt by simply using equation (20). Let ŷT+h|T = E(yT+h |IT ) denote
the optimal forecast of yT+h computed at time T .

yT+1 = c + yT + εT+1 − θεT
ŷT+1|T = c̃ + yT − θ̃εT

At time T , the present value εT is known—in practice, once the parameter estimates c̃, θ̃, and σ̃² are
obtained, the residual series {ε̃t } can always be used in its place.

yT+2 = c + yT+1 + εT+2 − θεT+1
ŷT+2|T = c̃ + ŷT+1|T
       = 2c̃ + yT − θ̃εT
yT+3 = c + yT+2 + εT+3 − θεT+2
ŷT+3|T = c̃ + ŷT+2|T
       = 3c̃ + yT − θ̃εT

It is apparent from the above that

ŷT+h|T = hc̃ + yT − θ̃εT

Therefore, apart from the initial value yT − θ̃εT , the h-step-ahead forecast for yt is simply a
straight line in h with slope c̃. This makes sense as ∆yt is an MA(1). Recall from Chapter 4 that if yt follows a
zero-mean MA(1) process, all future forecasts ŷT+h|T = 0 apart from ŷT+1|T . If the process had a
constant c, then ŷT+h|T = c. In this case, it is ∆yt which follows an MA(1) with a non-zero mean, so
∆ŷT+h|T = c, where ∆ŷT+h|T denotes the forecast of ∆yT+h = yT+h − yT+h−1 made at time T . Since the
level series yt is predicted to grow by c every time period, ŷT+h|T lies on a straight line with slope c̃.
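
Continuing the sketch above, the level forecasts follow directly (yT , the residual ε̃T , and the parameter estimates below are illustrative values):

y_T = 10.0                   # assumed last observed level
eps_T = 0.3                  # assumed last residual, eps-tilde_T
c_hat, theta_hat = 0.5, 0.4  # illustrative parameter estimates

# y_hat(T+h|T) = h*c_hat + y_T - theta_hat*eps_T for h = 1, 2, 3, ...
for h in (1, 2, 3):
    print(f"h = {h}: forecast = {h * c_hat + y_T - theta_hat * eps_T:.3f}")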
