EC2019 Econometrics II: Seminar 4 Solution
where c is a non-zero constant, and the error term ε_t satisfies the usual white-noise assumptions:

E(ε_t) = 0    ∀t
E(ε_t^2) = σ^2    ∀t
E(ε_t ε_{t−s}) = 0    ∀s ≠ 0
Let ŷ_{t+h|t} denote the optimal linear prediction of y_{t+h} made at time t. Demonstrate that ŷ_{t+h|t} approaches the unconditional mean of the process as h → ∞. Let e_h denote the prediction error, such that

e_h = y_{t+h} − ŷ_{t+h|t}    (2)

Show that the prediction error variance approaches the unconditional variance of the process as h → ∞.
Recall the unconditional mean and variance of an AR(1) with no deterministic components:

E(y_t) = 0    (3)

V(y_t) = σ^2 / (1 − φ^2)    (4)
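To make (4) concrete with illustrative numbers (mine, not from the question): if φ = 0.5 and σ^2 = 1, then

V(y_t) = 1 / (1 − 0.5^2) = 1/0.75 = 4/3 ≈ 1.33

so persistence (φ ≠ 0) inflates the unconditional variance of y_t above the innovation variance σ^2.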
Optimal linear predictions are derived by taking the conditional expectation of future values at time t. For h = 1, the actual y_{t+1} is generated by

y_{t+1} = φ y_t + ε_{t+1}    (5)

Taking the conditional expectation at time t gives its prediction:

ŷ_{t+1|t} = E(y_{t+1} | I_t) = φ y_t    (6)

y_t is known at time t and hence part of the information set I_t. However, ε_{t+1} in (5) is a future error term. Given that every ε_t is independent of ε_{t−s}, ∀s ≠ 0, E(ε_{t+1} | I_t) = E(ε_{t+1}) = 0; i.e. having the past information does not help predict future random disturbances any better. The forecast error e_1 is the difference between the actual (5) and its prediction (6), which is obviously ε_{t+1}.
It is clear that the general prediction formula for the process of (1) is given by:

ŷ_{t+h|t} = φ ŷ_{t+h−1|t} = φ^h y_t    (11)

Since |φ| < 1 under the stationarity condition, φ^h y_t → 0 = E(y_t) as h → ∞, so the prediction approaches the unconditional mean of the process, as required.
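As an aside (not part of the original solution), the formula in (11) is easy to evaluate; a minimal Python sketch with arbitrary values φ = 0.9 and y_t = 5:

# yhat_{t+h|t} = phi^h * y_t, for illustrative values phi = 0.9, y_t = 5.0
phi, y_t = 0.9, 5.0
for h in (1, 2, 3, 20):
    print(h, round(phi**h * y_t, 4))   # 4.5, 4.05, 3.645, ..., 0.6079

The forecast decays geometrically toward the unconditional mean of zero.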
As for the prediction error and its variance: in order to compute the prediction error variance, we need to express the forecast errors in terms of the ε_t's, since this is the random variable whose statistical properties we know (in this instance, by assumption). The one-step-ahead prediction error is obviously the difference between (5) and (6), which is ε_{t+1}. For h > 1, we need to express y_{t+h} in terms of future errors. For h = 2:

y_{t+2} = φ y_{t+1} + ε_{t+2} = φ^2 y_t + φ ε_{t+1} + ε_{t+2}

For h = 3:

y_{t+3} = φ y_{t+2} + ε_{t+3} = φ^3 y_t + φ^2 ε_{t+1} + φ ε_{t+2} + ε_{t+3}

By following the same pattern, it is clear that the actual observation y_{t+h} can be expressed as:

y_{t+h} = φ^h y_t + φ^{h−1} ε_{t+1} + φ^{h−2} ε_{t+2} + · · · + ε_{t+h}    (12)
Writing the actual and the predicted side by side makes evaluating the prediction error much easier:

y_{t+h} = φ^h y_t + φ^{h−1} ε_{t+1} + · · · + ε_{t+h}        ŷ_{t+h|t} = φ^h y_t
The prediction errors are merely the difference between y_{t+h} and ŷ_{t+h|t}, therefore:

e_1 = ε_{t+1}
e_2 = φ ε_{t+1} + ε_{t+2}
e_3 = φ^2 ε_{t+1} + φ ε_{t+2} + ε_{t+3}
e_4 = φ^3 ε_{t+1} + φ^2 ε_{t+2} + φ ε_{t+3} + ε_{t+4}
...
e_h = φ^{h−1} ε_{t+1} + φ^{h−2} ε_{t+2} + · · · + ε_{t+h}    (13)
V(e_1) = E(ε_{t+1}^2) = σ^2

V(e_2) = E[(φ ε_{t+1} + ε_{t+2})^2]
       = φ^2 E(ε_{t+1}^2) + E(ε_{t+2}^2)
       = σ^2 (1 + φ^2)

where the cross-product term vanishes because E(ε_t ε_{t−s}) = 0 for s ≠ 0. Likewise:

V(e_3) = E[(φ^2 ε_{t+1} + φ ε_{t+2} + ε_{t+3})^2] = σ^2 (1 + φ^2 + φ^4)

V(e_4) = E[(φ^3 ε_{t+1} + φ^2 ε_{t+2} + φ ε_{t+3} + ε_{t+4})^2] = σ^2 (1 + φ^2 + φ^4 + φ^6)
...
V(e_h) = σ^2 (1 + φ^2 + φ^4 + · · · + φ^{2(h−1)})
The sum in parentheses is a geometric series with common ratio φ^2. Since 0 ≤ φ^2 < 1 due to the stationarity condition:

lim_{h→∞} V(e_h) = σ^2 / (1 − φ^2) = V(y_t)    (14)

which is the unconditional variance of the process, as required.
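As a numerical illustration (my own sketch, not part of the solution; parameter values are arbitrary), a small Monte Carlo in Python reproduces (14):

import numpy as np

rng = np.random.default_rng(0)
phi, sigma = 0.8, 1.0            # illustrative values: V(y_t) = 1/0.36 ≈ 2.78
burn, H, R = 200, 50, 20_000     # burn-in length, forecast horizon, replications

errors = np.empty(R)
for r in range(R):
    y = 0.0
    for _ in range(burn):        # burn-in so y_t is approximately stationary
        y = phi * y + sigma * rng.standard_normal()
    y_t = y
    for _ in range(H):           # propagate the process H steps ahead
        y = phi * y + sigma * rng.standard_normal()
    errors[r] = y - phi**H * y_t # e_H = y_{t+H} - yhat_{t+H|t}, with yhat = phi^H * y_t

print(errors.var())              # approx. 2.78
print(sigma**2 / (1 - phi**2))   # 2.777..., the unconditional variance V(y_t)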
Consider the AR(2) process

y_t = φ_1 y_{t−1} + φ_2 y_{t−2} + ε_t    (15)

where the error term ε_t follows a normal distribution with the following properties:

E(ε_t) = 0    ∀t
E(ε_t^2) = σ^2    ∀t
E(ε_t ε_{t−s}) = 0    ∀s ≠ 0
(15) and its reparameterisation in first-difference form,

Δy_t = ρ y_{t−1} + π Δy_{t−1} + ε_t    (17)

are mathematically equivalent, with:

ρ = φ_1 + φ_2 − 1,    π = −φ_2    (18)
regardless of the presence/absence of a unit root in y_t. Consider the case where the process y_t generated by (15) contains a unit root, so that one of the roots of the polynomial

φ(z) = 1 − φ_1 z − φ_2 z^2 = 0

is equal to unity. Factorising the polynomial:

φ(z) = 1 − φ_1 z − φ_2 z^2 = (1 − λ_1 z)(1 − λ_2 z) = 1 − (λ_1 + λ_2) z + λ_1 λ_2 z^2
This enables us to express the AR parameters in terms of the roots of the polynomial as:

φ_1 = λ_1 + λ_2,    φ_2 = −λ_1 λ_2

so that

ρ = φ_1 + φ_2 − 1 = λ_1 + λ_2 − λ_1 λ_2 − 1
Now let λ_1 = 1:

ρ = φ_1 + φ_2 − 1 = 1 + λ_2 − λ_2 − 1 = 0
Similarly, let λ_2 = 1:

ρ = λ_1 + 1 − λ_1 − 1 = 0

Therefore, when either of the roots is 1, ρ = 0, and the AR(2) process of (17) becomes:

Δy_t = π Δy_{t−1} + ε_t    (19)
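The algebra above is easy to verify symbolically; a small sketch using sympy (variable names are mine, and nothing here is from the solution):

import sympy as sp

l1, l2 = sp.symbols('lambda1 lambda2')
phi1 = l1 + l2              # coefficient matching from the factorised polynomial
phi2 = -l1 * l2
rho = phi1 + phi2 - 1       # definition (18)

print(sp.factor(rho))       # -(lambda1 - 1)*(lambda2 - 1)
print(rho.subs(l1, 1), rho.subs(l2, 1))   # 0 0: either unit root forces rho = 0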
Consider the process

y_t = c + y_{t−1} + ε_t − θ ε_{t−1}    (20)

where the error term ε_t follows a normal distribution with the following properties:

E(ε_t) = 0    ∀t
E(ε_t^2) = σ^2    ∀t
E(ε_t ε_{t−s}) = 0    ∀s ≠ 0
Suppose you have drawn T observations, y_1, y_2, · · · , y_T, from this process. Discuss how you can estimate the model of (20) and compute optimal linear forecasts for the level y_{T+h}, h = 1, 2, 3.
The process of (20) is an I(1) process: it has an AR unit root. Recall from the discussion on spurious regression that the parameters of a unit-root process cannot be consistently estimated. However, we know that the first difference of an I(1) process is stationary. In this case, simply subtracting y_{t−1} from both sides of (20) gives:

Δy_t = c + ε_t − θ ε_{t−1}    (21)

which is a stationary MA(1) in Δy_t. Therefore, using the conditional maximum likelihood estimation method, the unknown parameters of the process (20), c and θ, can be consistently estimated, provided that the invertibility condition |θ| < 1 holds.
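In practice this estimation step can be delegated to a standard library. A sketch using statsmodels (assuming it is available; the series is simulated here purely for illustration, with parameter values that are not from the question):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate T observations from a process like (20):
# y_t = c + y_{t-1} + eps_t - theta*eps_{t-1}, with illustrative c = 0.5, theta = 0.4.
rng = np.random.default_rng(1)
T, c, theta = 500, 0.5, 0.4
eps = rng.standard_normal(T + 1)
dy = c + eps[1:] - theta * eps[:-1]   # the stationary MA(1) in (21)
y = np.cumsum(dy)

# Fit an MA(1) with a constant to the first differences, as in (21).
# Note: statsmodels writes the MA(1) as dy_t = const + e_t + ma.L1 * e_{t-1},
# so ma.L1 estimates -theta here, and it maximises the exact (state-space)
# likelihood rather than the conditional likelihood derived below.
fit = ARIMA(np.diff(y), order=(0, 0, 1), trend='c').fit()
print(fit.params)   # const ~ 0.5, ma.L1 ~ -0.4, sigma2 ~ 1.0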
Following the procedure detailed in Chapter 3, Section 2 of my lecture notes, the conditional likelihood function, conditional on the assumption that ε_1 is a known constant, is given by:

f(Δy_2 ∩ Δy_3 ∩ · · · ∩ Δy_T) = ∏_{t=2}^{T} (1/√(2πσ^2)) exp[ −(Δy_t − c + θ ε_{t−1})^2 / (2σ^2) ]    (22)
Note that, given the set of observations y_1, y_2, · · · , y_T as the question says, Δy_1 = y_1 − y_0 cannot be computed, since y_0 obviously does not exist. Hence the first usable observation for Δy_t is Δy_2. The log-likelihood function is then given by:
ll(θ) = −[(T − 1)/2] ln(2π) − [(T − 1)/2] ln σ^2 − (1/(2σ^2)) ∑_{t=2}^{T} (Δy_t − c + θ ε_{t−1})^2    (23)
where θ is a vector containing the unknown parameters of the process, in this case θ = (c, θ, σ^2)′.
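Alternatively, (23) can be coded directly and maximised numerically; a minimal sketch (my own, not the solution's code), with ε_1 set to zero, the usual choice of "known constant":

import numpy as np
from scipy.optimize import minimize

def neg_cond_ll(params, dy):
    # Negative of the conditional log-likelihood (23); dy holds the observed
    # first differences (dy_2, ..., dy_T), and eps_1 is fixed at 0.
    c, theta, sigma2 = params
    if sigma2 <= 0 or abs(theta) >= 1:   # positive variance and invertibility
        return np.inf
    eps, ss = 0.0, 0.0
    for d in dy:
        eps = d - c + theta * eps        # eps_t = dy_t - c + theta*eps_{t-1}
        ss += eps ** 2
    n = len(dy)                          # T - 1 terms in the sum
    return 0.5 * n * np.log(2 * np.pi * sigma2) + ss / (2 * sigma2)

# Usage (dy from data, e.g. dy = np.diff(y); starting values are rough guesses):
# res = minimize(neg_cond_ll, x0=[0.0, 0.1, 1.0], args=(dy,), method='Nelder-Mead')
# c_mle, theta_mle, sigma2_mle = res.x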
Let c̃, θ̃, and σ̃^2 denote the maximum likelihood estimates of c, θ, and σ^2, respectively. Using these, we can compute forecasts for the level y_t by simply using equation (20). Let ŷ_{t+h|t} = E(y_{t+h} | I_t) denote the optimal forecast of y_{t+h} computed at time t.
y_{T+1} = c + y_T + ε_{T+1} − θ ε_T
ŷ_{T+1|T} = c̃ + y_T − θ̃ ε̃_T

At time T, the present value ε_T is known: in practice, once the parameter estimates c̃, θ̃, and σ̃^2 are obtained, the residual series {ε̃_t} can always be used in its place.
y_{T+2} = c + y_{T+1} + ε_{T+2} − θ ε_{T+1}
ŷ_{T+2|T} = c̃ + ŷ_{T+1|T} = 2c̃ + y_T − θ̃ ε̃_T

y_{T+3} = c + y_{T+2} + ε_{T+3} − θ ε_{T+2}
ŷ_{T+3|T} = c̃ + ŷ_{T+2|T} = 3c̃ + y_T − θ̃ ε̃_T
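In code, the three forecasts follow the same recursion (a sketch; the estimate values below are hypothetical, and in practice ε̃_T would come from the fitted residual series):

# Hypothetical estimates and data: c~ = 0.5, theta~ = 0.4, last residual -0.3.
c_t, theta_t = 0.5, 0.4
y_T, eps_T = 10.0, -0.3

yhat = y_T
for h in (1, 2, 3):
    # only the first step carries the MA term; E(eps_{T+h} | I_T) = 0 for h >= 1
    yhat = c_t + yhat - (theta_t * eps_T if h == 1 else 0.0)
    print(h, round(yhat, 2))   # 10.62, 11.12, 11.62 = h*c~ + y_T - theta~*eps_T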