Forecasting: 1. Optimal Forecast Criterion - Minimum Mean Square Error Forecast

Forecasting
1. Optimal Forecast Criterion - Minimum Mean Square Error Forecast

We have now considered how to determine which ARIMA model to fit to our data,
how to estimate the parameters of that model, and how to examine its goodness of
fit. We are finally in a position to use the model to forecast new values of the
time series. Suppose we have observed a time series up to time $T$, so that we
know the values $X_1, X_2, \ldots, X_T$ and, subsequently, the white noise terms
$\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T$. This collection of known
information is denoted by
$$I_T = \{X_1, X_2, \ldots, X_T;\ \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T\}$$
Suppose that we want to forecast the value $X_{T+k}$, $k$ time units into the
future. We now introduce the most prevalent optimality criterion for forecasting.
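Theorem: The optimal k-step ahead forecast $\hat{X}_{T+k,T}$ (which is a function
of $I_T$) that will minimize the mean square error,
$$E\left(X_{T+k} - \hat{X}_{T+k,T}\right)^2$$
is
$$\hat{X}_{T+k,T} = E(X_{T+k} \mid I_T)$$
This optimal forecast is referred to as the Minimum Mean Square Error Forecast.
This optimal forecast is unbiased because
$$E\left(X_{T+k} - \hat{X}_{T+k,T}\right) = E\left[E\left((X_{T+k} - \hat{X}_{T+k,T}) \mid I_T\right)\right] = E\left[\hat{X}_{T+k,T} - \hat{X}_{T+k,T}\right] = 0$$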
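Lemma: Suppose $Z$ and $W$ are real random variables. Then, over all functions
$h$ of $W$,
$$\min_{h} E\left[(Z - h(W))^2\right] = E\left[(Z - E(Z \mid W))^2\right]$$
That is, the posterior mean $E(Z \mid W)$ will minimize the quadratic loss (mean
square error).
Proof:
$$\begin{aligned}
E\left[(Z - h(W))^2\right] &= E\left[(Z - E(Z \mid W) + E(Z \mid W) - h(W))^2\right] \\
&= E\left[(Z - E(Z \mid W))^2\right] + 2E\left[(Z - E(Z \mid W))(E(Z \mid W) - h(W))\right] \\
&\quad + E\left[(E(Z \mid W) - h(W))^2\right]
\end{aligned}$$
Conditioning on $W$, the cross term is zero: $E(Z \mid W) - h(W)$ is a function of
$W$, and $E\left[Z - E(Z \mid W) \mid W\right] = 0$. Thus
$$E\left[(Z - h(W))^2\right] = E\left[(Z - E(Z \mid W))^2\right] + E\left[(E(Z \mid W) - h(W))^2\right]$$
The last term is non-negative and equals zero when $h(W) = E(Z \mid W)$.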
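The lemma can be checked numerically. The sketch below is only an illustration under an assumed toy model (not from the notes): $W \sim N(0,1)$ and $Z = 2W + \text{noise}$, so that $E(Z \mid W) = 2W$. It compares the sample mean square error of the conditional mean with two other functions of $W$, and compares the excess error of one competitor with the extra term $E[(E(Z \mid W) - h(W))^2]$ from the proof.

```python
import random

random.seed(0)
n = 50_000

# Assumed toy model (illustrative only): W ~ N(0, 1), Z = 2W + N(0, 1) noise,
# so the conditional mean is E(Z | W) = 2W.
w = [random.gauss(0.0, 1.0) for _ in range(n)]
z = [2.0 * wi + random.gauss(0.0, 1.0) for wi in w]


def mse(h):
    """Sample mean square error of the predictor h(W) for Z."""
    return sum((zi - h(wi)) ** 2 for zi, wi in zip(z, w)) / n


mse_cond_mean = mse(lambda wi: 2.0 * wi)  # h(W) = E(Z | W)
mse_shrunk = mse(lambda wi: 1.5 * wi)     # a competing function of W
mse_constant = mse(lambda wi: 0.0)        # a predictor that ignores W

# Extra term E[(E(Z|W) - h(W))^2] for the competitor h(W) = 1.5 W.
gap = sum((2.0 * wi - 1.5 * wi) ** 2 for wi in w) / n

print("MSE of conditional mean:", mse_cond_mean)
print("MSE of 1.5 * W:         ", mse_shrunk)
print("MSE of constant 0:      ", mse_constant)
print("excess MSE vs. extra term:", mse_shrunk - mse_cond_mean, gap)
```

In a finite sample the cross term is only approximately zero, so the excess error and the extra term agree up to simulation noise rather than exactly.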
Note: We usually omit the word optimal in the K Periods Ahead (Optimal) Forecast,
because in the time series context, when we refer to forecast, we mean the optimal
minimum mean square error forecast, by default.
2. Forecast in MA(1)
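The MA(1) model is:
$$X_t = \varepsilon_t + \theta \varepsilon_{t-1}, \quad \varepsilon_t \sim WN(0, \sigma^2)$$
Given the observed data $\{X_1, X_2, \ldots, X_T\}$, the white noise term
$\varepsilon_t$ is not directly observed. However, we can perform a recursive
estimation of the white noise as follows.
Given the initial condition $\varepsilon_0$ (not important, since its influence on
later terms dies out when $|\theta| < 1$), we have:
$$\begin{aligned}
\varepsilon_1 &= X_1 - \theta \varepsilon_0 \\
\varepsilon_2 &= X_2 - \theta \varepsilon_1 \\
&\ \ \vdots \\
\varepsilon_T &= X_T - \theta \varepsilon_{T-1}
\end{aligned}$$
Therefore, given $\{X_1, X_2, \ldots, X_T\}$, we can also obtain
$\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T\}$, and the entire
information known can thus be written as
$I_T = \{X_1, X_2, \ldots, X_T;\ \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T\}$.
Since $X_{T+1} = \varepsilon_{T+1} + \theta \varepsilon_T$, the one period ahead
(optimal) forecast is
$$\begin{aligned}
\hat{X}_{T+1,T} &= E(X_{T+1} \mid I_T) = E(\varepsilon_{T+1} + \theta \varepsilon_T \mid I_T) \\
&= E(\varepsilon_{T+1} \mid I_T) + \theta E(\varepsilon_T \mid I_T) \\
&= E(\varepsilon_{T+1}) + \theta E(\varepsilon_T \mid I_T) \\
&= 0 + \theta \varepsilon_T = \theta \varepsilon_T
\end{aligned}$$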
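Note: Per convention in the time series literature, we wrote:
$$E(\varepsilon_T \mid I_T) = \varepsilon_T$$
Although one can write this more rigorously as:
$$E(\varepsilon_T \mid \varepsilon_T = \varepsilon_T^*) = \varepsilon_T^*$$
where $\varepsilon_T^*$ denotes the realized value. Similarly, we wrote
$$E(X_T \mid I_T) = X_T,$$
which can be written more rigorously as:
$$E(X_T \mid X_T = x_T) = x_T$$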
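The forecast error is
$$e_{T+1,T} = X_{T+1} - \hat{X}_{T+1,T} = \varepsilon_{T+1}$$
Since the forecast is unbiased, the minimum mean square error is equal to the
forecast error variance:
$$E\left(X_{T+1} - \hat{X}_{T+1,T}\right)^2 = Var\left(X_{T+1} - \hat{X}_{T+1,T}\right) = Var(\varepsilon_{T+1}) = \sigma^2$$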
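The recursive estimation of the white noise and the resulting one period ahead forecast for a zero-mean MA(1), $X_t = \varepsilon_t + \theta\varepsilon_{t-1}$, can be sketched as follows. This is a minimal illustration, not the notes' own code: the function names `ma1_innovations` and `ma1_one_step_forecast`, the choice $\varepsilon_0 = 0$, and the data values are all assumptions made for the example.

```python
def ma1_innovations(x, theta, eps0=0.0):
    """Recover eps_1..eps_T from x_1..x_T via eps_t = x_t - theta * eps_{t-1}.

    eps0 is the assumed initial condition (set to 0 here for illustration).
    """
    eps = []
    prev = eps0
    for xt in x:
        prev = xt - theta * prev
        eps.append(prev)
    return eps


def ma1_one_step_forecast(x, theta, eps0=0.0):
    """One period ahead forecast theta * eps_T for a zero-mean MA(1)."""
    return theta * ma1_innovations(x, theta, eps0)[-1]


# Illustrative data built from known shocks (made-up values) with eps_0 = 0.
theta = 0.5
true_eps = [0.3, -1.2, 0.7, 0.1, -0.4]
x = []
prev = 0.0
for e in true_eps:
    x.append(e + theta * prev)
    prev = e

recovered = ma1_innovations(x, theta)
forecast = ma1_one_step_forecast(x, theta)
print("recovered shocks:", recovered)
print("one-step forecast:", forecast)
```

Because the data were generated with the same $\varepsilon_0$ that the recursion assumes, the shocks are recovered up to rounding; with real data the starting value is unknown and only approximately irrelevant.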
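Given $I_T = \{X_1, X_2, \ldots, X_T;\ \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T\}$
and $X_{T+2} = \varepsilon_{T+2} + \theta \varepsilon_{T+1}$,
the two periods ahead (optimal) forecast is
$$\begin{aligned}
\hat{X}_{T+2,T} &= E(X_{T+2} \mid I_T) = E(\varepsilon_{T+2} + \theta \varepsilon_{T+1} \mid I_T) \\
&= E(\varepsilon_{T+2} \mid I_T) + \theta E(\varepsilon_{T+1} \mid I_T) \\
&= 0 + 0 = 0
\end{aligned}$$
The forecast error is
$$e_{T+2,T} = X_{T+2} - \hat{X}_{T+2,T} = \varepsilon_{T+2} + \theta \varepsilon_{T+1}$$
and the minimum mean square error is
$$E\left(X_{T+2} - \hat{X}_{T+2,T}\right)^2 = Var\left(X_{T+2} - \hat{X}_{T+2,T}\right) = Var(\varepsilon_{T+2} + \theta \varepsilon_{T+1}) = (1 + \theta^2)\sigma^2$$
In summary, for more than one period ahead, just like the two periods ahead
forecast, the forecast is zero, the unconditional mean. That is,
$$\begin{aligned}
\hat{X}_{T+2,T} &= E(X_{T+2} \mid I_T) = 0 \\
\hat{X}_{T+3,T} &= E(X_{T+3} \mid I_T) = 0 \\
&\ \ \vdots
\end{aligned}$$
The MA(1) process is not forecastable for more than one period ahead (apart from
the unconditional mean).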
Forecast for MA(1) with Intercept (non-zero mean)

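If the MA(1) model includes an intercept
$$X_t = \mu + \varepsilon_t + \theta \varepsilon_{t-1}, \quad \varepsilon_t \sim WN(0, \sigma^2)$$
we can perform forecasting using the same approach.

For example, since $X_{T+1} = \mu + \varepsilon_{T+1} + \theta \varepsilon_T$, the
one period ahead (optimal) forecast is
$$\begin{aligned}
\hat{X}_{T+1,T} &= E(X_{T+1} \mid I_T) = E(\mu + \varepsilon_{T+1} + \theta \varepsilon_T \mid I_T) \\
&= \mu + E(\varepsilon_{T+1} \mid I_T) + \theta E(\varepsilon_T \mid I_T) \\
&= \mu + 0 + \theta \varepsilon_T = \mu + \theta \varepsilon_T
\end{aligned}$$
The forecast error is
$$e_{T+1,T} = X_{T+1} - \hat{X}_{T+1,T} = \varepsilon_{T+1}$$
and the minimum mean square error is
$$E\left(X_{T+1} - \hat{X}_{T+1,T}\right)^2 = Var\left(X_{T+1} - \hat{X}_{T+1,T}\right) = Var(\varepsilon_{T+1}) = \sigma^2$$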
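The MA(1) results, with or without an intercept, can be collected in one small helper. The sketch below assumes the last shock $\varepsilon_T$ has already been recovered, writes the intercept as `mu`, and uses the hypothetical function name `ma1_forecast`. For horizons beyond two it relies on the same argument as the two periods ahead case: the forecast is the unconditional mean and the error is $\varepsilon_{T+k} + \theta\varepsilon_{T+k-1}$.

```python
def ma1_forecast(k, theta, sigma2, eps_T, mu=0.0):
    """Return (forecast, mse) for X_{T+k} under X_t = mu + eps_t + theta * eps_{t-1}.

    k      : forecast horizon (positive integer)
    eps_T  : last recovered white noise term
    mu     : intercept, 0.0 for the zero-mean model
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if k == 1:
        # Only eps_T is known at time T, so it is the only term that helps.
        return mu + theta * eps_T, sigma2
    # Two or more periods ahead: both shocks are in the future.
    return mu, (1.0 + theta ** 2) * sigma2


# Illustrative (made-up) parameter values.
for horizon in (1, 2, 3):
    print(horizon, ma1_forecast(horizon, theta=0.5, sigma2=2.0, eps_T=-0.4, mu=1.0))
```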
3. Generalization to MA(q):
For an MA(q) process, we can forecast up to q out-of-sample periods; beyond q
periods the forecast is simply the unconditional mean. Can you write down the
(optimal) point forecast for each period, the forecast error, and the forecast
error variance? Please practice before the exam.
4. Forecast in AR(1)
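For the AR(1) model,
$$X_t = \phi X_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)$$
Given $I_T = \{X_1, X_2, \ldots, X_T;\ \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_T\}$,
the One Period Ahead (Optimal) Forecast can be computed as follows:
$$X_{T+1} = \phi X_T + \varepsilon_{T+1}$$
So, given data $I_T$, the one period ahead forecast is
$$\begin{aligned}
\hat{X}_{T+1,T} &= E(X_{T+1} \mid I_T) = E(\phi X_T + \varepsilon_{T+1} \mid I_T) \\
&= \phi E(X_T \mid I_T) + E(\varepsilon_{T+1} \mid I_T) \\
&= \phi X_T + 0 \\
&= \phi X_T
\end{aligned}$$
The forecast error is
$$e_{T+1,T} = X_{T+1} - \hat{X}_{T+1,T} = \varepsilon_{T+1}$$
The forecast error variance:
$$Var(e_{T+1,T}) = Var(\varepsilon_{T+1}) = \sigma^2$$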
Two Periods Ahead Forecast

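By back-substitution
$$\begin{aligned}
X_{T+2} &= \phi X_{T+1} + \varepsilon_{T+2} \\
&= \phi(\phi X_T + \varepsilon_{T+1}) + \varepsilon_{T+2} \\
&= \phi^2 X_T + \phi \varepsilon_{T+1} + \varepsilon_{T+2}
\end{aligned}$$
So, given data $I_T$, the two periods ahead (optimal) forecast is
$$\begin{aligned}
\hat{X}_{T+2,T} &= E(X_{T+2} \mid I_T) = E(\phi^2 X_T + \phi \varepsilon_{T+1} + \varepsilon_{T+2} \mid I_T) \\
&= \phi^2 X_T + 0 + 0 \\
&= \phi^2 X_T
\end{aligned}$$
That is, the two periods ahead forecast is also a linear function of the final
observed value, but with the coefficient $\phi^2$.
The forecast error is
$$e_{T+2,T} = X_{T+2} - \hat{X}_{T+2,T} = \phi \varepsilon_{T+1} + \varepsilon_{T+2}$$
The forecast error variance is:
$$Var(e_{T+2,T}) = Var(\phi \varepsilon_{T+1} + \varepsilon_{T+2}) = (1 + \phi^2)\sigma^2$$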
K Periods Ahead Forecast

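Similarly
$$X_{T+k} = \phi^k X_T + \phi^{k-1} \varepsilon_{T+1} + \cdots + \phi \varepsilon_{T+k-1} + \varepsilon_{T+k}$$
So, given $I_T$, the k periods ahead optimal forecast is
$$\hat{X}_{T+k,T} = E(X_{T+k} \mid I_T) = \phi^k X_T$$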
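The back-substitution result for the zero-mean AR(1), $X_t = \phi X_{t-1} + \varepsilon_t$, translates directly into code. The sketch below uses the hypothetical function name `ar1_forecast` and made-up parameter values. The error variance it returns, $\sigma^2(1 + \phi^2 + \cdots + \phi^{2(k-1)})$, is not stated in the text for general k; it follows from the expansion above because the future shocks are uncorrelated, and it reduces to the one and two periods ahead variances already derived.

```python
def ar1_forecast(k, phi, sigma2, x_T):
    """Return (forecast, error_variance) for X_{T+k} in a zero-mean AR(1).

    Forecast:        phi**k * x_T
    Forecast error:  sum over j = 0..k-1 of phi**j * eps_{T+k-j}
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    forecast = phi ** k * x_T
    # Uncorrelated shocks: the variance is the sum of squared coefficients.
    error_var = sigma2 * sum(phi ** (2 * j) for j in range(k))
    return forecast, error_var


# Illustrative (made-up) values: phi = 0.8, sigma^2 = 1, last observation 2.0.
for horizon in (1, 2, 5):
    print(horizon, ar1_forecast(horizon, phi=0.8, sigma2=1.0, x_T=2.0))
```

As k grows with $|\phi| < 1$, the forecast shrinks toward zero (the unconditional mean) while the error variance increases toward $\sigma^2 / (1 - \phi^2)$.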
AR(1) with Intercept

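If the AR(1) model includes an intercept
$$X_t = \delta + \phi X_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)$$
Then the one period ahead forecast is
$$\begin{aligned}
\hat{X}_{T+1,T} &= E(X_{T+1} \mid I_T) \\
&= \delta + \phi E(X_T \mid I_T) + E(\varepsilon_{T+1} \mid I_T) \\
&= \delta + \phi X_T
\end{aligned}$$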
What is the two periods ahead forecast?
Forecast AR(1) recursively.

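Alternatively, rather than using the back-substitution method directly as we have
shown above, we can forecast the AR(1) recursively as follows:
$$X_{T+2} = \phi X_{T+1} + \varepsilon_{T+2}$$
So, given data $I_T$, the two periods ahead (optimal) forecast is
$$\hat{X}_{T+2,T} = E(X_{T+2} \mid I_T) = E(\phi X_{T+1} + \varepsilon_{T+2} \mid I_T) = \phi E(X_{T+1} \mid I_T) = \phi \hat{X}_{T+1,T}$$
Similarly, one can obtain the general recursive forecast relationship as:
$$\begin{aligned}
\hat{X}_{T+k,T} &= E(X_{T+k} \mid I_T) = E(\phi X_{T+k-1} + \varepsilon_{T+k} \mid I_T) \\
&= \phi E(X_{T+k-1} \mid I_T) = \phi \hat{X}_{T+k-1,T} \\
&= \cdots = \phi^{k-1} \hat{X}_{T+1,T}
\end{aligned}$$
This way, once you have computed the one-step ahead forecast as:
$$\begin{aligned}
\hat{X}_{T+1,T} &= E(X_{T+1} \mid I_T) = E(\phi X_T + \varepsilon_{T+1} \mid I_T) \\
&= \phi E(X_T \mid I_T) + E(\varepsilon_{T+1} \mid I_T) \\
&= \phi X_T + 0 \\
&= \phi X_T
\end{aligned}$$
you can obtain the rest of the forecasts easily.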
Note: As mentioned in class, I prefer the back-substitution method (introduced first)
in general because it can provide you with the forecast error directly as well.
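The recursive relationship for the zero-mean AR(1), where each forecast is $\phi$ times the previous one, can be sketched as a short loop. The function name `ar1_recursive_forecasts` and the numbers are illustrative assumptions; the final check simply confirms that the recursion reproduces the back-substitution result $\phi^k X_T$.

```python
def ar1_recursive_forecasts(k, phi, x_T):
    """Forecasts for horizons 1..k using Xhat_{T+j,T} = phi * Xhat_{T+j-1,T}.

    The recursion starts from the last observed value x_T.
    """
    forecasts = []
    prev = x_T
    for _ in range(k):
        prev = phi * prev
        forecasts.append(prev)
    return forecasts


# Illustrative (made-up) values.
phi, x_T = 0.8, 2.0
recursive = ar1_recursive_forecasts(4, phi, x_T)
closed_form = [phi ** j * x_T for j in range(1, 5)]
print(recursive)
print(closed_form)
```

The loop yields point forecasts only. The forecast error and its variance still have to be obtained from the back-substitution expansion, which is the trade-off the note above refers to.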
5. AR(p) Process
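Consider an AR(2) model with intercept
$$X_t = \delta + \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)$$
We have
$$X_{T+1} = \delta + \phi_1 X_T + \phi_2 X_{T-1} + \varepsilon_{T+1}$$
Then the one period ahead forecast is
$$\begin{aligned}
\hat{X}_{T+1,T} &= E(X_{T+1} \mid I_T) \\
&= \delta + E(\phi_1 X_T \mid I_T) + E(\phi_2 X_{T-1} \mid I_T) + E(\varepsilon_{T+1} \mid I_T) \\
&= \delta + \phi_1 X_T + \phi_2 X_{T-1}
\end{aligned}$$
The two periods ahead forecast (by the recursive forecasting method) is
$$\begin{aligned}
\hat{X}_{T+2,T} &= E(X_{T+2} \mid I_T) \\
&= \delta + E(\phi_1 X_{T+1} \mid I_T) + E(\phi_2 X_T \mid I_T) + E(\varepsilon_{T+2} \mid I_T) \\
&= \delta + \phi_1 \hat{X}_{T+1,T} + \phi_2 X_T \\
&= \delta + \phi_1(\delta + \phi_1 X_T + \phi_2 X_{T-1}) + \phi_2 X_T \\
&= (1 + \phi_1)\delta + (\phi_1^2 + \phi_2) X_T + \phi_1 \phi_2 X_{T-1}
\end{aligned}$$
Alternatively, the two periods ahead forecast (by the back-substitution method) can
be derived as:
$$\begin{aligned}
X_{T+2} &= \delta + \phi_1 X_{T+1} + \phi_2 X_T + \varepsilon_{T+2} \\
&= \delta + \phi_1(\delta + \phi_1 X_T + \phi_2 X_{T-1} + \varepsilon_{T+1}) + \phi_2 X_T + \varepsilon_{T+2} \\
&= (1 + \phi_1)\delta + (\phi_1^2 + \phi_2) X_T + \phi_1 \phi_2 X_{T-1} + \phi_1 \varepsilon_{T+1} + \varepsilon_{T+2}
\end{aligned}$$
Therefore
$$\begin{aligned}
\hat{X}_{T+2,T} &= E(X_{T+2} \mid I_T) \\
&= (1 + \phi_1)\delta + (\phi_1^2 + \phi_2) X_T + \phi_1 \phi_2 X_{T-1}
\end{aligned}$$
The forecast error is
$$e_{T+2,T} = X_{T+2} - \hat{X}_{T+2,T} = \phi_1 \varepsilon_{T+1} + \varepsilon_{T+2}$$
The forecast error variance is:
$$Var(e_{T+2,T}) = Var(\phi_1 \varepsilon_{T+1} + \varepsilon_{T+2}) = (1 + \phi_1^2)\sigma^2$$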
What are the forecasts for more future periods?

Please practice before the exam.
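To check hand calculations, the recursive forecasting method can be sketched for a general AR(p) with intercept, $X_t = \delta + \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \varepsilon_t$. The function name `ar_recursive_forecasts`, the argument layout, and the numerical values are assumptions made for illustration. Each future value is replaced by its forecast and each future shock by zero.

```python
def ar_recursive_forecasts(k, phis, history, delta=0.0):
    """Forecast X_{T+1}, ..., X_{T+k} for an AR(p) model with intercept delta.

    phis    : [phi_1, ..., phi_p] with p >= 1
    history : observed values in time order, ending with X_T (length >= p)
    """
    p = len(phis)
    if p < 1 or len(history) < p:
        raise ValueError("need p >= 1 coefficients and at least p observations")
    values = list(history[-p:])
    forecasts = []
    for _ in range(k):
        # The most recent value pairs with phi_1, the one before with phi_2, ...
        nxt = delta + sum(phi * v for phi, v in zip(phis, reversed(values)))
        values.append(nxt)
        forecasts.append(nxt)
    return forecasts


# Illustrative AR(2) (made-up values): delta = 1, phi_1 = 0.5, phi_2 = 0.3,
# X_{T-1} = 2, X_T = 4.
delta, phi1, phi2 = 1.0, 0.5, 0.3
x_Tm1, x_T = 2.0, 4.0
fc = ar_recursive_forecasts(3, [phi1, phi2], [x_Tm1, x_T], delta=delta)

# Closed-form two periods ahead forecast for the AR(2) with intercept.
two_step = (1 + phi1) * delta + (phi1 ** 2 + phi2) * x_T + phi1 * phi2 * x_Tm1
print(fc)
print(two_step)
```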
6. ARMA Process
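Consider an ARMA(1,1) model
$$X_t = \phi X_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}, \quad \varepsilon_t \sim WN(0, \sigma^2)$$
Again we can use the back-substitution method to compute the point forecast.
$$\begin{aligned}
X_{T+1} &= \phi X_T + \varepsilon_{T+1} + \theta \varepsilon_T \\
X_{T+2} &= \phi X_{T+1} + \varepsilon_{T+2} + \theta \varepsilon_{T+1} \\
&= \phi(\phi X_T + \varepsilon_{T+1} + \theta \varepsilon_T) + \varepsilon_{T+2} + \theta \varepsilon_{T+1}
\end{aligned}$$
One-step ahead forecast:
$$\begin{aligned}
\hat{X}_{T+1,T} &= E(X_{T+1} \mid I_T) \\
&= \phi E(X_T \mid I_T) + E(\varepsilon_{T+1} \mid I_T) + \theta E(\varepsilon_T \mid I_T) \\
&= \phi X_T + \theta \varepsilon_T
\end{aligned}$$
Two-step ahead forecast:
$$\begin{aligned}
\hat{X}_{T+2,T} &= E(X_{T+2} \mid I_T) \\
&= \phi E(X_{T+1} \mid I_T) + E(\varepsilon_{T+2} \mid I_T) + \theta E(\varepsilon_{T+1} \mid I_T) \\
&= \phi E(X_{T+1} \mid I_T) = \phi(\phi X_T + \theta \varepsilon_T)
\end{aligned}$$
The two-step ahead forecast error is
$$e_{T+2,T} = X_{T+2} - \hat{X}_{T+2,T} = \phi \varepsilon_{T+1} + \varepsilon_{T+2} + \theta \varepsilon_{T+1} = (\phi + \theta)\varepsilon_{T+1} + \varepsilon_{T+2}$$
Again, since the forecast is unbiased, the minimum mean square error is equal to the
forecast error variance:
$$\begin{aligned}
E\left(X_{T+2} - \hat{X}_{T+2,T}\right)^2 &= Var\left(X_{T+2} - \hat{X}_{T+2,T}\right) = Var\left[(\phi + \theta)\varepsilon_{T+1} + \varepsilon_{T+2}\right] \\
&= \left[(\phi + \theta)^2 + 1\right]\sigma^2
\end{aligned}$$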
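The ARMA(1,1) results can be sanity-checked by simulation. The sketch below computes the one- and two-step forecasts for $X_t = \phi X_{t-1} + \varepsilon_t + \theta\varepsilon_{t-1}$, then draws many future shock pairs and compares the simulated two-step mean square error with $[(\phi + \theta)^2 + 1]\sigma^2$. The function name `arma11_forecasts`, the Gaussian shocks, and all numerical values are assumptions for the example. The one-step mean square error $\sigma^2$ is not stated in this section; it follows because the one-step error is $\varepsilon_{T+1}$.

```python
import random


def arma11_forecasts(phi, theta, sigma2, x_T, eps_T):
    """Return {horizon: (forecast, mse)} for horizons 1 and 2 of an ARMA(1,1)."""
    one_step = phi * x_T + theta * eps_T
    two_step = phi * one_step
    return {
        1: (one_step, sigma2),
        2: (two_step, ((phi + theta) ** 2 + 1.0) * sigma2),
    }


# Illustrative (made-up) values.
phi, theta, sigma2 = 0.6, 0.4, 1.0
x_T, eps_T = 1.5, -0.2
fc = arma11_forecasts(phi, theta, sigma2, x_T, eps_T)

# Monte Carlo check of the two-step MSE with Gaussian shocks (an assumption).
random.seed(1)
n = 50_000
sq_err = 0.0
for _ in range(n):
    e1 = random.gauss(0.0, sigma2 ** 0.5)
    e2 = random.gauss(0.0, sigma2 ** 0.5)
    x1 = phi * x_T + e1 + theta * eps_T
    x2 = phi * x1 + e2 + theta * e1
    sq_err += (x2 - fc[2][0]) ** 2
sim_mse = sq_err / n

print("forecasts and theoretical MSEs:", fc)
print("simulated two-step MSE:", sim_mse)
```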
7. ARIMA Process
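Suppose $X_t$ follows the ARIMA(1,1,0) model. That means its first difference
$$Y_t = \nabla X_t = X_t - X_{t-1}$$
follows the ARMA(1,0) model, which is simply the AR(1) model as follows:
$$Y_t = \phi Y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma_Z^2)$$
Note: we used $\sigma_Z^2$ instead of $\sigma^2$ so you can get familiar with
different notations.
Substituting $Y_t = \nabla X_t = X_t - X_{t-1}$ and
$Y_{t-1} = \nabla X_{t-1} = X_{t-1} - X_{t-2}$, we can write the original
ARIMA(1,1,0) model as:
$$X_t - X_{t-1} = \phi(X_{t-1} - X_{t-2}) + \varepsilon_t$$
That is:
$$X_t = (1 + \phi)X_{t-1} - \phi X_{t-2} + \varepsilon_t$$
One-step ahead forecast:
$$\begin{aligned}
\hat{X}_{T+1,T} &= E[X_{T+1} \mid I_T] = E[(1 + \phi)X_T - \phi X_{T-1} + \varepsilon_{T+1} \mid I_T] \\
&= (1 + \phi)X_T - \phi X_{T-1}
\end{aligned}$$
Thus the forecast error
$$\begin{aligned}
e_{T+1,T} &= X_{T+1} - \hat{X}_{T+1,T} \\
&= (1 + \phi)X_T - \phi X_{T-1} + \varepsilon_{T+1} - [(1 + \phi)X_T - \phi X_{T-1}] \\
&= \varepsilon_{T+1}
\end{aligned}$$
And the variance of the forecast error
$$Var(e_{T+1,T}) = Var(\varepsilon_{T+1}) = \sigma_Z^2$$
Two-step ahead forecast:
$$\begin{aligned}
X_{T+2} &= (1 + \phi)X_{T+1} - \phi X_T + \varepsilon_{T+2} \\
&= (1 + \phi)[(1 + \phi)X_T - \phi X_{T-1} + \varepsilon_{T+1}] - \phi X_T + \varepsilon_{T+2} \\
&= (1 + \phi + \phi^2)X_T - (\phi + \phi^2)X_{T-1} + \varepsilon_{T+2} + (1 + \phi)\varepsilon_{T+1}
\end{aligned}$$
Therefore
$$\begin{aligned}
\hat{X}_{T+2,T} &= E[X_{T+2} \mid I_T] \\
&= E[(1 + \phi + \phi^2)X_T - (\phi + \phi^2)X_{T-1} + \varepsilon_{T+2} + (1 + \phi)\varepsilon_{T+1} \mid I_T] \\
&= (1 + \phi + \phi^2)X_T - (\phi + \phi^2)X_{T-1}
\end{aligned}$$
Thus the forecast error
$$e_{T+2,T} = X_{T+2} - \hat{X}_{T+2,T} = \varepsilon_{T+2} + (1 + \phi)\varepsilon_{T+1}$$
And the variance of the forecast error
$$Var(e_{T+2,T}) = \left[1 + (1 + \phi)^2\right]\sigma_Z^2 = (2 + 2\phi + \phi^2)\sigma_Z^2$$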
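An equivalent way to compute the ARIMA(1,1,0) point forecasts is to forecast the differences $Y_t = X_t - X_{t-1}$ with the AR(1) rule and add them back onto the last observed level. This is a different route from the direct substitution used in the text, chosen here because it is short to code and gives the same forecasts. The function name `arima110_forecasts` and the numbers are illustrative assumptions.

```python
def arima110_forecasts(k, phi, x_T, x_Tm1):
    """Forecast X_{T+1}, ..., X_{T+k} for an ARIMA(1,1,0) model.

    The difference Y_t = X_t - X_{t-1} follows a zero-mean AR(1), so its
    j-step forecast is phi**j * Y_T; the level forecast accumulates them.
    """
    y_hat = x_T - x_Tm1  # Y_T, the last observed difference
    level = x_T
    forecasts = []
    for _ in range(k):
        y_hat = phi * y_hat    # AR(1) forecast of the next difference
        level = level + y_hat  # undo the differencing
        forecasts.append(level)
    return forecasts


# Illustrative (made-up) values: phi = 0.5, X_{T-1} = 8, X_T = 10.
phi, x_Tm1, x_T = 0.5, 8.0, 10.0
fc = arima110_forecasts(2, phi, x_T, x_Tm1)

# Closed-form forecasts obtained by direct substitution.
one_step = (1 + phi) * x_T - phi * x_Tm1
two_step = (1 + phi + phi ** 2) * x_T - (phi + phi ** 2) * x_Tm1
print(fc, one_step, two_step)
```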
8. Prediction (Forecast) Error, and Prediction Interval
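Recall that the k-step ahead prediction error is defined as
$$e_{T+k,T} = X_{T+k} - E(X_{T+k} \mid I_T)$$
By the law of total variance:
$$Var(Y) = E[Var(Y \mid X)] + Var[E(Y \mid X)]$$
we can derive the variance of the prediction error. Since
$E(e_{T+k,T} \mid I_T) = 0$, the second term vanishes; and when the future shocks
are independent of $I_T$, as assumed in the derivations above, the conditional
variance does not depend on the observed values, so
$$Var(e_{T+k,T}) = Var(X_{T+k} \mid I_T)$$
Based on the predictor and its error variance, we may construct prediction
intervals. For example, if the $X_t$'s are normally distributed, we can consider
the 95% prediction interval
$$E(X_{T+k} \mid I_T) \pm z_{0.025}\sqrt{Var(X_{T+k} \mid I_T)}$$
where $z_{0.025} \approx 1.96$ is the upper 2.5% point of the standard normal
distribution.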
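A prediction interval can be computed from any point forecast and its error variance. The sketch below assumes normality, obtains the normal quantile from Python's standard library, and applies it to the two periods ahead AR(1) forecast $\phi^2 X_T$ with variance $(1 + \phi^2)\sigma^2$. The function name `prediction_interval` and the parameter values are illustrative assumptions.

```python
import math
from statistics import NormalDist


def prediction_interval(forecast, error_var, level=0.95):
    """Normal-theory prediction interval: forecast +/- z * sqrt(error_var)."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    # Upper (1 - level) / 2 point of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * math.sqrt(error_var)
    return forecast - half_width, forecast + half_width


# Illustrative AR(1) two periods ahead case (made-up values).
phi, sigma2, x_T = 0.8, 1.0, 2.0
forecast = phi ** 2 * x_T
error_var = (1 + phi ** 2) * sigma2
lo, hi = prediction_interval(forecast, error_var)
print("forecast:", forecast)
print("95% prediction interval:", (lo, hi))
```

The interval widens with the horizon because the forecast error variance grows, even though the point forecast itself moves toward the unconditional mean.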
