
Time Series Analysis

Chapter 3 - Basic concepts

Annika Camehl
Econometric Institute
Erasmus University Rotterdam
Chapter 3 - Basic concepts

• Modeling cycle / Specification strategy
• Parameter estimation
• Model selection
• Diagnostic measures
• Forecasting



AR(p) and MA(q) models

An autoregressive model of order p [AR(p)] is given by

yt = φ1yt−1 + φ2yt−2 + · · · + φpyt−p + εt, (1)


where φ1, φ2, . . . , φp are unknown parameters, and εt is a standard
white noise process.

A moving average model of order q [MA(q)] is given by

yt = εt + θ1εt−1 + · · · + θq εt−q , (2)


where θ1, θ2, . . . , θq are unknown parameters.



ARMAX(p,q) model

In general, we may consider an AutoRegressive Moving Average
model (with eXogenous regressors) (ARMA(X) model):

yt = α + φ1yt−1 + · · · + φpyt−p
+ εt + θ1εt−1 + ... + θq εt−q
(+β1x1,t + β2x2,t + · · · + βk xk,t), (3)
where xi,t, i = 1, . . . , k denote (exogenous) regressors.



Empirical specification strategy

• How to apply / implement ARMAX models in practice?

⇒ It is strongly recommended to do this following a systematic,
structured approach. We may, for example, adopt the following
“model building procedure”:

1. Select regressors x1,t, . . . , xk,t and AR and MA orders p and q;


2. Estimate parameters φ = (α, φ1, . . . , φp), β = (β1, . . . , βk ), and
θ = (θ1, . . . , θq );
3. Evaluate the model by applying misspecification tests and
other diagnostic measures;
4. Modify the model if necessary (that is, go back to step 1);
5. If the model cannot be further improved and is satisfactory,
use it for description or forecasting.
Parameter estimation

Note that an AR(p) model

yt = α + φ1yt−1 + · · · + φpyt−p + εt,


is essentially a linear regression model, with
lagged dependent variables as regressors. Hence, the unknown
parameters α, φ1, . . . , φp can be estimated straightforwardly using
ordinary least squares [OLS].

The MA(q) model

yt = εt + θ1εt−1 + ... + θq εt−q ,


also resembles a linear regression, but note that the ‘regressors’
are unknown / unobserved shocks εt−k , k = 1, . . . , q. We may
consider nonlinear least squares [NLS] or maximum likelihood
[ML] estimation.
Ordinary least squares

Estimating the parameters β in a linear regression model


yt = x′tβ + εt, t = 1, . . . , T, (4)

where xt is a (k × 1) vector of regressors, can be done by OLS,
which minimizes the sum of squared residuals

bOLS = argmin Q(β) = argmin Σ_{t=1}^T (yt − x′tβ)².

From the first-order conditions, it follows easily that

bOLS = (Σ_{t=1}^T xt x′t)⁻¹ (Σ_{t=1}^T xt yt),

or, using matrix notation (y = Xβ + ε),

bOLS = (X′X)⁻¹X′y. (5)
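
As an illustration, here is a minimal numpy sketch of (5) applied to an AR(2) with intercept; the function name ols_ar and the simulated series are illustrative, not part of the slides.

```python
import numpy as np

def ols_ar(y, p):
    """OLS estimates b = (X'X)^{-1} X'y for an AR(p) with intercept."""
    T = len(y)
    # Regressor matrix: a constant and p lags of y (rows t = p, ..., T-1).
    X = np.column_stack([np.ones(T - p)] +
                        [y[p - j:T - j] for j in range(1, p + 1)])
    b = np.linalg.solve(X.T @ X, X.T @ y[p:])   # (X'X)^{-1} X'y
    resid = y[p:] - X @ b
    return b, resid, X

# Usage on a simulated AR(2) series:
rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.standard_normal()
b, resid, X = ols_ar(y, p=2)
print(b)   # roughly [0, 0.5, 0.2]
```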



Ordinary least squares

Under suitable regularity conditions, bOLS is unbiased (and
consistent), e.g. assuming fixed regressors X and E[εt] = 0,

E[bOLS] = E[(X′X)⁻¹X′y]
        = E[(X′X)⁻¹X′(Xβ + ε)]
        = β + E[(X′X)⁻¹X′ε]
        = β + (X′X)⁻¹X′E[ε]
        = β.

To compute standard errors, we need the covariance matrix of bOLS,

V[bOLS] = E[(bOLS − β)(bOLS − β)′]
        = E[(X′X)⁻¹X′εε′X(X′X)⁻¹]
        = (X′X)⁻¹X′E[εε′]X(X′X)⁻¹
        = (X′X)⁻¹X′ΩX(X′X)⁻¹, (6)

where Ω is the covariance matrix of ε.
Ordinary least squares

Standard errors of bOLS are the square roots of the diagonal
elements of

V[bOLS] = (X′X)⁻¹X′ΩX(X′X)⁻¹.

• If εt is homoskedastic (E[ε²t] = σ² for all t) and uncorrelated
(E[εtεs] = 0 for all s ≠ t), Ω = σ²I, such that

V[bOLS] = σ²(X′X)⁻¹.



Ordinary least squares

Standard errors of bOLS are the square roots of the diagonal
elements of

V[bOLS] = (X′X)⁻¹X′ΩX(X′X)⁻¹.

• If εt is heteroskedastic (E[ε²t] = σ²t) and uncorrelated, we have
Ω = diag(σ²1, σ²2, . . . , σ²T), such that

V[bOLS] = (Σ_{t=1}^T xt x′t)⁻¹ (Σ_{t=1}^T σ²t xt x′t) (Σ_{t=1}^T xt x′t)⁻¹. (7)

⇒ Using ε̂²t (where ε̂t = yt − x′t bOLS) to estimate σ²t gives the
so-called ‘White standard errors’.
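
A sketch of the corresponding sandwich estimator (7); it assumes X and resid come from an OLS fit such as the hypothetical ols_ar above.

```python
import numpy as np

def white_se(X, resid):
    """White standard errors from the sandwich covariance (7)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (resid ** 2)[:, None]).T @ X   # sum_t e_t^2 x_t x_t'
    cov = XtX_inv @ meat @ XtX_inv
    return np.sqrt(np.diag(cov))
```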



Ordinary least squares

An important difference between the ‘classical’ linear regression
model and an AR(p) model

yt = α + φ1yt−1 + · · · + φpyt−p + εt,

is that now the regressors yt−1, . . . , yt−p are not fixed but
stochastic.

This basically implies that exact finite-sample results which hold
in the ‘classical’ linear regression model yt = x′tβ + εt with fixed
xt’s do not hold in the time series context.
⇒ Asymptotic results continue to hold, however.

For example, the OLS estimator of φ1 in the AR(1) model

yt = φ1yt−1 + εt,

is not unbiased but remains consistent.
Finite-samples and asymptotics

Whether or not we can rely upon asymptotic results to draw
inference in finite samples depends on the context, and in
particular on the sample size T.

Example: suppose a series yt is generated from the AR(1) model,

yt = φ1yt−1 + εt, t = 1, 2, . . . , T. (8)

⇒ Question: What is the distribution of the OLS estimator of φ1?



Finite-samples and asymptotics

The OLS estimate of the AR parameter φ1 is equal to

φ̂1 = (Σ_{t=1}^T yt−1 yt) / (Σ_{t=1}^T y²t−1)
   = (Σ_{t=1}^T yt−1(φ1yt−1 + εt)) / (Σ_{t=1}^T y²t−1)
   = φ1 + (Σ_{t=1}^T yt−1εt) / (Σ_{t=1}^T y²t−1), (9)

from which it follows that

√T(φ̂1 − φ1) = ((1/√T) Σ_{t=1}^T yt−1εt) / ((1/T) Σ_{t=1}^T y²t−1). (10)



Finite-samples and asymptotics

If εt ∼ iid(0, σ²) for all t,

(1/T) Σ_{t=1}^T y²t−1 → γ0 = σ²/(1 − φ1²), (11)

(1/√T) Σ_{t=1}^T yt−1εt → N(0, γ0σ²), (12)

(because E[y²t−1 ε²t] = E[y²t−1] · E[ε²t] = γ0σ²).

It then follows that

√T(φ̂1 − φ1) ∼a N(0, σ²γ0⁻¹).

⇒ But is this asymptotic normal distribution a reasonable
approximation to the finite-sample distribution of φ̂1 for small T?



Finite-samples and asymptotics

[Figure: Exact small-sample distribution (histogram) and asymptotic
normal distribution of φ̂1 as given in (9), for series generated
according to (8) with φ1 = 0.5 and T = 50.]


Finite-samples and asymptotics

The exact small-sample distribution can be obtained by means of
Monte Carlo simulation:

1. Obtain independent drawings ε∗t , t = 1, . . . , T, from a normal
distribution.
2. Construct an artificial time series y∗t from

y∗t = φ1^(0) y∗t−1 + ε∗t , t = 1, . . . , T,

where φ1^(0) is the value of the AR parameter of interest.
3. Estimate the AR(1) model y∗t = φ1 y∗t−1 + εt with the artificial
sample to obtain an estimate of the AR(1) parameter.
4. Repeat steps 1-3 a large number B of times to obtain estimates
φ̂1^(1), . . . , φ̂1^(B).
⇒ These form an estimate of the distribution of φ̂1.
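
A minimal sketch of this Monte Carlo experiment, with φ1^(0) = 0.5 and T = 50 as in the figure above; the function name and seed are illustrative.

```python
import numpy as np

def mc_ar1(phi0=0.5, T=50, B=10000, seed=0):
    rng = np.random.default_rng(seed)
    estimates = np.empty(B)
    for b in range(B):
        eps = rng.standard_normal(T + 1)      # step 1: normal drawings
        y = np.zeros(T + 1)
        for t in range(1, T + 1):             # step 2: artificial AR(1) series
            y[t] = phi0 * y[t - 1] + eps[t]
        # step 3: OLS estimate of phi_1 (no intercept), as in (9)
        estimates[b] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
    return estimates                          # step 4: B estimates of phi_1

phi_hats = mc_ar1()
print(phi_hats.mean())   # below 0.5: the small-sample bias visible in the figure
```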
Variable/Model selection

• Selecting the regressors x1,t, . . . , xk,t and AR and MA orders p
and q can be done in many different ways.

For p and q, we may try to use the properties of the empirical
autocorrelations and partial autocorrelations, but often these do
not clearly match the theoretical properties implied by AR and MA
models.

Two alternative methods that are very popular in empirical
applications are
1. model selection criteria based on in-sample fit;
2. out-of-sample forecasting.



Model selection criteria

Our model should describe the variable yt as well as possible. At
the same time, we prefer “small” models.
⇒ We should find a balance between “model fit” and “model
parsimony” (the number of parameters to be estimated). Model
selection criteria aim to achieve this.

The Akaike Information Criterion [AIC] is given by

AIC(k) = T log σ̂² + 2k, (13)

where T denotes the sample size, k is the number of parameters
and σ̂² is the residual variance.

The Schwarz Information Criterion [SIC] is given by

SIC(k) = T log σ̂² + k log T. (14)

⇒ Select ARMA orders p and q that minimize AIC(k) or SIC(k).
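
A sketch of order selection with (13)-(14) for AR(p) models estimated by OLS; the helper names are illustrative, and note that for a strict comparison one would hold the estimation sample fixed across p.

```python
import numpy as np

def fit_ar_ols_resid(y, p):
    """Residuals from an OLS fit of an AR(p) with intercept."""
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - j:len(y) - j] for j in range(1, p + 1)])
    b = np.linalg.lstsq(X, y[p:], rcond=None)[0]
    return y[p:] - X @ b

def ar_order_selection(y, p_max=8):
    for p in range(1, p_max + 1):
        resid = fit_ar_ols_resid(y, p)
        T = len(resid)                 # effective sample shrinks with p
        k = p + 1                      # AR coefficients plus intercept
        sigma2 = resid @ resid / T     # residual variance
        aic = T * np.log(sigma2) + 2 * k            # eq. (13)
        sic = T * np.log(sigma2) + k * np.log(T)    # eq. (14)
        print(p, aic, sic)             # pick the p minimizing AIC or SIC
```

SIC penalizes extra parameters more heavily than AIC (log T > 2 for T > 7), so it tends to select smaller models.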


Misspecification tests and diagnostic measures

How can we evaluate whether an estimated model is adequate?

Several possibilities exist. Many “misspecification tests” aim to
test whether the residuals ε̂t of the ARMA model satisfy the white
noise properties

E[ε²t] = σ²,
E[εtεs] = 0, for all s ≠ t.

1. Test of no residual autocorrelation, based on the empirical
autocorrelations rk(ε̂) = (Σ_{t=k+1}^T ε̂t ε̂t−k)/(Σ_{t=1}^T ε̂²t):

LB(m) = T(T + 2) Σ_{k=1}^m r²k(ε̂)/(T − k) ∼a χ²(m).

⇒ If rejected, the AR or MA order should be adjusted.
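
A sketch of the LB(m) statistic, assuming resid holds the estimated ARMA residuals; scipy is used only for the χ² p-value.

```python
import numpy as np
from scipy import stats

def ljung_box(resid, m):
    T = len(resid)
    denom = resid @ resid
    lb = 0.0
    for k in range(1, m + 1):
        r_k = (resid[k:] @ resid[:-k]) / denom   # empirical autocorrelation r_k
        lb += r_k ** 2 / (T - k)
    lb *= T * (T + 2)
    return lb, 1 - stats.chi2.cdf(lb, df=m)      # compare with chi^2(m)
```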
Misspecification tests and diagnostic measures

Another test for residual autocorrelation is based on the
Lagrange Multiplier [LM] principle.

To test an AR(p) model against AR(p + r) or ARMA(p, r), the LM
test is obtained by estimating the auxiliary regression

ε̂t = α1yt−1 + · · · + αpyt−p + αp+1ε̂t−1 + · · · + αp+r ε̂t−r + vt, (15)


where ε̂t are the estimated residuals of the AR(p) model.

The test statistic is calculated as T·R², and is asymptotically
χ²(r)-distributed.
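
A sketch of this LM test, assuming resid holds the OLS residuals of an AR(p) fit aligned with observations t = p, . . . , T − 1; missing initial residual lags are set to zero, a common convention.

```python
import numpy as np
from scipy import stats

def lm_test(y, resid, p, r):
    """LM test of AR(p) against AR(p+r)/ARMA(p,r) via regression (15)."""
    T = len(resid)                                  # effective sample size
    pad = np.concatenate([np.zeros(r), resid])      # zero-pad initial resid lags
    X = np.column_stack(
        [y[p - j:len(y) - j] for j in range(1, p + 1)] +    # y_{t-1},...,y_{t-p}
        [pad[r - j:r - j + T] for j in range(1, r + 1)])    # eps_{t-1},...,eps_{t-r}
    b = np.linalg.lstsq(X, resid, rcond=None)[0]
    u = resid - X @ b
    r2 = 1.0 - (u @ u) / (resid @ resid)   # uncentered R^2 (resid has ~zero mean)
    lm = T * r2
    return lm, 1 - stats.chi2.cdf(lm, df=r)
```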



Misspecification tests and diagnostic measures

2. Test of homoskedasticity (constant variance), often based
on autocorrelations of squared residuals.

⇒ If rejected, standard errors of parameters should be adjusted,
or heteroskedasticity should be modelled explicitly.

3. Test of normality: Skewness = 0, Kurtosis = 3.

JB = (T/6) SK̂ε̂² + (T/24) (K̂ε̂ − 3)²,

where the skewness and kurtosis of ε̂t can be calculated as

SK̂ε̂ = m̂3/√(m̂2³), and K̂ε̂ = m̂4/m̂2², with m̂j = (1/T) Σ_{t=1}^T (ε̂t)ʲ.

Under the null hypothesis of normality: JB ∼a χ²(2).
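
A sketch of the JB statistic under these definitions, assuming resid holds the (approximately zero-mean) residuals.

```python
import numpy as np
from scipy import stats

def jarque_bera(resid):
    T = len(resid)
    m2, m3, m4 = (np.mean(resid ** j) for j in (2, 3, 4))  # moments m_j
    sk = m3 / np.sqrt(m2 ** 3)          # skewness
    ku = m4 / m2 ** 2                   # kurtosis
    jb = T / 6 * sk ** 2 + T / 24 * (ku - 3) ** 2
    return jb, 1 - stats.chi2.cdf(jb, df=2)
```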



Forecasting

Given a model that has been specified and estimated using
observations y1, y2, . . . , yT , we may generate an h-step ahead
forecast for yT+h. Time T is called the forecast origin, and h is
the forecast horizon.

Three different, but related types of forecasts:

1. A point forecast of yT+h, denoted ŷT+h|T , provides a specific
value for this observation.
2. An interval forecast consists of a lower bound L̂T+h|T and an
upper bound ÛT+h|T such that the interval (L̂T+h|T , ÛT+h|T)
contains the actual value yT+h with a certain probability.
3. A density forecast concerns the conditional distribution of
yT+h, denoted as f(yT+h|YT).



Different types of forecasts

For a given variable of interest, different types of forecasts may be
given:
1. Point forecasts
‘Inflation in the euro area over the next twelve months is expected to be
equal to 2.3 percent.’

2. Interval forecasts
‘Inflation in the euro area over the next twelve months will be between 1.0
and 3.6 percent with probability 0.95.’

3. Density forecasts
‘Inflation in the euro area over the next twelve months is normally
distributed with mean equal to 2.3 percent and standard deviation 0.65.’



Constructing point forecasts

What the optimal h-step ahead point forecast is depends on a
so-called loss function.

General idea: Forecasts are used in decision-making. Forecasts
that differ from the actual values lead to sub-optimal decisions. In
other words, forecast errors lead to a certain ‘loss’. We should use
the point forecast ŷT+h|T that minimizes the expected value of the
loss function.

⇒ The form of the loss function depends on the particular
application, that is, on the variable that we are forecasting.
Forecasting Big Mac sales for McDonald’s is quite different from
forecasting the spread of a coronavirus...



Constructing point forecasts

• In many cases, the relevant loss function is difficult to specify
exactly. We therefore use simple functions that only depend on
the forecast error: eT+h|T = yT+h − ŷT+h|T .

• Most often, we assume that the forecast user has a squared
loss function, that is

LossT+h|T = e²T+h|T . (16)

Minimizing the expected value of (16), or the Mean Squared
Prediction Error [MSPE], we find that the optimal point forecast
is the conditional mean of yT+h, that is

ŷT+h|T = E[yT+h|YT ]. (17)



Point forecasts: AR(1) model

Consider the AR(1) model

yt = φ1yt−1 + εt, (18)

with E[εt|Yt−1] = 0 and E[ε²t|Yt−1] = σ². Assume the value of φ1
is known.

At time T , the value of εT+1 is yet unknown, and since
E[εT+1|YT ] = 0, the optimal point forecast of yT+1 equals

ŷT+1|T = E[yT+1|YT ]
       = E[φ1yT + εT+1|YT ]
       = φ1yT . (19)



Point forecasts: AR(1) model

The one-step ahead forecast error eT+1|T is equal to the shock
occurring at t = T + 1:

eT+1|T = yT+1 − ŷT+1|T = yT+1 − φ1yT = εT+1. (20)

Hence, the variance of the forecast error V[eT+1|T ] is equal to σ²,
which is the variance of εt and also the conditional variance
V[yT+1|YT ].



Point forecasts: AR(1) model

For two steps ahead, we obtain

ŷT+2|T = E[φ1yT+1 + εT+2|YT ]
       = φ1E[yT+1|YT ]
       = φ1ŷT+1|T
       = φ1² yT . (21)

Since the actual value yT+2 is given by

yT+2 = φ1yT+1 + εT+2
     = φ1(φ1yT + εT+1) + εT+2, (22)

it holds that the two-step ahead forecast error is equal to

eT+2|T ≡ yT+2 − ŷT+2|T = εT+2 + φ1εT+1,

with variance V[eT+2|T ] = (1 + φ1²)σ², which shows that
V[eT+2|T ] > V[eT+1|T ].
Point forecasts: AR(1) model

For three steps ahead, we would have

ŷT+3|T = E[φ1yT+2 + εT+3|YT ]
       = φ1ŷT+2|T
       = φ1³ yT . (23)

Since the actual value yT+3 is given by

yT+3 = φ1yT+2 + εT+3
     = φ1(φ1(φ1yT + εT+1) + εT+2) + εT+3, (24)

it holds that the three-step ahead forecast error is equal to

eT+3|T ≡ yT+3 − ŷT+3|T = εT+3 + φ1εT+2 + φ1²εT+1,

with variance V[eT+3|T ] = (1 + φ1² + φ1⁴)σ², which shows that
V[eT+3|T ] > V[eT+2|T ].
Point forecasts: AR(1) model

This can be generalized to the following results (see the sketch
after this list):

1. The h-step ahead forecast can be computed either directly as
ŷT+h|T = φ1ʰ yT , or recursively as ŷT+h|T = φ1ŷT+h−1|T .
2. As h becomes larger, the h-step ahead forecast ŷT+h|T
converges to 0, which is the unconditional mean of yT+h.
3. The h-step ahead forecast error eT+h|T is an MA(h−1) process,
namely a linear combination of the shocks that occur between
time T and T + h, that is, εT+1, εT+2, . . . , εT+h.
4. The variance of the forecast errors is increasing in the horizon
h, that is, V[eT+h|T ] > V[eT+h−1|T ] for all h > 1.
5. As h becomes larger, V[eT+h|T ] converges to σ²/(1 − φ1²),
which is the unconditional variance of yT+h.

⇒ Similar results hold for general ARMA(p,q) models.
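
A sketch of results 1 and 4: recursive AR(1) point forecasts and forecast-error variances; the function name and parameter values are illustrative.

```python
import numpy as np

def ar1_forecasts(y_T, phi1, sigma2, h_max):
    """Recursive h-step forecasts and forecast-error variances, h = 1..h_max."""
    forecasts, variances = [], []
    y_hat, var = y_T, 0.0
    for h in range(1, h_max + 1):
        y_hat = phi1 * y_hat                    # result 1: recursion
        var += phi1 ** (2 * (h - 1)) * sigma2   # adds phi1^(2(h-1)) * sigma^2
        forecasts.append(y_hat)
        variances.append(var)
    return np.array(forecasts), np.array(variances)

fc, var = ar1_forecasts(y_T=2.0, phi1=0.8, sigma2=1.0, h_max=20)
# fc decays to 0 (result 2); var rises towards 1/(1 - 0.8**2) = 2.78 (result 5)
```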


Point forecasts: convergence to the unconditional mean

Consider the AR(1) model with intercept

yt = δ + φ1yt−1 + εt, (25)

with E[εt|Yt−1] = 0 and E[ε²t|Yt−1] = σ².

At time T , the optimal point forecast of yT+1 equals

ŷT+1|T = E[yT+1|YT ]
       = E[δ + φ1yT + εT+1|YT ]
       = δ + φ1yT . (26)

For two steps ahead, we obtain

ŷT+2|T = E[δ + φ1yT+1 + εT+2|YT ]
       = δ + φ1E[yT+1|YT ] = δ + φ1ŷT+1|T
       = δ(1 + φ1) + φ1² yT . (27)



Point forecasts: convergence to the unconditional mean

For three steps ahead, we obtain

ŷT+3|T = E[δ + φ1yT+2 + εT+3|YT ]
       = δ + φ1ŷT+2|T
       = δ(1 + φ1 + φ1²) + φ1³ yT . (28)

And, in general, the h-step ahead point forecast is given by

ŷT+h|T = E[δ + φ1yT+h−1 + εT+h|YT ]
       = δ + φ1ŷT+h−1|T
       = δ(1 + φ1 + · · · + φ1ʰ⁻¹) + φ1ʰ yT . (29)

As h → ∞ (and assuming |φ1| < 1), this converges to

ŷT+h|T → δ(1 + φ1 + φ1² + φ1³ + φ1⁴ + . . .) = δ/(1 − φ1) = E[yt].



Point forecasts: convergence to the unconditional mean

This convergence of point forecasts as the forecast horizon
increases is even easier to see by rewriting the AR(1) model as

yt − µ = φ1(yt−1 − µ) + εt, (30)

with µ = δ/(1 − φ1) denoting the unconditional mean of yt (again,
assuming |φ1| < 1).

Using this representation, we find

ŷT+1|T = µ + φ1(yT − µ),
ŷT+2|T = µ + φ1²(yT − µ),
ŷT+3|T = µ + φ1³(yT − µ),

and in general, for any h ≥ 1,

ŷT+h|T = µ + φ1ʰ(yT − µ).



Initial claims for unemployment insurance

[Figure: h-step ahead AR(2) forecasts for quarterly US initial
claims, 1970-2015.]


Point forecasts: effect of estimation uncertainty

In practice, the true value(s) of the model parameter(s) are
unknown. Instead, we have to use estimated parameters. For the
AR(1) model, the ‘feasible’ 1-step ahead forecast is given by

ŷT+1|T = φ̂1yT .

This has consequences for the uncertainty/precision of the
forecast. Assuming that the data generating process (DGP) is an
AR(1) process yt = φ1yt−1 + εt, the 1-step ahead forecast error
eT+1|T is equal to

eT+1|T = yT+1 − ŷT+1|T
       = φ1yT + εT+1 − φ̂1yT
       = εT+1 + (φ1 − φ̂1)yT .



Point forecasts: effect of estimation uncertainty

This (mostly) affects the variance of the forecast error V[eT+1|T ].

Using the earlier result that √T(φ̂1 − φ1) ∼a N(0, σ²γ0⁻¹), we find

V[eT+1|T ] = σ²(1 + 1/T).

Hence, for large T , estimation uncertainty is not important, but for
small/moderate T it can have a substantial impact.

Note that this also provides a strong argument in favor of
parsimonious forecasting models, in the sense that for a model
with k parameters it holds that

V[eT+1|T ] ≈ σ²(1 + k/T).



Initial claims for unemployment insurance

[Figure: h-step ahead AR(2) forecasts for quarterly US initial
claims, 1970-2015, with and without estimation uncertainty.]


Point forecasts: effect of model misspecification

In practice, the true data generating process (DGP) is unknown.
The model that we use for forecasting might be misspecified,
which also affects the properties of point forecasts and the
associated forecast errors.

Assume that the DGP of a time series yt is the AR(2) process

yt = φ1yt−1 + φ2yt−2 + εt, (31)

with E[εt|Yt−1] = 0 and E[ε²t|Yt−1] = σ².

And assume that for making forecasts we use an AR(1) model

yt = φ̂1yt−1 + ηt, (32)

where we assume that the ‘shocks’ ηt have the usual white noise
properties, in particular mean zero.
Point forecasts: effect of model misspecification

In this case, the 1-step ahead forecast at time T is again given by

ŷT+1|T = φ̂1yT .

For the corresponding forecast error, we now find [using the AR(2)
DGP for the actual value yT+1!]

eT+1|T = yT+1 − ŷT+1|T
       = φ1yT + φ2yT−1 + εT+1 − φ̂1yT
       = εT+1 + (φ1 − φ̂1)yT + φ2yT−1.

Obviously, eT+1|T is different from the true shock εT+1 (even if
φ̂1 = φ1), and may have different properties. For example, the
one-step ahead forecast errors will exhibit non-zero
autocorrelations, which they should not.
Evaluating point forecasts

• Evaluation of forecasts is of crucial importance, for obvious
reasons. Why would you use a model (or an expert, or another
source) for forecasting if its forecasts are not of sufficient quality?

• Also, forecast evaluation is a helpful tool for selecting among
competing forecasts (from different models or other sources).

⇒ Forecast evaluation can be done in two ways:

1. ‘Absolute’: what is the quality of forecasts from one specific
model?
2. ‘Relative’: what is the quality of forecasts from multiple
competing models, relative to each other?



Evaluating point forecasts

• How to evaluate a given set of P 1-step ahead point forecasts
ŷt+1|t for t = T, . . . , T + P − 1?

• How to compare the quality of forecasts from competing
models? (This is an important alternative method for model
selection!)

⇒ Most forecast evaluation techniques focus on properties of the
forecast errors

et+1|t = yt+1 − ŷt+1|t.

(Throughout we assume that the point forecast is the
conditional mean, that is, ŷt+1|t = E[yt+1|Yt].)



Evaluating point forecasts

Desirable properties of point forecasts:

1. Unbiasedness: forecast errors have zero mean (E[et+1|t] = 0).

⇒ Straightforward to examine, by testing whether the sample
mean of the forecast errors differs significantly from zero.

2. Accuracy: the MSPE should be as small as possible.

⇒ Recall that the point forecast ŷt+1|t (usually) is taken to be
the one which minimizes E[e²t+1|t] = E[(yt+1 − ŷt+1|t)²].

Note that the MSPE can be decomposed as

E[(yt+1 − ŷt+1|t)²] = V[yt+1 − ŷt+1|t] + (E[yt+1 − ŷt+1|t])²
                    = V[et+1|t] + (E[et+1|t])²
                    = variance + squared bias.


Evaluating point forecasts

Given P one-step ahead forecasts, we may compute / estimate the
MSPE as

(1/P) Σ_{t=T}^{T+P−1} (yt+1 − ŷt+1|t)².

Note that the value of an MSPE by itself is difficult to interpret,
as it depends on the scale of the time series.

⇒ Compare the MSPE to the variance of the residuals from the
model which is used to construct the forecasts. Recall that, under
correct model specification, et+1|t = εt+1.

Or, compare the MSPE with the unconditional variance of yt.



Evaluating point forecasts

In addition, we may consider other, similar measures of forecast
accuracy, such as the Mean Absolute Error (MAE)

(1/P) Σ_{t=T}^{T+P−1} |yt+1 − ŷt+1|t|,

or the Mean Percentage Squared Error (MPSE)

(1/P) Σ_{t=T}^{T+P−1} ((yt+1 − ŷt+1|t)/yt+1)².

⇒ Each of these criteria may provide useful information.
[but they can also lead to conflicting results when comparing models]
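
A sketch of the three measures, assuming arrays actual (holding yt+1) and forecast (holding ŷt+1|t) of length P; the function name is illustrative.

```python
import numpy as np

def accuracy_measures(actual, forecast):
    e = actual - forecast                 # forecast errors e_{t+1|t}
    mspe = np.mean(e ** 2)                # mean squared prediction error
    mae = np.mean(np.abs(e))              # mean absolute error
    mpse = np.mean((e / actual) ** 2)     # mean percentage squared error
    return mspe, mae, mpse
```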



Evaluating point forecasts

Desirable properties of forecasts:

3. Efficiency / Optimality: it should not be possible to forecast
the forecast error itself with any information available at time t.

⇒ This implies that in a regression such as

et+1|t = α′xt + ηt+1,

where xt is a vector of variables known at time t, the
coefficients α should be equal to zero.

In particular, consider the Mincer-Zarnowitz regression

et+1|t = α0 + α1ŷt+1|t + ηt+1,

or

yt+1 = β0 + β1ŷt+1|t + ηt+1, (33)

for which it should hold that β0 = 0 and β1 = 1.
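
A sketch of the Mincer-Zarnowitz regression (33) by OLS, under the same assumptions on the actual and forecast arrays as above.

```python
import numpy as np

def mincer_zarnowitz(actual, forecast):
    """Regression y_{t+1} = beta0 + beta1 * yhat_{t+1|t} + eta; want (0, 1)."""
    X = np.column_stack([np.ones(len(forecast)), forecast])
    beta = np.linalg.lstsq(X, actual, rcond=None)[0]
    return beta   # compare beta[0] with 0 and beta[1] with 1 (e.g. via a Wald test)
```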



Comparing predictive accuracy

• Obviously, models with smaller MSPE (MAE, MPSE) are
“better”. But how can we determine whether differences in
MSPE are statistically significant?
⇒ Formal comparison of MSPEs is possible with so-called
Diebold-Mariano tests.

Let ŷi,t+1|t and ŷj,t+1|t denote two competing 1-step ahead
forecasts of yt+1, obtained from models i and j, respectively, with
corresponding forecast errors ei,t+1|t = yt+1 − ŷi,t+1|t.

Assume that the squared forecast error e²i,t+1|t is the relevant
“loss function”. Define the “loss differential” as
dt+1 = e²i,t+1|t − e²j,t+1|t.

⇒ Equal forecast accuracy implies E[dt+1] = 0.



Comparing predictive accuracy

⇒ We can test the null hypothesis of equal forecast accuracy by
testing whether the sample mean of dt+1 differs significantly
from zero.

Given a set of P 1-step ahead forecasts for t = T, . . . , T + P − 1,
the Diebold-Mariano statistic is given by

DM = d̄ / √(V̂[dt+1]/P) ∼a N(0, 1), (34)

where d̄ is the sample mean of dt+1 and V̂[dt+1] is an estimate of
the variance of dt+1, which can be computed as

V̂[dt+1] = (1/(P − 1)) Σ_{t=T}^{T+P−1} (dt+1 − d̄)².
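
A sketch of the DM test under squared-error loss, assuming e_i and e_j hold the two competing sets of 1-step ahead forecast errors.

```python
import numpy as np
from scipy import stats

def diebold_mariano(e_i, e_j):
    d = e_i ** 2 - e_j ** 2                   # loss differentials d_{t+1}
    P = len(d)
    d_bar = d.mean()
    var_d = np.sum((d - d_bar) ** 2) / (P - 1)
    dm = d_bar / np.sqrt(var_d / P)           # eq. (34)
    return dm, 2 * (1 - stats.norm.cdf(abs(dm)))   # two-sided p-value
```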



Constructing density forecasts

Recall that by definition

yT+h = ŷT+h|T + eT+h|T , (35)

where (assuming a quadratic loss function) ŷT+h|T = E[yT+h|YT ].

⇒ The forecast error eT+h|T is a random variable with a certain
distribution.
It follows that the conditional distribution of yT+h at time T , or
the density forecast, is equal to the distribution of eT+h|T , except
with a different mean, ŷT+h|T instead of zero.

• As shown before, eT+h|T is in general a linear combination of
εT+1, . . . , εT+h. If we assume normality of εt, it follows that

yT+h|YT ∼ N(ŷT+h|T , V[eT+h|T ]).



Constructing interval forecasts

• From the density forecast f(yT+h|YT ) we can easily obtain
interval forecasts.

⇒ An interval forecast consists of a lower bound L̂T+h|T and an
upper bound ÛT+h|T such that the interval (L̂T+h|T , ÛT+h|T)
contains the actual value yT+h with a certain pre-specified
probability.

• Many intervals satisfy this property. It is common to consider a
symmetric interval around the conditional mean of yT+h.

For example, in case f(yT+h|YT ) = N(ŷT+h|T , V[eT+h|T ]), a 95
percent interval forecast for yT+h is given by

(L̂T+h|T , ÛT+h|T) = (ŷT+h|T − 1.96√V[eT+h|T ], ŷT+h|T + 1.96√V[eT+h|T ]).
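
A sketch of such a 95 percent interval forecast for the AR(1) model, treating φ1, σ², and yT as known; the values are illustrative.

```python
import numpy as np

phi1, sigma2, y_T = 0.8, 1.0, 2.0
h = np.arange(1, 13)
point = phi1 ** h * y_T                                    # y_hat_{T+h|T}
var_e = sigma2 * (1 - phi1 ** (2 * h)) / (1 - phi1 ** 2)   # V[e_{T+h|T}]
lower = point - 1.96 * np.sqrt(var_e)
upper = point + 1.96 * np.sqrt(var_e)
# The bands widen with h and approach 0 ± 1.96 * sqrt(sigma2 / (1 - phi1**2)).
```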



Interval and density forecasts

• Forecasters have traditionally focused primarily on point
forecasts. Only recently has more attention been given to
describing the uncertainty surrounding such point forecasts, in the
form of interval forecasts and density forecasts.

Possible explanations include

1. There was only limited demand for interval and density forecasts. With the
boom in financial risk management, among others, this demand has
increased tremendously.
2. No methods were available for the evaluation of interval and density
forecasts.
3. Interval and density forecasts traditionally were constructed analytically,
which requires strong and sometimes dubious assumptions such as normally
distributed shocks. Nowadays, interval and density forecasts can be
constructed with simulation techniques which avoid such assumptions.

