
Time Series Analysis

Chapter 3 - Basic concepts

Annika Camehl
Econometric Institute
Erasmus University Rotterdam
Chapter 3 - Basic concepts

• Modeling cycle / Specification strategy
• Parameter estimation
• Model selection
• Diagnostic measures
• Forecasting



AR(p) and MA(q) models

An autoregressive model of order p [AR(p)] is given by

yt = φ1yt−1 + φ2yt−2 + · · · + φpyt−p + εt, (1)


where φ1, φ2, . . . , φp are unknown parameters, and εt is a standard
white noise process.

A moving average model of order q [MA(q)] is given by

yt = εt + θ1εt−1 + · · · + θq εt−q , (2)


where θ1, θ2, . . . , θq are unknown parameters.



ARMAX(p,q) model

In general, we may consider an AutoRegressive Moving Average
model (with eXogenous regressors) (ARMA(X) model):

yt = α + φ1yt−1 + · · · + φpyt−p
+ εt + θ1εt−1 + ... + θq εt−q
(+β1x1,t + β2x2,t + · · · + βk xk,t), (3)
where xi,t, i = 1, . . . , k denote (exogenous) regressors.



Empirical specification strategy

• How to apply / implement ARMAX models in practice?

⇒ It is strongly recommended to do this following a systematic,
structured approach. We may, for example, adopt the following
“model building procedure”:

1. Select regressors x1,t, . . . , xk,t and AR and MA orders p and q;


2. Estimate parameters φ = (α, φ1, . . . , φp), β = (β1, . . . , βk ), and
θ = (θ1, . . . , θq );
3. Evaluate the model by applying misspecification tests and
other diagnostic measures;
4. Modify the model if necessary (that is, go back to step 1);
5. If the model cannot be further improved and is satisfactory,
use it for description or forecasting.
Parameter estimation

Note that an AR(p) model

yt = α + φ1yt−1 + · · · + φpyt−p + εt,


is essentially a linear regression model, with
lagged dependent variables as regressors. Hence, the unknown
parameters α, φ1, . . . , φp can be estimated straightforwardly using
ordinary least squares [OLS].

The MA(q) model

yt = εt + θ1εt−1 + ... + θq εt−q ,


also resembles a linear regression, but note that the ‘regressors’
are unknown / unobserved shocks εt−k , k = 1, . . . , q. We may
consider nonlinear least squares [NLS] or maximum likelihood
[ML] estimation.
Ordinary least squares

Estimating the parameters β in a linear regression model


yt = x′tβ + εt, t = 1, . . . , T, (4)

where xt is a (k × 1) vector of regressors, can be done by OLS,
which minimizes the sum of squared residuals

bOLS = argmin Q(β) = argmin Σ_{t=1}^T (yt − x′tβ)².

From the first-order conditions, it follows easily that

bOLS = (Σ_{t=1}^T xt x′t)⁻¹ (Σ_{t=1}^T xt yt),

or, using matrix notation (y = Xβ + ε),

bOLS = (X′X)⁻¹X′y. (5)
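
As an illustration, here is a minimal numpy sketch of (5) applied to an AR(2) with intercept; the function name ols_ar and the simulated series are illustrative, not part of the slides.

```python
import numpy as np

def ols_ar(y, p):
    """OLS estimates b = (X'X)^{-1} X'y for an AR(p) with intercept."""
    T = len(y)
    # Regressor matrix: a constant and p lags of y (rows t = p, ..., T-1).
    X = np.column_stack([np.ones(T - p)] +
                        [y[p - j:T - j] for j in range(1, p + 1)])
    b = np.linalg.solve(X.T @ X, X.T @ y[p:])   # (X'X)^{-1} X'y
    resid = y[p:] - X @ b
    return b, resid, X

# Usage on a simulated AR(2) series:
rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.5 * y[t - 1] + 0.2 * y[t - 2] + rng.standard_normal()
b, resid, X = ols_ar(y, p=2)
print(b)   # roughly [0, 0.5, 0.2]
```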



Ordinary least squares

Under suitable regularity conditions, bOLS is unbiased (and
consistent), e.g. assuming fixed regressors X and E[εt] = 0,

E[bOLS] = E[(X′X)⁻¹X′y]
        = E[(X′X)⁻¹X′(Xβ + ε)]
        = β + E[(X′X)⁻¹X′ε]
        = β + (X′X)⁻¹X′E[ε]
        = β.

To compute standard errors, we need the covariance matrix of bOLS,

V[bOLS] = E[(bOLS − β)(bOLS − β)′]
        = E[(X′X)⁻¹X′εε′X(X′X)⁻¹]
        = (X′X)⁻¹X′E[εε′]X(X′X)⁻¹
        = (X′X)⁻¹X′ΩX(X′X)⁻¹, (6)

where Ω is the covariance matrix of ε.
Ordinary least squares

Standard errors of bOLS are the square roots of the diagonal
elements of

V[bOLS] = (X′X)⁻¹X′ΩX(X′X)⁻¹.

• If εt is homoskedastic (E[ε²t] = σ² for all t) and uncorrelated
(E[εtεs] = 0 for all s ≠ t), Ω = σ²I, such that

V[bOLS] = σ²(X′X)⁻¹.



Ordinary least squares

Standard errors of bOLS are the square roots of the diagonal
elements of

V[bOLS] = (X′X)⁻¹X′ΩX(X′X)⁻¹.

• If εt is heteroskedastic (E[ε²t] = σ²t) and uncorrelated, we have
Ω = diag(σ²1, σ²2, . . . , σ²T), such that

V[bOLS] = (Σ_{t=1}^T xt x′t)⁻¹ (Σ_{t=1}^T σ²t xt x′t) (Σ_{t=1}^T xt x′t)⁻¹. (7)

⇒ Using ε̂²t (where ε̂t = yt − x′t bOLS) to estimate σ²t gives the
so-called ‘White standard errors’.
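
A sketch of the corresponding sandwich estimator (7); it assumes X and resid come from an OLS fit such as the hypothetical ols_ar above.

```python
import numpy as np

def white_se(X, resid):
    """White standard errors from the sandwich covariance (7)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    meat = (X * (resid ** 2)[:, None]).T @ X   # sum_t e_t^2 x_t x_t'
    cov = XtX_inv @ meat @ XtX_inv
    return np.sqrt(np.diag(cov))
```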



Ordinary least squares

An important difference between the ‘classical’ linear regression
model and an AR(p) model

yt = α + φ1yt−1 + · · · + φpyt−p + εt,

is that now the regressors yt−1, . . . , yt−p are not fixed but
stochastic.

This basically implies that exact finite-sample results which hold
in the ‘classical’ linear regression model yt = x′tβ + εt with fixed
xt’s do not hold in the time series context.
⇒ Asymptotic results continue to hold, however.

For example, the OLS estimator of φ1 in the AR(1) model

yt = φ1yt−1 + εt,

is not unbiased but remains consistent.
Finite-samples and asymptotics

Whether or not we can rely upon asymptotic results to draw
inference in finite samples depends on the context, and in
particular on the sample size T.

Example: suppose a series yt is generated from the AR(1) model,

yt = φ1yt−1 + εt, t = 1, 2, . . . , T. (8)

⇒ Question: What is the distribution of the OLS estimator of φ1?



Finite-samples and asymptotics

The OLS estimate of the AR parameter φ1 is equal to

φ̂1 = (Σ_{t=1}^T yt−1 yt) / (Σ_{t=1}^T y²t−1)
   = (Σ_{t=1}^T yt−1(φ1yt−1 + εt)) / (Σ_{t=1}^T y²t−1)
   = φ1 + (Σ_{t=1}^T yt−1εt) / (Σ_{t=1}^T y²t−1), (9)

from which it follows that

√T(φ̂1 − φ1) = ((1/√T) Σ_{t=1}^T yt−1εt) / ((1/T) Σ_{t=1}^T y²t−1). (10)



Finite-samples and asymptotics

If εt ∼ iid(0, σ²) for all t,

(1/T) Σ_{t=1}^T y²t−1 → γ0 = σ²/(1 − φ1²), (11)

(1/√T) Σ_{t=1}^T yt−1εt → N(0, γ0σ²), (12)

(because E[y²t−1 ε²t] = E[y²t−1] · E[ε²t] = γ0σ²).

It then follows that

√T(φ̂1 − φ1) ∼a N(0, σ²γ0⁻¹).

⇒ But is this asymptotic normal distribution a reasonable
approximation to the finite-sample distribution of φ̂1 for small T?



Finite-samples and asymptotics

[Figure: Exact small-sample distribution (histogram) and asymptotic
normal distribution of φ̂1 as given in (9), for series generated
according to (8) with φ1 = 0.5 and T = 50.]


Finite-samples and asymptotics

The exact small-sample distribution can be obtained by means of
Monte Carlo simulation:

1. Obtain independent drawings ε∗t , t = 1, . . . , T, from a normal
distribution.
2. Construct an artificial time series y∗t from

y∗t = φ1^(0) y∗t−1 + ε∗t , t = 1, . . . , T,

where φ1^(0) is the value of the AR parameter of interest.
3. Estimate the AR(1) model y∗t = φ1 y∗t−1 + εt with the artificial
sample to obtain an estimate of the AR(1) parameter.
4. Repeat steps 1-3 a large number B of times to obtain estimates
φ̂1^(1), . . . , φ̂1^(B).
⇒ These form an estimate of the distribution of φ̂1.
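
A minimal sketch of this Monte Carlo experiment, with φ1^(0) = 0.5 and T = 50 as in the figure above; the function name and seed are illustrative.

```python
import numpy as np

def mc_ar1(phi0=0.5, T=50, B=10000, seed=0):
    rng = np.random.default_rng(seed)
    estimates = np.empty(B)
    for b in range(B):
        eps = rng.standard_normal(T + 1)      # step 1: normal drawings
        y = np.zeros(T + 1)
        for t in range(1, T + 1):             # step 2: artificial AR(1) series
            y[t] = phi0 * y[t - 1] + eps[t]
        # step 3: OLS estimate of phi_1 (no intercept), as in (9)
        estimates[b] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])
    return estimates                          # step 4: B estimates of phi_1

phi_hats = mc_ar1()
print(phi_hats.mean())   # below 0.5: the small-sample bias visible in the figure
```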
Variable/Model selection

• Selecting the regressors x1,t, . . . , xk,t and AR and MA orders p
and q can be done in many different ways.

For p and q, we may try to use the properties of the empirical
autocorrelations and partial autocorrelations, but often these do
not clearly match the theoretical properties implied by AR and MA
models.

Two alternative methods that are very popular in empirical
applications are
1. model selection criteria based on in-sample fit;
2. out-of-sample forecasting.



Model selection criteria

Our model should describe the variable yt as well as possible. At
the same time, we prefer “small” models.
⇒ We should find a balance between “model fit” and “model
parsimony” (the number of parameters to be estimated). Model
selection criteria aim to achieve this.

The Akaike Information Criterion [AIC] is given by

AIC(k) = T log σ̂² + 2k, (13)

where T denotes the sample size, k is the number of parameters
and σ̂² is the residual variance.

The Schwarz Information Criterion [SIC] is given by

SIC(k) = T log σ̂² + k log T. (14)

⇒ Select ARMA orders p and q that minimize AIC(k) or SIC(k).
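
A sketch of order selection with (13)-(14) for AR(p) models estimated by OLS; the helper names are illustrative, and note that for a strict comparison one would hold the estimation sample fixed across p.

```python
import numpy as np

def fit_ar_ols_resid(y, p):
    """Residuals from an OLS fit of an AR(p) with intercept."""
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - j:len(y) - j] for j in range(1, p + 1)])
    b = np.linalg.lstsq(X, y[p:], rcond=None)[0]
    return y[p:] - X @ b

def ar_order_selection(y, p_max=8):
    for p in range(1, p_max + 1):
        resid = fit_ar_ols_resid(y, p)
        T = len(resid)                 # effective sample shrinks with p
        k = p + 1                      # AR coefficients plus intercept
        sigma2 = resid @ resid / T     # residual variance
        aic = T * np.log(sigma2) + 2 * k            # eq. (13)
        sic = T * np.log(sigma2) + k * np.log(T)    # eq. (14)
        print(p, aic, sic)             # pick the p minimizing AIC or SIC
```

SIC penalizes extra parameters more heavily than AIC (log T > 2 for T > 7), so it tends to select smaller models.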


Misspecification tests and diagnostic measures

How can we evaluate whether an estimated model is adequate?

Several possibilities exist. Many “misspecification tests” aim to
test whether the residuals ε̂t of the ARMA model satisfy the white
noise properties

E[ε²t] = σ²,
E[εtεs] = 0, for all s ≠ t.

1. Test of no residual autocorrelation, based on the empirical
autocorrelations rk(ε̂) = (Σ_{t=k+1}^T ε̂t ε̂t−k)/(Σ_{t=1}^T ε̂²t):

LB(m) = T(T + 2) Σ_{k=1}^m r²k(ε̂)/(T − k) ∼a χ²(m).

⇒ If rejected, the AR or MA order should be adjusted.
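
A sketch of the LB(m) statistic, assuming resid holds the estimated ARMA residuals; scipy is used only for the χ² p-value.

```python
import numpy as np
from scipy import stats

def ljung_box(resid, m):
    T = len(resid)
    denom = resid @ resid
    lb = 0.0
    for k in range(1, m + 1):
        r_k = (resid[k:] @ resid[:-k]) / denom   # empirical autocorrelation r_k
        lb += r_k ** 2 / (T - k)
    lb *= T * (T + 2)
    return lb, 1 - stats.chi2.cdf(lb, df=m)      # compare with chi^2(m)
```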
Misspecification tests and diagnostic measures

Another test for residual autocorrelation is based on the
Lagrange Multiplier [LM] principle.

To test an AR(p) model against AR(p + r) or ARMA(p, r), the LM
test is obtained by estimating the auxiliary regression

ε̂t = α1yt−1 + · · · + αpyt−p + αp+1ε̂t−1 + · · · + αp+r ε̂t−r + vt, (15)


where ε̂t are the estimated residuals of the AR(p) model.

The test statistic is calculated as T·R², and is asymptotically
χ²(r)-distributed.
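
A sketch of this LM test, assuming resid holds the OLS residuals of an AR(p) fit aligned with observations t = p, . . . , T − 1; missing initial residual lags are set to zero, a common convention.

```python
import numpy as np
from scipy import stats

def lm_test(y, resid, p, r):
    """LM test of AR(p) against AR(p+r)/ARMA(p,r) via regression (15)."""
    T = len(resid)                                  # effective sample size
    pad = np.concatenate([np.zeros(r), resid])      # zero-pad initial resid lags
    X = np.column_stack(
        [y[p - j:len(y) - j] for j in range(1, p + 1)] +    # y_{t-1},...,y_{t-p}
        [pad[r - j:r - j + T] for j in range(1, r + 1)])    # eps_{t-1},...,eps_{t-r}
    b = np.linalg.lstsq(X, resid, rcond=None)[0]
    u = resid - X @ b
    r2 = 1.0 - (u @ u) / (resid @ resid)   # uncentered R^2 (resid has ~zero mean)
    lm = T * r2
    return lm, 1 - stats.chi2.cdf(lm, df=r)
```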



Misspecification tests and diagnostic measures

2. Test of homoskedasticity (constant variance), often based
on autocorrelations of squared residuals.

⇒ If rejected, standard errors of parameters should be adjusted,
or heteroskedasticity should be modelled explicitly.

3. Test of normality: Skewness = 0, Kurtosis = 3.

JB = (T/6) SK̂ε̂² + (T/24) (K̂ε̂ − 3)²,

where the skewness and kurtosis of ε̂t can be calculated as

SK̂ε̂ = m̂3/√(m̂2³), and K̂ε̂ = m̂4/m̂2², with m̂j = (1/T) Σ_{t=1}^T (ε̂t)ʲ.

Under the null hypothesis of normality: JB ∼a χ²(2).
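
A sketch of the JB statistic under these definitions, assuming resid holds the (approximately zero-mean) residuals.

```python
import numpy as np
from scipy import stats

def jarque_bera(resid):
    T = len(resid)
    m2, m3, m4 = (np.mean(resid ** j) for j in (2, 3, 4))  # moments m_j
    sk = m3 / np.sqrt(m2 ** 3)          # skewness
    ku = m4 / m2 ** 2                   # kurtosis
    jb = T / 6 * sk ** 2 + T / 24 * (ku - 3) ** 2
    return jb, 1 - stats.chi2.cdf(jb, df=2)
```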



Forecasting

Given a model that has been specified and estimated using
observations y1, y2, . . . , yT , we may generate an h-step ahead
forecast for yT+h. Time T is called the forecast origin, and h is
the forecast horizon.

Three different, but related types of forecasts:

1. A point forecast of yT+h, denoted ŷT+h|T , provides a specific
value for this observation.
2. An interval forecast consists of a lower bound L̂T+h|T and an
upper bound ÛT+h|T such that the interval (L̂T+h|T , ÛT+h|T)
contains the actual value yT+h with a certain probability.
3. A density forecast concerns the conditional distribution of
yT+h, denoted as f(yT+h|YT).



Different types of forecasts

For a given variable of interest, different types of forecasts may be
given:
1. Point forecasts
‘Inflation in the euro area over the next twelve months is expected to be
equal to 2.3 percent.’

2. Interval forecasts
‘Inflation in the euro area over the next twelve months will be between 1.0
and 3.6 percent with probability 0.95.’

3. Density forecasts
‘Inflation in the euro area over the next twelve months is normally
distributed with mean equal to 2.3 percent and standard deviation 0.65.’



Constructing point forecasts

What the optimal h-step ahead point forecast is depends on a
so-called loss function.

General idea: Forecasts are used in decision-making. Forecasts
that differ from the actual values lead to sub-optimal decisions. In
other words, forecast errors lead to a certain ‘loss’. We should use
the point forecast ŷT+h|T that minimizes the expected value of the
loss function.

⇒ The form of the loss function depends on the particular
application, that is, on the variable that we are forecasting.
Forecasting Big Mac sales for McDonald’s is quite different from
forecasting the spread of a coronavirus...



Constructing point forecasts

• In many cases, the relevant loss function is difficult to specify
exactly. We therefore use simple functions that only depend on
the forecast error: eT+h|T = yT+h − ŷT+h|T .

• Most often, we assume that the forecast user has a squared
loss function, that is

LossT+h|T = e²T+h|T . (16)

Minimizing the expected value of (16), or the Mean Squared
Prediction Error [MSPE], we find that the optimal point forecast
is the conditional mean of yT+h, that is

ŷT+h|T = E[yT+h|YT ]. (17)



Point forecasts: AR(1) model

Consider the AR(1) model

yt = φ1yt−1 + εt, (18)

with E[εt|Yt−1] = 0 and E[ε²t|Yt−1] = σ². Assume the value of φ1
is known.

At time T , the value of εT+1 is yet unknown, and since
E[εT+1|YT ] = 0, the optimal point forecast of yT+1 equals

ŷT+1|T = E[yT+1|YT ]
       = E[φ1yT + εT+1|YT ]
       = φ1yT . (19)



Point forecasts: AR(1) model

The one-step ahead forecast error eT+1|T is equal to the shock
occurring at t = T + 1:

eT+1|T = yT+1 − ŷT+1|T = yT+1 − φ1yT = εT+1. (20)

Hence, the variance of the forecast error V[eT+1|T ] is equal to σ²,
which is the variance of εt and also the conditional variance
V[yT+1|YT ].



Point forecasts: AR(1) model

For two steps ahead, we obtain

ŷT+2|T = E[φ1yT+1 + εT+2|YT ]
       = φ1E[yT+1|YT ]
       = φ1ŷT+1|T
       = φ1² yT . (21)

Since the actual value yT+2 is given by

yT+2 = φ1yT+1 + εT+2
     = φ1(φ1yT + εT+1) + εT+2, (22)

it holds that the two-step ahead forecast error is equal to

eT+2|T ≡ yT+2 − ŷT+2|T = εT+2 + φ1εT+1,

with variance V[eT+2|T ] = (1 + φ1²)σ², which shows that
V[eT+2|T ] > V[eT+1|T ].
Point forecasts: AR(1) model

For three steps ahead, we would have

ŷT+3|T = E[φ1yT+2 + εT+3|YT ]
       = φ1ŷT+2|T
       = φ1³ yT . (23)

Since the actual value yT+3 is given by

yT+3 = φ1yT+2 + εT+3
     = φ1(φ1(φ1yT + εT+1) + εT+2) + εT+3, (24)

it holds that the three-step ahead forecast error is equal to

eT+3|T ≡ yT+3 − ŷT+3|T = εT+3 + φ1εT+2 + φ1²εT+1,

with variance V[eT+3|T ] = (1 + φ1² + φ1⁴)σ², which shows that
V[eT+3|T ] > V[eT+2|T ].
Point forecasts: AR(1) model

This can be generalized to the following results (see the sketch
after this list):

1. The h-step ahead forecast can be computed either directly as
ŷT+h|T = φ1ʰ yT , or recursively as ŷT+h|T = φ1ŷT+h−1|T .
2. As h becomes larger, the h-step ahead forecast ŷT+h|T
converges to 0, which is the unconditional mean of yT+h.
3. The h-step ahead forecast error eT+h|T is an MA(h−1) process,
namely a linear combination of the shocks that occur between
time T and T + h, that is, εT+1, εT+2, . . . , εT+h.
4. The variance of the forecast errors is increasing in the horizon
h, that is, V[eT+h|T ] > V[eT+h−1|T ] for all h > 1.
5. As h becomes larger, V[eT+h|T ] converges to σ²/(1 − φ1²),
which is the unconditional variance of yT+h.

⇒ Similar results hold for general ARMA(p,q) models.
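
A sketch of results 1 and 4: recursive AR(1) point forecasts and forecast-error variances; the function name and parameter values are illustrative.

```python
import numpy as np

def ar1_forecasts(y_T, phi1, sigma2, h_max):
    """Recursive h-step forecasts and forecast-error variances, h = 1..h_max."""
    forecasts, variances = [], []
    y_hat, var = y_T, 0.0
    for h in range(1, h_max + 1):
        y_hat = phi1 * y_hat                    # result 1: recursion
        var += phi1 ** (2 * (h - 1)) * sigma2   # adds phi1^(2(h-1)) * sigma^2
        forecasts.append(y_hat)
        variances.append(var)
    return np.array(forecasts), np.array(variances)

fc, var = ar1_forecasts(y_T=2.0, phi1=0.8, sigma2=1.0, h_max=20)
# fc decays to 0 (result 2); var rises towards 1/(1 - 0.8**2) = 2.78 (result 5)
```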


Point forecasts: convergence to the unconditional mean

Consider the AR(1) model with intercept

yt = δ + φ1yt−1 + εt, (25)

with E[εt|Yt−1] = 0 and E[ε²t|Yt−1] = σ².

At time T , the optimal point forecast of yT+1 equals

ŷT+1|T = E[yT+1|YT ]
       = E[δ + φ1yT + εT+1|YT ]
       = δ + φ1yT . (26)

For two steps ahead, we obtain

ŷT+2|T = E[δ + φ1yT+1 + εT+2|YT ]
       = δ + φ1E[yT+1|YT ] = δ + φ1ŷT+1|T
       = δ(1 + φ1) + φ1² yT . (27)



Point forecasts: convergence to the unconditional mean

For three steps ahead, we obtain

ŷT+3|T = E[δ + φ1yT+2 + εT+3|YT ]
       = δ + φ1ŷT+2|T
       = δ(1 + φ1 + φ1²) + φ1³ yT . (28)

And, in general, the h-step ahead point forecast is given by

ŷT+h|T = E[δ + φ1yT+h−1 + εT+h|YT ]
       = δ + φ1ŷT+h−1|T
       = δ(1 + φ1 + · · · + φ1ʰ⁻¹) + φ1ʰ yT . (29)

As h → ∞ (and assuming |φ1| < 1), this converges to

ŷT+h|T → δ(1 + φ1 + φ1² + φ1³ + φ1⁴ + . . .) = δ/(1 − φ1) = E[yt].



Point forecasts: convergence to the unconditional mean

This convergence of point forecasts as the forecast horizon
increases is even easier to see by rewriting the AR(1) model as

yt − µ = φ1(yt−1 − µ) + εt, (30)

with µ = δ/(1 − φ1) denoting the unconditional mean of yt (again,
assuming |φ1| < 1).

Using this representation, we find

ŷT+1|T = µ + φ1(yT − µ),
ŷT+2|T = µ + φ1²(yT − µ),
ŷT+3|T = µ + φ1³(yT − µ),

and in general, for any h ≥ 1,

ŷT+h|T = µ + φ1ʰ(yT − µ).



Initial claims for unemployment insurance

[Figure: h-step ahead AR(2) forecasts for quarterly US initial
claims, 1970-2015.]


Point forecasts: effect of estimation uncertainty

In practice, the true value(s) of the model parameter(s) are
unknown. Instead, we have to use estimated parameters. For the
AR(1) model, the ‘feasible’ 1-step ahead forecast is given by

ŷT+1|T = φ̂1yT .

This has consequences for the uncertainty/precision of the
forecast. Assuming that the data generating process (DGP) is an
AR(1) process yt = φ1yt−1 + εt, the 1-step ahead forecast error
eT+1|T is equal to

eT+1|T = yT+1 − ŷT+1|T
       = φ1yT + εT+1 − φ̂1yT
       = εT+1 + (φ1 − φ̂1)yT .



Point forecasts: effect of estimation uncertainty

This (mostly) affects the variance of the forecast error V[eT+1|T ].

Using the earlier result that √T(φ̂1 − φ1) ∼a N(0, σ²γ0⁻¹), we find

V[eT+1|T ] = σ²(1 + 1/T).

Hence, for large T , estimation uncertainty is not important, but for
small/moderate T it can have a substantial impact.

Note that this also provides a strong argument in favor of
parsimonious forecasting models, in the sense that for a model
with k parameters it holds that

V[eT+1|T ] ≈ σ²(1 + k/T).



Initial claims for unemployment insurance

[Figure: h-step ahead AR(2) forecasts for quarterly US initial
claims, 1970-2015, with and without estimation uncertainty.]


Point forecasts: effect of model misspecification

In practice, the true data generating process (DGP) is unknown.
The model that we use for forecasting might be misspecified,
which also affects the properties of point forecasts and the
associated forecast errors.

Assume that the DGP of a time series yt is the AR(2) process

yt = φ1yt−1 + φ2yt−2 + εt, (31)

with E[εt|Yt−1] = 0 and E[ε²t|Yt−1] = σ².

And assume that for making forecasts we use an AR(1) model

yt = φ̂1yt−1 + ηt, (32)

where we assume that the ‘shocks’ ηt have the usual white noise
properties, in particular mean zero.
Point forecasts: effect of model misspecification

In this case, the 1-step ahead forecast at time T is again given by

ŷT+1|T = φ̂1yT .

For the corresponding forecast error, we now find [using the AR(2)
DGP for the actual value yT+1!]

eT+1|T = yT+1 − ŷT+1|T
       = φ1yT + φ2yT−1 + εT+1 − φ̂1yT
       = εT+1 + (φ1 − φ̂1)yT + φ2yT−1.

Obviously, eT+1|T is different from the true shock εT+1 (even if
φ̂1 = φ1), and may have different properties. For example, the
one-step ahead forecast errors will exhibit non-zero
autocorrelations, which they should not.
Evaluating point forecasts

• Evaluation of forecasts is of crucial importance, for obvious
reasons. Why would you use a model (or an expert, or another
source) for forecasting if its forecasts are not of sufficient quality?

• Also, forecast evaluation is a helpful tool for selecting among
competing forecasts (from different models or other sources).

⇒ Forecast evaluation can be done in two ways:

1. ‘Absolute’: what is the quality of forecasts from one specific
model?
2. ‘Relative’: what is the quality of forecasts from multiple
competing models, relative to each other?



Evaluating point forecasts

• How to evaluate a given set of P 1-step ahead point forecasts
ŷt+1|t for t = T, . . . , T + P − 1?

• How to compare the quality of forecasts from competing
models? (This is an important alternative method for model
selection!)

⇒ Most forecast evaluation techniques focus on properties of the
forecast errors

et+1|t = yt+1 − ŷt+1|t.

(Throughout we assume that the point forecast is the
conditional mean, that is, ŷt+1|t = E[yt+1|Yt].)



Evaluating point forecasts

Desirable properties of point forecasts:

1. Unbiasedness: forecast errors have zero mean (E[et+1|t] = 0).

⇒ Straightforward to examine, by testing whether the sample
mean of the forecast errors differs significantly from zero.

2. Accuracy: the MSPE should be as small as possible.

⇒ Recall that the point forecast ŷt+1|t (usually) is taken to be
the one which minimizes E[e²t+1|t] = E[(yt+1 − ŷt+1|t)²].

Note that the MSPE can be decomposed as

E[(yt+1 − ŷt+1|t)²] = V[yt+1 − ŷt+1|t] + (E[yt+1 − ŷt+1|t])²
                    = V[et+1|t] + (E[et+1|t])²
                    = variance + squared bias.


Evaluating point forecasts

Given P one-step ahead forecasts, we may compute / estimate the
MSPE as

(1/P) Σ_{t=T}^{T+P−1} (yt+1 − ŷt+1|t)².

Note that the value of an MSPE by itself is difficult to interpret,
as it depends on the scale of the time series.

⇒ Compare the MSPE to the variance of the residuals from the
model which is used to construct the forecasts. Recall that, under
correct model specification, et+1|t = εt+1.

Or, compare the MSPE with the unconditional variance of yt.



Evaluating point forecasts

In addition, we may consider other, similar measures of forecast
accuracy, such as the Mean Absolute Error (MAE)

(1/P) Σ_{t=T}^{T+P−1} |yt+1 − ŷt+1|t|,

or the Mean Percentage Squared Error (MPSE)

(1/P) Σ_{t=T}^{T+P−1} ((yt+1 − ŷt+1|t)/yt+1)².

⇒ Each of these criteria may provide useful information.
[but they can also lead to conflicting results when comparing models]
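
A sketch of the three measures, assuming arrays actual (holding yt+1) and forecast (holding ŷt+1|t) of length P; the function name is illustrative.

```python
import numpy as np

def accuracy_measures(actual, forecast):
    e = actual - forecast                 # forecast errors e_{t+1|t}
    mspe = np.mean(e ** 2)                # mean squared prediction error
    mae = np.mean(np.abs(e))              # mean absolute error
    mpse = np.mean((e / actual) ** 2)     # mean percentage squared error
    return mspe, mae, mpse
```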



Evaluating point forecasts

Desirable properties of forecasts:

3. Efficiency / Optimality: it should not be possible to forecast
the forecast error itself with any information available at time t.

⇒ This implies that in a regression such as

et+1|t = α′xt + ηt+1,

where xt is a vector of variables known at time t, the
coefficients α should be equal to zero.

In particular, consider the Mincer-Zarnowitz regression

et+1|t = α0 + α1ŷt+1|t + ηt+1,

or

yt+1 = β0 + β1ŷt+1|t + ηt+1, (33)

for which it should hold that β0 = 0 and β1 = 1.
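
A sketch of the Mincer-Zarnowitz regression (33) by OLS, under the same assumptions on the actual and forecast arrays as above.

```python
import numpy as np

def mincer_zarnowitz(actual, forecast):
    """Regression y_{t+1} = beta0 + beta1 * yhat_{t+1|t} + eta; want (0, 1)."""
    X = np.column_stack([np.ones(len(forecast)), forecast])
    beta = np.linalg.lstsq(X, actual, rcond=None)[0]
    return beta   # compare beta[0] with 0 and beta[1] with 1 (e.g. via a Wald test)
```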



Comparing predictive accuracy

• Obviously, models with smaller MSPE (MAE, MPSE) are
“better”. But how can we determine whether differences in
MSPE are statistically significant?
⇒ Formal comparison of MSPEs is possible with so-called
Diebold-Mariano tests.

Let ŷi,t+1|t and ŷj,t+1|t denote two competing 1-step ahead
forecasts of yt+1, obtained from models i and j, respectively, with
corresponding forecast errors ei,t+1|t = yt+1 − ŷi,t+1|t.

Assume that the squared forecast error e²i,t+1|t is the relevant
“loss function”. Define the “loss differential” as
dt+1 = e²i,t+1|t − e²j,t+1|t.

⇒ Equal forecast accuracy implies E[dt+1] = 0.



Comparing predictive accuracy

⇒ We can test the null hypothesis of equal forecast accuracy by
testing whether the sample mean of dt+1 differs significantly
from zero.

Given a set of P 1-step ahead forecasts for t = T, . . . , T + P − 1,
the Diebold-Mariano statistic is given by

DM = d̄ / √(V̂[dt+1]/P) ∼a N(0, 1), (34)

where d̄ is the sample mean of dt+1 and V̂[dt+1] is an estimate of
the variance of dt+1, which can be computed as

V̂[dt+1] = (1/(P − 1)) Σ_{t=T}^{T+P−1} (dt+1 − d̄)².
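
A sketch of the DM test under squared-error loss, assuming e_i and e_j hold the two competing sets of 1-step ahead forecast errors.

```python
import numpy as np
from scipy import stats

def diebold_mariano(e_i, e_j):
    d = e_i ** 2 - e_j ** 2                   # loss differentials d_{t+1}
    P = len(d)
    d_bar = d.mean()
    var_d = np.sum((d - d_bar) ** 2) / (P - 1)
    dm = d_bar / np.sqrt(var_d / P)           # eq. (34)
    return dm, 2 * (1 - stats.norm.cdf(abs(dm)))   # two-sided p-value
```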



Constructing density forecasts

Recall that by definition

yT+h = ŷT+h|T + eT+h|T , (35)

where (assuming a quadratic loss function) ŷT+h|T = E[yT+h|YT ].

⇒ The forecast error eT+h|T is a random variable with a certain
distribution.
It follows that the conditional distribution of yT+h at time T , or
the density forecast, is equal to the distribution of eT+h|T , except
with a different mean, ŷT+h|T instead of zero.

• As shown before, eT+h|T is in general a linear combination of
εT+1, . . . , εT+h. If we assume normality of εt, it follows that

yT+h|YT ∼ N(ŷT+h|T , V[eT+h|T ]).



Constructing interval forecasts

• From the density forecast f(yT+h|YT ) we can easily obtain
interval forecasts.

⇒ An interval forecast consists of a lower bound L̂T+h|T and an
upper bound ÛT+h|T such that the interval (L̂T+h|T , ÛT+h|T)
contains the actual value yT+h with a certain pre-specified
probability.

• Many intervals satisfy this property. It is common to consider a
symmetric interval around the conditional mean of yT+h.

For example, in case f(yT+h|YT ) = N(ŷT+h|T , V[eT+h|T ]), a 95
percent interval forecast for yT+h is given by

(L̂T+h|T , ÛT+h|T) = (ŷT+h|T − 1.96√V[eT+h|T ], ŷT+h|T + 1.96√V[eT+h|T ]).
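
A sketch of such a 95 percent interval forecast for the AR(1) model, treating φ1, σ², and yT as known; the values are illustrative.

```python
import numpy as np

phi1, sigma2, y_T = 0.8, 1.0, 2.0
h = np.arange(1, 13)
point = phi1 ** h * y_T                                    # y_hat_{T+h|T}
var_e = sigma2 * (1 - phi1 ** (2 * h)) / (1 - phi1 ** 2)   # V[e_{T+h|T}]
lower = point - 1.96 * np.sqrt(var_e)
upper = point + 1.96 * np.sqrt(var_e)
# The bands widen with h and approach 0 ± 1.96 * sqrt(sigma2 / (1 - phi1**2)).
```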



Interval and density forecasts

• Forecasters have traditionally focused primarily on point
forecasts. Only recently has more attention been given to
describing the uncertainty surrounding such point forecasts, in the
form of interval forecasts and density forecasts.

Possible explanations include

1. There was only limited demand for interval and density forecasts. With the
boom in financial risk management, among others, this demand has
increased tremendously.
2. No methods were available for the evaluation of interval and density
forecasts.
3. Interval and density forecasts traditionally were constructed analytically,
which requires strong and sometimes dubious assumptions such as normally
distributed shocks. Nowadays, interval and density forecasts can be
constructed with simulation techniques which avoid such assumptions.

