
Forecasting

Francis X. Diebold
University of Pennsylvania

August 11, 2015


1 / 323

Copyright

© 2013 onward, by Francis X. Diebold.

These materials are freely available for your use, but be warned:
they are highly preliminary, significantly incomplete, and rapidly
evolving. All are licensed under the Creative Commons
Attribution-NonCommercial-NoDerivatives 4.0 International
License. (Briefly: I retain copyright, but you can use, copy and
distribute non-commercially, so long as you give me attribution and
do not modify. To view a copy of the license, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/.) In return I
ask that you please cite the books whenever appropriate, as:
Diebold, F.X. (year here), Book Title Here, Department of
Economics, University of Pennsylvania,
http://www.ssc.upenn.edu/~fdiebold/Textbooks.html.

The painting is Enigma, by Glen Josselsohn, from Wikimedia
Commons.
2 / 323

Elements of Forecasting in Business, Finance, Economics
and Government

1. Forecasting in Action
   1.1 Operations planning and control
   1.2 Marketing
   1.3 Economics
   1.4 Financial speculation
   1.5 Financial risk management
   1.6 Capacity planning
   1.7 Business and government budgeting
   1.8 Demography
   1.9 Crisis management
3 / 323

Forecasting Methods: An Overview


Review of probability, statistics and regression
Six Considerations Basic to Successful Forecasting
1. Forecasts and decisions
2. The object to be forecast
3. Forecast types
4. The forecast horizon
5. The information set
6. Methods and complexity
6.1 The parsimony principle
6.2 The shrinkage principle

4 / 323

Statistical Graphics for Forecasting

Why graphical analysis is important

Simple graphical techniques

Elements of graphical style

Application: graphing four components of real GNP

5 / 323

Modeling and Forecasting Trend

Modeling trend

Estimating trend models

Forecasting trend

Selecting forecasting models using the Akaike and Schwarz criteria

Application: forecasting retail sales

6 / 323

Modeling and Forecasting Seasonality

The nature and sources of seasonality

Modeling seasonality

Forecasting seasonal series

Application: forecasting housing starts

7 / 323

Characterizing Cycles

Covariance stationary time series

White noise

The lag operator

Wold's theorem, the general linear process, and rational
distributed lags

Estimation and inference for the mean, autocorrelation and
partial autocorrelation functions

Application: characterizing Canadian employment dynamics

8 / 323

Modeling Cycles: MA, AR and ARMA Models

Moving-average (MA) models

Autoregressive (AR) models

Autoregressive moving average (ARMA) models

Application: specifying and estimating models for forecasting
employment

9 / 323

Forecasting Cycles

Optimal forecasts

Forecasting moving average processes

Forecasting infinite-ordered moving averages

Making the forecasts operational

The chain rule of forecasting

Application: forecasting employment

10 / 323

Putting it all Together: A Forecasting Model with Trend,
Seasonal and Cyclical Components

Assembling what we've learned

Application: forecasting liquor sales

Recursive estimation procedures for diagnosing and selecting
forecasting models

11 / 323

Forecasting with Regression Models

Conditional forecasting models and scenario analysis

Accounting for parameter uncertainty in confidence intervals
for conditional forecasts

Unconditional forecasting models

Distributed lags, polynomial distributed lags, and rational
distributed lags

Regressions with lagged dependent variables, regressions with
ARMA disturbances, and transfer function models

Vector autoregressions

Predictive causality

Impulse-response functions and variance decomposition

Application: housing starts and completions

12 / 323

Evaluating and Combining Forecasts

Evaluating a single forecast

Evaluating two or more forecasts: comparing forecast accuracy

Forecast encompassing and forecast combination

Application: OverSea shipping volume on the Atlantic East
trade lane

13 / 323

Unit Roots, Stochastic Trends, ARIMA Forecasting
Models, and Smoothing

Stochastic trends and forecasting

Unit roots: estimation and testing

Application: modeling and forecasting the yen/dollar
exchange rate

Smoothing

Exchange rates, continued

14 / 323

Volatility Measurement, Modeling and Forecasting

The basic ARCH process

The GARCH process

Extensions of ARCH and GARCH models

Estimating, forecasting and diagnosing GARCH models

Application: stock market volatility

15 / 323

Useful Books, Journals and Software


Books

Statistics review, etc.:

Wonnacott, T.H. and Wonnacott, R.J. (1990), Introductory
Statistics, Fifth Edition. New York: John Wiley and Sons.

Pindyck, R.S. and Rubinfeld, D.L. (1997), Econometric Models
and Economic Forecasts, Fourth Edition. New York:
McGraw-Hill.

Maddala, G.S. (2001), Introduction to Econometrics, Third
Edition. New York: Macmillan.

Kennedy, P. (1998), A Guide to Econometrics, Fourth Edition.
Cambridge, Mass.: MIT Press.

16 / 323

Useful Books, Journals and Software cont.

Time series analysis:

Chatfield, C. (1996), The Analysis of Time Series: An
Introduction, Fifth Edition. London: Chapman and Hall.

Granger, C.W.J. and Newbold, P. (1986), Forecasting
Economic Time Series, Second Edition. Orlando, Florida:
Academic Press.

Harvey, A.C. (1993), Time Series Models, Second Edition.
Cambridge, Mass.: MIT Press.

Hamilton, J.D. (1994), Time Series Analysis. Princeton:
Princeton University Press.

17 / 323

Useful Books, Journals and Software cont.

Special insights:

Armstrong, J.S. (Ed.) (1999), The Principles of Forecasting.
Norwell, Mass.: Kluwer Academic Publishers.

Makridakis, S. and Wheelwright, S.C. (1997), Forecasting:
Methods and Applications, Third Edition. New York: John
Wiley.

Bails, D.G. and Peppers, L.C. (1997), Business Fluctuations.
Englewood Cliffs: Prentice Hall.

Taylor, S. (1996), Modeling Financial Time Series, Second
Edition. New York: Wiley.

18 / 323

Useful Books, Journals and Software cont.

Journals
I

Journal of Forecasting

Journal of Business Forecasting Methods and Systems

Journal of Business and Economic Statistics

Review of Economics and Statistics

Journal of Applied Econometrics

19 / 323

Useful Books, Journals and Software cont.


Software

General:
  Eviews
  S+
  Minitab
  SAS
  R
  Python
  Many more...

Cross-section:
  Stata

Open-ended:
  Matlab

20 / 323

Useful Books, Journals and Software cont.

Online Information
I

Resources for Economists:

21 / 323

A Brief Review of Probability, Statistics, and Regression


for Forecasting
Topics
I Discrete Random Variable
I Discrete Probability Distribution
I Continuous Random Variable
I Probability Density Function
I Moment
I Mean, or Expected Value
I Location, or Central Tendency
I Variance
I Dispersion, or Scale
I Standard Deviation
I Skewness
I Asymmetry
I Kurtosis
I Leptokurtosis
22 / 323

A Brief Review of Probability, Statistics, and Regression


for Forecasting
Topics cont.

Skewness
Asymmetry
Kurtosis
Leptokurtosis
Normal, or Gaussian, Distribution
Marginal Distribution
Joint Distribution
Covariance
Correlation
Conditional Distribution
Conditional Moment
Conditional Mean
Conditional Variance
23 / 323

A Brief Review of Probability, Statistics, and Regression


for Forecasting cont.
Topics cont.

Population Distribution
Sample
Estimator
Statistic, or Sample Statistic
Sample Mean
Sample Variance
Sample Standard Deviation
Sample Skewness
Sample Kurtosis
Chi-Squared (χ²) Distribution
t Distribution
F Distribution
Jarque-Bera Test
24 / 323

Regression as Curve Fitting


Least-squares estimation:

\min_{\beta_0, \beta_1} \sum_{t=1}^{T} [y_t - \beta_0 - \beta_1 x_t]^2

Fitted values:

\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_t

Residuals:

e_t = y_t - \hat{y}_t

25 / 323
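A minimal sketch (not part of the original slides) of the least-squares fit, fitted values, and residuals above, using only NumPy; the data are simulated purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 1.0 + 0.5 * x + rng.normal(size=T)              # hypothetical data-generating process

X = np.column_stack([np.ones(T), x])                # regressors: constant and x
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]     # minimizes the sum of squared errors
y_hat = X @ beta_hat                                # fitted values
e = y - y_hat                                       # residuals
print(beta_hat, e @ e)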

Regression as a probabilistic model


Simple regression:

y_t = \beta_0 + \beta_1 x_t + \varepsilon_t,   \varepsilon_t \sim iid(0, \sigma^2)

Multiple regression:

y_t = \beta_0 + \beta_1 x_t + \beta_2 z_t + \varepsilon_t,   \varepsilon_t \sim iid(0, \sigma^2)

26 / 323

Regression as a probabilistic model cont.


Mean dependent var 10.23

\bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t

S.D. dependent var 1.49

SD = \sqrt{ \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T - 1} }

Sum squared resid 43.70

SSR = \sum_{t=1}^{T} e_t^2

27 / 323

Regression as a probabilistic model cont.


F-statistic 30.89

F = \frac{(SSR_{res} - SSR)/(k - 1)}{SSR/(T - k)}

S.E. of regression 0.99

s^2 = \frac{\sum_{t=1}^{T} e_t^2}{T - k}

SER = \sqrt{s^2} = \sqrt{ \frac{\sum_{t=1}^{T} e_t^2}{T - k} }

28 / 323

Regression as a probabilistic model cont.


R-squared 0.58

R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

or

R^2 = 1 - \frac{\frac{1}{T}\sum_{t=1}^{T} e_t^2}{\frac{1}{T}\sum_{t=1}^{T} (y_t - \bar{y})^2}

Adjusted R-squared 0.56

\bar{R}^2 = 1 - \frac{\frac{1}{T-k}\sum_{t=1}^{T} e_t^2}{\frac{1}{T-1}\sum_{t=1}^{T} (y_t - \bar{y})^2}

29 / 323

Regression as a probabilistic model cont.

Akaike info criterion 0.03

AIC = e^{\left(\frac{2k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}

Schwarz criterion 0.15

SIC = T^{\left(\frac{k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}

30 / 323

Regression as a probabilistic model cont.


Durbin-Watson stat 1.97

y_t = \beta_0 + \beta_1 x_t + \varepsilon_t
\varepsilon_t = \phi \varepsilon_{t-1} + v_t,   v_t \sim iid\, N(0, \sigma^2)

DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}

31 / 323
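A sketch (not from the slides) of the regression summary statistics defined on the preceding slides, computed from an OLS fit with NumPy; the data and regressors are invented, and AIC/SIC follow the e^(2k/T)·SSR/T and T^(k/T)·SSR/T forms above.

import numpy as np

rng = np.random.default_rng(1)
T, k = 120, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=T)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
ybar = y.mean()
SSR = e @ e
s2 = SSR / (T - k)                                   # s^2
SER = np.sqrt(s2)                                    # standard error of the regression
R2 = 1 - SSR / np.sum((y - ybar) ** 2)
R2bar = 1 - (SSR / (T - k)) / (np.sum((y - ybar) ** 2) / (T - 1))
AIC = np.exp(2 * k / T) * SSR / T
SIC = T ** (k / T) * SSR / T
DW = np.sum(np.diff(e) ** 2) / SSR                   # Durbin-Watson
SSR_res = np.sum((y - ybar) ** 2)                    # restricted SSR (intercept only)
F = ((SSR_res - SSR) / (k - 1)) / (SSR / (T - k))
print(R2, R2bar, AIC, SIC, DW, F)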

Regression of y on x and z

32 / 323

Scatterplot of y versus x

33 / 323

Scatterplot of y versus x Regression Line Superimposed

34 / 323

Scatterplot of y versus z Regression Line Superimposed

35 / 323

Residual Plot Regression of y on x and z

36 / 323

Six Considerations Basic to Successful Forecasting

1. The Decision Environment and Loss Function

L(e) = e^2
L(e) = |e|

2. The Forecast Object

Event outcome, event timing, time series.

3. The Forecast Statement

Point forecast, interval forecast, density forecast, probability
forecast

37 / 323

Six Considerations Basic to Successful Forecasting cont.


4. The Forecast Horizon

h-step-ahead forecast
h-step-ahead extrapolation forecast

5. The Information Set

\Omega_T^{univariate} = \{y_T, y_{T-1}, ..., y_1\}
\Omega_T^{multivariate} = \{y_T, x_T, y_{T-1}, x_{T-1}, ..., y_1, x_1\}

6. Methods and Complexity, the Parsimony Principle, and the
Shrinkage Principle

Signal vs. noise
Smaller is often better
Even incorrect restrictions can help

38 / 323

Six Considerations Basic to Successful Forecasting cont.

Decision Making with Symmetric Loss

                      Demand High    Demand Low
Build Inventory       0              $10,000
Reduce Inventory      $10,000        0

Decision Making with Asymmetric Loss

                      Demand High    Demand Low
Build Inventory       0              $10,000
Reduce Inventory      $20,000        0

39 / 323

Six Considerations Basic to Successful Forecasting cont.

Forecasting with Symmetric Loss

                         High Actual Sales    Low Actual Sales
High Forecasted Sales    0                    $10,000
Low Forecasted Sales     $10,000              0

Forecasting with Asymmetric Loss

                         High Actual Sales    Low Actual Sales
High Forecasted Sales    0                    $10,000
Low Forecasted Sales     $20,000              0

40 / 323

Quadratic Loss

41 / 323

Absolute Loss

42 / 323

Asymmetric Loss

43 / 323

Forecast Statement

44 / 323

Forecast Statement cont.

45 / 323

Extrapolation Forecast

46 / 323

Extrapolation Forecast cont.

47 / 323

Statistical Graphics For Forecasting


1. Why Graphical Analysis is Important

Graphics helps us summarize and reveal patterns in data
Graphics helps us identify anomalies in data
Graphics facilitates and encourages comparison of different
pieces of data
Graphics enables us to present a huge amount of data in a
small space, and it enables us to make huge data sets coherent

2. Simple Graphical Techniques

Univariate, multivariate
Time series vs. distributional shape
Relational graphics

3. Elements of Graphical Style

Know your audience, and know your goals.
Show the data, and appeal to the viewer.
Revise and edit, again and again.

4. Application: Graphing Four Components of Real GNP


48 / 323

Anscombe's Quartet

    (1)            (2)            (3)            (4)
  x1     y1      x2     y2      x3     y3      x4
 10.0   8.04    10.0   9.14    10.0   7.46     8.0
  8.0   6.95     8.0   8.14     8.0   6.77     8.0
 13.0   7.58    13.0   8.74    13.0  12.74     8.0
  9.0   8.81     9.0   8.77     9.0   7.11     8.0
 11.0   8.33    11.0   9.26    11.0   7.81     8.0
 14.0   9.96    14.0   8.10    14.0   8.84     8.0
  6.0   7.24     6.0   6.13     6.0   6.08     8.0
  4.0   4.26     4.0   3.10     4.0   5.39    19.0
 12.0  10.84    12.0   9.13    12.0   8.15     8.0
  7.0   4.82     7.0   7.26     7.0   6.42     8.0
  5.0   5.68     5.0   4.74     5.0   5.73     8.0
49 / 323

Anscombe's Quartet

50 / 323

Anscombe's Quartet Bivariate Scatterplot

51 / 323

1-Year Treasury Bond Rates

52 / 323

Change in 1-Year Treasury Bond Rates

53 / 323

Liquor Sales

54 / 323

Histogram and Descriptive Statistics Change in 1-Year


Treasury Bond Rates

55 / 323

Scatterplot 1-Year vs. 10-year Treasury Bond Rates

56 / 323

Scatterplot Matrix 1-, 10-, 20-, and 30-Year Treasury


Bond Rates

57 / 323

Time Series Plot Aspect Ratio 1:1.6

58 / 323

Time Series Plot Banked to 45 Degrees

59 / 323

Time Series Plot Aspect Ratio 1:1.6

60 / 323

Graph

61 / 323

Graph

62 / 323

Graph

63 / 323

Graph

64 / 323

Graph

65 / 323

Components of Real GDP (Millions of Current Dollars, Annual)

66 / 323

Modeling and Forecasting Trend


1. Modeling Trend

T_t = \beta_0 + \beta_1 TIME_t

T_t = \beta_0 + \beta_1 TIME_t + \beta_2 TIME_t^2

T_t = \beta_0 e^{\beta_1 TIME_t}
\ln(T_t) = \ln(\beta_0) + \beta_1 TIME_t

67 / 323

Modeling and Forecasting Trend


2. Estimating Trend Models

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{T} [y_t - \beta_0 - \beta_1 TIME_t]^2

(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2) = \arg\min_{\beta_0, \beta_1, \beta_2} \sum_{t=1}^{T} [y_t - \beta_0 - \beta_1 TIME_t - \beta_2 TIME_t^2]^2

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{T} [y_t - \beta_0 e^{\beta_1 TIME_t}]^2

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{T} [\ln y_t - \ln \beta_0 - \beta_1 TIME_t]^2

68 / 323
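A sketch (not from the slides) of estimating the linear, quadratic, and log-linear trend models above by least squares with NumPy; the series is simulated and TIME is simply 1, 2, ..., T.

import numpy as np

rng = np.random.default_rng(2)
T = 200
time = np.arange(1, T + 1, dtype=float)
y = np.exp(0.5 + 0.01 * time) * np.exp(rng.normal(scale=0.05, size=T))  # hypothetical trending series

def ols(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, y - X @ beta

# Linear trend: y_t = beta0 + beta1*TIME_t
b_lin, e_lin = ols(np.column_stack([np.ones(T), time]), y)
# Quadratic trend: y_t = beta0 + beta1*TIME_t + beta2*TIME_t^2
b_quad, e_quad = ols(np.column_stack([np.ones(T), time, time ** 2]), y)
# Log-linear (exponential) trend: ln y_t = ln(beta0) + beta1*TIME_t
b_log, e_log = ols(np.column_stack([np.ones(T), time]), np.log(y))
print(b_lin, b_quad, b_log)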

Modeling and Forecasting Trend


3. Forecasting Trend

y_t = \beta_0 + \beta_1 TIME_t + \varepsilon_t

y_{T+h} = \beta_0 + \beta_1 TIME_{T+h} + \varepsilon_{T+h}

y_{T+h,T} = \beta_0 + \beta_1 TIME_{T+h}

\hat{y}_{T+h,T} = \hat{\beta}_0 + \hat{\beta}_1 TIME_{T+h}

69 / 323

Modeling and Forecasting Trend


3. Forecasting Trend cont.

y_{T+h,T} \pm 1.96\sigma
\hat{y}_{T+h,T} \pm 1.96\hat{\sigma}

N(y_{T+h,T}, \sigma^2)
N(\hat{y}_{T+h,T}, \hat{\sigma}^2)

70 / 323

Modeling and Forecasting Trend

4. Selecting Forecasting Models

MSE = \frac{\sum_{t=1}^{T} e_t^2}{T}

R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

71 / 323

Modeling and Forecasting Trend

4. Selecting Forecasting Models cont.

s^2 = \frac{\sum_{t=1}^{T} e_t^2}{T - k} = \left(\frac{T}{T - k}\right)\frac{\sum_{t=1}^{T} e_t^2}{T}

\bar{R}^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2 / (T - k)}{\sum_{t=1}^{T} (y_t - \bar{y})^2 / (T - 1)} = 1 - \frac{s^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2 / (T - 1)}

72 / 323

Modeling and Forecasting Trend

4. Selecting Forecasting Models cont.

AIC = e^{\left(\frac{2k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}

SIC = T^{\left(\frac{k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}

Consistency

Efficiency

73 / 323
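A sketch (not from the slides) of model selection using the AIC and SIC formulas above; the residual vectors stand in for residuals from fits like the trend regressions sketched earlier and are simulated placeholders here.

import numpy as np

def aic(e, k):
    T = len(e)
    return np.exp(2 * k / T) * (e @ e) / T

def sic(e, k):
    T = len(e)
    return T ** (k / T) * (e @ e) / T

rng = np.random.default_rng(3)
residuals = {"linear": (rng.normal(scale=1.2, size=150), 2),
             "quadratic": (rng.normal(scale=1.0, size=150), 3)}
for name, (e, k) in residuals.items():
    print(name, aic(e, k), sic(e, k))
# Select the model with the smallest criterion value; SIC penalizes extra parameters more heavily.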

Labor Force Participation Rate

74 / 323

Increasing and Decreasing Labor Trends

75 / 323

Labor Force Participation Rate

76 / 323

Linear Trend Female Labor Force Participation Rate

77 / 323

Linear Trend Male Labor Force Participation Rate

78 / 323

Volume on the New York Stock Exchange

79 / 323

Various Shapes of Quadratic Trends

80 / 323

Quadratic Trend Volume on the New York Stock


Exchange

81 / 323

Log Volume on the New York Stock Exchange

82 / 323

Various Shapes of Exponential Trends

83 / 323

Linear Trend Log Volume on the New York Stock


Exchange

84 / 323

Exponential Trend Volume on the New York Stock


Exchange

85 / 323

Degree-of-Freedom Penalties Various Model Selection


Criteria

86 / 323

Retail Sales

87 / 323

Retail Sales Linear Trend Regression

88 / 323

Retail Sales Linear Trend Residual Plot

89 / 323

Retail Sales Quadratic Trend Regression

90 / 323

Retail Sales Quadratic Trend Residual Plot

91 / 323

Retail Sales Log Linear Trend Regression

92 / 323

Retail Sales Log Linear Trend Residual Plot

93 / 323

Retail Sales Exponential Trend Regression

94 / 323

Retail Sales Exponential Trend Residual Plot

95 / 323

Model Selection Criteria

Linear, Quadratic and Exponential Trend Models

        Linear Trend    Quadratic Trend    Exponential Trend
AIC     19.35           15.83              17.15
SIC     19.37           15.86              17.17

96 / 323

Retail Sales History January, 1990 December, 1994

97 / 323

Retail Sales History January, 1990 December, 1994

98 / 323

Retail Sales History January, 1990 December, 1994

99 / 323

Retail Sales History January, 1990 December, 1994

100 / 323

Modeling and Forecasting Seasonality


1. The Nature and Sources of Seasonality

2. Modeling Seasonality

D1 = (1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, ...)
D2 = (0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, ...)
D3 = (0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, ...)
D4 = (0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, ...)

y_t = \sum_{i=1}^{s} \gamma_i D_{it} + \varepsilon_t

y_t = \beta_1 TIME_t + \sum_{i=1}^{s} \gamma_i D_{it} + \varepsilon_t

y_t = \beta_1 TIME_t + \sum_{i=1}^{s} \gamma_i D_{it} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{it} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{it} + \varepsilon_t
101 / 323
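A sketch (not from the slides) of the trend-plus-seasonal-dummies regression above for a quarterly series (s = 4), built with NumPy; holiday and trading-day variables are omitted and the data are simulated.

import numpy as np

rng = np.random.default_rng(4)
T, s = 120, 4
time = np.arange(1, T + 1, dtype=float)
season = np.arange(T) % s                            # 0,1,2,3,0,1,...
D = (season[:, None] == np.arange(s)).astype(float)  # full set of s seasonal dummies
gamma_true = np.array([10.0, 12.0, 9.0, 11.0])
y = 0.05 * time + D @ gamma_true + rng.normal(size=T)

X = np.column_stack([time, D])                       # no separate intercept: the dummies span it
coef = np.linalg.lstsq(X, y, rcond=None)[0]
beta1_hat, gamma_hat = coef[0], coef[1:]
print(beta1_hat, gamma_hat)                          # estimated trend slope and seasonal factors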

Modeling and Forecasting Seasonality


3. Forecasting Seasonal Series

y_t = \beta_1 TIME_t + \sum_{i=1}^{s} \gamma_i D_{it} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{it} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{it} + \varepsilon_t

y_{T+h} = \beta_1 TIME_{T+h} + \sum_{i=1}^{s} \gamma_i D_{i,T+h} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{i,T+h} + \varepsilon_{T+h}

y_{T+h,T} = \beta_1 TIME_{T+h} + \sum_{i=1}^{s} \gamma_i D_{i,T+h} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{i,T+h}

\hat{y}_{T+h,T} = \hat{\beta}_1 TIME_{T+h} + \sum_{i=1}^{s} \hat{\gamma}_i D_{i,T+h} + \sum_{i=1}^{v_1} \hat{\delta}_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \hat{\delta}_i^{TD} TDV_{i,T+h}
102 / 323

Modeling and Forecasting Seasonality

3. Forecasting Seasonal Series cont.

y_{T+h,T} \pm 1.96\sigma
\hat{y}_{T+h,T} \pm 1.96\hat{\sigma}

N(y_{T+h,T}, \sigma^2)
N(\hat{y}_{T+h,T}, \hat{\sigma}^2)

103 / 323

Gasoline Sales

104 / 323

Liquor Sales

105 / 323

Durable Goods Sales

106 / 323

Housing Starts, January, 1946 November, 1994

107 / 323

Housing Starts, January, 1990 November, 1994

108 / 323

Housing Starts Regression Results - Seasonal Dummy


Variable Model

109 / 323

Residual Plot

110 / 323

Housing Starts Estimated Seasonal Factors

111 / 323

Housing Starts

112 / 323

Housing Starts

113 / 323

Characterizing Cycles
1. Covariance Stationary Time Series

Realization
Sample Path
Covariance Stationary

E y_t = \mu_t
E y_t = \mu

\gamma(t, \tau) = cov(y_t, y_{t-\tau}) = E(y_t - \mu)(y_{t-\tau} - \mu)
\gamma(t, \tau) = \gamma(\tau)

\rho(\tau) = \frac{cov(y_t, y_{t-\tau})}{\sqrt{var(y_t)}\sqrt{var(y_{t-\tau})}} = \frac{\gamma(\tau)}{\gamma(0)}
114 / 323

1.Characterizing Cycles Cont.

corr(x, y) = \frac{cov(x, y)}{\sigma_x \sigma_y}

\rho(\tau) = \frac{\gamma(\tau)}{\gamma(0)}, \quad \tau = 0, 1, 2, ....

\rho(\tau) = \frac{cov(y_t, y_{t-\tau})}{\sqrt{var(y_t)}\sqrt{var(y_{t-\tau})}} = \frac{\gamma(\tau)}{\gamma(0)}

p(\tau): the coefficient on y_{t-\tau} in the population regression of y_t on y_{t-1}, ..., y_{t-\tau}

115 / 323

2.White Noise

y_t \sim WN(0, \sigma^2)

y_t \sim iid(0, \sigma^2)

y_t \sim iid\, N(0, \sigma^2)

116 / 323

2.White Noise Cont.

E(y_t) = 0
var(y_t) = \sigma^2
E(y_t | \Omega_{t-1}) = 0
var(y_t | \Omega_{t-1}) = E[(y_t - E(y_t | \Omega_{t-1}))^2 | \Omega_{t-1}] = \sigma^2

117 / 323

3.The Lag Operator


L y_t = y_{t-1}
L^2 y_t = L(L(y_t)) = L(y_{t-1}) = y_{t-2}
L^m y_t = y_{t-m}

B(L) = b_0 + b_1 L + b_2 L^2 + ... + b_m L^m

\Delta y_t = (1 - L) y_t = y_t - y_{t-1}
(1 + .9L + .6L^2) y_t = y_t + .9 y_{t-1} + .6 y_{t-2}

B(L) = b_0 + b_1 L + b_2 L^2 + ... = \sum_{i=0}^{\infty} b_i L^i

B(L)\varepsilon_t = b_0 \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + ... = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}

118 / 323

4. Wold's Theorem, the General Linear Process, and
Rational Distributed Lags

Wold's Theorem
Let {y_t} be any zero-mean covariance-stationary process. Then:

y_t = B(L)\varepsilon_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}

\varepsilon_t \sim WN(0, \sigma^2)

where b_0 = 1 and \sum_{i=0}^{\infty} b_i^2 < \infty
119 / 323

The General Linear Process

y_t = B(L)\varepsilon_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}

\varepsilon_t \sim WN(0, \sigma^2),

where b_0 = 1 and \sum_{i=0}^{\infty} b_i^2 < \infty

120 / 323

The General Linear Process Cont.

E(y_t) = E\left(\sum_{i=0}^{\infty} b_i \varepsilon_{t-i}\right) = \sum_{i=0}^{\infty} b_i E(\varepsilon_{t-i}) = \sum_{i=0}^{\infty} b_i \cdot 0 = 0

var(y_t) = var\left(\sum_{i=0}^{\infty} b_i \varepsilon_{t-i}\right) = \sum_{i=0}^{\infty} b_i^2 var(\varepsilon_{t-i}) = \sum_{i=0}^{\infty} b_i^2 \sigma^2 = \sigma^2 \sum_{i=0}^{\infty} b_i^2

E(y_t | \Omega_{t-1}) = E(\varepsilon_t | \Omega_{t-1}) + b_1 E(\varepsilon_{t-1} | \Omega_{t-1}) + b_2 E(\varepsilon_{t-2} | \Omega_{t-1}) + ... = \sum_{i=1}^{\infty} b_i \varepsilon_{t-i}

var(y_t | \Omega_{t-1}) = E[(y_t - E(y_t | \Omega_{t-1}))^2 | \Omega_{t-1}] = E(\varepsilon_t^2 | \Omega_{t-1}) = E(\varepsilon_t^2) = \sigma^2
121 / 323

Rational Distributed Lags

B(L) = \frac{\Theta(L)}{\Phi(L)}

\Theta(L) = \sum_{i=0}^{q} \theta_i L^i

\Phi(L) = \sum_{i=0}^{p} \phi_i L^i

B(L) \approx \frac{\Theta(L)}{\Phi(L)}
122 / 323

5. Estimation and Inference for the Mean, Autocorrelation
and Partial Autocorrelation Functions

\bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t

\rho(\tau) = \frac{E[(y_t - \mu)(y_{t-\tau} - \mu)]}{E[(y_t - \mu)^2]}

\hat{\rho}(\tau) = \frac{\frac{1}{T}\sum_{t=\tau+1}^{T} [(y_t - \bar{y})(y_{t-\tau} - \bar{y})]}{\frac{1}{T}\sum_{t=1}^{T} (y_t - \bar{y})^2} = \frac{\sum_{t=\tau+1}^{T} [(y_t - \bar{y})(y_{t-\tau} - \bar{y})]}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

\hat{y}_t = \hat{c} + \hat{\phi}_1 y_{t-1} + ... + \hat{\phi}_\tau y_{t-\tau}, \quad \hat{p}(\tau) \equiv \hat{\phi}_\tau

\hat{\rho}(\tau), \hat{p}(\tau) \sim N\left(0, \frac{1}{T}\right)
123 / 323

5. Estimation and Inference for the Mean, Autocorrelation
and Partial Autocorrelation Functions cont.

\hat{\rho}(\tau) \sim N\left(0, \frac{1}{T}\right)

\sqrt{T}\,\hat{\rho}(\tau) \sim N(0, 1)

T\,\hat{\rho}^2(\tau) \sim \chi^2_1

Q_{BP} = T \sum_{\tau=1}^{m} \hat{\rho}^2(\tau)

Q_{LB} = T(T+2) \sum_{\tau=1}^{m} \left(\frac{1}{T - \tau}\right) \hat{\rho}^2(\tau)
124 / 323
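A sketch (not from the slides) of the sample autocorrelation function and the Ljung-Box Q-statistic defined above, written with NumPy; the chi-squared p-value uses SciPy, which is assumed to be available.

import numpy as np
from scipy import stats

def sample_acf(y, max_lag):
    y = np.asarray(y, dtype=float)
    T = len(y)
    yb = y - y.mean()
    denom = np.sum(yb ** 2)
    return np.array([np.sum(yb[lag:] * yb[:T - lag]) / denom for lag in range(1, max_lag + 1)])

def ljung_box(y, m):
    T = len(y)
    r = sample_acf(y, m)
    Q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, m + 1)))
    return Q, stats.chi2.sf(Q, df=m)   # asymptotically chi-squared with m d.f. under white noise

rng = np.random.default_rng(5)
y = rng.normal(size=200)               # white noise: sample autocorrelations should be ~ N(0, 1/T)
print(sample_acf(y, 4), ljung_box(y, 12))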

A Rigid Cycle Pattern

125 / 323

Autocorrelation Function, One-Sided Gradual Damping

126 / 323

Autocorrelation Function, Non-Damping

127 / 323

Autocorrelation Function, Gradual Damped Oscillation

128 / 323

Autocorrelation Function, Sharp Cutoff

129 / 323

Realization of White Noise Process

130 / 323

Population Autocorrelation Function of White Noise


Process

131 / 323

Population Partialautocorrelation Function of White Noise


Process

132 / 323

Canadian Employment Index

133 / 323

Canadian Employment Index Correlogram


Sample: 1962:1 1993:4
Included observations: 128
Acorr.

P. Acorr.

Std. Error

Ljung-Box

0.949

.088
.088
.088
.088
.088
.088
.088
.088
.088
.088
.088
.088

118.07
219.66
303.72
370.82
422.27
460.00
486.32
503.41
512.70
516.43
517.20
517.21

p-v

v
1 0.949
2 0.877
3 0.795
4 0.707
5 0.617
6 0.526
7 0.438
8 0.351
9 0.258
10 0.163
11 0.073
12 0.005

0.244
0.101
0.070
0.063
0.048
0.033
0.049
0.149
0.070
0.011
0.016

0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
134 / 323

Canadian Employment Index, Sample Autocorrelation and


Partial Autocorrelation Functions

135 / 323

Modeling Cycles: MA,AR, and ARMA Models

The MA(1) Process

y_t = \varepsilon_t + \theta \varepsilon_{t-1} = (1 + \theta L)\varepsilon_t

\varepsilon_t \sim WN(0, \sigma^2)

If invertible:
y_t = \varepsilon_t + \theta y_{t-1} - \theta^2 y_{t-2} + \theta^3 y_{t-3} - ...

136 / 323

Modeling Cycles: MA,AR, and ARMA Models Cont.

E(y_t) = E(\varepsilon_t) + \theta E(\varepsilon_{t-1}) = 0

var(y_t) = var(\varepsilon_t) + \theta^2 var(\varepsilon_{t-1}) = \sigma^2 + \theta^2 \sigma^2 = \sigma^2(1 + \theta^2)

E(y_t | \Omega_{t-1}) = E((\varepsilon_t + \theta \varepsilon_{t-1}) | \Omega_{t-1}) = E(\varepsilon_t | \Omega_{t-1}) + \theta E(\varepsilon_{t-1} | \Omega_{t-1}) = \theta \varepsilon_{t-1}

var(y_t | \Omega_{t-1}) = E[(y_t - E(y_t | \Omega_{t-1}))^2 | \Omega_{t-1}] = E(\varepsilon_t^2 | \Omega_{t-1}) = E(\varepsilon_t^2) = \sigma^2

137 / 323

The MA(q) Process

y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + ... + \theta_q \varepsilon_{t-q} = \Theta(L)\varepsilon_t

\varepsilon_t \sim WN(0, \sigma^2)

where
\Theta(L) = 1 + \theta_1 L + ... + \theta_q L^q

138 / 323

The AR(1) Process

y_t = \phi y_{t-1} + \varepsilon_t

\varepsilon_t \sim WN(0, \sigma^2)

If covariance stationary:
y_t = \varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + ...

139 / 323

Moment Structure

E(y_t) = E(\varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + ...)
       = E(\varepsilon_t) + \phi E(\varepsilon_{t-1}) + \phi^2 E(\varepsilon_{t-2}) + ...
       = 0

var(y_t) = var(\varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + ...)
         = \sigma^2 + \phi^2 \sigma^2 + \phi^4 \sigma^2 + ...
         = \sigma^2 \sum_{i=0}^{\infty} \phi^{2i}
         = \frac{\sigma^2}{1 - \phi^2}

140 / 323

Moment Structure Cont.


E(y_t | y_{t-1}) = E((\phi y_{t-1} + \varepsilon_t) | y_{t-1})
                = \phi E(y_{t-1} | y_{t-1}) + E(\varepsilon_t | y_{t-1})
                = \phi y_{t-1} + 0
                = \phi y_{t-1}

var(y_t | y_{t-1}) = var((\phi y_{t-1} + \varepsilon_t) | y_{t-1})
                   = \phi^2 var(y_{t-1} | y_{t-1}) + var(\varepsilon_t | y_{t-1})
                   = 0 + \sigma^2
                   = \sigma^2
141 / 323

Moment Structure Cont.


Autocovariances and autocorrelations:

y_t = \phi y_{t-1} + \varepsilon_t
y_t y_{t-\tau} = \phi y_{t-1} y_{t-\tau} + \varepsilon_t y_{t-\tau}

For \tau \geq 1,
\gamma(\tau) = \phi \gamma(\tau - 1)
(Yule-Walker equation). But

\gamma(0) = \frac{\sigma^2}{1 - \phi^2}
142 / 323

Moment Structure Cont.

Thus

\gamma(\tau) = \phi^\tau \frac{\sigma^2}{1 - \phi^2}, \quad \tau = 0, 1, 2, ....

and

\rho(\tau) = \phi^\tau, \quad \tau = 0, 1, 2, ....

Partial autocorrelations:

143 / 323

The AR(p) Process

y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + \varepsilon_t

\varepsilon_t \sim WN(0, \sigma^2)

144 / 323

The ARMA(1,1) Process


y_t = \phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}

\varepsilon_t \sim WN(0, \sigma^2)

MA representation if invertible:

y_t = \frac{(1 + \theta L)}{(1 - \phi L)} \varepsilon_t

AR representation if covariance stationary:

\frac{(1 - \phi L)}{(1 + \theta L)} y_t = \varepsilon_t

145 / 323

The ARMA(p,q) Process

y_t = \phi_1 y_{t-1} + ... + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + ... + \theta_q \varepsilon_{t-q}

\varepsilon_t \sim WN(0, \sigma^2)

\Phi(L) y_t = \Theta(L) \varepsilon_t

146 / 323

Realization of Two MA(1) Processes

147 / 323

Population Autocorrelation Function MA(1) Process

= .4

148 / 323

Population Autocorrelation Function MA(1) Process

= .95

149 / 323

Population Partial Autocorrelation Function MA(1)


Process

= .4

150 / 323

Population Partial Autocorrelation Function MA(1)


Process

= .95

151 / 323

Realization of Two AR(1) Processes

152 / 323

Population Autocorrelation Function AR(1) Process

= .4

153 / 323

Population Autocorrelation Function AR(1) Process

= .95

154 / 323

Population Partial Autocorrelation Function AR(1) Process

= .4

155 / 323

Population Partial Autocorrelation Function AR(1) Process

= .95

156 / 323

Population Autocorrelation Function AR(2) Process with


Complex Roots

157 / 323

Employment: MA(4) Model

Sample: 1962:1 1993:4


Included observations: 128

158 / 323

Employment MA(4) Residual Plot

159 / 323

Employment: MA(4) Model


Residual Correlogram Sample: 1962:1 1993:4
Included observations: 128
Qstatistic probabilities adjusted for 4 ARMA term(s)
Acorr. P. Acorr.
1
0.345
2
0.660
3 0.534
4
0.427
5 0.347
6
0.484
7 0.121
8
0.348
9 0.148
10 0.102
11 0.081
12 0.029

0.345
0.614
0.426
0.042
-0.398
0.145
0.118
0.048
0.019
0.066
0.098
0.113

Std. Error

Ljung-Box

.088
.088
.088
.088
.088
.088
.088
.088
.088
.088
.088
.088

73.089
111.01
135.49
151.79
183.70
185.71
202.46
205.50
206.96
207.89
208.01

p-v

15.614

0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
160 / 323

Employment: MA(4) Model


Residual Sample Autocorrelation and Partial Autocorrelation
Functions, With Plus or Minus Two Standard Error Bands

161 / 323

Employment: AR(2) Model

LS // Dependent Variable is CANEMP


Sample: 1962:1 1993:4
Included observations: 128
Convergence achieved after 3 iterations
Variable Coefficient Std. Error tStatistic Prob.

162 / 323

Employment AR(2) Model Residual Plot

163 / 323

Employment AIC Values of Various ARMA Models

                        MA Order
              0        1        2        3        4
AR Order
   0                   2.86     2.32     2.47     2.20
   1         1.01      .83      .79      .80      .81
   2         .762      .77      .78      .80      .80
   3         .77       .761     .77      .78      .79
   4         .79       .79      .77      .79      .80

164 / 323

Employment SIC Values of Various ARMA Models

                        MA Order
              0        1        2        3        4
AR Order
   0                   2.91     2.38     2.56     2.31
   1         1.05      .90      .88      .91      .94
   2         .83       .86      .89      .92      .96
   3         .86       .87      .90      .94      .96
   4         .90       .92      .93      .97      1.00

165 / 323

Employment: ARMA(3,1) Model

LS // Dependent Variable is CANEMP


Sample: 1962:1 1993:4
Included observations: 128
Convergence achieved after 17 iterations
Variable Coefficient Std. Error tStatistic Prob.

166 / 323

Employment ARMA(3) Model Residual Plot

167 / 323

Employment: ARMA(3,1) Model Residual Correlogram


Sample: 1962:1 1993:4
Included observations: 128
Qstatistic probabilities adjusted for 4 ARMA term(s)
Acorr. P. Acorr.
1
0.032
2 0.041
3 0.014
4 0.048
5 0.006
6 0.013
7 0.017
8 0.064
9 0.092
10 0.039
11 0.016
12 0.137

0.032
0.040
0.017
0.047
0.007
0.009
0.019
0.060
0.097
0.040
0.022
0.153

Std. Error

Ljung-Box

.09
.09
.09
.09
.09
.09
.09
.09
.09
.09
.09
.09

0.1376
0.3643
0.3904
0.6970
0.7013
0.7246
0.7650
1.3384
2.5182
2.7276
2.7659
5.4415

p-v

0.402
0.696
0.858
0.855
0.774
0.842
0.906
0.710
168 / 323

Employment: ARMA(3) Model

Residual Sample Autocorrelation and Partial Autocorrelation


Functions, With Plus or Minus Two Standard Error Bands

169 / 323

Forecasting Cycles
\Omega_T = \{y_T, y_{T-1}, y_{T-2}, ...\}
\Omega_T = \{\varepsilon_T, \varepsilon_{T-1}, \varepsilon_{T-2}, ...\}

Optimal Point Forecasts for Infinite-Order Moving Averages

y_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}

where
\varepsilon_t \sim WN(0, \sigma^2),
170 / 323

Forecasting Cycles Cont.


b_0 = 1, and \sum_{i=0}^{\infty} b_i^2 < \infty.

y_{T+h} = \varepsilon_{T+h} + b_1 \varepsilon_{T+h-1} + ... + b_h \varepsilon_T + b_{h+1} \varepsilon_{T-1} + ...

y_{T+h,T} = b_h \varepsilon_T + b_{h+1} \varepsilon_{T-1} + ...

e_{T+h,T} = (y_{T+h} - y_{T+h,T}) = \sum_{i=0}^{h-1} b_i \varepsilon_{T+h-i}

\sigma_h^2 = \sigma^2 \sum_{i=0}^{h-1} b_i^2
171 / 323

Interval and Density Forecasts


y_{T+h} = y_{T+h,T} + e_{T+h,T}.

95% h-step-ahead interval forecast:
y_{T+h,T} \pm 1.96\sigma_h

h-step-ahead density forecast:
N(y_{T+h,T}, \sigma_h^2)

Making the Forecasts Operational
The Chain Rule of Forecasting
172 / 323
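A minimal sketch (not from the slides) of the chain rule of forecasting applied to an AR(1) model, together with the h-step interval implied by the variance formula above; the coefficient and innovation standard deviation are placeholder values, as if already estimated.

import numpy as np

phi, sigma = 0.8, 1.0            # assumed AR(1) coefficient and innovation std. dev.
y_T = 2.5                        # last observed value
H = 8

point = np.empty(H)
last = y_T
for h in range(H):
    last = phi * last            # chain rule: y_{T+h,T} = phi * y_{T+h-1,T}
    point[h] = last

# h-step forecast-error variance: sigma^2 * (1 + phi^2 + ... + phi^(2(h-1)))
var_h = sigma ** 2 * np.cumsum(phi ** (2 * np.arange(H)))
lower = point - 1.96 * np.sqrt(var_h)
upper = point + 1.96 * np.sqrt(var_h)
print(np.column_stack([point, lower, upper]))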

Employment History and Forecast MA(4) Model

173 / 323

Employment History and Long-Horizon Forecast MA(4)


Model

174 / 323

Employment History, Forecast and Realization MA(4)


Model

175 / 323

Employment History and Forecast AR(2) Model

176 / 323

Employment History and Long-Horizon Forecast AR(2)


Model

177 / 323

Employment History and Very Long-Horizon Forecast


AR(2) Model

178 / 323

Employment History, Forecast and Realization AR(2)


Model

179 / 323

Putting it all Together


A Forecast Model with Trend, Seasonal and Cyclical Components

The full model:

y_t = T_t(\theta) + \sum_{i=1}^{s} \gamma_i D_{it} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{it} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{it} + \varepsilon_t

\Phi(L)\varepsilon_t = \Theta(L)v_t

\Phi(L) = 1 - \phi_1 L - ... - \phi_p L^p
\Theta(L) = 1 + \theta_1 L + ... + \theta_q L^q

v_t \sim WN(0, \sigma^2).
180 / 323

Point Forecasting

y_{T+h} = T_{T+h}(\theta) + \sum_{i=1}^{s} \gamma_i D_{i,T+h} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{i,T+h} + \varepsilon_{T+h}

y_{T+h,T} = T_{T+h}(\theta) + \sum_{i=1}^{s} \gamma_i D_{i,T+h} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{i,T+h} + \varepsilon_{T+h,T}

\hat{y}_{T+h,T} = T_{T+h}(\hat{\theta}) + \sum_{i=1}^{s} \hat{\gamma}_i D_{i,T+h} + \sum_{i=1}^{v_1} \hat{\delta}_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \hat{\delta}_i^{TD} TDV_{i,T+h} + \hat{\varepsilon}_{T+h,T}
181 / 323

Interval Forecasting and Density Forecasting

Interval Forecasting:

\hat{y}_{T+h,T} \pm z_{\alpha/2} \hat{\sigma}_h

e.g. (95% interval):
\hat{y}_{T+h,T} \pm 1.96 \hat{\sigma}_h

Density Forecasting:
N(\hat{y}_{T+h,T}, \hat{\sigma}_h^2)

182 / 323

Recursive Estimation

y_t = \sum_{k=1}^{K} \beta_k x_{kt} + \varepsilon_t

\varepsilon_t \sim iid\, N(0, \sigma^2), \quad t = 1, ..., T.

OLS estimation uses the full sample, t = 1, ..., T.
Recursive least squares uses an expanding sample.
Begin with the first K observations and estimate the model.
Then estimate using the first K + 1 observations, and so on.
At the end we have a set of recursive parameter estimates:
\hat{\beta}_{k,t}, for k = 1, ..., K and t = K, ..., T.
183 / 323

Recursive Residuals
At each t, t = K, ..., T - 1, compute a 1-step forecast,

\hat{y}_{t+1,t} = \sum_{k=1}^{K} \hat{\beta}_{kt} x_{k,t+1}.

The corresponding forecast errors, or recursive residuals, are

\hat{e}_{t+1,t} = y_{t+1} - \hat{y}_{t+1,t}.

\hat{e}_{t+1,t} \sim N(0, \sigma^2 r_t)

where r_t > 1 for all t

184 / 323

Standardized Recursive Residuals and CUSUM

\hat{w}_{t+1,t} \equiv \frac{\hat{e}_{t+1,t}}{\hat{\sigma}\sqrt{r_t}}, \quad t = K, ..., T - 1.

Under the maintained assumptions,
\hat{w}_{t+1,t} \sim iid\, N(0, 1).

Then

CUSUM_t \equiv \sum_{\tau=K}^{t} \hat{w}_{\tau+1,\tau}, \quad t = K, ..., T - 1

is just a sum of iid N(0, 1)'s.


185 / 323
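A sketch (not from the slides) of recursive least squares, standardized recursive residuals, and the CUSUM, following the definitions above and implemented directly with NumPy; for simplicity the plug-in estimate of sigma comes from a full-sample fit, which is an assumption of this sketch.

import numpy as np

rng = np.random.default_rng(6)
T, K = 120, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)

beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
sigma2 = np.sum((y - X @ beta_full) ** 2) / (T - K)   # full-sample estimate of sigma^2

w = []
for t in range(K, T):                                 # forecast observation t from data through t-1
    Xt, yt = X[:t], y[:t]
    beta_t = np.linalg.lstsq(Xt, yt, rcond=None)[0]
    x_next = X[t]
    e_next = y[t] - x_next @ beta_t                   # recursive residual
    r_t = 1.0 + x_next @ np.linalg.inv(Xt.T @ Xt) @ x_next
    w.append(e_next / np.sqrt(sigma2 * r_t))          # standardized recursive residual
cusum = np.cumsum(w)                                  # approximately a cumulated sum of iid N(0,1)
print(cusum[-5:])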

Liquor Sales, 1968.1-1993.12

186 / 323

Log Liquor Sales, 1968.01 - 1993.12

187 / 323

Log Liquor Sales: Quadratic Trend Regression

188 / 323

Liquor Sales Quadratic Trend Regression Residual Plot

189 / 323

Liquor Sales Quadratic Trend Regression Residual


Correlogram
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

Acorr.
0.117
0.149
0.106
0.014
0.142
0.041
0.134
0.029
0.136
0.205
0.056
0.888
0.055
0.187
0.159

P. Acorr.
0.117
0.165
0.069
0.017
0.125
0.004
0.175
0.046
0.080
0.206
0.080
0.879
0.507
0.159
0.144

Std. Error
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056

Ljung-Box
4.3158 0.0
11.365 0.0
14.943 0.0
15.007 0.0
21.449 0.0
21.979 0.0
27.708 0.0
27.975 0.0
33.944 0.0
47.611 0.0
48.632 0.0
306.26 0.0
307.25 0.0
318.79 0.0
327.17
0.0
190 / 323

Liquor Sales Quadratic Trend Regression Residual Sample


Autocorrelation Functions

191 / 323

Liquor Sales Quadratic Trend Regression Residual Partial


Autocorrelation Functions

192 / 323

Log Liquor Sales: Quadratic Trend Regression With


Seasonal Dummies and AR(3) Disturbances

193 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies Residual Plot

194 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies Residual Correlogram
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

Acorr.
0.700
0.686
0.725
0.569
0.569
0.577
0.460
0.480
0.466
0.327
0.364
0.355
0.225
0.291
0.211

P. Acorr.Std.Error
0.700
.056
0.383
.056
0.369
.056
0.141
.056
0.017
.056
0.093
.056
0.078
.056
0.043
.056
0.030
.056
0.188
.056
0.019
.056
0.089
.056
0.119
.056
0.065
.056
0.119
.056

Ljung-Box
p-value
154.34 0.000
302.86 0.000
469.36 0.000
572.36 0.000
675.58 0.000
782.19 0.000
850.06 0.000
924.38 0.000
994.46 0.000
1029.1 0.000
1072.1 0.000
1113.3 0.000
1129.9 0.000
1157.8 0.000
1172.4 0.000

195 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies Residual Sample Autocorrelation Functions

196 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies Residual Sample Partial Autocorrelation
Functions

197 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Plot

198 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Correlogram
Acorr. P. Acorr.
1 0.056
0.056
2 0.037
0.034
3 0.024
0.020
4 0.084
0.088
5 0.007
0.001
6 0.065
0.072
7 0.041
0.044
8 0.069
0.063
9 0.080
0.074
10 0.163
0.169
11 0.009
0.005
12 0.145
0.175
13 0.074
0.078
14 0.149
0.113
15 0.039
0.060

Std. Error
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056

Ljung-Box
p-v
0.9779 0.323
1.4194 0.492
1.6032 0.659
3.8256 0.430
3.8415 0.572
5.1985 0.519
5.7288 0.572
7.2828 0.506
9.3527 0.405
18.019 0.055
18.045 0.081
24.938 0.015
26.750 0.013
34.034 0.002
34.532 0.003
199 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Sample
Autocorrelation Functions

200 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Sample Partial
Autocorrelation Functions

201 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Histogram and
Normality Test

202 / 323

Log Liquor Sales History and 12-Month-Ahead Forecast

203 / 323

Log Liquor Sales History, 12-Month-Ahead Forecast, and


Realization

204 / 323

Log Liquor Sales History and 60-Month-Ahead Forecast

205 / 323

Log Liquor Sales Long History and 60-Month-Ahead


Forecast

206 / 323

Liquor Sales Long History and 60-Month-Ahead Forecast

207 / 323

Recursive Analysis Constant Parameter Model

208 / 323

Recursive Analysis Breaking Parameter Model

209 / 323

Log Liquor Sales: Quadratic Trend Regression with


Seasonal Dummies and AR(3) Residuals and Two
Standard Errors Bands

210 / 323

Log Liquor Sales: Quadratic Trend Regression with


Seasonal Dummies and AR(3) Disturbances Recursive
Parameter Estimates

211 / 323

Log Liquor Sales: Quadratic Trend Regression with


Seasonal Dummies and AR(3) Disturbances CUMSUM
Analysis

212 / 323

Forecasting with Regression Models


Conditional Forecasting Models and Scenario Analysis

y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2)

y_{T+h,T} | x_{T+h} = \beta_0 + \beta_1 x_{T+h}

Density forecast:
N(y_{T+h,T} | x_{T+h}, \sigma^2)

Scenario analysis, contingency analysis
No forecasting-the-RHS-variables problem

213 / 323

Unconditional Forecasting Models

y_{T+h,T} = \beta_0 + \beta_1 x_{T+h,T}

Forecasting-the-RHS-variables problem
Could fit a model to x (e.g., an autoregressive model)
Preferably, regress y on x_{t-h}, x_{t-h-1}, ...
No problem in trend and seasonal models

214 / 323

Distributed Lags
Start with unconditional forecast model:

y_t = \beta_0 + \delta x_{t-1} + \varepsilon_t

Generalize to

y_t = \beta_0 + \sum_{i=1}^{N_x} \delta_i x_{t-i} + \varepsilon_t

distributed lag model
lag weights
lag distribution

215 / 323

Polynomial Distributed Lags

\min_{\beta_0, \delta_i} \sum_{t=N_x+1}^{T} \left[ y_t - \beta_0 - \sum_{i=1}^{N_x} \delta_i x_{t-i} \right]^2

subject to

\delta_i = P(i) = a + bi + ci^2, \quad i = 1, ..., N_x

Lag weights constrained to lie on a low-order polynomial
Additional constraints can be imposed, such as P(N_x) = 0
Smooth lag distribution
Parsimonious
216 / 323

Rational Distributed Lags

y_t = \frac{A(L)}{B(L)} x_t + \varepsilon_t

Equivalently,

B(L) y_t = A(L) x_t + B(L)\varepsilon_t

Lags of x and y included
Important to allow for lags of y, one way or another

217 / 323

Another way:
distributed lag regression with lagged dependent variables

y_t = \beta_0 + \sum_{i=1}^{N_y} \alpha_i y_{t-i} + \sum_{j=1}^{N_x} \delta_j x_{t-j} + \varepsilon_t

Another way:
distributed lag regression with ARMA disturbances

y_t = \beta_0 + \sum_{i=1}^{N_x} \delta_i x_{t-i} + \varepsilon_t

\varepsilon_t = \frac{\Theta(L)}{\Phi(L)} v_t, \quad v_t \sim WN(0, \sigma^2)
218 / 323

Another Way: The Transfer Function Model and Various
Special Cases

Univariate ARMA:

y_t = \frac{C(L)}{D(L)} \varepsilon_t

Distributed Lag with Lagged Dep. Variables:

B(L) y_t = A(L) x_t + \varepsilon_t, or

y_t = \frac{A(L)}{B(L)} x_t + \frac{1}{B(L)} \varepsilon_t

Distributed Lag with ARMA Disturbances

219 / 323

Vector Autoregressions
e.g., bivariate VAR(1):

y_{1,t} = \phi_{11} y_{1,t-1} + \phi_{12} y_{2,t-1} + \varepsilon_{1,t}
y_{2,t} = \phi_{21} y_{1,t-1} + \phi_{22} y_{2,t-1} + \varepsilon_{2,t}

\varepsilon_{1,t} \sim WN(0, \sigma_1^2)
\varepsilon_{2,t} \sim WN(0, \sigma_2^2)
cov(\varepsilon_{1,t}, \varepsilon_{2,t}) = \sigma_{12}

Estimation by OLS
Order selection by information criteria
Impulse-response functions, variance decompositions, predictive
causality
Forecasts via Wold's chain rule
220 / 323
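A sketch (not from the slides) of a bivariate VAR(1): equation-by-equation OLS estimation and forecasts built by iterating the estimated system forward (Wold's chain rule); the data are simulated, and an intercept is omitted purely for brevity.

import numpy as np

rng = np.random.default_rng(7)
T = 300
Phi_true = np.array([[0.5, 0.2],
                     [0.1, 0.6]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = Phi_true @ y[t - 1] + rng.normal(scale=0.5, size=2)

Y, X = y[1:], y[:-1]                                  # regress y_t on y_{t-1}
Phi_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T      # each column of the lstsq solution is one equation
print(Phi_hat)

h, fc = 6, [y[-1]]
for _ in range(h):                                    # chain rule: iterate the estimated VAR forward
    fc.append(Phi_hat @ fc[-1])
print(np.array(fc[1:]))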

Point and Interval Forecast

221 / 323

U.S. Housing Starts and Completions, 1968.01-1996.06

222 / 323

Starts Correlogram
Sample: 1968:01 1991:12
Included observations: 288
Acorr.
1 0.937
2 0.907
3 0.877
4 0.838
5 0.795
6 0.751
7 0.704
8 0.650
9 0.604
10 0.544
11 0.496
12 0.446
13 0.405
14 0.346

P. Acorr.
0.937
0.244
0.054
0.077
0.096
0.058
0.067
0.098
0.004
0.129
0.029
0.008
0.076
0.144

Std. Error

Ljung-Box

0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059

255.24
495.53
720.95
927.39
1113.7
1280.9
1428.2
1554.4
1663.8
1752.6
1826.7
1886.8
1936.8
1973.3

p-v

0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
223 / 323

Starts Sample Autocorrelations

224 / 323

Starts Sample Partial Autocorrelations

225 / 323

Completions Correlogram
Completions Correlogram
Sample: 1968:01 1991:12
Included observations: 288
Acorr.
1 0.939
2 0.920
3 0.896
4 0.874
5 0.834
6 0.802
7 0.761
8 0.721
9 0.677
10 0.633
11 0.583
12 0.533

P. Acorr.
0.939
0.328
0.066
0.023
0.165
0.067
0.100
0.070
0.055
0.047
0.080
0.073

Std. Error

Ljung-Box

0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059

256.61
504.05
739.19
963.73
1168.9
1359.2
1531.2
1686.1
1823.2
1943.7
2046.3
2132.2

p-v

0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
226 / 323

Completions Sample Autocorrelations

227 / 323

Completions Partial Autocorrelations

228 / 323

Starts and Completions: Sample Cross Correlations

229 / 323

VAR Order Selection with AIC and SIC

230 / 323

VAR Starts Equation

231 / 323

VAR Start Equation Residual Plot

232 / 323

VAR Starts Equation Residual Correlogram


Sample: 1968:01 1991:12
Included observations: 284
Acorr.
1 0.001
2 0.003
3 0.006
4 0.023
5 0.013
6 0.022
7 0.038
8 0.048
9 0.056
10 0.114
11 0.038
12 0.030
13 0.192
14 0.014

P. Acorr.
0.001
0.003
0.006
0.023
0.013
0.021
0.038
0.048
0.056
0.116
0.038
0.028
0.193
0.021

Std. Error

Ljung-Box

0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059

0.0004
0.0029
0.0119
0.1650
0.2108
0.3463
0.7646
1.4362
2.3528
6.1868
6.6096
6.8763
17.947
18.010

p-v

0.985
0.999
1.000
0.997
0.999
0.999
0.998
0.994
0.985
0.799
0.830
0.866
0.160
0.206
233 / 323

VAR Starts Equation Residual Sample Autocorrelations

234 / 323

Var Starts Equation Residual Sample Partial


Autocorrelations

235 / 323

Evaluating and Combining Forecasts


Evaluating a single forecast

Process:
y_t = \mu + \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + ...
\varepsilon_t \sim WN(0, \sigma^2)

h-step-ahead linear least-squares forecast:
y_{t+h,t} = \mu + b_h \varepsilon_t + b_{h+1} \varepsilon_{t-1} + ...

Corresponding h-step-ahead forecast error:
e_{t+h,t} = y_{t+h} - y_{t+h,t} = \varepsilon_{t+h} + b_1 \varepsilon_{t+h-1} + ... + b_{h-1} \varepsilon_{t+1}

with variance
\sigma_h^2 = \sigma^2 \left(1 + \sum_{i=1}^{h-1} b_i^2\right)
236 / 323

Evaluating and Combining Forecasts

So, four key properties of optimal forecasts:

a. Optimal forecasts are unbiased
b. Optimal forecasts have 1-step-ahead errors that are white noise
c. Optimal forecasts have h-step-ahead errors that are at most
MA(h-1)
d. Optimal forecasts have h-step-ahead errors with variances that
are non-decreasing in h and that converge to the unconditional
variance of the process

All are easily checked. How?

237 / 323

Assessing optimality with respect to an information set

Unforecastability principle: The errors from good forecasts are not
forecastable!

Regression:
e_{t+h,t} = \alpha + \sum_{i=1}^{k-1} \beta_i x_{it} + u_t

Test whether \alpha, \beta_1, ..., \beta_{k-1} are 0

Important case:
e_{t+h,t} = \alpha + \beta_1 y_{t+h,t} + u_t
Test whether (\alpha, \beta_1) = (0, 0)

Equivalently,
y_{t+h} = \alpha + \beta_1 y_{t+h,t} + u_t
Test whether (\alpha, \beta_1) = (0, 1)
238 / 323

Evaluating multiple forecasts: comparing forecast accuracy

Forecast errors: e_{t+h,t} = y_{t+h} - y_{t+h,t}

Forecast percent errors: p_{t+h,t} = \frac{y_{t+h} - y_{t+h,t}}{y_{t+h}}

ME = \frac{1}{T} \sum_{t=1}^{T} e_{t+h,t}

EV = \frac{1}{T} \sum_{t=1}^{T} (e_{t+h,t} - ME)^2

MSE = \frac{1}{T} \sum_{t=1}^{T} e_{t+h,t}^2

MSPE = \frac{1}{T} \sum_{t=1}^{T} p_{t+h,t}^2

RMSE = \sqrt{ \frac{1}{T} \sum_{t=1}^{T} e_{t+h,t}^2 }

239 / 323
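A sketch (not from the slides) of the forecast accuracy measures above, computed with NumPy from aligned vectors of realizations and h-step forecasts; both vectors here are simulated placeholders.

import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(loc=100, scale=5, size=60)          # realizations y_{t+h}
yhat = y + rng.normal(scale=2, size=60)            # hypothetical forecasts y_{t+h,t}

e = y - yhat                                       # forecast errors
p = e / y                                          # forecast percent errors
ME = e.mean()
EV = np.mean((e - ME) ** 2)                        # error variance
MSE = np.mean(e ** 2)
MSPE = np.mean(p ** 2)
RMSE = np.sqrt(MSE)
print(ME, EV, MSE, MSPE, RMSE)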

Forecast encompassing

y_{t+h} = \beta_a y^a_{t+h,t} + \beta_b y^b_{t+h,t} + \varepsilon_{t+h,t}

1. If (\beta_a, \beta_b) = (1, 0), model a forecast-encompasses model b
2. If (\beta_a, \beta_b) = (0, 1), model b forecast-encompasses model a
3. Otherwise, neither model encompasses the other

Alternative approach:

(y_{t+h} - y_t) = \beta_a (y^a_{t+h,t} - y_t) + \beta_b (y^b_{t+h,t} - y_t) + \varepsilon_{t+h,t}

Useful in I(1) situations

240 / 323

Variance-covariance forecast combination

Composite formed from two unbiased forecasts:

y^c_{t+h,t} = \omega y^a_{t+h,t} + (1 - \omega) y^b_{t+h,t}

e^c_{t+h,t} = \omega e^a_{t+h,t} + (1 - \omega) e^b_{t+h,t}

\sigma_c^2 = \omega^2 \sigma_{aa}^2 + (1 - \omega)^2 \sigma_{bb}^2 + 2\omega(1 - \omega)\sigma_{ab}^2

\omega^* = \frac{\sigma_{bb}^2 - \sigma_{ab}^2}{\sigma_{aa}^2 + \sigma_{bb}^2 - 2\sigma_{ab}^2}

\hat{\omega}^* = \frac{\hat{\sigma}_{bb}^2 - \hat{\sigma}_{ab}^2}{\hat{\sigma}_{aa}^2 + \hat{\sigma}_{bb}^2 - 2\hat{\sigma}_{ab}^2}

241 / 323
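A sketch (not from the slides) of variance-covariance forecast combination: the optimal weight on forecast "a" from the formula above, estimated from two forecast-error series that are simulated placeholders here.

import numpy as np

rng = np.random.default_rng(9)
common = rng.normal(size=200)
e_a = common + rng.normal(scale=0.5, size=200)     # errors of forecast a
e_b = common + rng.normal(scale=1.0, size=200)     # errors of forecast b

s_aa, s_bb = e_a.var(), e_b.var()
s_ab = np.cov(e_a, e_b)[0, 1]
w_star = (s_bb - s_ab) / (s_aa + s_bb - 2 * s_ab)  # estimated optimal weight on forecast a
e_c = w_star * e_a + (1 - w_star) * e_b            # combined forecast error
print(w_star, e_c.var(), min(s_aa, s_bb))          # combined variance vs. the better single forecast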

Regression-based forecast combination

y_{t+h} = \beta_0 + \beta_1 y^a_{t+h,t} + \beta_2 y^b_{t+h,t} + \varepsilon_{t+h,t}

1. Equivalent to variance-covariance combination if weights sum
to unity and intercept is excluded
2. Easy extension to include more than two forecasts
3. Time-varying combining weights
4. Dynamic combining regressions
5. Shrinkage of combining weights toward equality
6. Nonlinear combining regressions

242 / 323

Unit Roots, Stochastic Trends, ARIMA Forecasting
Models, and Smoothing

1. Stochastic Trends and Forecasting

\Phi(L) y_t = \Theta(L) \varepsilon_t
\Phi(L) = \Phi'(L)(1 - L)
\Phi'(L)(1 - L) y_t = \Theta(L) \varepsilon_t
\Phi'(L) \Delta y_t = \Theta(L) \varepsilon_t

I(0) vs. I(1) processes

243 / 323

Unit Roots, Stochastic Trends, ARIMA Forecasting
Models, and Smoothing cont.

Random walk:
y_t = y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)

Random walk with drift:
y_t = \delta + y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)

Stochastic trend vs. deterministic trend
244 / 323

Properties of random walks

y_t = y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)

With time-0 value y_0:

y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i

E(y_t) = y_0
var(y_t) = t\sigma^2
\lim_{t \to \infty} var(y_t) = \infty

245 / 323

Random walk with drift

y_t = \delta + y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)

Assuming time-0 value y_0:

y_t = t\delta + y_0 + \sum_{i=1}^{t} \varepsilon_i

E(y_t) = y_0 + t\delta
var(y_t) = t\sigma^2
\lim_{t \to \infty} var(y_t) = \infty

246 / 323

ARIMA(p,1,q) model

\Phi(L)(1 - L) y_t = c + \Theta(L) \varepsilon_t

or

(1 - L) y_t = c\,\Phi^{-1}(L) + \Phi^{-1}(L)\Theta(L) \varepsilon_t

where

\Phi(L) = 1 - \phi_1 L - ... - \phi_p L^p
\Theta(L) = 1 - \theta_1 L - ... - \theta_q L^q

and all the roots of both lag operator polynomials are outside the
unit circle.

247 / 323

ARIMA(p,d,q) model

\Phi(L)(1 - L)^d y_t = c + \Theta(L) \varepsilon_t

or

(1 - L)^d y_t = c\,\Phi^{-1}(L) + \Phi^{-1}(L)\Theta(L) \varepsilon_t

where

\Phi(L) = 1 - \phi_1 L - ... - \phi_p L^p
\Theta(L) = 1 - \theta_1 L - ... - \theta_q L^q

and all the roots of both lag operator polynomials are outside the
unit circle.

248 / 323

Properties of ARIMA(p,1,q) processes

Appropriately made stationary by differencing
Shocks have permanent effects
  Forecasts don't revert to a mean
Variance grows without bound as time progresses
  Interval forecasts widen without bound as horizon grows

249 / 323

Random walk example

Point forecast
Recall that for the AR(1) process,
y_t = \phi y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)
the optimal forecast is
\hat{y}_{T+h,T} = \phi^h y_T
Thus in the random walk case,
\hat{y}_{T+h,T} = y_T, for all h

250 / 323

Random walk example cont.

Interval and density forecasts
Recall the error associated with the optimal AR(1) forecast:

e_{T+h,T} = (y_{T+h} - y_{T+h,T}) = \varepsilon_{T+h} + \phi \varepsilon_{T+h-1} + ... + \phi^{h-1} \varepsilon_{T+1}

with variance

\sigma_h^2 = \sigma^2 \sum_{i=0}^{h-1} \phi^{2i}

Thus in the random walk case,

e_{T+h,T} = \sum_{i=0}^{h-1} \varepsilon_{T+h-i}

\sigma_h^2 = h\sigma^2

h-step-ahead 95% interval: y_T \pm 1.96\sigma\sqrt{h}

h-step-ahead density forecast: N(y_T, h\sigma^2)
251 / 323

Effects of Unit Roots

Sample autocorrelation function fails to damp
Sample partial autocorrelation function near 1 for \tau = 1, and
then damps quickly
Properties of estimators change
e.g., least-squares autoregression with unit roots

True process:
y_t = y_{t-1} + \varepsilon_t
Estimated model:
y_t = \phi y_{t-1} + \varepsilon_t

Superconsistency: T(\hat{\phi}_{LS} - 1) stabilizes as sample size grows
Bias: E(\hat{\phi}_{LS}) < 1
Offsetting effects of bias and superconsistency

252 / 323

Unit Root Tests

y_t = \phi y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim iid\, N(0, \sigma^2)

\hat{\tau} = \frac{\hat{\phi} - 1}{s / \sqrt{\sum_{t=2}^{T} y_{t-1}^2}}

Dickey-Fuller distribution

Trick regression:
y_t - y_{t-1} = (\phi - 1) y_{t-1} + \varepsilon_t

253 / 323

Allowing for nonzero mean under the alternative

Basic model:
(y_t - \mu) = \phi(y_{t-1} - \mu) + \varepsilon_t
which we rewrite as
y_t = \alpha + \phi y_{t-1} + \varepsilon_t
where
\alpha = \mu(1 - \phi)

\alpha vanishes when \phi = 1 (null)
\alpha is nevertheless present under the alternative,
so we include an intercept in the regression

Dickey-Fuller distribution

254 / 323

Allowing for deterministic linear trend under the alternative

Basic model:
(y_t - a - b\,TIME_t) = \phi(y_{t-1} - a - b\,TIME_{t-1}) + \varepsilon_t
or
y_t = \alpha + \beta\,TIME_t + \phi y_{t-1} + \varepsilon_t
where \alpha = a(1 - \phi) + \phi b and \beta = b(1 - \phi).

Under the null hypothesis we have a random walk with drift,
y_t = b + y_{t-1} + \varepsilon_t

Under the deterministic-trend alternative hypothesis both the
intercept and the trend enter and so are included in the
regression.

255 / 323

Allowing for higher-order autoregressive dynamics

AR(p) process:

y_t + \sum_{j=1}^{p} \phi_j y_{t-j} = \varepsilon_t

Rewrite:

y_t = \rho_1 y_{t-1} + \sum_{j=2}^{p} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

where p \geq 2, \rho_1 = -\sum_{j=1}^{p} \phi_j, and \rho_i = \sum_{j=i}^{p} \phi_j, i = 2, ..., p.

Unit root: \rho_1 = 1 (AR(p-1) in first differences)

\hat{\tau} distribution holds asymptotically.

256 / 323

Allowing for a nonzero mean in the AR(p) case

(y_t - \mu) + \sum_{j=1}^{p} \phi_j (y_{t-j} - \mu) = \varepsilon_t

or

y_t = \alpha + \rho_1 y_{t-1} + \sum_{j=2}^{p} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t,

where \alpha = \mu\left(1 + \sum_{j=1}^{p} \phi_j\right), and the other parameters are as
above.

In the unit root case, the intercept vanishes, because
\sum_{j=1}^{p} \phi_j = -1.

Dickey-Fuller distribution holds asymptotically.

257 / 323

Allowing for trend under the alternative

(y_t - a - b\,TIME_t) + \sum_{j=1}^{p} \phi_j (y_{t-j} - a - b\,TIME_{t-j}) = \varepsilon_t

or

y_t = k_1 + k_2\,TIME_t + \rho_1 y_{t-1} + \sum_{j=2}^{p} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

where

k_1 = a\left(1 + \sum_{i=1}^{p} \phi_i\right) - b \sum_{i=1}^{p} i\,\phi_i

and

k_2 = b\left(1 + \sum_{i=1}^{p} \phi_i\right)

In the unit root case, k_1 = -b \sum_{i=1}^{p} i\,\phi_i and k_2 = 0.

Dickey-Fuller distribution holds asymptotically.


258 / 323

General ARMA representations: augmented Dickey-Fuller
tests

y_t = \rho_1 y_{t-1} + \sum_{j=2}^{k-1} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

y_t = \alpha + \rho_1 y_{t-1} + \sum_{j=2}^{k-1} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

y_t = k_1 + k_2\,TIME_t + \rho_1 y_{t-1} + \sum_{j=2}^{k-1} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

k-1 augmentation lags have been included

The Dickey-Fuller distributions hold asymptotically under the null


259 / 323
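A sketch (not from the slides) of the augmented Dickey-Fuller regression with intercept: regress y_t on a constant, y_{t-1}, and lagged differences, and form the t-type statistic for the unit-root null. The statistic must be compared with Dickey-Fuller (not Student-t) critical values; packages such as statsmodels tabulate them. The data here are a simulated random walk.

import numpy as np

rng = np.random.default_rng(10)
T, k = 300, 3
y = np.cumsum(rng.normal(size=T))                  # random walk: true unit root

dy = np.diff(y)
rows = []
for t in range(k, T):                              # build regressors for observation t
    rows.append([1.0, y[t - 1]] + list(dy[t - k:t - 1][::-1]))  # const, y_{t-1}, k-1 lagged differences
X = np.array(rows)
Y = y[k:]
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ beta
s2 = e @ e / (len(Y) - X.shape[1])
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())
tau = (beta[1] - 1.0) / se[1]                      # "tau"-type statistic for the unit-root null
print(beta[1], tau)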

Simple moving average smoothing

1. Original data: \{y_t\}_{t=1}^{T}
2. Smoothed data: \{\hat{y}_t\}
3. Two-sided moving average: \hat{y}_t = (2m + 1)^{-1} \sum_{i=-m}^{m} y_{t-i}
4. One-sided moving average: \hat{y}_t = (m + 1)^{-1} \sum_{i=0}^{m} y_{t-i}
5. One-sided weighted moving average: \hat{y}_t = \sum_{i=0}^{m} w_i y_{t-i}

Must choose smoothing parameter, m

260 / 323

Exponential Smoothing

Local level model:

y_t = c_{0t} + \varepsilon_t
c_{0t} = c_{0,t-1} + \eta_t
\eta_t \sim WN(0, \sigma^2)

Exponential smoothing can construct the optimal estimate of
c_0 - and hence the optimal forecast of any future value of y -
on the basis of current and past y

What if the model is misspecified?

261 / 323

Exponential smoothing algorithm

Observed series: \{y_t\}_{t=1}^{T}
Smoothed series: \{\hat{y}_t\}_{t=1}^{T} (estimate of the local level)
Forecasts: \hat{y}_{T+h,T}

1. Initialize at t = 1: \hat{y}_1 = y_1
2. Update: \hat{y}_t = \alpha y_t + (1 - \alpha)\hat{y}_{t-1}, \quad t = 2, ..., T
3. Forecast: \hat{y}_{T+h,T} = \hat{y}_T

Smoothing parameter \alpha \in [0, 1]

262 / 323
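A sketch (not from the slides) of the exponential smoothing recursion above; the smoothing parameter alpha is an arbitrary illustrative choice, not an estimated value.

import numpy as np

def exponential_smooth(y, alpha):
    yhat = np.empty(len(y))
    yhat[0] = y[0]                         # initialize: smoothed value at t=1 equals y_1
    for t in range(1, len(y)):
        yhat[t] = alpha * y[t] + (1 - alpha) * yhat[t - 1]   # update
    return yhat

rng = np.random.default_rng(11)
y = np.cumsum(rng.normal(size=100)) + 50   # a slowly drifting "local level" series
smoothed = exponential_smooth(y, alpha=0.3)
forecast = smoothed[-1]                    # the h-step forecast is the last smoothed value, for all h
print(forecast)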

Demonstration that the weights are exponential

Start:
\hat{y}_t = \alpha y_t + (1 - \alpha)\hat{y}_{t-1}

Substitute backward for \hat{y}:
\hat{y}_t = \sum_{j=0}^{t-1} w_j y_{t-j}

where
w_j = \alpha(1 - \alpha)^j

Exponential weighting, as claimed
Convenient recursive structure
263 / 323

Holt-Winters Smoothing

y_t = c_{0t} + c_{1t} TIME_t + \varepsilon_t
c_{0t} = c_{0,t-1} + \eta_t
c_{1t} = c_{1,t-1} + \nu_t

Local level and slope model

Holt-Winters smoothing can construct optimal estimates of c_0
and c_1 - hence the optimal forecast of any future value of y by
extrapolating the trend - on the basis of current and past y

264 / 323

Holt-Winters smoothing algorithm


1. Initialize at t = 2:
\hat{y}_2 = y_2
F_2 = y_2 - y_1

2. Update:
\hat{y}_t = \alpha y_t + (1 - \alpha)(\hat{y}_{t-1} + F_{t-1}), \quad 0 < \alpha < 1
F_t = \beta(\hat{y}_t - \hat{y}_{t-1}) + (1 - \beta)F_{t-1}, \quad 0 < \beta < 1
t = 3, 4, ..., T.

3. Forecast: \hat{y}_{T+h,T} = \hat{y}_T + hF_T

\hat{y}_t is the estimated level at time t
F_t is the estimated slope at time t


265 / 323
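A sketch (not from the slides) of the Holt-Winters (local level and slope) algorithm above; alpha and beta are illustrative choices rather than estimated values, and the series is simulated.

import numpy as np

def holt_winters(y, alpha, beta, h):
    level, slope = y[1], y[1] - y[0]                  # initialize at t = 2
    for t in range(2, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (prev_level + slope)
        slope = beta * (level - prev_level) + (1 - beta) * slope
    return level + np.arange(1, h + 1) * slope        # forecast: level + h * slope

rng = np.random.default_rng(12)
y = 0.2 * np.arange(200) + np.cumsum(rng.normal(scale=0.5, size=200))
print(holt_winters(y, alpha=0.3, beta=0.1, h=12))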

Random Walk Level and Change

266 / 323

Random Walk With Drift Level and Change

267 / 323

U.S. Per Capita GNP History and Two Forecasts

268 / 323

U.S. Per Capita GNP History, Two Forecasts, and


Realization

269 / 323

Random Walk, Levels Sample Autocorrelation Function


(Top Panel) and Sample Partial Autocorrelation Function
(Bottom Panel)

270 / 323

Random Walk, First Differences Sample Autocorrelation


Function (Top Panel) and Sample Partial Autocorrelation
Function (Bottom Panel)

271 / 323

Log Yen / Dollar Exchange Rate (Top Panel) and Change


in Log Yen / Dollar Exchange Rate (Bottom Panel)

272 / 323

Log Yen / Dollar Exchange Rate Sample


Autocorrelations (Top Panel) and Sample Partial
Autocorrelations (Bottom Panel)

273 / 323

Log Yen / Dollar Exchange Rate, First Differences


Sample Autocorrelations (Top Panel) and Sample Partial
Autocorrelations (Bottom Panel)

274 / 323

Log Yen / Dollar Rate, Levels AIC and SIC Values of


Various ARMA Models

275 / 323

Log Yen / Dollar Exchange Rate Best-Fitting


Deterministic-Trend Model

276 / 323

Log Yen / Dollar Exchange Rate Best-Fitting


Deterministic-Trend Model : Residual Plot

277 / 323

Log Yen / Dollar Rate History and Forecast : AR(2) in


Levels with Linear Trend

278 / 323

Log Yen / Dollar Rate History and Long-Horizon


Forecast : AR(2) in Levels with Linear Trend

279 / 323

Log Yen / Dollar Rate History, Forecast and Realization :


AR(2) in Levels with Linear Trend

280 / 323

Log Yen / Dollar Exchange Rate Augmented


Dickey-Fuller Unit Root Test

281 / 323

Log Yen / Dollar Rate, Changes AIC and SIC Values of


Various ARMA Models

282 / 323

Log Yen / Dollar Exchange Rate Best-Fitting


Stochastic-Trend Model

283 / 323

Log Yen / Dollar Exchange Rate Best-Fitting


Stochastic-Trend Model : Residual Plot

284 / 323

Log Yen / Dollar Rate History and Forecast : AR(1) in


Differences with Intercept

285 / 323

Log Yen / Dollar Rate History and Long-Horizon


Forecast : AR(1) in Differences with Intercept

286 / 323

Log Yen / Dollar Rate History, Forecast and Realization :


AR(1) in Differences with Intercept

287 / 323

Log Yen / Dollar Exchange Rate Holt-Winters


Smoothing

288 / 323

Log Yen / Dollar Rate History and Forecast :


Holt-Winters Smoothing

289 / 323

Log Yen / Dollar Rate History and Long-Horizon


Forecast : Holt-Winters Smoothing

290 / 323

Log Yen / Dollar Rate History, Forecast and Realization :


Holt-Winters Smoothing

291 / 323

Volatility Measurement, Modeling and Forecasting


The main idea:

\varepsilon_t | \Omega_{t-1} \sim (0, \sigma_t^2)
\Omega_{t-1} = \{\varepsilon_{t-1}, \varepsilon_{t-2}, ...\}

We'll look at:

Basic structure and properties
Time variation in volatility and prediction-error variance
ARMA representation in squares
GARCH(1,1) and exponential smoothing
Unconditional symmetry and leptokurtosis
Convergence to normality under temporal aggregation
Estimation and testing

292 / 323

Basic Structure and Properties

Standard models (e.g., ARMA):

Unconditional mean: constant
Unconditional variance: constant
Conditional mean: varies
Conditional variance: constant (unfortunately)
k-step-ahead forecast error variance: depends only on k, not
on t (again unfortunately)

293 / 323

The Basic ARCH Process


y_t = B(L)\varepsilon_t

B(L) = \sum_{i=0}^{\infty} b_i L^i, \quad \sum_{i=0}^{\infty} b_i^2 < \infty, \quad b_0 = 1

\varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

\sigma_t^2 = \omega + \gamma(L)\varepsilon_t^2

\omega > 0, \quad \gamma(L) = \sum_{i=1}^{p} \gamma_i L^i, \quad \gamma_i \geq 0 for all i, \quad \sum \gamma_i < 1.
294 / 323

The Basic ARCH Process cont.


ARCH(1) process:

r_t | \Omega_{t-1} \sim (0, \sigma_t^2)
\sigma_t^2 = \omega + \alpha r_{t-1}^2

Unconditional mean: E(r_t) = 0
Unconditional variance: E(r_t - E(r_t))^2 = \omega / (1 - \alpha)
Conditional mean: E(r_t | \Omega_{t-1}) = 0
Conditional variance: E([r_t - E(r_t | \Omega_{t-1})]^2 | \Omega_{t-1}) = \omega + \alpha r_{t-1}^2

295 / 323

The GARCH Process

y_t = \varepsilon_t
\varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

\sigma_t^2 = \omega + \alpha(L)\varepsilon_t^2 + \beta(L)\sigma_t^2

\alpha(L) = \sum_{i=1}^{p} \alpha_i L^i, \quad \beta(L) = \sum_{i=1}^{q} \beta_i L^i

\omega > 0, \quad \alpha_i \geq 0, \quad \beta_i \geq 0, \quad \sum \alpha_i + \sum \beta_i < 1.

296 / 323

Time Variation in Volatility and Prediction Error Variance

Prediction error variance depends on \Omega_{t-1}

e.g. 1-step-ahead prediction error variance is now
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

Conditional variance is a serially correlated random variable

Again, follows immediately from
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

297 / 323

ARMA Representation in Squares


r_t^2 has the ARMA(1,1) representation:

r_t^2 = \omega + (\alpha + \beta) r_{t-1}^2 - \beta \nu_{t-1} + \nu_t

where \nu_t = r_t^2 - \sigma_t^2

Important result:
The above equation is simply

r_t^2 = (\omega + (\alpha + \beta) r_{t-1}^2 - \beta \nu_{t-1}) + \nu_t = \sigma_t^2 + \nu_t

Thus r_t^2 is a noisy indicator of \sigma_t^2

298 / 323

GARCH(1,1) and Exponential Smoothing


Exponential smoothing recursion:
\hat{\sigma}_t^2 = \gamma r_t^2 + (1 - \gamma)\hat{\sigma}_{t-1}^2

Back substitution yields:
\hat{\sigma}_t^2 = \sum_j w_j r_{t-j}^2
where w_j = \gamma(1 - \gamma)^j

GARCH(1,1):
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

Back substitution yields:
\sigma_t^2 = \frac{\omega}{1 - \beta} + \alpha \sum_{j=1}^{\infty} \beta^{j-1} r_{t-j}^2
299 / 323

Unconditional Symmetry and Leptokurtosis

Volatility clustering produces unconditional leptokurtosis

Conditional symmetry translates into unconditional symmetry


Unexpected agreement with the facts!
Convergence to Normality under Temporal Aggregation

Temporal aggregation of covariance stationary GARCH


processes produces convergence to normality.
Again, unexpected agreement with the facts!

300 / 323

Estimation and Testing


Estimation: easy!
Maximum Likelihood Estimation

L(\theta; r_1, ..., r_T) \approx f(r_T | \Omega_{T-1}; \theta)\, f(r_{T-1} | \Omega_{T-2}; \theta) \cdots

If the conditional densities are Gaussian,

f(r_t | \Omega_{t-1}; \theta) = \frac{1}{\sqrt{2\pi}}\, \sigma_t^{-1}(\theta)\, \exp\left(-\frac{1}{2}\frac{r_t^2}{\sigma_t^2(\theta)}\right).

We can ignore the f(r_p, ..., r_1; \theta) term, yielding the log likelihood:

-\frac{T - p}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=p+1}^{T}\ln\sigma_t^2(\theta) - \frac{1}{2}\sum_{t=p+1}^{T}\frac{r_t^2}{\sigma_t^2(\theta)}

Testing: likelihood ratio tests

Graphical diagnostics: correlogram of squares, correlogram of
squared standardized residuals
301 / 323
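A sketch (not from the slides) of Gaussian maximum-likelihood estimation of a GARCH(1,1), maximizing the likelihood above with SciPy's general-purpose optimizer; the returns are simulated, and a real application would substitute observed returns and a more careful optimizer setup.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(13)
T = 2000
omega_true, alpha_true, beta_true = 0.1, 0.1, 0.85
r = np.empty(T)
sig2 = omega_true / (1 - alpha_true - beta_true)       # start at the unconditional variance
for t in range(T):
    r[t] = np.sqrt(sig2) * rng.normal()
    sig2 = omega_true + alpha_true * r[t] ** 2 + beta_true * sig2

def neg_loglik(params, r):
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf                                  # rule out invalid parameter values
    sig2 = np.empty(len(r))
    sig2[0] = r.var()                                  # initialize at the sample variance
    for t in range(1, len(r)):
        sig2[t] = omega + alpha * r[t - 1] ** 2 + beta * sig2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sig2) + r ** 2 / sig2)

fit = minimize(neg_loglik, x0=np.array([0.05, 0.05, 0.9]), args=(r,), method="Nelder-Mead")
print(fit.x)                                           # estimates of (omega, alpha, beta)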

Variations on Volatility Models

We will look at:


I

Asymmetric response and the leverage effect

Exogenous variables

GARCH-M and time-varying risk premia

302 / 323

Asymmetric Response and the Leverage Effect:
TGARCH and EGARCH

Asymmetric response I: TARCH

Standard GARCH:
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

TARCH:
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \gamma r_{t-1}^2 D_{t-1} + \beta \sigma_{t-1}^2

where
positive return (good news): \alpha effect on volatility
negative return (bad news): \alpha + \gamma effect on volatility
\gamma \neq 0: Asymmetric news response
\gamma > 0: Leverage effect

303 / 323

Asymmetric Response and the Leverage Effect cont.

Asymmetric Response II: EGARCH

\ln(\sigma_t^2) = \omega + \alpha\left|\frac{r_{t-1}}{\sigma_{t-1}}\right| + \gamma\frac{r_{t-1}}{\sigma_{t-1}} + \beta\ln(\sigma_{t-1}^2)

Log specification ensures that the conditional variance is
positive.
Volatility driven by both size and sign of shocks
Leverage effect when \gamma < 0

304 / 323

Introducing Exogenous Variables

r_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2 + \gamma' X_t

where:
\gamma is a parameter vector
X is a set of positive exogenous variables.

305 / 323

Component GARCH
Standard GARCH:
(\sigma_t^2 - \bar{\omega}) = \alpha(r_{t-1}^2 - \bar{\omega}) + \beta(\sigma_{t-1}^2 - \bar{\omega}),
for constant long-run volatility \bar{\omega}.

Component GARCH:
(\sigma_t^2 - q_t) = \alpha(r_{t-1}^2 - q_{t-1}) + \beta(\sigma_{t-1}^2 - q_{t-1}),
for time-varying long-run volatility q_t, where

q_t = \omega + \rho(q_{t-1} - \omega) + \phi(r_{t-1}^2 - \sigma_{t-1}^2)

306 / 323

Component GARCH Cont.

Transitory dynamics governed by \alpha + \beta
Persistent dynamics governed by \rho
Equivalent to nonlinearly restricted GARCH(2,2)
Exogenous variables and asymmetry can be allowed:

(\sigma_t^2 - q_t) = \alpha(r_{t-1}^2 - q_{t-1}) + \gamma(r_{t-1}^2 - q_{t-1})D_{t-1} + \beta(\sigma_{t-1}^2 - q_{t-1}) + \theta' X_t

307 / 323

Regression with GARCH Disturbances

y_t = x_t'\beta + \varepsilon_t

\varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

308 / 323

GARCH-M and Time-Varying Risk Premia


Standard GARCH regression model:
y_t = x_t'\beta + \varepsilon_t, \quad \varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

GARCH-M model is a special case:
y_t = x_t'\beta + \gamma \sigma_t^2 + \varepsilon_t, \quad \varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

Time-varying risk premia in excess returns


309 / 323

Time Series Plot NYSE Returns

310 / 323

Histogram and Related Diagnostic Statistics NYSE


Returns

311 / 323

Correlogram NYSE Returns

312 / 323

Time Series Plot Squared NYSE Returns

313 / 323

Correlogram Squared NYSE Returns

314 / 323

AR(5) Model Squared NYSE Returns

315 / 323

ARCH(5) Model NYSE Returns

316 / 323

Correlogram Standardized ARCH(5) Residuals : NYSE


Returns

317 / 323

GARCH(1,1) Model NYSE Returns

318 / 323

Correlogram Standardized GARCH(1,1) Residuals :


NYSE Returns

319 / 323

Estimated Conditional Standard Deviation GARCH(1,1)


Model : NYSE Returns

320 / 323

Estimated Conditional Standard Deviation Exponential


Smoothing : NYSE Returns

321 / 323

Conditional Standard Deviation History and Forecast :


GARCH(1,1) Model

322 / 323

Conditional Standard Deviation Extended History and


Extended Forecast : GARCH(1,1) Model

323 / 323
