
Forecasting

Francis X. Diebold
University of Pennsylvania

August 11, 2015


1 / 323

Copyright

© 2013 onward, by Francis X. Diebold.

These materials are freely available for your use, but be warned:
they are highly preliminary, significantly incomplete, and rapidly
evolving. All are licensed under the Creative Commons
Attribution-NonCommercial-NoDerivatives 4.0 International
License. (Briefly: I retain copyright, but you can use, copy and
distribute non-commercially, so long as you give me attribution and
do not modify. To view a copy of the license, visit
http://creativecommons.org/licenses/by-nc-nd/4.0/.) In return I
ask that you please cite the books whenever appropriate, as:
Diebold, F.X. (year here), Book Title Here, Department of
Economics, University of Pennsylvania,
http://www.ssc.upenn.edu/~fdiebold/Textbooks.html.

The painting is Enigma, by Glen Josselsohn, from Wikimedia
Commons.
2 / 323

Elements of Forecasting in Business, Finance, Economics
and Government

1. Forecasting in Action
   1.1 Operations planning and control
   1.2 Marketing
   1.3 Economics
   1.4 Financial speculation
   1.5 Financial risk management
   1.6 Capacity planning
   1.7 Business and government budgeting
   1.8 Demography
   1.9 Crisis management
3 / 323

Forecasting Methods: An Overview


Review of probability, statistics and regression
Six Considerations Basic to Successful Forecasting
1. Forecasts and decisions
2. The object to be forecast
3. Forecast types
4. The forecast horizon
5. The information set
6. Methods and complexity
6.1 The parsimony principle
6.2 The shrinkage principle

4 / 323

Statistical Graphics for Forecasting

Why graphical analysis is important

Simple graphical techniques

Elements of graphical style

Application: graphing four components of real GNP

5 / 323

Modeling and Forecasting Trend

Modeling trend

Estimating trend models

Forecasting trend

Selecting forecasting models using the Akaike and Schwarz criteria

Application: forecasting retail sales

6 / 323

Modeling and Forecasting Seasonality

The nature and sources of seasonality

Modeling seasonality

Forecasting seasonal series

Application: forecasting housing starts

7 / 323

Characterizing Cycles

Covariance stationary time series

White noise

The lag operator

Wold's theorem, the general linear process, and rational
distributed lags

Estimation and inference for the mean, autocorrelation and
partial autocorrelation functions

Application: characterizing Canadian employment dynamics

8 / 323

Modeling Cycles: MA, AR and ARMA Models

Moving-average (MA) models

Autoregressive (AR) models

Autoregressive moving average (ARMA) models

Application: specifying and estimating models for forecasting
employment

9 / 323

Forecasting Cycles

Optimal forecasts

Forecasting moving average processes

Forecasting infinite-ordered moving averages

Making the forecasts operational

The chain rule of forecasting

Application: forecasting employment

10 / 323

Putting it all Together: A Forecasting Model with Trend,
Seasonal and Cyclical Components

Assembling what we've learned

Application: forecasting liquor sales

Recursive estimation procedures for diagnosing and selecting
forecasting models

11 / 323

Forecasting with Regression Models

Conditional forecasting models and scenario analysis

Accounting for parameter uncertainty in confidence intervals
for conditional forecasts

Unconditional forecasting models

Distributed lags, polynomial distributed lags, and rational
distributed lags

Regressions with lagged dependent variables, regressions with
ARMA disturbances, and transfer function models

Vector autoregressions

Predictive causality

Impulse-response functions and variance decomposition

Application: housing starts and completions

12 / 323

Evaluating and Combining Forecasts

Evaluating a single forecast

Evaluating two or more forecasts: comparing forecast accuracy

Forecast encompassing and forecast combination

Application: OverSea shipping volume on the Atlantic East
trade lane

13 / 323

Unit Roots, Stochastic Trends, ARIMA Forecasting
Models, and Smoothing

Stochastic trends and forecasting

Unit roots: estimation and testing

Application: modeling and forecasting the yen/dollar
exchange rate

Smoothing

Exchange rates, continued

14 / 323

Volatility Measurement, Modeling and Forecasting

The basic ARCH process

The GARCH process

Extensions of ARCH and GARCH models

Estimating, forecasting and diagnosing GARCH models

Application: stock market volatility

15 / 323

Useful Books, Journals and Software


Books

Statistics review, etc.:

Wonnacott, T.H. and Wonnacott, R.J. (1990), Introductory
Statistics, Fifth Edition. New York: John Wiley and Sons.

Pindyck, R.S. and Rubinfeld, D.L. (1997), Econometric Models
and Economic Forecasts, Fourth Edition. New York:
McGraw-Hill.

Maddala, G.S. (2001), Introduction to Econometrics, Third
Edition. New York: Macmillan.

Kennedy, P. (1998), A Guide to Econometrics, Fourth Edition.
Cambridge, Mass.: MIT Press.

16 / 323

Useful Books, Journals and Software cont.

Time series analysis:

Chatfield, C. (1996), The Analysis of Time Series: An
Introduction, Fifth Edition. London: Chapman and Hall.

Granger, C.W.J. and Newbold, P. (1986), Forecasting
Economic Time Series, Second Edition. Orlando, Florida:
Academic Press.

Harvey, A.C. (1993), Time Series Models, Second Edition.
Cambridge, Mass.: MIT Press.

Hamilton, J.D. (1994), Time Series Analysis. Princeton:
Princeton University Press.

17 / 323

Useful Books, Journals and Software cont.

Special insights:

Armstrong, J.S. (Ed.) (1999), The Principles of Forecasting.
Norwell, Mass.: Kluwer Academic Publishers.

Makridakis, S. and Wheelwright, S.C. (1997), Forecasting:
Methods and Applications, Third Edition. New York: John
Wiley.

Bails, D.G. and Peppers, L.C. (1997), Business Fluctuations.
Englewood Cliffs: Prentice Hall.

Taylor, S. (1996), Modeling Financial Time Series, Second
Edition. New York: Wiley.

18 / 323

Useful Books, Journals and Software cont.

Journals
I

Journal of Forecasting

Journal of Business Forecasting Methods and Systems

Journal of Business and Economic Statistics

Review of Economics and Statistics

Journal of Applied Econometrics

19 / 323

Useful Books, Journals and Software cont.


Software

General:
  Eviews
  S+
  Minitab
  SAS
  R
  Python
  Many more...

Cross-section:
  Stata

Open-ended:
  Matlab

20 / 323

Useful Books, Journals and Software cont.

Online Information
I

Resources for Economists:

21 / 323

A Brief Review of Probability, Statistics, and Regression


for Forecasting
Topics
I Discrete Random Variable
I Discrete Probability Distribution
I Continuous Random Variable
I Probability Density Function
I Moment
I Mean, or Expected Value
I Location, or Central Tendency
I Variance
I Dispersion, or Scale
I Standard Deviation
I Skewness
I Asymmetry
I Kurtosis
I Leptokurtosis
22 / 323

A Brief Review of Probability, Statistics, and Regression


for Forecasting
Topics cont.

Skewness
Asymmetry
Kurtosis
Leptokurtosis
Normal, or Gaussian, Distribution
Marginal Distribution
Joint Distribution
Covariance
Correlation
Conditional Distribution
Conditional Moment
Conditional Mean
Conditional Variance
23 / 323

A Brief Review of Probability, Statistics, and Regression


for Forecasting cont.
Topics cont.

Population Distribution
Sample
Estimator
Statistic, or Sample Statistic
Sample Mean
Sample Variance
Sample Standard Deviation
Sample Skewness
Sample Kurtosis
Chi-Squared (χ²) Distribution
t Distribution
F Distribution
Jarque-Bera Test
24 / 323

Regression as Curve Fitting


Least-squares estimation:

\min_{\beta_0, \beta_1} \sum_{t=1}^{T} [y_t - \beta_0 - \beta_1 x_t]^2

Fitted values:

\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 x_t

Residuals:

e_t = y_t - \hat{y}_t

25 / 323
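A minimal sketch (not part of the original slides) of the least-squares fit, fitted values, and residuals above, using only NumPy; the data are simulated purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
T = 100
x = rng.normal(size=T)
y = 1.0 + 0.5 * x + rng.normal(size=T)              # hypothetical data-generating process

X = np.column_stack([np.ones(T), x])                # regressors: constant and x
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]     # minimizes the sum of squared errors
y_hat = X @ beta_hat                                # fitted values
e = y - y_hat                                       # residuals
print(beta_hat, e @ e)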

Regression as a probabilistic model


Simple regression:

y_t = \beta_0 + \beta_1 x_t + \varepsilon_t,   \varepsilon_t \sim iid(0, \sigma^2)

Multiple regression:

y_t = \beta_0 + \beta_1 x_t + \beta_2 z_t + \varepsilon_t,   \varepsilon_t \sim iid(0, \sigma^2)

26 / 323

Regression as a probabilistic model cont.


Mean dependent var 10.23

\bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t

S.D. dependent var 1.49

SD = \sqrt{ \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T - 1} }

Sum squared resid 43.70

SSR = \sum_{t=1}^{T} e_t^2

27 / 323

Regression as a probabilistic model cont.


F-statistic 30.89

F = \frac{(SSR_{res} - SSR)/(k - 1)}{SSR/(T - k)}

S.E. of regression 0.99

s^2 = \frac{\sum_{t=1}^{T} e_t^2}{T - k}

SER = \sqrt{s^2} = \sqrt{ \frac{\sum_{t=1}^{T} e_t^2}{T - k} }

28 / 323

Regression as a probabilistic model cont.


R-squared 0.58

R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

or

R^2 = 1 - \frac{\frac{1}{T}\sum_{t=1}^{T} e_t^2}{\frac{1}{T}\sum_{t=1}^{T} (y_t - \bar{y})^2}

Adjusted R-squared 0.56

\bar{R}^2 = 1 - \frac{\frac{1}{T-k}\sum_{t=1}^{T} e_t^2}{\frac{1}{T-1}\sum_{t=1}^{T} (y_t - \bar{y})^2}

29 / 323

Regression as a probabilistic model cont.

Akaike info criterion 0.03

AIC = e^{\left(\frac{2k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}

Schwarz criterion 0.15

SIC = T^{\left(\frac{k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}

30 / 323

Regression as a probabilistic model cont.


Durbin-Watson stat 1.97

y_t = \beta_0 + \beta_1 x_t + \varepsilon_t
\varepsilon_t = \phi \varepsilon_{t-1} + v_t,   v_t \sim iid\, N(0, \sigma^2)

DW = \frac{\sum_{t=2}^{T} (e_t - e_{t-1})^2}{\sum_{t=1}^{T} e_t^2}

31 / 323
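A sketch (not from the slides) of the regression summary statistics defined on the preceding slides, computed from an OLS fit with NumPy; the data and regressors are invented, and AIC/SIC follow the e^(2k/T)·SSR/T and T^(k/T)·SSR/T forms above.

import numpy as np

rng = np.random.default_rng(1)
T, k = 120, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=T)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta
ybar = y.mean()
SSR = e @ e
s2 = SSR / (T - k)                                   # s^2
SER = np.sqrt(s2)                                    # standard error of the regression
R2 = 1 - SSR / np.sum((y - ybar) ** 2)
R2bar = 1 - (SSR / (T - k)) / (np.sum((y - ybar) ** 2) / (T - 1))
AIC = np.exp(2 * k / T) * SSR / T
SIC = T ** (k / T) * SSR / T
DW = np.sum(np.diff(e) ** 2) / SSR                   # Durbin-Watson
SSR_res = np.sum((y - ybar) ** 2)                    # restricted SSR (intercept only)
F = ((SSR_res - SSR) / (k - 1)) / (SSR / (T - k))
print(R2, R2bar, AIC, SIC, DW, F)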

Regression of y on x and z

32 / 323

Scatterplot of y versus x

33 / 323

Scatterplot of y versus x Regression Line Superimposed

34 / 323

Scatterplot of y versus z Regression Line Superimposed

35 / 323

Residual Plot Regression of y on x and z

36 / 323

Six Considerations Basic to Successful Forecasting

1. The Decision Environment and Loss Function

L(e) = e^2
L(e) = |e|

2. The Forecast Object

Event outcome, event timing, time series.

3. The Forecast Statement

Point forecast, interval forecast, density forecast, probability
forecast

37 / 323

Six Considerations Basic to Successful Forecasting cont.


4. The Forecast Horizon

h-step-ahead forecast
h-step-ahead extrapolation forecast

5. The Information Set

\Omega_T^{univariate} = \{y_T, y_{T-1}, ..., y_1\}
\Omega_T^{multivariate} = \{y_T, x_T, y_{T-1}, x_{T-1}, ..., y_1, x_1\}

6. Methods and Complexity, the Parsimony Principle, and the
Shrinkage Principle

Signal vs. noise
Smaller is often better
Even incorrect restrictions can help

38 / 323

Six Considerations Basic to Successful Forecasting cont.

Decision Making with Symmetric Loss

                      Demand High    Demand Low
Build Inventory       0              $10,000
Reduce Inventory      $10,000        0

Decision Making with Asymmetric Loss

                      Demand High    Demand Low
Build Inventory       0              $10,000
Reduce Inventory      $20,000        0

39 / 323

Six Considerations Basic to Successful Forecasting cont.

Forecasting with Symmetric Loss

                         High Actual Sales    Low Actual Sales
High Forecasted Sales    0                    $10,000
Low Forecasted Sales     $10,000              0

Forecasting with Asymmetric Loss

                         High Actual Sales    Low Actual Sales
High Forecasted Sales    0                    $10,000
Low Forecasted Sales     $20,000              0

40 / 323

Quadratic Loss

41 / 323

Absolute Loss

42 / 323

Asymmetric Loss

43 / 323

Forecast Statement

44 / 323

Forecast Statement cont.

45 / 323

Extrapolation Forecast

46 / 323

Extrapolation Forecast cont.

47 / 323

Statistical Graphics For Forecasting


1. Why Graphical Analysis is Important

Graphics helps us summarize and reveal patterns in data
Graphics helps us identify anomalies in data
Graphics facilitates and encourages comparison of different
pieces of data
Graphics enables us to present a huge amount of data in a
small space, and it enables us to make huge data sets coherent

2. Simple Graphical Techniques

Univariate, multivariate
Time series vs. distributional shape
Relational graphics

3. Elements of Graphical Style

Know your audience, and know your goals.
Show the data, and appeal to the viewer.
Revise and edit, again and again.

4. Application: Graphing Four Components of Real GNP


48 / 323

Anscombe's Quartet

    (1)            (2)            (3)            (4)
  x1     y1      x2     y2      x3     y3      x4
 10.0   8.04    10.0   9.14    10.0   7.46     8.0
  8.0   6.95     8.0   8.14     8.0   6.77     8.0
 13.0   7.58    13.0   8.74    13.0  12.74     8.0
  9.0   8.81     9.0   8.77     9.0   7.11     8.0
 11.0   8.33    11.0   9.26    11.0   7.81     8.0
 14.0   9.96    14.0   8.10    14.0   8.84     8.0
  6.0   7.24     6.0   6.13     6.0   6.08     8.0
  4.0   4.26     4.0   3.10     4.0   5.39    19.0
 12.0  10.84    12.0   9.13    12.0   8.15     8.0
  7.0   4.82     7.0   7.26     7.0   6.42     8.0
  5.0   5.68     5.0   4.74     5.0   5.73     8.0
49 / 323

Anscombe's Quartet

50 / 323

Anscombe's Quartet Bivariate Scatterplot

51 / 323

1-Year Treasury Bond Rates

52 / 323

Change in 1-Year Treasury Bond Rates

53 / 323

Liquor Sales

54 / 323

Histogram and Descriptive Statistics Change in 1-Year


Treasury Bond Rates

55 / 323

Scatterplot 1-Year vs. 10-year Treasury Bond Rates

56 / 323

Scatterplot Matrix 1-, 10-, 20-, and 30-Year Treasury


Bond Rates

57 / 323

Time Series Plot Aspect Ratio 1:1.6

58 / 323

Time Series Plot Banked to 45 Degrees

59 / 323

Time Series Plot Aspect Ratio 1:1.6

60 / 323

Graph

61 / 323

Graph

62 / 323

Graph

63 / 323

Graph

64 / 323

Graph

65 / 323

Components of Real GDP (Millions of Current Dollars, Annual)

66 / 323

Modeling and Forecasting Trend


1. Modeling Trend

T_t = \beta_0 + \beta_1 TIME_t

T_t = \beta_0 + \beta_1 TIME_t + \beta_2 TIME_t^2

T_t = \beta_0 e^{\beta_1 TIME_t}
\ln(T_t) = \ln(\beta_0) + \beta_1 TIME_t

67 / 323

Modeling and Forecasting Trend


2. Estimating Trend Models

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{T} [y_t - \beta_0 - \beta_1 TIME_t]^2

(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2) = \arg\min_{\beta_0, \beta_1, \beta_2} \sum_{t=1}^{T} [y_t - \beta_0 - \beta_1 TIME_t - \beta_2 TIME_t^2]^2

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{T} [y_t - \beta_0 e^{\beta_1 TIME_t}]^2

(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{T} [\ln y_t - \ln \beta_0 - \beta_1 TIME_t]^2

68 / 323
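A sketch (not from the slides) of estimating the linear, quadratic, and log-linear trend models above by least squares with NumPy; the series is simulated and TIME is simply 1, 2, ..., T.

import numpy as np

rng = np.random.default_rng(2)
T = 200
time = np.arange(1, T + 1, dtype=float)
y = np.exp(0.5 + 0.01 * time) * np.exp(rng.normal(scale=0.05, size=T))  # hypothetical trending series

def ols(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, y - X @ beta

# Linear trend: y_t = beta0 + beta1*TIME_t
b_lin, e_lin = ols(np.column_stack([np.ones(T), time]), y)
# Quadratic trend: y_t = beta0 + beta1*TIME_t + beta2*TIME_t^2
b_quad, e_quad = ols(np.column_stack([np.ones(T), time, time ** 2]), y)
# Log-linear (exponential) trend: ln y_t = ln(beta0) + beta1*TIME_t
b_log, e_log = ols(np.column_stack([np.ones(T), time]), np.log(y))
print(b_lin, b_quad, b_log)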

Modeling and Forecasting Trend


3. Forecasting Trend

y_t = \beta_0 + \beta_1 TIME_t + \varepsilon_t

y_{T+h} = \beta_0 + \beta_1 TIME_{T+h} + \varepsilon_{T+h}

y_{T+h,T} = \beta_0 + \beta_1 TIME_{T+h}

\hat{y}_{T+h,T} = \hat{\beta}_0 + \hat{\beta}_1 TIME_{T+h}

69 / 323

Modeling and Forecasting Trend


3. Forecasting Trend cont.

y_{T+h,T} \pm 1.96\sigma
\hat{y}_{T+h,T} \pm 1.96\hat{\sigma}

N(y_{T+h,T}, \sigma^2)
N(\hat{y}_{T+h,T}, \hat{\sigma}^2)

70 / 323

Modeling and Forecasting Trend

4. Selecting Forecasting Models

MSE = \frac{\sum_{t=1}^{T} e_t^2}{T}

R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

71 / 323

Modeling and Forecasting Trend

4. Selecting Forecasting Models cont.

s^2 = \frac{\sum_{t=1}^{T} e_t^2}{T - k} = \left(\frac{T}{T - k}\right)\frac{\sum_{t=1}^{T} e_t^2}{T}

\bar{R}^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2 / (T - k)}{\sum_{t=1}^{T} (y_t - \bar{y})^2 / (T - 1)} = 1 - \frac{s^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2 / (T - 1)}

72 / 323

Modeling and Forecasting Trend

4. Selecting Forecasting Models cont.

AIC = e^{\left(\frac{2k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}

SIC = T^{\left(\frac{k}{T}\right)} \frac{\sum_{t=1}^{T} e_t^2}{T}

Consistency

Efficiency

73 / 323
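A sketch (not from the slides) of model selection using the AIC and SIC formulas above; the residual vectors stand in for residuals from fits like the trend regressions sketched earlier and are simulated placeholders here.

import numpy as np

def aic(e, k):
    T = len(e)
    return np.exp(2 * k / T) * (e @ e) / T

def sic(e, k):
    T = len(e)
    return T ** (k / T) * (e @ e) / T

rng = np.random.default_rng(3)
residuals = {"linear": (rng.normal(scale=1.2, size=150), 2),
             "quadratic": (rng.normal(scale=1.0, size=150), 3)}
for name, (e, k) in residuals.items():
    print(name, aic(e, k), sic(e, k))
# Select the model with the smallest criterion value; SIC penalizes extra parameters more heavily.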

Labor Force Participation Rate

74 / 323

Increasing and Decreasing Labor Trends

75 / 323

Labor Force Participation Rate

76 / 323

Linear Trend Female Labor Force Participation Rate

77 / 323

Linear Trend Male Labor Force Participation Rate

78 / 323

Volume on the New York Stock Exchange

79 / 323

Various Shapes of Quadratic Trends

80 / 323

Quadratic Trend Volume on the New York Stock


Exchange

81 / 323

Log Volume on the New York Stock Exchange

82 / 323

Various Shapes of Exponential Trends

83 / 323

Linear Trend Log Volume on the New York Stock


Exchange

84 / 323

Exponential Trend Volume on the New York Stock


Exchange

85 / 323

Degree-of-Freedom Penalties Various Model Selection


Criteria

86 / 323

Retail Sales

87 / 323

Retail Sales Linear Trend Regression

88 / 323

Retail Sales Linear Trend Residual Plot

89 / 323

Retail Sales Quadratic Trend Regression

90 / 323

Retail Sales Quadratic Trend Residual Plot

91 / 323

Retail Sales Log Linear Trend Regression

92 / 323

Retail Sales Log Linear Trend Residual Plot

93 / 323

Retail Sales Exponential Trend Regression

94 / 323

Retail Sales Exponential Trend Residual Plot

95 / 323

Model Selection Criteria

Linear, Quadratic and Exponential Trend Models

        Linear Trend    Quadratic Trend    Exponential Trend
AIC     19.35           15.83              17.15
SIC     19.37           15.86              17.17

96 / 323

Retail Sales History January, 1990 December, 1994

97 / 323

Retail Sales History January, 1990 December, 1994

98 / 323

Retail Sales History January, 1990 December, 1994

99 / 323

Retail Sales History January, 1990 December, 1994

100 / 323

Modeling and Forecasting Seasonality


1. The Nature and Sources of Seasonality

2. Modeling Seasonality

D1 = (1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, ...)
D2 = (0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, ...)
D3 = (0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, ...)
D4 = (0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, ...)

y_t = \sum_{i=1}^{s} \gamma_i D_{it} + \varepsilon_t

y_t = \beta_1 TIME_t + \sum_{i=1}^{s} \gamma_i D_{it} + \varepsilon_t

y_t = \beta_1 TIME_t + \sum_{i=1}^{s} \gamma_i D_{it} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{it} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{it} + \varepsilon_t
101 / 323
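A sketch (not from the slides) of the trend-plus-seasonal-dummies regression above for a quarterly series (s = 4), built with NumPy; holiday and trading-day variables are omitted and the data are simulated.

import numpy as np

rng = np.random.default_rng(4)
T, s = 120, 4
time = np.arange(1, T + 1, dtype=float)
season = np.arange(T) % s                            # 0,1,2,3,0,1,...
D = (season[:, None] == np.arange(s)).astype(float)  # full set of s seasonal dummies
gamma_true = np.array([10.0, 12.0, 9.0, 11.0])
y = 0.05 * time + D @ gamma_true + rng.normal(size=T)

X = np.column_stack([time, D])                       # no separate intercept: the dummies span it
coef = np.linalg.lstsq(X, y, rcond=None)[0]
beta1_hat, gamma_hat = coef[0], coef[1:]
print(beta1_hat, gamma_hat)                          # estimated trend slope and seasonal factors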

Modeling and Forecasting Seasonality


3. Forecasting Seasonal Series

y_t = \beta_1 TIME_t + \sum_{i=1}^{s} \gamma_i D_{it} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{it} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{it} + \varepsilon_t

y_{T+h} = \beta_1 TIME_{T+h} + \sum_{i=1}^{s} \gamma_i D_{i,T+h} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{i,T+h} + \varepsilon_{T+h}

y_{T+h,T} = \beta_1 TIME_{T+h} + \sum_{i=1}^{s} \gamma_i D_{i,T+h} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{i,T+h}

\hat{y}_{T+h,T} = \hat{\beta}_1 TIME_{T+h} + \sum_{i=1}^{s} \hat{\gamma}_i D_{i,T+h} + \sum_{i=1}^{v_1} \hat{\delta}_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \hat{\delta}_i^{TD} TDV_{i,T+h}
102 / 323

Modeling and Forecasting Seasonality

3. Forecasting Seasonal Series cont.

y_{T+h,T} \pm 1.96\sigma
\hat{y}_{T+h,T} \pm 1.96\hat{\sigma}

N(y_{T+h,T}, \sigma^2)
N(\hat{y}_{T+h,T}, \hat{\sigma}^2)

103 / 323

Gasoline Sales

104 / 323

Liquor Sales

105 / 323

Durable Goods Sales

106 / 323

Housing Starts, January, 1946 November, 1994

107 / 323

Housing Starts, January, 1990 November, 1994

108 / 323

Housing Starts Regression Results - Seasonal Dummy


Variable Model

109 / 323

Residual Plot

110 / 323

Housing Starts Estimated Seasonal Factors

111 / 323

Housing Starts

112 / 323

Housing Starts

113 / 323

Characterizing Cycles
1. Covariance Stationary Time Series

Realization
Sample Path
Covariance Stationary

E y_t = \mu_t
E y_t = \mu

\gamma(t, \tau) = cov(y_t, y_{t-\tau}) = E(y_t - \mu)(y_{t-\tau} - \mu)
\gamma(t, \tau) = \gamma(\tau)

\rho(\tau) = \frac{cov(y_t, y_{t-\tau})}{\sqrt{var(y_t)}\sqrt{var(y_{t-\tau})}} = \frac{\gamma(\tau)}{\gamma(0)}
114 / 323

1.Characterizing Cycles Cont.

corr(x, y) = \frac{cov(x, y)}{\sigma_x \sigma_y}

\rho(\tau) = \frac{\gamma(\tau)}{\gamma(0)}, \quad \tau = 0, 1, 2, ....

\rho(\tau) = \frac{cov(y_t, y_{t-\tau})}{\sqrt{var(y_t)}\sqrt{var(y_{t-\tau})}} = \frac{\gamma(\tau)}{\gamma(0)}

p(\tau): the coefficient on y_{t-\tau} in the population regression of y_t on y_{t-1}, ..., y_{t-\tau}

115 / 323

2.White Noise

y_t \sim WN(0, \sigma^2)

y_t \sim iid(0, \sigma^2)

y_t \sim iid\, N(0, \sigma^2)

116 / 323

2.White Noise Cont.

E(y_t) = 0
var(y_t) = \sigma^2
E(y_t | \Omega_{t-1}) = 0
var(y_t | \Omega_{t-1}) = E[(y_t - E(y_t | \Omega_{t-1}))^2 | \Omega_{t-1}] = \sigma^2

117 / 323

3.The Lag Operator


L y_t = y_{t-1}
L^2 y_t = L(L(y_t)) = L(y_{t-1}) = y_{t-2}
L^m y_t = y_{t-m}

B(L) = b_0 + b_1 L + b_2 L^2 + ... + b_m L^m

\Delta y_t = (1 - L) y_t = y_t - y_{t-1}
(1 + .9L + .6L^2) y_t = y_t + .9 y_{t-1} + .6 y_{t-2}

B(L) = b_0 + b_1 L + b_2 L^2 + ... = \sum_{i=0}^{\infty} b_i L^i

B(L)\varepsilon_t = b_0 \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + ... = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}

118 / 323

4. Wold's Theorem, the General Linear Process, and
Rational Distributed Lags

Wold's Theorem
Let {y_t} be any zero-mean covariance-stationary process. Then:

y_t = B(L)\varepsilon_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}

\varepsilon_t \sim WN(0, \sigma^2)

where b_0 = 1 and \sum_{i=0}^{\infty} b_i^2 < \infty
119 / 323

The General Linear Process

y_t = B(L)\varepsilon_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}

\varepsilon_t \sim WN(0, \sigma^2),

where b_0 = 1 and \sum_{i=0}^{\infty} b_i^2 < \infty

120 / 323

The General Linear Process Cont.

E(y_t) = E\left(\sum_{i=0}^{\infty} b_i \varepsilon_{t-i}\right) = \sum_{i=0}^{\infty} b_i E(\varepsilon_{t-i}) = \sum_{i=0}^{\infty} b_i \cdot 0 = 0

var(y_t) = var\left(\sum_{i=0}^{\infty} b_i \varepsilon_{t-i}\right) = \sum_{i=0}^{\infty} b_i^2 var(\varepsilon_{t-i}) = \sum_{i=0}^{\infty} b_i^2 \sigma^2 = \sigma^2 \sum_{i=0}^{\infty} b_i^2

E(y_t | \Omega_{t-1}) = E(\varepsilon_t | \Omega_{t-1}) + b_1 E(\varepsilon_{t-1} | \Omega_{t-1}) + b_2 E(\varepsilon_{t-2} | \Omega_{t-1}) + ... = \sum_{i=1}^{\infty} b_i \varepsilon_{t-i}

var(y_t | \Omega_{t-1}) = E[(y_t - E(y_t | \Omega_{t-1}))^2 | \Omega_{t-1}] = E(\varepsilon_t^2 | \Omega_{t-1}) = E(\varepsilon_t^2) = \sigma^2
121 / 323

Rational Distributed Lags

B(L) = \frac{\Theta(L)}{\Phi(L)}

\Theta(L) = \sum_{i=0}^{q} \theta_i L^i

\Phi(L) = \sum_{i=0}^{p} \phi_i L^i

B(L) \approx \frac{\Theta(L)}{\Phi(L)}
122 / 323

5. Estimation and Inference for the Mean, Autocorrelation
and Partial Autocorrelation Functions

\bar{y} = \frac{1}{T} \sum_{t=1}^{T} y_t

\rho(\tau) = \frac{E[(y_t - \mu)(y_{t-\tau} - \mu)]}{E[(y_t - \mu)^2]}

\hat{\rho}(\tau) = \frac{\frac{1}{T}\sum_{t=\tau+1}^{T} [(y_t - \bar{y})(y_{t-\tau} - \bar{y})]}{\frac{1}{T}\sum_{t=1}^{T} (y_t - \bar{y})^2} = \frac{\sum_{t=\tau+1}^{T} [(y_t - \bar{y})(y_{t-\tau} - \bar{y})]}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

\hat{y}_t = \hat{c} + \hat{\phi}_1 y_{t-1} + ... + \hat{\phi}_\tau y_{t-\tau}, \quad \hat{p}(\tau) \equiv \hat{\phi}_\tau

\hat{\rho}(\tau), \hat{p}(\tau) \sim N\left(0, \frac{1}{T}\right)
123 / 323

5. Estimation and Inference for the Mean, Autocorrelation
and Partial Autocorrelation Functions cont.

\hat{\rho}(\tau) \sim N\left(0, \frac{1}{T}\right)

\sqrt{T}\,\hat{\rho}(\tau) \sim N(0, 1)

T\,\hat{\rho}^2(\tau) \sim \chi^2_1

Q_{BP} = T \sum_{\tau=1}^{m} \hat{\rho}^2(\tau)

Q_{LB} = T(T+2) \sum_{\tau=1}^{m} \left(\frac{1}{T - \tau}\right) \hat{\rho}^2(\tau)
124 / 323
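A sketch (not from the slides) of the sample autocorrelation function and the Ljung-Box Q-statistic defined above, written with NumPy; the chi-squared p-value uses SciPy, which is assumed to be available.

import numpy as np
from scipy import stats

def sample_acf(y, max_lag):
    y = np.asarray(y, dtype=float)
    T = len(y)
    yb = y - y.mean()
    denom = np.sum(yb ** 2)
    return np.array([np.sum(yb[lag:] * yb[:T - lag]) / denom for lag in range(1, max_lag + 1)])

def ljung_box(y, m):
    T = len(y)
    r = sample_acf(y, m)
    Q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, m + 1)))
    return Q, stats.chi2.sf(Q, df=m)   # asymptotically chi-squared with m d.f. under white noise

rng = np.random.default_rng(5)
y = rng.normal(size=200)               # white noise: sample autocorrelations should be ~ N(0, 1/T)
print(sample_acf(y, 4), ljung_box(y, 12))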

A Rigid Cycle Pattern

125 / 323

Autocorrelation Function, One-Sided Gradual Damping

126 / 323

Autocorrelation Function, Non-Damping

127 / 323

Autocorrelation Function, Gradual Damped Oscillation

128 / 323

Autocorrelation Function, Sharp Cutoff

129 / 323

Realization of White Noise Process

130 / 323

Population Autocorrelation Function of White Noise


Process

131 / 323

Population Partialautocorrelation Function of White Noise


Process

132 / 323

Canadian Employment Index

133 / 323

Canadian Employment Index Correlogram


Sample: 1962:1 1993:4
Included observations: 128
Acorr.

P. Acorr.

Std. Error

Ljung-Box

0.949

.088
.088
.088
.088
.088
.088
.088
.088
.088
.088
.088
.088

118.07
219.66
303.72
370.82
422.27
460.00
486.32
503.41
512.70
516.43
517.20
517.21

p-v

v
1 0.949
2 0.877
3 0.795
4 0.707
5 0.617
6 0.526
7 0.438
8 0.351
9 0.258
10 0.163
11 0.073
12 0.005

0.244
0.101
0.070
0.063
0.048
0.033
0.049
0.149
0.070
0.011
0.016

0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
134 / 323

Canadian Employment Index, Sample Autocorrelation and


Partial Autocorrelation Functions

135 / 323

Modeling Cycles: MA,AR, and ARMA Models

The MA(1) Process

y_t = \varepsilon_t + \theta \varepsilon_{t-1} = (1 + \theta L)\varepsilon_t

\varepsilon_t \sim WN(0, \sigma^2)

If invertible:
y_t = \varepsilon_t + \theta y_{t-1} - \theta^2 y_{t-2} + \theta^3 y_{t-3} - ...

136 / 323

Modeling Cycles: MA,AR, and ARMA Models Cont.

E(y_t) = E(\varepsilon_t) + \theta E(\varepsilon_{t-1}) = 0

var(y_t) = var(\varepsilon_t) + \theta^2 var(\varepsilon_{t-1}) = \sigma^2 + \theta^2 \sigma^2 = \sigma^2(1 + \theta^2)

E(y_t | \Omega_{t-1}) = E((\varepsilon_t + \theta \varepsilon_{t-1}) | \Omega_{t-1}) = E(\varepsilon_t | \Omega_{t-1}) + \theta E(\varepsilon_{t-1} | \Omega_{t-1}) = \theta \varepsilon_{t-1}

var(y_t | \Omega_{t-1}) = E[(y_t - E(y_t | \Omega_{t-1}))^2 | \Omega_{t-1}] = E(\varepsilon_t^2 | \Omega_{t-1}) = E(\varepsilon_t^2) = \sigma^2

137 / 323

The MA(q) Process

y_t = \varepsilon_t + \theta_1 \varepsilon_{t-1} + ... + \theta_q \varepsilon_{t-q} = \Theta(L)\varepsilon_t

\varepsilon_t \sim WN(0, \sigma^2)

where
\Theta(L) = 1 + \theta_1 L + ... + \theta_q L^q

138 / 323

The AR(1) Process

y_t = \phi y_{t-1} + \varepsilon_t

\varepsilon_t \sim WN(0, \sigma^2)

If covariance stationary:
y_t = \varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + ...

139 / 323

Moment Structure

E(y_t) = E(\varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + ...)
       = E(\varepsilon_t) + \phi E(\varepsilon_{t-1}) + \phi^2 E(\varepsilon_{t-2}) + ...
       = 0

var(y_t) = var(\varepsilon_t + \phi \varepsilon_{t-1} + \phi^2 \varepsilon_{t-2} + ...)
         = \sigma^2 + \phi^2 \sigma^2 + \phi^4 \sigma^2 + ...
         = \sigma^2 \sum_{i=0}^{\infty} \phi^{2i}
         = \frac{\sigma^2}{1 - \phi^2}

140 / 323

Moment Structure Cont.


E(y_t | y_{t-1}) = E((\phi y_{t-1} + \varepsilon_t) | y_{t-1})
                = \phi E(y_{t-1} | y_{t-1}) + E(\varepsilon_t | y_{t-1})
                = \phi y_{t-1} + 0
                = \phi y_{t-1}

var(y_t | y_{t-1}) = var((\phi y_{t-1} + \varepsilon_t) | y_{t-1})
                   = \phi^2 var(y_{t-1} | y_{t-1}) + var(\varepsilon_t | y_{t-1})
                   = 0 + \sigma^2
                   = \sigma^2
141 / 323

Moment Structure Cont.


Autocovariances and autocorrelations:

y_t = \phi y_{t-1} + \varepsilon_t
y_t y_{t-\tau} = \phi y_{t-1} y_{t-\tau} + \varepsilon_t y_{t-\tau}

For \tau \geq 1,
\gamma(\tau) = \phi \gamma(\tau - 1)
(Yule-Walker equation). But

\gamma(0) = \frac{\sigma^2}{1 - \phi^2}
142 / 323

Moment Structure Cont.

Thus

\gamma(\tau) = \phi^\tau \frac{\sigma^2}{1 - \phi^2}, \quad \tau = 0, 1, 2, ....

and

\rho(\tau) = \phi^\tau, \quad \tau = 0, 1, 2, ....

Partial autocorrelations:

143 / 323

The AR(p) Process

y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + \varepsilon_t

\varepsilon_t \sim WN(0, \sigma^2)

144 / 323

The ARMA(1,1) Process


y_t = \phi y_{t-1} + \varepsilon_t + \theta \varepsilon_{t-1}

\varepsilon_t \sim WN(0, \sigma^2)

MA representation if invertible:

y_t = \frac{(1 + \theta L)}{(1 - \phi L)} \varepsilon_t

AR representation if covariance stationary:

\frac{(1 - \phi L)}{(1 + \theta L)} y_t = \varepsilon_t

145 / 323

The ARMA(p,q) Process

y_t = \phi_1 y_{t-1} + ... + \phi_p y_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + ... + \theta_q \varepsilon_{t-q}

\varepsilon_t \sim WN(0, \sigma^2)

\Phi(L) y_t = \Theta(L) \varepsilon_t

146 / 323

Realization of Two MA(1) Processes

147 / 323

Population Autocorrelation Function MA(1) Process

= .4

148 / 323

Population Autocorrelation Function MA(1) Process

= .95

149 / 323

Population Partial Autocorrelation Function MA(1)


Process

= .4

150 / 323

Population Partial Autocorrelation Function MA(1)


Process

= .95

151 / 323

Realization of Two AR(1) Processes

152 / 323

Population Autocorrelation Function AR(1) Process

= .4

153 / 323

Population Autocorrelation Function AR(1) Process

= .95

154 / 323

Population Partial Autocorrelation Function AR(1) Process

= .4

155 / 323

Population Partial Autocorrelation Function AR(1) Process

= .95

156 / 323

Population Autocorrelation Function AR(2) Process with


Complex Roots

157 / 323

Employment: MA(4) Model

Sample: 1962:1 1993:4


Included observations: 128

158 / 323

Employment MA(4) Residual Plot

159 / 323

Employment: MA(4) Model


Residual Correlogram Sample: 1962:1 1993:4
Included observations: 128
Qstatistic probabilities adjusted for 4 ARMA term(s)
Acorr. P. Acorr.
1
0.345
2
0.660
3 0.534
4
0.427
5 0.347
6
0.484
7 0.121
8
0.348
9 0.148
10 0.102
11 0.081
12 0.029

0.345
0.614
0.426
0.042
-0.398
0.145
0.118
0.048
0.019
0.066
0.098
0.113

Std. Error

Ljung-Box

.088
.088
.088
.088
.088
.088
.088
.088
.088
.088
.088
.088

73.089
111.01
135.49
151.79
183.70
185.71
202.46
205.50
206.96
207.89
208.01

p-v

15.614

0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
160 / 323

Employment: MA(4) Model


Residual Sample Autocorrelation and Partial Autocorrelation
Functions, With Plus or Minus Two Standard Error Bands

161 / 323

Employment: AR(2) Model

LS // Dependent Variable is CANEMP


Sample: 1962:1 1993:4
Included observations: 128
Convergence achieved after 3 iterations
Variable Coefficient Std. Error tStatistic Prob.

162 / 323

Employment AR(2) Model Residual Plot

163 / 323

Employment AIC Values of Various ARMA Models

                        MA Order
              0        1        2        3        4
AR Order
   0                   2.86     2.32     2.47     2.20
   1         1.01      .83      .79      .80      .81
   2         .762      .77      .78      .80      .80
   3         .77       .761     .77      .78      .79
   4         .79       .79      .77      .79      .80

164 / 323

Employment SIC Values of Various ARMA Models

                        MA Order
              0        1        2        3        4
AR Order
   0                   2.91     2.38     2.56     2.31
   1         1.05      .90      .88      .91      .94
   2         .83       .86      .89      .92      .96
   3         .86       .87      .90      .94      .96
   4         .90       .92      .93      .97      1.00

165 / 323

Employment: ARMA(3,1) Model

LS // Dependent Variable is CANEMP


Sample: 1962:1 1993:4
Included observations: 128
Convergence achieved after 17 iterations
Variable Coefficient Std. Error tStatistic Prob.

166 / 323

Employment ARMA(3) Model Residual Plot

167 / 323

Employment: ARMA(3,1) Model Residual Correlogram


Sample: 1962:1 1993:4
Included observations: 128
Qstatistic probabilities adjusted for 4 ARMA term(s)
Acorr. P. Acorr.
1
0.032
2 0.041
3 0.014
4 0.048
5 0.006
6 0.013
7 0.017
8 0.064
9 0.092
10 0.039
11 0.016
12 0.137

0.032
0.040
0.017
0.047
0.007
0.009
0.019
0.060
0.097
0.040
0.022
0.153

Std. Error

Ljung-Box

.09
.09
.09
.09
.09
.09
.09
.09
.09
.09
.09
.09

0.1376
0.3643
0.3904
0.6970
0.7013
0.7246
0.7650
1.3384
2.5182
2.7276
2.7659
5.4415

p-v

0.402
0.696
0.858
0.855
0.774
0.842
0.906
0.710
168 / 323

Employment: ARMA(3) Model

Residual Sample Autocorrelation and Partial Autocorrelation


Functions, With Plus or Minus Two Standard Error Bands

169 / 323

Forecasting Cycles
\Omega_T = \{y_T, y_{T-1}, y_{T-2}, ...\}
\Omega_T = \{\varepsilon_T, \varepsilon_{T-1}, \varepsilon_{T-2}, ...\}

Optimal Point Forecasts for Infinite-Order Moving Averages

y_t = \sum_{i=0}^{\infty} b_i \varepsilon_{t-i}

where
\varepsilon_t \sim WN(0, \sigma^2),
170 / 323

Forecasting Cycles Cont.


b_0 = 1, and \sum_{i=0}^{\infty} b_i^2 < \infty.

y_{T+h} = \varepsilon_{T+h} + b_1 \varepsilon_{T+h-1} + ... + b_h \varepsilon_T + b_{h+1} \varepsilon_{T-1} + ...

y_{T+h,T} = b_h \varepsilon_T + b_{h+1} \varepsilon_{T-1} + ...

e_{T+h,T} = (y_{T+h} - y_{T+h,T}) = \sum_{i=0}^{h-1} b_i \varepsilon_{T+h-i}

\sigma_h^2 = \sigma^2 \sum_{i=0}^{h-1} b_i^2
171 / 323

Interval and Density Forecasts


y_{T+h} = y_{T+h,T} + e_{T+h,T}.

95% h-step-ahead interval forecast:
y_{T+h,T} \pm 1.96\sigma_h

h-step-ahead density forecast:
N(y_{T+h,T}, \sigma_h^2)

Making the Forecasts Operational
The Chain Rule of Forecasting
172 / 323
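A minimal sketch (not from the slides) of the chain rule of forecasting applied to an AR(1) model, together with the h-step interval implied by the variance formula above; the coefficient and innovation standard deviation are placeholder values, as if already estimated.

import numpy as np

phi, sigma = 0.8, 1.0            # assumed AR(1) coefficient and innovation std. dev.
y_T = 2.5                        # last observed value
H = 8

point = np.empty(H)
last = y_T
for h in range(H):
    last = phi * last            # chain rule: y_{T+h,T} = phi * y_{T+h-1,T}
    point[h] = last

# h-step forecast-error variance: sigma^2 * (1 + phi^2 + ... + phi^(2(h-1)))
var_h = sigma ** 2 * np.cumsum(phi ** (2 * np.arange(H)))
lower = point - 1.96 * np.sqrt(var_h)
upper = point + 1.96 * np.sqrt(var_h)
print(np.column_stack([point, lower, upper]))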

Employment History and Forecast MA(4) Model

173 / 323

Employment History and Long-Horizon Forecast MA(4)


Model

174 / 323

Employment History, Forecast and Realization MA(4)


Model

175 / 323

Employment History and Forecast AR(2) Model

176 / 323

Employment History and Long-Horizon Forecast AR(2)


Model

177 / 323

Employment History and Very Long-Horizon Forecast


AR(2) Model

178 / 323

Employment History, Forecast and Realization AR(2)


Model

179 / 323

Putting it all Together


A Forecast Model with Trend, Seasonal and Cyclical Components

The full model:

y_t = T_t(\theta) + \sum_{i=1}^{s} \gamma_i D_{it} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{it} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{it} + \varepsilon_t

\Phi(L)\varepsilon_t = \Theta(L)v_t

\Phi(L) = 1 - \phi_1 L - ... - \phi_p L^p
\Theta(L) = 1 + \theta_1 L + ... + \theta_q L^q

v_t \sim WN(0, \sigma^2).
180 / 323

Point Forecasting

y_{T+h} = T_{T+h}(\theta) + \sum_{i=1}^{s} \gamma_i D_{i,T+h} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{i,T+h} + \varepsilon_{T+h}

y_{T+h,T} = T_{T+h}(\theta) + \sum_{i=1}^{s} \gamma_i D_{i,T+h} + \sum_{i=1}^{v_1} \delta_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \delta_i^{TD} TDV_{i,T+h} + \varepsilon_{T+h,T}

\hat{y}_{T+h,T} = T_{T+h}(\hat{\theta}) + \sum_{i=1}^{s} \hat{\gamma}_i D_{i,T+h} + \sum_{i=1}^{v_1} \hat{\delta}_i^{HD} HDV_{i,T+h} + \sum_{i=1}^{v_2} \hat{\delta}_i^{TD} TDV_{i,T+h} + \hat{\varepsilon}_{T+h,T}
181 / 323

Interval Forecasting and Density Forecasting

Interval Forecasting:

\hat{y}_{T+h,T} \pm z_{\alpha/2} \hat{\sigma}_h

e.g. (95% interval):
\hat{y}_{T+h,T} \pm 1.96 \hat{\sigma}_h

Density Forecasting:
N(\hat{y}_{T+h,T}, \hat{\sigma}_h^2)

182 / 323

Recursive Estimation

y_t = \sum_{k=1}^{K} \beta_k x_{kt} + \varepsilon_t

\varepsilon_t \sim iid\, N(0, \sigma^2), \quad t = 1, ..., T.

OLS estimation uses the full sample, t = 1, ..., T.
Recursive least squares uses an expanding sample.
Begin with the first K observations and estimate the model.
Then estimate using the first K + 1 observations, and so on.
At the end we have a set of recursive parameter estimates:
\hat{\beta}_{k,t}, for k = 1, ..., K and t = K, ..., T.
183 / 323

Recursive Residuals
At each t, t = K, ..., T - 1, compute a 1-step forecast,

\hat{y}_{t+1,t} = \sum_{k=1}^{K} \hat{\beta}_{kt} x_{k,t+1}.

The corresponding forecast errors, or recursive residuals, are

\hat{e}_{t+1,t} = y_{t+1} - \hat{y}_{t+1,t}.

\hat{e}_{t+1,t} \sim N(0, \sigma^2 r_t)

where r_t > 1 for all t

184 / 323

Standardized Recursive Residuals and CUSUM

\hat{w}_{t+1,t} \equiv \frac{\hat{e}_{t+1,t}}{\hat{\sigma}\sqrt{r_t}}, \quad t = K, ..., T - 1.

Under the maintained assumptions,
\hat{w}_{t+1,t} \sim iid\, N(0, 1).

Then

CUSUM_t \equiv \sum_{\tau=K}^{t} \hat{w}_{\tau+1,\tau}, \quad t = K, ..., T - 1

is just a sum of iid N(0, 1)'s.


185 / 323
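A sketch (not from the slides) of recursive least squares, standardized recursive residuals, and the CUSUM, following the definitions above and implemented directly with NumPy; for simplicity the plug-in estimate of sigma comes from a full-sample fit, which is an assumption of this sketch.

import numpy as np

rng = np.random.default_rng(6)
T, K = 120, 2
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=T)

beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
sigma2 = np.sum((y - X @ beta_full) ** 2) / (T - K)   # full-sample estimate of sigma^2

w = []
for t in range(K, T):                                 # forecast observation t from data through t-1
    Xt, yt = X[:t], y[:t]
    beta_t = np.linalg.lstsq(Xt, yt, rcond=None)[0]
    x_next = X[t]
    e_next = y[t] - x_next @ beta_t                   # recursive residual
    r_t = 1.0 + x_next @ np.linalg.inv(Xt.T @ Xt) @ x_next
    w.append(e_next / np.sqrt(sigma2 * r_t))          # standardized recursive residual
cusum = np.cumsum(w)                                  # approximately a cumulated sum of iid N(0,1)
print(cusum[-5:])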

Liquor Sales, 1968.1-1993.12

186 / 323

Log Liquor Sales, 1968.01 - 1993.12

187 / 323

Log Liquor Sales: Quadratic Trend Regression

188 / 323

Liquor Sales Quadratic Trend Regression Residual Plot

189 / 323

Liquor Sales Quadratic Trend Regression Residual


Correlogram
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

Acorr.
0.117
0.149
0.106
0.014
0.142
0.041
0.134
0.029
0.136
0.205
0.056
0.888
0.055
0.187
0.159

P. Acorr.
0.117
0.165
0.069
0.017
0.125
0.004
0.175
0.046
0.080
0.206
0.080
0.879
0.507
0.159
0.144

Std. Error
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056

Ljung-Box
4.3158 0.0
11.365 0.0
14.943 0.0
15.007 0.0
21.449 0.0
21.979 0.0
27.708 0.0
27.975 0.0
33.944 0.0
47.611 0.0
48.632 0.0
306.26 0.0
307.25 0.0
318.79 0.0
327.17
0.0
190 / 323

Liquor Sales Quadratic Trend Regression Residual Sample


Autocorrelation Functions

191 / 323

Liquor Sales Quadratic Trend Regression Residual Partial


Autocorrelation Functions

192 / 323

Log Liquor Sales: Quadratic Trend Regression With


Seasonal Dummies and AR(3) Disturbances

193 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies Residual Plot

194 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies Residual Correlogram
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15

Acorr.
0.700
0.686
0.725
0.569
0.569
0.577
0.460
0.480
0.466
0.327
0.364
0.355
0.225
0.291
0.211

P. Acorr.Std.Error
0.700
.056
0.383
.056
0.369
.056
0.141
.056
0.017
.056
0.093
.056
0.078
.056
0.043
.056
0.030
.056
0.188
.056
0.019
.056
0.089
.056
0.119
.056
0.065
.056
0.119
.056

Ljung-Box
p-value
154.34 0.000
302.86 0.000
469.36 0.000
572.36 0.000
675.58 0.000
782.19 0.000
850.06 0.000
924.38 0.000
994.46 0.000
1029.1 0.000
1072.1 0.000
1113.3 0.000
1129.9 0.000
1157.8 0.000
1172.4 0.000

195 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies Residual Sample Autocorrelation Functions

196 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies Residual Sample Partial Autocorrelation
Functions

197 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Plot

198 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Correlogram
Acorr. P. Acorr.
1 0.056
0.056
2 0.037
0.034
3 0.024
0.020
4 0.084
0.088
5 0.007
0.001
6 0.065
0.072
7 0.041
0.044
8 0.069
0.063
9 0.080
0.074
10 0.163
0.169
11 0.009
0.005
12 0.145
0.175
13 0.074
0.078
14 0.149
0.113
15 0.039
0.060

Std. Error
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056
.056

Ljung-Box
p-v
0.9779 0.323
1.4194 0.492
1.6032 0.659
3.8256 0.430
3.8415 0.572
5.1985 0.519
5.7288 0.572
7.2828 0.506
9.3527 0.405
18.019 0.055
18.045 0.081
24.938 0.015
26.750 0.013
34.034 0.002
34.532 0.003
199 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Sample
Autocorrelation Functions

200 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Sample Partial
Autocorrelation Functions

201 / 323

Liquor Sales Quadratic Trend Regression with Seasonal


Dummies and AR(3) Disturbances Residual Histogram and
Normality Test

202 / 323

Log Liquor Sales History and 12-Month-Ahead Forecast

203 / 323

Log Liquor Sales History, 12-Month-Ahead Forecast, and


Realization

204 / 323

Log Liquor Sales History and 60-Month-Ahead Forecast

205 / 323

Log Liquor Sales Long History and 60-Month-Ahead


Forecast

206 / 323

Liquor Sales Long History and 60-Month-Ahead Forecast

207 / 323

Recursive Analysis Constant Parameter Model

208 / 323

Recursive Analysis Breaking Parameter Model

209 / 323

Log Liquor Sales: Quadratic Trend Regression with


Seasonal Dummies and AR(3) Residuals and Two
Standard Errors Bands

210 / 323

Log Liquor Sales: Quadratic Trend Regression with


Seasonal Dummies and AR(3) Disturbances Recursive
Parameter Estimates

211 / 323

Log Liquor Sales: Quadratic Trend Regression with


Seasonal Dummies and AR(3) Disturbances CUMSUM
Analysis

212 / 323

Forecasting with Regression Models


Conditional Forecasting Models and Scenario Analysis

y_t = \beta_0 + \beta_1 x_t + \varepsilon_t, \quad \varepsilon_t \sim N(0, \sigma^2)

y_{T+h,T} | x_{T+h} = \beta_0 + \beta_1 x_{T+h}

Density forecast:
N(y_{T+h,T} | x_{T+h}, \sigma^2)

Scenario analysis, contingency analysis
No forecasting-the-RHS-variables problem

213 / 323

Unconditional Forecasting Models

y_{T+h,T} = \beta_0 + \beta_1 x_{T+h,T}

Forecasting-the-RHS-variables problem
Could fit a model to x (e.g., an autoregressive model)
Preferably, regress y on x_{t-h}, x_{t-h-1}, ...
No problem in trend and seasonal models

214 / 323

Distributed Lags
Start with unconditional forecast model:

y_t = \beta_0 + \delta x_{t-1} + \varepsilon_t

Generalize to

y_t = \beta_0 + \sum_{i=1}^{N_x} \delta_i x_{t-i} + \varepsilon_t

distributed lag model
lag weights
lag distribution

215 / 323

Polynomial Distributed Lags

\min_{\beta_0, \delta_i} \sum_{t=N_x+1}^{T} \left[ y_t - \beta_0 - \sum_{i=1}^{N_x} \delta_i x_{t-i} \right]^2

subject to

\delta_i = P(i) = a + bi + ci^2, \quad i = 1, ..., N_x

Lag weights constrained to lie on a low-order polynomial
Additional constraints can be imposed, such as P(N_x) = 0
Smooth lag distribution
Parsimonious
216 / 323

Rational Distributed Lags

y_t = \frac{A(L)}{B(L)} x_t + \varepsilon_t

Equivalently,

B(L) y_t = A(L) x_t + B(L)\varepsilon_t

Lags of x and y included
Important to allow for lags of y, one way or another

217 / 323

Another way:
distributed lag regression with lagged dependent variables

y_t = \beta_0 + \sum_{i=1}^{N_y} \alpha_i y_{t-i} + \sum_{j=1}^{N_x} \delta_j x_{t-j} + \varepsilon_t

Another way:
distributed lag regression with ARMA disturbances

y_t = \beta_0 + \sum_{i=1}^{N_x} \delta_i x_{t-i} + \varepsilon_t

\varepsilon_t = \frac{\Theta(L)}{\Phi(L)} v_t, \quad v_t \sim WN(0, \sigma^2)
218 / 323

Another Way: The Transfer Function Model and Various
Special Cases

Univariate ARMA:

y_t = \frac{C(L)}{D(L)} \varepsilon_t

Distributed Lag with Lagged Dep. Variables:

B(L) y_t = A(L) x_t + \varepsilon_t, or

y_t = \frac{A(L)}{B(L)} x_t + \frac{1}{B(L)} \varepsilon_t

Distributed Lag with ARMA Disturbances

219 / 323

Vector Autoregressions
e.g., bivariate VAR(1):

y_{1,t} = \phi_{11} y_{1,t-1} + \phi_{12} y_{2,t-1} + \varepsilon_{1,t}
y_{2,t} = \phi_{21} y_{1,t-1} + \phi_{22} y_{2,t-1} + \varepsilon_{2,t}

\varepsilon_{1,t} \sim WN(0, \sigma_1^2)
\varepsilon_{2,t} \sim WN(0, \sigma_2^2)
cov(\varepsilon_{1,t}, \varepsilon_{2,t}) = \sigma_{12}

Estimation by OLS
Order selection by information criteria
Impulse-response functions, variance decompositions, predictive
causality
Forecasts via Wold's chain rule
220 / 323
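A sketch (not from the slides) of a bivariate VAR(1): equation-by-equation OLS estimation and forecasts built by iterating the estimated system forward (Wold's chain rule); the data are simulated, and an intercept is omitted purely for brevity.

import numpy as np

rng = np.random.default_rng(7)
T = 300
Phi_true = np.array([[0.5, 0.2],
                     [0.1, 0.6]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = Phi_true @ y[t - 1] + rng.normal(scale=0.5, size=2)

Y, X = y[1:], y[:-1]                                  # regress y_t on y_{t-1}
Phi_hat = np.linalg.lstsq(X, Y, rcond=None)[0].T      # each column of the lstsq solution is one equation
print(Phi_hat)

h, fc = 6, [y[-1]]
for _ in range(h):                                    # chain rule: iterate the estimated VAR forward
    fc.append(Phi_hat @ fc[-1])
print(np.array(fc[1:]))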

Point and Interval Forecast

221 / 323

U.S. Housing Starts and Completions, 1968.01-1996.06

222 / 323

Starts Correlogram
Sample: 1968:01 1991:12
Included observations: 288
Acorr.
1 0.937
2 0.907
3 0.877
4 0.838
5 0.795
6 0.751
7 0.704
8 0.650
9 0.604
10 0.544
11 0.496
12 0.446
13 0.405
14 0.346

P. Acorr.
0.937
0.244
0.054
0.077
0.096
0.058
0.067
0.098
0.004
0.129
0.029
0.008
0.076
0.144

Std. Error

Ljung-Box

0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059

255.24
495.53
720.95
927.39
1113.7
1280.9
1428.2
1554.4
1663.8
1752.6
1826.7
1886.8
1936.8
1973.3

p-v

0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
223 / 323

Starts Sample Autocorrelations

224 / 323

Starts Sample Partial Autocorrelations

225 / 323

Completions Correlogram
Completions Correlogram
Sample: 1968:01 1991:12
Included observations: 288
Acorr.
1 0.939
2 0.920
3 0.896
4 0.874
5 0.834
6 0.802
7 0.761
8 0.721
9 0.677
10 0.633
11 0.583
12 0.533

P. Acorr.
0.939
0.328
0.066
0.023
0.165
0.067
0.100
0.070
0.055
0.047
0.080
0.073

Std. Error

Ljung-Box

0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059

256.61
504.05
739.19
963.73
1168.9
1359.2
1531.2
1686.1
1823.2
1943.7
2046.3
2132.2

p-v

0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
0.000
226 / 323

Completions Sample Autocorrelations

227 / 323

Completions Partial Autocorrelations

228 / 323

Starts and Completions: Sample Cross Correlations

229 / 323

VAR Order Selection with AIC and SIC

230 / 323

VAR Starts Equation

231 / 323

VAR Start Equation Residual Plot

232 / 323

VAR Starts Equation Residual Correlogram


Sample: 1968:01 1991:12
Included observations: 284
Acorr.
1 0.001
2 0.003
3 0.006
4 0.023
5 0.013
6 0.022
7 0.038
8 0.048
9 0.056
10 0.114
11 0.038
12 0.030
13 0.192
14 0.014

P. Acorr.
0.001
0.003
0.006
0.023
0.013
0.021
0.038
0.048
0.056
0.116
0.038
0.028
0.193
0.021

Std. Error

Ljung-Box

0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059
0.059

0.0004
0.0029
0.0119
0.1650
0.2108
0.3463
0.7646
1.4362
2.3528
6.1868
6.6096
6.8763
17.947
18.010

p-v

0.985
0.999
1.000
0.997
0.999
0.999
0.998
0.994
0.985
0.799
0.830
0.866
0.160
0.206
233 / 323

VAR Starts Equation Residual Sample Autocorrelations

234 / 323

Var Starts Equation Residual Sample Partial


Autocorrelations

235 / 323

Evaluating and Combining Forecasts


Evaluating a single forecast

Process:
y_t = \mu + \varepsilon_t + b_1 \varepsilon_{t-1} + b_2 \varepsilon_{t-2} + ...
\varepsilon_t \sim WN(0, \sigma^2)

h-step-ahead linear least-squares forecast:
y_{t+h,t} = \mu + b_h \varepsilon_t + b_{h+1} \varepsilon_{t-1} + ...

Corresponding h-step-ahead forecast error:
e_{t+h,t} = y_{t+h} - y_{t+h,t} = \varepsilon_{t+h} + b_1 \varepsilon_{t+h-1} + ... + b_{h-1} \varepsilon_{t+1}

with variance
\sigma_h^2 = \sigma^2 \left(1 + \sum_{i=1}^{h-1} b_i^2\right)
236 / 323

Evaluating and Combining Forecasts

So, four key properties of optimal forecasts:

a. Optimal forecasts are unbiased
b. Optimal forecasts have 1-step-ahead errors that are white noise
c. Optimal forecasts have h-step-ahead errors that are at most
MA(h-1)
d. Optimal forecasts have h-step-ahead errors with variances that
are non-decreasing in h and that converge to the unconditional
variance of the process

All are easily checked. How?

237 / 323

Assessing optimality with respect to an information set

Unforecastability principle: The errors from good forecasts are not
forecastable!

Regression:
e_{t+h,t} = \alpha + \sum_{i=1}^{k-1} \beta_i x_{it} + u_t

Test whether \alpha, \beta_1, ..., \beta_{k-1} are 0

Important case:
e_{t+h,t} = \alpha + \beta_1 y_{t+h,t} + u_t
Test whether (\alpha, \beta_1) = (0, 0)

Equivalently,
y_{t+h} = \alpha + \beta_1 y_{t+h,t} + u_t
Test whether (\alpha, \beta_1) = (0, 1)
238 / 323

Evaluating multiple forecasts: comparing forecast accuracy

Forecast errors: e_{t+h,t} = y_{t+h} - y_{t+h,t}

Forecast percent errors: p_{t+h,t} = \frac{y_{t+h} - y_{t+h,t}}{y_{t+h}}

ME = \frac{1}{T} \sum_{t=1}^{T} e_{t+h,t}

EV = \frac{1}{T} \sum_{t=1}^{T} (e_{t+h,t} - ME)^2

MSE = \frac{1}{T} \sum_{t=1}^{T} e_{t+h,t}^2

MSPE = \frac{1}{T} \sum_{t=1}^{T} p_{t+h,t}^2

RMSE = \sqrt{ \frac{1}{T} \sum_{t=1}^{T} e_{t+h,t}^2 }

239 / 323
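A sketch (not from the slides) of the forecast accuracy measures above, computed with NumPy from aligned vectors of realizations and h-step forecasts; both vectors here are simulated placeholders.

import numpy as np

rng = np.random.default_rng(8)
y = rng.normal(loc=100, scale=5, size=60)          # realizations y_{t+h}
yhat = y + rng.normal(scale=2, size=60)            # hypothetical forecasts y_{t+h,t}

e = y - yhat                                       # forecast errors
p = e / y                                          # forecast percent errors
ME = e.mean()
EV = np.mean((e - ME) ** 2)                        # error variance
MSE = np.mean(e ** 2)
MSPE = np.mean(p ** 2)
RMSE = np.sqrt(MSE)
print(ME, EV, MSE, MSPE, RMSE)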

Forecast encompassing

y_{t+h} = \beta_a y^a_{t+h,t} + \beta_b y^b_{t+h,t} + \varepsilon_{t+h,t}

1. If (\beta_a, \beta_b) = (1, 0), model a forecast-encompasses model b
2. If (\beta_a, \beta_b) = (0, 1), model b forecast-encompasses model a
3. Otherwise, neither model encompasses the other

Alternative approach:

(y_{t+h} - y_t) = \beta_a (y^a_{t+h,t} - y_t) + \beta_b (y^b_{t+h,t} - y_t) + \varepsilon_{t+h,t}

Useful in I(1) situations

240 / 323

Variance-covariance forecast combination

Composite formed from two unbiased forecasts:

y^c_{t+h,t} = \omega y^a_{t+h,t} + (1 - \omega) y^b_{t+h,t}

e^c_{t+h,t} = \omega e^a_{t+h,t} + (1 - \omega) e^b_{t+h,t}

\sigma_c^2 = \omega^2 \sigma_{aa}^2 + (1 - \omega)^2 \sigma_{bb}^2 + 2\omega(1 - \omega)\sigma_{ab}^2

\omega^* = \frac{\sigma_{bb}^2 - \sigma_{ab}^2}{\sigma_{aa}^2 + \sigma_{bb}^2 - 2\sigma_{ab}^2}

\hat{\omega}^* = \frac{\hat{\sigma}_{bb}^2 - \hat{\sigma}_{ab}^2}{\hat{\sigma}_{aa}^2 + \hat{\sigma}_{bb}^2 - 2\hat{\sigma}_{ab}^2}

241 / 323
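A sketch (not from the slides) of variance-covariance forecast combination: the optimal weight on forecast "a" from the formula above, estimated from two forecast-error series that are simulated placeholders here.

import numpy as np

rng = np.random.default_rng(9)
common = rng.normal(size=200)
e_a = common + rng.normal(scale=0.5, size=200)     # errors of forecast a
e_b = common + rng.normal(scale=1.0, size=200)     # errors of forecast b

s_aa, s_bb = e_a.var(), e_b.var()
s_ab = np.cov(e_a, e_b)[0, 1]
w_star = (s_bb - s_ab) / (s_aa + s_bb - 2 * s_ab)  # estimated optimal weight on forecast a
e_c = w_star * e_a + (1 - w_star) * e_b            # combined forecast error
print(w_star, e_c.var(), min(s_aa, s_bb))          # combined variance vs. the better single forecast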

Regression-based forecast combination

y_{t+h} = \beta_0 + \beta_1 y^a_{t+h,t} + \beta_2 y^b_{t+h,t} + \varepsilon_{t+h,t}

1. Equivalent to variance-covariance combination if weights sum
to unity and intercept is excluded
2. Easy extension to include more than two forecasts
3. Time-varying combining weights
4. Dynamic combining regressions
5. Shrinkage of combining weights toward equality
6. Nonlinear combining regressions

242 / 323

Unit Roots, Stochastic Trends, ARIMA Forecasting
Models, and Smoothing

1. Stochastic Trends and Forecasting

\Phi(L) y_t = \Theta(L) \varepsilon_t
\Phi(L) = \Phi'(L)(1 - L)
\Phi'(L)(1 - L) y_t = \Theta(L) \varepsilon_t
\Phi'(L) \Delta y_t = \Theta(L) \varepsilon_t

I(0) vs. I(1) processes

243 / 323

Unit Roots, Stochastic Trends, ARIMA Forecasting
Models, and Smoothing cont.

Random walk:
y_t = y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)

Random walk with drift:
y_t = \delta + y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)

Stochastic trend vs. deterministic trend
244 / 323

Properties of random walks

y_t = y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)

With time-0 value y_0:

y_t = y_0 + \sum_{i=1}^{t} \varepsilon_i

E(y_t) = y_0
var(y_t) = t\sigma^2
\lim_{t \to \infty} var(y_t) = \infty

245 / 323

Random walk with drift

y_t = \delta + y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)

Assuming time-0 value y_0:

y_t = t\delta + y_0 + \sum_{i=1}^{t} \varepsilon_i

E(y_t) = y_0 + t\delta
var(y_t) = t\sigma^2
\lim_{t \to \infty} var(y_t) = \infty

246 / 323

ARIMA(p,1,q) model

\Phi(L)(1 - L) y_t = c + \Theta(L) \varepsilon_t

or

(1 - L) y_t = c\,\Phi^{-1}(L) + \Phi^{-1}(L)\Theta(L) \varepsilon_t

where

\Phi(L) = 1 - \phi_1 L - ... - \phi_p L^p
\Theta(L) = 1 - \theta_1 L - ... - \theta_q L^q

and all the roots of both lag operator polynomials are outside the
unit circle.

247 / 323

ARIMA(p,d,q) model

\Phi(L)(1 - L)^d y_t = c + \Theta(L) \varepsilon_t

or

(1 - L)^d y_t = c\,\Phi^{-1}(L) + \Phi^{-1}(L)\Theta(L) \varepsilon_t

where

\Phi(L) = 1 - \phi_1 L - ... - \phi_p L^p
\Theta(L) = 1 - \theta_1 L - ... - \theta_q L^q

and all the roots of both lag operator polynomials are outside the
unit circle.

248 / 323

Properties of ARIMA(p,1,q) processes

Appropriately made stationary by differencing
Shocks have permanent effects
  Forecasts don't revert to a mean
Variance grows without bound as time progresses
  Interval forecasts widen without bound as horizon grows

249 / 323

Random walk example

Point forecast
Recall that for the AR(1) process,
y_t = \phi y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \sigma^2)
the optimal forecast is
\hat{y}_{T+h,T} = \phi^h y_T
Thus in the random walk case,
\hat{y}_{T+h,T} = y_T, for all h

250 / 323

Random walk example cont.

Interval and density forecasts
Recall the error associated with the optimal AR(1) forecast:

e_{T+h,T} = (y_{T+h} - y_{T+h,T}) = \varepsilon_{T+h} + \phi \varepsilon_{T+h-1} + ... + \phi^{h-1} \varepsilon_{T+1}

with variance

\sigma_h^2 = \sigma^2 \sum_{i=0}^{h-1} \phi^{2i}

Thus in the random walk case,

e_{T+h,T} = \sum_{i=0}^{h-1} \varepsilon_{T+h-i}

\sigma_h^2 = h\sigma^2

h-step-ahead 95% interval: y_T \pm 1.96\sigma\sqrt{h}

h-step-ahead density forecast: N(y_T, h\sigma^2)
251 / 323

Effects of Unit Roots

Sample autocorrelation function fails to damp
Sample partial autocorrelation function near 1 for \tau = 1, and
then damps quickly
Properties of estimators change
e.g., least-squares autoregression with unit roots

True process:
y_t = y_{t-1} + \varepsilon_t
Estimated model:
y_t = \phi y_{t-1} + \varepsilon_t

Superconsistency: T(\hat{\phi}_{LS} - 1) stabilizes as sample size grows
Bias: E(\hat{\phi}_{LS}) < 1
Offsetting effects of bias and superconsistency

252 / 323

Unit Root Tests

y_t = \phi y_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim iid\, N(0, \sigma^2)

\hat{\tau} = \frac{\hat{\phi} - 1}{s / \sqrt{\sum_{t=2}^{T} y_{t-1}^2}}

Dickey-Fuller distribution

Trick regression:
y_t - y_{t-1} = (\phi - 1) y_{t-1} + \varepsilon_t

253 / 323

Allowing for nonzero mean under the alternative

Basic model:
(y_t - \mu) = \phi(y_{t-1} - \mu) + \varepsilon_t
which we rewrite as
y_t = \alpha + \phi y_{t-1} + \varepsilon_t
where
\alpha = \mu(1 - \phi)

\alpha vanishes when \phi = 1 (null)
\alpha is nevertheless present under the alternative,
so we include an intercept in the regression

Dickey-Fuller distribution

254 / 323

Allowing for deterministic linear trend under the alternative

Basic model:
(y_t - a - b\,TIME_t) = \phi(y_{t-1} - a - b\,TIME_{t-1}) + \varepsilon_t
or
y_t = \alpha + \beta\,TIME_t + \phi y_{t-1} + \varepsilon_t
where \alpha = a(1 - \phi) + \phi b and \beta = b(1 - \phi).

Under the null hypothesis we have a random walk with drift,
y_t = b + y_{t-1} + \varepsilon_t

Under the deterministic-trend alternative hypothesis both the
intercept and the trend enter and so are included in the
regression.

255 / 323

Allowing for higher-order autoregressive dynamics

AR(p) process:

y_t + \sum_{j=1}^{p} \phi_j y_{t-j} = \varepsilon_t

Rewrite:

y_t = \rho_1 y_{t-1} + \sum_{j=2}^{p} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

where p \geq 2, \rho_1 = -\sum_{j=1}^{p} \phi_j, and \rho_i = \sum_{j=i}^{p} \phi_j, i = 2, ..., p.

Unit root: \rho_1 = 1 (AR(p-1) in first differences)

\hat{\tau} distribution holds asymptotically.

256 / 323

Allowing for a nonzero mean in the AR(p) case

(y_t - \mu) + \sum_{j=1}^{p} \phi_j (y_{t-j} - \mu) = \varepsilon_t

or

y_t = \alpha + \rho_1 y_{t-1} + \sum_{j=2}^{p} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t,

where \alpha = \mu\left(1 + \sum_{j=1}^{p} \phi_j\right), and the other parameters are as
above.

In the unit root case, the intercept vanishes, because
\sum_{j=1}^{p} \phi_j = -1.

Dickey-Fuller distribution holds asymptotically.

257 / 323

Allowing for trend under the alternative

(y_t - a - b\,TIME_t) + \sum_{j=1}^{p} \phi_j (y_{t-j} - a - b\,TIME_{t-j}) = \varepsilon_t

or

y_t = k_1 + k_2\,TIME_t + \rho_1 y_{t-1} + \sum_{j=2}^{p} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

where

k_1 = a\left(1 + \sum_{i=1}^{p} \phi_i\right) - b \sum_{i=1}^{p} i\,\phi_i

and

k_2 = b\left(1 + \sum_{i=1}^{p} \phi_i\right)

In the unit root case, k_1 = -b \sum_{i=1}^{p} i\,\phi_i and k_2 = 0.

Dickey-Fuller distribution holds asymptotically.


258 / 323

General ARMA representations: augmented Dickey-Fuller
tests

y_t = \rho_1 y_{t-1} + \sum_{j=2}^{k-1} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

y_t = \alpha + \rho_1 y_{t-1} + \sum_{j=2}^{k-1} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

y_t = k_1 + k_2\,TIME_t + \rho_1 y_{t-1} + \sum_{j=2}^{k-1} \rho_j (y_{t-j+1} - y_{t-j}) + \varepsilon_t

k-1 augmentation lags have been included

The Dickey-Fuller distributions hold asymptotically under the null


259 / 323
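A sketch (not from the slides) of the augmented Dickey-Fuller regression with intercept: regress y_t on a constant, y_{t-1}, and lagged differences, and form the t-type statistic for the unit-root null. The statistic must be compared with Dickey-Fuller (not Student-t) critical values; packages such as statsmodels tabulate them. The data here are a simulated random walk.

import numpy as np

rng = np.random.default_rng(10)
T, k = 300, 3
y = np.cumsum(rng.normal(size=T))                  # random walk: true unit root

dy = np.diff(y)
rows = []
for t in range(k, T):                              # build regressors for observation t
    rows.append([1.0, y[t - 1]] + list(dy[t - k:t - 1][::-1]))  # const, y_{t-1}, k-1 lagged differences
X = np.array(rows)
Y = y[k:]
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
e = Y - X @ beta
s2 = e @ e / (len(Y) - X.shape[1])
se = np.sqrt(s2 * np.linalg.inv(X.T @ X).diagonal())
tau = (beta[1] - 1.0) / se[1]                      # "tau"-type statistic for the unit-root null
print(beta[1], tau)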

Simple moving average smoothing

1. Original data: \{y_t\}_{t=1}^{T}
2. Smoothed data: \{\hat{y}_t\}
3. Two-sided moving average: \hat{y}_t = (2m + 1)^{-1} \sum_{i=-m}^{m} y_{t-i}
4. One-sided moving average: \hat{y}_t = (m + 1)^{-1} \sum_{i=0}^{m} y_{t-i}
5. One-sided weighted moving average: \hat{y}_t = \sum_{i=0}^{m} w_i y_{t-i}

Must choose smoothing parameter, m

260 / 323

Exponential Smoothing

Local level model:

y_t = c_{0t} + \varepsilon_t
c_{0t} = c_{0,t-1} + \eta_t
\eta_t \sim WN(0, \sigma^2)

Exponential smoothing can construct the optimal estimate of
c_0 - and hence the optimal forecast of any future value of y -
on the basis of current and past y

What if the model is misspecified?

261 / 323

Exponential smoothing algorithm

Observed series: \{y_t\}_{t=1}^{T}
Smoothed series: \{\hat{y}_t\}_{t=1}^{T} (estimate of the local level)
Forecasts: \hat{y}_{T+h,T}

1. Initialize at t = 1: \hat{y}_1 = y_1
2. Update: \hat{y}_t = \alpha y_t + (1 - \alpha)\hat{y}_{t-1}, \quad t = 2, ..., T
3. Forecast: \hat{y}_{T+h,T} = \hat{y}_T

Smoothing parameter \alpha \in [0, 1]

262 / 323
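A sketch (not from the slides) of the exponential smoothing recursion above; the smoothing parameter alpha is an arbitrary illustrative choice, not an estimated value.

import numpy as np

def exponential_smooth(y, alpha):
    yhat = np.empty(len(y))
    yhat[0] = y[0]                         # initialize: smoothed value at t=1 equals y_1
    for t in range(1, len(y)):
        yhat[t] = alpha * y[t] + (1 - alpha) * yhat[t - 1]   # update
    return yhat

rng = np.random.default_rng(11)
y = np.cumsum(rng.normal(size=100)) + 50   # a slowly drifting "local level" series
smoothed = exponential_smooth(y, alpha=0.3)
forecast = smoothed[-1]                    # the h-step forecast is the last smoothed value, for all h
print(forecast)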

Demonstration that the weights are exponential

Start:
\hat{y}_t = \alpha y_t + (1 - \alpha)\hat{y}_{t-1}

Substitute backward for \hat{y}:
\hat{y}_t = \sum_{j=0}^{t-1} w_j y_{t-j}

where
w_j = \alpha(1 - \alpha)^j

Exponential weighting, as claimed
Convenient recursive structure
263 / 323

Holt-Winters Smoothing

y_t = c_{0t} + c_{1t} TIME_t + \varepsilon_t
c_{0t} = c_{0,t-1} + \eta_t
c_{1t} = c_{1,t-1} + \nu_t

Local level and slope model

Holt-Winters smoothing can construct optimal estimates of c_0
and c_1 - hence the optimal forecast of any future value of y by
extrapolating the trend - on the basis of current and past y

264 / 323

Holt-Winters smoothing algorithm


1. Initialize at t = 2:
\hat{y}_2 = y_2
F_2 = y_2 - y_1

2. Update:
\hat{y}_t = \alpha y_t + (1 - \alpha)(\hat{y}_{t-1} + F_{t-1}), \quad 0 < \alpha < 1
F_t = \beta(\hat{y}_t - \hat{y}_{t-1}) + (1 - \beta)F_{t-1}, \quad 0 < \beta < 1
t = 3, 4, ..., T.

3. Forecast: \hat{y}_{T+h,T} = \hat{y}_T + hF_T

\hat{y}_t is the estimated level at time t
F_t is the estimated slope at time t


265 / 323
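A sketch (not from the slides) of the Holt-Winters (local level and slope) algorithm above; alpha and beta are illustrative choices rather than estimated values, and the series is simulated.

import numpy as np

def holt_winters(y, alpha, beta, h):
    level, slope = y[1], y[1] - y[0]                  # initialize at t = 2
    for t in range(2, len(y)):
        prev_level = level
        level = alpha * y[t] + (1 - alpha) * (prev_level + slope)
        slope = beta * (level - prev_level) + (1 - beta) * slope
    return level + np.arange(1, h + 1) * slope        # forecast: level + h * slope

rng = np.random.default_rng(12)
y = 0.2 * np.arange(200) + np.cumsum(rng.normal(scale=0.5, size=200))
print(holt_winters(y, alpha=0.3, beta=0.1, h=12))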

Random Walk Level and Change

266 / 323

Random Walk With Drift Level and Change

267 / 323

U.S. Per Capita GNP History and Two Forecasts

268 / 323

U.S. Per Capita GNP History, Two Forecasts, and


Realization

269 / 323

Random Walk, Levels Sample Autocorrelation Function


(Top Panel) and Sample Partial Autocorrelation Function
(Bottom Panel)

270 / 323

Random Walk, First Differences Sample Autocorrelation


Function (Top Panel) and Sample Partial Autocorrelation
Function (Bottom Panel)

271 / 323

Log Yen / Dollar Exchange Rate (Top Panel) and Change


in Log Yen / Dollar Exchange Rate (Bottom Panel)

272 / 323

Log Yen / Dollar Exchange Rate Sample


Autocorrelations (Top Panel) and Sample Partial
Autocorrelations (Bottom Panel)

273 / 323

Log Yen / Dollar Exchange Rate, First Differences


Sample Autocorrelations (Top Panel) and Sample Partial
Autocorrelations (Bottom Panel)

274 / 323

Log Yen / Dollar Rate, Levels AIC and SIC Values of


Various ARMA Models

275 / 323

Log Yen / Dollar Exchange Rate Best-Fitting


Deterministic-Trend Model

276 / 323

Log Yen / Dollar Exchange Rate Best-Fitting


Deterministic-Trend Model : Residual Plot

277 / 323

Log Yen / Dollar Rate History and Forecast : AR(2) in


Levels with Linear Trend

278 / 323

Log Yen / Dollar Rate History and Long-Horizon


Forecast : AR(2) in Levels with Linear Trend

279 / 323

Log Yen / Dollar Rate History, Forecast and Realization :


AR(2) in Levels with Linear Trend

280 / 323

Log Yen / Dollar Exchange Rate Augmented


Dickey-Fuller Unit Root Test

281 / 323

Log Yen / Dollar Rate, Changes AIC and SIC Values of


Various ARMA Models

282 / 323

Log Yen / Dollar Exchange Rate Best-Fitting


Stochastic-Trend Model

283 / 323

Log Yen / Dollar Exchange Rate Best-Fitting


Stochastic-Trend Model : Residual Plot

284 / 323

Log Yen / Dollar Rate History and Forecast : AR(1) in


Differences with Intercept

285 / 323

Log Yen / Dollar Rate History and Long-Horizon


Forecast : AR(1) in Differences with Intercept

286 / 323

Log Yen / Dollar Rate History, Forecast and Realization :


AR(1) in Differences with Intercept

287 / 323

Log Yen / Dollar Exchange Rate Holt-Winters


Smoothing

288 / 323

Log Yen / Dollar Rate History and Forecast :


Holt-Winters Smoothing

289 / 323

Log Yen / Dollar Rate History and Long-Horizon


Forecast : Holt-Winters Smoothing

290 / 323

Log Yen / Dollar Rate History, Forecast and Realization :


Holt-Winters Smoothing

291 / 323

Volatility Measurement, Modeling and Forecasting


The main idea:

\varepsilon_t | \Omega_{t-1} \sim (0, \sigma_t^2)
\Omega_{t-1} = \{\varepsilon_{t-1}, \varepsilon_{t-2}, ...\}

We'll look at:

Basic structure and properties
Time variation in volatility and prediction-error variance
ARMA representation in squares
GARCH(1,1) and exponential smoothing
Unconditional symmetry and leptokurtosis
Convergence to normality under temporal aggregation
Estimation and testing

292 / 323

Basic Structure and Properties

Standard models (e.g., ARMA):

Unconditional mean: constant
Unconditional variance: constant
Conditional mean: varies
Conditional variance: constant (unfortunately)
k-step-ahead forecast error variance: depends only on k, not
on t (again unfortunately)

293 / 323

The Basic ARCH Process


y_t = B(L)\varepsilon_t

B(L) = \sum_{i=0}^{\infty} b_i L^i, \quad \sum_{i=0}^{\infty} b_i^2 < \infty, \quad b_0 = 1

\varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

\sigma_t^2 = \omega + \gamma(L)\varepsilon_t^2

\omega > 0, \quad \gamma(L) = \sum_{i=1}^{p} \gamma_i L^i, \quad \gamma_i \geq 0 for all i, \quad \sum \gamma_i < 1.
294 / 323

The Basic ARCH Process cont.


ARCH(1) process:

r_t | \Omega_{t-1} \sim (0, \sigma_t^2)
\sigma_t^2 = \omega + \alpha r_{t-1}^2

Unconditional mean: E(r_t) = 0
Unconditional variance: E(r_t - E(r_t))^2 = \omega / (1 - \alpha)
Conditional mean: E(r_t | \Omega_{t-1}) = 0
Conditional variance: E([r_t - E(r_t | \Omega_{t-1})]^2 | \Omega_{t-1}) = \omega + \alpha r_{t-1}^2

295 / 323

The GARCH Process

y_t = \varepsilon_t
\varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

\sigma_t^2 = \omega + \alpha(L)\varepsilon_t^2 + \beta(L)\sigma_t^2

\alpha(L) = \sum_{i=1}^{p} \alpha_i L^i, \quad \beta(L) = \sum_{i=1}^{q} \beta_i L^i

\omega > 0, \quad \alpha_i \geq 0, \quad \beta_i \geq 0, \quad \sum \alpha_i + \sum \beta_i < 1.

296 / 323

Time Variation in Volatility and Prediction Error Variance

Prediction error variance depends on \Omega_{t-1}

e.g. 1-step-ahead prediction error variance is now
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

Conditional variance is a serially correlated random variable

Again, follows immediately from
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

297 / 323

ARMA Representation in Squares


r_t^2 has the ARMA(1,1) representation:

r_t^2 = \omega + (\alpha + \beta) r_{t-1}^2 - \beta \nu_{t-1} + \nu_t

where \nu_t = r_t^2 - \sigma_t^2

Important result:
The above equation is simply

r_t^2 = (\omega + (\alpha + \beta) r_{t-1}^2 - \beta \nu_{t-1}) + \nu_t = \sigma_t^2 + \nu_t

Thus r_t^2 is a noisy indicator of \sigma_t^2

298 / 323

GARCH(1,1) and Exponential Smoothing


Exponential smoothing recursion:
\hat{\sigma}_t^2 = \gamma r_t^2 + (1 - \gamma)\hat{\sigma}_{t-1}^2

Back substitution yields:
\hat{\sigma}_t^2 = \sum_j w_j r_{t-j}^2
where w_j = \gamma(1 - \gamma)^j

GARCH(1,1):
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

Back substitution yields:
\sigma_t^2 = \frac{\omega}{1 - \beta} + \alpha \sum_{j=1}^{\infty} \beta^{j-1} r_{t-j}^2
299 / 323

Unconditional Symmetry and Leptokurtosis

Volatility clustering produces unconditional leptokurtosis

Conditional symmetry translates into unconditional symmetry


Unexpected agreement with the facts!
Convergence to Normality under Temporal Aggregation

Temporal aggregation of covariance stationary GARCH


processes produces convergence to normality.
Again, unexpected agreement with the facts!

300 / 323

Estimation and Testing


Estimation: easy!
Maximum Likelihood Estimation

L(\theta; r_1, ..., r_T) \approx f(r_T | \Omega_{T-1}; \theta)\, f(r_{T-1} | \Omega_{T-2}; \theta) \cdots

If the conditional densities are Gaussian,

f(r_t | \Omega_{t-1}; \theta) = \frac{1}{\sqrt{2\pi}}\, \sigma_t^{-1}(\theta)\, \exp\left(-\frac{1}{2}\frac{r_t^2}{\sigma_t^2(\theta)}\right).

We can ignore the f(r_p, ..., r_1; \theta) term, yielding the log likelihood:

-\frac{T - p}{2}\ln(2\pi) - \frac{1}{2}\sum_{t=p+1}^{T}\ln\sigma_t^2(\theta) - \frac{1}{2}\sum_{t=p+1}^{T}\frac{r_t^2}{\sigma_t^2(\theta)}

Testing: likelihood ratio tests

Graphical diagnostics: correlogram of squares, correlogram of
squared standardized residuals
301 / 323
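A sketch (not from the slides) of Gaussian maximum-likelihood estimation of a GARCH(1,1), maximizing the likelihood above with SciPy's general-purpose optimizer; the returns are simulated, and a real application would substitute observed returns and a more careful optimizer setup.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(13)
T = 2000
omega_true, alpha_true, beta_true = 0.1, 0.1, 0.85
r = np.empty(T)
sig2 = omega_true / (1 - alpha_true - beta_true)       # start at the unconditional variance
for t in range(T):
    r[t] = np.sqrt(sig2) * rng.normal()
    sig2 = omega_true + alpha_true * r[t] ** 2 + beta_true * sig2

def neg_loglik(params, r):
    omega, alpha, beta = params
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf                                  # rule out invalid parameter values
    sig2 = np.empty(len(r))
    sig2[0] = r.var()                                  # initialize at the sample variance
    for t in range(1, len(r)):
        sig2[t] = omega + alpha * r[t - 1] ** 2 + beta * sig2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sig2) + r ** 2 / sig2)

fit = minimize(neg_loglik, x0=np.array([0.05, 0.05, 0.9]), args=(r,), method="Nelder-Mead")
print(fit.x)                                           # estimates of (omega, alpha, beta)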

Variations on Volatility Models

We will look at:


I

Asymmetric response and the leverage effect

Exogenous variables

GARCH-M and time-varying risk premia

302 / 323

Asymmetric Response and the Leverage Effect:
TGARCH and EGARCH

Asymmetric response I: TARCH

Standard GARCH:
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2

TARCH:
\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \gamma r_{t-1}^2 D_{t-1} + \beta \sigma_{t-1}^2

where
positive return (good news): \alpha effect on volatility
negative return (bad news): \alpha + \gamma effect on volatility
\gamma \neq 0: Asymmetric news response
\gamma > 0: Leverage effect

303 / 323

Asymmetric Response and the Leverage Effect cont.

Asymmetric Response II: EGARCH

\ln(\sigma_t^2) = \omega + \alpha\left|\frac{r_{t-1}}{\sigma_{t-1}}\right| + \gamma\frac{r_{t-1}}{\sigma_{t-1}} + \beta\ln(\sigma_{t-1}^2)

Log specification ensures that the conditional variance is
positive.
Volatility driven by both size and sign of shocks
Leverage effect when \gamma < 0

304 / 323

Introducing Exogenous Variables

r_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

\sigma_t^2 = \omega + \alpha r_{t-1}^2 + \beta \sigma_{t-1}^2 + \gamma' X_t

where:
\gamma is a parameter vector
X is a set of positive exogenous variables.

305 / 323

Component GARCH
Standard GARCH:
(\sigma_t^2 - \bar{\omega}) = \alpha(r_{t-1}^2 - \bar{\omega}) + \beta(\sigma_{t-1}^2 - \bar{\omega}),
for constant long-run volatility \bar{\omega}.

Component GARCH:
(\sigma_t^2 - q_t) = \alpha(r_{t-1}^2 - q_{t-1}) + \beta(\sigma_{t-1}^2 - q_{t-1}),
for time-varying long-run volatility q_t, where

q_t = \omega + \rho(q_{t-1} - \omega) + \phi(r_{t-1}^2 - \sigma_{t-1}^2)

306 / 323

Component GARCH Cont.

Transitory dynamics governed by \alpha + \beta
Persistent dynamics governed by \rho
Equivalent to nonlinearly restricted GARCH(2,2)
Exogenous variables and asymmetry can be allowed:

(\sigma_t^2 - q_t) = \alpha(r_{t-1}^2 - q_{t-1}) + \gamma(r_{t-1}^2 - q_{t-1})D_{t-1} + \beta(\sigma_{t-1}^2 - q_{t-1}) + \theta' X_t

307 / 323

Regression with GARCH Disturbances

y_t = x_t'\beta + \varepsilon_t

\varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

308 / 323

GARCH-M and Time-Varying Risk Premia


Standard GARCH regression model:
y_t = x_t'\beta + \varepsilon_t, \quad \varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

GARCH-M model is a special case:
y_t = x_t'\beta + \gamma \sigma_t^2 + \varepsilon_t, \quad \varepsilon_t | \Omega_{t-1} \sim N(0, \sigma_t^2)

Time-varying risk premia in excess returns


309 / 323

Time Series Plot NYSE Returns

310 / 323

Histogram and Related Diagnostic Statistics NYSE


Returns

311 / 323

Correlogram NYSE Returns

312 / 323

Time Series Plot Squared NYSE Returns

313 / 323

Correlogram Squared NYSE Returns

314 / 323

AR(5) Model Squared NYSE Returns

315 / 323

ARCH(5) Model NYSE Returns

316 / 323

Correlogram Standardized ARCH(5) Residuals : NYSE


Returns

317 / 323

GARCH(1,1) Model NYSE Returns

318 / 323

Correlogram Standardized GARCH(1,1) Residuals :


NYSE Returns

319 / 323

Estimated Conditional Standard Deviation GARCH(1,1)


Model : NYSE Returns

320 / 323

Estimated Conditional Standard Deviation Exponential


Smoothing : NYSE Returns

321 / 323

Conditional Standard Deviation History and Forecast :


GARCH(1,1) Model

322 / 323

Conditional Standard Deviation Extended History and


Extended Forecast : GARCH(1,1) Model

323 / 323
