Applied Economic Forecasting
using Time Series Methods
Eric Ghysels
University of North Carolina, Chapel Hill, United States
Massimiliano Marcellino
Bocconi University, Milan, Italy
Oxford University Press is a department of the University of Oxford. It furthers
the University’s objective of excellence in research, scholarship, and education
by publishing worldwide. Oxford is a registered trade mark of Oxford University
Press in the UK and in certain other countries.
Printed by Sheridan Books, Inc., United States of America
Contents

Preface xv

2 Model Mis-Specification 57
  2.1 Introduction 57
  2.2 Heteroskedastic and correlated errors 57
      2.2.1 The Generalized Least Squares (GLS) estimator and the feasible GLS estimator 60
  2.3 HAC estimators 64
  2.4 Some tests for homoskedasticity and no correlation 66
  2.5 Parameter instability 69
      2.5.1 The effects of parameter changes 70
      2.5.2 Simple tests for parameter changes 71
      2.5.3 Recursive methods 74
      2.5.4 Dummy variables 76
      2.5.5 Multiple breaks 78
  2.6 Measurement error and real-time data 80
  2.7 Instrumental variables 82
  2.8 Examples using simulated data 84
  2.9 Empirical examples 94
      2.9.1 Forecasting Euro area GDP growth 94
      2.9.2 Forecasting US GDP growth 107
      2.9.3 Default risk 114
  2.10 Concluding remarks 118

Bibliography 559
error correction (ECM) models, and Chapter 8 with Bayesian methods for VAR analysis. The material becomes progressively better suited for master's and/or PhD level classes, in particular the chapters on cointegration and Bayesian VARs. The models considered in Part II are also common tools for practical forecasting in public and private institutions.
Parts III and IV of the book contain a collection of special topics chapters.
Each chapter is self-contained, and therefore an instructor or researcher can
pick and choose the topics she or he wants to cover. Part III deals mainly
with modeling parameter time-variation. Specifically, Chapter 9 presents
Threshold and Smooth Transition Autoregressive (TAR and STAR) models,
Chapter 10 Markov switching regime models, and Chapter 11 state space
models and the Kalman filter, introducing, for example, models with random
coefficients and dynamic factor models.
Part IV deals with mixed frequency data models and their use for now-
casting in Chapter 12, forecasting using large datasets in Chapter 13 and,
finally, volatility models in Chapter 14.
Each chapter starts with a review of the main theoretical results to prepare the reader for the various applications. Examples involving simulated data follow, to familiarize the reader with the techniques introduced in each chapter, at first in stylized settings where the true data generating process is known. From our own teaching experience we find this to be extremely useful, as it creates familiarity with each of the topics in a controlled environment. The simulated examples are followed by real data applications, focusing on macroeconomic and financial topics. Some of the examples run across different
chapters, particularly in the early part of the book. All data are public
domain and cover Euro area, UK, and US examples, including forecasting
US GDP growth, default risk, inventories, effective federal funds rates, com-
posite index of leading indicators, industrial production, Euro area GDP
growth, UK term structure of interest rates, to mention the most prominent
examples.
The book is mostly software neutral. However, for almost all the simulated and empirical examples we provide companion EViews® and R code. The former is mostly, but not exclusively, a menu-driven licensed software package, whereas the latter is an open source programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing.¹ We provide code and data for all the simulated and empirical examples in the book on the web.² Moreover, all the tables and figures appearing in the book were produced using EViews®.
As the title suggests, this book is an applied time series forecasting book.
Hence, we do not pretend to provide the full theoretical founda-
tions for the forecasting methods we present. Indeed, there are already a
number of books with thorough coverage of theory. The most recent exam-
ple is the excellent book by Elliott and Timmermann (2016) which provides
an in-depth analysis of the statistical theory underlying predictive models,
covering a variety of alternative forecasting methods in both classical and
Bayesian contexts, as well as techniques for forecast evaluation, comparison,
and combination. Equally useful and complementary are the many standard econometrics textbooks such as Hendry (1995), Judge, Hill, Griffiths, Lütkepohl, and Lee (1988), Stock and Watson (2009), Wooldridge (2012), and
the many graduate time series textbooks such as Box and Jenkins (1976),
Clements and Hendry (1998), Diebold (2008), Hamilton (1994), or Lütkepohl
(2007), among others.
We should also highlight that this book is an applied time series forecast-
ing book. As such, we will not discuss forecasting using economic theory-
based structural models, such as Dynamic Stochastic General Equilibrium
(DSGE) models, see, e.g., Del Negro and Schorfheide (2013) for an exhaus-
tive overview and references. Typically, these structural models are better
suited for medium- and long-horizon forecasting, and for policy simulations,
while time series methods work better for nowcasting and short-term fore-
casting (say, up to one-year ahead).
This book would not have been completed without the invaluable and
skillful help of a number of our current and former TAs and RAs. Special
thanks go to Cristina Angelico, Francesco Corsello, Francesco Giovanardi,
Novella Maugeri, and Nazire Özkan who helped with all the simulation and
empirical examples throughout the entire book, as well as Hanwei Liu who
wrote the R codes. Their contributions to the book were invaluable. We
would also like to thank many cohorts of undergraduate, master, and PhD
¹ Please visit the web page http://www.eviews.com for further information on EViews®, and the Wikipedia page https://en.wikipedia.org/wiki/R_(programming_language) for the R programming language.
² Please visit our respective webpages www.unc.edu/~eghysels or www.igier.unibocconi.it/marcellino to download the code and data.
students who took courses based on the materials covered in the book at
Bocconi University and at the University of North Carolina, Chapel Hill and
the Kenan-Flagler Business School.
We are also grateful to a number of coauthors of papers on the topics
covered in the book. In particular, Eric Ghysels would like to thank Lucia
Alessi, Elena Andreou, Jennie Bai, Fabio Canova, Xilong Chen, Riccardo
Colacito, Robert Engle, Claudia Foroni, Lars Forsberg, Patrick Gagliardini,
René Garcia, Clive Granger, Andrew Harvey, Jonathan Hill, Casidhe Horan,
Lynda Khalaf, Andros Kourtellos, Virmantas Kvedaras, Emanuel Moench,
Kaiji Motegi, Denise Osborn, Alberto Plazzi, Eric Renault, Mirco Rubin,
Antonio Rubia, Pedro Santa-Clara, Pierre Siklos, Arthur Sinko, Bumjean
Sohn, Ross Valkanov, Cosmé Vodounou, Fangfang Wang, Jonathan Wright,
and Vaidotas Zemlys (and Massimiliano of course!). Massimiliano Marcellino
would like to thank Knut-Are Aastveit, Angela Abbate, Elena Angelini, Mike
Artis, Anindya Banerjee, Guenter Beck, Stelios Bekiros, Ralf Burggemann,
Andrea Carriero, Roberto Casarin, Todd Clark, Francesco Corielli, Sandra
Eickmeier, Carlo Favero, Laurent Ferrara, Claudia Foroni, Giampiero Gallo,
Ana Galvão, Pierre Guérin, Jerome Henry, Christian Hepenstrick, Kristin
Hubrich, Oscar Jorda, George Kapetanios, Malte Knüppel, Hans-Martin
Krolzig, Danilo Leiva-Leon, Wolfgang Lemke, Helmut Lütkepohl, Igor Mas-
ten, Gian Luigi Mazzi, Grayham Mizon, Matteo Mogliani, Alberto Musso,
Chiara Osbat, Fotis Papailias, Mario Porqueddu, Esteban Prieto, Tommaso
Proietti, Francesco Ravazzolo, Christian Schumacher, Dalibor Stevanovic,
Jim Stock, Fabrizio Venditti, and Mark Watson (and Eric of course!).
Several colleagues provided useful comments on the book, in particular
Frank Diebold, Helmut Lütkepohl, Serena Ng, Simon Price, Frank Schorfheide,
and Allan Timmermann. Of course, we remain responsible for any remaining
mistakes or omissions.
Finally, we would like to thank our families for their continuous support and understanding. We forecast a quieter and more relaxed “Life after the Book,” but like all forecasters, we could be proven wrong.
1.1 Introduction
y = Xβ + ε,    (1.2.2)

where y is T × 1, X is T × k, β is k × 1, and ε is T × 1.

LR1 E(ε) = 0,
LR2 E(εε′) = σ² I_T,
The assumptions LR1 through LR5 are typically known as “weak as-
sumptions.” The first assumption says that the expected value of the error
term should be equal to zero, namely, on average the model should provide a
correct explanation for y, which is a basic ingredient of a reasonable model.
Assumption 1.2.1 - LR2 states that the matrix of second moments of the errors (which is also equal to their variance covariance matrix given Assumption 1.2.1 - LR1) is proportional to the identity matrix. This implies that the variance of the error term is stable over time (homoskedasticity), and that the errors are uncorrelated over
time (no correlation). The requirement that X is distributed independently
of ε, in Assumption 1.2.1 - LR3, is needed to guarantee good properties for
the simplest parameter estimator, as we will see in the next section. Assump-
tion 1.2.1 - LR4 is typically referred to as lack of perfect multicollinearity,
and it is concerned with regressor redundancy and the identifiability of all
the β parameters. For example, consider the case where k = 2 and X1 = X2.
The matrix X′X in this case has dimensions 2 × 2, but its rank is equal to
one, and it can be easily shown that we can only identify (namely, recover
from the available information) β1 + β2 but not both β1 and β2 separately.
Assumption 1.2.1 - LR5 is instead concerned with the amount of temporal
persistence in the explanatory variables, and basically requires that if each
element of X is affected by a shock, the resulting effects on that element
E(β̂) = β, (1.3.2)
which means that if we could draw a very large number of samples from
y, each of dimension T, and for each sample j we computed β̂j using the
formula in (1.3.1), the average across the many samples of β̂j would be equal
to β. The OLS estimator β̂ is the best linear unbiased estimator since it is
the most precise in the class of the linear unbiased estimators for β. In other
words, β̂ has minimal variance in the class of estimators that are obtained
as a linear combination of the y1 , . . . , yT . Minimum variance, more formally,
means that the difference between the variance covariance matrix of β̂ and that of any other linear unbiased estimator is negative semi-definite. This is a
convenient property since it implies that we are using optimally the available
information, in the sense of obtaining the most precise estimator for the
parameters β. Specifically, it is

Var(β̂) = σ²(X′X)⁻¹.
While estimators are random variables, estimates are just numbers, and since
in general X′X/T converges to a matrix when T diverges while σ²/T goes to zero, it is

lim_{T→∞} Var(β̂) = 0.
This finding, combined with the unbiasedness of β̂, implies that the OLS estimator is consistent for β, meaning that when the sample size T gets very large β̂ gets very close to the true value β (more formally, β̂ converges in probability to β).
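To illustrate these properties, the following small R simulation (our sketch, not from the book; the parameter values are illustrative) draws many samples from a simple regression, re-estimates β by OLS in each, and checks that the estimates are centered on the true value with a dispersion that shrinks as T grows.

```r
# Minimal sketch: unbiasedness and consistency of OLS in a Monte Carlo exercise.
# The true model is y_t = 1 + 0.5 x_t + e_t with i.i.d. N(0,1) errors (LR1-LR2 hold).
set.seed(123)
beta <- c(1, 0.5)                       # true parameters (illustrative)
mc_ols <- function(T, R = 2000) {
  slope <- replicate(R, {
    x <- rnorm(T)
    X <- cbind(1, x)                    # T x 2 regressor matrix with intercept
    y <- X %*% beta + rnorm(T)
    solve(t(X) %*% X, t(X) %*% y)[2]    # OLS estimate of the slope
  })
  c(mean = mean(slope), sd = sd(slope))
}
mc_ols(T = 50)    # mean close to 0.5 (unbiasedness), sizeable sd
mc_ols(T = 500)   # mean still close to 0.5, much smaller sd (consistency)
```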
Let us now use β̂ to construct the residuals

ε̂_t = y_t − X_t β̂,   t = 1, . . . , T,

and collect ε̂_t, t = 1, . . . , T, into the T × 1 vector ε̂. The residuals are related to the errors but different. Specifically, they are

ε̂ = y − Xβ̂ = ε − X(β̂ − β).
We can use the residuals to construct an estimator for the variance of the
errors σ² as

σ̂² = ε̂′ε̂/(T − k),    (1.3.5)
E(σ̂²) = σ².
If, in addition to the assumptions stated above, we are also willing to maintain
that the errors are normally distributed, so that
ε ∼ N(0, σ² I_T),    (1.3.6)
power beyond the specific sample. On the other hand, a model with a poor
in sample fit rarely forecasts well. In this sense, it is convenient to examine
measures of in sample model fit prior to forecasting, though this should not
be the unique forecast selection criterion.
A common measure of model fit is the coefficient of determination, R², which compares the variability in the dependent variable y with that in the model (OLS) residuals ε̂. The better the model, the smaller the latter with respect to the former. More precisely, R² is defined as
R² = 1 − ε̂′ε̂ / y′y = ŷ′ŷ / y′y    (1.4.1)
   = 1 − Σ_{t=1}^T ε̂_t² / Σ_{t=1}^T y_t² = 1 − Σ_{t=1}^T (y_t − X_t β̂)² / Σ_{t=1}^T y_t²,
where
ŷ = Xβ̂    (1.4.2)

is the model fit. Hence 0 ≤ R² ≤ 1, with R² = 0 when ε̂′ε̂ = y′y, namely the model has no explanatory power, and R² = 1 if the fit is perfect.¹
It is also possible to show that

R² = Corr(y, ŷ)²,

meaning that the better the model fit, the higher the correlation between the actual values of y and those expected according to the model.
A problem with the coefficient of determination is that it is monotonically increasing with the number of explanatory variables, meaning that a model with many regressors will always generate a higher R² than a model with a subset of them, even when the additional variables in the larger model are useless. Hence, it is convenient to introduce a modified version of R² that corrects this problem. The adjusted R² is defined as

R̄² = 1 − [ε̂′ε̂/(T − k)] / [y′y/(T − 1)] = 1 − [Σ_{t=1}^T ε̂_t²/(T − k)] / [Σ_{t=1}^T y_t²/(T − 1)],    (1.4.3)
¹ Note that R² is based on a decomposition of y into two orthogonal components, ŷ (the fit) and ε̂ (the residual). OLS estimation guarantees orthogonality of fit and residuals, ŷ′ε̂ = 0, but this is not necessarily the case with other estimation methods. Moreover, an intercept should be included in the model to make sure that 0 ≤ R² ≤ 1.
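As a concrete illustration, the following R sketch (simulated data, not from the book) computes the residuals, the variance estimator of (1.3.5), and the R² and adjusted R² of (1.4.1) and (1.4.3) exactly as written above, i.e., with the uncentered denominator y′y; standard software may instead report a version based on deviations of y from its mean.

```r
# Minimal sketch: residuals, sigma^2 estimator (1.3.5), R^2 (1.4.1), adjusted R^2 (1.4.3).
set.seed(1)
T <- 100; k <- 3
X <- cbind(1, rnorm(T), rnorm(T))            # intercept plus two regressors
y <- X %*% c(0.2, 1, -0.5) + rnorm(T)        # simulated data (illustrative coefficients)
beta_hat <- solve(t(X) %*% X, t(X) %*% y)    # OLS estimator
y_fit    <- X %*% beta_hat                   # model fit (1.4.2)
e_hat    <- y - y_fit                        # residuals
sigma2_hat <- sum(e_hat^2) / (T - k)         # equation (1.3.5)
R2      <- 1 - sum(e_hat^2) / sum(y^2)       # equation (1.4.1)
R2_alt  <- sum(y_fit^2) / sum(y^2)           # same value: fit and residuals are orthogonal
R2_adj  <- 1 - (sum(e_hat^2) / (T - k)) / (sum(y^2) / (T - 1))   # equation (1.4.3)
c(sigma2 = sigma2_hat, R2 = R2, R2_check = R2_alt, R2_adj = R2_adj)
```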
in the sense of producing minimum forecast error variance and zero mean
forecast error (forecast unbiasedness), where the forecast error is
e_{T+h} = y_{T+h} − ŷ_{T+h}.    (1.5.2)
y_{T+h} − l′y = (X_{T+h} − l′X)β + ε_{T+h} − l′ε,

where l is T × 1, X_{T+h} is 1 × k, X is T × k, and β is k × 1.
E(y_{T+h} − l′y) = (X_{T+h} − l′X)β,

so that unbiasedness of the forecast (namely, zero mean forecast error) requires l′X = X_{T+h}.
Assuming that this condition is satisfied, the variance of the forecast error
coincides with the second moment:
E(y_{T+h} − l′y)² = E(ε_{T+h} − l′ε)² = σ²(1 + l′l).
We want to find the vector l which minimizes this quantity, or just l′l, subject to the constraint l′X = X_{T+h}. To that end, we construct the Lagrangian function

L = l′l − λ′(X′l − X′_{T+h}),

where λ is the k × 1 vector of Lagrange multipliers, with partial derivatives

∂L/∂l = 2l − Xλ,    (T × 1)
∂L/∂λ = X′l − X′_{T+h}.    (k × 1)
Second, note that we have used a similar loss function for the derivation
of the OLS estimator and of the optimal linear forecast. If the forecast loss
function is different from the mean squared error, we might use a comparable
loss function also at the estimation stage, which of course would produce
a different type of parameter estimator, see, e.g. Elliott, Komunjer, and
Timmermann (2008).
Third, the empirical counterpart of the theoretical second moment of
the forecast error (our loss function) is called Mean Squared Forecast Error
(MSFE), and it is often used as a measure of the goodness of a forecast, see
Chapter 4 for more details.
Finally, and as mentioned, the forecast in (1.5.1) is optimal if the assump-
tions underlying the model are valid. If, for example, the errors of the linear
model are not i.i.d., this should be taken into proper consideration in the
derivations. We will revisit this issue in the next chapters.
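A minimal R sketch of these ideas (simulated data, not from the book): the point forecast ŷ_{T+h} = X_{T+h} β̂ is computed from OLS estimates obtained on the first T observations, and the MSFE is the average squared forecast error over a hold-out sample.

```r
# Minimal sketch: OLS point forecasts and the Mean Squared Forecast Error (MSFE).
set.seed(42)
n <- 120; T <- 100                          # T estimation observations, 20 hold-out periods
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)                 # true model (illustrative)
X <- cbind(1, x)
beta_hat <- solve(t(X[1:T, ]) %*% X[1:T, ], t(X[1:T, ]) %*% y[1:T])   # OLS on 1,...,T
y_hat <- X[(T + 1):n, ] %*% beta_hat        # point forecasts for T+1,...,n
e     <- y[(T + 1):n] - y_hat               # forecast errors e_{T+h}
MSFE  <- mean(e^2)                          # empirical counterpart of the loss function
MSFE
```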
consider that we are using parameter estimators for β and σ² and that the sample is finite, then the standardized forecast error has a Student t distribution with T − k degrees of freedom, as does y_{T+h} in (1.6.1).
The density forecast can be used to assign probabilities to specific events
of interest concerning the future behavior of the variable y. For example, if y
is inflation, with the formula in (1.6.1) we can compute the probability that
inflation in period T + h will be higher than 2%.
Another use of the forecast density is to construct interval forecasts for
yT +h . A [1 − α] % forecast interval is represented as
[ŷ_{T+h} − c_{α/2} √Var(e_{T+h}); ŷ_{T+h} + c_{α/2} √Var(e_{T+h})],    (1.6.2)
where cα/2 is the (α/2) % critical value for the standard normal density (or
from the Student t density in the case of estimated parameters and small
samples). For example, a 95% confidence interval is given by
[ŷ_{T+h} − 1.96 √Var(e_{T+h}); ŷ_{T+h} + 1.96 √Var(e_{T+h})].    (1.6.3)
For example, if y is again inflation, the optimal linear point forecast is ŷ_{T+h} = 2, and Var(e_{T+h}) = 4, then the formula implies that a 95% interval forecast for inflation in period T + h is [−1.92, 5.92]. As in the case of the density forecast,
the interval forecast is also only approximate with finite estimation samples.
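The numbers in the inflation example can be reproduced with a few lines of R (our sketch, not the book's code); the last line also illustrates the event-probability use of the density forecast mentioned above, under the normal approximation.

```r
# Minimal sketch: 95% interval forecast of (1.6.3) and a density-forecast probability.
y_hat <- 2                       # point forecast for inflation (example values)
var_e <- 4                       # forecast error variance
c(lower = y_hat - 1.96 * sqrt(var_e),
  upper = y_hat + 1.96 * sqrt(var_e))          # gives [-1.92, 5.92]
1 - pnorm(2, mean = y_hat, sd = sqrt(var_e))   # Pr(inflation at T+h > 2%) under normality
```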
The interpretation of the interval forecast is the following. Suppose that
we could generate a very large number of samples of data for y and X, each
of size T + h, and for each sample construct the interval forecast for yT +h
as in (1.6.2). Then, in [1 − α] % of the samples the realization of yT +h
will fall in the interval described in (1.6.2). A more common interpretation
is that there is a [1 − α] % probability that the future realization of yT +h
will fall in the interval in (1.6.2). Hence, continuing the example, there is a
95% probability that inflation at T + h will be lower than 5.92 and higher
than −1.92. However, strictly speaking, only the former interpretation of the
confidence interval is correct.
Note that density forecasts and confidence intervals can also be con-
structed with different assumptions on the distribution of the error term,
though the derivations are more complex. Moreover, as long as the distribu-
tion of the error is symmetric, the density (and the interval forecasts) will be
centered around the optimal point forecast that coincides, as said, with the
future expected value of the dependent variable, conditional on the available
information set.
Finally, both forecast densities and interval forecasts tell us more about
(inflation) risk, a topic we will also discuss in further detail in later chap-
ters.
4. A decision rule that tells us, based on the value of the test statistic,
whether to accept or reject the null hypothesis. In practice, we define
a rejection region as a set of values such that if the realization of the
test statistic falls in this region, we reject the null hypothesis.
t-stat = (β̂_i − c) / √(v̂ar(β̂_i)),    (1.7.3)
t-stat = β̂_i / √(v̂ar(β̂_i)),    (1.7.5)
and the rejection region depends on the chosen alternative hypothesis, which
can be either β_i ≠ 0, β_i < 0, or β_i > 0.
An alternative decision rule can be sometimes useful, in particular when
using standard econometric software. Typically, the software reports the
probability, under the null hypothesis, that the absolute value of the test
statistic is larger than the realized value of the statistic. This quantity is
called p-value and it is formally defined, in the case of the t-stat, as

p-value = prob_{H0}(|t-stat| > a),

or

p-value = prob_{H0}(t-stat > a) + prob_{H0}(t-stat < −a),
² The case where the alternative hypothesis is β_i > c is similar.
where a is the realization of the t-stat. Hence, if the p-value is smaller than α,
it means that the t-stat is either very positive or very negative, and therefore
we can reject the null hypothesis.
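The t-stat of (1.7.5) and its p-value can be computed and cross-checked against standard software as in the following R sketch (simulated data, not from the book).

```r
# Minimal sketch: t-stat for H0: beta_i = 0 and its two-sided p-value.
set.seed(7)
T <- 100; k <- 2                                  # intercept and one slope
x <- rnorm(T)
y <- 1 + 0.5 * x + rnorm(T)                       # true slope is 0.5 (illustrative)
fit   <- lm(y ~ x)
b     <- coef(fit)["x"]
se    <- sqrt(diag(vcov(fit)))["x"]               # estimated standard error of beta_hat
tstat <- b / se                                   # equation (1.7.5), with c = 0
pval  <- 2 * pt(-abs(tstat), df = T - k)          # prob under H0 of a larger |t-stat|
c(tstat = unname(tstat), pval = unname(pval))
summary(fit)$coefficients["x", c("t value", "Pr(>|t|)")]   # should agree
```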
Let us now consider the composite null hypothesis H0 : β = c in (1.7.2),
meaning β1 = c1 , . . . , βk = ck jointly. For testing H0 we can use the F-
statistic, or F-stat, defined as
and continue until all the retained regressors are significant. However, this procedure has a few drawbacks: first, it is very difficult to determine its overall probability of type I error, due to the use of sequential testing; second, if we choose too high a value of α we could keep redundant regressors, while with too low a value we could omit relevant regressors; third, if we deleted non-significant variables in groups rather than one by one, using F-statistics, we might end up with a different model; fourth, the final model could possibly not satisfy Assumptions LR1 - LR5 in 1.2.1 even though the starting general model did; finally, the procedure may require a lot of time when performed manually, though it can be rather easily coded, as in the sketch below; see also Section 2.9.
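A rough R sketch of the backward-elimination step just described (our illustration, not the book's code): starting from the general model, the least significant regressor is dropped one at a time until every retained regressor has a p-value below the chosen level α.

```r
# Minimal sketch of general-to-specific (backward elimination) based on t-stat p-values.
backward_elim <- function(y, X, alpha = 0.05) {
  X <- as.data.frame(X)
  repeat {
    fit   <- lm(y ~ ., data = X)
    coefs <- summary(fit)$coefficients[-1, , drop = FALSE]   # drop the intercept row
    pval  <- setNames(coefs[, "Pr(>|t|)"], rownames(coefs))
    if (length(pval) == 0 || max(pval) < alpha) return(fit)  # all retained regressors significant
    X <- X[, setdiff(names(X), names(which.max(pval))), drop = FALSE]  # drop the worst one
    if (ncol(X) == 0) return(lm(y ~ 1))                      # nothing survives: intercept only
  }
}

# Example: six candidate regressors, only the first three are relevant.
set.seed(11)
Xmat <- matrix(rnorm(200 * 6), 200, 6, dimnames = list(NULL, paste0("x", 1:6)))
y <- 1 + Xmat[, 1] - 0.8 * Xmat[, 2] + 0.5 * Xmat[, 3] + rnorm(200)
backward_elim(y, Xmat)
```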
Efron, Hastie, Johnstone, and Tibshirani (2004) show that the LARS
algorithm encompasses other popular shrinkage methods, including Forward
Selection itself, LASSO and the Elastic Net, to which we turn next.
where RSS is again the Residual Sum of Squares. The Lagrange multiplier λ
governs the shrinkage: the higher λ, the higher the penalty for having extra
regressors in the model.
LASSO introduces a slight but important modification of the penalty function of the RIDGE regression, which, rather than being a quadratic function, shows a kink at zero:

min_β RSS + λ Σ_{j=1}^M |β_j|.
This modification implies that, unlike in the RIDGE setup, in LASSO some
regression coefficients are set exactly at zero. This is a convenient feature
in particular when many potential regressors are considered, as for example
in applications with big data (see Chapter 13 for further discussion on the
topic).
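Both penalties are available in R through the glmnet package, as in the sketch below (our illustration, not the book's code; the λ value is arbitrary): with the quadratic penalty all coefficients remain non-zero, while LASSO zeroes out most of the irrelevant ones.

```r
# Minimal sketch: RIDGE (alpha = 0) versus LASSO (alpha = 1) with glmnet.
# install.packages("glmnet")  # if needed
library(glmnet)
set.seed(3)
X <- matrix(rnorm(100 * 20), 100, 20)            # many candidate regressors
y <- X[, 1:3] %*% c(2, -1, 0.5) + rnorm(100)     # only the first three matter
ridge <- glmnet(X, y, alpha = 0, lambda = 0.5)   # quadratic penalty
lasso <- glmnet(X, y, alpha = 1, lambda = 0.5)   # absolute-value penalty (kink at zero)
sum(as.vector(coef(ridge)) != 0)   # ridge shrinks but keeps all 21 coefficients non-zero
sum(as.vector(coef(lasso)) != 0)   # lasso sets many coefficients exactly to zero
```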
1.10 Multicollinearity
We have so far maintained Assumption 1.2.1 LR4 that is concerned with
regressor redundancy and the identifiability of all the β parameters in the
regression model. As a counter-example, we have considered the case where
k = 2 and X1 = X2. The matrix X′X in this case has dimension 2 × 2 but its rank is equal to one. Since the model becomes

y_t = β_1 x_{1t} + β_2 x_{2t} + ε_t    (1.10.1)
    = (β_1 + β_2) x_{1t} + ε_t,
or as

ŷ_{T+h} = \widehat{β_1 + β_2}\, x_{1,T+h},
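The identification problem can be seen directly in R (a small sketch, not from the book): with x2 identical to x1 the software can only recover the sum of the two coefficients, and the redundant regressor is reported as NA.

```r
# Minimal sketch: perfect multicollinearity, only beta1 + beta2 is identified.
set.seed(5)
x1 <- rnorm(100)
x2 <- x1                                    # X1 = X2, so X'X (2 x 2) has rank one
y  <- 0.7 * x1 + 0.3 * x2 + rnorm(100)      # true beta1 = 0.7, beta2 = 0.3
coef(lm(y ~ x1 + x2 - 1))                   # x2 reported as NA; x1 estimates beta1 + beta2 = 1
```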