Hans-Martin Krolzig
Markov-Switching Vector
Autoregressions
Modelling, Statistical Inference, and Application to
Business Cycle Analysis
Lecture Notes in Economics
and Mathematical Systems 454
Founding Editors:
M. Beckmann
H. P. Künzi
Editorial Board:
H. Albach, M. Beckmann, G. Feichtinger, W. Güth, W. Hildenbrand,
W. Krelle, H. P. Künzi, K. Ritter, U. Schittko, P. Schönfeld, R. Selten
Managing Editors:
Prof. Dr. G. Fandel
Fachbereich Wirtschaftswissenschaften
Fernuniversität Hagen
Feithstr. 140/AVZ II, D-58084 Hagen, Germany
Prof. Dr. W. Trockel
Institut für Mathematische Wirtschaftsforschung (IMW)
Universität Bielefeld
Universitätsstr. 25, D-33615 Bielefeld, Germany
Springer-Verlag Berlin Heidelberg GmbH
Hans-Martin Krolzig
Markov-Switching
Vector Autoregressions
Modelling, Statistical Inference,
and Application to
Business Cycle Analysis
Springer
Author
Dr. Hans-Martin Krolzig
University of Oxford
Institute of Economics and Statistics
St. Cross Building, Manor Road
Oxford OX1 3UL, Great Britain
ISSN 0075-8442
ISBN 978-3-540-63073-9 ISBN 978-3-642-51684-9 (eBook)
DOI 10.1007/978-3-642-51684-9
This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, re-use
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other
way, and storage in data banks. Duplication of this publication or parts thereof is
permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from
Springer-Verlag. Violations are liable for prosecution under the German Copyright
Law.
© Springer-Verlag Berlin Heidelberg 1997
Originally published by Springer-Verlag Berlin Heidelberg New York in 1997.
The use of general descriptive names, registered names, trademarks, etc. in this
publication does not imply, even in the absence of a specific statement, that such
names are exempt from the relevant protective laws and regulations and therefore
free for general use.
Typesetting: Camera ready by author
SPIN: 10546781 42/3142-543210 - Printed on acid-free paper
To my parents, Grete and Walter
Preface
The author is indebted to numerous individuals for help in the preparation of this
study. Primarily, I owe a great debt to Helmut Lütkepohl, who inspired my interest in
multiple time series econometrics, suggested the subject, and advised and encouraged my
research. The many hours Helmut Lütkepohl and Jürgen Wolters spent in discussing
the issues of this study have been an immeasurable help.
The results obtained and their presentation have been profoundly affected by the
inspiration of and interaction with numerous colleagues in Berlin and Oxford. Of the
many researchers with whom I have discussed various aspects of the work presented
here, I would like especially to thank Ralph Friedmann, David Hendry and D.S. Poskitt.
Many people have helped with the reading of the manuscript. Special thanks go to
Paul Houseman, Marianne Sensier, Dirk Soyka and Don Indra Asoka Wijewickrama;
they pointed out numerous errors and provided helpful suggestions.
I am very grateful to all of them, but they are, of course, absolved from any
responsibility for the views expressed in the book. Any errors that may remain are my own.
Finally, I am greatly indebted to my parents and friends for their support and
encouragement while I was struggling with the writing of the thesis.
Prologue 1
Epilogue 329
References 331
Tables 347
Figures 351
In the last decade time series econometrics has changed dramatically. One increasingly
prominent field has become the treatment of regime shifts and non-linear modelling
strategies. While the importance of regime shifts, particularly in macroeconometric
systems, seems to be generally accepted, there is no established theory suggesting
a unique approach for specifying econometric models that embed changes in regime.
Structural changes such as the oil price shocks, the introduction of the European
Monetary System, the German reunification, the European Monetary Union and the Eastern
European economies in transition are often incorporated into a dynamic system in
a deterministic fashion. A time-varying process poses problems for estimation and
forecasting when a shift in parameters occurs. The degradation of performance of
structural macroeconomic models seems at least partly due to regime shifts. Increasingly,
regime shifts are not considered as singular deterministic events; instead, the
unobservable regime is assumed to be governed by an exogenous stochastic process.
Thus regime shifts of the past are expected to occur in the future in a similar fashion.
The main aim of this study is to construct a general econometric framework for the
statistical analysis of multiple time series when the mechanism which generated the
data is subject to regime shifts. We build up a stationary model where a stable vector
autoregression is defined conditional on the regime and where the regime generating
process is given by an irreducible ergodic Markov chain. The statistical analysis then
involves (i.) extracting the information in the data about regime shifts in the past,
(ii.) estimating consistently and efficiently the parameters of the model, (iii.)
detecting recent regime shifts, (iv.) correcting the vector autoregressive model at
times when the regime alters, and finally (v.) incorporating the probability of future
regime shifts into forecasts. This
Markov-switching vector autoregressive model represents a very general class which
encompasses some alternative non-linear and time-varying models. In general, the
model generates conditional heteroskedasticity and non-normality; prediction
intervals are asymmetric and reflect the prevailing uncertainty about the regime.
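The flavour of such a process can be sketched numerically. The following is a minimal simulation of a two-regime switching-mean model, an MSI(2)-AR(0) process y_t = μ(s_t) + u_t; all numbers here (transition probabilities, means, error variance) are hypothetical illustration values, not taken from the text.

```python
import random

# Hypothetical numbers, not from the text: an MSI(2)-AR(0) process
# y_t = mu(s_t) + u_t whose regime s_t follows a two-state Markov chain.
random.seed(1)

P = [[0.9, 0.1],   # row i: Pr(s_{t+1} = j | s_t = i)
     [0.2, 0.8]]
mu = [-1.0, 2.0]   # regime-dependent means
sigma = 1.0        # homoskedastic error within regimes

def simulate(T, s=0):
    y = []
    for _ in range(T):
        s = 0 if random.random() < P[s][0] else 1
        y.append(mu[s] + random.gauss(0.0, sigma))
    return y

y = simulate(20000)
mean = sum(y) / len(y)
var = sum((v - mean) ** 2 for v in y) / len(y)
# Ergodic probabilities: (p21, p12)/(p12 + p21) = (2/3, 1/3), so the
# stationary mean is (2/3)(-1) + (1/3)(2) = 0 and the stationary
# variance is (2/3)(1 + 1) + (1/3)(1 + 4) - 0 = 3.
print(round(mean, 2), round(var, 2))
```

Although each regime generates Gaussian observations, the unconditional distribution of the simulated series is a mixture of normals, which is exactly the source of the non-normality mentioned above.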
We will investigate the issues of detecting multiple breaks in multiple time series,
modelling, specification, estimation, testing and forecasting. En route, we discuss
the relation to alternative non-linear models and models with time-varying parameters.
In the course of this study we will also propose new directions in which to generalize
the MS-VAR model. Although some methodological and technical ideas are discussed
in detail, the focus is on modelling, specification and estimation of suitable models.
The first part of the book gives a comprehensive mathematical and statistical analysis
of the Markov-switching vector autoregressive model. In the first chapters, Markov-
switching vector autoregressive (MS-VAR) processes are introduced and their basic
properties are investigated. We discuss the relation of the MS-VAR model to the
time-invariant vector autoregressive model and to alternative non-linear time
series models. The preliminary considerations of Chapter 1 are formalized in the
state-space representation given in Chapter 2, which will be the framework for
analyzing the stochastic properties of MS-VAR processes and for developing statistical
techniques for the specification and estimation of MS-VAR models to fit the data.

Survey of the Study
The main part of this study (Chapters 6 - 10) is devoted to the discussion of parameter
estimation for this class of models. The classical method of maximum likelihood
estimation is considered in Chapter 6, where, due to the non-linearity of the model,
iterative procedures have to be introduced. While various approaches are discussed,
major attention is given to the EM algorithm, whereby the limitation of the previous
literature to special MS-VAR models is overcome. The issues of identifiability
and consistency of the maximum likelihood (ML) estimation are investigated.
Techniques for the calculation of the asymptotic variance-covariance matrix of ML
estimates are presented.
In Chapter 7 the issues of model selection and model checking are investigated. The
focus is on the specification of MS-VAR models. A strategy for simultaneously
selecting the number of regimes and the order of the autoregression in
Markov-switching time series models based on ARMA representations is proposed
and combined with classical specification testing procedures.
Chapter 9 goes into further technical details of these estimation techniques and
discusses the design of the regressions involved. Due to the computational demands of
iterative estimation techniques, major attention will be given to the development of
estimators which efficiently use the structure of a particular model. The regressions
involved in the EM algorithm and the Gibbs sampler are explicitly compared for
all alternative specifications of MS-VAR models. It is demonstrated that the presented
EM algorithm, as well as the introduced Gibbs sampler, permits applications to
large systems. This reveals that the self-restriction of recent empirical investigations
to rudimentary univariate time series models, mixtures of normals or hidden Markov
chains is not justified.
In the second and last part of this study, the methodology introduced in the preceding
chapters is applied to business cycle analysis. This is not intended to be a
comprehensive analysis of the business cycle phenomenon and of all potential
contributions of the MS-VAR model to business cycle analysis; such an analysis would
be clearly beyond the scope of this study. Instead, the methods developed for the
statistical analysis of systems subject to regime shifts are elucidated by specific
empirical investigations, among them an analysis of international and global business
cycles based on a six-dimensional system for the USA, Japan, West Germany, the UK,
Canada, and Australia. The considerations formulated in Chapter 13 suggest a new
methodological approach to the analysis of cointegrated linear systems with shifts in
regime. This methodology is then illustrated with a reconsideration of international
and global business cycles. The study concludes with a brief discussion of our major
findings and remaining problems.
The study has a modular structure. Given the notation and basic structures introduced
in the first two chapters, most of the following chapters can stand alone. Hence, the
reader who is primarily interested in empirical applications and less in statistical
techniques may first read the fundamental Chapters 1 and 2, then Chapter 5 and
Chapter 6, followed by the empirical analyses in Chapters 11 and 12 alongside the
more technically demanding Chapter 13, and decide afterwards which of the remaining
chapters will be of interest to him or her.
Although it is not necessary for the reader to be familiar with all fundamental methods
of multiple time series analysis, the subject of interest requires the application
of some formal techniques. A number of references to standard results are given
throughout the study, while to simplify things for the reader we have remained as
close as possible to the notation used in LÜTKEPOHL [1991]. In order to achieve
compactness in our presentation, we have dispensed with a more general introduction
to the topic since such introductions are already available in HAMILTON [1993],
[1994b, ch. 22] and KROLZIG & LÜTKEPOHL [1995].
Chapter 1
The Markov-Switching
Vector Autoregressive Model
This first chapter is devoted to a general introduction to the Markov-switching vector
autoregressive (MS-VAR) time series model. In Section 1.2 we present the fundamental
assumptions constituting this class of models. The discussion of the two components
of MS-VAR processes will clarify their relation to time-invariant vector
autoregressive and Markov-chain models. Some basic stochastic properties of MS-VAR
processes are presented in Section 1.3. Finally, MS-VAR models are compared to
alternative non-normal and non-linear time series models proposed in the literature.
As most non-linear models have been developed for univariate time series, this
discussion is restricted to this case. However, generalizations to the vector case are
also considered.
1.1. General Introduction

Reduced-form vector autoregressive (VAR) models have become a dominant research
strategy in empirical macroeconomics since SIMS [1980]. In this study we will
consider VAR models with changes in regime; most results will carry over to
structural dynamic econometric models by treating them as restricted VAR models.
When the system is subject to regime shifts, the parameters θ of the VAR process
will be time-varying. But the process might be time-invariant conditional on an
unobservable regime variable s_t which indicates the regime prevailing at time t. Let
M denote the number of feasible regimes, so that s_t ∈ {1, . . . , M}. Then the
conditional probability density of the observed time series vector y_t is

\[
p(y_t \mid Y_{t-1}, s_t) =
\begin{cases}
f(y_t \mid Y_{t-1}, \theta_1) & \text{if } s_t = 1, \\
\quad\vdots \\
f(y_t \mid Y_{t-1}, \theta_M) & \text{if } s_t = M,
\end{cases}
\tag{1.1}
\]

where θ_m is the VAR parameter vector in regime m = 1, . . . , M and Y_{t-1} are the
observations {y_{t-j}}_{j=1}^{\infty}.
Thus, for a given regime s_t, the time series vector y_t is generated by a vector
autoregressive process of order p (VAR(p) model) such that

\[
y_t = \nu(s_t) + \sum_{j=1}^{p} A_j(s_t)\, y_{t-j} + u_t .
\]
1 The notation Pr(·) refers to a discrete probability measure, while p(·) denotes a probability density
function.
system, and the effects of changes in regime. Secondly, the basic statistical
techniques were introduced by BAUM & PETRIE [1966] and BAUM
et al. [1970] for probabilistic functions of Markov chains, while the MS-VAR
model also encompasses older concepts such as the mixture of normal
distributions model attributed to PEARSON [1894] and the hidden Markov-chain
model traced back to BLACKWELL & KOOPMANS [1975] and HELLER [1965].
Thirdly, in econometrics, the first attempts to create Markov-switching regression
models were undertaken by GOLDFELD & QUANDT [1973]; they remained,
however, rather rudimentary. The first comprehensive approach to the statistical
analysis of Markov-switching regression models was proposed by LINDGREN
[1978], which is based on the ideas of BAUM et al. [1970]. In time series analysis, the
introduction of the Markov-switching model is due to HAMILTON [1988], [1989],
on which most recent contributions (as well as this study) are founded. Finally,
our consideration of MS-VAR models as a Gaussian vector autoregressive process
conditioned on an exogenous regime generating process is closely related to state-space
models as well as the concept of doubly stochastic processes introduced by
TJØSTHEIM [1986b].
The MS-VAR model belongs to a more general class of models that characterize a
non-linear data generating process as piecewise linear by restricting the process to
be linear in each regime, where the regime on which the process is conditioned is
unobservable, and only a discrete number of regimes are feasible. 2 These models
differ in their assumptions concerning the stochastic process generating the regime:
2 In the case of two regimes, POTTER [1990], [1993] proposed to call this class of non-linear, non-normal
models the single index generalized multivariate autoregressive (SIGMA) model.
and the conditional mean E[y_t | Y_{t-1}, s_{t-1}] is given by E[y_t | Y_{t-1}]. 3 Even so,
this model can be considered as a restricted MS-VAR model where the transition
matrix has rank one. Moreover, if only the intercept term is regime-dependent,
MS(M)-VAR(p) processes with Gaussian errors and i.i.d. switching regimes are
observationally equivalent to time-invariant VAR(p) processes with non-normal
errors. Hence, the scope for modelling with this kind of model is very limited.
While the presumptions of the SETAR and the MS-AR model seem to be quite
different, the relation between the two model alternatives is rather close. This is
also illustrated in the appendix, which gives an example showing that SETAR
and MS-VAR models can be observationally equivalent.
3 where θ = (θ'_1, . . . , θ'_M)' collects the VAR parameters and ξ̄_m is the ergodic probability of regime
m.
4 In threshold autoregressive (TAR) processes, the indicator function is defined in terms of a switching
variable z_{t-d}, d ≥ 0. In addition, indicator variables can be introduced and treated with
errors-in-variables techniques. Refer for example to COSSLETT & LEE [1985] and KAMINSKY [1993].
gime 1. For example, TERÄSVIRTA & ANDERSON [1992] use the logistic
distribution function in their analysis of the U.S. business cycle. 5
(iv.) All the previously mentioned models are special cases of an endogenous
selection Markov-switching vector autoregressive model. In an EMS(M, d)-
VAR(p) model the transition probabilities p_{ij}(·) are functions of the observed
time series vector y_{t-d}:
In this study, it will be shown that the MS-VAR model can encompass a wide
spectrum of non-linear modifications of the VAR model proposed in the literature.
5 If F(·) is even, e.g. F(y_{t-d} - r) = 1 - exp{-(y_{t-d} - r)²}, a generalized exponential
autoregressive model as proposed by OZAKI [1980] and HAGGAN & OZAKI [1981] ensues.
1.2. Markov-Switching Vector Autoregressions
where u_t ∼ IID(0, Σ) and y_0, . . . , y_{1-p} are fixed. Denoting by
A(L) = I_K - A_1 L - · · · - A_p L^p the (K × K)-dimensional lag polynomial, we
assume that there are no roots on or inside the unit circle, |A(z)| ≠ 0 for |z| ≤ 1, where
L is the lag operator, so that y_{t-j} = L^j y_t. If a normal distribution of the error is
assumed, u_t ∼ NID(0, Σ), equation (1.2) is known as the intercept form of a stable
Gaussian VAR(p) model. This can be reparametrized as the mean-adjusted form of
a VAR model:
\[
y_t - \mu = A_1 (y_{t-1} - \mu) + \dots + A_p (y_{t-p} - \mu) + u_t ,
\tag{1.3}
\]
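The link between the two forms can be checked numerically. The following univariate sketch uses hypothetical AR(2) coefficients: the intercept form y_t = ν + a_1 y_{t-1} + a_2 y_{t-2} + u_t and the mean-adjusted form (1.3) are connected by μ = ν/(1 - a_1 - a_2), the scalar analogue of μ = A(1)^{-1} ν.

```python
# Hypothetical coefficients, for illustration only: a univariate AR(2) in
# intercept form, whose mean-adjusted form has mu = nu / (1 - a1 - a2).
nu, a1, a2 = 1.5, 0.5, 0.2
mu = nu / (1 - a1 - a2)

# With zero errors, mu is the fixed point of the intercept-form recursion.
y_prev1 = y_prev2 = mu
y_next = nu + a1 * y_prev1 + a2 * y_prev2
print(abs(y_next - mu) < 1e-9)
```

In the stable linear case the two parametrizations are thus merely two labellings of the same process; as noted below, this equivalence breaks down once the mean or intercept switches with the regime.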
The main characteristic of the Markov-switching model is the assumption that the
unobservable realization of the regime s_t ∈ {1, . . . , M} is governed by a discrete-time,
discrete-state Markov stochastic process, which is defined by the transition
probabilities

\[
p_{ij} = \Pr(s_{t+1} = j \mid s_t = i), \qquad \sum_{j=1}^{M} p_{ij} = 1 \quad \forall i, j \in \{1, \dots, M\}.
\tag{1.4}
\]
6 For reasons of simplicity in notation, we do not introduce a separate notation for the theoretical
representation of the stochastic process and its actual realizations.
7 In the notation of state-space models, the varying parameters μ, ν, A_1, . . . , A_p, Σ become functions
of the model's hyper-parameters.
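The defining property in (1.4), non-negative entries and unit row sums, is easy to verify mechanically. The helper below is an illustrative assumption, not part of the text:

```python
# A minimal helper (an assumption, not from the text) checking the defining
# property in (1.4): non-negative entries and rows that sum to one.
def is_transition_matrix(P, tol=1e-12):
    return all(
        abs(sum(row) - 1.0) < tol and all(p >= 0.0 for p in row)
        for row in P
    )

valid = is_transition_matrix([[0.9, 0.1],
                              [0.2, 0.8]])
invalid = is_transition_matrix([[0.5, 0.4],   # first row sums to 0.9
                                [0.2, 0.8]])
print(valid, invalid)   # → True False
```

Such a check is useful in practice because numerically estimated transition probabilities can drift away from the simplex.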
\[
y_t - \mu(s_t) = A_1 \left( y_{t-1} - \mu(s_{t-1}) \right) + \dots + A_p \left( y_{t-p} - \mu(s_{t-p}) \right) + u_t ,
\tag{1.5}
\]

where

\[
\mu(s_t) =
\begin{cases}
\mu_1 & \text{if } s_t = 1, \\
\quad\vdots \\
\mu_M & \text{if } s_t = M.
\end{cases}
\tag{1.6}
\]
In the model (1.5) there is, after a change in the regime, an immediate one-time jump
in the process mean. Occasionally, it may be more plausible to assume that the mean
smoothly approaches a new level after the transition from one state to another. In
such a situation the following model with a regime-dependent intercept term ν(s_t)
may be used:

\[
y_t = \nu(s_t) + A_1 y_{t-1} + \dots + A_p y_{t-p} + u_t .
\tag{1.7}
\]
In contrast to the linear VAR model, the mean-adjusted form (1.5) and the intercept
form (1.7) of an MS(M)-VAR(p) model are not equivalent. In Chapter 3 it will be
seen that these forms imply different dynamic adjustments of the observed variables
after a change in regime. While a permanent regime shift in the mean μ(s_t) causes
an immediate jump of the observed time series vector onto its new level, the dynamic
response to a once-and-for-all regime shift in the intercept term ν(s_t) is identical to
an equivalent shock in the white noise series u_t.
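This contrast can be made concrete with a small noise-free sketch using a hypothetical AR(1) coefficient: a permanent shift of the mean from 0 to 1 at t = 1 moves the mean-adjusted path to the new level at once, while the matching shift of the intercept, ν = (1 - a_1)μ, is absorbed only gradually.

```python
# Hypothetical AR(1) coefficient, zero errors; the regime shifts once at t = 1.
a1 = 0.5
mu_old, mu_new = 0.0, 1.0

# Mean-adjusted form: y_t - mu(s_t) = a1 * (y_{t-1} - mu(s_{t-1}))
means = [mu_old] + [mu_new] * 5
y_msm = [mu_old]
for t in range(1, 6):
    y_msm.append(means[t] + a1 * (y_msm[-1] - means[t - 1]))

# Intercept form: y_t = nu(s_t) + a1 * y_{t-1}, with nu = (1 - a1) * mu
nu_new = (1 - a1) * mu_new
y_msi = [mu_old]
for t in range(1, 6):
    y_msi.append(nu_new + a1 * y_msi[-1])

print(y_msm)                            # jumps to the new level at once
print([round(v, 4) for v in y_msi])     # approaches the new level smoothly
```

The intercept-form path is exactly the impulse response one would obtain from an equivalent shock to u_t, which is the point made in the paragraph above.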
In the most general specification of an MS-VAR model, all parameters of the
autoregression are conditioned on the state s_t of the Markov chain. We have assumed that
each regime m possesses its own VAR(p) representation with parameters ν_m (or μ_m),
Σ_m, A_{1m}, . . . , A_{pm}, m = 1, . . . , M, such that

\[
y_t = \nu(s_t) + A_1(s_t) y_{t-1} + \dots + A_p(s_t) y_{t-p} + u_t ,
\qquad u_t \sim \text{NID}(0, \Sigma(s_t)).
\]
However, for empirical applications it might be more helpful to use a model where
only some parameters are conditioned on the state of the Markov chain, while the
8 Even at this early stage a complication arises if the mean-adjusted form is considered. The conditional
density of y_t depends not only on s_t but also on s_{t-1}, . . . , s_{t-p}, i.e. M^{p+1} different conditional
densities have to be taken into account.
other parameters are regime-invariant. In Section 1.2.2 some particular MS-VAR
models will be introduced where the autoregressive parameters, the mean or the
intercepts are regime-dependent and where the error term is hetero- or homoskedastic.
Estimating these particular MS-VAR models is discussed separately in Chapter 9.
In empirical research, only some parameters will be conditioned on the state of the
Markov chain while the other parameters will be regime-invariant. In order to
establish a unique notation for each model, we specify with the general MS(M) term the
regime-dependent parameters:
M  Markov-switching mean,
I  Markov-switching intercept term,
A  Markov-switching autoregressive parameters,
H  Markov-switching heteroskedasticity.
To achieve a distinction of VAR models with time-invariant mean and intercept term,
we denote the mean-adjusted form of a vector autoregression as MVAR(p). An
overview is given in Table 1.1. Obviously the MSI and the MSM specifications are
equivalent if the order of the autoregression is zero. For this so-called hidden Markov-
chain model, we prefer the notation MSI(M)-VAR(0). As will be seen later on, the
MSI(M)-VAR(0) model and MSI(M)-VAR(p) models with p > 0 are isomorphic
concerning their statistical analysis. In Section 10.3 we will further extend the class
of models under consideration.
The MS-VAR model provides a very flexible framework which allows for
heteroskedasticity, occasional shifts, reversing trends, and forecasts performed in a
non-linear manner. In the following sections the focus is on models where the mean
(MSM(M)-VAR(p) models) or the intercept term (MSI(M)-VAR(p) models) is
subject to occasional discrete shifts; regime-dependent covariance structures of the
process are considered as additional features.
At this stage it is useful to define the parameter shifts more clearly by formulating the
system as a single equation with "dummy" (or, more precisely, indicator)
variables:

\[
I(s_t = m) =
\begin{cases}
1 & \text{if } s_t = m, \\
0 & \text{otherwise},
\end{cases}
\]
where m = 1, . . . , M. In the course of the following chapters it will prove helpful
to collect all the information about the realization of the Markov chain in the vector
ξ_t as

\[
\xi_t = \begin{pmatrix} I(s_t = 1) \\ \vdots \\ I(s_t = M) \end{pmatrix}.
\]
Thus, ξ_t denotes the unobserved state of the system. Since ξ_t consists of binary
variables, it has some particular properties:

\[
\mathrm{E}[\xi_t] =
\begin{pmatrix} \Pr(s_t = 1) \\ \vdots \\ \Pr(s_t = M) \end{pmatrix}
=
\begin{pmatrix} \Pr(\xi_t = \iota_1) \\ \vdots \\ \Pr(\xi_t = \iota_M) \end{pmatrix},
\]

where ι_m is the m-th column of the identity matrix. Thus E[ξ_t], or a well-defined
conditional expectation, represents the probability distribution of s_t. It is easily verified
that 1'_M ξ_t = 1 as well as ξ'_t ξ_t = 1 and ξ_t ξ'_t = diag(ξ_t), where 1_M = (1, . . . , 1)' is
an (M × 1) vector.
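These identities follow directly from the binarity of ξ_t and can be verified numerically, here for M = 3 and s_t = 2:

```python
# Numerical check of the identities for M = 3 and s_t = 2:
# 1'_M xi_t = 1, xi_t' xi_t = 1 and xi_t xi_t' = diag(xi_t).
M, s_t = 3, 2
xi = [1.0 if m == s_t else 0.0 for m in range(1, M + 1)]

ones_dot = sum(xi)                          # 1'_M xi_t
self_dot = sum(v * v for v in xi)           # xi_t' xi_t
outer = [[a * b for b in xi] for a in xi]   # xi_t xi_t'
diag = [[xi[i] if i == j else 0.0 for j in range(M)] for i in range(M)]
print(ones_dot == 1.0, self_dot == 1.0, outer == diag)   # → True True True
```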
For example, we can now rewrite the mean shift function (1.6) as

\[
\mu(s_t) = \sum_{m=1}^{M} \mu_m I(s_t = m).
\]
Analogously, the regime-dependent covariance matrices can be collected in

\[
\Sigma = \begin{bmatrix} \Sigma_1 & \cdots & \Sigma_M \end{bmatrix}
\quad (K \times MK),
\]

with σ_m = vech(Σ_m) and σ = (σ'_1, . . . , σ'_M)', such that

\[
\Sigma(s_t) = \sum_{m=1}^{M} I(s_t = m)\, \Sigma_m
\]

is a (K × K) matrix.
The transition probabilities (1.4) are collected in the (M × M) transition matrix

\[
P = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1M} \\
p_{21} & p_{22} & \cdots & p_{2M} \\
\vdots & \vdots & & \vdots \\
p_{M1} & p_{M2} & \cdots & p_{MM}
\end{pmatrix},
\tag{1.8}
\]
where the past and additional variables such as y_t reveal no relevant information
beyond that of the actual state. The assumption of a first-order Markov process is not
especially restrictive, since each Markov chain of an order greater than one can be
reparametrized as a higher-dimensional first-order Markov process (cf. FRIEDMANN
[1994]). A comprehensive discussion of the theory of Markov chains with application
to Markov-switching models is given by HAMILTON [1994b, ch. 22.2]. We will
just give a brief introduction to some basic concepts related to MS-VAR models, in
particular to the state-space form and the filter.
It is usually assumed that the Markov process is ergodic. A Markov chain is said to
be ergodic if exactly one of the eigenvalues of the transition matrix P is unity and
all other eigenvalues are inside the unit circle. Under this condition there exists a
stationary or unconditional probability distribution of the regimes. The ergodic
probabilities are denoted by ξ̄ = E[ξ_t]. They are determined by the stationarity
restriction P'ξ̄ = ξ̄ and the adding-up restriction 1'_M ξ̄ = 1.
If ξ̄ is strictly positive, such that all regimes have a positive unconditional
probability ξ̄_i > 0, i = 1, . . . , M, the process is called irreducible. The assumptions of
ergodicity and irreducibility are essential for the theoretical properties of MS-VAR
models, e.g. its property of being stationary. The estimation procedures, which will
be introduced in Chapter 6 and Chapter 8, are flexible enough to capture even
degenerate cases, e.g. when there is a single jump ("structural break") into an
absorbing state that prevails until the end of the observation period.
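The ergodic probabilities can be computed from the two restrictions above. For an ergodic chain a simple way to approximate them is to iterate ξ ← P'ξ, shown here for a hypothetical two-regime transition matrix:

```python
# The ergodic probabilities solve P' xi = xi with 1'_M xi = 1; for an
# ergodic chain they can be approximated by iterating xi <- P' xi.
P = [[0.9, 0.1],   # hypothetical two-regime transition matrix
     [0.2, 0.8]]
M = len(P)

xi = [1.0 / M] * M
for _ in range(500):
    xi = [sum(P[i][j] * xi[i] for i in range(M)) for j in range(M)]

# For this P the exact solution is (p21, p12)/(p12 + p21) = (2/3, 1/3).
print([round(v, 4) for v in xi])
```

The iteration preserves the adding-up restriction at every step, and its error contracts at the rate of the second-largest eigenvalue of P (here 0.7), which is why convergence is geometric for an ergodic chain.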
1.3. The Data Generating Process

After this introduction of the two components of MS-VAR models, (i.) the Gaussian
VAR model as the conditional data generating process and (ii.) the Markov chain as
the regime generating process, we will briefly discuss their main implications for the
data generating process.
For given states ξ_t and lagged endogenous variables
Y_{t-1} = (y'_{t-1}, y'_{t-2}, . . . , y'_1, y'_0, . . . , y'_{1-p})', the conditional probability
density function of y_t is denoted by p(y_t | ξ_t, Y_{t-1}). It is convenient to assume in
(1.5) and (1.7) a normal distribution of the error term u_t, so that

\[
p(y_t \mid \xi_t = \iota_m, Y_{t-1})
= (2\pi)^{-K/2} \, |\Sigma_m|^{-1/2}
\exp\!\left\{ -\tfrac{1}{2} (y_t - \bar{y}_{mt})' \Sigma_m^{-1} (y_t - \bar{y}_{mt}) \right\},
\tag{1.10}
\]
where the conditional means ȳ_{mt} are summarized in the vector ȳ_t, which is, e.g. in
MSI specifications, of the form

\[
\bar{y}_t = \begin{pmatrix}
\nu_1 + \sum_{j=1}^{p} A_{1j} y_{t-j} \\
\vdots \\
\nu_M + \sum_{j=1}^{p} A_{Mj} y_{t-j}
\end{pmatrix}.
\]
Assuming that the information set available at time t - 1 consists only of the sample
observations and the pre-sample values collected in Y_{t-1} and the states of the Markov
chain up to ξ_{t-1}, the conditional density of y_t is a mixture of normals 9. If the
densities of y_t conditional on ξ_t and Y_{t-1} are collected in the vector η_t as

\[
\eta_t = \begin{pmatrix}
p(y_t \mid \xi_t = \iota_1, Y_{t-1}) \\
\vdots \\
p(y_t \mid \xi_t = \iota_M, Y_{t-1})
\end{pmatrix},
\tag{1.13}
\]
9 The reader is referred to HAMILTON [1994a] for an excellent introduction to the major concepts of
Markov chains and to TITTERINGTON, SMITH & MAKOV [1985] for the statistical properties of
mixtures of normals.
Since the regime is assumed to be unobservable, the relevant information set available
at time t - 1 consists only of the observed time series until time t and the unobserved
regime vector ξ_t has to be replaced by the inference Pr(ξ_t | Y_τ). These probabilities
of being in regime m given an information set Y_τ are denoted ξ̂_{mt|τ} and
collected in the vector ξ̂_{t|τ} as

\[
\hat{\xi}_{t|\tau} = \begin{pmatrix}
\Pr(\xi_t = \iota_1 \mid Y_\tau) \\
\vdots \\
\Pr(\xi_t = \iota_M \mid Y_\tau)
\end{pmatrix},
\]

which allows two different interpretations. First, ξ̂_{t|τ} denotes the discrete conditional
probability distribution of ξ_t given Y_τ. Secondly, ξ̂_{t|τ} is equivalent to the conditional
mean of ξ_t given Y_τ. This is due to the binarity of the elements of ξ_t, which implies
that E[ξ_{mt}] = Pr(ξ_{mt} = 1) = Pr(s_t = m). Thus, the conditional probability
density of y_t based upon Y_{t-1} is given by

\[
p(y_t \mid Y_{t-1})
= \sum_{m=1}^{M} p(y_t, \xi_{t-1} = \iota_m \mid Y_{t-1})
= \sum_{m=1}^{M} p(y_t \mid \xi_{t-1} = \iota_m, Y_{t-1}) \Pr(\xi_{t-1} = \iota_m \mid Y_{t-1}).
\tag{1.15}
\]
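A univariate sketch of this mixture density, with hypothetical regime means, standard deviations, and filtered probabilities: the one-step density of y_t is a probability-weighted sum of the regime-conditional Gaussian densities.

```python
import math

# Hypothetical numbers for illustration: a two-regime Gaussian mixture
# one-step density, as in (1.15) specialized to the univariate case.
def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

mu = [-1.0, 2.0]     # regime-conditional means
sd = [1.0, 1.5]      # regime-conditional standard deviations
prob = [0.7, 0.3]    # Pr(s_t = m | Y_{t-1}), as delivered by the filter

y = 0.5
density = sum(p * normal_pdf(y, m, s) for p, m, s in zip(prob, mu, sd))
print(round(density, 4))
```

Evaluating this mixture density observation by observation is exactly what the likelihood recursions of the later chapters do in vector form.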
Assuming that the pre-sample values Y_0 are given, the density of the sample Y ≡ Y_T for
given states ξ is determined by

\[
p(Y \mid \xi) = \prod_{t=1}^{T} p(y_t \mid \xi_t, Y_{t-1}).
\]

Hence, the joint probability distribution of observations and states can be calculated
as

\[
p(Y, \xi) = p(Y \mid \xi) \Pr(\xi).
\]
Finally, it follows by the definition of the conditional density that the conditional
distribution of the total regime vector ξ is given by

\[
\Pr(\xi \mid Y) = \frac{p(Y, \xi)}{p(Y)}.
\]

Thus, the desired conditional regime probabilities Pr(ξ_t | Y) can be derived by
marginalization of Pr(ξ | Y). In practice these cumbersome calculations can be simplified
by a recursive algorithm, a matter which is discussed in Chapter 5.
The regime probabilities for future periods follow from the exogenous stochastic
process of ξ_t, more precisely the Markov property of regimes,
Pr(ξ_{T+h} | ξ_T, Y) = Pr(ξ_{T+h} | ξ_T):

\[
\Pr(\xi_{T+h} \mid Y)
= \sum_{\xi_T} \Pr(\xi_{T+h} \mid \xi_T, Y) \Pr(\xi_T \mid Y)
= \sum_{\xi_T} \Pr(\xi_{T+h} \mid \xi_T) \Pr(\xi_T \mid Y),
\]

so that ξ̂_{T+h|T} = (P')^h ξ̂_{T|T}, where P is the transition matrix as in (1.8).
Forecasting MS-VAR processes is discussed at full length in Chapter 4.
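A hypothetical two-regime illustration of these h-step regime forecasts, iterating the filtered probabilities forward with P':

```python
# Hypothetical transition matrix; the h-step regime forecast is obtained by
# repeatedly applying P' to the filtered probability vector xi_{T|T}.
P = [[0.9, 0.1],
     [0.2, 0.8]]
M = len(P)

def p_transpose(xi):
    # one application of P' to a probability vector
    return [sum(P[i][j] * xi[i] for i in range(M)) for j in range(M)]

xi = [1.0, 0.0]   # regime 1 known with certainty at time T
for h in range(1, 4):
    xi = p_transpose(xi)
    print(h, [round(v, 3) for v in xi])
# The forecasts decay towards the ergodic distribution (2/3, 1/3).
```

As h grows, the forecast converges to the ergodic distribution, so regime information is only valuable for short- to medium-term horizons.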
In this section we have given just a short introduction to some basic concepts related
to MS-VAR models; the following chapters will provide broader analyses of the
various topics.
1.4. Features of MS-VAR Processes and Their Relation to Other Non-linear Models
The Markov-switching vector autoregressive model is a very general approach for
modelling time series with changes in regime. In Chapter 3 it will be shown that MS-
VAR processes with shifting means or intercepts but regime-invariant variances and
autoregressive parameters can be represented as non-normal linear state-space
models. Furthermore, MSM-VAR and MSI-VAR models possess linear representations.
These processes may be better characterized as non-normal than as non-linear time
series models, as the associated Wold representations coincide with those of linear
models. While our primary research interest concerns the modelling of the
conditional mean, we will exemplify the effects of Markovian switching regimes on the
higher moments of the observed time series.
Most of these results are stated for the case of two regimes. Thus, the process
generating y_t can be rewritten as

where the critical value β_{ξ̄_1} depends on the ergodic regime probability ξ̄_1, e.g.
β_{0.5} = 2 and β_{0.1} = β_{0.9} = 3.

Thus, the excess kurtosis is different from zero if σ_1² ≠ σ_2² and 0 < ξ̄_1 < 1.
BOX & TIAO [1968] have used such a model for the detection of outliers. In order
to generate skewness and excess kurtosis it is, e.g., sufficient to assume an MSI(2)-
AR(0) model:
so that

Then it can be shown that the normalized third moment of y_t is given by the skewness.
If the regime i with the higher conditional mean μ_i > μ_j is less likely than the other
regime, ξ̄_i < ξ̄_j, then the observed variable is more likely to be far above the mean
than it is to be far below the mean.

Since max_{ξ̄_1 ∈ [0,1]} ξ̄_1(1 - ξ̄_1) = 1/4 < 1/3, the excess kurtosis is positive,
i.e. the distribution of y_t has more mass in the tails than a Gaussian distribution with
the same variance.
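Both effects can be checked numerically for a two-component mixture of normals; the parameter values below are hypothetical and chosen so that the rarer regime has the higher mean. The central moments use E[(d+Z)^3] = d^3 + 3dv and E[(d+Z)^4] = d^4 + 6d^2 v + 3v^2 for Z ~ N(0, v).

```python
# Hypothetical parameters: a two-component mixture of normals whose rarer
# component has the higher mean, generating skewness and excess kurtosis.
xi1, xi2 = 0.2, 0.8        # ergodic regime probabilities
mus = [3.0, 0.0]           # regime means
vs = [1.0, 1.0]            # regime variances

mean = xi1 * mus[0] + xi2 * mus[1]

def central(k):
    # k-th central moment of the mixture from the component normal moments
    total = 0.0
    for w, m, v in ((xi1, mus[0], vs[0]), (xi2, mus[1], vs[1])):
        d = m - mean
        if k == 2:
            total += w * (d ** 2 + v)
        elif k == 3:
            total += w * (d ** 3 + 3 * d * v)
        elif k == 4:
            total += w * (d ** 4 + 6 * d ** 2 * v + 3 * v ** 2)
    return total

var = central(2)
skewness = central(3) / var ** 1.5
excess_kurtosis = central(4) / var ** 2 - 3.0
print(skewness > 0, excess_kurtosis > 0)   # → True True
```

With these numbers the distribution is right-skewed and fat-tailed even though each regime is Gaussian and homoskedastic.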
On the other hand, even if the white noise process u_t is homoskedastic, σ²(s_t) = σ²,

with u_t ∼ NID(0, σ²) and serial correlation in the regimes according to the
transition matrix P. Employing the ergodic regime probability ξ̄_1, y_t can be written as

errors since the predicted regime probabilities generally are non-linear functions of
Y_{t-1}.
Recently some approaches have been made to consider Markovian regime shifts in
variance generating processes. The class of autoregressive conditional
heteroskedastic processes introduced by ENGLE [1982] is used to formulate the conditional
process; our assumption of an i.i.d. distributed error term is substituted by an ARCH
process u_t, cf. inter alia HAMILTON & LIN [1994], HAMILTON & SUSMEL [1994],
CAI [1994] and HALL & SOLA [1993b]. ARCH effects can be generated by MSA-
AR processes, which will be considered in the next section.
where γ = -a_1 a_2 > 0 and ε_t is white noise. Thus, ARCH(1) models can be
interpreted as restricted MSA(2)-AR(1) models.
It is worth noting that the stability of each VAR sub-model and the ergodicity of the
Markov chain are sufficient stability conditions; they are, however, not necessary to
establish stability. Thus, the stability of MSA-AR models can be compatible with
AR polynomials containing roots greater than unity in absolute value in some regimes
and less than unity in others. Necessary and sufficient conditions for the stability
of stochastic processes such as the MSA-VAR model have been derived in KARLSEN
[1990a], [1990b]. In practice, however, their application has been found to be rather
complicated (cf. HOLST et al. [1994]).
In this study we will concentrate our analysis on modelling shifts in the (conditional)
mean and the variance of VAR processes which simplifies the analysis.
In the preceding discussion of this chapter, MS(M)-VAR(p) processes have been
introduced as doubly stochastic processes where the conditional stochastic process is a
Gaussian VAR(p) and the regime generating process is a Markov chain. As we have
seen in the discussion of the relationship of the MS-VAR model to other non-linear
models, the MS-VAR model can encompass many other time series models proposed
in the literature, or at least replicate some of their features. In the following chapter
these considerations are formalized in state-space representations of MS-VAR
models, where the measurement equation corresponds to the conditional stochastic
process and the transition equation reflects the regime generating process. In
Section 2.5 the MS-VAR model will be compared to time-varying coefficient models
with smooth variations in the parameters, i.e. an infinite number of regimes.
10 Models where the regime switches between deterministic and stochastic trends are considered by
MCCULLOCH & TSAY [1994a].
1.A. Appendix: A Note on the Relation of SETAR to MS-AR Processes
While the presumptions of the SETAR and the MS-AR model seem to be quite differ-
ent, the relation between the two model alternatives is rather close. Indeed, both models
can be observationally equivalent, as the following example demonstrates:
For d = 1 it has been shown by CARRASCO [1994, Lemma 2.2] that (1.19) is a par-
ticular case of the Markov-switching model with

s_t = 1   if y_{t−1} ≤ r,
s_t = 2   if y_{t−1} > r,

such that s_t follows a first-order Markov process where the transition matrix is defined
as

P = [ p_11  p_12 ; p_21  p_22 ] = [ Φ((r − μ_1)/σ)   Φ((μ_1 − r)/σ) ; Φ((r − μ_2)/σ)   Φ((μ_2 − r)/σ) ].
28 The Markov-Switching Vector Autoregressive Model
2. Parameter estimation & testing: If the parameters of the model are un-
known, classical maximum likelihood as well as Bayesian estimation meth-
ods are feasible. Here, the filter and smoother recursions provide the analytical
tool to construct and evaluate the likelihood function. See Chapters 6-9.
30 The State-Space Representation
The framework for the statistical analysis of MS-VAR models to be presented in the
next chapters is the state-space form. The advantage of viewing MS-VAR models in
this way is that general concepts such as the likelihood principle (Chapter 6)
and a recursive filter algorithm (Chapter 5), which corresponds to the Kalman
filter in Gaussian state-space models, can be introduced.
For particular MS-VAR processes, a state-space representation with ξ_t as the state
vector has been introduced by HAMILTON [1994a].1 In the following section we
investigate some state-space representations of MS-VAR models. These representa-
tions are then used to work out general properties of MS-VAR processes; inter alia,
we discuss the non-normality of the state-space form, we formulate conditions for
the linearity of the state-space representation, and we show that the joint process of
observed variables and regimes, (y_t′, ξ_t′)′, is Markovian. In Section 2.2 the specific-
ation of the state-space representation is discussed with regard to its adaptation to the
particular MS-VAR models proposed in Chapter 1. In the remaining sections, three
alternative state-space representations of MS-VAR processes are introduced which
will create new insights into the theory of MS-VAR processes and will be used later
on. In Section 2.3 the adding-up restriction on the state vector is eliminated by re-
ducing its dimension. Section 2.4 formulates the state-space representation in the
predicted state vector. Section 2.5 presents a state-space form in the vector of VAR
coefficients which allows a comparison with other time-varying coefficient models.
The state-space model given in Table 2.1 consists of the set of measurement and
transition equations. The measurement equation (2.1) describes the relation between
the unobserved state vector ξ_t and the observed time series vector y_t. Here, the pre-
determined variables X_t and the vector of Gaussian disturbances u_t enter the model.
32 The State-Space Representation
The state vector ξ_t follows a Markov chain subject to a discrete adding-up restriction.
In this study, the Markov chain is assumed to be homogeneous, i.e. F_t = F.
If there are restrictions on the parameter space of β, e.g. due to regime-invariant
parameters, it is sometimes more useful to consider the following formulation of the
measurement equation, which is linear in parameters:
(2.7)
where the parameter vector γ_0 contains the regime-invariant parameters and γ_m,
m = 1, …, M, are the regime-dependent parameter vectors.
The transition equation (2.2) reflects a further property of the regime generating pro-
cess described in Section 1.2.4. Indeed, the Markov chain governing the state vector
ξ_t can be represented as a first-order vector autoregression (cf. HAMILTON [1994b]):

ξ_{t+1} = F ξ_t + v_{t+1}.

The last equation implies that the innovation v_t is a martingale difference series.
Although the vector v_t can take on only a finite set of values, the mean E[v_t] = 0.
2 Some information about the necessary updates of the filtering and estimation procedures under non-
normality of u_t is provided by HOLST et al. [1994].
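The martingale-difference property of the discrete innovation can be checked numerically. In the VAR(1) representation ξ_{t+1} = F ξ_t + v_{t+1} with F = P′, conditioning on each current regime and averaging over the finitely many possible values of v_{t+1} must give zero (the 3-state transition matrix below is an arbitrary illustration):

```python
import numpy as np

# Numerical check of the martingale-difference property of v_{t+1} in
# xi_{t+1} = F xi_t + v_{t+1}, F = P' (illustrative transition matrix).
P = np.array([[0.80, 0.15, 0.05],
              [0.20, 0.70, 0.10],
              [0.10, 0.20, 0.70]])
F = P.T
M = P.shape[0]
I = np.eye(M)

cond_means = []
for m in range(M):                       # condition on xi_t = iota_m
    xi = I[:, m]
    v_support = I - (F @ xi)[:, None]    # column j holds iota_j - F xi_t
    cond_means.append(v_support @ P[m])  # E[v_{t+1} | xi_t = iota_m]
print(np.round(cond_means, 12))          # all zeros: v_t is an MDS
```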
2.1. A Dynamic Linear State-Space Representation for MS-VAR Processes
Analogously, the probability distribution of the discrete innovation process v_{t+1} con-
ditional on ξ_t is given for the discrete support {ι_j − F ξ_t}_{j=1}^{N} by
Due to the discrete error process {v_t} given in (2.4), the state-space representation
of MS-VAR processes is non-normal. This non-normality reflects the transition of
regimes causing parameter variations which are not smooth but abrupt. Therefore
the Kalman filter does not produce an optimal inference; instead the so-called BLHK
filter will be introduced in Chapter 5 for the evaluation of the likelihood function of
this non-normal system.
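Although the BLHK filter itself is only developed in Chapter 5, its discrete prediction and update steps can be sketched roughly as follows for a scalar two-regime model with switching mean; all parameter values and the observation are hypothetical:

```python
import numpy as np

# Rough sketch of one discrete filtering step (illustrative MSI(2)-AR(0)
# model; parameters are not taken from the text).
mu = np.array([-1.0, 1.0])          # regime-dependent means
sigma = 1.0
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
F = P.T

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

xi_filt = np.array([0.5, 0.5])      # xi_{t-1|t-1}
y_t = 0.8                           # new (hypothetical) observation

xi_pred = F @ xi_filt               # prediction step: xi_{t|t-1}
lik = normal_pdf(y_t, mu, sigma)    # regime-conditional densities
joint = lik * xi_pred
xi_filt_new = joint / joint.sum()   # update step: xi_{t|t}
print("xi_{t|t-1} =", xi_pred, "  xi_{t|t} =", xi_filt_new)
```

An observation close to the regime-2 mean shifts the filtered probabilities towards regime 2, as the printed output shows.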
It can easily be checked that the MS-VAR model without regime-dependent autoco-
variances can be considered as a dynamic linear state-space model as suggested by
HARRISON & STEVENS [1976]:

y_t = [ ȳ_{1t} + Σ_1^{1/2} u_t, …, ȳ_{Mt} + Σ_M^{1/2} u_t ] ξ_t,   (2.9)
ξ_{t+1} = F ξ_t + v_{t+1},   (2.10)

where ȳ_{mt} denotes the conditional mean of y_t in regime m.
The representation (2.1)/(2.2) indicates that the joint process {(Y_t′, ξ_t′)′} of the state
vector ξ_t and the stacked vector of observed variables Y_t is Markovian, where p ad-
joining vectors of observable variables are collected in Y_t = (y_t′, y_{t−1}′, …, y_{t−p+1}′)′.
Thus, the relevant information concerning the evolution of the system output in the
3 For MS-VAR processes the following reformulations of H_t ξ_t = [ȳ_{1t}, …, ȳ_{Mt}] ξ_t =
Σ_{m=1}^{M} ξ_{mt} ȳ_{mt} are possible, and each of them is useful for particular purposes:

X_t B ξ_t = (x_t′ ⊗ I_K) B ξ_t = (x_t′ ⊗ I_K)(ξ_t′ ⊗ I_{KR}) β,

where β = (β_1′, …, β_M′)′ = vec(B) and β_m = vec(B_m). Obviously these transformations hold
true for any (K × KR) matrix X_t = (x_t′ ⊗ I_K).
future, (Y_{t+h}′, ξ_{t+h}′)′, h > 0, is completely provided by the actual state (Y_t′, ξ_t′)′, while
the past reveals no additional information,
since the conditional distribution depends only on the distribution of the error term
u_t, which is assumed to be independent of Y_{t−1}.
This is caused by the information contained in the history of the observed variable
y_{t−p} on the distribution of the state vector ξ_t; thus

Pr(ξ_t = ι_m | Y_{t−1}) ≠ Pr(ξ_t = ι_m | y_{t−1})   a.e.
A necessary and sufficient condition for the existence of a linear state-space repre-
sentation is that there is no regime-dependent heteroskedasticity and no regime-dependent
autoregressive dynamics. This condition is guaranteed only for processes where the variance para-
meters, Σ(s_t) = Σ, and the autoregressive parameters, A_j(s_t) = A_j, j = 1, …, p,
are regime-invariant. Thus, only MSI(M)-VAR(p) and MSM(M)-VAR(p) processes possess
a linear MA(∞) representation of y_t in the innovations {u_{t−j}}_{j≥0} and {v_{t−j}}_{j≥0}.
(A8a) MSI-VAR: F_t = P′
(A8b) MSM-VAR: F_t = diag(vec(P′) ⊗ 1_{M^{p−1}}) (1_M ⊗ I_{M^p} ⊗ 1_M′)
As the general filtering and estimation methods discussed later are based on this general
formulation of MS-VAR models, we have to devote some attention to the relation
of the special MS-VAR models introduced in the last chapter (cf. Table 1.1) to the
state-space representation in Table 2.1.
In Table 2.2 an overview of possible restrictions on the parameter space is given
in a systematic presentation. For MSI specifications, Table 2.3 demonstrates that the
formulation of the state-space representation is straightforward. But as Table 2.2 also
indicates, the state-space representation is able to capture even more general specific-
ations.
In MSM-VAR specifications such as equation (1.5), a difficulty arises from the fact
that the conditional density of y_t depends on the last p + 1 regimes,

p(y_t | s_t, s_{t−1}, …, s_{t−p}, Y_{t−1}) ≠ p(y_t | s_t, Y_{t−1})   a.e.

Thus, (y_t, s_t) is not Markovian, while the joint process of the observable variables
y_t and the regimes (s_t, s_{t−1}, …, s_{t−p}) is again Markovian.
ξ_t^{(1)} = [ 𝟙(s_t = 1), …, 𝟙(s_t = M) ]′,

where the regime vector ξ_{t−j}^{(1)} can be recovered from the stacked state vector ξ_t by
marginalization,

ξ_{t−j}^{(1)} = (1_{M^j}′ ⊗ I_M ⊗ 1_{M^{p−j}}′) ξ_t,   j = 0, …, p,

and the transition probabilities of the stacked process are given by

Pr(ξ_{t+1} = ι | ξ_t) = Pr(ξ_{t+1}^{(1)} = ξ_{t+1}^{*(1)} | ξ_t^{(1)} = ξ_t^{*(1)}, …, ξ_{t−p}^{(1)} = ξ_{t−p}^{*(1)}),
or in matrix notation,
where ⊗ is the Kronecker product and ⊙ denotes the element-wise matrix multiplic-
ation. Therefore, we have
(2.11)
4 For all remaining MS(M)-VAR(p) models employing an MSM specification, the measurement equa-
tions are given in Tables 9.19-9.20.
where u_t ~ NID(0, Σ) and the (K × M^{p+1}) input matrix H is given by

H = Σ_{j=0}^{p} Ã_j M L_j   (2.12)
  = [ I_K, −A_1, …, −A_p ] [ I_{p+1} ⊗ (μ_1, …, μ_M) ] [ L_0′, …, L_p′ ]′,   (2.13)

where Ã_0 = I_K and Ã_j = −A_j for j ≥ 1.
This procedure alters the state-space representations considered so far, as the new
state vector ζ_t is only M − 1 dimensional:

𝓕 = [ p_11 − p_{M1}   ⋯   p_{M−1,1} − p_{M1}
      ⋮                        ⋮
      p_{1,M−1} − p_{M,M−1}   ⋯   p_{M−1,M−1} − p_{M,M−1} ]   ((M−1) × (M−1)),

B̄ = [ β_1 − β_M  ⋯  β_{M−1} − β_M ]   (R × (M−1)).

Obviously, the j-th row of B̄ is equal to zero if the j-th element of the coefficient
vector is regime-invariant.

y_t = X_t B̄ ζ_t + u_t,   (2.15)
ζ_{t+1} = 𝓕 ζ_t + v_{t+1}.   (2.16)
The state-space forms considered so far have been formulated in the unobserved state
vector ξ_t. For forecasting it is more practical to possess a state-space representation
in the inferred state vector ξ̂_{t|t−1} = E[ξ_t | Y_{t−1}].
In the measurement equation (2.9), the innovations reflect only the error term u_t for
a given regime vector ξ_t,

(2.17)

which is however not in the information set at time t − 1. Since the regime is un-
observable, the one-step prediction of the regime vector, ξ̂_{t|t−1} = E[ξ_t | Y_{t−1}], is
provided by

ξ̂_{t|t−1} = F ξ̂_{t−1|t−1}.   (2.18)

Equation (2.18) uses the fact that the evolution of regimes is governed by the Markov chain,
and therefore the expectation of ξ_t based on an information set containing ξ_{t−1} would
be given by

E[ξ_t | Y_{t−1}, ξ_{t−1}] = F ξ_{t−1}.   (2.19)

Thus, for given parameters, the prediction of the observable vector of variables y_t
can be derived by inserting ξ̂_{t|t−1} into the measurement equation and using that
E[u_t | Y_{t−1}] = E[u_t] = 0:

ŷ_{t|t−1} = X_t B ξ̂_{t|t−1}.   (2.20)
(2.20)
The resulting predictor ŷ_{t|t−1} = X_t B ξ̂_{t|t−1}, compared with the observed y_t, gives
the prediction error e_t, which denotes the deviation of the realization y_t from its one-
step prediction ŷ_{t|t−1} = E[y_t | Y_{t−1}]:

e_t = y_t − ŷ_{t|t−1}.   (2.21)

Since e_t represents the unpredictable element of the observed time series y_t, it is
called the innovation process.
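A minimal numeric sketch of the one-step predictor and its innovation (scalar intercept-only model; the matrices B, P and the realization y_t below are hypothetical illustrations):

```python
import numpy as np

# Sketch of y_{t|t-1} = X_t B xi_{t|t-1} for an illustrative MSI(2)-AR(0)
# model (parameter values are not taken from the text).
B = np.array([[-1.0, 1.0]])        # K x M matrix of regime intercepts
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
F = P.T

xi_prev = np.array([0.3, 0.7])     # xi_{t-1|t-1}
xi_pred = F @ xi_prev              # xi_{t|t-1} = F xi_{t-1|t-1}   (2.18)
X_t = np.eye(1)                    # here X_t = 1 (intercept only)
y_pred = X_t @ B @ xi_pred         # y_{t|t-1}                     (2.20)

y_t = np.array([0.4])              # hypothetical realization
e_t = y_t - y_pred                 # innovation e_t                (2.21)
print("xi_{t|t-1} =", xi_pred, " y_{t|t-1} =", y_pred, " e_t =", e_t)
```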
The prediction error e_t can be decomposed into two components: (i.) the Gaussian
innovation u_t affecting the measurement equation, and (ii.) the effects caused by re-
gime prediction errors ε_t = ξ_t − E[ξ_t | Y_{t−1}]. Thus,

e_t = X_t B ε_t + u_t.   (2.22)

These may be compared with the expectation of y_t for an information set containing
Y_{t−1} and ξ_{t−1}:
2.4. Prediction-Error Decomposition and the Innovation State-Space Form
(2.23)
An innovation ε_t may have two sources: (i.) the unpredictable innovation v_t of the
regime generating process, and (ii.) the error ξ_{t−1} − ξ̂_{t−1|t−1} in the reconstruction
of the regime vector at time t − 1. Analogously to e_t, the regime prediction error
ε_t can be considered as the innovation in the regime generating process given the
information set Y_{t−1}. Since, strictly speaking, we are interested in the inferred re-
gime vector ξ̂_{t+1|t}, we have to derive the innovation term of the modified transition
equation:

f_t = E[ξ_{t+1} | Y_t] − E[ξ_{t+1} | Y_{t−1}]
    = F { −(ξ_t − ξ̂_{t|t}) + (ξ_t − F ξ_{t−1}) + F (ξ_{t−1} − ξ̂_{t−1|t−1}) }   (2.24)
    = F (v_t − ε_t + F ε_{t−1}),

y_t = X_t B ξ̂_{t|t−1} + e_t.   (2.25)
Note that the original formulation in AOKI [1990] and AOKI & HAVENNER [1991]
assumes that the involved matrix H_t = X_t B is time-invariant. This presumption
2.5. The MS-VAR Model and Time-Varying Coefficient Models

For the following analysis the vector of prevailing parameters is partitioned into the
regime-dependent parameters β_t^s = β̄^s + B̄^s ζ_t and the regime-invariant parameters
β_t^0 = β^0; thus β_t = (β_t^{0′}, β_t^{s′})′ and B̄ = (0, B̄^{s′})′, where rk B̄ = min{M − 1, R^s}
and R^s is the number of regime-dependent parameters. Analogously, the matrix of
explanatory variables X_t is split into the (K × R^s) matrix X_t^s and the (K × [R − R^s])
matrix X_t^0, X_t = (X_t^0, X_t^s), where R − R^s is the number of regime-invariant para-
meters.
The solution of this problem depends on the rank of B̄^s. If R^s = M − 1, i.e. the
number of regime-dependent parameters is equal to the number of regimes minus
one, there exists a unique solution and the transition equation is given by

where w_{t+1} = B̄^s v_{t+1} and F_{β^s} has full rank, rk F_{β^s} = M − 1 = R^s.
where F_{β^s} has reduced rank, rk F_{β^s} = M − 1 < R^s, and the variance-covariance
matrix of w_{t+1} is singular. Therefore, we will find some common shifts in the para-
meters as long as the number of regimes minus one, M − 1, is less than the number of para-
meters. If R^s < M − 1, i.e. the number of regime-dependent parameters is less than
the number of regimes minus one, there exists no linear transition equation in β_t.
the 'state' is the vector of (regime-dependent) parameters (β_t^s − β̄^s) and no longer
the regime or, more precisely, the vector of indicator variables ξ_t. Again the VAR(1)
representation in (2.28) can cover as usual higher-order dynamics for β_t, if the state
vector is defined as β̄_t = (β_t′, …, β_{t−q}′)′ and (ι_1′ ⊗ I_R) β̄_t is used in the observation
equation.
Hence, the MS-VAR model under consideration can be characterized as a time-vary-
ing regression model, where all eigenvalues of F_{β^s} are inside the unit circle and the
innovation process w_{t+1} entering the transition equation is non-normal. The uncon-
ditional mean of β_t, β̄ = B ξ̄, has the interpretation of the average or steady-state
coefficient vector.
y_t = X_t (β_t − β̄) + u_t,
β_{t+1} − β̄ = F_1 (β_t − β̄) + … + F_q (β_{t+1−q} − β̄) + v_t,

where u_t and v_t are Gaussian white noise. If the Gaussian VAR(q) process is stable,
we have the return-to-normality model proposed by ROSENBERG [1973]. As in
(2.28), the time-varying coefficients β_t fluctuate around their constant mean β̄. The
difference consists in the fact that the fluctuations of the parameters are not gener-
ated by a 'smooth' linear Gaussian system, but by a 'jumping' discrete-state Markov
chain.
In contrast to most other stochastically varying coefficient models, where the variations in the
regression coefficients are assumed to be normally distributed, the transitions of
the parameter vector in the MS-VAR model are not smooth but abrupt. They are
neither transient as in the HILDRETH & HOUCK [1968] model nor permanent as in
a random-walk coefficients model. While this representation clarifies the relation
of the MS-VAR model to other regression models with stochastically varying coeffi-
cients, the state-space form is heavily restricted and it is not recommended as a device
for empirical research.
This chapter has laid out the formal framework for the statistical analysis of MS-VAR
models. Before we consider the issue of statistical inference, we complete the dis-
cussion of modelling MS-VAR processes by deriving VARMA representations for
MSM(M)-VAR(p) and MSI(M)-VAR(p) processes which emphasize the close rela-
tion of this MS-VAR sub-class to linear systems.
Chapter 3
The previous chapter introduced the state-space representation as the basic tool
for describing vector autoregressive processes with Markovian regime shifts. This
chapter looks in greater depth at the relationship between Markov-switching vector
autoregressions and linear time series models. We develop a finite order VARMA
representations theorem for vector autoregressive processes with Markovian regime
shifts in the mean or the intercept term of the multiple time series. This result gener-
alizes concepts recently proposed by POSKITT & CHUNG [1994] for univariate hid-
den Markov-chains, and by KROLZIG [1995] for univariate MSM(M)-AR(P) and
MSI(M)-AR(P) processes.
We consider MS-VAR models where the mean μ(s_t) (the MSM(M)-VAR(p) model)
or the intercept term ν(s_t) (the MSI(M)-VAR(p) model) are subject to occasional
discrete shifts, while the variance Σ(s_t) and the autoregressive parameters A_i(s_t),
i = 1, …, p, of the time series are assumed to be regime-invariant. Three alternative
models will be distinguished:

(i) Hidden Markov chain processes, p = 0:   y_t = μ(s_t) + u_t
(ii) MSM(M)-VAR(p) processes, p > 0:   A(L) (y_t − μ(s_t)) = u_t
(iii) MSI(M)-VAR(p) processes:   A(L) y_t = ν(s_t) + u_t
A common feature of the models under consideration is that the observed process y_t
may be considered as the sum of two independent processes: a non-linear time series
process μ_t and a linear process z_t. The models differ only in the definition of these
processes.
Hidden Markov Chain Processes
y_t = μ_t + z_t,
μ_t = M ξ_t,   (3.1)
z_t = u_t,   u_t ~ IID(0, Σ_u).

MSM(M)-VAR(p) Processes
y_t = μ_t + z_t,
μ_t = M ξ_t,   (3.2)
A(L) z_t = u_t,   u_t ~ IID(0, Σ_u).

MSI(M)-VAR(p) Processes
y_t = μ_t + z_t,
A(L) μ_t = M ξ_t,   (3.3)
A(L) z_t = u_t,   u_t ~ IID(0, Σ_u).
To simplify the notation, we use here the same shift function μ(s_t) and mat-
rix M as for the MSI(M)-VAR(p) model, where the quantities represent the regime-
dependent intercept terms.
Clearly, all the features just described for the processes μ_t and z_t translate
into similar features inherited by the observed process y_t.
These observations will be formalized in the following state-space representation
of Markov-switching autoregressive processes. To derive the properties of models
1 While the hidden Markov chain model is not widely used in econometrics, it has received considerable
attention in engineering; see e.g. LEVINSON et al. [1983] and POSKITT & CHUNG [1994]. Hence,
there exists a separate field of literature dealing with this model, starting with BLACKWELL & KOOP-
MANS [1957] and HELLER [1965]. More recently, estimation methods have been discussed by VOINA
[1988], LEROUX [1992], and QIAN & TITTERINGTON [1992].
VARMA-Representation of MSI-VAR and MSM-VAR Processes
only the remaining M − 1 states are considered. The transition probabilities and
the regime-conditioned means (or intercepts) are collected in the matrices 𝓕 and M̄,
such that

𝓕 = [ p_11 − p_{M1}   ⋯   p_{M−1,1} − p_{M1}
      ⋮                        ⋮
      p_{1,M−1} − p_{M,M−1}   ⋯   p_{M−1,M−1} − p_{M,M−1} ]   ((M−1) × (M−1)),

and the linear component is stacked in companion form,

Z_t = ( z_t′, z_{t−1}′, …, z_{t−p+1}′ )′,   U_t = ( u_t′, 0′, …, 0′ )′,

A = [ A_1  ⋯  A_{p−1}  A_p
      I_K  ⋯  0        0
      ⋮                 ⋮
      0    ⋯  I_K      0 ].
3.1. Linearly Transformed Finite Order VAR Representations

The linear state-space representation defines MSM- and MSI-VAR processes as lin-
early transformed VAR(1) processes:

Hidden Markov Chain Processes
y_t − μ_y = M̄ ζ_t + z_t,
ζ_t = 𝓕 ζ_{t−1} + v_t,   (3.5)
z_t = u_t.

MSM(M)-VAR(p) Processes
y_t − μ_y = M̄ ζ_t + J Z_t,
ζ_t = 𝓕 ζ_{t−1} + v_t,   (3.6)
Z_t = A Z_{t−1} + U_t.

MSI(M)-VAR(p) Processes
y_t − μ_y = μ̄_t + z̄_t,   A(L) μ̄_t = M̄ ζ_t,
ζ_t = 𝓕 ζ_{t−1} + v_t,   (3.7)
A(L) z̄_t = u_t,

where μ_y is the unconditional mean of y_t, J = ι_1′ ⊗ I_K, and ι_1 is the first column
of the identity matrix. The equation systems (3.5), (3.7), and (3.6) allow a state-
space representation, where the state vector x_t consists of the Markov chain ζ_t and
the Gaussian process Z_t.
Since our analysis focuses on the prediction of shifts in the mean of an observed time
series vector, but i.i.d. switching regimes produce only a mixture distribution of the
prediction errors, this rank condition seems to be rather reasonable. Furthermore,
rk 𝓕 = M − 1 can be seen as an identifying restriction. Without additional assump-
tions concerning the distribution of u_t, an MS(M)-VAR(p) model with i.i.d. regimes
Example 7 Consider an MS(2)-VAR(p) model with i.i.d. switching regimes. The
assumption p_{1m} = p_{2m} implies rk(𝓕) = 0 as well as ξ̂_{t+1|t} = ξ̄. Thus the
actual regime reveals no information about the future, i.e. ŷ_{t+1|t} = H_{t+1} ξ̄ and
E[y_{t+1} | Y_t, s_t = 1] = E[y_{t+1} | Y_t, s_t = 2]. Therefore, the conditional mean of y_{t+1}
remains unaltered if states 1 and 2 are joined in a common state, M* = 1, with
β* = β̄ = B ξ̄ and a transition probability of unity.
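Example 7 can be reproduced numerically: with identical rows of P the reduced transition coefficient vanishes and the one-step regime prediction equals the ergodic distribution whatever the current regime (the probabilities below are illustrative):

```python
import numpy as np

# Identical rows p_{1m} = p_{2m} (i.i.d. switching) imply that today's
# regime carries no information about tomorrow's.
P = np.array([[0.6, 0.4],
              [0.6, 0.4]])          # rows coincide: i.i.d. regimes
xi_bar = np.array([0.6, 0.4])       # ergodic probabilities equal each row
F = P.T

# reduced transition scalar f = p_11 - p_21 (the 'rk F = 0' statement):
f = P[0, 0] - P[1, 0]

# one-step regime prediction from either current regime:
pred_from_1 = F @ np.array([1.0, 0.0])
pred_from_2 = F @ np.array([0.0, 1.0])
print("f =", f, " predictions:", pred_from_1, pred_from_2)
```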
y_t = H_t ξ̂_{t|t−1} + e_t
    = H_t F ξ̂_{t−1|t−1} + e_t
    = H_t S Λ^{1/2} T′ ξ̂_{t−1|t−1} + e_t.
where deg(·) denotes the degree of a polynomial, A(L)* is the adjoint of A(L), and
A_{ij}(L) is the (i, j)-th co-factor of A(L).
POSKITT & CHUNG [1994] consider a hidden Markov chain, which in our notation
is a univariate MSI(M)-AR(0) model.
Using the state-space representations and the methodology of Lemma 1, the result
of POSKITT & CHUNG [1994] can easily be extended to vector systems.
where ε_t is a zero-mean vector white noise process, γ(L) = 1 − γ_1 L − … −
γ_{M−1} L^{M−1} is the scalar AR operator of order M − 1, and B(L) = I_K − B_1 L − … − B_{M−1} L^{M−1}.
The underlying state-space form,

y_t − μ_y = [ M̄  I_K ] x_t   with   x_t = [ ζ_t ; u_t ],

x_t = [ 𝓕  0 ; 0  0 ] x_{t−1} + [ v_t ; u_t ],
3.2. VARMA Representation Theorems
Solving the transition equation for x_t and inserting the resulting vector MA(∞) re-
presentation for x_t in the measurement equation results in

(y_t − μ_y) = [ M̄  I_K ] [ 𝓕(L)⁻¹  0 ; 0  I_K ] [ v_t ; u_t ],

where 𝓕(L) = I_{M−1} − 𝓕 L. Applying Lemma 1 we get the final equation form of a
VARMA(M − 1, M − 1) model:
Proof The proof is a simple extension of the previous one. Consider the process
y_t* = A(L) (y_t − μ_y). Since the relation A(L) (y_t − μ_y) = M̄ ζ_t + u_t holds by
definition, the transformed process y_t* satisfies the conditions of Proposition 2. This
MSI(M)-VAR(0) process has the VARMA(M − 1, M − 1) representation
If y_t is a vector-valued process, we have to take into account that equation (3.9) is
not a final equation form. Multiplying with the adjoint A(L)* gives the final equation
form,
where ε_t is a zero-mean vector white noise process; under quite general regularity
conditions, γ(L) is a scalar lag polynomial of order M + Kp − 1, and B(L) is a
(K × K) dimensional lag polynomial of order M + Kp − 2.
y_t − μ_y = [ M̄  J ] x_t   with   x_t = [ ζ_t ; Z_t ],

x_t = [ 𝓕  0 ; 0  A ] x_{t−1} + [ v_t ; U_t ],

satisfies obviously the conditions of Lemma 1. Therefore, we have the final equation
form of a VARMA(M + Kp − 1, M + Kp − 2) model:
order of |𝓕(L)|, M − 1, plus the order of |A(L)|, Kp, while the order of the matrix
MA polynomial in the lag operator equals
In general, it is not ensured that the relations between the order of the MS-VAR
model and the VARMA representation given in Propositions 3 and 4 hold with equal-
ity. In exceptional cases, where the regularity conditions are not satisfied, they give just
upper bounds for the orders of the VARMA representation.
This section illustrates the correspondence between VARMA models and MS-
VAR processes by deriving the autocovariance function of MSI-VAR and MSM-
VAR processes. The autocovariance function (ACF) provides a way to determine the
parameters of the VARMA representation as functions of the underlying MS-VAR
parameters.
As we have seen in the preceding sections, the observed process y_t is the sum of
two independent processes μ_t and z_t. Hence, the moments of y_t are determined by
those of the non-linear time series process μ_t and the linear process z_t:
In order to derive some basic results concerning the unrestricted state vector ζ_t, recall
the transition equation of Section 2.3,

ζ_{t+1} = 𝓕 ζ_t + v_{t+1},   (3.11)

where v_{t+1} are the non-Gaussian innovations of the Markov chain. By repeated sub-
stitution of (3.11), the state ζ_t results as a weighted sum of all previous innovations
v_{t−j}, j ≥ 0:

ζ_t = Σ_{j=0}^{∞} 𝓕^j v_{t−j}.   (3.13)
For a deterministic or a stochastic initial state ζ_0 whose distribution is not identical
with the steady-state distribution derived above, ζ_0 ≠ 0, the resulting sys-
tem represents neither a mean- nor a variance-stationary process as long as 𝓕 ≠ 0.
3.3. The Autocovariance Function of MSI-VAR and MSM-VAR Processes
However, if all eigenvalues of 𝓕 are less than unity in absolute value, 𝓕^t → 0 for
t → ∞, and the influence of an initial state ζ_0 disappears asymptotically. Analog-
ously, the responses of the system to past impulses diminish. In Markov chain mod-
els, the assumptions of ergodicity and irreducibility guarantee that all eigenvalues of
𝓕 are less than unity in absolute value, and that the innovation process v_t is station-
ary. Hence {ζ_t} is asymptotically stationary.
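This can be illustrated numerically: for an ergodic chain the reduced matrix 𝓕, formed from the differences p_{ji} − p_{Mi} as displayed in Section 2.3, has spectral radius below one and its powers vanish (the transition matrix below is an arbitrary illustration):

```python
import numpy as np

# Numerical check for an illustrative ergodic 3-state chain: the reduced
# ((M-1) x (M-1)) matrix curly-F has all eigenvalues strictly inside the
# unit circle, so curly-F^t -> 0 as t grows.
P = np.array([[0.85, 0.10, 0.05],
              [0.05, 0.80, 0.15],
              [0.10, 0.25, 0.65]])
M = P.shape[0]

# element (i, j) of curly-F is p_{ji} - p_{Mi} (here with 0-based indices):
Fz = np.array([[P[j, i] - P[M - 1, i] for j in range(M - 1)]
               for i in range(M - 1)])
eig = np.linalg.eigvals(Fz)
print("spectral radius of curly-F:", np.abs(eig).max())
print("curly-F^50:\n", np.linalg.matrix_power(Fz, 50))
```

A useful side observation: the eigenvalues of 𝓕 coincide with the non-unit eigenvalues of P, which is how the reduction eliminates the unit root of the chain.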
If the process has an infinite history (or a stochastic initial state ζ_0 with ζ_{0|0} = 0),
the first and second moments of ζ_t are determined by

E[ζ_t] = 0,   (3.16)

Σ_ζ := Var(ζ_t) = [ ξ̄_1 (1 − ξ̄_1)   ⋯   −ξ̄_1 ξ̄_{M−1}
                    ⋮                     ⋮
                    −ξ̄_{M−1} ξ̄_1   ⋯   ξ̄_{M−1} (1 − ξ̄_{M−1}) ].
y_t − μ_y = M̄ ζ_t + u_t,   (3.19)

where u_t ~ NID(0, Σ_u); the mean μ_y = M ξ̄ of the observed time series y_t is
determined by the ergodic probabilities ξ̄. Using the independence of the innovations
u_t and v_t, the variance of y_t is seen to be

Var(y_t) = M̄ Σ_ζ M̄′ + Σ_u,   (3.20)

and the autocovariances for h > 0 are

Γ_y(h) = M̄ 𝓕^h Σ_ζ M̄′.   (3.21)
Since the hidden Markov chain model exhibits no autoregressive structure, the serial
dependence of the regimes is the only source of autocorrelation in the data. This is
illustrated in the following example:
Γ_y(0) = (μ_1 − μ_2)² ξ̄_1 (1 − ξ̄_1) + σ²,
Γ_y(h) = (p_11 + p_22 − 1)^h (μ_1 − μ_2)² ξ̄_1 (1 − ξ̄_1),   h ≥ 1.

With Γ_y(h) = (p_11 + p_22 − 1) Γ_y(h − 1) for h > 1, the ACF of the MSI(2)-AR(0)
process corresponds to the ACF of an ARMA(1, 1) model.
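The ARMA(1,1)-type geometric decay is easy to verify from the closed-form ACF of a two-state hidden Markov chain, Γ_y(0) = (μ_1 − μ_2)² ξ̄_1(1 − ξ̄_1) + σ² and Γ_y(h) = ρ^h (μ_1 − μ_2)² ξ̄_1(1 − ξ̄_1) with ρ = p_11 + p_22 − 1 (parameter values below are illustrative):

```python
import numpy as np

# Closed-form ACF of an illustrative MSI(2)-AR(0) process and its
# geometric decay at rate rho = p11 + p22 - 1.
p11, p22 = 0.9, 0.8
mu = np.array([-1.0, 1.0])
sigma = 0.5

xi1 = (1 - p22) / (2 - p11 - p22)   # ergodic probability of regime 1
var_mu = (mu[0] - mu[1]) ** 2 * xi1 * (1 - xi1)
rho = p11 + p22 - 1

gamma = lambda h: var_mu * rho ** h if h > 0 else var_mu + sigma ** 2
print([round(gamma(h), 4) for h in range(4)])
print("decay ratio Gamma(2)/Gamma(1):", gamma(2) / gamma(1))
```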
While the ACF of the hidden Markov chain is exclusively determined by the Markov
chain, MSM(M)-VAR(p) processes exhibit more complex dynamics. From (3.2),
y_t − μ_y = (μ_2 − μ_1) ζ_t + z_t,   (3.22)

z_t = Σ_{j=0}^{∞} α_1^j u_{t−j},   (3.23)

where u_t ~ NID(0, σ_u²). The unrestricted regime process ζ_t possesses the usual AR(1)
representation

ζ_t = (p_11 + p_22 − 1) ζ_{t−1} + v_t,   (3.24)
where v_t is a non-Gaussian white noise process. From equations (3.23) and (3.24)
it follows that the ACF is given by
for h ≥ 0 and Γ_y(h) = Γ_y(−h) for h < 0. Under the regularity conditions α_1 ≠ 0,
(p_11 + p_22 − 1) ≠ 0 and α_1 ≠ (p_11 + p_22 − 1), equation (3.25) corresponds to the
ACF of an ARMA(2, 1) model.
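Because the ACF of the MSM(2)-AR(1) process is a sum of two geometric sequences with rates α_1 and ρ = p_11 + p_22 − 1, it satisfies the homogeneous second-order recursion of an ARMA(2,1) model for h ≥ 2. A quick numeric check (illustrative parameters; the closed-form variances used are stated in the comments):

```python
import numpy as np

# The ACF Gamma(h) = var_mu * rho^h + var_z * alpha1^h (h >= 0) satisfies
# Gamma(h) = (alpha1 + rho) Gamma(h-1) - alpha1 * rho * Gamma(h-2), h >= 2,
# since both geometric components solve this recursion.
alpha1, p11, p22 = 0.5, 0.9, 0.8
mu1, mu2, sig2_u = -1.0, 1.0, 1.0
rho = p11 + p22 - 1
xi1 = (1 - p22) / (2 - p11 - p22)

var_mu = (mu1 - mu2) ** 2 * xi1 * (1 - xi1)   # variance of the regime mean
var_z = sig2_u / (1 - alpha1 ** 2)            # variance of the AR(1) part

gamma = lambda h: var_mu * rho ** h + var_z * alpha1 ** h
residual = [gamma(h) - (alpha1 + rho) * gamma(h - 1)
            + alpha1 * rho * gamma(h - 2) for h in range(2, 10)]
print("max recursion residual:", max(abs(r) for r in residual))
```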
The ACF of an MSI(M)-VAR(p) process (3.3) can be traced back to the ACF of
a hidden Markov chain x_t on which the linear filter A(L) = I_K − Σ_{j=1}^p A_j L^j is
applied,

A(L) (y_t − μ_y) = x_t − μ_x,   x_t = M ξ_t + u_t,

where the mean μ_y = (I_K − Σ_{j=1}^p A_j)⁻¹ ν̄ of the observed time series y_t is de-
termined by the ergodic probabilities ξ̄. Note that the ACF of x_t is given by (3.21),
such that

Γ_x(h) = { M̄ Σ_ζ M̄′ + Σ_u   for h = 0,
           M̄ 𝓕^h Σ_ζ M̄′   for h > 0, }   (3.26)
and Γ_x(h) = Γ_x(−h) for h < 0. Furthermore, the covariances of y_t and x_t
are given by E[x_t (y_t − μ_y)′] = M̄ E[ζ_t y_t′] + Σ_u and E[x_t (y_{t−h} − μ_y)′] =
M̄ 𝓕^h E[ζ_{t−h} y_{t−h}′] for h > 0, where we have used that

( E[ζ_t y_t′] − Σ_{j=1}^p 𝓕^j E[ζ_t y_t′] A_j′ ) = Σ_ζ M̄′.

Hence the ACF of an MSM(M)-VAR(p) process is determined by

Γ_y(h) − Σ_{j=1}^p A_j Γ_y(h − j) = { M̄ E[ζ_t y_t′] + Σ_u   for h = 0,
                                      M̄ 𝓕^h E[ζ_t y_t′]   for h > 0. }   (3.27)
Then, the ACF is determined by the inhomogeneous system of linear difference equa-
tions,
where Γ_y(h) = Γ_y(−h) for h < 0 and σ_v² = (1 − f²) ξ̄_1 (1 − ξ̄_1), f = p_11 + p_22 − 1. Thus the ACF of
an MSI(2)-AR(1) process can be calculated recursively for h > 1,
which corresponds to the ACF of an ARMA(2, 1) model, as stated in Pro-
position 4.
64 VARMA-Representation of MSI- VAR and MSM- VAR Processes
[Table 3.1: ARMA(p*, q*) representations of MSI(M)-AR(p) and MSM(M)-AR(p) processes]
3.4 Outlook
For the hidden Markov chain model, POSKITT & CHUNG [1994] provide consistent
statistical procedures for identifying the state dimension of the Markov chain based
on linear least-squares estimation. In Section 7.2 we propose for MSM(M)-AR(p)
and MSI(M)-AR(p) models a specification procedure based on an ARMA(p*, q*)
representation which is closely related to POSKITT & CHUNG. An overview is given
in Table 3.1 for univariate ARMA processes.
The class of models considered in this chapter is restrictive in the sense that the or-
der of the AR polynomial cannot be less than the MA order (under regularity con-
ditions). In order to generate ARMA(p*, q*) representations where p* < q* holds,
it would be necessary to introduce MS(M)-ARMA(p, q) models, which are compu-
tationally unattractive, or to use the approach introduced in Section 10.2. There we
generalize the MSI(M)-VAR(p) model to an MSI(M, q)-VAR(p) model character-
ized by an intercept term which depends not only on the actual regime s_t, but is
also conditioned on the last q regimes.
Chapter 4
One major objective of time series analysis is the creation of suitable models for pre-
diction. It is convenient to choose the optimal predictor ŷ_{t+h|t} in the sense of a min-
imizer of the mean squared prediction error (MSPE),

(4.1)

It is then quite standard (see e.g. LÜTKEPOHL [1991, Section 2.2]) that the op-
timal predictor ŷ_{t+h|t} is given by the conditional mean for a given information
set:

(4.2)
In contrast to linear models, the MSPE-optimal predictor ŷ_{t+h|t} usually does not
have the property of being a linear predictor if the true data-generating process is
non-linear. In general, the derivation of the optimal predictor may be quite complic-
ated in empirical work. An attractive feature of the MS-VAR model as a class of
non-linear models is the simplicity of forecasting if the optimal predictor (4.2) is ap-
plied. In the following section, the optimal predictor of MS-VAR processes is de-
rived. The properties of this predictor are shown for the MSM-VAR model in Sec-
tion 4.2 and for the MSI-VAR models in Section 4.3. Then, problems which arise
with MSA-VAR models are discussed and an approximating technique to overcome
these problems is introduced. Finally, the forecasting facilities proposed for MS-
VAR processes are compared with the properties of forecasting with Gaussian VAR
models in Section 4.5.
Forecasting MS-VAR Processes
y_t = X_t B ξ_t + u_t,
ξ_{t+1} − ξ̄ = F (ξ_t − ξ̄) + v_{t+1},

where the assumptions (A1)-(A6) made in Table 2.1 apply. Thus, X_t = (1, Y_{t−1}′) ⊗
I_K with Y_{t−1} = (y_{t−1}′, …, y_{t−p}′)′. In MSI specifications (cf. Table 2.3), the matrix
B contains the parameter vectors β_m associated with regime m = 1, …, M,
with intercept terms ν_m and the autoregressive parameters α_m = vec(A_{1m}, …, A_{pm}).
As also stated in Chapter 2, in MSM specifications the regime vector ξ_t is
N = M^{p+1} dimensional, so that B is a ([K(Kp + 1)] × N) matrix with
(4.3)

where we have used the unpredictability of the innovation process u_t, i.e.
E[u_{t+1} | Y_t, ξ_{t+1}] = 0. Thus, in the case of anticipation of regime m, the optimal
predictor would be X_{t+1} β_m.
In practice these assumptions have to be relaxed. For example, the unknown para-
meter matrix B might be replaced by the ML estimator, which is asymptotically un-
biased. Having forecasts for the predetermined variables, the major task is to forecast
the evolution of the Markov chain. As discussed in Section 2.4, this prediction can
be derived from the transition equation (2.2) as

ξ̂_{t+1|t} = F ξ̂_{t|t}.   (4.4)
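Iterating (4.4) gives the h-step regime forecast ξ̂_{t+h|t} = F^h ξ̂_{t|t}, which converges to the ergodic distribution as h grows. A minimal sketch (two-state chain with illustrative probabilities):

```python
import numpy as np

# h-step regime forecasts xi_{t+h|t} = F^h xi_{t|t} for an illustrative
# two-state chain; they converge to the ergodic distribution.
P = np.array([[0.95, 0.05],
              [0.20, 0.80]])
F = P.T
xi_tt = np.array([1.0, 0.0])        # suppose regime 1 is inferred at time t

horizons = (1, 4, 16, 64)
preds = [np.linalg.matrix_power(F, h) @ xi_tt for h in horizons]
for h, xp in zip(horizons, preds):
    print(f"xi_(t+{h}|t) =", xp)

# ergodic limit: normalized Perron eigenvector of F = P'
vals, vecs = np.linalg.eig(F)
xi_bar = np.real(vecs[:, np.argmax(np.real(vals))])
xi_bar /= xi_bar.sum()
print("ergodic:", xi_bar)
```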
4.1. MSPE-Optimal Predictors
Since v_{t+1} in the general linear MS regression model is non-normal, the inferences
ξ̂_{t|t} and ξ̂_{t+1|t} depend on the information set Y_t in a non-linear fashion. Thus, in
contrast to Gaussian state-space models, the one-step prediction ŷ_{t+1|t} cannot be
interpreted as a linear projection. Inserting the forecast of the hidden Markov chain
(4.4) into equation (4.3) yields the one-step predictor ŷ_{t+1|t}:
Starting with the one-step prediction formula (4.5), general predictions can be de-
rived iteratively as long as the elements of X_{t+h} are uncorrelated with the parameter
vector β_{t+h} = B ξ_{t+h}.
In our time series framework, it is crucial whether equation (4.6) holds true if lagged
endogenous variables are included in the regressor matrix X_{t+j}. In MSA-VAR mod-
els, the correlation of the lagged endogenous variables contained in X_t with the re-
gime vector ξ_t may give rise to a problem which is unknown in VAR models with
deterministically varying parameters.1 In general, equation (4.6) does not hold if X_t
contains lagged endogenous variables,

(4.7)
This problem does not occur in models with time-invariant autoregressive parameters
and constant transition probabilities, which can be represented as

(4.8)

where H = (ν_1, …, ν_M) in MSI models and H is a function of μ and
α = vec(A_1, …, A_p) in MSM models.2
Equation (4.8) implies that the lagged endogenous regressors in X_{t+h} and the regime
vector ξ_{t+h} enter the system additively. Hence, the regressors in X_{t+h} and the parameter vector β_{t+h} = B ξ_{t+h} are independently distributed, E[X_{t+h} β_{t+h}|Y_T] =
E[X_{t+h}|Y_T] E[β_{t+h}|Y_T]. The optimal forecast of y_{t+h} is given by equation (4.6),
and
Thus, primary attention is given to MSM and MSI processes. Since we are only inter-
ested here in the optimal predictor, it is not necessary to distinguish between models
with or without heteroskedasticity. Consider first a subclass of processes for which
a computationally effective algorithm can easily be constructed.
y_t = M ξ_t^{(1)} + J z_t,   (4.10)

where J = [I_K  0  ⋯  0] is a (K × pK) matrix,

z_t = ( (y_t − M ξ_t^{(1)})', ..., (y_{t−p+1} − M ξ_{t−p+1}^{(1)})' )',   U_{t+1} = (u_{t+1}', 0', ..., 0')',

and

A = [ A_1  A_2  ⋯  A_{p−1}  A_p
      I_K  0    ⋯  0        0
      ⋮              ⋱       ⋮
      0    0    ⋯  I_K      0 ]

is a (Kp × Kp) matrix.
Hence the problem of calculating the conditional expectation of y_{t+h} can be reduced
to the predictions of the Markovian and the Gaussian components of the state vector
(z_t', ξ_t^{(1)'})':
(4.13)
By using the law of iterated predictions, we first derive the forecasts of ξ^{(1)}_{t+h} conditional on ξ_t^{(1)} and of z_{t+h} conditional on z_t, respectively. Applying the expectation operator to (4.11)
and (4.12) yields
(4.14)
(4.15)
Then, the expectation operator is again applied to the just derived expressions, but
now conditional on the sample information Y_t:
(4.17)
(4.18)
where we have used the mean of the observed time series given by μ_y = M ξ̄^{(1)}. The
reconstructed Gaussian component ẑ_{t|t} is delivered as a by-product of the filtering
procedures (see Chapter 5) for the regime vector, ξ̂_{t|t}:
(4.19)
It needs no further clarification to verify that the forecasts of y_{t+h} converge to the
unconditional mean of y_t if the eigenvalues of F and A are inside the unit circle.
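The forecast rule (4.17)–(4.19) can be sketched in code. The example below is a hypothetical univariate MSM model with p = 1 (so that J = [I_K 0 ⋯ 0] reduces to the identity); all numbers are illustrative, not taken from the text:

```python
import numpy as np

def msm_forecast(mu, F, A, xi_tt, z_tt, h):
    """y_{t+h|t} = M xi_{t+h|t} + J A^h z_{t|t} (cf. (4.16)-(4.19)).

    mu: (K x M) matrix of regime-dependent means, F: (M x M) transition
    matrix with columns summing to one, A: (Kp x Kp) companion matrix,
    z_tt: reconstructed Gaussian component delivered by the filter.
    """
    xi_h = np.linalg.matrix_power(F, h) @ xi_tt   # regime forecast (4.17)
    z_h = np.linalg.matrix_power(A, h) @ z_tt     # Gaussian component (4.16)
    K = mu.shape[0]
    J = np.eye(K, A.shape[0])                     # J = [I_K 0 ... 0]
    return mu @ xi_h + J @ z_h

mu = np.array([[-1.0, 1.0]])                  # regime means (K = 1, M = 2)
F = np.array([[0.9, 0.2], [0.1, 0.8]])
A = np.array([[0.5]])                         # stable AR coefficient
y_hat = msm_forecast(mu, F, A, np.array([1.0, 0.0]), np.array([0.3]), 1)
```

With all eigenvalues of F and A inside the unit circle, the forecast converges for large h to the unconditional mean μ_y = M ξ̄.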
In contrast to Gaussian VAR models, where interval forecasts and forecast regions
can be derived on the basis of the conditional mean ŷ_{t+h|t} and the h-step MSPE
matrix Σ_{t+h|t} = E[(y_{t+h} − ŷ_{t+h|t})(y_{t+h} − ŷ_{t+h|t})'|Y_t], the conditional first
and second moments are not sufficient to determine the conditional distribution of
y_{t+h}|Y_t, which is a mixture of normals, e.g. for h = 1,

p(y_{t+1}|Y_t) = Σ_{m=1}^{N} Pr(ξ^{(1)}_{t+1} = ι_m | Y_t) |Σ_m|^{−1/2} φ( Σ_m^{−1/2} (y_{t+1} − ŷ_{m,t+1}) ),

where N = M^{p+1} and φ(·) is the probability density function of a K-dimensional
vector of standard normals. Although the preceding calculations have been straightforward, in practice it is rather complicated to construct interval forecasts analytically.
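The one-step predictive density can nevertheless be evaluated numerically on a grid. A univariate (K = 1) sketch with hypothetical regime means, standard deviations and predicted regime probabilities:

```python
import numpy as np

def one_step_density(y, means, sigmas, weights):
    """p(y_{t+1}|Y_t) = sum_m Pr(regime m|Y_t) * N(y; mean_m, sigma_m^2).

    Univariate sketch of the mixture-of-normals predictive density;
    `weights` are the predicted regime probabilities and must sum to one.
    """
    y = np.asarray(y, dtype=float)
    dens = np.zeros_like(y)
    for m, s, w in zip(means, sigmas, weights):
        dens += w * np.exp(-0.5 * ((y - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return dens

grid = np.linspace(-8.0, 8.0, 4001)
pdf = one_step_density(grid, means=[-1.0, 1.0], sigmas=[1.0, 0.5], weights=[0.7, 0.3])
```

Interval forecasts can then be read off by accumulating this density, which is usually easier than an analytical construction; note that the mixture is in general skewed and heteroskedastic across regimes.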
Using again ξ̂^{(1)}_{t+h|t} − ξ̄^{(1)} = F^{h}(ξ̂^{(1)}_{t|t} − ξ̄^{(1)}) as in (4.17) and ẑ_{t+h|t} = A^{h} ẑ_{t|t} as in
(4.16) yields:

y_{t+h} − ŷ_{t+h|t} = M Σ_{i=1}^{h} F^{h−i} v_{t+i} + J Σ_{i=1}^{h} A^{h−i} U_{t+i} + M F^{h} (ξ^{(1)}_{t} − ξ̂^{(1)}_{t|t})   (4.20)
    + J A^{h} (I_p ⊗ M) ( (ξ^{(1)}_{t} − ξ̂^{(1)}_{t|t})', ..., (ξ^{(1)}_{t−p+1} − ξ̂^{(1)}_{t−p+1|t})' )'.
The last two terms stem from regime classification errors and might be called filter uncertainty. If parameters have to be estimated, as is usually the case in practice, another term enters due to parameter uncertainty.
For the MSI model, MSPE-optimal forecasts can be derived by applying the conditional expectation to the measurement equation, where the lagged endogenous variables Y_{t−1} and the regime vector ξ_t enter additively as in (3.7):
y_t = H ξ_t + J Ā Y_{t−1} + u_t,   (4.22)

where H = (ν_1, ..., ν_M). To derive a closed-form solution, the system is stacked as

Y_t = H̄ ξ_t + Ā Y_{t−1} + Ū_t,

where

Ā = [ A_1  A_2  ⋯  A_{p−1}  A_p
      I_K  0    ⋯  0        0
      ⋮              ⋱       ⋮
      0    0    ⋯  I_K      0 ]

is a (Kp × Kp) matrix, H̄ = ι_{p,1} ⊗ H is a (Kp × M) matrix, and J = [I_K 0 ⋯ 0] = ι'_{p,1} ⊗ I_K is a (K × Kp)
matrix.
Thus, we get the optimal predictors by solving the following linear difference equation system. In contrast to linear VAR(p) models, the optimal predictor ŷ_{t+h|t} depends not only on
the last p observations Y_t, but is based on the full-sample information Y_t through ξ̂_{t|t}:

Ŷ_{t+h|t} = Σ_{i=0}^{h−1} Ā^{i} H̄ ξ̂_{t+h−i|t} + Ā^{h} Y_t.   (4.25)
Although the optimal predictor is linear in the regime inference ξ̂_{t|t} and the last p observations Y_t, ŷ_{t+h|t} is a non-linear function of the observed Y_t, as the inference ξ̂_{t|t} depends on Y_t in a non-linear fashion.
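The recursion behind (4.25) can be sketched by iterating the difference equation Y_{t+j|t} = H̄ ξ̂_{t+j|t} + Ā Y_{t+j−1|t}. The numbers below describe a hypothetical MSI(2)-VAR(1) model with K = 1:

```python
import numpy as np

def msi_forecast(H, A, F, xi_tt, Y_t, h):
    """Iterate Y_{t+j|t} = H xi_{t+j|t} + A Y_{t+j-1|t}, the difference
    equation solved by (4.25). F[i, j] = Pr(i|j), columns sum to one."""
    Y = np.asarray(Y_t, dtype=float)
    xi = np.asarray(xi_tt, dtype=float)
    for _ in range(h):
        xi = F @ xi           # regime prediction as in (4.4)
        Y = H @ xi + A @ Y    # measurement-equation forecast
    return Y

H = np.array([[0.0, 2.0]])                    # regime-dependent intercepts
A = np.array([[0.5]])
F = np.array([[0.9, 0.2], [0.1, 0.8]])
y_next = msi_forecast(H, A, F, np.array([1.0, 0.0]), np.array([4.0]), 1)
```

The mapping from (ξ̂_{t|t}, Y_t) to the forecast is linear, but the filter output ξ̂_{t|t} is itself a non-linear function of the data, which reproduces the point made above.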
While the absence of restrictions on the parameter matrix B can simplify estima-
tion (as will be shown in Chapter 9 for the MSIAH-VAR model), concerning fore-
casts, the situation worsens if the autoregressive parameters are allowed to be regime
dependent. For MSIAH-VAR processes, the observed variable Yt can no longer be
(4.27)
But this is only due to the fact that X_{t+1} is deterministic given Y_t, E[X_{t+1}|Y_t, ξ_t] =
X_{t+1}, while in general E[X_{t+j}|Y_t, ξ_t] ≠ E[X_{t+j}|Y_t].³ The crucial point concerning
the MSA-VAR model is that y_t is a non-linear function of the regimes {ξ_{t−i}}_{i≥0}.
This is due to the (Y'_{t−1} ⊗ I_K) α_t term, where the autoregressive parameter vector
α_t = [α_1, ..., α_M] ξ_t and the lagged endogenous variables y_{t−j} enter, which is obviously a non-linear interaction.
A two-step prediction for example would involve the following conditional expect-
ations:
E[y_{t+2}|Y_t] = E[ν_{t+2}|Y_t] + E[A_{t+2} y_{t+1}|Y_t]
             = E[ν_{t+2}|Y_t] + E[A_{t+2}(ν_{t+1} + A_{t+1} y_t)|Y_t]
             = E[ν_{t+2}|Y_t] + E[A_{t+2} ν_{t+1}|Y_t] + E[A_{t+2} A_{t+1}|Y_t] y_t,

where A = diag(A_1, ..., A_M) is an (MK × MK) matrix, J = ι_M ⊗ I_K is an (MK × K) matrix, and Ξ_t = diag(ξ_t) ⊗ I_K is an (MK × MK) matrix,
³ Similar problems arise if the transition probabilities are time-varying, such that F varies stochastically.
such that A_t = J' Ξ_t A J = J' A Ξ_t J and ν_t = J' Ξ_t ν, with ν = (ν'_1, ..., ν'_M)'.
It can be easily verified that

E[y_{t+h}|Y_t] = ∫_{y_{t+h}} ∫_{Y_{t+1,t+h−1}} p(y_{t+h}, Y_{t+1,t+h−1}|Y_t) y_{t+h} dy_{t+h} dY_{t+1,t+h−1}
             = ∫_{y_{t+h}} ∫_{Y_{t+1,t+h−1}} p(y_{t+h}|Y_{t+1,t+h−1}, Y_t) p(Y_{t+1,t+h−1}|Y_t) y_{t+h} dy_{t+h} dY_{t+1,t+h−1}

differs from

E[y_{t+h}|Ŷ_{t+1,t+h−1|t}, Y_t] = ∫_{y_{t+h}} y_{t+h} p(y_{t+h}|Ŷ_{t+1,t+h−1|t}, Y_t) dy_{t+h},

which is obtained from (4.7) and which is not the optimal predictor E[y_{t+h}|Y_t] as in
the MSM-VAR and MSI-VAR model.
In practice, the parameters are unknown and have to be estimated. Hence, the
usual procedure of substituting the unknown parameters by their estimates, which
are a non-linear function of the observed past values, is itself only an approximation.
Therefore, the predictor given in (4.29) might be justified for the same reasons.
In this chapter we have investigated the effects of the non-normality and the non-linearity of the MS-VAR model on forecasting. It has been shown that:
(i.) the optimal predictor of MSM(M)-VAR(p) and MSI(M)-VAR(p) models is linear in the last p observations and the regime inference, but there exists no
purely linear representation of the optimal predictor in the information set.
The results could be compared with forecasts based on the VARMA representation of these processes.
(iii.) The predicted probability densities are non-normal and thus in general neither
symmetric, homoskedastic, nor regime invariant.
Before we consider this simulation technique, which invokes Bayesian theory, the
filtering techniques delivering the statistical inference about the regimes and the classical method of maximum likelihood estimation in the context of this non-linear time
series model are presented in the following chapters.
Chapter 5

The BLHK Filter
An important task associated with the statistical analysis of MS-VAR models is dis-
cussed in this chapter: the filtering and smoothing of regime probabilities. In the
MS-VAR model the state vector ~t is given a structural interpretation. Thus an infer-
ence on this unobserved variable is of interest for its own sake. However, the filtered
and smoothed state probabilities provide not only information about the regime at
time t, but also open the way for the computation of the likelihood function and consequently for maximum likelihood estimation and likelihood ratio tests.
The discrete support of the state in the MS-VAR model allows us to derive the complete conditional distribution of the unobservable state variable instead of only
the first two moments, as in the Kalman filter (cf. KALMAN [1960], [1963] and KALMAN [1961]) for normal linear state-space models, or the grid approximation suggested by KITAGAWA [1987] for non-linear, non-normal state-space models.
In their recent form, the filtering and smoothing algorithms for time series models
with Markov-switching regimes are closely related to HAMILTON [1988], [1989],
[1994a], building upon ideas of COSSLETT & LEE [1985]. The basic filtering and
smoothing recursions have been introduced by BAUM et al. [1970] for the reconstruction of the hidden Markov chain. Their algorithms have been applied by LINDGREN [1978] to regression models with Markovian regime switches. A major improvement of the smoother has been provided by the backward recursions of KIM
[1994]. For these reasons, the recursive filter and smoother for MS-VAR models is
termed in the following the Baum-Lindgren-Hamilton-Kim (BLHK) filter
and smoother. However, this name should not diminish the contributions of other
researchers to the development of related methods; for example, the basic filtering
formula has been derived independently by TJØSTHEIM [1986b] for doubly stochastic processes with a Markov chain as the exogenous process governing the parameter shifts.
The aim of this chapter is to present and evaluate the algorithms proposed in the literature in the context of our settings and to discuss their implications for the following analyses. In Section 5.1, algorithms to derive the filtered regime probabilities ξ̂_{t|t}
are presented. Smoothing algorithms delivering the full-sample conditioned regime
probabilities, ξ̂_{t|T}, are considered in Section 5.2. This will be done under the assumption that the parameter vector λ is known. In practice, λ is usually unknown
and has to be estimated with the methods to be described in Chapter 6. Some related
technical remarks close the discussion.
5.1 Filtering
y_t = X_t B ξ_t + u_t,
ξ_{t+1} = F ξ_t + v_{t+1}.
Note that the (N × 1) regime vector is M-dimensional for MSI specifications, while
we consider for MSM specifications the stacked regime vector collecting the information about the last p + 1 regime realizations, N = M^{p+1}.
By assuming that all parameters of the model are known, the discrete-state algorithm
under consideration summarizes the conditional probability distribution of the state
vector ξ_{t+1} by
Since each component of ξ_{t+1} is a binary variable, ξ̂_{t+1|t} possesses not only the
interpretation as the conditional mean, which is the best prediction of ξ_{t+1} given Y_t,
but the vector ξ̂_{t+1|t} also presents the conditional probability distribution of ξ_{t+1}.
Analogously, the filtered inference ξ̂_{t|t} on the current state vector based only on currently available data is defined as:
The filtering algorithm computes ξ̂_{t|t} by deriving the joint probability density of ξ_t
and y_t conditioned on observations Y_{t−1}.
By invoking Bayes' law, the posterior probabilities Pr(ξ_t|y_t, Y_{t−1}) are given
by
Note that the summation involves all possible values of ξ_t and ξ_{t−1}.
η_t = [ p(y_t|θ_1, Y_{t−1}), ..., p(y_t|θ_N, Y_{t−1}) ]',

where θ has been dropped on the right-hand side to avoid unnecessary notation, such
that the density of y_t conditional on Y_{t−1} is given by p(y_t|Y_{t−1}) = η'_t ξ̂_{t|t−1} =
1'_N (η_t ⊙ ξ̂_{t|t−1}).
ξ̂_{t|t} = (η_t ⊙ ξ̂_{t|t−1}) / 1'_N (η_t ⊙ ξ̂_{t|t−1}).   (5.4)
Consider, for example, an MS(2)-VAR(p) model with Gaussian white noise: equation (5.4) traces the ratio of posterior regime probabilities ξ̂_{1t|t}/ξ̂_{2t|t}
back to the conditional likelihood ratio η_{1t}/η_{2t} and the prior ratio ξ̂_{1t|t−1}/ξ̂_{2t|t−1}. If one denotes ρ =
p_{11} − (1 − p_{22}) and u_{mt} = y_t − X_t β_m, then the filtered probability ξ̂_{1t|t} of regime
1 is found from

ξ̂_{1t|t} / (1 − ξ̂_{1t|t}) = (η_{1t}/η_{2t}) · ( (1 − p_{22}) + ρ ξ̂_{1,t−1|t−1} ) / ( 1 − (1 − p_{22}) − ρ ξ̂_{1,t−1|t−1} ).
The transition equation implies that the vector ξ̂_{t+1|t} of predicted probabilities is a
linear function of the filtered probabilities ξ̂_{t|t}:

ξ̂_{t+1|t} = F ξ̂_{t|t}.   (5.5)

The sequence {ξ̂_{t|t−1}}_{t=1}^{T} can therefore be generated by iterating on (5.4) and (5.5),
which can be summarized as:

ξ̂_{t+1|t} = F (η_t ⊙ ξ̂_{t|t−1}) / 1'_N (η_t ⊙ ξ̂_{t|t−1}).   (5.6)
In the prevailing Bayesian context, ξ̂_{t|t−1} is the prior distribution of ξ_t. The posterior
distribution ξ̂_{t|t} is calculated by linking the new information y_t with the prior via
Bayes' law. The posterior distribution ξ̂_{t|t} becomes the prior distribution for the next
state ξ_{t+1}, and so on.
The iteration is started by assuming that the initial state vector ξ_0 is drawn from the
stationary unconditional probability distribution of the Markov chain, ξ̂_{1|0} = ξ̄, or by
handling ξ_0 parametrically. In the latter case, ξ̂_{1|0} is an additional parameter vector to be
estimated.
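The recursions (5.4)–(5.6) translate directly into code. The sketch below assumes the regime-conditional densities η_t have already been evaluated (here: hypothetical hard-coded values for two observations and two regimes):

```python
import numpy as np

def blhk_filter(eta, F, xi_init):
    """Filter recursions (5.4)-(5.6).

    eta: (T x N) array, eta[t] = regime-conditional densities p(y_t|xi_t, Y_{t-1});
    F[i, j] = Pr(i|j) with columns summing to one; xi_init = xi_{1|0}.
    Returns the filtered probabilities xi_{t|t} and the log-likelihood,
    which the filter delivers as a by-product (see Chapter 6).
    """
    xi_pred = np.asarray(xi_init, dtype=float)
    filtered, loglik = [], 0.0
    for eta_t in eta:
        joint = eta_t * xi_pred          # eta_t ⊙ xi_{t|t-1}
        denom = joint.sum()              # p(y_t|Y_{t-1})
        loglik += np.log(denom)
        filtered.append(joint / denom)   # (5.4)
        xi_pred = F @ filtered[-1]       # (5.5)
    return np.array(filtered), loglik

eta = np.array([[0.40, 0.10],
                [0.05, 0.30]])
F = np.array([[0.9, 0.2], [0.1, 0.8]])
filt, ll = blhk_filter(eta, F, np.array([0.5, 0.5]))
```

Each pass through the loop is one Bayesian updating step: the prior ξ̂_{t|t−1} is combined with the new observation via η_t and then propagated through the chain.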
Equations (5.4) and (5.6) present a fast algorithm for calculating the filtered regime
probabilities. For analytical purposes, it can be useful to have a final form of ξ̂_{t|t}
which depends only on the observations Y_t and the parameter vector λ. The desired
transformation of equation (5.4) can be achieved as follows: equation (5.4) can be
rewritten as

ξ̂_{t|t} = K_t ξ̂_{t−1|t−1} / 1' K_t ξ̂_{t−1|t−1},   (5.7)
where we have used that 1' ξ̂_{t+1|t} = 1 holds by definition, and that expressions (5.2)
and (5.3) can be collected as

K_t = diag(η_t) F.   (5.8)

Solving the difference equation in {ξ̂_{t|t}}, we get the following final form of ξ̂_{t|t}:

ξ̂_{t|t} = ( ∏_{j=0}^{t−1} K_{t−j} ) ξ_0 / ∏_{j=0}^{t−1} p(y_{t−j}|Y_{t−j−1}) = (1 / p(Y_t|Y_0)) ( ∏_{j=0}^{t−1} K_{t−j} ) ξ_0.   (5.9)
Expression (5.9) verifies that the regime probabilities are linear in the initial state ξ_0,
but non-linear in the observations y_{t−j} entering η_{t−j} and in the remaining parameters.
5.2 Smoothing
The filter recursions deliver estimates for ξ_t, t = 1, ..., T, based on information up
to time point t. This is a limited-information technique, as we have observations up
to t = T. In the following, full-sample information is used to make an inference
about the unobserved regimes by incorporating the previously neglected sample information Y_{t+1,T} = (y'_{t+1}, ..., y'_T)' into the inference about ξ_t. Thus, the smoothing algorithm gives the best estimate of the unobservable state at any point within
the sample.
Different approaches are available to calculate these probabilities, i.e. the smoothed
inference about the state at date t based on data available through some future date
τ > t, where τ := T is considered here exclusively. The algorithm introduced by
HAMILTON [1988], [1989] derives the full-sample smoothed inference ξ̂_{t|T} from the
joint probability distribution of ξ_t and ξ_T conditional on Y_T,
where

Pr(ξ_T, ξ_t|Y_{T−1}) = Σ_{ξ_{T−1}} Pr(ξ_{T−1}, ξ_t|Y_{T−1}) Pr(ξ_T|ξ_{T−1}).
The full-sample smoothed inferences ξ̂_{t|T} can be found by iterating backward from
t = T − 1, ..., 1, starting from the last output of the filter, ξ̂_{T|T}, and using the
identity

Pr(ξ_t|Y_T) = Σ_{ξ_{t+1}} Pr(ξ_t, ξ_{t+1}|Y_T).

For pure VAR models with Markovian parameter shifts, the probability laws for
y_t and ξ_{t+1} depend only on the current state ξ_t and not on the former history of
states. Thus, we have
It is therefore possible to calculate the smoothed probabilities ξ̂_{t|T} by getting the last
term from the previous iteration of the smoothing algorithm, ξ̂_{t+1|T}, while it can be
shown that the first term can be derived from the filtered probabilities ξ̂_{t|t},
(5.12)
If there is no deviation between the full-information estimate, ξ̂_{t+1|T}, and the inference based on the partial information, ξ̂_{t+1|t}, then there is no incentive to update,
ξ̂_{t|T} = ξ̂_{t|t}, and the filtering solution ξ̂_{t|t} cannot be further improved.
In matrix notation, (5.11) and (5.12) can be condensed to

ξ̂_{t|T} = ( F' ( ξ̂_{t+1|T} ⊘ ξ̂_{t+1|t} ) ) ⊙ ξ̂_{t|t},   (5.13)

where ⊙ and ⊘ denote element-wise matrix multiplication and division, respectively. The recursion is initialized with the final filtered probability vector ξ̂_{T|T}. Recursion (5.13)
describes how the additional information Y_{t+1,T} is used in an efficient way to improve the inference on the unobserved state ξ_t. As an illustration of this, consider
i.i.d. switching regimes, where F = ξ̄ 1'. The missing serial dependence of regimes
implies that the observation at time t is a sufficient statistic for a regime inference.
Past observations, ξ̂_{t|t−1} = ξ̄, as well as future observations, ξ̂_{t|T} = ξ̂_{t|t}, are irrelevant.
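The backward recursion (5.13) is a few lines of code once the filtered probabilities are available. The i.i.d. special case just discussed provides a convenient check: with F = ξ̄1' the smoother must leave the filtered probabilities unchanged.

```python
import numpy as np

def kim_smoother(filtered, F):
    """Backward recursion (5.13): xi_{t|T} = (F'(xi_{t+1|T} ⊘ xi_{t+1|t})) ⊙ xi_{t|t}.

    filtered: (T x N) filtered probabilities; F[i, j] = Pr(i|j) with columns
    summing to one. Initialized with the last filter output xi_{T|T}.
    """
    smoothed = np.array(filtered, dtype=float, copy=True)
    T = len(filtered)
    for t in range(T - 2, -1, -1):
        xi_pred = F @ filtered[t]                                  # xi_{t+1|t}, (5.5)
        smoothed[t] = (F.T @ (smoothed[t + 1] / xi_pred)) * filtered[t]
    return smoothed

filt = np.array([[0.8, 0.2], [0.4, 0.6]])       # hypothetical filter output
F_iid = np.outer([2/3, 1/3], [1.0, 1.0])        # i.i.d. regimes: F = xi_bar 1'
sm = kim_smoother(filt, F_iid)                  # equals filt, no updating
```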
The filtering recursion (5.4) and the smoothing recursion (5.13) are the basis for
computationally appealing algorithms which will be used for parameter estimation.
However, for theoretical purposes it is sometimes beneficial to possess a final-form
solution for ξ̂_{t|T}. It can be easily verified that the final-form solution of (5.13) is
identical to FRIEDMANN's [1994] smoothing algorithm.
In (5.12) the ratio of smoothed and filtered regime probabilities at time t has been
traced back in a recursive fashion to the regime inferences for t + 1, Pr(ξ_{t+1}|Y_T)
and Pr(ξ_{t+1}|Y_t). To see the basis for another approach, apply Bayes' law to the
smoothed probability Pr(ξ_t|Y_{t+1,T}, Y_t) to get the identity

Pr(ξ_t|Y_T) = Pr(ξ_t|Y_{t+1,T}, Y_t) = p(Y_{t+1,T}|ξ_t, Y_t) Pr(ξ_t|Y_t) / p(Y_{t+1,T}|Y_t),   (5.14)

where the ratio of smoothed and filtered regime probabilities is reduced to the ratio of the conditional density p(Y_{t+1,T}|ξ_t, Y_t) and the unconditional
density p(Y_{t+1,T}|Y_t) of the new information Y_{t+1,T}.
These densities can be expressed with the help of the matrices K_t as

p(Y_{t+1,T}|ξ_t = ι_m, Y_t) = 1' ( ∏_{j=0}^{T−t−1} K_{T−j} ) ι_m,

p(Y_{t+1,T}|Y_t) = 1' ( ∏_{j=0}^{T−t−1} K_{T−j} ) ξ̂_{t|t},

such that

ξ̂_{mt|T} = ξ̂_{mt|t} · [ 1' ( ∏_{j=0}^{T−t−1} K_{T−j} ) ι_m ] / [ 1' ( ∏_{j=0}^{T−t−1} K_{T−j} ) ξ̂_{t|t} ].

Lastly, the final form for ξ̂_{t|T} follows from the definition of the filtered probabilities according to equation (5.9):
(5.16)

Equation (5.16) represents the smoothed regime probability vector ξ̂_{t|T} as a non-linear function of the past observations Y_t and future observations Y_{t+1,T}; except
for the normalization constant, ξ̂_{t|T} is linear in ξ_0.
A drastic simplification of this final-form solution occurs if the regimes are serially uncorrelated (mixtures-of-normals model). Applying the recursions (5.4) and
(5.13), and being aware of the unpredictability of ξ_t, yields

ξ̂_{t|t−1} = ξ̄,   ξ̂_{t|T} = ξ̂_{t|t}.
5.A Supplements
Some cross products of the smoothed and filtered states might be of interest and can
also be calculated recursively. First, we consider the conditional variance of ξ_t and
the predicted variance of ξ_{t+1} given Y_t:
Obviously, both moments are functions of the inference about the actual regime, ξ̂_{t|t}.
The conditional variance of the parameter vector β_{t+j} and of future values of the observed variable y_{t+j} will therefore depend on ξ̂_{t|t}.
For example, the standard deviation of the filtered regime probability ξ̂_{mt|t} given Y_t
can be calculated as
For the conditional moments of the states given the full-sample information Y_T, the analogous result is:

Cov[ξ_t, ξ_{t+h}|Y_T] = E[ξ_t ξ'_{t+h}|Y_T] − ξ̂_{t|T} ξ̂'_{t+h|T} = Var[ξ_t|Y_T] F^{h'},   h > 0.   (5.21)
where the parameter vector λ is assumed to be known. In practice, when the parameter vector λ is unknown, it is convenient to replace the true parameters with their
estimates. In the next chapter the maximum likelihood estimation is discussed. As
will be shown, the path of smoothed regime probabilities plays a dominant role even
in the solution of the estimation problem.
Pr(s_{t+1}, s_t|Y_T) = Pr(s_{t+1}|s_t) Pr(s_{t+1}|Y_T) Pr(s_t|Y_t) / Pr(s_{t+1}|Y_t).

In vector notation,

ξ̂^{(2)}_{t|T} = vec(P) ⊙ [ ( ξ̂^{(1)}_{t+1|T} ⊘ ξ̂^{(1)}_{t+1|t} ) ⊗ ξ̂^{(1)}_{t|t} ],
 (M² × 1)            (M × 1)       (M × 1)        (M × 1)

where the filtered and smoothed probabilities ξ̂_{t|s} ≡ ξ̂^{(1)}_{t|s} are obtained by
the procedures (5.4) and (5.13).
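In code, the smoothed bivariate regime probabilities follow directly from this formula. Below, F[i, j] = Pr(i|j), so F corresponds to P'; all probability vectors are hypothetical:

```python
import numpy as np

def smoothed_joint(F, xi_filt, xi_pred_next, xi_smooth_next):
    """Pr(s_{t+1}=i, s_t=j|Y_T) = Pr(i|j) Pr(s_{t+1}=i|Y_T) Pr(s_t=j|Y_t) / Pr(s_{t+1}=i|Y_t).

    Returns an (M x M) matrix; entry [i, j] is the smoothed joint probability.
    """
    return F * np.outer(xi_smooth_next / xi_pred_next, xi_filt)

F = np.array([[0.9, 0.2], [0.1, 0.8]])
xi_filt = np.array([0.8, 0.2])                 # xi_{t|t}
xi_pred_next = F @ xi_filt                     # xi_{t+1|t}
joint = smoothed_joint(F, xi_filt, xi_pred_next, np.array([0.7, 0.3]))
```

By construction, summing the joint matrix over the time-t regime recovers the smoothed marginal for t + 1.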
Chapter 6

Maximum Likelihood Estimation
In the last chapter, attention was given to the determination of the state vector ξ for
given observations Y and known parameters λ. In this chapter the maximum likelihood estimation of the parameters λ = (θ', p', ξ'_0)' of an MS-VAR model is considered. The aim of this chapter is (i.) to provide the reader with an introduction to
the methodological issues of ML estimation of MS-VAR models in general, (ii.) to
propose with the EM algorithm an estimation technique for all discussed types of
MS-VAR models, (iii.) to inform the reader about alternative techniques which
can be used for special purposes or model extensions, and (iv.) to give some basic
asymptotic results.
Thus, this chapter is partly a survey, partly an interpretation, and partly a new contribution; preliminaries for the ML estimation are considered in the following two
sections. Section 6.1 gives three alternative approaches to formulating the likelihood
function of MS-VAR models which, as will be seen, have turned out to be useful.
Section 6.2 discusses the identifiability of MS(M)-VAR(p) models. An identifiability result for hidden Markov-chain models provided by LEROUX [1992] is extended to our augmented setting. In Section 6.3 the normal equations of ML estimation of MS-VAR models are derived. At the center of interest is the EM algorithm
which has been suggested by HAMILTON [1990] for the statistical analysis of time
series subject to changes in regime. In the literature, the regressions involved in
the EM algorithm are developed only for vector systems without autoregressive dynamics. We analyze the critical points; in particular, we relax the limitation in the
literature to MSI(M)-VAR(0) models, thus allowing the estimation of genuine vector autoregressive models. It is shown that the implementation of the EM algorithm
for MS(M)-VAR(p) models causes some problems. Therefore the discussion is restricted to an MS regression model, but one which captures all MSI specifications.
A concrete discussion of the ML estimation of the various model types via the EM algorithm is left for Chapter 9. Extensions and alternatives which have been proposed
in the literature are considered in Section 6.5. In the closing Section 6.6, the asymptotic properties of the ML estimation of MS-VAR models are discussed; in particular,
procedures for the estimation of the variance-covariance matrix of the ML estimates
are suggested.
Q(θ, p, ξ_0) = ∏_{t=1}^{T} η_t(θ)' ξ_{t|0}(p, ξ_0).   (6.1)
By using this function of prior regime probabilities ξ_{t|0}(p, ξ_0), which can be approximated by the ergodic probabilities ξ̄(p) for sufficiently large t, instead of the "posterior" inference ξ̂_{t|t−1}, GOLDFELD & QUANDT are not required to provide filtering
procedures to reconstruct the time path of regimes. The model's parameters are estimated by numerical methods.
Unfortunately, the function Q(θ, p, ξ_0) is not the likelihood function, as pointed out
by COSSLETT & LEE [1985]. However, equipped with the results of Chapter 5, it is
possible to derive the likelihood function as a by-product of the BLHK filter:
L(λ|Y) := p(Y_T|Y_0; λ) = ∏_{t=1}^{T} p(y_t|Y_{t−1}; λ)
        = ∏_{t=1}^{T} Σ_{ξ_t} p(y_t|ξ_t, Y_{t−1}, θ) Pr(ξ_t|Y_{t−1}, λ)
        = ∏_{t=1}^{T} η'_t ξ̂_{t|t−1} = ∏_{t=1}^{T} η'_t F ξ̂_{t−1|t−1}.   (6.2)
As seen in Chapter 1, the conditional densities p(y_t|ξ_{t−1} = ι_i, Y_{t−1}) are mixtures
of normals. Thus, the likelihood function is non-normal:

L(λ|Y) = ∏_{t=1}^{T} Σ_{i=1}^{N} Σ_{j=1}^{N} p_{ij} Pr(ξ_{t−1} = ι_i|Y_{t−1}, λ) p(y_t|ξ_t = ι_j, Y_{t−1}, θ)
       = ∏_{t=1}^{T} Σ_{i=1}^{N} Σ_{j=1}^{N} p_{ij} ξ̂_{i,t−1|t−1} (2π)^{−K/2} |Σ_j|^{−1/2} exp( −½ u'_{jt} Σ_j^{−1} u_{jt} ).
FRIEDMANN [1994] has proposed inserting the closed-form solution (5.9) for
ξ̂_{t−1|t−1} into equation (6.2). This procedure leads to the following algorithm for
determining the likelihood function:

L(λ|Y) = η'_T ξ̂_{T|T−1} L(λ|Y_{T−1})
       = ∏_{t=1}^{T} η'_t ξ̂_{t|t−1} = ∏_{t=1}^{T} 1' diag(η_t) F ξ̂_{t−1|t−1},   (6.3)

where the transition matrix F = (f_1, ..., f_N) is such that K_t = (η_t ⊙ f_1, ..., η_t ⊙ f_N).
For the estimation procedures to be discussed in the following sections, a further
setting-up of the likelihood function will be employed which makes use of the exogeneity of the stochastic process ξ_t:
where the integration denotes again summation over all possible values of ξ = ξ_T ⊗
ξ_{T−1} ⊗ ... ⊗ ξ_1. Later, these cumbersome calculations are simplified to a recursive
algorithm using the Markov properties:

p(Y|ξ, θ) = ∏_{t=1}^{T} p(y_t|ξ_t, Y_{t−1}, θ),

Pr(ξ|p, ξ_0) = ∏_{t=1}^{T} Pr(ξ_t|ξ_{t−1}, p).
Before we can proceed to the actual estimation, a unique set of parameters must be
specified. Maximum likelihood estimation presupposes that the model is at least identified.
6.2 The Identification Problem
This observation implies that the linear VAR(p) model with parameter vector θ^0, as a
nested special case of an MS(2)-VAR(p) model, is not identifiable, since all structures
with θ_1 = θ_2 = θ^0, as well as all structures with P = ι ξ̄' and θ_m = θ^0, belong
to the same equivalence class. The non-identifiability of a structure with θ_1 = θ_2
causes problems for tests where the number of identifiable regimes is changed under
the null; this issue will be discussed further in Chapter 7.
is satisfied if and only if we can order the summations such that ξ_i = ξ̃_i and θ_i = θ̃_i
for i = 1, ..., m. Under the assumption (A6) made in Chapter 2, this condition is
fortunately fulfilled, since the class of normal density functions is identifiable.
For hidden Markov-chain models, LEROUX [1992] has shown that under this regularity condition the equivalence classes are identifiable.
where ξ = ξ_1 ⊗ ξ_2 ⊗ ... ⊗ ξ_T is an (M^T × 1) vector, θ(ξ, λ) = θ(s_1, λ) ⊗ ... ⊗ θ(s_T, λ),
and

p(Y|Y_0, θ(ξ^l, λ)) = ∏_{t=1}^{T} f(y_t|Y_{t−1}; θ(s_t, λ)),

ξ^l(p(λ), ξ_0(λ)) = Σ_{s_0=1}^{M} ξ_{s_0} ∏_{t=1}^{T} p_{s_{t−1} s_t}(λ).

Employing standard results of the statistical theory of linear systems, it is clear that
p(Y_T|Y_0, θ(ξ^l, λ)) is a Gaussian density and that θ(ξ^l, λ) would be identifiable.
Hence, the critical point is whether the structures λ^1 and λ^2 define the same joint
density (6.6) only if they define the same mixing distribution (6.5), i.e. belong
to the same equivalence class. It follows from TEICHER [1967] that (under independence) the identifiability of mixtures carries over to products of densities from a
specific family. Using the argument of LEROUX [1992] that the result of TEICHER
is also valid for finite mixtures with a fixed number of components, we conclude that
the identifiability of (6.6) is ensured if and only if that of (6.5) is.
Thus λ^1 and λ^2 produce the same stationary law for y_t if and only if λ^1 and λ^2
are identical or differ only in the numeration of the states. This identifiability result is in line with previous suggestions of KARLIN & TAYLOR [1975] and WALL
[1987], where the latter has addressed the identification of varying-coefficient regression models presupposing non-stochastic explanatory variables. Some useful proofs
can be found in BAUM & PETRIE [1966] and PETRIE [1969].
The ML estimator maximizes ln L*(λ) subject to the adding-up and non-negativity restrictions

P ι_M = ι_M,   i.e. (ι'_M ⊗ I_M) p = ι_M,   ι'_M ξ_0 = 1,   p ≥ 0,   ξ_0 ≥ 0.

¹ For simplicity of notation, we consider here explicitly an M-dimensional state vector as in MSI specifications. The results can be straightforwardly transferred to MSM specifications, where the dimension
of the initial state vector is M^p.
Let κ_1 and κ_2 denote the Lagrange multipliers associated with the adding-up restrictions on the matrix of transition probabilities, i.e. p, and the initial state ξ_0. Then the
FOCs are given by the set of simultaneous equations

∂ ln L(λ|Y)/∂θ' = 0,
∂ ln L(λ|Y)/∂p' − κ'_1 (ι'_M ⊗ I_M) = 0,
∂ ln L(λ|Y)/∂ξ'_0 − κ_2 ι'_M = 0,

where it is assumed that the interior solution of these conditions exists and is well-behaved, such that the non-negativity restrictions are not binding. These FOCs are
now calculated successively for the VAR parameter vector θ, the vector of transition
probabilities p, and the initial state ξ_0.
Differentiating the log-likelihood function with respect to the parameter vector θ leads
to the score function

∂ ln L(λ|Y)/∂θ' = (1/p(Y)) ∫ [∂ p(Y|ξ, θ)/∂θ'] Pr(ξ|ξ_0, p) dξ
               = ∫ [∂ ln p(Y|ξ, λ)/∂θ'] Pr(ξ|Y, λ) dξ.

Thus the scores ∂ ln p(Y|ξ, λ)/∂θ' are weighted with the conditional probabilities Pr(ξ|Y, λ) of the regime vector,
where we have used the definition of conditional probabilities

Pr(ξ|Y, λ) = p(Y|ξ, θ) Pr(ξ|ξ_0, p) / ∫ p(Y|ξ̃, θ) Pr(ξ̃|ξ_0, p) dξ̃.
∂ ln L(λ|Y)/∂θ' = Σ_{t=1}^{T} Σ_{ξ_t} [∂ ln p(y_t|ξ_t, Y_{t−1}, λ)/∂θ'] Pr(ξ_t|Y_T, λ) = 0.   (6.7)
Analogously, for the transition probabilities,

∂ ln L(λ|Y)/∂p' = (1/p(Y)) ∫ p(Y|ξ, θ) [∂ Pr(ξ|ξ_0, p)/∂p'] dξ
               = (1/p(Y)) ∫ [∂ ln Pr(ξ|ξ_0, p)/∂p'] p(Y|ξ, θ) Pr(ξ|ξ_0, p) dξ
               = ∫ [∂ ln Pr(ξ|ξ_0, p)/∂p'] Pr(ξ|Y, λ) dξ = 0.   (6.8)

Hence, the derivatives for each component p_ij of p = vec(P) are given by

∂ ln L(λ|Y)/∂p_ij = Σ_{t=1}^{T} Σ_{ξ_t} Σ_{ξ_{t−1}} [∂ ln Pr(ξ_t|ξ_{t−1}, p)/∂p_ij] Pr(ξ_t, ξ_{t−1}|Y_T, λ).
Setting this score to zero subject to the adding-up restriction and solving for the Lagrange
multipliers yields

κ̂_1 = Σ_{t=1}^{T} ξ̂^{(1)}_{t−1|T}.   (6.12)/(6.13)

Since the score has the property that ∂ ln L/∂p_ij → ∞ if p_ij → 0, there always exists
an interior solution for p, which is determined by equation (6.14), which in turn is
derived by inserting (6.13) into (6.11):

p̂_ij = Σ_{t=1}^{T} Pr(s_t = j, s_{t−1} = i|Y_T) / Σ_{t=1}^{T} Pr(s_{t−1} = i|Y_T).   (6.14)
Thus, the ML estimator of the vector of transition probabilities p is equal to the transition probabilities in the sample calculated with the smoothed regime probabilities
Pr(s_t = j, s_{t−1} = i|Y_T), t = 1, ..., T, i, j = 1, ..., M, collected in ξ̂^{(2)}(λ).
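Given the smoothed bivariate probabilities, the estimator reduces to a ratio of expected transition counts. A sketch with hypothetical smoothed joint probabilities for two periods:

```python
import numpy as np

def estimate_transition(joint_smoothed):
    """p_hat_ij = sum_t Pr(s_t=j, s_{t-1}=i|Y_T) / sum_t Pr(s_{t-1}=i|Y_T).

    joint_smoothed: (T x M x M), entry [t, i, j] = Pr(s_t=j, s_{t-1}=i|Y_T).
    Returns the (M x M) matrix of estimated transition probabilities,
    rows summing to one.
    """
    counts = joint_smoothed.sum(axis=0)            # expected transition counts
    return counts / counts.sum(axis=1, keepdims=True)

# Hypothetical smoothed joint probabilities for two periods
joints = np.array([[[0.30, 0.10], [0.20, 0.40]],
                   [[0.10, 0.10], [0.30, 0.50]]])
P_hat = estimate_transition(joints)
```

This is the fractional-count analogue of the ML estimator of a fully observed Markov chain: smoothed probabilities stand in for actual regime occurrences.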
∂ ln L(λ|Y)/∂ξ'_0 = (1/p(Y)) ∫ p(Y|ξ, θ) [∂ Pr(ξ|ξ_0, p)/∂ξ'_0] dξ
                 = ∫ [∂ ln Pr(ξ|ξ_0, p)/∂ξ'_0] Pr(ξ|Y, λ) dξ
                 = Σ_{ξ_1} [∂ ln Pr(ξ_1|ξ_0, p)/∂ξ'_0] Pr(ξ_1|Y, λ).
If the initial state is assumed to be fixed but unknown, the desired derivatives are
given by

∂ ln Pr(ξ_1 = ι_j|ξ_0, p)/∂ξ_{i0} = Pr(ξ_1 = ι_j|ξ_0, p)^{−1} F_{ji},

which determine the score ∂ ln L(λ|Y)/∂ξ'_0 in (6.15). The corresponding FOC,

∂ ln L(λ|Y)/∂ξ'_0 − κ_2 ι'_M = 0,

together with ι'_M ξ_0 = 1 (implying κ_2 = 1), gives the following solution for ξ_0:

ξ̂_0 = ξ̂_{0|T}.   (6.16)
It is worth noting that the smoothed-probability solution ξ̂_{0|T}(λ) for ξ_0 in equation
(6.15) is a function of ξ_0 itself. Furthermore, an analysis of the equivalent formulation of the likelihood function (6.3) shows that the likelihood function is linear in ξ_0,
such that the interior solution (6.16) does not necessarily provide the global maximum. Hence, irrespective of whether the initial state ξ_0 is assumed to be fixed to
one regime m* or stochastic with probabilities ξ_{0|0}, the ML estimate is given by the
boundary solution ξ̂_0 = ι_{m*}.
The problem of assigning initial values can be overcome by assuming that the unconditional probability distribution of ξ_1, ξ̂_{1|0}, is equivalent to the ergodic probability distribution ξ̄. Since the ergodic probability distribution ξ̄ is a function of the
transition probabilities p, the derivative ∂ξ̄_{1|0}/∂p' would then have to be included in the FOC
(6.15) of p.
² For MSM specifications, ξ^{(M^p)}_0 = ξ_0 ⊗ ξ_{−1} ⊗ ... ⊗ ξ_{1−p} can be determined uniquely by using (6.16).
FRIEDMANN [1994] has proposed selecting only ξ̂^{(1)}_{1−p}, while the initial state vector is determined by

ξ̂^{(p)}_{0|0} = F'^{p} ξ̂^{(1)}_{1−p} ⊗ ... ⊗ F' ξ̂^{(1)}_{1−p} ⊗ ξ̂^{(1)}_{1−p}.
In the following sections, alternative algorithms are introduced that deliver maximum
likelihood estimates of the parameter vector λ = (θ, p, ξ_0) for given observations
Y_T = (y'_T, ..., y'_{1−p})' by maximizing the conditional log-likelihood function numerically.
• In the expectation step (E), the unobserved states ξ_t are estimated by their
smoothed probabilities ξ̂_{t|T}. The conditional probabilities Pr(ξ|Y, λ^{(j−1)}) are
calculated with the BLHK filter and smoother by using the estimated parameter vector λ^{(j−1)} of the last maximization step instead of the unknown true
parameter vector λ.
I. Initialization

II. Expectation Step

The smoothed regime probabilities ξ̂_{T−j|T} are computed with the BLHK filter and smoother.

III. Maximization Step

The VAR parameter vector θ solves Σ_{t=1}^{T} ξ̂'_{t|T} [∂ ln η_t/∂θ'] = 0.

3. Initial State: ξ_0

ξ̂_0 = ξ̂_{0|T}.
Equipped with the new parameter vector λ, the filtered and smoothed probabilities are
updated, and so on. Thus, each EM iteration involves a pass through the BLHK filter
and smoother, followed by an update of the first-order conditions and the parameter
estimates, and is guaranteed to increase the value of the likelihood function.
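The iteration structure can be sketched end to end for a hypothetical univariate MSI(M)-VAR(0) model y_t = μ_{s_t} + u_t, u_t ~ N(0, σ_{s_t}²); the E-step runs the BLHK filter and smoother, the M-step replaces regime indicators by smoothed probabilities in the usual ML formulas:

```python
import numpy as np

def em_step(y, mu, sigma, F, xi_init):
    """One EM iteration (sketch). Returns updated parameters and the
    log-likelihood of the *input* parameters. F[i, j] = Pr(i|j)."""
    T, M = len(y), len(mu)
    filt = np.empty((T, M))
    xi, loglik = np.asarray(xi_init, float), 0.0
    for t in range(T):                                   # filter (5.4)-(5.6)
        eta = np.exp(-0.5 * ((y[t] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        joint = eta * xi
        loglik += np.log(joint.sum())
        filt[t] = joint / joint.sum()
        xi = F @ filt[t]
    smooth = np.empty_like(filt)                         # smoother (5.13)
    smooth[-1] = filt[-1]
    counts = np.zeros((M, M))
    for t in range(T - 2, -1, -1):
        pred = F @ filt[t]
        smooth[t] = (F.T @ (smooth[t + 1] / pred)) * filt[t]
        counts += F * np.outer(smooth[t + 1] / pred, filt[t])  # joint probs
    denom = smooth.sum(axis=0)
    mu_new = smooth.T @ y / denom                        # weighted means
    sigma_new = np.sqrt(np.array(
        [np.sum(smooth[:, m] * (y - mu_new[m]) ** 2) for m in range(M)]) / denom)
    F_new = counts / counts.sum(axis=0, keepdims=True)   # columns: Pr(.|j)
    return mu_new, sigma_new, F_new, loglik

y = np.array([-1.2, -0.8, -1.0, 1.1, 0.9, 1.0])          # hypothetical data
xi0 = np.array([0.5, 0.5])
mu1, sg1, F1, ll0 = em_step(y, np.array([-0.5, 0.5]),
                            np.array([1.0, 1.0]),
                            np.full((2, 2), 0.5), xi0)
mu2, sg2, F2, ll1 = em_step(y, mu1, sg1, F1, xi0)
```

Each call increases (never decreases) the likelihood, which is exactly the monotonicity property of the EM iteration described in the text.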
General results available for the EM algorithm indicate that the likelihood function
increases in the number of iterations j. Finally, a fixed point of this iteration schedule, λ^{(j)} = λ^{(j−1)}, coincides with a maximum of the likelihood function. The general statistical properties of the EM algorithm are discussed more comprehensively
in RUUD [1991].
It can easily be shown that the FOCs of the model, where the smoothed regime
probabilities Pr(ξ|Y, λ) are not determined simultaneously with the parameter vector λ but calculated with a second, predetermined parameter vector λ^{(j−1)}, are equivalent to maximization of the following objective function, as pointed out by HAMILTON
[1990]:
(6.18)
Equation (6.18) denotes the expected log-likelihood for λ^{(j)} given a distribution
parameterized by λ^{(j−1)}. After some algebraic manipulations,
+ Σ_{t=1}^{T} Σ_{ξ_t} Σ_{ξ_{t−1}} ln Pr(ξ_t|ξ_{t−1}, p) Pr(ξ_t, ξ_{t−1}|Y_T, λ^{(j−1)}) }.   (6.20)
Thus, the j-th maximization step of the EM algorithm maximizes the objective function (6.18). Let λ̃ denote the maximizer of the expected log-likelihood
ℓ(λ|Y_T, λ^{(j−1)}) conditional on λ^{(j−1)}. Then λ̃ is the ML estimator of λ when the
algorithm has converged, i.e.
In the following, we will drop the λ^{(j−1)} indicating the parameter vector used for
the reconstruction of the Markov chain, ℓ(λ|Y_T) ≡ ℓ(λ|Y_T, λ^{(j−1)}), for notational
simplicity.
Since we are here interested only in the estimation of the VAR parameter vector θ, we can concentrate our analysis on the first part of (6.19). By using the normality of the conditional densities,

ℓ(θ|Y_T) ∝ const − (1/2) Σ_{t=1}^T Σ_{m=1}^M ξ̂_{mt|T} { K ln(2π) + ln|Σ_m| + u_mt(γ)' Σ_m^{−1} u_mt(γ) }.
For the sake of simplicity, we will consider here only MS-VAR models which are linear in the vector γ of structural parameters,

y_t = Σ_{m=1}^M ξ_mt X_mt γ + u_t,

such that the residuals at time t associated with regime m are given by

u_mt = y_t − X_mt γ.

As seen in Table 2.3, this assumption is guaranteed e.g. for MSI-VAR models, where γ = (ν', α')' and

X_mt = [(ι'_m, y'_{t−1}, …, y'_{t−p}) ⊗ I_K].
Hence, the ML estimation of these models, to be presented in Section 9.3.4, is a straightforward application of the procedures which we are going to discuss here. In Section 9.3.5 we shall also discuss how the procedures developed in this section can be applied to MSM-VAR models, which are non-linear in the structural parameters α and μ. The linearity of the score in α conditional on μ, and vice versa, will provide the key.
Next, we show that the linearity of the model in the parameter vector γ results in a generalized linear regression model with many observations per cell, where the fractional number of pseudo-observations (y_t, X_mt, ξ_t = ι_m) is given by the smoothed regime probabilities ξ̂_{mt|T}.
For calculating the derivatives of the expected likelihood function, the following matrix notation will turn out to be useful:

ℓ(θ|Y_T) ∝ const − (1/2) Σ_{m=1}^M { T̂_m ln|Σ_m| + u_m(γ)' (Ξ̂_m ⊗ Σ_m^{−1}) u_m(γ) }

   ∝ const − (1/2) Σ_{m=1}^M T̂_m ln|Σ_m| − (1/2) u(γ)' W^{−1} u(γ),   (6.22)
where

W^{−1} = diag(Ξ̂_1 ⊗ Σ_1^{−1}, …, Ξ̂_M ⊗ Σ_M^{−1})   (MTK×MTK),
Ξ̂_m = diag(ξ̂_{m1|T}, …, ξ̂_{mT|T})   (T×T),
u(γ) = (u_1(γ)', …, u_M(γ)')'   (MTK×1),   u_m(γ) = y − X_m γ   (TK×1),
X = (X'_1, …, X'_M)'   (MTK×R),   X_m = (X'_{m1}, …, X'_{mT})'   (TK×R).
The ML estimates of the structural parameters γ are given by the well-known GLS estimator, since obviously

∂ℓ(θ|Y_T)/∂γ = X' W^{−1} u(γ) = 0,   (6.23)

γ̂ = (X' W^{−1} X)^{−1} X' W^{−1} y*,   (6.24)

where y* = (y', …, y')' stacks the observations M times.
Thus, the regressions necessary at each maximization step are GLS estimations where the pseudo-observations (y_t, X_mt, ξ_t = ι_m), m = 1, …, M, are weighted with their smoothed probabilities ξ̂_{t|T}(λ^{(j−1)}).
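For the homoskedastic-within-regime case (K = 1), the weighted regressions can be written as one GLS step over the M·T pseudo-observations. The sketch below uses our own notation; `gls_step` and its argument layout are assumptions, not the book's code.

```python
import numpy as np

def gls_step(y, X_regimes, xi_s, sigma2):
    """gamma_hat = (X' W^{-1} X)^{-1} X' W^{-1} y over M*T pseudo-observations.

    y         : (T,)  observations
    X_regimes : list of M arrays (T, R), regressors under each regime
    xi_s      : (T, M) smoothed regime probabilities
    sigma2    : (M,)  regime variances (scalar Sigma_m for K = 1)
    """
    R = X_regimes[0].shape[1]
    A, b = np.zeros((R, R)), np.zeros(R)
    for m, Xm in enumerate(X_regimes):
        w = xi_s[:, m] / sigma2[m]        # diagonal of Xi_m (x) Sigma_m^{-1}
        A += Xm.T @ (w[:, None] * Xm)
        b += Xm.T @ (w * y)
    return np.linalg.solve(A, b)
```

For an MSI(2)-AR(0) model with X_1t = (1, 0) and X_2t = (0, 1), γ̂ reduces to the two probability-weighted regime means, which provides a quick check of the implementation.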
In the case of a regime-invariant covariance matrix Σ_m = Σ, one obtains an expression for the log-likelihood function which will be useful in order to determine the ML estimator of Σ:

ℓ(λ|Y_T) ∝ const − (KT/2) ln(2π) − (T/2) ln|Σ| − (1/2) u*(γ)' W*^{−1} u*(γ).
The partial derivatives of the expected log-likelihood with respect to the elements of Σ are

∂ℓ(λ|Y_T)/∂Σ = −(T/2) Σ^{−1} + (1/2) Σ^{−1} U*(γ)' U*(γ) Σ^{−1},

so that the corresponding FOC yields the ML estimator Σ̂ = T^{−1} U*(γ̂)' U*(γ̂).
In order to determine the ML estimates of Σ_1, …, Σ_M, the system of first-order partial derivatives is needed. By means of standard matrix differential calculus (cf. e.g. MAGNUS & NEUDECKER [1994]) we get

Σ̂_m = T̂_m^{−1} Σ_{t=1}^T ξ̂_{mt|T} u_mt(γ̂) u_mt(γ̂)'.   (6.29)
Again, it is easily verified that the maximization of ℓ(λ|Y_T) yields the modified FOC

Σ_{t=1}^T Σ_{m=1}^M ξ̂_{mt|T} ∂ln η_mt/∂σ_m = 0,   (6.30)

which, using

∂ln η_mt/∂σ_m = −(1/2) D'_K vec( Σ_m^{−1} − Σ_m^{−1} u_mt(γ) u_mt(γ)' Σ_m^{−1} ),   (6.31)

∂ln η_it/∂σ_m = 0 for i ≠ m,

results in

Σ_{t=1}^T ξ̂_{mt|T} ∂ln η_mt/∂σ_m = 0.
The interdependence of the estimates for γ and σ theoretically requires iterating between the equations (6.24) and (6.27)/(6.29) within each maximization step. However, as in the Generalized EM algorithm (cf. DEMPSTER et al. [1977] and RUUD [1991]), it can be sufficient to perform only a single (estimated) generalized least squares (GLS) estimation within each maximization step to ensure convergence to a stationary point of the log-likelihood. In order to balance convergence requirements and computation time, the convergence criterion of the internal iteration within each maximization step may be formulated less restrictively than the criterion of the EM algorithm.
The iteration steps of the EM algorithm are repeated until convergence is ensured. For this purpose, different convergence criteria can be used. The first criterion is the absolute change of the log-likelihood, Δ_1 := |ℓ(λ^{(j+1)}|Y_T) − ℓ(λ^{(j)}|Y_T)|.
In addition, the parameter variation might be taken into account with a given norm ‖·‖, such that Δ_2 := ‖λ^{(j+1)} − λ^{(j)}‖. If a maximum norm of the absolute change of the parameter values is used, we have Δ_2a := max_i |λ_i^{(j+1)} − λ_i^{(j)}|. Alternatively, the (root of the) mean of squared relative parameter changes might be considered:

Δ_2b := (1/R) Σ_{i=1}^R ( (λ_i^{(j+1)} − λ_i^{(j)}) / λ_i^{(j)} )²,

where R is the number of non-zero parameters λ_i^{(j)} ≠ 0. The recursion stops if convergence is achieved, i.e. if the changes of the log-likelihood and the parameters are negligibly small: Δ_i ≤ ε_i for all i = 1, 2. Note that in Δ_2a and Δ_2b, the discrete parameter vector ξ₀ is not included in λ. Finally, the EM algorithm is terminated if the number of iterations j exceeds a previously specified upper bound, j > ε₄.
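The stopping rules can be collected in a small helper; the function name, argument layout, and default tolerances below are our own illustrative choices:

```python
import numpy as np

def em_converged(ll_new, ll_old, lam_new, lam_old, j,
                 eps1=1e-6, eps2=1e-4, max_iter=500):
    """Check the convergence criteria of the EM algorithm (sketch)."""
    d1 = abs(ll_new - ll_old)                       # Delta_1: log-likelihood change
    d2a = np.max(np.abs(lam_new - lam_old))         # Delta_2a: maximum norm
    nz = lam_old != 0                               # only non-zero parameters
    d2b = np.mean(((lam_new[nz] - lam_old[nz]) / lam_old[nz]) ** 2)  # Delta_2b
    return (d1 <= eps1 and d2a <= eps2 and d2b <= eps2) or j > max_iter
```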
The EM algorithm has many attractive features; foremost among these are its computational simplicity and its convergence properties. In our experience, the method finds estimates in the region of the maximum reasonably quickly from arbitrary initial values. Among the undesirable features is the drawback that it does not produce the information matrix automatically. However, the EM algorithm may be complemented by the procedures proposed in Section 6.6.2 for the estimation of the asymptotic variance matrix.
Although the EM algorithm has good convergence properties even when starting far away from the maximum of the log-likelihood function, close to the maximum it converges rather slowly. An algorithm which has attractive convergence characteristics close to the maximum is the scoring algorithm, which will be discussed in the next section.
As we have seen, the maximization of the log-likelihood function is, due to the non-linearity of the first order conditions for λ̂, a highly non-linear optimization problem. Its solution requires numerical techniques that maximize ln L(λ|Y_T) iteratively. A popular class of numerical optimization methods uses gradient algorithms (cf. LÜTKEPOHL [1987]). The general form of the j-th iteration step is

λ^{(j+1)} = λ^{(j)} + h_j H_j s_T(λ^{(j)}),   (6.32)

where h_j is the step length³ in the j-th iteration, H_j is a positive definite direction matrix, and s_T(λ^{(j)}) is the score, defined as the gradient of ln L(λ|Y_T) at λ^{(j)}.
The various gradient algorithms differ in the choice of the direction matrix H_j (cf. e.g. JUDGE et al. [1985, sec. B.2]). The scoring algorithm uses the inverse of the information matrix,

H_j = [I(λ^{(j)})]^{−1}.

Thus the method of scoring requires the score vector and the information matrix. For parsimoniously specified models it might be possible to derive the expressions for the
³ There are numerous ways to choose the step length h_j. For the sake of simplicity, it can be set to one, h_j = 1. A more efficient method is a grid search in a set of increasing positive values of h_j in order to maximize ln L_{j+1}(h_j) = ln L(λ^{(j+1)}(h_j)), where the search stops after the first decline in the likelihood. Then, the optimal step length is chosen either as the preceding value for h_j or via quadratic interpolation as the maximizer of a quadratic polynomial in ln L_{j+1}(h_j) over the last three points.
score and the information matrix analytically. In practice they are usually derived numerically, where the information matrix is approximated as Ĩ(λ^{(j)}) by dropping the expectation operator and by substituting the true parameter vector λ with λ^{(j)}. Alternatively, an estimate of the information matrix can be derived via BERNDT et al. [1974]. This algorithm will be discussed in Section 6.6.2 in more detail. The score itself may be approximated by central differences,

∂ln L(λ|Y_T)/∂λ_i |_{λ=λ^{(j)}} ≈ [ ln L(λ⁺|Y_T) − ln L(λ⁻|Y_T) ] / (2 c_i^{(j)}),

where λ⁺ := λ^{(j)} + c_i^{(j)} ι_i, λ⁻ := λ^{(j)} − c_i^{(j)} ι_i, c_i is a small positive number, and ι_i is the i-th column of the identity matrix. The resulting approximated information matrix is assumed to be positive definite. If this assumption is violated, a sufficiently large positive number c might be added to the elements of the main diagonal of Ĩ(λ^{(j)}), H_j := [Ĩ(λ^{(j)}) + cI]^{−1}.
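A minimal numerical version of these two ingredients might look as follows; `loglik` stands for any user-supplied function returning ln L(λ|Y_T), and the `ridge` argument implements the diagonal correction c described above (all names are ours):

```python
import numpy as np

def num_score(loglik, lam, c=1e-5):
    """Central-difference approximation of the score at lam."""
    s = np.zeros_like(lam)
    for i in range(len(lam)):
        e = np.zeros_like(lam); e[i] = c
        s[i] = (loglik(lam + e) - loglik(lam - e)) / (2 * c)
    return s

def scoring_step(loglik, lam, info, h=1.0, ridge=0.0):
    """One scoring iteration; `ridge` adds c to the diagonal if info is not p.d."""
    H = np.linalg.inv(info + ridge * np.eye(len(lam)))
    return lam + h * H @ num_score(loglik, lam)
```

For a quadratic log-likelihood with unit information matrix, a single step with h_j = 1 lands exactly on the maximizer.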
Having evaluated the score vector and the information matrix, the j-th iteration step changes to

λ^{(j+1)} = λ^{(j)} + h_j [Ĩ(λ^{(j)})]^{−1} s_T(λ^{(j)}).

The method of scoring might be modified concerning the treatment of the initial state parameters ξ₀. In each iteration step only the unknown elements of λ† = (θ', ρ')' are estimated via scoring for given ξ₀. Then ξ₀ is replaced by the smoothed probability vector ξ̂_{0|T}. Thus, the recursion formulae are given by

λ†^{(j+1)} = λ†^{(j)} + h_j [Ĩ(λ†^{(j)}; ξ₀^{(j)})]^{−1} s_T(λ†^{(j)}; ξ₀^{(j)}),   ξ₀^{(j+1)} = ξ̂_{0|T}(λ^{(j)}).

Finally, in order to check convergence, the criteria introduced in Section 6.4 can be used.
More general problems in the context of normal state-space models have been discussed in WATSON & ENGLE [1983]. In particular, it has been noted, inter alia by LÜTKEPOHL [1991, p. 437], that even though scoring mostly has good convergence properties near the maximum, far from the maximum it may perform poorly. As proposed by WATSON & ENGLE [1983], the most practical method seems to be a mix of EM and scoring algorithms. While the EM algorithm can be used to move the parameters quickly to the neighborhood of the maximum, scoring can be used to pinpoint the maximum and to estimate the information matrix.
In contrast to the algorithms presented so far, where each iteration was based on the full-sample information, the t-th iteration of the recursive maximum likelihood estimator uses only the first t observations (after an initialization period). Thus, the recursive ML estimator of the MS-VAR model is given by

λ^{(t+1)} = λ^{(t)} + (t+1)^{−1} H_t s_{t+1}(λ^{(t)}),   (6.34)

where s_{t+1}(λ^{(t)}) is a score vector and H_t is the adaptive matrix. The optimal choice of the adaptive matrix is the inverse of the information matrix. However, for computational reasons, the inverse of the observed information matrix is used:

H_t^{−1} = Σ_{τ=1}^t s_τ(λ^{(τ−1)}) s_τ(λ^{(τ−1)})'.   (6.35)
The crucial point of this procedure is to keep the adaptive matrix well behaved, i.e. in particular positive definite. After an initial phase in which the adaptive matrix is stabilized, each observation y_t is processed separately.
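The recursion is easiest to see in the scalar case of estimating the mean of a unit-variance Gaussian, where the conditional score is h_t(λ) = y_t − λ. The sketch below is ours; in particular, the rule of initializing λ with the mean of the burn-in observations is an assumption, not the book's procedure.

```python
import numpy as np

def recursive_ml(y, burn=20):
    """Recursive ML for the mean of a unit-variance normal (scalar sketch)."""
    lam = float(np.mean(y[:burn]))   # initialization period stabilizes H_t
    B = float(burn)                  # accumulated squared scores (observed information)
    for t, yt in enumerate(y[burn:], start=burn + 1):
        h = yt - lam                 # conditional score h_t(lam^{(t-1)})
        B += h * h
        lam += (1.0 / (t + 1)) * (t / B) * h   # step (6.34) with H_t = (B/t)^{-1}
    return lam
```

Each observation updates the estimate once, which is what makes the procedure an on-line technique.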
There is a superficial similarity to the iterative step (6.32) of the scoring algorithm with h_t = (t+1)^{−1}. However, there are also two important differences: first, at the t-th iteration only the first t observations are used, which turns the algorithm into an on-line estimation technique. Secondly, the score function is not derived numerically, but involves an EM recursion, where the filtered regime probabilities ξ̂_{t|t} are involved. Again, these are provided by the BLHK filter. Note that equations (6.35) and (6.37) use that s_{t−1}(λ^{(t−1)}) = ∂ln L(λ|Y_{t−1})/∂λ |_{λ=λ^{(t−1)}} = 0. Thus, the conditional score h_t(λ^{(t−1)}) = ∂ln p(y_t|Y_{t−1}; λ)/∂λ |_{λ=λ^{(t−1)}} coincides with the score s_t(λ^{(t−1)}).
A major drawback of this approach is its sensitivity to the adaptive matrix, whose calculation becomes lengthy if large VAR processes with many parameters have to be estimated. Note that the simulation results presented in HOLST et al. [1994] come from a model which is restricted to only 8 parameters. Furthermore, the algorithm provides only filtered regime probabilities. While this problem can be overcome by a final full-sample smoothing step, the recursive EM algorithm will provide no full-information maximum likelihood parameter estimates.
For large samples, however, a combination of the recursive EM algorithm and the "full-information" EM algorithm with the scoring algorithm might be favorable: perform some iterations of the recursive EM algorithm to derive initial estimates for the full-information EM algorithm, which is then used to come close to the maximum. The EM algorithm will itself provide starting values for the scoring algorithm. In the final step, the scoring algorithm is used to achieve the maximum of the log-likelihood and to derive an estimate of the information matrix.
In the presence of information about the parameters beyond that contained in the sample, Bayesian estimation provides a convenient framework for incorporating such prior information.⁴ With this, any information the analyst has about the parameter vector λ is represented by a prior density p(λ). Probability statements concerning λ after the data Y_T have been observed are based on the posterior density p(λ|Y_T), which is given via Bayes' theorem by

p(λ|Y_T) = p(Y_T|λ) p(λ) / p(Y_T),

where the density of Y_T conditional on the value of the random variable λ, p(Y_T|λ), is algebraically identical to the likelihood function L(λ|Y_T), and p(Y_T) denotes the unconditional sample density, which is just a normalization constant. Hence, all information available on λ is contained in

p(λ|Y_T) ∝ L(λ|Y_T) p(λ).   (6.38)
Note that for flat priors, i.e. p(λ) = const., the posterior density is proportional to the likelihood function,

p(λ|Y_T) ∝ L(λ|Y_T).

Thus, without reliable prior information, the mode of the posterior distribution is given by the ML estimator λ̂. Analogously, equation (6.38) can usually be interpreted as a penalized likelihood function. However, it might be worth noting that the Bayesian approach does not derive the distribution of the estimator λ̂ but makes an inference about λ, which is itself regarded as a random variable in Bayesian statistics. Thus, p(λ|Y_T) denotes the posterior distribution of the unknown parameter λ and not the distribution of the ML estimator λ̂.
For mixtures of normal distributions HAMILTON [1991a] has proposed a quasi-Bayesian estimation implemented as a modification of the EM algorithm. The benefit of a quasi-Bayesian analysis might be the capability of offering a solution for some singularity problems associated with ML estimation and for choosing between local
⁴ The reader is referred to LÜTKEPOHL [1991, sec. 5.4] or HAMILTON [1994b, ch. 12] for an introduction to the basic principles underlying Bayesian analysis with applications to time-invariant VAR models.
maxima of the likelihood function. While there is no natural conjugate prior for the MS-VAR model (cf. HAMILTON [1993]), it is convenient to treat Normal-Gamma priors. For the MSI(M)-VAR(0) model it is shown by HAMILTON [1991a] that these priors can be easily incorporated in the EM algorithm by representing prior information as equivalent to observed data. In an MSI(M)-VAR(0) model, for example, the mode of the posterior density of ν_m would be given by
The underlying VAR model can be extended to a general state-space model (cf. LÜTKEPOHL [1991, ch. 13]), where the parameters in the measurement and the transition equation can depend on the regime governed by a Markov chain as in the MS-VAR model (equations (6.39) and (6.40)). For this more general class of models, KIM [1994] has proposed an estimation technique that combines a BLHK-like filter with an approximating Kalman filter and smoother. The first "reconstructs" the regimes as in the MS-VAR context, while the Kalman filter and smoother "reconstruct" the states z_t as in a time-invariant linear normal state-space model. In order to make the estimation of the model tractable, approximations to optimal filtering are involved, as in HARRISON & STEVENS [1976]. KIM [1994] suggests maximizing the likelihood function by using a non-linear optimization technique which, however, is not discussed further. Nevertheless, KIM's model generalizes the switching approach of SHUMWAY & STOFFER [1991], where the regime-governing random process is assumed to be serially independent and the switching is restricted to the measurement equation.
While the procedure proposed by KIM [1994] seems to work in practice, theoretical results concerning the effects of the various approximations are missing. Recently, BILLIO & MONFORT [1995] have proposed a partial Kalman filter and smoother in combination with importance sampling techniques to compute the likelihood function of switching state-space models like (6.39)/(6.40). Simulated maximum likelihood methods have also been suggested by LEE [1995] for MS-AR models with latent variables.
Furthermore, the redefinition of the regime vector ξ_t^{(r+1)} = ξ_t ⊗ ξ_{t−1} ⊗ … ⊗ ξ_{t−r} as in the MSM(M)-VAR(p) model is intractable, since the number r of relevant regimes in p(y_t|ξ_t, Y_{t−1}) grows with t, i.e. r → ∞.
As already mentioned, MS-VAR models possess a linear Gaussian state-space representation with Markov-switching regimes as in (6.39) and (6.40). But since they are quite simplistic, the advantage of a partial Kalman filter estimation is rather limited compared with the additional effort involved.
The asymptotic theory of ML estimation in linear time series models is very well developed, but fragmentary for non-linear models. For the MS-VAR model, it is usually assumed that the standard asymptotic distribution theory holds. Unfortunately, as far as we know, there exist no general theoretical results concerning the asymptotic properties of the maximum likelihood estimation. As HAMILTON [1993, p. 249] points out, "All of the asymptotic tests [...] assume regularity conditions are satisfied, which to our knowledge have not yet been formally verified for this class of models."
However, there are results in the literature which justify this assumption. For the mixture-of-normals model with its i.i.d. regimes, the consistency and asymptotic distribution of the maximum likelihood estimator have been shown by LINDGREN [1978] and KIEFER [1978], [1980]. In LEROUX [1992], the consistency of maximum likelihood estimators is proven for general hidden Markov-chain models, i.e. for MSI(M)-VAR(0) processes. For stable MS(M)-AR(p) processes, it has been proven
by KARLSEN [1990a] that y_t is a strong mixing process with a geometric mixing rate. For a hidden Markov chain, the stationarity of y_t is implied by the stationarity of the homogeneous Markov chain ξ_t. Moreover, following BILLINGSLEY [1968], as ξ_t is φ-mixing, y_t is φ-mixing as well. When the data are φ-mixing and stationary, the asymptotic distribution can be based on the functional central limit theorem given in BILLINGSLEY [1968]. Following HOLST et al. [1994, p. 498], this might open up a possibility of proving consistency as well as asymptotic normality. In addition, for univariate Markov-switching regression models with endogenous state selection⁵ (but again without lagged dependent variables), the consistency of maximum likelihood estimators has been proved by RIDDER [1994]. It remains to show, however, that these results can be transferred to the MS-VAR model in general.
Thus, it can be conjectured that the maximum likelihood estimator is consistent and asymptotically normal under suitable conditions. Typical conditions require identifiability, roots of F(L) and A(L) to be outside the unit circle, and that the true parameter vector does not fall on the boundary of the allowable parameter space (cf. HAMILTON [1994a]). Therefore it should be mentioned that on the boundary, i.e. if p_ij = 0 for at least one pair (i, j), the asymptotic distribution will certainly be incorrect. The intuition behind this condition is that the convergence of the ML estimates p̂_ij of the transition parameters of the Markov chain depends on the number of transitions n_ij ≈ p_ij ξ̄_i T. Thus, p̂_ij will converge very slowly to the true value if the transition probability p_ij or the ergodic probability of regime i, ξ̄_i, is near zero. Furthermore, p_ij = 0 or p_ij = 1 would imply under normality that the confidence interval is not restricted to the [0, 1] range. This problem can be avoided by using logits of the p_ij as parameters. Then boundary solutions cannot be achieved.
⁵ The transition probabilities p_ij are time-varying due to their dependence on y_t.
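A standard way to implement this reparametrization is a multinomial-logit (softmax) map from unconstrained parameters to the rows of the transition matrix; the sketch below, including the identification convention of pinning one logit per row to zero, is ours:

```python
import numpy as np

def logits_to_transition(theta):
    """Map unconstrained logits theta (M, M-1) to a row-stochastic P with 0 < p_ij < 1."""
    z = np.hstack([theta, np.zeros((theta.shape[0], 1))])  # pin last column to 0
    ez = np.exp(z - z.max(axis=1, keepdims=True))          # numerically stable softmax
    return ez / ez.sum(axis=1, keepdims=True)
```

The EM or scoring iterations then operate on θ, so interior estimates 0 < p̂_ij < 1 are guaranteed and Gaussian confidence intervals for θ map back into the unit interval.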
I_a = lim_{T→∞} T^{−1} I,

and the information matrix I is defined as minus the expectation of the matrix of second partial derivatives of the log-likelihood function evaluated at the true parameter vector. Hence, the asymptotic information matrix is given by

I_a = −lim_{T→∞} T^{−1} E[ ∂² ln L(λ|Y_T) / ∂λ ∂λ' ].   (6.41)

Since the maximum of the likelihood function lies on the boundary of the parameter space concerning the parameter vector ξ₀, these parameters must be excluded from λ when the variance-covariance matrix is calculated.
For the MS-VAR model, it is in general impracticable to evaluate (6.41) analytically. As suggested by HAMILTON [1993], an estimate of the information matrix can be achieved by using the conditional scores h_t(λ) as proposed by BERNDT et al. [1974]:

Î(λ) = Σ_{t=1}^T h_t(λ) h_t(λ)'.   (6.42)

The conditional score of the t-th observation, h_t(λ), is defined as the first partial derivative of ln p(y_t|Y_{t−1}; λ):

h_t(λ) = ∂ln p(y_t|Y_{t−1}; λ)/∂λ.   (6.43)
Obviously, (6.43) is closely related to the score s_t(λ) as the first partial derivative of the log-likelihood function ln p(Y_t|Y_0; λ):

s_t(λ) = ∂ln p(Y_t|Y_0; λ)/∂λ = Σ_{τ=1}^t ∂ln p(y_τ|Y_{τ−1}; λ)/∂λ = Σ_{τ=1}^t h_τ(λ).   (6.44)
Since λ̂ is the maximizer of the likelihood function, the score s(λ̂) ≡ s_T(λ̂), as the gradient of the full-sample log-likelihood function ln p(Y_T|Y_0; λ) at λ̂, must be equal to zero.
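Given any routine returning the vector of conditional log densities ln p(y_t|Y_{t−1}; λ), the estimate (6.42) can be formed from numerically differentiated conditional scores. This is a generic sketch; `cond_loglik` and the step size c are our assumptions:

```python
import numpy as np

def bhhh_information(cond_loglik, lam, c=1e-5):
    """I_hat = sum_t h_t h_t', with h_t from central differences of ln p(y_t | Y_{t-1})."""
    n = len(lam)
    base = cond_loglik(lam)             # (T,) conditional log densities
    H = np.zeros((len(base), n))        # row t holds h_t(lam)'
    for i in range(n):
        e = np.zeros(n); e[i] = c
        H[:, i] = (cond_loglik(lam + e) - cond_loglik(lam - e)) / (2 * c)
    return H.T @ H
```

For an i.i.d. Gaussian mean model the conditional score is h_t = y_t − μ, so the estimate collapses to Σ_t (y_t − μ)², which gives a quick correctness check.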
The scores s_t(λ̂) are calculated according to the normal equations of the ML estimator,

s_t(λ̂) = Σ_{τ=1}^t Ψ_τ(λ̂)' ξ̂_{τ|t},

where

Ψ_τ(λ̂) = ∂ diag(η_τ) F_τ / ∂λ' |_{λ=λ̂}.

The smoothed probabilities ξ̂_{τ|t} can be derived analogously to KIM's smoothing algorithm (5.13),

ξ̂_{τ|t} = [ F' (ξ̂_{τ+1|t} ⊘ ξ̂_{τ+1|τ}) ] ⊙ ξ̂_{τ|τ},   (6.45)

with the filtered probabilities ξ̂_{τ|τ} as the starting values, where ⊘ and ⊙ denote element-wise division and multiplication.
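The recursion (6.45) amounts to a short backward loop over the filtered probabilities; the sketch below assumes F column-stochastic, so that ξ̂_{τ+1|τ} = F ξ̂_{τ|τ} (function name ours):

```python
import numpy as np

def smooth_upto(xi_filt, F):
    """xi_{tau|t} = [F'(xi_{tau+1|t} / xi_{tau+1|tau})] * xi_{tau|tau}, backward in tau."""
    t, M = xi_filt.shape
    xi = np.zeros_like(xi_filt)
    xi[-1] = xi_filt[-1]                       # start from the filtered value at tau = t
    for tau in range(t - 2, -1, -1):
        pred = F @ xi_filt[tau]                # xi_{tau+1|tau}
        xi[tau] = (F.T @ (xi[tau + 1] / pred)) * xi_filt[tau]
    return xi
```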
6.7 Conclusion
In this chapter we have discussed the classical method of maximum likelihood estimation for MS(M)-VAR(p) models. While parameter estimation with the proposed EM algorithm is quite standard, some tests of interest will not have standard asymptotics. The problems and methods of statistical tests in MS-VAR models will be investigated in the next chapter. It will be shown that this problem concerns only hypotheses where the number of identifiable regimes is altered under the null. Before we come back to the EM algorithm in Chapter 9, where the regression step is finalized for all MS(M)-VAR(p) specifications under consideration, we will introduce in Chapter 8 a new Gibbs sampler for MS-VAR models which combines Bayesian statistical theory and Markov-chain Monte Carlo simulation techniques.
Chapter 7
Model Selection and Model Checking
The last two chapters have demonstrated that the estimation methods and filtering techniques are now well established for MS-VAR processes. Most unresolved questions arising in empirical investigations with MS-VAR models concern the issue of model specification. In Section 6.6 we discussed the asymptotic distribution of the maximum likelihood estimator of MS-VAR models. In the literature (cf. e.g. HAMILTON [1993]) it has been assumed that standard asymptotic theory holds. The asymptotic normal distribution of the ML estimator ensures that most model diagnostics and tests known from the time-invariant VAR(p) model (cf. the discussion in LÜTKEPOHL [1991, ch. 4]) can be applied generally with only some slight modifications.
Strategies for selecting simultaneously the number of regimes and the order of the autoregression in Markov-switching time series models, based on ARMA representations, as well as usual specification testing procedures, are introduced in Section 7.1. These considerations are summarized in a bottom-up specification strategy. The strategy is built upon a preliminary model selection, to be presented in Section 7.2, which is based on the ARMA representation introduced in Chapter 3.
tion of standard asymptotic theory. This problem, as well as procedures for the derivation of the asymptotic null distribution of the likelihood ratio statistic, are discussed in Section 7.5.
Why not use a top-down strategy? Starting with more elaborate models has the advantage that e.g. an MSMAH/MSIAH-VAR model can be easily estimated (as we will show in Chapter 9). However, since we have to use numerical and iterative techniques, this advantage is compromised by the potential danger of getting local maxima. This is due to the theoretical properties of these models discussed already in Section 1.4: an MSMAH or MSIAH model can exhibit very different and extraordinary statistical features which are hard to check theoretically. It therefore becomes very important to perform estimations for alternative initial values. Furthermore, from the results in Chapter 4, we know that forecasting becomes much harder if time-varying autoregressive parameters are allowed. This view is stressed by the fact that the analyst should be forced to have some priors concerning the regime switching in order to ensure that the model is identified.
H0: α_p = 0 vs. H1: α_p ≠ 0

H0: μ_m = μ_i for all i, m = 1, …, M, and σ_i ≠ σ_m for all i ≠ m
vs. H1: μ_m ≠ μ_i for at least one i ≠ m, and σ_i ≠ σ_m for all i ≠ m
Suppose that economic theory or the data set under consideration indicates potential regime shifts¹. Then the analyst may start with some MSI/MSM(M)-VAR(p) models which are chosen by the ARMA-representation-based model selection procedure. Henceforth, an MSM model is only chosen if it is the most parsimonious feasible model. The choice of an MSI specification is mainly motivated by practical considerations. As we will see in Chapter 9, for an MSI model smoothing and filtering of regime probabilities and parameter estimation are much less computationally demanding (and therefore much faster) than the statistical analysis with an MSM model.² Hence, if there are no theoretical reasons which call for an MSM specification, an MSI specification is preferred.
In the next step, the pre-selected models are estimated with the methods developed in the last chapter. Since the estimation of MS-VAR models requires numerical maximization techniques with the danger of convergence to a local maximum of the likelihood function, estimations should be performed for several initial values. Finally, the statistically significant and economically meaningful models are tested successively against more general models. As Lagrange multiplier tests require estimating only the parsimonious model, i.e. the restricted model, they might be preferred to LR and Wald tests.
The proposed bottom-up specification strategy for single-equation Markov-switching models is shown in Table 7.1 in a systematic presentation. It is pointless to list all test hypotheses related to the specification of MS-VAR models. Numerous examples are considered in the empirical analysis of Chapter 11 and Chapter 12. In Section 7.4 the construction of statistical tests in MS-VAR models will be investigated.
While a time-invariant Gaussian VAR(p) model is nested as an MS(1)-VAR(p) model in MS(M)-VAR(p), LR tests for the null of only one regime, θ₁ = θ₂ = … = θ_M, are not easily available. Unfortunately, equivalence of the VAR parameters in all
¹ Linearity tests as proposed by GRANGER & TERÄSVIRTA [1993, ch. 6] may be applied. If the linear model is rejected by the linearity tests, this might be an indication for MS-VAR models. But they have power against several nonlinear models. To our knowledge there exists no particular test with an MS-VAR model as alternative without specifying and estimating the alternative. Unfortunately, there seems to exist no descriptive tool to detect Markovian shifts reliably. In particular, no graphical devices are available, cf. TJØSTHEIM [1990].
² Keeping our results in mind, it is surprising that, beginning with the seminal contributions of HAMILTON [1988], [1989], the MSM specification clearly dominates empirical research with MS-VAR models.
regimes implies that the Markov-chain parameters ρ are not identified, as already seen in Section 6.2. Thus, these nuisance parameters cause a bias of the LR test against the null; see Section 7.5. Therefore alternative approaches may be preferable. If regime-dependent heteroskedasticity is assumed, the number of regimes remains unaltered under the null, and standard results can be inferred for likelihood ratio tests of β₁ = … = β_M s.t. Σ_m ≠ Σ_j iff m ≠ j.
Procedures testing Markovian regime shifts in the conditional mean E[y_t|Y_{t−1}, ξ_{t−1}] can also be constructed as tests of restrictions on the transition matrix. For example, Wald tests on i.i.d. regimes are feasible. Suppose that the hypothesis F = ξ̄ 1' cannot be rejected. Then the conditional density p(y_t|Y_{t−1}) would be a mixture of normals, but past regime shifts have neither predictable mean nor variance effects. Similarly, a test for a reduced rank of F, rk(F), could be carried out.
ξ_t = ( 1(s_t^1 = s_t^2 = 1) )        F = ( p_11  p_21 )
      ( 1(s_t^1 = s_t^2 = 2) ),           ( p_12  p_22 ).
Consider now the effects of intertemporally perfectly correlated regime shifts. For example, suppose that the regime variable s_t^1 associated with the first equation leads the regime variable of the second equation: s_t^2 = s_{t−1}^1. Then the model is given by
ξ_t = ( 1(s_t^1 = 1, s_{t−1}^1 = 1) )
      ( 1(s_t^1 = 1, s_{t−1}^1 = 2) )
      ( 1(s_t^1 = 2, s_{t−1}^1 = 1) )
      ( 1(s_t^1 = 2, s_{t−1}^1 = 2) ),

H = ( ν_11  ν_11  ν_12  ν_12 )
    ( ν_21  ν_22  ν_21  ν_22 ),
F = ( p_11  p_11  0     0    )
    ( 0     0     p_21  p_21 )
    ( p_12  p_12  0     0    )
    ( 0     0     p_22  p_22 ).
As in the last example, independent regime shifts in both equations imply a restricted MS(4)-VAR(p) model:
ξ_t = ( 1(s_t^1 = 1, s_t^2 = 1) )
      ( 1(s_t^1 = 1, s_t^2 = 2) )
      ( 1(s_t^1 = 2, s_t^2 = 1) )
      ( 1(s_t^1 = 2, s_t^2 = 2) ),

H = ( ν_11  ν_11  ν_12  ν_12 )
    ( ν_21  ν_22  ν_21  ν_22 ),

F = F^1 ⊗ F^2 = ( p^1_11 p^2_11   p^1_11 p^2_21   p^1_21 p^2_11   p^1_21 p^2_21 )
                ( p^1_11 p^2_12   p^1_11 p^2_22   p^1_21 p^2_12   p^1_21 p^2_22 )
                ( p^1_12 p^2_11   p^1_12 p^2_21   p^1_22 p^2_11   p^1_22 p^2_21 )
                ( p^1_12 p^2_12   p^1_12 p^2_22   p^1_22 p^2_12   p^1_22 p^2_22 ).
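In other words, the transition matrix of the combined regime process is the Kronecker product of the two individual transition matrices, which is easy to verify numerically (the probability values below are arbitrary illustrations):

```python
import numpy as np

# Column-stochastic transition matrices of two independent 2-state chains,
# F_k[i, j] = Pr(s_{t+1}^k = i | s_t^k = j).
F1 = np.array([[0.9, 0.3],
               [0.1, 0.7]])
F2 = np.array([[0.8, 0.4],
               [0.2, 0.6]])

# Combined 4-state chain over (s^1, s^2), states ordered (1,1), (1,2), (2,1), (2,2).
F = np.kron(F1, F2)
```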
Test procedures for these and other restrictions associated with a specification analysis of estimated MS-VAR models will be discussed in Section 7.4. Notice that parsimony with regard to the number of regimes is extremely desirable, since the number of observations which are available for the estimation of the regime-dependent parameters and the transition probabilities shrinks dramatically when the number of regimes increases.
In this section we will discuss some problems related to the specification of MS-VAR models based on ARMA representations. In particular, we present a strategy for selecting simultaneously the state dimension M of the Markov chain and the order p of the autoregression, based on model selection procedures for the order of a univariate ARMA model (or a final equations form VARMA model).
This approach is based on the VARMA representation theorems for MSM(M)- and MSI(M)-VAR(p) processes, which have been derived in Chapter 3 (cf. Table 7.3). In conclusion, an ARMA structure in the autocovariance function may reveal the characteristics of a data generating MS-AR process. In the class of MSI-AR models there exists for any ARMA(p*, q*) representation with p* ≥ q* ≥ 1 a unique MSI(M)-AR(p) model with M = q* + 1 and p = p* − q*. This result is summarized in Table 7.4. Even if the regularity conditions do not hold, so that Table 7.3 provides only the maximal orders, the specifications given in Table 7.4 are the most parsimonious MSI-AR and MSM-AR models.
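The mapping of Table 7.4 is a two-line rule; a sketch (the function name and error handling are ours):

```python
def msi_from_arma(p_star, q_star):
    """Most parsimonious MSI(M)-AR(p) model for an ARMA(p*, q*) representation."""
    if not (p_star >= q_star >= 1):
        raise ValueError("requires p* >= q* >= 1")
    M = q_star + 1          # number of regimes
    p = p_star - q_star     # autoregressive order
    return M, p
```

An ARMA(3,1) autocovariance structure, for example, points to an MSI(2)-AR(2) model.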
Since our results are closely related to POSKITT & CHUNG [1994], it seems straightforward to adopt their statistical procedures for identifying the state dimension of the Markov chain. Based on linear least squares estimations, the identification process is consistent for hidden Markov-chain models. However, their approach for identifying the number of states takes explicit account of the special structure of hidden Markov chains. An adjustment of their procedures to the conditions of the models under consideration, as well as a general discussion of the statistical properties of the proposed procedures, will be left for further research.
The representation theorems reduce the problem of selecting the number of states and the order of the autoregression to the specification of ARMA models. Therefore, the determination of the number of regimes, as well as the number of autoregressive parameters, can be based on currently available procedures to estimate the order of ARMA models. In principle, any of the existing model selection criteria may be applied for identifying M and p. To restrict the computational burdens associated with the non-linearities of maximum likelihood estimation, model selection criteria may be preferred which are based on linear LS estimations (e.g. HANNAN & RISSANEN [1982] and POSKITT [1987]). Alternatively, for specifying univariate ARMA models,
7.3. Model Checking 131
In the case of vector-valued processes, identification techniques can be based on well-established
estimation procedures for the order of a final equations VARMA representation
(cf. LÜTKEPOHL [1991]). A problem that should be mentioned is that the
final equations VARMA models lead only to restrictions on M + p. This is clearly
a disadvantage, as is the possibly large number of parameters. We have therefore
restricted our attention to the specification of univariate Markov-switching models.
We continue the discussion with model-checking. For this task, some descriptive
model-checking tools are introduced in the following section.
As in the linear regression model, checking might be based on the presence of struc-
tures in the estimated errors. In the MS-VAR model three alternative definitions can
be distinguished:
e_{t|t−1} = y_t − E[y_t | Y_{t−1}; λ] = y_t − X_t B F ξ̂_{t−1|t−1},

where the one-step prediction error e_{t|t−1} is based on the predicted regime probabilities
ξ̂_{t|t−1} and the residual û_t on the smoothed regime probabilities ξ̂_{t|T}.
Thus, e_{t|t−1} is a vector martingale difference sequence with respect to the information
set Y_{t−1}.
If sample moments of the conditional residuals û_mt are computed, they have to be
weighted with their smoothed regime probabilities ξ̂_{mt|T}, as in the maximization step in
Chapter 6. For example, the sample variance of each series of conditional residuals
may be helpful for a test of the homoskedasticity assumption.
A test to determine whether the residuals e_{t|t−1} are white noise can be used, while
to test whether the regime-dependent residuals û_t are white noise, the residuals are
weighted with their regime probabilities.
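The weighting of the conditional residuals by their smoothed regime probabilities can be sketched as follows (variable and function names are ours):

```python
import numpy as np

def regime_weighted_variance(u_m, xi_smooth_m):
    """Sample variance of the conditional residuals of regime m, each
    observation weighted by its smoothed regime probability xi_{mt|T};
    useful for an informal check of the homoskedasticity assumption."""
    w = np.asarray(xi_smooth_m, dtype=float)
    u = np.asarray(u_m, dtype=float)
    return float(np.sum(w * u ** 2) / w.sum())
```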
Model checking techniques have to take into account the non-normality of the prediction
errors and conditional distributions. Hence, statistical devices employed should
not rely on a normality assumption concerning the prediction errors or conditional
densities of the endogenous variables.
Typical statistical tools for checking linear models are the residual autocorrelations
and the portmanteau statistic. Since we are not sure about the asymptotic distribution
of the residual autocorrelation, such an instrument can be used only as a descriptive
device. In the time-invariant VAR case, prediction tests for structural change are well-established
model checking devices. In MS-VAR models, however, the predicted
density of y_{t+h|t} is no longer normal and standard statistics cannot be used uncritically.
Given the asymptotic normality of the ML estimator, the model specification
can be tested analogously to time-varying models with deterministic model shifts like
periodic models (cf. inter alia LÜTKEPOHL [1991, ch. 12]).
In accordance with the linear regression model, the fit of the data might be measured
with the coefficient of determination

R² := 1 − s_e² / s_y²,

where the one-step prediction errors e_{t|t−1} are used to measure the fit of the data.
A correction for the bias towards preferring the larger model can be obtained by multiplying
R² with the ratio of the degrees of freedom and the number of observations,

R̄² = 1 − (T − 1) / (T − M(M − 1 + K) − Kp − K(K + 1)/2 − 1) · (1 − R²),

R̄² = 1 − (T − 1) / (T − M(M − 1 + K + K(K + 1)/2) − Kp − 1) · (1 − R²).   (7.1)
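The degrees-of-freedom correction in (7.1) is straightforward to compute; a small sketch using the two parameter counts given above (the function name is ours):

```python
def adjusted_r2(r2, T, M, K, p, regime_dep_cov=False):
    """Degrees-of-freedom corrected R^2 as in equation (7.1); the two
    parameter counts correspond to a common vs. a regime-dependent
    variance-covariance matrix."""
    if regime_dep_cov:
        n_par = M * (M - 1 + K + K * (K + 1) // 2) + K * p + 1
    else:
        n_par = M * (M - 1 + K) + K * p + K * (K + 1) // 2 + 1
    return 1.0 - (T - 1) / (T - n_par) * (1.0 - r2)
```

The correction always lowers R², penalizing the more richly parameterized specification.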
If the conditions under which the standard asymptotic distribution theory holds are
satisfied, the likelihood ratio, Lagrange multiplier and Wald tests of most hypotheses
of interest all have the usual null distributions. Unfortunately, for one important exception
standard asymptotic distribution theory cannot be invoked, namely, hypothesis
tests of the number of states of the Markov chain. Specification procedures
for those hypotheses altering the number of regimes under the null will be discussed
in Section 7.5. Before that, testing under standard asymptotics is considered in the
following Sections 7.4.1 to 7.4.4.
7.4. Specification Testing 135
More details can be found in LÜTKEPOHL [1991, sec. C.5]. A necessary condition
for the validity of these standard results is that the number of regimes M is unaltered
under the null. The critical situation, where the number of regimes changes, will be
discussed in Section 7.5.
As long as the number of regimes remains unchanged under the null, t-tests and F-tests
concerning linear restrictions of the VAR coefficient vector θ can be performed
as in linear models. Note, however, that the calculation of the variance-covariance
matrix obviously differs from that in the linear regression model.
Under the same conditions which ensure the applicability of standard asymptotics,
the LR test statistic has the same asymptotic distribution under the null hypothesis
as the Lagrange multiplier statistic and the Wald statistic.
The scores can also be used to implement Lagrange multiplier (LM) tests of the hypothesis (7.4).
While the scores of an unrestricted model have sample mean zero by construction, as
discussed in Section 6.6.2,

s(λ) = Σ_{t=1}^{T} Ψ_t(λ)′ ξ̂_{t|T} = 0,   (7.5)

consider, for example, the null hypothesis

H₀: vech(Σ₁) = ⋯ = vech(Σ_M)   (7.6)

vs. H₁: vech(Σ_i) ≠ vech(Σ_m) for at least one i ≠ m (MSMH-VAR).
Here û_mt(γ) = y_t − X_mt γ are the residuals at time t associated with regime m,

D_K = ∂ vec(Σ_m) / ∂ vech(Σ_m)′

is the (K² × K(K + 1)/2) duplication matrix as in (6.31), and Σ_m = Σ is valid under the null.
The LM test is especially suitable for model checking because testing different model
specifications against a maintained model is straightforward. A new estimation is not
required as long as the null hypothesis is not altered. Note that the LM test operates
under the following conditions:
• The model is estimated under the null so that, for all unrestricted parameters,
the score s(λ̂_r) is zero. The scores of the last R elements are calculated according
to equations (6.8) and (6.10). Their magnitude reflects how much the
likelihood function increases if the constraints are relaxed.
Suppose that the parameter vector is partitioned as λ = (λ₁, λ₂) and the interest
centers on linear restrictions on the parameter vector λ₂,
while there is no constraint given for the parameter vector λ₁. Then the relevant Wald
statistic can be expressed as
To make the procedure a bit more transparent, it may be helpful to consider some
applications. Equality of the regime-dependent means corresponds to the restriction

[ 1_{M−1} ⊗ I_K  :  −I_{M−1} ⊗ I_K ] (μ₁′, …, μ_M′)′ = 0,

while the homoskedasticity hypothesis (7.6) is tested against H₁: the MSMH(M)-VAR(p) model via

[ 1_{M−1} ⊗ I_{K(K+1)/2}  :  −I_{M−1} ⊗ I_{K(K+1)/2} ] (σ₁′, …, σ_M′)′ = 0.
As suggested by WHITE [1987], tests can be based on the conditional scores by using
the fact that the scores should be serially uncorrelated,

E[ Ψ_t(λ) Ψ_{t−1}(λ)′ ] = 0.   (7.9)
Examples are tests for residual autocorrelation and for ARCH effects. Unfortunately,
HAMILTON [1991b] found that these tests have poor small sample properties.
7.5. Determination of the Number of Regimes 141
A special problem which arises with the MS-VAR model is the determination of the
number of states required for the Markov process to characterize the observed process.
Testing procedures suffer from non-standard asymptotic distributions of the
likelihood ratio test statistic due to the existence of nuisance parameters under the
null hypothesis. For the derivation of the asymptotic null distribution, procedures
have been proposed by HANSEN [1992] and GARCIA [1993].
theory.³ Hence, classical critical values may be used to check that the null cannot be
rejected if LR < χ²_{1−α}(1). For this series of MS(2)-AR(p) models, GARCIA
[1993] shows that the asymptotic distribution is close to the small sample distribution,
whereby the procedures proposed by HANSEN [1996b] to simulate central χ² processes
have been employed. However, this approach is computationally demanding
and therefore only of limited use for empirical research with highly parameterized
models and vector systems.
The test procedures suggested by HANSEN and GARCIA are closely related to DAVIES'
[1977] bounded likelihood ratio test. The point of these procedures is to avoid
the problem of estimating the q nuisance parameters λ_n by setting a grid of values of
the nuisance parameters, estimating the remaining vector of identified parameters λ_i,
considering the likelihood ratio statistic conditioned on the value of the nuisance
parameters, and constructing a test statistic based on the resulting values of the objective function,

LR = sup_{λ_n} LR(λ_n).
As shown by ANDREWS & PLOBERGER [1994], sup LR(λ_n) is not the optimal test;
the optimal test has an average exponential form. However, the power of the LR test is almost
insensitive to the choice of sup LR(λ_n).
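The grid construction behind these procedures can be illustrated with a deliberately simple toy in which the nuisance parameter is a single break date, so that the conditional LR statistic is available in closed form; this sketches the sup-LR idea only, not HANSEN's or GARCIA's actual algorithms:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=120)                 # data generated under the null: no shift
T = len(y)

def lr_given_break(tau):
    # LR statistic for a mean shift at date tau with known unit variance;
    # conditional on the nuisance parameter tau the statistic is closed-form
    m1, m2, m = y[:tau].mean(), y[tau:].mean(), y.mean()
    ssr0 = ((y - m) ** 2).sum()
    ssr1 = ((y[:tau] - m1) ** 2).sum() + ((y[tau:] - m2) ** 2).sum()
    return ssr0 - ssr1                   # = 2 * (log L1 - log L0) for sigma = 1

grid = range(int(0.15 * T), int(0.85 * T))   # trimmed grid of candidate dates
sup_lr = max(lr_given_break(tau) for tau in grid)
```

The supremum over the grid is then compared against non-standard (simulated or bounded) critical values rather than the usual χ² quantiles.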
DAVIES [1977] has derived an upper bound for the significance level of the likelihood
ratio test statistic under nuisance parameters, which might be applied to a test of the
null hypothesis of M - 1 states. If the likelihood has a single peak, the following
approximation is valid:
3The critical values depend on the value of the autoregressive parameter, but in no case is the 5% critical
value less than eight.
In addition, the so-called J-test for non-nested models of DAVIDSON & MACKINNON
[1981] can be applied. The model with the larger number of states M is estimated
and the fitted values ŷ_t^{(M)} are inserted into the regression of y_t in a model with
M − 1 states,

y_t = (1 − δ) X_t B̂^{(M−1)} + δ ŷ_t^{(M)} + ε_t,

where ŷ_t^{(M)} = X_t B̂^{(M)} ξ̂_{t|T}^{(M)}. Then the coefficient δ is subject to a t-test.
An application of these testing procedures to MSM(M)-VAR(p) and MSMH(2)-VAR(p)
models is discussed in GARCIA & PERRON [1990].
As in classical theory, the LR, LM and Wald statistics all have the same asymptotic
distribution under the null, as shown by HANSEN [1996b]. In order to make
these procedures a bit more transparent, we briefly sketch a Wald test of the hypothesis
H₀: μ* = μ₁ − μ₂ = 0 against the alternative μ* = μ₁ − μ₂ ≠ 0, as
considered by CARRASCO [1994] for the MSI(2)-AR(0) model.⁴ The ML estimates
μ̂* = μ̂₁ − μ̂₂ and μ̂₂ have a joint limiting distribution of the form
The Wald statistic is given by the usual quadratic form.
Unfortunately, p is a vector of nuisance parameters that are not identified under the
null. For given transition probabilities p, the Wald test statistic would have its standard
asymptotic distribution under the null.
⁴CARRASCO [1994] also derives the asymptotic distribution of the Wald statistic of a threshold model
and a structural change model when the true model is a misspecified Markov-switching model, and
constructs a Wald encompassing test (cf. MIZON & RICHARD [1986]) of the structural change model
by the Markov-switching model.
The relevant statistic, sup_{p∈P} LW_T(p), then converges in distribution to a non-standard limit.
In this chapter we have just scratched the surface of model selection and checking
techniques in MS-VAR models. It must be emphasized that the previous analysis
rests on some basic assumptions, and most of the presented results will not hold
without them. Furthermore, investigations of the small sample properties of the employed
statistical tests are needed.
Model selection and model checking represent an important area concerning empirical
investigations with MS-VAR models. Therefore, the development of an asymptotic
theory and of new statistical tools for the specification of MS-VAR processes
merits future research.
Chapter 8
Multi-Move Gibbs Sampling
In this chapter we discuss the use of simulation techniques to estimate and forecast
MS-VAR processes. A general feature of MS-VAR models is that they approximate
non-linear processes as piecewise linear by restricting the processes to be linear in
each regime. Since the distribution of the observed variable y_t is assumed normal
conditional on the unobserved regime vector ξ_t, the MS-VAR model is well suited
for Gibbs sampling techniques.
The Gibbs sampler has become increasingly popular as a result of the work of GEMAN
& GEMAN [1984] in image processing and GELFAND & SMITH [1990] in
data analysis (cf. SMITH & ROBERTS [1993]). In particular, the Gibbs sampler is
quite tractable for parameter estimation with missing values; see for example RUANAIDH
& FITZGERALD [1995]. The crucial point is that the unobservable states
can be treated as additional unknown parameters. Thus, the joint posterior distribution
of parameters and regimes can be analyzed by Monte Carlo methods.
Existing Gibbs sampling approaches¹ for MS(2)-AR(p) models have been introduced
independently by ALBERT & CHIB [1993] and MCCULLOCH & TSAY
[1994b]. ALBERT & CHIB [1993] present a single-move Gibbs sampler for an
MSM/MSMH(2)-AR(p) model, while MCCULLOCH & TSAY [1994b] consider
a more general MS(2)-ARX(p) model. The latter approach has been applied by
GHYSELS [1994] to periodic MS-AR models. An extended version has been used
by FILARDO [1994] to estimate an MS-AR model with time-varying transition
probabilities. Unfortunately, Gibbs samplers available in the literature are restricted
to univariate time series and to the presence of only two regimes.
There is a wide range of views about the appropriate way to develop a Gibbs sampler
for a given problem. For the purpose of a reduction in correlation between consequent
iterations of the Gibbs sampling algorithm, and thus increased convergence
and efficiency, we suggest ways of modifying the single-move Gibbs sampling approach
of ALBERT & CHIB [1993] and MCCULLOCH & TSAY [1994b] to a multi-move
sampler. The difference between the single-move and multi-move Gibbs sampler
lies in the generation of the state variables. While the single-move Gibbs sampler
generates each state variable ξ_t conditional on the observations Y_T = (y₁′, …, y_T′)′
and all other generated regimes ξ_{−t} = (ξ₁′, …, ξ_{t−1}′, ξ_{t+1}′, …, ξ_T′)′,
the multi-move Gibbs sampler produces the whole state vector ξ = (ξ₁′, …, ξ_T′)′
simultaneously from the joint probability distribution given the sample Y_T and the
parameter vector λ,

ξ ← Pr(ξ | Y_T, λ).
This multi-move sampling of the regime vector ξ is implemented by incorporating
the slightly revised filtering and smoothing algorithms for MS-VAR models which
have been discussed in Chapter 5. The aim of this modification is to reduce the correlation
between the draws of consequent iterations. Thus, an increased speed of
convergence of the Gibbs sampler to the desired posterior distribution and an efficiency
of estimates relative to the algorithms proposed in the previous literature can
be achieved.
The chapter is organized as follows: we start our discussion with a brief introduction
to the Gibbs sampling technique. In the following sections it is shown that
generating the complete regime vector ξ is straightforward for Markov-switching
time series models by using the smoothed full-sample probabilities ξ̂_{t|T}. Again, this
is a bit more sophisticated for MSM specifications. Given the regimes ξ, Bayesian
inference about the parameter vector λ is quite standard. The conditional posterior
distribution of the transition probabilities can be derived as in Markov-chain models.
In this chapter, the Bayesian analysis is based on a generalized MS(M)-VAR(p)
model, which is linear in the vector γ of VAR parameters. Finally, the usage of the
Gibbs sampler for prediction purposes is discussed.
8.1. Bayesian Analysis via the Gibbs Sampler 147
The Gibbs sampler is an iterative Monte Carlo technique that breaks down the problem
in Bayesian time series analysis of drawing samples from a multivariate density
such as p(ξ, λ|Y_T) into drawing successive samples from lower dimensional (in
particular univariate) densities. Thus the regimes ξ and the parameter vector λ are
drawn from the smoothed regime probability distribution Pr(ξ|Y_T, λ) and the conditional
density p(λ|ξ, Y_T). Following a cyclical iterative pattern, the Gibbs sampler
generates the joint distribution p(ξ, λ|Y_T) of ξ and λ. TIERNEY [1994] proves the
convergence of the Gibbs sampler under appropriate regularity conditions. A general
discussion of the involved numerical Bayesian methods can be found in RUANAIDH
& FITZGERALD [1995].
The main idea of the Gibbs sampler is to construct a Markov chain on (ξ, λ) such that
the limiting distribution of the chain is the joint distribution p(ξ, λ|Y_T). Given the
data set Y_T and initial values² λ^(0), the Gibbs sampler consists of a sequence of moves
at each iteration j ≥ 1, where the parameter vector has been partitioned and λ_{−i} is the complement to λ_i.
More precisely, we have

λ_{−i}^{(j−1)′} = ( λ₁^{(j)′}, …, λ_{i−1}^{(j)′}, λ_{i+1}^{(j−1)′}, …, λ_R^{(j−1)′} ).

Each iteration involves a pass through the conditional probability distributions. As soon as
a variate is drawn, it is substituted into the conditional probability density functions.
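The cyclical pass described above can be sketched as a control-flow skeleton; the two draw functions are placeholders for the conditional distributions derived later in the chapter:

```python
import numpy as np

def gibbs(y, draw_regimes, draw_params, lam0, n_burn, n_keep, seed=0):
    """Schematic Gibbs cycle for p(xi, lambda | Y_T): alternate draws from
    Pr(xi | Y_T, lambda) and p(lambda | xi, Y_T); discard the first n_burn
    cycles and keep the last n_keep.  The draw_* callables are placeholders."""
    rng = np.random.default_rng(seed)
    lam, kept = lam0, []
    for j in range(n_burn + n_keep):
        xi = draw_regimes(y, lam, rng)
        lam = draw_params(y, xi, rng)
        if j >= n_burn:
            kept.append((xi, lam))
    return kept

# toy conditionals, only to exercise the control flow
kept = gibbs(np.zeros(5),
             draw_regimes=lambda y, lam, rng: rng.integers(0, 2, size=len(y)),
             draw_params=lambda y, xi, rng: float(xi.mean()),
             lam0=0.5, n_burn=100, n_keep=50)
```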
The Gibbs sampler produces a series of j = 1, …, N₁, …, N₁ + N₂ dependent
drawings by cycling through the conditional posteriors. To avoid an effect of the
starting values on the desired joint densities and to ensure convergence, the first N₁
draws are discarded and only the simulated values from the last N₂ cycles are used.
The simulated values (ξ^(j), λ^(j)), j = N₁ + 1, …, N₁ + N₂ are regarded as an
approximate simulated sample from p(ξ, λ|Y_T). To compute the posterior density
²For their MS(2)-ARX(p) model, MCCULLOCH & TSAY [1994b] propose to use the estimates from a
linear multiple regression (M = 0) as initial parameter values.
148 Multi-Move Gibbs Sampling
p̂(λ_i | Y) = (1/N₂) Σ_{j=N₁+1}^{N₁+N₂} p( λ_i^{(j)} | λ_{−i}, ξ, Y_T ).   (8.1)
As emphasized by ALBERT & CHIB [1993], the numerical standard error σ_i of the
estimate λ̂_i cannot be calculated as usual by s_i/√N₂, where s_i is the standard deviation
of λ_i in the sampled series.
This results from the fact that the quantities involved are sums of correlated observations.
However, this effect can be corrected by invoking the batch-means method
(cf. RIPLEY [1987]): the sample is divided into n batches of size N₂/n, such that the
lag correlation of the batch means is just under a given c, e.g. c = 5%. Then the
numerical standard error is estimated by σ̂_i = s_i/√n as if the batch means would
constitute the sample, where s_i is now the standard deviation of the batch means
λ̄_i^{(k)}, k = 1, …, n:

s_i² = (1/n) Σ_{k=1}^{n} ( λ̄_i^{(k)} − Ê[λ_i] )².
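A minimal sketch of the batch-means estimator (with a fixed number of batches rather than the lag-correlation criterion of the text):

```python
import numpy as np

def batch_means_se(draws, n_batches=20):
    """Numerical standard error of a posterior mean from correlated Gibbs
    output: split the retained chain into batches and treat the batch
    means as an approximately uncorrelated sample."""
    draws = np.asarray(draws, dtype=float)
    m = len(draws) // n_batches
    means = draws[: m * n_batches].reshape(n_batches, m).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)

rng = np.random.default_rng(0)
chain = rng.normal(size=10_000)          # stand-in for N2 retained draws
se = batch_means_se(chain)
```

For the uncorrelated toy chain above, the result is close to the naive s/√N₂; for genuinely autocorrelated Gibbs output the batch-means estimate is larger.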
As suggested by GEWEKE [1994] and PFANN et al. [1995], some quantities can be
more easily and accurately computed by using the analytical expression for the conditional
expectation and averaging over the conditional expectation. For example,
the expected value of λ_i can be calculated as the average conditional mean,
where λ̄_i^{(j)} is the mean of the conditional posterior distribution of λ_i^{(j)} at the j-th
iteration of the Gibbs sampler. Analogously, the variance can be estimated as the
8.2. Bayesian Analysis of Linear Markov-Switching Regression Models 149
The parameters p = vec(P) of the Markov chain, the scale parameter vectors σ_m =
vech(Σ_m), σ = (σ₁′, …, σ_M′)′, and the location parameters γ are collected in the
parameter vector³ λ = (γ′, σ′, p′)′. Under homoskedasticity of the Gaussian white
noise u_t, a parameter vector λ = (γ′, σ′, p′)′ with σ = vech(Σ) is used.
The conditional densities required for the Gibbs sampler can be derived from the
likelihood function. For given ξ the likelihood function is determined by the density
function p(Y_T|ξ, λ), where u_t(γ) = y_t − [(1, ξ_t′) ⊗ I_K] X_t γ. For purposes of
estimation a slightly different formulation of the likelihood function is useful,
in which the residuals are stacked into u = (u₁′, …, u_T′)′ and W⁻¹ = diag(W₁⁻¹, …, W_T⁻¹)
is the block-diagonal weighting matrix whose blocks are determined by the regimes,
W_t⁻¹ = Σ_{m=1}^{M} ξ_{mt} Σ_m⁻¹.
Suppose that the prior density of λ_i is denoted by p(λ_i|ξ, λ_{−i}); then the conditional
posterior density of λ_i is given by
⁴For the precision matrix as the inverse of the variance-covariance matrix Σ, HAMILTON [1991a], following
DEGROOT [1970], suggests the use of a Wishart distribution Σ_m⁻¹ ∼ W(a_m, A_m) with a_m
degrees of freedom and a (K × K) precision matrix A_m, such that

p(Σ_m⁻¹) ∝ |Σ_m⁻¹|^{(a_m−K−1)/2} exp[ −(1/2) tr(A_m Σ_m⁻¹) ].
and thus closely related to the ML estimator discussed in the previous chapter.
We continue with the derivation of the posterior probability distributions for the general
linear Markov-switching model under consideration. Our approach is summarized
in Table 8.1 on page 164, which presents the Gibbs sampling algorithm.
We begin the presentation of the Gibbs sampler by discussing the derivation of the
posterior distribution of the regime vector ξ. In the Gibbs samplers proposed by
ALBERT & CHIB [1993] and MCCULLOCH & TSAY [1994a], the states are generated
one at a time ("single move"), utilizing the Markov properties to condition on
neighboring states (cf. CARLIN et al. [1992]). Unfortunately, since the regimes are
highly correlated, the desired asymptotic distribution of the sampler might be approached
only very slowly. MCCULLOCH & TSAY [1994b, p. 529] mention that
drawing such highly dependent variables together speeds up convergence. Therefore,
they propose to sample the regimes from the conditional probability distribution
Pr(ξ_t, …, ξ_{t+k−1} | Y_T, ξ₁, …, ξ_{t−1}, ξ_{t+k}, …, ξ_T, λ) for an arbitrary k.
The use of a multi-move Gibbs sampler has been suggested independently by SHEPHARD
[1994] and CARTER & KOHN [1994] for related time series models. Among
other partially non-Gaussian state-space models, SHEPHARD [1994] considers a
state-space model where the intercept term depends on a binary Markov chain of
the transition equation and where the innovations are normally distributed. CARTER
& KOHN [1994] consider a linear state-space model with varying coefficients
8.3. Multi-Move Gibbs Sampling of Regimes 153
and errors that are a mixture of normals. The approach is applied to an MSH(2)-AR(0)
model which has been used by BOX & TIAO [1968]. Following ANDERSON
& MOORE [1979], it is shown that a smoothing algorithm related to KIM [1994] can
be used to generate the conditional probability distribution of the regimes. An application
to a switching regression state-space model used by SHUMWAY & STOFFER
[1991] is mentioned, but without going into details. The approach is then supported
theoretically by the results of LIU et al. [1994], who show that generating variables
simultaneously produces faster convergence than generating them one at a time.
In the following section we derive the algorithm for multi-move Gibbs sampling. It
is shown that the conditional posterior distribution of regimes involves the smoothed
regime probabilities ξ̂_{t|T}. Therefore, the Gibbs cycle is closely related to the EM
algorithm for ML estimation, since it makes use of the same filtering and smoothing
procedures.
In this section we use the multi-move Gibbs sampling approach, generating all the
states at once by taking advantage of the structure of the Markov chain,

Pr(ξ | Y_T) = Pr(ξ_T | Y_T) ∏_{t=1}^{T−1} Pr(ξ_t | ξ_{t+1}, Y_t).   (8.8)
Equation (8.8) is analogous to Lemma 2.1 in CARTER & KOHN [1994], where it is
derived for conditionally normally distributed state variables.
Thus, to generate ξ from the posterior Pr(ξ|Y_T), we first draw ξ_T from Pr(ξ_T|Y_T),
that is, the smoothed full-sample probability distribution, which can be derived with
the BLHK filter. Then ξ_t, t = T − 1, …, 1, is generated from Pr(ξ_t|ξ_{t+1}, Y_T).
In the course of the discussion of KIM's smoothing algorithm it has been shown that
the distribution Pr(ξ_t|ξ_{t+1}, Y_T) is equal to Pr(ξ_t|ξ_{t+1}, Y_t) and, thus, can be deduced
from
ξ_t | (ξ_{t+1}, Y_t)  ∼  [ F′( ξ_{t+1} ⊘ ξ̂_{t+1|t} ) ] ⊙ ξ̂_{t|t},   (8.10)
where ⊙ and ⊘ denote the element-wise matrix multiplication and division, respectively.
With the exception that the generated ξ_{t+1} is used instead of the smoothed probabilities
ξ̂_{t+1|T}, equation (8.10) works analogously to the smoothing procedure involved
in the EM algorithm of ML estimation.
Here ξ̂_{T|T} = ( Pr(ξ_T = ι₁|Y_T), …, Pr(ξ_T = ι_M|Y_T) )′, and
ξ_t|ξ_{t+1}, Y_T = ( Pr(ξ_t = ι₁|ξ_{t+1}, Y_T), …, Pr(ξ_t = ι_M|ξ_{t+1}, Y_T) )′
denotes the probability distribution of ξ_t conditional on the previously drawn regime
vector ξ_{t+1} and the sample information Y_T. To ensure identification at the determination
of the conditional probability distributions of the transition and regime-dependent
parameters (see Section 8.4), a sample can be accepted only if it contains
at least one draw of each regime.
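The backward recursion of (8.8) and (8.10) can be sketched as follows, with the filtered probabilities and the transition matrix taken as given (argument names are ours):

```python
import numpy as np

def draw_regime_path(filtered, P, rng):
    """Multi-move draw of the regime path from Pr(xi | Y_T, lambda):
    draw xi_T from the filtered distribution xi_{T|T}, then recurse
    backwards with Pr(s_t = i | s_{t+1}, Y_t) proportional to
    P[i, s_{t+1}] * xi_{t|t}(i).  `filtered` is the (T x M) matrix of
    filtered probabilities; P[i, j] = Pr(s_{t+1} = j | s_t = i)."""
    T, M = filtered.shape
    path = np.empty(T, dtype=int)
    path[-1] = rng.choice(M, p=filtered[-1])
    for t in range(T - 2, -1, -1):
        w = P[:, path[t + 1]] * filtered[t]
        path[t] = rng.choice(M, p=w / w.sum())
    return path

rng = np.random.default_rng(0)
filt = np.full((10, 2), 0.5)
path = draw_regime_path(filt, np.eye(2), rng)   # an absorbing chain yields a constant path
```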
In contrast to the handling of initial states of the Markov chain in the EM algorithm of
maximum likelihood estimation, we assume that the regimes in t = 0, …, 1 − p are
generated from the same Markov process as the regimes in the sample t = 1, …, T.
Assuming that the Markov process is ergodic, there exists a stationary probability
distribution Pr(ξ_t|p), where the discrete probabilities can be included in the vector
ξ̄ = ξ̄(p). Irreducibility ensures that the ergodic probabilities are strictly positive,
ξ̄_m > 0 for all m = 1, …, M.
Note that the determination of ξ̄ has already been discussed in the first chapter. The
estimation procedures established there are unaltered whether the single-move or the
multi-move Gibbs sampler is used for drawing the state vector ξ.
8.4. Parameter Estimation via Gibbs Sampling 155
Therefore, the conditional distribution can be described with the help of the sample
estimates: let n_ij denote the number of transitions from regime i to j in the sample
of ξ, and define n_i = Σ_{j=1}^{M} n_ij. Then the likelihood function of p is given by

∏_{t=1}^{T} Pr(ξ_t | ξ_{t−1}, p) = ∏_{i=1}^{M} ∏_{j=1}^{M} (p_ij)^{n_ij}.

This formulation of the likelihood function does not take explicit account of the adding-up
restriction on the transition probabilities. Given that p_iM = 1 − Σ_{j=1}^{M−1} p_ij
and n_iM = n_i − Σ_{j=1}^{M−1} n_ij for all i = 1, …, M, the likelihood function of p equals

p(p|ξ) = ∏_{i=1}^{M} [ ∏_{j=1}^{M−1} p_ij^{n_ij} ] ( 1 − Σ_{j=1}^{M−1} p_ij )^{n_iM}.

For the two-regime case as discussed in the literature, it can be easily seen that the
desired posterior is a product of independent Beta distributions,

p(p|ξ) ∝ p₁₁^{n₁₁} (1 − p₁₁)^{n₁₂} · p₂₂^{n₂₂} (1 − p₂₂)^{n₂₁}.
In generalization of this procedure we can deal with equation (8.13) as follows. Calculate
the distribution of p_ij conditional on p_i1, …, p_{i,j−1}, p_{i,j+1}, …, p_{i,M−1} as

p*_ij = ( 1 − Σ_{m=1}^{j−1} p_im − Σ_{m=j+1}^{M−1} p_im )⁻¹ p_ij,   (8.14)

which has a standard Beta distribution with hyperparameters n_ij, n_iM as its conditional
posterior. To generate the transition probability p_ij, we first sample p*_ij from
this Beta distribution,

p*_ij ∼ Beta(n_ij, n_iM),   (8.15)

and then transform the draw p*_ij into the corresponding parameter of interest,

p_ij = ( 1 − Σ_{m=1}^{j−1} p_im − Σ_{m=j+1}^{M−1} p_im ) p*_ij.   (8.16)
This procedure is iterated for j = 1, …, M − 1, while the transition probability p_iM
is determined by the adding-up restriction

p_iM = 1 − Σ_{j=1}^{M−1} p_ij,   (8.17)

where i = 1, …, M.
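Assuming a flat prior, the row posterior implied by the transition counts is Dirichlet(n + 1), and a joint draw by successive Beta variates, a variant of the conditional scheme in (8.14)-(8.17), can be sketched as:

```python
import numpy as np

def draw_transition_row(counts, rng):
    """Draw row i of the transition matrix from the Dirichlet(n + 1)
    posterior implied by the transition counts (n_i1, ..., n_iM) of a
    simulated regime path, via stick-breaking Beta draws.  The row sums
    to one by construction (cf. the adding-up restriction (8.17))."""
    n = np.asarray(counts, dtype=float)
    M = len(n)
    p = np.zeros(M)
    remaining = 1.0
    for j in range(M - 1):
        # Beta(n_j + 1, sum of remaining pseudo-counts) for the stick share
        star = rng.beta(n[j] + 1.0, n[j + 1:].sum() + M - 1 - j)
        p[j] = remaining * star
        remaining -= p[j]
    p[M - 1] = remaining
    return p

rng = np.random.default_rng(0)
rows = np.array([draw_transition_row([5, 3, 2], rng) for _ in range(2000)])
```

The posterior mean of p_i1 for counts (5, 3, 2) is (5 + 1)/13, which the simulated rows approximate.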
Consider the likelihood p(Y_T|ξ, λ), where u_t(ξ_t, γ) = ξ_t′ u_t, u* = [diag(ξ) ⊗ I_K] u,
and W*⁻¹ = (I_T ⊗ Σ⁻¹). By collecting the elements of u* in a (T × K) matrix
U* = (u₁*, …, u_T*)′, we have

u*′ W*⁻¹ u* = u*′ (I_T ⊗ Σ⁻¹) u* = Σ_{t=1}^{T} u_t*′ Σ⁻¹ u_t* = Σ_{t=1}^{T} tr( u_t* u_t*′ Σ⁻¹ ).   (8.18)
Thus, the joint probability distribution of the K(K + 1)/2 elements of Σ is the inverse
Wishart distribution, where the mean E[Σ⁻¹] = (a − K − 1)A of the conditional density of Σ⁻¹ is exactly
the inverse of the ML estimate of Σ under the conditions considered,

Σ̂ = T⁻¹ U*′ U*.   (8.21)
In the case of regime-dependent covariance matrices, consider the likelihood p(Y_T|ξ, λ),
where ξ_m′ = (ξ_m1, …, ξ_mT) and u_m = (y − X₀γ₀ − X_mγ_m) = (u_m1′, …, u_mT′)′
is a TK-dimensional vector. Collect the elements of u_m in a (T × K) matrix U_m =
(u_m1, …, u_mT)′; after some algebraic manipulations, the joint probability distribution
of the K(K + 1)/2 elements of Σ_m is again an inverse Wishart distribution, centered
on the ML estimate

Σ̂_m = T_m⁻¹ U_m′ U_m.   (8.26)
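The inverted Wishart step can be sketched under a flat prior, using the Bartlett decomposition to generate the Wishart draw (our simplification, not the book's exact prior specification):

```python
import numpy as np

def draw_sigma(U, rng):
    """Draw Sigma from the inverse Wishart posterior implied by the
    (T x K) residual matrix U under a flat prior:
    Sigma^{-1} ~ Wishart(T, (U'U)^{-1}), via the Bartlett decomposition."""
    T, K = U.shape
    L = np.linalg.cholesky(np.linalg.inv(U.T @ U))   # Cholesky of the scale
    A = np.zeros((K, K))
    for i in range(K):
        A[i, i] = np.sqrt(rng.chisquare(T - i))      # diagonal: chi variates
        A[i, :i] = rng.normal(size=i)                # subdiagonal: N(0, 1)
    W = L @ A @ A.T @ L.T                            # W ~ Wishart(T, (U'U)^{-1})
    return np.linalg.inv(W)                          # the Sigma draw

rng = np.random.default_rng(0)
U = rng.normal(size=(500, 2))                        # toy residuals, true Sigma = I
sigma_bar = np.mean([draw_sigma(U, rng) for _ in range(100)], axis=0)
```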
For a simulated path of regimes ξ, conditions are established as if the regimes were
observable. Thus, the conditional likelihood function is equivalent to the likelihood
function of an intervention VAR model. Such a model structure is associated
with structural changes in time series where the parameter variations are systematic.
Given flat priors p(γ|Y₀, ξ, σ), the conditional posterior distribution of γ is proportional
to the likelihood function given by equation (8.6). Therefore, we get a normal
distribution, i.e. p(γ|ξ, Y_T, σ) is N(γ̂, Var(γ̂)), where the posterior mean becomes
the ML estimator γ̂. A classical statistician would consider a normal distribution of
γ given ξ to be valid only asymptotically, since X_t contains lagged dependent variables.
Here, however, p(γ|ξ, Y_T) is the exact small-sample posterior distribution as
in traditional Bayesian analysis (cf. HAMILTON [1994b, ch. 12]). Hence, the Gibbs
sampler⁶ is drawing γ from a normal distribution with mean γ̂ and variance Var(γ̂).
The mean of the location parameters γ is given by the well-known GLS estimator,
which is identical to the ML estimator for given ξ,
In equation (8.27), all VAR coefficients are drawn from their joint conditional
posterior density. MCCULLOCH & TSAY [1994a] suggest considering the conditional
posterior distributions of the regime-invariant parameter vector γ₀ and the
regime-dependent parameter vectors γ₁, …, γ_M separately.
For the derivation of the posterior distribution of the common parameter vector γ₀,
conditioned on the observations Y_T, the regimes ξ, the variance-covariances σ, and
the regime-dependent parameters γ₁, …, γ_M, we transform the data and denote by
Y₀ = (y₀₁′, …, y₀_T′)′ and X₀ = (X₀₁′, …, X₀_T′)′ the stacked transformed data,
and by W₀⁻¹ the corresponding block-diagonal weighting matrix with blocks Σ_t⁻¹.
⁶Implementation note: Since the conditional density p(γ|ξ, Y_T, σ) for the autoregressive parameter γ
is multivariate Gaussian, γ ∼ N(γ̂, Σ_γ), a random sample vector γ can be generated from a vector ε of
independent standard normally distributed random variables as γ = γ̂ + Qε, where the matrix Q is
the square root of the variance-covariance matrix Σ_γ such that QQ′ = Σ_γ. This can be carried out
using a standard Choleski decomposition of the positive definite variance-covariance matrix.
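The footnote's recipe can be written out directly (the numerical values are our toy example):

```python
import numpy as np

# Draw gamma ~ N(gamma_hat, Sigma_gamma) via the Choleski square root Q
# of the covariance matrix: gamma = gamma_hat + Q @ eps, eps ~ N(0, I).
rng = np.random.default_rng(0)
gamma_hat = np.array([0.5, -0.2])
Sigma_gamma = np.array([[0.04, 0.01],
                        [0.01, 0.09]])
Q = np.linalg.cholesky(Sigma_gamma)          # Q @ Q.T == Sigma_gamma
eps = rng.normal(size=(2, 5000))
draws = gamma_hat[:, None] + Q @ eps         # 5000 draws of gamma
```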
and the posterior variance Var(γ₀|ξ, Y_T, σ, γ₁, …, γ_M) is given by (8.33).
For the regime-dependent parameters we have p(γ_m|·) = p(γ_m|Y_T, ξ_m, σ_m),
where Y_m = Y − X₀γ₀ and X_m* = (X*_{m1}, …, X*_{mT})′. The joint probability distribution
of the elements of γ_m is therefore normal (8.34), with moments given by LS estimates
if the regressors of each equation are identical, such that X_m = (X_m ⊗ I_K) holds:

γ̂_m = ( (X_m′ X_m)⁻¹ X_m′ ⊗ I_K ) Y_m,
Var(γ_m) = (X_m′ X_m)⁻¹ ⊗ Σ_m.
Since the labels of the states and the submodels are interchangeable, the MS-VAR model
would be unidentifiable in the data fitting process. Hence, certain constraints are necessary
to overcome the identifiability problem. As pointed out by MCCULLOCH &
TSAY [1994a], the investigator must have some prior beliefs about how the states
differ in the particular application. These beliefs become part of the modelling process.
For the sake of simplicity we denote the restricted parameter μ_{m,k} as the first component
in γ_m, m = 1, …, M. Hence, the conditional densities of γ_m are truncated
normal:   (8.35)

One possibility is to sample from the unrestricted normal distribution in equation (8.34)
and then discard any draw which violates the restriction, i.e. γ_{m,1} > γ_{m−1,1}.
The draw of γ_m from the truncated normal distribution can be more easily obtained
by the method of inversion.⁷ Let the vector γ_{m,2} contain the unrestricted parameters
of regime m and Ω_ij denote Cov(γ_{m,i}, γ_{m,j}), such that for m = 2, …, M:
7For univariate time series, see ALBERT & CHIB [1993, p.5].
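The inversion method for a left-truncated normal draw can be sketched with the standard-library normal distribution (the function name is ours):

```python
import numpy as np
from statistics import NormalDist

def truncnorm_left(mu, sigma, lower, rng):
    """Inversion draw from N(mu, sigma^2) truncated to (lower, infinity):
    map u ~ U(0, 1) through the conditional cdf,
    x = mu + sigma * Phi^{-1}( Phi(a) + u * (1 - Phi(a)) ), a = (lower - mu)/sigma."""
    nd = NormalDist()
    a = (lower - mu) / sigma
    u = rng.uniform()
    return mu + sigma * nd.inv_cdf(nd.cdf(a) + u * (1.0 - nd.cdf(a)))

rng = np.random.default_rng(0)
draws = np.array([truncnorm_left(0.0, 1.0, 1.5, rng) for _ in range(500)])
```

Unlike the discard (rejection) approach, every uniform variate yields a valid draw, which matters when the truncation region has small probability.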
8.5. Forecasting via Gibbs Sampling 163
In this section we have considered the principles of Gibbs sampling for parameter
estimation and regime reconstruction in linear unrestricted MS regression models.
Before discussing the use of the Gibbs sampler as a forecasting device, Table 8.1
summarizes the results of this section in the form of an algorithm.
A major advantage of the multi-move Gibbs sampler compared with classical ML
estimation is the feasibility of generating forecast intervals. If the iterations (8.36)
and (8.37) are embodied in the regular Gibbs cycle, samples can be generated simultaneously
from the parameter posterior and the prediction posterior. As such, it is
possible to obtain the non-normal prediction density of any future observation.
The foundations of forecasting MS-VAR processes have been discussed in the context
of the linear state-space representation. However, the investigation was restricted
to MSPE-optimal predictions. Forecasting via Gibbs sampling has the objective
of determining the Bayes prediction density p(y_{T+h}|Y_T). The issue of forecasting
future observations using a single-move Gibbs sampler is discussed in ALBERT &
CHIB [1993, p. 8]. Starting with the one-step prediction of y_{T+1}, this can easily be
done using the decomposition

where η_{T+1} again contains the conditional probability densities of y_{T+1}.
1. Initialization:

   γ₀^(0) = (X'₀X₀)⁻¹ X'₀ Y₀,
   Σ^(0) = T⁻¹ Σ_{t=1}^T (y_t − X_{0t}γ₀^(0))(y_t − X_{0t}γ₀^(0))',
   γ^(0) = (γ'₁, γ'₀)',   P^(0).

2. Multi-Move Regime and Transition Sampling Step:

   ξ_T ← ξ_{T|T},
   ξ_{T−j} ← [F'(ξ_{T−j+1} ⊘ ξ_{T−j+1|T−j})] ⊙ ξ_{T−j|T−j},

   p*_{ij} drawn from the Beta posterior implied by the transition counts,
   p*_{iM} = 1 − Σ_{m=1}^{M−1} p*_{im}.

3. Inverted Wishart Step:
4. Regression Step:
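The transition-probability step of the algorithm draws each row of P from the posterior implied by the simulated transition counts; with independent Dirichlet (for M = 2, Beta) priors, row i is Dirichlet in the counts n_{i1}, …, n_{iM}. A minimal sketch (NumPy assumed; the counts and flat prior are hypothetical):

```python
import numpy as np

def draw_transition_matrix(counts, prior=1.0, rng=None):
    """Draw row i of P from Dirichlet(n_i1 + prior, ..., n_iM + prior).

    counts[i, j] is the number of simulated transitions from regime i to j;
    each sampled row is a probability vector, so it sums to one by construction."""
    rng = rng or np.random.default_rng()
    return np.vstack([rng.dirichlet(row + prior) for row in counts])

rng = np.random.default_rng(42)
counts = np.array([[40.0, 5.0], [8.0, 60.0]])   # hypothetical transition counts
P = draw_transition_matrix(counts, rng=rng)
```

For M = 2 the Dirichlet draw reduces to the Beta draw appearing in the table.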
Thus, samples from the Bayesian prediction densities can be obtained by sampling
for each draw of (ξ, λ) made available via the Gibbs sampler. Implementing these
two steps along with the regular Gibbs cycle produces samples on which calculations
of the prediction density can be based. For each cycle, the conditional densities
p(y_{T+h}|ξ_{T+h}, Y_{T+h−1}, λ) are normal.
Note that the prediction density incorporates both parameter uncertainty and state
uncertainty. This is extremely helpful for MS-VAR models, since the conditional
distribution of y_{T+h}|Y_T is a mixture of normals. For interval forecasts, the conditional
mean and variance are not sufficient, as they are in the Gaussian VAR model.
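Since the prediction density is a mixture of normals, interval forecasts are best read off the simulated draws directly, e.g. as empirical quantiles rather than a mean plus or minus two standard deviations. A minimal sketch with hypothetical Gibbs output (NumPy assumed; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical Gibbs output: regime draws and regime-specific moments.
regimes = rng.random(5000) < 0.3               # True = regime 2 with probability 0.3
mu = np.where(regimes, -2.0, 1.0)              # regime-dependent means
sd = np.where(regimes, 1.5, 0.5)               # regime-dependent std. deviations
y_draws = rng.normal(mu, sd)                   # draws from the mixture of normals

lo, hi = np.percentile(y_draws, [2.5, 97.5])   # 95% interval forecast
```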
8.6 Conclusions
Gibbs sampling has many attractive features. Foremost among these are its computational
simplicity and its convergence properties. A major advantage is its ability
to generate the non-normal prediction density of any future observation.
If the forecasting recursions are embodied in the regular Gibbs cycle, samples from
the prediction posterior are generated simultaneously with those of the parameter
posterior.
The general framework for ML estimation of the MS(M)-VAR(p) model was laid
out in Chapter 6. In Chapter 8 the methodological issues of Gibbs sampling and its
conceptual differences to the EM algorithm have been discussed. In this chapter,
we will focus on the technical aspects of estimation of the VAR coefficients under
the various types of restrictions.¹
¹Note that in HAMILTON [1990] only the univariate MSIA(M)-AR(p) model is discussed explicitly.
The MSI(M)-AR(p) model and the MSIH(M)-AR(p) model are discussed under the assumption p =
0, which is a very crucial restriction for purposes of time series analysis. It is therefore important to
relax it here.
²For example, the GLS estimation of the three-regime models of the six-dimensional system in Chapter
12 with 120 observations would involve multiplications with the (2160 × 2160) matrix W⁻¹.
168 Comparative Analysis of Parameter Estimation in Particular MS-VAR Models
MSM Specification: μ varying / μ invariant.   MSI Specification: ν varying / ν invariant.
Notation: MS: Markov-switching mean (M), intercept term (I), autoregressive parameters (A) and/or
heteroskedasticity (H); MVAR: mean-adjusted vector autoregression; VAR: vector autoregression in
its intercept form.
Gibbs sampler and the EM algorithm for maximum likelihood estimation. After this
introduction we summarize in Section 9.1 the BLHK filter and smoother, which produce
the vector of simulated regimes ξ and the vector of smoothed regime probabilities
ξ̂_{t|T}, as inputs for the maximization step and the regression step,
respectively. At the regression step of the Gibbs sampler these smoothed regime
probabilities can be taken as if they were the true vectors of regimes. It has been
shown in Chapter 6 that the same does not hold for the EM algorithm. The resulting
set of regression equations yields a time-varying VAR with observable regimes and
is discussed further in Section 9.3.1. The implications for the EM algorithm (Section
9.3.2) and for the Gibbs sampler (Section 9.3.3) follow.
For the particular Markov-switching vector autoregressive models a number of simplifications
result, and closed-form expressions can be given for the GLS estimators
which have to be computed at each iteration of the EM algorithm (maximization
step) and the Gibbs sampler (regression step), respectively. An overview is given in
Table 9.1.
9.1. Analysis of Regimes 169
Ξ_m (T×T) = diag(ξ_m),   Ξ (MT×MT) = diag(ξ),   ξ (MT×1) = (ξ'₁, …, ξ'_M)',
T_m = tr(Ξ_m) = 1'_T ξ_m,
ξ_m (T×1) = (ξ̂_{m,1|T}, …, ξ̂_{m,T|T})'.
In this chapter we are investigating the estimation of the parameters of the vector
autoregression for a given inference on the regimes (the maximization step of the EM
algorithm), respectively the derivation of the posterior distribution of the parameters
for given regimes in the sample (the regression step of the Gibbs sampler). Since the
following considerations are based on a previous analysis of regimes within the EM
algorithm and the Gibbs sampler, we will discuss them briefly. In Table 9.3 on page
170, the usage of the BLHK filter and smoother at the expectation step of the EM
algorithm and the Gibbs sampler, as well as the treatment of the parameters of the
hidden Markov chain, are visualized.
The expectation step of the EM algorithm uses the forward recursion (5.6) of the filter
and the backward recursion (5.13) of the smoother. The transition probabilities p_ij are
estimated with the transition frequencies p̂_ij, which are calculated from the smoothed
regime probabilities ξ̂_{t|T} according to (6.14).
Furthermore, in order to maintain an identical notation for the next remarks, the information
produced by the BLHK filter and smoother is summarized in Table 9.2.
Note that we have introduced no new symbols for the simulated regimes, thus maintaining
the use of ξ.
Gibbs Sampler

ξ_T ← ξ_{T|T},
ξ_{T−j} ← [F'(ξ_{T−j+1} ⊘ ξ_{T−j+1|T−j})] ⊙ ξ_{T−j|T−j},
p*_{ij} ∼ Beta(n_ij, n_iM),
p*_{iM} = 1 − Σ_{m=1}^{M−1} p*_{im},   i = 1, …, M.

EM Algorithm

1. Expectation Step

ξ_{t|t} = (η_t ⊙ ξ_{t|t−1}) ⊘ (1'_M(η_t ⊙ ξ_{t|t−1})),
ξ_{T−j|T} = [F'(ξ_{T−j+1|T} ⊘ ξ_{T−j+1|T−j})] ⊙ ξ_{T−j|T−j},   j = 1, …, T−1.

2. Maximization Step

p̂_ij from the smoothed transition frequencies,
p̂_iM = 1 − Σ_{m=1}^{M−1} p̂_im,   i = 1, …, M.
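The forward filter and backward smoother recursions can be sketched as follows (a minimal NumPy illustration for a hypothetical two-regime chain; the densities η_t are simulated placeholders, and ⊙ and ⊘ are implemented element-wise):

```python
import numpy as np

def blhk_filter_smoother(eta, F, xi0):
    """Forward filter xi_{t|t} and backward smoother xi_{t|T}.

    eta[t] holds the conditional densities p(y_t | s_t = m, Y_{t-1});
    F is the transition matrix with F[i, j] = Pr(s_t = j | s_{t-1} = i)."""
    T, M = eta.shape
    xi_tt = np.empty((T, M))       # filtered regime probabilities
    xi_pred = np.empty((T, M))     # one-step predicted regime probabilities
    x = xi0
    for t in range(T):
        xi_pred[t] = F.T @ x                    # prediction step
        num = eta[t] * xi_pred[t]
        xi_tt[t] = num / num.sum()              # filter update, cf. (5.6)
        x = xi_tt[t]
    xi_tT = np.empty((T, M))
    xi_tT[-1] = xi_tt[-1]
    for t in range(T - 2, -1, -1):              # smoother recursion, cf. (5.13)
        xi_tT[t] = (F @ (xi_tT[t + 1] / xi_pred[t + 1])) * xi_tt[t]
    return xi_tt, xi_tT

# Hypothetical example: persistent two-regime chain, simulated densities.
F = np.array([[0.9, 0.1], [0.2, 0.8]])
rng = np.random.default_rng(3)
eta = rng.random((50, 2)) + 0.1
xi_tt, xi_tT = blhk_filter_smoother(eta, F, np.array([0.5, 0.5]))
```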
9.2. Comparison of the Gibbs Sampler with the EM Algorithm 171
For a given state vector ξ, the regression step is based on the same estimation procedures
established for ML estimation via EM, in which ξ̂_{t|T} is substituted by ξ_t. If
the priors are flat, then the estimates maximize the likelihood function and the ML
estimates are derived as the mean of the posterior distribution.
To make the procedure a bit more transparent it may be helpful to compare the Gibbs
sampler with the EM algorithm. There is a superficial similarity. Suppose that interest
centers on ML estimation, such that the priors are flat. Then the multi-stage
Gibbs cycle results in the following sampling instructions:³
Iterating the Gibbs cycle N times, N → ∞, produces the joint posterior distribution
of (λ, ξ) and thereby the marginal posterior distribution of λ. The ML estimator for
λ is the maximizer of this function. In other words, each draw of the Gibbs sampler
can be considered as the ML estimate plus noise.
The EM iteration produces the most probable draw of the Gibbs sampler. Instead of
sampling the regimes and parameters from the posterior distribution as in the Gibbs
sampler, at each iteration of the EM algorithm the means of the conditional probability
distributions, ξ̂_{t|T} (expectation step) and λ̂ (maximization step), are calculated.
At each iteration the EM algorithm maximizes

where Pr(ξ|Y_T, λ^(j−1)) is the predictive density of ξ given the observations Y_T and
³Obviously, equation (9.1) is a simplification since the parameter vector λ is further decomposed. But
this does not substantially affect the following considerations.
the parameter vector λ^(j−1) derived at the preceding iteration. As shown by HAMILTON
[1990], the EM algorithm converges to the ML estimator λ̂, where λ̂ maximizes
the likelihood function

∝ ∏_{t=1}^T Σ_{ξ_t} p(y_t|ξ_t, Y_{t−1}; λ) Pr(ξ_t|Y_T, λ).   (9.4)

Therefore, under the condition of flat priors, the ML estimate λ̂ is the fixed point of
the EM sequence as well as the mode of the posterior probability density function
p(λ|Y_T) from which the Gibbs sampler is drawing λ.
While the EM algorithm is less computationally demanding, it does not directly provide
the posterior distribution of the parameters or an estimate of the variance-covariance
matrix. Current estimation theory delivers only information about the
asymptotic distribution of λ̂.
Since methodological aspects have already been dealt with, we can now concentrate
our interest on technical issues. As a basis for the following discussion, the estimation
methods introduced in Chapter 6 and Chapter 8 are outlined in Table 9.4 and
Table 9.5.
Retracing the logic behind these two iterative procedures, we see that the inputs for
the regression step are given by the observations y, X and the vector of simulated regimes
ξ (respectively the vector of smoothed regime probabilities ξ̂_{t|T}), which have
been produced by the BLHK filter and smoother or via simulation. These are taken at
the iteration as if they were the true (though unobserved) vectors of regimes. Each
pair of observed dependent and exogenous (or lagged dependent) variables y_t, x_t
9.3. Estimation of VAR Parameters for Given Regimes 173
where u_t ∼ N(0, I_K) and ȳ_mt = X_{0t}γ₀ + X_{mt}γ_m. There is a dummy variable
corresponding to each regime, and the dummy variable that corresponds to regime m
will take the value unity if regime m has been drawn for s_t by the Gibbs sampler.
In the context of the EM algorithm, ξ_mt stands for the smoothed probability ξ̂_{mt|T}.
To get a better insight into this feature it may be worth noting that the estimator used
coincides with the GLS estimator of the following regression model:
y = Xγ + u,   u ∼ N(0, W),   W = diag(Ξ₁⁻¹ ⊗ Σ₁, …, Ξ_M⁻¹ ⊗ Σ_M).   (9.5)
Since the inverse of Ξ_m does not really exist, this formulation of the set of regression
equations is only a theoretical construct which features formal equivalence.
However, Var(y_t|ξ_t) → ∞ ensures that the likelihood of observing a triple
(y_t, x_t, ξ_mt) is identical to zero. Hence, the observation y_t cannot be produced (with
a positive probability) from regime m.
This linear statistical model implies an ML estimator which is exactly the GLS
estimator introduced in Chapter 6 and Chapter 8:

γ̂ = (X'W⁻¹X)⁻¹ X'W⁻¹ y.   (9.6)
As one can easily imagine, this set of regression equations combines many features
better known from pooled time-series and cross-sectional data using dummy variables
(cf. JUDGE et al. [1988, ch. 11.4]).
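The dummy-variable regression described above reduces, regime by regime, to weighted least squares in which each observation enters with its regime weight ξ_mt (a 0/1 indicator for the Gibbs sampler, a smoothed probability for EM). A minimal single-regime sketch (NumPy assumed; the data and weights are simulated for illustration):

```python
import numpy as np

def weighted_ls(X, y, w):
    """Weighted LS: solve (X' diag(w) X) b = X' diag(w) y."""
    Xw = X * w[:, None]                  # each row scaled by its regime weight
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

rng = np.random.default_rng(5)
T = 400
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + 0.1 * rng.standard_normal(T)
w = rng.random(T)                        # hypothetical smoothed regime probabilities
beta_hat = weighted_ls(X, y, w)
```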
Concerning the Gibbs sampler, we have ξ_mt = I(s_t = m) ∈ {0, 1}, and we can
eliminate those equations where ξ_mt = 0, m = 1, …, M, and pool the remaining
TK equations of the MTK-dimensional system (9.5) into the following system:

y = X₀γ₀ + Σ_{m=1}^M (Ξ_m X̄_m ⊗ I_K) γ_m + u.

For convenience, only this pooled regression equation is given in the tables.
If the regressors are identical for each equation y_k and regime m, X_m = X̄_m ⊗ I_K,
equation (9.7) will yield for the particular MS-VAR models
This formulation of the estimator has two advantages compared to the standard GLS
form of Chapters 6 & 8. First of all, it requires only multiplications with weighting
matrices of the order (TK × TK), whereas formula (9.6) would require multiplying
matrices up to order (MTK × MTK). Thus the computational burden at each iteration
is not much higher than M times the effort for a time-invariant VAR model.
For example, the ML estimator β̂ and the mean β̄ of the posterior conditional distribution
of the MSIA model are identical to those of the MSIAH model (cf. Tables 9.10 and
9.11). This is due to the presence of only regime-dependent parameters, which are
estimated regime by regime. In both models homoskedasticity prevails within each
regime, and the GLS estimator reduces to an LS estimation. The GLS estimates can
then be calculated faster by a simple LS estimation:
Regression Equation (cases: a. homoskedasticity; b. heteroskedasticity; b. identical regressors X_m = X̄_m ⊗ I_K).

Definitions

W⁻¹ (MTK×MTK) = diag(Ξ₁ ⊗ Σ₁⁻¹, …, Ξ_M ⊗ Σ_M⁻¹),
y (TK×1) = (y'₁, …, y'_T)'.
If all parameters γ are regime-dependent, i.e. there are no common parameters, then
X_m = (0, …, 0, X̄_m, 0, …, 0). Thus, in an MSIAH(M)-VAR(p) model, each
parameter vector γ_m can be estimated separately,
where each observation at time t and regime m is weighted with the smoothed probability
ξ̂_{t|T}.
Under uninformative priors, the mean of the posterior distribution of the VAR parameters
γ̄ is technically almost identical⁴ to the ML estimator γ̂, where the vector of
smoothed regime probabilities ξ̂ is substituted with the drawn regime vector⁵ ξ and
the remaining parameters are also drawn by the Gibbs sampler.
Despite their conceptual differences, the technical similarities of the regressions involved
in the Gibbs sampler and the EM algorithm justify considering the estimators
together.
As we have seen in the presentation of the Gibbs sampler in Chapter 8, the estimation
procedures involved are conditioned to a higher degree than those at the maximization
step of the EM algorithm. However, in principle, the partitioning of the
parameter vector γ and conditioning on the rest of the parameters can be done in the
same way within the EM algorithm.
⁴This technical point of view should not neglect the alternative theoretical foundations in classical and
Bayesian statistics on which the EM algorithm of ML estimation and the Gibbs sampler are built,
respectively.
⁵Remember that ξ is sampled from the discrete probability distribution ξ̂_{t|T} = E[ξ_t|Y_T], which is used
by the EM algorithm.
Regression Equation.

Heteroskedasticity: Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m.

γ̂₀ = (X'₀W₀⁻¹X₀)⁻¹ X'₀W₀⁻¹ Y₀,   Y₀ = y − Σ_{m=1}^M (Ξ_m ⊗ I_K) X_m γ_m,
Var(γ̂₀|·) = (X'₀W₀⁻¹X₀)⁻¹,   W₀⁻¹ = Σ_{m=1}^M Ξ_m ⊗ Σ_m⁻¹.

b. Identical Regressors X_m = X̄_m ⊗ I_K:

Var(γ̂_m|·) = (X̄'_m Ξ_m X̄_m)⁻¹ ⊗ Σ_m.

3. Covariance Parameters
a. Homoskedasticity
b. Heteroskedasticity:

Σ̂_m = T_m⁻¹ U'_m Ξ_m U_m.
MSI specifications have the convenient property that the closed form of the estimator
follows immediately from the definition of the parameter vector in Table 9.6.⁶
Inserting these definitions⁷ in the formulae derived above yields (after some algebraic
manipulation) the estimators given in Tables 9.8 - 9.17. In particular for the
MSI-VAR & MSIH-VAR models and the MSIA-VAR & MSIAH-VAR models, the estimators
can be given in a very compact form.
It might be worth noting the analogy of the formulae for the MSI-VAR (MSIA-VAR) and
MSIH-VAR (MSIAH-VAR) models. This result can easily be visualized by deriving
the estimator γ̂ of the MSI-VAR model from the estimation equation given for the
MSIH-VAR model under the restriction Σ_m = Σ for all m:
⁶Note that the corresponding regressor matrices are defined in the tables accordingly.
⁷To avoid any misunderstanding: for the filtering procedures we have assumed that the parameter matrix
contains regime-dependent and regime-invariant parameters. However, it is useful for estimation purposes to
split the parameter vector γ into regime-invariant parameters γ₀ and the parameters γ_m belonging to
regime m, m = 1, …, M. See e.g. Chapter 8.
associated with the maximization step illustrates the principle of weighted LS estimation.
However, B̄' is not identical to an LS estimation of the corresponding regression
equation where the smoothed probabilities collected in Ξ̂ are substituted for the
unobserved Ξ:

B̃' = (Z*'Z*)⁻¹ Z*'Y*.
Comparing the results for the MSIAH(M)-VAR(p) model in Table 9.11 with the
estimator obtained for the MSIA(M)-VAR(p) model in Table 9.10, it turns out that
the ML estimator β̂ of the parameter vector β and the mean β̄ of the posterior distribution
of β are identical. This is due to the fact that the parameter vector β_m associated
with regime m can be estimated separately for each regime m = 1, …, M.
Thus the GLS estimator under heteroskedasticity and the weighted LS estimator under
homoskedasticity are identical. Differences in the treatment of both models concern
only the estimation of the covariance parameters, Σ versus Σ₁, …, Σ_M, which
are estimated under regime-dependent heteroskedasticity with the residuals of regime
m weighted with the smoothed regime probabilities Ξ_m.
In Table 9.13 and Table 9.14 the estimation of intercept-form VAR processes is
presented, where only the autoregressive parameters (and in the MSAH-VAR model
the variance-covariance matrix) are subject to regime shifts. Due to the conditioning
principle of the Gibbs sampler, the regime-dependent autoregressive parameters can
be estimated separately, such that GLS and LS estimation are equivalent. Thereby
the estimation of the regime-invariant intercept terms is affected by heteroskedasticity.
The interdependence of the estimators for β and Σ_m involved at the maximization
step of the EM algorithm normally requires iterating between both equations recursively
at each maximization step. However, to ensure convergence of the EM algorithm
to a stationary point of the likelihood function, in most cases it is sufficient
to run regressions conditional on some estimates of the previous maximization step
(cf. the ECM algorithm, RUUD [1991]). Therefore, we estimate β given the estimated
Σ_m of the last maximization step and vice versa in order to operationalize the
regression step. As an alternative one could consider running an interior iteration
until β and Σ converge. This problem diminishes in the Gibbs sampler due to its
construction principle of conditioning.
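This conditional "zig-zag" iteration can be sketched as follows: given the variances, β is a weighted GLS estimate; given β, the variances are weighted residual variances; the two steps alternate until convergence. A minimal univariate two-regime illustration (NumPy assumed; the data, weights, and dimensions are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(11)
T, M = 800, 2
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.array([1.0, 0.5])
s = (rng.random(T) < 0.4).astype(int)              # hypothetical regime draws
sig_true = np.array([0.3, 1.5])                     # regime-dependent error std.
y = X @ beta_true + sig_true[s] * rng.standard_normal(T)
xi = np.column_stack([s == 0, s == 1]).astype(float)  # regime indicators

# Zig-zag: beta given sigma (weighted GLS), then sigma given beta, to convergence.
sig2 = np.ones(M)
for _ in range(50):
    w = (xi / sig2).sum(axis=1)                     # weight sum_m xi_mt / sig2_m
    Xw = X * w[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)      # GLS step given variances
    u = y - X @ beta
    sig2_new = (xi * u[:, None] ** 2).sum(axis=0) / xi.sum(axis=0)
    if np.allclose(sig2_new, sig2, rtol=1e-8):      # stop when variances settle
        break
    sig2 = sig2_new
```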
1. The conditional density of y_t depends not only on the actual regime, but also
on the last p regimes.
Definitions

X (T×Kp) = (Y_{−1}, …, Y_{−p}),   X̄* (T×Kp),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
U_m (T×K) = Y − X*A'_m − (1_T ⊗ μ'),
Ū_m (T×K) = Y − X̄*_m B̄'.
2. The regression equation is no longer linear in the parameter vector γ, i.e. the
vector of means μ = (μ'₁, …, μ'_M)' and the vector of autoregressive parameters
α = (α'₁, …, α'_M)'.
Section 8.4.3 (despite the non-linear restrictions on the reduced form parameters).
The convergence of the estimates is ensured by an internal iteration
in α̂(μ, σ), μ̂(α, σ), and Σ̂(μ, α).
From these principles the maximization step of the EM algorithm and the regression
step of the Gibbs sampler ensue. The resulting explicit closed-form estimators are
given in Tables 9.18 - 9.21.
The MSM(M)-VAR(p) model considered in Table 9.18 differs from the MSMH-VAR
model in Table 9.19 by the restriction Σ₁ = … = Σ_M = Σ on the parameter
space. Inserting this restriction into the estimators of the MSMH(M)-VAR(p) model
in Table 9.19 results in a weighted LS estimation of the autoregressive coefficients α.
Meanwhile, the estimation of the mean μ involves a GLS estimator even if u_t is homoskedastic.
Since the regressor X*_{mt} is not identical in each single equation of the
vector system, the GLS-type estimation of the regime-dependent means μ remains.
Interestingly, in the MSA(M)-VAR(p) model given in Table 9.13 and the
MSAH(M)-VAR(p) model given in Table 9.14, the regime-dependent autoregressive
parameters α_m can be estimated for each regime m = 1, …, M
separately, while the regime-dependent means μ_m have to be estimated simultaneously.
This results from the presence of μ_m in the regression equations with
s_t ≠ m, whereas α_m enters the regression equation if and only if s_t = m. Moreover,
it follows that α_m is estimated with weighted LS and μ is estimated with GLS irrespective
of homoskedasticity or heteroskedasticity of the innovation process u_t. Due to the
common principle of conditioning, the regressions required by the EM algorithm
and the Gibbs sampler are identical if the estimated parameters and smoothed
probabilities are replaced by their sampled values and vice versa.
9.4 Summary
The preceding discussions have shown the power of the statistical instruments given
by the BLHK filter for the analysis of regimes, the EM algorithm for the ML estimation
of parameters, and the Gibbs sampler for simulating the posterior distribution
of parameters and the prediction density of future values involving Bayesian theory.
Various specifications have already been introduced. Nevertheless, some extensions
of the basic MS-VAR model might be useful in practice. They will be considered
briefly in the next chapter.
Before that, as an appendix, we present for particular MS-VAR models the closed-form
expressions of the GLS estimator employed at each maximization step of the
EM algorithm, respectively at each regression step of the Gibbs sampler (cf. the overview
given in Table 9.1).
9.A. Appendix: Tables 185
Regression Equation

Definitions

y (TK×1) = (y'₁, …, y'_T)',   Ū (MT×K) = 1_M ⊗ Y − Z̄B̄'.
Regression Equation

y = Σ_{m=1}^M (ξ_m ⊗ I_K)ν_m + (X ⊗ I_K)α + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m,

ν̂_m = T_m⁻¹ (ξ'_m ⊗ I_K)(y − (X ⊗ I_K)α) = T_m⁻¹ (Y − XA')' ξ_m,
Var(ν̂_m|·) = T_m⁻¹ ⊗ Σ_m,
Σ̂_m = T_m⁻¹ U'_m Ξ_m U_m.
Definitions

X (T×Kp) = (Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
X_m (T×[M+Kp]) = (1_T ⊗ ι'_m, X),
U_m (T×K) = Y − XA' − (1_T ⊗ ν'_m),
Ū_m (T×K) = Y − X_m B̄'.
Regression Equation

β̄_m = ((X'Ξ_mX)⁻¹X'Ξ_m ⊗ I_K) y,
B̂'_m = (X'Ξ_mX)⁻¹X'Ξ_m Y,
Var(β̄_m|·) = (X'Ξ_mX)⁻¹ ⊗ Σ,
Σ̂ = T⁻¹ U'ΞU.

Definitions

X (T×[Kp+1]) = (1_T, Y_{−1}, …, Y_{−p}),
y (TK×1) = (y'₁, …, y'_T)',
β_m = vec B'_m,
U (MT×K) = 1_M ⊗ Y − (I_M ⊗ X)(B₁, …, B_M)',
Ū (MT×K) = 1_M ⊗ Y − (I_M ⊗ X)(B̄₁, …, B̄_M)'.
Regression Equation

Definitions

X (T×[Kp+1]) = (1_T, Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
β_m = (ν'_m, α'_m)' = vec B'_m,
U_m (T×K) = Y − XB'_m,
Ū (T×K) = Y − XB̄'_m.
Regression Equation

y = (X ⊗ I_K)β + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m.

Definitions

X (T×[1+Kp]) = (1_T, Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
U (T×K) = Y − XB',
Ū (T×K) = Y − XB̄'.
Regression Equation

(ν̂', Â'₁, …, Â'_M)' =
[ T        ξ'₁X       ⋯  ξ'_MX
  X'ξ₁     X'Ξ₁X           0
  ⋮                 ⋱
  X'ξ_M    0         ⋯  X'Ξ_MX ]⁻¹ [ 1'_T Y;  X'Ξ₁Y;  ⋯;  X'Ξ_MY ],

Σ̂ = T⁻¹ Σ_{m=1}^M Ū'_m Ξ_m Ū_m,

α̂_m = ((X'Ξ_mX)⁻¹X'Ξ_m ⊗ I_K)(y − 1_T ⊗ ν).

Definitions

X (T×Kp) = (Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
X_m (T×[1+MKp]) = (1_T, ι'_m ⊗ X),
U_m (T×K) = Y − XA'_m − (1_T ⊗ ν'),
Ū_m (T×K) = Y − X_m B̄'.
Regression Equation

y = (1_T ⊗ I_K)ν + Σ_{m=1}^M (Ξ_m X ⊗ I_K)α_m + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m.

Definitions

X (T×Kp) = (Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
X_m (T×[1+MKp]) = (1_T, ι'_m ⊗ X),
U_m (T×K) = Y − XA'_m − (1_T ⊗ ν'),
Ū_m (T×K) = Y − X_m B̄'.
Regression Equation

y = (1_T ⊗ I_K)μ + (X* ⊗ I_K)α + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m.

Definitions

X* = X − (1_T 1'_p ⊗ μ'),   A(1) (K×K) = I_K − Σ_{i=1}^p A_i,
X (T×Kp) = (Y_{−1}, …, Y_{−p}),   X̄* = X − (1_T 1'_p ⊗ μ̄'),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
U (T×K) = Y − X*A' − (1_T ⊗ μ').
Regression Equation

y = (1_T ⊗ I_K)μ + Σ_{m=1}^M (Ξ_m X* ⊗ I_K)α_m + u,   u ∼ N(0, Ω),   Ω = I_T ⊗ Σ.

Definitions

X* = X − (1_T 1'_p ⊗ μ'),   Ā_m(1) (K×K) = I_K − Σ_{j=1}^p A_mj,
X (T×Kp) = (Y_{−1}, …, Y_{−p}),   X̄* = X − (1_T 1'_p ⊗ μ̄'),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
U_m (T×K) = Y − X*A'_m − (1_T ⊗ μ'),
Ū_m (T×K) = Y − X̄*_m B̄'.
Regression Equation

y = (1_T ⊗ I_K)μ + Σ_{m=1}^M (Ξ_m X* ⊗ I_K)α_m + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m,

μ̂ = (Σ_{m=1}^M T_m A_m(1)' Σ_m⁻¹ A_m(1))⁻¹ (Σ_{m=1}^M (ξ'_m ⊗ A_m(1)' Σ_m⁻¹)(y − (X ⊗ I_K)α̂_m)),
μ̄ = (Σ_{m=1}^M T_m Ā_m(1)' Σ̄_m⁻¹ Ā_m(1))⁻¹ (Σ_{m=1}^M (ξ'_m ⊗ Ā_m(1)' Σ̄_m⁻¹)(y − (X ⊗ I_K)ᾱ_m)).

Definitions

X* = X − (1_T 1'_p ⊗ μ'),   A_m(1) (K×K) = I_K − Σ_{j=1}^p A_mj,
X (T×Kp) = (Y_{−1}, …, Y_{−p}),   X̄* = X − (1_T 1'_p ⊗ μ̄'),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
U_m (T×K) = Y − X*A'_m − (1_T ⊗ μ'),
Ū_m (T×K) = Y − X̄*_m B̄'.
Regression Equations

α̂ = [{(Σ_{m=1}^M Σ_{n(m)} X*'_n Ξ_n X*_n)⁻¹ (Σ_{m=1}^M Σ_{n(m)} X*'_n Ξ_n)} ⊗ I_K] (y − 1_T ⊗ μ̂_m),

Σ̂ = T⁻¹ Σ_{m=1}^M Σ_{n(m)} U'_n Ξ_n U_n,

Var(α̂|·) = (Σ_{m=1}^M Σ_{n(m)} X*'_n Ξ_n X*_n)⁻¹ ⊗ Σ.
Regression Equations

Σ̂_m = T_m⁻¹ Σ_{n(m)} U'_n Ξ_n U_n,

Var(α̂|·) = (Σ_{m=1}^M (Σ_{n(m)} X*'_n Ξ_n X*_n) ⊗ Σ_m⁻¹)⁻¹,

Σ̂ = T⁻¹ Σ_{m=1}^M Σ_{n(m)} Ū'_n Ξ_n Ū_n.
Regression Equations

u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Σ_{n(m)} Ξ_n ⊗ Σ_m.

EM Algorithm: Maximization Step

Σ̂_m = T_m⁻¹ Σ_{n(m)} U'_n Ξ_n U_n.
Chapter 10
In the preceding chapters we have made three essential assumptions with regard to
the specification of MS-VAR processes: we have assumed that (i.) the system is
autonomous, i.e. no exogenous variables enter into the system, (ii.) the regime-dependent
parameters depend only on the actual regime but not on its former history,
and (iii.) the hidden Markov chain is homogeneous, i.e. the transition probabilities
are time-invariant. As we have seen in the foregoing discussion, these assumptions
allow for various specifications. Modelling with MS-VAR processes is discussed extensively
in the last part of this study for some empirical investigations related to
business cycle analysis. However, there might be situations where the assumptions
made about the MS-VAR model result in limitations for modelling.
Therefore, in this chapter we will introduce three extensions of the basic MS-VAR
model. In Section 10.1 we will consider systems with exogenous variables; in Section
10.2 the MSI(M)-VAR(p) model is generalized to an MSI(M, q)-VAR(p) model
with intercept terms depending on the actual regime and the last q regimes, thus exhibiting
distributed lags in the regimes. In a third section we discuss MS-VAR models
with time-varying transition probabilities and endogenous regime selection, i.e.
specifications where the transition probabilities are functions of observed exogenous
or lagged endogenous variables.
The natural way to introduce these variables is to generalize the MS-VAR model to
a dynamic simultaneous equation model with Markovian regime shifts:

where w_t ∼ NID(0, Σ(s_t)) and y_t = (y_{1t}, …, y_{Kt})' is a K-dimensional vector
of endogenous variables, and the A_i and B_j are coefficient matrices. The vector x_t of
exogenous variables may contain stochastic components (e.g. policy variables) and
non-stochastic components (e.g. seasonal dummies). The intercept ν has not been
included in the vector x_t.
In the following we will focus on the reduced form of the system, which can be obtained
by premultiplying (10.1) with A₀⁻¹:
¹Conversely, it may be interesting to check whether the regime shift stands in for changes
in an omitted or unobservable variable (world business cycle, state of confidence, oil price, etc.).
10.1. Systems with Exogenous Variables 201
time-varying dynamic multipliers D_j(s_t), while D(L) = Σ_{j=0}^∞ D_j L^j = A(L)⁻¹B(L) is
time invariant iff A(L) and B(L) are time invariant.
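The multipliers D_j can be computed recursively from A(L)D(L) = B(L). The sketch below assumes the normalization A(L) = A₀ − A₁L − … − A_pL^p and B(L) = B₀ + B₁L + … + B_qL^q, which may differ from the text's exact convention; the matrices are hypothetical:

```python
import numpy as np

def dynamic_multipliers(A, B, A0, horizon):
    """D_j from A(L) D(L) = B(L), with A(L) = A0 - sum_i A[i-1] L^i."""
    K = A0.shape[0]
    A0_inv = np.linalg.inv(A0)
    D = []
    for j in range(horizon + 1):
        acc = (B[j] if j < len(B) else np.zeros((K, K))).copy()
        for i, Ai in enumerate(A, start=1):     # add the feedback through A(L)
            if j - i >= 0:
                acc = acc + Ai @ D[j - i]
        D.append(A0_inv @ acc)                  # premultiply by A0 inverse
    return D

# Hypothetical bivariate system: one autoregressive lag, one exogenous lag.
A0 = np.eye(2)
A = [np.array([[0.5, 0.1], [0.0, 0.4]])]
B = [np.eye(2), np.array([[0.2, 0.0], [0.0, 0.2]])]
D = dynamic_multipliers(A, B, A0, horizon=20)
# D_j -> 0 as j grows because A(L) is stable.
```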
L(λ) = ∫ p(Y|ξ, Z, B) Pr(ξ|ξ₀, ρ) dξ = ∏_{t=1}^T η'_t ξ̂_{t|t−1},

where η_t denotes the vector of conditional densities of y_t, ξ̂_{t|t−1} the vector of predicted
regime probabilities, and X_{t−1} = (x'_{t−1}, x'_{t−2}, …, x'_1)', Z = X_T. Then, the estimation of the parameter
vectors

b = vec(B₀, B₁, …, B_q) or b_m = vec(B_{0.m}, B_{1.m}, …, B_{q.m}), m = 1, …, M,

respectively, can be obtained in the same manner as for the intercept parameter vector
ν_m in the previous chapters. For example, the normal equations of the ML estimator
are given by

∂ ln L / ∂b' = 0.
may be introduced into the state equation, where they determine the probabilities of
regime transitions; e.g. the transition probabilities p_ij(x_{t−d}) could be a function of
some observed exogenous variables at time t − d such that
202 Extensions of the Basic MS-VAR Model
As a generalization of the MSI(M)-VAR(p) models, one may assume that the intercept
term depends not only on the actual regime but in addition on the last q regimes,

where the M^{q+1} = M^{p+1} different (K × 1) intercept terms are functions of the
M different (K × 1) mean vectors and the (K × K) autoregressive matrices A_j,
j = 1, …, p. We have seen in the context of the MSM(M)-VAR(p) model that the
problem of lagged regimes in the conditional density of the observable variables can
be avoided by redefining the relevant state vector. However, such an unrestricted procedure
increases the dimension of the state vector dramatically, and without further
restrictions this leads to a parameter inflation.
Therefore we will not generalize this model further by relaxing the assumption of
additivity. In particular, for two-regime models M = 2 with q = 1, this assumption
is not restrictive since M^{q+1} = (q + 1)M:
The MSI(M, q)-VAR(p) model is of particular interest for multiple time series, as
Section 7.1 has indicated. For example, {y_{1t}} may be a leading indicator for {y_{2t}},
where the lead is d periods:
Thus, one would observe the effects of a change in regime in the first time series d
periods before the shift affects the second time series.
A(L)(y_t − μ_y) = …,
F(L)ζ_t = v_t,
ζ_t = (ξ_{1t} − ξ̄₁, …, ξ_{M−1,t} − ξ̄_{M−1})'.
Proof. The proof is a simple extension of the proof for MSI(M)-VAR(p) processes.
The stable VAR(1) process {ζ_t} possesses the vector MA(∞) representation
ζ_t = F(L)⁻¹v_t. Since the inverse matrix polynomial can be reduced to the
inverse of the determinant |F(L)|⁻¹ and the adjoint matrix F(L)*, we have ζ_t =
|F(L)|⁻¹ F(L)* v_t. Inserting this transformed state equation into the measurement
equation results in

y_t − μ_y   (10.5)
x_t   (10.6)

where the state vector consists of p adjoining observable vectors {y_{t−j}}_{j=0}^{p−1} and q+1
unobservable regime vectors {ζ_{t+1−j}}_{j=0}^q:
x_t = (y'_t, …, y'_{t−p+1}, ζ'_{t+1}, …, ζ'_{t−q+1})',

G = [ A₁  ⋯  A_{p−1}  A_p   M₁  ⋯  M_{q−1}  M_q
      I_K        0     0     0            0
      ⋮     ⋱
      0     I_K  0     0            0
      0          0     F'    0   ⋯   0
      0                I_{M−1}  ⋯  0   0 ],

u_t = (u'_t, 0, …, 0, v'_t, 0, …, 0)'.
10.3. The Endogenous Markov-Switching Vector Autoregressive Model 205
The statistical analysis of MSI(M, q)-VAR(p) models can also easily be performed as
a straightforward extension of the by now familiar methods. It should come as no surprise
that the MSI(M, q)-VAR(p) model can be treated analogously to the MSM(M)-VAR(p)
model. Define the relevant state vector as
In the foregoing we have assumed that the hidden Markov chain is homogeneous,
such that the matrix of transition probabilities, P_t = P_{t−1} = … = P, is constant
over time. In their classical contribution, GOLDFELD & QUANDT [1973] have proposed
an extension of the approach by allowing the elements of the transition matrix
to be functions of an extraneous variable z_t. For M = 2 regimes we would have for
example
206 Extensions of the Basic MS- VAR Model
This approach has been called by GOLDFELD & QUANDT [1973] the "r(z)-
method". If the underlying model is a vector autoregression, we will refer to it as
a generalized Markov-switching vector autoregressive model or GMS(M)-VAR(P)
model. If some transition probabilities depend on the lagged endogenous variable
y_{t−d}, d > 0, i.e. Pr(s_t = i | s_{t−1} = j, y_{t−d}) = p_{ij}(y_{t−d}'δ), then the resulting model
will be termed endogenous selection Markov-switching vector autoregressive model
or EMS(M, d)-VAR(P) model.
DIEBOLD et al. [1994] consider Markov-switching models with exogenous switch-
ing, but without lagged endogenous variables. Markov-switching models with endo-
genous switching (but again without lagged endogenous variables) are considered by
RIDDER [1994]. In particular, DIEBOLD et al. [1994] have discussed a modification
of an MSI(2)-AR(0) model in which the transition probabilities can vary with fun-
damentals. The transition probabilities Pr(s_t | s_{t−1}, z_t) are parameterized by use of
logit transition functions as
Hence, the matrix P_{t−1} of transition probabilities Pr(s_t | s_{t−1}, z_t) equals

P_{t−1} = [ exp(z_t'δ_1)/(1 + exp(z_t'δ_1))    1/(1 + exp(z_t'δ_1)) ]      (10.8)
          [ exp(z_t'δ_2)/(1 + exp(z_t'δ_2))    1/(1 + exp(z_t'δ_2)) ].
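The logit parameterization can be sketched numerically. The function name and the row layout (row m holding the probabilities of the next regime given s_{t−1} = m) are illustrative assumptions, not the author's code:

```python
import numpy as np

def logit_transition_matrix(z_t, delta1, delta2):
    """Time-varying 2x2 transition matrix in the spirit of eq. (10.8).

    Row m holds (Pr(s_t = 1 | s_{t-1} = m, z_t), Pr(s_t = 2 | s_{t-1} = m, z_t)),
    each parameterized by a logit function of z_t.
    """
    p1 = np.exp(z_t @ delta1) / (1.0 + np.exp(z_t @ delta1))
    p2 = np.exp(z_t @ delta2) / (1.0 + np.exp(z_t @ delta2))
    return np.array([[p1, 1.0 - p1],
                     [p2, 1.0 - p2]])

# illustrative regressor and coefficient values
P = logit_transition_matrix(np.array([1.0, 0.5]),
                            np.array([2.0, 0.0]),
                            np.array([-1.0, 0.0]))
# each row of P sums to one by construction
```

By construction every row is a valid probability distribution for any value of z_t, which is the point of the logit transform.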
In contrast to the STAR model, the effects of the variable z_t on the probability distri-
bution of forthcoming regimes depend on the actual regime. An alternative model
might concern asymmetric and stochastic policy multipliers. Suppose, for example,
that a tight monetary policy is more effective in stopping "booms" than an expan-
sionary monetary policy is in initiating upswings.2 Then the policy variable z_t
would affect the transition probability of slumping from a "boom" (s_t = 1) into
a "recession" (s_t = 2), Pr(s_t = 2 | s_{t−1} = 1, z_t), while the transition probabilities
out of a recession, Pr(s_t | s_{t−1} = 2, z_t) = Pr(s_t | s_{t−1} = 2), remain unaffected.
This model is also well-suited to incorporate deterministic elements of regime
switching. Suppose one expects regime m* = 1 to prevail for a given period
2 Such asymmetries of monetary policy are considered in GARCIA & SCHALLER [1995] for the United
States.
10.3. The Endogenous Markov-Switching Vector Autoregressive Model 207
of time T_r. Define a dummy variable d_t such that d_t = I(t + 1 ∈ T_r) and let
z_t = (1, d_t)'. Finally, denote P_{t−1} in a slightly different form to equation (10.8)
with the unrestricted parameter vector δ = (b_{0,1}, b_{0,2}, b_1),

P_{t−1} = [ exp(b_{0,1} + b_1 d_t)/(1 + exp(b_{0,1} + b_1 d_t))    1/(1 + exp(b_{0,1} + b_1 d_t)) ]
          [ exp(b_{0,2} + b_1 d_t)/(1 + exp(b_{0,2} + b_1 d_t))    1/(1 + exp(b_{0,2} + b_1 d_t)) ].
Then for b_1 → ∞, the probability of being in regime 1 goes to one at any point t in
time with t + 1 ∈ T_r,

P_{t−1} = [ 1  0 ]
          [ 1  0 ],
while the transition probabilities are given for the remaining periods t with t + 1 ∉ T_r
by

P_{t−1} = P = [ exp(b_{0,1})/(1 + exp(b_{0,1}))    1/(1 + exp(b_{0,1})) ]  =  [ p_11       1 − p_11 ]
              [ exp(b_{0,2})/(1 + exp(b_{0,2}))    1/(1 + exp(b_{0,2})) ]     [ 1 − p_22   p_22     ],

where p_11 = exp(b_{0,1})/(1 + exp(b_{0,1})) and p_22 = 1/(1 + exp(b_{0,2})).
A slightly different model can be achieved by introducing the dummy variable z_t
only in the transition functions for regime 1, Pr(s_t | s_{t−1} = 1, z_t), but not in those of
regime 2: Pr(s_t | s_{t−1} = 2, z_t) = Pr(s_t | s_{t−1} = 2). As previously mentioned, the
deterministic event leads to an immediate jump at time t into a special regime, say
m*. After the regime prevails, the transition probabilities are unaltered compared to
the former history. By using the dummy variable approach, we define d_{τ−1} = 0 for τ ≠ t,
and d_{t−1} = 1. This implies the following expected evolution of regimes:
ξ̂_{t+h|t−1} = ι_{m*}                          for h ∈ T_r,
ξ̂_{t+T_{m*}+h−1|t−1} = (P')^{h−1} ι_{m*}      afterwards,
where it is assumed that the intervention period is a compact period T_r with length
T_{m*}.
Thus, the EMS-VAR model implies a feedback from the observational process to the
state process, which can be exemplified by setting up the likelihood function of an
EMS-VAR model. In contrast to MS-VAR models with an exogenous Markov chain
as the regime-generating process, the likelihood function cannot be written in the
form of a finite mixture of conditional densities p(Y | Y_0, ξ; λ) with positive mixing
proportions Pr(ξ) = Σ_{s_0} ξ̄_{s_0} Π_{t=1}^{T} p_{s_{t−1}s_t} as in (6.6). For this reason
the identifiability arguments invoked in Section 6.2 cannot be applied to the EMS(M)-
VAR(P) model. In RIDDER [1994], identifiability and consistency of ML estimation
is checked for endogenous selection models without autoregressive dynamics, i.e. only
for EMS(M)-VAR(0) models. Hence the properties of the statistical procedures to be
discussed in the next sections merit further investigation.
We will now show how the filtering algorithm and the estimation procedures have to
be modified to handle the case of non-homogeneous Markov chains.
Since P_{t−1} = Π(y_{t−d}) is known at time t − 1, the Bayesian calculations of the
last sections remain valid, even if endogenous selection of regimes is assumed. For
example, the posterior probabilities Pr(ξ_t | y_t, Y_{t−1}) are given by invoking Bayes'
law as

Pr(ξ_t | Y_t) = Pr(ξ_t | y_t, Y_{t−1}) = p(y_t | ξ_t, Y_{t−1}) Pr(ξ_t | Y_{t−1}) / p(y_t | Y_{t−1})      (10.11)

with the a priori probability

where Pr(ξ_t | ξ_{t−1}, y_{t−d}) has replaced the simple transition probability Pr(ξ_t | ξ_{t−1})
and the density p(y_t | Y_{t−1}) is again
Hence we only have to take into account that the Markov chain is no longer homo-
geneous. The necessary adjustments of the filtering and smoothing algorithms affect
only the transition matrix F, which is now time-varying:

ξ̂_{t+1|t} = F_t (η_t ⊙ ξ̂_{t|t−1}) / ( 1'(η_t ⊙ ξ̂_{t|t−1}) ).      (10.12)
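One step of this filter can be sketched as follows. The column-stochastic storage of F_t and all concrete numbers are illustrative assumptions:

```python
import numpy as np

def filter_step(xi_pred, eta_t, F_t):
    """One step of the filter in eq. (10.12): update the predicted regime
    probabilities with the conditional densities eta_t, normalize, then
    predict one step ahead with the (possibly time-varying) F_t."""
    post = eta_t * xi_pred          # elementwise product eta_t (*) xi_{t|t-1}
    post = post / post.sum()        # normalize: xi_{t|t}
    return F_t @ post               # predict: xi_{t+1|t}

xi = np.array([0.5, 0.5])                   # xi_{t|t-1}
eta = np.array([0.9, 0.1])                  # regime-conditional densities p(y_t | s_t, .)
F = np.array([[0.9, 0.3],
              [0.1, 0.7]])                  # columns sum to one (column-stochastic)
xi_next = filter_step(xi, eta, F)
```

With an endogenous or exogenous z_t one would simply recompute F_t at every step; the recursion itself is unchanged.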
DIEBOLD et al. [1994] have proposed a modification of the EM algorithm that can be
used to estimate the parameter vector entering into the transition functions. The use
of the Gibbs sampler has been suggested by FILARDO [1994] and GHYSELS [1994].
While the MS-VAR model with constant transition probabilities has been recognized
as a non-normal, linear state-space model (see Section 2), the EMS-VAR model can
be described as a non-normal, non-linear state-space model, where the non-linearity
arises in the transition equation.
In order to motivate our procedure, let us consider at first the treatment of non-linear
models which are more established in the literature. Again, if the innovations v_t were
normal, v_t ∼ NID(0, Σ_v), we would have a normal, non-linear state-space model.
For this kind of model the extended Kalman filter (cf. e.g. HAMILTON [1994a]) is of-
ten an efficient approach. The idea behind the extended Kalman filter is to linearize
the transition equation and to treat the Taylor approximation at ξ_t = ξ̂_{t|t} as if it were
the true model. These procedures result in an augmented time-varying coefficient
version of a linear state-space model, for which the iterations needed for deriving
the smoothed states ξ̂_{t|T} are well-established. It can be easily verified that the mod-
ified EM algorithm proposed by DIEBOLD et al. [1994] is a straightforward applic-
ation of these ideas developed for the normal non-linear state-space model to MS-
VAR models with time-varying transition probabilities. Thus the statistical analysis
of these models can be embedded in the EM algorithm, which has been discussed in
Chapter 6, for the MS-VAR model with time-invariant transition probabilities.
∂p_mm/∂δ' = p_mm (1 − p_mm) z_t'

∂p_t/∂δ' = [  p_11(1 − p_11) z_t'         0'                      ]
           [  0'                          −p_21(1 − p_21) z_t'    ]
           [  −(1 − p_12) p_12 z_t'       0'                      ]
           [  0'                          (1 − p_22) p_22 z_t'    ]

         = diag( p ⊙ (ι − p) ) [  z_t'    0'   ]
                               [  0'    −z_t'  ]
                               [ −z_t'    0'   ]
                               [  0'     z_t'  ]

10.4. Summary and Outlook 211
Since the resulting first-order condition is non-linear, DIEBOLD et al. [1994] suggest
a linear approximation at δ^{l−1}.
It may be worth noting that HOLST et al. [1994] have proposed to estimate the logits
of the transition probabilities, ln(p_ij/(1 − p_ij)), of a homogeneous Markov chain rather than
the p_ij themselves. This reparametrization is useful especially for the determination of the
information matrix and thus for the variance-covariance matrix.
In the Burns-Mitchell tradition, the identification of turning points has been con-
sidered as the principal task of empirical business cycle research. While the NBER
methodology has been criticized as "measurement without theory" (cf. KOOPMANS
[1947]), the statistical measurement of business cycles is still worth studying.
214 Markov-Switching Models of the German Business Cycle
[Figure 11.1: Real gross national product of West Germany (seasonally adjusted, constant prices), 1960-1994]
1 See inter alia ALBERT & CHIB [1993], DIEBOLD et al. [1994], GHYSELS [1994], GOODWIN
[1993], HAMILTON & SUSMEL [1994], KÄHLER & MARNET [1994a], KIM [1994], KROLZIG
& LÜTKEPOHL [1995], LAM [1990], MCCULLOCH & TSAY [1994a], PHILLIPS [1991] and
SENSIER [1996].
2 As an alternative to MS-AR models of real GNP growth rates, it would be possible to model fluctu-
ations in the utilization rate of potential output, which is preferred in other definitions of the business
cycle (cf. e.g. OPPENLÄNDER [1995]). However, this approach requires the measurement of potential
output and would heavily depend on the quality of the constructed time series. For these reasons we
followed the standard assumptions in the relevant literature.
the seasonally adjusted quarterly GNP data for West Germany from 1960 to 1994.
The overall objectives of this analysis of the German business cycle are (i.) to illus-
trate the as yet theoretically derived properties of MS-AR models, (ii.) to demon-
strate the feasibility of the approach developed in this study for empirical analysis,
and (iii.) to examine the potential role of MS-AR models in forecasting. In con-
trast to the previous literature, statistical characterizations of the business cycle are
examined for a broad range of model specifications. In particular, we will exam-
ine whether the proposed models can essentially replicate traditional business cycle
classifications by employing stochastic models that are parsimonious, statistically
satisfactory and economically meaningful.
This chapter will proceed as follows. In the tradition of HAMILTON [1989], Markov-
switching autoregressive processes in growth rates of the real gross national product
(GNP) are interpreted as stochastic business cycle models. In the following section
the data are presented. Traditional characterizations of the German business cycle
are considered as a benchmark for the following analysis. The strategies introduced
in Chapter 7 for simultaneously selecting the number of regimes and the order of the
autoregression in Markov-switching time series models based on ARMA representa-
tions are used. Maximum likelihood (ML) estimations of the alternative models have
been performed with versions of the EM algorithm introduced in Chapter 6. The
estimation procedures were implemented in GAUSS 3.2.
The presentation begins with the HAMILTON [1989] model. This MSM(2)-AR(4)
model illustrates the implications of the Markov-switching autoregressive model for
the stylized facts of the business cycle. It is shown that the MSM(2)-AR(4) model
cannot be rejected in the class of MSM(2)-AR(P) models. Then we will remain in
the two-regime world and compare the Hamilton model to specifications where the
intercept term is shifting (MSI(M)-AR(P) models). In further steps, the assumption
where Δy_t is 100 times the first difference of the log of real GNP and the conditional
mean μ(s_t) switches between two states (M = 2),
and the variance σ² is constant. The effect of the regime s_t on the growth rate
Δy_t is illustrated with the conditional probability density function p(Δy_t | s_t) in Fig-
ure 11.2.3
3 The plotted p(Δy_t | s_t) are constructed analogously to equation (11.4) using the regime classifica-
tions of the estimated MSM(2)-AR(4) model (cf. Section 11.3). As Δy_t is neither independently nor
identically distributed, Figure 11.2 cannot be considered as a viable kernel density estimate.
11.2. Preliminary Analysis 217
[Figure 11.2: Conditional probability density functions p(Δy_t | s_t) in recession and boom]
11.2.1 Data
While the definition of the business cycle proposed by BURNS & MITCHELL [1946]
emphasizes co-movements in the dynamics of many economic time series, we will
restrict our investigation to a broad macroeconomic aggregate: the gross national
product (GNP) in constant prices of West Germany from 1960 to 1994, which is
[Figure 11.3: Quarterly growth rate of seasonally adjusted West German GNP]
plotted in Figure 11.1. More precisely, we are going to model the quarterly growth
rate of the seasonally adjusted series given in Figure 11.3. The data consist of 132
quarterly observations for the period 1962:1 to 1994:4 (excluding presample values).
Data sources are the Monatsberichte of the Deutsche Bundesbank and, for the data
before 1979, the Quarterly National Accounts Bulletin of the OECD.
The presence of unit roots in the data has been checked by the augmented DICKEY-
FULLER (ADF) test [1979], [1981]. For the null hypothesis of a unit root, i.e. H_0:
π = 0 in the regression

Δy_t = φ + Σ_{i=1}^{p−1} ψ_i Δy_{t−i} + π y_{t−1} + u_t,      (11.1)
the test statistic gives -1.8778 (with p = 12) and -1.85961 (with p = 8). At a 10%
significance level, the null of a unit root in y_t cannot be rejected. For the differenced
time series Δy_t, the ADF test rejects the unit root hypothesis at the 1% significance
level (with test statistics of -4.2436 and -4.0613). Thus, y_t was found to be integrated
of order 1. In the appendix, we also show that the Hodrick-Prescott filter does not
produce a detrended time series with satisfying statistical characteristics. Therefore,
the data are detrended by differencing. The potential importance of structural breaks
for this result has been emphasized by PERRON [1989]. In contrast to this view we
will now consider the MS-AR model, where the presence of regime shifts and unit
roots is assumed.
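The ADF regression (11.1) can be sketched by ordinary least squares. This is an illustrative re-implementation, not the routine used in the study, and the resulting t-statistic still has to be compared against Dickey-Fuller critical values rather than the normal ones:

```python
import numpy as np

def adf_tstat(y, p):
    """t-statistic on pi in the ADF regression (11.1):
    dy_t = phi + sum_{i=1}^{p-1} psi_i * dy_{t-i} + pi * y_{t-1} + u_t."""
    dy = np.diff(y)
    rows, Y = [], dy[p - 1:]
    for t in range(p - 1, len(dy)):
        # constant, p-1 lagged differences (most recent first), lagged level
        rows.append(np.r_[1.0, dy[t - (p - 1):t][::-1], y[t]])
    X = np.array(rows)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (len(Y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov[-1, -1])
```

For a stationary series the statistic is strongly negative; for a random walk it hovers above the (negative) Dickey-Fuller critical values, mirroring the rejections reported in the text.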
[Figure 11.4: Traditional classifications of the German business cycle, 1960-1995]
In Figure 11.4 etc., the dark shaded areas denote recessions as the decline from
the upper turning point ("peak") to the lower turning point ("trough") of the busi-
ness cycle. The classical business cycles are characterized by alternating periods of
expansion and contraction in the level of macroeconomic activity. They are encom-
passed by growth cycles, which are short-term fluctuations in macroeconomic activity
characterized by periods of high and low mean rates of growth. The more common
phases of decelerating growth rates are indicated by light shaded areas. More de-
tails on the methodology of the CIBCR and the data source can be found inter alia
in ZARNOWITZ [1995] and NIEMIRA & KLEIN [1994].
Akaike Criterion
     AIC     ARMA   MSI-AR  MSM-AR  MSI(M,q)-AR(p)
1.   0.0295  (6,8)  -       -       (2,8,5)
2.   0.0583  (8,8)  -       -       (2,8,7)
3.   0.0874  (3,7)  -       -       (2,7,2)
4.   0.0997  (3,4)  -       -       (2,4,2)
5.   0.1062  (4,4)  (5,0)   -       -

Schwarz Criterion
     SC      ARMA   MSI-AR  MSM-AR  MSI(M,q)-AR(p)
1.   0.2939  (3,4)  -       -       (2,4,2)
2.   0.3247  (4,4)  (5,0)   -       -
3.   0.3544  (3,7)  -       -       (2,7,2)
4.   0.3652  (3,6)  -       -       (2,6,2)
5.   0.3746  (5,4)  (5,1)   (2,4)   -
A critical decision in the specification of MS-AR processes is the choice of the num-
ber of regimes M which are required for the Markov chain to characterize the ob-
served process. As we have seen in Section 7.5, testing procedures for the determin-
ation of the number of states are confronted with non-standard asymptotics. Due to
the existence of nuisance parameters under the null hypothesis, the likelihood ratio
test statistic does not possess an asymptotic χ² distribution.
In order to apply this model selection strategy to the data under consideration, we
have performed a univariate ARMA analysis. The maximum likelihood estimations
of the ARMA models were computed with the BOXJENK procedure provided by
RATS. The Akaike information criterion (AIC) and the Schwarz criterion (SC) were
employed to assist in choosing the appropriate order of the ARMA(p, q) processes.
The recommended ARMA models and corresponding MS-AR models are given in
Table 11.1. 4 Equipped with these results, we are able to select MS models which
could have generated the selected ARMA representation and thus can be expected
to be consistent with the data.
Note that in the class of MSI(M, q)-AR(P) models, under regularity conditions,
the ARMA(p*, q*) representation corresponds to a unique generating MSI(M, q)-
AR(P) process, as can be inferred from Table 7.3. Apart from that, the specifica-
tion (M, p, q) of the most parsimonious MSM(M, q)-AR(P) and MSI(M, q)-AR(P)
model has been reported.5 Thus, for the selected ARMA(p*, q*) representation with
p* ≥ q* ≥ 1, the unique MSI(M)-AR(P) model with M = q* + 1 and p = p* − q*,
and for p* − 1 = q* ≥ 1 the parsimonious MSM(2)-AR(p* − 1), is provided. For
completeness, the MSI(M, q)-AR(P) model introduced in Section 10.2 has been ap-
plied if the MA order q* is larger than the AR order p*.
The selected MSM-AR and MSI-AR models should be considered as take-off points
for the estimation of more general MS models. As a next step, the recommended
MSM(M)-AR(P) and MSI(M)-AR(P) models are estimated and then compared with
regard to the resulting classifications of the German business cycle. It is worthwhile
to note that the MSM(2)-AR(4) model used by Hamilton in his analysis of the U.S.
business cycle is among the preferred models. But the results indicate that the further
analysis should not be restricted to two regimes. A Markov chain model with five
states and no autoregressive structures may be an especially feasible choice.6 The
MSI(5)-AR(0) model will be discussed in Section 11.6.3.
4 The complete results including the computed selection criteria values for ARMA(p, q) models with
0 ≤ p ≤ 14, 0 ≤ q ≤ 10 are presented in KROLZIG [1995].
5 For example, the recommended ARMA(5,4) model is also compatible with an MSM(3)-AR(3) and an
MSM(4)-AR(2) model.
6 Unfortunately, MSM-AR models with more than two states and containing some lags quickly become
computationally demanding and therefore unattractive. Analogous problems would have been caused
by MSI(M, q)-AR(P) models. Consequently we consider only MSM(M)-AR(P) models with M ≤ 3
and MSI(M)-AR(P) models.
According to the results of our ARMA-representation-based model pre-selection, the
empirical analysis of the German business cycle can be started with the application
of the MSM(2)-AR(4) model introduced by HAMILTON [1989], whose theoretical
aspects have been discussed in Section 11.1. It will be shown (i.) that the Hamilton
specification does not only reveal meaningful business cycle phenomena, (ii.) that
the Hamilton specification cannot be rejected by likelihood ratio tests in the class of
MSM(2)-AR(P) models, as shown in Table 11.2 and in Section 11.7, and (iii.) that
the MSM(2)-AR(4) model is supported by likelihood ratio tests of the homoske-
dasticity hypothesis. Furthermore, the main features of the Markov-switching auto-
regressive model will be illustrated by means of the Hamilton model.
Maximum likelihood estimation of the MSM(2)-AR(4) model has been carried out
11.3. The Hamilton Model 223
with the EM algorithm given in Table 9.19; the numbers in parentheses give the
asymptotic standard errors as discussed in Section 6.6.2:
These results are in line with MSM(2)-AR(4) models estimated by GOODWIN [1993]
[1995] for data from 1961:2 to 1991:4 as well as the MSM(2)-AR(1) model fitted by
PHILLIPS [1991] to monthly growth rates of West German industrial production.
Since the most innovative aspect of the Hamilton model is its ability to objectively
date business cycles, a main purpose of our analysis is to check the sensitivity of
business cycle classifications to the model specification.
[Figure 11.7: MSM(2)-AR(4) Model: Regime Shifts and the Business Cycle]
In general, we will use the following simple rule for regime classification: attach
the observation at time t to the regime m* with the highest full-sample smoothed
probability,

m* := arg max_m Pr(s_t = m | Y_T).      (11.2)
This procedure is, in two-regime models, equivalent to the 0.5 rule proposed by
HAMILTON [1989], such that

m* = { 1   if Pr(s_t = 1 | Y_T) > 0.5
     { 2   otherwise.
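Rule (11.2) amounts to a row-wise argmax over the smoothed probabilities; the function name and the example probabilities below are illustrative:

```python
import numpy as np

def classify_regimes(smoothed):
    """Attach each observation to the regime with the highest smoothed
    probability, rule (11.2); for M = 2 this reduces to Hamilton's 0.5 rule."""
    return np.argmax(smoothed, axis=1) + 1   # regimes numbered 1..M

# T = 3 observations, M = 2 regimes (illustrative smoothed probabilities)
probs = np.array([[0.80, 0.20],
                  [0.40, 0.60],
                  [0.55, 0.45]])
regimes = classify_regimes(probs)
# -> regimes [1, 2, 1]
```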
Interestingly, the traditional business cycle dates given in Figure 11.6 correspond
fairly closely to the expansion and contraction phases as described by the Markov-
switching model. In contrast to the conclusion of KÄHLER & MARNET [1994a,
p. 173], who "were not able to find meaningful business-cycle phenomena", our es-
timated MSM(2)-AR(4) model detects precisely the recession in 1966:3-1967:2 as
well as the recessions in 1973:4-1975:2 and 1980:1-1982:4 succeeding the oil price
shocks. Furthermore, the model is able to describe even the macroeconomic tenden-
cies after the German reunification.
One advantage of the MS-AR model is its ability not only to classify observations,
but also to quantify the uncertainty associated with this procedure of regime classific-
ation. If we attach the observation at time t to the regime m* according to rule (11.2),
the uncertainty of this classification can be measured by

(M/(M − 1)) Σ_{m ≠ m*} Pr(s_t = m | Y_T),

where (M − 1)/M is the maximal uncertainty, attained if all regimes m = 1, …, M are possible
with the same probability 1/M. Hence, the proposed measure is bounded between
0 and 1. Obviously, we get for M = 2 that the probability of a wrong classification,
which is given by the complementary probability, is normalized to 2 Pr(s_t ≠ m* | Y_T).
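The normalization can be checked numerically; the computation uses 1 minus the maximal probability, which equals the sum over m ≠ m* whenever each row is a probability distribution (function name and inputs are illustrative):

```python
import numpy as np

def classification_uncertainty(smoothed):
    """Uncertainty measure from the text: M/(M-1) times the total smoothed
    probability of all regimes other than the most likely one (computed as
    1 - max, valid because each row sums to one); lies in [0, 1]."""
    M = smoothed.shape[1]
    top = smoothed.max(axis=1)
    return (M / (M - 1)) * (1.0 - top)

u = classification_uncertainty(np.array([[0.5, 0.5],    # maximal uncertainty
                                         [1.0, 0.0]]))  # perfect classification
# -> [1.0, 0.0]
```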
The results presented in Figure 11.25 for the MSM(2)-AR(4) model of the German
business cycle show that uncertainty approaches its maximum at the CIBCR turning
points of the business cycle. Given the results from Figure 11.6, this coincides with
the detection of regime transitions. Thus, the timing of a regime shift seems to be the
main problem arising with the identification of regimes. These findings and their
implications for forecasting will be reconsidered in Section 11.9.
where ξ̂_{1t|T} = Pr(s_t = 1 | Y_T). Figure 11.7 reconsiders the estimated time path of
the conditional mean of the growth rate, which has already been given as the third chart
of Figure 11.5. Obviously the (reconstructed) regime shifts describe the core growth
rate in the historical boom and recession episodes fairly well.
Figure 11.8 shows the dynamic effects of a shift in the regime s_t and of a shock u_t.
In the left figure, the expected growth rate is given conditional on the information
that the business cycle is at time t in the state of a boom or a recession.
The innovation impulse responses plotted in Figure 11.8 are the coefficients Φ_i of the
MA(∞) representation,

Δy_t = Σ_{m=1}^{M} μ_m ξ_{mt} + Σ_{j=0}^{∞} Φ_j u_{t−j},
which can be interpreted as the response of the growth rate Δy_t to an impulse u_{t−i},
i periods ago. Thus, the impulse response function for the Gaussian innovation can
be calculated as for time-invariant AR processes.7
7 However, this innovation impulse function has to be distinguished substantially from the forecast error
The impulse responses exhibit a strong periodic structure. Hence, the remarkable be-
nefit from a fourth lag might be evidence of spurious seasonality in the considered
seasonally adjusted data.
If the shift in regime were permanent, the system would jump immediately to
its new level μ_1 or μ_2 (dotted line). Due to the stationarity of the Markov chain, the
conditional distribution of regimes converges to the ergodic distribution.
For a two-dimensional Markov chain, it can be shown that the unconditional regime
probabilities shown in Table 11.2 are given by

ξ̄_1 = (1 − p_22) / ((1 − p_22) + (1 − p_11))    and    ξ̄_2 = (1 − p_11) / ((1 − p_22) + (1 − p_11)).
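These formulas are easily checked numerically. The transition probabilities below are illustrative round numbers implied by the expected durations reported later in the text, not the estimated values themselves:

```python
def ergodic_probs_2state(p11, p22):
    """Unconditional regime probabilities of a two-state Markov chain,
    using the closed-form expressions above."""
    denom = (1.0 - p22) + (1.0 - p11)
    return (1.0 - p22) / denom, (1.0 - p11) / denom

# p11 ~ 1 - 1/12.2 (boom), p22 ~ 1 - 1/4.6 (recession), rounded for illustration
xi1, xi2 = ergodic_probs_2state(0.918, 0.783)
```

The implied unconditional recession probability xi2 comes out near the 0.2719 reported for the Hamilton model.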
An important characteristic associated with business cycles and many other eco-
nomic time series (cf. KUNITOMO & SATO [1995]) is the asymmetry of expansion-
ary and contractionary movements. It is thus a great advantage of the MS-AR model
in comparison with linear models that it can generate asymmetry of regimes. The
incorporated business cycle non-linearities in the MSM(2)-AR(4) model are shown
in Figure 11.9.
[Figure 11.9: Distribution of the duration of regimes, Pr(h = j), for recession and boom]
The expected duration of a recession differs in general from the duration of a boom.
These expected values can be calculated from the transition probabilities as:
E[h | s_t = m] = Σ_{i=1}^{∞} i (1 − p_mm) p_mm^{i−1} = 1 / (1 − p_mm),    m ∈ {1, …, M}.      (11.3)
In the Hamilton model (cf. Table 11.2) the expected duration of a recession is 4.6
quarters, that of a boom is 12.2 quarters. The unconditional probability of a recession
is estimated as 0.2719.
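The geometric-duration formula (11.3) is a one-liner; the example value of p_mm is illustrative:

```python
def expected_duration(p_mm):
    """Expected duration of regime m, eq. (11.3): E[h | s_t = m] = 1/(1 - p_mm)."""
    return 1.0 / (1.0 - p_mm)

# e.g. a staying probability of 0.9 implies an expected stay of 10 periods
d = expected_duration(0.9)
```

Reading (11.3) backwards, the reported durations of 4.6 and 12.2 quarters pin down the staying probabilities as 1 − 1/4.6 and 1 − 1/12.2 respectively.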
û_it = Δy_t − μ̂_i − Σ_{j=1}^{p} α̂_j Δy_{t−j}    for i = 1, …, M^{p+1},
[Figure 11.10: Kernel density estimates of the innovations u_t, conditional on recession and boom]
K(x) = (2π)^{−1/2} exp(−x²/2)      (11.4)

is the Gaussian kernel. For h = 0.5, the resulting kernel density estimates are given
in Figure 11.10.
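A kernel density estimate with the Gaussian kernel (11.4) and bandwidth h = 0.5 can be sketched as follows; the grid and the data points are illustrative, not the estimated residuals:

```python
import numpy as np

def kde_gauss(x_grid, data, h=0.5):
    """Kernel density estimate with the Gaussian kernel (11.4) and
    bandwidth h, as used for Figure 11.10 (illustrative re-implementation)."""
    z = (x_grid[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return K.mean(axis=1) / h       # average kernel mass, rescaled by h

grid = np.linspace(-4.0, 4.0, 81)
dens = kde_gauss(grid, np.array([-1.0, 0.0, 1.0]))
```

As the footnote warns, applying this to serially dependent regime-classified residuals gives a descriptive picture rather than a proper density estimate.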
11.4. Models with Markov-Switching Intercepts 231
[Figure 11.11: Expected residuals and the business cycle]
The results of the kernel density estimation should be compared with Figure 11.11,
where the expected residual û_t,

û_t = Σ_{m=1}^{M^{p+1}} û_mt ξ̂_{mt|T},

is plotted against time. The path of residuals verifies that the business cycle is
generated by shifts in regime; larger shocks u_t are not related to the CIBCR turning
points.
8 See BIANCHI [1995] for a possible detection of regime shifts by kernel density estimation.
rate smoothly approaches a new level after the transition from one state of the busi-
ness cycle to another. For these situations, the MSI-AR model may be used. Esti-
mation results for alternative MSI specifications for the period 1962:1 to 1994:4 are
summarized in Table 11.3.
Interestingly, the results are very similar to those of the last section. As a compar-
ison with Table 11.2 verifies, the estimated parameters of the Markov chain and the
likelihood are quite close to the corresponding MSM(2)-AR(P) models. Again an
MSI(2)-AR(4) model outperforms models with lower and higher AR orders. This
can be shown by means of a likelihood ratio test of the type H_0: α_p = 0 against
H_1: α_p ≠ 0, which is asymptotically χ²(1) distributed. The differences in the
properties of the MSM(2)-AR(4) and the MSI(2)-AR(4) model shall be compared in
the following considerations.
In Figure 11.12, the conditional growth expectations and the impulse response func-
tion are given for the MSI(2)-AR(4) model. While the impulse responses are quite
similar to those of the MSM(2)-AR(4) model, a comparison of the dynamic propagation
(dotted lines) of a permanent shift in regime in the MSI model with those of the
MSM model illustrates the different assumptions of both models.
[Figure 11.12: MSI(2)-AR(4) model: conditional growth expectations (left) and impulse responses (right)]
As we have seen in Figure 11.8, a permanent shift in regime induces in the MSM(2)-
AR(4) model a once-and-for-all jump in the process mean. In the MSI model,
however, a permanent change in regime causes the same dynamic response (dotted
lines in the left diagram of Figure 11.12) as the accumulated impulse responses of a
Gaussian innovation with the same impact effect (dotted line on the right).
Thus the periodic structure of the impulse responses, as seen on the right of Figure 11.12,
is translated into the dynamic propagation of a shift in regime.
As long as the Markov chain is ergodic, though, the dynamic effects of a shift in
regime in both approaches differ only transitorily.
[Figure 11.13: Contribution of the Markov chain to the business cycle]
The long-term mean growth rate depends only on the stationary distribution ξ̄
of the state of the Markov chain and is thus given by the unconditional mean,
Δȳ = μ̄ = M'ξ̄ in the MSM model and Δȳ = Σ_{j=0}^{∞} Φ_j ν'ξ̄ in the MSI model,
respectively.
In Figure 11.13 the contribution of the Markov chain to the business cycle is again
measured by the estimated mean of Δy_t conditioned on the regime inference ξ̂ =
{ξ̂_{t|T}}_{t=1}^{T}, μ̂_{t|T} = E[Δy_t | ξ̂], which can be calculated in the MSI(2)-
AR(P) model recursively as

μ̂_{t|T} = Σ_{j=1}^{p} α̂_j μ̂_{t−j|T} + ν̂'ξ̂_{t|T}.      (11.5)

The p-th order difference equation (11.5) is initialized with the unconditional mean,

μ̄ = (1 − Σ_{j=1}^{p} α̂_j)^{-1} (ν̂_1 ξ̄_1 + ν̂_2 ξ̄_2).
Thus, the calculation of μ̂_{t|T} is slightly more complicated than for MSM-AR models.
As can be seen from the estimation results in Table 11.3 and Table 11.2, as well as a
comparison of Figure 11.13 with Figure 11.5, the similarity of the regime classific-
ations to those of the MSM(2)-AR(4) model is obvious. A major difference which
occurs by using the 0.5 rule concerns the year 1987, where the MSI-AR model de-
tects a one-quarter recession which leads the stock market crash.
Thus, Markov-switching models with a regime shift in the intercept term can be used,
as well as models with a regime shift in the mean, as a device to describe the German
business cycle.
In this section we will relax the assumption that the white noise process u_t is homo-
skedastic, instead allowing for regime-dependent heteroskedasticity of u_t.
Even if the white noise process u_t is homoskedastic, σ²(s_t) = σ², the observed pro-
cess Δy_t may be heteroskedastic. The process is called conditionally heteroskedastic
if the conditional variance of the forecast error
is a function of Y_{t−1}. This implies for MS-AR processes with regime-invariant auto-
regressive parameters that the conditional variance is a function of the regime infer-
ence ξ̂_{t−1|t−1}. A necessary and sufficient condition for conditional heteroskedasti-
city of these processes is the predictability of the regime vector.
For the MSIH(2)-AR(P) model, the effect of the actual regime classification ξ̂_{t|t} on
the conditional heteroskedasticity of the forecast error variance in t + 1 is given by

Var(Δy_{t+1} | Y_t) = ξ̂_{1,t+1|t} σ_1² + (1 − ξ̂_{1,t+1|t}) σ_2² + ξ̂_{1,t+1|t}(1 − ξ̂_{1,t+1|t})(μ_1 − μ_2)²,      (11.6)
[Figure 11.14: Conditional variances Var(Δy_{t+1} | ξ̂_{t+1}) for the MSI(2)-AR(4) and MSIH(2)-AR(4) models]
where ξ̂_{t+1|t} = P'ξ̂_{t|t}. If the variance σ² of the white noise term u_t is not regime-
dependent, as in the MSI(2)-AR(P) model, the calculation of the conditional forecast error
variance in t + 1 in (11.6) simplifies to
In Figure 11.14, these two components of the forecast error variance are illustrated
for the MSI(2)-AR(4) model, which has been discussed in Section 11.4, and the
MSIH(2)-AR(4) model, which will be introduced next.
It will be clarified that the uncertainty associated with the regime classification,
ξ̂_{1t|t}(1 − ξ̂_{1t|t}), is immediately transformed into the forecast error variance through
where ρ = p_11 − (1 − p_22).
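The decomposition behind (11.6) is the variance of a two-component mixture: the probability-weighted within-regime variances plus a between-regime term driven by the classification uncertainty. A sketch in that spirit (function name and inputs are illustrative, and the book's exact expression may group terms differently):

```python
def mixture_forecast_variance(xi1, mean1, mean2, sig1_sq, sig2_sq):
    """Conditional forecast error variance of a two-regime mixture, in the
    spirit of eq. (11.6): within-regime variance plus the regime-uncertainty
    term xi1*(1 - xi1)*(mean1 - mean2)^2."""
    within = xi1 * sig1_sq + (1.0 - xi1) * sig2_sq
    between = xi1 * (1.0 - xi1) * (mean1 - mean2) ** 2
    return within + between

v_certain = mixture_forecast_variance(1.0, 1.0, -1.0, 2.0, 3.0)   # regime known
v_uncertain = mixture_forecast_variance(0.5, 1.0, -1.0, 1.0, 1.0) # maximal uncertainty
```

When the regime is known (xi1 equal to 0 or 1) the between term vanishes, which is exactly how classification uncertainty feeds the forecast error variance in the text.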
In MSM specifications, the calculations are more complicated since the conditional density p(y_{t+1} | Y_t, ξ_t) (and thus the conditional variance) depends on the M^{p+1}-dimensional state vector. The uncertainty resulting from μ(s_t), ..., μ(s_{t-p+1}) has to be taken into consideration.
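For the two-regime case, the conditional forecast error variance in (11.6) is just the variance of a two-component mixture. The following sketch is our own illustration (the function and variable names are not the book's notation):

```python
# Variance of a two-regime mixture (cf. eq. 11.6): the regime-weighted
# noise variance plus a term driven by the predicted regime probability
# xi = Pr(s_{t+1} = 1 | Y_t).

def forecast_error_variance(xi, mu1, mu2, sigma2_1, sigma2_2):
    within = xi * sigma2_1 + (1.0 - xi) * sigma2_2    # E[Var | regime]
    between = xi * (1.0 - xi) * (mu1 - mu2) ** 2      # regime-classification uncertainty
    return within + between

# Homoskedastic special case (MSI/MSM: sigma2_1 == sigma2_2 == sigma2):
# only the between-regime component depends on the regime inference.
def forecast_error_variance_homoskedastic(xi, mu1, mu2, sigma2):
    return sigma2 + xi * (1.0 - xi) * (mu1 - mu2) ** 2
```

At ξ = 0.5 the classification-uncertainty term is maximal; at ξ = 0 or ξ = 1 the variance collapses to the within-regime variance, which is the sense in which predictability of the regime vector generates conditional heteroskedasticity.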
11.5. Regime-Dependent and Conditional Heteroskedasticity 237
In Section 7.4.1 it has been shown that the likelihood ratio test can be based on the LR statistic

LR = 2 (ln L(λ̂) - ln L(λ̂₀)),

where λ̂₀ denotes the restricted ML estimate of the n-dimensional parameter vector λ under the null H₀: φ(λ) = 0, with r = rk(∂φ(λ)/∂λ') ≤ n. Under the null, LR has an asymptotic χ²-distribution with r degrees of freedom, as stated in (7.3).
Conditional on the regime dependence of the mean, μ₁ > μ₂, or of the intercept term, ν₁ > ν₂, likelihood ratio tests of hypotheses of interest such as σ₁² = σ₂² can be performed as in models with deterministic regime shifts.
For the MSIH(2)-AR(4) model the conditional forecast error variance as well as the conditional variance of the error term have been illustrated in Figure 11.14.
[Figure: Smoothed and filtered probabilities of regime 2; contribution of the Markov chain to the business cycle]
The kernel density estimation in Section 11.3.5 has provided some evidence that it
[Figure: Smoothed and filtered probabilities of regime 1]
may be too restrictive to assume that the regime shift does not alter the variance of the innovation u_t. An estimation of the MSMH(2)-AR(4) model seems to support this view, σ̂₁² = 1.5080 > 0.5348 = σ̂₂². However, for the null hypothesis MSM(2)-AR(4): σ₁² = σ₂², μ₁ > μ₂ versus the alternative MSMH(2)-AR(4): σ₁² ≠ σ₂², μ₁ > μ₂, we get the LR test statistic LR = 2[(-217.22) - (-218.78)] = 3.12. With the conventional critical value of χ²_{0.95}(1) = 3.84146, the null hypothesis of a regime-invariant variance of the innovation process cannot be rejected.
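The LR computation above can be reproduced in a few lines; for a single restriction the χ²(1) tail probability can be evaluated with the complementary error function, so only the standard library is needed (a sketch using the log-likelihoods quoted in the text):

```python
import math

# LR test of a regime-invariant innovation variance, H0: sigma1^2 = sigma2^2.
# For r = 1 restriction, X ~ chi-square(1) is Z^2 with Z standard normal,
# so P(X > x) = erfc(sqrt(x / 2)).
logL_restricted = -218.78    # MSM(2)-AR(4)
logL_unrestricted = -217.22  # MSMH(2)-AR(4)

LR = 2.0 * (logL_unrestricted - logL_restricted)   # 3.12, as in the text
p_value = math.erfc(math.sqrt(LR / 2.0))           # about 0.077

reject_at_5pct = p_value < 0.05                    # False: H0 not rejected
```

The p-value of roughly 0.08 exceeds 0.05, matching the comparison of LR = 3.12 with the critical value 3.84146.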
In contrast to KÄHLER & MARNET [1994a], we conclude that allowing for regime-dependent heteroskedasticity cannot significantly improve the fit of the model. But this result may depend essentially on the estimation period. This outcome of the LR test is visualized in Figure 11.16, which makes clear that the MSMH(2)-AR(4) model leads to a very similar regime classification as the MSM(2)-AR(4) model (cf. Figure 11.5). There are two major changes: the short CIBCR recession in 1963 is now attached to the more volatile expansionary regime, and the same holds true for the
The ARMA-representation-based model selection indicates that more than two regimes should be taken into account. In particular, a Markov-switching model with five regimes has been recommended as an ingenious device.
ν̃ = [ 3.2997 ]        [ 0.4907  0.3157  0.1936 ]
    [ 0.6353 ],   P̃ = [ 0.0000  0.9732  0.0268 ].
    [-2.5763 ]        [ 0.9999  0.0001  0.0000 ]
As seen in Figure 11.17, the MSI(3)-AR(0) model has completely lost its business cycle characterization. The outlier periods coincide with epochs of high volatility in the process of economic growth: the period 1968-71 with an active Keynesian stabilization policy and drops in industrial production caused by strikes, and the first quarter of 1987, the year of the stock market crash.
11.6. Markov-Switching Models with Multiple Regimes 241
           MSI(3)   MSIH(3)  MSMH(3)  MSI(4)   MSIH(4)  MSIH(4)  MSIH(4)  MSI(5)   MSIH(5)  MSIH(5)
           -AR(0)   -AR(0)   -AR(3)   -AR(0)   -AR(0)   -AR(2)   -AR(4)   -AR(0)   -AR(0)   -AR(1)
μ1, ν1      3.2997   3.2969   0.9068   3.4190   3.4027   3.5906   3.5143   3.9168   3.2809   3.5677
μ2, ν2      0.6353   0.6370   1.2712   1.0601   1.2000   1.4362   1.3852   1.6698   1.4460   1.5170
μ3, ν3     -2.5763  -2.6269  -0.3436  -0.1917   0.3388  -0.3528  -0.3657   0.9966   0.6312   1.1761
μ4, ν4                                -2.5008  -2.6237  -2.2066  -2.2104  -0.2181  -0.3499  -0.3255
μ5, ν5                                                                    -2.6016  -2.6180  -2.1021
α1                           -0.3576                    -0.2376  -0.2345                    -0.2386
α2                           -0.1268                    -0.0012  -0.0038
α3                           -0.0985                              0.0144
α4                                                                0.0265
p11         0.4907   0.4948   0.7459   0.4007   0.4533   0.5527   0.5524   0.1433   0.4886   0.5366
p12         0.3157   0.3113   0.2541   0.4587   0.3810   0.2281   0.2284   0.3424   0.0000   0.2487
p13         0.1936   0.1939   0.0000   0.0000   0.0034   0.0000   0.0000   0.5143   0.3215   0.0000
p14                                    0.1406   0.1622   0.2192   0.2192   0.0000   0.0000   0.0000
p15                                                                        0.0000   0.1899   0.2147
p21         0.0000   0.0000   0.1782   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
p22         0.9732   0.9737   0.5814   0.8372   0.7573   0.7725   0.7846   0.6035   0.5015   0.3031
p23         0.0268   0.0263   0.2404   0.1099   0.1589   0.1861   0.1741   0.0000   0.4653   0.6969
p24                                    0.0529   0.0838   0.0414   0.0413   0.0000   0.0000   0.0000
p25                                                                        0.3965   0.0332   0.0000
p31         0.9999   1.0000   0.0081   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
p32         0.0001   0.0000   0.3094   0.2208   0.0794   0.3393   0.3271   0.0000   0.1100   0.0784
p33         0.0000   0.0000   0.6825   0.7792   0.9206   0.6607   0.6729   0.8825   0.7866   0.7620
p34                                    0.0000   0.0000   0.0000   0.0000   0.1105   0.0699   0.1022
p35                                                                        0.0070   0.0335   0.0574
p41                                    1.0000   1.0000   0.7880   0.7847   0.0000   0.0000   0.0000
p42                                    0.0000   0.0000   0.2120   0.2153   0.0421   0.2254   0.1835
p43                                    0.0000   0.0000   0.0000   0.0000   0.1737   0.0017   0.0000
p44                                    0.0000   0.0000   0.0000   0.0000   0.7843   0.7729   0.8165
p45                                                                        0.0000   0.0000   0.0000
p51                                                                        1.0000   1.0000   0.7853
p52                                                                        0.0000   0.0000   0.2147
p53                                                                        0.0000   0.0000   0.0000
p54                                                                        0.0000   0.0000   0.0000
p55                                                                        0.0000   0.0000   0.0000
ξ̄1          0.0753   0.0749   0.2922   0.0686   0.0651   0.0684   0.0685   0.0323   0.0752   0.0714
ξ̄2          0.8863   0.8873   0.4028   0.5945   0.2987   0.5765   0.5824   0.0595   0.1912   0.1595
ξ̄3          0.0384   0.0378   0.3049   0.2958   0.6006   0.3162   0.3100   0.5823   0.5315   0.4669
ξ̄4                                     0.0411   0.0356   0.0389   0.0391   0.2983   0.1636   0.2601
ξ̄5                                                                         0.0277   0.0385   0.0421
(1-p11)⁻¹   1.9633   1.9795   3.9358   1.6687   1.8291   2.2356   2.2340   1.1673   1.9555   2.1582
(1-p22)⁻¹  37.2831  38.0814   2.3890   6.1419   4.1210   4.3957   4.6420   2.5218   2.0062   1.4349
(1-p33)⁻¹   1.0000   1.0000   3.1492   4.5283  12.5944   2.9470   3.0574   8.5117   4.6861   4.2010
(1-p44)⁻¹                              1.0000   1.0000   1.0000   1.0000   4.6351   4.4031   5.4509
(1-p55)⁻¹                                                                  1.0000   1.0000   1.0000
σ1²         1.0306   1.1313   3.6251   0.7118   1.0626   0.2448   0.2588   0.7110   1.1576   0.2503
σ2²                  1.0501   0.1904            0.3955   0.4280   0.4510            0.1252   0.0522
σ3²                  0.3034   0.3854            1.1549   0.4308   0.4447            1.0913   0.8545
σ4²                                             0.3095   0.6612   0.6990            0.2432   0.5219
σ5²                                                                                 0.3064   0.7706
ln L      -211.62  -210.43  -209.82  -206.37  -204.95  -198.11  -198.09  -201.70  -199.60  -190.45
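The expected-duration rows (1 - p_mm)⁻¹ of the table follow directly from the diagonal elements of the transition matrix. A minimal sketch, using the MSI(3)-AR(0) diagonal entries from the table:

```python
# Expected duration of regime m is 1 / (1 - p_mm); the values below are the
# diagonal transition probabilities of the MSI(3)-AR(0) model.
def expected_duration(p_mm):
    return 1.0 / (1.0 - p_mm) if p_mm < 1.0 else float("inf")

p_diag = {"p11": 0.4907, "p22": 0.9732, "p33": 0.0000}
durations = {m: expected_duration(p) for m, p in p_diag.items()}
# p22 = 0.9732 implies a very persistent regime of roughly 37 quarters,
# while p33 = 0 gives the one-quarter outlier state.
```

Small differences from the tabulated durations reflect rounding of the reported transition probabilities.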
[Figure 11.17: Smoothed and filtered regime probabilities of the MSI(3)-AR(0) model; contribution of the Markov chain to the business cycle]
Regimes 2 and 3 are associated with business cycle phenomena. Regime 2, with an expected growth rate of 1.06, which is extremely close to the 1.07 of the MSM(2)-AR(4) model, reflects phases of "normal" expansions. Also, the recessionary state 3 is quite compatible with the Hamilton model. The expected duration of a recession is identically 4.5 quarters, the conditional mean growth rate is -0.19 vs. -0.30. So,
The regime probabilities associated with the MSI(4)-AR(0) model are plotted in Figure 11.18. Unfortunately, the classification uncertainty is relatively high. Allowing for a first-order autocorrelation destroys some of these features, as in an MSI(4)-AR(1) model. In contrast to previous specifications, first-order autocorrelation and heteroskedasticity are not evident for MSI models with five regimes.
The model pre-selection in Section 11.2.3 has shown that an MSI(5)-AR(0) model may have generated the autocovariance function of the observed West-German GNP
growth rates. Thus, a hidden Markov-chain model with five states and no autoregressive structures may be a feasible choice.
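Such a hidden Markov-chain model is easy to simulate: draw the state from the relevant row of the transition matrix and add Gaussian noise to the state-dependent intercept. The sketch below uses illustrative two-regime placeholder parameters, not the estimates reported here:

```python
import random

# Simulate y_t = nu[s_t] + sigma * eps_t, where s_t follows a Markov chain
# with transition matrix P (an MSI(M)-AR(0), i.e. hidden Markov, model).
def simulate_msi_ar0(nu, sigma, P, T, seed=0):
    rng = random.Random(seed)
    s = 0
    states, y = [], []
    for _ in range(T):
        u = rng.random()               # draw next state from row s of P
        cum, nxt = 0.0, len(P[s]) - 1  # last state catches rounding
        for j, pj in enumerate(P[s]):
            cum += pj
            if u < cum:
                nxt = j
                break
        s = nxt
        states.append(s)
        y.append(nu[s] + sigma * rng.gauss(0.0, 1.0))
    return states, y

# Illustrative two-regime example (placeholder parameters):
states, y = simulate_msi_ar0(nu=[1.0, -0.4], sigma=0.7,
                             P=[[0.9, 0.1], [0.2, 0.8]], T=200)
```

With M = 5 states the same routine applies; only nu and P grow accordingly.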
The MSI(5)-AR(0) model has remarkable turning point dating abilities, as a comparison with the CIBCR-dated turning points in Figure 11.20 verifies.
In this section we have so far only considered hidden Markov-chain models with a
homoskedastic white noise. Augmenting the MSI(5)-AR(O) model with a first-order
autoregression and regime-dependent variances leads to an MSIH(5)-AR(1), where
the boom regime splits into pre- and post-recessionary expansion phases represented
by regimes 3 and 2.
The main difference between the MSI(4)-AR(0) model and the MSIH(5)-AR(1) lies in the function of regime 2. In the MSIH(5)-AR(1) model, regime 2 helps to replicate excessively high growth after the end of recessions, a phenomenon that has been stressed by SICHEL [1994] for the United States. It should be noted that the MSIH(5)-AR(1) model is characterized by a very fast detection of economic upswings.
11.7. MS-AR Models with Regime-Dependent Autoregressive Parameters 247
For U.S. data, HANSEN [1992] found some evidence for an MSA(2)-MAR(4) model with shifting autoregressive parameters, but a regime-invariant mean.⁹
Consequently, the previous assumption that the regime shift does not alter the autoregressive parameters will be relaxed in this section.
The estimation results for MSIAH(M)-AR(p) models, where all parameters are assumed to shift, are given in Table 11.6. For the MSIAH(2)-AR(4) model the probability of being in regime 2, which is characterized by a lower intercept term ν₂ = 0.3608 < 1.9978 = ν₁ and variance σ̂₂² = 1.2004 < 6.7104 = σ̂₁², is plotted in Figure 11.22.
⁹In contrast to the MSA(2)-MAR(4) model, the likelihood ratio test proposed by HANSEN [1992] could not reject a regime-invariant AR(4) model against an MSM(2)-AR(4) model; compare, however, HANSEN [1996a].
Obviously, the regime shifts detected by these models are not closely related to turning points of the business cycle. The level effect associated with business cycle phenomena seems to be dominated by changes in the covariance structure of the data generating process in all MSIAH(M)-AR(p) models considered so far. Interestingly, MSIAH(4)-AR(p) models exhibit regimes with unit roots. Thus these models are related to the concept of stochastic unit roots as in GRANGER & SWANSON [1994] and have to be investigated in future research.
Instead we are going to test the MSI(2)-AR(4) and the MSIH(2)-AR(4) model against the MSIAH(2)-AR(4) model, where all parameters are varying, applying the usual specification testing procedures. The likelihood ratio test statistic is LR = 2[(-213.54) - (-219.56)] = 12.04 for a test of the MSI(2)-AR(4) against the MSIAH(2)-AR(4) and LR = 2[(-213.54) - (-219.49)] = 11.90 for the MSIH(2)-AR(4) model. Under the number-of-regimes preserving hypothesis, ν₁ ≠ ν₂, and with critical values of χ²_{0.95}(5) = 11.0705 and χ²_{0.95}(4) = 9.48773, the null hypothesis of regime-invariant autoregressive parameters is rejected at the 5% level.
Although the Hamilton model of the U.S. business cycle cannot be rejected in the class of two-regime models, our previous analysis indicates that there are features in the data which are not well explained by the MSM(2)-AR(4) model. In particular, the evidence for an MS(5)-AR(0) model suggests taking the extreme macroeconomic fluctuations of the 1968-71 period into consideration. These results underline the need for an extension of the Hamilton model for an adequate empirical characterization of the West-German business cycle. Therefore, in this section, the Hamilton model is augmented in two respects: an additional regime is introduced and the variances are considered to be regime-dependent.
This model has two important features. First, the business cycle phenomena of the Hamilton model are replicated by the second and the third regime. Secondly, the first regime separates from the virtual business cycles those periods which have been captured by the MSI(5)-AR(0) model as the three outlier states. In comparison to the normal expansionary regime, this episode is characterized by a slightly higher mean and a much higher variance of innovations.
Δy_t = M ξ̂_{t|T} - 0.2963 (Δy_{t-1} - M ξ̂_{t-1|T}) - 0.1407 (Δy_{t-2} - M ξ̂_{t-2|T})
               (0.1060)                        (0.1081)
       - 0.0290 (Δy_{t-3} - M ξ̂_{t-3|T}) + 0.2127 (Δy_{t-4} - M ξ̂_{t-4|T}) + û_t
         (0.0949)                         (0.0916)

μ̂ = [ 1.3438 (0.4708) ]        σ̂² = [ 3.6596 (1.9046) ]
    [ 0.9368 (0.1344) ],             [ 0.8146 (0.1840) ].
    [ 0.3630 (0.1509) ]              [ 0.5344 (0.1853) ]

(Standard errors in parentheses.)
ξ̄ = [ 0.1096 ]          [ 14.1991 ]
    [ 0.6469 ],  E[h] = [  9.4191 ].
    [ 0.2434 ]          [  5.3430 ]
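The ergodic regime probabilities ξ̄ and the expected durations can be computed from the transition matrix alone. The sketch below iterates ξ' ← ξ'P (power iteration) for the MSI(3)-AR(0) transition matrix estimated earlier in this chapter and reproduces the corresponding table entries:

```python
# Ergodic (stationary) regime probabilities by iterating xi' <- xi' P,
# and expected durations 1 / (1 - p_mm); P is the estimated transition
# matrix of the MSI(3)-AR(0) model.
P = [[0.4907, 0.3157, 0.1936],
     [0.0000, 0.9732, 0.0268],
     [0.9999, 0.0001, 0.0000]]

M = len(P)
xi = [1.0 / M] * M
for _ in range(1000):  # power iteration converges for an ergodic chain
    xi = [sum(xi[i] * P[i][j] for i in range(M)) for j in range(M)]

durations = [1.0 / (1.0 - P[m][m]) if P[m][m] < 1.0 else float("inf")
             for m in range(M)]
```

The result matches the tabulated ergodic probabilities (0.0753, 0.8863, 0.0384) up to rounding of the reported transition probabilities.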
In Figure 11.24, the smoothed and filtered probabilities of the recessionary regime 3 are compared with the business cycle classifications of the CIBCR. The empirical characterization of the business cycle is quite close to those of the MSM(2)-AR(4) and the MS(5)-AR(0) model. More fundamentally, regime shifts coincide with CIBCR turning points. Interestingly, regime discrimination is maximized, and thus regime uncertainty minimized.
If the information set used for the parameter estimation is denoted by Y_T, then Y_{T+j} was used to derive the h-step prediction ŷ_{T+j}(h).
The forecast performance of the models over the following 20 quarters (1990:1 to 1994:4) has been measured in terms of the root of the mean squared prediction errors (RMSPE). The results are summarized in Table 11.7.
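The RMSPE criterion itself is straightforward to compute; a minimal sketch with illustrative numbers (not the entries of Table 11.7):

```python
import math

# Root mean squared prediction error over a sequence of h-step forecasts.
def rmspe(actual, predicted):
    errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(errors) / len(errors))

# Illustrative growth rates and forecasts (placeholder values):
actual    = [0.8, -0.2, 1.1, 0.5]
predicted = [0.6,  0.1, 0.9, 0.7]
score = rmspe(actual, predicted)
```

A lower RMSPE over the hold-out period indicates a better post-sample predictor.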
However, except for the one-step forecast, the MSM(3)-AR(4) model substantially
outperforms the alternative model-based predictors. Our results are thus in contrast
to the evidence found in the literature supporting the claim that non-linear time series models tend to have superior in-sample and worse post-sample abilities. For example, GRANGER & TERÄSVIRTA [1993, p. 163] found for logarithms of U.S. GNP data from 1984:3 to 1990:4 that the post-sample root mean squared prediction error of a smooth transition regression model was 50% higher than for a corresponding linear model.
One explanation for the unsatisfactory performance may be the singularity of events in the forecast period. This forecasting period was affected by the monetary union with the former East Germany (GDR), the reunification of East and West Germany, the Gulf War, and a severe international recession (to be considered in Chapter 12). A non-linear model may be expected to be superior to a linear one only if the forecasting period contains those non-linear features. For the period under consideration, the non-linearities associated with the business cycle phenomenon seem to be dominated by the political shocks affecting the economic system.
In particular, the high one-step prediction errors seem to be associated with some instabilities of the autoregressive parameters. This is supported by the superior performance of the MSM(2)-AR(0) model, where we have set the AR parameters of the estimated MSM(2)-AR(4) model to zero.
11.10 Conclusions
This analysis has examined whether MS-AR models could be useful tools for the investigation of the German business cycle. It has been shown that - among other preferred models - the MS(2)-AR(4) model proposed by Hamilton is able to capture the main characteristics of the German business cycle. Potential for improvement through the introduction of more than two regimes has also been established. In particular, the Markov-switching model with additional regimes reflecting episodes of high volatility has been recommended as an inventive device for dating the German business cycle and for multi-step forecasts. Our findings demonstrate the effects of model selection; the new insights and improvements gained from departing from the basic MSM(2)-AR(4) model, which dominates the literature, have been proven to be worth the effort.
A main assumption of our analysis which might be relaxed is that of fixed transition probabilities. For post-war U.S. data, DIEBOLD et al. [1993] found that the memorylessness of contractions and expansions is strongly rejected. Models with varying transition probabilities have been considered by DIEBOLD et al. [1994], FILARDO [1994], LAHIRI & WANG [1994], and DURLAND & MCCURDY [1994] for U.S. data and should be applicable to Markov-switching models of the German business cycle. GHYSELS [1993], [1994] has proposed an MS model with seasonally varying transition probabilities, which is more suited for seasonally unadjusted data.
The necessary instruments for implementation of these models for German data are given in Chapter 10. In any case, the two remaining chapters of this study are concerned with another imperfection of the models considered so far. The business cycle as defined by BURNS & MITCHELL [1946] is essentially a macroeconomic phenomenon, which reflects co-movements of many individual economic series. Therefore, the dynamics of the business cycle have to be considered in a multiple time series framework. A univariate approach to the German business cycle must be considered unsatisfactory. In the next chapter, we will show how the traditional distinction in an analysis of co-movements among economic time series and the division of the business cycle into separate regimes can be solved by means of the Markov-switching vector autoregressive model. In addition, Chapter 13 investigates the applicability of the Markov-switching model to the analysis of cointegrated systems.
11.A. Appendix: Business Cycle Analysis with the Hodrick-Prescott Filter 257
A broad range of business cycle studies generate "stylized facts" of the business cycle using the Hodrick-Prescott (HP) filter, which derives the trend component ȳ_t of a univariate time series y_t as the result of the following algorithm:

argmin  Σ_{t=1}^{T} (y_t - ȳ_t)² + λ Σ_{t=2}^{T-1} (Δȳ_{t+1} - Δȳ_t)²,    (11.7)

where Δȳ_t = ȳ_t - ȳ_{t-1}. The system of first-order conditions for {ȳ_t}_{t=1}^{T} associated with the optimization problem (11.7) results in the following linear filter:
(ȳ_1, ..., ȳ_T)' = ( I_T + λ K )⁻¹ (y_1, ..., y_T)',   where

K =  [  1  -2   1                    ]
     [ -2   5  -4   1                ]
     [  1  -4   6  -4   1            ]
     [         ...                   ]
     [         1  -4   6  -4   1     ]
     [             1  -4   5  -2     ]
     [                 1  -2   1     ]
The cyclical component is given by the residual of this procedure, y_t - ȳ_t. Thus the cyclical component measures the deviation of the considered series from its local trend. For the West German GNP data, the cyclical component is plotted in Figure 11.26, where λ = 1600 has been chosen as in KYDLAND & PRESCOTT [1990].
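The linear filter above can be implemented directly by building I_T + λD'D, where D is the (T-2)×T second-difference matrix, and solving the resulting linear system. A self-contained sketch in pure Python (our own illustration, not code from this study):

```python
def hp_filter(y, lam=1600.0):
    """Return (trend, cycle) of y via the Hodrick-Prescott filter,
    solving (I_T + lam * D'D) trend = y as in the text."""
    T = len(y)
    # A = I_T + lam * D'D, accumulated row by row of the second-difference
    # matrix D, whose r-th row is [.., 1, -2, 1, ..].
    A = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    for r in range(T - 2):
        d = [0.0] * T
        d[r], d[r + 1], d[r + 2] = 1.0, -2.0, 1.0
        for i in range(r, r + 3):
            for j in range(r, r + 3):
                A[i][j] += lam * d[i] * d[j]
    # Gaussian elimination with partial pivoting on [A | y].
    b = list(y)
    for k in range(T):
        p = max(range(k, T), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, T):
            f = A[i][k] / A[k][k]
            for j in range(k, T):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    trend = [0.0] * T
    for i in range(T - 1, -1, -1):
        s = sum(A[i][j] * trend[j] for j in range(i + 1, T))
        trend[i] = (b[i] - s) / A[i][i]
    cycle = [yi - ti for yi, ti in zip(y, trend)]
    return trend, cycle
```

For an exactly linear input the penalty term vanishes, so the trend reproduces the series and the cyclical component is zero, which is a convenient sanity check of the implementation.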
A comparison with Figure 11.4 clarifies that peaks and troughs of the series detrended with the HP filter coincide with those of the change in GNP against the previous year. CIBCR recessions are not associated with low values but with sharp declines of the cyclical component, which indicates that there might be a unit root in the cyclical component. Furthermore, it is neither clear how the turning points should be dated nor how the filtered data could be used, e.g., for forecasts. In addition, the statistical properties of the filter have been criticized recently inter alia by KING & REBELO [1993], HARVEY & JAEGER [1993], COGLEY & NASON [1995] and BÅRDSEN et al. [1995]. In particular, COGLEY & NASON [1995] have shown that the HP filter can generate spurious cycles when the time series are integrated, as in our case (cf. Section 11.2.1).
Altogether, the CIBCR classification of the German business cycle seems to be the
best available benchmark for measuring the quality of empirical characterizations of
the German business cycle by means of MS-AR models.
Chapter 12
Markov-Switching Models of Global and International Business Cycles
Since our primary research interest concerns business cycle phenomena and not the convergence of per capita income and growth, the national trends are eliminated separately. To use a precise notation, the considered MS-VAR model in differences is called an MS(M)-DVAR(p) model. The issue of cointegration will be investigated in the next chapter and is therefore not dealt with here. The analysis of co-movement of economic time series within cointegrated systems with Markov-switching regime will conclude this study.
The study uses data from the OECD on real GNP of the USA, Japan and West Germany, as well as real GDP of the United Kingdom, Canada and Australia. The data set consists of quarterly seasonally adjusted observations. The estimation period, excluding presample values, covers 120 quarters from 1962:1 to 1991:4.
The time series were tested for unit roots. Each one was found to be I(1). Thus, first differences of logarithms (times 100) are used, which are plotted in Figure 12.1.
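The transformation to growth rates, 100 times the first differences of logarithms, can be sketched as follows (the level data shown are illustrative placeholders, not the OECD series):

```python
import math

# 100 * first differences of logarithms: quarterly growth rates in percent.
def log_growth_rates(levels):
    return [100.0 * (math.log(b) - math.log(a))
            for a, b in zip(levels, levels[1:])]

gnp = [100.0, 101.0, 100.5, 102.0]  # illustrative level data
rates = log_growth_rates(gnp)
```

For small changes these log differences are close to percentage changes; the first entry here is roughly 1.0.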
Most empirical research has been done with a single equation Markov-switching model, see inter alia LAM [1990], PHILLIPS [1991], GOODWIN [1993], KIM [1994], KÄHLER & MARNET [1994a], KROLZIG & LÜTKEPOHL [1995] and SENSIER [1996]. All of these cited investigations have applied the HAMILTON [1989] model of the U.S. business cycle with at best slight modifications. In line with these studies, we investigate the national business cycle phenomena in our data set by means of an MSM(2)-AR(4) model. In contrast to previous studies, evidence of the inadequacy of the Hamilton specification is revealed, at least in the case of the Japanese growth process.
12.1. Univariate Markov-Switching Models 261
[Figure 12.1: Quarterly growth rates: USA, CAN, UK, FRG, JAP, AUS]
An overview of our estimation results is given in Table 12.1. The models have been
estimated with the EM-algorithm discussed in Chapter 6 and Chapter 9. In contrast
to GOODWIN'S [1993] analysis, which used numerical optimization techniques, we
were not forced to employ Bayesian priors in order to derive meaningful business
cycle phenomena (even for the Japanese and the United Kingdom data).
262 Markov-Switching Models of Global and International Business Cycles
[Figure 12.2: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: USA; contribution of the Markov chain to the business cycle]
12.1.1 USA
In contrast to HESS & IWATA [1995], who observed a breakdown of the Hamilton model for data which includes the end of World War II and the Korean War, our results for the 1962-1991 period reveal structural stability of the MSM(2)-AR(4) model of the U.S. business cycle. The estimates given in Table 12.1 are broadly consistent with those presented by HAMILTON [1989] for the 1952-1984 period. This concerns the conditional means, μ̂₁ = 1.06 vs. 1.16 and μ̂₂ = -0.12 vs. -0.36, as well as the transition probabilities, p̂11 = 0.93 vs. 0.90, p̂22 = 0.86 vs. 0.75, and the error variance σ̂² = 0.57 vs. 0.59. The filtered and smoothed probabilities generated by the MSM(2)-AR(4) model are presented in Figure 12.2. Interestingly, as for the NBER classifications used by HAMILTON [1989], we found that the expansion and contraction episodes found by the Markov-switching model
[Figure 12.3: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: Canada; contribution of the Markov chain to the business cycle]
12.1.2 Canada
The Canadian economy is characterized in the sample period by two strong contractions in 1981/82 and 1990/91. As illustrated in Figure 12.3, the MSM(2)-AR(4) model captures these deep recessions as shifts from regime 1, with an expected growth rate μ̂₁ = 1.18, to regime 2 with a negative mean growth μ̂₂ = -0.37. The shorter contractionary periods in 1974 and 1980, which are classified however by the CIBCR as downswings of the business cycle, are not explained as being caused by a shift in regime, but rather as negative shocks in an underlying expansionary regime. Note also that our estimations are quite compatible with those of GOODWIN [1993], who was compelled to use Bayesian priors to establish a meaningful result.
[Figure 12.4: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: United Kingdom; contribution of the Markov chain to the business cycle]
12.1.3 United Kingdom
The macroeconomic fluctuations in the United Kingdom are marked by the three strong recessions dated by the MSM(2)-AR(4) model as the periods from 1973:4 to 1975:3, 1979:3-1981:3 and 1990:2-1991:4.
The CIBCR methodology leads to three additional, yet shorter, recessions in 1966 and 1971/72. But Figure 12.4 shows that these are not clearly reflected in the quarterly GDP growth rate of these periods and hence not detected by the MS-AR model. Note that this is in line with the estimates of GOODWIN [1993] and SENSIER [1996].
[Figure 12.5: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: West Germany; contribution of the Markov chain to the business cycle]
12.1.4 Germany
MS(M)-AR(p) models of the German business cycle have been discussed at full length in Chapter 11. A comparison of the estimated parameters in Table 12.1 and Table 11.2, as well as of Figure 12.5 with Figure 11.7, shows that the additional 12 observations, together with the update of the 1990/91 observations, have only limited effects. Interestingly, the results are again very close to the estimations of GOODWIN [1993].
In comparison to the U.S. business cycle, the recessions are shorter (4.4 vs. 7 quarters), but more pronounced (-0.4% vs. -0.1%). The variance of the white noise is higher and regime shifts more frequent. Thus, relative to the process of U.S. GNP growth, the German growth rates are more difficult to predict.
[Figure 12.6: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: Japan; contribution of the Markov chain to the business cycle]
12.1.5 Japan
The estimated conditional means, μ̂₁ = 1.4 and μ̂₂ = -0.26, are quite compatible with the business cycles of the other countries under consideration. However, Figure 12.6 indicates that the process of economic growth in Japan is not described very well by a two-regime model. The MSM(2)-AR(4) model underestimates the mean growth rate in the first part of the sample and overestimates the mean growth rate in the second part of post-war economic history.
[Figure 12.7: Smoothed and filtered probabilities of the MSI(4)-AR(4) model: Japan]
Thus, we will consider MS-AR models with more than two regimes. The estimation results of the more general MS(M)-AR(p) models are given in Table 12.2.
The MSI(4)-AR(4) model presented in Figure 12.7 reveals a structural break in the business cycle behavior of the Japanese economy. A growth cycle (regime 1: ν₁ = 3.0 vs. regime 2: ν₂ = 1.22) is identified until 1974. The contraction in 1974 is identified as an outlier state with ν₄ = -2.8 and an expected duration of exactly one quarter. The recession initiates a third regime of dampened macroeconomic fluctuations. This regime is the absorbing state of the regime-shift generating Markov chain, with an expected growth rate of 1.13.
[Figure 12.8: Smoothed and filtered probabilities of the MSI(3)-AR(4) model: Japan]
The more parsimonious MSI(3)-AR(4) model subsumes the growth recessions before 1974 and the post-1974 episode as a joint "normal growth" regime. Virtually unchanged are the "high-growth" regime and the remaining third stagnationary regime, as Figure 12.8 clarifies.
So far we have assumed that the variance is regime-invariant. However, this hypothesis is rejected by likelihood ratio tests. The LR test statistic gives 12.36 for H₀: MSI(3)-AR(4) vs. H₁: MSIH(3)-AR(4) and 13.94 for H₀: MSM(3)-AR(4) vs. H₁: MSMH(3)-AR(4), which are both significant at 1%, χ²_{0.99}(3) = 11.3.
[Figure: Smoothed and filtered probabilities of regimes 1 and 2]
The foregoing results of the MSM(2)-AR(4) model confirm the evidence found in the previous literature that the Hamilton model is able to replicate traditional business cycle classifications. However, as in Chapter 11, we have also seen that there are structural breaks in the data which cannot be subsumed under the notion of business cycles. These findings for Japan of the pre-1975 period are similar to the result of MINTZ [1969] that for the West-German economy of the fifties and sixties only growth cycles can be identified.
[Figure: Smoothed and filtered regime probabilities; contribution of the Markov chain to the business cycle]
12.1.6 Australia
As the last single equation analysis in this study, we investigate the Australian macroeconomic fluctuations with the help of the MSM(2)-AR(4) model. To our knowledge, there exists no result in the literature on MS-AR models of the Australian business cycle. Hence, we use again the CIBCR business cycle classifications as a benchmark.
The estimated parameters given in Table 12.1 are quite compatible with the MSM(2)-AR(4) models discussed previously. Figure 12.10 reveals a relatively high volatility of the Australian growth process. While this observation seems to be consistent with the high frequency of CIBCR recessions, the expected duration of expansion is less than one year, which is much shorter than those of the other country models and the notion of a business cycle. Hence we have considered alternative specifications.
[Figure: Smoothed and filtered regime probabilities: Australia]
Δy_t = M ξ̂_{t|T} - 0.0217 (Δy_{t-1} - M ξ̂_{t-1|T}) - 0.0903 (Δy_{t-2} - M ξ̂_{t-2|T})
               (0.0954)                        (0.0987)
       + 0.1603 (Δy_{t-3} - M ξ̂_{t-3|T}) - 0.1411 (Δy_{t-4} - M ξ̂_{t-4|T}) + û_t
         (0.0985)                         (0.0906)

σ̂₁² = 1.2379    σ̂₂² = 0.2154    ln L = -181.8147
      (0.1809)         (0.1238)
A comparison with Figures 12.3 and 12.4 clarifies that the regime shifts of the
MSMH(2)-AR(4) model are closely related to those in the UK and Canada. This
coherence of regime shifts suggests the notion of a common regime shift generating
process.
In contrast to our analysis in the foregoing chapter, we will not go further into the
details of model specification of the univariate time series under consideration. In-
stead, we will move directly to the system approach by studying a six-dimensional
system of the global economy.
12.1.7 Comparisons
In the preceding discussion, we have seen that the MS(M)-AR(p) model is able to capture the business cycle dynamics of the considered national economies. The recession probabilities that we have obtained for the six countries are compared in Figure 12.12. At least for the last two decades, recessions and booms occur simultaneously across countries (with four exceptions regarding Japan, Canada and Australia); this might be due to the world-wide oil price shocks or the increasing globalization of markets.
[Figure 12.12: recession probabilities of the six countries, 1960–1990]
276 Markov-Switching Models of Global and International Business Cycles
[Figure 12.13: paths of annual growth rates, 1960–1990; panels labelled USA, CAN, UK, FRG, AUS]
This evidence seems also to be consistent with a comparison of the path of annual
growth rates given in Figure 12.13.
12.2 Multi-Country Growth Models with Markov-Switching Regimes
In this section we investigate common regime shifts in the joint stochastic process of economic growth in the six countries under consideration; more precisely, we consider the system of quarterly real GNP growth rates.
In general, it would be possible to consider regime shifts for each individual country separately. However, together with the possible leading/lagging relationships, this formulation entails that the number of regimes would explode to $M^K = 2^6 = 64$ (cf. Table 7.2). The dimension of the regime vector involved in the EM algorithm would be $(M^K)^2 = 64^2 = 4096$ for an MSI specification, or even $(M^K)^{1+p}$ for an MSM model, which makes the analysis infeasible.
Thus we assume in the following that the regime shifts are perfectly correlated. As a consequence, the dynamic propagation mechanism of impulses to the system consists of (i.) a linear autoregression representing the international transmission of national shocks, and (ii.) the Markov process generating the regime shifts, which represents large, contemporaneously occurring common shocks.
This procedure is in line with PHILLIPS' [1991] analysis of monthly growth rates of industrial production, where U.S. data have been combined with UK, German and Japanese data. In none of the bivariate difference-stationary MSM(2)-DVAR(1) models considered by PHILLIPS could the null hypothesis of perfectly correlated regime shifts be rejected.
But our analysis does not only extend the approach of PHILLIPS [1991] to large VAR systems. In particular, we do not restrict our investigation to MSM-DVAR models with M = 2 and p = 1. For pure VAR(p) processes, the Akaike order selection criterion suggests a first-order autoregression in differences (p = 1), while the Hannan-Quinn and the Schwarz criteria support a random walk with drift (p = 0). Hence, such a specification seems to be a good starting point. If we consider time-invariant VAR(p) models as approximations of the infinite VAR representation of a data generating MS-VAR process, then we get p = 1 as the maximal autoregressive order for the MS-DVAR process by neglecting the non-normality of the model. In addition to p = 1, the goodness of fit achieved for each component makes a fourth-order autoregression of the system attractive.
In the following specification analysis, we are going to test the order of the vector autoregression p for various MSI and MSM specifications, introduce additional states M, and allow for shifts in the variance $\Sigma(s_t)$. In order to demonstrate the feasibility of the methods proposed in this study, we have put no further restrictions on the regime switching process. The limited number of regimes can therefore capture quite different shifts in this rather large vector system. Indeed, we will show in the following that alternative specifications of the MS(M)-DVAR(p) model lead to different but complementary conclusions on the economic system under consideration. This strengthens our view that model variety is essential for the statistical analysis of time series subject to regime shifts, and that the necessary methods which allow for the estimation of these models have to be provided.
The results are given in Table 12.4 for a homoskedastic white noise process and in Table 12.3 for a process with a regime-dependent variance-covariance matrix $\Sigma(s_t)$. The implications for business cycle analysis are visualized in Figures 12.14 and 12.15. For both specifications, a contemporaneous structural break in the growth rate of all six time series is detected in 1973:2, when the system approaches the absorbing state 2. This structural break, detected by an unrestricted MSM(2)-DVAR(1) model, is known in economic history as the end of the 'Golden Age'. The striking feature of this period after World War II (~1950–1973) has been an average growth rate which is more than double the mean of any other period in history (cf. e.g. CRAFTS [1995]).
The estimated slump in the mean growth rate in the MSMH(2)-DVAR(1) model is given by:

$$\hat{\mu}_1 - \hat{\mu}_2 = \begin{bmatrix} 0.6037 \\ 1.3313 \\ 0.6336 \\ 0.5279 \\ 0.7833 \\ 0.8325 \end{bmatrix}, \qquad \hat{\mu}_2 = \begin{bmatrix} 0.4754 \\ 0.9462 \\ 0.4926 \\ 0.3707 \\ 0.6757 \\ 0.6141 \end{bmatrix}.$$
As Table 12.4 verifies, these estimates are almost identical to those of the MSM(2)-DVAR(1) model.
Consider now the contemporaneous correlations of the first four variables of the system, where the lower triangular part gives the contemporaneous correlations in regime 1 (1962:1–1973:1) and the upper triangular part gives those in regime 2 (1973:2–1991:4):

$$\begin{bmatrix} \text{USA} & .267 & .113 & .206 \\ -.100 & \text{JAP} & .209 & .267 \\ -.179 & .265 & \text{FRG} & .352 \\ -.016 & .297 & .153 & \text{UK} \end{bmatrix}$$
The importance of shifts in the variance of the white noise process $u_t$ is confirmed by the strong rejection of the MSM(2)-DVAR(1) model against the MSMH(2)-DVAR(1) model: the likelihood ratio test of $H_0\colon \Sigma_1 = \Sigma_2$ yields $LR = 52.3869$, which is significant at the 0.1% level ($\chi^2_{0.999}(21) = 46.8$), where we again assume $\mu_1 \neq \mu_2$ to be valid under the null and the alternative.
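The statistic can be reproduced from the log-likelihoods reported in Appendix 12.A; a small Python check (which of the two appendix values belongs to the restricted model is inferred from the statistic quoted in the text):

```python
# LR test of H0: Sigma_1 = Sigma_2 -- the homoskedastic MSM(2)-DVAR(1)
# against the MSMH(2)-DVAR(1) with regime-dependent covariance matrices.
# Log-likelihood values as reported in Appendix 12.A.
lnL_msm  = -1016.2821   # restricted model (Sigma_1 = Sigma_2), assumed label
lnL_msmh = -990.0887    # unrestricted model, assumed label

lr = 2.0 * (lnL_msmh - lnL_msm)
print(round(lr, 4))   # 52.3868; compare with the chi-square(21)
                      # 0.1% critical value of 46.8 quoted in the text
```

The computed value agrees with the 52.3869 reported in the text up to rounding of the log-likelihoods.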
A parsimonious model that generates global business cycles with only two regimes
and a first-order autoregression is presented in Figure 12.16 and Table 12.5.
The recessionary regime coincides with the post-1973 U.S. recessions identified with the Hamilton model of Section 12.1.1. The smoothing and the filtering procedures identify the oil-price-shock recession from 1973:2 to 1975:1, the double-dip recession of 1979:3–1980:3 and 1981:2–1982:4, as well as the recession in the nineties. They are associated with contractions in the UK and rather slow growth in the other countries:
$$\hat{\nu}_1 - \hat{\nu}_2 = \begin{bmatrix} 1.0355 \\ 0.7996 \\ 0.4284 \\ 0.9459 \\ 0.6118 \\ 0.8360 \end{bmatrix}, \qquad \hat{\nu}_2 = \begin{bmatrix} -0.2551 \\ 0.4776 \\ 0.2798 \\ -0.0074 \\ 0.0957 \\ 0.0460 \end{bmatrix}.$$
Figure 12.17 gives the business cycle dating if the HAMILTON [1989] specification is applied to the multiple time series under consideration. According to the estimated parameters in Table 12.6, the effect of a regime shift is given by
$$\hat{\mu}_1 - \hat{\mu}_2 = \begin{bmatrix} 1.8294 \\ 1.0072 \\ 1.1454 \\ 1.1818 \\ 1.6696 \\ 1.3761 \end{bmatrix}, \qquad \hat{\mu}_2 = \begin{bmatrix} -0.7622 \\ 0.5704 \\ -0.2149 \\ -0.3876 \\ -0.3795 \\ -0.2193 \end{bmatrix}.$$
However, while in the MSM(2)-DVAR(4) model the second regime was associated with negative mean growth rates in all national economies except Japan, the recessionary state here reveals contractions only in the UK and Australia, while the mean growth rate in the USA, Canada, Japan and West Germany corresponds more closely to growth recessions:
$$\hat{\mu}_2 = \begin{bmatrix} 0.1904 \\ 0.4285 \\ 0.4254 \\ -0.0293 \\ 0.2607 \\ -0.2344 \end{bmatrix}, \qquad \hat{\mu}_1 - \hat{\mu}_2 = \begin{bmatrix} 0.7958 \\ 1.4707 \\ 0.3777 \\ 0.9626 \\ 1.0627 \\ 1.6825 \end{bmatrix}.$$
Thus, this model detects some asymmetries in the national size of the global business cycle; the effect of a shift in regime is, at an annualized 1.51%, much less important for the German economy than in the rest of the world, where the drop in the mean growth rate is between 3.06% and 6.73% per annum.

In line with the MSMH(2)-DVAR(1) model, the innovations in the rest of the world are highly positively contemporaneously correlated with shocks in the U.S. growth rate in the second regime. While the variance of all other growth rates is reduced in regime 2, the U.S. standard error is doubled.
$$\hat{\mu}_1 - \hat{\mu}_2 = \begin{bmatrix} 0.6694 \\ 1.8637 \\ 0.8296 \\ 0.5439 \\ 0.8413 \\ 1.0953 \end{bmatrix}, \qquad \hat{\mu}_1 = \begin{bmatrix} 1.2692 \\ 2.8917 \\ 1.4081 \\ 1.2664 \\ 1.6168 \\ 1.8695 \end{bmatrix} \qquad (12.1)$$
Interestingly, global recessions are again asymmetric. Negative mean growth rates
are restricted to the four English-speaking countries, whereas the loss in economic
$$\hat{\mu}_2 - \hat{\mu}_3 = \begin{bmatrix} 1.0461 \\ 0.2657 \\ 0.1919 \\ 1.6160 \\ 0.8850 \\ 0.8293 \end{bmatrix}, \qquad \hat{\mu}_3 = \begin{bmatrix} -0.4463 \\ 0.7623 \\ 0.3866 \\ -0.8935 \\ -0.1095 \\ -0.0551 \end{bmatrix}.$$
In the class of Markov-switching models with two regimes we have found evidence for a fourth-order autoregression. Table 12.9 gives the estimation results for an MSIH(3)-DVAR(4) model, where again regime 1 reflects high-growth episodes, regime 2 corresponds to 'normal' macroeconomic growth, and regime 3 indicates recessions. In comparison with the MSMH(3)-DVAR(1) model, it produces a relatively shorter duration of the high-growth regime (3.3 vs. 12.7 quarters), but a clearer indication of recessions (5.5 vs. 2.4 quarters).

In the first chart of Figure 12.20 we have again given the CIBCR business cycle classification for Japan. It can be seen that regime 1 matches the downswings of the growth cycle very well. Moreover, this result is quite compatible with the UK and German classifications:
$$\hat{\nu}_1 - \hat{\nu}_2 = \begin{bmatrix} 0.4353 \\ 1.4814 \\ 1.5943 \\ 2.0775 \\ 0.8326 \\ 0.7864 \end{bmatrix}, \qquad \hat{\nu}_1 = \begin{bmatrix} 1.4626 \\ 2.4330 \\ 1.9709 \\ 3.0103 \\ 1.7937 \\ 1.5609 \end{bmatrix}.$$
A direct comparison of these impact effects of a shift from regime 2 to regime 1 with (12.1) in the MSMH(3)-DVAR(1) world could be misleading, since the assumed dynamic propagations of regime shifts are different. In an MSI-DVAR model, a persistent shift in regime causes effects which are equivalent to the accumulated responses to an impulse as high as $\nu_1 - \nu_2$; while, in the MSM-DVAR model, a once-and-for-all jump in the mean growth rate is enforced.
::«<
r----,-r----~I~----or~~------~----------------~
III :. :. :. :. :. <-: . . . . .
0.5
I
I
I
I
..... t:::::::·:·
.~: ~
I
I
I
I
I ::: ::: : :: . •...•:.: :.: .: :.: .: i.· •.: :.. : ...
}}~.
0.0 +---~~~--~~~~~~~------~~~--~~----~~
60 65 70 75 80 85 90
..
0.5 ...... . ,
..
: .: .:.:.....:.:.:.:. :.:.::. I~. i
0.0 ~.. ~~·~.·~.·~·~~~~~~~n~~.~..~.~a~ftu-~~L.~..~.k-~~l~·\_n~I'~:~~.:~:::~:::
.
... ~. !~I ". .... : ::: : ~ , ....
60 65 70 75 80 85 90
The recessionary regime 3 coincides obviously with the post-1973 recessions of the U.S. economy, which are associated with contractions in all other countries:
$$\hat{\nu}_2 - \hat{\nu}_3 = \begin{bmatrix} 1.2128 \\ 0.8816 \\ 0.7227 \\ 1.0426 \\ 0.6992 \\ 0.9943 \end{bmatrix}, \qquad \hat{\nu}_3 = \begin{bmatrix} -0.1855 \\ 0.0700 \\ -0.3461 \\ -0.1098 \\ 0.2619 \\ -0.2198 \end{bmatrix}.$$
The smoothed, as well as the filtered, probabilities of regime 3 reflect the oil-price-shock recession from 1973:2 to 1975:1, the double-dip recession of 1979:3–1980:3 and 1981:2–1982:4, and the last recession starting 1990:4.
It needs no further clarification to see that the MSIH(3)-DVAR(4) model has the best fit of all estimated models, with a log-likelihood of −812.76. While a likelihood ratio test of the three-regime hypothesis would be confronted with the violation of the identifiability assumption of standard asymptotic theory (cf. Section 7.5), this provides some evidence in favor of the MSIH(3)-DVAR(4) model.
12.3 Conclusions
(i.) In each time series considered, business cycle phenomena could be identified
as Markov-switching regimes in the mean growth rate.
(ii.) There is clear evidence for a structural break in the unconditional trend growth of the world economy in 1973:2. For the Japanese economy this result is obvious; in the other univariate analyses the lowered trend growth after 1973 is expressed in a higher frequency of realized recessionary states.
(iii.) Booms and recessions occur to a large extent simultaneously across countries. Since the oil-price shock in 1973/74, contemporaneous world-wide shocks have been the major source of the high international co-movement of output growth.
(iv.) In addition to the uniform regime shifts in the mean growth rate, the post-1973 period is characterized by a strong contemporaneous correlation of country-specific shocks.
Altogether there is some evidence that the macroeconomic fluctuations of the last twenty years have been mainly driven by world-wide shocks. While the dominance of a global business cycle does not exclude the possibility that a large asymmetric shock such as the German reunification can temporarily interfere with the common cycle, the MS-DVAR models suggest a less than central role for the international transmission of country-specific shocks.¹

¹ In contrast, we will see in the next chapter that the international transmission of shocks in the U.S. economy dominates the dynamics in a linear cointegrated VAR model.
Even the very rudimentary six-country models considered in this chapter have been able to produce plausible results for the statistical characterization of international business cycles over the last three decades. Nevertheless a deeper analysis seems desirable. In particular, the assumption that the unit roots in the data generating process can be eliminated by differencing without destroying relevant information seems too restrictive, as it does not allow for catch-up effects in low-income countries, which e.g. might be an explanation for the high growth rate of the Japanese economy in the sixties.²

² Note, however, that an economically meaningful analysis of the issue of convergence would require per-capita data, which have not been used in this study.
$$\hat{\mu}_1 = \begin{bmatrix} 1.0791 \\ 2.2775 \\ 1.1262 \\ 0.8986 \\ 1.4590 \\ 1.4466 \end{bmatrix}, \quad \hat{\mu}_2 = \begin{bmatrix} 0.4754 \\ 0.9462 \\ 0.4926 \\ 0.3707 \\ 0.6757 \\ 0.6141 \end{bmatrix}, \quad \ln L = -990.0887,$$

$$\hat{P} = \begin{bmatrix} 0.9778 & 0.0222 \\ 0.0000 & 1.0000 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.0000 \\ 1.0000 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 44.9996 \\ \infty \end{bmatrix}.$$
12.A. Appendix: Estimated MS-DVAR Models 291
$$\hat{\mu}_1 = \begin{bmatrix} 1.0951 \\ 2.2973 \\ 1.1473 \\ 0.8920 \\ 1.4705 \\ 1.4651 \end{bmatrix}, \quad \hat{\mu}_2 = \begin{bmatrix} 0.4749 \\ 0.9422 \\ 0.4920 \\ 0.3717 \\ 0.6735 \\ 0.6097 \end{bmatrix}, \quad \ln L = -1016.2821,$$

$$\hat{P} = \begin{bmatrix} 0.9777 & 0.0223 \\ 0.0000 & 1.0000 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.0000 \\ 1.0000 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 44.8849 \\ \infty \end{bmatrix}.$$
$$\hat{\nu}_1 = \begin{bmatrix} 0.7804 \\ 1.2772 \\ 0.7082 \\ 0.9385 \\ 0.7075 \\ 0.8810 \end{bmatrix}, \quad \hat{\nu}_2 = \begin{bmatrix} -0.2551 \\ 0.4776 \\ 0.2798 \\ -0.0074 \\ 0.0957 \\ 0.0460 \end{bmatrix}, \quad \ln L = -1026.1669,$$

$$\hat{P} = \begin{bmatrix} 0.9482 & 0.0518 \\ 0.1387 & 0.8613 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.7278 \\ 0.2722 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 19.2872 \\ 7.2122 \end{bmatrix}.$$
$$\hat{\mu}_1 = \begin{bmatrix} 1.0672 \\ 1.5786 \\ 0.9305 \\ 0.7942 \\ 1.2901 \\ 1.1568 \end{bmatrix}, \quad \hat{\mu}_2 = \begin{bmatrix} -0.7622 \\ 0.5704 \\ -0.2149 \\ -0.3876 \\ -0.3795 \\ -0.2193 \end{bmatrix}, \quad \ln L = -936.1343,$$

$$\hat{P} = \begin{bmatrix} 0.9482 & 0.0518 \\ 0.1829 & 0.8171 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.7791 \\ 0.2209 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 19.2886 \\ 5.4681 \end{bmatrix}.$$
$$\tilde{A}_2 = \begin{bmatrix} -0.2799 & 0.1380 & 0.1526 & -0.0426 & 0.1060 & 0.0894 \\ 0.1096 & 0.1145 & -0.1205 & -0.0551 & 0.1754 & -0.0681 \\ 0.3795 & -0.0500 & -0.0369 & 0.0515 & -0.0916 & -0.1422 \\ 0.0020 & -0.1081 & -0.0586 & 0.0778 & -0.0373 & 0.0117 \\ -0.1342 & 0.0614 & 0.0172 & 0.0164 & 0.0918 & -0.0261 \\ -0.1344 & -0.0288 & 0.1587 & 0.2645 & -0.0904 & -0.0253 \end{bmatrix}$$
$$\tilde{A}_3 = \begin{bmatrix} -0.3536 & 0.0937 & -0.0159 & 0.0285 & 0.4001 & 0.1961 \\ 0.0811 & -0.1399 & -0.1529 & 0.0188 & 0.0865 & -0.4420 \\ 0.0080 & 0.0930 & -0.1220 & -0.0721 & -0.0091 & 0.2572 \\ -0.1243 & 0.1263 & -0.1878 & 0.0594 & -0.1784 & -0.0040 \\ -0.1507 & -0.0684 & 0.0566 & 0.1451 & 0.0563 & 0.1386 \\ 0.0021 & -0.0182 & 0.0618 & 0.3245 & 0.0903 & 0.0032 \end{bmatrix}$$
$$\tilde{A}_4 = \begin{bmatrix} -0.2913 & -0.1810 & 0.2756 & -0.2046 & 0.0169 & -0.0272 \\ -0.0335 & 0.1626 & 0.2977 & -0.0310 & 0.0909 & -0.0115 \\ -0.0153 & -0.0895 & 0.1455 & -0.0072 & 0.0292 & 0.3456 \\ -0.1045 & 0.0194 & 0.0958 & 0.0616 & -0.0688 & -0.0188 \\ -0.5272 & 0.1389 & 0.1519 & -0.1410 & 0.3185 & -0.1247 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix}$$
$$\tilde{\Sigma}_1 = \begin{bmatrix} 0.4383 & -0.2789 & -0.0979 & -0.1086 & 0.0603 & -0.0366 \\ -0.2789 & 1.2289 & 0.0283 & 0.2937 & 0.2400 & 0.1901 \\ -0.0979 & 0.0283 & 2.1132 & 0.4829 & 0.0437 & -0.2993 \\ -0.1086 & 0.2937 & 0.4829 & 2.0066 & 0.0915 & 0.3645 \\ 0.0603 & 0.2400 & 0.0437 & 0.0915 & 0.5582 & 0.0867 \\ -0.0366 & 0.1901 & -0.2993 & 0.3645 & 0.0867 & 1.0143 \end{bmatrix}$$
$$\tilde{\Sigma}_2 = \begin{bmatrix} 1.0214 & 0.4668 & 0.3754 & 0.2659 & 0.5009 & 0.2365 \\ 0.4668 & 0.6209 & 0.5250 & 0.1423 & -0.1681 & 0.0119 \\ 0.3754 & 0.5250 & 0.5294 & 0.0653 & -0.1471 & -0.0187 \\ 0.2659 & 0.1423 & 0.0653 & 0.5531 & 0.0815 & -0.4368 \\ 0.5009 & -0.1681 & -0.1471 & 0.0815 & 0.7043 & 0.2151 \\ 0.2365 & 0.0119 & -0.0187 & -0.4368 & 0.2151 & 0.9074 \end{bmatrix}$$
$$\hat{\mu}_1 = \begin{bmatrix} 0.9862 \\ 1.8992 \\ 0.8031 \\ 0.9333 \\ 1.3234 \\ 1.4481 \end{bmatrix}, \quad \hat{\mu}_2 = \begin{bmatrix} 0.1904 \\ 0.4285 \\ 0.4254 \\ -0.0293 \\ 0.2607 \\ -0.2344 \end{bmatrix}, \quad \ln L = -895.4399,$$

$$\hat{P} = \begin{bmatrix} 0.9408 & 0.0592 \\ 0.1184 & 0.8816 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.6667 \\ 0.3333 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 16.8968 \\ 8.4460 \end{bmatrix}.$$
$$\hat{P} = \begin{bmatrix} 0.9214 & 0.0786 & 0.0000 \\ 0.0287 & 0.8418 & 0.1295 \\ 0.0000 & 0.4148 & 0.5852 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.2178 \\ 0.5961 \\ 0.1861 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 12.7305 \\ 6.3217 \\ 2.4109 \end{bmatrix},$$

$$\ln L = -926.3881.$$
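The ergodic probabilities $\bar{\xi}$ and expected durations $E[h]$ reported with each model follow directly from the transition matrix; a small Python check for the three-regime chain above (matrix entries as printed, so rounding discrepancies in the last digits remain):

```python
# Stationary (ergodic) distribution and expected regime durations of a
# Markov chain; P as printed for the three-regime model, rows = current regime.
P = [[0.9214, 0.0786, 0.0000],
     [0.0287, 0.8418, 0.1295],
     [0.0000, 0.4148, 0.5852]]

xi = [1.0 / 3] * 3                     # arbitrary initial distribution
for _ in range(5000):                  # power iteration: xi' <- xi' P
    xi = [sum(xi[i] * P[i][j] for i in range(3)) for j in range(3)]

dur = [1.0 / (1.0 - P[m][m]) for m in range(3)]   # E[h_m] = 1/(1 - p_mm)
print([round(x, 4) for x in xi])    # close to the printed (0.2178, 0.5961, 0.1861)
print([round(d, 2) for d in dur])   # close to the printed (12.73, 6.32, 2.41)
```

Both quantities agree with the tabulated values up to the rounding of the transition probabilities.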
$$\tilde{A}_1 = \begin{bmatrix} 0.3019 & -0.1200 & -0.2193 & 0.0132 & 0.0131 & -0.0979 \\ -0.0610 & -0.1047 & -0.2615 & -0.2652 & 0.0713 & -0.1593 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0.3499 & -0.0567 & 0.0128 & 0.0469 & -0.0478 & -0.3542 \\ -0.0589 & 0.1331 & -0.0229 & -0.1207 & 0.0968 & -0.1422 \\ -0.0604 & 0.1620 & 0.0837 & -0.0646 & 0.1396 & 0.0350 \end{bmatrix}$$
$$\tilde{A}_2 = \begin{bmatrix} -0.0728 & 0.2435 & 0.0363 & -0.1468 & 0.1155 & -0.0468 \\ 0.0511 & 0.0418 & -0.1448 & -0.0005 & -0.0168 & 0.1427 \\ 0.3612 & -0.0708 & -0.0609 & 0.0014 & 0.0717 & -0.0085 \\ -0.0030 & -0.1911 & 0.0101 & -0.0733 & -0.0246 & -0.1022 \\ 0.0544 & -0.1178 & -0.0286 & 0.1261 & 0.2739 & -0.2316 \\ -0.1201 & 0.0024 & 0.0311 & 0.0821 & 0.1251 & 0.0014 \end{bmatrix}$$
$$\tilde{A}_3 = \begin{bmatrix} -0.0563 & 0.0256 & -0.1119 & 0.0813 & 0.2189 & 0.0239 \\ 0.0021 & -0.0840 & 0.0180 & 0.0349 & 0.1336 & -0.0730 \\ -0.0040 & 0.1096 & -0.1864 & -0.0895 & -0.0162 & -0.0246 \\ 0.0552 & 0.2081 & 0.0215 & 0.0678 & 0.0316 & -0.0045 \\ 0.0929 & -0.0829 & 0.0068 & 0.3014 & -0.0966 & -0.0234 \\ 0.0042 & -0.2138 & 0.0799 & -0.0504 & -0.1485 & -0.1148 \end{bmatrix}$$
$$\tilde{A}_4 = \begin{bmatrix} -0.0318 & -0.0323 & 0.0625 & -0.0379 & -0.0894 & 0.0954 \\ -0.1204 & 0.1950 & 0.3304 & -0.0851 & -0.0139 & 0.1104 \\ 0.0930 & -0.1344 & 0.1110 & -0.0246 & 0.0654 & 0.2069 \\ -0.0818 & 0.2090 & 0.0006 & -0.0222 & -0.0994 & \cdots \\ -0.3487 & 0.0807 & -0.1422 & 0.3095 & -0.2874 & \cdots \\ 0.0967 & 0.0815 & 0.2365 & 0.2601 & -0.1803 & \cdots \end{bmatrix}$$
$$\tilde{\Sigma}_1 = \begin{bmatrix} 0.3447 & -0.2417 & -0.2682 & 0.2365 & 0.2601 & -0.1803 \\ -0.2417 & 0.5941 & 0.8281 & -0.7906 & 0.2479 & -0.0443 \\ -0.2682 & 0.8281 & 1.8107 & -1.5582 & 0.1518 & -0.1079 \\ 0.2365 & -0.7906 & -1.5582 & 4.1308 & -0.0078 & 0.1913 \\ 0.2601 & 0.2479 & 0.1518 & -0.0078 & 0.7993 & -0.3136 \\ -0.1803 & -0.0443 & -0.1079 & 0.1913 & -0.3136 & 0.1700 \end{bmatrix}$$
$$\tilde{\Sigma}_2 = \begin{bmatrix} 0.4356 & -0.0643 & -0.1381 & -0.2410 & 0.1643 & 0.0177 \\ -0.0643 & 0.7252 & -0.0951 & -0.0469 & 0.0457 & 0.0284 \\ -0.1381 & -0.0951 & 1.2716 & 0.3832 & -0.1294 & -0.3024 \\ -0.2410 & -0.0469 & 0.3832 & 1.2429 & -0.0479 & 0.0887 \\ 0.1643 & 0.0457 & -0.1294 & -0.0479 & 0.5199 & 0.1490 \\ 0.0177 & 0.0284 & -0.3024 & 0.0887 & 0.1490 & 1.2206 \end{bmatrix}$$
$$\tilde{\Sigma}_3 = \begin{bmatrix} \cdots & 0.1514 & 0.2343 & 0.3784 & 0.1238 & -0.0109 \\ 0.1514 & 0.5515 & 0.2697 & 0.4712 & -0.3952 & -0.3606 \\ 0.2343 & 0.2697 & 0.5938 & 0.0464 & -0.2620 & -0.1664 \\ 0.3784 & 0.4712 & 0.0464 & 0.7824 & -0.1304 & -0.3441 \\ 0.1238 & -0.3952 & -0.2620 & -0.1304 & 0.6942 & 0.2092 \\ -0.0109 & -0.3606 & -0.1664 & -0.3441 & 0.2092 & 0.3892 \end{bmatrix}$$
$$\hat{\nu}_1 = \begin{bmatrix} 1.4626 \\ 2.4330 \\ 1.9709 \\ 3.0103 \\ 1.7937 \\ 1.5609 \end{bmatrix}, \quad \hat{\nu}_2 = \begin{bmatrix} \cdots \\ 0.9516 \\ 0.3766 \\ 0.9328 \\ 0.9611 \\ 0.7745 \end{bmatrix}, \quad \hat{\nu}_3 = \begin{bmatrix} -0.1855 \\ 0.0700 \\ -0.3461 \\ -0.1098 \\ 0.2619 \\ -0.2198 \end{bmatrix}, \quad \ln L = -812.7632,$$

$$\hat{P} = \begin{bmatrix} 0.6929 & 0.2052 & 0.1019 \\ 0.0351 & 0.9156 & 0.0493 \\ 0.0000 & 0.1803 & 0.8197 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.0787 \\ 0.6887 \\ 0.2326 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 3.2560 \\ 11.8550 \\ 5.5455 \end{bmatrix}.$$
Chapter 13
Cointegration Analysis of VAR Models with Markovian Shifts in Regime
The chapter proceeds as follows. The next section gives a brief introduction to the issue of cointegration. Then we introduce the MSCI(M,r)-VAR(p) model as a Markov-switching p-th order vector autoregression with cointegration rank r and M regimes. Modelling and some basic theoretical properties of these processes are discussed in Section 13.1. Issues of co-breaking drifts and intercepts are also investigated. In a generalization of the results of Chapter 3, a cointegrated VARMA representation for MSCI(M,r)-VAR(p) processes is introduced in Section 13.2. For this class of processes, a two-stage ML estimation technique is proposed in Section 13.3. In the first stage, the JOHANSEN [1988], [1991] procedure is applied to finite VAR approximations of the data generating MSCI-VAR process in order to determine the cointegration rank and estimate the cointegration matrix. In the second stage, conditional on the estimated cointegration matrix, the remaining parameters of the vector equilibrium correction representation of the MSCI-VAR process are estimated via the version of the EM algorithm presented in Section 10.1. Finally, the proposed methodology is illustrated with an application to the data set introduced in the last chapter.
298 Cointegration Analysis of VAR Models with Markovian Shifts in Regime
13.1.1 Cointegration

Consider a p-th order vector autoregression with a Markov-switching intercept term,

$$y_t = \nu(s_t) + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t, \qquad (13.1)$$

where $y_t = (y_{1t}, \ldots, y_{Kt})'$, $\nu(s_t) = (\nu_1(s_t), \ldots, \nu_K(s_t))'$, the $A_i$ are $(K \times K)$ coefficient matrices and $u_t = (u_{1t}, \ldots, u_{Kt})'$ is a Gaussian white noise with covariance matrix $\Sigma$, $u_t \sim \text{NID}(0, \Sigma)$, and $y_0, \ldots, y_{1-p}$ are fixed. The reverse characteristic polynomial of the system (13.1) is given by

$$A(z) = I_K - A_1 z - \ldots - A_p z^p.$$
If $|A(z)|$ has one or more roots at $z = 1$, $|A(1)| = 0$, and all other roots lie outside the complex unit circle, $|A(z)| \neq 0$ for $|z| \leq 1$, $z \neq 1$, then the variables $y_t$ are integrated and possibly cointegrated.
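As an aside, the root condition can be checked numerically; a sketch for a hypothetical bivariate VAR(1), with a coefficient matrix made up for illustration:

```python
# Reverse characteristic polynomial |A(z)| = |I - A1*z| of a hypothetical
# bivariate VAR(1).  Here |A(z)| = 0.5*(z - 1)*(z - 2): one root at z = 1
# (so y_t is I(1)), the other root outside the unit circle.
A1 = [[0.7, 0.3],
      [0.2, 0.8]]

def det_A(z):
    # determinant of I - A1*z for the 2x2 case
    return (1 - A1[0][0] * z) * (1 - A1[1][1] * z) - (A1[0][1] * z) * (A1[1][0] * z)

print(abs(det_A(1.0)) < 1e-12, abs(det_A(2.0)) < 1e-12)  # True True
```

Because $|A(1)| = 0$ while $I - A_1$ has rank one, this toy system is integrated with a single cointegration relation.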
In the following we consider processes where $y_t$ is integrated of order 1, $y_t \sim I(1)$, such that $\Delta y_t$ is stable while $y_t$ is unstable. The I(1) process $y_t$ is called cointegrated if there is at least one linear combination $c'y_t$ of these variables which is stationary. Obviously, there can exist up to $K-1$ linearly independent cointegration relationships. The variable $z_t = c'y_t - b$ with $b = E[c'y_t]$ is a stationary stochastic variable measuring deviations from the equilibrium.
13.1. Cointegrated VAR Processes with Markov-Switching Regimes 299
The concept of cointegration is closely related to the error correction model, respectively the vector equilibrium correction model (VECM), proposed by DAVIDSON et al. [1978]. Subtracting $y_{t-1}$ from both sides and rearranging terms, the process defined in (13.1) can be written in its vector equilibrium correction form as

$$\Delta y_t = \nu(s_t) + \sum_{i=1}^{p-1} D_i \Delta y_{t-i} + \Pi y_{t-1} + u_t, \qquad (13.2)$$

where $\Pi = -A(1)$ is singular. The rank r of the matrix $\Pi$ is called the cointegration rank. Thus $\Pi$ can be written as $BC$ with $B$ and $C'$ being of dimension $(K \times r)$ and of rank r. The $(r \times K)$ matrix $C$ is denoted as the cointegration matrix and the matrix $B$ is sometimes called the loading matrix. We consider systems with $0 < r < K$; thus $y_t$ is neither stationary ($r = K$; $\Pi$ unrestricted) nor purely difference stationary ($r = 0$; $\Pi = 0$). A more detailed discussion of the properties and statistical analysis of linear cointegrated systems ($\nu(s_t) \equiv \nu$) can be found in LÜTKEPOHL [1991, ch. 11]. A Markov-switching p-th order vector autoregression with cointegration rank r is called an MSCI(M,r)-VAR(p) model.
In cointegrated VAR(p) models, the intercept term $\nu$ in general reflects two rather different quantities. Applying the expectation operator to the VECM model (13.2) gives us

$$D(1)\,E[\Delta y_t] = \nu + B\,E[C y_t],$$

where $D(1) = I_K - D_1 - \ldots - D_{p-1}$. Thus,

$$\nu = -B\delta + D(1)\mu,$$

where $\mu$ denotes the expected first difference of the time series and $\delta$ is a constant determining the long-run equilibrium and is thus included in the cointegration relation.
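For a numerical illustration of this decomposition, consider a hypothetical $K = 2$, $r = 1$, $p = 2$ system; all parameter values below are made up:

```python
# Numerical check of nu = -B*delta + D(1)*mu for a hypothetical
# K = 2, r = 1, p = 2 system; mu obeys the restriction C*mu = 0 (13.3).
B     = [[-0.3], [0.1]]          # loading matrix (K x r)
C     = [[1.0, -1.0]]            # cointegration matrix (r x K)
delta = 0.25                     # equilibrium constant
D1    = [[0.4, 0.0], [0.1, 0.2]]
mu    = [0.02, 0.02]             # C*mu = 0.02 - 0.02 = 0

D_one = [[1.0 - D1[0][0], -D1[0][1]],
         [-D1[1][0], 1.0 - D1[1][1]]]          # D(1) = I - D1

nu = [-B[k][0] * delta + sum(D_one[k][j] * mu[j] for j in range(2))
      for k in range(2)]
print([round(v, 4) for v in nu])   # the intercept implied by (delta, mu)
```

The sketch makes explicit that the same intercept vector $\nu$ mixes the equilibrium constant $\delta$ and the drift $\mu$.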
300 Cointegration Analysis of VAR Models with Markovian Shifts in Regime
Cointegration implies the following restriction for the expected first differences of the system:

$$C\mu = 0, \qquad (13.3)$$

revealing that $\mu$ consists only of $K-r$ free parameters reflecting the common deterministic linear trends of the system. Thus $\mu$ can be parameterized in terms of a $([K-r] \times 1)$ vector $\mu^* = (\mu_1^*, \ldots, \mu_{K-r}^*)'$. If the intercept term can be absorbed into the cointegration relation, the variables have no deterministic linear time trends. Otherwise, in the absence of any restriction on $\nu$, there are $K-r$ time trends producing the drift in $y_t$.
Analogously, a regime shift in the intercept term can change the mean growth rate and the equilibrium mean. In MSCI-VAR models each regime $m = 1, \ldots, M$ is associated with an attractor $(\mu_m^*, \delta_m)$: (13.5)

(iv.) Contemporaneous shifts in the drift $\mu(s_t)$ and in the long-run equilibrium $\delta(s_t)$: (13.8)

where $\delta(s_t)$ and $\mu(s_t)$ are defined as in (13.6) and (13.7). The difference to the model in (13.5) consists of an immediate one-time jump of the process drift and equilibrium mean after a change in regime, as in the MSM-VAR model. Furthermore, the shifts in the drift and in the long-run equilibrium might be (contemporaneously or intertemporally) perfectly correlated or not.
The MS-VECM model is closely related to the notion of multiple equilibria in dynamic economic theory. Henceforth, each regime is characterized by its attractor of the system, which is defined by the equilibrium value of the cointegration vector and the drift.

Consider, for example, a bivariate model with logarithms of income $y_t$ and consumption $c_t$, where the cointegration relation is determined by an equilibrium consumption ratio, $c_t - y_t = \delta$. The MS-VECM form of this model is given by (13.9)
where $u_t \sim \text{NID}(0, \Sigma)$ and $\mu^*$ is the equilibrium growth rate. In (13.9), each regime $m$ is associated with a particular attractor $(\mu_m^*, \delta_m)$ given by the equilibrium growth rate $\mu_m^*$ and the equilibrium consumption ratio $\delta_m$. Hence the different specifications of the MSCI-VAR process can be characterized either by (i.) a rather complex dynamic adjustment after the transition from one state into another, $\nu(s_t)$; (ii.) regime shifts in the common growth rate $\mu^*(s_t)$; (iii.) regime shifts in the equilibrium consumption ratio $\delta(s_t)$; or (iv.) contemporaneous regime shifts in both parameter vectors, $\mu(s_t)$ and $\delta(s_t)$.
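A stylized simulation of such a two-regime consumption-income system shows how $c_t - y_t$ gravitates towards the equilibrium ratio of the prevailing regime; all parameter values below are hypothetical, chosen only for illustration:

```python
import random

# Stylized simulation of the bivariate MS-VECM (13.9): income y and
# consumption c share a common drift mu*, and c - y is pulled towards the
# regime-dependent equilibrium ratio delta(s_t).  All values hypothetical.
random.seed(0)
mu_star = 0.005            # common equilibrium growth rate
delta   = [-0.10, -0.20]   # equilibrium log consumption ratio per regime
p_stay  = 0.95             # probability of remaining in the current regime
alpha   = 0.2              # adjustment speed towards the attractor

y, c, s = 0.0, delta[0], 0
for t in range(200):
    if random.random() > p_stay:   # Markovian regime switch
        s = 1 - s
    z = c - y - delta[s]           # equilibrium error z_t
    y += mu_star + random.gauss(0.0, 0.01)
    c += mu_star - alpha * z + random.gauss(0.0, 0.01)

print(round(c - y, 3))   # hovers near the delta of the prevailing regime
```

A regime switch shifts the attractor, after which the equilibrium error is worked off gradually at rate $\alpha$, exactly the "complex dynamic adjustment" of specification (i.).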
$$\nu(s_t) = \bar{\nu} + M \zeta_t, \qquad (13.10)$$

where $M$ is a $(K \times [M-1])$ matrix and the $([M-1] \times 1)$ regime vector is defined as

$$\zeta_t = \begin{bmatrix} \xi_{1t} - \bar{\xi}_1 \\ \vdots \\ \xi_{M-1,t} - \bar{\xi}_{M-1} \end{bmatrix}. \qquad (13.11)$$

The regime vector $\zeta_t$ follows the hidden Markov chain, which is again represented as a VAR(1) process,

$$\zeta_t = F \zeta_{t-1} + v_t, \qquad (13.12)$$
where in the $([M-1] \times [M-1])$ matrix $F$ the adding-up restriction on the transposed transition matrix $P'$ is eliminated,

$$F = \begin{bmatrix} p_{11} - p_{M1} & \cdots & p_{M-1,1} - p_{M1} \\ \vdots & & \vdots \\ p_{1,M-1} - p_{M,M-1} & \cdots & p_{M-1,M-1} - p_{M,M-1} \end{bmatrix}.$$
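The elimination of the adding-up restriction can be sketched in a few lines; the transition matrix below is hypothetical:

```python
# Build the ([M-1] x [M-1]) matrix F from the transition matrix P by
# eliminating the adding-up restriction on P' (display above):
# F[i][j] = p_{j+1, i+1} - p_{M, i+1}, with 1-based indices as in the text.
def regime_var_matrix(P):
    M = len(P)
    return [[P[j][i] - P[M - 1][i] for j in range(M - 1)]
            for i in range(M - 1)]

# Hypothetical two-state chain: F collapses to the scalar p11 - p21,
# i.e. p11 + p22 - 1, whose absolute value is below one.
P2 = [[0.9, 0.1],
      [0.3, 0.7]]
print(regime_var_matrix(P2))   # [[0.6...]] = 0.9 + 0.7 - 1
```

For $M = 2$ the stability condition on $F$ is simply $|p_{11} + p_{22} - 1| < 1$, which holds whenever neither state is absorbing.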
In matrix notation, the MS-VECM can be given the state-space representation

$$\begin{bmatrix} \Delta y_t \\ \Delta y_{t-1} \\ \vdots \\ \Delta y_{t-p+1} \\ C y_{t-p} \\ \zeta_t \end{bmatrix} = \begin{bmatrix} D_1 & \cdots & D_{p-1} & BC & B & M \\ I_K & & 0 & 0 & 0 & 0 \\ & \ddots & & \vdots & \vdots & \vdots \\ 0 & & I_K & 0 & 0 & 0 \\ 0 & \cdots & 0 & C & I_r & 0 \\ 0 & \cdots & 0 & 0 & 0 & F \end{bmatrix} \begin{bmatrix} \Delta y_{t-1} \\ \Delta y_{t-2} \\ \vdots \\ \Delta y_{t-p} \\ C y_{t-p-1} \\ \zeta_{t-1} \end{bmatrix} + \begin{bmatrix} \nu \\ 0 \\ \vdots \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \\ 0 \\ v_t \end{bmatrix}$$

or, in mean-adjusted form,

$$\begin{bmatrix} \Delta y_t - \bar{\mu} \\ \Delta y_{t-1} - \bar{\mu} \\ \vdots \\ \Delta y_{t-p+1} - \bar{\mu} \\ C y_{t-p} - \delta \\ \zeta_t \end{bmatrix} = \begin{bmatrix} D_1 & \cdots & D_{p-1} & BC & B & M \\ I_K & & 0 & 0 & 0 & 0 \\ & \ddots & & \vdots & \vdots & \vdots \\ 0 & & I_K & 0 & 0 & 0 \\ 0 & \cdots & 0 & C & I_r & 0 \\ 0 & \cdots & 0 & 0 & 0 & F \end{bmatrix} \begin{bmatrix} \Delta y_{t-1} - \bar{\mu} \\ \Delta y_{t-2} - \bar{\mu} \\ \vdots \\ \Delta y_{t-p} - \bar{\mu} \\ C y_{t-p-1} - \delta \\ \zeta_{t-1} \end{bmatrix} + \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \\ 0 \\ v_t \end{bmatrix}. \qquad (13.13)$$
If there exists no absorbing state of the Markov chain, then all eigenvalues of $F$ are less than one in absolute value. In PROIETTI [1994, p. 5] it is shown that the remaining eigenvalues of (13.13) lie outside the unit circle. Thus, the state-space representation (13.13) associated with MS-VECM processes is stable.

This steady-state representation opens the way to the investigation of common trends and cycles. Due to the non-normal innovations $v_t$, the statistical analysis of (13.13) requires a combination of Kalman filter with BLHK filter techniques, which have been discussed in Chapter 5. We will leave this last issue to future research.
When multiple time series are subject to regime switching, the shifts in regime can be related in an analogous way to cointegration. To clarify the properties of the regime shifts in MSCI-VAR processes, a comparison with the concept of co-breaking, recently introduced by CLEMENTS & HENDRY [1994] and HENDRY [1996], might be helpful.
Co-breaking is closely related to the idea of cointegration (cf. EMERSON & HENDRY [1995]): while cointegration removes unit roots from linear combinations of variables, co-breaking can eliminate the effects of regime switching by taking linear combinations of variables. Roughly speaking, (drift) co-breaking prevails if the regime shift alters the drift of the system such that at least one linear combination remains stationary. The condition for co-broken MSCI-VAR processes can be formulated as

$$B_\perp M = 0,$$

where $B_\perp$ is a full row rank $([K-r] \times K)$ matrix orthogonal to the loading matrix $B$, $B_\perp B = 0$.
For MSCI-VAR processes, the stationarity of the cointegration relation remains unaltered even if the regime shifts are not co-breaking. Due to the stationarity of the stochastic process generating the path of regimes, the effects of regime switching are eliminated asymptotically. Since there exists an ergodic distribution of the state vector $\xi_t$, a shift in regime does not affect the unconditional drift of the cointegrated variables.
If the variance-covariance matrix $\Sigma$ is allowed to vary over regimes, the error term $w_t$ of the resulting VAR model becomes bilinear in the innovations $v_t$ and $u_t$. The bilinearity of (13.15) may affect the justification to be given in Section 13.3 for the applicability of the Johansen framework.
In the following we consider only the simplest case, where the deviation from Gaussian cointegration systems (as considered inter alia by [1995]) is restricted to the different treatment of the intercept term, which is no longer a simple parameter but is assumed to be generated by a stochastic process, i.e. the hidden Markov chain. An example given in Section 13.4 will illustrate the relevance of the MSCI-VAR model for empirical macroeconomics. However, it must be emphasized that, due to their non-standard asymptotics, the estimated MSCI-VAR models are here primarily used as descriptive devices. Nevertheless, this investigation shows that the development of an asymptotic distribution theory for these processes is a worthy program for future research.
Thus, the intercept term is not a simple parameter but is generated by the stochastic process (13.10), $\nu(s_t) = \bar{\nu} + M\zeta_t$, where

$$\zeta_t = \sum_{j=0}^{\infty} F^j v_{t-j}. \qquad (13.17)$$
13.2. A Cointegrated VARMA Representation for MSCI- VAR Processes 307
Hence the intercept term $\nu(s_t)$ is generated by a linearly transformed VAR(1) process. Inserting (13.17) and (13.10) into (13.16) gives us

$$y_t = \sum_{i=1}^{p} A_i y_{t-i} + u_t + \bar{\nu} + M \sum_{j=0}^{\infty} F^j v_{t-j}. \qquad (13.18)$$

Thus the Markovian shift in the intercept term implies a cointegrated VAR process where the error term $w_t$ is the sum of two independent processes, the Gaussian white noise $u_t$ and an autocorrelated non-normal process,

$$w_t = u_t + M \sum_{j=0}^{\infty} F^j v_{t-j}. \qquad (13.20)$$
Using the definition of the adjoint matrix, $F(L)^* = |F(L)|\,F(L)^{-1}$, results in (13.21), or, written with the $(K \times K)$ reduced polynomial $\bar{A}(L) = |F(L)|\,A(L)$ of order $p+M-1$ and a $(K \times 1)$ constant $\bar{a}_0$, as a cointegrated VARMA$(p+M-1, M-1)$ representation (13.22). From (13.22) it is clear that MSCI(M,r)-VAR(p) processes can be written in the form of a vector autoregression with an infinite order. To illustrate this point, suppose that both sides of equation (13.23) are multiplied with the inverse polynomial $B(L)^{-1}$ such that

$$y_t = \psi + \sum_{i=1}^{\infty} \Psi_i y_{t-i} + \varepsilon_t, \qquad (13.25)$$

where the intercept term $\psi$ reflects the unconditional mean of $\nu(s_t)$ and $\Psi(L)$ exhibits only the unit roots introduced by $A(L)$.
Some remarks on this point are necessary. Note that $y_t$ is an integrated variable and thus the infinite sum is not absolutely summable. In this sense equation (13.25) is not well-defined. The rough disregard of the initial conditions of the process might be justified for our purposes, as we are not interested in the parameters of (13.25).
13.3. A Two-Stage Procedure 309
The main point is that equation (13.25) characterizes the cointegrated system (13.1) with Markovian regime shifts as a non-normal cointegrated vector autoregression of infinite order. This property of MSCI-VAR processes enables us to base the cointegration analysis of such data generating processes on procedures available for infinite order VAR models.
SAIKKONEN [1992] and SAIKKONEN & LUUKKONEN [1995] show that the use of analogs or close versions of the likelihood ratio tests developed for finite order Gaussian vector autoregressive processes is justified even when the data are generated by an infinite non-Gaussian vector autoregressive process.
A vector equilibrium correction model with finite order $h$ is fitted to the data, which are assumed to be generated by an infinite order cointegrated VAR process. The finite order VAR process is thus regarded as an approximation,

$$\Delta y_t = \phi_h + \sum_{i=1}^{h} D_{i,h}\, \Delta y_{t-i} + \Pi_h y_{t-1} + u_{t,h}. \tag{13.26}$$
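To make this approximation step concrete, the following sketch (ours, not from the text; parameter values are arbitrary) fits the model (13.26) with $h = 0$ by OLS to a simulated bivariate cointegrated VAR(1) and verifies that the estimated $\Pi_h$ is numerically of reduced rank:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
alpha = np.array([[0.0], [0.5]])     # loading matrix (our choice)
beta  = np.array([[1.0], [-1.0]])    # cointegration vector (our choice)
Pi = alpha @ beta.T                  # rank-one long-run impact matrix

# simulate the cointegrated system: dz_t = Pi z_{t-1} + u_t
z = np.zeros((T, 2))
for t in range(1, T):
    z[t] = z[t - 1] + Pi @ z[t - 1] + rng.standard_normal(2)

# OLS of dz_t on (1, z_{t-1}): the approximation (13.26) with h = 0
dz = np.diff(z, axis=0)
X = np.column_stack([np.ones(T - 1), z[:-1]])
coef, *_ = np.linalg.lstsq(X, dz, rcond=None)
Pi_hat = coef[1:].T                  # estimated (2 x 2) Pi

sv = np.linalg.svd(Pi_hat, compute_uv=False)
print(Pi_hat, sv)
```

The second singular value of $\hat{\Pi}$ is close to zero, reflecting the single cointegration relation of the simulated system.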
SAIKKONEN [1992] provides some general asymptotic results for infinite order VAR processes showing that most of the asymptotic results of JOHANSEN [1988], [1991] for the estimated cointegration relations and weighting matrix remain valid.
Thus, the conditions of the Johansen-Saikkonen test correspond to the situation currently under consideration. For the application to the specific model, four results are essential:
• Under the assumption that the order of the fitted process is increased with the sample size, some of the results for finite order VAR processes can be extended to these more general data generation processes. The asymptotic properties of the estimated short-run parameters, as well as impulse responses, are derived in SAIKKONEN & LÜTKEPOHL [1994] and LÜTKEPOHL & SAIKKONEN [1995]. In particular, LÜTKEPOHL & SAIKKONEN [1995] demonstrate that the usual interpretation of cointegrated VAR systems through impulse responses and related quantities can be justified even if the true VAR order is infinite while a finite VAR(p) process is fitted to the observed data.
• A major problem might occur from the fact that an asymptotic estimation theory is not well established for infinite cointegrated VAR processes with a drift term (cf. SAIKKONEN & LÜTKEPOHL [1994]): the contribution of SAIKKONEN [1992] and his co-authors is restricted to models where the intercept term can be included in the cointegration relation, $\nu = -Bb$, where $b$ is an $(r \times 1)$ vector. Furthermore, the asymptotic distribution of ML estimates of the general intercept term is non-standard (cf. HAMILTON [1994a, ch. 18.2]). Thus, LR tests of hypotheses concerning a shift of the intercept term typically have non-standard distributions even if the number of regimes is unaltered under the null. The estimated regime-dependent intercept terms primarily have a descriptive value.
Since the assumed latent Markovian shifts in regime imply a data generating VARMA process, the Johansen-Saikkonen statistic would be a natural testing procedure. However, LÜTKEPOHL & CLAESSEN [1996] found that for small samples the Johansen statistic is more closely approximated by the (identical) asymptotic distribution. In conclusion, under the prevailing conditions of our analysis, that is, under the assumption that the data generating process is an MSCI($M$, $r$)-VAR($p$) process, there is no obstacle to studying the long-term properties within the well-known Johansen framework for linear systems.
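The eigenvalue problem at the heart of the Johansen procedure can be sketched in a few lines (a minimal illustration of ours for a simulated bivariate VAR(1) with one cointegration relation, so no lagged differences need to be concentrated out; this is not the software used later in this chapter):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
Pi = np.array([[0.0, 0.0], [0.5, -0.5]])   # rank one: a single cointegration relation

y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = y[t - 1] + Pi @ y[t - 1] + rng.standard_normal(2)

# concentrate out the intercept by demeaning dy_t and y_{t-1}
R0 = np.diff(y, axis=0); R0 -= R0.mean(axis=0)
R1 = y[:-1].copy();      R1 -= R1.mean(axis=0)

S00 = R0.T @ R0 / T; S11 = R1.T @ R1 / T
S01 = R0.T @ R1 / T; S10 = S01.T

# eigenvalues of S11^{-1} S10 S00^{-1} S01: squared canonical correlations
lam = np.sort(np.linalg.eigvals(
    np.linalg.solve(S11, S10 @ np.linalg.solve(S00, S01))).real)[::-1]

trace_r0 = -T * np.log(1 - lam).sum()        # H0: rank = 0
trace_r1 = -T * np.log(1 - lam[1:]).sum()    # H0: rank <= 1
print(lam, trace_r0, trace_r1)
```

The large first eigenvalue drives a large trace statistic for $H_0\colon r = 0$, while the statistic for $H_0\colon r \le 1$ stays small, mirroring the rank decision made in the tables below.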
13.3.2 EM Algorithm
Our two-stage procedure employs the Johansen ML analysis only to determine the cointegration rank $r$ of the system and to deliver an estimate of the cointegration matrix $C$. The remaining parameters of the MSCI($M$, $r$)-VAR($p$) model are estimated with the methods developed in Chapters 6, 9 and 10.
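The flavour of this second estimation stage can be conveyed by a compressed sketch of the EM iteration - Hamilton filter, backward smoother, closed-form M-step - for the simplest univariate MSI(2) special case (an illustration of ours; all names and parameter values are arbitrary, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(3)

# simulate an MSI(2) process: y_t = nu(s_t) + e_t
T, nu_true, sig = 400, np.array([-1.0, 1.0]), 0.8
P_true = np.array([[0.95, 0.05], [0.10, 0.90]])   # rows: from-state
s = np.zeros(T, dtype=int)
for t in range(1, T):
    s[t] = rng.choice(2, p=P_true[s[t - 1]])
y = nu_true[s] + sig * rng.standard_normal(T)

def em_step(y, nu, sig2, P):
    T = len(y)
    dens = np.exp(-0.5 * (y[:, None] - nu) ** 2 / sig2) / np.sqrt(2 * np.pi * sig2)
    # Hamilton filter (forward pass)
    xi = np.full(2, 0.5)                          # flat initial state probabilities
    filt = np.zeros((T, 2)); pred = np.zeros((T, 2)); loglik = 0.0
    for t in range(T):
        pred[t] = xi @ P if t else xi
        joint = pred[t] * dens[t]
        c = joint.sum(); loglik += np.log(c)
        filt[t] = joint / c
        xi = filt[t]
    # smoother (backward pass) and expected transition counts
    smo = np.zeros((T, 2)); smo[-1] = filt[-1]
    trans = np.zeros((2, 2))
    for t in range(T - 2, -1, -1):
        ratio = smo[t + 1] / pred[t + 1]
        smo[t] = filt[t] * (P @ ratio)
        trans += P * np.outer(filt[t], ratio)
    # M-step: closed-form updates
    nu = (smo * y[:, None]).sum(0) / smo.sum(0)
    sig2 = (smo * (y[:, None] - nu) ** 2).sum() / T
    P = trans / trans.sum(1, keepdims=True)
    return nu, sig2, P, loglik

nu, sig2, P = np.array([-0.5, 0.5]), 1.0, np.full((2, 2), 0.5)
logliks = []
for _ in range(30):
    nu, sig2, P, ll = em_step(y, nu, sig2, P)
    logliks.append(ll)
print(nu, np.sqrt(sig2), P)
```

Each pass returns the log-likelihood of the parameters it was given, so the sequence of log-likelihoods must be non-decreasing - the standard diagnostic for a correctly implemented EM iteration.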
While the cointegration analysis has been based on approximating a linear system, we consider again the equilibrium correction form of the data generating MSCI($M$, $r$)-VAR($p$) process:
$$\Delta y_t = \nu(s_t) + \sum_{i=1}^{p-1} D_i\, \Delta y_{t-i} + B C y_{t-1} + u_t.$$
Figure 13.1: Growth in the World Economy. Log of Real GNP 1960-1991 (plotted series: USA, JAP, FRG, UK, CAN, AUS).
$$y_t = \left( y_t^{USA},\; y_t^{JAP},\; y_t^{FRG},\; y_t^{UK},\; y_t^{CAN},\; y_t^{AUS} \right)'.$$
By subtracting the value of 1960:1, the system has been normalized such that the time series vector is equal to zero in 1960:1. Note that, in contrast to the last chapter, the non-stationary univariate components of the multiple time series are not differenced prior to modelling the MS-VAR process, which is now defined in levels. The ordering of the variables corresponds to the size of the national economies and to their possible importance for the world economy and, thus, the international business cycle. In particular, this ordering ensures that in the usual way orthogonalized shocks in the U.S. economy may have an instantaneous impact on all other variables of the system.
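The role of this ordering can be made explicit with the lower triangular Choleski factor of the residual covariance matrix (the numbers below are illustrative, ours, not estimates from the chapter):

```python
import numpy as np

# an illustrative residual covariance matrix, ordered (USA, JAP, FRG)
Sigma = np.array([[1.00, 0.40, 0.30],
                  [0.40, 1.20, 0.25],
                  [0.30, 0.25, 0.90]])

P = np.linalg.cholesky(Sigma)   # lower triangular, Sigma = P P'

# With this ordering, the first orthogonalized shock (the 'U.S.' shock) loads on
# every variable contemporaneously (first column of P), while the first variable
# reacts contemporaneously only to its own shock (first row is zero off-diagonal).
print(P)
```

Reordering the variables changes the factor and hence the orthogonalized shocks, which is why the ordering by economic importance is stated explicitly in the text.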
The variables under consideration are plotted in Figure 13.1. Obviously there is a strong parallel trending movement, which suggests possible cointegration. Interestingly, there seems to be a break in the trend of the Japanese GNP, as seen in the last chapter.
This section evaluates the cointegration features of international and global business cycles on the basis of a finite pure VAR(p) approximation of the system. Hence the following cointegration analysis does not consider the latent Markovian regime shifts explicitly. This enables us to perform the cointegration analysis with the Johansen ML procedures for linear cointegrated systems. With this estimated model, some issues related to cointegration will be discussed.
Initially we determine the order of the VAR approximation and the cointegration rank of the system. All calculations in this section have been carried out with MulTi (cf. LÜTKEPOHL et al. [1993]).
We have applied VAR order selection criteria (cf. LÜTKEPOHL [1991, sec. 11.4.1]) with a maximum order of 8. Four different criteria have been used for specifying the VAR order. The Schwarz criterion (SC) and the Hannan-Quinn criterion (HQ) estimated the order p = 1 for a VAR approximation of the system, while the Akaike (AIC) and the final prediction error (FPE) criteria support a larger model, p = 2. For finite VAR processes, SC and HQ are both consistent, while AIC is not a consistent criterion. This would justify choosing the order p = 1, thus restricting the dynamics of the model to the equilibrium correction mechanism exclusively.
[Table 13.1: Johansen trace and maximum eigenvalue tests, VAR(1) model.]
† Trace test for cointegration rank: H0: rank = r versus H1: r < rank ≤ K.
‡ Maximum eigenvalue test for cointegration rank: H0: rank = r versus H1: rank = r + 1.
** Significant at 1% level, * significant at 5% level.
Percentage points of the asymptotic distribution are taken from OSTERWALD-LENUM [1992].
However, since the true model is assumed to be subject to Markovian regime shifts, under the present conditions any finite VAR order is only an approximation. Therefore, we have performed the cointegration analysis using both specifications, with p = 1 and p = 2. An intercept term was included in the VAR(p) model under consideration.
Since the assumed latent Markovian shifts in regime imply a data generating VARMA process, the Johansen-Saikkonen test statistic would be a natural testing procedure. However, as already mentioned in Section 13.3, LÜTKEPOHL & CLAESSEN [1996] and SAIKKONEN & LUUKKONEN [1995] found that for small samples the Johansen statistic is more closely approximated by the (identical) asymptotic distribution.
Therefore, we will perform the statistical analysis based on the JOHANSEN [1988], [1991] approach to maximum likelihood estimation of cointegrated linear systems. As the employed finite VAR model is only an approximation of the data generating process.
[Table 13.2: Johansen trace and maximum eigenvalue tests, VAR(2) model.]
† Trace test for cointegration rank: H0: rank = r versus H1: r < rank ≤ K.
‡ Maximum eigenvalue test for cointegration rank: H0: rank = r versus H1: rank = r + 1.
** Significant at 1% level, * significant at 5% level.
Percentage points of the asymptotic distribution are taken from OSTERWALD-LENUM [1992].
Initially we have determined the cointegration rank of the system. Table 13.1 shows the results of the Johansen trace and maximum eigenvalue tests for the VAR(1) model, where the critical values from OSTERWALD-LENUM [1992] correspond to the situation where the variables exhibit deterministic trends; the significance levels are valid for the individual tests only. As shown by SAIKKONEN & LUUKKONEN [1995], these tests maintain their asymptotic validity even if the true VAR order is infinite. Both Johansen tests strongly support a cointegration rank of r = 1 for the VAR(1) model as well as for the VAR(2) model (cf. Table 13.2). Thus, K − r = 5 linearly independent stochastic trends remain.
Since our main interest is the analysis of the effects of regime shifts, we restrict our analysis of the VAR approximations to the long-run properties of the system. The estimated cointegration vector is quite similar in both specifications:
where we have normalized the cointegration vector so that the U.S. coefficient equals 1 and the constant has been suppressed. The highest weight of the USA in the cointegration relationship, in conjunction with the positive elements of the loading matrix for the rest of the world, points out the dominance of the U.S. economy for the global economic system.
While the equilibria of both models are quite similar, the VAR(1) model restricts the dynamics of the model to the equilibrium correction mechanism by
In pure VAR models, tests for Granger-causality can be based on Wald tests for a set of linear restrictions. The vector $y_t$ is partitioned into the single time series $x_t$ and the resulting 5-dimensional rest-of-the-world system $\mathrm{ROW}_t$, such that

$$A_i = \begin{bmatrix} A_{11,i} & A_{12,i} \\ A_{21,i} & A_{22,i} \end{bmatrix}, \qquad i = 1, \ldots, p.$$

Then $x_t$ does not Granger-cause $\mathrm{ROW}_t$ if and only if the hypothesis $H_0\colon A_{21,i} = 0$, $i = 1, \ldots, p$, holds.
The aspects of testing Granger causality in MSCI-VAR models are not solved at this stage of research. For stationary MSM-VAR and MSI-VAR processes, Granger causality could be checked on the basis of their VARMA representations (cf. LÜTKEPOHL [1991, ch. 6.7.1]). As shown by LÜTKEPOHL & POSKITT [1992] for stationary VARMA processes and SAIKKONEN & LÜTKEPOHL [1995] for cointegrated processes, tests for Granger causality could then be based on finite order VAR approximations.
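The mechanics of such a Wald test can be illustrated on a simulated bivariate VAR(1) in which the first variable Granger-causes the second but not vice versa (a sketch of ours with arbitrary coefficients; the actual tests below concern the six-dimensional system):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2000
# toy DGP: x1 Granger-causes x2 (A[1, 0] = 0.4), x2 does not cause x1
A = np.array([[0.5, 0.0],
              [0.4, 0.3]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)

# OLS equation by equation: regressors are (1, y1_{t-1}, y2_{t-1})
X = np.column_stack([np.ones(T - 1), y[:-1]])
B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
U = y[1:] - X @ B
XtXi = np.linalg.inv(X.T @ X)

def wald(eq, coef_idx):
    """Wald statistic for H0: the given coefficient in equation `eq` is zero."""
    s2 = (U[:, eq] @ U[:, eq]) / (T - 1 - 3)
    b = B[coef_idx, eq]
    return b ** 2 / (s2 * XtXi[coef_idx, coef_idx])

w_1to2 = wald(eq=1, coef_idx=1)   # lag of x1 in the x2 equation
w_2to1 = wald(eq=0, coef_idx=2)   # lag of x2 in the x1 equation
print(w_1to2, w_2to1)
```

Under the null of non-causality the statistic is (asymptotically) $\chi^2$ with one degree of freedom per zero restriction here, so the causal direction yields a statistic far out in the tail while the non-causal direction stays near its null distribution.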
Table 13.3 gives the results for a pure finite VAR approximation of the data set under consideration. The significance levels given in Table 13.3 are valid for a $\chi^2$ distribution with degrees of freedom equal to the number of zero restrictions (i.e. $\chi^2(10)$ for $H_0$: ROW $\nrightarrow$ $x$ and $\chi^2(50)$ for $H_0$: $x$ $\nrightarrow$ ROW). However, as already noted, the asymptotic distribution of the Wald statistic could be non-standard, as in finite-order cointegrated VAR processes; on the other hand, overfitting could be helpful.
The one-directional Granger causality of the US growth rate demonstrates the importance of the U.S. economy for the world economy (significant at 1%). In addition to the United States, only West German economic data seems to have predictive power for the rest of the system (significant at 5%). West Germany, Canada and Japan are highly dependent on the state of global macroeconomic activity, as they are respectively Granger-caused by the rest of the world. There are no statistically significant findings for the United Kingdom and Australia.
The test results for instantaneous causality between $x_t$ and $\mathrm{ROW}_t$ are given in Table 13.3, too. There is no instantaneous causality between $x_t$ and $\mathrm{ROW}_t$ if and only if $H_0\colon \Sigma_{12} = \Sigma_{21} = 0$ is true (cf. e.g. LÜTKEPOHL [1991, sec. 2.3.1]). In addition to the dynamic propagation of economic shocks, evidence for contemporaneous shocks is established for the USA, Canada, and the UK.
If the process is stationary, forecast error impulse responses are the coefficients of the Wold MA representation,

$$y_t = \sum_{i=0}^{\infty} \Phi_i u_{t-i},$$

where the process mean has been set to zero for simplicity.
While such a Wold representation does not exist for cointegrated processes, the $(K \times K)$ matrices $\Phi_i$ can be calculated recursively (cf. LÜTKEPOHL [1991, sec. 11.3.1]) as $\Phi_0 = I_K$, $\Phi_i = \sum_{j=1}^{i} \Phi_{i-j} A_j$, and the $kl$-th element of $\Phi_i$ can be interpreted as the response of variable $k$ to an impulse in variable $l$, $i$ periods ago.
After 10 years, 93% of the variance of US GDP is due to own innovations, but also 68% of the Canadian, 60% of the German, 30% of the UK, 45% of the Australian and 25% of the Japanese variance are caused by shocks in the U.S. economy. Other than the effects of U.S. shocks, only the own innovations in Japan and the UK and the feedback between Japan and Germany are statistically significant. Furthermore, there is evidence for effects of German shocks on the Australian process of economic growth.
It should be emphasized that the main conclusions for the VAR(1) process are similar to those for the VAR(2) process. However, an asymptotic theory for the forecast error decomposition under our assumptions regarding the data generating mechanism merits future investigation.
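The forecast error variance decomposition behind these figures can be sketched for a small system (our illustration with arbitrary coefficients): at horizon $h$, the share of the forecast error variance of variable $k$ attributable to orthogonalized shock $l$ is $\sum_{i<h} (\Theta_i)_{kl}^2$ over the row total, with $\Theta_i = \Phi_i P$ and $\Sigma = PP'$:

```python
import numpy as np

def fevd(A1, Sigma, h):
    """Forecast error variance decomposition of a VAR(1) at horizon h."""
    K = A1.shape[0]
    P = np.linalg.cholesky(Sigma)     # orthogonalization, Sigma = P P'
    Phi = np.eye(K)
    num = np.zeros((K, K))            # num[k, l]: contribution of shock l to var of k
    for i in range(h):
        Theta = Phi @ P               # orthogonalized responses Theta_i = Phi_i P
        num += Theta ** 2
        Phi = Phi @ A1                # Phi_{i+1} = Phi_i A1 for a VAR(1)
    return num / num.sum(axis=1, keepdims=True)

A1 = np.array([[0.5, 0.2], [0.0, 0.7]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
W = fevd(A1, Sigma, h=40)
print(W)
```

Each row of the resulting matrix sums to one, so the entries can be read directly as variance shares of the kind quoted in the preceding paragraph.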
Figure 13.2: MS(2)-VECM(1) Model. Filtered and smoothed probabilities of the recessionary regime, 1960-1990.
In Figure 13.2 the filtered probabilities $\Pr(s_t = 2 \mid Y_t)$ of being in the recessionary state 2 and the (full sample) smoothed probabilities $\Pr(s_t = 2 \mid Y_T)$ are again compared with the chronology of business and growth cycle turning points of the U.S. economy provided by the Center of International Business Cycle Research. The recessionary regime is clearly associated with the two recessions after the oil price shocks in 1973/74 and 1979/80.
Interestingly, the impact effect of a regime shift is quite heavy for the United States and the United Kingdom, but negligible for Australia and contradictory for Canada.
So,

$$\nu_1 - \nu_2 = \begin{pmatrix} 1.3310 \\ 1.3193 \\ 0.7652 \\ 1.2817 \\ -0.3722 \\ 0.0627 \end{pmatrix}, \qquad \nu_2 = \begin{pmatrix} -0.8542 \\ 0.3020 \\ 0.2173 \\ -0.2901 \\ 0.8112 \\ 0.8647 \end{pmatrix}.$$
These results should be compared to the regime classifications given in Figure 12.16 for an MSI(2)-DVAR(1) model, i.e. where the equilibrium correction mechanism has been dropped: $C = 0$; the estimation results have been given in Table 12.5. A likelihood ratio test of the MSI(2)-DVAR(1) model against the MS(2)-VECM(1) model results in
which will be significant at 1% if the critical values of the linear world can be applied (cf. OSTERWALD-LENUM [1992]). The rejection of the null hypothesis of no cointegration relation against a cointegration rank $r = 1$ strongly supports the cointegration result which has been found in the pure VAR approximations of the system.
where the white noise process is heteroskedastic, $u_t \sim \mathrm{NID}(0, \Sigma(s_t))$, and $z_t$
Figure 13.3: MSH(2)-VECM(1) Model. Regime probabilities.
of a regime shift:

$$\nu_1 - \nu_2 = \begin{pmatrix} 0.8969 \\ 0.4587 \\ 0.7719 \\ 0.9251 \\ 0.6484 \\ 0.6335 \end{pmatrix}, \qquad \nu_2 = \begin{pmatrix} -0.2041 \\ 1.1533 \\ 0.3981 \\ 0.2322 \\ 0.1702 \\ 0.5737 \end{pmatrix}.$$
Thus a turning of the 'world business cycle' causes annualized impact effects on real economic growth in the range from 1.8% (Japan) to 3.7% (UK). This reveals a strongly homogeneous effect of a shift in regime on the national economies. There is, as well, a strong positive correlation between shocks in the United States and in the other national economies.
possesses its standard asymptotic distribution. Then, under the number-of-regimes-preserving hypothesis $\Sigma_1 = \Sigma_2$ but $\nu_1 \neq \nu_2$, the LR test statistic results in
In order to conclude our empirical analysis, the results of this chapter may be compared with those of the last one. The incredibly high parameter values of the estimated loading matrix emphasize the importance of the equilibrium correction mechanism. Economic differences in the regime classification occur with regard to the double-dip characterization of the recessions 1979/80 and 1981/82. Finally, an additional low growth period for 1966/67 is identified.
13.6 Conclusions
The theoretical analysis has focused on the modelling of Markovian regime shifts of cointegrated systems. This issue has been linked to the notion of multiple equilibria in dynamic economic theory, as well as to the recently proposed concept of co-breaking trends and means. The procedures proposed for the statistical analysis of cointegrated systems subject to changes in regime have been based on the infinite cointegrated VAR representation of MSCI-VAR models.
While there is much work that can and will be done on this class of models in the
near future, the main results of our investigation can be summarized:
(iii.) For the statistical determination of the cointegration relationship, i.e. tests of the cointegration rank r and the cointegration matrix C, the non-normal VARMA process may be approximated by a finite pure VAR(p) process, which allows the application of the Johansen ML analysis of cointegrated linear systems.
(iv.) An asymptotic theory for the statistical methods of testing the cointegration rank may be based on the infinite cointegrated VAR representation. While the development of an asymptotic theory has been beyond the scope of this chapter, there is hope that research currently in progress will provide a theoretical basis. In particular, a theory of infinite VAR processes with drift would be able to solve currently existing problems. As long as this theory does not exist, some of our results remain provisional.
$$\tilde{\nu} = (0.8870,\; 2.0371,\; 1.1788,\; 0.8115,\; 1.2567,\; 1.4204)',$$

with standard errors $(0.1170,\; 0.1230,\; 0.1681,\; 0.1753,\; 0.1255,\; 0.1362)$, and

$$\tilde{\Sigma} = \begin{bmatrix} 0.71625 & & & & & \\ 0.10359 & 0.86969 & & & & \\ 0.01348 & 0.25282 & 1.50300 & & & \\ 0.15058 & 0.25777 & 0.36596 & 1.67290 & & \\ 0.26992 & 0.06452 & 0.02290 & 0.14447 & 0.69501 & \\ 0.10997 & -0.01236 & -0.06937 & 0.11247 & 0.16103 & 1.07690 \end{bmatrix}$$
$$\tilde{\nu} = (0.2544,\; 1.8218,\; 1.1439,\; 1.0311,\; 0.0584,\; 1.2833)',$$

with standard errors $(0.2074,\; 0.2286,\; 0.3005,\; 0.3170,\; 0.2043,\; 0.2543)$, and

$$\tilde{\Sigma} = \begin{bmatrix} 0.96605 & & & \\ 0.20817 & 1.13730 & & \\ 0.09887 & -0.08849 & 1.73180 & \\ 0.14430 & 0.14067 & 0.44194 & 1.88420 \end{bmatrix}$$

$$\ln L = -980.6021$$
Epilogue
A study like the present one can of course make no claims of encyclopedic completeness, and it would be pointless to list all the concepts which are related to the MS-VAR model but which have not been discussed in this presentation. If this study intended to develop an operational econometric approach for the statistical analysis of economic time series with MS-VAR models, then we can conclude that some progress has been made. Concerning, inter alia, the flexibility of modelling and the computational effort of estimation, this study has put forward the MS-VAR model as an alternative to linear, normal systems. In some other respects our results are more preliminary, but realistically we could not have expected to resolve all problems.
It must be emphasized that the previous analysis rests on some basic assumptions and most of our results will not hold without them. To maximize the effectiveness of our efforts, some results have been restricted to processes where the shift in regime affects only the level or the drift of a time series vector. One basic assumption has been related to the class of processes considered. In most chapters, the presumption has been made that the data is generated by a stationary (respectively difference-stationary) stochastic process, which excludes e.g. the presence of cointegration. In the last chapter, we have introduced - with the MSCI-VAR model - a cointegrated vector autoregressive model where Markovian shifts occur in the mean of the cointegration relation and the drift of the system. A number of fundamental methods have been proposed to analyze them. Further research is required on this topic, which we believe to be of central theoretical and practical importance.
We are aware that we do not possess an asymptotic estimation and testing theory for the MS-VAR model in general. We have presupposed that the regularity conditions of general propositions proven for general state-space models and non-linear models are satisfied, and there is no indication that they are not fulfilled for the processes under consideration. Thus the non-standard asymptotics involved in the determination of the number of regimes seem to be simply an exception. Several procedures have been proposed to allow statistical analysis in practice.
Finally, we have only sketched the potential contribution of the MS-VAR model to business cycle analysis. Our analysis was restricted to the highest possible aggregation level, where macroeconomic activity has been summed up contemporaneously in a single time series. Further research has to be undertaken to construct a comprehensive statistical characterization of national, international and global macroeconomic fluctuation-generating forces. The fact that MS-VAR models possess, in most applications, an intuitive economic interpretation should not be underestimated, for it enables a needed dialog between econometricians and economists who are working non-quantitatively.
While some of our theoretical results remain provisional under the aforementioned limitations, the presented applications already underline the usefulness of the MS-VAR model and the methods proposed in this study for empirical research. It is hoped that, although the previous discussion has identified areas of necessary further theoretical development, this study will also provide a useful systematic basis for empirical investigations with Markov-switching vector autoregressions.
References
ALBERT, J., & CHIB, S. [1993]. "Bayes inference via Gibbs sampling of autoregressive time series subject to Markov mean and variance shifts". Journal of Business & Economic Statistics, 11, 1-16.
AOKI, M., & HAVENNER, A. [1991]. "State space modeling of multiple time series". Econometric Reviews, 10, 1-59.
AOKI, M. [1990]. State Space Modeling of Time Series. Berlin: Springer Verlag, 2nd Edition.
BÄRDSEN, G., FISHER, P. G., & NYMOEN, R. [1995]. Business Cycles: Real Facts or Fallacies? University of Oslo, working paper.
BAUM, L. E., & EAGON, J. A. [1967]. "An inequality with applications to statistical estimation for probabilistic functions of Markov chains and to a model for ecology". Bull. American Mathematical Society, 73, 360-363.
BAUM, L. E., PETRIE, T., SOULES, G., & WEISS, N. [1970]. "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains". Annals of Mathematical Statistics, 41, 164-171.
BERNDT, E. K., HALL, B. H., HALL, R. E., & HAUSMAN, J. A. [1974]. "Estimation and inference in nonlinear structural models". Ann. Econ. Social Measurement, 3/4, 653-665.
BILLIO, M., & MONFORT, A. [1995]. Switching State Space Models. Likelihood Function, Filtering and Smoothing. CREST working paper.
BLACKWELL, E., & KOOPMANS, L. [1975]. "On the identifiability problem for functions of finite Markov chains". Annals of Mathematical Statistics, 28, 1011-1015.
BOX, G. E. P., & TIAO, G. C. [1968]. "A Bayesian approach to some outlier problems". Biometrika, 55, 119-129.
CARTER, C. K., & KOHN, R. [1994]. "On Gibbs sampling for state space models". Biometrika, 81, 541-553.
COSSLETT, S. R., & LEE, L.-F. [1985]. "Serial correlation in latent discrete variable models". Journal of Econometrics, 27, 79-97.
DAVIDSON, J. E. H., HENDRY, D. F., SRBA, F., & YEO, S. [1978]. "Econometric modelling of the aggregate time-series relationship between consumers' expenditure and income in the United Kingdom". Economic Journal, 88, 661-692.
DAVIDSON, R., & MACKINNON, J. G. [1981]. "Several tests for model specification in the presence of alternative hypotheses". Econometrica, 49, 781-793.
DICKEY, D. A., & FULLER, W. A. [1979]. "Distribution of the estimators for autoregressive time series with a unit root". Journal of the American Statistical Association, 74, 427-431.
DICKEY, D. A., & FULLER, W. A. [1981]. "Likelihood ratio statistics for autoregressive time series with a unit root". Econometrica, 49, 1057-1072.
DOAN, T., LITTERMAN, R. B., & SIMS, C. [1984]. "Forecasting and conditional projection using realistic prior distributions". Econometric Reviews, 3, 1-144.
DOLADO, J. J., & LÜTKEPOHL, H. [1996]. "Making Wald tests work for cointegrated VAR systems". Econometric Reviews, forthcoming.
FUNKE, M., HALL, S. G., & SOLA, M. [1994]. Rational Bubbles during Poland's Hyperinflation: Implications and Empirical Evidence. Humboldt-Universität zu Berlin, Discussion Paper 17.
GARCIA, R., & PERRON, P. [1990]. An Analysis of the Real Interest Rate under Regime Shifts. Princeton University, working paper.
GARCIA, R., & SCHALLER, H. [1995]. Are the Effects of Monetary Policy Asymmetric? Universite de Montreal, working paper.
GEMAN, S., & GEMAN, D. [1984]. "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images". IEEE Trans. on Pattern Analysis and Machine Intelligence, 6, 721-741.
GEWEKE, J. [1994]. "Priors for macroeconomic time series and their application". Econometric Theory, 10, 609-632.
GHYSELS, E. [1994]. "On the periodic structure of the business cycle". Journal of
GHYSELS, E. [1993]. A Time Series Model with Periodic Stochastic Regime Switching. Universite de Montreal, working paper.
GOLDFELD, S. M., & QUANDT, R. E. [1973]. "A Markov model for switching regressions". Journal of Econometrics, 1, 3-16.
GRANGER, C. W. J. [1981]. "Some properties of time series data and their use in econometric models". Journal of Econometrics, 16, 121-130.
HAGGAN, V., & OZAKI, T. [1981]. "Modelling nonlinear random vibrations using an amplitude-dependent autoregressive time series model". Biometrika, 68, 189-196.
HALL, S. G., & SOLA, M. [1993a]. A Generalized Model of Regime Changes Applied to the US Treasury Bill Rate. CEF discussion paper 07-93.
HAMILTON, J. D., & LIN, G. [1994]. Stock Market Volatility and the Business Cycle. UCSD working paper.
HARVEY, A. C., & JAEGER, A. [1993]. "Detrending, stylized facts and the business cycle". Journal of Applied Econometrics, 8, 231-247.
HELLER, A. [1965]. "On stochastic processes derived from Markov chains". Annals of Mathematical Statistics, 36, 1286-1291.
HESS, G. D., & IWATA, S. [1995]. Measuring Business Cycle Features. University of Kansas, Research Papers in Theoretical and Applied Economics No. 1995-6.
HILDRETH, C., & HOUCK, J. P. [1968]. "Some estimators for a linear model with random coefficients". Journal of the American Statistical Association, 63, 584-595.
HOLST, U., LINDGREN, G., HOLST, J., & THUVESHOLMEM, M. [1994]. "Recursive estimation in switching autoregressions with a Markov regime". Journal of Time Series Analysis, 15, 489-506.
JUDGE, G., GRIFFITHS, W. E., HILL, R. C., LÜTKEPOHL, H., & LEE, T.-C. [1985]. The Theory and Practice of Econometrics. 2nd edn. New York: Wiley.
JUDGE, G., HILL, R. C., GRIFFITHS, W. E., LÜTKEPOHL, H., & LEE, T.-C. [1988]. Introduction to Theory and Practice of Econometrics. 2nd edn. New York: Wiley.
KÄHLER, J., & MARNET, V. [1994a]. "International business cycles and long-run growth: An analysis with Markov-switching and cointegration models". In: ZIMMERMAN, K. F. [ed], Output and Employment Fluctuations. Heidelberg: Physica Verlag.
KALMAN, R. E. [1960]. "A new approach to linear filtering and prediction problems". Journal of Basic Engineering, Transactions of the ASME, 82, Series D, 35-45.
KAMINSKY, G. [1993]. "Is there a peso problem? Evidence from the dollar/pound exchange rate, 1976-1987". American Economic Review, 83, 450-472.
KING, R. G., & REBELO, S. T. [1993]. "Low frequency filtering and real business cycles". Journal of Economic Dynamics and Control, 17, 207-231.
KRISHNAMURTHY, V., & MOORE, J. [1993a]. "Hidden Markov model signal processing in presence of unknown deterministic interferences". IEEE Trans. Autom., 38, 146-152.
KYDLAND, F. E., & PRESCOTT, E. C. [1990]. "Business cycles: Real facts and a monetary myth". Federal Reserve Bank of Minneapolis Quarterly Review, 3-18.
LAM, P.-S. [1990]. "The Hamilton model with a general autoregressive component. Estimation and comparison with other models of economic time series". Journal of Monetary Economics, 26, 409-432.
LINDGREN, G. [1978]. "Markov regime models for mixed distributions and switching regressions". Scandinavian Journal of Statistics, 5, 81-91.
LIU, J., WONG, W. H., & KONG, A. [1994]. "Covariance structure of the Gibbs sampler with applications to the comparison of estimators and augmentation schemes". Biometrika, 81, 27-40.
LÜTKEPOHL, H., & POSKITT, D. S. [1992]. Testing for Causation Using Infinite Order Vector Autoregressive Processes. Humboldt-Universität zu Berlin, SFB 373 Discussion Paper 2.
LÜTKEPOHL, H., HAASE, K., CLAESSEN, H., SCHNEIDER, W., & MORYSON, M. [1993]. "MulTi, a menu driven Gauss program". Computational Statistics, 8, 161-163.
MINTZ, I. [1969]. Dating Postwar Business Cycles, Methods and Their Application to Western Germany, 1950-67. New York: Columbia University Press.
MIZON, G. E., & RICHARD, J. F. [1986]. "The encompassing principle and its application to testing non-nested hypotheses". Econometrica, 54, 657-678.
PERRON, P. [1989]. "The great crash, the oil price shock, and the unit root hypothesis". Econometrica, 57, 1361-1401.
PFANN, G., SCHOTMAN, P., & TSCHERNIG, R. [1995]. Nonlinear Interest Rate Dynamics and Implications for the Term Structure. Humboldt-Universität zu Berlin, SFB 373 Discussion Papers 43.
SAIKKONEN, P., & LÜTKEPOHL, H. [1994]. Infinite Order Cointegrated Vector Autoregressive Processes: Estimation and Inference. Humboldt-Universität zu Berlin, SFB 373 Discussion Paper 5/1994.
SHUMWAY, R., & STOFFER, D. [1991]. "Dynamic linear models with switching". Journal of the American Statistical Association, 86, 763-769.
SICHEL, D. E. [1994]. "Inventories and the three phases of the business cycle". Journal of Business & Economic Statistics, 12, 269-278.
SMITH, A. F. M., & ROBERTS, G. O. [1993]. "Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods". Journal of the Royal Statistical Society, 55B, 3-23.
TJØSTHEIM, D. [1990]. "Non-linear time series models and Markov chains". Advances in Applied Probability, 22, 587-611.
UHLIG, H. [1994]. "On Jeffreys prior when using the exact likelihood function". Econometric Theory, 10, 633-644.
WATSON, M. W., & ENGLE, R. F. [1983]. "Alternative algorithms for the estimation of dynamic factor, MIMIC and varying coefficient regression models". Journal of Econometrics, 23, 385-400.
13.1 Growth in the World Economy. Log of Real GNP 1960-1991
13.2 MS(2)-VECM(1) Model
13.3 MSH(2)-VECM(1) Model
List of Notation
Most of the notation is clearly defined in the text where it is used. The following list is designed to provide some general guidelines. Occasionally, a symbol has been assigned to a different object, but this is explained in the text. For the notation of MS-VAR models confer Table 1.1.
Matrix Operations
⊗ Kronecker product
⊙ element-by-element multiplication
⊘ element-by-element division
X^{-1} inverse of X
X* adjoint matrix of X
X^j j-th power of X
X^{1/2} square root of X, lower triangular Choleski decomposition
|X| = det X determinant of X
||X|| norm of X
diag x diagonal matrix containing x on the diagonal
rk X rank of X
tr X trace of X
vec X column stacking operator
vech X operator stacking the elements on and below the main diagonal of a symmetric matrix
∂φ/∂λ' matrix of first order partial derivatives of φ with respect to λ
∂²φ/(∂λ∂λ') Hessian matrix of φ
Special Matrices
1_n the (n × 1) vector of ones
0_(m×n) the (m × n) matrix of zeros
D duplication matrix
I_n the (n × n) identity matrix
J the matrix (I_K, 0, ..., 0)
ι_m the m-th vector of an appropriate identity matrix
K_(MN) the (M × N) commutation matrix
Λ diagonal matrix containing eigenvalues on the diagonal
Constants
h forecast horizon
K dimension of the observed time series
M number of regimes
N dimension of the stacked regime vector
p order of the vector autoregression
p* AR order of the VARMA representation
q order of distributed regime lag
q* MA order of the VARMA representation
r number of cointegration vectors
R number of coefficients
T number of observations