Hans-Martin Krolzig
Markov-Switching Vector
Autoregressions
Modelling, Statistical Inference, and Application to
Business Cycle Analysis
Lecture Notes in Economics
and Mathematical Systems 454
Founding Editors:
M. Beckmann
H. P. Künzi
Editorial Board:
H. Albach, M. Beckmann, G. Feichtinger, W. Güth, W. Hildenbrand,
W. Krelle, H. P. Künzi, K. Ritter, U. Schittko, P. Schönfeld, R. Selten
Managing Editors:
Prof. Dr. G. Fandel
Fachbereich Wirtschaftswissenschaften
Fernuniversität Hagen
Feithstr. 140/AVZ II, D-58084 Hagen, Germany
Prof. Dr. W. Trockel
Institut für Mathematische Wirtschaftsforschung (IMW)
Universität Bielefeld
Universitätsstr. 25, D-33615 Bielefeld, Germany
Springer-Verlag Berlin Heidelberg GmbH
Hans-Martin Krolzig
Markov-Switching
Vector Autoregressions
Modelling, Statistical Inference,
and Application to
Business Cycle Analysis
Springer
Author
Dr. Hans-Martin Krolzig
University of Oxford
Institute of Economics and Statistics
St. Cross Building, Manor Road
Oxford OX1 3UL, Great Britain
ISSN 0075-8442
ISBN 978-3-540-63073-9 ISBN 978-3-642-51684-9 (eBook)
DOI 10.1007/978-3-642-51684-9
This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, re-use
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other
way, and storage in data banks. Duplication of this publication or parts thereof is
permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from
Springer-Verlag. Violations are liable for prosecution under the German Copyright
Law.
© Springer-Verlag Berlin Heidelberg 1997
Originally published by Springer-Verlag Berlin Heidelberg New York in 1997.
The use of general descriptive names, registered names, trademarks, etc. in this
publication does not imply, even in the absence of a specific statement, that such
names are exempt from the relevant protective laws and regulations and therefore
free for general use.
Typesetting: Camera ready by author
SPIN: 10546781 42/3142-543210 - Printed on acid-free paper
To my parents, Grete and Walter
Preface
The author is indebted to numerous individuals for help in the preparation of this
study. Primarily, I owe a great debt to Helmut Lütkepohl, who inspired my interest in
multiple time series econometrics, suggested the subject, and advised and encouraged my
research. The many hours Helmut Lütkepohl and Jürgen Wolters spent in discussing
the issues of this study have been an immeasurable help.
The results obtained and their presentation have been profoundly affected by the
inspiration of and interaction with numerous colleagues in Berlin and Oxford. Of the
many researchers with whom I have discussed various aspects of the work presented
here, I would like especially to thank Ralph Friedmann, David Hendry and D.S. Poskitt.
Many people have helped with the reading of the manuscript. Special thanks go to
Paul Houseman, Marianne Sensier, Dirk Soyka and Don Indra Asoka Wijewickrama;
they pointed out numerous errors and provided helpful suggestions.
I am very grateful to all of them, but they are, of course, absolved from any
responsibility for the views expressed in the book. Any errors that may remain are my own.
Finally, I am greatly indebted to my parents and friends for their support and
encouragement while I was struggling with the writing of the thesis.
Prologue 1
Epilogue 329
References 331
Tables 347
Figures 351
In the last decade time series econometrics has changed dramatically. One increasingly
prominent field has become the treatment of regime shifts and non-linear modelling
strategies. While the importance of regime shifts, particularly in macroeconometric
systems, seems to be generally accepted, there is no established theory suggesting
a unique approach for specifying econometric models that embed changes in regime.
Structural changes such as the oil price shocks, the introduction of the European
Monetary System, the German reunification, the European Monetary Union and the Eastern
European economies in transition are often incorporated into a dynamic system in
a deterministic fashion. A time-varying process poses problems for estimation and
forecasting when a shift in parameters occurs. The degradation of performance of
structural macroeconomic models seems at least partly due to regime shifts. Increasingly,
regime shifts are not considered as singular deterministic events; instead, the
unobservable regime is assumed to be governed by an exogenous stochastic process.
Thus regime shifts of the past are expected to occur in the future in a similar fashion.
The main aim of this study is to construct a general econometric framework for the
statistical analysis of multiple time series when the mechanism which generated the
data is subject to regime shifts. We build up a stationary model where a stable vector
autoregression is defined conditional on the regime and where the regime generating
process is given by an irreducible ergodic Markov chain. The statistical analysis then
involves (i.) extracting the information in the data about regime shifts in the past,
(ii.) estimating consistently and efficiently the parameters of the model, (iii.)
detecting recent regime shifts, (iv.) correcting the vector autoregressive model at
times when the regime alters, and finally (v.) incorporating the probability of future
regime shifts into forecasts. This
Markov-switching vector autoregressive model represents a very general class which
encompasses some alternative non-linear and time-varying models. In general, the
model generates conditional heteroskedasticity and non-normality; prediction
intervals are asymmetric and reflect the prevailing uncertainty about the regime.
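The flavour of such a process can be sketched numerically. The following is a minimal simulation of a two-regime switching-mean model, an MSI(2)-AR(0) process y_t = μ(s_t) + u_t; all numbers here (transition probabilities, means, error variance) are hypothetical illustration values, not taken from the text.

```python
import random

# Hypothetical numbers, not from the text: an MSI(2)-AR(0) process
# y_t = mu(s_t) + u_t whose regime s_t follows a two-state Markov chain.
random.seed(1)

P = [[0.9, 0.1],   # row i: Pr(s_{t+1} = j | s_t = i)
     [0.2, 0.8]]
mu = [-1.0, 2.0]   # regime-dependent means
sigma = 1.0        # homoskedastic error within regimes

def simulate(T, s=0):
    y = []
    for _ in range(T):
        s = 0 if random.random() < P[s][0] else 1
        y.append(mu[s] + random.gauss(0.0, sigma))
    return y

y = simulate(20000)
mean = sum(y) / len(y)
var = sum((v - mean) ** 2 for v in y) / len(y)
# Ergodic probabilities: (p21, p12)/(p12 + p21) = (2/3, 1/3), so the
# stationary mean is (2/3)(-1) + (1/3)(2) = 0 and the stationary
# variance is (2/3)(1 + 1) + (1/3)(1 + 4) - 0 = 3.
print(round(mean, 2), round(var, 2))
```

Although each regime generates Gaussian observations, the unconditional distribution of the simulated series is a mixture of normals, which is exactly the source of the non-normality mentioned above.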
We will investigate the issues of detecting multiple breaks in multiple time series,
modelling, specification, estimation, testing and forecasting. En route, we discuss
the relation to alternative non-linear models and models with time-varying parameters.
In the course of this study we will also propose new directions in which to generalize
the MS-VAR model. Although some methodological and technical ideas are discussed
in detail, the focus is on modelling, specification and estimation of suitable models.
The first part of the book gives a comprehensive mathematical and statistical analysis
of the Markov-switching vector autoregressive model. In the first chapters, Markov-
switching vector autoregressive (MS-VAR) processes are introduced and their basic
properties are investigated. We discuss the relation of the MS-VAR model to the
time-invariant vector autoregressive model and to alternative non-linear time
series models. The preliminary considerations of Chapter 1 are formalized in the
state-space representation given in Chapter 2, which will be the framework for
analyzing the stochastic properties of MS-VAR processes and for developing statistical
techniques for the specification and estimation of MS-VAR models to fit the data.

Survey of the Study
The main part of this study (Chapters 6 - 10) is devoted to the discussion of parameter
estimation for this class of models. The classical method of maximum likelihood
estimation is considered in Chapter 6, where, due to the non-linearity of the model,
iterative procedures have to be introduced. While various approaches are discussed,
major attention is given to the EM algorithm, whereby the limitation of the previous
literature to special MS-VAR models is overcome. The issues of identifiability
and consistency of the maximum likelihood (ML) estimation are investigated.
Techniques for the calculation of the asymptotic variance-covariance matrix of ML
estimates are presented.
In Chapter 7 the issues of model selection and model checking are investigated. The
focus is on the specification of MS-VAR models. A strategy for simultaneously
selecting the number of regimes and the order of the autoregression in
Markov-switching time series models based on ARMA representations is proposed
and combined with classical specification testing procedures.
Chapter 9 goes into further technical details of these estimation techniques and
discusses the design of the regressions involved. Due to the computational demands of
iterative estimation techniques, major attention will be given to the development of
estimators which efficiently use the structure of a particular model. The regressions
involved in the EM algorithm and the Gibbs sampler are explicitly compared for
all alternative specifications of MS-VAR models. It is demonstrated that the presented
EM algorithm, as well as the introduced Gibbs sampler, permits applications to
large systems. This reveals that the self-restriction of recent empirical investigations
to rudimentary univariate time series models, mixtures of normals or hidden Markov
chains is not justified.
In the second and last part of this study, the methodology introduced in the preceding
chapters is applied to business cycle analysis. This is not intended to be a
comprehensive analysis of the business cycle phenomenon and of all potential
contributions of the MS-VAR model to business cycle analysis; such an analysis would
be clearly beyond the scope of this study. Instead, the methods developed for the
statistical analysis of systems subject to regime shifts are elucidated by specific
empirical investigations, among them an analysis of international and global business
cycles based on a six-dimensional system for the USA, Japan, West Germany, the UK,
Canada, and Australia. The considerations formulated in Chapter 13 suggest a new
methodological approach to the analysis of cointegrated linear systems with shifts in
regime. This methodology is then illustrated with a reconsideration of international
and global business cycles. The study concludes with a brief discussion of our major
findings and remaining problems.
The study has a modular structure. Given the notation and basic structures introduced
in the first two chapters, most of the following chapters can stand alone. Hence, the
reader who is primarily interested in empirical applications and less in statistical
techniques may first read the fundamental Chapters 1 and 2, then Chapter 5 and
Chapter 6, followed by the empirical analyses in Chapters 11 and 12 alongside the
more technically demanding Chapter 13, and decide afterwards which of the remaining
chapters will be of interest to him or her.
Although it is not necessary for the reader to be familiar with all fundamental methods
of multiple time series analysis, the subject of interest requires the application
of some formal techniques. A number of references to standard results are given
throughout the study, while to simplify things for the reader we have remained as
close as possible to the notation used in LÜTKEPOHL [1991]. In order to achieve
compactness in our presentation, we have dispensed with a more general introduction
to the topic since such introductions are already available in HAMILTON [1993],
[1994b, ch. 22] and KROLZIG & LÜTKEPOHL [1995].
Chapter 1
The Markov-Switching
Vector Autoregressive Model
This first chapter is devoted to a general introduction to the Markov-switching vector
autoregressive (MS-VAR) time series model. In Section 1.2 we present the fundamental
assumptions constituting this class of models. The discussion of the two components
of MS-VAR processes will clarify their relation to time-invariant vector
autoregressive and Markov-chain models. Some basic stochastic properties of MS-VAR
processes are presented in Section 1.3. Finally, MS-VAR models are compared to
alternative non-normal and non-linear time series models proposed in the literature.
As most non-linear models have been developed for univariate time series, this
discussion is restricted to this case. However, generalizations to the vector case are
also considered.
1.1. General Introduction

Reduced-form vector autoregressive (VAR) models have become a dominant research
strategy in empirical macroeconomics since SIMS [1980]. In this study we will
consider VAR models with changes in regime; most results will carry over to
structural dynamic econometric models by treating them as restricted VAR models.
When the system is subject to regime shifts, the parameters θ of the VAR process
will be time-varying. But the process might be time-invariant conditional on an
unobservable regime variable s_t which indicates the regime prevailing at time t. Let
M denote the number of feasible regimes, so that s_t ∈ {1, . . . , M}. Then the
conditional probability density of the observed time series vector y_t is

\[
p(y_t \mid Y_{t-1}, s_t) =
\begin{cases}
f(y_t \mid Y_{t-1}, \theta_1) & \text{if } s_t = 1, \\
\quad\vdots \\
f(y_t \mid Y_{t-1}, \theta_M) & \text{if } s_t = M,
\end{cases}
\tag{1.1}
\]

where θ_m is the VAR parameter vector in regime m = 1, . . . , M and Y_{t-1} are the
observations {y_{t-j}}_{j=1}^{\infty}.
Thus, for a given regime s_t, the time series vector y_t is generated by a vector
autoregressive process of order p (VAR(p) model) such that

\[
y_t = \nu(s_t) + \sum_{j=1}^{p} A_j(s_t)\, y_{t-j} + u_t .
\]
1 The notation Pr(·) refers to a discrete probability measure, while p(·) denotes a probability density
function.
system, and the effects of changes in regime. Secondly, the basic statistical
techniques were introduced by BAUM & PETRIE [1966] and BAUM
et al. [1970] for probabilistic functions of Markov chains, while the MS-VAR
model also encompasses older concepts such as the mixture of normal
distributions model attributed to PEARSON [1894] and the hidden Markov-chain
model traced back to BLACKWELL & KOOPMANS [1975] and HELLER [1965].
Thirdly, in econometrics, the first attempts to create Markov-switching regression
models were undertaken by GOLDFELD & QUANDT [1973]; they remained,
however, rather rudimentary. The first comprehensive approach to the statistical
analysis of Markov-switching regression models was proposed by LINDGREN
[1978], which is based on the ideas of BAUM et al. [1970]. In time series analysis, the
introduction of the Markov-switching model is due to HAMILTON [1988], [1989],
on which most recent contributions (as well as this study) are founded. Finally,
our consideration of MS-VAR models as a Gaussian vector autoregressive process
conditioned on an exogenous regime generating process is closely related to state-space
models as well as the concept of doubly stochastic processes introduced by
TJØSTHEIM [1986b].
The MS-VAR model belongs to a more general class of models that characterize a
non-linear data generating process as piecewise linear by restricting the process to
be linear in each regime, where the regime on which the process is conditioned is
unobservable, and only a discrete number of regimes are feasible. 2 These models
differ in their assumptions concerning the stochastic process generating the regime:
2 In the case of two regimes, POTTER [1990], [1993] proposed to call this class of non-linear, non-normal
models the single index generalized multivariate autoregressive (SIGMA) model.
and the conditional mean E[y_t | Y_{t-1}, s_{t-1}] is given by E[y_t | Y_{t-1}]. 3 Even so,
this model can be considered as a restricted MS-VAR model where the transition
matrix has rank one. Moreover, if only the intercept term is regime-dependent,
MS(M)-VAR(p) processes with Gaussian errors and i.i.d. switching regimes are
observationally equivalent to time-invariant VAR(p) processes with non-normal
errors. Hence, the scope for modelling with this kind of model is very limited.
While the presumptions of the SETAR and the MS-AR model seem to be quite
different, the relation between the two model alternatives is rather close. This is
also illustrated in the appendix, which gives an example showing that SETAR
and MS-VAR models can be observationally equivalent.
3 where θ = (θ'_1, . . . , θ'_M)' collects the VAR parameters and ξ̄_m is the ergodic probability of regime
m.
4 In threshold autoregressive (TAR) processes, the indicator function is defined in terms of a switching
variable z_{t-d}, d ≥ 0. In addition, indicator variables can be introduced and treated with
errors-in-variables techniques. Refer for example to COSSLETT & LEE [1985] and KAMINSKY [1993].
gime 1. For example, TERÄSVIRTA & ANDERSON [1992] use the logistic
distribution function in their analysis of the U.S. business cycle. 5
(iv.) All the previously mentioned models are special cases of an endogenous
selection Markov-switching vector autoregressive model. In an EMS(M, d)-
VAR(p) model the transition probabilities p_{ij}(·) are functions of the observed
time series vector y_{t-d}:
In this study, it will be shown that the MS-VAR model can encompass a wide
spectrum of non-linear modifications of the VAR model proposed in the literature.
5 If F(·) is even, e.g. F(y_{t-d} - r) = 1 - exp{-(y_{t-d} - r)²}, a generalized exponential
autoregressive model as proposed by OZAKI [1980] and HAGGAN & OZAKI [1981] ensues.
1.2. Markov-Switching Vector Autoregressions
where u_t ∼ IID(0, Σ) and y_0, . . . , y_{1-p} are fixed. Denoting by
A(L) = I_K - A_1 L - · · · - A_p L^p the (K × K)-dimensional lag polynomial, we
assume that there are no roots on or inside the unit circle, |A(z)| ≠ 0 for |z| ≤ 1, where
L is the lag operator, so that y_{t-j} = L^j y_t. If a normal distribution of the error is
assumed, u_t ∼ NID(0, Σ), equation (1.2) is known as the intercept form of a stable
Gaussian VAR(p) model. This can be reparametrized as the mean-adjusted form of
a VAR model:
\[
y_t - \mu = A_1 (y_{t-1} - \mu) + \dots + A_p (y_{t-p} - \mu) + u_t ,
\tag{1.3}
\]
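The link between the two forms can be checked numerically. The following univariate sketch uses hypothetical AR(2) coefficients: the intercept form y_t = ν + a_1 y_{t-1} + a_2 y_{t-2} + u_t and the mean-adjusted form (1.3) are connected by μ = ν/(1 - a_1 - a_2), the scalar analogue of μ = A(1)^{-1} ν.

```python
# Hypothetical coefficients, for illustration only: a univariate AR(2) in
# intercept form, whose mean-adjusted form has mu = nu / (1 - a1 - a2).
nu, a1, a2 = 1.5, 0.5, 0.2
mu = nu / (1 - a1 - a2)

# With zero errors, mu is the fixed point of the intercept-form recursion.
y_prev1 = y_prev2 = mu
y_next = nu + a1 * y_prev1 + a2 * y_prev2
print(abs(y_next - mu) < 1e-9)
```

In the stable linear case the two parametrizations are thus merely two labellings of the same process; as noted below, this equivalence breaks down once the mean or intercept switches with the regime.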
The main characteristic of the Markov-switching model is the assumption that the
unobservable realization of the regime s_t ∈ {1, . . . , M} is governed by a discrete-time,
discrete-state Markov stochastic process, which is defined by the transition
probabilities

\[
p_{ij} = \Pr(s_{t+1} = j \mid s_t = i), \qquad \sum_{j=1}^{M} p_{ij} = 1 \quad \forall i, j \in \{1, \dots, M\}.
\tag{1.4}
\]
6 For reasons of simplicity in notation, we do not introduce a separate notation for the theoretical
representation of the stochastic process and its actual realizations.
7 In the notation of state-space models, the varying parameters μ, ν, A_1, . . . , A_p, Σ become functions
of the model's hyper-parameters.
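The defining property in (1.4), non-negative entries and unit row sums, is easy to verify mechanically. The helper below is an illustrative assumption, not part of the text:

```python
# A minimal helper (an assumption, not from the text) checking the defining
# property in (1.4): non-negative entries and rows that sum to one.
def is_transition_matrix(P, tol=1e-12):
    return all(
        abs(sum(row) - 1.0) < tol and all(p >= 0.0 for p in row)
        for row in P
    )

valid = is_transition_matrix([[0.9, 0.1],
                              [0.2, 0.8]])
invalid = is_transition_matrix([[0.5, 0.4],   # first row sums to 0.9
                                [0.2, 0.8]])
print(valid, invalid)   # → True False
```

Such a check is useful in practice because numerically estimated transition probabilities can drift away from the simplex.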
\[
y_t - \mu(s_t) = A_1 \left( y_{t-1} - \mu(s_{t-1}) \right) + \dots + A_p \left( y_{t-p} - \mu(s_{t-p}) \right) + u_t ,
\tag{1.5}
\]

where

\[
\mu(s_t) =
\begin{cases}
\mu_1 & \text{if } s_t = 1, \\
\quad\vdots \\
\mu_M & \text{if } s_t = M.
\end{cases}
\tag{1.6}
\]
In the model (1.5) there is, after a change in the regime, an immediate one-time jump
in the process mean. Occasionally, it may be more plausible to assume that the mean
smoothly approaches a new level after the transition from one state to another. In
such a situation the following model with a regime-dependent intercept term ν(s_t)
may be used:

\[
y_t = \nu(s_t) + A_1 y_{t-1} + \dots + A_p y_{t-p} + u_t .
\tag{1.7}
\]
In contrast to the linear VAR model, the mean-adjusted form (1.5) and the intercept
form (1.7) of an MS(M)-VAR(p) model are not equivalent. In Chapter 3 it will be
seen that these forms imply different dynamic adjustments of the observed variables
after a change in regime. While a permanent regime shift in the mean μ(s_t) causes
an immediate jump of the observed time series vector onto its new level, the dynamic
response to a once-and-for-all regime shift in the intercept term ν(s_t) is identical to
an equivalent shock in the white noise series u_t.
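This contrast can be made concrete with a small noise-free sketch using a hypothetical AR(1) coefficient: a permanent shift of the mean from 0 to 1 at t = 1 moves the mean-adjusted path to the new level at once, while the matching shift of the intercept, ν = (1 - a_1)μ, is absorbed only gradually.

```python
# Hypothetical AR(1) coefficient, zero errors; the regime shifts once at t = 1.
a1 = 0.5
mu_old, mu_new = 0.0, 1.0

# Mean-adjusted form: y_t - mu(s_t) = a1 * (y_{t-1} - mu(s_{t-1}))
means = [mu_old] + [mu_new] * 5
y_msm = [mu_old]
for t in range(1, 6):
    y_msm.append(means[t] + a1 * (y_msm[-1] - means[t - 1]))

# Intercept form: y_t = nu(s_t) + a1 * y_{t-1}, with nu = (1 - a1) * mu
nu_new = (1 - a1) * mu_new
y_msi = [mu_old]
for t in range(1, 6):
    y_msi.append(nu_new + a1 * y_msi[-1])

print(y_msm)                            # jumps to the new level at once
print([round(v, 4) for v in y_msi])     # approaches the new level smoothly
```

The intercept-form path is exactly the impulse response one would obtain from an equivalent shock to u_t, which is the point made in the paragraph above.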
In the most general specification of an MS-VAR model, all parameters of the
autoregression are conditioned on the state s_t of the Markov chain. We have assumed that
each regime m possesses its own VAR(p) representation with parameters ν_m (or μ_m),
Σ_m, A_{1m}, . . . , A_{pm}, m = 1, . . . , M, such that

\[
y_t = \nu(s_t) + A_1(s_t) y_{t-1} + \dots + A_p(s_t) y_{t-p} + u_t ,
\qquad u_t \sim \text{NID}(0, \Sigma(s_t)).
\]
However, for empirical applications it might be more helpful to use a model where
only some parameters are conditioned on the state of the Markov chain, while the
8 Even at this early stage a complication arises if the mean-adjusted form is considered. The conditional
density of y_t depends not only on s_t but also on s_{t-1}, . . . , s_{t-p}, i.e. M^{p+1} different conditional
densities have to be taken into account.
other parameters are regime-invariant. In Section 1.2.2 some particular MS-VAR
models will be introduced where the autoregressive parameters, the mean or the
intercepts are regime-dependent and where the error term is hetero- or homoskedastic.
Estimating these particular MS-VAR models is discussed separately in Chapter 9.
In empirical research, only some parameters will be conditioned on the state of the
Markov chain while the other parameters will be regime-invariant. In order to
establish a unique notation for each model, we specify with the general MS(M) term the
regime-dependent parameters:
M  Markov-switching mean,
I  Markov-switching intercept term,
A  Markov-switching autoregressive parameters,
H  Markov-switching heteroskedasticity.
To achieve a distinction of VAR models with time-invariant mean and intercept term,
we denote the mean-adjusted form of a vector autoregression as MVAR(p). An
overview is given in Table 1.1. Obviously the MSI and the MSM specifications are
equivalent if the order of the autoregression is zero. For this so-called hidden Markov-
chain model, we prefer the notation MSI(M)-VAR(0). As will be seen later on, the
MSI(M)-VAR(0) model and MSI(M)-VAR(p) models with p > 0 are isomorphic
concerning their statistical analysis. In Section 10.3 we will further extend the class
of models under consideration.
The MS-VAR model provides a very flexible framework which allows for
heteroskedasticity, occasional shifts, reversing trends, and forecasts performed in a
non-linear manner. In the following sections the focus is on models where the mean
(MSM(M)-VAR(p) models) or the intercept term (MSI(M)-VAR(p) models) is
subject to occasional discrete shifts; regime-dependent covariance structures of the
process are considered as additional features.
At this stage it is useful to define the parameter shifts more clearly by formulating the
system as a single equation with "dummy" (or, more precisely, indicator)
variables:

\[
I(s_t = m) =
\begin{cases}
1 & \text{if } s_t = m, \\
0 & \text{otherwise},
\end{cases}
\]
where m = 1, . . . , M. In the course of the following chapters it will prove helpful
to collect all the information about the realization of the Markov chain in the vector
ξ_t as

\[
\xi_t = \begin{pmatrix} I(s_t = 1) \\ \vdots \\ I(s_t = M) \end{pmatrix}.
\]
Thus, ξ_t denotes the unobserved state of the system. Since ξ_t consists of binary
variables, it has some particular properties:

\[
\mathrm{E}[\xi_t] =
\begin{pmatrix} \Pr(s_t = 1) \\ \vdots \\ \Pr(s_t = M) \end{pmatrix}
=
\begin{pmatrix} \Pr(\xi_t = \iota_1) \\ \vdots \\ \Pr(\xi_t = \iota_M) \end{pmatrix},
\]

where ι_m is the m-th column of the identity matrix. Thus E[ξ_t], or a well-defined
conditional expectation, represents the probability distribution of s_t. It is easily verified
that 1'_M ξ_t = 1 as well as ξ'_t ξ_t = 1 and ξ_t ξ'_t = diag(ξ_t), where 1_M = (1, . . . , 1)' is
an (M × 1) vector.
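These identities follow directly from the binarity of ξ_t and can be verified numerically, here for M = 3 and s_t = 2:

```python
# Numerical check of the identities for M = 3 and s_t = 2:
# 1'_M xi_t = 1, xi_t' xi_t = 1 and xi_t xi_t' = diag(xi_t).
M, s_t = 3, 2
xi = [1.0 if m == s_t else 0.0 for m in range(1, M + 1)]

ones_dot = sum(xi)                          # 1'_M xi_t
self_dot = sum(v * v for v in xi)           # xi_t' xi_t
outer = [[a * b for b in xi] for a in xi]   # xi_t xi_t'
diag = [[xi[i] if i == j else 0.0 for j in range(M)] for i in range(M)]
print(ones_dot == 1.0, self_dot == 1.0, outer == diag)   # → True True True
```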
For example, we can now rewrite the mean shift function (1.6) as

\[
\mu(s_t) = \sum_{m=1}^{M} \mu_m I(s_t = m).
\]
Analogously, the regime-dependent covariance matrices can be collected in

\[
\Sigma = \begin{bmatrix} \Sigma_1 & \cdots & \Sigma_M \end{bmatrix}
\quad (K \times MK),
\]

with σ_m = vech(Σ_m) and σ = (σ'_1, . . . , σ'_M)', such that

\[
\Sigma(s_t) = \sum_{m=1}^{M} I(s_t = m)\, \Sigma_m
\]

is a (K × K) matrix.
The transition probabilities (1.4) are collected in the (M × M) transition matrix

\[
P = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1M} \\
p_{21} & p_{22} & \cdots & p_{2M} \\
\vdots & \vdots & & \vdots \\
p_{M1} & p_{M2} & \cdots & p_{MM}
\end{pmatrix},
\tag{1.8}
\]
where the past and additional variables such as y_t reveal no relevant information
beyond that of the actual state. The assumption of a first-order Markov process is not
especially restrictive, since each Markov chain of an order greater than one can be
reparametrized as a higher-dimensional first-order Markov process (cf. FRIEDMANN
[1994]). A comprehensive discussion of the theory of Markov chains with application
to Markov-switching models is given by HAMILTON [1994b, ch. 22.2]. We will
just give a brief introduction to some basic concepts related to MS-VAR models, in
particular to the state-space form and the filter.
It is usually assumed that the Markov process is ergodic. A Markov chain is said to
be ergodic if exactly one of the eigenvalues of the transition matrix P is unity and
all other eigenvalues are inside the unit circle. Under this condition there exists a
stationary or unconditional probability distribution of the regimes. The ergodic
probabilities are denoted by ξ̄ = E[ξ_t]. They are determined by the stationarity
restriction P'ξ̄ = ξ̄ and the adding-up restriction 1'_M ξ̄ = 1.
If ξ̄ is strictly positive, such that all regimes have a positive unconditional
probability ξ̄_i > 0, i = 1, . . . , M, the process is called irreducible. The assumptions of
ergodicity and irreducibility are essential for the theoretical properties of MS-VAR
models, e.g. its property of being stationary. The estimation procedures, which will
be introduced in Chapter 6 and Chapter 8, are flexible enough to capture even
degenerate cases, e.g. when there is a single jump ("structural break") into an
absorbing state that prevails until the end of the observation period.
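The ergodic probabilities can be computed from the two restrictions above. For an ergodic chain a simple way to approximate them is to iterate ξ ← P'ξ, shown here for a hypothetical two-regime transition matrix:

```python
# The ergodic probabilities solve P' xi = xi with 1'_M xi = 1; for an
# ergodic chain they can be approximated by iterating xi <- P' xi.
P = [[0.9, 0.1],   # hypothetical two-regime transition matrix
     [0.2, 0.8]]
M = len(P)

xi = [1.0 / M] * M
for _ in range(500):
    xi = [sum(P[i][j] * xi[i] for i in range(M)) for j in range(M)]

# For this P the exact solution is (p21, p12)/(p12 + p21) = (2/3, 1/3).
print([round(v, 4) for v in xi])
```

The iteration preserves the adding-up restriction at every step, and its error contracts at the rate of the second-largest eigenvalue of P (here 0.7), which is why convergence is geometric for an ergodic chain.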
1.3. The Data Generating Process

After this introduction of the two components of MS-VAR models, (i.) the Gaussian
VAR model as the conditional data generating process and (ii.) the Markov chain as
the regime generating process, we will briefly discuss their main implications for the
data generating process.
For given states ξ_t and lagged endogenous variables
Y_{t-1} = (y'_{t-1}, y'_{t-2}, . . . , y'_1, y'_0, . . . , y'_{1-p})', the conditional probability
density function of y_t is denoted by p(y_t | ξ_t, Y_{t-1}). It is convenient to assume in
(1.5) and (1.7) a normal distribution of the error term u_t, so that

\[
p(y_t \mid \xi_t = \iota_m, Y_{t-1})
= (2\pi)^{-K/2} \, |\Sigma_m|^{-1/2}
\exp\!\left\{ -\tfrac{1}{2} (y_t - \bar{y}_{mt})' \Sigma_m^{-1} (y_t - \bar{y}_{mt}) \right\},
\tag{1.10}
\]
where the conditional means ȳ_{mt} are summarized in the vector ȳ_t, which is, e.g. in
MSI specifications, of the form

\[
\bar{y}_t = \begin{pmatrix}
\nu_1 + \sum_{j=1}^{p} A_{1j} y_{t-j} \\
\vdots \\
\nu_M + \sum_{j=1}^{p} A_{Mj} y_{t-j}
\end{pmatrix}.
\]
Assuming that the information set available at time t - 1 consists only of the sample
observations and the pre-sample values collected in Y_{t-1} and the states of the Markov
chain up to ξ_{t-1}, the conditional density of y_t is a mixture of normals 9. If the
densities of y_t conditional on ξ_t and Y_{t-1} are collected in the vector η_t as

\[
\eta_t = \begin{pmatrix}
p(y_t \mid \xi_t = \iota_1, Y_{t-1}) \\
\vdots \\
p(y_t \mid \xi_t = \iota_M, Y_{t-1})
\end{pmatrix},
\tag{1.13}
\]
9 The reader is referred to HAMILTON [1994a] for an excellent introduction to the major concepts of
Markov chains and to TITTERINGTON, SMITH & MAKOV [1985] for the statistical properties of
mixtures of normals.
Since the regime is assumed to be unobservable, the relevant information set available
at time t - 1 consists only of the observed time series until time t and the unobserved
regime vector ξ_t has to be replaced by the inference Pr(ξ_t | Y_τ). These probabilities
of being in regime m given an information set Y_τ are denoted ξ̂_{mt|τ} and
collected in the vector ξ̂_{t|τ} as

\[
\hat{\xi}_{t|\tau} = \begin{pmatrix}
\Pr(\xi_t = \iota_1 \mid Y_\tau) \\
\vdots \\
\Pr(\xi_t = \iota_M \mid Y_\tau)
\end{pmatrix},
\]

which allows two different interpretations. First, ξ̂_{t|τ} denotes the discrete conditional
probability distribution of ξ_t given Y_τ. Secondly, ξ̂_{t|τ} is equivalent to the conditional
mean of ξ_t given Y_τ. This is due to the binarity of the elements of ξ_t, which implies
that E[ξ_{mt}] = Pr(ξ_{mt} = 1) = Pr(s_t = m). Thus, the conditional probability
density of y_t based upon Y_{t-1} is given by

\[
p(y_t \mid Y_{t-1})
= \sum_{m=1}^{M} p(y_t, \xi_{t-1} = \iota_m \mid Y_{t-1})
= \sum_{m=1}^{M} p(y_t \mid \xi_{t-1} = \iota_m, Y_{t-1}) \Pr(\xi_{t-1} = \iota_m \mid Y_{t-1}).
\tag{1.15}
\]
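A univariate sketch of this mixture density, with hypothetical regime means, standard deviations, and filtered probabilities: the one-step density of y_t is a probability-weighted sum of the regime-conditional Gaussian densities.

```python
import math

# Hypothetical numbers for illustration: a two-regime Gaussian mixture
# one-step density, as in (1.15) specialized to the univariate case.
def normal_pdf(y, mean, sd):
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

mu = [-1.0, 2.0]     # regime-conditional means
sd = [1.0, 1.5]      # regime-conditional standard deviations
prob = [0.7, 0.3]    # Pr(s_t = m | Y_{t-1}), as delivered by the filter

y = 0.5
density = sum(p * normal_pdf(y, m, s) for p, m, s in zip(prob, mu, sd))
print(round(density, 4))
```

Evaluating this mixture density observation by observation is exactly what the likelihood recursions of the later chapters do in vector form.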
Assuming that the pre-sample values Y_0 are given, the density of the sample Y ≡ Y_T for
given states ξ is determined by

\[
p(Y \mid \xi) = \prod_{t=1}^{T} p(y_t \mid \xi_t, Y_{t-1}).
\]

Hence, the joint probability distribution of observations and states can be calculated
as

\[
p(Y, \xi) = p(Y \mid \xi) \Pr(\xi).
\]
Finally, it follows by the definition of the conditional density that the conditional
distribution of the total regime vector ξ is given by

\[
\Pr(\xi \mid Y) = \frac{p(Y, \xi)}{p(Y)}.
\]

Thus, the desired conditional regime probabilities Pr(ξ_t | Y) can be derived by
marginalization of Pr(ξ | Y). In practice these cumbersome calculations can be simplified
by a recursive algorithm, a matter which is discussed in Chapter 5.
The regime probabilities for future periods follow from the exogenous stochastic
process of ξ_t, more precisely the Markov property of regimes,
Pr(ξ_{T+h} | ξ_T, Y) = Pr(ξ_{T+h} | ξ_T):

\[
\Pr(\xi_{T+h} \mid Y)
= \sum_{\xi_T} \Pr(\xi_{T+h} \mid \xi_T, Y) \Pr(\xi_T \mid Y)
= \sum_{\xi_T} \Pr(\xi_{T+h} \mid \xi_T) \Pr(\xi_T \mid Y),
\]

so that ξ̂_{T+h|T} = (P')^h ξ̂_{T|T}, where P is the transition matrix as in (1.8).
Forecasting MS-VAR processes is discussed at full length in Chapter 4.
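A hypothetical two-regime illustration of these h-step regime forecasts, iterating the filtered probabilities forward with P':

```python
# Hypothetical transition matrix; the h-step regime forecast is obtained by
# repeatedly applying P' to the filtered probability vector xi_{T|T}.
P = [[0.9, 0.1],
     [0.2, 0.8]]
M = len(P)

def p_transpose(xi):
    # one application of P' to a probability vector
    return [sum(P[i][j] * xi[i] for i in range(M)) for j in range(M)]

xi = [1.0, 0.0]   # regime 1 known with certainty at time T
for h in range(1, 4):
    xi = p_transpose(xi)
    print(h, [round(v, 3) for v in xi])
# The forecasts decay towards the ergodic distribution (2/3, 1/3).
```

As h grows, the forecast converges to the ergodic distribution, so regime information is only valuable for short- to medium-term horizons.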
In this section we have given just a short introduction to some basic concepts related
to MS-VAR models; the following chapters will provide broader analyses of the
various topics.
1.4. Features of MS-VAR Processes and Their Relation to Other Non-linear Models
The Markov-switching vector autoregressive model is a very general approach for
modelling time series with changes in regime. In Chapter 3 it will be shown that MS-
VAR processes with shifting means or intercepts but regime-invariant variances and
autoregressive parameters can be represented as non-normal linear state-space
models. Furthermore, MSM-VAR and MSI-VAR models possess linear representations.
These processes may be better characterized as non-normal than as non-linear time
series models, as the associated Wold representations coincide with those of linear
models. While our primary research interest concerns the modelling of the
conditional mean, we will exemplify the effects of Markovian switching regimes on the
higher moments of the observed time series.
Most of these results are stated for the case of two regimes. Thus, the process
generating y_t can be rewritten as

where the critical value β_{ξ̄_1} depends on the ergodic regime probability ξ̄_1, e.g.
β_{0.5} = 2 and β_{0.1} = β_{0.9} = 3.

Thus, the excess kurtosis is different from zero if σ_1² ≠ σ_2² and 0 < ξ̄_1 < 1.
BOX & TIAO [1968] have used such a model for the detection of outliers. In order
to generate skewness and excess kurtosis it is, e.g., sufficient to assume an MSI(2)-
AR(0) model:
so that

Then it can be shown that the normalized third moment of y_t is given by the skewness.
If the regime i with the higher conditional mean μ_i > μ_j is less likely than the other
regime, ξ̄_i < ξ̄_j, then the observed variable is more likely to be far above the mean
than it is to be far below the mean.

Since max_{ξ̄_1 ∈ [0,1]} ξ̄_1(1 - ξ̄_1) = 1/4 < 1/3, the excess kurtosis is positive,
i.e. the distribution of y_t has more mass in the tails than a Gaussian distribution with
the same variance.
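Both effects can be checked numerically for a two-component mixture of normals; the parameter values below are hypothetical and chosen so that the rarer regime has the higher mean. The central moments use E[(d+Z)^3] = d^3 + 3dv and E[(d+Z)^4] = d^4 + 6d^2 v + 3v^2 for Z ~ N(0, v).

```python
# Hypothetical parameters: a two-component mixture of normals whose rarer
# component has the higher mean, generating skewness and excess kurtosis.
xi1, xi2 = 0.2, 0.8        # ergodic regime probabilities
mus = [3.0, 0.0]           # regime means
vs = [1.0, 1.0]            # regime variances

mean = xi1 * mus[0] + xi2 * mus[1]

def central(k):
    # k-th central moment of the mixture from the component normal moments
    total = 0.0
    for w, m, v in ((xi1, mus[0], vs[0]), (xi2, mus[1], vs[1])):
        d = m - mean
        if k == 2:
            total += w * (d ** 2 + v)
        elif k == 3:
            total += w * (d ** 3 + 3 * d * v)
        elif k == 4:
            total += w * (d ** 4 + 6 * d ** 2 * v + 3 * v ** 2)
    return total

var = central(2)
skewness = central(3) / var ** 1.5
excess_kurtosis = central(4) / var ** 2 - 3.0
print(skewness > 0, excess_kurtosis > 0)   # → True True
```

With these numbers the distribution is right-skewed and fat-tailed even though each regime is Gaussian and homoskedastic.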
On the other hand, even if the white noise process u_t is homoskedastic, σ²(s_t) = σ²,

with u_t ∼ NID(0, σ²) and serial correlation in the regimes according to the
transition matrix P. Employing the ergodic regime probability ξ̄_1, y_t can be written as

errors since the predicted regime probabilities generally are non-linear functions of
Y_{t-1}.
Recently some approaches have been made to consider Markovian regime shifts in
variance generating processes. The class of autoregressive conditional
heteroskedastic processes introduced by ENGLE [1982] is used to formulate the conditional
process; our assumption of an i.i.d. distributed error term is substituted by an ARCH
process u_t, cf. inter alia HAMILTON & LIN [1994], HAMILTON & SUSMEL [1994],
CAI [1994] and HALL & SOLA [1993b]. ARCH effects can be generated by MSA-
AR processes, which will be considered in the next section.
where γ = -a_1 a_2 > 0 and ε_t is white noise. Thus, ARCH(1) models can be
interpreted as restricted MSA(2)-AR(1) models.
It is worth noting that the stability of each VAR sub-model and the ergodicity of the
Markov chain are sufficient stability conditions; they are, however, not necessary to
establish stability. Thus, the stability of MSA-AR models can be compatible with
AR polynomials containing roots greater than unity in absolute value in some regimes
and less than unity in others. Necessary and sufficient conditions for the stability
of stochastic processes such as the MSA-VAR model have been derived in KARLSEN
[1990a], [1990b]. In practice, however, their application has been found to be rather
complicated (cf. HOLST et al. [1994]).
In this study we will concentrate our analysis on modelling shifts in the (conditional)
mean and the variance of VAR processes which simplifies the analysis.
In the preceding discussion of this chapter, MS(M)-VAR(p) processes have been
introduced as doubly stochastic processes where the conditional stochastic process is a
Gaussian VAR(p) and the regime generating process is a Markov chain. As we have
seen in the discussion of the relationship of the MS-VAR model to other non-linear
models, the MS-VAR model can encompass many other time series models proposed
in the literature, or at least replicate some of their features. In the following chapter
these considerations are formalized in state-space representations of MS-VAR
models, where the measurement equation corresponds to the conditional stochastic
process and the transition equation reflects the regime generating process. In
Section 2.5 the MS-VAR model will be compared to time-varying coefficient models
with smooth variations in the parameters, i.e. an infinite number of regimes.
10 Models where the regime switches between deterministic and stochastic trends are considered by
MCCULLOCH & TSAY [1994a].
1.A. Appendix: A Note on the Relation of SETAR to MS-AR Processes
While the presumptions of the SETAR and the MS-AR model seem to be quite differ-
ent, the relation between the two model alternatives is rather close. Indeed, both models
can be observationally equivalent, as the following example demonstrates:
For d = 1 it has been shown by CARRASCO [1994, Lemma 2.2] that (1.19) is a par-
ticular case of the Markov-switching model with

s_t = 1   if y_{t−1} ≤ r,
s_t = 2   if y_{t−1} > r,

such that s_t follows a first-order Markov process where the transition matrix is defined
as

P = [ p_11  p_12 ; p_21  p_22 ] = [ Φ((r − μ_1)/σ)   Φ((μ_1 − r)/σ) ; Φ((r − μ_2)/σ)   Φ((μ_2 − r)/σ) ].
28 The Markov-Switching Vector Autoregressive Model
2. Parameter estimation & testing: If the parameters of the model are un-
known, classical maximum likelihood as well as Bayesian estimation meth-
ods are feasible. Here, the filter and smoother recursions provide the analytical
tool to construct and evaluate the likelihood function. See Chapters 6-9.
30 The State-Space Representation
The framework for the statistical analysis of MS-VAR models to be presented in the
next chapters is the state-space form. The advantage of viewing MS-VAR models in
this way is that general concepts such as the likelihood principle (Chapter 6)
and a recursive filter algorithm (Chapter 5), which corresponds to the Kalman
filter in Gaussian state-space models, can be introduced.
For particular MS-VAR processes, a state-space representation with ξ_t as the state
vector has been introduced by HAMILTON [1994a].1 In the following section we
investigate some state-space representations of MS-VAR models. These representa-
tions are then used to work out general properties of MS-VAR processes; inter alia,
we discuss the non-normality of the state-space form, we formulate conditions for
the linearity of the state-space representation, and we show that the joint process of
observed variables and regimes, (y_t′, ξ_t′)′, is Markovian. In Section 2.2 the specific-
ation of the state-space representation is discussed with regard to its adaptation to the
particular MS-VAR models proposed in Chapter 1. In the remaining sections, three
alternative state-space representations of MS-VAR processes are introduced which
will create new insights into the theory of MS-VAR processes and will be used later
on. In Section 2.3 the adding-up restriction on the state vector is eliminated by re-
ducing its dimension. Section 2.4 formulates the state-space representation in the
predicted state vector. Section 2.5 presents a state-space form in the vector of VAR
coefficients which allows a comparison with other time-varying coefficient models.
The state-space model given in Table 2.1 consists of the set of measurement and
transition equations. The measurement equation (2.1) describes the relation between
the unobserved state vector ξ_t and the observed time series vector y_t. Here, the pre-
determined variables X_t and the vector of Gaussian disturbances u_t enter the model.
32 The State-Space Representation
The state vector ξ_t follows a Markov chain subject to a discrete adding-up restriction.
In this study, the Markov chain is assumed to be homogeneous, i.e. F_t = F.
If there are restrictions on the parameter space of β, e.g. due to regime-invariant
parameters, it is sometimes more useful to consider the following formulation of the
measurement equation, which is linear in parameters:
(2.7)
where the parameter vector γ_0 contains the regime-invariant parameters and γ_m,
m = 1, …, M, are the regime-dependent parameter vectors.
The transition equation (2.2) reflects a further property of the regime generating pro-
cess described in Section 1.2.4. Indeed, the Markov chain governing the state vector
ξ_t can be represented as a first-order vector autoregression (cf. HAMILTON [1994b]):

ξ_{t+1} = F ξ_t + v_{t+1}.

The last equation implies that the innovation v_t is a martingale difference series.
Although the vector v_t can take on only a finite set of values, the mean E[v_t] = 0.
2 Some information about the necessary updates of the filtering and estimation procedures under non-
normality of u_t is provided by HOLST et al. [1994].
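The martingale-difference property of the discrete innovation can be checked numerically. In the VAR(1) representation ξ_{t+1} = F ξ_t + v_{t+1} with F = P′, conditioning on each current regime and averaging over the finitely many possible values of v_{t+1} must give zero (the 3-state transition matrix below is an arbitrary illustration):

```python
import numpy as np

# Numerical check of the martingale-difference property of v_{t+1} in
# xi_{t+1} = F xi_t + v_{t+1}, F = P' (illustrative transition matrix).
P = np.array([[0.80, 0.15, 0.05],
              [0.20, 0.70, 0.10],
              [0.10, 0.20, 0.70]])
F = P.T
M = P.shape[0]
I = np.eye(M)

cond_means = []
for m in range(M):                       # condition on xi_t = iota_m
    xi = I[:, m]
    v_support = I - (F @ xi)[:, None]    # column j holds iota_j - F xi_t
    cond_means.append(v_support @ P[m])  # E[v_{t+1} | xi_t = iota_m]
print(np.round(cond_means, 12))          # all zeros: v_t is an MDS
```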
2.1. A Dynamic Linear State-Space Representation for MS-VAR Processes
Analogously, the probability distribution of the discrete innovation process v_{t+1} con-
ditional on ξ_t is given for the discrete support {ι_j − F ξ_t}_{j=1}^{N} by
Due to the discrete error process {v_t} given in (2.4), the state-space representation
of MS-VAR processes is non-normal. This non-normality reflects the transition of
regimes causing parameter variations which are not smooth but abrupt. Therefore
the Kalman filter does not produce an optimal inference; instead the so-called BLHK
filter will be introduced in Chapter 5 for the evaluation of the likelihood function of
this non-normal system.
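Although the BLHK filter itself is only developed in Chapter 5, its discrete prediction and update steps can be sketched roughly as follows for a scalar two-regime model with switching mean; all parameter values and the observation are hypothetical:

```python
import numpy as np

# Rough sketch of one discrete filtering step (illustrative MSI(2)-AR(0)
# model; parameters are not taken from the text).
mu = np.array([-1.0, 1.0])          # regime-dependent means
sigma = 1.0
P = np.array([[0.95, 0.05],
              [0.10, 0.90]])
F = P.T

def normal_pdf(x, m, s):
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

xi_filt = np.array([0.5, 0.5])      # xi_{t-1|t-1}
y_t = 0.8                           # new (hypothetical) observation

xi_pred = F @ xi_filt               # prediction step: xi_{t|t-1}
lik = normal_pdf(y_t, mu, sigma)    # regime-conditional densities
joint = lik * xi_pred
xi_filt_new = joint / joint.sum()   # update step: xi_{t|t}
print("xi_{t|t-1} =", xi_pred, "  xi_{t|t} =", xi_filt_new)
```

An observation close to the regime-2 mean shifts the filtered probabilities towards regime 2, as the printed output shows.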
It can easily be checked that the MS-VAR model without regime-dependent autoco-
variances can be considered as a dynamic linear state-space model as suggested by
HARRISON & STEVENS [1976]:

y_t = [ ȳ_{1t} + Σ_1^{1/2} u_t, …, ȳ_{Mt} + Σ_M^{1/2} u_t ] ξ_t,   (2.9)
ξ_{t+1} = F ξ_t + v_{t+1},   (2.10)

where ȳ_{mt} denotes the conditional mean of y_t in regime m.
The representation (2.1)/(2.2) indicates that the joint process {(Y_t′, ξ_t′)′} of the state
vector ξ_t and the stacked vector of observed variables Y_t is Markovian, where p ad-
joining vectors of observable variables are collected in Y_t = (y_t′, y_{t−1}′, …, y_{t−p+1}′)′.
Thus, the relevant information concerning the evolution of the system output in the
3 For MS-VAR processes the following reformulations of H_t ξ_t = [ȳ_{1t}, …, ȳ_{Mt}] ξ_t =
Σ_{m=1}^{M} ξ_{mt} ȳ_{mt} are possible, and each of them is useful for particular purposes:

X_t B ξ_t = (x_t′ ⊗ I_K) B ξ_t = (x_t′ ⊗ I_K)(ξ_t′ ⊗ I_{KR}) β,

where β = (β_1′, …, β_M′)′ = vec(B) and β_m = vec(B_m). Obviously these transformations hold
true for any (K × KR) matrix X_t = (x_t′ ⊗ I_K).
future, (Y_{t+h}′, ξ_{t+h}′)′, h > 0, is completely provided by the actual state (Y_t′, ξ_t′)′, while
the past reveals no additional information,
since the conditional distribution depends only on the distribution of the error term
u_t, which is assumed to be independent of Y_{t−1}.
This is caused by the information contained in the history of the observed variable
y_{t−p} on the distribution of the state vector ξ_t; thus

Pr(ξ_t = ι_m | Y_{t−1}) ≠ Pr(ξ_t = ι_m | y_{t−1})   a.e.
A necessary and sufficient condition for the existence of a linear state-space repre-
sentation is that there is no regime-dependent heteroskedasticity and no regime-dependent
autoregressive dynamics. This condition is guaranteed only for processes where the variance para-
meters, Σ(s_t) = Σ, and the autoregressive parameters, A_j(s_t) = A_j, j = 1, …, p,
are regime-invariant. Thus, only MSI(M)-VAR(p) and MSM(M)-VAR(p) processes possess
a linear MA(∞) representation of y_t in the innovations {u_{t−j}}_{j≥0} and {v_{t−j}}_{j≥0}.
(A8a) MSI-VAR: F_t = P′
(A8b) MSM-VAR: F_t = diag(vec(P′) ⊗ 1_{M^{p−1}}) (1_M ⊗ I_{M^p} ⊗ 1_M′)
As the general filtering and estimation methods discussed later are based on this general
formulation of MS-VAR models, we have to devote some attention to the relation
of the special MS-VAR models introduced in the last chapter (cf. Table 1.1) to the
state-space representation in Table 2.1.
In Table 2.2 an overview of possible restrictions on the parameter space is given
in a systematic presentation. For MSI specifications, Table 2.3 demonstrates that the
formulation of the state-space representation is straightforward. But as Table 2.2 also
indicates, the state-space representation is able to capture even more general specific-
ations.
In MSM-VAR specifications such as equation (1.5), a difficulty arises from the fact
that the conditional density of y_t depends on the last p + 1 regimes,

p(y_t | s_t, s_{t−1}, …, s_{t−p}, Y_{t−1}) ≠ p(y_t | s_t, Y_{t−1})   a.e.

Thus, (y_t, s_t) is not Markovian, while the joint process of the observable variables
y_t and the regimes (s_t, s_{t−1}, …, s_{t−p}) is again Markovian.
ξ_t^{(1)} = [ 𝟙(s_t = 1), …, 𝟙(s_t = M) ]′,

where the regime vector ξ_{t−j}^{(1)} can be recovered from the stacked state vector ξ_t by
marginalization,

ξ_{t−j}^{(1)} = (1_{M^j}′ ⊗ I_M ⊗ 1_{M^{p−j}}′) ξ_t,   j = 0, …, p,

and the transition probabilities of the stacked process are given by

Pr(ξ_{t+1} = ι | ξ_t) = Pr(ξ_{t+1}^{(1)} = ξ_{t+1}^{*(1)} | ξ_t^{(1)} = ξ_t^{*(1)}, …, ξ_{t−p}^{(1)} = ξ_{t−p}^{*(1)}),
or in matrix notation,
where ⊗ is the Kronecker product and ⊙ denotes the element-wise matrix multiplic-
ation. Therefore, we have
(2.11)
4 For all remaining MS(M)-VAR(p) models employing an MSM specification, the measurement equa-
tions are given in Tables 9.19-9.20.
where u_t ~ NID(0, Σ) and the (K × M^{p+1}) input matrix H is given by

H = Σ_{j=0}^{p} Ã_j M L_j   (2.12)
  = [ I_K, −A_1, …, −A_p ] [ I_{p+1} ⊗ (μ_1, …, μ_M) ] [ L_0′, …, L_p′ ]′,   (2.13)

where Ã_0 = I_K and Ã_j = −A_j for j ≥ 1.
This procedure alters the state-space representations considered so far, as the new
state vector ζ_t is only M − 1 dimensional:

𝓕 = [ p_11 − p_{M1}   ⋯   p_{M−1,1} − p_{M1}
      ⋮                        ⋮
      p_{1,M−1} − p_{M,M−1}   ⋯   p_{M−1,M−1} − p_{M,M−1} ]   ((M−1) × (M−1)),

B̄ = [ β_1 − β_M  ⋯  β_{M−1} − β_M ]   (R × (M−1)).

Obviously, the j-th row of B̄ is equal to zero if the j-th element of the coefficient
vector is regime-invariant.

y_t = X_t B̄ ζ_t + u_t,   (2.15)
ζ_{t+1} = 𝓕 ζ_t + v_{t+1}.   (2.16)
The state-space forms considered so far have been formulated in the unobserved state
vector ξ_t. For forecasting it is more practical to possess a state-space representation
in the inferred state vector ξ̂_{t|t−1} = E[ξ_t | Y_{t−1}].
In the measurement equation (2.9), the innovations reflect only the error term u_t for
a given regime vector ξ_t,

(2.17)

which is however not in the information set at time t − 1. Since the regime is un-
observable, the one-step prediction of the regime vector, ξ̂_{t|t−1} = E[ξ_t | Y_{t−1}], is
provided by

ξ̂_{t|t−1} = F ξ̂_{t−1|t−1}.   (2.18)

Equation (2.18) uses the fact that the evolution of regimes is governed by the Markov chain,
and therefore the expectation of ξ_t based on an information set containing ξ_{t−1} would
be given by

E[ξ_t | Y_{t−1}, ξ_{t−1}] = F ξ_{t−1}.   (2.19)

Thus, for given parameters, the prediction of the observable vector of variables y_t
can be derived by inserting ξ̂_{t|t−1} into the measurement equation and using that
E[u_t | Y_{t−1}] = E[u_t] = 0:

ŷ_{t|t−1} = X_t B ξ̂_{t|t−1}.   (2.20)
(2.20)
The resulting predictor ŷ_{t|t−1} = X_t B ξ̂_{t|t−1}, compared with the observed y_t, gives
the prediction error e_t, which denotes the deviation of the realization y_t from its one-
step prediction ŷ_{t|t−1} = E[y_t | Y_{t−1}]:

e_t = y_t − ŷ_{t|t−1}.   (2.21)

Since e_t represents the unpredictable element of the observed time series y_t, it is
called the innovation process.
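A minimal numeric sketch of the one-step predictor and its innovation (scalar intercept-only model; the matrices B, P and the realization y_t below are hypothetical illustrations):

```python
import numpy as np

# Sketch of y_{t|t-1} = X_t B xi_{t|t-1} for an illustrative MSI(2)-AR(0)
# model (parameter values are not taken from the text).
B = np.array([[-1.0, 1.0]])        # K x M matrix of regime intercepts
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
F = P.T

xi_prev = np.array([0.3, 0.7])     # xi_{t-1|t-1}
xi_pred = F @ xi_prev              # xi_{t|t-1} = F xi_{t-1|t-1}   (2.18)
X_t = np.eye(1)                    # here X_t = 1 (intercept only)
y_pred = X_t @ B @ xi_pred         # y_{t|t-1}                     (2.20)

y_t = np.array([0.4])              # hypothetical realization
e_t = y_t - y_pred                 # innovation e_t                (2.21)
print("xi_{t|t-1} =", xi_pred, " y_{t|t-1} =", y_pred, " e_t =", e_t)
```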
The prediction error e_t can be decomposed into two components: (i.) the Gaussian
innovation u_t affecting the measurement equation, and (ii.) the effects caused by re-
gime prediction errors ε_t = ξ_t − E[ξ_t | Y_{t−1}]. Thus,

e_t = X_t B ε_t + u_t.   (2.22)

These may be compared with the expectation of y_t for an information set containing
Y_{t−1} and ξ_{t−1}:
2.4. Prediction-Error Decomposition and the Innovation State-Space Form
(2.23)
An innovation ε_t may have two sources: (i.) the unpredictable innovation v_t of the
regime generating process, and (ii.) the error ξ_{t−1} − ξ̂_{t−1|t−1} in the reconstruction
of the regime vector at time t − 1. Analogously to e_t, the regime prediction error
ε_t can be considered as the innovation in the regime generating process given the
information set Y_{t−1}. Since, strictly speaking, we are interested in the inferred re-
gime vector ξ̂_{t+1|t}, we have to derive the innovation term of the modified transition
equation:

f_t = E[ξ_{t+1} | Y_t] − E[ξ_{t+1} | Y_{t−1}]
    = F { −(ξ_t − ξ̂_{t|t}) + (ξ_t − F ξ_{t−1}) + F (ξ_{t−1} − ξ̂_{t−1|t−1}) }   (2.24)
    = F (v_t − ε_t + F ε_{t−1}),

y_t = X_t B ξ̂_{t|t−1} + e_t.   (2.25)
Note that the original formulation in AOKI [1990] and AOKI & HAVENNER [1991]
assumes that the involved matrix H_t = X_t B is time-invariant. This presumption
2.5. The MS-VAR Model and Time-Varying Coefficient Models

For the following analysis the vector of prevailing parameters is partitioned into the
regime-dependent parameters β_t^s = β̄^s + B̄^s ζ_t and the regime-invariant parameters
β_t^0 = β^0; thus β_t = (β_t^{0′}, β_t^{s′})′ and B̄ = (0, B̄^{s′})′, where rk B̄ = min{M − 1, R^s}
and R^s is the number of regime-dependent parameters. Analogously, the matrix of
explanatory variables X_t is split into the (K × R^s) matrix X_t^s and the (K × [R − R^s])
matrix X_t^0, X_t = (X_t^0, X_t^s), where R − R^s is the number of regime-invariant para-
meters.
The solution of this problem depends on the rank of B̄^s. If R^s = M − 1, i.e. the
number of regime-dependent parameters is equal to the number of regimes minus
one, there exists a unique solution and the transition equation is given by

where w_{t+1} = B̄^s v_{t+1} and F_{β^s} has full rank, rk F_{β^s} = M − 1 = R^s.
where F_{β^s} has reduced rank, rk F_{β^s} = M − 1 < R^s, and the variance-covariance
matrix of w_{t+1} is singular. Therefore, we will find some common shifts in the para-
meters as long as the number of regimes minus one, M − 1, is less than the number of para-
meters. If R^s < M − 1, i.e. the number of regime-dependent parameters is less than
the number of regimes minus one, there exists no linear transition equation in β_t.
the 'state' is the vector of (regime-dependent) parameters (β_t^s − β̄^s) and no longer
the regime or, more precisely, the vector of indicator variables ξ_t. Again the VAR(1)
representation in (2.28) can cover as usual higher-order dynamics for β_t, if the state
vector is defined as β̄_t = (β_t′, …, β_{t−q}′)′ and (ι_1′ ⊗ I_R) β̄_t is used in the observation
equation.
Hence, the MS-VAR model under consideration can be characterized as a time-vary-
ing regression model, where all eigenvalues of F_{β^s} are inside the unit circle and the
innovation process w_{t+1} entering the transition equation is non-normal. The uncon-
ditional mean of β_t, β̄ = B ξ̄, has the interpretation of the average or steady-state
coefficient vector.
y_t = X_t (β_t − β̄) + u_t,
β_{t+1} − β̄ = F_1 (β_t − β̄) + … + F_q (β_{t+1−q} − β̄) + v_t,

where u_t and v_t are Gaussian white noise. If the Gaussian VAR(q) process is stable,
we have the return-to-normality model proposed by ROSENBERG [1973]. As in
(2.28), the time-varying coefficients β_t fluctuate around their constant mean β̄. The
difference consists in the fact that the fluctuations of the parameters are not gener-
ated by a 'smooth' linear Gaussian system, but by a 'jumping' discrete-state Markov
chain.
In contrast to most other stochastically varying coefficient models, where the variations in the
regression coefficients are assumed to be normally distributed, the transitions of
the parameter vector in the MS-VAR model are not smooth but abrupt. They are
neither transient as in the HILDRETH & HOUCK [1968] model nor permanent as in
a random-walk coefficients model. While this representation clarifies the relation
of the MS-VAR model to other regression models with stochastically varying coeffi-
cients, the state-space form is heavily restricted and it is not recommended as a device
for empirical research.
This chapter has laid out the formal framework for the statistical analysis of MS-VAR
models. Before we consider the issue of statistical inference, we complete the dis-
cussion of modelling MS-VAR processes by deriving VARMA representations for
MSM(M)-VAR(p) and MSI(M)-VAR(p) processes which emphasize the close rela-
tion of this MS-VAR sub-class to linear systems.
Chapter 3
The previous chapter introduced the state-space representation as the basic tool
for describing vector autoregressive processes with Markovian regime shifts. This
chapter looks in greater depth at the relationship between Markov-switching vector
autoregressions and linear time series models. We develop a finite order VARMA
representations theorem for vector autoregressive processes with Markovian regime
shifts in the mean or the intercept term of the multiple time series. This result gener-
alizes concepts recently proposed by POSKITT & CHUNG [1994] for univariate hid-
den Markov-chains, and by KROLZIG [1995] for univariate MSM(M)-AR(P) and
MSI(M)-AR(P) processes.
We consider MS-VAR models where the mean μ(s_t) (the MSM(M)-VAR(p) model)
or the intercept term ν(s_t) (the MSI(M)-VAR(p) model) are subject to occasional
discrete shifts, while the variance Σ(s_t) and the autoregressive parameters A_i(s_t),
i = 1, …, p, of the time series are assumed to be regime-invariant. Three alternative
models will be distinguished:

(i) Hidden Markov chain processes, p = 0:   y_t = μ(s_t) + u_t
(ii) MSM(M)-VAR(p) processes, p > 0:   A(L) (y_t − μ(s_t)) = u_t
(iii) MSI(M)-VAR(p) processes:   A(L) y_t = ν(s_t) + u_t
A common feature of the models under consideration is that the observed process y_t
may be considered as the sum of two independent processes: a non-linear time series
process μ_t and a linear process z_t. The models differ only in the definition of these
processes.
Hidden Markov Chain Processes
y_t = μ_t + z_t,
μ_t = M ξ_t,   (3.1)
z_t = u_t,   u_t ~ IID(0, Σ_u).

MSM(M)-VAR(p) Processes
y_t = μ_t + z_t,
μ_t = M ξ_t,   (3.2)
A(L) z_t = u_t,   u_t ~ IID(0, Σ_u).

MSI(M)-VAR(p) Processes
y_t = μ_t + z_t,
A(L) μ_t = M ξ_t,   (3.3)
A(L) z_t = u_t,   u_t ~ IID(0, Σ_u).
To simplify the notation, we use here the same shift function μ(s_t) and mat-
rix M as for the MSI(M)-VAR(p) model, where the quantities represent the regime-
dependent intercept terms.
Clearly, all the features just described for the processes μ_t and z_t translate
into similar features inherited by the observed process y_t.
These observations will be formalized in the following state-space representation
of Markov-switching autoregressive processes. To derive the properties of models
1 While the hidden Markov chain model is not widely used in econometrics, it has received considerable
attention in engineering; see e.g. LEVINSON et al. [1983] and POSKITT & CHUNG [1994]. Hence,
there exists a separate field of literature dealing with this model, starting with BLACKWELL & KOOP-
MANS [1957] and HELLER [1965]. More recently, estimation methods have been discussed by VOINA
[1988], LEROUX [1992], and QIAN & TITTERINGTON [1992].
VARMA-Representation of MSI-VAR and MSM-VAR Processes
only the remaining M − 1 states are considered. The transition probabilities and
the regime-conditioned means (or intercepts) are collected in the matrices 𝓕 and M̄,
such that

𝓕 = [ p_11 − p_{M1}   ⋯   p_{M−1,1} − p_{M1}
      ⋮                        ⋮
      p_{1,M−1} − p_{M,M−1}   ⋯   p_{M−1,M−1} − p_{M,M−1} ]   ((M−1) × (M−1)),

and the linear component is stacked in companion form,

Z_t = ( z_t′, z_{t−1}′, …, z_{t−p+1}′ )′,   U_t = ( u_t′, 0′, …, 0′ )′,

A = [ A_1  ⋯  A_{p−1}  A_p
      I_K  ⋯  0        0
      ⋮                 ⋮
      0    ⋯  I_K      0 ].
3.1. Linearly Transformed Finite Order VAR Representations

The linear state-space representation defines MSM- and MSI-VAR processes as lin-
early transformed VAR(1) processes:

Hidden Markov Chain Processes
y_t − μ_y = M̄ ζ_t + z_t,
ζ_t = 𝓕 ζ_{t−1} + v_t,   (3.5)
z_t = u_t.

MSM(M)-VAR(p) Processes
y_t − μ_y = M̄ ζ_t + J Z_t,
ζ_t = 𝓕 ζ_{t−1} + v_t,   (3.6)
Z_t = A Z_{t−1} + U_t.

MSI(M)-VAR(p) Processes
y_t − μ_y = μ̄_t + z̄_t,   A(L) μ̄_t = M̄ ζ_t,
ζ_t = 𝓕 ζ_{t−1} + v_t,   (3.7)
A(L) z̄_t = u_t,

where μ_y is the unconditional mean of y_t, J = ι_1′ ⊗ I_K, and ι_1 is the first column
of the identity matrix. The equation systems (3.5), (3.7), and (3.6) allow a state-
space representation, where the state vector x_t consists of the Markov chain ζ_t and
the Gaussian process Z_t.
Since our analysis focuses on the prediction of shifts in the mean of an observed time
series vector, but i.i.d. switching regimes produce only a mixture distribution of the
prediction errors, this rank condition seems to be rather reasonable. Furthermore,
rk 𝓕 = M − 1 can be seen as an identifying restriction. Without additional assump-
tions concerning the distribution of u_t, an MS(M)-VAR(p) model with i.i.d. regimes
Example 7 Consider an MS(2)-VAR(p) model with i.i.d. switching regimes. The
assumption p_{1m} = p_{2m} implies rk(𝓕) = 0 as well as ξ̂_{t+1|t} = ξ̄. Thus the
actual regime reveals no information about the future, i.e. ŷ_{t+1|t} = H_{t+1} ξ̄ and
E[y_{t+1} | Y_t, s_t = 1] = E[y_{t+1} | Y_t, s_t = 2]. Therefore, the conditional mean of y_{t+1}
remains unaltered if states 1 and 2 are joined in a common state, M* = 1, with
β* = β̄ = B ξ̄ and a transition probability of unity.
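Example 7 can be reproduced numerically: with identical rows of P the reduced transition coefficient vanishes and the one-step regime prediction equals the ergodic distribution whatever the current regime (the probabilities below are illustrative):

```python
import numpy as np

# Identical rows p_{1m} = p_{2m} (i.i.d. switching) imply that today's
# regime carries no information about tomorrow's.
P = np.array([[0.6, 0.4],
              [0.6, 0.4]])          # rows coincide: i.i.d. regimes
xi_bar = np.array([0.6, 0.4])       # ergodic probabilities equal each row
F = P.T

# reduced transition scalar f = p_11 - p_21 (the 'rk F = 0' statement):
f = P[0, 0] - P[1, 0]

# one-step regime prediction from either current regime:
pred_from_1 = F @ np.array([1.0, 0.0])
pred_from_2 = F @ np.array([0.0, 1.0])
print("f =", f, " predictions:", pred_from_1, pred_from_2)
```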
y_t = H_t ξ̂_{t|t−1} + e_t
    = H_t F ξ̂_{t−1|t−1} + e_t
    = H_t S Λ^{1/2} T′ ξ̂_{t−1|t−1} + e_t.
where deg(·) denotes the degree of a polynomial, A(L)* is the adjoint of A(L), and
A_{ij}(L) is the (i, j)-th co-factor of A(L).
POSKITT & CHUNG [1994] consider a hidden Markov chain, which in our notation
is a univariate MSI(M)-AR(0) model.
Using the state-space representations and the methodology of Lemma 1, the result
of POSKITT & CHUNG [1994] can easily be extended to vector systems.
where ε_t is a zero-mean vector white noise process, γ(L) = 1 − γ_1 L − … −
γ_{M−1} L^{M−1} is the scalar AR operator of order M − 1, and B(L) = I_K − B_1 L − … − B_{M−1} L^{M−1}.
The underlying state-space form,

y_t − μ_y = [ M̄  I_K ] x_t   with   x_t = [ ζ_t ; u_t ],

x_t = [ 𝓕  0 ; 0  0 ] x_{t−1} + [ v_t ; u_t ],
3.2. VARMA Representation Theorems
Solving the transition equation for x_t and inserting the resulting vector MA(∞) re-
presentation for x_t in the measurement equation results in

(y_t − μ_y) = [ M̄  I_K ] [ 𝓕(L)⁻¹  0 ; 0  I_K ] [ v_t ; u_t ],

where 𝓕(L) = I_{M−1} − 𝓕 L. Applying Lemma 1 we get the final equation form of a
VARMA(M − 1, M − 1) model:
Proof The proof is a simple extension of the previous one. Consider the process
y_t* = A(L) (y_t − μ_y). Since the relation A(L) (y_t − μ_y) = M̄ ζ_t + u_t holds by
definition, the transformed process y_t* satisfies the conditions of Proposition 2. This
MSI(M)-VAR(0) process has the VARMA(M − 1, M − 1) representation
If y_t is a vector-valued process, we have to take into account that equation (3.9) is
not a final equation form. Multiplying with the adjoint A(L)* gives the final equation
form,
where ε_t is a zero-mean vector white noise process; under quite general regularity
conditions, γ(L) is a scalar lag polynomial of order M + Kp − 1, and B(L) is a
(K × K) dimensional lag polynomial of order M + Kp − 2.
y_t − μ_y = [ M̄  J ] x_t   with   x_t = [ ζ_t ; Z_t ],

x_t = [ 𝓕  0 ; 0  A ] x_{t−1} + [ v_t ; U_t ],

satisfies obviously the conditions of Lemma 1. Therefore, we have the final equation
form of a VARMA(M + Kp − 1, M + Kp − 2) model:
order of |𝓕(L)|, M − 1, plus the order of |A(L)|, Kp, while the order of the matrix
MA polynomial in the lag operator equals
In general, it is not ensured that the relations between the order of the MS-VAR
model and the VARMA representation given in Propositions 3 and 4 hold with equal-
ity. In exceptional cases, where the regularity conditions are not satisfied, they give just
upper bounds for the orders of the VARMA representation.
This section illustrates the correspondence between VARMA models and MS-
VAR processes by deriving the autocovariance function of MSI-VAR and MSM-
VAR processes. The autocovariance function (ACF) provides a way to determine the
parameters of the VARMA representation as functions of the underlying MS-VAR
parameters.
As we have seen in the preceding sections, the observed process y_t is the sum of
two independent processes μ_t and z_t. Hence, the moments of y_t are determined by
those of the non-linear time series process μ_t and the linear process z_t:
In order to derive some basic results concerning the unrestricted state vector ζ_t, recall
the transition equation of Section 2.3,

ζ_{t+1} = 𝓕 ζ_t + v_{t+1},   (3.11)

where v_{t+1} are the non-Gaussian innovations of the Markov chain. By repeated sub-
stitution of (3.11), the state ζ_t results as a weighted sum of all previous innovations
v_{t−j}, j ≥ 0:

ζ_t = Σ_{j=0}^{∞} 𝓕^j v_{t−j}.   (3.13)
For a deterministic or a stochastic initial state ζ_0 whose distribution is not identical
with the steady-state distribution derived above, ζ_0 ≠ 0, the resulting sys-
tem represents neither a mean- nor a variance-stationary process as long as 𝓕 ≠ 0.
3.3. The Autocovariance Function of MSI-VAR and MSM-VAR Processes
However, if all eigenvalues of 𝓕 are less than unity in absolute value, 𝓕^t → 0 for
t → ∞, and the influence of an initial state ζ_0 disappears asymptotically. Analog-
ously, the responses of the system to past impulses diminish. In Markov chain mod-
els, the assumptions of ergodicity and irreducibility guarantee that all eigenvalues of
𝓕 are less than unity in absolute value, and that the innovation process v_t is station-
ary. Hence {ζ_t} is asymptotically stationary.
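This can be illustrated numerically: for an ergodic chain the reduced matrix 𝓕, formed from the differences p_{ji} − p_{Mi} as displayed in Section 2.3, has spectral radius below one and its powers vanish (the transition matrix below is an arbitrary illustration):

```python
import numpy as np

# Numerical check for an illustrative ergodic 3-state chain: the reduced
# ((M-1) x (M-1)) matrix curly-F has all eigenvalues strictly inside the
# unit circle, so curly-F^t -> 0 as t grows.
P = np.array([[0.85, 0.10, 0.05],
              [0.05, 0.80, 0.15],
              [0.10, 0.25, 0.65]])
M = P.shape[0]

# element (i, j) of curly-F is p_{ji} - p_{Mi} (here with 0-based indices):
Fz = np.array([[P[j, i] - P[M - 1, i] for j in range(M - 1)]
               for i in range(M - 1)])
eig = np.linalg.eigvals(Fz)
print("spectral radius of curly-F:", np.abs(eig).max())
print("curly-F^50:\n", np.linalg.matrix_power(Fz, 50))
```

A useful side observation: the eigenvalues of 𝓕 coincide with the non-unit eigenvalues of P, which is how the reduction eliminates the unit root of the chain.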
If the process has an infinite history (or a stochastic initial state ζ_0 with ζ_{0|0} = 0),
the first and second moments of ζ_t are determined by

E[ζ_t] = 0,   (3.16)

Σ_ζ := Var(ζ_t) = [ ξ̄_1 (1 − ξ̄_1)   ⋯   −ξ̄_1 ξ̄_{M−1}
                    ⋮                     ⋮
                    −ξ̄_{M−1} ξ̄_1   ⋯   ξ̄_{M−1} (1 − ξ̄_{M−1}) ].
y_t − μ_y = M̄ ζ_t + u_t,   (3.19)

where u_t ~ NID(0, Σ_u); the mean μ_y = M ξ̄ of the observed time series y_t is
determined by the ergodic probabilities ξ̄. Using the independence of the innovations
u_t and v_t, the variance of y_t is seen to be

Var(y_t) = M̄ Σ_ζ M̄′ + Σ_u,   (3.20)

and the autocovariances for h > 0 are

Γ_y(h) = M̄ 𝓕^h Σ_ζ M̄′.   (3.21)
Since the hidden Markov chain model exhibits no autoregressive structure, the serial
dependence of the regimes is the only source of autocorrelation in the data. This is
illustrated in the following example:
Γ_y(0) = (μ_1 − μ_2)² ξ̄_1 (1 − ξ̄_1) + σ²,
Γ_y(h) = (p_11 + p_22 − 1)^h (μ_1 − μ_2)² ξ̄_1 (1 − ξ̄_1),   h ≥ 1.

With Γ_y(h) = (p_11 + p_22 − 1) Γ_y(h − 1) for h > 1, the ACF of the MSI(2)-AR(0)
process corresponds to the ACF of an ARMA(1, 1) model.
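The ARMA(1,1)-type geometric decay is easy to verify from the closed-form ACF of a two-state hidden Markov chain, Γ_y(0) = (μ_1 − μ_2)² ξ̄_1(1 − ξ̄_1) + σ² and Γ_y(h) = ρ^h (μ_1 − μ_2)² ξ̄_1(1 − ξ̄_1) with ρ = p_11 + p_22 − 1 (parameter values below are illustrative):

```python
import numpy as np

# Closed-form ACF of an illustrative MSI(2)-AR(0) process and its
# geometric decay at rate rho = p11 + p22 - 1.
p11, p22 = 0.9, 0.8
mu = np.array([-1.0, 1.0])
sigma = 0.5

xi1 = (1 - p22) / (2 - p11 - p22)   # ergodic probability of regime 1
var_mu = (mu[0] - mu[1]) ** 2 * xi1 * (1 - xi1)
rho = p11 + p22 - 1

gamma = lambda h: var_mu * rho ** h if h > 0 else var_mu + sigma ** 2
print([round(gamma(h), 4) for h in range(4)])
print("decay ratio Gamma(2)/Gamma(1):", gamma(2) / gamma(1))
```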
While the ACF of the hidden Markov chain is exclusively determined by the Markov
chain, MSM(M)-VAR(p) processes exhibit more complex dynamics. From (3.2),
y_t − μ_y = (μ_2 − μ_1) ζ_t + z_t,   (3.22)

z_t = Σ_{j=0}^{∞} α_1^j u_{t−j},   (3.23)

where u_t ~ NID(0, σ_u²). The unrestricted regime process ζ_t possesses the usual AR(1)
representation

ζ_t = (p_11 + p_22 − 1) ζ_{t−1} + v_t,   (3.24)
where v_t is a non-Gaussian white noise process. From equations (3.23) and (3.24)
it follows that the ACF is given by
for h ≥ 0 and Γ_y(h) = Γ_y(−h) for h < 0. Under the regularity conditions α_1 ≠ 0,
(p_11 + p_22 − 1) ≠ 0 and α_1 ≠ (p_11 + p_22 − 1), equation (3.25) corresponds to the
ACF of an ARMA(2, 1) model.
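Because the ACF of the MSM(2)-AR(1) process is a sum of two geometric sequences with rates α_1 and ρ = p_11 + p_22 − 1, it satisfies the homogeneous second-order recursion of an ARMA(2,1) model for h ≥ 2. A quick numeric check (illustrative parameters; the closed-form variances used are stated in the comments):

```python
import numpy as np

# The ACF Gamma(h) = var_mu * rho^h + var_z * alpha1^h (h >= 0) satisfies
# Gamma(h) = (alpha1 + rho) Gamma(h-1) - alpha1 * rho * Gamma(h-2), h >= 2,
# since both geometric components solve this recursion.
alpha1, p11, p22 = 0.5, 0.9, 0.8
mu1, mu2, sig2_u = -1.0, 1.0, 1.0
rho = p11 + p22 - 1
xi1 = (1 - p22) / (2 - p11 - p22)

var_mu = (mu1 - mu2) ** 2 * xi1 * (1 - xi1)   # variance of the regime mean
var_z = sig2_u / (1 - alpha1 ** 2)            # variance of the AR(1) part

gamma = lambda h: var_mu * rho ** h + var_z * alpha1 ** h
residual = [gamma(h) - (alpha1 + rho) * gamma(h - 1)
            + alpha1 * rho * gamma(h - 2) for h in range(2, 10)]
print("max recursion residual:", max(abs(r) for r in residual))
```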
The ACF of an MSI(M)-VAR(p) process (3.3) can be traced back to the ACF of
a hidden Markov chain x_t on which the linear filter A(L) = I_K − Σ_{j=1}^p A_j L^j is
applied,

A(L) (y_t − μ_y) = x_t − μ_x,   x_t = M ξ_t + u_t,

where the mean μ_y = (I_K − Σ_{j=1}^p A_j)⁻¹ ν̄ of the observed time series y_t is de-
termined by the ergodic probabilities ξ̄. Note that the ACF of x_t is given by (3.21),
such that

Γ_x(h) = { M̄ Σ_ζ M̄′ + Σ_u   for h = 0,
           M̄ 𝓕^h Σ_ζ M̄′   for h > 0, }   (3.26)
and Γ_x(h) = Γ_x(−h) for h < 0. Furthermore, the covariances of y_t and x_t
are given by E[x_t (y_t − μ_y)′] = M̄ E[ζ_t y_t′] + Σ_u and E[x_t (y_{t−h} − μ_y)′] =
M̄ 𝓕^h E[ζ_{t−h} y_{t−h}′] for h > 0, where we have used that

( E[ζ_t y_t′] − Σ_{j=1}^p 𝓕^j E[ζ_t y_t′] A_j′ ) = Σ_ζ M̄′.

Hence the ACF of an MSM(M)-VAR(p) process is determined by

Γ_y(h) − Σ_{j=1}^p A_j Γ_y(h − j) = { M̄ E[ζ_t y_t′] + Σ_u   for h = 0,
                                      M̄ 𝓕^h E[ζ_t y_t′]   for h > 0. }   (3.27)
Then, the ACF is determined by the inhomogeneous system of linear difference equa-
tions,
where Γ_y(h) = Γ_y(−h) for h < 0 and σ_v² = (1 − f²) ξ̄_1 (1 − ξ̄_1), f = p_11 + p_22 − 1. Thus the ACF of
an MSI(2)-AR(1) process can be calculated recursively for h > 1,
which corresponds to the ACF of an ARMA(2, 1) model, as stated in Pro-
position 4.
64 VARMA-Representation of MSI- VAR and MSM- VAR Processes
[Table 3.1: ARMA(p*, q*) representations of MSI(M)-AR(p) and MSM(M)-AR(p) processes]
3.4 Outlook
For the hidden Markov chain model, POSKITT & CHUNG [1994] provide consistent
statistical procedures for identifying the state dimension of the Markov chain based
on linear least-squares estimation. In Section 7.2 we propose for MSM(M)-AR(p)
and MSI(M)-AR(p) models a specification procedure based on an ARMA(p*, q*)
representation which is closely related to POSKITT & CHUNG. An overview is given
in Table 3.1 for univariate ARMA processes.
The class of models considered in this chapter is restrictive in the sense that the or-
der of the AR polynomial cannot be less than the MA order (under regularity con-
ditions). In order to generate ARMA(p*, q*) representations where p* < q* holds,
it would be necessary to introduce MS(M)-ARMA(p, q) models, which are compu-
tationally unattractive, or to use the approach introduced in Section 10.2. There we
generalize the MSI(M)-VAR(p) model to an MSI(M, q)-VAR(p) model character-
ized by an intercept term which depends not only on the actual regime s_t, but is
also conditioned on the last q regimes.
Chapter 4
One major objective of time series analysis is the creation of suitable models for pre-
diction. It is convenient to choose the optimal predictor ŷ_{t+h|t} in the sense of a min-
imizer of the mean squared prediction error (MSPE),

(4.1)

It is then quite standard (see e.g. LÜTKEPOHL [1991, Section 2.2]) that the op-
timal predictor ŷ_{t+h|t} is given by the conditional mean for a given information
set:

(4.2)
In contrast to linear models, the MSPE-optimal predictor ŷ_{t+h|t} usually does not
have the property of being a linear predictor if the true data-generating process is
non-linear. In general, the derivation of the optimal predictor may be quite complic-
ated in empirical work. An attractive feature of the MS-VAR model as a class of
non-linear models is the simplicity of forecasting if the optimal predictor (4.2) is ap-
plied. In the following section, the optimal predictor of MS-VAR processes is de-
rived. The properties of this predictor are shown for the MSM-VAR model in Sec-
tion 4.2 and for the MSI-VAR models in Section 4.3. Then, problems which arise
with MSA-VAR models are discussed and an approximating technique to overcome
these problems is introduced. Finally, the forecasting facilities proposed for MS-
VAR processes are compared with the properties of forecasting with Gaussian VAR
models in Section 4.5.
Forecasting MS-VAR Processes
y_t = X_t B ξ_t + u_t,
ξ_{t+1} − ξ̄ = F (ξ_t − ξ̄) + v_{t+1},

where the assumptions (A1)-(A6) made in Table 2.1 apply. Thus, X_t = (1, Y_{t−1}′) ⊗
I_K with Y_{t−1} = (y_{t−1}′, …, y_{t−p}′)′. In MSI specifications (cf. Table 2.3), the matrix
B contains the parameter vectors β_m associated with regime m = 1, …, M,
with intercept terms ν_m and the autoregressive parameters α_m = vec(A_{1m}, …, A_{pm}).
As also stated in Chapter 2, in MSM specifications the regime vector ξ_t is
N = M^{p+1} dimensional, so that B is a ([K(Kp + 1)] × N) matrix with
(4.3)

where we have used the unpredictability of the innovation process u_t, i.e.
E[u_{t+1} | Y_t, ξ_{t+1}] = 0. Thus, in the case of anticipation of regime m, the optimal
predictor would be X_{t+1} β_m.
In practice these assumptions have to be relaxed. For example, the unknown para-
meter matrix B might be replaced by the ML estimator, which is asymptotically un-
biased. Having forecasts for the predetermined variables, the major task is to forecast
the evolution of the Markov chain. As discussed in Section 2.4, this prediction can
be derived from the transition equation (2.2) as

ξ̂_{t+1|t} = F ξ̂_{t|t}.   (4.4)
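Iterating (4.4) gives the h-step regime forecast ξ̂_{t+h|t} = F^h ξ̂_{t|t}, which converges to the ergodic distribution as h grows. A minimal sketch (two-state chain with illustrative probabilities):

```python
import numpy as np

# h-step regime forecasts xi_{t+h|t} = F^h xi_{t|t} for an illustrative
# two-state chain; they converge to the ergodic distribution.
P = np.array([[0.95, 0.05],
              [0.20, 0.80]])
F = P.T
xi_tt = np.array([1.0, 0.0])        # suppose regime 1 is inferred at time t

horizons = (1, 4, 16, 64)
preds = [np.linalg.matrix_power(F, h) @ xi_tt for h in horizons]
for h, xp in zip(horizons, preds):
    print(f"xi_(t+{h}|t) =", xp)

# ergodic limit: normalized Perron eigenvector of F = P'
vals, vecs = np.linalg.eig(F)
xi_bar = np.real(vecs[:, np.argmax(np.real(vals))])
xi_bar /= xi_bar.sum()
print("ergodic:", xi_bar)
```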
4.1. MSPE-Optimal Predictors
Since v_{t+1} in the general linear MS regression model is non-normal, the inferences
ξ̂_{t|t} and ξ̂_{t+1|t} depend on the information set Y_t in a non-linear fashion. Thus, in
contrast to Gaussian state-space models, the one-step prediction ŷ_{t+1|t} cannot be
interpreted as a linear projection. Inserting the forecast of the hidden Markov chain
(4.4) into equation (4.3) yields the one-step predictor ŷ_{t+1|t}:
Starting with the one-step prediction formula (4.5), general predictions can be de-
rived iteratively as long as the elements of X_{t+h} are uncorrelated with the parameter
vector β_{t+h} = B ξ_{t+h}.
In our time series framework, it is crucial whether equation (4.6) holds true if lagged
endogenous variables are included in the regressor matrix X_{t+j}. In MSA-VAR mod-
els, the correlation of the lagged endogenous variables contained in X_t with the re-
gime vector ξ_t may give rise to a problem which is unknown in VAR models with
deterministically varying parameters.1 In general, equation (4.6) does not hold if X_t
contains lagged endogenous variables,

(4.7)
This problem does not occur in models with time-invariant autoregressive parameters
and constant transition probabilities, which can be represented as

(4.8)

where H = (ν_1, …, ν_M) in MSI models and H is a function of μ and
α = vec(A_1, …, A_p) in MSM models.2
Equation (4.8) implies that the lagged endogenous regressors in X_{t+h} and the regime
vector ξ_{t+h} enter the system additively. Hence, the regressors in X_{t+h} and the parameter vector β_{t+h} = B ξ_{t+h} are independently distributed, E[X_{t+h} β_{t+h}|Y_T] =
E[X_{t+h}|Y_T] E[β_{t+h}|Y_T]. The optimal forecast of y_{t+h} is given by equation (4.6),
and
Thus, primary attention is given to MSM and MSI processes. Since we are only inter-
ested here in the optimal predictor, it is not necessary to distinguish between models
with or without heteroskedasticity. Consider first a subclass of processes for which
a computationally effective algorithm can easily be constructed.
y_t = M ξ_t^{(1)} + J z_t,   (4.10)

where J = [I_K  0  ⋯  0] is a (K × pK) matrix,

z_t = ( (y_t − M ξ_t^{(1)})', ..., (y_{t−p+1} − M ξ_{t−p+1}^{(1)})' )',   U_{t+1} = (u_{t+1}', 0', ..., 0')',

and

A = [ A_1  A_2  ⋯  A_{p−1}  A_p
      I_K  0    ⋯  0        0
      ⋮              ⋱       ⋮
      0    0    ⋯  I_K      0 ]

is a (Kp × Kp) matrix.
Hence the problem of calculating the conditional expectation of y_{t+h} can be reduced
to the predictions of the Markovian and the Gaussian components of the state vector
(z_t', ξ_t^{(1)'})':
(4.13)
By using the law of iterated predictions, we first derive the forecasts of ξ^{(1)}_{t+h} conditional on ξ_t^{(1)} and of z_{t+h} conditional on z_t, respectively. Applying the expectation operator to (4.11)
and (4.12) yields
(4.14)
(4.15)
Then, the expectation operator is again applied to the just derived expressions, but
now conditional on the sample information Y_t:
(4.17)
(4.18)
where we have used the mean of the observed time series given by μ_y = M ξ̄^{(1)}. The
reconstructed Gaussian component ẑ_{t|t} is delivered as a by-product of the filtering
procedures (see Chapter 5) for the regime vector, ξ̂_{t|t}:
(4.19)
It needs no further clarification to verify that the forecasts of y_{t+h} converge to the
unconditional mean of y_t if the eigenvalues of F and A are inside the unit circle.
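The forecast rule (4.17)–(4.19) can be sketched in code. The example below is a hypothetical univariate MSM model with p = 1 (so that J = [I_K 0 ⋯ 0] reduces to the identity); all numbers are illustrative, not taken from the text:

```python
import numpy as np

def msm_forecast(mu, F, A, xi_tt, z_tt, h):
    """y_{t+h|t} = M xi_{t+h|t} + J A^h z_{t|t} (cf. (4.16)-(4.19)).

    mu: (K x M) matrix of regime-dependent means, F: (M x M) transition
    matrix with columns summing to one, A: (Kp x Kp) companion matrix,
    z_tt: reconstructed Gaussian component delivered by the filter.
    """
    xi_h = np.linalg.matrix_power(F, h) @ xi_tt   # regime forecast (4.17)
    z_h = np.linalg.matrix_power(A, h) @ z_tt     # Gaussian component (4.16)
    K = mu.shape[0]
    J = np.eye(K, A.shape[0])                     # J = [I_K 0 ... 0]
    return mu @ xi_h + J @ z_h

mu = np.array([[-1.0, 1.0]])                  # regime means (K = 1, M = 2)
F = np.array([[0.9, 0.2], [0.1, 0.8]])
A = np.array([[0.5]])                         # stable AR coefficient
y_hat = msm_forecast(mu, F, A, np.array([1.0, 0.0]), np.array([0.3]), 1)
```

With all eigenvalues of F and A inside the unit circle, the forecast converges for large h to the unconditional mean μ_y = M ξ̄.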
In contrast to Gaussian VAR models, where interval forecasts and forecast regions
can be derived on the basis of the conditional mean ŷ_{t+h|t} and the h-step MSPE
matrix Σ_{t+h|t} = E[(y_{t+h} − ŷ_{t+h|t})(y_{t+h} − ŷ_{t+h|t})'|Y_t], the conditional first
and second moments are not sufficient to determine the conditional distribution of
y_{t+h}|Y_t, which is a mixture of normals, e.g. for h = 1,

p(y_{t+1}|Y_t) = Σ_{m=1}^{N} Pr(ξ^{(1)}_{t+1} = ι_m | Y_t) |Σ_m|^{−1/2} φ( Σ_m^{−1/2} (y_{t+1} − ŷ_{m,t+1}) ),

where N = M^{p+1} and φ(·) is the probability density function of a K-dimensional
vector of standard normals. Although the preceding calculations have been straightforward, in practice it is rather complicated to construct interval forecasts analytically.
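The one-step predictive density can nevertheless be evaluated numerically on a grid. A univariate (K = 1) sketch with hypothetical regime means, standard deviations and predicted regime probabilities:

```python
import numpy as np

def one_step_density(y, means, sigmas, weights):
    """p(y_{t+1}|Y_t) = sum_m Pr(regime m|Y_t) * N(y; mean_m, sigma_m^2).

    Univariate sketch of the mixture-of-normals predictive density;
    `weights` are the predicted regime probabilities and must sum to one.
    """
    y = np.asarray(y, dtype=float)
    dens = np.zeros_like(y)
    for m, s, w in zip(means, sigmas, weights):
        dens += w * np.exp(-0.5 * ((y - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return dens

grid = np.linspace(-8.0, 8.0, 4001)
pdf = one_step_density(grid, means=[-1.0, 1.0], sigmas=[1.0, 0.5], weights=[0.7, 0.3])
```

Interval forecasts can then be read off by accumulating this density, which is usually easier than an analytical construction; note that the mixture is in general skewed and heteroskedastic across regimes.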
Using again ξ̂^{(1)}_{t+h|t} − ξ̄^{(1)} = F^{h}(ξ̂^{(1)}_{t|t} − ξ̄^{(1)}) as in (4.17) and ẑ_{t+h|t} = A^{h} ẑ_{t|t} as in
(4.16) yields:

y_{t+h} − ŷ_{t+h|t} = M Σ_{i=1}^{h} F^{h−i} v_{t+i} + J Σ_{i=1}^{h} A^{h−i} U_{t+i} + M F^{h} (ξ^{(1)}_{t} − ξ̂^{(1)}_{t|t})   (4.20)
    + J A^{h} (I_p ⊗ M) ( (ξ^{(1)}_{t} − ξ̂^{(1)}_{t|t})', ..., (ξ^{(1)}_{t−p+1} − ξ̂^{(1)}_{t−p+1|t})' )'.
The last two terms stem from regime classification errors and might be called filter uncertainty. If parameters have to be estimated, as is usually the case in practice, another term enters due to parameter uncertainty.
For the MSI model, MSPE-optimal forecasts can be derived by applying the conditional expectation to the measurement equation, where the lagged endogenous variables Y_{t−1} and the regime vector ξ_t enter additively as in (3.7):
y_t = H ξ_t + J Ā Y_{t−1} + u_t,   (4.22)

where H = (ν_1, ..., ν_M). To derive a closed-form solution, the system is stacked as

Y_t = H̄ ξ_t + Ā Y_{t−1} + Ū_t,

where

Ā = [ A_1  A_2  ⋯  A_{p−1}  A_p
      I_K  0    ⋯  0        0
      ⋮              ⋱       ⋮
      0    0    ⋯  I_K      0 ]

is a (Kp × Kp) matrix, H̄ = ι_{p,1} ⊗ H is a (Kp × M) matrix, and J = [I_K 0 ⋯ 0] = ι'_{p,1} ⊗ I_K is a (K × Kp)
matrix.
Thus, we get the optimal predictors by solving the following linear difference equation system. In contrast to linear VAR(p) models, the optimal predictor ŷ_{t+h|t} depends not only on
the last p observations Y_t, but is based on the full-sample information Y_t through ξ̂_{t|t}:

Ŷ_{t+h|t} = Σ_{i=0}^{h−1} Ā^{i} H̄ ξ̂_{t+h−i|t} + Ā^{h} Y_t.   (4.25)
Although the optimal predictor is linear in the regime inference ξ̂_{t|t} and the last p observations Y_t, ŷ_{t+h|t} is a non-linear function of the observed Y_t, as the inference ξ̂_{t|t} depends on Y_t in a non-linear fashion.
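The recursion behind (4.25) can be sketched by iterating the difference equation Y_{t+j|t} = H̄ ξ̂_{t+j|t} + Ā Y_{t+j−1|t}. The numbers below describe a hypothetical MSI(2)-VAR(1) model with K = 1:

```python
import numpy as np

def msi_forecast(H, A, F, xi_tt, Y_t, h):
    """Iterate Y_{t+j|t} = H xi_{t+j|t} + A Y_{t+j-1|t}, the difference
    equation solved by (4.25). F[i, j] = Pr(i|j), columns sum to one."""
    Y = np.asarray(Y_t, dtype=float)
    xi = np.asarray(xi_tt, dtype=float)
    for _ in range(h):
        xi = F @ xi           # regime prediction as in (4.4)
        Y = H @ xi + A @ Y    # measurement-equation forecast
    return Y

H = np.array([[0.0, 2.0]])                    # regime-dependent intercepts
A = np.array([[0.5]])
F = np.array([[0.9, 0.2], [0.1, 0.8]])
y_next = msi_forecast(H, A, F, np.array([1.0, 0.0]), np.array([4.0]), 1)
```

The mapping from (ξ̂_{t|t}, Y_t) to the forecast is linear, but the filter output ξ̂_{t|t} is itself a non-linear function of the data, which reproduces the point made above.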
While the absence of restrictions on the parameter matrix B can simplify estima-
tion (as will be shown in Chapter 9 for the MSIAH-VAR model), concerning fore-
casts, the situation worsens if the autoregressive parameters are allowed to be regime
dependent. For MSIAH-VAR processes, the observed variable Yt can no longer be
(4.27)
But this is only due to the fact that X_{t+1} is deterministic given Y_t, E[X_{t+1}|Y_t, ξ_t] =
X_{t+1}, while in general E[X_{t+j}|Y_t, ξ_t] ≠ E[X_{t+j}|Y_t].³ The crucial point concerning
the MSA-VAR model is that y_t is a non-linear function of the regimes {ξ_{t−i}}_{i≥0}.
This is due to the (Y'_{t−1} ⊗ I_K) α_t term, where the autoregressive parameter vector
α_t = [α_1, ..., α_M] ξ_t and the lagged endogenous variables y_{t−j} enter, which is obviously a non-linear interaction.
A two-step prediction for example would involve the following conditional expect-
ations:
E[y_{t+2}|Y_t] = E[ν_{t+2}|Y_t] + E[A_{t+2} y_{t+1}|Y_t]
             = E[ν_{t+2}|Y_t] + E[A_{t+2}(ν_{t+1} + A_{t+1} y_t)|Y_t]
             = E[ν_{t+2}|Y_t] + E[A_{t+2} ν_{t+1}|Y_t] + E[A_{t+2} A_{t+1}|Y_t] y_t,

where A = diag(A_1, ..., A_M) is an (MK × MK) matrix, J = ι_M ⊗ I_K is an (MK × K) matrix, and Ξ_t = diag(ξ_t) ⊗ I_K is an (MK × MK) matrix,
³ Similar problems arise if the transition probabilities are time-varying, such that F varies stochastically.
such that A_t = J' Ξ_t A J = J' A Ξ_t J and ν_t = J' Ξ_t ν, with ν = (ν'_1, ..., ν'_M)'.
It can be easily verified that

E[y_{t+h}|Y_t] = ∫_{y_{t+h}} ∫_{Y_{t+1,t+h−1}} p(y_{t+h}, Y_{t+1,t+h−1}|Y_t) y_{t+h} dy_{t+h} dY_{t+1,t+h−1}
             = ∫_{y_{t+h}} ∫_{Y_{t+1,t+h−1}} p(y_{t+h}|Y_{t+1,t+h−1}, Y_t) p(Y_{t+1,t+h−1}|Y_t) y_{t+h} dy_{t+h} dY_{t+1,t+h−1}

differs from

E[y_{t+h}|Ŷ_{t+1,t+h−1|t}, Y_t] = ∫_{y_{t+h}} y_{t+h} p(y_{t+h}|Ŷ_{t+1,t+h−1|t}, Y_t) dy_{t+h},

which is obtained from (4.7) and which is not the optimal predictor E[y_{t+h}|Y_t] as in
the MSM-VAR and MSI-VAR model.
In practice, the parameters are unknown and have to be estimated. Hence, the
usual procedure of substituting the unknown parameters by their estimates, which
are a non-linear function of the observed past values, is itself only an approximation.
Therefore, the predictor given in (4.29) might be justified for the same reasons.
In this chapter we have investigated the effects of the non-normality and the non-linearity of the MS-VAR model on forecasting. It has been shown that:
(i.) the optimal predictor of MSM(M)-VAR(p) and MSI(M)-VAR(p) models is linear in the last p observations and the regime inference, but there exists no
purely linear representation of the optimal predictor in the information set.
The results could be compared with forecasts based on the VARMA representation of these processes.
(iii.) The predicted probability densities are non-normal and thus in general neither
symmetric, homoskedastic, nor regime invariant.
Before we consider this simulation technique, which invokes Bayesian theory, the
filtering techniques delivering the statistical inference about the regimes and the classical method of maximum likelihood estimation in the context of this non-linear time
series model are presented in the following chapters.
Chapter 5

The BLHK Filter
An important task associated with the statistical analysis of MS-VAR models is dis-
cussed in this chapter: the filtering and smoothing of regime probabilities. In the
MS-VAR model the state vector ~t is given a structural interpretation. Thus an infer-
ence on this unobserved variable is of interest for its own sake. However, the filtered
and smoothed state probabilities provide not only information about the regime at
time t, but also open the way for the computation of the likelihood function and consequently for maximum likelihood estimation and likelihood ratio tests.
The discrete support of the state in the MS-VAR model allows us to derive the complete conditional distribution of the unobservable state variable instead of only
the first two moments, as in the Kalman filter (cf. KALMAN [1960], [1963] and KALMAN [1961]) for normal linear state-space models, or the grid approximation suggested by KITAGAWA [1987] for non-linear, non-normal state-space models.
In their recent form, the filtering and smoothing algorithms for time series models
with Markov-switching regimes are closely related to HAMILTON [1988], [1989],
[1994a], building upon ideas of COSSLETT & LEE [1985]. The basic filtering and
smoothing recursions have been introduced by BAUM et al. [1970] for the reconstruction of the hidden Markov chain. Their algorithms have been applied by LINDGREN [1978] to regression models with Markovian regime switches. A major improvement of the smoother has been provided by the backward recursions of KIM
[1994]. For these reasons, the recursive filter and smoother for MS-VAR models is
termed in the following the Baum-Lindgren-Hamilton-Kim (BLHK) filter
and smoother. However, this name should not diminish the contributions of other
researchers to the development of related methods; for example, the basic filtering
formula has been derived independently by TJØSTHEIM [1986b] for doubly stochastic processes with a Markov chain as the exogenous process governing the parameter shifts.
The aim of this chapter is to present and evaluate the algorithms proposed in the literature in the context of our settings and to discuss their implications for the following analyses. In Section 5.1, algorithms to derive the filtered regime probabilities ξ̂_{t|t}
are presented. Smoothing algorithms delivering the full-sample conditioned regime
probabilities, ξ̂_{t|T}, are considered in Section 5.2. This will be done under the assumption that the parameter vector λ is known. In practice, λ is usually unknown
and has to be estimated with the methods to be described in Chapter 6. Some related
technical remarks close the discussion.
5.1 Filtering
y_t = X_t B ξ_t + u_t,
ξ_{t+1} = F ξ_t + v_{t+1}.
Note that the (N × 1) regime vector is M-dimensional for MSI specifications, while
we consider for MSM specifications the stacked regime vector collecting the information about the last p + 1 regime realizations, N = M^{p+1}.
By assuming that all parameters of the model are known, the discrete-state algorithm
under consideration summarizes the conditional probability distribution of the state
vector ξ_{t+1} by
Since each component of ξ_{t+1} is a binary variable, ξ̂_{t+1|t} possesses not only the
interpretation as the conditional mean, which is the best prediction of ξ_{t+1} given Y_t,
but the vector ξ̂_{t+1|t} also presents the conditional probability distribution of ξ_{t+1}.
Analogously, the filtered inference ξ̂_{t|t} on the current state vector based only on currently available data is defined as:
The filtering algorithm computes ξ̂_{t|t} by deriving the joint probability density of ξ_t
and y_t conditioned on observations Y_{t−1}.
By invoking Bayes' law, the posterior probabilities Pr(ξ_t|y_t, Y_{t−1}) are given
by
Note that the summation involves all possible values of ξ_t and ξ_{t−1}.
η_t = [ p(y_t|θ_1, Y_{t−1}), ..., p(y_t|θ_N, Y_{t−1}) ]',

where θ has been dropped on the right-hand side to avoid unnecessary notation, such
that the density of y_t conditional on Y_{t−1} is given by p(y_t|Y_{t−1}) = η'_t ξ̂_{t|t−1} =
1'_N (η_t ⊙ ξ̂_{t|t−1}).
ξ̂_{t|t} = (η_t ⊙ ξ̂_{t|t−1}) / 1'_N (η_t ⊙ ξ̂_{t|t−1}).   (5.4)
Consider, for example, an MS(2)-VAR(p) model with Gaussian white noise: equation (5.4) traces the ratio of posterior regime probabilities ξ̂_{1t|t}/ξ̂_{2t|t}
back to the conditional likelihood ratio η_{1t}/η_{2t} and the prior ratio ξ̂_{1t|t−1}/ξ̂_{2t|t−1}. If one denotes ρ =
p_{11} − (1 − p_{22}) and u_{mt} = y_t − X_t β_m, then the filtered probability ξ̂_{1t|t} of regime
1 is found from

ξ̂_{1t|t} / (1 − ξ̂_{1t|t}) = (η_{1t}/η_{2t}) · ( (1 − p_{22}) + ρ ξ̂_{1,t−1|t−1} ) / ( 1 − (1 − p_{22}) − ρ ξ̂_{1,t−1|t−1} ).
The transition equation implies that the vector ξ̂_{t+1|t} of predicted probabilities is a
linear function of the filtered probabilities ξ̂_{t|t}:

ξ̂_{t+1|t} = F ξ̂_{t|t}.   (5.5)

The sequence {ξ̂_{t|t−1}}_{t=1}^{T} can therefore be generated by iterating on (5.4) and (5.5),
which can be summarized as:

ξ̂_{t+1|t} = F (η_t ⊙ ξ̂_{t|t−1}) / 1'_N (η_t ⊙ ξ̂_{t|t−1}).   (5.6)
In the prevailing Bayesian context, ξ̂_{t|t−1} is the prior distribution of ξ_t. The posterior
distribution ξ̂_{t|t} is calculated by linking the new information y_t with the prior via
Bayes' law. The posterior distribution ξ̂_{t|t} becomes the prior distribution for the next
state ξ_{t+1}, and so on.
The iteration is started by assuming that the initial state vector ξ_0 is drawn from the
stationary unconditional probability distribution of the Markov chain, ξ̂_{1|0} = ξ̄, or by
handling ξ_0 parametrically. In the latter case, ξ̂_{1|0} is an additional parameter vector to be
estimated.
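The recursions (5.4)–(5.6) translate directly into code. The sketch below assumes the regime-conditional densities η_t have already been evaluated (here: hypothetical hard-coded values for two observations and two regimes):

```python
import numpy as np

def blhk_filter(eta, F, xi_init):
    """Filter recursions (5.4)-(5.6).

    eta: (T x N) array, eta[t] = regime-conditional densities p(y_t|xi_t, Y_{t-1});
    F[i, j] = Pr(i|j) with columns summing to one; xi_init = xi_{1|0}.
    Returns the filtered probabilities xi_{t|t} and the log-likelihood,
    which the filter delivers as a by-product (see Chapter 6).
    """
    xi_pred = np.asarray(xi_init, dtype=float)
    filtered, loglik = [], 0.0
    for eta_t in eta:
        joint = eta_t * xi_pred          # eta_t ⊙ xi_{t|t-1}
        denom = joint.sum()              # p(y_t|Y_{t-1})
        loglik += np.log(denom)
        filtered.append(joint / denom)   # (5.4)
        xi_pred = F @ filtered[-1]       # (5.5)
    return np.array(filtered), loglik

eta = np.array([[0.40, 0.10],
                [0.05, 0.30]])
F = np.array([[0.9, 0.2], [0.1, 0.8]])
filt, ll = blhk_filter(eta, F, np.array([0.5, 0.5]))
```

Each pass through the loop is one Bayesian updating step: the prior ξ̂_{t|t−1} is combined with the new observation via η_t and then propagated through the chain.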
Equations (5.4) and (5.6) present a fast algorithm for calculating the filtered regime
probabilities. For analytical purposes, it can be useful to have a final form of ξ̂_{t|t}
which depends only on the observations Y_t and the parameter vector λ. The desired
transformation of equation (5.4) can be achieved as follows: equation (5.4) can be
rewritten as

ξ̂_{t|t} = K_t ξ̂_{t−1|t−1} / 1' K_t ξ̂_{t−1|t−1},   (5.7)
where we have used that 1' ξ̂_{t+1|t} = 1 holds by definition, and that expressions (5.2)
and (5.3) can be collected as

K_t = diag(η_t) F.   (5.8)

Solving the difference equation in {ξ̂_{t|t}}, we get the following final form of ξ̂_{t|t}:

ξ̂_{t|t} = ( ∏_{j=0}^{t−1} K_{t−j} ) ξ_0 / ∏_{j=0}^{t−1} p(y_{t−j}|Y_{t−j−1}) = (1 / p(Y_t|Y_0)) ( ∏_{j=0}^{t−1} K_{t−j} ) ξ_0.   (5.9)
Expression (5.9) verifies that the regime probabilities are linear in the initial state ξ_0,
but non-linear in the observations y_{t−j} entering η_{t−j} and in the remaining parameters.
5.2 Smoothing
The filter recursions deliver estimates for ξ_t, t = 1, ..., T, based on information up
to time point t. This is a limited-information technique, as we have observations up
to t = T. In the following, full-sample information is used to make an inference
about the unobserved regimes by incorporating the previously neglected sample information Y_{t+1,T} = (y'_{t+1}, ..., y'_T)' into the inference about ξ_t. Thus, the smoothing algorithm gives the best estimate of the unobservable state at any point within
the sample.
Different approaches are available to calculate these probabilities, i.e. the smoothed
inference about the state at date t based on data available through some future date
τ > t, where τ := T is considered here exclusively. The algorithm introduced by
HAMILTON [1988], [1989] derives the full-sample smoothed inference ξ̂_{t|T} from the
joint probability distribution of ξ_t and ξ_T conditional on Y_T,
where

Pr(ξ_T, ξ_t|Y_{T−1}) = Σ_{ξ_{T−1}} Pr(ξ_{T−1}, ξ_t|Y_{T−1}) Pr(ξ_T|ξ_{T−1}).
The full-sample smoothed inferences ξ̂_{t|T} can be found by iterating backward from
t = T − 1, ..., 1, starting from the last output of the filter, ξ̂_{T|T}, and using the
identity

Pr(ξ_t|Y_T) = Σ_{ξ_{t+1}} Pr(ξ_t, ξ_{t+1}|Y_T).

For pure VAR models with Markovian parameter shifts, the probability laws for
y_t and ξ_{t+1} depend only on the current state ξ_t and not on the former history of
states. Thus, we have
It is therefore possible to calculate the smoothed probabilities ξ̂_{t|T} by getting the last
term from the previous iteration of the smoothing algorithm, ξ̂_{t+1|T}, while it can be
shown that the first term can be derived from the filtered probabilities ξ̂_{t|t},
(5.12)
If there is no deviation between the full-information estimate, ξ̂_{t+1|T}, and the inference based on the partial information, ξ̂_{t+1|t}, then there is no incentive to update,
ξ̂_{t|T} = ξ̂_{t|t}, and the filtering solution ξ̂_{t|t} cannot be further improved.
In matrix notation, (5.11) and (5.12) can be condensed to

ξ̂_{t|T} = ( F' ( ξ̂_{t+1|T} ⊘ ξ̂_{t+1|t} ) ) ⊙ ξ̂_{t|t},   (5.13)

where ⊙ and ⊘ denote element-wise matrix multiplication and division, respectively. The recursion is initialized with the final filtered probability vector ξ̂_{T|T}. Recursion (5.13)
describes how the additional information Y_{t+1,T} is used in an efficient way to improve the inference on the unobserved state ξ_t. As an illustration of this, consider
i.i.d. switching regimes, where F = ξ̄ 1'. The missing serial dependence of regimes
implies that the observation at time t is a sufficient statistic for a regime inference.
Past observations, ξ̂_{t|t−1} = ξ̄, as well as future observations, ξ̂_{t|T} = ξ̂_{t|t}, are irrelevant.
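The backward recursion (5.13) is a few lines of code once the filtered probabilities are available. The i.i.d. special case just discussed provides a convenient check: with F = ξ̄1' the smoother must leave the filtered probabilities unchanged.

```python
import numpy as np

def kim_smoother(filtered, F):
    """Backward recursion (5.13): xi_{t|T} = (F'(xi_{t+1|T} ⊘ xi_{t+1|t})) ⊙ xi_{t|t}.

    filtered: (T x N) filtered probabilities; F[i, j] = Pr(i|j) with columns
    summing to one. Initialized with the last filter output xi_{T|T}.
    """
    smoothed = np.array(filtered, dtype=float, copy=True)
    T = len(filtered)
    for t in range(T - 2, -1, -1):
        xi_pred = F @ filtered[t]                                  # xi_{t+1|t}, (5.5)
        smoothed[t] = (F.T @ (smoothed[t + 1] / xi_pred)) * filtered[t]
    return smoothed

filt = np.array([[0.8, 0.2], [0.4, 0.6]])       # hypothetical filter output
F_iid = np.outer([2/3, 1/3], [1.0, 1.0])        # i.i.d. regimes: F = xi_bar 1'
sm = kim_smoother(filt, F_iid)                  # equals filt, no updating
```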
The filtering recursion (5.4) and the smoothing recursion (5.13) are the basis for
computationally appealing algorithms which will be used for parameter estimation.
However, for theoretical purposes it is sometimes beneficial to possess a final-form
solution for ξ̂_{t|T}. It can be easily verified that the final-form solution of (5.13) is
identical to FRIEDMANN's [1994] smoothing algorithm.
In (5.12) the ratio of smoothed and filtered regime probabilities at time t has been
traced back in a recursive fashion to the regime inferences for t + 1, Pr(ξ_{t+1}|Y_T)
and Pr(ξ_{t+1}|Y_t). To see the basis for another approach, apply Bayes' law to the
smoothed probability Pr(ξ_t|Y_{t+1,T}, Y_t) to get the identity

Pr(ξ_t|Y_T) = Pr(ξ_t|Y_{t+1,T}, Y_t) = p(Y_{t+1,T}|ξ_t, Y_t) Pr(ξ_t|Y_t) / p(Y_{t+1,T}|Y_t),   (5.14)

where the ratio of smoothed and filtered regime probabilities is reduced to the ratio of the conditional density p(Y_{t+1,T}|ξ_t, Y_t) and the unconditional
density p(Y_{t+1,T}|Y_t) of the new information Y_{t+1,T}.
These densities can be expressed with the help of the matrices K_t as

p(Y_{t+1,T}|ξ_t = ι_m, Y_t) = 1' ( ∏_{j=0}^{T−t−1} K_{T−j} ) ι_m,

p(Y_{t+1,T}|Y_t) = 1' ( ∏_{j=0}^{T−t−1} K_{T−j} ) ξ̂_{t|t},

such that

ξ̂_{mt|T} = ξ̂_{mt|t} · [ 1' ( ∏_{j=0}^{T−t−1} K_{T−j} ) ι_m ] / [ 1' ( ∏_{j=0}^{T−t−1} K_{T−j} ) ξ̂_{t|t} ].

Lastly, the final form for ξ̂_{t|T} follows from the definition of the filtered probabilities according to equation (5.9):
(5.16)

Equation (5.16) represents the smoothed regime probability vector ξ̂_{t|T} as a non-linear function of the past observations Y_t and future observations Y_{t+1,T}; except
for the normalization constant, ξ̂_{t|T} is linear in ξ_0.
A drastic simplification of this final-form solution occurs if the regimes are serially uncorrelated (mixtures-of-normals model). Applying the recursions (5.4) and
(5.13), and being aware of the unpredictability of ξ_t, yields

ξ̂_{t|t−1} = ξ̄,   ξ̂_{t|T} = ξ̂_{t|t}.
5.A Supplements
Some cross products of the smoothed and filtered states might be of interest and can
also be calculated recursively. First, we consider the conditional variance of ξ_t and
the predicted variance of ξ_{t+1} given Y_t:
Obviously, both moments are functions of the inference about the actual regime, ξ̂_{t|t}.
The conditional variance of the parameter vector β_{t+j} and of future values of the observed variable y_{t+j} will therefore depend on ξ̂_{t|t}.
For example, the standard deviation of the filtered regime probability ξ̂_{mt|t} given Y_t
can be calculated as
For the conditional moments of the states given the full-sample information Y_T, the analogous result is:

Cov[ξ_t, ξ_{t+h}|Y_T] = E[ξ_t ξ'_{t+h}|Y_T] − ξ̂_{t|T} ξ̂'_{t+h|T} = Var[ξ_t|Y_T] F^{h'},   h > 0.   (5.21)
where the parameter vector λ is assumed to be known. In practice, when the parameter vector λ is unknown, it is convenient to replace the true parameters with their
estimates. In the next chapter the maximum likelihood estimation is discussed. As
will be shown, the path of smoothed regime probabilities plays a dominant role even
in the solution of the estimation problem.
Pr(s_{t+1}, s_t|Y_T) = Pr(s_{t+1}|s_t) Pr(s_{t+1}|Y_T) Pr(s_t|Y_t) / Pr(s_{t+1}|Y_t).

In vector notation,

ξ̂^{(2)}_{t|T} = vec(P) ⊙ [ ( ξ̂^{(1)}_{t+1|T} ⊘ ξ̂^{(1)}_{t+1|t} ) ⊗ ξ̂^{(1)}_{t|t} ],
 (M² × 1)            (M × 1)       (M × 1)        (M × 1)

where the filtered and smoothed probabilities ξ̂_{t|s} ≡ ξ̂^{(1)}_{t|s} are obtained by
the procedures (5.4) and (5.13).
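In code, the smoothed bivariate regime probabilities follow directly from this formula. Below, F[i, j] = Pr(i|j), so F corresponds to P'; all probability vectors are hypothetical:

```python
import numpy as np

def smoothed_joint(F, xi_filt, xi_pred_next, xi_smooth_next):
    """Pr(s_{t+1}=i, s_t=j|Y_T) = Pr(i|j) Pr(s_{t+1}=i|Y_T) Pr(s_t=j|Y_t) / Pr(s_{t+1}=i|Y_t).

    Returns an (M x M) matrix; entry [i, j] is the smoothed joint probability.
    """
    return F * np.outer(xi_smooth_next / xi_pred_next, xi_filt)

F = np.array([[0.9, 0.2], [0.1, 0.8]])
xi_filt = np.array([0.8, 0.2])                 # xi_{t|t}
xi_pred_next = F @ xi_filt                     # xi_{t+1|t}
joint = smoothed_joint(F, xi_filt, xi_pred_next, np.array([0.7, 0.3]))
```

By construction, summing the joint matrix over the time-t regime recovers the smoothed marginal for t + 1.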
Chapter 6

Maximum Likelihood Estimation
In the last chapter, attention was given to the determination of the state vector ξ for
given observations Y and known parameters λ. In this chapter the maximum likelihood estimation of the parameters λ = (θ', p', ξ'_0)' of an MS-VAR model is considered. The aim of this chapter is (i.) to provide the reader with an introduction to
the methodological issues of ML estimation of MS-VAR models in general, (ii.) to
propose with the EM algorithm an estimation technique for all discussed types of
MS-VAR models, (iii.) to inform the reader about alternative techniques which
can be used for special purposes or model extensions, and (iv.) to give some basic
asymptotic results.
Thus, this chapter is partly a survey, partly an interpretation, and partly a new contribution; preliminaries for the ML estimation are considered in the following two
sections. Section 6.1 gives three alternative approaches to formulating the likelihood
function of MS-VAR models which, as will be seen, have turned out to be useful.
Section 6.2 discusses the identifiability of MS(M)-VAR(p) models. An identifiability result for hidden Markov-chain models provided by LEROUX [1992] is extended to our augmented setting. In Section 6.3 the normal equations of ML estimation of MS-VAR models are derived. At the center of interest is the EM algorithm
which has been suggested by HAMILTON [1990] for the statistical analysis of time
series subject to changes in regime. In the literature, the regressions involved in
the EM algorithm are developed only for vector systems without autoregressive dynamics. We analyze the critical points; in particular, we relax the limitation in the
literature to MSI(M)-VAR(0) models, thus allowing the estimation of genuine vector autoregressive models. It is shown that the implementation of the EM algorithm
for MS(M)-VAR(p) models causes some problems. Therefore the discussion is restricted to an MS regression model, but one which captures all MSI specifications.
A concrete discussion of the ML estimation of the various model types via the EM algorithm is left for Chapter 9. Extensions and alternatives which have been proposed
in the literature are considered in Section 6.5. In the closing Section 6.6, the asymptotic properties of the ML estimation of MS-VAR models are discussed; in particular,
procedures for the estimation of the variance-covariance matrix of the ML estimates
are suggested.
Q(θ, p, ξ_0) = ∏_{t=1}^{T} η_t(θ)' ξ_{t|0}(p, ξ_0).   (6.1)
By using this function of prior regime probabilities ξ_{t|0}(p, ξ_0), which can be approximated by the ergodic probabilities ξ̄(p) for sufficiently large t, instead of the "posterior" inference ξ̂_{t|t−1}, GOLDFELD & QUANDT are not required to provide filtering
procedures to reconstruct the time path of regimes. The model's parameters are estimated by numerical methods.
Unfortunately, the function Q(θ, p, ξ_0) is not the likelihood function, as pointed out
by COSSLETT & LEE [1985]. However, equipped with the results of Chapter 5, it is
possible to derive the likelihood function as a by-product of the BLHK filter:
L(λ|Y) := p(Y_T|Y_0; λ) = ∏_{t=1}^{T} p(y_t|Y_{t−1}; λ)
        = ∏_{t=1}^{T} Σ_{ξ_t} p(y_t|ξ_t, Y_{t−1}, θ) Pr(ξ_t|Y_{t−1}, λ)
        = ∏_{t=1}^{T} η'_t ξ̂_{t|t−1} = ∏_{t=1}^{T} η'_t F ξ̂_{t−1|t−1}.   (6.2)
As seen in Chapter 1, the conditional densities p(y_t|ξ_{t−1} = ι_i, Y_{t−1}) are mixtures
of normals. Thus, the likelihood function is non-normal:

L(λ|Y) = ∏_{t=1}^{T} Σ_{i=1}^{N} Σ_{j=1}^{N} p_{ij} Pr(ξ_{t−1} = ι_i|Y_{t−1}, λ) p(y_t|ξ_t = ι_j, Y_{t−1}, θ)
       = ∏_{t=1}^{T} Σ_{i=1}^{N} Σ_{j=1}^{N} p_{ij} ξ̂_{i,t−1|t−1} (2π)^{−K/2} |Σ_j|^{−1/2} exp( −½ u'_{jt} Σ_j^{−1} u_{jt} ).
FRIEDMANN [1994] has proposed inserting the closed-form solution (5.9) for
ξ̂_{t−1|t−1} into equation (6.2). This procedure leads to the following algorithm for
determining the likelihood function:

L(λ|Y) = η'_T ξ̂_{T|T−1} L(λ|Y_{T−1})
       = ∏_{t=1}^{T} η'_t ξ̂_{t|t−1} = ∏_{t=1}^{T} 1' diag(η_t) F ξ̂_{t−1|t−1},   (6.3)

where the transition matrix F = (f_1, ..., f_N) is such that K_t = (η_t ⊙ f_1, ..., η_t ⊙ f_N).
For the estimation procedures to be discussed in the following sections, a further
setting-up of the likelihood function will be employed which makes use of the exogeneity of the stochastic process ξ_t:
where the integration denotes again summation over all possible values of ξ = ξ_T ⊗
ξ_{T−1} ⊗ ... ⊗ ξ_1. Later, these cumbersome calculations are simplified to a recursive
algorithm using the Markov properties:

p(Y|ξ, θ) = ∏_{t=1}^{T} p(y_t|ξ_t, Y_{t−1}, θ),

Pr(ξ|p, ξ_0) = ∏_{t=1}^{T} Pr(ξ_t|ξ_{t−1}, p).
Before we can proceed to the actual estimation, a unique set of parameters must be
specified. Maximum likelihood estimation presupposes that the model is at least identified.
6.2 The Identification Problem
This observation implies that the linear VAR(p) model with parameter vector θ^0, as a
nested special case of an MS(2)-VAR(p) model, is not identifiable, since all structures
with θ_1 = θ_2 = θ^0, as well as all structures with P = ι ξ̄' and θ_m = θ^0, belong
to the same equivalence class. The non-identifiability of a structure with θ_1 = θ_2
causes problems for tests where the number of identifiable regimes is changed under
the null; this issue will be discussed further in Chapter 7.
is satisfied if and only if we can order the summations such that ξ_i = ξ̃_i and θ_i = θ̃_i
for i = 1, ..., m. Under the assumption (A6) made in Chapter 2, this condition is
fortunately fulfilled, since the class of normal density functions is identifiable.
For hidden Markov-chain models, LEROUX [1992] has shown that under this regularity condition the equivalence classes are identifiable.
where ξ = ξ_1 ⊗ ξ_2 ⊗ ... ⊗ ξ_T is an (M^T × 1) vector, θ(ξ, λ) = θ(s_1, λ) ⊗ ... ⊗ θ(s_T, λ),
and

p(Y|Y_0, θ(ξ^l, λ)) = ∏_{t=1}^{T} f(y_t|Y_{t−1}; θ(s_t, λ)),

ξ^l(p(λ), ξ_0(λ)) = Σ_{s_0=1}^{M} ξ_{s_0} ∏_{t=1}^{T} p_{s_{t−1} s_t}(λ).

Employing standard results of the statistical theory of linear systems, it is clear that
p(Y_T|Y_0, θ(ξ^l, λ)) is a Gaussian density and that θ(ξ^l, λ) would be identifiable.
Hence, the critical point is whether the structures λ^1 and λ^2 define the same joint
density (6.6) only if they define the same mixing distribution (6.5), i.e. belong
to the same equivalence class. It follows from TEICHER [1967] that (under independence) the identifiability of mixtures carries over to products of densities from a
specific family. Using the argument of LEROUX [1992] that the result of TEICHER
is also valid for finite mixtures with a fixed number of components, we conclude that
the identifiability of (6.6) is ensured if and only if that of (6.5) is.
Thus λ^1 and λ^2 produce the same stationary law for y_t if and only if λ^1 and λ^2
are identical or differ only in the numeration of the states. This identifiability result is in line with previous suggestions of KARLIN & TAYLOR [1975] and WALL
[1987], where the latter has addressed the identification of varying-coefficient regression models presupposing non-stochastic explanatory variables. Some useful proofs
can be found in BAUM & PETRIE [1966] and PETRIE [1969].
The ML estimator maximizes ln L*(λ) subject to the adding-up and non-negativity restrictions

P ι_M = ι_M,   i.e. (ι'_M ⊗ I_M) p = ι_M,   ι'_M ξ_0 = 1,   p ≥ 0,   ξ_0 ≥ 0.

¹ For simplicity of notation, we consider here explicitly an M-dimensional state vector as in MSI specifications. The results can be straightforwardly transferred to MSM specifications, where the dimension
of the initial state vector is M^p.
Let κ_1 and κ_2 denote the Lagrange multipliers associated with the adding-up restrictions on the matrix of transition probabilities, i.e. p, and the initial state ξ_0. Then the
FOCs are given by the set of simultaneous equations

∂ ln L(λ|Y)/∂θ' = 0,
∂ ln L(λ|Y)/∂p' − κ'_1 (ι'_M ⊗ I_M) = 0,
∂ ln L(λ|Y)/∂ξ'_0 − κ_2 ι'_M = 0,

where it is assumed that the interior solution of these conditions exists and is well-behaved, such that the non-negativity restrictions are not binding. These FOCs are
now calculated successively for the VAR parameter vector θ, the vector of transition
probabilities p, and the initial state ξ_0.
Differentiating the log-likelihood function with respect to the parameter vector θ leads
to the score function

∂ ln L(λ|Y)/∂θ' = (1/p(Y)) ∫ [∂ p(Y|ξ, θ)/∂θ'] Pr(ξ|ξ_0, p) dξ
               = ∫ [∂ ln p(Y|ξ, λ)/∂θ'] Pr(ξ|Y, λ) dξ.

Thus the scores ∂ ln p(Y|ξ, λ)/∂θ' are weighted with the conditional probabilities Pr(ξ|Y, λ) of the regime vector,
where we have used the definition of conditional probabilities

Pr(ξ|Y, λ) = p(Y|ξ, θ) Pr(ξ|ξ_0, p) / ∫ p(Y|ξ̃, θ) Pr(ξ̃|ξ_0, p) dξ̃.
∂ ln L(λ|Y)/∂θ' = Σ_{t=1}^{T} Σ_{ξ_t} [∂ ln p(y_t|ξ_t, Y_{t−1}, λ)/∂θ'] Pr(ξ_t|Y_T, λ) = 0.   (6.7)
Analogously, for the transition probabilities,

∂ ln L(λ|Y)/∂p' = (1/p(Y)) ∫ p(Y|ξ, θ) [∂ Pr(ξ|ξ_0, p)/∂p'] dξ
               = (1/p(Y)) ∫ [∂ ln Pr(ξ|ξ_0, p)/∂p'] p(Y|ξ, θ) Pr(ξ|ξ_0, p) dξ
               = ∫ [∂ ln Pr(ξ|ξ_0, p)/∂p'] Pr(ξ|Y, λ) dξ = 0.   (6.8)

Hence, the derivatives for each component p_ij of p = vec(P) are given by

∂ ln L(λ|Y)/∂p_ij = Σ_{t=1}^{T} Σ_{ξ_t} Σ_{ξ_{t−1}} [∂ ln Pr(ξ_t|ξ_{t−1}, p)/∂p_ij] Pr(ξ_t, ξ_{t−1}|Y_T, λ).
Setting this score to zero subject to the adding-up restriction and solving for the Lagrange
multipliers yields

κ̂_1 = Σ_{t=1}^{T} ξ̂^{(1)}_{t−1|T}.   (6.12)/(6.13)

Since the score has the property that ∂ ln L/∂p_ij → ∞ if p_ij → 0, there always exists
an interior solution for p, which is determined by equation (6.14), which in turn is
derived by inserting (6.13) into (6.11):

p̂_ij = Σ_{t=1}^{T} Pr(s_t = j, s_{t−1} = i|Y_T) / Σ_{t=1}^{T} Pr(s_{t−1} = i|Y_T).   (6.14)
Thus, the ML estimator of the vector of transition probabilities p is equal to the transition probabilities in the sample calculated with the smoothed regime probabilities
Pr(s_t = j, s_{t−1} = i|Y_T), t = 1, ..., T, i, j = 1, ..., M, collected in ξ̂^{(2)}(λ).
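Given the smoothed bivariate probabilities, the estimator reduces to a ratio of expected transition counts. A sketch with hypothetical smoothed joint probabilities for two periods:

```python
import numpy as np

def estimate_transition(joint_smoothed):
    """p_hat_ij = sum_t Pr(s_t=j, s_{t-1}=i|Y_T) / sum_t Pr(s_{t-1}=i|Y_T).

    joint_smoothed: (T x M x M), entry [t, i, j] = Pr(s_t=j, s_{t-1}=i|Y_T).
    Returns the (M x M) matrix of estimated transition probabilities,
    rows summing to one.
    """
    counts = joint_smoothed.sum(axis=0)            # expected transition counts
    return counts / counts.sum(axis=1, keepdims=True)

# Hypothetical smoothed joint probabilities for two periods
joints = np.array([[[0.30, 0.10], [0.20, 0.40]],
                   [[0.10, 0.10], [0.30, 0.50]]])
P_hat = estimate_transition(joints)
```

This is the fractional-count analogue of the ML estimator of a fully observed Markov chain: smoothed probabilities stand in for actual regime occurrences.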
∂ ln L(λ|Y)/∂ξ'_0 = (1/p(Y)) ∫ p(Y|ξ, θ) [∂ Pr(ξ|ξ_0, p)/∂ξ'_0] dξ
                 = ∫ [∂ ln Pr(ξ|ξ_0, p)/∂ξ'_0] Pr(ξ|Y, λ) dξ
                 = Σ_{ξ_1} [∂ ln Pr(ξ_1|ξ_0, p)/∂ξ'_0] Pr(ξ_1|Y, λ).
If the initial state is assumed to be fixed but unknown, the desired derivatives are
given by

∂ ln Pr(ξ_1 = ι_j|ξ_0, p)/∂ξ_{i0} = Pr(ξ_1 = ι_j|ξ_0, p)^{−1} F_{ji},

which determine the score ∂ ln L(λ|Y)/∂ξ'_0 in (6.15). The corresponding FOC,

∂ ln L(λ|Y)/∂ξ'_0 − κ_2 ι'_M = 0,

together with ι'_M ξ_0 = 1 (implying κ_2 = 1), gives the following solution for ξ_0:

ξ̂_0 = ξ̂_{0|T}.   (6.16)
It is worth noting that the smoothed-probability solution ξ̂_{0|T}(λ) for ξ_0 in equation
(6.15) is a function of ξ_0 itself. Furthermore, an analysis of the equivalent formulation of the likelihood function (6.3) shows that the likelihood function is linear in ξ_0,
such that the interior solution (6.16) does not necessarily provide the global maximum. Hence, irrespective of whether the initial state ξ_0 is assumed to be fixed to
one regime m* or stochastic with probabilities ξ_{0|0}, the ML estimate is given by the
boundary solution ξ̂_0 = ι_{m*}.
The problem of assigning initial values can be overcome by assuming that the unconditional probability distribution of ξ_1, ξ̂_{1|0}, is equivalent to the ergodic probability distribution ξ̄. Since the ergodic probability distribution ξ̄ is a function of the
transition probabilities p, the derivative ∂ξ̄_{1|0}/∂p' would then have to be included in the FOC
(6.15) of p.
² For MSM specifications, ξ^{(M^p)}_0 = ξ_0 ⊗ ξ_{−1} ⊗ ... ⊗ ξ_{1−p} can be determined uniquely by using (6.16).
FRIEDMANN [1994] has proposed selecting only ξ̂^{(1)}_{1−p}, while the initial state vector is determined by

ξ̂^{(p)}_{0|0} = F'^{p} ξ̂^{(1)}_{1−p} ⊗ ... ⊗ F' ξ̂^{(1)}_{1−p} ⊗ ξ̂^{(1)}_{1−p}.
In the following sections, alternative algorithms are introduced that deliver maximum
likelihood estimates of the parameter vector λ = (θ, p, ξ_0) for given observations
Y_T = (y'_T, ..., y'_{1−p})' by maximizing the conditional log-likelihood function numerically.
• In the expectation step (E), the unobserved states ξ_t are estimated by their
smoothed probabilities ξ̂_{t|T}. The conditional probabilities Pr(ξ|Y, λ^{(j−1)}) are
calculated with the BLHK filter and smoother by using the estimated parameter vector λ^{(j−1)} of the last maximization step instead of the unknown true
parameter vector λ.
I. Initialization

II. Expectation Step

The smoothed regime probabilities ξ̂_{T−j|T} are computed with the BLHK filter and smoother.

III. Maximization Step

The VAR parameter vector θ solves Σ_{t=1}^{T} ξ̂'_{t|T} [∂ ln η_t/∂θ'] = 0.

3. Initial State: ξ_0

ξ̂_0 = ξ̂_{0|T}.
Equipped with the new parameter vector λ, the filtered and smoothed probabilities are
updated, and so on. Thus, each EM iteration involves a pass through the BLHK filter
and smoother, followed by an update of the first-order conditions and the parameter
estimates, and is guaranteed to increase the value of the likelihood function.
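The iteration structure can be sketched end to end for a hypothetical univariate MSI(M)-VAR(0) model y_t = μ_{s_t} + u_t, u_t ~ N(0, σ_{s_t}²); the E-step runs the BLHK filter and smoother, the M-step replaces regime indicators by smoothed probabilities in the usual ML formulas:

```python
import numpy as np

def em_step(y, mu, sigma, F, xi_init):
    """One EM iteration (sketch). Returns updated parameters and the
    log-likelihood of the *input* parameters. F[i, j] = Pr(i|j)."""
    T, M = len(y), len(mu)
    filt = np.empty((T, M))
    xi, loglik = np.asarray(xi_init, float), 0.0
    for t in range(T):                                   # filter (5.4)-(5.6)
        eta = np.exp(-0.5 * ((y[t] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        joint = eta * xi
        loglik += np.log(joint.sum())
        filt[t] = joint / joint.sum()
        xi = F @ filt[t]
    smooth = np.empty_like(filt)                         # smoother (5.13)
    smooth[-1] = filt[-1]
    counts = np.zeros((M, M))
    for t in range(T - 2, -1, -1):
        pred = F @ filt[t]
        smooth[t] = (F.T @ (smooth[t + 1] / pred)) * filt[t]
        counts += F * np.outer(smooth[t + 1] / pred, filt[t])  # joint probs
    denom = smooth.sum(axis=0)
    mu_new = smooth.T @ y / denom                        # weighted means
    sigma_new = np.sqrt(np.array(
        [np.sum(smooth[:, m] * (y - mu_new[m]) ** 2) for m in range(M)]) / denom)
    F_new = counts / counts.sum(axis=0, keepdims=True)   # columns: Pr(.|j)
    return mu_new, sigma_new, F_new, loglik

y = np.array([-1.2, -0.8, -1.0, 1.1, 0.9, 1.0])          # hypothetical data
xi0 = np.array([0.5, 0.5])
mu1, sg1, F1, ll0 = em_step(y, np.array([-0.5, 0.5]),
                            np.array([1.0, 1.0]),
                            np.full((2, 2), 0.5), xi0)
mu2, sg2, F2, ll1 = em_step(y, mu1, sg1, F1, xi0)
```

Each call increases (never decreases) the likelihood, which is exactly the monotonicity property of the EM iteration described in the text.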
General results available for the EM algorithm indicate that the likelihood function
increases in the number of iterations j. Finally, a fixed point of this iteration schedule, λ^{(j)} = λ^{(j−1)}, coincides with a maximum of the likelihood function. The general statistical properties of the EM algorithm are discussed more comprehensively
in RUUD [1991].
It can easily be shown that the FOCs of the model, where the smoothed regime
probabilities Pr(ξ|Y, λ) are not determined simultaneously with the parameter vector λ but calculated with a second, predetermined parameter vector λ^{(j−1)}, are equivalent to maximization of the following objective function, as pointed out by HAMILTON
[1990]:
(6.18)
Equation (6.18) denotes the expected log-likelihood for λ^{(j)} given a distribution
parameterized by λ^{(j−1)}. After some algebraic manipulations,
+ Σ_{t=1}^{T} Σ_{ξ_t} Σ_{ξ_{t−1}} ln Pr(ξ_t|ξ_{t−1}, p) Pr(ξ_t, ξ_{t−1}|Y_T, λ^{(j−1)}) }.   (6.20)
Thus, the j-th maximization step of the EM algorithm maximizes the objective function (6.18). Let λ̃ denote the maximizer of the expected log-likelihood
ℓ(λ|Y_T, λ^{(j−1)}) conditional on λ^{(j−1)}. Then λ̃ is the ML estimator of λ when the
algorithm has converged, i.e.
In the following, we will drop the λ^{(j−1)} indicating the parameter vector used for
the reconstruction of the Markov chain, ℓ(λ|Y_T) ≡ ℓ(λ|Y_T, λ^{(j−1)}), for notational
simplicity.
Since we are here interested only in the estimation of the VAR parameter vector θ, we can concentrate our analysis on the first part of (6.19). By using the normality of the conditional densities,

ℓ(θ|Y_T) ∝ const − (1/2) Σ_{t=1}^T Σ_{m=1}^M ξ̂_{mt|T} { K ln(2π) + ln|Σ_m| + u_mt(γ)' Σ_m^{−1} u_mt(γ) }.
For the sake of simplicity, we will consider here only MS-VAR models which are linear in the vector γ of structural parameters,

y_t = Σ_{m=1}^M ξ_mt X_mt γ + u_t,

such that the residuals at time t associated with regime m are given by

u_mt = y_t − X_mt γ.

As seen in Table 2.3, this assumption is guaranteed e.g. for MSI-VAR models, where γ = (ν', α')' and

X_mt = [(ι'_m, y'_{t−1}, …, y'_{t−p}) ⊗ I_K].
Hence, the ML estimation of these models, to be presented in Section 9.3.4, is a straightforward application of the procedures which we are going to discuss here. In Section 9.3.5 we shall also discuss how the procedures developed in this section can be applied to MSM-VAR models, which are non-linear in the structural parameters α and μ. The linearity of the score in α conditional on μ, and vice versa, will provide the key.
Next, we show that the linearity of the model in the parameter vector γ results in a generalized linear regression model with many observations per cell, where the fractional number of pseudo-observations (y_t, X_mt, ξ_t = ι_m) is given by the smoothed regime probabilities ξ̂_{mt|T}.
For calculating the derivatives of the expected likelihood function, the following matrix notation will turn out to be useful:

ℓ(θ|Y_T) ∝ const − (1/2) Σ_{m=1}^M { T̂_m ln|Σ_m| + u_m(γ)' (Ξ̂_m ⊗ Σ_m^{−1}) u_m(γ) }

   ∝ const − (1/2) Σ_{m=1}^M T̂_m ln|Σ_m| − (1/2) u(γ)' W^{−1} u(γ),   (6.22)
where

W^{−1} = diag(Ξ̂_1 ⊗ Σ_1^{−1}, …, Ξ̂_M ⊗ Σ_M^{−1})   (MTK×MTK),
Ξ̂_m = diag(ξ̂_{m1|T}, …, ξ̂_{mT|T})   (T×T),
u(γ) = (u_1(γ)', …, u_M(γ)')'   (MTK×1),   u_m(γ) = y − X_m γ   (TK×1),
X = (X'_1, …, X'_M)'   (MTK×R),   X_m = (X'_{m1}, …, X'_{mT})'   (TK×R).
The ML estimates of the structural parameters γ are given by the well-known GLS estimator, since obviously

∂ℓ(θ|Y_T)/∂γ = X' W^{−1} u(γ) = 0,   (6.23)

γ̂ = (X' W^{−1} X)^{−1} X' W^{−1} y*,   (6.24)

where y* = (y', …, y')' stacks the observations M times.
Thus, the regressions necessary at each maximization step are GLS estimations where the pseudo-observations (y_t, X_mt, ξ_t = ι_m), m = 1, …, M, are weighted with their smoothed probabilities ξ̂_{t|T}(λ^{(j−1)}).
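For the homoskedastic-within-regime case (K = 1), the weighted regressions can be written as one GLS step over the M·T pseudo-observations. The sketch below uses our own notation; `gls_step` and its argument layout are assumptions, not the book's code.

```python
import numpy as np

def gls_step(y, X_regimes, xi_s, sigma2):
    """gamma_hat = (X' W^{-1} X)^{-1} X' W^{-1} y over M*T pseudo-observations.

    y         : (T,)  observations
    X_regimes : list of M arrays (T, R), regressors under each regime
    xi_s      : (T, M) smoothed regime probabilities
    sigma2    : (M,)  regime variances (scalar Sigma_m for K = 1)
    """
    R = X_regimes[0].shape[1]
    A, b = np.zeros((R, R)), np.zeros(R)
    for m, Xm in enumerate(X_regimes):
        w = xi_s[:, m] / sigma2[m]        # diagonal of Xi_m (x) Sigma_m^{-1}
        A += Xm.T @ (w[:, None] * Xm)
        b += Xm.T @ (w * y)
    return np.linalg.solve(A, b)
```

For an MSI(2)-AR(0) model with X_1t = (1, 0) and X_2t = (0, 1), γ̂ reduces to the two probability-weighted regime means, which provides a quick check of the implementation.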
In the case of a regime-invariant covariance matrix Σ_m = Σ, one obtains an expression for the log-likelihood function which will be useful in order to determine the ML estimator of Σ:

ℓ(λ|Y_T) ∝ const − (KT/2) ln(2π) − (T/2) ln|Σ| − (1/2) u*(γ)' W*^{−1} u*(γ).
The partial derivatives of the expected log-likelihood with respect to the elements of Σ are

∂ℓ(λ|Y_T)/∂Σ = −(T/2) Σ^{−1} + (1/2) Σ^{−1} U*(γ)' U*(γ) Σ^{−1},

so that the corresponding FOC yields the ML estimator Σ̂ = T^{−1} U*(γ̂)' U*(γ̂).
In order to determine the ML estimates of Σ_1, …, Σ_M, the system of first-order partial derivatives is needed. By means of standard matrix differential calculus (cf. e.g. MAGNUS & NEUDECKER [1994]) we get

Σ̂_m = T̂_m^{−1} Σ_{t=1}^T ξ̂_{mt|T} u_mt(γ̂) u_mt(γ̂)'.   (6.29)
Again, it is easily verified that the maximization of ℓ(λ|Y_T) yields the modified FOC

Σ_{t=1}^T Σ_{m=1}^M ξ̂_{mt|T} ∂ln η_mt/∂σ_m = 0,   (6.30)

which, using

∂ln η_mt/∂σ_m = −(1/2) D'_K vec( Σ_m^{−1} − Σ_m^{−1} u_mt(γ) u_mt(γ)' Σ_m^{−1} ),   (6.31)

∂ln η_it/∂σ_m = 0 for i ≠ m,

results in

Σ_{t=1}^T ξ̂_{mt|T} ∂ln η_mt/∂σ_m = 0.
The interdependence of the estimates for γ and σ theoretically requires iterating between the equations (6.24) and (6.27)/(6.29) within each maximization step. However, as in the Generalized EM algorithm (cf. DEMPSTER et al. [1977] and RUUD [1991]), it can be sufficient to perform only a single (estimated) generalized least squares (GLS) estimation within each maximization step to ensure convergence to a stationary point of the log-likelihood. In order to balance convergence requirements and computation time, the convergence criterion of the internal iteration within each maximization step may be formulated less restrictively than the criterion of the EM algorithm.
The iteration steps of the EM algorithm are repeated until convergence is ensured. For this purpose, different convergence criteria can be used. The first criterion is the absolute change of the log-likelihood, Δ_1 := |ℓ(λ^{(j+1)}|Y_T) − ℓ(λ^{(j)}|Y_T)|.
In addition, the parameter variation might be taken into account with a given norm ‖·‖, such that Δ_2 := ‖λ^{(j+1)} − λ^{(j)}‖. If a maximum norm of the absolute change of the parameter values is used, we have Δ_2a := max_i |λ_i^{(j+1)} − λ_i^{(j)}|. Alternatively, the (root of the) mean of squared relative parameter changes might be considered:

Δ_2b := (1/R) Σ_{i=1}^R ( (λ_i^{(j+1)} − λ_i^{(j)}) / λ_i^{(j)} )²,

where R is the number of non-zero parameters λ_i^{(j)} ≠ 0. The recursion stops if convergence is achieved, i.e. if the changes of the log-likelihood and the parameters are negligibly small: Δ_i ≤ ε_i for all i = 1, 2. Note that in Δ_2a and Δ_2b, the discrete parameter vector ξ₀ is not included in λ. Finally, the EM algorithm is terminated if the number of iterations j exceeds a previously specified upper bound, j > ε₄.
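The stopping rules can be collected in a small helper; the function name, argument layout, and default tolerances below are our own illustrative choices:

```python
import numpy as np

def em_converged(ll_new, ll_old, lam_new, lam_old, j,
                 eps1=1e-6, eps2=1e-4, max_iter=500):
    """Check the convergence criteria of the EM algorithm (sketch)."""
    d1 = abs(ll_new - ll_old)                       # Delta_1: log-likelihood change
    d2a = np.max(np.abs(lam_new - lam_old))         # Delta_2a: maximum norm
    nz = lam_old != 0                               # only non-zero parameters
    d2b = np.mean(((lam_new[nz] - lam_old[nz]) / lam_old[nz]) ** 2)  # Delta_2b
    return (d1 <= eps1 and d2a <= eps2 and d2b <= eps2) or j > max_iter
```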
The EM algorithm has many attractive features; foremost among these are its computational simplicity and its convergence properties. In our experience, the method finds estimates in the region of the maximum reasonably quickly from arbitrary initial values. Among the undesirable features is the drawback that it does not produce the information matrix automatically. However, the EM algorithm may be complemented by the procedures proposed in Section 6.6.2 for the estimation of the asymptotic variance matrix.
Although the EM algorithm has good convergence properties even when starting far away from the maximum of the log-likelihood function, close to the maximum it converges rather slowly. An algorithm which has attractive convergence characteristics close to the maximum is the scoring algorithm, which will be discussed in the next section.
As we have seen, the maximization of the log-likelihood function is, due to the non-linearity of the first order conditions for λ̂, a highly non-linear optimization problem. Its solution requires numerical techniques that maximize ln L(λ|Y_T) iteratively. A popular class of numerical optimization methods uses gradient algorithms (cf. LÜTKEPOHL [1987]). The general form of the j-th iteration step is

λ^{(j+1)} = λ^{(j)} + h_j H_j s_T(λ^{(j)}),   (6.32)

where h_j is the step length³ in the j-th iteration, H_j is a positive definite direction matrix, and s_T(λ^{(j)}) is the score, defined as the gradient of ln L(λ|Y_T) at λ^{(j)}.
The various gradient algorithms differ in the choice of the direction matrix H_j (cf. e.g. JUDGE et al. [1985, sec. B.2]). The scoring algorithm uses the inverse of the information matrix,

H_j = [I(λ^{(j)})]^{−1}.

Thus the method of scoring requires the score vector and the information matrix. For parsimoniously specified models it might be possible to derive the expressions for the
³ There are numerous ways to choose the step length h_j. For the sake of simplicity, it can be set to one, h_j = 1. A more efficient method is a grid search in a set of increasing positive values of h_j in order to maximize ln L_{j+1}(h_j) = ln L(λ^{(j+1)}(h_j)), where the search stops after the first decline in the likelihood. Then, the optimal step length is chosen either as the preceding value for h_j or via quadratic interpolation as the maximizer of a quadratic polynomial in ln L_{j+1}(h_j) over the last three points.
score and the information matrix analytically. In practice they are usually derived numerically, where the information matrix is approximated as Ĩ(λ^{(j)}) by dropping the expectation operator and by substituting the true parameter vector λ with λ^{(j)}. Alternatively, an estimate of the information matrix can be derived via BERNDT et al. [1974]. This algorithm will be discussed in Section 6.6.2 in more detail. The score itself may be approximated by central differences,

∂ln L(λ|Y_T)/∂λ_i |_{λ=λ^{(j)}} ≈ [ ln L(λ⁺|Y_T) − ln L(λ⁻|Y_T) ] / (2 c_i^{(j)}),

where λ⁺ := λ^{(j)} + c_i^{(j)} ι_i, λ⁻ := λ^{(j)} − c_i^{(j)} ι_i, c_i is a small positive number, and ι_i is the i-th column of the identity matrix. The resulting approximated information matrix is assumed to be positive definite. If this assumption is violated, a sufficiently large positive number c might be added to the elements of the main diagonal of Ĩ(λ^{(j)}), H_j := [Ĩ(λ^{(j)}) + cI]^{−1}.
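A minimal numerical version of these two ingredients might look as follows; `loglik` stands for any user-supplied function returning ln L(λ|Y_T), and the `ridge` argument implements the diagonal correction c described above (all names are ours):

```python
import numpy as np

def num_score(loglik, lam, c=1e-5):
    """Central-difference approximation of the score at lam."""
    s = np.zeros_like(lam)
    for i in range(len(lam)):
        e = np.zeros_like(lam); e[i] = c
        s[i] = (loglik(lam + e) - loglik(lam - e)) / (2 * c)
    return s

def scoring_step(loglik, lam, info, h=1.0, ridge=0.0):
    """One scoring iteration; `ridge` adds c to the diagonal if info is not p.d."""
    H = np.linalg.inv(info + ridge * np.eye(len(lam)))
    return lam + h * H @ num_score(loglik, lam)
```

For a quadratic log-likelihood with unit information matrix, a single step with h_j = 1 lands exactly on the maximizer.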
Having evaluated the score vector and the information matrix, the j-th iteration step changes to

λ^{(j+1)} = λ^{(j)} + h_j [Ĩ(λ^{(j)})]^{−1} s_T(λ^{(j)}).

The method of scoring might be modified concerning the treatment of the initial state parameters ξ₀. In each iteration step only the unknown elements of λ† = (θ', ρ')' are estimated via scoring for given ξ₀. Then ξ₀ is replaced by the smoothed probability vector ξ̂_{0|T}. Thus, the recursion formulae are given by

λ†^{(j+1)} = λ†^{(j)} + h_j [Ĩ(λ†^{(j)}; ξ₀^{(j)})]^{−1} s_T(λ†^{(j)}; ξ₀^{(j)}),   ξ₀^{(j+1)} = ξ̂_{0|T}(λ^{(j)}).

Finally, in order to check convergence, the criteria introduced in Section 6.4 can be used.
More general problems in the context of normal state-space models have been discussed in WATSON & ENGLE [1983]. In particular, it has been noted, inter alia by LÜTKEPOHL [1991, p. 437], that even though scoring mostly has good convergence properties near the maximum, far from the maximum it may perform poorly. As proposed by WATSON & ENGLE [1983], the most practical method seems to be a mix of EM and scoring algorithms. While the EM algorithm can be used to move the parameters quickly to the neighborhood of the maximum, scoring can be used to pinpoint the maximum and to estimate the information matrix.
In contrast to the algorithms presented so far, where each iteration was based on the full-sample information, the t-th iteration of the recursive maximum likelihood estimator uses only the first t observations (after an initialization period). Thus, the recursive ML estimator of the MS-VAR model is given by

λ^{(t+1)} = λ^{(t)} + (t+1)^{−1} H_t s_{t+1}(λ^{(t)}),   (6.34)

where s_{t+1}(λ^{(t)}) is a score vector and H_t is the adaptive matrix. The optimal choice of the adaptive matrix is the inverse of the information matrix. However, for computational reasons, the inverse of the observed information matrix is used:

H_t^{−1} = Σ_{τ=1}^t s_τ(λ^{(τ−1)}) s_τ(λ^{(τ−1)})'.   (6.35)
The crucial point of this procedure is to keep the adaptive matrix well behaved, i.e. in particular positive definite. After an initial phase in which the adaptive matrix is stabilized, each observation y_t is processed separately.
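The recursion is easiest to see in the scalar case of estimating the mean of a unit-variance Gaussian, where the conditional score is h_t(λ) = y_t − λ. The sketch below is ours; in particular, the rule of initializing λ with the mean of the burn-in observations is an assumption, not the book's procedure.

```python
import numpy as np

def recursive_ml(y, burn=20):
    """Recursive ML for the mean of a unit-variance normal (scalar sketch)."""
    lam = float(np.mean(y[:burn]))   # initialization period stabilizes H_t
    B = float(burn)                  # accumulated squared scores (observed information)
    for t, yt in enumerate(y[burn:], start=burn + 1):
        h = yt - lam                 # conditional score h_t(lam^{(t-1)})
        B += h * h
        lam += (1.0 / (t + 1)) * (t / B) * h   # step (6.34) with H_t = (B/t)^{-1}
    return lam
```

Each observation updates the estimate once, which is what makes the procedure an on-line technique.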
There is a superficial similarity to the iterative step (6.32) of the scoring algorithm with h_t = (t+1)^{−1}. However, there are also two important differences: first, at the t-th iteration only the first t observations are used, which turns the algorithm into an on-line estimation technique. Secondly, the score function is not derived numerically, but involves an EM recursion, where the filtered regime probabilities ξ̂_{t|t} are involved. Again, these are provided by the BLHK filter. Note that equations (6.35) and (6.37) use that s_{t−1}(λ^{(t−1)}) = ∂ln L(λ|Y_{t−1})/∂λ |_{λ=λ^{(t−1)}} = 0. Thus, the conditional score h_t(λ^{(t−1)}) = ∂ln p(y_t|Y_{t−1}; λ)/∂λ |_{λ=λ^{(t−1)}} coincides with the score s_t(λ^{(t−1)}).
A major drawback of this approach is its sensitivity to the adaptive matrix, whose calculation becomes lengthy if large VAR processes with many parameters have to be estimated. Note that the simulation results presented in HOLST et al. [1994] come from a model which is restricted to only 8 parameters. Furthermore, the algorithm provides only filtered regime probabilities. While this problem can be overcome by a final full-sample smoothing step, the recursive EM algorithm will provide no full-information maximum likelihood parameter estimates.
For large samples, however, a combination of the recursive EM algorithm and the "full-information" EM algorithm with the scoring algorithm might be favorable: perform some iterations of the recursive EM algorithm to derive initial estimates for the full-information EM algorithm, which is then used to come close to the maximum. The EM algorithm will itself provide starting values for the scoring algorithm. In the final step, the scoring algorithm is used to achieve the maximum of the log-likelihood and to derive an estimate of the information matrix.
In the presence of information about the parameters beyond that contained in the sample, Bayesian estimation provides a convenient framework for incorporating such prior information.⁴ With this, any information the analyst has about the parameter vector λ is represented by a prior density p(λ). Probability statements concerning λ after the data Y_T have been observed are based on the posterior density p(λ|Y_T), which is given via Bayes' theorem by

p(λ|Y_T) = p(Y_T|λ) p(λ) / p(Y_T),

where the density of Y_T conditional on the value of the random variable λ, p(Y_T|λ), is algebraically identical to the likelihood function L(λ|Y_T), and p(Y_T) denotes the unconditional sample density, which is just a normalization constant. Hence, all information available on λ is contained in

p(λ|Y_T) ∝ L(λ|Y_T) p(λ).   (6.38)
Note that for flat priors, i.e. p(λ) = const., the posterior density is proportional to the likelihood function,

p(λ|Y_T) ∝ L(λ|Y_T).

Thus, without reliable prior information, the mode of the posterior distribution is given by the ML estimator λ̂. Analogously, equation (6.38) can usually be interpreted as a penalized likelihood function. However, it might be worth noting that the Bayesian approach does not derive the distribution of the estimator λ̂ but makes an inference about λ, which is itself regarded as a random variable in Bayesian statistics. Thus, p(λ|Y_T) denotes the posterior distribution of the unknown parameter λ and not the distribution of the ML estimator λ̂.
For mixtures of normal distributions HAMILTON [1991a] has proposed a quasi-Bayesian estimation implemented as a modification of the EM algorithm. The benefit of a quasi-Bayesian analysis might be the capability of offering a solution for some singularity problems associated with ML estimation and for choosing between local
⁴ The reader is referred to LÜTKEPOHL [1991, sec. 5.4] or HAMILTON [1994b, ch. 12] for an introduction to the basic principles underlying Bayesian analysis with applications to time-invariant VAR models.
maxima of the likelihood function. While there is no natural conjugate prior for the MS-VAR model (cf. HAMILTON [1993]), it is convenient to treat Normal-Gamma priors. For the MSI(M)-VAR(0) model it is shown by HAMILTON [1991a] that these priors can be easily incorporated in the EM algorithm by representing prior information as equivalent to observed data. In an MSI(M)-VAR(0) model, for example, the mode of the posterior density of ν_m would be given by
The underlying VAR model can be extended to a general state-space model (cf. LÜTKEPOHL [1991, ch. 13]), where the parameters in the measurement and the transition equation can depend on the regime governed by a Markov chain as in the MS-VAR model (equations (6.39) and (6.40)). For this more general class of models, KIM [1994] has proposed an estimation technique that combines a BLHK-like filter with an approximating Kalman filter and smoother. The first "reconstructs" the regimes as in the MS-VAR context, while the Kalman filter and smoother "reconstruct" the states z_t as in a time-invariant linear normal state-space model. In order to make the estimation of the model tractable, approximations to optimal filtering are involved, as in HARRISON & STEVENS [1976]. KIM [1994] suggests maximizing the likelihood function by using a non-linear optimization technique which, however, is not discussed further. Nevertheless, KIM's model generalizes the switching approach of SHUMWAY & STOFFER [1991], where the regime-governing random process is assumed to be serially independent and the switching is restricted to the measurement equation.
While the procedure proposed by KIM [1994] seems to work in practice, theoretical results concerning the effects of the various approximations are missing. Recently, BILLIO & MONFORT [1995] have proposed a partial Kalman filter and smoother in combination with importance sampling techniques to compute the likelihood function of switching state-space models like (6.39)/(6.40). Simulated maximum likelihood methods have also been suggested by LEE [1995] for MS-AR models with latent variables.
Furthermore, the redefinition of the regime vector ξ_t^{(r+1)} = ξ_t ⊗ ξ_{t−1} ⊗ … ⊗ ξ_{t−r} as in the MSM(M)-VAR(p) model is intractable, since the number r of relevant regimes in p(y_t|ξ_t, Y_{t−1}) grows with t, i.e. r → ∞.
As already mentioned, MS-VAR models possess a linear Gaussian state-space representation with Markov-switching regimes as in (6.39) and (6.40). But since they are quite simplistic, the advantage of a partial Kalman filter estimation is rather limited compared with the additional effort involved.
The asymptotic theory of ML estimation in linear time series models is very well developed, but fragmentary for non-linear models. For the MS-VAR model, it is usually assumed that the standard asymptotic distribution theory holds. Unfortunately, as far as we know, there exist no general theoretical results concerning the asymptotic properties of the maximum likelihood estimation. As HAMILTON [1993, p. 249] points out, "All of the asymptotic tests [...] assume regularity conditions are satisfied, which to our knowledge have not yet been formally verified for this class of models."
However, there are results in the literature which justify this assumption. For the mixture-of-normals model with its i.i.d. regimes, the consistency and asymptotic distribution of the maximum likelihood estimator have been shown by LINDGREN [1978] and KIEFER [1978], [1980]. In LEROUX [1992], the consistency of maximum likelihood estimators is proven for general hidden Markov-chain models, i.e. for MSI(M)-VAR(0) processes. For stable MS(M)-AR(p) processes, it has been proven
by KARLSEN [1990a] that y_t is a strong mixing process with a geometric mixing rate. For a hidden Markov chain, the stationarity of y_t is implied by the stationarity of the homogeneous Markov chain ξ_t. Moreover, following BILLINGSLEY [1968], as ξ_t is φ-mixing, y_t is φ-mixing as well. When the data are φ-mixing and stationary, the asymptotic distribution can be based on the functional central limit theorem given in BILLINGSLEY [1968]. Following HOLST et al. [1994, p. 498], this might open up a possibility of proving consistency as well as asymptotic normality. In addition, for univariate Markov-switching regression models with endogenous state selection⁵ (but again without lagged dependent variables), the consistency of maximum likelihood estimators has been proved by RIDDER [1994]. It remains to show, however, that these results can be transferred to the MS-VAR model in general.
Thus, it can be conjectured that the maximum likelihood estimator is consistent and asymptotically normal under suitable conditions. Typical conditions require identifiability, roots of F(L) and A(L) to be outside the unit circle, and that the true parameter vector does not fall on the boundary of the allowable parameter space (cf. HAMILTON [1994a]). Therefore it should be mentioned that on the boundary, i.e. if p_ij = 0 for at least one pair (i, j), the asymptotic distribution will certainly be incorrect. The intuition behind this condition is that the convergence of the ML estimates p̂_ij of the transition parameters of the Markov chain depends on the number of transitions n_ij ≈ p_ij ξ̄_i T. Thus, p̂_ij will converge very slowly to the true value if the transition probability p_ij or the ergodic probability of regime i, ξ̄_i, is near zero. Furthermore, p_ij = 0 or p_ij = 1 would imply under normality that the confidence interval is not restricted to the [0, 1] range. This problem can be avoided by using logits of the p_ij as parameters. Then boundary solutions cannot be achieved.
⁵ The transition probabilities p_ij are time-varying due to their dependence on y_t.
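A standard way to implement this reparametrization is a multinomial-logit (softmax) map from unconstrained parameters to the rows of the transition matrix; the sketch below, including the identification convention of pinning one logit per row to zero, is ours:

```python
import numpy as np

def logits_to_transition(theta):
    """Map unconstrained logits theta (M, M-1) to a row-stochastic P with 0 < p_ij < 1."""
    z = np.hstack([theta, np.zeros((theta.shape[0], 1))])  # pin last column to 0
    ez = np.exp(z - z.max(axis=1, keepdims=True))          # numerically stable softmax
    return ez / ez.sum(axis=1, keepdims=True)
```

The EM or scoring iterations then operate on θ, so interior estimates 0 < p̂_ij < 1 are guaranteed and Gaussian confidence intervals for θ map back into the unit interval.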
I_a = lim_{T→∞} T^{−1} I,

and the information matrix I is defined as minus the expectation of the matrix of second partial derivatives of the log-likelihood function evaluated at the true parameter vector. Hence, the asymptotic information matrix is given by

I_a = −lim_{T→∞} T^{−1} E[ ∂² ln L(λ|Y_T) / ∂λ ∂λ' ].   (6.41)

Since the maximum of the likelihood function lies on the boundary of the parameter space concerning the parameter vector ξ₀, these parameters must be excluded from λ when the variance-covariance matrix is calculated.
For the MS-VAR model, it is in general impracticable to evaluate (6.41) analytically. As suggested by HAMILTON [1993], an estimate of the information matrix can be achieved by using the conditional scores h_t(λ) as proposed by BERNDT et al. [1974]:

Î(λ) = Σ_{t=1}^T h_t(λ) h_t(λ)'.   (6.42)

The conditional score of the t-th observation, h_t(λ), is defined as the first partial derivative of ln p(y_t|Y_{t−1}; λ):

h_t(λ) = ∂ln p(y_t|Y_{t−1}; λ)/∂λ.   (6.43)
Obviously, (6.43) is closely related to the score s_t(λ) as the first partial derivative of the log-likelihood function ln p(Y_t|Y_0; λ):

s_t(λ) = ∂ln p(Y_t|Y_0; λ)/∂λ = Σ_{τ=1}^t ∂ln p(y_τ|Y_{τ−1}; λ)/∂λ = Σ_{τ=1}^t h_τ(λ).   (6.44)
Since λ̂ is the maximizer of the likelihood function, the score s(λ̂) ≡ s_T(λ̂), as the gradient of the full-sample log-likelihood function ln p(Y_T|Y_0; λ) at λ̂, must be equal to zero.
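Given any routine returning the vector of conditional log densities ln p(y_t|Y_{t−1}; λ), the estimate (6.42) can be formed from numerically differentiated conditional scores. This is a generic sketch; `cond_loglik` and the step size c are our assumptions:

```python
import numpy as np

def bhhh_information(cond_loglik, lam, c=1e-5):
    """I_hat = sum_t h_t h_t', with h_t from central differences of ln p(y_t | Y_{t-1})."""
    n = len(lam)
    base = cond_loglik(lam)             # (T,) conditional log densities
    H = np.zeros((len(base), n))        # row t holds h_t(lam)'
    for i in range(n):
        e = np.zeros(n); e[i] = c
        H[:, i] = (cond_loglik(lam + e) - cond_loglik(lam - e)) / (2 * c)
    return H.T @ H
```

For an i.i.d. Gaussian mean model the conditional score is h_t = y_t − μ, so the estimate collapses to Σ_t (y_t − μ)², which gives a quick correctness check.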
The scores s_t(λ̂) are calculated according to the normal equations of the ML estimator,

s_t(λ̂) = Σ_{τ=1}^t Ψ_τ(λ̂)' ξ̂_{τ|t},

where

Ψ_τ(λ̂) = ∂ diag(η_τ) F_τ / ∂λ' |_{λ=λ̂}.

The smoothed probabilities ξ̂_{τ|t} can be derived analogously to KIM's smoothing algorithm (5.13),

ξ̂_{τ|t} = [ F' (ξ̂_{τ+1|t} ⊘ ξ̂_{τ+1|τ}) ] ⊙ ξ̂_{τ|τ},   (6.45)

with the filtered probabilities ξ̂_{τ|τ} as the starting values, where ⊘ and ⊙ denote element-wise division and multiplication.
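The recursion (6.45) amounts to a short backward loop over the filtered probabilities; the sketch below assumes F column-stochastic, so that ξ̂_{τ+1|τ} = F ξ̂_{τ|τ} (function name ours):

```python
import numpy as np

def smooth_upto(xi_filt, F):
    """xi_{tau|t} = [F'(xi_{tau+1|t} / xi_{tau+1|tau})] * xi_{tau|tau}, backward in tau."""
    t, M = xi_filt.shape
    xi = np.zeros_like(xi_filt)
    xi[-1] = xi_filt[-1]                       # start from the filtered value at tau = t
    for tau in range(t - 2, -1, -1):
        pred = F @ xi_filt[tau]                # xi_{tau+1|tau}
        xi[tau] = (F.T @ (xi[tau + 1] / pred)) * xi_filt[tau]
    return xi
```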
6.7 Conclusion
In this chapter we have discussed the classical method of maximum likelihood estimation for MS(M)-VAR(p) models. While parameter estimation with the proposed EM algorithm is quite standard, some tests of interest will not have standard asymptotics. The problems and methods of statistical tests in MS-VAR models will be investigated in the next chapter. It will be shown that this problem concerns only hypotheses where the number of identifiable regimes is altered under the null. Before we come back to the EM algorithm in Chapter 9, where the regression step is finalized for all MS(M)-VAR(p) specifications under consideration, we will introduce in Chapter 8 a new Gibbs sampler for MS-VAR models which combines Bayesian statistical theory and Markov-chain Monte Carlo simulation techniques.
Chapter 7
Model Selection and Model Checking
The last two chapters have demonstrated that the estimation methods and filtering techniques are now well established for MS-VAR processes. Most unresolved questions arising in empirical investigations with MS-VAR models concern the issue of model specification. In Section 6.6 we discussed the asymptotic distribution of the maximum likelihood estimator of MS-VAR models. In the literature (cf. e.g. HAMILTON [1993]) it has been assumed that standard asymptotic theory holds. The asymptotic normal distribution of the ML estimator ensures that most model diagnostics and tests known from the time-invariant VAR(p) model (cf. the discussion in LÜTKEPOHL [1991, ch. 4]) can be applied generally with only some slight modifications.
Strategies for selecting simultaneously the number of regimes and the order of the autoregression in Markov-switching time series models, based on ARMA representations, as well as usual specification testing procedures, are introduced in Section 7.1. These considerations are summarized in a bottom-up specification strategy. The strategy is built upon a preliminary model selection, to be presented in Section 7.2, which is based on the ARMA representation introduced in Chapter 3.
tion of standard asymptotic theory. This problem, as well as procedures for the derivation of the asymptotic null distribution of the likelihood ratio statistic, are discussed in Section 7.5.
Why not use a top-down strategy? Starting with more elaborate models has the advantage that e.g. an MSMAH/MSIAH-VAR model can be easily estimated (as we will show in Chapter 9). However, since we have to use numerical and iterative techniques, this advantage is compromised by the potential danger of getting local maxima. This is due to the theoretical properties of these models discussed already in Section 1.4: an MSMAH or MSIAH model can exhibit very different and extraordinary statistical features which are hard to check theoretically. It therefore becomes very important to perform estimations for alternative initial values. Furthermore, from the results in Chapter 4, we know that forecasting becomes much harder if time-varying autoregressive parameters are allowed. This view is stressed by the fact that the analyst should be forced to have some priors concerning the regime switching in order to ensure that the model is identified.
H0: α_p = 0 vs. H1: α_p ≠ 0

H0: μ_m = μ_i for all i, m = 1, …, M, and σ_i ≠ σ_m for all i ≠ m
vs. H1: μ_m ≠ μ_i for at least one i ≠ m, and σ_i ≠ σ_m for all i ≠ m
Suppose that economic theory or the data set under consideration indicates potential regime shifts¹. Then the analyst may start with some MSI/MSM(M)-VAR(p) models which are chosen by the ARMA-representation-based model selection procedure. Henceforth, an MSM model is only chosen if it is the most parsimonious feasible model. The choice of an MSI specification is mainly motivated by practical considerations. As we will see in Chapter 9, for an MSI model smoothing and filtering of regime probabilities and parameter estimation are much less computationally demanding (and therefore much faster) than the statistical analysis with an MSM model.² Hence, if there are no theoretical reasons which call for an MSM specification, an MSI specification is preferred.
In the next step, the pre-selected models are estimated with the methods developed in the last chapter. Since the estimation of MS-VAR models requires numerical maximization techniques with the danger of convergence to a local maximum of the likelihood function, estimations should be performed for several initial values. Finally, the statistically significant and economically meaningful models are tested successively against more general models. As Lagrange multiplier tests require estimating only the parsimonious model, i.e. the restricted model, they might be preferred to LR and Wald tests.
The proposed bottom-up specification strategy for single-equation Markov-switching models is shown in Table 7.1 in a systematic presentation. It is pointless to list all test hypotheses related to the specification of MS-VAR models. Numerous examples are considered in the empirical analysis of Chapter 11 and Chapter 12. In Section 7.4 the construction of statistical tests in MS-VAR models will be investigated.
While a time-invariant Gaussian VAR(p) model is nested as an MS(1)-VAR(p) model in MS(M)-VAR(p), LR tests for the null of only one regime, θ₁ = θ₂ = … = θ_M, are not easily available. Unfortunately, equivalence of the VAR parameters in all
¹ Linearity tests as proposed by GRANGER & TERÄSVIRTA [1993, ch. 6] may be applied. If the linear model is rejected by the linearity tests, this might be an indication for MS-VAR models. But they have power against several nonlinear models. To our knowledge there exists no particular test with an MS-VAR model as alternative without specifying and estimating the alternative. Unfortunately, there seems to exist no descriptive tool to detect Markovian shifts reliably. In particular, no graphical devices are available, cf. TJØSTHEIM [1990].
² Keeping our results in mind, it is surprising that, beginning with the seminal contributions of HAMILTON [1988], [1989], the MSM specification clearly dominates empirical research with MS-VAR models.
regimes implies that the Markov-chain parameters ρ are not identified, as already seen in Section 6.2. Thus, these nuisance parameters cause a bias of the LR test against the null; see Section 7.5. Therefore alternative approaches may be preferable. If regime-dependent heteroskedasticity is assumed, the number of regimes remains unaltered under the null, and standard results can be inferred for likelihood ratio tests of β₁ = … = β_M s.t. Σ_m ≠ Σ_j iff m ≠ j.
Procedures testing Markovian regime shifts in the conditional mean E[y_t|Y_{t−1}, ξ_{t−1}] can also be constructed as tests of restrictions on the transition matrix. For example, Wald tests on i.i.d. regimes are feasible. Suppose that the hypothesis F = ξ̄ 1' cannot be rejected. Then the conditional density p(y_t|Y_{t−1}) would be a mixture of normals, but past regime shifts have neither predictable mean nor variance effects. Similarly, a test for a reduced rank of F, rk(F), could be carried out.
ξ_t = ( 1(s_t^1 = s_t^2 = 1) )        F = ( p_11  p_21 )
      ( 1(s_t^1 = s_t^2 = 2) ),           ( p_12  p_22 ).
Consider now the effects of intertemporally perfectly correlated regime shifts. For example, suppose that the regime variable s_t^1 associated with the first equation leads the regime variable of the second equation: s_t^2 = s_{t−1}^1. Then the model is given by
ξ_t = ( 1(s_t^1 = 1, s_{t−1}^1 = 1) )
      ( 1(s_t^1 = 1, s_{t−1}^1 = 2) )
      ( 1(s_t^1 = 2, s_{t−1}^1 = 1) )
      ( 1(s_t^1 = 2, s_{t−1}^1 = 2) ),

H = ( ν_11  ν_11  ν_12  ν_12 )
    ( ν_21  ν_22  ν_21  ν_22 ),
F = ( p_11  p_11  0     0    )
    ( 0     0     p_21  p_21 )
    ( p_12  p_12  0     0    )
    ( 0     0     p_22  p_22 ).
As in the last example, independent regime shifts in both equations imply a restricted MS(4)-VAR(p) model:
ξ_t = ( 1(s_t^1 = 1, s_t^2 = 1) )
      ( 1(s_t^1 = 1, s_t^2 = 2) )
      ( 1(s_t^1 = 2, s_t^2 = 1) )
      ( 1(s_t^1 = 2, s_t^2 = 2) ),

H = ( ν_11  ν_11  ν_12  ν_12 )
    ( ν_21  ν_22  ν_21  ν_22 ),

F = F^1 ⊗ F^2 = ( p^1_11 p^2_11   p^1_11 p^2_21   p^1_21 p^2_11   p^1_21 p^2_21 )
                ( p^1_11 p^2_12   p^1_11 p^2_22   p^1_21 p^2_12   p^1_21 p^2_22 )
                ( p^1_12 p^2_11   p^1_12 p^2_21   p^1_22 p^2_11   p^1_22 p^2_21 )
                ( p^1_12 p^2_12   p^1_12 p^2_22   p^1_22 p^2_12   p^1_22 p^2_22 ).
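In other words, the transition matrix of the combined regime process is the Kronecker product of the two individual transition matrices, which is easy to verify numerically (the probability values below are arbitrary illustrations):

```python
import numpy as np

# Column-stochastic transition matrices of two independent 2-state chains,
# F_k[i, j] = Pr(s_{t+1}^k = i | s_t^k = j).
F1 = np.array([[0.9, 0.3],
               [0.1, 0.7]])
F2 = np.array([[0.8, 0.4],
               [0.2, 0.6]])

# Combined 4-state chain over (s^1, s^2), states ordered (1,1), (1,2), (2,1), (2,2).
F = np.kron(F1, F2)
```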
Test procedures for these and other restrictions associated with a specification analysis of estimated MS-VAR models will be discussed in Section 7.4. Notice that parsimony with regard to the number of regimes is extremely desirable, since the number of observations which are available for the estimation of the regime-dependent parameters and the transition probabilities shrinks dramatically when the number of regimes increases.
In this section we will discuss some problems related to the specification of MS-VAR models based on ARMA representations. In particular, we present a strategy for selecting simultaneously the state dimension M of the Markov chain and the order p of the autoregression, based on model selection procedures for the order of a univariate ARMA model (or a final equations form VARMA model).
This approach is based on the VARMA representation theorems for MSM(M)- and MSI(M)-VAR(p) processes, which have been derived in Chapter 3 (cf. Table 7.3). In conclusion, an ARMA structure in the autocovariance function may reveal the characteristics of a data generating MS-AR process. In the class of MSI-AR models there exists for any ARMA(p*, q*) representation with p* ≥ q* ≥ 1 a unique MSI(M)-AR(p) model with M = q* + 1 and p = p* − q*. This result is summarized in Table 7.4. Even if the regularity conditions do not hold, so that Table 7.3 provides only the maximal orders, the specifications given in Table 7.4 are the most parsimonious MSI-AR and MSM-AR models.
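The mapping of Table 7.4 is a two-line rule; a sketch (the function name and error handling are ours):

```python
def msi_from_arma(p_star, q_star):
    """Most parsimonious MSI(M)-AR(p) model for an ARMA(p*, q*) representation."""
    if not (p_star >= q_star >= 1):
        raise ValueError("requires p* >= q* >= 1")
    M = q_star + 1          # number of regimes
    p = p_star - q_star     # autoregressive order
    return M, p
```

An ARMA(3,1) autocovariance structure, for example, points to an MSI(2)-AR(2) model.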
Since our results are closely related to POSKITT & CHUNG [1994], it seems straightforward to adopt their statistical procedures for identifying the state dimension of the Markov chain. Based on linear least squares estimations, the identification process is consistent for hidden Markov-chain models. However, their approach for identifying the number of states takes explicit account of the special structure of hidden Markov chains. An adjustment of their procedures to the conditions of the models under consideration, as well as a general discussion of the statistical properties of the proposed procedures, will be left for further research.
The representation theorems reduce the problem of selecting the number of states and the order of the autoregression to the specification of ARMA models. Therefore, the determination of the number of regimes, as well as the number of autoregressive parameters, can be based on currently available procedures to estimate the order of ARMA models. In principle, any of the existing model selection criteria may be applied for identifying M and p. To restrict the computational burdens associated with the non-linearities of maximum likelihood estimation, model selection criteria may be preferred which are based on linear LS estimations (e.g. HANNAN & RISSANEN [1982] and POSKITT [1987]). Alternatively, for specifying univariate ARMA models,
7.3. Model Checking 131
In the case of vector-valued processes, identification techniques can be based on well-established
estimation procedures for the order of a final equations VARMA representation
(cf. LÜTKEPOHL [1991]). A problem that should be mentioned is that the
final equations VARMA models lead only to restrictions on M + p. This is clearly
a disadvantage, as is the possibly large number of parameters. We have therefore
restricted our attention to the specification of univariate Markov-switching models.
We continue the discussion with model-checking. For this task, some descriptive
model-checking tools are introduced in the following section.
As in the linear regression model, checking might be based on the presence of struc-
tures in the estimated errors. In the MS-VAR model three alternative definitions can
be distinguished:
e_{t|t−1} = y_t − E[y_t | Y_{t−1}; λ] = y_t − X_t B F ξ̂_{t−1|t−1},

where the one-step prediction error e_{t|t−1} is based on the predicted regime probabilities
ξ̂_{t|t−1} and the residual û_t on the smoothed regime probabilities ξ̂_{t|T}.
Thus, e_{t|t−1} is a vector martingale difference sequence with respect to the information
set Y_{t−1}.
If sample moments of the conditional residuals û_mt are computed, they have to be
weighted with their smoothed regime probabilities ξ̂_{mt|T}, as in the maximization step in
Chapter 6. For example, the sample variance of each series of conditional residuals
may be helpful for a test of the homoskedasticity assumption.
A test to determine whether the residuals e_{t|t−1} are white noise can be used, while
to test whether the regime-dependent residuals û_t are white noise, the residuals are
weighted with their regime probabilities.
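The weighting of the conditional residuals by their smoothed regime probabilities can be sketched as follows (variable and function names are ours):

```python
import numpy as np

def regime_weighted_variance(u_m, xi_smooth_m):
    """Sample variance of the conditional residuals of regime m, each
    observation weighted by its smoothed regime probability xi_{mt|T};
    useful for an informal check of the homoskedasticity assumption."""
    w = np.asarray(xi_smooth_m, dtype=float)
    u = np.asarray(u_m, dtype=float)
    return float(np.sum(w * u ** 2) / w.sum())
```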
Model checking techniques have to take into account the non-normality of the prediction
errors and conditional distributions. Hence, statistical devices employed should
not rely on a normality assumption concerning the prediction errors or conditional
densities of the endogenous variables.
Typical statistical tools for checking linear models are the residual autocorrelations
and the portmanteau statistic. Since we are not sure about the asymptotic distribution
of the residual autocorrelation, such an instrument can be used only as a descriptive
device. In the time-invariant VAR case, prediction tests for structural change are well-established
model checking devices. In MS-VAR models, however, the predicted
density of y_{t+h|t} is no longer normal and standard statistics cannot be used uncritically.
Given the asymptotic normality of the ML estimator, the model specification
can be tested analogously to time-varying models with deterministic model shifts like
periodic models (cf. inter alia LÜTKEPOHL [1991, ch. 12]).
In accordance with the linear regression model, the fit of the data might be measured
with the coefficient of determination

R² := 1 − s_e² / s_y²,

where the one-step prediction errors e_{t|t−1} are used to measure the fit of the data.
A correction for the bias towards preferring the larger model can be obtained by multiplying
R² with the ratio of the degrees of freedom and the number of observations,

R̄² = 1 − (T − 1) / (T − M(M − 1 + K) − Kp − K(K + 1)/2 − 1) · (1 − R²),

R̄² = 1 − (T − 1) / (T − M(M − 1 + K + K(K + 1)/2) − Kp − 1) · (1 − R²).   (7.1)
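The degrees-of-freedom correction in (7.1) is straightforward to compute; a small sketch using the two parameter counts given above (the function name is ours):

```python
def adjusted_r2(r2, T, M, K, p, regime_dep_cov=False):
    """Degrees-of-freedom corrected R^2 as in equation (7.1); the two
    parameter counts correspond to a common vs. a regime-dependent
    variance-covariance matrix."""
    if regime_dep_cov:
        n_par = M * (M - 1 + K + K * (K + 1) // 2) + K * p + 1
    else:
        n_par = M * (M - 1 + K) + K * p + K * (K + 1) // 2 + 1
    return 1.0 - (T - 1) / (T - n_par) * (1.0 - r2)
```

The correction always lowers R², penalizing the more richly parameterized specification.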
If the conditions under which the standard asymptotic distribution theory holds are
satisfied, the likelihood ratio, Lagrange multiplier and Wald tests of most hypotheses
of interest all have the usual null distributions. Unfortunately, for one important exception
standard asymptotic distribution theory cannot be invoked, namely, hypothesis
tests of the number of states of the Markov chain. Specification procedures
for those hypotheses altering the number of regimes under the null will be discussed
in Section 7.5. Before that, testing under standard asymptotics is considered in the
following Sections 7.4.1 to 7.4.4.
7.4. Specification Testing 135
More details can be found in LÜTKEPOHL [1991, sec. C.5]. A necessary condition
for the validity of these standard results is that the number of regimes M is unaltered
under the null. The critical situation, where the number of regimes changes, will be
discussed in Section 7.5.
As long as the number of regimes remains unchanged under the null, t-tests and F-tests
concerning linear restrictions of the VAR coefficient vector θ can be performed
as in linear models. Note, however, that the calculation of the variance-covariance
matrix obviously differs from that in the linear regression model.
Under the same conditions which ensure the applicability of standard asymptotics,
the LR test statistic has the same asymptotic distribution under the null hypothesis
as the Lagrange multiplier statistic and the Wald statistic.
The scores can also be used to implement Lagrange multiplier (LM) tests of the hypothesis (7.4).
While the scores of an unrestricted model have sample mean zero by construction, as
discussed in Section 6.6.2,

s(λ) = Σ_{t=1}^{T} Ψ_t(λ)′ ξ̂_{t|T} = 0,   (7.5)

consider, for example, the null hypothesis

H₀: vech(Σ₁) = ⋯ = vech(Σ_M)   (7.6)

vs. H₁: vech(Σ_i) ≠ vech(Σ_m) for at least one i ≠ m (MSMH-VAR).
Here û_mt(γ) = y_t − X_mt γ are the residuals at time t associated with regime m,

D_K = ∂ vec(Σ_m) / ∂ vech(Σ_m)′

is the (K² × K(K + 1)/2) duplication matrix as in (6.31), and Σ_m = Σ is valid under the null.
The LM test is especially suitable for model checking because testing different model
specifications against a maintained model is straightforward. A new estimation is not
required as long as the null hypothesis is not altered. Note that the LM test operates
under the following conditions:
• The model is estimated under the null so that, for all unrestricted parameters,
the score s(λ̂_r) is zero. The scores of the last R elements are calculated according
to equations (6.8) and (6.10). Their magnitude reflects how much the
likelihood function increases if the constraints are relaxed.
Suppose that the parameter vector is partitioned as λ = (λ₁, λ₂) and the interest
centers on linear restrictions on the parameter vector λ₂,
while there is no constraint given for the parameter vector λ₁. Then the relevant Wald
statistic can be expressed as
To make the procedure a bit more transparent, it may be helpful to consider some
applications. Equality of the regime-dependent means corresponds to the restriction

[ 1_{M−1} ⊗ I_K  :  −I_{M−1} ⊗ I_K ] (μ₁′, …, μ_M′)′ = 0,

while the homoskedasticity hypothesis (7.6) is tested against H₁: the MSMH(M)-VAR(p) model via

[ 1_{M−1} ⊗ I_{K(K+1)/2}  :  −I_{M−1} ⊗ I_{K(K+1)/2} ] (σ₁′, …, σ_M′)′ = 0.
As suggested by WHITE [1987], tests can be based on the conditional scores by using
the fact that the scores should be serially uncorrelated,

E[ Ψ_t(λ) Ψ_{t−1}(λ)′ ] = 0.   (7.9)
Examples are tests for residual autocorrelation and for ARCH effects. Unfortunately,
HAMILTON [1991b] found that these tests have poor small sample properties.
7.5. Determination of the Number of Regimes 141
A special problem which arises with the MS-VAR model is the determination of the
number of states required for the Markov process to characterize the observed process.
Testing procedures suffer from non-standard asymptotic distributions of the
likelihood ratio test statistic due to the existence of nuisance parameters under the
null hypothesis. For the derivation of the asymptotic null distribution, procedures
have been proposed by HANSEN [1992] and GARCIA [1993].
theory.³ Hence, classical critical values may be used to check that the null cannot be
rejected if LR < χ²_{1−α}(1). For this series of MS(2)-AR(p) models, GARCIA
[1993] shows that the asymptotic distribution is close to the small sample distribution,
whereby the procedures proposed by HANSEN [1996b] to simulate central χ² processes
have been employed. However, this approach is computationally demanding
and therefore only of limited use for empirical research with highly parameterized
models and vector systems.
The test procedures suggested by HANSEN and GARCIA are closely related to DAVIES'
[1977] bounded likelihood ratio test. The point of these procedures is to avoid
the problem of estimating the q nuisance parameters λ_n by setting a grid of values of
the nuisance parameters, estimating the remaining vector of identified parameters λ_i,
considering the likelihood ratio statistic conditioned on the value of the nuisance
parameters, and constructing a test statistic based on the resulting values of the objective function,

LR = sup_{λ_n} LR(λ_n).
As shown by ANDREWS & PLOBERGER [1994], sup LR(λ_n) is not the optimal test;
the optimal test has an average exponential form. However, the power of the LR test is almost
insensitive to the choice of sup LR(λ_n).
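The grid construction behind these procedures can be illustrated with a deliberately simple toy in which the nuisance parameter is a single break date, so that the conditional LR statistic is available in closed form; this sketches the sup-LR idea only, not HANSEN's or GARCIA's actual algorithms:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=120)                 # data generated under the null: no shift
T = len(y)

def lr_given_break(tau):
    # LR statistic for a mean shift at date tau with known unit variance;
    # conditional on the nuisance parameter tau the statistic is closed-form
    m1, m2, m = y[:tau].mean(), y[tau:].mean(), y.mean()
    ssr0 = ((y - m) ** 2).sum()
    ssr1 = ((y[:tau] - m1) ** 2).sum() + ((y[tau:] - m2) ** 2).sum()
    return ssr0 - ssr1                   # = 2 * (log L1 - log L0) for sigma = 1

grid = range(int(0.15 * T), int(0.85 * T))   # trimmed grid of candidate dates
sup_lr = max(lr_given_break(tau) for tau in grid)
```

The supremum over the grid is then compared against non-standard (simulated or bounded) critical values rather than the usual χ² quantiles.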
DAVIES [1977] has derived an upper bound for the significance level of the likelihood
ratio test statistic under nuisance parameters, which might be applied to a test of the
null hypothesis of M - 1 states. If the likelihood has a single peak, the following
approximation is valid:
3The critical values depend on the value of the autoregressive parameter, but in no case is the 5% critical
value less than eight.
In addition, the so-called J-test for non-nested models of DAVIDSON & MACKINNON
[1981] can be applied. The model with the larger number of states M is estimated
and the fitted values ŷ_t^{(M)} are inserted into the regression of y_t in a model with
M − 1 states,

y_t = (1 − δ) X_t B̂^{(M−1)} + δ ŷ_t^{(M)} + ε_t,

where ŷ_t^{(M)} = X_t B̂^{(M)} ξ̂_{t|T}^{(M)}. Then the coefficient δ is subject to a t-test.
An application of these testing procedures to MSM(M)-VAR(p) and MSMH(2)-VAR(p)
models is discussed in GARCIA & PERRON [1990].
As in classical theory, the LR, LM and Wald statistics all have the same asymptotic
distribution under the null, as shown by HANSEN [1996b]. In order to make
these procedures a bit more transparent, we briefly sketch a Wald test of the hypothesis
H₀: μ* = μ₁ − μ₂ = 0 against the alternative μ* = μ₁ − μ₂ ≠ 0, as
considered by CARRASCO [1994] for the MSI(2)-AR(0) model.⁴ The ML estimates
μ̂* = μ̂₁ − μ̂₂ and μ̂₂ have a joint limiting distribution of the form
The Wald statistic is given by the usual quadratic form.
Unfortunately, p is a vector of nuisance parameters that are not identified under the
null. For given transition probabilities p, the Wald test statistic would have its standard
asymptotic distribution under the null.
⁴CARRASCO [1994] also derives the asymptotic distribution of the Wald statistic of a threshold model
and a structural change model when the true model is a misspecified Markov-switching model, and
constructs a Wald encompassing test (cf. MIZON & RICHARD [1986]) of the structural change model
by the Markov-switching model.
The relevant statistic, sup_{p∈P} LW_T(p), then converges in distribution to a non-standard limit.
In this chapter we have just scratched the surface of model selection and checking
techniques in MS-VAR models. It must be emphasized that the previous analysis
rests on some basic assumptions, and most of the presented results will not hold
without them. Furthermore, investigations of the small sample properties of the employed
statistical tests are needed.
Model selection and model checking represent an important area concerning empirical
investigations with MS-VAR models. Therefore, the development of an asymptotic
theory and of new statistical tools for the specification of MS-VAR processes
merits future research.
Chapter 8
Multi-Move Gibbs Sampling
In this chapter we discuss the use of simulation techniques to estimate and forecast
MS-VAR processes. A general feature of MS-VAR models is that they approximate
non-linear processes as piecewise linear by restricting the processes to be linear in
each regime. Since the distribution of the observed variable y_t is assumed normal
conditional on the unobserved regime vector ξ_t, the MS-VAR model is well suited
for Gibbs sampling techniques.
The Gibbs sampler has become increasingly popular as a result of the work of GEMAN
& GEMAN [1984] in image processing and GELFAND & SMITH [1990] in
data analysis (cf. SMITH & ROBERTS [1993]). In particular, the Gibbs sampler is
quite tractable for parameter estimation with missing values; see for example RUANAIDH
& FITZGERALD [1995]. The crucial point is that the unobservable states
can be treated as additional unknown parameters. Thus, the joint posterior distribution
of parameters and regimes can be analyzed by Monte Carlo methods.
Existing Gibbs sampling approaches¹ for MS(2)-AR(p) models have been introduced
independently by ALBERT & CHIB [1993] and MCCULLOCH & TSAY
[1994b]. ALBERT & CHIB [1993] present a single-move Gibbs sampler for an
MSM/MSMH(2)-AR(p) model, while MCCULLOCH & TSAY [1994b] consider
a more general MS(2)-ARX(p) model. The latter approach has been applied by
GHYSELS [1994] to periodic MS-AR models. An extended version has been used
by FILARDO [1994] to estimate an MS-AR model with time-varying transition
probabilities. Unfortunately, Gibbs samplers available in the literature are restricted
to univariate time series and to the presence of only two regimes.
There is a wide range of views about the appropriate way to develop a Gibbs sampler
for a given problem. For the purpose of a reduction in correlation between consequent
iterations of the Gibbs sampling algorithm, and thus increased convergence
and efficiency, we suggest ways of modifying the single-move Gibbs sampling approach
of ALBERT & CHIB [1993] and MCCULLOCH & TSAY [1994b] to a multi-move
sampler. The difference between the single-move and multi-move Gibbs sampler
lies in the generation of the state variables. While the single-move Gibbs sampler
generates each state variable ξ_t conditional on the observations Y_T = (y₁′, …, y_T′)′
and all other generated regimes ξ_{−t} = (ξ₁′, …, ξ_{t−1}′, ξ_{t+1}′, …, ξ_T′)′,
the multi-move Gibbs sampler produces the whole state vector ξ = (ξ₁′, …, ξ_T′)′
simultaneously from the joint probability distribution given the sample Y_T and the
parameter vector λ,

ξ ← Pr(ξ | Y_T, λ).
This multi-move sampling of the regime vector ξ is implemented by incorporating
the slightly revised filtering and smoothing algorithms for MS-VAR models which
have been discussed in Chapter 5. The aim of this modification is to reduce the correlation
between the draws of consequent iterations. Thus, an increased speed of
convergence of the Gibbs sampler to the desired posterior distribution and an efficiency
of estimates relative to the algorithms proposed in the previous literature can
be achieved.
The chapter is organized as follows: we start our discussion with a brief introduction
to the Gibbs sampling technique. In the following sections it is shown that
generating the complete regime vector ξ is straightforward for Markov-switching
time series models by using the smoothed full-sample probabilities ξ̂_{t|T}. Again, this
is a bit more sophisticated for MSM specifications. Given the regimes ξ, Bayesian
inference about the parameter vector λ is quite standard. The conditional posterior
distribution of the transition probabilities can be derived as in Markov-chain models.
In this chapter, the Bayesian analysis is based on a generalized MS(M)-VAR(p)
model, which is linear in the vector γ of VAR parameters. Finally, the usage of the
Gibbs sampler for prediction purposes is discussed.
8.1. Bayesian Analysis via the Gibbs Sampler 147
The Gibbs sampler is an iterative Monte Carlo technique that breaks down the problem
in Bayesian time series analysis of drawing samples from a multivariate density
such as p(ξ, λ|Y_T) into drawing successive samples from lower dimensional (in
particular univariate) densities. Thus the regimes ξ and the parameter vector λ are
drawn from the smoothed regime probability distribution Pr(ξ|Y_T, λ) and the conditional
density p(λ|ξ, Y_T). Following a cyclical iterative pattern, the Gibbs sampler
generates the joint distribution p(ξ, λ|Y_T) of ξ and λ. TIERNEY [1994] proves the
convergence of the Gibbs sampler under appropriate regularity conditions. A general
discussion of the involved numerical Bayesian methods can be found in RUANAIDH
& FITZGERALD [1995].
The main idea of the Gibbs sampler is to construct a Markov chain on (ξ, λ) such that
the limiting distribution of the chain is the joint distribution p(ξ, λ|Y_T). Given the
data set Y_T and initial values² λ^(0), the Gibbs sampler consists of a sequence of moves
at each iteration j ≥ 1, where the parameter vector has been partitioned and λ_{−i} is the complement to λ_i.
More precisely, we have

λ_{−i}^{(j−1)′} = ( λ₁^{(j)′}, …, λ_{i−1}^{(j)′}, λ_{i+1}^{(j−1)′}, …, λ_R^{(j−1)′} ).

Each iteration involves a pass through the conditional probability distributions. As soon as
a variate is drawn, it is substituted into the conditional probability density functions.
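The cyclical pass described above can be sketched as a control-flow skeleton; the two draw functions are placeholders for the conditional distributions derived later in the chapter:

```python
import numpy as np

def gibbs(y, draw_regimes, draw_params, lam0, n_burn, n_keep, seed=0):
    """Schematic Gibbs cycle for p(xi, lambda | Y_T): alternate draws from
    Pr(xi | Y_T, lambda) and p(lambda | xi, Y_T); discard the first n_burn
    cycles and keep the last n_keep.  The draw_* callables are placeholders."""
    rng = np.random.default_rng(seed)
    lam, kept = lam0, []
    for j in range(n_burn + n_keep):
        xi = draw_regimes(y, lam, rng)
        lam = draw_params(y, xi, rng)
        if j >= n_burn:
            kept.append((xi, lam))
    return kept

# toy conditionals, only to exercise the control flow
kept = gibbs(np.zeros(5),
             draw_regimes=lambda y, lam, rng: rng.integers(0, 2, size=len(y)),
             draw_params=lambda y, xi, rng: float(xi.mean()),
             lam0=0.5, n_burn=100, n_keep=50)
```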
The Gibbs sampler produces a series of j = 1, …, N₁, …, N₁ + N₂ dependent
drawings by cycling through the conditional posteriors. To avoid an effect of the
starting values on the desired joint densities and to ensure convergence, the first N₁
draws are discarded and only the simulated values from the last N₂ cycles are used.
The simulated values (ξ^(j), λ^(j)), j = N₁ + 1, …, N₁ + N₂ are regarded as an
approximate simulated sample from p(ξ, λ|Y_T). To compute the posterior density
²For their MS(2)-ARX(p) model, MCCULLOCH & TSAY [1994b] propose to use the estimates from a
linear multiple regression (M = 0) as initial parameter values.
148 Multi-Move Gibbs Sampling
p̂(λ_i | Y) = (1/N₂) Σ_{j=N₁+1}^{N₁+N₂} p( λ_i^{(j)} | λ_{−i}, ξ, Y_T ).   (8.1)
As emphasized by ALBERT & CHIB [1993], the numerical standard error σ_i of the
estimate λ̂_i cannot be calculated as usual by s_i/√N₂, where s_i is the standard deviation
of λ_i in the sampled series.
This results from the fact that the quantities involved are sums of correlated observations.
However, this effect can be corrected by invoking the batch-means method
(cf. RIPLEY [1987]): the sample is divided into n batches of size N₂/n, such that the
lag correlation of the batch means is just under a given c, e.g. c = 5%. Then the
numerical standard error is estimated by σ̂_i = s_i/√n as if the batch means would
constitute the sample, where s_i is now the standard deviation of the batch means
λ̄_i^{(k)}, k = 1, …, n:

s_i² = (1/n) Σ_{k=1}^{n} ( λ̄_i^{(k)} − Ê[λ_i] )².
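A minimal sketch of the batch-means estimator (with a fixed number of batches rather than the lag-correlation criterion of the text):

```python
import numpy as np

def batch_means_se(draws, n_batches=20):
    """Numerical standard error of a posterior mean from correlated Gibbs
    output: split the retained chain into batches and treat the batch
    means as an approximately uncorrelated sample."""
    draws = np.asarray(draws, dtype=float)
    m = len(draws) // n_batches
    means = draws[: m * n_batches].reshape(n_batches, m).mean(axis=1)
    return means.std(ddof=1) / np.sqrt(n_batches)

rng = np.random.default_rng(0)
chain = rng.normal(size=10_000)          # stand-in for N2 retained draws
se = batch_means_se(chain)
```

For the uncorrelated toy chain above, the result is close to the naive s/√N₂; for genuinely autocorrelated Gibbs output the batch-means estimate is larger.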
As suggested by GEWEKE [1994] and PFANN et al. [1995], some quantities can be
more easily and accurately computed by using the analytical expression for the conditional
expectation and averaging over the conditional expectation. For example,
the expected value of λ_i can be calculated as the average conditional mean,
where λ̄_i^{(j)} is the mean of the conditional posterior distribution of λ_i^{(j)} at the j-th
iteration of the Gibbs sampler. Analogously, the variance can be estimated as the
8.2. Bayesian Analysis of Linear Markov-Switching Regression Models 149
The parameters p = vec(P) of the Markov chain, the scale parameter vectors σ_m =
vech(Σ_m), σ = (σ₁′, …, σ_M′)′, and the location parameters γ are collected in the
parameter vector³ λ = (γ′, σ′, p′)′. Under homoskedasticity of the Gaussian white
noise u_t, a parameter vector λ = (γ′, σ′, p′)′ with σ = vech(Σ) is used.
The conditional densities required for the Gibbs sampler can be derived from the
likelihood function. For given ξ the likelihood function is determined by the density
function p(Y_T|ξ, λ), where u_t(γ) = y_t − [(1, ξ_t′) ⊗ I_K] X_t γ. For purposes of
estimation a slightly different formulation of the likelihood function is useful,
in which the residuals are stacked into u = (u₁′, …, u_T′)′ and W⁻¹ = diag(W₁⁻¹, …, W_T⁻¹)
is the block-diagonal weighting matrix whose blocks are determined by the regimes,
W_t⁻¹ = Σ_{m=1}^{M} ξ_{mt} Σ_m⁻¹.
Suppose that the prior density of λ_i is denoted by p(λ_i|ξ, λ_{−i}); then the conditional
posterior density of λ_i is given by
⁴For the precision matrix as the inverse of the variance-covariance matrix Σ, HAMILTON [1991a], following
DEGROOT [1970], suggests the use of a Wishart distribution Σ_m⁻¹ ∼ W(a_m, A_m) with a_m
degrees of freedom and a (K × K) precision matrix A_m, such that

p(Σ_m⁻¹) ∝ |Σ_m⁻¹|^{(a_m−K−1)/2} exp[ −(1/2) tr(A_m Σ_m⁻¹) ].
and thus closely related to the ML estimator discussed in the previous chapter.
We continue with the derivation of the posterior probability distributions for the general
linear Markov-switching model under consideration. Our approach is summarized
in Table 8.1 on page 164, which presents the Gibbs sampling algorithm.
We begin the presentation of the Gibbs sampler by discussing the derivation of the
posterior distribution of the regime vector ξ. In the Gibbs samplers proposed by
ALBERT & CHIB [1993] and MCCULLOCH & TSAY [1994a], the states are generated
one at a time ("single move"), utilizing the Markov properties to condition on
neighboring states (cf. CARLIN et al. [1992]). Unfortunately, since the regimes are
highly correlated, the desired asymptotic distribution of the sampler might be approached
only very slowly. MCCULLOCH & TSAY [1994b, p. 529] mention that
drawing such highly dependent variables together speeds up convergence. Therefore,
they propose to sample the regimes from the conditional probability distribution
Pr(ξ_t, …, ξ_{t+k−1} | Y_T, ξ₁, …, ξ_{t−1}, ξ_{t+k}, …, ξ_T, λ) for an arbitrary k.
The use of a multi-move Gibbs sampler has been suggested independently by SHEPHARD
[1994] and CARTER & KOHN [1994] for related time series models. Among
other partially non-Gaussian state-space models, SHEPHARD [1994] considers a
state-space model where the intercept term depends on a binary Markov chain of
the transition equation and where the innovations are normally distributed. CARTER
& KOHN [1994] consider a linear state-space model with varying coefficients
8.3. Multi-Move Gibbs Sampling of Regimes 153
and errors that are a mixture of normals. The approach is applied to an MSH(2)-AR(0)
model which has been used by BOX & TIAO [1968]. Following ANDERSON
& MOORE [1979], it is shown that a smoothing algorithm related to KIM [1994] can
be used to generate the conditional probability distribution of the regimes. An application
to a switching regression state-space model used by SHUMWAY & STOFFER
[1991] is mentioned, but without going into details. The approach is then supported
theoretically by the results of LIU et al. [1994], who show that generating variables
simultaneously produces faster convergence than generating them one at a time.
In the following section we derive the algorithm for multi-move Gibbs sampling. It
is shown that the conditional posterior distribution of regimes involves the smoothed
regime probabilities ξ̂_{t|T}. Therefore, the Gibbs cycle is closely related to the EM
algorithm for ML estimation, since it makes use of the same filtering and smoothing
procedures.
In this section we use the multi-move Gibbs sampling approach, generating all the
states at once by taking advantage of the structure of the Markov chain,

Pr(ξ | Y_T) = Pr(ξ_T | Y_T) ∏_{t=1}^{T−1} Pr(ξ_t | ξ_{t+1}, Y_t).   (8.8)
Equation (8.8) is analogous to Lemma 2.1 in CARTER & KOHN [1994], where it is
derived for conditionally normally distributed state variables.
Thus, to generate ξ from the posterior Pr(ξ|Y_T), we first draw ξ_T from Pr(ξ_T|Y_T),
that is, the smoothed full-sample probability distribution, which can be derived with
the BLHK filter. Then ξ_t, t = T − 1, …, 1, is generated from Pr(ξ_t|ξ_{t+1}, Y_T).
In the course of the discussion of KIM's smoothing algorithm it has been shown that
the distribution Pr(ξ_t|ξ_{t+1}, Y_T) is equal to Pr(ξ_t|ξ_{t+1}, Y_t) and, thus, can be deduced
from
ξ_t | (ξ_{t+1}, Y_t)  ∼  [ F′( ξ_{t+1} ⊘ ξ̂_{t+1|t} ) ] ⊙ ξ̂_{t|t},   (8.10)
where ⊙ and ⊘ denote the element-wise matrix multiplication and division, respectively.
With the exception that the generated ξ_{t+1} is used instead of the smoothed probabilities
ξ̂_{t+1|T}, equation (8.10) works analogously to the smoothing procedure involved
in the EM algorithm of ML estimation.
Here ξ̂_{T|T} = ( Pr(ξ_T = ι₁|Y_T), …, Pr(ξ_T = ι_M|Y_T) )′, and
ξ_t|ξ_{t+1}, Y_T = ( Pr(ξ_t = ι₁|ξ_{t+1}, Y_T), …, Pr(ξ_t = ι_M|ξ_{t+1}, Y_T) )′
denotes the probability distribution of ξ_t conditional on the previously drawn regime
vector ξ_{t+1} and the sample information Y_T. To ensure identification at the determination
of the conditional probability distributions of the transition and regime-dependent
parameters (see Section 8.4), a sample can be accepted only if it contains
at least one draw of each regime.
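The backward recursion of (8.8) and (8.10) can be sketched as follows, with the filtered probabilities and the transition matrix taken as given (argument names are ours):

```python
import numpy as np

def draw_regime_path(filtered, P, rng):
    """Multi-move draw of the regime path from Pr(xi | Y_T, lambda):
    draw xi_T from the filtered distribution xi_{T|T}, then recurse
    backwards with Pr(s_t = i | s_{t+1}, Y_t) proportional to
    P[i, s_{t+1}] * xi_{t|t}(i).  `filtered` is the (T x M) matrix of
    filtered probabilities; P[i, j] = Pr(s_{t+1} = j | s_t = i)."""
    T, M = filtered.shape
    path = np.empty(T, dtype=int)
    path[-1] = rng.choice(M, p=filtered[-1])
    for t in range(T - 2, -1, -1):
        w = P[:, path[t + 1]] * filtered[t]
        path[t] = rng.choice(M, p=w / w.sum())
    return path

rng = np.random.default_rng(0)
filt = np.full((10, 2), 0.5)
path = draw_regime_path(filt, np.eye(2), rng)   # an absorbing chain yields a constant path
```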
In contrast to the handling of initial states of the Markov chain in the EM algorithm of
maximum likelihood estimation, we assume that the regimes in t = 0, …, 1 − p are
generated from the same Markov process as the regimes in the sample t = 1, …, T.
Assuming that the Markov process is ergodic, there exists a stationary probability
distribution Pr(ξ_t|p), where the discrete probabilities can be included in the vector
ξ̄ = ξ̄(p). Irreducibility ensures that the ergodic probabilities are strictly positive,
ξ̄_m > 0 for all m = 1, …, M.
Note that the determination of ξ̄ has already been discussed in the first chapter. The
estimation procedures established there are unaltered whether the single-move or the
multi-move Gibbs sampler is used for drawing the state vector ξ.
8.4. Parameter Estimation via Gibbs Sampling 155
Therefore, the conditional distribution can be described with the help of the sample
estimates: let n_ij denote the number of transitions from regime i to j in the sample
of ξ, and define n_i = Σ_{j=1}^{M} n_ij. Then the likelihood function of p is given by

∏_{t=1}^{T} Pr(ξ_t | ξ_{t−1}, p) = ∏_{i=1}^{M} ∏_{j=1}^{M} (p_ij)^{n_ij}.

This formulation of the likelihood function does not take explicit account of the adding-up
restriction on the transition probabilities. Given that p_iM = 1 − Σ_{j=1}^{M−1} p_ij
and n_iM = n_i − Σ_{j=1}^{M−1} n_ij for all i = 1, …, M, the likelihood function of p equals

p(p|ξ) = ∏_{i=1}^{M} [ ∏_{j=1}^{M−1} p_ij^{n_ij} ] ( 1 − Σ_{j=1}^{M−1} p_ij )^{n_iM}.

For the two-regime case as discussed in the literature, it can be easily seen that the
desired posterior is a product of independent Beta distributions,

p(p|ξ) ∝ p₁₁^{n₁₁} (1 − p₁₁)^{n₁₂} · p₂₂^{n₂₂} (1 − p₂₂)^{n₂₁}.
In generalization of this procedure we can deal with equation (8.13) as follows. Calculate
the distribution of p_ij conditional on p_i1, …, p_{i,j−1}, p_{i,j+1}, …, p_{i,M−1} as

p*_ij = ( 1 − Σ_{m=1}^{j−1} p_im − Σ_{m=j+1}^{M−1} p_im )⁻¹ p_ij,   (8.14)

which has a standard Beta distribution with hyperparameters n_ij, n_iM as its conditional
posterior. To generate the transition probability p_ij, we first sample p*_ij from
this Beta distribution,

p*_ij ∼ Beta(n_ij, n_iM),   (8.15)

and then transform the draw p*_ij into the corresponding parameter of interest,

p_ij = ( 1 − Σ_{m=1}^{j−1} p_im − Σ_{m=j+1}^{M−1} p_im ) p*_ij.   (8.16)
This procedure is iterated for j = 1, …, M − 1, while the transition probability p_iM
is determined by the adding-up restriction

p_iM = 1 − Σ_{j=1}^{M−1} p_ij,   (8.17)

where i = 1, …, M.
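Assuming a flat prior, the row posterior implied by the transition counts is Dirichlet(n + 1), and a joint draw by successive Beta variates, a variant of the conditional scheme in (8.14)-(8.17), can be sketched as:

```python
import numpy as np

def draw_transition_row(counts, rng):
    """Draw row i of the transition matrix from the Dirichlet(n + 1)
    posterior implied by the transition counts (n_i1, ..., n_iM) of a
    simulated regime path, via stick-breaking Beta draws.  The row sums
    to one by construction (cf. the adding-up restriction (8.17))."""
    n = np.asarray(counts, dtype=float)
    M = len(n)
    p = np.zeros(M)
    remaining = 1.0
    for j in range(M - 1):
        # Beta(n_j + 1, sum of remaining pseudo-counts) for the stick share
        star = rng.beta(n[j] + 1.0, n[j + 1:].sum() + M - 1 - j)
        p[j] = remaining * star
        remaining -= p[j]
    p[M - 1] = remaining
    return p

rng = np.random.default_rng(0)
rows = np.array([draw_transition_row([5, 3, 2], rng) for _ in range(2000)])
```

The posterior mean of p_i1 for counts (5, 3, 2) is (5 + 1)/13, which the simulated rows approximate.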
Consider the likelihood p(Y_T|ξ, λ), where u_t(ξ_t, γ) = ξ_t′ u_t, u* = [diag(ξ) ⊗ I_K] u,
and W*⁻¹ = (I_T ⊗ Σ⁻¹). By collecting the elements of u* in a (T × K) matrix
U* = (u₁*, …, u_T*)′, we have

u*′ W*⁻¹ u* = u*′ (I_T ⊗ Σ⁻¹) u* = Σ_{t=1}^{T} u_t*′ Σ⁻¹ u_t* = Σ_{t=1}^{T} tr( u_t* u_t*′ Σ⁻¹ ).   (8.18)
Thus, the joint probability distribution of the K(K + 1)/2 elements of Σ is the inverse
Wishart distribution, where the mean E[Σ⁻¹] = (a − K − 1)A of the conditional density of Σ⁻¹ is exactly
the inverse of the ML estimate of Σ under the conditions considered,

Σ̂ = T⁻¹ U*′ U*.   (8.21)
In the case of regime-dependent covariance matrices, consider the likelihood p(Y_T|ξ, λ),
where ξ_m′ = (ξ_m1, …, ξ_mT) and u_m = (y − X₀γ₀ − X_mγ_m) = (u_m1′, …, u_mT′)′
is a TK-dimensional vector. Collect the elements of u_m in a (T × K) matrix U_m =
(u_m1, …, u_mT)′; after some algebraic manipulations, the joint probability distribution
of the K(K + 1)/2 elements of Σ_m is again an inverse Wishart distribution, centered
on the ML estimate

Σ̂_m = T_m⁻¹ U_m′ U_m.   (8.26)
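The inverted Wishart step can be sketched under a flat prior, using the Bartlett decomposition to generate the Wishart draw (our simplification, not the book's exact prior specification):

```python
import numpy as np

def draw_sigma(U, rng):
    """Draw Sigma from the inverse Wishart posterior implied by the
    (T x K) residual matrix U under a flat prior:
    Sigma^{-1} ~ Wishart(T, (U'U)^{-1}), via the Bartlett decomposition."""
    T, K = U.shape
    L = np.linalg.cholesky(np.linalg.inv(U.T @ U))   # Cholesky of the scale
    A = np.zeros((K, K))
    for i in range(K):
        A[i, i] = np.sqrt(rng.chisquare(T - i))      # diagonal: chi variates
        A[i, :i] = rng.normal(size=i)                # subdiagonal: N(0, 1)
    W = L @ A @ A.T @ L.T                            # W ~ Wishart(T, (U'U)^{-1})
    return np.linalg.inv(W)                          # the Sigma draw

rng = np.random.default_rng(0)
U = rng.normal(size=(500, 2))                        # toy residuals, true Sigma = I
sigma_bar = np.mean([draw_sigma(U, rng) for _ in range(100)], axis=0)
```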
For a simulated path of regimes ξ, conditions are established as if the regimes were
observable. Thus, the conditional likelihood function is equivalent to the likelihood
function of an intervention VAR model. Such a model structure is associated
with structural changes in time series where the parameter variations are systematic.
Given flat priors p(γ|Y₀, ξ, σ), the conditional posterior distribution of γ is proportional
to the likelihood function given by equation (8.6). Therefore, we get a normal
distribution, i.e. p(γ|ξ, Y_T, σ) is N(γ̂, Var(γ̂)), where the posterior mean becomes
the ML estimator γ̂. A classical statistician would consider a normal distribution of
γ given ξ to be valid only asymptotically, since X_t contains lagged dependent variables.
Here, however, p(γ|ξ, Y_T) is the exact small-sample posterior distribution as
in traditional Bayesian analysis (cf. HAMILTON [1994b, ch. 12]). Hence, the Gibbs
sampler⁶ is drawing γ from a normal distribution with mean γ̂ and variance Var(γ̂).
The mean of the location parameters γ is given by the well-known GLS estimator,
which is identical to the ML estimator for given ξ,
In equation (8.27), all VAR coefficients are drawn from their joint conditional
posterior density. MCCULLOCH & TSAY [1994a] suggest considering the conditional
posterior distributions of the regime-invariant parameter vector γ₀ and the
regime-dependent parameter vectors γ₁, …, γ_M separately.
For the derivation of the posterior distribution of the common parameter vector γ₀,
conditioned on the observations Y_T, the regimes ξ, the variance-covariances σ, and
the regime-dependent parameters γ₁, …, γ_M, we transform the data and denote by
Y₀ = (y₀₁′, …, y₀_T′)′ and X₀ = (X₀₁′, …, X₀_T′)′ the stacked transformed data,
and by W₀⁻¹ the corresponding block-diagonal weighting matrix with blocks Σ_t⁻¹.
⁶Implementation note: Since the conditional density p(γ|ξ, Y_T, σ) for the autoregressive parameter γ
is multivariate Gaussian, γ ∼ N(γ̂, Σ_γ), a random sample vector γ can be generated from a vector ε of
independent standard normally distributed random variables as γ = γ̂ + Qε, where the matrix Q is
the square root of the variance-covariance matrix Σ_γ such that QQ′ = Σ_γ. This can be carried out
using a standard Choleski decomposition of the positive definite variance-covariance matrix.
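The footnote's recipe can be written out directly (the numerical values are our toy example):

```python
import numpy as np

# Draw gamma ~ N(gamma_hat, Sigma_gamma) via the Choleski square root Q
# of the covariance matrix: gamma = gamma_hat + Q @ eps, eps ~ N(0, I).
rng = np.random.default_rng(0)
gamma_hat = np.array([0.5, -0.2])
Sigma_gamma = np.array([[0.04, 0.01],
                        [0.01, 0.09]])
Q = np.linalg.cholesky(Sigma_gamma)          # Q @ Q.T == Sigma_gamma
eps = rng.normal(size=(2, 5000))
draws = gamma_hat[:, None] + Q @ eps         # 5000 draws of gamma
```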
and the posterior variance Var(γ₀|ξ, Y_T, σ, γ₁, …, γ_M) is given by (8.33).
For the regime-dependent parameters we have p(γ_m|·) = p(γ_m|Y_T, ξ_m, σ_m),
where Y_m = Y − X₀γ₀ and X_m* = (X*_{m1}, …, X*_{mT})′. The joint probability distribution
of the elements of γ_m is therefore normal (8.34), with moments given by LS estimates
if the regressors of each equation are identical, such that X_m = (X_m ⊗ I_K) holds:

γ̂_m = ( (X_m′ X_m)⁻¹ X_m′ ⊗ I_K ) Y_m,
Var(γ_m) = (X_m′ X_m)⁻¹ ⊗ Σ_m.
Since the labels of the states and the submodels are interchangeable, the MS-VAR model
would be unidentifiable in the data fitting process. Hence, certain constraints are necessary
to overcome the identifiability problem. As pointed out by MCCULLOCH &
TSAY [1994a], the investigator must have some prior beliefs about how the states
differ in the particular application. These beliefs become part of the modelling process.
For the sake of simplicity we denote the restricted parameter μ_{m,k} as the first component
in γ_m, m = 1, …, M. Hence, the conditional densities of γ_m are truncated
normal:   (8.35)

One possibility is to sample from the unrestricted normal distribution in equation (8.34)
and then discard any draw which violates the restriction, i.e. γ_{m,1} > γ_{m−1,1}.
The draw of γ_m from the truncated normal distribution can be more easily obtained
by the method of inversion.⁷ Let the vector γ_{m,2} contain the unrestricted parameters
of regime m and Ω_ij denote Cov(γ_{m,i}, γ_{m,j}), such that for m = 2, …, M:
7For univariate time series, see ALBERT & CHIB [1993, p.5].
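The inversion method for a left-truncated normal draw can be sketched with the standard-library normal distribution (the function name is ours):

```python
import numpy as np
from statistics import NormalDist

def truncnorm_left(mu, sigma, lower, rng):
    """Inversion draw from N(mu, sigma^2) truncated to (lower, infinity):
    map u ~ U(0, 1) through the conditional cdf,
    x = mu + sigma * Phi^{-1}( Phi(a) + u * (1 - Phi(a)) ), a = (lower - mu)/sigma."""
    nd = NormalDist()
    a = (lower - mu) / sigma
    u = rng.uniform()
    return mu + sigma * nd.inv_cdf(nd.cdf(a) + u * (1.0 - nd.cdf(a)))

rng = np.random.default_rng(0)
draws = np.array([truncnorm_left(0.0, 1.0, 1.5, rng) for _ in range(500)])
```

Unlike the discard (rejection) approach, every uniform variate yields a valid draw, which matters when the truncation region has small probability.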
8.5. Forecasting via Gibbs Sampling 163
In this section we have considered the principles of Gibbs sampling for parameter
estimation and regime reconstruction in linear unrestricted MS regression models.
Before discussing the use of the Gibbs sampler as a forecasting device, Table 8.1
summarizes the results of this section in the form of an algorithm.
A major advantage of the multi-move Gibbs sampler compared with classical ML
estimation is the feasibility of generating forecast intervals. If the iterations (8.36)
and (8.37) are embodied in the regular Gibbs cycle, samples can be generated simultaneously
from the parameter posterior and the prediction posterior. As such, it is
possible to obtain the non-normal prediction density of any future observation.
The foundations of forecasting MS-VAR processes have been discussed in the context
of the linear state-space representation. However, the investigation was restricted
to MSPE-optimal predictions. Forecasting via Gibbs sampling has the objective
of determining the Bayes prediction density p(y_{T+h}|Y_T). The issue of forecasting
future observations using a single-move Gibbs sampler is discussed in ALBERT &
CHIB [1993, p. 8]. Starting with the one-step prediction of y_{T+1}, this can easily be
done using the decomposition

where η_{T+1} again contains the conditional probability densities of y_{T+1}.
1. Initialization:

   γ₀^(0) = (X'₀X₀)⁻¹ X'₀ Y₀,
   Σ^(0) = T⁻¹ Σ_{t=1}^T (y_t − X_{0t}γ₀^(0))(y_t − X_{0t}γ₀^(0))',
   γ^(0) = (γ'₁, γ'₀)',   P^(0).

2. Multi-Move Regime and Transition Sampling Step:

   ξ_T ← ξ_{T|T},
   ξ_{T−j} ← [F'(ξ_{T−j+1} ⊘ ξ_{T−j+1|T−j})] ⊙ ξ_{T−j|T−j},

   p*_{ij} drawn from the Beta posterior implied by the transition counts,
   p*_{iM} = 1 − Σ_{m=1}^{M−1} p*_{im}.

3. Inverted Wishart Step:
4. Regression Step:
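The transition-probability step of the algorithm draws each row of P from the posterior implied by the simulated transition counts; with independent Dirichlet (for M = 2, Beta) priors, row i is Dirichlet in the counts n_{i1}, …, n_{iM}. A minimal sketch (NumPy assumed; the counts and flat prior are hypothetical):

```python
import numpy as np

def draw_transition_matrix(counts, prior=1.0, rng=None):
    """Draw row i of P from Dirichlet(n_i1 + prior, ..., n_iM + prior).

    counts[i, j] is the number of simulated transitions from regime i to j;
    each sampled row is a probability vector, so it sums to one by construction."""
    rng = rng or np.random.default_rng()
    return np.vstack([rng.dirichlet(row + prior) for row in counts])

rng = np.random.default_rng(42)
counts = np.array([[40.0, 5.0], [8.0, 60.0]])   # hypothetical transition counts
P = draw_transition_matrix(counts, rng=rng)
```

For M = 2 the Dirichlet draw reduces to the Beta draw appearing in the table.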
Thus, samples from the Bayesian prediction densities can be obtained by sampling
for each draw of (ξ, λ) made available via the Gibbs sampler. Implementing these
two steps along with the regular Gibbs cycle produces samples on which calculations
of the prediction density can be based. For each cycle, the conditional densities
p(y_{T+h}|ξ_{T+h}, Y_{T+h−1}, λ) are normal.
Note that the prediction density incorporates both parameter uncertainty and state
uncertainty. This is extremely helpful for MS-VAR models, since the conditional
distribution of y_{T+h}|Y_T is a mixture of normals. For interval forecasts, the conditional
mean and variance are not sufficient, as they are in the Gaussian VAR model.
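Since the prediction density is a mixture of normals, interval forecasts are best read off the simulated draws directly, e.g. as empirical quantiles rather than a mean plus or minus two standard deviations. A minimal sketch with hypothetical Gibbs output (NumPy assumed; all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical Gibbs output: regime draws and regime-specific moments.
regimes = rng.random(5000) < 0.3               # True = regime 2 with probability 0.3
mu = np.where(regimes, -2.0, 1.0)              # regime-dependent means
sd = np.where(regimes, 1.5, 0.5)               # regime-dependent std. deviations
y_draws = rng.normal(mu, sd)                   # draws from the mixture of normals

lo, hi = np.percentile(y_draws, [2.5, 97.5])   # 95% interval forecast
```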
8.6 Conclusions
Gibbs sampling has many attractive features. Foremost among these are its computational
simplicity and its convergence properties. A major advantage is its ability
to generate the non-normal prediction density of any future observation.
If the forecasting recursions are embodied in the regular Gibbs cycle, samples from
the prediction posterior are generated simultaneously with those of the parameter
posterior.
The general framework for ML estimation of the MS(M)-VAR(p) model was laid
out in Chapter 6. In Chapter 8 the methodological issues of Gibbs sampling and its
conceptual differences to the EM algorithm have been discussed. In this chapter,
we will focus on the technical aspects of estimation of the VAR coefficients under
the various types of restrictions.¹
¹Note that in HAMILTON [1990] only the univariate MSIA(M)-AR(p) model is discussed explicitly.
The MSI(M)-AR(p) model and the MSIH(M)-AR(p) model are discussed under the assumption p =
0, which is a very crucial restriction for purposes of time series analysis. It is therefore important to
relax it here.
²For example, the GLS estimation of the three-regime models of the six-dimensional system in Chapter
12 with 120 observations would involve multiplications with the (2160 × 2160) matrix W⁻¹.
168 Comparative Analysis of Parameter Estimation in Particular MS-VAR Models
MSM Specification: μ varying / μ invariant.   MSI Specification: ν varying / ν invariant.
Notation: MS: Markov-switching mean (M), intercept term (I), autoregressive parameters (A) and/or
heteroskedasticity (H); MVAR: mean-adjusted vector autoregression; VAR: vector autoregression in
its intercept form.
Gibbs sampler and the EM algorithm for maximum likelihood estimation. After this
introduction we summarize in Section 9.1 the BLHK filter and smoother, which produce
the vector of simulated regimes ξ and the vector of smoothed regime probabilities
ξ̂_{t|T}, as inputs for the maximization step and the regression step,
respectively. At the regression step of the Gibbs sampler these smoothed regime
probabilities can be taken as if they were the true vectors of regimes. It has been
shown in Chapter 6 that the same does not hold for the EM algorithm. The resulting
set of regression equations yields a time-varying VAR with observable regimes and
is discussed further in Section 9.3.1. The implications for the EM algorithm (Section
9.3.2) and for the Gibbs sampler (Section 9.3.3) follow.
For the particular Markov-switching vector autoregressive models a number of simplifications
result, and closed-form expressions can be given for the GLS estimators
which have to be computed at each iteration of the EM algorithm (maximization
step) and the Gibbs sampler (regression step), respectively. An overview is given in
Table 9.1.
9.1. Analysis of Regimes 169
Ξ_m (T×T) = diag(ξ_m),   Ξ (MT×MT) = diag(ξ),   ξ (MT×1) = (ξ'₁, …, ξ'_M)',
T_m = tr(Ξ_m) = 1'_T ξ_m,
ξ_m (T×1) = (ξ̂_{m,1|T}, …, ξ̂_{m,T|T})'.
In this chapter we are investigating the estimation of the parameters of the vector
autoregression for a given inference on the regimes (the maximization step of the EM
algorithm), respectively the derivation of the posterior distribution of the parameters
for given regimes in the sample (the regression step of the Gibbs sampler). Since the
following considerations are based on a previous analysis of regimes within the EM
algorithm and the Gibbs sampler, we will discuss them briefly. In Table 9.3 on page
170, the usage of the BLHK filter and smoother at the expectation step of the EM
algorithm and the Gibbs sampler, as well as the treatment of the parameters of the
hidden Markov chain, are visualized.
The expectation step of the EM algorithm uses the forward recursion (5.6) of the filter
and the backward recursion (5.13) of the smoother. The transition probabilities p_ij are
estimated with the transition frequencies p̂_ij, which are calculated from the smoothed
regime probabilities ξ̂_{t|T} according to (6.14).
Furthermore, in order to maintain an identical notation for the next remarks, the information
produced by the BLHK filter and smoother is summarized in Table 9.2.
Note that we have introduced no new symbols for the simulated regimes, thus maintaining
the use of ξ.
Gibbs Sampler

ξ_T ← ξ_{T|T},
ξ_{T−j} ← [F'(ξ_{T−j+1} ⊘ ξ_{T−j+1|T−j})] ⊙ ξ_{T−j|T−j},
p*_{ij} ∼ Beta(n_ij, n_iM),
p*_{iM} = 1 − Σ_{m=1}^{M−1} p*_{im},   i = 1, …, M.

EM Algorithm

1. Expectation Step

ξ_{t|t} = (η_t ⊙ ξ_{t|t−1}) ⊘ (1'_M(η_t ⊙ ξ_{t|t−1})),
ξ_{T−j|T} = [F'(ξ_{T−j+1|T} ⊘ ξ_{T−j+1|T−j})] ⊙ ξ_{T−j|T−j},   j = 1, …, T−1.

2. Maximization Step

p̂_ij from the smoothed transition frequencies,
p̂_iM = 1 − Σ_{m=1}^{M−1} p̂_im,   i = 1, …, M.
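The forward filter and backward smoother recursions can be sketched as follows (a minimal NumPy illustration for a hypothetical two-regime chain; the densities η_t are simulated placeholders, and ⊙ and ⊘ are implemented element-wise):

```python
import numpy as np

def blhk_filter_smoother(eta, F, xi0):
    """Forward filter xi_{t|t} and backward smoother xi_{t|T}.

    eta[t] holds the conditional densities p(y_t | s_t = m, Y_{t-1});
    F is the transition matrix with F[i, j] = Pr(s_t = j | s_{t-1} = i)."""
    T, M = eta.shape
    xi_tt = np.empty((T, M))       # filtered regime probabilities
    xi_pred = np.empty((T, M))     # one-step predicted regime probabilities
    x = xi0
    for t in range(T):
        xi_pred[t] = F.T @ x                    # prediction step
        num = eta[t] * xi_pred[t]
        xi_tt[t] = num / num.sum()              # filter update, cf. (5.6)
        x = xi_tt[t]
    xi_tT = np.empty((T, M))
    xi_tT[-1] = xi_tt[-1]
    for t in range(T - 2, -1, -1):              # smoother recursion, cf. (5.13)
        xi_tT[t] = (F @ (xi_tT[t + 1] / xi_pred[t + 1])) * xi_tt[t]
    return xi_tt, xi_tT

# Hypothetical example: persistent two-regime chain, simulated densities.
F = np.array([[0.9, 0.1], [0.2, 0.8]])
rng = np.random.default_rng(3)
eta = rng.random((50, 2)) + 0.1
xi_tt, xi_tT = blhk_filter_smoother(eta, F, np.array([0.5, 0.5]))
```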
9.2. Comparison of the Gibbs Sampler with the EM Algorithm 171
For a given state vector ξ, the regression step is based on the same estimation procedures
established for ML estimation via EM, in which ξ̂_{t|T} is substituted by ξ_t. If
the priors are flat, then the estimates maximize the likelihood function and the ML
estimates are derived as the mean of the posterior distribution.
To make the procedure a bit more transparent it may be helpful to compare the Gibbs
sampler with the EM algorithm. There is a superficial similarity. Suppose that interest
centers on ML estimation, such that the priors are flat. Then the multi-stage
Gibbs cycle results in the following sampling instructions:³
Iterating the Gibbs cycle N times, N → ∞, produces the joint posterior distribution
of (λ, ξ) and thereby the marginal posterior distribution of λ. The ML estimator for
λ is the maximizer of this function. In other words, each draw of the Gibbs sampler
can be considered as the ML estimate plus noise.
The EM iteration produces the most probable draw of the Gibbs sampler. Instead of
sampling the regimes and parameters from the posterior distribution as in the Gibbs
sampler, at each iteration of the EM algorithm the means of the conditional probability
distributions, ξ̂_{t|T} (expectation step) and λ̂ (maximization step), are calculated.
At each iteration the EM algorithm maximizes

where Pr(ξ|Y_T, λ^(j−1)) is the predictive density of ξ given the observations Y_T and
³Obviously, equation (9.1) is a simplification since the parameter vector λ is further decomposed. But
this does not substantially affect the following considerations.
the parameter vector λ^(j−1) derived at the preceding iteration. As shown by HAMILTON
[1990], the EM algorithm converges to the ML estimator λ̂, where λ̂ maximizes
the likelihood function

∝ ∏_{t=1}^T Σ_{ξ_t} p(y_t|ξ_t, Y_{t−1}; λ) Pr(ξ_t|Y_T, λ).   (9.4)

Therefore, under the condition of flat priors, the ML estimate λ̂ is the fixed point of
the EM sequence as well as the mode of the posterior probability density function
p(λ|Y_T) from which the Gibbs sampler is drawing λ.
While the EM algorithm is less computationally demanding, it does not directly provide
the posterior distribution of the parameters or an estimate of the variance-covariance
matrix. Current estimation theory delivers only information about the
asymptotic distribution of λ̂.
Since methodological aspects have already been dealt with, we can now concentrate
our interest on technical issues. As a basis for the following discussion, the estimation
methods introduced in Chapter 6 and Chapter 8 are outlined in Table 9.4 and
Table 9.5.
Retracing the logic behind these two iterative procedures, we see that the inputs for
the regression step are given by the observations y, X and the vector of simulated regimes
ξ (respectively the vector of smoothed regime probabilities ξ̂_{t|T}), which have
been produced by the BLHK filter and smoother or via simulation. These are taken at
the iteration as if they were the true (though unobserved) vectors of regimes. Each
pair of observed dependent and exogenous (or lagged dependent) variables y_t, x_t
9.3. Estimation of VAR Parameters for Given Regimes 173
where u_t ∼ N(0, I_K) and ȳ_mt = X_{0t}γ₀ + X_{mt}γ_m. There is a dummy variable
corresponding to each regime, and the dummy variable that corresponds to regime m
will take the value unity if regime m has been drawn for s_t by the Gibbs sampler.
In the context of the EM algorithm, ξ_mt stands for the smoothed probability ξ̂_{mt|T}.
To get a better insight into this feature it may be worth noting that the estimator used
coincides with the GLS estimator of the following regression model:
y = Xγ + u,   u ∼ N(0, W),   W = diag(Ξ₁⁻¹ ⊗ Σ₁, …, Ξ_M⁻¹ ⊗ Σ_M).   (9.5)
Since the inverse of Ξ_m does not really exist, this formulation of the set of regression
equations is only a theoretical construct which features formal equivalence.
However, Var(y_t|ξ_t) → ∞ ensures that the likelihood of observing a triple
(y_t, x_t, ξ_mt) is identical to zero. Hence, the observation y_t cannot be produced (with
a positive probability) from regime m.
This linear statistical model implies an ML estimator which is exactly the GLS
estimator introduced in Chapter 6 and Chapter 8:

γ̂ = (X'W⁻¹X)⁻¹ X'W⁻¹ y.   (9.6)
As one can easily imagine, this set of regression equations combines many features
better known from pooled time-series and cross-sectional data using dummy variables
(cf. JUDGE et al. [1988, ch. 11.4]).
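The dummy-variable regression described above reduces, regime by regime, to weighted least squares in which each observation enters with its regime weight ξ_mt (a 0/1 indicator for the Gibbs sampler, a smoothed probability for EM). A minimal single-regime sketch (NumPy assumed; the data and weights are simulated for illustration):

```python
import numpy as np

def weighted_ls(X, y, w):
    """Weighted LS: solve (X' diag(w) X) b = X' diag(w) y."""
    Xw = X * w[:, None]                  # each row scaled by its regime weight
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

rng = np.random.default_rng(5)
T = 400
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + 0.1 * rng.standard_normal(T)
w = rng.random(T)                        # hypothetical smoothed regime probabilities
beta_hat = weighted_ls(X, y, w)
```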
Concerning the Gibbs sampler, we have ξ_mt = I(s_t = m) ∈ {0, 1}, and we can
eliminate those equations where ξ_mt = 0, m = 1, …, M, and pool the remaining
TK equations of the MTK-dimensional system (9.5) into the following system:

y = X₀γ₀ + Σ_{m=1}^M (Ξ_m X̄_m ⊗ I_K) γ_m + u.

For convenience, only this pooled regression equation is given in the tables.
If the regressors are identical for each equation y_k and regime m, X_m = X̄_m ⊗ I_K,
equation (9.7) will yield for the particular MS-VAR models
This formulation of the estimator has two advantages compared to the standard GLS
form of Chapters 6 & 8. First of all, it requires only multiplications with weighting
matrices of the order (TK × TK), whereas formula (9.6) would require multiplying
matrices up to order (MTK × MTK). Thus the computational burden at each iteration
is not much higher than M times the effort for a time-invariant VAR model.
For example, the ML estimator β̂ and the mean β̄ of the posterior conditional distribution
of the MSIA model are identical to those of the MSIAH model (cf. Tables 9.10 and
9.11). This is due to the presence of only regime-dependent parameters, which are
estimated regime by regime. In both models homoskedasticity prevails within each
regime, and the GLS estimator reduces to an LS estimation. The GLS estimates can
then be calculated faster by a simple LS estimation:
Regression Equation (cases: a. homoskedasticity; b. heteroskedasticity; b. identical regressors X_m = X̄_m ⊗ I_K).

Definitions

W⁻¹ (MTK×MTK) = diag(Ξ₁ ⊗ Σ₁⁻¹, …, Ξ_M ⊗ Σ_M⁻¹),
y (TK×1) = (y'₁, …, y'_T)'.
If all parameters γ are regime-dependent, i.e. there are no common parameters, then
X_m = (0, …, 0, X̄_m, 0, …, 0). Thus, in an MSIAH(M)-VAR(p) model, each
parameter vector γ_m can be estimated separately,
where each observation at time t and regime m is weighted with the smoothed probability
ξ̂_{t|T}.
Under uninformative priors, the mean of the posterior distribution of the VAR parameters
γ̄ is technically almost identical⁴ to the ML estimator γ̂, where the vector of
smoothed regime probabilities ξ̂ is substituted with the drawn regime vector⁵ ξ and
the remaining parameters are also drawn by the Gibbs sampler.
Despite their conceptual differences, the technical similarities of the regressions involved
in the Gibbs sampler and the EM algorithm justify considering the estimators
together.
As we have seen in the presentation of the Gibbs sampler in Chapter 8, the estimation
procedures involved are conditioned to a higher degree than those at the maximization
step of the EM algorithm. However, in principle, the partitioning of the
parameter vector γ and conditioning on the rest of the parameters can be done in the
same way within the EM algorithm.
⁴This technical point of view should not neglect the alternative theoretical foundations in classical and
Bayesian statistics on which the EM algorithm of ML estimation and the Gibbs sampler are built,
respectively.
⁵Remember that ξ is sampled from the discrete probability distribution ξ̂_{t|T} = E[ξ_t|Y_T], which is used
by the EM algorithm.
Regression Equation.

Heteroskedasticity: Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m.

γ̂₀ = (X'₀W₀⁻¹X₀)⁻¹ X'₀W₀⁻¹ Y₀,   Y₀ = y − Σ_{m=1}^M (Ξ_m ⊗ I_K) X_m γ_m,
Var(γ̂₀|·) = (X'₀W₀⁻¹X₀)⁻¹,   W₀⁻¹ = Σ_{m=1}^M Ξ_m ⊗ Σ_m⁻¹.

b. Identical Regressors X_m = X̄_m ⊗ I_K:

Var(γ̂_m|·) = (X̄'_m Ξ_m X̄_m)⁻¹ ⊗ Σ_m.

3. Covariance Parameters
a. Homoskedasticity
b. Heteroskedasticity:

Σ̂_m = T_m⁻¹ U'_m Ξ_m U_m.
MSI specifications have the convenient property that the closed form of the estimator
follows immediately from the definition of the parameter vector in Table 9.6.⁶
Inserting these definitions⁷ in the formulae derived above yields (after some algebraic
manipulation) the estimators given in Tables 9.8 - 9.17. In particular for the
MSI-VAR & MSIH-VAR models and the MSIA-VAR & MSIAH-VAR models, the estimators
can be given in a very compact form.
It might be worth noting the analogy of the formulae for the MSI-VAR (MSIA-VAR) and
MSIH-VAR (MSIAH-VAR) models. This result can easily be visualized by deriving
the estimator γ̂ of the MSI-VAR model from the estimation equation given for the
MSIH-VAR model under the restriction Σ_m = Σ for all m:
⁶Note that the corresponding regressor matrices are defined in the tables accordingly.
⁷To avoid any misunderstanding: for the filtering procedures we have assumed that the parameter matrix
contains regime-dependent and regime-invariant parameters. However, it is useful for estimation purposes to
split the parameter vector γ into regime-invariant parameters γ₀ and the parameters γ_m belonging to
regime m, m = 1, …, M. See e.g. Chapter 8.
associated with the maximization step illustrates the principle of weighted LS estimation.
However, B̄' is not identical to an LS estimation of the corresponding regression
equation where the smoothed probabilities collected in Ξ̂ are substituted for the
unobserved Ξ:

B̃' = (Z*'Z*)⁻¹ Z*'Y*.
Comparing the results for the MSIAH(M)-VAR(p) model in Table 9.11 with the
estimator obtained for the MSIA(M)-VAR(p) model in Table 9.10, it turns out that
the ML estimator β̂ of the parameter vector β and the mean β̄ of the posterior distribution
of β are identical. This is due to the fact that the parameter vector β_m associated
with regime m can be estimated separately for each regime m = 1, …, M.
Thus the GLS estimator under heteroskedasticity and the weighted LS estimator under
homoskedasticity are identical. Differences in the treatment of both models concern
only the estimation of the covariance parameters, Σ versus Σ₁, …, Σ_M, which
are estimated under regime-dependent heteroskedasticity with the residuals of regime
m weighted with the smoothed regime probabilities Ξ_m.
In Table 9.13 and Table 9.14 the estimation of intercept-form VAR processes is
presented, where only the autoregressive parameters (and in the MSAH-VAR model
the variance-covariance matrix) are subject to regime shifts. Due to the conditioning
principle of the Gibbs sampler, the regime-dependent autoregressive parameters can
be estimated separately, such that GLS and LS estimation are equivalent. Thereby
the estimation of the regime-invariant intercept terms is affected by heteroskedasticity.
The interdependence of the estimators for β and Σ_m involved at the maximization
step of the EM algorithm normally requires iterating between both equations recursively
at each maximization step. However, to ensure convergence of the EM algorithm
to a stationary point of the likelihood function, in most cases it is sufficient
to run regressions conditional on some estimates of the previous maximization step
(cf. the ECM algorithm, RUUD [1991]). Therefore, we estimate β given the estimated
Σ_m of the last maximization step and vice versa in order to operationalize the
regression step. As an alternative one could consider running an interior iteration
until β and Σ converge. This problem diminishes in the Gibbs sampler due to its
construction principle of conditioning.
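This conditional "zig-zag" iteration can be sketched as follows: given the variances, β is a weighted GLS estimate; given β, the variances are weighted residual variances; the two steps alternate until convergence. A minimal univariate two-regime illustration (NumPy assumed; the data, weights, and dimensions are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(11)
T, M = 800, 2
X = np.column_stack([np.ones(T), rng.standard_normal(T)])
beta_true = np.array([1.0, 0.5])
s = (rng.random(T) < 0.4).astype(int)              # hypothetical regime draws
sig_true = np.array([0.3, 1.5])                     # regime-dependent error std.
y = X @ beta_true + sig_true[s] * rng.standard_normal(T)
xi = np.column_stack([s == 0, s == 1]).astype(float)  # regime indicators

# Zig-zag: beta given sigma (weighted GLS), then sigma given beta, to convergence.
sig2 = np.ones(M)
for _ in range(50):
    w = (xi / sig2).sum(axis=1)                     # weight sum_m xi_mt / sig2_m
    Xw = X * w[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)      # GLS step given variances
    u = y - X @ beta
    sig2_new = (xi * u[:, None] ** 2).sum(axis=0) / xi.sum(axis=0)
    if np.allclose(sig2_new, sig2, rtol=1e-8):      # stop when variances settle
        break
    sig2 = sig2_new
```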
1. The conditional density of y_t depends not only on the actual regime, but also
on the last p regimes.
Definitions

X (T×Kp) = (Y_{−1}, …, Y_{−p}),   X̄* (T×Kp),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
U_m (T×K) = Y − X*A'_m − (1_T ⊗ μ'),
Ū_m (T×K) = Y − X̄*_m B̄'.
2. The regression equation is no longer linear in the parameter vector γ, i.e. the
vector of means μ = (μ'₁, …, μ'_M)' and the vector of autoregressive parameters
α = (α'₁, …, α'_M)'.
Section 8.4.3 (despite the non-linear restrictions on the reduced form parameters).
The convergence of the estimates is ensured by an internal iteration
in α̂(μ, σ), μ̂(α, σ), and Σ̂(μ, α).
From these principles the maximization step of the EM algorithm and the regression
step of the Gibbs sampler ensue. The resulting explicit closed-form estimators are
given in Tables 9.18 - 9.21.
The MSM(M)-VAR(p) model considered in Table 9.18 differs from the MSMH-VAR
model in Table 9.19 by the restriction Σ₁ = … = Σ_M = Σ on the parameter
space. Inserting this restriction into the estimators of the MSMH(M)-VAR(p) model
in Table 9.19 results in a weighted LS estimation of the autoregressive coefficients α.
Meanwhile, the estimation of the mean μ involves a GLS estimator even if u_t is homoskedastic.
Since the regressor X*_{mt} is not identical in each single equation of the
vector system, the GLS-type estimation of the regime-dependent means μ remains.
Interestingly, in the MSA(M)-VAR(p) model given in Table 9.13 and the
MSAH(M)-VAR(p) model given in Table 9.14, the regime-dependent autoregressive
parameters α_m can be estimated for each regime m = 1, …, M
separately, while the regime-dependent means μ_m have to be estimated simultaneously.
This results from the presence of μ_m in the regression equations with
s_t ≠ m, whereas α_m enters the regression equation if and only if s_t = m. Moreover,
it follows that α_m is estimated with weighted LS and μ is estimated with GLS irrespective
of homoskedasticity or heteroskedasticity of the innovation process u_t. Due to the
common principle of conditioning, the regressions required by the EM algorithm
and the Gibbs sampler are identical if the estimated parameters and smoothed
probabilities are replaced by their sampled values and vice versa.
9.4 Summary
The preceding discussions have shown the power of the statistical instruments given
by the BLHK filter for the analysis of regimes, the EM algorithm for the ML estimation
of parameters, and the Gibbs sampler for simulating the posterior distribution
of parameters and the prediction density of future values involving Bayesian theory.
Various specifications have already been introduced. Nevertheless, some extensions
of the basic MS-VAR model might be useful in practice. They will be considered
briefly in the next chapter.
Before that, as an appendix, we present for particular MS-VAR models the closed-form
expressions of the GLS estimator employed at each maximization step of the
EM algorithm, respectively at each regression step of the Gibbs sampler (cf. the overview
given in Table 9.1).
9.A. Appendix: Tables 185
Regression Equation

Definitions

y (TK×1) = (y'₁, …, y'_T)',   Ū (MT×K) = 1_M ⊗ Y − Z̄B̄'.
Regression Equation

y = Σ_{m=1}^M (ξ_m ⊗ I_K)ν_m + (X ⊗ I_K)α + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m,

ν̂_m = T_m⁻¹ (ξ'_m ⊗ I_K)(y − (X ⊗ I_K)α) = T_m⁻¹ (Y − XA')' ξ_m,
Var(ν̂_m|·) = T_m⁻¹ ⊗ Σ_m,
Σ̂_m = T_m⁻¹ U'_m Ξ_m U_m.
Definitions

X (T×Kp) = (Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
X_m (T×[M+Kp]) = (1_T ⊗ ι'_m, X),
U_m (T×K) = Y − XA' − (1_T ⊗ ν'_m),
Ū_m (T×K) = Y − X_m B̄'.
Regression Equation

β̄_m = ((X'Ξ_mX)⁻¹X'Ξ_m ⊗ I_K) y,
B̂'_m = (X'Ξ_mX)⁻¹X'Ξ_m Y,
Var(β̄_m|·) = (X'Ξ_mX)⁻¹ ⊗ Σ,
Σ̂ = T⁻¹ U'ΞU.

Definitions

X (T×[Kp+1]) = (1_T, Y_{−1}, …, Y_{−p}),
y (TK×1) = (y'₁, …, y'_T)',
β_m = vec B'_m,
U (MT×K) = 1_M ⊗ Y − (I_M ⊗ X)(B₁, …, B_M)',
Ū (MT×K) = 1_M ⊗ Y − (I_M ⊗ X)(B̄₁, …, B̄_M)'.
Regression Equation

Definitions

X (T×[Kp+1]) = (1_T, Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
β_m = (ν'_m, α'_m)' = vec B'_m,
U_m (T×K) = Y − XB'_m,
Ū (T×K) = Y − XB̄'_m.
Regression Equation

y = (X ⊗ I_K)β + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m.

Definitions

X (T×[1+Kp]) = (1_T, Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
U (T×K) = Y − XB',
Ū (T×K) = Y − XB̄'.
Regression Equation

(ν̂', Â'₁, …, Â'_M)' =
[ T        ξ'₁X       ⋯  ξ'_MX
  X'ξ₁     X'Ξ₁X           0
  ⋮                 ⋱
  X'ξ_M    0         ⋯  X'Ξ_MX ]⁻¹ [ 1'_T Y;  X'Ξ₁Y;  ⋯;  X'Ξ_MY ],

Σ̂ = T⁻¹ Σ_{m=1}^M Ū'_m Ξ_m Ū_m,

α̂_m = ((X'Ξ_mX)⁻¹X'Ξ_m ⊗ I_K)(y − 1_T ⊗ ν).

Definitions

X (T×Kp) = (Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
X_m (T×[1+MKp]) = (1_T, ι'_m ⊗ X),
U_m (T×K) = Y − XA'_m − (1_T ⊗ ν'),
Ū_m (T×K) = Y − X_m B̄'.
Regression Equation

y = (1_T ⊗ I_K)ν + Σ_{m=1}^M (Ξ_m X ⊗ I_K)α_m + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m.

Definitions

X (T×Kp) = (Y_{−1}, …, Y_{−p}),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
X_m (T×[1+MKp]) = (1_T, ι'_m ⊗ X),
U_m (T×K) = Y − XA'_m − (1_T ⊗ ν'),
Ū_m (T×K) = Y − X_m B̄'.
Regression Equation

y = (1_T ⊗ I_K)μ + (X* ⊗ I_K)α + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m.

Definitions

X* = X − (1_T 1'_p ⊗ μ'),   A(1) (K×K) = I_K − Σ_{i=1}^p A_i,
X (T×Kp) = (Y_{−1}, …, Y_{−p}),   X̄* = X − (1_T 1'_p ⊗ μ̄'),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
U (T×K) = Y − X*A' − (1_T ⊗ μ').
Regression Equation

y = (1_T ⊗ I_K)μ + Σ_{m=1}^M (Ξ_m X* ⊗ I_K)α_m + u,   u ∼ N(0, Ω),   Ω = I_T ⊗ Σ.

Definitions

X* = X − (1_T 1'_p ⊗ μ'),   Ā_m(1) (K×K) = I_K − Σ_{j=1}^p A_mj,
X (T×Kp) = (Y_{−1}, …, Y_{−p}),   X̄* = X − (1_T 1'_p ⊗ μ̄'),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
U_m (T×K) = Y − X*A'_m − (1_T ⊗ μ'),
Ū_m (T×K) = Y − X̄*_m B̄'.
Regression Equation

y = (1_T ⊗ I_K)μ + Σ_{m=1}^M (Ξ_m X* ⊗ I_K)α_m + u,   u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Ξ_m ⊗ Σ_m,

μ̂ = (Σ_{m=1}^M T_m A_m(1)' Σ_m⁻¹ A_m(1))⁻¹ (Σ_{m=1}^M (ξ'_m ⊗ A_m(1)' Σ_m⁻¹)(y − (X ⊗ I_K)α̂_m)),
μ̄ = (Σ_{m=1}^M T_m Ā_m(1)' Σ̄_m⁻¹ Ā_m(1))⁻¹ (Σ_{m=1}^M (ξ'_m ⊗ Ā_m(1)' Σ̄_m⁻¹)(y − (X ⊗ I_K)ᾱ_m)).

Definitions

X* = X − (1_T 1'_p ⊗ μ'),   A_m(1) (K×K) = I_K − Σ_{j=1}^p A_mj,
X (T×Kp) = (Y_{−1}, …, Y_{−p}),   X̄* = X − (1_T 1'_p ⊗ μ̄'),
Y_{−j} (T×K) = (y_{1−j}, …, y_{T−j})',
y (TK×1) = (y'₁, …, y'_T)',
U_m (T×K) = Y − X*A'_m − (1_T ⊗ μ'),
Ū_m (T×K) = Y − X̄*_m B̄'.
Regression Equations

α̂ = [{(Σ_{m=1}^M Σ_{n(m)} X*'_n Ξ_n X*_n)⁻¹ (Σ_{m=1}^M Σ_{n(m)} X*'_n Ξ_n)} ⊗ I_K] (y − 1_T ⊗ μ̂_m),

Σ̂ = T⁻¹ Σ_{m=1}^M Σ_{n(m)} U'_n Ξ_n U_n,

Var(α̂|·) = (Σ_{m=1}^M Σ_{n(m)} X*'_n Ξ_n X*_n)⁻¹ ⊗ Σ.
Regression Equations

Σ̂_m = T_m⁻¹ Σ_{n(m)} U'_n Ξ_n U_n,

Var(α̂|·) = (Σ_{m=1}^M (Σ_{n(m)} X*'_n Ξ_n X*_n) ⊗ Σ_m⁻¹)⁻¹,

Σ̂ = T⁻¹ Σ_{m=1}^M Σ_{n(m)} Ū'_n Ξ_n Ū_n.
Regression Equations

u ∼ N(0, Ω),   Ω = Σ_{m=1}^M Σ_{n(m)} Ξ_n ⊗ Σ_m.

EM Algorithm: Maximization Step

Σ̂_m = T_m⁻¹ Σ_{n(m)} U'_n Ξ_n U_n.
Chapter 10
In the preceding chapters we have made three essential assumptions with regard to
the specification of MS-VAR processes: we have assumed that (i.) the system is
autonomous, i.e. no exogenous variables enter into the system, (ii.) the regime-dependent
parameters depend only on the actual regime but not on its former history,
and (iii.) the hidden Markov chain is homogeneous, i.e. the transition probabilities
are time-invariant. As we have seen in the foregoing discussion, these assumptions
allow for various specifications. Modelling with MS-VAR processes is discussed extensively
in the last part of this study for some empirical investigations related to
business cycle analysis. However, there might be situations where the assumptions
made about the MS-VAR model result in limitations for modelling.
Therefore, in this chapter we will introduce three extensions of the basic MS-VAR
model. In Section 10.1 we will consider systems with exogenous variables; in Section
10.2 the MSI(M)-VAR(p) model is generalized to an MSI(M, q)-VAR(p) model
with intercept terms depending on the actual regime and the last q regimes, thus exhibiting
distributed lags in the regimes. In a third section we discuss MS-VAR models
with time-varying transition probabilities and endogenous regime selection, i.e.
specifications where the transition probabilities are functions of observed exogenous
or lagged endogenous variables.
The natural way to introduce these variables is to generalize the MS-VAR model to
a dynamic simultaneous equation model with Markovian regime shifts:

where w_t ∼ NID(0, Σ(s_t)) and y_t = (y_{1t}, …, y_{Kt})' is a K-dimensional vector
of endogenous variables, and the A_i and B_j are coefficient matrices. The vector x_t of
exogenous variables may contain stochastic components (e.g. policy variables) and
non-stochastic components (e.g. seasonal dummies). The intercept ν has not been
included in the vector x_t.
In the following we will focus on the reduced form of the system, which can be obtained
by premultiplying (10.1) with A₀⁻¹:
¹Conversely, it may be interesting to check whether the regime shift stands in for changes
in an omitted or unobservable variable (world business cycle, state of confidence, oil price, etc.).
10.1. Systems with Exogenous Variables 201
time-varying dynamic multipliers D_j(s_t), while D(L) = Σ_{j=0}^∞ D_j L^j = A(L)⁻¹B(L) is
time invariant iff A(L) and B(L) are time invariant.
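The multipliers D_j can be computed recursively from A(L)D(L) = B(L). The sketch below assumes the normalization A(L) = A₀ − A₁L − … − A_pL^p and B(L) = B₀ + B₁L + … + B_qL^q, which may differ from the text's exact convention; the matrices are hypothetical:

```python
import numpy as np

def dynamic_multipliers(A, B, A0, horizon):
    """D_j from A(L) D(L) = B(L), with A(L) = A0 - sum_i A[i-1] L^i."""
    K = A0.shape[0]
    A0_inv = np.linalg.inv(A0)
    D = []
    for j in range(horizon + 1):
        acc = (B[j] if j < len(B) else np.zeros((K, K))).copy()
        for i, Ai in enumerate(A, start=1):     # add the feedback through A(L)
            if j - i >= 0:
                acc = acc + Ai @ D[j - i]
        D.append(A0_inv @ acc)                  # premultiply by A0 inverse
    return D

# Hypothetical bivariate system: one autoregressive lag, one exogenous lag.
A0 = np.eye(2)
A = [np.array([[0.5, 0.1], [0.0, 0.4]])]
B = [np.eye(2), np.array([[0.2, 0.0], [0.0, 0.2]])]
D = dynamic_multipliers(A, B, A0, horizon=20)
# D_j -> 0 as j grows because A(L) is stable.
```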
L(λ) = ∫ p(Y|ξ, Z, B) Pr(ξ|ξ₀, ρ) dξ = ∏_{t=1}^T η'_t ξ̂_{t|t−1},

where η_t denotes the vector of conditional densities of y_t, ξ̂_{t|t−1} the vector of predicted
regime probabilities, and X_{t−1} = (x'_{t−1}, x'_{t−2}, …, x'_1)', Z = X_T. Then, the estimation of the parameter
vectors

b = vec(B₀, B₁, …, B_q) or b_m = vec(B_{0.m}, B_{1.m}, …, B_{q.m}), m = 1, …, M,

respectively, can be obtained in the same manner as for the intercept parameter vector
ν_m in the previous chapters. For example, the normal equations of the ML estimator
are given by

∂ ln L / ∂b' = 0.
may be introduced into the state equation, where they determine the probabilities of
regime transitions; e.g. the transition probabilities p_ij(x_{t−d}) could be a function of
some observed exogenous variables at time t − d such that
202 Extensions of the Basic MS-VAR Model
As a generalization of the MSI(M)-VAR(p) models, one may assume that the intercept
term depends not only on the actual regime but in addition on the last q regimes,

where the M^{q+1} = M^{p+1} different (K × 1) intercept terms are functions of the
M different (K × 1) mean vectors and the (K × K) autoregressive matrices A_j,
j = 1, …, p. We have seen in the context of the MSM(M)-VAR(p) model that the
problem of lagged regimes in the conditional density of the observable variables can
be avoided by redefining the relevant state vector. However, such an unrestricted procedure
increases the dimension of the state vector dramatically, and without further
restrictions this leads to a parameter inflation.
Therefore we will not generalize this model further by relaxing the assumption of
additivity. In particular, for two-regime models M = 2 with q = 1, this assumption
is not restrictive since M^{q+1} = (q + 1)M:
The MSI(M, q)-VAR(p) model is of particular interest for multiple time series, as
Section 7.1 has indicated. For example, {y_{1t}} may be a leading indicator for {y_{2t}},
where the lead is d periods:
Thus, one would observe the effects of a change in regime in the first time series d
periods before the shift affects the second time series.
A(L)(y_t − μ_y) = …,
F(L)ζ_t = v_t,
ζ_t = (ξ_{1t} − ξ̄₁, …, ξ_{M−1,t} − ξ̄_{M−1})'.
Proof. The proof is a simple extension of the proof for MSI(M)-VAR(p) processes.
The stable VAR(1) process {ζ_t} possesses the vector MA(∞) representation
ζ_t = F(L)⁻¹v_t. Since the inverse matrix polynomial can be reduced to the
inverse of the determinant |F(L)|⁻¹ and the adjoint matrix F(L)*, we have ζ_t =
|F(L)|⁻¹ F(L)* v_t. Inserting this transformed state equation into the measurement
equation results in

y_t − μ_y   (10.5)
x_t   (10.6)

where the state vector consists of p adjoining observable vectors {y_{t−j}}_{j=0}^{p−1} and q+1
unobservable regime vectors {ζ_{t+1−j}}_{j=0}^q:
x_t = (y'_t, …, y'_{t−p+1}, ζ'_{t+1}, …, ζ'_{t−q+1})',

G = [ A₁  ⋯  A_{p−1}  A_p   M₁  ⋯  M_{q−1}  M_q
      I_K        0     0     0            0
      ⋮     ⋱
      0     I_K  0     0            0
      0          0     F'    0   ⋯   0
      0                I_{M−1}  ⋯  0   0 ],

u_t = (u'_t, 0, …, 0, v'_t, 0, …, 0)'.
10.3. The Endogenous Markov-Switching Vector Autoregressive Model 205
The statistical analysis of MSI(M, q)-VAR(p) models can also easily be performed as
a straightforward extension of the by now familiar methods. It should come as no surprise
that the MSI(M, q)-VAR(p) model can be treated analogously to the MSM(M)-VAR(p)
model. Define the relevant state vector as
In the foregoing we have assumed that the hidden Markov chain is homogeneous,
such that the matrix of transition probabilities, P_t = P_{t−1} = … = P, is constant
over time. In their classical contribution, GOLDFELD & QUANDT [1973] have proposed
an extension of the approach by allowing the elements of the transition matrix
to be functions of an extraneous variable z_t. For M = 2 regimes we would have for
example
206 Extensions of the Basic MS- VAR Model
This approach has been called by GOLDFELD & QUANDT [1973] the "r(z)-
method". If the underlying model is a vector autoregression, we will refer to it as
a generalized Markov-switching vector autoregressive model or GMS(M)-VAR(P)
model. If some transition probabilities depend on the lagged endogenous variable
y_{t−d}, d > 0, i.e. Pr(s_t = i | s_{t−1} = j, y_{t−d}) = p_{ij}(y_{t−d}'δ), then the resulting model
will be termed endogenous selection Markov-switching vector autoregressive model
or EMS(M, d)-VAR(P) model.
DIEBOLD et al. [1994] consider Markov-switching models with exogenous switch-
ing, but without lagged endogenous variables. Markov-switching models with endo-
genous switching (but again without lagged endogenous variables) are considered by
RIDDER [1994]. In particular, DIEBOLD et al. [1994] have discussed a modification
of an MSI(2)-AR(0) model in which the transition probabilities can vary with fun-
damentals. The transition probabilities Pr(s_t | s_{t−1}, z_t) are parameterized by use of
logit transition functions as
Hence, the matrix P_{t−1} of transition probabilities Pr(s_t | s_{t−1}, z_t) equals

P_{t−1} = [ exp(z_t'δ_1)/(1 + exp(z_t'δ_1))    1/(1 + exp(z_t'δ_1)) ]      (10.8)
          [ exp(z_t'δ_2)/(1 + exp(z_t'δ_2))    1/(1 + exp(z_t'δ_2)) ].
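The logit parameterization can be sketched numerically. The function name and the row layout (row m holding the probabilities of the next regime given s_{t−1} = m) are illustrative assumptions, not the author's code:

```python
import numpy as np

def logit_transition_matrix(z_t, delta1, delta2):
    """Time-varying 2x2 transition matrix in the spirit of eq. (10.8).

    Row m holds (Pr(s_t = 1 | s_{t-1} = m, z_t), Pr(s_t = 2 | s_{t-1} = m, z_t)),
    each parameterized by a logit function of z_t.
    """
    p1 = np.exp(z_t @ delta1) / (1.0 + np.exp(z_t @ delta1))
    p2 = np.exp(z_t @ delta2) / (1.0 + np.exp(z_t @ delta2))
    return np.array([[p1, 1.0 - p1],
                     [p2, 1.0 - p2]])

# illustrative regressor and coefficient values
P = logit_transition_matrix(np.array([1.0, 0.5]),
                            np.array([2.0, 0.0]),
                            np.array([-1.0, 0.0]))
# each row of P sums to one by construction
```

By construction every row is a valid probability distribution for any value of z_t, which is the point of the logit transform.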
In contrast to the STAR model, the effects of the variable z_t on the probability distri-
bution of forthcoming regimes depend on the actual regime. An alternative model
might concern asymmetric and stochastic policy multipliers. Suppose, for example,
that a tight monetary policy is more effective in stopping "booms" than an expan-
sionary monetary policy is in initiating upswings.2 Then the policy variable z_t
would affect the transition probability of slumping from a "boom" (s_t = 1) into
a "recession" (s_t = 2), Pr(s_t = 2 | s_{t−1} = 1, z_t), while the transition probabilities
out of a recession, Pr(s_t | s_{t−1} = 2, z_t) = Pr(s_t | s_{t−1} = 2), remain unaffected.
This model is also well-suited to incorporate deterministic elements of regime
switching. Suppose one expects regime m* = 1 to prevail for a given period
2 Such asymmetries of monetary policy are considered in GARCIA & SCHALLER [1995] for the United
States.
10.3. The Endogenous Markov-Switching Vector Autoregressive Model 207
of time T_r. Define a dummy variable d_t such that d_t = I(t + 1 ∈ T_r) and let
z_t = (1, d_t)'. Finally, denote P_{t−1} in a slightly different form to equation (10.8)
with the unrestricted parameter vector δ = (b_{0,1}, b_{0,2}, b_1),

P_{t−1} = [ exp(b_{0,1} + b_1 d_t)/(1 + exp(b_{0,1} + b_1 d_t))    1/(1 + exp(b_{0,1} + b_1 d_t)) ]
          [ exp(b_{0,2} + b_1 d_t)/(1 + exp(b_{0,2} + b_1 d_t))    1/(1 + exp(b_{0,2} + b_1 d_t)) ].
Then for b_1 → ∞, the probability of being in regime 1 goes to one at any point t in
time with t + 1 ∈ T_r,

P_{t−1} = [ 1  0 ]
          [ 1  0 ],
while the transition probabilities are given for the remaining periods t with t + 1 ∉ T_r
by

P_{t−1} = P = [ exp(b_{0,1})/(1 + exp(b_{0,1}))    1/(1 + exp(b_{0,1})) ]  =  [ p_11       1 − p_11 ]
              [ exp(b_{0,2})/(1 + exp(b_{0,2}))    1/(1 + exp(b_{0,2})) ]     [ 1 − p_22   p_22     ],

where p_11 = exp(b_{0,1})/(1 + exp(b_{0,1})) and p_22 = 1/(1 + exp(b_{0,2})).
A slightly different model can be achieved by introducing the dummy variable z_t
only in the transition functions for regime 1, Pr(s_t | s_{t−1} = 1, z_t), but not in those of
regime 2: Pr(s_t | s_{t−1} = 2, z_t) = Pr(s_t | s_{t−1} = 2). As previously mentioned, the
deterministic event leads to an immediate jump at time t into a special regime, say
m*. After the regime prevails, the transition probabilities are unaltered compared to
the former history. By using the dummy variable approach, we define d_{τ−1} = 0 for τ ≠ t,
and d_{t−1} = 1. This implies the following expected evolution of regimes:
ξ̂_{t+h|t−1} = ι_{m*}                          for h ∈ T_r,
ξ̂_{t+T_{m*}+h−1|t−1} = (P')^{h−1} ι_{m*}      afterwards,
where it is assumed that the intervention period is a compact period T_r with length
T_{m*}.
Thus, the EMS-VAR model implies a feedback from the observational process to the
state process, which can be exemplified by setting up the likelihood function of an
EMS-VAR model. In contrast to MS-VAR models with an exogenous Markov chain
as the regime-generating process, the likelihood function cannot be written in the
form of a finite mixture of conditional densities p(Y | Y_0, ξ; λ) with positive mixing
proportions Pr(ξ) = Σ_{s_0} ξ̄_{s_0} Π_{t=1}^{T} p_{s_{t−1}s_t} as in (6.6). For this reason
the identifiability arguments invoked in Section 6.2 cannot be applied to the EMS(M)-
VAR(P) model. In RIDDER [1994], identifiability and consistency of ML estimation
is checked for endogenous selection models without autoregressive dynamics, i.e. only
for EMS(M)-VAR(0) models. Hence the properties of the statistical procedures to be
discussed in the next sections merit further investigation.
We will now show how the filtering algorithm and the estimation procedures have to
be modified to handle the case of non-homogeneous Markov chains.
Since P_{t−1} = Π(y_{t−d}) is known at time t − 1, the Bayesian calculations of the
last sections remain valid, even if endogenous selection of regimes is assumed. For
example, the posterior probabilities Pr(ξ_t | y_t, Y_{t−1}) are given by invoking Bayes'
law as

Pr(ξ_t | Y_t) = Pr(ξ_t | y_t, Y_{t−1}) = p(y_t | ξ_t, Y_{t−1}) Pr(ξ_t | Y_{t−1}) / p(y_t | Y_{t−1})      (10.11)

with the a priori probability

where Pr(ξ_t | ξ_{t−1}, y_{t−d}) has replaced the simple transition probability Pr(ξ_t | ξ_{t−1})
and the density p(y_t | Y_{t−1}) is again
Hence we only have to take into account that the Markov chain is no longer homo-
geneous. The necessary adjustments of the filtering and smoothing algorithms affect
only the transition matrix F, which is now time-varying:

ξ̂_{t+1|t} = F_t (η_t ⊙ ξ̂_{t|t−1}) / ( 1'(η_t ⊙ ξ̂_{t|t−1}) ).      (10.12)
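One step of this filter can be sketched as follows. The column-stochastic storage of F_t and all concrete numbers are illustrative assumptions:

```python
import numpy as np

def filter_step(xi_pred, eta_t, F_t):
    """One step of the filter in eq. (10.12): update the predicted regime
    probabilities with the conditional densities eta_t, normalize, then
    predict one step ahead with the (possibly time-varying) F_t."""
    post = eta_t * xi_pred          # elementwise product eta_t (*) xi_{t|t-1}
    post = post / post.sum()        # normalize: xi_{t|t}
    return F_t @ post               # predict: xi_{t+1|t}

xi = np.array([0.5, 0.5])                   # xi_{t|t-1}
eta = np.array([0.9, 0.1])                  # regime-conditional densities p(y_t | s_t, .)
F = np.array([[0.9, 0.3],
              [0.1, 0.7]])                  # columns sum to one (column-stochastic)
xi_next = filter_step(xi, eta, F)
```

With an endogenous or exogenous z_t one would simply recompute F_t at every step; the recursion itself is unchanged.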
DIEBOLD et al. [1994] have proposed a modification of the EM algorithm that can be
used to estimate the parameter vector entering into the transition functions. The use
of the Gibbs sampler has been suggested by FILARDO [1994] and GHYSELS [1994].
While the MS-VAR model with constant transition probabilities has been recognized
as a non-normal, linear state-space model (see Section 2), the EMS-VAR model can
be described as a non-normal, non-linear state-space model, where the non-linearity
arises in the transition equation.
In order to motivate our procedure, let us consider at first the treatment of non-linear
models which are more established in the literature. Again, if the innovations v_t were
normal, v_t ∼ NID(0, Σ_v), we would have a normal, non-linear state-space model.
For this kind of model the extended Kalman filter (cf. e.g. HAMILTON [1994a]) is of-
ten an efficient approach. The idea behind the extended Kalman filter is to linearize
the transition equation and to treat the Taylor approximation at ξ_t = ξ̂_{t|t} as if it were
the true model. These procedures result in an augmented time-varying coefficient
version of a linear state-space model, for which the iterations needed for deriving
the smoothed states ξ̂_{t|T} are well-established. It can be easily verified that the mod-
ified EM algorithm proposed by DIEBOLD et al. [1994] is a straightforward applic-
ation of these ideas developed for the normal non-linear state-space model to MS-
VAR models with time-varying transition probabilities. Thus the statistical analysis
of these models can be embedded in the EM algorithm, which has been discussed in
Chapter 6, for the MS-VAR model with time-invariant transition probabilities.
∂p_mm/∂δ' = p_mm (1 − p_mm) z_t'

∂p_t/∂δ' = [  p_11(1 − p_11) z_t'         0'                      ]
           [  0'                          −p_21(1 − p_21) z_t'    ]
           [  −(1 − p_12) p_12 z_t'       0'                      ]
           [  0'                          (1 − p_22) p_22 z_t'    ]

         = diag( p ⊙ (ι − p) ) [  z_t'    0'   ]
                               [  0'    −z_t'  ]
                               [ −z_t'    0'   ]
                               [  0'     z_t'  ]

10.4. Summary and Outlook 211
Since the resulting first-order condition is non-linear, DIEBOLD et al. [1994] suggest
a linear approximation at δ^{l−1}.
It may be worth noting that HOLST et al. [1994] have proposed to estimate the logits
of the transition probabilities, ln(p_ij/(1 − p_ij)), of a homogeneous Markov chain rather than
the p_ij themselves. This reparametrization is useful especially for the determination of the
information matrix and thus for the variance-covariance matrix.
In the Burns-Mitchell tradition, the identification of turning points has been con-
sidered as the principal task of empirical business cycle research. While the NBER
methodology has been criticized as "measurement without theory" (cf. KOOPMANS
[1947]), the statistical measurement of business cycles is still worth studying.
214 Markov-Switching Models of the German Business Cycle
[Figure 11.1: Real gross national product of West Germany (seasonally adjusted, constant prices), 1960-1994]
1 See inter alia ALBERT & CHIB [1993], DIEBOLD et al. [1994], GHYSELS [1994], GOODWIN
[1993], HAMILTON & SUSMEL [1994], KÄHLER & MARNET [1994a], KIM [1994], KROLZIG
& LÜTKEPOHL [1995], LAM [1990], MCCULLOCH & TSAY [1994a], PHILLIPS [1991] and
SENSIER [1996].
2 As an alternative to MS-AR models of real GNP growth rates, it would be possible to model fluctu-
ations in the utilization rate of potential output, which is preferred in other definitions of the business
cycle (cf. e.g. OPPENLÄNDER [1995]). However, this approach requires the measurement of potential
output and would heavily depend on the quality of the constructed time series. For these reasons we
followed the standard assumptions in the relevant literature.
the seasonally adjusted quarterly GNP data for West Germany from 1960 to 1994.
The overall objectives of this analysis of the German business cycle are (i.) to illus-
trate the as yet theoretically derived properties of MS-AR models, (ii.) to demon-
strate the feasibility of the approach developed in this study for empirical analysis,
and (iii.) to examine the potential role of MS-AR models in forecasting. In con-
trast to the previous literature, statistical characterizations of the business cycle are
examined for a broad range of model specifications. In particular, we will exam-
ine whether the proposed models can essentially replicate traditional business cycle
classifications by employing stochastic models that are parsimonious, statistically
satisfactory and economically meaningful.
This chapter will proceed as follows. In the tradition of HAMILTON [1989], Markov-
switching autoregressive processes in growth rates of the real gross national product
(GNP) are interpreted as stochastic business cycle models. In the following section
the data are presented. Traditional characterizations of the German business cycle
are considered as a benchmark for the following analysis. The strategies introduced
in Chapter 7 for simultaneously selecting the number of regimes and the order of the
autoregression in Markov-switching time series models based on ARMA representa-
tions are used. Maximum likelihood (ML) estimations of the alternative models have
been performed with versions of the EM algorithm introduced in Chapter 6. The
estimation procedures were implemented in GAUSS 3.2.
The presentation begins with the HAMILTON [1989] model. This MSM(2)-AR(4)
model illustrates the implications of the Markov-switching autoregressive model for
the stylized facts of the business cycle. It is shown that the MSM(2)-AR(4) model
cannot be rejected in the class of MSM(2)-AR(P) models. Then we will remain in
the two-regime world and compare the Hamilton model to specifications where the
intercept term is shifting (MSI(M)-AR(P) models). In further steps, the assumption
where Δy_t is 100 times the first difference of the log of real GNP and the conditional
mean μ(s_t) switches between two states (M = 2),
and the variance σ² is constant. The effect of the regime s_t on the growth rate
Δy_t is illustrated with the conditional probability density function p(Δy_t | s_t) in Fig-
ure 11.2.3
3 The plotted p(Δy_t | s_t) are constructed analogously to equation (11.4) using the regime classifica-
tions of the estimated MSM(2)-AR(4) model (cf. Section 11.3). As Δy_t is neither independently nor
identically distributed, Figure 11.2 cannot be considered as a viable kernel density estimate.
11.2. Preliminary Analysis 217
[Figure 11.2: Conditional probability density functions p(Δy_t | s_t) in recession and boom]
11.2.1 Data
While the definition of the business cycle proposed by BURNS & MITCHELL [1946]
emphasizes co-movements in the dynamics of many economic time series, we will
restrict our investigation to a broad macroeconomic aggregate: the gross national
product (GNP) in constant prices of West Germany from 1960 to 1994, which is
[Figure 11.3: Quarterly growth rate of seasonally adjusted West German GNP]
plotted in Figure 11.1. More precisely, we are going to model the quarterly growth
rate of the seasonally adjusted series given in Figure 11.3. The data consist of 132
quarterly observations for the period 1962:1 to 1994:4 (excluding presample values).
Data sources are the Monatsberichte of the Deutsche Bundesbank and, for the data
before 1979, the Quarterly National Accounts Bulletin of the OECD.
The presence of unit roots in the data has been checked by the augmented DICKEY-
FULLER (ADF) test [1979], [1981]. For the null hypothesis of a unit root, i.e. H_0:
π = 0 in the regression

Δy_t = φ + Σ_{i=1}^{p−1} ψ_i Δy_{t−i} + π y_{t−1} + u_t,      (11.1)
the test statistic gives -1.8778 (with p = 12) and -1.85961 (with p = 8). At a 10%
significance level, the null of a unit root in y_t cannot be rejected. For the differenced
time series Δy_t, the ADF test rejects the unit root hypothesis at the 1% significance
level (with test statistics of -4.2436 and -4.0613). Thus, y_t was found to be integrated
of order 1. In the appendix, we also show that the Hodrick-Prescott filter does not
produce a detrended time series with satisfying statistical characteristics. Therefore,
the data are detrended by differencing. The potential importance of structural breaks
for this result has been emphasized by PERRON [1989]. In contrast to this view we
will now consider the MS-AR model, where the presence of regime shifts and unit
roots is assumed.
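The ADF regression (11.1) can be sketched by ordinary least squares. This is an illustrative re-implementation, not the routine used in the study, and the resulting t-statistic still has to be compared against Dickey-Fuller critical values rather than the normal ones:

```python
import numpy as np

def adf_tstat(y, p):
    """t-statistic on pi in the ADF regression (11.1):
    dy_t = phi + sum_{i=1}^{p-1} psi_i * dy_{t-i} + pi * y_{t-1} + u_t."""
    dy = np.diff(y)
    rows, Y = [], dy[p - 1:]
    for t in range(p - 1, len(dy)):
        # constant, p-1 lagged differences (most recent first), lagged level
        rows.append(np.r_[1.0, dy[t - (p - 1):t][::-1], y[t]])
    X = np.array(rows)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    s2 = resid @ resid / (len(Y) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[-1] / np.sqrt(cov[-1, -1])
```

For a stationary series the statistic is strongly negative; for a random walk it hovers above the (negative) Dickey-Fuller critical values, mirroring the rejections reported in the text.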
[Figure 11.4: Traditional classifications of the German business cycle, 1960-1995]
In Figure 11.4 etc., the dark shaded areas denote recessions as the decline from
the upper turning point ("peak") to the lower turning point ("trough") of the busi-
ness cycle. The classical business cycles are characterized by alternating periods of
expansion and contraction in the level of macroeconomic activity. They are encom-
passed by growth cycles, which are short-term fluctuations in macroeconomic activity
characterized by periods of high and low mean rates of growth. The more common
phases of decelerating growth rates are indicated by light shaded areas. More de-
tails on the methodology of the CIBCR and the data source can be found inter alia
in ZARNOWITZ [1995] and NIEMIRA & KLEIN [1994].
Akaike Criterion
     AIC     ARMA   MSI-AR  MSM-AR  MSI(M,q)-AR(p)
1.   0.0295  (6,8)  -       -       (2,8,5)
2.   0.0583  (8,8)  -       -       (2,8,7)
3.   0.0874  (3,7)  -       -       (2,7,2)
4.   0.0997  (3,4)  -       -       (2,4,2)
5.   0.1062  (4,4)  (5,0)   -       -

Schwarz Criterion
     SC      ARMA   MSI-AR  MSM-AR  MSI(M,q)-AR(p)
1.   0.2939  (3,4)  -       -       (2,4,2)
2.   0.3247  (4,4)  (5,0)   -       -
3.   0.3544  (3,7)  -       -       (2,7,2)
4.   0.3652  (3,6)  -       -       (2,6,2)
5.   0.3746  (5,4)  (5,1)   (2,4)   -
A critical decision in the specification of MS-AR processes is the choice of the num-
ber of regimes M which are required for the Markov chain to characterize the ob-
served process. As we have seen in Section 7.5, testing procedures for the determin-
ation of the number of states are confronted with non-standard asymptotics. Due to
the existence of nuisance parameters under the null hypothesis, the likelihood ratio
test statistic does not possess an asymptotic χ² distribution.
In order to apply this model selection strategy to the data under consideration, we
have performed a univariate ARMA analysis. The maximum likelihood estimations
of the ARMA models were computed with the BOXJENK procedure provided by
RATS. The Akaike information criterion (AIC) and the Schwarz criterion (SC) were
employed to assist in choosing the appropriate order of the ARMA(p, q) processes.
The recommended ARMA models and corresponding MS-AR models are given in
Table 11.1. 4 Equipped with these results, we are able to select MS models which
could have generated the selected ARMA representation and thus can be expected
to be consistent with the data.
Note that in the class of MSI(M, q)-AR(P) models, under regularity conditions,
the ARMA(p*, q*) representation corresponds to a unique generating MSI(M, q)-
AR(P) process, as can be inferred from Table 7.3. Apart from that, the specifica-
tion (M, p, q) of the most parsimonious MSM(M, q)-AR(P) and MSI(M, q)-AR(P)
model has been reported.5 Thus, for the selected ARMA(p*, q*) representation with
p* ≥ q* ≥ 1, the unique MSI(M)-AR(P) model with M = q* + 1 and p = p* − q*,
and for p* − 1 = q* ≥ 1 the parsimonious MSM(2)-AR(p* − 1), is provided. For
completeness, the MSI(M, q)-AR(P) model introduced in Section 10.2 has been ap-
plied if the MA order q* is larger than the AR order p*.
The selected MSM-AR and MSI-AR models should be considered as take-off points
for the estimation of more general MS models. As a next step, the recommended
MSM(M)-AR(P) and MSI(M)-AR(P) models are estimated and then compared with
regard to the resulting classifications of the German business cycle. It is worthwhile
to note that the MSM(2)-AR(4) model used by Hamilton in his analysis of the U.S.
business cycle is among the preferred models. But the results indicate that the further
analysis should not be restricted to two regimes. A Markov chain model with five
states and no autoregressive structures may be an especially feasible choice.6 The
MSI(5)-AR(0) model will be discussed in Section 11.6.3.
4 The complete results including the computed selection criteria values for ARMA(p, q) models with
0 ≤ p ≤ 14, 0 ≤ q ≤ 10 are presented in KROLZIG [1995].
5 For example, the recommended ARMA(5,4) model is also compatible with an MSM(3)-AR(3) and an
MSM(4)-AR(2) model.
6 Unfortunately, MSM-AR models with more than two states and containing some lags quickly become
computationally demanding and therefore unattractive. Analogous problems would have been caused
by MSI(M, q)-AR(P) models. Consequently we consider only MSM(M)-AR(P) models with M ≤ 3
and MSI(M)-AR(P) models.
According to the results of our ARMA-representation-based model pre-selection, the
empirical analysis of the German business cycle can be started with the application
of the MSM(2)-AR(4) model introduced by HAMILTON [1989], whose theoretical
aspects have been discussed in Section 11.1. It will be shown (i.) that the Hamilton
specification does not only reveal meaningful business cycle phenomena, (ii.) that
the Hamilton specification cannot be rejected by likelihood ratio tests in the class of
MSM(2)-AR(P) models, as shown in Table 11.2 and in Section 11.7, and (iii.) that
the MSM(2)-AR(4) model is supported by likelihood ratio tests of the homoske-
dasticity hypothesis. Furthermore, the main features of the Markov-switching auto-
regressive model will be illustrated by means of the Hamilton model.
Maximum likelihood estimation of the MSM(2)-AR(4) model has been carried out
11.3. The Hamilton Model 223
with the EM algorithm given in Table 9.19; the numbers in parentheses give the
asymptotic standard errors as discussed in Section 6.6.2:
These results are in line with MSM(2)-AR(4) models estimated by GOODWIN [1993]
[1995] for data from 1961:2 to 1991:4 as well as the MSM(2)-AR(1) model fitted by
PHILLIPS [1991] to monthly growth rates of West German industrial production.
Since the most innovative aspect of the Hamilton model is its ability to objectively
date business cycles, a main purpose of our analysis is to check the sensitivity of
business cycle classifications to the model specification.
[Figure 11.7: MSM(2)-AR(4) Model: Regime Shifts and the Business Cycle]
In general, we will use the following simple rule for regime classification: attach
the observation at time t to the regime m* with the highest full-sample smoothed
probability,

m* := arg max_m Pr(s_t = m | Y_T).      (11.2)
This procedure is, in two-regime models, equivalent to the 0.5 rule proposed by
HAMILTON [1989], such that

m* = { 1   if Pr(s_t = 1 | Y_T) > 0.5
     { 2   otherwise.
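Rule (11.2) amounts to a row-wise argmax over the smoothed probabilities; the function name and the example probabilities below are illustrative:

```python
import numpy as np

def classify_regimes(smoothed):
    """Attach each observation to the regime with the highest smoothed
    probability, rule (11.2); for M = 2 this reduces to Hamilton's 0.5 rule."""
    return np.argmax(smoothed, axis=1) + 1   # regimes numbered 1..M

# T = 3 observations, M = 2 regimes (illustrative smoothed probabilities)
probs = np.array([[0.80, 0.20],
                  [0.40, 0.60],
                  [0.55, 0.45]])
regimes = classify_regimes(probs)
# -> regimes [1, 2, 1]
```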
Interestingly, the traditional business cycle dates given in Figure 11.6 correspond
fairly closely to the expansion and contraction phases as described by the Markov-
switching model. In contrast to the conclusion of KÄHLER & MARNET [1994a,
p. 173], who "were not able to find meaningful business-cycle phenomena", our es-
timated MSM(2)-AR(4) model detects precisely the recession in 1966:3-1967:2 as
well as the recessions in 1973:4-1975:2 and 1980:1-1982:4 succeeding the oil price
shocks. Furthermore, the model is able to describe even the macroeconomic tenden-
cies after the German reunification.
One advantage of the MS-AR model is its ability not only to classify observations,
but also to quantify the uncertainty associated with this procedure of regime classific-
ation. If we attach the observation at time t to the regime m* according to rule (11.2),
the uncertainty of this classification can be measured by

(M/(M − 1)) Σ_{m ≠ m*} Pr(s_t = m | Y_T),

where (M − 1)/M is the maximal uncertainty, attained if all regimes m = 1, …, M are possible
with the same probability 1/M. Hence, the proposed measure is bounded between
0 and 1. Obviously, we get for M = 2 that the probability of a wrong classification,
which is given by the complementary probability, is normalized to 2 Pr(s_t ≠ m* | Y_T).
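The normalization can be checked numerically; the computation uses 1 minus the maximal probability, which equals the sum over m ≠ m* whenever each row is a probability distribution (function name and inputs are illustrative):

```python
import numpy as np

def classification_uncertainty(smoothed):
    """Uncertainty measure from the text: M/(M-1) times the total smoothed
    probability of all regimes other than the most likely one (computed as
    1 - max, valid because each row sums to one); lies in [0, 1]."""
    M = smoothed.shape[1]
    top = smoothed.max(axis=1)
    return (M / (M - 1)) * (1.0 - top)

u = classification_uncertainty(np.array([[0.5, 0.5],    # maximal uncertainty
                                         [1.0, 0.0]]))  # perfect classification
# -> [1.0, 0.0]
```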
The results presented in Figure 11.25 for the MSM(2)-AR(4) model of the German
business cycle show that uncertainty approaches its maximum at the CIBCR turning
points of the business cycle. Given the results from Figure 11.6, this coincides with
the detection of regime transitions. Thus, the timing of a regime shift seems to be the
main problem arising with the identification of regimes. These findings and their
implications for forecasting will be reconsidered in Section 11.9.
where ξ̂_{1t|T} = Pr(s_t = 1 | Y_T). Figure 11.7 reconsiders the estimated time path of
the conditional mean of the growth rate, which has already been given as the third chart
of Figure 11.5. Obviously the (reconstructed) regime shifts describe the core growth
rate in the historical boom and recession episodes fairly well.
Figure 11.8 shows the dynamic effects of a shift in the regime s_t and of a shock u_t.
In the left figure, the expected growth rate is given conditional on the information
that the business cycle is at time t in the state of a boom or a recession.
The innovation impulse responses plotted in Figure 11.8 are the coefficients Φ_i of the
MA(∞) representation,

Δy_t = Σ_{m=1}^{M} μ_m ξ_{mt} + Σ_{j=0}^{∞} Φ_j u_{t−j},
which can be interpreted as the response of the growth rate Δy_t to an impulse u_{t−i},
i periods ago. Thus, the impulse response function for the Gaussian innovation can
be calculated as for time-invariant AR processes.7
7 However, this innovation impulse function has to be distinguished substantially from the forecast error
The impulse responses exhibit a strong periodic structure. Hence, the remarkable be-
nefit from a fourth lag might be evidence of spurious seasonality in the considered
seasonally adjusted data.
If the shift in regime were permanent, the system would jump immediately to
its new level μ_1 or μ_2 (dotted line). Due to the stationarity of the Markov chain, the
conditional distribution of regimes converges to the ergodic distribution.
For a two-dimensional Markov chain, it can be shown that the unconditional regime
probabilities shown in Table 11.2 are given by

ξ̄_1 = (1 − p_22) / ((1 − p_22) + (1 − p_11))    and    ξ̄_2 = (1 − p_11) / ((1 − p_22) + (1 − p_11)).
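These formulas are easily checked numerically. The transition probabilities below are illustrative round numbers implied by the expected durations reported later in the text, not the estimated values themselves:

```python
def ergodic_probs_2state(p11, p22):
    """Unconditional regime probabilities of a two-state Markov chain,
    using the closed-form expressions above."""
    denom = (1.0 - p22) + (1.0 - p11)
    return (1.0 - p22) / denom, (1.0 - p11) / denom

# p11 ~ 1 - 1/12.2 (boom), p22 ~ 1 - 1/4.6 (recession), rounded for illustration
xi1, xi2 = ergodic_probs_2state(0.918, 0.783)
```

The implied unconditional recession probability xi2 comes out near the 0.2719 reported for the Hamilton model.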
An important characteristic associated with business cycles and many other eco-
nomic time series (cf. KUNITOMO & SATO [1995]) is the asymmetry of expansion-
ary and contractionary movements. It is thus a great advantage of the MS-AR model
in comparison with linear models that it can generate asymmetry of regimes. The
incorporated business cycle non-linearities in the MSM(2)-AR(4) model are shown
in Figure 11.9.
[Figure 11.9: Distribution of the duration of regimes, Pr(h = j), for recession and boom]
The expected duration of a recession differs in general from the duration of a boom.
These expected values can be calculated from the transition probabilities as:
E[h | s_t = m] = Σ_{i=1}^{∞} i (1 − p_mm) p_mm^{i−1} = 1 / (1 − p_mm),    m ∈ {1, …, M}.      (11.3)
In the Hamilton model (cf. Table 11.2) the expected duration of a recession is 4.6
quarters, that of a boom is 12.2 quarters. The unconditional probability of a recession
is estimated as 0.2719.
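The geometric-duration formula (11.3) is a one-liner; the example value of p_mm is illustrative:

```python
def expected_duration(p_mm):
    """Expected duration of regime m, eq. (11.3): E[h | s_t = m] = 1/(1 - p_mm)."""
    return 1.0 / (1.0 - p_mm)

# e.g. a staying probability of 0.9 implies an expected stay of 10 periods
d = expected_duration(0.9)
```

Reading (11.3) backwards, the reported durations of 4.6 and 12.2 quarters pin down the staying probabilities as 1 − 1/4.6 and 1 − 1/12.2 respectively.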
û_it = Δy_t − μ̂_i − Σ_{j=1}^{p} α̂_j Δy_{t−j}    for i = 1, …, M^{p+1},
[Figure 11.10: Kernel density estimates of the innovations u_t, conditional on recession and boom]
K(x) = (2π)^{−1/2} exp(−x²/2)      (11.4)

is the Gaussian kernel. For h = 0.5, the resulting kernel density estimates are given
in Figure 11.10.
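A kernel density estimate with the Gaussian kernel (11.4) and bandwidth h = 0.5 can be sketched as follows; the grid and the data points are illustrative, not the estimated residuals:

```python
import numpy as np

def kde_gauss(x_grid, data, h=0.5):
    """Kernel density estimate with the Gaussian kernel (11.4) and
    bandwidth h, as used for Figure 11.10 (illustrative re-implementation)."""
    z = (x_grid[:, None] - data[None, :]) / h
    K = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return K.mean(axis=1) / h       # average kernel mass, rescaled by h

grid = np.linspace(-4.0, 4.0, 81)
dens = kde_gauss(grid, np.array([-1.0, 0.0, 1.0]))
```

As the footnote warns, applying this to serially dependent regime-classified residuals gives a descriptive picture rather than a proper density estimate.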
11.4. Models with Markov-Switching Intercepts 231
[Figure 11.11: Expected residuals and the business cycle]
The results of the kernel density estimation should be compared with Figure 11.11,
where the expected residual û_t,

û_t = Σ_{m=1}^{M^{p+1}} û_mt ξ̂_{mt|T},

is plotted against time. The path of residuals verifies that the business cycle is
generated by shifts in regime; larger shocks u_t are not related to the CIBCR turning
points.
8 See BIANCHI [1995] for a possible detection of regime shifts by kernel density estimation.
rate smoothly approaches a new level after the transition from one state of the busi-
ness cycle to another. For these situations, the MSI-AR model may be used. Esti-
mation results for alternative MSI specifications for the period 1962:1 to 1994:4 are
summarized in Table 11.3.
Interestingly, the results are very similar to those of the last section. As a compar-
ison with Table 11.2 verifies, the estimated parameters of the Markov chain and the
likelihood are quite close to the corresponding MSM(2)-AR(P) models. Again an
MSI(2)-AR(4) model outperforms models with lower and higher AR orders. This
can be shown by means of a likelihood ratio test of the type H_0: α_p = 0 against
H_1: α_p ≠ 0, which is asymptotically χ²(1) distributed. The differences in the
properties of the MSM(2)-AR(4) and the MSI(2)-AR(4) model shall be compared in
the following considerations.
In Figure 11.12, the conditional growth expectations and the impulse response func-
tion are given for the MSI(2)-AR(4) model. While the impulse responses are quite
similar to those of the MSM(2)-AR(4) model, a comparison of the dynamic propagation
(dotted lines) of a permanent shift in regime in the MSI model with those of the
MSM model illustrates the different assumptions of both models.
[Figure 11.12: MSI(2)-AR(4) model: conditional growth expectations (left) and impulse responses (right)]
As we have seen in Figure 11.8, a permanent shift in regime induces in the MSM(2)-
AR(4) model a once-and-for-all jump in the process mean. In the MSI model,
however, a permanent change in regime causes the same dynamic response (dotted
lines in the left diagram of Figure 11.12) as the accumulated impulse responses of a
Gaussian innovation with the same impact effect (dotted line on the right).
Thus the periodic structure of the impulse responses, as seen on the right of Figure 11.12,
is translated into the dynamic propagation of a shift in regime.
As long as the Markov chain is ergodic, though, the dynamic effects of a shift in
regime in both approaches differ only transitorily.
[Figure 11.13: Contribution of the Markov chain to the business cycle]
The long-term mean growth rate depends only on the stationary distribution ξ̄
of the state of the Markov chain and is thus given by the unconditional mean,
Δȳ = μ̄ = M'ξ̄ in the MSM model and Δȳ = Σ_{j=0}^{∞} Φ_j ν'ξ̄ in the MSI model,
respectively.
In Figure 11.13 the contribution of the Markov chain to the business cycle is again
measured by the estimated mean of Δy_t conditioned on the regime inference ξ̂ =
{ξ̂_{t|T}}_{t=1}^{T}, μ̂_{t|T} = E[Δy_t | ξ̂], which can be calculated in the MSI(2)-
AR(P) model recursively as

μ̂_{t|T} = Σ_{j=1}^{p} α̂_j μ̂_{t−j|T} + ν̂'ξ̂_{t|T}.      (11.5)

The p-th order difference equation (11.5) is initialized with the unconditional mean,

μ̄ = (1 − Σ_{j=1}^{p} α̂_j)^{-1} (ν̂_1 ξ̄_1 + ν̂_2 ξ̄_2).
Thus, the calculation of μ̂_{t|T} is slightly more complicated than for MSM-AR models.
As can be seen from the estimation results in Table 11.3 and Table 11.2, as well as a
comparison of Figure 11.13 with Figure 11.5, the similarity of the regime classific-
ations to those of the MSM(2)-AR(4) model is obvious. A major difference which
occurs by using the 0.5 rule concerns the year 1987, where the MSI-AR model de-
tects a one-quarter recession which leads the stock market crash.
Thus, Markov-switching models with a regime shift in the intercept term can be used,
as well as models with a regime shift in the mean, as a device to describe the German
business cycle.
In this section we will relax the assumption that the white noise process u_t is homo-
skedastic, instead allowing for regime-dependent heteroskedasticity of u_t.
Even if the white noise process u_t is homoskedastic, σ²(s_t) = σ², the observed pro-
cess Δy_t may be heteroskedastic. The process is called conditionally heteroskedastic
if the conditional variance of the forecast error
is a function of Y_{t−1}. This implies for MS-AR processes with regime-invariant auto-
regressive parameters that the conditional variance is a function of the regime infer-
ence ξ̂_{t−1|t−1}. A necessary and sufficient condition for conditional heteroskedasti-
city of these processes is the predictability of the regime vector.
For the MSIH(2)-AR(P) model, the effect of the actual regime classification ξ̂_{t|t} on
the conditional heteroskedasticity of the forecast error variance in t + 1 is given by

Var(Δy_{t+1} | Y_t) = ξ̂_{1,t+1|t} σ_1² + (1 − ξ̂_{1,t+1|t}) σ_2² + ξ̂_{1,t+1|t}(1 − ξ̂_{1,t+1|t})(μ_1 − μ_2)²,      (11.6)
[Figure 11.14: Conditional variances Var(Δy_{t+1} | ξ̂_{t+1}) for the MSI(2)-AR(4) and MSIH(2)-AR(4) models]
where ξ̂_{t+1|t} = P'ξ̂_{t|t}. If the variance σ² of the white noise term u_t is not regime-
dependent, as in the MSI(2)-AR(P) model, the calculation of the conditional forecast error
variance in t + 1 in (11.6) simplifies to
In Figure 11.14, these two components of the forecast error variance are illustrated
for the MSI(2)-AR(4) model, which has been discussed in Section 11.4, and the
MSIH(2)-AR(4) model, which will be introduced next.
It will be clarified that the uncertainty associated with the regime classification,
ξ̂_{1t|t}(1 − ξ̂_{1t|t}), is immediately transformed into the forecast error variance through
where ρ = p_11 − (1 − p_22).
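The decomposition behind (11.6) is the variance of a two-component mixture: the probability-weighted within-regime variances plus a between-regime term driven by the classification uncertainty. A sketch in that spirit (function name and inputs are illustrative, and the book's exact expression may group terms differently):

```python
def mixture_forecast_variance(xi1, mean1, mean2, sig1_sq, sig2_sq):
    """Conditional forecast error variance of a two-regime mixture, in the
    spirit of eq. (11.6): within-regime variance plus the regime-uncertainty
    term xi1*(1 - xi1)*(mean1 - mean2)^2."""
    within = xi1 * sig1_sq + (1.0 - xi1) * sig2_sq
    between = xi1 * (1.0 - xi1) * (mean1 - mean2) ** 2
    return within + between

v_certain = mixture_forecast_variance(1.0, 1.0, -1.0, 2.0, 3.0)   # regime known
v_uncertain = mixture_forecast_variance(0.5, 1.0, -1.0, 1.0, 1.0) # maximal uncertainty
```

When the regime is known (xi1 equal to 0 or 1) the between term vanishes, which is exactly how classification uncertainty feeds the forecast error variance in the text.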
In MSM specifications, the calculations are more complicated since the conditional density p(y_{t+1} | Y_t, ξ_t) (and thus the conditional variance) depends on the M^{p+1}-dimensional state vector. The uncertainty resulting from μ(s_t), ..., μ(s_{t-p+1}) has to be taken into consideration.
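For the two-regime case, the conditional forecast error variance in (11.6) is just the variance of a two-component mixture. The following sketch is our own illustration (the function and variable names are not the book's notation):

```python
# Variance of a two-regime mixture (cf. eq. 11.6): the regime-weighted
# noise variance plus a term driven by the predicted regime probability
# xi = Pr(s_{t+1} = 1 | Y_t).

def forecast_error_variance(xi, mu1, mu2, sigma2_1, sigma2_2):
    within = xi * sigma2_1 + (1.0 - xi) * sigma2_2    # E[Var | regime]
    between = xi * (1.0 - xi) * (mu1 - mu2) ** 2      # regime-classification uncertainty
    return within + between

# Homoskedastic special case (MSI/MSM: sigma2_1 == sigma2_2 == sigma2):
# only the between-regime component depends on the regime inference.
def forecast_error_variance_homoskedastic(xi, mu1, mu2, sigma2):
    return sigma2 + xi * (1.0 - xi) * (mu1 - mu2) ** 2
```

At ξ = 0.5 the classification-uncertainty term is maximal; at ξ = 0 or ξ = 1 the variance collapses to the within-regime variance, which is the sense in which predictability of the regime vector generates conditional heteroskedasticity.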
11.5. Regime-Dependent and Conditional Heteroskedasticity 237
In Section 7.4.1 it has been shown that the likelihood ratio test can be based on the LR statistic

LR = 2 (ln L(λ̂) - ln L(λ̂₀)),

where λ̂₀ denotes the restricted ML estimate of the n-dimensional parameter vector λ under the null H₀: φ(λ) = 0, with r = rk(∂φ(λ)/∂λ') ≤ n. Under the null, LR has an asymptotic χ²-distribution with r degrees of freedom, as stated in (7.3).
Conditional on the regime dependence of the mean, μ₁ > μ₂, or of the intercept term, ν₁ > ν₂, likelihood ratio tests of hypotheses of interest such as σ₁² = σ₂² can be performed as in models with deterministic regime shifts.
For the MSIH(2)-AR(4) model the conditional forecast error variance as well as the conditional variance of the error term have been illustrated in Figure 11.14.
[Figure: Smoothed and filtered probabilities of regime 2; contribution of the Markov chain to the business cycle]
The kernel density estimation in Section 11.3.5 has provided some evidence that it
[Figure: Smoothed and filtered probabilities of regime 1]
may be too restrictive to assume that the regime shift does not alter the variance of the innovation u_t. An estimation of the MSMH(2)-AR(4) model seems to support this view, σ̂₁² = 1.5080 > 0.5348 = σ̂₂². However, for the null hypothesis MSM(2)-AR(4): σ₁² = σ₂², μ₁ > μ₂ versus the alternative MSMH(2)-AR(4): σ₁² ≠ σ₂², μ₁ > μ₂, we get the LR test statistic LR = 2[(-217.22) - (-218.78)] = 3.12. With the conventional critical value of χ²_{0.95}(1) = 3.84146, the null hypothesis of a regime-invariant variance of the innovation process cannot be rejected.
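The LR computation above can be reproduced in a few lines; for a single restriction the χ²(1) tail probability can be evaluated with the complementary error function, so only the standard library is needed (a sketch using the log-likelihoods quoted in the text):

```python
import math

# LR test of a regime-invariant innovation variance, H0: sigma1^2 = sigma2^2.
# For r = 1 restriction, X ~ chi-square(1) is Z^2 with Z standard normal,
# so P(X > x) = erfc(sqrt(x / 2)).
logL_restricted = -218.78    # MSM(2)-AR(4)
logL_unrestricted = -217.22  # MSMH(2)-AR(4)

LR = 2.0 * (logL_unrestricted - logL_restricted)   # 3.12, as in the text
p_value = math.erfc(math.sqrt(LR / 2.0))           # about 0.077

reject_at_5pct = p_value < 0.05                    # False: H0 not rejected
```

The p-value of roughly 0.08 exceeds 0.05, matching the comparison of LR = 3.12 with the critical value 3.84146.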
In contrast to KÄHLER & MARNET [1994a], we conclude that allowing for regime-dependent heteroskedasticity cannot significantly improve the fit of the model. But this result may depend essentially on the estimation period. This outcome of the LR test is visualized in Figure 11.16, which makes clear that the MSMH(2)-AR(4) model leads to a very similar regime classification as the MSM(2)-AR(4) model (cf. Figure 11.5). There are two major changes: the short CIBCR recession in 1963 is now attached to the more volatile expansionary regime, and the same holds true for the
The ARMA-representation-based model selection indicates that more than two regimes should be taken into account. In particular, a Markov-switching model with five regimes has been recommended as an ingenious device.
ν̃ = [ 3.2997 ]        [ 0.4907  0.3157  0.1936 ]
    [ 0.6353 ],   P̃ = [ 0.0000  0.9732  0.0268 ].
    [-2.5763 ]        [ 0.9999  0.0001  0.0000 ]
As seen in Figure 11.17, the MSI(3)-AR(0) model has completely lost its business cycle characterization. The outlier periods coincide with epochs of high volatility in the process of economic growth: the period 1968-71 with an active Keynesian stabilization policy and drops in industrial production caused by strikes, and the first quarter of 1987, the year of the stock market crash.
11.6. Markov-Switching Models with Multiple Regimes 241
           MSI(3)   MSIH(3)  MSMH(3)  MSI(4)   MSIH(4)  MSIH(4)  MSIH(4)  MSI(5)   MSIH(5)  MSIH(5)
           -AR(0)   -AR(0)   -AR(3)   -AR(0)   -AR(0)   -AR(2)   -AR(4)   -AR(0)   -AR(0)   -AR(1)
μ1, ν1      3.2997   3.2969   0.9068   3.4190   3.4027   3.5906   3.5143   3.9168   3.2809   3.5677
μ2, ν2      0.6353   0.6370   1.2712   1.0601   1.2000   1.4362   1.3852   1.6698   1.4460   1.5170
μ3, ν3     -2.5763  -2.6269  -0.3436  -0.1917   0.3388  -0.3528  -0.3657   0.9966   0.6312   1.1761
μ4, ν4                                -2.5008  -2.6237  -2.2066  -2.2104  -0.2181  -0.3499  -0.3255
μ5, ν5                                                                    -2.6016  -2.6180  -2.1021
α1                           -0.3576                    -0.2376  -0.2345                    -0.2386
α2                           -0.1268                    -0.0012  -0.0038
α3                           -0.0985                              0.0144
α4                                                                0.0265
p11         0.4907   0.4948   0.7459   0.4007   0.4533   0.5527   0.5524   0.1433   0.4886   0.5366
p12         0.3157   0.3113   0.2541   0.4587   0.3810   0.2281   0.2284   0.3424   0.0000   0.2487
p13         0.1936   0.1939   0.0000   0.0000   0.0034   0.0000   0.0000   0.5143   0.3215   0.0000
p14                                    0.1406   0.1622   0.2192   0.2192   0.0000   0.0000   0.0000
p15                                                                        0.0000   0.1899   0.2147
p21         0.0000   0.0000   0.1782   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
p22         0.9732   0.9737   0.5814   0.8372   0.7573   0.7725   0.7846   0.6035   0.5015   0.3031
p23         0.0268   0.0263   0.2404   0.1099   0.1589   0.1861   0.1741   0.0000   0.4653   0.6969
p24                                    0.0529   0.0838   0.0414   0.0413   0.0000   0.0000   0.0000
p25                                                                        0.3965   0.0332   0.0000
p31         0.9999   1.0000   0.0081   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000   0.0000
p32         0.0001   0.0000   0.3094   0.2208   0.0794   0.3393   0.3271   0.0000   0.1100   0.0784
p33         0.0000   0.0000   0.6825   0.7792   0.9206   0.6607   0.6729   0.8825   0.7866   0.7620
p34                                    0.0000   0.0000   0.0000   0.0000   0.1105   0.0699   0.1022
p35                                                                        0.0070   0.0335   0.0574
p41                                    1.0000   1.0000   0.7880   0.7847   0.0000   0.0000   0.0000
p42                                    0.0000   0.0000   0.2120   0.2153   0.0421   0.2254   0.1835
p43                                    0.0000   0.0000   0.0000   0.0000   0.1737   0.0017   0.0000
p44                                    0.0000   0.0000   0.0000   0.0000   0.7843   0.7729   0.8165
p45                                                                        0.0000   0.0000   0.0000
p51                                                                        1.0000   1.0000   0.7853
p52                                                                        0.0000   0.0000   0.2147
p53                                                                        0.0000   0.0000   0.0000
p54                                                                        0.0000   0.0000   0.0000
p55                                                                        0.0000   0.0000   0.0000
ξ̄1          0.0753   0.0749   0.2922   0.0686   0.0651   0.0684   0.0685   0.0323   0.0752   0.0714
ξ̄2          0.8863   0.8873   0.4028   0.5945   0.2987   0.5765   0.5824   0.0595   0.1912   0.1595
ξ̄3          0.0384   0.0378   0.3049   0.2958   0.6006   0.3162   0.3100   0.5823   0.5315   0.4669
ξ̄4                                     0.0411   0.0356   0.0389   0.0391   0.2983   0.1636   0.2601
ξ̄5                                                                         0.0277   0.0385   0.0421
(1-p11)⁻¹   1.9633   1.9795   3.9358   1.6687   1.8291   2.2356   2.2340   1.1673   1.9555   2.1582
(1-p22)⁻¹  37.2831  38.0814   2.3890   6.1419   4.1210   4.3957   4.6420   2.5218   2.0062   1.4349
(1-p33)⁻¹   1.0000   1.0000   3.1492   4.5283  12.5944   2.9470   3.0574   8.5117   4.6861   4.2010
(1-p44)⁻¹                              1.0000   1.0000   1.0000   1.0000   4.6351   4.4031   5.4509
(1-p55)⁻¹                                                                  1.0000   1.0000   1.0000
σ1²         1.0306   1.1313   3.6251   0.7118   1.0626   0.2448   0.2588   0.7110   1.1576   0.2503
σ2²                  1.0501   0.1904            0.3955   0.4280   0.4510            0.1252   0.0522
σ3²                  0.3034   0.3854            1.1549   0.4308   0.4447            1.0913   0.8545
σ4²                                             0.3095   0.6612   0.6990            0.2432   0.5219
σ5²                                                                                 0.3064   0.7706
ln L      -211.62  -210.43  -209.82  -206.37  -204.95  -198.11  -198.09  -201.70  -199.60  -190.45
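The expected-duration rows (1 - p_mm)⁻¹ of the table follow directly from the diagonal elements of the transition matrix. A minimal sketch, using the MSI(3)-AR(0) diagonal entries from the table:

```python
# Expected duration of regime m is 1 / (1 - p_mm); the values below are the
# diagonal transition probabilities of the MSI(3)-AR(0) model.
def expected_duration(p_mm):
    return 1.0 / (1.0 - p_mm) if p_mm < 1.0 else float("inf")

p_diag = {"p11": 0.4907, "p22": 0.9732, "p33": 0.0000}
durations = {m: expected_duration(p) for m, p in p_diag.items()}
# p22 = 0.9732 implies a very persistent regime of roughly 37 quarters,
# while p33 = 0 gives the one-quarter outlier state.
```

Small differences from the tabulated durations reflect rounding of the reported transition probabilities.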
[Figure 11.17: Smoothed and filtered regime probabilities of the MSI(3)-AR(0) model; contribution of the Markov chain to the business cycle]
Regimes 2 and 3 are associated with business cycle phenomena. Regime 2, with an expected growth rate of 1.06, which is extremely close to the 1.07 of the MSM(2)-AR(4) model, reflects phases of "normal" expansions. Also, the recessionary state 3 is quite compatible with the Hamilton model. The expected duration of a recession is identically 4.5 quarters, the conditional mean growth rate is -0.19 vs. -0.30. So,
The regime probabilities associated with the MSI(4)-AR(0) model are plotted in Figure 11.18. Unfortunately, the classification uncertainty is relatively high. Allowing for a first-order autocorrelation destroys some of these features, as in an MSI(4)-AR(1) model. In contrast to previous specifications, first-order autocorrelation and heteroskedasticity are not evident for MSI models with five regimes.
The model pre-selection in Section 11.2.3 has shown that an MSI(5)-AR(0) model may have generated the autocovariance function of the observed West-German GNP
growth rates. Thus, a hidden Markov-chain model with five states and no autoregressive structures may be a feasible choice.
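Such a hidden Markov-chain model is easy to simulate: draw the state from the relevant row of the transition matrix and add Gaussian noise to the state-dependent intercept. The sketch below uses illustrative two-regime placeholder parameters, not the estimates reported here:

```python
import random

# Simulate y_t = nu[s_t] + sigma * eps_t, where s_t follows a Markov chain
# with transition matrix P (an MSI(M)-AR(0), i.e. hidden Markov, model).
def simulate_msi_ar0(nu, sigma, P, T, seed=0):
    rng = random.Random(seed)
    s = 0
    states, y = [], []
    for _ in range(T):
        u = rng.random()               # draw next state from row s of P
        cum, nxt = 0.0, len(P[s]) - 1  # last state catches rounding
        for j, pj in enumerate(P[s]):
            cum += pj
            if u < cum:
                nxt = j
                break
        s = nxt
        states.append(s)
        y.append(nu[s] + sigma * rng.gauss(0.0, 1.0))
    return states, y

# Illustrative two-regime example (placeholder parameters):
states, y = simulate_msi_ar0(nu=[1.0, -0.4], sigma=0.7,
                             P=[[0.9, 0.1], [0.2, 0.8]], T=200)
```

With M = 5 states the same routine applies; only nu and P grow accordingly.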
The MSI(5)-AR(0) model has remarkable turning point dating abilities, as a comparison with the CIBCR-dated turning points in Figure 11.20 verifies.
In this section we have so far only considered hidden Markov-chain models with a
homoskedastic white noise. Augmenting the MSI(5)-AR(O) model with a first-order
autoregression and regime-dependent variances leads to an MSIH(5)-AR(1), where
the boom regime splits into pre- and post-recessionary expansion phases represented
by regimes 3 and 2.
The main difference between the MSI(4)-AR(0) model and the MSIH(5)-AR(1) lies in the function of regime 2. In the MSIH(5)-AR(1) model, regime 2 helps to replicate excessively high growth after the end of recessions, a phenomenon that has been stressed by SICHEL [1994] for the United States. It should be noted that the MSIH(5)-AR(1) model is characterized by a very fast detection of economic upswings.
11.7. MS-AR Models with Regime-Dependent Autoregressive Parameters 247
For U.S. data, HANSEN [1992] found some evidence for an MSA(2)-MAR(4) model with shifting autoregressive parameters, but a regime-invariant mean.⁹
Consequently, the previous assumption that the regime shift does not alter the autoregressive parameters will be relaxed in this section.
The estimation results for MSIAH(M)-AR(p) models, where all parameters are assumed to shift, are given in Table 11.6. For the MSIAH(2)-AR(4) model the probability of being in regime 2, which is characterized by a lower intercept term ν₂ = 0.3608 < 1.9978 = ν₁ and variance σ̂₂² = 1.2004 < 6.7104 = σ̂₁², is plotted in Figure 11.22.
⁹In contrast to the MSA(2)-MAR(4) model, the likelihood ratio test proposed by HANSEN [1992] could not reject a regime-invariant AR(4) model against an MSM(2)-AR(4) model; compare, however, HANSEN [1996a].
Obviously, the regime shifts detected by these models are not closely related to turning points of the business cycle. The level effect associated with business cycle phenomena seems to be dominated by changes in the covariance structure of the data generating process in all MSIAH(M)-AR(p) models considered so far. Interestingly, MSIAH(4)-AR(p) models exhibit regimes with unit roots. Thus these models are related to the concept of stochastic unit roots as in GRANGER & SWANSON [1994] and have to be investigated in future research.
Instead we are going to test the MSI(2)-AR(4) and the MSIH(2)-AR(4) model against the MSIAH(2)-AR(4) model, where all parameters are varying, applying the usual specification testing procedures. The likelihood ratio test statistic is LR = 2[(-213.54) - (-219.56)] = 12.04 for a test of the MSI(2)-AR(4) against the MSIAH(2)-AR(4) and LR = 2[(-213.54) - (-219.49)] = 11.90 for the MSIH(2)-AR(4) model. Under the number-of-regimes preserving hypothesis, ν₁ ≠ ν₂, and with critical values of χ²_{0.95}(5) = 11.0705 and χ²_{0.95}(4) = 9.48773, the null hypothesis of regime-invariant autoregressive parameters is rejected at the 5% level.
Although the Hamilton model of the U.S. business cycle cannot be rejected in the class of two-regime models, our previous analysis indicates that there are features in the data which are not well explained by the MSM(2)-AR(4) model. In particular, the evidence for an MS(5)-AR(0) model suggests taking the extreme macroeconomic fluctuations of the 1968-71 period into consideration. These results underline the need for an extension of the Hamilton model for an adequate empirical characterization of the West-German business cycle. Therefore, in this section, the Hamilton model is augmented in two respects: an additional regime is introduced and the variances are considered to be regime-dependent.
This model has two important features. First, the business cycle phenomena of the Hamilton model are replicated by the second and the third regime. Secondly, the first regime separates from the virtual business cycles those periods which have been captured by the MSI(5)-AR(0) model as the three outlier states. In comparison to the normal expansionary regime, this episode is characterized by a slightly higher mean and a much higher variance of innovations.
Δy_t = M ξ̂_{t|T} - 0.2963 (Δy_{t-1} - M ξ̂_{t-1|T}) - 0.1407 (Δy_{t-2} - M ξ̂_{t-2|T})
               (0.1060)                        (0.1081)
       - 0.0290 (Δy_{t-3} - M ξ̂_{t-3|T}) + 0.2127 (Δy_{t-4} - M ξ̂_{t-4|T}) + û_t
         (0.0949)                         (0.0916)

μ̂ = [ 1.3438 (0.4708) ]        σ̂² = [ 3.6596 (1.9046) ]
    [ 0.9368 (0.1344) ],             [ 0.8146 (0.1840) ].
    [ 0.3630 (0.1509) ]              [ 0.5344 (0.1853) ]

(Standard errors in parentheses.)
ξ̄ = [ 0.1096 ]          [ 14.1991 ]
    [ 0.6469 ],  E[h] = [  9.4191 ].
    [ 0.2434 ]          [  5.3430 ]
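The ergodic regime probabilities ξ̄ and the expected durations can be computed from the transition matrix alone. The sketch below iterates ξ' ← ξ'P (power iteration) for the MSI(3)-AR(0) transition matrix estimated earlier in this chapter and reproduces the corresponding table entries:

```python
# Ergodic (stationary) regime probabilities by iterating xi' <- xi' P,
# and expected durations 1 / (1 - p_mm); P is the estimated transition
# matrix of the MSI(3)-AR(0) model.
P = [[0.4907, 0.3157, 0.1936],
     [0.0000, 0.9732, 0.0268],
     [0.9999, 0.0001, 0.0000]]

M = len(P)
xi = [1.0 / M] * M
for _ in range(1000):  # power iteration converges for an ergodic chain
    xi = [sum(xi[i] * P[i][j] for i in range(M)) for j in range(M)]

durations = [1.0 / (1.0 - P[m][m]) if P[m][m] < 1.0 else float("inf")
             for m in range(M)]
```

The result matches the tabulated ergodic probabilities (0.0753, 0.8863, 0.0384) up to rounding of the reported transition probabilities.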
In Figure 11.24, the smoothed and filtered probabilities of the recessionary regime 3 are compared with the business cycle classifications of the CIBCR. The empirical characterization of the business cycle is quite close to those of the MSM(2)-AR(4) and the MS(5)-AR(0) model. More fundamentally, regime shifts coincide with CIBCR turning points. Interestingly, regime discrimination is maximized, and thus regime uncertainty minimized.
If the information set used for the parameter estimation is denoted by Y_T, then Y_{T+j} was used to derive the h-step prediction ŷ_{T+j}(h).
The forecast performance of the models over the following 20 quarters (1990:1 to 1994:4) has been measured in terms of the root of the mean squared prediction errors (RMSPE). The results are summarized in Table 11.7.
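The RMSPE criterion itself is straightforward to compute; a minimal sketch with illustrative numbers (not the entries of Table 11.7):

```python
import math

# Root mean squared prediction error over a sequence of h-step forecasts.
def rmspe(actual, predicted):
    errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(errors) / len(errors))

# Illustrative growth rates and forecasts (placeholder values):
actual    = [0.8, -0.2, 1.1, 0.5]
predicted = [0.6,  0.1, 0.9, 0.7]
score = rmspe(actual, predicted)
```

A lower RMSPE over the hold-out period indicates a better post-sample predictor.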
However, except for the one-step forecast, the MSM(3)-AR(4) model substantially
outperforms the alternative model-based predictors. Our results are thus in contrast
to the evidence found in the literature supporting the claim that non-linear time series models tend to have superior in-sample and worse post-sample abilities. For example, GRANGER & TERÄSVIRTA [1993, p. 163] found for logarithms of U.S. GNP data from 1984:3 to 1990:4 that the post-sample root mean squared prediction error of a smooth transition regression model was 50% higher than for a corresponding linear model.
One explanation for the unsatisfactory performance may be the singularity of events in the forecast period. This forecasting period was affected by the monetary union with the former East Germany (GDR), the reunification of East and West Germany, the Gulf War, and a severe international recession (to be considered in Chapter 12). A non-linear model may be expected to be superior to a linear one only if the forecasting period contains those non-linear features. For the period under consideration, the non-linearities associated with the business cycle phenomenon seem to be dominated by the political shocks affecting the economic system.
In particular, the high one-step prediction errors seem to be associated with some instabilities of the autoregressive parameters. This is supported by the superior performance of the MSM(2)-AR(0) model, where we have set the AR parameters of the estimated MSM(2)-AR(4) model to zero.
11.10 Conclusions
This analysis has examined whether MS-AR models could be useful tools for the investigation of the German business cycle. It has been shown that - among other preferred models - the MS(2)-AR(4) model proposed by Hamilton is able to capture the main characteristics of the German business cycle. Potential for improvement through the introduction of more than two regimes has also been established. In particular, the Markov-switching model with additional regimes reflecting episodes of high volatility has been recommended as an inventive device for dating the German business cycle and for multi-step forecasts. Our findings demonstrate the effects of model selection; the new insights and improvements gained from departing from the basic MSM(2)-AR(4) model, which dominates the literature, have been proven to be worth the effort.
A main assumption of our analysis which might be relaxed is that of fixed transition probabilities. For post-war U.S. data, DIEBOLD et al. [1993] found that the memorylessness of contractions and expansions is strongly rejected. Models with varying transition probabilities have been considered by DIEBOLD et al. [1994], FILARDO [1994], LAHIRI & WANG [1994], and DURLAND & MCCURDY [1994] for U.S. data and should be applicable to Markov-switching models of the German business cycle. GHYSELS [1993], [1994] has proposed an MS model with seasonally varying transition probabilities, which is more suited for seasonally unadjusted data.
The necessary instruments for implementation of these models for German data are given in Chapter 10. In any case, the two remaining chapters of this study are concerned with another imperfection of the models considered so far. The business cycle as defined by BURNS & MITCHELL [1946] is essentially a macroeconomic phenomenon, which reflects co-movements of many individual economic series. Therefore, the dynamics of the business cycle have to be considered in a multiple time series framework. A univariate approach to the German business cycle must be considered unsatisfactory. In the next chapter, we will show how the traditional distinction in an analysis of co-movements among economic time series and the division of the business cycle into separate regimes can be solved by means of the Markov-switching vector autoregressive model. In addition, Chapter 13 investigates the applicability of the Markov-switching model to the analysis of cointegrated systems.
11.A. Appendix: Business Cycle Analysis with the Hodrick-Prescott Filter 257
A broad range of business cycle studies generate "stylized facts" of the business cycle using the Hodrick-Prescott (HP) filter, which derives the trend component ȳ_t of a univariate time series y_t as the result of the following algorithm:

argmin  Σ_{t=1}^{T} (y_t - ȳ_t)² + λ Σ_{t=2}^{T-1} (Δȳ_{t+1} - Δȳ_t)²,    (11.7)

where Δȳ_t = ȳ_t - ȳ_{t-1}. The system of first-order conditions for {ȳ_t}_{t=1}^{T} associated with the optimization problem (11.7) results in the following linear filter:
(ȳ_1, ..., ȳ_T)' = ( I_T + λ K )⁻¹ (y_1, ..., y_T)',   where

K =  [  1  -2   1                    ]
     [ -2   5  -4   1                ]
     [  1  -4   6  -4   1            ]
     [         ...                   ]
     [         1  -4   6  -4   1     ]
     [             1  -4   5  -2     ]
     [                 1  -2   1     ]
The cyclical component is given by the residual of this procedure, y_t - ȳ_t. Thus the cyclical component measures the deviation of the considered series from its local trend. For the West German GNP data, the cyclical component is plotted in Figure 11.26, where λ = 1600 has been chosen as in KYDLAND & PRESCOTT [1990].
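The linear filter above can be implemented directly by building I_T + λD'D, where D is the (T-2)×T second-difference matrix, and solving the resulting linear system. A self-contained sketch in pure Python (our own illustration, not code from this study):

```python
def hp_filter(y, lam=1600.0):
    """Return (trend, cycle) of y via the Hodrick-Prescott filter,
    solving (I_T + lam * D'D) trend = y as in the text."""
    T = len(y)
    # A = I_T + lam * D'D, accumulated row by row of the second-difference
    # matrix D, whose r-th row is [.., 1, -2, 1, ..].
    A = [[1.0 if i == j else 0.0 for j in range(T)] for i in range(T)]
    for r in range(T - 2):
        d = [0.0] * T
        d[r], d[r + 1], d[r + 2] = 1.0, -2.0, 1.0
        for i in range(r, r + 3):
            for j in range(r, r + 3):
                A[i][j] += lam * d[i] * d[j]
    # Gaussian elimination with partial pivoting on [A | y].
    b = list(y)
    for k in range(T):
        p = max(range(k, T), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, T):
            f = A[i][k] / A[k][k]
            for j in range(k, T):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    trend = [0.0] * T
    for i in range(T - 1, -1, -1):
        s = sum(A[i][j] * trend[j] for j in range(i + 1, T))
        trend[i] = (b[i] - s) / A[i][i]
    cycle = [yi - ti for yi, ti in zip(y, trend)]
    return trend, cycle
```

For an exactly linear input the penalty term vanishes, so the trend reproduces the series and the cyclical component is zero, which is a convenient sanity check of the implementation.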
A comparison with Figure 11.4 clarifies that peaks and troughs of the series detrended with the HP filter coincide with those of the change in GNP against the previous year. CIBCR recessions are not associated with low values but with sharp declines of the cyclical component, which indicates that there might be a unit root in the cyclical component. Furthermore, it is neither clear how the turning points should be dated nor how the filtered data could be used, e.g., for forecasts. In addition, the statistical properties of the filter have been criticized recently inter alia by KING & REBELO [1993], HARVEY & JAEGER [1993], COGLEY & NASON [1995] and BÅRDSEN et al. [1995]. In particular, COGLEY & NASON [1995] have shown that the HP filter can generate spurious cycles when the time series are integrated, as in our case (cf. Section 11.2.1).
Altogether, the CIBCR classification of the German business cycle seems to be the
best available benchmark for measuring the quality of empirical characterizations of
the German business cycle by means of MS-AR models.
Chapter 12
Markov-Switching Models of Global and International Business Cycles
Since our primary research interest concerns business cycle phenomena and not the convergence of per capita income and growth, the national trends are eliminated separately. To use a precise notation, the considered MS-VAR model in differences is called an MS(M)-DVAR(p) model. The issue of cointegration will be investigated in the next chapter and is therefore not dealt with here. The analysis of co-movement of economic time series within cointegrated systems with Markov-switching regime will conclude this study.
The study uses data from the OECD on real GNP of the USA, Japan and West Germany, as well as real GDP of the United Kingdom, Canada and Australia. The data set consists of quarterly seasonally adjusted observations. The estimation period, excluding presample values, covers 120 quarters from 1962:1 to 1991:4.
The time series were tested for unit roots. Each one was found to be I(1). Thus, first differences of logarithms (times 100) are used, which are plotted in Figure 12.1.
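The transformation to growth rates, 100 times the first differences of logarithms, can be sketched as follows (the level data shown are illustrative placeholders, not the OECD series):

```python
import math

# 100 * first differences of logarithms: quarterly growth rates in percent.
def log_growth_rates(levels):
    return [100.0 * (math.log(b) - math.log(a))
            for a, b in zip(levels, levels[1:])]

gnp = [100.0, 101.0, 100.5, 102.0]  # illustrative level data
rates = log_growth_rates(gnp)
```

For small changes these log differences are close to percentage changes; the first entry here is roughly 1.0.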
Most empirical research has been done with a single equation Markov-switching model, see inter alia LAM [1990], PHILLIPS [1991], GOODWIN [1993], KIM [1994], KÄHLER & MARNET [1994a], KROLZIG & LÜTKEPOHL [1995] and SENSIER [1996]. All of these cited investigations have applied the HAMILTON [1989] model of the U.S. business cycle with at best slight modifications. In line with these studies, we investigate the national business cycle phenomena in our data set by means of an MSM(2)-AR(4) model. In contrast to previous studies, evidence of the inadequacy of the Hamilton specification is revealed, at least in the case of the Japanese growth process.
12.1. Univariate Markov-Switching Models 261
[Figure 12.1: Quarterly growth rates: USA, CAN, UK, FRG, JAP, AUS]
An overview of our estimation results is given in Table 12.1. The models have been
estimated with the EM-algorithm discussed in Chapter 6 and Chapter 9. In contrast
to GOODWIN'S [1993] analysis, which used numerical optimization techniques, we
were not forced to employ Bayesian priors in order to derive meaningful business
cycle phenomena (even for the Japanese and the United Kingdom data).
262 Markov-Switching Models of Global and International Business Cycles
[Figure 12.2: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: USA; contribution of the Markov chain to the business cycle]
12.1.1 USA
In contrast to HESS & IWATA [1995], who observed a breakdown of the Hamilton model for data which includes the end of World War II and the Korean War, our results for the 1962-1991 period reveal structural stability of the MSM(2)-AR(4) model of the U.S. business cycle. The estimates given in Table 12.1 are broadly consistent with those presented by HAMILTON [1989] for the 1952-1984 period. This concerns the conditional means, μ̂₁ = 1.06 vs. 1.16 and μ̂₂ = -0.12 vs. -0.36, as well as the transition probabilities, p̂11 = 0.93 vs. 0.90, p̂22 = 0.86 vs. 0.75, and the error variance σ̂² = 0.57 vs. 0.59. The filtered and smoothed probabilities generated by the MSM(2)-AR(4) model are presented in Figure 12.2. Interestingly, as for the NBER classifications used by HAMILTON [1989], we found that the expansion and contraction episodes found by the Markov-switching model
[Figure 12.3: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: Canada; contribution of the Markov chain to the business cycle]
12.1.2 Canada
The Canadian economy is characterized in the sample period by two strong contractions in 1981/82 and 1990/91. As illustrated in Figure 12.3, the MSM(2)-AR(4) model captures these deep recessions as shifts from regime 1, with an expected growth rate μ̂₁ = 1.18, to regime 2 with a negative mean growth μ̂₂ = -0.37. The shorter contractionary periods in 1974 and 1980, which are classified however by the CIBCR as downswings of the business cycle, are not explained as being caused by a shift in regime, but rather as negative shocks in an underlying expansionary regime. Note also that our estimations are quite compatible with those of GOODWIN [1993], who was compelled to use Bayesian priors to establish a meaningful result.
[Figure 12.4: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: United Kingdom; contribution of the Markov chain to the business cycle]
12.1.3 United Kingdom
The macroeconomic fluctuations in the United Kingdom are marked by the three strong recessions dated by the MSM(2)-AR(4) model as the periods from 1973:4 to 1975:3, 1979:3-1981:3 and 1990:2-1991:4.
The CIBCR methodology leads to three additional, yet shorter, recessions in 1966 and 1971/72. But Figure 12.4 shows that these are not clearly reflected in the quarterly GDP growth rate of these periods and hence not detected by the MS-AR model. Note that this is in line with the estimates of GOODWIN [1993] and SENSIER [1996].
[Figure 12.5: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: West Germany; contribution of the Markov chain to the business cycle]
12.1.4 Germany
MS(M)-AR(p) models of the German business cycle have been discussed at full length in Chapter 11. A comparison of the estimated parameters in Table 12.1 and Table 11.2, as well as of Figure 12.5 with Figure 11.7, shows that the additional 12 observations, together with the update of the 1990/91 observations, have only limited effects. Interestingly, the results are again very close to the estimations of GOODWIN [1993].
In comparison to the U.S. business cycle, the recessions are shorter (4.4 vs. 7 quarters), but more pronounced (-0.4% vs. -0.1%). The variance of the white noise is higher and regime shifts more frequent. Thus, relative to the process of U.S. GNP growth, the German growth rates are more difficult to predict.
[Figure 12.6: Smoothed and filtered probabilities of the MSM(2)-AR(4) model: Japan; contribution of the Markov chain to the business cycle]
12.1.5 Japan
The estimated conditional means, μ̂₁ = 1.4 and μ̂₂ = -0.26, are quite compatible with the business cycles of the other countries under consideration. However, Figure 12.6 indicates that the process of economic growth in Japan is not described very well by a two-regime model. The MSM(2)-AR(4) model underestimates the mean growth rate in the first part of the sample and overestimates the mean growth rate in the second part of post-war economic history.
[Figure 12.7: Smoothed and filtered probabilities of the MSI(4)-AR(4) model: Japan]
Thus, we will consider MS-AR models with more than two regimes. The estimation results of the more general MS(M)-AR(p) models are given in Table 12.2.
The MSI(4)-AR(4) model presented in Figure 12.7 reveals a structural break in the business cycle behavior of the Japanese economy. A growth cycle (regime 1: ν₁ = 3.0 vs. regime 2: ν₂ = 1.22) is identified until 1974. The contraction in 1974 is identified as an outlier state with ν₄ = -2.8 and an expected duration of exactly one quarter. The recession initiates a third regime of dampened macroeconomic fluctuations. This regime is the absorbing state of the regime-shift generating Markov chain, with an expected growth rate of 1.13.
[Figure 12.8: Smoothed and filtered probabilities of the MSI(3)-AR(4) model: Japan]
The more parsimonious MSI(3)-AR(4) model subsumes the growth recessions before 1974 and the post-1974 episode as a joint "normal growth" regime. Virtually unchanged are the "high-growth" regime and the remaining third stagnationary regime, as Figure 12.8 clarifies.
So far we have assumed that the variance is regime-invariant. However, this hypothesis is rejected by likelihood ratio tests. The LR test statistic gives 12.36 for H₀: MSI(3)-AR(4) vs. H₁: MSIH(3)-AR(4) and 13.94 for H₀: MSM(3)-AR(4) vs. H₁: MSMH(3)-AR(4), which are both significant at 1%, χ²_{0.99}(3) = 11.3.
[Figure: Smoothed and filtered probabilities of regimes 1 and 2]
The foregoing results of the MSM(2)-AR(4) model confirm the evidence found in the previous literature that the Hamilton model is able to replicate traditional business cycle classifications. However, as in Chapter 11, we have also seen that there are structural breaks in the data which cannot be subsumed under the notion of business cycles. These findings for Japan of the pre-1975 period are similar to the result of MINTZ [1969] that for the West-German economy of the fifties and sixties only growth cycles can be identified.
[Figure: Smoothed and filtered regime probabilities; contribution of the Markov chain to the business cycle]
12.1.6 Australia
As the last single equation analysis in this study, we investigate the Australian macroeconomic fluctuations with the help of the MSM(2)-AR(4) model. To our knowledge, there exists no result in the literature on MS-AR models of the Australian business cycle. Hence, we use again the CIBCR business cycle classifications as a benchmark.
The estimated parameters given in Table 12.1 are quite compatible with the MSM(2)-AR(4) models discussed previously. Figure 12.10 reveals a relatively high volatility of the Australian growth process. While this observation seems to be consistent with the high frequency of CIBCR recessions, the expected duration of expansion is less than one year, which is much shorter than those of the other country models and the notion of a business cycle. Hence we have considered alternative specifications.
[Figure: Smoothed and filtered regime probabilities: Australia]
Δy_t = M ξ̂_{t|T} - 0.0217 (Δy_{t-1} - M ξ̂_{t-1|T}) - 0.0903 (Δy_{t-2} - M ξ̂_{t-2|T})
               (0.0954)                        (0.0987)
       + 0.1603 (Δy_{t-3} - M ξ̂_{t-3|T}) - 0.1411 (Δy_{t-4} - M ξ̂_{t-4|T}) + û_t
         (0.0985)                         (0.0906)

σ̂₁² = 1.2379    σ̂₂² = 0.2154    ln L = -181.8147
      (0.1809)         (0.1238)
A comparison with Figures 12.3 and 12.4 clarifies that the regime shifts of the
MSMH(2)-AR(4) model are closely related to those in the UK and Canada. This
coherence of regime shifts suggests the notion of a common regime shift generating
process.
In contrast to our analysis in the foregoing chapter, we will not go further into the
details of model specification of the univariate time series under consideration. In-
stead, we will move directly to the system approach by studying a six-dimensional
system of the global economy.
12.1.7 Comparisons
In the preceding discussion, we have seen that the MS(M)-AR(p) model is able to capture the business cycle dynamics of the considered national economies. The recession probabilities that we have obtained for the six countries are compared in Figure 12.12. At least for the last two decades, recessions and booms occur simultaneously across countries (with four exceptions regarding Japan, Canada and Australia); this might be due to the world-wide oil price shocks or the increasing globalization of markets.
[Figure 12.12: recession probabilities of the six countries, 1960–1990]
276 Markov-Switching Models of Global and International Business Cycles
[Figure 12.13: paths of annual growth rates, 1960–1990; panels labelled USA, CAN, UK, FRG, AUS]
This evidence seems also to be consistent with a comparison of the path of annual
growth rates given in Figure 12.13.
12.2 Multi-Country Growth Models with Markov-Switching Regimes
In this section we investigate common regime shifts in the joint stochastic process of economic growth in the six countries under consideration; more precisely, we consider the system of quarterly real GNP growth rates.
In general, it would be possible to consider regime shifts for each individual country separately. However, together with the possible leading/lagging relationships, this formulation entails that the number of regimes would explode to $M^K = 2^6 = 64$ (cf. Table 7.2). The dimension of the regime vector involved in the EM algorithm would be $(M^K)^2 = 64^2 = 4096$ for an MSI specification, or even $(M^K)^{1+p}$ for an MSM model, which makes the analysis infeasible.
Thus we assume in the following that the regime shifts are perfectly correlated. As a consequence, the dynamic propagation mechanism of impulses to the system consists of (i.) a linear autoregression representing the international transmission of national shocks, and (ii.) the Markov process generating the regime shifts, which represents large, contemporaneously occurring common shocks.
This procedure is in line with PHILLIPS' [1991] analysis of monthly growth rates of industrial production, where U.S. data have been combined with UK, German and Japanese data. In none of the bivariate difference-stationary MSM(2)-DVAR(1) models considered by PHILLIPS could the null hypothesis of perfectly correlated regime shifts be rejected.
But our analysis does not only extend the approach of PHILLIPS [1991] to large VAR systems. In particular, we do not restrict our investigation to MSM-DVAR models with M = 2 and p = 1. For pure VAR(p) processes, the Akaike order selection criterion suggests a first-order autoregression in differences (p = 1), while the Hannan-Quinn and the Schwarz criteria support a random walk with drift (p = 0). Hence, such a specification seems to be a good starting point. If we consider time-invariant VAR(p) models as approximations of the infinite VAR representation of a data generating MS-VAR process, then we get p = 1 as the maximal autoregressive order for the MS-DVAR process by neglecting the non-normality of the model. In addition to p = 1, the goodness of fit achieved for each component makes a fourth-order autoregression of the system attractive.
In the following specification analysis, we are going to test the order of the vector autoregression p for various MSI and MSM specifications, introduce additional states M, and allow for shifts in the variance $\Sigma(s_t)$. In order to demonstrate the feasibility of the methods proposed in this study, we have put no further restrictions on the regime switching process. The limited number of regimes can therefore capture quite different shifts in this rather large vector system. Indeed, we will show in the following that alternative specifications of the MS(M)-DVAR(p) model lead to different but complementary conclusions on the economic system under consideration. This strengthens our view that model variety is essential for the statistical analysis of time series subject to regime shifts, and that the necessary methods which allow for the estimation of these models have to be provided.
The results are given in Table 12.4 for a homoskedastic white noise process and in Table 12.3 for a process with a regime-dependent variance-covariance matrix $\Sigma(s_t)$. The implications for business cycle analysis are visualized in Figures 12.14 and 12.15. For both specifications, a contemporaneous structural break in the growth rate of all six time series is detected in 1973:2, when the system approaches the absorbing state 2. This structural break, detected by an unrestricted MSM(2)-DVAR(1) model, is known in economic history as the end of the 'Golden Age'. The striking feature of this period after World War II (~1950–1973) has been an average growth rate which is more than double the mean of any other period in history (cf. e.g. CRAFTS [1995]).
The estimated slump in the mean growth rate in the MSMH(2)-DVAR(1) model is given by:

$$\hat{\mu}_1 - \hat{\mu}_2 = \begin{bmatrix} 0.6037 \\ 1.3313 \\ 0.6336 \\ 0.5279 \\ 0.7833 \\ 0.8325 \end{bmatrix}, \qquad \hat{\mu}_2 = \begin{bmatrix} 0.4754 \\ 0.9462 \\ 0.4926 \\ 0.3707 \\ 0.6757 \\ 0.6141 \end{bmatrix}.$$
As Table 12.4 verifies, these estimates are almost identical to those of the MSM(2)-DVAR(1) model.
Consider now the contemporaneous correlations of the first four variables of the system, where the lower triangular part gives the contemporaneous correlations in regime 1 (1962:1–1973:1) and the upper triangular part gives those in regime 2 (1973:2–1991:4):

$$\begin{bmatrix} \text{USA} & .267 & .113 & .206 \\ -.100 & \text{JAP} & .209 & .267 \\ -.179 & .265 & \text{FRG} & .352 \\ -.016 & .297 & .153 & \text{UK} \end{bmatrix}$$
The importance of shifts in the variance of the white noise process $u_t$ is confirmed by the strong rejection of the MSM(2)-DVAR(1) model against the MSMH(2)-DVAR(1) model: the likelihood ratio test of $H_0\colon \Sigma_1 = \Sigma_2$ yields $LR = 52.3869$, which is significant at the 0.1% level ($\chi^2_{0.999}(21) = 46.8$), where we again assume $\mu_1 \neq \mu_2$ to be valid under the null and the alternative.
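The statistic can be reproduced from the log-likelihoods reported in Appendix 12.A; a small Python check (which of the two appendix values belongs to the restricted model is inferred from the statistic quoted in the text):

```python
# LR test of H0: Sigma_1 = Sigma_2 -- the homoskedastic MSM(2)-DVAR(1)
# against the MSMH(2)-DVAR(1) with regime-dependent covariance matrices.
# Log-likelihood values as reported in Appendix 12.A.
lnL_msm  = -1016.2821   # restricted model (Sigma_1 = Sigma_2), assumed label
lnL_msmh = -990.0887    # unrestricted model, assumed label

lr = 2.0 * (lnL_msmh - lnL_msm)
print(round(lr, 4))   # 52.3868; compare with the chi-square(21)
                      # 0.1% critical value of 46.8 quoted in the text
```

The computed value agrees with the 52.3869 reported in the text up to rounding of the log-likelihoods.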
A parsimonious model that generates global business cycles with only two regimes
and a first-order autoregression is presented in Figure 12.16 and Table 12.5.
The recessionary regime coincides with the post-1973 U.S. recessions identified with the Hamilton model of Section 12.1.1. The smoothing and the filtering procedures identify the oil-price-shock recession from 1973:2 to 1975:1, the double-dip recession of 1979:3–1980:3 and 1981:2–1982:4, as well as the recession in the nineties. They are associated with contractions in the UK and rather slow growth in the other countries:
$$\hat{\nu}_1 - \hat{\nu}_2 = \begin{bmatrix} 1.0355 \\ 0.7996 \\ 0.4284 \\ 0.9459 \\ 0.6118 \\ 0.8360 \end{bmatrix}, \qquad \hat{\nu}_2 = \begin{bmatrix} -0.2551 \\ 0.4776 \\ 0.2798 \\ -0.0074 \\ 0.0957 \\ 0.0460 \end{bmatrix}.$$
Figure 12.17 gives the business cycle dating if the HAMILTON [1989] specification is applied to the multiple time series under consideration. According to the estimated parameters in Table 12.6, the effect of a regime shift is given by
$$\hat{\mu}_1 - \hat{\mu}_2 = \begin{bmatrix} 1.8294 \\ 1.0072 \\ 1.1454 \\ 1.1818 \\ 1.6696 \\ 1.3761 \end{bmatrix}, \qquad \hat{\mu}_2 = \begin{bmatrix} -0.7622 \\ 0.5704 \\ -0.2149 \\ -0.3876 \\ -0.3795 \\ -0.2193 \end{bmatrix}.$$
However, while in the MSM(2)-DVAR(4) model the second regime was associated with negative mean growth rates in all national economies except Japan, the recessionary state here reveals contractions only in the UK and Australia, while the mean growth rate in the USA, Canada, Japan and West Germany corresponds more closely to growth recessions:
$$\hat{\mu}_2 = \begin{bmatrix} 0.1904 \\ 0.4285 \\ 0.4254 \\ -0.0293 \\ 0.2607 \\ -0.2344 \end{bmatrix}, \qquad \hat{\mu}_1 - \hat{\mu}_2 = \begin{bmatrix} 0.7958 \\ 1.4707 \\ 0.3777 \\ 0.9626 \\ 1.0627 \\ 1.6825 \end{bmatrix}.$$
Thus, this model detects some asymmetries in the national size of the global business cycle; the effect of a shift in regime is, at an annualized 1.51%, much less important for the German economy than in the rest of the world, where the drop in the mean growth rate is between 3.06% and 6.73% per annum.

In line with the MSMH(2)-DVAR(1) model, the innovations in the rest of the world are highly positively contemporaneously correlated with shocks in the U.S. growth rate in the second regime. While the variance of all other growth rates is reduced in regime 2, the U.S. standard error is doubled.
$$\hat{\mu}_1 - \hat{\mu}_2 = \begin{bmatrix} 0.6694 \\ 1.8637 \\ 0.8296 \\ 0.5439 \\ 0.8413 \\ 1.0953 \end{bmatrix}, \qquad \hat{\mu}_1 = \begin{bmatrix} 1.2692 \\ 2.8917 \\ 1.4081 \\ 1.2664 \\ 1.6168 \\ 1.8695 \end{bmatrix} \qquad (12.1)$$
Interestingly, global recessions are again asymmetric. Negative mean growth rates
are restricted to the four English-speaking countries, whereas the loss in economic
$$\hat{\mu}_2 - \hat{\mu}_3 = \begin{bmatrix} 1.0461 \\ 0.2657 \\ 0.1919 \\ 1.6160 \\ 0.8850 \\ 0.8293 \end{bmatrix}, \qquad \hat{\mu}_3 = \begin{bmatrix} -0.4463 \\ 0.7623 \\ 0.3866 \\ -0.8935 \\ -0.1095 \\ -0.0551 \end{bmatrix}.$$
In the class of Markov-switching models with two regimes we have found evidence for a fourth-order autoregression. Table 12.9 gives the estimation results for an MSIH(3)-DVAR(4) model, where again regime 1 reflects high-growth episodes, regime 2 corresponds to 'normal' macroeconomic growth, and regime 3 indicates recessions. In comparison with the MSMH(3)-DVAR(1) model, it produces a relatively shorter duration of the high-growth regime (3.3 vs. 12.7 quarters), but a clearer indication of recessions (5.5 vs. 2.4 quarters).

In the first chart of Figure 12.20 we have again given the CIBCR business cycle classification for Japan. It can be seen that regime 1 matches the downswings of the growth cycle very well. Moreover, this result is quite compatible with the UK and German classifications:
$$\hat{\nu}_1 - \hat{\nu}_2 = \begin{bmatrix} 0.4353 \\ 1.4814 \\ 1.5943 \\ 2.0775 \\ 0.8326 \\ 0.7864 \end{bmatrix}, \qquad \hat{\nu}_1 = \begin{bmatrix} 1.4626 \\ 2.4330 \\ 1.9709 \\ 3.0103 \\ 1.7937 \\ 1.5609 \end{bmatrix}.$$
A direct comparison of these impact effects of a shift from regime 2 to regime 1 with (12.1) in the MSMH(3)-DVAR(1) world could be misleading, since the assumed dynamic propagations of regime shifts are different. In an MSI-DVAR model, a persistent shift in regime causes effects which are equivalent to the accumulated responses to an impulse as high as $\nu_1 - \nu_2$; while, in the MSM-DVAR model, a once-and-for-all jump in the mean growth rate is enforced.
::«<
r----,-r----~I~----or~~------~----------------~
III :. :. :. :. :. <-: . . . . .
0.5
I
I
I
I
..... t:::::::·:·
.~: ~
I
I
I
I
I ::: ::: : :: . •...•:.: :.: .: :.: .: i.· •.: :.. : ...
}}~.
0.0 +---~~~--~~~~~~~------~~~--~~----~~
60 65 70 75 80 85 90
..
0.5 ...... . ,
..
: .: .:.:.....:.:.:.:. :.:.::. I~. i
0.0 ~.. ~~·~.·~.·~·~~~~~~~n~~.~..~.~a~ftu-~~L.~..~.k-~~l~·\_n~I'~:~~.:~:::~:::
.
... ~. !~I ". .... : ::: : ~ , ....
60 65 70 75 80 85 90
The recessionary regime 3 coincides obviously with the post-1973 recessions of the U.S. economy, which are associated with contractions in all other countries:
$$\hat{\nu}_2 - \hat{\nu}_3 = \begin{bmatrix} 1.2128 \\ 0.8816 \\ 0.7227 \\ 1.0426 \\ 0.6992 \\ 0.9943 \end{bmatrix}, \qquad \hat{\nu}_3 = \begin{bmatrix} -0.1855 \\ 0.0700 \\ -0.3461 \\ -0.1098 \\ 0.2619 \\ -0.2198 \end{bmatrix}.$$
The smoothed, as well as the filtered, probabilities of regime 3 reflect the oil-price-shock recession from 1973:2 to 1975:1, the double-dip recession of 1979:3–1980:3 and 1981:2–1982:4, and the last recession starting 1990:4.
It needs no further clarification to see that the MSIH(3)-DVAR(4) model has the best fit of all estimated models, with a log-likelihood of −812.76. While a likelihood ratio test of the three-regime hypothesis would be confronted with the violation of the identifiability assumption of standard asymptotic theory (cf. Section 7.5), this provides some evidence in favor of the MSIH(3)-DVAR(4) model.
12.3 Conclusions
(i.) In each time series considered, business cycle phenomena could be identified
as Markov-switching regimes in the mean growth rate.
(ii.) There is clear evidence for a structural break in the unconditional trend growth of the world economy in 1973:2. For the Japanese economy this result is obvious; in the other univariate analyses the lowered trend growth after 1973 is expressed in a higher frequency of realized recessionary states.
(iii.) Booms and recessions occur to a large extent simultaneously across countries. Since the oil-price shock in 1973/74, contemporaneous world-wide shocks have been the major source of the high international co-movement of output growth.
(iv.) In addition to the uniform regime shifts in the mean growth rate, the post-1973 period is characterized by a strong contemporaneous correlation of country-specific shocks.
Altogether there is some evidence that the macroeconomic fluctuations of the last twenty years have been mainly driven by world-wide shocks. While the dominance of a global business cycle does not exclude the possibility that a large asymmetric shock such as the German reunification can temporarily interfere with the common cycle, the MS-DVAR models suggest a less than central role for the international transmission of country-specific shocks.¹

¹ In contrast, we will see in the next chapter that the international transmission of shocks in the U.S. economy dominates the dynamics in a linear cointegrated VAR model.
Even the very rudimentary six-country models considered in this chapter have been able to produce plausible results for the statistical characterization of international business cycles over the last three decades. Nevertheless a deeper analysis seems desirable. In particular, the assumption that the unit roots in the data generating process can be eliminated by differencing without destroying relevant information seems too restrictive, as it does not allow for catch-up effects in low-income countries, which e.g. might be an explanation for the high growth rate of the Japanese economy in the sixties.²

² Note, however, that an economically meaningful analysis of the issue of convergence would require per-capita data, which have not been used in this study.
$$\hat{\mu}_1 = \begin{bmatrix} 1.0791 \\ 2.2775 \\ 1.1262 \\ 0.8986 \\ 1.4590 \\ 1.4466 \end{bmatrix}, \quad \hat{\mu}_2 = \begin{bmatrix} 0.4754 \\ 0.9462 \\ 0.4926 \\ 0.3707 \\ 0.6757 \\ 0.6141 \end{bmatrix}, \quad \ln L = -990.0887,$$

$$\hat{P} = \begin{bmatrix} 0.9778 & 0.0222 \\ 0.0000 & 1.0000 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.0000 \\ 1.0000 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 44.9996 \\ \infty \end{bmatrix}.$$
12.A. Appendix: Estimated MS-DVAR Models 291
$$\hat{\mu}_1 = \begin{bmatrix} 1.0951 \\ 2.2973 \\ 1.1473 \\ 0.8920 \\ 1.4705 \\ 1.4651 \end{bmatrix}, \quad \hat{\mu}_2 = \begin{bmatrix} 0.4749 \\ 0.9422 \\ 0.4920 \\ 0.3717 \\ 0.6735 \\ 0.6097 \end{bmatrix}, \quad \ln L = -1016.2821,$$

$$\hat{P} = \begin{bmatrix} 0.9777 & 0.0223 \\ 0.0000 & 1.0000 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.0000 \\ 1.0000 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 44.8849 \\ \infty \end{bmatrix}.$$
$$\hat{\nu}_1 = \begin{bmatrix} 0.7804 \\ 1.2772 \\ 0.7082 \\ 0.9385 \\ 0.7075 \\ 0.8810 \end{bmatrix}, \quad \hat{\nu}_2 = \begin{bmatrix} -0.2551 \\ 0.4776 \\ 0.2798 \\ -0.0074 \\ 0.0957 \\ 0.0460 \end{bmatrix}, \quad \ln L = -1026.1669,$$

$$\hat{P} = \begin{bmatrix} 0.9482 & 0.0518 \\ 0.1387 & 0.8613 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.7278 \\ 0.2722 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 19.2872 \\ 7.2122 \end{bmatrix}.$$
$$\hat{\mu}_1 = \begin{bmatrix} 1.0672 \\ 1.5786 \\ 0.9305 \\ 0.7942 \\ 1.2901 \\ 1.1568 \end{bmatrix}, \quad \hat{\mu}_2 = \begin{bmatrix} -0.7622 \\ 0.5704 \\ -0.2149 \\ -0.3876 \\ -0.3795 \\ -0.2193 \end{bmatrix}, \quad \ln L = -936.1343,$$

$$\hat{P} = \begin{bmatrix} 0.9482 & 0.0518 \\ 0.1829 & 0.8171 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.7791 \\ 0.2209 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 19.2886 \\ 5.4681 \end{bmatrix}.$$
$$\tilde{A}_2 = \begin{bmatrix} -0.2799 & 0.1380 & 0.1526 & -0.0426 & 0.1060 & 0.0894 \\ 0.1096 & 0.1145 & -0.1205 & -0.0551 & 0.1754 & -0.0681 \\ 0.3795 & -0.0500 & -0.0369 & 0.0515 & -0.0916 & -0.1422 \\ 0.0020 & -0.1081 & -0.0586 & 0.0778 & -0.0373 & 0.0117 \\ -0.1342 & 0.0614 & 0.0172 & 0.0164 & 0.0918 & -0.0261 \\ -0.1344 & -0.0288 & 0.1587 & 0.2645 & -0.0904 & -0.0253 \end{bmatrix}$$
$$\tilde{A}_3 = \begin{bmatrix} -0.3536 & 0.0937 & -0.0159 & 0.0285 & 0.4001 & 0.1961 \\ 0.0811 & -0.1399 & -0.1529 & 0.0188 & 0.0865 & -0.4420 \\ 0.0080 & 0.0930 & -0.1220 & -0.0721 & -0.0091 & 0.2572 \\ -0.1243 & 0.1263 & -0.1878 & 0.0594 & -0.1784 & -0.0040 \\ -0.1507 & -0.0684 & 0.0566 & 0.1451 & 0.0563 & 0.1386 \\ 0.0021 & -0.0182 & 0.0618 & 0.3245 & 0.0903 & 0.0032 \end{bmatrix}$$
$$\tilde{A}_4 = \begin{bmatrix} -0.2913 & -0.1810 & 0.2756 & -0.2046 & 0.0169 & -0.0272 \\ -0.0335 & 0.1626 & 0.2977 & -0.0310 & 0.0909 & -0.0115 \\ -0.0153 & -0.0895 & 0.1455 & -0.0072 & 0.0292 & 0.3456 \\ -0.1045 & 0.0194 & 0.0958 & 0.0616 & -0.0688 & -0.0188 \\ -0.5272 & 0.1389 & 0.1519 & -0.1410 & 0.3185 & -0.1247 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \end{bmatrix}$$
$$\tilde{\Sigma}_1 = \begin{bmatrix} 0.4383 & -0.2789 & -0.0979 & -0.1086 & 0.0603 & -0.0366 \\ -0.2789 & 1.2289 & 0.0283 & 0.2937 & 0.2400 & 0.1901 \\ -0.0979 & 0.0283 & 2.1132 & 0.4829 & 0.0437 & -0.2993 \\ -0.1086 & 0.2937 & 0.4829 & 2.0066 & 0.0915 & 0.3645 \\ 0.0603 & 0.2400 & 0.0437 & 0.0915 & 0.5582 & 0.0867 \\ -0.0366 & 0.1901 & -0.2993 & 0.3645 & 0.0867 & 1.0143 \end{bmatrix}$$
$$\tilde{\Sigma}_2 = \begin{bmatrix} 1.0214 & 0.4668 & 0.3754 & 0.2659 & 0.5009 & 0.2365 \\ 0.4668 & 0.6209 & 0.5250 & 0.1423 & -0.1681 & 0.0119 \\ 0.3754 & 0.5250 & 0.5294 & 0.0653 & -0.1471 & -0.0187 \\ 0.2659 & 0.1423 & 0.0653 & 0.5531 & 0.0815 & -0.4368 \\ 0.5009 & -0.1681 & -0.1471 & 0.0815 & 0.7043 & 0.2151 \\ 0.2365 & 0.0119 & -0.0187 & -0.4368 & 0.2151 & 0.9074 \end{bmatrix}$$
$$\hat{\mu}_1 = \begin{bmatrix} 0.9862 \\ 1.8992 \\ 0.8031 \\ 0.9333 \\ 1.3234 \\ 1.4481 \end{bmatrix}, \quad \hat{\mu}_2 = \begin{bmatrix} 0.1904 \\ 0.4285 \\ 0.4254 \\ -0.0293 \\ 0.2607 \\ -0.2344 \end{bmatrix}, \quad \ln L = -895.4399,$$

$$\hat{P} = \begin{bmatrix} 0.9408 & 0.0592 \\ 0.1184 & 0.8816 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.6667 \\ 0.3333 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 16.8968 \\ 8.4460 \end{bmatrix}.$$
$$\hat{P} = \begin{bmatrix} 0.9214 & 0.0786 & 0.0000 \\ 0.0287 & 0.8418 & 0.1295 \\ 0.0000 & 0.4148 & 0.5852 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.2178 \\ 0.5961 \\ 0.1861 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 12.7305 \\ 6.3217 \\ 2.4109 \end{bmatrix},$$

$$\ln L = -926.3881.$$
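The ergodic probabilities $\bar{\xi}$ and expected durations $E[h]$ reported with each model follow directly from the transition matrix; a small Python check for the three-regime chain above (matrix entries as printed, so rounding discrepancies in the last digits remain):

```python
# Stationary (ergodic) distribution and expected regime durations of a
# Markov chain; P as printed for the three-regime model, rows = current regime.
P = [[0.9214, 0.0786, 0.0000],
     [0.0287, 0.8418, 0.1295],
     [0.0000, 0.4148, 0.5852]]

xi = [1.0 / 3] * 3                     # arbitrary initial distribution
for _ in range(5000):                  # power iteration: xi' <- xi' P
    xi = [sum(xi[i] * P[i][j] for i in range(3)) for j in range(3)]

dur = [1.0 / (1.0 - P[m][m]) for m in range(3)]   # E[h_m] = 1/(1 - p_mm)
print([round(x, 4) for x in xi])    # close to the printed (0.2178, 0.5961, 0.1861)
print([round(d, 2) for d in dur])   # close to the printed (12.73, 6.32, 2.41)
```

Both quantities agree with the tabulated values up to the rounding of the transition probabilities.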
$$\tilde{A}_1 = \begin{bmatrix} 0.3019 & -0.1200 & -0.2193 & 0.0132 & 0.0131 & -0.0979 \\ -0.0610 & -0.1047 & -0.2615 & -0.2652 & 0.0713 & -0.1593 \\ \cdots & \cdots & \cdots & \cdots & \cdots & \cdots \\ 0.3499 & -0.0567 & 0.0128 & 0.0469 & -0.0478 & -0.3542 \\ -0.0589 & 0.1331 & -0.0229 & -0.1207 & 0.0968 & -0.1422 \\ -0.0604 & 0.1620 & 0.0837 & -0.0646 & 0.1396 & 0.0350 \end{bmatrix}$$
$$\tilde{A}_2 = \begin{bmatrix} -0.0728 & 0.2435 & 0.0363 & -0.1468 & 0.1155 & -0.0468 \\ 0.0511 & 0.0418 & -0.1448 & -0.0005 & -0.0168 & 0.1427 \\ 0.3612 & -0.0708 & -0.0609 & 0.0014 & 0.0717 & -0.0085 \\ -0.0030 & -0.1911 & 0.0101 & -0.0733 & -0.0246 & -0.1022 \\ 0.0544 & -0.1178 & -0.0286 & 0.1261 & 0.2739 & -0.2316 \\ -0.1201 & 0.0024 & 0.0311 & 0.0821 & 0.1251 & 0.0014 \end{bmatrix}$$
$$\tilde{A}_3 = \begin{bmatrix} -0.0563 & 0.0256 & -0.1119 & 0.0813 & 0.2189 & 0.0239 \\ 0.0021 & -0.0840 & 0.0180 & 0.0349 & 0.1336 & -0.0730 \\ -0.0040 & 0.1096 & -0.1864 & -0.0895 & -0.0162 & -0.0246 \\ 0.0552 & 0.2081 & 0.0215 & 0.0678 & 0.0316 & -0.0045 \\ 0.0929 & -0.0829 & 0.0068 & 0.3014 & -0.0966 & -0.0234 \\ 0.0042 & -0.2138 & 0.0799 & -0.0504 & -0.1485 & -0.1148 \end{bmatrix}$$
$$\tilde{A}_4 = \begin{bmatrix} -0.0318 & -0.0323 & 0.0625 & -0.0379 & -0.0894 & 0.0954 \\ -0.1204 & 0.1950 & 0.3304 & -0.0851 & -0.0139 & 0.1104 \\ 0.0930 & -0.1344 & 0.1110 & -0.0246 & 0.0654 & 0.2069 \\ -0.0818 & 0.2090 & 0.0006 & -0.0222 & -0.0994 & \cdots \\ -0.3487 & 0.0807 & -0.1422 & 0.3095 & -0.2874 & \cdots \\ 0.0967 & 0.0815 & 0.2365 & 0.2601 & -0.1803 & \cdots \end{bmatrix}$$
$$\tilde{\Sigma}_1 = \begin{bmatrix} 0.3447 & -0.2417 & -0.2682 & 0.2365 & 0.2601 & -0.1803 \\ -0.2417 & 0.5941 & 0.8281 & -0.7906 & 0.2479 & -0.0443 \\ -0.2682 & 0.8281 & 1.8107 & -1.5582 & 0.1518 & -0.1079 \\ 0.2365 & -0.7906 & -1.5582 & 4.1308 & -0.0078 & 0.1913 \\ 0.2601 & 0.2479 & 0.1518 & -0.0078 & 0.7993 & -0.3136 \\ -0.1803 & -0.0443 & -0.1079 & 0.1913 & -0.3136 & 0.1700 \end{bmatrix}$$
$$\tilde{\Sigma}_2 = \begin{bmatrix} 0.4356 & -0.0643 & -0.1381 & -0.2410 & 0.1643 & 0.0177 \\ -0.0643 & 0.7252 & -0.0951 & -0.0469 & 0.0457 & 0.0284 \\ -0.1381 & -0.0951 & 1.2716 & 0.3832 & -0.1294 & -0.3024 \\ -0.2410 & -0.0469 & 0.3832 & 1.2429 & -0.0479 & 0.0887 \\ 0.1643 & 0.0457 & -0.1294 & -0.0479 & 0.5199 & 0.1490 \\ 0.0177 & 0.0284 & -0.3024 & 0.0887 & 0.1490 & 1.2206 \end{bmatrix}$$
$$\tilde{\Sigma}_3 = \begin{bmatrix} \cdots & 0.1514 & 0.2343 & 0.3784 & 0.1238 & -0.0109 \\ 0.1514 & 0.5515 & 0.2697 & 0.4712 & -0.3952 & -0.3606 \\ 0.2343 & 0.2697 & 0.5938 & 0.0464 & -0.2620 & -0.1664 \\ 0.3784 & 0.4712 & 0.0464 & 0.7824 & -0.1304 & -0.3441 \\ 0.1238 & -0.3952 & -0.2620 & -0.1304 & 0.6942 & 0.2092 \\ -0.0109 & -0.3606 & -0.1664 & -0.3441 & 0.2092 & 0.3892 \end{bmatrix}$$
$$\hat{\nu}_1 = \begin{bmatrix} 1.4626 \\ 2.4330 \\ 1.9709 \\ 3.0103 \\ 1.7937 \\ 1.5609 \end{bmatrix}, \quad \hat{\nu}_2 = \begin{bmatrix} \cdots \\ 0.9516 \\ 0.3766 \\ 0.9328 \\ 0.9611 \\ 0.7745 \end{bmatrix}, \quad \hat{\nu}_3 = \begin{bmatrix} -0.1855 \\ 0.0700 \\ -0.3461 \\ -0.1098 \\ 0.2619 \\ -0.2198 \end{bmatrix}, \quad \ln L = -812.7632,$$

$$\hat{P} = \begin{bmatrix} 0.6929 & 0.2052 & 0.1019 \\ 0.0351 & 0.9156 & 0.0493 \\ 0.0000 & 0.1803 & 0.8197 \end{bmatrix}, \quad \bar{\xi} = \begin{bmatrix} 0.0787 \\ 0.6887 \\ 0.2326 \end{bmatrix}, \quad E[h] = \begin{bmatrix} 3.2560 \\ 11.8550 \\ 5.5455 \end{bmatrix}.$$
Chapter 13
Cointegration Analysis of VAR Models with Markovian Shifts in Regime
The chapter proceeds as follows. The next section gives a brief introduction to the issue of cointegration. Then we introduce the MSCI(M,r)-VAR(p) model as a Markov-switching p-th order vector autoregression with cointegration rank r and M regimes. Modelling and some basic theoretical properties of these processes are discussed in Section 13.1. Issues of co-breaking drifts and intercepts are also investigated. In a generalization of the results of Chapter 3, a cointegrated VARMA representation for MSCI(M,r)-VAR(p) processes is introduced in Section 13.2. For this class of processes, a two-stage ML estimation technique is proposed in Section 13.3. In the first stage, the JOHANSEN [1988], [1991] procedure is applied to finite VAR approximations of the data generating MSCI-VAR process in order to determine the cointegration rank and estimate the cointegration matrix. In the second stage, conditional on the estimated cointegration matrix, the remaining parameters of the vector equilibrium correction representation of the MSCI-VAR process are estimated via the version of the EM algorithm presented in Section 10.1. Finally, the proposed methodology is illustrated with an application to the data set introduced in the last chapter.
298 Cointegration Analysis of VAR Models with Markovian Shifts in Regime
13.1.1 Cointegration

Consider a p-th order vector autoregression with a Markov-switching intercept term,

$$y_t = \nu(s_t) + A_1 y_{t-1} + \ldots + A_p y_{t-p} + u_t, \qquad (13.1)$$

where $y_t = (y_{1t}, \ldots, y_{Kt})'$, $\nu(s_t) = (\nu_1(s_t), \ldots, \nu_K(s_t))'$, the $A_i$ are $(K \times K)$ coefficient matrices and $u_t = (u_{1t}, \ldots, u_{Kt})'$ is a Gaussian white noise with covariance matrix $\Sigma$, $u_t \sim \text{NID}(0, \Sigma)$, and $y_0, \ldots, y_{1-p}$ are fixed. The reverse characteristic polynomial of the system (13.1) is given by

$$A(z) = I_K - A_1 z - \ldots - A_p z^p.$$
If $|A(z)|$ has one or more roots at $z = 1$, $|A(1)| = 0$, and all other roots lie outside the complex unit circle, $|A(z)| \neq 0$ for $|z| \leq 1$, $z \neq 1$, then the variables $y_t$ are integrated and possibly cointegrated.
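As an aside, the root condition can be checked numerically; a sketch for a hypothetical bivariate VAR(1), with a coefficient matrix made up for illustration:

```python
# Reverse characteristic polynomial |A(z)| = |I - A1*z| of a hypothetical
# bivariate VAR(1).  Here |A(z)| = 0.5*(z - 1)*(z - 2): one root at z = 1
# (so y_t is I(1)), the other root outside the unit circle.
A1 = [[0.7, 0.3],
      [0.2, 0.8]]

def det_A(z):
    # determinant of I - A1*z for the 2x2 case
    return (1 - A1[0][0] * z) * (1 - A1[1][1] * z) - (A1[0][1] * z) * (A1[1][0] * z)

print(abs(det_A(1.0)) < 1e-12, abs(det_A(2.0)) < 1e-12)  # True True
```

Because $|A(1)| = 0$ while $I - A_1$ has rank one, this toy system is integrated with a single cointegration relation.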
In the following we consider processes where $y_t$ is integrated of order 1, $y_t \sim I(1)$, such that $\Delta y_t$ is stable while $y_t$ is unstable. The I(1) process $y_t$ is called cointegrated if there is at least one linear combination $c'y_t$ of these variables which is stationary. Obviously, there can exist up to $K-1$ linearly independent cointegration relationships. The variable $z_t = c'y_t - b$ with $b = E[c'y_t]$ is a stationary stochastic variable measuring deviations from the equilibrium.
13.1. Cointegrated VAR Processes with Markov-Switching Regimes 299
The concept of cointegration is closely related to the error correction model, respectively the vector equilibrium correction model (VECM), proposed by DAVIDSON et al. [1978]. Subtracting $y_{t-1}$ from both sides and rearranging terms, the process defined in (13.1) can be written in its vector equilibrium correction form as

$$\Delta y_t = \nu(s_t) + \sum_{i=1}^{p-1} D_i \Delta y_{t-i} + \Pi y_{t-1} + u_t, \qquad (13.2)$$

where $\Pi = -A(1)$ is singular. The rank r of the matrix $\Pi$ is called the cointegration rank. Thus $\Pi$ can be written as $BC$ with $B$ and $C'$ being of dimension $(K \times r)$ and of rank r. The $(r \times K)$ matrix $C$ is denoted as the cointegration matrix and the matrix $B$ is sometimes called the loading matrix. We consider systems with $0 < r < K$; thus $y_t$ is neither stationary ($r = K$; $\Pi$ unrestricted) nor purely difference stationary ($r = 0$; $\Pi = 0$). A more detailed discussion of the properties and statistical analysis of linear cointegrated systems ($\nu(s_t) \equiv \nu$) can be found in LÜTKEPOHL [1991, ch. 11]. A Markov-switching p-th order vector autoregression with cointegration rank r is called an MSCI(M,r)-VAR(p) model.
In cointegrated VAR(p) models, the intercept term $\nu$ in general reflects two rather different quantities. Applying the expectation operator to the VECM model (13.2) gives us

$$D(1)\,E[\Delta y_t] = \nu + B\,E[C y_t],$$

where $D(1) = I_K - D_1 - \ldots - D_{p-1}$. Thus,

$$\nu = -B\delta + D(1)\mu,$$

where $\mu$ denotes the expected first difference of the time series and $\delta$ is a constant determining the long-run equilibrium and is thus included in the cointegration relation.
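For a numerical illustration of this decomposition, consider a hypothetical $K = 2$, $r = 1$, $p = 2$ system; all parameter values below are made up:

```python
# Numerical check of nu = -B*delta + D(1)*mu for a hypothetical
# K = 2, r = 1, p = 2 system; mu obeys the restriction C*mu = 0 (13.3).
B     = [[-0.3], [0.1]]          # loading matrix (K x r)
C     = [[1.0, -1.0]]            # cointegration matrix (r x K)
delta = 0.25                     # equilibrium constant
D1    = [[0.4, 0.0], [0.1, 0.2]]
mu    = [0.02, 0.02]             # C*mu = 0.02 - 0.02 = 0

D_one = [[1.0 - D1[0][0], -D1[0][1]],
         [-D1[1][0], 1.0 - D1[1][1]]]          # D(1) = I - D1

nu = [-B[k][0] * delta + sum(D_one[k][j] * mu[j] for j in range(2))
      for k in range(2)]
print([round(v, 4) for v in nu])   # the intercept implied by (delta, mu)
```

The sketch makes explicit that the same intercept vector $\nu$ mixes the equilibrium constant $\delta$ and the drift $\mu$.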
300 Cointegration Analysis of VAR Models with Markovian Shifts in Regime
Cointegration implies the following restriction for the expected first differences of the system:

$$C\mu = 0, \qquad (13.3)$$

revealing that $\mu$ consists only of $K-r$ free parameters reflecting the common deterministic linear trends of the system. Thus $\mu$ can be parameterized in terms of a $([K-r] \times 1)$ vector $\mu^* = (\mu_1^*, \ldots, \mu_{K-r}^*)'$. If the intercept term can be absorbed into the cointegration relation, the variables have no deterministic linear time trends. Otherwise, in the absence of any restriction on $\nu$, there are $K-r$ time trends producing the drift in $y_t$.
Analogously, a regime shift in the intercept term can change the mean growth rate and the equilibrium mean. In MSCI-VAR models each regime $m = 1, \ldots, M$ is associated with an attractor $(\mu_m^*, \delta_m)$: (13.5)

(iv.) Contemporaneous shifts in the drift $\mu(s_t)$ and in the long-run equilibrium $\delta(s_t)$: (13.8)

where $\delta(s_t)$ and $\mu(s_t)$ are defined as in (13.6) and (13.7). The difference to the model in (13.5) consists of an immediate one-time jump of the process drift and equilibrium mean after a change in regime, as in the MSM-VAR model. Furthermore, the shifts in the drift and in the long-run equilibrium might be (contemporaneously or intertemporally) perfectly correlated or not.
The MS-VECM model is closely related to the notion of multiple equilibria in dynamic economic theory. Henceforth, each regime is characterized by its attractor of the system, which is defined by the equilibrium value of the cointegration vector and the drift.

Consider, for example, a bivariate model with logarithms of income $y_t$ and consumption $c_t$, where the cointegration relation is determined by an equilibrium consumption ratio, $c_t - y_t = \delta$. The MS-VECM form of this model is given by (13.9)
where $u_t \sim \text{NID}(0, \Sigma)$ and $\mu^*$ is the equilibrium growth rate. In (13.9), each regime $m$ is associated with a particular attractor $(\mu_m^*, \delta_m)$ given by the equilibrium growth rate $\mu_m^*$ and the equilibrium consumption ratio $\delta_m$. Hence the different specifications of the MSCI-VAR process can be characterized either by (i.) a rather complex dynamic adjustment after the transition from one state into another, $\nu(s_t)$; (ii.) regime shifts in the common growth rate $\mu^*(s_t)$; (iii.) regime shifts in the equilibrium consumption ratio $\delta(s_t)$; or (iv.) contemporaneous regime shifts in both parameter vectors, $\mu(s_t)$ and $\delta(s_t)$.
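A stylized simulation of such a two-regime consumption-income system shows how $c_t - y_t$ gravitates towards the equilibrium ratio of the prevailing regime; all parameter values below are hypothetical, chosen only for illustration:

```python
import random

# Stylized simulation of the bivariate MS-VECM (13.9): income y and
# consumption c share a common drift mu*, and c - y is pulled towards the
# regime-dependent equilibrium ratio delta(s_t).  All values hypothetical.
random.seed(0)
mu_star = 0.005            # common equilibrium growth rate
delta   = [-0.10, -0.20]   # equilibrium log consumption ratio per regime
p_stay  = 0.95             # probability of remaining in the current regime
alpha   = 0.2              # adjustment speed towards the attractor

y, c, s = 0.0, delta[0], 0
for t in range(200):
    if random.random() > p_stay:   # Markovian regime switch
        s = 1 - s
    z = c - y - delta[s]           # equilibrium error z_t
    y += mu_star + random.gauss(0.0, 0.01)
    c += mu_star - alpha * z + random.gauss(0.0, 0.01)

print(round(c - y, 3))   # hovers near the delta of the prevailing regime
```

A regime switch shifts the attractor, after which the equilibrium error is worked off gradually at rate $\alpha$, exactly the "complex dynamic adjustment" of specification (i.).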
$$\nu(s_t) = \bar{\nu} + M \zeta_t, \qquad (13.10)$$

where $M$ is a $(K \times [M-1])$ matrix and the $([M-1] \times 1)$ regime vector is defined as

$$\zeta_t = \begin{bmatrix} \xi_{1t} - \bar{\xi}_1 \\ \vdots \\ \xi_{M-1,t} - \bar{\xi}_{M-1} \end{bmatrix}. \qquad (13.11)$$

The regime vector $\zeta_t$ follows the hidden Markov chain, which is again represented as a VAR(1) process,

$$\zeta_t = F \zeta_{t-1} + v_t, \qquad (13.12)$$
where in the $([M-1] \times [M-1])$ matrix $F$ the adding-up restriction on the transposed transition matrix $P'$ is eliminated,

$$F = \begin{bmatrix} p_{11} - p_{M1} & \cdots & p_{M-1,1} - p_{M1} \\ \vdots & & \vdots \\ p_{1,M-1} - p_{M,M-1} & \cdots & p_{M-1,M-1} - p_{M,M-1} \end{bmatrix}.$$
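The elimination of the adding-up restriction can be sketched in a few lines; the transition matrix below is hypothetical:

```python
# Build the ([M-1] x [M-1]) matrix F from the transition matrix P by
# eliminating the adding-up restriction on P' (display above):
# F[i][j] = p_{j+1, i+1} - p_{M, i+1}, with 1-based indices as in the text.
def regime_var_matrix(P):
    M = len(P)
    return [[P[j][i] - P[M - 1][i] for j in range(M - 1)]
            for i in range(M - 1)]

# Hypothetical two-state chain: F collapses to the scalar p11 - p21,
# i.e. p11 + p22 - 1, whose absolute value is below one.
P2 = [[0.9, 0.1],
      [0.3, 0.7]]
print(regime_var_matrix(P2))   # [[0.6...]] = 0.9 + 0.7 - 1
```

For $M = 2$ the stability condition on $F$ is simply $|p_{11} + p_{22} - 1| < 1$, which holds whenever neither state is absorbing.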
In matrix notation, the MS-VECM can be given the state-space representation

$$\begin{bmatrix} \Delta y_t \\ \Delta y_{t-1} \\ \vdots \\ \Delta y_{t-p+1} \\ C y_{t-p} \\ \zeta_t \end{bmatrix} = \begin{bmatrix} D_1 & \cdots & D_{p-1} & BC & B & M \\ I_K & & 0 & 0 & 0 & 0 \\ & \ddots & & \vdots & \vdots & \vdots \\ 0 & & I_K & 0 & 0 & 0 \\ 0 & \cdots & 0 & C & I_r & 0 \\ 0 & \cdots & 0 & 0 & 0 & F \end{bmatrix} \begin{bmatrix} \Delta y_{t-1} \\ \Delta y_{t-2} \\ \vdots \\ \Delta y_{t-p} \\ C y_{t-p-1} \\ \zeta_{t-1} \end{bmatrix} + \begin{bmatrix} \nu \\ 0 \\ \vdots \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \\ 0 \\ v_t \end{bmatrix}$$

or, in mean-adjusted form,

$$\begin{bmatrix} \Delta y_t - \bar{\mu} \\ \Delta y_{t-1} - \bar{\mu} \\ \vdots \\ \Delta y_{t-p+1} - \bar{\mu} \\ C y_{t-p} - \delta \\ \zeta_t \end{bmatrix} = \begin{bmatrix} D_1 & \cdots & D_{p-1} & BC & B & M \\ I_K & & 0 & 0 & 0 & 0 \\ & \ddots & & \vdots & \vdots & \vdots \\ 0 & & I_K & 0 & 0 & 0 \\ 0 & \cdots & 0 & C & I_r & 0 \\ 0 & \cdots & 0 & 0 & 0 & F \end{bmatrix} \begin{bmatrix} \Delta y_{t-1} - \bar{\mu} \\ \Delta y_{t-2} - \bar{\mu} \\ \vdots \\ \Delta y_{t-p} - \bar{\mu} \\ C y_{t-p-1} - \delta \\ \zeta_{t-1} \end{bmatrix} + \begin{bmatrix} u_t \\ 0 \\ \vdots \\ 0 \\ 0 \\ v_t \end{bmatrix}. \qquad (13.13)$$
If there exists no absorbing state of the Markov chain, then all eigenvalues of $F$ are less than one in absolute value. In PROIETTI [1994, p. 5] it is shown that the remaining eigenvalues of (13.13) lie outside the unit circle. Thus, the state-space representation (13.13) associated with MS-VECM processes is stable.

This steady-state representation opens the way to the investigation of common trends and cycles. Due to the non-normal innovations $v_t$, the statistical analysis of (13.13) requires a combination of Kalman filter with BLHK filter techniques, which have been discussed in Chapter 5. We will leave this last issue to future research.
When multiple time series are subject to regime switching, the shifts in regime can be related in an analogous way to cointegration. To clarify the properties of the regime shifts in MSCI-VAR processes, a comparison with the concept of co-breaking, recently introduced by CLEMENTS & HENDRY [1994] and HENDRY [1996], might be helpful.
Co-breaking is closely related to the idea of cointegration (cf. EMERSON & HENDRY [1995]): while cointegration removes unit roots from linear combinations of variables, co-breaking can eliminate the effects of regime switching by taking linear combinations of variables. Roughly speaking, (drift) co-breaking prevails if the regime shift alters the drift of the system such that at least one linear combination remains stationary. The condition for co-broken MSCI-VAR processes can be formulated as

$$B_\perp M = 0,$$

where $B_\perp$ is a full row rank $([K-r] \times K)$ matrix orthogonal to the loading matrix $B$, $B_\perp B = 0$.
For MSCI-VAR processes, the stationarity of the cointegration relation remains unaltered even if the regime shifts are not co-breaking. Due to the stationarity of the stochastic process generating the path of regimes, the effects of regime switching are eliminated asymptotically. Since there exists an ergodic distribution of the state vector $\xi_t$, a shift in regime does not affect the unconditional drift of the cointegrated variables.
If the variance-covariance matrix $\Sigma$ is allowed to vary over regimes, the error term $w_t$ of the resulting VAR model becomes bilinear in the innovations $v_t$ and $u_t$. The bilinearity of (13.15) may affect the justification to be given in Section 13.3 for the applicability of the Johansen framework.
In the following we consider only the simplest case, where the deviation from Gaussian cointegration systems (as considered inter alia by [1995]) is restricted to the different treatment of the intercept term, which is no longer a simple parameter but is assumed to be generated by a stochastic process, i.e. the hidden Markov chain. An example given in Section 13.4 will illustrate the relevance of the MSCI-VAR model for empirical macroeconomics. However, it must be emphasized that, due to their non-standard asymptotics, the estimated MSCI-VAR models are here primarily used as descriptive devices. Nevertheless, this investigation shows that the development of an asymptotic distribution theory for these processes is a worthy program for future research.
Thus, the intercept term is not a simple parameter but is generated by the stochastic process (13.10), $\nu(s_t) = \bar{\nu} + M\zeta_t$, where

$$\zeta_t = \sum_{j=0}^{\infty} F^j v_{t-j}. \qquad (13.17)$$
13.2. A Cointegrated VARMA Representation for MSCI- VAR Processes 307
Hence the intercept term $\nu(s_t)$ is generated by a linearly transformed VAR(1) process. Inserting (13.17) and (13.10) into (13.16) gives us

$$y_t = \sum_{i=1}^{p} A_i y_{t-i} + u_t + \bar{\nu} + M \sum_{j=0}^{\infty} F^j v_{t-j}. \qquad (13.18)$$

Thus the Markovian shift in the intercept term implies a cointegrated VAR process where the error term $w_t$ is the sum of two independent processes, the Gaussian white noise $u_t$ and an autocorrelated non-normal process,

$$w_t = u_t + M \sum_{j=0}^{\infty} F^j v_{t-j}. \qquad (13.20)$$
Using the definition of the adjoint matrix, $F(L)^* = |F(L)|\,F(L)^{-1}$, results in (13.21), or, written with the $(K \times K)$ reduced polynomial $\bar{A}(L) = |F(L)|\,A(L)$ of order $p+M-1$ and a $(K \times 1)$ constant $\bar{a}_0$, as a cointegrated VARMA$(p+M-1, M-1)$ representation (13.22). From (13.22) it is clear that MSCI(M,r)-VAR(p) processes can be written in the form of a vector autoregression with an infinite order. To illustrate this point, suppose that both sides of equation (13.23) are multiplied with the inverse polynomial $B(L)^{-1}$ such that

$$y_t = \psi + \sum_{i=1}^{\infty} \Psi_i y_{t-i} + \varepsilon_t, \qquad (13.25)$$

where the intercept term $\psi$ reflects the unconditional mean of $\nu(s_t)$ and $\Psi(L)$ exhibits only the unit roots introduced by $A(L)$.
Some remarks on this point are necessary. Note that $y_t$ is an integrated variable and thus the infinite sum is not absolutely summable. In this sense equation (13.25) is not well-defined. The rough disregard of the initial conditions of the process might be justified for our purposes, as we are not interested in the parameters of (13.25).
13.3. A Two-Stage Procedure 309
The main point is that equation (13.25) characterizes the cointegrated system (13.1) with Markovian regime shifts as a non-normal cointegrated vector autoregression of infinite order. This property of MSCI-VAR processes enables us to base the cointegration analysis of such data generating processes on procedures available for infinite order VAR models.
SAIKKONEN [1992] and SAIKKONEN & LUUKKONEN [1995] show that the use of analogs or close versions of the likelihood ratio tests developed for finite order Gaussian vector autoregressive processes is justified even when the data are generated by an infinite non-Gaussian vector autoregressive process.
A vector equilibrium correction model with finite order $h$ is fitted to the data, which are assumed to be generated by an infinite order cointegrated VAR process. The finite order VAR process is thus regarded as an approximation,

$$\Delta y_t = \phi_h + \sum_{i=1}^{h} D_{i,h}\, \Delta y_{t-i} + \Pi_h y_{t-1} + u_{t,h}. \tag{13.26}$$
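To make this approximation step concrete, the following sketch (ours, not from the text; parameter values are arbitrary) fits the model (13.26) with $h = 0$ by OLS to a simulated bivariate cointegrated VAR(1) and verifies that the estimated $\Pi_h$ is numerically of reduced rank:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5000
alpha = np.array([[0.0], [0.5]])     # loading matrix (our choice)
beta  = np.array([[1.0], [-1.0]])    # cointegration vector (our choice)
Pi = alpha @ beta.T                  # rank-one long-run impact matrix

# simulate the cointegrated system: dz_t = Pi z_{t-1} + u_t
z = np.zeros((T, 2))
for t in range(1, T):
    z[t] = z[t - 1] + Pi @ z[t - 1] + rng.standard_normal(2)

# OLS of dz_t on (1, z_{t-1}): the approximation (13.26) with h = 0
dz = np.diff(z, axis=0)
X = np.column_stack([np.ones(T - 1), z[:-1]])
coef, *_ = np.linalg.lstsq(X, dz, rcond=None)
Pi_hat = coef[1:].T                  # estimated (2 x 2) Pi

sv = np.linalg.svd(Pi_hat, compute_uv=False)
print(Pi_hat, sv)
```

The second singular value of $\hat{\Pi}$ is close to zero, reflecting the single cointegration relation of the simulated system.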
SAIKKONEN [1992] provides some general asymptotic results for infinite order VAR processes showing that most of the asymptotic results of JOHANSEN [1988], [1991] for the estimated cointegration relations and weighting matrix remain valid.
Thus, the conditions of the Johansen-Saikkonen test correspond to the situation currently under consideration. For the application to the specific model, four results are essential:
• Under the assumption that the order of the fitted process is increased with the sample size, some of the results for finite order VAR processes can be extended to these more general data generation processes. The asymptotic properties of the estimated short-run parameters, as well as impulse responses, are derived in SAIKKONEN & LÜTKEPOHL [1994] and LÜTKEPOHL & SAIKKONEN [1995]. In particular, LÜTKEPOHL & SAIKKONEN [1995] demonstrate that the usual interpretation of cointegrated VAR systems through impulse responses and related quantities can be justified even if the true VAR order is infinite while a finite VAR(p) process is fitted to the observed data.
• A major problem might occur from the fact that an asymptotic estimation theory is not well established for infinite cointegrated VAR processes with a drift term (cf. SAIKKONEN & LÜTKEPOHL [1994]): the contribution of SAIKKONEN [1992] and his co-authors is restricted to models where the intercept term can be included in the cointegration relation, $\nu = -Bb$, where $b$ is an $(r \times 1)$ vector. Furthermore, the asymptotic distribution of ML estimates of the general intercept term is non-standard (cf. HAMILTON [1994a, ch. 18.2]). Thus, LR tests of hypotheses concerning a shift of the intercept term typically have non-standard distributions even if the number of regimes is unaltered under the null. The estimated regime-dependent intercept terms primarily have a descriptive value.
Since the assumed latent Markovian shifts in regime imply a data generating VARMA process, the Johansen-Saikkonen statistic would be a natural testing procedure. However, LÜTKEPOHL & CLAESSEN [1996] found that for small samples the Johansen statistic is more closely approximated by the (identical) asymptotic distribution. In conclusion, under the prevailing conditions of our analysis, that is, under the assumption that the data generating process is an MSCI($M$, $r$)-VAR($p$) process, there is no obstacle to studying the long-term properties within the well-known Johansen framework for linear systems.
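The eigenvalue problem at the heart of the Johansen procedure can be sketched in a few lines (a minimal illustration of ours for a simulated bivariate VAR(1) with one cointegration relation, so no lagged differences need to be concentrated out; this is not the software used later in this chapter):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
Pi = np.array([[0.0, 0.0], [0.5, -0.5]])   # rank one: a single cointegration relation

y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = y[t - 1] + Pi @ y[t - 1] + rng.standard_normal(2)

# concentrate out the intercept by demeaning dy_t and y_{t-1}
R0 = np.diff(y, axis=0); R0 -= R0.mean(axis=0)
R1 = y[:-1].copy();      R1 -= R1.mean(axis=0)

S00 = R0.T @ R0 / T; S11 = R1.T @ R1 / T
S01 = R0.T @ R1 / T; S10 = S01.T

# eigenvalues of S11^{-1} S10 S00^{-1} S01: squared canonical correlations
lam = np.sort(np.linalg.eigvals(
    np.linalg.solve(S11, S10 @ np.linalg.solve(S00, S01))).real)[::-1]

trace_r0 = -T * np.log(1 - lam).sum()        # H0: rank = 0
trace_r1 = -T * np.log(1 - lam[1:]).sum()    # H0: rank <= 1
print(lam, trace_r0, trace_r1)
```

The large first eigenvalue drives a large trace statistic for $H_0\colon r = 0$, while the statistic for $H_0\colon r \le 1$ stays small, mirroring the rank decision made in the tables below.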
13.3.2 EM Algorithm
Our two-stage procedure employs the Johansen ML analysis only to determine the cointegration rank $r$ of the system and to deliver an estimate of the cointegration matrix $C$. The remaining parameters of the MSCI($M$, $r$)-VAR($p$) model are estimated with the methods developed in Chapters 6, 9 and 10.
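The flavour of this second estimation stage can be conveyed by a compressed sketch of the EM iteration - Hamilton filter, backward smoother, closed-form M-step - for the simplest univariate MSI(2) special case (an illustration of ours; all names and parameter values are arbitrary, not the book's code):

```python
import numpy as np

rng = np.random.default_rng(3)

# simulate an MSI(2) process: y_t = nu(s_t) + e_t
T, nu_true, sig = 400, np.array([-1.0, 1.0]), 0.8
P_true = np.array([[0.95, 0.05], [0.10, 0.90]])   # rows: from-state
s = np.zeros(T, dtype=int)
for t in range(1, T):
    s[t] = rng.choice(2, p=P_true[s[t - 1]])
y = nu_true[s] + sig * rng.standard_normal(T)

def em_step(y, nu, sig2, P):
    T = len(y)
    dens = np.exp(-0.5 * (y[:, None] - nu) ** 2 / sig2) / np.sqrt(2 * np.pi * sig2)
    # Hamilton filter (forward pass)
    xi = np.full(2, 0.5)                          # flat initial state probabilities
    filt = np.zeros((T, 2)); pred = np.zeros((T, 2)); loglik = 0.0
    for t in range(T):
        pred[t] = xi @ P if t else xi
        joint = pred[t] * dens[t]
        c = joint.sum(); loglik += np.log(c)
        filt[t] = joint / c
        xi = filt[t]
    # smoother (backward pass) and expected transition counts
    smo = np.zeros((T, 2)); smo[-1] = filt[-1]
    trans = np.zeros((2, 2))
    for t in range(T - 2, -1, -1):
        ratio = smo[t + 1] / pred[t + 1]
        smo[t] = filt[t] * (P @ ratio)
        trans += P * np.outer(filt[t], ratio)
    # M-step: closed-form updates
    nu = (smo * y[:, None]).sum(0) / smo.sum(0)
    sig2 = (smo * (y[:, None] - nu) ** 2).sum() / T
    P = trans / trans.sum(1, keepdims=True)
    return nu, sig2, P, loglik

nu, sig2, P = np.array([-0.5, 0.5]), 1.0, np.full((2, 2), 0.5)
logliks = []
for _ in range(30):
    nu, sig2, P, ll = em_step(y, nu, sig2, P)
    logliks.append(ll)
print(nu, np.sqrt(sig2), P)
```

Each pass returns the log-likelihood of the parameters it was given, so the sequence of log-likelihoods must be non-decreasing - the standard diagnostic for a correctly implemented EM iteration.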
While the cointegration analysis has been based on approximating a linear system, we consider again the equilibrium correction form of the data generating MSCI($M$, $r$)-VAR($p$) process:
$$\Delta y_t = \nu(s_t) + \sum_{i=1}^{p-1} D_i\, \Delta y_{t-i} + B C y_{t-1} + u_t.$$
Figure 13.1: Growth in the World Economy. Log of Real GNP 1960-1991 (plotted series: USA, JAP, FRG, UK, CAN, AUS).
$$y_t = \left( y_t^{USA},\; y_t^{JAP},\; y_t^{FRG},\; y_t^{UK},\; y_t^{CAN},\; y_t^{AUS} \right)'.$$
By subtracting the value of 1960:1, the system has been normalized such that the time series vector is equal to zero in 1960:1. Note that, in contrast to the last chapter, the non-stationary univariate components of the multiple time series are not differenced prior to modelling the MS-VAR process, which is now defined in levels. The ordering of the variables corresponds to the size of the national economies and to their possible importance for the world economy and, thus, the international business cycle. In particular, this ordering ensures that in the usual way orthogonalized shocks in the U.S. economy may have an instantaneous impact on all other variables of the system.
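The role of this ordering can be made explicit with the lower triangular Choleski factor of the residual covariance matrix (the numbers below are illustrative, ours, not estimates from the chapter):

```python
import numpy as np

# an illustrative residual covariance matrix, ordered (USA, JAP, FRG)
Sigma = np.array([[1.00, 0.40, 0.30],
                  [0.40, 1.20, 0.25],
                  [0.30, 0.25, 0.90]])

P = np.linalg.cholesky(Sigma)   # lower triangular, Sigma = P P'

# With this ordering, the first orthogonalized shock (the 'U.S.' shock) loads on
# every variable contemporaneously (first column of P), while the first variable
# reacts contemporaneously only to its own shock (first row is zero off-diagonal).
print(P)
```

Reordering the variables changes the factor and hence the orthogonalized shocks, which is why the ordering by economic importance is stated explicitly in the text.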
The variables under consideration are plotted in Figure 13.1. Obviously there is a strong parallel trending movement, which suggests possible cointegration. Interestingly, there seems to be a break in the trend of the Japanese GNP, as seen in the last chapter.
This section evaluates the cointegration features of international and global business cycles on the basis of a finite pure VAR(p) approximation of the system. Hence the following cointegration analysis does not consider the latent Markovian regime shifts explicitly. This enables us to perform the cointegration analysis with the Johansen ML procedures for linear cointegrated systems. With this estimated model, some issues related to cointegration will be discussed.
Initially we determine the order of the VAR approximation and the cointegration rank of the system. All calculations in this section have been carried out with MulTi (cf. LÜTKEPOHL et al. [1993]).
We have applied VAR order selection criteria (cf. LÜTKEPOHL [1991, sec. 11.4.1]) with a maximum order of 8. Four different criteria have been used for specifying the VAR order. The Schwarz criterion (SC) and the Hannan-Quinn criterion (HQ) estimated the order p = 1 for a VAR approximation of the system, while the Akaike (AIC) and the final prediction error (FPE) criteria support a larger model, p = 2. For finite VAR processes, SC and HQ are both consistent, while AIC is not a consistent criterion. This would justify choosing the order p = 1, thus restricting the dynamics of the model to the equilibrium correction mechanism exclusively.
[Table 13.1: Johansen trace and maximum eigenvalue tests, VAR(1) model.]
† Trace test for cointegration rank: H0: rank = r versus H1: r < rank ≤ K.
‡ Maximum eigenvalue test for cointegration rank: H0: rank = r versus H1: rank = r + 1.
** Significant at 1% level, * significant at 5% level.
Percentage points of the asymptotic distribution are taken from OSTERWALD-LENUM [1992].
However, since the true model is assumed to be subject to Markovian regime shifts, under the present conditions any finite VAR order is only an approximation. Therefore, we have performed the cointegration analysis using both specifications, with p = 1 and p = 2. An intercept term was included in the VAR(p) model under consideration.
Since the assumed latent Markovian shifts in regime imply a data generating VARMA process, the Johansen-Saikkonen test statistic would be a natural testing procedure. However, as already mentioned in Section 13.3, LÜTKEPOHL & CLAESSEN [1996] and SAIKKONEN & LUUKKONEN [1995] found that for small samples the Johansen statistic is more closely approximated by the (identical) asymptotic distribution.
Therefore, we will perform the statistical analysis based on the JOHANSEN [1988], [1991] approach to maximum likelihood estimation of cointegrated linear systems. As the employed finite VAR model is only an approximation of the data generating process.
[Table 13.2: Johansen trace and maximum eigenvalue tests, VAR(2) model.]
† Trace test for cointegration rank: H0: rank = r versus H1: r < rank ≤ K.
‡ Maximum eigenvalue test for cointegration rank: H0: rank = r versus H1: rank = r + 1.
** Significant at 1% level, * significant at 5% level.
Percentage points of the asymptotic distribution are taken from OSTERWALD-LENUM [1992].
Initially we have determined the cointegration rank of the system. Table 13.1 shows the results of the Johansen trace and maximum eigenvalue tests for the VAR(1) model, where the critical values from OSTERWALD-LENUM [1992] correspond to the situation where the variables exhibit deterministic trends; the significance levels are valid for the individual tests only. As shown by SAIKKONEN & LUUKKONEN [1995], these tests maintain their asymptotic validity even if the true VAR order is infinite. Both Johansen tests strongly support a cointegration rank of r = 1 for the VAR(1) model as well as for the VAR(2) model (cf. Table 13.2). Thus, K − r = 5 linearly independent stochastic trends remain.
Since our main interest is the analysis of the effects of regime shifts, we restrict our analysis of the VAR approximations to the long-run properties of the system. The estimated cointegration vector is quite similar in both specifications:
where we have normalized the cointegration vector so that the U.S. coefficient equals 1 and the constant has been suppressed. The highest weight of the USA in the cointegration relationship, in conjunction with the positive elements of the loading matrix for the rest of the world, points out the dominance of the U.S. economy for the global economic system.
While the equilibria of both models are quite similar, the VAR(1) model restricts the dynamics of the model to the equilibrium correction mechanism by
In pure VAR models, tests for Granger-causality can be based on Wald tests for a set of linear restrictions. The vector $y_t$ is partitioned into the single time series $x_t$ and the resulting 5-dimensional rest-of-the-world system $\mathrm{ROW}_t$, such that

$$A_i = \begin{bmatrix} A_{11,i} & A_{12,i} \\ A_{21,i} & A_{22,i} \end{bmatrix}, \qquad i = 1, \ldots, p.$$

Then $x_t$ does not Granger-cause $\mathrm{ROW}_t$ if and only if the hypothesis $H_0\colon A_{21,i} = 0$, $i = 1, \ldots, p$, holds.
The aspects of testing Granger causality in MSCI-VAR models are not solved at this stage of research. For stationary MSM-VAR and MSI-VAR processes, Granger causality could be checked on the basis of their VARMA representations (cf. LÜTKEPOHL [1991, ch. 6.7.1]). As shown by LÜTKEPOHL & POSKITT [1992] for stationary VARMA processes and SAIKKONEN & LÜTKEPOHL [1995] for cointegrated processes, tests for Granger causality could then be based on finite order VAR approximations.
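The mechanics of such a Wald test can be illustrated on a simulated bivariate VAR(1) in which the first variable Granger-causes the second but not vice versa (a sketch of ours with arbitrary coefficients; the actual tests below concern the six-dimensional system):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 2000
# toy DGP: x1 Granger-causes x2 (A[1, 0] = 0.4), x2 does not cause x1
A = np.array([[0.5, 0.0],
              [0.4, 0.3]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)

# OLS equation by equation: regressors are (1, y1_{t-1}, y2_{t-1})
X = np.column_stack([np.ones(T - 1), y[:-1]])
B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
U = y[1:] - X @ B
XtXi = np.linalg.inv(X.T @ X)

def wald(eq, coef_idx):
    """Wald statistic for H0: the given coefficient in equation `eq` is zero."""
    s2 = (U[:, eq] @ U[:, eq]) / (T - 1 - 3)
    b = B[coef_idx, eq]
    return b ** 2 / (s2 * XtXi[coef_idx, coef_idx])

w_1to2 = wald(eq=1, coef_idx=1)   # lag of x1 in the x2 equation
w_2to1 = wald(eq=0, coef_idx=2)   # lag of x2 in the x1 equation
print(w_1to2, w_2to1)
```

Under the null of non-causality the statistic is (asymptotically) $\chi^2$ with one degree of freedom per zero restriction here, so the causal direction yields a statistic far out in the tail while the non-causal direction stays near its null distribution.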
Table 13.3 gives the results for a pure finite VAR approximation of the data set under consideration. The significance levels given in Table 13.3 are valid for a $\chi^2$ distribution with degrees of freedom equal to the number of zero restrictions (i.e. $\chi^2(10)$ for $H_0$: ROW $\nrightarrow$ $x$ and $\chi^2(50)$ for $H_0$: $x$ $\nrightarrow$ ROW). However, as already noted, the asymptotic distribution of the Wald statistic could be non-standard, as in finite-order cointegrated VAR processes; on the other hand, overfitting could be helpful.
The one-directional Granger causality of the US growth rate demonstrates the importance of the U.S. economy for the world economy (significant at 1%). In addition to the United States, only West German economic data seems to have predictive power for the rest of the system (significant at 5%). West Germany, Canada and Japan are highly dependent on the state of global macroeconomic activity, as they are respectively Granger-caused by the rest of the world. There are no statistically significant findings for the United Kingdom and Australia.
The test results for instantaneous causality between $x_t$ and $\mathrm{ROW}_t$ are given in Table 13.3, too. There is no instantaneous causality between $x_t$ and $\mathrm{ROW}_t$ if and only if $H_0\colon \Sigma_{12} = \Sigma_{21} = 0$ is true (cf. e.g. LÜTKEPOHL [1991, sec. 2.3.1]). In addition to the dynamic propagation of economic shocks, evidence for contemporaneous shocks is established for the USA, Canada, and the UK.
If the process is stationary, forecast error impulse responses are the coefficients of the Wold MA representation,

$$y_t = \sum_{i=0}^{\infty} \Phi_i u_{t-i},$$

where the process mean has been set to zero for simplicity.
While such a Wold representation does not exist for cointegrated processes, the $(K \times K)$ matrices $\Phi_i$ can be calculated recursively (cf. LÜTKEPOHL [1991, sec. 11.3.1]) as $\Phi_0 = I_K$, $\Phi_i = \sum_{j=1}^{i} \Phi_{i-j} A_j$, and the $kl$-th element of $\Phi_i$ can be interpreted as the response of variable $k$ to an impulse in variable $l$, $i$ periods ago.
After 10 years, 93% of the variance of US GDP is due to own innovations, but also 68% of the Canadian, 60% of the German, 30% of the UK, 45% of the Australian and 25% of the Japanese variance are caused by shocks in the U.S. economy. Other than the effects of U.S. shocks, only the own innovations in Japan and the UK and the feedback between Japan and Germany are statistically significant. Furthermore, there is evidence for effects of German shocks on the Australian process of economic growth.
It should be emphasized that the main conclusions for the VAR(1) process are similar to those for the VAR(2) process. However, an asymptotic theory for the forecast error decomposition under our assumptions regarding the data generating mechanism merits future investigation.
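The forecast error variance decomposition behind these figures can be sketched for a small system (our illustration with arbitrary coefficients): at horizon $h$, the share of the forecast error variance of variable $k$ attributable to orthogonalized shock $l$ is $\sum_{i<h} (\Theta_i)_{kl}^2$ over the row total, with $\Theta_i = \Phi_i P$ and $\Sigma = PP'$:

```python
import numpy as np

def fevd(A1, Sigma, h):
    """Forecast error variance decomposition of a VAR(1) at horizon h."""
    K = A1.shape[0]
    P = np.linalg.cholesky(Sigma)     # orthogonalization, Sigma = P P'
    Phi = np.eye(K)
    num = np.zeros((K, K))            # num[k, l]: contribution of shock l to var of k
    for i in range(h):
        Theta = Phi @ P               # orthogonalized responses Theta_i = Phi_i P
        num += Theta ** 2
        Phi = Phi @ A1                # Phi_{i+1} = Phi_i A1 for a VAR(1)
    return num / num.sum(axis=1, keepdims=True)

A1 = np.array([[0.5, 0.2], [0.0, 0.7]])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
W = fevd(A1, Sigma, h=40)
print(W)
```

Each row of the resulting matrix sums to one, so the entries can be read directly as variance shares of the kind quoted in the preceding paragraph.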
Figure 13.2: MS(2)-VECM(1) Model. Filtered and smoothed probabilities of the recessionary regime, 1960-1990.
In Figure 13.2 the filtered probabilities $\Pr(s_t = 2 \mid Y_t)$ of being in the recessionary state 2 and the (full sample) smoothed probabilities $\Pr(s_t = 2 \mid Y_T)$ are again compared with the chronology of business and growth cycle turning points of the U.S. economy provided by the Center of International Business Cycle Research. The recessionary regime is clearly associated with the two recessions after the oil price shocks in 1973/74 and 1979/80.
Interestingly, the impact effect of a regime shift is quite heavy for the United States and the United Kingdom, but negligible for Australia and contradictory for Canada.
So,

$$\nu_1 - \nu_2 = \begin{pmatrix} 1.3310 \\ 1.3193 \\ 0.7652 \\ 1.2817 \\ -0.3722 \\ 0.0627 \end{pmatrix}, \qquad \nu_2 = \begin{pmatrix} -0.8542 \\ 0.3020 \\ 0.2173 \\ -0.2901 \\ 0.8112 \\ 0.8647 \end{pmatrix}.$$
These results should be compared to the regime classifications given in Figure 12.16 for an MSI(2)-DVAR(1) model, i.e. where the equilibrium correction mechanism has been dropped: $C = 0$; the estimation results have been given in Table 12.5. A likelihood ratio test of the MSI(2)-DVAR(1) model against the MS(2)-VECM(1) model results in
which will be significant at 1% if the critical values of the linear world can be applied (cf. OSTERWALD-LENUM [1992]). The rejection of the null hypothesis of no cointegration relation against a cointegration rank $r = 1$ strongly supports the cointegration result which has been found in the pure VAR approximations of the system.
where the white noise process is heteroskedastic, $u_t \sim \mathrm{NID}(0, \Sigma(s_t))$, and $z_t$
Figure 13.3: MSH(2)-VECM(1) Model. Regime probabilities.
of a regime shift:

$$\nu_1 - \nu_2 = \begin{pmatrix} 0.8969 \\ 0.4587 \\ 0.7719 \\ 0.9251 \\ 0.6484 \\ 0.6335 \end{pmatrix}, \qquad \nu_2 = \begin{pmatrix} -0.2041 \\ 1.1533 \\ 0.3981 \\ 0.2322 \\ 0.1702 \\ 0.5737 \end{pmatrix}.$$
Thus a turning of the 'world business cycle' causes annualized impact effects on real economic growth in the range from 1.8% (Japan) to 3.7% (UK). This reveals a strongly homogeneous effect of a shift in regime on the national economies. There is, as well, a strong positive correlation between shocks in the United States and in the other national economies.
possesses its standard asymptotic distribution. Then, under the number-of-regimes-preserving hypothesis $\Sigma_1 = \Sigma_2$ but $\nu_1 \neq \nu_2$, the LR test statistic results in
In order to conclude our empirical analysis, the results of this chapter may be compared with those of the last one. The incredibly high parameter values of the estimated loading matrix emphasize the importance of the equilibrium correction mechanism. Economic differences in the regime classification occur with regard to the double-dip characterization of the recessions 1979/80 and 1981/82. Finally, an additional low growth period for 1966/67 is identified.
13.6 Conclusions
The theoretical analysis has focused on the modelling of Markovian regime shifts of cointegrated systems. This issue has been linked to the notion of multiple equilibria in dynamic economic theory, as well as to the recently proposed concept of co-breaking trends and means. The procedures proposed for the statistical analysis of cointegrated systems subject to changes in regime have been based on the infinite cointegrated VAR representation of MSCI-VAR models.
While there is much work that can and will be done on this class of models in the
near future, the main results of our investigation can be summarized:
(iii.) For the statistical determination of the cointegration relationship, i.e. tests of the cointegration rank r and the cointegration matrix C, the non-normal VARMA process may be approximated by a finite pure VAR(p) process, which allows the application of the Johansen ML analysis of cointegrated linear systems.
(iv.) An asymptotic theory for the statistical methods of testing the cointegration rank may be based on the infinite cointegrated VAR representation. While the development of an asymptotic theory has been beyond the scope of this chapter, there is hope that research currently in progress will provide a theoretical basis. In particular, a theory of infinite VAR processes with drift would be able to solve currently existing problems. As long as this theory does not exist, some of our results remain provisional.
$$\tilde{\nu} = (0.8870,\; 2.0371,\; 1.1788,\; 0.8115,\; 1.2567,\; 1.4204)',$$

with standard errors $(0.1170,\; 0.1230,\; 0.1681,\; 0.1753,\; 0.1255,\; 0.1362)$, and

$$\tilde{\Sigma} = \begin{bmatrix} 0.71625 & & & & & \\ 0.10359 & 0.86969 & & & & \\ 0.01348 & 0.25282 & 1.50300 & & & \\ 0.15058 & 0.25777 & 0.36596 & 1.67290 & & \\ 0.26992 & 0.06452 & 0.02290 & 0.14447 & 0.69501 & \\ 0.10997 & -0.01236 & -0.06937 & 0.11247 & 0.16103 & 1.07690 \end{bmatrix}$$
$$\tilde{\nu} = (0.2544,\; 1.8218,\; 1.1439,\; 1.0311,\; 0.0584,\; 1.2833)',$$

with standard errors $(0.2074,\; 0.2286,\; 0.3005,\; 0.3170,\; 0.2043,\; 0.2543)$, and

$$\tilde{\Sigma} = \begin{bmatrix} 0.96605 & & & \\ 0.20817 & 1.13730 & & \\ 0.09887 & -0.08849 & 1.73180 & \\ 0.14430 & 0.14067 & 0.44194 & 1.88420 \end{bmatrix}$$

$$\ln L = -980.6021$$
Epilogue
A study like the present one can of course make no claims of encyclopedic completeness, and it would be pointless to list all the concepts which are related to the MS-VAR model but which have not been discussed in this presentation. If this study intended to develop an operational econometric approach for the statistical analysis of economic time series with MS-VAR models, then we can conclude that some progress has been made. Concerning, inter alia, the flexibility of modelling and the computational effort of estimation, this study has put forward the MS-VAR model as an alternative to linear, normal systems. In some other respects our results are more preliminary, but realistically we could not have expected to resolve all problems.
It must be emphasized that the previous analysis rests on some basic assumptions and most of our results will not hold without them. To maximize the effectiveness of our efforts, some results have been restricted to processes where the shift in regime affects only the level or the drift of a time series vector. One basic assumption has been related to the class of processes considered. In most chapters, the presumption has been made that the data is generated by a stationary (respectively difference-stationary) stochastic process, which excludes e.g. the presence of cointegration. In the last chapter, we have introduced - with the MSCI-VAR model - a cointegrated vector autoregressive model where Markovian shifts occur in the mean of the cointegration relation and the drift of the system. A number of fundamental methods have been proposed to analyze them. Further research is required on this topic, which we believe to be of central theoretical and practical importance.
We are aware that we do not possess an asymptotic estimation and testing theory for the MS-VAR model in general. We have presupposed that the regularity conditions of general propositions proven for general state-space models and non-linear models are satisfied, and there is no indication that they are not fulfilled for the processes under consideration. Thus the non-standard asymptotics involved in the determination of the number of regimes seem to be simply an exception. Several procedures have been proposed to allow statistical analysis in practice.
Finally, we have only sketched the potential contribution of the MS-VAR model to business cycle analysis. Our analysis was restricted to the highest possible aggregation level, where macroeconomic activity has been summed up contemporaneously in a single time series. Further research has to be undertaken to construct a comprehensive statistical characterization of national, international and global macroeconomic fluctuation-generating forces. The fact that MS-VAR models possess, in most applications, an intuitive economic interpretation should not be underestimated, for it enables a needed dialog between econometricians and economists who are working non-quantitatively.
While some of our theoretical results remain provisional under the aforementioned limitations, the presented applications already underline the usefulness of the MS-VAR model and the methods proposed in this study for empirical research. It is hoped that, although the previous discussion has identified areas of necessary further theoretical development, this study will also provide a useful systematic basis for empirical investigations with Markov-switching vector autoregressions.
References
ALBERT, J., & CHIB, S. [1993]. "Bayes inference via Gibbs sampling of autoregressive time series subject to Markov mean and variance shifts". Journal of Business & Economic Statistics, 11, 1-16.
AOKI, M., & HAVENNER, A. [1991]. "State space modeling of multiple time series". Econometric Reviews, 10, 1-59.
AOKI, M. [1990]. State Space Modeling of Time Series. Berlin: Springer Verlag, 2nd Edition.
BÄRDSEN, G., FISHER, P. G., & NYMOEN, R. [1995]. Business Cycles: Real Facts or Fallacies? University of Oslo, working paper.
BAUM, L. E., & EAGON, J. A. [1967]. "An inequality with applications to statistical estimation for probabilistic functions of Markov chains and to a model for ecology". Bull. American Mathematical Society, 73, 360-363.
BAUM, L. E., PETRIE, T., SOULES, G., & WEISS, N. [1970]. "A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains". Annals of Mathematical Statistics, 41, 164-171.
BERNDT, E. K., HALL, B. H., HALL, R. E., & HAUSMAN, J. A. [1974]. "Estimation and inference in nonlinear structural models". Ann. Econ. Social Measurement, 3/4, 653-665.
BILLIO, M., & MONFORT, A. [1995]. Switching State Space Models. Likelihood Function, Filtering and Smoothing. CREST working paper.
BLACKWELL, E., & KOOPMANS, L. [1975]. "On the identifiability problem for functions of finite Markov chains". Annals of Mathematical Statistics, 28, 1011-1015.
BOX, G. E. P., & TIAO, G. C. [1968]. "A Bayesian approach to some outlier problems". Biometrika, 55, 119-129.
CARTER, C. K., & KOHN, R. [1994]. "On Gibbs sampling for state space models". Biometrika, 81, 541-553.
COSSLETT, S. R., & LEE, L.-F. [1985]. "Serial correlation in latent discrete variable models". Journal of Econometrics, 27, 79-97.
DAVIDSON, J. E. H., HENDRY, D. F., SRBA, F., & YEO, S. [1978]. "Econometric modelling of the aggregate time-series relationship between consumers' expenditure and income in the United Kingdom". Economic Journal, 88, 661-692.
DAVIDSON, R., & MACKINNON, J. G. [1981]. "Several tests for model specification in the presence of alternative hypotheses". Econometrica, 49, 781-793.
DICKEY, D. A., & FULLER, W. A. [1979]. "Distribution of the estimators for autoregressive time series with a unit root". Journal of the American Statistical Association, 74, 427-431.
DICKEY, D. A., & FULLER, W. A. [1981]. "Likelihood ratio statistics for autoregressive time series with a unit root". Econometrica, 49, 1057-1072.
DOAN, T., LITTERMAN, R. B., & SIMS, C. [1984]. "Forecasting and conditional projection using realistic prior distributions". Econometric Reviews, 3, 1-144.
DOLADO, J. J., & LÜTKEPOHL, H. [1996]. "Making Wald tests work for cointegrated VAR systems". Econometric Reviews, forthcoming.
FUNKE, M., HALL, S. G., & SOLA, M. [1994]. Rational Bubbles during Poland's Hyperinflation: Implications and Empirical Evidence. Humboldt-Universität zu Berlin, Discussion Paper 17.
GARCIA, R., & PERRON, P. [1990]. An Analysis of the Real Interest Rate under Regime Shifts. Princeton University, working paper.
GARCIA, R., & SCHALLER, H. [1995]. Are the Effects of Monetary Policy Asymmetric? Universite de Montreal, working paper.
GEMAN, S., & GEMAN, D. [1984]. "Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images". IEEE Trans. on Pattern Analysis and Machine Intelligence, 6, 721-741.
GEWEKE, J. [1994]. "Priors for macroeconomic time series and their application". Econometric Theory, 10, 609-632.
GHYSELS, E. [1994]. "On the periodic structure of the business cycle". Journal of
GHYSELS, E. [1993]. A Time Series Model with Periodic Stochastic Regime Switching. Universite de Montreal, working paper.
GOLDFELD, S. M., & QUANDT, R. E. [1973]. "A Markov model for switching regressions". Journal of Econometrics, 1, 3-16.
GRANGER, C. W. J. [1981]. "Some properties of time series data and their use in econometric models". Journal of Econometrics, 16, 121-130.
HAGGAN, V., & OZAKI, T. [1981]. "Modelling nonlinear random vibrations using an amplitude-dependent autoregressive time series model". Biometrika, 68, 189-196.
HALL, S. G., & SOLA, M. [1993a]. A Generalized Model of Regime Changes Applied to the US Treasury Bill Rate. CEF discussion paper 07-93.
HAMILTON, J. D., & LIN, G. [1994]. Stock Market Volatility and the Business Cycle. UCSD working paper.
HARVEY, A. C., & JAEGER, A. [1993]. "Detrending, stylized facts and the business cycle". Journal of Applied Econometrics, 8, 231-247.
HELLER, A. [1965]. "On stochastic processes derived from Markov chains". Annals of Mathematical Statistics, 36, 1286-1291.
HESS, G. D., & IWATA, S. [1995]. Measuring Business Cycle Features. University of Kansas, Research Papers in Theoretical and Applied Economics No. 1995-6.
HILDRETH, C., & HOUCK, J. P. [1968]. "Some estimators for a linear model with random coefficients". Journal of the American Statistical Association, 63, 584-595.
HOLST, U., LINDGREN, G., HOLST, J., & THUVESHOLMEM, M. [1994]. "Recursive estimation in switching autoregressions with a Markov regime". Journal of Time Series Analysis, 15, 489-506.
JUDGE, G., GRIFFITHS, W. E., HILL, R. C., LÜTKEPOHL, H., & LEE, T.-C. [1985]. The Theory and Practice of Econometrics. 2nd edn. New York: Wiley.
JUDGE, G., HILL, R. C., GRIFFITHS, W. E., LÜTKEPOHL, H., & LEE, T.-C. [1988]. Introduction to Theory and Practice of Econometrics. 2nd edn. New York: Wiley.
KÄHLER, J., & MARNET, V. [1994a]. "International business cycles and long-run growth: An analysis with Markov-switching and cointegration models". In: ZIMMERMAN, K. F. [ed], Output and Employment Fluctuations. Heidelberg: Physica Verlag.
KALMAN, R. E. [1960]. "A new approach to linear filtering and prediction problems". Journal of Basic Engineering, Transactions of the ASME, 82, Series D, 35-45.
KAMINSKY, G. [1993]. "Is there a peso problem? Evidence from the dollar/pound exchange rate, 1976-1987". American Economic Review, 83, 450-472.
KING, R. G., & REBELO, S. T. [1993]. "Low frequency filtering and real business cycles". Journal of Economic Dynamics and Control, 17, 207-231.
KRISHNAMURTHY, V., & MOORE, J. [1993a]. "Hidden Markov model signal processing in presence of unknown deterministic interferences". IEEE Trans. Autom., 38, 146-152.
KYDLAND, F. E., & PRESCOTT, E. C. [1990]. "Business cycles: Real facts and a monetary myth". Federal Reserve Bank of Minneapolis Quarterly Review, 3-18.
LAM, P.-S. [1990]. "The Hamilton model with a general autoregressive component. Estimation and comparison with other models of economic time series". Journal of Monetary Economics, 26, 409-432.
LINDGREN, G. [1978]. "Markov regime models for mixed distributions and switching regressions". Scandinavian Journal of Statistics, 5, 81-91.
LIU, J., WONG, W. H., & KONG, A. [1994]. "Covariance structure of the Gibbs sampler with applications to the comparison of estimators and augmentation schemes". Biometrika, 81, 27-40.
LÜTKEPOHL, H., & POSKITT, D. S. [1992]. Testing for Causation Using Infinite Order Vector Autoregressive Processes. Humboldt-Universität zu Berlin, SFB 373 Discussion Paper 2.
LÜTKEPOHL, H., HAASE, K., CLAESSEN, H., SCHNEIDER, W., & MORYSON, M. [1993]. "MulTi, a menu driven Gauss program". Computational Statistics, 8, 161-163.
MINTZ, I. [1969]. Dating Postwar Business Cycles, Methods and Their Application to Western Germany, 1950-67. New York: Columbia University Press.
MIZON, G. E., & RICHARD, J. F. [1986]. "The encompassing principle and its application to testing non-nested hypotheses". Econometrica, 54, 657-678.
PERRON, P. [1989]. "The great crash, the oil price shock, and the unit root hypothesis". Econometrica, 57, 1361-1401.
PFANN, G., SCHOTMAN, P., & TSCHERNIG, R. [1995]. Nonlinear Interest Rate Dynamics and Implications for the Term Structure. Humboldt-Universität zu Berlin, SFB 373 Discussion Papers 43.
SAIKKONEN, P., & LÜTKEPOHL, H. [1994]. Infinite Order Cointegrated Vector Autoregressive Processes: Estimation and Inference. Humboldt-Universität zu Berlin, SFB 373 Discussion Paper 5/1994.
SHUMWAY, R., & STOFFER, D. [1991]. "Dynamic linear models with switching". Journal of the American Statistical Association, 86, 763-769.
SICHEL, D. E. [1994]. "Inventories and the three phases of the business cycle". Journal of Business & Economic Statistics, 12, 269-278.
SMITH, A. F. M., & ROBERTS, G. O. [1993]. "Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods". Journal of the Royal Statistical Society, 55B, 3-23.
TJØSTHEIM, D. [1990]. "Non-linear time series models and Markov chains". Advances in Applied Probability, 22, 587-611.
UHLIG, H. [1994]. "On Jeffreys prior when using the exact likelihood function". Econometric Theory, 10, 633-644.
WATSON, M. W., & ENGLE, R. F. [1983]. "Alternative algorithms for the estimation of dynamic factor, MIMIC and varying coefficient regression models". Journal of Econometrics, 23, 385-400.
13.1 Growth in the World Economy. Log of Real GNP 1960-1991
13.2 MS(2)-VECM(1) Model
13.3 MSH(2)-VECM(1) Model
List of Notation
Most of the notation is clearly defined in the text where it is used. The following list is designed to provide some general guidelines. Occasionally, a symbol has been assigned to a different object, but this is explained in the text. For the notation of MS-VAR models confer Table 1.1.
Matrix Operations
⊗ Kronecker product
⊙ element-by-element multiplication
⊘ element-by-element division
X^{-1} inverse of X
X* adjoint matrix of X
X^j j-th power of X
X^{1/2} square root of X, lower triangular Choleski decomposition
|X| = det X determinant of X
||X|| norm of X
diag x diagonal matrix containing x on the diagonal
rk X rank of X
tr X trace of X
vec X column stacking operator
vech X operator stacking the elements on and below the main diagonal of a symmetric matrix
∂φ/∂λ' matrix of first order partial derivatives of φ with respect to λ
∂²φ/(∂λ∂λ') Hessian matrix of φ
Special Matrices
1_n the (n × 1) vector of ones
0_(m×n) the (m × n) matrix of zeros
D duplication matrix
I_n the (n × n) identity matrix
J the matrix (I_K, 0, ..., 0)
ι_m the m-th vector of an appropriate identity matrix
K_(MN) the (M × N) commutation matrix
Λ diagonal matrix containing eigenvalues on the diagonal
Constants
h forecast horizon
K dimension of the observed time series
M number of regimes
N dimension of the stacked regime vector
p order of the vector autoregression
p* AR order of the VARMA representation
q order of distributed regime lag
q* MA order of the VARMA representation
r number of cointegration vectors
R number of coefficients
T number of observations