STATISTICAL LEARNING FOR BIG
DEPENDENT DATA
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by Walter A. Shewhart and Samuel S. Wilks

Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice,


Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott,
Adrian F. M. Smith, Ruey S. Tsay

Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane,


Jozef L. Teugels

The Wiley Series in Probability and Statistics is well established and authoritative.
It covers many topics of current research interest in both pure and applied statistics
and probability theory. Written by leading statisticians and institutions, the titles
span both state-of-the-art developments in the field and classical methods.

Reflecting the wide range of current research in statistics, the series encompasses
applied, methodological and theoretical statistics, ranging from applications and
new techniques made possible by advances in computerized practice to rigorous
treatment of theoretical approaches.
This series provides essential and invaluable reading for all statisticians, whether in
academia, industry, government, or research.

A complete list of titles in this series can be found at


http://www.wiley.com/go/wsps
STATISTICAL LEARNING
FOR BIG DEPENDENT
DATA

DANIEL PEÑA
Universidad Carlos III de Madrid
Madrid, Spain

RUEY S. TSAY
University of Chicago
Chicago, United States
This first edition first published 2021
© 2021 John Wiley and Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise,
except as permitted by law. Advice on how to obtain permission to reuse material from this title is
available at http://www.wiley.com/go/permissions.

The right of Daniel Peña and Ruey S. Tsay to be identified as the authors of this work has been asserted
in accordance with law.

Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office
111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products
visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content
that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty


While the publisher and authors have used their best efforts in preparing this work, they make no
representations or warranties with respect to the accuracy or completeness of the contents of this work
and specifically disclaim all warranties, including without limitation any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives, written sales materials or promotional statements for this work. The fact that an
organization, website, or product is referred to in this work as a citation and/or potential source of
further information does not mean that the publisher and authors endorse the information or services
the organization, website, or product may provide or recommendations it may make. This work is sold
with the understanding that the publisher is not engaged in rendering professional services. The advice
and strategies contained herein may not be suitable for your situation. You should consult with a
specialist where appropriate. Further, readers should be aware that websites listed in this work may have
changed or disappeared between when this work was written and when it is read. Neither the publisher
nor authors shall be liable for any loss of profit or any other commercial damages, including but not
limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging-in-Publication Data

Names: Peña, Daniel, 1948- author. | Tsay, Ruey S., 1951- author.
Title: Statistical learning for big dependent data / Daniel Peña, Ruey S.
Tsay.
Description: First edition. | Hoboken, NJ : Wiley, 2021. | Series: Wiley
series in probability and statistics | Includes bibliographical
references and index.
Identifiers: LCCN 2020026630 (print) | LCCN 2020026631 (ebook) | ISBN
9781119417385 (cloth) | ISBN 9781119417392 (adobe pdf) | ISBN
9781119417415 (epub) | ISBN 9781119417408 (obook)
Subjects: LCSH: Big data–Mathematics. | Time-series analysis. | Data
mining–Statistical methods. | Forecasting–Statistical methods.
Classification: LCC QA76.9.B45 P45 2021 (print) | LCC QA76.9.B45 (ebook)
| DDC 005.7–dc23
LC record available at https://lccn.loc.gov/2020026630
LC ebook record available at https://lccn.loc.gov/2020026631

Cover Design: Wiley


Cover Images: Colorful graph courtesy of Daniel Peña and Ruey S. Tsay;
Abstract background © duncan1890/Getty Images

Set in 10/12pt TimesTenLTStd by SPi Global, Chennai, India

10 9 8 7 6 5 4 3 2 1
To our good friend & mentor George C. Tiao (DP & RST)

To my wife, Jette Bohsen, with love (DP)

To Teresa, Julie, Richard, and Vicki (RST)


CONTENTS

PREFACE xvii

1. INTRODUCTION TO BIG DEPENDENT DATA 1


1.1 Examples of Dependent Data 2
1.2 Stochastic Processes 9
1.2.1 Scalar Processes 9
1.2.1.1 Stationarity 10
1.2.1.2 White Noise Process 12
1.2.1.3 Conditional Distribution 12
1.2.2 Vector Processes 12
1.2.2.1 Vector White Noises 15
1.2.2.2 Invertibility 15
1.3 Sample Moments of Stationary Vector Process 15
1.3.1 Sample Mean 16
1.3.2 Sample Covariance and Correlation Matrices 17
1.4 Nonstationary Processes 21
1.5 Principal Component Analysis 23
1.5.1 Discussion 26
1.5.2 Properties of the PCs 27
1.6 Effects of Serial Dependence 31


Appendix 1.A: Some Matrix Theory 34


Exercises 35
References 36

2. LINEAR UNIVARIATE TIME SERIES 37


2.1 Visualizing a Large Set of Time Series 39
2.1.1 Dynamic Plots 39
2.1.2 Static Plots 44
2.2 Stationary ARMA Models 49
2.2.1 The Autoregressive Process 50
2.2.1.1 Autocorrelation Functions 51
2.2.2 The Moving Average Process 52
2.2.3 The ARMA Process 54
2.2.4 Linear Combinations of ARMA Processes 55
2.3 Spectral Analysis of Stationary Processes 58
2.3.1 Fitting Harmonic Functions to a Time Series 58
2.3.2 The Periodogram 59
2.3.3 The Spectral Density Function and Its Estimation 61
2.4 Integrated Processes 64
2.4.1 The Random Walk Process 64
2.4.2 ARIMA Models 65
2.4.3 Seasonal ARIMA Models 67
2.4.3.1 The Airline Model 69
2.5 Structural and State Space Models 71
2.5.1 Structural Time Series Models 71
2.5.2 State-Space Models 72
2.5.3 The Kalman Filter 76
2.6 Forecasting with Linear Models 78
2.6.1 Computing Optimal Predictors 78
2.6.2 Variances of the Predictions 80
2.6.3 Measuring Predictability 81
2.7 Modeling a Set of Time Series 82
2.7.1 Data Transformation 83
2.7.2 Testing for White Noise 85
2.7.3 Determination of the Difference Order 85
2.7.4 Model Identification 87
2.8 Estimation and Information Criteria 87
2.8.1 Conditional Likelihood 87
2.8.2 On-line Estimation 88
2.8.3 Maximum Likelihood (ML) Estimation 90

2.8.4 Model Selection 91


2.8.4.1 The Akaike Information Criterion (AIC) 91
2.8.4.2 The Bayesian Information Criterion (BIC) 92
2.8.4.3 Other Criteria 92
2.8.4.4 Cross-Validation 93
2.9 Diagnostic Checking 95
2.9.1 Residual Plot 96
2.9.2 Portmanteau Test for Residual Serial Correlations 96
2.9.3 Homoscedastic Tests 97
2.9.4 Normality Tests 98
2.9.5 Checking for Deterministic Components 98
2.10 Forecasting 100
2.10.1 Out-of-Sample Forecasts 100
2.10.2 Forecasting with Model Averaging 100
2.10.3 Forecasting with Shrinkage Estimators 102
Appendix 2.A: Difference Equations 103
Exercises 108
References 108

3. ANALYSIS OF MULTIVARIATE TIME SERIES 111


3.1 Transfer Function Models 112
3.1.1 Single Input and Single Output 112
3.1.2 Multiple Inputs and Multiple Outputs 118
3.2 Vector AR Models 118
3.2.1 Impulse Response Function 120
3.2.2 Some Special Cases 121
3.2.3 Estimation 122
3.2.4 Model Building 123
3.2.5 Prediction 125
3.2.6 Forecast Error Variance Decomposition 127
3.3 Vector Moving-Average Models 135
3.3.1 Properties of VMA Models 136
3.3.2 VMA Modeling 136
3.4 Stationary VARMA Models 140
3.4.1 Are VAR Models Sufficient? 140
3.4.2 Properties of VARMA Models 141
3.4.3 Modeling VARMA Process 141
3.4.4 Use of VARMA Models 142
3.5 Unit Roots and Co-Integration 147
3.5.1 Spurious Regression 148

3.5.2 Linear Combinations of a Vector Process 148


3.5.3 Co-integration 149
3.5.4 Over-Differencing 150
3.6 Error-Correction Models 151
3.6.1 Co-integration Test 152
Exercises 157
References 157

4. HANDLING HETEROGENEITY IN MANY TIME SERIES 161


4.1 Intervention Analysis 162
4.1.1 Intervention with Indicator Variables 163
4.1.2 Intervention with Step Functions 165
4.1.3 Intervention with General Exogenous Variables 166
4.1.4 Building an Intervention Model 166
4.2 Estimation of Missing Values 170
4.2.1 Univariate Interpolation 170
4.2.2 Multivariate Interpolation 172
4.3 Outliers in Vector Time Series 174
4.3.1 Multivariate Additive Outliers 175
4.3.1.1 Effects on Residuals and Estimation 176
4.3.2 Multivariate Level Shift or Structural Break 177
4.3.2.1 Effects on Residuals and Estimation 177
4.3.3 Other Types of Outliers 178
4.3.3.1 Multivariate Innovative Outliers 178
4.3.3.2 Transitory Change 179
4.3.3.3 Ramp Shift 179
4.3.4 Masking and Swamping 180
4.4 Univariate Outlier Detection 180
4.4.1 Other Procedures for Univariate Outlier Detection 183
4.4.2 New Approaches to Outlier Detection 184
4.5 Multivariate Outliers Detection 189
4.5.1 VARMA Outlier Detection 189
4.5.2 Outlier Detection by Projections 190
4.5.3 A Projection Algorithm for Outliers Detection 192
4.5.4 The Nonstationary Case 193
4.6 Robust Estimation 196
4.7 Heterogeneity for Parameter Changes 199
4.7.1 Parameter Changes in Univariate Time Series 199
4.7.2 Covariance Changes in Multivariate Time Series 200
4.7.2.1 Detecting Multiple Covariance Changes 202
4.7.2.2 LR Test 202

Appendix 4.A: Cusum Algorithms 204


4.A.1 Detecting Univariate LS 204
4.A.2 Detecting Multivariate Level Shift 204
4.A.3 Detecting Multiple Covariance Changes 206
Exercises 206
References 207

5. CLUSTERING AND CLASSIFICATION OF TIME SERIES 211


5.1 Distances and Dissimilarities 212
5.1.1 Distance Between Univariate Time Series 212
5.1.2 Dissimilarities Between Univariate Series 215
5.1.3 Dissimilarities Based on Cross-Linear Dependency 222
5.2 Hierarchical Clustering of Time Series 228
5.2.1 Criteria for Defining Distances Between Groups 228
5.2.2 The Dendrogram 229
5.2.3 Selecting the Number of Groups 229
5.2.3.1 The Height and Step Plots 229
5.2.3.2 Silhouette Statistic 230
5.2.3.3 The Gap Statistic 233
5.3 Clustering by Variables 243
5.3.1 The k-means Algorithm 244
5.3.1.1 Number of Groups 246
5.3.2 k-Medoids 250
5.3.3 Model-Based Clustering by Variables 252
5.3.3.1 Maximum Likelihood (ML) Estimation of the AR
Mixture Model 253
5.3.3.2 The EM Algorithm 254
5.3.3.3 Estimation of Mixture of Multivariate Normals 256
5.3.3.4 Bayesian Estimation 257
5.3.3.5 Clustering with Structural Breaks 258
5.3.4 Clustering by Projections 259
5.4 Classification with Time Series 264
5.4.1 Classification Among a Set of Models 264
5.4.2 Checking the Classification Rule 267
5.5 Classification with Features 267
5.5.1 Linear Discriminant Function 268
5.5.2 Quadratic Classification and Admissible Functions 269
5.5.3 Logistic Regression 270
5.6 Nonparametric Classification 277
5.6.1 Nearest Neighbors 277
5.6.2 Support Vector Machines 278

5.6.2.1 Linearly Separable Problems 279


5.6.2.2 Nonlinearly Separable Problems 282
5.6.3 Density Estimation 284
5.7 Other Classification Problems and Methods 286
Exercises 287
References 288

6. DYNAMIC FACTOR MODELS 291


6.1 The DFM for Stationary Series 293
6.1.1 Properties of the Covariance Matrices 295
6.1.1.1 The Exact DFM 295
6.1.1.2 The Approximate DFM 297
6.1.2 Dynamic Factor and VARMA Models 299
6.2 Fitting a Stationary DFM to Data 301
6.2.1 Principal Components (PC) Estimation 301
6.2.2 Pooled PC Estimator 303
6.2.3 Generalized PC Estimator 303
6.2.4 ML Estimation 304
6.2.5 Selecting the Number of Factors 305
6.2.5.1 Rank Testing via Canonical Correlation 306
6.2.5.2 Testing a Jump in Eigenvalues 307
6.2.5.3 Using Information Criteria 307
6.2.6 Forecasting with DFM 308
6.2.7 Alternative Formulations of the DFM 314
6.3 Generalized DFM (GDFM) for Stationary Series 315
6.3.1 Some Properties of the GDFM 316
6.3.2 GDFM and VARMA Models 317
6.4 Dynamic Principal Components 317
6.4.1 Dynamic Principal Components for Optimal Reconstruction 317
6.4.2 One-Sided DPCs 318
6.4.3 Model Selection and Forecasting 320
6.4.4 One Sided DPC and GDFM Estimation 321
6.5 DFM for Nonstationary Series 324
6.5.1 Cointegration and DFM 329
6.6 GDFM for Nonstationary Series 330
6.6.1 Estimation by Generalized DPC 330
6.7 Outliers in DFMs 333
6.7.1 Factor and Idiosyncratic Outliers 333
6.7.2 A Procedure to Find Outliers in DFM 335
6.8 DFM with Cluster Structure 336
6.8.1 Fitting DFMCS 337

6.9 Some Extensions of DFM 344


6.10 High-Dimensional Case 345
6.10.1 Sparse PCs 345
6.10.2 A Structural-FM Approach 347
6.10.3 Estimation 348
6.10.4 Selecting the Number of Common Factors 349
6.10.5 Asymptotic Properties of Loading Estimates 351
Appendix 6.A: Some R Commands 352
Exercises 353
References 354

7. FORECASTING WITH BIG DEPENDENT DATA 359


7.1 Regularized Linear Models 360
7.1.1 Properties of Lasso Estimator 362
7.1.2 Some Extensions of Lasso Regression 366
7.1.2.1 Adaptive Lasso 367
7.1.2.2 Group Lasso 367
7.1.2.3 Elastic Net 368
7.1.2.4 Fused Lasso 368
7.1.2.5 SCAD Penalty 368
7.2 Impacts of Dynamic Dependence on Lasso 377
7.3 Lasso for Dependent Data 383
7.4 Principal Component Regression and Diffusion Index 388
7.5 Partial Least Squares 392
7.6 Boosting 397
7.6.1 ℓ2 Boosting 399
7.6.2 Choices of Weak Learner 399
7.6.3 Boosting for Classification 403
7.7 Mixed-Frequency Data and Nowcasting 404
7.7.1 Midas Regression 405
7.7.2 Nowcasting 406
7.8 Strong Serial Dependence 413
Exercises 414
References 414

8. MACHINE LEARNING OF BIG DEPENDENT DATA 419


8.1 Regression Trees and Random Forests 420
8.1.1 Growing Tree 420
8.1.2 Pruning 422
8.1.3 Classification Trees 422
8.1.4 Random Forests 424

8.2 Neural Networks 427


8.2.1 Network Training 429
8.3 Deep Learning 436
8.3.1 Types of Deep Networks 436
8.3.2 Recurrent NN 437
8.3.3 Activation Functions for Deep Learning 439
8.3.4 Training Deep Networks 440
8.3.4.1 Long Short-Term Memory Model 440
8.3.4.2 Training Algorithm 441
8.4 Some Applications 442
8.4.1 The Package: keras 442
8.4.2 Dropout Layer 449
8.4.3 Application of Convolution Networks 450
8.4.4 Application of LSTM 457
8.5 Deep Generative Models 466
8.6 Reinforcement Learning 466
Exercises 467
References 468

9. SPATIO-TEMPORAL DEPENDENT DATA 471


9.1 Examples and Visualization of Spatio Temporal Data 472
9.2 Spatial Processes and Data Analysis 477
9.3 Geostatistical Processes 479
9.3.1 Stationary Variogram 480
9.3.2 Examples of Semivariogram 480
9.3.3 Stationary Covariance Function 482
9.3.4 Estimation of Variogram 483
9.3.5 Testing Spatial Dependence 483
9.3.6 Kriging 484
9.3.6.1 Simple Kriging 484
9.3.6.2 Ordinary Kriging 486
9.3.6.3 Universal Kriging 487
9.4 Lattice Processes 488
9.4.1 Markov-Type Models 488
9.5 Spatial Point Processes 491
9.5.1 Second-Order Intensity 492
9.6 S-T Processes and Analysis 495
9.6.1 Basic Properties 496
9.6.2 Some Nonseparable Covariance Functions 498
9.6.3 S-T Variogram 499
9.6.4 S-T Kriging 500

9.7 Descriptive S-T Models 504


9.7.1 Random Effects with S-T Basis Functions 505
9.7.2 Random Effects with Spatial Basis Functions 506
9.7.3 Fixed Rank Kriging 507
9.7.4 Spatial Principal Component Analysis 510
9.7.5 Random Effects with Temporal Basis Functions 514
9.8 Dynamic S-T Models 519
9.8.1 Space-Time Autoregressive Moving-Average Models 520
9.8.2 S-T Component Models 521
9.8.3 S-T Factor Models 521
9.8.4 S-T HMs 522
Appendix 9.A: Some R Packages and Commands 523
Exercises 525
References 525

INDEX 529
PREFACE

For the first time in human history, we are collecting data everywhere and every
second. These data grow exponentially, are produced and stored at minimum costs,
and are changing the way we learn things and control our activities, including mon-
itoring our health and using our leisure time. Statistics, as a scientific discipline, was
created in a different environment, where data were scarce, and focused mainly on
obtaining maximum efficiency of available information from small, structured data
sets. New methods are, therefore, needed to extract useful information from big data
sets, which are heterogeneous and unstructured. These methods are being developed
in statistics, computer science, machine learning, operations research, artificial intel-
ligence, and other fields. They constitute what is usually called data science. Most
advances in big data analysis so far have assumed that the data are collected from
independent subjects, so that they can be treated as independent observations. On
the other hand, empirical data are often generated over time or in space, and, hence,
they have dynamic and/or spatial dependence.
The main goal of this book is to learn from big dependent data. Data with tem-
poral dynamics have been studied in statistics as time series analysis, whereas spatial
dependence belongs to the newer area of spatial statistics. Several key ideas that
formed the core of recent developments in big data methods are from time series
analysis. For instance, the first model selection criterion was proposed by Akaike for
selecting the order of an autoregressive process, an iterative model building proce-
dure with nonlinear estimation was proposed by Box and Jenkins for autoregressive
integrated moving-average (ARIMA) models, and methods for combining forecasts,
which opened the way to model averaging and ensemble methods in machine learning,
can be traced back to Bayesian forecasting. Today, analyzing big data with temporal
and spatial dependence is needed in many scientific fields, ranging from economics
and business to environmental and health science, and to engineering and computer

vision. It is our belief that modeling and processing big dependent data is a key
emerging area of data science.
Several excellent books have been written for statistical learning with inde-
pendent data, but much less work has been done for dependent data. This book
tries to stimulate the use of available methods in forecasting and classification, to
promote further research in statistical methods for extracting useful information
from big dependent data, and to point out the potential weaknesses when the data
dependence is overlooked. We start with brief reviews of time series analysis, both
univariate and multivariate, followed by methods for handling heterogeneity when
analyzing many time series. We then discuss methods for classifying and clustering
many time series and introduce dynamic factor models for modeling multivariate
or high-dimensional time series. This is followed by a thorough discussion on
forecasting in a data-rich environment, including nowcasting. We then turn to deep
learning and analysis of spatio-temporal data.
The book is application-oriented, but it also provides some basic theory to
help readers better understand the methods and procedures used. Due to the
high-dimensional nature of the problems discussed, we include some technical
derivations in a few sections, but for most parts, we refer readers interested in
rigorous mathematical proofs to the available literature. Real examples are used
throughout the book to demonstrate the analysis and applications. For empirical
data analysis, we use R extensively and provide the necessary instructions and R
scripts so that readers can reproduce the results shown in the book.
The book is organized as follows. Chapter 1 provides some examples of the data
sets considered in the book and introduces the basic ideas of temporal dependency,
stochastic process, and time series. It also discusses principal component analysis
and illustrates the weaknesses of traditional statistical inference when the data
dependence is overlooked. Chapter 2 starts with new ways to visualize large sets
of time series, summarizes the main properties of ARIMA and state space models,
including spectral analysis and Kalman filter, and presents a methodology for build-
ing univariate models for large sets of time series. Chapter 3 reviews the available
methods for building multivariate vector ARMA models and their limitations with
high-dimensional time series. It also discusses cointegration and ways to handle
unit-root multivariate nonstationary series. Real time series data are typically
heterogeneous, with clustering and certain common features, contain missing values
and outliers, and may encounter structural breaks. We address these issues in the
next two chapters. Chapter 4 presents missing value estimation and some new
methods for detecting and modeling outliers in univariate and multivariate time
series. The outliers considered include additive outliers, innovative outliers, level
shifts, transitory shifts, and parameter changes. Chapter 5 analyzes clustering, or
unsupervised classification methods, for time series, including recent procedures
for clustering time series based on their dependency or for selecting the number
of clusters. The chapter also considers time series discrimination, or supervised
classification, and covers approaches developed not only in statistics, but also in
machine learning such as support vector machines. Chapter 6 focuses on an active
research topic for modeling large sets of time series, namely dynamic factor models
and other types of factor models. The chapter describes in detail the developments
of the topic and discusses the pros and cons of several models available in the
literature, including those for the high-dimensional setting. Chapter 7 concentrates
on the key problem of forecasting large sets of time series. The application of Lasso
regression to time series is investigated, as well as procedures developed mostly in
the statistics and econometrics literature for forecasting high-dimensional time series.
It also studies the method of nowcasting and demonstrates its usefulness with real
examples. Chapter 8 considers machine learning for forecasting and classification,
including neural networks and deep learning. It also studies methods developed
in the interface between Statistics and Machine Learning, commonly known as
Data Science, including Classification and Regression Trees (CART) and Random
Forests. Finally, Chapter 9 studies spatio-temporal data and their applications,
including various kriging methods for spatial predictions.
Dependent data cover a wide range of topics and methods, especially in the pres-
ence of high-dimensional data. It is too much to expect that a book can cover all
these important topics. This book is no exception. We need to make decisions on the
material to include. Like other books, our choices depend heavily on our experience,
research areas, and preference. There are some important subjects, especially those
under rapid development, that we do not cover, for instance, functional data analysis
for dependent data. We hope to include this and other relevant topics in a future
edition.
The book takes advantage of many available R packages. Yet, there remain some
methods discussed in the book that cannot be implemented with any existing pack-
age. We have developed some R scripts to perform such analyses. For instance, we
have developed a new automatic modeling procedure for a large set of time series,
which is used and demonstrated in Chapter 2. We have included, with the great help
of Angela Caro and Antonio Elías, these R scripts into a new package for the book
and refer to it as the SLBDD package. In addition, the data sets used, except for
those subject to copyright protection, are also included. Some R scripts are provided
on the web page of the book at https://www.rueytsay.com/slbdd.
We are grateful to many people who have contributed to our research in general
and to this book in particular. We have dedicated the book to Professor George C.
Tiao, a giant in time series analysis and a pioneer of many procedures presented here.
He is a generous mentor and a good friend to us. We have also learned a lot from
our coauthors on topics covered in the book. In particular, we would like to acknowledge
the contributions from Andrés Alonso, Stevenson Bolívar, Jorge Caiado, Angeles
Carnero, Rong Chen, Nuno Crato, Chaoxing Dai, Soudeep Deb, Pedro Galeano,
Zhaoxing Gao, Victor Guerrero, Yuefeng Han, Nan-Jung Hsu, Hsin-Cheng Huang,
Ching-Kang Ing, Ana Justel, Mladen Kolar, Tengyuan Liang, Agustin Maravall,
Fabio Nieto, Pilar Poncela, Javier Prieto, Veronika Rockova, Julio Rodríguez,
Rosario Romera, Juan Romo, Esther Ruiz, Ismael Sánchez, Maria Jesús Sánchez,
Ezequiel Smucler, Victor Yohai, and Rubén Zamar. Some of the programs used
in our analyses were written by Angela Caro, Pedro Galeano, Carolina Gamboa,
Yuefeng Han, and Javier Prieto. We sincerely thank them for their wonderful work.
Chaoxing Dai helped us in many ways with the keras package for deep learning.
Without his help, we would not have been able to demonstrate its applications.
Finally, we have always had the constant support of our families. DP wants to
thank his wife, Jette Bohsen, for her constant love and encouragement in all his
projects, and, in particular, for her generous continued help during the years he was
writing this book. RST would like to thank his wife for her love and care, and his
children for their support. Our families are the source of our energy and inspiration. With-
out their unlimited and unselfish support, it would not have been possible for us
to complete this book, especially in this challenging time of the COVID-19 pandemic,
during which the book was finished. We hope that the methods presented in this book
can help, by learning from big complex dependent data, to prevent and manage the
challenging problems we will face in the future.

April 2020 D.P. MADRID, SP


R.S.T. CHICAGO, USA
CHAPTER 1

INTRODUCTION TO BIG DEPENDENT DATA

Big data are common nowadays everywhere. Statistical methods capable of
extracting useful information embedded in those data, including machine learning
and artificial intelligence, have attracted much interest among researchers and
practitioners in recent years. Most of the available statistical methods for analyzing
big data were developed under the assumption that the observations are from
independent samples. See, for instance, most methods discussed in Bühlmann and
van de Geer (2011). Observations of big data, however, are dependent in many
applications. The dependence may occur in the order by which the data were taken
(such as time series data) or in space by which the sampling units reside (such as
spatial data). Monthly civilian unemployment rates (16 years and older) of the 50
states in the United States are an example. Unemployment rates tend to be sticky
over time and geographically neighboring states may share similar industries and,
hence, have similar unemployment patterns. For dependent data, the spatial and/or
temporal dependence is often the focus of statistical analysis. Consequently, there is
a need to study analysis of big dependent data.
The main focus of this book is to provide readers a comprehensive treatment of
statistical methods that can be used to analyze big dependent data. We start with
some examples and simple methods for their descriptive analysis. More sophisticated
methods will be introduced in other chapters. Keep in mind, however, that the anal-
ysis of big dependent data is gaining more attention as time advances. The methods
discussed in this book reflect, to a large degree, our personal experience and prefer-
ences. We hope that the book can be helpful to many researchers and practitioners

Statistical Learning for Big Dependent Data, First Edition. Daniel Peña and Ruey S. Tsay.
© 2021 John Wiley & Sons, Inc. Published 2021 by John Wiley & Sons, Inc.

1
2 INTRODUCTION TO BIG DEPENDENT DATA

in different scientific fields in their data analysis. Also, it could attract more attention
to and motivate further research in the challenging problem of analyzing big depen-
dent data.

1.1 EXAMPLES OF DEPENDENT DATA

In this section, we present some examples of big dependent data considered in the
book, introduce some statistical methods for describing such data, and provide a
framework for making statistical inference. Details of the methods introduced will
be given in later chapters. Our goal is to demonstrate the data we intend to analyze
and to highlight the consequences of ignoring their dependency. Whenever possi-
ble, we also illustrate the difficulties statistical methods developed for independent
observations are facing when they are applied to big dependent data.
By dependent data, we mean observations of the variables of interest were taken
over time or across space or both. Consequently, in these data the order in time or
space matters. By big data, we mean the number of variables, k, is large or the number
of data points, T, is large or both, and it could be that k > T. Figure 1.1 shows a data
set of three time series representing the relative average temperature of November
in Europe, North America, and South America from 1910 to 2014. The data are in
the file Temperatures.csv of the book web site. The measurement is relative
to the average temperature of November in the twentieth century. These data are
dependent because observations of consecutive years tend to be closer to each other
than observations that are many years apart. Furthermore, the relative temperature
in South America shows a growing trend since 1980, whereas the trend can only be
seen in Europe starting from 2000, yet it does not appear in the North America data.
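For readers who wish to reproduce a display like Figure 1.1, a minimal R sketch is given below. It is only an illustration, not part of the book's R scripts, and the column layout of Temperatures.csv is an assumption; adjust the column names to the actual file.

```r
# Read the three November temperature series (assumed: one numeric column per
# region, one row per year from 1910 to 2014)
temp <- read.csv("Temperatures.csv")
temp_ts <- ts(temp, start = 1910, frequency = 1)

# One panel per series, as in Figure 1.1
plot(temp_ts, main = "Relative November temperatures, 1910-2014")
```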
Figure 1.1 Relative average temperatures of November in Europe, North America, and
South America from 1910 to 2014. The measurement is relative to the average temperature of
November in the twentieth century.

Statistical analysis of these monthly temperature series is relatively easy, as the
number of series is only 3 and the sample size is small. It belongs to the multivariate
time series analysis in statistics. See, for instance, Tsay (2014). The main concern of
this book is statistical analysis when the number of time series is large or the time
intervals between observations are small, resulting in high-frequency data with large
sample sizes. Modeling big temperature data is important in many applications. As a
matter of fact, daily (even hourly) temperature is routinely taken at many locations
around the world. These temperatures affect the demands of electricity and heating
oil, are highly related to air pollution measurements, such as the particulate mat-
ter PM2.5 and ozone concentration, and play an important role in commodity prices
around the world.
Figure 1.2 shows the standardized daily stock indexes of the 99 most important
financial markets around the world, from 3 January 2000 to 16 December 2015
for 4163 observations. The data are in the file Stock_indexes_99world.csv.
The magnitudes of the stock indexes vary markedly from one market to another, so
we standardize the indexes in the plot. Specifically, each standardized index series
has mean zero and variance one. In applications, we often analyze the returns of
stock indexes. Figures 1.3 and 1.4 show the time plots of the first six indexes and their
log returns, respectively. The log returns are obtained by taking the first difference
of the logarithm of the stock index.
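The two transformations just described are straightforward in R. The sketch below is ours and assumes that Stock_indexes_99world.csv contains only the numeric index columns (any date column should be dropped first).

```r
# Read the 99 daily stock index series (assumed: one numeric column per market)
stocks <- read.csv("Stock_indexes_99world.csv")

# Standardize each index to mean zero and variance one, as in Figure 1.2
stocks_std <- scale(stocks)

# Log returns: first difference of the logarithm of each index, as in Figure 1.4
log_ret <- apply(log(stocks), 2, diff)
```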
These time plots demonstrate several features of multiple time series. First, they
show the overall evolution and cross dependence of the financial markets. All mar-
kets exhibited a trough in 2003, had a steady increase reaching a peak around early
2008, then showed a dramatic drop caused by the 2008 financial crisis. Yet some mar-
kets experienced a gradual recovery after 2009. Second, the variabilities of the world
financial markets appear to be higher when the markets were down; see Figure 1.4.
This is not surprising as the fear factor, i.e. market volatility, is likely to be dominant
during a bear market. On the other hand, the ranges of world market indexes at a
given time seem to be smaller when the markets were down; see Figure 1.2.
Figure 1.2 Time plots of standardized daily stock market indexes of the 99 most important
financial markets around the world from 3 January 2000 to 16 December 2015.

Figure 1.3 Time plots of the first six standardized daily stock market indexes: 3 January 2000
to 16 December 2015.

Figure 1.4 Time plots of the log returns of the first six daily stock market indexes: 3 January
2000 to 16 December 2015.

Figure 1.5 Time plots of daily stock market indexes of 22 Asian financial markets from 3
January 2000 to 16 December 2015.

Figure 1.6 Time plots of daily stock market indexes of 47 European financial markets from
3 January 2000 to 16 December 2015.

Figures 1.5 and 1.6 show the time plots of the daily financial indexes for 22 Asian
and 47 European markets, respectively, for the same time period, but with x-axis now
being in time index. The difference between Asian and European financial markets
is easily seen, implying that care must be exercised when analyzing similar time series
across different continents.
As a third example, Figure 1.7 shows 33 monthly consumer price indexes of
European countries from January 2000 to October 2015. The data are in the file
CPIEurope2000-15.csv. In the plot, the series have been standardized to have
zero mean and unit variance. As expected, the plot shows an upward trend of the
price indexes, but it also demonstrates sufficient differences between the series. For
example, some series show a clear seasonal pattern, but others do not.
Figure 1.7 Time plots of 33 monthly price indexes of European countries from January 2000
to October 2015.

Figure 1.8 Time plots of daily sales in natural logarithms of a clothing brand in 25 provinces
in China from 1 January 2008 to 9 December 2012.

Large sets of data are available in marketing research. As an example, Figure 1.8
shows the time plots of daily sales, in natural logarithms, of a clothing brand in 25
provinces in China from 1 January 2008 to 9 December 2012 for 1805 observations.
The data are from Chang et al. (2014) and are given in the file clothing.csv.
The plots exhibit a certain annual pattern, referred to as seasonal behavior in time
series analysis. To gain further insight, Figure 1.9 shows the same time plots for the
first eight provinces. They are Beijing, Fujian, Guangdong, Guangxi, Hainan, Hebei,
Henan, and Hubei. The levels of the plots are adjusted so that there is no overlapping
in the figure. A special characteristic of the plots is that local peaks occur irregularly
in the early part of each year, followed by certain drops in sales. This is caused by the
Chinese New Year holidays that vary from year to year. In addition, the peaks do not
occur for all provinces; see the second plot (from the top) of Figure 1.9. Analyzing
these series jointly requires modeling the common features as well as the variations
from one province to another.

Figure 1.9 Time plots of daily sales in natural logarithms of a clothing brand in eight
provinces in China from 1 January 2008 to 9 December 2012.

Figure 1.10 Time plots of hourly observations of the particulate matter PM2.5 from two
monitoring stations in southern Taiwan from 1 January 2006 to 31 December 2015.

Figure 1.10 shows the time plots of hourly observations of the particulate matter
PM2.5 from two monitoring stations in southern Taiwan from 1 January 2006 to 31
December 2015. We drop the data of February 29 for simplicity, resulting in a sample
size of 87 600. The data are given in the file TaiwanPM25.csv. The two series are part
of many monitoring stations throughout the island. Also available at each station
are measurements of temperature, dew points, and other air pollution indexes. Thus,
this is a simple example of big dependent data. The time plots exhibit certain strong
annual patterns and seem to indicate a minor decreasing trend. Figure 1.11 shows
the first 30 000 sample autocorrelations of the PM2.5 series of Station 1. The annual
cycle with periodicity s = 24 × 365 = 8760 is clearly seen from the autocorrelations.
Figure 1.11 Sample autocorrelations of PM2.5 measurement of Station 1 in southern Taiwan.
The first 30 000 autocorrelations are shown. The annual cycle with periodicity 8760 is clearly
seen.

Figure 1.12 Sample autocorrelations of PM2.5 measurement of Station 1 in southern Taiwan.
The first 120 autocorrelations are shown. The daily cyclical pattern with periodicity 24 is clearly
seen.

Figure 1.12, on the other hand, shows the first 120 sample autocorrelations of the
same time series. The plot shows that there also exists a daily cyclic pattern in the
PM2.5 series with a periodicity 𝜆 = 24. Consequently, analysis of such big dependent
data should consider not only the overall trend, but also various cyclic patterns with
different frequencies.
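A sketch of how one might compute these sample autocorrelations in R is given below; it is our illustration, and the column layout of TaiwanPM25.csv is an assumption.

```r
# Hourly PM2.5 readings; here we take the first column as the Station 1 series
# (adjust the column index to match the actual file)
pm <- read.csv("TaiwanPM25.csv")
x <- pm[[1]]

# Daily cycle: the first 120 sample autocorrelations (cf. Figure 1.12)
acf(x, lag.max = 120)

# Annual cycle with periodicity 24 * 365 = 8760 (cf. Figure 1.11)
acf(x, lag.max = 30000)
```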
Finally, Figure 1.13 plots the monitoring stations (in circle) and the ozone levels
(in color-coded squares) of US Midwestern states on 20 June 1987. The data set is
available from the R package fields. The states are added to the plot so that readers
can understand the geographical features of ozone levels in the Midwest. Clearly,
on this particular day, high ozone readings appeared in the southwest of Illinois and
south of Indiana. Also, ozone levels of neighboring monitoring stations are similar.
This is an example of the spatial processes discussed in Chapter 9.
Figure 1.13 Locations and ozone levels at various monitoring stations in US Midwestern
states. The measurements are 8-hour averages (9 a.m. to 4 p.m.) on 20 June 1987.

The six data sets presented are examples of multivariate spatio-temporal series,
where we observe the values of several variables of interest over time and space.

The time sequence introduces dynamic dependency in a given series, which may be
time varying. The data also show cross dependence between series and the strength
of cross dependence may, in turn, depend on geographical locations or economic
relations among countries. One of the main focuses of the book is to investigate both
the dynamic and cross dependence between multiple time series.
The objectives of analyzing big dependent data include (a) to study the relation-
ships between the series, (b) to explore the dynamic dependence within a series and
across series, and (c) to predict the future values of the series. For spatial processes,
one might be interested in predicting the variable of interest at a location where
no observations are available. The first two objectives are referred to as data anal-
ysis and modeling, whereas the third objective involves statistical inference. Both
modeling and inference require not only methods for handling data, but also the
theory on which proper inference can be drawn. To this end, we need a framework
for understanding the characteristics of the time series under study and for making
sound predictions. We discuss this framework by starting with stochastic processes
in the next section.

1.2 STOCHASTIC PROCESSES

Theory and properties of stochastic processes form the foundation for statistical anal-
ysis and inference of dependent data.

1.2.1 Scalar Processes


A scalar stochastic process is a sequence of random variables {zt }, where the sub-
script t takes values in a certain set C. In most cases, the elements of C are ordered and
often correspond to certain calendar time (days, months, years, etc.). The resulting
stochastic process zt is called a time series. Unless specifically mentioned, we assume
that the time index t is equally spaced in the book. For a spatial process, C consists
of locations of certain geographical regions. The locations may denote the latitude
and longitude of an observation station or a city. For each value t in C (e.g. each
point in time), zt is a real-valued random variable defined on a common measurable
space. The observed values, also denoted by {zt } for simplicity, form a realization of
the stochastic process. We shall denote the realization by zT1 = (z1 , . . . , zt , . . . , zT )′ ,
where T is the sample size and A′ denotes the transpose of the matrix or vector
A. The sample size T denotes the number of observations for a time series or the
number of locations for a pure spatial process.
Statistical properties of a stochastic process zt are characterized by its probability
distribution. More specifically, for a given T (finite and fixed), properties of zT1 are
determined by the joint probability distribution of its elements {zt |t = 1, . . . , T}. The
marginal distribution of any non-empty subset of zT1 can be obtained from the joint
distribution by integrating out elements not in the subset. In particular, the proba-
bility distribution of the individual element zt is called the marginal distribution of
zt . Thus, similar to the general finite-dimensional random variable, we say that the
probabilistic structure of zT1 is known when we know its joint probability distribution.
In this book, we assume that a probability distribution has a well-defined den-
sity function. Let ft (z) be the marginal probability density function of zt . Define the
expected value or mean of zt as

$$\mu_t = E(z_t) = \int_R z\, f_t(z)\, dz,$$

provided that the integral exists, where $R$ denotes the real line. The sequence $\boldsymbol{\mu}_1^T =
(\mu_1, \mu_2, \ldots, \mu_T)$ is the mean function of $z_1^T$. Define the variance of $z_t$ as

$$\sigma_t^2 = E(z_t - \mu_t)^2 = \int_R (z - \mu_t)^2 f_t(z)\, dz,$$

provided that the integral exists. The sequence $\{\sigma_t^2\}_1^T = (\sigma_1^2, \ldots, \sigma_T^2)$ is the variance
function of $z_1^T$. Higher order moments can be defined similarly, assuming that they
exist.

A special case of interest is that all elements of $z_1^T$ have the same mean. In this
case, $\boldsymbol{\mu}_1^T$ is a constant function. If all elements of $z_1^T$ have the same variance, then
$\{\sigma_t^2\}_1^T$ is a constant function. If both $\boldsymbol{\mu}_1^T$ and $\{\sigma_t^2\}_1^T$ are constants for all finite $T$, then
the stochastic process $z_t$ is stable and we say that it is a homogeneous process. In
some applications, $\boldsymbol{\mu}_1^T$ or $\{\sigma_t^2\}_1^T$ may not exist.
In what follows, we assume that 𝜎t2 exists for all t. The linear dynamic dependence
of a stochastic process zt is determined by its autocovariance or autocorrelation func-
tion (ACF). For zt and zt−j , where j is a given integer, we define their autocovariance
as
𝛾(t, t − j) = Cov(zt , zt−j ) = E[(zt − 𝜇t )(zt−j − 𝜇t−j )]. (1.1)

When j = 0, 𝛾(t, t) = Var(zt ) = 𝜎t2 . Also, from the definition, it is clear that 𝛾(t, t − j) =
𝛾(t − j, t). The autocorrelation between zt and zt−j is defined as

$$\rho(t, t-j) = \frac{\gamma(t, t-j)}{\sigma_t\, \sigma_{t-j}}. \tag{1.2}$$

Of particular importance in real applications is the case in which the autocovariance
and, hence, the autocorrelation between zt and zt−j is a function of the lag j, and not a
function of t. In this case, the autocovariance and autocorrelation are time-invariant
and we write 𝛾(t, t − j) = 𝛾j and 𝜌(t, t − j) = 𝜌j .

1.2.1.1 Stationarity A stochastic process zt is strictly stationary if the
joint distributions of the m-dimensional random vectors (zt1 , zt2 , . . . , ztm ) and
(zt1 +h , zt2 +h , . . . , ztm +h ) are the same, where m is an arbitrary positive integer,
{t1 , . . . , tm } are arbitrary m time indexes, and h is an arbitrary integer. In other words,
the joint distribution of arbitrary m random variables is time-invariant. In particular,
the marginal probability density ft (z) of a strictly stationary process zt does not
depend on t, and we simply denote it by f (z). A stochastic process zt is weakly
stationary if its mean and autocovariance functions exist and are time-invariant.
Specifically, a process zt is stationary in the weak sense if (1) 𝜇t = 𝜇 for all t; (2)
𝜎t2 = 𝜎 2 for all t; and (3) 𝛾(t, t − j) = 𝛾j for all t, where j is a given integer. The
first two conditions say that the mean and variance are time-invariant. The third
condition states that the autocovariance and, hence, the autocorrelation between
two variables depend only on their time separation j. Unless specifically stated, we
use stationarity to denote weak stationarity of a stochastic process. For this process
the autocovariance and autocorrelation functions depend only on the difference of
lags and, in particular, we have 𝛾0 = 𝜎 2 , 𝛾j = 𝛾−j and 𝜌j = 𝜌−j , where j is an arbitrary
integer.
Assume that zt is stationary. For a given positive integer m, the covariance matrix
of (zt , zt−1 , . . . , zt−m+1 )′ is defined as

$$M_z(m) = E\left\{ \begin{bmatrix} z_t - \mu \\ z_{t-1} - \mu \\ \vdots \\ z_{t-m+1} - \mu \end{bmatrix} [(z_t - \mu), (z_{t-1} - \mu), \ldots, (z_{t-m+1} - \mu)] \right\}
= \begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{m-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{m-1} & \gamma_{m-2} & \cdots & \gamma_0 \end{bmatrix}. \tag{1.3}$$

This covariance matrix has the special pattern that (a) all diagonal elements are the
same, (b) all elements above and below the main diagonal are the same, and so on.
Such a matrix is called a Toeplitz matrix. The corresponding autocorrelation matrix
is
$$R_z(m) = \begin{bmatrix} 1 & \rho_1 & \cdots & \rho_{m-1} \\ \rho_1 & 1 & \cdots & \rho_{m-2} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{m-1} & \rho_{m-2} & \cdots & 1 \end{bmatrix}, \tag{1.4}$$

which is also a Toeplitz matrix.


An important property of a stationary process is that any non-trivial finite linear
combination of its elements remains stationary. In other words, a process obtained by
any non-zero linear combination (with finite elements) of a stationary process is also
stationary. For example, if zt is stationary, the process 𝑤t defined by 𝑤t = zt − zt−1
is also stationary. A consequence of this property is that the autocovariances of zt
must satisfy certain conditions that are useful for studying stochastic processes. In
particular, we can consider a non-trivial linear combination of a stationary process
zt and its lagged values, namely

yt = c1 zt + c2 zt−1 + · · · + cm zt−m+1 = c′ zt,m

where c = (c1 , . . . , cm )′ ≠ 𝟎 and zt,m = (zt , zt−1 , . . . , zt−m+1 )′ . Since yt is stationary, its
variance exists and is given by c′ M z (m)c ≥ 0, where M z (m) is defined in Eq. (1.3).
Consequently, M z (m) must be a nonnegative definite matrix. Similarly, the correla-
tion matrix Rz (m) must also be nonnegative definite.
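To make the Toeplitz structure of Eq. (1.4) and the nonnegative definiteness requirement concrete, here is a small R sketch with made-up autocorrelation values (they happen to be the ACF of a 5-term moving average; the example is ours, not the book's).

```r
# Hypothetical autocorrelations rho_1, ..., rho_4 of a stationary process
rho <- c(0.8, 0.6, 0.4, 0.2)

# R_z(5): the Toeplitz autocorrelation matrix of Eq. (1.4)
Rz <- toeplitz(c(1, rho))
Rz

# A valid autocorrelation matrix must be nonnegative definite:
# all eigenvalues are >= 0 (here they are strictly positive)
eigen(Rz, symmetric = TRUE)$values
```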
Strict stationarity implies weak stationarity provided that the first two moments
of the process exist. But weak stationarity does not guarantee strict stationarity of a
stochastic process. Consider the following simple example. Let {zt } be a sequence
of independent and identically distributed standard normal random variables,
i.e. $z_t \sim_{iid} N(0, 1)$. Define a stochastic process $x_t$ by

$$x_t = \begin{cases} (z_t^2 - 1)/\sqrt{2}, & \text{if } t \text{ is odd}, \\ z_t, & \text{if } t \text{ is even}. \end{cases}$$

It is easy to show that (a) E(xt ) = 0, (b) Var(xt ) = 1, and (c) E(xt xt−j ) = 0 for j ≠ 0.
Therefore, {xt } is a weakly stationary process. On the other hand, xt is Gaussian if t
is even and it is a shifted 𝜒 2 with one degree of freedom when t is odd. The process
{xt } is not identically distributed and, hence, is not strictly stationary. If we assume
that zT1 follows a multivariate normal distribution for all T, then weak stationarity
is equivalent to strict stationarity, because a normal (or Gaussian) distribution is
determined by its first two moments.
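A brief R simulation (our own sketch) illustrates properties (a)-(c) of this example numerically:

```r
set.seed(123)
T <- 100000
z <- rnorm(T)                       # z_t iid N(0, 1)
x <- z
odd <- seq(1, T, by = 2)
x[odd] <- (z[odd]^2 - 1) / sqrt(2)  # shifted and scaled chi-squared values at odd t

mean(x)                             # close to 0
var(x)                              # close to 1
cor(x[-1], x[-T])                   # lag-1 sample autocorrelation, close to 0

# The marginal distribution still differs between odd and even t,
# so the process is weakly, but not strictly, stationary.
```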

1.2.1.2 White Noise Process An important stationary process is the white noise
process. A stochastic process zt is a white noise process if it satisfies the following
conditions: (1) E(zt ) = 0; (2) Var(zt ) = 𝜎 2 < ∞; and (3) Cov(zt , zt−j ) = 0 for all j ≠ 0.
In other words, zt is a white noise series if and only if it has a zero mean and finite
variance, and is not serially correlated. Thus, a white noise is not necessarily strictly
stationary. If we impose the additional condition that zt and zt−j are independent and
have the same distribution, where j ≠ 0, then zt is a strict white noise process.

1.2.1.3 Conditional Distribution In addition to marginal distributions, conditional
distributions also play an important role in studying stochastic processes. Let
$z_\ell^k = (z_\ell, z_{\ell+1}, \ldots, z_k)'$, where $\ell$ and $k$ are positive integers and $\ell \le k$. In prediction,
we are interested in the conditional distribution of $z_{T+1}^{T+h}$ given $z_1^T$, where $h \ge 1$ is
referred to as the forecast horizon. In estimation, we express the joint probability
density function of $z_1^T$ as a product of the conditional density functions of $z_{t+1}$ given
$z_1^t$ for $t = T-1, T-2, \ldots, 1$. More details are given in later chapters.
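Written out, the factorization mentioned above is the standard chain-rule (prediction-error) decomposition,

$$f(z_1, z_2, \ldots, z_T) = f(z_1) \prod_{t=1}^{T-1} f(z_{t+1} \mid z_1^t),$$

and under the first-order Markov property introduced next, each conditional density on the right reduces to $f(z_{t+1} \mid z_t)$.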
If the stochastic process zt has the following property

$$f(z_{t+1} \mid z_1^t) = f(z_{t+1} \mid z_t), \quad t = 1, 2, \ldots,$$

then zt is a first-order Markov process. The aforementioned property is referred to
as the memoryless property of a Markov process, as the conditional distribution
depends only on the variable one period before.

1.2.2 Vector Processes


A stochastic vector process is a sequence of random vectors {zt }, where the index
t assumes values in a certain set C and zt = (z1t , . . . , zkt )′ is a k-dimensional vector.
Each component zit follows a scalar stochastic process, and the vector process is char-
acterized by the joint probability distribution of the random variables (z1 , . . . , zT ).
Similar to the scalar case, we define the mean vector of zt as

E(zt ) = [E(z1t ), E(z2t ), . . . , E(zkt )]′ = (𝜇1t , . . . , 𝜇kt )′ ≡ 𝝁t ,

provided that the expectations exist, and the lag-𝓁 autocovariance function as

𝚪(t, t − 𝓁) = Cov(zt , zt−𝓁 ) = E[(zt − 𝝁t )(zt−𝓁 − 𝝁t−𝓁 )′ ] (1.5)



provided that the covariances involved all exist, where 𝓁 is an integer. We say that the
vector process {zt } is weakly stationary if (a) 𝝁t = 𝝁, a constant vector, and (b) 𝚪(t, t −
𝓁) depends only on 𝓁. In other words, a k-dimensional stochastic process zt is weakly
stationary if its first two moments are time-invariant. In this case, we write 𝚪(t, t −
𝓁) = E[(zt − 𝝁)(zt−𝓁 − 𝝁)′ ] = 𝚪(𝓁) = [𝛾ij (𝓁)], where 1 ≤ i, j ≤ k. In particular, the lag-0
autocovariance matrix of a weakly stationary k-dimensional stochastic process zt is
given by
$$\Gamma(0) = \begin{bmatrix} \gamma_{11}(0) & \gamma_{12}(0) & \cdots & \gamma_{1k}(0) \\ \gamma_{21}(0) & \gamma_{22}(0) & \cdots & \gamma_{2k}(0) \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{k1}(0) & \gamma_{k2}(0) & \cdots & \gamma_{kk}(0) \end{bmatrix} \tag{1.6}$$

which is symmetric, because 𝛾ij (0) = 𝛾ji (0). Note that the diagonal element 𝛾ii (0) =
𝜎i2 = Var(zit ).
For a stationary vector process zt , its lag-𝓁 autocovariance matrix 𝚪(𝓁) = [𝛾ij (𝓁)]
measures the linear dependence among the component series zit and their lagged
variables zj,t−𝓁 . It pays to study the exact meaning of individual element 𝛾ij (𝓁) of 𝚪(𝓁).
First, the diagonal element 𝛾ii (𝓁) is the lag-𝓁 autocovariance of the scalar process zit
(i = 1, . . . , k). Second, the off-diagonal element 𝛾ij (𝓁) is

𝛾ij (𝓁) = E[(zit − 𝜇i )(zj,t−𝓁 − 𝜇j )],

which measures the linear dependence of zit on the lagged value zj,t−𝓁 for 𝓁 > 0.
Thus, 𝛾ij (𝓁) quantifies the linear dependence of zit on the 𝓁th past value of zjt . Third,
in general, 𝚪(𝓁) is not symmetric if 𝓁 > 0. As a matter of fact, 𝛾ij (𝓁) ≠ 𝛾ji (𝓁) for most
stochastic process zt because, as stated before, 𝛾ij (𝓁) denotes the linear dependence
of zit on zj,t−𝓁 , whereas 𝛾ji (𝓁) quantifies the linear dependence of zjt on zi,t−𝓁 . Finally,
we have

$$\begin{aligned} \gamma_{ij}(-\ell) &= E[(z_{it} - \mu_i)(z_{j,t+\ell} - \mu_j)] \\ &= E[(z_{j,v} - \mu_j)(z_{i,v-\ell} - \mu_i)] \quad (v = t + \ell) \\ &= \gamma_{ji}(\ell) \quad \text{(by stationarity)}. \end{aligned}$$

Therefore, we have 𝛾ij (−𝓁) = 𝛾ji (𝓁) for all 𝓁. Consequently, the autocovariance matri-
ces of a stationary vector process zt satisfy

𝚪(−𝓁) = [𝚪(𝓁)]′ ,

and, hence, it suffices to consider 𝚪(𝓁) for 𝓁 ≥ 0 in practice. If the dimension of 𝚪(𝓁) is
needed to avoid any confusion, we write 𝚪(𝓁) ≡ 𝚪k (𝓁) with the subscript k denoting
the dimension of the underlying stochastic process zt .
A global measure of the overall variability of the process zt is given by its general-
ized variance, which is the determinant of the lag-0 covariance matrix 𝚪(0), i.e. |𝚪(0)|.
The standardized, by the dimension, measure is the effective variance, defined by

EV(zt ) = |𝚪(0)|1∕k . (1.7)

For instance, if $z_t$ is two dimensional, then $\mathrm{EV}(z_t) = [\gamma_{11}(0)\gamma_{22}(0)\{1 - \rho_{12}^2(0)\}]^{1/2}$,
where $\rho_{12}(0)$ is the contemporaneous correlation coefficient between $z_{1t}$ and $z_{2t}$.
Clearly, EV(zt ) assumes its largest value when the two series are uncorrelated, i.e.
𝜌12(0) = 0. For a k-dimensional process zt, since the determinant of a square matrix is
the product of its eigenvalues, the effective variance is the geometric mean of the
eigenvalues of 𝚪(0).
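To make the definition concrete, the following minimal R sketch (using simulated placeholder data, not one of the book's data sets) computes the effective variance both from the determinant and from the eigenvalues of a sample lag-0 covariance matrix:

# Minimal sketch: EV = |Gamma(0)|^(1/k), equivalently the geometric mean
# of the eigenvalues of Gamma(0).
set.seed(1)
k <- 2
z <- matrix(rnorm(200 * k), ncol = k)        # placeholder data
z[, 2] <- 0.6 * z[, 1] + 0.8 * z[, 2]        # induce contemporaneous correlation
Gamma0 <- cov(z)                             # sample lag-0 covariance matrix
EV_det <- det(Gamma0)^(1 / k)                # via the determinant
EV_eig <- prod(eigen(Gamma0)$values)^(1 / k) # via the eigenvalues (same value)
c(EV_det, EV_eig)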
We can summarize the properties of a stationary vector process zt by using linear
combinations of its components. Consider, for example, the linear process

\[
y_t = \mathbf{c}'\mathbf{z}_t = \sum_{i=1}^{k} c_i z_{it},
\]

where c = (c1, . . . , ck)′ ≠ 𝟎. This process is stationary because it is a finite linear
combination of the stationary components zit (i = 1, . . . , k). More specifically,
the expectation is E(yt ) = c1 E(z1t ) + · · · + ck E(zkt ) = c′ E(zt ) = c′ 𝝁, which is
time-invariant. Similarly, the variance of yt is given by Var(yt ) = Var(c′ zt ) = c′ Var(zt )
c = c′ 𝚪(0)c, which is also time-invariant. In general, we have Cov(yt , yt−𝓁 ) = c′ 𝚪(𝓁)c,
which depends only on 𝓁. Thus, yt is stationary. In addition, since Var(yt ) is non-
negative and c is an arbitrary non-zero vector, we conclude that 𝚪(0) is nonnegative
definite.
Similar to the scalar case, the autocovariance matrices in Eq. (1.5) of a stationary
vector process zt must satisfy certain conditions. To see this, we consider a non-zero
linear combination of zt and its m lagged values, say,

\[
y_t = \sum_{i=1}^{m} \mathbf{b}_i' \mathbf{z}_{t-i+1} = \mathbf{b}_1'\mathbf{z}_t + \mathbf{b}_2'\mathbf{z}_{t-1} + \cdots + \mathbf{b}_m'\mathbf{z}_{t-m+1},
\]

where bj = (bj1 , . . . , bjk )′ is a k-dimensional real-valued vector for j = 1, . . . , m.


Defining b = (b′1 , b′2 , . . . , b′m )′ , which is a non-zero km-dimensional constant vector,
and zt,m = (z′t , z′t−1 , . . . , z′t−m+1 )′ , we have Var(yt ) = b′ Var(zt,m )b = b′ M z (m)b, where

\[
\mathbf{M}_z(m) =
\begin{bmatrix}
\boldsymbol{\Gamma}(0) & \boldsymbol{\Gamma}(1) & \cdots & \boldsymbol{\Gamma}(m-1) \\
\boldsymbol{\Gamma}(-1) & \boldsymbol{\Gamma}(0) & \cdots & \boldsymbol{\Gamma}(m-2) \\
\vdots & \vdots & \ddots & \vdots \\
\boldsymbol{\Gamma}(-m+1) & \boldsymbol{\Gamma}(-m+2) & \cdots & \boldsymbol{\Gamma}(0)
\end{bmatrix} \tag{1.8}
\]

is a nonnegative definite block Toeplitz matrix. This M z (m) matrix is the generalization of
M z (m) in Eq. (1.3) for the scalar series zt.
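The block Toeplitz structure of Eq. (1.8) is straightforward to assemble numerically. The sketch below defines a hypothetical helper (not part of the MTS or SLBDD packages) that stacks a list of autocovariance matrices, using the relation 𝚪(−𝓁) = [𝚪(𝓁)]′:

# Minimal sketch: build the km x km block Toeplitz matrix M_z(m) of Eq. (1.8)
# from a list Gamma with Gamma[[l + 1]] = Gamma(l), l = 0, ..., m - 1.
block_toeplitz <- function(Gamma, m) {
  k <- nrow(Gamma[[1]])
  M <- matrix(0, k * m, k * m)
  for (i in 1:m) {
    for (j in 1:m) {
      lag <- j - i   # block (i, j) holds Gamma(j - i)
      G <- if (lag >= 0) Gamma[[lag + 1]] else t(Gamma[[-lag + 1]])  # Gamma(-l) = Gamma(l)'
      M[((i - 1) * k + 1):(i * k), ((j - 1) * k + 1):(j * k)] <- G
    }
  }
  M
}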
The cross-correlation matrices (CCMs) of zt are defined as the autocovariances
of the standardized process xt = D−1∕2 (zt − 𝝁), where the matrix D is defined
as diag(𝛾11 (0), . . . , 𝛾kk (0)), consisting of the variances of the components of zt .
Specifically, the lag-𝓁 CCM of zt is

\[
\mathbf{R}_z(\ell) = \mathrm{Cov}(\mathbf{x}_t, \mathbf{x}_{t-\ell}) = \mathbf{D}^{-1/2}\,\mathrm{Cov}(\mathbf{z}_t, \mathbf{z}_{t-\ell})\,\mathbf{D}^{-1/2} =
\begin{bmatrix}
\rho_{11}(\ell) & \rho_{12}(\ell) & \cdots & \rho_{1k}(\ell) \\
\rho_{21}(\ell) & \rho_{22}(\ell) & \cdots & \rho_{2k}(\ell) \\
\vdots & \vdots & \ddots & \vdots \\
\rho_{k1}(\ell) & \rho_{k2}(\ell) & \cdots & \rho_{kk}(\ell)
\end{bmatrix}. \tag{1.9}
\]

From the definition, we have 𝜌ii(0) = 1 and

\[
\rho_{ij}(\ell) = \frac{\gamma_{ij}(\ell)}{\sqrt{\gamma_{ii}(0)\,\gamma_{jj}(0)}}.
\]

Similar to the autocovariance matrices, we have Rz(−𝓁) = [Rz(𝓁)]′.

The effective correlation of zt is given by

\[
\mathrm{ER}_z = 1 - |\mathbf{R}_z(0)|^{1/k}, \tag{1.10}
\]

and it assumes a value between zero and one. For instance, for k = 2, ERz = 1 −
[1 − 𝜌12(0)²]^{1/2}. See Peña and Rodriguez (2003) for more properties.
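A matching sketch for the effective correlation of Eq. (1.10), again on simulated placeholder data:

# Minimal sketch: ER_z = 1 - |R_z(0)|^(1/k) from the sample lag-0 correlation matrix.
set.seed(2)
z <- matrix(rnorm(300), ncol = 3)
z[, 2] <- 0.7 * z[, 1] + sqrt(1 - 0.7^2) * z[, 2]  # correlate two components
R0 <- cor(z)                                       # sample contemporaneous correlations
ER <- 1 - det(R0)^(1 / ncol(z))                    # near 0 when components are uncorrelated
ER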

1.2.2.1 Vector White Noises A k-dimensional vector process zt is a white noise
process if (a) E(zt) = 𝟎, (b) Var(zt) = 𝚪(0) is positive-definite, and (c) 𝚪(𝓁) = 𝟎 for all
𝓁 ≠ 0. This is a direct generalization of the scalar white noise. Thus, a vector white
noise process consists of random vectors that have zero means, positive-definite
covariance, and no lagged serial or cross correlations.
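To make the definition concrete, one can simulate a Gaussian vector white noise and inspect its lagged CCMs with the ccm function used later in this chapter; the covariance matrix below is an arbitrary positive-definite choice for illustration:

# Minimal sketch: simulate a 3-dimensional Gaussian white noise with
# positive-definite covariance Sigma; its lagged CCMs should be near zero.
library(MASS)   # for mvrnorm
library(MTS)    # for ccm
set.seed(3)
Sigma <- matrix(c(1.0, 0.5, 0.2,
                  0.5, 1.0, 0.3,
                  0.2, 0.3, 1.0), nrow = 3)
a <- mvrnorm(n = 500, mu = rep(0, 3), Sigma = Sigma)
ccm(a)          # lag-0 correlations are sizable; lagged cross correlations are not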

1.2.2.2 Invertibility In many applications, we use linear regression-type models
to describe the dynamic dependence of a vector process. The model can be written as

\[
\mathbf{z}_t = \boldsymbol{\pi}_0 + \sum_{i=1}^{\infty} \boldsymbol{\pi}_i \mathbf{z}_{t-i} + \mathbf{a}_t, \tag{1.11}
\]

where {at } is a stationary vector white noise process, 𝝅 0 is a k-dimensional vector of


constants, and 𝝅 i are k × k real-valued matrices. For Eq. (1.11) to be meaningful, the
coefficient matrices must satisfy the condition

\[
\sum_{i=1}^{\infty} \|\boldsymbol{\pi}_i\| < \infty, \tag{1.12}
\]

where ‖A‖ denotes a norm of the matrix A, e.g. \(\|A\| = \bigl(\sum_{i=1}^{k} \sum_{j=1}^{k} a_{ij}^2\bigr)^{1/2}\), which
is the Frobenius norm. A vector process zt is invertible if it can be written in the form of
the model in (1.11) with coefficients satisfying the condition of Eq. (1.12). If the summation in Eq. (1.11) is
truncated at a finite integer p, then zt is invertible and the process follows a vector
autoregressive (VAR) model.
Invertibility is another important concept in time series analysis. A necessary con-
dition for an invertible model is that ‖𝝅i‖ → 0 as i → ∞. This means that the added con-
tribution of zt−𝓁 to zt conditioned on the available information {zt−1, zt−2, . . . , zt−𝓁+1}
diminishes as 𝓁 increases. For a scalar process zt, any AR(p) process with
finite p is invertible. On the other hand, the process zt = at − at−1, where {at} is a
white noise, is stationary but non-invertible.
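To see condition (1.12) at work in a scalar example, recall that an MA(1) process zt = at − 𝜃at−1 has the AR(∞) representation with 𝜋i = −𝜃^i, so the sum in (1.12) is finite only when |𝜃| < 1; the boundary case 𝜃 = 1 is exactly the non-invertible process above. A small numerical check (the value of 𝜃 is arbitrary):

# Minimal sketch: partial sums of |pi_i| for a scalar MA(1), where pi_i = -theta^i.
pi_weight_sum <- function(theta, n = 200) sum(abs(-theta^(1:n)))
pi_weight_sum(0.8)  # stabilizes near theta / (1 - theta) = 4: invertible
pi_weight_sum(1.0)  # equals n and keeps growing with n: non-invertible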

1.3 SAMPLE MOMENTS OF STATIONARY VECTOR PROCESS

In real applications, we often observe a sequence of realizations, say {z1 , . . . , zT }, of


a stationary vector process zt . Our goal is to investigate properties of zt based on the
observed data. To this end, certain exploratory data analysis is useful. In particular,
much can be learned about zt by studying its sample moments.

1.3.1 Sample Mean


An unbiased estimator of the population mean of the scalar process zit is the sample
mean

\[
\bar{z}_i = \frac{1}{T} \sum_{t=1}^{T} z_{it}. \tag{1.13}
\]
Putting them together, the sample mean of the vector process zt is

\[
\bar{\mathbf{z}} = (\bar{z}_1, \ldots, \bar{z}_k)' = \frac{1}{T} \sum_{t=1}^{T} \mathbf{z}_t. \tag{1.14}
\]
For an independent and identically distributed random sample, it is easy to see that
E(̄zi ) = E(zit ) = 𝜇i , and the variance of z̄ i is 𝜎i2 ∕T, where 𝜎i2 = Var(zit ). Therefore,
the sample mean in this case is a consistent estimator of the population mean 𝜇i as
T → ∞.
Note that for a stationary process, the consistency of the sample mean does not
necessarily hold. For example, let z11 be a random sample from a normal distribution
with mean zero and variance 𝜎12 . Let z1t = z11 for t = 2, . . . , T. This special process
{z1t} is stationary because (1) E(z1t) = 0, which is time-invariant, (2) Var(z1t) = 𝜎1²,
which is also time-invariant, and (3) Cov(z1t, z1,t−j) = 𝜎1², which is time-invariant for all j.
However, in this particular case we have z̄ 1 = z11 and Var(̄z1 ) = 𝜎12 , no matter what
the sample size T is. Therefore, the sample mean is not a consistent estimator for this
special process as T → ∞.
The condition under which the sample mean z̄ i is a consistent estimator of the
population mean 𝜇i is referred to as the ergodic condition of a stochastic process. The
ergodic theory is concerned with conditions under which the time average of a long
realization (sample mean) converges to the space average (average of many random
samples at a given time index) of a dynamic system. For a scalar weakly stationary zt,
the ergodic condition for z̄ to converge to 𝜇 = E(zt) is that \(\sum_{i=0}^{\infty} |\gamma_i| < \infty\). We assume
this condition in the book from now on.
For a weakly stationary vector process zt , the covariance matrix of the sample
mean z̄ is given by
\[
\mathrm{Var}(\bar{\mathbf{z}}) = E\left[(\bar{\mathbf{z}} - \boldsymbol{\mu})(\bar{\mathbf{z}} - \boldsymbol{\mu})'\right]
= E\left[\left\{\frac{1}{T}\sum_{t=1}^{T}(\mathbf{z}_t - \boldsymbol{\mu})\right\}\left\{\frac{1}{T}\sum_{j=1}^{T}(\mathbf{z}_j - \boldsymbol{\mu})\right\}'\right]
= \frac{1}{T^2}\,E\left[\sum_{t=1}^{T}(\mathbf{z}_t - \boldsymbol{\mu})\sum_{j=1}^{T}(\mathbf{z}_j - \boldsymbol{\mu})'\right]
= \frac{1}{T^2}\sum_{j=-(T-1)}^{T-1}(T - |j|)\,\boldsymbol{\Gamma}(j). \tag{1.15}
\]

This implies that the asymptotic variance of the scalar sample mean z̄i is

\[
T \times \mathrm{Var}(\bar{z}_i) \to \gamma_{ii}(0) + 2\sum_{\ell=1}^{\infty} \gamma_{ii}(\ell), \quad \text{as } T \to \infty, \tag{1.16}
\]

where, again, 𝛾ii(𝓁) is the lag-𝓁 autocovariance of zit. From Equation (1.2), if {zit} is
white noise, the variance of the sample mean z̄i is 𝜎i²/T. In general, we have

\[
\mathrm{Var}(\bar{z}_i) \approx \frac{1}{T}\Bigl[\gamma_{ii}(0) + 2\sum_{l=1}^{T} \gamma_{ii}(l)\Bigr], \quad \text{as } T \to \infty.
\]

Therefore, Var(̄zi ) could be large if the sum of the autocovariances of zit is large. A
necessary (although not sufficient) condition for the sum of autocovariances of zit to
converge is
\[
\lim_{l \to \infty} \gamma_{ii}(l) = 0,
\]

which implies that the dependence of zit on its past values zi,t−l vanishes as l increases.
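The effect of serial dependence on Var(z̄i) is easy to check numerically: estimate the autocovariances and compare the naive variance 𝛾̂ii(0)/T with the autocovariance-adjusted approximation. A minimal sketch on a simulated AR(1) series, chosen only because its autocovariances decay geometrically:

# Minimal sketch: variance of the sample mean under serial dependence,
# Var(zbar) ~ (1/T) * [gamma(0) + 2 * sum_l gamma(l)].
set.seed(4)
TT <- 500
z <- arima.sim(model = list(ar = 0.7), n = TT)  # positively autocorrelated AR(1)
g <- acf(z, lag.max = 20, type = "covariance", plot = FALSE)$acf[, 1, 1]
var_naive    <- g[1] / TT                       # ignores the autocovariances
var_adjusted <- (g[1] + 2 * sum(g[-1])) / TT    # truncated long-run variance
c(var_naive, var_adjusted)                      # the adjusted value is much larger here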

1.3.2 Sample Covariance and Correlation Matrices


For a vector process zt, the lag-𝓁 sample autocovariance matrix is

\[
\hat{\boldsymbol{\Gamma}}(\ell) = \frac{1}{T} \sum_{t=\ell+1}^{T} (\mathbf{z}_t - \bar{\mathbf{z}})(\mathbf{z}_{t-\ell} - \bar{\mathbf{z}})'. \tag{1.17}
\]

In particular, the sample covariance matrix is

\[
\hat{\boldsymbol{\Gamma}}(0) = \frac{1}{T} \sum_{t=1}^{T} (\mathbf{z}_t - \bar{\mathbf{z}})(\mathbf{z}_t - \bar{\mathbf{z}})'.
\]

The lag-𝓁 sample CCM of zt is then

\[
\hat{\mathbf{R}}(\ell) = \hat{\mathbf{D}}^{-1/2}\,\hat{\boldsymbol{\Gamma}}(\ell)\,\hat{\mathbf{D}}^{-1/2}, \tag{1.18}
\]

where D̂ = diag{𝛾̂11(0), . . . , 𝛾̂kk(0)}. From Eq. (1.18), we have

\[
\hat{\mathbf{R}}(\ell) = [\hat{\rho}_{ij}(\ell)] \quad \text{with} \quad \hat{\rho}_{ij}(\ell) = \frac{\hat{\gamma}_{ij}(\ell)}{\sqrt{\hat{\gamma}_{ii}(0)\,\hat{\gamma}_{jj}(0)}}.
\]
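Equations (1.17) and (1.18) translate directly into code. The sketch below computes 𝚪̂(𝓁) and R̂(𝓁) by hand for a T × k data matrix; in practice one would use a packaged function such as ccm, so this is only to make the formulas concrete:

# Minimal sketch: lag-ell sample autocovariance and cross-correlation matrices,
# following Eqs. (1.17) and (1.18); z is a T x k data matrix.
sample_ccm <- function(z, ell) {
  z  <- as.matrix(z)
  n  <- nrow(z)
  zc <- scale(z, center = TRUE, scale = FALSE)       # z_t - zbar
  G  <- t(zc[(ell + 1):n, , drop = FALSE]) %*% zc[1:(n - ell), , drop = FALSE] / n
  Dinv <- diag(1 / sqrt(diag(t(zc) %*% zc / n)))     # D-hat^(-1/2) from lag-0 variances
  list(Gamma = G, R = Dinv %*% G %*% Dinv)
}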

Under some mild conditions, R̂(𝓁) is a consistent estimator of R(𝓁) for a weakly
stationary process zt. The conditions are met if zt is a stationary Gaussian process.
The asymptotic properties of R̂(𝓁), however, are rather complicated. They depend
on the dynamic dependence of the process zt. However, the results for some special
cases are easier to understand. Suppose the vector time series zt is a white noise
process so that R(𝓁) = 𝟎 for 𝓁 ≠ 0. Then, we have

\[
\mathrm{Var}[\hat{\rho}_{ij}(\ell)] \approx \frac{1}{T}, \quad \ell \neq 0,
\]

and, for i ≠ j,

\[
\mathrm{Var}[\hat{\rho}_{ij}(0)] \approx \frac{[1 - \rho_{ij}^2(0)]^2}{T}. \tag{1.19}
\]
For further details, see, for instance, Reinsel (1993, section 4.1.2). These simple results
are useful in exploratory data analysis of vector time series. This is particularly so
when the dimension k is large. An important feature in analyzing time series data is
to explore the dynamic dependence of the observed time series. For a k-dimensional
series, each sample CCM R̂(𝓁) is a k × k matrix. If k = 10, then each CCM contains
100 sample correlations. It would not be easy to decipher the pattern embedded in
the sample CCMs R̂(𝓁) simultaneously for 𝓁 = 1, . . . , m, where m is a given positive

integer. To aid the reading of sample CCMs, Tiao and Box (1981) devise a simplified
approach to inspect the features of a sample CCM. Specifically, consider R̂(𝓁) for 𝓁 >
0. Tiao and Box define a corresponding simplified matrix S(𝓁) = [Sij(𝓁)], where

\[
S_{ij}(\ell) =
\begin{cases}
+ & \text{if } \hat{\rho}_{ij}(\ell) \geq 2/\sqrt{T}, \\
\cdot & \text{if } |\hat{\rho}_{ij}(\ell)| < 2/\sqrt{T}, \\
- & \text{if } \hat{\rho}_{ij}(\ell) \leq -2/\sqrt{T},
\end{cases}
\]

where it is understood that 1/√T is the sample standard error of the elements of R̂(𝓁)
provided that zt is a Gaussian white noise. It is much easier to comprehend the
dynamic dependence of zt by inspecting the simplified CCMs S(𝓁) for 𝓁 = 1, . . . , m.
We demonstrate the effectiveness of the simplified CCM via an example.
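Before turning to the example, a small sketch of how such a simplified matrix can be built from a sample CCM (a hypothetical helper, not the MTS implementation):

# Minimal sketch: convert a sample CCM Rhat at one lag into the Tiao-Box
# simplified matrix of "+", ".", "-" signs using the 2/sqrt(T) threshold.
simplify_ccm <- function(Rhat, nobs) {
  thr <- 2 / sqrt(nobs)
  S <- matrix(".", nrow(Rhat), ncol(Rhat))
  S[Rhat >= thr]  <- "+"
  S[Rhat <= -thr] <- "-"
  S
}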

Example 1.1

Consider the three temperature series of Figure 1.1. The sample means of the three
series are 0.15, 0.14, and 0.12, respectively, for Europe, North America, and South
America. The sample standard deviations are 1.02, 1.06, and 0.47, respectively. The
average temperatures of November for South America appear to be lower and rel-
atively less variable compared with those of Europe and North America. Turning to
the sample CCMs, in this particular instance k = 3, so each CCM contains 9 real
numbers. Suppose we would like to examine the dynamic dependence of the three series
using the first 12 lags of sample CCMs. This would require us to decipher 108 num-
bers simultaneously. On the other hand, the dynamic dependence pattern is relatively
easy to comprehend by examining the simplified CCMs, which are given below.

Simplified matrix:
CCM at lag: 1
. + +
. . +
. . +
CCM at lag: 2
. . +
. . +
+ . +
CCM at lag: 3
. . +
. . .
. + +
CCM at lag: 4
. . .
. . .
. + +
CCM at lag: 5
. . .
. . +
. + +

CCM at lag: 6
. . +
. . .
. . +
CCM at lag: 7
. . .
. . +
. . +
CCM at lag: 8
. . .
. . .
. . +
CCM at lag: 9
. . +
. . .
. . +
CCM at lag: 10
. . .
. . .
. . +
CCM at lag: 11
. . .
. . +
. . +
CCM at lag: 12
. . .
. . +
. . +

From the simplified CCM, we make the following observations. First, the dynamic
dependence of the temperature series of South America appears to be persistent as
all of its sample ACFs are large (shown by the + sign). This is not surprising as
the time plot of the series in Figure 1.1 shows an upward trend. Second, the tem-
perature series of Europe and North America are dynamically correlated with that
of South America because there exist several + signs at the (1,3)th and (2,3)th ele-
ments of the CCMs. Finally, there seems to be no strong dynamic dependence in
November temperatures between Europe and North America, because most sample
cross-correlations at the (1,2)th and (2,1)th positions are small. ◾

Remarks. Some remarks are in order.


• The demonstration of Example 1.1 is carried out by the MTS package in R. The
command used is ccm with default options.

da <- read.table("temperatures.txt", header=TRUE)
da <- da[,-1] # remove time index
require(MTS)
ccm(da)

• If the dimension k is close to or larger than the sample size T, then the afore-
mentioned properties of sample CCMs do not hold. We shall discuss the situa-
tion in later chapters, e.g. Chapter 6.
• When the dimension k is large, even the simplified CCM S(𝓁) becomes hard
to comprehend and some summary statistics must be used. In this book, we
provide three summary statistics to extract helpful information embedded in
the sample CCMs. They are given below:
1. P-value plot: A scatterplot of the p-values of the null hypothesis H0 ∶ R(𝓁) =
𝟎 versus lag 𝓁. The test statistic used asymptotically follows a 𝜒² distribution
with k² degrees of freedom under the null hypothesis that there are no serial or
cross correlations in zt for 𝓁 > 0. Details of the test statistic are given in Chapter 3.
2. Diagonal element plot: A scatterplot of the fraction of significant diagonal
elements of R̂(𝓁) versus 𝓁. Here an element of R̂(𝓁) is said to be significant
if it is greater than 2/√T in absolute value.
3. Off-diagonal element plot: A scatterplot of the fraction of significant
off-diagonal elements of R̂(𝓁) versus 𝓁. Again, the significance of an
element of R̂(𝓁) is with respect to 2/√T.

The R command for the three summary plots is Summaryccm of the SLBDD pack-
age, which is a package associated with this book.
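For readers who want to see roughly what these summaries compute, the sketch below produces simple versions of the three quantities. The chi-squared statistic used here, T times the sum of squared lag-𝓁 sample cross correlations referred to a 𝜒² distribution with k² degrees of freedom, is one standard large-sample choice under the white noise null and is only a stand-in for the statistic described in Chapter 3:

# Minimal sketch of the three lag-by-lag summaries for a T x k series z.
# Note: the cross-correlation array returned by acf may be the transpose of the
# book's R(l) in orientation, which does not affect these summaries.
ccm_summaries <- function(z, max_lag = 12) {
  z <- as.matrix(z); n <- nrow(z); k <- ncol(z); thr <- 2 / sqrt(n)
  cc <- acf(z, lag.max = max_lag, plot = FALSE)$acf   # (max_lag + 1) x k x k array
  t(sapply(1:max_lag, function(l) {
    R <- cc[l + 1, , ]
    stat <- n * sum(R^2)                              # rough chi-squared statistic
    c(pvalue  = pchisq(stat, df = k^2, lower.tail = FALSE),
      diag    = mean(abs(diag(R)) > thr),             # fraction of significant diagonals
      offdiag = mean(abs(R[row(R) != col(R)]) > thr)) # fraction of significant off-diagonals
  }))
}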

Example 1.2

Consider the daily log returns of the 99 financial market indexes of the world in
Figure 1.2. Here k = 99 and it would be hard to comprehend any 99-by-99 cross cor-
relation matrix. Figure 1.14 shows the three summary statistics of the daily log returns.
The upper plot is the scatterplot of p-values. From the plot, all p-values are close to
zero indicating that there are serial and cross-correlations in the 99-dimensional log
returns for lags 1 to 12. The middle plot shows the fraction of significant diagonal
elements of R̂(𝓁). This plot shows that there are significant serial correlations in the
daily log returns of 99 world financial indexes. But, as expected, the serial correla-
tions decay quickly so that the fraction approaches zero as 𝓁 increases. The bottom
plot shows the fraction of significant off-diagonal elements of R̂(𝓁) versus 𝓁. The
plot suggests that there is significant cross dependence among the daily log returns
of world financial market indexes when 𝓁 is small. As expected, the cross dependence
also decays to zero quickly as 𝓁 increases.

R command used for the summary CCM plots:

require(SLBDD)
da <- read.csv2("Stock_indexes_99world.csv", header=FALSE)
da <- as.matrix(da[,-1]) # remove the time index
rt <- diff(log(da)) # log returns
Summaryccm(rt)



Figure 1.14 Scatterplots of three summary statistics of lagged CCMs of the daily log returns
of 99 world financial market indexes. The summary statistics are (a) p-values of testing
R(𝓁) = 𝟎, (b) fractions of significant diagonal elements of R̂(𝓁), and (c) fractions of significant
off-diagonal elements of R̂(𝓁).

1.4 NONSTATIONARY PROCESSES

A vector process zt is nonstationary if it fails to meet the stationarity conditions.


Simply put, if some joint distributions of zt change over time, then zt is nonstation-
ary. In this case, the moments of zt may depend on time. Since we typically have a
single observation at a given time index t, further assumptions are needed to ren-
der estimation possible for a nonstationary process. The most commonly employed
nonstationary processes are the integrated processes. Such processes are commonly
referred to as unit-root nonstationary series.
A scalar time series zt is called an integrated process of order 1 if its increment
𝑤t = zt − zt−1 is a stationary and invertible process. In the econometric literature, a
stationary and invertible process is called an I(0) process, and an integrated process
of order 1 is called an I(1) process. A random-walk series is an example of an I(1)
process. Let B denote the back-shift operator, such that Bzt = zt−1 . Then, zt is an
I(d) process if 𝑤t = (1 − B)^d zt is an I(0) process. Following the convention, we may
also use the notation ∇ = (1 − B) in describing integrated processes. The operator
(1 − B) is called the first-difference operator, and the process 𝑤t = (1 − B)zt is the
first-differenced series of zt .
In a similar way, a vector process zt is called an I(d) process if 𝒘t = (1 − B)^d zt is
a stationary and invertible series, where (1 − B)^d applies to each and every compo-
nent of zt . Of course, there is no particular reason to expect that all components of
zt are integrated of the same order d. Therefore, let d = (d1 , . . . , dk )′ , where di are
nonnegative integers and max{di } > 0. We call the vector process zt an I(d) process
if 𝒘t = (𝑤1t, . . . , 𝑤kt)′, where 𝑤it = (1 − B)^{di} zit, is a stationary and invertible process.
In this case, we may define D = diag{∇^{d1}, . . . , ∇^{dk}} and write 𝒘t = Dzt.
An important concept of the integrated vector process zt is co-integra-
tion. Suppose that the components zit are I(1) processes. If there exists a


Figure 1.15 Time plots of the first-differenced series of the 33 log monthly price indexes of
European countries from January 2000 to October 2015.

k-dimensional vector c = (c1 , . . . , ck )′ ≠ 𝟎 such that 𝑤t = c′ zt is an I(0) process,


then zt is called a co-integrated process and c is the co-integrating vector. The I(0)
process 𝑤t is the co-integrated series, which is stationary and invertible. In general,
one can have more than one co-integrated process for a k-dimensional unit-root
process zt . We shall discuss co-integration and its associated properties later.
An important class of nonstationary processes consists of seasonal time series. For
a seasonal time series, the mean function is not constant but varies over time according
to a certain periodic pattern. This pattern is repeated every season in a given year
such as every quarter, month, or week. The number of seasons in a year is referred to
as periodicity. For monthly series, the pattern is often repeated year after year and we
have a periodicity s = 12. For quarterly data, we have s = 4. Consider the 33 monthly
price indexes of Figure 1.7. All of the series exhibit an upward trend so that they are
not stationary. Figure 1.15 shows the time plots of the first-differenced series of the
log price indexes. From the plots, as expected, the trends disappear and the series
(i.e. inflation rate) now fluctuate around zero. Therefore, there is evidence that the
log price indexes are I(1) processes. The cyclic patterns shown in Figure 1.15 indicate
that some months have higher values than others so that the processes belong to the
class of seasonal time series. We shall discuss seasonal time series later. For now, it
suffices to introduce the idea of seasonal difference. The seasonal difference is given
by 𝑤t = (1 − B^s)zt = zt − zt−s, which denotes the annual increment. For convenience,
we refer to 1 − B as the regular difference.
Figure 1.16 shows the time plots of the regularly and seasonally differenced series
of the 40 price indexes of Figure 1.7. Specifically, the time series shown are 𝑤t =
(1 − B)(1 − B12 )zt . From the plots, the seasonal patterns are largely removed. A sea-
sonal time series that requires seasonal difference to achieve stationarity yet remains
invertible is referred to as a seasonally integrated process. The idea of co-integration
also generalizes to seasonal co-integration. Again, details are given in Chapter 3.
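In R, regular and seasonal differences are obtained with diff; the sketch below shows the operations behind Figures 1.15 and 1.16 in generic form, with z standing in for a single monthly log price index series (here simulated):

# Minimal sketch: regular difference (1 - B), seasonal difference (1 - B^12),
# and the combined difference (1 - B)(1 - B^12) for a monthly series z.
set.seed(5)
z <- cumsum(rnorm(200)) + sin(2 * pi * (1:200) / 12)  # placeholder trend plus seasonality
w_regular  <- diff(z)                   # (1 - B) z_t
w_seasonal <- diff(z, lag = 12)         # (1 - B^12) z_t = z_t - z_{t-12}
w_both     <- diff(diff(z, lag = 12))   # (1 - B)(1 - B^12) z_t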
Another type of nonstationarity that attracts much attention is the locally
stationary process. A stochastic process is locally stationary if its mean and variance
functions evolve slowly over time. A formal definition of such processes can be

found in Dahlhaus (2012). Different from the conventional time series analysis,
statistical inference on locally stationary processes is based on in-filled asymptotics.
To illustrate, consider the time-varying parameter autoregressive process

zt = 𝜙t zt−1 + 𝜎t 𝜖t , (1.20)

where {𝜖t } is a sequence of standard normal random variates, and 𝜙t and 𝜎t are
rescaled smooth functions such that 𝜙t ≡ 𝜙(t∕T) ∶ [0, 1] → (−1, 1) and 𝜎t ≡ 𝜎(t∕T) ∶
[0, 1] → (0, ∞), where T is the sample size. When T increases, the observations on
𝜙(t∕T) and 𝜎(t∕T) become denser in their domain [0,1] so that we can obtain bet-
ter estimates of the time-varying parameters. Figure 1.17 shows the time plot of a
realization with 300 observations from the time-varying parameter AR(1) model
of Eq. (1.20), where 𝜙(t∕T) = 0.2 + 0.4(t∕T) − 0.2(t∕T)² and 𝜎(t∕T) = exp(t∕T) with
T = 300. As expected, the series fluctuates around zero with an increasing variabil-
ity. The process is locally stationary because it can be approximated by a stationary
series within a small time interval. For instance, in this particular example, the first
120 observations in Figure 1.17 appear to be weakly stationary.
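The realization in Figure 1.17 can be reproduced, up to the random seed, with a few lines of R that simulate Eq. (1.20) with the stated 𝜙(t∕T) and 𝜎(t∕T):

# Minimal sketch: simulate the time-varying AR(1) of Eq. (1.20) with
# phi(u) = 0.2 + 0.4u - 0.2u^2 and sigma(u) = exp(u), where u = t/T.
set.seed(6)
TT  <- 300
u   <- (1:TT) / TT
phi <- 0.2 + 0.4 * u - 0.2 * u^2
sig <- exp(u)
z <- numeric(TT)
z[1] <- sig[1] * rnorm(1)
for (t in 2:TT) z[t] <- phi[t] * z[t - 1] + sig[t] * rnorm(1)
plot(z, type = "l", xlab = "Time", ylab = "z")  # variability increases with time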

1.5 PRINCIPAL COMPONENT ANALYSIS

The curse of dimensionality is a well-known difficulty in analyzing big dependent
data. This is so because it is hard to decipher useful information embedded in
high-dimensional time series. Some exploratory data analyses are helpful in this
situation. A powerful tool to explore the dynamic dependence in a vector time
series is the principal component analysis (PCA). Even though PCA was developed
primarily for independent data, it has been shown to be useful in time series analysis
too. See, for instance, Peña and Box (1987), Tsay (2014, chapter 6) and the references
therein. See also Taniguchi and Krishnaiah (1987) and Chang et al. (2014) for some
theoretical developments. However, there are differences between applying PCA
to independent data and to time series data. The main difference is that PCA is


Figure 1.16 Time plots of the first and seasonally differenced series of the 30 monthly price
indexes of European countries from January 2000 to October 2015.