STATISTICAL MODELING USING LOCAL GAUSSIAN APPROXIMATION
DAG TJØSTHEIM
HÅKON OTNEIM
BÅRD STØVE
Academic Press is an imprint of Elsevier
125 London Wall, London EC2Y 5AS, United Kingdom
525 B Street, Suite 1650, San Diego, CA 92101, United States
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States
The Boulevard, Langford Lane, Kidlington, Oxford OX5 1GB, United Kingdom
Copyright © 2022 Elsevier Inc. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or by any means,
electronic or mechanical, including photocopying, recording, or any information storage and
retrieval system, without permission in writing from the publisher. Details on how to seek
permission, further information about the Publisher’s permissions policies and our arrangements
with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency,
can be found at our website: www.elsevier.com/permissions.
This book and the individual contributions contained in it are protected under copyright by the
Publisher (other than as may be noted herein).
Notices
Knowledge and best practice in this field are constantly changing. As new research and experience
broaden our understanding, changes in research methods, professional practices, or medical
treatment may become necessary.
Practitioners and researchers must always rely on their own experience and knowledge in
evaluating and using any information, methods, compounds, or experiments described herein. In
using such information or methods they should be mindful of their own safety and the safety of
others, including parties for whom they have a professional responsibility.
To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors,
assume any liability for any injury and/or damage to persons or property as a matter of products
liability, negligence or otherwise, or from any use or operation of any methods, products,
instructions, or ideas contained in the material herein.
ISBN: 978-0-12-815861-6
Contents
Biography
Preface
1. Introduction
1.1. Computer code
References
3. Dependence
3.1. Introduction
3.2. Weaknesses of Pearson’s ρ
3.3. The copula
3.4. Global dependence functionals and tests of independence
3.5. Test functionals generated by local dependence relationships
References
Dag Tjøstheim
is Emeritus Professor, Department of Mathematics, University of Bergen.
He has a PhD in applied mathematics from Princeton University (1974).
He has authored more than 120 papers in international journals. He is a
member of the Norwegian Academy of Sciences and has received several
prizes for his scientific work. His main interests are in econometrics, non-
linear time series, nonparametric methods, modeling of dependence, spatial
variables, and fishery statistics.
Håkon Otneim
is Associate Professor at the Norwegian School of Economics. He has a
PhD in statistics from the University of Bergen (2016), and he has published
papers in international journals about multivariate density estimation and
conditional density estimation. His research interests include development
and application of nonparametric and semiparametric statistics, statistical
programming, and data visualization.
Bård Støve
is Professor of Statistics at the University of Bergen. He received his PhD
degree in statistics in 2005, was Assistant Professor at the Norwegian
School of Economics (2007–2011), and worked as an actuary in a consulting
firm (2005–2007). He has been working on the development of
nonparametric models and the application of such models to finance and
economics. He has published several research papers in journals such as
Econometric Theory and the Scandinavian Journal of Statistics.
Preface
correlation. The local Gaussian correlation is put directly into this context
in Chapter 4.
There is some intentional overlap between the chapters of the book, so that
readers can single out the chapters of primary interest to them. Most
chapters can be read independently of each other, as the basic material
from Chapter 4 is briefly included as introductory material in each of the
following chapters. The mathematical and technical level of each chapter is
quite modest. For readers with more interest in technical details, we give
references, often to the supplementary material of the papers on which the
book is based.
Three R packages have been developed for the various types of analysis in
the book. We do not present details of the use of these packages here, but
references to the packages are given in Chapter 1.
The local Gaussian approach is a recently developed methodology, and some
of the chapters are based on papers that have just appeared or are in the
process of appearing in journals. In collecting this material into a book,
the emphasis is on presenting the fundamental concepts inherent in a local
Gaussian approximation and on demonstrating their usefulness in several
areas of statistics. At the same time, we hope that the book may serve as a
starting point and inspiration for further research and applications, both
in the subject matters taken up in each chapter and in new subject areas.
The chapters of the book are primarily based on papers by the three authors,
but some chapters have also benefited from joint work and joint papers with
others, namely Karl Ove Hufthammer (Chapters 4 and 6), Geir Berentsen
(Chapters 5 and 7), Virginia Lacal (Chapter 7), Lars Arne Jordanger
(Chapter 8), Martin Jullum (Chapter 13), and Anders Sleire (parts of
Chapter 6). Without their contributions the book would not have been
possible in its present form, and we are very grateful to them for their
good work and cooperation on these subjects.
Dag Tjøstheim
Håkon Otneim
Bård Støve
Bergen, May 2021
CHAPTER 1
Introduction
Contents
1.1. Computer code
References
The multivariate Gaussian density is given by
\[
f(x) = (2\pi)^{-p/2}\,|\Sigma|^{-1/2}
\exp\!\left(-\tfrac{1}{2}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\right),
\]
where μ = {μi} and Σ = {σij}, i, j = 1, . . . , p, are the mean vector and
covariance matrix of X, respectively. Looking at this familiar expression,
it is easy to overlook its simplicity and elegance. Here we have a
distribution whose location is completely determined by its means μi, whose
scale is determined by the variances σii, and whose dependence relations
have the amazing property that they are completely determined by the
pairwise covariances σij.
Moreover, if X is subdivided into two components X = (X1, X2), then any
linear combination of X1 and X2 is again Gaussian, and the conditional
distribution fX1|X2(x1|x2) is Gaussian. The dependence properties of these
derived distributions are again determined by the pairwise covariances, in
the latter case through the partial covariances. These properties make the
Gaussian especially suitable for linear statistical modeling. Further, the
properties of the conditional distribution imply that in a Gaussian system
the optimal least squares predictor, given by the conditional mean, is
linear and equals the optimal linear predictor. Finally, uncorrelatedness
is equivalent to independence in the Gaussian distribution, that is, X1 and
X2 are independent if and only if they are uncorrelated. In this case, we
can test for independence by computing covariances.
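As a quick numerical illustration of the last two properties (a sketch in Python/NumPy rather than the R setting used elsewhere in the book; the means, variances, and evaluation point are arbitrary illustrative choices), the joint density of an uncorrelated bivariate Gaussian factorizes exactly into the product of its marginals:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# An uncorrelated bivariate Gaussian: sigma_12 = 0, so the covariance
# matrix is diagonal (variances 4 and 9, arbitrary illustrative values).
mu = np.array([1.0, -2.0])
cov = np.diag([4.0, 9.0])
joint = multivariate_normal(mean=mu, cov=cov)

# For the Gaussian, uncorrelatedness is equivalent to independence:
# the joint density factorizes into the product of the two marginals.
x = np.array([0.5, 1.5])
f_joint = joint.pdf(x)
f_prod = norm(1.0, 2.0).pdf(x[0]) * norm(-2.0, 3.0).pdf(x[1])
assert np.isclose(f_joint, f_prod)
```

For a nonzero σ12 the factorization fails, which is exactly what a covariance-based test of independence exploits.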
Unfortunately, data are not always well described by a Gaussian distribution
and a linear model. In particular, data in economics and finance are usually
governed by distributions having thicker tails, and their dependence
properties are not well described by pairwise covariances only,
Statistical Modeling using Local Gaussian Approximation Copyright © 2022 Elsevier Inc.
https://doi.org/10.1016/B978-0-12-815861-6.00008-0 All rights reserved. 1
ing between two time series. We compare with other tests, such as the
Brownian distance covariance, on both simulated and real data.
In Chapter 8 the time series framework is kept, but here we focus on the
local autocorrelation and the local spectrum that can be derived from it.
It is shown that frequency behavior that cannot be detected by ordinary
spectral analysis can be detected by the local spectrum. The chapter also
contains a brief review of alternative nonlinear spectral techniques.
Chapters 9 and 10 are devoted to density estimation and conditional
density estimation, respectively. The density estimation is the aspect stressed
by Hjort and Jones (1996) in their local parametric analysis. We carry this
through for the local Gaussian approximation of a density and compare
with other methods as the dimension increases. In the conditional density
estimation, we exploit locally the fact that the conditional density in a joint
Gaussian density framework is again a Gaussian density, where the local
mean vector and covariance matrix can be found by explicit formulas.
In a sense, testing for conditional independence is more important than
testing for independence, due among other things to its applications in
causality analysis. For globally Gaussian data, the partial correlation
coefficient is an important tool, for example in path analysis. In
Chapter 11, we introduce the local partial correlation and use it both for
measuring conditional dependence and for testing conditional independence.
We compare with alternative tests and give applications to testing Granger
causality.
Regression and conditional quantile estimation are covered in Chapter 12.
We note that the local Gaussian approach is primarily suited to a
situation where all the variables are treated on the same basis. It is perhaps
less well suited to a situation where there is one dependent variable and
one or several explanatory variables. Nevertheless, we show in this chapter
that the local Gaussian approximation can be applied and that in particular
cases it may offer an alternative to the additive approximation in regression
models.
The traditional Fisher discriminant for discriminating between two or
more populations is based on a Gaussian assumption. In Chapter 13, we
make the parameters of the Gaussian local and derive a local Gaussian Fisher
discriminant, which is applied to simulated and real data. It is easy to find
examples where the global Fisher discriminant does not work, whereas the
local one does.
References
Berentsen, G.D., Kleppe, T., Tjøstheim, D., 2014. Introducing localgauss, an R package for
estimating and visualizing local Gaussian correlation. Journal of Statistical Software 56
(12), 1–18.
Hjort, N., Jones, M., 1996. Locally parametric nonparametric density estimation. Annals of
Statistics 24 (4), 1619–1647.
Otneim, H., 2019. lg: Locally Gaussian distributions: estimation and methods. https://
CRAN.R-project.org/package=lg. R package version 0.4.1.
Otneim, H., 2021. lg: an R package for local Gaussian approximations. To
appear, The R Journal. URL: https://journal.r-project.org/archive/2021/RJ-2021-079/index.html.
R Core Team, 2017. R: A Language and Environment for Statistical Computing. R Foun-
dation for Statistical Computing, Vienna, Austria.
Taleb, N.N., 2007. The Black Swan: The Impact of the Highly Improbable. Random
House.
CHAPTER 2
Parametric, nonparametric, locally parametric
Contents
2.1. Introduction
2.2. Parametric density models
2.2.1 The Gaussian distribution
2.2.2 The elliptical distribution
2.2.3 The exponential family
2.3. Parametric regression models
2.3.1 Linear regression
2.3.2 Nonlinear regression and some further modeling aspects
2.4. Time series
2.5. Nonparametric density estimation
2.5.1 Nonparametric kernel density estimation
2.5.2 Bandwidth selection
2.5.3 Multivariate and conditional density estimation
2.6. Nonparametric regression estimation
2.6.1 Kernel regression estimation
2.6.2 Local polynomial estimation
2.6.3 Choice of bandwidth in regression
2.7. Fighting the curse of dimensionality
2.7.1 Additive models
2.7.2 Regression trees, splines, and MARS
2.8. Quantile regression
2.9. Semiparametric models
2.9.1 Partially linear models
2.9.2 Index models and projection pursuit
2.10. Locally parametric
References
2.1 Introduction
In statistical modeling, we have to choose between a parametric model
and the use of nonparametric statistics. A compromise is a semiparametric
model, where both aspects of modeling are taken into consideration.
For a parametric model, the mathematical form of the model and
relationships between stochastic variables entering the model and their dis-
tributions are explicitly stated and generally assumed to be known except
for a set of parameters. The parameters may, for instance, appear as coeffi-
cients in a linear regression or as parameters of a distribution function from
a certain class. Strictly speaking, a parametric model is never true, or in the
words of Box and Draper (1987, p. 424), “All models are wrong, but some
are useful.” A parametric model is often quite simple and has a
straightforward interpretation. In certain situations, however, parametric
models may lead us astray. When the model is seriously wrong, the accuracy
of the parameter estimates does not help. One well-known example is the
estimation of value-at-risk in financial markets. Financial crises have
sometimes been blamed on the use of Gaussian models in the tail of a
distribution when financial objects quite clearly have thicker tails. Using
the Gaussian distribution may lead to a disastrous underestimation of the
risk; see Taleb (2007). In such situations a parametric model acts as an
extremely inconvenient straitjacket.
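The underestimation of tail risk is easy to make concrete (an illustrative Python sketch, not from the book; the Student t distribution with 3 degrees of freedom stands in for a heavy-tailed return distribution and is compared with the standard normal at the same point, without rescaling to unit variance):

```python
from scipy.stats import norm, t

# Probability of an observation below -4 under a standard normal
# versus under a heavy-tailed t distribution with 3 degrees of freedom.
p_gauss = norm.cdf(-4.0)
p_t3 = t.cdf(-4.0, df=3)

# The Gaussian model understates this tail probability by a large factor.
assert p_t3 > 100 * p_gauss
print(p_gauss, p_t3, p_t3 / p_gauss)
```

Under the Gaussian such a loss is essentially impossible; under the heavy-tailed alternative it is merely rare, which is the discrepancy behind the value-at-risk example above.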
The purpose of a nonparametric approach is to let the data speak for
themselves, thereby preventing such situations from occurring. However,
this approach also has its disadvantages. First, the convergence rate of
nonparametric estimates is slower than the parametric rate. But perhaps the
most important obstacle is the curse of dimensionality: when we have a
moderate or large number of variables, the nonparametric approach does not
work in practice. It may still be possible to state convergence-rate
theorems for nonparametric estimates, but the rates are so slow that we
would need astronomically large sample sizes, which we typically do not
have, to come close to the true values. There are various ways of trying to
get around the curse, for example by imposing further restrictions such as
additivity in a regression context. We will come back to this on several
occasions later in this chapter and in later chapters of the book.
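The severity of the curse can be sketched under standard assumptions: for a second-order kernel density estimator of a twice-differentiable density, the optimal mean integrated squared error decays at the rate n^(-4/(4+d)) in d dimensions. The hypothetical calculation below (Python, not from the book) shows how many observations are needed in d dimensions to match the accuracy of 100 observations in one dimension:

```python
import math

# MISE of a second-order kernel density estimator decays like n^(-4/(4+d)).
# Matching the accuracy of n1 one-dimensional observations requires n_d with
#   n_d^(-4/(4+d)) = n1^(-4/5)   =>   n_d = n1^((4+d)/5).
def equivalent_sample_size(n1: int, d: int) -> int:
    return math.ceil(n1 ** ((4 + d) / 5))

for d in (1, 2, 5, 10):
    print(d, equivalent_sample_size(100, d))
```

The required sample size grows geometrically in d, which is the practical content of the curse.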
Another way of tackling the curse of dimensionality is to use a
semiparametric model, that is, a model where some parts are treated
parametrically and other parts nonparametrically. The implicit
understanding is that the nonparametric part is specified in such a way
that we avoid the curse of dimensionality. However, it may not be obvious
which parts of the model should be specified parametrically and which
should be treated in a nonparametric fashion.
The main philosophy of this book is to try to take advantage of the best
features of the nonparametric and parametric methodologies. We do this
by letting the parameters of a parametric model depend on the variables
involved, which is a local parametric approach advocated by, for example,
Hjort and Jones (1996) and Loader (1996). These two references treat lo-
\[
\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2
- 2\rho_{12}\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right)
+ \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2,
\]
where ρ12 = σ12/(σ1σ2) is the correlation between X1 and X2. It is well
known that uncorrelatedness does not imply independence in general, but for
the Gaussian, as is easy to check from the form of the distribution
function, uncorrelatedness and independence are equivalent. In a local
Gaussian approximation, this property is used to assess dependence by means
of the local correlation, and tests of independence are constructed by
accumulating the local Gaussian correlation; see Chapter 7.
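A standard textbook example (sketched in Python, not taken from the book) shows why accumulating a local measure can detect what the global correlation misses: with Y = X^2 and X symmetric about zero, the population correlation is zero even though Y is a deterministic function of X.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = x**2  # deterministic function of x, hence strongly dependent on x

# cov(X, X^2) = E[X^3] = 0 for a symmetric distribution, so Pearson's
# correlation is (close to) zero and a global covariance-based test sees nothing.
rho = np.corrcoef(x, y)[0, 1]
assert abs(rho) < 0.05
```

Locally, however, the relationship is strongly increasing for x > 0 and strongly decreasing for x < 0, which is exactly the kind of structure a local correlation can pick up.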
Another very important property of the Gaussian distribution is that
marginal distributions and conditional distributions are again Gaussian.
Let X ∼ N(μX, ΣXX) be a p-dimensional column vector, and let Y be a
q-dimensional column vector with mean μY = E(Y) and covariance matrix
\[
\mu_{2|1} = \mu_2 + \rho_{21}\,\frac{\sigma_2}{\sigma_1}\,(x_1-\mu_1)
\quad\text{and}\quad
\sigma_{2|1}^{2} = \left(1-\rho_{21}^{2}\right)\sigma_2^{2}.
\]
These formulas are starting points for defining the partial correlation func-
tion of two vectors X and Y given a third vector Z. In Chapter 11, we will
introduce local versions of these quantities by straight analogy and use them
in a description of conditional density functions and tests for conditional
independence. The local partial correlation function derived from the for-
mulas for the global Gaussian will play an essential role in these derivations.
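For jointly Gaussian variables the partial correlation is available in closed form. The sketch below (illustrative Python with an arbitrary positive definite correlation matrix, not code from the book) computes the partial correlation of X and Y given Z in two equivalent ways, via the standard formula and via the precision matrix:

```python
import numpy as np

# Correlation matrix for (X, Y, Z); the entries are arbitrary illustrative values.
R = np.array([[1.0, 0.5, 0.6],
              [0.5, 1.0, 0.7],
              [0.6, 0.7, 1.0]])

# Partial correlation of X and Y given Z via the standard formula ...
r_xy, r_xz, r_yz = R[0, 1], R[0, 2], R[1, 2]
pc_formula = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# ... and equivalently from the precision matrix (the inverse of R):
# rho_{XY.Z} = -P_xy / sqrt(P_xx * P_yy).
P = np.linalg.inv(R)
pc_precision = -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])

assert np.isclose(pc_formula, pc_precision)
```

Roughly speaking, the local partial correlation of Chapter 11 applies the same formulas to locally estimated correlations.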
Other useful properties of a multivariate Gaussian are its simple
transformation rules. If X is a multivariate Gaussian of dimension p, c is
a vector of scalars of dimension q, B is a q × p matrix of scalars, and
X ∼ N(μ, Σ), then Y = c + BX ∼ N(c + Bμ, BΣB^T). In particular, any linear
combination of the components of X is again normally distributed. We will
make use of this in Section 6.4 on nonlinear local portfolio construction.
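The transformation rule is easy to check by simulation (an illustrative Python sketch with arbitrary μ, Σ, c, and B, not code from the book):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.4],
                  [0.0, 0.4, 1.5]])
c = np.array([1.0, -1.0])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])

# Y = c + BX is Gaussian with mean c + B mu and covariance B Sigma B^T.
mean_Y = c + B @ mu
cov_Y = B @ Sigma @ B.T

# Monte Carlo check of the moment part of the transformation rule.
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = c + X @ B.T
assert np.allclose(Y.mean(axis=0), mean_Y, atol=0.05)
assert np.allclose(np.cov(Y, rowvar=False), cov_Y, atol=0.1)
```

The same calculation with B a 1 × p row vector gives the distribution of a single linear combination, the case used for portfolio returns.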
The estimation of the parameters μ and Σ in (2.1) can be done by
maximizing the log likelihood function, which becomes very simple for
the density function (2.1). The analogue for estimating a local mean and
a local covariance is a local log likelihood function, as explained in some
detail in Chapter 4.
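For the global Gaussian the maximization has a well-known closed-form solution: the sample mean and the 1/n-normalized sample covariance matrix. A minimal simulation check (illustrative Python; the local version of Chapter 4 replaces this with a kernel-weighted local log likelihood):

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([1.0, -1.0])
Sigma_true = np.array([[1.0, 0.5],
                       [0.5, 2.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=50_000)

# The Gaussian log likelihood is maximized by the sample mean and the
# sample covariance matrix normalized by n (not n - 1).
n = X.shape[0]
mu_hat = X.mean(axis=0)
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / n

assert np.allclose(mu_hat, mu_true, atol=0.05)
assert np.allclose(Sigma_hat, Sigma_true, atol=0.05)
```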