
Random notes on kriging:

an introduction to geostatistical
interpolation for environmental
applications

Luca Bonaventura
Stefano Castruccio
MOX - Laboratorio di Matematica Applicata
Dipartimento di Matematica
Politecnico di Milano
luca.bonaventura@polimi.it
Contents

1 The estimation of spatially distributed and uncertain data

2 Basic definitions on random fields
   2.1 Finite dimensional distributions
   2.2 First and second order moments and variograms of random fields
   2.3 Analysis of random fields
   2.4 Definitions of stationarity of random fields
   2.5 Characterization and representation theorems for variogram functions
   2.6 Measurement error and subgrid scales: the nugget effect
   2.7 Isotropic variogram models

3 Variogram estimation
   3.1 Empirical variogram estimators
   3.2 Least squares variogram fitting procedures

4 Spatial prediction and kriging
   4.1 Ordinary kriging
   4.2 Universal kriging

5 Appendix: Basics of random variables

References
Introduction

The purpose of these notes is to provide a short and self-contained introduction to the literature on geostatistical interpolation of scattered data. The existing literature on this topic has a very broad scope and presents the related issues from a wide range of rather different perspectives, motivated by highly specific applications, e.g. in mining, groundwater flow modelling, oceanography, meteorology. The aim of this introduction is to summarize in a consistent way the basic terminology and the key theoretical concepts underlying the practice of geostatistical interpolation and to present the derivation of the most widely used kriging estimators.

There is no attempt at a complete presentation of the underlying theories or methods, which is available in a number of well known publications. For a more complete description of the statistical techniques surveyed here, the reader is referred, among many others, to the presentations in [5], [9], [12], [13]. A more advanced presentation of the same material for readers with a good background in mathematical statistics can be found in [4].
There is also no attempt at achieving a high standard of mathematical rigour
in the formulation of the definitions and theorems. The reader interested in the
complete presentation of the measure theoretic problems associated with proba-
bility spaces, random variables and random fields should consult textbooks such
as [2]. A basic introduction to probability theory and mathematical statistics can
be found e.g. in [10].

Chapter 1

The estimation of spatially distributed and uncertain data

Consider $N$ points $x_i$, $i = 1, \ldots, N$ in the vector space $\mathbb{R}^d$. At these locations, data $z_i$, $i = 1, \ldots, N$ are assumed to be known. These data are interpreted as the values of a field $z$, whose value depends on the position in space. In general, the points $x_i$ will be scattered disorderly in space, rather than aligned on a regular grid. Furthermore, the data are assumed to be affected by some uncertainty, due either to measurement error, or to the fact that the quantity $z$ depends on some unpredictable physical process, or both.

Definition 1 (Geostatistical interpolation) Given the $N$ points $x_i$, $i = 1, \ldots, N$ and the uncertain data $z_i$, $i = 1, \ldots, N$, the geostatistical interpolation problem consists of

• predicting the most appropriate value $z_0$ for the quantity $z$ at a point $x_0$, different from the points associated to the available data;

• estimating the uncertainty of the prediction $z_0$ as a function of the uncertainty on the available data $z_i$, $i = 1, \ldots, N$ and of their correlation structure.

The geostatistical interpolation problem is quite different from the classical interpolation problem. In classical interpolation, the data $z_i$ are assumed to be sampled from a function $z(x)$, which is reconstructed from the data under some assumption on the nature of the interpolating function $\hat z$. Typically, for classical Lagrange interpolation one assumes that the function $\hat z$ is a polynomial (see e.g. [11]), while in the case of Radial Basis Function interpolation (which is quite useful for deterministic interpolation from scattered data and has many technical similarities with kriging as far as the formulation of the interpolation problem is concerned, see e.g. [3]) the interpolator is assumed to be a linear combination of shape functions with particular properties. Furthermore, the approximation error depends on the regularity of the underlying function $z$ and of its derivatives. On the other hand, geostatistical interpolators do not depend in general on the regularity of $z$ and do not in general yield regular reconstructions, apart from the fact that, if measurement errors and subgrid effects are disregarded, an exact interpolation condition holds at the points $x_i$, $i = 1, \ldots, N$.
Chapter 2

Basic definitions on random fields

Definition 2 A random field is a function $Z = Z(\omega, x)$ which prescribes a real number $Z$ for each couple $(\omega, x)$, where $\omega$ is an event in a probability space $(\Omega, P)$ and $x \in \mathbb{R}^d$ (in the following, the dependence on $\omega$ will often be omitted for the sake of simplifying the notation).

Thus, a random field is a function of several real variables which also happens to depend on elements of a probability space. A short review of the basic properties of these mathematical objects will show how they combine the peculiarities of random variables and scalar fields on $\mathbb{R}^d$. Concepts from both analysis and probability theory are necessary for a proper description of their behaviour.

2.1 Finite dimensional distributions


From a probabilistic viewpoint, the behaviour of a random field is completely determined if it is known how to compute the probabilities
\[
P\big[Z(x_1) \in (a_1, b_1), \ldots, Z(x_N) \in (a_N, b_N)\big], \tag{2.1}
\]
where $a_i$, $b_i$ denote the extremes of arbitrary intervals on the real line. For each $N$ and each set of $N$ points $x_i$, $i = 1, \ldots, N$, the probabilities (2.1) define uniquely a set of values $P_{(x_1, \ldots, x_N)}[(a_1, b_1), \ldots, (a_N, b_N)]$ which identifies a probability distribution on $\mathbb{R}^N$. These probability distributions are called finite dimensional distributions of the random field $Z$. It should be observed that the quantities (2.1) are symmetric with respect to permutations of the set of points $x_i$, $i = 1, \ldots, N$.

In the case of random fields with continuous finite dimensional distributions, to compute (2.1) it is sufficient to prescribe for each $N$ and each set of $N$ points $x_i$, $i = 1, \ldots, N$ a probability density $f_Z(u) = f_{(Z(x_1), \ldots, Z(x_N))}(u_1, \ldots, u_N)$.


Theorem 1 (Kolmogorov) A set of probability distributions on $\mathbb{R}^N$, defined as $P_{(x_1, \ldots, x_N)}([a_1, b_1], \ldots, [a_N, b_N])$ for $N \geq 1$ and symmetric with respect to permutations of the set of points $x_i$, $i = 1, \ldots, N$, determines uniquely the probability of any event associated with the random field if one assumes
\[
P_{(x_1, \ldots, x_N)}([a_1, b_1], \ldots, [a_N, b_N]) = P\big[Z(x_1) \in [a_1, b_1], \ldots, Z(x_N) \in [a_N, b_N]\big].
\]

An important example are gaussian random fields, for which the finite dimensional distributions are defined by multidimensional gaussian distributions, whose densities are given for a generic set of points $x_i$, $i = 1, \ldots, N$ by
\[
f_Z(u) = \frac{1}{\sqrt{(2\pi)^N \det(A)}} \exp\Big\{-\frac{(u - m)^T A^{-1} (u - m)}{2}\Big\} \tag{2.2}
\]
where $m = (m(x_1), \ldots, m(x_N))$ is a vector of space dependent quantities and $A = A_{x_1, \ldots, x_N}$ is a symmetric, positive definite matrix.
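Since the finite dimensional distributions of a gaussian random field are fully specified by the vector $m$ and the matrix $A$, a realization of the field at a finite set of points can be drawn by factorizing $A$. The following minimal sketch (not part of the original notes; it assumes Python with numpy, and uses an exponential covariance of the type discussed in section 2.7 purely as an illustration) shows the standard Cholesky approach:

```python
import numpy as np

rng = np.random.default_rng(0)

# N scattered points in the unit square (d = 2)
N = 200
points = rng.uniform(0.0, 1.0, size=(N, 2))

# Illustrative choice of covariance matrix A: exponential decay with range 0.3
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
A = np.exp(-dists / 0.3)

# Constant mean m(x) = 1, purely for illustration
m = np.ones(N)

# Draw Z ~ N(m, A) through the Cholesky factor of A;
# the small jitter on the diagonal is a standard numerical safeguard
L = np.linalg.cholesky(A + 1e-10 * np.eye(N))
z = m + L @ rng.standard_normal(N)
```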

2.2 First and second order moments and variograms of random fields

The average and the variance of a random field are defined as usual for random variables:
\[
m(x) = E\big[Z(x)\big] = \int_{-\infty}^{+\infty} u f_{Z(x)}(u)\, du \tag{2.3}
\]
\[
Var\big[Z(x)\big] = \sigma_Z^2(x) = E\Big[\big(Z(x) - m(x)\big)^2\Big] = \int_{-\infty}^{+\infty} (u - m(x))^2 f_{Z(x)}(u)\, du. \tag{2.4}
\]
The computation of mean and variance only involves the one dimensional distributions. Other quantities such as the covariance require instead two dimensional finite distributions:
\[
Cov\big[Z(x), Z(y)\big] = E\Big[\big(Z(x) - m(x)\big)\big(Z(y) - m(y)\big)\Big]. \tag{2.5}
\]
The covariance is defined if the first and second order moments of the random field exist. In the case of Gaussian random fields whose finite dimensional distributions are described by equation (2.2), the vector $m = (m(x_1), \ldots, m(x_N))$ has indeed as components the mean values of the field at locations $x_1, \ldots, x_N$, while the matrix $A$ is such that $a_{i,j} = Cov[Z(x_i), Z(x_j)]$.
A very important quantity, which plays a key role in the development of statistical interpolators, is the variogram.

Definition 3 (Variogram) The variogram of a random field $Z(x)$ is defined as
\[
Var\big[Z(x) - Z(y)\big].
\]
The quantity
\[
\gamma(x, y) = \frac{1}{2} Var\big[Z(x) - Z(y)\big]
\]
is called semivariogram of $Z$. If $Z$ has constant mean, the semivariogram is defined equivalently as
\[
\gamma(x, y) = \frac{1}{2} E\Big[\big(Z(x) - Z(y)\big)^2\Big].
\]

If a random field has second order moments, both variogram and covariance exist and there is a simple relationship between them:
\[
Var\big[Z(x) - Z(y)\big] = Var\big[Z(x)\big] + Var\big[Z(y)\big] - 2\, Cov\big[Z(x), Z(y)\big]. \tag{2.6}
\]
Higher order moments can also be defined as done for standard random variables. However, in practice they are quite difficult to estimate from the data, and in many applications estimation and inference are only feasible for first and second order moments.

2.3 Analysis of random fields


If a random field $Z(\omega, x)$ is considered as a function of the spatial variable, a number of the usual analysis concepts (limit, continuity, derivative) can be introduced. This, however, can be done in different ways depending on how the dependency on the probability space is dealt with. Various concepts of limit are given here for the (spatially) pointwise convergence of a sequence of random fields $Z_n(\omega, x)$, $n = 1, \ldots, \infty$. The same definitions can be extended to different types of convergence in the spatial variable. Furthermore, based on these limit concepts, the continuity and differentiability of random fields can also be defined accordingly.

Definition 4 (Pointwise convergence in probability) The sequence $Z_n(\omega, x)$, $n = 1, \ldots, \infty$ converges pointwise in probability to $Z(\omega, x)$ if for any $\epsilon > 0$ and for any $x \in \mathbb{R}^d$ one has
\[
\lim_{n \to \infty} P\big[|Z_n(\omega, x) - Z(\omega, x)| > \epsilon\big] = 0. \tag{2.7}
\]

Definition 5 (Convergence with probability one) The sequence $Z_n(\omega, x)$, $n = 1, \ldots, \infty$ converges pointwise with probability one to $Z(\omega, x)$ if for any $x \in \mathbb{R}^d$ one has
\[
P\Big[\lim_{n \to \infty} |Z_n(\omega, x) - Z(\omega, x)| = 0\Big] = 1. \tag{2.8}
\]

Definition 6 (Convergence in mean square sense) The sequence $Z_n(\omega, x)$, $n = 1, \ldots, \infty$ converges pointwise in mean square sense to $Z(\omega, x)$ if for any $x \in \mathbb{R}^d$ one has
\[
\lim_{n \to \infty} E\big[|Z_n(\omega, x) - Z(\omega, x)|^2\big] = 0. \tag{2.9}
\]

These convergence concepts are not independent of each other: for example, both convergence in mean square sense and convergence with probability one imply convergence in probability.
An important result relating the continuity of a random field to the properties
of its second order moments is the following:
Theorem 2 (Continuity of random fields) If there is a $\beta > 0$ such that
\[
E\Big[\big(Z(x) - Z(y)\big)^2\Big] \leq C \|x - y\|^{2d + \beta},
\]
then the random field $Z(x)$ is continuous with probability one.


Proof: See e.g. [1].
This theorem implies that the specific features of the variogram function have a relevant impact on the regularity of the field as a function of the spatial variables.

2.4 Definitions of stationarity of random fields


Geostatistical interpolation, as will be seen later, can in general be introduced independently of any hypothesis on the nature of the random field. However, in order to achieve an acceptable estimate of the semivariogram without requiring an amount of data much larger than what is usually available (especially in underground flow or mining applications), some restrictions on the nature of the allowed random fields are necessary. Similar restrictions are also introduced for either conceptual or practical reasons in other areas in which random fields are applied.

Definition 7 (Stationary random fields) A random field is called stationary if for any vector $h \in \mathbb{R}^d$ and for any set of points $x_i$, $i = 1, \ldots, N$ one has
\[
P\big[Z(x_1 + h) \in [a_1, b_1], \ldots, Z(x_N + h) \in [a_N, b_N]\big] = P\big[Z(x_1) \in [a_1, b_1], \ldots, Z(x_N) \in [a_N, b_N]\big]. \tag{2.10}
\]

The stationarity property can also be summarized by saying that the finite dimensional distributions of a stationary field are translation invariant. As a consequence, all the single site moments $E[Z(x)^k]$, $k \geq 1$ are constant. If they exist, both covariance and semivariogram only depend on the difference between the two locations at which $Z$ is evaluated.

Definition 8 (Intrinsically stationary random fields) A random field is called intrinsically stationary if the field semivariogram is only a function of the difference between the two positions at which the increment is computed, that is, if there exists a real scalar field $\gamma$ on $\mathbb{R}^d$ such that
\[
\gamma(x, y) = \gamma(x - y). \tag{2.11}
\]

In general, the class of intrinsically stationary random fields is much larger than that of stationary random fields. Furthermore, a stationary field is also intrinsically stationary.

Definition 9 (Second order stationary random fields) A random field is called second order stationary if the field covariance exists and is only a function of the difference between the two positions at which it is computed, that is, if there exists a real scalar field $C$ on $\mathbb{R}^d$ such that
\[
C(x, y) = C(x - y). \tag{2.12}
\]

If a field $Z$ has finite second order moments that are constant in space, the properties (2.11) and (2.12) are equivalent, since one can use equation (2.6) to obtain
\[
2\gamma(x - y) = 2\, Var\big[Z(0)\big] - 2\, Cov\big[Z(x), Z(y)\big], \tag{2.13}
\]
that is, $\gamma(h) = C(0) - C(h)$ with $h = x - y$.

Definition 10 (Increment stationary random fields) A random field is called increment stationary if the field of increments $Z(x) - Z(0)$ is stationary.

Increment stationary random fields are also intrinsically stationary.



Definition 11 (Isotropic random fields) An intrinsically (second order) stationary random field is called isotropic if, for any $x, y \in \mathbb{R}^d$, the semivariogram (covariance) only depends on the euclidean norm of the difference between the two points, that is, $\gamma(x, y) = \gamma(\|x - y\|)$.

Some special cases of anisotropy can be handled more easily, as in the case of

Definition 12 (Geometrically anisotropic random fields) An intrinsically stationary random field is geometrically anisotropic if its semivariogram is given by
\[
\gamma(x, y) = \gamma^o(\|A(x - y)\|),
\]
where $A$ is a $d \times d$ matrix.

In the case of gaussian random fields, stationarity and second order stationarity coincide, since the finite dimensional distributions of the field are entirely determined by the mean and the covariance function.

2.5 Characterization and representation theorems for variogram functions

In geostatistical interpolation, variograms have in general to be estimated from the data. In order to reconstruct their functional form, however, it is necessary to take into account that variograms belong to a special class of functions that will now be defined. If this fact is disregarded, serious inconsistencies may arise when using estimated variograms which do not belong to this class, such as for example negative values for positive quantities such as the kriging variance.

Definition 13 (Conditionally negative definite functions) A function $\phi(x, y)$ is called conditionally negative definite if for any $N \geq 2$, given $x_i \in \mathbb{R}^d$, $i = 1, \ldots, N$ and any set of real numbers $\alpha_i$, $i = 1, \ldots, N$ such that
\[
\sum_{i=1}^{N} \alpha_i = 0,
\]
one has
\[
\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \phi(x_i, x_j) \leq 0.
\]

Theorem 3 (Conditional negative definiteness of variograms) The semivariogram of an intrinsically stationary random field is a conditionally negative definite function.

Proof: Let $\alpha_i$, $i = 1, \ldots, N$ be such that $\sum_{i=1}^{N} \alpha_i = 0$ and assume that $Z$ is an intrinsically stationary random field. Given $x_i \in \mathbb{R}^d$, $i = 1, \ldots, N$, one has
\[
\Big( \sum_{i=1}^{N} \alpha_i Z(x_i) \Big)^2 = -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \big( Z(x_i) - Z(x_j) \big)^2, \tag{2.14}
\]
since $\sum_{i=1}^{N} \alpha_i = 0$. Taking the expected value one obtains
\[
\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \, 2\gamma(x_i - x_j) = -2\, Var\Big( \sum_{i=1}^{N} \alpha_i Z(x_i) \Big) \leq 0. \tag{2.15}
\]

Conditionally negative definite functions can be characterized as follows.

Theorem 4 Let $\gamma(\cdot)$ be a continuous function on $\mathbb{R}^d$ such that $\gamma(0) = 0$. The following statements are equivalent:

• $\gamma(\cdot)$ is conditionally negative definite;

• for any $a > 0$, $\exp(-a\gamma(\cdot))$ is positive definite;

• there exist a quadratic form $Q(\cdot) \geq 0$ and a positive measure $G(\cdot)$, symmetric, continuous at the origin and satisfying $\int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} (1 + \|\omega\|^2)^{-1}\, G(d\omega) < +\infty$, such that
\[
\gamma(h) = Q(h) + \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{1 - \cos(\omega' h)}{\|\omega\|^2}\, G(d\omega). \tag{2.16}
\]

As a result, one obtains the following representation theorem.

Theorem 5 (Schoenberg-Yaglom) A continuous function $\phi(x, y)$ that is conditionally negative definite and such that $\phi(x, x) = 0$ is the variogram of an intrinsically stationary random field.

Proof: Define the random field
\[
Z(s) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{e^{i\omega' s} - 1}{\|\omega\|}\, W(d\omega), \tag{2.17}
\]
where $\{W(s) : s \in \mathbb{R}^d\}$ is a complex valued zero mean random field with independent increments and such that $E(|W(d\omega)|^2) = G(d\omega)/2$. One then has
\[
Z(s + h) - Z(s) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{i\omega' s}\, W_h^*(d\omega), \tag{2.18}
\]
where $W_h^*$ is the independent increment field such that
\[
E(|W_h^*(d\omega)|^2) = G_h^*(d\omega) = \int_{-\infty}^{\omega_1} \cdots \int_{-\infty}^{\omega_d} \frac{1 - \cos(\nu' h)}{\|\nu\|^2}\, G(d\nu). \tag{2.19}
\]
The random field defined by (2.17) then has semivariogram given by
\[
\gamma(h) = \frac{1}{2} \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} \frac{1 - \cos(\omega' h)}{\|\omega\|^2}\, G(d\omega), \tag{2.20}
\]
which is in the form of equation (2.16) with $Q(h) = 0$.

A consequence of these representation theorems is that, given any set of semivariograms $\gamma_i$, $i = 1, \ldots, m$ and non negative coefficients $\alpha_i$, $i = 1, \ldots, m$, the linear combination $\gamma = \sum_{i=1}^{m} \alpha_i \gamma_i$ is also the semivariogram of an intrinsically stationary process.

Functions that satisfy the hypotheses of theorem 4 are also called admissible or valid variogram functions. Similar representation theorems can also be derived for covariograms, based on the concept of conditionally positive definite function. For second order stationary fields the related representation theorems are entirely equivalent.

2.6 Measurement error and subgrid scales: the nugget effect

It is clear from definition 3 that for a stationary (in any sense) random field one has $\gamma(0) = 0$. If the variogram is assumed to be continuous at the origin, it will be seen in the following that the geostatistical interpolation procedure yields an exact interpolation of the known data at the points where the field has effectively been sampled. This is not appropriate in many cases, for two reasons. On one hand, it does not allow the inclusion of measurement error among the uncertainties that affect the data: measurement error is in general assumed to be spatially uncorrelated and should not affect the structure of the variogram for values of $h$ different from zero. Another important effect that is not taken into account if the variogram is assumed to be continuous is the so called nugget effect, i.e. the possibility of sudden jumps in field values on spatial scales that have not been completely sampled by the available data. In many applications, it is necessary to allow for the possibility that, even very close to a sampled point, the reconstructed random field can take rather different values in a way that is effectively independent of the sampled value.
Both these effects, although conceptually quite different, can be effectively described by allowing the variogram to be discontinuous at the origin. In particular, if $\lim_{h \to 0} \gamma(h) = c_0$ with $c_0$ different from zero, the variogram is said to display the nugget effect. A complete proof of the formal equivalence of the nugget effect and the inclusion of measurement errors can be found in [4].

2.7 Isotropic variogram models

A number of isotropic variogram models have been widely used in the applications. In all these examples, we denote semivariograms by $\gamma_\theta(\cdot)$, where $\theta$ represents the vector of free parameters that fully determine the variogram shape. For the variogram models we consider, it will often be the case that $\theta = (c_0, c_1, c_2)$, where $c_0$ is the nugget parameter, i.e. the non zero limit $\lim_{h \to 0} \gamma(h) = c_0$ in case the variogram model is assumed to be discontinuous at the origin, $c_1$ is the so called sill parameter, related to the limit value $\lim_{h \to +\infty} \gamma(h) = c_0 + c_1$ whenever this limit is finite, and $c_2$ is the range, i.e. the typical spatial scale associated to significant changes in the variogram function. It is to be remarked that for some authors the range denotes instead the maximum distance beyond which the correlation between two different field values is zero; we use here a more general definition.

Definition 14 (Power law model) The power law variogram is given by
\[
\gamma_\theta(h) = \begin{cases} 0, & h = 0, \\ c_0 + c_1 |h|^\lambda, & h \neq 0, \end{cases} \tag{2.21}
\]
with $\theta = (c_0, c_1)$ and $c_0, c_1 \geq 0$. The particular case $\lambda = 1$ is also known as the linear variogram model.

In order to satisfy the requirements for admissible variograms described in section 2.5, it must be assumed that $0 < \lambda < 2$. For this variogram model, $\lim_{h \to +\infty} \gamma(h) = +\infty$, so that the variogram does not have a sill, does not define an associated covariogram, and the associated random field does not have a spatial scale on which correlations decay.

Definition 15 (Exponential model) The exponential variogram model is given by
\[
\gamma_\theta(h) = \begin{cases} 0, & h = 0, \\ c_0 + c_1 \big( 1 - \exp(-|h|/c_2) \big), & h \neq 0, \end{cases} \tag{2.22}
\]
where $\theta = (c_0, c_1, c_2)$, $c_i \geq 0$ for $i = 0, 1, 2$.



Definition 16 (Gaussian model) The gaussian variogram model is defined by
\[
\gamma_\theta(h) = \begin{cases} 0, & h = 0, \\ c_0 + c_1 \big( 1 - \exp(-|h|^2/c_2^2) \big), & h \neq 0, \end{cases} \tag{2.23}
\]
with $\theta = (c_0, c_1, c_2)$, $c_i \geq 0$ for $i = 0, 1, 2$.

It should be remarked that random fields with Gaussian variogram need not be Gaussian random fields. Gaussian variograms imply very smooth random fields, which are often not realistic for many practical applications.

Definition 17 (Spherical model) The spherical model is defined by
\[
\gamma_\theta(h) = \begin{cases} 0, & h = 0, \\ c_0 + c_1 \Big( \dfrac{3}{2} \dfrac{|h|}{c_2} - \dfrac{1}{2} \dfrac{|h|^3}{c_2^3} \Big), & 0 < h \leq c_2, \\ c_0 + c_1, & h > c_2, \end{cases} \tag{2.24}
\]
with $\theta = (c_0, c_1, c_2)$, $c_i \geq 0$ for $i = 0, 1, 2$.

This formula defines a valid variogram only if $h$ is the absolute value of a vector in $\mathbb{R}^2$ or $\mathbb{R}^3$.
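For reference, the three bounded models above translate directly into code. The following is a minimal sketch in Python with numpy (the function names are ours, not part of the original notes); each function enforces $\gamma_\theta(0) = 0$ explicitly, reflecting the piecewise definitions (2.22)-(2.24):

```python
import numpy as np

def exponential_variogram(h, c0, c1, c2):
    """Exponential model (2.22): nugget c0, sill c0 + c1, range parameter c2."""
    h = np.asarray(h, dtype=float)
    gamma = c0 + c1 * (1.0 - np.exp(-h / c2))
    return np.where(h == 0.0, 0.0, gamma)   # gamma(0) = 0 by definition

def gaussian_variogram(h, c0, c1, c2):
    """Gaussian model (2.23): implies very smooth fields."""
    h = np.asarray(h, dtype=float)
    gamma = c0 + c1 * (1.0 - np.exp(-(h / c2) ** 2))
    return np.where(h == 0.0, 0.0, gamma)

def spherical_variogram(h, c0, c1, c2):
    """Spherical model (2.24): reaches the sill c0 + c1 exactly at h = c2."""
    h = np.asarray(h, dtype=float)
    inside = c0 + c1 * (1.5 * h / c2 - 0.5 * (h / c2) ** 3)
    gamma = np.where(h <= c2, inside, c0 + c1)
    return np.where(h == 0.0, 0.0, gamma)
```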
Chapter 3

Variogram estimation

In order to estimate the variogram of an intrinsically stationary random field from the available data, several variogram estimators have been introduced, which are used to derive the so called empirical variogram, i.e. a discrete set of values to which an admissible variogram model can then be fitted. For the purposes of this presentation, we will restrict the attention to isotropic random fields, although similar considerations can be carried out in the anisotropic case.

3.1 Empirical variogram estimators

A finite set of positive values $h_k$, $k = 1, \ldots, K$ is introduced. These values are assumed to be ordered so that $h_k < h_{k+1}$, and they are interpreted as absolute distances from the origin. We also introduce the positive values $\delta_k$, $k = 1, \ldots, K$ so that the intervals $[h_k - \frac{\delta_k}{2}, h_k + \frac{\delta_k}{2}]$ are mutually disjoint and cover completely the interval $[0, h_K + \frac{\delta_K}{2}]$. These values can be used to define the distance classes
\[
\mathcal{N}(h_k) = \Big\{ (x_i, x_j) : h_k - \frac{\delta_k}{2} \leq \|x_i - x_j\| < h_k + \frac{\delta_k}{2} \Big\}. \tag{3.1}
\]
Here, $x_i \in \mathbb{R}^d$, $i = 1, \ldots, N$ denotes as in the previous chapters the points at which the data are available, so that class $\mathcal{N}(h_k)$ includes all pairs of measurement points whose mutual distance falls in the interval $[h_k - \frac{\delta_k}{2}, h_k + \frac{\delta_k}{2})$. In the following, $N(h_k) = |\mathcal{N}(h_k)|$ will denote the cardinality of class $\mathcal{N}(h_k)$. In general, it is required that the distance classes are sufficiently populated for the variogram estimation to be significant. For example, [5] suggests that $N(h_k) \geq 30$. In case this condition is not satisfied, new values of $h_k$ should be chosen to guarantee the significance of the variogram estimation.
The classical Matheron estimator is defined for $k = 1, \ldots, K$ as
\[
\hat\gamma^M(h_k) = \frac{1}{2 N(h_k)} \sum_{(x_i, x_j) \in \mathcal{N}(h_k)} \big( Z(x_i) - Z(x_j) \big)^2. \tag{3.2}
\]
This is the most straightforward form of a variogram estimator and it has been widely applied, see e.g. [5], [6], [7], [12].
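As an illustration, a minimal implementation of the Matheron estimator in Python with numpy might look as follows (function and variable names are ours; each unordered pair is counted once, so halving the average of the squared increments reproduces formula (3.2)):

```python
import numpy as np

def matheron_variogram(x, z, bin_edges):
    """Empirical semivariogram (3.2) on the distance classes [lo, hi)
    defined by consecutive entries of bin_edges."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    iu = np.triu_indices(len(z), k=1)        # each unordered pair (i, j) once
    dist = d[iu]
    sq_incr = (z[iu[0]] - z[iu[1]]) ** 2
    gamma_hat, counts = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (dist >= lo) & (dist < hi)
        counts.append(mask.sum())
        # 0.5 * mean of squared increments is equivalent to formula (3.2)
        gamma_hat.append(0.5 * sq_incr[mask].mean() if mask.any() else np.nan)
    return np.array(gamma_hat), np.array(counts)
```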
One problem with the Matheron estimator is that it can be very sensitive to the presence of outliers in the data. A more robust estimator was proposed by Cressie and Hawkins in [8]. This is defined for $k = 1, \ldots, K$ as
\[
\hat\gamma^C(h_k) = \frac{1}{2 \Big( 0.457 + \dfrac{0.494}{N(h_k)} \Big)} \left( \frac{1}{N(h_k)} \sum_{(x_i, x_j) \in \mathcal{N}(h_k)} |Z(x_i) - Z(x_j)|^{1/2} \right)^4. \tag{3.3}
\]
This choice can be explained as follows: for gaussian random fields, $(Z(x_i) - Z(x_j))^2$ is a random variable with a $\chi_1^2$ distribution, i.e. with one degree of freedom. For this type of variable, it can be seen heuristically that raising to the power $1/4$ is the transformation that yields a distribution most similar to a normal distribution, and it can be proven that the values $|Z(x_i) - Z(x_j)|^{1/2}$ are less correlated among themselves than the values $|Z(x_i) - Z(x_j)|^2$.
Another alternative is the estimator
\[
\hat\gamma^{med}(h) = \frac{1}{2 B(h)} \Big( \mathrm{med}\big\{ |Z(x_i) - Z(x_j)|^{1/2} : (x_i, x_j) \in \mathcal{N}(h) \big\} \Big)^4, \tag{3.4}
\]
where $\mathrm{med}\{\cdot\}$ denotes the median of the values in brackets and $B(h)$ is a bias corrector that tends to the asymptotic value $0.457$.
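A sketch of the Cressie-Hawkins estimator (3.3), in the same setting and with the same caveats as the Matheron code above:

```python
import numpy as np

def cressie_hawkins_variogram(x, z, bin_edges):
    """Robust semivariogram estimator (3.3)."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    iu = np.triu_indices(len(z), k=1)
    dist = d[iu]
    root_incr = np.abs(z[iu[0]] - z[iu[1]]) ** 0.5   # fourth-root transform
    gamma_hat = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (dist >= lo) & (dist < hi)
        n = mask.sum()
        if n == 0:
            gamma_hat.append(np.nan)
            continue
        fourth_power = root_incr[mask].mean() ** 4
        gamma_hat.append(fourth_power / (2.0 * (0.457 + 0.494 / n)))
    return np.array(gamma_hat)
```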

3.2 Least squares variogram fitting procedures

Once an empirical variogram has been estimated using the techniques outlined in the previous section, a valid variogram model can be fitted to the estimated values. More precisely, denote by $\hat\gamma^\sharp(h)$ one of the variogram estimators defined in section 3.1 and by $\gamma(h; \theta)$ a valid variogram model, dependent on a parameter vector $\theta$. The simplest fitting procedure, also known as the ordinary least squares method, computes an optimal value of $\theta$ by minimization of the functional
\[
\sum_{k=1}^{K} \big( \hat\gamma^\sharp(h_k) - \gamma(h_k; \theta) \big)^2. \tag{3.5}
\]
This provides a purely geometrical fitting and does not use any information on the distribution of the specific estimator $\hat\gamma^\sharp(h)$ being used. This information is instead taken into account in the so called generalized least squares method, which can be defined as follows. Let $\hat\gamma^\sharp(h_k)$, $k = 1, \ldots, K$ be the estimated values of the empirical variogram for an a priori fixed number $K$ of distance classes. Furthermore, assume that the number of data pairs in each distance class is sufficiently large (Cressie suggests to consider only classes for which at least 30 data pairs are present). One can then consider the random vector $2\hat\gamma^\sharp = (2\hat\gamma^\sharp(h_1), \ldots, 2\hat\gamma^\sharp(h_K))^T$ and its covariance matrix $V = Var(2\hat\gamma^\sharp)$. The generalized least squares method consists in determining the parameter vector $\theta$ that minimizes the functional
\[
\big( 2\hat\gamma^\sharp - 2\gamma(\theta) \big)^T V(\theta)^{-1} \big( 2\hat\gamma^\sharp - 2\gamma(\theta) \big), \tag{3.6}
\]
where $2\gamma(\theta) = (2\gamma(h_1; \theta), \ldots, 2\gamma(h_K; \theta))^T$ is the theoretical variogram model to be fitted, computed at the distances $h_1, \ldots, h_K$. The estimator obtained in this way is denoted by $\hat\theta_V^\sharp$.
The generalized least squares method only uses the second order moments of the variogram estimator and does not require any assumption on the data distribution. On the other hand, the covariance matrix can be quite complex to derive, and the minimization of the functional (3.6) is not easy. For this reason, a simplified procedure is presented in [5], based on heuristic considerations valid in the case of a gaussian field $Z$. This derivation shows that the nondiagonal terms of $V$ can be disregarded in a first approximation, and that the diagonal terms can be approximated by
\[
V_{j,j} \approx \frac{2 \big( 2\gamma(h_j; \theta) \big)^2}{N(h_j)}.
\]
As a consequence, an estimator of the parameter vector $\theta$ can be obtained by minimization of the functional
\[
\sum_{j=1}^{K} N(h_j) \left( \frac{\hat\gamma(h_j)}{\gamma(h_j; \theta)} - 1 \right)^2. \tag{3.7}
\]
Formula (3.7) yields a criterion that attributes a greater importance to well populated distance classes $h_j$, for which $N(h_j)$ is larger. This approximation can also be considered as the first step of an iterative procedure, in which the minimization of (3.6) is sought via a sequence $\theta^k$, where $\theta^0$ is obtained by minimizing (3.7), and the following $\theta^k$ are obtained by minimization of
\[
\big( 2\hat\gamma^\sharp - 2\gamma(\theta) \big)^T V(\theta^{k-1})^{-1} \big( 2\hat\gamma^\sharp - 2\gamma(\theta) \big). \tag{3.8}
\]
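As an illustration of the simplified weighted procedure, the functional (3.7) can be minimized numerically. A minimal sketch assuming scipy is available (the names and the small safeguards are ours, not part of the method):

```python
import numpy as np
from scipy.optimize import minimize

def fit_variogram_wls(h_k, gamma_hat, counts, model, theta0):
    """Fit a valid variogram model by minimizing the weighted functional (3.7)."""
    valid = np.isfinite(gamma_hat) & (counts > 0)

    def objective(theta):
        # Guard against division by zero; purely a numerical safeguard
        gamma_model = np.maximum(model(h_k[valid], *theta), 1e-12)
        return np.sum(counts[valid] * (gamma_hat[valid] / gamma_model - 1.0) ** 2)

    # Keep (c0, c1, c2) non negative, as required for admissibility
    result = minimize(objective, theta0, bounds=[(0.0, None)] * len(theta0))
    return result.x

# Possible usage with the exponential model sketched in section 2.7:
#   theta = fit_variogram_wls(h_k, gamma_hat, counts,
#                             exponential_variogram, theta0=(0.0, 1.0, 0.3))
```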


Chapter 4

Spatial prediction and kriging

Geostatistical interpolation consists in recovering an optimal prediction of the field value at a location where no data are available, using the known data both for the purpose of estimating the field variogram (or covariance) and to provide a prediction and estimate the prediction error. It is to be remarked that the stationarity assumptions that will be made in the following sections are essential to make the variogram estimation feasible in practice (see the discussion in section 2.4).

4.1 Ordinary kriging

In ordinary kriging, the uncertain data $z_i$, $i = 1, \ldots, N$, assumed to be known at the $N$ points $x_i$, $i = 1, \ldots, N$, are interpreted as a realization of an intrinsically stationary random field $Z(x)$ with constant mean $\mu$. The constant mean is not assumed to be known, while the semivariogram has to be available. The implications of using estimated variograms on the quality of the estimate will be discussed later. This amounts to assuming $Z(x) = \mu + \delta(x)$, where $\delta$ is a zero mean random field. Considering definition 3, these assumptions imply that
\[
E\Big[\big( Z(x) - Z(y) \big)^2\Big] = E\Big[\big( \delta(x) - \delta(y) \big)^2\Big] = 2\gamma(x, y). \tag{4.1}
\]
Under these assumptions, one can define ordinary kriging as follows:

Definition 18 (Ordinary kriging) Given a point $x_0$, the ordinary kriging estimator at $x_0$ based on the data $Z(x_i)$, $i = 1, \ldots, N$ is defined as the linear unbiased estimator
\[
\hat Z(x_0) = \sum_{i=1}^{N} \lambda_i Z(x_i)
\]
of $Z(x_0)$ with minimum mean square prediction error.

It can be remarked that the unbiasedness requirement amounts to imposing $\sum_{i=1}^{N} \lambda_i = 1$, since
\[
E\big[\hat Z(x_0)\big] = E\Big[ \sum_{i=1}^{N} \lambda_i Z(x_i) \Big] = \sum_{i=1}^{N} \lambda_i E\big[Z(x_i)\big] = \mu \sum_{i=1}^{N} \lambda_i,
\]
which is equal to $\mu = E\big[Z(x_0)\big]$ if and only if the coefficients of the linear combination sum to one.

In order to derive an expression for these coefficients, it is practical to resort to the method of Lagrange multipliers to reduce the problem to an unconstrained minimization. Thus, one introduces the function
\[
\phi(\lambda_1, \ldots, \lambda_N, \beta) = E\Big[ \Big( Z(x_0) - \sum_{i=1}^{N} \lambda_i Z(x_i) \Big)^2 \Big] - 2\beta \Big( \sum_{i=1}^{N} \lambda_i - 1 \Big)
\]
and seeks values of $\lambda_1, \ldots, \lambda_N, \beta$ such that $\phi$ attains its minimum. Before proceeding to the minimization, the function is rewritten using the fact that, thanks to the constraint $\sum_{i=1}^{N} \lambda_i = 1$,
\[
\Big( Z(x_0) - \sum_{i=1}^{N} \lambda_i Z(x_i) \Big)^2 = \sum_{i=1}^{N} \lambda_i \big( Z(x_0) - Z(x_i) \big)^2 - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j \big( Z(x_i) - Z(x_j) \big)^2.
\]
This identity is verified by expanding the squares on the right hand side: the first sum contributes $Z(x_0)^2 - 2 Z(x_0) \sum_i \lambda_i Z(x_i) + \sum_i \lambda_i Z(x_i)^2$, while the double sum contributes $-\sum_i \lambda_i Z(x_i)^2 + \big( \sum_i \lambda_i Z(x_i) \big)^2$, so that the terms $\sum_i \lambda_i Z(x_i)^2$ cancel and the square of the linear combination is recovered.

Because of equation (4.1), taking expected values in the identity above implies that
\[
\phi(\lambda_1, \ldots, \lambda_N, \beta) = -\sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j \gamma(x_i, x_j) + 2 \sum_{i=1}^{N} \lambda_i \gamma(x_0, x_i) - 2\beta \Big( \sum_{i=1}^{N} \lambda_i - 1 \Big). \tag{4.2}
\]

Setting the gradient of the function $\phi$ equal to zero leads to the linear system
\[
\Gamma_O \lambda_O = \gamma_O \tag{4.3}
\]
where the unknown and the right hand side are given by, respectively,
\[
\lambda_O = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ \beta \end{pmatrix}, \qquad \gamma_O = \begin{pmatrix} \gamma(x_0, x_1) \\ \vdots \\ \gamma(x_0, x_N) \\ 1 \end{pmatrix}, \tag{4.4}
\]
and the system matrix is defined by
\[
\Gamma_O = \begin{pmatrix}
\gamma(x_1, x_1) & \gamma(x_1, x_2) & \ldots & \gamma(x_1, x_N) & 1 \\
\gamma(x_2, x_1) & \gamma(x_2, x_2) & \ldots & \gamma(x_2, x_N) & 1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\gamma(x_N, x_1) & \gamma(x_N, x_2) & \ldots & \gamma(x_N, x_N) & 1 \\
1 & 1 & \ldots & 1 & 0
\end{pmatrix}. \tag{4.5}
\]
The ordinary kriging coefficients can then be determined by solving the linear system (4.3), so that
\[
\lambda_O = \Gamma_O^{-1} \gamma_O. \tag{4.6}
\]
It is to be remarked that the solution $\lambda_O$ provides two types of information. Along with the values of the coefficients $\lambda_i$, $i = 1, \ldots, N$, the solution of the system also provides the value of the Lagrange multiplier $\beta$ that minimizes the mean square prediction error. Substituting the computed values back into the expression of this functional, one can see that the optimal value of the prediction error is given by
\[
\sigma_{OK}^2(x_0) = \lambda_O^T \gamma_O = \gamma_O^T \Gamma_O^{-1} \gamma_O. \tag{4.7}
\]
This expression is also called the kriging variance and is an estimate of the prediction error associated with the ordinary kriging predictor.
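For illustration, the assembly and solution of system (4.3), together with the evaluation of (4.7), can be coded in a few lines. A minimal sketch in Python with numpy (names are ours; the semivariogram argument is any isotropic model from section 2.7, evaluated on distances):

```python
import numpy as np

def ordinary_kriging(x, z, x0, semivariogram):
    """Solve system (4.3) and return the ordinary kriging prediction at x0
    together with the kriging variance (4.7)."""
    N = len(z)
    # Bordered matrix Gamma_O of (4.5): ones on the border, 0 in the corner
    G = np.ones((N + 1, N + 1))
    G[:N, :N] = semivariogram(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2))
    G[N, N] = 0.0
    # Right hand side gamma_O of equation (4.4)
    g = np.ones(N + 1)
    g[:N] = semivariogram(np.linalg.norm(x - x0, axis=1))
    # Kriging weights lambda_1, ..., lambda_N and Lagrange multiplier beta
    lam = np.linalg.solve(G, g)
    return lam[:N] @ z, lam @ g   # predictor and kriging variance (4.7)

# Possible usage with the exponential model of section 2.7:
#   gamma = lambda h: exponential_variogram(h, 0.0, 1.0, 0.3)
#   z_hat, s2 = ordinary_kriging(points, z, np.array([0.5, 0.5]), gamma)
```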

4.2 Universal kriging

In universal kriging, the uncertain data $z_i$, $i = 1, \ldots, N$, assumed to be known at the $N$ points $x_i$, $i = 1, \ldots, N$, are interpreted as a realization of a random field that can be decomposed into the sum of a deterministic component and of an intrinsically stationary random field with zero mean. This amounts to assuming $Z(x) = \sum_{j=1}^{p} \beta_j f_j(x) + \delta(x)$, where $\delta$ is the zero mean random field. The deterministic component is represented using shape functions $f_j$ that are assumed to be known, along with the semivariogram of the random field $\delta$, but the coefficients $\beta_j$ are not needed to formulate the prediction. Under these assumptions, one can define universal kriging as follows:
Definition 19 (Universal kriging) Given a point $x_0$, the universal kriging estimator at $x_0$ based on the data $Z(x_i)$, $i = 1, \ldots, N$ is defined as the linear unbiased estimator
\[
\hat Z(x_0) = \sum_{i=1}^{N} \lambda_i Z(x_i)
\]
of $Z(x_0)$ with minimum mean squared prediction error.
It is to be remarked that if $p = 1$ and $f_1 = 1$ are chosen, ordinary kriging is recovered exactly. Introducing the matrix
\[
X = \begin{pmatrix} f_1(x_1) & \ldots & f_p(x_1) \\ f_1(x_2) & \ldots & f_p(x_2) \\ \vdots & & \vdots \\ f_1(x_N) & \ldots & f_p(x_N) \end{pmatrix} \tag{4.8}
\]
and the vectors
\[
\beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \qquad \delta = \begin{pmatrix} \delta(x_1) \\ \vdots \\ \delta(x_N) \end{pmatrix}, \qquad Z = \begin{pmatrix} Z(x_1) \\ \vdots \\ Z(x_N) \end{pmatrix},
\]
the universal kriging data can also be rewritten as
\[
Z = X\beta + \delta, \tag{4.9}
\]
which highlights the formal similarity with the general linear estimation problem. In the case of universal kriging, the functional to be minimized can be written as
\[
\phi(\lambda_1, \ldots, \lambda_N, m_1, \ldots, m_p) = E\Big[ \Big( Z(x_0) - \sum_{i=1}^{N} \lambda_i Z(x_i) \Big)^2 \Big] - 2 \sum_{j=1}^{p} m_j \Big( \sum_{i=1}^{N} \lambda_i f_j(x_i) - f_j(x_0) \Big)
\]
where $m_l$, $l = 1, \ldots, p$ are the Lagrange multipliers. Repeating the derivation along the lines of the previous section leads to the linear system
\[
\Gamma_U \lambda_U = \gamma_U \tag{4.10}
\]
where the unknown and the right hand side vector are given by, respectively,
\[
\lambda_U = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ m_1 \\ \vdots \\ m_p \end{pmatrix}, \qquad \gamma_U = \begin{pmatrix} \gamma(x_0, x_1) \\ \vdots \\ \gamma(x_0, x_N) \\ f_1(x_0) \\ \vdots \\ f_p(x_0) \end{pmatrix}, \tag{4.11}
\]
and the system matrix is given by
\[
\Gamma_U = \begin{pmatrix}
\gamma(x_1, x_1) & \ldots & \gamma(x_1, x_N) & f_1(x_1) & \ldots & f_p(x_1) \\
\gamma(x_2, x_1) & \ldots & \gamma(x_2, x_N) & f_1(x_2) & \ldots & f_p(x_2) \\
\vdots & & \vdots & \vdots & & \vdots \\
\gamma(x_N, x_1) & \ldots & \gamma(x_N, x_N) & f_1(x_N) & \ldots & f_p(x_N) \\
f_1(x_1) & \ldots & f_1(x_N) & 0 & \ldots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
f_p(x_1) & \ldots & f_p(x_N) & 0 & \ldots & 0
\end{pmatrix}. \tag{4.12}
\]

The universal kriging coefficients can then be determined by solving the linear system (4.10), so that
\[
\lambda_U = \Gamma_U^{-1} \gamma_U. \tag{4.13}
\]
Similarly to the ordinary kriging case, along with the prediction, the mean squared prediction error can also be computed by the formula
\[
\sigma_{UK}^2(x_0) = \lambda_U^T \gamma_U = \gamma_U^T \Gamma_U^{-1} \gamma_U \tag{4.14}
\]
once the universal kriging coefficients and Lagrange multipliers have been determined.
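A sketch of the corresponding computation for universal kriging, under the same assumptions as the ordinary kriging code above; the shape functions are passed as callables acting on arrays of points:

```python
import numpy as np

def universal_kriging(x, z, x0, semivariogram, shape_functions):
    """Solve system (4.10) with the bordered matrix (4.12) and return the
    universal kriging prediction at x0 and the variance (4.14)."""
    N, p = len(z), len(shape_functions)
    F = np.column_stack([f(x) for f in shape_functions])   # matrix X of (4.8)
    G = np.zeros((N + p, N + p))
    G[:N, :N] = semivariogram(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2))
    G[:N, N:] = F
    G[N:, :N] = F.T                          # lower right block stays zero
    g = np.zeros(N + p)
    g[:N] = semivariogram(np.linalg.norm(x - x0, axis=1))
    g[N:] = np.concatenate([np.atleast_1d(f(x0[None, :])) for f in shape_functions])
    lam = np.linalg.solve(G, g)
    return lam[:N] @ z, lam @ g   # predictor and kriging variance (4.14)

# With p = 1 and f_1 = 1 this reduces to ordinary kriging; a linear drift in
# d = 2 would use, e.g.,
#   shape_functions = [lambda x: np.ones(len(x)),
#                      lambda x: x[:, 0],
#                      lambda x: x[:, 1]]
```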
The main difficulty in the practical application of universal kriging lies in the fact that, if the variogram is not known, for any random field with non constant mean the standard variogram estimators described in chapter 3 are no longer unbiased and, indeed, cannot be applied if the coefficients $\beta_j$ are not known. These can in turn be estimated assuming that the field $\delta$ has known covariance. Indeed, if the covariance of the data $Z$ is known and denoted by $\Sigma$, the standard generalized least squares estimator yields the value
\[
\hat\beta_{gls} = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Z.
\]
However, the data covariance, assuming it exists, is in fact related to the variogram. This leads to a circularity in the hypotheses needed for variogram estimation, which can be resolved in a number of ways, none of which is free from criticism and practical problems. The reader is referred to the discussion in [5], [4] for more details.
Chapter 5

Appendix: Basics of random variables

In order to make these notes self contained, some basic definitions and results in probability theory are summarized in this appendix. As before, there is no attempt at achieving a high standard of mathematical rigour in the formulation of the definitions and theorems. The reader interested in the complete presentation of the measure theoretic problems associated with probability spaces and random variables should consult textbooks such as [2]. A basic introduction to probability theory and mathematical statistics can be found in [10].

Definition 20 (Probability space) A probability space is defined by

• the set $\Omega$ of all events that are considered admissible;

• the collection $\mathcal{F}$ of all subsets of $\Omega$ for which a probability is defined (which includes $\Omega$ and the empty set $\emptyset$); in order to avoid some paradoxes and to endow $P$ with all the desirable properties defined below, $\mathcal{F}$ cannot in general coincide with the set of all subsets of $\Omega$ and must satisfy a series of properties which will not be listed here;

• the probability $P$, a function that assigns values in the interval $[0, 1]$ to each set in $\mathcal{F}$, representing the relative weight of a given event with respect to the set of all admissible events.

The probability $P$ must satisfy the properties

• $P(\Omega) = 1$;

• $P(A^c) = 1 - P(A)$, for each set $A \in \mathcal{F}$, where $A^c$ denotes the complement of $A$;

• given an arbitrary (possibly infinite) sequence of mutually disjoint sets $A_i$, $i \geq 1$, with $A_i \cap A_j = \emptyset$ for $i \neq j$, it holds that
\[
P\big( \cup_{i \geq 1} A_i \big) = \sum_{i \geq 1} P(A_i).
\]

Definition 21 (Random variable) A random variable is a function $Z = Z(\omega)$ which prescribes a real number $Z$ for each event $\omega$ in a probability space $(\Omega, P)$.

From a probabilistic viewpoint, the behaviour of a scalar random variable $X$ is completely determined if it is known how to compute the probabilities
\[
P\big[ X \in [a, b) \big]. \tag{5.1}
\]

Definition 22 (Probability distribution) The probability distribution of the random variable $X$ is defined for each $x \in \mathbb{R}$ by
\[
F_X(x) = P\big[ X \in (-\infty, x) \big]. \tag{5.2}
\]

Definition 23 (Continuous random variables) The random variable $X$ has a continuous distribution if there is a non negative real function $f_X(u)$ such that for each $x \in \mathbb{R}$
\[
F_X(x) = P\big[ X \in (-\infty, x) \big] = \int_{-\infty}^{x} f_X(u)\, du. \tag{5.3}
\]
$f_X(u)$ is called the probability density function of $X$.

An important example are gaussian random variables, whose distribution is defined by the density
\[
f_X(u) = \frac{1}{\sqrt{2\pi a}} \exp\Big\{ -\frac{(u - m)^2}{2a} \Big\}, \tag{5.4}
\]
where $m$ is a real number and $a$ a positive number, which coincide with the mean and the variance of $X$ defined below.

The average and the variance of a random variable are defined as
\[
m_X = E[X] = \int_{-\infty}^{+\infty} u f_X(u)\, du \tag{5.5}
\]
\[
Var[X] = \sigma_X^2 = E\big[ (X - m_X)^2 \big] = \int_{-\infty}^{+\infty} (u - m_X)^2 f_X(u)\, du. \tag{5.6}
\]

The median of a random variable is defined implicitly by the equation
\[
F_X(\mathrm{med}[X]) = \frac{1}{2}. \tag{5.7}
\]
The mean of a random variable is its best approximation by a constant in the sense of the $L^2$ norm of the difference, i.e.

Theorem 6 (Mean as minimum mean square estimator) For any real number $\lambda$, one has
\[
E\big[ (X - m_X)^2 \big] \leq E\big[ (X - \lambda)^2 \big].
\]

The median of a random variable is its best approximation by a constant in the sense of the $L^1$ norm of the difference, i.e.

Theorem 7 (Median as minimum mean absolute deviation estimator) For any real number $\lambda$, one has
\[
E\big[ |X - \mathrm{med}[X]| \big] \leq E\big[ |X - \lambda| \big].
\]
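Theorems 6 and 7 are easy to check numerically. A small sketch of ours, for illustration, using a skewed sample for which mean and median differ markedly:

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.exponential(scale=1.0, size=100_000)  # skewed: mean 1, median log(2)

lambdas = np.linspace(0.0, 3.0, 301)
l2_risk = np.array([np.mean((sample - lam) ** 2) for lam in lambdas])
l1_risk = np.array([np.mean(np.abs(sample - lam)) for lam in lambdas])

# The L2 risk is minimized near the sample mean (about 1.0),
# the L1 risk near the sample median (about 0.693)
print(lambdas[np.argmin(l2_risk)], sample.mean())
print(lambdas[np.argmin(l1_risk)], np.median(sample))
```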

Other quantities, such as the covariance, require instead two dimensional finite distributions:
\[
Cov(X, Y) = E\big[ (X - m_X)(Y - m_Y) \big]. \tag{5.8}
\]
Existence of the covariance is equivalent to the existence of the second order moments of the random variables. Variance and covariance are related by
\[
Var[X + Y] = Var[X] + Var[Y] + 2\, Cov(X, Y). \tag{5.9}
\]


Bibliography

[1] R. J. Adler. The Geometry of Random Fields. Wiley, 1981.

[2] P. Billingsley. Probability and Measure. Wiley, New York, 1986.

[3] M. D. Buhmann. Radial Basis Functions. Cambridge University Press, Cambridge, 2003.

[4] R. Christensen. Linear Models for Multivariate, Time Series, and Spatial Data. Springer Verlag, 1991.

[5] N. Cressie. Statistics for Spatial Data. Wiley, 1991.

[6] M. G. Genton. Highly robust variogram estimation. Mathematical Geology, 30:213-221, 1998.

[7] D. J. Gorsich and M. G. Genton. Variogram model selection via nonparametric derivative estimation. Mathematical Geology, 32:249-270, 2000.

[8] D. M. Hawkins and N. Cressie. Robust kriging - a proposal. Journal of the International Association for Mathematical Geology, 16:3-18, 1984.

[9] P. K. Kitanidis. Geostatistics. In D. R. Maidment, editor, Handbook of Hydrology, pages 153-165. McGraw Hill, 1993.

[10] S. Ross. Probability and Statistics for the Applied Sciences. ??, Berlin, 1995.

[11] J. Stoer and R. Bulirsch. An Introduction to Numerical Analysis, 2nd edition. Springer Verlag, Berlin, 1990.

[12] H. Wackernagel. Multivariate Geostatistics. Springer Verlag, Berlin, 1995.

[13] A. T. Walden and P. Guttorp. Statistics in the Environmental and Earth Sciences. Arnold, 1992.
