Lezkrig 005
An Introduction to Geostatistical Interpolation for Environmental Applications
Luca Bonaventura
Stefano Castruccio
MOX - Laboratorio di Matematica Applicata
Dipartimento di Matematica
Politecnico di Milano
luca.bonaventura@polimi.it
Contents
3 Variogram estimation 14
3.1 Empirical variogram estimators 14
3.2 Least squares variogram fitting procedures 15

References 26
Introduction

Chapter 1

Spatial data
Thus, a random field is a function of several real variables which also happens
to depend on elements of a probability space. A short review of the basic proper-
ties of these mathematical objects will show how they combine the peculiarities
of random variables and scalar fields on Rd . Concepts from both analysis and
probability theory are necessary for a proper description of their behaviour.
where ai, bi denote the endpoints of arbitrary intervals on the real line. For each N
and each set of N points xi , i = 1, · · · , N the probabilities (2.1) define uniquely
a set of values P(x1 ,··· ,xN ) [(a1 , b1 ), · · · , (aN , bN )] which identifies a probability
distribution on RN . These probability distributions are called finite dimensional
distributions of the random field Z. It should be observed that the quantities (2.1)
are symmetric with respect to permutations of the set of points xi , i = 1, · · · , N.
In the case of random fields with continuous finite dimensional distributions,
to compute (2.1) it is sufficient to prescribe for each N and each set of N points
xi , i = 1, · · · , N a probability density fZ (u) = f(Z(x1 ),··· ,Z(xN )) (u1 , · · · , uN ).
Chapter 2

Random fields
An important example is that of Gaussian random fields, for which the finite dimensional distributions are defined by multidimensional Gaussian distributions, whose densities are given for a generic set of points xi, i = 1, · · · , N by

$$ f_Z(u) = \frac{1}{\sqrt{(2\pi)^N \det(A)}} \exp\Big\{ -\frac{(u - m)^T A^{-1} (u - m)}{2} \Big\} \qquad (2.2) $$

where m = (m(x1), · · · , m(xN)) is a vector of space dependent quantities and A = A_{x1,··· ,xN} is a symmetric, positive definite matrix.
$$ \mathrm{Var}\big[ Z(x) \big] = \sigma_Z^2(x) = E\Big[ \big( Z(x) - m(x) \big)^2 \Big] = \int_{-\infty}^{+\infty} (u - m(x))^2 f_{Z(x)}(u) \, du \qquad (2.4) $$
The computation of mean and variance only involves the one dimensional distributions. Other quantities, such as the covariance, instead require the two dimensional finite dimensional distributions:

$$ \mathrm{Cov}\big[ Z(x), Z(y) \big] = E\Big[ \big( Z(x) - m(x) \big) \big( Z(y) - m(y) \big) \Big]. \qquad (2.5) $$
The covariance is defined if the first and second order moments of the random field
exist. In the case of Gaussian random fields whose finite dimensional distributions
are described by equation (2.2), the vector m = (m(x1), · · · , m(xN)) indeed collects the mean values of the field at the given locations, while the matrix A collects the corresponding covariances.
The quantity

$$ \gamma(x, y) = \frac{1}{2} \mathrm{Var}\big[ Z(x) - Z(y) \big] $$

is called the semivariogram of Z. If Z has constant mean, the semivariogram is defined equivalently as

$$ \gamma(x, y) = \frac{1}{2} E\Big[ \big( Z(x) - Z(y) \big)^2 \Big]. $$
If a random field has second order moments, both variogram and covariance
exist and there is a simple relationship between them
$$ \mathrm{Var}\big[ Z(x) - Z(y) \big] = \mathrm{Var}\big[ Z(x) \big] + \mathrm{Var}\big[ Z(y) \big] - 2 \, \mathrm{Cov}\big[ Z(x), Z(y) \big]. \qquad (2.6) $$

Higher order moments can also be defined as done for standard random variables.
However, in practice they are quite difficult to estimate from the data and in many
applications estimation and inference are only feasible for first and second order
moments.
These convergence concepts are not independent of each other: for example, both
convergence in mean square sense and convergence with probability one imply
convergence in probability.
An important result relating the continuity of a random field to the properties
of its second order moments is the following:
Theorem 2 (Continuity of random fields) If there is a β > 0 such that
$$ E\Big[ \big( Z(x) - Z(y) \big)^2 \Big] \le C \, \| x - y \|^{2d + \beta} $$
The stationarity property can also be summarized by saying that the finite dimensional distributions of a stationary field are translation invariant. As a consequence, all the single site moments E[Z(x)^k], k ≥ 1, are constant in space. If they exist, the covariance and the semivariogram depend only on the difference between the two locations at which Z is evaluated.
If a field Z has finite second order moments that are constant in space, definitions
2.11 and 2.12 are equivalent, since one can use equation 2.6 to obtain
$$ \gamma(x, y) = \frac{1}{2} \Big( \mathrm{Var}\big[ Z(0) \big] + \mathrm{Var}\big[ Z(0) \big] - 2 \, \mathrm{Cov}\big[ Z(x), Z(y) \big] \Big) = \mathrm{Var}\big[ Z(0) \big] - \mathrm{Cov}\big[ Z(x), Z(y) \big]. \qquad (2.13) $$
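The relation (2.13) between semivariogram and covariance can be made concrete with a short numerical sketch. The exponential covariance model used here is an assumption made for the sake of the example, not a choice taken from the text.

```python
import numpy as np

# Hypothetical exponential covariance C(h) = s2 * exp(-h / rho) for a second
# order stationary field; the semivariogram then follows from (2.13):
# gamma(h) = Var[Z(0)] - Cov[Z(x), Z(x+h)] = C(0) - C(h).
s2, rho = 2.0, 10.0

def cov(h):
    return s2 * np.exp(-np.abs(h) / rho)

def semivariogram(h):
    # gamma(h) = C(0) - C(h), as in (2.13)
    return cov(0.0) - cov(h)

h = np.linspace(0.0, 50.0, 11)
gamma = semivariogram(h)
# gamma vanishes at the origin and increases monotonically towards the sill s2
```

The sketch also illustrates the usual terminology: the limiting value of the semivariogram at large distances (here s2) is the sill of the model.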
Some special cases of anisotropy can be handled more easily, as in the case of
Definition 12 (Geometrically anisotropic random fields) An intrinsically sta-
tionary random field is geometrically anisotropic if its semivariogram is given
by
$$ \gamma(x, y) = \gamma_0\big( \| A (x - y) \| \big) $$
where A is a d × d matrix.
In the case of Gaussian random fields, stationarity and second order stationarity coincide, since the finite dimensional distributions of the field are entirely determined by the mean and the covariance function.
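Because the finite dimensional distributions of a Gaussian field are fixed by the mean and the covariance, a realization at finitely many locations can be drawn by factorizing the covariance matrix. A minimal sketch, assuming a hypothetical exponential covariance model and mean zero:

```python
import numpy as np

rng = np.random.default_rng(42)

# 1D sampling locations and an (assumed) exponential covariance model
x = np.linspace(0.0, 100.0, 50)
s2, rho = 1.0, 15.0
A = s2 * np.exp(-np.abs(x[:, None] - x[None, :]) / rho)   # covariance matrix

# A realization of the Gaussian vector: Z = m + L u with A = L L^T and
# u standard normal; here the mean m is zero.
L = np.linalg.cholesky(A + 1e-10 * np.eye(x.size))        # jitter for stability
z = L @ rng.standard_normal(x.size)
```

The small diagonal jitter is a standard numerical safeguard: the exponential covariance matrix is positive definite in exact arithmetic, but can be nearly singular in floating point for closely spaced locations.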
one has

$$ \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \, \phi(x_i, x_j) \le 0, $$

since $\sum_{i=1}^{N} \alpha_i = 0$. Taking the expected value one obtains

$$ \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \, 2\gamma(x_i - x_j) = -2 \, \mathrm{Var}\Big( \sum_{i=1}^{N} \alpha_i Z(x_i) \Big) \le 0. \qquad (2.15) $$
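Property (2.15) is easy to verify numerically: for any weights summing to zero, the quadratic form built from a valid semivariogram is non-positive. The exponential semivariogram used below is an assumption made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def gamma(h, sill=1.0, rho=5.0):
    # an (assumed) exponential semivariogram, a valid variogram model
    return sill * (1.0 - np.exp(-np.abs(h) / rho))

x = rng.uniform(0.0, 50.0, size=20)       # arbitrary 1D locations
alpha = rng.standard_normal(20)
alpha -= alpha.mean()                     # enforce sum_i alpha_i = 0

G = gamma(np.abs(x[:, None] - x[None, :]))
quad = alpha @ G @ alpha                  # sum_ij alpha_i alpha_j gamma(x_i - x_j)
# by (2.15), quad equals -Var[sum_i alpha_i Z(x_i)] and is therefore <= 0
```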
where {W (s) : s ∈ Rd } is a complex valued zero mean random field with inde-
pendent increments and such that E(|W (dω)|2 ) = G(dω)/2. One then has
$$ Z(s + h) - Z(s) = \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} e^{i \omega' s} \, W_h^{*}(d\omega), \qquad (2.18) $$
Both these effects, although conceptually quite different, can be effectively de-
scribed by allowing the variogram to be discontinuous at the origin. In particular,
if $\lim_{h \to 0} \gamma(h) = c_0$ with $c_0$ different from zero, the variogram is said to display
the nugget effect. A complete proof of the formal equivalence of nugget effect
and inclusion of measurement errors can be found in [4].
It should be remarked that random fields with Gaussian variogram need not be
Gaussian random fields. Gaussian variograms imply very smooth random fields,
that are often not realistic for many practical applications.
This formula defines a valid variogram only if h is the absolute value of a vector
in R2 or R3 .
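The variogram models discussed above can be sketched in code. The parametrization below (nugget c0, partial sill c, range parameter rho or a) follows a common convention and is an assumption of this example, not a notation taken from the text.

```python
import numpy as np

def exponential(h, c0, c, rho):
    # exponential model: approaches the sill c0 + c asymptotically
    return c0 + c * (1.0 - np.exp(-np.asarray(h, float) / rho))

def gaussian_model(h, c0, c, rho):
    # gaussian model: very smooth near the origin, often unrealistically
    # smooth for practical applications (see the remark in the text)
    return c0 + c * (1.0 - np.exp(-(np.asarray(h, float) / rho) ** 2))

def spherical(h, c0, c, a):
    # spherical model: valid only for distances of vectors in R^2 or R^3;
    # reaches the sill c0 + c exactly at h = a
    h = np.asarray(h, dtype=float)
    g = c0 + c * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h < a, g, c0 + c)
```

In practice the nugget c0 is understood as the limit of the variogram for h tending to zero from above, while γ(0) = 0 by definition; the functions above return the limiting value at h = 0, which suffices for plotting and fitting.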
Chapter 3
Variogram estimation
$$ \hat{\gamma}_M(h_k) = \frac{1}{2 N(h_k)} \sum_{N(h_k)} \big( Z(x_i) - Z(x_j) \big)^2 . \qquad (3.2) $$

This is the most straightforward form of a variogram estimator and it has been widely applied, see e.g. [5], [6], [7], [12].
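A minimal sketch of the Matheron estimator (3.2) for scattered 1D data, with pairs binned into distance classes; the binning strategy and tolerance are implementation choices, not prescriptions from the text.

```python
import numpy as np

def matheron_variogram(x, z, bin_edges):
    """Empirical (Matheron) semivariogram estimator, eq. (3.2) -- a sketch.

    x : (n,) 1D sample locations, z : (n,) observed values,
    bin_edges : edges of the distance classes h_k.
    """
    n = len(x)
    i, j = np.triu_indices(n, k=1)        # all distinct pairs once
    d = np.abs(x[i] - x[j])               # pair distances
    sq = (z[i] - z[j]) ** 2               # squared increments
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        mask = (d >= bin_edges[k]) & (d < bin_edges[k + 1])
        if mask.any():
            # (1 / (2 N(h_k))) * sum of squared increments in class k
            gamma[k] = 0.5 * sq[mask].mean()
    return gamma

# usage on a small synthetic transect (illustrative values)
x = np.linspace(0.0, 50.0, 40)
z = np.sin(x / 8.0)
g = matheron_variogram(x, z, np.array([0.0, 5.0, 10.0, 20.0]))
```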
One problem with the Matheron estimator is that it can be very sensitive to the
presence of outliers in the data. In [8], a more robust estimator was proposed by
Cressie and Hawkins. This is defined for k = 1, · · · , K as
$$ \hat{\gamma}_C(h_k) = \frac{1}{2 \left( 0.457 + \dfrac{0.494}{N(h_k)} \right)} \left( \frac{1}{N(h_k)} \sum_{N(h_k)} \big| Z(x_i) - Z(x_j) \big|^{1/2} \right)^{4} . \qquad (3.3) $$
This choice can be explained as follows: for Gaussian random fields, (Z(x_i) − Z(x_j))^2 is, up to a scaling factor, a random variable with a χ² distribution with one degree of freedom. For this type of variable, it can be seen heuristically that raising to the power 1/4 is the transformation that yields a distribution most similar to a normal one; moreover, it can be proven that the values |Z(x_i) − Z(x_j)|^{1/2} are less correlated among themselves than the values |Z(x_i) − Z(x_j)|^2.
Another alternative is the estimator

$$ \hat{\gamma}_{med}(h) = \Big[ \mathrm{med}\big\{ | Z(x_i) - Z(x_j) |^{1/2} : (x_i, x_j) \in N(h) \big\} \Big]^{4} \Big/ \big( 2 B(h) \big), \qquad (3.4) $$
where med{·} denotes the median of the values in brackets and B(h) is a bias
corrector that tends to the asymptotic value of 0.457.
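Both robust estimators (3.3) and (3.4) can be sketched together. In this sketch the bias corrector B(h) is replaced by its asymptotic value 0.457, an approximation made for simplicity rather than a prescription from the text.

```python
import numpy as np

def robust_variograms(x, z, bin_edges):
    """Cressie-Hawkins (3.3) and median-based (3.4) estimators -- a sketch.

    B(h) is approximated by its asymptotic value 0.457 in the median version.
    """
    n = len(x)
    i, j = np.triu_indices(n, k=1)
    d = np.abs(x[i] - x[j])
    r = np.sqrt(np.abs(z[i] - z[j]))      # |Z(x_i) - Z(x_j)|^(1/2)
    g_ch = np.full(len(bin_edges) - 1, np.nan)
    g_med = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        m = (d >= bin_edges[k]) & (d < bin_edges[k + 1])
        nk = m.sum()
        if nk > 0:
            # eq. (3.3): fourth power of the mean square-rooted increment
            g_ch[k] = r[m].mean() ** 4 / (2.0 * (0.457 + 0.494 / nk))
            # eq. (3.4) with B(h) ~ 0.457
            g_med[k] = np.median(r[m]) ** 4 / (2.0 * 0.457)
    return g_ch, g_med

# usage on illustrative data
x = np.arange(30.0)
z = np.sin(x / 5.0)
g_ch, g_med = robust_variograms(x, z, np.array([0.0, 5.0, 15.0]))
```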
This provides a purely geometrical fitting and does not use any information on the distribution of the specific estimator γ̂(h) being used. This is instead taken into account in the so called generalized least squares method, which can be defined as follows. Let γ̂(h_k), k = 1, · · · , K, be the estimated values of the empirical variogram for an a priori fixed number K of distance classes. Furthermore, assume that the number of data pairs in each distance class is sufficiently large (Cressie suggests to consider only classes for which at least 30 data pairs are present). One can then consider the random vector 2γ̂ = (2γ̂(h_1), . . . , 2γ̂(h_K))^T and its covariance matrix V = var(2γ̂). The generalized least squares method consists in determining the parameter vector θ that minimizes the functional
Formula 3.7 yields a criterion that attributes a greater importance to well populated
distance classes hj for which N (hj ) is larger. This approximation can also be
considered as the first step of an iterative procedure, in which the minimization of
3.6 is sought via a sequence θ k , where θ 0 is obtained by minimizing 3.7, and the
following θ k are obtained by minimization of
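A weighted least squares fit in the spirit of formula (3.7), which gives larger weight to well populated distance classes, can be sketched as follows. The quadratic weighting, the exponential model, and the grid search over parameters are all simplifying assumptions of this illustration.

```python
import numpy as np

def wls_fit_exponential(h, gamma_hat, n_pairs, sills, rhos):
    """Grid search minimizing sum_j N(h_j) * (gamma_hat_j - model_j)^2,
    a simplified weighted least squares criterion (a sketch)."""
    best, best_score = None, np.inf
    for s in sills:
        for r in rhos:
            model = s * (1.0 - np.exp(-h / r))
            score = np.sum(n_pairs * (gamma_hat - model) ** 2)
            if score < best_score:
                best, best_score = (s, r), score
    return best

# synthetic "empirical" values generated from a known model (assumed):
h = np.array([2.0, 5.0, 10.0, 20.0, 40.0])
true_s, true_r = 3.0, 8.0
gamma_hat = true_s * (1.0 - np.exp(-h / true_r))
n_pairs = np.array([200, 180, 150, 100, 60])   # class populations N(h_j)

fit = wls_fit_exponential(h, gamma_hat, n_pairs,
                          sills=np.linspace(1.0, 5.0, 41),
                          rhos=np.linspace(2.0, 14.0, 61))
```

In a real application the grid search would typically be replaced by a numerical optimizer, and the iterative refinement of the weights described in the text would update V at each step.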
Chapter 4

Kriging
It can be remarked that the unbiasedness assumption amounts to requiring $\sum_{i=1}^{N} \lambda_i = 1$, since

$$ E\big[ \hat{Z}(x_0) \big] = E\Big[ \sum_{i=1}^{N} \lambda_i Z(x_i) \Big] = \sum_{i=1}^{N} \lambda_i E\big[ Z(x_i) \big] = \mu \sum_{i=1}^{N} \lambda_i , $$

which is equal to $\mu = E\big[ Z(x_0) \big]$ if and only if the coefficients of the linear combination sum to one.
In order to derive an expression for these coefficients, it is practical to resort
to the method of Lagrange multipliers to reduce the problem to an unconstrained
minimization. Thus, one introduces the function
$$ \phi(\lambda_1, \dots, \lambda_N, \beta) = E\Big[ \Big( Z(x_0) - \sum_{i=1}^{N} \lambda_i Z(x_i) \Big)^2 \Big] - 2\beta \Big( \sum_{i=1}^{N} \lambda_i - 1 \Big) $$
and seeks values of λ1 , . . . , λN , β such that φ attains its minimum. Before pro-
ceeding to the minimization, the function is rewritten using the fact that
$$
\begin{aligned}
\Big( Z(x_0) - \sum_{i=1}^{N} \lambda_i Z(x_i) \Big)^2
&= Z(x_0)^2 - 2 Z(x_0) \sum_{i=1}^{N} \lambda_i Z(x_i) + \Big( \sum_{i=1}^{N} \lambda_i Z(x_i) \Big)^2 \\
&= Z(x_0)^2 - 2 Z(x_0) \sum_{i=1}^{N} \lambda_i Z(x_i) + \sum_{i=1}^{N} \lambda_i Z(x_i)^2 \\
&\quad - \sum_{i=1}^{N} \lambda_i Z(x_i)^2 + \Big( \sum_{i=1}^{N} \lambda_i Z(x_i) \Big)^2 \\
&= \sum_{i=1}^{N} \lambda_i \Big( Z(x_0)^2 - 2 Z(x_0) Z(x_i) + Z(x_i)^2 \Big) \\
&\quad - \frac{1}{2} \Big[ \sum_{i=1}^{N} \lambda_i Z(x_i)^2 + \sum_{j=1}^{N} \lambda_j Z(x_j)^2 - 2 \Big( \sum_{i=1}^{N} \lambda_i Z(x_i) \Big) \Big( \sum_{j=1}^{N} \lambda_j Z(x_j) \Big) \Big] \\
&= \sum_{i=1}^{N} \lambda_i \big( Z(x_0) - Z(x_i) \big)^2 - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j \big( Z(x_i) - Z(x_j) \big)^2 ,
\end{aligned}
$$

where the constraint $\sum_{i=1}^{N} \lambda_i = 1$ has been used.
$$
\begin{aligned}
\phi(\lambda_1, \dots, \lambda_N, \beta)
&= \sum_{i=1}^{N} \lambda_i \gamma(x_0, x_i) + \sum_{i=1}^{N} \lambda_i \Big[ \gamma(x_0, x_i) - \sum_{j=1}^{N} \lambda_j \gamma(x_i, x_j) \Big] - 2\beta \Big( \sum_{i=1}^{N} \lambda_i - 1 \Big) \\
&= - \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j \gamma(x_i, x_j) + 2 \sum_{i=1}^{N} \lambda_i \gamma(x_0, x_i) - 2\beta \Big( \sum_{i=1}^{N} \lambda_i - 1 \Big). \qquad (4.2)
\end{aligned}
$$
Setting the gradients of the function φ equal to zero leads to the linear system
$$ \Gamma_O \, \lambda_O = \gamma_O \qquad (4.3) $$

where the unknown and right hand side are given by, respectively,

$$ \lambda_O = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ \beta \end{pmatrix}, \qquad \gamma_O = \begin{pmatrix} \gamma(x_0, x_1) \\ \vdots \\ \gamma(x_0, x_N) \\ 1 \end{pmatrix}. \qquad (4.4) $$
The ordinary kriging coefficients can then be determined solving the linear
system (4.3), so that
$$ \lambda_O = \Gamma_O^{-1} \gamma_O . \qquad (4.6) $$
It is to be remarked that the solution λO provides two types of information. Along
with the values of the coefficients λi , i = 1, . . . , N, the solution of the system also
provides the value of the Lagrange multiplier β that minimizes the mean square
prediction error. Putting the computed values back into the expression of this
functional one can see that the optimal value of the prediction error is given by
$$ \sigma^2_{OK}(x_0) = \lambda_O^T \gamma_O = \gamma_O^T \Gamma_O^{-1} \gamma_O . \qquad (4.7) $$
This expression is also called kriging variance and is an estimate of the prediction
error associated with the ordinary kriging predictor.
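The ordinary kriging predictor and its variance (4.7) can be sketched compactly. The exponential semivariogram below is assumed known; in practice it would come from the fitting procedures of Chapter 3.

```python
import numpy as np

def ordinary_kriging(x, z, x0, gamma):
    """Ordinary kriging at x0, solving system (4.3) -- a sketch.

    gamma : semivariogram as a function of distance (assumed known/fitted).
    Returns the prediction and the kriging variance (4.7).
    """
    n = len(x)
    # bordered system: semivariogram block plus the Lagrange multiplier row
    G = np.zeros((n + 1, n + 1))
    G[:n, :n] = gamma(np.abs(x[:, None] - x[None, :]))
    G[n, :n] = 1.0
    G[:n, n] = 1.0
    g0 = np.append(gamma(np.abs(x - x0)), 1.0)   # right hand side gamma_O
    lam = np.linalg.solve(G, g0)                 # (lambda_1..lambda_N, beta)
    pred = lam[:n] @ z
    krig_var = lam @ g0                          # sigma^2_OK = lambda_O^T gamma_O
    return pred, krig_var

# usage with an assumed exponential semivariogram and illustrative data
gamma = lambda h: 1.0 - np.exp(-h / 10.0)
x = np.array([0.0, 3.0, 7.0, 12.0])
z = np.array([1.2, 0.8, 1.5, 1.1])
pred, kv = ordinary_kriging(x, z, 5.0, gamma)
```

Note that when x0 coincides with a data location the predictor reproduces the datum exactly and the kriging variance vanishes: kriging is an exact interpolator in the absence of a nugget.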
$$ \Gamma_U \, \lambda_U = \gamma_U \qquad (4.10) $$

where the unknown and the right hand side vector are given by, respectively,

$$ \lambda_U = \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_N \\ \beta_1 \\ \vdots \\ \beta_p \end{pmatrix}, \qquad \gamma_U = \begin{pmatrix} \gamma(x_0, x_1) \\ \vdots \\ \gamma(x_0, x_N) \\ f_1(x_0) \\ \vdots \\ f_p(x_0) \end{pmatrix}, \qquad (4.11) $$

while the matrix is given by

$$ \Gamma_U = \begin{pmatrix}
\gamma(x_1, x_1) & \cdots & \gamma(x_1, x_N) & f_1(x_1) & \cdots & f_p(x_1) \\
\gamma(x_2, x_1) & \cdots & \gamma(x_2, x_N) & f_1(x_2) & \cdots & f_p(x_2) \\
\vdots & & \vdots & \vdots & & \vdots \\
\gamma(x_N, x_1) & \cdots & \gamma(x_N, x_N) & f_1(x_N) & \cdots & f_p(x_N) \\
f_1(x_1) & \cdots & f_1(x_N) & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
f_p(x_1) & \cdots & f_p(x_N) & 0 & \cdots & 0
\end{pmatrix}. \qquad (4.12) $$
The universal kriging coefficients can then be determined by solving the linear
system (4.10), so that
$$ \lambda_U = \Gamma_U^{-1} \gamma_U . \qquad (4.13) $$
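A minimal sketch of the universal kriging predictor with drift functions f_1, ..., f_p. The exponential semivariogram and the constant-plus-linear drift basis are assumptions chosen for this illustration.

```python
import numpy as np

def universal_kriging(x, z, x0, gamma, basis):
    """Universal kriging with drift basis f_1..f_p, system (4.10) -- a sketch."""
    n = len(x)
    F = np.column_stack([f(x) for f in basis])   # N x p drift matrix
    p = F.shape[1]
    # bordered system: semivariogram block plus the drift constraints
    G = np.zeros((n + p, n + p))
    G[:n, :n] = gamma(np.abs(x[:, None] - x[None, :]))
    G[:n, n:] = F
    G[n:, :n] = F.T
    g0 = np.concatenate([gamma(np.abs(x - x0)),
                         np.array([float(f(np.array([x0]))[0]) for f in basis])])
    lam = np.linalg.solve(G, g0)
    return lam[:n] @ z                           # only the weights enter the predictor

# usage: constant + linear drift (p = 2), assumed exponential semivariogram
gamma = lambda h: 0.5 * (1.0 - np.exp(-h / 5.0))
basis = [lambda s: np.ones_like(s), lambda s: s]
x = np.array([0.0, 2.0, 5.0, 9.0])
z = 0.3 * x + 1.0                                # purely linear data
pred = universal_kriging(x, z, 4.0, gamma, basis)
```

The constraint rows of the system force the weights to reproduce the drift basis exactly, so on purely linear data as above the prediction recovers the linear trend.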
Similarly to the ordinary kriging case, along with the prediction also the mean
squared prediction error can be computed by the formula
However, the data covariance, assuming it exists, is in fact related to the variogram. This leads to a circularity in the hypotheses needed for variogram estimation, which can be resolved in a number of ways, none of which is free from criticism and practical problems. The reader is referred to the discussion in [5], [4] for more details.
Chapter 5
In order to make these notes self contained, some basic definitions and results
in probability theory are summarized in this appendix. There is also no attempt
at achieving a high standard of mathematical rigour in the formulation of the def-
initions and theorems. The reader interested in the complete presentation of the
measure theoretic problems associated with probability spaces and random vari-
ables should consult textbooks such as [2]. A basic introduction to probability
theory and mathematical statistics can be found in [10].
• P (Ω) = 1;
Random variables
An important example is that of Gaussian random variables, whose distribution is defined by the density

$$ f_X(u) = \frac{1}{\sqrt{2\pi a}} \exp\Big\{ -\frac{(u - m)^2}{2a} \Big\} \qquad (5.4) $$

where m is a real number and a is a positive number.
The average and the variance of a random variable are defined as
$$ m_X = E[X] = \int_{-\infty}^{+\infty} u \, f_X(u) \, du \qquad (5.5) $$

$$ \mathrm{Var}[X] = \sigma_X^2 = E\big[ (X - m_X)^2 \big] = \int_{-\infty}^{+\infty} (u - m_X)^2 f_X(u) \, du . \qquad (5.6) $$
Theorem 6 (Variance as minimum mean square estimator) For any real number λ, one has

$$ E\big[ (X - m_X)^2 \big] \le E\big[ (X - \lambda)^2 \big] . $$

The median of a random variable is its best approximation by a constant in the L1 sense, i.e. the constant that minimizes the L1 norm of the difference.
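Both characterizations can be checked numerically on a sample: over a grid of candidate constants, the mean square error is smallest near the sample mean, while the mean absolute error is smallest near the sample median. The skewed exponential sample is an arbitrary choice for this illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = rng.exponential(scale=2.0, size=10_000)   # a deliberately skewed sample

lams = np.linspace(0.0, 6.0, 601)              # candidate constants
mse = np.array([np.mean((xs - l) ** 2) for l in lams])
mae = np.array([np.mean(np.abs(xs - l)) for l in lams])

best_mse = lams[np.argmin(mse)]   # should sit near the sample mean (Theorem 6)
best_mae = lams[np.argmin(mae)]   # should sit near the sample median
```

For a skewed distribution such as this one, mean and median differ noticeably, which makes the contrast between the two optimal constants visible.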
Other quantities, such as the covariance, instead require the two dimensional finite dimensional distributions:

$$ \mathrm{Cov}(X, Y) = E\big[ (X - m_X)(Y - m_Y) \big] . \qquad (5.8) $$

Existence of the covariance is equivalent to the existence of the second order moments of the random variables. Variance and covariance are related by
[4] R. Christensen. Linear Models for Multivariate, Time Series and Spatial Data. Springer Verlag, 1991.

[7] D.J. Gorsich and M.G. Genton. Variogram model selection via nonparametric derivative estimation. Mathematical Geology, 32:249–270, 2000.

[8] D.M. Hawkins and N. Cressie. Robust kriging: a proposal. Journal of the International Association for Mathematical Geology, 16:3–18, 1984.

[10] S. Ross. Probability and Statistics for the applied sciences. ??, Berlin, 1995.

[13] A.T. Walden and P. Guttorp. Statistics in the Environmental and Earth Sciences. Arnold, 1992.