Trend, Variation, and Universal Kriging


In ordinary kriging, it is assumed that the mean is constant across the entire region of
interest. The basic model for ordinary kriging is thus:
$$Y_i = \mu + \epsilon_i,$$
where $\mu$ is the constant mean of $Y$ and $\epsilon_i$ is a stationary (either intrinsic or second-order)
random process with semivariogram $\gamma(\cdot)$.
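In universal kriging, the mean is allowed to be non-constant and to be a function of the
site coordinates or other explanatory covariates:
$$Y_i = \beta_0 x_0(s_i) + \beta_1 x_1(s_i) + \cdots + \beta_p x_p(s_i) + \epsilon_i,$$
where $i = 1, \ldots, n$ indexes the sites, and the errors are assumed stationary. In matrix notation, this can
be expressed as:
$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}
= \begin{bmatrix} x_0(s_1) & \cdots & x_p(s_1) \\ \vdots & & \vdots \\ x_0(s_n) & \cdots & x_p(s_n) \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \vdots \\ \beta_p \end{bmatrix}
+ \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix}
= X\boldsymbol{\beta} + \boldsymbol{\epsilon}.$$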

This looks just like a linear regression model where $Y$ is regressed on explanatory
variables at the same locations. For example, if there is a linear trend from one side of
the region to the other, we might use the model:
$$Y_i = \beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i} + \epsilon_i,$$
where $(s_{1i}, s_{2i})$ are the coordinates of the $i$th sample point. The expectation or mean in this model is just $\beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i}$ and is known as a spatial trend
or drift.
How is this any different from the trend removal done with ordinary kriging?

The trend does not necessarily have to be a linear model. In general it can be composed of nonlinear functions as well as sets of indicator variables. Why would we use
indicator variables?
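As a minimal sketch of how such a trend enters the design matrix $X$, the R code below builds $X$ for a linear trend in the coordinates plus a set of indicator variables. The coordinates and the three-level `soil` factor are made up purely for illustration; they do not come from any data set in these notes.

```r
# Hypothetical sites: coordinates (s1, s2) and a made-up categorical covariate.
sites <- data.frame(
  s1   = c(1, 2, 3, 4, 5, 6),
  s2   = c(2, 1, 4, 3, 6, 5),
  soil = factor(c("clay", "sand", "loam", "clay", "sand", "loam"))
)

# model.matrix() expands the factor into indicator (dummy) columns, so the
# mean can shift by soil type in addition to varying linearly with position.
X <- model.matrix(~ s1 + s2 + soil, data = sites)

print(X)
dim(X)   # n rows, one column per trend parameter
```

Each column of `X` plays the role of one explanatory function $x_j(s_i)$ evaluated at the $n$ sites; the indicator columns let the mean differ between groups of sites that share a characteristic.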

In universal kriging, we are assuming both a nonconstant mean and the presence of local
(spatial) variation. This gives the following decomposition, with any of the following
pairs of terms used interchangeably:
$Y_i$ = large-scale variation + small-scale variation
      = trend or drift + residual error
      = nonstationary mean + random variation
      = deterministic signal + noise

The method of universal kriging seeks to model these two components simultaneously, given
some model for the trend and a model for the covariance structure. How this technique
specifically differs from ordinary kriging will be illustrated below through the derivation of
the universal kriging equations.

Unbiasedness Conditions: The goal here is to predict $V_0$. The universal kriging predictor is
a linear combination of the sample values $(v_1, \ldots, v_n)$, i.e.:
$$\hat{V}_0 = \sum_{i=1}^{n} w_i v_i.$$
It turns out that in order for this predictor to be unbiased for all possible vectors $\boldsymbol{\beta}$, we need
to satisfy the following conditions:
$$\mathbf{w}_1' X = \mathbf{x}_0',$$
where:
$\mathbf{w}_1' = [w_1, \ldots, w_n]$, the vector of weights,
$X$ = a matrix of coordinates (or functions of coordinates) for the sample, where each
column corresponds to the values of one explanatory function for each of the $n$ sites,
$\mathbf{x}_0$ = the vector of coordinate values for the point to be predicted.
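For example, for a full second-order trend function (quadratic trend), we have:
$$X = \begin{bmatrix}
1 & x_1 & y_1 & x_1^2 & y_1^2 & x_1 y_1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_n & y_n & x_n^2 & y_n^2 & x_n y_n
\end{bmatrix}
\quad \text{and} \quad
\mathbf{x}_0' = \begin{bmatrix} 1 & x_0 & y_0 & x_0^2 & y_0^2 & x_0 y_0 \end{bmatrix}.$$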

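Letting $p + 1$ = the number of parameters in the trend function, since $X$ is $n \times (p + 1)$, there
are $p + 1$ unbiasedness conditions. In the quadratic trend model, this gives 6 unbiasedness
conditions as:
$$\mathbf{w}_1' X = \mathbf{x}_0'
\iff
[w_1 \cdots w_n]
\begin{bmatrix}
1 & x_1 & y_1 & x_1^2 & y_1^2 & x_1 y_1 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_n & y_n & x_n^2 & y_n^2 & x_n y_n
\end{bmatrix}
= \begin{bmatrix} 1 & x_0 & y_0 & x_0^2 & y_0^2 & x_0 y_0 \end{bmatrix},$$
i.e.:
$$\begin{aligned}
\sum_{i=1}^{n} w_i &= 1 && \text{(ordinary kriging unbiasedness condition)} \\
\sum_{i=1}^{n} w_i x_i &= x_0 \\
\sum_{i=1}^{n} w_i y_i &= y_0 \\
\sum_{i=1}^{n} w_i x_i^2 &= x_0^2 && \text{(additional universal kriging unbiasedness conditions)} \\
\sum_{i=1}^{n} w_i y_i^2 &= y_0^2 \\
\sum_{i=1}^{n} w_i x_i y_i &= x_0 y_0
\end{aligned}$$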

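Derivation of the Unbiasedness Conditions: First, note that:
$$E(\hat{V}_0) = E[\mathbf{w}_1' \mathbf{V}] = \mathbf{w}_1' E[X\boldsymbol{\beta} + \boldsymbol{\epsilon}] = \mathbf{w}_1' X \boldsymbol{\beta}.$$
In order for the estimator $\hat{V}_0$ to be unbiased, we must have $E(\hat{V}_0) = E(V_0)$, where $E(V_0) = \mathbf{x}_0' \boldsymbol{\beta}$ for all $\boldsymbol{\beta}$. Hence, we need:
$$\mathbf{w}_1' X \boldsymbol{\beta} = \mathbf{x}_0' \boldsymbol{\beta} \quad \text{for all } \boldsymbol{\beta}.$$
Clearly, if $\mathbf{w}_1' X = \mathbf{x}_0'$, then $\mathbf{w}_1' X \boldsymbol{\beta} = \mathbf{x}_0' \boldsymbol{\beta}$ for all $\boldsymbol{\beta}$. Is the reverse true?
Suppose $\mathbf{w}_1' X \neq \mathbf{x}_0'$, so that the two row vectors differ in at least one element. Define:
$$X = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_{p+1} \end{bmatrix},$$
where $\mathbf{x}_i$ is the $i$th column of $X$ (e.g.: $X = [\mathbf{1} \;\; \mathbf{x} \;\; \mathbf{y} \;\; \mathbf{x}^2 \;\; \mathbf{y}^2 \;\; \mathbf{xy}]$ for the quadratic trend model).
Choose $\boldsymbol{\beta} = (1, 0, \ldots, 0)'$, so that $X\boldsymbol{\beta} = \mathbf{x}_1$ and $\mathbf{x}_0' \boldsymbol{\beta} = x_{01}$, the first element of $\mathbf{x}_0$.
For this choice of $\boldsymbol{\beta}$, if $\mathbf{w}_1' \mathbf{x}_1 \neq x_{01}$, then certainly $\mathbf{w}_1' X \boldsymbol{\beta} \neq \mathbf{x}_0' \boldsymbol{\beta}$, and then $\hat{V}_0$ would not
be unbiased.
If we then select each column of the $X$ matrix in turn, it follows that if $\mathbf{w}_1' X \neq \mathbf{x}_0'$, then $\mathbf{w}_1' X \boldsymbol{\beta} \neq \mathbf{x}_0' \boldsymbol{\beta}$ for some choice
of $\boldsymbol{\beta}$.
Since the assumption that $\mathbf{w}_1' X \neq \mathbf{x}_0'$ led to a violation of
the unbiasedness of $\hat{V}_0$ for some $\boldsymbol{\beta}$, then by the contrapositive argument, for $\hat{V}_0$ to be unbiased for all $\boldsymbol{\beta}$, we must
have $\mathbf{w}_1' X = \mathbf{x}_0'$.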
The Universal Kriging Estimator: The universal kriging estimator is the linear unbiased
estimator of $V_0$ with minimum prediction variance. I.e.: we seek to minimize the function:
$$\sigma^2_{UK} = \mathrm{Var}(\hat{V}_0 - V_0) = \mathrm{Var}\left( \sum_{i=1}^{n} w_i V_i - V_0 \right)
= \sum_i \sum_j w_i w_j C_{ij} + \sigma^2 - 2 \sum_i w_i C_{i0},$$
subject to the $p + 1$ unbiasedness conditions given earlier. Note that the only difference between this minimization problem and that of ordinary kriging is the number of unbiasedness
constraints. The form of the variance above is identical to that of ordinary kriging. Using
Lagrange multipliers then, we need to minimize:
$$\sigma^2_{UK} + 2 \boldsymbol{\lambda}' (X' \mathbf{w}_1 - \mathbf{x}_0), \quad \text{where } \boldsymbol{\lambda} = (\lambda_0, \ldots, \lambda_p)'$$
is a vector of Lagrange multipliers. Using calculus as with the ordinary kriging minimization
problem yields the following set of universal kriging equations, given in matrix form:

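$$\begin{bmatrix}
C_{11} & \cdots & C_{1n} & f_{10} & \cdots & f_{1p} \\
\vdots & & \vdots & \vdots & & \vdots \\
C_{n1} & \cdots & C_{nn} & f_{n0} & \cdots & f_{np} \\
f_{10} & \cdots & f_{n0} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
f_{1p} & \cdots & f_{np} & 0 & \cdots & 0
\end{bmatrix}
\begin{bmatrix} w_1 \\ \vdots \\ w_n \\ \lambda_0 \\ \vdots \\ \lambda_p \end{bmatrix}
=
\begin{bmatrix} C_{10} \\ \vdots \\ C_{n0} \\ f_{00} \\ \vdots \\ f_{0p} \end{bmatrix},$$
where $f_{ij} = x_j(s_i)$ is the value of the $j$th trend function at site $i$, with $i = 0$ denoting the prediction site.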
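In shorthand notation, these UK equations can be written as:
$$\begin{bmatrix} C_1 & X \\ X' & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{w}_1 \\ \boldsymbol{\lambda} \end{bmatrix}
= \begin{bmatrix} \mathbf{c}_0 \\ \mathbf{x}_0 \end{bmatrix},$$
where the only difference between these and the OK equations in form is the replacement of
the one-vectors by the $X$-matrix defining the trend piece.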
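The calculus step can be sketched as follows, writing $C_1$ for the covariance matrix among the sample sites and $\mathbf{c}_0$ for the vector of covariances between the sample sites and the prediction site. The function name $L$ for the Lagrangian is introduced here only for illustration.

```latex
\begin{aligned}
L(\mathbf{w}_1, \boldsymbol{\lambda})
  &= \mathbf{w}_1' C_1 \mathbf{w}_1 - 2\,\mathbf{w}_1' \mathbf{c}_0 + \sigma^2
     + 2\,\boldsymbol{\lambda}' (X' \mathbf{w}_1 - \mathbf{x}_0), \\
\frac{\partial L}{\partial \mathbf{w}_1}
  &= 2\,C_1 \mathbf{w}_1 - 2\,\mathbf{c}_0 + 2\,X \boldsymbol{\lambda} = \mathbf{0}
  \;\;\Longrightarrow\;\; C_1 \mathbf{w}_1 + X \boldsymbol{\lambda} = \mathbf{c}_0, \\
\frac{\partial L}{\partial \boldsymbol{\lambda}}
  &= 2\,(X' \mathbf{w}_1 - \mathbf{x}_0) = \mathbf{0}
  \;\;\Longrightarrow\;\; X' \mathbf{w}_1 = \mathbf{x}_0 .
\end{aligned}
```

Stacking the two resulting sets of linear equations gives the block system: the first $n$ rows come from the derivative with respect to the weights, and the last $p + 1$ rows are the unbiasedness conditions.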
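Prediction Variance: The prediction variance in terms of the covariance function is:
$$\sigma^2_{UK} = \mathrm{Var}(\hat{V}_0 - V_0) = \sigma^2 - 2 \sum_i w_i C_{i0} + \sum_i \sum_j w_i w_j C_{ij}
= \sigma^2 - \begin{bmatrix} \mathbf{w}_1' & \boldsymbol{\lambda}' \end{bmatrix}
\begin{bmatrix} \mathbf{c}_0 \\ \mathbf{x}_0 \end{bmatrix} \quad \text{(with some work)}.$$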
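The "work" referred to here amounts to substituting the universal kriging equations into the quadratic form. A short sketch, using only the two sets of equations $C_1 \mathbf{w}_1 + X\boldsymbol{\lambda} = \mathbf{c}_0$ and $X'\mathbf{w}_1 = \mathbf{x}_0$:

```latex
\begin{aligned}
\mathbf{w}_1' C_1 \mathbf{w}_1
  &= \mathbf{w}_1' (\mathbf{c}_0 - X \boldsymbol{\lambda})
   = \mathbf{w}_1' \mathbf{c}_0 - (X' \mathbf{w}_1)' \boldsymbol{\lambda}
   = \mathbf{w}_1' \mathbf{c}_0 - \mathbf{x}_0' \boldsymbol{\lambda}, \\
\sigma^2_{UK}
  &= \sigma^2 - 2\,\mathbf{w}_1' \mathbf{c}_0 + \mathbf{w}_1' C_1 \mathbf{w}_1
   = \sigma^2 - \mathbf{w}_1' \mathbf{c}_0 - \boldsymbol{\lambda}' \mathbf{x}_0 .
\end{aligned}
```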

To express this prediction variance in terms of the semivariogram, the expressions are greatly
simplified if one of the columns of the $X$-matrix is a column of ones; that is, if there is an
intercept in the trend model. In this case of an intercept model, and under an assumption
of second-order stationarity, the kriging equations can be shown to have the form:
$$\begin{bmatrix} \Gamma_1 & X \\ X' & 0 \end{bmatrix}
\begin{bmatrix} \mathbf{w}_1 \\ \boldsymbol{\lambda} \end{bmatrix}
= \begin{bmatrix} \boldsymbol{\gamma}_0 \\ \mathbf{x}_0 \end{bmatrix},$$
where: $\Gamma_1$ = the matrix of semivariances among the sample locations,
$\boldsymbol{\gamma}_0$ = the vector of semivariances between the sample sites and the prediction site.
Important Note: This form of the kriging equations holds only if the trend model has
an intercept term. The form is more complicated in the no-intercept model.
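To see how to write the prediction variance in terms of the semivariogram, note that:
$$\boldsymbol{\gamma}_0 = \sigma^2 \mathbf{1} - \mathbf{c}_0 =
\begin{bmatrix} \sigma^2 - C_{10} \\ \vdots \\ \sigma^2 - C_{n0} \end{bmatrix}$$
under second-order stationarity. Hence, the prediction variance can be expressed as:
$$\begin{aligned}
\sigma^2_{UK} &= \sigma^2 - \mathbf{w}_1' \mathbf{c}_0 - \boldsymbol{\lambda}' \mathbf{x}_0 && \text{(from the covariance form above)} \\
&= \sigma^2 - \mathbf{w}_1' (\sigma^2 \mathbf{1} - \boldsymbol{\gamma}_0) - \boldsymbol{\lambda}' \mathbf{x}_0 && \text{(by substitution)} \\
&= \sigma^2 - \sigma^2 + \mathbf{w}_1' \boldsymbol{\gamma}_0 - \boldsymbol{\lambda}' \mathbf{x}_0 && \text{(since } \mathbf{w}_1' \mathbf{1} = 1 \text{ for the intercept model)} \\
&= \mathbf{w}_1' \boldsymbol{\gamma}_0 - \boldsymbol{\lambda}' \mathbf{x}_0 .
\end{aligned}$$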
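The following base-R sketch checks these expressions numerically on a tiny made-up configuration. The site coordinates, the exponential covariance function, and the parameter values `sigma2` and `phi` are all assumptions chosen for illustration. It solves the covariance form of the kriging equations, evaluates the prediction variance three ways, and then solves the semivariogram form. In this sketch the semivariogram system returns the same weights, while its multipliers come out with the opposite sign; the $\boldsymbol{\lambda}$ in the variance formulas above is the multiplier vector from the covariance form.

```r
# Hypothetical sample sites and prediction site (illustration only).
s  <- cbind(c(0, 1, 0, 1, 0.5, 0.2), c(0, 0, 1, 1, 0.5, 0.8))
s0 <- c(0.6, 0.3)

# Assumed exponential covariance: C(h) = sigma2 * exp(-h / phi).
sigma2 <- 2; phi <- 0.7
covfun <- function(h) sigma2 * exp(-h / phi)

C1 <- covfun(as.matrix(dist(s)))              # covariances among sample sites
c0 <- covfun(sqrt(colSums((t(s) - s0)^2)))    # covariances with prediction site

X  <- cbind(1, s)                             # linear trend with intercept
x0 <- c(1, s0)
n  <- nrow(X); q <- ncol(X)

# Covariance form of the universal kriging equations.
A   <- rbind(cbind(C1, X), cbind(t(X), matrix(0, q, q)))
sol <- solve(A, c(c0, x0))
w   <- sol[1:n]; lam <- sol[n + 1:q]

# Three expressions for the prediction variance.
v_direct <- sigma2 - 2 * sum(w * c0) + drop(t(w) %*% C1 %*% w)
v_cov    <- sigma2 - sum(w * c0) - sum(lam * x0)
gamma0   <- sigma2 - c0
v_semi   <- sum(w * gamma0) - sum(lam * x0)
print(c(direct = v_direct, covariance = v_cov, semivariogram = v_semi))

# Semivariogram form of the kriging equations.
G1   <- sigma2 - C1
solg <- solve(rbind(cbind(G1, X), cbind(t(X), matrix(0, q, q))), c(gamma0, x0))
print(cbind(w_cov = w, w_semi = solg[1:n]))           # weights agree
print(cbind(lam_cov = lam, lam_semi = solg[n + 1:q])) # multipliers differ in sign
```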

Prediction Intervals: Prediction intervals can be constructed under an assumption of normality (or an appeal to the Central Limit Theorem) as:
$$\hat{V}_0 \pm z_{1-\alpha/2} \, \sigma_{UK}.$$
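As a small worked illustration, the interval can be computed in R as below. The predicted value and standard error are made-up numbers, not output from any analysis in these notes.

```r
# Hypothetical kriging output at one prediction site (values made up).
v0_hat   <- 712.4    # predicted value
sigma_uk <- 41.5     # prediction standard error (square root of the UK variance)
alpha    <- 0.05     # for a 95% prediction interval

z        <- qnorm(1 - alpha / 2)               # standard normal quantile
pred_int <- v0_hat + c(-1, 1) * z * sigma_uk   # lower and upper limits
print(pred_int)
```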

Estimation of the Trend Parameters: In addition to the kriging predictions and corresponding standard errors, it may also be of interest to obtain estimates for the parameters in the
trend model used in the universal kriging. The best linear unbiased estimator (BLUE) of $\boldsymbol{\beta}$
is the generalized least squares estimator (assuming $C_1$ is known). The GLS estimator of $\boldsymbol{\beta}$
is:
$$\hat{\boldsymbol{\beta}}_{GLS} = (X' C_1^{-1} X)^{-1} X' C_1^{-1} \mathbf{v},$$
and the standard errors of the estimated coefficients are the square roots of the diagonal
elements of:
$$\mathrm{Var}(\hat{\boldsymbol{\beta}}_{GLS}) = (X' C_1^{-1} X)^{-1}.$$
It is important to note that this variance-covariance matrix is for the case when $C_1$ is
known, so that the variance of $\hat{\boldsymbol{\beta}}_{GLS}$ is probably higher than the estimated variance
whenever $C_1$ is really unknown, as is usual in practice.
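A minimal base-R sketch of the GLS computation is given below. The data are simulated, and the exponential covariance with parameters `sigma2` and `phi` is an assumption treated as known, which is exactly the idealized case described above.

```r
set.seed(1)
n <- 30
s <- cbind(runif(n), runif(n))            # hypothetical site coordinates
X <- cbind(1, s)                          # linear trend with intercept

# Assumed known exponential covariance among the sites.
sigma2 <- 1; phi <- 0.3
C1 <- sigma2 * exp(-as.matrix(dist(s)) / phi)

# Simulate data: trend plus correlated errors (via the Cholesky factor of C1).
beta <- c(10, 2, -3)
v <- drop(X %*% beta + t(chol(C1)) %*% rnorm(n))

C1inv_X <- solve(C1, X)                       # C1^{-1} X, without forming C1^{-1}
V_gls   <- solve(t(X) %*% C1inv_X)            # (X' C1^{-1} X)^{-1}
b_gls   <- drop(V_gls %*% t(C1inv_X) %*% v)   # t(C1inv_X) equals X' C1^{-1}
se_gls  <- sqrt(diag(V_gls))                  # standard errors, C1 assumed known

print(cbind(estimate = b_gls, se = se_gls))
```

Solving linear systems with `solve(C1, X)` rather than inverting `C1` explicitly is a common numerical choice; it gives the same estimator.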

In practice, there is one primary difficulty with universal kriging which makes it difficult to identify good predictive models. It is very difficult to choose a trend model and
covariance model simultaneously. For this reason, it is generally easier to first model
the trend, remove it, and then consider the covariance structure separately. One could
model the deterministic piece (trend) and stochastic piece (variogram) separately simply to identify the proper structure, and then perform universal kriging on both pieces
at the same time.

Some Considerations For Modeling Trend and Variation : When data consist of but one realization of a spatial process, there is in general no way to uniquely specify the above decomposition. Often, perhaps usually, there is no obvious deterministic piece of the variation.
Our view of nonstationarity in the mean may also depend on scale. For example, when
looking at one mountain, the elevation does not appear to be stationary. However,
when looking at an entire mountain range, we may want to view elevation as having
a stationary mean. How we eventually decide to model spatial variability depends on
the goals of the study and on what we know about the system.


Example: Recall the soil moisture lattice data explored in a homework assignment
earlier this semester. In the text plot of moisture values, we clearly see one wet corner
and a gradient of decreasing wetness in the 45-degree direction, as well as one extreme
outlier (26).

[Figure: Text Plot of Moisture Data; axes: Column (s1) and Row (s0).]
1. If we were looking at just one field and there is some physical reason that the
corner is wet (the field may slope in that direction, or the irrigation system is close
to that side of the plot), then we would probably want to include the trend we see
in the model.
2. If the field was just one portion of a much larger field with many wetter and drier
areas, we might want to treat it as random variation on this larger scale.
Variogram estimation is greatly affected by our assumptions on the model. If we assume
a constant mean, the apparent gradient is incorporated into the covariance structure.
In the soil moisture example, there is a strong relationship between moisture values
and distance in the 45-degree direction. The variogram in that direction looks like a
power variogram model, with $\mathrm{Var}(Y_i - Y_j)$ continuing to increase as $s_i$ and $s_j$ become
further apart. In the perpendicular ($-45$-degree) direction there is very little trend in the moisture.
The variogram in this direction is nearly flat, indicating a much smaller variance and
possibly white noise (i.e.: no correlation among sites).
A couple of references for papers which examine the relationship between trend and
variation for geological spatial applications are:
1. Starks & Fang 1982. The effect of drift on the experimental semivariogram. Mathematical Geology, 14: 309-319.
2. Journel & Rossi 1989. When do we need a trend model in kriging? Mathematical
Geology, 21: 715-739.

Some Possible Models: In general, there is a wide range of models which explain the variability among spatial data points in a variety of ways. Some possibilities include:
1. Ordinary Kriging with constant mean:
$$Y_i = \mu + \epsilon_i,$$
where $\epsilon_i$ is stationary with mean zero and some covariance structure.
2. Universal Kriging with linear trend:
$$Y_i = \beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i} + \epsilon_i,$$
where $\epsilon_i$ is stationary with mean zero and some covariance structure. Again, this
differs from ordinary kriging in that the deterministic and stochastic pieces are modeled
simultaneously.
3. Universal Kriging with quadratic trend:
$$Y_i = \beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i} + \beta_3 s_{1i}^2 + \beta_4 s_{2i}^2 + \beta_5 s_{1i} s_{2i} + \epsilon_i,$$
where $\epsilon_i$ is stationary with mean zero and some covariance structure.
4. Trend Surface Prediction with uncorrelated errors (e.g.: quadratic or higher-order
trend):
$$Y_i = \beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i} + \beta_3 s_{1i}^2 + \beta_4 s_{2i}^2 + \beta_5 s_{1i} s_{2i} + \epsilon_i,$$
where here the $\epsilon_i \sim (0, \sigma^2)$ are uncorrelated. In this case, no spatial techniques are required,
as it is assumed that once the trend is removed, there is no remaining correlation in
the resulting residuals.
As the model for the mean becomes more complicated and detailed, it explains more
of the variation in the data, and may yield residuals that appear uncorrelated. As with
regression modeling, it may be best to abide by the principle of parsimony, and underfit
rather than overfit the trend in order to keep the mean model relatively simple. Once
an adequate portion of the trend is removed, the remaining variation can be explained
using spatial correlation analyses such as variogram fitting.
From a practical standpoint, there is often not much reason that there should be a
polynomial relationship between the response variable and the geographic coordinates.
If, however, you are measuring concentration of a pollutant and your site is near a point
source, it may be reasonable that the response is a function of the distance from the
source. Any other examples from data we have looked at?


How do you decide what is part of the trend and what is random variation?: The literature
offers little general guidance on this question. Here are some general ideas
which address it.
1. If there is some physical or biological reason that the mean should vary smoothly with
position, the function of position should be included in the trend part of the model.
2. If the objective of the study is to test for positional effects, they should be incorporated
into the trend part of the model.
3. If the objective of the study is prediction, the best advice seems to be to choose a
parsimonious model and use cross-validation techniques to check that the model is
giving reasonable predictions. Cross-validation will not answer the question about what
to consider local variation, but it should prevent problems associated with overfitting
the trend, namely:
(a) Unreliable extrapolation.
(b) Larger mean squared prediction error (MSPE) than necessary.
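The cross-validation idea in item 3 can be sketched in base R as follows. This is only an illustration under simplifying assumptions: the data are simulated with a linear trend and uncorrelated noise, and ordinary least squares trend-surface predictions stand in for the kriging predictor. The helper `loo_mspe` and the list `trend_models` are hypothetical names introduced here.

```r
set.seed(42)
n <- 60
d <- data.frame(s1 = runif(n), s2 = runif(n))
d$y <- 5 + 3 * d$s1 - 2 * d$s2 + rnorm(n, sd = 0.5)   # true trend is linear

# Candidate trend models of increasing complexity.
trend_models <- list(
  constant  = y ~ 1,
  linear    = y ~ s1 + s2,
  quadratic = y ~ s1 + s2 + I(s1^2) + I(s2^2) + I(s1 * s2),
  cubic     = y ~ s1 + s2 + I(s1^2) + I(s2^2) + I(s1 * s2) +
                  I(s1^3) + I(s2^3) + I(s1^2 * s2) + I(s1 * s2^2)
)

# Leave-one-out mean squared prediction error for one trend formula.
loo_mspe <- function(form, data) {
  errs <- sapply(seq_len(nrow(data)), function(i) {
    fit <- lm(form, data = data[-i, ])                      # fit without site i
    data$y[i] - predict(fit, newdata = data[i, , drop = FALSE])
  })
  mean(errs^2)
}

mspe <- sapply(trend_models, loo_mspe, data = d)
print(round(mspe, 3))
```

Comparing the cross-validated MSPE across candidate trends gives a check on whether the extra polynomial terms are improving predictions or merely fitting noise.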
Starks & Fang (1982) elaborate on things to think about when examining a variogram in a
geological context. Much of what they suggest applies to a more general framework than
just geological data:
1. Check to see if drift is evident in the geological description of the area. If it is, include
it in the model.
2. Can the nugget effect in the sample variogram be logically explained by the measurement error or small-scale variation? If not, there may be trend present.
3. Can observed anisotropy be explained geologically?
4. Cross-validate the model and do normal probability plots of the PRESS residuals.
As a statistician, I tend to think of data purely from a quantitative perspective, ignoring
practical issues regarding the data (this is due to my limited background in the pure sciences). What Starks & Fang are saying is that any observed quantitative attribute in your
data should make sense from a practical standpoint to be included in a model. In other
words, let the science, not the data, drive the model.
Effect of Trend on Kriging Predictions: Semivariograms are strongly affected by trends in
the mean. Why?
Trends cause the variables at neighboring sites to appear more highly correlated.
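A small simulation makes the effect visible. The sketch below (base R only; the grid size, trend slopes, and the helper `emp_semivar` are assumptions made for illustration) generates white noise plus a linear trend on a lattice and compares the classical empirical semivariogram of the raw values with that of the residuals from a fitted linear trend.

```r
set.seed(7)
g <- expand.grid(s1 = 1:15, s2 = 1:15)
g$y <- 0.5 * g$s1 + 0.5 * g$s2 + rnorm(nrow(g))   # linear trend + white noise

# Classical (method-of-moments) semivariogram estimate, binned by distance.
emp_semivar <- function(z, coords, breaks) {
  h   <- as.matrix(dist(coords))       # pairwise distances
  d2  <- outer(z, z, "-")^2            # squared differences
  up  <- upper.tri(h)                  # use each pair of sites once
  bin <- cut(h[up], breaks)            # pairs beyond the last break are dropped
  tapply(d2[up], bin, mean) / 2
}

breaks <- 0:8
xy     <- g[, c("s1", "s2")]
raw    <- emp_semivar(g$y, xy, breaks)
detr   <- emp_semivar(residuals(lm(y ~ s1 + s2, data = g)), xy, breaks)
print(round(cbind(raw = raw, detrended = detr), 2))
```

With the trend left in, the squared differences between distant sites include the difference in means, so the estimated semivariogram keeps rising with lag; after detrending it should stay close to the noise variance.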


The kriging predictions, however, seem to be more robust to drift in the mean. Consider the
following four modeling situations:
1. Minimal variogram, underfit the trend (e.g.: ordinary kriging using a variogram based
on just residual variation (i.e.: the trend was removed)).
2. Minimal variogram, fit the right amount of trend (universal kriging).
3. Minimal variogram, overfit the trend (universal kriging where we overfit the trend).
4. Maximal variogram, underfit the trend (e.g.: ordinary kriging using a variogram based
on the variation in V , the response variable).
It turns out that the kriging predictions for interpolated points for cases 1, 2, & 3,
which all use the same variogram type, are very similar provided that kriging is done
in moving neighborhoods. This is typical for kriging as explained shortly.
Extrapolation results, however, can vary a lot depending on the model chosen for the
mean. Additionally, the MSPE will tend to be too large.
The reason that these three predictions (for cases 1, 2, & 3) turn out to be similar (for
interpolation) is that kriging is typically done with moving neighborhoods. For any site
s0 , we use only the sample points nearby (within a specified radius) to compute kriging
predictions. We are really recomputing the constant mean for each site - much like
using a moving window average.
If there is really a trend present, universal kriging computes the trend within the
moving window. If the slope is not too great, the variability within a window may not
be large. Also, in an interpolation, s0 is typically near the center of the points in the
search neighborhood, so the predicted value of Y0 should be similar to Yb within the
neighborhood.
Cases 3 & 4 demonstrate situations where we're not sure which part of the overall
variation is trend and which is random error. For example, in the soil moisture example,
we could probably fit two models reasonably well:
1. Ordinary kriging with constant mean and a power variogram model for the errors
(Maximal variogram, underfit the trend).
2. Kriging with trend and white noise (Minimal variogram, overfit the trend).
In this case, it is less clear what happens to the predictions; however, Cressie asserts that
the misspecifications essentially cancel each other out. In other words, the variability
in Y is incorporated either way: through the trend part or through the random part
of the model. Since both parts are added together to obtain the prediction of Y0 , the
predictor is not affected very much by these misspecifications. The prediction variance,
however, may depend greatly on which model you choose.

Using R to Perform Universal Kriging


To illustrate how R performs universal kriging, consider the following data on piezometric-head heights (in meters above sea level) for the Wolfcamp Aquifer in West Texas, collected
at 85 locations given by easting and northing coordinates, as shown below.
[Figure: Greyscale Map of Head Values and Contour Map of Head Values; axes: Easting, Northing.]

After a few attempts at removing the clear SW-NE trend, I chose a linear regression model
with the easting and northing coordinates as the explanatory variables. The resulting regression model for the head values is given by: $\hat{\mu}(x_1, x_2) = 607.7707 - 1.278 x_1 - 1.139 x_2$.
Ordinary kriging was then performed on the resulting residuals, yielding a spherical model
for the variogram, with major axis at 90 degrees, minor axis at 0 degrees, range ratio at 2.0,
range = 58, nugget = 495, and sill = 2385.
As the spatial region was not rectangular or even convex, the identify function was used
to specify a region over which predictions would be made. The resulting predicted residuals
and corresponding standard errors were computed and plotted, as shown below.

[Figure: Predicted Residuals and SE of Predicted Residuals (two panels each); axes: Easting, Northing.]

The trend function was then added back to the predicted residuals, producing the final
ordinary kriging predicted region, shown below. It should be noted that the prediction
standard errors for the residuals are the same as those for the predictions, as the trend
function is viewed as a nonrandom (deterministic) function. Hence, adding back the trend
function values artificially has no effect on the prediction standard errors.

[Figure: Predicted Head Values (two panels); axes: Easting, Northing.]

Having done all of this preliminary leg work for ordinary kriging, we will use the same trend
function and covariance model for performing universal kriging on the aquifer head values.
The R commands for doing this are given below.
library(gstat)                         # Assumption: krige() here is the gstat function.
aq.ukrige <- krige(head ~ x+y,         # Performs universal kriging, with x & y as
                   aquifer, poly.in,   #   explanatory variables and using the
                   model=model1.out)   #   variogram model in "model1.out".
As with ordinary kriging, contour plots and greyscale maps of the predicted head values and
their corresponding standard errors are given below. Based on these plots, at least visually,
the results from ordinary and universal kriging appear virtually identical, both in terms of
the prediction and their standard errors.
[Figure: Predicted Head Values, Universal Kriging, and SE of Predicted Head Values (two panels each); axes: Easting, Northing.]

Scatterplots of the predictions from the two kriging methods and the corresponding standard
errors were constructed. What do these plots indicate?
While the standard errors and predictions resulting from ordinary and
universal kriging are very similar, there are some notable differences. First, note that the universal
kriging standard errors tend to be larger than the ordinary kriging standard errors,
and especially so the larger these values are. Remember that universal kriging, unlike
ordinary kriging, accounts for the variability in estimating the trend parameters; hence,
it more accurately reflects the overall variability in the predictions.
The predictions resulting from the two methods are very similar but there are some
differences as the points do not fall exactly on the 45-degree line. We can also note
that the universal kriging predictions tend to be larger for the larger head values.
[Figure: UK vs. OK Standard Errors (UK SEs against OK SEs) and UK vs. OK Predicted Values (UK Predictions against OK Predictions).]

These plots were generated with the following commands:

par(mfrow=c(1,2))                       # Sets up a 1x2 graphics display.
plot(sqrt(krige.out$var1.var),          # Plots the UK SEs vs. the OK SEs.
     sqrt(aq.ukrige$var1.var),
     xlab="OK SEs", ylab="UK SEs", pch=16, cex=0.5, cex.lab=1.5, cex.axis=1.5)
lines(c(28,52), c(28,52))               # Overlays the 45-degree line.
title("UK vs. OK Standard Errors", cex.main=1.5)
plot(aq.pred, aq.ukrige$var1.pred,      # Plots the UK vs. OK predictions.
     xlab="OK Predictions",
     ylab="UK Predictions", pch=16, cex=0.5, cex.lab=1.5, cex.axis=1.5)
lines(c(300,1070), c(300,1070))         # Overlays the 45-degree line.
title("UK vs. OK Predicted Values", cex.main=1.5)