Trend, Variation, and Universal Kriging
In ordinary kriging, it is assumed that the mean is constant across the entire region of
interest. The basic model for ordinary kriging is thus:
$$Y_i = \mu + \epsilon_i,$$
where $\mu$ is the constant mean of $Y$ and $\epsilon_i$ is a stationary (either intrinsic or second-order)
random process with semivariogram $\gamma(\cdot)$.
In universal kriging, the mean is allowed to be non-constant and to be a function of the
site coordinates or other explanatory covariates:
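$$Y_i = \beta_0 x_0(s_i) + \beta_1 x_1(s_i) + \cdots + \beta_p x_p(s_i) + \epsilon_i,$$
where $i = 1, \ldots, n$ sites, and the errors are assumed stationary. In matrix notation, this can
be expressed as:
$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} x_0(s_1) & \cdots & x_p(s_1) \\ \vdots & & \vdots \\ x_0(s_n) & \cdots & x_p(s_n) \end{bmatrix} \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_p \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{bmatrix} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}.$$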
This looks just like a linear regression model where Y is regressed on explanatory
variables at the same locations. For example, if there is a linear trend from one side of
the region to the other, we might use the model:
$$Y_i = \beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i} + \epsilon_i,$$
where $(s_{1i}, s_{2i})$ are the coordinates of the $i$th sample point. The expectation or mean in this model is just $\beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i}$ and is known as a spatial trend
or drift.
How is this any different from the trend removal done with ordinary kriging?
The trend need not be a linear model. In general it can comprise nonlinear functions as well as sets of indicator variables. Why would we use
indicator variables?
In universal kriging, we are assuming both a nonconstant mean and the presence of local
(spatial) variation. This gives the following decomposition, with any of the following
pairs of terms used interchangeably:
$Y_i$ = large-scale variation + small-scale variation
      = trend or drift + residual error
      = nonstationary mean + random variation
      = deterministic signal + noise
The method of universal kriging seeks to model these two components simultaneously, given
some model for the trend and a model for the covariance structure. How this technique
specifically differs from ordinary kriging will be illustrated below through the derivation of
the universal kriging equations.
Unbiasedness Conditions: The goal here is to predict V0 . The universal kriging predictor is
a linear combination of the sample values (v1 , . . . , vn ), i.e.:
$$\hat{V}_0 = \sum_{i=1}^{n} w_i v_i.$$
It turns out that in order for this predictor to be unbiased for all possible vectors $\boldsymbol{\beta}$, we need
to satisfy the following conditions:
$$\mathbf{w}'\mathbf{X} = \mathbf{x}_0',$$
where:
$\mathbf{w}' = [w_1, \ldots, w_n]$, the vector of weights,
$\mathbf{X}$ = a matrix of coordinates (or functions of coordinates) for the sample, where each
column corresponds to the values of one explanatory function for each of the $n$ sites,
$\mathbf{x}_0$ = the vector of coordinate values for the point to be predicted.
For example, for a full second-order trend function (quadratic trend), we have:
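$$\mathbf{X} = \begin{bmatrix} 1 & x_1 & y_1 & x_1^2 & y_1^2 & x_1 y_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & y_n & x_n^2 & y_n^2 & x_n y_n \end{bmatrix} \quad \text{and} \quad \mathbf{x}_0' = \begin{bmatrix} 1 & x_0 & y_0 & x_0^2 & y_0^2 & x_0 y_0 \end{bmatrix}.$$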
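Letting $p + 1$ = the number of parameters in the trend function, since $\mathbf{X}$ is $n \times (p + 1)$, there
are $p + 1$ unbiasedness conditions. In the quadratic trend model, this gives 6 unbiasedness
conditions as:
$$\mathbf{w}'\mathbf{X} = \begin{bmatrix} w_1 & \cdots & w_n \end{bmatrix} \begin{bmatrix} 1 & x_1 & y_1 & x_1^2 & y_1^2 & x_1 y_1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & x_n & y_n & x_n^2 & y_n^2 & x_n y_n \end{bmatrix} = \begin{bmatrix} 1 & x_0 & y_0 & x_0^2 & y_0^2 & x_0 y_0 \end{bmatrix} = \mathbf{x}_0',$$
i.e.:
$$\sum_{i=1}^{n} w_i = 1, \quad \sum_{i=1}^{n} w_i x_i = x_0, \quad \sum_{i=1}^{n} w_i y_i = y_0, \quad \sum_{i=1}^{n} w_i x_i^2 = x_0^2, \quad \sum_{i=1}^{n} w_i y_i^2 = y_0^2, \quad \sum_{i=1}^{n} w_i x_i y_i = x_0 y_0.$$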
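In order for the estimator $\hat{V}_0$ to be unbiased, we must have $E(\hat{V}_0) = E(V_0)$, where $E(V_0) =
\mathbf{x}_0'\boldsymbol{\beta}$ for all $\boldsymbol{\beta}$. Since $E(\hat{V}_0) = \mathbf{w}'\mathbf{X}\boldsymbol{\beta}$, we need:
$$\mathbf{w}'\mathbf{X}\boldsymbol{\beta} = \mathbf{x}_0'\boldsymbol{\beta} \quad \text{for all } \boldsymbol{\beta}.$$
Clearly, if $\mathbf{w}'\mathbf{X} = \mathbf{x}_0'$, then $\mathbf{w}'\mathbf{X}\boldsymbol{\beta} = \mathbf{x}_0'\boldsymbol{\beta}$ for all $\boldsymbol{\beta}$. Is the reverse true?
Suppose $\mathbf{w}'\mathbf{X} \neq \mathbf{x}_0'$. Write $\mathbf{X}$ in terms of its columns:
$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \cdots & \mathbf{x}_{p+1} \end{bmatrix}, \quad \text{e.g.: } \mathbf{X} = \begin{bmatrix} \mathbf{1} & \mathbf{x} & \mathbf{y} & \mathbf{x}^2 & \mathbf{y}^2 & \mathbf{xy} \end{bmatrix},$$
and let $x_{0j}$ denote the $j$th element of $\mathbf{x}_0$. Choose $\boldsymbol{\beta} = (1, 0, \ldots, 0)'$.
For this choice of $\boldsymbol{\beta}$, $\mathbf{w}'\mathbf{X}\boldsymbol{\beta} = \mathbf{w}'\mathbf{x}_1$ and $\mathbf{x}_0'\boldsymbol{\beta} = x_{01}$, so if $\mathbf{w}'\mathbf{x}_1 \neq x_{01}$, then certainly $\mathbf{w}'\mathbf{X}\boldsymbol{\beta} \neq \mathbf{x}_0'\boldsymbol{\beta}$, and then $\hat{V}_0$ would not
be unbiased.
If we then select each column of the $\mathbf{X}$ matrix in turn, it follows that if $\mathbf{w}'\mathbf{X} \neq \mathbf{x}_0'$, then
$\mathbf{w}'\mathbf{X}\boldsymbol{\beta} \neq \mathbf{x}_0'\boldsymbol{\beta}$ for some choice of $\boldsymbol{\beta}$.
Since assuming $\mathbf{w}'\mathbf{X} \neq \mathbf{x}_0'$ leads to a violation of
the unbiasedness of $\hat{V}_0$, then by the contrapositive argument, for $\hat{V}_0$ to be unbiased, we must
have $\mathbf{w}'\mathbf{X} = \mathbf{x}_0'$.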
The Universal Kriging Estimator: The universal kriging estimator is the linear unbiased
estimator of V0 with minimum prediction variance. I.e.: we seek to minimize the function:
$$\sigma^2_{UK} = \mathrm{Var}(\hat{V}_0 - V_0) = \mathrm{Var}\Big(\sum_{i=1}^{n} w_i V_i - V_0\Big) = \sum_i \sum_j w_i w_j C_{ij} + \sigma^2 - 2 \sum_i w_i C_{i0},$$
subject to the $p + 1$ unbiasedness conditions given earlier. Note that the only difference between this minimization problem and that of ordinary kriging is the number of unbiasedness
constraints. The form of the variance above is identical to that of ordinary kriging. Using
Lagrange multipliers then, we need to minimize:
$$\sigma^2_{UK} + 2\boldsymbol{\lambda}'(\mathbf{X}'\mathbf{w} - \mathbf{x}_0),$$
where $\boldsymbol{\lambda} = (\lambda_0, \ldots, \lambda_p)'$ is a vector of Lagrange multipliers. Using calculus as with the ordinary kriging minimization
problem yields the following set of universal kriging equations, given in matrix form:
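$$\begin{bmatrix} C_{11} & \cdots & C_{1n} & f_{10} & \cdots & f_{1p} \\ \vdots & & \vdots & \vdots & & \vdots \\ C_{n1} & \cdots & C_{nn} & f_{n0} & \cdots & f_{np} \\ f_{10} & \cdots & f_{n0} & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & & \vdots \\ f_{1p} & \cdots & f_{np} & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_n \\ \lambda_0 \\ \vdots \\ \lambda_p \end{bmatrix} = \begin{bmatrix} C_{10} \\ \vdots \\ C_{n0} \\ f_{00} \\ \vdots \\ f_{0p} \end{bmatrix},$$
or, more compactly (the $f_{ij}$, the values of trend function $j$ at site $i$, are the entries of $\mathbf{X}$ and $\mathbf{x}_0$):
$$\begin{bmatrix} \mathbf{C}_1 & \mathbf{X} \\ \mathbf{X}' & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{w} \\ \boldsymbol{\lambda} \end{bmatrix} = \begin{bmatrix} \mathbf{c}_0 \\ \mathbf{x}_0 \end{bmatrix},$$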
where the only difference between these and the OK equations in form is the replacement of
the one-vectors by the X-matrix defining the trend piece.
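As a rough illustration of how this system can be assembled and solved, here is a minimal base-R sketch. The five sites, their values, the exponential covariance function and its parameters are all hypothetical choices made for the example, not part of the notes; the trend is taken to be linear in the coordinates.

```r
# Hypothetical toy data: five sample sites with coordinates (x, y) and values v.
sites <- cbind(x = c(0, 1, 0, 1, 0.5), y = c(0, 0, 1, 1, 0.3))
v     <- c(1.0, 2.1, 1.9, 3.2, 1.8)
s0    <- c(0.6, 0.6)                       # prediction location

# Assumed exponential covariance: C(h) = sill * exp(-h / range_par).
cov_fun <- function(h, sill = 1, range_par = 0.8) sill * exp(-h / range_par)

# Linear trend basis: intercept, x, y (so p + 1 = 3 columns).
trend <- function(s) cbind(1, s[, 1], s[, 2])

n  <- nrow(sites)
X  <- trend(sites)                         # n x (p + 1) trend matrix
x0 <- as.vector(trend(matrix(s0, nrow = 1)))
p1 <- ncol(X)

C1 <- cov_fun(as.matrix(dist(sites)))      # covariances among the sample sites
c0 <- cov_fun(sqrt(colSums((t(sites) - s0)^2)))  # covariances between sites and s0

# Block system [C1 X; X' 0] [w; lambda] = [c0; x0].
A   <- rbind(cbind(C1, X), cbind(t(X), matrix(0, p1, p1)))
sol <- solve(A, c(c0, x0))
w      <- sol[1:n]                         # kriging weights
lambda <- sol[(n + 1):(n + p1)]            # Lagrange multipliers

pred <- sum(w * v)                         # universal kriging prediction at s0
```

The weights returned this way satisfy the unbiasedness conditions by construction: `t(X) %*% w` reproduces `x0`, and in particular the weights sum to one because the trend includes an intercept.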
Prediction Variance: The prediction variance in terms of the covariance function is:
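$$\sigma^2_{UK} = \mathrm{Var}(\hat{V}_0 - V_0) = \sigma^2 - 2 \sum_i w_i C_{i0} + \sum_i \sum_j w_i w_j C_{ij} = \sigma^2 - \begin{bmatrix} \mathbf{w}' & \boldsymbol{\lambda}' \end{bmatrix} \begin{bmatrix} \mathbf{c}_0 \\ \mathbf{x}_0 \end{bmatrix} \quad \text{(with some work)}.$$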
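The "some work" can be sketched in two lines, assuming the universal kriging equations in the form $\mathbf{C}_1\mathbf{w} + \mathbf{X}\boldsymbol{\lambda} = \mathbf{c}_0$ and $\mathbf{X}'\mathbf{w} = \mathbf{x}_0$. Premultiplying the first equation by $\mathbf{w}'$ and using the second removes the quadratic term:

```latex
\begin{align*}
\mathbf{w}'\mathbf{C}_1\mathbf{w}
  &= \mathbf{w}'(\mathbf{c}_0 - \mathbf{X}\boldsymbol{\lambda})
   = \mathbf{w}'\mathbf{c}_0 - \mathbf{x}_0'\boldsymbol{\lambda}, \\
\sigma^2_{UK}
  &= \sigma^2 - 2\,\mathbf{w}'\mathbf{c}_0 + \mathbf{w}'\mathbf{C}_1\mathbf{w}
   = \sigma^2 - \mathbf{w}'\mathbf{c}_0 - \boldsymbol{\lambda}'\mathbf{x}_0 .
\end{align*}
```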
To express this prediction variance in terms of the semivariogram, the expressions are greatly
simplified if one of the columns of the X-matrix is a column of ones; that is, if there is an
intercept in the trend model. In this case of an intercept model, and under an assumption
of second-order stationarity, the kriging equations can be shown to have the form:
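$$\begin{bmatrix} \boldsymbol{\Gamma}_1 & \mathbf{X} \\ \mathbf{X}' & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{w} \\ \boldsymbol{\lambda}_\gamma \end{bmatrix} = \begin{bmatrix} \boldsymbol{\gamma}_0 \\ \mathbf{x}_0 \end{bmatrix},$$
where: $\boldsymbol{\Gamma}_1$ = the matrix of semivariances among the sample locations,
$\boldsymbol{\gamma}_0$ = the vector of semivariances between the sample sites and the prediction site.
Because $\boldsymbol{\Gamma}_1 = \sigma^2\mathbf{1}\mathbf{1}' - \mathbf{C}_1$ and $\mathbf{w}'\mathbf{1} = 1$, the weights are the same as in the covariance form, while the multiplier vector changes sign: $\boldsymbol{\lambda}_\gamma = -\boldsymbol{\lambda}$.
Important Note: This form of the kriging equations holds only if the trend model has
an intercept term. The form is more complicated in the no-intercept model.
To see how to write the prediction variance in terms of the semivariogram, note that:
$$\boldsymbol{\gamma}_0 = \sigma^2\mathbf{1} - \mathbf{c}_0 = \begin{bmatrix} \sigma^2 - C_{10} \\ \vdots \\ \sigma^2 - C_{n0} \end{bmatrix}$$
under second-order stationarity. Hence, the prediction variance can be expressed as:
$$\begin{aligned} \sigma^2_{UK} &= \sigma^2 - \mathbf{w}'(\sigma^2\mathbf{1} - \boldsymbol{\gamma}_0) - \boldsymbol{\lambda}'\mathbf{x}_0 && \text{(by substitution)} \\ &= \sigma^2 - \sigma^2 + \mathbf{w}'\boldsymbol{\gamma}_0 - \boldsymbol{\lambda}'\mathbf{x}_0 && \text{(since } \mathbf{w}'\mathbf{1} = 1 \text{ for the intercept model)} \\ &= \mathbf{w}'\boldsymbol{\gamma}_0 - \boldsymbol{\lambda}'\mathbf{x}_0 = \mathbf{w}'\boldsymbol{\gamma}_0 + \boldsymbol{\lambda}_\gamma'\mathbf{x}_0. \end{aligned}$$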
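The agreement between the covariance and semivariogram forms can be checked numerically. The following base-R sketch uses a hypothetical four-site configuration, an assumed exponential covariance with sill 2, and a linear trend with intercept; none of these values come from the notes. It solves both block systems and compares the weights, the multipliers, and the resulting prediction variances.

```r
# Hypothetical toy configuration: four sample sites and one prediction site.
sites  <- cbind(x = c(0, 2, 0, 2), y = c(0, 0, 2, 1))
s0     <- c(1.2, 0.7)
sigma2 <- 2                                     # assumed sill (process variance)
cov_fun <- function(h) sigma2 * exp(-h / 1.5)   # assumed exponential covariance

X  <- cbind(1, sites)                           # intercept + linear trend
x0 <- c(1, s0)
n  <- nrow(sites)
p1 <- ncol(X)

C1 <- cov_fun(as.matrix(dist(sites)))           # covariances among sites
c0 <- cov_fun(sqrt(colSums((t(sites) - s0)^2))) # covariances with s0
G1 <- sigma2 - C1                               # semivariances among sites
g0 <- sigma2 - c0                               # semivariances with s0

# Solve [M X; X' 0] [w; multipliers] = [rhs_top; x0] for a given M.
solve_system <- function(M, rhs_top) {
  A <- rbind(cbind(M, X), cbind(t(X), matrix(0, p1, p1)))
  solve(A, c(rhs_top, x0))
}
sol_c <- solve_system(C1, c0)                   # covariance form
sol_g <- solve_system(G1, g0)                   # semivariogram form

w_c <- sol_c[1:n]; lambda_c <- sol_c[-(1:n)]
w_g <- sol_g[1:n]; lambda_g <- sol_g[-(1:n)]

# Prediction variance from each form.
var_cov  <- sigma2 - sum(w_c * c0) - sum(lambda_c * x0)
var_semi <- sum(w_g * g0) + sum(lambda_g * x0)
```

In this sketch the two sets of weights coincide, the multipliers differ only in sign, and `var_cov` equals `var_semi` up to rounding error.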
Prediction Intervals: Prediction intervals can be constructed under an assumption of normality (or an appeal to the Central Limit Theorem) as:
$$\hat{V}_0 \pm z_{1-\alpha/2}\, \sigma_{UK}.$$
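A small worked instance of this interval in R, with a made-up prediction and standard error (both values are hypothetical and chosen only for illustration):

```r
pred  <- 12.4   # hypothetical kriging prediction
se_uk <- 1.7    # hypothetical kriging standard error (sqrt of prediction variance)
alpha <- 0.05   # gives a 95% interval

z <- qnorm(1 - alpha / 2)               # standard normal quantile z_{1 - alpha/2}
interval <- c(lower = pred - z * se_uk,
              upper = pred + z * se_uk)
```

The interval is symmetric about the prediction, and its half-width scales directly with the kriging standard error.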
Estimation of the Trend Parameters: In addition to the kriging predictions and corresponding standard errors, it may also be of interest to obtain estimates for the parameters in the
trend model used in the universal kriging. The best linear unbiased estimator (BLUE) of
$\boldsymbol{\beta}$ is the generalized least squares estimator (assuming $\mathbf{C}_1$ is known). The GLS estimator of
$\boldsymbol{\beta}$ is:
$$\hat{\boldsymbol{\beta}}_{GLS} = (\mathbf{X}'\mathbf{C}_1^{-1}\mathbf{X})^{-1}\mathbf{X}'\mathbf{C}_1^{-1}\mathbf{v}$$
and the standard errors of the estimated coefficients are the square roots of the diagonal
elements of:
$$\mathrm{Var}(\hat{\boldsymbol{\beta}}_{GLS}) = (\mathbf{X}'\mathbf{C}_1^{-1}\mathbf{X})^{-1}.$$
It is important to note that this variance-covariance matrix is for the case when $\mathbf{C}_1$ is
known, so that the variance of $\hat{\boldsymbol{\beta}}_{GLS}$ is probably higher than the estimated variance
whenever $\mathbf{C}_1$ is really unknown, as is usual in practice.
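The GLS formulas can be computed directly in base R. The sketch below uses six hypothetical sites and values and treats an assumed exponential covariance as known, which is exactly the idealized case the formulas describe; the data and covariance parameters are illustrative only.

```r
# Hypothetical data: six sites, observed values v, linear trend in x and y.
sites <- cbind(x = c(0, 1, 0, 1, 0.5, 2), y = c(0, 0, 1, 1, 0.3, 1.5))
v     <- c(1.0, 2.1, 1.9, 3.2, 1.8, 4.4)

# Assumed (treated as known) exponential covariance function.
cov_fun <- function(h, sill = 1, range_par = 0.8) sill * exp(-h / range_par)

X  <- cbind(intercept = 1, sites)          # trend design matrix
C1 <- cov_fun(as.matrix(dist(sites)))      # covariance matrix of the observations

# Use solve(C1, .) rather than forming the inverse of C1 explicitly.
Ci_X  <- solve(C1, X)                      # C1^{-1} X
Ci_v  <- solve(C1, v)                      # C1^{-1} v
XtCiX <- t(X) %*% Ci_X                     # X' C1^{-1} X

beta_gls <- drop(solve(XtCiX, t(X) %*% Ci_v))  # GLS coefficient estimates
var_beta <- solve(XtCiX)                       # their variance-covariance matrix
se_beta  <- sqrt(diag(var_beta))               # standard errors
```

These standard errors are conditional on the covariance being known, so they should be read as optimistic when the covariance parameters were themselves estimated from the same data.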
In practice, the primary difficulty with universal kriging is that it is very hard to choose a trend model and a
covariance model simultaneously, which makes good predictive models difficult to identify. For this reason, it is generally easier to first model
the trend, remove it, and then consider the covariance structure separately. One could
model the deterministic piece (trend) and stochastic piece (variogram) separately simply to identify the proper structure, and then perform universal kriging on both pieces
at the same time.
Some Considerations For Modeling Trend and Variation : When data consist of but one realization of a spatial process, there is in general no way to uniquely specify the above decomposition. Often, perhaps usually, there is no obvious deterministic piece of the variation.
Our view of nonstationarity in the mean may also depend on scale. For example, when
looking at one mountain, the elevation does not appear to be stationary. However,
when looking at an entire mountain range, we may want to view elevation as having
a stationary mean. How we eventually decide to model spatial variability depends on
the goals of the study and on what we know about the system.
Example: Recall the soil moisture lattice data explored in a homework assignment
earlier this semester. In the text plot of moisture values, we clearly see one wet corner
and a gradient of decreasing wetness in the 45-degree direction, as well as one extreme
outlier (26).
[Figure: text plot of the soil moisture values on the lattice, indexed by Row (s0) and Column (s1).]
1. If we were looking at just one field and there is some physical reason that the
corner is wet (the field may slope in that direction, or the irrigation system is close
to that side of the plot), then we would probably want to include the trend we see
in the model.
2. If the field was just one portion of a much larger field with many wetter and drier
areas, we might want to treat it as random variation on this larger scale.
Variogram estimation is greatly affected by our assumptions on the model. If we assume
a constant mean, the apparent gradient is incorporated into the covariance structure.
In the soil moisture example, there is a strong relationship between moisture values
and distance in the 45-degree direction. The variogram in that direction looks like a
power variogram model, with $\mathrm{Var}(Y_i - Y_j)$ continuing to increase as $s_i$ and $s_j$ become
further apart. In the perpendicular ($-45$-degree) direction there is very little trend in the moisture.
The variogram in this direction is nearly flat, indicating a much smaller variance and
possibly white noise (i.e., no correlation among sites).
A couple of references for papers which examine the relationship between trend and
variation for geological spatial applications are:
1. Starks & Fang 1982. The effect of drift on the experimental semivariogram. Mathematical Geology, 14: 309-319.
2. Journel & Rossi 1989. When do we need a trend model in kriging? Mathematical
Geology, 21: 715-739.
Some Possible Models: In general, there is a wide range of models which explain the variability among spatial data points in a variety of ways. Some possibilities include:
1. Ordinary Kriging with constant mean:
$$Y_i = \mu + \epsilon_i,$$
where $\epsilon_i$ is stationary with mean zero and some covariance structure.
2. Universal Kriging with linear trend:
$$Y_i = \beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i} + \epsilon_i,$$
where $\epsilon_i$ is stationary with mean zero and some covariance structure. Again, this
differs from ordinary kriging in that the deterministic and stochastic pieces are modeled
simultaneously.
3. Universal Kriging with quadratic trend:
$$Y_i = \beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i} + \beta_3 s_{1i}^2 + \beta_4 s_{2i}^2 + \beta_5 s_{1i} s_{2i} + \epsilon_i,$$
where $\epsilon_i$ is stationary with mean zero and some covariance structure.
4. Trend Surface Prediction with uncorrelated errors (e.g., quadratic or higher-order
trend):
$$Y_i = \beta_0 + \beta_1 s_{1i} + \beta_2 s_{2i} + \beta_3 s_{1i}^2 + \beta_4 s_{2i}^2 + \beta_5 s_{1i} s_{2i} + \epsilon_i,$$
where here the $\epsilon_i \sim (0, \sigma^2)$ are uncorrelated. In this case, no spatial techniques are required,
as it is assumed that once the trend is removed, there is no remaining correlation in
the resulting residuals.
As the model for the mean becomes more complicated and detailed, it explains more
of the variation in the data, and may yield residuals that appear uncorrelated. As with
regression modeling, it may be best to abide by the principle of parsimony, and underfit
rather than overfit the trend in order to keep the mean model relatively simple. Once
an adequate portion of the trend is removed, the remaining variation can be explained
using spatial correlation analyses such as variogram fitting.
From a practical standpoint, there is often not much reason that there should be a
polynomial relationship between the response variable and the geographic coordinates.
If, however, you are measuring concentration of a pollutant and your site is near a point
source, it may be reasonable that the response is a function of the distance from the
source. Any other examples from data we have looked at?
How do you decide what is part of the trend and what is random variation?: The literature
offers little general guidance on this question. Here are some general ideas
which address it.
1. If there is some physical or biological reason that the mean should vary smoothly with
position, the function of position should be included in the trend part of the model.
2. If the objective of the study is to test for positional effects, they should be incorporated
into the trend part of the model.
3. If the objective of the study is prediction, the best advice seems to be to choose a
parsimonious model and use cross-validation techniques to check that the model is
giving reasonable predictions. Cross-validation will not answer the question about what
to consider local variation, but it should prevent problems associated with overfitting
the trend, namely:
(a) Unreliable extrapolation.
(b) Larger mean squared prediction error (MSPE) than necessary.
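The cross-validation idea in item 3 can be sketched as a leave-one-out loop: drop each site in turn, predict it by universal kriging from the remaining sites, and average the squared errors. The base-R example below is only an illustration under stated assumptions: the 5 x 5 grid, the simulated values, and the exponential covariance (treated as known) are all hypothetical, and the helper names `uk_predict` and `loo_mspe` are invented for this sketch.

```r
set.seed(1)
# Hypothetical data on a 5 x 5 grid with a linear trend plus noise.
grid <- as.matrix(expand.grid(x = 0:4, y = 0:4))
v    <- 2 + 0.5 * grid[, 1] + 0.3 * grid[, 2] + rnorm(nrow(grid), sd = 0.5)

cov_fun <- function(h) exp(-h / 1.5)       # assumed covariance, treated as known

# Two candidate trend models.
trend_linear    <- function(s) cbind(1, s[, 1], s[, 2])
trend_quadratic <- function(s) cbind(1, s[, 1], s[, 2],
                                     s[, 1]^2, s[, 2]^2, s[, 1] * s[, 2])

# Universal kriging prediction at s0 from the given sites and values.
uk_predict <- function(sites, vals, s0, trend) {
  n  <- nrow(sites)
  X  <- trend(sites)
  x0 <- as.vector(trend(matrix(s0, nrow = 1)))
  C1 <- cov_fun(as.matrix(dist(sites)))
  c0 <- cov_fun(sqrt(colSums((t(sites) - s0)^2)))
  A  <- rbind(cbind(C1, X), cbind(t(X), matrix(0, ncol(X), ncol(X))))
  w  <- solve(A, c(c0, x0))[1:n]
  sum(w * vals)
}

# Leave-one-out mean squared prediction error for a given trend model.
loo_mspe <- function(trend) {
  errs <- sapply(seq_len(nrow(grid)), function(i) {
    v[i] - uk_predict(grid[-i, , drop = FALSE], v[-i], grid[i, ], trend)
  })
  mean(errs^2)
}

mspe <- c(linear = loo_mspe(trend_linear), quadratic = loo_mspe(trend_quadratic))
```

Comparing the two entries of `mspe` gives a rough check on whether the richer trend model actually improves prediction at held-out sites; it does not by itself say which part of the variation is trend.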
Starks & Fang (1982) elaborate on things to think about when examining a variogram in a
geological context. Much of what they suggest applies to a more general framework than
just geological data:
1. Check to see if drift is evident in the geological description of the area. If it is, include
it in the model.
2. Can the nugget effect in the sample variogram be logically explained by the measurement error or small-scale variation? If not, there may be trend present.
3. Can observed anisotropy be explained geologically?
4. Cross-validate the model and do normal probability plots of the PRESS residuals.
As a statistician, I tend to think of data purely from a quantitative perspective, ignoring
practical issues regarding the data (this is due to my limited background in the pure sciences). What Starks & Fang are saying is that any observed quantitative attribute in your
data should make sense from a practical standpoint to be included in a model. In other
words, let the science, not the data, drive the model.
Effect of Trend on Kriging Predictions: Semivariograms are strongly affected by trends in
the mean. Why?
Trends cause the variables at neighboring sites to appear more highly correlated.
The kriging predictions, however, seem to be more robust to drift in the mean. Consider the
following four modeling situations:
1. Minimal variogram, underfit the trend (e.g.: ordinary kriging using a variogram based
on just residual variation (i.e.: the trend was removed)).
2. Minimal variogram, fit the right amount of trend (universal kriging).
3. Minimal variogram, overfit the trend (universal kriging where we overfit the trend).
4. Maximal variogram, underfit the trend (e.g.: ordinary kriging using a variogram based
on the variation in V , the response variable).
It turns out that the kriging predictions for interpolated points for cases 1, 2, & 3,
which all use the same variogram type, are very similar provided that kriging is done
in moving neighborhoods. This is typical for kriging as explained shortly.
Extrapolation results, however, can vary a lot depending on the model chosen for the
mean. Additionally, the MSPE will tend to be too large.
The reason that these three predictions (for cases 1, 2, & 3) turn out to be similar (for
interpolation) is that kriging is typically done with moving neighborhoods. For any site
s0 , we use only the sample points nearby (within a specified radius) to compute kriging
predictions. We are really recomputing the constant mean for each site - much like
using a moving window average.
If there is really a trend present, universal kriging computes the trend within the
moving window. If the slope is not too great, the variability within a window may not
be large. Also, in an interpolation, s0 is typically near the center of the points in the
search neighborhood, so the predicted value of Y0 should be similar to Yb within the
neighborhood.
Cases 3 & 4 demonstrate situations where we are not sure which part of the overall
variation is trend and which is random error. For example, in the soil moisture example,
we could probably fit two models reasonably well:
1. Ordinary kriging with constant mean and a power variogram model for the errors
(Maximal variogram, underfit the trend).
2. Kriging with trend and white noise (Minimal variogram, overfit the trend).
In this case, it is less clear what happens to the predictions; however, Cressie asserts that
the misspecifications essentially cancel each other out. In other words, the variability
in Y is incorporated either way: through the trend part or through the random part
of the model. Since both parts are added together to obtain the prediction of Y0 , the
predictor is not affected very much by these misspecifications. The prediction variance,
however, may depend greatly on which model you choose.
[Figure: two panels with Easting and Northing axes.]
After a few attempts at removing the clear SW-NE trend, I chose a linear regression model
with the easting and northing coordinates as the explanatory variables. The resulting regression
model for the head values is given by: $\hat{\mu}(x_1, x_2) = 607.7707 - 1.278 x_1 - 1.139 x_2$.
Ordinary kriging was then performed on the resulting residuals, yielding a spherical model
for the variogram, with major axis at 90 degrees, minor axis at 0 degrees, range ratio at 2.0,
range = 58, nugget = 495, and sill = 2385.
As the spatial region was not rectangular or even convex, the identify function was used
to specify a region over which predictions would be made. The resulting predicted residuals
and corresponding standard errors were computed and plotted, as shown below.
[Figure: maps of the Predicted Residuals and of the SE of Predicted Residuals; axes Easting and Northing.]
The trend function was then added back to the predicted residuals, producing the final
ordinary kriging predicted region, shown below. It should be noted that the prediction
standard errors for the residuals are the same as those for the predictions, as the trend
function is viewed as a nonrandom (deterministic) function. Hence, adding back the trend
function values artificially has no effect on the prediction standard errors.
[Figure: maps of the final ordinary kriging predictions; axes Easting and Northing.]
Having done all of this preliminary leg work for ordinary kriging, we will use the same trend
function and covariance model for performing universal kriging on the aquifer head values.
The R command for doing this is given below.
library(gstat)   # assumption: krige() is taken to be gstat's; the notes do not name the package
# Performs universal kriging, with x & y as explanatory variables and
# using the variogram model in "model1.out".
aq.ukrige <- krige(head ~ x + y, aquifer, poly.in, model = model1.out)
As with ordinary kriging, contour plots and greyscale maps of the predicted head values and
their corresponding standard errors are given below. Based on these plots, at least visually,
the results from ordinary and universal kriging appear virtually identical, both in terms of
the prediction and their standard errors.
[Figure: Predicted Head Values, Universal Kriging, with the corresponding standard errors; axes Easting and Northing.]
Scatterplots of the predictions from the two kriging methods and the corresponding standard
errors were constructed. What do these plots indicate?
While the standard errors and predictions from ordinary and universal kriging are
very similar, there are some notable differences. First, note that the universal
kriging standard errors tend to be larger than the ordinary kriging standard errors,
especially at the larger values. Remember that universal kriging, unlike
ordinary kriging, accounts for the variability in estimating the trend parameters; hence,
it more accurately reflects the overall variability in the predictions.
The predictions resulting from the two methods are very similar but there are some
differences as the points do not fall exactly on the 45-degree line. We can also note
that the universal kriging predictions tend to be larger for the larger head values.
[Figure: scatterplots of UK vs. OK Standard Errors (UK SEs against OK SEs) and of UK Predictions against OK Predictions.]