
Math Geosci

https://doi.org/10.1007/s11004-020-09853-6

Clarifications and New Insights on Conditional Bias

Gilles Bourgault1

Received: 22 May 2019 / Accepted: 7 January 2020


© International Association for Mathematical Geosciences 2020

Abstract This study revisits the conditional bias that can be observed with spatial
estimators such as kriging. In the geostatistical literature, the term “conditional bias”
has been used to describe two different effects: underestimation of high values and
overestimation of low values, or the opposite, viz. overestimation of high values and
underestimation of low values. To add to the confusion, the smoothing effect of the
estimator is always indicated to be the culprit. It seems that geostatisticians have been
debating conditional bias since the birth of geostatistics. Is less or more smoothing
required to alleviate conditional bias, and which one? This paradox is actually resolved
when one considers the different distribution partitions on which conditional expecta-
tion can be calculated. Depending on the partitions of the bivariate distribution of true
versus estimated values, conditional expectation can be calculated on conditional or
marginal distributions. These lead to different types of conditional bias, and smooth-
ing affects them differently. The type based on conditional distributions is smoothing
friendly, while the type based on marginal distributions is smoothing adverse. The same
estimator can display under- and overestimation, depending on whether a conditional
or marginal distribution is considered. It is also observed that all conditional biases,
regardless of the bivariate distribution partitions, are greatly affected by the variance
of the conditioning data and vary with the sampling. A simple estimator correction can
be applied to exactly remove the smoothing-friendly conditional bias in the sample as
measured by the slope of the linear regression between the true and estimated values in
cross-validation. Over many samplings, it is observed that this cross-validation mea-
sure is itself conditionally biased, depending on the variance of the data. On the other
hand, the smoothing-adverse type of conditional bias can be corrected by conditional

Correspondence: Gilles Bourgault, gilles_bourgault@yahoo.com

1 Computer Modelling Group Ltd., Calgary, Canada


simulation that reproduces the distribution of the data. The results are also biased,
depending on the variance of the conditioning data. Correcting for the smoothing-
adverse type will worsen the smoothing-friendly type, and vice versa. Both types of
conditional bias can be corrected by averaging statistics, or averaging estimates, over
multiple samplings.

Keywords Conditional expectation · Conditional bias · Kriging · Smoothing · Cross-validation · Linear regression · Conditional simulation · Resampling

1 Introduction

This study explores the problem of conditional bias that can affect spatially weighted-
average estimators such as kriging. Any spatial estimator of averaging type is affected
by smoothing, and smoothing is always invoked as the culprit when geostatisticians
discuss the problem of conditional bias. The variance of the estimated values is smaller
than the variance of the data, and smaller than the variance of the true values at the
estimation locations. With smoothing, the local mean is well estimated but the extreme
values are not. Since a kriging estimate is an average, the kriging estimator underesti-
mates the high values and overestimates the low values by pulling the extreme values
towards the local mean (Goovaerts 1997; Deutsch and Journel 1998; Deutsch 2002).
While not being biased for the global mean, the estimates have smaller proportions of
extreme values compared with what is found in the data and in the population. This
can be called conditional bias. However, when one cross-plots the true values versus
their kriging estimates, one also often observes exactly the opposite. Now, on aver-
age, the kriged values tend to overestimate the high values and underestimate the low
values (Isaaks and Srivastava 1989; Magri et al. 2003). This is also called conditional
bias. It was first described and addressed by Krige (1951) in the context of mining,
hence giving birth to geostatistics. It appears to be a contradiction to the principle of
smoothing by averaging. Indeed, it seems paradoxical that the same estimator tends
to overshoot the extreme values (on average) while at the same time underestimating
their proportions.
Some geostatisticians advocate that restoring the variance in the estimates is
required to remove the conditional bias (Goovaerts 1997; McLennan and Deutsch
2002; Journel and Kyriakidis 2004). Instead, others advocate for more smoothing by
including more distant data in the local spatial averaging or by giving more influence
to the data global mean in the local estimate (Krige 1951; Rivoirard 1987). This para-
dox around the concept of conditional bias has been a source of much confusion and
many debates among geostatisticians. Is less or more smoothing required to alleviate
conditional bias, and which one? Isaaks (2004) provides a review of previous works
and a good discussion in the context of mining, where the ability of an estimator to
predict local values competes with its ability to predict mineral reserves. The same
estimator cannot achieve local and global estimation. It will be biased for one or the
other, which is yet another way to express the paradox surrounding conditional bias.
This study is an attempt to clarify the concept of conditional bias by revisiting its
basics. An experimental approach is used. Conditional bias is explored by analyzing


estimates and true values from multiple samplings on a synthetic dataset. Kriged
estimates and conditionally simulated values are used to compare conditional bias for
smooth and nonsmooth estimators. First, the definition of conditional expectation is
reviewed to show that an estimator is associated with different types of conditional
bias. Second, sampling experiments show that smoothing affects them differently,
hence resolving the paradox. Corrections for conditional bias are also explored and
discussed.

2 Many Definitions for Conditional Expectation

To help understand the contradiction around conditional bias, one can start with a very
general definition for conditional expectation. The most basic definition is simply
the expected value of a random variable for a given condition. The condition can be
defined on the same variable, or defined on one or multiple related variables such as an
estimator, which is the case of interest in this study. The goal is to define a meaningful
subspace (a partition) within the bivariate distribution of the variable being studied
and its estimator, for a more refined analysis in assessing the accuracy of the estimator
beyond global unbiasedness (unconditional expectation).
Given a random variable Z, with realization values z, given its estimator Z*, with
estimated values z*, and given a condition c, one can define conditional expectation in
the most general manner as E{Z | c} and E{Z* | c}, where E{.} denotes the expected
value operator. If E{Z* | c} = E{Z | c}, then the estimator is not biased for condition
c. If c happens to be defined as Z* = z*, then E{Z* | Z* = z*} = z* by definition, and
the estimator is not conditionally biased if E{Z | Z* = z*} = z* as well. The estimator
is said to be accurate on average. This is the original definition for conditional
unbiasedness as used by Krige (1951). It calls for the statistical characteristics of the
conditional distribution f (Z | Z* = z*), where f (.) denotes the probability density
function. However, one can also define the condition as Z* > t, where t is a threshold or
cutoff value, and compare the conditional expectation E{Z* | Z* > t} with E{Z | Z*
> t}. This actually compares a partition of the marginal distribution f (Z*) with the
conditional distribution f (Z | Z* > t), itself a partition of the bivariate distribution
f (Z, Z*). Now, two subpopulations must be considered to analyze this conditional bias.
Moreover, one can also compare two marginal distributions for the same threshold value,
for example comparing E{Z* | Z* > t} with E{Z | Z > t}. Although both expectations
are conditional, the conditions Z* > t and Z > t are not the same, even if they share
the same threshold. Now, these two conditional expectations call for comparison of a
partition of the marginal distribution f (Z*) with a partition of the marginal distribution
f (Z). No conditional distribution of type f (Z | Z*) is involved. The term "conditional
bias" still applies when E{Z* | Z* > t} ≠ E{Z | Z > t}, as the estimator is usually not
biased without any condition (E{Z*} = E{Z}). Figure 1 presents a diagram showing
the different possible partitions of the bivariate distribution f (Z, Z*), providing a
visual comparison of the partitions associated with the different conditional expectations
used in geostatistics. For example, one can see that the conditions Z* > t and Z > t
are not the same condition; indeed, albeit partly overlapping, they do define two
different partitions of the bivariate distribution f (Z, Z*) (regions 1 and 2 and regions 1
and 4, respectively, as seen in Fig. 1).

Fig. 1 Possible partitions of the bivariate distribution f (Z, Z*) corresponding to different conditional distributions and leading to different types of conditional bias
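These distinctions are easy to reproduce numerically. The sketch below is not from the paper: the log-normal variable, the shrinkage factor 0.5, and the noise level 1.2 are all invented for illustration. It builds a smooth, noisy estimator and evaluates the three expectations over the corresponding partitions of the bivariate distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: Z is log-normally distributed (as in precious-metal
# grades) and Z* is a smooth, noisy estimator of Z. Shrinkage factor and
# noise level are invented for illustration only.
n = 200_000
z = rng.lognormal(mean=0.0, sigma=1.0, size=n)
z_star = z.mean() + 0.5 * (z - z.mean()) + rng.normal(0.0, 1.2, size=n)

t = np.quantile(z, 0.75)  # a high threshold (third quartile of Z)

e_star_star = z_star[z_star > t].mean()  # E{Z* | Z* > t}  (regions 1 and 2)
e_true_star = z[z_star > t].mean()       # E{Z  | Z* > t}  (regions 1 and 2)
e_true_true = z[z > t].mean()            # E{Z  | Z  > t}  (regions 1 and 4)
```

With these assumed parameters the estimator overestimates against the conditional distribution (e_star_star > e_true_star) while underestimating against the marginal one (e_star_star < e_true_true), the apparent paradox described in the Introduction.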
It is understandable that these various definitions are a source of confusion in the
geostatistical literature because they are all used and termed “conditional expecta-
tion/bias.” Mistakes can therefore be made when conditional bias is reported without
a clear reference to the associated distributions (or partitions) involved. As shown in
the remainder of this study, the effect of the estimator smoothing is not the same if one
compares the marginal distributions f (Z * ) and f (Z), or instead, compares the marginal
distribution f (Z * ) with a conditional distribution f (Z | Z * ). Returning to the paradox
motivating this study, one can say that the estimator is overestimating if E{Z * | Z *
> t} > E{Z | Z * > t}, and one can also say that the same estimator is also underestimat-
ing if E{Z * | Z * > t} < E{Z | Z > t}. These are two different conditional biases based
on different conditions. It will be shown that it is possible for an estimator to display
underestimation and overestimation at the same time, depending on which distribu-
tion (or partition) one considers. Now, when comparing E{Z * | Z * > t} with E{Z | Z
> t}, one can already conclude that, if the estimator Z * has the same distribution as
the variable Z, i.e., f (Z*) = f (Z), then all the conditional expectations comparing the
two marginal distributions (E{Z * | Z * > t} versus E{Z | Z > t}) will be equal and the
estimator will be unbiased for these types of conditional expectation (even if Z and Z *
are not even correlated). Of course, these conditional expectations may greatly differ
if f (Z * ) is smoother (smaller variance) than f (Z). Nevertheless, E{Z * | Z * > t} may
still not be biased for E{Z | Z * > t} when considering f (Z | Z * ) instead of f (Z). It will
be shown that it is important to be specific when discussing under/overestimation and
conditional bias.
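The limiting case just mentioned, where f (Z*) = f (Z), can be checked with a random permutation of the data, which preserves the marginal distribution exactly while destroying essentially all correlation (a synthetic sketch, not the paper's dataset):

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.lognormal(size=100_000)
z_star = rng.permutation(z)  # same marginal distribution, ~zero correlation

t = np.quantile(z, 0.75)
e_marginal_star = z_star[z_star > t].mean()  # E{Z* | Z* > t}
e_marginal_true = z[z > t].mean()            # E{Z  | Z  > t}
e_conditional   = z[z_star > t].mean()       # E{Z  | Z* > t}
```

The two marginal-type expectations coincide (the permutation keeps the same set of values above t), while the conditional-type expectation E{Z | Z* > t} collapses to the global mean: the permuted "estimator" is unbiased in the marginal sense yet useless in the conditional sense.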
Conditional expectation is also tied to regression analysis (Draper and Smith 1998).
Linear regression was used by Krige (1951) to measure and correct the conditional
bias of type E{Z | Z* = z*}. After revisiting the relationships between conditional
expectation and linear regression, it will be shown that a kriging estimator can always
be corrected to provide estimates without conditional bias of type E{Z | Z* = z*}, at
least for the previously estimated values. Using various sampling experiments, this
correction will be used and its effects analyzed when estimating new locations.


3 Conditional Expectation and Linear Regression

In the context of spatial estimation, one is interested in predicting the actual value,
at any nonsampled location, from the observed values at the sampled locations. One
can hope that a good estimator will predict correctly, at least on average, over the
field of the study. If one has access to the actual values at any estimated/predicted
location, then a conditional bias analysis can be carried out a posteriori by cross-
plotting the actual values on the y-axis versus the estimated values on the x-axis.
Conditional bias of type E{Z | Z* = z*} is observed as the deviation of the estimated
values from correctly predicting the actual values, on average. This conditional
bias is detected when the expectation of the true values given an estimated value
(conditional expectation) deviates from the 45° line. The deviation does not need to
be linear (Journel and Huijbregts 1978; Isaaks 2004), but following Krige (1951),
linear regression can be used to analyze the linear component of the conditional
expectation associated with a bivariate distribution f (Z, Z*). The equation of the
linear fit of regression is recalled below, as it is the basis for further discussions

E{Z | Z* = z*} = E{Z} + [cov(Z, Z*)/var(Z*)] (z* − E{Z*}),    (1)

where Z is the variable representing the true values, Z* is the variable representing the
estimator/predictor values, z* is an actual value of the estimator (kriging estimate),
E{.} is the expected value operator, cov(·, ·) is the covariance operator, and var(·) is
the variance operator.
In Eq. (1), the ratio cov(Z, Z * )/var(Z * ) measures the slope of the linear fit. The
estimator Z * is said to be conditionally unbiased (David 1977; Journel and Hui-
jbregts 1978; Armstrong 1998; Deutsch and Journel 1998; Chiles and Delfiner 2012)
when

E{Z | Z* = z*} = z*. (2)

Thus, the estimator can correctly predict the actual values on average. Since most
estimators, especially of kriging type, are globally unbiased, one can also write

E{Z*} = E{Z}. (3)

Conditions (2) and (3) allow Eq. (1) to be rewritten in the form

z* = E{Z} + [cov(Z, Z*)/var(Z*)] (z* − E{Z}) ⇒ cov(Z, Z*)/var(Z*) = 1. (4)

Equation (4) indicates that the slope of the linear regression is equal to 1.0 (the
45° line) when the estimator is globally unbiased (Eq. 3) and conditionally unbiased
(Eq. 2). This is a general result, even when the deviation from the 45° line is not linear.
More specifically, this means that the covariance between the actual values and the


estimated values equals the variance of the estimated values (Eq. 4). It will be shown
that this is a key point when trying to fix conditional bias of type E{Z | Z *  z* }. Note
that, if the estimator is globally unbiased (Eq. 3), then Eq. (1) shows that it is also
conditionally unbiased for the global mean when z*  E{Z * }, regardless of the slope
of the linear regression. This highlights the importance of the global mean as a special
value in conditional unbiasedness of type E{Z | Z *  z* }.
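In code, the slope diagnostic of Eqs. (1)–(4), and one simple way to force it to 1, can be sketched as follows (synthetic data; this linear rescaling is only an illustration of the principle, not necessarily the correction used later in this study):

```python
import numpy as np

def regression_slope(z_true, z_est):
    """Slope cov(Z, Z*)/var(Z*) of the linear fit of Eq. (1)."""
    return np.cov(z_true, z_est)[0, 1] / np.var(z_est, ddof=1)

def rescale_to_unit_slope(z_true, z_est):
    """Rescale the estimator so the in-sample slope of Eq. (4) equals 1."""
    b = regression_slope(z_true, z_est)
    return np.mean(z_true) + b * (z_est - np.mean(z_est))

rng = np.random.default_rng(2)
z = rng.lognormal(size=50_000)
# Invented smooth, noisy estimator: shrinkage 0.4 plus Gaussian noise
z_est = z.mean() + 0.4 * (z - z.mean()) + rng.normal(0.0, 1.5, size=z.size)

b0 = regression_slope(z, z_est)          # < 1: conditionally biased
z_corr = rescale_to_unit_slope(z, z_est)
b1 = regression_slope(z, z_corr)         # = 1 by construction
```

Note that when the initial slope is below 1, the rescaling shrinks the estimator further, so enforcing conditional unbiasedness of type E{Z | Z* = z*} reduces the variance of the estimates, consistent with the point made throughout this study that fixing one type of conditional bias worsens the other.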
In practice, conditional bias analysis can also be done a priori with cross-validation
(Efron 1982; Davis 1987; Isaaks and Srivastava 1989), where, in turn over the dataset,
each datum is left out and estimated using its neighbors. The next section explores the
robustness of cross-validation for measuring conditional bias with linear regression,
and the impact of the variance of the sampled data values (due to sampling fluctua-
tions) on the various conditional expectations, for a smooth estimator (kriging) and a
nonsmooth estimator (conditional simulation).
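The mechanics of a leave-one-out cross-validation slope can be sketched as follows. To stay self-contained, inverse-distance weighting of the four closest neighbours stands in for ordinary kriging, and the field and noise level are invented:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for the cross-validation setup: 100 data at random
# 2-D locations, each re-estimated in leave-one-out mode from its four
# closest neighbours.
n = 100
xy = rng.uniform(0.0, 50.0, size=(n, 2))
field = np.sin(xy[:, 0] / 10.0) + np.sin(xy[:, 1] / 10.0)
vals = np.exp(field + 0.7 * rng.normal(size=n))  # log-normal-shaped data

def loo_estimate(i, xy, vals, k=4):
    """Leave-one-out inverse-distance estimate from the k closest data."""
    d = np.linalg.norm(xy - xy[i], axis=1)
    d[i] = np.inf                      # exclude the datum being estimated
    nb = np.argsort(d)[:k]
    w = 1.0 / d[nb]
    return float(np.sum(w * vals[nb]) / np.sum(w))

loo = np.array([loo_estimate(i, xy, vals) for i in range(n)])

# Cross-validation regression slope, cov(Z, Z*)/var(Z*) as in Eq. (1)
slope = np.cov(vals, loo)[0, 1] / np.var(loo, ddof=1)
```

The left-out estimates are smoother than the data (smaller variance), and the slope computed from them is the a priori diagnostic analyzed in the next section.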

4 Geostatistical Experiments

Simple geostatistical experiments can be used to explore how the different condi-
tional expectations behave and compare in terms of bias, and if they can be corrected.
The basic idea is to sample a known population and analyze all types of conditional
expectation used in practice (Sect. 2). The primary variable of the GSLIB dataset,
true.dat (Deutsch and Journel 1998), is chosen as the known population because its
distribution is characterized by a log-normal shape, typical of precious-metal deposits,
where conditional bias is often observed. From the population of the 2,500 true values
(on a 50 × 50 grid), 11 datasets of 100 sample values each are created by random
sampling in the population. This is resampling with the particularity that no dupli-
cates are allowed in a given sampling, but a given sample value (spatial location)
can be included by the luck of the draw in more than one sampling. Resampling
allows for delineation of more robust results and conclusions (Efron 1982). For each
sampling, ordinary kriging with a relatively small neighborhood (four closest sam-
ple values) is done on the remaining 2,400 locations. A small kriging neighborhood
is used because this limits spatial smoothing and is often done in practice to favor
more detailed mapping. However, it is also known to increase the chance of display-
ing conditional bias of type E{Z | Z* = z*} (Rivoirard 1987; Krige 1996; Chiles and
Delfiner 2012), which is of interest in this study. Cross-validation is also done on
each of the 100 conditioning data values for each of the 11 datasets, also using ordi-
nary kriging with the four closest sample values, thus allowing comparison of the
conditional bias analysis by linear regression a priori against the actual conditional
bias of the kriging estimates at the nonsampled locations. In addition, 25 conditional
simulations (using the sequential Gaussian method) are computed at the 2,400 loca-
tions for 3 of the samplings representing different categories of the variance ratio
between the sample values and the true values at the kriging locations. Estimated and
simulated values are used to calculate the various types of conditional expectation.
The results are summarized in Tables 1, 2, and 3, and discussed below.
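The sampling design can be sketched as follows (a synthetic log-normal population of 2,500 values stands in for true.dat; the population size and sample sizes match the description above):

```python
import numpy as np

rng = np.random.default_rng(4)

# 2,500 values standing in for the 50 x 50 "true" grid
population = rng.lognormal(size=2500)

# 11 datasets of 100 values: no duplicates within a sampling, but a
# location can recur across samplings by the luck of the draw
datasets = [rng.choice(2500, size=100, replace=False) for _ in range(11)]

# Variance ratio: 100 sampled values over the remaining 2,400 values
variance_ratios = [
    population[idx].var(ddof=1)
    / np.delete(population, idx).var(ddof=1)
    for idx in datasets
]
```

The variance ratios fluctuate around 1 purely by the luck of the draw, which is the controlling factor highlighted in Table 1.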


Table 1 Comparison of ordinary kriging and cross-validation results for 11 sets of 100 data sampled at
random in the true population of 2,500 values

Sampling    Regression slope    Regression slope    Variance     Variance      Variance ratio
(100 data)  (2,400 kriging      (x-validation       (100 data)   (2,400 true   (100 data/2,400
            estimates)          for 100 data)                    values)       true values)

#1          0.38                0.53                43.10        25.83         2.14
#2          0.62                0.19                30.77        26.35         1.17
#3          0.47                0.31                29.53        26.41         1.12
#4          0.86                0.67                15.88        26.97         0.59
#5          0.80                0.87                10.92        27.16         0.40
#6          0.35                0.95                34.00        26.22         1.30
#7          0.52                0.70                10.82        27.17         0.40
#8          0.63                0.52                32.79        26.26         1.25
#9          0.95                0.82                9.98         27.20         0.37
#10         0.37                0.40                40.44        25.94         1.56
#11         0.86                0.68                11.15        27.17         0.41
Average     0.62                0.60                24.49        26.61         0.97

Table 2 Averaged, minimum, and maximum slopes, over 25 realizations, for the linear regression of 2,400
true values versus 2,400 simulated values, using three different sets of 100 conditioning sampled data values
with different variance ratios relative to the variance of the 2,400 true values

Variance ratio          Averaged            Minimum             Maximum             Regression slope
(100 sampled data/      regression slope    regression slope    regression slope    for kriging
2,400 true values)      (true versus                                                (Table 1)
                        simulated values)

0.37 (#9 in Table 1)    0.54                0.34                0.84                0.95
1.12 (#3 in Table 1)    0.285               0.12                0.41                0.47
2.14 (#1 in Table 1)    0.23                0.13                0.41                0.38

Results are from experiments #1 with a high variance ratio, #3 with a variance ratio close to 1, and #9 with
a low variance ratio (Table 1)

4.1 Kriging Results: Smooth Estimator

Table 1 presents the results, in terms of regression slopes and variances, from the 11
samplings in the known population. The actual conditional bias of type E{Z | Z* =
z*} is measured by the slope of the linear regression of the 2,400 true values versus
the kriged values from the 100 conditioning data values (referred to as the kriging
regression or kriging slope). The first thing one can notice is the variability of the
results. The kriging regression slope varies a lot, from 0.35 to 0.95. Therefore, in
this case, all krigings display the classical conditional bias of type E{Z | Z* = z*},

Table 3 Comparison of marginal distributions for the averaged expected value above a low and a high cutoff,
over 25 realizations, for the 2,400 simulated values (Zs), using three different sets of 100 conditioning
sampled data with different variance ratios relative to the variance of the 2,400 true values (from
experiments #1, #3, and #9 in Table 1; see also Table 2)

Variance ratio     Average from 25 realizations    From 2,400 kriged values      From 2,400 true values
(100 sampled       E{E{Zs | Zs > t}}               E{Z* | Z* > t}                E{Z | Z > t}
data/2,400
true values)       Above first    Above mean       Above first    Above mean     Above first    Above mean
                   quartile       (t = 2.58)       quartile       (t = 2.58)     quartile       (t = 2.58)
                   (t = 0.34)                      (t = 0.34)                    (t = 0.34)

0.37 (#9)          2.79           6.55             2.38           5.59           3.40           7.97
1.12 (#3)          3.22           9.46             2.69           7.38           3.38           7.87
2.14 (#1)          4.66           8.81             3.57           7.23           3.33           7.88
Average            3.56           8.27             2.88           6.73           3.37           7.91
Average relative   1.06           1.05             0.85           0.85           1.00           1.00
to truth

The averaged expected value above cutoff for the realizations is compared with the expected value for the
true values (Z) and for the ordinary kriging estimates (Z*)

characterized by a regression slope lower than 1.0. Note that this was expected when
using a small kriging neighborhood. It indicates that, on average, the estimates tend to
overestimate the values higher than the global mean (regression line below the 45° line)
and underestimate the values lower than the global mean (regression line above the
45° line). In terms of conditional expectations, this translates as E{Z* | Z* = z*} < E{Z
| Z* = z*} when z* < E{Z*}, and E{Z* | Z* = z*} > E{Z | Z* = z*} when z* > E{Z*}.
Another result that is clear is that the regression slope, from cross-validation on the
samples (100 conditioning data values), also varies a lot, from 0.19 to 0.95. However,
as seen in Fig. 2, it does not correlate perfectly with the actual kriging regression slope
of the 2,400 true values versus the 2,400 estimated values. The linear regression slope
from the cross-validation is not a good predictor for the actual linear regression slope
between the true values and the kriging estimates at the nonsampled locations. One can
see that the linear regression slope, for the cross-plot between the kriging regression
slopes versus the cross-validation slopes, is also smaller than 1.0 (actually equaling
0.29). This shows that, on average, the cross-validation slope overestimates the high
values of the kriging regression slope and underestimates the low slope values. Thus,
cross-validation is itself affected by conditional bias of type E{kriging slope | cross-
validation slope = value}, just as kriging is affected by conditional bias of type E{Z | Z* =
z*}. Interestingly, cross-validation is not globally biased, as it correctly estimates the
average kriging regression slope, with E{kriging slope} = E{cross-validation slope}


Fig. 2 Cross-plot with regression slopes for true values versus kriging estimates on the y-axis and regression
slopes from cross-validation on the x-axis. Each point represents a sampling of 100 conditioning data (Table 1
experiments)

(0.6 for the cross-validations versus 0.62 for the krigings). This property is reassuring,
but one should be aware that cross-validation statistics can fluctuate greatly with the
luck of the draw from sampling to sampling. Statistical fluctuations between samplings
are normal, and one should also expect to see a similar effect on the various conditional
expectations/biases. A linear regression slope of cross-validation that is not biased on
average suggests that conditional expectation may also not be biased when averaging
over the experiments. This will be explored below after discussing the effect of the
data variance on the regression slope.
The next question that comes to mind is how to explain the large variations in the
regression slopes among the 11 random sampling datasets. Figures 3 and 4 show the
plot of the kriging regression slope as a function of the variance of the 100 sample
values (Fig. 3), and as a function of the variance of the true 2,400 values at the kriged
locations (Fig. 4). It can be seen that the variance fluctuations for the sets of 100
data values are much larger (from 9.98 to 43.1) than the variance fluctuations for the
sets of the remaining 2,400 values at the kriged locations (from 25.8 to 27.2). The
latter is almost constant around 26.6, the variance of the population. It is fairly clear
that the kriging regression slope is closer to 1 (less conditionally biased of type E{Z
| Z* = z*}) when the variance of the sample values tends to be smaller (Fig. 3). This
means that a lower variance in the conditioning data is beneficial for having a smaller
conditional bias of type E{Z | Z* = z*}. Not only the smoothing of the estimator, but
also the original level of variance in the conditioning data, affects the conditional bias.
To the best of the author’s knowledge, this has never been described before. In the
remainder of this study, the variance ratio of the data variance over the remaining
population variance is used for comparing the datasets on a relative scale in terms


Fig. 3 Cross-plot with regression slopes for true values versus kriging estimates on the y-axis and variance
of the 100 conditioning data used for kriging on the x-axis. Each point represents a sampling of 100
conditioning data (Table 1 experiments)

of their variance. Closer inspection of Fig. 2 reveals that the regression slopes from
cross-validation tend to underestimate kriging regression slopes lower than average
when the variance ratio is larger than 1.0 (hotter color scale) and tend to overestimate
kriging regression slopes higher than average when the variance ratio is smaller than
1.0 (cooler color scale). The variance ratio (relative variance) is a good indicator
for separating underestimation and overestimation of the kriging regression slope by
cross-validation. Next, the statistical fluctuations for the various expectations above
a threshold value are also analyzed in terms of the variance ratio for the different
experiments.
Figure 5 shows the plot of E{Z | Z* > t} versus E{Z* | Z* > t} for a low threshold
value t = first quartile of the true population (Q1). One can see that, on average over
the 11 experiments, the kriging estimates are not conditionally biased: E{E{Z* | Z*
> Q1}} = E{E{Z | Z* > Q1}}. Similarly to the cross-validation slope, averaging over
the experiments has removed this conditional bias. However, the experiments with a
low variance ratio of data variance over remaining population variance (< 1) tend to
underestimate E{Z | Z* > t}, whereas the high variance ratios (> 1) are overestimating.
Like the regression slope of cross-validation, E{Z* | Z* > t} appears to be itself con-
ditionally biased for E{Z | Z* > t}, for a given sampling. As for the regression slope
of cross-validation, the results are split into two groups according to a variance ratio
lower or higher than 1. Figure 6 shows the same plot of E{Z | Z* > t} versus E{Z*
| Z* > t} for a high threshold value t = mean of the true population (M). Now, E{Z*
| Z* > M} appears to overestimate E{Z | Z* > M} on average over the 11 datasets.
The overestimation is more important when the variance ratio of data variance over
The overestimation is more important when the variance ratio of data variance over


Fig. 4 Cross-plot with regression slopes for true values versus kriging estimates on the y-axis and variance
of the 2,400 true values at the kriging locations on the x-axis. Each point represents a sampling of 100
conditioning data (Table 1 experiments)

remaining population variance is high. Averaging this statistic over the experiments
does not seem to be enough to remove this conditional bias. Other corrections will
need to be explored. For a high threshold value, E{Z* | Z* > t}, used to estimate E{Z
| Z* > t}, displays a conditional bias similar to the type E{Z | Z* = z*} shown by kriging
for a high estimated value, and by the regression slope from cross-validation.
Figure 7 shows the plot of E{Z | Z > t} versus E{Z* | Z* > t} for a low threshold
value t = first quartile of the true population (Q1). Figure 8 shows the same plot
of E{Z | Z > t} versus E{Z* | Z* > t} for a high threshold value t = mean of the true
population (M). As expected, when comparing the marginal distribution of the kriging
estimator with the true population, E{Z * | Z * > t} tends to underestimate E{Z | Z > t}
(E{E{Z * | Z * > t}} < E{Z | Z > t}) because of smoothing in f (Z * ). The underestimation
increases when the threshold increases. Interestingly, the bias is more pronounced
for datasets with low variance ratio of data variance over the remaining population
variance, showing again the influence of the data variance fluctuations in the samplings.
For completeness, note that E{Z * | Z * < t} would tend to overestimate E{Z | Z < t}.
These results are not shown, since this type of conditional expectation is not used in
practice.
From the sampling experiments, all krigings tend to underestimate the values
smaller than the global mean and overestimate the values higher than the global mean,
as the regression slopes are all smaller than 1.0. This conditional bias of type E{Z | Z* =
z*} is worse when the variance of the data increases. For the conditional bias of expec-
tation type above a threshold, the estimator (Z * ) displays overestimation (Fig. 6) and
underestimation (Fig. 8), depending on whether one compares the estimator marginal


Fig. 5 Cross-plot of E{Z | Z* > t} on the y-axis versus E{Z* | Z* > t} on the x-axis for t = first quartile and
Z* from ordinary kriging using the conditioning data values from the 11 datasets of experiments in Table 1

Fig. 6 Cross-plot of E{Z | Z* > t} on the y-axis versus E{Z* | Z* > t} on the x-axis for t = mean of population
and Z* from ordinary kriging using the conditioning data values from the 11 datasets of experiments in
Table 1

distribution f (Z*) with the conditional f (Z | Z*) (Fig. 6) or with the marginal distribution
f (Z) (Fig. 8) of the variable to be estimated. The relatively high variance of the
sampled data values versus the variance of the true values at the estimation locations
is detrimental to conditional bias of conditional distribution type f (Z | Z * ) (Fig. 6),
whereas it is favorable to conditional bias of marginal distribution type f (Z * ) versus


Fig. 7 Cross-plot of E{Z | Z > t} on the y-axis versus E{Z* | Z* > t} on the x-axis for t = first quartile and
Z* from ordinary kriging using the conditioning data values from the 11 datasets of experiments in Table 1

Fig. 8 Cross-plot of E{Z | Z > t} on the y-axis versus E{Z* | Z* > t} on the x-axis for t = mean of population
and Z* from ordinary kriging using the conditioning data values from the 11 datasets of experiments in
Table 1

f (Z) (Fig. 8). Conversely, a smaller variance ratio is favorable to conditional bias of
conditional distribution type f (Z | Z * ) and detrimental to conditional bias of marginal
distribution type f (Z * ) versus f (Z). Therefore, the smooth ordinary kriging estimator
can show overestimation for conditional expectation when f (Z * ) is compared with


f (Z | Z*) (conditional distribution), but at the same time show underestimation for
conditional expectation when f (Z*) is compared with f (Z) (marginal distribution).
Generally speaking, all expectations associated with a conditional distribution [f (Z
| Z* = z*) or f (Z | Z* > t)] tend to be underestimated for the low-end values and over-
estimated for the high-end values. Less data variance favors conditional expectation
of conditional distribution type (E{Z | Z* = z*} and E{Z | Z* > t}). Conversely, more
data variance favors conditional expectation of marginal distribution type (E{Z* | Z*
> t} versus E{Z | Z > t}). The conditional biases associated with the two types of
conditional expectation go in opposite directions with a change in the data variance.
Changing the data variance (by the luck of the draw) improves one conditional bias
but worsens the other. It is important to note that, for the two types of conditional
expectation (conditional or marginal distribution), the bias level is controlled by the
data variance, which is set by the luck of the draw when sampling the population. The
smoothness of the estimator, brought about by spatial averaging, is not the only factor
affecting conditional bias.
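This smoothing-friendly versus smoothing-adverse behavior can be sketched numerically. The following snippet (an illustration on synthetic Gaussian values, not the study's dataset; all variable names are hypothetical) builds a smooth estimator carrying half the variance of the true variable, constructed so that the regression of Z on Z * has slope 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical smooth estimator: Z* carries half the variance of Z, and
# Z = Z* + independent noise, so the regression of Z on Z* has slope 1.
z_star = rng.normal(0.0, np.sqrt(0.5), n)
z = z_star + rng.normal(0.0, np.sqrt(0.5), n)

# Conditional-distribution type: slope of the linear regression of Z on Z*
slope = np.cov(z, z_star)[0, 1] / np.var(z_star)

# Marginal-distribution type: expected values above a threshold t
t = 0.0
e_est_above = z_star[z_star > t].mean()  # E{Z* | Z* > t}
e_true_above = z[z > t].mean()           # E{Z  | Z  > t}

print(f"slope of Z on Z*: {slope:.2f}")  # close to 1: smoothing friendly
print(f"E{{Z*|Z*>t}} = {e_est_above:.2f} < E{{Z|Z>t}} = {e_true_above:.2f}")
```

The slope comes out near 1.0 (no conditional bias of conditional distribution type) while E{Z * | Z * > t} falls to about 0.56 against about 0.80 for E{Z | Z > t}: the same smooth estimator is conditionally unbiased in the first sense and biased in the second.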

4.2 Conditional Simulation Results: Nonsmooth Estimator

The previous section was concerned with the effect of the smoothness of the ordinary kriging estimator on the various conditional expectations. This section is concerned with the conditional biases that can be observed when the estimator is not smooth, but instead reproduces the variance of the data. Since the data variance appears to be a controlling factor, the previous kriging results have been put into three broad categories in terms of variance ratio for the sampled data values relative to the remaining data values to be estimated: low (ratio < 1.0), close to 1.0, and high (ratio > 1.0). Table 2 presents the results for 25 conditional sequential Gaussian simulations done using a sample dataset from each of these three categories. From Table 1, datasets #9 (variance ratio = 0.37 < 1.0), #3 (variance ratio = 1.12, close to 1.0), and #1 (variance ratio = 2.14 > 1.0) were used for the conditional simulations. Similarly to the kriging estimates, conditional bias of type E{Z | Z * = z* } is measured by the slope of the linear regression of the 2,400 true values (Z) versus the simulated values (Z s ) from the 100 conditioning sampled values (using Z s as Z * ). In this case, the averaged slope over the 25 conditional simulations is calculated. Table 2 also presents the minimum and maximum regression slopes for the 25 realizations in each category. One can see that conditional simulations tend to generate values that are more conditionally biased of type E{Z | Z * = z* } than the kriged values, since the average regression slopes (0.54, 0.285, and 0.23) are systematically lower for the conditional simulations than for kriging (0.95, 0.47, and 0.38) (for experiments #9, #3, and #1, respectively). Similarly to the kriging estimates, the smaller variance ratios of the sample are associated with the higher regression slopes (less conditionally biased of type E{Z | Z * = z* }). This confirms once more that the conditional bias of conditional distribution type varies with the variance of the conditioning data and that a smaller variance corresponds to a smaller bias.
Some might be surprised that conditional simulations give smaller regression slopes of true values versus estimated/simulated values than kriging (Table 2). Restoring the estimator variance is sometimes suggested as a remedy to this type of conditional bias by arguing that a perfect estimator would have the same variance as the data and would have a regression slope of 1. The poor performance, in terms of conditional bias of type E{Z | Z s = zs } for a bivariate distribution f (Z, Z s ), comes from the fact that the variance of the simulated values (Z s ) is increased by a larger factor than the covariance between the true values and the simulated values. As recalled from Eq. (1), the covariance/variance ratio controls the magnitude of the regression slope. To be exact, conditional sequential Gaussian simulation is designed to restore the distribution, not just the variance. As such, it addresses conditional bias of type E{Z s | Z s > t} = E{Z | Z > t}, which calls for comparing the marginal distributions f (Z s ) and f (Z) for their expected value above a threshold. These results are reported in Table 3. When taking the average results over the three main categories of variance ratio (last two rows in Table 3), one can see that, on average, conditional simulation values (Z s ) tend to reproduce the average above threshold well, with E{E{Z s | Z s > t}} = E{Z | Z > t}. On the contrary, kriging (Z * ) tends to underestimate it, with E{E{Z * | Z * > t}} < E{Z | Z > t}. One can also observe that the conditional simulation for the lowest variance ratio (dataset #9) appears to underestimate on average over the 25 realizations, with E{E{Z s | Z s > t}} < E{Z | Z > t}, whereas the conditional simulations for higher variance ratios (datasets #3 and #1) tend to overestimate on average over 25 realizations, with E{E{Z s | Z s > t}} > E{Z | Z > t}. Interestingly, the variance of the conditioning data values also has an impact on the conditional bias of marginal distribution type when comparing E{Z s | Z s > t} with E{Z | Z > t}. This shows that conditional simulations can also be conditionally biased depending on the luck of the draw in the sampling. Conditional unbiasedness of marginal distribution type, by conditional simulation that restores the data distribution, appears to be true only on average over many realizations and over multiple data samplings. As already seen with the cross-validation slope, and the conditional bias of E{Z * | Z * > t} versus E{Z | Z * > t} for the kriging estimates over a low threshold value (Q1), averaging statistics over multiple samplings tends to remove the conditional bias.
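The slope degradation described above can be mimicked with a toy calculation (synthetic Gaussian values, not the study's simulations; adding an independent fluctuation is only a crude stand-in for conditional simulation): restoring the full variance inflates var(Z s ) by a larger factor than cov(Z, Z s ), so the regression slope drops while the mean above a threshold is recovered.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Smooth estimator with half the variance of Z (same toy setup as before)
z_star = rng.normal(0.0, np.sqrt(0.5), n)
z = z_star + rng.normal(0.0, np.sqrt(0.5), n)

# Crude stand-in for a simulated value: restore the full variance by
# adding an independent fluctuation, so var(Zs) = var(Z) = 1
z_s = z_star + rng.normal(0.0, np.sqrt(0.5), n)

slope_smooth = np.cov(z, z_star)[0, 1] / np.var(z_star)  # near 1
slope_sim = np.cov(z, z_s)[0, 1] / np.var(z_s)           # near 0.5

t = 0.0
print(slope_smooth, slope_sim)
print(z_s[z_s > t].mean(), z[z > t].mean())  # nearly equal: marginal bias gone
```

Here cov(Z, Z s ) stays at 0.5 while var(Z s ) doubles to 1, halving the slope; in exchange, E{Z s | Z s > t} now matches E{Z | Z > t}, which is exactly the trade-off reported in Tables 2 and 3.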
Figures 9, 10, and 11 show the plots of E{Z | Z s > Q1} versus E{Z s | Z s > Q1} for each of the three categories of variance ratio for the sampled data values relative to the remaining data values to be estimated. They show that E{Z s | Z s > Q1} tends to underestimate E{Z | Z s > Q1} for a low variance ratio and overestimate it for a high variance ratio. A variance ratio close to 1.0 appears to yield conditional simulations that are not conditionally biased on average, with E{E{Z s | Z s > Q1}} = E{E{Z | Z s > Q1}} over the 25 realizations. Figures 12, 13, and 14 show the plots of E{Z | Z s > M} versus E{Z s | Z s > M} for each of the three categories of variance ratio for the sampled data values relative to the remaining data values to be estimated. For the higher threshold (M > Q1), E{Z s | Z s > M} tends to overestimate E{Z | Z s > M}. The overestimation tends to increase with the variance ratio for the sampled data values relative to the remaining data values to be estimated. Even for conditional simulations, the higher relative variance of the conditioning data versus the variance of the true values at the simulation locations is detrimental to conditional bias of type E{Z | Z s > t} (conditional distribution). It can also be seen that, in general, E{Z s | Z s > t} increases with the variance of the simulated values. The conditional bias of conditional distribution type increases with the variance of the simulated values, except for the lower threshold (Q1) when the variance ratio is lower than 1.0 (Fig. 9). In this case, the higher variances of the simulated values are associated with smaller biases (closer to the 45° line). Conditional bias of marginal distribution type is impacted by both the ergodic fluctuations (Deutsch and Journel 1998) in the realizations of the simulated values and the variance ratio of the conditioning data versus the population variance (sampling fluctuations).

Fig. 9 Cross-plot of E{Z | Z s > t} on the y-axis versus E{Z s | Z s > t} on the x-axis for t = first quartile and
Z s from 25 conditional Gaussian simulations using the conditioning data values from the low-variance-ratio
dataset #9 of experiments in Table 1. The color scale measures the variance of the simulated values for each
simulation
From the above, restoring the variance removes the conditional bias of marginal distribution type (E{Z s | Z s > t} versus E{Z | Z > t}), but only on average over the realizations and over the three categories of the variance ratio, showing once again the importance of averaging over multiple samplings. Conversely, restoring the variance increases the conditional bias of conditional distribution type (E{Z s | Z s = zs } versus E{Z | Z s = zs }, and E{Z s | Z s > t} versus E{Z | Z s > t}). This is similar to the results observed for the kriging estimator (compare Figs. 9, 10, and 11 with Fig. 5, and Figs. 12, 13, and 14 with Fig. 6). Not only the data variance but also the estimator variance (ergodic fluctuations) has opposite effects on the two types of conditional bias. Fixing the conditional bias of marginal distribution type worsens that of conditional distribution type. The next section revisits Krige's conditional bias correction (Krige 1951) and explores the effects of correcting the kriging estimator to remove conditional bias of conditional distribution type for E{Z | Z * = z* }.


Fig. 10 Cross-plot of E{Z | Z s > t} on the y-axis versus E{Z s | Z s > t} on the x-axis for t = first quartile and
Z s from 25 conditional Gaussian simulations using the conditioning data values from the unit-variance-ratio
dataset #3 of experiments in Table 1. The color scale measures the variance of the simulated values for each
simulation

Fig. 11 Cross-plot of E{Z | Z s > t} on the y-axis versus E{Z s | Z s > t} on the x-axis for t = first quartile and
Z s from 25 conditional Gaussian simulations using the conditioning data values from the high-variance-
ratio dataset #1 of experiments in Table 1. The color scale measures the variance of the simulated values
for each simulation


Fig. 12 Cross-plot of E{Z | Z s > t} on the y-axis versus E{Z s | Z s > t} on the x-axis for t = mean of population
and Z s from 25 conditional Gaussian simulations using the conditioning data values from the low-variance-
ratio dataset #9 of experiments in Table 1. The color scale measures the variance of the simulated values
for each simulation

Fig. 13 Cross-plot of E{Z | Z s > t} on the y-axis versus E{Z s | Z s > t} on the x-axis for t = mean of population
and Z s from 25 conditional Gaussian simulations using the conditioning data values from the unit-variance-
ratio dataset #3 of experiments in Table 1. The color scale measures the variance of the simulated values
for each simulation


Fig. 14 Cross-plot of E{Z | Z s > t} on the y-axis versus E{Z s | Z s > t} on the x-axis for t = mean of
population and Z s from 25 conditional Gaussian simulations using the conditioning data values from the
high-variance-ratio dataset #1 of experiments in Table 1. The color scale measures the variance of the
simulated values for each simulation

5 Correcting the Estimator

As seen in the previous section, restoring the variance of the estimator helps remove conditional bias of marginal distribution type but worsens the conditional bias of conditional distribution type. However, the kriging (or any) estimator can also be modified to remove the linear component of conditional bias of conditional distribution type, E{Z | Z * = z* }. Indeed, one can add a correction component so that the slope of the linear regression of the true values versus the corrected estimated values is exactly 1. Equation (5) presents this correction

Z ∗∗ = Z ∗ + a · e∗ , (5)

where a is the correction factor to be determined, e∗ = Z ∗ − E{Z ∗ }, and Z * is the kriging estimator. Therefore, E{e* } = 0 and E{Z ** } = E{Z * } = E{Z} for a globally unbiased estimator. The definition of the correction component, e* = Z * − E{Z * }, respects the observation that, for linear regression, a globally unbiased estimator is also conditionally unbiased for the expected value, and is more biased as it departs from the expected value. Therefore, the correction needs to be proportional to the difference between the estimate and the mean. The conditional unbiasedness property of conditional distribution type, f (Z | Z ** = z** ), calls for the condition

E{Z | Z ∗∗ = z ∗∗ } = z ∗∗ . (6)


From Eq. (4), and assuming global unbiasedness, this conditional unbiasedness
calls for
cov(Z , Z ∗∗ )/var(Z ∗∗ ) = 1. (7)

Both cov(Z, Z ** ) and var(Z ** ) can be expressed in terms of the primary (kriging) estimator Z * as

var(Z ∗∗ ) = (1 + a)² var(Z ∗ ) (8)

and

cov(Z , Z ∗∗ ) = (1 + a) cov(Z , Z ∗ ). (9)

Note that the variance of the corrected estimator will be greater than the variance
of the kriging estimator only if a > 0 or a < −2. Note also that, if a < −1, then cov(Z,
Z ** ) will be negative, which is not suitable for prediction as it is natural to expect a
positive correlation between a random variable and its estimator. From Eqs. (8) and
(9), one can solve for the multiplicative correction factor thus
1 = cov(Z , Z ∗∗ )/var(Z ∗∗ ) = cov(Z , Z ∗ )/[(1 + a) var(Z ∗ )]  ⇒  a = cov(Z , Z ∗ )/var(Z ∗ ) − 1. (10)

Equation (10) says that conditional bias of type E{Z | Z * = z* }, as measured by the slope of a linear regression, can be corrected when a is defined as the difference between the slope of the linear regression of the true values versus the estimated values and the slope of the 45° line (Eq. 1). In the presence of the typical conditional bias generated by kriging, (1 + a)² < 1, since the slope of the linear regression of the true values versus the estimated values, (1 + a), is between 0.0 and 1.0 (see kriging experiments in Table 1). This indicates that the variance of the corrected estimator (Eq. 8) is even smaller than the variance of the original estimator, var(Z ** ) < var(Z * ). Contrary to the correction for conditional bias of marginal distribution type by restoring the variance, correcting conditional bias of conditional distribution type [f (Z | Z * )] calls for even more reduction in the variance than the one created by the original smoothing of the kriging estimator. This is consistent with the results of conditional simulation (Sect. 4.2), where conditional bias of type E{Z | Z * = z* } is worse for the realizations of simulated values (Table 2) than for the kriging results (Table 1). This is also consistent with the kriging results over the 11 experiments, where the lower data variances produce smaller conditional biases (Fig. 3). Still, a variance reduction is necessary to completely remove conditional bias of type E{Z | Z * = z* }. The corrected estimator is now written as
   
Z ∗∗ = Z ∗ + [cov(Z , Z ∗ )/var(Z ∗ ) − 1] · (Z ∗ − E{Z ∗ })
    = E{Z ∗ } + [cov(Z , Z ∗ )/var(Z ∗ )] · (Z ∗ − E{Z ∗ }). (11)


Note that this correction is similar to Krige’s correction (Krige 1951), although
Krige used a simple average instead of the kriging estimator. It is also equivalent
to Eq. (1) when replacing the (usually unknown) expected value of the variable Z
with that of its estimator Z * . Therefore, one can say that Krige’s correction amounts to
equating the covariance between the true values and corrected estimate values with the
variance of the corrected estimate values as in Eq. (7). Equation (5) is an interesting
reinterpretation of Krige’s correction which exposes the correction factor, a, in an
explicit manner, allowing the variance of the corrected estimator to be linked with that
of the original estimator (Eq. 8).
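The correction of Eqs. (5)–(11) is easy to verify numerically. The sketch below (synthetic values; the estimator is a hypothetical stand-in for kriging, and all names are illustrative) computes a = cov(Z, Z * )/var(Z * ) − 1 and checks that the regression of the true values on the corrected estimator has slope 1 while the estimator variance shrinks by (1 + a)²:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical conditionally biased estimator: regression of Z on Z* ~ 0.5
z_star = rng.normal(0.0, 1.0, n)
z = 0.5 * z_star + rng.normal(0.0, 1.0, n)

# Correction factor a = cov(Z, Z*)/var(Z*) - 1  (Eq. 10)
a = np.cov(z, z_star)[0, 1] / np.var(z_star) - 1.0

# Corrected estimator Z** = Z* + a (Z* - E{Z*})  (Eq. 5)
z_corr = z_star + a * (z_star - z_star.mean())

slope_after = np.cov(z, z_corr)[0, 1] / np.var(z_corr)
var_ratio = np.var(z_corr) / np.var(z_star)

print(slope_after)              # 1 up to floating-point error
print(var_ratio, (1 + a) ** 2)  # matches Eq. (8): variance shrinks
```

Because a is computed from the same sample covariance and variance that define the regression slope, the corrected slope equals 1 exactly (up to round-off), and the variance reduction (1 + a)² < 1 is visible directly.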
Figure 15a, b shows one example of the application of the correction (5) to the esti-
mates obtained from cross-validation (Fig. 15a). One can see that the new regression
line exactly matches the 45° line with a slope of 1.0 (compare Fig. 15a and b). It is clear
that applying the correction of Eq. (5) successfully removes the conditional bias of the
known locations (sampled locations). It is also clear that the variance of the original
estimator is greatly reduced (from 8.91 to 0.86) as the points are more grouped near
the kriging estimator average value (compare Fig. 15a and b). Again, this shows the
importance of the global mean for conditional unbiasedness of type E{Z | Z *  z* }.
However, the more important test is the performance of the corrected estimator at the
remaining 2,400 unknown locations. Figure 16a, b compares the linear regression of
the true values versus the kriging estimates before and after applying the correction
factor estimated from the cross-validation on the conditioning data (Fig. 15a). The
original kriging estimates show a conditional bias with a regression slope smaller than
1.0 (Fig. 16a). The correction factor estimated from the cross-validation overcorrects
the conditional bias for the unknown locations, as the new regression line now has a
slope value greater than 1.0 (Fig. 16b). This can be explained when recalling the results
of Fig. 2. For a linear regression slope smaller than the average (< 0.6) (actually 0.31
in Fig. 15a), cross-validation tends to underestimate the actual linear regression slope
(actually 0.47 in Fig. 16a). Thus, the correction factor based on cross-validation (a = cross-validation slope − 1) is larger (in negative value) than it should be, leading to an
overcorrection. The opposite would apply when correcting the kriging estimates using
a correction factor from a cross-validation regression slope higher than the average
(> 0.6). The correction would then be too small, because cross-validation slopes larger
than the average tend to overestimate the actual regression slope. This would give a
new regression line with a slope larger than the actual slope but still lower than 1.0
(verified, not shown).
Because the linear regression slope of cross-validation suffers from conditional
bias, it is not robust at correcting the kriging estimates for new estimation locations.
This illustrates well the risk of relying on cross-validation statistics that are based on a
single sampling. Since this study uses a synthetic dataset, the true correction factor for
conditional bias of type E{Z | Z *  z* } is known for each of the 11 sampling experi-
ments. The next section explores the impact, on the various conditional expectations,
of applying the exact correction factor.
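The overcorrection mechanism can be reproduced with a toy calculation (synthetic values; the 0.31 and 0.47 slopes are borrowed from the example of Figs. 15a and 16a, and the setup is a hypothetical stand-in for the unknown locations): applying a correction factor derived from an underestimated slope pushes the corrected regression slope above 1.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000

# Synthetic stand-in for the unknown locations: actual regression slope ~0.47
z_star = rng.normal(0.0, 1.0, n)
z = 0.47 * z_star + rng.normal(0.0, 1.0, n)

# Cross-validation underestimated the slope (0.31 instead of 0.47), so the
# correction factor a = slope - 1 is too negative
a_cv = 0.31 - 1.0
z_corr = z_star + a_cv * (z_star - z_star.mean())

slope_after = np.cov(z, z_corr)[0, 1] / np.var(z_corr)
print(slope_after)  # about 0.47/0.31 = 1.5: the correction overshoots 1
```

The corrected slope is the actual slope divided by the cross-validation slope, so any underestimate of the slope at cross-validation translates directly into a corrected slope greater than 1.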


Fig. 15 a Example of cross-validation plot for the 100 true values versus their 100 kriging estimates,
and b cross-validation plot for the 100 true values versus their 100 corrected kriging estimates removing
conditional bias of conditional distribution type seen in a


Fig. 16 a Example of cross-plot between the 2,400 true values versus the 2,400 kriging estimates from
same 100 conditioning data used in Fig. 15a, and b cross-plot between the 2,400 true values versus the
2,400 corrected kriging estimates using correction factor corresponding to the correction of conditional bias
of conditional distribution type as estimated from cross-validation from Fig. 15a, b


Fig. 17 Cross-plot of E{Z | Z ** > t} on the y-axis versus E{Z ** | Z ** > t} on the x-axis for t = first quartile
and Z ** from the corrected ordinary kriging estimates from the 11 datasets of experiments in Table 1

6 Removing Conditional Bias of Conditional Distribution Type by Applying the Exact Correction Factor

For accurate spatial predictions, on average, one is really interested in correcting the conditional bias of type E{Z | Z * = z* } for locations where no value has been measured yet. To study the effects of the exact correction (Eqs. 5, 10, 11), the ordinary kriging estimates of the remaining 2,400 locations (using the 100 known data locations) are corrected using the exact correction factor (Eq. 5) for each of the 11 sample dataset experiments in Table 1 (a = kriging regression slope − 1).
Figure 17 shows the plot of E{Z | Z ** > t} versus E{Z ** | Z ** > t} for a low threshold value t = first quartile of the true population (Q1). It shows that, similarly to the noncorrected kriging estimator (Z * ), the corrected kriging estimates (Z ** ) are not conditionally biased on average over the 11 experiments, with E{E{Z ** | Z ** > Q1}} = E{E{Z | Z ** > Q1}} (compare Figs. 17 and 5). Again, similarly to the noncorrected kriging estimator (Z * ), the experiments with a low variance ratio of data variance over remaining population variance underestimate E{Z | Z ** > t}, whereas high variance ratios overestimate it. Figure 18 shows the plot of E{Z | Z ** > t} versus E{Z ** | Z ** > t} for a high threshold value t = mean of the true population (M). In contrast to the noncorrected ordinary kriging estimates, E{Z ** | Z ** > M} appears to be not or less conditionally biased for E{Z | Z ** > M} (compare Figs. 18 and 6). Here, the correction appears to favor the datasets with a high variance ratio, whereas datasets with a low variance ratio were favored before the correction (Fig. 6). This is an effect of the extra smoothing brought about by the correction. Not surprisingly, since f (Z ** ) is smoother than f (Z * ), the corrected kriging estimator (E{Z ** | Z ** > t}) is more biased for E{Z | Z > t} than the noncorrected kriging estimator (compare Fig. 19 with Fig. 7


Fig. 18 Cross-plot of E{Z | Z ** > t} on the y-axis versus E{Z ** | Z ** > t} on the x-axis for t = mean of
population and Z ** from the corrected ordinary kriging estimates from the 11 datasets of experiments in
Table 1

Fig. 19 Cross-plot of E{Z | Z > t} on the y-axis versus E{Z ** | Z ** > t} on the x-axis for t = first quartile
and Z ** from the corrected ordinary kriging estimates from the 11 datasets of experiments in Table 1

and Fig. 20 with Fig. 8). Again, the bias is more pronounced for datasets with low
variance ratio of data variance over remaining population variance (at least for the low
threshold value, Fig. 19).
Note that it was also observed (not shown) that the same results can be obtained
when reducing the variance of the 100 conditioning data values (Z) as


Fig. 20 Cross-plot of E{Z | Z > t} on the y-axis versus E{Z ** | Z ** > t} on the x-axis for t = mean of
population and Z ** from the corrected ordinary kriging estimates from the 11 datasets of experiments in
Table 1

Z ′ = (a + 1) · (Z − E{Z }) + E{Z } , (12)

where a + 1 = kriging regression slope (Eq. 10). Equation (12) reduces the variance of the data values instead of reducing the variance of the kriging estimates, because a + 1 < 1 (Table 1). The end result is exactly the same when kriging uses Z ′ without additional correction of the kriging estimates, instead of using Z and correcting the kriging estimates with Eq. (5). As suggested by one reviewer, spatially declustering the dataset can also possibly be beneficial to alleviate the conditional bias of conditional distribution type. This will be true if the declustering weights actually reduce the data variance. Spatial declustering is usually required when the sampling has a tendency to create clusters of sample locations in the neighborhood of sweet spots. In this study, each sampling was done at random and declustering was not required.
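Because ordinary kriging weights sum to 1, the equivalence between shrinking the data first (Eq. 12) and correcting the estimate afterward (Eq. 5) can be checked on a toy weighted average (hypothetical weights, data, and correction factor, with the sample mean standing in for E{Z}):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(5.0, 2.0, 8)  # hypothetical conditioning data values
w = rng.random(8)
w /= w.sum()                    # stand-in kriging weights summing to 1
m = data.mean()                 # stand-in for E{Z} (= E{Z*} if unbiased)
a = -0.4                        # hypothetical correction factor, a + 1 < 1

# Route 1: estimate with the original data, then correct via Eq. (5)
z_star = w @ data
z_corr = z_star + a * (z_star - m)

# Route 2: shrink the data toward the mean via Eq. (12), then estimate
data_shrunk = (a + 1.0) * (data - m) + m
z_from_shrunk = w @ data_shrunk

print(np.isclose(z_corr, z_from_shrunk))  # True: the two routes coincide
```

The equivalence holds because both routes compute (1 + a)(Z * − m) + m, and the mean term passes through any linear estimator whose weights sum to 1.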
The above results show that E{Z | Z ** = z** } = z** , for the corrected kriging estimator, seems to entail that E{Z ** | Z ** > t} = E{Z | Z ** > t}, but only on average over the experiments (E{E{Z ** | Z ** > t}} = E{E{Z | Z ** > t}}). As seen in many results so far, averaging statistics over the experiments seems to be the key for correcting conditional bias. In the next section, this avenue is further explored by averaging the estimates themselves.

7 Correcting Conditional Bias by Averaging over Sampling Experiments

Fig. 21 Cross-plot for the 2,500 true values versus the 2,500 averaged kriging estimates over the 11 datasets
of experiments in Table 1

It is interesting to learn from the effects of applying the exact correction factor, a, for correcting the conditional bias of conditional distribution type. Of course, the correction factor cannot be known a priori, because one cannot know the covariance between the estimates and the true values at the new estimation locations before the true values are known. However, it has been shown that conditional unbiasedness can
be observed for the averaged conditional expectations over the 11 dataset experiments for the kriging estimates (correcting conditional bias of conditional distribution type above a threshold) and also over the 25 realizations for the simulated values (correcting conditional bias of marginal distribution type). Capitalizing on averaging over the experiments, Fig. 21 shows the cross-plot of the true values (Z) versus the average of the kriging values themselves [Z ** = avg(Z * )], over the 11 datasets used in the experiments of Table 1. This way, no correction factor (a) is required. One can observe that conditional bias is practically removed (regression slope equal to 1.1). Note also that the average of the variance ratios is 0.97, indicating that the 11 samplings are globally unbiased for the population variance (Table 1). Note that, if one averages the kriging estimates for the datasets with a variance ratio smaller than 1.0, then one gets a regression slope equal to 1.31. This indicates an overcorrection by focusing on the datasets with less variance, which are known to be less conditionally biased of type E{Z | Z * = z* } to start with (Fig. 3). The contrary is observed if, instead, the averaging is done for the datasets with a variance ratio greater than 1.0. Then, a regression slope equal to 0.8 is obtained. This indicates an undercorrection by focusing on the datasets with more variance, which are known to be more conditionally biased of type E{Z | Z * = z* } to start with (Fig. 3). Once more, the results confirm that conditional bias of type E{Z | Z * = z* } is reduced when estimates have their variance reduced. Indeed, the variance of the average of the kriging estimates [var(Z ** = avg(Z * ))] over


the 11 experiments is 6.92, whereas the average of the kriging estimate variances [avg(var(Z * ))] is 14.47.

Table 4 Comparing conditional expectations for the averaged kriging estimates [avg(Z * )] over the 11
experiments of Table 1 with the true values (Z)

                                   Conditional distribution   Marginal distribution         True distribution
                                   E{Z | avg(Z * ) > t}       E{avg(Z * ) | avg(Z * ) > t}  E{Z | Z > t}
Above first quartile (t = 0.34)    2.65                       2.71                          3.37
Above mean (t = 2.58)              5.63                       5.29                          7.93
Table 4 presents the expected value of the true values given that the averaged kriging values are above a threshold. Practically no conditional bias is observed, E{avg(Z * ) | avg(Z * ) > t} = E{Z | avg(Z * ) > t}. Thus, having E{Z | avg(Z * ) = z* } = z* entails E{avg(Z * ) | avg(Z * ) > t} = E{Z | avg(Z * ) > t}. Of course, the averaged kriging estimates underestimate the true values for their proportions above a threshold because of their increased smoothing, with E{avg(Z * ) | avg(Z * ) > t} < E{Z | Z > t}. With the 11 experiments, we observe that E{avg(Z * ) | avg(Z * ) > t} < E{E{Z * | Z * > t}} < E{Z | Z > t}, with (2.71, 5.29) < (3.0, 6.61) < (3.37, 7.93), respectively, for (t = Q1, t = M). This again shows that correcting for conditional bias of conditional distribution type worsens the conditional bias of marginal distribution type.
It appears that conditional bias of conditional distribution type [f (Z | Z * )] can be
removed by averaging multiple kriging estimates from different random samplings, as
long as these samplings are not globally biased for the population variance (averaged
variance ratio close to 1). This could give a direction for further research, with other
datasets, to explore how many samplings might be required to average over, depending
on factors such as the skewness of the distribution of the population and the sampling
size.
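The benefit of averaging estimates over independent samplings can be sketched with a stylized model (an assumption for illustration only: each sampling yields the true value plus an independent error, which is not how kriging estimates are actually built):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 50_000, 11

# Stylized model: 11 hypothetical samplings, each giving the true value
# plus an independent unit-variance error
z = rng.normal(0.0, 1.0, n)
estimates = z + rng.normal(0.0, 1.0, (k, n))

def slope(y, x):
    """Slope of the linear regression of y on x: cov(y, x)/var(x) (Eq. 1)."""
    return np.cov(y, x)[0, 1] / np.var(x)

print(slope(z, estimates[0]))       # about 0.5: one sampling is biased
print(slope(z, estimates.mean(0)))  # about 0.92: averaging approaches 1
```

Averaging leaves the covariance with the truth unchanged while shrinking the error variance by the number of samplings, so the regression slope climbs toward 1 as more samplings are averaged, consistent with the direction suggested above.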

8 Conclusions

Smoothing is always observed when the set of kriging estimates is globally compared
with the set of conditioning data or with the set of the true values. Because of spa-
tial averaging, the kriging estimates overestimate the proportion of low values and
underestimate the proportion of high values. This conditional bias is detected when
comparing marginal distributions [f (Z * ) versus f (Z)]. However, when a specific krig-
ing estimate is compared locally with its corresponding true value, then overestimation
of the high values and underestimation of the low values is observed on average. This
other conditional bias is detected when comparing conditional distributions [f (Z * | Z * = z* ) versus f (Z | Z * = z* )]. Both conditional biases are inherent characteristics of
the ordinary kriging estimator. It is also observed that conditional bias is not reserved


for kriging estimates. Any statistic estimated from the conditioning data values has the potential to be conditionally biased, and the bias depends on the variance of the
conditioning data relative to the variance of the population. The linear regression slope
of cross-validation is a good example.
Conditional expectation can be defined in different ways depending on the defi-
nition of the condition. Two important definitions, related to an estimator, are used
in geostatistics: conditional expectation of the true values given an estimator value or a range of values (E{Z | Z * = z* } and E{Z | Z * > t}), and the conditional expectation of the estimator values above a given threshold (E{Z * | Z * > t}). The former
characterizes a conditional distribution [f (Z | Z * )], whereas the latter characterizes a
marginal distribution [f (Z * )]. The term “conditional bias” can be applied to condi-
tional distributions and to marginal distributions, and the same estimator can display
different types of conditional bias, explaining the source of so much confusion in the
geostatistical literature. Both types of conditional bias are dependent on the variance
of the estimates, which are themselves directly dependent on the variance of the data.
It is shown here that the conditional expectation of conditional distribution type (E{Z | Z * = z* } and E{Z | Z * > t}) is smoothing friendly, whereas the conditional expectation of marginal distribution type (E{Z * | Z * > t} versus E{Z | Z > t}) is smoothing
adverse. Because the two types of conditional bias behave inversely with a change in
the variance, correcting for one type of conditional bias necessarily worsens the other
type. Trying to fix conditional bias of conditional distribution type (E{Z | Z * = z* } and E{Z | Z * > t}), by reducing the variance of the estimates, will necessarily worsen
the conditional bias of marginal distribution type (E{Z * | Z * > t} versus E{Z | Z > t}).
Vice versa, trying to fix conditional bias of marginal distribution type, by increasing
the variance, will necessarily worsen the conditional bias of conditional distribution
type. As already noted before (Isaaks 2004), there is no estimator that can fix the two
types of conditional bias at the same time.
Estimated values (z* = E{Z * | Z * = z* }) are used to predict the conditional expectation of type E{Z | Z * = z* } at their spatial locations, whereas conditional expectation of type E{Z * | Z * > t} is used for global estimation when trying to estimate E{Z | Z * > t} or E{Z | Z > t} for the whole field. Conditional bias is observed for E{Z | Z * = z* } and E{Z | Z * > t} when the estimator is not smooth enough, whereas conditional bias for E{Z * | Z * > t} versus E{Z | Z > t} is observed when the estimator is smooth. If E{Z * | Z * > t} is used to estimate E{Z | Z * > t}, then smoothing is good. If, instead, E{Z * | Z * > t} is used to estimate E{Z | Z > t}, then smoothing is bad. Again, the same estimator cannot be used to estimate both E{Z | Z * > t} and E{Z | Z > t}. If the estimation of E{Z | Z > t} is required, then conditionally simulated values (Z s ) should be used instead of the kriged values (Z * ).
Conditional bias of marginal distribution type can be corrected, or at least lessened,
by conditional simulation that will reproduce the data distribution [f (Z s ) = f (Z)]. Of
course, this is assuming that the sample of the conditioning data values represents
well the distribution of the true values at the simulation locations. As seen in this
study, their respective variance can be quite different (see variance ratios in Table 1).
It is observed here that conditional simulations correct for conditional bias of marginal
distribution type on average over the 25 realizations and, at least, over the three selected
experiments representing all the categories of variance ratios (< 1,  1, > 1; Table 3).
Realizations from a particular sampling dataset are likely to show conditional bias of marginal distribution type, depending on the variance ratio of the conditioning data values versus the true values at the simulation locations. A lower variance ratio tends to make E{Zs | Zs > t} underestimate E{Z | Z > t}, and vice versa for a higher variance ratio. For a high threshold, overestimation is systematically observed for conditional expectation of conditional distribution type when comparing E{Zs | Zs > t} with E{Z | Zs > t}. Simulated values are thus not adequate for estimating conditional expectation of conditional distribution type. For a given sampling, conditional biases of both types are sensitive to the ergodic fluctuations of the simulated values, and any one particular realization is likely to be biased. It is the statistics over the ensemble that are less biased, or sometimes unbiased, for conditional expectation of marginal distribution type, depending on the variance of the conditioning data.
It is shown here that conditional bias of conditional distribution type [f(Z | Z*)] is also sensitive to the level of variance in the conditioning data. A lower data variance favors a lower conditional bias of conditional distribution type (E{Z | Z* = z*} and E{Z | Z* > t}). Correcting for conditional bias of conditional distribution type reduces the variance of the kriging estimator by regrouping the corrected values closer to the global mean. Indeed, a globally unbiased estimator is also conditionally unbiased at the global mean, as E{Z*} = E{Z} implies E{Z | Z* = E{Z}} = E{Z}. Moving the estimates closer to the global mean acts as a rotation of the regression line around the global mean and towards the 45° line. By doing so, the kriging estimates become even smoother and lose some of their local spatial accuracy. A better local statistical prediction (E{Z | Z* = z*}) comes at the price of losing spatial details. The extreme case for local statistical accuracy would be to assign the global mean at every estimation location, achieving local unbiasedness for sure but losing all spatial details. Therefore, local accuracy/details should be distinguished from local unbiasedness, which is more correctly termed local statistical accuracy (i.e., local accuracy on average only, over the field). Going from local accuracy, which is important for mapping details, to local statistical accuracy, which is important for prediction, requires additional smoothing. The kriging estimator can be tuned towards one of these two goals by changing the size of the kriging neighborhood: a large neighborhood favors local statistical accuracy over local accuracy (Rivoirard 1987). As pointed out by one reviewer, correcting the kriging estimates by adding smoothing will make them depart from their minimal kriging variance property (the same goes for conditionally simulated values when the smoothing is removed).
It is worth recalling that the kriging variance is not a measure of the actual errors of the kriged values versus the true values. It does not depend on the actual data values, but only on the geometry of the data locations relative to the location being estimated (Deutsch and Journel 1998; Chiles and Delfiner 2012). It represents the missing variance, for a given local conditioning data geometry, that the estimator would need to recover to match the data variance (Deutsch 2002). It is a modeling error and becomes useful when one needs to restore the variance of the estimator. It is therefore perfectly acceptable to adjust the estimates if the goal is to alleviate conditional bias.
A simple formula (Eqs. 5, 10, 11), based on the slope of the linear regression between the true values and the estimated values, can be applied to correct the linear component of the conditional bias of conditional distribution type. Even when the conditional expectation is not linear, a linear correction can still be applied to remove this linear component.
When the correction factor (Eqs. 5, 10) is measured by cross-validation, the slope of the linear regression in cross-validation is shown here to be itself conditionally biased. Cross-validation statistics appear to fluctuate significantly from one sampling to another. From sampling experiments, it is shown here that cross-validation tends to overestimate the kriging regression slope for samplings with low data variance and to underestimate it for samplings with higher data variance. Therefore, Krige's correction, based on the regression slope from cross-validation, cannot completely remove conditional bias (of conditional distribution type) when applied on the basis of a single sampling. However, it is observed that conditional bias of conditional distribution type can be corrected simply by averaging the kriging estimates over multiple random samplings of the population. When it comes to correcting for conditional bias of both types, averaging estimates over multiple samplings, and averaging statistics over multiple realizations and samplings, appears to be a good strategy. It reduces the variance of the estimates and averages out the fluctuations due to the luck of the draw in the sampling. The variance reduction for the kriging estimates reduces the conditional bias of conditional distribution type, while the averaging of statistics for the simulated values removes the ergodic fluctuations and reduces the conditional bias of marginal distribution type. Using a single sampling will likely lead to some conditional bias. Using multiple samplings is simple in principle, but may not be simple to implement in practice. Resampling is a useful tool for obtaining more robust statistics and should be considered whenever possible.
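The slope-based linear correction can be sketched on a toy model (an illustration of the principle only, not Eqs. 5, 10, 11 applied to the data of this study; here the slope is computed against the true values, whereas in practice it would be estimated by cross-validation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.normal(0.0, 1.0, n)           # true values (global mean 0)
z_star = z + rng.normal(0.0, 0.7, n)  # noisy, insufficiently smooth estimates

# Correction factor: slope of the linear regression of Z on Z*
# (in practice estimated by cross-validation, with the caveats noted above)
b = np.cov(z, z_star)[0, 1] / z_star.var()

# Linear correction: shrink the estimates towards the global mean
m = 0.0  # global mean, assumed known here
z_corr = m + b * (z_star - m)

# The corrected estimates have regression slope 1 and reduced variance
b_after = np.cov(z, z_corr)[0, 1] / z_corr.var()
print(f"slope before: {b:.3f}, after: {b_after:.3f}")
print(f"variance before: {z_star.var():.3f}, after: {z_corr.var():.3f}")
```

The correction rotates the regression line to the 45° line (slope 1), and does so by making the estimator smoother (lower variance), as discussed above.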
A final note is warranted to discuss the difference between using E{E{Zs | Zs > t}} (averaged statistics for simulated values over multiple realizations and multiple samplings) to estimate E{Z | Z > t} (Table 3), and using E{avg(Z*) | avg(Z*) > t} (average kriging values over multiple samplings) to estimate E{Z | avg(Z*) > t} (Table 4). As discussed above, it is observed that E{avg(Z*) | avg(Z*) > t} is not biased for E{Z | avg(Z*) > t}, and E{E{Zs | Zs > t}} is not biased for E{Z | Z > t}. However, E{avg(Z*) | avg(Z*) > t} is systematically smaller than E{E{Zs | Zs > t}} (2.71 versus 3.56 for t = Q1, and 5.29 versus 8.27 for t = M; Tables 3 and 4). The difference is probably explained by the smoothness of the distribution of the avg(Z*) values. Because the simulated values (Zs) cannot be pinned to any particular spatial location, E{Z | Z > t} can probably never be achieved in practice, even if it can be well estimated by E{E{Zs | Zs > t}}. In contrast, the spatial locations of the averaged kriging values [avg(Z*)] are known, allowing selection of locations above the threshold value. Thus, it is more realistic to achieve E{Z | avg(Z*) > t}, as estimated by E{avg(Z*) | avg(Z*) > t}, than to achieve E{Z | Z > t}, which appears to be an ideal upper limit for global estimation. This has an important consequence in practice: if simulated values (Zs) are used to estimate the global reserves, the global estimation may be too optimistic.
Finally, the average of the kriging estimates [avg(Z*)] reconciles local statistical accuracy (no bias for E{Z | avg(Z*) = z*}) with global estimation accuracy (no bias for E{Z | avg(Z*) > t}). The same estimator [avg(Z*)] can be used for local and global estimation, but only for the expected value (E{Z | avg(Z*) > t}), as f(avg(Z*) | avg(Z*) > t) is smoother than f(Z | avg(Z*) > t) (i.e., variance of 6.95 versus 27.15 for t = Q1, and 8.2 versus 57.1 for t = M).
Acknowledgements Comments and suggestions by two anonymous reviewers helped to improve the
original version of the manuscript.
References
Armstrong M (1998) Basic linear geostatistics. Springer, Berlin
Chiles JP, Delfiner P (2012) Geostatistics: modeling spatial uncertainty, 2nd edn. Wiley series in probability and statistics. Wiley
David M (1977) Geostatistical ore reserve estimation. Elsevier, Amsterdam
Davis B (1987) Uses and abuses of cross validation in geostatistics. Math Geol 19(3):241–248
Deutsch CV (2002) Geostatistical reservoir modeling. Oxford University Press, New York
Deutsch CV, Journel AG (1998) GSLIB: geostatistical software library and user's guide, 2nd edn. Oxford University Press, New York
Draper NR, Smith H (1998) Applied regression analysis, 3rd edn. Wiley
Efron B (1982) The jackknife, the bootstrap, and other resampling plans. CBMS-NSF regional conference series in applied mathematics, monograph 38. Society for Industrial and Applied Mathematics, Philadelphia
Goovaerts P (1997) Geostatistics for natural resources evaluation. Oxford University Press, New York
Isaaks E (2004) The kriging oxymoron: a conditionally unbiased and accurate predictor. In: Leuangthong O, Deutsch CV (eds) Geostatistics Banff 2004. Springer, Berlin, pp 363–374
Isaaks EH, Srivastava RM (1989) An introduction to applied geostatistics. Oxford University Press, New York
Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic Press, London
Journel A, Kyriakidis P (2004) Evaluation of mineral reserves: a simulation approach. Oxford University Press, New York
Krige DG (1951) A statistical approach to some basic mine valuation problems on the Witwatersrand. J Chem Metall Min Soc S Afr 52:119–139
Krige DG (1996) A practical analysis of the effects of spatial structure and data available and used, on conditional biases in ordinary kriging. In: 5th international geostatistics congress, Wollongong, Australia
Magri EJ, Gonzalez MA, Couble A, Emery X (2003) The influence of conditional bias in optimum ultimate pit planning. In: Application of computers and operations research in the minerals industries. South African Institute of Mining and Metallurgy
McLennan J, Deutsch C (2002) Conditional bias of geostatistical simulation for estimation of recoverable reserves. In: CIM proceedings, Vancouver 2002
Rivoirard J (1987) Teacher's aide: two key parameters when choosing the kriging neighborhood. Math Geol 19(8):851–856