
Summary of Geostatistical Analyst

Spatial Analysis by Visual & Statistical Description

• Contour Map – lines delineating values of equality (p42).

• Symbol Map – Symbols posted representing a class of data values (p43).

• Grayscale Map – Similar to a symbol map but uses gradations of gray values (p44).
• Indicator Maps – Binary raster plots delineating class values (p45).

• Each plot represents a different range or threshold of values, e.g. concentrations of dissolved solids in a lake.

• Distributions of values above or below a threshold are displayed & their spatial patterns may be analyzed for trends (directional bias).
• Moving Window Statistics – used to assess
anomalies of average values and variability
(heteroscedasticity) of values throughout a region (p46).

• Create a grid whose size is dependent upon the size of the data set.

• Analogous to a raster spacing (spatial resolution).

• Given a data set with a 1 meter grid spacing and 100 points:

• Select a window (search neighborhood) that is not so large as to over-generalize, yet not so small that anomalies are missed.

• Search neighborhoods are used to compute means and standard deviations.

• If local means (based upon the number of values used in the grid) are heavily influenced by erratic values, use the median.

• Means (top value) & standard deviations (bottom value) for moving windows are posted (p47).

• Note patterns: standard deviations are higher where the variability of values within a window is greatest; higher means correspond to the highest values in a window.

• Standard deviations vary more than means throughout the region. A minimal sketch of a moving-window pass appears below.
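The following is a minimal Python sketch (not part of the original deck) of a moving-window pass over a regular 2-D grid; numpy is assumed, & the name `moving_window_stats` is illustrative only, not an ArcGIS call.

```python
import numpy as np

def moving_window_stats(grid, window=5):
    """Mean & standard deviation within a square moving window.

    grid   : 2-D array of measured values on a regular raster.
    window : odd window width in cells (the search neighborhood).
    """
    half = window // 2
    rows, cols = grid.shape
    means = np.full(grid.shape, np.nan)
    stds = np.full(grid.shape, np.nan)
    for i in range(half, rows - half):
        for j in range(half, cols - half):
            block = grid[i - half:i + half + 1, j - half:j + half + 1]
            means[i, j] = block.mean()   # swap in np.median for erratic data
            stds[i, j] = block.std()
    return means, stds

# Example: a 20 x 20 grid of synthetic values.
rng = np.random.default_rng(0)
values = rng.normal(100, 10, size=(20, 20))
m, s = moving_window_stats(values, window=5)
```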
Proportional Effect

• Anomalies in the local variability have an impact on the accuracy of estimations in spatial statistical techniques.

• Areas of uniform values (low variability) render the most credible estimates.

• Consider four relationships between local mean and local variability (p49).

Graph A – Local mean (straight line) and variability are constant. Data values fluctuate about the local mean, but with no obvious change in variability.

Graph B – A trend in the local mean, but the variability remains constant w/ rising & falling mean.

Graph C – Constant local mean while the variability exhibits a trend (rises).

Graph D – A trend in both the local mean and variability (both mean and standard deviation rise). The most common case in Earth Science data.

• Ideally, data should be as in A or B (relatively constant variability), but D is acceptable: although the variability changes noticeably, it is related to the local mean and is therefore somewhat predictable.

• When a relationship exists between the local mean and local standard deviation, a proportional effect exists.

• A scatterplot (mean vs. standard deviation) provides a visual means for assessing this relationship (p51); a sketch follows below.

• A low correlation coefficient indicates a weak relationship.

• Univariate, normally distributed data can be spatially arranged to render a proportional effect.
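A minimal sketch of the scatterplot check, assuming numpy; the synthetic data & variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic values whose spread grows with the local mean (case D):
x = np.linspace(1, 10, 200)
values = rng.normal(loc=10 * x, scale=x)      # std proportional to mean

# Collect local means & local standard deviations from contiguous windows.
windows = values.reshape(20, 10)
local_mean = windows.mean(axis=1)
local_std = windows.std(axis=1)

# The correlation coefficient summarizes the mean vs. standard deviation
# scatterplot; a value near 1 indicates a strong proportional effect.
r = np.corrcoef(local_mean, local_std)[0, 1]
print(f"correlation(local mean, local std) = {r:.2f}")
```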
Geostatistical Analyst – an ArcGIS extension used to create a
statistically valid, continuous surface.

Basic steps to creating a surface

• Add layers in ArcGIS.

• Explore statistical properties of data through:

• Histogram (univariate distribution).

• Voronoi polygons – analyze spatial variability.

• Normal QQPlot – check for normality.

• Trend Analysis – identify global trends.

• Semivariogram/covariance cloud – analyze spatial dependencies (autocorrelation) in the data.

• General QQPlot – explore whether two datasets have the same distributions.

• Cross-covariance cloud – analyze cross-covariance between two datasets.

• Select a surface model to create a surface.

• Assess the output surface.

• Compare other models for optimal fit.

• Two basic methods of interpolation –

• Deterministic – uses mathematical functions.

• Geostatistical – uses statistical & mathematical functions.

Deterministic Methods

• Sample points (observed) of phenomena are established at locations.

• Algorithms interpolate (predict) other (unmeasured) points & create a continuous surface.

• Premise – objects that are close to each other tend to be more alike than objects further apart (Tobler, 1970).
• Inverse distance weighted method – based upon local similarity.

• General inverse distance weighted formula – Z′(s0) = Σ λi Z(si)

• Weight formula – λi = di0⁻ᵖ / Σ di0⁻ᵖ where,

Z′(s0) = a predicted value for location s0.
λi = the weight assigned to each measured point.
Z(si) = an observed value at location si.
di0 = the distance between the prediction location (s0) & each measured location (si).
p = a power parameter (higher powers plot as a steeper curve at the left end of a graph of relative weight (y axis) vs. distance (x axis)).
• Weights decrease rapidly w/ distance, & are scaled so that Σ λi = 1.

• Optimal p values are derived as the minimum RMS prediction error.

• The cross-validation process computes the difference between the measured & predicted value at each location; a minimal IDW sketch w/ leave-one-out cross-validation follows below.
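A minimal Python sketch of the IDW formulas above (not part of the original deck), reusing the five-point data set from the worked semivariogram example later in these notes; numpy is assumed, & the names `idw_predict` / `idw_rms` are illustrative.

```python
import numpy as np

def idw_predict(xy, z, xy0, p=2.0):
    """IDW prediction: Z'(s0) = sum(lambda_i * Z(si)) with
    lambda_i = d_i0^-p / sum(d_j0^-p), so the weights sum to 1."""
    d = np.linalg.norm(xy - xy0, axis=1)
    if np.any(d == 0):                  # exact interpolator: return the
        return float(z[np.argmin(d)])   # measured value at a known point
    w = d ** -p
    w /= w.sum()
    return float(w @ z)

def idw_rms(xy, z, p):
    """Leave-one-out cross-validation RMS prediction error for power p."""
    errs = []
    for i in range(len(z)):
        mask = np.arange(len(z)) != i
        errs.append(idw_predict(xy[mask], z[mask], xy[i], p) - z[i])
    return float(np.sqrt(np.mean(np.square(errs))))

xy = np.array([(1, 5), (3, 4), (1, 3), (4, 5), (5, 1)], dtype=float)
z = np.array([100, 105, 105, 100, 115], dtype=float)
print(idw_predict(xy, z, np.array([2.0, 4.0])))
# The optimal p minimizes the RMS prediction error:
print(min((idw_rms(xy, z, p), p) for p in (1, 2, 3)))
```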

• An exact interpolator (predicts a value identical to the measured value at a known location w/o a prediction standard error, & forces the surface to pass through the data values).

• An output surface is sensitive to clustering & outliers.

• Works best when the data are evenly distributed.

• A predicted surface cannot extend above the maximum or below the minimum data value.

• IDW assumes the surface is driven by local variation captured by neighborhood values.

• Use the Validation option by creating training (contains measured locations) & test datasets (used to validate the predictions).
• Global & Local Polynomial Interpolators – modeled by polynomial equations.

• First order – linear.

• Second order – one bend in the plane (quadratic).

• Third order – two bends (cubic).

• An inexact interpolator (does not predict a value identical to the measured value at a measured location, & does not force the surface to pass through the data values).

• Points will plot above & below the fitted plane.


Global Interpolation.

• Use for gradually varying surfaces.

• Used to examine & remove global trends (trend analysis).

• Surfaces are highly susceptible to extreme values, especially near edges.

• Anisotropic (ellipse) or isotropic (circle) distributed data can be modeled.

• Anisotropic – spatial dependence (autocorrelation) changes w/ distance & direction between two locations.

• Isotropic – spatial dependence (autocorrelation) changes w/ only distance between two locations.
Local Interpolation.

• Accuracy of any surface increases w/ an ability to identify local trends.

• Fits many polynomials w/in overlapping neighborhoods.

• Accounts for local variation in a surface.

• Fits a surface using all points in a defined neighborhood.

• Provides a smooth fit for short-range variation w/in longer-range trends.

• Successive iterations cross-validate an output surface using optimal parameters based upon a minimum RMS prediction error.

• Local polynomials are sensitive to neighborhood distance.

• Anisotropic (ellipse) or isotropic (circle) distributed data can be modeled.
• Radial basis functions – (local/neighborhood) polynomial interpolation (smoothing).

• An exact interpolator (forces the surface to pass through the data values).

• Variants – thin-plate spline, spline w/ tension, completely regularized spline, multiquadric spline, & inverse multiquadric spline.

• Conceptually like rubber sheeting through points w/ minimum curvature.

• A predicted surface can extend above the maximum or below the minimum data value.

• Uses smaller, overlapping planes.

• Thin-plate splines.

• Splines w/ tension.

• The optimal parameter is derived as the minimum RMS prediction error. Higher values produce smoother surfaces.

• Use to create smooth surfaces from large sets of points.

• Best for gently varying surfaces.

• Not good for irregular, highly varied data, or data prone to error or uncertainty.
Geostatistical Methods

• Regionalized variable in geostatistics – consists of a structured aspect reflecting large-scale tendencies, and a random aspect reflecting local irregularities.

• The accuracy of any surface increases w/ an ability to identify local trends.

• Must characterize the error & variability of the predicted surface.

• Based on statistical models.

• Geostatistical methods provide measures of error & uncertainty (accuracy of predictions).
Basic Principles of Geostatistical Methods

• Assumes all values are a result of random processes w/ dependence (as opposed to an independent random process, e.g. a coin toss).

• Dependent random process example – a coin is tossed three times: tails, heads, & heads. A fourth value is determined by a rule: since the second & third flips are heads, assign the fourth the same as the first (tails).

• Reality is represented by introducing randomness through local fluctuations, termed drift (as opposed to errors in trend analysis), around a fixed surface.

• Fluctuations are not errors in geostatistical analysis, but rather actual features w/ their own structure.
• Autocorrelation – statistical correlation between spatial random
variables of the same type where correlation is dependent upon
distance and/or direction (spatially dependent) separating the
locations.

• In geostatistics, knowing the spatial locations allows distances to be computed between observations & autocorrelation to be modeled as a function of distance through the semivariogram.

• Observe the scatter of points in the semivariogram. If points are close to the line of best fit on one side, yet spread out more on the other side, directional autocorrelation may exist.
Spatial Continuity

• Exists for most Earth Science data (similar values for data close to each other; positive spatial autocorrelation).

• Scatterplots may be employed in semivariograms to show the value of one variable against the value of the same variable at a nearby location.

h – Scatterplots

• Shows all possible pairs of data values whose locations are separated by a given distance in a given direction.

• The location of any point can be described by a vector, as can the separation between two points (p52).

• Given an origin at (0,0), point (xi, yi) can be written as vector ti, and point (xj, yj) can be written as vector tj.

• The separation between points i & j is ti − tj, with components (xi − xj, yi − yj). The vector (separation) between these two points (i to j) will now be referred to as hij.

• On an h-scatterplot, the x axis is labeled V(t), & the y axis is labeled V(t+h).

• The x coordinate of a point is the value at a particular location, and the y coordinate is the value a distance and direction h away.

• Recall an h value is a vector, thus we can express it as h = (0,1), where we have taken each data location and paired it with the data location whose easting is the same, and whose northing is 1 m larger to the north (p53,A).

• The shape of the cloud of points on an h-scatterplot delineates how continuous the data values are over a certain distance in a specific direction (p54).

• If data values at locations separated by h are similar, the pairs will plot close to the line x = y, a 45° line through the data cloud (p54).

• As the separation distance increases, the similarity between pairs of values decreases, and the points on the h-scatterplot spread out further from the diagonal line. A minimal sketch of collecting h-scatterplot pairs follows below.
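A minimal sketch of collecting the (V(t), V(t+h)) pairs for a separation vector such as h = (0, 1); numpy is assumed & `h_scatterplot_pairs` is an illustrative name.

```python
import numpy as np

def h_scatterplot_pairs(coords, values, h, tol=1e-9):
    """Collect (V(t), V(t+h)) pairs for a separation vector h.

    coords : (n, 2) array of (easting, northing) locations.
    values : (n,) array of data values.
    h      : separation vector; (0, 1) pairs each point with the point
             1 unit to the north at the same easting.
    """
    pairs = []
    for i, ci in enumerate(coords):
        target = ci + np.asarray(h, dtype=float)
        match = np.all(np.abs(coords - target) < tol, axis=1)
        for j in np.flatnonzero(match):
            pairs.append((values[i], values[j]))   # one (x, y) point on the plot
    return np.array(pairs)

# Regular 1 m grid; pair each point with its neighbor 1 m to the north.
xs, ys = np.meshgrid(np.arange(5.0), np.arange(5.0))
coords = np.column_stack([xs.ravel(), ys.ravel()])
vals = np.random.default_rng(2).normal(300, 20, size=len(coords))
cloud = h_scatterplot_pairs(coords, vals, h=(0, 1))
```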
Correlation Functions, Covariance functions, and
Variograms

• h-scatterplots require some quantitative summary of the information contained within them.

• One essential feature is the fatness of the cloud of points.

• Summarized by the correlation coefficient ρ(h).

• As the cloud gets fatter, the coefficient gets smaller.
h        Correlation    Covariance (ppm²)    Moment of Inertia (ppm²)
(0, 1)   0.742          448.8                312.8
(0, 2)   0.590          341.0                479.2
(0, 3)   0.560          323.8                521.4
(0, 4)   0.478          291.5                652.9

• The relation between the correlation coefficient of an h-scatterplot, ρ(h), and h is termed a correlation function or correlogram (p57).

• The correlation coefficient is dependent on h, which is a vector defined as having magnitude and direction.

• Covariance (C(h)) – an alternative index for spatial continuity; its relation to h is termed the covariance function (p57).

• A covariance function is also plotted as the covariance C(h) against h, & is likewise inversely related to the fatness of the cloud (p57).
• Moment of inertia (γ(h)) – computed as 1/(2n) Σ (xi − yi)² about the line x = y (45°).

• As h increases, points drift away from the 45° line; thus the moment of inertia is a measure of the fatness of the cloud – it increases as the cloud gets fatter.

• The relationship between the moment of inertia of an h-scatterplot and h is termed a semivariogram (variogram for short) (p57). A minimal sketch computing all three summaries follows below.
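A minimal sketch computing the three summaries of one h-scatterplot (ρ(h), C(h), & γ(h)); numpy is assumed & `h_summaries` is an illustrative name.

```python
import numpy as np

def h_summaries(pairs):
    """Summaries of one h-scatterplot, given (V(t), V(t+h)) pairs:
    correlation rho(h), covariance C(h), & moment of inertia gamma(h)."""
    x, y = pairs[:, 0], pairs[:, 1]
    rho = np.corrcoef(x, y)[0, 1]
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    gamma = np.mean((x - y) ** 2) / 2.0    # 1/(2n) * sum (xi - yi)^2
    return rho, cov, gamma

demo = np.array([[300.0, 310.0], [320.0, 315.0], [290.0, 300.0], [305.0, 299.0]])
print(h_summaries(demo))
```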

• Aberrant points can have a pronounced impact on summary statistics.

• Note the significant difference in the correlation coefficients computed w/ & w/out a single aberrant point value (19 ppm):
Correlation Coefficient
h        All Points    19 ppm Excluded
(0, 1)   0.742         0.761
(0, 2)   0.590         0.625
(0, 3)   0.560         0.551
(0, 4)   0.478         0.559

Components of the Semivariogram

A semivariogram surface is controlled by the lag size and number of lags.

Lag – the distance between pairs of points.

• If the lag is too large, short-range autocorrelation is masked. If the lag is too small, many empty bins will result.

• Lag size x number of lags should be < 1/2 the largest distance among all points.

• Lag (bin) size – a limit of the distance (width) defining a bin into which pairs of locations of approx. equal distance & direction are placed to reduce the number of combinations.

• Increase or decrease the lag based on a small or large range of points.

Binning (classification by lag size) – reduces the number of points plotted, making the semivariogram easier to interpret.

• Locations are grouped by distance & direction.

• Semivariogram (distance h) = 0.5 * average[(value at location Z(si) − value at location Z(sj))²].

• Only the average distance & semivariance for all pairs in a bin are plotted as a single point on the empirical semivariogram.

• An empirical semivariogram value (from the y axis) for each bin in each direction is plotted as a red dot. The x axis is the distance from the center of the bin to the center of the semivariogram surface.

• A semivariogram surface is plotted for each bin value (average distance & semivariance for all pairs in a bin), represented as color-coded pixels.
Range – the distance where the semivariogram model levels out.

• Sample points separated by less than the range are spatially autocorrelated. Points beyond the range make no useful contribution to the interpolation.

• The range defines the “zone of influence.”

Sill – the value on the y axis that defines the semivariance at the range.

• The flattened line from this point represents no change in variation w/ increases in distance.

Nugget – the point where the semivariogram model intercepts the y axis.

• Due to measurement error, and/or variation at distances smaller than the sampling interval.
Modeling the Semivariogram - Assuming stationarity,
autocorrelation can be examined & quantified (spatial modeling).

Stationarity

• Statistics relies on some notion of replication.

• It follows that estimates can be derived, & the variation & uncertainty of an estimate understood, from repeated observations.

• Through continued observations of many samples, dependencies become apparent.
• Stationarity is used to obtain replication in a spatial setting.

• A variogram is only used w/in a given practical distance.

• Stationarity – a variable is stationary if its distribution is invariant (does not change) under translation (the entire distribution is shifted through coordinate space in mass).

• Normally only the mean and covariance are required to satisfy this requirement.
• Mean stationarity – the mean is constant (no spatial drift) between samples & independent of location.

• Also, constant variance (no outliers) across space.

• Second-order stationarity – assumes the covariance is the same between two points that are the same distance & direction apart, regardless of which points are chosen.

• Covariance is dependent on the distance between any two values, not their locations.

• For semivariograms, intrinsic (genuine, essential, real) stationarity is the assumption that the variance of the difference is the same between any two points that are the same distance apart (increments), regardless of which two points are chosen. Also referred to as quasi-stationarity.
• Local fluctuations dominate the overall trend.

• This is exhibited as a natural increase in the variance of a regional variable beyond these shorter sections (observations) of its distribution, as distance increases from each observation.

• A similar distance between points provides the replication needed to estimate dependence rules & allow predictions to be made.

Assuming intrinsic stationarity, autocorrelation can be quantified.

• Termed spatial modeling, structural analysis, or variography.

• A line is fitted through the points in the empirical semivariogram, similar to a least squares line in regression analysis.

• Parameters of the curve should minimize the deviations from the points by some criterion.

• Used to quantify spatial autocorrelation – assumes closer points (left side, x axis) are more similar (lower semivariance on the y axis) than points further away (right side, x axis).
• Semivariogram Model – defined as one-half the variance of the difference between two variables at two locations:

γ(si, sj) = ½ var(Z(si) − Z(sj)) where,

γ(si, sj) = semivariogram (predicted values).
var = variance.
(si, sj) = two locations.
Z(s) = observed (measured) values.

• This is a dissimilarity function: increased variability (semivariance, y axis) w/ increased distance.
• Semivariogram function – average[(Z(si) − Z(sj))²].

• With observed values close to each other, the average is small.

• As observed values get further apart, their values become more dissimilar w/ higher variability & higher averages.
Calculating the Empirical Semivariogram

Step 1 – Given a data set of five locations [(1,5), (3,4), (1,3), (4,5), (5,1)] and their associated values [100, 105, 105, 100, 115], calculate the distance (x values) between each pair of locations using the Pythagorean theorem.
Table 1

Values       Locations       Distance Calc.     Distance
100 & 105    (1,5), (3,4)    √[(2)² + (1)²]     2.236
100 & 105    (1,5), (1,3)    √[(0)² + (2)²]     2
100 & 100    (1,5), (4,5)    √[(3)² + (0)²]     3
100 & 115    (1,5), (5,1)    √[(4)² + (4)²]     5.657
105 & 105    (3,4), (1,3)    √[(2)² + (1)²]     2.236
105 & 100    (3,4), (4,5)    √[(1)² + (1)²]     1.414
105 & 115    (3,4), (5,1)    √[(2)² + (3)²]     3.606
105 & 100    (1,3), (4,5)    √[(3)² + (2)²]     3.606
105 & 115    (1,3), (5,1)    √[(4)² + (2)²]     4.472
100 & 115    (4,5), (5,1)    √[(1)² + (4)²]     4.123
Step 2 – Calculate the empirical semivariance (y values) as 0.5 * [(value at location i − value at location j)²], or one-half times the squared difference of the values of the paired locations.

Increasing the number of points (five in this example) can soon result in a very large number of paired combinations (Table 1), since each point location pairs with every other point to form a vector.
Table 2

Value Diff.    Diff.²          Empirical Semivariance
5              (5)² = 25       12.5
5              (5)² = 25       12.5
0              (0)² = 0        0
15             (15)² = 225     112.5
0              (0)² = 0        0
5              (5)² = 25       12.5
10             (10)² = 100     50
5              (5)² = 25       12.5
10             (10)² = 100     50
15             (15)² = 225     112.5
Step 3 – Binning the Empirical Semivariogram. Since a data set can consist of 100s to 10,000s of control points, the individual vectors (delineated by two ordered coordinate pairs) and the empirical semivariance values associated with each pair will quickly result in a cloud of points in the semivariogram that is unintelligible to interpret.

• Binning reduces the total number of points plotted in the semivariogram by assigning each pair to a bin (storage compartment) based upon the lag size (distance) and number of lags.

• Binning classifies locations grouped by distance & direction.

• Only the average distance and average semivariance for all pairs included in a bin are plotted as a single point.

• The basic equation would now be – Semivariogram (distance h) = 0.5 * average[(value at location i − value at location j)²].

• This equation is interpreted as: compute one-half the average of the squared differences of the values for all paired locations separated by a distance h. Distance h is the lag distance.

• Considering a lag distance of some unit measure (e.g. 1), we create 5 bins, & compute the average semivariance for all pairs of points falling within the distance limits (lag distance) assigned to each bin.

• Only an average distance & average semivariance for all pairs in a bin are plotted as a single point on the empirical semivariogram (Table 3); a worked sketch of Steps 1-3 follows Table 3.
Table 3

Lag Dist. (Bin)    Pair Distances      Avg. Dist.    Semivariances    Avg. Semivar.
1-2                1.414, 2            1.707         12.5, 12.5       12.5
2-3                2.236, 2.236, 3     2.491         12.5, 0, 0       4.167
3-4                3.606, 3.606        3.606         50, 12.5         31.25
4-5                4.472, 4.123        4.298         50, 112.5        81.25
5+                 5.657               5.657         112.5            112.5
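A minimal Python sketch of Steps 1-3, reproducing Tables 1-3 from the five-point example; numpy is assumed. The bins are treated as right-inclusive so that the distances 2 & 3 fall in the 1-2 & 2-3 bins, matching Table 3.

```python
import numpy as np
from itertools import combinations

locs = np.array([(1, 5), (3, 4), (1, 3), (4, 5), (5, 1)], dtype=float)
vals = np.array([100, 105, 105, 100, 115], dtype=float)

# Steps 1 & 2: pairwise distances & empirical semivariances.
dists, semivars = [], []
for i, j in combinations(range(len(vals)), 2):
    dists.append(np.linalg.norm(locs[i] - locs[j]))
    semivars.append(0.5 * (vals[i] - vals[j]) ** 2)
dists, semivars = np.array(dists), np.array(semivars)

# Step 3: bin by lag distance (lag size 1) & average within each bin.
edges = [1, 2, 3, 4, 5, np.inf]
for lo, hi in zip(edges[:-1], edges[1:]):
    inside = (dists > lo) & (dists <= hi)
    if inside.any():
        print(f"bin {lo}-{hi}: avg dist {dists[inside].mean():.3f}, "
              f"avg semivariance {semivars[inside].mean():.3f}")
```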
Trend Analysis in Geostatistics

• A trend surface is made up of global (structured or deterministic) & random (uncorrelated, local irregularities, referred to as errors in trend analysis) short-range variation.

• Global trend – all measurements are controlled by some (non-random) deterministic factor (a physical process, e.g. prevailing wind, data collected along a transect, etc.).

• May be represented by a polynomial formula & removed from the analysis, then returned before predictions are made.

• Models long-range or coarse-scale variation.

• Note the asymmetry of the trend model and the variance of change across each axis.
• Global trends are plotted as a 3-D box diagram.

• Plotted as sideways views along the x,z & y,z planes.

• Polynomials (best-fit lines) are created for each plane.

• Flat lines indicate no trend.

• A definite pattern to the polynomial suggests a global trend.

• Rotating the model enables an enhanced view of the distribution.

• Model the residuals & reintroduce the trend in the prediction model.

• Used to examine local characteristics of spatial autocorrelation.


Removing the Global Trend

• Zkj(si) = m(s) + e(s) where,

Zkj(si) = the jth measurement of variable type k at the ith spatial location si, decomposed into:

m(s) = a deterministic (predicted mean) trend.
e(s) = random, autocorrelated errors (distance dependent).
si = a location.
• Employed to satisfy normality & stationarity assumptions & to model local, short-range variations.

• Stick to Ordinary Kriging unless there are strong reasons to remove the trend surface.

• More parameters must be estimated when a trend is removed, reducing the precision of the surface model.

• Keep the surfaces simple, i.e. 1st or 2nd order polynomials.

• Always check w/ cross-validation, & especially validation.


Random Trends

• Random implies governed by rules of probability, including dependence of neighboring values (autocorrelation).

• Alignment of a block-faulted mountain system is a global trend, but specific time-event surficial processes (volcanism) may affect short-term variations in the mountain’s morphology.

• The shape of the semivariogram/covariance curve may vary (be influenced) with direction (anisotropy) after the global trend has been removed, or if no global trend exists at all.
• Anisotropy – spatial dependence (autocorrelation) changes w/ both distance & direction between two locations.

• Usually not a deterministic process.

• Modeled as a random process (random error, usually of an unknown cause) w/ higher autocorrelation in a preferred direction.

• Observe the scatter of points in the semivariogram. If points are close to the line of best fit on one side, yet spread out more on the other side, directional autocorrelation may exist.

• When the data is anisotropic (the anisotropy box checked), the yellow line becomes several lines indicating many directions.
• The surface cloud is elliptical (w/ an angular component) in shape.

• The outline of the range is displayed as a blue line throughout the semivariogram/covariance surface cloud.

• The lengths of the semi-minor & semi-major axes determine the ranges in the surface cloud, & define the search neighborhood.

• Once the shape is determined, establish constraints (number of points, & partition the shape into bins to avoid directional bias due to clustering or transected points w/in the shape).

• Ensure there are enough points to account for a meaningful prediction.

• If enough points are not available within the bins, outside (uncorrelated) points will be used, which degrades the model.

• Distances greater than the range are not correlated.

• When the pairs of points in the semivariogram yield a straight (horizontal) line (no variability), there is no spatial autocorrelation in the data, thus a surface would be meaningless.

• The length of the semi-minor axis (minor range) from the center of the cloud will reach the sill sooner (plot as a steeper grade) than the semi-major (major range) axis.
• Directional Binning (Anisotropy)

Under the Semivariogram/Covariance Modeling dialog box –

• Use the Show Search Direction box under the Semivariogram/Covariance Cloud dialog, and rotate or type in the angle (search) direction, angle tolerance, bandwidth, and lag (distance/width).

• The angle of tolerance determines how close points will be included from an observed point to the bandwidth.

• Bandwidth is the width of the search.

• Bins (lag distance/width) contain pairs of locations a predetermined distance (interval) & direction apart.
• Isotropic – spatial dependence (autocorrelation) changes only w/ changes in distance between two locations.

• When the data is isotropic, the shape of the semivariogram surface will be a circle.

• The outline of the range is displayed as a blue line throughout the semivariogram/covariance surface cloud.

• The length of any axis from the center of the semivariogram/covariance cloud will always be the same.

• The range will always be the same in all directions.

• Thus, the sill is equidistant in any direction.


Covariance – a statistical tendency of two variables to vary in ways related to each other.

• Related to correlation by scaling (correlation is covariance divided by the two standard deviations).

• A similarity function: decreased variability (covariance, y axis) w/ increased distance (along the x axis).

• C(si, sj) = covar(Z(si), Z(sj)) where,

C(si, sj) = covariance (predicted values).
covar = covariance.
(si, sj) = two locations.
Z(s) = observed (measured) values.
Z̄ = the mean of all the data.

• Covariance function – average[(Z(si) − Z̄)(Z(sj) − Z̄)].


• With two variables close to each other (approaching 0
distance on the x axis), they are expected to be similar
w/ a large covariance & correlation.

• Positive covariance – when both variables tend to be


above their respective means together.

• Negative covariance - when one variable tends to be


below & the other variable above its mean.
Summary Statements on Modeling the
Semivariogram

• The semivariogram & covariance functions quantify autocorrelation by measuring the statistical correlation as a function of distance.

• The relationship between the semivariogram & covariance function is expressed as:

γ(si, sj) = sill − C(si, sj) where,

γ(si, sj) = semivariogram (predicted values).
C(si, sj) = covariance (predicted values).
• Predictions are made using either function because of their equivalence.

• Finally, a line of best fit (a continuous function or curve) is fitted through the cloud of points plotted in the semivariogram, from which predictions of attribute values at unsampled locations can be made through the Kriging steps.

• When the curve is steep near the origin (short range), closer neighbors have a pronounced influence on the prediction.
Exploratory Spatial Data Analysis (ESDA)

• ESDA provides interactive graphical models of the dataset.

Histogram – shows a frequency distribution.

• Shows the shape (skewness & kurtosis) of the distribution.

• Normal distributions are required for quantile & probability maps using ordinary, simple, and universal Kriging.

Properties include –

• Variances of data values are more constant & the variance approaches the mean.

• Mean, median, and mode are nearly equal in a normal distribution.

• Variance & standard deviation should approach 1.

• Skewness should approach 0, & kurtosis 3.

• Outliers in a skewed distribution can be global or local.

• Global – a high or low value in the distribution.

• Local – a value w/in the normal range of the dataset, yet locally unusually high.

• Both types of outliers may indicate real abnormalities in the measured phenomena, or incorrect measurements.

• Trend removal helps justify assumptions of normality & stationarity.
Normal Score Transformation (NST)

• Goal is to normally distribute all random errors for the entire population, & make the variances more constant throughout the study area.

• Perform this step after detrending, since covariance & variograms are calculated on residuals after trend correction.

• Ranks the dataset from low to high by creating an observed cumulative histogram (OCH).

• Then matches the ranks of the OCH to equivalent ranks from a cumulative standard normal distribution.

• The transformation is defined by taking values from the normal distribution at each rank.

Three methods for NST –

• Direct – uses the OCH (least smooth w/ fewest assumptions).

• Linear – fits lines between each step of the cumulative distribution (middle of the road for smoothness & assumptions).

• Gaussian kernels – approximates the probability distribution by fitting a linear combination of density cumulative distributions (smoothest back-transformation w/ strongest assumptions).

• Must back-transform to bring values back to their original scale. A minimal sketch of the direct method appears below.
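A minimal sketch of the direct method, assuming no tied values & that scipy is available; the function names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def normal_score_transform(z):
    """Direct NST: rank the data (the observed cumulative histogram),
    then map each rank to the matching standard normal quantile."""
    ranks = np.argsort(np.argsort(z))          # 0 .. n-1
    pp = (ranks + 0.5) / len(z)                # plotting positions in (0, 1)
    return norm.ppf(pp)

def back_transform(scores, z):
    """Back-transform normal scores to the original scale by inverting
    the empirical CDF (linear interpolation between observed values)."""
    z_sorted = np.sort(z)
    pp = (np.arange(len(z)) + 0.5) / len(z)
    return np.interp(norm.cdf(scores), pp, z_sorted)
```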


Other transformations (BAL) to Normality

• Any relationship between the variance & the trend is removed w/ these transformations.

• Hence, trend & model autocorrelation are removed (optionally).

• Box-Cox (B) – for small counts in a region of a distribution w/ low variability (variance). If the variance approaches the mean, the distribution approaches normal.

• Square Root – as a power function of the data; variances will be made more constant, with trends toward normality.

• Log (L) – used as part of the Box-Cox transformation w/ positively skewed distributions & few large values in the dataset.

• Arcsine (A) – used for data that are proportions (percentages). Again, makes the variances in the data set more constant. A minimal sketch of these transformations follows below.
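A minimal sketch of these transformations, assuming numpy; Box-Cox & log apply to positive data, arcsine to proportions in [0, 1].

```python
import numpy as np

def box_cox(z, lam):
    """Box-Cox transform of positive data; lam = 0 gives the log transform,
    & lam = 0.5 behaves like the square root (up to shift & scale)."""
    z = np.asarray(z, dtype=float)
    return np.log(z) if lam == 0 else (z ** lam - 1.0) / lam

def arcsine(p):
    """Arcsine transform for proportion (percentage / 100) data in [0, 1]."""
    return np.arcsin(np.sqrt(np.asarray(p, dtype=float)))
```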
Declustering Preferential Data –

• Use when data are not randomly or regularly spaced, i.e. a higher concentration of points in some regions.

• May weight the data w/ denser points receiving less weight.

• Cell declustering –

• A grid of rectangular cells is assigned over the data locations.

• A weight is attached to each data location, inversely proportional to the number of points in its cell.

• Optimal cell size is chosen by –

1. Use the cell size giving the minimum weighted mean if data are preferentially sampled in areas of high values.

2. Use the cell size giving the maximum weighted mean if data are preferentially sampled in areas of low values.

A minimal sketch of cell declustering follows below.
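A minimal sketch of cell declustering, assuming numpy; `cell_decluster_weights` is an illustrative name, & in practice the cell size would be tuned by the weighted-mean rule above.

```python
import numpy as np

def cell_decluster_weights(xy, cell):
    """Cell declustering: overlay a grid of square cells & give each point
    a weight inversely proportional to the number of points sharing its
    cell, then normalize the weights to sum to 1."""
    cells = np.floor(xy / cell).astype(int)
    _, inverse, counts = np.unique(cells, axis=0,
                                   return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    return w / w.sum()

# Clustered points near (1, 5) plus two isolated points:
xy = np.array([(1.0, 5.0), (1.2, 5.1), (1.1, 4.9), (4.0, 5.0), (5.0, 1.0)])
z = np.array([100.0, 102.0, 101.0, 100.0, 115.0])
w = cell_decluster_weights(xy, cell=1.0)
print("declustered mean:", w @ z, " naive mean:", z.mean())
```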

• Voronoi Map –

• All locations w/in a polygon are closer to its sample point than to any other sample point.

• Weight each polygon in proportion to the area it represents.

• Border polygons are often given more weight since the borders are composed of truncated rectangular sides.
Normal & General QQPlot

• QQPlots compare quantiles from two distributions.

• Normal QQPlot – a plot of data values (y axis) versus standardized normal quantiles (x axis).

• Points that do not fall on a straight line represent departures from normality & indicate the data are asymmetric.

• General QQPlot – used to assess the similarity of the distributions of two sets of data, one set on each axis.

• Two identical distributions plot as a straight line. A minimal sketch of constructing normal QQ plot coordinates follows below.
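A minimal sketch of constructing normal QQ plot coordinates, assuming scipy is available; the plotting itself is omitted.

```python
import numpy as np
from scipy.stats import norm

def normal_qq(z):
    """Normal QQ plot coordinates: sorted data values (y axis) against
    matching standard normal quantiles (x axis). Points far from a
    straight line indicate departures from normality."""
    z_sorted = np.sort(z)
    pp = (np.arange(len(z)) + 0.5) / len(z)
    return norm.ppf(pp), z_sorted

x, y = normal_qq(np.random.default_rng(3).lognormal(size=200))
```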
Kriging Methodology

Kriging family – does not require a normal distribution, except for obtaining probability and quantile maps in Ordinary, Simple, & Universal Kriging.

• Data collected as point events.

• Values are actually spatially continuous (values occur everywhere on a surface).

• Continuous (real) numbers, integers, ordered categorical, unordered categorical, or binary.

• Weights surrounding measured values (like IDW).
• Quantifies closeness of objects & spatial arrangement (spatial autocorrelation) by variography.

• Weights are statistically determined by the spatial structure expressed in the semivariogram.

• Point pairs closer in distance (to a predicted location) have smaller variability.

• Must minimize the weighted least-square differences.

• Optimize the locations contributing w/ a search neighborhood.

• Circle if there is no directional bias.

• Ellipse for directional bias.

• Assumes all random errors have 2nd-order stationarity: random errors have a mean = 0, & the covariance between any two random errors depends on distance & direction, not on exact locations.

• Also capable of producing maps of:

• Kriging standard errors associated w/ predicted values.

• Probability of whether a critical level is exceeded.

• Subsets to assess the quality of the output surface.

• Quantiles* for a predetermined probability level.

* Quantile – the pth quantile, for p between 0 and 1, is the value below which the proportion p of the data lies.
• Basic Kriging equation – Zkj(si) = m(s) + e(s) where,

Zkj(si) = the jth measurement of variable type k at the ith spatial location si, decomposed into:

m(s) = a deterministic (predicted mean) trend.
e(s) = random, autocorrelated errors (distance dependent).
si = a location.

• Trend removal helps justify assumptions of normality & stationarity.
Kriging Methods

• While sampling provides accurate information at data points, it does not tell us what values exist between them.

• Kriging – an estimation (weighted average) method yielding the best (minimum variance) unbiased linear estimates of point values.

• Accuracy is based upon:

• The number of samples & data quality.

• The geometry of the sample points (dispersed distributions are optimal).

• The distance between sample points.

• The spatial continuity (regularity) of a variable.
Ordinary Kriging – use if m (mean) is unknown.

• Must have data points sampled from continuous phenomena.

• May be used for data that have a trend.

• May create training & test datasets for validation.

• May de-trend & perform Kriging on the residuals. A minimal ordinary kriging sketch follows below.
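A minimal ordinary kriging sketch (not part of the original deck), assuming numpy & a spherical semivariogram model with hand-picked nugget, sill, & range rather than parameters fitted by Geostatistical Analyst; it reuses the five-point example data.

```python
import numpy as np

def spherical(h, nugget, sill, a):
    """Spherical semivariogram model with range a (gamma(0) = 0)."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / a - 0.5 * (h / a) ** 3)
    return np.where(h == 0, 0.0, np.where(h >= a, sill, g))

def ordinary_krige(xy, z, xy0, model):
    """Ordinary kriging (unknown constant mean): solve
        [Gamma  1] [lambda]   [gamma0]
        [1^T    0] [  mu  ] = [  1   ]
    where Gamma holds semivariances between data points & gamma0 the
    semivariances between each data point & the prediction location."""
    n = len(z)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = model(d)
    A[:n, n] = A[n, :n] = 1.0
    b = np.append(model(np.linalg.norm(xy - xy0, axis=1)), 1.0)
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]
    return float(lam @ z), float(lam @ b[:n] + mu)   # prediction, variance

xy = np.array([(1, 5), (3, 4), (1, 3), (4, 5), (5, 1)], dtype=float)
z = np.array([100, 105, 105, 100, 115], dtype=float)
model = lambda h: spherical(h, nugget=0.0, sill=100.0, a=4.0)
print(ordinary_krige(xy, z, np.array([2.0, 4.0]), model))
```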


• Simple Kriging – when the trend is completely known (all
parameters & covariates known), whether constant or not.
• m (mean) is a known constant.

• Must have data points sampled from continuous phenomena.

• It follows, you also know e(s) as deviations from the mean.

• May produce quantile, probability, prediction, or standard error maps.

• May create training & test datasets for validation.

• Consider checking for bivariate normality.

• Consider declustering when using the normal score transform & the data have been preferentially sampled with denser points in some areas.

• Consider parameters for semivariogram/covariance modeling (mathematically expresses autocorrelation), search neighborhoods, & cross-validation.
• Indicator Kriging – set Z(s) = 0 (an indicator variable, 0 or 1) if Z(s) is below some value, e.g. 0.12 ppm ozone concentration, or 1 if it is above this value. A probability map is developed to assess the degree to which a value exceeds a given threshold.

• Uses a binary variable.

• A threshold is computed to establish limits for binary assignments.

• Assumes an unknown constant mean.

• Creates a probability or standard error map scaled from 0-1.

• Consider parameters for semivariogram/covariance (mathematically expresses autocorrelation) & search neighborhoods, although estimation is difficult w/ 0’s & 1’s.
• Probability Kriging – form an indicator variable of Z(s) & predict it using the original transformed data Z(s) in a cokriging model.

• Like Indicator Kriging, only it uses cokriging, since there are two types of random errors (e1(s) & e2(s)) generated from two constants (m1 & m2).

• Assumes unknown constant means.

• Creates a probability or standard error map scaled from 0-1.

• Consider parameters for semivariogram/covariance (mathematically expresses autocorrelation) & search neighborhoods, although estimation is difficult w/ 0’s & 1’s.

• Probability kriging tries to exploit the extra information in the original data in addition to the binary variable.

• This results in more autocorrelation estimations per variable & cross-correlations, creating more uncertainty.
• Disjunctive Kriging – used to form predictors of functions of variables.

• Ordinary Kriging uses linear combinations of data to find the final predictor.

• Disjunctive Kriging can predict either the value (predictor), or an indicator (derived from the threshold), by generalizing the Indicator Kriging method to form the predictor.

• Assumes a model – f(Z(s)) = m1 + e(s) where,

• m1 (mean) is an unknown constant.
• f(Z(s)) is some arbitrary function of Z(s).
• e(s) = random, autocorrelated errors (distance dependent).

• May produce prediction, probability, standard error of indicators, or standard error maps.

• A standard error map quantifies the uncertainty of the prediction.

• In a normal distribution, the true value is w/in plus/minus 2x the prediction standard error 95% of the time.

• Requires an appropriate transformation & detrending, semivariogram/covariance models (mathematically expresses autocorrelation), & search neighborhoods.

• Decluster when the data has been preferentially sampled.

• Assumes bivariate normality, which should be examined.


• Universal Kriging – operates like polynomial regression, only the regression is w/ spatial coordinates as explanatory variables, & the errors are assumed auto-correlated rather than independent.

• If m(s) is a second-order polynomial trend, subtract (detrend) it from the original data Z(s) & krige the random errors e(s).

• The mean of e(s) is 0.

• The autocorrelation is now modeled from the random errors.

• Consider parameters for semivariogram/covariance modeling (mathematically expresses autocorrelation), search neighborhoods, & cross-validation.

• May produce quantile, probability, prediction, or standard error maps.
• Cokriging – a family of Kriging methods using multiple variables.

• Considers different trends on multiple variables (Z1(s) & Z2(s), e.g. ozone & particulate materials) & measures autocorrelation of errors & cross-correlation between errors e1(s) & e2(s) of the two variables.

• Ordinary Cokriging models:

• Z1(s) = m1 + e1(s)
• Z2(s) = m2 + e2(s) where,

m1 & m2 = unknown constants.
e1(s) & e2(s) = random errors.
Z1(s) & Z2(s) = predicted measurement variables.
• Requires estimating auto-correlation & cross-correlation for each variable.

• Each estimate of auto-correlation introduces additional variability, so gains in precision of predictions may not be defensible.

• Consider parameters for semivariogram/covariance (mathematically expresses autocorrelation), search neighborhoods, cross-covariance (mathematically expresses cross-correlation), validation, & detrending.

• May produce prediction, probability, quantile, standard error of indicators, or standard error maps.
Cross-covariance – correlation between spatial random variables of different types, with correlation dependent upon the distance &/or direction separating the locations (e.g. NO2 & ozone).

• Positive cross-covariance – both variables are above their respective means.

• Negative cross-covariance – when one variable is above & the other below its mean.

• Cross-covariance cloud – shows the empirical cross-covariance for all pairs of locations between two datasets, placed in bins as a function of the distance between the two locations.
• Bins are averaged & smoothed to create a cross-covariance surface, controlled by the lag size and number of lags.

• A global outlier (unusually high value) identified in the histogram is also visible in all pairings of points in the semivariogram cloud, regardless of distance.

• Two distinct groups of points occur in the semivariogram cloud for global outliers.

• A local outlier will have high semivariogram values (approaching the top of the y axis) associated with a single point at close distances (left side along the x axis) in the semivariogram cloud.
• Both types of outliers can have detrimental effects on a prediction surface.

• Use the Show Search Direction box under the Semivariogram/Covariance Cloud dialog, and rotate or type in the angle direction, angle tolerance, bandwidth, and lag.
Cross-Validation & Validation

• Seeks to establish how well any surfacing model predicts the values at unknown locations.

• Cross-Validation – uses all data points to estimate the autocorrelation model.

• Each data location is selected in turn, & the remaining known locations are used to predict its value.

• The predicted & actual (known measurement) values are compared & graphed.

• Ideally, they should be close to the same value.

• Unusual points are set aside, & may require refitting the autocorrelation model.
• Validation – creates a model for a subset of the data set.

• Used to check the validity of a protocol of decisions: choice of semivariogram model, lag size, search neighborhood.

• Predicted plot – scatterplot of predicted vs. measured values.

• Kriging tends to under-predict large values & over-predict small values.

• Thus, the slope of the solid blue line is usually less than 1 (black dashed line).

• Error plot – same as the predicted plot, except true values are subtracted from the predicted values.

• Standardized Error plot – measurement values are subtracted from the predicted values, then divided by the estimated Kriging standard errors.

• All three plots show the strength of Kriging predictions.

• With autocorrelation & a good model, the predicted line (blue) should be close to the measured (dashed black) line.

• If all the data were independent (no autocorrelation), all predictions would equal the mean of the measured data, & the plot would be a horizontal line.
• QQPlot – quantiles of the differences between predicted & measured values divided by the estimated Kriging standard errors, against quantiles from the standard normal distribution.

• If the prediction errors from the observed (measured) values are normally distributed, points should lie close to the dashed black line.

• Probability and quantile maps in the Ordinary, Simple, & Universal Kriging methods that rely on a normal distribution can then be used with confidence.
Summary of Cross-Validation & Validation Analysis

• Predictions should be unbiased, w/ a mean prediction error near 0.

• Standardized prediction errors – prediction errors divided by their prediction standard errors to standardize (scale) them. The mean standardized prediction error should approach 0.

• RMS prediction error (Root-Mean-Square in the dialog box) should be small. Shorter green lines indicate predictions closer to their true values.

• Average (estimated) standard errors should be close to RMS prediction errors to correctly assess the variability in a prediction.

• If average standard errors are > RMS prediction errors, variability is overestimated.

• If average standard errors are < RMS prediction errors, variability is underestimated.

• If RMS standardized errors are > 1, variability is underestimated.

• If RMS standardized errors are < 1, variability is overestimated.

A minimal sketch computing these diagnostics follows below.
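A minimal sketch of these diagnostics, assuming numpy & arrays of leave-one-out measured values, predictions, & kriging standard errors; the function name is illustrative.

```python
import numpy as np

def cross_validation_stats(measured, predicted, std_errors):
    """Cross-validation diagnostics described above."""
    err = predicted - measured
    std = err / std_errors
    return {
        "mean error": err.mean(),                        # should be near 0
        "RMS error": np.sqrt(np.mean(err ** 2)),         # should be small
        "mean standardized error": std.mean(),           # should be near 0
        "RMS standardized error": np.sqrt(np.mean(std ** 2)),  # near 1
        "average standard error": std_errors.mean(),     # close to RMS error
    }
```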


Cross Validation & Validation Comparison

• Assesses two surfacing models (e.g. IDW vs. Ordinary Kriging) w/ statistics & scatterplots for each model in a single dialog box.

• Generally, the best model has the parameters listed in the previous summary.

• Average (estimated) standard errors should be close to RMS prediction errors for a more valid model. This comparison supersedes a model even if it has an RMS prediction error lower than the second model.
Measurement Error

• Occurs when it is possible to have several different observations at the same location.

• E.g., a soil series description from a survey document vs. actual field measurements.

• The measurement error model is expressed as:

• Z(s) = m(s) + e(s) + d(s) where,

• Z(s) = a variable of interest.
• d(s) = measurement error.
• m(s) = deterministic trend.
• e(s) = autocorrelated errors (microscale variation or nugget effect in this model).
• W/ no measurement error, Kriging is an exact interpolator.

• When errors exist, consider predicting only the filtered value (m(s) + e(s)) that does not contain the measurement error (d(s)).

• It follows, where data have been collected (actually measured), the filtered values will not be the same as the measured values.

• Thus the final map will be more smoothed, & Kriging becomes an inexact interpolator.
