Regionalized Variables

REGIONALIZED VARIABLES
- Variance (geostatistics)
- Covariance (spatial correlation)
- Cluster analysis (regionalization)

Ronny Berndtsson
Course objectives
- Ability to do a geostatistical analysis employing the variance of a data set.
- Ability to do a spatial correlation analysis employing the covariance of a data set.
- Ability to do a regionalization employing cluster analysis.

Regionalized variables
Literature
- Handouts
- Application of spatial correlation and cluster analysis: Uvo and Berndtsson (1996) (available on Air through ftp).
- Application of geostatistics: Berndtsson et al. (1993).

Software
- Geoeas (geostatistical software freely available from http://www.epa.gov/ada/csmos/models/geoeas.html)
- Matlab (correlation and cluster analyses)

Today's topic
- Analysis of a single data field z(x, y). (Note: correlation analysis needs time series!)

Regionalized variable: z = z(x, y)

Examples of spatially dependent
variables (regionalized variables)

- Rainfall
- Soil hydraulic conductivity
- Chemical concentration
- Plant properties
- Population characteristics
- What variable is not?

Why use regionalized variable theory?

- General analysis tool for spatially varying/dependent data.
- A general tool for spatial interpolation.
- A tool for regionalization studies.
- A basis for developing spatial models that consider regional differences.
- Just because it is fun and interesting!

Definition of variance and covariance

Variance: $V(x) = E[(x - m)^2] = \sigma^2$

Covariance: $C(x, y) = E[(x - m_x)(y - m_y)]$

Correlation coefficient: $R(x, y) = C(x, y) / [V(x)\,V(y)]^{1/2}$
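
As a minimal sketch of the three definitions, the expectations can be replaced by sample averages. The function names below are illustrative, not part of the course material, and the population form (dividing by n) is an assumption.

```python
import math


def mean(values):
    """Arithmetic mean, standing in for the expectation E[.]."""
    return sum(values) / len(values)


def covariance(x, y):
    """C(x, y) = E[(x - m_x)(y - m_y)], estimated by a sample average."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def variance(x):
    """V(x) = E[(x - m)^2], i.e. the covariance of x with itself."""
    return covariance(x, x)


def correlation(x, y):
    """R(x, y) = C(x, y) / [V(x) V(y)]^(1/2)."""
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))


# Two perfectly linearly related series give a correlation of 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
r = correlation(x, y)
```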

Spatial field points

Assumptions:
- 1st-order stationarity (E(z) = constant)
- 2nd-order stationarity (V(z) = constant)

[Figure: field z(x, y) with two points, z1 and z2, separated by distance h]

Spurious correlation (or variance)!

- If data contain many zeros
- If data contain outliers
- If data contain a trend

- Check normality (if non-normal, apply a relevant data transformation)
- De-trend if necessary
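
The two remedies above might look as follows in a minimal sketch. The slides do not say which transformation or trend model to use, so the log transform (with an offset so that zeros stay finite) and the straight-line trend along one coordinate are both assumptions, as are the function names.

```python
import math


def log_transform(values, offset=1.0):
    """Return log(z + offset); the offset is an assumed choice for data with zeros."""
    return [math.log(v + offset) for v in values]


def detrend_linear(coord, values):
    """Subtract the least-squares straight line fitted to values against coord."""
    n = len(values)
    mc = sum(coord) / n
    mv = sum(values) / n
    sxx = sum((c - mc) ** 2 for c in coord)
    sxy = sum((c - mc) * (v - mv) for c, v in zip(coord, values))
    slope = sxy / sxx
    intercept = mv - slope * mc
    # The residuals are what remains after the fitted trend is removed.
    return [v - (intercept + slope * c) for c, v in zip(coord, values)]


# A purely linear field leaves residuals of (numerically) zero.
residuals = detrend_linear([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```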

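Definition of semivariance

$V(z_2 - z_1) = E[(z_2 - z_1)^2] = 2\gamma(h)$

$\gamma(h) = E[(z_2 - z_1)^2] / 2$

$\gamma^*(h) = \frac{1}{2\,n(h)} \sum_{i=1}^{n(h)} [z(x_i + h) - z(x_i)]^2$

n(h) = number of observation pairs separated by distance h; $x_i$ = location of the first point of pair i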
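
The estimator γ*(h) can be sketched by grouping all point pairs into distance classes. The binning scheme (equal-width lags, bin k covering distances from k·lag_width up to but not including (k+1)·lag_width) and all names below are assumptions for illustration; Geoeas has its own lag options.

```python
import math


def experimental_semivariogram(points, lag_width, n_lags):
    """points: list of (x, y, z). Returns (mean distance, gamma*, n pairs) per non-empty lag bin."""
    sq_sums = [0.0] * n_lags
    dist_sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(points)):
        xi, yi, zi = points[i]
        for j in range(i + 1, len(points)):
            xj, yj, zj = points[j]
            h = math.hypot(xj - xi, yj - yi)
            k = int(h // lag_width)  # index of the lag bin this pair falls in
            if k < n_lags:
                sq_sums[k] += (zj - zi) ** 2
                dist_sums[k] += h
                counts[k] += 1
    result = []
    for k in range(n_lags):
        if counts[k] > 0:
            gamma = sq_sums[k] / (2 * counts[k])  # sum of squared differences / 2n(h)
            result.append((dist_sums[k] / counts[k], gamma, counts[k]))
    return result


# Three points on a line, 1 unit apart.
points = [(0.0, 0.0, 1.0), (1.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
gamma_star = experimental_semivariogram(points, lag_width=1.0, n_lags=3)
```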

Spatial correlation

$\rho(h) = C(z_1, z_2) / [V(z_1)\,V(z_2)]^{1/2}$

where z1 and z2 are time series at corresponding points and h is the distance between z1 and z2

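Both correlation and semivariance expressed as a function of distance h

$\gamma(h) = 1 - \rho(h)$ (if stationary, with data scaled to unit variance; in general $\gamma(h) = V_{tot}\,[1 - \rho(h)]$)

[Figure: two panels against distance h: ρ(h), with axis from 0 to 1.0, and γ(h), with axis from 0 to Vtot]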
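
The link between semivariance and correlation follows from expanding the variance of the difference z2 - z1. The short derivation below assumes 1st- and 2nd-order stationarity, so that V(z1) = V(z2) = Vtot; the symbol $V_{tot}$ is taken from the figure labels.

```latex
\begin{aligned}
2\gamma(h) &= V(z_2 - z_1) = V(z_1) + V(z_2) - 2\,C(z_1, z_2) \\
           &= 2\,V_{tot} - 2\,\rho(h)\,V_{tot} \\
\gamma(h)  &= V_{tot}\,[1 - \rho(h)]
\end{aligned}
```

For data scaled to unit variance (Vtot = 1) this reduces to γ(h) = 1 - ρ(h).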

Errors + small-scale variability

[Figure: ρ(h), with maximum 1.0, and γ(h), with maximum Vtot, against distance h; both panels carry the annotation "Sum of errors and small-scale variation"]

The variogram

[Figure: variogram γ(h) against distance h, with the sill, Vtot, the nugget and the range marked]
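
To make nugget, sill and range concrete, here is a sketch of one common variogram model. The slide does not name a model, so the spherical form is an assumption, as are the function and parameter names; `sill` here means the total plateau level including the nugget.

```python
def spherical_variogram(h, nugget, sill, range_):
    """Spherical model: jumps to the nugget just above h = 0 and reaches the sill at h = range_."""
    if h <= 0.0:
        return 0.0  # by definition gamma(0) = 0
    if h >= range_:
        return sill  # beyond the range the points are uncorrelated
    r = h / range_
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)


# Model values at a few distances for nugget 1, sill 5, range 10.
model = [spherical_variogram(h, 1.0, 5.0, 10.0) for h in (0.0, 5.0, 10.0, 20.0)]
```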

The correlogram

[Figure: correlogram ρ(h) against distance h, starting at 1.0. The decorrelation distance is marked where ρ(h) = 1/e ≈ 0.37.]
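
One way to read the decorrelation distance off a binned correlogram is to interpolate linearly between the two bins that bracket 1/e. This is a sketch under that assumption; the function name and the choice of linear interpolation are not from the slides.

```python
import math


def decorrelation_distance(distances, correlations, threshold=1.0 / math.e):
    """Return the first distance at which the correlogram drops below threshold.

    distances must be sorted in ascending order. Returns None if the
    correlogram never crosses the threshold from above.
    """
    for k in range(1, len(distances)):
        r0, r1 = correlations[k - 1], correlations[k]
        if r0 >= threshold > r1:
            h0, h1 = distances[k - 1], distances[k]
            # Linear interpolation between the two bracketing points.
            return h0 + (r0 - threshold) * (h1 - h0) / (r0 - r1)
    return None


# With an explicit threshold of 0.25 the crossing lies halfway between 10 and 20.
d = decorrelation_distance([0.0, 10.0, 20.0], [1.0, 0.5, 0.0], threshold=0.25)
```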

Spatial analyses

[Figure: paired correlogram and variogram panels, each against distance, for five cases: "Normal"; Random; Highly correlated in space; Significant trend; Data not stationary]

Experimental variogram

[Figure: experimental variogram γ(h)]

Correlogram for different
time steps

[Figure: correlograms ρ(h) against distance for different time steps]

Correlogram: seasonal differences

[Figure: correlograms ρ(h) against distance for different seasons]

Regional differences: data not homogeneous and stationarity assumption not fulfilled!

[Figure: field z(x, y) with an area of low correlation and an area of high correlation]

Cluster analysis
- Technique to discriminate between different data groups with mutually high similarity.

[Figure: dendrogram]

From: http://www.kgs.ku.edu/Workshops/GEMINI/geoff_petrophysical_modules/sld006.htm

Ward's method

From: http://www.kgs.ku.edu/Workshops/GEMINI/geoff_petrophysical_modules/sld006.htm
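
Ward's method merges, at each step, the two clusters whose union gives the smallest increase in within-cluster sum of squares, which for clusters of sizes n_a and n_b equals n_a·n_b/(n_a + n_b) times the squared distance between their centroids. The sketch below is a plain-Python illustration of that rule, not the Matlab implementation; it stops when a requested number of clusters remains, and all names are hypothetical.

```python
def ward_clusters(points, n_clusters):
    """Agglomerative clustering with Ward's criterion.

    points: list of equal-length tuples. Returns one cluster label per point.
    """
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                na, nb = len(ca["members"]), len(cb["members"])
                d2 = sum((u - v) ** 2 for u, v in zip(ca["centroid"], cb["centroid"]))
                cost = na * nb / (na + nb) * d2  # increase in within-cluster sum of squares
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        ca, cb = clusters[a], clusters[b]
        na, nb = len(ca["members"]), len(cb["members"])
        merged = {
            "members": ca["members"] + cb["members"],
            # Size-weighted mean of the two centroids.
            "centroid": [(na * u + nb * v) / (na + nb)
                         for u, v in zip(ca["centroid"], cb["centroid"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    labels = [0] * len(points)
    for label, c in enumerate(clusters):
        for i in c["members"]:
            labels[i] = label
    return labels


# Two well-separated pairs of points fall into two clusters.
labels = ward_clusters([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2)
```

The exhaustive pair search keeps the sketch short; it is only practical for small data sets.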

Input data for cluster analysis

- Raw data
- Semivariance
- Correlation
- etc.

Level of detail in dendrogram

[Figure: dendrogram with three levels of detail marked: Level 1, Level 2 and Level 3]

Regionalization based on
three levels of detail

Directional dependence of spatial correlation

Regional differences in spatial correlation

Exercises

- Calculate and plot variograms for your data (Geoeas)
- Calculate and plot correlograms for your data (Matlab)
- Use cluster analysis to delineate homogeneous regions (Matlab)

Geoeas
- Calculate experimental variograms
- Plot variograms
- Use the variograms for kriging

Data file Geoeas
Data for Geoeas analyses
3
X-coor m
Y-coor m
Al ug/g DM
0.707 39.293 55000
0.303 20.234 44000
0.450 15.232 34000
0.420 10.210 64000
etc
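
The layout shown above (a title line, the number of variables, one name line per variable, then whitespace-separated data rows) can be read with a few lines of code. This is a sketch based only on the sample on the slide; it does not handle any other conventions a real Geoeas file may follow, and the function name is hypothetical.

```python
def parse_geoeas(text):
    """Return (title, variable names, data rows as lists of floats)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    title = lines[0]
    n_vars = int(lines[1])
    names = lines[2:2 + n_vars]  # one line per variable, name followed by unit
    rows = []
    for ln in lines[2 + n_vars:]:
        rows.append([float(tok) for tok in ln.split()])
    return title, names, rows


# The first two data rows from the slide.
sample = """Data for Geoeas analyses
3
X-coor m
Y-coor m
Al ug/g DM
0.707 39.293 55000
0.303 20.234 44000
"""
title, names, rows = parse_geoeas(sample)
```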

Spatial correlation
- Calculate the correlation coefficient for the time series of each pair of points
- Calculate the distance between the points of each pair
- Plot correlation vs. distance for all unique station combinations

[Figure: scatter plot of ρ(h) against distance]
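
The three steps can be sketched in a few lines: loop over all unique station pairs, compute the correlation of their time series and the distance between them, and collect the (distance, correlation) points for plotting. The exercise itself uses Matlab; this Python version, its data layout (a dictionary of station name to coordinates and series) and its names are assumptions for illustration.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Correlation coefficient of two equally long series (neither may be constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def correlogram_points(stations):
    """stations: dict name -> (x, y, series).

    Returns (distance, correlation, (name_i, name_j)) for every unique
    station pair, sorted by distance.
    """
    out = []
    for (ni, (xi, yi, si)), (nj, (xj, yj, sj)) in combinations(stations.items(), 2):
        out.append((math.hypot(xj - xi, yj - yi), pearson(si, sj), (ni, nj)))
    return sorted(out)


# Three hypothetical stations with short time series.
stations = {
    "A": (0.0, 0.0, [1.0, 2.0, 3.0, 4.0]),
    "B": (3.0, 4.0, [2.0, 4.0, 6.0, 8.0]),
    "C": (0.0, 1.0, [4.0, 3.0, 2.0, 1.0]),
}
pts = correlogram_points(stations)
```

Plotting the first two elements of each tuple against each other gives the correlogram scatter.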

Cluster analysis
- Possible in Matlab
- Perform a regionalization
- Compare, e.g., variance with correlation as the dependent measure.

Matlab help Cluster
- CLUSTER Construct clusters from LINKAGE output.
T = CLUSTER(Z,'CUTOFF',C) constructs clusters from cluster
tree Z. Z is a matrix of size M-1 by 3, generated by LINKAGE.
C is a threshold for cutting the hierarchical tree generated
by LINKAGE into clusters. Clusters are formed when
inconsistent values are less than CUTOFF (see INCONSISTENT).
The output T is a vector of size M that contains the cluster
number for each observation in the original data.
T = CLUSTER(Z,'MAXCLUST',N) specifies N as the maximum
number of clusters to form from the hierarchical tree in Z.
T = CLUSTER(...,'CRITERION','CRIT') uses the specified
criterion for forming clusters, where 'CRIT' is either
'inconsistent' or 'distance'.
T = CLUSTER(...,'DEPTH',D) evaluates inconsistent values to
a depth of D in the tree. The default is D=2.
See also PDIST, LINKAGE, COPHENET, INCONSISTENT,
CLUSTERDATA.

References
Berndtsson, R., A. Bahri, and K. Jinno (1993), Spatial dependence of geochemical elements in a semi-arid agricultural field: 2. Geostatistical properties, Soil Sci. Soc. Am. J., 57, 1323-1329.
Uvo, C. B., and R. Berndtsson (1996), Regionalization and spatial properties of Ceará State rainfall in Northeast Brazil, J. Geophys. Res., 101, 4221-4233.
