SpaceStat Chapter2

PART II

CHAPTER II

GENERAL CONCEPTS IN SPATIAL DATA ANALYSIS
Assist. Prof. Dr. Mahmut Çavur
2.1. Introduction
Spatial data analysis involves:

• Accurate description of data relating to a process in space
• Exploration of patterns and relationships in the data
• Search for explanations of such patterns and relationships

These relate to:

• Visualizing spatial data
• Exploring spatial data
• Modeling spatial data
2.2. Visualizing Spatial Data

An essential requirement in any data analysis is the
ability to “see” the data being analyzed. Plots of data
and other graphical displays are fundamental tools for:

• Seeking patterns
• Generating hypotheses
• Assessing the fit of proposed models
• Determining the validity of predictions derived from models

Maps are the tools for visualizing spatial data. Hence
GIS can provide an environment to create maps of spatial
data and to explore spatial patterns and relationships
quickly and easily.

Cartographic considerations are important when using
maps in spatial data analysis, because bad choices of
map type or of the scaling used for data values can:

• Lead to misleading conclusions drawn from the display
• Suggest inappropriate models for the process under study
2.3. Exploring Spatial Data

Exploratory methods for spatial data may take the form of:

• Maps
or
• Conventional plots

E.g. Some exploratory techniques, when applied to point
events, result in a contour map of the estimated intensity
of occurrence of events over the whole study area; others,
applied to the same set of events, result in a graph that
throws light on the degree of spatial dependence between
event locations.
Exploring spatial data:

• Provides good descriptions of the data
• Helps to develop hypotheses
• Helps to establish appropriate models

If many exploratory spatial techniques result in
different forms of maps, then how do they really differ
from visualization techniques?
2.3.1. Distinction between visualizing and exploring
spatial data

The dividing line between visualization of spatial data
and exploratory data analysis is somewhat artificial. The
distinction is based on the degree of data manipulation.

E.g.
Suppose that we have age-standardized, cause-specific
death rates for a number of administrative zones.

Visualizing spatial data involves:

• A map of the death rates
• A simple transformation of the rates

(No data manipulation)

Exploring spatial data involves:

• A map of the spatial moving average of the rates, which
  smooths out local variations so that global trends can
  be seen clearly (each rate is replaced by the average of
  itself and the rates of the neighboring districts)

(Data manipulation)
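The spatial moving average described above can be sketched in a few lines of Python. This is only an illustration: the zone names, the rates and the adjacency list are hypothetical values invented for the example, and the unweighted mean over "itself plus neighbors" is the simplest possible choice of smoother.

```python
# Hypothetical age-standardized death rates for five zones (illustrative only).
rates = {"A": 10.0, "B": 14.0, "C": 9.0, "D": 20.0, "E": 11.0}

# Hypothetical adjacency list: which zones border which.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "E"],
    "D": ["B", "E"],
    "E": ["C", "D"],
}

def spatial_moving_average(rates, neighbors):
    """Replace each zone's rate by the mean of itself and its neighbors."""
    smoothed = {}
    for zone, rate in rates.items():
        values = [rate] + [rates[n] for n in neighbors[zone]]
        smoothed[zone] = sum(values) / len(values)
    return smoothed

smoothed = spatial_moving_average(rates, neighbors)
for zone in rates:
    print(f"{zone}: raw = {rates[zone]:5.2f}  smoothed = {smoothed[zone]:5.2f}")
```

The smoothed values span a narrower range than the raw rates, which is the "smoothing out of local variations" that separates exploration from plain visualization.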
2.3.2. Distinction between exploring and modeling spatial
data

Exploratory methods do not involve any explicit model
for the data. However, several exploratory techniques
involve an informal comparison of some summary of the
data with what would be expected under a model. Hence
models do enter into exploratory techniques. The
distinction is one of degree: the extent to which the
comparison between the data and a model is made formally.
Moreover, models depend on certain assumptions.

E.g.
Stan Openshaw (a quantitative geographer) tried to detect clusters
in point distributions of the incidence of childhood leukemia. For
this purpose he used a technique that exhaustively examines the
observed intensity of events in circles of varying radius centered
on a fine grid imposed over the study area. The aim was to test
whether the cases within each circle were consistent with a random
pattern. Circles with significant discrepancies are identified and
retained for later display and investigation. This technique
involves a model of a random pattern and performs repeated formal
statistical comparisons with this model.

However, the validity of such comparisons does not depend on the
assumption of any specific alternative model. The technique
detects clusters; it does not search for an explanation of the
process by which such clusters occur.

Therefore, this form of analysis makes few a priori assumptions
about the data and is fully in line with exploratory methods.
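A much simplified sketch of this kind of circle scan is given below. It is not Openshaw's actual procedure: it assumes a uniform population at risk over a rectangular study area, uses a Poisson upper-tail probability as the formal comparison with the random model, and makes no correction for the many overlapping tests. The function names, the simulated cases and the threshold `alpha` are all hypothetical.

```python
import math
import random

def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)          # P(X = 0)
    cdf = term
    for i in range(1, k):          # add P(X = 1) ... P(X = k - 1)
        term *= lam / i
        cdf += term
    return max(0.0, 1.0 - cdf)

def scan_circles(cases, width, height, step, radii, alpha=0.002):
    """Flag circles whose case count is improbably high under a random pattern."""
    intensity = len(cases) / (width * height)   # cases per unit area overall
    flagged = []
    for i in range(int(width / step) + 1):
        for j in range(int(height / step) + 1):
            cx, cy = i * step, j * step
            for r in radii:
                count = sum(1 for x, y in cases
                            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r)
                # Expected count uses the full circle area, so circles that
                # overlap the edge of the study area are tested conservatively.
                expected = intensity * math.pi * r * r
                p = poisson_upper_tail(count, expected)
                if p < alpha:
                    flagged.append((cx, cy, r, count, p))
    return flagged

# Simulated data: 100 random background cases plus a tight cluster near (5, 5).
rng = random.Random(1)
background = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(100)]
cluster = [(5 + rng.uniform(-0.3, 0.3), 5 + rng.uniform(-0.3, 0.3))
           for _ in range(30)]
cases = background + cluster

flagged = scan_circles(cases, 10.0, 10.0, step=1.0, radii=[0.5, 1.0])
for cx, cy, r, count, p in flagged:
    print(f"circle at ({cx}, {cy}) radius {r}: {count} cases")
```

The scan only reports where the counts are unusual. It says nothing about why, which is what keeps the technique on the exploratory side of the line.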
2.4. Modeling Spatial Data

Models are mathematical abstractions of reality, not reality
itself. A statistical model involves using a combination of both:

• Data
• Reasonable assumptions

about the nature of the phenomena being modeled. The assumptions
arise from:

• Background theoretical knowledge about the behavior of the
  phenomena
• The results of previous analyses of the same or similar
  phenomena
• The judgement and intuition of the modeler


A statistical model for a stochastic process consists of
specifying a probability distribution for the random
variable or variables that represent the phenomenon. Once
a probability distribution is fully specified, there is
effectively nothing further that can be said about the
behavior of the process. A fitted model is evaluated, and
the results may lead to modifying the assumptions,
updating the existing model, or using a different one.

E.g.
Consider modeling levels of ozone in a large rural area R.
The ozone level at each location s in R will vary during
the day and from day to day. A model can be fitted to
explain the distribution of ozone level based on a linear
regression.
Figure 2.1. Ozone levels
Basic Assumptions:

1. The random variables {Y(s), s ∈ R} are independent.
2. The probability distributions of the random variables
   Y(s) differ only in their mean values.
3. The mean value is a simple linear function of location.
4. Y(s) has a normal distribution about this mean with
   the same constant variance, σ².

The model:

$$Y(s) \sim N\left(\beta_0 + \beta_1 s_1 + \beta_2 s_2,\ \sigma^2\right)$$

where s1 and s2 are the spatial coordinates of s.

The assumptions provide a framework under which the
final model specification reduces to a problem of
estimating unknown parameters. The βi can be estimated
by the Maximum Likelihood Estimation method.
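Under the four assumptions listed above (independent, normally distributed values with a linear mean and constant variance), maximizing the likelihood over the βi is equivalent to ordinary least squares, and the maximum likelihood estimate of σ² is the mean squared residual. The sketch below illustrates this with NumPy on simulated data; the coordinates, the "true" coefficients and the noise level are hypothetical and are not taken from the ozone example.

```python
import numpy as np

# Simulated locations and ozone-like values (all numbers are hypothetical).
rng = np.random.default_rng(0)
n = 200
s1 = rng.uniform(0, 10, n)                 # first spatial coordinate
s2 = rng.uniform(0, 10, n)                 # second spatial coordinate
beta_true = np.array([40.0, 1.5, -0.8])    # intercept and two slopes
sigma_true = 2.0
y = (beta_true[0] + beta_true[1] * s1 + beta_true[2] * s2
     + rng.normal(0, sigma_true, n))

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), s1, s2])

# Least squares solution = maximum likelihood estimate of beta
# when the errors are independent normal with constant variance.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# ML estimate of the variance: mean squared residual (divides by n, not n - 3).
residuals = y - X @ beta_hat
sigma2_hat = np.mean(residuals ** 2)

print("estimated coefficients:", beta_hat)
print("estimated variance:", sigma2_hat)
```

If the residuals showed spatial structure, for instance neighboring sites with residuals of the same sign, the independence assumption would be in doubt and the model would need revising.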
The next step is to test the reliability of the model, or
its goodness of fit. This can be achieved by using
hypothesis-testing methods. Hypothesis testing, which
involves comparing the fit of a hypothesized model
with that of an alternative, is in fact one facet of
statistical modeling. At this step the question is:

• Does a model in which certain parameters have
  pre-specified values fit the data significantly well?
Figure 2.2. Analysis of spatial data
2.5. Practical Problems of Spatial Data Analysis

There are basically four types of problem that an
analyst can face:

1. Problem of geographical scale
2. Lack of spatial indexing
3. Problem of edge or boundary effects
4. Problem of modifiable areal unit


Problem 1: Geographical scale at which analyses are
performed.

Spatial data analysis is concerned with detecting and
modeling spatial pattern. However, a pattern at one
geographical scale may be simply random variation in
another pattern at a different scale.

E.g. Local variations in disease rates may die out
at the national scale.

The scale to which spatial analysis relates depends on:

• The phenomena under study
• The objective of the analysis
• The scale at which the data were collected


Problem 2: Lack of spatial indexing or ordering in
space.

An indexing implies that we have a natural notion of
what is next or previous. On a regular grid there is
a reasonably natural ordering of locations. However,
most of the time spatial data are not indexed. While
some data (such as those from satellites) come in the
form of a regular grid or lattice, much spatial data is
provided for a patchwork quilt of areal units or an
irregularly distributed set of sites.

E.g. We can only speak of the neighborhood of a zone
for areal units that share a common boundary.
Problem 3: Problem of edge or boundary effect.

In the middle of a study area, a site or zone is likely
to be surrounded by others; i.e. the zone has neighbors
on all sides. However, at the edge of the map or study
region, the neighbors extend in one direction only. In
a spatial domain, a potentially much larger share of the
observations lies near the edge of the map, so edge
effects play a critical role. This problem can be
overcome by leaving a guard area.
Problem 4: Problem of modifiable areal unit.

When data are measurements on a set of zones, they are
often aggregated measurements over the households or
individuals living in each zone. For the sake of
confidentiality, the data are released for arbitrary
areal units. The important point to note is that any
result from the analysis of these area aggregations is
usually conditional on the set of zones used. With a
different aggregation into areas, the result is subject
to change.
Problem 4 illustration (figure not reproduced; panel labels only):

• 9 areal units: mean = 8.88
• 3 areal units: mean = 8.33
• 3 areal units: mean = 8.47
• 3 areal units: mean = 9.33
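The effect can be reproduced with a toy calculation. The sketch below does not use the data behind the figure above; it assumes a hypothetical 3 x 3 grid of cells, each with a case count and a population at risk, and compares the mean zone rate under three different zonings.

```python
# Hypothetical 3 x 3 grid of cells, row by row: (cases, population at risk).
cells = [
    (2, 100), (4, 200), (6, 100),
    (1, 50),  (9, 300), (3, 100),
    (8, 400), (5, 100), (7, 200),
]

def mean_zone_rate(zones):
    """Mean over zones of (total cases / total population) in each zone."""
    rates = []
    for zone in zones:
        cases = sum(cells[i][0] for i in zone)
        population = sum(cells[i][1] for i in zone)
        rates.append(cases / population)
    return sum(rates) / len(rates)

# Three zonings of the same nine cells (cell indices 0..8, row by row).
each_cell = [[i] for i in range(9)]
by_rows = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
by_columns = [[0, 3, 6], [1, 4, 7], [2, 5, 8]]

results = {
    "9 units (cells)": mean_zone_rate(each_cell),
    "3 units (rows)": mean_zone_rate(by_rows),
    "3 units (columns)": mean_zone_rate(by_columns),
}
for label, value in results.items():
    print(f"{label}: mean rate = {value:.4f}")
```

The underlying cases and populations never change, yet the three zonings give three different mean rates, which is the sense in which results are conditional on the set of zones.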


2.6. Computers and Spatial Data Analysis

Q: Given that some spatial analysis capabilities are
available in widely used systems, is there a need for
spatial analysis functions beyond those currently
provided by GIS?

A: At present, yes!

E.g. A GIS will currently be able to overlay a set of
points (childhood cancer cases) onto a set of polygons
(buffer zones constructed along high voltage power
lines). The GIS will then be able to count how many
points lie within particular polygons by performing a
“point-in-polygon” operation. However, it is hard to
find a system that evaluates the statistical significance
of the association between the set of points and the set
of polygons.

If we want to know whether there is a statistically
significant association between the incidence of
childhood cancer and proximity to high voltage power
lines, we cannot do this readily by using GIS.
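The counting step that a GIS does provide can be sketched as follows. The ray-casting test used here is one standard way to implement a point-in-polygon operation and is not necessarily what any particular GIS uses; the buffer polygon and the case locations are hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon.

    polygon is a list of (x, y) vertices in order; points exactly on the
    boundary may be classified either way.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray from the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def count_points_in_polygon(points, polygon):
    """Number of points that fall inside the polygon."""
    return sum(1 for p in points if point_in_polygon(p, polygon))

# Hypothetical buffer zone along a power line and hypothetical case locations.
buffer_zone = [(0, 0), (4, 0), (4, 2), (0, 2)]
cases = [(1, 1), (5, 1), (3, 1.5), (2, 3)]

print(count_points_in_polygon(cases, buffer_zone), "cases inside the buffer")
```

The count alone does not answer the significance question: that requires comparing it with the count expected if cases were unrelated to the power lines, which is the spatial analysis step the text says GIS lacks.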

There are several ways to use computers in spatial data
analysis. Most of the time, spatial analysis techniques
are coupled with GIS.
2.6.1. Methods of coupling GIS and spatial data analysis

There are four different methods for using spatial
analysis techniques with GIS:

• Full integration
• Loose coupling
• Close coupling
• Special combinations

Full integration: Every method for exploratory spatial
analysis and modeling is available within the GIS.

Loose coupling: Data are exported from the GIS for use
within a spatial statistical framework (i.e. the GIS and
separate spatial analysis software talk to each other).

Close coupling: Spatial analysis routines are called from
within the GIS (which requires use of the macro language
capabilities of the GIS).

Special combinations: A self-contained spatial analysis
system is developed for a specific purpose (Case I), or
spatial analysis and GIS functions are added to a
standard statistical package (Case II).
2.7. Stationarity and Isotropy (terminology)
A spatial phenomenon is represented within a spatial
domain R, and each location in R is denoted by s. The set
of random variables Y(s) over all s in R is referred to
as a spatial stochastic process, {Y(s), s ∈ R}.

$s \in R$: any data location in R

$s = (s_1, s_2)^T = \begin{pmatrix} s_1 \\ s_2 \end{pmatrix}$: location vector of point s

$\{Y(s) : s \in R\}$: spatial stochastic process


Modeling real-life problems requires data and assumptions about the
nature of the phenomena.

Figure 2.7. A spatial stochastic process


Spatial stochastic processes often exhibit a degree of spatial
correlation, and this correlation has to be incorporated into
the analysis somehow.

In general, the behavior of spatial phenomena is the result of a
mixture of two types of effects:

• First order
• Second order

First order effects: They relate to variation in the mean value of the
process in space (global or large-scale trend).

Second order effects: They result from the spatial correlation
structure, or spatial dependence, in the process. In other words,
these effects arise from the tendency for deviations in values of
the process from its mean to follow each other at neighboring sites
(local or small-scale effects).
Behavior of Spatial Phenomena

First order effects

• Variation of the mean value in space (global or large-scale
  trend)

Second order effects

• Correlation in the deviations of the process values from
  the mean
E.g.
Suppose that iron particles are scattered onto a sheet of
paper marked with a fine grid. The numbers of particles
landing in the different grid squares represent a spatial
stochastic process. As long as the mechanism by which
we scatter the iron particles is purely random, the process
should lack both first-order and second-order effects (Case I).

Figure 2.7. Random scatter of iron particles (Case I)

Suppose that a small number of weak magnets are placed
under the paper at different points and we scatter the iron
particles again. The result will be a process with a spatial
pattern arising from first-order effects: clustering in the
numbers in grid squares will occur globally at and around
the sites of the magnets (Case II).

Figure 2.8. Scatter of iron particles with magnets
underneath the paper (Case II)

Now remove the magnets, weakly magnetize the iron
particles instead, and scatter them again. The result is a
process with a spatial pattern arising from second-order
effects: some degree of local clustering will occur due to
the tendency of the particles to attract or repel each other
(Case III).

If the magnets are now put back under the paper and the
magnetized particles are scattered again, we end up with a
spatial pattern arising from both first-order and
second-order effects.
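Cases I and II can be imitated with a small simulation. The sketch below is only a rough analogue of the experiment: it assumes a 10 x 10 sheet with unit grid squares, represents the pull of each magnet by a Gaussian scatter around a hypothetical magnet position, and summarizes the grid counts by their variance-to-mean ratio, which is close to 1 for a purely random scatter. Case III (interaction between particles) is not simulated.

```python
import random

def grid_counts(points, size=10.0, cells=10):
    """Count the points falling in each square of a cells x cells grid."""
    counts = [[0] * cells for _ in range(cells)]
    w = size / cells
    for x, y in points:
        i = min(int(x / w), cells - 1)
        j = min(int(y / w), cells - 1)
        counts[j][i] += 1
    return [c for row in counts for c in row]

def variance_to_mean(counts):
    """Sample variance of the counts divided by their mean."""
    m = sum(counts) / len(counts)
    var = sum((c - m) ** 2 for c in counts) / (len(counts) - 1)
    return var / m

def scatter_uniform(n, rng, size=10.0):
    """Case I: particles land anywhere on the sheet with equal probability."""
    return [(rng.uniform(0, size), rng.uniform(0, size)) for _ in range(n)]

def scatter_near_magnets(n, magnets, rng, spread=0.8, size=10.0):
    """Particles drawn toward magnets: Gaussian scatter around each magnet."""
    pts = []
    while len(pts) < n:
        mx, my = rng.choice(magnets)
        x, y = rng.gauss(mx, spread), rng.gauss(my, spread)
        if 0 <= x < size and 0 <= y < size:   # keep only particles on the sheet
            pts.append((x, y))
    return pts

rng = random.Random(42)
magnets = [(2.5, 2.5), (7.5, 6.5)]            # hypothetical magnet positions

case1 = scatter_uniform(2000, rng)
# Case II: half the particles are unaffected, half are pulled toward a magnet.
case2 = scatter_uniform(1000, rng) + scatter_near_magnets(1000, magnets, rng)

vmr1 = variance_to_mean(grid_counts(case1))
vmr2 = variance_to_mean(grid_counts(case2))
print(f"Case I  variance-to-mean ratio: {vmr1:.2f}")
print(f"Case II variance-to-mean ratio: {vmr2:.2f}")
```

A ratio well above 1 signals clustering, but it cannot tell whether the clustering comes from a varying mean (first order) or from interaction between particles (second order); separating the two requires further assumptions or analysis.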
Stationarity: A spatial process is stationary or
homogeneous if its statistical properties are independent
of absolute location in R. This implies that:

• E[Y(s)] and VAR[Y(s)] are constant over R and do not
  depend on location.
• COV[Y(si), Y(sj)] between the values at any two sites si
  and sj depends only on the relative locations of these
  sites (the distance and direction between them), not on
  their absolute locations in R.
• If the mean, variance or covariance structure changes
  over R, the process exhibits non-stationarity or
  heterogeneity.
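In symbols, the conditions above can be restated as follows. The constants μ and σ², the separation vector h and the covariance function C(·) are notation introduced here for the restatement only.

```latex
\begin{align*}
E[Y(s)] &= \mu, \qquad \mathrm{VAR}[Y(s)] = \sigma^2
  \quad \text{for all } s \in R, \\
\mathrm{COV}[Y(s_i), Y(s_j)] &= C(s_i - s_j) = C(h),
  \qquad h = s_i - s_j .
\end{align*}
```

Because C depends on the vector h, it can still vary with direction as well as distance.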
Stationarity
Spatial data usually represent a single realization of a
random process.

• Some degree of stationarity must be assumed in order to
  make inferences from the data.
• Stationarity is a form of location invariance
  (invariance in the mean and variance of the process).
• Stationarity is the quality of a process in which the
  statistical parameters (mean and standard deviation)
  of the process do not change with space or time.
Stationarity

Strict or Strong Stationarity

• Requires that the joint distribution functions be
  invariant under translation; consequently all moments,
  including the mean and variance, are constant over R

Weak Stationarity

• Requires a constant mean and a covariance that is
  independent of absolute location; the covariance depends
  only on the distance and direction between points
Non-Stationary Mean
Decreasing from west to east

Stationarity
E[Y(s)] = μ (constant) for all s ∈ R

Cov[Y(s1), Y(s2)] = Cov[Y(s3), Y(s4)]
Cov[Y(s5), Y(s6)] = Cov[Y(s9), Y(s10)]
Cov[Y(s1), Y(s2)] ≠ Cov[Y(s7), Y(s8)]
Isotropy: A spatial process is called isotropic if the
covariance depends only on the distance between si and
sj, not on the direction in which they are separated.

E.g. Weakly magnetized iron particles scattered onto
paper with no magnets underneath represent an isotropic
process.

Figure 2.9. Stationary and isotropic spatial processes


Isotropic

• Refers to a spatial process that evolves the same way in
  all directions

Anisotropic

• A spatial process in which the correlation and
  covariance differ with direction
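The difference can be made concrete with two toy covariance functions. Both are illustrative assumptions rather than models taken from the text: an exponential covariance that depends only on distance, and a variant in which the separation is rescaled differently along the two axes (often called geometric anisotropy). The parameter names and values are hypothetical.

```python
import math

def isotropic_cov(si, sj, sill=1.0, corr_range=2.0):
    """Exponential covariance that depends only on the distance |si - sj|."""
    d = math.dist(si, sj)
    return sill * math.exp(-d / corr_range)

def anisotropic_cov(si, sj, sill=1.0, range_x=4.0, range_y=1.0):
    """Exponential covariance with a longer correlation range along x than y."""
    hx, hy = sj[0] - si[0], sj[1] - si[1]
    scaled = math.hypot(hx / range_x, hy / range_y)
    return sill * math.exp(-scaled)

origin = (0.0, 0.0)
east, north = (1.0, 0.0), (0.0, 1.0)   # same distance, different directions

print("isotropic:  ", isotropic_cov(origin, east), isotropic_cov(origin, north))
print("anisotropic:", anisotropic_cov(origin, east), anisotropic_cov(origin, north))

# Both functions depend only on sj - si, so shifting both sites by the same
# amount leaves the covariance unchanged (stationarity).
print(anisotropic_cov((3.0, 7.0), (4.0, 7.0)) == anisotropic_cov(origin, east))
```

Both functions are stationary in the sense defined earlier; only the first is also isotropic.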

Modeling Spatial Processes

Most methods assume spatial correlation is isotropic

• Heterogeneity in the mean
• Deviations from the mean are stationary
