
SPATIAL STATISTICS – I

Introduction: The GIS dictionary (Wade and Sommer, 2006) defines spatial statistics as 'the
field of study concerning statistical methods that use space and spatial relationships (such as
distance, area, volume, length, height, orientation, centrality and/or other spatial characteristics
of data) directly in their mathematical computations. Spatial statistics are used for a variety of
different types of analyses, including pattern analysis, shape analysis, prediction, spatial datasets,
statistical modeling and prediction of spatial interaction, and more.'

Different forms of Spatial Statistics:

It is difficult to classify the types of spatial analysis strictly. The main forms are:

1. Spatial data analysis
2. Spatial autocorrelation
3. Spatial stratified heterogeneity
4. Spatial interpolation
5. Spatial regression
6. Spatial interaction
7. Simulation and modeling
8. Multiple-point geostatistics (MPS)

Spatial and Non-Spatial Data:

 Spatial Data:

Spatial data, also known as geospatial data, is information about a physical object that can be
represented by numerical values in a geographic coordinate system.

Characteristics of Spatial Data:

 It is the data or information that identifies the geographic location of features and
boundaries on Earth.
 Spatial data represents features such as natural or constructed features, oceans, and more.
 Spatial data is usually stored as coordinates and topology, and is data that can be mapped.
 Spatial data is often accessed, manipulated or analyzed through Geographic
Information Systems (GIS).
 Most spatial databases allow simple geometric objects such as points, lines and polygons
to be represented.

Form of Spatial Data:

Spatial data means feature data that carries latitude and longitude, such as Google imagery,
satellite images, and aerial photographs.

 Non-Spatial Data:

A non-spatial database (or "traditional database") lacks spatial capabilities, i.e. the ability to
store and query data defined in a geometric space. Non-spatial data does not locate anything by
itself; it describes an absolute space that has already been identified with a coordinate system.
Characteristics of Non-Spatial Data:

Non-spatial data refers to data that cannot be used to identify a location, i.e. a data set that
does not contain direct spatial data (latitude, longitude, elevation) or indirect spatial data.
Non-spatial data basically represents socio-economic information about an absolute space.

Form of Non-Spatial Data:

Non-spatial data represents the socio-economic data of an absolute space. The forms of this
type of data are demographic data, agricultural data, industrial data, social data (poverty, caste,
ethnic groups, race, etc.), cultural data (linguistic data, religious data), political data (such as
area and boundary length), etc. of any absolute space that is defined by a specific coordinate
system.

Point Pattern:

In spatial statistics, the pattern of point data, its characteristics and the relationships within
its distribution are analysed by different methods using a geographical information system
(GIS). These techniques include Quadrat Analysis, Kernel Estimation, Nearest Neighbour
Analysis, the K Function, etc.

 Spatial Auto correlation:

Spatial autocorrelation is measured from feature locations and attribute values together. It is a
measure of the degree to which a set of spatial features and their associated data values tend to
be clustered together in space (positive spatial autocorrelation) or dispersed (negative spatial
autocorrelation). Spatial autocorrelation in GIS helps in understanding the degree to which one
object is similar to other nearby objects. Moran's I (Index) is used to measure spatial
autocorrelation.
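The idea behind Moran's I can be illustrated with a minimal sketch in plain Python. It assumes binary distance-band weights (two features are neighbours when they lie within a chosen threshold distance), which is only one of several possible weighting schemes; the function name `morans_i` and the sample data are hypothetical and are not taken from the study.

```python
import math

def morans_i(points, values, threshold):
    """Global Moran's I with binary distance-band weights (w_ij = 1 or 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]      # deviation of each value from the mean
    cross = 0.0                           # sum of w_ij * dev_i * dev_j
    w_sum = 0.0                           # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= threshold:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    return (n / w_sum) * (cross / sum(d * d for d in dev))

# Six points on a line, one unit apart (illustrative data only).
pts = [(float(k), 0.0) for k in range(6)]
i_clustered = morans_i(pts, [1, 1, 1, 9, 9, 9], 1.0)    # similar values side by side
i_alternating = morans_i(pts, [1, 9, 1, 9, 1, 9], 1.0)  # neighbours always differ

print(i_clustered)    # positive: clustered pattern
print(i_alternating)  # negative: dispersed pattern
```

Similar values sitting next to each other give a positive index, while values that alternate between neighbours give a negative one.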

Objective:

a. To see the distribution pattern of point data carrying a specific indicator.
b. To find out the relationship of the points with varying intensities of value under the
defined indicator.
c. To show the degree of association of the data values of a specific indicator, in order to know
whether the pattern in space is clustered or dispersed.

Software Handling Process:

Search → type "Spatial Autocorrelation" → click Incremental Spatial Autocorrelation → Input
Features → Input Field → Number of Distance Bands → click Environments → Processing
Extent → Extent → Same as layer Boundary → OK.
Result:

Table 1: Global Moran's I summary by distance

Distance      Moran's Index   Z score    P value
148753.00     0.9481          22.2984    0.0000
250639.76     0.87132         40.7116    0.0000
352526.53     0.8035          49.5786    0.0000
454413.30     0.7279          55.4799    0.0000
556300.07     0.6602          60.6307    0.0000
658186.84     0.5823          61.6719    0.0000
760073.60     0.4830          59.4421    0.0000
861960.37     0.3872          54.9609    0.0000
963847.14     0.2966          48.2840    0.0000
1065733.91    0.2212          41.6183    0.0000

Interpretation: The table shows that the analysis of the river basin was run for ten equally
spaced distance bands. Moran's Index ranges from about -1 to +1: a positive value indicates
clustering, a negative value indicates dispersion, and a value near 0 indicates a random pattern.
Here Moran's Index is positive at every distance, which indicates a clustered pattern in the river
basin. The index falls steadily as the distance increases (from 0.9481 to 0.2212), so the
clustering is strongest at short distances. The Z score does not follow the same trend: it rises
to a peak of 61.6719 at a distance of 658186.84 and decreases afterwards. The P value is
reported as 0.0000 in every band, i.e. it is smaller than 0.0001, so the clustering is
statistically significant at more than the 99% confidence level.
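The trends in Table 1 can be checked with a few lines of Python. The sketch below only re-reads the values printed in the table; the variable names are illustrative.

```python
# Rows of Table 1: (distance, Moran's Index, Z score)
table_1 = [
    (148753.00, 0.9481, 22.2984),
    (250639.76, 0.87132, 40.7116),
    (352526.53, 0.8035, 49.5786),
    (454413.30, 0.7279, 55.4799),
    (556300.07, 0.6602, 60.6307),
    (658186.84, 0.5823, 61.6719),
    (760073.60, 0.4830, 59.4421),
    (861960.37, 0.3872, 54.9609),
    (963847.14, 0.2966, 48.2840),
    (1065733.91, 0.2212, 41.6183),
]

# Distance band with the highest Z score.
peak_distance, peak_index, peak_z = max(table_1, key=lambda row: row[2])

# True when Moran's Index drops from every band to the next one.
index_always_falls = all(a[1] > b[1] for a, b in zip(table_1, table_1[1:]))

print(peak_distance, peak_z)   # 658186.84 61.6719
print(index_always_falls)      # True
```

The Z score peaks in the sixth band while the index itself keeps falling, so the two columns should be read separately.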

 Nearest Neighbour Analysis:

In spatial statistics, nearest neighbour analysis examines the dispersion among the points in a
specific space. The Average Nearest Neighbour tool measures the distance between each feature
centroid and its nearest neighbour's centroid location. It then averages all these nearest
neighbour distances. GIS is very useful in analysing spatial relationships between features. One
such analysis is finding out which features are closest to a given feature. A related use of the
method takes two datasets and finds out which points in one layer are closest to which points in
the second layer.
Objective: To know whether the settlements are distributed in clustered or dispersed manner.

Methodology:

$R_n = 2\bar{d}\sqrt{\dfrac{n}{a}}$

Where,

$R_n$ = the nearest neighbour statistic

$\bar{d}$ = the mean observed nearest neighbour distance

$n$ = the total number of points

$a$ = the total area
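The statistic can be computed directly from a list of point coordinates. The following is a minimal sketch in plain Python; the function name and the two sample point sets are hypothetical, and no edge correction is applied.

```python
import math

def nearest_neighbour_statistic(points, area):
    """Rn = 2 * d_mean * sqrt(n / a) for a list of (x, y) points."""
    n = len(points)
    nearest = []
    for i, p in enumerate(points):
        # Distance from point i to its closest other point.
        nearest.append(min(math.dist(p, q) for j, q in enumerate(points) if j != i))
    d_mean = sum(nearest) / n
    return 2 * d_mean * math.sqrt(n / area)

# Regular 3 x 3 grid, spacing 1, inside a study area of 9 square units.
grid = [(x + 0.5, y + 0.5) for x in range(3) for y in range(3)]
rn_grid = nearest_neighbour_statistic(grid, 9.0)

# Nine points packed closely together in the same study area.
packed = [(1.0 + 0.01 * k, 1.0) for k in range(9)]
rn_packed = nearest_neighbour_statistic(packed, 9.0)

print(rn_grid)     # 2.0, greater than 1: dispersed
print(rn_packed)   # far below 1: clustered
```

A value above 1 points toward a dispersed pattern and a value below 1 toward a clustered one.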

Software Handling Process:

ArcToolbox → Spatial Statistics Tools → Analyzing Patterns → Average Nearest Neighbor →
Input Feature Class → click Environments → Processing Extent → Extent → Same as layer
Boundary → OK.

Result:

Table 2: Average Nearest Neighbor Analysis (Bihar, West Bengal)

NNRatio       1.511688
NNZScore      6.710964
P Value       0.00000
NNExpected    617.704514
NNObserved    933.776401

Interpretation: The Nearest Neighbor Index (Average Nearest Neighbor ratio) is expressed as the
ratio of the Observed Mean Distance to the Expected Mean Distance. If the index is less than 1,
the pattern exhibits clustering. If the index is greater than 1, the trend is toward dispersion.
In this calculation the Average Nearest Neighbor ratio is 1.511688, which means the settlements
are more dispersed than a random pattern, i.e. the pattern is moderately dispersed. The p-value
is a measure of statistical significance that tells whether or not to reject the null hypothesis
of a random distribution. Here the Z score is 6.710964 and the P value is reported as 0.00000,
so the null hypothesis is rejected: there is less than a 1% likelihood that this dispersed pattern
is the result of random chance.
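The ratio reported in Table 2 can be reproduced from the observed and expected mean distances in the same table. This is only an arithmetic check on the printed values; the variable names are illustrative.

```python
# Values copied from Table 2.
nn_observed = 933.776401
nn_expected = 617.704514

# Nearest Neighbor Index = Observed Mean Distance / Expected Mean Distance.
nn_ratio = nn_observed / nn_expected

print(round(nn_ratio, 6))   # 1.511688
print(nn_ratio > 1)         # True: the trend is toward dispersion
```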

 Hotspot Cold spot Analysis:


Hot spot and cold spot analysis identifies statistically significant clusters of high and low
values. In the present case the parameter is elevation, so the analysis tries to delineate the
areas of high and low elevation. With the help of the p-value and z-score statistics, the result
is also tested at a specific significance level.
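As a rough illustration of how a hot spot statistic separates high and low clusters, the sketch below computes a Getis-Ord Gi* z-score for every point in plain Python. It assumes simple binary distance-band weights that include the point itself, which is a simplification and not the Zone of Indifference option used in the study; the function name and the elevation values are hypothetical.

```python
import math

def getis_ord_gi_star(points, values, threshold):
    """Gi* z-score per point, using binary weights within `threshold`."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        # w_ij = 1 for every point within the threshold, including i itself.
        w = [1.0 if math.dist(points[i], q) <= threshold else 0.0 for q in points]
        w_sum = sum(w)
        w_sq_sum = sum(wj * wj for wj in w)
        weighted_total = sum(wj * xj for wj, xj in zip(w, values))
        spread = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append((weighted_total - mean * w_sum) / spread)
    return scores

# Ten points on a line: high, medium and low elevation (illustrative data).
pts = [(float(k), 0.0) for k in range(10)]
elevation = [100, 100, 100, 50, 50, 50, 50, 10, 10, 10]
z = getis_ord_gi_star(pts, elevation, 1.0)

print(z[1] > 1.96)        # True: hot spot at the 95% level
print(z[8] < -1.96)       # True: cold spot at the 95% level
print(abs(z[5]) < 1.96)   # True: not significant
```

A large positive z-score marks a cluster of high values, a large negative z-score marks a cluster of low values, and scores near zero are not significant.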

Objective: To find out whether there is any cluster of high and low elevation.

Software Handling Process:

Search → type "Hot Spot Analysis" → click Hot Spot Analysis (Getis-Ord Gi*) → Input Features
→ Input Field → Output Feature Class → Conceptualization of Spatial Relationships (Zone of
Indifference) → click Environments → Processing Extent → Extent → Same as layer Boundary →
OK.

Result:

Interpretation: The analysis shows cold spots and hot spots all over the basin. The largest
number of cold spots is found in the lower segment of the river, and the largest number of hot
spots in the middle part. That means higher elevations are clustered in the middle segment and
lower elevations are clustered in the lower segment of the river. Increasing intensity of red
represents hot spots at the 90%, 95% and 99% significance levels respectively. In the same way,
increasing intensity of blue represents cold spots at the 90%, 95% and 99% significance levels
respectively. White represents areas with no statistically significant clustering of high or low
elevation.

 Multi distance cluster analysis

In nearest neighbor analysis (NNA) the pattern is analysed for the entire area at a single
spatial scale. In some situations this can be misleading. For example, if NNA is run on an area
with one major town and surrounding villages spread over a large distance, the result may be a
dispersed or random distribution, although in reality the settlements are clearly highly
clustered up to a certain distance from the main town. In other words, NNA overlooks the effect
of scale on the result. Multi-distance cluster analysis takes this into account and computes the
pattern at different user-defined distances, so the settlement pattern can be examined at
several convenient distance bands.
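The effect of scale can be sketched with a simplified version of the K function. The code below uses a common L(d) transformation of Ripley's K, for which the expected value under a random pattern is d itself, and it applies no edge correction or confidence envelope; the function name and the sample points are hypothetical.

```python
import math

def l_function(points, area, distances):
    """Observed L(d) for each distance d (no edge correction)."""
    n = len(points)
    observed = []
    for d in distances:
        # Count ordered pairs (i, j), i != j, that lie within distance d.
        pairs = sum(
            1
            for i in range(n)
            for j in range(n)
            if i != j and math.dist(points[i], points[j]) <= d
        )
        observed.append(math.sqrt(area * pairs / (math.pi * n * (n - 1))))
    return observed

# Two tight groups of four points inside a study area of 100 square units.
group_a = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
group_b = [(9.0, 9.0), (9.1, 9.0), (9.0, 9.1), (9.1, 9.1)]
bands = [0.5, 20.0]
l_short, l_long = l_function(group_a + group_b, 100.0, bands)

print(l_short > bands[0])   # True: observed above expected, clustered at 0.5
print(l_long < bands[1])    # True: observed below expected, dispersed at 20
```

The same set of points therefore reads as clustered at a short distance band and as dispersed at a long one, which is the situation a single-scale NNA cannot show.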

Objectives: To find out settlement pattern at multiple distance bands.

Software Handling Process:

ArcToolbox → Spatial Statistics Tools → Analyzing Patterns → Multi-Distance Spatial Cluster
Analysis (Ripley's K Function) → Input Feature Class → Output Table → Number of Distance
Bands (10) → Compute Confidence Envelope (optional) → Display Results Graphically (optional)
→ click Environments → Processing Extent → Extent → Same as layer Boundary → OK.

Result:

Interpretation: The K function graph shows two curves, the expected and the observed. The
expected curve divides the graph into two parts: the part above it indicates clustering and
the part below it indicates dispersion. Where the observed curve rises above the expected
curve, the pattern is considered clustered, but here the degree of clustering is small. Beyond
3349.81 metres the observed curve lies below the expected curve, which is considered a
dispersed pattern, but the degree of dispersion is also small. The pattern in the river basin is
statistically significant at more than the 99% level of significance.

 Mean Center:

The mean center identifies the geographic center (or the center of concentration) for a set of
features. The mean center is a point constructed from the average x and y coordinate of all the
features in the study area.

Objective: The mean center tool is a measure of central tendency. It’s useful for tracking
changes in the distribution or for comparing the distributions of different types of features.

Methodology:

$\bar{X} = \dfrac{\sum_{i=1}^{n} x_i}{n}, \qquad \bar{Y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$

Where $x_i$ and $y_i$ are the coordinates for feature i, and n is the total number of features.
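Averaging the coordinates takes only a few lines of code. This is a minimal, unweighted sketch; the function name and the four sample points are hypothetical.

```python
def mean_center(points):
    """Average x and average y of a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return mean_x, mean_y

# Four corners of a 4 x 2 rectangle (illustrative data).
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
center = mean_center(corners)

print(center)   # (2.0, 1.0)
```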
Software Handling Process:

ArcToolbox → Spatial Statistics Tools → Measuring Geographic Distributions → Mean Center →
Input Feature Class → Output Table → click Environments → Processing Extent → Extent →
Same as layer Boundary → OK.

Result:

Interpretation: The mean centre is the average x and y coordinate of all the features in the study
area. Here the mean centre is not located at the middlemost point of the study area.

 Median Center:
The median centre identifies the location that minimizes travel from it to all other features in
the dataset. While the mean centre tool returns a point at the average X and average Y
coordinate of all feature centroids, the median centre tool uses an iterative algorithm to find
the point that minimizes the total Euclidean distance to all features in the dataset.

Objective: The median centre is a measure of central tendency. Use projected data with this
tool to measure distances accurately.

Methodology:

Where $x_i$ and $y_i$ are the coordinates for feature i, and $(X_{we}, Y_{we})$ is the weighted
Euclidean median centre.
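One well-known iterative scheme for this kind of problem is Weiszfeld's algorithm. The sketch below uses it as an assumption, with every feature given equal weight; it is not necessarily the algorithm the GIS tool runs, and the function names and sample points are hypothetical.

```python
import math

def median_center(points, tol=1e-9, max_iter=1000):
    """Unweighted Euclidean median found by Weiszfeld iterations."""
    # Start from the mean center.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(max_iter):
        sum_wx = sum_wy = sum_w = 0.0
        for px, py in points:
            d = math.hypot(px - x, py - y)
            if d < 1e-12:
                continue              # skip a point sitting on the estimate
            sum_wx += px / d
            sum_wy += py / d
            sum_w += 1.0 / d
        if sum_w == 0.0:
            break                     # every point coincides with the estimate
        new_x, new_y = sum_wx / sum_w, sum_wy / sum_w
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return x, y

def total_distance(center, points):
    """Sum of straight-line distances from `center` to every point."""
    return sum(math.dist(center, p) for p in points)

# Four corners of a square plus one distant outlier (illustrative data).
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0), (10.0, 10.0)]
mean_pt = (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
median_pt = median_center(pts)

print(mean_pt)     # pulled toward the outlier
print(median_pt)   # stays close to the square
print(total_distance(median_pt, pts) < total_distance(mean_pt, pts))   # True
```

The outlier drags the mean center well away from the square, while the median center moves much less, which is why the two centers are compared in the interpretation.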
Software Handling Process:

ArcToolbox → Spatial Statistics Tools → Measuring Geographic Distributions → Median Center
→ Input Feature Class → Output Table → click Environments → Processing Extent → Extent →
Same as layer Boundary → OK.

Result:

Interpretation: The median centre uses an iterative algorithm to find the point that minimizes
Euclidean distance to all features in the dataset. Here the mean centre and the median centre
are almost the same.

 Standard Distance:

Standard distance measures the compactness of a distribution and provides a single value
representing the dispersion of features around the center. The value is a distance, so the
compactness can be represented on a map by drawing a circle with a radius equal to the value.

Objective: This tool requires projected data to accurately measure distance. For line and polygon
features, feature centroids are used in distance computation.

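Methodology:

$SD = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n} + \dfrac{\sum_{i=1}^{n} (y_i - \bar{Y})^2}{n}}$

Where $x_i$ and $y_i$ are the coordinates for feature i, $(\bar{X}, \bar{Y})$ represents the Mean
Center for the features, and n is equal to the number of features.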
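A minimal sketch of the standard distance calculation in plain Python is given here, assuming unweighted points; the function name and the sample points are hypothetical.

```python
import math

def standard_distance(points):
    """Standard distance of (x, y) points around their mean center."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / n
    var_y = sum((y - mean_y) ** 2 for _, y in points) / n
    return math.sqrt(var_x + var_y)

# Four corners of a 4 x 2 rectangle (illustrative data).
corners = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
sd = standard_distance(corners)

print(sd)   # sqrt(4 + 1) = sqrt(5), about 2.236
```

A circle of this radius drawn around the mean center (2, 1) is the one standard deviation circle for these points.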

Software Handling Process:

ArcToolbox → Spatial Statistics Tools → Measuring Geographic Distributions → Standard
Distance → Input Feature Class → Output Table → Circle Size → standard deviation (1, 2, 3) →
click Environments → Processing Extent → Extent → Same as layer Boundary → OK.

Result:

Interpretation: When the underlying spatial pattern of features is concentrated toward the
center with fewer features toward the periphery (following a Rayleigh distribution), a one
standard deviation circle will cover approximately 63% of the features, a two standard deviation
circle will contain approximately 98% of the features, and a three standard deviation circle
will cover more than 99% of the features in two dimensions.
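These percentages can be checked with a short calculation. The sketch assumes a circular normal scatter with the same variance in x and y; the distance of a feature from the center then follows a Rayleigh distribution, and the share of features within k standard distances works out to 1 - exp(-k^2). The function name is illustrative.

```python
import math

def share_within(k):
    """Share of features inside a circle of k standard distances."""
    return 1.0 - math.exp(-k * k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))   # about 63.21, 98.17 and 99.99
```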

 Ordinary Least Squares:

Ordinary Least Squares is the best known of the regression techniques. It is also a starting point
for all spatial regression analysis. It provides a global model of the variable or process you are
trying to understand or predict; it creates a single regression equation to represent that process.

Objective: OLS chooses the parameters of a linear function of a set of explanatory variables by
the principle of least squares, minimizing the sum of the squared differences between the
observed dependent variable in the given dataset and the values predicted by the linear function.
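For a single explanatory variable the least squares solution has a simple closed form. The sketch below is an illustration in plain Python with made-up numbers; the function name is hypothetical, and the standardized residual is taken here as the residual divided by the residual standard error, which is an assumption about how the mapped values are scaled.

```python
import math

def ols_fit(x, y):
    """Intercept and slope that minimize the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Illustrative data: one explanatory variable x and one dependent variable y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

intercept, slope = ols_fit(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Residual standard error with n - 2 degrees of freedom (two fitted parameters).
std_error = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))
std_residuals = [r / std_error for r in residuals]

print(round(intercept, 2), round(slope, 2))   # 0.09 1.97
print([round(r, 2) for r in std_residuals])   # positive and negative values
```

Points with standardized residuals close to zero are the ones the regression line describes well.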

Software Handling Process:

Search → type "OLS" → click Ordinary Least Squares → Input Feature Class → Unique ID Field →
Object → Output Feature Class → Dependent Variable → Explanatory Variables → click
Environments → Processing Extent → Extent → Same as layer Boundary → OK.

Result:

Interpretation: In the output map the different colours divide the river basin into seven
categories of standardized regression residual. The standardized residuals take both positive
and negative values. The yellow points show less departure from the fitted regression than the
other points, i.e. their residuals are small. The overall degree of departure is low, which
indicates that the influence of the independent variables on the dependent variable is strong.
