
Spatial Analysis
5.1 Vector data analysis
Vector data analysis uses the geometric objects of point, line, and polygon. The accuracy
of analysis results depends on the accuracy of these objects in terms of location and
shape. Topology can also be a factor for some vector data analyses such as buffering
and overlay. The raster data model, by contrast, is composed of a regular grid of cells in a
specific sequence, and each cell within the grid holds data. The conventional sequence is
row by row, which may start from the top left corner. In this model, the basic building block
is the cell. Geographic features in this model are represented by cell locations, and every
location corresponds to a cell. Each cell contains a single value and is independently
addressed with the value of an attribute. One set of cells and their associated values forms
a layer, and cells are arranged in layers.

Geoprocessing with Raster

Like the geoprocessing tools available for use on vector datasets, raster data can undergo
similar spatial operations. Although the actual computation of these operations is
significantly different from their vector counterparts, their conceptual underpinning is
similar. The geoprocessing techniques covered here include both single layer and
multiple layer operations.

Single Layer Analysis

Reclassifying, or recoding, a dataset is commonly one of the first steps undertaken during
raster analysis. Reclassification is basically the single layer process of assigning a new
class or range value to all pixels in the dataset based on their original values (Figure
below, "Raster Reclassification"). For example, an elevation grid commonly contains a
different value for nearly every cell within its extent. These values could be simplified by
aggregating the pixel values into a few discrete classes (i.e., 0–100 = “1,” 101–200 = “2,”
201–300 = “3,” etc.). This simplification allows for fewer unique values and lower
storage requirements. In addition, these reclassified layers are often used as inputs in
secondary analyses.
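As a minimal sketch of this idea (not tied to any particular GIS package), the reclassification above can be expressed with NumPy, assuming the elevation grid is available as a 2-D array and using the hypothetical class breaks from the example:

```python
import numpy as np

# Hypothetical elevation grid (metres); in practice this would be read from a raster file.
elevation = np.array([[ 42, 157, 260],
                      [ 95, 198, 301],
                      [110, 205, 287]])

# Class breaks: 0-100 -> 1, 101-200 -> 2, 201-300 -> 3, above 300 -> 4
breaks = [100, 200, 300]
reclassified = np.digitize(elevation, bins=breaks, right=True) + 1

print(reclassified)
# [[1 2 3]
#  [1 2 4]
#  [2 3 3]]
```

The reclassified grid holds only a handful of unique values, which is what makes it cheap to store and convenient as an input to later analyses.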


Multiple Layer Analysis

A raster dataset can also be clipped similar to a vector dataset (Figure below, "Clipping
a Raster to a Vector Polygon Layer"). Here, the input raster is overlain by a vector polygon
clip layer. The raster clip process results in a single raster that is identical to the input
raster but shares the extent of the polygon clip layer.

Fig: Clipping a Raster to a Vector Polygon Layer


Raster overlays are relatively simple compared to their vector counterparts and require
much less computational power. The mathematical raster overlay is the most common


overlay method. The numbers within the aligned cells of the input grids can undergo any
user-specified mathematical transformation.
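A minimal sketch of a mathematical raster overlay, assuming the two input grids are already aligned (same extent and cell size) and held as NumPy arrays with illustrative values:

```python
import numpy as np

# Two aligned input grids with made-up values; alignment is assumed, not checked here.
rainfall = np.array([[10.0, 12.0],
                     [ 8.0, 15.0]])
runoff_coefficient = np.array([[0.3, 0.5],
                               [0.4, 0.2]])

# Any user-specified cell-by-cell transformation can be applied to the aligned cells:
summed  = rainfall + runoff_coefficient   # addition
product = rainfall * runoff_coefficient   # multiplication
logged  = np.log(rainfall)                # per-cell logarithm of a single grid
```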

Overlay Analysis
Overlay analysis is a group of methodologies applied in optimal site selection or suitability
modeling. It is a technique for applying a common scale of values to diverse and dissimilar
inputs to create an integrated analysis. Suitability models identify the best or most
preferred locations for a specific phenomenon.

➢ An overlay operation combines the geometries and attributes of two feature layers
to create the output. The geometry of the output represents the geometric
intersection of features from the input layers.


➢ Each feature on the output contains a combination of attributes from the input
layers, and this combination differs from that of its neighbors.

Types of problems addressed by suitability analysis include the following:

1. Where to site a new housing development?
2. Which sites are better for deer habitat?
3. Where is economic growth most likely to occur?
4. Which locations are most vulnerable to mudslides?

Overlay analysis often requires the analysis of many different factors. For instance,
choosing the site for a new housing development means assessing such things as land
cost, proximity to existing services, slope, and flood frequency. This information exists in
different rasters with different value scales: dollars, distances, degrees, and so on. You
cannot add a raster of land cost (dollars) to a raster of distance to utilities (meters) and
obtain a meaningful result.

Additionally, the factors in your analysis may not be equally important. It may be that the
cost of land is more important in choosing a site than the distance to utility lines. Exactly
how much more important is for you to decide.

Even within a single raster, you must prioritize values. Some values in a particular raster
may be ideal for your purposes (for example, slopes of 0 to 5 degrees), while others may
be good, others bad, and still others unacceptable.

The following lists the general steps to perform overlay analysis:

1. Define the problem.


2. Break the problem into sub-models.
3. Determine significant layers.
4. Reclassify or transform the data within a layer.
5. Weight the input layers.
6. Add or combine the layers.
7. Select the best locations.
8. Analyze.


Steps 1 through 3 are common steps for nearly all spatial problem solving and are
particularly important in overlay analysis.

1. Define the problem

Defining the problem is one of the most difficult aspects of the modeling process. The
overall objective must be identified. All aspects of the remaining steps of the overlay
modeling process must contribute to this overall objective.

The components relating to the objective must be defined. Some of the components may
be complementary and others competitive. However, a clear definition of each component
and how they interact must be established.

Not only is it important to identify what the problem is, a clear understanding needs to be
developed to define when the problem is solved, or when the phenomenon is satisfied. In
the problem definition, specific measures should be established to identify the success of
the outcome from the model.

For example, when identifying the best location for a ski resort, the overall goal may be
to make money. All factors that are identified in the model should help the ski area be
profitable.

2. Break the problem into sub-models

Most overlay problems are complex and it is recommended that you break them down
into sub-models for clarity, to organize your thoughts, and to more effectively solve the
overlay problem.

For example, a suitability model for identifying the best location for a ski resort can be
broken into a series of sub-models that should help the ski area be profitable. The first
sub-model can be a terrain sub-model identifying locations that have a wide variety of
favorable terrain for skiers and snowboarders.

Making sure people can reach the ski area can be captured in an accessibility sub-model.
Included in the sub-model can be access from major cities as well as local road access.


A cost sub-model can identify the locations that would be optimal to build on. This sub-
model may identify flatter slopes as well as those close to power and water as being
favorable.

Certain attributes or layers can be in multiple sub-models. For example, steep slopes
might be favorable in the terrain sub-model but detrimental in the cost-of-building
sub-model.

3. Determine significant layers

The attributes or layers that affect each sub-model need to be identified. Each factor
captures and describes a component of the phenomena the sub-model is defining. Each
factor contributes to the goals of the sub-model, and each sub-model contributes to the
overall goal of the overlay model. All factors that contribute to defining the phenomenon,
and only those factors, should be included in the overlay model.

For certain factors, the layers may need to be created. For example, it may be more
desirable to be closer to a major road. To identify the distance each cell is from a
road, Euclidean Distance may be run to create the distance raster.

4. Reclassification/transformation

Different number systems cannot be directly combined effectively. For example, adding
slope to land use would produce meaningless results. The four main numbering systems
are the following:

❖ Ratio: The ratio scale has a reference point, usually zero, and the numbers within
the scale are comparable. For example, elevation values are ratio numbers, and
an elevation of 50 meters is half as high as 100 meters.
❖ Interval: The values in an interval scale are relative to one another; however, there
is not a common reference point. For example, a pH scale is of type interval, where
the higher the value is above the neutral value of 7, the more alkaline it is, and the
lower the value is below 7, the more acidic it is. However, the values are not fully
comparable. For example, a pH of 2 is not twice as acidic as a pH of 4.


❖ Ordinal: An ordinal scale establishes order, such as who came in first, second, and
third in a race. Order is established, but the assigned order values cannot be
directly compared. For example, the person who came in first was not necessarily
twice as fast as the person who came in second.
❖ Nominal: There is no relationship between the assigned values in the nominal
scale. For example, land-use values, which are nominal values, cannot be
compared to one another. A land use of 8 is probably not twice as much as a land
use of 4.

Because of the potential different ranges of values and the different types of numbering
systems each input layer may have, before the multiple factors can be combined for
analysis, each must be reclassified or transformed to a common ratio scale.

Common scales can be predetermined, such as a 1 to 9 or a 1 to 10 scale, with the higher
value being more favorable, or the scale can be on a 0 to 1 scale, defining the possibility
of belonging to a specific set.

5. Weight

Certain factors may be more important to the overall goal than others. If this is the case,
before the factors are combined, the factors can be weighted based on their importance.
For example, in the building sub-model for siting the ski resort, the slope criterion may be
twice as important to the cost of construction as the distance from a road. Therefore,
before combining the two layers, the slope criterion should be given twice the weight of
the distance to roads.

6. Add/Combine

In overlay analysis, it is desirable to establish the relationship of all the input factors
together to identify the desirable locations that meet the goals of the model. For example,
the input layers, once weighted appropriately, can be added together in an additive
weighted overlay model. In this combination approach, it is assumed that the more
favorable the factors, the more desirable the location will be. Thus, the higher the value
on the resulting output raster, the more desirable the location will be.
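As a sketch of the additive weighted overlay described above (the layer names, values and weights are all illustrative assumptions), suppose three input layers have already been reclassified to a common 1 to 9 suitability scale:

```python
import numpy as np

# Inputs already reclassified to a common 1-9 scale (hypothetical values).
slope_suitability     = np.array([[9, 7], [3, 1]])
land_cost_suitability = np.array([[5, 8], [6, 2]])
road_dist_suitability = np.array([[4, 6], [9, 7]])

# Assumed relative importance of each factor; the weights sum to 1 in this sketch.
weights = {"slope": 0.5, "cost": 0.3, "road": 0.2}

suitability = (weights["slope"] * slope_suitability
               + weights["cost"] * land_cost_suitability
               + weights["road"] * road_dist_suitability)

# Higher values on the output raster indicate more desirable locations.
print(suitability)
```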


Other combining approaches can be applied. For example, in a fuzzy logic overlay
analysis, the combination approaches explore the possibility of membership of a location
to multiple sets.

7. Select the best locations

In most overlay analysis and suitability models, identifying the best locations for the
phenomenon you are modeling is the ultimate goal. This phenomenon will have specific
size and spatial requirements to function effectively. These requirements include the total
area necessary to function, the number of regions this area should be distributed among,
the shape characteristics for the regions, and the minimum and maximum distance
between the regions.

The Locate Regions tool allows you to identify the best combinations of desired regions
that meet the defined spatial constraints.

8. Analyze

The final step in the modeling process is for you to analyze the results. Do the potential
ideal locations sensibly meet the criteria? It may be beneficial not only to explore the best
locations identified by the model but to also investigate the second and third most
favorable sites.

Buffering

Buffering is the process of creating an output dataset that contains a zone (or zones) of
a specified width around an input feature. In the case of raster datasets, these input
features are given as a grid cell or a group of grid cells containing a uniform value (e.g.,
buffer all cells whose value = 1). Buffers are particularly suited for determining the area
of influence around features of interest. Whereas buffering vector data results in a precise
area of influence at a specified distance from the target feature, raster buffers tend to be
approximations representing those cells that are within the specified distance range of
the target. Buffering usually creates two areas: one area that is within a specified
distance to selected real world features and the other area that is beyond. The area that
is within the specified distance is called the buffer zone.
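A minimal sketch of buffering vector features in code, using the Shapely library; the coordinates and distances are made up, and a projected CRS in metres is assumed so the buffer distances are meaningful:

```python
from shapely.geometry import Point, LineString

well = Point(512000.0, 3025000.0)        # hypothetical projected coordinates (metres)
river = LineString([(0, 0), (500, 200), (900, 150)])

well_buffer = well.buffer(100)           # 100 m buffer zone around a point
river_buffer = river.buffer(50)          # 50 m buffer zone around a polyline

print(round(well_buffer.area))                    # approximately pi * 100**2
print(river_buffer.contains(Point(450, 190)))     # True: the point lies in the buffer zone
```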


A buffer zone is any area that serves the purpose of keeping real world features distant
from one another. Buffer zones are often set up to protect the environment, protect
residential and commercial zones from industrial accidents or natural disasters, or to
prevent violence. Common types of buffer zones may be greenbelts between residential
and commercial areas, border zones between countries (see figure_buffer_zone), noise
protection zones around airports, or pollution protection zones along rivers.

Fig: The border between the United States of America and Mexico is separated by a
buffer zone (photo taken by SGT Jim Greenhill, 2006).

In a GIS Application, buffer zones are always represented as vector polygons enclosing
other polygon, line or point features.

Fig: A buffer zone around vector points.
Fig: A buffer zone around vector polylines.




Variations in buffering

There are several variations in buffering. The buffer distance or buffer size can vary
according to numerical values provided in the vector layer attribute table for each feature.
The numerical values have to be defined in map units according to the Coordinate
Reference System (CRS) used with the data. For example, the width of a buffer zone
along the banks of a river can vary depending on the intensity of the adjacent land use.
For intensive cultivation the buffer distance may be bigger than for organic farming (see
Figure figure_variable_buffer and Table table_buffer_attributes).

Figure Variable Buffer 1: Buffering rivers with different buffer distances.



River          | Adjacent land use                | Buffer distance (meters)
Breede River   | Intensive vegetable cultivation  | 100
Komati         | Intensive cotton cultivation     | 150
Oranje         | Organic farming                  | 50
Telle river    | Organic farming                  | 50

Table Buffer Attributes 1: Attribute table with different buffer distances to rivers based on
information about the adjacent land use.
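The per-feature buffer distances in the table above could be applied in one step with GeoPandas, whose buffer method accepts a column of distances; the geometries, attribute values and CRS below are illustrative assumptions:

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical river layer; buffer_dist mirrors the attribute table above (metres).
rivers = gpd.GeoDataFrame(
    {
        "river": ["Breede River", "Komati", "Oranje", "Telle river"],
        "buffer_dist": [100, 150, 50, 50],
        "geometry": [
            LineString([(0, 0), (1000, 0)]),
            LineString([(0, 500), (1000, 500)]),
            LineString([(0, 1000), (1000, 1000)]),
            LineString([(0, 1500), (1000, 1500)]),
        ],
    },
    crs="EPSG:32735",  # assumed projected CRS with metre units
)

# Each river is buffered by the distance stored in its own attribute row.
rivers["buffer_zone"] = rivers.geometry.buffer(rivers["buffer_dist"])
```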

Buffers around polyline features, such as rivers or roads, do not have to be on both sides
of the lines. They can be on either the left side or the right side of the line feature. In these
cases the left or right side is determined by the direction from the starting point to the end
point of the line during digitizing.

Multiple buffer zones

A feature can also have more than one buffer zone. A nuclear power plant may be
buffered with distances of 10, 15, 25 and 30 km, thus forming multiple rings around the
plant as part of an evacuation plan (see figure_multiple_buffers).

Figure Multiple Buffers 1: Buffering a point feature with distances of 10, 15, 25 and 30 km.
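A sketch of multiple (ring) buffers around a single point with Shapely; the plant location is hypothetical and the distances follow the evacuation example above:

```python
from shapely.geometry import Point

plant = Point(0, 0)                  # hypothetical nuclear power plant location
distances_km = [10, 15, 25, 30]

# Build concentric rings: each ring is the zone between two successive buffer distances.
rings = []
previous = None
for d in distances_km:
    full = plant.buffer(d * 1000)    # buffer distance converted to metres
    rings.append(full if previous is None else full.difference(previous))
    previous = full
```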


Buffering with intact or dissolved boundaries

Buffer zones often have dissolved boundaries so that there are no overlapping areas
between the buffer zones. In some cases, though, it may also be useful for boundaries of
buffer zones to remain intact, so that each buffer zone is a separate polygon and you can
identify the overlapping areas (see Figure figure_buffer_dissolve).

Figure Dissolve Buffers 1: Buffer zones with dissolved (left) and with intact boundaries
(right) showing overlapping areas.

Buffering outward and inward

Buffer zones around polygon features are usually extended outward from a polygon
boundary but it is also possible to create a buffer zone inward from a polygon boundary.
Say, for example, the Department of Tourism wants to plan a new road around Robben
Island and environmental laws require that the road is at least 200 meters inward from
the coast line. They could use an inward buffer to find the 200 m line inland and then plan
their road not to go beyond that line.

Common problems / things to be aware of

Most GIS Applications offer buffer creation as an analysis tool, but the options for creating
buffers can vary. For example, not all GIS Applications allow you to buffer on either the
left side or the right side of a line feature, to dissolve the boundaries of buffer zones or to
buffer inward from a polygon boundary.


A buffer distance always has to be defined as a whole number (integer) or a decimal
number (floating point value). This value is defined in map units (meters, feet, decimal
degrees) according to the Coordinate Reference System (CRS) of the vector layer.

More spatial analysis tools

Buffering is an important and often-used spatial analysis tool, but there are many others
that can be used in a GIS and explored by the user.

Spatial overlay is a process that allows you to identify the relationships between two
polygon features that share all or part of the same area. The output vector layer is a
combination of the input features information (see figure_overlay_operations).

Figure Overlay Operations 1: Spatial overlay with two input vector layers (a_input =
rectangle, b_input = circle). The resulting vector layer is displayed in green.

Typical spatial overlay examples are:

❖ Intersection: The output layer contains all areas where both layers overlap
(intersect).
❖ Union: The output layer contains all areas of the two input layers combined.
❖ Symmetrical difference: The output layer contains all areas of the input layers
except those areas where the two layers overlap (intersect).
❖ Difference: The output layer contains all areas of the first input layer that do not
overlap (intersect) with the second input layer.
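These overlay types map directly onto the overlay function in GeoPandas; a minimal sketch with a rectangle and a circle as in the figure above (both layers are invented here):

```python
import geopandas as gpd
from shapely.geometry import Point, box

# a_input: a rectangle; b_input: a circle (a buffered point) -- hypothetical layers.
a_input = gpd.GeoDataFrame({"a_id": [1]}, geometry=[box(0, 0, 10, 6)])
b_input = gpd.GeoDataFrame({"b_id": [1]}, geometry=[Point(10, 3).buffer(4)])

intersection = gpd.overlay(a_input, b_input, how="intersection")
union        = gpd.overlay(a_input, b_input, how="union")
sym_diff     = gpd.overlay(a_input, b_input, how="symmetric_difference")
difference   = gpd.overlay(a_input, b_input, how="difference")
```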

Based on the concept of proximity, buffering creates two areas: one area that is within
a specified distance of select features and the other area that is beyond.


➢ The area that is within the specified distance is called the buffer zone.
➢ There are several variations in buffering. The buffer distance can vary according
to the values of a given field.
➢ Buffering around line features can be on either the left side or the right side of the
line feature. Boundaries of buffer zones may remain intact so that each buffer zone
is a separate polygon.

Network analysis
Network analysis in GIS is often related to finding solutions to transportation problems. In
a GIS the real world is represented by either one of two spatial models, vector-based,
or raster-based. Real world networks, such as a road system, must be modelled
appropriately to fit into the different spatial models. Even though the models differ, the
solution to different transportation problems in either raster or vector GIS uses the same
path finding algorithms. Whether raster or vector GIS is to be preferred is more a question
of choice than of accuracy.

Introduction

In general, a network is a system of interconnected linear features through which
resources are transported or communication is achieved. The network data model is an
abstract representation of the components and characteristics of real-world network
systems. One major application of network analysis is found in transportation planning,
where the issue might be to find paths corresponding to certain criteria, like finding the
shortest or least cost path between two or more locations, or to find all locations within a
given travel cost from a specified origin. Traditionally, a GIS represents the real world in
one of two spatial models: vector-based, i.e. points, lines and polygons, or raster-
based, i.e. cells of a continuous grid surface. This study will investigate the subject of
network analysis in both raster and vector GIS, in order to compare the two spatial
models. It will discuss their limitations and advantages, by using a road network as an
example.

Network modeling in general: A network model can be defined as a line graph, which
is composed of links representing linear channels of flow and nodes representing their


connections. In other words, a network takes the form of edges (or arcs) connecting pairs
of nodes (or vertices). Nodes can be junctions and edges can be segments of a road or
a pipeline. For a network to function as a real-world model, an edge will have to be
associated with a direction and with a measure of impedance, determining the resistance
or travel cost along the network.

Fig: Typical network graph and table structure, listing nodes, connectivity of edges, turn
impedance and edge attribute data.

Since networks utilize the basic arc-node structure, by definition, due to the way the data
is stored, the vector network will already have a topological structure, relating all
elements. All that is needed, simply speaking, is to implement the resistance factors in
the attribute tables for the lines or nodes. Directions are an explicit part of the vector
network topology. If the directions are derived from digitising a road map, or received as
a ready-coded network from a data supplier, they may not correspond with the real-world
directions and need to be checked. Consequently, the representation of network elements
requires a substantial amount of time to be devoted to data preparation and validation. This
can be quite complex, depending on the amount of travel cost information we want to
incorporate in the model: road width, speed limit, road class, delay at traffic lights, delays


in taking turns at crossroads, to mention just a few. For a “simple” crossroads with four
edges and one node there are as many as 16 possible turns, three directions from each
edge to other edges, plus four 180-degree U-turns. In a mixed rural/urban road network
in an average Norwegian municipality, with 7000 edges and nodes, there can be as many
as 18000 turn possibilities (Husdal, 1998). Arcs usually describe the centerline of a
network feature, such as a road centerline. Arcs and nodes are discretely referenced by
coordinates. Also, lines that cross but do not intersect can be directly represented in the
vector model, much like in the real world, where we have “overpasses” and
“underpasses”.

Network modeling in vector GIS

Arcs and nodes, together with the special-purpose network elements stops, centers and
turns, form the network model in vector GIS. Stops can be delivery or pick-up points along
a route, centers are used for allocating services and investigating catchment areas, and
turns are used in determining direction and flow within the network. The characteristics of
any system being modeled in a network must be abstracted into a form that may be
represented by one of these elements.

Path finding in vector GIS

Dolan et al (1993), Chou (1993) and Jones (1998) have described the process of finding
a criteria-determined path through a network in great detail. Path finding algorithms fall
into one of two main categories, matrix algorithms and tree-building algorithms, of which
the latter is the one mostly used in GIS. Matrix algorithms find the shortest distance
between all pairs of nodes in iterative steps, eliminating the least favorable nodes, as seen
in Chou (1993). This is based on the fact that the network can be represented as a matrix.
Tree-building algorithms find the shortest path from an origin node to all other nodes,
producing a tree of shortest paths with branches emanating from the origin (Lombard et
al., 1993). The most commonly used tree-building procedure is the one originally
developed by Dijkstra (1959), of which to date many modifications and improvements
have been made for specific applications. In order to find a path, the algorithm builds a
tree data structure that represents specific paths through the network. This is often
referred to as a breadth-first search, which fans out to as many nodes as possible before
penetrating deeper into the tree (Dolan, 1993). Starting from one origin node, the search
tree builds branches in


all directions, adds up the resistance figures, and keeps only those that represent the
cumulative least cost. For each new set of adjacent nodes the calculations for all possible
edges towards these nodes are repeated till all nodes and edges have been utilized, and
the final destination is reached with minimal cost. During the process, edges may appear
in the search tree and then disappear as the calculations discard their value.
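A compact sketch of the tree-building idea using Dijkstra's procedure on a toy road network; the graph, node names and impedance values are invented for illustration:

```python
import heapq

def dijkstra(graph, origin):
    """Least-cost tree from an origin node; graph is {node: [(neighbour, impedance), ...]}."""
    cost = {origin: 0.0}
    visited = set()
    queue = [(0.0, origin)]
    while queue:
        c, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        for neighbour, impedance in graph.get(node, []):
            new_cost = c + impedance
            if new_cost < cost.get(neighbour, float("inf")):
                cost[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return cost

# Toy road network: edge values are travel costs (e.g. minutes).
roads = {
    "A": [("B", 4), ("C", 2)],
    "B": [("A", 4), ("D", 5)],
    "C": [("A", 2), ("B", 1), ("D", 8)],
    "D": [("B", 5), ("C", 8)],
}
print(dijkstra(roads, "A"))   # {'A': 0.0, 'B': 3.0, 'C': 2.0, 'D': 8.0}
```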

The figure demonstrates an example of optimum paths based on minimum distance. In
the figure, there are locations of a number of main hospitals within the ring road of the
Kathmandu valley. If there has been an accident outside the ring road (say, close to
Bhaktapur), which is the closest hospital, and what is the shortest route to that hospital
for an ambulance? The network analysis identifies the closest hospital in terms of distance
(Bir Hospital, as you can notice in the figure) and also indicates how to get there.

5.2 Raster Analysis

The Raster Analysis toolbox contains a set of tools for performing raster analysis on data
in your portal. By distributing the processing between multiple server nodes, you can
process large datasets in less time than processing using your desktop machine. Raster
Analysis tools are powered by your ArcGIS Image Server.

Raster analysis is similar in many ways to vector analysis. However, there are some key
differences. The major differences between raster and vector modeling are dependent on
the nature of the data models themselves. In both raster and vector analysis, all
operations are possible because datasets are stored in a common coordinate framework.


Every coordinate in the planar section falls within or in proximity to an existing object,
whether that object is a point, line, polygon, or raster cell.

➢ Raster data analysis is based on cells and rasters.


➢ Raster data analysis can be performed at the level of individual cells, or groups of
cells, or cells within an entire raster.
➢ Some raster data operations use a single raster; others use two or more rasters.
➢ Raster data analysis also depends on the type of cell value (numeric or categorical
values).
➢ The analysis environment refers to the area for analysis and the output cell size.

Raster data analysis presents another powerful set of analysis tools available to
geographers. Raster data are particularly suited to certain types of analyses, such as
basic geoprocessing, surface analysis, and terrain mapping. While not always true, raster
data can simplify many types of spatial analyses that would otherwise be too cumbersome
to perform on vector datasets.

The types of operations in Spatial Analyst

❖ Local operations
❖ Focal operations
❖ Zonal operations
❖ Global operations
❖ Application operations

The operations of cell-based analysis available in the ArcGIS Spatial Analyst extension
can be divided into five types:

➢ Those that work on single cell locations (local operations)


➢ Those that work on cell locations within a neighborhood (focal operations)
➢ Those that work on cell locations within zones (zonal operations)
➢ Those that work on all cells within the raster (global operations)
➢ Those that perform a specific application (for example, hydrologic analysis
operations)


Each of these categories can be influenced by, or based on, the spatial or geometric
representation of the data and not solely on the attributes that the cells portray.

Local Operations:

Local operations, or per-cell functions, compute a raster output dataset where the output
value at each location (cell) is a function of the value associated with that location on one
or more raster datasets. That is, the value of the single cell, regardless of the values of
neighboring cells, has a direct influence on the value of the output. A per-cell operation
can be applied to a single raster dataset or to multiple raster datasets. For a single
dataset, examples of per-cell operations include the trigonometric tools, for example, Tan,
or the logarithmic tools.

Local operations can also be performed on multiple input rasters. In this case, a single
value will be returned for each cell based on some operation being applied to the
corresponding cell in each of the input rasters. An example of this type of operation is
using the Cell Statistics tool: for each output cell, a statistical calculation (such as the
mean or range) is performed on the cell values of all the input rasters at that
corresponding location.


Local operations: value of an output cell determined by a single input cell

➢ A common term for local operations with multiple input rasters is map algebra, a
term that refers to algebraic operations with raster map layers.
➢ Besides mathematical functions that can be used on individual rasters, other
measures that are based on the cell values or their frequencies in the input rasters
can also be derived and stored on the output raster of a local operation with
multiple rasters.
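A minimal sketch of a local (per-cell) operation over multiple aligned rasters, in the spirit of a Cell Statistics calculation; the three monthly grids below are invented:

```python
import numpy as np

# Three aligned input rasters (e.g. rainfall for three months) with made-up values.
jan = np.array([[10, 20], [30, 40]])
feb = np.array([[12, 18], [33, 35]])
mar = np.array([[ 8, 25], [27, 45]])

stack = np.stack([jan, feb, mar])

# Per-cell statistics across the inputs: each output cell depends only on the
# corresponding cell of every input raster.
cell_mean  = stack.mean(axis=0)
cell_range = stack.max(axis=0) - stack.min(axis=0)
```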


Focal operations (Neighborhood Operations)

Focal, or neighborhood, operations produce an output raster dataset in which the output
value at each cell location is a function of the input value at a cell location and the values
of the cells in a specified neighborhood around that location. As each cell in the input is
processed, the neighborhood is essentially a moving window that shifts along with it. The
configuration (size and shape) of the neighborhood determines specifically which cells
surrounding the processing cell should be used in the calculation of each output value.
The most typical neighborhood is 3 by 3 cells, which incorporates the processing cell and
its closest eight neighbors.

➢ A neighborhood operation involves a focal cell and a set of its surrounding cells.
The surrounding cells are chosen for their distance and/or directional relationship
to the focal cell.
➢ Common neighborhoods include rectangles, circles, annuluses, and wedges.


Focal operations: value of the output cell is determined by the cells in a specified
neighborhood around each input cell.
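A sketch of a focal operation, computing a focal mean with a 3 x 3 moving window using SciPy; the input grid and the edge-handling choice are assumptions:

```python
import numpy as np
from scipy import ndimage

# Hypothetical input raster.
raster = np.array([[1.0, 2.0, 3.0, 4.0],
                   [5.0, 6.0, 7.0, 8.0],
                   [9.0, 8.0, 7.0, 6.0]])

# Focal (neighbourhood) mean: each output cell is the mean of the 3 x 3 window
# centred on the corresponding input cell; edge cells reuse the nearest values here.
focal_mean = ndimage.uniform_filter(raster, size=3, mode="nearest")
```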

Zonal operations

Zonal operations compute an output raster dataset where the output value for each
location depends on the value of the cell at the location and the association that location
has within a cartographic zone. Zonal operations are similar to focal operations except
that the definition of the neighborhood in a zonal operation is the configuration of the
zones themselves, not a specified neighborhood shape. Individual zones can be of any
shape or size and can be disconnected from each other. Zones can be defined either as


raster or feature data. For raster data, a zone is all cells with the same value. For feature
data, a zone is all features with the same attribute value (LandClass = 4, for example).

➢ A zonal operation works with groups of cells of same values or like features. These
groups are called zones. Zones may be contiguous or noncontiguous.
➢ A zonal operation may work with a single raster or two rasters.
➢ Given a single input raster, zonal operations measure the geometry of each zone
in the raster, such as area, perimeter, thickness, and centroid.
➢ Given two rasters in a zonal operation, one input raster and one zonal raster, a
zonal operation produces an output raster, which summarizes the cell values in
the input raster for each zone in the zonal raster.

Zonal operations: value of each output cell determined by all the input cells of the same
zone. An example zonal operation is to return the mean (average) of values from the first
dataset that fall within a specified zone of the second.
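A sketch of a zonal mean with plain NumPy, assuming a value raster and a zone raster of the same shape (both invented):

```python
import numpy as np

values = np.array([[4.0, 6.0, 1.0],
                   [8.0, 2.0, 3.0],
                   [5.0, 7.0, 9.0]])
zones = np.array([[1, 1, 2],
                  [1, 2, 2],
                  [3, 3, 3]])   # cells with the same value belong to the same zone

# Zonal mean: average of the value raster within each zone of the zone raster.
zonal_mean = {int(z): float(values[zones == z].mean()) for z in np.unique(zones)}
print(zonal_mean)   # {1: 6.0, 2: 2.0, 3: 7.0}
```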


Re-Sampling

When you go from 5-meter cell size to 10-meter cell size, the output raster grid cell size
will be different. When converting raster data between different coordinate systems, cell
centers don’t match. In both situations, a resampling approach must be taken to specify
how the output grid will take shape. But it’s not always an easy choice which resampling
method to use because there’s more than one way to recalculate cell values.

We’ll highlight which resampling technique is appropriate to use in given scenarios. We’ll
also touch on how we use these resampling methods in a GIS environment. There are
four common ways to resample raster grids in GIS.

1. Nearest Neighbor
2. Bilinear
3. Cubic Convolution
4. Majority

1. Nearest Neighbor Resampling

The nearest neighbor technique doesn’t change any of the values from the input raster
data set. It takes the cell

center from the input raster data set to determine the closest cell center of the output
raster. For processing speed, it’s generally the fastest because of its simplicity.

Because nearest neighbor resampling doesn’t alter any values in the output raster data
set, it is ideal for categorical, nominal, and ordinal data.

When you resample this type of data, you should use the nearest neighbor resampling.
For example, if you have a land cover classification raster grid, nearest neighbor will take
the cell center value.

If agriculture has a discrete value of 7, the nearest neighbor method will never assign it a
value of 7.2. It simply involves taking the output value from the nearest input layer cell
center.
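A minimal nearest-neighbour resampling sketch in NumPy, resampling an invented 5 m land cover grid to 10 m; note the class values are copied, never averaged:

```python
import numpy as np

# Categorical land cover classes on a 4 x 4 grid of 5 m cells (hypothetical).
landcover = np.array([[7, 7, 3, 3],
                      [7, 4, 3, 3],
                      [4, 4, 5, 5],
                      [4, 4, 5, 5]])

# Resample 5 m -> 10 m: each output cell takes the value of one nearest input
# cell centre, so no new (fractional) class values can appear.
scale = 2                                    # 10 m / 5 m
idx = np.arange(landcover.shape[0] // scale) * scale
resampled = landcover[np.ix_(idx, idx)]
print(resampled)
# [[7 3]
#  [4 5]]
```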

2. Bilinear Interpolation

Bilinear interpolation is a technique for calculating values of a grid location based on four
nearby grid cells. It assigns the output cell value by taking the weighted average of the
four neighboring cells in an image to generate new values.

It smooths the output raster grid, but not as much as cubic convolution. It’s useful when
working with continuous data sets that don’t have distinct boundaries.

For example, noise distance rasters don’t have discrete limits. In this case, this type of
data varies continuously cell-to-cell to form a surface.
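A small sketch of the bilinear calculation itself (a weighted average of the four surrounding cell values); the elevation grid and the query position are made up:

```python
import numpy as np

def bilinear(grid, r, c):
    """Bilinear estimate at fractional row/column position (r, c) of a 2-D grid."""
    r0, c0 = int(np.floor(r)), int(np.floor(c))
    r1 = min(r0 + 1, grid.shape[0] - 1)
    c1 = min(c0 + 1, grid.shape[1] - 1)
    fr, fc = r - r0, c - c0
    top    = grid[r0, c0] * (1 - fc) + grid[r0, c1] * fc
    bottom = grid[r1, c0] * (1 - fc) + grid[r1, c1] * fc
    return top * (1 - fr) + bottom * fr

elevation = np.array([[100.0, 110.0],
                      [120.0, 140.0]])
print(bilinear(elevation, 0.5, 0.5))   # 117.5, the weighted average of the 4 cells
```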

3. Cubic Convolution Interpolation

Cubic convolution interpolation is similar to bilinear interpolation in that it takes the
average of surrounding cells. Instead of using the four nearest cells, the output value is
based on averaging the 16 nearest cells. As a
result, processing time tends to increase for this method. This method is generally used
for continuous surfaces where much noise exists. Because it takes more neighboring


cells compared to bilinear resampling, it’s good for smoothing data from the input raster
grid.

Generally, we use cubic convolution much less than bilinear interpolation. In particular,
it’s good for noise reduction. For example, a synthetic aperture radar image might benefit
from cubic convolution interpolation technique because it reduces noise which is
commonly seen in radar.

4. Majority Resampling

While the nearest neighbor resampling takes the cell center from the input raster data,
the majority algorithm uses the most common values within the filter window.

Similar to the nearest neighbor algorithm, this technique is commonly used for discrete
data like land cover classification and other types of raster grids with distinct boundaries.

For example, if the filter window finds 3 cells of agricultural land cover and 2 cells of road,
the output data set will be classified as agriculture. This is because the agriculture land
cover class is the most popular cell within the filter window. When compared with nearest
neighbor resampling, the resulting data set will often be smoother.

Raster Resampling: The Main Takeaway

Image processing has become more important to create images at different resolutions
and coordinate system conversions. This is why we use image resampling techniques
like the nearest neighbor, bilinear interpolation, cubic convolution, and majority
interpolation.

In GIS, nearest neighbor resampling does not change any of the values of the output
cells from the input raster dataset. This makes the nearest neighbor suitable for discrete
data like land cover classification maps. While nearest neighbor resampling takes the cell
center from the input raster data set, majority resampling is based on the most common
values found within the filter window.


The bilinear interpolation technique works best for continuous data. This is because
output cells are calculated based on the relative position of the four nearest values from
the input grid.

When you have even more noise in the input raster grid, this is when cubic convolution
can be more advantageous. It smooths out the output grid because it takes the 16 nearest
cells from the input data set.

Mosaic and Clip

A mosaic is a combination or merge of two or more images.

A mosaic combines multiple raster images to obtain a seamless raster. Here are some of
the tools available to help mosaic raster datasets:

➢ Mosaic Tool
➢ Raster Catalog
➢ Merge Raster

Let’s take a look at how to stitch together multiple raster images. This can be anything
from satellite imagery and digital elevation models to land cover.

Mosaic Tool

The Mosaic Tool accepts one or more input rasters. Then, it merges them into a complete
raster mosaic. There are several variations of this tool:


Mosaic Raster: This appends rasters into an existing raster dataset.
Mosaic to New Raster: This merges rasters and creates a new raster dataset.
Some of the options include:

➢ Compression type
➢ File format
➢ Ignore background and No Data values
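Outside of ArcGIS, the same merge idea can be sketched with the rasterio library; the tile file names are hypothetical, and the inputs are assumed to share band count, bit depth and CRS:

```python
import rasterio
from rasterio.merge import merge

# Hypothetical overlapping tiles with matching band count, bit depth and CRS.
paths = ["tile_west.tif", "tile_east.tif"]
sources = [rasterio.open(p) for p in paths]

mosaic, out_transform = merge(sources)   # combined seamless array + its transform

profile = sources[0].profile.copy()
profile.update(height=mosaic.shape[1], width=mosaic.shape[2],
               transform=out_transform)

with rasterio.open("mosaic.tif", "w", **profile) as dst:
    dst.write(mosaic)
```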

Raster Catalog

Raster catalogs are like containers, specifically designed for large raster datasets. But
they’re more like pointers because they just reference existing raster datasets. They don’t
actually store the large files within.

They also have the advantage of building footprints, which are extents for each dataset.
Then, it’s possible to export these footprints to display a vector coverage of all available
imagery. If you want to produce a mosaic, you can use the tool – Raster Catalog to Raster
Dataset. This is how to get into a usable GIS format like GeoTIFF, JPEG, or IMG files.


Common Errors and Issues for Mosaic Raster

Incorrect raster inputs: A common error when mosaicking comes from inputs that are not
consistent.

➢ Input raster datasets must have the same bit depth (such as 8-bit images).
➢ They also must have the same number of bands. This means you can’t mosaic a
single-band raster with a 3-band raster.

Black image border: Another common error is having a black surrounding border around
your image. This is especially common for drone images and after georeferencing raster
datasets.

If it’s pure black, a quick fix is to set the transparent values in your symbology to (0, 0, 0).
But the problem with this method is that anywhere there is pure black in your image, it will
become transparent too. Also, it won’t work if it’s slightly different from pure black.

The best way to remove the black area in your image is to build footprints in a raster
catalog. You can manually adjust each footprint and then set the property to clip each
footprint. Alternatively, you can export the footprints and use the Raster Clip Tool.

Clip: In GIS, to clip is to overlay a polygon on one or more target features (layers) and
extract from the target feature (or features) only the target feature data that lies within the
area outlined by the clip polygon. In other words, the boundaries of the second polygon
are imposed on the first polygon.


Clip Tool

The Clip Tool cuts out an input layer to a defined feature boundary. Like a cookie-cutter,
the output is a new clipped output. The clipping layer must be a polygon. But the input
layer can be points, lines, or polygons. The Clip Tool and Intersect Tool achieve similar
results, but the main difference is that the Clip Tool only retains the attributes from the
input layer, whereas the Intersect Tool preserves attributes from both input tables.
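A minimal vector clip sketch with GeoPandas (cookie-cutter behaviour, keeping only the input layer's attributes); the point and polygon layers are invented:

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Input layer (points) and a polygon clip layer -- hypothetical data.
points = gpd.GeoDataFrame({"id": [1, 2, 3]},
                          geometry=[Point(1, 1), Point(5, 5), Point(9, 9)])
clip_poly = gpd.GeoDataFrame(geometry=[box(0, 0, 6, 6)])

# Keep only the input features that fall inside the clip polygon; only the
# input layer's attributes are retained in the output.
clipped = gpd.clip(points, clip_poly)
print(clipped)   # rows for id 1 and 2 only
```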

The Clip Tool is not just for vector

➢ You can also clip raster data, using a polygon, graphic, or even a data frame.
➢ If you need help, we have a tutorial on how to clip rasters with two different
techniques.

Some of the advantages are:

➢ Save time by working with a subset of data.
➢ Perform area and summary statistics for your area of interest.
➢ Improve cartographic output by clipping imagery to a specific extent.

Common errors and troubleshooting:

It’s always a good idea to visually spot-check a few areas after running the Clip Tool. If
your output isn’t what you expected, these are some of the common errors and how to fix
them.


Missing Output: If data is missing in the output, it’s often because you had a record
selected before running the process. In this case, you will have to clear the selection and
run the tool again.

Same Projection: When the output is shifted, this can occur from having datasets in two
different projections. Try projecting your data and running the tool again.

Repair Geometry: If no records are in the output, you can try to repair the geometry
beforehand. I’ve also seen exporting to a shapefile fix this error.

We also have a range of troubleshooting specific to 999999 errors in ArcGIS, which is
Esri’s generic error code.

Distance Measurement

Distance is measured between two points. The two points used when measuring distance
on a raster (or grid) are the centers of two specific cells. That is, in the case of the raster,
the measurements are made from cell center to cell center.

For example, if the cell size is 50 and a measurement spans all or parts of five cells, the
distance shown would be 200 (four cell-center-to-cell-center steps of 50 each).

For another example, if you had a raster that had cells of size 10 and 1000 columns, the
longest east-west distance measurement would be 9990.

The most fundamental “distance raster” is one in which there is a single “source” cell and
all other cells indicate the distance from that cell.

Measurements are made from cell center to cell center, regardless of intervening cells or
the angle of the line connecting the two cells of interest. The distance is calculated by the
law of Pythagoras, who, despite having lived about 200 years before Euclid, determined
that the hypotenuse of a right triangle is the square root of the sum of the squares of the
other two sides, i.e. hypotenuse = √(base² + height²), or c = √(a² + b²).
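A tiny worked example of that calculation between two hypothetical cell centres:

```python
import math

# Cell-centre coordinates of two raster cells, in map units (illustrative).
x1, y1 = 125.0, 75.0
x2, y2 = 275.0, 275.0

distance = math.hypot(x2 - x1, y2 - y1)   # sqrt((x2 - x1)**2 + (y2 - y1)**2)
print(distance)                           # 250.0
```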


Distance raster analysis performs analysis on the distance between each cell and the
source cells and shows the spatial relationship between them. Both the raster surface
distance and the cost can be taken into consideration. Distance analysis can be used to
obtain much useful information and to guide resource management and planning, such
as finding the distance from an earthquake region to the nearest hospital, or estimating
the service area of a chain supermarket. The distance raster analysis functionality
provided by the application includes generating a distance raster, calculating the shortest
path, and calculating the shortest path between two points.

1. Generate distance raster


a. Distance raster
b. Direction raster
c. Allocation raster
2. Calculate shortest path
3. Calculate the shortest path between two points
4. Raster cost distance

Generate distance raster

Generate distance raster is used to calculate the distance between each cell and the
source data in a raster dataset. The result can be used to resolve three problems:
I. The distance between each cell and the nearest source data, for example, the
distance to the nearest school.
II. The direction between each cell and the nearest source data, for example, the
direction to the nearest school.
III. The cells to allocate to the source data according to the spatial distribution, such
as the locations of the several nearest schools.
Three kinds of datasets are produced: a distance raster, a direction raster and an
allocation raster, as shown in the figure.


Distance raster

Distance rasters include straight-line distance rasters and cost distance rasters.

1. Straight-line distance raster

The value of a straight-line distance raster represents the Euclidean distance (straight-line
distance) between a cell and its nearest source. The cost of the straight-line distance can
be considered simply as the distance itself; it is the simplest cost. A straight-line distance
raster does not consider cost, that is, the path has no barriers or every cell has the same
cost. The source data used to create a straight-line distance raster can be vector data
(point, line, region) or raster data. The results include a straight-line distance raster
dataset, a straight-line direction raster dataset and an allocation raster dataset.

A straight-line distance raster is the result of calculating the distance from each cell to
the closest source. If the coordinates of point A are (x1, y1) and those of point B are
(x2, y2), the Euclidean distance between the two points is:

d = √((x2 − x1)² + (y2 − y1)²)
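A sketch of generating a straight-line distance raster with SciPy's Euclidean distance transform; the source layout and the cell size are assumptions:

```python
import numpy as np
from scipy import ndimage

# 1 = ordinary cell, 0 = source cell (e.g. a school) -- hypothetical layout.
grid = np.ones((5, 5))
grid[1, 1] = 0
grid[3, 4] = 0

cell_size = 10.0   # metres

# Straight-line (Euclidean) distance from every cell centre to its nearest source;
# the returned indices point at that nearest source, which yields the allocation.
distances, indices = ndimage.distance_transform_edt(
    grid, sampling=cell_size, return_indices=True)
```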

2. Cost distance

The value of a cost distance raster represents the cost between a cell and its nearest
source (it can include all kinds of cost factors, or a weighting of all cost factors of interest).
For example, the distance traveled when crossing a mountain may be small, but the time
cost may be larger than bypassing it. On the other hand, there are different kinds of ground
cover, and generally it is impossible to reach the source along a straight-line path; you
must make a detour to avoid barriers such as rivers and mountains. The cost distance can
therefore be viewed as an extension of the straight-line distance. Since the distance raster
records the distance between each cell and the nearest source, it can be used for site
selection analysis with a condition such as "the distance between the point and the
nearest resource point is less than a certain value".


Direction raster

A direction raster represents the azimuth direction from each raster cell to its nearest
source. There are two kinds: straight-line direction rasters and cost direction rasters. The
value of a straight-line direction raster represents the azimuth from a cell to its nearest
source, in degrees. North is 0 degrees, values increase clockwise, and the range is 0–360
degrees. For example, if the nearest source is to the east of the cell, the value of the cell
is 90 degrees.

The value of a cost direction raster represents the direction from a cell to its nearest
source along the least-cost path. As shown below, for the raster data shown in figure 1,
the least-cost path from each cell to the source (represented by a small red flag) is
identified with arrows; using the direction values shown in figure 2, the direction values
for the cells in figure 1 are given in figure 3, so figure 3 is the cost direction raster of the
raster data in figure 1. The value of the source cell in the cost direction raster dataset is 0.
However, the cells in the direction dataset with the value 0 are not all sources; for example,
if a cell has no value in the input raster dataset, its value will also be 0 in the output cost
direction dataset.

Allocation raster

An allocation raster allocates spatial resources (raster cells) to different source objects,
for example, representing the service areas of post offices.

Allocation rasters include straight-line allocation rasters and cost allocation rasters. An
allocation raster is also called a service raster; its raster value is the value of the nearest
source, so from the allocation raster you can tell which source is nearest to each cell.


When calculating straight-line distance, the nearest source is determined by the straight-
line distance; when calculating cost distance, the nearest source is determined by the
cost distance between the cell and the source.

The figure below shows the raster result of generating a cost distance raster; in this
analysis, the source data is a point dataset and the DEM data is used as the cost data.

Calculate shortest path

Shortest path analysis calculates the shortest path to the nearest source based on the
target point data and the distance/direction rasters created with the generate distance
raster function. For example, the shortest path from a point in the suburbs to the nearest
market (the target data).

For example, consider analyzing how to get to the nearest shopping mall (a point dataset)
from each residential plot (an area dataset). First, the shopping malls are taken as the
source, and the cost distance raster and cost direction raster are generated. The shortest path


analysis is then carried out on the basis of the generated cost distance raster and cost
direction raster, and the shortest path from each residential area to the nearest shopping
mall (the source) is obtained.

There are three modes for calculating the shortest path:

1. Cell path: A path is generated for each grid cell, connecting that cell with the closest
source. As shown in the following figure, the red dot is used as the source and the black
box polygon is used as the target. The shortest path of the raster is analyzed and the
shortest path represented by the blue cell is obtained.
Figure: Cell path analysis

2. Zone path: A path is generated for each cell zone. A cell zone is composed of
contiguous cells with equal values. A path for a target zone is the least-cost path from the
zone to the closest source. As shown in the following figure, the red dot is used as the
source and the black box polygon is used as the target. The shortest path of the raster is
analyzed and the shortest path represented by the blue cell is obtained.

3. Single path: Only one path is generated for all grid cells.
This path is the one with the least cost among all the paths
connecting the entire target area dataset. As shown in the
following figure, the red dot is used as the source and the
black box polygon is used as the target. The shortest path
of the raster is analyzed and the shortest path represented
by the blue cell is obtained.


Calculate the shortest path between two points

This calculates the shortest path between the source point and the target point. It can
calculate the shortest surface path, the least cost path, and the least cost path considering
the surface distance.

Cost distance

A cost raster needs to be specified when calculating a least cost path. The cost raster
specifies the cost of passing through each cell: the value of a cell represents the cost of
passing one unit of distance within that cell. For example, if a cost raster represents the
cost for a car in different ground environments, the value of a cell represents the
resistance value for travelling 1 kilometer within the cell, so the total cost for the car to
pass through the cell is the cost value (cell value) multiplied by the size of the cell. The
unit of the cost raster can be any unit, such as length, time or price, or the cost can have
no unit, such as slope, exposure and land use after reclassification. Usually there may be
many factors that influence the cost; for example, in the planning of a new road, the
factors that influence the cost may include the total length, the land use of the area passed
through, the slope, the distance to population concentration areas, and so on. The factors
need to be weighted to obtain a comprehensive weight as the cost data. Note that the
cost cannot be a negative value.
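A sketch of a least-cost path over a cost raster using scikit-image's route_through_array; the cost values and the start/end cells are invented:

```python
import numpy as np
from skimage.graph import route_through_array

# Cost raster: each value is the cost of passing through that cell (any
# non-negative unit -- time, money, a reclassified slope class, ...).
cost = np.array([[1, 1, 5, 5],
                 [5, 1, 5, 5],
                 [5, 1, 1, 1],
                 [5, 5, 5, 1]])

start = (0, 0)   # (row, column) of the source cell
end = (3, 3)     # (row, column) of the target cell

path, total_cost = route_through_array(cost, start, end, fully_connected=True)
print(path)         # list of (row, col) cells along the least-cost path
print(total_cost)   # accumulated cost of that path
```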

5.3 Spatial Interpolation Techniques

Interpolation is the process of estimating the value of a variable at unsampled locations
using a set of samples of known location and value. It is a process of creating a surface
based on values at isolated sample points. Sample points are locations where we collect
data on some phenomenon and record the spatial coordinates. We use mathematical
estimation to “guess at” what the values are “in between” those points. We can create
either a raster or vector interpolated surface. Interpolation is used because field data are
expensive to collect, and can’t be collected everywhere.

In other words, it is the procedure of estimating the values of properties at unsampled
sites within an area covered by existing observations. It predicts values for cells in a raster


from a limited number of sample data points. It can be used to predict unknown values
for any geographic point data: elevation, rainfall, temperature, chemical dispersion, noise
level, or other spatially based phenomena. Interpolation is commonly a raster operation,
but it can also be done in a vector environment using a TIN surface model. There are
several well-known interpolation techniques, including spline and kriging.

Kriging is a powerful type of spatial interpolation that uses complex mathematical
formulas to estimate values at unknown points based on the values at known points.
Spatial interpolation techniques can be divided into two main categories: deterministic
and geostatistical approaches. To put it simply, deterministic methods do not try to
capture the spatial structure in the data. They only make use of predefined mathematical
equations to predict values at unsampled locations (by weighting the attribute values of
samples with known locations). In contrast, geostatistical approaches intend to fit a spatial
model to the data. This makes it possible to generate a prediction value at unsampled
locations (like deterministic methods) and to provide users with an estimate of the
accuracy of this prediction. Deterministic methods include TIN (triangular irregular
networks), IDW (inverse distance weighted) and trend surface analysis techniques.
Geostatistical approaches include kriging and its variants.

➢ Interpolation uses vector points with known values to estimate values at unknown
locations to create a raster surface covering an entire area.
➢ The interpolation result is typically a raster layer.
➢ It is important to find a suitable interpolation method to optimally estimate values
for unknown locations.

Because of high cost and limited resources, data collection is usually conducted only in
a limited number of selected point locations. In GIS, spatial interpolation of these points
can be applied to create a raster surface with estimates made for all raster cells.

Spatial Interpolation May Be Used In GIS:

➢ To provide contours for displaying data graphically.
➢ To calculate some property of the surface at a given point.


➢ To change the unit of comparison when using different data structures in different
layers.
➢ Interpolation is frequently used as an aid in the spatial decision-making process, both
in physical and human geography and in related disciplines such as mineral
prospecting and hydrocarbon exploration.

Classification of Interpolation

Global Methods

➢ A single mathematical function is applied to all points.
➢ Tends to produce smooth surfaces.

Local Methods

➢ A single mathematical function is applied repeatedly to subsets of the total observed
points.
➢ The regional surfaces are then linked into a composite surface.

Exact methods

➢ Honor all data points such that the resulting surface passes exactly through all
data points.
➢ Appropriate for use with accurate data.

Approximate methods

➢ Do not honor all data points.
➢ More appropriate when there is a high degree of uncertainty about the data points.

Spatial analysis is the process of manipulating spatial information to extract new
information and meaning from the original data. Usually, spatial analysis is carried out
with a Geographic Information System (GIS). A GIS usually provides spatial analysis tools
for calculating feature statistics and carrying out geoprocessing activities such as data
interpolation. In hydrology, users will likely emphasize the importance of terrain analysis
and hydrological modelling (modelling the movement of water over and in the earth). In
wildlife management, users are interested in analytical functions dealing with wildlife point
locations and their relationship to the environment. Each user will have different interests
depending on the kind of work they do.

Spatial interpolation Methods

Spatial interpolation is the process of using points with known values to estimate values
at other, unknown points. For example, to make a precipitation (rainfall) map for your
country, you will not find enough evenly spread weather stations to cover the entire region.
Spatial interpolation can estimate the values at locations without recorded data by using
known readings at nearby weather stations (see the temperature map figure below). This
type of interpolated surface is often called a statistical surface. Elevation, precipitation,
snow accumulation, water table and population density are other types of data that can
be computed using interpolation.

Figure Temperature Map 1:

Temperature map interpolated from South African Weather Stations.


In order to generate a continuous map, for example, a digital elevation map from elevation
points measured with a GPS device, a suitable interpolation method has to be used to
optimally estimate the values at those locations where no samples or measurements were
taken. The results of the interpolation analysis can then be used for analyses that cover
the whole area and for modelling.

There are many interpolation methods.

❖ Thiessen polygons
❖ Triangulated irregular networks (TINs)
❖ Spatial moving average
❖ Trend surfaces
❖ Kriging
❖ Inverse distance weighting (IDW)

Inverse Distance Weighted (IDW)

In the IDW interpolation method, the sample points are weighted during interpolation such
that the influence of one point relative to another declines with distance from the unknown
point whose value you want to estimate (see the IDW interpolation figure below).

Figure IDW Interpolation 1:

Inverse Distance Weighted interpolation based on weighted sample point distance (left).
Interpolated IDW surface from elevation vector points (right). Image Source: Mitas, L.,
Mitasova, H. (1999).


Weighting is assigned to sample points through a weighting coefficient that controls how
the weighting influence drops off as the distance from the new point increases. The
greater the weighting coefficient, the less effect points have when they are far from the
unknown point during the interpolation process. As the coefficient increases, the value of
the unknown point approaches the value of the nearest observational point.

It is important to note that the IDW interpolation method also has some disadvantages:
the quality of the interpolation result can decrease if the distribution of sample data points
is uneven.
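The following is a minimal Python/NumPy sketch of the IDW idea described above, estimating a value at one unknown location; the sample coordinates, values and the power (weighting coefficient) are illustrative.

```python
import numpy as np

def idw(xy_known, z_known, xy_unknown, power=2.0):
    """Inverse Distance Weighted estimate at one or more unknown locations."""
    xy_known = np.asarray(xy_known, dtype=float)
    z_known = np.asarray(z_known, dtype=float)
    xy_unknown = np.atleast_2d(np.asarray(xy_unknown, dtype=float))

    estimates = []
    for p in xy_unknown:
        d = np.sqrt(((xy_known - p) ** 2).sum(axis=1))
        if np.any(d == 0):                      # exact method: honour coincident samples
            estimates.append(z_known[d == 0][0])
            continue
        w = 1.0 / d ** power                    # weights fall off with distance
        estimates.append((w * z_known).sum() / w.sum())
    return np.array(estimates)

# Illustrative sample points (x, y) with elevation values.
samples = [(0, 0), (10, 0), (0, 10), (10, 10)]
values = [100.0, 105.0, 98.0, 110.0]
print(idw(samples, values, (4, 6), power=2))
```

Increasing the power parameter gives the nearest samples progressively more influence, as described above.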

Triangulated Irregular Network (TIN)

TIN interpolation is another popular tool in GIS. A common TIN algorithm is called
Delaunay triangulation (the Delaunay triangulation ensures that no vertex lies within the
interior of any of the circumcircles of the triangles in the network; if the Delaunay criterion
is satisfied everywhere on the TIN, the minimum interior angle of all triangles is
maximized). It tries to create a surface formed by triangles of nearest neighbor points.
To do this, circumcircles around selected sample points are created and their
intersections are connected into a network of non-overlapping triangles that are as
compact as possible.

Figure TIN Interpolation 1:

Delaunay triangulation with circumcircles around the red sample data. The resulting
interpolated TIN surface created from elevation vector points is shown on the right. Image
Source: Mitas, L., Mitasova, H. (1999).

The main disadvantage of the TIN interpolation is that the surfaces are not smooth and
may give a jagged appearance. This is caused by discontinuous slopes at the triangle
edges and sample data points. In addition, triangulation is generally not suitable for
extrapolation beyond the area with collected sample data points.
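For illustration, the sketch below builds a Delaunay-based TIN and interpolates linearly within its triangles using SciPy; the sample points and elevations are made up, and scipy.spatial.Delaunay and scipy.interpolate.LinearNDInterpolator are assumed to be available.

```python
import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

# Illustrative elevation samples (x, y) and values.
points = np.array([(0, 0), (10, 0), (0, 10), (10, 10), (5, 4)], dtype=float)
elev = np.array([100.0, 105.0, 98.0, 110.0, 102.0])

tri = Delaunay(points)                 # the TIN itself (triangle vertex indices)
print(tri.simplices)

# Linear interpolation within each triangle; returns NaN outside the convex hull,
# reflecting the note above that TINs are not suited to extrapolation.
interp = LinearNDInterpolator(points, elev)
print(interp(4.0, 6.0))
```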

Thiessen polygon

In order to achieve an accurate estimate of the spatial distribution of rainfall, it is necessary
to use interpolation methods; among these, the Thiessen method is considered one of the
most important in engineering practice. This method assigns a weight to each gauge
station in proportion to the catchment area that is closest to that gauge:

Average rainfall = Σ (representative area of each gauge × rainfall at that gauge) / total catchment area


This method is better than a simple arithmetic average because it also takes into account
rain gauges that lie outside the catchment basin and weights each gauge by the area it
represents. Constructing the polygons involves the following steps:

1. The gauge network is plotted on a map of the catchment area of interest.
2. Adjacent stations are connected with lines.
3. Perpendicular bisectors of each line are constructed (a perpendicular line at the
midpoint of each line connecting two stations).
4. The bisectors are extended and used to form the polygon around each gauge
station.
5. The rainfall value for each gauge station is multiplied by the area of its polygon.
6. All values from step 5 are summed and divided by the total basin area.

An example of the spatial precipitation distribution according to the Thiessen method can
be seen in the following figure.
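Complementing the figure, the sketch below applies the raster-style equivalent of the steps above: every grid cell is assigned to its nearest gauge (a Thiessen polygon on a grid), the cell counts give each gauge's representative area, and the catchment average follows the formula given earlier. The gauge coordinates, rainfall values and cell size are illustrative, and SciPy's cKDTree is assumed to be available.

```python
import numpy as np
from scipy.spatial import cKDTree

# Illustrative gauge locations (x, y) and measured rainfall (mm).
gauges = np.array([(2.0, 3.0), (8.0, 2.0), (5.0, 8.0)])
rain = np.array([55.0, 40.0, 72.0])

# Regular grid of cell centres covering the catchment (cell size = 1 unit).
xs, ys = np.meshgrid(np.arange(0.5, 10.5, 1.0), np.arange(0.5, 10.5, 1.0))
cells = np.column_stack([xs.ravel(), ys.ravel()])

# Each cell belongs to the Thiessen polygon of its nearest gauge.
_, nearest = cKDTree(gauges).query(cells)

# Representative area of each gauge = number of cells it captures (cell area = 1).
areas = np.bincount(nearest, minlength=len(gauges)).astype(float)

# Average rainfall = sum(representative area * gauge rainfall) / total catchment area
avg_rain = (areas * rain).sum() / areas.sum()
print(areas, avg_rain)
```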


Kriging

Kriging is the most commonly used geostatistical approach for spatial interpolation.
Kriging techniques rely on a spatial model between observations to predict attribute
values at unsampled locations. One of the distinctive features of kriging methods is that
they do not only consider the distance between observations; they also try to capture the
spatial structure in the data by comparing pairs of observations separated by specific
spatial distances. The objective is to understand the relationships between observations
separated by different lag distances. All this knowledge is captured in the variogram.
Kriging methods then derive spatial weights for the observations from this variogram.
Note that kriging techniques preserve the values of the initial samples in the interpolated
map.

Kriging methods consider that the process that generated the data can be divided into
two major components: a deterministic trend (large-scale variations) and an
autocorrelated error (the remainder):

Z(s) = m + e(s)

where Z(s) is the attribute value at the spatial position s, m is the deterministic trend that
does not depend on the location s of the observations, and e(s) is the autocorrelated error
term (which depends on the spatial position s).
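To make the kriging workflow concrete, below is a compact, illustrative ordinary-kriging sketch in Python/NumPy. It is a minimal sketch, not a production implementation: the exponential semi-variogram parameters (nugget, sill, range) are simply assumed here, whereas in practice they would come from fitting the experimental variogram, and the sample coordinates and values are made up.

```python
import numpy as np

def gamma(h, nugget=0.0, sill=10.0, rng=5.0):
    """Assumed exponential semi-variogram model."""
    return nugget + sill * (1.0 - np.exp(-h / rng))

def ordinary_kriging(xy, z, x0):
    xy, z, x0 = np.asarray(xy, float), np.asarray(z, float), np.asarray(x0, float)
    n = len(z)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(-1))  # pairwise distances

    # Kriging system: semi-variances between samples, plus the unbiasedness constraint.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)
    A[n, n] = 0.0

    b = np.ones(n + 1)
    b[:n] = gamma(np.sqrt(((xy - x0) ** 2).sum(-1)))

    w = np.linalg.solve(A, b)          # spatial weights + Lagrange multiplier
    estimate = w[:n] @ z
    variance = w @ b                   # kriging variance (basis of the standard error)
    return estimate, variance

samples = [(0, 0), (10, 0), (0, 10), (10, 10)]
values = [100.0, 105.0, 98.0, 110.0]
print(ordinary_kriging(samples, values, (4, 6)))
```

The last row and column of the matrix enforce that the weights sum to one, and the returned kriging variance is what a standard error surface (discussed later) is built from.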

Trend Surface Analysis

Trend surface analysis intends to fit a trend to the data. In other words, the objective is to
find the equation that best matches the attribute values of the available data. Trend
surface analysis often involves multiple linear regression or polynomial functions to find
a relationship between the predicted variable and a combination of the spatial
coordinates.

This approach is a global technique because it considers all the available data when
fitting the linear regressions or polynomial functions. More local approaches can be
implemented to improve the prediction accuracy. Note that trend surface analysis can
suffer from outlying values: an abnormal observation is likely to severely influence
the fitted model and might lead to a very poor fit to the data. To make trend surface
analysis more reliable, the selected regression or polynomial function needs to be
calibrated and then validated: a subset of the samples should be used to fit a specific
model, and an independent set of samples (or at least the remaining samples in the initial
set) should be used to validate the proposed model. The use of both training and
validation datasets is fundamental to evaluating the reliability and accuracy of the
proposed model.
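The sketch below fits a first-order (planar) trend surface by ordinary least squares; the sample coordinates and values are illustrative, and higher-order surfaces would simply add polynomial terms (x², xy, y², and so on) as extra columns of the design matrix.

```python
import numpy as np

# Illustrative samples: coordinates and attribute values.
x = np.array([0.0, 10.0, 0.0, 10.0, 5.0])
y = np.array([0.0, 0.0, 10.0, 10.0, 5.0])
z = np.array([100.0, 105.0, 98.0, 110.0, 103.0])

# Design matrix for z = a0 + a1*x + a2*y (first-order trend surface).
A = np.column_stack([np.ones_like(x), x, y])
coeffs, residuals, rank, _ = np.linalg.lstsq(A, z, rcond=None)

# Predict the trend value at an unsampled location.
x0, y0 = 4.0, 6.0
z0 = coeffs @ np.array([1.0, x0, y0])
print(coeffs, z0)
```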

Common problems / things to be aware of

It is important to remember that there is no single interpolation method that can be applied
to all situations. Some are more exact and useful than others but take longer to calculate.
They all have advantages and disadvantages. In practice, selection of a particular
interpolation method should depend upon the sample data, the type of surfaces to be
generated and tolerance of estimation errors. Generally, a three-step procedure is
recommended:

1. Evaluate the sample data. Do this to get an idea of how the data are distributed in the
area, as this may provide hints on which interpolation method to use.
2. Apply the interpolation method that is most suitable to both the sample data and
the study objectives. When in doubt, try several methods, if available.
3. Compare the results and find the most suitable method (see the sketch below). This
may look like a time-consuming process at the beginning. However, as you gain
experience and knowledge of different interpolation methods, the time required for
generating the most suitable surface will be greatly reduced.
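As an illustration of step 3, the sketch below runs a leave-one-out comparison for IDW with different power settings and reports the RMSE of the withheld predictions; the sample data are made up, and the same loop could just as well compare entirely different interpolation methods.

```python
import numpy as np

def idw_estimate(xy, z, x0, power):
    """IDW estimate at a single location from the supplied samples."""
    d = np.sqrt(((xy - x0) ** 2).sum(axis=1))
    if np.any(d == 0):
        return z[d == 0][0]
    w = 1.0 / d ** power
    return (w * z).sum() / w.sum()

def loo_rmse(xy, z, power):
    """Leave-one-out RMSE for IDW with a given power (weighting coefficient)."""
    errors = []
    for i in range(len(z)):
        keep = np.arange(len(z)) != i
        pred = idw_estimate(xy[keep], z[keep], xy[i], power)
        errors.append(pred - z[i])
    return np.sqrt(np.mean(np.square(errors)))

xy = np.array([(0, 0), (10, 0), (0, 10), (10, 10), (5, 5), (3, 7)], dtype=float)
z = np.array([100.0, 105.0, 98.0, 110.0, 103.0, 101.0])

# Compare candidate settings (or entire methods) and keep the one with the lowest error.
for p in (1.0, 2.0, 3.0):
    print(p, loo_rmse(xy, z, p))
```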

Geo-Statistics

Geo-Statistics is the study of statistics with a focus on spatial and temporal information.
The aim is to model and find patterns of geographic phenomena.

The field of geo-statistics covers a wide range of spatial statistical topics such as:

❖ Semi-variograms to characterize the spatial pattern in the data (a variogram is a
description of the spatial continuity of the data; the experimental variogram is a
discrete function calculated using a measure of variability between pairs of points
at various distances)
❖ Kriging for spatial prediction
❖ Standard error to measure uncertainty about unsampled values

Semi-variograms

Geo-statistics provides descriptive tools like semi-variograms to identify underlying trends
in spatial phenomena. According to Tobler’s First Law of Geography, closer things
are more related than things farther away. This is also the main idea behind the concept
of spatial autocorrelation.

The semi-variogram plots the variability of all pairs of data points against their separation
distance. Observations that are closer together are more highly correlated, but beyond a
certain distance (the range) there is no longer a relationship between the points.

The semi-variogram depicts this relationship until it reaches the sill, beyond which further
samples are no longer correlated. The purpose is to fit a mathematical function that
models the trend in your semi-variogram.
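A minimal sketch of an experimental semi-variogram computed from a handful of made-up samples is shown below; the lag (bin) width of 2 distance units is an arbitrary assumption, and SciPy's pdist is used to form all sample pairs.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Illustrative samples: coordinates and attribute values.
xy = np.array([(0, 0), (3, 1), (1, 4), (6, 2), (7, 7), (2, 8)], dtype=float)
z = np.array([12.0, 14.0, 11.0, 18.0, 25.0, 13.0])

h = pdist(xy)                                        # pairwise separation distances
g = 0.5 * pdist(z[:, None], metric="sqeuclidean")    # semi-variance of each pair

# Average the semi-variances within distance bins (lags) of width 2 units.
bins = np.arange(0.0, h.max() + 2.0, 2.0)
lag = np.digitize(h, bins)
for k in np.unique(lag):
    mask = lag == k
    print(f"lag centre ~{bins[k - 1] + 1.0:.1f}: gamma = {g[mask].mean():.2f} (pairs = {mask.sum()})")
```

A variogram model (spherical, exponential, Gaussian, etc.) would then be fitted to these binned values and used by kriging.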


Kriging Interpolation

Kriging is an interpolation technique that leverages the spatial correlation between
samples to predict values at unsampled locations. Its main difference from simpler
distance-based methods is that the prediction is built using the mathematical function
fitted to the semi-variogram.

Here are the different types of kriging available in geo-statistics:

Co-Kriging adds a second related variable so you can improve the prediction with
secondary information. For example, to predict precipitation change in mountainous
areas, you can add elevation data as a covariate to rainfall amounts.

Empirical Bayesian Kriging (EBK) treats local variance separately. Instead of assuming a
single variance across the whole extent, EBK performs kriging as a separate underlying
process in different areas; it still performs kriging, but does so locally.

Universal Kriging combines ordinary kriging with trend surface analysis, accounting for a
trend (or drift) in the data.

Indicator Kriging carries through ordinary kriging with binary data (0 and 1) such as
urban and non-urban cells.

Probability Kriging uses binary data (similar to indicator kriging) and estimates unknown
points for a series of cutoffs.

This example shows the spatial prediction model from kriging.


Standard Error

Geo-statistics is advantageous because it assesses the uncertainty of unsampled values
with a standard error surface map. A standard error map represents a measure of
confidence in how likely the prediction is to be true.

Standard error assesses the robustness of your kriging model. By comparing actual
versus predicted values, it assesses uncertainty by building a surface of residuals.

In general, you get a higher standard error when you have only a sparse number of
observations. When the error exceeds a critical threshold, expert knowledge can
contribute to the construction of the variogram.

Applications and Uses


Geo-statistics was originally developed for the mining industry to estimate and manage
ore and mineral resources. But geo-statistics applies to various types of spatial
phenomena with local variation. For example, we use it to:

❖ Predict weather, climate, pollution, and other atmospheric phenomena.
❖ Assess soil attributes and chemistry, which vary at all scales.
❖ Measure the abundance of fish for a sustainable population in fisheries.


Geo-statistics is an emerging field of study in engineering, geophysics, and the analysis
of most natural phenomena.

GIS modeling

A GIS is a tool that can process, display, and integrate different data sources including
maps, digital elevation models (DEMs), GPS (global positioning system) data, images,
and tables. A GIS can be used to build a vector-based or raster-based model.

Modelling is a simplified version of a concept: a simplified representation of a
phenomenon or a complex system in simple, understandable terms. A model is a
graphical, mathematical, physical, or verbal representation of a concept, phenomenon,
relationship, structure, system, or an aspect of the real world. Therefore, modelling can
be said to be a representation of reality in either physical or symbolic form. A model may
have the following objectives:

a. To facilitate understanding by eliminating unnecessary components,
b. To aid in decision making by simulating 'what if' scenarios,
c. To explain, control, and predict events on the basis of past observations.

Since most phenomena are very complicated and complex, a model contains only those
features that are of primary importance to the model maker's purpose.

Modeling is a set of rules and procedures for representing a phenomenon or predicting
an outcome, applied to a data representation of reality such as the vector data model.
When we perform geoprocessing tasks on our data, we are developing the components
of a GIS model.

Static and Dynamic

Static modeling is the series of steps required to achieve some final result. Dynamic
modeling is performed in a similar fashion, but has additional parameters that require
several iterations of the model.

A model is a simplified representation of a phenomenon or a system. Models using
geographically referenced data are usually called “spatially explicit models.”


Classifications of GIS Models

❖ A model may be descriptive or prescriptive.
❖ A model may be deterministic or stochastic.
❖ A model may be static or dynamic.
❖ A model may be deductive or inductive.

The Modeling Process

1. The first step is to define the goals of the model.
2. The second step is to break down the model into elements and to define the
properties of each element and the interactions between the elements. A flowchart
is a useful tool for linking the elements.
3. The third step is the implementation and calibration of the model.
4. The fourth step is to validate the model before it can be generally accepted.

The Role of GIS in Modeling

1. A GIS is a tool that can process, display and integrate different data sources
including maps, digital elevation models, GPS data, images and tables.
2. A GIS can be used to build a vector-based or raster-based model.
3. A GIS has algorithms for conversion between vector and raster data.
4. The process of modeling may take place in a GIS or use a GIS to link other
computer programs.

General types of Models

❖ Structural Model
Structural model focuses on the composition and construction of things. There are
two types of structural models:
▪ Object Model: This type of model forms a visual representation of an item.
Characteristics include a scaled, 2- or 3-dimensional, symbolic representation.
For example: an architect's blueprint of a building.


▪ Action Model: It tracks the space/time relationships of items. Characteristics
include change detection, transition statistics and animation. For example: a
model train along its track.
❖ Relational Model
Relational model focuses on the interdependence and relationships among
factors. There are two types of Relational models:
▪ Functional Model: This model is based on Input / Output method. It tracks
relationships among variables, such as storm runoff prediction. Characteristics
include cause/effect linkages and sensitivity analysis.
▪ Conceptual Model: It is perception-based. It incorporates both fact
interpretation and value weights, such as suitability for outdoor recreation.
Characteristics include heuristics (expert rules) and scenarios.

Types of GIS Model

Binary Models

➢ A binary model uses logical expressions to select spatial features from a composite
feature layer or multiple rasters. The output of a binary model is in binary format:
1 (true) for spatial features that meet the selection criteria and 0 (false) for features
that do not.
➢ Siting analysis is probably the most common application of the binary model.
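For illustration, a minimal raster-based sketch of a binary siting model is given below; the criterion rasters, class codes and thresholds are all hypothetical.

```python
import numpy as np

# Hypothetical, already co-registered criterion rasters.
slope = np.array([[3, 12, 25], [8, 5, 30], [2, 9, 15]], dtype=float)       # percent
landuse = np.array([[1, 1, 2], [3, 1, 2], [1, 3, 1]])                      # class codes
dist_road = np.array([[120, 800, 300], [450, 200, 900], [90, 600, 150]])   # metres

# Selection criteria: gentle slope, suitable land-use class, close to a road.
suitable = (slope < 10) & np.isin(landuse, [1, 3]) & (dist_road < 500)

# Binary output: 1 (true) where every criterion is met, 0 (false) otherwise.
print(suitable.astype(np.uint8))
```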

Index Models

➢ An index model calculates the index value for each unit area and produces a
ranked map based on the index values.
➢ An index model is similar to a binary model in that both involve multicriteria
evaluation and both depend on overlay operations for data processing, but an
index model produces an index value for each unit area rather than a simple yes
or no.

The Weighted Linear Combination Method


➢ The weighted linear combination method is a common method for computing the
index value.
➢ The method involves evaluation at three levels. First, the relative importance of
each criterion, or factor, is evaluated against other criteria. Second, data for each
criterion are standardized. Third, the index value is calculated for each unit area
by summing the weighted criterion values and dividing the sum by the total of the
weights.
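A minimal sketch of these three levels is shown below; the two criterion rasters, their weights and the simple 0–1 standardization are illustrative assumptions.

```python
import numpy as np

def standardize(raster):
    """Rescale a criterion raster linearly to the 0-1 range."""
    r = raster.astype(float)
    return (r - r.min()) / (r.max() - r.min())

# Hypothetical criterion rasters (higher = more suitable after standardization).
soil_quality = np.array([[4, 7, 9], [5, 6, 8], [2, 3, 7]])
road_access = np.array([[900, 300, 120], [700, 250, 90], [1000, 600, 200]])

# Level 1: relative importance of each criterion; Level 2: standardize; Level 3: combine.
weights = np.array([0.6, 0.4])
criteria = np.stack([standardize(soil_quality),
                     standardize(-road_access)])   # invert: shorter distance = better

index = (weights[:, None, None] * criteria).sum(axis=0) / weights.sum()
print(index.round(2))   # ranked map of index values between 0 and 1
```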

Regression Models

➢ A regression model relates a dependent variable to a number of independent
(explanatory) variables in an equation, which can then be used for prediction or
estimation.
➢ A regression model can use overlay operations in a GIS to combine the variables
needed for the analysis.
➢ There are two types of regression models: linear regression and logistic
regression.
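For illustration, the sketch below fits a linear regression model from two hypothetical explanatory rasters and applies the fitted equation back to every cell; a logistic regression model would follow the same overlay pattern with a different estimator.

```python
import numpy as np

# Hypothetical explanatory rasters (e.g. elevation and distance to river)
# and an observed dependent raster (e.g. soil moisture), all co-registered.
elevation = np.array([[200., 220., 260.], [210., 240., 280.], [230., 270., 300.]])
dist_river = np.array([[50., 120., 300.], [80., 200., 350.], [150., 280., 400.]])
moisture = np.array([[0.40, 0.33, 0.20], [0.37, 0.27, 0.16], [0.31, 0.21, 0.12]])

# Overlay: stack the flattened rasters into a design matrix with an intercept column.
X = np.column_stack([np.ones(elevation.size),
                     elevation.ravel(),
                     dist_river.ravel()])
y = moisture.ravel()

# Fit y = b0 + b1*elevation + b2*dist_river by least squares.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Apply the fitted equation back to every cell to produce a prediction raster.
predicted = (X @ coeffs).reshape(elevation.shape)
print(coeffs.round(5))
print(predicted.round(3))
```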

Process Models

➢ A process model integrates existing knowledge about the environmental
processes in the real world into a set of relationships and equations for quantifying
the processes.
➢ Environmental models are typically process models because they must deal with
the interaction of many variables including physical variables such as climate,
topography, vegetation, and soils as well as cultural variables such as land
management.

GIS Models

When a Geographical Information System (GIS) is used in the process of building models
with spatial data, it is called GIS modelling. GIS modelling involves the symbolic
representation of locational properties (where?), as well as thematic (what?) and
temporal (when?) attributes describing characteristics and conditions with reference to
space and time. There are two types of GIS model:


❖ Cartographic Model: It is the automation of manual techniques that traditionally
use drafting aids and transparent overlays, such as a map identifying locations of
productive soils and gentle slopes using binary logic expressed as a geo-query.
❖ Spatial Model: A spatial model is an expression of mathematical relationships
among mapped variables, such as a map of crop yield throughout a field based on
relative amounts of phosphorus, potassium, nitrogen and pH using multi-value
logic expressed as variables, parameters and relationships.

Elements of GIS Modelling:

A GIS model must have the following elements:

➢ A set of selected spatial variables
➢ A functional / mathematical relationship between the variables.

A model is related to exploratory data analysis, data visualization and database
management. A GIS model can be vector based or raster based.

5.4 GIS programming and customization: Opening and exploring Model Builder,
Python script tools, Customizing QGIS with Python

Will be updated very soon …

