
CHAPTER 1 INTRODUCTION: OVERVIEW AND GIS SOFTWARE 1-1

1.1 Definitions of GIS ........................................................................... 1-1


1.1.1 GIS as a Toolbox ......................................................................... 1-1
1.1.2 GIS as an Information System ...................................................... 1-1
1.1.3 GIS as an Approach to Science ...................................................... 1-2
1.1.4 GIS is a Multibillion-Dollar Business ............................................... 1-2
1.1.5 GIS Plays a Role in Society ........................................................... 1-2
1.2 Components of GIS ........................................................................ 1-3
1.2.1 Computer Hardware ..................................................................... 1-4
1.2.2 GIS Software .............................................................................. 1-4
1.2.3 Organizational Context ................................................................. 1-5
1.3 Functional Components of GIS ......................................................... 1-5
1.3.1 Data Capture .............................................................................. 1-6
1.3.2 Data Storage and Maintenance ...................................................... 1-7
1.3.3 Data Manipulation and Analysis ..................................................... 1-8
1.3.4 Data Presentation ........................................................................ 1-9
1.4 Functions of GIS .......................................................................... 1-10
1.5 Geographical Features .................................................................. 1-10
1.7 Application of GIS ........................................................................ 1-11
1.8 Importance of GIS ....................................................................... 1-12
1.9 Historical Development of GIS ....................................................... 1-13
1.10 GIS Software ............................................................................... 1-15
CHAPTER 2 GIS AND MAPS ...................................................... 2-1
2.1 Maps and Map Characteristics .......................................................... 2-1
2.2 Cartography vs GIS ........................................................................ 2-3
2.3 Coordinate Systems ....................................................................... 2-4
2.3.1 Geographic Coordinate System ...................................................... 2-4
2.3.2 Shape of the Earth ....................................................................... 2-5
2.3.3 Datum ........................................................................................ 2-7
2.3.4 Projected Coordinate System ........................................................ 2-7
2.4 Map Projections ............................................................................. 2-8
2.4.1 Map Projection Surfaces and Orientation ............................................. 2-9
2.4.2 Cylindrical Projection .................................................................. 2-11
2.4.3 Conical Projection ...................................................................... 2-11
2.4.4 Azimuthal (Planar) Projection ...................................................... 2-11
2.5 Common Map Projections .............................................................. 2-11
2.5.1 Universal Transverse Mercator (UTM) ........................................... 2-11

2.6 Geo-referencing ........................................................................... 2-11
2.7 Accuracy and Precision and Error ................................................... 2-12
CHAPTER 3 SPATIAL DATA MODEL ........................................... 3-1
3.1 Concept of Data Model .................................................................... 3-1
3.2 Raster Data Model.......................................................................... 3-1
3.2.1 Data Compression ....................................................................... 3-3
3.2.2 Indexing and Hierarchical Data Structures ...................................... 3-8
3.3 Vector Data Model .......................................................................... 3-6
3.3.1 Spaghetti Model .......................................................................... 3-7
3.3.2 Topology .................................................................................... 3-8
3.4 TIN Data Model ............................................................................ 3-13
3.4.1 Delaunay Triangulation ............................................................... 3-15
3.4.2 TIN Data Structure .................................................................... 3-15
3.4.3 Advantages and Disadvantages of TIN .......................................... 3-16
3.5 Images ....................................................................................... 3-16
CHAPTER 4 DATA SOURCES ..................................................... 4-1
4.1 Sources of Spatial Data .................................................................. 4-1
4.2 Data Quality .................................................................................. 4-1
4.2.1 Data Quality and Components ....................................................... 4-1
4.2.2 Data Quality Standards ................................................................ 4-2
4.2.3 Sources of Error in Spatial Data ..................................................... 4-3
4.3 Major Data Feeds ........................................................................... 4-3
4.4 Data Formats ................................................................................ 4-3
4.5 Meta Data ..................................................................................... 4-4
4.5.1 Types of Metadata ....................................................................... 4-5
4.5.2 Metadata Standards ..................................................................... 4-5

CHAPTER 5 DATABASE CONCEPTS ........................................... 5-1


5.1 Database Concepts ............................................................................ 5-1
5.2 Database Types ............................................................................. 5-2
5.2.1 Flat File ...................................................................................... 5-2
5.2.2 Hierarchical File Structure ............................................................. 5-3
5.2.3 Networks .................................................................................... 5-3
5.2.4 Relational Database ..................................................................... 5-4
5.2.5 Object Oriented Databases ........................................................... 5-4
5.3 Relational Database ....................................................................... 5-4
5.3.1 Primary Key ................................................................................ 5-5
5.3.2 Foreign Key ................................................................................ 5-6

5.3.3 Normalization .............................................................................. 5-6
5.3.4 Advantages and Disadvantages ..................................................... 5-7
5.4 Databases and GIS ........................................................................ 5-8
5.4.1 External DBMS ............................................................................ 5-8
5.4.2 Spatial Database Functionality ..................................................... 5-10

CHAPTER 6 SPATIAL ANALYSIS ............................................... 6-1


6.1 Measurement Functions .................................................................. 6-1
6.1.1 Measurement on Vector Data ........................................................ 6-1
6.1.2 Measurement on Raster Data ........................................................ 6-2
6.2 Overlay Functions .......................................................................... 6-3
6.2.1 Vector Overlay Functions .............................................................. 6-3
6.2.2 Raster Overlay Functions .............................................................. 6-4
Arithmetic Operators ................................................................................ 6-4
Comparison and Logical Operators ............................................................. 6-5
Conditional Expressions ............................................................................ 6-6
6.3 Classification/Reclassification Function .............................................. 6-7
6.4 Spatial Interpolation ....................................................................... 6-8
CHAPTER 7 SURFACE MODELING ............................................. 7-1
7.1 Digital Elevation Model (DEM) .......................................................... 7-1
7.2 Contouring .................................................................................... 7-1
7.3 Slope............................................................................................ 7-1
7.4 Aspect .......................................................................................... 7-1
7.5 Hillshade ....................................................................................... 7-1
7.6 Viewshed Analysis .......................................................................... 7-1
CHAPTER 8 HYDROLOGY MODELING ........................................ 8-1
8.1 Filled DEM ..................................................................................... 8-1
8.2 Flow Direction ............................................................................... 8-1
8.3 Flow Accumulation ......................................................................... 8-1
8.4 River/Stream Network .................................................................... 8-1
8.5 Watershed Boundary ...................................................................... 8-1
CHAPTER 9 MAKING MAPS ....................................................... 9-1
9.1 Map Functions in GIS...................................................................... 9-1
9.2 Basic Elements of Map .................................................................... 9-1
9.3 Types of Map ................................................................................. 9-2
9.4 Map Design ................................................................................... 9-2
9.5 Printing a Map ............................................................................... 9-3

9.6 Exporting Map ............................................................................... 9-4
REFERENCES .............................................................................. 9-1

CHAPTER 1 INTRODUCTION: OVERVIEW AND GIS SOFTWARE

Introduction and Overview of GIS and Software: [4 hrs]


Definition of GIS; features and functions; why GIS is important; how GIS is
applied; GIS as an Information System; GIS and cartography; contributing and
allied disciplines; GIS data feeds; historical development of GIS.

1.1 Definitions of GIS


GIS is an abbreviated form of Geographic Information System. Apart from this,
Geographical Information System, Geospatial Information System, Geographic
Information Science, etc., are also used as expanded forms of GIS. There is no
universally accepted definition of GIS; many authors have defined it differently
depending on the purpose for which GIS has been used. Some selected definitions
are presented here:
1.1.1 GIS as a Toolbox
Burrough defined GIS as “a powerful set of tools for collecting, storing and
retrieving at will, transforming and displaying spatial data from the real world”
(Burrough, 1986). This definition implies that GIS is a tool for geographic analysis.
This definition is often called the ‘toolbox definition’ of GIS because it stresses a
set of tools each designed to solve specific problems (Clarke, 1995). The toolbox
definition emphasizes the generic aspects of GIS and is frequently used by GIS
vendors who wish to maximize the size of their market (Healey, 1991).
Another definition in this category states that GISs are “automated systems for
the capture, storage, retrieval, analysis, and display of spatial data” (Clarke,
1995).
1.1.2 GIS as an Information System
As the name suggests, GIS is an information system that deals with
geographic information. Star and Estes defined GIS as "an information system
that is designed to work with data referenced by spatial or geographic coordinates.
In other words, a GIS is both a database system with specific capabilities for
spatially-referenced data, as well as a set of operations for working with the data"
(Star and Estes, 1990). With this definition, GIS collects data, sorts them, and
selects and rebuilds them to find precisely the right piece of information to answer
a specific question. The reference to geographic coordinates is an important one,
because the coordinates are literally how we are able to link data with the map
(Clarke, 1995).
Dueker defined GIS as “a special case of information systems where the database
consists of observations on spatially distributed features, activities or events,
which are definable in space as points, lines, or areas. A geographic information
system manipulates data about these points, lines, and areas to retrieve data for
ad hoc queries and analysis” (Dueker, 1978). The phrase “special case of
information systems” implies that GIS has a heritage in information systems
technology. The database itself consists of a set of observations, which implies a
scientific approach to measurements. Scientists take measurements and record
those measurements in some kind of system to help them analyze the data. The
observations are spatially distributed; that is, they occur over space at different
times and at different locations at the same time (Clarke, 1995).
The database approach to defining GIS is probably the most widely used, because of
the influence of database theory and practice on GIS (Healey, 1991).
1.1.3 GIS as an Approach to Science
As a tool or as an information system, GIS technology has changed the entire
approach to spatial data analysis. GIS has already been compared to not one but
several simultaneous revolutionary changes in the way that data can be managed.
The convergence of GIS with allied technologies, those of surveying, remote
sensing, air photography, the global positioning system (GPS), and mobile
computing and communications has fed a spectacular growth of these
technologies.
As a result, the way of doing business – the standard operating procedure of
geographic and spatial information handling – has rapidly restructured itself. First,
the technology of GIS has become much simpler, more distributed, cheaper, and
has crossed the boundary into disciplines such as anthropology, epidemiology,
facilities management, forestry, geology, and business. Second, this mutation has
led to a culling of the body of knowledge that constitutes geography so that it is
suitable for use in these parallel fields as a new approach to science. Goodchild
called this “geographic information science” (Goodchild, 1992).
Goodchild defined geographic information science as “the generic issues that
surround the use of GIS technology, impede its successful implementation, or
emerge from an understanding of its potential capabilities” (Goodchild, 1992). It
involves both research on GIS and research with GIS (Clarke, 1995).
The discipline that deals with all aspects of the handling of spatial data and
geo-information is called geographic information science (often abbreviated to
geo-information science or just GIScience). Geo-information science is the scientific
field that attempts to integrate different disciplines studying the methods and
techniques of handling spatial information (de By and Huisman, 2009).
1.1.4 GIS is a Multibillion-Dollar Business
Groups monitoring the GIS industry estimate the total value of the hardware,
software, and services conducted by the private, governmental, educational, and
other sectors that handle spatial data to be billions of dollars a year. Furthermore,
for the last half decade of the 1990s, and into the current decade, the industry has
seen double-digit annual growth. Anyone who attends a national or international
conference in the field can feel an overwhelming sense of rapid growth,
sophistication, and the sheer magnitude of the transformation that GIS has led
(Clarke, 1995).
“The growth of GIS has been a marketing phenomenon of amazing breadth and
depth and will remain so for many years to come. Clearly, GIS will integrate its
way into our everyday life to such an extent that it will soon be impossible to
imagine how we functioned before” (Clarke, 1995).
1.1.5 GIS Plays a Role in Society
Many people doing research on GIS have argued that defining GIS narrowly, as a
technology, as software, or as a science, ignores the role that GIS plays in
changing the way people live and work. Not only has GIS radically changed how
we do day-to-day business, but also how we operate within human organizations.
Chrisman (1999) has defined GIS as “organized activity by which people measure
and represent geographic phenomena, and then transform these representations
into other forms while interacting with social structures.” This definition has
emerged from an area of GIS research that has examined how GIS fits into society
as a whole, including its institutions and organizations, and how GIS can be used
in decision making, especially in a public setting such as a town meeting, or in a
community group (Clarke, 1995).
In a nutshell, therefore, GIS can be defined as (1) a set of computer tools for
analyzing spatial data; (2) a special case of an information system designed for
spatial data; (3) an approach to the scientific analysis and use of spatial data; (4)
a multibillion-dollar industry and business; and (5) a technology that plays a role
in society (Clarke, 1995).
All definitions, however, have a single common feature, namely that GIS are
systems which deal with geographical information. The geographical (also called
locational) data element is used to provide a reference for the attribute (also called
statistical or non-locational) data element. In GIS, the geographical element is
seen as more important than the attribute element and this is one of the key
features which differentiates GIS from other information systems. The terms
‘spatial’ and ‘geographical’ are often used interchangeably to describe
geographical features. Similarly, the term ‘aspatial data’ is often used as a
synonym for ‘attribute data’ (Healey, 1991).
Everyone has their own favorite definition of a GIS (Table 1-1), and there are
many to choose from (Longley et al., 2004).
Table 1-1: Definitions of a GIS, and the groups who find them useful

Definition | Groups
A container of maps in digital form | The general public
A computerized tool for solving geographic problems | Decision makers, community groups, planners
A spatial decision support system | Management scientists, operations researchers
A mechanized inventory of geographically distributed features and facilities | Utility managers, transportation officials, resource managers
A tool for revealing what is otherwise invisible in geographic information | Scientists, investigators
A tool for performing operations on geographic data that are too tedious or expensive or inaccurate if performed by hand | Resource managers, planners

1.2 Components of GIS


Geographical information systems have three important physical components:
computer hardware, sets of application software modules, and a proper organizational
context including skilled people. These need to be in balance if the system is to
function satisfactorily (Burrough, 1986: 12).
1.2.1 Computer Hardware
The computer hardware component may comprise any type of computer
platform, including relatively modest personal computers, high performance
workstations, minicomputers and mainframe computers (Healey, 1991). In
addition to the standard input, storage and output devices, specialist peripherals
are required for data input (e.g. scanners, digitizers and tape drives), data output
(e.g. plotters) and sometimes data storage (e.g. CD-ROMs, external
hard disks, network) and data processing.
Hardware is the computer system on which a GIS operates. Today, GIS software
runs on a wide range of hardware types, from centralized computer servers to
desktop computers used in stand-alone or networked configurations (Buckley,
1997).
1.2.2 GIS Software
The software component of GIS includes the program and the user interface for
driving the hardware (Chang, 2014). The software components of a GIS should
satisfy the four subsystems (refer to section 1.3) as well as be fully integrated
with the relevant hardware of the system (Buckley, 1997).
GIS software can range from a simple package designed for a PC and costing a
few hundred dollars, to a major industrial-strength workhorse designed to serve
an entire enterprise of networked computers, and costing tens of thousands of
dollars. New products are constantly emerging, and it is beyond the scope of this
book to provide a complete inventory (Longley et al., 2004).
GIS can be considered to be a data store (i.e. a system that stores spatial data),
a toolbox, a technology, an information source or a field of science. The main
characteristics of a GIS software package are its analytical functions that provide
means for deriving new geo-information from existing spatial and attribute data
(de By and Huisman, 2009).
All GIS packages available on the market have their own strengths and
weaknesses, typically resulting from the development history and/or intended
application domain(s) of the package. Some GISs have traditionally focussed more
on support for raster-based functionality, others more on vector-based spatial
objects. Any package that provides support for only rasters or only objects is not
a complete GIS. Well known, full-fledged GIS packages include ILWIS,
Intergraph’s GeoMedia, ESRI’s ArcGIS, and MapInfo from MapInfo Corp (de By
and Huisman, 2009).
There is no particular GIS package which is necessarily ‘better’ than another one:
this depends on factors such as the intended application and the expertise of its
user. ILWIS’s traditional strengths are in raster processing and scientific spatial
data analysis, especially in project-based GIS applications. Intergraph, ESRI and
MapInfo products have been known better for their support of vector-based spatial
data and their operations, user interface and map production (a bit more typical
of institutional GIS applications). Any such brief characterization, however, fails
to do justice to any of these packages, and it is only after extended use that their
strengths, and sometimes weaknesses, might become clear (de By and Huisman,
2009).

1.2.3 Organizational Context
The third physical component that is vital to the successful operation of a GIS is
the organizational context of the company or agency that has purchased a GIS.
As in all organizations dealing with sophisticated technology, new tools can only
be used effectively if they are properly integrated into the entire business strategy
and operation. To do this properly requires not only the necessary investments in
hardware and software, but also in the retraining and/or hiring of personnel to
utilize the new technology in the proper organizational context. Implementing a
GIS without regard for a proper organizational commitment will
result in an unsuccessful system (Buckley, 1997).
The infrastructure refers to the necessary physical, organizational, administrative,
and cultural environments that support GIS operations. The infrastructure includes
requisite skills, data standards, data clearinghouses, and general organizational
patterns (Chang, 2014).
Apart from the above three components, two further components, namely data and
people, are equally important in GIS.
1.2.3.1 People
People refers to GIS professionals and users who define the purpose and
objectives, and provide the reason and justification for using GIS (Chang, 2014).
Without properly trained personnel with the vision and commitment to a project,
little will be achieved. The significance of the people involved in GIS is, regrettably,
all too often overlooked by those with a more technological focus (Healey, 1991).
1.2.3.2 Data
Data consist of various kinds of inputs that the system takes to produce
information (Chang, 2014). In many respects data are a crucial resource.
Geographical data are very expensive to collect, store and manipulate because
large volumes are normally required to solve substantive geographical problems.
Although estimates vary, it is not uncommon for the cost of data collection to
exceed the cost of hardware and software by a factor of two (Healey, 1991).

1.3 Functional Components of GIS


GIS consists of several functional components (also called subsystems), that is,
components which support key GIS functions. These are data capture and preparation,
data storage, data analysis, and presentation of spatial data. Figure 1-1 shows a
diagram of these components, with arrows indicating the data flow in the system.
For a particular GIS, each of these components may provide many or only a few
functions. Arguably, the system should not be called a geographic information system
if any one of these components is missing. It is important to note, however, that the
same function may be offered by different components of the GIS: for instance,
data capture and data storage may have functions in common, and the same holds
for data preparation and data analysis (de By and Huisman, 2009: 144). These
functions must always be present for the software to qualify as a GIS (Clarke,
1995).

Figure 1-1: Functional Components of a GIS (de By and Huisman, 2009: 145)

1.3.1 Data Capture


Getting the map into the computer is a critical first step in GIS. Geocoding must
include at least the input of scanned or digitized maps in some appropriate format.
The system should be able to absorb data in a variety of formats, not just in the
native format of the particular GIS. For example, an outline map may be available
in AutoCAD DXF format. The GIS should at a minimum be capable of absorbing
the DXF file without further modification. Similarly, attributes may already be
stored in a standard database format (DBF) and should be absorbable either directly
or through the generic ASCII format (Clarke, 1995: 205).
The functions for capturing data are closely related to the disciplines of survey
engineering, photogrammetry, remote sensing, and the process of digitizing, i.e.
the conversion of analogue data into digital representations. Remote sensing, in
particular, is the field that provides photographs and images as the raw base data
from which spatial datasets are derived. Surveys of the study area often need to be
conducted for data that cannot be obtained with remote sensing techniques, or to
validate data thus obtained (de By and Huisman, 2009: 149).
Traditional techniques for obtaining spatial data, typically from paper sources,
included manual digitizing and scanning. Table 1-2 lists the main methods and
devices used for data capture. In recent years there has been a significant increase
in the availability and sharing of digital (geospatial) data. As discussed above,
various media and computer networks play an important role in the dissemination
of this data, particularly the internet (de By and Huisman, 2009: 149).
The data, once obtained in some digital format, may not be quite ready for use in
the system. This may be because the format obtained from the capturing process
is not quite the format required for storage and further use, which means that
some type of data conversion is required. In part, this problem may also arise
when the captured data represents only raw base data, out of which the real data
objects of interest to the system still need to be constructed. For example,
semi-automatic digitizing may produce line segments, while the application
requires non-overlapping polygons. A build-and-verification phase would then be
needed to obtain these from the captured lines
(de By and Huisman, 2009: 149).

Table 1-2: Spatial Data Input Methods and Devices Used (de By and Huisman, 2009: 150)

Method | Devices
Manual digitizing | Coordinate entry via keyboard; digitizing tablet with cursor; mouse cursor on the computer monitor (heads-up digitizing); (digital) photogrammetry
Automatic digitizing | Scanner
Semi-automatic digitizing | Line-following software
Input of available digital data | CD-ROM or DVD-ROM; via computer network or internet (including geo-webservices)

1.3.2 Data Storage and Maintenance


Data storage within a GIS has historically been an issue of both space (usually
how much disk space the system requires) and access (or how flexible a GIS is in
terms of making data available for use). The massive reductions in the cost of disk
storage, new high-density storage media, and the integration of compression
methods into common operating systems have made the former less critical and
the latter more so (Clarke, 1995: 208).
The way that data is stored plays a central role in the processing and the eventual
understanding of that data. In most of the available systems, spatial data is
organized in layers by theme and/or scale. For instance, the data may be
organized in thematic categories, such as land use, topography and administrative
subdivisions, or according to map scale. An important underlying need or principle
is a representation of the real world that has to be designed to reflect phenomena
and their relationships as naturally as possible. In a GIS, features are represented
with their (geometric and non-geometric) attributes and relationships. The
geometry of features is represented with primitives of the respective dimension:
a windmill probably as a point, and an agricultural field as a polygon. The primitives
follow either the vector, as in the example, or the raster approach (de By and
Huisman, 2009: 151).
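The feature representation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Feature` class, its field names, the layer names and the coordinate values are all hypothetical and do not come from any particular GIS package.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One real-world object: a geometric primitive plus non-geometric attributes."""
    geometry_type: str  # "point", "line" or "polygon" (vector primitives)
    coordinates: list   # list of (x, y) tuples; a point has exactly one
    attributes: dict = field(default_factory=dict)


# Layers group features by theme (hypothetical layer names and values).
layers = {
    "landmarks": [
        Feature("point", [(5.0, 3.0)], {"kind": "windmill"}),
    ],
    "land_use": [
        Feature("polygon", [(0, 0), (10, 0), (10, 6), (0, 6)], {"crop": "wheat"}),
    ],
}


def features_of_type(layer, geometry_type):
    """Return the features in one layer that use the given primitive."""
    return [f for f in layer if f.geometry_type == geometry_type]
```

Here the windmill is stored as a point and the agricultural field as a polygon, each carrying its own attribute dictionary, and the thematic layers are simply named groups of such features.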
The storage of a raster is, in principle, straightforward. It is stored in a file as a
long list of values, one for each cell, preceded by a small list of extra data (the
so-called ‘file header’) that informs how to interpret the long list. The order of the cell
values in the list can be (but need not be) left-to-right, top-to-bottom. This simple
encoding scheme is known as row ordering. The header of the raster file will
typically inform how many rows and columns the raster has, which encoding
scheme is used, and what sort of values are stored for each cell. Raster files can
be quite big data sets. For computational reasons, it is wise to organize the long
list of cell values in such a way that spatially nearby cells are also near to each
other in the list. This is why other encoding schemes have been devised (de By
and Huisman, 2009: 151).
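A minimal sketch of row ordering may help here. The header keys and the sample grid below are assumptions made for illustration; real raster formats define their own headers. The `morton_index` function shows one possible alternative ordering (Morton or Z-order, which the text above does not name) that tends to keep spatially nearby cells close together in the list.

```python
# Hypothetical header: real raster formats define their own fields.
header = {"rows": 3, "cols": 4, "order": "row", "value_type": "int"}

grid = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]


def encode_row_order(grid):
    """Flatten a 2D grid left-to-right, top-to-bottom."""
    return [value for row in grid for value in row]


def cell_value(values, header, row, col):
    """Look up a cell in the flat list using the column count from the header."""
    return values[row * header["cols"] + col]


def morton_index(row, col, bits=16):
    """Interleave the bits of row and col (Z-order), an assumed alternative scheme."""
    index = 0
    for i in range(bits):
        index |= ((col >> i) & 1) << (2 * i)
        index |= ((row >> i) & 1) << (2 * i + 1)
    return index


values = encode_row_order(grid)
```

With row ordering, the cell directly below another one lies a full row width further down the list, whereas the interleaved index places the four cells of each 2 x 2 block next to each other.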
Maintenance of (spatial) data can best be defined as the combined activities to
keep the data set up-to-date and as supportive as possible to the user community.
It deals with obtaining new data and entering them into the system, possibly
replacing outdated data. The purpose is to have an up-to-date stored data set
available. After a major earthquake, for instance, we may have to update our road
network data to reflect that roads have been washed away, or have otherwise
become impassable (de By and Huisman, 2009: 153).
The need for updating spatial data stems from the requirements that the data
users impose, as well as the fact that many aspects of the real world change
continuously. These data updates can take different forms. It may be that a
complete, new survey has been carried out, from which an entirely new data set
is derived that will replace the current set. Such a situation is typical if the spatial
data originates from remotely sensed data, for example, a new vegetation cover
set, or a new digital elevation model. It may also be that local (ground) surveys
have revealed local changes, for instance, new constructions, or changes in land
use or ownership. In such cases, local changes to the large spatial data set are more
typical; these changes should leave other spatial data within the same layer intact and
correct (de By and Huisman, 2009: 153).
1.3.3 Data Manipulation and Analysis
The analysis capabilities of GIS systems vary remarkably. Among the multitude of
features that GIS systems offer are the computation of the slope and direction of
slope (aspect) on a surface such as terrain; interpolation of missing or
intermediate values; line-of-sight calculations on a surface; the incorporation of
special break or skeleton lines into a surface; finding the optimal path through a
network or a landscape; and the computations necessary to calculate the amount
of material that must be moved during cut-and-fill operations such as road
construction (Clarke, 1995: 213).
Almost unique to GIS, and entirely absent in other types of information systems,
are geometric tests. These can be absolutely fundamental to building a GIS in the
first place. These are described by their dimensions: point-in-polygon, line-in-
polygon, and point-to-line distance. The first, point-in-polygon, is how a point
database such as a geocoded set of point samples is referenced into regions. Thus
a set of locations for soil samples, generated at random, could be point-in-polygon
merged with a digitized set of district boundaries so that a sample list can be sent
to each soil district manager. Other more complex analytical operations include
partitioning a surface into regions, perhaps using the locations of known points to
form proximal regions or Voronoi polygons, or by dividing a surface into
automatically delineated drainage basins (Clarke, 1995: 213).
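The point-in-polygon test and the soil-sample example above can be sketched as follows. The source does not say which algorithm a GIS uses; this sketch assumes the common ray-casting (even-odd) method, and the district outlines and sample coordinates are invented for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how often a ray going right from the point
    crosses the polygon boundary. An odd count means the point is inside.
    Points lying exactly on the boundary are not handled specially."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_samples(samples: Dict[str, Point],
                   districts: Dict[str, List[Point]]) -> Dict[str, List[str]]:
    """Build one sample list per district (the 'point-in-polygon merge')."""
    result: Dict[str, List[str]] = {name: [] for name in districts}
    for sample_id, location in samples.items():
        for name, outline in districts.items():
            if point_in_polygon(location, outline):
                result[name].append(sample_id)
                break
    return result


# Hypothetical digitized district boundaries and soil sample locations.
districts = {
    "West": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "East": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
samples = {"s1": (1.0, 1.0), "s2": (7.5, 2.0), "s3": (12.0, 1.0)}

by_district = assign_samples(samples, districts)
```

Here sample `s3` falls outside both districts and therefore appears in neither list.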
Some of the most critical analytical operations are often the simplest. A GIS should
be able to do spreadsheet and database tasks, compute a new attribute, generate
a printed report or summarize a statistical description, and do at least simple
statistical operations such as computing means and variance, performing
significance testing, and plotting residuals (Clarke, 1995: 213).
The most distinguishing parts of a GIS are its functions for spatial analysis, i.e.
operators that use spatial data to derive new geo-information. Spatial queries and
process models play an important role in this functionality. One of the key uses of
GISs has been to support spatial decisions. Spatial decision support systems
(SDSS) are a category of information systems composed of a database, GIS
software, models, and a so-called knowledge engine which allow users to deal
specifically with locational problems (de By and Huisman, 2009: 155).

In a GIS, data are usually grouped into layers (or themes). Usually, several
themes are part of a project. The analysis functions of a GIS use the spatial and
non-spatial attributes of the data in a spatial database to provide answers to user
questions. GIS functions are used for maintenance of the data, and for analysing
the data in order to infer information from it. Analysis of spatial data can be
defined as computing new information that provides new insight from the existing,
stored spatial data (de By and Huisman, 2009: 155).
Consider an example from the domain of road construction. In mountainous areas
this is a complex engineering task with many cost factors, which include the
number of tunnels and bridges to be constructed, the total length of the paved
surface, and the volume of rock and soil to be moved. GIS can help to compute
such costs on the basis of an up-to-date digital elevation model and soil map. The
exact nature of the analysis will depend on the application requirements, but
computations and analytical functions operate on both spatial and non-spatial data
(de By and Huisman, 2009: 155).
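As a rough illustration of how the volume of rock and soil to be moved might be derived from a digital elevation model, the sketch below compares an existing surface with a design surface cell by cell. The grid values, the cell size and the function name are assumptions made for the example, and real cut-and-fill tools are considerably more elaborate.

```python
from typing import List, Tuple


def cut_and_fill(existing: List[List[float]],
                 design: List[List[float]],
                 cell_area: float) -> Tuple[float, float]:
    """Return (cut_volume, fill_volume) for two elevation grids of equal shape.

    Cut is material to remove where the terrain lies above the design level;
    fill is material to add where it lies below. Elevations are in metres and
    cell_area in square metres, so volumes are in cubic metres.
    """
    cut = 0.0
    fill = 0.0
    for existing_row, design_row in zip(existing, design):
        for z_existing, z_design in zip(existing_row, design_row):
            difference = z_existing - z_design
            if difference > 0:
                cut += difference * cell_area
            else:
                fill += -difference * cell_area
    return cut, fill


# Hypothetical 2 x 2 elevation model with 5 m x 5 m cells (25 square metres each)
# and a flat road bed designed at 100 m.
existing = [[102.0, 101.0],
            [100.0, 98.0]]
design = [[100.0, 100.0],
          [100.0, 100.0]]

cut, fill = cut_and_fill(existing, design, cell_area=25.0)
```

With these numbers the sketch yields 75 cubic metres of cut and 50 cubic metres of fill; a cost model could then weight these volumes using the soil map.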
1.3.4 Data Presentation
GIS systems need to be able to perform what has come to be called desktop mapping,
generating geographical and thematic maps so that they can be integrated with
other functions. GISs typically can create several types of thematic mapping,
including choropleth and proportional symbol maps; and they can draw isoline and
cross-sectional diagrams when the data are three dimensional (Clarke, 1995:
213).
Almost all GIS packages now either allow interactive modification of map elements
– moving and resizing titles and legends – or allow their output to be exported
into a package that has these capabilities, such as Adobe Illustrator or CorelDraw.
Very few GIS packages include cartographic design help in their editing
of graphics, defaulting to suitable color schemes, or notifying the user if an
inappropriate map type is being used for the data. This would be a desirable
feature for many of the GISs on today’s market and could avoid many tasteless
or erroneous maps before they were created (Clarke, 1995: 213).
The presentation of spatial data, whether in print or on-screen, in maps or in
tabular displays, or as ‘raw data’, is closely related to the disciplines of cartography,
printing and publishing. The presentation may either be an end-product, for
example as a printed atlas, or an intermediate product, as in spatial data made
available through the internet (de By and Huisman, 2009: 157).
Table 1-3: Spatial Data Presentation (de By and Huisman, 2009: 157)

Method                        Devices
Hard Copy                     Printer
                              Plotter (Pen Plotter, Ink-jet Printer, Thermal Transfer Printer, Electrostatic Plotter)
                              Film Writer
Soft Copy                     Computer Screen
Output of Digital Data Sets   Magnetic Tape
                              CD-ROM or DVD
                              The Internet

Table 1-3 lists several different methods and devices used for the presentation of
spatial data. Cartography and scientific visualization make use of these methods
and devices to produce their products (de By and Huisman, 2009: 157).

1.4 Functions of GIS


The basic function of a GIS is to answer questions about geographic data. These
questions can be classified in a generic fashion: a sophisticated GIS can address
six generic questions (Cho, 1995: 33):
Table 1-4: Basic Questions that GIS can answer

Location    What is at...? For example, the number of animals in a habitat.
Condition   Where is it? Find a location where certain conditions are satisfied.
            This is the intersection of two pieces of data.
Routing     Which is the best way to...? Calculates the best, fastest, shortest,
            or most scenic route between two places.
Trends      What has changed since...? Monitoring change over time, for
            example, deforestation.
Patterns    What spatial patterns exist? Identifying patterns helps to describe
            and compare the distributions of phenomena, to understand the
            processes behind them, and to account for their distribution.
Modelling   What if? To determine what happens when one changes some
            feature or variable. Requires geographic and other information and
            possibly scientific laws to provide an explanation, for example, for
            sea level changes, global warming, and desertification.

Refer to (Bhatta, 2014: 462) for details

1.5 Geographical Features


A feature is a point, line, or polygon in a dataset that represents a real-world
object. A feature class is a collection of features, categorized by the type of
geometry used to define the feature (e.g., how the coordinates are stored, as a
point, line, or polygon).
In GIS, geographical features are usually defined according to their two data
elements. The geographical (also called locational) data element is used to provide
a reference for the attribute (also called statistical or non-locational) data element.
For example, administrative boundaries, river networks and point locations of sites
are all geographical features used to provide a reference for, respectively, census
counts, river water flows or site elevations. In GIS the geographical element is
more important than the attribute element and this is one of the key features
which differentiates GIS from other information systems (Coppock and Rhind,
1991: 322).
Four generic geographical features are normally recognized on the basis of
Euclidean dimensionality: points, lines, areas and surfaces. In this scheme, points
have no length dimension and are said to have a dimensionality of zero. Lines
have a single length dimension and dimensionality of one. Areas have two length
dimensions and dimensionality of two. Finally, surfaces have three length
dimensions and are given a dimensionality of three. All these features can be
represented in either the vector or tessellation models (Coppock and Rhind, 1991:
322).
Each of these generic geographical features may be further subdivided according
to the characteristics of the associated attribute data. For example, points might
be sub-divided into houses, telephone boxes and soil pits; areas might be
sub-divided into those with a population density of, say, 0-5000, 5001-10000 and
greater than 10000 persons per sq km; and surfaces might be sub-divided into those
which are flat, steeply sloping and very steeply sloping (Coppock and Rhind, 1991:
322).
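Subdividing features by their attribute data amounts to a simple classification step. The sketch below groups areas by the population density classes mentioned above; the ward names and density values are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def density_class(persons_per_sq_km: float) -> str:
    """Assign a density value to one of the three classes used in the text.

    The text lists the classes as whole numbers; here any value above 5000
    and up to 10000 is placed in the middle class.
    """
    if persons_per_sq_km < 0:
        raise ValueError("density cannot be negative")
    if persons_per_sq_km <= 5000:
        return "0-5000"
    if persons_per_sq_km <= 10000:
        return "5001-10000"
    return "greater than 10000"


def group_by_class(areas: Dict[str, float]) -> Dict[str, List[str]]:
    """Subdivide area features according to their density attribute."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, density in areas.items():
        groups[density_class(density)].append(name)
    return dict(groups)


# Hypothetical area features with a population density attribute.
areas = {"Ward A": 1200.0, "Ward B": 7400.0, "Ward C": 15800.0, "Ward D": 4999.0}

groups = group_by_class(areas)
```

The geometry of each area is untouched by this step; only the attribute element decides which subclass a feature belongs to.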

1.7 Application of GIS


From its beginnings, GIS has been important in natural resource management,
including land-use planning, natural hazard assessment, wildlife habitat analysis,
riparian zone monitoring, and timber management (Chang, 2014). In recent years
GIS has been used for crime analysis, emergency planning, land records
management, market analysis, and transportation applications (Chang, 2014).
Integration of GIS with the global positioning system (GPS), wireless technology,
and the Internet has also introduced new and exciting applications (Tsou, 2004,
as cited in Chang, 2014). Some examples of this category are:
• Location-based services (LBS) technology allows mobile phone users to be
located and to receive location information, such as nearby ATMs and
restaurants.
• Interactive-mapping websites let users select map layers for display and make
their own maps.
• In-car navigation systems find the shortest route between an origin and a
destination and provide turn-by-turn directions to drivers.
• Mobile mapping allows field workers to collect and access geospatial data in
the field.
• Precision farming promotes site-specific farming activities such as herbicide or
fertilizer application.
Some current applications of GIS are (Burrough and McDonnell, 2014):

Agriculture                   Monitoring and management from farm to national levels
Archaeology                   Site description and scenario evaluation
Environment                   Monitoring, modelling, and management for land
                              degradation; land evaluation and rural planning; landslides;
                              desertification; water quality and quantity; plagues; air
                              quality; weather and climate modelling and prediction
Epidemiology and Health       Location of disease in relation to environmental factors
Forestry                      Management, planning, and optimizing extraction and
                              replanting
Emergency services            Optimizing fire, police, and ambulance routing; improved
                              understanding of crime and its location
Navigation                    Air, sea, and land
Marketing                     Site location and target groups; optimizing goods delivery
Real Estate                   Legal aspects of the cadastre, property values in relation to
                              location, insurance
Regional/local planning       Development of plans, costing, maintenance, management
Road and rail                 Planning and management
Site evaluation and costing   Cut and fill, computing volumes of materials
Social studies                Analysis of demographic movements and developments
Tourism                       Location and management of facilities and attractions
Utilities                     Location, management, and planning of water, drains, gas,
                              electricity, telephone, cable services
Refer to (Bhatta, 2014: 463) for more information

1.8 Importance of GIS


GIS is an integration of several systems, methodologies and applications.
Therefore, it has various advantages and some of them are interrelated. A GIS
has many advantages over the traditional manual method of geographic data
analysis. It also possesses some unique advantages over other traditional systems
like Computer aided design and drawing (CADD or CAD), Automated Mapping and
Facility Management (AM/FM) or conventional information systems (Bhatta, 2014:
466).
Once a GIS is implemented, the following benefits are expected (Murai, 1998b):
• geospatial data are better maintained in a standard format
• revision and updating are easier
• geospatial data and information are easier to search, analyze and represent
• more value-added products can be generated
• geospatial data can be shared and exchanged freely
• staff productivity and efficiency are improved
• time and money are saved
• better decisions can be made
Table 1-5 shows the advantages of GIS and the disadvantages of conventional
manual work without GIS.
Table 1-5: GIS vs Manual Works (Murai, 1998b)

Maps               GIS                           Manual Works
Storage            Standardized and Integrated   Different scales on different standards
Retrieval          Digital Database              Paper maps, census, tables
Updating           Search by Computer            Manual Check
Overlay            Systematically Done           Expensive and time consuming
Spatial Analysis   Very fast and Easy            Time and energy consuming
Display            Cheap and fast                Expensive

Geographic information is the key to better decision-making; just about
everything a community, business, or public agency does, whether in day-to-day
operations or long-term planning, is related to its geography.
Education is a good example. The primary purpose of schools, of course, is to
teach children. But schools also have to worry about maintaining an efficient and
safe transportation system for their students, whether the school building will have
to expand if the population keeps growing, and whether the building’s septic
system will be adequate in years to come.
Commercial site evaluation is another example. Zoning regulations, utility
availability, traffic access, and proximity to consumers are all important
considerations for retail businesses choosing building sites.
Imagine how maps would present the information described above in an
easy-to-understand format.
Because of all these capabilities and its advantages over manual and other traditional
systems, GIS is important, and both its importance and its field of
applications are still growing.
Further Reading: (Healey, 1991: 14)

1.9 Historical Development of GIS


The following excerpt is taken from (Coppock and Rhind, 1991: 39):
A history of GIS is necessarily piecemeal and partial. Inevitably, events are
duplicated in different countries at different times. But, despite the unsatisfactory
nature of the evidence and the fact that such conclusions are necessarily
approximations to reality, four overlapping phases may be distinguished in the
development of GIS in the more advanced countries. The first is the pioneer or
‘research frontier’ period, from the 1950s to about 1975 in the United States and
the United Kingdom. This was characterised by the individual developments,
limited international contact, little data in machine readable form and ambitions
which far out-ran the computing resources of the day. Individual personalities
greatly influenced events. The second phase was that in which formal experiment
and government-funded research was the norm, stretching from about 1973 to
the early 1980s; the role of individuals was diminished somewhat in the
international and national arenas except for strong-minded heads of national
mapping agencies, but at the local level the effect of individuals persisted strongly.
Rapidly replacing this phase was the commercial phase commencing about 1982
which, in the light of strong competition among vendors, is now giving way to a
phase of user dominance. The last two phases can also be characterized as ones
in which systems handling individual data sets on isolated machines (latterly
workstations) gave way to those dealing with corporate and distributed databases,
accessed across networks and increasingly integrated into the other non-spatial
databases of the organization. A vital characteristic of both the latter phases is
that these activities became routine: in earlier phases, skilled ‘fixers’ were
required to be on hand to cope with problems in the software, data or hardware.
What particularly emerges from this chapter is the dominant contribution of North
America to the development and implementation of GIS up to the mid- and late-
1980s, a function of the persuasive power of key individual pioneers, the size of
the internal market, the leading role of the United States in the development of
computer hardware and software and –above all – an increasing appreciation by
many North American users of the need for efficient, speedy and cost-effective
means of handling large quantities of geographical data. It is that perception of
need which led potential users to seek GIS solutions and has encouraged
commercial providers to develop and offer turnkey systems to convert that
perceived need into a reality. What is not clear from the piecemeal evidence,
however, is the ratio of failures to successes or how many operational systems
are fully used and living up to their promises. A federal system of government,
where large bureaucracies have considerable powers to take initiatives on their
own account and where states are often as large as many independent countries,
are no doubt important features, as is the large area of public land to be
managed directly by federal and state agencies. Being continental in scale faces
both Canada and the United States with particular problems, but in its scale,
comprehensiveness and ambition at a time of inadequate technology.
Developments elsewhere in GIS were more limited until late in the 1980s, although
those in Japan, the United Kingdom and several other countries in mainland
Europe seem in rapid evolution. Land registration promises to make GIS a globally
used technology from the ‘bottom up’ while earth monitoring from satellites
promises to achieve global use ‘top down’. It is a reasonable expectation that
routine (and often boring, if valuable) use of GIS will be nearly ubiquitous over
the next 20 years. This is the end of the beginning of GIS.

(Clarke, 1995)

1.10 GIS Software


GIS software is the processing engine and a vital component of an operational
GIS. It is made up of integrated collections of computer programs that implement
geographic processing functions. The three key parts of any GIS software system
are the user interface, the tools (functions), and the data manager. All three parts
may be located on a single computer or they may be spread over multiple
machines in a departmental or enterprise configuration. Four main types of
computer system architecture configurations are used to build operational GIS
implementations: desktop, client-server, centralized desktop, and centralized
server. There are many different types of GIS software and this chapter uses five
categories to organize the discussion: desktop, server (including Internet),
developer, hand-held, and other. The market leading commercial GIS software
vendors are ESRI, Intergraph, Autodesk, and GE Energy (Smallworld) (Longley et
al., 2004).

Figure 1-2: Estimated size (number of users) of the different GIS software sectors
(Longley et al., 2004)
GIS software is a fundamental and critical part of any operational GIS. The
software employed in a GIS project has a controlling impact on the type of studies
that can be undertaken and the results that can be obtained. There are also far
reaching implications for user productivity and project costs. Today, there are
many types of GIS software product to choose from and a number of ways to
configure implementations. One of the exciting and at times unnerving
characteristics of GIS software is its very rapid rate of development. This is a trend
that seems set to continue as the software industry pushes ahead with significant
research and development efforts. The following chapters will explore in more
detail the functionality of GIS software and how it can be applied in real-world
contexts (Longley et al., 2004).

(Clarke, 1995)

CHAPTER 2 GIS AND MAPS

GIS and Maps: [4 hrs]


Map Projections and Coordinate Systems; Maps and their characteristics
(selection, abstraction, scale, etc.); automated cartography versus GIS; Map
projections; Coordinate systems; Geo-referencing, precision and error.

Refer to (de By and Huisman, 2009: 441) chapter 7.1 GIS and Maps

2.1 Maps and Map Characteristics


The main method of identifying and representing the location of geographic
features on the landscape is a map. A map is a graphic representation of where
features are, explicitly and relative to one another. A map is composed of different
geographic features represented as either points, lines, and/or areas. Each feature
is defined both by its location in space (with reference to a coordinate system),
and by its characteristics (typically referred to as attributes). Quite simply, a map
is a model of the real world (Buckley, 1997).
A map can be defined as a graphic depiction of all or part of a geographic realm
in which the real-world features have been replaced by symbols in their correct
spatial location at a reduced scale. Maps, as we have already seen, are the paper
storehouses of spatial information that we use as sources of data for GIS. They are
also the final stage in GIS work, the means by which the information being
extracted, analysed, and reconstructed using the power of the GIS is at last
communicated to the GIS user or the decision maker who relies on the GIS for
knowledge. Maps within a GIS can be temporary, designed merely for a quick
information glance, or permanent, for presentation of ideas as a substitute for a
picture or a report. Whatever the map’s context, we must return again to the
cartographic roots of GIS for a discussion of the critical information that the GIS
practitioner needs to use the map display part of a GIS correctly (Clarke, 1995).
In either case, the map has a structure. Just as a sentence in the English language
needs to follow grammar and syntax to be understood, so a map has to follow its
own visual grammar. Just as a map has a structure, so that structure can vary
according to which medium we use for map display. GISs usually use the computer
monitor to display a map, rather than the traditional paper. Only now, after many
years of computer mapping, are cartographers beginning to understand how map
design depends on the display medium. The GIS has been a major reason why
this has become an important consideration (Clarke, 1995).
Maps have been used throughout history to portray the Earth’s surface, location
of features, and relations between features. Traditionally, maps were exclusively
hand-drawn or drafted documents. The practice of cartography paralleled the
exploration of the world as navigators established location reference schemes,
classifications of features, labeling, and other annotations. Many of the symbols
developed are retained in modern maps, such as blue lines for streams, double-
line symbols for roads, and contour lines for topography (Johnson, 2009).
A map can accomplish many things in many ways. When you read a map, you
observe the shapes and position of features, some attribute information about a
feature, and the spatial relationships between features (Zeiler 1999). Some things
that maps accomplish include (Johnson, 2009):
• Identify what is at a location through placement of a feature’s symbol in a
reference frame
• Portray the relationship between features as connecting, adjacent, contained
within, intersecting, in proximity, or higher/lower
• Display multiple attributes of an area
• Allow portrayal of and discernment between distributions, relationships, and
trends
• Show classifications of feature attributes and graphic portrayals as thematic
maps
• Visually encode feature attributes as text, values, or identifiers
• Detect changes over time using maps prepared at different times
• Integrate data from diverse sources into a common geographic reference,
thereby allowing comparison
Maps remain important, but more and more maps are produced with geographic
information. Some people now even suggest that most maps are simply interfaces
to geographic information databases. Several years ago, separating geographic
information from maps would have been complicated. Maps, following the
International Cartographic Association, are science and art. Geographic
information was interpreted or symbolized data. It’s simpler now. In this book
“maps” are a form of output of geographic information. Maps are truly the most
common form of output and have been essential to our understanding of the world
for millennia. Maps can be drawn by hand, and constructed by hand, but nowadays
are mostly prepared using geographic information (Harvey, 2008).
Maps are perhaps the best known (conventional) models of the real world. Maps
have been used for thousands of years to represent information about the real
world, and continue to be extremely useful for many applications in various
domains. Their conception and design has developed into a science with a high
degree of sophistication. A disadvantage of the traditional paper map is that it is
generally restricted to two-dimensional static representations, and that it is always
displayed at a fixed scale. The map scale determines the spatial resolution of the
graphic feature representation. The smaller the scale, the less detail a map can
show. The accuracy of the base data, on the other hand, puts limits to the scale
in which a map can be sensibly drawn. Hence, the selection of a proper map scale
is one of the first and most important steps in map design (de By and Huisman,
2009: 51).
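The link between scale and detail can be made concrete with a small calculation. The sketch below assumes the scale is given as a representative fraction 1:N, a notation the text does not spell out, and uses two example scales chosen only for illustration.

```python
def ground_distance_m(map_distance_cm: float, scale_denominator: int) -> float:
    """Ground distance in metres for a distance measured on the map.

    At a scale of 1:N, one unit on the map stands for N of the same units on
    the ground; dividing by 100 converts centimetres to metres.
    """
    if scale_denominator <= 0:
        raise ValueError("scale denominator must be positive")
    return map_distance_cm * scale_denominator / 100.0


# One centimetre on a 1:50,000 map versus a smaller-scale 1:250,000 map.
at_50k = ground_distance_m(1.0, 50_000)
at_250k = ground_distance_m(1.0, 250_000)
```

One map centimetre covers 500 m at 1:50,000 but 2,500 m at 1:250,000, so the same sheet of paper has to represent five times more ground in each direction and can show correspondingly less detail.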
A map is a representation of the Earth’s surface or pattern, as a whole or in part,
on a plane surface, with conventional signs, drawn to a scale and
projection so that each and every point on it corresponds to its actual position on
the Earth. Maps are abstract representations of the physical features of a portion
of the Earth’s surface, graphically displayed on a planar surface (Bhatta, 2014:
612).
Map making dates back to the Stone Age. From the last quarter of the twentieth
century, the computer has been an indispensable tool for the cartographer. Much
of cartography, especially at the data gathering survey level, has been
incorporated by GIS. Even when GIS is not involved, most cartographers now use
a variety of computer graphics programs to generate new maps. Interactive,
computerized maps are commercially available, allowing users to zoom-in or
zoom-out (Bhatta, 2014: 613). A map portrays three kinds of information about
geographic features (Bhatta, 2014: 613):
• Location and extent of the feature
• Attributes (characteristics) of the feature
• Relationship of the feature to other features

2.2 Cartography vs GIS


Cartography is the science and practice of representing features of the Earth’s
surface graphically. In more formal terms a definition of cartography may be
stated as follows (Cho, 1995: 181):
"the graphical representation of spatial relationships and spatial forms in what we
call a map, and, very simply, cartography is the making and study of maps in all
their aspects .... This includes teaching the skills of map use; studying the history
of cartography; maintaining map collections with associated cataloguing and
bibliographic activities; and the collection, collation and manipulation of data and
the design and preparation of maps, charts, plans and atlases" (Robinson et al.
1985: 1-3).
In the light of this extensive definition, it may be possible to distil four essential
elements in cartographic work, namely (Cho, 1995: 181):
• collecting and selecting the data for mapping;
• manipulating and generalising the data, designing and constructing the
map;
• reading and viewing the map; and,
• responding to or interpreting the data.
Some observers, for example, Dent (1985), make a distinction between map
making and cartography. Cartography, according to Dent (1985: 5) "requires the
study of the philosophical and theoretical bases of the rules for map making,
including the study of map communication". On the other hand, Muehrcke (1972:
1) considers map making "as the aggregate of those individual and largely
technical processes of data collection, cartographic design and construction
(drafting, scribing, display), reproduction, etc., normally associated with the
actual reproduction of maps" (Cho, 1995: 182).

Cartography is concerned with the display of spatial information and the main
source of input data for GIS is maps. The discipline has a long tradition in the
design of maps, and recent developments in ‘digital’ and ‘automated’ cartography
provide methods for digital representation and manipulation of cartographic
features and methods of visualisation (Cho, 1995: 27).
GIS is claimed as a new discipline. However, current disciplinary boundaries
suggest that this new technology will sit uncomfortably in any one or a
combination of disciplines. The academic practices and traditions are too
entrenched to allow GIS to integrate easily. Thus, GIS will remain a powerful tool
for spatial display, analysis and modelling. GIS is predicted to displace
cartography, geodesy and other land sciences. The traditional cartographic
production process of paper maps is said to be tedious, expensive, slow to produce
and costly to update and maintain. GIS itself is in its infancy and suffers the same
kinds of criticisms levelled against traditional cartography. In geodesy, the lack of
architectural scale accuracy even with highly accurate DGPS limits applications at
a large-scale. At present even a 1-meter resolution from DGPS is unsuitable for
cadastral mapping and land surveys. These very large-scale cadastres cannot be
used because GIS are ideally suited to small- to medium-scale work (Cho, 1995:
30).

2.3 Coordinate Systems


Coordinates define location in two or three-dimensional space. Coordinate pairs,
e.g., x and y, or coordinate triples, x, y, and z, are used to define the shape and
location of each spatial object or phenomenon (Bolstad, 2012: 28).
A coordinate system can be thought of as a system used to identify locations on a
graph or grid. For example, the system of assigning longitude and latitude to
geographical locations is a coordinate system. There are various coordinate
systems available to represent the location of any point. However, all of them fall
into two broad categories: curvilinear and rectangular. A curvilinear system uses
angular measurements from the origin to describe one’s position, whereas a
rectangular coordinate system uses distance measurements from the origin. One
example of a curvilinear system is the geographic coordinate system (GCS), which uses
latitude/longitude measurements, and one example of a rectangular coordinate
system is the Cartesian coordinate system (CCS), which uses distance measurements
(Bhatta, 2014: 619).

2.3.1 Geographic Coordinate System


Just as all maps have a map scale, all maps have locations, too. Coordinate
systems are frameworks that are used to define unique positions. For instance,
in geometry we use x (horizontal) and y (vertical) coordinates to define points on
a two-dimensional plane. The coordinate system that is most commonly used to
define locations on the three-dimensional earth is called the geographic
coordinate system (GCS), and it is based on a sphere or spheroid. A spheroid
(a.k.a. ellipsoid) is simply a sphere that is slightly wider than it is tall and
approximates more closely the true shape of the earth. Spheres are commonly
used as models of the earth for simplicity (Campbell and Shin, 2012: 45).
The unit of measure in the GCS is degrees, and locations are defined by their
respective latitude and longitude within the GCS. Latitude is measured relative to
the equator at zero degrees, with maxima of either ninety degrees north at the
North Pole or ninety degrees south at the South Pole. Longitude is measured
relative to the prime meridian at zero degrees, with maxima of 180 degrees west
or 180 degrees east (Campbell and Shin, 2012: 45).
Note that latitude and longitude can be expressed in degrees-minutes-seconds
(DMS) or in decimal degrees (DD). When using decimal degrees, latitudes above
the equator and longitudes east of the prime meridian are positive, and latitudes
below the equator and longitudes west of the prime meridian are negative (see
the following table for examples) (Campbell and Shin, 2012: 45):

Converting from DMS to DD is a relatively straightforward exercise. For example,
since there are sixty minutes in one degree, we can convert 118° 15 minutes to
118.25 (118 + 15/60). Note that an online search of the term “coordinate
conversion” will return several coordinate conversion tools (Campbell and Shin,
2012: 46).
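
The conversion described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name dms_to_dd and the use of a hemisphere letter (N, S, E, W) to set the sign are assumptions made for this example.

```python
def dms_to_dd(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees-minutes-seconds to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    # Sixty minutes per degree and sixty seconds per minute.
    dd = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    # South latitudes and west longitudes are negative in decimal degrees.
    return -dd if hemisphere in ("S", "W") else dd


print(dms_to_dd(118, 15, 0, "W"))  # -118.25
```

The example reproduces the 118° 15 minutes case from the text, taken here as a west longitude so that the sign rule is also exercised.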
When we want to map things like mountains, rivers, streets, and buildings, we
need to define how the lines of latitude and longitude will be oriented and
positioned on the sphere. A datum serves this purpose and specifies exactly the
orientation and origins of the lines of latitude and longitude relative to the center
of the earth or spheroid (Campbell and Shin, 2012: 46).

Further Reading: (Bhatta, 2014: 619)


2.3.2 Shape of the Earth
The geoid is an equipotential gravitational surface, which is everywhere
perpendicular to the direction of gravity. Because of variations in the Earth's mass
distribution and the rotation of the Earth, the geoid has an irregular shape (Ghilani
and Wolf, 2012: 530).
The ellipsoid is a mathematical surface obtained by revolving an ellipse about the
Earth's polar axis. The dimensions of the ellipse are selected to give a good fit of
the ellipsoid to the geoid over a large area and are based upon surveys made in
the area (Ghilani and Wolf, 2012: 530).
A two-dimensional view, which illustrates conceptually the geoid and ellipsoid, is
shown in Figure 2-1. As illustrated, the geoid contains nonuniform undulations
(which are exaggerated in the figure for clarity) and is therefore not readily defined
mathematically. Ellipsoids, which approximate the geoid and can be defined
mathematically, are therefore used to compute positions of widely spaced points
that are located through control surveys. The Clarke Ellipsoid of 1866
approximates the geoid in North America very well and from 1879 until the 1980s
it was the ellipsoid used in NAD 27 as a reference surface for specifying geodetic
positions of points in the United States, Canada, and Mexico. Currently, the
Geodetic Reference System of 1980 (GRS80) and World Geodetic System of 1984
(WGS84) ellipsoids are commonly used in the United States because they provide
a good worldwide fit to the geoid. This is important because of the global surveying
capabilities of GNSS (Ghilani and Wolf, 2012: 530).

Figure 2-1: Geoid and Ellipsoid


Sizes and shapes of ellipsoids can be defined by two parameters. Table 19.1 lists
the parameters for the three ellipsoids noted above. For the Clarke 1866 ellipsoid,
the defining parameters were the semiaxes ‘a’ and ‘b’. For GRS80 and WGS84,
the defining parameters are the semimajor axis ‘a’ and flattening ‘f’. The
relationship between these three parameters is (Ghilani and Wolf, 2012: 531)

$$f = 1 - \frac{b}{a}$$

Other quantities commonly used in ellipsoidal computations are the first
eccentricity, e, and the second eccentricity, e’, of the ellipse, where (Ghilani and
Wolf, 2012: 531)

$$e = \frac{\sqrt{a^2 - b^2}}{a}, \qquad e' = \frac{\sqrt{a^2 - b^2}}{b}$$

Often the term eccentricity is understood to mean the first eccentricity and this
book will follow that convention. For each ellipsoid, the polar semiaxis b is only
about 21 km (13 mi) shorter than the equatorial semiaxis a. This means the
ellipsoid is nearly a sphere; hence, for some calculations involving moderate
lengths (usually up to about 50 km) this assumption can be made (see footnote 3).
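
As a worked illustration of these ellipsoid quantities, the sketch below derives the semiminor axis, both eccentricities, and the radius of the sphere of equal volume from a and f, using the standard relations f = 1 - b/a, e = sqrt(a^2 - b^2)/a and e' = sqrt(a^2 - b^2)/b. The GRS80 defining values (a = 6,378,137 m, 1/f = 298.257222101) are the commonly published ones and are supplied here as an assumption, since the parameter table is not reproduced in these notes. The function name is hypothetical.

```python
import math

# GRS80 defining parameters (assumed published values): semimajor axis in
# metres and flattening.
A_GRS80 = 6378137.0
F_GRS80 = 1 / 298.257222101


def ellipsoid_parameters(a, f):
    """Derive b, e, e' and the equal-volume sphere radius from a and f."""
    b = a * (1.0 - f)                        # from f = 1 - b/a
    e = math.sqrt(a * a - b * b) / a         # first eccentricity
    e_prime = math.sqrt(a * a - b * b) / b   # second eccentricity
    r = (a * a * b) ** (1.0 / 3.0)           # sphere with the same volume
    return {"b": b, "e": e, "e_prime": e_prime, "r": r}


params = ellipsoid_parameters(A_GRS80, F_GRS80)
print(A_GRS80 - params["b"])  # difference between the semiaxes, in metres
print(params["r"])            # radius of the equal-volume sphere, in metres
```

The difference between the two semiaxes comes out at roughly 21 km, and the equal-volume radius rounds to the 6,371,000 m quoted in footnote 3.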

Further Reading: (Bhatta, 2014: 621)


2.3.3 Datum
Depending on the need, situation, and location, there are several datums to
choose from. For instance, local datums try to match closely the spheroid to the
earth’s surface in a local area and return accurate local coordinates. A common
local datum used in the United States is called NAD83 (i.e., North American Datum
of 1983). For locations in the United States and Canada, NAD83 returns relatively
accurate positions, but positional accuracy deteriorates when outside of North
America (Campbell and Shin, 2012: 46).
The global WGS84 datum (i.e., World Geodetic System of 1984) uses the center
of the earth as the origin of the GCS and is used for defining locations across the
globe. Because the datum uses the center of the earth as its origin, locational
measurements tend to be more consistent regardless of where they are obtained on
the earth, though they may be less accurate than those returned by a local datum.
Note that switching between datums will alter the coordinates (i.e., latitude and
longitude) for all locations of interest (Campbell and Shin, 2012: 46).

Further Reading: (Bhatta, 2014: 623)

2.3.4 Projected Coordinate System


A projected coordinate system is defined on a flat, two-dimensional surface. A
projected coordinate system, unlike a geographic one, has the advantage that
lengths, angles, and areas are constant across the two dimensions. This is not
true when working in a geographic coordinate system. A projected coordinate
system is always based on a geographic coordinate system that can use a sphere
or spheroid (Kennedy, 2000: 16).
In a projected coordinate system, locations are identified by x, y coordinates on a
grid, with the origin at the centre of the grid. Each position has two values
referencing it to that central location. One specifies its horizontal position and the
other its vertical position. The two values are called the x-coordinate and y-
coordinate. Using this notation, the coordinates at the origin are x = 0 and y = 0
(Kennedy, 2000: 16).

Footnote 3: In computations, if the ellipsoid is assumed to be a sphere, its radius is
usually taken such that its volume is the same as that of the reference ellipsoid. It is
computed from $r = \sqrt[3]{a^2 b}$. For the GRS80 ellipsoid, its rounded value is
6,371,000 m.

Further Reading: (Bhatta, 2014: 624)

2.4 Map Projections


A map projection is the process of transforming a location on the curved surface of
the Earth, given by geodetic coordinates (φ, λ), to planar map coordinates (x, y).
More than 400 different map projections have been proposed. Map projections
are classified by the following parameters (Murai, 1998b):
• Projection plane: perspective, conical, cylindrical
• Aspect: normal, transverse, oblique
• Property: conformal, equivalent (equal-area), equidistant
Previously we noted that the earth is really big. Not only is it big, but it is a big
round spherical shape called a spheroid. A globe is a very common and very good
representation of the three-dimensional, spheroid earth. One of the problems with
globes, however, is that they are not very portable (i.e., you cannot fold a globe
and put it in your pocket), and their small scale makes them of limited practical
use (i.e., geographic detail is sacrificed). To overcome these issues, it is necessary
to transform the three-dimensional shape of the earth to a two-dimensional
surface like a flat piece of paper, computer screen, or mobile device display in
order to obtain more useful map forms and map scales. Enter the map projection
(Campbell and Shin, 2012: 46).
Map projections refer to the methods and procedures that are used to transform
the spherical three-dimensional earth into two-dimensional planar surfaces.
Specifically, map projections are mathematical formulas that are used to translate
latitude and longitude on the surface of the earth to x and y coordinates on a
plane. Since there are an infinite number of ways this translation can be
performed, there are an infinite number of map projections. The mathematics
behind map projections are beyond the scope of this introductory overview and
for simplicity, the following discussion focuses on describing types of map
projections, the distortions inherent to map projections, and the selection of
appropriate map projections (Campbell and Shin, 2012: 47).
To illustrate the concept of a map projection (Figure 2-2), imagine that we place
a light bulb in the center of a translucent globe. On the globe are outlines of the
continents and the lines of longitude and latitude called the graticule. When we
turn the light bulb on, the outline of the continents and the graticule will be
“projected” as shadows on the wall, ceiling, or any other nearby surface. This is
what is meant by map “projection” (Campbell and Shin, 2012: 47).

Figure 2-2: Concept of Map Projection (Campbell and Shin, 2012)

2.4.1 Map Projection Surfaces and Orientation


Within the realm of maps and mapping, there are three surfaces used for map
projections (i.e., surfaces on which we project the shadows of the graticule). These
surfaces are the plane, the cylinder, and the cone. Referring again to the previous
example of a light bulb in the center of a globe, note that during the projection
process, we can situate each surface in any number of ways. For example,
surfaces can be tangential to the globe along the equator or poles, they can pass
through or intersect the surface, and they can be oriented at any number of angles
(Campbell and Shin, 2012: 48).

Figure 2-3: Map Projection Surfaces (Campbell and Shin, 2012: 48)
In fact, naming conventions for many map projections include the surface as well
as its orientation. For example, as the name suggests, “planar” projections use
the plane, “cylindrical” projections use cylinders, and “conic” projections use the
cone. For cylindrical projections, the “normal” or “standard” aspect refers to when
the cylinder is tangential to the equator (i.e., the axis of the cylinder is oriented
north–south). When the axis of the cylinder is perfectly oriented east–west, the
aspect is called “transverse,” and all other orientations are referred to as “oblique.”
Regardless of the orientation or the surface on which a projection is based, a number
of distortions will be introduced that will influence the choice of map projection
(Campbell and Shin, 2012: 48).

2.4.2 Cylindrical Projection
These projections are developed by transforming the spherical surface to a
tangent or secant cylinder. Mathematically, a cylinder wrapped around the equator
is expressed with x equal to longitude, and the y coordinate some function of
latitude. An example is the Mercator projection (Fazal, 2008: 69).
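
To make the "x from longitude, y from latitude" idea concrete, the following sketch implements the spherical form of the Mercator projection, x = R·λ and y = R·ln tan(π/4 + φ/2), together with its inverse. It assumes a spherical Earth of radius 6,371,000 m; production GIS software normally uses ellipsoidal formulas, and the function names are chosen only for this example.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed spherical radius, in metres


def mercator_forward(lat_deg, lon_deg, radius=EARTH_RADIUS_M):
    """Project latitude/longitude in degrees to spherical Mercator x, y."""
    if abs(lat_deg) >= 90.0:
        raise ValueError("Mercator is undefined at the poles")
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * lon                                         # longitude only
    y = radius * math.log(math.tan(math.pi / 4 + lat / 2))   # latitude only
    return x, y


def mercator_inverse(x, y, radius=EARTH_RADIUS_M):
    """Recover latitude/longitude in degrees from spherical Mercator x, y."""
    lon = x / radius
    lat = 2 * math.atan(math.exp(y / radius)) - math.pi / 2
    return math.degrees(lat), math.degrees(lon)


x, y = mercator_forward(45.0, 90.0)
print(x, y)
print(mercator_inverse(x, y))
```

Because y grows without bound as latitude approaches 90 degrees, the poles cannot be shown, which is why the forward function rejects them.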
2.4.3 Conical Projection
The transformation is made to the surface of a cone tangent at a small circle
(tangent case) or intersecting at two small circles (secant case) on a globe.
Mathematically, this projection is also expressed as mappings from latitude and
longitude to polar coordinates, but with the origin located at the apex of the cone.
The examples are (Fazal, 2008: 69):
• Albers conical equal-area projection with two standard parallels
• Lambert conformal conic projection with two standard parallels
• Equidistant conic projection with one standard parallel
2.4.4 Azimuthal (Planar) Projection
A flat sheet is placed in contact with a globe, and points are projected from the
globe to the sheet. Mathematically, the projection is easily expressed as mappings
from latitude and longitude to polar coordinates with the origin located at the point
of contact with the paper. The examples are (Fazal, 2008: 69):
• Stereographic projection
• Gnomonic projection
• Lambert’s azimuthal equal-area projection
• Orthographic projection
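
The mapping to polar coordinates can be illustrated for the simplest case, in which the plane touches a sphere at the North Pole. The sketch below is an assumption-laden example, not a general implementation: it uses a sphere of radius R, the polar aspect only, and the standard spherical radial functions for the four projections listed above. The dictionary and function names are hypothetical.

```python
import math

# Radial distance from the point of contact as a function of the angular
# distance c from that point, for a sphere of radius R (standard spherical forms).
RADIAL_FUNCTIONS = {
    "stereographic": lambda c, R: 2.0 * R * math.tan(c / 2.0),
    "gnomonic": lambda c, R: R * math.tan(c),
    "equal_area": lambda c, R: 2.0 * R * math.sin(c / 2.0),
    "orthographic": lambda c, R: R * math.sin(c),
}


def polar_azimuthal(lat_deg, lon_deg, kind="stereographic", radius=1.0):
    """North-polar azimuthal projection returning polar coordinates (r, theta)."""
    if kind not in RADIAL_FUNCTIONS:
        raise ValueError("unknown projection: " + kind)
    c = math.radians(90.0 - lat_deg)   # angular distance from the pole
    if kind == "gnomonic" and c >= math.pi / 2:
        raise ValueError("gnomonic projection cannot reach the equator")
    if kind == "orthographic" and c > math.pi / 2:
        raise ValueError("orthographic projection shows one hemisphere only")
    r = RADIAL_FUNCTIONS[kind](c, radius)
    theta = math.radians(lon_deg)      # the polar angle is simply the longitude
    return r, theta


for name in RADIAL_FUNCTIONS:
    print(name, polar_azimuthal(45.0, 30.0, name))
```

All four projections share the same angle and differ only in how the radial distance grows away from the point of contact, which is what gives each one its particular distortion pattern.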

2.5 Common Map Projections


(Kennedy, 2000)
2.5.1 Universal Transverse Mercator (UTM)

2.6 Geo-referencing
Many data sources lack formal spatial referencing. Some CAD and GIS data sets
are developed in a generic “design” space and have unique, often proprietary,
types of referencing that simply need reinterpretation to be spatially integrated
into a GIS environment. However, many of these sources are scanned raster data
(digital imagery) that have only the coordinates of a raw pixel grid from an original
scan. While these raster sources are often unique and critical to a GIS
project, images also need to be referenced from scratch, spatially transformed
into a defined coordinate referencing system, then integrated and overlaid in a
GIS environment. This process is known as geo-referencing (Galati, 2006: 8).
The ability to perform accurate and timely spatial referencing adds a measure of
customization to any GIS operation or project. Raster imagery, such as hardcopy
maps and aerial photography, is the most popular type of data to use with geo-
referencing, since it is the most commonly available type of data to use. Scanning
imagery also alleviates the need to perform time-consuming and repetitive
digitization efforts (i.e., transforming hardcopy to an electronic, digital file)
(Galati, 2006: 8).
Geo-referencing is the art of selecting common point locations in the real world
using at least two data sources: an unreferenced source (such as a raster map)
and a referenced source of the same area providing positional information. Basic
geo-referencing procedures involve point selection and transformation. For
example, when a hardcopy map is scanned to an electronic file, it has no
relationship to any real-world coordinate system. The geo-referencing process
establishes (or in some cases re-establishes) the relationship between image pixel
locations and real-world locations. Geo-referencing is accomplished by first
selecting points on a source image (scanned raster map) with known coordinates
for the real-world surface location (benchmarks, grid ticks, road intersections, and
so on). These real-world coordinates are then linked to the corresponding pixel
grid coordinates in the raster source image. After the image is geo-referenced,
each pixel has a real-world coordinate value assigned to it (Galati, 2006: 8).
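
The point selection and transformation steps can be sketched with an affine transformation, one common choice for linking pixel coordinates to map coordinates. The example below is a minimal sketch under stated assumptions: exactly three non-collinear control points, an exact fit solved with Cramer's rule, and made-up coordinates. Real workflows usually collect more control points and fit by least squares so that residual errors can be assessed.

```python
def solve_3x3(m, v):
    """Solve the 3x3 linear system m * p = v with Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

    d = det(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    solution = []
    for k in range(3):
        mk = [row[:] for row in m]
        for i in range(3):
            mk[i][k] = v[i]          # replace column k with the right-hand side
        solution.append(det(mk) / d)
    return solution


def fit_affine(control_points):
    """Fit x = a*col + b*row + c and y = d*col + e*row + f from three points."""
    m = [[col, row, 1.0] for (col, row), _ in control_points]
    coeff_x = solve_3x3(m, [xy[0] for _, xy in control_points])
    coeff_y = solve_3x3(m, [xy[1] for _, xy in control_points])
    return coeff_x, coeff_y


def pixel_to_world(col, row, coeff_x, coeff_y):
    """Apply the fitted affine transformation to one pixel location."""
    return (coeff_x[0] * col + coeff_x[1] * row + coeff_x[2],
            coeff_y[0] * col + coeff_y[1] * row + coeff_y[2])


# Hypothetical control points: pixel (col, row) -> map (x, y) in metres.
gcps = [((0, 0), (500000.0, 4200000.0)),
        ((1000, 0), (510000.0, 4200000.0)),
        ((0, 1000), (500000.0, 4190000.0))]
coeff_x, coeff_y = fit_affine(gcps)
print(pixel_to_world(500, 500, coeff_x, coeff_y))
```

In this made-up case each pixel covers 10 m on the ground, and the negative row coefficient reflects that image rows are counted downward while map northings increase upward.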

2.7 Accuracy and Precision and Error


So far we have used the terms error, accuracy and precision without appropriately
defining them. Accuracy should not be confused with precision, which is a
statement of the smallest unit of measurement to which data can be recorded. In
conventional surveying and mapping practice, accuracy and precision are closely
related. Instruments with an appropriate precision are employed, and surveying
methods chosen, to meet specified accuracy tolerances. In GIS, however, the
numerical precision of computer processing and storage usually exceeds the
accuracy of the data. This can give rise to so-called spurious accuracy, for example
calculating area sizes to the nearest m² from coordinates obtained by digitizing a
1 : 50,000 map (de By and Huisman, 2009: 285).
Using graphs that display the probability distribution (for which see below) of a
measurement against the true value T, the relationship between accuracy and
precision can be clarified. In Figure 5.2, we depict the cases of good/bad accuracy
against good/bad precision. An accurate measurement has a mean close to the
true value; a precise measurement has a sufficiently small variance (de By and
Huisman, 2009: 285).
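
The distinction can be made numerical with a small sketch. It assumes that repeated measurements of a known true value T are available and summarizes accuracy by the bias (mean minus T) and precision by the variance; the sample values and the function name are invented for illustration.

```python
import statistics


def accuracy_and_precision(measurements, true_value):
    """Return (bias, variance): bias gauges accuracy, variance gauges precision."""
    bias = statistics.mean(measurements) - true_value
    variance = statistics.pvariance(measurements)  # population variance
    return bias, variance


T = 100.0  # hypothetical true value
accurate_but_imprecise = [97.0, 103.0, 99.0, 101.0]
precise_but_inaccurate = [104.9, 105.1, 105.0, 105.0]

print(accuracy_and_precision(accurate_but_imprecise, T))
print(accuracy_and_precision(precise_but_inaccurate, T))
```

The first set has a mean equal to T but a wide spread, while the second is tightly clustered around a value that is five units away from T.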

CHAPTER 3 SPATIAL DATA MODEL

Spatial Data Models: [4 hrs]


Concept of data model; raster data model; compression; indexing and hierarchical
data structures; vector data model; topology; TIN data model.

3.1 Concept of GIS Data


The basic premise of data in a GIS reflects traditional data found on a map
(Buckley, 1997: 24). Accordingly, data used in GIS can be classified as (i) Spatial
and (ii) Non-spatial.

3.2 Spatial Data Model


Spatial data, also called graphic data, consist in general of natural and cultural
features (Ghilani and Wolf, 2012: 846). Spatial data comprise information such
as the location, shape, size and orientation of geographical objects (Fazal, 2008:
100). For example, for a parcel of land, the coordinate pair of its center describes
its location; square, rectangle, trapezoid, etc., describe its shape; its size is given
by the lengths of its sides and its orientation by the rotation of one of its sides.
The fundamental elements that describe spatial data are (i) point, (ii) line, (iii) area,
(iv) pixel, and (v) grid cell (Ghilani and Wolf, 2012: 847).
Spatial data describe the geographic and spatial aspects of phenomena. Traditionally,
spatial data have been stored and presented in the form of a map. Two basic types
of spatial data models have evolved for storing geographic data digitally. These
are referred to as (Buckley, 1997: 25):
• Raster; and
• Vector.
Other data models that are frequently used in GIS are images and TIN.
The real world is too complex for our immediate and direct understanding, so we
create models of abstractions of reality that are intended to have some similarity
with selected aspects of the real world. A spatial database is a collection of
spatially referenced data that act as a model of reality (Bhatta, 2014: 477).
3.2.1 Raster Data Model
Raster data models incorporate the use of a grid-cell data structure where the
geographic area is divided into cells identified by row and column. This data
structure is commonly called raster. While the term raster implies a regularly
spaced grid, other tessellated data structures do exist in grid-based GIS.
Several tessellated data structures such as hexagonal, triangular and rectangular
tessellation exist; however, squares are the most commonly used in GIS. In this
model, the data structure involves a division of spatial data into regularly spaced
cells. Each cell is of the same shape and size (Buckley, 1997: 28).
The raster cell size is an important factor. Smaller cells improve data quality
because they can provide more detail. As cell size increases, data definition
decreases or blurs. Cell size in a raster file is referred to as resolution. Spatial
resolution of a raster is defined as the size of the smallest recording unit or the
smallest size of geographic area represented by a cell (Bhatta, 2014: 479).

Conceptually, raster models are the simplest of available spatial data models. We
can create a raster of elevation values by encoding each cell with a value that
best represents the elevation in that cell area.
When we are finished, every cell will have a value. This will be a numeric value of
elevation. We could also give values for other features such as roads, streams,
etc., but on separate layers (Bhatta, 2014: 479).
A raster is a set of regularly spaced (and contiguous) cells with associated (field)
values. The associated values represent cell values, not point values. This means
that the value for a cell is assumed to be valid for all locations within the cell (de
By and Huisman, 2009: 86).
The size of cells in a tessellated data structure is selected on the basis of the data
accuracy and the resolution needed by the user. There is no explicit coding of
geographic coordinates required since that is implicit in the layout of the cells. A
raster data structure is in fact a matrix where any coordinate can be quickly
calculated if the origin point is known, and the size of the grid cells is known. Since
grid-cells can be handled as two-dimensional arrays in computer encoding many
analytical operations are easy to program. This makes tessellated data structures
a popular choice for many GIS software packages. Topology is not a relevant concept with
tessellated structures since adjacency and connectivity are implicit in the location
of a particular cell in the data matrix (Buckley, 1997: 28).
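
The implicit georeferencing of a raster can be sketched as follows. The class is hypothetical and assumes square cells, an origin at the upper-left corner of the grid, and rows numbered downward from zero; other conventions (for example a lower-left origin) are equally possible.

```python
import math


class RasterGrid:
    """Hypothetical raster georeference: upper-left origin and square cells."""

    def __init__(self, origin_x, origin_y, cell_size, n_rows, n_cols):
        self.origin_x = origin_x
        self.origin_y = origin_y
        self.cell_size = cell_size
        self.n_rows = n_rows
        self.n_cols = n_cols

    def cell_center(self, row, col):
        """Coordinates of a cell centre, computed from origin and cell size."""
        x = self.origin_x + (col + 0.5) * self.cell_size
        y = self.origin_y - (row + 0.5) * self.cell_size  # rows run downward
        return x, y

    def cell_at(self, x, y):
        """Row and column of the cell that contains the coordinate (x, y)."""
        col = math.floor((x - self.origin_x) / self.cell_size)
        row = math.floor((self.origin_y - y) / self.cell_size)
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise ValueError("coordinate lies outside the raster")
        return row, col


grid = RasterGrid(origin_x=1000.0, origin_y=2000.0, cell_size=30.0,
                  n_rows=8, n_cols=8)
print(grid.cell_center(0, 0))       # (1015.0, 1985.0)
print(grid.cell_at(1095.0, 1900.0))  # (3, 3)
```

No coordinates are stored per cell; only the origin, the cell size, and the grid dimensions are needed to move between matrix indices and map positions.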
Since geographic data is rarely distinguished by regularly spaced shapes, cells
must be classified as to the most common attribute for the cell. The problem of
determining the proper resolution for a particular data layer can be a concern. If
one selects too coarse a cell size, then data may be overly generalized. If one
selects too fine a cell size, then too many cells may be created, resulting in large
data volumes, slower processing times, and a more cumbersome data set. As well,
one can imply an accuracy greater than that of the original data capture process
and this may result in some erroneous results during analysis (Buckley, 1997:
28).
As well, since most data is captured in a vector format, e.g. digitizing, data must
be converted to the raster data structure. This is called vector-raster conversion.
Most GIS software allows the user to define the raster grid (cell) size for vector-
raster conversion. It is imperative that the original scale, e.g. accuracy, of the
data be known prior to conversion. The accuracy of the data, often referred to as
the resolution, should determine the cell size of the output raster map during
conversion (Buckley, 1997: 28).
Most raster based GIS software requires that the raster cell contain only a single
discrete value. Accordingly, a data layer, e.g. forest inventory stands, may be
broken down into a series of raster maps, each representing an attribute type,
e.g. a species map, a height map, a density map, etc. These are often referred to
as one-attribute maps. This is in contrast to most conventional vector data models
that maintain data as multiple-attribute maps, e.g. forest inventory polygons
linked to a database table containing all attributes as columns. This basic
distinction of raster data storage provides the foundation for quantitative analysis
techniques, often referred to as raster or map algebra. The use of raster
data structures allows for sophisticated mathematical modelling processes while
vector based systems are often constrained by the capabilities and language of a
relational DBMS (Buckley, 1997: 28).
This difference is the major distinguishing factor between vector and raster based
GIS software. It is also important to understand that the selection of a particular
data structure can provide advantages during the analysis stage. For example, the
vector data model does not handle continuous data, e.g. elevation, very well while
the raster data model is better suited to this type of analysis. Conversely,
the raster structure does not handle linear data analysis, e.g. shortest path, very
well while vector systems do. It is important for the user to understand that there
are certain advantages and disadvantages to each data model (Buckley, 1997:
29).
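
The raster or map algebra mentioned above can be illustrated with a cell-by-cell (local) operation on two one-attribute layers. This is a minimal sketch using plain nested lists; the layer values, the thresholds, and the function name local_op are all hypothetical, and real GIS packages provide far richer operators.

```python
def local_op(func, *layers):
    """Apply func cell by cell to rasters that share the same shape."""
    shape = (len(layers[0]), len(layers[0][0]))
    for layer in layers:
        # Only the first row is measured, so rows are assumed equally long.
        if (len(layer), len(layer[0])) != shape:
            raise ValueError("all layers must have the same shape")
    # zip(*layers) pairs up rows; zip(*rows) pairs up cells within those rows.
    return [[func(*cells) for cells in zip(*rows)] for rows in zip(*layers)]


height = [[12, 15], [20, 8]]          # hypothetical stand height, in metres
density = [[300, 250], [100, 400]]    # hypothetical stems per hectare

tall_and_dense = local_op(lambda h, d: int(h >= 12 and d >= 250),
                          height, density)
print(tall_and_dense)  # [[1, 1], [0, 0]]
```

Because each layer holds a single attribute in an identical grid, combining layers reduces to simple arithmetic or logic on matching cells.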
Further Reading: (Bhatta, 2014: 479)
Data Compression
Although the raster data model has many uses in GIS, one of the main operational
problems associated with it is the sheer amount of raw data that must be stored
(Longley et al., 2004: 182). Approximate file sizes are 1.1 MB for a 30 m DEM,
9.9 MB for a 10 m DEM and so on. The memory requirement becomes even higher
for high-resolution raster data (Chang, 2014: 87).
Data compression refers to the reduction of data volume, a topic particularly
important for data delivery and web mapping. A variety of compression techniques
are available for raster data compression. Compression techniques can be lossless
or lossy. A lossless compression method allows the original data to be precisely
reconstructed. Run-length encoding is an example of lossless compression. With a lossy
compression method, the original data cannot be fully reconstructed, but high
compression ratios can be achieved. Image degradation through lossy compression
can affect GIS related tasks such as extracting ground control points from aerial
photographs or satellite images for the purpose of geo-referencing (Chang, 2014:
87, 88).
To improve storage efficiency many types of raster compression technique have
been developed such as run-length encoding, chain encoding, block encoding,
wavelet compression, and quad-trees (Longley et al., 2004: 182).
Run-length Encoding (RLE)
Run-length encoding is perhaps the simplest compression method and is very
widely used. It involves encoding adjacent row cells that have the same value,
with a pair of values indicating the number of cells with the same value, and the
actual value (Longley et al., 2004: 183). Run-length encoding (RLE) allows the
cells in each mapping unit to be stored per row for each class (or object) in terms
of a beginning cell and end cell, and an attribute (Burrough and McDonnell, 2014).
Data type is usually integer. Consider the raster shown in Figure 3-1, which has 8
columns and 8 rows, hence 64 cells. The shaded region is stored in RLE by
recording the beginning cell number and ending cell number per row, from left to
right.

Figure 3-1: RLE of Binary Data (Bhatta, 2014)
In this example, the 26 cells of the shaded region have been completely coded by
12 values. To explain, if we consider the first line, it says that the shaded region
starts at 1st cell of row 5 and ends at 1st cell, then it starts again at 5th cell and
ends at 8th cell. This example is useful for bitonal (also called binary) raster
(Bhatta, 2014: 482).
A raster with multiple values (or classes) requires a different compaction
technique to store the data. In this case, data can be encoded as pairs of numbers,
first the run length and then the cell value, as shown in Figure 3-2 (Bhatta,
2014: 483).

Figure 3-2: RLE of Multi-class Raster (Bhatta, 2014)
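
Both variants of run-length encoding can be sketched per row. The code below is illustrative only: the function names are hypothetical, the binary row is typed in to mirror the row described in the text (shaded at cell 1 and at cells 5 to 8), and the multi-class row is invented rather than taken from Figure 3-2.

```python
def binary_runs(row):
    """Return 1-based (start, end) cell numbers of each shaded run in a 0/1 row."""
    runs, start = [], None
    for i, cell in enumerate(row, start=1):
        if cell and start is None:
            start = i                      # a shaded run begins here
        elif not cell and start is not None:
            runs.append((start, i - 1))    # the run ended at the previous cell
            start = None
    if start is not None:
        runs.append((start, len(row)))     # run reaches the end of the row
    return runs


def rle_encode_row(row):
    """Encode a multi-class row as (run_length, value) pairs."""
    runs = []
    for value in row:
        if runs and runs[-1][1] == value:
            runs[-1][0] += 1
        else:
            runs.append([1, value])
    return [tuple(run) for run in runs]


def rle_decode_row(runs):
    """Rebuild the original row, showing that the encoding is lossless."""
    row = []
    for length, value in runs:
        row.extend([value] * length)
    return row


print(binary_runs([1, 0, 0, 0, 1, 1, 1, 1]))   # [(1, 1), (5, 8)]
classes = [1, 1, 1, 2, 2, 3, 3, 3]
encoded = rle_encode_row(classes)
print(encoded)                                  # [(3, 1), (2, 2), (3, 3)]
print(rle_decode_row(encoded) == classes)       # True
```

The saving depends on how homogeneous the rows are; a row in which every cell differs from its neighbour would take more space encoded than raw.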


Chain Encoding
Chain codes represent the raster boundary of a region by giving a starting point
and the cardinal directions (east, north, west and south) to follow as we progress
around the boundary. The shaded region of Figure 3-1 can be encoded
starting from row = 5 and column = 1 in a clockwise direction (considering east =
0, north = 1, west = 2, and south = 3):

Origin    Value    Sequence of unit

5,1       2        2¹, 5⁰, 2¹, 4⁰, 4³, 4², 2¹, 4², 2³, 2¹, 2², 3¹

where the directions are given by superscript numbers.
To describe this structure, from origin (5,1) value 2 can be found in an area of 2
cells in the direction of south, 5 cells in direction of east, 2 cells in the direction of
north, and so on. Similarly, other objects (or boundaries) in the raster can also be
encoded (Bhatta, 2014: 483).
Chain codes can be stored using integer data types and therefore provide a very
compact way of storing a region representation; they allow certain operations such
as estimation of areas and perimeters, or detection of sharp turns, etc. They are
also useful for automated raster to vector conversion. Overlay operations such as
union and intersection are difficult to perform with chain codes. Another limitation
is the redundancy introduced because common boundaries between adjacent
regions must be stored twice (Freeman, 1974) in (Bhatta, 2014: 484).
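
The perimeter and area operations mentioned above can be sketched directly on a chain code. The example is a hedged illustration: it stores the chain as (length, direction) pairs with the direction codes used in the text, assumes a y axis that increases northward, and traces a made-up 3-by-2 rectangle rather than the region of Figure 3-1.

```python
# Direction codes as in the text: east = 0, north = 1, west = 2, south = 3.
STEPS = {0: (1, 0), 1: (0, 1), 2: (-1, 0), 3: (0, -1)}


def trace_chain(origin, chain):
    """Follow (length, direction) pairs from origin and return the vertices."""
    x, y = origin
    vertices = [(x, y)]
    for length, direction in chain:
        dx, dy = STEPS[direction]
        x, y = x + dx * length, y + dy * length
        vertices.append((x, y))
    return vertices


def perimeter(chain):
    """Boundary length in cell units: the sum of all run lengths."""
    return sum(length for length, _ in chain)


def enclosed_area(vertices):
    """Area inside a closed boundary, using the shoelace formula."""
    if vertices[0] != vertices[-1]:
        raise ValueError("chain does not return to its origin")
    twice_area = sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]))
    return abs(twice_area) / 2


# Hypothetical rectangle traced clockwise from its lower-left corner:
# 2 north, 3 east, 2 south, 3 west.
chain = [(2, 1), (3, 0), (2, 3), (3, 2)]
vertices = trace_chain((0, 0), chain)
print(perimeter(chain))         # 10
print(enclosed_area(vertices))  # 6.0
```

Checking that the trace returns to its origin is a cheap way to detect a damaged chain before computing the area.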
Block Encoding
Block encoding is a 2D version of run-length encoding in which areas of common
cell values are represented with a single value. An array is defined as a series of
square blocks of the largest size possible. Recursively, the array is divided using
blocks of smaller and smaller size. It is sometimes described as a quadtree data
structure (Longley et al., 2004: 183).

Figure 3-3: Block Encoding of Raster Data (Bhatta, 2014)


Wavelet Transform
Wavelet compression techniques invoke principles similar to those discussed in
the treatment of fractals. They remove information by recursively examining
patterns in datasets at different scales, always trying to reproduce a faithful
representation of the original. A useful by-product of this for geographic
applications is that wavelet-compressed raster layers can be quickly viewed at
different scales with appropriate amounts of detail. MrSID (Multiresolution
Seamless Image Database) from LizardTech is an example of a wavelet
compression technique that is widely used in geographic applications, especially
for compressing aerial photographs. Similar wavelet compression algorithms are
available from other public and private sources and have been incorporated into
the JPEG 2000 standard which is increasingly being used for image compression.
Run-length and block encoding both result in lossless compression of raster layers,
that is, a layer can be compressed and decompressed without degradation of
information. In contrast, the MrSID wavelet compression technique is lossy since
information is irrevocably discarded during compression. Although MrSID
compression results in very high compression ratios, because information is lost
its use is limited to applications that do not need to use the raw digital numbers
for processing or analysis. It is not appropriate for compressing DEMs for example,
but many organizations use it to compress scanned maps and aerial photographs
when access to the original data is not necessary (Longley et al., 2004: 183).
Quad-tree Encoding
The quad-tree method divides a geographic area into square cells using the
principles of recursive subdivision of non-homogeneous square array of cells into
four equal sized quadrants. The quartering is continued to a suitable level until a
square is found to be homogeneous. The underlying structure is a levelled tree
where all non-leaf nodes have exactly four descendants (Bhatta, 2014: 485).
Figure 3-4 shows the successive division of a region into quadrant blocks. This
block structure may be described by a tree of degree 4, known as a quad-tree. The
entire array of 2^n × 2^n cells corresponds to the root node of the tree, and the height of
the tree is at most n levels. Each non-leaf node has four children, respectively the NW, NE,
SW, and SE quadrants. Leaf nodes correspond to those quadrants for which no
further subdivision is necessary (Bhatta, 2014: 485).

Figure 3-4: Quad-tree Encoding and Quad-tree Hierarchy (Bhatta, 2014)
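
The recursive subdivision can be sketched as follows. This is a minimal illustration that assumes a square grid whose side is a power of two, represents a leaf by its cell value and a non-leaf node by a dictionary of four children, and uses an invented 4 × 4 grid rather than the one in Figure 3-4.

```python
def build_quadtree(grid, row=0, col=0, size=None):
    """Recursively subdivide a 2^n x 2^n grid until each block is homogeneous."""
    if size is None:
        size = len(grid)
    first = grid[row][col]
    if all(grid[r][c] == first
           for r in range(row, row + size)
           for c in range(col, col + size)):
        return first                      # leaf: one value for the whole block
    half = size // 2
    return {                              # non-leaf: exactly four children
        "NW": build_quadtree(grid, row, col, half),
        "NE": build_quadtree(grid, row, col + half, half),
        "SW": build_quadtree(grid, row + half, col, half),
        "SE": build_quadtree(grid, row + half, col + half, half),
    }


def count_leaves(node):
    """Number of leaf blocks needed to represent the grid."""
    if not isinstance(node, dict):
        return 1
    return sum(count_leaves(child) for child in node.values())


grid = [[0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 1],
        [0, 0, 1, 1]]
tree = build_quadtree(grid)
print(tree)
print(count_leaves(tree))  # 7
```

Seven leaves replace sixteen cells here because three of the four top-level quadrants are homogeneous and only the SE quadrant has to be split further.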


This structure is efficient for relatively homogeneous areas. Quad-trees are
‘variable resolution’ arrays in which detail is represented only when available
without requiring excessive storage of parts where detail is lacking. Quad-tree
representation allows a region to be split up into parts, or to contain holes, without
difficulty. The quad-tree has the added advantage of variable resolution. The ease
of subsequent processing varies with the data structure used. It is important to
realize that quad-tree is more concerned with the overall data model, rather than
just saving space (Bhatta, 2014: 486).
3.2.2 Vector Data Model
The vector model is close to the traditional mapping approach where the objects
are represented as points, lines, or areas. In a vector model, the positions of
points, lines, and areas are precisely specified. The position of each object is
defined by a (series of) coordinate pairs (Bhatta, 2014: 486).
Vectors are graphical objects that have geometrical primitives such as points,
lines, and polygons to represent geographical entities in computer graphics.
Vectors have a precise direction, length, and shape, and can be defined by
coordinate geometry (Bhatta, 2014: 486).

All spatial data models are approaches for storing the spatial location of
geographic features in a database. Vector storage implies the use of vectors
(directional lines) to represent a geographic feature. Vector data is characterized
by the use of sequential points or vertices to define a linear segment. Each vertex
consists of an X coordinate and a Y coordinate (Buckley, 1997: 27).
Vector lines are often referred to as arcs and consist of a string of vertices
terminated by a node. A node is defined as a vertex that starts or ends an arc
segment. Point features are defined by one coordinate pair, a vertex. Polygonal
features are defined by a closed set of coordinate pairs. In vector representation,
the storage of the vertices for each feature is important, as well as the connectivity
between features, e.g. the sharing of common vertices where features connect
(Buckley, 1997: 27). Vector data models can be structured in many different ways;
two of them are the spaghetti model and the topological model.
The vector data model is a coordinate-based data model that represents geographic
features as points,
lines, and polygons. Each point feature is represented as a single coordinate pair,
while line and polygon features are represented as ordered lists of vertices.
Attributes are associated with each vector feature, as opposed to a raster data
model, which associates attributes with grid cells. Vector models are useful for
storing data that has discrete boundaries, such as country borders, land parcels,
and streets (Shekar and Xiong, 2008: 215).
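As a rough illustration of this description, the Python sketch below stores a point as one coordinate pair and a line and a polygon as ordered vertex lists, each with its own attributes. The feature layout (a dictionary with "geometry" and "attributes" keys) and all coordinate values are assumptions made for the example, not a format used by any particular GIS.

```python
import math

# Hypothetical features: geometry as coordinates, attributes per feature.
well = {"geometry": (3.0, 4.0), "attributes": {"type": "well"}}
street = {
    "geometry": [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)],  # ordered vertices
    "attributes": {"name": "Main Street"},
}
parcel = {
    # Ring of vertices; the closing segment back to the first vertex is implied.
    "geometry": [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)],
    "attributes": {"owner": "Smith"},
}


def line_length(vertices):
    """Sum of the straight segment lengths between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(ring):
    """Area by the shoelace formula for an implicitly closed ring."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


print(line_length(street["geometry"]))   # 11.0
print(polygon_area(parcel["geometry"]))  # 12.0
```

Because positions are stored explicitly, length and area follow directly from coordinate geometry, which is what the precise specification of positions in a vector model makes possible.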
Spaghetti Model
The spaghetti model is an early vector data model that was originally developed
to organize and manipulate line data. Lines are captured individually with explicit
starting and ending nodes, and intervening vertices used to define the shape of
the line. The spaghetti model records each line separately. The model does not
explicitly enforce or record connections of line segments when they cross, nor
when two line ends meet. A shared polygon boundary may be represented twice,
with a line for each polygon on either side of the boundary. Data in this form are
similar in some respects to a plate of cooked spaghetti, with no ends connected
and no intersections when lines cross (Bolstad, 2012: 34, 35).
The spaghetti model severely limits spatial data analysis and is little used except
for very basic data entry or translation. Rudimentary data entry and editing
software may initially store data as “spaghetti”. Because spaghetti data are
unstructured and lines often do not connect when they should, many common spatial
analyses are inefficient or impossible, or their results are incorrect. Area calculation,
layer overlay, and many other analyses require “clean” spatial data in which all
polygons close and lines meet correctly. Spaghetti data are most often processed
to produce more structured, useful forms (Bolstad, 2012: 35).

Figure 3-1: Spaghetti and Topological Data Models


Unlinked (spaghetti) data usually include data derived either from the manual
digitizing of maps or from digital photogrammetric registration. Consequently,
spaghetti data are often viewed as raw digital data. These data are amenable to
graphic presentation (the delineation of borders, for example) even though they
may not form completely closed polygons. Otherwise, their usefulness in GIS
applications is severely limited (Fazal, 2008: 168).
One drawback is that both data storage and data searches are sequential. Hence
search times are often unduly long for such routine operations as finding
commonality between two polygons, determining line intersection points, or
identifying points within a given geographical area. Other operations vital in GIS,
such as overlaying and network analysis, are intractable. Furthermore, unlinked
data require an inordinate amount of storage memory because all polygons are
stored as independent coordinate sequences, which means that all lines common
to two neighbouring polygons are stored twice (Fazal, 2008: 169).
Spaghetti models (sometimes termed simple data models) are the simplest of the
vector-based models where the geometric representations of spatial features do
not have any explicit relationship (e.g., topological or network) to any other
spatial feature. The geometries may be points, lines, or polygons. There are no
constraints with respect to how geometries may be positioned – e.g., two lines may
intersect without a point being positioned at the location of intersection, or two or
more polygons may intersect without restriction (Shekar and Xiong, 2008: 216).
Spaghetti models may offer several advantages over other data models. These
advantages include simplicity of the model, ease of editing, and drawing
performance. The disadvantages of the spaghetti model include the possible
redundant storage of data and the computational expense in determining
topological or network relationships between features. In addition, spaghetti
models cannot be used to effectively represent surface data (Shekar and Xiong,
2008: 216).
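The redundancy described above can be made concrete with a small Python sketch. It assumes two invented, adjacent square parcels stored in spaghetti fashion as independent coordinate rings; the helper name edges is hypothetical. Nothing in the stored data says the parcels are neighbours, so the shared boundary has to be rediscovered by comparing coordinates.

```python
# Two neighbouring parcels stored as independent rings (invented coordinates).
parcel_a = [(0, 0), (2, 0), (2, 2), (0, 2)]
parcel_b = [(2, 0), (4, 0), (4, 2), (2, 2)]


def edges(ring):
    """Undirected edges of an implicitly closed ring."""
    return {frozenset(pair) for pair in zip(ring, ring[1:] + ring[:1])}


# The common boundary is found only by comparing every edge of both rings.
shared = edges(parcel_a) & edges(parcel_b)

stored_vertices = len(parcel_a) + len(parcel_b)
unique_vertices = len(set(parcel_a) | set(parcel_b))

print(shared)                             # {frozenset({(2, 0), (2, 2)})}
print(stored_vertices - unique_vertices)  # 2 vertices stored twice
```

The comparison only works here because both copies of the boundary have exactly the same coordinates. When the two versions were digitized separately and do not coincide, even this simple test fails, which is one reason spaghetti data are usually processed into a more structured form.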

Indexing and Hierarchical Data Structures


The partitioning and indexing of a spatial study area to optimize feature search
and retrieval operations in a horizontal fashion is called spatial indexing. Spatial
indexing is a necessary internal requirement for a GIS to handle large data sets
that cover a large study area (Buckley, 1997).
The proprietary organization of data layers in a horizontal fashion within a GIS is
known as spatial indexing. Spatial indexing is the method utilized by the software
to store and retrieve spatial data. A variety of different strategies exist for
speeding up the spatial feature retrieval process within a GIS software product.
Most involve the partitioning of the geographic area into manageable subsets or
tiles. These tiles are then indexed mathematically, e.g. by quad-trees or by R-trees
(rectangle trees), to allow for quick searching and retrieval when querying is
initiated by a user. Spatial indexing is analogous to the definition of map sheets,
except that specific indexing techniques are used to access data across map sheet
(tile) boundaries. This is done simply to improve query performance for large data
sets that span multiple map sheets, and to ensure data integrity across map sheet
boundaries (Buckley, 1997).
The notion of spatial indexing has become increasingly important in the design of
GIS software over the last few years, as larger scale applications have been
initiated using GIS technology. Users have found that often the response time in
querying very large data sets is unacceptably slow. GIS software vendors have
responded by developing sophisticated algorithms to index and retrieve spatial
data. It is important to note that raster systems, by the nature of their data
structure, do not typically require a spatial indexing method. The raster approach
imposes regular, readily addressable partitions on the data universe intrinsically
with its data structure. Accordingly, spatial indexing is usually not required.
However, the more sophisticated vector GIS does require a method to quickly
retrieve spatial objects (Buckley, 1997).
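The tiling strategy can be sketched with a simple uniform grid index in Python. This is a deliberately simplified stand-in for the quad-tree and R-tree indexes mentioned above, not how any particular GIS product implements indexing. The tile size, the feature identifiers and the bounding boxes are all invented for the example.

```python
from collections import defaultdict

TILE = 10.0  # assumed tile size in map units


def tile_of(x, y):
    """Column and row of the tile containing a coordinate."""
    return (int(x // TILE), int(y // TILE))


def overlaps(a, b):
    """True if two bounding boxes (xmin, ymin, xmax, ymax) intersect."""
    return a[0] <= b[2] and a[2] >= b[0] and a[1] <= b[3] and a[3] >= b[1]


def build_index(features):
    """Register each feature in every tile its bounding box touches."""
    index = defaultdict(set)
    for fid, (xmin, ymin, xmax, ymax) in features.items():
        c0, r0 = tile_of(xmin, ymin)
        c1, r1 = tile_of(xmax, ymax)
        for c in range(c0, c1 + 1):
            for r in range(r0, r1 + 1):
                index[(c, r)].add(fid)
    return index


def query(index, features, box):
    """Look only in the tiles touched by the query box, then test exactly."""
    c0, r0 = tile_of(box[0], box[1])
    c1, r1 = tile_of(box[2], box[3])
    candidates = set()
    for c in range(c0, c1 + 1):
        for r in range(r0, r1 + 1):
            candidates |= index.get((c, r), set())
    return sorted(fid for fid in candidates if overlaps(features[fid], box))


# Invented features, each given by its bounding box.
features = {
    "road": (2.0, 3.0, 18.0, 5.0),      # crosses a tile boundary
    "parcel": (12.0, 12.0, 14.0, 16.0),
    "well": (25.0, 25.0, 25.0, 25.0),
}
index = build_index(features)
print(query(index, features, (9.0, 0.0, 13.0, 6.0)))      # ['road']
print(query(index, features, (20.0, 20.0, 30.0, 30.0)))   # ['well']
```

The road is registered in both tiles it crosses, so a query near the tile boundary still finds it. This mirrors the point made above about accessing data across map sheet (tile) boundaries.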
Further Reading: 10.7.2 Indexing (Longley et al., 2004: 231)
Topology
To overcome the limitations of the simple method of storing polygons, GIS
systems draw on ideas first developed in a branch of mathematics called topology,
which can be broadly explained as the way in which area data is stored in GIS
systems (Fazal, 2008: 169).
Topology is the numerical description of the relationships between geographic
features, as encoded by adjacency, linkage, inclusion, or proximity. Thus a point
can be inside a region, a line can connect to others, and a region can have
neighbours (Buckley, 1997).
Spatial objects are classified into point objects such as meteorological stations,
line objects such as highways, and area objects such as agricultural land, which are
represented geometrically by points, lines and areas respectively. For spatial analysis
in GIS, geometry alone (position, shape and size in a coordinate system) is not
enough; topology is also required (Murai, 1998a).
Topology refers to the relationships or connectivity between spatial objects. The
geometry of a point is given by two dimensional coordinates (x, y), while line,
string and area are given by a series of point coordinates, as shown in Figure 3-5
(left). The topology however defines additional structure as follows (see Figure
3-5 (right)) (Murai, 1998a).

Figure 3-5: Geometry and topology of Vector Data Model (Bhatta, 2014)
Node: an intersection of more than two lines or strings, or the start or end point
of a string, identified by a node number.
Chain: a line or a string with a chain number, start and end node numbers, and left
and right neighbouring polygons.
Polygon: an area with a polygon number and the series of chains that form the area
in clockwise order (a minus sign is assigned in the case of anti-clockwise order).
The advantages of the topological data model are that it avoids duplication when
digitizing the common boundaries of two polygons, and that it resolves the problem
of two versions of a common boundary that do not coincide.
The disadvantages are that topological data sets must be built without a single
error, and that islands within a polygon cannot be represented.
In practical applications of GIS, all possible relationships in spatial data should be
used logically with more complicated data structures. Figure 3-6 shows several
topological relationships between spatial objects (Bhatta, 2014).

Figure 3-6: Topological Relationships
Further Reading: (Bhatta, 2014: 489-493), (Murai, 1998a: 14-17)
Topology has historically been viewed as a spatial data structure used primarily to
ensure that the associated data forms a consistent and clean topological fabric.
Topology is used most fundamentally to ensure data quality (e.g., no gaps or
overlaps between polygons representing land parcels) and allow a GIS to more
realistically represent geographic features. Topology allows you to control the
geometric relationships between features and maintain their geometric integrity
(Shekar and Xiong, 2008: 217).
The common representation of a topology is as a collection of topological
primitives – i.e., nodes, arcs, and faces, with explicit relationships between the
primitives themselves. For example, an arc would have a relationship to the face
on the left, and the face on the right. With advances in GIS development, an
alternative view of topology has evolved. Topology can be modelled as a collection
of rules and relationships that, coupled with a set of editing tools and techniques,
enables a GIS to more accurately model geometric relationships found in the world
(Shekar and Xiong, 2008: 217).
Topology, implemented as feature behaviour and user specified rules, allows a
more flexible set of geometric relationships to be modelled than topology
implemented as a data structure. For example, older data structure based
topology models enforce a fixed collection of rules that define topological integrity
within a collection of data. The alternative approach (feature behaviour and rules)
allows topological relationships to exist between more discrete types of features
within a feature dataset. In this alternative view, topology may still be employed
to ensure that the data forms a clean and consistent topological fabric, but also
more broadly, it is used to ensure that the features obey the key geometric rules
defined for their role in the database (Shekar and Xiong, 2008: 217).
Topology is a branch of mathematics that describes how spatial objects are related
to each other. The unique sizes, dimensions, and shapes of the individual objects
are not addressed by topology. Rather, it is only their relative relationships that
are specified (Ghilani and Wolf, 2012: 851).
In discussing topology, it is necessary to first define nodes, chains, and polygons.
These are some additional simple spatial objects that are commonly used for
specifying the topological relationships of information entered into GIS databases.
Nodes define the beginnings and endings of chains, or identify the junctions of
intersecting chains. Chains are similar to lines (or strings) and are used to define
the limits of certain areas or delineate specific boundaries. Polygons are closed
loops similar to areas and are defined by a series of connected chains. Sometimes
in topology, single nodes exist within polygons for labelling purposes (Ghilani and
Wolf, 2012: 851).
In GISs, the most important topological relationships are:
1. Connectivity. Specifying which chains are connected at which nodes.
2. Direction. Defining a "from node" and a "to node" of a chain.
3. Adjacency. Indicating which polygons are adjacent on the left and which are
adjacent on the right side of a chain.
4. Nestedness. Identifying what simple spatial objects are within a polygon.
They could be nodes, chains, or other smaller polygons.
The topological relationships just described are illustrated by example with
reference to Figure 3-7. For example, through connectivity it is established that
nodes 2 and 3 are connected to form the chain labelled b. Connectivity would also
indicate that at node 2, chains a, b, and f are connected. Topological relationships
are normally listed in tables and stored within the database of a GIS. The
Connectivity columns of Table 3-1 summarize all of the connectivity relationships
of Figure 3-7 (Ghilani and Wolf, 2012: 851).

Figure 3-7: Vector representation of Simple Graphic Record


Directions of chains are also indicated topologically in Figure 3-7. For example,
chain b proceeds from node 2 to node 3. Directions can be very important in a
GIS for establishing such things as the flow of a river or the direction traffic moves
on one-way streets. In a GIS, a consistent direction convention is often followed,
that is, proceeding clockwise around polygons. The Direction columns of Table 3-1
summarize the directions of all chains within Figure 3-7.

The topology of Figure 3-7 would also describe, through adjacency, that Smith
and Brown share a common boundary, which is chain f from node 5 to node 2,
and that Smith is on the left side of the chain and Brown is on the right. Obviously
the chain's direction must be stated before left or right positions can be declared.
The Adjacency columns of Table 3-1 list the adjacency relationships of Figure 3-7.
Note that a zero has been used to designate regions outside of the polygons and
beyond the area of interest.
Nestedness establishes that the well is contained within Brown's polygon. The
Nestedness columns of Table 3-1 list that topological information.
The relationships expressed through the identifiers for points, lines, and areas
(Table 28.1 of Ghilani and Wolf, 2012) and the topology in Table 3-1 conceptually
yield a "map." With these types of information available to the computer, the
analysis and query processes of a GIS are made possible (Ghilani and Wolf, 2012: 852).
Table 3-1: Topological Relationships in the Graphic Record of Figure 3-7

Connectivity     Direction                    Adjacency                            Nestedness
Nodes   Chain    Chain  From Node  To Node    Chain  Left Polygon  Right Polygon   Polygon  Nested Node
1-2     a        a      1          2          a      0             I               II       Well
2-3     b        b      2          3          b      0             II
3-4     c        c      3          4          c      0             II
4-5     d        d      4          5          d      0             II
5-1     e        e      5          1          e      0             I
5-2     f        f      5          2          f      I             II
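
As a minimal sketch of how such tables support queries, the Python below encodes the rows of Table 3-1 as one record per chain and answers connectivity, adjacency and nestedness questions by lookup rather than by geometric computation. The dictionary layout and the function names are assumptions made for this illustration; an actual GIS would store these tables in its own database structures.

```python
# One record per chain, taken from Table 3-1:
# chain -> (from node, to node, left polygon, right polygon).
# "0" designates the region outside the polygons.
CHAINS = {
    "a": (1, 2, "0", "I"),
    "b": (2, 3, "0", "II"),
    "c": (3, 4, "0", "II"),
    "d": (4, 5, "0", "II"),
    "e": (5, 1, "0", "I"),
    "f": (5, 2, "I", "II"),
}
NESTED = {"II": ["Well"]}  # nestedness: objects contained in a polygon


def chains_at_node(node):
    """Connectivity: which chains meet at a node."""
    return sorted(c for c, (start, end, _, _) in CHAINS.items()
                  if node in (start, end))


def boundary(polygon):
    """Chains that bound a polygon on either side."""
    return sorted(c for c, (_, _, left, right) in CHAINS.items()
                  if polygon in (left, right))


def neighbours(polygon):
    """Adjacency: polygons sharing a chain with the given polygon."""
    found = set()
    for _, _, left, right in CHAINS.values():
        if left == polygon:
            found.add(right)
        elif right == polygon:
            found.add(left)
    found.discard("0")  # the outside region is not a neighbour polygon
    return sorted(found)


print(chains_at_node(2))  # ['a', 'b', 'f']
print(boundary("II"))     # ['b', 'c', 'd', 'f']
print(neighbours("I"))    # ['II']
print(NESTED["II"])       # ['Well']
```

The result for node 2 matches the connectivity example given in the text (chains a, b, and f), and no coordinates were needed to obtain any of the answers.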


3.3 Non-spatial Data Model


3.3.1 TIN Data Model
A TIN is a data structure that represents a continuous spatial field through a finite
set of (location, value) pairs and triangles made from them. It is commonly used as
a digital terrain model, but can be used for geographic fields other than elevation
(de By and Huisman, 2009: 526). For a geographic field, a TIN (Triangulated
Irregular Network) is a tessellated data structure that uses contiguous,
non-overlapping triangles to represent geographic surfaces. Whereas
the raster depiction of a surface represents elevation as an average value over
the spatial extent of the individual pixel, the TIN data structure models each vertex
of the triangle as an exact elevation value at a specific point on the earth. The
arcs between vertices are an approximation of the elevation between the two
vertices. These arcs are then aggregated into triangles from which information on
elevation, slope, aspect, and surface area can be derived across the entire extent
of the model’s space. Note that the term “irregular” in the name of the data model
refers to the fact that the vertices are typically laid out in a scattered fashion
(Campbell and Shin, 2012: 119, 120).
The use of TINs confers certain advantages over raster-based elevation models.
First, linear topographic features are very accurately represented relative to their
raster counterpart. Second, a comparatively small number of data points are
needed to represent a surface, so file sizes are typically much smaller. This is
particularly true as vertices can be clustered in areas where relief is complex and
can be sparse in areas where relief is simple. Third, specific elevation data can be
incorporated into the data model in a post hoc fashion via the placement of
additional vertices if the original is deemed insufficient or inadequate. Finally,
certain spatial statistics can be calculated that cannot be obtained when using a
raster-based elevation model, such as flood plain delineation, storage capacity
curves for reservoirs, and time-area curves for hydrographs (Campbell and Shin,
2012: 119, 120).
A triangular irregular network (TIN) is a specific format for the representation of
fields that relies on a network of lines connecting sampled points with known
values. The connections form a Delaunay triangulation, in which the points are
joined by edges so that every face of the network is a triangle with three sampled
points as its vertices. This type of
GI representation is most commonly used for the visualization of elevation data,
but can be used for any data that is collected using irregular samples in an area.
Dynamic versions of TIN make it possible to rapidly change the TIN. The changes
can be so rapid that dynamic TIN holds potential to help train people for complex
navigation situations (Harvey, 2008: 190).
A commonly used data structure in GIS software is the triangular irregular network
or TIN. It is one of the standard implementation techniques for digital terrain
models, but it can be used to represent any continuous field. The principles behind
a TIN are simple. It is built from a set of locations for which we have a measurement,
for instance an elevation. The locations can be arbitrarily scattered in space, and
are usually not on a nice regular grid. Any location together with its elevation
value can be viewed as a point in three-dimensional space. This is illustrated in
Figure 3-8 (left). From these 3D points, we can construct an irregular tessellation
made of triangles. Two such tessellations are illustrated in Figure 3-8 (middle
and right) (de By and Huisman, 2009: 92).
In three-dimensional space, three points uniquely determine a plane, as long as
they are not collinear, i.e. they must not be positioned on the same line. A plane
fitted through these points has a fixed aspect and gradient, and can be used to
compute an approximation of elevation of other locations. Since we can pick many
triples of points, we can construct many such planes and therefore we can have
many elevation approximations for a single location, such as ‘P’ in the figure. So,
it is wise to restrict the use of a plane to the triangular area between the three
points (de By and Huisman, 2009: 92).
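The plane fitted through three anchor points can be written as z = a*x + b*y + c, and its coefficients follow from solving two linear equations. The Python sketch below is a minimal illustration with invented anchor points; the function names plane_through and elevation_at are hypothetical. It assumes the anchors are not collinear in plan view (x, y), which a TIN requires in addition to non-collinearity in 3D.

```python
def plane_through(p1, p2, p3):
    """Coefficients (a, b, c) of the plane z = a*x + b*y + c through
    three (x, y, z) anchor points, solved by Cramer's rule."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
    if det == 0:
        raise ValueError("anchor points are collinear in the x-y plane")
    a = ((z2 - z1) * (y3 - y1) - (z3 - z1) * (y2 - y1)) / det
    b = ((x2 - x1) * (z3 - z1) - (x3 - x1) * (z2 - z1)) / det
    c = z1 - a * x1 - b * y1
    return a, b, c


def elevation_at(x, y, plane):
    """Approximate elevation at (x, y) from the plane coefficients."""
    a, b, c = plane
    return a * x + b * y + c


# Invented anchor points (x, y, elevation).
anchors = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]
plane = plane_through(*anchors)
print(plane)                          # (1.0, 2.0, 10.0)
print(elevation_at(2.0, 3.0, plane))  # 18.0
```

The function would return a value for any (x, y), including locations far outside the triangle, which is why the text recommends restricting the use of each plane to the triangular area between its three anchors.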
If we restrict the use of a plane to the area between its three anchor points, we
obtain a triangular tessellation of the complete study space. Unfortunately, there
are many different tessellations for a given input set of anchor points as the figure
demonstrates with two of them. Some tessellations are better than others, in the
sense that they make smaller errors of elevation approximation. For instance, if
we base our elevation computation for location P on the middle shaded triangle,
we will get another value than from the right shaded triangle. The second will
provide a better approximation because the average distance from P to the three
triangle anchors is smaller (de By and Huisman, 2009: 93).
Delaunay Triangulation
The triangulation in Figure 3-8 (right) happens to be a Delaunay triangulation,
which in a sense is an optimal triangulation. There are multiple ways of defining
what such a triangulation is, but it suffices here to state two important properties.
The first is that the triangles are as equilateral (equal sided) as they can be, given
the set of anchor points. The second property is that for each triangle, the
circumcircle through its three anchor points does not contain any other anchor point.
One such circumcircle is depicted in Figure 3-8 (right) (de By and Huisman,
2009: 94).
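The second property (the empty circumcircle) is easy to test directly. The Python sketch below checks a given triangulation against it; it does not construct a triangulation. The four anchor points form an invented kite shape that can be triangulated along either diagonal, and the function names are hypothetical.

```python
def circumcircle(a, b, c):
    """Centre and squared radius of the circle through three 2D points."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if d == 0:
        raise ValueError("points are collinear")
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def is_delaunay(triangles, points, eps=1e-12):
    """True if no anchor point lies strictly inside any triangle's
    circumcircle. Triangles are given as tuples of point indices."""
    for tri in triangles:
        (ux, uy), r2 = circumcircle(*(points[i] for i in tri))
        for j, (px, py) in enumerate(points):
            if j not in tri and (px - ux) ** 2 + (py - uy) ** 2 < r2 - eps:
                return False
    return True


# Invented anchor points forming a kite; two triangulations are possible.
points = [(0.0, 0.0), (2.0, -1.0), (4.0, 0.0), (2.0, 1.0)]
short_diagonal = [(0, 1, 3), (1, 2, 3)]  # split along points 1-3
long_diagonal = [(0, 1, 2), (0, 2, 3)]   # split along points 0-2

print(is_delaunay(short_diagonal, points))  # True
print(is_delaunay(long_diagonal, points))   # False
```

The triangulation along the long diagonal produces two thin triangles whose circumcircles each contain the fourth point, so it fails the test. The split along the short diagonal gives more nearly equilateral triangles, which ties the second property back to the first.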

Figure 3-8: TIN Construction with Delaunay Triangulation

TIN Data Structure


A triangulated irregular network (TIN) is a data model commonly used to represent
terrain heights. Typically the x, y, and z locations for measured points are entered
into the TIN data model. These points are distributed in space, and the points may
be connected in such a manner that the smallest triangle formed from any three
points may be constructed. The TIN forms a connected network of triangles.
Triangles are created such that the lines from one triangle do not cross the lines
of another. Line crossings are avoided by identifying the convergent circle for a
set of three points. The convergent circle is defined as the circle passing through
all three points. A triangle is drawn only if the corresponding convergent circle
contains no other sampling points. Each triangle defines a terrain surface, or facet,
assumed to be of uniform slope and aspect over the triangle.
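Because each facet is planar, its slope and aspect follow from the three vertex coordinates alone. The Python sketch below is one way to compute them, under assumptions that are not stated in the source: x increases to the east, y increases to the north, slope is reported in degrees from the horizontal, and aspect is the compass bearing (clockwise from north) of the downslope direction. The function name facet_slope_aspect is hypothetical.

```python
import math


def facet_slope_aspect(p1, p2, p3):
    """Slope (degrees) and aspect (compass bearing of the downslope
    direction, degrees) of the planar facet through three (x, y, z) points.
    Aspect is None for a horizontal facet."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # Normal vector of the facet from the cross product of two edges.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    if nz == 0:
        raise ValueError("facet is vertical or degenerate in plan view")
    dz_dx = -nx / nz  # rate of change of elevation towards the east
    dz_dy = -ny / nz  # rate of change of elevation towards the north
    slope = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    if slope == 0:
        return 0.0, None
    # Downslope direction is opposite to the gradient (dz_dx, dz_dy).
    aspect = math.degrees(math.atan2(-dz_dx, -dz_dy)) % 360.0
    return slope, aspect


# Invented facet that rises towards the east at one unit per unit distance.
slope, aspect = facet_slope_aspect((0.0, 0.0, 0.0),
                                   (10.0, 0.0, 10.0),
                                   (0.0, 10.0, 0.0))
print(round(slope, 6), round(aspect, 6))  # 45.0 270.0 (facet faces west)
```

A raster-based model would need neighbouring cells to estimate these quantities, whereas a TIN facet carries them implicitly in its three vertices.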
The TIN model typically uses some form of indexing to connect neighbouring
points. Each edge of a triangle connects to two points, which in turn each connect
to other edges. These connections continue recursively until the entire network is
spanned. Thus, the TIN is a rather more complicated data model than the simple
raster grid when the objective is terrain representation.
Compared to the grid model, the TIN model (Figure 3-9) is cumbersome to establish
but more efficient to store because areas of terrain with little detail are described
with fewer data than similar areas with greater variation. TIN models are good for
describing terrain because the sharp breaks of slope between uniform slope facets
fit certain types of terrain well (Fazal, 2008: 210).
Further Reading: (Murai, 1998b: 32-34)

Figure 3-9: TIN Data Model with triangles in topological structure

Advantages and Disadvantages of TIN


A TIN is a much ‘sparser’ data structure: the amount of data stored is less if we
try to obtain a structure with approximately equal interpolation error, as compared
to a regular raster. The quality of the TIN depends on the choice of anchor points,
as well as on the triangulation built from it. It is, for instance, wise to perform
‘ridge following’ during the data acquisition process for a TIN. Anchor points on
elevation ridges help to represent peaks and mountain slope faces correctly
(de By and Huisman, 2009: 116).
While the TIN model may be more complex than simple raster models, it may also
be much more appropriate and efficient when storing terrain data in areas with
variable relief. Relatively few points are required to represent large, flat, or
smoothly continuous areas. Many more points are desirable when representing
variable, discontinuous terrain. Surveyors often collect more samples per unit area
where the terrain is highly variable. A TIN easily accommodates these differences
in sampling density, with the result of more, smaller triangles in the densely
sampled area. Rather than imposing a uniform cell size and having multiple
measurements for some cells, one measurement for others, and no measurements
for most cells, the TIN preserves each measurement point at each location
(Bolstad, 2012: 51).

3.4 Images
Image data is most often used to represent graphic or pictorial data. The term
image inherently reflects a graphic representation, and in the GIS world, differs
significantly from raster data. Most often, image data is used to store remotely
sensed imagery, e.g. satellite scenes or orthophotos, or ancillary graphics such as
photographs, scanned plan documents, etc. Image data is typically used in GIS
systems as background display data (if the image has been rectified and
georeferenced), or as a graphic attribute. Remote sensing software makes use of
image data for image classification and processing. Typically, this data must be
converted into a raster format (and perhaps vector) to be used analytically with
the GIS (Buckley, 1997: 29).
Image data is typically stored in a variety of de facto industry standard proprietary
formats. These often reflect the most popular image processing systems. Other
graphic image formats, such as TIFF, GIF, PCX, etc., are used to store ancillary
image data. Most GIS software will read such formats and allow you to display
this data (Buckley, 1997: 29).
A wide variety of satellite imagery and aerial photography is available for use in
geographic information systems (GISs). Although these products are basically
raster graphics, they are substantively different in their usage within a GIS.
Satellite imagery and aerial photography provide important contextual information
for a GIS and are often used to conduct heads-up digitizing whereby features from
the image are converted into vector datasets (Campbell and Shin, 2012: 98).

CHAPTER 4 DATA SOURCES

Data Sources: [4 hrs]


Data Input and Data Quality; Major data feeds to GIS and their characteristics;
maps, GPS, images, databases; commercial data; locating and evaluating data;
data formats; data quality; metadata.

4.1 Sources of Spatial Data


Creating a GIS database is a complex operation which may involve data capture,
verification, and structuring processes. Because raw geographical data are
available in many different analogue or digital forms, such as maps, aerial
photographs, satellite images or tables, a spatial database can be built in several,
not mutually exclusive ways. These are (Burrough and McDonnell, 2014: 76):
• Acquire data in digital form from a data supplier
• Digitize existing analogue data
• Carry out one’s own survey of geographic entities
• Interpolate from point observations to continuous surfaces
In all cases the data must be geometrically registered to a generally accepted and
properly defined coordinate system and coded so that they can be stored in the
internal database structure of the GIS being used. The desired result should be a
current, complete database which can support subsequent data analysis and
modelling (Burrough and McDonnell, 2014: 76).
Further Reading: 17.4 Data Encoding Methods (Bhatta, 2014: 527-543)

4.2 Data Quality


4.2.1 Data Quality and Components
Quality can simply be defined as the fitness for use of a specific data set. Data
that is appropriate for use with one application may not be fit for use with another.
It is fully dependent on the scale, accuracy, and extent of the data set, as well as
the quality of other data sets to be used. The U.S. Spatial Data Transfer Standard
(SDTS) identifies five components of data quality, which are listed in Table 4-1
(Buckley, 1997: 12).
Data quality standards have been developed at both national and international
levels in support of mandates for data acquisition and dissemination. Data quality
documentation plays a key role in many standards due to the realisation that an
understanding of quality is essential to the effective use of geospatial data (Salgé,
1999: 183).
Table 4-1: Data Quality Components in SDTS (Salgé, 1999: 184)

Component            Description
Lineage              Refers to source materials, methods of derivation and
                     transformation applied to a database.
                     • Includes temporal information (date that the information
                       refers to on the ground).
                     • Intended to be precise enough to identify the sources of
                       individual objects (i.e. if a database was derived from
                       different sources, lineage information is to be assigned as
                       an additional attribute of objects or as a spatial overlay).
Positional Accuracy  Refers to the accuracy of the spatial component.
                     • Subdivided into horizontal and vertical accuracy elements.
                     • Assessment methods are based on comparison to source,
                       comparison to a standard of higher accuracy, deductive
                       estimates of internal evidence.
                     • Variation in accuracy can be reported as quality overlays or
                       additional attributes.
Attribute Accuracy   Refers to the accuracy of the thematic component.
                     • Specific tests vary as a function of measurement scale.
                     • Assessment methods are based on deductive estimates,
                       sampling or map overlay.
Logical Consistency  Refers to the fidelity of the relationships encoded in the
                     database.
                     • Includes tests of valid values for attributes, and
                       identification of topological inconsistencies based on
                       graphical or specific topological tests.
Completeness         Refers to the relationship between database objects and the
                     abstract universe of all such objects.
                     • Includes selection criteria, definitions and other mapping
                       rules used to create the database.

4.2.2 Data Quality Standards


In the field of geographical information, standardisation began more than 25 years
ago. First to emerge was the requirement for transferring data from one system
to another with a static perspective: Lang (1970) was perhaps the first of many
proponents. Later came the requirements for interoperability (Salgé, 1999: 693).
A concern for data quality issues is clearly expressed in the development of data
transfer and metadata standards. Such standards have been developed at both
national and international levels in support of mandates for data acquisition and
dissemination. Data quality documentation plays a key role in many standards due
to the realisation that an understanding of quality is essential to the effective use
of geospatial data (Veregin, 1999: 183).
More than 25 organizations are involved in the standardization of various aspects
of geographic data and geo-processing. Several of these are country and domain
specific. At the global level, ISO (the International Organization for
Standardization) is responsible for coordinating efforts through the work of its
technical committee TC 211. In Europe, CEN (Comité Européen de Normalisation,
the European Committee for Standardization) is engaged in geographic
standardization through its technical committee TC 287 (Fazal, 2008: 131).

Further Reading:

SDTS - http://data.geocomm.com/sdts/
Federal Geographic Data Committee (FGDC) - http://www.fgdc.gov/
ISO (International Organization for Standardization) – ISO/TC 211 - Geographic
information/Geomatics –
http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_tc_browse.htm?commid=54904
Open GIS Consortium (OGC)

4.2.3 Sources of Error in Spatial Data


(Burrough and McDonnell, 2014: 222)

4.3 Major Data Feeds


The sources of geospatial data are digitized maps, aerial photographs, satellite
images, statistical tables and other related documents (Murai, 1998b).

Further Reading: (Bhatta, 2014: 525-527)

4.4 Data Formats


With a 30-year history and with so many alternative ways to structure map and
attribute data, it is hardly surprising that most GISs use radically different
approaches to handling their content. The data structures used are often invisible
as far as the GIS user is concerned. We might not even need to understand exactly
what is happening when two maps are overlain. However, if we are to be objective,
scientific GIS users, at the very least we must have a full understanding of the
errors and transformations involved. Regardless of how a GIS structures its maps
as numbers, it must be able to import data from other GIS packages and from the
most common data sources, as well as scanned and digitized data, and to convert
the result into its own internal format. In some cases this is an open process.
Some GIS companies have published and documented their internal or exchange
data formats, including Intergraph and Autodesk. Others protect their internal
data as a trade secret, in the hope of being able to sell data and data converters
as well as their GIS (Clarke, 1995: 80).
One of the biggest problems with data obtained from external sources is that they
can be encoded in many different formats. There are so many different geographic
data formats because no single format is appropriate for all tasks and applications.
It is not possible to design a format that supports all the applications. The many
different formats have evolved in response to diverse user requirements (Fazal,
2008: 131).
Format - The pattern into which data (coordinates, attributes, indexes, spatial
reference, etc.) is systematically arranged for use on a computer. A file format is
the specific design of how information is organized in the file. (All GIS data is a
file on disk at the most basic level).
Table 4-2: Some Examples of Geographic Data Formats (Fazal, 2008)

Further Reading: (Bhatta, 2014: 499-501)

4.5 Meta Data


Metadata is defined as background information that describes all necessary
information about the data itself. More generally, it is known as ‘data about data’.
This includes (de By and Huisman, 2009: 282):
• Identification information: Data source(s), time of acquisition, etc.
• Data quality information: Positional, attribute and temporal accuracy,
lineage, etc.
• Entity and attribute information: Related attributes, units of measure, etc.
In essence, metadata answer who, what, when, where, why, and how questions
about all facets of the data made available. Maintaining metadata is a key part of
maintaining data and information quality in GIS. This is because it can serve
different purposes, from description of the data itself through to providing
instructions for data handling. Depending on the type and amount of metadata
provided, it could be used to determine the data sets that exist for a geographic
location, evaluate whether a given data set meets a specified need, or to process
and use a data set (de By and Huisman, 2009: 282).
Metadata is structured information that describes, explains, locates, or otherwise
makes it easier to retrieve, use, or manage an information resource. Metadata is
often called data about data or information about information (National
Information Standards Organization (NISO), 2004).
Metadata are a special type of non-geometric data that are increasingly being
collected. Some metadata are derived automatically by the GIS software system
(for example, length and area, extent of data layer, and count of features), but
some must be explicitly collected (for example, owner name, quality estimate, and
original source). Explicitly collected metadata can be entered in the same way as
other attributes as described above (Fazal, 2008: 133).
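The distinction between derived and explicitly collected metadata can be illustrated with a minimal sketch. It assumes a layer is just a list of point coordinates; the function name `derive_metadata` and all values are hypothetical and chosen only for illustration.

```python
def derive_metadata(features):
    """Derive metadata that software can compute from the data itself."""
    xs = [x for x, _ in features]
    ys = [y for _, y in features]
    return {
        "feature_count": len(features),
        "extent": (min(xs), min(ys), max(xs), max(ys)),  # (xmin, ymin, xmax, ymax)
    }

# Explicitly collected metadata must be supplied by a person (invented values).
explicit = {"owner": "Survey Department", "source": "digitized topographic map"}

points = [(2.0, 3.0), (5.0, 1.0), (4.0, 7.0)]
metadata = {**explicit, **derive_metadata(points)}
```

The merged dictionary holds both kinds of entries side by side, which mirrors the remark that explicit metadata can be entered like any other attribute.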
Further Reading: (Bhatta, 2014: 516) and (Chang, 2014: 101)
4.5.1 Types of Metadata
There are three main types of metadata (National Information Standards
Organization (NISO), 2004):
A) Descriptive metadata describes a resource for purposes such as discovery and
identification. It can include elements such as title, abstract, author, and
keywords.
B) Structural metadata indicates how compound objects are put together, for
example, how pages are ordered to form chapters.
C) Administrative metadata provides information to help manage a resource, such
as when and how it was created, file type and other technical information, and
who can access it. There are several subsets of administrative data; two that
sometimes are listed as separate metadata types are:
1. Rights management metadata, which deals with intellectual property rights,
and
2. Preservation metadata, which contains information needed to archive and
preserve a resource.
Metadata can describe resources at any level of aggregation. It can describe a
collection, a single resource, or a component part of a larger resource (for
example, a photograph in an article). Just as catalogers make decisions about
whether a catalog record should be created for a whole set of volumes or for each
particular volume in the set, so the metadata creator makes similar decisions
(National Information Standards Organization (NISO), 2004).

4.5.2 Metadata Standards


Content Standards for Digital Geospatial Metadata (CSDGM)
Under an executive order from the U.S. government, the Federal Geographic Data
Committee (FGDC), a 19-member interagency committee, was charged with the
task of developing national geospatial metadata standards. As a direct result, the
FGDC developed the Content Standards for Digital Geospatial Metadata, or
CSDGM. Since its origin in the mid-1990s, the CSDGM has developed into the most
widely used metadata standard to date. CSDGM has even become the basis for
numerous standards worldwide.
The intent for CSDGM was to help the U.S. government minimize the costs of data
acquisition efforts and to promote interagency data sharing. The CSDGM was
developed for consistent geospatial dataset description and utilizes a group of
seven core (primary) and three floating elements. The core elements make up the
major element set, whereas the floating elements are nonessential and wholly
optional. Figure 4-1 details the seven CSDGM metadata core elements (Galati,
2006: 50).

Figure 4-1: CSDGM metadata core elements (Galati, 2006: 50)

Further Reading: (Bhatta, 2014: 518-519)

CHAPTER 5 DATABASE CONCEPTS

Contents of this Chapter


Database Concepts: [8 hrs]
Database concepts and components; flat files; relational database systems; data
modelling; views of the database; normalization; databases and GIS.

5.1 Database Concepts


A database is a collection of data stored in a structured format using a computer.
A database can be thought of as a table, but the distinction is that the table is just
one way (of many) to represent a database. The first databases were flat-file
databases: computer files of text with one record on each line, usually encoded in
the ASCII format. Each entry (e.g., a person’s name and address) was separated
by a special mark, and commas or tabs separated the characteristics (e.g.,
name, street, house number, city, post code, state, and country). Finding a
particular person required searching through the entire data file one entry after
another. A flat-file database can also be represented as a list, a table, or a
spreadsheet. In other types of databases the data is stored as records and fields
that correspond to entries and characteristics in the flat-file database. Records
and fields have become the accepted terms when working with databases. The
term ‘tuple’ is used to represent a single data item in a table. A ‘field’ refers to the
division of the data into separate parts of each data item. An ‘attribute’ is the
particular entry in a field (e.g., Main Street in the field ‘Street’). A single database
record including all attributes is called an ‘entry’ (Harvey, 2008: 127).
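As a rough illustration of the flat-file idea, the sketch below stores one record per line with comma-separated characteristics and finds a person by scanning the file entry by entry. The sample names, the field list and the function `sequential_search` are invented for this example.

```python
FLAT_FILE = """\
Ada Lovelace,Main Street,12,Springfield
Alan Turing,Park Road,7,Riverton
Grace Hopper,Main Street,3,Springfield
"""

FIELDS = ["name", "street", "house_number", "city"]

def sequential_search(text, field, value):
    """Scan the file one record (line) at a time until a match is found."""
    for line in text.splitlines():
        record = dict(zip(FIELDS, line.split(",")))
        if record[field] == value:
            return record
    return None

hit = sequential_search(FLAT_FILE, "name", "Alan Turing")
```

In the worst case every line is read, which is why searching a large flat file in this way is slow.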
In a non-spatial domain, databases have been in use since the 1960s for various
purposes like bank account administration, stock monitoring, salary
administration, order bookkeeping, and flight reservation systems to name just a
few. The common denominator between these applications is that the amount of
data is usually quite large, but the data itself has a simple and regular structure
(de By and Huisman, 2009: 158).
Designing a database is not an easy task. Firstly, one has to consider carefully
what the database purpose is, and who its users will be. Secondly, one needs to
identify the available data sources and define the format in which the data will be
organized within the database. This format is usually called the database
structure. Lastly, data can be entered into the database. It is important to keep
the data up-to-date, and it is therefore wise to set up the processes for this, and
make someone responsible for regular maintenance of the database.
Documentation of the database design and set-up is crucial for an extended
database life. Many enterprise databases tend to outlive the professional careers
of their original designers. A database management system (DBMS) is a software
package that allows the user to set up, use and maintain a database. Just as a GIS
allows the set-up of a GIS application, a DBMS offers generic functionality for
database organization (de By and Huisman, 2009: 158).

5.2 Database Types
Several types of database models (or database schema) exist, such as the flat,
hierarchical, network, and relational models (Worboys 1995; Jackson 1999) as
cited in (Campbell and Shin, 2012: 110). The object-oriented model is newer but
rapidly gaining in popularity for some applications (Buckley, 1997: 31).
5.2.1 Flat File
In a flat file data structure or tabular file, each record in the file contains the
same data fields as other records. The values of each field or fields may differ and
usually one field is designated as a ‘key’ field which is used for locating a particular
record or for sorting the file in a particular order. When such files are
computerized, locating records and sorting the file become relatively easy tasks
(Cho, 1995: 145).

Figure 5-1: Flat File Structure


A sequential search of flat files is possible when the files are ordered sequentially
by the value of the ‘key’ field. Searching for a record based on the value of the
key is relatively fast because searching stops when the desired record is found
(Cho, 1995: 145).
In a binary search, a file sequenced on its key field can be searched much more
rapidly than by a sequential search. Here the search starts in the middle of a file
sequenced in order of the key field and compares the value of the key in that
record with the desired value for a ‘greater than’ or ‘less than’ consideration, and
eliminates from consideration the half of the file in which the desired record could
not be stored. The search then goes to the middle of the remaining half and
compares the value of the key field in the record to the desired value, eliminating
half of those records (now one-quarter of the original file). The search continues by
halving the records for consideration until the desired record(s) is found (Cho,
1995: 145).
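The halving procedure can be sketched as follows. This is only an illustration: it assumes records are tuples whose first element is the key, already sorted in ascending key order, and the parcel values are invented.

```python
def binary_search(records, key):
    """Return the record whose first element equals key, or None.

    records must be sorted in ascending order of the key field.
    """
    low, high = 0, len(records) - 1
    while low <= high:
        mid = (low + high) // 2
        mid_key = records[mid][0]
        if mid_key == key:
            return records[mid]
        if mid_key < key:
            low = mid + 1      # desired record cannot be in the lower half
        else:
            high = mid - 1     # desired record cannot be in the upper half
    return None

parcels = [(101, "forest"), (205, "urban"), (310, "water"), (412, "crops"), (588, "urban")]
```

Each comparison discards half of the remaining records, so a file of a million records needs at most about twenty comparisons.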
Indexed searches are used when the key field is not sequenced. An index is a
separate table containing the key of every record and an address pointing to the
location of the data record for each key. The address is the location on the physical
storage device used and is assigned by the software that controls the device. The
index is then sequenced, and a binary search is used on the index rather than on
the data file. New records may be added at the end and there is no need to re-
sequence the entire file — only the index (Cho, 1995: 145).
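A small sketch of an indexed search, under simplifying assumptions: the data file is a Python list in arrival order, and the "address" is simply the record's position in that list, standing in for a physical storage location. The names `indexed_lookup` and `append_record` are hypothetical.

```python
import bisect

# Data file in arrival order; the key field (first element) is not sequenced.
data_file = [(412, "crops"), (101, "forest"), (588, "urban"), (205, "urban")]

# Index: (key, address) pairs kept in key order.
index = sorted((record[0], address) for address, record in enumerate(data_file))

def indexed_lookup(key):
    """Binary-search the index, then follow the address into the data file."""
    pos = bisect.bisect_left(index, (key,))
    if pos < len(index) and index[pos][0] == key:
        return data_file[index[pos][1]]
    return None

def append_record(record):
    """Add a record at the end of the data file and update only the index."""
    data_file.append(record)
    bisect.insort(index, (record[0], len(data_file) - 1))
```

Appending a record leaves the order of the data file untouched; only the much smaller index is kept in sequence.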
Flat file structures are simple and efficient for repetitive tasks especially in
transaction-based information systems. However, such structures can be inflexible
and unresponsive to certain types of queries. Access to records based on any field
other than the key is very slow. The fields cannot be readily expanded to
accommodate more information without major effort. Adding new records to the
data files requires additional processing (Cho, 1995: 145).
5.2.2 Hierarchical File Structure
In a hierarchical file structure (tree-structure) there is more than one type of
record held in different files. Pointers allow a ‘one-to-many’ relationship where
each master record can have one or more detailed records associated with it.
Access to the data is limited to one type of record — the master record. In order
to remember this structure, the master record may be thought of as the ‘parent’
which can be associated with any number of detailed records or ‘children’ through
internally assigned pointers. Furthermore, the detailed records can also have
‘children’ associated with them, with additional pointers assigned for the third level
of association. A distinguishing characteristic here is that each detailed record has
exactly one higher-level record associated with it. Flat files use up data storage
space in an attempt to accommodate as many variations as possible, and adding a
further field requires a major re-structuring of the file. Hierarchical structures
need little or no additional storage space because the data files can
accommodate additions and changes in a flexible way. A drawback of the
hierarchical file structure is that records may only be used by first accessing the
master record and data in detailed records must be repeated for each master
record association. As geographical data have a natural tree-structure,
hierarchical file structures have been used in some GISs. However, such tree
structures can be inflexible because new linkages cannot be defined between
records once the tree has already been established. Also lateral or diagonal
linkages cannot be defined because the relationships are vertically structured
(Cho, 1995: 146).

Figure 5-2: Hierarchical Data Structure
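The parent and child relationship can be sketched with a simple mapping in which every record points to exactly one parent. The record names are invented, and a real hierarchical DBMS would manage such pointers internally.

```python
# Each record holds a pointer to exactly one parent (None for the master record).
records = {
    "Country": None,
    "State A": "Country",
    "State B": "Country",
    "City A1": "State A",
    "City B1": "State B",
}

def children(parent):
    """Detail records directly associated with a given parent record."""
    return [name for name, p in records.items() if p == parent]

def path_from_master(name):
    """List the records on the path from the master record down to name."""
    path = []
    while name is not None:
        path.append(name)
        name = records[name]
    return list(reversed(path))
```

Because each record has a single parent, a lateral link such as one between "City A1" and "City B1" cannot be expressed in this structure.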

5.2.3 Networks
A network file structure is organized so that there is more than one type of
record with pointers allowing related records to be associated with each other in
a ‘many-to-many’ relationship. Access to the data can be made on any type of
record, and pointers are used to associate different records with related data. A
network file structure has many files containing different, yet related, records. The
relationship is through key fields that are common between the files. Pointers to
related records are contained in different files. Any update to the information
necessitates only one change to the relevant record. While changes take place,
the database management system software keeps track of the pointers, ensuring
that the network relationships are intact. The expansion of the database and the
complexity in maintaining the pointers within the network can require greater
amounts of storage. The management of these pointers can become so
cumbersome that the system becomes inflexible and difficult to use. However, the
network model has greater flexibility than the hierarchical model for handling
complex spatial relationships. But, in turn, network models have not had
widespread use in GISs because of the greater flexibility of relational database
models (Cho, 1995: 146).

5.2.4 Relational Database


Modern geographic information system (GIS) software typically employs a fourth
model referred to as a relational database.
A relational database structure is made up of separate flat files which contain
related data that can be combined by matching records having the same values
in columns common to the files. No physical pointers are used, thus avoiding the
complexity of the network model. Logical linkages of the data values in common
fields among files form the associations between files. The database is thus
relational since one or more other tables are related by a common data item and
then joined to form a new table. A major advantage is the almost unlimited
flexibility in forming relationships among data items in the database without
additional difficulties of managing the linkages. Without the need for pointers to
manage new relationships, data items can be added dynamically ‘on the fly’ rather
than having to do major re-programming and restructuring of the database. In
sum, the relational database management system is the most flexible database
model. It is a ‘user view’ of the data and not the way the data are organized
internally (Cho, 1995: 146). Details in section 5.3.
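The idea of combining tables by matching values in a common column can be sketched in a few lines. The tables, the column names and the helper `join` are invented for illustration; a real DBMS performs joins far more efficiently.

```python
parcels = [
    {"parcel_id": 1, "owner_id": 10},
    {"parcel_id": 2, "owner_id": 11},
    {"parcel_id": 3, "owner_id": 10},
]
owners = [
    {"owner_id": 10, "name": "Rai"},
    {"owner_id": 11, "name": "Shrestha"},
]

def join(left, right, column):
    """Combine rows of two tables whose values in `column` match."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]

joined = join(parcels, owners, "owner_id")
```

No link between the two tables is stored anywhere; the association exists only through the shared `owner_id` values and is formed when the join is requested.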
5.2.5 Object Oriented Databases
The object-oriented database model manages data through objects. An object is
a collection of data elements and operations that together are considered a single
entity. The object-oriented database is a relatively new model. This approach has
the attraction that querying is very natural, as features can be bundled together
with attributes. To date, only a few GIS packages are promoting the use of this
attribute data model. However, initial impressions indicate that this approach may
hold many operational benefits with respect to geographic data processing.
Fulfilment of this promise with a commercial GIS product remains to be seen
(Buckley, 1997: 34).

5.3 Relational Database


The relational database sets itself apart from the flat-file database through the way
it stores data and the possibilities for relating data. A relational database stores
data in separate files, usually called ‘tables’, which can be related to other tables in
the database by common fields. A relational database may consist of hundreds or
even thousands of tables. Every table in a relational database has a key field that allows
each record to be uniquely identified. This key field is usually indexed to speed up
operations. This key field is especially important for geographic information
because of the large amounts of data that easily come together. In a relational
database, attributes can stand in different relationships to attributes in other
tables. A one-to-one relationship relates a single record in one table with a single
record in another table. A one-to-many relationship relates a single record in one
table with multiple records in another table; a many-to-one relationship does the
opposite. A many-to-many relationship relates many records from one table to
many records from another table. This last relationship is rarely desirable because
the meaningful relationships between the records cannot be differentiated from
spurious and erroneous relationships. One-to-many or many-to-one
relationships may be called for in a variety of situations (e.g., the cities of one
state or the states of one country). These usually reflect a hierarchy or a grouping
of attributes and their corresponding records (Harvey, 2008: 128).
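A one-to-many relationship such as the cities of one state can be tried out with Python's built-in sqlite3 module. This is a minimal sketch; the table names, column names and rows are invented for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE state (state_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city  (city_id  INTEGER PRIMARY KEY, name TEXT,
                        state_id INTEGER REFERENCES state(state_id));
    INSERT INTO state VALUES (1, 'State A'), (2, 'State B');
    INSERT INTO city  VALUES (1, 'City A1', 1), (2, 'City A2', 1), (3, 'City B1', 2);
""")

# One state relates to many cities through the common field state_id.
rows = con.execute("""
    SELECT state.name, city.name
    FROM state JOIN city ON city.state_id = state.state_id
    ORDER BY city.city_id
""").fetchall()
```

Each table has its own key field (`state_id`, `city_id`), and the `state_id` column in the city table is the common field that relates a city to its state.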

Databases should consider both geographic representation and cartographic
representation. Based on the nature of the intended geographic information uses,
the creation of a database could consider only geographic representation. GIS
databases are in almost all cases relational databases, which have great flexibility
for geographic information and cartographic needs. You can think of a database
as a set of tables that can be put into relationship with each other based on
characteristics, or what are usually called ‘attributes’. They are stored as one of
several data types: integer, floating-point, characters, date and time, or simple
large binary objects. These tables are different from spreadsheets because each
value of a characteristic is kept grouped into a record of all recorded characteristics
for that database entity or object. Tables in relational databases are related using
the entity-relationship model. These database tables are also used for recording
attributes that are used for the symbolization of things and events. Data modelling
plays a key part in preparing the geographic representation and cartographic
representation of geographic information or maps (Harvey, 2008: 135).

5.3.1 Primary Key


See Chapter 18 Database Management Systems – R G Healey

(Healey, 1991: 257)

5.3.2 Foreign Key

5.3.3 Normalization
Data normalization is a process of assuring that a database can take best
advantage of relational database principles. If you normalize a database you can
not only improve its performance, but avoid some organizational and logical errors
that could diminish the quality of the database. Data normalization for relational
database technology was first described by Edgar Codd in the 1970s. The first
level of data normalization requires that each field contain only one value (e.g.,
only the house number, not the house number and street name). The second level
requires that each value of a record is dependent on the key value of the record
(e.g. the name of the person). In the third level, no fields depend on nonkey fields
(e.g. a ‘years at residence’ field must be related to the name of the addressed
person, not the street number) (Harvey, 2008: 133).

A certain amount of necessary data redundancy is implicit in the relational data
model because the join mechanism matches column values between tables.
Without careful table design it is all too easy to introduce further unnecessary
redundancy into the database. To prevent this, table design should follow Codd’s
(1970) theory of normal forms, which specifies what types of values columns may
contain and how columns in a table are to be dependent on the primary key.
The first requirement of the theory is that all the tables must contain rows and
columns as already noted, and column values must be atomic, that is they do not
contain repeating groups of data, such as multiple values of a census variable for
different years. The second requirement of normal form is that every column
which is not part of the primary key must be fully dependent on the primary key.
This can be understood most readily by considering the example in Fig. 18.6 of a
table which is not in second normal form. In this case the feature type is
dependent on Feature No., but not Trail No., because feature type is not an
attribute of the ‘Features-visible’ relationship set, but of the Features entity set. The
effect of not meeting the requirement can be seen in the introduction of
unnecessary redundancy into the table in rows 1 and 4. If the feature type was
changed in the first row and the corresponding change in the fourth row was
overlooked, the database would be left in an inconsistent state (Healey, 1991: 258).
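The effect of moving a table into second normal form can be sketched as follows. The rows only imitate the kind of table described above; the trail numbers, feature numbers and feature types are invented, not taken from the cited figure.

```python
# Not in second normal form: feature_type depends on feature_no alone,
# yet the primary key is (trail_no, feature_no).
features_visible = [
    (1, 7, "waterfall"),
    (1, 8, "summit"),
    (2, 9, "lake"),
    (2, 7, "waterfall"),   # repeats the type already stored in row 1
]

# Decompose: the relationship keeps only its key columns ...
visible = [(trail, feature) for trail, feature, _ in features_visible]
# ... and feature_type moves to a table keyed by feature_no alone.
features = {feature: ftype for _, feature, ftype in features_visible}

features[7] = "cascade"    # a single update, so no inconsistency can arise
types_on_trail_2 = [features[f] for t, f in visible if t == 2]
```

After the decomposition the type of feature 7 is stored once, so the update is seen from every trail on which the feature is visible.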

5.3.4 Advantages and Disadvantages
The advantages can be summarized as follows:
• Rigorous design methodology based on sound theoretical foundations.
• All the other database structures can be reduced to a set of relational tables,
so they are the most general form of data representation.
• Ease of use and implementation compared to other types of system.
• Modifiability, which allows new tables and new rows of data within tables to
be added without difficulty.
• Flexibility in ad hoc data retrieval because of the relational join mechanism
and powerful query language facilities.
Disadvantages include:
• A greater requirement for processing resources with increasing numbers of
users on a given system than with the other types of database.
• On heavily loaded systems, queries involving multiple relational joins may
give slower response times than are desirable. This problem can largely be
mitigated by effective use of indexing and other optimization strategies,
together with the continued improvement in price performance in
computing hardware from mainframes to PCs.
The important advantages of the relational approach and the availability of good
proprietary software systems such as ORACLE, INGRES and DB2 have contributed
greatly to the rapid adoption of this technology, both in the GIS field and
automated data processing operations of all other kinds, since the beginning of
the 1980s. Relational systems now dominate the market for DBMS in the GIS
sector and this will continue for the foreseeable future (Harvey, 2008: 133).

5.4 Databases and GIS


5.4.1 External DBMS
GIS software provides support for spatial data and thematic or attribute data. GISs
have traditionally stored spatial data and attribute data separately. This required
GIS to provide a link between the spatial data (represented with rasters or
vectors), and their non-spatial attribute data. The strength of GIS technology lies
in its built-in ‘understanding’ of geographic space and all functions that derive
from this, for purposes such as storage, analysis, and map production. GIS
packages themselves can store tabular data; however, they do not always provide
a full-fledged query language to operate on the tables (de By and Huisman, 2009:
179).
DBMSs have a long tradition in handling attribute (i.e. administrative, non-spatial,
tabular, thematic) data in a secure way, for multiple users at the same time.
Arguably, DBMSs offer much better table functionality, since they are specifically
designed for this purpose. A lot of data in GIS applications is attribute data, so
GISs made use of external DBMSs for data support. In this role, the DBMS serves as a
centralized data repository for all users, while each user runs his/her own GIS
software that obtains its data from the DBMS. This meant that a GIS had to link
the spatial data represented with rasters or vectors, and the attribute data stored
in an external DBMS (de By and Huisman, 2009: 179).

Figure 5-3: A Raster Representing Land Use and a Related Table (de By and Huisman,
2009: 180)
With raster representations, each raster cell stores a characteristic value. This
value can be used to look up attribute data in an accompanying database table.
For instance, the land use raster of Figure 5-3 indicates the land use class for each
of its cells, while an accompanying table provides full description for all classes,
including perhaps some statistical information for each of the types. Observe the
similarity with the key/foreign key concept in relational databases (de By and
Huisman, 2009: 179).
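The link between cell values and an accompanying table can be sketched like this. The class codes, names and descriptions are invented; they only stand in for a land use raster and its attribute table.

```python
# Each cell stores a land use class code.
land_use = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 2],
]

# Accompanying table keyed by class code, analogous to a primary key.
classes = {
    1: {"name": "forest", "description": "closed canopy woodland"},
    2: {"name": "water", "description": "lakes and rivers"},
    3: {"name": "urban", "description": "built-up area"},
}

def describe(row, col):
    """Use the cell value as a key into the attribute table."""
    return classes[land_use[row][col]]["name"]

# Simple per-class statistic: number of cells of each class.
cell_counts = {code: sum(r.count(code) for r in land_use) for code in classes}
```

The cell value plays the role of a foreign key: many cells refer to the same row of the class table.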
With vector representations, our spatial objects – whether they are points, lines
or polygons – are automatically given a unique identifier by the system. This
identifier is usually just called the object ID or feature ID and is used to link the
spatial object (as presented in vectors) with its attribute data in an attribute table.
The principle applied here is similar to that in raster settings, but in this case each
object has its own identifier. The ID in the vector system functions as a key, and
any reference to an ID value in the attribute database is a foreign key reference
to the vector system. For example, in Figure 3.8, parcel is a table with attributes,
linked to the spatial objects stored in a GIS by the ‘Location’ column. Obviously,
several tables may make references to the vector system, but it is not uncommon
to have some main table for which the ID is actually also the key (de By and
Huisman, 2009: 180).

5.4.2 Spatial Database Functionality
DBMS vendors have, over the last 20 years, recognized the need for storing more
complex data, like spatial data. The main problem was that additional functionality
is needed by a DBMS in order to process and manage spatial data. As the capabilities
of our hardware to process information have increased, so too has the desire for
better ways to represent and manage spatial data. During the 1990s, object-oriented
and object-relational data models were developed for just this purpose.
These extend standard relational models with support for objects, including
‘spatial’ objects (de By and Huisman, 2009: 182).
Currently, GIS software packages are able to store spatial data using a range of
commercial and open source DBMSs such as Oracle, Informix, IBM DB2, Sybase,
and PostgreSQL, with the help of spatial extensions. Some GIS packages have
integrated database ‘engines’, and therefore do not need these extensions. ESRI’s
ArcGIS, for example, has the main components of the MS Access database
software built-in. This means that the designer of a GIS application can choose
whether to store the application data in the GIS or in the DBMS. Spatial databases,
also known as geodatabases, are implemented directly on existing DBMSs, using
extension software to allow them to handle spatial objects (de By and Huisman,
2009: 182).

CHAPTER 6 SPATIAL ANALYSIS

Contents of this Chapter


Spatial Analysis: [14 hrs]
Vector-based Analysis:
Data management functions; Data Analysis functions; Measurement function,
Selection, Vector Overlay functions.
Raster-based Analysis:
Spatial Interpolation methods, raster analysis including topological overlay,
map calculations, spread computations, classification function,
reclassification.

6.1 Measurement Functions


Geometric measurement on spatial features includes counting, distance and area
size computations. For the sake of simplicity, this section discusses such
measurements in a planar spatial reference system. We limit ourselves to
geometric measurements, and do not include attribute data measurement, which
is typically performed in a database query language. In general, measurements
on vector data are more advanced, and thus also more complex, than those on
raster data (de By and Huisman, 2009: 350).
6.1.1 Measurement on Vector Data
The primitives of vector data sets are point, (poly)line and polygon. Related
geometric measurements are location, length, distance and area size. Some of
these are geometric properties of a feature in isolation (location, length, area
size); others (distance) require two features to be identified (de By and Huisman,
2009: 351).
The location property of a vector feature is always stored by the GIS: a single
coordinate pair for a point, or a list of pairs for a polyline or polygon boundary.
Occasionally, there is a need to obtain the location of the centroid of a polygon;
some GISs store these also, others compute them ‘on-the-fly’ (de By and
Huisman, 2009: 351).
Length is a geometric property associated with polylines, by themselves, or in
their function as polygon boundary. It can obviously be computed by the GIS—as
the sum of lengths of the constituent line segments—but it quite often is also
stored with the polyline (de By and Huisman, 2009: 351).
Area size is associated with polygon features. Again, it can be computed, but
usually is stored with the polygon as an extra attribute value. This speeds up the
computation of other functions that require area size values. We see that, when
these values are stored, none of the above measurements requires computation, but
only a look up in stored data (de By and Huisman, 2009: 351).
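When length and area are not stored, they can be computed from the coordinate lists. The sketch below sums segment lengths for a polyline and, as an assumption not stated in the text above, uses the standard shoelace formula for the area of a simple polygon in a planar coordinate system.

```python
import math

def polyline_length(coords):
    """Sum of the lengths of consecutive line segments."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))

def polygon_area(ring):
    """Area of a simple polygon from its boundary vertices (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0
```

For example, `polyline_length([(0, 0), (3, 4), (3, 10)])` adds a segment of length 5 and one of length 6, and `polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)])` gives the area of a 4 by 3 rectangle.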
Measuring distance between two features is another important function. If both
features are points, say p and q, the computation in a Cartesian spatial reference
system is given by the well-known Pythagorean distance function (de By and
Huisman, 2009: 351):

$$\mathrm{dist}(p, q) = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}$$

where (x_p, y_p) and (x_q, y_q) are the coordinates of p and q.
If one of the features is not a point, or both are not, we must be precise in defining
what we mean by their distance. All these cases can be summarized as
computation of the minimal distance between a location occupied by the first and
a location occupied by the second feature. This means that features that intersect
or meet, or where one contains the other, have a distance of 0. It is not possible to
store all distance values for all possible combinations of two features in any
reasonably sized database. As a result, the system must compute ‘on the fly’
whenever a distance computation request is made (de By and Huisman, 2009:
352).
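For the case of a point and a polyline, the minimal distance can be sketched as the smallest distance from the point to any of the polyline's segments. This is an illustration in planar coordinates only; the function names are hypothetical, and other feature combinations would need further cases.

```python
import math

def point_segment_distance(p, a, b):
    """Minimal distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:            # degenerate segment: a single location
        return math.dist(p, a)
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (ax + t * dx, ay + t * dy))

def point_polyline_distance(p, coords):
    """Minimum over all segments of the polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(coords, coords[1:]))
```

A point lying on the polyline yields a distance of 0, in line with the definition of minimal distance given above.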
Another geometric measurement used by the GIS is the minimal bounding box
computation. It applies to polylines and polygons, and determines the minimal
rectangle—with sides parallel to the axes of the spatial reference system—that
covers the feature as illustrated in Figure 6-1. Bounding box computation is an
important support function for the GIS: for instance, if the bounding boxes of two
polygons do not overlap, we know the polygons cannot possibly intersect each
other. Since polygon intersection is an expensive function, but bounding box
computation is not, the GIS will always first apply the latter as a test to see
whether it must do the former (de By and Huisman, 2009: 352).

Figure 6-1: Minimal Bounding Box of (a) a Polyline, and (b) a Polygon
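The bounding box test can be sketched as follows, assuming features are given as lists of (x, y) pairs. The function names are hypothetical; the point is only that the pre-test needs a handful of comparisons.

```python
def bounding_box(coords):
    """Minimal axis-parallel rectangle as (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))

def boxes_overlap(a, b):
    """True if two boxes share at least one location (touching counts)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def may_intersect(poly_a, poly_b):
    """Cheap pre-test: False means the polygons certainly do not intersect."""
    return boxes_overlap(bounding_box(poly_a), bounding_box(poly_b))
```

A result of True does not prove that the polygons intersect; it only means the expensive intersection computation cannot be skipped.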

6.1.2 Measurement on Raster Data


Measurements on raster data layers are simpler because of the regularity of the
cells. The area size of a cell is constant, and is determined by the cell resolution.
Horizontal and vertical resolution may differ, but typically do not. Together with
the location of a so-called anchor point, this is the only geometric information
stored with the raster data, so all other measurements by the GIS are computed.
The anchor point is fixed by convention to be the lower left (or sometimes upper
left) location of the raster (de By and Huisman, 2009: 354).
Location of an individual cell derives from the raster’s anchor point, the cell
resolution, and the position of the cell in the raster. Again, there are two
conventions: the cell’s location can be its lower left corner, or the cell’s midpoint.
These conventions are set by the software in use, and it becomes more important
to be aware of them when working with low-resolution data (de By and Huisman,
2009: 354).
The area size of a selected part of the raster (a group of cells) is calculated as the
number of cells multiplied by the cell area size (de By and Huisman, 2009: 354).
The distance between two raster cells is the standard distance function applied to
the locations of their respective mid-points, taking into account the cell
resolution. Where a raster is used to represent line features as strings of cells
through the raster, the length of a line feature is computed as the sum of the
distances between consecutive cells (de By and Huisman, 2009: 354).
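The raster measurements described in this section can be sketched as follows. The sketch assumes square cells of 30 m, an anchor point at the lower left corner of the raster, rows counted upward from the anchor, and the midpoint convention for cell location; the anchor coordinates and all function names are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple

CELL_SIZE = 30.0                           # assumed cell resolution (same in x and y)
ANCHOR_X, ANCHOR_Y = 500000.0, 4000000.0   # assumed lower left anchor point

Cell = Tuple[int, int]  # (row, col), counted from the anchor


def cell_midpoint(row: int, col: int) -> Tuple[float, float]:
    """Location of a cell, using the midpoint convention."""
    x = ANCHOR_X + (col + 0.5) * CELL_SIZE
    y = ANCHOR_Y + (row + 0.5) * CELL_SIZE
    return (x, y)


def area_of_cells(n_cells: int) -> float:
    """Area of a group of cells: number of cells times the cell area."""
    return n_cells * CELL_SIZE * CELL_SIZE


def cell_distance(a: Cell, b: Cell) -> float:
    """Euclidean distance between the midpoints of two cells."""
    (xa, ya), (xb, yb) = cell_midpoint(*a), cell_midpoint(*b)
    return math.hypot(xb - xa, yb - ya)


def line_length(cells: List[Cell]) -> float:
    """Length of a line feature stored as a string of consecutive cells."""
    return sum(cell_distance(p, q) for p, q in zip(cells, cells[1:]))


print(cell_midpoint(0, 0))
print(area_of_cells(10))
print(cell_distance((0, 0), (3, 4)))
print(line_length([(0, 0), (0, 1), (1, 2)]))
```

A diagonal step between cells contributes the cell size times the square root of 2, which is why the length of a raster line depends on how the line runs through the grid.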

6.2 Overlay Functions


In this section, we look at techniques of combining two spatial data layers and
producing a third one from them. The binary operators that we discuss are known
as spatial overlay operators (de By and Huisman, 2009: 376).
Standard overlay operators take two input data layers, and assume they are
georeferenced in the same system, and overlap in study area. If either condition
is not met, the use of an overlay operator is senseless. The principle of spatial
overlay is to compare the characteristics of the same location in both data layers,
and to produce a new characteristic for each location in the output data layer.
Which characteristic to produce is determined by a rule that the user can choose
(de By and Huisman, 2009: 376).
6.2.1 Vector Overlay Functions
In vector data, the principle of comparing locations pairwise applies, but the
underlying computations rely on determining the spatial intersections of features,
one from each input vector layer, pairwise. In the vector domain, the overlaying
of data layers is computationally more demanding than in the raster domain. We
will discuss here only overlays from polygon data layers, but remark that most of
the ideas carry over to overlaying with point or line data layers (de By and
Huisman, 2009: 377).

Figure 6-2: Polygon Intersect Overlay Operator


The standard overlay operator for two layers of polygons is the polygon
intersection operator, shown in Figure 6-2. It is fundamental, as many other
overlay operators implemented in systems can be defined in terms of it. The
result of this operator is the collection of all possible polygon intersections; the
attribute table result is a join of the two input attribute tables. This output attribute
table contains only a tuple for each intersection polygon found, and this explains
why we sometimes call this operator a spatial join (de By and Huisman, 2009:
377).
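The idea of the intersect operator and its joined attribute table can be illustrated with a deliberately simplified sketch. It assumes that every polygon is an axis-parallel rectangle, so that the geometric intersection is trivial; a real GIS computes intersections of arbitrary polygons. The layer contents and the names intersect and intersect_overlay are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # rectangle as (min_x, min_y, max_x, max_y)
Feature = Tuple[Box, Dict[str, str]]     # geometry plus its attribute tuple


def intersect(a: Box, b: Box) -> Optional[Box]:
    """Intersection of two rectangles, or None if they share no area."""
    min_x, min_y = max(a[0], b[0]), max(a[1], b[1])
    max_x, max_y = min(a[2], b[2]), min(a[3], b[3])
    if min_x < max_x and min_y < max_y:
        return (min_x, min_y, max_x, max_y)
    return None


def intersect_overlay(layer_a: List[Feature], layer_b: List[Feature]) -> List[Feature]:
    """Pairwise intersection; each output feature carries both attribute tuples."""
    result = []
    for geom_a, attrs_a in layer_a:
        for geom_b, attrs_b in layer_b:
            piece = intersect(geom_a, geom_b)
            if piece is not None:
                # The output tuple is the join of the two input tuples.
                result.append((piece, {**attrs_a, **attrs_b}))
    return result


# Hypothetical input layers.
soils = [((0.0, 0.0, 4.0, 4.0), {"soil": "clay"}),
         ((4.0, 0.0, 8.0, 4.0), {"soil": "sand"})]
landuse = [((2.0, 1.0, 6.0, 3.0), {"landuse": "forest"})]

overlay = intersect_overlay(soils, landuse)
for geometry, attributes in overlay:
    print(geometry, attributes)
```

The output contains one record per intersection polygon found, each with the attributes of both parents, which is the behaviour that motivates the name spatial join.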
Another polygon overlay operator is illustrated in Figure 6-3. It is known as the
polygon clipping operator. It takes a polygon data layer and restricts its spatial
extent to the generalized outer boundary obtained from all polygons in a second
input layer. Besides this generalized outer boundary, no other polygon boundaries
from the second layer play a role in the result (de By and Huisman, 2009: 379).

Figure 6-3: Polygon Clip Overlay


Vector overlays are usually also defined for point or line data layers. Their
definition parallels the definitions of operators discussed above. Different GISs use
different names for these operators, and one is advised to carefully check the
documentation before applying any of these operators (de By and Huisman, 2009:
379).
6.2.2 Raster Overlay Functions
In raster data, as we shall see, comparisons are carried out between pairs of cells,
one from each input raster. Vector overlay operators are useful, but geometrically
complicated, and this sometimes results in poor operator performance. Raster
overlays do not suffer from this disadvantage, as most of them perform their
computations cell by cell, and thus they are fast (de By and Huisman, 2009: 381).
GISs that support raster processing usually have a full language to express
operations on rasters. Such a language is called a raster calculus (also called map
algebra), as it allows new rasters to be computed from existing ones, using a range
of functions and operators (de By and Huisman, 2009: 381).
To produce a new raster, a statement of the following format is provided:
Output raster name = Raster calculus expression
The expression on the right is evaluated by the GIS, and the raster in which it
results is then stored under the name on the left. The expression may contain
references to existing rasters, operators and functions. When the expression is
evaluated, the GIS will perform the calculation on a pixel by pixel basis, starting
from the first pixel in the first row, and continuing until the last pixel in the last
row. There is a wide range of operators and functions that can be used in raster
calculus (de By and Huisman, 2009: 381).
Arithmetic Operators
Various arithmetic operators are supported. The standard ones are multiplication
(×), division (/), subtraction (−) and addition (+). Obviously, these arithmetic
operators should only be used on appropriate data values, and for instance, not
on classification values (de By and Huisman, 2009: 383).
Other arithmetic operators may include modulo division (MOD) and integer
division (DIV). Modulo division returns the remainder of division: for instance, 10
MOD 3 will return 1, as 10 − 3 × 3 = 1. Similarly, 10 DIV 3 will return 3. Further
operators are goniometric: sine (sin), cosine (cos), tangent (tan), and their inverse
functions asin, acos, and atan, which return radian angles as real values (de By and
Huisman, 2009: 383).
Some simple raster calculus assignments are illustrated in Figure 6-4. The
assignment C1 = A + 10 will add the constant 10 to all cell values of raster
A and store the result as output raster C1. The assignment C2 = A + B will add
the values of A and B cell by cell, and store the result as raster C2. Finally, the
assignment C3 = (A − B)/(A + B) × 100 will create output raster C3, as the result
of the subtraction (cell by cell, as usual) of B cell values from A cell values, divided
by their sum. The result is multiplied by 100 (de By and Huisman, 2009: 383).
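The three assignments can be reproduced with a small sketch in plain Python. The cell values of A and B are made up for the example (they are not the values of Figure 6-4), and the helpers apply_scalar and combine are hypothetical names; a raster is represented simply as a list of rows.

```python
from typing import Callable, List

Raster = List[List[float]]


def apply_scalar(a: Raster, op: Callable[[float], float]) -> Raster:
    """Evaluate op for every cell of one raster, row by row."""
    return [[op(x) for x in row] for row in a]


def combine(a: Raster, b: Raster, op: Callable[[float, float], float]) -> Raster:
    """Evaluate op cell by cell on two rasters of equal size."""
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Hypothetical input rasters.
A = [[10, 20], [30, 40]]
B = [[5, 20], [10, 10]]

C1 = apply_scalar(A, lambda x: x + 10)
C2 = combine(A, B, lambda x, y: x + y)
# Assumes A + B is never 0; otherwise this division would fail.
C3 = combine(A, B, lambda x, y: (x - y) / (x + y) * 100)

print(C1)
print(C2)
print(C3)
print(10 % 3, 10 // 3)  # Python equivalents of 10 MOD 3 and 10 DIV 3
```

Each output cell depends only on the cells at the same position in the inputs, which is why such operations are cheap compared with vector overlays.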

Figure 6-4: Examples of Arithmetic Operators

Comparison and Logical Operators


Raster calculus also allows rasters to be compared, cell by cell. To this end, we may
use the standard comparison operators (<, <=, =, >=, > and <>) that we
introduced before. A simple raster comparison assignment is C = A <> B. It will
store truth values—either true or false—in the output raster C. A cell value in C
will be true if the cell’s value in A differs from that cell’s value in B. It will be false
if they are the same (de By and Huisman, 2009: 385).
Logical connectives such as AND, OR and NOT are also supported in many raster
calculi. Another connective that is commonly offered in raster calculus is exclusive
OR (XOR). The expression a XOR b is true if either a or b is true, but not both.
Examples of the use of these comparison operators and connectives are provided
in Figure 6-5 (de By and Huisman, 2009: 385).
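A short sketch of comparison operators and logical connectives on rasters follows. The input values are invented for the example and do not correspond to Figure 6-5; the helper cellwise is a hypothetical name, and Python's operator module stands in for the raster calculus operators.

```python
import operator
from typing import Callable, List

Raster = List[List]


def cellwise(a: Raster, b: Raster, op: Callable) -> Raster:
    """Apply a binary operator cell by cell to two rasters of equal size."""
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Hypothetical input rasters.
A = [[1, 2], [3, 4]]
B = [[1, 5], [3, 0]]

# C = A <> B: true where the cell values differ.
differs = cellwise(A, B, operator.ne)

# Truth-valued rasters to feed into the connectives.
a_gt_2 = [[x > 2 for x in row] for row in A]
b_gt_2 = [[x > 2 for x in row] for row in B]

both = cellwise(a_gt_2, b_gt_2, operator.and_)    # AND
either = cellwise(a_gt_2, b_gt_2, operator.or_)   # OR
exactly_one = cellwise(a_gt_2, b_gt_2, operator.xor)  # XOR
not_a = [[not x for x in row] for row in a_gt_2]  # NOT

print(differs)
print(both)
print(either)
print(exactly_one)
print(not_a)
```

The XOR result is true only in cells where exactly one of the two conditions holds, matching the definition given above.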

Figure 6-5: Examples of Logical Expressions in Raster Calculus

Conditional Expressions
The above comparison and logical operators produce rasters with the truth values
true and false. In practice, we often need a conditional expression that allows us
to test whether a condition is fulfilled. The general format is:
Output raster = con(condition, then_expression, else_expression)
Here, condition is the tested condition, then_expression is evaluated if condition
holds, and else_expression is evaluated if it does not hold. This means that an
expression like con(A = "Forest", 10, 0) will evaluate to 10 for each cell in the output
raster where the same cell in A is classified as forest. In each cell where this is not
true, the else_expression is evaluated, resulting in 0.
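The behaviour of the conditional expression can be imitated in Python as shown below. This is a simplified sketch: the function con here accepts only constant then and else values, whereas a raster calculus normally also accepts rasters or expressions in those positions. The land cover values in A are hypothetical.

```python
from typing import List

# Hypothetical classified raster.
A = [["Forest", "Water"],
     ["Urban", "Forest"]]


def con(condition: List[List[bool]], then_value, else_value) -> List[List]:
    """Cell by cell: then_value where the condition holds, else_value otherwise."""
    return [[then_value if cell else else_value for cell in row]
            for row in condition]


# The condition A = "Forest" evaluated for every cell.
is_forest = [[cell == "Forest" for cell in row] for row in A]

output = con(is_forest, 10, 0)
print(output)
```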

Figure 6-6: Example of Conditional Expression in Map Algebra

Raster-based Analysis:
Spatial interpolation methods; raster analysis including topological overlay; Map
calculations, statistics; integrated spatial analysis, seek computations, spread
computations, Classification function, reclassification.

6.3 Classification/Reclassification Function


Classification is a technique of purposefully removing detail from an input data
set, in the hope of revealing important patterns (of spatial distribution). In the
process, we produce an output data set, so that the input set can be left intact.
We do so by assigning a characteristic value to each element in the input set, which
is usually a collection of spatial features that can be raster cells, points, lines or
polygons. If the number of characteristic values is small in comparison to the size
of the input set, we have classified the input set (de By and Huisman, 2009: 368).
The pattern that we look for may be the distribution of household income in a city.
Household income is called the classification parameter. If we know for each ward
in the city the associated average income, we have many different values.
Subsequently, we could define five different categories (or: classes) of income:
‘low’, ‘below average’, ‘average’, ‘above average’ and ‘high’, and provide value
ranges for each category. If these five categories are mapped in a sensible colour
scheme, this may reveal interesting information. This has been done for Dar es
Salaam in Figure 6-7 in two ways (de By and Huisman, 2009: 368).
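A classification by value ranges can be sketched as follows. The class breaks, ward names and income figures are invented for the illustration and are not the Dar es Salaam data of Figure 6-7; the convention that a value equal to a break falls into the higher class is also an assumption.

```python
from bisect import bisect_right

# Hypothetical upper limits of the first four classes.
BREAKS = [20000, 40000, 60000, 80000]
LABELS = ["low", "below average", "average", "above average", "high"]


def classify(income: float) -> str:
    """Return the income category; a value equal to a break goes to the higher class."""
    return LABELS[bisect_right(BREAKS, income)]


# Hypothetical average household income per ward.
ward_income = {"Ward A": 15000, "Ward B": 41000, "Ward C": 95000, "Ward D": 60000}

# The output is a new data set; the input values are left intact.
ward_class = {ward: classify(value) for ward, value in ward_income.items()}
print(ward_class)
```

Many distinct input values are reduced to five characteristic values, which is what makes the resulting map easier to read.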
The input data set may have been itself the result of some classification, and in
such a case we talk of a reclassification. For example, we may have a soil map
that shows different soil type units and we would like to show the suitability of
units for a specific crop. In this case, it is better to assign to the soil units an
attribute of suitability for the crop. Since different soil types may have the same
crop suitability, a classification may merge soil units of different type into the same
category of crop suitability (de By and Huisman, 2009: 368).
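Reclassification is commonly expressed as a lookup table from old classes to new ones. The sketch below assumes a small soil raster and a suitability table that are both hypothetical; they only serve to show how different soil types end up in the same suitability category.

```python
from typing import Dict, List

# Hypothetical lookup table: soil type -> suitability for a given crop.
SUITABILITY: Dict[str, str] = {
    "clay": "moderate",
    "loam": "high",
    "silt loam": "high",
    "sand": "low",
}

# Hypothetical input: a raster that is itself the result of a soil classification.
soil_raster: List[List[str]] = [["clay", "loam"],
                                ["sand", "silt loam"]]

# Reclassify cell by cell into a new raster, leaving the input unchanged.
suitability_raster = [[SUITABILITY[soil] for soil in row] for row in soil_raster]
print(suitability_raster)
```

Here loam and silt loam both map to the category high, so two different soil units are merged into one suitability class.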

In classification of vector data, there are two possible results. The input features
may become the output features, in a new data layer, with an additional category
assigned. In other words, nothing changes with respect to spatial extents of the
original features. Figure 6-7 (a) is an illustration of this first type of output. A
second type of output is obtained when adjacent features with the same category
are merged into one bigger feature. Such a post-processing function is called
spatial merging, aggregation or dissolving. An illustration of this second type is
found in Figure 6-7 (b). Observe that this type of merging is only an option in
vector data, as merging cells in an output raster on the basis of a classification
makes little sense. Vector data classification can be performed on point sets, line
sets or polygon sets; the optional merge phase is sensible only for lines and
polygons (de By and Huisman, 2009: 368).

Figure 6-7: Two Classifications: (a) with Original Polygons Intact; (b) with Original
Polygons Merged

Further reading in (Bhatta, 2014: 568)

6.4 Spatial Interpolation


Further reading in (Chang, 2014: 326), (Bhatta, 2014: 480)

CHAPTER 7 SURFACE MODELING

Surface Modelling: [4 hrs]


DEM; slope; aspect; other raster functions

7.1 Digital Elevation Model (DEM)


Further reading in (Bhatta, 2014: 256)

7.2 Contouring
Further reading in (Chang, 2014: 281), (Bhatta, 2014: 256)

7.3 Slope

Further reading in (Chang, 2014: 286), (Bhatta, 2014: 561)

7.4 Aspect

Further reading in (Chang, 2014: 286), (Bhatta, 2014: 562)

7.5 Hillshade
Further reading in (Chang, 2014: 282), (Bhatta, 2014: 562)

7.6 Viewshed Analysis


Viewshed analysis determines the areas of the land surface that are visible from
one or more viewpoints.

Further reading in (Chang, 2014: 303), (Bhatta, 2014: 563)

CHAPTER 8 HYDROLOGY MODELING

Hydrology Modelling: [4 hrs]


Flow direction; flow accumulation; river network; and watershed boundary
delineation.

8.1 Filled DEM


A depression in a DEM is a cell or cluster of cells in an elevation raster that is
surrounded by cells of higher elevation values, and that represents an area of
internal drainage. Although some depressions are real, such as quarries or
glaciated potholes, many are imperfections in the DEM.
Depressions must be removed from an elevation raster. A common method for
removing a depression is to increase its cell value to the lowest outflow point out
of the sink (Jenson and Domingue, 1988, cited in Chang, 2014: 309).
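The filling idea can be illustrated for the simplest case. The sketch below is not the full procedure of Jenson and Domingue: it assumes that every depression consists of a single interior cell, for which the lowest outflow point is simply the lowest of its eight neighbours. Depressions made of several cells, and cells on the raster edge, are not handled. The elevation values and the function name are hypothetical.

```python
from typing import List


def fill_single_cell_sinks(dem: List[List[float]]) -> List[List[float]]:
    """Raise each interior cell that is lower than all 8 neighbours to its lowest neighbour."""
    filled = [row[:] for row in dem]  # work on a copy, keep the input intact
    for r in range(1, len(dem) - 1):
        for c in range(1, len(dem[0]) - 1):
            neighbours = [dem[r + dr][c + dc]
                          for dr in (-1, 0, 1)
                          for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)]
            lowest = min(neighbours)
            if dem[r][c] < lowest:
                filled[r][c] = lowest
    return filled


# Hypothetical elevation raster with one sink (value 5) in the second row.
dem = [[9, 9, 9, 9],
       [9, 5, 8, 9],
       [9, 7, 6, 4],
       [9, 9, 9, 9]]

for row in fill_single_cell_sinks(dem):
    print(row)
```

In this example the sink is raised from 5 to 6, the elevation of its lowest neighbour, from which water can continue to the lower cell with value 4 on the raster edge.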

8.2 Flow Direction


Further reading in (Chang, 2014: 309)

8.3 Flow Accumulation

Further reading in (Chang, 2014: 310)

8.4 River/Stream Network

Further reading in (Chang, 2014: 311)

8.5 Watershed Boundary


A watershed is an area that drains water and other substances to a common point.
Further reading in (Chang, 2014: 312)

CHAPTER 9 MAKING MAPS

Making Maps: [3 hrs]


Map functions in GIS; map design; map elements; choosing a map type; Exporting
map in different format, printing a map.

9.1 Map Functions in GIS

9.2 Basic Elements of Map


Once decisions have been made about projection, symbolization, and the like, the
composition or layout of map elements can begin. The basic elements the
mapmaker has to work with are the subject area, the title, the legend, the scale
indicator, the graticule or north arrow, supplementary text, frame/border, and
insets (Figure 2.15). Not all of these elements will appear on every map (Tyner,
2010: 31).
If some people had their way, every map would always include five elements that
aid in understanding by whom, why, and when a map was made. However, like
all recommendations, these five parts are suggestions, not requirements. The five
essential elements of maps are (Harvey, 2008: 48): Legend (special note re color),
Scale, Orientation, Neatline, and Title.
Additional and important elements include (Harvey, 2008: 49): Name of author,
Date map published, Explanation of purpose, Projection, Data sources, and
Gridlines.

Title
Most maps have a title. If the map is to stand alone, that is, printed on a separate
sheet, not in a book, a title should appear on the map sheet; if the map is printed
in a book, report, thesis, or dissertation, the title may appear on the map or as a
caption below the illustration. The caption can explain or elaborate if there is a
title on the map. There are three things to consider with titles: wording,
placement, and type style. The wording introduces the reader to the map subject
just as the title of a book or article does. Wording and type style are covered in
Chapter 3. Placement of the title is a part of the map layout. Contrary to what
many believe, the title does not have to be at the top of the map. It can be placed
anywhere on the page as long as it stands out in the visual hierarchy—the title is
normally the most important wording on the map—and as long as it creates a
balanced composition. The shape of the map area often provides a natural place
for the title in the composition (Figure 2.16) (Tyner, 2010: 32).
Legends
Legends present mini design problems. Like title design, legend design has several
parts: content, wording, placement, and style. First of all, any symbol in the
legend must look exactly like the symbol on the map. Miniaturizing the symbol,
for example, will cause reader confusion (Figure 2.17). It isn’t necessary to title
the legend space as “legend,” although this is commonly done, especially on maps
in children’s textbooks, and was built into some early computer mapping
programs. This is much like saying “a map of” in the title; it is redundant and a
waste of space—although on children’s maps it can serve as a teaching aid. The
legend title can elaborate on the subject of the map and should explain the
material in the legend (Figure 2.18). For example, if a map shows median income
in the United States, by state, the legend could be titled “Income in Dollars.” Or if
the map title is simply “Income by State” the legend title can be “Median Income
in Dollars.” The goal is clarity (see Chapter 3).
Placement of the legend, like the other design elements, is governed by balance
and white space. There is no general rule for where a legend should be placed
although some companies and agencies may establish their own guidelines for a
map series.
The lettering style of a legend does not have to be the same as that of the title,
but the typefaces must complement one another. Some typefaces do not work
well together (see Chapter 3) (Tyner, 2010: 33).
Scale

9.3 Types of Map


Three of the most common types of maps are thematic, topographic, and
cadastral. There are many ways to develop typologies of maps, but these three
types seem to distinguish both how and why maps are used. Thematic maps are the
most common: they show specific topics and their geographic relationships and
distributions. Thematic maps show us the weather forecast, election results,
poverty, soil types, and the spread of a virus. Topographic maps—from the United
States Geological Survey (USGS), for example—show the physical characteristics
of land in an area and the built changes in the landscape. Cadastral maps show how
land is divided into real property, and sometimes the kinds of built improvements
(Harvey, 2008: 13).

9.4 Map Design


When we speak of map design there are two meanings: layout of design elements
and planning the map. Layout involves decisions such as “Where should I place
the title, where should the legend and scale go?”; in art, this is called composition.
Design in the sense of planning begins before a single line is drawn and includes
deciding what information will be included and choosing a projection, the scale,
and the type of symbols. It is at the heart of the map creation process. In this
chapter we look at both aspects of design. The remainder of the book will assist
you in making design decisions. Any design, whether of maps or buildings, has
certain goals: clarity, order, balance, contrast, unity, and harmony. These must
be kept in mind when planning a map (Tyner, 2010: 18).

Clarity
A map that is not clear is worthless. Clarity involves examining the objectives of
the map, emphasizing the important points, and eliminating anything that does
not enhance the map message. Although removing data can be carried to an
extreme, as in the case of propaganda maps, putting the names of every river on
a population map simply clutters the map and makes the thematic information
hard to read (Figure 2.1) (Tyner, 2010: 19).
Order
Order refers to the logic of the map. Is there visual clutter or confusion? Are the
various elements placed logically? Is the reader’s eye led through the map
appropriately? Since the map is a synoptic, not a serial, communication,
cartographers cannot assume that readers will look first at the title, then at the
legend, and so on. Studies of eye movements show there is considerable shifting
of view. Rudolph Arnheim has noted that the orientation of shapes seems to exert
an attraction because the shape of the elements on a page creates axes that
provide direction. That is, vertical lines lead the eye up and down on the map;
horizontal lines lead the eye left and right (Tyner, 2010: 19).
Balance
Every element of the map has visual weight. These weights should be distributed
evenly about the optical center of the page, which is a point slightly above the
actual center, or the map will appear to be weighted to one side or unstable (Figure
2.2). While this doesn’t affect the readability or usefulness of the map, it is a factor
in its appearance.
Generally, visual weight within a frame depends on location, size, color, shape,
and direction. According to Arnheim (1969, pp. 14–15), visual weights vary as
follows (Tyner, 2010: 20):

• Centrally located elements have less weight than those to one side.
• Objects in the upper half appear heavier than those in the lower half.
• Objects on the right side appear heavier than those on the left side.
• Weight appears to increase with increasing distance from the center.
• Isolated elements have more weight than grouped objects.
• Larger elements have greater visual weight.
• Red is heavier than blue.
• Bright colors are heavier than dark.
• Regular shapes seem heavier than irregular shapes.
• Compact shapes have more visual weight than unordered, diffuse shapes.
• Forms with a vertical orientation seem heavier than oblique forms.

9.5 Printing a Map

9.6 Exporting Map

REFERENCES
Bhatta, B. (2014) Remote Sensing and GIS, 2nd edition, New Delhi: Oxford
University Press.
Bolstad, P. (2012) GIS Fundamentals: A First Text on Geographic Information
Systems, 4th edition, Eider Press.
Buckley, D.J. (1997) The GIS Primer – An Introduction to Geographic Information
Systems, Colorado: GIS Solutions Inc.
Burrough, P.A. (1986) Principles of Geographical Information Systems for Land
Resources Assessment, Oxford University Press.
Burrough, P.A. and McDonnell, R.A. (2014) Principles of Geographical Information
Systems, South Asia edition, New Delhi: Oxford University Press.
Campbell, E. and Shin, M. (2012) Geographical Information System Basics (v.1.0).
Chang, K.-T. (2014) Introduction to Geographic Information Systems, 4th edition,
New Delhi: McGraw Hill.
Cho, G. (1995) A Self-Teaching Student's Manual for Geographic Information
Systems, Canberra: University of Canberra and Committee for the Advancement
of University Teaching (CAUT).
Chrisman, R. (1999) 'What does 'GIS' mean?', Transactions in GIS, vol. 3, no. 2,
pp. 175-186.
Clarke, C. (1995) Getting Started With Geographic Information Systems, 3rd
edition, Prentice Hall.
Coppock, J.T. and Rhind, D.W. (1991) 'An Overview and Definitions of GIS', in
Maguire, D.J., Goodchild, M.F. and Rhind, D.W. (ed.) Geographical Information
Systems: Principles and Applications, London: Longman.
Coppock, J.T. and Rhind, D.W. (1991) 'The History of GIS', in Maguire, D.J.,
Goodchild, M.F. and Rhind, D.W. (ed.) Geographical Information Systems:
Principles and Applications, London: Longman.
de By, R.A. and Huisman, O. (ed.) (2009) Principles of Geographic Information
Systems (ITC Educational Textbook Series; 1), 4th edition, Enschede: The
International Institute for Aerospace Survey and Earth Sciences (ITC).
de By, R.A., Knippers, R.A., Sun, Y., Ellis, M.C., Kraak, M.-J., Weir, J.C.,
Georgiadou, Y., Radwan, M.M., van Westen, C.J., Kainz, W. and Sides, E.J. (2001)
Principles of Geographic Information Systems (ITC Educational Textbook Series;
1), 2nd edition, Enschede: The International Institute for Aerospace Survey and
Earth Sciences (ITC).
Dueker, J. (1978) Land Resource Information Systems: A Review of fifteen years
experience, Iowa City.
Fazal, S. (2008) GIS Basics, New Delhi: New Age International (P) Ltd.
Freeman, H. (1974) 'Computer Processing of Line Drawing Images', Computing
Surveys, vol. 6, pp. 54-97.
Galati, S.R. (2006) Geographic Information Systems Demystified, Boston: Artech
House.
Garcia-Molina, H., Ullman, J.D. and Widom, J. (2002) Database Systems: The Complete
Book, New Jersey: Prentice Hall.
Goodchild, M.F. (1992) 'Geographical Information Science', International Journal
of Geographical Information Systems, vol. 6, no. 1, Jan.-Feb.
Harvey, F. (2008) A Primer of GIS : Fundamental Geographic and Cartographic
Concepts, New York: The Guilford Press.
Healey, R.G. (1991) 'Database Management Systems', in Maguire, D.J.,
Goodchild, M.F. and Rhind, D.W. (ed.) Geographical Information Systems:
Principles and Applications, London: Longman.
Jensen, J.R. (2011) Remote Sensing of the Environment, 2nd edition, Dorling
Kindersley India Pvt. Ltd.
Johnson, L.E. (2009) Geographic Information Systems in Water Resources
Engineering, New York: Taylor & Francis Group.
Kennedy, M. (2000) Understanding Map Projections, ESRI.
Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (ed.) (n.d.)
Geographical Information Systems - Principles and Applications, 2nd edition.
Longley, P.A., Goodchild, M.F., Maguire, D.J. and Rhind, D.W. (2004) Geographic
Information Systems and Science, 2nd edition, John Wiley & Sons, Ltd.
Murai, S. (1998a) GIS Workbook, Tokyo: Japan Association of Surveyors (JAS).
Murai, S. (1998b) GIS Workbook, Tokyo: Japan Association of Surveyors (JAS).
National Information Standards Organization (NISO) (2004) Understanding
Metadata, NISO Press.
Salgé, F. (1999) 'National and International Data Standards', in Longley, P.A.,
Goodchild, F., Maguire, D.J. and Rhind, W. Geographical Information Systems -
Vol 1: Principles and Technical Issues, 2nd edition, New York: John Wiley and Sons,
Inc.
Schofield, W. and Breach, M. (2007) Engineering Surveying, 6th edition, Oxford:
Elsevier.
Shekhar, S. and Xiong, H. (ed.) (2008) Encyclopedia of GIS, Springer Science.
Snyder, J.P. (1987) Map Projections - A Working Manual, Washington: United
States Government Printing Office.
Star, J. and Estes, J. (1990) Geographical Information Systems: An Introduction,
Englewood Cliffs, New Jersey: Prentice Hall.
Svedberg, D. and Carlsson, S. (1999) 'Calibration, Pose and Novel Views from
Single Images of Constrained Scenes', Proceedings of the 11th Scandinavian
Conference on Image Analysis (SCIA’99), June, pp. 111-117.
Tsou, M. (2004) 'Integrated Mobile GIS and Wireless Internet Map Servers for
Environmental Monitoring and Management', Cartography and Geographic
Information Science, vol. 31, pp. 153-65.
Tyner, J.A. (2010) Principles of Map Design, New York: The Guilford Press.
Veregin, H. (1999) 'Data Quality Parameters', in Longley, P.A., Goodchild, F.,
Maguire, D.J. and Rhind, W. Geographical Information Systems - Vol 1: Principles
and Technical Issues, 2nd edition, New York: John Wiley and Sons, Inc.