3 - Models of Geographic Data Used in GIS - Eng
The aim of this lecture is to present the geographic data models used in GIS, i.e. the rules for representing different aspects of the real world in a digital computer environment.
The heart of any GIS is its data model: the set of rules describing and representing selected aspects of the real world in a digital computer environment (Fig. 3.1). GIS users interact with an operational GIS to carry out their tasks: mapping, querying databases, performing spatial analysis and so on. How the system functions is directly determined by the way the real world is represented in its digital models.
[Figure: People, Operational GIS, GIS data model (elements: line, polygon) and Real world (building, street, water supply line, pump station, houses), linked by Analysis and presentation, Description and representation, and Interpretation and explanation.]
Fig. 3.1. The role of a data model in GIS (source: Paul A. Longley, Michael F. Goodchild, David
J. Maguire, David W. Rhind, Geographic Information Systems and Science, 2nd Edition, Wiley,
2005)
Geographic reality is very complex and multifaceted, while a computer is finite, relatively simple and can handle only digital data. The choice of data model is therefore critical in GIS. Different types of users apply GIS for different purposes, and the phenomena they study have diverse characteristics, so there is no single universal GIS data model that is best for all cases.
To represent the real world in a computer, four levels of abstraction (generalization or simplification) are distinguished (Fig. 3.2). First, reality is made up of real-world phenomena (buildings, streets, roads, forests, people, etc.) and includes all their aspects, including those that individuals may not perceive or may consider unimportant for a particular application. Second, the conceptual model is human-oriented, often only partially structured, and concerned with selected objects and processes. Third, the logical model is an implementation-oriented representation of the real world. Fourth, the physical model relates directly to the practical implementation of a GIS: it describes how the files or databases are actually stored.
[Figure: Reality, Conceptual model (human-oriented), Logical model and Physical model (computer-oriented), in order of increasing abstraction.]
Fig. 3.2. Four levels of abstraction in representation of real world in a computer (source: Paul A.
Longley, Michael F. Goodchild, David J. Maguire, David W. Rhind, Geographic Information
Systems and Science, 2nd Edition, Wiley, 2005)
Many data models have been developed and included in GIS software systems over the last fifty years:
• Computer-Aided Design (CAD) – focused on computer-based engineering and drawing;
• Graphical (non-topological) – simple mapping;
• Image – image processing and simple spatial analysis;
• Raster/Grid – spatial analysis and modelling;
• Vector/georelational-topological – analysis, modelling and mapping operations using
vector geometric features;
• Network – network analysis for communication, transportation, hydrology applications;
• Triangulated irregular network (TIN) – surface visualization, analysis and modelling;
• Object – various operations on all geographic feature types (rasters, vectors, TINs, etc.) for any type of application.
The first GISs were based on very simple data models inherited from CAD, digital mapping and image processing systems. In CAD systems, real-world features are represented as simple points, lines or polygons. However, this data model never became common in GIS, for the following reasons:
• Traditionally, CAD uses paper sheet coordinates rather than real-world coordinates;
• Individual features have no unique IDs, so they are very difficult to link with attribute information, which is a key requirement of GIS;
• CAD data models do not support topological information, which many spatial analysis operations require.
Graphical (non-topological) data models originate from digital mapping systems. The main requirements of computer-based cartography were automatic map reproduction and thematic mapping, and production lines were developed to digitize, store and print maps. All map elements are stored as points, lines or polygons, with annotations used for place names. Like CAD systems, graphical data models support neither attributes nor topological information.
At about the same time as CAD and computer cartography systems were being developed, a third type of data model emerged to handle image processing tasks. It focuses on processing scanned aerial photographs and digital satellite images, so these systems naturally use a raster (grid) data model. Compared with the raster data model, the main distinctive feature of the image data model is its low level of abstraction: real-world objects in photographs are represented the way humans see them. Some spatial analysis functions traditionally available in raster GIS can also be performed with the image data model.
In the raster data model, the real world is represented by an array of rectangular, usually square, cells or picture elements (pixels) (Fig. 3.3). Each pixel (cell) holds a value of an attribute of the geographic feature being described.
Raster data are usually stored as a matrix of numeric values, together with meta-information about the matrix held in a file header. This meta-information traditionally includes the geographic or Cartesian coordinates of a reference cell, the cell size, the number of rows and columns, and the projection. The grid is georeferenced by giving the coordinates of one corner (e.g. the lower-left corner) of a reference cell.
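The following is a minimal Python sketch of how this header information locates cells on the ground. It assumes square cells, georeferencing by the lower-left corner of the grid, and row 0 as the top row of the matrix. The field names `ncols`, `nrows`, `xllcorner`, `yllcorner` and `cellsize` are borrowed from the ESRI ASCII grid header for illustration. `RasterHeader`, `cell_centre` and `cell_at` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RasterHeader:
    """Hypothetical raster header: the meta-information stored with the value matrix."""
    ncols: int
    nrows: int
    xllcorner: float  # x of the lower-left corner of the lower-left cell
    yllcorner: float  # y of the lower-left corner of the lower-left cell
    cellsize: float   # side length of a square cell, in map units

    def cell_centre(self, row: int, col: int) -> tuple[float, float]:
        """Map coordinates of the centre of cell (row, col); row 0 is the top row."""
        if not (0 <= row < self.nrows and 0 <= col < self.ncols):
            raise IndexError("cell outside the raster")
        x = self.xllcorner + (col + 0.5) * self.cellsize
        y = self.yllcorner + (self.nrows - row - 0.5) * self.cellsize
        return x, y

    def cell_at(self, x: float, y: float) -> tuple[int, int]:
        """Row and column of the cell that contains the map point (x, y)."""
        col = int((x - self.xllcorner) // self.cellsize)
        row = self.nrows - 1 - int((y - self.yllcorner) // self.cellsize)
        if not (0 <= row < self.nrows and 0 <= col < self.ncols):
            raise IndexError("point outside the raster")
        return row, col


# A 3-row x 4-column raster of 10 m cells; values are stored row by row, top row first.
header = RasterHeader(ncols=4, nrows=3, xllcorner=500000.0, yllcorner=6000000.0, cellsize=10.0)
values = [[1, 1, 2, 2],
          [1, 3, 3, 2],
          [4, 4, 3, 2]]
row, col = header.cell_at(500015.0, 6000004.0)
print(header.cell_centre(0, 0), values[row][col])
```

Only the header and the matrix are stored. Every cell's position follows from the reference corner and the cell size, so no per-cell coordinates are needed.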
The raster data model is very convenient for storing and analysing continuous data. The value of each cell may represent a class or category, a measurement, or a modelling result for the area that the cell covers. The raster data model also has some limitations:
• The accuracy with which a geographic object is described in a raster is inversely proportional to the cell size: with larger cells, small geographic objects or details of other objects may be lost.
• Reducing the cell size by a factor of 10 increases the size of the raster data set by a factor of 10² = 100, because the number of cells grows in both the row and the column direction.
• Raster cells have no semantic meaning: cell boundaries cannot be identified in the field.
• No topology is maintained in a raster.
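The quadratic growth described in the second point can be checked with a few lines of Python. This sketch makes several assumptions: a rectangular study area, square cells, and uncompressed storage with a fixed number of bytes per cell (4 here, a hypothetical choice). `cell_count` and `storage_bytes` are illustrative names.

```python
import math


def cell_count(width: float, height: float, cellsize: float) -> int:
    """Number of square cells needed to cover a width x height rectangle."""
    return math.ceil(width / cellsize) * math.ceil(height / cellsize)


def storage_bytes(width: float, height: float, cellsize: float, bytes_per_cell: int = 4) -> int:
    """Uncompressed size of a single-band raster, assuming a fixed size per cell."""
    return cell_count(width, height, cellsize) * bytes_per_cell


# A 10 km x 10 km study area at 10 m and at 1 m resolution.
coarse = cell_count(10_000, 10_000, 10)
fine = cell_count(10_000, 10_000, 1)
print(coarse, fine, fine // coarse)
print(storage_bytes(10_000, 10_000, 1) / 2**20, "MiB")
```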
3.2.3 Vector data model
In the vector data model, each real-world object is first classified into a geometric type: point, line or polygon (Fig. 3.4).
Points (e.g. a nesting site, a tourist information centre) are recorded as a single coordinate pair. Lines (e.g. a street, a river, an elevation contour line) are recorded as a series of ordered coordinate pairs, also called polylines in the GIS literature. Polygons (e.g. city districts, building outlines, voting districts) are recorded as closed chains of line segments in which the coordinates of the starting point are the same as those of the ending point (Fig. 3.5). The coordinates that define the geometry of each geographic object may be 2-, 3- or 4-dimensional: x, y; x, y plus a height value z; or x, y, z plus an additional measure m representing time or some other property.
[Figure: a point, a line and a polygon plotted on x-y axes, with vertices labelled (15, 6), (23, 6), (5, 5), (11, 5), (8, 2) and (20, 2).]
Fig. 3.5 Representation of points, lines and polygons in a vector
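Below is a minimal Python sketch of the three geometric types stored as coordinate lists. The coordinates are hypothetical and not the ones shown in Fig. 3.5, and the helper names are illustrative. As described above, the polygon ring repeats its first vertex at the end. Its area is computed with the standard shoelace formula.

```python
import math

Coord = tuple[float, float]

# Hypothetical features: a point is one pair, a line an ordered list of pairs,
# a polygon a closed list whose last pair repeats the first.
point: Coord = (15.0, 6.0)
line: list[Coord] = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
polygon: list[Coord] = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]


def is_closed_ring(ring: list[Coord]) -> bool:
    """A ring needs at least four pairs (three vertices plus the repeated first one)
    and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def line_length(coords: list[Coord]) -> float:
    """Sum of the straight segments between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring: list[Coord]) -> float:
    """Shoelace formula over consecutive vertex pairs of a closed ring."""
    if not is_closed_ring(ring):
        raise ValueError("polygon ring must be closed")
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2


print(line_length(line), polygon_area(polygon))
```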
Some linear features can be represented not only as a series of ordered coordinates but also by mathematical functions (e.g. Bézier curves).
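To illustrate the function-based option, the Python sketch below evaluates a cubic Bézier curve from its four control points using the Bernstein form. It then samples the curve into an ordinary polyline of vertices, which is what a vertex-based vector layer would store. The control points and the number of samples are hypothetical.

```python
Coord = tuple[float, float]


def cubic_bezier(p0: Coord, p1: Coord, p2: Coord, p3: Coord, t: float) -> Coord:
    """Point on a cubic Bezier curve at parameter t in [0, 1] (Bernstein form)."""
    u = 1.0 - t
    b0, b1, b2, b3 = u**3, 3 * u**2 * t, 3 * u * t**2, t**3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return x, y


def bezier_to_polyline(p0: Coord, p1: Coord, p2: Coord, p3: Coord, n: int) -> list[Coord]:
    """Approximate the curve by n straight segments (n + 1 vertices)."""
    return [cubic_bezier(p0, p1, p2, p3, i / n) for i in range(n + 1)]


# Hypothetical control points: the curve starts at (0, 0), ends at (1, 0)
# and is pulled upwards by the two inner control points.
vertices = bezier_to_polyline((0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), n=8)
print(vertices[0], vertices[4], vertices[-1])
```

The function form needs only four control points, whereas the sampled polyline needs more vertices the more closely it must follow the curve.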
3.2.3.1 “Spaghetti” data model
[Figure: a point (1), a line (2) and two polygons (3, 4) stored in the “spaghetti” data model.]
Feature   Id   Location
Point     1    X,Y
Line      2    X1,Y1; X2,Y2; …; Xn,Yn
Polygon   3    X1,Y1; X2,Y2; X3,Y3; ...; X1,Y1
Polygon   4    X1,Y1; X2,Y2; X11,Y11; ...; X1,Y1
The structure of the “spaghetti” data model is very simple, but it does not support topology. This is the main reason why “spaghetti” is inefficient for spatial analysis: topological relations between features must be computed during the analysis itself. On the other hand, “spaghetti” is very efficient for mapping, where there is no need to know the topological relations between features.
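The cost of missing topology can be sketched in Python. With spaghetti storage, finding which polygons are neighbours means comparing the coordinates of every pair of polygons at analysis time. This sketch assumes that shared boundaries were digitized twice with exactly identical vertices, which real spaghetti data often do not guarantee. The feature layout mirrors the table above, but the coordinates and function names are hypothetical.

```python
from itertools import combinations

# Spaghetti storage: feature id -> (feature type, coordinate list); no topology is stored.
FEATURES = {
    1: ("point", [(1.0, 5.0)]),
    2: ("line", [(0.0, 4.0), (2.0, 4.5), (4.0, 4.0)]),
    3: ("polygon", [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]),
    4: ("polygon", [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0), (2.0, 0.0)]),
}


def edge_set(ring):
    """Undirected boundary segments of a closed ring."""
    return {frozenset(segment) for segment in zip(ring, ring[1:])}


def polygon_neighbours(features):
    """Derive adjacency on the fly by comparing every pair of polygons."""
    edges = {fid: edge_set(coords) for fid, (kind, coords) in features.items() if kind == "polygon"}
    adjacency = {fid: set() for fid in edges}
    for a, b in combinations(edges, 2):  # O(n^2) pairs of polygons
        if edges[a] & edges[b]:  # a shared segment means the polygons touch along a boundary
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


print(polygon_neighbours(FEATURES))
```

A topological model stores this adjacency once, so the analysis can look it up instead of recomputing it.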
The topological model is a type of vector data model used to encode spatial relationships in a GIS. Features (points, lines and polygons) are stored according to topological rules. Topology is the branch of mathematics used to define spatial relationships. Topological relationships remain stable when the geographic space of the objects is transformed. For example, when the map projection is changed, lengths and angles may change, but topology (e.g. which features are neighbours) does not.
There are several forms of the vector-topological data model. Here we discuss the so-called “Arc-Node” model (Fig. 3.7). An arc is a series of points (vertices, in GIS terminology) that starts and ends at a node. A node is the intersection point of two or more arcs; it can also occur at the end of a “dangling” arc. Isolated nodes, not connected to any arc, represent points. A polygon is built from a closed chain of arcs that represents the boundary of the area.
[Figure: map of polygons A, B, C (an island in B), point D (in B) and the area outside the map E, with nodes M1 to M6 and arcs a1 to a7 on X and Y axes.]

Node topology table
Node   Arcs
M1     a1, a3, a4
M2     a1, a2, a5
M3     a2, a3, a5
M4     a4
M5     a6
M6     a7

Polygon topology table
Polygon   Arcs
A         a1, a5, a3
B         a2, a5, 0, a6, 0, a7
C         a7
D         a6
E         area outside the map border

Arc topology table
Arc   Start node   End node   Left polygon   Right polygon
a1    M1           M2         E              A
a2    M2           M3         E              B
a3    M3           M1         E              A
a4    M4           M1         A              A
a5    M3           M2         A              B
a6    M5           M5         B              B
a7    M6           M6         B              C

Arc coordinate table
Arc   Start X,Y   Intermediate X,Y            End X,Y
a1    125,220     200,220                     200,150
a2    200,150     200,30; 50,30               50,110
a3    50,110      50,220                      125,220
a4    100,140     100,180                     125,220
a5    50,110      100,120; 130,120; 170,150   200,150
a6    80,60       none                        80,160
a7    180,110     175,80; 130,80; 150,120     180,110
Fig. 3.7. The “Arc-Node” vector topological data model (based on Aronoff S., 1989, Geographic
information systems: a management perspective, WDL Publications, Ottawa, Canada)
The locations of geographic objects and their spatial relationships are recorded in topological tables. The polygon topology table lists the arcs that make up the boundary of each polygon; for example, polygon A consists of arcs a1, a3 and a5. Polygon C is an island inside polygon B. This is indicated in the arc list of polygon B by a 0 preceding the arc that makes up the island; polygon C itself consists of the single arc a7. The point inside polygon B is also treated as a separate polygon, D, which is built from the single arc a6: a point is a polygon with no area. The area outside the map border, polygon E, is recorded without listing any arcs that bound it.
Each node is listed in the node topology table together with the arcs that meet at it. For example, three arcs meet at node M1: a1, a3 and a4. The arc topology table defines how nodes and polygons relate to the arcs. One end point of each arc is treated as its start node and the other as its end node; for example, arc a5 starts at node M3 and ends at node M2. Moving from M3 to M2, polygon A lies to the left and polygon B to the right.
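A small Python sketch shows why storing these tables pays off: adjacency questions become simple look-ups instead of geometric computations. The dictionary below transcribes the arc topology table of Fig. 3.7. The function names `node_table` and `neighbours` are hypothetical. The sketch rebuilds the node topology table from the arc table and reads polygon neighbours directly from the left/right columns.

```python
# Arc topology table of Fig. 3.7: arc -> (start node, end node, left polygon, right polygon)
ARC_TOPOLOGY = {
    "a1": ("M1", "M2", "E", "A"),
    "a2": ("M2", "M3", "E", "B"),
    "a3": ("M3", "M1", "E", "A"),
    "a4": ("M4", "M1", "A", "A"),
    "a5": ("M3", "M2", "A", "B"),
    "a6": ("M5", "M5", "B", "B"),
    "a7": ("M6", "M6", "B", "C"),
}


def node_table(arcs):
    """Hypothetical helper: rebuild the node topology table (node -> sorted arc ids)."""
    nodes = {}
    for arc, (start, end, _left, _right) in arcs.items():
        for node in {start, end}:  # an arc that starts and ends at the same node is listed once
            nodes.setdefault(node, []).append(arc)
    return {node: sorted(arc_ids) for node, arc_ids in nodes.items()}


def neighbours(arcs, polygon):
    """Hypothetical helper: polygons that share at least one arc with `polygon`."""
    result = set()
    for _start, _end, left, right in arcs.values():
        if left == polygon and right != polygon:
            result.add(right)
        elif right == polygon and left != polygon:
            result.add(left)
    return result


print(node_table(ARC_TOPOLOGY)["M1"])  # arcs meeting at node M1
print(sorted(neighbours(ARC_TOPOLOGY, "B")))
```

Arcs with the same polygon on both sides, such as the dangling arc a4, are skipped because they do not separate two different polygons.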
The arc coordinate table relates the map features to real-world locations. Each arc is represented by one or more straight line segments defined by a series of coordinates, and the more complex the shape, the more coordinates are needed. For example, arc a1 contains just one sharp turn, so it is encoded by its start and end points plus one intermediate point. All the topological tables can be related to one another.
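Joining the tables recovers the geometry of a polygon. The Python sketch below chains the arcs listed for polygon A in the polygon topology table (a1, a5, a3), using coordinates from the arc coordinate table. Any arc that has A on its left is reversed so that the polygon always lies to the right. The sketch then checks that the ring closes and computes its area with the shoelace formula. The orientation rule and the function names are assumptions of this sketch, and islands (the 0 separators) are not handled.

```python
# Arc coordinate table (start, intermediate..., end) and arc topology for the arcs of polygon A.
ARC_COORDS = {
    "a1": [(125, 220), (200, 220), (200, 150)],
    "a3": [(50, 110), (50, 220), (125, 220)],
    "a5": [(50, 110), (100, 120), (130, 120), (170, 150), (200, 150)],
}
ARC_TOPOLOGY = {  # arc -> (start node, end node, left polygon, right polygon)
    "a1": ("M1", "M2", "E", "A"),
    "a3": ("M3", "M1", "E", "A"),
    "a5": ("M3", "M2", "A", "B"),
}


def polygon_ring(polygon, arc_ids, coords, topology):
    """Hypothetical helper: chain arcs into one closed ring that keeps `polygon` on its right."""
    ring = []
    for arc in arc_ids:
        _start, _end, left, right = topology[arc]
        if right == polygon:
            points = coords[arc]
        elif left == polygon:
            points = coords[arc][::-1]  # walk the arc backwards so the polygon is on the right
        else:
            raise ValueError(f"{arc} does not bound {polygon}")
        if ring and ring[-1] != points[0]:
            raise ValueError(f"{arc} does not connect to the previous arc")
        ring.extend(points if not ring else points[1:])  # skip the shared node
    if ring[0] != ring[-1]:
        raise ValueError("arcs do not form a closed ring")
    return ring


def shoelace_area(ring):
    """Area of a closed ring; the signed sum is negative for clockwise rings."""
    twice = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice) / 2


ring_a = polygon_ring("A", ["a1", "a5", "a3"], ARC_COORDS, ARC_TOPOLOGY)
print(ring_a)
print(shoelace_area(ring_a))
```

Keeping the polygon on the right produces a clockwise ring. This is consistent with the left/right columns of the arc topology table.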
The network data model is a special type of vector-topological data model used to model flows (of goods, resources, water, electricity, etc.). There are two primary types of networks:
• Radial (tree-shaped) – flows always have an upstream or downstream direction (e.g. hydrographic networks);
• Looped – flows may meet and form loops (e.g. street and water supply networks).
Networks are constructed in a GIS from points and lines. The points include crossroads, stream confluences, fuses, switches and water valves, usually referred to as nodes in the vector-topological data model. The lines include streams, streets and water supply pipelines. The topological relations of a network define how lines are interconnected at nodes. To carry out network analysis, it is also necessary to define rules for how flows can move through the network. In a water supply network, for example, the capacity of mains and laterals, the parameters of pump stations, the properties of fittings, etc. must be defined (Fig. 3.8).
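Network analysis builds on this node-line connectivity. The Python sketch below stores a small, hypothetical water supply network as an undirected graph. It uses a breadth-first search to find which houses remain connected to the pump when some valves are closed. The sketch models connectivity only. The flow rules mentioned above (capacities of mains and laterals, pump parameters, fittings) are deliberately left out, and all node names are invented for illustration.

```python
from collections import deque

# Hypothetical looped water supply network: each pair is a pipe (main or lateral)
# between two network nodes (pump, valves, junctions, house connections).
PIPES = [
    ("pump", "v1"), ("v1", "j1"), ("j1", "j2"), ("j2", "house1"),
    ("j1", "v2"), ("v2", "house2"), ("j2", "j3"), ("j3", "house3"),
    ("j3", "v2"),  # second route to v2 closes a loop
]


def build_graph(pipes):
    """Adjacency list: node -> set of directly connected nodes."""
    graph = {}
    for a, b in pipes:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def supplied_nodes(graph, source, closed=frozenset()):
    """Nodes reachable from `source` without passing through a closed valve."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in seen and nxt not in closed:
                seen.add(nxt)
                queue.append(nxt)
    return seen


graph = build_graph(PIPES)
houses = {"house1", "house2", "house3"}
print(sorted(houses & supplied_nodes(graph, "pump")))
print(sorted(houses & supplied_nodes(graph, "pump", closed={"v2"})))
```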
[Figure: elements of a water supply network: house, main, meter, lateral, pump, fitting, valve and hydrant.]
A Triangulated Irregular Network (TIN) is a type of vector-topological data model used to represent and analyse surfaces. The term 2.5D is sometimes used to describe such surface structures (terrain elevation, river bottom, etc.), which have a single z value at each x, y location. A true 3D (three-dimensional) structure can contain multiple z values at the same x, y location (e.g. mountain terrain with overhangs, engineering structures, tunnels). As the name suggests, a TIN represents a surface by contiguous, non-overlapping triangular elements (Fig. 3.9).
Fig. 3.9. The Nemunas river bottom and surrounding terrain represented as a TIN
A TIN is created from a set of points with x, y and z coordinates; more complex surfaces require more densely sampled points.
A TIN is a topological data structure that records the spatial relations between each triangle and its x, y, z nodes. Let us label each triangle with a capital letter and each node with a number (Fig. 3.10). Coordinates and topological information are stored in relational tables. The triangle topology table lists all the triangles with their nodes and their neighbouring triangles. Triangles on the border of the study area have fewer neighbours; in this example, only two are listed for each of them. The x, y, z coordinates of the nodes are stored in a coordinate table.
[Figure: TIN with triangles A to N and nodes 1 to 11.]

Triangle   Neighbouring triangles   Nodes
A          B, K                     1, 6, 7
B          A, C, L                  1, 7, 8
C          B, D                     1, 2, 8
D          C, E                     2, 3, 8
E          D, F, L                  3, 8, 9
F          E, G                     3, 4, 9
G          F, H, M                  4, 9, 10
H          G, I                     4, 5, 10
I          H, J, N                  5, 10, 11
J          I, K                     5, 6, 11
K          A, J, N                  6, 7, 11
L          B, E, M                  7, 8, 9
M          G, L, N                  7, 9, 10
N          I, K, M                  7, 10, 11

Node No.   X,Y       Z
1          x1,y1     z1
2          x2,y2     z2
...
11         x11,y11   z11
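In a proper TIN, neighbouring triangles share exactly two nodes (one edge), so the neighbour column of the triangle table can be derived from the node column. The Python sketch below does this for the triangles of Fig. 3.10, using node lists transcribed from the triangle table, and reproduces the listed neighbours. The function name is hypothetical.

```python
from collections import defaultdict

# Triangle -> its three node numbers, as listed in the triangle table of Fig. 3.10.
TRIANGLES = {
    "A": (1, 6, 7), "B": (1, 7, 8), "C": (1, 2, 8), "D": (2, 3, 8),
    "E": (3, 8, 9), "F": (3, 4, 9), "G": (4, 9, 10), "H": (4, 5, 10),
    "I": (5, 10, 11), "J": (5, 6, 11), "K": (6, 7, 11), "L": (7, 8, 9),
    "M": (7, 9, 10), "N": (7, 10, 11),
}


def triangle_neighbours(triangles):
    """Hypothetical helper: triangles sharing an edge (two common nodes) are neighbours."""
    edge_owners = defaultdict(list)
    for name, (n1, n2, n3) in triangles.items():
        for edge in ((n1, n2), (n2, n3), (n1, n3)):
            edge_owners[frozenset(edge)].append(name)
    neighbours = {name: set() for name in triangles}
    for owners in edge_owners.values():
        if len(owners) == 2:  # interior edge; border edges have a single owner
            a, b = owners
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


table = triangle_neighbours(TRIANGLES)
for name in sorted(table):
    print(name, sorted(table[name]))
```

Indexing triangles by their edges avoids comparing every pair of triangles. Border edges, which have only one owner, are what leave the border triangles with just two neighbours.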
Like all 2.5D and 3D models, TINs are only as good as the input sample data. They are very sensitive to extremely high or low values, because the input data are not filtered. Another limitation of TINs is that the surface inside each triangle is treated as a flat plane, so the slope is constant within a triangle but can change abruptly at its edges. Sometimes this becomes an advantage, e.g. in engineering applications that require precise location of dikes, ditches, stream banks, etc. TINs allow the use of break lines (e.g. along roads or dike crests), which become edges of triangles.
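The planar behaviour inside a triangle can be made concrete with linear (barycentric) interpolation, a common way of reading an elevation from a TIN facet. The Python sketch below assumes a non-degenerate triangle given by three x, y, z nodes. The node values and the function name are hypothetical. Within the triangle, z changes linearly with x and y, which is why the slope is the same everywhere on a facet.

```python
def interpolate_z(p1, p2, p3, x, y, eps=1e-12):
    """Hypothetical helper: elevation at (x, y) on the plane through three (x, y, z) nodes."""
    (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = p1, p2, p3
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    if det == 0:
        raise ValueError("degenerate (zero-area) triangle")
    # Barycentric weights of (x, y) with respect to the three nodes.
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    if min(w1, w2, w3) < -eps:
        raise ValueError("point lies outside the triangle")
    return w1 * z1 + w2 * z2 + w3 * z3


# Hypothetical facet whose plane is z = x + 2y.
nodes = ((0.0, 0.0, 0.0), (10.0, 0.0, 10.0), (0.0, 10.0, 20.0))
print(interpolate_z(*nodes, 2.0, 3.0))
```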
All the data models discussed above are geometry-centric: they model the real world as collections of points, lines, polygons, triangles and rasters. Any operation to be performed on the geometry requires a separate procedure (program or script), and the state of an entity (the attributes and properties defining what it is) is kept separate from its behaviour (what it does). The object data model stores geographic features and the relationships between them together with their behaviour. Each geographic feature is an integrated collection of geometry, properties and methods, where the methods define its behaviour. Geometry is treated like any other attribute. In addition to basic features such as points, lines and polygons, the user can create features such as step-down transformers, mains, forest compartments, etc. The user may also define the specific behaviour of features. For example, a forest compartment may belong to only one forest enterprise and district chosen from a list, and it must have mandatory attributes such as number, forest group, land category and soil type. Applying such behaviour during data input can facilitate digitizing and improve the subsequent analysis and output of the results.
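The idea of bundling geometry, properties and behaviour can be sketched with a Python class. Everything here is hypothetical: the `ForestCompartment` class, the list of enterprises and districts, and the validation logic are invented to mirror the forest compartment example above. They are not taken from any GIS product. Fields without defaults play the role of mandatory attributes, and the checks in `__post_init__` play the role of behaviour applied at input time.

```python
from dataclasses import dataclass

# Hypothetical lookup list: enterprise -> districts that may be chosen for it.
ENTERPRISES = {"Enterprise 1": {"District 1", "District 2"}, "Enterprise 2": {"District 3"}}


@dataclass
class ForestCompartment:
    """Hypothetical feature class: geometry is one attribute among the others."""
    number: int            # all fields without defaults are mandatory attributes
    enterprise: str
    district: str
    forest_group: int
    land_category: str
    soil_type: str
    boundary: list[tuple[float, float]]  # closed ring of x, y coordinates

    def __post_init__(self):
        # Behaviour run at input time: reject values that the rules do not allow.
        if self.enterprise not in ENTERPRISES:
            raise ValueError(f"unknown forest enterprise: {self.enterprise}")
        if self.district not in ENTERPRISES[self.enterprise]:
            raise ValueError(f"{self.district} does not belong to {self.enterprise}")
        if len(self.boundary) < 4 or self.boundary[0] != self.boundary[-1]:
            raise ValueError("boundary must be a closed ring")

    def area(self) -> float:
        """A method of the feature itself (shoelace formula over the boundary)."""
        ring = self.boundary
        return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))) / 2


square = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0), (0.0, 50.0), (0.0, 0.0)]
compartment = ForestCompartment(12, "Enterprise 1", "District 2", 2, "forest land", "sandy", square)
print(compartment.area())
```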
Geographic objects of the same type are grouped into feature classes. Geographic objects are usually related to other objects of the same class and, possibly, of other classes. Some relationships come with the definition of the feature class itself (e.g. the GIS may automatically edit overlapping polygons), while other relationships between classes are defined by the user. Three types of relationships are usually used in the object data model: topological, geographic and general.
Topological relationships are built into the class definition. For example, topological polygon feature classes are described using the type of tables discussed in section 3.2.3.2.
Geographic relationships between classes are based on “geographic operators” (e.g. “overlaps”, “is next to”, “is inside”, “touches”) that determine how objects interact. For example, all the buildings owned by a person (in GIS terms, objects of the buildings feature class) should lie inside a land parcel owned by the same person.
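An “is inside” operator of this kind can be sketched in Python with the ray-casting point-in-polygon test. The sketch simplifies in two ways: each building is represented by a single representative point rather than its full outline, and points lying exactly on a boundary are not treated specially. The parcels, buildings, owners and function names are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray casting: count crossings of a horizontal ray from (x, y) with the ring's edges."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical land parcels (closed rings) and buildings (representative points), with owners.
PARCELS = [
    {"owner": "Owner 1", "ring": [(0, 0), (50, 0), (50, 40), (0, 40), (0, 0)]},
    {"owner": "Owner 2", "ring": [(50, 0), (100, 0), (100, 40), (50, 40), (50, 0)]},
]
BUILDINGS = [
    {"id": "b1", "owner": "Owner 1", "point": (20, 15)},
    {"id": "b2", "owner": "Owner 2", "point": (30, 25)},  # lies on Owner 1's parcel
]


def rule_violations(buildings, parcels):
    """Ids of buildings that are not inside any parcel with the same owner."""
    bad = []
    for b in buildings:
        x, y = b["point"]
        if not any(p["owner"] == b["owner"] and point_in_polygon(x, y, p["ring"]) for p in parcels):
            bad.append(b["id"])
    return bad


print(rule_violations(BUILDINGS, PARCELS))
```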
General relationships define any other relations between objects. For example, one may define a relationship between the polygons of forest compartments and their descriptive characteristics stored in an external table.
In addition to relationships between objects, the object data model supports several types of rules, which are invaluable for maintaining database integrity during editing:
• Attribute rules define the valid domains of attribute values. For example, features in a georeference database are coded by choosing feature codes from a list rather than typing them manually; likewise, an elevation value in Lithuania cannot be, e.g., 500 m.
• Connectivity rules define the valid combinations of features, depending on geometry, attributes and topological properties. For example, a 28.8 kV line connects to a 14.4 kV line via a step-down transformer.
• Geographic rules define what happens to an object when it is, e.g., split or merged. For example, when a landowner purchases a neighbouring land parcel, the two polygons in the real estate layer are merged and the result is attributed with the details of the new owner. However, the land use type may remain the same.
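Rules of these kinds amount to checks applied whenever a feature is edited. The Python sketch below shows one attribute rule, one connectivity rule and one merge rule. The elevation domain (0 to 300 m), the feature code list, the table of allowed connections and the merge policy are hypothetical values. They were chosen only to mirror the examples above and are not an official specification. The function names are also illustrative.

```python
# Hypothetical attribute domains.
ELEVATION_RANGE_M = (0.0, 300.0)             # assumed valid range of elevations
FEATURE_CODES = {"road", "river", "forest"}  # codes chosen from a list, not typed freely

# Hypothetical connectivity rules: pairs of feature types that may be joined directly.
ALLOWED_CONNECTIONS = {
    frozenset({"28.8 kV line", "step-down transformer"}),
    frozenset({"step-down transformer", "14.4 kV line"}),
}


def check_attributes(feature):
    """Attribute rules: return a list of violations for one feature (a dict)."""
    errors = []
    low, high = ELEVATION_RANGE_M
    if not low <= feature.get("elevation", low) <= high:
        errors.append(f"elevation {feature['elevation']} m outside {low} to {high} m")
    if feature.get("code") not in FEATURE_CODES:
        errors.append(f"unknown feature code {feature.get('code')!r}")
    return errors


def can_connect(type_a, type_b):
    """Connectivity rule: may features of these two types be joined directly?"""
    return frozenset({type_a, type_b}) in ALLOWED_CONNECTIONS


def merge_parcel_attributes(a, b, new_owner):
    """Geographic (merge) rule: take the new owner; keep the land use if both parcels agree."""
    land_use = a["land_use"] if a["land_use"] == b["land_use"] else None
    return {"owner": new_owner, "land_use": land_use}


print(check_attributes({"code": "road", "elevation": 500.0}))
print(can_connect("28.8 kV line", "14.4 kV line"))
print(can_connect("step-down transformer", "14.4 kV line"))
```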