
3. Models of geographic data used in GIS

The aim of this lecture is to present the geographic data models used in GIS, i.e. the rules for
representing different aspects of the real world in a digital computer environment.

3.1 Data model and modelling of real world in GIS

The heart of any GIS is the data model – the set of rules for describing and representing
selected aspects of the real world in a digital computer environment (Fig. 3.1). GIS users interact
with an operational GIS to solve their tasks: mapping, querying databases, performing spatial
analysis and so on. How the system functions is determined directly by the way the real world is
represented in its digital models.

[Figure 3.1: diagram connecting the real world, people (interpretation and explanation), the GIS
data model (description and representation) and the operational GIS (analysis and presentation).
Example features shown: line, polygon, building, street, water supply line, pump station, houses.]

Fig. 3.1. The role of a data model in GIS (source: Paul A. Longley, Michael F. Goodchild, David
J. Maguire, David W. Rhind, Geographic Information Systems and Science, 2nd Edition, Wiley,
2005)

Geographic reality is very complex and many-sided, whereas a computer is finite, relatively simple
and can handle only digital data. It is therefore extremely important to decide which model to use
in a GIS. Different people use GIS for different purposes, and the phenomena they study have
diverse characteristics, so there is no single universal GIS data model that is best for all cases.
To represent the real world in a computer, four levels of abstraction (generalization or
simplification) are used (Fig. 3.2). First, reality is made up of real-world phenomena (buildings,
streets, roads, forests, people, etc.) and includes all their aspects, whether or not individuals
perceive them or consider them relevant to a particular application. Second, the conceptual model
is human-oriented, often only partially structured, and concerns the objects and processes selected
for a particular problem. Third, the logical model is an implementation-oriented representation of
the real world. Fourth, the physical model is tied directly to the practical implementation of the
GIS: it describes how the files or databases are stored.

[Figure 3.2: levels of abstraction from top to bottom: Reality, Conceptual model, Logical model,
Physical model. Abstraction increases downwards, from human-oriented to computer-oriented.]

Fig. 3.2. Four levels of abstraction in representation of real world in a computer (source: Paul A.
Longley, Michael F. Goodchild, David J. Maguire, David W. Rhind, Geographic Information
Systems and Science, 2nd Edition, Wiley, 2005)

3.2 GIS data models

Many data models have been developed and included in GIS software systems during the last fifty
years:
• Computer-Aided Design (CAD) – focused on computer based engineering and drawing;
• Graphical (non-topological) – simple mapping;
• Image – image processing and simple spatial analysis;
• Raster/Grid – spatial analysis and modelling;
• Vector/georelational-topological – analysis, modelling and mapping operations using
vector geometric features;
• Network – network analysis for communication, transportation, hydrology applications;
• Triangulated irregular network (TIN) – surface visualization, analysis and modelling;
• Object – different operations with all geographic feature types (rasters, vectors, TINs, etc.)
for any type of applications.

3.2.1 CAD, graphical and image data models

The first GISs were based on very simple data models inherited from CAD, digital mapping and
image processing systems. In CAD systems, features of the real world are represented as simple
points, lines or polygons. However, this data model has never become common in GIS, for the
following reasons:
• Traditionally, CAD uses paper sheet coordinates rather than real world coordinates;
• Individual features have no unique IDs, so they are very difficult to connect with attribute
information – this is a key requirement of GIS;
• CAD data models do not support topological information, necessary in many spatial
analysis operations.

Graphical (non-topological) data models originate from digital mapping systems. The main
requirements in computer-based cartography were automatic map reproduction and thematic
mapping. Technological production lines were developed to digitize, store and print maps. All
map elements are stored as points, lines or polygons, with annotations used for place names. Like
CAD systems, graphical data models support neither attributes nor topological information.

At about the same time that CAD and computer cartography systems were being developed, a third
type of data model emerged to solve the tasks of image processing. It focuses on the processing of
scanned aerial photographs and digital spatial images, so these systems naturally use a raster
(grid) structure. The main distinctive feature of the image data model compared with the raster
data model is its low level of abstraction of reality: real-world objects in photographs are
represented the way humans see them. Some spatial analysis functions traditionally available in
raster GIS are possible with the image data model, too.

3.2.2 Raster data model

In a raster data model the real world is represented by an array of rectangular, usually square,
cells, or picture elements (pixels) (Fig. 3.3). Each pixel (cell) stores a value of an attribute
of the geographic feature being described.

Fig. 3.3. Representation of real world objects in a raster data model

Raster data are usually stored as a matrix of numeric values together with some meta-
information about the matrix contained in a file header. Traditionally, meta-information includes
data on geographic or Cartesian coordinates of a reference cell, its size, the number of rows and
columns, and the projection. Cells are georeferenced by indicating coordinates of one corner (e.g.
lower left) of one cell.
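
The header-based georeferencing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a real file format: the `RasterHeader` fields and the `cell_center` helper are hypothetical names, the reference corner is assumed to be the lower-left corner of the grid, and row 0 is assumed to be the top row of the matrix.

```python
from dataclasses import dataclass


@dataclass
class RasterHeader:
    """Hypothetical raster header (not a real file format)."""
    x_lower_left: float  # x coordinate of the grid's lower-left corner
    y_lower_left: float  # y coordinate of the grid's lower-left corner
    cell_size: float     # edge length of a square cell, in map units
    n_rows: int
    n_cols: int


def cell_center(header: RasterHeader, row: int, col: int) -> tuple:
    """Map coordinates of a cell centre; row 0 is assumed to be the top row."""
    if not (0 <= row < header.n_rows and 0 <= col < header.n_cols):
        raise IndexError("cell lies outside the raster")
    x = header.x_lower_left + (col + 0.5) * header.cell_size
    y = header.y_lower_left + (header.n_rows - row - 0.5) * header.cell_size
    return (x, y)


header = RasterHeader(500000.0, 6000000.0, 10.0, n_rows=3, n_cols=4)
# One attribute value per cell, e.g. a land cover class code (made-up values).
values = [
    [1, 1, 2, 2],
    [1, 3, 3, 2],
    [4, 4, 3, 2],
]
print(values[0][0], cell_center(header, 0, 0))
```

Because only one corner and the cell size are stored, the position of every other cell follows from its row and column index; no coordinates are kept per cell.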

The raster data model is very convenient for storing and analysing continuous data. The value of
each cell may indicate a class or category, a measurement or a modelling result for the area that
the cell represents.

The main distinctive characteristics of a raster:

• The accuracy with which a geographic object is described in a raster is inversely proportional
to the cell size. With larger cells, small geographic objects or details of other objects
may be lost.
• Reducing the cell size 10 times increases the size of the raster data set 10² = 100 times.
• Raster elements have no semantic value – we cannot identify the cell’s borders in the
field.
• No topology is maintained in a raster.
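
The relation between cell size and data volume can be checked with simple arithmetic. The sketch below assumes a hypothetical 1000 m by 1000 m study area; `cell_count` is an illustrative helper, not part of any GIS package.

```python
import math


def cell_count(width_m: float, height_m: float, cell_size_m: float) -> int:
    """Number of square cells needed to cover a rectangular area."""
    n_cols = math.ceil(width_m / cell_size_m)
    n_rows = math.ceil(height_m / cell_size_m)
    return n_rows * n_cols


coarse = cell_count(1000, 1000, 10)  # 100 rows x 100 columns
fine = cell_count(1000, 1000, 1)     # 1000 rows x 1000 columns
print(coarse, fine, fine // coarse)
```

The number of cells grows with the square of the reduction factor, because both the number of rows and the number of columns increase.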
3.2.3 Vector data model

In a vector data model, each object in the real world is first classified into a geometric type:
point, line or polygon (Fig. 3.4).

Fig. 3.4. Representation of real world objects in a vector data model

Points (e.g. nesting site, tourist information centre, etc.) are recorded as a single coordinate
pair, lines (e.g. street, river, elevation contour line, etc.) as a series of ordered coordinate
pairs (also called polylines in the GIS literature), and polygons (e.g. city district, building
outline, voting district, etc.) as closed chains of line segments in which the coordinates of the
starting point are the same as those of the ending point (Fig. 3.5). The coordinates that define
the geometry of each geographic object may be 2-, 3- or 4-dimensional:

• 2D – x and y, or latitude and longitude;
• 3D – x, y and z coordinates, where z indicates the elevation;
• 4D – x, y, z and m coordinates, where m is an additional value representing time or another
property, e.g. the offset of a road sign from the road centreline.
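
[Figure 3.5: plot with an x axis (1 to 25) and a y axis (1 to 15) showing a point, a line and a
polygon; labelled coordinate pairs: 15,14; 20,12; 8,10; 15,6; 23,6; 5,5; 11,5; 8,2; 20,2.]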
Fig. 3.5 Representation of points, lines and polygons in a vector

Some linear features can be represented not only as a series of ordered coordinates but can also
be defined using mathematical functions (e.g. a Bézier curve).
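
The point, line and polygon encodings described in this section map naturally onto coordinate tuples and lists. The following sketch uses made-up coordinates and hypothetical helper names (`is_closed`, `line_length`, `ring_area`); it only illustrates the encoding and is not how any particular GIS stores geometry.

```python
from math import hypot

# A point is one coordinate pair, a line an ordered list of pairs,
# and a polygon a list of pairs whose first and last entries coincide.
point = (15.0, 14.0)
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]


def is_closed(coords):
    """True if the chain starts and ends at the same coordinate pair."""
    return len(coords) >= 4 and coords[0] == coords[-1]


def line_length(coords):
    """Sum of the straight segment lengths between consecutive vertices."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))


def ring_area(ring):
    """Area of a closed ring computed with the shoelace formula."""
    twice_area = sum(x1 * y2 - x2 * y1
                     for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


print(is_closed(polygon), line_length(line), ring_area(polygon))
```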
3.2.3.1 “Spaghetti” data model

A point is represented as a pair of x, y coordinates, a line as a series of ordered coordinate
pairs, and a polygon as a closed chain of x, y coordinate pairs (Fig. 3.6). Coordinates shared by
neighbouring polygons are recorded twice, separately for each polygon. A data set constructed in
this way looks like a collection of coordinate chains, hence the name “spaghetti”.

[Figure 3.6: two panels, “Map” and “Map in x,y coordinates”, each showing features 1 (point),
2 (line), 3 and 4 (polygons).]

Feature   Id   Location
Point     1    X,Y
Line      2    X1,Y1; X2,Y2; …; Xn,Yn
Polygon   3    X1,Y1; X2,Y2; X3,Y3; ...; X1,Y1
Polygon   4    X1,Y1; X2,Y2; X11,Y11; ...; X1,Y1

Fig. 3.6. Representation of geographic objects in a vector “spaghetti” data model

The structure of the “spaghetti” data model is very simple; however, it does not support topology.
This is the main reason for the low efficiency of “spaghetti” in spatial analysis: topological
relations between features must be calculated during the analysis itself. On the other hand,
“spaghetti” is very efficient in mapping, where there is no need to know the topological relations
between features.
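
To illustrate why spatial analysis is costly without stored topology, the sketch below derives adjacency between “spaghetti” polygons by comparing their coordinates. The polygons are made up, and the helper names are hypothetical. It also relies on a simplifying assumption: two polygons count as neighbours only if they contain an identical boundary segment with exactly matching vertices.

```python
def boundary_segments(ring):
    """Set of undirected segments (vertex pairs) of a closed ring."""
    return {frozenset((a, b)) for a, b in zip(ring, ring[1:])}


def share_boundary(ring_a, ring_b):
    """True if the two rings have at least one identical segment."""
    return bool(boundary_segments(ring_a) & boundary_segments(ring_b))


# The shared edge (4,0)-(4,4) is stored twice, once per polygon.
polygon_3 = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
polygon_4 = [(4, 0), (8, 0), (8, 4), (4, 4), (4, 0)]
polygon_5 = [(10, 0), (12, 0), (12, 2), (10, 0)]

print(share_boundary(polygon_3, polygon_4))  # neighbours
print(share_boundary(polygon_3, polygon_5))  # not neighbours
```

Every adjacency question requires this kind of coordinate comparison, repeated for each pair of polygons, which is the overhead that a topological model avoids.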

3.2.3.2 Vector-topological data model

The topological model is a type of vector data model used to encode spatial relationships in a GIS.
Features (points, lines and polygons) are stored following certain topological rules. Topology is
the mathematical approach used to define spatial relationships. Topological relationships
remain stable when the geographic space of the objects is distorted. E.g., when the map projection
is changed, lengths and angles may change, but the topology (e.g. which features are neighbours)
does not.

There are several forms of the vector-topological data model. Here we discuss the so-called
“Arc-Node” model (Fig. 3.7). An arc is a series of points (vertices in GIS terminology) that starts
and ends at a node. A node is an intersection point of two or more arcs; it can also occur at the
end of a “dangling” arc. Isolated nodes, not connected to arcs, represent points. A polygon is built
from a closed chain of arcs that represents the boundary of the area.

[Figure 3.7: map with X and Y axes showing polygons A, B, C (an island inside B), the point
polygon D and the outside area E, bounded by arcs a1 to a7 that meet at nodes M1 to M6.]

Node topology table
Node   Arcs
M1     a1, a3, a4
M2     a1, a2, a5
M3     a2, a3, a5
M4     a4
M5     a6
M6     a7

Polygon topology table
Polygon   Arcs
A         a1, a5, a3
B         a2, a5, 0, a6, 0, a7
C         a7
D         a6
E         Area outside map borders

Arc topology table
Arc   Start node   End node   Left polygon   Right polygon
a1    M1           M2         E              A
a2    M2           M3         E              B
a3    M3           M1         E              A
a4    M4           M1         A              A
a5    M3           M2         A              B
a6    M5           M5         B              B
a7    M6           M6         B              C

Arc coordinate table
Arc   Start X,Y   Intermediate X,Y               End X,Y
a1    125,220     200,220                        200,150
a2    200,150     200,30; 50,30                  50,110
a3    50,110      50,220                         125,220
a4    100,140     100,180                        125,220
a5    50,110      100,120; 130,120; 170,150      200,150
a6    80,60                                      80,160
a7    180,110     175,80; 130,80; 150,120        180,110

Fig. 3.7. The “Arc-Node” vector topological data model (based on Aronoff S., 1989, Geographic
information systems: a management perspective, WDL Publications, Ottawa, Canada)

The location of geographic objects and their spatial relationships are recorded in topological
tables. The polygon topology table lists the arcs that make up the boundary of each polygon. E.g.
polygon A consists of arcs a1, a3 and a5. Polygon C is an island inside polygon B. This is
indicated in the arc list for polygon B by a 0 preceding the arc that makes up the island. Polygon
C consists of a single arc, a7. The point in polygon B is also treated as a separate polygon, D,
which is built from one arc, a6; a point is a polygon with no area. The area outside the map
border, polygon E, is recorded without identifying any arcs that bound it.

Each node is listed in the node topology table with the identification of arcs it belongs to. E.g.,
three arcs meet at node M1: a1, a3 and a4. The arc topology table defines the relationship of the
nodes and polygons to the arcs. One of the arc’s end points is considered as a starting node,
another – as an ending node of the arc. E.g., arc a5 starts at node M3 and ends at node M2.
Moving from M3 to M2, the polygon to the left is A, to the right – B.
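
The arc topology table of Fig. 3.7 can be queried directly, without touching any coordinates. The sketch below copies the table into a Python dictionary; the function names `neighbours` and `arcs_at_node` are hypothetical helpers introduced only for illustration.

```python
# Arc topology table from Fig. 3.7:
# arc -> (start node, end node, left polygon, right polygon)
arc_topology = {
    "a1": ("M1", "M2", "E", "A"),
    "a2": ("M2", "M3", "E", "B"),
    "a3": ("M3", "M1", "E", "A"),
    "a4": ("M4", "M1", "A", "A"),
    "a5": ("M3", "M2", "A", "B"),
    "a6": ("M5", "M5", "B", "B"),
    "a7": ("M6", "M6", "B", "C"),
}


def neighbours(polygon):
    """Polygons that share an arc with the given polygon."""
    found = set()
    for _start, _end, left, right in arc_topology.values():
        if left != right and polygon in (left, right):
            found.add(right if left == polygon else left)
    return found


def arcs_at_node(node):
    """Arcs that start or end at the given node (rebuilds the node table)."""
    return sorted(arc for arc, (start, end, _l, _r) in arc_topology.items()
                  if node in (start, end))


print(sorted(neighbours("B")))  # polygons adjacent to B
print(arcs_at_node("M1"))       # arcs meeting at node M1
```

Arcs a4 and a6 have the same polygon on both sides, so they do not contribute to adjacency; this matches their role as a dangling arc and a point.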

The arc coordinate table is used to relate the map features to the real world. Each arc is
represented by one or more straight line segments defined by a series of coordinates. The more
complex the shape, the more coordinates are needed. E.g., arc a1 contains just one sharp turn, so
it is encoded by its start and end points plus one intermediate point. All the topological
tables can be related to each other.
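
As a small illustration of relating the tables to each other, the sketch below joins the polygon topology table with the arc coordinate table of Fig. 3.7 to compute the perimeter of polygon A. Only the rows needed for polygon A are copied; `arc_length` and `polygon_perimeter` are hypothetical helper names.

```python
from math import hypot

# Polygon topology table (row for polygon A only).
polygon_arcs = {"A": ["a1", "a5", "a3"]}

# Arc coordinate table: start, intermediate and end points in order.
arc_coordinates = {
    "a1": [(125, 220), (200, 220), (200, 150)],
    "a3": [(50, 110), (50, 220), (125, 220)],
    "a5": [(50, 110), (100, 120), (130, 120), (170, 150), (200, 150)],
}


def arc_length(arc):
    """Length of an arc as the sum of its straight segments."""
    coords = arc_coordinates[arc]
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))


def polygon_perimeter(polygon):
    """Perimeter obtained by looking up each boundary arc's coordinates."""
    return sum(arc_length(arc) for arc in polygon_arcs[polygon])


print(arc_length("a1"), round(polygon_perimeter("A"), 2))
```

The topology tables say which arcs bound the polygon; the coordinate table is consulted only when a metric quantity such as length is actually needed.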

Topology plays several important roles in GIS:

• Topology is used to validate the integrity of the whole data set. Some tests used in the
construction of databases:
o Do polygons overlap, or are there gaps between them? E.g. the properties
of two land owners should not overlap (only one owner at any point), and there
should be no gaps between properties either.
o Do all network elements connect? E.g., network elements that connect must be
“snapped” together at junctions.
o Do lines intersect where they should not? E.g., elevation contour lines may not intersect.
o Are there duplicate objects? The same object may be captured several times during
digitizing.
• Topology enables modelling the integrated behaviour of different feature types. There are
many objects in the real world that cannot exist without each other, share common
boundaries, etc.
• Topology improves the data editing productivity:
o The ability to manipulate and change common features.
o Snapping control.
o Automatic closure of a polygon.
o Testing the connectivity of a network: are all linear network features
interconnected?
• Topology optimizes the queries. There are many queries that can be implemented without
using coordinate data. This accelerates performance of some analysis functions.
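
One of the integrity checks listed above, finding dangling arcs, can be sketched using only node identifiers and no coordinates. The arc list is taken from the arc topology table of Fig. 3.7; the function `dangling_nodes` is a hypothetical helper, and the rule it applies (a node touched by exactly one arc end is dangling) is a simplifying assumption.

```python
from collections import Counter

# arc -> (start node, end node), from the arc topology table of Fig. 3.7
arc_nodes = {
    "a1": ("M1", "M2"),
    "a2": ("M2", "M3"),
    "a3": ("M3", "M1"),
    "a4": ("M4", "M1"),
    "a5": ("M3", "M2"),
    "a6": ("M5", "M5"),
    "a7": ("M6", "M6"),
}


def dangling_nodes(arcs):
    """Nodes touched by exactly one arc end, i.e. ends of dangling arcs."""
    degree = Counter()
    for start, end in arcs.values():
        degree[start] += 1
        degree[end] += 1
    return sorted(node for node, count in degree.items() if count == 1)


print(dangling_nodes(arc_nodes))
```

Closed arcs such as a6 and a7 touch their single node twice, so they are not reported. Whether a reported dangle is an error or intended depends on the data set.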

3.2.4 Network data model

The network data model is a special type of vector-topological data model used to
model flows (goods, resources, water, electricity, etc.). There are two primary types of
networks:
• Radial (tree-shaped) – flows always have an upstream or downstream direction (e.g.,
hydrographic network);
• Looped – flows may intersect (e.g., street, water supply networks).

Networks are constructed in a GIS using points (e.g. crossroads, confluences of streams, fuses,
switches, water valves, etc., usually referred to as “nodes” in the vector-topological data model)
and lines (e.g. stream, street, water supply pipeline, etc.). The topological relations of a
network define how lines are interconnected at nodes. To carry out network analysis, it is also
necessary to define rules about how flows can move through the network. E.g., for a water supply
network, the capacity of mains and laterals, the parameters of pump stations, the properties of
fittings, etc. must be defined (Fig. 3.8).

[Figure 3.8: schematic of a water supply network with labelled elements: pump house, pump, main,
lateral, valve, fitting, hydrant, meter, house and street.]

Fig. 3.8. Water supply network model in a GIS
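
A network of nodes and lines with a simple flow rule can be sketched as a directed graph. The example below is hypothetical: the node names loosely follow the labels in Fig. 3.8, the connections are made up, and the only rule modelled is that water cannot pass a closed line. Capacities and pump parameters are left out.

```python
from collections import deque

# (from node, to node, is_open): made-up pipes of a small supply network.
pipes = [
    ("pump", "fitting", True),      # main
    ("fitting", "hydrant", True),   # lateral
    ("fitting", "valve", True),     # main
    ("valve", "meter", False),      # lateral behind a closed valve
    ("meter", "house", True),
]


def reachable(source, pipes):
    """Nodes that flow from the source can reach through open pipes (BFS)."""
    downstream = {}
    for start, end, is_open in pipes:
        if is_open:
            downstream.setdefault(start, []).append(end)
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in downstream.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable("pump", pipes)))
```

With the valve closed, the meter and the house are cut off; this kind of trace is a typical network analysis question.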

3.2.5 Triangulated Irregular Network (TIN) data model

A Triangulated Irregular Network, or just TIN, is a type of vector-topological data model used to
represent and analyse surfaces. The term 2.5D is sometimes used to describe a surface structure
(Earth elevation, river bottom, etc.) that has a single z value at each x, y location. A true 3D
(three-dimensional) structure can contain multiple z values at the same x, y location (e.g. the
Earth's surface in mountains with negative inclination, engineering constructions, tunnels, etc.).
The TIN, as the name suggests, represents a surface using contiguous non-overlapping triangular
elements (Fig. 3.9).

Fig. 3.9. Earth surface of Nemunas river bottom and surrounding areas, represented using TIN

A TIN is created from a set of points with x, y and z coordinates. More complex surfaces are
represented using more densely sampled points.

A TIN is a topological data structure that maintains the spatial relations between the triangles
and their x, y, z points. Let us label each triangle with a capital letter and each node with a
number (Fig. 3.10). Coordinates and topological information are stored in relational tables. The
topology table lists all the triangles with their nodes and neighbouring triangles. Only two
neighbouring triangles are identified for a triangle on the border of the study area. The x, y, z
coordinates of the nodes are stored in a coordinate table.

[Figure 3.10: TIN diagram with triangles labelled A to N and nodes numbered 1 to 11.]

Triangle   Neighbouring triangles   Nodes
A          B, K                     1, 6, 7
B          A, C, L                  1, 7, 8
C          B, D                     1, 2, 8
D          C, E                     2, 3, 8
E          D, F, L                  3, 8, 9
F          E, G                     3, 4, 9
G          F, H, M                  4, 9, 10
H          G, I                     4, 5, 10
I          H, J, N                  5, 10, 11
J          I, K                     5, 6, 11
K          A, J, N                  6, 7, 11
L          B, E, M                  7, 8, 9
M          G, L, N                  7, 9, 10
N          I, K, M                  7, 10, 11

Node No.   X,Y        Z
1          x1,y1      z1
2          x2,y2      z2
...
11         x11,y11    z11

Fig. 3.10. TIN is a type of vector-topological data model
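
The neighbour column of the table in Fig. 3.10 follows from the node column: two triangles are neighbours when they share an edge, i.e. two nodes. The sketch below recomputes the neighbours from the node lists; `tin_neighbours` is a hypothetical helper written for this illustration.

```python
from itertools import combinations

# Triangle -> its three nodes, copied from the table in Fig. 3.10.
triangles = {
    "A": (1, 6, 7),  "B": (1, 7, 8),   "C": (1, 2, 8),   "D": (2, 3, 8),
    "E": (3, 8, 9),  "F": (3, 4, 9),   "G": (4, 9, 10),  "H": (4, 5, 10),
    "I": (5, 10, 11), "J": (5, 6, 11), "K": (6, 7, 11),  "L": (7, 8, 9),
    "M": (7, 9, 10), "N": (7, 10, 11),
}


def tin_neighbours(triangles):
    """Map each triangle to the set of triangles sharing an edge with it."""
    edge_owners = {}
    for name, nodes in triangles.items():
        for edge in combinations(sorted(nodes), 2):
            edge_owners.setdefault(edge, []).append(name)
    result = {name: set() for name in triangles}
    for owners in edge_owners.values():
        for first, second in combinations(owners, 2):
            result[first].add(second)
            result[second].add(first)
    return result


neighbour_table = tin_neighbours(triangles)
print(sorted(neighbour_table["A"]), sorted(neighbour_table["N"]))
```

Triangles on the border of the study area, such as A, have one edge without a second owner and therefore only two neighbours.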

Like all 2.5D or 3D models, TINs are only as good as the input sample data. They are very
sensitive to extremely high or low values because the input data are not filtered. Another TIN
limitation is that the surface inside each triangle is treated as a flat plane, so the slope
changes abruptly at triangle edges. Sometimes this turns into an advantage, e.g. when TINs are used
in engineering, which requires the precise location of dikes and ditches, stream banks, etc.
TINs allow the use of break lines, such as roads, dike surfaces, etc., which become edges of triangles.
3.2.6 Object data model

All the data models discussed above are geometry-centric. They model the real world using
collections of points, lines, polygons, triangle planes and rasters. Any operation to be performed
on the geometry requires a separate procedure (program or script). The state of an entity (the
attributes and properties defining what it is) is separate from its behaviour (what it does). The
object data model is used to store geographic features, the relations between features and the
behaviour of features, i.e. each geographic feature is an integrated collection of geometries,
properties and methods (defining its behaviour). Geometry is treated like any other attribute. In
addition to basic features such as points, lines and polygons, the user can create features such
as step-down transformers, mains, forest compartments, etc. The user may also define specific
behaviour for the features. E.g., a forest compartment feature may belong to only one forest
enterprise and district, chosen from a list, and must have mandatory attributes such as number,
forest group, land category, soil type, etc. Applying such behaviour during data input may
facilitate the digitizing process as well as improve subsequent analysis and output of the
results.
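
The idea of a feature that bundles geometry, attributes and behaviour can be sketched as a Python class. Everything here is hypothetical: the class name, the attribute names, the list of enterprises and the validation performed on input are illustrative assumptions, not the design of any particular GIS.

```python
from dataclasses import dataclass

# Hypothetical list from which the forest enterprise must be chosen.
ALLOWED_ENTERPRISES = {"Enterprise A", "Enterprise B"}


@dataclass
class ForestCompartment:
    """A feature holding attributes, geometry and behaviour together."""
    number: int        # mandatory attribute
    enterprise: str    # must come from the list above
    soil_type: str     # mandatory attribute
    boundary: list     # closed ring of (x, y) pairs; geometry is an attribute

    def __post_init__(self):
        # Behaviour applied at input time: reject invalid features.
        if self.enterprise not in ALLOWED_ENTERPRISES:
            raise ValueError("enterprise must be chosen from the list")
        if len(self.boundary) < 4 or self.boundary[0] != self.boundary[-1]:
            raise ValueError("boundary must be a closed ring")

    def area(self) -> float:
        """Method of the feature itself: area by the shoelace formula."""
        ring = self.boundary
        twice_area = sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
        return abs(twice_area) / 2.0


compartment = ForestCompartment(
    number=12,
    enterprise="Enterprise A",
    soil_type="sandy",
    boundary=[(0, 0), (100, 0), (100, 50), (0, 50), (0, 0)],
)
print(compartment.area())
```

In contrast to the geometry-centric models, the area calculation and the input checks travel with the feature instead of living in a separate script.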

Geographic objects of the same type are grouped into feature classes. All geographic objects are
connected in some way with other objects of the same and, possibly, other classes. Some
relationships are inherited with the definition of the feature class (e.g., overlapping
polygons may be edited automatically by the GIS); other between-class relationships are defined by
the user. Three types of relationships are usually used in the object data model: topological,
geographic and general.

Topological relationships are built into the definition of a class. E.g., topological polygon
feature classes are described using tables of the type discussed in section 3.2.3.2.

Geographic relationships between classes are based on “geographic operators” (e.g.,
“overlaps”, “is next to”, “is inside”, “touches” and so on) that determine the interaction between
objects. E.g., all the buildings owned by a person (objects of the buildings feature class, in GIS
terms) should be inside the land parcel owned by the same person.

General relationships are used to define other possible relations between objects. E.g., one may
define the relationship between polygons of forest compartments and their descriptive
characteristics, stored in an external table.

In addition to supporting relationships between objects, the object data model allows several
types of rules, which are invaluable for maintaining database integrity during editing:
• Attribute rules define the domains for attribute values. E.g., features of a georeference
database are coded by choosing feature codes from a list rather than entering them manually. Or
an elevation of, e.g., 500 m is not accepted in Lithuania.
• Connectivity rules define the valid combinations of features, depending on geometry,
attributes and the peculiarities of topology. E.g., a 28.8 kV line connects to a 14.4 kV line via
a step-down transformer.
• Geographic rules define what happens to an object when it is, e.g., split or merged. E.g., a
land owner has purchased a neighbouring land parcel. The polygons in the real estate layer are
merged and attributed using the details of the new owner. However, the land use type
may remain the same.
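
Attribute and connectivity rules of this kind can be sketched as simple lookups. The code list, the elevation bounds and the function names below are hypothetical values chosen for illustration; only the 28.8 kV to 14.4 kV example comes from the text above.

```python
# Hypothetical attribute domains.
FEATURE_CODES = {"road", "river", "building"}
MIN_ELEVATION_M = 0.0    # assumed lower bound, for illustration only
MAX_ELEVATION_M = 300.0  # assumed upper bound, for illustration only

# Connectivity rule: (from voltage, to voltage) -> required connecting feature.
CONNECTIVITY_RULES = {(28.8, 14.4): "step-down transformer"}


def attribute_errors(feature):
    """Return a list of attribute rule violations for a feature dict."""
    errors = []
    if feature.get("code") not in FEATURE_CODES:
        errors.append("unknown feature code")
    elevation = feature.get("elevation_m")
    if elevation is not None and not (
            MIN_ELEVATION_M <= elevation <= MAX_ELEVATION_M):
        errors.append("elevation outside domain")
    return errors


def connection_allowed(from_kv, to_kv, via):
    """Lines of equal voltage connect freely; others need the listed device."""
    if from_kv == to_kv:
        return True
    return CONNECTIVITY_RULES.get((from_kv, to_kv)) == via


print(attribute_errors({"code": "road", "elevation_m": 500.0}))
print(connection_allowed(28.8, 14.4, "step-down transformer"))
```

Checks like these are run while the data are being edited, so that invalid values are rejected before they enter the database.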
