GIS Data Management: GE 118: Introduction to GIS. Engr. Meriam M. Santillan, Caraga State University


Lecture 5:

GIS Data Management

GE 118: INTRODUCTION TO GIS


Engr. Meriam M. Santillan
Caraga State University
1
File Structures
(File-based datasets)

 Simple list
 Ordered sequential files
 Indexed files

2
Simple List
 Simplest file structure
 Unordered/unstructured
 Records are kept in the order in which they were entered

3
Ordered Sequential Files
 Simple lists whose records are arranged in
some order (e.g., alphabetical order)
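
To illustrate why ordering matters, here is a minimal sketch comparing a search in a simple list with a search in an ordered sequential file. The place names and both function names are hypothetical, chosen only for illustration; the ordered search uses binary search from Python's standard `bisect` module.

```python
from bisect import bisect_left

# Hypothetical place names, stored in entry order (a simple list).
simple_list = ["Butuan", "Surigao", "Bayugan", "Tandag", "Cabadbaran"]

# The same records kept in alphabetical order (an ordered sequential file).
ordered_file = sorted(simple_list)


def find_in_simple_list(records, name):
    """Linear scan: check records one by one until a match is found."""
    for position, record in enumerate(records):
        if record == name:
            return position
    return -1


def find_in_ordered_file(records, name):
    """Binary search: repeatedly halve the search range of a sorted list."""
    position = bisect_left(records, name)
    if position < len(records) and records[position] == name:
        return position
    return -1
```

With an unordered list every record may have to be inspected, whereas the ordered file needs only about log2(n) comparisons.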

4
Indexed Files
 An index to the directory makes searches for
entries that match given criteria more
efficient
 Can be developed as direct files or inverted
files

5
Direct Indexed Files
 The records themselves are used to provide
access to other pertinent information

6
Inverted Indexed Files
 Index is based on possible search criteria,
not on the entities themselves
 Attributes are the primary search criteria and
the entities rely on them for selection
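
The difference between the two kinds of index can be sketched with plain Python dictionaries. The parcel records below are hypothetical, and a real GIS would keep such indexes on disk rather than in memory; this is only an illustration of the idea.

```python
from collections import defaultdict

# Hypothetical parcel records: (parcel_id, land_use, owner).
records = [
    ("P1", "residential", "Cruz"),
    ("P2", "agricultural", "Reyes"),
    ("P3", "residential", "Santos"),
]

# Direct index: keyed on the entity itself, pointing at the record position.
direct_index = {parcel_id: pos for pos, (parcel_id, _, _) in enumerate(records)}

# Inverted index: keyed on an attribute value, listing the entities that have it.
inverted_index = defaultdict(list)
for parcel_id, land_use, _ in records:
    inverted_index[land_use].append(parcel_id)

# Look up one entity, then select entities by attribute.
owner_of_p2 = records[direct_index["P2"]][2]
residential_parcels = inverted_index["residential"]
```

The direct index answers "where is parcel P2?", while the inverted index answers "which parcels are residential?" without scanning every record.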

7
Database
 An integrated set of data on a particular
subject
 Collection of interrelated data stored
together with controlled redundancy to
serve one or more applications in an
optimal fashion
 Requires a more elaborate structure,
called a database structure or
database management system

8
Significance of Database
 Most GIS activities consist of storing entity and
attribute data so that we can retrieve any
combination of these objects.
 Each graphical feature must be stored explicitly with
its attributes so that their combined search becomes
faster.

9
Advantages of Database over
File-based datasets
 Collecting data at a single location reduces
redundancy and duplication
 Lower maintenance cost due to better organization
and decreased data duplication
 Multiple applications can use the same data and can
evolve separately over time

10
Advantages of Database over
File-based datasets
 User knowledge can be transferred between
applications more easily because database remains
constant
 Data sharing is facilitated, with a corporate view
of the data provided to data managers and users
 Security and standards for data and data access
can be established and enforced

11
Database Management System
 A software application designed to organize
the efficient and effective storage of and
access to data
 A suite of software programs designed to
store, retrieve and manipulate data within a
database

12
Types of Database Structure
1. Hierarchical Data Structures
2. Network Systems
3. Relational Database Structures

13
Hierarchical Data Structure
 ‘one-to-many’ or ‘parent-child’ relationship
 Implies that each element has a direct relationship
to a number of children
 Each child can in turn have the same direct
relationship with its own children, and so on.
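
A parent-child structure can be sketched as nested dictionaries. The administrative hierarchy and the `find_path` helper below are hypothetical and meant only to show that access always runs from the root down one branch.

```python
# Hypothetical administrative hierarchy: each parent holds its own children.
hierarchy = {
    "Caraga": {
        "Agusan del Norte": {"Butuan City": {}, "Cabadbaran City": {}},
        "Surigao del Sur": {"Tandag City": {}},
    }
}


def find_path(tree, target, path=()):
    """Walk down from the root, returning the chain of parents to target."""
    for name, children in tree.items():
        current = path + (name,)
        if name == target:
            return current
        found = find_path(children, target, current)
        if found:
            return found
    return None
```

For example, `find_path(hierarchy, "Tandag City")` returns the branch from the root to that element, while a question that cuts across branches (say, all cities regardless of province) requires visiting the whole tree.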

14
Hierarchical Data Structure

15
Hierarchical Data Structure
Advantages:
 Simple and straightforward data access since parent
and children are directly linked
 Easy to search since structure is well defined
 Relatively easy to expand by adding new branches
and formulating new decision rules

16
Hierarchical Data Structure
Disadvantages:
 Confined to queries along one branch only
 Difficult restructuring to allow other possible search
criteria
 Creates large index files
 Redundant entries for searching

17
Network Systems
 ‘many-to-many’ relationship
 Each data item can be linked directly to any
other item in the database using pointers,
without a parent-child relationship.
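
As a rough sketch of pointer-based linkage, the hypothetical roads and junctions below each store the keys of the records they are connected to, so a road can belong to several junctions and a junction to several roads. The names and the `roads_connected_to` helper are assumptions made for illustration.

```python
# Hypothetical roads and junctions; each record stores pointers (keys) to the other kind.
roads = {
    "R1": {"junctions": ["J1", "J2"]},
    "R2": {"junctions": ["J2", "J3"]},
    "R3": {"junctions": ["J1", "J3"]},
}
junctions = {
    "J1": {"roads": ["R1", "R3"]},
    "J2": {"roads": ["R1", "R2"]},
    "J3": {"roads": ["R2", "R3"]},
}


def roads_connected_to(road_id):
    """Follow pointers road -> junctions -> roads, with no root to pass through."""
    connected = set()
    for junction_id in roads[road_id]["junctions"]:
        connected.update(junctions[junction_id]["roads"])
    connected.discard(road_id)
    return sorted(connected)
```

Note that every pointer had to be written out by hand in both directions, which is exactly where such structures become hard to maintain as they grow.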

18
Network Systems

19
Network Systems
Advantages:
 Less rigid compared to hierarchical structure
 Can handle many-to-many relationships
 Allows much greater flexibility
 Reduced redundancy of data

20
Network Systems
Disadvantages:
 In very complex GIS, the number of
pointers can become large, thus requiring
a lot of storage space
 Linkages between data must still be
explicitly defined using pointers
 Numerous possible linkages can become
extremely tangled, resulting in confusion
and incorrect linkages
 Not recommended for novice users

21
Relational Database
Management Systems
(RDBMS)
 Data are stored as ordered records or rows of
attribute values called tuples
 Tuples with the same attributes are grouped
into tables called relations
 Each column represents data for a single
attribute for the entire dataset

22
Relational Database
Management Systems
(RDBMS)
Primary key – a column (or set of columns) whose
values uniquely identify each row; it serves as the
search criterion
Foreign key – a column in a second table that holds
primary-key values of the first table, linking the two
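
The two kinds of key can be tried out with Python's built-in `sqlite3` module. The `owner` and `parcel` tables are hypothetical; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which is an SQLite-specific detail and not something the lecture prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# owner_id is the primary key of owner and a foreign key in parcel.
conn.execute("CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE parcel (
        parcel_id INTEGER PRIMARY KEY,
        land_use  TEXT,
        owner_id  INTEGER REFERENCES owner(owner_id)
    )
    """
)
conn.execute("INSERT INTO owner VALUES (1, 'Cruz')")
conn.execute("INSERT INTO parcel VALUES (10, 'residential', 1)")

# A parcel pointing at a non-existent owner should be rejected.
try:
    conn.execute("INSERT INTO parcel VALUES (11, 'agricultural', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```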

23
Relational Database
Management Systems
(RDBMS)
Normal forms – set of rules to indicate the
forms that the tables should take

1. First Normal Form
2. Second Normal Form
3. Third Normal Form

24
First Normal Form
 Table must contain columns and
rows
 Because the columns are to be
used as search keys, there should
only be a single value in each row
location
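
The single-value rule can be illustrated with a small, hypothetical crop table in which one cell holds two values; the sketch splits that cell so that each row location holds exactly one value.

```python
# Hypothetical table that violates 1NF: one cell holds several crop values.
unnormalized = [
    {"parcel_id": "P1", "crops": "rice, corn"},
    {"parcel_id": "P2", "crops": "coconut"},
]

# 1NF version: one value per row location, so each row repeats the parcel key.
first_normal_form = [
    {"parcel_id": row["parcel_id"], "crop": crop.strip()}
    for row in unnormalized
    for crop in row["crops"].split(",")
]
```

After the split, a search for "corn" is an exact match on the `crop` column instead of a substring search inside a combined cell.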

25
Second Normal Form
 Requires that every column that is
not part of the primary key be fully
dependent on the primary key
 Simplifies the tables
 Reduces redundancy by imposing the
restriction that each column be only
searchable using the primary key

26
Third Normal Form
 States that columns that are not
part of the primary key must depend
on the primary key alone and not on
other non-key columns, whereas the
primary key does not depend on any
non-key column
 Primary key must be used to find other
columns
 But the other columns are not needed to
search for values in the primary key
column
 Idea is to reduce redundancy
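
As a hedged illustration of how normalization reduces redundancy, the hypothetical table below stores an owner's name on every parcel row even though the name depends on `owner_id` rather than on the parcel. Splitting it into two tables follows the usual database-textbook reading of third normal form (no non-key column depends on another non-key column), which is an interpretation added here.

```python
# Hypothetical parcel table: owner_name depends on owner_id, not on parcel_id directly.
parcels_flat = [
    {"parcel_id": "P1", "owner_id": "O1", "owner_name": "Cruz"},
    {"parcel_id": "P2", "owner_id": "O1", "owner_name": "Cruz"},
    {"parcel_id": "P3", "owner_id": "O2", "owner_name": "Reyes"},
]

# Split so that every non-key column depends only on its own table's primary key.
parcels = [{"parcel_id": r["parcel_id"], "owner_id": r["owner_id"]} for r in parcels_flat]
owners = {r["owner_id"]: r["owner_name"] for r in parcels_flat}
```

The name "Cruz" is now stored once, so correcting it means changing a single value.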

27
Relational Database
Management Systems
(RDBMS)
Advantages:
 Allow us to collect data in reasonably simple tables,
keeping organization also simple
 Capable of doing relational joins, as long as there is
at least one column common to the tables to be
joined
 Allows greatest flexibility, both in design and
querying

28
Data Storage in a DBMS
 Object classes/layers are stored in database
tables
 Each layer is stored as a single database
table in a database management system
 Rows contain objects, while columns contain
attributes/properties of the objects

29
Data Storage in a DBMS
 Geographic database tables have a geometry
column (or shape column), which non-geographic
tables don’t have
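
A layer stored as a table can be pictured as a list of rows, each with attribute columns plus a geometry column. The `WellRow` class, its fields, and the coordinates are hypothetical; real geographic databases store geometry in a dedicated binary or text type rather than a Python tuple.

```python
from dataclasses import dataclass


@dataclass
class WellRow:
    """One row (object) of a hypothetical 'wells' point layer."""
    well_id: int
    depth_m: float
    geometry: tuple  # (x, y) coordinate pair; the geometry/shape column


# The layer is a single table: rows are objects, fields are attributes.
wells_layer = [
    WellRow(1, 35.0, (125.54, 8.95)),
    WellRow(2, 60.5, (125.60, 8.90)),
]

# An attribute query over the table.
deep_wells = [row.well_id for row in wells_layer if row.depth_m > 50]
```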

30
Basic Database
Functions/Operations
 Join
 Tables are joined together using common row/column
values or keys
 After joining two or more tables, a new table is created
which contains all the values of the joined tables
Database tables can be joined together to create new
relations, or views of the database.
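
A join on a common key can be sketched with Python's built-in `sqlite3` module. The `parcel` and `zone` tables and the view name are hypothetical; the join result is exposed as a view, matching the idea that joins create new relations or views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parcel (parcel_id TEXT PRIMARY KEY, zone_code TEXT);
    CREATE TABLE zone   (zone_code TEXT PRIMARY KEY, zone_name TEXT);
    INSERT INTO parcel VALUES ('P1', 'R'), ('P2', 'A');
    INSERT INTO zone   VALUES ('R', 'Residential'), ('A', 'Agricultural');

    -- zone_code is the column common to both tables.
    CREATE VIEW parcel_with_zone AS
        SELECT p.parcel_id, p.zone_code, z.zone_name
        FROM parcel AS p JOIN zone AS z ON p.zone_code = z.zone_code;
    """
)
joined_rows = conn.execute(
    "SELECT * FROM parcel_with_zone ORDER BY parcel_id"
).fetchall()
```

Each row of `joined_rows` combines the parcel columns with the matching zone name.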

31
Basic Database
Functions/Operations
 Link
 Tables are linked using common row/column values or
keys
 Unlike joining, linking tables does not result in a new
table. The original tables are retained, but accessing one
enables the user to also access a table linked to it
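
Linking can be contrasted with joining in a small sketch: the two hypothetical tables below stay separate, and the related record is fetched through the common key only when it is asked for. The `linked_zone` helper is an illustrative name, not a GIS function.

```python
# Hypothetical tables kept separate; zone_code is the common key.
parcels = {"P1": {"zone_code": "R"}, "P2": {"zone_code": "A"}}
zones = {"R": {"zone_name": "Residential"}, "A": {"zone_name": "Agricultural"}}


def linked_zone(parcel_id):
    """Follow the common key from a parcel to its zone record on demand."""
    return zones[parcels[parcel_id]["zone_code"]]
```

Calling `linked_zone("P2")` reaches the zone record, yet the parcel table itself never gains a `zone_name` column.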

32
Database Design
 Involves three stages: conceptual, logical,
and physical
 Involves six practical steps (see Figure)

33
Stages of Database Design
 Conceptual Model: User View, Object Types and
Relationships, Geographic Representation
 Logical Model: Geographic Database Types,
Geographic Database Structure
 Physical Model: Database Schema

34
Conceptual Model
Steps involved are:
1. Model the user’s view
2. Define objects and their relationships
3. Select geographic representation

35
Model the User’s View
 Identifying organizational functions,
determining data requirements of these
functions, organizing data into groups for
data management
 May be presented using a report with tables

36
Define Objects and Their
Relationships
 Specification of object types/classes and
functions, and their relationships
 May be presented using diagrams

37
Select Geographic
Representation
 Choosing between the types of discrete objects
(point, line, or polygon) or field to represent the
data
 Selection has a critical impact on the database
use
 Although it is possible to switch between
representations later on, doing so is
computationally expensive and can lead to
information loss

38
Logical Model
Steps involved are:
1. Match to geographic database types
2. Organize geographic database structure

39
Match to Geographic Database
Types
 Matching of object types to be studied to
specific data types supported by the GIS

40
Organize Geographic Database
Structure
 Defining topological associations, specifying
rules and relationships, and assigning
coordinate systems

41
Physical Model
Step involved is:
1. Define database schema
 Definition of the actual physical database
schema that will hold the data values
 Usually created using the DBMS software’s data
definition language (e.g., SQL)
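
A schema definition in SQL's data definition language might look like the sketch below, run here through Python's built-in `sqlite3` module. The `road` table and its columns are hypothetical, and storing geometry as well-known text in a `TEXT` column is an illustrative simplification, since plain SQLite has no spatial type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: column names, types, and constraints of a road layer.
conn.execute(
    """
    CREATE TABLE road (
        road_id   INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        lanes     INTEGER CHECK (lanes > 0),
        geom_wkt  TEXT NOT NULL  -- geometry as well-known text (illustrative choice)
    )
    """
)
conn.execute(
    "INSERT INTO road VALUES (?, ?, ?, ?)",
    (1, "Main Road", 4, "LINESTRING (125.53 8.95, 125.55 8.95)"),
)

# Read the schema back from the database catalog.
columns = [row[1] for row in conn.execute("PRAGMA table_info(road)")]
```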

42
Database
Organization/Structuring
 Necessary for efficient query, analysis, and
mapping

43
Structuring Techniques
1. Topology Creation
2. Indexing

44
Topology Creation
 Can be created for vector data using either batch or
interactive techniques
Batch Topology – for CAD, survey, simple feature, and
other unstructured vector data; an iterative process
Interactive Topology – performed dynamically at the
time objects are added to the database
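
One ingredient of batch topology building, finding which polygons share a boundary, can be sketched as follows. The three square polygons are hypothetical, and the sketch assumes shared edges have exactly matching vertices; real software also snaps nearby vertices, splits crossing lines, and repeats until the data are clean.

```python
from collections import defaultdict

# Hypothetical unstructured polygons: each is just a closed ring of vertices.
polygons = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "C": [(3, 0), (4, 0), (4, 1), (3, 1)],
}


def edges_of(ring):
    """Yield each boundary edge as an order-independent pair of vertices."""
    for start, end in zip(ring, ring[1:] + ring[:1]):
        yield frozenset((start, end))


# Batch pass over all features: record which polygons use each edge.
edge_users = defaultdict(set)
for polygon_id, ring in polygons.items():
    for edge in edges_of(ring):
        edge_users[edge].add(polygon_id)

# Polygons that share an edge are adjacent.
adjacency = defaultdict(set)
for users in edge_users.values():
    for polygon_id in users:
        adjacency[polygon_id].update(users - {polygon_id})
```

Here polygons A and B come out as neighbours because they share one edge, while C has no neighbours.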

45
Indexing
 Can help speed up certain types of queries
 Three main indexing methods in GIS are grid
indexes, quadtrees, and R-trees.

Database index – a special representation of
information about objects that improves
searching
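
Of the three methods, the grid index is the simplest to sketch. The points, the cell size of 10 map units, and the function names below are assumptions for illustration; the idea is that a query only inspects the grid cells its search box overlaps.

```python
from collections import defaultdict

CELL_SIZE = 10.0  # assumed grid cell width/height in map units

# Hypothetical point features: id -> (x, y).
points = {"A": (3.0, 4.0), "B": (12.0, 7.0), "C": (14.0, 18.0), "D": (38.0, 2.0)}


def cell_of(x, y):
    """Return the (column, row) of the grid cell containing a coordinate."""
    return (int(x // CELL_SIZE), int(y // CELL_SIZE))


# Build the index: each cell lists the points that fall inside it.
grid_index = defaultdict(list)
for point_id, (x, y) in points.items():
    grid_index[cell_of(x, y)].append(point_id)


def query_box(xmin, ymin, xmax, ymax):
    """Find points in a box by visiting only the grid cells the box overlaps."""
    col_min, row_min = cell_of(xmin, ymin)
    col_max, row_max = cell_of(xmax, ymax)
    hits = []
    for col in range(col_min, col_max + 1):
        for row in range(row_min, row_max + 1):
            for point_id in grid_index.get((col, row), []):
                x, y = points[point_id]
                if xmin <= x <= xmax and ymin <= y <= ymax:
                    hits.append(point_id)
    return sorted(hits)
```

A query such as `query_box(10, 0, 20, 20)` never touches the cells holding points A and D. Quadtrees and R-trees refine the same idea with cells or rectangles that adapt to where the data are dense.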

46
Thank you!

47
