
Spatial Data Management

UNIT-5
Contents
Types of Spatial data & queries
Applications involving spatial data
Introduction to Spatial Indexes
Indexing based on space filling curves, grid files
R-Trees: Point & Region data
Issues on High Dimensional Indexing
A spatial database is a general-purpose database that
has been enhanced to include spatial data that
represents objects defined in a geometric space, along
with tools for querying and analyzing such data. Most
spatial databases allow the representation of simple
geometric objects such as points, lines and polygons.
Spatial data can exist in a variety of formats and
contains more than just location specific information.
To properly understand and learn more about spatial
data, there are a few key terms that will help you
become more fluent in the language of spatial data.
Types of Spatial Data:
Spatial data are of two types according to the storing
technique, namely, raster data and vector data.
Raster data are composed of grid cells identified by row
and column. The whole geographic area is divided into
groups of individual cells, which represent an image.
Satellite images, photographs, scanned images, etc., are
examples of raster data.
Vector data are composed of points, polylines, and
polygons. Wells, houses, etc., are represented by
points. Roads, rivers, streams, etc., are represented by
polylines. Villages and towns are represented by
polygons.
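As a rough illustration of the two storage models, the sketch below holds a small raster as a grid of cell values addressed by row and column, and a few vector features as coordinate lists. All names and values here are made up for illustration.

```python
import math

# Raster: a grid of cells addressed by (row, column); values are invented.
elevation = [
    [12, 14, 15],
    [11, 13, 16],
    [10, 12, 17],
]

def cell_value(raster, row, col):
    """Return the value stored in one raster cell."""
    return raster[row][col]

# Vector: geometry stored as coordinates (hypothetical features).
well = (2.0, 3.5)                                   # point
road = [(0.0, 0.0), (1.0, 0.5), (2.5, 0.5)]         # polyline
village = [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]  # closed polygon ring

def polyline_length(line):
    """Sum of the straight segment lengths along a polyline."""
    return sum(math.dist(a, b) for a, b in zip(line, line[1:]))
```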
 Attribute Data
Attribute data comprise the pertinent information
about the spatial data. Queries typically operate on
attribute data, which is attached to the geospatial
data. Types of attribute data are:
•nominal data;
•ordinal data;
•interval data;
•ratio data.
Vector data
Point Data: layers consisting of points (or “events”)
described by x,y (lat, long; easting, northing)
Line/Polyline Data: layers that are described by x,y
points (nodes, events) and lines (arcs) between points
(line segments and polylines)
Polygon Data: layers of closed line segments
enclosing areas that are described by attributes
Raster or grid data (matrices of numbers describing e.g.
elevation, population, herbicide use, etc.)
Images or pictures such as remote sensing data or scans of
maps or other photos. This is a special “grid” where the number
in each cell describes what color to paint or the spectral
character of the image in that cell. (To be used, the “picture”
must be placed on a coordinate system, or “rectified” or
“georeferenced”.)
Attribute data are non-spatial characteristics that are
connected by tables to points, lines, “events” on lines, and
polygons (and in some cases GRID cells). A point, vector or
raster geologic map might describe a “rock unit” on a map with
a single number, letter or name, but the associated attribute
table might have
age
lithology
percent quartz
etc., for each rock type on the map.
Most GIS programs can plot the polygon either by the
identifier or by one of the attributes.
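A minimal sketch of how an attribute table can be tied to map polygons through a shared identifier. The polygons, unit names, and attribute values are all invented for illustration, and `label_by` is a hypothetical helper.

```python
# Hypothetical map polygons, each tagged only with a rock-unit identifier.
polygons = [
    {"unit": "A", "ring": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"unit": "B", "ring": [(2, 0), (5, 0), (5, 2), (2, 2)]},
    {"unit": "A", "ring": [(0, 2), (5, 2), (5, 4), (0, 4)]},
]

# Attribute table keyed by the same identifier (all values invented).
attributes = {
    "A": {"age": "Jurassic", "lithology": "sandstone", "percent_quartz": 80},
    "B": {"age": "Cretaceous", "lithology": "shale", "percent_quartz": 25},
}

def label_by(polygons, attributes, column=None):
    """Label each polygon by its identifier, or by one attribute column."""
    if column is None:
        return [p["unit"] for p in polygons]
    return [attributes[p["unit"]][column] for p in polygons]
```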
METADATA
Metadata are the most often forgotten data type.
They are absolutely necessary if you’re going to use data, or if
someone is going to use your data later (or your derivative
information).
Metadata contain information about
scale
accuracy
projection/datum
data source
manipulations
how to acquire data
Applications involving spatial data:
Spatial data are used in every field and decision-making
process that refers to location, e.g. GIS, remote sensing,
GPS (global positioning system), transportation,
police, medicine, navigation, robotics, etc.
Urban Planning
Military Operations
Farming
Disaster & Emergency
Weather Forecasting
Spatial Index:
The purpose of a spatial index is to facilitate spatial
selection. That is, in response to a query, the spatial index
will only search through a subset of objects embedded in
the space to retrieve the query result set.
A fundamental idea for spatial indexing is the use of
approximations. This allows index structures to manage an
object in terms of one or more approximating shapes, which
are much simpler geometric objects than the object itself.
The prime example is the bounding box (the smallest
orthogonal rectangle enclosing the object). Another
method called grid approximation divides the space into
cells by a regular grid, and the object is represented by the
set of cells that it intersects.
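The two approximations can be sketched as follows, assuming 2-D objects given as lists of (x, y) vertices. The function names are illustrative. Note that the grid function covers the object's bounding box, so it returns a superset of the cells the object itself intersects; an exact grid approximation would test each cell against the geometry.

```python
import math

def bounding_box(points):
    """Smallest axis-aligned rectangle enclosing the points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

def grid_cells_for_box(box, cell_size):
    """Grid cells (col, row) overlapped by a bounding box.

    Coarse grid approximation: it covers the whole box, so it may include
    cells that the object inside the box does not actually touch.
    """
    xmin, ymin, xmax, ymax = box
    c0, c1 = math.floor(xmin / cell_size), math.floor(xmax / cell_size)
    r0, r1 = math.floor(ymin / cell_size), math.floor(ymax / cell_size)
    return {(c, r) for c in range(c0, c1 + 1) for r in range(r0, r1 + 1)}

triangle = [(1.0, 1.0), (4.5, 2.0), (2.0, 3.5)]
box = bounding_box(triangle)
cells = grid_cells_for_box(box, cell_size=2.0)
```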
The use of approximations leads to a filter and refine
strategy for spatial query processing: First, based on
the approximations, a filtering step is executed; it
returns a set of candidates that is a super set of the
objects fulfilling a spatial predicate. Second, the result
is refined by checking the exact geometry of each
candidate.
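A minimal sketch of filter and refine for the predicate "polygon contains point", assuming bounding boxes as the approximation and ray casting as the exact test. The polygon names and coordinates are invented; a real index would store the boxes rather than recompute them per query.

```python
def bbox(poly):
    """Bounding box (xmin, ymin, xmax, ymax) of a polygon's vertices."""
    xs, ys = zip(*poly)
    return (min(xs), min(ys), max(xs), max(ys))

def in_bbox(p, box):
    x, y = p
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]

def point_in_polygon(p, poly):
    """Exact test by ray casting (even-odd rule)."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def polygons_containing(p, polygons):
    # Filter step: cheap test on approximations gives a candidate superset.
    candidates = [name for name, poly in polygons.items()
                  if in_bbox(p, bbox(poly))]
    # Refine step: exact geometry test on each candidate.
    hits = [name for name in candidates
            if point_in_polygon(p, polygons[name])]
    return candidates, hits

polygons = {
    "triangle": [(0, 0), (4, 0), (0, 4)],
    "square": [(5, 5), (7, 5), (7, 7), (5, 7)],
}
```

For the point (3, 3) the triangle survives the filter step because the point lies in its bounding box, but the refine step rejects it because the point is outside the triangle itself.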
Indexing based on space filling curves
What is a space-filling curve?
a line passing through every point in a space, in a
particular order, according to some algorithm
some curves pass through points once only and so
each point lies a unique distance along the curve
away from its beginning - these are the ones we are
interested in
examples include the Hilbert Curve and the Z-Order
Curve
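For the Z-Order Curve, a point's position along the curve can be computed by interleaving the bits of its coordinates. The sketch below assumes 2-D non-negative integer coordinates that fit in `bits` bits; the function names and the choice of putting x in the even bit positions are illustrative conventions.

```python
def z_order(x, y, bits=16):
    """Interleave the bits of x and y (x in even positions, y in odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def z_decode(z, bits=16):
    """Inverse mapping: recover (x, y) from a Z-value."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y
```

Because every point receives a distinct value, the mapping can be inverted, which is the "unique distance along the curve" property described above.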
Summary of the work undertaken
designed and implemented a fully functioning
application for the storage and retrieval of multi-
dimensional data
it uses space-filling curves to map multi-dimensional
data to one dimensional values
currently, it works in up to 16 dimensions but can be
easily extended
Why research into multi-dimensional indexing?
despite the volume of previous work, it's a problem
waiting for a generally accepted good solution
data is being gathered in ever increasing volume
this data is increasingly higher dimensional in nature
growing aspirations for sophisticated data processing
to extract valuable information
Why use space-filling curves?
in mapping multi-dimensional space to one-
dimensional values, they allow simple indexing
methods to be used - like the B-tree
unlike traditional hashing functions, they preserve in
the one-dimensional values the proximity of points in
space
although of interest to the research community, most
previous work relates to the theoretical clustering
properties of the curves and little relates to practical
implementation
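A sketch of the first point, assuming Z-order keys and using a sorted Python list with `bisect` as a stand-in for a B-tree. The class `ZIndex` is hypothetical. The window query relies on the fact that a Z-value never decreases when either coordinate increases, so every point inside the window has a key between the keys of the window's two extreme corners.

```python
import bisect

def z_order(x, y, bits=16):
    """Interleave the bits of x and y (x in even positions, y in odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

class ZIndex:
    """Points kept sorted by Z-value; a sorted list stands in for a B-tree."""

    def __init__(self, points):
        self.entries = sorted((z_order(x, y), (x, y)) for x, y in points)
        self.keys = [z for z, _ in self.entries]

    def window_query(self, xmin, ymin, xmax, ymax):
        # One key-range scan between the corner keys yields a candidate
        # superset of the points in the window.
        lo = bisect.bisect_left(self.keys, z_order(xmin, ymin))
        hi = bisect.bisect_right(self.keys, z_order(xmax, ymax))
        candidates = [p for _, p in self.entries[lo:hi]]
        # The key range can also cover points outside the window; drop them.
        return [(x, y) for x, y in candidates
                if xmin <= x <= xmax and ymin <= y <= ymax]

index = ZIndex([(x, y) for x in range(8) for y in range(8)])
result = index.window_query(2, 1, 4, 3)
```

A single range scan can visit many points outside the window; practical implementations usually split the query into several key ranges, which this sketch does not attempt.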
What are the main problems associated with
space-filling curves?
how to map between 1 and n dimensions
using state diagrams
by calculation
how to execute queries
what information to store and how to store it
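Mapping "by calculation" can be illustrated for the 2-D Hilbert Curve with a widely known iterative algorithm. This sketch assumes an n by n grid where n is a power of two; the function names are illustrative, and the source's own implementation (which handles up to 16 dimensions) is not shown here.

```python
def rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y

def xy2d(n, x, y):
    """Map grid cell (x, y) to its distance d along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d

def d2xy(n, d):
    """Map distance d along the Hilbert curve back to grid cell (x, y)."""
    t = d
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y
```

Unlike the Z-Order Curve, consecutive positions on the Hilbert Curve are always neighbouring grid cells.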
R-Trees: Point & Region data
An R-tree is a tree data structure used for indexing
spatial data efficiently. R-trees are highly useful for
spatial data queries and storage. Some real-life
applications are listed below:
Indexing multi-dimensional information.
Handling geospatial coordinates.
Implementation of virtual maps.
Handling game data.
Properties of R-tree: 
 
Consists of a single root, internal nodes and leaf nodes.
The root covers the largest region in the spatial domain:
its entries together bound all indexed objects.
Parent nodes contain pointers to their child nodes,
where the region of each child node lies completely
within the region of its parent.
Leaf nodes contain the MBRs of the indexed
objects.
MBR (minimum bounding region) refers to the smallest
bounding box that encloses the region/object
under consideration.
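The containment relationship between parent and child regions can be sketched as below, assuming 2-D rectangles written as (xmin, ymin, xmax, ymax). The rectangles and function names are invented for illustration.

```python
def mbr_of(rects):
    """Minimum bounding rectangle of a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def contains(outer, inner):
    """True if inner lies completely within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

# Two leaf nodes, each holding the MBRs of two objects.
leaf_a = [(1, 1, 2, 3), (2, 2, 4, 4)]
leaf_b = [(6, 1, 8, 2), (7, 3, 9, 5)]

# Each parent entry stores the MBR of its child; the root bounds everything.
mbr_a, mbr_b = mbr_of(leaf_a), mbr_of(leaf_b)
root_mbr = mbr_of([mbr_a, mbr_b])
```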
Comparison with Quad-trees: 
 
Tiling level optimization is required in Quad-trees,
whereas an R-tree doesn’t require any such
optimization.
A Quad-tree can be implemented on top of an existing
B-tree, whereas an R-tree follows a different structure
from a B-tree.
Spatial index creation in Quad-trees is faster as
compared to R-trees.
R-trees are faster than Quad-trees for Nearest
Neighbour queries while for window queries, Quad-
trees are faster than R-trees.
Implementation of R-Trees:
A spatial database consists of a collection of tuples
representing spatial objects, and each tuple has a
unique identifier which can be used to retrieve it.
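Leaf nodes in an R-tree contain index record entries
of the form
(I, tuple_identifier)
where tuple_identifier refers to a tuple in the database
and I represents the n-dimensional rectangle which is the
bounding box of the spatial object indexed.
Non-leaf nodes contain entries of the form
(I, child_pointer)
where child_pointer is the address of a lower node in the
R-tree and I covers all rectangles in that node's entries.
M is the maximum number of entries that fit in one node,
and m (at most M/2) is the minimum number of entries.
Every leaf node contains between m and M index records
unless it is the root.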
For each index record (I, tuple_identifier) in a leaf node,
I is the smallest rectangle that spatially contains the n-
dimensional data object represented by the indicated
tuple.
Every non-leaf node has between m and M children
unless it's the root.
For each entry (I, child_pointer) in a non-leaf node, I is
the smallest rectangle that spatially contains the
rectangles in the child node.
The root node has at least two children unless it's a leaf.
All leaves appear on the same level.
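The properties listed above can be expressed as a small checker. This is a sketch under assumptions: nodes are a hypothetical `Node` class with an `is_leaf` flag and a list of (I, payload) entries, rectangles are 2-D tuples (xmin, ymin, xmax, ymax), and leaf rectangles are taken on trust since the object geometry is not stored here.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (I, tuple_identifier); non-leaf entries are (I, child).
    entries: list = field(default_factory=list)

def mbr(rects):
    """Smallest rectangle (xmin, ymin, xmax, ymax) containing all rects."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def leaf_depth(node, m, M, is_root=True):
    """Return the depth of the leaves below node, or raise ValueError."""
    n = len(node.entries)
    if n > M or (not is_root and n < m):
        raise ValueError("node must hold between m and M entries")
    if is_root and not node.is_leaf and n < 2:
        raise ValueError("a non-leaf root needs at least two children")
    if node.is_leaf:
        return 0
    depths = set()
    for rect, child in node.entries:
        if rect != mbr([r for r, _ in child.entries]):
            raise ValueError("I must be the smallest rectangle covering the child")
        depths.add(leaf_depth(child, m, M, is_root=False))
    if len(depths) != 1:
        raise ValueError("all leaves must appear on the same level")
    return depths.pop() + 1

def is_valid(root, m, M):
    """True if the tree rooted at root satisfies the checked properties."""
    try:
        leaf_depth(root, m, M)
        return True
    except ValueError:
        return False

left = Node(True, [((0, 0, 1, 1), "t1"), ((1, 1, 3, 2), "t2")])
right = Node(True, [((5, 5, 6, 7), "t3"), ((6, 4, 8, 6), "t4")])
root = Node(False, [(mbr([r for r, _ in left.entries]), left),
                    (mbr([r for r, _ in right.entries]), right)])
```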
Operation on R-Tree:
Insertion in R-Tree
 Algorithm Insert
 Algorithm ChooseLeaf
 Algorithm AdjustTree
 Node Splitting
Searching in R-Tree
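The operations named above can be sketched compactly for 2-D rectangles (xmin, ymin, xmax, ymax). This is an illustration under assumptions, not a reference implementation: the leaf is chosen by least area enlargement, covering rectangles are refreshed on the way back up, and the node split is a simplified stand-in (sort by x-centre and cut in half) rather than one of the standard R-tree split heuristics. All names are hypothetical.

```python
class Node:
    def __init__(self, is_leaf):
        self.is_leaf = is_leaf
        self.entries = []  # (rect, tuple_id) in leaves, (rect, child) otherwise

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def node_mbr(node):
    rect = node.entries[0][0]
    for r, _ in node.entries[1:]:
        rect = union(rect, r)
    return rect

def split(node):
    """Simplified split (an assumption): sort by x-centre and cut in half."""
    node.entries.sort(key=lambda e: e[0][0] + e[0][2])
    half = len(node.entries) // 2
    sibling = Node(node.is_leaf)
    sibling.entries = node.entries[half:]
    node.entries = node.entries[:half]
    return sibling

def insert_into(node, rect, tuple_id, M):
    """Insert below node; return a new sibling if node was split, else None."""
    if node.is_leaf:
        node.entries.append((rect, tuple_id))
    else:
        # ChooseLeaf: descend into the child needing the least enlargement,
        # breaking ties by smaller area.
        def cost(k):
            r = node.entries[k][0]
            return (area(union(r, rect)) - area(r), area(r))
        i = min(range(len(node.entries)), key=cost)
        child = node.entries[i][1]
        new_child = insert_into(child, rect, tuple_id, M)
        # AdjustTree: refresh the covering rectangle, add any split sibling.
        node.entries[i] = (node_mbr(child), child)
        if new_child is not None:
            node.entries.append((node_mbr(new_child), new_child))
    return split(node) if len(node.entries) > M else None

def insert(root, rect, tuple_id, M=4):
    """Insert an entry and return the (possibly new) root."""
    sibling = insert_into(root, rect, tuple_id, M)
    if sibling is None:
        return root
    new_root = Node(is_leaf=False)  # root split: the tree grows by one level
    new_root.entries = [(node_mbr(root), root), (node_mbr(sibling), sibling)]
    return new_root

def search(node, query):
    """Return tuple ids whose rectangles intersect the query rectangle."""
    hits = []
    for rect, item in node.entries:
        if intersects(rect, query):
            if node.is_leaf:
                hits.append(item)
            else:
                hits.extend(search(item, query))
    return hits

# Twenty invented 2x2 rectangles laid out on a 5x4 grid.
rects = {i: ((i % 5) * 3, (i // 5) * 3, (i % 5) * 3 + 2, (i // 5) * 3 + 2)
         for i in range(20)}
root = Node(is_leaf=True)
for tid, r in rects.items():
    root = insert(root, r, tid)
```

Searching only descends into subtrees whose covering rectangle intersects the query, which is how the tree avoids scanning every object.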
Issues on High Dimensional Indexing:
The X-tree is a method for indexing large amounts
of point and spatial data in high-dimensional space.
Its authors' analysis shows that index structures such as
the R*-tree are not adequate for indexing high-dimensional
data sets.
The major problem of R-tree-based index structures is
the overlap of the bounding boxes in the directory,
which increases with growing dimension.
To avoid this problem, the X-tree introduces a new
organization of the directory which uses a split
algorithm minimizing overlap and additionally utilizes
the concept of supernodes.
The basic idea of overlap-minimizing split and
supernodes is to keep the directory as hierarchical as
possible, and at the same time to avoid splits in the
directory that would result in high overlap.
The experiments reported for the X-tree show that, for
high-dimensional data, it outperforms the well-known
R*-tree and the TV-tree by up to two orders of magnitude.
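One simple way to quantify the overlap between two directory bounding boxes is the volume of their intersection. The sketch below assumes n-dimensional boxes given as a pair (lows, highs) of coordinate tuples; it only measures pairwise overlap and does not reproduce the overlap-minimizing split or supernodes.

```python
def overlap_volume(a, b):
    """Volume of the intersection of two n-dimensional boxes (lows, highs)."""
    vol = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a[0], a[1], b[0], b[1]):
        side = min(hi_a, hi_b) - max(lo_a, lo_b)
        if side <= 0:
            return 0.0  # the boxes are disjoint (or only touch) on this axis
        vol *= side
    return vol
```

A query point that falls in the overlapping volume forces a search to descend into both subtrees, which is why high overlap degrades directory performance.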
