

A Comparison of R, R+, R*, X and Hilbert R-Trees

Submitted By:
Ram Charan Baishya

Mtech(IT) 2nd Sem.

CSI100022
R-trees:
R-trees are tree data structures that are similar to B-trees, but are used for
spatial access methods, i.e., for indexing multi-dimensional information; for
example, the (X, Y) coordinates of geographical data. A common real-world
usage for an R-tree might be: "Find all museums within 2 km of my current
location".
The data structure splits space with hierarchically nested, and possibly
overlapping, minimum bounding rectangles (MBRs, also known as bounding
boxes). The "R" in R-tree stands for "rectangle".
Each node of an R-tree has a variable number of entries (up to some pre-
defined maximum). Each entry within a non-leaf node stores two pieces of
data: a way of identifying a child node, and the bounding box of all entries
within this child node.

The insertion and deletion algorithms use the bounding boxes from the
nodes to ensure that "nearby" elements are placed in the same leaf node (in
particular, a new element will go into the leaf node that requires the least
enlargement in its bounding box). Each entry within a leaf node stores two
pieces of information; a way of identifying the actual data element (which,
alternatively, may be placed directly in the node), and the bounding box of
the data element.

Similarly, the searching algorithms (e.g., intersection, containment, nearest)
use the bounding boxes to decide whether or not to search inside a child
node. In this way, most of the nodes in the tree are never "touched" during
a search. Like B-trees, this makes R-trees suitable for databases, where
nodes can be paged to memory when needed.

Different algorithms can be used to split nodes when they become too full,
resulting in the quadratic and linear R-tree sub-types.

Historically, R-trees have not guaranteed good worst-case performance, but
they generally perform well with real-world data. However, a new algorithm
published in 2004 defines the Priority R-tree, which claims to be as
efficient as the most efficient methods of 2004 while also being worst-case
optimal.[1]
When data is organized in an R-Tree, the k nearest neighbors (for any Lp-
Norm) of all points can efficiently be computed using a spatial join.[2] This is
beneficial for many algorithms based on the k nearest neighbors, for
example the Local Outlier Factor.

Search:
The input is a search rectangle (query box). Searching is quite similar to
searching in a B+-tree. The search starts from the root node of the tree.
Every internal node contains a set of rectangles and pointers to the
corresponding child nodes, and every leaf node contains the rectangles of
spatial objects (optionally with a pointer to the spatial object itself). For
every rectangle in a node, it has to be decided whether it overlaps the search
rectangle. If it does, the corresponding child node has to be searched as
well. Searching proceeds recursively in this way until all overlapping nodes
have been traversed. When a leaf node is reached, the contained bounding boxes
(rectangles) are tested against the search rectangle, and their objects (if
there are any) are put into the result set if they lie within the search
rectangle.
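
The search procedure described above can be sketched in a few lines of
Python. This is only an illustrative sketch, not a reference implementation:
the names Node, intersects, within and search, and the tuple layout
(xmin, ymin, xmax, ymax), are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def intersects(a: Rect, b: Rect) -> bool:
    """True if the two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def within(inner: Rect, outer: Rect) -> bool:
    """True if `inner` lies completely inside `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


@dataclass
class Node:
    is_leaf: bool
    # Leaf entries are (mbr, object_id); internal entries are (mbr, child Node).
    entries: List[tuple] = field(default_factory=list)


def search(node: Node, query: Rect, results: list) -> list:
    """Collect the ids of all objects whose bounding box lies within `query`."""
    for mbr, payload in node.entries:
        if node.is_leaf:
            if within(mbr, query):
                results.append(payload)
        elif intersects(mbr, query):
            # Only subtrees whose bounding box overlaps the query are visited.
            search(payload, query, results)
    return results


leaf_a = Node(True, [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf_b = Node(True, [((8, 8, 9, 9), "c")])
root = Node(False, [((0, 0, 3, 3), leaf_a), ((8, 8, 9, 9), leaf_b)])
print(search(root, (0, 0, 3.5, 3.5), []))
```

With the query box (0, 0, 3.5, 3.5), the subtree under leaf_b is never
visited because its bounding box does not overlap the query. Replacing
within with intersects in the leaf test would turn this into an
intersection query.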

Insertion:
To insert an object, the tree is traversed recursively from the root node. All
rectangles in the current internal node are examined. The constraint of least
coverage is applied: the box that needs the least enlargement to enclose the
new object is selected. If more than one rectangle meets this criterion, the
one with the smallest area is chosen. Insertion continues recursively in the
chosen node. Once a leaf node is reached, a straightforward insertion is made
if the leaf node is not full. If the leaf node is full, it must be split
before the insertion is made. Several splitting algorithms have been proposed
to achieve good R-tree performance.
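
The subtree choice at each internal node (least enlargement, ties broken by
smallest area) might look as follows. This is a minimal sketch; the function
names and the (xmin, ymin, xmax, ymax) tuple layout are assumptions for
illustration, and node splitting is deliberately left out.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # assumed layout: (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr: Rect, new: Rect) -> float:
    """Extra area the bounding box needs in order to enclose the new rectangle."""
    return area(union(mbr, new)) - area(mbr)


def choose_subtree(mbrs: List[Rect], new: Rect) -> int:
    """Index of the entry needing least enlargement; ties go to the smallest area."""
    return min(range(len(mbrs)),
               key=lambda i: (enlargement(mbrs[i], new), area(mbrs[i])))


print(choose_subtree([(0, 0, 4, 4), (5, 5, 6, 6)], (1, 1, 2, 2)))
```

In the example, the first box already encloses the new rectangle, so it needs
no enlargement and is selected.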
Figure 1: A 2D view of an R-tree

R*-trees:
R*-trees are a variant of R-trees used for indexing spatial information.
R*-trees support point and spatial data at the same time, at a slightly
higher cost than other R-trees. The R*-tree was proposed by Norbert Beckmann,
Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger in 1990.

Difference between R*-trees and R-trees

Minimization of both coverage and overlap is crucial to the performance of
R-trees. Overlap means that, on data query or insertion, more than one
branch of the tree needs to be expanded, because the bounding rectangles of
sibling entries may cover the same region. Minimized coverage improves
pruning performance, allowing whole pages to be excluded from the search more
often, in particular for negative range queries.
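
To make the two quantities concrete, the sketch below measures them for a
pair of sibling bounding boxes. It assumes that coverage is taken as the
total area of the boxes and overlap as the area they share; the function
names are invented for this example.

```python
def overlap_area(a, b):
    """Area shared by two rectangles given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def coverage(mbrs):
    """Total area of the bounding boxes (assumed definition of coverage)."""
    return sum((r[2] - r[0]) * (r[3] - r[1]) for r in mbrs)


def branches_to_visit(mbrs, point):
    """Number of sibling entries a point query has to descend into."""
    x, y = point
    return sum(1 for r in mbrs if r[0] <= x <= r[2] and r[1] <= y <= r[3])


siblings = [(0, 0, 4, 4), (2, 2, 6, 6)]
print(overlap_area(*siblings), coverage(siblings), branches_to_visit(siblings, (3, 3)))
```

A point inside the shared region forces the search into both branches, which
is exactly the cost that reducing overlap is meant to avoid.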
The R*-tree attempts to reduce both, using a combination of a revised node
split algorithm and the concept of forced reinsertion at node overflow. This is
based on the observation that R-tree structures are highly susceptible to the
order in which their entries are inserted, so an insertion-built (rather than
bulk-loaded) structure is likely to be sub-optimal. Deletion and reinsertion of
entries allows them to "find" a place in the tree that may be more
appropriate than their original location.

When a node overflows, a portion of its entries are removed from the node
and reinserted into the tree. (In order to avoid an indefinite cascade of
reinsertions caused by subsequent node overflow, the reinsertion routine
may be called only once in each level of the tree when inserting any one new
entry.) This has the effect of producing more well-clustered groups of entries
in nodes, reducing node coverage. Furthermore, actual node splits are often
postponed, causing average node occupancy to rise. Re-insertion can be
seen as a method of incremental tree optimization triggered on node
overflow.
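
One way to pick the entries for reinsertion is sketched below. The text above
does not say which entries are removed; this sketch assumes the commonly
described heuristic of removing the entries whose centers lie farthest from
the center of the node's bounding box, with a fraction of about 30 percent.
Both the heuristic and the default fraction are assumptions here, as are all
function names.

```python
import math


def center(r):
    """Center point of a rectangle given as (xmin, ymin, xmax, ymax)."""
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)


def mbr_of(rects):
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pick_for_reinsert(rects, fraction=0.3):
    """Split an overflowing node's entries into (kept, removed).

    The removed entries are the ones farthest from the node center; the
    caller would reinsert them into the tree from the top.
    """
    node_center = center(mbr_of(rects))
    count = max(1, int(len(rects) * fraction))
    by_distance = sorted(rects,
                         key=lambda r: math.dist(center(r), node_center),
                         reverse=True)
    return by_distance[count:], by_distance[:count]


entries = [(4, 4, 5, 5), (5, 5, 6, 6), (4, 5, 5, 6), (0, 0, 1, 1), (9, 4, 10, 5)]
kept, removed = pick_for_reinsert(entries, fraction=0.4)
print(removed)
```

In the example, the two outlying rectangles are removed, and the bounding box
of the remaining entries shrinks to the tight cluster around (5, 5).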

Performance: Likely significant improvement over other R tree variants, but
there is overhead due to the reinsertion method.

Efficiently supports point and spatial data at the same time

Algorithm:

The R*-tree uses the same algorithm as the R-tree for query and delete
operations. The primary difference is the insert algorithm, specifically how it
chooses which branch to insert the new entry into and how it splits a node
that is full.
R+ tree:
An R+ tree is a method for looking up data using a location, often (x, y)
coordinates, and often for locations on the surface of the earth. Searching on
one number is a solved problem; searching on two or more, and asking for
locations that are nearby in both x and y directions, requires craftier
algorithms.

Fundamentally, an R+ tree is a tree data structure, a variant of the R tree,
used for indexing spatial information.
Difference between R+ trees and R trees:

R+ trees are a compromise between R-trees and kd-trees: they avoid
overlapping of internal nodes by inserting an object into multiple leaves if
necessary.

R+ trees differ from R trees in that:

Nodes are not guaranteed to be at least half filled

The entries of any internal node do not overlap

An object ID may be stored in more than one leaf node
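
The effect of storing one object in several leaves can be illustrated with a
deliberately flat, one-level sketch. It assumes a fixed set of disjoint leaf
regions, each treated as half-open ([xmin, xmax) by [ymin, ymax)) so that
every point belongs to at most one region. A real R+ tree builds and adjusts
these regions hierarchically; the function names are invented for this
example.

```python
def touches_region(region, rect):
    """True if the closed rectangle `rect` intersects the half-open `region`."""
    return (rect[0] < region[2] and rect[2] >= region[0]
            and rect[1] < region[3] and rect[3] >= region[1])


def insert(regions, leaves, obj_id, rect):
    """Store the object in every leaf whose region it intersects."""
    for region, leaf in zip(regions, leaves):
        if touches_region(region, rect):
            leaf.append((obj_id, rect))


def point_query(regions, leaves, x, y):
    """Regions are disjoint, so at most one leaf has to be inspected."""
    for region, leaf in zip(regions, leaves):
        if region[0] <= x < region[2] and region[1] <= y < region[3]:
            return [oid for oid, r in leaf
                    if r[0] <= x <= r[2] and r[1] <= y <= r[3]]
    return []


regions = [(0, 0, 5, 10), (5, 0, 10, 10)]  # two disjoint leaf regions
leaves = [[], []]
insert(regions, leaves, "road", (3, 4, 8, 5))   # spans both regions
insert(regions, leaves, "house", (1, 1, 2, 2))  # fits in the first region
print(point_query(regions, leaves, 7, 4.5))
```

The object "road" ends up in both leaves, which is the duplication that makes
an R+ tree larger, while each point query inspects a single leaf.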

Advantages:

Because nodes are not overlapped with each other, point query performance
benefits since all spatial regions are covered by at most one node.

A single path is followed and fewer nodes are visited than with the R-tree

Disadvantages:

Since rectangles are duplicated, an R+ tree can be larger than an R tree
built on the same data set.

Construction and maintenance of R+ trees is more complex than the
construction and maintenance of R trees and other variants of the R tree.

X-tree
In computer science, an X-tree is an index tree structure based on the
R-tree used for storing data in many dimensions. It differs from R-trees,
R+-trees and R*-trees because it emphasizes prevention of overlap in the
bounding boxes, which increasingly becomes a problem in high dimensions.
In cases where nodes cannot be split without causing overlap, the node
split is deferred, resulting in super-nodes. In extreme cases, the tree
will linearize, which defends against worst-case behaviors observed in some
other data structures.

Hilbert R-tree

Hilbert R-tree, an R-tree variant, is an index for multidimensional objects
like lines, regions, 3-D objects, or high dimensional feature-based
parametric objects. It can be thought of as an extension to B+-tree for
multidimensional objects.

The performance of R-trees depends on the quality of the algorithm that
clusters the data rectangles on a node. Hilbert R-trees use space-filling
curves, and specifically the Hilbert curve, to impose a linear ordering on the
data rectangles.
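
A possible way to compute such an ordering in two dimensions is sketched
below. It uses the widely known iterative conversion from grid coordinates to
a position along the Hilbert curve, and assumes integer coordinates on an
n by n grid where n is a power of two. The function names are assumptions
for this example, and rectangles are ordered by the Hilbert position of
their center.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid.

    Assumes n is a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the next level sees a canonical curve.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_value_of_rect(rect, n):
    """Hilbert position of the (integer) center of (xmin, ymin, xmax, ymax)."""
    cx = (rect[0] + rect[2]) // 2
    cy = (rect[1] + rect[3]) // 2
    return hilbert_index(n, cx, cy)


rects = [(3, 0, 3, 1), (0, 0, 1, 1), (2, 2, 3, 3)]
ordered = sorted(rects, key=lambda r: hilbert_value_of_rect(r, 4))
print(ordered)
```

Cells that are consecutive along the curve are always neighbours on the grid,
which is why sorting by this value tends to place nearby rectangles next to
each other.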

There are two types of Hilbert R-tree, one for static databases and one for
dynamic databases. In both cases, space-filling curves, and specifically the
Hilbert curve, are used to achieve better ordering of multidimensional objects
in the node. This ordering has to be ‘good’, in the sense that it should group
‘similar’ data rectangles together, to minimize the area and perimeter of the
resulting minimum bounding rectangles (MBRs). Packed Hilbert R-trees are
suitable for static databases in which updates are very rare or in which there
are no updates at all.
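
Packing for the static case can be sketched as a bottom-up bulk load: take
the rectangles in Hilbert order, fill each node to capacity, and repeat one
level up until a single root remains. The sketch below assumes its input is
already sorted by Hilbert value; the function names and the
(mbr, children) node layout are assumptions for illustration.

```python
def mbr_of(rects):
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def pack(sorted_rects, capacity):
    """Build tree levels bottom-up from rectangles already in Hilbert order.

    Returns a list of levels, leaves first; each node is (mbr, children).
    """
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    if not sorted_rects:
        return []
    level = [(r, None) for r in sorted_rects]  # data entries have no children
    levels = []
    while True:
        nodes = []
        for i in range(0, len(level), capacity):
            group = level[i:i + capacity]  # consecutive entries fill one node
            nodes.append((mbr_of([m for m, _ in group]), group))
        levels.append(nodes)
        if len(nodes) == 1:
            return levels
        level = nodes


data = [(0, 0, 1, 1), (1, 0, 2, 1), (1, 1, 2, 2), (0, 2, 1, 3), (2, 2, 3, 3)]
levels = pack(data, capacity=2)
print([len(level) for level in levels])
```

Because every node except possibly the last one on each level is filled
completely, space utilization is close to 100 percent, which suits data that
rarely changes.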

The dynamic Hilbert R-tree is suitable for dynamic databases where
insertions, deletions, or updates may occur in real time. Moreover, dynamic
Hilbert R-trees employ a flexible deferred splitting mechanism to increase
space utilization. This is done by imposing an ordering on the R-tree nodes.
The Hilbert R-tree sorts rectangles according to the Hilbert value of the
center of the rectangles (i.e., MBRs). (The Hilbert value of a point is the
length of the Hilbert curve from the origin to the point.) Given the
ordering, every node has a well-defined set of sibling nodes; thus, deferred
splitting can be used. By adjusting the split policy, the Hilbert R-tree can
achieve a degree of space utilization as high as desired. By contrast, other
R-tree variants have no control over space utilization.
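
Deferred splitting can be sketched as follows. The sketch assumes an
"s-to-(s+1)" policy: when a node overflows, its entries are first
redistributed among s cooperating siblings, and a new node is created only
when all of them are full. The function name, the list-of-lists node
representation, and the even redistribution are assumptions for illustration.

```python
def resolve_overflow(siblings, capacity):
    """Redistribute entries among cooperating siblings, splitting only if needed.

    `siblings` is a list of s nodes (lists of entries in Hilbert order), one of
    which holds capacity + 1 entries. Returns the resulting list of nodes.
    """
    entries = [e for node in siblings for e in node]
    node_count = len(siblings)
    if len(entries) > node_count * capacity:
        node_count += 1  # every sibling is full: split s nodes into s + 1
    base, extra = divmod(len(entries), node_count)
    result, start = [], 0
    for i in range(node_count):
        size = base + (1 if i < extra else 0)
        result.append(entries[start:start + size])  # Hilbert order is preserved
        start += size
    return result


print(resolve_overflow([[1, 2, 3, 4], [5]], capacity=3))
print(resolve_overflow([[1, 2, 3, 4], [5, 6, 7]], capacity=3))
```

In the first call the sibling has room, so no new node is created. In the
second call both nodes are full and two nodes become three. A larger s delays
splits further and raises the guaranteed fill level after a split.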

Reference:

www.wikipedia.com
