Download as pdf or txt
Download as pdf or txt
You are on page 1of 44

Temporal Database

Module IV
➢ A Temporal Database is a database with built-in support for handling time sensitive data.

➢ Usually, databases store information only about current state, and not about past states.

➢ For example in a employee database if the address or salary of a particular person changes, the
database gets updated, the old value is no longer there.

➢ However for many applications, it is important to maintain the past or historical values and the time
at which the data was updated. That is, the knowledge of evolution is required.

➢ That is where temporal databases are useful. It stores information about the past, present and future.

➢ Any data that is time dependent is called the temporal data and these are stored in temporal
databases.

➢ Temporal Databases store information about states of the real world across time. Temporal Database
is a database with built-in support for handling data involving time. It stores information relating to
past, present and future time of all events.
Examples Of Temporal Databases

❖ Healthcare Systems: Doctors need the patients’ health history for proper diagnosis.
Information like the time a vaccination was given or the exact time when fever goes high
etc.

❖ Insurance Systems: Information about claims, accident history, time when policies are in
effect needs to be maintained.

❖ Reservation Systems: Date and time of all reservations is important.


Temporal Aspects

➢ There are two different aspects of time in temporal databases.

▪ Valid Time: Time period during which a fact is true in real world, provided to the system.

▪ Transaction Time: Time period during which a fact is stored in the database, based on
transaction serialization order and is the timestamp generated automatically by the system.

Temporal Relation

➢ Temporal Relation is one where each tuple has associated time; either valid time or transaction time or
both associated with it.
▪ Uni-Temporal Relations: Has one axis of time, either Valid Time or Transaction Time.

▪ Bi-Temporal Relations: Has both axis of time – Valid time and Transaction time. It includes
Valid Start Time, Valid End Time, Transaction Start Time, Transaction End Time.
Now let’s see an example of a person, John:

▪ John was born on April 3, 1992 in Chennai.


▪ His father registered his birth after three days on April 6, 1992.
▪ John did his entire schooling and college in Chennai.
▪ He got a job in Mumbai and shifted to Mumbai on June 21, 2015.
▪ He registered his change of address only on Jan 10, 2016.
John’s Data In Non-Temporal Database

Date Real world event Address


April 3, 1992 John is born
John’s father
April 6, 1992 Chennai
registered his birth
June 21, 2015 John gets a job Chennai
John registers his new
Jan 10, 2016 Mumbai
address

➢ In a non-temporal database, John’s address is entered as Chennai from 1992.

➢ When he registers his new address in 2016, the database gets updated and the address field now shows
his Mumbai address. The previous Chennai address details will not be available.

➢ So, it will be difficult to find out exactly when he was living in Chennai and when he moved to
Mumbai.
Uni-Temporal Relation (Adding Valid Time To John’s Data)

➢ To make the above example a temporal database, we’ll be adding the time aspect also to the database. \

➢ First let’s add the valid time which is the time for which a fact is true in real world. Valid time is the time
for which a fact is true in the real world.

➢ A valid time period may be in the past, span the current time, or occur in the future.

➢ The valid time temporal database contents look like this:

Name, City, Valid From, Valid Till

➢ In our example, john was born on 3rd April 1992. Even though his father registered his birth three days
later, the valid time entry would be 3rd April of 1992. There are two entries for the valid time.

➢ The Valid Start Time and the Valid End Time. So in this case 3rd April 1992 is the valid start time. Since
we do not know the valid end time we add it as infinity.
Johns father registers his birth on 6th April 1992, a new database entry is made:

Person(John, Chennai, 3-Apr-1992, ∞).

Similarly John changes his address to Mumbai on 10th Jan 2016. However, he has been living in Mumbai
from 21st June of the previous year. So his valid time entry would be 21 June 2015.

On January 10, 2016 John reports his new address in Mumbai:

Person(John, Mumbai, 21-June-2015, ∞).

The original entry is updated.

Person(John, Chennai, 3-Apr-1992, 20-June-2015).


Name City Valid From Valid Till

John Chennai April 3, 1992 June 20, 2015

John Mumbai June 21, 2015 ∞

Bi-Temporal Relation

➢ Next we’ll see a bi-temporal database which includes both the valid time and transaction time.

➢ Transaction time records the time period during which a database entry is made.

➢ So, now the database will have four additional entries the valid from, valid till, transaction entered
and transaction superseded.

➢ The database contents look like this:

Name, City, Valid From, Valid Till, Entered, Superseded


➢ First, when John’s father records his birth the valid start time would be 3rd April 1992, his actual birth
date. However, the transaction entered time would be 6th April 1992.

➢ Johns father registers his birth on 6th April 1992:

Person(John, Chennai, 3-Apr-1992, ∞, 6-Apr-1992, ∞).

➢ Similarly, when john registers his change of address in Mumbai, a new entry is made.

➢ The valid from time for this entry is 21st June 2015, the actual date from which he started living in
Mumbai. whereas the transaction entered time would be 10th January 2016. We do not know how
long he’ll be living in Mumbai.

➢ So the transaction end time and the valid end time would be infinity. At the same time the original
entry is updated with the valid till time and the transaction superseded time.
On January 10, 2016 John reports his new address in Mumbai:

Person(John, Mumbai, 21-June-2015, ∞, 10-Jan-2016, ∞).


The original entry is updated.

Person(John, Chennai, 3-Apr-1992, 20-June-2015, 6-Apr-1992, 10-Jan-2016).

Name City Valid From Valid Till Entered Superseded


April 3, June 20, April 6,
John Chennai Jan 10, 2016
1992 2015 1992
June 21,
John Mumbai ∞ Jan 10, 2016 ∞
2015
Advantages

➢ The main advantages of this bi-temporal relations is that it provides historical and roll back
information. For example, you can get the result for a query on John’s history, like: Where did John
live in the year 2001?.

➢ The result for this query can be got with the valid time entry.

➢ The transaction time entry is important to get the rollback information.

▪ Historical Information – Valid Time.

▪ Rollback Information – Transaction Time.


Spatial and Geographic Data

➢ Spatial data comprise the relative geographic information about the earth and its features. A pair of
latitude and longitude coordinates defines a specific location on earth.

➢ Spatial data are of two types according to the storing technique, namely, raster data and vector data.

▪ Raster data are composed of grid cells identified by row and column. The whole
geographic area is divided into groups of individual cells, which represent an image.
Satellite images, photographs, scanned images, etc., are examples of raster data.

▪ Vector data are composed of points, polylines, and polygons. Wells, houses, etc., are
represented by points. Roads, rivers, streams, etc., are represented by polylines. Villages
and towns are represented by polygons.
Representation of Geometric Information

➢ A line segment can be represented by the


coordinates of its endpoints.

➢ For example, in a map database, the two


coordinates of a point would be its latitude
and longitude.

➢ A polyline (also called a line string)


consists of a connected sequence of line
segments and can be represented by a list
containing the coordinates of the endpoints
of the segments, in sequence
➢ We can represent a polygon by listing its vertices in order.

➢ The list of vertices specifies the boundary of a polygonal region.

➢ In an alternative representation, a polygon can be divided into a set of triangles, as shown in Figure.
This process is called triangulation, and any polygon can be triangulated.

➢ The complex polygon can be given an identifier, and each of the triangles into which it is divided
carries the identifier of the polygon
Geographic Data

➢ Geographic data are spatial in nature, but differ from design data in certain ways.

➢ Maps and satellite images are typical examples of geographic data.

➢ Maps may provide not only location information—about boundaries, rivers, and roads, for example—
but also much more detailed information associated with locations, such as elevation, soil type, land
usage, and annual rainfall.

➢ Applications of Geographic Data


▪ Web-based road map services form a very widely used application of map data. At the simplest
level, these systems can be used to generate online road maps of a desired region. An important
benefit of online maps is that it is easy to scale the maps to the desired size—that is, to zoom in and
out to locate relevant features. Road map services also store information about roads and services,
such as the layout of roads, speed limits on roads, road conditions, connections between roads, and
one-way restrictions. With this additional information about roads, the maps can be used for getting
directions to go from one place to another and for automatic trip planning. Users can query online
information about services to locate, for example, hotels, gas stations, or restaurants with desired
offerings and price ranges
▪ Vehicle-navigation systems are systems that are mounted in automobiles and provide roadmaps
and trip-planning services. They include a Global Positioning System (GPS) unit, which uses
information broadcast from GPS satellites to find the current location with an accuracy of tens of
meters. With such a system, a driver can never3 get lost—the GPS unit finds the location in terms
of latitude, longitude, and elevation and the navigation system can query the geographic database
to find where and on which road the vehicle is currently located.

▪ Geographic databases for public-utility information have become very important as the network
of buried cables and pipes has grown. Without detailed maps, work carried out by one utility may
damage the cables of another utility, resulting in large-scale disruption of service. Geographic
databases, coupled with accurate location-finding systems, help avoid such problems
Spatial Queries

➢ Nearness queries request objects that lie near a specified location. A query to find all restaurants that
lie within a given distance of a given point is an example of a nearness query. The nearest-neighbor
query requests the object that is nearest to a specified point. For example, we may want to find the
nearest gasoline station. Note that this query does not have to specify a limit on the distance, and
hence we can ask it even if we have no idea how far the nearest gasoline station lies.

➢ Range queries deal with spatial regions. Such a query can ask for objects that lie partially or fully
inside a specified region. A query to find all retail shops within the geographic boundaries of a given
town is an example.

➢ Queries may also request intersections and unions of regions. For example, given region information,
such as annual rainfall and population density, a query may request all regions with a low annual
rainfall as well as a high population density.
Indexing of Spatial Data

➢ A spatial index is a data structure that allows for accessing a spatial object efficiently.

➢ It is a common technique used by spatial databases.

➢ Without indexing, any search for a feature would require a "sequential scan" of every record in the
database, resulting in much longer processing time.
k-d Trees

➢ A K-Dimensional Tree (also known as K-D Tree) is a space-partitioning data structure for organizing
points in a K-Dimensional space.

➢ This data structure acts similar to a binary search tree with each node representing data in the multi
dimensional space.

➢ The K-Dimensional Tree was first developed in 1975 by Jon Bentley.

➢ The purpose of the tree was to store spatial data with the goal of accomplishing:

▪ Nearest neighbor search.

▪ Range queries.

▪ Fast look-up.
Insertion

➢ A simple example to showcase the insertion into a K-Dimensional Tree, we will use a k = 2. The
points we will be adding are: (7,8), (12,3), (14,1), (4,12), (9,1), (2,7), and (10,19).

➢ The first element inserted is (7,8). It will serve as the base node in the following K-D Tree example.

➢ The second element inserted is (12,3). It is placed to the right leaf node of (7,8) because the X value,
12, is greater than the X value of base, 7.
➢ The next element inserted into the K-D Tree is (14,1).

➢ At the first level, we compare the X values of the set and since 14 is greater than 7 it moves to the right of
the tree. Next is the comparison between (12,3) and (14,1).

➢ Each level the comparison operator changes, which means that we will be comparing the Y value of each
set.

➢ Since the Y of the inserted set is 1 which is less than 3, it is inserted to the left leaf node of (12,3).
➢ The next set inserted is (4,12). Because the X, 4, is less than the base nodes X, which is 7, the set is
inserted to the left leaf node.
➢ The next set is (9,1). Since the X is greater than the base, it moves to the right leaf. At level 2 we
compare the Y values, which is less than the current leaf, so we move to the left leaf of this node. On
level 3 we restart and compare by the X value again, but if we had multiple dimensions it would
moved to the next dimension. Since 9 is less than 14, we insert it to the left of this leaf node.
➢ The next set inserted is (2,7). Since the X is less than 7 it compares to the left leaf node. As the Y value is
less than 12, the second levels Y value, this set is inserted to the left of the (4,12) leaf node.

➢ The next set inserted is (10,19). 10 is greater than 7 so it moves to the right leaf node. Additionally, 19 is
greater than the second levels Y, so it it inserted to the right leaf node.
On a plotted 2 dimensional graph the following K-D Tree
will be the equivalent to:
Search

➢ Since this data structure is based on a Binary Search Tree, the search function acts similarly.
➢ Based on the above example, lets say we are searching for the point (14,1).

➢ We begin at the base node and compare the X values in the sets.

➢ Since 14 is greater than 7 we move to the next level and compare the Y values at the given data set.

➢ Since 1 is less than 3, we can move to the left child node.

➢ Since the X values are the same then we are able to compare Y values.

➢ As every element in the set is equal, we can confirm that this point is the correct point we have
been searching for.

➢ In the case that the value comparing is the same (lets say the X) and the other value is different
(lets say the Y), we will move to the right node as that is the node we send duplicates.
Remove

➢ Similar to a Binary Search Tree, to delete a node in a K-Dimensional Tree we must search the tree
recursively. Once found, however, there are a variety of ways we must handle the deletion.

➢ 1) If current node contains the point to be deleted


i. If node to be deleted is a leaf node, simply delete it.
ii. If node to be deleted has right child as not NULL.
• Find minimum of current node’s dimension in right subtree.
• Replace the node with above found minimum and recursively delete minimum in right subtree.

iii. Else If node to be deleted has left child as not NULL


• Find minimum of current node’s dimension in left subtree.

• Replace the node with above found minimum and recursively delete minimum in left
subtree.

• Make new left subtree as right child of current node.


➢ 2) If current doesn’t contain the point to be deleted

▪ If node to be deleted is smaller than current node on current dimension, recur for left subtree.

▪ Else recur for right subtree.


Delete (30, 40): Since right child is not NULL and dimension of node is x, we find the node with minimum x
value in right child. The node is (35, 45), we replace (30, 40) with (35, 45) and delete (30, 40).
Delete (70, 70): Dimension of node is y. Since right child is NULL, we find the node with minimum y value
in left child. The node is (50, 30), we replace (70, 70) with (50, 30) and recursively delete (50, 30) in left
subtree. Finally we make the modified left subtree as right subtree of (50, 30).
Find minimum

➢ The concepts of searching for the minimum value in a K-Dimensional Tree is similar to that of a Binary
Search Tree. We continue to iterate to the left subtree until we reach the minimum point.
Quad Tree

➢ Quadtrees are trees used to efficiently store data of points on a two-dimensional space. In this tree,
each node has at most four children.

➢ We can construct a quadtree from a two-dimensional area using the following steps:

▪ Divide the current two dimensional space into four boxes.

▪ If a box contains one or more points in it, create a child object, storing in it the two
dimensional space of the box

▪ If a box does not contain any points, do not create a child for it

▪ Recurse for each of the children.


➢ The quadtree represents a collection of data points in two dimensions by decomposing the region containing
the data points into four equal quadrants, subquadrants, and so on, until no leaf node contains more than a
single point.

➢ In other words, if a region contains zero or one data points, then it is represented by a quadtree consisting of
a single leaf node.

➢ If the region contains more than a single data point, then the region is split into four equal quadrants.

➢ The corresponding quadtree then contains an internal node and four subtrees, each subtree representing a
single quadrant of the region, which might in turn be split into subquadrants.

➢ Each internal node of a quadtree represents a single split of the two-dimensional region.

➢ The four quadrants of the region (or equivalently, the corresponding subtrees) are designated (in order) NW,
NE, SW, and SE.

➢ Each quadrant containing more than a single point would in turn be recursively divided into subquadrants
until each leaf of the corresponding PR quadtree contains at most one point.
R-tree
➢ R-tree is a tree data structure used for storing spatial data indexes in an efficient manner.

➢ R-trees are highly useful for spatial data queries and storage.

➢ R-trees are hierarchical data structures based on B+- trees.

➢ They are used for the dynamic organization of a set of d-dimensional geometric objects representing
them by the minimum bounding d-dimensional rectangles (for simplicity, MBRs in the sequel).

➢ Each node of the R-tree corresponds to the MBR that bounds its children. The leaves of the tree
contain pointers to the database objects instead of pointers to children nodes. The nodes are
implemented as disk pages.

➢ Some of the real life applications are mentioned below:

▪ Indexing multi-dimensional information.


▪ Handling geospatial coordinates.
▪ Implementation of virtual maps.
▪ Handling game data.
➢ It must be noted that the MBRs that surround different nodes may overlap each other.

➢ Besides, an MBR can be included (in the geometrical sense) in many nodes, but it can be associated to
only one of them.

➢ This means that a spatial search may visit many nodes before confirming the existence of a
➢ given MBR.

➢ Also, it is easy to see that the representation of geometric objects through their MBRs may result in
false alarms.

➢ To resolve false alarms, the candidate objects must be examined.

➢ For instance, Figure 1.1 illustrates the case where two polygons do not intersect each other, but their
MBRs do.

➢ Therefore, the R-tree plays the role of a filtering mechanism to reduce the costly direct examination of
geometric objects.
➢ Figure 1.2 shows a set of the MBRs of some data geometric objects (not shown).

➢ These MBRs are D,E, F, G,H, I, J,K, L,M, and N, which will be stored at the leaf level of the R-
tree.

➢ The same figure demonstrates the three MBRs (A,B, and C) that organize the aforementioned
rectangles into an internal node of the R-tree.

➢ Assuming that M = 4 and m = 2, Figure 1.3 depicts the corresponding MBR. It is evident that
several R-trees can represent the same set of data rectangles.

➢ Each time, the resulting R-tree is determined by the insertion (and/or deletion) order of its
entries

You might also like