Professional Documents
Culture Documents
A Dynamic Distributed Federated Database: Graham Bent Patrick Dantressangle David Vyvyan Abbe Mowshowitz Valia Mitsou
A Dynamic Distributed Federated Database: Graham Bent Patrick Dantressangle David Vyvyan Abbe Mowshowitz Valia Mitsou
I. INTRODUCTION
Military, industrial and scientific data volumes continue to
grow enormously in size and complexity, such that traditional Figure 1. DDFD’s in relation to existing technologies
transactional database architectures can no longer be used cost
effectively to support the growing demand for information One of the key aspects of our approach is the way in which
exploitation. The current trend is a move away from large the different vertices of the DDFD interconnect. Properties
monolithic database architectures to more distributed database such as the degree of separation between any two vertices in a
technologies. In this paper we describe a new type of network, and the degree distribution of connectivity at each
Dynamic Distributed Federated Database (DDFD) that vertex in the network have a significant impact on query
combines the principles of large distributed databases [1], performance, resilience and vulnerability. Using biologically
database federation [7], network topology [11] and the inspired principles of network growth [11], combined with
semantics of data [15], see Figure 1. The resulting database graph theoretic methods [9,1], we describe how a DDFD can
architecture is ideally suited to a range of applications that are be developed and maintained.
required in support of networked operations.
The ‘Store-Locally-Query-Anywhere’ (SLQA) provides a
The approach that we have taken differs from other mechanism for global access to data from any vertex in the
current approaches to the distributed database challenge, such database network. To reflect this capability we have coined
as the map-reduce [4] approaches being used in the Apache the name GaianDB for this type of database architecture.
Hadoop project [8], in that we are seeking, as far as possible,
to maintain a strict distributed relational database model. In
this model data is stored in local database tables at any vertex
in the network, and is accessible from any other vertex using II. DATABASE ARCHITECTURE
Structured Query Language (SQL) like queries and distributed
stored procedure-like processing. The federated distributed database concept that we have
been investigating is illustrated in Figure 2. The database
comprises a set of interconnected vertices (N1,N2…) each of
which is a federated Relational Database Management System
(RDBMS) engine. By federated we mean that the database
N6
N8
N5 N7
N4 N3
N6 N9
N8 N10
SQ L Queries
Query N2
N3
N0
N9
N10
SQL Query N2 N1
N11
N0
N1
N11
Pushes columns
tables constructing the network through a process of graph
Instantiates Original SQL Invokes costing and ‘where’ clause
propagate methods
in a structure In-memory
combining operations [10], which seek to minimise the
tables
GaianPStmtNode VTI:
Executes queries on physical leaf nodes +
resulting network diameter and minimise the vulnerability of
Propagates the original SQL (+ queryID & steps state info) to linked Gaian nodes
MQ(tt)
Stream Data
the network to attacks in which vertices or edges are removed.
These techniques are particularly relevant when initially
Text Index
separate GaianDB networks are to be joined together.
Derby DB2 Oracle MS Sybase MySQL Flat files
SQLServer
Using this protocol we have implemented the broadcast algorithms to minimise the number of paths traversed where
mechanism using an IP multicast. The fitness function F(v) is the same query is being issued from different vertices.
equal to the degree of the vertex. The maximum period of
time tv that an existing vertex will wait before attempting to As noted earlier the GaianDB has the ability to perform
connect to the new one is given by: distributed queries over a network of sensors. To demonstrate
this we have used the temperature, battery and accelerometer
tv = t0 / F(v) sensors that are available on many laptops. Each laptop is
provided with an application that monitors these sensors and
where t0 is a constant used by every vertex. publishes the measurements into a local GaianDB database
table. This table is then accessible from all other vertices in
The actual delay time td is determined by randomly selecting the network. Each vertex can now query its local logical table
a time between 0 and tv. Vertices with higher degree and obtain the data from all other vertices.
preferentially respond faster than vertices of lower degree.
Figure 5 shows a snapshot of a graphical representation of
As vertices connect to the network they record the m edges the state of the sensors from a network of 16 sensors. We have
by which they connect. If one of these edges is subsequently also developed a method for gathering distributed statistics on
lost, the vertex to which it is incident is responsible for re- each query and corresponding result-set. These statistics can
establishing a new edge using the broadcast protocol. When be used to ‘explain’ how queries and results are propagated
any of the other edges that the vertex maintains are lost then no through the network of vertices.
action is taken since these are the responsibility of other
vertices.
Languageware
Rule
Development GaianDB
System
Triple Store
OMNIFIND YAHOO Edition
Figure 6 . Graphs showing the routing of consecutive queries and the paths
over which results are obtained across a Semantic
Rules
12-vertex Gaian Database. UIMA Search
Text
Service
Tokenizer Rule
Annotator Index
results vary dramatically from query to query. This indicates WWW Web
Crawler
File
Crawler
how the different loadings within the network can influence the
query routing.
Windows File System
Our investigations to determine how different sensor
processing algorithms can be implemented using the
Figure 7 Semantic Query Database Architecture
distributed network paradigm has currently been limited to
using distributed SQL to aggregate and filter. We have begun
Connecting these semantic query databases enables in a
work on the use of distributed stored procedures to provide
GaianDB allows the system at each vertex to crawl and index
sensor processing services.
different document collections. A semantic query issued at any
vertex retrieves results from across all the interconnected triple
We have also continued our investigations into the
stores and text indices as shown in Figure 8.
characteristics of the inter-network and have developed a
simulation tool for creating networks using different
connection protocols.