DDBMS Architecture

 Based on components.
The components of the system are defined together with the interrelationships between components. A DBMS consists of a number of components, each of which provides some functionality.

 Based on functions.
The different classes of users are identified and the functions that
the system will perform for each class are defined. The system
specifications within this category typically specify a hierarchical
structure for the user classes.

2
 Based on data.
The different types of data are identified, and an architectural
framework is specified which defines the functional units that will
realize or use data according to these different views. This
approach (also referred to as the datalogical approach) is claimed to
be the preferable choice for standardization activities.

3
The ANSI / SPARC architecture is claimed to be based
on the data organization. It recognizes three views of
data:
the external view, which is that of the user, who might
be a programmer; the internal view, that of the system
or machine; and the conceptual view, that of the
enterprise.

For each of these views, an appropriate schema definition is required.

4
5
 At the lowest level of the architecture is the internal
view, which deals with the physical definition and
organization of data.

 At the other extreme is the external view, which is concerned with how users view the database.

 Between these two ends is the conceptual schema, which is an abstract definition of the database. It is the "real world" view of the enterprise being modeled in the database.

6
7
 The square boxes represent processing functions, whereas the
hexagons are administrative roles.
 The arrows indicate data, command, program, and description flow, whereas the "I"-shaped bars on them represent interfaces.
 The major component that permits mapping between
different data organizational views is the data dictionary /
directory (depicted as a triangle), which is a meta-database.
 The database administrator is responsible for defining the
internal schema definition.
 The enterprise administrator’s role is to prepare the conceptual
schema definition.
 The application administrator is responsible for preparing the
external schema for applications.

8
The systems are characterized with respect to:
(1) the autonomy of the local systems,
(2) their distribution,
(3) their heterogeneity.
9
Autonomy refers to the distribution of control, not data.
It indicates the degree to which individual DBMSs can
operate independently.

Three alternatives:
 tight integration
 semiautonomous systems
 total isolation

10
Tight integration.
A single-image of the entire database is available to any user
who wants to share the information, which may reside in
multiple databases. From the users’ perspective, the data is
logically centralized in one database.
Semiautonomous systems.
The DBMSs can operate independently. Each of these DBMSs determines which parts of its own database it will make accessible to users of other DBMSs.
Total isolation.
The individual systems are stand-alone DBMSs, which neither know of the existence of the other DBMSs nor know how to communicate with them.

11
Distribution refers to the distribution of data. Of
course, we are considering the physical distribution of
data over multiple sites; the user sees the data as one
logical pool.

Two alternatives:
 client / server distribution
 peer-to-peer distribution (full distribution)

12
Client / server distribution.
The client / server distribution concentrates data management
duties at servers while the clients focus on providing the
application environment including the user interface. The
communication duties are shared between the client machines and
servers. Client / server DBMSs represent the first attempt at
distributing functionality.

Peer-to-peer distribution.
There is no distinction of client machines versus servers. Each
machine has full DBMS functionality and can communicate with
other machines to execute queries and transactions.

13
Heterogeneity may occur in various forms in distributed
systems, ranging from hardware heterogeneity and
differences in networking protocols to variations in
data managers.
Representing data with different modeling tools creates
heterogeneity because of the inherent expressive
powers and limitations of individual data models.
Heterogeneity in query languages not only involves the
use of completely different data access paradigms in
different data models, but also covers differences in
languages even when the individual systems use the
same data model.

14
The dimensions are identified as: A (autonomy), D
(distribution) and H (heterogeneity).
The alternatives along each dimension are identified by
numbers as: 0, 1 or 2.

A0 - tight integration
A1 - semiautonomous systems
A2 - total isolation

D0 - no distribution
D1 - client / server systems
D2 - peer-to-peer systems

H0 - homogeneous systems
H1 - heterogeneous systems
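The named combinations along these dimensions (discussed on the slides that follow) can be sketched as a simple lookup table. This is an illustration only; the tuple-to-name mapping is taken from the examples in this material, and the dictionary is not an exhaustive classification.

```python
# Sketch: the (autonomy, distribution, heterogeneity) taxonomy as a
# lookup table. Names follow the combinations discussed on the slides;
# combinations not named there are reported as "unnamed combination".
TAXONOMY = {
    (0, 0, 0): "logically integrated multiple DBMSs",
    (0, 0, 1): "heterogeneous integrated DBMS",
    (0, 1, 0): "client/server distributed DBMS",
    (0, 2, 0): "peer-to-peer distributed DBMS",
    (1, 0, 0): "federated DBMS",
    (1, 0, 1): "heterogeneous federated DBMS",
    (1, 1, 1): "distributed, heterogeneous federated DBMS",
    (2, 0, 0): "multidatabase system (MDBS)",
    (2, 0, 1): "heterogeneous MDBS",
    (2, 1, 1): "distributed MDBS",
    (2, 2, 1): "distributed MDBS",
}

def classify(autonomy: int, distribution: int, heterogeneity: int) -> str:
    """Name an (A, D, H) combination if it is one of the named
    alternatives; otherwise report it as unnamed."""
    return TAXONOMY.get((autonomy, distribution, heterogeneity),
                        "unnamed combination")
```

For example, `classify(1, 0, 0)` yields `"federated DBMS"`.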

15
(A0, D0, H0)
If there is no distribution or heterogeneity, the system is a set of
multiple DBMSs that are logically integrated.
(A0, D0, H1)
If heterogeneity is introduced, one has multiple data managers
that are heterogeneous but provide an integrated view to the user.
(A0, D1, H0)
The more interesting case is where the database is distributed
even though an integrated view of the data is provided to users
(client / server distribution).

16
(A0, D2, H0)
The same type of transparency is provided to the user in a fully
distributed environment. There is no distinction among clients
and servers, each site providing identical functionality.
(A1, D0, H0)
These are semiautonomous systems, which are commonly termed
federated DBMSs. The component systems in a federated
environment have significant autonomy in their execution, but
their participation in the federation indicates that they are willing
to cooperate with others in executing user requests that access
multiple databases.

17
(A1, D0, H1)
These are systems that introduce heterogeneity as well as
autonomy, what we might call a heterogeneous federated DBMS.
(A1, D1, H1)
Systems of this type introduce distribution by placing component
systems on different machines. They may be referred to as
distributed, heterogeneous federated DBMSs.
(A2, D0, H0)
Now we have full autonomy. These are multidatabase systems
(MDBS). The components have no concept of cooperation.
Without heterogeneity and distribution, an MDBS is an
interconnected collection of autonomous databases.

18
(A2, D0, H1)
This case is realistic, maybe even more so than (A1, D0, H1), in
that we often want to build applications that access data from
multiple storage systems with different characteristics.
(A2, D1, H1) and (A2, D2, H1)
These two cases are considered together because of the similarity
of the problems. They both represent the case where the component
databases that make up the MDBS are distributed over a number
of sites - we call this the distributed MDBS.

19
 Client / server systems - (Ax, D1, Hy)

 Distributed databases - (A0, D2, H0)

 Multidatabase systems - (A2, Dx, Hy)

20
 This provides a two-level architecture, which makes it easier
to manage the complexity of modern DBMSs and the
complexity of distribution.

 The server does most of the data management work (query processing and optimization, transaction management, storage management).

 The client is the application and the user interface (managing the data that is cached at the client, managing the transaction locks).

21
 This architecture is quite common in relational systems, where the communication between the clients and the server(s) is at the level of SQL statements.

22
Multiple client - single server
From a data management perspective, this is not much different
from centralized databases since the database is stored on only
one machine (the server) which also hosts the software to manage
it. However, there are some differences from centralized systems
in the way transactions are executed and caches are managed.

Multiple client - multiple server
In this case, two alternative management strategies are possible: either each client manages its own connection to the appropriate server, or each client knows of only its "home server", which then communicates with other servers as required.
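The two strategies can be sketched as follows. All class, method, and server names here are hypothetical illustrations; real systems hide the routing behind the database driver.

```python
# Sketch of the two multiple-client / multiple-server strategies.
# All names are illustrative, not from any real client library.

class Server:
    def __init__(self, name, data):
        self.name = name
        self.data = data          # the portion of the database this server stores

    def execute(self, key):
        return self.data.get(key)

class DirectClient:
    """Strategy 1: each client manages its own connection to the
    appropriate server, so the catalog lives on the client side."""
    def __init__(self, servers, locate):
        self.servers = servers    # name -> Server
        self.locate = locate      # key -> owning server's name

    def query(self, key):
        return self.servers[self.locate(key)].execute(key)

class HomeServer(Server):
    """Strategy 2: the client talks only to its home server, which
    forwards requests to whichever server holds the data."""
    def __init__(self, name, data, peers, locate):
        super().__init__(name, data)
        self.peers = peers        # other servers, by name
        self.locate = locate      # catalog lives at the server side

    def execute(self, key):
        owner = self.locate(key)
        if owner == self.name:
            return self.data.get(key)
        return self.peers[owner].execute(key)   # forward as required
```

The design trade-off is where the location knowledge resides: at every client (strategy 1) or only at the servers (strategy 2).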

23
 The physical data organization on each machine may
be different.
 Local internal schema (LIS) - an individual internal
schema definition at each site.
 Global conceptual schema (GCS) - describes the enterprise
view of the data.
 Local conceptual schema (LCS) - describes the logical
organization of data at each site.
 External schemas (ESs) - support user applications and
user access to the database.

24
25
In this case, the ANSI/SPARC model is extended by the addition of a global directory / dictionary (GD/D) to permit the required global mappings. The local mappings are still performed by a local directory / dictionary (LD/D). The local database management components are integrated by means of global DBMS functions. Local conceptual schemas are mappings of the global schema onto each site.

26
The detailed components of a distributed DBMS.

Two major components:
 user processor
 data processor

27
User processor
 user interface handler - is responsible for interpreting user
commands as they come in, and formatting the result data as it is
sent to the user,
 semantic data controller - uses the integrity constraints and
authorizations that are defined as part of the global conceptual
schema to check if the user query can be processed,
 global query optimizer and decomposer - determines an
execution strategy to minimize a cost function, and translates the
global queries into local ones using the global and local conceptual
schemas as well as the global directory,
 distributed execution monitor - coordinates the distributed
execution of the user request.
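The flow through the four user-processor components can be sketched as a pipeline. Every function below is a toy stand-in (an assumption for illustration), not an interface from any real system.

```python
# Sketch: the four user-processor components chained as a pipeline.
# All functions are toy stand-ins for the real components.

def interface_handler(request):
    # user interface handler: interpret the incoming command
    # (here trivially tokenized)
    return request.strip().lower().split()

def semantic_controller(query, authorized_words):
    # semantic data controller: check the query against (toy)
    # integrity constraints and authorizations from the GCS
    if not all(w in authorized_words for w in query):
        raise PermissionError("query violates semantic controls")
    return query

def optimizer_decomposer(query, sites):
    # global query optimizer and decomposer: translate the global
    # query into one local subquery per site
    return [(site, query) for site in sites]

def execution_monitor(subqueries):
    # distributed execution monitor: coordinate execution;
    # here we simply collect one result per site
    return [f"{site}: ok" for site, _ in subqueries]

def user_processor(request, authorized_words, sites):
    q = interface_handler(request)
    q = semantic_controller(q, authorized_words)
    plan = optimizer_decomposer(q, sites)
    return execution_monitor(plan)
```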

28
Data processor
 local query optimizer - is responsible for choosing the best access
path to access any data item,
 local recovery manager - is responsible for making sure that the
local database remains consistent even when failures occur,
 run-time support processor - physically accesses the database
according to the physical commands in the schedule generated by
the query optimizer. This is the interface to the operating system
and contains the database buffer (or cache) manager, which is
responsible for maintaining the main memory buffers and
managing the data accesses.

29
Models using a Global Conceptual Schema (GCS)
The GCS is defined by integrating either the external schemas of
local autonomous databases or parts of their local conceptual
schemas. If heterogeneity exists in the system, then two
implementation alternatives exist: unilingual and multilingual.

Models without a Global Conceptual Schema (GCS)
The existence of a global conceptual schema in a multidatabase system is a controversial issue. There are researchers who even define a multidatabase management system as one that manages "several databases without the global schema".

30
31
 A unilingual multi-DBMS requires the users to utilize possibly
different data models and languages when both a local
database and the global database are accessed.

 Any application that accesses data from multiple databases must do so by means of an external view that is defined on the global conceptual schema.

 One application may have a local external schema (LES) defined on the local conceptual schema as well as a global external schema (GES) defined on the global conceptual schema.

32
 An alternative is the multilingual architecture, where the basic
philosophy is to permit each user to access the global database by
means of an external schema, defined using the language of the
user’s local DBMS.

 The multilingual approach obviously makes querying the databases easier from the user's perspective. However, it is more complicated because we must deal with translation of queries at run time.

33
34
 The architecture identifies two layers: the local system layer and the
multidatabase layer on top of it.
 The local system layer consists of a number of DBMSs, which
present to the multidatabase layer the part of their local database
they are willing to share with users of the other databases. This
shared data is presented either as the actual local conceptual
schema or as a local external schema definition.
 The multidatabase layer consists of a number of external views,
where each view may be defined on one local conceptual schema
or on multiple local conceptual schemas. Thus
the responsibility of providing access to multiple databases is
delegated to the mapping between the external schemas and the
local conceptual schemas.

35
The MDBS provides a layer of software that runs on top of these individual DBMSs and provides users with the facility of accessing various databases.

The figure represents a nondistributed multi-DBMS. If the system is distributed, we would need to replicate the multidatabase layer to each site where there is a local DBMS that participates in the system.

36
 The global directory includes information about the
location of the fragments as well as the makeup of the
fragments.
 The directory is itself a database that contains meta-data
about the actual data stored in the database.

 We have three dimensions:

1. type   2. location   3. replication

37
Type
A directory may be either global to the entire database or local
to each site. In other words, there might be a single directory
containing information about all the data in the database, or a
number of directories, each containing the information stored
at one site.

Location
The directory may be maintained centrally at one site, or in a
distributed fashion by distributing it over a number of sites.

Replication
There may be a single copy of the directory or multiple copies.

38
These three dimensions are orthogonal to one
another. The unrealistic combinations have been
designated by a question mark.
39
The organization of distributed systems can be investigated
along three orthogonal dimensions:
1. Level of sharing
2. Behavior of access patterns
3. Level of knowledge on access pattern behavior

41
Level of sharing
 no sharing - each application and its data execute at one site,
 data sharing - all the programs are replicated at all the sites, but
data files are not,
 data plus program sharing - both data and programs may be
shared.
Behavior of access patterns
 static - access patterns of user requests do not change over time,
 dynamic - access patterns of user requests change over time.
Level of knowledge on access pattern behavior
 complete information - the access patterns can reasonably be
predicted and do not deviate significantly from the predictions,
 partial information - there are deviations from the predictions.

42
Two major strategies that have been identified for
designing distributed databases are:

 the top-down approach

 the bottom-up approach

43
44
 view design - defining the interfaces for end users,

 conceptual design - the process by which the enterprise is examined to determine entity types and relationships among these entities. One can possibly divide this process into two related activity groups:
 entity analysis - is concerned with determining the entities, their
attributes, and the relationships among these entities,
 functional analysis - is concerned with determining the
fundamental functions with which the modeled enterprise is
involved.

45
 distribution design - designing the local conceptual schemas
by distributing the entities over the sites of the distributed
system. The distribution design activity consists of two steps:
 fragmentation
 allocation

 physical design - the process that maps the local conceptual schemas to the physical storage devices available at the corresponding sites,

 observation and monitoring - the result is some form of feedback, which may result in backing up to one of the earlier steps in the design.

46
Top-down design is a suitable approach when a database
system is being designed from scratch.

If a number of databases already exist and the design task involves integrating them into one database, the bottom-up approach is suitable for this type of environment. The starting point of bottom-up design is the individual local conceptual schemas. The process consists of integrating local schemas into the global conceptual schema.

47
 The important issue is the appropriate unit of distribution. For a
number of reasons it is only natural to consider subsets of
relations as distribution units.
 If the applications that have views defined on a given relation
reside at different sites, two alternatives can be followed with the
entire relation as the unit of distribution: either the relation is not
replicated and is stored at only one site, or it is replicated at all or
some of the sites where the applications reside.
 The fragmentation of relations typically results in the parallel
execution of a single query by dividing it into a set of subqueries
that operate on fragments. Thus, fragmentation typically increases
the level of concurrency and therefore the system throughput.

48
 There are also disadvantages of fragmentation:
 if the applications have conflicting requirements which prevent
decomposition of the relation into mutually exclusive
fragments, those applications whose views are defined on more
than one fragment may suffer performance degradation,
 the second problem is related to semantic data control,
specifically to integrity checking.

49
 There are clearly two alternatives:
 horizontal fragmentation
 vertical fragmentation

 The fragmentation may, of course, be nested. If the nestings are of different types, one gets hybrid fragmentation.

50
 The extent to which the database should be fragmented
is an important decision that affects the performance of
query execution.

 The degree of fragmentation goes from one extreme, that is, not to fragment at all, to the other extreme, to fragment to the level of individual tuples (in the case of horizontal fragmentation) or to the level of individual attributes (in the case of vertical fragmentation).

51
Completeness
If a relation instance R is decomposed into fragments R1,R2, ..., Rn,
each data item that can be found in R can also be found in one or
more of Ri’s. This property is also important in fragmentation
since it ensures that the data in a global relation is mapped into
fragments without any loss.
Reconstruction
If a relation R is decomposed into fragments R1, R2, ..., Rn, it should
be possible to define a relational operator ∇ such that:
R = ∇ Ri, ∀Ri ∈ FR
The reconstructability of the relation from its fragments ensures
that constraints defined on the data in the form of dependencies
are preserved.

52
Disjointness
If a relation R is horizontally decomposed into fragments R1, R2, ...,
Rn and a data item di is in Rj, it is not in any other fragment Rk (k ≠ j).
This criterion ensures that the horizontal fragments are disjoint. If
relation R is vertically decomposed, its primary key attributes are
typically repeated in all its fragments. Therefore, in case of vertical
partitioning, disjointness is defined only on the nonprimary key
attributes of a relation.
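For horizontal fragmentation, where reconstruction is by union, the three correctness rules above can be checked mechanically. A minimal sketch, with relations modeled as Python sets of tuples and an illustrative employee relation (the data and the Paris/Boston predicate are made up):

```python
# Sketch: checking completeness, reconstruction, and disjointness
# for a horizontal fragmentation. Relations are sets of tuples; for
# horizontal fragments the reconstruction operator is union.

def is_complete(relation, fragments):
    # every data item of R is found in at least one fragment
    return all(any(t in f for f in fragments) for t in relation)

def reconstructs(relation, fragments):
    # the union of the fragments yields R exactly
    return set().union(*fragments) == relation

def is_disjoint(fragments):
    # no tuple appears in more than one fragment
    total = sum(len(f) for f in fragments)
    return len(set().union(*fragments)) == total

# Illustrative relation and selection predicates (made-up data).
emp = {("E1", "Paris"), ("E2", "Boston"), ("E3", "Paris")}
f1 = {t for t in emp if t[1] == "Paris"}
f2 = {t for t in emp if t[1] != "Paris"}

assert is_complete(emp, [f1, f2])
assert reconstructs(emp, [f1, f2])
assert is_disjoint([f1, f2])
```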

53
 The reasons for replication are reliability and efficiency of read-only queries.

 Read-only queries that access the same data items can be executed
in parallel since copies exist on multiple sites.

 The execution of update queries causes trouble since the system has to ensure that all the copies of the data are updated properly.

 The decision regarding replication is a trade-off which depends on the ratio of the read-only queries to the update queries.

54
 A nonreplicated database (commonly called a partitioned database)
contains fragments that are allocated to sites, and there is only one
copy of any fragment on the network.

 In case of replication, either the database exists in its entirety at each site (fully replicated database), or fragments are distributed to the sites in such a way that copies of a fragment may reside at multiple sites (partially replicated database).

55
56
 The information needed for distribution design can be
divided into four categories:
 database information,
 application information,
 communication network information,
 computer system information.

57
 Horizontal fragmentation partitions a relation along its
tuples

 Two versions of horizontal fragmentation:

 Primary horizontal fragmentation of a relation is performed using predicates that are defined on that relation.

 Derived horizontal fragmentation is the partitioning of a relation that results from predicates being defined on another relation.
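The two versions can be sketched with relations as lists of dicts. The PROJ/ASG relation names, attributes, and the budget predicate below are illustrative assumptions, not taken from the text.

```python
# Sketch: primary vs. derived horizontal fragmentation.
# Relation names, attributes, and data are made up for illustration.

proj = [
    {"pno": "P1", "budget": 150000},
    {"pno": "P2", "budget": 250000},
]
asg = [
    {"eno": "E1", "pno": "P1"},
    {"eno": "E2", "pno": "P2"},
    {"eno": "E3", "pno": "P1"},
]

# Primary horizontal fragmentation: predicates defined on PROJ itself.
proj1 = [t for t in proj if t["budget"] <= 200000]
proj2 = [t for t in proj if t["budget"] > 200000]

def semijoin(r, s, attr):
    # keep the tuples of r that match some tuple of s on attr
    keys = {t[attr] for t in s}
    return [t for t in r if t[attr] in keys]

# Derived horizontal fragmentation: ASG is partitioned according to
# the fragments of another relation (PROJ), via a semijoin on pno.
asg1 = semijoin(asg, proj1, "pno")
asg2 = semijoin(asg, proj2, "pno")
```

Each ASG fragment follows its owning PROJ fragment, which is exactly what makes the fragmentation "derived".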

58
 Vertical fragmentation partitions a relation into a set of smaller relations so that many of the user applications will run on only one fragment.

 Vertical fragmentation is inherently more complicated than horizontal partitioning.

59
 Allocation problem
 there is a set of fragments F = { F1, F2, ..., Fn } and a network consisting of sites S = { S1, S2, ..., Sm } on which a set of applications Q = { q1, q2, ..., qq } is running

 The allocation problem involves finding the "optimal" distribution of F to S

60
 One of the important issues that needs to be discussed is the definition of optimality.
 The optimality can be defined with respect to two measures [Dowdy and Foster, 1982]:
 Minimal cost. The cost consists of the cost of storing each Fi at site Sj, the cost of querying Fi at Sj, the cost of updating Fi at all sites where it is stored, and the cost of data communication. The allocation problem, then, attempts to find an allocation scheme that minimizes a combined cost function.
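A brute-force version of this minimization can be sketched as follows. The search is feasible only for tiny F and S (the problem is NP-hard in general), and the cost matrix below is a made-up placeholder for the storage/query/update/communication terms.

```python
# Sketch: exhaustive search for a nonreplicated allocation of
# fragments to sites minimizing a placeholder cost function.
from itertools import product

def best_allocation(fragments, sites, cost):
    """cost(fragment, site) -> combined cost of placing that fragment
    at that site (a placeholder for the real cost model)."""
    best, best_cost = None, float("inf")
    # try every assignment of each fragment to some site
    for assign in product(sites, repeat=len(fragments)):
        total = sum(cost(f, s) for f, s in zip(fragments, assign))
        if total < best_cost:
            best, best_cost = dict(zip(fragments, assign)), total
    return best, best_cost

# Example: two fragments, two sites, a made-up cost matrix.
costs = {("F1", "S1"): 1, ("F1", "S2"): 4,
         ("F2", "S1"): 3, ("F2", "S2"): 2}
alloc, total = best_allocation(["F1", "F2"], ["S1", "S2"],
                               lambda f, s: costs[(f, s)])
# alloc == {"F1": "S1", "F2": "S2"} with total cost 3
```

Real allocation algorithms use heuristics instead of this exhaustive enumeration, since the number of assignments grows as m^n.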

61
 Performance. The allocation strategy is designed to maintain a performance metric. Two well-known ones are to minimize the response time and to maximize the system throughput at each site.

62
