DBMS CH-6


Advanced Database Management System

CHAPTER SIX

Distributed Database Systems

02/24/2023
Distributed Database System Concept

A distributed database (DDB) is a collection of multiple logically
interrelated databases distributed over a computer network. A distributed
database management system (DDBMS) is a software system that manages a
distributed database while making the distribution transparent to the user.

A distributed database (DDB) processes a unit of execution (a transaction)
in a distributed manner.

Advantages of Distributed Database Systems

Management of distributed data with different levels of transparency: The
physical placement of data (files, relations, etc.) is hidden from the user
(distribution transparency).

Distribution and network transparency:

Users do not have to worry about the operational details of the network.

Location transparency is the freedom to issue a command from any location
without affecting how it works.

Naming transparency allows any named object (files, relations, etc.) to be
accessed from any location.

Replication transparency:

Copies of data can be stored at multiple sites, as shown in the diagram
above.

This is done to minimize access time to the required data.

Fragmentation transparency:

A relation can be fragmented horizontally (creating a subset of the tuples
of a relation) or vertically (creating a subset of the columns of a relation).

Increased reliability and availability:

Reliability refers to system uptime, that is, the system is running
efficiently most of the time. Availability is the probability that the
system is continuously available (usable or accessible) during a time
interval.

A distributed database system has multiple nodes (computers), and if one
fails the others are available to do the job.

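
The effect of having several nodes on availability can be sketched with a small calculation. This is only an illustration under two simplifying assumptions that the slides do not state: nodes fail independently, and any single running node that holds a copy can do the job. The function name and the 0.9 figure are hypothetical.

```python
def availability_with_replicas(node_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent nodes is up.

    Assumes independent failures and that one surviving node is enough.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # All nodes are down with probability (1 - p) ** n.
    return 1.0 - (1.0 - node_availability) ** replicas


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, round(availability_with_replicas(0.9, n), 6))
```

Under these assumptions each additional node shrinks the chance that every copy is unavailable at the same time.
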
Improved performance:

A distributed DBMS fragments the database to keep data closer to where it
is needed most.

This reduces data management (access and modification) time significantly.

Easier expansion (scalability):

New nodes (computers) can be added at any time without changing the entire
configuration.

Functions of Distributed Databases

Distribution leads to increased complexity in system design and
implementation. The DDBMS software must be able to provide the following
functions in addition to those of a centralized DBMS:

Keeping track of data distribution. The ability to keep track of the data
distribution, fragmentation, and replication by expanding the DDBMS catalog.

Distributed query processing. The ability to access remote sites and transmit
queries and data among the various sites via a communication network.

Distributed transaction management. The ability to devise execution
strategies for queries and transactions that access data from more than one site and to
synchronize the access to distributed data and maintain the integrity of the overall
database.

Replicated data management. The ability to decide which copy of a replicated data
item to access and to maintain the consistency of copies of a replicated data item.

Distributed database recovery. The ability to recover from individual site
crashes and from new types of failures, such as the failure of communication
links.

Security. Distributed transactions must be executed with the proper
management of the security of the data and the authorization/access privileges
of users.

Distributed directory (catalog) management. A directory contains
information (metadata) about data in the database. The directory may be global
for the entire DDB, or local for each site. The placement and distribution of
the directory are design and policy issues.

DATA FRAGMENTATION, REPLICATION AND ALLOCATION TECHNIQUES FOR DISTRIBUTED DATABASE DESIGN

Data Fragmentation

Fragmentation splits a relation into logically related and correct parts. A
relation can be fragmented in two ways:

Horizontal Fragmentation

Vertical Fragmentation

Horizontal fragmentation

A horizontal fragment is a subset of the tuples of a relation that satisfy a
selection condition.

Consider the Employee relation with the selection condition (DNO = 5). All
tuples that satisfy this condition form a subset that is a horizontal fragment
of the Employee relation.

A selection condition may be composed of several conditions connected by AND
or OR.

Derived horizontal fragmentation: The partitioning of a primary relation is
applied to other secondary relations that are related to it by foreign keys.

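
Horizontal and derived horizontal fragmentation can be sketched in a few lines of Python. This is a minimal illustration, not a DDBMS implementation. Only Employee and DNO come from the slides; the sample rows, the SSN key, the Dependent relation with its ESSN foreign key, and all function names are hypothetical.

```python
from typing import Callable

Row = dict[str, object]

# Hypothetical sample data; only Employee and DNO appear in the slides.
EMPLOYEE: list[Row] = [
    {"SSN": "111", "Name": "Emp A", "DNO": 5},
    {"SSN": "222", "Name": "Emp B", "DNO": 4},
    {"SSN": "333", "Name": "Emp C", "DNO": 5},
]
DEPENDENT: list[Row] = [
    {"ESSN": "111", "Dependent_name": "Dep X"},
    {"ESSN": "222", "Dependent_name": "Dep Y"},
]


def horizontal_fragment(relation: list[Row], condition: Callable[[Row], bool]) -> list[Row]:
    """Keep the tuples that satisfy the selection condition."""
    return [row for row in relation if condition(row)]


def derived_fragment(secondary: list[Row], primary_fragment: list[Row],
                     foreign_key: str, primary_key: str) -> list[Row]:
    """Fragment a secondary relation by following its foreign key into a primary fragment."""
    keys = {row[primary_key] for row in primary_fragment}
    return [row for row in secondary if row[foreign_key] in keys]


emp_dno5 = horizontal_fragment(EMPLOYEE, lambda r: r["DNO"] == 5)
emp_other = horizontal_fragment(EMPLOYEE, lambda r: r["DNO"] != 5)
dep_dno5 = derived_fragment(DEPENDENT, emp_dno5, "ESSN", "SSN")

# The union of the two fragments gives back the original relation.
reconstructed = sorted(emp_dno5 + emp_other, key=lambda r: str(r["SSN"]))
```

The two conditions (DNO = 5 and DNO != 5) are chosen so that every tuple lands in exactly one fragment, which is why a plain union reconstructs Employee.
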
Vertical fragmentation

A vertical fragment is a subset of a relation created from a subset of its
columns. Thus a vertical fragment of a relation contains the values of the
selected columns. No selection condition is used in vertical fragmentation.

Consider the Employee relation. A vertical fragment of Employee can be
created by keeping the values of Name, Bdate, Sex, and Address.

Because there is no condition for creating a vertical fragment, each
fragment must include the primary key attribute of the parent relation
Employee. In this way all vertical fragments of a relation are connected.

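
The role of the primary key in vertical fragmentation can be sketched as follows. The columns Name, Bdate, Sex, and Address come from the slides; the SSN primary key, the Salary and DNO columns, the sample rows, and the function names are assumptions made for illustration.

```python
Row = dict[str, object]

# Hypothetical sample rows; SSN is assumed to be the primary key.
EMPLOYEE: list[Row] = [
    {"SSN": "111", "Name": "Emp A", "Bdate": "1990-01-01", "Sex": "F",
     "Address": "Addr 1", "Salary": 3000, "DNO": 5},
    {"SSN": "222", "Name": "Emp B", "Bdate": "1985-06-15", "Sex": "M",
     "Address": "Addr 2", "Salary": 4000, "DNO": 4},
]


def vertical_fragment(relation: list[Row], key: str, columns: list[str]) -> list[Row]:
    """Project the chosen columns, always keeping the primary key."""
    wanted = [key] + [c for c in columns if c != key]
    return [{c: row[c] for c in wanted} for row in relation]


def join_on_key(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Reconnect two vertical fragments through the shared primary key."""
    right_by_key = {row[key]: row for row in right}
    return [{**l, **right_by_key[l[key]]} for l in left if l[key] in right_by_key]


personal = vertical_fragment(EMPLOYEE, "SSN", ["Name", "Bdate", "Sex", "Address"])
job = vertical_fragment(EMPLOYEE, "SSN", ["Salary", "DNO"])
rebuilt = join_on_key(personal, job, "SSN")
```

Without SSN in both fragments there would be no way to tell which salary belongs to which name, which is the sense in which the key keeps the fragments connected.
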
Data Replication

In full replication the entire database is replicated to all sites; in
partial replication some selected part is replicated to some of the sites.

Data replication is achieved through a replication schema.

Data Distribution (Data Allocation)

This is relevant only in the case of partial replication or partition.

The selected portion of the database is distributed to the database sites.
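
An allocation can be pictured as a table that records which sites hold a copy of each fragment. The sketch below is an assumption-laden illustration: the site names, fragment names, and both functions are hypothetical, and the copy-selection rule (prefer the local site) is just one simple policy.

```python
# Hypothetical sites and fragment placement.
SITES = {"site1", "site2", "site3"}
ALLOCATION: dict[str, set[str]] = {
    "EMP_DNO5": {"site1", "site2"},
    "EMP_OTHER": {"site3"},
    "DEPARTMENT": {"site1", "site2", "site3"},
}


def replication_kind(allocation: dict[str, set[str]], sites: set[str]) -> str:
    """Classify an allocation as full replication, partition, or partial replication."""
    if all(placed == sites for placed in allocation.values()):
        return "full replication"
    if all(len(placed) == 1 for placed in allocation.values()):
        return "partition (no replication)"
    return "partial replication"


def choose_copy(fragment: str, local_site: str, allocation: dict[str, set[str]]) -> str:
    """Pick a site to read from: the local copy if there is one, else any remote copy."""
    placed = allocation[fragment]
    return local_site if local_site in placed else min(placed)


kind = replication_kind(ALLOCATION, SITES)
```

For example, `choose_copy("EMP_DNO5", "site2", ALLOCATION)` reads locally, while a request for EMP_OTHER issued at site1 has to go to site3.
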


TYPES OF DISTRIBUTED DATABASE SYSTEMS

Heterogeneous

Federated: Each site may run a different database system, but data access
is managed through a single conceptual schema.

This implies that the degree of local autonomy is minimal. Each site
must adhere to a centralized access policy. There may be a global schema.

Multidatabase: There is no single global conceptual schema. For data
access, a schema is constructed dynamically as needed by the application
software.

Query Processing and Optimization in Distributed Databases

A distributed database query is processed in stages as follows:


1. Query Mapping. The input query on distributed data is specified formally
using a query language. It is then translated into an algebraic query on global
relations. This translation is done by referring to the global conceptual
schema and does not take into account the actual distribution and replication
of data. Hence, this translation is largely identical to the one performed in a
centralized DBMS. The query is first normalized, analyzed for semantic errors,
simplified, and finally restructured into an algebraic query.

2. Localization. In a distributed database, fragmentation results in
relations being stored in separate sites, with some fragments possibly
being replicated. This stage maps the distributed query on the global
schema to separate queries on individual fragments using data
distribution and replication information.
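
The localization stage can be sketched for one very simple case: a global query that selects Employee tuples with a given DNO, over horizontal fragments defined by DNO values. The fragment names, sites, and the `localize` function are hypothetical, and replicated copies are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    name: str
    site: str
    dno_values: frozenset[int]  # the fragment holds tuples whose DNO is in this set


# Hypothetical fragmentation of the global Employee relation.
FRAGMENTS = [
    Fragment("EMP_F1", "site1", frozenset({5})),
    Fragment("EMP_F2", "site2", frozenset({4})),
    Fragment("EMP_F3", "site3", frozenset({1})),
]


def localize(dno: int, fragments: list[Fragment]) -> list[tuple[str, str]]:
    """Map the global query 'Employee WHERE DNO = dno' to per-fragment queries.

    Fragments whose defining condition contradicts the query are skipped,
    because they cannot contain matching tuples.
    """
    return [
        (f.site, f"SELECT * FROM {f.name} WHERE DNO = {dno}")
        for f in fragments
        if dno in f.dno_values
    ]


fragment_queries = localize(5, FRAGMENTS)
```

Here only site1 receives a query, since the other fragments cannot hold tuples with DNO = 5.
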

3. Global Query Optimization. Optimization consists of selecting a
strategy from a list of candidates that is closest to optimal. A list of
candidate queries can be obtained by permuting the ordering of
operations within a fragment query generated by the previous stage.

4. Local Query Optimization. This stage is common to all sites in the
DDB. The techniques are similar to those used in centralized systems.

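
Selecting a strategy from a list of candidates, as in the global optimization stage, can be sketched by attaching a cost to each candidate and taking the cheapest. Everything specific here is an assumption for illustration: the cost measure (bytes sent over the network), the relation sizes, the site names, and the three candidate strategies for joining Employee (at site1) with Department (at site2) when the result is needed at site3.

```python
# Hypothetical sizes in bytes.
EMPLOYEE_BYTES = 1_000_000
DEPARTMENT_BYTES = 3_500
RESULT_BYTES = 400_000

# Each candidate strategy is costed by the number of bytes it transfers.
candidates: dict[str, int] = {
    "ship both relations to site3 and join there":
        EMPLOYEE_BYTES + DEPARTMENT_BYTES,
    "ship EMPLOYEE to site2, join, ship result to site3":
        EMPLOYEE_BYTES + RESULT_BYTES,
    "ship DEPARTMENT to site1, join, ship result to site3":
        DEPARTMENT_BYTES + RESULT_BYTES,
}

# Pick the candidate with the lowest estimated transfer cost.
best = min(candidates, key=candidates.get)

if __name__ == "__main__":
    for strategy, cost in sorted(candidates.items(), key=lambda item: item[1]):
        print(f"{cost:>9} bytes  {strategy}")
```

A real optimizer would also weigh local processing cost, but even this rough measure shows how much the choice of where to run the join can matter.
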
Concurrency Control and Recovery in Distributed Databases

For concurrency control and recovery purposes, numerous problems arise in a
distributed DBMS environment that are not encountered in a centralized DBMS
environment. These include the following:

Dealing with multiple copies of the data items. The concurrency control
method is responsible for maintaining consistency among these copies. The
recovery method is responsible for making a copy consistent with other copies if
the site on which the copy is stored fails and recovers later.

Failure of individual sites. The DDBMS should continue to operate with its
running sites, if possible, when one or more individual sites fail. When a site
recovers, its local database must be brought up-to-date with the rest of the sites
before it rejoins the system.

Distributed deadlock. Deadlock may occur among several sites, so
techniques for dealing with deadlocks must be extended to take this into
account.

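
The reason deadlock handling has to be extended can be sketched with wait-for graphs. In the example below, which uses hypothetical transactions T1 and T2 and hypothetical function names, neither site sees a cycle in its own graph, yet the merged graph contains one. Merging local graphs at a single place is only one possible approach and is used here because it is the simplest to show.

```python
WaitForGraph = dict[str, set[str]]  # transaction -> transactions it is waiting for


def has_cycle(graph: WaitForGraph) -> bool:
    """Depth-first search for a cycle in a wait-for graph."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: dict[str, int] = {}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: a cycle exists
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


def merge(graphs: list[WaitForGraph]) -> WaitForGraph:
    """Combine the local wait-for graphs of several sites into one global graph."""
    merged: WaitForGraph = {}
    for g in graphs:
        for waiter, holders in g.items():
            merged.setdefault(waiter, set()).update(holders)
    return merged


site1 = {"T1": {"T2"}}  # at site 1, T1 waits for a lock held by T2
site2 = {"T2": {"T1"}}  # at site 2, T2 waits for a lock held by T1
global_graph = merge([site1, site2])
```

Checking `has_cycle(site1)` and `has_cycle(site2)` separately finds nothing, while `has_cycle(global_graph)` reports the deadlock.
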
Failure of communication links. The system must be able to deal with
the failure of one or more of the communication links that connect the
sites. An extreme case of this problem is that network partitioning
may occur. This breaks up the sites into two or more partitions, where
the sites within each partition can communicate only with one another
and not with sites in other partitions.
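
Network partitioning can be pictured as finding the groups of sites that can still reach one another over the links that remain up. The sketch below is illustrative only; the site names, the link list, and the `partitions` function are hypothetical.

```python
def partitions(sites: set[str], working_links: list[tuple[str, str]]) -> list[set[str]]:
    """Group sites into partitions: sets of sites that can still reach each other."""
    neighbours: dict[str, set[str]] = {s: set() for s in sites}
    for a, b in working_links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in sorted(sites):
        if start in seen:
            continue
        # Collect every site reachable from `start` over working links.
        group: set[str] = set()
        stack = [start]
        while stack:
            site = stack.pop()
            if site in group:
                continue
            group.add(site)
            stack.extend(neighbours[site] - group)
        seen |= group
        groups.append(group)
    return groups


# The link between B and C has failed, so only A-B and C-D remain.
after_failure = partitions({"A", "B", "C", "D"}, [("A", "B"), ("C", "D")])
```

With the B-C link restored, the same call returns a single group containing all four sites.
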

Distributed commit. Problems can arise with committing a transaction
that is accessing databases stored on multiple sites if some sites fail
during the commit process.
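
One widely used way to handle this problem is a two-phase commit protocol, which the slides do not name. The sketch below is a heavily simplified, in-memory illustration of the idea, not a production protocol: the `Participant` class, its fields, and the way a crash is simulated are all hypothetical, and logging, timeouts, and recovery are left out.

```python
from enum import Enum


class Vote(Enum):
    READY = "ready"
    ABORT = "abort"


class Participant:
    """A site taking part in the transaction (simulated in memory)."""

    def __init__(self, name: str, can_commit: bool = True, crashed: bool = False):
        self.name = name
        self.can_commit = can_commit
        self.crashed = crashed
        self.state = "active"

    def prepare(self) -> Vote:
        if self.crashed:
            raise ConnectionError(f"{self.name} is unreachable")
        return Vote.READY if self.can_commit else Vote.ABORT

    def finish(self, decision: str) -> None:
        if not self.crashed:
            self.state = decision


def two_phase_commit(participants: list[Participant]) -> str:
    """Commit only if every participant votes READY; otherwise abort everywhere."""
    decision = "committed"
    # Phase 1: collect votes. A failed site counts as a vote to abort.
    for p in participants:
        try:
            if p.prepare() is not Vote.READY:
                decision = "aborted"
        except ConnectionError:
            decision = "aborted"
    # Phase 2: tell every reachable participant the global decision.
    for p in participants:
        p.finish(decision)
    return decision
```

The point of the two phases is that no site commits on its own: a site failure during voting leads every reachable site to the same abort decision.
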
THE END