
Distributed Databases

Centralized Database System
Distributed Database System
Advantages and Disadvantages of DDBMS
Advantages of Data Distribution
Disadvantages of Data Distribution
Design of Distributed Databases
Data Fragmentation
Data Replication

Centralized Database System

In a centralized database system, all data resides at a single location.

Distributed Database System

In a distributed database system, the database is stored on several computers. A distributed database system consists of a collection of sites, each of which maintains a local database system. Each site is able to process local transactions, that is, transactions that access data only at that single site. In addition, a site may participate in the execution of global transactions, that is, transactions that access data at several sites.
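The distinction between local and global transactions can be sketched in a few lines of Python. This is only an illustration: the `Transaction` class, its fields, and the site names are hypothetical, and the sketch assumes that a transaction touching any site other than the one where it was submitted counts as global.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """Hypothetical description of a transaction (not from the slides)."""
    name: str
    origin_site: str           # site where the transaction was submitted
    sites_accessed: frozenset  # every site whose data the transaction touches


def classify(txn: Transaction) -> str:
    """Return 'local' if only the origin site's data is accessed, else 'global'."""
    if txn.sites_accessed == {txn.origin_site}:
        return "local"
    return "global"


t1 = Transaction("update_local_stock", "site_a", frozenset({"site_a"}))
t2 = Transaction("transfer_between_branches", "site_a", frozenset({"site_a", "site_b"}))
print(classify(t1), classify(t2))
```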

The computers in a distributed system communicate and exchange data with one another over various communication media, such as high-speed buses or telephone lines. These computers do not share main memory or a clock. Each of the computers in a distributed system participates in the execution of transactions. These computers are also called sites or nodes.

Advantages of DDBMS

Data sharing
Distributed control
Reliability and availability
Faster query processing

Data Sharing

Since data is distributed across multiple computers, a user working on one computer can work with data available on any other computer.

Distributed control

Unlike a centralized database system, in which a single database administrator controls the database, a distributed system divides the responsibility of control among local administrators for each computer. Local administrators thus have a certain amount of control over the data stored locally.

Reliability and availability

Even if one site in the distributed system fails, the remaining sites continue working. If data is replicated among multiple computers, the failure of any one computer does not shut down the entire database system.

Faster query processing

Queries involving data at several sites can be split into subqueries. These subqueries can then be executed in parallel by several sites. Such parallel computation allows faster processing of the user's query.
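As a rough sketch of this idea, the Python snippet below splits a "total salary" query into one subquery per site and combines the partial results. Everything here is assumed for illustration: the `SITES` data, the function names, and the use of threads to stand in for sites that would really be separate computers on a network.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each site holds some rows of an emp relation.
SITES = {
    "site_a": [{"empno": 1, "salary": 1000}, {"empno": 2, "salary": 1500}],
    "site_b": [{"empno": 3, "salary": 2000}],
    "site_c": [{"empno": 4, "salary": 1200}, {"empno": 5, "salary": 800}],
}


def subquery_total_salary(site: str) -> int:
    """Subquery run at one site: sum the salaries stored locally."""
    return sum(row["salary"] for row in SITES[site])


def global_total_salary() -> int:
    """Run one subquery per site in parallel, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
        partial_sums = list(pool.map(subquery_total_salary, SITES))
    return sum(partial_sums)


print(global_total_salary())  # 6500
```

Only the small partial sums travel back to the coordinating site, which is where the speed-up comes from in a real system.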

Disadvantages of DDBMS

Increased cost
More error prone
Increased overhead

Increased cost

A distributed database system is more difficult to implement than a centralized one. There is also the cost of physically linking the different sites.

More error prone

Since the sites that make up a distributed system operate in parallel, it is very difficult to ensure the correctness of data.

Increased overhead

Maintaining physical links between the sites and exchanging messages between computers is an additional overhead associated with a distributed system.

Design of Distributed Databases

Replication: A replica is a copy of a relation. Each replica is stored at a different site. The alternative to replication is to store only one copy of a relation, which is not recommended in distributed databases.

Data fragmentation: The partitioning of a relation into several fragments. Each fragment can be stored at a different site.

The design of a distributed database combines both of these concepts.

Data Fragmentation

Data fragmentation is the partitioning of a relation into several fragments. Each fragment can be stored at a different site.

Why do we need to fragment a relation? The reasons for fragmenting a relation are:

Use of partial data by applications: In general, applications work with views rather than entire relations. It may therefore be more appropriate to work with subsets of relations rather than with the entire data.

Increased efficiency: Data is stored close to the site where it is used most frequently, so retrieval is faster. Also, data that is not needed by local applications is not stored, so less data has to be searched.

Parallelism of transaction execution: A transaction can be divided into several subqueries that operate on fragments in parallel. This increases the degree of concurrency in the system, allowing transactions to execute efficiently.

Fragmentation can be done in three ways:

Horizontal fragmentation
Vertical fragmentation
Mixed fragmentation

Horizontal Fragmentation

Here the table is divided horizontally, that is, by rows: some of the tuples of the relation are placed on one computer and the rest are placed on other computers.

A horizontal fragment is produced by specifying a WHERE clause condition that performs a restriction on the tuples in the relation. It can also be defined using the Selection operation of the relational algebra. For example, we may define four horizontal fragments on the emp table, each with its own condition (for instance, deptno=10). These fragments can then be assigned to four different sites in the distributed database.
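A minimal Python sketch of horizontal fragmentation follows. It treats the emp relation as a list of rows and applies a selection predicate per fragment. The sample rows and the department numbers 20, 30, and 40 are assumptions made up for the illustration; only the deptno=10 condition appears in the text above.

```python
# Hypothetical emp relation; the rows are illustrative only.
EMP = [
    {"empno": 101, "ename": "Asha", "deptno": 10},
    {"empno": 102, "ename": "Ravi", "deptno": 20},
    {"empno": 103, "ename": "Meena", "deptno": 10},
    {"empno": 104, "ename": "John", "deptno": 30},
    {"empno": 105, "ename": "Sara", "deptno": 40},
]


def horizontal_fragment(relation, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One fragment per department; 20, 30 and 40 are assumed values.
DEPTNOS = (10, 20, 30, 40)
fragments = {
    d: horizontal_fragment(EMP, lambda row, d=d: row["deptno"] == d)
    for d in DEPTNOS
}


def reconstruct(frags):
    """Union of all fragments gives back the original relation."""
    rows = [row for frag in frags.values() for row in frag]
    return sorted(rows, key=lambda row: row["empno"])


print(fragments[10])
```

Because every row satisfies exactly one of the conditions, the union of the fragments reproduces the original relation without duplicates.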

Vertical Fragmentation

In vertical fragmentation, some of the columns (attributes) are stored on one computer while the rest are stored on others. This is useful because each site may not need all the attributes of a relation. A vertical fragment is defined using the Projection operation of the relational algebra.

For example, the emp table can be split into two vertical fragments. The first fragment can include empno, ename, design, and the deptno in which the employee works. The second fragment can include empno, the empno of the manager under whom the employee works, and salary. These two fragments are then stored at different locations.
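The two vertical fragments described above can be sketched in Python as projections of the same relation. The sample rows and the column name `mgr` (for the manager's empno) are assumptions made for the illustration. Both fragments keep empno, which is what allows the original rows to be rebuilt by a join.

```python
# Hypothetical emp relation; "mgr" holds the manager's empno (assumed name).
EMP = [
    {"empno": 101, "ename": "Asha", "design": "Clerk",
     "deptno": 10, "mgr": 103, "salary": 1200},
    {"empno": 103, "ename": "Meena", "design": "Manager",
     "deptno": 10, "mgr": None, "salary": 3000},
]


def project(relation, attributes):
    """Projection: keep only the named attributes of every row."""
    return [{a: row[a] for a in attributes} for row in relation]


# Each fragment would be stored at a different site.
frag1 = project(EMP, ["empno", "ename", "design", "deptno"])
frag2 = project(EMP, ["empno", "mgr", "salary"])


def join_on_empno(left, right):
    """Rebuild full rows by matching the shared key empno."""
    right_by_key = {row["empno"]: row for row in right}
    return [
        {**l, **right_by_key[l["empno"]]}
        for l in left
        if l["empno"] in right_by_key
    ]


print(join_on_empno(frag1, frag2) == EMP)  # True
```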

Mixed Fragmentation

Mixed or hybrid fragmentation consists of a horizontal fragment that is then vertically fragmented, or a vertical fragment that is then horizontally fragmented. For example, combining the horizontal and vertical fragmentation of the emp table results in mixed fragmentation.
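As a small illustration, the sketch below first takes a horizontal fragment of emp (the rows with deptno=10) and then splits that fragment vertically. The sample rows, the `mgr` column name, and the helper names are assumptions for the example.

```python
# Hypothetical emp relation; rows and the "mgr" column name are illustrative.
EMP = [
    {"empno": 101, "ename": "Asha", "design": "Clerk",
     "deptno": 10, "mgr": 103, "salary": 1200},
    {"empno": 102, "ename": "Ravi", "design": "Analyst",
     "deptno": 20, "mgr": 104, "salary": 2500},
    {"empno": 103, "ename": "Meena", "design": "Manager",
     "deptno": 10, "mgr": None, "salary": 3000},
]


def select(relation, predicate):
    """Selection: horizontal cut, keeps whole rows."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: vertical cut, keeps chosen attributes."""
    return [{a: row[a] for a in attributes} for row in relation]


# Step 1: horizontal fragment for department 10.
dept10 = select(EMP, lambda row: row["deptno"] == 10)

# Step 2: vertical fragments of that horizontal fragment.
dept10_profile = project(dept10, ["empno", "ename", "design", "deptno"])
dept10_pay = project(dept10, ["empno", "mgr", "salary"])

print(dept10_profile)
print(dept10_pay)
```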

Data Replication

Data replication is the process of storing copies of the same data at more than one site or node. This is necessary for improving the availability of data. With full replication, a copy of the whole database is stored at every site. With partial replication, some fragments of the database (important or frequently used ones) are replicated and others are not.

Advantages of Data Replication

Availability: If one of the sites containing the relation r fails, r can still be obtained from another site. Queries can thus continue to be processed in spite of failures.

Increased parallelism: The sites containing the relation r can process queries in parallel, which leads to faster query processing.

Disadvantages of Data Replication

Increased overhead on update: When an update is required, the database system must ensure that all replicas are updated. Otherwise, inconsistency will lead to erroneous computations.

More disk space: Storing replicas of the same data at different sites consumes more disk space.
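The trade-off between availability and update overhead can be sketched with a toy Python class. It assumes a simple "read from any one replica, write to all replicas" policy, which is only one possible approach and is not taken from the slides; the class name, method names, and site names are hypothetical.

```python
class ReplicatedRelation:
    """Toy model: one full copy of a relation per site (assumed policy:
    read from any available replica, write to all replicas)."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}  # site -> {key: value}
        self.up = set(sites)                          # sites currently working

    def fail(self, site):
        """Mark a site as failed."""
        self.up.discard(site)

    def read(self, key):
        """Availability: any one surviving replica can answer the read."""
        if not self.up:
            raise RuntimeError("no replica available")
        site = min(self.up)
        return self.replicas[site][key]

    def write(self, key, value):
        """Update overhead: every replica must be changed to stay consistent."""
        if self.up != set(self.replicas):
            raise RuntimeError("cannot update: a replica is unavailable")
        for copy in self.replicas.values():
            copy[key] = value
        return len(self.replicas)  # number of copies that had to be written


rel = ReplicatedRelation(["s1", "s2", "s3"])
print(rel.write("emp101_salary", 1200))  # 3 copies written for one update
rel.fail("s1")
print(rel.read("emp101_salary"))         # still readable from s2 or s3
```

In this sketch a single failed site does not block reads, but it does block writes, and each write costs one operation per replica.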
