
Advanced Databases

UNIT: I
Introduction
Reference:
Principles of Distributed Database Systems - M. Tamer Ozsu, Patrick Valduriez

Prepared By:
B.Srinivas Reddy, HoD- IT, Vidya Jyothi Institute of Technology, Hyderabad
Agenda
• Introduction
• What is a distributed DBMS
• Background
• Distributed Data Processing
• Data Delivery Alternatives
• Promises of DDBSs
• Complications Introduced by Distribution
• Design Issues
• Distributed DBMS Architecture
File Systems
[Figure: Program 1, Program 2, and Program 3 each keep their own data description and access their own file (File 1, File 2, File 3).]
Database Management
[Figure: application programs 1, 2, and 3 (each with data semantics) access a single database through a DBMS that provides description, manipulation, and control of the data.]
Motivation
[Figure: Database Technology (integration) and Computer Networks (distribution) combine into Distributed Database Systems.]
Integration ≠ Centralization
It is possible to achieve integration without centralization.
Distributed Computing
• A number of autonomous processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks.
• What is being distributed?
  - Processing logic or processing elements
  - Function - Various functions of a computer system could be delegated to various pieces of hardware or software
  - Data - Data used by a number of applications may be distributed to a number of processing sites
  - Control - The execution of various tasks might be distributed
1. What is a Distributed Database System?
A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.

A distributed database management system (D-DBMS) is the software system that permits the management of the distributed database and makes the distribution transparent to the users.
Distributed database system (DDBS) = DDB + D-DBMS
What is not a DDBS?
• A timesharing computer system
• A loosely or tightly coupled multiprocessor system
• A database system that resides at one of the nodes of a network of computers; this is a centralized database on a network node
Centralized DBMS on a Network
[Figure: Sites 1 to 5 connected by a communication network, with the database managed at only one of the sites.]
Distributed DBMS Environment
[Figure: Sites 1 to 5 connected by a communication network, with the data distributed over multiple sites.]
Implicit Assumptions
• Data stored at a number of sites ⇒ each site logically consists of a single processor
• Processors at different sites are interconnected by a computer network ⇒ not a multiprocessor system
  - Multiprocessor systems are the subject of parallel database systems
• Distributed database is a database, not a collection of files ⇒ data logically related as exhibited in the users' access patterns
  - Relational data model
• D-DBMS is a full-fledged DBMS
  - Not a remote file system, not a TP system
2. Data Delivery Alternatives (Dimensions)
Data are “delivered” from the sites where they are stored to where the query is posed.
• Delivery modes
  - Pull-only - Transfer of data from servers to clients is initiated by a client pull
  - Push-only - Transfer of data from servers to clients is initiated by a server push
  - Hybrid - Combines the client-pull and server-push mechanisms
• Frequency
  - Periodic - Data are sent from the server to clients at regular intervals
  - Conditional - Data are sent from servers whenever certain conditions are satisfied
  - Ad-hoc or irregular - Performed mostly in a pure pull-based system
• Communication methods
  - Unicast - Communication from a server to a client is one-to-one
  - One-to-many - Server sends data to a number of clients
• Note: not all combinations make sense
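The delivery dimensions above can be illustrated with a toy, single-process sketch. The `Server` and `Client` classes and their method names are hypothetical, not part of any real system: `pull` models ad-hoc client pull, `subscribe`/`update` model conditional push to each matching client (unicast), and `broadcast_snapshot` models a periodic one-to-many push that a timer would normally trigger.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    name: str
    received: list = field(default_factory=list)  # data pushed to this client


class Server:
    """Hypothetical data server illustrating pull, conditional push, and periodic push."""

    def __init__(self):
        self.data = {}
        self.subscriptions = []  # (client, condition) pairs registered by clients

    def pull(self, key):
        # Pull-only, ad-hoc: the client initiates the transfer
        return self.data.get(key)

    def subscribe(self, client, condition):
        # Conditional push: the client registers a condition with the server
        self.subscriptions.append((client, condition))

    def update(self, key, value):
        self.data[key] = value
        for client, condition in self.subscriptions:
            if condition(key, value):
                # Server-initiated push, sent one-to-one (unicast) to each matching client
                client.received.append((key, value))

    def broadcast_snapshot(self, clients):
        # Periodic push, one-to-many: the same snapshot goes to every client
        snapshot = dict(self.data)
        for client in clients:
            client.received.append(("snapshot", snapshot))


server = Server()
alice, bob = Client("alice"), Client("bob")
server.subscribe(alice, lambda k, v: k == "price" and v > 100)
server.update("price", 90)    # condition not satisfied: nothing is pushed
server.update("price", 120)   # condition satisfied: pushed to alice
bob_value = server.pull("price")          # bob pulls on demand
server.broadcast_snapshot([alice, bob])   # in practice called at regular intervals
```

A hybrid system would simply combine `pull` with one of the push paths for the same data.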
3. Promises of Distributed DBMS (DDBS)
• Transparent management of distributed, fragmented, and replicated data
• Improved reliability/availability through distributed transactions
• Improved performance
• Easier and more economical system expansion
Transparency
Transparency is the separation of the higher level
semantics of a system from the lower level
implementation issues.
Types of Transparency
• Data independence is a fundamental form of transparency that we look for within a distributed environment
  - Logical data independence - Immunity of user applications to changes in the logical structure of the database
  - Physical data independence - Hiding the details of the storage structure from user applications
• Network (distribution) transparency - Hiding the existence of the network
• Replication transparency - Refers only to the existence of replicas, not to their actual location; users behave as if there were a single copy of the data
• Fragmentation transparency - Hiding the fact that relations are divided into fragments
  - Horizontal fragmentation: Subset of the tuples
  - Vertical fragmentation: Subset of the attributes
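Horizontal and vertical fragmentation, and how the original relation is rebuilt from its fragments, can be sketched in a few lines. The EMP rows and the `CITY` attribute are hypothetical illustration data. Horizontal fragments are reconstructed by union; vertical fragments each keep the key `ENO` so they can be rejoined on it.

```python
# Hypothetical EMP relation; CITY is an illustrative attribute used for fragmentation
EMP = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng.", "CITY": "Paris"},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal.", "CITY": "Boston"},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng.", "CITY": "Paris"},
]


def horizontal_fragment(relation, predicate):
    """Subset of the tuples: keep the rows that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def vertical_fragment(relation, key, attrs):
    """Subset of the attributes: project onto key + attrs (key kept for reconstruction)."""
    return [{a: t[a] for a in (key, *attrs)} for t in relation]


def reconstruct_vertical(frag1, frag2, key):
    """Rebuild the relation by joining two vertical fragments on the key."""
    by_key = {t[key]: t for t in frag2}
    return [{**t, **by_key[t[key]]} for t in frag1 if t[key] in by_key]


# Horizontal: the two predicates together cover every tuple, so the union is complete
paris_emp = horizontal_fragment(EMP, lambda t: t["CITY"] == "Paris")
other_emp = horizontal_fragment(EMP, lambda t: t["CITY"] != "Paris")
rebuilt_h = sorted(paris_emp + other_emp, key=lambda t: t["ENO"])

# Vertical: names in one fragment, titles and cities in another
names = vertical_fragment(EMP, "ENO", ("ENAME",))
details = vertical_fragment(EMP, "ENO", ("TITLE", "CITY"))
rebuilt_v = reconstruct_vertical(names, details, "ENO")
```

With fragmentation transparency, a user querying EMP never sees `paris_emp`, `names`, or the reconstruction step; the D-DBMS performs it.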
Example
Transparent Access
    SELECT ENAME, SAL
    FROM   EMP, ASG, PAY
    WHERE  DUR > 12
    AND    EMP.ENO = ASG.ENO
    AND    PAY.TITLE = EMP.TITLE
[Figure: sites Tokyo, Boston, Paris, Montreal, and New York connected by a communication network, with data stored as follows:
• Paris: Paris projects, Paris employees, Paris assignments, Boston employees
• Boston: Boston projects, Boston employees, Boston assignments
• Montreal: Montreal projects, Paris projects, New York projects with budget > 200000, Montreal employees, Montreal assignments
• New York: Boston projects, New York employees, New York projects, New York assignments]
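The point of the example is that the user's query names only the global relations EMP, ASG, and PAY. The sketch below uses Python's standard `sqlite3` module, with one in-memory database standing in for each site; the site contents, the simplified schema, and the helper names `make_site` and `run_global` are hypothetical. It uses the naive strategy of copying every fragment to one place before running the unchanged query, which a real D-DBMS optimizer would avoid.

```python
import sqlite3

# Simplified global schema (hypothetical column sets)
SCHEMA = {
    "EMP": "ENO TEXT, ENAME TEXT, TITLE TEXT",
    "ASG": "ENO TEXT, PNO TEXT, DUR INTEGER",
    "PAY": "TITLE TEXT, SAL INTEGER",
}


def placeholders(rel):
    return ", ".join(["?"] * len(SCHEMA[rel].split(",")))


def make_site(fragments):
    """An in-memory database standing in for one site; fragments maps relation -> rows."""
    conn = sqlite3.connect(":memory:")
    for rel, rows in fragments.items():
        conn.execute(f"CREATE TABLE {rel} ({SCHEMA[rel]})")
        conn.executemany(f"INSERT INTO {rel} VALUES ({placeholders(rel)})", rows)
    return conn


def run_global(sql, sites):
    """Naive transparent execution: gather every fragment, then run the user's query as-is."""
    scratch = sqlite3.connect(":memory:")
    for rel, cols in SCHEMA.items():
        scratch.execute(f"CREATE TABLE {rel} ({cols})")
        for site in sites.values():
            has_fragment = site.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (rel,)
            ).fetchone()
            if has_fragment:
                rows = site.execute(f"SELECT * FROM {rel}").fetchall()
                scratch.executemany(f"INSERT INTO {rel} VALUES ({placeholders(rel)})", rows)
    return scratch.execute(sql).fetchall()


sites = {
    "Paris": make_site({"EMP": [("E1", "J. Doe", "Elect. Eng.")],
                        "ASG": [("E1", "P1", 24), ("E3", "P3", 30)]}),
    "Boston": make_site({"EMP": [("E2", "M. Smith", "Syst. Anal."), ("E3", "A. Lee", "Mech. Eng.")],
                         "ASG": [("E2", "P2", 6)]}),
    "Montreal": make_site({"PAY": [("Elect. Eng.", 40000), ("Syst. Anal.", 34000),
                                   ("Mech. Eng.", 27000)]}),
}

query = """SELECT ENAME, SAL FROM EMP, ASG, PAY
           WHERE DUR > 12 AND EMP.ENO = ASG.ENO AND PAY.TITLE = EMP.TITLE"""
result = sorted(run_global(query, sites))
```

Note that E3's employee tuple (Boston) and assignment tuple (Paris) live at different sites, yet the query text is unaffected.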
Distributed Database - User View
[Figure: users see a single, integrated distributed database.]
Distributed DBMS - Reality
[Figure: several sites, each running DBMS software that serves user queries and user applications, interconnected through a communication subsystem.]
• Reliability: The ability of software and hardware to work without failure
• Integrity: The correctness of the data within a system
• A transaction is a basic unit of consistent and reliable computing, consisting of a sequence of database operations executed as an atomic action
Reliability Through Distributed Transactions
• Replicated components and data should make a distributed DBMS more reliable.
• Distributed transactions provide
➡ Concurrency transparency
➡ Failure atomicity
• Distributed transaction support requires implementation of
➡ Distributed concurrency control protocols
➡ Commit protocols
• Data replication
➡ Great for read-intensive workloads, problematic for updates
➡ Replication protocols
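A commit protocol is what gives a distributed transaction failure atomicity: either every site commits or none does. Below is a minimal sketch of the two-phase commit (2PC) idea, assuming reliable messaging and no crashes; logging, timeouts, and recovery, which a real protocol needs, are omitted. The `Participant` class and its `can_commit` flag are hypothetical stand-ins for whether a site's local work succeeded.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "vote-commit"
    ABORT = "vote-abort"


class Participant:
    def __init__(self, name, can_commit):
        self.name = name
        self.can_commit = can_commit  # hypothetical: did the local work succeed?
        self.state = "INITIAL"

    def prepare(self):
        # Phase 1: vote; a COMMIT vote means the site is READY and must await the decision
        if self.can_commit:
            self.state = "READY"
            return Vote.COMMIT
        self.state = "ABORT"  # a site may abort unilaterally before voting commit
        return Vote.ABORT

    def finish(self, decision):
        # Phase 2: apply the coordinator's global decision
        self.state = decision


def two_phase_commit(participants):
    """Coordinator: global commit only if every participant votes COMMIT."""
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(v is Vote.COMMIT for v in votes) else "ABORT"
    for p in participants:
        p.finish(decision)
    return decision


ok = two_phase_commit([Participant("Paris", True), Participant("Boston", True)])
sites = [Participant("Paris", True), Participant("Boston", False)]
failed = two_phase_commit(sites)
```

A single ABORT vote forces a global abort, so the Paris participant in `sites` also ends in the ABORT state.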
Potentially Improved Performance
• Proximity of data to its points of use
  - Requires some support for fragmentation and replication
• Parallelism in execution
  - Inter-query parallelism: different queries execute in parallel
  - Intra-query parallelism: parts of a single query execute in parallel
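Intra-query parallelism can be sketched by running the same selection on each horizontal fragment at once and unioning the partial results. Threads from the standard `concurrent.futures` module stand in for separate sites here; the fragment contents and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of ASG(ENO, PNO, DUR), one per site
FRAGMENTS = {
    "Paris": [("E1", "P1", 24), ("E3", "P3", 10)],
    "Boston": [("E2", "P2", 6), ("E4", "P2", 18)],
    "New York": [("E5", "P4", 36)],
}


def local_select(rows, min_dur):
    """Subquery executed at one site: DUR > min_dur on the local fragment."""
    return [r for r in rows if r[2] > min_dur]


def intra_query_parallel(fragments, min_dur):
    """One subquery per fragment, run concurrently; partial results are unioned."""
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = pool.map(local_select, fragments.values(), [min_dur] * len(fragments))
        return sorted(row for part in partials for row in part)


long_assignments = intra_query_parallel(FRAGMENTS, 12)
```

Inter-query parallelism would instead submit several independent queries to the pool at the same time.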
Parallelism Requirements
• Have as much of the data required by each application at the site where the application executes
  - Full replication
• How about updates?
  - Mutual consistency
  - Freshness of copies
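The update problem with full replication can be made concrete with version numbers. This is a minimal sketch contrasting two common strategies, eager read-one-write-all (ROWA), which keeps copies mutually consistent, and lazy primary-copy update, which leaves secondary copies stale until a later refresh. The `ReplicatedItem` class and site names are hypothetical, and failures and concurrent writers are ignored.

```python
class ReplicatedItem:
    """One data item fully replicated at several (hypothetical) sites."""

    def __init__(self, sites, value):
        self.copies = {s: {"value": value, "version": 0} for s in sites}

    def read(self, site):
        # Read-one: any single local copy answers the read
        return self.copies[site]["value"]

    def write_all(self, value):
        # Eager ROWA: every copy is updated before the write completes
        version = max(c["version"] for c in self.copies.values()) + 1
        for c in self.copies.values():
            c.update(value=value, version=version)

    def write_primary(self, primary, value):
        # Lazy: only the primary copy changes now; other copies become stale
        c = self.copies[primary]
        c.update(value=value, version=c["version"] + 1)

    def propagate(self, primary):
        # Later refresh of all copies from the primary restores freshness
        src = self.copies[primary]
        for c in self.copies.values():
            c.update(value=src["value"], version=src["version"])

    def is_consistent(self):
        return len({c["version"] for c in self.copies.values()}) == 1


item = ReplicatedItem(["Paris", "Boston", "Montreal"], 100)
item.write_all(150)
after_eager = (item.read("Boston"), item.is_consistent())
item.write_primary("Paris", 200)
after_lazy = (item.read("Boston"), item.is_consistent())   # Boston copy is stale
item.propagate("Paris")
after_refresh = (item.read("Boston"), item.is_consistent())
```

The trade-off shown here mirrors the slide: eager updates pay for consistency on every write, while lazy updates make reads fast but possibly out of date.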
Easier System Expansion
• Issue is database scaling
• Emergence of microprocessor and workstation technologies
• Client-server model of computing
• Data communication cost vs telecommunication cost

4. Distributed DBMS Issues (In building a DDBMS)
• Distributed database design
  - How to distribute the database
  - Two alternatives: partitioned (or non-replicated) and replicated
  - Replicated databases are either fully replicated or partially replicated
  - A related problem is directory management
• Distributed directory management
  - A directory contains information (such as descriptions and locations) about data items in the database
• Distributed query processing
  - Convert user transactions to data manipulation instructions
  - Factors to be considered are the distribution of data, communication costs, and the lack of sufficient locally available information
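Communication cost is one of the factors named above. The sketch below chooses where to execute a join by counting only the bytes shipped over the network, under a deliberately simplified, hypothetical cost model: both operands move to the join site, the result moves to the querying site, and local processing cost is ignored. All sizes and site placements are made-up illustration values.

```python
# Hypothetical statistics: relation sizes (bytes) and the site storing each relation
SIZES = {"EMP": 400_000, "ASG": 1_000_000}
HOME = {"EMP": "Paris", "ASG": "Boston"}
RESULT_SIZE = 300_000  # hypothetical size of the join result EMP ⋈ ASG


def transfer_cost(join_site, result_site):
    """Bytes shipped if the join runs at join_site and the result goes to result_site."""
    shipped = sum(size for rel, size in SIZES.items() if HOME[rel] != join_site)
    if join_site != result_site:
        shipped += RESULT_SIZE
    return shipped


def cheapest_join_site(candidates, result_site):
    return min(candidates, key=lambda site: transfer_cost(site, result_site))


# Query posed at Tokyo:
#   join at Paris  -> ship ASG + result = 1,300,000 bytes
#   join at Boston -> ship EMP + result =   700,000 bytes
#   join at Tokyo  -> ship EMP + ASG    = 1,400,000 bytes
best = cheapest_join_site(["Paris", "Boston", "Tokyo"], "Tokyo")
```

Shipping the smaller relation to the larger one wins here; real optimizers also weigh techniques such as semijoins and local processing costs.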
Distributed DBMS Issues
• Distributed concurrency control
  - Synchronization of concurrent accesses
  - Consistency and isolation of transactions' effects
  - Deadlock management
  - Two fundamental primitives: locking, which is based on the mutual exclusion of accesses to data items, and timestamping, where the transaction executions are ordered based on timestamps
• Reliability of distributed DBMS
  - How to make the system resilient to failures
  - Atomicity and durability
  - Replication
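Timestamping orders transaction executions by timestamp. A minimal sketch of the basic timestamp ordering (TO) rule for individual data items follows, assuming integer timestamps starting at 1; the class name is hypothetical, and a rejected transaction would be restarted with a new timestamp (not shown).

```python
class TimestampOrderingScheduler:
    """Basic timestamp ordering: conflicting operations must follow timestamp order."""

    def __init__(self):
        self.read_ts = {}   # item -> largest timestamp of a transaction that read it
        self.write_ts = {}  # item -> largest timestamp of a transaction that wrote it

    def read(self, ts, item):
        # Reject if a younger transaction has already written the item
        if ts < self.write_ts.get(item, 0):
            return "reject"
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)
        return "accept"

    def write(self, ts, item):
        # Reject if a younger transaction has already read or written the item
        if ts < self.read_ts.get(item, 0) or ts < self.write_ts.get(item, 0):
            return "reject"
        self.write_ts[item] = ts
        return "accept"


s = TimestampOrderingScheduler()
outcomes = [
    s.read(2, "x"),   # T2 reads x
    s.write(1, "x"),  # T1 is older than reader T2, so its write is rejected
    s.write(3, "x"),  # T3 writes x
    s.read(2, "x"),   # T2 is older than writer T3, so its read is rejected
]
```

Unlike locking, this scheduler never makes a transaction wait, so deadlocks cannot occur; the price is restarts of rejected transactions.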
Relationship Between Issues
[Figure: Directory Management, Query Processing, Distribution Design, Reliability, Concurrency Control, and Deadlock Management drawn as interrelated issues.]
Related Issues
• Operating system support
  - Operating system with proper support for database operations
  - Distinguish between general-purpose processing requirements and database processing requirements
• Open systems and interoperability
  - Distributed multidatabase systems
  - More probable scenario
• Parallel issues
