ADB - Unit - I
UNIT: I
Introduction
Reference:
Principles of Distributed Database Systems - M. Tamer Ozsu, Patrick Valduriez
Prepared By:
B. Srinivas Reddy, HoD-IT, Vidya Jyothi Institute of Technology, Hyderabad
Agenda
• Introduction
• What is a distributed DBMS
• Background
• Distributed Data Processing
• Data Delivery Alternatives
• Promises of DDBSs
• Complications Introduced by Distribution
• Design Issues
• Distributed DBMS Architecture
File Systems
[Figure: Programs 1 to 3 each access their own file (Files 1 to 3) and each carries its own data description (Descriptions 1 to 3).]
Database Management
[Figure: Application programs 1 to 3 (with data semantics) share one database through a DBMS that handles data description, manipulation and control.]
Motivation
[Figure: Database technology (integration) combined with computer networks (distribution) gives distributed database systems.]
Integration
Integration ≠ Centralization
It is possible to achieve Integration without Centralization
Distributed Computing
A number of autonomous processing elements (not
necessarily homogeneous) that are interconnected by
a computer network and that cooperate in
performing their assigned tasks.
What is being distributed?
Processing logic – Processing elements
Function - Various functions of a computer system could be
delegated to various pieces of hardware or software
Data - Data used by a number of applications may be
distributed to a number of processing sites
Control - Execution of various tasks might be distributed
1. What is a Distributed Database System?
A distributed database (DDB) is a collection of multiple,
logically interrelated databases distributed over a
computer network.
Distributed DBMS Environment
[Figure: Sites 1 to 5 interconnected by a communication network.]
Implicit Assumptions
Data are stored at a number of sites; each site logically consists of a single processor.
Processors at different sites are interconnected by a computer network, not a multiprocessor system
Multiprocessor systems are the domain of parallel database systems
A distributed database is a database, not a collection of files; its data are logically related, as exhibited in the users’ access patterns
Relational data model
The D-DBMS is a full-fledged DBMS
Not a remote file system, not a transaction processing (TP) system
2. Data Delivery Alternatives (Dimensions)
Data are “delivered” from the sites where they are stored
to where the query is posed
Delivery modes
Pull-only - Transfer of data from servers to clients is initiated by a client pull
Push-only - Transfer of data from servers to clients is initiated by a server push
Hybrid - Combines the client-pull and server-push mechanisms
Frequency
Periodic - Data are sent from the server to clients at regular intervals
Conditional - Data are sent from servers whenever certain conditions installed by clients in their profiles are satisfied
Ad-hoc or irregular - Performed mostly in a pure pull-based system
Communication Methods
Unicast - Communication from a server to a client is one-to-one
One-to-many - Server sends data to a number of clients
Note: not all combinations make sense
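To make the delivery dimensions concrete, here is a minimal Python sketch of one server and three clients. The `Server` and `Client` classes and their method names are hypothetical: `pull` is an ad-hoc, client-initiated unicast transfer, `update` performs a conditional one-to-many push whenever a subscriber's condition holds, and `periodic_broadcast` stands in for a timer-driven periodic push. No networking is modeled.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Client:
    """A hypothetical client; `condition` plays the role of its profile."""
    name: str
    condition: Callable[[int], bool] = lambda value: True
    received: list = field(default_factory=list)


class Server:
    """A hypothetical server holding a single data value."""

    def __init__(self) -> None:
        self.value = 0
        self.subscribers: list = []

    def pull(self, client: Client) -> int:
        # Pull-only, ad-hoc, unicast: the client initiates the transfer.
        client.received.append(self.value)
        return self.value

    def update(self, new_value: int) -> None:
        # Push-only, conditional, one-to-many: the server initiates the
        # transfer to every subscriber whose condition is satisfied.
        self.value = new_value
        for client in self.subscribers:
            if client.condition(new_value):
                client.received.append(new_value)

    def periodic_broadcast(self) -> None:
        # Push-only, periodic, one-to-many: meant to be called at regular
        # intervals (e.g. by a timer, which is not modeled here).
        for client in self.subscribers:
            client.received.append(self.value)


server = Server()
alice = Client("alice")                           # accepts every push
bob = Client("bob", condition=lambda v: v > 100)  # only values above 100
server.subscribers = [alice, bob]
server.update(50)            # pushed to alice only
server.update(150)           # pushed to both subscribers
server.periodic_broadcast()  # both receive the current value again
carol = Client("carol")
server.pull(carol)           # carol is not subscribed; she pulls on demand
```

The same server already behaves as a simple hybrid: it pushes to subscribers while still answering ad-hoc pulls.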
3. Promises of Distributed DBMS (DDBS)
Transparent management of distributed,
fragmented and replicated data
Improved performance
[Figure: Example distribution of employee, project and assignment data across the Boston, New York, Montreal and Paris sites.]
Distributed Database - User View
[Figure: The user sees a single logical distributed database.]
Distributed DBMS - Reality
[Figure: At each site, users and application software submit queries to local DBMS software; the DBMS software at the different sites cooperates through a communication subsystem.]
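Transparent management means a query is written against the global relation while the DDBMS maps it onto fragments and replicas. The Python sketch below assumes a hypothetical relation PROJ(pno, budget, loc), horizontally fragmented by location, with one fragment replicated at two sites; the catalog layout, site contents and function name are illustrative only.

```python
# Hypothetical catalog: fragment of PROJ -> sites that hold a copy of it.
CATALOG = {
    "PROJ_boston": ["Boston"],
    "PROJ_newyork": ["New York", "Montreal"],  # replicated fragment
    "PROJ_paris": ["Paris"],
}

# Hypothetical local stores: site -> {fragment name -> rows (pno, budget, loc)}.
NY_ROWS = [("P2", 135000, "New York"), ("P3", 250000, "New York")]
SITES = {
    "Boston": {"PROJ_boston": [("P1", 150000, "Boston")]},
    "New York": {"PROJ_newyork": list(NY_ROWS)},
    "Montreal": {"PROJ_newyork": list(NY_ROWS)},
    "Paris": {"PROJ_paris": [("P4", 310000, "Paris")]},
}


def select_global(predicate):
    """Evaluate a selection on the global PROJ relation.

    The caller never names a site or a fragment: one copy of each fragment
    is read (replication transparency) and the partial results are unioned
    (fragmentation transparency).
    """
    result = []
    for fragment, sites in CATALOG.items():
        site = sites[0]  # any copy is acceptable for a read
        result.extend(row for row in SITES[site][fragment] if predicate(row))
    return sorted(result)


# "Projects with budget > 200000", written without any location information.
big = select_global(lambda row: row[1] > 200000)
```

Reading only one copy per fragment is what keeps the replicated New York rows from being counted twice.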
• Reliability: The ability of software and hardware to work
without failure
• Integrity: The correctness and consistency of the data within a system
A transaction is a basic unit of consistent and reliable computing,
consisting of a sequence of database operations executed as an
atomic action
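Atomicity can be observed on a single site with Python's standard `sqlite3` module: a connection used as a context manager commits when the block succeeds and rolls back when an exception escapes it. The `account` table and the `transfer` function are hypothetical examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic transaction."""
    with conn:  # commit on success, roll back if an exception escapes
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))


transfer(conn, "A", "B", 30)  # succeeds: A=70, B=80
try:
    # The credit to B runs first; the debit then violates the CHECK constraint.
    transfer(conn, "A", "B", 500)
except sqlite3.IntegrityError:
    pass

# The partial credit to B from the failed transfer has been rolled back.
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

Either both updates of a transfer take effect or neither does; a distributed DBMS must give the same guarantee when the updates run at different sites.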
Reliability Through Distributed Transactions
• Replicated components and data should make distributed DBMS
more reliable.
• Distributed transactions provide
➡ Concurrency transparency
➡ Failure atomicity
• Distributed transaction support requires implementation of
➡ Distributed concurrency control protocols
➡ Commit protocols
• Data replication
➡ Great for read-intensive workloads, problematic for updates
➡ Replication protocols
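The best-known commit protocol is two-phase commit (2PC). The sketch below is a minimal, failure-free simulation in Python: the coordinator collects votes in phase 1 and sends the global decision in phase 2. The `Participant` class and `two_phase_commit` function are hypothetical, and timeouts, messaging and recovery are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """A site taking part in a distributed transaction (hypothetical model)."""
    name: str
    can_commit: bool = True  # whether this site's local work succeeded
    state: str = "active"
    log: list = field(default_factory=list)

    def prepare(self) -> bool:
        # Phase 1: record ready/abort in the local log, then vote.
        if self.can_commit:
            self.log.append("ready")
            self.state = "ready"
            return True
        self.log.append("abort")
        self.state = "aborted"  # a "no" vote lets the site abort unilaterally
        return False

    def decide(self, decision: str) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.log.append(decision)
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(participants):
    coordinator_log = ["begin_commit"]
    votes = [p.prepare() for p in participants]  # phase 1: collect all votes
    decision = "commit" if all(votes) else "abort"  # unanimity needed to commit
    coordinator_log.append(decision)  # decision is logged before it is sent
    for p in participants:
        if p.state == "ready":  # sites that voted abort have already aborted
            p.decide(decision)
    return decision, coordinator_log


ok_sites = [Participant("Boston"), Participant("Paris")]
ok_decision, ok_log = two_phase_commit(ok_sites)

bad_sites = [Participant("Boston"), Participant("Paris", can_commit=False)]
bad_decision, _ = two_phase_commit(bad_sites)
```

A single "no" vote aborts the transaction everywhere, which is what gives distributed transactions failure atomicity.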
Potentially Improved Performance
Proximity of data to its points of use
Parallelism in execution
Inter-query parallelism
Intra-query parallelism
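Inter-query parallelism runs different queries at the same time; intra-query parallelism splits one query into subqueries over the fragments and runs them concurrently. The Python sketch below shows the intra-query case with `concurrent.futures`; the fragment contents and function names are hypothetical, and threads stand in for sites, so it illustrates the structure rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical horizontal fragments of one relation, one per site.
FRAGMENTS = {
    "site1": [3, 8, 15, 4],
    "site2": [42, 7],
    "site3": [16, 23, 1],
}


def local_count(rows, threshold):
    # Subquery executed at one site: count the qualifying tuples locally.
    return sum(1 for value in rows if value > threshold)


def parallel_count(fragments, threshold):
    # One query is split into per-fragment subqueries that run concurrently;
    # the partial counts are then combined into the final answer.
    with ThreadPoolExecutor(max_workers=len(fragments)) as pool:
        partials = pool.map(local_count, fragments.values(),
                            [threshold] * len(fragments))
        return sum(partials)


count = parallel_count(FRAGMENTS, 10)
```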
Parallelism Requirements
Have as much of the data required by each application
at the site where the application executes
Full replication
With updates, replicated copies raise two issues:
Mutual consistency
Freshness of copies
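Full replication makes reads local, but every update then raises the questions of mutual consistency and freshness. The Python sketch below contrasts eager replication (all copies updated in the write) with lazy primary-copy replication (other copies refreshed later); the `ReplicatedItem` class and its methods are hypothetical.

```python
class ReplicatedItem:
    """One data item fully replicated at several sites (hypothetical model)."""

    def __init__(self, sites, value=0):
        self.copies = {site: value for site in sites}
        self.pending = []  # updates not yet propagated (lazy mode only)

    def write_eager(self, value):
        # Eager replication: every copy is updated as part of the write, so
        # the copies stay mutually consistent and every read is fresh.
        for site in self.copies:
            self.copies[site] = value

    def write_lazy(self, primary, value):
        # Lazy primary-copy replication: only the primary copy changes now;
        # the update is queued for later propagation to the other copies.
        self.copies[primary] = value
        self.pending.append(value)

    def propagate(self):
        # Refresh step: apply the queued updates, in order, at every copy.
        for value in self.pending:
            for site in self.copies:
                self.copies[site] = value
        self.pending.clear()

    def read(self, site):
        return self.copies[site]

    def mutually_consistent(self):
        return len(set(self.copies.values())) == 1


item = ReplicatedItem(["Boston", "Paris"])
item.write_eager(10)
after_eager = (item.read("Paris"), item.mutually_consistent())
item.write_lazy("Boston", 20)
stale = (item.read("Paris"), item.mutually_consistent())  # Paris lags behind
item.propagate()
after_refresh = (item.read("Paris"), item.mutually_consistent())
```

Eager replication pays for consistency on every write; lazy replication makes writes cheap but lets reads at other sites return stale values until the refresh runs.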
Easier System Expansion
Issue is database scaling
Deadlock management
Concurrency control relies on two basic approaches: locking, which is based on the mutual exclusion of accesses to
data items, and timestamping, where transaction executions
are ordered based on timestamps
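With locking, transactions can deadlock. One common way to manage this is detection on a wait-for graph (WFG), where an edge Ti → Tj means Ti waits for a lock held by Tj. In a distributed DBMS a deadlock may appear only in the union of the local WFGs of the sites. The Python sketch below assumes simple centralized detection; `find_cycle`, `merge` and the sample graphs are hypothetical.

```python
def find_cycle(wait_for):
    """Return transactions forming a cycle in the wait-for graph, or None.

    `wait_for` maps each transaction to the set of transactions it waits for.
    Uses depth-first search: reaching a node still on the stack is a cycle.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(t):
        color[t] = GREY
        stack.append(t)
        for u in wait_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GREY:  # back edge: u is on the current path
                return stack[stack.index(u):]
            if c == WHITE:
                cycle = visit(u)
                if cycle:
                    return cycle
        stack.pop()
        color[t] = BLACK
        return None

    for t in list(wait_for):
        if color.get(t, WHITE) == WHITE:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


def merge(*local_graphs):
    """Union of the local wait-for graphs into one global wait-for graph."""
    global_graph = {}
    for graph in local_graphs:
        for t, waits in graph.items():
            global_graph.setdefault(t, set()).update(waits)
    return global_graph


site1_wfg = {"T1": {"T2"}}  # at site 1, T1 waits for T2
site2_wfg = {"T2": {"T1"}}  # at site 2, T2 waits for T1
global_cycle = find_cycle(merge(site1_wfg, site2_wfg))
```

Neither local graph contains a cycle, yet the global graph does, which is why distributed deadlock detection needs information from more than one site.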
Reliability of Distributed DBMS
How to make the system resilient to failures
Replication
Relationship Between Issues
[Figure: Relationships among directory management, query processing, distribution design, reliability, concurrency control and deadlock management.]
Related Issues
Operating System Support
Operating system with proper support for database
operations
Distinguish between general purpose processing
requirements and database processing requirements
Open Systems and Interoperability
Distributed Multidatabase Systems
More probable scenario
Parallel issues