
Reference Book

Principles of Distributed Database Systems

Chapters

4. Distributed DBMS Architecture


5. Distributed Database Design
7.5 Layers of Query Processing
Preethi Vishwanath

Week 2 : 5th September 2006 – 12th September 2006


ANSI/SPARC Architecture
– External view: that of the user, who might be a programmer; basically concerned with how users view the data.
– Conceptual view: that of the enterprise.
– Internal view: that of a system or a machine; deals with the physical definition and organization of data.
[Figure: levels labeled Users, External View, Conceptual View, Internal View]
Possible ways to put together multiple databases
Autonomy of Local Systems
– Refers to the distribution of control
– Indicates the degree of independence of individual databases
Alternatives to autonomy
– Tight integration
  A single image of the entire database is available to any user who wants to share the information, which may reside in multiple databases.
– Semiautonomous systems
  Consist of DBMSs that can operate independently, but have decided to participate in a federation.
– Total isolation
  Stand-alone DBMSs
Distribution
– Deals with the physical distribution of data over multiple sites
– Three alternative architectures are available
  Client-server: communication duties are shared between the client machines and servers.
  Peer-to-peer systems: no distinction between client machines and servers.
  Non-distributed systems
Heterogeneity
– Occurs in various forms
– Data models: representing data with different modeling tools
– Query languages: not only involves the use of completely different data access paradigms in different data models, but also covers differences in languages, even when the individual systems use the same data model.
Client-Server architecture
– Distinguish the functionality and divide these functions into two classes: server functions and client functions.
– Server does most of the data management work
  Query processing
  Data management
  Optimization
  Transaction management, etc.
– Client performs
  Application
  User interface
  DBMS client module
Multiple Client – Single Server
– Single server accessed by multiple clients
Multiple Client – Multiple Server
– Multiple servers accessed by multiple clients
– 2 alternative management strategies
  1. Heavy client systems
     – Each client manages its own connection to the appropriate server.
     – Simplifies server code
     – Loads client machines with additional responsibilities
  2. Light client systems
     – Each client knows of only its "home server", which then communicates with other servers as required.
     – Concentrates data management functionality at the servers.
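The two multiple-client/multiple-server strategies differ in where the logic that finds the right server lives. The following is a minimal Python sketch under stated assumptions: `Server`, `HeavyClient` and `LightClient` are hypothetical classes, each server's database is just an in-memory dict, and a missing key is reported as `None`. It illustrates the routing idea only, not a real client/server protocol.

```python
class Server:
    """Hypothetical server holding part of the data as an in-memory dict."""

    def __init__(self, name, data):
        self.name = name
        self.data = data    # key -> value stored at this server
        self.peers = []     # other servers this server may contact

    def get_local(self, key):
        # Look only at this server's own data.
        return self.data.get(key)

    def get(self, key):
        # Home-server behaviour: answer locally, else ask the peer servers.
        if key in self.data:
            return self.data[key]
        for peer in self.peers:
            value = peer.get_local(key)
            if value is not None:
                return value
        return None


class HeavyClient:
    """Manages its own connections and finds the appropriate server itself."""

    def __init__(self, servers):
        self.servers = servers

    def get(self, key):
        for server in self.servers:
            value = server.get_local(key)
            if value is not None:
                return value
        return None


class LightClient:
    """Knows only its home server, which talks to other servers as required."""

    def __init__(self, home):
        self.home = home

    def get(self, key):
        return self.home.get(key)


s1 = Server("S1", {100: "A", 200: "B"})
s2 = Server("S2", {300: "C"})
s1.peers, s2.peers = [s2], [s1]

print(HeavyClient([s1, s2]).get(300))  # C
print(LightClient(s1).get(300))        # C
```

In the heavy-client version the search over servers runs in the client; in the light-client version the same search moves into `Server.get` on the home server, which keeps the client thin and concentrates the work at the servers.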
Peer-to-Peer Distributed Systems
Schemas present
– Individual internal schema definition at each site: the local internal schema.
– The enterprise view of data is described by the global conceptual schema.
– The local organization of data at each site is described in the local conceptual schema.
– User applications and user access to the database are supported by external schemas.
Local conceptual schemas are mappings of the global schema onto each site.
Databases are typically designed in a top-down fashion, and therefore all external view definitions are made globally.
Major Components of a Peer-to-Peer System
– User processor
– Data processor
Peer-to-Peer Distributed Systems
User Processor
– User-interface handler: responsible for interpreting user commands and formatting the result data.
– Semantic data controller: checks if the user query can be processed.
– Global query optimizer and decomposer: determines an execution strategy; translates global queries into local ones.
– Distributed execution monitor: coordinates the distributed execution of the user request.
Data Processor
– Local query optimizer: acts as the access path selector; responsible for choosing the best access path.
– Local recovery manager: makes sure the local database remains consistent.
– Run-time support processor: the interface to the operating system; contains the database buffer and is responsible for maintaining the main memory buffers and managing the data access.
MDBS Architecture
Models using a global conceptual schema
– The GCS is defined by integrating either the external schemas of local autonomous databases or parts of their local conceptual schemas.
– Users of a local DBMS define their own views on the local database.
– If heterogeneity exists in the system, two implementation alternatives exist: unilingual and multilingual.
  Unilingual requires the users to utilize possibly different data models and languages when accessing a local database and the global database.
  The basic philosophy of the multilingual architecture is to permit each user to access the global database.
Models without a global conceptual schema
– Consist of two layers: the local system layer and the multidatabase layer.
– The local system layer presents to the multidatabase layer the part of its local database that it is willing to share with users of other databases.
– System views are constructed above this layer.
– Responsibility for providing access to multiple databases is delegated to the mapping between the external schemas and the local conceptual schemas.
– Full-fledged DBMSs exist, each of which manages a different database.
GCS in multi-DBMS
– Mapping is from the local conceptual schemas to a global schema
– Bottom-up design
GCS in logically integrated distributed DBMS
– Mapping is from the global schema to the local conceptual schemas
– Top-down procedure
Global Directory Issues
The global directory is an extension of the normal directory that includes information about the location of the fragments as well as the makeup of the fragments. It is needed for a distributed DBMS or a multi-DBMS that uses a global conceptual schema.


– Relevant for a distributed DBMS or a multi-DBMS that uses a global conceptual schema
– Includes information about the location of the fragments as well as the makeup of the fragments
– The directory is itself a database that contains meta-data about the actual data stored in the database
– Three issues
  1. A directory may either be global to the entire database or local to each site.
  2. The directory may be maintained centrally at one site, or in a distributed fashion by distributing it over a number of sites.
     – If the system is distributed, the directory is always distributed
  3. Replication: there may be a single copy of the directory or multiple copies.
     – Multiple copies would provide more reliability
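Since the directory is itself a database of meta-data, a global, replicated directory can be sketched as a few small tables. This Python sketch uses hypothetical names throughout (`FragmentEntry`, `ReplicatedDirectory`, sites `S1`/`S2`, fragments `EMP1`/`EMP2`); it models only the replication issue (identical copies, any reachable copy can answer) and ignores how updates reach the copies over a network.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FragmentEntry:
    """Directory meta-data for one fragment (hypothetical layout)."""
    relation: str              # relation the fragment belongs to
    predicate: str             # makeup of the fragment, e.g. "Sal <= 20K"
    sites: tuple[str, ...]     # location(s) where the fragment is stored


@dataclass
class DirectoryCopy:
    """One copy of the global directory, stored at a given site."""
    site: str
    entries: dict[str, FragmentEntry] = field(default_factory=dict)


class ReplicatedDirectory:
    """Global directory replicated at several sites for reliability."""

    def __init__(self, sites):
        self.replicas = {site: DirectoryCopy(site) for site in sites}
        self.up = set(sites)   # sites whose directory copy is reachable

    def register(self, name, entry):
        # Every update is applied to all copies so they stay identical.
        for replica in self.replicas.values():
            replica.entries[name] = entry

    def fail(self, site):
        # Simulate losing the directory copy at one site.
        self.up.discard(site)

    def lookup(self, name):
        # Any reachable copy can answer; losing one copy is not fatal.
        for site, replica in self.replicas.items():
            if site in self.up:
                return replica.entries.get(name)
        raise RuntimeError("no directory copy is reachable")


directory = ReplicatedDirectory(["S1", "S2"])
directory.register("EMP1", FragmentEntry("EMP", "Sal <= 20K", ("S1",)))
directory.register("EMP2", FragmentEntry("EMP", "Sal > 20K", ("S2",)))
directory.fail("S1")
print(directory.lookup("EMP2").sites)  # ('S2',)
```

With a single copy, the `fail` call above would leave the directory unavailable; with two copies, lookups still succeed, which is the reliability argument for replication.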
Organization of Distributed Systems
Three orthogonal dimensions
– Level of sharing
  No sharing: each application and its data execute at one site.
  Data sharing: all the programs are replicated at other sites, but the data are not.
  Data-plus-program sharing: both data and programs can be shared.
– Behavior of access patterns
  Static
  – Does not change over time
  – Much easier to manage
  Dynamic
  – Most real-life applications are dynamic
– Level of knowledge on access pattern behavior
  No information
  Complete information
  – Access patterns can be reasonably predicted
  – No deviations from predictions
  Partial information
  – Deviations from predictions
Top-Down Design
– Suitable for applications where the database needs to be built from scratch
– Activity begins with requirements analysis
– The requirements document is input to two parallel activities:
  View design: deals with defining the interfaces for end users
  Conceptual design: the process by which the enterprise is examined
  – Can be further divided into 2 related activity groups
    Entity analysis: concerned with determining the entities, their attributes, and the relationships between them
    Functional analysis: concerned with determining the fundamental functions with which the modeled enterprise is involved
– The distributed design activity consists of two steps
  – Fragmentation
  – Allocation

Bottom-Up Approach
– Suitable for applications where the database already exists
– The starting point is the individual conceptual schemas
– Exists primarily in the context of heterogeneous databases
Fragmentation
Advantages
1. Permits a number of transactions to execute concurrently.
2. Results in parallel execution of a single query.
3. Increases the level of concurrency, also referred to as intra-query concurrency.
4. Increases system throughput.
Disadvantages
1. Applications whose views are defined on more than one fragment may suffer performance degradation if the applications have conflicting requirements.
2. Simple tasks like checking for dependencies would result in chasing after data in a number of sites.
Emp
Id   Name  Sal  Dept
100  A     10K  D1
200  B     20K  D2
300  C     30K  D3

Horizontal Fragmentation (rows split: Sal > 20K)
Id   Name  Sal  Dept
100  A     10K  D1
200  B     20K  D2

Id   Name  Sal  Dept
300  C     30K  D3

Vertical Fragmentation (columns split: primary key retained)
Id   Name
100  A
200  B
300  C

Id   Sal  Dept
100  10K  D1
200  20K  D2
300  30K  D3
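The two fragmentations in the Emp example can be reproduced in a few lines. This is a Python sketch, assuming rows are dicts, `Sal` is stored in thousands (10 for 10K), and `horizontal_fragments` / `vertical_fragments` are hypothetical helper names: the horizontal split keeps whole rows, and the vertical split projects attribute groups while retaining the primary key `Id` in each fragment.

```python
# Emp relation from the slide; Sal is stored in thousands (10 means 10K).
EMP = [
    {"Id": 100, "Name": "A", "Sal": 10, "Dept": "D1"},
    {"Id": 200, "Name": "B", "Sal": 20, "Dept": "D2"},
    {"Id": 300, "Name": "C", "Sal": 30, "Dept": "D3"},
]


def horizontal_fragments(relation, predicate):
    """Split whole rows: (rows where predicate holds, rows where it does not)."""
    matching = [row for row in relation if predicate(row)]
    others = [row for row in relation if not predicate(row)]
    return matching, others


def vertical_fragments(relation, key, attribute_groups):
    """Split columns: one projection per group, each retaining the primary key."""
    return [
        [{attr: row[attr] for attr in (key, *group)} for row in relation]
        for group in attribute_groups
    ]


high_paid, low_paid = horizontal_fragments(EMP, lambda row: row["Sal"] > 20)
names, pay = vertical_fragments(EMP, "Id", [("Name",), ("Sal", "Dept")])

print([row["Id"] for row in low_paid])   # [100, 200]
print([row["Id"] for row in high_paid])  # [300]
print(pay[0])                            # {'Id': 100, 'Sal': 10, 'Dept': 'D1'}
```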
Correctness rules of fragmentation
Completeness
If a relation instance R is decomposed into fragments R1, R2, …, Rn, each data item that can be found in R can also be found in one or more of the Ri's.

Reconstruction
If a relation R is decomposed into fragments $F_R = \{R_1, R_2, \dots, R_n\}$, it should be possible to define a relational operator $\nabla$ such that
$R = \nabla R_i, \quad \forall R_i \in F_R$
Please note that the operator differs for the different forms of fragmentation.

Disjointness
If a relation R is horizontally decomposed into fragments R1, R2, …, Rn, and data item $d_i$ is in $R_j$, it is not in any other fragment $R_k$ ($k \neq j$).
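For the Emp example, the three rules can be checked mechanically. This is a Python sketch with hypothetical helper names, assuming rows are dicts with `Sal` in thousands. Union serves as the reconstruction operator for horizontal fragments, while a join on the retained key `Id` serves for vertical fragments, which illustrates why the operator differs by form of fragmentation. `is_complete` and `is_disjoint`, as written, apply to horizontal fragments only.

```python
# Emp relation (Sal in thousands), its horizontal and vertical fragments.
EMP = [
    {"Id": 100, "Name": "A", "Sal": 10, "Dept": "D1"},
    {"Id": 200, "Name": "B", "Sal": 20, "Dept": "D2"},
    {"Id": 300, "Name": "C", "Sal": 30, "Dept": "D3"},
]
EMP1 = [row for row in EMP if row["Sal"] <= 20]
EMP2 = [row for row in EMP if row["Sal"] > 20]
VERT = [
    [{"Id": r["Id"], "Name": r["Name"]} for r in EMP],
    [{"Id": r["Id"], "Sal": r["Sal"], "Dept": r["Dept"]} for r in EMP],
]


def as_set(relation):
    """Turn a list of dict rows into a set of hashable tuples."""
    return {tuple(sorted(row.items())) for row in relation}


def is_complete(relation, fragments):
    # Every tuple of R appears in at least one fragment.
    covered = set().union(*(as_set(f) for f in fragments))
    return as_set(relation) <= covered


def reconstruct_horizontal(fragments):
    # Horizontal fragments: the reconstruction operator is union.
    return set().union(*(as_set(f) for f in fragments))


def reconstruct_vertical(fragments, key):
    # Vertical fragments: join on the retained key
    # (assumes every fragment contains every key, as in this example).
    rows = {}
    for fragment in fragments:
        for row in fragment:
            rows.setdefault(row[key], {}).update(row)
    return as_set(rows.values())


def is_disjoint(fragments):
    # No tuple may appear in two different horizontal fragments.
    sets = [as_set(f) for f in fragments]
    return all(
        not (sets[i] & sets[j])
        for i in range(len(sets))
        for j in range(i + 1, len(sets))
    )


print(is_complete(EMP, [EMP1, EMP2]))                        # True
print(reconstruct_horizontal([EMP1, EMP2]) == as_set(EMP))   # True
print(reconstruct_vertical(VERT, "Id") == as_set(EMP))       # True
print(is_disjoint([EMP1, EMP2]), is_disjoint([EMP, EMP2]))   # True False
```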
Comparison of Replication Alternatives

                       Full replication      Partial replication   Partitioning
Query processing       Easy                  Same difficulty       Same difficulty
Directory management   Easy or nonexistent   Same difficulty       Same difficulty
Concurrency control    Moderate              Difficult             Easy
Reliability            Very high             High                  Low
Reality                Possible application  Realistic             Possible application
Derived Horizontal Fragmentation
– Defined on a member relation of a link according to a selection operation specified on its owner.
– The link between the owner and the member relations is defined as an equi-join.
– An equi-join can be implemented by means of semijoins.
– Given a link L where owner(L) = S and member(L) = R, the derived horizontal fragments of R are defined as
  $R_i = R \ltimes S_i, \quad 1 \le i \le w$
  where $S_i = \sigma_{F_i}(S)$, w is the maximum number of fragments that will be defined on R, and $F_i$ is the formula using which the primary horizontal fragment $S_i$ is defined.

Example
Consider two tables, where PAY is the owner and EMP is the member of the link on Dept:

EMP
Id   Name  Dept
100  A     D1
200  B     D2
300  C     D3

PAY
Dept  Sal
D1    10K
D2    20K
D3    30K

$\text{PAY}_1 = \sigma_{\text{Sal} \le 20K}(\text{PAY})$
$\text{PAY}_2 = \sigma_{\text{Sal} > 20K}(\text{PAY})$
$\text{EMP}_1 = \text{EMP} \ltimes \text{PAY}_1$
$\text{EMP}_2 = \text{EMP} \ltimes \text{PAY}_2$

EMP1
Id   Name  Dept
100  A     D1
200  B     D2

EMP2
Id   Name  Dept
300  C     D3
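The definition $R_i = R \ltimes S_i$ with $S_i = \sigma_{F_i}(S)$ can be sketched directly in Python. Assumptions: the owner S is PAY and the member R is EMP, linked on `Dept`; $F_1$ and $F_2$ are Sal ≤ 20K and Sal > 20K; rows are dicts with `Sal` in thousands; `select` and `semijoin` are hypothetical helper names, not a library API.

```python
# EMP (member) and PAY (owner); Sal is stored in thousands.
EMP = [
    {"Id": 100, "Name": "A", "Dept": "D1"},
    {"Id": 200, "Name": "B", "Dept": "D2"},
    {"Id": 300, "Name": "C", "Dept": "D3"},
]
PAY = [
    {"Dept": "D1", "Sal": 10},
    {"Dept": "D2", "Sal": 20},
    {"Dept": "D3", "Sal": 30},
]


def select(relation, formula):
    """sigma_F(relation): keep the rows satisfying the formula F."""
    return [row for row in relation if formula(row)]


def semijoin(r, s, attr):
    """r semijoin s on attr: rows of r with a matching attr value in s."""
    keys = {row[attr] for row in s}
    return [row for row in r if row[attr] in keys]


# Primary horizontal fragments S_i of the owner ...
PAY1 = select(PAY, lambda row: row["Sal"] <= 20)
PAY2 = select(PAY, lambda row: row["Sal"] > 20)
# ... then the derived fragments R_i of the member.
EMP1 = semijoin(EMP, PAY1, "Dept")
EMP2 = semijoin(EMP, PAY2, "Dept")

print([row["Id"] for row in EMP1])  # [100, 200]
print([row["Id"] for row in EMP2])  # [300]
```

Only the join attribute of each owner fragment is needed to filter the member, which is why a semijoin (rather than a full equi-join) is enough to define the derived fragments.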
Primary Horizontal Fragmentation
– Primary horizontal fragmentation is defined by a selection operation on the owner relations of a database schema.
– Given relation R, its horizontal fragments are given by
  $R_i = \sigma_{F_i}(R), \quad 1 \le i \le w$
  where $F_i$ is the selection formula used to obtain fragment $R_i$.
– The example mentioned in slide 20 can be represented using the above formula as
  $\text{Emp}_1 = \sigma_{\text{Sal} \le 20K}(\text{Emp})$
  $\text{Emp}_2 = \sigma_{\text{Sal} > 20K}(\text{Emp})$

Vertical Fragmentation
Grouping
– Starts by assigning each attribute to one fragment.
– At each step, joins some of the fragments until some criterion is satisfied.
– Results in overlapping fragments.
Splitting
– Starts with a relation and decides on beneficial partitioning based on the access behavior of applications to the attributes.
– Fits more naturally within the top-down design.
– Generates non-overlapping fragments.


Hybrid Fragmentation
– In most cases, a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications.
– In certain cases, a vertical fragmentation may be followed by a horizontal one, or vice versa.
– Since two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation.
– In the case of horizontal fragmentation, one has to stop when each fragment consists of only one tuple, whereas the termination point for vertical fragmentation is one attribute per fragment.
– The example discussed in slides 20 and 26 can be converted into a hybrid fragmentation.

[Figure: hybrid fragmentation tree. R is split into R1 and R2; R1 is split into R11 and R12, and R2 into R21, R22 and R23. Reconstruction joins the leaf fragments of each branch and takes the union (U) of the results to obtain R.]
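A hybrid fragmentation of the Emp example, and its reconstruction, can be sketched as follows. Assumptions: the horizontal split comes first (Sal ≤ 20K versus Sal > 20K, with `Sal` in thousands), followed by an arbitrary, purely illustrative vertical split of each branch into two and three fragments named after R11, R12 and R21, R22, R23; `split_rows`, `project` and `join_on` are hypothetical helper names. Each branch is rebuilt with a join on `Id`, and the branches are then combined with a union.

```python
# Emp relation (Sal in thousands).
EMP = [
    {"Id": 100, "Name": "A", "Sal": 10, "Dept": "D1"},
    {"Id": 200, "Name": "B", "Sal": 20, "Dept": "D2"},
    {"Id": 300, "Name": "C", "Sal": 30, "Dept": "D3"},
]


def split_rows(relation, predicate):
    # Horizontal step: rows where predicate holds, rows where it does not.
    return ([row for row in relation if predicate(row)],
            [row for row in relation if not predicate(row)])


def project(relation, attrs):
    # Vertical step: keep only the listed attributes (key included).
    return [{a: row[a] for a in attrs} for row in relation]


def join_on(key, *fragments):
    # Rebuild full rows by joining vertical fragments on the key
    # (assumes each fragment holds every key of its branch).
    rows = {}
    for fragment in fragments:
        for row in fragment:
            rows.setdefault(row[key], {}).update(row)
    return list(rows.values())


# Horizontal first (R1, R2), then vertical on each branch.
R1, R2 = split_rows(EMP, lambda row: row["Sal"] <= 20)
R11, R12 = project(R1, ["Id", "Name"]), project(R1, ["Id", "Sal", "Dept"])
R21 = project(R2, ["Id", "Name"])
R22 = project(R2, ["Id", "Sal"])
R23 = project(R2, ["Id", "Dept"])

# Reconstruction: join within each branch, then union the branches.
# List concatenation acts as union because R1 and R2 are disjoint.
rebuilt = join_on("Id", R11, R12) + join_on("Id", R21, R22, R23)
print(rebuilt == EMP)  # True
```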
