Professional Documents
Culture Documents
3-Distribution Design
3-Distribution Design
• Introduction
• Background
• Distributed Database Design
➡ Fragmentation
➡ Data distribution
• Database Integration
• Semantic Data Control
• Distributed Query Processing
• Multidatabase Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/1
Design Problem
• In the general setting :
Making decisions about the placement of data and programs across the sites of
a computer network as well as possibly designing the network itself.
Level of sharing
• Bottom-up
Objectives
User Input
Conceptual View Integration View Design
Design
Access
GCS Information ES’s
Distribution
Design User Input
LCS’s
Physical
Design
LIS’s
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/5
Distribution Design Issues
Why fragment at all?
How to fragment?
How to allocate?
Information requirements?
PROJ1 PROJ2
PROJ1 PROJ2
PNO BUDGET PNO PNAME LOC
tuples relations
or
attributes
CONCURRENCY
Moderate Difficult Easy
CONTROL
Possible Possible
REALITY Realistic
application application
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/13
Information Requirements
• Four categories:
➡ Database information
➡ Application information
• Database Information
➡ relationship
SKILL
TITLE, SAL
L1
EMP PROJ
ENO, ENAME, TITLE PNO, PNAME, BUDGET,
LOC
ASG
ENO, PNO, RESP, DUR
pj : Ai θValue
where θ {=,<,≤,>,≥,≠}, Value Di and Di is the domain of Ai.
For relation R we define Pr = {p1, p2, …,pm}
Example :
PNAME = "Maintenance"
BUDGET ≤ 200000
➡ minterm predicates : Given R and Pr = {p1, p2, …,pm}
define M = {m1,m2,…,mr} as
M = { mi | mi =
pjPr pj* }, 1≤j≤m, 1≤i≤z
where pj* = pj or pj* = ¬(pj).
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/17
PHF – Information Requirements
Example
Given a set of minterm predicates M, there are as many horizontal fragments
of relation R as there are minterm predicates.
Set of horizontal fragments also referred to as minterm fragments.
Preliminaries :
➡ Pr should be complete
➡ Pr should be minimal
• Example :
➡ Assume PROJ[PNO,PNAME,BUDGET,LOC] has two applications defined
on it.
➡ Find the budgets of projects at each location. (1)
➡ Find projects with budgets less than $200000. (2)
which is complete.
acc(mi ) acc(m j )
=
card( fi ) card( f j )
PAY1 PAY2
TITLE SAL TITLE SAL
Mech. Eng. 27000 Elect. Eng. 40000
Programmer 24000 Syst. Anal. 34000
➡ Simple predicates
➡ For application (1)
p1 : LOC = “Montreal”
p2 : LOC = “New York”
p3 : LOC = “Paris”
➡ For application (2)
p4 : BUDGET ≤ 200000
p5 : BUDGET > 200000
➡ Pr = Pr' = {p1,p2,p3,p4,p5}
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/31
PHF – Example
• Fragmentation of relation PROJ continued
➡ Minterm fragments left after elimination
m1 : (LOC = “Montreal”) (BUDGET ≤ 200000)
m2 : (LOC = “Montreal”) (BUDGET > 200000)
m3 : (LOC = “New York”) (BUDGET ≤ 200000)
m4 : (LOC = “New York”) (BUDGET > 200000)
m5 : (LOC = “Paris”) (BUDGET ≤ 200000)
m6 : (LOC = “Paris”) (BUDGET > 200000)
PROJ1 PROJ2
PROJ4 PROJ6
• Reconstruction
➡ If relation R is fragmented into FR = {R1,R2,…,Rr}
R = Ri FR Ri
• Disjointness
➡ Minterm predicates that form the basis of fragmentation should be mutually
exclusive.
SKILL
TITLE, SAL
L1
EMP PROJ
ENO, ENAME, TITLE PNO, PNAME, BUDGET, LOC
L2 L3
ASG
ENO, PNO, RESP, DUR
Ri = R ⋉F Si, 1≤i≤w
where w is the maximum number of fragments that will be defined on R and
Si = Fi (S)
where Fi is the formula according to which the primary horizontal fragment
Si is defined.
EMP1 EMP2
ENO ENAME TITLE ENO ENAME TITLE
• Disjointness
➡ Simple join graphs between the owner and the member fragments.
• Non-overlapping fragments
➡ splitting
access
query access = access frequency of a query
execution
all sites
Assume each query in the previous example accesses the attributes once
during each execution. S1 S2 S3
Also assume the access frequencies q 1 15 20 10
q2 5 0 0
q3 25 25 25
q
4 3 0 0
Then A1 A2 A3 A4
aff(A1, A3) = 15*1 + 20*1+10*1 A1 45 0 45 0
= 45 A2 0 80 5 75
and the attribute affinity matrix AA is A3 45 5 53 3
A4 0 75 3 78
where
n
bond(Ax,Ay) = aff(Az,Ax)aff(Az,Ay)
z =1
Ordering (0-3-1) :
cont(A0,A3,A1) = 2bond(A0 , A3)+2bond(A3 , A1)–2bond(A0 , A1)
= 2* 0 + 2* 4410 – 2*0 = 8820
Ordering (1-3-2) :
cont(A1,A3,A2) = 2bond(A1 , A3)+2bond(A3 , A2)–2bond(A1,A2)
= 2* 4410 + 2* 890 – 2*225 = 10150
Ordering (2-3-4) :
cont (A2,A3,A4) = 1780
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/48
BEA – Example
• Therefore, the CA matrix has the form A1 A3 A2
45 45 0
0 5 80
45 53 5
0 3 75
• When A4 is placed, the final form of the CA matrix (after row organization)
is A1 A3 A2 A4
A1 45 45 0 0
A3 45 53 5 3
A2 0 5 80 75
A4 0 3 75 78
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.3/49
VF – Algorithm
How can you divide a set of clustered attributes {A1, A2, …, An}
into two (or more) sets {A1, A2, …, Ai} and {Ai, …, An} such that
there are no (or minimal) applications that access both (or more
than one) of the sets.
A1 A2 A3 … Ai Ai+1 . . .Am
A1
A2
TA
Ai
Ai+1
BA
Am
CTQCBQCOQ2
A = ARi
• Reconstruction
➡ Reconstruction can be achieved by
R = ⋈•
K Ri, Ri FR
• Disjointness
➡ TID's are not considered to be overlapping since they are maintained by the
system
➡ Duplicated keys are not considered to be overlapping
R1 R2
VF VF VF VF VF
Decision Variable
➡ Retrieval Cost
• Heuristics based on
➡ single commodity warehouse location (for FAP)
➡ knapsack problem
➡ network flow