Professional Documents
Culture Documents
DDB 05 Allocation
DDB 05 Allocation
Design
Must take relationship between fragments into
account
Cost of integrity enforcements
Constraints on response-time, storage, and
processing capability
Needed Information to do
Allocation
Database information: tuple size, cardinality of
fragment
Application information: #updates/#retrieval a
query performs on a fragment
Site Information: storage and processing
capabilities and cost of processing a unit of work
Network Information: communication cost to
transfer a block of data between sites i and j
Formulation of DAP
DAP can be formulated as an optimization
problem
Min(Total Cost)
Subject to
Constraints
Q: Set of all queries
S: Set of all sites
STCjk: Storage cost of fragment Fj at Sk
Storage constraint
STC
F j
jk
Processing constraint
processing load of q
qi Q
Cost Computation
Decision variable xij defined as
Cost Model
Total cost=query processing cost + storage cost
TOC =
QPC
qi Q
STC
S k S F j F
jk
Transfer
cost of qi
ACi =
(u
S k S F j F
ij
1 if query qi updates F j
uij =
0 otherwise
0
Assume cost of an
1 if query qi retrieves from F j
update same as cost rij =
0 otherwise
0
of retrieval
S k S F j F
ij
* x jk * g o ( i ),k +
S k S F j F
ij
* x jk * g k ,o ( i )
S k S
F j F
seli ( F j ) * length( F j )
fsize
* g k ,o ( i ) )
Pick the least cost site among all sites with the replicas
gij: communication cost per message between Si and Sj
fsize: #Bytes in a message
length(Fj): #bytes in fragment Fj
Seli(Fj): Selectivity Factor of qi on Fj
Heuristic Approaches
Allocation of Horizontal Fragments
Allocation of Vertical Fragments
i:
fragment index
j:
site index
k:
application index
ALLOCATION
Notations
Frequency of application k
at site j
rki: Number of retrieval
references of application k
to fragment i.
uki: Number of update
references of application k
to fragment i.
nki = rki + uki (Number of accesses
of application k to fragment
fkj:
Site j
Fragment i
uki
rki
Application k
/w freq. fkj
B ij =
f kj n ki
Cost of retrieval
references
j ' j
Cost of update
references from
other sites
Fi
(di)
(d i ) = (1 21d ) F i
i
j ' j
Alternatively:
1. Determine the solution
of the non-replicated
problem.
2. Progressively
introduce replicated
copies starting from
the most beneficial;
the process is
terminated when no
additional replication is
beneficial.
SUMMARY
Design of a distributed DB consists of four phases:
Phase 1: Global schema design (same as in centralized DB
design)
Phase 2: Fragmentation
Horizontal Fragmentation
Primary: Determine a complete and minimal set of predicates
Derived: Use semijoin
Vertical Fragmentation
Identify fragments such that many applications can be executed
using just one fragment.
Phase 3: Allocation
The primary goal is to minimize the number of remote accesses.
10