DDB 05 Allocation

Complexity in the Database Allocation Design
Must take relationship between fragments into account
Cost of integrity enforcement
Constraints on response time, storage, and processing capability
Needed Information to do Allocation
Database information: tuple size, cardinality of fragment
Application information: #updates/#retrievals a query performs on a fragment
Site information: storage and processing capabilities and cost of processing a unit of work
Network information: communication cost to transfer a block of data between sites i and j
No model developed to date can handle all the constraints
Current models make simplifying assumptions and work only for specific situations
Formulation of DAP
DAP (the database allocation problem) can be formulated as an optimization problem
Min(Total Cost)
Subject to
response time constraint
storage constraint
processing load constraint
DAP is NP-complete, and several heuristics have been proposed
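Because DAP is NP-complete, exact search is only practical for toy instances. The following is a minimal sketch of such a search, assuming every fragment must have at least one copy; the names `exhaustive_dap`, `total_cost`, and `feasible` are hypothetical, and the cost and constraint functions are supplied by the caller. With n fragments and m sites it examines (2^m - 1)^n placements, which is why heuristics are used in practice.

```python
from itertools import product

def exhaustive_dap(n_fragments, n_sites, total_cost, feasible):
    """Return (best_cost, best_x) over all placements with >= 1 copy per fragment.

    x[j][k] == 1 means fragment j is stored at site k.
    total_cost(x) and feasible(x) are caller-supplied (hypothetical) callables.
    """
    # Every non-empty subset of sites for one fragment, encoded as a 0/1 row.
    rows = [row for row in product((0, 1), repeat=n_sites) if any(row)]
    best_cost, best_x = float("inf"), None
    for x in product(rows, repeat=n_fragments):
        if not feasible(x):
            continue  # violates a response-time, storage, or load constraint
        cost = total_cost(x)
        if cost < best_cost:
            best_cost, best_x = cost, x
    return best_cost, best_x
```

If no placement satisfies the constraints, the function returns `(inf, None)`.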
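Constraints
Q: Set of all queries
S: Set of all sites
F: Set of all fragments
STCjk: Storage cost of fragment Fj at Sk
Execution time constraint
$$\text{execution time of } q_i \le \text{maximum response time of } q_i, \quad \forall q_i \in Q$$
Storage constraint
$$\sum_{F_j \in F} \mathit{STC}_{jk} \le \text{storage capacity at site } S_k, \quad \forall S_k \in S$$
Processing constraint
$$\sum_{q_i \in Q} \text{processing load of } q_i \text{ at site } S_k \le \text{processing capacity of } S_k, \quad \forall S_k \in S$$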
Cost Computation
Decision variable xij defined as
$$x_{ij} = \begin{cases} 1 & \text{if fragment } F_i \text{ is stored at site } S_j \\ 0 & \text{otherwise} \end{cases}$$
Total Cost = query processing cost + storage cost
Solve the optimization problem for xij
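Cost Model
Total cost = query processing cost + storage cost
(In the cost formulas, i indexes queries, j indexes fragments, and k indexes sites, so the decision variable is written xjk.)
$$\mathit{TOC} = \sum_{q_i \in Q} \mathit{QPC}_i + \sum_{S_k \in S} \sum_{F_j \in F} \mathit{STC}_{jk}$$
$$\mathit{STC}_{jk} = \mathit{USC}_k \cdot \mathit{size}(F_j) \cdot x_{jk}$$
USCk: Unit cost of storing data at Sk
$$\mathit{size}(F_j) = \mathit{card}(F_j) \cdot \mathit{length}(F_j)$$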
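The storage term of the total cost can be sketched as follows. This is an illustration only: the function name `storage_cost` and the list-based layout (`x[j][k]`, `usc[k]`, `card[j]`, `length[j]`) are assumptions, not part of the original material.

```python
def storage_cost(x, usc, card, length):
    """Sum STC_jk = USC_k * size(F_j) * x_jk over all fragments j and sites k.

    x[j][k]   -- 1 if fragment j is stored at site k, else 0
    usc[k]    -- unit cost of storing data at site k
    card[j]   -- number of tuples in fragment j
    length[j] -- tuple length of fragment j, in bytes
    """
    total = 0.0
    for j, row in enumerate(x):
        size_j = card[j] * length[j]  # size(F_j) = card(F_j) * length(F_j)
        for k, placed in enumerate(row):
            total += usc[k] * size_j * placed
    return total
```

For example, with two fragments of 40 bytes each, unit costs 2.0 and 3.0, and placement `[[1, 0], [1, 1]]`, the stored copies cost 80 + 80 + 120.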
Query Processing Cost
$$\mathit{QPC}_i = \mathit{PC}_i + \mathit{TC}_i$$
PCi: Computation cost of qi
TCi: Transfer cost of qi
$$\mathit{PC}_i = \mathit{AC}_i + \mathit{IE}_i + \mathit{CC}_i$$
ACi: access cost of qi
IEi: integrity enforcement cost of qi
CCi: concurrency control cost of qi
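LPCk: Cost of processing one unit of work at site Sk
RRij: Number of read accesses a query qi makes to a fragment Fj
URij: Number of update accesses a query qi makes to a fragment Fj
Access cost of query qi
$$\mathit{AC}_i = \sum_{S_k \in S} \sum_{F_j \in F} (u_{ij} \cdot \mathit{UR}_{ij} + r_{ij} \cdot \mathit{RR}_{ij}) \cdot x_{jk} \cdot \mathit{LPC}_k$$
$$u_{ij} = \begin{cases} 1 & \text{if query } q_i \text{ updates } F_j \\ 0 & \text{otherwise} \end{cases}$$
$$r_{ij} = \begin{cases} 1 & \text{if query } q_i \text{ retrieves from } F_j \\ 0 & \text{otherwise} \end{cases}$$
Assume the cost of an update is the same as the cost of a retrieval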
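A direct transcription of the access cost sum into code may help when checking it by hand. This is a sketch under assumed data structures (nested lists indexed `[i][j]` for query/fragment quantities and `[j][k]` for the placement); the name `access_cost` is hypothetical. It mirrors the formula literally, so every stored copy of a fragment contributes to the sum.

```python
def access_cost(i, x, u, r, UR, RR, lpc):
    """AC_i = sum over sites k and fragments j of
    (u_ij * UR_ij + r_ij * RR_ij) * x_jk * LPC_k.

    u[i][j], r[i][j] -- 0/1 flags: query i updates / retrieves from fragment j
    UR[i][j], RR[i][j] -- number of update / read accesses of query i to fragment j
    x[j][k] -- 1 if fragment j is stored at site k
    lpc[k]  -- cost of one unit of work at site k
    """
    total = 0.0
    for j, row in enumerate(x):
        accesses = u[i][j] * UR[i][j] + r[i][j] * RR[i][j]
        for k, placed in enumerate(row):
            total += accesses * placed * lpc[k]
    return total
```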
Transmission Cost Model
$$\mathit{TC}_i = \mathit{TCU}_i + \mathit{TCR}_i$$
Update cost: Need to perform updates to all replicas; no large results sent back
$$\mathit{TCU}_i = \sum_{S_k \in S} \sum_{F_j \in F} u_{ij} \cdot x_{jk} \cdot g_{o(i),k} + \sum_{S_k \in S} \sum_{F_j \in F} u_{ij} \cdot x_{jk} \cdot g_{k,o(i)}$$
First term: cost of update messages to all replicas that are involved in qi
Second term: cost of confirmation messages back to o(i), the site where qi originates
gij: communication cost per message between Si and Sj
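The update transmission cost can be sketched in code as one update message to each replica plus one confirmation back. The function name `update_transfer_cost` and the `origin` list (mapping each query to its originating site) are assumptions made for this illustration.

```python
def update_transfer_cost(i, x, u, g, origin):
    """TCU_i: update messages to every replica of each updated fragment,
    plus confirmation messages back to the originating site.

    x[j][k]   -- 1 if fragment j is stored at site k
    u[i][j]   -- 1 if query i updates fragment j
    g[a][b]   -- communication cost per message from site a to site b
    origin[i] -- site where query i originates (assumed input)
    """
    o = origin[i]
    total = 0.0
    for j, row in enumerate(x):
        for k, placed in enumerate(row):
            if u[i][j] and placed:
                total += g[o][k]  # update message from origin to the replica
                total += g[k][o]  # confirmation message back to the origin
    return total
```

Note that the cost grows with the number of replicas of every updated fragment, which is the price paid for replication in update-heavy workloads.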
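Retrieval Cost Model
$$\mathit{TCR}_i = \sum_{F_j \in F} \min_{S_k \in S} \left( r_{ij} \cdot x_{jk} \cdot g_{o(i),k} + r_{ij} \cdot x_{jk} \cdot \frac{\mathit{sel}_i(F_j) \cdot \mathit{length}(F_j)}{\mathit{fsize}} \cdot g_{k,o(i)} \right)$$
First term: cost of sending the query
Second term: cost of sending the results back
Pick the least-cost site among all sites holding a replica
gij: communication cost per message between Si and Sj
fsize: number of bytes in a message
length(Fj): number of bytes in a tuple of fragment Fj (consistent with size(Fj) = card(Fj) * length(Fj))
seli(Fj): selectivity factor of qi on Fj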
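The retrieval cost can be sketched as below. One interpretation is made explicit here as an assumption: the minimum is taken only over sites that actually hold a copy of the fragment, following the note about picking the least-cost site among the replicas (taken literally, a site with xjk = 0 would contribute a cost of zero). The name `retrieval_transfer_cost` and the `origin` list are hypothetical.

```python
def retrieval_transfer_cost(i, x, r, g, origin, sel, length, fsize):
    """TCR_i: for each fragment query i reads, the cheapest replica site.

    x[j][k]   -- 1 if fragment j is stored at site k
    r[i][j]   -- 1 if query i retrieves from fragment j
    g[a][b]   -- communication cost per message from site a to site b
    origin[i] -- site where query i originates (assumed input)
    sel[i][j], length[j], fsize -- as in the formula; their product over
                 fsize is the number of result messages sent back
    """
    o = origin[i]
    total = 0.0
    for j, row in enumerate(x):
        if not r[i][j]:
            continue
        replica_sites = [k for k, placed in enumerate(row) if placed]
        if not replica_sites:
            raise ValueError(f"fragment {j} is not allocated to any site")
        n_messages = sel[i][j] * length[j] / fsize
        # Query message out, result messages back; keep the cheapest replica.
        total += min(g[o][k] + n_messages * g[k][o] for k in replica_sites)
    return total
```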
Heuristic Approaches
Allocation of Horizontal Fragments
Allocation of Vertical Fragments
(Material not in the textbook)
ALLOCATION
Notations
i: fragment index
j: site index
k: application index
fkj: Frequency of application k at site j
rki: Number of retrieval references of application k to fragment i
uki: Number of update references of application k to fragment i
nki = rki + uki (Number of accesses of application k to fragment i)
[Figure: Application k, issued at Site j with frequency fkj, accesses Fragment i with rki retrieval references and uki update references]
Allocation of Horizontal Fragments (1)
No replication: Best Fit Strategy
The number of local references of Ri at site j is
$$B_{ij} = \sum_k f_{kj} \, n_{ki}$$
Ri is allocated at site j* such that Bij* is maximum.
Advantage: A fragment is allocated to a site that needs it most.
Disadvantage: It disregards the mutual effect of placing a fragment at a given site if a related fragment is also at that site.
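The best fit strategy is simple enough to sketch in a few lines. The function name `best_fit` and the matrix layout (`f[k][j]`, `n[k][i]`) are assumptions for illustration; ties are broken in favor of the lowest site index, which the original material does not specify.

```python
def best_fit(f, n):
    """Non-replicated allocation: place each fragment at the site with the
    most local references, B_ij = sum_k f[k][j] * n[k][i].

    f[k][j] -- frequency of application k at site j
    n[k][i] -- accesses (retrievals + updates) of application k to fragment i
    Returns a list giving the chosen site index for each fragment.
    """
    n_apps, n_sites, n_frags = len(f), len(f[0]), len(n[0])
    allocation = []
    for i in range(n_frags):
        B = [sum(f[k][j] * n[k][i] for k in range(n_apps))
             for j in range(n_sites)]
        # max() returns the first maximal index, so ties go to the lowest site.
        allocation.append(max(range(n_sites), key=B.__getitem__))
    return allocation
```

With two applications, `f = [[5, 1], [0, 4]]` and `n = [[3, 0], [1, 2]]`, fragment 0 has B values 15 and 7, and fragment 1 has B values 0 and 8.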

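Allocation of Horizontal Fragments (2)
All beneficial sites approach (replication)
$$B_{ij} = \sum_k f_{kj} \, r_{ki} - c \sum_k \sum_{j' \ne j} f_{kj'} \, u_{ki}$$
First term: cost of the retrieval references at site j that a local copy saves
Second term: cost of update references from other sites
c: constant weighting the cost of an update access relative to a retrieval access
Ri is allocated at all sites j* such that Bij* > 0.
When all Bij are negative, a single copy of Ri is placed at the site j* such that Bij* is maximum.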
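A minimal sketch of the all beneficial sites rule follows. The name `all_beneficial_sites` and the matrix layout are assumptions; the fallback to a single copy is applied whenever no site has a strictly positive benefit, which also covers the case of zero benefits that the original rule does not address.

```python
def all_beneficial_sites(f, r, u, c):
    """Replicated allocation: fragment i goes to every site j with
    B_ij = sum_k f[k][j]*r[k][i] - c * sum_k sum_{j' != j} f[k][j']*u[k][i] > 0.

    f[k][j] -- frequency of application k at site j
    r[k][i], u[k][i] -- retrieval / update references of application k to fragment i
    c -- weight of an update relative to a retrieval
    Returns, for each fragment, the list of sites receiving a copy.
    """
    n_apps, n_sites, n_frags = len(f), len(f[0]), len(r[0])
    allocation = []
    for i in range(n_frags):
        B = []
        for j in range(n_sites):
            retrievals = sum(f[k][j] * r[k][i] for k in range(n_apps))
            remote_updates = sum(f[k][jp] * u[k][i]
                                 for k in range(n_apps)
                                 for jp in range(n_sites) if jp != j)
            B.append(retrievals - c * remote_updates)
        sites = [j for j in range(n_sites) if B[j] > 0]
        if not sites:
            # No beneficial site: keep a single copy where B_ij is largest.
            sites = [max(range(n_sites), key=B.__getitem__)]
        allocation.append(sites)
    return allocation
```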

Allocation of Horizontal Fragments (3)
Another Replication Approach:
di: The degree of redundancy of Ri
Fi: The reliability and availability benefit of having Ri fully replicated.
β(di): The reliability and availability benefit when the fragment has di copies.
$$\beta(1) = 0, \quad \beta(2) = \frac{F_i}{2}, \quad \beta(3) = \frac{3 F_i}{4}, \quad \beta(d_i) = \left(1 - 2^{\,1 - d_i}\right) F_i$$
The benefit of introducing a new copy of Ri at site j:
$$B_{ij} = \sum_k f_{kj} \, r_{ki} - c \sum_k \sum_{j' \ne j} f_{kj'} \, u_{ki} + \beta(d_i)$$
First two terms: same as the All Beneficial Sites approach
Last term: also takes into account the benefit of replication
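The reliability and availability benefit shows diminishing returns: each additional copy adds half as much as the previous one. A short sketch, with the hypothetical function name `replication_benefit`, makes this easy to tabulate.

```python
def replication_benefit(d, F):
    """Reliability/availability benefit of d copies: (1 - 2**(1 - d)) * F,
    where F is the benefit of full replication."""
    if d < 1:
        raise ValueError("a fragment needs at least one copy")
    return (1 - 2 ** (1 - d)) * F

# Marginal gain of going from d to d + 1 copies, here with F = 8 as a toy value.
gains = [replication_benefit(d + 1, 8) - replication_benefit(d, 8)
         for d in range(1, 4)]
```

With F = 8 the benefit is 0, 4, and 6 for one, two, and three copies, so the marginal gains halve each time and the benefit approaches F without reaching it.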

Allocation of Horizontal Fragments (4)
All Beneficial Sites Approach:
1. Determine the set of all sites where the benefit of allocating one copy of the fragment is higher than the cost.
2. Allocate a copy of the fragment to each site in the set.
Alternatively:
1. Determine the solution of the non-replicated problem.
2. Progressively introduce replicated copies starting from the most beneficial; the process is terminated when no additional replication is beneficial.
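The alternative, progressive procedure can be sketched as a greedy loop. Everything named here is an assumption for illustration: `progressive_replication` is a hypothetical function, and `benefit(i, j, copies)` is a caller-supplied callable returning the gain of adding a copy of fragment i at site j given the sites that already hold it. Fragments are processed independently, which is a simplification.

```python
def progressive_replication(initial_sites, n_sites, benefit):
    """Start from a non-replicated solution and add copies greedily.

    initial_sites[i] -- site holding the single copy of fragment i
    benefit(i, j, copies) -- hypothetical callable: gain of adding a copy of
                             fragment i at site j, given current copy set
    Returns a list of sets: the sites holding each fragment.
    """
    placement = [{site} for site in initial_sites]
    for i, copies in enumerate(placement):
        while True:
            candidates = [j for j in range(n_sites) if j not in copies]
            if not candidates:
                break  # fragment is already fully replicated
            best = max(candidates, key=lambda j: benefit(i, j, copies))
            if benefit(i, best, copies) <= 0:
                break  # no additional replication is beneficial
            copies.add(best)
    return placement
```

A benefit function that depends on the current number of copies, such as one including the β(di) term, is what makes this procedure differ from allocating to all beneficial sites in one pass.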
How about Heuristics for Vertical Allocation?
SUMMARY
Design of a distributed DB consists of four phases:
Phase 1: Global schema design (same as in centralized DB design)
Phase 2: Fragmentation
  Horizontal Fragmentation
    Primary: Determine a complete and minimal set of predicates
    Derived: Use semijoin
  Vertical Fragmentation
    Identify fragments such that many applications can be executed using just one fragment.
Phase 3: Allocation
  The primary goal is to minimize the number of remote accesses.
Phase 4: Physical schema design (same as in centralized DB design).