Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

Complexity in the Database Allocation

Design
Must take relationship between fragments into
account
Cost of integrity enforcements
Constraints on response-time, storage, and
processing capability

Needed Information to do
Allocation
Database information: tuple size, cardinality of
fragment
Application information: #updates/#retrieval a
query performs on a fragment
Site Information: storage and processing
capabilities and cost of processing a unit of work
Network Information: communication cost to
transfer a block of data between sites i and j

No models developed to date can


handle all the constraints
Current models simplify assumptions
and work with some specific
situations

Formulation of DAP
DAP can be formulated as an optimization
problem
Min(Total Cost)
Subject to

response time constraint


Storage constraint
Processing load constraint

DAP is NP-complete and several heuristics


have been proposed

Constraints
Q: Set of all queries
S: Set of all sites
STCjk: Storage cost of fragment Fj at Sk

Execution time constraint


Execution time of qi <= maximum response time
of qi for all qi in Q

Storage constraint

STC
F j

jk

storage capacity at site S k ,S k S

Processing constraint

processing load of q

qi Q

at site Sk processing capacity of Sk , Sk S

Cost Computation
Decision variable xij defined as

1 if fragment Fi is stored at site S j


xij =
0 otherwise

Total Cost = query processing cost +


storage cost
Solve the optimization constraint for
xij

Cost Model
Total cost=query processing cost + storage cost

TOC =

QPC

qi Q

STC

S k S F j F

jk

Unit cost of storing data at Sk

STC jk = USCk size( F j ) x jk


size( F j ) = card ( F j ) * length( F j )

Query Processing Cost


QPCi = PCi + TCi
Computation
cost of qi

Transfer
cost of qi

PCi = ACi + IEi + CCi


access cost of qi +
integrity enforcement cost of qi +
concurrency control cost of qi

LPCk: Cost of processing one unit of work at site Sk


RRij: Number of read accesses a query qi makes
to a fragment Fj
URij: #of update accesses a query qi makes to a
fragment Fj
Access cost of query qi

ACi =

(u

S k S F j F

ij

URij + rij RRij ) x jk LPCk

1 if query qi updates F j
uij =
0 otherwise
0
Assume cost of an
1 if query qi retrieves from F j
update same as cost rij =
0 otherwise
0
of retrieval

Transmission Cost Model


TCi = TCU i + TCRi

Update cost: Need to perform updates to


all replicas; no large results sent back
TCU i =

S k S F j F

ij

* x jk * g o ( i ),k +

Cost of update message to all


replicas that are involved in qi

S k S F j F

ij

* x jk * g k ,o ( i )

Cost of confirmation message


back to i

gij: communication cost per message between Si and Sj

Retrieval Cost Model


TCRi =

S k S

F j F

min (rij * x jk * g o (i ),k + rij * x jk *

Cost of sending a query

seli ( F j ) * length( F j )
fsize

* g k ,o ( i ) )

Cost of sending the results back

Pick the least cost site among all sites with the replicas
gij: communication cost per message between Si and Sj
fsize: #Bytes in a message
length(Fj): #bytes in fragment Fj
Seli(Fj): Selectivity Factor of qi on Fj

Heuristic Approaches
Allocation of Horizontal Fragments
Allocation of Vertical Fragments

(Material not in the textbook)

i:

fragment index

j:

site index

k:

application index

ALLOCATION
Notations

Frequency of application k
at site j
rki: Number of retrieval
references of application k
to fragment i.
uki: Number of update
references of application k
to fragment i.
nki = rki + uki (Number of accesses
of application k to fragment
fkj:

Site j
Fragment i

uki

rki

Application k
/w freq. fkj

Allocation of Horizontal Fragments (1)


No replication: Best Fit Strategy
The number of local references of Ri at site j is

B ij =

f kj n ki

Ri is allocated at site j* such that Bij* is maximum.

Advantage: A fragment is allocated to a site that needs it most.


Disadvantage: It disregards the mutual effect of placing a
fragment at a given site if a related fragment is also at that
site.

Allocation of Horizontal Fragments (2)


All beneficial sites approach (replication)

Bij = f kj rki c f kj 'uki


k

Cost of retrieval
references

j ' j

Cost of update
references from
other sites

Ri is allocated at all sites j* such that Bij* > 0.


When all Bijs are negative, a single copy of Ri is
placed at the site such that Bij* is maximum.

Allocation of Horizontal Fragments (3)


Another Replication Approach:
di

The degree of redundancy of Ri

Fi

The reliability and availability benefit of having Ri fully replicated.

(di)

The reliability and availability benefit when the fragment has di


copies.

(1) = 0, (2) = F i , (3) = 3 F i ,

(d i ) = (1 21d ) F i
i

The benefit of introducing a new copy of Ri at site j :

Bij = f kj rki c f kj 'u ki + ( d i )


k

j ' j

Same as All Beneficial


Sites approach

Also takes into


account the benefit
of replication

Allocation of Horizontal Fragments (4)


All Beneficial Sites
Approach:
1. Determine the set of
all sites where the
benefit of allocating
one copy of the
fragment is higher
than the cost.
2. Allocate a copy of the
fragment to each site
in the set.

Alternatively:
1. Determine the solution
of the non-replicated
problem.
2. Progressively
introduce replicated
copies starting from
the most beneficial;
the process is
terminated when no
additional replication is
beneficial.

How about Heuristics for Vertical


Allocation?

SUMMARY
Design of a distributed DB consists of four phases:
Phase 1: Global schema design (same as in centralized DB
design)
Phase 2: Fragmentation
Horizontal Fragmentation
Primary: Determine a complete and minimal set of predicates
Derived: Use semijoin

Vertical Fragmentation
Identify fragments such that many applications can be executed
using just one fragment.

Phase 3: Allocation
The primary goal is to minimize the number of remote accesses.

Phase 4: Physical schema design (same as in centralized DB


design).

10

You might also like