Module III

Ammu Archa P
SCTCE
PARALLEL DATABASES

➢ Parallel systems improve processing and I/O speeds by using multiple processors and disks in parallel.

➢ In parallel processing, many operations are performed simultaneously, as opposed to serial processing, in
which the computational steps are performed sequentially.

➢ A coarse-grain parallel machine consists of a small number of powerful processors; a massively parallel
or fine-grain parallel machine uses thousands of smaller processors.

➢ There are two main measures of performance of a database system: (1) throughput, the number of tasks
that can be completed in a given time interval, and (2) response time, the amount of time it takes to
complete a single task from the time it is submitted.

➢ A system that processes a large number of small transactions can improve throughput by processing
many transactions in parallel.

➢ A system that processes large transactions can improve response time as well as throughput by
performing subtasks of each transaction in parallel.
Measuring the performance of a parallel processing system

➢ Two important issues in studying parallelism are speedup and scaleup.

➢ Running a given task in less time by increasing the degree of parallelism is called speedup.

➢ Handling larger tasks by increasing the degree of parallelism is called scaleup.

➢ Suppose that the execution time of a task on the larger machine is TL , and that the execution time of the
same task on the smaller machine is TS. The speedup due to parallelism is defined as TS/TL.

➢ Scaleup is expressed in terms of the volume of transactions performed in unit time:
Scaleup = VolumeParallel / VolumeOriginal
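
As a quick numeric illustration of both metrics, here is a tiny Python sketch (all values are made up for the example):

# Speedup: the same task timed on the small and on the large machine.
T_S = 100.0                # execution time on the smaller machine (seconds)
T_L = 25.0                 # execution time on the larger machine (seconds)
speedup = T_S / T_L        # 4.0: linear speedup if the large machine has 4x the resources

# Scaleup: transaction volume handled per unit time, before and after scaling.
volume_original = 1000     # transactions per second on the original system
volume_parallel = 3800     # transactions per second on the scaled-up system
scaleup = volume_parallel / volume_original   # 3.8: slightly sub-linear scaleup
print(speedup, scaleup)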
Parallel Database Architectures

➢ There are several architectural models for parallel machines. Among the most prominent ones are those in
Figure (in the figure, M denotes memory, P denotes a processor, and disks are shown as cylinders):
❖ Shared Memory
➢ In a shared-memory architecture, the processors and disks have access to a common memory, typically via a bus
or through an interconnection network.

➢ The benefit of shared memory is extremely efficient communication between processors: data in shared memory
can be accessed by any processor without being moved with software.

➢ A processor can send messages to other processors much faster by using memory writes (which usually take less
than a microsecond) than by sending a message through a communication mechanism.

➢ Shared-memory architectures usually have large memory caches at each processor, so that referencing of the
shared memory is avoided whenever possible.

➢ However, at least some of the data will not be in the cache, and accesses will have to go to the shared memory.

➢ Moreover, the caches need to be kept coherent; that is, if a processor performs a write to a memory location, the
data in that memory location should be either updated at or removed from any processor where the data are
cached.

➢ Maintaining cache coherency becomes an increasing overhead with increasing numbers of processors.
Consequently, shared-memory machines are not capable of scaling up beyond a point; current shared-memory
machines cannot support more than 64 processors.
❖ Shared Disk

➢ In the shared-disk model, all processors can access all disks directly via an interconnection network, but the
processors have private memories.

➢ There are two advantages of this architecture over a shared-memory architecture.

➢ First, since each processor has its own memory, the memory bus is not a bottleneck.

➢ Second, it offers a cheap way to provide a degree of fault tolerance:

➢ If a processor (or its memory) fails, the other processors can take over its tasks, since the database is resident on
disks that are accessible from all processors.

➢ The main problem with a shared-disk system is again scalability. Although the memory bus is no longer a bottleneck,
the interconnection to the disk subsystem is now a bottleneck; it is particularly so in a situation where the database
makes a large number of accesses to disks.

➢ Compared to shared-memory systems, shared-disk systems can scale to a somewhat larger number of processors, but
communication across processors is slower (up to a few milliseconds in the absence of special-purpose hardware for
communication), since it has to go through a communication network.
❖ Shared Nothing

➢ In a shared-nothing system, each node of the machine consists of a processor, memory, and one or more disks.

➢ The processors at one node may communicate with another processor at another node by a high-speed interconnection
network.

➢ A node functions as the server for the data on the disk or disks that the node owns. Since local disk references are
serviced by local disks at each processor, the shared-nothing model overcomes the disadvantage of requiring all I/O to
go through a single interconnection network; only queries, accesses to nonlocal disks, and result relations pass
through the network.

➢ Moreover, the interconnection networks for shared-nothing systems are usually designed to be scalable, so that their
transmission capacity increases as more nodes are added.

➢ Consequently, shared-nothing architectures are more scalable and can easily support a large number of processors.

➢ The main drawbacks of shared-nothing systems are the costs of communication and of nonlocal disk access, which
are higher than in a shared-memory or shared-disk architecture since sending data involves software interaction at
both ends.
❖ Hierarchical

➢ The hierarchical architecture combines the characteristics of shared-memory, shared-disk, and shared-nothing
architectures.

➢ At the top level, the system consists of nodes that are connected by an interconnection network and do not share
disks or memory with one another.

➢ Thus, the top level is a shared-nothing architecture.

➢ Each node of the system could actually be a shared-memory system with a few processors.

➢ Alternatively, each node could be a shared-disk system, and each of the systems sharing a set of disks could be a
shared-memory system.

➢ Thus, a system could be built as a hierarchy, with shared-memory architecture with a few processors at the base, and
a shared-nothing architecture at the top, with possibly a shared-disk architecture in the middle.
How is parallelism achieved?

[Figure: Forms of parallelism in a database system]

• I/O Parallelism, achieved through partitioning: Round-Robin, Hash, Range, and List Partitioning
• Interquery Parallelism
• Intraquery Parallelism: Intraoperation Parallelism and Interoperation Parallelism
I/O Parallelism

➢ I/O parallelism refers to reducing the time required to retrieve relations from disk by partitioning the
relations over multiple disks.

➢ The most common form of data partitioning in a parallel database environment is horizontal partitioning.

➢ In horizontal partitioning, the tuples of a relation are divided (or declustered) among many disks, so that
each tuple resides on one disk. Several partitioning strategies have been proposed.
Partitioning Example

id Name Branch
1 A X
2 B Y
3 C Y
4 D Y
5 E W
6 F W
7 G X
Horizontal Fragmentation / Partitioning

Partition 1:
id  Name  Branch
1   A     X
2   B     Y
3   C     Y

Partition 2:
id  Name  Branch
4   D     Y
5   E     W
6   F     W

Partition 3:
id  Name  Branch
7   G     X
Vertical Fragmentation / Partitioning

Fragment 1:
id  Name
1   A
2   B
3   C
4   D
5   E
6   F
7   G

Fragment 2:
id  Branch
1   X
2   Y
3   Y
4   Y
5   W
6   W
7   X
Partitioning Techniques

➢ We present three basic data-partitioning strategies.

➢ Assume that there are n disks, D0, D1, . . . , Dn−1, across which the data are to be partitioned.

➢ Round-robin. This strategy scans the relation in any order and sends the ith tuple to disk D(i mod n).
The round-robin scheme ensures an even distribution of tuples across disks; that is, each disk has
approximately the same number of tuples as the others.

➢ Hash partitioning. This declustering strategy designates one or more attributes from the given relation’s
schema as the partitioning attributes. A hash function is chosen whose range is {0, 1, . . . , n − 1}. Each tuple
of the original relation is hashed on the partitioning attributes. If the hash function returns i, then the tuple is
placed on disk Di .

➢ Range partitioning. This strategy distributes tuples by assigning contiguous attribute-value ranges to each
disk. It chooses a partitioning attribute, A, and a partitioning vector [v0, v1, . . . , vn−2], such that, if i < j,
then vi < vj. The relation is partitioned as follows: Consider a tuple t such that t[A] = x. If x < v0, then t goes
on disk D0. If x ≥ vn−2, then t goes on disk Dn−1. If vi ≤ x < vi+1, then t goes on disk Di+1.
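
The three strategies can be summarized in a short Python sketch (a minimal, single-machine simulation: the relation is a list of dicts and each "disk" is just a list, so all names here are illustrative):

from bisect import bisect_right

def round_robin(tuples, n):
    # Send the i-th tuple to disk D(i mod n) (0-based index here).
    disks = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        disks[i % n].append(t)
    return disks

def hash_partition(tuples, n, attr):
    # Hash each tuple on the partitioning attribute into {0, ..., n-1}.
    disks = [[] for _ in range(n)]
    for t in tuples:
        disks[hash(t[attr]) % n].append(t)
    return disks

def range_partition(tuples, n, attr, vector):
    # vector = [v0, ..., v(n-2)]; x < v0 -> D0, vi <= x < vi+1 -> D(i+1), x >= v(n-2) -> D(n-1).
    assert len(vector) == n - 1
    disks = [[] for _ in range(n)]
    for t in tuples:
        disks[bisect_right(vector, t[attr])].append(t)
    return disks

rows = [{"id": i, "salary": s}
        for i, s in enumerate([11500, 9000, 2000, 12000, 20000, 8000, 7100], start=1)]
print(range_partition(rows, 3, "salary", [8000, 10000]))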
Round Robin example

id  Name  Branch
1   A     X       1 mod 3 = 1
2   B     Y       2 mod 3 = 2
3   C     Y       3 mod 3 = 0
4   D     Y       4 mod 3 = 1
5   E     W       5 mod 3 = 2
6   F     W       6 mod 3 = 0
7   G     X       7 mod 3 = 1

DISK 0:
id  Name  Branch
3   C     Y
6   F     W

DISK 1:
id  Name  Branch
1   A     X
4   D     Y
7   G     X

DISK 2:
id  Name  Branch
2   B     Y
5   E     W
List Partitioning Example

id  Name  Branch
1   A     EKLM
2   B     TVPM
3   C     TVPM
4   D     TVPM
5   E     MUMBAI
6   F     MUMBAI
7   G     EKLM

Kerala partition:
id  Name  Branch
1   A     EKLM
2   B     TVPM
3   C     TVPM
4   D     TVPM
7   G     EKLM

Maharashtra partition:
id  Name  Branch
5   E     MUMBAI
6   F     MUMBAI
Range Partitioning Example (on Salary)

id  Name  Branch   Salary
1   A     EKLM     11500
2   B     TVPM     9000
3   C     TVPM     2000
4   D     TVPM     12000
5   E     MUMBAI   20000
6   F     MUMBAI   8000
7   G     EKLM     7100

Partition 1 (< 8000):
id  Name  Branch   Salary
3   C     TVPM     2000
7   G     EKLM     7100

Partition 2 (8000 - 10000):
id  Name  Branch   Salary
6   F     MUMBAI   8000
2   B     TVPM     9000

Partition 3 (> 10000):
id  Name  Branch   Salary
1   A     EKLM     11500
4   D     TVPM     12000
5   E     MUMBAI   20000
❑ Round-robin. The scheme is ideally suited for applications that wish to read the entire relation
sequentially for each query. With this scheme, both point queries and range queries are complicated to
process, since each of the n disks must be used for the search.

❑ Hash partitioning. This scheme is best suited for point queries based on the partitioning attribute.
Skew

➢ When a relation is partitioned (by a technique other than round-robin), there may be a skew in the
distribution of tuples, with a high percentage of tuples placed in some partitions and fewer tuples in
other partitions.

➢ The ways that skew may appear are classified as:

• Attribute-value skew.

• Partition skew.
➢ Attribute-value skew refers to the fact that some values appear in the partitioning attributes of many
tuples. All the tuples with the same value for the partitioning attribute end up in the same partition,
resulting in skew.

➢ Partition skew refers to the fact that there may be load imbalance in the partitioning, even when there is
no attribute skew.

➢ Attribute-value skew can result in skewed partitioning regardless of whether range partitioning or hash
partitioning is used.

➢ If the partition vector is not chosen carefully, range partitioning may result in partition skew.

➢ Partition skew is less likely with hash partitioning, if a good hash function is chosen.
Interquery Parallelism

➢ In interquery parallelism, different queries or transactions execute in parallel with one another.

➢ Transaction throughput can be increased by this form of parallelism.

➢ However, the response times of individual transactions are no faster than they would be if the
transactions were run in isolation.

➢ Thus, the primary use of interquery parallelism is to scale up a transaction-processing system to support
a larger number of transactions per second.
[Figure: Interquery parallelism. Query 1 runs on processor 1, query 2 on processor 2, ..., query n on
processor n; each processor produces its own result independently.]
Intraquery Parallelism

➢ Intraquery parallelism refers to the execution of a single query in parallel on multiple processors and
disks.

➢ Using intraquery parallelism is important for speeding up long-running queries.

➢ Interquery parallelism does not help in this task, since each query is run sequentially.

➢ To illustrate the parallel evaluation of a query, consider a query that requires a relation to be sorted.

➢ Suppose that the relation has been partitioned across multiple disks by range partitioning on some
attribute, and the sort is requested on the partitioning attribute. The sort operation can be implemented
by sorting each partition in parallel, then concatenating the sorted partitions to get the final sorted
relation.
The execution of a single query can be parallelized in two different ways:

• Intraoperation parallelism. We can speed up processing of a query by parallelizing the execution
of each individual operation, such as sort, select, project, and join.

• Interoperation parallelism. We can speed up processing of a query by executing in parallel the
different operations in a query expression.
INTRAOPERATION PARALLELISM

➢ Since relational operations work on relations containing large sets of tuples, we can parallelize the
operations by executing them in parallel on different subsets of the relations.

➢ Since the number of tuples in a relation can be large, the degree of parallelism is potentially enormous.

➢ Thus, intraoperation parallelism is natural in a database system.

➢ SELECT * FROM Email ORDER BY Start_Date;

In the above query, the relational operation is sorting. Since a table may have a large number of records in it,
the operation can be performed on different subsets of the table in multiple processors, which reduces the
time required to sort.

➢ Examples of Intraoperation parallelism:

❑ Parallel Sort

❑ Parallel Join
Intra-Operation Parallelism

• Parallel Sort: Range-Partitioning Sort, Parallel External Sort-Merge
• Parallel Join: Partitioned Join, Fragment-and-Replicate Join (Asymmetric Fragment-and-Replicate, and
the general Fragment-and-Replicate)
❑ Parallel Sort

➢ Suppose that we wish to sort a relation that resides on n disks D0, D1, . . . , Dn−1.

➢ If the relation has been range-partitioned on the attributes on which it is to be sorted, we can sort each
partition separately, and can concatenate the results to get the full sorted relation.

➢ Since the tuples are partitioned on n disks, the time required for reading the entire relation is reduced by
the parallel access.

➢ If the relation has been partitioned in any other way, we can sort it in one of two ways:

1. We can range-partition it on the sort attributes, and then sort each partition separately.

2. We can use a parallel version of the external sort–merge algorithm.


Range-Partitioning Sort

▪ Range-partitioning sort works in two steps: first range partitioning the relation, then sorting each partition
separately.

▪ When we sort by range partitioning the relation, it is not necessary to range-partition the relation on the
same set of processors or disks as those on which that relation is stored.

▪ Suppose that we choose processors P0, P1, . . . , Pm, where m < n, to sort the relation.

▪ There are two steps involved in this operation:

1. Redistribute the tuples in the relation, using a range-partition strategy, so that all tuples that lie
within the ith range are sent to processor Pi, which stores the relation temporarily on disk Di .

To implement range partitioning, in parallel every processor reads the tuples from its disk and sends
the tuples to their destination processors. Each processor P0, P1, . . . , Pm also receives tuples
belonging to its partition, and stores them locally. This step requires disk I/O and communication
overhead.
2. Each of the processors sorts its partition of the relation locally, without interaction with the other
processors. Each processor executes the same operation—namely, sorting—on a different data set.
(Execution of the same operation in parallel on different sets of data is called data parallelism.)

The final merge operation is trivial, because the range partitioning in the first phase ensures that, for 1
≤ i < j ≤ m, the key values in processor Pi are all less than the key values in Pj .
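
Before the worked example below, here is a minimal Python sketch of the two steps (a sequential simulation under assumed names; a real system would run the redistribution and the local sorts on separate processors). The range test follows the convention of the example that follows, where a boundary value itself falls into the lower range:

def range_partition_sort(partitions, attr, vector):
    # Step 1: redistribute every tuple to the processor owning its range.
    m = len(vector) + 1
    redistributed = [[] for _ in range(m)]
    for part in partitions:                       # each Pi would do this in parallel
        for t in part:
            i = sum(v < t[attr] for v in vector)  # index of the range holding t[attr]
            redistributed[i].append(t)
    # Step 2: each processor sorts its own range locally (data parallelism).
    sorted_parts = [sorted(p, key=lambda t: t[attr]) for p in redistributed]
    # The final merge is plain concatenation: every key on Pi is below every key on Pj for i < j.
    return [t for p in sorted_parts for t in p]

employee = [[{"id": 1, "salary": 11500}, {"id": 4, "salary": 12000}],
            [{"id": 2, "salary": 9000},  {"id": 5, "salary": 20000}],
            [{"id": 3, "salary": 2000},  {"id": 6, "salary": 8000}]]
print(range_partition_sort(employee, "salary", [14000, 24000]))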
Assume that relation Employee is permanently partitioned using Round-robin technique into 3 disks D0, D1, and
D2 which are associated with processors P0, P1, and P2. At processors P0, P1, and P2, the relations are named
Employee0, Employee1 and Employee2 respectively. This initial state is given in Figure 1.
Assume that the following sorting query is initiated.

SELECT * FROM Employee ORDER BY Salary;

As already said, the table Employee is not partitioned on the sorting attribute Salary. Then, the Range-
Partitioning technique works as follows;

Step 1:

At first we have to identify a range vector v on the Salary attribute. The range vector is of the form v[v0, v1, …, vn-2]. For
our example, let us assume the following range vector;
v[14000, 24000]

This range vector represents 3 ranges, range 0 (14000 and less), range 1 (14001 to 24000) and range 2 (24001 and more).

Redistribute the relations Employee0, Employee1 and Employee2 using these range vectors into 3 disks temporarily.

After this distribution disk 0 will have range 0 records (i.e, records with salary value less than or equal to 14000), disk 1
will have range 1 records (i.e, records with salary value greater than 14000 and less than or equal to 24000), and disk 2
will have range 2 records (i.e, records with salary value greater than 24000).
This redistribution according to range vector v is represented in Figure 2 as links to all the disks from
all the relations.

Temp_Employee0, Temp_Employee1, and Temp_Employee2 are the relations after successful redistribution.

These tables are stored temporarily in disks D0, D1, and D2. (They can also be stored in main memories
(M0, M1, M2) if they fit into RAM.)
Step 2:

Now we have temporary relations at all the disks after redistribution.


At this point, all the processors sort the data assigned to them in ascending order of Salary individually.

The process of performing the same operation in parallel on different sets of data is called Data Parallelism.

Final Result:

After the processors completed the sorting, we can simply collect the data from different processors and merge
them.

This merge process is straightforward as we have data already sorted for every range.

Hence, collecting sorted records from partition 0, partition 1 and partition 2 and merging them will give us
final sorted output.
Parallel External Sort–Merge

➢ Parallel external sort–merge is an alternative to range partitioning.

➢ Suppose that a relation has already been partitioned among disks D0, D1, . . . , Dn−1 (it does not matter
how the relation has been partitioned).

➢ Parallel external sort–merge then works this way:

1. Each processor Pi locally sorts the data on disk Di .

2. The system then merges the sorted runs on each processor to get the final sorted output.
➢ The merging of the sorted runs in step 2 can be parallelized by this sequence of actions:

1. The system range-partitions the sorted partitions at each processor Pi (all by the same partition
vector) across the processors P0, P1, . . . , Pm−1. It sends the tuples in sorted order, so that each
processor receives the tuples in sorted streams.

2. Each processor Pi performs a merge on the streams as they are received, to get a single sorted run.

3. The system concatenates the sorted runs on processors P0, P1, . . . , Pm−1 to get the final result.
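
A minimal Python sketch of the same idea (a sequential simulation with assumed names; heapq.merge stands in for the merge each receiving processor performs on its incoming sorted streams):

import heapq

def parallel_external_sort_merge(disks, vector):
    # Step 1: every processor Pi sorts its local data (plain values stand in for tuples).
    runs = [sorted(d) for d in disks]
    m = len(vector) + 1
    streams = [[] for _ in range(m)]
    # Step 2a: each sorted run is range-partitioned by the same vector; tuples are
    # sent in sorted order, so every receiver gets already-sorted streams.
    for run in runs:
        buckets = [[] for _ in range(m)]
        for x in run:
            buckets[sum(v < x for v in vector)].append(x)
        for i, b in enumerate(buckets):
            streams[i].append(b)                  # one sorted stream per sender
    # Step 2b: each receiving processor merges its incoming streams into one run...
    merged = [list(heapq.merge(*s)) for s in streams]
    # Step 2c: ...and concatenating the runs gives the final sorted relation.
    return [x for part in merged for x in part]

salaries = [[20000, 9000], [2000, 12000], [8000, 11500, 7100]]
print(parallel_external_sort_merge(salaries, vector=[8000, 10000]))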
Assume that relation Employee is permanently partitioned using Round-robin technique into 3 disks D0, D1,
and D2 which are associated with processors P0, P1, and P2. At processors P0, P1, and P2, the relations are
named Employee0, Employee1 and Employee2 respectively. This initial state is given in Figure 1.
Assume that the following sorting query is initiated.

SELECT * FROM Employee ORDER BY Salary;

As already said, the table Employee is not partitioned on the sorting attribute Salary. Then, the Parallel
External Sort-Merge technique works as follows;

Step 1:

Sort the data stored in every partition (every disk) using the ordering attribute Salary.

(Sorting of data in every partition is done temporarily).

At this stage every Employeei contains salary values ranging from its local minimum to maximum.

The partitions sorted in ascending order are shown below, in Figure 2.


Step 2:

We have to identify a range vector v on the Salary attribute. The range vector is of the form
v[v0, v1, …, vn-2]. For our example, let us assume the following range vector:
v[14000, 24000]
This range vector represents 3 ranges, range 0 (14000 and less), range 1 (14001 to 24000) and range 2
(24001 and more).
Redistribute every partition (Employee0, Employee1 and Employee2) using this range vector into 3 disks
temporarily. The status of Temp_Employee 0, 1, and 2 after distributing Employee0 is given in Figure 3.
Step 3:
Actually, the above distribution is executed at all processors in parallel, such that processors P0, P1,
and P2 send the first partition of Employee 0, 1, and 2 to disk 0. Upon receiving the records from the
various partitions, the receiving processor P0 merges the sorted data. This is shown in Figure 4.
The same process is carried out at all processors for the other partitions. The final versions of
Temp_Employee 0, 1, and 2 are shown in Figure 5.
❑ Parallel Join

➢ Parallel join algorithms attempt to split the pairs to be tested over several processors.

➢ Each processor then computes part of the join locally. Then, the system collects the results from each
processor to produce the final result.
Parallel Join

• Partitioned Join
• Fragment-and-Replicate Join: Asymmetric Fragment-and-Replicate, and the general Fragment-and-Replicate
➢ Partitioned Join

▪ Suppose that we are using n processors and that the relations to be joined are r and s.

▪ Partitioned join then works this way: The system partitions the relations r and s each into n
partitions, denoted r0, r1, . . . , rn−1 and s0, s1, . . . , sn−1.

▪ The system sends partitions ri and si to processor Pi, where their join is computed locally.

▪ The partitioned join technique works correctly only if the join is an equi-join (for example,
r ⋈r.A=s.B s) and if we partition r and s by the same partitioning function on their join attributes.

▪ The idea of partitioning is exactly the same as that behind the partitioning step of hash join.

▪ In a partitioned join, however, there are two different ways of partitioning r and s:

• Range partitioning on the join attributes.

• Hash partitioning on the join attributes.


▪ Let us assume the following;

▪ The RegNo attributes of tables STUDENT and COURSES_REGD are used for joining.

▪ Observe the order of tuples in both tables. They are not in any particular order; they are stored in
random order on RegNo.
▪ Partition the tables on RegNo attribute using Hash Partition. We have 2 disks and we need to partition the
relational tables into two partitions (possibly equal). Hence, n is 2.

▪ The hash function is, h(RegNo) = (RegNo mod n) = (RegNo mod 2). And, if we apply the hash function we
shall get the tables STUDENT and COURSES_REGD partitioned into Disk0 and Disk1 as stated below.
▪ From the above table, it is very clear that the same RegNo values of both tables STUDENT and
COURSES_REGD are sent to same partitions. Now, join can be performed locally at every processor in
parallel.

▪ One more interesting fact about this join is, only 4 (2 Student records X 2 Courses_regd records)
comparisons need to be done in every partition for our example.

▪ Hence, we need total of 8 comparisons in partitioned join against 16 (4 X 4) in conventional join.

▪ The above discussed process is shown in Figure 1.


Points to note:

1. There are only two ways of partitioning the relations,


Range partitioning on the join attributes or
Hash partitioning on the join attributes.

2. Only equi-joins and natural joins can be performed in parallel using Partitioned Join technique.

3. Non-equi-joins cannot be performed with this method.

4. After successful partitioning, the records at every processor can be joined locally using any of the joining
techniques hash join, merge join, or nested loop join.

5. If the range partitioning technique is used to partition the relations onto n processors, skew may
present a special problem. That is, for some partitions, we may get fewer records (tuples) of one relation
for a given range and many records of the other relation for the same range.

6. With hash partitioning, skew can still arise if many tuples share the same value on the join attribute,
since all of them land in the same partition. Otherwise, skew has minimal effect.

7. The number of comparisons between relations is greatly reduced in the partitioned join parallel technique.
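
A minimal Python sketch of the partitioned join described above (a sequential simulation; the table contents mirror the RegNo example but all names are otherwise illustrative). Both relations are hash-partitioned with h(RegNo) = RegNo mod n, and each "processor" then performs a local hash join:

def partitioned_join(r, s, key, n):
    # Partition both relations with the same hash function on the join attribute.
    r_parts = [[] for _ in range(n)]
    s_parts = [[] for _ in range(n)]
    for t in r:
        r_parts[t[key] % n].append(t)
    for u in s:
        s_parts[u[key] % n].append(u)
    # Each processor Pi joins ri with si locally (a hash join here).
    result = []
    for ri, si in zip(r_parts, s_parts):
        index = {}
        for t in ri:
            index.setdefault(t[key], []).append(t)
        for u in si:
            for t in index.get(u[key], []):
                result.append({**t, **u})
    return result

student = [{"RegNo": 1, "Name": "A"}, {"RegNo": 2, "Name": "B"},
           {"RegNo": 3, "Name": "C"}, {"RegNo": 4, "Name": "D"}]
courses = [{"RegNo": 1, "Course": "DB"}, {"RegNo": 2, "Course": "OS"},
           {"RegNo": 3, "Course": "CN"}, {"RegNo": 4, "Course": "AI"}]
print(partitioned_join(student, courses, "RegNo", n=2))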
Fragment-and-Replicate Join

➢ Partitioning is not applicable to all types of joins. For instance, if the join condition is an inequality,
such as r ⋈r.a<s.b s, it is possible that all tuples in r join with some tuple in s (and vice versa).

➢ Thus, there may be no easy way of partitioning r and s so that tuples in partition ri join with only
tuples in partition si .

➢ We can parallelize such joins by using a technique called fragment and replicate.

Fragment-and-Replicate Join

• Asymmetric fragment-and-replicate join
• Fragment-and-replicate join (general case)
Asymmetric fragment and replicate join

1. The system partitions one of the relations—say, r . Any partitioning technique can be used on r ,
including round-robin partitioning.

2. The system replicates the other relation, s, across all the processors.

3. Processor Pi then locally computes the join of ri with all of s, using any join technique.
Fragment-and-replicate join

➢ It is the general case of the Asymmetric Fragment-and-Replicate join technique. The asymmetric technique
is best suited if one of the relations to be joined is small and fits into memory. If the relations to be
joined are large, and the join is a non-equi-join, then we need to use the general Fragment-and-Replicate Join.

➢ It works as follows;
1. The system fragments table r into m fragments such that r0, r1, r2, .., rm-1, and s into n fragments
such that s0, s1, s2, .., sn-1 . Any partitioning technique, round-robin, hash or range partitioning
could be used to partition the relations.

2. The values for m and n are chosen based on the availability of processors. That is, we need at least
m*n processors to perform the join.

3. Now we have to distribute all the partitions of r and s into available processors. And, remember that
we need to compare every tuple of one relation with every tuple of other relation. That is the
records of r0 partition should be compared with all partitions of s, and the records of partition s0
should be compared with all partitions of r. This must be done with all the partitions of r and s as
mentioned above.
Hence, the data distribution is done as follows;

i. As we need m*n processors, let us assume that we have processors P0,0, P0,1, …, P0,n-1,
P1,0, P1,1, …, Pm-1,n-1. Thus, processor Pi,j performs the join of ri with sj.

ii. To ensure the comparison of every partition of r with every other partition of s, we replicate
ri with the processors, Pi,0, Pi,1, Pi,2, …, Pi,n-1, where 0, 1, 2, …, n-1 are partitions of s.
This replication ensures the comparison of every ri with complete s.

iii. To ensure the comparison of every partition of s with every other partition of r, we replicate
si with the processors, P0,i, P1,i, P2,i, …, Pm-1,i, where 0, 1, 2, …, m-1 are partitions of r.
This replication ensures the comparison of every si with complete r.

4. Pi,j computes the join locally to produce the join result.


Points to Note:

1. Asymmetric Fragment-and-replicate join is the special case of the general Fragment-and-replicate join
where m or n is 1, i.e., one of the relations is not partitioned (it is replicated in full).

2. When compared to asymmetric technique, Fragment-and-replicate join reduces the size of the tables at
every processor.

3. Any partitioning techniques can be used and any joining technique can be used as well.

4. Fragment-and-replicate technique suits both Equi-join and Non-equi join.

5. It usually involves a higher cost than partitioning, since relations have to be replicated.
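
A minimal Python sketch of the general fragment-and-replicate scheme (a sequential simulation with assumed names; the m*n "processors" P(i,j) are just loop iterations). It handles a non-equi-join predicate, which partitioned join cannot:

from itertools import product

def fragment_and_replicate_join(r, s, m, n, theta):
    # Fragment r into m pieces and s into n pieces (round-robin here; any scheme works).
    r_frags = [r[i::m] for i in range(m)]
    s_frags = [s[j::n] for j in range(n)]
    result = []
    # Processor P(i,j) holds a replica of ri and sj and joins them locally, so
    # every r-tuple meets every s-tuple exactly once across all processors.
    for i, j in product(range(m), range(n)):
        for t, u in product(r_frags[i], s_frags[j]):
            if theta(t, u):
                result.append((t, u))
    return result

r = [1, 5, 9]
s = [2, 4, 8]
# Non-equi-join on the condition r.a < s.b
print(fragment_and_replicate_join(r, s, m=2, n=2, theta=lambda a, b: a < b))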


INTEROPERATION PARALLELISM

➢ It is about executing different operations of a query in parallel. A single query may involve multiple
operations at once.

➢ We may exploit parallelism to achieve better performance of such queries. Consider the example query
given below;

SELECT AVG(Salary) FROM Employee GROUP BY Dept_Id;

➢ It involves two operations. The first one is an aggregation and the second is grouping. For executing this
query, we need to group all the employee records based on the attribute Dept_Id first.

➢ Then, for every group we can apply the AVG aggregate function to get the final result.

➢ We can use Interoperation parallelism concept to parallelize these two operations.


❑ Pipelined Parallelism

➢ In Pipelined Parallelism, the idea is to consume the result produced by one operation by the next
operation in the pipeline.

➢ For example, consider the following operation;

r1 ⋈ r2 ⋈ r3 ⋈ r4

➢ The above expression shows a natural join operation. This actually joins four tables. This operation can
be pipelined as follows;

▪ Perform temp1 ← r1 ⋈ r2 at processor P1 and send the result temp1 to processor P2 to perform
temp2 ← temp1 ⋈ r3 and send the result temp2 to processor P3 to perform result ← temp2 ⋈
r4.

▪ The advantage is, we do not need to store the intermediate results, and instead the result
produced at one processor can be consumed directly by the other. Hence, we would start
receiving tuples well before P1 completes the join assigned to it.
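
A minimal sketch of the idea using Python generators (an illustration only: the stages here interleave on one thread, whereas real pipelined parallelism runs them concurrently on different processors; relation contents and names are made up):

def scan(relation):
    # Producer stage: emits tuples one at a time.
    for t in relation:
        yield t

def join_stage(left_stream, right, key):
    # Consumes the incoming stream tuple by tuple, joining against a local
    # relation; results flow to the next stage without being materialized.
    index = {}
    for u in right:
        index.setdefault(u[key], []).append(u)
    for t in left_stream:
        for u in index.get(t[key], []):
            yield {**t, **u}

r1 = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
r2 = [{"k": 1, "b": "p"}, {"k": 2, "b": "q"}]
r3 = [{"k": 1, "c": "m"}, {"k": 2, "c": "n"}]
# P1 computes r1 join r2; P2 consumes P1's output as it arrives and joins with r3.
for row in join_stage(join_stage(scan(r1), r2, "k"), r3, "k"):
    print(row)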
➢ Disadvantages:

1. Pipelined parallelism is not a good choice if the degree of parallelism is high.

2. It is useful only with a small number of processors.

3. Not all operations can be pipelined. For example, consider the aggregation query given in the first
section: at least one department's employees must be fully grouped before their output can be passed to
the aggregate operation at the next processor.

4. We cannot expect full speedup.


❑ Independent Parallelism:

➢ Operations that do not depend on each other can be executed in parallel at different processors. This
is called Independent Parallelism.

➢ For example, in the expression r1 ⋈ r2 ⋈ r3 ⋈ r4, the portion r1 ⋈ r2 can be done in one processor,
and r3 ⋈ r4 can be performed in the other processor. Both results can be pipelined into the third
processor to get the final result.

➢ Disadvantages:

1. Does not work well with a high degree of parallelism.


Parallelizing the SELECTION operation

➢ Assume that the table is partitioned and stored in disks D0, D1, …, Dn-1 with processors P0, P1, …,
Pn-1.

➢ For example, consider the following figure where the table Employee is partitioned using Round-robin
partitioning technique
➢ Let us consider the following general syntax of any SQL select query:

SELECT list_of_attributes FROM table_name WHERE condition;

➢ This query can be parallelized based on the condition given in the WHERE clause.

CASE 1: If the condition is of the form “a = value”, then:

If the given table is partitioned on a, we need to execute the selection operation at only the single
processor whose partition can contain tuples with the given value.

CASE 2: If the condition is of the form “value1 <= a <= value2” (i.e., a falls in a range), then:

If the given relation is range-partitioned on a, we need to perform the selection at all the
processors whose ranges overlap with the given range value1 to value2.

CASE 3: In all the other cases, the selection is performed in parallel at all the processors. That is, for
all the other types of queries, the selection can be done in parallel at all the processors, because the
data are partitioned over several processors.
➢ As an example for CASE 3, let us assume a query which requests information from a table with a non-
key attribute in the WHERE clause condition. Refer to the Employee table in Figure 1.

SELECT * FROM emp WHERE ename = ‘Kumar’;

➢ The query involves a condition on the name attribute, and we do not create an index or key on the name
attribute, as many employees may share the same name.

➢ Hence, this query needs to examine all records of Employee table to give the final result. So, we can
distribute the query to all the processors wherever Employee table's partitions are stored, and execute the
query in parallel on all the processors.

➢ According to the example in Figure 1, processors P0, P1, and P2 have to execute the query locally, even
if the matching data happens to be available at D0 alone.

➢ The final result is generated from the local results of these processors. This way the parallelism for
SELECTION operation can be achieved.
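
A small Python sketch of this routing decision (assumed names; it assumes the relation is range-partitioned on attribute a using the vi <= x < vi+1 convention from the partitioning section):

def processors_for_selection(vector, predicate):
    # vector = [v0, ..., v(n-2)] used to range-partition the relation on a.
    n = len(vector) + 1
    part = lambda x: sum(v <= x for v in vector)   # partition index holding value x
    kind = predicate[0]
    if kind == "eq":                               # CASE 1: a = value
        return [part(predicate[1])]
    if kind == "range":                            # CASE 2: value1 <= a <= value2
        lo, hi = predicate[1], predicate[2]
        return list(range(part(lo), part(hi) + 1))
    return list(range(n))                          # CASE 3: everything else

vector = [8000, 10000]
print(processors_for_selection(vector, ("eq", 9000)))           # [1]
print(processors_for_selection(vector, ("range", 7000, 9000)))  # [0, 1]
print(processors_for_selection(vector, ("other",)))             # [0, 1, 2]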
Parallel execution of Duplicate elimination, Projection, and Aggregation operations

❑ Duplicate elimination

➢ Duplicate elimination is about removing duplicate values, i.e., neglecting repeated values that are
stored in an attribute for various reasons.

➢ Duplicate elimination can be achieved in the following two ways in a parallel database:

1. During parallel sort, if we find any repeated values while partitioning, those can be discarded
immediately. (This method is for tables that are not partitioned).

2. We can partition the table into many partitions (using range or hash partitioning), and instruct all
the processors to sort the data locally and remove the duplicates. (This works only for the data that
are hash or range partitioned on the duplicate elimination attribute)
❑ Projection:

➢ Projection means selection of one or more attributes from a table with the records stored in them.

➢ This operation can be parallelized as follows;

1. Projection without duplicate elimination: while you read data into various disks during partitioning,
you can project the required columns.

2. Projection with duplicate elimination: any of the techniques suggested in Duplicate elimination
section above can be used.
❑ Aggregation

➢ Aggregation operation involves finding the count of records, sum of values stored in an attribute,
minimum or maximum value of all the values stored in an attribute, and average value of all the
attribute values.

➢ This operation basically needs grouping.

➢ That is, for example, we can find the sum of salary for all the records of Employee table, or we can
find sum of salary with some filter conditions.

➢ In the first case, all the records of the Employee table come under one group. In the latter case, we
choose the groups based on the conditions included.
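
A minimal Python sketch of parallelizing the AVG ... GROUP BY query from the interoperation section (assumed names; each "processor" computes partial (sum, count) pairs on its local partition, and the partials are then combined):

from collections import defaultdict

def local_partial_agg(partition):
    # Each processor aggregates its own partition: Dept_Id -> [sum, count].
    partial = defaultdict(lambda: [0, 0])
    for row in partition:
        p = partial[row["Dept_Id"]]
        p[0] += row["Salary"]
        p[1] += 1
    return partial

def combine(partials):
    # Merge the partial results and finish the average.
    total = defaultdict(lambda: [0, 0])
    for partial in partials:
        for dept, (s, c) in partial.items():
            total[dept][0] += s
            total[dept][1] += c
    return {dept: s / c for dept, (s, c) in total.items()}

parts = [[{"Dept_Id": 10, "Salary": 1000}, {"Dept_Id": 20, "Salary": 3000}],
         [{"Dept_Id": 10, "Salary": 2000}]]
print(combine(local_partial_agg(p) for p in parts))   # {10: 1500.0, 20: 3000.0}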
DISTRIBUTED DATABASES

➢ A distributed database represents multiple interconnected databases spread out across several sites
connected by a network. Since the databases are all connected, they appear as a single database to the
users.

➢ Distributed databases utilize multiple nodes. They scale horizontally and form a distributed system.
More nodes in the system provide more computing power, offer greater availability, and resolve the single
point of failure issue.

➢ Different parts of the distributed database are stored in several physical locations, and the processing
requirements are distributed among processors on multiple database nodes.

➢ A centralized distributed database management system (DDBMS) manages the distributed data as if it
were stored in one physical location.

➢ DDBMS synchronizes all data operations among databases and ensures that the updates in one database
automatically reflect on databases in other sites.
Difference between Centralized Database and Distributed Database

1. Centralized Database :

➢ A centralized database is basically a type of database that is stored, located as well as maintained at a
single location only.

➢ This type of database is modified and managed from that location itself. This location is thus mainly
any database system or a centralized computer system.

➢ The centralized location is accessed via an internet connection (LAN, WAN, etc). This centralized
database is mainly used by institutions or organizations.
Advantages –

➢ Since all data is stored at a single location only thus it is easier to access and coordinate data.

➢ The centralized database has very minimal data redundancy since all data is stored in a single place.

➢ It is cheaper in comparison to all other databases available.

Disadvantages –

➢ The data traffic in the case of centralized database is more.

➢ If any kind of system failure occurs at the centralized system, then the entire data will be destroyed.
2. Distributed Database :

➢ A distributed database is basically a type of database which consists of multiple databases that relate to
each other and are spread across different physical locations.

➢ The data that is stored on various physical locations can thus be managed independently of other
physical locations.

➢ The communication between databases at different physical locations is thus done by a computer
network.
Advantages –

➢ This database can be easily expanded as data is already spread across different physical locations.

➢ The distributed database can easily be accessed from different networks.

➢ This database is more secure in comparison to centralized database.

Disadvantages –

➢ This database is very costly, and it is difficult to maintain because of its complexity.

➢ In this database, it is difficult to provide a uniform view to user since it is spread across different
physical locations.
Centralized database vs. Distributed database:

1. Centralized: It is a database that is stored, located, and maintained at a single location only.
   Distributed: It is a database which consists of multiple databases that relate to each other and are
   spread across different physical locations.

2. Centralized: The data access time in the case of multiple users is more.
   Distributed: The data access time in the case of multiple users is less.

3. Centralized: The management, modification, and backup of the database are easier, as the entire data
   is present at the same location.
   Distributed: The management, modification, and backup are very difficult, as the data is spread
   across different physical locations.

4. Centralized: Provides a uniform and complete view to the user.
   Distributed: Since it is spread across different locations, it is difficult to provide a uniform view
   to the user.

5. Centralized: Has more data consistency.
   Distributed: May have some data replication, so data consistency is less.

6. Centralized: Users cannot access the database if a database failure occurs.
   Distributed: If one database fails, users still have access to the other databases.

7. Centralized: Less costly.
   Distributed: Very expensive.


Types of Distributed Databases
1. Homogeneous Distributed Database

➢ Identical software is used at all the sites. Here, a site refers to a server which is part of the
distributed database system. Software would mean the OS, the DBMS software, and even the structure of
the database used. In some cases, identical hardware is also used.

➢ All sites are well known to one another, as they are similar in terms of the DBMS software and
hardware used.

➢ Partial control over the data of other sites is possible, as we know the structure of the databases,
software, and hardware used at the other sites.

➢ The system looks like a single central database system.


2. Heterogeneous Distributed Database

➢ Different sites use different database software.

➢ The structure of the databases residing at different sites may differ (because of data partitions).

➢ Co-operation between sites is limited. That is, it is not easy to alter the structure of the database or
any other software used.
Different options for distributing a database in a distributed database system

➢ A database is distributed over network and stored on various sites in geographically different locations
for ease of access.

➢ In an actual case, a database may be stored in multiple sites as it is, or some tables of a database might
be stored at one site, the others at some other site and so on.

➢ The various options available to us to distribute database over different locations are,

1. Data replication – it is about keeping the same copies at different sites.

▪ The whole database may be reproduced and maintained at all or few of the sites, or

▪ A particular table may be reproduced and maintained at all or few of the sites
2. Horizontal partitioning – it is about partitioning a table by records without disturbing the structure of
the table. For example, if you have a table EMP which stores data according to a schema EMP(Eno,
Ename, Dept, Dept_location), then horizontal partitioning of EMP on Dept_location is about breaking
employee records according to the department location values and store different set of employee details
at different locations. The data at different locations will be different, but the schema will be the same, ie.,
EMP(Eno, Ename, Dept, Dept_location).

3. Vertical partitioning – it is about partitioning a table vertically, i.e., decomposition. Hence, the
partitions of the table at different locations will be of different structure.

For example, assume the schema EMP(Eno, Ename, Dept, Dept_location). If you would like to break the
above schema like one to store employee details and the other to store the department details, it can be
done as follows;

EMP(Eno, Ename, Dept), and DEPT(Dept, Dept_location)

These two tables might be stored at different locations for ease of access according to the defined
organization policies for example.
4. Hybrid approach – it is a combination of few or all of the above said techniques. That is, it may be a
combination like the few listed below;

· Horizontal partitioning and replication of few or all horizontal partitions.


· Vertical partitioning and replication of few or all vertical partitions.
· Vertical partitioning, followed by horizontal partitioning of some vertical partitions, followed by
replication of few horizontal partitions, etc.
Transparency

➢ The user of a distributed database system should not be required to know where the data are physically
located nor how the data can be accessed at the specific local site.

➢ This characteristic, called data transparency, can take several forms:

• Fragmentation transparency. Users are not required to know how a relation has been fragmented.

• Replication transparency. Users view each data object as logically unique. The distributed system
may replicate an object to increase either system performance or data availability. Users do not have to
be concerned with what data objects have been replicated, or where replicas have been placed.

• Location transparency. Users are not required to know the physical location of the data. The
distributed database system should be able to find any data as long as the data identifier is supplied by
the user transaction.
Transactions in Distributed Database Management System

➢ In a centralized database system, it is mandatory to perform any transaction (i.e., accessing any data
items) under the satisfaction of the ACID (Atomicity, Consistency, Isolation, and Durability) properties.

➢ The act of preserving the ACID properties for any transaction is mandatory in Distributed Database
(distributed transaction) also.

➢ In case of distributed transactions, there are two types based on the location of accessed data.

➢ The first type, local transactions, involves reads, writes, or updates of data in only one local
database, whereas global transactions involve reads, writes, or updates of data in many such local databases.

➢ Distributed database – Transaction system

▪ The Transaction system consists of two important components,

1. Transaction manager 2. Transaction Coordinator


➢ The Transaction Manager (similar to the transaction manager in a centralized database, but in a
distributed database we have one at every site), whose main job is to verify the ACID properties of
those transactions that execute at that site,

➢ Transaction Coordinator, (available for every site in distributed database) to manage and coordinate
various transactions (both local and global) initiated at that site.
➢ Each Transaction Manager is responsible for,

▪ Maintaining a log for recovery purpose,

▪ Participating in an appropriate concurrency-control scheme (more on this later) to coordinate the


concurrent execution of the transactions executing at that site.

➢ Every Transaction Coordinator is responsible for,

▪ Starting the execution of every transaction at that site,

▪ Breaking the transaction into a number of sub-transactions and distributing these sub-transactions
to the appropriate sites for execution (in the diagram, the links from TC1 to TM2, TM3, TM4, and so on
up to TMn indicate the distribution of sub-transactions to the concerned transaction managers),

▪ Coordinating the termination of the transaction.


Commit Protocols

➢ As we noted earlier, satisfaction of the ACID properties is very important.

➢ At the initial stage, we need to ensure that the transaction is atomic (i.e., either completed as a whole
or not at all).

➢ In distributed database, the transaction, say T, which is going on in multiple sites must be committed
at all sites to say that the transaction T is successfully completed.

➢ If not, the transaction T must be aborted at all the sites. To implement this, we need a commit protocol.

➢ The simplest among the commit protocols is the Two-Phase Commit (2PC) protocol, which is widely used.
Two Phase Commit protocol in Distributed Database

➢ Consider a transaction T initiated at site Sitei. And, at that site the transaction coordinator is TCi.

➢ When the transaction starts, TCi distributes the sub-transactions to the sites where the data needed for
those sub-transactions are available.

➢ When T has completed its execution at all the sites at which it executed, the transaction managers
(TMs) of those sites inform TCi about the completion.

➢ Then TCi starts the 2PC protocol.

➢ The set of messages used for communication in the 2PC protocol are:

MESSAGE       DESCRIPTION

<prepare T>   Sent by the coordinator to all the participating sites, asking them to prepare for commit.
              It is always sent by the coordinator whenever a transaction is ready.

<ready T>     Sent by the transaction manager of a participating site as the reply to a <prepare T>
              message, if that site is ready to commit the ongoing transaction.

<abort T>     Sent by the transaction manager of a participating site, and later by the coordinator to
              all the participating sites, if any one or more of the participating sites are not ready
              to commit.

<no T>        The log record written to the log file of the local system by the transaction manager of a
              participating site if it is not ready to commit (that site also sends <abort T> to the
              coordinator).

<commit T>    Sent by the coordinator if all the sites are ready to commit.
Phase 1

➢ Transaction Coordinator TCi inserts a <prepare T> message into the log file, and forces the log
onto stable storage (for example, hard disk) for recovery purposes.

➢ Then it sends the <prepare T> message to all the sites where the transaction T is being executed.

➢ On receiving such a message, the TM of the participating site must decide whether to commit or not,
based on its status.

➢ If the TM of the receiving site decides not to commit for some reason (failure of transaction,
message failure, locking, etc.), it writes <no T> to its log, and sends an <abort T> message to the
coordinator TCi.

➢ If the TM is ready to commit, then it sends a <ready T> message to the coordinator TCi. In both
cases (i.e., <no T> or <ready T>), the message is first written into the stable storage of the site where
the decision is made, and then sent back to the coordinator.
Phase 2

➢ When TCi receives reply messages for the <prepare T> message, or after the pre-specified time interval,
TCi can decide the fate of the transaction.

➢ Transaction T can be committed if TCi received a <ready T> message from all the participating sites of
the transaction T.

➢ Then TCi writes a <commit T> message into its stable storage and sends <commit T> to all the
participating sites for them to commit the transaction.

➢ If any one of the replies is <abort T>, or some site does not reply within the specified time interval,
the transaction must be aborted.

➢ In this case, an <abort T> message must be written into stable storage and sent to all the participating
sites so that they abort as well.
The Two-Phase Commit (2PC) protocol commits the transaction if all the participating sites are ready to
commit, and aborts the transaction if any of the participating sites is not ready for a commit.
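
To make the message flow concrete, here is a minimal Python sketch of 2PC (a single-process simulation under assumed names: sites are plain objects, "logs" are lists, and stable storage and timeouts are not modelled):

class Participant:
    def __init__(self, name, ready):
        self.name, self.ready, self.log = name, ready, []

    def on_prepare(self, T):
        # Phase 1 at a participating site: write the decision to the log, then vote.
        if self.ready:
            self.log.append(f"<ready {T}>")
            return "ready"
        self.log.append(f"<no {T}>")       # also implies sending <abort T> back
        return "abort"

    def on_decision(self, T, decision):
        # Phase 2 at a participating site: obey the coordinator's decision.
        self.log.append(f"<{decision} {T}>")

def coordinator_2pc(T, participants):
    log = [f"<prepare {T}>"]               # Phase 1: log, then send <prepare T>
    votes = [p.on_prepare(T) for p in participants]
    # Phase 2: commit only if every site answered <ready T>.
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    log.append(f"<{decision} {T}>")
    for p in participants:
        p.on_decision(T, decision)
    return decision

sites = [Participant("S1", True), Participant("S2", True), Participant("S3", False)]
print(coordinator_2pc("T1", sites))        # "abort", since S3 voted <no T1>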
Three Phase Commit (3PC) protocol in distributed database failure recovery

➢ Two Phase Commit (2PC) is one of the failure recovery protocols commonly used in distributed
database management systems.

➢ It has a disadvantage of getting blocked under certain circumstances.

➢ For example, assume a case where the coordinator of a particular transaction has failed, and the
participating sites have all sent the <READY T> message to the coordinator. Now, the participating sites
have neither <ABORT T> nor <COMMIT T>. At this stage, no site can take a final decision on its own. The
only solution is to wait for the recovery of the coordinator site. Hence, 2PC is a blocking protocol.
➢ 3PC is a protocol that eliminates this blocking problem under certain basic requirements:

▪ No network partitioning

▪ At least one site must be available

▪ At most K simultaneous site failures are accepted


➢ 2PC has two phases, namely the voting phase and the decision phase. 3PC introduces a pre-commit phase
(serving as a buffer phase) as the third phase.

➢ 3PC works as follows;


Phase 1 (WAIT/VOTING):
▪ The Transaction Coordinator (TC) of the transaction writes a BEGIN_COMMIT message in its log file,
sends a PREPARE message to all the participating sites, and waits.

▪ Upon receiving this message, if a site is ready to commit, then the site's transaction manager (TM)
writes READY in its log and sends VOTE_COMMIT to the TC.

▪ If any site is not ready to commit, it writes ABORT in its log and responds with VOTE_ABORT to
the TC.
Phase 2 (PRE-COMMIT):

➢ If the TC receives VOTE_COMMIT from all the participating sites, then it writes PREPARE_TO_COMMIT in
its log and sends a PREPARE_TO_COMMIT message to all the participating sites.

➢ On the other hand, if the TC receives any one VOTE_ABORT message, it writes ABORT in its log, sends
GLOBAL_ABORT to all the participating sites, and also writes an END_OF_TRANSACTION message in its log.

➢ On receiving the message PREPARE_TO_COMMIT, the TMs of the participating sites write
PREPARE_TO_COMMIT in their logs and respond with a READY_TO_COMMIT message to the TC.

➢ If they receive a GLOBAL_ABORT message, then the TMs of the sites write ABORT in their logs and
acknowledge the abort. They also abort that particular transaction locally.
Phase 3 (COMMIT/DECIDING):

➢ If all responses are READY_TO_COMMIT, then the TC writes COMMIT in its log and sends a
GLOBAL_COMMIT message to all the participating sites' TMs.

➢ The TMs of those sites then write COMMIT in their logs and send an acknowledgement to the TC.
Then, the TC writes END_OF_TRANSACTION in its log.
DISTRIBUTED DATABASES - CONCURRENCY CONTROL

➢ Concurrency control schemes deal with the handling of data accessed by concurrent transactions.

➢ Various locking protocols are used for handling concurrent transactions in centralized database systems.

➢ There are no major differences between the schemes in centralized and distributed databases.

➢ The only major difference is the way the lock manager should deal with replicated data.

1. Single lock manager approach

2. Distributed lock manager approach

a) Primary Copy protocol

b) Majority protocol

c) Biased protocol

d) Quorum Consensus protocol


Single Lock Manager - Concurrency Control in Distributed Database

➢ In this approach, the distributed database system, which consists of several sites, maintains a single
lock manager at a chosen site, as shown in Figure 1.

➢ Observe Figure 1 for Distributed Sites S1, S2, …, S6 with Site S3 chosen as Lock-Manager Site.
➢ The technique works as follows;

➢ When a transaction requests locks on some data items, the request must be forwarded to the chosen
lock-manager site. This is done by the transaction manager of the site where the request is initiated.

➢ The lock manager at the chosen lock-manager site decides to grant the lock request immediately based
on the usual procedure. [That is, if a lock is already held on the requested data item by some other
transactions in an incompatible mode, lock cannot be granted. If the data item is free or data item is
locked in a compatible mode, the lock manager grants the lock]

➢ If the lock request is granted, the transaction can read from any site where a replica is available.

➢ On successful completion of transaction, the Transaction manager of initiating site can release the lock
through unlock request to the lock-manager site.
➢ Let us assume that the Transaction T1 is initiated at Site S5 as shown in Figure 2 (Step 1).

➢ Also, assume that the requested data item D is replicated in Sites S1, S2, and S6.

➢ The technique works as follows;

➢ Step 2 - The initiator site S5’s Transaction manager sends the lock request to lock data item D to the
lock-manager site S3.
➢ The Lock-manager at site S3 will look for the availability of the data item D.

➢ Step 3 - If the requested item is not locked by any other transactions, the lock-manager site responds
with lock grant message to the initiator site S5.

➢ Step 4 - As the next step, the initiator site S5 can use the data item D from any of the sites S1, S2,
and S6 for completing the Transaction T1.

➢ Step 5 - After successful completion of the Transaction T1, the Transaction manager of S5 releases
the lock by sending the unlock request to the lock-manager site S3.
Advantages:

➢ Locking can be handled easily. We need two messages for lock (one for request, the other for grant),
and one message for unlock requests. Also, this method is simple as it resembles the centralized
database.

➢ Deadlocks can be handled easily. The reason is, we have one lock manager who is responsible for
handling the lock requests.

Disadvantages:

➢ The lock-manager site becomes the bottleneck as it is the only site to handle all the lock requests
generated at all the sites in the system.

➢ Highly vulnerable to single point-of-failure. If the lock-manager site failed, then we lose the
concurrency control.
Distributed Lock Manager - Concurrency Control in Distributed Database

➢ In this approach, the function of lock-manager is distributed over several sites.

➢ Every DBMS server (site) has all the components like Transaction Manager, Lock-Manager, Storage
Manager, etc.

➢ In Distributed Lock-Manager, every site owns the data which is stored locally.

➢ This is true for a table that is fragmented into n fragments and stored in n sites. In this case, every
fragment is unique from every other fragment and completely owned by the site in which it is
stored. For those fragments, the local Lock-Manager is responsible to handle lock and unlock
requests generated by the same site or by other sites.

➢ If the data stored in a site is replicated at other sites, then a site cannot own the data completely.
In such a case, we cannot handle a lock request for that data item as in the case of fragmented data. If
we handled it like fragmented data, it would lead to inconsistency problems, as there are multiple
copies stored at several sites. This case is handled using several protocols which are specifically
designed for handling lock requests on replicated data.
(i) Primary Copy Protocol:

➢ Assume that we have the data item Q which is replicated in several sites and we choose one of the
replicas of data item Q as the Primary Copy (only one replica).

➢ The site which stores the Primary Copy is designated as the Primary Site.

➢ Any lock requests generated for data item Q at any sites must be routed to the Primary site.

➢ Primary site’s lock-manager is responsible for handling lock requests, though there are other sites
with same data item and local lock-managers.

➢ We can choose different sites as lock-manager sites for different data items.
➢ Q, R, and S are different data items that are replicated.

➢ Q is replicated in sites S1, S2, S3, and S5 (represented in blue colored text). Site S3 is designated
as the Primary site for Q (represented in purple colored text).

➢ R is replicated in sites S1, S2, S3, S4, and S6. Site S6 is designated as the Primary site for R.

➢ S is replicated at sites S1, S2, S4, S5, and S6. Site S1 is designated as the Primary site for S.
Step 1: Transaction T1 is initiated at site S5 and requests a lock on data item Q. Even though the data
item is available locally at S5, the lock-manager of S5 cannot grant the lock, because in our example
site S3 is designated as the primary site for Q. Hence the request must be routed to site S3 by the
transaction manager of S5.

Step 2: S5 sends a lock-request message for Q to S3.

Step 3: If the lock on Q can be granted, S3 grants it and sends a lock-grant message to S5.

On receiving the lock grant, S5 executes transaction T1 (on the copy of Q available locally; if S5 had
no local copy, it would have to access Q at one of the other sites where Q is available).

Step 4: On successful completion of the transaction, S5 sends an unlock message to the primary site S3.
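
➢ The essence of the primary copy protocol is a static routing rule. Below is a minimal Python sketch using the replica placement from the figure above (the function name is illustrative):

PRIMARY_SITE = {"Q": "S3", "R": "S6", "S": "S1"}

def lock_manager_site(item, requesting_site):
    """Return the site that must receive the lock request for item."""
    # Even when requesting_site holds a replica, it may not grant locally.
    return PRIMARY_SITE[item]

# S5 holds a replica of Q, yet its request is still routed to the primary S3.
assert lock_manager_site("Q", "S5") == "S3"

➢ Only one remote lock request per item is needed, but the primary site becomes a single point of failure for its items, much like the centralized scheme.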
(ii) Majority Based Protocol

➢ Assume that we have a data item Q that is replicated at several sites. The majority-based protocol
works as follows:

➢ A transaction that needs to lock Q has to request and obtain the lock on Q at half+one of the sites
at which Q is replicated (i.e., a majority of the sites holding Q).

➢ The lock-managers of all the sites at which Q is replicated handle lock and unlock requests locally
and individually.

➢ Irrespective of the lock type (read or write, i.e., shared or exclusive), half+one sites must be
locked.

➢ Example:
Let us assume that Q is replicated at 6 sites. Then we need to lock Q at 4 sites (half+one = 6/2 + 1 =
4). When transaction T1 sends the lock-request message to those 4 sites, their lock-managers grant
the locks following the usual locking rules.
In the figure,

Q, R, and S are the different data items.

Q is replicated in sites S1, S2, S3 and S6.

R is replicated in sites S1, S2, S3, and S4.

S is replicated at sites S1, S2, S4, S5, and S6.


➢ Let us assume that Transaction T1 needs data item Q to be locked (either read or write mode).

Step 1: Transaction T1 is initiated at site S5 and requests a lock on data item Q. Q is available at sites S1, S2, S3,
and S6. According to the protocol, T1 has to lock Q at half+one sites; in our example, any 3 out of the 4 replica
sites (4/2 + 1 = 3). Assume that we have chosen sites S1, S2, and S3.

Step 2: S5 sends lock-request messages for Q to S1, S2, and S3. The lock request is represented in purple color.

Step 3: If the lock on Q can be granted, S1, S2, and S3 grant the lock and send lock-grant messages to S5.
On receiving the lock grants, S5 executes transaction T1 (on the copy of Q taken from any one of the locked
sites). The lock grant is represented in green color.

Step 4: On successful completion of the transaction, S5 sends unlock messages to all three sites S1, S2, and S3.
The unlock message is represented in blue color.

Note: If transaction T1 writes data item Q, the changes must be forwarded to all the sites at which Q is
replicated. If the transaction only reads Q, no propagation is needed.
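
➢ A minimal Python sketch of the majority rule (the grant function stands in for each site's local lock-manager decision; names are illustrative):

REPLICA_SITES = {"Q": ["S1", "S2", "S3", "S6"]}

def majority(n):
    return n // 2 + 1  # "half + one", with integer division

def acquire_majority_lock(item, grant):
    sites = REPLICA_SITES[item]
    needed = majority(len(sites))     # 4 replicas -> 3 sites
    locked = []
    for s in sites:
        if grant(s):
            locked.append(s)
        if len(locked) == needed:
            return locked             # majority reached; T1 may proceed
    return None                       # majority not reached: wait or abort

# S5's request succeeds once any 3 of the 4 replica sites grant the lock.
print(acquire_majority_lock("Q", lambda s: s in {"S1", "S2", "S3"}))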
(iii) Biased Protocol

➢ The biased protocol is one of several protocols for handling concurrency control over replicated
data in a distributed database system.

➢ If a data item Q is replicated at n sites, then a read-lock (shared-lock) request message is sent to
any one of the sites at which Q is replicated, whereas a write-lock (exclusive-lock) request
message must be sent to all the sites at which Q is replicated.

➢ The lock-managers of all the sites at which Q is replicated handle lock and unlock requests locally
and individually.

➢ Example:
➢ Figure 1 shows the biased protocol handling a read request (shared lock).
In the figures,
Q, R, and S are the different data items.
Q is replicated in sites S1, S2, S3 and S6.
R is replicated in sites S1, S2, S3, and S4.
S is replicated at sites S1, S2, S4, S5, and S6.
➢ Let us assume that Transaction T1 needs data item Q.

➢ Step 1: Transaction T1 is initiated at site S5 and requests a lock on data item Q. Q is available at sites S1,
S2, S3, and S6. According to the protocol, T1 has to lock Q at any one site at which Q is replicated, i.e.,
any 1 out of the 4 replica sites. Assume that we have chosen site S3.

➢ Step 2: S5 sends a shared-lock request for Q to S3. The lock request is represented in purple color.

➢ Step 3: If the lock on Q can be granted, S3 grants it and sends a lock-grant message to S5.

On receiving the lock grant, S5 executes transaction T1 (the read is performed at the locked site, in our
case S3).

➢ Step 4: On successful completion of the transaction, S5 sends an unlock message to site S3.
Figure 2 shows the biased protocol handling a write request (exclusive lock).
➢ Let us assume that transaction T1 needs data item Q. Q is available at sites S1, S2, S3, and S6.
Sites S4 and S5, which do not hold Q, are represented in red color.

➢ Step 1: Transaction T1 is initiated at site S5 and requests a lock on data item Q. According to the
protocol, T1 has to lock Q at all the sites at which Q is replicated, i.e., in our example, all 4
replica sites.

➢ Step 2: S5 sends exclusive-lock requests for Q to S1, S2, S3, and S6. The lock request is represented
in purple color.

➢ Step 3: If the lock on Q can be granted at every site, all the sites respond with lock-grant
messages to S5. (If one or more sites cannot grant the lock, T1 cannot proceed.)

➢ On receiving the lock grants, S5 executes transaction T1 (when writing the data item, the
transaction performs the write on all replicas).

➢ Step 4: On successful completion of the transaction, S5 sends unlock messages to all the sites S1, S2,
S3, and S6.
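
➢ The read/write asymmetry of the biased protocol can be captured in a few lines of Python (a sketch with illustrative names):

REPLICA_SITES = {"Q": ["S1", "S2", "S3", "S6"]}

def sites_to_lock(item, mode):
    """Return the sites whose lock-managers must all grant the request."""
    replicas = REPLICA_SITES[item]
    if mode == "S":               # shared (read) lock: any one replica site
        return [replicas[0]]
    return list(replicas)         # exclusive (write) lock: every replica site

print(sites_to_lock("Q", "S"))  # ['S1']: reads are cheap (one site)
print(sites_to_lock("Q", "X"))  # ['S1', 'S2', 'S3', 'S6']: writes are costly

➢ The protocol is thus biased toward read-heavy workloads: a read locks one site, while a write pays the full replication cost.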
(iv) Quorum Consensus Protocol

➢ This is one of the distributed-lock-manager-based concurrency control protocols for distributed
database systems. It works as follows:

1. The protocol assigns a weight to each site that holds a replica of the data item.

2. For any data item, the protocol assigns a read quorum Qr and a write quorum Qw. Here, Qr and Qw are
two integers (each a sum of the weights of some set of sites), chosen so that both of the following
conditions hold:

Qr + Qw > S

- this rule avoids read-write conflicts (two transactions cannot read and write the same item concurrently).

2 * Qw > S

- this rule avoids write-write conflicts (two transactions cannot write the same item concurrently).

Here, S is the total weight of all sites at which the data item is replicated.
➢ How do we perform reads and writes on replicas?

➢ A transaction that needs a data item for reading has to lock enough sites: sites whose weights sum
to at least Qr. Because Qr + Qw > S, every read quorum intersects every write quorum.

➢ A transaction that needs a data item for writing has to lock enough sites: sites whose weights sum
to at least Qw.
➢ Let us assume a fully replicated distributed database with four sites S1, S2, S3, and S4.

➢ 1. According to the protocol, we assign a weight to every site. (The weight can be chosen based on
factors such as the site's availability, latency, etc.) For simplicity, let us assume a weight of 1 for
every site.

➢ 2. Let us choose Qr = 2 and Qw = 3. The total weight S is 4, and both conditions hold:

Qr + Qw > S => 2 + 3 > 4 (True)

2 * Qw > S => 2 * 3 > 4 (True)

➢ 3. Now, a transaction that needs a read lock on a data item has to lock 2 sites (total weight >= Qr = 2),
and a transaction that needs a write lock has to lock 3 sites (total weight >= Qw = 3).
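
➢ The quorum arithmetic from this example can be checked directly (a Python sketch using the example's unit weights):

WEIGHT = {"S1": 1, "S2": 1, "S3": 1, "S4": 1}
S = sum(WEIGHT.values())  # total weight = 4
QR, QW = 2, 3

# The two safety conditions of the protocol:
assert QR + QW > S        # 2 + 3 > 4: every read quorum meets every write quorum
assert 2 * QW > S         # 2 * 3 > 4: no two disjoint write quorums can exist

def quorum_reached(locked_sites, quorum):
    """True when the locked sites' combined weight meets the quorum."""
    return sum(WEIGHT[s] for s in locked_sites) >= quorum

print(quorum_reached({"S1", "S4"}, QR))        # True: weight 2 >= Qr
print(quorum_reached({"S1", "S2"}, QW))        # False: weight 2 < Qw, keep locking
print(quorum_reached({"S1", "S2", "S3"}, QW))  # True: weight 3 >= Qw

➢ With unit weights, Qr = 2 and Qw = 3 over four sites behave like a majority scheme for writes; choosing other weights and quorums lets the protocol be tuned toward cheaper reads or cheaper writes.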
