
Mutual Exclusion in Distributed Systems

+ Single Processor Systems


- use semaphore, monitor, etc.

+ Distributed Systems

- centralized algorithm
the central server coordinates the order in which processes enter the CS
the central site can become overloaded
the central site is a single point of failure for the system

Mutual Exclusion in Distributed Systems
+ decentralized algorithms
- non-token based algorithms
Lamport's algorithm
Ricart-Agrawala's algorithm
Maekawa's algorithm

- token based algorithms
token-ring algorithm
broadcast algorithm
tree-based algorithm

- self-stabilizing algorithm
Lamport's Algorithm
+ Request the CS:
1. P_i broadcasts request (t_i, i) to all processors and puts the request in its local queue (ordered by request timestamp t)
2. P_j, upon receiving request (t_i, i), puts the request in its local queue (ordered by request timestamp t) and sends reply (t_j, j) to P_i

+ Enter the CS:
1. if P_i has received reply messages with timestamps larger than t_i from all sites, and its request is at the top of the queue, then it enters the CS

+ Release the CS:
1. P_i, upon exiting the CS, removes its request from the queue and sends release (t_i) to all processors
2. P_j, upon receiving the release message, removes P_i's request from the top of its queue
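
The per-site bookkeeping in the steps above can be sketched as follows. This is a minimal illustration, not a full implementation: the class name LamportSite and its method names are hypothetical, message transport is left out, and requests are modelled as (timestamp, site id) tuples so that Python's tuple comparison gives the total order.

```python
import heapq


class LamportSite:
    """State kept by one site P_i (hypothetical helper, not from the slides)."""

    def __init__(self, site_id, all_ids):
        self.id = site_id
        self.others = set(all_ids) - {site_id}
        self.queue = []          # min-heap of (timestamp, site id) requests
        self.my_request = None   # (t_i, i) while requesting, else None
        self.replies = {}        # site id -> timestamp of the reply received

    def request(self, t):
        """Queue our own request; the caller broadcasts the returned tuple."""
        self.my_request = (t, self.id)
        heapq.heappush(self.queue, self.my_request)
        self.replies = {}
        return self.my_request

    def on_request(self, req):
        heapq.heappush(self.queue, req)

    def on_reply(self, t_j, j):
        self.replies[j] = t_j

    def can_enter(self):
        """Own request on top of the queue and a later reply from every site."""
        if self.my_request is None or self.queue[0] != self.my_request:
            return False
        t_i = self.my_request[0]
        return all(self.replies.get(j, -1) > t_i for j in self.others)

    def on_release(self, req):
        self.queue.remove(req)
        heapq.heapify(self.queue)
```

Two sites competing for the CS would each call request, exchange the returned tuples through on_request, and poll can_enter as replies arrive; the site with the smaller (t, i) tuple enters first.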

Lamport's Algorithm -- Properties
+ this algorithm requires
- a total ordering of events
- all sites to be alive
+ requires 3(N-1) messages per request
+ response time under very low load
- 2T
- T: per-message communication latency
- assume no one is in the CS
- N-1 request messages are sent in parallel (T)
- N-1 reply messages are sent in parallel (T)
- so the requester enters the CS after 2T
Ricart-Agrawala's Algorithm
+ Request the CS:
1. P_i broadcasts request (t_i, i) to all processors
2. P_j, upon receiving the request,
a) sends reply (t_j, j) to P_i if P_j is neither requesting nor executing in the CS
b) sends reply (t_j, j) to P_i if P_j is requesting the CS but the timestamp of P_j's request is larger than t_i
c) defers the request otherwise

+ Enter the CS:
1. if P_i has received reply messages from all sites, then it enters the CS

+ Release the CS:
1. P_i, upon exiting the CS, sends reply (i) to all the deferred requests
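
The reply-or-defer rule in step 2 can be written as one small function. This is a sketch under assumptions: the state names "idle", "requesting" and "in_cs" are invented for illustration, and requests are (timestamp, site id) tuples so that equal timestamps are broken by site id.

```python
def on_request(state, my_request, incoming):
    """Decide how P_j answers an incoming request (t_i, i).

    state:      "idle", "requesting" or "in_cs" (names are assumptions)
    my_request: P_j's own (t_j, j) while requesting, else None
    incoming:   the received (t_i, i)
    """
    if state == "idle":
        return "reply"                      # case a)
    if state == "requesting" and my_request > incoming:
        return "reply"                      # case b): our request is later
    return "defer"                          # case c)
```

A deferred request is answered when P_j leaves the CS, which is why no separate release message is needed.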

Ricart-Agrawala's Algorithm -- Properties
+ this algorithm requires
- a total ordering of events
- all sites to be alive
+ requires 2(N-1) messages per request
+ response time under very low load
- 2T
- N-1 request messages are sent in parallel (T)
- N-1 reply messages are sent in parallel (T)


Maekawa's Algorithm
+ Request set
- each node has a request set
- when the node wants to enter the critical section, it sends its request to all
nodes in its request set
- the request set of each node does not include all nodes in the system
- the intersection of any two request sets is non-empty

+ Example
- consider three nodes, X, Y, and Z
- X's request set includes nodes X and Y
- Y's request set includes nodes Y and Z
- Z's request set includes nodes Z and X
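
The intersection requirement can be checked mechanically for the three-node example. The function name pairwise_intersecting is hypothetical; the sets are the ones listed above.

```python
from itertools import combinations

# request sets from the X, Y, Z example
request_sets = {
    "X": {"X", "Y"},
    "Y": {"Y", "Z"},
    "Z": {"Z", "X"},
}


def pairwise_intersecting(sets):
    """True if every two request sets share at least one node."""
    return all(sets[a] & sets[b] for a, b in combinations(sets, 2))
```

Because any two sets share a node, that shared node sees both requests and can grant only one of them at a time.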
Maekawa's Algorithm
+ Request the CS:
1. P_i multicasts request (t_i, i) to its request set, including itself
2. P_j, upon receiving the request,
a) if it is not currently locked, locks itself and sends reply (j) to P_i
b) otherwise, puts the request in a queue (ordered by timestamp)
+ Enter the CS:
1. if P_i has received reply messages from all sites in its request set, then it enters the CS
+ Release the CS:
1. P_i, upon exiting the CS, sends release (t_i) to all processors in its request set
2. P_j, upon receiving the release message,
a) if the waiting queue is not empty, removes the first entry from the queue and sends reply (j) to that node
b) otherwise, unlocks itself
Maekawa's Algorithm -- Properties
+ requires a total ordering of events
+ requires 3√N messages per request
+ response time under very low load
- 2T
- K-1 request messages are sent in parallel (T)
- K-1 reply messages are sent in parallel (T)
+ has a potential deadlock problem

Potential Deadlock Problem in Maekawa's Algorithm
+ requests reach different sites in different order
- consider nodes X, Y, Z, which issue requests to enter the critical section
- X's request has the lowest timestamp, Z's request has the highest
- A is the mediator of requests from X and Y
- B is the mediator of requests from Y and Z
- C is the mediator of requests from X and Z
- A received X's request first and locked itself for X
- B received Y's request first and locked itself for Y
- C received Z's request first and locked itself for Z

- X will not get a reply from C
- Y will not get a reply from A
- Z will not get a reply from B
=> deadlock
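
The circular wait in this example can be made explicit with a wait-for graph. The sketch below is only an illustration of why the three requesters block each other; the helper names wait_for_graph and has_cycle are hypothetical and are not part of Maekawa's algorithm.

```python
# mediator -> requester it is currently locked for (from the example)
locked_for = {"A": "X", "B": "Y", "C": "Z"}
# requester -> mediators it needs replies from
request_set = {"X": ["A", "C"], "Y": ["A", "B"], "Z": ["B", "C"]}


def wait_for_graph(locked_for, request_set):
    """Edge r -> h: requester r waits for a mediator that is locked for h."""
    return {
        r: {locked_for[m] for m in mediators if locked_for[m] != r}
        for r, mediators in request_set.items()
    }


def has_cycle(graph):
    """Depth-first search for a cycle in a directed graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)
```

Here X waits for Z, Z waits for Y, and Y waits for X, so the graph contains a cycle and no requester can collect all its replies.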
Solution to the Potential Deadlock Problem
+ detect the potential deadlock
- a potential deadlock exists when a node receives a request with a smaller timestamp while it is locked for a request with a larger timestamp
+ resolution
- ask the requester with the larger timestamp to give up its granted privilege if it has not yet received all replies
in the previous example, C asks Z to give up the granted privilege

Resolve the Potential Deadlock Problem
+ Request the CS:
1. P_i multicasts request (t_i, i) to its request set, including itself
2. P_z, upon receiving the request,
a) if it is not currently locked, locks itself and sends reply (z) to P_i
b) if it is currently locked for P_k, then
if the request from P_k has the smaller timestamp, puts the new request in a waiting queue (ordered by timestamp) and sends failed (z) to P_i
otherwise (P_i's request has the smaller timestamp), sends inquire (z) to P_k
Resolve the Potential Deadlock Problem
+ Request the CS:
3. P_k, upon receiving inquire (z),
a) if it has received a failed message, sends relinquish (k) to all sites in its request set
b) if it has received all reply messages, ignores the inquire message
c) otherwise, simply waits
4. P_z, upon receiving relinquish (k),
a) changes its lock to P_i and sends reply (z) to P_i

+ Property
- requires at most 5√N messages per request
- response time under very low load: 2T
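
The mediator's choice between reply, failed and inquire (step 2) can be sketched as a single function. The name arbiter_on_request and the return convention are assumptions made for illustration. Requests are (timestamp, node id) tuples; keeping the older request in the waiting list while the inquire is outstanding is also an assumption, since the slides do not say where it is stored.

```python
def arbiter_on_request(locked_for, incoming, waiting):
    """React to request `incoming` = (t_i, i) at mediator P_z.

    locked_for: request (t_k, k) the mediator is locked for, or None
    waiting:    list of queued requests, kept sorted (mutated in place)
    Returns (new_locked_for, message, destination node id).
    """
    if locked_for is None:
        # step 2a: lock for the requester and grant
        return incoming, "reply", incoming[1]
    if locked_for < incoming:
        # step 2b, first case: the lock holder is older, the new request waits
        waiting.append(incoming)
        waiting.sort()
        return locked_for, "failed", incoming[1]
    # step 2b, second case: the new request is older, ask the holder to yield
    waiting.append(incoming)   # assumption: remembered until relinquish arrives
    waiting.sort()
    return locked_for, "inquire", locked_for[1]
```

For mediator C in the deadlock example (locked for Z, then receiving X's older request) the function returns an inquire addressed to Z, which matches the resolution described earlier.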
Request Set Generation
+ Assume
- N nodes in total
+ Let S_i denote the request set for P_i; the request sets have to satisfy
- S_i ∩ S_j ≠ ∅, for all i, j
- S_i always contains P_i, for all i
+ additional desirable properties
- |S_i| = |S_j| = K, for all i, j, and for some K
i.e., the request sets are of equal size, and each is of size K
- O(P_i) = O(P_j) = D, for all i and j
O(P_i) denotes the number of occurrences of P_i in all request sets
i.e., each node is involved in D request sets

Request Set Generation
+ relationship between K and D
- N nodes, each has a request set of size K
- the request sets contain NK entries in total (with duplicates)
- since there are N nodes, each node appears D times: ND = NK
=> K = D

+ request set size K
- consider the first request set: it has K nodes, and each of them can be in (K-1) other request sets
- every other request set should contain at least one of the nodes in the first request set
=> K(K-1) request sets in total other than the first one
=> N = K(K-1)+1, so K ≈ √N
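
Solving N = K(K-1)+1 for K gives K = (1 + √(4N-3)) / 2, which is an integer only for some N. A small helper (the name request_set_size is hypothetical) makes the relationship concrete:

```python
import math


def request_set_size(n):
    """Return K with n == K*(K-1)+1, or None if no such integer K exists."""
    disc = 4 * n - 3              # discriminant of K^2 - K + (1 - n) = 0
    root = math.isqrt(disc)
    if root * root != disc:
        return None
    k = (1 + root) // 2
    return k if k * (k - 1) + 1 == n else None
```

For example, N = 31 gives K = 6 and N = 7 gives K = 3, while N = 5 has no exact solution.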
Request Set Generation
+ assume N = K(K-1)+1, for some K, and K-1 is a prime number
+ consider a matrix of size (K-1) by (K-1)
+ it can generate K groups of K-1 nonintersecting sets
- K-1 nonintersecting rows
- K-1 nonintersecting columns
- (K-2) groups of (K-1) nonintersecting diagonals
- different diagonals: jump 1 on each row (the real diagonal), jump 2, ..., jump (K-1)-1
+ each number (out of the first K numbers) can be combined with each of the K-1 nonintersecting sets of one group to produce K-1 sets that intersect one another in exactly one element

Request Set Generation Example -- K=6
+ N = 6 * 5 + 1 = 31, K = 6, matrix is 5 by 5
- the first K numbers 123456 form one set
- 1 combined with each row forms one set
- 2 combined with each column forms one set
- 3 combined with each jump-1 diagonal forms one set
jump-1 diagonals: 7djpv, 8ekqr, 9flms, ...
- 4 combined with each jump-2 diagonal forms one set
jump-2 diagonals: 7elnu, 8fhov, 9gipr, ...
- 5 combined with each jump-3 diagonal forms one set
jump-3 diagonals: 7fiqt, 8gjmu, ...
- 6 combined with each jump-4 diagonal forms one set
jump-4 diagonals: 7gkos, 8clpt, ..., bfjnr
- total K(K-1)+1 = 31 sets
1 2 3 4 5 6
7 8 9 a b
c d e f g
h i j k l
m n o p q
r s t u v
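
The construction can be sketched in a few lines. The function name generate_request_sets is hypothetical, and the code assumes K-1 is prime, as the slides do. Node names follow the example: the first K labels form the first set and the rest fill the (K-1) by (K-1) matrix row by row.

```python
def generate_request_sets(k, labels):
    """Build K(K-1)+1 request sets from a (K-1) x (K-1) matrix (K-1 prime)."""
    m = k - 1
    first, rest = labels[:k], labels[k:]

    def cell(r, c):
        # matrix entry at row r, column c (columns wrap around)
        return rest[r * m + c % m]

    sets = [set(first)]
    # rows, each combined with the first number
    sets += [{first[0]} | {cell(r, c) for c in range(m)} for r in range(m)]
    # columns, each combined with the second number
    sets += [{first[1]} | {cell(r, c) for r in range(m)} for c in range(m)]
    # jump-j diagonals, each combined with the (j+2)-th number
    for j in range(1, m):
        sets += [{first[j + 1]} | {cell(r, c + j * r) for r in range(m)}
                 for c in range(m)]
    return sets


labels = "123456789abcdefghijklmnopqrstuv"   # the 31 node names of the example
sets = generate_request_sets(6, labels)
```

With these labels the jump-1 diagonal starting at 7 yields the set 37djpv, and every pair of the 31 sets shares exactly one node.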
Request Set Assignment Example -- K=6
- how to assign the 31 sets to the 31 nodes
- node 1 gets the first set: 123456
- the request set constructed from each row is assigned to the 2nd node in the set
e.g., request set 1789ab is assigned to node 7
- now all nodes in the first column have their request sets
- node 2 gets the set of 2 and the first column
- the request set constructed from each column is assigned to the 2nd node in the set
e.g., node 8 has request set 28dins
note that set 27chmr is assigned to node 2, not node 7
- now the first node of each column and each row has its request set
- the jump-X diagonals will be assigned to the rest of the nodes
nodes still without a request set:
3 4 5 6
d e f g
i j k l
n o p q
s t u v
Request Set Assignment Example -- K=6
- the request set constructed from each jump-1 diagonal is assigned to the 3rd node in the request set
e.g., request set 37djpv is assigned to node d
but set 3bciou is assigned to node 3, not node c
- the request set constructed from each jump-2 diagonal is assigned to the 4th node in the request set
e.g., request set 47elnu is assigned to node l
but set 48fhov is assigned to node 4, not node h
- the request set constructed from each jump-3 diagonal is assigned to the 5th node in the request set
e.g., request set 57fiqt is assigned to node q
but set 58gjmu is assigned to node 5, not node m
- the request set constructed from each jump-4 diagonal is assigned to the last node in the request set
e.g., request set 67gkos is assigned to node s
but set 6bfjnr is assigned to node 6, not node r
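
The assignment rule can be sketched as follows: each set goes to its preferred member (2nd node for rows and columns, (j+2)-th node for jump-j diagonals), or to its leading number when that member already owns a set. The function name assign_request_sets is hypothetical, and the code assumes K-1 is prime and the same node labels as the example.

```python
def assign_request_sets(k, labels):
    """Return {node: request set string} following the rule described above."""
    m = k - 1
    first, rest = labels[:k], labels[k:]
    owner = {first[0]: first}                     # node 1 gets the first set

    def give(members, preferred):
        # members: the set as an ordered string, leading number first
        # preferred: index of the node that should own the set;
        # fall back to the leading number if that node already has a set
        node = members[preferred]
        if node in owner:
            node = members[0]
        owner[node] = members

    for r in range(m):                            # rows -> 2nd node
        give(first[0] + rest[r * m:(r + 1) * m], 1)
    for c in range(m):                            # columns -> 2nd node
        give(first[1] + rest[c::m], 1)
    for j in range(1, m):                         # jump-j diagonals
        for c in range(m):
            diag = "".join(rest[r * m + (c + j * r) % m] for r in range(m))
            give(first[j + 1] + diag, j + 1)
    return owner


owner = assign_request_sets(6, "123456789abcdefghijklmnopqrstuv")
```

Every node ends up owning exactly one set, and that set contains the node itself, as required of a request set.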
Request Sets Generation Algorithm (Cont.)
+ if K-1 is a power of a prime number
- it is possible to generate optimal request sets

+ if K-1 is not a power of a prime number or N cannot be expressed as K(K-1)+1
- find the smallest integer M that is greater than N and can be expressed as K(K-1)+1, for some K, where K-1 is a power of a prime number
- generate the required sets for M processors
- replace numbers N+1..M by 1..M-N
- remove M-N sets

+ same thing can be done for site failures
Request Set Generation Example -- N=5
+ consider the closest number M that can be expressed as K(K-1)+1, with K-1 a power of a prime number
+ N=5 => M=7 (K=3)
+ derive the sets from M=7 and remove the duplicated nodes
1 2 3
4 5
1 2   -- nodes 6 and 7 replaced by 1 and 2
S1 = {1, 2, 3}
S4 = {1, 4, 5}
S6 = {1, 1, 2}  => removed
S2 = {2, 4, 1}
S5 = {2, 5, 2}  => {2, 5}
S7 = {3, 4, 2}  => removed
S3 = {3, 5, 1}
Token Ring Algorithm
+ a unique token is associated with the CS
+ P_i enters the CS only if it owns the token

+ Request to enter CS:
1. if P_j owns the token and does not need to enter the CS, it passes the token to P_((j+1) mod N)
2. P_i will eventually get the token

+ Enter the CS:
1. when P_i owns the token, it enters the CS

+ Release the CS:
1. pass the token to the next processor
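
Because the token always moves to P_((j+1) mod N), the number of token-passing messages before a requester is served depends only on its distance from the current holder. A minimal sketch (the function names are hypothetical, and only one node is assumed to be requesting):

```python
def token_hops(holder, requester, n):
    """Token-passing messages needed until `requester` owns the token."""
    return (requester - holder) % n


def average_hops(n):
    """Average over all requester positions, with the token at node 0."""
    return sum(token_hops(0, i, n) for i in range(n)) / n
```

The count ranges from 0 (the requester already owns the token) to N-1 (the token has just left it), with an average of roughly N/2.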

Token Ring Algorithm -- Properties
+ simple, with no deadlock or starvation
+ number of messages and response time
- if only one node needs the token, the token traverses N/2 nodes on average
- best case: 0 messages (the node has the token) => 0 delay
- worst case: N-1 messages (sequential) => (N-1)T delay
+ tolerable overhead with small N
+ does not scale to large N
+ it is difficult to design a fault-tolerant algorithm for this scheme
+ the concept of a token is similar to centralized control, except that the central site moves

Suzuki-Kasami's Broadcast Algorithm
+ data structures:
- vector X: associated with the token
X[i]: the timestamp of the last request from P_i that has been served
- vector RT_j: associated with node P_j
RT_j[i]: the timestamp of the most recent request from P_i known by P_j
- node j determines whether a node k has an outstanding request by checking whether RT_j[k] > X[k]

Suzuki-Kasami's Broadcast Algorithm
+ Request the CS:
- P_i increases RT_i[i] by 1 and broadcasts request (RT_i[i], i) to all processors
- P_j, upon receiving request (T, i),
a) updates RT_j[i] to max (RT_j[i], T)
b) if it has the token, executes (A)
c) otherwise, does nothing
+ A:
- go through RT_j and X
the algorithm does not specify a starting point
the scan can start from (j+1) % N to avoid starvation
(P_j holds the token, so j is where the last check stopped)
- if RT_j[k] > X[k] for some k, then P_k has an outstanding request; send the token to P_k

Suzuki-Kasami's Broadcast Algorithm
+ Enter the CS:
- if P_i has received the token, then it enters the CS

+ Release the CS:
- P_i, upon exiting the CS, sets X[i] = RT_i[i]
- executes (A)
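
The request, enter and release steps can be tried out in a single-process simulation. This is a sketch under assumptions: the class name SuzukiKasami is hypothetical, message delivery is modelled as immediate direct calls, and the token holder runs procedure (A) on an incoming request only when it is not inside the CS.

```python
class SuzukiKasami:
    """Single-process simulation; message delivery is modelled as direct calls."""

    def __init__(self, n, holder=0):
        self.n = n
        self.rt = [[0] * n for _ in range(n)]   # rt[j][i] plays the role of RT_j[i]
        self.x = [0] * n                        # vector X carried by the token
        self.holder = holder                    # node that owns the token
        self.in_cs = None                       # node inside the CS, if any
        self.messages = 0                       # messages sent so far

    def request(self, i):
        if self.holder == i:                    # owns the token: no message needed
            self.in_cs = i
            return
        self.rt[i][i] += 1
        t = self.rt[i][i]
        for j in range(self.n):
            if j != i:
                self.messages += 1              # request (T, i) delivered to P_j
                self.rt[j][i] = max(self.rt[j][i], t)
        if self.in_cs is None:                  # assumption: only an idle holder reacts
            self._procedure_a()

    def release(self, i):
        assert self.in_cs == i
        self.x[i] = self.rt[i][i]
        self.in_cs = None
        self._procedure_a()

    def _procedure_a(self):
        j = self.holder
        for step in range(1, self.n):           # scan starting from (j+1) % N
            k = (j + step) % self.n
            if self.rt[j][k] > self.x[k]:       # P_k has an outstanding request
                self.messages += 1              # token sent to P_k
                self.holder = k
                self.in_cs = k                  # P_k enters on receiving the token
                return
```

With four nodes and the token idle at node 0, a request from node 2 costs three broadcast messages plus one token message, i.e. N messages in total.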

Suzuki-Kasami's Broadcast Algorithm -- Properties
+ this algorithm gives better fault tolerance in the sense of handling requests
- as long as the request is received by some processor that will possess the token, the request will be processed
+ however, the problem of a missing token is still there
- e.g., the token is held by a dead processor or is sent to a dead processor
+ requires N messages per request
- N-1 messages for broadcasting the request
- 1 message for sending the token
- if the node that wants to enter the critical section already has the token, no message is needed
+ response time
- in general, there is a delay of 2T
- in the best case, there is no delay
Raymond's Tree-Based Algorithm
+ the processors are structured as a tree and the token is placed at the root node
+ the tree restructures when the token moves

+ Request the CS (going up the tree):
1. P_i, if it does not hold the token, sends request (i) to its parent and puts the request in its queue
2. P_j, upon receiving the request,
a) puts the request in its queue
b) if it has not sent a request to its parent, sends request (j) to its parent
c) otherwise (a request has already been sent to its parent for another child node), does nothing

Raymond's Tree-Based Algorithm
+ Request the CS (going down the tree):
3. the root site, upon receiving the request,
a) puts the request in its queue
b) executes (DTPR)
4. P_j, upon receiving the token,
a) if it was not requesting to enter the CS, or its request is not at the top of its queue, executes (DTPR)

+ D. delete the top entry from its request queue
+ T. send the token to the requesting child
+ P. update the parent pointer to point to the requesting child
+ R. if its request queue is non-empty, send a request to the new parent
Raymond's Tree-Based Algorithm
+ Enter the CS:
1. if P_i has received the token and its request is at the top of its queue, then it enters the CS

+ Release the CS:
1. P_i, upon exiting the CS,
a) if its queue is not empty, executes (DTPR)
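
The request, DTPR and release steps can be exercised in a small single-process simulation. This is a sketch, not a definitive implementation: the class name RaymondTree, the explicit message queue and the asked flag (which records that a request has already been sent to the parent) are assumptions made for illustration.

```python
from collections import deque


class RaymondTree:
    """Single-process simulation with an explicit FIFO message queue."""

    def __init__(self, parent, root):
        # parent: {node: parent node}; holder[n] points towards the token
        self.holder = {root: root, **parent}
        self.queue = {n: deque() for n in self.holder}
        self.asked = {n: False for n in self.holder}
        self.using = None                  # node inside the CS, if any
        self.inbox = deque()               # pending (kind, sender, receiver)
        self.sent = 0                      # messages sent so far

    def _send(self, kind, src, dst):
        self.sent += 1
        self.inbox.append((kind, src, dst))

    def _step(self, n):
        # D, T, P: an idle token holder serves the top entry of its queue
        if self.holder[n] == n and self.using is None and self.queue[n]:
            head = self.queue[n].popleft()
            self.holder[n] = head
            self.asked[n] = False
            if head == n:
                self.using = n             # own request on top: enter the CS
            else:
                self._send("token", n, head)
        # R: ask the (new) parent once if requests are still queued
        if self.holder[n] != n and self.queue[n] and not self.asked[n]:
            self.asked[n] = True
            self._send("request", n, self.holder[n])

    def request(self, n):
        self.queue[n].append(n)
        self._step(n)

    def release(self, n):
        assert self.using == n
        self.using = None
        self._step(n)

    def run(self):
        """Deliver pending messages until none are left."""
        while self.inbox:
            kind, src, dst = self.inbox.popleft()
            if kind == "request":
                self.queue[dst].append(src)
            else:
                self.holder[dst] = dst     # the token has arrived
            self._step(dst)
```

On a seven-node binary tree with the token at node 1, requests from nodes 5 and 4 lead to node 5 entering the CS first while node 2 forwards a request to node 5 on behalf of node 4, and node 4 enters after node 5 releases.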

Raymond's Tree-Based Algorithm-- Example
        1
    2       3
  4   5   6   7
1. token is at node 1; node 5 makes a request
2. node 2 receives the request and sends the request to node 1
3. node 4 also sends a request; node 2 receives it
4. token is at node 2 now; node 2 becomes the root
5. node 5 gets the token and enters the CS
6. node 2 sends a request to node 5
Raymond's Tree-Based Algorithm-- Example
7. node 5 sends the token to node 2
8. node 4 gets the token and enters the CS
9. node 3 sends a request
10. the request from node 2 arrives at node 4
11. node 3 gets the token and becomes the root
Raymond's Tree-Based Algorithm -- Properties
+ the node with the token is always the root node
+ requires all nodes on the path from requester to root to be alive in order to process a request
- still has the lost token problem
+ requires 2 logN messages per request on average
- longest path: 2 logN (when the root is at a leaf of the original tree)
- best case: 0 messages
- worst case: 4 logN messages (2 logN to the root, 2 logN back with the token)
+ response time
- message passing has to be done sequentially
- average response time: T logN
- best-case response time: 0
- worst-case response time: 4T logN
Performance Comparisons
T: per-message transmission time
E: computation time
response time: measured under low load

algorithm         response time        # messages
Lamport           2T+E                 3(N-1)
Ricart-Agrawala   2T+E                 2(N-1)
Maekawa           2T+E                 3√N to 5√N
token-ring        [0 to NT]+E          0 to N
broadcast         [0 or 2T]+E          0 or N
tree-based        [0 to 4T logN]+E     0 to 4 logN
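
To compare the message counts for a concrete system size, the formulas can be evaluated directly. The function name messages_per_request is hypothetical, and the base-2 logarithm is an assumption (the slides write logN without a base).

```python
import math


def messages_per_request(n):
    """(min, max) messages per CS entry, using the formulas in the comparison."""
    root = math.sqrt(n)
    log = math.log2(n)          # assumption: logN is taken base 2
    return {
        "Lamport": (3 * (n - 1), 3 * (n - 1)),
        "Ricart-Agrawala": (2 * (n - 1), 2 * (n - 1)),
        "Maekawa": (3 * root, 5 * root),
        "token-ring": (0, n),
        "broadcast": (0, n),
        "tree-based": (0, 4 * log),
    }
```

For N = 16, Lamport needs 45 messages and Ricart-Agrawala 30, while Maekawa stays between 12 and 20 and the tree-based algorithm needs at most 16.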
