

Lecture Notes
Dr. Tong Lai Yu, March 2010

 0. Review and Overview
 1. B-Trees
 2. An Introduction to Distributed Systems
 3. Deadlocks
 4. Distributed Systems Architecture
 5. Processes
 6. Communication
 7. Distributed OS Theories
 8. Distributed Mutual Exclusions
 9. Agreement Protocols
10. Distributed Scheduling
11. Distributed Resource Management
12. Recovery and Fault Tolerance
13. Security and Protection
Distributed Mutual Exclusion
Life consists not in holding good cards but in playing those you hold well.

Josh Billings

1. Introduction

A Centralized Algorithm
One process is elected as the coordinator.
Whenever a process wants to access a shared resource, it sends a request
to the coordinator to ask for permission.

The coordinator may queue requests.
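The scheme above can be sketched in a few lines of Python ( a simulation only; `Coordinator`, `request`, and `release` are illustrative names, not from the notes ):

```python
import queue
import threading

class Coordinator:
    """One elected process grants CS access; requests that arrive
    while the resource is busy are queued and granted in FIFO order."""

    def __init__(self):
        self.waiting = queue.Queue()      # queued requests
        self.busy = False
        self.lock = threading.Lock()
        self.granted = []                 # order in which grants were issued

    def request(self, pid):
        """A process asks the coordinator for permission to enter the CS."""
        with self.lock:
            if self.busy:
                self.waiting.put(pid)     # coordinator queues the request
            else:
                self.busy = True
                self.granted.append(pid)  # GRANT sent to pid

    def release(self, pid):
        """A process leaving the CS notifies the coordinator."""
        with self.lock:
            if not self.waiting.empty():
                self.granted.append(self.waiting.get())  # grant next in line
            else:
                self.busy = False

c = Coordinator()
c.request(1)      # granted immediately
c.request(2)      # queued
c.request(3)      # queued
c.release(1)      # 2 is granted
c.release(2)      # 3 is granted
print(c.granted)  # [1, 2, 3]
```

Note the single point of failure: if the coordinator dies, no further grants are possible until a new one is elected.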

Decentralized
non-token-based
token-based

Requirements of Mutual Exclusion Algorithms


Only one request accesses the CS at a time ( primary goal )
Freedom from deadlocks
Freedom from starvation
Fairness
Fault Tolerance

Performance of a mutual exclusion algorithm


System throughput S ( rate at which the system executes requests for the CS )

        1
S = ---------
     Sd + E

Sd = synchronization delay
E = average execution time


low-load and high-load performance
best and worst case performance; if performance fluctuates statistically, take the average
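For concreteness, the throughput formula with illustrative values ( Sd = 2 ms, E = 8 ms; these numbers are assumptions, not from the notes ):

```python
# S = 1 / (Sd + E): smaller synchronization delay or execution time
# means higher throughput of CS requests.
Sd = 0.002          # synchronization delay, seconds (illustrative)
E = 0.008           # average CS execution time, seconds (illustrative)
S = 1 / (Sd + E)
print(S)            # 100.0 CS requests per second
```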

8. Election Algorithms

Principle

An algorithm requires that some process acts as a coordinator. The question

is how to select this special process dynamically.

Note

In many systems the coordinator is chosen by hand (e.g. file servers). This
leads to centralized solutions with a single point of failure.

After a network partition, the leader-less partition must elect a leader.

Election by bullying

Principle

Each process has an associated priority (weight). The process with

the highest priority should always be elected as the coordinator.

Issue

How do we find the heaviest process?

Any process can just start an election by sending an election

message to all other processes with higher numbers.


If a process Pheavy receives an election message from a lighter

process Plight, it sends a take-over message to Plight. Plight is out of

the race.
If a process doesn't get a take-over message back, it wins, and

sends a victory message to all other processes.

(a) Process 4 holds an election. (b) Processes 5 and 6 respond, telling 4 to stop.
(c) Now 5 and 6 hold an election. (d) Process 6 tells 5 to stop.
(e) Process 6 wins and tells everyone.

Issue

Suppose a crashed node comes back online:

It sends a new election message to higher-numbered processes

Repeat until only one process is left standing
The winner announces victory by sending a message saying that it is coordinator (if not already coordinator)
The existing (lower-numbered) coordinator yields

Hence the term 'bully'
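The election logic above can be sketched as follows ( message passing is collapsed into direct calls, so the sketch models only who wins, not the actual message flow; `bully_election` is an illustrative name ):

```python
def bully_election(alive, initiator):
    """Sketch of the bully algorithm over a set of process ids.
    `alive` is the set of reachable (live) processes."""
    higher = [p for p in alive if p > initiator]
    if not higher:
        return initiator              # nobody heavier: initiator wins
    # each higher process sends a take-over message and starts its own
    # election; the chain ends at the highest-numbered live process
    return bully_election(alive, max(higher))

# process 4 starts an election; processes 5 and 6 bully it out, 6 wins
print(bully_election({1, 2, 4, 5, 6}, 4))  # 6
```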

Election in a ring

Principle

Process priority is obtained by organizing processes into a (logical)

ring. Process with the highest priority should be elected as

coordinator.

Any process can start an election by sending an election message

to its successor. If a successor is down, the message is passed

on to the next successor.


If a message is passed on, the sender adds itself to the list. When
it gets back to the initiator, everyone has had a chance to make its
presence known.
The initiator sends a coordinator message around the ring

containing a list of all living processes. The one with the highest

priority is elected as coordinator.

Processes 2 and 5 start election messages independently. Both messages continue to circulate.


Eventually, both messages will go all the way around.
2 and 5 will then convert their ELECTION messages to COORDINATOR messages.

All processes recognize highest numbered process as new coordinator.
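One trip of the election message around the ring can be sketched as follows ( crashed processes are assumed to be simply absent from the list; `ring_election` is an illustrative name ):

```python
def ring_election(ring, initiator):
    """Sketch of ring election. `ring` lists live processes in
    successor order. The ELECTION message circulates once, collecting
    ids; the initiator then announces the highest id as COORDINATOR."""
    n = len(ring)
    start = ring.index(initiator)
    members = [ring[(start + k) % n] for k in range(n)]  # one full trip
    return max(members)       # contents of the COORDINATOR message

print(ring_election([2, 5, 1, 7, 3], 2))  # 7 is elected
# a second, concurrent initiator collects the same membership,
# so it announces the same coordinator -- duplicate elections are harmless
print(ring_election([2, 5, 1, 7, 3], 5))  # also 7
```

This also answers the first question below: two concurrent initiators both announce the same highest-numbered process.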

Question

Does it matter if two processes initiate an election?

Question

What happens if a process crashes during the election?

19. Non-token-based algorithms

Lamport's Algorithm

Si -- site, N sites

each site maintains a request set

Ri = { S1, S2, ..., SN }


request-queuei containing mutual exclusion requests ordered by their timestamps

use the ⇒ total order relation ( with Lamport's clock )

tsi -- timestamp of site i

Assume
messages are received in the same order as they are sent
eventually every message is received

1. To request entering the CS, process Pi sends a REQUEST( tsi, i ) message to every
   process ( including itself ) and puts the request on request-queuei

2. When process Pj receives REQUEST( tsi, i ), it places it on its request-queuej and sends a
   timestamped REPLY ( acknowledgement ) to Pi

3. Process Pi enters the CS when the following 2 conditions are satisfied:
   Pi's request is at the head of request-queuei
   Pi has received a REPLY message from every other process time-stamped later than tsi

4. When exiting the CS, process Pi removes its request from the head of its request-queuei and
   sends a timestamped RELEASE to every other process

5. When Pj receives a RELEASE from Pi, it removes Pi's request from its request-queuej

See video: Lamport Mutual Exclusion Algorithm
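Steps 1-3 can be sketched as follows ( a single-threaded simulation; `LamportSite` and its methods are illustrative names, and real message delivery is replaced by direct calls ):

```python
import heapq

class LamportSite:
    """Sketch of one site in Lamport's algorithm."""

    def __init__(self, site_id, n_sites):
        self.id = site_id
        self.n = n_sites
        self.clock = 0                # Lamport logical clock
        self.queue = []               # request-queue ordered by (ts, id)
        self.last_reply = {}          # sender -> timestamp of latest REPLY

    def request_cs(self):
        """Step 1: timestamp the request, queue it locally, broadcast it."""
        self.clock += 1
        self.my_request = (self.clock, self.id)
        heapq.heappush(self.queue, self.my_request)
        return self.my_request        # REQUEST(ts, id) to every site

    def on_request(self, ts, sender):
        """Step 2: queue the foreign request, return a timestamped REPLY."""
        self.clock = max(self.clock, ts) + 1
        heapq.heappush(self.queue, (ts, sender))
        return (self.clock, self.id)

    def on_reply(self, ts, sender):
        self.last_reply[sender] = ts

    def can_enter_cs(self):
        """Step 3: own request at the head of the queue, and a
        later-stamped REPLY received from every other site."""
        at_head = bool(self.queue) and self.queue[0] == self.my_request
        replied = all(self.last_reply.get(j, -1) > self.my_request[0]
                      for j in range(self.n) if j != self.id)
        return at_head and replied

p0, p1 = LamportSite(0, 2), LamportSite(1, 2)
ts, i = p0.request_cs()              # P0 broadcasts REQUEST(1, 0)
p0.on_reply(*p1.on_request(ts, i))   # P1 queues it and REPLYs
print(p0.can_enter_cs())             # True
```

Because every site orders its queue by (timestamp, site id), all sites agree on which request is at the head.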

Performance

for each CS invocation


(N-1) REQUEST

(N-1) REPLY

(N-1) RELEASE

total 3(N-1) messages

synchronization delay Sd = average delay

Ricart and Agrawala optimized Lamport's algorithm by
merging the RELEASE and REPLY messages, reducing the cost to 2(N-1) messages per CS invocation.

(See example below.)

Example:

(a) Two processes want to access a shared resource at the same time

(b) Process 0 has the lowest timestamp, so it wins

(c) When process 0 is done, it sends an OK as well, so 2 can now go ahead

Maekawa's Voting Algorithm

Voting Algorithms:

Lamport's algorithm requires a process to get permission from all other processes, which is overkill.
A different approach is to let processes compete for votes. If a process has received more votes than any other process, it can
enter the CS. If it does not have enough votes, it waits until the process in the CS is done and releases its votes.
Quorums have the property that any two groups have a non-empty intersection.
Simple majorities are quorums: any 2 sets whose sizes are simple majorities must have at least one element in common.

12 nodes, so majority is 7

Grid quorum: arrange nodes in logical grid (square). A quorum is all of a row and all of a column. Quorum size is 2 √ N - 1.
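A sketch of grid quorums ( assuming N is a perfect square; `grid_quorum` is an illustrative name ):

```python
import math

def grid_quorum(node, n):
    """Nodes 0..n-1 arranged in a sqrt(n) x sqrt(n) grid; a node's
    quorum is its full row plus its full column."""
    side = math.isqrt(n)
    assert side * side == n, "sketch assumes n is a perfect square"
    r, c = divmod(node, side)
    row = {r * side + j for j in range(side)}
    col = {i * side + c for i in range(side)}
    return row | col              # size = 2*sqrt(n) - 1

q = grid_quorum(5, 16)            # node 5 in a 4x4 grid
print(len(q))                     # 7, i.e. 2*sqrt(16) - 1
# any two grid quorums intersect: the row of one always meets
# the column of the other
assert all(grid_quorum(a, 16) & grid_quorum(b, 16)
           for a in range(16) for b in range(16))
```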

Principles:

To get access to the CS, not all processes have to agree

It suffices to split the set of processes up into subsets ("voting sets") that
overlap
It suffices that there is consensus within every subset
When a process wishes to enter the CS, it sends a vote request to every member of its voting district.
When the process receives replies from all the members of the district, it can enter the CS.
When a process receives a vote request, it responds with a "YES" vote if it has not already cast its vote.
When a process exits the CS, it informs the voting district, which can then vote for other candidates.
May have deadlock.

Request sets

N = { 1, 2, ..., N }

Ri ∩ Rj ≠ ∅   all i, j ∈ N

A site can send a REPLY ( LOCKED ) message only if it is not already LOCKED (i.e. has not cast its vote).

Properties:

1. Ri ∩ Rj ≠ ∅
2. Si ∈ Ri
3. |Ri| = K     for all i ∈ N
4. any site Si is contained in exactly K of the Ri's

Maekawa found that:

N=K(K-1)+1

or K = |Ri| ≈ √N

Message exchange:

Failed -- F,   Sj cannot grant permission to Sk because Sj has granted permission to a site with
higher request priority.

Inquire -- I,   Sj wants to find out if Sk has successfully locked all sites. ( the outstanding
grant to Sk has a lower priority than the new request )

Yield -- Y,   Sj yields to Sk.
( Sj has received a FAILED message from some other site, or Sj has sent a YIELD to some other
site but has not received a new grant )

( The request's priority is determined by its sequence number ( timestamp ); the smaller the
sequence number, the higher the priority; if the sequence numbers are the same, the one with the
smaller site number has higher priority )

Algorithm:

1. A site Si requests access to the CS by sending REQUEST(i) messages to all the sites in its
   request set Ri
2. When a site Sj receives the REQUEST(i) message, it sends a REPLY(j) message to Si provided it
   hasn't sent a REPLY to any site since the last RELEASE. Otherwise, it queues up the REQUEST.
3. Site Si can access the CS only after it has received a REPLY from all sites in Ri
Deadlock Handling:

1. When a REQUEST(i) from Si blocks at site Sj because Sj has currently granted permission to site
   Sk, then Sj sends a FAILED(j) message to Si if Si has lower priority. Otherwise Sj sends an
   INQUIRE(j) message to Sk.
2. In response to an INQUIRE(j) from Sj, site Sk sends YIELD(k) to Sj, provided Sk has received a
   FAILED message or has sent a YIELD to another site but has not received a new REPLY from it.
3. In response to a YIELD(k) message from Sk, site Sj assumes it has been released by Sk, places
   the request of Sk at the appropriate location in the request queue, and sends a REPLY(j) to the
   site of the top request in the queue.

Example

13 nodes, 13 = 4(4-1) + 1, thus K = 4

R1 = { 1, 2, 3, 4 }

R2 = { 2, 5, 8, 11 }

R3 = { 3, 6, 8, 13 }

R4 = { 4, 6, 10, 11 }

R5 = { 1, 5, 6, 7 }

R6 = { 2, 6, 9, 12 }

R7 = { 2, 7, 10, 13 }

R8 = { 1, 8, 9, 10 }

R9 = { 3, 7, 9, 11 }

R10 = { 3, 5, 10, 12 }

R11 = { 1, 11, 12, 13 }

R12 = { 4, 7, 8, 12 }

R13 = { 4, 5, 9, 13 }
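The request sets above can be checked mechanically against properties 1-4 ( the sets below are transcribed from the example ):

```python
from collections import Counter

R = {
    1: {1, 2, 3, 4},     2: {2, 5, 8, 11},    3: {3, 6, 8, 13},
    4: {4, 6, 10, 11},   5: {1, 5, 6, 7},     6: {2, 6, 9, 12},
    7: {2, 7, 10, 13},   8: {1, 8, 9, 10},    9: {3, 7, 9, 11},
    10: {3, 5, 10, 12},  11: {1, 11, 12, 13}, 12: {4, 7, 8, 12},
    13: {4, 5, 9, 13},
}
K = 4
# property 1: every pair of request sets intersects
assert all(R[i] & R[j] for i in R for j in R)
# properties 2 and 3: Si is in Ri and |Ri| = K
assert all(i in R[i] and len(R[i]) == K for i in R)
# property 4: each site appears in exactly K request sets
counts = Counter(s for r in R.values() for s in r)
assert all(counts[s] == K for s in R)
print("all Maekawa properties hold for N = 13, K = 4")
```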

Suppose sites 11, 8, 7 want to enter CS; they all send requests
with sequence number 1. ( 7 has highest priority, 8 next, 11 lowest )

1. site 11 wants to enter; its requests have arrived at 12 and 13; the REQUEST to 1 is on the way
2. 7 wants to enter the CS; its REQUEST arrived at 2 and 10, but the REQUEST to 13 is on its way
3. 8 also wants to enter the CS; it sends REQUESTs to 1, 9, 10, but fails to lock 10 because 10
   has been locked by 7, which has higher priority
4. the REQUEST from 11 finally arrives at 1 and the REQUEST from 7 arrives at 13

11, 7, 8 are circularly locked:

8 receives FAILED and cannot enter the CS
11 receives FAILED and cannot enter the CS
7 cannot enter the CS because it has not received all REPLY ( LOCKED ) messages

8. 13 is locked by 11 ( which has lower priority than 7 ) and receives the request from 7, so it
   sends an INQUIRE to 11 to ask it to yield
9. When 11 receives the INQUIRE, it knows that it cannot enter the CS; therefore it sends a YIELD
   to 13
10. then 13 can send a LOCKED to 7, which enters the CS
11. when 7 finishes, it sends RELEASEs
12. then 8 locks all members, ... , sends RELEASEs
13. then 11 enters
35. Token-based algorithms

Principles
one token, shared among all sites
a site can enter its CS iff it holds the token
the major difference among algorithms is the way the token is searched for
use sequence numbers instead of timestamps

o used to distinguish requests from the same site

o kept independently for each site

o used to distinguish between old and current requests

The proof of mutual exclusion is trivial


The proof of other issues (deadlock and starvation) may be less so

(a) An unordered group of processes on a network.

(b) A logical ring connected in software.

a) Suzuki-Kasami's Broadcast Algorithm

TOKEN -- a special PRIVILEGE message

the node that owns the TOKEN can enter the CS
initially node 1 has the TOKEN
the node holding the TOKEN can execute the CS repeatedly if no request
from others comes
if a node wants the TOKEN, it broadcasts a REQUEST message to all
other nodes

node:

REQUEST(j, n)

  node j requesting its n-th CS invocation

n = 1, 2, 3, ... , sequence #

when node i receives REQUEST(j, n) from node j, it updates

RNi[j] = max ( RNi[j], n )

RNi[j] = largest seq # received so far from node j

TOKEN:

TOKEN(Q, LN ) ( suppose at node i )

Q -- queue of requesting nodes

LN -- array of size N such that

   LN[j] = the seq # of the request of node j granted most recently

When node i finishes executing the CS, it does the following:

1. set LN[i] = RNi[i] to indicate that the current request of node i has been granted ( executed )
2. every node k such that

   RNi[k] > LN[k]

   ( i.e. node k is requesting ) is appended to Q if it is not already there

When these updates are complete, if Q is not empty, the front node is deleted from Q and the TOKEN
is sent to it

FCFS
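The token-handling steps above can be sketched as follows ( a two-node simulation with direct calls; `SKNode` is an illustrative name ):

```python
class SKNode:
    """Sketch of a Suzuki-Kasami node. RN[j] is the highest sequence
    number seen from node j; the token is a pair (LN, Q)."""

    def __init__(self, node_id, n):
        self.id = node_id
        self.RN = [0] * n
        self.token = None                   # (LN, Q) when held

    def request_cs(self):
        self.RN[self.id] += 1
        return (self.id, self.RN[self.id])  # broadcast REQUEST(j, n)

    def on_request(self, j, n):
        self.RN[j] = max(self.RN[j], n)     # keep only the newest seq #
        # an idle token holder would forward the token to j here

    def release_cs(self):
        """Steps done on leaving the CS; returns the id the token is
        sent to, or None if it stays put."""
        LN, Q = self.token
        LN[self.id] = self.RN[self.id]      # own request is now granted
        for k in range(len(self.RN)):
            if self.RN[k] > LN[k] and k not in Q:
                Q.append(k)                 # node k has an ungranted request
        if Q:
            dest = Q.pop(0)                 # FCFS: front of Q gets the token
            self.token = None
            return dest
        return None

n0, n1 = SKNode(0, 2), SKNode(1, 2)
n0.token = ([0, 0], [])                     # node 0 starts with the token
n0.on_request(*n1.request_cs())             # node 1 broadcasts REQUEST(1, 1)
dest = n0.release_cs()                      # node 0 leaves the CS
print(dest)                                 # 1 -- the token goes to node 1
```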

Example:

There are three processes, p1, p2, and p3.


p1 and p3 seek mutually exclusive access to a shared resource.

Initially: the token is at p2 and the token's state is LN = [0, 0, 0] and Q empty;

p1's state is: n1 ( seq # ) = 0, RN1 = [0, 0, 0];


p2's state is: n2 = 0, RN2 = [0, 0, 0];
p3's state is: n3 = 0, RN3 = [0, 0, 0];

p1 sends REQUEST(1, 1) to p2 and p3; p1: n1 = 1, RN1 = [ 1, 0, 0 ]

p3 sends REQUEST(3, 1) to p1 and p2; p3: n3 = 1, RN3 = [ 0, 0, 1 ]

p2 receives REQUEST(1, 1) from p1; p2: n2 = 1, RN2 = [ 1, 0, 0 ], holding token

p2 sends the token to p1

p1 receives REQUEST(3, 1) from p3: n1 = 1, RN1 = [ 1, 0, 1 ]


p3 receives REQUEST(1, 1) from p1; p3: n3 = 1, RN3 = [ 1, 0, 1 ]

p1 receives the token from p2

p1 enters the critical section

p1 exits the critical section and sets the token's state to LN = [ 1, 0, 0 ] and Q = ( 3 )

p1 sends the token to p3; p1: n1 = 1, RN1 = [ 1, 0, 1 ], no longer holding the token; token's state is LN = [ 1, 0, 0 ] and Q empty

p3 receives the token from p1; p3: n3 = 1, RN3 = [ 1, 0, 1 ], holding token


p3 enters the critical section

p3 exits the critical section and sets the token's state to LN = [ 1, 0, 1 ] and Q empty

Performance:

It requires at most N message exchanges per CS execution ( (N-1) REQUEST messages + 1 TOKEN message ),
or 0 messages if the TOKEN is already at the site
synchronization delay is 0 or T
deadlock free ( because of the TOKEN requirement )
no starvation ( i.e. a requesting site enters the CS in finite time )

Comparison of the Lamport and Suzuki-Kasami Algorithms


The essential difference is in who keeps the queue. In Lamport's algorithm every site keeps its own local copy of the
queue; in Suzuki-Kasami the queue is passed around within the token.
