Unit 3


Election Algorithms: Bully and Ring Election Algorithms

Election Algorithms
Many tasks in distributed systems require one of the processes to act as the coordinator.
Election algorithms are techniques for a distributed system of N processes to elect a coordinator
(leader). A coordinator can be chosen amongst all processes through leader election. There are
two typical algorithms: Bully Algorithm and Ring-Based Algorithm.
Bully Algorithm
− The bully algorithm is a simple algorithm, in which we enumerate all the processes
running in the system and pick the one with the highest ID as the coordinator. In this
algorithm, each process has a unique ID and every process knows the corresponding ID
and IP address of every other process.
− A process initiates an election if it just recovered from failure or if the coordinator failed.
Any process in the system can initiate this algorithm for leader election. Thus, we can
have concurrent ongoing elections.
− There are three types of messages in this algorithm: election, OK, and I won. The
algorithm is as follows (a minimal simulation sketch appears after Figure 1):
1. A process with ID i initiates the election.
2. It sends election messages to all processes with ID > i.
3. Any process upon receiving the election message returns an OK to its predecessor
and starts an election of its own by sending election to higher ID processes.
4. If it receives no OK messages, it knows it is the highest ID process in the system.
It thus sends I won messages to all other processes.
5. If it received OK messages, it knows it is no longer in contention and simply drops
out and waits for an I won message from some other process.
6. Any process that receives I won message treats the sender of that message as
coordinator.
Figure 1 The bully election algorithm. (a) Process 4 holds an election. (b) Processes 5 and 6 respond,
telling 4 to stop. (c) Now 5 and 6 each hold an election. (d) Process 6 tells 5 to stop. (e) Process 6 wins
and tells everyone.
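
The message exchange above can be condensed into a small simulation. Below is a minimal Python sketch, assuming the network layer is abstracted into a set of live process IDs; the bully_election function and the alive_ids parameter are illustrative names, not part of any standard API.

def bully_election(initiator, alive_ids):
    """Return the elected coordinator: the highest live process ID."""
    higher = [p for p in alive_ids if p > initiator]
    if not higher:
        # No OK arrives: the initiator is the highest live process, so it
        # would send "I won" to everyone (the announcement is elided here).
        return initiator
    # At least one higher process replied OK; each such process starts its
    # own election, and the recursion bottoms out at the highest live ID.
    return bully_election(min(higher), alive_ids)

# Example: IDs 0..7 exist, 7 has crashed; process 4 starts the election.
print(bully_election(4, alive_ids={0, 1, 2, 3, 4, 5, 6}))   # -> 6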

Ring-based Election Algorithm


The ring algorithm is similar to the bully algorithm in the sense that we assume the processes
are already ranked through some metric from 1 to n. However, here a process i only needs to
know the IP addresses of its two neighbors (i+1 and i-1). We want to select the node with the
highest ID. The algorithm works as follows:
• Any node can start circulating the election message. Say process i does so. We can choose to
go clockwise or counter-clockwise on the ring. Say we choose clockwise where i+1 occurs
after i.
• Process i then sends an election message to process i+1.
• Anytime a process j ≠ i receives an election message, it piggybacks its own ID (thus declaring
that it is not down) before forwarding the election message to its successor (j+1).
• Once the message circulates through the ring and comes back to the initiator i, process i knows
the list of all nodes that are alive. It simply scans the list and chooses the highest ID.
• It then informs all other nodes about the new coordinator.
Note that a process might have to jump over its neighbor and contact the neighbor's neighbor
in case the neighbor has failed; otherwise the circulation of the election message around the
ring would never finish.
Figure 2 Election algorithm using a ring

If the neighbor of a process is down, it sequentially polls each successor (neighbor of neighbor,
and so on) until it finds a live node. For example, in Figure 2, when 7 is down, 6 passes the
election message to 0.
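
Below is a minimal Python simulation of one circulation of the election message, assuming failed nodes are simply skipped as described above; ring_election and the alive set are illustrative names.

def ring_election(ring, start, alive):
    """One circulation of the election message; ring lists IDs clockwise."""
    n = len(ring)
    collected = [ring[start]]            # the initiator's own ID
    i = (start + 1) % n
    while i != start:
        if ring[i] in alive:             # a live process piggybacks its ID
            collected.append(ring[i])
        # a dead process is skipped: the message goes to the next neighbor
        i = (i + 1) % n
    return max(collected)                # highest live ID becomes coordinator

# Figure 2's scenario: IDs 0..7 in a ring, 7 is down, 6 initiates;
# 6 skips 7 and hands the message to 0.
print(ring_election(list(range(8)), start=6, alive={0, 1, 2, 3, 4, 5, 6}))  # -> 6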

Mutual exclusion: Centralized, Distributed mutual exclusion algorithms

Distributed Synchronization (Mutual Exclusion)


Every time we wish to access a shared data structure or resource (the critical section) in a
distributed system, we need to guard it with a lock. A lock is acquired before the data structure
is accessed, and once the transaction has completed, the lock is released. The absence of a
locking mechanism (synchronization) may cause a race condition, leading to data corruption.
Centralized Algorithm (for Mutual Exclusion)
− In this case, locking and unlocking coordination are done by a master process. All
processes are numbered 1 to n. We run leader election to pick the coordinator. Now, if
any process in the system wants to acquire a lock, it has to first send a lock acquire
request to the coordinator.
− Once it sends this request, it blocks execution and awaits reply until it acquires the lock.
− The coordinator maintains, for each data structure, a queue of lock requests. Upon
receiving such a request, if the lock is free it grants the lock and sends a grant message;
otherwise it adds the request to the queue.
− The requester process upon receiving the lock executes the transaction, and then sends
a release message to the coordinator.
− The coordinator upon receipt of such a message removes the next request from the
corresponding queue and grants that process the lock.
− This algorithm is fair and simple.
Figure 3 (a) Process 1 asks the coordinator for permission to access a shared resource. Permission is
granted. (b) Process 2 then asks permission to access the same resource. The coordinator does not
reply. (c) When process 1 releases the resource, it tells the coordinator, which then replies to 2.
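
The coordinator's side of this protocol fits in a few lines. Below is a minimal Python sketch, assuming a single shared resource and reliable message delivery; the Coordinator class and its method names are illustrative.

from collections import deque

class Coordinator:
    def __init__(self):
        self.holder = None          # process currently holding the lock
        self.queue = deque()        # pending lock requests, FIFO => fairness

    def on_request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "grant"          # reply immediately: the lock was free
        self.queue.append(pid)      # otherwise queue silently (no reply yet)
        return None

    def on_release(self, pid):
        assert pid == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder          # next process to send "grant" to, if any

c = Coordinator()
print(c.on_request(1))   # 'grant'  (Figure 3a)
print(c.on_request(2))   # None     (Figure 3b: coordinator stays silent)
print(c.on_release(1))   # 2        (Figure 3c: 2 is now granted the lock)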

− There are two major issues with this algorithm, both related to failures.
o When the coordinator process goes down while one of the processes is waiting on
a response to a lock request, it leads to inconsistency. The new coordinator that
is elected (or the rebooted one) might not know that the earlier process is still
waiting for a response.
o The harder problem occurs when one of the client processes crashes while it is
holding the lock. In such a case, the coordinator just waits for the lock to be
released while the holding process has gone down.
Advantages:
• Fair: requests are granted the lock in the order they were received (though other
strategies can also be implemented).
• Simple: three messages per use of a critical section (request, grant, release).
Disadvantages:
• The coordinator is a single point of failure.
• Hard to detect a dead coordinator.
• Performance bottleneck in large distributed systems.

Decentralized Algorithm (for Mutual Exclusion)


− Decentralized algorithms use voting to figure out which lock requests to grant.
− Essentially, every process keeps a track of who has the lock.
− For a process to acquire a lock, it has to be granted an OK (receive a vote)
from a strict majority of the processes.
− Here, strict majority means more than half the total number of nodes (live or not) in the
system. Thus, if any process wishes to acquire a lock, it requests it from all other
processes and if the majority of them tell it to acquire the lock, it goes ahead and does
so.
− The majority guarantees that a lock is not granted twice.
− By granting their votes, the other processes learn that the lock has been
acquired, and they therefore hold up any other lock request.
− Once a process is done with the transaction, it broadcasts to every other process that it
has released the lock.
− This solves the problem of coordinator failure: even if some nodes go down, we can
still make progress so long as a majority agrees on whether the lock is in use or not.
− A client process crashing while it holds the lock is still a problem here.
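
Below is a minimal Python sketch of the majority-vote rule, assuming each coordinator replica votes OK only while it believes the lock is free; the replica representation and function name are illustrative.

def try_acquire(requester, replicas):
    """replicas: list of dicts with a 'holder' field (None if free)."""
    votes = sum(1 for r in replicas if r["holder"] is None)
    if votes > len(replicas) // 2:           # strict majority of ALL nodes
        for r in replicas:
            if r["holder"] is None:
                r["holder"] = requester      # voters record the grant
        return True
    return False                             # lost the vote: back off, retry

replicas = [{"holder": None} for _ in range(5)]
print(try_acquire("A", replicas))   # True: 5 of 5 vote OK
print(try_acquire("B", replicas))   # False: every replica already voted for A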

Distributed Algorithm (for Mutual Exclusion)

− This algorithm, developed by Ricart and Agrawala, needs 2(n−1) messages and is based
on Lamport’s clock and total ordering of events to decide on granting locks.
− After the clocks are logically synchronized, the process that asked for the lock first gets
it. The initiator sends request messages to all n − 1 processes stamped with its ID and
the timestamp of its request. It then waits for replies from all other processes.
− Any other process, upon receiving such a request, does one of three things: it sends a
reply if it does not want the lock for itself; it sends no reply if it is already in the
transaction phase (so the initiator has to wait); or, if it itself wants to acquire the
same lock, it compares its own request timestamp with that of the incoming request,
and the one with the lower timestamp gets the lock first.
This algorithm is based on event ordering and timestamps. It grants the lock to the
process which requests first.
Process k enters critical section as follows:
• Generates a new timestamp TSk = TSk + 1
• Sends request (k, TSk) to all other n-1 processes
• Waits for reply(j) from all other processes
• Enters critical section
When a process j receives process k's request (k, TSk), it:
• Sends reply(j) if it is not interested.
• Does not reply and queues request (k, TSk) if it is already in the critical section.
• If it also wants to enter: compares TSk and TSj, sends reply(j) if TSk < TSj, and
otherwise queues request (k, TSk).
This approach is fully decentralized but there are n points of failure, which is worse than the
centralized one.
Figure 4 . (a) Two processes want to access a shared resource at the same moment. (b) Process 0 has
the lowest timestamp, so it wins. (c) When process 0 is done, it sends an OK also, so 2 can now go
ahead.
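
Below is a minimal Python sketch of one process's decision rule in the Ricart–Agrawala algorithm, assuming ties on timestamps are broken by process ID via tuple comparison (the usual convention); the class and method names are illustrative, and a real system would exchange these messages over the network.

class RAProcess:
    def __init__(self, pid):
        self.pid = pid
        self.clock = 0
        self.wants = False
        self.in_cs = False
        self.my_ts = None
        self.deferred = []                 # requests queued for later replies

    def request_cs(self):
        self.clock += 1                    # TSk = TSk + 1
        self.wants = True
        self.my_ts = (self.clock, self.pid)
        return ("request", self.my_ts)     # broadcast to the other n-1 processes

    def on_request(self, sender, ts):
        self.clock = max(self.clock, ts[0])
        if self.in_cs or (self.wants and self.my_ts < ts):
            self.deferred.append(sender)   # don't reply yet: we go first
            return None
        return ("reply", self.pid)         # not interested, or the sender wins

    def release_cs(self):
        self.in_cs = self.wants = False
        replies, self.deferred = self.deferred, []
        return replies                     # now reply to every deferred sender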

Mutual exclusion: Token ring mutual exclusion algorithms, comparison of these algorithms

Token ring mutual exclusion algorithms


− In the token ring algorithm, the actual topology is not a ring, but for locking purposes
there is a logical ring, and processes only talk to their neighboring processes.
− The process holding the token has the lock at that moment; the token is circulated
through the logical ring.
− There is no way to request the token; to acquire the lock, a process simply waits until
the token reaches it. Once a process has the token, it can enter the critical section.
− Only one machine has lock at any instance in the network and hence only one process
can access the critical section at that particular time.
− One problem with this algorithm is loss of the token. Regenerating the token is
non-trivial, since a simple timeout cannot distinguish a lost token from one that is
merely being held or passed slowly.

Figure 5 (a) An unordered group of processes on a network. (b) A logical ring constructed in software.

− When the ring is initialized, process 0 is given a token. The token circulates around the
ring. It is passed from process k to process k +1 (modulo the ring size) in point-to-point
messages.
− When a process acquires the token from its neighbor, it checks to see if it needs to
access the shared resource. If so, the process goes ahead, does all the work it needs to,
and releases the resource. After it has finished, it passes the token along the ring.
− It is not permitted to immediately enter the resource again using the same token.
− If a process is handed the token by its neighbor and is not interested in the resource, it
just passes the token.
− As a consequence, when no processes need the resource, the token just circulates at
high speed around the ring.
− The correctness of this algorithm is easy to see. Only one process has the token at any
instant, so only one process can actually get to the resource.
− Since the token circulates among the processes in a well-defined order, starvation
cannot occur.
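
Below is a minimal Python simulation of the token circulating around the logical ring, assuming an in-memory ring where wants_cs records which processes currently want the critical section; all names are illustrative.

def circulate(wants_cs, rounds=1):
    """Pass the token around the logical ring `rounds` times.

    wants_cs: list of booleans, one per process, True if that process
    currently wants to enter the critical section.
    """
    n = len(wants_cs)
    for step in range(n * rounds):
        k = step % n                     # process currently holding the token
        if wants_cs[k]:
            print(f"process {k} enters and leaves the critical section")
            wants_cs[k] = False          # one entry per token visit
        # interested or not, the token is then passed to (k + 1) mod n

circulate([False, True, False, True])
# process 1 enters and leaves the critical section
# process 3 enters and leaves the critical section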

A Comparison of the Four Algorithms


Figure 6 lists the algorithms and three key properties: the number of messages required for a
process to access and release a shared resource, the delay before access can occur (assuming
messages are passed sequentially over a network), and some problems associated with each
algorithm. (A summary table reconstructed from this discussion appears at the end of this
section.)

Figure 6 A comparison of four mutual exclusion algorithms.

− The centralized algorithm is simplest and also most efficient. It requires only three
messages to enter and leave a critical region: a request, a grant to enter, and a release to
exit.
− In the decentralized case, these messages need to be carried out for each of the m
coordinators, and it is possible that several attempts (captured by the factor k) need
to be made, giving 3mk messages in total.
− The distributed algorithm requires n − 1 request messages, one to each of the other
processes, and an additional n − 1 grant messages, for a total of 2(n − 1) [assuming only
point-to-point communication channels between processes].
− It takes only two message times to enter a critical region in the centralized case, but
3mk times for the decentralized case, where k is the number of attempts that need to be
made. Assuming that messages are sent one after the other, 2(n − 1) message times are
needed in the distributed case. For the token ring, the time varies from 0 (token just
arrived) to n – 1 (token just departed).
− With the token ring algorithm, if every process constantly wants to enter a critical
region, then each token pass will result in one entry and exit, for an average of one
message per critical region entered. At the other extreme, the token may sometimes
circulate for hours without anyone being interested in it. In this case, the number of
messages per entry into a critical region is unbounded.
− All algorithms except the decentralized one suffer badly in the event of crashes. Special
measures and additional complexity must be introduced to avoid having a crash bring
down the entire system. The decentralized algorithm is less sensitive to crashes, but
processes may suffer from starvation and special measures are needed to guarantee
efficiency.
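
Summarizing the discussion above in one table (m is the number of coordinators in the decentralized scheme, k the number of attempts, and n the number of processes; the values follow the figures quoted in this section):

Algorithm        Messages per entry/exit    Delay before entry    Problems
Centralized      3                          2                     Coordinator crash
Decentralized    3mk                        3mk                   Starvation, low efficiency
Distributed      2(n - 1)                   2(n - 1)              Crash of any process
Token ring       1 to infinity              0 to n - 1            Lost token, process crash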

Clock Synchronization

The motivation of clock synchronization


Just like ordinary clocks, computer systems have a clock that tells time to the applications
running on them. A centralized machine has just one clock, but in a distributed system each
machine has its own clock. These clocks might not be in sync and may drift apart over time.
Hence the reported time depends on which local clock is being checked.
Logical clock and Event Ordering
− With logical clocks, processes need to know the order in which events occurred rather
than the exact time. There is no global clock, and local clocks may run faster or
slower. In this approach, timestamps are used to order events. For logical clocks, the
problem we address is how to reason about the ordering of events in a distributed
environment without knowing the specific times at which events occurred.
− We can use the concept of a logical time, which depends on message exchanges among
processes, to figure out the event order. A message involves two events: the sending of
a message on the first process, and the receipt of the message in the second process.
Since the message has to be sent before it is received, those two events are always
ordered. This allows us to order two events across machines.
− In addition, local events within a process are also ordered based on the execution order.
− We can use send/receive messages exchanged between processes/machines to order
events, and if 2 processes never communicate with each other, then in such cases we
don’t need to find order.
Lamport’s Logical Clocks
To synchronize logical clocks, Lamport defined a relation called happens-before. The
expression a → b is read ‘‘a happens before b’’ and means that all processes agree that first
event a occurs, then afterward, event b occurs.
The happens-before relation can be observed directly in two situations:
1. If a and b are events in the same process, and a occurs before b, then a → b is true.
2. If a is the event of a message being sent by one process, and b is the event of the same
message being received by another process, then a → b is also true. A message cannot be
received before it is sent; that is, the fundamental property we use is that the send event
occurs before the receive event.
− Happens-before is a transitive relation, so if a → b and b → c, then a → c.
− If two events, x and y, happen in different processes that do not exchange messages (not
even indirectly via third parties), then x → y is not true, but neither is y → x. These
events are said to be concurrent, which simply means that nothing can be said (or need
be said) about when the events happened or which event happened first.

− To implement Lamport’s logical clocks, each process Pi maintains a local counter Ci.
These counters are updated according to the following steps.
o Before executing an event (i.e., sending a message over the network, delivering
a message to an application, or some other internal event), Pi executes Ci ← Ci
+ 1; that is, each process increments its logical clock whenever an event occurs.
o When process Pi sends a message m to Pj, it sets m’s timestamp ts(m) equal to
Ci after having executed the previous step.
o Upon the receipt of a message m, process Pj adjusts its own local counter as Cj
← max {Cj, ts(m)}, after which it then executes the first step and delivers the
message to the application.
− This makes sure that the timestamp assigned to the receiver event is higher than the
sending event. This technique was invented by Leslie Lamport.

Example of Lamport’s Logical Clocks
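
Below is a minimal Python sketch of the three update rules above, assuming a message's timestamp ts(m) is carried as a plain integer; the LamportProcess class and its method names are illustrative.

class LamportProcess:
    def __init__(self):
        self.clock = 0

    def local_event(self):
        self.clock += 1                   # rule 1: Ci <- Ci + 1 before any event

    def send(self):
        self.local_event()
        return self.clock                 # rule 2: ts(m) = Ci travels with m

    def receive(self, ts):
        self.clock = max(self.clock, ts)  # rule 3: Cj <- max(Cj, ts(m)) ...
        self.local_event()                # ... then increment before delivery

p1, p2 = LamportProcess(), LamportProcess()
m = p1.send()               # p1's clock: 1
p2.local_event()            # p2's clock: 1
p2.receive(m)               # p2's clock: max(1, 1) + 1 = 2 > sender's 1
print(p1.clock, p2.clock)   # 1 2: the receive is timestamped after the send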

Vector Clocks
− Lamport’s logical clocks lead to a situation where all events in a distributed system are
totally ordered with the property that if event a happened before event b, then a will
also be positioned in that ordering before b, that is, C (a) < C (b).
− However, with Lamport clocks, nothing can be said about the relationship between two
events a and b by merely comparing their time values C (a) and C (b), respectively.
− In other words, if C (a) < C (b), then this does not necessarily imply that event a indeed
happened before event b.
− The problem is that Lamport clocks do not capture causality. Causality can be captured
by means of vector clocks.
− A vector clock VC (a) assigned to an event a has the property that if VC (a) < VC (b)
for some event b, then event a is known to causally precede event b.
− Vector clocks are constructed by letting each process Pi maintain a vector VCi with the
following two properties:
1. VCi [i] is the number of events that have occurred so far at Pi. In other words,
VCi [i] is the local logical clock at process Pi.
2. If VCi [j] = k then Pi knows that k events have occurred at Pj. It is thus Pi’s
knowledge of the local time at Pj.
− The first property is maintained by incrementing VCi [i] at the occurrence of each new
event that happens at process Pi. The second property is maintained by piggybacking
vectors along with messages that are sent. In particular, the following steps are
performed:
1. Before executing an event (i.e., sending a message over the network, delivering
a message to an application, or some other internal event), Pi executes VCi [i] ←
VCi [i] + 1.
2. When process Pi sends a message m to Pj, it sets m’s (vector) timestamp ts(m)
equal to VCi after having executed the previous step.
3. Upon the receipt of a message m, process Pj adjusts its own vector by setting
VCj [k] ← max {VCj [k], ts(m) [k]} for each k, after which it executes the first
step and delivers the message to the application.

Example of Vector Clock
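
Below is a minimal Python sketch of the three vector-clock rules, assuming each message piggybacks a copy of the sender's full vector; the VCProcess class and the happened_before helper are illustrative.

class VCProcess:
    def __init__(self, i, n):
        self.i = i                        # this process's index
        self.vc = [0] * n                 # VCi, one entry per process

    def local_event(self):
        self.vc[self.i] += 1              # rule 1: VCi[i] <- VCi[i] + 1

    def send(self):
        self.local_event()
        return list(self.vc)              # rule 2: ts(m) = a copy of VCi

    def receive(self, ts):
        # rule 3: component-wise maximum, then increment before delivery
        self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
        self.local_event()

def happened_before(a, b):
    """VC(a) < VC(b): event a causally precedes event b."""
    return all(x <= y for x, y in zip(a, b)) and a != b

p0, p1 = VCProcess(0, 2), VCProcess(1, 2)
m = p0.send()                       # p0: [1, 0]
p1.receive(m)                       # p1: [1, 1]
print(happened_before(m, p1.vc))    # True: the send causally precedes the receive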
