
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/346932388

Leader Deputies Algorithm for Leader Election in Distributed Systems

Research · December 2020


DOI: 10.13140/RG.2.2.16726.27203


4 authors, including:

Ayman Azzam Amr Aboshama


Cairo University Cairo University

All content following this page was uploaded by Ayman Azzam on 11 December 2020.



Leader Deputies Algorithm for Leader Election in
Distributed Systems
Ayman Azzam Amr Aboshama Reham Ali Menna Fekry
aymanazzam63@gmail.com am.aboshama98@gmail.com reham.ali10@gmail.com mennafekry98@gmail.com

I. INTRODUCTION

In distributed computing systems, a job can be divided into sub-jobs and distributed among different nodes (machines). For the job to be distributed, there must be a means of communication between those nodes. One common pattern is to choose a leader (coordinator) from among them. The leader manages and organizes the distribution of tasks and the communication among all the nodes in the system. This approach has pros and cons, though the pros outweigh the cons: it minimizes coordination among the nodes, increases the efficiency of the system, and simplifies the architecture by making it hierarchical.

However, leader election raises two serious issues: first, how to choose the leader node, and second, which node will replace the leader if it crashes. A crash may happen for either software or hardware reasons. In either case, we have to adopt some algorithm (a leader election algorithm) to choose another leader for the system with the least downtime and the least traffic possible. There are two traditional and well-known algorithms: the bully algorithm and the ring token algorithm. Many researchers have invested time and effort in decreasing the complexity of messaging among the system nodes. Some keep modifying the latter two, and others have brought up new algorithms achieving O(N^2) or O(N log N) messaging complexity.

In this paper we present a new algorithm inspired by both the traditional and the adaptive bully algorithms. The adaptive bully algorithm is presented in [1]. Here, we not only managed to decrease the messaging traffic from O(N^2) to O(N) in both the average and the worst case, but we also decreased the downtime and increased the reliability of the system by using a “leader collaboration”, which consists of the main leader and two leader assist nodes. The “leader collaboration” is chosen to be the nodes with the highest priorities among the currently alive nodes in the system.

II. RELATED WORK

In this section we go through some papers that developed solutions to the leader election problem. One of the most well-known leader election algorithms is the bully algorithm. Paper [2] explains the bully algorithm and the useless overhead traffic that comes with it, then adds a modification to solve this issue.

A. Bully Algorithm

If node P discovers that the leader is down, it sends election messages to the nodes with higher priorities. When they receive the election messages, they send responses to P telling it to stop. Every node that received an election message in turn sends election messages to the nodes with higher priority than itself, and so on until the highest one wins. The highest node sends coordination messages to all lower nodes to tell them that it is the coordinator.

Figure 1. The bully election algorithm.

For example, as we see in Figure 1: (a) Node 3 holds an election. (b) Nodes 4 and 5 respond to 3 with an okay message. (c) Nodes 4 and 5 start an election. (d) Node 5 responds to 4 with an okay message. (e) Node 5 tells everyone that it is the coordinator.

B. Modified Bully Algorithm

In case a node P discovers the leader is down, P sends election messages to the set S of nodes with higher priorities. Each node in S sends a message with its priority to P. P sends a grant message to the node with the highest priority among the responses to grant it leadership. The highest priority node sends coordination messages to the rest of the nodes to inform them that it is the leader. In case a set of nodes R discovers the leader is down, each node in R sends election messages to the set S of nodes with higher priorities. Each node in S sends a message with its priority to the node P which has the lowest priority among the nodes in R; then P sends a grant message to the node with the highest priority among the responses to grant it leadership.
The highest priority node sends coordination messages to the rest of the nodes to inform them that it is the leader.

Figure 2. The modified bully election algorithm.

For example, as we see in Figure 2: (a) Nodes 3 and 2 hold an election. (b) Nodes 3, 4 and 5 respond to 1 with their unique priority numbers. (c) Node 2 compares the priority numbers, selects the highest node (node 5), and sends a message to it. (d) Node 5 tells everyone that it is the coordinator.

Another well-known leader election algorithm is the ring election algorithm, together with a modified version, called the proposed ring election algorithm in paper [3], that reduces the election messages and the memory needed. We discuss both algorithms.

C. Ring Election Algorithm

Consider a set of nodes, each with a unique ID, with a leader node that communicates with the other nodes. Each node has a successor node (the next available node in the ring) with which it can communicate if the leader fails, so that all nodes form a ring. When a node P sends a message to the leader and the leader does not reply within a time interval T, P considers the leader failed and starts the election by creating a list containing its ID and sending it to its successor, which tells the successor to stop its process and participate in the election. The successor appends its ID to the list and sends it to its own successor, and so on, until node P receives the list again. P then considers the node with the maximum ID, X, to be the leader, and forwards the same list to its successor so that it considers X the leader too, and so on.

Figure 3. The ring election algorithm.

For example, as we see in Figure 3, node 2 finds out that the leader is dead. So node 2 holds an election: it puts its priority on the queue and passes it to the next node (node 3). Then node 3 puts its priority on the queue and passes it to the next node (node 4), and so on until the queue reaches node 2 again. Then node 2 decides who will be the leader (node 5).

One problem arises when two or more nodes send messages to the leader at the same time and the leader has failed: an election starts from each node that noticed the failure, which wastes resources and time because each of those nodes initiates its own list to elect the leader. The proposed ring election algorithm was devised to reduce that waste.

D. Proposed Ring Election Algorithm

When a node sends a message to the leader and the leader does not reply within a time interval T, it considers the leader failed and starts the election process by sending an election message to its successor. The successor then sends its own successor a message carrying the maximum of the received ID and its own ID. The election process stops when a node X receives its own ID again after it has travelled around the ring; node X is then considered the newly elected leader and sends coordination messages to all nodes in the ring to inform them that it is the leader.

Figure 4. The proposed ring election algorithm.

For example, as we see in Figure 4, node 2 finds out that the leader is dead. So node 2 sends its priority to the next node (node 3). Node 3 compares its priority with that of node 2 and sends the higher priority to the next node (node 4), and so on until node 5 receives its own priority; it then knows that it is the leader.

Another approach, which reduces the useless overhead traffic caused by the bully algorithm by adding an announcer to choose a new leader when the current leader dies, comes
from paper [4]. We explained the bully algorithm above, so here we explain the modified version that uses an announcer.

E. Announcer Based Bully Election Leader Algorithm

In the first case, a node P sends a message to the leader and finds that the leader is down, so it sends an announcer message to the announcer to tell it that the leader is down. The announcer checks whether the leader is really down. If it is not, the announcer ignores the message; if it is, the announcer records node P as the leader and sends coordination messages to all nodes to tell them that node P is the leader.

In the second case, several nodes R discover at the same time that the leader is down and send messages to the announcer. The announcer records the highest priority node in R as the leader, then sends coordination messages to all nodes to tell them the new leader.

In the third case, node P discovers that both the leader and the announcer are down. It sends the announcer message to node N-1, considering it the new announcer; the new announcer then records node P as the leader and sends coordination messages to all nodes to inform them of the new leader.

A different approach is taken in paper [1], which makes a slightly different modification to the bully algorithm to reduce the number of messages used in the election process, bringing the message complexity down from O(N^2) to O(N). The ordinary election process in the bully algorithm is time-consuming: it works as a sequence of messages sent from many nodes to all the nodes with a higher priority or ID than the sender, which in the worst case ends up sending N^2 messages, where N is the number of nodes in the system. The Adaptive Bully Algorithm (ABA) instead uses some variables to make the election process faster:

• EV: stores the ID of the current leader.
• Node ID: stores the ID number of the node itself.
• HPI (Highest Process Identification) & NHPI (Next HPI): store the highest two IDs during the election process.

When a node (Pd) sends a request to the leader and the leader does not answer, it considers the leader failed and starts the election process by sending its ID to all the nodes in the system. When node (Pq) receives the ID of (Pd), it compares it with its own ID: if IDq > IDd, it sends its ID to (Pd); otherwise, it does not send anything. When node (Pd) has received the IDs of all nodes whose IDs are greater than its own, it stores the highest two IDs in the HPI & NHPI variables. It then tells the node whose ID is stored in HPI to send messages to the rest of the nodes to inform them that it is the leader. If (Pd) does not receive an “OK” message from it, it considers that node failed and asks NHPI to take the role of the leader instead. If NHPI has failed too, (Pd) considers itself the leader and sends “I’m the Coordinator” messages to the rest of the nodes to inform them that it is the new leader.

The experiments showed that ABA is more effective than the regular Bully Algorithm (BA) and the Modified Bully Algorithm (MBA) in the number of passed messages and in latency.

A different approach, from paper [5], is the New Election Algorithm based on Assistant. This approach makes some assumptions.

F. New Election Algorithm based on Assistant

Assumptions
• Each node has a log file, but when the node crashes, the contents of the core memory are lost.
• Messages between the nodes may be lost due to possible communication link failures.
• A node shall be able to perform its tasks continuously even if the leader is replaced with the leader assistant or a new leader.
• Node and link performances differ from each other.

Methodology: The cluster of nodes has a leader and an assistant. The cluster itself is divided into sub-clusters, each of which has a leader and an assistant too. Within the sub-cluster, the leader and the assistant do not perform any additional functionality; their role comes into play when the leader of the whole cluster crashes.

When a certain node n sends a message to the leader and it does not respond within a certain time (t), node n sends a message to the assistant telling it the leader has died. The assistant in turn sends a checking message to the leader to make sure it is actually dead. If the leader responds to the assistant, the assistant sends a message to node n stating that the leader is still alive.

In case the leader does not reply to the assistant within a time limit, the assistant sends an “I am the leader” message to the whole system in order to keep the system working. Afterwards, an election algorithm is carried out among the leaders and the assistants of the sub-clusters, including the assistant of the ex-leader. This election may result in a leader and assistant other than the assistant of the ex-leader. Then the elected leader sends an “I am the leader” message to the whole system once more and the system returns to the normal state.

The election algorithm tries to choose the optimal node. The definition of optimal here can be summarized in two main points:
• The elected leader shall be the closest (physically) to all nodes.
• The elected leader shall have superior performance to all nodes.

Although the bully algorithm is simple, it causes heavy traffic over the network to elect a new leader. In paper [6] the researchers came up with a new algorithm which is more effective in terms of the number of messages, and in which the number of stages for electing a new leader decreased from at least 5 to at least 4. It works as follows:

There are K nodes ordered by ID. The first is the leader and the others are alternatives, which prevent a global election between all nodes.

If node P notices that the leader is down, it sends a message to the first alternative to elect it as the leader. If the alternative is alive, it sends an ok message to node P, which informs P that the alternative is alive; the alternative then sends a message to the leader to make sure it is down. If the leader replies with an ok message, it is alive and P made a mistake, so the alternative informs P that the leader is alive. But if the leader does not reply, the alternative is selected as leader and broadcasts a message to all nodes informing them that it is the leader and introducing P as its substitution. If the alternative is down, P sends another election message to the next alternative, and so on until one of them becomes the leader. If all the alternatives are down, P runs the Modified Election Algorithm as follows:

When P notices that all alternatives are down, it sends election messages to all nodes with higher IDs. Every node that receives the message replies with its unique ID. If no node responds to P, then P is selected as leader; it sends to the other nodes to select the K-1 alternatives and broadcasts that it is the leader. If some nodes respond, P selects the leader and alternatives based on their IDs, then sends the grant message to the new leader. The new leader broadcasts a message to all other nodes informing them that it has become the leader and including its ID and the alternatives' IDs.

III. METHODOLOGY

We have come up with algorithms which are more reliable and have lower message and time complexity than the Bully Algorithm, the Adaptive Bully Algorithm and the Modified Bully Algorithm. Our algorithm has instantaneous leader substitution, which saves the time of electing a new leader as the substitute, and has task sharing, which saves the leader time so that it can deal with more requests from the nodes.

A. Leader-Deputies Algorithm

We have three important nodes: the first leader (the highest priority), the second leader (the second highest priority) and the third leader (the third highest priority). All nodes know the three leaders, but they communicate with the first leader as the master and with the other two leaders as normal nodes. We have several cases.

In the first case, a node sends a message to the master and it does not respond, so the node assumes that the leader has failed and sends a raising message to the second leader, as we see in Figure 5; the second leader replies with an okay message. The second leader finds that it got the message from a normal node, so it becomes the first leader and informs all nodes. It then sends a raising message to the third leader, and the third leader replies with an okay message. The third leader finds that it got the message from the first leader, so it becomes the second leader and informs all nodes. It then sends a raising message to the next available node (the next highest priority node that is alive), and the next available node replies with an okay message. The next available node finds that it got the raising message from the second leader, so it becomes the third leader and informs all nodes.

Figure 5. Leader-Deputies Algorithm Case 1.

In the second case, a node finds that the first and second leaders have failed, as we see in Figure 6. It sends a raising message to the third leader, and the third leader replies with an okay message. The third leader finds that it got the message from a normal node, so it becomes the first leader and informs all nodes. It then sends a raising message to the next available node X, and node X replies with an okay message. Node X finds that it got the message from the first leader, so it becomes the second leader and informs all nodes. It then sends a raising message to the next available node Y, and node Y replies with an okay message. Node Y finds that it got the message from the second leader, so it becomes the third leader and informs all nodes.
Figure 6. Leader-Deputies Algorithm Case 2.

In the third case, a node finds that the first, second and third leaders have failed, as we see in Figure 7. It applies the Adaptive or normal Bully Algorithm to elect the highest priority node as the first leader. The first leader then sends a raising message to the next available node X, and node X replies with an okay message. Node X finds that it got the message from the first leader, so it becomes the second leader and informs all nodes. It then sends a raising message to the next available node Y, and node Y replies with an okay message. Node Y finds that it got the message from the second leader, so it becomes the third leader and informs all nodes.

Figure 7. Leader-Deputies Algorithm Case 3.

In the fourth case, the second or the third leader has failed but the first leader is still alive. The system acts normally until the first leader fails, and then we have two scenarios:
• It becomes case two or three.
• When the first and third leaders have failed, the node sends a raising message to the second leader and the second leader replies with an okay message. The second leader finds that it got the message from a normal node, so it becomes the first leader and informs all nodes. It then sends a raising message to the third leader; the third leader does not reply, so the new first leader acts as the only existing leader.

Figure 8. Leader-Deputies Algorithm when the first and third leaders are dead.

The fifth case is the same as the third case, except that several nodes detect that the first, second and third leaders have failed, and all of them run the Adaptive or normal Bully Algorithm. However, it is not clear how the adaptive bully algorithm deals with several nodes each finding that the HPI & NHPI nodes have both failed, and in the normal bully algorithm we would get traffic of more than O(N^2). So we have made a modification to both the adaptive and the normal bully algorithms.

1) Modified Adaptive Bully Algorithm: In case the HPI & NHPI nodes both fail and the discovering node decides to declare itself the new leader, a conflict may arise when more than one node (candidate nodes) each sends an “I’m the Coordinator” message to announce that it is the newly elected leader. We came up with a solution that handles this case. Each candidate node first sends a leader candidacy announcement message (LCDM) to inform the rest of the nodes that it is a candidate leader. Every candidate then waits for a time interval T to receive an LCDM from the other candidate(s), and we have two cases:
• If the received LCDM is from a candidate with a lower priority than its own, it still considers itself a candidate.
• If the received LCDM is from a candidate with a higher priority than its own, it no longer considers itself a candidate.

At the end of the time interval, only one node will still consider itself a candidate; it sends an “I’m the Coordinator” message to all nodes to inform them that it is the newly elected leader.

Figure 9. Modified Adaptive Bully Algorithm with LCDM.

Advantages
• Number of messages and election time are very small.
• The worst case is the same as the average case: the traffic is O(N).
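The LCDM rule of the Modified Adaptive Bully Algorithm can be illustrated with a small Python simulation. This is only a sketch under simplifying assumptions that the paper does not state: priorities are unique integers, every LCDM reaches every other candidate within the interval T, and the function name `resolve_candidates` is hypothetical.

```python
def resolve_candidates(candidate_priorities):
    """Return the candidates that still consider themselves candidates after T."""
    survivors = []
    for me in candidate_priorities:
        # LCDMs this candidate receives from the other candidates
        # (assumed: all of them are delivered within the interval T).
        received = [p for p in candidate_priorities if p != me]
        # A candidate withdraws once it hears from a higher-priority candidate,
        # and stays a candidate if every LCDM came from a lower priority.
        if all(other < me for other in received):
            survivors.append(me)
    return survivors


# Nodes 2, 5 and 3 all declare candidacy; only node 5 remains a candidate.
print(resolve_candidates([2, 5, 3]))  # [5]
```

Under these assumptions exactly one candidate survives, namely the highest-priority node among those that declared candidacy, which may differ from the highest-priority node alive in the system.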
Disadvantages
• The highest priority node is not always the leader.

2) Modified Normal Bully Algorithm: It is the same as the bully algorithm, but when several nodes detect at the same time that the leader has failed, they send election messages to the higher nodes; then, when a node gets an election message while it is waiting for the election process to end (that is, it has sent an election message and is waiting for the election to end), it ignores it.

Advantages
• Number of messages and election time are very small in the average case.
• The highest priority node is always the leader.

Disadvantages
• The traffic is high in the worst case (when more than one node finds out that all leaders are down): around O(N^2).

The Old Leader

When the old leader comes back, it sends a leader message to all nodes. All nodes ignore the message except the first leader, and we have two cases here.

In the first case, the old leader's ID is greater than the first leader's ID. The first leader informs all nodes (including the old leader) that the old leader has become the first leader again; the first leader becomes the second leader, informs all nodes, and sends a falling message to the second leader. The second leader becomes the third leader, informs all nodes, and sends a falling message to the third leader. The third leader becomes a normal node.

In the second case, the old leader's ID is smaller than the first leader's ID. The first leader sends its ID to the old leader, which stores it as the first leader. The first leader then sends a supported-check message to the second leader so that it checks its ID against the old leader's, and so on until the old leader becomes the second or third leader or a normal node.

IV. CONCLUSION

After analyzing the bully algorithm and the modified and adaptive bully algorithms, we proposed a new algorithm for leader election, the Leader-Deputies algorithm. Our algorithm is better in both messaging complexity (from O(N^2) to O(N)) and downtime, due to the presence of backup leaders. We based our algorithm on the idea of the “leader collaboration”, consisting of one main leader and two leader assists having the three highest priorities among the alive nodes. All nodes know the three leaders, but they communicate with the first leader as the master and with the other two leaders as normal nodes. We broke up the system into five main cases and handled them all as stated in the methodology part. For future work, we intend to transform this idea into neat pseudocode and then implement it completely.

V. ACKNOWLEDGMENT

We would like to thank everyone who helped us bring this paper out and did not hesitate to provide us with the support we needed to improve this content and make it valuable.

REFERENCES

[1] M. Abdullah, I. Al-Kohali, and M. Othman, “An Adaptive Bully Algorithm for Leader Elections in Distributed Systems,” 2019.
[2] M. Afshari, M. Gholipour, M. Jahanshahi, and A. T. Haghighat, “Modified Bully Election Algorithm in Distributed Systems,” Aug. 2005.
[3] Tanwar, K. Kanth, S. Kumar, Abhishek, and M. Papoutsidakis, “Optimized Approach to Electing Coordinator Out of Multiple Election in a Ring Algorithm of Distributed System,” Feb. 2018.
[4] M. Khan, N. Agarwal, S. Jaiswal, and J. A. Khan, “An Announcer Based Bully Election Leader Algorithm in Distributed Systems,” 2018.
[5] M. Zargarnataj, “New Election Algorithm based on Assistant in Distributed Systems,” May 2007.
[6] M. S. Kordafshari, M. Jahanshahi, and A. M. Rahmani, “A New Approach for Election Algorithm in Distributed Systems,” Aug. 2009.
