Leader Deputies Algorithm for Leader Election in Distributed Systems
net/publication/346932388
4 authors, including:
All content following this page was uploaded by Ayman Azzam on 11 December 2020.
I. INTRODUCTION
In distributed computing systems, a job can generally be divided into sub-jobs that are distributed among different nodes (machines). For the job to be distributed, there must be a means of communication between those nodes. One common pattern is to choose a leader (coordinator) from among the nodes. The leader manages and organizes the distribution of tasks and the communication among all the nodes in the system. This approach has both pros and cons, though the pros outweigh the cons: it minimizes the coordination needed among the nodes, increases the efficiency of the system, and simplifies the architecture by making it hierarchical.

However, leader election raises two serious issues: first, how to choose the leader node, and second, which node will replace the leader if it crashes. A crash may happen for either software or hardware reasons. In either case, we have to adopt an algorithm (a leader election algorithm) that chooses another leader for the system with the least possible downtime and traffic. There are two traditional, well-known algorithms: the bully algorithm and the token ring algorithm. Many researchers have invested time and effort in decreasing the messaging complexity among the system nodes. Some keep modifying these two algorithms, while others have proposed new algorithms, achieving O(N^2) or O(N log N) messaging complexity.

In this paper we present a new algorithm inspired by both the traditional and the adaptive bully algorithms. The adaptive bully algorithm is presented in [1]. We not only managed to decrease the messaging traffic from O(N^2) to O(N) in both the average and the worst case, but also decreased the downtime and increased the reliability of the system by using a “leader collaboration”, which consists of the main leader and two leader-assist nodes. The “leader collaboration” is chosen to be the three highest-priority nodes currently alive in the system.

II. RELATED WORK
In this section we go through some papers that developed solutions to the leader election problem. One of the most well-known leader election algorithms is the bully algorithm. Paper [2] explains the bully algorithm and the unnecessary overhead traffic that comes with it, then adds a modification to solve this issue.

A. Bully Algorithm
If node P discovers that the leader is down, it sends election messages to the nodes with higher priorities. When those nodes receive the election messages, they send responses to P telling it to stop. Every node that received an election message in turn sends election messages to the nodes with higher priority than its own, and so on until the highest-priority live node wins. That node then sends coordination messages to all lower-priority nodes to tell them that it is the coordinator.

Figure 1. The bully election algorithm.

For example, as shown in Figure 1: (a) node 3 holds an election; (b) nodes 4 and 5 respond to node 3 with an okay message; (c) nodes 4 and 5 each start an election; (d) node 5 responds to node 4 with an okay message; (e) node 5 tells everyone that it is the coordinator.

B. Modified Bully Algorithm
If a node P discovers that the leader is down, P sends election messages to the set S of nodes with higher priorities. Each node in S sends a message with its priority to P. P sends a grant message to the node with the highest priority among the responses, granting it the leadership. That node sends coordination messages to the rest of the nodes to inform them that it is the leader. If a set of nodes R discovers that the leader is down, each node in R sends election messages to the set S of nodes with higher priorities. Each node in S sends a message with its priority to the node P that has the lowest priority in R; P then sends a grant message to the node with the highest priority among the responses, granting it the leadership. That node sends coordination messages to the rest of the nodes to inform them that it is the leader.

Figure 3. The ring election algorithm.
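As a rough illustration of the modified bully exchange described above, the following Python sketch counts the messages of a single election started by one node P. The function name `modified_bully_election`, the use of integer IDs as priorities, the decision to let P lead when no higher node answers, and the way coordination messages are counted are all assumptions made for this example, not details taken from [2].

```python
def modified_bully_election(all_ids, alive, initiator):
    """Simulate one modified-bully election started by `initiator` (node P).

    Assumptions for this sketch: a larger ID means a higher priority, and
    P does not know in advance which higher-priority nodes are still alive.
    Returns (new_leader, number_of_messages).
    """
    # P sends an election message to every node with a higher priority (set S).
    higher = [n for n in all_ids if n > initiator]
    messages = len(higher)

    # Only live members of S answer, each with one message carrying its priority.
    responders = [n for n in higher if n in alive]
    messages += len(responders)

    if responders:
        # P grants leadership to the highest-priority responder.
        leader = max(responders)
        messages += 1
    else:
        # Assumption: if nobody higher answers, P itself becomes the leader.
        leader = initiator

    # The new leader sends one coordination message to every other node.
    messages += len(all_ids) - 1
    return leader, messages


if __name__ == "__main__":
    # Six nodes, node 6 (the old leader) has crashed, node 3 notices.
    print(modified_bully_election(list(range(1, 7)), {1, 2, 3, 4, 5}, 3))
```

Under these counting assumptions the total is at most 3N - 2 messages for N nodes, so it grows linearly with N; the example call returns leader 5 after 11 messages.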
Different approach: In paper [1], the authors made a slightly different modification to the bully algorithm to reduce the number of messages used in the election process, bringing the message complexity down from O(N^2) to O(N). This replaces the time-consuming election process of the ordinary bully algorithm, in which many nodes each send messages to all nodes with a higher priority or ID than their own, which in the worst case ends up sending N^2 messages, where N is the number of nodes in the system. The Adaptive Bully Algorithm (ABA) uses the following variables to make the election process faster:

• EV: stores the ID of the current leader.
• Node ID: stores the ID number of the node itself.
• HPI (Highest Process Identification) & NHPI (Next HPI): store the highest two IDs during the election process.

When a node (Pd) sends a request to the leader and the leader does not answer, it considers the leader to have failed and starts the election process by sending its ID to all the nodes in the system. When a node (Pq) receives the ID of (Pd), it compares it with its own ID: if IDq > IDd, it sends its ID to (Pd); otherwise, it sends nothing. When (Pd) has received the IDs of all nodes whose IDs are greater than its own, it stores the highest two IDs in the HPI and NHPI variables. It then tells the node whose ID is stored in HPI to send messages to the rest of the nodes informing them that it is the leader. If it does not receive an “OK” message [...]

Methodology: The cluster of nodes has a leader and an assistant. The cluster itself is divided into sub-clusters, each of which has a leader and an assistant too. Within a sub-cluster, the leader and the assistant do not perform any additional function; their role appears when the leader of the whole cluster crashes.

When a node n sends a message to the leader and the leader does not respond within a certain time t, node n sends a message to the assistant telling it that the leader has died. The assistant in turn sends a checking message to the leader to make sure it is actually dead. If the leader responds to the assistant, the assistant sends a message to node n stating that the leader is still alive.

If the leader does not reply to the assistant within a time limit, the assistant sends an “I am the leader” message to the whole system in order to keep the system working. Afterwards, an election algorithm is carried out among the leaders and the assistants of the sub-clusters, including the assistant of the ex-leader. This election may result in a leader and an assistant other than the assistant of the ex-leader. The elected leader then sends an “I am the leader” message to the whole system once more, and the system returns to the normal state.

The election algorithm tries to choose the optimal node. The definition of optimal here can be summarized in two main points:
• The elected leader shall be the closest (physically) to all nodes.
• The elected leader shall have superior performance to all nodes.

Although the bully algorithm is simple, it causes heavy traffic over the network to elect a new leader. In paper [6], the researchers therefore proposed a new algorithm that is more efficient in terms of the number of messages and the number of stages, which decreased from at least 5 to at least 4 for electing a new leader. It works as follows:

There are K nodes ordered by ID. The first is the leader and the others are alternatives, which prevents a global election among all nodes.

If node P notices that the leader is down, it sends a message to the first alternative to elect it as the leader. If the alternative is alive, it sends an ok message to P, which informs P that the alternative is alive; the alternative then sends a message to the leader to make sure it is down. If the leader replies with an ok message, it is alive and P made a mistake, so the alternative informs P that the leader is alive. If the leader does not reply, the alternative is selected as leader and broadcasts a message to all nodes informing them that it is the leader and introducing P as its substitute. If the alternative is down, P sends another election message to the next alternative, and so on until one of them becomes the leader. If all the alternatives are down, P runs the Modified Election Algorithm as follows: [...]

III. METHODOLOGY
We have come up with an algorithm that is more reliable and has lower message and time complexity than the Bully Algorithm, the Adaptive Bully Algorithm and the Modified Bully Algorithm. Our algorithm substitutes a failed leader instantaneously, which saves the time otherwise spent electing a replacement, and it shares tasks among the leaders, which frees the leader to handle more requests from the nodes.

A. Leader-Deputies Algorithm
We have three important nodes: the first leader (the highest priority), the second leader (the second highest priority) and the third leader (the third highest priority). All nodes know the three leaders, but they communicate with the first leader as the master and with the other two leaders as normal nodes. We have several cases.

The first case: when a node sends a message to the master and the master does not respond, the node assumes that the leader has failed and sends a raising message to the second leader, as shown in Figure 5; the second leader replies with an okay message. The second leader finds that it got the message from a normal node, so it becomes the first leader and informs all nodes. It then sends a raising message to the third leader, which replies with an okay message. The third leader finds that it got the message from the first leader, so it becomes the second leader and informs all nodes. It then sends a raising message to the next available node (the live node with the next highest priority), which replies with an okay message. The next available node finds that it got the raising message from the second leader, so it becomes the third leader and informs all nodes.

The second case: when a node finds that the first and second leaders have failed, as shown in Figure 6, it sends a raising message to the third leader, which replies with an okay message. The third leader finds that it got the message from a normal node, so it becomes the first leader and informs all nodes. It then sends a raising message to the next available node X, which replies with an okay message. Node X finds that it got the message from the first leader, so it becomes the second leader and informs all nodes. It then sends a raising message to the next available node Y, which replies with an okay message. Node Y finds that it got the message from the second leader, so it becomes the third leader and informs all nodes.
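The promotion chain in the first two cases above can be sketched in Python as follows. This is a minimal model under stated assumptions: a larger ID means a higher priority, the function name `reassign_leaders` and its return shape are hypothetical, okay replies and "inform all nodes" broadcasts are not modelled, and the situation where all three leaders are dead is deliberately left out because the text resolves it with a bully-style election.

```python
def reassign_leaders(leaders, alive, detector):
    """Rebuild the leader triple after some leaders have failed.

    leaders  : [first, second, third] before the failure
    alive    : set of IDs of nodes that are currently alive
    detector : the normal node that noticed the failure

    Returns (new_leaders, raises), where `raises` lists the raising
    messages as (sender, receiver) pairs in the order they are sent.
    """
    # Surviving leaders keep their relative order and move up.
    survivors = [n for n in leaders if n in alive]
    if not survivors:
        # Not modelled here: the text falls back to an election in this case.
        raise ValueError("all leaders failed; an election is required")

    # Remaining slots go to the live non-leader nodes with the highest IDs
    # (assumption: larger ID = higher priority).
    spares = sorted((n for n in alive if n not in leaders), reverse=True)
    new_leaders = (survivors + spares)[:3]

    # The detector raises the new first leader, which raises the new
    # second leader, which raises the new third leader.
    raises = []
    sender = detector
    for node in new_leaders:
        raises.append((sender, node))
        sender = node
    return new_leaders, raises


if __name__ == "__main__":
    # Case 1: first leader (9) is dead, node 2 notices.
    print(reassign_leaders([9, 8, 7], set(range(1, 9)), detector=2))
    # Case 2: first and second leaders (9, 8) are dead.
    print(reassign_leaders([9, 8, 7], set(range(1, 8)), detector=2))
```

Each promotion costs one raising message, so the chain uses at most three raising messages regardless of the number of nodes.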
Figure 6. Leader-Deputies Algorithm, Case 2.

The third case: when a node finds that the first, second and third leaders have all failed, as shown in Figure 7, it applies the Adaptive or the normal Bully Algorithm to elect the highest-priority node as the first leader. The first leader then sends a raising message to the next available node X, which replies with an okay message. Node X finds that it got the message from the first leader, so it becomes the second leader and informs all nodes. It then sends a raising message to the next available node Y, which replies with an okay message. Node Y finds that it got the message from the second leader, so it becomes the third leader and informs all nodes.

Figure 8. Leader-Deputies Algorithm when the first and third leaders are dead.

The fifth case is the same as the third case, except that several nodes detect that the first, second and third leaders have failed, and all of them run the Adaptive or the normal Bully Algorithm. However, it is not clear how the adaptive bully algorithm deals with several nodes finding that the HPI and NHPI nodes have both failed, and with the normal bully algorithm we would get more than O(N^2) traffic, so we have made a modification to both the adaptive and the normal bully algorithms.

1) Modified Adaptive Bully Algorithm: If the HPI and NHPI nodes both fail and the discovering node decides to declare itself the new leader, a conflict may arise when more than one node (candidate nodes) each send an “I’m the Coordinator” message announcing that it is the newly elected leader. We came up with a solution that handles this case. Each candidate node first sends a leader candidacy announcement message (LCDM) to inform the rest of the nodes that it is a candidate leader. Every candidate then waits for a time interval T to receive an LCDM from the other candidate(s), after which there are two cases:
• If the received LCDM is from a candidate with a lower priority than its own, it still considers itself a candidate.
• If the received LCDM is from a candidate with a higher priority than its own, it no longer considers itself a candidate.
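The two LCDM rules above can be sketched in Python as below. The sketch assumes that every LCDM reaches every other candidate within the interval T, that IDs are unique, and that a larger ID means a higher priority; the function name `resolve_candidates` is hypothetical.

```python
def resolve_candidates(candidates):
    """Return the candidates that still consider themselves candidates
    after the waiting interval T has elapsed.

    Assumptions for this sketch: all LCDMs are delivered within T,
    IDs are unique, and a larger ID means a higher priority.
    """
    remaining = []
    for me in candidates:
        # LCDMs this candidate receives from the other candidates.
        received = [other for other in candidates if other != me]
        # Rule 1: an LCDM from a lower-priority candidate changes nothing.
        # Rule 2: an LCDM from a higher-priority candidate makes it withdraw.
        if all(other < me for other in received):
            remaining.append(me)
    return remaining


if __name__ == "__main__":
    print(resolve_candidates([4, 9, 6]))  # three nodes declared candidacy
```

Under these assumptions exactly one candidate survives, the one with the highest priority, so only one “I’m the Coordinator” announcement follows.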
The first case: if the old leader’s ID is greater than the first leader’s ID, the first leader informs all nodes (including the old leader) that the old leader has become the first leader again. The first leader then becomes the second leader, informs all nodes, and sends a falling message to the second leader. The second leader becomes the third leader, informs all nodes, and sends a falling message to the third leader. The third leader becomes a normal node.
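The demotion chain in this first case can be sketched in Python as follows. Only the case stated above is modelled; the function name `reinstate_old_leader`, the return shape, and the assumption that a larger ID means a higher priority are choices made for this example.

```python
def reinstate_old_leader(leaders, old_leader):
    """Apply the falling chain when a recovered old leader outranks the
    current first leader.

    leaders    : [first, second, third] before the old leader returns
    old_leader : ID of the recovered node (assumption: larger ID = higher priority)

    Returns (new_leaders, demoted), where `demoted` is the former third
    leader that becomes a normal node again.
    """
    first, second, third = leaders
    if old_leader <= first:
        # Other cases are not described in this passage, so they are not modelled.
        raise ValueError("only the case old_leader > first leader is modelled")

    # Old leader takes the first slot; every current leader falls one slot.
    new_leaders = [old_leader, first, second]
    return new_leaders, third


if __name__ == "__main__":
    print(reinstate_old_leader([8, 7, 6], 9))
```

The chain needs two falling messages (first to second, second to third), independent of the number of nodes.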
IV. CONCLUSION
After analyzing the bully algorithm and the modified and adaptive bully algorithms, we proposed a new leader election algorithm, the Leader-Deputies algorithm. Our algorithm improves both the messaging complexity (from O(N^2) to O(N)) and the downtime, the latter due to the presence of backup leaders. We based our algorithm on the idea of the “leader collaboration”, consisting of one main leader and two leader assists that hold the three highest priorities among the alive nodes. All nodes know the three leaders, but they communicate with the first leader as the master and with the other two leaders as normal nodes. We broke up the system