DC Midsems
Definitions
• Resources:
– There are two types of resources in a computer system
• Reusable resources
– They are fixed in number; they can neither be created nor destroyed
– To use a resource, a process must request it, hold it during use (allocation) and release it on completion
– A released resource may then be re-allocated to another process
– Examples: memory, CPU, printer, disk blocks, etc.
• Consumable resources
– These resources vanish once they are consumed
– A producer process can produce any number of consumable resources as long as it is not blocked
– Examples: messages, interrupt signals, V operations on a semaphore, etc.
Type of Resource Accesses
• Shared
– In this mode, the resource can be accessed by any number of processes simultaneously
– Example: read lock on a data item
• Exclusive
– In this mode, the resource can be accessed by only one process at any point in time
– Example: write lock on a data item
– The theory of deadlocks mostly considers exclusive access
– A reusable resource can be accessed in exclusive or shared mode, but in only one mode at a time
– Consumable resources are always accessed in exclusive mode
Resource Request Model
• Single unit resource request model
– In this model, a process may request only one unit of a resource at a time. The process is blocked until that resource is allocated.
– Example: a transaction (process) requests the write lock on a data item [write_lock(X)]
• AND request model
– In this model, a process may request multiple resources simultaneously. It is blocked until all the requested resources are available.
– Example: suppose the data item X is replicated at N sites. A transaction requesting the write lock on X must request the lock at all N sites where X is located, and it is blocked until all N write-lock requests are granted.
Resource Request Model Contd.
• OR request model
– In this model, a process may request multiple resources simultaneously; however, it is blocked only until at least one resource is allocated.
– Example: suppose the data item X is replicated at N sites. A transaction requesting the read lock on X requests the lock at all N sites where X is located; however, it is blocked only until at least one of the read-lock requests is granted.
• AND-OR request model
– Here the request of a process is specified as a predicate whose atoms / variables are the resources.
– Example: R1 AND (R2 OR R3)
• P out of Q request model
– Here, a process simultaneously requests Q resources and is blocked until any P out of the Q resources are available (see the sketch below).
– Note that if P = 1, this reduces to the OR request model; if P = Q, it reduces to the AND request model.
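A minimal Python sketch (illustrative, not from the slides) of how the P-out-of-Q model subsumes the other two: a request over Q resources unblocks once at least P of them are available.

```python
def request_satisfied(requested, available, p):
    """P-out-of-Q model: the request over the set `requested` (Q resources)
    is granted once at least p of them are available."""
    granted = requested & available
    return len(granted) >= p

requested = {"R1", "R2", "R3"}
available = {"R2", "R3"}
print(request_satisfied(requested, available, p=1))               # OR model -> True
print(request_satisfied(requested, available, p=len(requested)))  # AND model -> False
```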
Deadlock ‐ General
• A set of processes is said to be in a deadlock state if each of them is waiting for resources to be released by another process in the set.
• Necessary conditions for deadlock:
– Mutual exclusion: the non-sharable nature of the resources. Example: a memory location
– No preemption: an allocated resource cannot be preempted from a process before the process releases it
– Hold and wait: a process holds some resources while waiting for other resources
– Circular wait: the processes wait for one another's resources in a circular fashion
• Sufficiency:
– Note that the above conditions are not sufficient to conclude that the set of processes is deadlocked. However, once the set of processes is in deadlock, all of these conditions can be observed; hence they are necessary conditions.
Deadlock Handling Strategies
• Deadlock Prevention
– Idea: resources are granted to requesting processes in such a way that there is no chance of deadlock (vaccination). For example, allocate the requested resources only if all of them are available; otherwise wait for all of them. [So the hold-and-wait condition cannot hold.]
• Deadlock Avoidance
– Idea: resources are granted as and when requested, provided the resulting system state is safe. A state is safe if there exists at least one execution sequence for all the processes such that all of them can run to completion without entering a deadlock.
• Deadlock Detection and Recovery
– Idea: resources are allocated to processes as and when requested, and deadlock is detected by a deadlock detection algorithm. If a deadlock is detected, the system recovers from it by aborting one or more deadlocked processes.
Distributed Deadlock Algorithms
Distributed Deadlock Prevention
• Basic Idea:
1. Each process is assigned a globally unique timestamp using Lamport's logical clock, process number and site number [i.e., <logical clock value, process id, site id>].
2. Every resource request from a process carries the process's timestamp.
3. The timestamp of the requesting process is compared with that of the process holding the resource, and a suitable decision is made to prevent the occurrence of deadlock.
Distributed Deadlock Prevention
• Algorithm for distributed deadlock prevention
– Suppose a resource R is held by P1 at some site, and process P2 requests R. Let TS(P1) and TS(P2) be the timestamps of P1 and P2 respectively.
– Wait-die method:
• If TS(P2) < TS(P1) then P2 waits /* P2 is older */
• Else P2 is killed /* P2 is younger */
[Figure: possible waiting sequence for resources, TS(P1) > TS(P2) > … > TS(Pn), assuming Pi is waiting for some resource held by Pi-1]
Distributed Deadlock Prevention
– Notes on the Wait-die method:
1. P2 waits if the resource holder (i.e., P1) is the younger process
2. P2 is killed if the resource holder is the older process
3. A killed process is restarted with the SAME timestamp; after some time it will be the older one and will not be killed again
4. No circular wait can hold in this method: the waiting sequence TS(P1) > TS(P2) > … > TS(Pn) would lead to a circular wait only if P1 waits for some resource held by Pn, i.e., P1 → Pn. That is possible only if TS(P1) < TS(Pn), which contradicts the waiting sequence, i.e., TS(P1) > TS(Pn).
5. There is no preemption of the resource holder in this method; the requester (P2) either waits or dies.
Distributed Deadlock Prevention
– Wound-wait method:
• If TS(P2) < TS(P1) then P1 is killed /* P2 is older */
• Else P2 waits /* P2 is younger */
[Figure: possible waiting sequence for resources, TS(P1) < TS(P2) < … < TS(Pn), assuming Pi is waiting for some resource held by Pi-1]
Distributed Deadlock Prevention
– Notes on the Wound-wait method:
1. An older process never waits for a younger resource holder
2. P1 is killed if the resource requester is the older process
3. A killed process is restarted with the SAME timestamp; after some time it will be the older one and will not be killed again
4. No circular wait can hold in this method: the waiting sequence TS(P1) < TS(P2) < … < TS(Pn) would lead to a circular wait only if P1 waits for some resource held by Pn, i.e., P1 → Pn. That is possible only if TS(P1) > TS(Pn), which contradicts the waiting sequence, i.e., TS(P1) < TS(Pn).
5. There is preemption of the resource holder in this method: either the requester (P2) waits or the resource holder (P1) is wounded (see the sketch below).
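Both decision rules in a minimal Python sketch, assuming totally ordered timestamps where a smaller value means an older process:

```python
def wait_die(ts_holder, ts_requester):
    # Wait-die: an older (smaller-timestamp) requester waits;
    # a younger requester is killed ("dies") and restarts later
    # with the same timestamp.
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_holder, ts_requester):
    # Wound-wait: an older requester preempts ("wounds") the holder;
    # a younger requester waits.
    return "wound holder" if ts_requester < ts_holder else "wait"

# P1 holds R with timestamp 10; P2 requests R with timestamp 5 (older).
print(wait_die(ts_holder=10, ts_requester=5))    # wait
print(wound_wait(ts_holder=10, ts_requester=5))  # wound holder
```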
Distributed Deadlock Prevention
• Methods to handle more than one process waiting on the same resource (R):
– Method 1:
• At most one process is allowed to wait for the resource; all other processes are killed. If another process P3 requests the same resource R, Wound-wait is applied between P2 and P3 to select the oldest. Then either Wound-wait or Wait-die is used between the oldest waiting process and P1, the resource holder.
– Method 2:
• The waiting processes are ordered by increasing timestamp. A new process requesting the resource is made to wait if it is not older than the resource holder; if it is older, Wound-wait is applied between the new process and the resource holder.
Distributed Deadlock Detection and
Recovery
• Two components in this strategy:
– Distributed deadlock detection
– Distributed deadlock recovery
• Distributed deadlock detection
– Using the Wait-For Graph (WFG)
• The WFG is a directed graph (V, E) where the vertices are the processes and a directed edge eij indicates that process Pi is waiting for a resource held by process Pj.
• Process Pi may reside at any node of the DCS.
• All resources are assumed to be single-unit.
Distributed Deadlock Detection
• In the Single Unit Resource Request Model:
– A deadlock in this model is detected by the existence of a cycle in the WFG (a detection sketch follows)
– Note that a process can be involved in only one cycle
[Figure: WFG containing a cycle among processes P1, P3 and P4]
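Cycle detection in the WFG is ordinary graph cycle detection. A minimal sketch, assuming the WFG is given as an adjacency mapping (names are illustrative):

```python
def has_cycle(wfg):
    """Detect a cycle in a wait-for graph given as
    {process: set of processes it waits for}, via a three-colour DFS."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {p: WHITE for p in wfg}

    def visit(p):
        colour[p] = GREY
        for q in wfg.get(p, ()):
            if colour.get(q, WHITE) == GREY:      # back edge => cycle
                return True
            if colour.get(q, WHITE) == WHITE and visit(q):
                return True
        colour[p] = BLACK
        return False

    return any(colour[p] == WHITE and visit(p) for p in list(wfg))

# P1 -> P3 -> P4 -> P1 is a cycle, hence a deadlock in the single-unit model.
print(has_cycle({"P1": {"P3"}, "P3": {"P4"}, "P4": {"P1"}}))  # True
```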
Distributed Deadlock Detection
• In the AND Request Model:
– A deadlock in this model is also detected by the existence of a cycle in the WFG
– Note that a process can be involved in more than one cycle
[Figure: WFG over P1…P6 with P3 in both cycles. P3 has requested the resources held by P1 and P5, and P3 is holding the resources requested by P2 and P6.]
Distributed Deadlock Detection
• In the OR Request Model:
– A cycle in the WFG is not a sufficient condition for the existence of a deadlock
– Note that a process can be involved in more than one cycle
[Figure: WFG over P1…P6 containing cycles]
Distributed Deadlock Detection
• In the OR Request Model:
– A cycle in the WFG is not a sufficient condition for the existence of a deadlock
– Cycles in the WFG do not imply a deadlock situation here. For instance, P1 has requested the resources held by P2, P4 and P5. Once it gets the resource held by P4, the request edges from P1 to P4, P1 to P5 and P1 to P2 are removed; then there is no cycle and no deadlock.
[Figure: the same WFG over P1…P6 as above]
Distributed Deadlock Detection
• OR Request Model
– The necessary and sufficient condition for detecting a deadlock is the presence of a knot.
– Knot: a set of processes S is said to be a knot if
• ∀ Pi ∈ S,
– Dependency Set(Pi) ⊆ S and
– Dependency Set(Pi) ≠ ∅
– Dependency Set of a process Pi (DS(Pi)): the set of all processes from which Pi is expecting units of resources to be released.
– A knot implies deadlock in any resource request model
Distributed Deadlock Detection
• OR Request Model: an illustrative example
[Figure: WFG over P1…P6]
Here, DS(P1) = {P2, P4, P5}, DS(P2) = {P3}, DS(P3) = {P1}, DS(P4) = {}, DS(P5) = {P6}, DS(P6) = {P1}.
Note that S = {P1, P2, P3} is not a knot, because DS(P1) is not contained in S. If you include P4 and P5, then S = {P1, P2, P3, P4, P5} is again not a knot because DS(P4) is the null set; and so on. A similar argument shows that S = {P1, P5, P6} is not a knot either. (A knot check is sketched below.)
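The knot condition above translates directly into code. A minimal sketch using the dependency sets of this example:

```python
def is_knot(S, ds):
    """Knot condition from the slides: every process in S has a
    non-empty dependency set that stays entirely inside S."""
    return all(ds[p] and ds[p] <= S for p in S)

# Dependency sets from the illustrative example above.
ds = {"P1": {"P2", "P4", "P5"}, "P2": {"P3"}, "P3": {"P1"},
      "P4": set(), "P5": {"P6"}, "P6": {"P1"}}

print(is_knot({"P1", "P2", "P3"}, ds))              # False: DS(P1) not inside S
print(is_knot({"P1", "P2", "P3", "P4", "P5"}, ds))  # False: DS(P4) is empty
```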
Distributed Deadlock Detection
• In the AND-OR Request Model:
– The presence of a KNOT in the WFG implies that the system is in deadlock
• In the P out of Q Request Model:
– The presence of a KNOT in the WFG implies that the system is in deadlock
Requirements of Distributed Deadlock
Detection Algorithm
• If there is a deadlock, the algorithm should detect it (i.e., the algorithm should detect all deadlocks).
• If the algorithm reports a deadlock, then one must actually exist (i.e., no false deadlock detection by the algorithm).
Pseudo Deadlock in Distributed
Environment
• Let P1, P2, …, Pn be a sequence of processes such that Pi is waiting for the release of resources held by Pi+1 (where 1 ≤ i ≤ n-1).
• Suppose Pn releases its resources first and then requests the resource held by P1. For this, Pn sends a message (M1) to the resource controller to release the allocated resource for which Pn-1 is waiting, and then sends a message (M2) to the resource controller requesting the resource held by P1.
• If M2 reaches the deadlock detection algorithm before M1, a false / pseudo deadlock is detected!
[Figure: chain P1, P2, P3, …, Pn-1, Pn, with Pn's Release (M1) and Request (M2) messages]
Distributed Deadlock Detection
Algorithms
• Centralized Approach
[Figure, repeated across four slides: the local wait-for graphs at sites S1 and S2 (transactions T1, T2, T3 and resources R1, R2, R3) and the global wait-for graph assembled at the coordinator site. A later request message M2 arriving at the coordinator before the earlier release message makes the global WFG show a cycle that does not actually exist.]
Solution: timestamp-based messages, ordered at the coordinator
Handling of Pseudo Deadlock
• Pseudo deadlocks can be handled using timestamps based on Lamport's clock values.
• Every message pertaining to a LWFG sent from a local site to the coordinator carries a timestamp.
• If the coordinator observes a cycle due to a message M from a local site, the coordinator broadcasts a query: does any site have a message with a timestamp less than M's?
• The decision about the cycle is taken only after receipt of the acknowledgements from all the local sites.
Handling of Pseudo Deadlock
• In the above example, since T3 released R2 before requesting R3, M1's timestamp must be smaller than M2's.
• When the coordinator receives M2, it suspects a deadlock and broadcasts a query asking whether anyone has a message with a timestamp less than M2's. Site S1 then sends a positive acknowledgement regarding M1. The coordinator re-forms the GWFG applying M1 first and then M2; hence no deadlock. (A sketch of this timestamp ordering follows.)
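A minimal sketch of the ordering idea: the coordinator buffers LWFG messages keyed by Lamport timestamp and applies them in timestamp order. (The broadcast/acknowledgement step that assures the coordinator no smaller-timestamped message is still in flight is omitted here; message contents are illustrative.)

```python
import heapq

class Coordinator:
    """Buffer LWFG update messages and apply them in Lamport-timestamp
    order, so an older release is never applied after a newer request."""
    def __init__(self):
        self.pending = []

    def receive(self, ts, site, op):
        heapq.heappush(self.pending, (ts, site, op))

    def apply_in_order(self):
        while self.pending:
            ts, site, op = heapq.heappop(self.pending)
            print(f"apply {op} from site {site} at ts {ts}")

c = Coordinator()
c.receive(5, "S2", "request: T3 waits for R3")   # M2 arrives first...
c.receive(3, "S1", "release: T3 released R2")    # ...but M1 has the older timestamp
c.apply_in_order()  # release (ts 3) applied before request (ts 5): no false cycle
```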
Chandy Misra Haas Algorithm
• Here, processes are allowed to request multiple resources at a time, so a process may wait for two or more resources.
• A process may wait either for resources held by its co-processes (i.e., processes on the same machine) or by processes on other machines.
• The algorithm is invoked when a process has to wait for a resource.
• The waiting process generates a probe message and sends it to the process from which it is awaiting the resource.
Chandy Misra Haas Algorithm
• The probe message consists of three components:
– probe originator-id, sender-id, receiver-id
• When a process receives the probe message, it checks whether it is itself waiting for any process(es). If so, it updates the 2nd and 3rd fields of the probe and forwards it to the process(es) for which it is waiting.
• If the probe message goes all the way round and comes back to the originator (i.e., probe originator-id == receiver-id), then the set of processes along the path of the probe message are in deadlock.
• The probe initiator may identify itself as the victim and commit suicide to break the deadlock.
Chandy Misra Haas Algorithm
The probe messages initiated by process 0 are shown below; an arrow from process i to process j indicates that process i is waiting for the resource held by j.
[Figure: processes 0–8 spread across Sites 0, 1 and 2, with probes (0,0,1), (0,2,3), (0,4,6), (0,5,7) and (0,8,0); the probe from process 0 is forwarded by process 2]
Probe (0,0,1) is initiated by process 0, since it is waiting for the resource held by process 1.
Once process 0 receives the probe (0,8,0), it realizes that it is in a set of deadlocked processes. So it identifies itself to commit suicide to break the deadlock. (A simulation sketch follows.)
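A single-machine simulation of the probe propagation. The wait-for edges below are assumptions loosely reconstructed from the figure (the exact edge set is partly illustrative):

```python
# Assumed wait-for edges (process i -> set of processes it waits on).
wait_for = {0: {1}, 1: {2}, 2: {3, 5}, 3: {4}, 4: {6},
            5: {7}, 6: {8}, 7: set(), 8: {0}}

def detect_deadlock(initiator):
    """Forward (originator, sender, receiver) probes along wait-for edges;
    a probe arriving back at its originator signals a deadlock."""
    probes = [(initiator, initiator, r) for r in wait_for[initiator]]
    forwarded = set()
    while probes:
        origin, sender, receiver = probes.pop()
        if receiver == origin:
            return True                      # probe came all the way round
        if receiver in forwarded:
            continue                         # already forwarded from here
        forwarded.add(receiver)
        for nxt in wait_for.get(receiver, ()):
            probes.append((origin, receiver, nxt))
    return False

print(detect_deadlock(0))   # True: 0 -> 1 -> 2 -> 3 -> 4 -> 6 -> 8 -> 0
```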
Chandy Misra Haas Algorithm
• The problem:
– In the above example, it is possible that processes 0, 1, 2, 3, 4, 6 and 8 each initiate probe messages and each identify themselves as the victim to commit suicide.
– This leads to the unnecessary termination of many processes on the same deadlock path.
• Solution:
– The process ids along the way are attached to the probe message in the form of a queue.
– When the probe message comes back to the originator, it picks the process with the highest / lowest process id in the queue, selects it as the victim, and sends a message telling the victim to commit suicide.
– Even though many processes may initiate probe messages, the same set of processes is identified in the cycle, and there is a single highest / lowest process id in that cycle. Hence only one victim is selected.
Summary
• Generic resource request models are
discussed
• Distributed deadlock prevention algorithms
and distributed deadlock detection and
recovery algorithms are outlined.
References
• Advanced Operating Systems
– M. Singhal and N. G. Shivaratri, McGraw-Hill International
• Distributed Operating Systems
– A. S. Tanenbaum, Prentice Hall
• Distributed Algorithms
– Nancy A. Lynch, Morgan Kaufmann
Load Balancing in Distributed Systems
CPU Scheduling - Conventional
• Issue: in a multiprogramming environment (with a single CPU), which job is scheduled on the processor NEXT?
• Need: to allocate a job for execution
DIFFERENT SCHEDULING TECHNIQUES FOR JOBS:
1. FIRST COME FIRST SERVE
2. PRIORITY BASED
3. ROUND ROBIN BASED
4. MULTI-LEVEL FEEDBACK QUEUES
5. ETC.
Load (Job) Scheduling
• Issue: in a distributed environment, which job is scheduled on which distributed processor?
• Need: to allocate a job for execution
[Figure: a stream of jobs to be assigned among the distributed processors]
Load Balancing
• Issue: redistribution of processes/tasks in the DCS.
• Redistribution: movement of processes from heavily loaded systems to lightly loaded systems
• Need: to improve the distributed system's throughput
[Figure: processes migrating from loaded nodes to lightly loaded nodes]
Job Scheduling
[Figure: a stream of jobs arrives at the job scheduler, which dispatches each job to the local queue(s) and CPU of one of the sites 1..N]
This can be considered as a queueing model: a multi-job, multi-queue system
Job Scheduling Policies
• Random:
– A simple, static policy
– The job scheduler randomly allocates the job to site i with some probability pi, where Σ pi = 1
– No site state information is used
• Cyclic:
– The job scheduler allocates the job to site i if the previous job was allocated to site i-1, i.e., next site = (previous site + 1) mod N
– It is a semi-static policy, in which the job scheduler remembers the previously allocated site.
• Join the Shortest Queue (JSQ):
– The job scheduler tracks the size of the local queue at each site.
– The job is allocated to the queue that is shortest at the moment the job arrives. (The three policies are sketched below.)
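A minimal Python sketch of the three policies (helper names are illustrative):

```python
import random

def random_policy(queues, probs):
    """Static: pick site i with probability p_i (the p_i sum to 1)."""
    return random.choices(range(len(queues)), weights=probs)[0]

def cyclic_policy(prev, n):
    """Semi-static: remember the previous site and move to the next, mod N."""
    return (prev + 1) % n

def jsq_policy(queues):
    """Join the Shortest Queue: needs the current queue length of every site."""
    return min(range(len(queues)), key=lambda i: queues[i])

queues = [4, 1, 3]                        # current local queue lengths
print(random_policy(queues, [0.5, 0.3, 0.2]))
print(cyclic_policy(prev=2, n=3))         # 0
print(jsq_policy(queues))                 # 1
```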
Job Scheduling Policies - Parameters of Interest
• Mean response time of jobs
• Mean delay of jobs
• Throughput of the system
• On these parameters, JSQ clearly has the edge over the other two policies.
Load Balancing
• Basically two types:
– Sender Initiated Load Balancing
– Receiver Initiated Load Balancing
Sender Initiated Load Balancing
Components of Sender Initiated Load Balancing
• Idea: a node with a higher load (the sender) initiates the load balancing process.
• Transfer Policy
– Decides whether to keep a task/process at that site or transfer it to some other site (node)
• Location Policy
– If the task is to be transferred, decides where to transfer it
• Note that any load balancing algorithm should have these two components
Transfer Policy
• At each node, there is a queue
• If the queue length of a node < τ (threshold)
– A task originating there is processed at that node only
• Else
– It is transferred to some other node
[Figure: local queue feeding a CPU, with the threshold τ marked on the queue]
• In this policy, each node uses only local state information
Location Policy
• Random Policy
• Threshold Location Policy
• Shortest Location Policy
Random Policy
• No node status information is used
• A destination node is selected at random and the task is transferred to it
• On receipt of the task, the destination node does the following:
– If its queue length is < τ, it accepts the task
– Else it transfers the task to some other random node
• If the number of transfers reaches some limit Llimit, the last recipient must execute the task irrespective of its load. This avoids unnecessary thrashing of jobs.
Threshold Location Policy
• Uses node status information, to some extent, about the destination nodes.
• A node is selected at random, then probed to determine whether transferring the task to it would push its load above the threshold.
– If not, the task is transferred, and the destination node must process that task regardless of its state when the task actually arrives.
– If so, another node is selected at random and probed in the same manner.
• This continues until either a suitable destination is found or the number of probes reaches some limit Tlimit. If the number of probes exceeds Tlimit, the originating node processes the task itself.
Shortest Location Policy
• Uses additional information about the status of other nodes to make the "best" choice of destination node.
• In this policy, Lp nodes are chosen at random and each is polled to determine its queue length.
• The task is transferred to the node with the shortest queue length among those whose queue length is < τ.
• If none has a queue length < τ, the next Lp nodes are polled and the step above is repeated.
• Once the number of node groups polled reaches some limit Ls, the originator handles the task itself. (Both probing policies are sketched below.)
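A minimal sketch of the two probing location policies, assuming each probe reply is a snapshot of the polled node's queue length (the values of τ, Tlimit and Lp are illustrative):

```python
import random

TAU, T_LIMIT = 3, 5   # threshold and probe limit (illustrative values)

def threshold_location(nodes):
    """Probe random nodes; transfer to the first node whose load would stay
    below the threshold after receiving the task, else give up."""
    for _ in range(T_LIMIT):
        node = random.choice(list(nodes))
        if nodes[node] + 1 < TAU:      # probe: would the transfer overload it?
            return node                # transfer the task here
    return None                        # originator processes the task itself

def shortest_location(nodes, lp):
    """Poll Lp random nodes; pick the shortest queue among those below τ."""
    polled = random.sample(list(nodes), lp)
    candidates = [n for n in polled if nodes[n] < TAU]
    return min(candidates, key=nodes.get) if candidates else None

nodes = {"A": 5, "B": 1, "C": 4, "D": 2}   # node -> current queue length
print(threshold_location(nodes))           # e.g. 'B' (output is randomized)
print(shortest_location(nodes, lp=3))      # e.g. 'B' or 'D'
```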
Receiver Initiated Load Balancing
Components of Receiver Initiated Load Balancing
• Idea: an under-loaded node (the receiver) initiates the load balancing process. The receiver tries to get a task from an overloaded node, the sender.
• Transfer Policy (threshold policy)
– The decision is based on the CPU queue length: if it falls below a certain threshold τ, the node is identified as a receiver that should obtain a task from a sender
• Location Policy
– If the node decides to receive, from where should it receive?
Location Policy
• Threshold Location Policy
– A random node is probed to see whether it is a potential sender. If so, a task is transferred from the polled node. Else the process is repeated until a potential sender is found or the number of tries reaches a PollLimit.
• Longest Location Policy
– A pool of nodes is selected and probed to find the potential sender with the longest queue length (greater than τ). If one is found, a task is received from that sender; else the above process is repeated with a new pool of nodes.
Drawback of Receiver Initiated Algorithms
• Most of the tasks selected for transfer from the senders are preemptive ones.
– The reason: the job scheduler always gives higher priority to allocating a fresh job to the processor than to existing processes at various stages of execution. So, by the time the receiver decides to pick a task, that task has already undergone some execution.
Symmetrically Initiated Algorithm
• These algorithms have both sender-initiated and receiver-initiated components.
• The idea is that at low system loads the sender-initiated component is more successful at finding under-loaded nodes, while at high system loads the receiver-initiated component is more successful at finding overloaded nodes.
Symmetrically Initiated Algorithm
• Above Average Algorithm
• Adaptive Algorithms
– Stable Symmetrically Initiated Algorithm
– Stable Sender Initiated Algorithm
Above Average Algorithm
• Idea: an 'acceptable range (AR)' of load is maintained.
– A node is treated as a sender if its load > AR
– A node is treated as a receiver if its load < AR
– Else it is a balanced node.
• Transfer Policy: AR is defined by two adaptive thresholds, equidistant from the estimated average load of the system
– For example, if the estimated average load of the system = 2, then the lower threshold (LT) = 1 and the upper threshold (UT) = 3
– So, if the load of the node is <= LT, it is a receiver node; if the load of the node is >= UT, it is a sender node; it is a balanced node otherwise.
Above Average Algorithm – Contd.
• Location Policy: consists of two components
– Sender Initiated Component:
1. A node with load > AR is called a sender. The sender broadcasts a TOOHIGH message, sets a TOOHIGH timeout alarm and listens for an ACCEPT message.
2. On receipt of a TOOHIGH message, a receiver (whose load < AR)
• cancels its TOOLOW timeout alarm
• sends an ACCEPT message to the node that sent the TOOHIGH message
• increments its load value
• sets an AWAITINGTASK timeout alarm
• if no task is transferred within the AWAITINGTASK period, its load value is decremented
3. On receipt of an ACCEPT message, the sender sends the task to the receiver. [Note that the broadcast TOOHIGH message may be received by many receivers; the sender transfers the task in response to the first ACCEPT message.]
4. If the TOOHIGH timeout period expires without the sender receiving any ACCEPT message, the sender infers that its estimate of the average system load is too low. To correct the problem, it broadcasts a CHANGEAVERAGE message to increase the estimated average load at all other sites.
Above Average Algorithm – Contd.
– Receiver Initiated Component:
1. The receiver broadcasts a TOOLOW message, sets a TOOLOW timeout alarm and waits for a TOOHIGH message.
2. On receipt of a TOOHIGH message, it performs the activities of step 2 of the Sender Initiated Component.
3. If the TOOLOW timeout period expires, it infers that its estimate of the average system load is too high and broadcasts a CHANGEAVERAGE message to decrease the estimated average load at all sites.
Stable Symmetrically Initiated
Algorithm
• Idea: in this algorithm, the information gathered during polling is used to classify each node as SENDER, RECEIVER or BALANCED.
• Each node maintains a list for each of the three classes.
• Since the algorithm updates its lists based on what it learns by probing, the probability of selecting the right candidate for load balancing is high.
• Unlike the above average algorithm there is no broadcasting, so fewer messages are exchanged.
• Initially, each node assumes that every other node is a RECEIVER. So the SENDER and BALANCED lists are empty to start with.
Stable Symmetrically Initiated
Algorithm – Contd.
• Transfer Policy:
– Triggered when a task originates or departs.
– Uses two thresholds: UT (upper threshold) and LT (lower threshold).
– A node is a sender if its queue length > UT, a receiver if its queue length < LT, and balanced if LT ≤ queue length ≤ UT.
Stable Symmetrically Initiated
Algorithm – Contd.
• Location Policy: has two components
– Sender Initiated Component:
1. When a node becomes a sender, it polls the node at the head of its RECEIVER list. The polled node removes the sender from its own RECEIVER list and puts it at the head of its SENDER list (i.e., it learns!). It also informs the sender whether it is itself a sender, a receiver or a balanced node.
2. On receipt of the reply, the sender does the following:
• If the polled node is a receiver, the sender transfers the task to it and updates its lists (putting the polled node at the head of its RECEIVER or BALANCED list)
• Otherwise, it updates its lists (putting the polled node at the head of its SENDER or BALANCED list) and polls the next node in its RECEIVER list
3. The polling process stops if
• A receiver is found, or
• The RECEIVER list is empty, or
• The number of polls reaches POLL-LIMIT
4. If polling fails, the arriving task has to be processed locally; however, it may later migrate under the preemptive category.
Stable Symmetrically Initiated
Algorithm – Contd.
– Receiver Initiated Component:
1. When a node becomes a receiver, it polls the node at the head of its SENDER list. The polled node updates its own lists (placing this node at the head of its RECEIVER list). It also informs the receiver whether it is itself a sender, a receiver or a balanced node.
2. On receipt of the reply, the receiver does the following:
• If the responding node is a receiver or a balanced node, the lists are updated accordingly.
• Otherwise (i.e., the responding node is a sender), the task sent by it is received and the lists are updated accordingly
3. The polling process stops if:
• A sender is found, or
• There are no more entries in the SENDER list, or
• The number of polls reaches POLL-LIMIT
Note that at high load the receiver initiates the poll, and at low load the sender initiates the poll; this improves performance. (The sender-side polling loop is sketched below.)
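A minimal sketch of the sender-side polling loop (names are illustrative; each polled node is assumed to simply report its class, and the poller "learns" by moving it to the head of the matching list):

```python
from collections import deque

POLL_LIMIT = 3

def sender_poll(receiver_list, states):
    """Poll heads of the RECEIVER list until a receiver is found, the list
    empties, or POLL-LIMIT is reached. states[n] is node n's reported
    class: 'sender', 'receiver' or 'balanced'."""
    lists = {"sender": deque(), "receiver": deque(receiver_list),
             "balanced": deque()}
    for _ in range(POLL_LIMIT):
        if not lists["receiver"]:
            break
        node = lists["receiver"].popleft()
        reply = states[node]              # probe the node
        lists[reply].appendleft(node)     # learn: move to head of its list
        if reply == "receiver":
            return node, lists            # transfer the task here
    return None, lists                    # process the task locally

states = {"B": "balanced", "C": "sender", "D": "receiver"}
target, lists = sender_poll(["B", "C", "D"], states)
print(target)                             # D
```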
Stable Sender Initiated Algorithm
• In this algorithm, there is no transfer of tasks when a node becomes a receiver; instead, only its status information is shared. Hence there are no preemptive task transfers.
• The sender-initiated component is the same as in the stable symmetrically initiated algorithm (list maintenance, learning status via polling, etc.)
• In addition, the stable sender initiated algorithm maintains at each node an array called the status vector, of size = number of nodes in the DCS.
Stable Sender Initiated Algorithm
[Figure: status vector of node i, with one entry per node 1..N]
- Entry j in the status vector of node i indicates the best guess (receiver / sender / balanced node) that node j currently holds about node i
- SENDER INITIATED COMPONENT
- When a node becomes a sender, it polls the node (say j) at the head of its RECEIVER list
- The sender updates the jth entry of its status vector to 'sender'.
- Likewise, the polled node j updates the ith entry of its status vector based on the reply it sent to the sender node.
- Note that these two updates are additional learning on top of everything in the sender component of the stable symmetrically initiated algorithm
- RECEIVER INITIATED COMPONENT
- When a node becomes a receiver, it checks its status vector and informs all those nodes that are misinformed about its current state
- The status vector at the receiver side is then updated to reflect these changes
Stable Sender Initiated Algorithm
• Advantages:
– No broadcasting of messages by the receiver about its status
– No preemptive transfer of jobs, since no tasks are transferred by the receiver-initiated component
– The additional learning via the status vector reduces unnecessary polling
Challenges for Load Balancing Algorithms
• Scalability:
– Ability to make quick decisions about task transfers with little effort
• Location transparency:
– Task transfers for balancing are invisible to the user.
• Determinism:
– Correctness of the results in spite of task transfers
• Preemption:
– Transferring a task to a node should not degrade performance for the tasks generated at that node; so there is a need to preempt a transferred task when a local task arrives at the node
• Heterogeneity:
– Heterogeneity in processors, operating systems and architectures should not hinder task transfers.
Summary
• The differences between CPU scheduling, job scheduling and load balancing were discussed.
• Different load balancing algorithms were discussed, categorized as sender initiated, receiver initiated, symmetrically initiated, and variations of symmetrically initiated algorithms.
References
• Advanced Operating Systems
– M. Singhal and N. G. Shivaratri, McGraw-Hill International
• Distributed Operating Systems
– A. S. Tanenbaum, Prentice Hall
• Distributed Systems: Concepts and Design
– G. Coulouris and J. Dollimore, Addison-Wesley
Leader Election
Leader election is the process of designating a single process as the
organizer of some task distributed among several computers (nodes).
Before the task is begun, all network nodes are either unaware which
node will serve as the "leader" (or coordinator) of the task, or unable to
communicate with the current coordinator.
After a leader election algorithm has been run, however, each node
throughout the network recognizes a particular, unique node as the task
leader.
The network nodes communicate among themselves in order to decide
which of them will get into the "leader" state.
For that, they need some method in order to break the symmetry among
them.
For example, if each node has unique and comparable identities, then
the nodes can compare their identities, and decide that the node with the
highest identity is the leader.
Leader Election
The problem of leader election is for each node eventually to decide whether it is a leader or not, subject to the constraint that exactly one node decides that it is the leader.
Requirements (these are the requirements of distributed mutual exclusion, which the following slides take up):
at most one process in the critical section (safety)
if more than one process is requesting, someone enters (liveness)
a requesting process enters within a finite time (no starvation)
requests are granted in order (fairness)
Classification of Distributed Mutual Exclusion
(DME) Algorithms
Token based
e.g. Suzuki-Kasami
Some Complexity Measures
Main idea:
once Pi has received a REPLY from Pj, it does not need to send a REQUEST to Pj again unless it sends a REPLY to Pj (in response to a REQUEST from Pj)
the number of messages required varies between 0 and 2(n - 1), depending on the request pattern
the worst-case message complexity remains the same
Maekawa’s Algorithm
Synchronization delay = 2 × (max message transmission time)
Issues:
No starvation
Raymond’s Algorithm
Applications:
Checking “stable” properties, checkpoint &
recovery
Issues:
Need to capture both node and channel states
System cannot be stopped
No global clock
Some notations:
1. GS is consistent iff for all i, j, 1 ≤ i, j ≤ n: inconsistent(LSi, LSj) = ∅
2. GS is transitless iff for all i, j, 1 ≤ i, j ≤ n: transit(LSi, LSj) = ∅
Causal Ordering: potential dependencies; the happened-before relationship causally orders events.
If a -> b, then a causally affects b
If a -/-> b and b -/-> a, then a and b are concurrent (a || b)
Logical Clock
A mechanism for capturing chronological and causal relationships in a
distributed system.
– Distributed systems may have no physically synchronous global clock, so a logical clock allows a global ordering of events
In logical clock systems each process has two data structures: logical
local time and logical global time.
– Logical local time is used by the process to mark its own events,
and logical global time is the local information about global time.
– A special protocol is used to update logical local time after each
local event, and logical global time when processes exchange
data
Logical clocks are useful in 1) computation analysis, 2) distributed
algorithm design, 3) individual event tracking, and 4) exploring
computational progress.
Lamport's clock
Each process Pi keeps a clock Ci.
Each event a in Pi is timestamped C(a), the value of Ci when a occurred.
Ci is incremented by 1 for each event in Pi.
If a is a send event of message m from process Pi to Pj, then on receipt of m:
Cj = max(Cj, C(a) + 1)
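A minimal Python sketch of these rules (the receive rule follows the slide, Cj = max(Cj, C(a)+1)):

```python
class LamportClock:
    """Minimal Lamport logical clock for one process."""
    def __init__(self):
        self.c = 0

    def tick(self):
        # Local or send event: increment by 1.
        self.c += 1
        return self.c

    def recv(self, ts):
        # On receiving a message timestamped ts: Cj = max(Cj, C(a) + 1).
        self.c = max(self.c, ts + 1)
        return self.c

p, q = LamportClock(), LamportClock()
ts = p.tick()        # send event at P: C(a) = 1
print(q.recv(ts))    # at Q: max(0, 1 + 1) = 2
```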
Points to note:
If a -> b, then C(a) < C(b)
-> is an irreflexive partial order
Total ordering is possible by arbitrarily ordering concurrent events by process numbers
Limitation:
a -> b implies C(a) < C(b)
BUT
C(a) < C(b) doesn't imply a -> b !!
So it is not a true clock!
Solution: Vector Clocks
An algorithm for generating a partial ordering of events in a distributed system and detecting causality violations.
Ci is a vector of size n, where n is the number of processes; C(a) is likewise a vector of size n.
Update rules:
1. Before each event (including sends), a process increments its own entry in its vector.
2. Each time a process prepares to send a message, it sends its entire vector along with the message being sent.
3. Each time a process receives a message, it increments its own logical clock in the vector by one and updates each element of its vector by taking the maximum of the value in its own vector clock and the value in the vector in the received message (for every element).
A sketch of these rules follows.
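A minimal Python sketch of these update rules for n processes:

```python
class VectorClock:
    """Minimal vector clock for process `pid` among n processes."""
    def __init__(self, pid, n):
        self.pid, self.v = pid, [0] * n

    def send(self):
        # Rules 1 + 2: increment own entry, then ship the entire vector.
        self.v[self.pid] += 1
        return list(self.v)

    def recv(self, msg_v):
        # Rule 3: element-wise max with the received vector,
        # then increment own entry.
        self.v = [max(a, b) for a, b in zip(self.v, msg_v)]
        self.v[self.pid] += 1
        return list(self.v)

p0, p1 = VectorClock(0, 2), VectorClock(1, 2)
m = p0.send()        # P0 sends: [1, 0]
print(p1.recv(m))    # P1 receives: [1, 1]
```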
Schiper-Eggli-Sandoz Protocol
The goal of this protocol is to ensure that messages are delivered to the receiving processes in the order they were sent.
Unlike the Birman-Schiper-Stephenson protocol, it does not require broadcast messages.
Each message carries an associated vector containing the information the recipient needs to determine whether another message preceded it.
Clocks are updated only when messages are sent.
Schiper-Eggli-Sandoz Protocol...
Sending a message:
All messages are timestamped and sent out with a list of all the timestamps of messages previously sent to other processes.
The sender locally stores the timestamp with which the message was sent.
Receiving a message:
• A message cannot be delivered if a message mentioned in its list of timestamps predates it.
• If the new list has a timestamp greater than one already stored, the stored timestamp is updated to match.
• All locally buffered messages are then checked to see whether they can now be delivered.
Distributed Computing
Introduction
What is Distributed Computing / a Distributed System?
Distributed computing
A field of computer science that studies distributed systems.
The use of distributed systems to solve computational problems.
Distributed system
Wikipedia:
There are several autonomous computational entities, each of which has its own local memory.
The entities communicate with each other by message passing.
The components interact with each other in order to achieve a common goal.
Operating System Concepts:
The processors communicate with one another through various communication lines, such as high-speed buses or telephone lines.
Each processor has its own local memory.
What is a distributed system?
Distributed program
A computer program that runs in a distributed system
Distributed programming
The process of writing distributed programs
Fault tolerance
When one or more nodes fail, the whole system can still work, apart from some loss of performance.
The status of each node needs to be checked
Each node plays a partial role
Each computer has only a limited, incomplete view of the system.
Each computer may know only one part of the input.
Resource sharing
Each user can share the computing power and storage resources in the system with other users
Load sharing
Dispatching tasks to the various nodes helps spread the load across the whole system.
Easy to expand
Adding nodes should take little effort, ideally none at all.
What is a distributed system? Contd.
Time complexity:
– For synchronous systems: the number of rounds
– For asynchronous systems: several different definitions exist.
Some Fundamental Problems
Ordering events in the absence of a global clock
Capturing the global state
Mutual exclusion
Leader election
Clock synchronization
Termination detection
Constructing spanning trees
Agreement protocols