Simulation Synchronous Algorithms PIII

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

Asad Waqar Malik

Synchronous asad.malik@seecs.edu.pk

Algorithms
SEECS A108

Parallel and Distributed Simulation Systems - Richard Fujimoto


Synchronous Algorithms

❖ Deadlock detection and recovery algo. going through phases


❖ Deadlock detection
❖ Deadlock recovery
❖ Rather than relying on the system becoming deadlocked, to control this, the
algorithms relay on mechanisms called barrier synchronizations
❖ “Barrier is a programming construct that defines a point in wallclock time
when all the processes have stopped”
Barrier Mechanism

❖ Programming construct
❖ Define a point when all processors participating have stopped
❖ Processor execute barrier primitive, it blocks and remain blocked
❖ Until all processors execute there barrier primitive
Barrier Synchronization

❖ Barrier Synchronization: when a process invokes the barrier


primitive, it will block until all other processes have also invoked
the barrier primitive.
❖ When the last process invokes the barrier, all processes can resume
execution
Synchronous Execution

❖ Recall the goal is to ensure each LP processes events in time stamp order
❖ Basic idea: each process cycles through the following steps:
❖ Determine the events that are safe to process
❖ Compute a Lower Bound on the Time Stamp (LBTS) of events that LPi might
later receive
❖ Events with time stamp ≤ LBTS are safe to process
❖ Process safe events, exchange messages
❖ Global synchronization (barrier)

❖ Messages generated in one cycle are not eligible for processing until the next cycle
A Simple Synchronous Algorithm
❖ Assume any LP can communicate with any other LP
❖ Ni = time of next event in LPi
❖ LAi = lookahead of LPi
Barrier Using a Centralized Controller

❖ Central controller process used to implement barrier


❖ Overall, a two step process
❖ Controller determines when barrier reached
❖ Broadcast message to release processes from the barrier
❖ Barrier primitive for non-controller processes:
❖ Send a message to central controller!
❖ Wait for a reply
❖ Barrier primitive for controller process
❖ Receive barrier messages from other processes
❖ When a message is received from each process, broadcast message to release barrier
❖ Performance
❖ Controller must send and receive N-1 messages
❖ Potential bottleneck!
Broadcast Barrier

❖ One step approach; no central controller


❖ Each process:
❖ Broadcast message when barrier primitive is invoked
❖ Wait until a message is received from each other process
Tree Barrier
❖ Organize processes into a tree
❖ A process sends a message to its
parent process when
❖ The process has reached the
barrier point, and
❖ A message has been received
from each of its children
processes
❖ Root detects completion of barrier,
broadcast message to release
processes
Compute LBTS

❖ HOW WE CAN COMPUTE LBTS?


Compute LBTS

❖ Every process send


❖ min (NextEventTime + LA)
❖ To its parent node
❖ Parent node select the min
among child nodes, including its
own LBTS
❖ Root select the min LBTS and
disseminate the information in
the same fashion
The Transient Message Problem

❖ A transient message is a
message that has been sent, but
has not yet been received at its
destination
❖ The message could be “in the
network” or stored in an
operating system buffer
(waiting to be sent or delivered)
❖ The synchronous algorithm
fails if transient message(s)
remain after the processes are
released from the barrier
The Transient Message Problem

❖ Message arrives in LP(c)’s past


The Transient Message Solution
Flush Barrier

❖ No process will be released from the barrier until


❖ All processes have reached the barrier!
❖ Any message sent by a process before reaching the barrier has
arrived at its destination!
Flush Barrier
Flush Barrier

❖ Use FIFO communication channels


❖ Send a “dummy message” on each channel; wait until such a
message is received on each incoming channel to guarantee
transient messages have been received
❖ May require a large number of messages
Tree Flush Barrier

❖ When a leaf process reaches flush barrier, include counter (#sent - #received) in
messages sent to parent
❖ Parent adds counters in incoming messages with its own counter, sends sum in
message sent to its parent
❖ If sum at root is zero, broadcast “go” message, else wait until sum is equal to
zero
❖ Receive message after reporting sum: send update message to root
Synchronous Algorithms
Barrier Synchronization: when a process reaches a barrier synchronization, it must block until all
other processors have also reached the barrier.
process 1 process 2 process 3 process 4

- barrier -
wallclock
time wait - barrier -
- barrier -
wait wait
- barrier -

Synchronous algorithm
DO WHILE (unprocessed events remain)
barrier synchronization; flush all messages from the network
LBTS = min (Ni + LAi); Ni = time of next event in LPi; LAi = lookahead of LPi
all i
S = set of events with time stamp ≤ LBTS
process events in S
endDO
Variations proposed by Lubachevsky, Ayani, Chandy/Sherman, Nicol
Topology Information
ORD
4:00 2:00

6:00
LAX JFK

0:30 10:45

SAN
10:00

Global LBTS algorithm - does not exploit topology information


❖ Lookahead = minimum flight time to another airport
❖ Can the two events be processed concurrently?
❖ Yes because the event @ 10:00 cannot affect the event @ 10:45
❖ Simple global LBTS algorithm:
❖ LBTS = 10:30 (10:00 + 0:30)
❖ Cannot process event @ 10:45 until next LBTS computation
Distance Between LPs
❖ Associate a lookahead with each link: LAB is the lookahead on the link
from LPA to LPB
❖ Any message sent on the link from LPA to LPB must have a time stamp
of TA + LAB where TA is the current simulation time of LPA
❖ A path from LPA to LPZ is defined as a sequence of LPs: LPA, LPB, …,
LPY, LPZ
❖ The lookahead of a path is the sum of the lookaheads of the links along
the path
❖ DAB, the minimum distance from LPA to LPB is the minimum lookahead
over all paths from LPA to LPB
❖ The distance from LPA to LPB is the minimum amount of simulated time
that must elapse for an event in LPA to affect LPB
Distance Between Processes
The distance from LPA to LPB is the minimum amount of simulated time that must elapse
for an event in LPA to affect LPB
Distance Matrix:
11 D [i,j] = minimum distance from LPi to LPj
3
LPA LPB LPA LPB LPC LPD
4 LPA 4 3 1 3
3 1 4 1 LPB 4 5 3 1
min (1+2, 3+1)
2 LPC 3 6 4 2
LPC LPD LPD 5 4 2 4
2
13 15

• An event in LPY with time stamp TY depends on an event in LPX with


time stamp TX if TX + D[X,Y] < TY
• Above, the time stamp 15 event depends on the time stamp 11 event,
the time stamp 13 event does not.
Computing LBTS
Assuming all LPs are blocked and there are no transient messages:
LBTSi=min(Nj+Dji) (all j) where Ni = time of next event in LPi

11 Distance Matrix:
3
D [i,j] = minimum distance from LPi to LPj
LPA LPB
LPA LPB LPC LPD
4
1 1 LPA 4 3 1 3
3 4
2 LPB 4 5 3 1
min (1+2, 3+1)
LPC LPD LPC 3 6 4 2
2 LPD 5 4 2 4
13 15

LBTSA = 15 [min (11+4, 13+5)]


LBTSB = 14 [min (11+3, 13+4)]
LBTSC = 12 [min (11+1, 13+2)]
LBTSD = 14 [min (11+3, 13+4)]
Need to know time of next event of every other LP
Distance matrix must be recomputed if lookahead changes
Example
ORD
4:00 2:00

6:00
LAX JFK

0:30 10:45

SAN
10:00

Using distance information:


❖ DSAN,JFK = 6:30
❖ LBTSJFK = 16:30 (10:00 + 6:30)
❖ Event @ 10:45 can be processed this iteration
❖ Concurrent processing of events at times 10:00 and 10:45

You might also like