Professional Documents
Culture Documents
BITS Pilani: Distributed Computing Global State & Snapshot Recording Algorithms
BITS Pilani: Distributed Computing Global State & Snapshot Recording Algorithms
BITS Pilani: Distributed Computing Global State & Snapshot Recording Algorithms
Course No.- SS ZG526, Course Title - Distributed Computing 2 BITS Pilani, Hyderabad Campus
Introduction
•every distributed system component has a local state
•state of a process is characterized by
•state of its local memory
• history of process activity
•channel state is characterized by the set of messages sent
along the channel less the set of messages received along the
channel
•global state of a distributed system is a collection of the local
states of its components
Course No.- SS ZG526, Course Title - Distributed Computing 3 BITS Pilani, Hyderabad Campus
Introduction
•benefit of shared memory - up-to-date state of the entire system is
available to the processes sharing the memory
•absence of shared memory requires ways of getting a coherent and
complete view of the system based on the local states of individual
processes
•meaningful global snapshot can be obtained
•if the components of the distributed system record their local
states at the same time
• requires local clocks at processes to be perfectly synchronized
•existence of global system clock that could be instantaneously
read by the processes
Course No.- SS ZG526, Course Title - Distributed Computing 4 BITS Pilani, Hyderabad Campus
Introduction
•technologically infeasible to have perfectly synchronized clocks at
various sites
•clocks are bound to drift
•reading time from a single common clock maintained at one
process will not work
•indeterminate transmission delays during read operation causes
processes to identify various physical instants as the same time
•collection of local state observations will be made at different times
•may not be meaningful
Course No.- SS ZG526, Course Title - Distributed Computing 5 BITS Pilani, Hyderabad Campus
Introduction
Course No.- SS ZG526, Course Title - Distributed Computing 6 BITS Pilani, Hyderabad Campus
System Model
Course No.- SS ZG526, Course Title - Distributed Computing 7 BITS Pilani, Hyderabad Campus
System Model
•system can be
•represented as a directed graph
•vertices represent processes
•edges represent unidirectional communication channels
•both processes & channels have states
•process state consists of contents of
•processor registers
•stacks
•local memory
Course No.- SS ZG526, Course Title - Distributed Computing 8 BITS Pilani, Hyderabad Campus
System Model
Course No.- SS ZG526, Course Title - Distributed Computing 10 BITS Pilani, Hyderabad Campus
System Model
• receive event:
• changes state of the receiving process
• state of the channel on which the message is received
• LSi: state of process pi
• at an instant t, LSi: state of pi as a result of the sequence of events
executed by pi up to t
• an event e ∈ LSi iff e belongs to the sequence of events that have
taken pi to LSi
• e LSi iff e does not belong to the sequence of events that have taken
pi to LSi
Course No.- SS ZG526, Course Title - Distributed Computing 11 BITS Pilani, Hyderabad Campus
System Model
• channel state depends on the local states of the processes on which
it is incident
• for a channel Cij , intransit messages are:
transit(LSi, LSj) = {mij | send(mij) ∈ LSi rec(mij) LSj}
• if a snapshot recording algorithm records the states of pi and pj as
LSi and LSj , respectively,
• then it must record the state of channel Cij as transit(LSi, LSj)
Course No.- SS ZG526, Course Title - Distributed Computing 12 BITS Pilani, Hyderabad Campus
System Model
• several models of communication among processes exist
• different snapshot recording algorithms have assumed
different models of communication
• FIFO model:
• each channel acts as a first-in first-out message queue
• message ordering is preserved by a channel
• non-FIFO model:
• channel acts like a set
• sender process adds messages to the channel in a random
order
• receiver process removes messages from the channel in a
random order
Course No.- SS ZG526, Course Title - Distributed Computing 13 BITS Pilani, Hyderabad Campus
A Consistent Global State
• global state GS is a consistent global stateiff it satisfies the
following two conditions:
Course No.- SS ZG526, Course Title - Distributed Computing 14 BITS Pilani, Hyderabad Campus
A Consistent Global State
•Condition C1 states the law of conservation of messages:
•every message mij that is recorded as sent in the local state
of sender pi must be captured
•in the state of the channel Cij
•or in the collected local state of the receiver pj
•Condition C2 states that:
•in the collected global state, for every effect, its cause
must be present
•if a message mij is not recorded as sent in the local state of
pi, then it must neither be present in the state of the
channel Cij nor in the collected local state of the receiver pj
Course No.- SS ZG526, Course Title - Distributed Computing 15 BITS Pilani, Hyderabad Campus
Issues in recording a Global State
I1: How to distinguish between the messages to be recorded in the
snapshot (either in a channel state or a process state) from those
not to be recorded ?
I2: How to determine the instant when a process takes its snapshot?
Course No.- SS ZG526, Course Title - Distributed Computing 16 BITS Pilani, Hyderabad Campus
Snapshot Recording Algorithms
•2 types of messages:
•computation messages - exchanged by the underlying
application
•control messages - exchanged by the snapshot algorithm
Course No.- SS ZG526, Course Title - Distributed Computing 17 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
•uses a control message called marker
•after a site has recorded its snapshot, it sends a marker along all of its
outgoing channels before sending out any more messages
•channels are FIFO
•marker separates the messages in the channel into those to be
included in the snapshot (i.e., channel state or process state) from
those not to be recorded in the snapshot
•markers in a FIFO system act as delimiters
Course ID: SS ZG526, Title: Distributed Computing 18 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
Course ID: SS ZG526, Title: Distributed Computing 19 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
Marker sending rule for process pi
(1) Process pi records its state.
(2) For each outgoing channel C on which a marker has not been sent, pi sends
a marker along C before pi sends further messages along C.
Marker receiving rule for process pj
On receiving a marker along channel C:
if pj has not recorded its state then
Record the state of C as the empty set
Execute the “marker sending rule”
else
Record the state of C as the set of messages received along C after pj’s
state was recorded and before pj received the marker along C
Course ID: SS ZG526, Title: Distributed Computing 20 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
Putting together recorded snapshots
•required for creating the global snapshot
•global snapshot creation at initiator:
•each process can send its local snapshot to the initiator of the
algorithm
•global snapshot creation at all processes:
•each process sends the information it records along all outgoing
channels
•each process receiving such information for the first time propagates
it along its outgoing channels
•all local snapshots get disseminated to all other processes
•all the processes can determine the global state
Course ID: SS ZG526, Title: Distributed Computing 21 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
•algorithm can be initiated by any process by executing the
marker sending rule
•termination criterion - each process has received a marker on
all of its incoming channels
•if multiple processes initiate the algorithm concurrently
•each initiation needs to be distinguished by using unique
markers
•different initiations by a process are identified by a sequence
number
Course ID: SS ZG526, Title: Distributed Computing 22 BITS Pilani, Hyderabad Campus
Chandy-Lamport Algorithm
•Markers shown by dashed-and-dotted arrows
•S1 initiates the algorithm just after t1
•S1 records its local state (account A=$550) and
sends a marker to S2
•marker is received by S2 after t4
•when S2 receives the marker, it records its
local state (account B=$170), state C12 as $0,
and sends a marker along C21
•when S1 receives this marker, it records the
state of C21 as $80
•$800 amount in the system is conserved in the
recorded global state
A = $550, B = $170, C12 = $0, C21 = $80
Course ID: SS ZG526, Title: Distributed Computing 23 BITS Pilani, Hyderabad Campus
Chandy-Lamport Algorithm
•Markers shown using dotted arrows
•S1 initiates the algorithm just after t0 and
before sending the $50 for S2
•S1 records its local state (account A = $600)
and sends a marker to S2
•marker is received by S2 between t2 and t3
•when S2 receives the marker, it records its
local state (account B = $120), state of C12 as
$0, and sends a marker along C21
•when S1 receives this marker, it records the
state of C21 as $80
•$800 amount in the system is conserved in the
recorded global state
A = $600, B = $120, C12 = $0, C21 = $80
Course ID: SS ZG526, Title: Distributed Computing 24 BITS Pilani, Hyderabad Campus
Variations of the Chandy–Lamport
Algorithm
Course ID: SS ZG526, Title: Distributed Computing 25 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Course ID: SS ZG526, Title: Distributed Computing 26 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
• second optimization
• deals with the efficient distribution of the global snapshot
• any process needs to take only one snapshot, irrespective
of the number of concurrent initiators
• all processes are not sent the global snapshot
• assumes bi-directional channels in the system
Course ID: SS ZG526, Title: Distributed Computing 27 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording
Course ID: SS ZG526, Title: Distributed Computing 29 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording
Course ID: SS ZG526, Title: Distributed Computing 31 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording
Course ID: SS ZG526, Title: Distributed Computing 32 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording
Course ID: SS ZG526, Title: Distributed Computing 33 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot
• efficiently assembles the recorded snapshot in the following way:
• in the snapshot recording phase
• a forest of spanning trees is created in the system
• in this forest
• initiator of algorithm is the root of a spanning tree
• all processes in its region belong to its spanning tree
• if pi executed the “marker sending rule” in response to
receiving its first marker from pj
• then pj is the parent of pi in the spanning tree
Course ID: SS ZG526, Title: Distributed Computing 34 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot
• in this forest (contd. from previous slide)
• when a leaf process in the spanning tree has recorded the
states of all incoming channels
• the process sends the locally recorded state (local
snapshot, id-border-set) to its parent in the spanning
tree
Course ID: SS ZG526, Title: Distributed Computing 35 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot
Course ID: SS ZG526, Title: Distributed Computing 36 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot
• initiator receives the locally recorded states of all its descendants
from its children processes
• initiator assembles the snapshot for all the processes in its region
and the channels incident on these processes
• initiator knows the identifiers of initiators in adjacent regions
using id-border-set information it receives from processes in its
region
• initiator exchanges the snapshot of its region with the initiators in
adjacent regions in rounds
Course ID: SS ZG526, Title: Distributed Computing 37 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot
• in each round
• an initiator sends to initiators in adjacent regions, any new
information obtained from the initiator in the adjacent region
during the previous round of message exchange
Course ID: SS ZG526, Title: Distributed Computing 38 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
snapshot = O(rn2)
Course ID: SS ZG526, Title: Distributed Computing 39 BITS Pilani, Hyderabad Campus
Snapshot Algorithms for Non-FIFO
Channels
• FIFO system
• ensures that all messages sent after a marker on a channel will
be delivered after the marker
• in a non-FIFO system
• a marker cannot be used to differentiate messages into those to
be recorded in the global state from those not to be recorded in
the global state
Course ID: SS ZG526, Title: Distributed Computing 40 BITS Pilani, Hyderabad Campus
Snapshot Algorithms for Non-FIFO
Channels
• in a non-FIFO system, for recording a consistent global snapshot
• capturing out-of-sequence messages is necessary
• how??
• piggybacking of control information on computation messages
Course ID: SS ZG526, Title: Distributed Computing 41 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
Course ID: SS ZG526, Title: Distributed Computing 42 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
Coloring Scheme:
Course ID: SS ZG526, Title: Distributed Computing 43 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
Coloring Scheme:
Course ID: SS ZG526, Title: Distributed Computing 44 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
Coloring Scheme:
Course ID: SS ZG526, Title: Distributed Computing 45 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
• second observation –
• marker informs process pj of the value of {send(mij) |send(mij) ∈ LSi}
• compute the state of channel Cij as transit(LSi,LSj)
Course ID: SS ZG526, Title: Distributed Computing 46 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
Course ID: SS ZG526, Title: Distributed Computing 47 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
•marker messages are not required
•each process has to record the entire message history on each
channel as part of the local snapshot
•space requirements of the algorithm may be large
•size of the local storage and snapshot recording can be reduced
•by storing only the messages sent and received since the
previous snapshot recording, assuming that the previous
snapshot is still available
•approach is useful in applications that require repeated snapshots
of a distributed system
Course ID: SS ZG526, Title: Distributed Computing 48 BITS Pilani, Hyderabad Campus
Necessary and sufficient conditions for
consistent global snapshots
•local checkpoint – saved intermediate state of a process during its
execution
•global snapshot –
•set of local checkpoints one from each process
•represents a snapshot of the distributed computation execution at
some instant
•consistent global snapshot –
•no causal path between any two distinct checkpoints
•set of local states that occurred concurrently
•Cp,i - ith (i ≥ 0) checkpoint of process pp , assigned the sequence number i
•ith checkpoint interval of pp - all computation performed between (i−1)th
and ith checkpoints (and includes (i−1)th checkpoint but not ith).
Course ID: SS ZG526, Title: Distributed Computing 49 BITS Pilani, Hyderabad Campus
Necessary and sufficient conditions for
consistent global snapshots
•a causal path exists between checkpoints Ci,j and Cx,y if a sequence of
messages exists from Ci,j to Cx,y such that each message is sent after the
previous one in the sequence is received
•a zigzag path between two checkpoints is a causal path, however allows a
message to be sent before the previous one in the path is received, send and
receive are in the same checkpoint interval
Course ID: SS ZG526, Title: Distributed Computing 50 BITS Pilani, Hyderabad Campus
Necessary and sufficient conditions for
consistent global snapshots
•checkpoint C is involved in a zigzag cycle iff there is a zigzag path from C to itself
Course No.- SS ZG526, Course Title - Distributed Computing 52 BITS Pilani, Hyderabad Campus
Course No.- SS ZG526, Course Title - Distributed Computing 53 BITS Pilani, Hyderabad Campus