BITS Pilani: Distributed Computing Global State & Snapshot Recording Algorithms

You might also like

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 53

Distributed Computing

Global State & Snapshot Recording


Algorithms
Dr. Barsha Mitra
BITS Pilani CSIS Dept, BITS Pilani, Hyderabad Campus
Hyderabad Campus
Introduction
•recording the global state of a distributed system on-the-fly is
required for analyzing, testing, or verifying properties
associated with distributed executions
•problems in recording global state
•lack of a globally shared memory
•lack of a global clock
•message transmission is asynchronous
•message transfer delays are finite but unpredictable
•problem is non-trivial

Course No.- SS ZG526, Course Title - Distributed Computing 2 BITS Pilani, Hyderabad Campus
Introduction
•every distributed system component has a local state
•state of a process is characterized by
•state of its local memory
• history of process activity
•channel state is characterized by the set of messages sent
along the channel less the set of messages received along the
channel
•global state of a distributed system is a collection of the local
states of its components

Course No.- SS ZG526, Course Title - Distributed Computing 3 BITS Pilani, Hyderabad Campus
Introduction
•benefit of shared memory - up-to-date state of the entire system is
available to the processes sharing the memory
•absence of shared memory requires ways of getting a coherent and
complete view of the system based on the local states of individual
processes
•meaningful global snapshot can be obtained
•if the components of the distributed system record their local
states at the same time
• requires local clocks at processes to be perfectly synchronized
•existence of global system clock that could be instantaneously
read by the processes

Course No.- SS ZG526, Course Title - Distributed Computing 4 BITS Pilani, Hyderabad Campus
Introduction
•technologically infeasible to have perfectly synchronized clocks at
various sites
•clocks are bound to drift
•reading time from a single common clock maintained at one
process will not work
•indeterminate transmission delays during read operation causes
processes to identify various physical instants as the same time
•collection of local state observations will be made at different times
•may not be meaningful

Course No.- SS ZG526, Course Title - Distributed Computing 5 BITS Pilani, Hyderabad Campus
Introduction

Course No.- SS ZG526, Course Title - Distributed Computing 6 BITS Pilani, Hyderabad Campus
System Model

•system consists of a collection of n processes, p1, p2,…, pn connected by


channels
•no globally shared memory
•processes communicate solely by passing messages
•no physical global clock in the system
•message send and receive are asynchronous
•message delivery is reliable, occurs within finite time but has arbitrary
time delay

Course No.- SS ZG526, Course Title - Distributed Computing 7 BITS Pilani, Hyderabad Campus
System Model

•system can be
•represented as a directed graph
•vertices represent processes
•edges represent unidirectional communication channels
•both processes & channels have states
•process state consists of contents of
•processor registers
•stacks
•local memory
Course No.- SS ZG526, Course Title - Distributed Computing 8 BITS Pilani, Hyderabad Campus
System Model

•process state is highly dependent on the local context of the


distributed application
•Cij: channel from process pi to process pj
•SCij:
•state of channel Cij
•consists of intransit messages
• 3 types of events
• internal events
• message send events
• message receive events
Course No.- SS ZG526, Course Title - Distributed Computing 9 BITS Pilani, Hyderabad Campus
System Model

• send(mij): send event of message mij from pi to pj


• rec(mij): receive event of message mij from pi to pj
• occurrence of events
• changes the states of respective processes and channels
• causes transitions in global system state
• internal event: changes the state of the process at which it occurs
• send event changes:
• state of the process that sends the message
• state of the channel on which the message is sent
• events at a process are linearly ordered by their order of occurrence

Course No.- SS ZG526, Course Title - Distributed Computing 10 BITS Pilani, Hyderabad Campus
System Model

• receive event:
• changes state of the receiving process
• state of the channel on which the message is received
• LSi: state of process pi
• at an instant t, LSi: state of pi as a result of the sequence of events
executed by pi up to t
• an event e ∈ LSi iff e belongs to the sequence of events that have
taken pi to LSi
• e  LSi iff e does not belong to the sequence of events that have taken
pi to LSi

Course No.- SS ZG526, Course Title - Distributed Computing 11 BITS Pilani, Hyderabad Campus
System Model
• channel state depends on the local states of the processes on which
it is incident
• for a channel Cij , intransit messages are:
transit(LSi, LSj) = {mij | send(mij) ∈ LSi  rec(mij)  LSj}
• if a snapshot recording algorithm records the states of pi and pj as
LSi and LSj , respectively,
• then it must record the state of channel Cij as transit(LSi, LSj)

Course No.- SS ZG526, Course Title - Distributed Computing 12 BITS Pilani, Hyderabad Campus
System Model
• several models of communication among processes exist
• different snapshot recording algorithms have assumed
different models of communication
• FIFO model:
• each channel acts as a first-in first-out message queue
• message ordering is preserved by a channel
• non-FIFO model:
• channel acts like a set
• sender process adds messages to the channel in a random
order
• receiver process removes messages from the channel in a
random order
Course No.- SS ZG526, Course Title - Distributed Computing 13 BITS Pilani, Hyderabad Campus
A Consistent Global State
• global state GS is a consistent global stateiff it satisfies the
following two conditions:

• C1: send(mij) ∈ LSi ⇒ mij ∈ SCij⊕ rec(mij) ∈ LSj (⊕: XOR)

• C2: send(mij) LSi ⇒ mijSCij ∧ rec(mij)  LSj

Course No.- SS ZG526, Course Title - Distributed Computing 14 BITS Pilani, Hyderabad Campus
A Consistent Global State
•Condition C1 states the law of conservation of messages:
•every message mij that is recorded as sent in the local state
of sender pi must be captured
•in the state of the channel Cij
•or in the collected local state of the receiver pj
•Condition C2 states that:
•in the collected global state, for every effect, its cause
must be present
•if a message mij is not recorded as sent in the local state of
pi, then it must neither be present in the state of the
channel Cij nor in the collected local state of the receiver pj
Course No.- SS ZG526, Course Title - Distributed Computing 15 BITS Pilani, Hyderabad Campus
Issues in recording a Global State
I1: How to distinguish between the messages to be recorded in the
snapshot (either in a channel state or a process state) from those
not to be recorded ?

I2: How to determine the instant when a process takes its snapshot?

Course No.- SS ZG526, Course Title - Distributed Computing 16 BITS Pilani, Hyderabad Campus
Snapshot Recording Algorithms

•2 types of messages:
•computation messages - exchanged by the underlying
application
•control messages - exchanged by the snapshot algorithm

Course No.- SS ZG526, Course Title - Distributed Computing 17 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
•uses a control message called marker
•after a site has recorded its snapshot, it sends a marker along all of its
outgoing channels before sending out any more messages
•channels are FIFO
•marker separates the messages in the channel into those to be
included in the snapshot (i.e., channel state or process state) from
those not to be recorded in the snapshot
•markers in a FIFO system act as delimiters

Course ID: SS ZG526, Title: Distributed Computing 18 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm

•all messages that follow a marker on channel Cij have been


sent by pi after pi took its snapshot

•pj must record its snapshot no later than when it receives a


marker on channel Cij

•a process must record its snapshot no later than when it


receives a marker on any of its incoming channels

Course ID: SS ZG526, Title: Distributed Computing 19 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
Marker sending rule for process pi
(1) Process pi records its state.
(2) For each outgoing channel C on which a marker has not been sent, pi sends
a marker along C before pi sends further messages along C.
Marker receiving rule for process pj
On receiving a marker along channel C:
if pj has not recorded its state then
Record the state of C as the empty set
Execute the “marker sending rule”
else
Record the state of C as the set of messages received along C after pj’s
state was recorded and before pj received the marker along C
Course ID: SS ZG526, Title: Distributed Computing 20 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
Putting together recorded snapshots
•required for creating the global snapshot
•global snapshot creation at initiator:
•each process can send its local snapshot to the initiator of the
algorithm
•global snapshot creation at all processes:
•each process sends the information it records along all outgoing
channels
•each process receiving such information for the first time propagates
it along its outgoing channels
•all local snapshots get disseminated to all other processes
•all the processes can determine the global state
Course ID: SS ZG526, Title: Distributed Computing 21 BITS Pilani, Hyderabad Campus
Chandy–Lamport Algorithm
•algorithm can be initiated by any process by executing the
marker sending rule
•termination criterion - each process has received a marker on
all of its incoming channels
•if multiple processes initiate the algorithm concurrently
•each initiation needs to be distinguished by using unique
markers
•different initiations by a process are identified by a sequence
number

Course ID: SS ZG526, Title: Distributed Computing 22 BITS Pilani, Hyderabad Campus
Chandy-Lamport Algorithm
•Markers shown by dashed-and-dotted arrows
•S1 initiates the algorithm just after t1
•S1 records its local state (account A=$550) and
sends a marker to S2
•marker is received by S2 after t4
•when S2 receives the marker, it records its
local state (account B=$170), state C12 as $0,
and sends a marker along C21
•when S1 receives this marker, it records the
state of C21 as $80
•$800 amount in the system is conserved in the
recorded global state
A = $550, B = $170, C12 = $0, C21 = $80

Course ID: SS ZG526, Title: Distributed Computing 23 BITS Pilani, Hyderabad Campus
Chandy-Lamport Algorithm
•Markers shown using dotted arrows
•S1 initiates the algorithm just after t0 and
before sending the $50 for S2
•S1 records its local state (account A = $600)
and sends a marker to S2
•marker is received by S2 between t2 and t3
•when S2 receives the marker, it records its
local state (account B = $120), state of C12 as
$0, and sends a marker along C21
•when S1 receives this marker, it records the
state of C21 as $80
•$800 amount in the system is conserved in the
recorded global state
A = $600, B = $120, C12 = $0, C21 = $80
Course ID: SS ZG526, Title: Distributed Computing 24 BITS Pilani, Hyderabad Campus
Variations of the Chandy–Lamport
Algorithm

• several variants were proposed


• variants refined and optimized the basic algorithm
• variant: Spezialetti and Kearns algorithm
• refinements
• optimizes concurrent initiation of snapshot collection
• efficiently distributes the recorded snapshot

Course ID: SS ZG526, Title: Distributed Computing 25 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm

• 2 phases in obtaining a global snapshot:


• locally recording the snapshot at every process
• distributing the resultant global snapshot to all the
initiators
• Spezialetti and Kearns provided 2 optimizations to the
Chandy–Lamport algorithm
• first optimization
• combines snapshots concurrently initiated by multiple
processes into a single snapshot
• linked with the second optimization

Course ID: SS ZG526, Title: Distributed Computing 26 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm

• second optimization
• deals with the efficient distribution of the global snapshot
• any process needs to take only one snapshot, irrespective
of the number of concurrent initiators
• all processes are not sent the global snapshot
• assumes bi-directional channels in the system

Course ID: SS ZG526, Title: Distributed Computing 27 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording

• marker carries the identifier of the initiator of the algorithm


• every process has a variable master to keep track of the initiator
of the algorithm
• when a process executes the “marker sending rule” on the
receipt of its first marker
• it records the initiator’s identifier carried in the received
marker in the master variable
• process that initiates the algorithm records its own identifier in
the master variable
Course ID: SS ZG526, Title: Distributed Computing 28 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording

• notion of region is used in the system


• region –
• encompasses all the processes whose master field contains
the identifier of the same initiator
• identified by the initiator’s identifier
• if there are multiple concurrent initiators, the system gets
partitioned into multiple regions

Course ID: SS ZG526, Title: Distributed Computing 29 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording

• if the initiator’s identifier in a marker received along a channel is


different from the value in the master variable
• a concurrent initiation of the algorithm is detected
• implies that the sender of the marker lies in a different region
• identifier of the concurrent initiator is recorded in a local
variable id-border-set
• process receiving the marker
• does not take a snapshot for this marker
• does not propagate this marker
Course ID: SS ZG526, Title: Distributed Computing 30 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording

• the algorithm efficiently handles concurrent snapshot initiations


• suppresses redundant snapshot collections
• process does not take a snapshot or propagate a snapshot
request initiated by a process if it has already taken a snapshot in
response to some other snapshot initiation

Course ID: SS ZG526, Title: Distributed Computing 31 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording

• state of the channel is recorded just as in the Chandy–Lamport


algorithm (including those that cross a border between regions)
• enables the snapshot recorded in one region to be merged with
the snapshot recorded in the adjacent region
• if markers arriving at a node contain identifiers of different
initiators
• they are considered part of the same instance of the
algorithm for the purpose of channel state recording

Course ID: SS ZG526, Title: Distributed Computing 32 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient snapshot recording

• snapshot recording completion criterion at a process:


• after it has received a marker along each of its channels
• after every process has recorded its snapshot
• the system is partitioned into as many regions as the number
of concurrent initiations of the algorithm
• variable id-border-set at a process contains the identifiers of the
neighboring regions

Course ID: SS ZG526, Title: Distributed Computing 33 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot
• efficiently assembles the recorded snapshot in the following way:
• in the snapshot recording phase
• a forest of spanning trees is created in the system
• in this forest
• initiator of algorithm is the root of a spanning tree
• all processes in its region belong to its spanning tree
• if pi executed the “marker sending rule” in response to
receiving its first marker from pj
• then pj is the parent of pi in the spanning tree
Course ID: SS ZG526, Title: Distributed Computing 34 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot
• in this forest (contd. from previous slide)
• when a leaf process in the spanning tree has recorded the
states of all incoming channels
• the process sends the locally recorded state (local
snapshot, id-border-set) to its parent in the spanning
tree

Course ID: SS ZG526, Title: Distributed Computing 35 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot

• an intermediate process in a spanning tree


• receives the recorded states from all its children processes
• records the states of all incoming channels
• forwards its locally recorded state and the locally recorded
states of all its descendent processes to its parent

Course ID: SS ZG526, Title: Distributed Computing 36 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot
• initiator receives the locally recorded states of all its descendants
from its children processes
• initiator assembles the snapshot for all the processes in its region
and the channels incident on these processes
• initiator knows the identifiers of initiators in adjacent regions
using id-border-set information it receives from processes in its
region
• initiator exchanges the snapshot of its region with the initiators in
adjacent regions in rounds

Course ID: SS ZG526, Title: Distributed Computing 37 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm
Efficient dissemination of recorded snapshot
• in each round
• an initiator sends to initiators in adjacent regions, any new
information obtained from the initiator in the adjacent region
during the previous round of message exchange

• round is complete when an initiator receives information, or the


blank message from all initiators of adjacent regions from which it
has not already received a blank message
• blank message signifies no new information is available

Course ID: SS ZG526, Title: Distributed Computing 38 BITS Pilani, Hyderabad Campus
Spezialetti–Kearns Algorithm

message complexity of assembling and disseminating the

snapshot = O(rn2)

r: number of concurrent initiations

n: number of processes in the system

Course ID: SS ZG526, Title: Distributed Computing 39 BITS Pilani, Hyderabad Campus
Snapshot Algorithms for Non-FIFO
Channels
• FIFO system
• ensures that all messages sent after a marker on a channel will
be delivered after the marker

• in a non-FIFO system
• a marker cannot be used to differentiate messages into those to
be recorded in the global state from those not to be recorded in
the global state

Course ID: SS ZG526, Title: Distributed Computing 40 BITS Pilani, Hyderabad Campus
Snapshot Algorithms for Non-FIFO
Channels
• in a non-FIFO system, for recording a consistent global snapshot
• capturing out-of-sequence messages is necessary
• how??
• piggybacking of control information on computation messages

• non-FIFO algorithm by Lai and Yang


• use message piggybacking to distinguish computation messages sent
after the marker from those sent before the marker

Course ID: SS ZG526, Title: Distributed Computing 41 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm

• based on 2 observations on the role of a marker in a FIFO


system
• first observation - a marker ensures that condition C2 is
satisfied for LSi and LSj when the snapshots are recorded
at processes pi and pj respectively
• fulfills this role of a marker in a non-FIFO system by using
a coloring scheme on computation messages

Course ID: SS ZG526, Title: Distributed Computing 42 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm

Coloring Scheme:

• every process is initially white


• process turns red while taking a snapshot
• equivalent of the “marker sending rule” is executed when
a process turns red
• every message sent by a white process is colored white
• a white message is a message that was sent before the
sender of that message recorded its local snapshot

Course ID: SS ZG526, Title: Distributed Computing 43 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm

Coloring Scheme:

• every message sent by a red process is colored red


• a red message is a message that was sent after the
sender of that message recorded its local snapshot

• every white process takes its snapshot at its


convenience, but no later than the instant it receives a
red message

Course ID: SS ZG526, Title: Distributed Computing 44 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
Coloring Scheme:

• when a white process receives a red message, it records


its local snapshot before processing the message
• ensures that
• no message sent by a process after recording its local
snapshot is processed by the destination process
before the destination records its local snapshot
• an explicit marker message is not required
• marker is piggybacked on computation messages using a
coloring scheme

Course ID: SS ZG526, Title: Distributed Computing 45 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
• second observation –
• marker informs process pj of the value of {send(mij) |send(mij) ∈ LSi}
• compute the state of channel Cij as transit(LSi,LSj)

• Lai–Yang algorithm fulfills this role of the marker as follows:


• every white process records a history of all white messages sent or
received by it along each channel
• when a process turns red, it sends these histories along with its
snapshot to the initiator process that collects the global snapshot

Course ID: SS ZG526, Title: Distributed Computing 46 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm

• Lai–Yang algorithm fulfills this role of the marker as follows


(contd. from previous slide):
• initiator process evaluates transit(LSi,LSj) to compute the
state of a channel Cij as:
SCij
= {white messages sent by pi on Cij} − {white messages
received by pj on Cij}
= {mij | send(mij) ∈ LSi} − {mij | rec(mij) ∈ LSj}

Course ID: SS ZG526, Title: Distributed Computing 47 BITS Pilani, Hyderabad Campus
Lai–Yang Algorithm
•marker messages are not required
•each process has to record the entire message history on each
channel as part of the local snapshot
•space requirements of the algorithm may be large
•size of the local storage and snapshot recording can be reduced
•by storing only the messages sent and received since the
previous snapshot recording, assuming that the previous
snapshot is still available
•approach is useful in applications that require repeated snapshots
of a distributed system

Course ID: SS ZG526, Title: Distributed Computing 48 BITS Pilani, Hyderabad Campus
Necessary and sufficient conditions for
consistent global snapshots
•local checkpoint – saved intermediate state of a process during its
execution
•global snapshot –
•set of local checkpoints one from each process
•represents a snapshot of the distributed computation execution at
some instant
•consistent global snapshot –
•no causal path between any two distinct checkpoints
•set of local states that occurred concurrently
•Cp,i - ith (i ≥ 0) checkpoint of process pp , assigned the sequence number i
•ith checkpoint interval of pp - all computation performed between (i−1)th
and ith checkpoints (and includes (i−1)th checkpoint but not ith).
Course ID: SS ZG526, Title: Distributed Computing 49 BITS Pilani, Hyderabad Campus
Necessary and sufficient conditions for
consistent global snapshots
•a causal path exists between checkpoints Ci,j and Cx,y if a sequence of
messages exists from Ci,j to Cx,y such that each message is sent after the
previous one in the sequence is received
•a zigzag path between two checkpoints is a causal path, however allows a
message to be sent before the previous one in the path is received, send and
receive are in the same checkpoint interval

Course ID: SS ZG526, Title: Distributed Computing 50 BITS Pilani, Hyderabad Campus
Necessary and sufficient conditions for
consistent global snapshots
•checkpoint C is involved in a zigzag cycle iff there is a zigzag path from C to itself

•necessary condition for consistent snapshot - absence of causal path between


checkpoints in a snapshot
•necessary and sufficient conditions for consistent snapshot - absence of zigzag path
between checkpoints in a snapshot
•a set of checkpoints S can be extended to a consistent snapshot if and only if no
checkpoint in S has a zigzag path to any other checkpoint in S
•a checkpoint can be a part of a consistent snapshot if and only if it is not involved in a
zigzag cycle
Course ID: SS ZG526, Title: Distributed Computing 51 BITS Pilani, Hyderabad Campus
Reference
• Ajay D. Kshemkalyani, and Mukesh Singhal, Chapter 4,
“Distributed Computing: Principles, Algorithms, and Systems”,
Cambridge University Press, 2008.

Course No.- SS ZG526, Course Title - Distributed Computing 52 BITS Pilani, Hyderabad Campus
Course No.- SS ZG526, Course Title - Distributed Computing 53 BITS Pilani, Hyderabad Campus

You might also like