Synchronization: CS403/534 Distributed Systems Erkay Savas Sabanci University

Synchronization
CS403/534
Distributed Systems
Erkay Savas
Sabanci University
Outline
• Physical Clocks
• Logical Clocks
• Global State
• Election Algorithms
• Distributed Transactions
Clock Synchronization
• In a centralized system with a single clock, time is
unambiguous: every process reads the same clock
• When each machine has its own clock, the real order of
events may not be preserved
• Question: Is it possible to synchronize all the clocks in
a distributed system?
Physical Clocks: Solar Time
• Computation of the mean solar day (or mean solar second)

Physical Clocks: Atomic Time
• Coordinated Universal Time (UTC)
– Based on the number of transitions per second of the
Cesium 133 atom (very accurate)
– At present, the real time is taken as the average of 50
cesium clocks around the world: International
Atomic Time (TAI)
– UTC introduces a leap second from time to time to
compensate, since 86400 TAI seconds are 3 ms shorter than
a mean solar day
• UTC is broadcast through short wave radio and
satellites. Satellites can give an accuracy of
about ±0.5 ms
Physical Clocks: Drifts
• Problem: Even if a distributed system has a
UTC receiver, we still have to distribute
the time to each machine in the system
• Basic principle:
– Every machine has a timer that generates an interrupt
H times a second
– The clock in machine P counts ticks, one per timer
interrupt. Denote the value of that clock by C_P(t),
where t is the UTC time.
– In a perfect world we would have C_P(t) = t, or
dC/dt = 1.
– In practice
1 − ρ ≤ dC/dt ≤ 1 + ρ, where ρ is the maximum drift rate
Clock Synchronization Algorithms
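Figure: The relation between clock time and UTC when clocks tick
at different rates.
Goal: Never let two clocks in any system differ
by more than δ seconds
Means: Synchronize them at least every δ/(2ρ) seconds
(two clocks drifting in opposite directions can be 2ρΔt apart
Δt seconds after they were last synchronized)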
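The resynchronization bound can be checked with a few lines of arithmetic. This is a minimal sketch: the function names and the sample values of δ and ρ are illustrative assumptions, not taken from the slides.

```python
def max_resync_interval(delta: float, rho: float) -> float:
    """Longest gap between synchronizations that keeps any two clocks within delta seconds."""
    if delta <= 0 or rho <= 0:
        raise ValueError("delta and rho must be positive")
    return delta / (2 * rho)


def worst_case_skew(rho: float, elapsed: float) -> float:
    """Largest possible difference between two clocks `elapsed` seconds after a sync.

    One clock may run fast by rho while the other runs slow by rho.
    """
    return 2 * rho * elapsed


# Assumed values: a drift of 10 microseconds per second and a tolerated skew of 1 ms.
interval = max_resync_interval(delta=1e-3, rho=1e-5)
print(f"resynchronize at least every {interval:.1f} s")
print(f"skew after that interval: {worst_case_skew(1e-5, interval):.6f} s")
```

With these assumed numbers the clocks have to be resynchronized roughly every 50 seconds.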
Cristian's Algorithm
• There is a UTC-receiver machine (the time server) in the system.
• All other machines try to synchronize with the
UTC receiver.
– They simply ask it for the current time every δ/(2ρ) seconds.
Cristian's Algorithm: Problems
• 1st problem:
– Time must never run backward, so a fast clock cannot
simply be set back
– The correction must be introduced gradually
• 2nd problem:
– It takes a nonzero amount of time for the time
server’s reply to reach the sender.
– If the request is sent at T0 and the reply arrives at T1, the
propagation time can be estimated as (T1 − T0)/2,
– or as (T1 − T0 − I)/2 when the server's interrupt handling time I is known.
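Both fixes can be sketched in a few lines. The function names, the 10% slew limit, and the sample readings below are assumptions made for illustration; the slides only give the formulas (T1 − T0)/2 and (T1 − T0 − I)/2.

```python
def cristian_estimate(t0: float, t1: float, server_time: float,
                      handling_time: float = 0.0) -> float:
    """Estimate the current time when the reply arrives at local time t1.

    t0: local time when the request was sent
    server_time: time value carried in the server's reply
    handling_time: server-side processing time I (0 if unknown)
    """
    one_way_delay = (t1 - t0 - handling_time) / 2
    return server_time + one_way_delay


def slewed_tick(nominal_tick: float, error: float, max_slew: float = 0.1) -> float:
    """Amount to add to the clock on one timer interrupt.

    error = local time - estimated true time (positive when the local clock is fast).
    A fast clock advances by slightly less than the nominal tick and a slow clock
    by slightly more, so time never runs backward.
    """
    limit = max_slew * nominal_tick
    correction = max(-limit, min(limit, -error))
    return nominal_tick + correction


# Assumed readings: request sent at 10.000 s, reply received at 10.020 s,
# server reports 10.500 s and spent 4 ms handling the request.
estimate = cristian_estimate(10.000, 10.020, 10.500, handling_time=0.004)
print(f"estimated time: {estimate:.3f} s")
print(f"tick while the clock is fast: {slewed_tick(0.010, error=0.5):.4f} s")
```

Limiting the per-tick correction is one possible design choice; it stretches the adjustment over many interrupts instead of stepping the clock.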
The Berkeley Algorithm (1)
• The time server (a daemon) is active, polling every
machine periodically.
• It asks each machine for its current time and computes
the average of the time values it receives.
• It then tells each machine how to adjust its
clock.
• There is no need for a UTC receiver.
The Berkeley Algorithm (2)
a) The time daemon asks all the other machines for their clock
values
b) The machines answer
c) The time daemon tells everyone how to adjust their clock
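Steps a) to c) can be sketched as a single averaging function. This sketch assumes the daemon includes its own clock in the average; the machine names and clock readings (in minutes) are made up for illustration.

```python
def berkeley_adjustments(daemon_time: float, reported: dict) -> dict:
    """Return the adjustment each machine should apply to its clock.

    reported maps a machine name to the clock value it answered with.
    The daemon's own reading takes part in the average (an assumption of this sketch).
    """
    clocks = {"daemon": daemon_time, **reported}
    average = sum(clocks.values()) / len(clocks)
    return {name: average - value for name, value in clocks.items()}


# Clock readings in minutes since midnight: 3:00, 3:25 and 2:50.
adjustments = berkeley_adjustments(180, {"A": 205, "B": 170})
for name, delta in adjustments.items():
    print(f"{name}: {delta:+.0f} min")
```

Sending an adjustment rather than the new absolute time keeps the result independent of how long the reply takes to arrive.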
Logical Clocks (1)
• In many cases, it is sufficient that all the
computers agree on the same time, which need not
match the real time.
– The key requirement is internal consistency
• What is important is the happens-before
relation on the set of events in a distributed
system: logical clocks (Lamport timestamps)
• Happens-before relationship
– If a and b are two events in the same process, and a
occurs before b, then a → b is true
– If a is the sending of a message by a process, and b is
the receipt of that message by another process, then
a → b is true
– If a → b and b → c, then a → c (transitive)
Logical Clocks (2)
• Concurrent events:
– If two events, x and y, happen in different processes
that do not exchange messages (not even indirectly)
they are said to be concurrent.
– Neither x → y nor y → x is true.
• To every event a, we assign a time value
C(a) on which all processes agree
– If a → b, then C(a) < C(b)
– C must always go forward, never backward
– C(a) < C(b) does not imply a → b (a and b may be
concurrent)
Logical Clocks: Example
P1      P2      P3
 0       0       0
 6 (A)   8      10
12      16      20
18      24 (B)  30
24      32      40
30      40      50
36      48      60 (C)
42      56      70
48      64 (D)  80
54      72      90
60      80     100
Three processes running on different machines, each with
its own clock ticking at a different rate. A letter in parentheses
marks the clock value at which that message is sent.
Lamport’s Algorithm (1)
P1      P2      P3
 0       0       0
 6 (A)   8      10
12      16      20
18      24 (B)  30
24      32      40
30      40      50
36      48      60 (C)
42      61      70
48      69 (D)  80
70      77      90
76      85     100
Three processes running on different machines with
their own clocks, now corrected to obey the happens-before relation
(P2 jumps to 61 on receiving C, and P1 jumps to 70 on receiving D).
Lamport’s Algorithm (2)
• Each message carries a Lamport timestamp, which is a
monotonically increasing software counter.
• Lamport timestamps do not have any particular
relationship to physical clocks
• When a message arrives and the receiver’s clock shows a
time prior to the one in the message, the receiver’s clock
is advanced to one more than the sending time.
• We assign time values to all events such that
– If a happens before b in the same process, then C(a) < C(b)
– If a and b represent the sending and receiving of a message,
respectively, then C(a) < C(b).
– For all distinct events a and b, C(a) ≠ C(b)
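A Lamport clock fits in a handful of lines. The sketch below uses the common max-plus-one formulation of the receive rule, which also ticks the clock when the receiver is already ahead; the class and method names are assumptions, not part of the slides.

```python
class LamportClock:
    """Software counter that respects the happens-before relation."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Timestamp a local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Timestamp a send event; the returned value travels with the message."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Timestamp a receive event, jumping past the sender's timestamp if needed."""
        self.time = max(self.time, message_time) + 1
        return self.time


# Three processes: P1 sends m1 to P2, then P2 sends m2 to P3.
p1, p2, p3 = LamportClock(), LamportClock(), LamportClock()
a = p1.tick()        # local event at P1
b = p1.send()        # P1 sends m1
c = p2.receive(b)    # P2 receives m1
d = p2.send()        # P2 sends m2
e = p3.tick()        # local event at P3, concurrent with a..d
f = p3.receive(d)    # P3 receives m2
print(a, b, c, d, e, f)
```

Events a and e get the same value even though they are unrelated, which is why C(a) < C(b) alone says nothing about causality.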
Lamport Timestamps: Example
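Figure: events at three processes, labeled with their Lamport
timestamps (physical time runs left to right).
– P1: a = 1, b = 2
– P2: c = 3, d = 4
– P3: e = 1, f = 5
– Message m1 is sent at b and received at c; message m2 is sent
at d and received at f.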
Total Ordering with Logical Clocks
• What happens if two events in different processes
still get the same Lamport timestamp?
• Then a process number (which must be unique
systemwide) is attached to the event along with
the timestamp
• Process Pi stamps event a with [Ci(a), i]
• Then [Ci(a), i] < [Cj(b), j] if and only if
– Ci(a) < Cj(b), or
– Ci(a) = Cj(b) and i < j
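The comparison rule is exactly lexicographic ordering on (timestamp, process number) pairs, which Python tuples provide directly. The `Stamp` type and the sample events are hypothetical names chosen for this sketch.

```python
from typing import NamedTuple


class Stamp(NamedTuple):
    time: int  # Lamport timestamp C_i(a)
    pid: int   # systemwide unique process number i


# Events a and b carry the same Lamport timestamp but come from different processes.
events = {"a": Stamp(4, 2), "b": Stamp(4, 1), "c": Stamp(3, 3)}

# Tuples compare field by field: first by time, then by process number.
ordered = sorted(events, key=events.get)
print(ordered)
```

Because no two events share both fields, every process that sorts the same set of stamps arrives at the same order.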
Example: Order of Events
• Problem: We sometimes need to guarantee that
concurrent updates on a replicated database are
seen in the same order everywhere:
– There are two replicas of an account database in a bank
– A bank account has $5000 initially.
– Update 1: Process P1 adds $3000 to the account in
replica 1.
– Update 2: Process P2 increases the account by 4% in
replica 2.
– Without proper synchronization, the two updates may be
applied in different orders at the two sites:
– Replica 1 would have $8320 (Update 1 first)
– Replica 2 would have $8200 (Update 2 first)
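The two outcomes follow from the fact that the updates do not commute. A short check, using integer dollars and hypothetical function names:

```python
def deposit(balance: int) -> int:
    """Update 1: add $3000."""
    return balance + 3000


def add_interest(balance: int) -> int:
    """Update 2: increase the balance by 4% (integer arithmetic on whole dollars)."""
    return balance * 104 // 100


initial = 5000
replica1 = add_interest(deposit(initial))  # Update 1 first
replica2 = deposit(add_interest(initial))  # Update 2 first
print(replica1, replica2)
```

The replicas end up $120 apart, so the two sites must agree on one order before applying the updates.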
Totally-Ordered Multicast (1)
• Approach:
– Do not order events with respect to their actual
happening times.
– Instead, establish an order of events that is the
same in every process in the distributed system
– In the bank example, we need to guarantee that the
two updates are performed in the same order at
each copy.
• Mechanism:
– A group of processes multicast messages, each carrying the
timestamp of its sender.
– We assume that the messages from the same sender
are received in the order they were sent (FIFO ordered)
and no messages are lost (reliable communication)
• Process Pi multicasts timestamped message mi to
the other processes. The message is also put in the
local queue Qi, since the sender is a member of the group too.
• Any incoming message at process Pj is queued in
Qj, ordered according to its timestamp.
• Processes send acknowledgement messages, which
are also treated as regular messages.
• Acknowledgement messages are discarded when
they are at the head of the queue
• A queued message is delivered to the application
only when it is safe to do so.
• Condition for delivering a queued message to
the application:
1. The message is at the head of the queue
2. The message has been acknowledged by every
other process in the multicast group.
Explanation:
– Since the communication is FIFO ordered and no
message is lost, once the acknowledgement for a
message m has arrived at process Pi from process Pj,
every message that Pj sent earlier (and hence with a lower
timestamp than the acknowledgement) has already been received by Pi.
– So no message with a timestamp lower than that of m can still arrive from Pj.
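The delivery condition can be exercised with a small simulation. This is a simplified sketch: acknowledgements are tracked in a side table rather than queued as regular messages, the events are fed in by hand in an order that respects FIFO channels, and the `Replica` class and payload names are hypothetical.

```python
import heapq


class Replica:
    """One member of the multicast group, with its timestamp-ordered queue."""

    def __init__(self, pid: int, group: set) -> None:
        self.pid = pid
        self.group = set(group)
        self.queue = []      # min-heap of (timestamp, sender, payload)
        self.acks = {}       # (timestamp, sender) -> set of processes that acknowledged
        self.delivered = []  # payloads handed to the application, in order

    def on_message(self, timestamp: int, sender: int, payload: str) -> None:
        heapq.heappush(self.queue, (timestamp, sender, payload))
        self._deliver_ready()

    def on_ack(self, timestamp: int, sender: int, acker: int) -> None:
        self.acks.setdefault((timestamp, sender), set()).add(acker)
        self._deliver_ready()

    def _deliver_ready(self) -> None:
        # Deliver while the head of the queue is acknowledged by every other process.
        while self.queue:
            timestamp, sender, _payload = self.queue[0]
            needed = self.group - {sender}
            if not needed <= self.acks.get((timestamp, sender), set()):
                break
            self.delivered.append(heapq.heappop(self.queue)[2])


group = {1, 2}
r1, r2 = Replica(1, group), Replica(2, group)

# Replica 1 sees its own message first, replica 2 sees its own message first.
r1.on_message(5, 1, "deposit")
r1.on_message(7, 2, "interest")
r1.on_ack(7, 2, acker=1)
r1.on_ack(5, 1, acker=2)

r2.on_message(7, 2, "interest")
r2.on_message(5, 1, "deposit")
r2.on_ack(5, 1, acker=2)
r2.on_ack(7, 2, acker=1)

print(r1.delivered, r2.delivered)
```

Although the messages arrive in different orders, both replicas deliver the timestamp-5 message before the timestamp-7 message.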
Totally-Ordered Multicast: Example
            Process P1                 Process P2
Timestamp   Queue Q1                   Queue Q2                   Timestamp
            (m1, 5)                    (m2, 7)
5           -                          -                          7
            ACK                        ACK
8           (m1,5); (m2,7)             (m1,5); (m2,7)             8
9           (m1,5); (m2,7); (ACK,8)    (m1,5); (m2,7); (ACK,8)    9
Vector Timestamps
• Fact: Lamport timestamps do not guarantee that if
C(a) < C(b) then a indeed happened before b. We use
vector timestamps for that.
• There are N processes in the system, P1, P2,…, PN .
• Each process Pi maintains an array Vi[1…N] where
– Vi[i] is the number of events that have occurred so far at Pi .
– Vi[j] denotes the number of events that process Pi knows have
taken place at process Pj . (process Pj may have timestamped more
events, but no information reached Pi about them in messages
yet).
– Before Pi sends a message m, it increments Vi[i] by 1 and sends Vi
along with the message as vector timestamp vt(m) = Vi.
Vector Timestamps: Example
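Figure: events at three processes, labeled with their vector
timestamps (physical time runs left to right).
– P1: a = [1,0,0], b = [2,0,0]
– P2: c = [2,1,0], d = [2,2,0]
– P3: e = [0,0,1], f = [2,2,2]
– Message m1 is sent at b and received at c; message m2 is sent
at d and received at f.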
Vector Timestamps: Updating Rules
• Vector timestamps are updated as follows:
1. Initially Vi [x] = 0, for x = 1,2,…, N
2. Just before Pi timestamps an event, it sets
Vi[i] := Vi[i] + 1
3. Pi includes the value vt = Vi in every message it sends.
4. When Pj receives a timestamp vt in a message, it sets
Vj[x] := max (Vj[x], vt[x]), for x = 1, 2, …, N
(rule 2 then applies when Pj timestamps the receive event)
• We can compare vector timestamps as follows:
– V = V´ iff V[j] = V´[ j], for j = 1, 2, …, N
– V ≤ V´ iff V[j] ≤ V´[j], for j = 1, 2, …, N
– V < V´ iff V ≤ V´ and V ≠ V´.
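The updating and comparison rules translate almost line for line into code. In this sketch processes are numbered from 0 instead of 1, and the class and helper names are assumptions; the event names follow a three-process run in which P1 sends m1 to P2 and P2 then sends m2 to P3.

```python
class VectorClock:
    """Vector timestamp kept by one process (0-based process index)."""

    def __init__(self, pid: int, n: int) -> None:
        self.pid = pid
        self.v = [0] * n                       # rule 1: all entries start at 0

    def event(self) -> list:
        self.v[self.pid] += 1                  # rule 2: tick own entry before timestamping
        return list(self.v)

    def send(self) -> list:
        return self.event()                    # rule 3: the timestamp travels with the message

    def receive(self, vt: list) -> list:
        self.v = [max(own, other) for own, other in zip(self.v, vt)]  # rule 4: merge
        return self.event()                    # then timestamp the receive event


def leq(v: list, w: list) -> bool:
    return all(a <= b for a, b in zip(v, w))


def less(v: list, w: list) -> bool:
    return leq(v, w) and v != w


def concurrent(v: list, w: list) -> bool:
    return v != w and not less(v, w) and not less(w, v)


p1, p2, p3 = VectorClock(0, 3), VectorClock(1, 3), VectorClock(2, 3)
a = p1.event()
b = p1.send()        # m1
c = p2.receive(b)
d = p2.send()        # m2
e = p3.event()
f = p3.receive(d)
print(a, b, c, d, e, f)
print(less(a, f), concurrent(c, e))
```

Unlike Lamport timestamps, the vectors of c and e are incomparable, so the two events are recognized as concurrent.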
Vector Timestamps: Capturing Causality
• If V(a) < V(b) then a happened before b.
– Prove that this is true (Homework assignment).
• Totally-ordered timestamps fail to capture
causality.
– Think of posting articles in a network news service.
– A reaction to an article must be delivered after the
article itself.
– We are not interested in the order of delivery of
independent articles
– A totally-ordered scheme enforces an order even between
unrelated events.
– A causally-ordered scheme preserves the order only between
potentially related events.
Example: Network News Services
• Pi posts an article a
– i.e. multicasts a along with vt(a) (= Vi)
• Pj posts a reaction r to the article a
– i.e. multicasts r along with vt(r) (= Vj)
– Note that vt(r)[j] > vt(a)[j]
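• Both messages, a and r, arrive at another process
Pk (no message is lost)
– r may arrive before a
– Pk inspects vt(r) and delivers r only if
1. vt(r)[j] = Vk[j] + 1 (r is the next message Pk expects from Pj)
2. vt(r)[x] ≤ Vk[x] for all x ≠ j (Pk has seen everything Pj had seen when it sent r)
– Otherwise Pk holds r back until both conditions hold, i.e. until a has been delivered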
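The two delivery conditions can be sketched as a hold-back queue. The sketch assumes that vector entries count multicast messages sent (they are incremented only on sending) and that processes are numbered from 0; `CausalReceiver` and its fields are hypothetical names.

```python
def can_deliver(vt: list, sender: int, local: list) -> bool:
    """Check the two causal delivery conditions for a message from `sender`."""
    next_from_sender = vt[sender] == local[sender] + 1
    dependencies_seen = all(
        vt[x] <= local[x] for x in range(len(vt)) if x != sender
    )
    return next_from_sender and dependencies_seen


class CausalReceiver:
    def __init__(self, n: int) -> None:
        self.local = [0] * n   # V_k: messages delivered so far, per sender
        self.held = []         # messages that arrived too early
        self.delivered = []

    def on_receive(self, name: str, sender: int, vt: list) -> None:
        self.held.append((name, sender, vt))
        progress = True
        while progress:        # one delivery may unblock messages that are on hold
            progress = False
            for message in list(self.held):
                msg_name, msg_sender, msg_vt = message
                if can_deliver(msg_vt, msg_sender, self.local):
                    self.held.remove(message)
                    self.local[msg_sender] = msg_vt[msg_sender]
                    self.delivered.append(msg_name)
                    progress = True


# Pi = process 0 posts article a; Pj = process 1 posts reaction r; Pk receives r first.
pk = CausalReceiver(3)
pk.on_receive("r", sender=1, vt=[1, 1, 0])
held_after_r = [name for name, _, _ in pk.held]
pk.on_receive("a", sender=0, vt=[1, 0, 0])
print(held_after_r, pk.delivered)
```

The reaction waits in the hold-back list until the article it depends on has been delivered.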
Global State
• Definition: Global state of a distributed system
consists of the local state of each process,
together with messages that are currently in
transit.
• Why are we interested in the global state?
– To check whether a distributed system is deadlocked.
– To check whether a distributed computation has terminated
normally or an error has occurred.
– For example, when it is known that local computations
have stopped and that there are no more messages in
transit, the system has obviously entered a state in
which no more progress can be made
Distributed Snapshot (1)
• Reflects a state in which the distributed system
is or has been.
• It yields a consistent global state:
– If the event that a message m has been received by process
Pi is recorded in the global state, then the event that m was
sent by another process Pj must also be recorded.
– A snapshot providing this is called a consistent cut.
– The converse is not required: since a message can be in transit,
the sending of m may be recorded while the receiving of m by the
target process is not
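The consistency condition is a subset test on recorded events. In this minimal sketch a cut is summarized by the set of messages whose send is recorded and the set whose receipt is recorded; the function names and message labels are illustrative assumptions.

```python
def is_consistent_cut(sent: set, received: set) -> bool:
    """A cut is consistent if every recorded receipt has its send recorded too."""
    return received <= sent


def in_transit(sent: set, received: set) -> set:
    """Messages whose send is recorded but whose receipt is not."""
    return sent - received


# m2 was sent before the cut but not yet received: allowed, m2 is in transit.
print(is_consistent_cut({"m1", "m2"}, {"m1"}), in_transit({"m1", "m2"}, {"m1"}))

# m3 is recorded as received although its send is not recorded: inconsistent.
print(is_consistent_cut({"m1"}, {"m1", "m3"}))
```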
Distributed Snapshot (2)
Figure: two space-time diagrams of processes P1, P2, and P3
exchanging messages m1, m2, and m3, each crossed by a cut.
a) A consistent cut
b) An inconsistent cut
Distributed Snapshot Algorithm (DSA)
• Assumptions (for simplicity)
– The distributed system can be represented as a
collection of processes.
– Neither channels nor processes fail; communication is
reliable so that every message sent is eventually
received intact, exactly once.
– There are point-to-point channels between processes
– Channels are unidirectional and message delivery is
FIFO-ordered
• Features
– Any process can initiate the algorithm at any time
– While the snapshot algorithm is in execution, the
normal operation of the distributed system is not
affected.
Initiating DSA
• A process, Pi starts the algorithm by
1. saving its local state
2. sending a marker along each of its outgoing
channels
• A receiver of the marker should participate in
recording the global state.
Marker Receiving Rule
• When a process Pj receives a marker on
channel C
– IF ( Pj has not yet recorded its local state ) it
1. records its local state now;
2. records the state of C as the empty set;
3. turns on recording of messages arriving on other
incoming channels;
– ELSE
• Pj records the state of C as the set of messages
that it has received over C since it saved its state;
– ENDIF
Marker Sending Rule
• After Pj has recorded its state, for each
outgoing channel C
– Pj sends one marker message over C
(before it sends any other (regular) message over C).
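The marker receiving and sending rules can be tried out on a toy system. This is a sketch under stated assumptions: two processes exchange amounts of money over FIFO channels modeled as deques, the local state is just an account balance, and the `Process` class with its methods is hypothetical.

```python
from collections import deque

MARKER = "#"


class Process:
    def __init__(self, name: str, balance: int) -> None:
        self.name = name
        self.balance = balance       # local state
        self.inbox = {}              # sender name -> FIFO channel into this process
        self.outbox = {}             # receiver name -> FIFO channel out of this process
        self.recorded_state = None
        self.channel_state = {}      # sender name -> messages recorded as in transit
        self.recording = set()       # incoming channels still being recorded

    def connect(self, other: "Process") -> None:
        channel = deque()            # unidirectional channel self -> other
        self.outbox[other.name] = channel
        other.inbox[self.name] = channel

    def send(self, to: str, amount: int) -> None:
        self.balance -= amount
        self.outbox[to].append(amount)

    def record_state(self, marker_from=None) -> None:
        """Used both to initiate a snapshot and on the first marker received."""
        self.recorded_state = self.balance
        self.channel_state = {src: [] for src in self.inbox}
        self.recording = set(self.inbox) - {marker_from}  # marker channel stays empty
        for channel in self.outbox.values():              # marker sending rule
            channel.append(MARKER)

    def receive(self, src: str) -> None:
        message = self.inbox[src].popleft()
        if message == MARKER:
            if self.recorded_state is None:
                self.record_state(marker_from=src)
            else:
                self.recording.discard(src)               # channel state is complete
        else:
            self.balance += message
            if src in self.recording:
                self.channel_state[src].append(message)


p1, p2 = Process("P1", 100), Process("P2", 50)
p1.connect(p2)
p2.connect(p1)

p1.send("P2", 10)      # in transit on C12
p2.send("P1", 5)       # in transit on C21
p1.record_state()      # P1 initiates the snapshot
p2.receive("P1")       # P2 gets the 10
p2.receive("P1")       # P2 gets the marker, records its state, sends its own marker
p1.receive("P2")       # P1 gets the 5 while recording C21
p1.receive("P2")       # P1 gets the marker and stops recording C21

total = p1.recorded_state + p2.recorded_state + sum(p1.channel_state["P2"])
print(p1.recorded_state, p2.recorded_state, p1.channel_state, total)
```

The recorded local states plus the recorded in-transit message add up to the 150 units that were in the system at the start, which is what a consistent global state should show.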
Process and Channels
Figure: organization of a process and channels for a distributed
snapshot. Labels in the figure: incoming message, outgoing message,
process, state, local file system, marker (#).
DSA: Example with Two Processes
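Figure, top: P1 has recorded its local state LS(P1).
– C12 (P1 to P2) carries, in sending order: m11, m12, m13, #, m14
– C21 (P2 to P1) carries, in sending order: m21, m22
Figure, bottom: P2 has received the marker and recorded LS(P2).
– C12 carries, in sending order: m14, m15
– C21 carries, in sending order: #, m23
– Recorded with LS(P1): m21, m22 are in transit on C21
– LS(P2) includes that m11, m12, m13 have been received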
DSA: Example with Three Processes
Figure: three processes P1, P2, and P3, fully connected by
unidirectional channels Cij from Pi to Pj (C12, C13, C21, C23, C31, C32).
Example: P1 Process
P1
Time   Send C12   Send C13   Receive C21   Receive C31
1      m12_1
2      m12_2      m13_1      m21_1         m31_1
3      m12_3
4      m12_4      m13_2      m21_2         m31_2
5      #1         #1
6                            m21_3         m31_3
7                                          m31_4
8                            #2            m31_5
9                                          #3
10
Example: P2 Process
P2
Time   Send C21   Send C23   Receive C12   Receive C32
1      m21_1
2                 m23_1      m12_1
3      m21_2                 m12_2
4                 m23_2      m12_3         m32_1
5      m21_3                 m12_4
6                 m23_3      #1
7      #2         #2                       m32_2
8                                          m32_3
9                                          #3
10
Example: P3 Process
P3
Time   Send C31   Send C32   Receive C13   Receive C23
1      m31_1
2      m31_2
3                 m32_1      m13_1         m23_1
4
5      m31_3      m32_2      m13_2         m23_2
6      m31_4
7      m31_5      m32_3      #1
8      #3         #3                       m23_3
9                                          #2
10
Example: State Recorded at P1:
Sent              Received          In Transit
C12     C13       C21     C31       C21     C31
m12_1   m13_1     m21_1   m31_1     m21_3   m31_3
m12_2   m13_2     m21_2   m31_2             m31_4
m12_3                                       m31_5
m12_4
Example: State Recorded at P2 & P3
P2
Sent              Received          In Transit
C21     C23       C12     C32       C12     C32
m21_1   m23_1     m12_1   m32_1     -       m32_2
m21_2   m23_2     m12_2                     m32_3
                  m12_3
                  m12_4

P3
Sent              Received          In Transit
C31     C32       C13     C23       C13     C23
m31_1   m32_1     m13_1   m23_1     -       m23_3
m31_2             m13_2   m23_2