
CS9222 Advanced Operating System
Unit – IV

Dr.A.Kathirvel
Professor & Head/IT - VCEW
Unit - IV
Basic Concepts – Classification of Failures – Basic
Approaches to Recovery; Recovery in Concurrent
Systems; Synchronous and Asynchronous
Checkpointing and Recovery; Checkpointing in
Distributed Database Systems; Fault Tolerance;
Issues – Two-phase and Nonblocking Commit
Protocols; Voting Protocols; Dynamic Voting
Protocols
Recovery
Recovery in computer systems refers to restoring a
system to its normal operational state.
Recovery may be as simple as restarting a failed
computer or restarting failed processes.
Recovery is generally a very complicated process.
For example, a process has memory allocated to it and
may have locked shared resources, such as files and
memory. If such a process fails, it is imperative that the
resources allocated to it are reclaimed and any partial
modifications it made are undone.
Recovery

 Computer system recovery:


 Restore the system to a normal operational state
 Process recovery:
 Reclaim resources allocated to process,
 Undo modification made to databases, and
 Restart the process
 Or restart process from point of failure and resume execution
 Distributed process recovery (cooperating processes):
 Undo effect of interactions of failed process with other cooperating
processes.
 Replication (hardware components, processes, data):
 Main method for increasing system availability
 System:
 Set of hardware and software components
 Designed to provide a specified service (i.e., meet a set of requirements)

Recovery (cont.)

System failure:
– The system does not meet its requirements, i.e., does not perform its services as specified
– An error could lead to a system failure

Erroneous system state:
– A state which could lead to a system failure by a sequence of valid state transitions
– Error: the part of the system state which differs from its intended value
– An error is a manifestation of a fault

Fault:
– Anomalous physical condition, e.g., design errors, manufacturing problems, damage,
external disturbances

Classification of failures
Process failure:
 Behavior: process causes system state to deviate from specification (e.g. incorrect computation,
process stops execution)
 Errors causing process failure: protection violation, deadlocks, timeout, wrong user input, etc…
 Recovery: Abort process or
 Restart process from prior state
System failure:
 Behavior: processor fails to execute
 Caused by software errors or hardware faults (CPU, memory, bus, … failure)
 Recovery: system stopped and restarted in correct state
 Assumption: fail-stop processors, i.e. system stops execution, internal state is lost
Secondary Storage Failure:
 Behavior: stored data cannot be accessed
 Errors causing failure: parity error, head crash, etc.
 Recovery/Design strategies:
 Reconstruct content from archive + log of activities
 Design mirrored disk system
Communication Medium Failure:
 Behavior: a site cannot communicate with another operational site
 Errors/Faults: failure of switching nodes or communication links
 Recovery/Design Strategies: reroute, error-resistant communication protocols

Backward and Forward Error Recovery
Failure recovery: restore an erroneous state to an error-free state
Approaches to failure recovery:
 Forward-error recovery:
 Remove errors in process/system state (if errors can be completely assessed)
 Continue process/system forward execution
 Backward-error recovery:
 Restore process/system to previous error-free state and restart from there
Comparison: Forward vs. Backward error recovery
 Backward-error recovery
 (+) Simple to implement
 (+) Can be used as general recovery mechanism
 (-) Performance penalty
 (-) No guarantee that fault does not occur again
 (-) Some components cannot be recovered
 Forward-error Recovery
 (+) Less overhead
 (-) Limited use, i.e. applicable only when the impact of faults can be completely assessed
 (-) Cannot be used as general mechanism for error recovery

Backward-Error Recovery: Basic approach
Principle: restore process/system to a known, error-free “recovery point”/ “checkpoint”.
System model:
– CPU and main memory: objects are brought from secondary storage into main memory to be accessed, and written back to secondary storage if modified
– Stable storage: storage that maintains its information in the event of system failure; used to store logs and recovery points

Approaches:
 (1) Operation-based approach
 (2) State-based approach
(1) The Operation-based Approach
Principle:
 Record all changes made to state of process (‘audit trail’ or ‘log’) such that process
can be returned to a previous state
 Example: A transaction based environment where transactions update a database
 It is possible to commit or undo updates on a per-transaction basis
 A commit indicates that the transaction on the object was successful and changes are
permanent
(1.a) Updating-in-place
 Principle: every update (write) operation to an object creates a log in stable storage
that can be used to ‘undo’ and ‘redo’ the operation
 Log content: object name, old object state, new object state
 Implementation of a recoverable update operation:
 Do operation: update object and write log record
 Undo operation: log(old) -> object (undoes the action performed by a do)
 Redo operation: log(new) -> object (redoes the action performed by a do)
 Display operation: display log record (optional)
 Problem: a ‘do’ cannot be recovered if system crashes after write object but before
log record write
(1.b) The write-ahead log protocol
 Principle: write log record before updating object
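A minimal sketch of these recoverable update operations, with the write-ahead rule applied: in-memory Python structures stand in for the object store and the stable log, and the names do_op, undo_op and redo_op are ours, chosen to mirror the operations listed above.

objects = {"x": 0}          # the recoverable objects
log = []                    # "stable" log: (object name, old state, new state)

def do_op(name, new_value):
    """Write-ahead rule: append the log record *before* updating the object."""
    log.append((name, objects[name], new_value))
    objects[name] = new_value

def undo_op(record):
    """log(old) -> object: undoes the action performed by a do."""
    name, old, _new = record
    objects[name] = old

def redo_op(record):
    """log(new) -> object: redoes the action performed by a do."""
    name, _old, new = record
    objects[name] = new

do_op("x", 42)
undo_op(log[-1])            # x back to 0
redo_op(log[-1])            # x forward to 42 again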
(2) State-based Approach

 Principle: establish frequent ‘recovery points’ or ‘checkpoints’ saving the


entire state of process
 Actions:
 ‘Checkpointing’ or ‘taking a checkpoint’: saving process state
 ‘Rolling back’ a process: restoring a process to a prior state
 Note: A process should be rolled back to the most recent ‘recovery point’ to
minimize the overhead and delays in the completion of the process

 Shadow Pages: Special case of state-based approach


 Only a part of the system state is saved, to minimize the cost of recovery
 When an object is modified, page containing object is first copied on stable
storage (shadow page)
 If process successfully commits: shadow page discarded and modified page is
made part of the database
 If process fails: shadow page used and the modified page discarded
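A minimal sketch of the shadow-page scheme just described, assuming a toy page-level store; the page layout and function names are illustrative only.

import copy

database = {0: {"balance": 100}}   # page number -> page contents
shadows = {}                       # page number -> shadow copy on "stable storage"

def modify(page_no, key, value):
    if page_no not in shadows:                     # first touch: copy the page (the shadow)
        shadows[page_no] = copy.deepcopy(database[page_no])
    database[page_no][key] = value

def commit():
    shadows.clear()                                # modified pages become the database

def abort():
    for page_no, page in shadows.items():          # shadow pages replace modified pages
        database[page_no] = page
    shadows.clear()

modify(0, "balance", 50)
abort()
assert database[0]["balance"] == 100               # the modification was discarded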

Recovery in concurrent systems
 Issue: if one of a set of cooperating processes fails and has to be rolled back to a
recovery point, all processes it communicated with since the recovery point have to be
rolled back.
 Conclusion: In concurrent and/or distributed systems all cooperating processes have to
establish recovery points
 Orphan messages and the domino effect

[Diagram: processes X, Y and Z with recovery points x1–x3, y1–y2 and z1–z2; Y sends message 'm' to X after its recovery point y2, and X receives 'm' between x2 and x3.]
 Case 1: failure of X after x3: no impact on Y or Z
 Case 2: failure of Y after sending message 'm'
 Y rolled back to y2
 'm' becomes an orphan message
 X rolled back to x2
 Case 3: failure of Z after z2
 Y has to roll back to y1
 X has to roll back to x1
 Z has to roll back to z1 (the domino effect)
Lost messages

[Diagram: X sends message 'm' to Y after its recovery point x1; Y receives 'm' after its recovery point y1 and then fails.]

• Assume that x1 and y1 are the only recovery points for processes X and Y, respectively
• Assume Y fails after receiving message ‘m’
• Y rolled back to y1, X rolled back to x1
• Message ‘m’ is lost

Note: there is no distinction between this case and the case where message ‘m’ is lost in
communication channel and processes X and Y are in states x1 and y1, respectively

Problem of livelock
• Livelock: case where a single failure can cause an infinite number of rollbacks

[Diagrams (a) and (b): message exchanges between X and Y that lead to repeated rollbacks, as described below.]
(a) • Process Y fails before receiving message 'n1' sent by X
• Y is rolled back to y1; there is no record of sending message 'm1', causing X to roll back to x1
(b) • When Y restarts, it sends out 'm2' and receives 'n1' (delayed)
• When X restarts from x1, it sends out 'n2' and receives 'm2'
• Y has to roll back again, since there is no record of 'n1' being sent
• This causes X to be rolled back again, since it has received 'm2' and there is no record of sending 'm2' in Y
• The above sequence can repeat indefinitely
Consistent set of checkpoints
• Checkpointing in distributed systems requires that all processes (sites) that
interact with one another establish periodic checkpoints
• All the sites save their local states: local checkpoints
• All the local checkpoints, one from each site, collectively form a global
checkpoint
• The domino effect is caused by orphan messages, which in turn are caused
by rollbacks
1. Strongly consistent set of checkpoints
– Establish a set of local checkpoints (one for each process in the set)
such that no information flow takes place (i.e., no orphan messages)
during the interval spanned by the checkpoints
2. Consistent set of checkpoints
– Similar to the consistent global state
– Each message that is received in a checkpoint (state) should also be
recorded as sent in another checkpoint (state)
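The consistency condition can be expressed as a small check over a set of local checkpoints; representing each checkpoint by the sets of message identifiers it records as sent and received is an assumption made purely for illustration.

def is_consistent(checkpoints):
    """Every message recorded as received must also be recorded as sent (no orphans)."""
    sent = set().union(*(c["sent"] for c in checkpoints))
    received = set().union(*(c["received"] for c in checkpoints))
    return received <= sent

ckpt_x = {"sent": {"m1"}, "received": set()}
ckpt_y = {"sent": set(), "received": {"m1"}}
print(is_consistent([ckpt_x, ckpt_y]))   # True: m1 is recorded as both sent and received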
Consistency of Checkpoint
• Strongly consistent set of checkpoints: no messages penetrate the set in either direction during the checkpoint interval
• Consistent set of checkpoints: no messages penetrate the set backward (every message recorded as received is also recorded as sent); lost messages still need to be dealt with
[Diagram: checkpoints x1, y1, z1 form a strongly consistent set; x2, y2, z2 form a consistent set.]
Checkpoint/Recovery Algorithm

• Synchronous
– with global synchronization at checkpointing
• Asynchronous
– without global synchronization at checkpointing
Preliminary (Assumption)
~Synchronous Checkpoint~
Goal
To make a consistent global checkpoint

Assumptions
– Communication channels are FIFO
– No partition of the network
– End-to-end protocols cope with message loss due to
rollback recovery and communication failure
– No failure during the execution of the algorithm
Preliminary (Two types of checkpoint)
~Synchronous Checkpoint~

tentative checkpoint :
– a temporary checkpoint
– a candidate for permanent checkpoint
permanent checkpoint :
– a local checkpoint at a process
– a part of a consistent global checkpoint
Checkpoint Algorithm
~Synchronous Checkpoint~
Algorithm
1. an initiating process (the single process that invokes this algorithm) takes a
tentative checkpoint
2. it requests all the processes to take tentative checkpoints
3. it waits to hear from all the processes whether taking a tentative checkpoint
has succeeded
4. if it learns that all the processes have succeeded, it decides that all tentative
checkpoints should be made permanent; otherwise, they should be discarded
5. it informs all the processes of the decision
6. the processes that receive the decision act accordingly

Supplement
Once a process has taken a tentative checkpoint, it must not send messages
until it is informed of the initiator's decision.
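A minimal simulation of the algorithm above, assuming local method calls in place of real messages; the Process class and its fields are illustrative, not part of the original algorithm.

class Process:
    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed
        self.tentative = self.permanent = None

    def take_tentative(self, state):
        if self.will_succeed:
            self.tentative = state        # no messages may be sent from here on
            return "ok"
        return "fail"

    def act(self, decision):
        if decision == "commit":
            self.permanent = self.tentative   # tentative checkpoint becomes permanent
        self.tentative = None                 # otherwise it is discarded

def coordinated_checkpoint(initiator, others, state):
    replies = [initiator.take_tentative(state)]            # step 1
    replies += [p.take_tentative(state) for p in others]   # steps 2-3
    decision = "commit" if all(r == "ok" for r in replies) else "discard"  # step 4
    for p in [initiator] + others:                          # steps 5-6
        p.act(decision)
    return decision

procs = [Process("A"), Process("B"), Process("C", will_succeed=False)]
print(coordinated_checkpoint(procs[0], procs[1:], state="s1"))   # "discard"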
Diagram of Checkpoint Algorithm
~Synchronous Checkpoint~
[Diagram: the initiator takes a tentative checkpoint, sends "request to take a tentative checkpoint" to the other processes, collects OK replies, decides to commit, and all tentative checkpoints become permanent. The resulting set is a consistent global checkpoint, although some of the checkpoints taken may be unnecessary.]
Optimized Algorithm
~Synchronous Checkpoint~

Each message is labeled in the order of sending.

Labeling scheme:
⊥ : smallest label
т : largest label
last_label_rcvdX[Y] : the label of the last message that X received from Y after X took its
last permanent or tentative checkpoint; ⊥ if no such message exists
first_label_sentX[Y] : the label of the first message that X sent to Y after X took its last
permanent or tentative checkpoint; ⊥ if no such message exists
ckpt_cohortX : the set of all processes that may have to take checkpoints when X decides to
take a checkpoint
[Diagram: an example exchange between X and Y in which last_label_rcvdX[Y] is y2 and first_label_sentX[Y] is x2.]

A checkpoint request needs to be sent only to the processes included in ckpt_cohort.
Optimized Algorithm
~Synchronous Checkpoint~

ckpt_cohortX : { Y | last_label_rcvdX[Y] > ⊥ }

Y takes a tentative checkpoint only if
last_label_rcvdX[Y] >= first_label_sentY[X] > ⊥
[Diagram: last_label_rcvdX[Y] and first_label_sentY[X] marked on the message exchange between X and Y.]
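The rule above can be written as a small predicate; modelling ⊥ by negative infinity is an assumption made purely for illustration.

BOTTOM = float("-inf")   # ⊥ : smaller than every real label

def must_take_checkpoint(last_label_rcvd_x_from_y, first_label_sent_y_to_x):
    """Y checkpoints only if X has received from Y a message that Y sent after Y's
    latest checkpoint, i.e. last_label_rcvdX[Y] >= first_label_sentY[X] > ⊥."""
    return first_label_sent_y_to_x > BOTTOM and \
           last_label_rcvd_x_from_y >= first_label_sent_y_to_x

print(must_take_checkpoint(2, 1))        # True: Y's message 1 is recorded as received by X
print(must_take_checkpoint(2, BOTTOM))   # False: Y sent nothing since its last checkpoint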
Optimized Algorithm
~Synchronous Checkpoint~
Algorithm
1. an initiating process takes a tentative checkpoint
2. it requests every p ∈ ckpt_cohort to take a tentative checkpoint (this
message includes the sender's last_label_rcvd[receiver])
3. if a process that receives the request needs to take a checkpoint,
it repeats steps 1 and 2; otherwise, it returns an OK message
4. the initiator waits to receive OK from all p ∈ ckpt_cohort
5. if the initiator learns that all the processes have succeeded, it decides that all
tentative checkpoints should be made permanent; otherwise, they should
be discarded
6. it informs every p ∈ ckpt_cohort of the decision
7. the processes that receive the decision act accordingly
Diagram of Optimized Algorithm
~Synchronous Checkpoint~
[Diagram: four processes A, B, C and D exchange labelled messages (ab1, ac1, ba1, ba2, ca2, bd1, cb1, cb2, ac2, dc1, cd1, dc2). The initiator's checkpoint request propagates only to the processes satisfying last_label_rcvdX[Y] >= first_label_sentY[X] > ⊥; they reply OK, after which the decision to commit is made and the tentative checkpoints become permanent.]
Correctness
~Synchronous Checkpoint~
• A set of permanent checkpoints taken by this
algorithm is consistent
– No process sends messages after taking a
tentative checkpoint until it receives the
decision
– New checkpoints record no message from the
processes that do not take a checkpoint
– The set of tentative checkpoints is either made
permanent in its entirety or discarded in its entirety
Recovery Algorithm
~Synchronous Recovery~
Labeling Scheme
⊥ : smallest label
т : largest label
last_label_rcvdX[Y] :
the label of the last message that X received from Y after X took its last
permanent or tentative checkpoint; ⊥ if no such message exists
first_label_sentX[Y] :
the label of the first message that X sent to Y after X took its last permanent or
tentative checkpoint; ⊥ if no such message exists
roll_cohortX :
the set of all processes that may have to roll back to their latest
checkpoint when process X rolls back
last_label_sentX[Y] :
the label of the last message that X sent to Y before X took its latest permanent
checkpoint; т if no such message exists
Recovery Algorithm
~Synchronous Recovery~
roll_cohortX = { Y | X can send messages to Y }

Y will restart from its permanent checkpoint only if
last_label_rcvdY[X] > last_label_sentX[Y]
Recovery Algorithm
~Synchronous Recovery~
Algorithm
1. an initiator requests every p ∈ roll_cohort to prepare to roll back (this
message includes the sender's last_label_sent[receiver])
2. if a process that receives the request needs to roll back, it repeats
step 1; otherwise, it returns an OK message
3. the initiator waits to receive OK from all p ∈ roll_cohort
4. if the initiator learns that all p ∈ roll_cohort have succeeded, it decides
to roll back; otherwise, not to roll back
5. it informs every p ∈ roll_cohort of the decision
6. the processes that receive the decision act accordingly
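The roll-back test used above can also be written as a small predicate; modelling т by positive infinity is an assumption made purely for illustration.

TOP = float("inf")   # т : larger than every real label

def must_roll_back(last_label_rcvd_y_from_x, last_label_sent_x_to_y):
    """Y rolls back only if it holds a message from X that X no longer remembers
    sending, i.e. last_label_rcvdY[X] > last_label_sentX[Y]."""
    return last_label_rcvd_y_from_x > last_label_sent_x_to_y

print(must_roll_back(2, 1))     # True: Y received a message X "unsent" by rolling back
print(must_roll_back(0, TOP))   # False: X sent nothing before its permanent checkpoint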
Diagram of Synchronous Recovery
[Diagram: an initiator sends "request to roll back", carrying last_label_sent, to the processes in roll_cohort; each receiver compares its last_label_rcvd with that value, and only the processes for which last_label_rcvdY[X] > last_label_sentX[Y] holds roll back to their latest permanent checkpoints. Once all OK replies arrive, the decision to roll back is made.]
Drawbacks of Synchronous Approach

• Additional messages are exchanged


• Synchronization delay
• An unnecessary extra load on the system if
failures rarely occur
Asynchronous Checkpoint
Characteristics
– Each process takes checkpoints independently
– There is no guarantee that a set of local checkpoints is
consistent
– A recovery algorithm has to search for a consistent set
of checkpoints

– No additional messages
– No synchronization delay
– Lighter load during normal execution
Preliminary (Assumptions)
~Asynchronous Checkpoint / Recovery~
Goal
To find the latest consistent set of checkpoints
Assumptions
– Communication channels are FIFO
– Communication channels are reliable
– The underlying computation is event-driven
Preliminary (Two types of log)
~Asynchronous Checkpoint / Recovery~
• On receipt of a message, the event is saved in memory (the volatile log)
• The volatile log is periodically flushed to disk (the stable log); this flush acts as a checkpoint

volatile log :
quick access
lost if the corresponding processor fails
stable log :
slow access
not lost even if processors fail
Preliminary (Definition)
~Asynchronous Checkpoint / Recovery~
Definitions
CkPti : the checkpoint (stable log) that processor i rolls back to when a failure
occurs
RCVDi←j (CkPti / e) :
the number of messages received by processor i from processor j, per
the information stored in the checkpoint CkPti or event e
SENTi→j (CkPti / e) :
the number of messages sent by processor i to processor j, per the
information stored in the checkpoint CkPti or event e
Recovery Algorithm
~Asynchronous Checkpoint / Recovery~

Algorithm
1. When a process crashes, it recovers to its latest checkpoint CkPt
2. It broadcasts a message that it has failed; the others receive this
message and roll back to their latest logged event
3. Each process sends SENT(CkPt) to its neighboring processes
4. Each process waits for SENT(CkPt) messages from every neighbor
5. On receiving SENTj→i(CkPtj) from j, if process i notices that RCVDi←j (CkPti) >
SENTj→i(CkPtj), it rolls back to the event e such that RCVDi←j (e)
= SENTj→i(e)
6. Steps 3, 4 and 5 are repeated N times (N is the number of processes)
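A compact sketch of the roll-back rule in step 5: after a crash, process i compares RCVDi←j with SENTj→i and rolls back to the latest logged event at which the counts are consistent. The event-log layout used here is an assumption made for illustration.

def rollback_point(events_i, j, sent_by_j_to_i):
    """events_i: list of logged events for process i, each carrying the count of
    messages received from j up to that event (RCVDi<-j(e)).
    Returns the index of the latest event i may keep."""
    for idx in range(len(events_i) - 1, -1, -1):
        if events_i[idx]["rcvd_from"].get(j, 0) <= sent_by_j_to_i:
            return idx            # RCVDi<-j(e) <= SENTj->i(CkPtj): consistent
    return 0                      # otherwise fall back to the checkpoint itself

# Process i has logged three events; it had received 1, 2 and 3 messages from j by then.
events = [{"rcvd_from": {"j": 1}}, {"rcvd_from": {"j": 2}}, {"rcvd_from": {"j": 3}}]
print(rollback_point(events, "j", sent_by_j_to_i=2))   # 1: roll back to the second event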
Asynchronous Recovery
[Diagram: processes X, Y and Z with checkpoints x1, y1, z1 and logged events Ex0–Ex3, Ey0–Ey3, Ez0–Ez2. After a failure, each process compares RCVDi←j with SENTj→i for every neighbor j and rolls back to the latest event at which RCVDi←j(CkPti) <= SENTj→i(CkPtj) holds.]
System reliability: Fault-Intolerance vs. Fault-Tolerance

 The fault-intolerance (or fault-avoidance) approach improves
system reliability by removing the sources of failures (i.e.,
hardware and software faults) before normal operation begins

 The fault-tolerance approach expects faults to be present
during system operation, but employs design techniques which
ensure the continued correct execution of the computing
process

Approaches to fault-tolerance
 Approaches:
 (a) Mask failures
 (b) Well defined failure behavior

 (a) Mask failures:
 The system continues to provide its specified function(s) in the presence of failures
 Example: voting protocols
 (b) Well-defined failure behaviour:
 The system exhibits a well-defined behaviour in the presence of failures
 It may or may not perform its specified function(s), but it facilitates actions
suitable for fault recovery
 Example: commit protocols
 A transaction on a database is made visible only if it is successful and commits
 If it fails, the transaction is undone
 Redundancy:
 Method for achieving fault tolerance (multiple copies of hardware, processes,
data, etc.)

Issues
 Process Deaths:
 All resources allocated to a process must be recovered when a process
dies
 Kernel and remaining processes can notify other cooperating processes
 Client-server systems: client (server) process needs to be informed that
the corresponding server (client) process died
 Machine failure:
 All processes running on that machine will die
 Client-server systems: difficult to distinguish between a process and
machine failure
 Issue: detection by processes of other machines
 Network Failure:
 Network may be partitioned into subnets
 Machines from different subnets cannot communicate
 Difficult for a process to distinguish between a machine and a
communication link failure
Atomic actions
 System activity: sequence of primitive or atomic actions
 Atomic Action:
 Machine Level: uninterruptible instruction
 Process Level: Group of instructions that accomplish a task
 Example: Two processes, P1 and P2, share a memory location 'x' and both
modify 'x'; each locked update forms an atomic action:

Process P1                 Process P2
…                          …
Lock(x);                   Lock(x);
x := x + z;                x := x + y;
Unlock(x);                 Unlock(x);
…                          …
(successful exit)
 System level: group of cooperating process performing a task (global
atomicity)
Committing
 Transaction: Sequence of actions treated as an atomic action to preserve
consistency (e.g. access to a database)
 Commit a transaction: Unconditional guarantee that the transaction will
complete successfully (even in the presence of failures)
 Abort a transaction: Unconditional guarantee to back out of a transaction,
i.e., that all the effects of the transaction have been removed (transaction
was backed out)
 Events that may cause aborting a transaction: deadlocks, timeouts, protection
violation
 Mechanisms that facilitate backing out of an aborting transaction
 Write-ahead-log protocol
 Shadow pages
 Commit protocols:
 Enforce global atomicity (involving several cooperating distributed processes)
 Ensure that all the sites either commit or abort transaction unanimously, even in
the presence of multiple and repetitive failures

The two-phase commit protocol
 Assumption:
 One process is coordinator, the others are “cohorts” (different sites)
 Stable store available at each site
 Write-ahead log protocol

Coordinator:
– Initialization: send a start-transaction message to all cohorts
– Phase 1: send a commit-request message asking all cohorts to commit, then wait for
replies from the cohorts
– Phase 2: if all cohorts replied agreed and the coordinator agrees, write a commit record
into the log and send a commit message to the cohorts; otherwise send an abort message.
Wait for acknowledgments from the cohorts; if an acknowledgment from a cohort is not
received within a specified period, resend the commit/abort message to that cohort. When
all acknowledgments have been received, write a complete record to the log.

Cohorts:
– Phase 1: if the transaction at the cohort is successful, write the undo and redo logs on
stable storage and return an agreed message; otherwise return an abort message
– Phase 2: if commit is received, release all resources and locks held for the transaction
and send an acknowledgment; if abort is received, undo the transaction using the undo log
record, release resources and locks, and send an acknowledgment
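The exchange above can be sketched as a small simulation; the Cohort class, its flag, and the in-process calls stand in for real messages and stable-storage writes, and are assumptions made for illustration.

class Cohort:
    def __init__(self, name, txn_ok=True):
        self.name, self.txn_ok = name, txn_ok
    def vote(self):
        # Phase 1: write undo/redo logs to stable storage, then reply.
        return "agreed" if self.txn_ok else "abort"
    def decide(self, decision):
        # Phase 2: commit releases resources and locks; abort undoes via the undo log.
        return "ack"

def two_phase_commit(cohorts):
    votes = [c.vote() for c in cohorts]                          # Phase 1
    decision = "commit" if all(v == "agreed" for v in votes) else "abort"
    acks = [c.decide(decision) for c in cohorts]                 # Phase 2
    assert all(a == "ack" for a in acks)                         # else re-send the decision
    return decision

print(two_phase_commit([Cohort("A"), Cohort("B", txn_ok=False)]))  # "abort"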
NonBlocking Commit Protocols
Our Blocking Theorem from last week states that if
network partitioning is possible, then any distributed
commit protocol may block.
Let's assume now that the network cannot partition.
Then we can consult other processes to make
progress.
However, if all processes fail, then we are, again,
blocked.
Let's further assume that total failure is not possible,
i.e., not all processes crash at the same time.
Automata representation
We model the participants with finite state automata
(FSA).
The participants move from one state to another as a
result of receiving one or several messages or as a
result of a timeout event.
Having received these messages, a participant may
send some messages before executing the state
transition.
Commit Protocol Automata
 Final states are divided into Abort states and Commit states
(finally, either Abort or Commit takes place).
 Once an Abort state is reached, it is not possible to do a
transition to a non-Abort state. (Abort is irreversible). Similarly
for Commit states (Commit is also irreversible).
 The state diagram is acyclic.
 We denote the initial state by q, the terminal states are a (an
abort/rollback state) and c (a commit state). Often there is a
wait-state, which we denote by w.
 Assume the participants are P1,…,Pn. Possible coordinator is
P0, when the protocol starts.
2PC Coordinator

State q: on a commit-request from the application, send VoteReq to P1,…,Pn and move to the wait state w
State w: on a timeout or a No vote from one of P1,…,Pn, send Abort to P1,…,Pn and move to a;
on Yes votes from all of P1,…,Pn, send Commit to P1,…,Pn and move to c
2PC Participant

State q: on VoteReq from P0, either send Yes to P0 and move to w, or send No to P0 and move to a
State w: on Commit from P0, move to c; on Abort from P0, move to a
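The participant automaton above can be encoded as a transition table; the states and message names follow the diagrams (q, w, a, c), while the dict encoding and helper function are illustrative assumptions.

PARTICIPANT_2PC = {
    ("q", "VoteReq"): [("w", "Yes"), ("a", "No")],   # vote Yes, or unilaterally abort
    ("w", "Commit"):  [("c", None)],
    ("w", "Abort"):   [("a", None)],
}

def step(state, message, choose_yes=True):
    """Return (new_state, outgoing_message) for the given input message."""
    options = PARTICIPANT_2PC[(state, message)]
    return options[0] if choose_yes or len(options) == 1 else options[1]

state, reply = step("q", "VoteReq")     # ('w', 'Yes')
state, reply = step(state, "Commit")    # ('c', None): terminal commit state
print(state)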
Commit Protocol State Transitions
 In a commit protocol, the idea is to inform the other participants
of local progress.
 In fact, a state transition that sends no messages is
uninteresting, unless the participant moves into a terminal
state.
 Therefore, unless a participant moves into a terminal state,
we may assume that it sends messages to the other participants
about its change of state.
 To simplify our analysis, we may assume that the messages
are sent to all other participants. This is not strictly necessary, but
it avoids unnecessary complication.
Concurrency set
 A concurrency set of a state s is the set of
possible states among all participants, if
some participant is in state s.
 In other words, the concurrency set of state s
is the set of all states that can co-exist with
state s.
2PC Concurrency Sets

[Diagram: the 2PC coordinator and participant automata (states q, w, a, c) drawn side by side, used to read off which states can co-exist.]

Concurrency_set(q) = {q, w, a}    Concurrency_set(a) = {q, w, a}
Concurrency_set(w) = {q, w, a, c}    Concurrency_set(c) = {w, c}
Committable states
We say that a state is committable, if the
existence of a participant in this state means
that everyone has voted Yes.
If a state is not committable, we say that it is
non-committable.
In 2PC, c is the only committable state; the wait state w is non-committable.
How can a site terminate when there
is a timeout?
Either (1) one of the operational sites knows the fate
of the transaction, or (2) the operational sites can
decide the fate of the transaction.
Knowing the fate of the transaction means, in
practice, that there is a participant in a terminal
state.
Start by considering a single participant s. The site
must infer the possible states of other participants
from its own state. This can be done using
concurrency sets.
When can’t a single participant
unilaterally abort?
Suppose a participant is in a state, which has a
commit state in its concurrency set. Then, it is
possible that some other participant is in a
commit state.
A participant in a state, which has a commit
state in its concurrency set, should not
unilaterally abort.
When can’t a single participant
unilaterally commit?
Suppose a participant is in a state, which has an
abort state in its concurrency set. Then, some
participant may be in an abort state.
A participant in a state, which has an abort state in
its concurrency set, should not unilaterally commit.
Also, a participant that is not in a committable state
should not commit.
The Fundamental Non-Blocking
Theorem
A protocol is non-blocking, if and only if it
satisfies the following conditions:
(1) There exists no local state such that its
concurrency set contains both an abort and a
commit state, and
(2) there exists no noncommittable state,
whose concurrency set contains a commit
state.
Showing the Fundamental Non-
Blocking Theorem
From our discussion above it follows that
Conditions (1) and (2) are necessary.
We discuss their sufficiency later by showing
how to terminate a commit protocol fulfilling
conditions (1) and (2).
Observations on 2PC
As the participants exchange messages as they
progress, they progress in a synchronised fashion.
In fact, there is always at most one step difference
between the states of any two live participants.
We say that the participants keep a one-step
synchronisation.
It is easy to see by Fundamental Nonblocking
Theorem that 2PC is blocking.
One-step synchronisation and non-
blocking property
If a commit protocol keeps one-step
synchronisation, then the concurrency set of
state s consists of s and the states adjacent to
s.
By applying this observation and the
Fundamental Non-blocking Theorem, we get a
useful Lemma:
Lemma
A protocol that is synchronous within one
state transition is non-blocking, if and only if
(1) it contains no state adjacent to both a
Commit and an Abort state, and
(2) it contains no non-committable state that
is adjacent to a Commit state.
How to improve 2PC to get a non-
blocking protocol
It is easy to see that the state w is the problematic
state – and in two ways:
- it has both Abort and Commit in its concurrency
set, and
- it is a non-committable state, but it has Commit
in its concurrency set.
Solution: add an extra state between w and c
(adding between w and a would not do – why?)
We are primarily interested in the centralised
protocol, but similar decentralised improvement
is possible.
3PC Coordinator
State q: on a commit-request from the application, send VoteReq to P1,…,Pn and move to w
State w: on a timeout or a No vote from one of P1,…,Pn, send Abort to P1,…,Pn and move to a;
on Yes votes from all of P1,…,Pn, send Prepare to P1,…,Pn and move to p
State p: on Ack from all of P1,…,Pn, send Commit to P1,…,Pn and move to c
3PC Participant

State q: on VoteReq from P0, either send Yes to P0 and move to w, or send No to P0 and move to a
State w: on Abort from P0, move to a; on Prepare from P0, send Ack to P0 and move to p
State p: on Commit from P0, move to c
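The 3PC participant, again as a transition table; the extra "prepared" state p is what removes w's direct adjacency to c. The encoding itself is an illustrative assumption.

PARTICIPANT_3PC = {
    ("q", "VoteReq"): [("w", "Yes"), ("a", "No")],
    ("w", "Abort"):   [("a", None)],
    ("w", "Prepare"): [("p", "Ack")],
    ("p", "Commit"):  [("c", None)],
}

def run(messages, choose_yes=True):
    """Feed a sequence of coordinator messages to a participant and return its final state."""
    state = "q"
    for msg in messages:
        options = PARTICIPANT_3PC[(state, msg)]
        state, _reply = options[0] if choose_yes or len(options) == 1 else options[1]
    return state

print(run(["VoteReq", "Prepare", "Commit"]))   # 'c'
print(run(["VoteReq", "Abort"]))               # 'a'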
3PC Concurrency sets (cs)
[Diagram: the 3PC coordinator and participant automata (states q, w, p, a, c) drawn side by side.]

cs(p) = {w, p, c}
cs(w) = {q, a, w, p}
etc.
3PC and failures
If there are no failures, then clearly 3PC is correct.
In the presence of failures, the operational
participants should be able to terminate their
execution.
In the centralised case, a need for termination
protocol implies that the coordinator is no longer
operational.
We discuss a general termination protocol. It makes
the assumption that at least one participant remains
operational and that the participants obey the
Fundamental Non-Blocking Theorem.
Termination
Basic idea: Choose a backup coordinator B – vote or
use some preassigned ids.
Backup Coordinator Decision Rule:
If the B’s state contains commit in its concurrency
set, commit the transaction. Else abort the
transaction.
Reasoning behind the rule: If B’s state contains
commit in the concurrency set, then it is possible
that some site has performed commit – otherwise
not.
Re-executing termination
It is, of course, possible that the backup
coordinator fails.
For this reason, the termination protocol
should be executed in such a way that it can
be re-executed.
In particular, the termination protocol must
not break the one-step synchronisation.
Implementing termination
To keep one-step synchronisation, the termination
protocol should be executed in two steps:
1. The backup coordinator B tells the others to make
a transition to B’s state. Others answer Ok. (This is
not necessary if B is in Commit or Abort state.)
2. B tells the others to commit or abort by the
decision rule.
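A sketch of the backup-coordinator decision rule: commit only if the backup's own state has a commit state in its concurrency set. The concurrency sets below are those of the 3PC participant as read off the diagrams earlier; treat the table and names as assumptions for illustration.

CONCURRENCY_SET = {
    "q": {"q", "w", "a"},
    "w": {"q", "a", "w", "p"},
    "p": {"w", "p", "c"},
    "a": {"q", "w", "a"},
    "c": {"p", "c"},
}

def termination_decision(backup_state):
    """Commit if some site may already have committed; otherwise abort."""
    return "commit" if "c" in CONCURRENCY_SET[backup_state] else "abort"

print(termination_decision("p"))   # commit: a commit state is in p's concurrency set
print(termination_decision("w"))   # abort: no site can have committed yet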
Fundamental Non-Blocking Theorem
Proof - Sufficiency
The basic termination procedure and decision
rule is valid for any protocol that fulfills the
conditions given in the Fundamental Non-
Blocking Theorem.
The existence of a termination protocol
completes the proof.
Voting protocols
 Principles:
 Data replicated at several sites to increase reliability
 Each replica assigned a number of votes
 To access a replica, a process must collect a majority of votes
 Vote mechanism:
 (1) Static voting:
Each replica has number of votes (in stable storage)
A process can access a replica for a read or write operation if it can
collect a certain number of votes (read or write quorum)
 (2) Dynamic voting
Number of votes or the set of sites that form a quorum change with
the state of system (due to site and communication failures)
(2.1) Majority based approach:
 The set of sites that can form a majority to allow access to replicated data
changes with the changing state of the system
(2.2) Dynamic vote reassignment:
 Number of votes assigned to a site changes dynamically
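A minimal sketch of the static-voting scheme described above: every replica holds a fixed number of votes, and an operation proceeds only if it gathers the corresponding quorum. The site names, vote counts and quorum values are illustrative; the usual constraints (read + write quorum > total votes, 2 × write quorum > total votes) are assumed.

votes = {"site1": 2, "site2": 1, "site3": 1}   # votes per replica (kept in stable storage)
TOTAL = sum(votes.values())
READ_QUORUM, WRITE_QUORUM = 2, 3               # r + w > TOTAL and 2*w > TOTAL

def can_access(reachable_sites, quorum):
    """An operation may proceed only if the reachable replicas hold enough votes."""
    return sum(votes[s] for s in reachable_sites) >= quorum

print(can_access({"site1"}, READ_QUORUM))              # True: 2 votes collected
print(can_access({"site2", "site3"}, WRITE_QUORUM))    # False: only 2 of the 3 needed votes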

Failure resilient processes
 Resilient process: continues execution in the presence of failures
with minimum disruption to the service provided (masks failures)
 Approaches for implementing resilient processes:
 Backup processes and
 Replicated execution
 (1) Backup processes
 Each process made of a primary process and one or more backup
processes
 The primary process executes, while the backup processes are inactive
 If primary process fails, a backup process takes over
 Primary process establishes checkpoints, such that backup process can
restart
 (2) Replicated execution
 Several processes execute same program concurrently
 Majority consensus (voting) of their results
 Increases both the reliability and availability of the process
Recovery (fault tolerant) block concept

 Provide fault-tolerance within an individual sequential process in


which assignments to stored variables are the only means of
making recognizable progress

 The recovery block is made of:


 A primary block (the conventional program),

 Zero or more alternates (providing the same function as the primary block,
but using a different algorithm), and

 An acceptance test (performed on exit from a primary or alternate block to


validate its actions).
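A sketch of the recovery block structure just described: run the primary, check the acceptance test, and fall back to the alternates on failure. The function names and the state-restoration convention are assumptions made for illustration, not a real library API.

def recovery_block(acceptance_test, primary, *alternates, state=None):
    checkpoint = dict(state) if state is not None else None   # save state for rollback
    for block in (primary, *alternates):
        try:
            result = block(state)
            if acceptance_test(result):
                return result                                  # accepted: exit the block
        except Exception:
            pass                                               # a failed block is treated like a failed test
        if checkpoint is not None:
            state.clear(); state.update(checkpoint)            # restore state before trying the next alternate
    raise RuntimeError("recovery block failed: no alternate passed the acceptance test")

# Example: sorting with a (deliberately wrong) primary and a correct alternate.
acceptance = lambda xs: all(a <= b for a, b in zip(xs, xs[1:]))
primary = lambda st: [3, 1, 2]                # conventional program, here deliberately faulty
alternate = lambda st: sorted([3, 1, 2])      # same function, different algorithm
print(recovery_block(acceptance, primary, alternate, state={}))   # [1, 2, 3]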

Recovery (fault tolerant) Block concept

[Diagram: a recovery block A consists of an acceptance test AT, a primary block AP (<program text>) and an alternate block AQ (<program text>); on exit from the primary or an alternate block, control passes to the acceptance test.]
N-version programming

[Diagram: n modules, Module '0' through Module 'n-1', execute the same function; their outputs feed a voter.]
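A tiny majority voter in the spirit of the diagram above; the stand-in module functions and the vote helper are illustrative, not part of the original material.

from collections import Counter

def vote(results):
    """Return the majority result among the N version outputs, if one exists."""
    value, count = Counter(results).most_common(1)[0]
    if count > len(results) // 2:
        return value
    raise RuntimeError("no majority among the N versions")

modules = [lambda x: x * x, lambda x: x ** 2, lambda x: x * x + 1]   # one faulty version
print(vote([m(4) for m in modules]))   # 16: the majority result wins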
Thank You
