Chap. 6 Consistency & Replication: Distributed Systems

BIT3263 | Distributed Systems
Prepared by Noris Bt. Ismail
FACULTY OF INFORMATION & COMMUNICATION TECHNOLOGY

Objectives
• Reasons for replication
• Relationship between replication and scalability
• Consistency of replicated data
• Managing replicas (placement of replicas and content distribution)
• Various ways that consistency can be achieved

ALL RIGHTS RESERVED


No part of this document may be reproduced without written approval from Limkokwing University of Creative Technology Worldwide

Reasons for Replication

• Data are replicated to increase the reliability of a system.
• Replication for performance:
  ▪ Scaling in numbers – increasing the number of processes that need to access a single server.
  ▪ Scaling in geographical area – placing a copy of the data in the proximity of the processes using it.
  ▪ Caveat / cautions: the gain in performance must be weighed against the cost of the increased bandwidth needed to maintain the replicas.
• Fault tolerance – correctness concerns about the freshness of the data supplied to the client and about the effects of clients' operations on the data, e.g. air traffic control (correct data is needed on a short timescale).

Figure 6.1
A basic architectural model for the management of replicated data

[Figure: clients (C) exchange requests and replies with front ends (FE), which communicate with the service's replica managers (RM).]

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007
Figure 6.2
Services provided for process groups

[Figure: a process group supported by multicast communication and group membership management (Join, Leave, Fail). Group address expansion lets a process outside the group send to the group without knowing the group's membership.]

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007

Role of Group Membership Service

• Providing an interface for group membership changes – provides operations to create and destroy process groups and to add or withdraw a process.
• Implementing a failure detector – monitors group members for crashes and for unreachability caused by communication failure.
• Notifying members of group membership changes – notifies members when a process is added or withdrawn.
• Performing group address expansion – a sender supplies the group identifier rather than a list of the processes in the group. (A minimal interface sketch follows below.)
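The following is a minimal sketch, not from the slides, of such a membership interface; all names (ProcessGroup, join, leave, fail, expand) are illustrative, and notification is modelled as local callbacks rather than real messages.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessGroup:
    group_id: str
    members: set = field(default_factory=set)      # current membership view
    observers: list = field(default_factory=list)  # callbacks notified on view changes

    def join(self, pid: str) -> None:
        self.members.add(pid)
        self._notify("join", pid)

    def leave(self, pid: str) -> None:
        self.members.discard(pid)
        self._notify("leave", pid)

    def fail(self, pid: str) -> None:
        # invoked by a failure detector that suspects pid has crashed
        self.members.discard(pid)
        self._notify("fail", pid)

    def expand(self) -> set:
        # group address expansion: resolve the group id to its member list
        return set(self.members)

    def _notify(self, event: str, pid: str) -> None:
        for cb in self.observers:
            cb(event, pid, set(self.members))
```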


Passive (Primary-Backup) Replication

• There is a single primary replica manager and one or more secondary replica managers – the 'backups' or 'slaves'.
• The sequence of events is as follows (a sketch appears below):
1. Request – The front end issues the request, containing a unique identifier, to the primary replica manager.
2. Coordination – The primary takes the request and checks the unique identifier; if the request has already been executed, it simply re-sends the response.
3. Execution – The primary executes the request and stores the response.
4. Agreement – If the request is an update, the primary sends the updated state, the response and the unique identifier to all the backups. The backups send acknowledgements.
5. Response – The primary responds to the front end, which passes the response back to the client.
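A minimal sketch of this flow, not from the slides: the "network" is direct method calls, failures and failover are ignored, and all names are illustrative.

```python
class Backup:
    def __init__(self):
        self.state = {}

    def apply(self, update_id, key, value):
        self.state[key] = value                    # agreement: install the primary's state
        return "ack"

class Primary:
    def __init__(self, backups):
        self.backups = backups
        self.state = {}
        self.executed = {}                         # update_id -> stored response

    def handle(self, update_id, key, value):
        if update_id in self.executed:             # coordination: duplicate request?
            return self.executed[update_id]        # re-send the stored response
        self.state[key] = value                    # execution
        acks = [b.apply(update_id, key, value) for b in self.backups]
        assert all(a == "ack" for a in acks)       # agreement: backups acknowledged
        response = f"OK {key}={value}"
        self.executed[update_id] = response
        return response                            # response, relayed by the front end

primary = Primary([Backup(), Backup()])
print(primary.handle("req-1", "x", 42))            # OK x=42
print(primary.handle("req-1", "x", 42))            # duplicate: same stored response
```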

Figure 6.4
The passive (primary-backup) model for fault tolerance

[Figure: clients (C) connect through front ends (FE) to a single primary replica manager (RM), behind which sit two backup RMs.]

At any one time there is only a single primary replica manager and one or more secondary replica managers. The primary replica manager executes the operations and sends copies of the updated data to the backups.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007
Figure 6.5
Active replication

[Figure: two clients (C), each with a front end (FE), multicast to a group of three replica managers (RM).]

Front ends multicast their requests to the group of replica managers; all the replica managers process each request independently but identically, and reply.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007

Active Replication
• The sequence of events is as follows (a sketch appears below):
1. Request – The front end issues the request, containing a unique identifier, and multicasts it to the group of replica managers. It does not issue the next request until it receives a response.
2. Coordination – The group communication system delivers the request to every correct replica manager in the same order.
3. Execution – Every replica manager executes the request; correct replica managers all process it identically. The response contains the client's unique request identifier.
4. Agreement – No agreement phase is needed, because of the multicast delivery semantics.
5. Response – Each replica manager sends its response to the front end; the front end passes the first response to arrive back to the client and discards the rest.
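A minimal sketch of this flow, not from the slides: the totally ordered multicast that a real system needs is replaced by a simple loop, and all names are illustrative.

```python
class ReplicaManager:
    def __init__(self):
        self.state = {}

    def execute(self, request_id, key, value):
        self.state[key] = value                    # every correct RM executes identically
        return request_id, f"OK {key}={value}"     # reply carries the request identifier

class FrontEnd:
    def __init__(self, group):
        self.group = group

    def invoke(self, request_id, key, value):
        # "multicast" the request to every replica manager in the group
        replies = [rm.execute(request_id, key, value) for rm in self.group]
        # pass back the first matching reply; the rest are discarded
        for rid, reply in replies:
            if rid == request_id:
                return reply

fe = FrontEnd([ReplicaManager(), ReplicaManager(), ReplicaManager()])
print(fe.invoke("req-7", "balance", 100))          # OK balance=100
```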

Figure 6.6
Query and update operations in a gossip service

[Figure: front ends (FE) send Query(q, prev) and receive (Val, new); they send Update(u, prev) and receive an Update id. The replica managers (RM) exchange gossip messages among themselves.]

The service provides two basic types of operation: queries and updates. Front ends send queries and updates to any replica manager they choose – any that is available and can provide a reasonable response time. It is possible for clients to obtain stale data from the replica managers.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007

Gossip Architecture
• The sequence of events is as follows (a sketch of the agreement step appears below):
1. Request – The front end normally issues the request to a single replica manager at a time; if that replica manager fails or is unreachable, it may try another.
2. Update response – If the request is an update, the replica manager replies as soon as it has received the update.
3. Coordination – A replica manager that receives a request does not process it until it can apply the request according to the required ordering constraints.
4. Execution – The replica manager executes the request.
5. Query response – If the request is a query, the replica manager replies at this point.
6. Agreement – The replica managers update one another by exchanging gossip messages, which contain the most recent updates they have received.
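A toy sketch of step 6 only, not the full gossip protocol: the ordering constraints of step 3 are ignored, updates are assumed commutative, and all names are illustrative.

```python
class GossipRM:
    def __init__(self):
        self.log = {}                    # update_id -> (key, value): updates accepted so far
        self.state = {}

    def update(self, update_id, key, value):
        # update response: accept, apply and acknowledge immediately
        if update_id not in self.log:
            self.log[update_id] = (key, value)
            self.state[key] = value
        return update_id

    def gossip_to(self, peer):
        # agreement: send the peer any updates it has not yet seen
        for uid, (key, value) in self.log.items():
            if uid not in peer.log:
                peer.update(uid, key, value)

a, b = GossipRM(), GossipRM()
a.update("u1", "x", 1)
b.update("u2", "y", 2)
a.gossip_to(b); b.gossip_to(a)
assert a.state == b.state == {"x": 1, "y": 2}      # replicas converge
```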

Figure 6.7
Front ends propagate their timestamps whenever clients communicate directly

[Figure: front ends (FE) exchange vector timestamps with the gossiping replica managers (RM), and with each other when their clients communicate directly.]

Each front end keeps a vector timestamp that reflects the version of the latest data values accessed by that front end. When a replica manager returns a value as the result of a query operation, it supplies a new vector timestamp, since the replicas may have been updated since the last operation.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007
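A brief sketch of this bookkeeping: the merge rule (element-wise maximum) is the standard one for vector timestamps, while the class layout and names are illustrative.

```python
def merge(vt_a, vt_b):
    # element-wise maximum: the result reflects everything either side has seen
    return [max(x, y) for x, y in zip(vt_a, vt_b)]

class FrontEndTimestamp:
    def __init__(self, n_replicas):
        self.prev = [0] * n_replicas       # version of the latest data values accessed

    def on_reply(self, new_ts):
        # fold the timestamp returned by a query (or by another front end) into ours
        self.prev = merge(self.prev, new_ts)

fe = FrontEndTimestamp(3)
fe.on_reply([2, 0, 1])
fe.on_reply([1, 3, 0])
print(fe.prev)                             # [2, 3, 1]
```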
Figure 6.8
A gossip replica manager, showing its main state components

[Figure: a replica manager holds a value, a value timestamp, a replica (update) log, a replica timestamp, an executed operation table and a timestamp table. It exchanges gossip messages with other replica managers and receives operations from front ends.]

Reasons for keeping the update log:
- An update may be unstable (held back and not yet processed).
- The replica manager needs confirmation that an update has propagated to the other replica managers.

The executed operation table prevents an update from being applied twice when it arrives again from a front end or from another replica manager; the replica manager checks the update identifiers.

Prev – the latest values accessed by the front end.
Update – the unique update id from the replica manager.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007
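As a memory aid, the same state components written out as a data structure; the field names follow the figure, while the types and defaults are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class GossipReplicaState:
    value: dict = field(default_factory=dict)        # the application state itself
    value_ts: list = field(default_factory=list)     # updates reflected in value
    update_log: list = field(default_factory=list)   # accepted (possibly unstable) updates
    replica_ts: list = field(default_factory=list)   # updates accepted into the log
    executed_ops: set = field(default_factory=set)   # update ids already applied
    ts_table: dict = field(default_factory=dict)     # last replica_ts received from each peer RM
```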
Figure 6.9
Committed and tentative updates in Bayou – data replication for high availability with weaker guarantees

Committed: c0 c1 c2 … cN    Tentative: t0 t1 t2 … ti ti+1 …

Tentative update ti becomes the next committed update and is inserted after the last committed update cN.
- Updates are marked as tentative when they are first applied to the database; while updates are tentative, the system may undo and reapply them.
- Bayou arranges the tentative updates into a canonical order and marks them as committed. (A sketch of this appears below.)
- Once committed, they remain applied in their allotted order.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007
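A toy sketch of the commit step only, assuming updates are plain functions over a dictionary; Bayou's dependency checks and merge procedures are omitted, and all names are illustrative.

```python
class TentativeStore:
    def __init__(self):
        self.snapshot = {}       # state reflecting committed updates only
        self.committed = []      # updates in their final, allotted order
        self.tentative = []      # updates that may still be undone and reapplied

    def apply_tentative(self, update):
        self.tentative.append(update)

    def commit_next(self, update):
        # the tentative update becomes the next committed update, after cN
        self.tentative.remove(update)
        self.committed.append(update)
        update(self.snapshot)    # permanently applied in its allotted order

    def current_state(self):
        # "undo/reapply": recompute by replaying tentative updates on the snapshot
        state = dict(self.snapshot)
        for update in self.tentative:
            update(state)
        return state
```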
Figure 6.10
Transactions on replicated data

[Figure: transaction T (getBalance(A)) and transaction U (deposit(B,3)) run at two clients' front ends; copies of account A are held by three replica managers and copies of B by three others.]

Different replication schemes have different rules on how many replica managers are required to carry out an operation. E.g. a read request can be performed by a single replica manager, whereas a write request must be performed by all the replica managers in the group.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007
Figure 6.11
Available copies

[Figure: copies of A are held by replica managers X and Y; copies of B are held by M, N and P. Transaction T performs getBalance(A) and deposit(B,3); transaction U performs getBalance(B) and deposit(A,3).]

The getBalance operation of transaction T is performed by X, whereas its deposit operation is performed by M, N and P.
Concurrency control at each replica manager affects the operations performed locally. E.g. at X, transaction T has read A, and therefore transaction U is not allowed to update A with its deposit operation until transaction T has completed.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007

Quorum-Based Protocols
• Basic idea – require clients to request and acquire the permission of multiple servers before either reading or writing a replicated data item.

• Gifford's scheme chooses a read quorum of NR servers and a write quorum of NW servers, subject to the following constraints (a checker sketch appears below):

1. NR + NW > N        (N = number of replicas)
2. NW > N/2
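A small sketch that checks Gifford's constraints and verifies, by brute force on a small N, the property they guarantee: every read quorum intersects every write quorum. Function names are illustrative.

```python
from itertools import combinations

def valid_quorums(n, n_read, n_write):
    # constraint 1 prevents read-write conflicts (read and write quorums intersect);
    # constraint 2 prevents write-write conflicts (no two disjoint write quorums)
    return n_read + n_write > n and n_write > n / 2

def read_write_quorums_intersect(n, n_read, n_write):
    servers = range(n)
    return all(set(r) & set(w)
               for r in combinations(servers, n_read)
               for w in combinations(servers, n_write))

assert valid_quorums(12, 3, 10)               # the "correct choice" example below
assert read_write_quorums_intersect(4, 2, 3)  # exhaustive check on a small system
assert valid_quorums(12, 1, 12)               # ROWA: read one, write all
assert not valid_quorums(12, 7, 6)            # NW <= N/2: write-write conflicts possible
```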


A correct choice of read and write set

1. NR + NW > N
2. NW > N/2
(N = number of replicas)

The most recent write quorum consists of 10 servers (C–L). Any subsequent read quorum of 3 servers must include at least one server from this set, so every read sees the latest write.

Figure 6-22. A correct choice of read and write set.


A choice that may lead to write-write conflicts

1. NR + NW > N
2. NW > N/2
(N = number of replicas)

A write-write conflict may occur when NW ≤ N/2: two clients can each assemble a write quorum with no server in common, so both updates are accepted without the conflict being detected.

Figure 6-22. A choice that may lead to write-write conflicts.


A correct choice, known as ROWA (read one, write all)

1. NR + NW > N
2. NW > N/2
(N = number of replicas)

With NR = 1 and NW = N, it is possible to read a replicated file by finding any one copy and using it, whereas a write must update ALL copies.

Figure 6-22. A correct choice, known as ROWA (read one, write all).

Figure 6.12
Network partition

[Figure: two clients' front ends issue withdraw(B,4) and deposit(B,3) to replica managers holding copies of B on opposite sides of a network partition.]

A network partition separates a group of replica managers into two or more subgroups. Only members of the same subgroup can communicate with each other; to the rest, a server appears crashed, down or unreachable. E.g. the replica manager receiving the deposit request cannot send it to the replica manager receiving the withdraw request.
Replication schemes are designed on the assumption that partitions will eventually be repaired, and must ensure that no inconsistencies occur when a partition is repaired.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007
Figure 6.13
Two network partitions

[Figure: replica managers X, V, Y and Z; a network partition separates X and V from Y and Z while transaction T runs at V.]

"Network partition" refers to the barrier that divides the replica managers into several parts.

E.g. transaction T starts by performing its read at V, at a time when V is still in contact with X, Y and Z.
- Now suppose the network partition shown above occurs, placing X and V in one part and Y and Z in another.
- When transaction T attempts to write, V will notice that it cannot contact Y and Z.
- When a replica manager cannot contact managers it could previously contact, it keeps trying until it can create a virtual partition (until one or both of them reply).

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007
Figure 6.14
Virtual partition

[Figure: a virtual partition comprising replica managers X, V and Y forms despite the network partition separating X and V from Y and Z.]

"Virtual partition" refers to a part formed from the replica managers themselves. A virtual partition has a creation time, a set of potential members and a set of actual members.

E.g. V keeps trying to contact Y and Z until one of them replies, e.g. when Y becomes accessible. The group of replica managers V, X and Y then comprises a virtual partition, because they are sufficient to form read and write quora.

When a new virtual partition is created during a transaction that has already performed an operation at one of the replica managers (e.g. transaction T), the transaction must be aborted and the replicas within the new partition must be brought up to date.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007
Figure 6.15
Two overlapping virtual partitions

[Figure: virtual partition V1 contains Y, X and V; virtual partition V2 contains X, V and Z; the two partitions overlap at X and V.]

Case – several replica managers may attempt to create a new virtual partition simultaneously: Y and Z each keep attempting to contact the others. The partition is only partially repaired, so that Y cannot communicate with Z, but the two groups {V, X, Y} and {V, X, Z} can each be formed.

The problem with overlapping virtual partitions: a read operation of a transaction in one virtual partition might be applied at replica manager Y, where its read lock will not conflict with the write locks set by the write operations of a transaction in the other virtual partition, so conflicting transactions can go undetected.

Instructor's Guide for Coulouris, Dollimore and Kindberg, Distributed Systems: Concepts and Design, Edn. 4, © Pearson Education 2007

References

These slides are taken from Tanenbaum & van Steen, Distributed Systems: Principles and Paradigms, 2/e, © 2007 Prentice-Hall, Inc. All rights reserved. ISBN 0-13-239227-5.


KEY POINTS
• Sub Point #1 – Reasons for replication
• Sub Point #2 – Types of replication
• Sub Point #3 – Consistency
• Sub Point #4 – Network partition vs. virtual partition


Questions: Replication
Three computers together provide a replicated service. The manufacturers claim that each computer has a mean time between failures of five days; a failure typically takes four hours to fix. What is the availability of the replicated service?
Answer:

Formulas to use (n = number of replicas):

p = Probability(a single server is unreachable/failed)
  = hours of fixing / (hours between failures + hours of fixing)

Availability of the replicated service = 1 - p^n


Cont..

• Availability = 1 - Probability(all n managers failed or unreachable) = 1 - p^n

p = Probability(a single server is unreachable/failed)
  = hours of fixing / (hours between failures + hours of fixing)

The probability that an individual computer is down is p = 4/(5 × 24 + 4) ≈ 0.03.

Assuming failure-independence of the machines, the availability is therefore (a check in code appears below):

= 1 - 0.03^3
= 0.999973
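The same arithmetic as a short, runnable check; a minimal sketch.

```python
def availability(mtbf_hours: float, repair_hours: float, n: int) -> float:
    p = repair_hours / (mtbf_hours + repair_hours)   # P(one server is down)
    return 1 - p ** n                                # P(not all n are down)

# With the exact p = 4/124 this prints ~0.9999664; the slide rounds p to 0.03
# before cubing, which yields 0.999973.
print(availability(5 * 24, 4, 3))
```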


End of Lecture
