Chapter I Introduction

Chapter I: Introduction to
Distributed Systems
Sistemas Distribuidos (Distributed Systems)
Systems Engineering Program
Universidad Politécnica Salesiana
Based on Distributed Systems of UPM
Original Author: Sergio Arévalo
Rodrigo Tufiño
May 2016
Contents
1. Motivation
2. Distributed abstractions
3. Examples of distributed applications
4. Model
Bibliography
Rachid Guerraoui, Luís Rodrigues. Introduction to Reliable Distributed Programming. Springer-Verlag, 2006. Chapters 1 & 2.
Motivation
• Distributed computing deals with algorithms for a set of processes that cooperate.
• In addition, some of the processes of a distributed algorithm might stop by crashing while others stay alive and keep operating.
• This differentiates a distributed system from a concurrent system.
Motivation (cont.)
• The challenge is for the processes that are still alive.
• They must continue cooperating among themselves in a consistent way in spite of the failure of the other processes.
• The process cooperation must tolerate failures.
• Communication asynchrony and communication link failures make this cooperation very difficult.
Process cooperation
Motivation
• The most common form of process cooperation is client-server.
[Figure: a Client and a Server]
• Tolerating failures would mean that:
  • If the server fails, the client should send the request to another server.
  • If some clients fail, the server should continue offering services to the other clients.
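The client side of this idea can be sketched as follows. This is a minimal illustration, not part of the original slides: the servers are modelled as plain Python callables, and the names `ServerDown`, `request_with_failover`, `crashed_server` and `healthy_server` are hypothetical.

```python
from typing import Callable, Sequence


class ServerDown(Exception):
    """Raised by a (hypothetical) server stub that cannot be reached."""


def request_with_failover(servers: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each server in order and return the first successful reply."""
    for server in servers:
        try:
            return server(request)
        except ServerDown:
            continue  # this server failed: send the request to the next one
    raise ServerDown("no server could answer the request")


def crashed_server(request: str) -> str:
    # Assumption: a crashed server is simulated by raising ServerDown.
    raise ServerDown("server 1 crashed")


def healthy_server(request: str) -> str:
    return f"reply to {request}"


reply = request_with_failover([crashed_server, healthy_server], "get file A")
print(reply)
```

In a real system the client cannot always tell a crashed server from a slow one, which is why the timing assumptions discussed later in the chapter matter.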
Multiparty
Motivation
• Another form of cooperation: multiparty interaction or peer-to-peer interaction.
[Figure: peers P1, P2, P3 and P4, with requests such as "get file A" and "get part 3 of file A"]
• Tolerating failures in this kind of interaction is more complex.
Uncertainties
Motivation
• Distributed computing means that processes might execute on different physical nodes.
• This implies two more uncertainties:
  • Processes might not share the same clock.
  • Processes might not share the same memory.
Clock
Motivation
If processes don't share a clock, they cannot easily time-order the events of the system.
W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, "Don't Settle for Eventual Consistency," Commun. ACM, vol. 57, no. 5, pp. 61–68, 2014.
Sharing memory
Motivation
• Without shared memory, there is no instantaneous global state.
[Figure: timeline of a Client, Server 1, Server 2 and Server 3, showing two get global state() calls, the states s1, s2, s3 and s1', s2', s3', and a message m1]
• State (s1, s2, s3) is not possible (it would have to be captured instantaneously).
• State (s1', s2', s3') is possible, but there are problems with m1.
Distributed abstractions
• To understand distributed systems we need to capture the properties/abstractions that help distinguish the fundamental from the accessory.
• We will abstract the underlying physical system: basic abstractions.
• Then we will show some recurring interaction patterns in distributed applications: application abstractions.
Basic abstractions
• Processes: abstract the active entities that perform computations (a computer, a processor, a thread of execution).
• Links: abstract the physical and logical network that supports communication among processes.
Application abstractions
• Reliable and efficient communication: because there are failures and periods of asynchrony, abstractions that provide reliable and efficient links are needed.
• Logical clocks: because there is no global clock, an abstraction to time-order the events of the distributed system is needed.
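As an illustration of the logical clock abstraction, here is a minimal sketch of a Lamport-style logical clock. The slides do not say which kind of logical clock is meant, so this choice is an assumption; the class and method names are hypothetical.

```python
class LamportClock:
    """Logical clock: a counter that respects the order of causally related events."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """A local event advances the clock by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, timestamp: int) -> int:
        """On receipt, jump past both the local clock and the message timestamp."""
        self.time = max(self.time, timestamp) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                  # local event at p
ts = p.send()             # p sends a message stamped with ts
q_time = q.receive(ts)    # q receives it; its clock moves past ts
print(ts, q_time)
```

The receive event always gets a larger timestamp than the matching send event, even though p and q share no physical clock.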
Application abstractions (cont.)
• Distributed global states: because there is no instantaneous global state, an abstraction to obtain a consistent distributed global state is needed.
• Multicast primitives: because there is no reliable and synchronous hardware broadcast mechanism, a multicast abstraction, offering different qualities of service, is needed for communication among groups of processes.
• Shared memory: because there is no shared physical memory among processes, an abstraction that allows processes to share memory is needed.
• Consensus: some groups of application processes need to reach consensus on some value to advance in their computation, so an abstraction to reach this consensus is needed.
• Failure detectors: system asynchrony creates uncertainty about which processes have failed, so an abstraction to detect failures is needed.
• Atomic commitment: a group of processes needs to agree to execute some step only if all of them agree to do it; otherwise the step is not done. An abstraction to perform this commitment is needed.
• Leader election: a group of processes needs to elect a leader among themselves when a previous leader fails, so an abstraction to elect a leader is needed.
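A leader election rule can be sketched in a few lines. This is only an illustration under strong assumptions: every process already knows the full membership and the same set of suspected processes, and the rule "lowest identifier wins" is an arbitrary choice, not something the slides prescribe. The function name `elect_leader` is hypothetical.

```python
from typing import Optional, Set


def elect_leader(processes: Set[int], suspected: Set[int]) -> Optional[int]:
    """Return the lowest-numbered process that is not suspected of having failed."""
    alive = processes - suspected
    return min(alive) if alive else None


members = {1, 2, 3}
first_leader = elect_leader(members, suspected=set())
next_leader = elect_leader(members, suspected={1})   # process 1 has failed
print(first_leader, next_leader)
```

Because the rule is deterministic, all processes that see the same membership and the same suspicions elect the same leader.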
Examples of distributed
applications
• Information dissemination
• Process control applications
• Cooperative work
• Distributed databases
• Highly available services
Information dissemination
Examples of distributed applications
• Processes may produce information: publishers.
• Processes may consume information: subscribers.
• Also called the publish-subscribe paradigm.
• If several processes are interested in the same notification, a multicast primitive with a reliable delivery property is needed.
• An example is an RSS news channel.
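The publish-subscribe pattern can be sketched with a small in-process broker. This is a simplified illustration: it runs in a single process, nothing fails, and delivery is a direct function call, so it does not show the reliable multicast that a distributed version would need. The names `Broker`, `subscribe` and `publish` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Subscriber = Callable[[str], None]


class Broker:
    """Keeps, for each topic, the list of subscribers to notify."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, topic: str, subscriber: Subscriber) -> None:
        self._subscribers[topic].append(subscriber)

    def publish(self, topic: str, notification: str) -> int:
        """Deliver the notification to every subscriber of the topic."""
        subscribers = self._subscribers.get(topic, [])
        for deliver in subscribers:
            deliver(notification)
        return len(subscribers)


broker = Broker()
inbox_a: List[str] = []
inbox_b: List[str] = []
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)
delivered_to = broker.publish("news", "new article available")
print(delivered_to, inbox_a, inbox_b)
```

Publishers and subscribers never reference each other directly; they only share the topic name.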
Process control applications
• Software processes must control the execution of a physical activity.
• They might control the dynamic location of aircraft, the temperature of nuclear installations, the automation of car production, ...
• Some of the processes typically have a sensor attached. To tolerate process failures, a group of processes may reach consensus on their input sensor values in order to offer a reliable output value.
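The sketch below shows only the last step of that idea: once the processes have agreed on the same set of sensor readings, each one applies the same deterministic function to obtain the output. It is not a consensus protocol, and the use of the median is an assumption chosen here because it tolerates a minority of wildly wrong readings. The function name and the readings are hypothetical.

```python
from statistics import median
from typing import Dict


def agreed_output(readings: Dict[str, float]) -> float:
    """Combine the agreed sensor readings into one output value."""
    return median(readings.values())


# Hypothetical temperature readings; process p3 has a faulty sensor.
readings = {"p1": 20.1, "p2": 19.9, "p3": 85.0}
output = agreed_output(readings)
print(output)
```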
Cooperative work
• Internet users may cooperate in building common software or a shared document, or in setting up a distributed dialogue.
• They can use a shared space abstraction with read and write operations on it.
• This abstraction can be a distributed shared memory or a distributed file service.
• To maintain a consistent view of the shared space, processes must agree on the order of operations.
Distributed database
• In distributed systems, several transaction managers might cooperate to service each transaction.
• When a transaction ends, a distributed atomic commitment algorithm must be executed in order to decide whether the transaction must commit or abort.
• A transaction manager might decide to abort the transaction if it detects a violation of database integrity, a deadlock, a disk error, etc.
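The decision rule at the heart of atomic commitment can be sketched as follows. The slides do not name a specific protocol, so this is only the rule a coordinator might apply after collecting votes, in the style of two-phase commit; treating a missing vote as a "no" is an assumption, and the names are hypothetical.

```python
from typing import Dict, Optional


def decide(votes: Dict[str, Optional[bool]]) -> str:
    """Commit only if every transaction manager voted yes.

    votes maps a manager id to True (yes), False (no) or None (no vote received).
    """
    if votes and all(vote is True for vote in votes.values()):
        return "commit"
    return "abort"


all_yes = decide({"tm1": True, "tm2": True, "tm3": True})
one_no = decide({"tm1": True, "tm2": False, "tm3": True})       # e.g. integrity violation
one_silent = decide({"tm1": True, "tm2": None, "tm3": True})    # e.g. tm2 crashed
print(all_yes, one_no, one_silent)
```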
Highly available services
• High availability is achieved using the state-machine replication approach.
• Several processes (replicas) execute the same code on different nodes (with independent probabilities of failure).
• They receive the same inputs (messages) in the same order through a totally ordered multicast.
• All replicas go through the same states if they run the same deterministic code.
• If one replica fails, the service is not interrupted, because the others continue offering the replicated service.
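A small simulation shows why the order of inputs matters. In this sketch the totally ordered multicast is replaced by a shared Python list that every replica reads in the same order, which is an assumption made to keep the example short; `CounterReplica` and its commands are hypothetical.

```python
from typing import List, Tuple

Command = Tuple[str, int]


class CounterReplica:
    """Deterministic state machine: the state depends only on the command sequence."""

    def __init__(self) -> None:
        self.value = 0

    def apply(self, command: Command) -> None:
        op, amount = command
        if op == "add":
            self.value += amount
        elif op == "mul":
            self.value *= amount
        else:
            raise ValueError(f"unknown command {op!r}")


log: List[Command] = [("add", 2), ("mul", 5), ("add", 1)]

# Three replicas apply the same commands in the same order.
replicas = [CounterReplica() for _ in range(3)]
for command in log:
    for replica in replicas:
        replica.apply(command)
states = [replica.value for replica in replicas]

# A replica that sees the same commands in a different order diverges.
reordered = CounterReplica()
for command in reversed(log):
    reordered.apply(command)

print(states, reordered.value)
```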
Model
• Distributed computation
• Process
  • Failure modes
• Communication links
• Timing assumptions
Model
Distributed Computation
• Processes are the units of computation.
• The system can be static or dynamic with respect to the set of processes.
• Processes might know the identifiers of all the processes in the system (known membership) or not (unknown membership).
• Unless explicitly stated otherwise, it is assumed that the set of processes is static and the membership is known.
Model
• No assumption is made on the mapping of processes to actual processors, operating-system processes or threads.
• Processes communicate by exchanging messages, and messages are uniquely identified by the pair (proc_id, sec_num), i.e. the sender's process identifier and a sequence number.
• Messages are exchanged through communication links.
• A distributed algorithm is a collection of distributed automata, one per process.
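One way to obtain such identifiers is for each sender to keep a local counter. The sketch below assumes that process identifiers are already unique; the `Message` and `Sender` classes are hypothetical, and only the field names `proc_id` and `sec_num` come from the slide.

```python
from dataclasses import dataclass
from itertools import count
from typing import Tuple


@dataclass(frozen=True)
class Message:
    proc_id: int   # identifier of the sending process
    sec_num: int   # sequence number local to that sender
    payload: str

    @property
    def msg_id(self) -> Tuple[int, int]:
        return (self.proc_id, self.sec_num)


class Sender:
    """Stamps every outgoing message with the next local sequence number."""

    def __init__(self, proc_id: int) -> None:
        self.proc_id = proc_id
        self._next = count(1)

    def new_message(self, payload: str) -> Message:
        return Message(self.proc_id, next(self._next), payload)


p1, p2 = Sender(1), Sender(2)
a = p1.new_message("hello")
b = p1.new_message("hello")   # same payload, different identifier
c = p2.new_message("hello")
print(a.msg_id, b.msg_id, c.msg_id)
```

No coordination between senders is needed: the process identifier separates senders and the counter separates messages of the same sender.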
Model
• A process step consists of receiving (delivering) a message (global event), executing a local computation (local event) and sending a message (global event).
• Only one process step is executed in the distributed system at any given time (as if ordered by a virtual global scheduler).
• Some of the events of a step can be "nil" (nothing is done).
• Unless specified otherwise, we will consider deterministic algorithms.
Model
Process
• Unless it fails, a process is supposed to execute the algorithm assigned to it.
• The unit of failure is the process (atomic component).
• When it fails, all its components fail as well, at the same time.
• Process abstractions differ according to the nature of the failures that are considered.
Model - Process
Failure modes
CRASHES
OMISSIONS
CRASHES & RECOVERY
ARBITRARY
Model - Process
Arbitrary failure mode
• It happens when a process deviates arbitrarily from the algorithm assigned to it.
• It is the most general failure mode.
• A process can produce any output at any time.
• These failures are also called Byzantine failures or malicious failures.
• They are the most expensive to tolerate.
Model - Process
Omission failure mode
• It happens when a process does not send (or receive) a message that it is supposed to send (or receive) according to the algorithm.
• In general, these faults are due to buffer overflows or network congestion.
• With omissions, a process deviates from its assigned algorithm because messages are lost.
Model - Process
Crash failure mode
• It happens when a process stops executing steps after some time t.
• This is called a crash failure, and the corresponding process abstraction is called crash-stop.
• Algorithms typically assume at most F failures. This means that during the execution the number of processes that actually crash will be less than or equal to F.
Model - Process
Crash-recovery failure mode
• In this mode a process can recover after a crash.
• Two options: with stable storage or without it.
• A crash loses all volatile memory but not the stable storage. After recovery, the stable storage can be read.
• Processes can be: permanently up; eventually up; eventually down; permanently alternating between up and down.
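The role of stable storage can be sketched with a counter that writes its state to a file after every update. This is an illustration only: a local JSON file stands in for stable storage, the crash is simulated by discarding the in-memory object, and the class name `RecoverableCounter` is hypothetical.

```python
import json
import os
import tempfile


class RecoverableCounter:
    """Keeps its value in volatile memory and a copy in a file (stable storage)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.value = self._recover()

    def _recover(self) -> int:
        """After a restart, read the last stored value, or start from 0."""
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as f:
            return json.load(f)["value"]

    def increment(self) -> None:
        self.value += 1
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"value": self.value}, f)
            f.flush()
            os.fsync(f.fileno())          # force the bytes to disk
        os.replace(tmp_path, self.path)   # atomically swap in the new state


path = os.path.join(tempfile.mkdtemp(), "state.json")
before = RecoverableCounter(path)
before.increment()
before.increment()
del before                        # simulated crash: volatile memory is lost
after = RecoverableCounter(path)  # recovery: the state is read back from storage
print(after.value)
```

Writing to a temporary file and then renaming it avoids leaving a half-written state file if the crash happens in the middle of a write.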
Model - Communication links
The link abstraction
• The link represents the network components of the distributed system.
• Unless otherwise stated, every pair of processes is connected by a bidirectional link, providing full connectivity among processes.
• In practice, different topologies may be used to implement this abstraction, possibly using routing algorithms: a fully connected mesh, an Ethernet, a ring, the Internet.
The link abstraction (cont.)
• Some algorithms do not assume a fully connected system.
• In this case the algorithm must route the messages itself.
• Messages are uniquely identified.
Link failures
• Links can lose messages (omission) and delay messages (timing).
• A process can retransmit messages if they are lost.
• Using fair-loss links we can implement reliable links.
Link failures (cont.)
• The fair-loss link properties are:
  • Fair loss: if a process p sends an infinite number of messages to process q, then q delivers an infinite number of messages, provided that neither p nor q crashes.
  • Finite duplication: if p sends a message m to q a finite number of times, then m cannot be delivered to q an infinite number of times.
  • No creation: if m is delivered, then m was previously sent.
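The step from fair-loss links to reliable links can be sketched with retransmission on the sender side and duplicate filtering on the receiver side. This is a simulation under simplifying assumptions: the link is a local object that drops messages at random, acknowledgements are never lost, and all class and function names are hypothetical.

```python
import random
from typing import List, Set, Tuple

Message = Tuple[Tuple[int, int], str]   # ((proc_id, sec_num), payload)


class FairLossLink:
    """Simulated link that drops each message with probability loss_rate."""

    def __init__(self, loss_rate: float, seed: int = 0) -> None:
        self.loss_rate = loss_rate
        self._rng = random.Random(seed)

    def send(self, message: Message) -> List[Message]:
        """Return the copies that actually arrive (none if the message is lost)."""
        if self._rng.random() < self.loss_rate:
            return []
        return [message]


class Receiver:
    """Delivers each message at most once by remembering message identifiers."""

    def __init__(self) -> None:
        self.delivered: List[str] = []
        self._seen: Set[Tuple[int, int]] = set()

    def on_receive(self, message: Message) -> None:
        msg_id, payload = message
        if msg_id not in self._seen:
            self._seen.add(msg_id)
            self.delivered.append(payload)


def reliable_send(link: FairLossLink, receiver: Receiver, message: Message,
                  max_attempts: int = 1000) -> int:
    """Retransmit until one copy gets through; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        copies = link.send(message)
        for copy in copies:
            receiver.on_receive(copy)
        if copies:
            return attempt   # assumption: the acknowledgement arrives reliably
    raise RuntimeError("message not delivered within max_attempts")


link = FairLossLink(loss_rate=0.7, seed=1)
receiver = Receiver()
reliable_send(link, receiver, ((1, 1), "hello"))
reliable_send(link, receiver, ((1, 1), "hello"))   # duplicate: filtered out
reliable_send(link, receiver, ((1, 2), "world"))
print(receiver.delivered)
```

Retransmission relies on the fair-loss property (a message sent often enough eventually gets through), and the duplicate filter relies on messages being uniquely identified.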
Model - Timing assumptions
Types of timing systems
• The lack of a global clock and the uncertainty about communication delays give rise to different types of timing systems.
• These timing systems are:
  • Asynchronous
  • Synchronous
  • Partially synchronous
Asynchronous system
Timing assumptions
• Processes: there is no upper bound on processing delays.
• Communication links: there is no upper bound on message transmission delays.
• More realistic, like the Internet.
• Some problems are difficult or impossible to solve: consensus, atomic broadcast, membership service.
Synchronous system
Timing assumptions
• Processes: there is a known upper bound on processing delays.
• Communication links: there is a known upper bound on message transmission delays.
• Less realistic; found only in real-time systems.
• It is easy to detect process failures reliably.
Partially synchronous system
Timing assumptions
• Processes: there is an upper bound on processing delays, but it is unknown.
• Communication links: there is an upper bound on message transmission delays, but it is unknown.
• It is realistic.
• It is possible to detect process failures unreliably with adaptive timeouts.
• It is possible to implement consensus, atomic broadcast and membership services.
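Failure detection with adaptive timeouts can be sketched as a heartbeat monitor that becomes more patient whenever it discovers that a suspicion was wrong. This is an illustration under assumptions: time is passed in explicitly instead of read from a real clock, doubling the timeout is one possible adaptation policy, and the class name `AdaptiveFailureDetector` is hypothetical.

```python
from typing import Dict, Set


class AdaptiveFailureDetector:
    """Suspects a process when its heartbeat is late; adapts after false suspicions."""

    def __init__(self, initial_timeout: float = 1.0) -> None:
        self.timeout = initial_timeout
        self.last_heartbeat: Dict[str, float] = {}
        self.suspected: Set[str] = set()

    def heartbeat(self, proc_id: str, now: float) -> None:
        """Record a heartbeat; if the sender was suspected, the suspicion was wrong."""
        self.last_heartbeat[proc_id] = now
        if proc_id in self.suspected:
            self.suspected.discard(proc_id)
            self.timeout *= 2   # the unknown bound is larger than assumed: wait longer

    def check(self, now: float) -> Set[str]:
        """Suspect every process whose last heartbeat is older than the timeout."""
        for proc_id, last in self.last_heartbeat.items():
            if now - last > self.timeout:
                self.suspected.add(proc_id)
        return set(self.suspected)


fd = AdaptiveFailureDetector(initial_timeout=1.0)
fd.heartbeat("p", now=0.0)
on_time = fd.check(now=0.5)       # heartbeat is recent: nobody suspected
late = fd.check(now=2.0)          # heartbeat is late: p is suspected
fd.heartbeat("p", now=2.1)        # p was only slow: timeout grows to 2.0
after_adapt = fd.check(now=3.5)   # 1.4 time units of silence is now tolerated
print(on_time, late, after_adapt, fd.timeout)
```

The detector is unreliable in the sense used on the slide: it can suspect a process that is merely slow, but each mistake makes the next one less likely.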
