
Chapter I: Introduction to Distributed Systems
Distributed Systems course
Systems Engineering program
Universidad Politécnica Salesiana
Based on Distributed Systems of UPM
Original author: Sergio Arévalo

Rodrigo Tufiño
May 2016

1. Motivation
2. Distributed abstractions
3. Examples of distributed applications
4. Model

Introduction to Reliable Distributed Programming.
Rachid Guerraoui, Luís Rodrigues. Springer-Verlag,
2006. Chapters 1 & 2.

Motivation

• Distributed computing has to do with algorithms for a set of processes that cooperate with each other.

• In addition, some of the processes of a distributed algorithm might stop by crashing while others stay alive and keep operating.

• This possibility of partial failure differentiates a distributed system from a concurrent system.
Motivation (cont.)
• The challenge is for the processes that are still alive.
• They must continue cooperating among themselves in a consistent way in spite of the failure of the others.

• The process cooperation must tolerate failures.

• Communication asynchrony and communication link failures make this cooperation very difficult.

Process cooperation

• The most common form of process cooperation is client-server.

• Tolerating failures would mean that:
  • If the server fails, the client should send the request to another server.
  • If some clients fail, the server should continue offering services to the other clients.
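
The first rule above (retry on another server) can be sketched as follows. This is only an illustration: the servers are simulated by local functions, and the names `make_server`, `ServerDown` and `request_with_failover` are hypothetical, not part of the course material.

```python
class ServerDown(Exception):
    """Raised by a simulated server that has crashed."""


def make_server(name, alive=True):
    # Hypothetical stand-in for a remote server: a plain local function.
    def handle(request):
        if not alive:
            raise ServerDown(name)
        return f"{name} handled {request}"
    return handle


def request_with_failover(servers, request):
    # Try each server in turn; a crashed server just moves us to the next one.
    for server in servers:
        try:
            return server(request)
        except ServerDown:
            continue
    raise ServerDown("no server available")


servers = [make_server("s1", alive=False), make_server("s2")]
print(request_with_failover(servers, "get file A"))
```

In a real system the client cannot tell a crashed server from a slow one, so the exception here would typically be replaced by a timeout.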

• Another form of cooperation is multiparty interaction, or peer-to-peer interaction.

[Figure: peers issuing "get file A" requests, served as "get part 1 of file A", "get part 2 of file A" and "get part 3 of file A".]

• Tolerating failures in this kind of interaction is more complex.

Uncertainties

• Distributed computing means that processes might execute on different physical nodes.

• This implies two more uncertainties:
  • Processes might not share the same clock.
  • Processes might not share the same memory.

• If processes do not share a clock, they cannot easily time-order the events of the system.

W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, “Don’t Settle for Eventual Consistency,” Commun. ACM, vol. 57, no. 5, pp. 61–68, 2014.

Sharing memory
• Without sharing memory, there is no instantaneous global state.

[Figure: timelines of Server 1, Server 2 and Server 3 with two "get global state()" requests; the servers move from states s1, s2, s3 to s1', s2', s3', and a message m1 appears on the timelines.]

• State (s1, s2, s3) is not possible (it would have to be captured instantaneously).

• State (s1', s2', s3') is possible, but there are problems with m1.
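
One way to read the problem with m1 is that a collected global state is only usable if it does not record a message as received before it is recorded as sent, and that messages still in transit must be accounted for. The sketch below checks this under an assumed representation: each local state is a dictionary with the sets of message identifiers it has sent and received. The direction of m1 (Server 1 to Server 2) is an assumption made for the example; the figure does not survive in the text.

```python
def union(snapshot, key):
    # Collect all message identifiers recorded under `key` by any process.
    result = set()
    for state in snapshot.values():
        result |= state[key]
    return result


def is_consistent(snapshot):
    # Every message recorded as received must also be recorded as sent.
    return union(snapshot, "received") <= union(snapshot, "sent")


def in_transit(snapshot):
    # Messages sent but not yet received: they belong to the channel state.
    return union(snapshot, "sent") - union(snapshot, "received")


# Server 1 recorded before sending m1, Server 2 recorded after receiving it.
bad = {
    "server1": {"sent": set(), "received": set()},
    "server2": {"sent": set(), "received": {"m1"}},
    "server3": {"sent": set(), "received": set()},
}
# Server 1 recorded after sending m1, Server 2 recorded before receiving it.
ok = {
    "server1": {"sent": {"m1"}, "received": set()},
    "server2": {"sent": set(), "received": set()},
    "server3": {"sent": set(), "received": set()},
}
print(is_consistent(bad), is_consistent(ok), in_transit(ok))
```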
Distributed abstractions

• To understand distributed systems we need to capture the properties/abstractions that help distinguish the fundamental from the accessory.
• First we will abstract the underlying physical system: basic abstractions.
• Then we will show some recurring interaction patterns in distributed applications: application abstractions.

Basic abstractions
Distributed abstractions
• Processes: abstract the active entities that perform computations (a computer, a processor, a thread of execution).

• Links: abstract the physical and logical network that supports communication among processes.


Application abstractions
Distributed abstractions

• Reliable and efficient communication: because there are failures and periods of asynchrony, abstractions that provide reliable and efficient links are needed.

• Logical clocks: because there is no global clock, an abstraction to time-order the events of the distributed system is needed.
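
As an illustration of a logical clock, the sketch below uses Lamport's scalar clock, a well-known construction; the slides do not say which logical clock the course uses, so this choice and the class name `LamportClock` are assumptions.

```python
class LamportClock:
    """Scalar logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the returned timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Jump past both the local time and the timestamp of the message.
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                 # local event at p
ts = p.send()            # p sends a message stamped ts
q_time = q.receive(ts)   # q receives it
print(ts, q_time)
```

The receive always gets a larger timestamp than the matching send, so timestamps respect the order of causally related events even without a shared physical clock.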

Application abstractions (cont.)
Distributed abstractions

• Distributed global states: because there is no instantaneous global state, an abstraction to obtain a consistent distributed global state is needed.

• Multicast primitives: because there is no reliable and synchronous hardware broadcast mechanism, a multicast abstraction offering different qualities of service is needed to communicate with groups of processes.

Application abstractions (cont.)
Distributed abstractions

• Shared memory: because there is no shared physical memory among processes, an abstraction that allows processes to share memory is needed.

• Consensus: some groups of application processes need to reach consensus on some value to advance in their computation, so an abstraction to reach this consensus is needed.

Application abstractions (cont.)
Distributed abstractions

• Failure detectors: system asynchrony creates uncertainty about which processes have failed, so an abstraction to detect failures is needed.

• Atomic commitment: a group of processes needs to agree to execute some step only if all of them agree to do it; otherwise the step is not done. An abstraction to carry out this commitment is needed.
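
The decision rule behind atomic commitment can be sketched as a single function. This shows only the rule "commit if and only if everybody votes yes", not a full protocol; the function name and the use of `None` for a missing vote are assumptions for the example.

```python
def atomic_commit(votes):
    # votes maps each process to True (agrees), False (refuses)
    # or None (no answer, for example because the process crashed).
    if votes and all(vote is True for vote in votes.values()):
        return "commit"
    return "abort"


print(atomic_commit({"p1": True, "p2": True, "p3": True}))
print(atomic_commit({"p1": True, "p2": False, "p3": True}))
print(atomic_commit({"p1": True, "p2": None, "p3": True}))
```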
Application abstractions (cont.)
Distributed abstractions

• Leader election: a group of processes needs to elect a leader among themselves when a previous leader fails, so an abstraction to elect a leader is needed.
Examples of distributed applications

• Information dissemination
• Process control applications
• Cooperative work
• Distributed databases
• Highly available services

Information dissemination
Examples of distributed applications

• Processes may produce information: publishers.

• Processes may consume information: subscribers.

• This is also called the publish-subscribe paradigm.

• If several processes are interested in the same notification, a multicast primitive with a reliable delivery property is needed.
• An example is an RSS news channel.
Process control applications
Examples of distributed applications

• Software processes must control the execution of a physical activity.
• They might control the dynamic location of aircraft, the temperature of nuclear installations, the automation of car production, ...
• Some of the processes typically have a sensor connected. To tolerate process failures, a group of processes may agree on their input sensor values in order to offer a reliable output value.
Cooperative work
Examples of distributed applications

• Internet users may cooperate in building common software or a common document, or in setting up a distributed dialogue.
• They can use a shared space abstraction with read and write operations on it.
• This abstraction can be a distributed shared memory or a distributed file service.
• To maintain a consistent view of the shared space, processes must agree on the order of the operations performed on it.
Distributed database
Examples of distributed applications

• In distributed systems, several transaction managers might cooperate to service each transaction.
• When a transaction ends, a distributed atomic commitment algorithm must be executed in order to decide whether the transaction must commit or abort.
• A transaction manager might decide to abort the transaction if it detects a violation of database integrity, a deadlock, a disk error, etc.

Highly available services
Examples of distributed applications

• High availability is achieved using state-machine replication.
• Several processes (replicas) execute the same code on different nodes (with independent probabilities of failure).
• They receive the same inputs (messages) in the same order through a totally ordered multicast.
• All replicas go through the same states, provided they run the same deterministic code.
• If one replica fails, the service is not interrupted, because the other replicas continue offering it.
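
The idea above can be sketched with a toy replicated bank account. The totally ordered multicast is simulated by a plain Python list that every replica reads in the same order; the `Replica` class and the commands are made up for the example.

```python
class Replica:
    """Deterministic state machine: same commands in, same state out."""

    def __init__(self):
        self.balance = 0

    def apply(self, command):
        op, amount = command
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw" and self.balance >= amount:
            self.balance -= amount
        # A withdrawal that exceeds the balance is ignored by every replica.


# Stand-in for a totally ordered multicast: one shared, ordered log.
log = [("deposit", 100), ("withdraw", 30), ("withdraw", 100)]

replicas = [Replica() for _ in range(3)]
for command in log:
    for replica in replicas:
        replica.apply(command)

# If the first replica crashes, the others still hold the same state.
survivors = replicas[1:]
print([replica.balance for replica in survivors])
```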


Model

• Distributed computation
• Failure modes
• Communication links
• Timing assumptions
Distributed Computation

• Processes are the units of computation.
• The system can be static or dynamic with respect to the set of processes.
• Processes might know the identifiers of the processes in the system (known membership) or not (unknown membership).
• Unless explicitly stated otherwise, it is assumed that the set is static and the membership is known.

Distributed Computation

• No assumption is made on the mapping of processes to actual processors, processes or threads.
• Processes communicate by exchanging messages, and each message is uniquely identified by the pair (proc_id, sec_num).
• Messages are exchanged through communication links.
• A distributed algorithm is a collection of distributed automata, one per process.
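
A minimal sketch of the (proc_id, sec_num) identification scheme, assuming sec_num is a per-process sequence number that starts at 1. The `Process` class and its methods are hypothetical; the point is that the pair lets a receiver recognise a message it has already delivered.

```python
from itertools import count


class Process:
    def __init__(self, proc_id):
        self.proc_id = proc_id
        self._sec_num = count(1)      # next sequence number to assign
        self.delivered = set()        # identifiers already delivered

    def new_message(self, payload):
        # The identifier is unique because sec_num never repeats per process.
        return ((self.proc_id, next(self._sec_num)), payload)

    def deliver(self, message):
        msg_id, payload = message
        if msg_id in self.delivered:
            return None               # duplicate: ignore it
        self.delivered.add(msg_id)
        return payload


p, q = Process("p"), Process("q")
m1 = p.new_message("hello")
m2 = p.new_message("hello")           # same payload, different identifier
print(m1[0], m2[0], q.deliver(m1), q.deliver(m1))
```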
Distributed Computation

• A process step consists of receiving (delivering) a message (global event), executing a local computation (local event) and sending a message (global event).
• Only one process step takes place in the distributed system at any given time (virtual global scheduler).
• Some of the events of a step can be "nil" (nothing is done).
• Unless specified otherwise, we will consider deterministic algorithms.

• Unless it fails, a process is supposed to execute the algorithm assigned to it.
• The unit of failure is the process, which fails atomically: when it fails, all its components fail as well at the same time.
• Process abstractions differ according to the nature of the failures that are considered.

Model - Process
Failure modes


Model - Process
Arbitrary failure mode

• It happens when a process deviates arbitrarily from the algorithm assigned to it.
• It is the most general failure mode.
• A process can produce any output at any time.
• These failures are also called Byzantine failures or malicious failures.
• They are the most expensive to tolerate.
Model - Process
Omission failure mode

• It happens when a process does not send (or receive) a message it is supposed to send (or receive) according to the algorithm.
• In general these faults are due to buffer overflows or network congestion.
• With omissions, a process deviates from the algorithm assigned to it because messages are lost.

Model - Process
Crash failure mode

• It happens when a process stops executing after some time t.
• This is called a crash failure, and we say that we have a crash-stop process abstraction.
• Algorithms typically assume at most F failures: during the execution, the number of processes that actually crash is less than or equal to F.

Model - Process
Crash-recovery failure mode

• In this mode a process can recover after a crash.
• Two options: with stable storage or without it.
• In a crash all the volatile memory is lost, but not the stable storage. After recovery the stable storage can be read.
• Process classes: permanently up; eventually up; eventually down; permanently up and down.
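
The difference between volatile memory and stable storage can be sketched with a counter that writes every update to a file. The file stands in for stable storage; the class name, the JSON format and the temporary directory are choices made for the example.

```python
import json
import os
import tempfile


class RecoverableCounter:
    def __init__(self, path):
        self.path = path
        self.value = 0            # volatile memory: lost in a crash
        self.recover()

    def recover(self):
        # After a restart, reload whatever reached stable storage.
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.value = json.load(f)["value"]

    def increment(self):
        self.value += 1
        # Write to stable storage before the step is considered done.
        with open(self.path, "w") as f:
            json.dump({"value": self.value}, f)


path = os.path.join(tempfile.mkdtemp(), "state.json")
counter = RecoverableCounter(path)
counter.increment()
counter.increment()
del counter                                # simulate the crash
recovered = RecoverableCounter(path)       # simulate the recovery
print(recovered.value)
```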

Model - Communication links
The link abstraction
• The link abstraction represents the network components of the distributed system.

• Unless otherwise stated, every pair of processes is connected by a bidirectional link, providing full connectivity among processes.

• In practice, different topologies may be used to implement this abstraction, possibly using routing algorithms: a fully connected mesh, an Ethernet, a ring, the Internet.
Model - Communication links
The link abstraction (cont.)

•Some algorithms do not consider a fully

connected system.

•In this case the algorithm should route the

messages by itself.

•Messages are uniquely identified

Model - Communication links
Link failures

• Links can lose messages (omission) and delay messages (timing).

• A process can retransmit a message if it is lost.

• Using fair-loss links we can implement reliable links.
Model - Communication links
Link failures (cont.)

• The fair-loss link properties are:
  • Fair loss: if a process p sends an infinite number of messages to process q, then q delivers an infinite number of messages, provided p and q do not crash.
  • Finite duplication: if p sends a message m to q a finite number of times, m cannot be delivered an infinite number of times to q.
  • No creation: if m is delivered, then m was sent.
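
How retransmission turns a fair-loss link into a reliable one can be sketched with a simulated link. To keep the example deterministic the link delivers only every third transmission instead of dropping at random, and the boolean returned by `send` stands in for an acknowledgement; both are simplifications, and all names are hypothetical.

```python
class FairLossLink:
    """Simulated lossy link: only every `keep_every`-th transmission arrives."""

    def __init__(self, keep_every=3):
        self.keep_every = keep_every
        self.transmissions = 0

    def send(self, message, receiver):
        self.transmissions += 1
        if self.transmissions % self.keep_every == 0:
            receiver.on_receive(message)
            return True       # stands in for an acknowledgement
        return False          # message lost


class Receiver:
    def __init__(self):
        self.seen = set()
        self.delivered = []

    def on_receive(self, message):
        msg_id, payload = message
        if msg_id not in self.seen:   # filter out duplicates
            self.seen.add(msg_id)
            self.delivered.append(payload)


def reliable_send(link, receiver, message, max_tries=100):
    # Retransmit until the link reports success.
    for attempt in range(1, max_tries + 1):
        if link.send(message, receiver):
            return attempt
    raise RuntimeError("gave up after too many retransmissions")


link, receiver = FairLossLink(), Receiver()
message = (("p", 1), "hello")
first = reliable_send(link, receiver, message)
second = reliable_send(link, receiver, message)   # a duplicate of the same message
print(first, second, receiver.delivered)
```

The receiver keeps the set of identifiers it has seen, so a retransmitted copy that arrives after the original is delivered only once.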

Model – Timing assumptions
Types of timing systems

• The lack of a global clock and the uncertainty about the duration of communication delays lead to different types of timing systems.

• These timing systems are:
  • Asynchronous
  • Synchronous
  • Partially synchronous
Asynchronous system
Timing assumptions

• Processes: there is no upper bound on processing delays.
• Communication links: there is no upper bound on message transmission delays.
• More realistic, like the Internet.
• Some algorithms are difficult or impossible to build: consensus, atomic broadcast, membership services.

Synchronous system
Timing assumptions

• Processes: there is a known upper bound on processing delays.
• Communication links: there is a known upper bound on message transmission delays.
• Less realistic; it holds only in real-time systems.
• It is easy to detect process failures reliably.

Partially Synchronous system
Timing assumptions
• Processes: there is an upper bound on processing delays, but it is unknown.
• Communication links: there is an upper bound on message transmission delays, but it is unknown.
• It is realistic.
• It is possible to detect process failures unreliably with adaptive timeouts.
• It is possible to implement consensus, atomic broadcast and membership services.
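
Unreliable failure detection with adaptive timeouts can be sketched as a heartbeat monitor that doubles its timeout whenever a suspected process turns out to be alive. The class, the doubling rule and the use of explicit `now` values instead of a real clock are assumptions made to keep the example small and deterministic.

```python
class AdaptiveFailureDetector:
    def __init__(self, initial_timeout=1.0):
        self.timeout = initial_timeout
        self.last_heartbeat = {}
        self.suspected = set()

    def heartbeat(self, process, now):
        if process in self.suspected:
            # False suspicion: the process was slow, not crashed.
            # The unknown bound is larger than we thought, so adapt.
            self.suspected.discard(process)
            self.timeout *= 2
        self.last_heartbeat[process] = now

    def check(self, now):
        # Suspect every process whose last heartbeat is older than the timeout.
        for process, last in self.last_heartbeat.items():
            if now - last > self.timeout:
                self.suspected.add(process)
        return set(self.suspected)


fd = AdaptiveFailureDetector(initial_timeout=1.0)
fd.heartbeat("p", now=0.0)
early = fd.check(now=1.5)      # p is late: suspected
fd.heartbeat("p", now=1.6)     # p was only slow: the timeout doubles
later = fd.check(now=3.0)      # within the new timeout: not suspected
print(early, fd.timeout, later)
```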

