
Chapter I: Introduction to Distributed Systems

Distributed Systems
Systems Engineering Program
Universidad Politécnica Salesiana
Based on the Distributed Systems material of UPM
Original Author: Sergio Arévalo

Rodrigo Tufiño
May 2016
Contents

1. Motivation
2. Distributed abstractions
3. Examples of distributed applications
4. Model

Bibliography
Introduction to Reliable Distributed Programming.
Rachid Guerraoui, Luis Rodrigues. Springer-Verlag,
2006. Chapters 1 & 2.

Motivation

• Distributed computing deals with algorithms for a set of processes that cooperate with each other.
• Moreover, some of the processes of a distributed algorithm might stop by crashing, while others stay alive and keep operating.
• This is what differentiates a distributed system from a concurrent system.

Motivation (cont.)
• The challenge lies with the processes that are still alive.
• They must continue cooperating among themselves in a consistent way, in spite of the failure of the other processes.
• Process cooperation must tolerate failures.
• Communication asynchrony and communication link failures make this cooperation very difficult.

Process cooperation

• The most common form of process cooperation is client-server: a client sends requests to a server.
• Tolerating failures would mean that:
  • if the server fails, the client should send the request to another server;
  • if some clients fail, the server should continue offering services to the other clients.

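A minimal sketch of the client-side failover described above. It assumes that servers are plain Python callables and that a crashed server is modelled by raising the hypothetical `ServerDown` exception; in a real system the client would only notice the failure through a timeout.

```python
from typing import Callable, Sequence


class ServerDown(Exception):
    """Raised by a (simulated) server that has crashed."""


def request_with_failover(servers: Sequence[Callable[[str], str]], request: str) -> str:
    """Send `request` to each server in turn until one of them answers."""
    for server in servers:
        try:
            return server(request)
        except ServerDown:
            continue  # this server failed: retry the request on the next one
    raise ServerDown("all servers failed")


def crashed(_request: str) -> str:
    raise ServerDown("no reply")


def echo(request: str) -> str:
    return "echo:" + request


print(request_with_failover([crashed, echo], "hello"))  # echo:hello
```

The server side of the same requirement is simpler: it just keeps serving the requests it still receives, ignoring clients that stopped.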
Multiparty

• Another form of cooperation: multiparty interaction, or peer-to-peer interaction.
[Figure: peers P1, P2, P3 and P4 exchanging "get file A" requests; the file is fetched in parts (part 1, part 2 and part 3 of file A) from different peers.]
• Tolerating failures in this kind of interaction is more complex.

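The extra difficulty of multiparty interaction can be sketched as follows: when a peer fails, every part it was serving must be re-sourced from another live peer that holds it. The peer names, the part layout and the helper `fetch_file` are hypothetical, and the assignment is computed centrally only to keep the example short.

```python
from typing import Dict, List, Set


def fetch_file(parts: List[int], peers: Dict[str, Set[int]], crashed: Set[str]) -> Dict[int, str]:
    """Choose, for every part of a file, a live peer that stores it."""
    served: Dict[int, str] = {}
    for part in parts:
        # Live peers that store this part, in a fixed (sorted) order.
        holders = [p for p in sorted(peers) if part in peers[p] and p not in crashed]
        if not holders:
            raise RuntimeError(f"part {part} unavailable: every holder crashed")
        served[part] = holders[0]
    return served


peers = {"P1": {3}, "P2": {2, 3}, "P3": {1, 2}}
print(fetch_file([1, 2, 3], peers, crashed=set()))   # {1: 'P3', 2: 'P2', 3: 'P1'}
print(fetch_file([1, 2, 3], peers, crashed={"P1"}))  # {1: 'P3', 2: 'P2', 3: 'P2'}
```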
Uncertainties

• Distributed computing means that processes might execute on different physical nodes.
• This implies two more uncertainties:
  • Processes might not share the same clock.
  • Processes might not share the same memory.

Clock

If processes don't share a clock, they cannot easily order the events of the system in time.

W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, "Don't Settle for Eventual Consistency," Commun. ACM, vol. 57, no. 5, pp. 61–68, 2014.

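To see why the lack of a shared clock is a problem, the sketch below timestamps two events with local physical clocks. The skew values are made up for illustration: event a really happens before event b, but ordering by local timestamps reverses them.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    skew: float  # offset of this node's local clock from real time (assumed value)

    def timestamp(self, real_time: float) -> float:
        return real_time + self.skew


n1 = Node("n1", skew=5.0)  # clock runs 5 time units ahead
n2 = Node("n2", skew=0.0)  # clock is exact

# Event a happens first (real time 10) on n1; event b happens later (real time 12) on n2.
events = [("a", n1.timestamp(10.0)), ("b", n2.timestamp(12.0))]
order_by_local_clocks = [name for name, _ in sorted(events, key=lambda e: e[1])]
print(order_by_local_clocks)  # ['b', 'a'], the reverse of the real order
```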
Sharing memory
• Without shared memory, there is no instantaneous global state.
[Figure: a client calls get global state() on Server 1, Server 2 and Server 3 at two different moments; the servers' local states change from s1, s2, s3 to s1', s2', s3', and a message m1 travels between Server 1 and Server 2.]
• State (s1, s2, s3) cannot be obtained as an instantaneous snapshot.
• State (s1', s2', s3') can be obtained, but message m1 causes problems.

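The problem with m1 can be made concrete with a consistency check on a recorded global state. This is a minimal sketch that assumes, for illustration, that m1 is sent by Server 1 and received by Server 2: a recorded state is inconsistent if some server has recorded receiving a message that its sender has not recorded sending. Messages sent but not yet received are allowed; they are in transit and would have to be recorded as part of the channel state.

```python
from typing import Dict, Set


def is_consistent(sent: Dict[str, Set[str]], received: Dict[str, Set[str]]) -> bool:
    """sent[p] / received[p]: ids of the messages server p had sent / received
    when its local state was recorded."""
    all_sent = set().union(*sent.values())
    all_received = set().union(*received.values())
    # No message may be recorded as received without being recorded as sent.
    return all_received <= all_sent


# Server 1 recorded before sending m1, Server 2 recorded after receiving it.
print(is_consistent({"S1": set(), "S2": set(), "S3": set()},
                    {"S1": set(), "S2": {"m1"}, "S3": set()}))  # False
```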
Distributed abstractions

• To understand distributed systems, we need to capture the properties/abstractions that help distinguish the fundamental from the accessory.
• We will abstract the underlying physical system: basic abstractions.
• Then we will show some recurring interaction patterns in distributed applications: application abstractions.

Basic abstractions

• Processes: abstract the active entities that perform computations (a computer, a processor, a thread of execution).
• Links: abstract the physical and logical network that supports communication among processes.

Application abstractions

• Reliable and efficient communication: because there are failures and periods of asynchrony, some abstractions are needed to obtain reliable and efficient links.
• Logical clocks: because there is no global clock, an abstraction is needed to order the events of the distributed system in time.

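One well-known logical clock is Lamport's scalar clock; it is shown here as a possible instance of this abstraction, since the slides do not fix a particular algorithm. A minimal sketch:

```python
class LamportClock:
    """Lamport logical clock: orders events without any physical clock."""

    def __init__(self) -> None:
        self.time = 0

    def local_event(self) -> int:
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the returned timestamp travels with the message.
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        # Jump past the sender's timestamp so the receive is ordered after the send.
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
q.local_event(); q.local_event(); q.local_event()  # q's clock is now 3
t_send = p.send()                                  # p sends m with timestamp 1
t_recv = q.receive(t_send)                         # q delivers m at max(3, 1) + 1 = 4
print(t_send, t_recv)  # 1 4
```

Lamport timestamps respect causality (if event a happened before event b, then C(a) < C(b)), but the converse does not hold.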
Application abstractions (cont.)

• Distributed global states: because there is no global state, an abstraction is needed to obtain a consistent distributed global state.
• Multicast primitives: because there is no reliable and synchronous hardware broadcast mechanism, a multicast abstraction, implementing different qualities of service, is needed to communicate with groups of processes.

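As an example of one quality of service, the sketch below simulates a relaying scheme for reliable broadcast (often called eager reliable broadcast): every process that delivers a message for the first time relays it to all others, so a sender that crashes after reaching a single correct process cannot leave the other correct processes without the message. The function name and parameters are hypothetical, and crashed processes are simplified to never deliver.

```python
from typing import Dict, List, Set, Tuple


def eager_reliable_broadcast(n: int, msg: str, first_receivers: Set[int],
                             crashed: Set[int]) -> Dict[int, List[str]]:
    """Processes 0..n-1; the sender crashed after reaching only `first_receivers`."""
    delivered: Dict[int, List[str]] = {p: [] for p in range(n)}
    in_transit: List[Tuple[int, str]] = [(p, msg) for p in first_receivers]
    while in_transit:
        dest, m = in_transit.pop()
        if dest in crashed or m in delivered[dest]:
            continue  # crashed processes take no steps; duplicates are ignored
        delivered[dest].append(m)  # deliver exactly once
        in_transit.extend((p, m) for p in range(n) if p != dest)  # relay to everyone
    return delivered


# Sender 0 crashes after its message reaches only process 1.
print(eager_reliable_broadcast(4, "m", first_receivers={1}, crashed={0}))
# {0: [], 1: ['m'], 2: ['m'], 3: ['m']}
```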
Application abstractions (cont.)

• Shared memory: because there is no physical memory shared among processes, an abstraction is needed to allow processes to share memory.
• Consensus: some applications need a group of processes to reach consensus on some value in order to advance in their computation; an abstraction to obtain this consensus is needed.

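A minimal consensus sketch, assuming a synchronous system (see the timing assumptions later in this chapter) with at most f crash failures. It uses flooding in rounds (an algorithm often called FloodSet): each process repeatedly sends the set of values it knows; after f + 1 rounds all surviving processes hold the same set and decide its minimum. The parameters `crash_round` and `partial_send` are hypothetical knobs to inject a crash in the middle of a round.

```python
from typing import Dict, Set


def flood_consensus(proposals: Dict[int, int], f: int,
                    crash_round: Dict[int, int],
                    partial_send: Dict[int, Set[int]]) -> Dict[int, int]:
    """proposals: process id -> proposed value.
    crash_round[p] = r: p crashes during round r, after sending only to partial_send[p].
    Returns the decision of every process that did not crash."""
    known = {p: {v} for p, v in proposals.items()}
    alive = set(proposals)
    for rnd in range(1, f + 2):  # f + 1 synchronous rounds
        inbox: Dict[int, Set[int]] = {p: set() for p in proposals}
        for p in sorted(alive):
            crashing = crash_round.get(p) == rnd
            targets = partial_send.get(p, set(proposals)) if crashing else set(proposals)
            for q in targets:
                inbox[q] |= known[p]  # send what p knew at the start of the round
        alive -= {p for p, r in crash_round.items() if r == rnd}
        for q in alive:
            known[q] |= inbox[q]
    return {p: min(known[p]) for p in alive}


# Process 0 proposes 1 and crashes in round 1 after reaching only process 1.
print(flood_consensus({0: 1, 1: 5, 2: 7}, f=1, crash_round={0: 1}, partial_send={0: {1}}))
# {1: 1, 2: 1}
```

The extra round matters: after round 1 alone, process 1 knows {1, 5, 7} but process 2 only {5, 7}, so deciding then would break agreement.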
Application abstractions (cont.)

• Failure detectors: system asynchrony creates uncertainty about which processes have failed; an abstraction to detect failures is needed.
• Atomic commitment: a group of processes needs to agree to execute some step only if all of them agree to do it; otherwise, the step is not done. An abstraction to perform this commitment is needed.

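A common way to build a failure detector is with heartbeats and a timeout. The sketch below is a minimal, hypothetical version: a process is suspected when its last heartbeat is older than the timeout. Under asynchrony a suspected process may simply be slow, which is exactly the uncertainty mentioned above.

```python
from typing import Dict, Set


def suspected(last_heartbeat: Dict[str, float], now: float, timeout: float) -> Set[str]:
    """last_heartbeat maps a process name to the local time of its latest heartbeat."""
    return {p for p, t in last_heartbeat.items() if now - t > timeout}


print(suspected({"p": 9.5, "q": 4.0}, now=10.0, timeout=2.0))  # {'q'}
```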
Application abstractions (cont.)

• Leader election: a group of processes needs to elect a leader among themselves when the previous leader fails; an abstraction to elect a leader is needed.

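A minimal leader-election rule, assuming processes have totally ordered ids and share the output of a failure detector: the leader is the lowest-id process that is not suspected. Choosing the lowest id is an arbitrary but deterministic choice, so every process that sees the same suspicions elects the same leader.

```python
from typing import Iterable, Optional, Set


def elect_leader(processes: Iterable[int], suspected: Set[int]) -> Optional[int]:
    """Return the lowest-id process not suspected of having crashed, or None."""
    candidates = [p for p in processes if p not in suspected]
    return min(candidates) if candidates else None


print(elect_leader([1, 2, 3], suspected=set()))  # 1
print(elect_leader([1, 2, 3], suspected={1}))    # 2 (the previous leader failed)
```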
Examples of distributed applications

• Information dissemination
• Process control applications
• Cooperative work
• Distributed databases
• Highly available services

Information dissemination

• Processes may produce information: publishers.
• Processes may consume information: subscribers.
• This is also called the publish-subscribe paradigm.
• If several processes are interested in the same notification, a multicast primitive with a reliable delivery property is needed.
• An example is an RSS news channel.

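A minimal in-process sketch of the publish-subscribe paradigm. The `Broker` class is hypothetical, not a real library; in a distributed deployment, the loop that hands a notification to every subscriber is where a reliable multicast primitive would be needed.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, notification: str) -> None:
        # Every subscriber of the topic must receive the notification.
        for callback in self._subscribers[topic]:
            callback(notification)


inbox_a: List[str] = []
inbox_b: List[str] = []
broker = Broker()
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)
broker.publish("news", "headline 1")
print(inbox_a, inbox_b)  # ['headline 1'] ['headline 1']
```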
Process control applications

• Software processes must control the execution of a physical activity.
• They might control the dynamic location of aircraft, the temperature of nuclear installations, the automation of car production, ...
• Some of the processes typically have a sensor connected. To tolerate process failures, a group of processes may reach consensus on their input sensor values in order to offer a reliable output value.

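Once consensus has given every correct process the same list of sensor readings, each one can apply the same deterministic function to obtain the same output value. The sketch below assumes the median as that function (an illustrative choice: a single wildly wrong reading cannot drag it far).

```python
import statistics
from typing import List


def fused_output(agreed_readings: List[float]) -> float:
    """Deterministic fusion of readings that all correct processes agreed on."""
    return statistics.median(agreed_readings)


print(fused_output([20.1, 20.3, 99.0]))  # 20.3, the faulty 99.0 is ignored
```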
Cooperative work

• Internet users may cooperate in building common software or a common document, or in setting up a distributed dialogue.
• They can use a space abstraction with read and write operations on it.
• These abstractions can be a distributed shared memory or a distributed file service.
• To maintain a consistent view of the shared space, processes must agree on the order of operations.

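Why the order of operations matters can be seen with two replicas of a shared space that apply the same two writes in different orders. The keys and values are hypothetical.

```python
from typing import Dict, List, Tuple

Write = Tuple[str, str]  # (key, value) written into the shared space


def apply_writes(writes: List[Write]) -> Dict[str, str]:
    space: Dict[str, str] = {}
    for key, value in writes:
        space[key] = value  # the last write to a key wins
    return space


w1, w2 = ("title", "draft by user 1"), ("title", "draft by user 2")
replica_1 = apply_writes([w1, w2])  # receives w1 then w2
replica_2 = apply_writes([w2, w1])  # receives w2 then w1
print(replica_1 == replica_2)  # False: the views diverge without an agreed order
```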
Distributed databases

• In distributed systems, several transaction managers might cooperate to service each transaction.
• When a transaction ends, a distributed atomic commitment algorithm must be executed in order to decide whether the transaction must commit or abort.
• A transaction manager might decide to abort the transaction if it detects a violation of database integrity, a deadlock, a disk error, etc.

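A well-known atomic commitment protocol is two-phase commit (named here as one possible choice). The sketch shows only the coordinator's decision rule after collecting votes; the manager names are hypothetical, and a missing vote (crash or timeout) is modelled as `None`.

```python
from typing import Dict, Optional


def commit_decision(votes: Dict[str, Optional[bool]]) -> str:
    """votes: True = ready to commit, False = wants to abort, None = no vote received."""
    # Commit only if every transaction manager explicitly voted yes.
    return "commit" if votes and all(v is True for v in votes.values()) else "abort"


print(commit_decision({"tm1": True, "tm2": True}))   # commit
print(commit_decision({"tm1": True, "tm2": False}))  # abort (e.g. integrity violation)
print(commit_decision({"tm1": True, "tm2": None}))   # abort (tm2 crashed or timed out)
```

In the second phase, the coordinator sends this decision to every transaction manager.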
Highly available services

• They are built using the state-machine replication approach.
• Several processes (replicas) execute the same code on different nodes (with independent probabilities of failure).
• They receive the same inputs (messages) in the same order through a total-order multicast.
• All replicas go through the same states if they run the same deterministic code.
• If one replica fails, nothing happens, because the others continue offering the replicated service.

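A minimal state-machine replication sketch. Total order is obtained here with a single sequencer that numbers requests, which is one simple technique among several (the sequencer itself would also need to be fault tolerant; that is out of scope). Each replica buffers out-of-order messages and applies requests strictly in sequence order, so replicas that receive messages in different orders still reach the same state. The request format ("+10", "*2") is hypothetical.

```python
from itertools import count
from typing import Dict, Tuple


class Sequencer:
    """Assigns a global sequence number to every request."""

    def __init__(self) -> None:
        self._next = count()

    def order(self, request: str) -> Tuple[int, str]:
        return next(self._next), request


class Replica:
    def __init__(self) -> None:
        self.balance = 0
        self.expected = 0
        self.pending: Dict[int, str] = {}

    def deliver(self, seq_num: int, request: str) -> None:
        # Buffer out-of-order messages; apply strictly in sequence-number order.
        self.pending[seq_num] = request
        while self.expected in self.pending:
            self.apply(self.pending.pop(self.expected))
            self.expected += 1

    def apply(self, request: str) -> None:
        # Deterministic state transition.
        op, arg = request[0], int(request[1:])
        self.balance = self.balance + arg if op == "+" else self.balance * arg


sequencer = Sequencer()
ordered = [sequencer.order(r) for r in ["+10", "*2", "+5"]]
r1, r2 = Replica(), Replica()
for seq_num, req in ordered:
    r1.deliver(seq_num, req)
for seq_num, req in reversed(ordered):  # r2 receives the messages in reverse order
    r2.deliver(seq_num, req)
print(r1.balance, r2.balance)  # 25 25
```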
Model

• Distributed Computation
• Process
  • Failure modes
• Communication links
• Timing assumptions

Model
Distributed Computation

• Processes are the units of computation.
• A system can be static or dynamic with respect to its set of processes.
• Processes might know the identifiers of the processes of the system (known membership) or not (unknown membership).
• Unless explicitly stated otherwise, it is assumed that the set is static and the membership is known.

Model
Distributed Computation

• No assumption is made on the mapping of processes to actual processors, processes or threads.
• Processes communicate by exchanging messages, and messages are uniquely identified by (proc_id, sec_num).
• Messages are exchanged through communication links.
• A distributed algorithm is a collection of distributed automata, one per process.

Model
Distributed Computation

• A process step consists of receiving (delivering) a message (global event), executing a local computation (local event) and sending a message (global event).
• Only one process step takes place in the distributed system at a time (virtual global scheduler).
• Some of the step events can be "nil" (nothing is done).
• Unless specified otherwise, we will consider deterministic algorithms.

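The computation model can be simulated directly. In this minimal sketch, each process is a deterministic automaton whose step delivers at most one message (or nil), runs a local computation and may send messages identified by (proc_id, sec_num); a virtual global scheduler, here a seeded random choice (an assumption), lets exactly one process take a step at a time. The ping/pong algorithm is hypothetical.

```python
import random
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Message = Tuple[Tuple[int, int], int, str]  # ((proc_id, sec_num), destination, payload)


class Process:
    def __init__(self, pid: int) -> None:
        self.pid, self.sec_num = pid, 0
        self.log: List[str] = []

    def send(self, dest: int, payload: str) -> Message:
        self.sec_num += 1  # unique id: (proc_id, sec_num)
        return (self.pid, self.sec_num), dest, payload

    def step(self, msg: Optional[Message]) -> List[Message]:
        if msg is None:  # nil delivery: nothing is done
            return []
        (sender, _), _, payload = msg
        self.log.append(f"{payload} from p{sender}")  # local computation
        if payload == "ping":
            return [self.send(sender, "pong")]  # send event
        return []


def run(n: int, steps: int, seed: int = 0) -> List[Process]:
    rng = random.Random(seed)
    procs = [Process(i) for i in range(n)]
    inboxes: Dict[int, Deque[Message]] = {i: deque() for i in range(n)}
    first = procs[0].send(1, "ping")  # p0 starts by pinging p1
    inboxes[first[1]].append(first)
    for _ in range(steps):
        p = rng.randrange(n)  # the global scheduler picks one process per step
        msg = inboxes[p].popleft() if inboxes[p] else None
        for out in procs[p].step(msg):
            inboxes[out[1]].append(out)
    return procs


procs = run(n=2, steps=100)
print(procs[0].log, procs[1].log)
```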
Model
Process

• Unless it fails, a process is supposed to execute the algorithm assigned to it.
• The unit of failure is the process (atomic component).
• When it fails, all its components fail as well, at the same time.
• Process abstractions differ according to the nature of the failures considered.

Model - Process
Failure modes

• Crashes
• Omissions
• Crashes & recovery
• Arbitrary

Model - Process
Arbitrary failure mode

• It happens when a process's execution deviates arbitrarily from the algorithm assigned to it.
• It is the most general failure mode.
• A process can produce any output at any time.
• These failures are also called Byzantine or malicious failures.
• They are the most expensive to tolerate.

Model - Process
Omissions failure mode

• It happens when a process does not send (or receive) a message it is supposed to send (or receive) according to the algorithm.
• In general, these faults are due to buffer overflows or network congestion.
• With omissions, a process deviates from its assigned algorithm because messages are lost.

Model - Process
Crash failure mode

• It happens when a process stops executing after some time t.
• This is called a crash failure, and we say that we have a crash-stop process abstraction.
• Algorithms typically assume up to F failures. This means that during the execution, the number of processes that actually crash is less than or equal to F.

Model - Process
Crash-recovery failure mode

• In this mode, a process can recover after a crash.
• Two options: to have stable storage or not.
• With the crash, all volatile memory is lost, but not the stable storage. After recovery, the stable storage can be read.
• Processes can be: permanently up; eventually up; eventually down; permanently up & down.

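A minimal sketch of the crash-recovery mode with stable storage. The dict `disk` stands in for stable storage (a disk, in practice) and is owned by the caller so that it outlives the process object; a crash followed by recovery is modelled by creating a fresh object on the same storage. All names are hypothetical.

```python
from typing import Dict


class CrashRecoveryProcess:
    def __init__(self, stable: Dict[str, int]) -> None:
        self.stable = stable
        self.counter = stable.get("counter", 0)  # recovery: reload from stable storage
        self.cache: Dict[str, int] = {}          # volatile memory, lost on a crash

    def increment(self) -> None:
        self.counter += 1
        self.cache["last"] = self.counter
        self.stable["counter"] = self.counter    # persist the new value


disk: Dict[str, int] = {}
p = CrashRecoveryProcess(disk)
p.increment(); p.increment()
p = CrashRecoveryProcess(disk)  # crash + recovery: volatile state gone, stable state kept
print(p.counter, p.cache)  # 2 {}
```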
Model - Communication links
The link abstraction
• The link is used to represent the network components of the distributed system.
• Unless otherwise stated, every pair of processes is connected by a bidirectional link, providing full connectivity among processes.
• In practice, different topologies may be used to implement this abstraction, possibly using routing algorithms: a fully connected mesh, an Ethernet, a ring, the Internet.

Model - Communication links
The link abstraction (cont.)

• Some algorithms do not assume a fully connected system.
• In this case, the algorithm must route the messages by itself.
• Messages are uniquely identified.

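As a small example of an algorithm routing messages by itself, the sketch below assumes a ring of n processes where each process only has a link to its successor, (pid + 1) mod n, and returns the hops a message takes.

```python
from typing import List


def route_on_ring(src: int, dest: int, n: int) -> List[int]:
    """Processes visited by a message forwarded clockwise from src to dest."""
    path = [src]
    while path[-1] != dest:
        path.append((path[-1] + 1) % n)  # forward to the next neighbour
    return path


print(route_on_ring(3, 1, 5))  # [3, 4, 0, 1]
```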
Model - Communication links
Link failures

• Links can lose messages (omission) and delay messages (timing).
• A process can retransmit messages if they are lost.
• Using fair-loss links, we can implement reliable links.

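A minimal sketch of retransmission over a simulated lossy link, with a made-up loss probability. The sender keeps retransmitting until both the message and its acknowledgement get through; `max_tries` only bounds the simulation. Note that if a message arrives but its acknowledgement is lost, the retransmission makes the receiver get a duplicate copy.

```python
import random


def lossy_transmit(rng: random.Random, loss_prob: float) -> bool:
    """One transmission over the simulated link: True if it arrives."""
    return rng.random() >= loss_prob


def send_until_acked(rng: random.Random, loss_prob: float, max_tries: int = 1000) -> int:
    """Return the number of transmissions needed until message and ack both arrive."""
    for attempt in range(1, max_tries + 1):
        message_arrived = lossy_transmit(rng, loss_prob)
        if message_arrived and lossy_transmit(rng, loss_prob):  # the ack also arrived
            return attempt
    raise TimeoutError("gave up (simulation bound reached)")


print(send_until_acked(random.Random(42), loss_prob=0.5))
```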
Model - Communication links
Link failures (cont.)

• The fair-loss link properties are:
  • Fair-loss: if a process p infinitely often sends a message m to a process q, and neither p nor q crashes, then q delivers m an infinite number of times.
  • Finite duplication: if p sends a message m to q a finite number of times, then m cannot be delivered an infinite number of times to q.
  • No creation: if a message m is delivered, then m was previously sent.

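Retransmission over a fair-loss link can deliver the same message several times. Using the unique message identifiers (proc_id, sec_num) from the model, the receiver can filter copies so that each message is delivered once; combined with retransmission, this yields a reliable link. A minimal sketch (the `seen` set grows without bound here, which a real implementation would have to address):

```python
from typing import Iterable, List, Set, Tuple

MsgId = Tuple[int, int]  # (proc_id, sec_num)


def deliver_once(arrivals: Iterable[Tuple[MsgId, str]]) -> List[str]:
    """Receiver side: deliver each message id once, dropping retransmitted copies."""
    seen: Set[MsgId] = set()
    delivered: List[str] = []
    for msg_id, payload in arrivals:
        if msg_id not in seen:
            seen.add(msg_id)
            delivered.append(payload)
    return delivered


print(deliver_once([((1, 1), "a"), ((1, 1), "a"), ((1, 2), "b")]))  # ['a', 'b']
```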
Model – Timing assumptions
Types of timing systems

• The lack of a global clock and the uncertainty in communication delays give rise to different types of timing systems.
• These timing systems are:
  • Asynchronous
  • Synchronous
  • Partially synchronous

Asynchronous system
Timing assumptions

• Processes: there is no upper bound on processing delays.
• Communication links: there is no upper bound on message transmission delays.
• More realistic, like the Internet.
• It is difficult or impossible to build algorithms for consensus, atomic broadcast or membership services.

Synchronous system
Timing assumptions

• Processes: there is a known upper bound on processing delays.
• Communication links: there is a known upper bound on message transmission delays.
• Less realistic: essentially only real-time systems.
• It is easy to detect process failures reliably.

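With known bounds, a timeout can be computed that never wrongly suspects a correct process. This minimal sketch assumes a process emits a heartbeat every `period` time units, that producing and transmitting each heartbeat take at most the known bounds, and that clock drift is ignored; then two consecutive heartbeats never arrive more than period + max_processing + max_transmission apart, so a longer silence proves a crash.

```python
def safe_timeout(period: float, max_processing: float, max_transmission: float) -> float:
    """Maximum gap between consecutive heartbeats from a correct process."""
    return period + max_processing + max_transmission


timeout = safe_timeout(period=1.0, max_processing=0.25, max_transmission=0.5)
print(timeout)  # 1.75
silence = 2.0
print(silence > timeout)  # True: the process has certainly crashed
```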
Partially Synchronous system
Timing assumptions
• Processes: there is an upper bound on processing delays, but it is unknown.
• Communication links: there is an upper bound on message transmission delays, but it is unknown.
• It is realistic.
• It is possible to detect process failures unreliably with adaptive timeouts.
• It is possible to implement consensus, atomic broadcast and membership services.

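A minimal sketch of an adaptive timeout. Whenever a suspected process turns out to be alive, the timeout is increased (doubled here, an assumed policy), so once it exceeds the unknown bound, correct processes are eventually no longer suspected. In the usage below, the real (unknown) silence of process q is 2.5 time units.

```python
from typing import Set


class AdaptiveDetector:
    def __init__(self, initial_timeout: float) -> None:
        self.timeout = initial_timeout
        self.suspects: Set[str] = set()

    def check(self, process: str, silence: float) -> bool:
        """Return whether `process`, silent for `silence` time units, is suspected."""
        if silence > self.timeout:
            self.suspects.add(process)
        return process in self.suspects

    def heard_from(self, process: str) -> None:
        if process in self.suspects:  # false suspicion: the timeout was too short
            self.suspects.discard(process)
            self.timeout *= 2


d = AdaptiveDetector(initial_timeout=1.0)
print(d.check("q", 2.5))  # True (wrong suspicion)
d.heard_from("q")         # timeout becomes 2.0
print(d.check("q", 2.5))  # True (still too short)
d.heard_from("q")         # timeout becomes 4.0
print(d.check("q", 2.5))  # False
```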
