2 Dist Arch (Week 2) Week 3 6

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 65

Architectures for Distributed

Systems
Chapter 2
Instructor : Dr. Farzana Jabeen
Recap : Goals of Distributed
Systems
❏ Four important goals to meet to build a
distributed system
❏ Make resource available

❏ Distribution transparency

❏ Scalability

❏ Pitfalls

2
Pitfalls

Peter Deutsch (Sun microsystem) formulated mistakes as


a false assumption developers makes

❏ network is reliable
❏ network is secure
❏ network is homogenous - topology does not change
❏ latency is zero
❏ bandwidth is infinite
❏ transport cost is zero
❏ there is one administrator
❏ https://blogs.oracle.com/developers/post/fallacie 3

s-of-distributed-systems
Today’s Lecture
Architecture of Distributed
systems
Distributed systems
● Virtually all large computer-based systems are
now distributed systems.

● Information processing is distributed over


several computers rather than confined to a
single machine.
Definitions
• Software Architectures – describe the
organization and interaction of software
components; focuses on logical organization of
software (component interaction, etc.)
• System Architectures - describe the
placement of software components on physical
machines
– The realization of an architecture may be centralized
(most components located on a single machine),
decentralized (most machines have approximately the
same functionality), or hybrid (some combination).
Architectural Styles
• An architectural style describes a particular way
to configure a collection of components and
connectors.
– Component - a module with well-defined interfaces;
reusable, replaceable
– Connector – communication link between modules
• Architectures suitable for distributed systems:
– Layered architectures*
– Object-based architectures*
– Data-centered architectures
– Event-based architectures
Architectural Styles

Object based is less structured


component = object
connector = RPC or RMI

Figure 2-1. The (a) layered architectural style & (b) The object-based
architectural style.
Data-Centered Architectures
• Main purpose: data access and update
• Processes interact by reading and modifying data
in some shared repository (active or passive)
– Traditional data base (passive): responds to requests
– Blackboard system (active): clients solve problems
collaboratively; system updates clients when
information changes.
• Another example: web-based distributed systems
where communication is through web services
(Ch 12)
Architectural Styles

• Communication via event Event-based arch.


supports several
propagation, in dist. systems
communication styles:
seen often in Publish/ Subscribe;
• Publish-subscribe
e.g., register interest in market
• Broadcast
info; get email updates
• Point-to-point
• Decouples sender & receiver;
asynchronous communication

• Figure 2-2. (a) The event-based architectural style


Architectural Styles (5)

Data Centric Architecture; e.g., shared


distributed file systems or Web-based
distributed systems
Combination of data-centered and event
based architectures
Processes communicate asynchronously
Figure 2-2. (b) The shared data-space architectural style.
Distribution Transparency
• An important characteristic of software
architectures in distributed systems is that they
are designed to support distribution
transparency.
• Transparency involves trade-offs
• Different distributed applications require different
solutions/architectures
– There is no “silver bullet” – no one-size-fits-all system.
(Compare NOW, Seti@home, Condor)
System Architectures for
Distributed Systems
• Centralized: traditional client-server structure
– Vertical (or hierarchichal) organization of communication and
control paths (as in layered software architectures)
– Logical separation of functions into client (requesting process) and
server (responder)
• Decentralized: peer-to-peer
– Horizontal rather than hierarchical comm. and control
– Communication paths are less structured; symmetric functionality
• Hybrid: combine elements of C/S and P2P
– Edge-server systems
– Collaborative distributed systems.
• Classification of a system as centralized or decentralized
refers to communication and control organization, primarily.
Traditional Client-Server
• Processes are divided into two groups
(clients and servers).
• Synchronous communication: request-
reply protocol
• In LANs, often implemented with a
connectionless protocol (unreliable)
• In WANs, communication is typically
connection-oriented TCP/IP (reliable)
– High likelihood of communication failures
C/S Architecture
C/S Architectures

Figure 2-3. General interaction between a client and a


server.
Transmission Failures
• With connectionless transmissions, failure
of any sort means no reply
• Possibilities:
– Request message was lost
– Reply message was lost
– Server failed either before, during or after
performing the service
• Can the client tell which of the above
errors took place?
Idempotency
• Typical response to lost request in
connectionless communication: re-transmission
• Consider effect of re-sending a message such
as “Increment X by 1000”
– If first message was acted on, now the operation has
been performed twice
• Idempotent operations: can be performed
multiple times without harm
– e.g., “Return current value of X”; check on availability
of a product
– Non-idempotent: “increment X”, order a product
Layered (software) Architecture for
Client-Server Systems
• User-interface level: GUI’s (usually) for
interacting with end users
• Processing level: data processing
applications – the core functionality
• Data level: interacts with data base or file
system
– Data usually is persistent; exists even if no
client is accessing it
– File or database system
Layered Architecture
Examples
• Web search engine
– Interface: type in a keyword string
– Processing level: processes to generate DB queries, rank replies,
format response
– Data level: database of web pages
• Stock broker’s decision support system
– Interface: likely more complex than simple search
– Processing: programs to analyze data; rely on statistics, AI
perhaps, may require large simulations
– Data level: DB of financial information
• Desktop “office suites”
– Interface: access to various documents, data,
– Processing: word processing, database queries, spreadsheets,…
– Data : file systems and/or databases
Application Layering

Figure 2-4. The simplified organization of an Internet


search engine into three different layers.
System Architecture
• Mapping the software architecture to
system hardware
– Correspondence between logical software
modules and actual computers
• Multi-tiered architectures
– Layer and tier are roughly equivalent terms,
but layer typically implies software and tier is
more likely to refer to hardware.
– Two-tier and three-tier are the most common
Two-tiered C/S Architectures
• Server provides processing and data
management; client provides simple graphical
display (thin-client)
– Perceived performance loss at client
– Easier to manage, more reliable, client machines
don’t need to be so large and powerful
• At the other extreme, all application processing
and some data resides at the client (fat-client
approach)
– Pro: reduces work load at server; more scalable
– Con: harder to manage by system admin, less secure
Multitiered Architectures

Thin Fat
Client Client

Figure 2-5. Alternative client-server organizations (a)–(e).


Three-tiered Architectures
• In some applications servers may also
need to be clients, leading to a three level
architecture
– Distributed transaction processing
– Web servers that interact with database
servers
• Distribute functionality across three levels
of machines instead of two.
Multitiered Architectures
(3 Tier Architecture)

Figure 2-6. An example of a server acting as client.


Centralized v Decentralized
Architectures
• Traditional client-server architectures exhibit
vertical distribution. Each level serves a
different purpose in the system.
– Logically different components reside on different
nodes
• Horizontal distribution (P2P): each node has
roughly the same processing capabilities and
stores/manages part of the total system data.
– Better load balancing, more resistant to denial-of-
service attacks, harder to manage than C/S
– Communication & control is not hierarchical; all about
equal
Distributed Computing
P2P Systems

Farzana Jabeen Farzana.jabeen{@}seecs.edu.pk Office:


A-104
National University of Sciences and Technology (NUST)
Outline

❏ Architectures
❏ Architecture Styles
❏ System Architecture
❏ Centralized Architecture
❏ Decentralized Architecture
❏ P2P network System
(Reading Assignment )

3
0
Peer-to-Peer Systems

❏ Centralized Database
❏ Napster

❏ Query Flooding
❏ Gnutella

❏ Intelligent Query Flooding


❏ KaZaA

31
Second Generation P2P System

❏ Lesson learned from Napster and Gnutella leads to a second


generation of P2P systems
❏ Chord
❏ Pastry
❏ Tapestry
❏ Kademlia
❏ KaZaA
❏ Bit-torrent
❏ CAN
32
Peer-to-Peer Systems

❏ Common Primitives:
❏ Join: how to I begin participating?
❏ Publish: how do I advertise my file?

❏ Search: how to I find a file?

❏ Fetch: how to I retrieve a file?

33
Decentralized Architecture
❏ Peer-to-peer architecture evolve how to organize process in an
Overlay Network
❏ Overlay network
❏ Is a computer network build on the top of another network
❏ Nodes in an overlay network can be thought of connected
by virtual or logical links
❏ Two types of overlay networks
❏ structured and Unstructured (Random connections)
Client Server – P2P Models
Overlay network
structured peer-to-peer architecture
● In a structured peer-to-peer architecture, the overlay network is constructed using a deterministic
procedure
● The most-used procedure is to organize the processes through a distributed hash table (DHT).
● distributed hash table (DHT)
● Data items or nodes are assigned a random key from a large identifier space, such as a 128-bit or
160-bit identifier
● The crux: uniquely maps the key of a data item to the identifier of a node based on some distance
metric
● Routing

○ when looking up a data item, the network address of the node responsible for that data item is returned
Outline

What is Chord?
Consistent Hashing
A Simple Key Lookup Algorithm
Scalable Key Lookup Algorithm
Node Joins and Stabilization
Node Failures
What is Chord?

In short: a peer-to-peer lookup system


Given a key (data item), it maps the key onto a
node (peer).
Uses consistent hashing to assign keys to nodes .
Solves problem of locating key in a collection of
distributed nodes.
Maintains routing information as nodes join and
leave the system
What is Chord? - Addressed Problems

Load balance: distributed hash function, spreading


keys evenly over nodes
Decentralization: chord is fully distributed, no node
more important than other, improves robustness
Scalability: logarithmic growth of lookup costs with
number of nodes in network, even very large
systems are feasible
Availability: chord automatically adjusts its internal
tables to ensure that the node responsible for a key
can always be found
Consistent Hashing

Consistent hash function assigns each node and key an


m-bit identifier.
SHA-1 is used as a base hash function.
A node’s identifier is defined by hashing the node’s IP
address.
A key identifier is produced by hashing the key (chord
doesn’t define this. Depends on the application).
ID(node) = hash(IP, Port)
ID(key) = hash(key)
Consistent Hashing

In an m-bit identifier space, there are 2m identifiers.


Identifiers are ordered on an identifier circle modulo 2m.
The identifier ring is called Chord ring.
Key k is assigned to the first node whose identifier is equal to or follows
(the identifier of) k in the identifier space.
This node is the successor node of key k, denoted by successor(k).
Routing in PEER-TO-PEER Systems - CHORD
Routing in PEER-TO-PEER Systems - CHORD
Routing in PEER-TO-PEER Systems - CHORD
Routing in PEER-TO-PEER Systems - CHORD
Routing in PEER-TO-PEER Systems - CHORD
Routing in PEER-TO-PEER Systems - CHORD
Routing in PEER-TO-PEER Systems - CHORD
Chord
Routing in PEER-TO-PEER Systems - CHORD
A node joins or leaves the system

❑ Joining

❑ generating a random identifier id

❑ the node can simply do a lookup on id

❑ insert itself in the ring

❑ each node also stores information on its predecessor

❑ each data item whose key is now associated with node id, is transferred from succ(id)
Routing in PEER-TO-PEER Systems - CHORD
Routing in PEER-TO-PEER Systems - CHORD
Cont..
❑ Leaving

❑ node id informs its departure to its predecessor and successor

❑ transfers its data items to succ(id)s organize themselves into an overlay network
Routing in
PEER-TO-PEER
Systems -
CHORD
Routing in
PEER-TO-
PEER
Systems -
CHORD
Chord Distance Function
distance(A, B) = B – A (for B >= A) distance(A, B) = B -
A + 2N (for B < A)

This can also be calculated as:


distance(A, B)=( B - A + 2N) mod 2N

For Instance:
distance(0, 11) = ( 11 - 0 + 16 ) mod 16 = (27) mod 16 =
11
distance(11, 0) = ( 0 - 11 + 16 ) mod 16 = (5) mod 16 = 5

You might also like