Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 41

Pastry: Scalable, decentralized

object location and routing


for large-scale peer-to-peer systems

Peter Druschel and Antony Rowstron

In the proceedings of IFIP/ACM International Conference


on Distributed Systems Platforms, 2001

June 7th, 2005


Seungjun Lee and Jung Gun Lee
Outline
 Background
 Pastry
 Node state
 Routing
 Self-organization and adaptation
 Locality
 Arbitrary node failures and network partitions
 Experiments
 Conclusions
Peer-to-peer (P2P) systems

 Distribution
 Decentralized control
 Self-organization
 Symmetry (communication, node roles)
Common Issues in P2P
 Organize, maintain overlay network

 Resource allocation/load balancing


Overlay network
 Resource location

 Network proximity routing

 Provide a generic P2P substrate


Physical network
Architecture

Event Network Web


notification storage Cache P2P application layer

P2P substrate
(self-organizing
Pastry, CAN, Chord
overlay network)

TCP/IP Internet
Distributed Hash Tables (DHT)
nodes
k1,v1 k2,v2 k3,v3

Operations: P2P
P2P
insert(k,v) overlay k4,v4
overlay
lookup(k) network
network

k5,v5 k6,v6

 P2P overlay maps keys to nodes


 completely decentralized and self-organizing
 robust, scalable
What is pastry ?

Let’s learn how to make


nice Pastry today.
Pastry …
Scalable, distributed object location and
routing substrate (underlying layer)

 Self-organizing overlay network

 Lookup/insert object in < log16 N routing steps

 O(log N) per-node state

 Network proximity routing


Basic Idea Behind Routing
 Each node is assigned a node ID

 Given a key,
the node efficiently routes the message to
the node whose nodeID is numerically closest
to the key
Sample NodeID Table
 Leaf Set
 stores numerically closest
nodeIDs

 Routing Table
 Total no of rows: total # of
bits in a nodeID
 Total entries in a row = base
number

 Neighborhood set
 stores closest nodes
according to proximity
metric
Node State : Leaf Set

 Serves as a fall back for routing table and contains:


 L/2 numerically closest and larger nodeIDs
 L/2 numerically closest and smaller nodeIDs

 Size of L is typically 2b or 2 x 2b

 Nodes in L are numerically close


Node State: Neighborhood Set (M)
 Contains the IP addresses and nodeIDs of closest
nodes according to proximity metric

 Size of |M| is typically 2b or 2x2b

 Not used in routing, but instead for maintaining


locality properties
Object Distribution
2128-1 O
Consistent hashing

• 128 bit circular ID space


objID
• Hash Function : SHA-1
• SHA-1(IP addr/Private Key)
 nodeId
nodeIDs
• SHA-1(Object Name)
 objID
Object Insertion/Lookup
2128-1 O
Message with key X is
routed to live node with
X nodeID closest to X

“Route” is to forward the


given message to the node
with nodeID numerically
closest to the key
Route(X)
Routing Procedure

(1) Node is in the leaf set

(2) Forward message to a


closer node (Better match)

(3) Forward towards


numerically closer
node (not a better
match)

D: Message Key
Li: ith closest NodeId in leaf set
shl(A, B): Length of prefix shared
by nodes A and B
R j: (j, i) entry of routing table
i th
Routing Table: 65a1fcx
Row 0 6

Row 1
5

Row 2
a

Row 3

1 log16 N
rows
Routing Example

d471f1 Properties:
d467c4
d462ba
d46a1c • O(log N)
d4213f routing table size

• O(log N)
Route(d46a1c) d13da3
message forwarding
steps

65a1fc
Self-organization
Initializing and maintaining routing tables and
leaf sets

 Node join

 Node departure (failure)


Node Join : Procedure
 X = new node, Z = numerically closest node,
A = bootstrap (A is close in proximity space to X)

 X sends a join message to A with target nodeId X

 A forwards to B C…

 Stops at Z, numerically closest to X’s nodeId

 In process, A,B,…,Z send their state tables to X


Node Join : State Tables
 X’s neighborhood set (NS) = A’s NS
 X’s leaf let = Z’s leaf set
 X’s routing table is filled as follows:
 X’s Row 0 = A’s row 0 (X0 = A0)
 X’s Row 1 = B’s row 1 (X1 = B1)
 …
 X sends its state to every node in its state
tables (Leaf set, neighborhood set, and routing
table)
Node Join : Example

New node X d471f1 Leaf Set


IP: 143.248.10.23 d467c4
SHA-1 d462ba
d46a1c R/T Row#3

nodeID: d46a1c
d4213f
R/T Row#2
X
JOIN

Route(d46a1c) d13da3
R/T Row#1

65a1fc
Neighborhood Set
R/T Row#0
Node Departure (Failure)
 Leaf set
 Detection: leaf set members exchange keep-alive
messages
 Repair: request set from farthest live node in set

 Routing Table
 Detection: when node attempts to contact the failed
node and there is no response
 Repair: get table from peers in the same row, then
higher rows
Node failure in routing table: example

try asking some


neighbor in the same
row for its 655x
failure entry

if it does not have one,


try asking some
neighbor in the row
below.
Locality in Pastry
 Based on proximity metric
(i.e., No. of IP hops, geographic distance)

 Proximity space is assumed to be Euclidean

 The route chosen for a message is likely to be


“good“ with respect to the proximity metric
Locality in Routing Tables
 To maintain the invariant when adding new nodes.

 X joins; A is close to X; X0 = A0
so locality holds in X’s routing table

 X1 = B1
Entries in B1 (row 1 of X) are close to B, but are they
necessarily close to X?
Locality in Routing Tables
 Entries of B1 are reasonable close to X Why?
 A is much closer to B than entry in B1 to B because every
time we choose from an exponentially decreasing set of
nodes

 To improve proximity approximation:


 X Queries nodes in routing table and neighborhood set for
their state

 Compares distances (from routing table entries) and update


route entries with closer nodes if found.
Route Locality
 At each routing step the message is moved closer to
the destination in the:
 nodeId space (numerically closer nodes)
 proximity space: message travels the least possible distance

 Given that:
 A message routed from A to B at a distance d cannot be
routed to a node with a distance of less than d from A.
(follows from routing procedure)
 Expected distance traveled increases exponentially

 Though shortest path is not guaranteed, we still get a


good route.
Locality among k nodes
 In some Pastry-based applications, object is replicated on k
nodes on its route (during insertion)

 In prefix-base routing: goal is to reach any of k numerically


closest nodes that has a copy of object

 May miss nearby nodes with different prefix

 Use heuristic to determine when close to k nearest nodes


 Based on density of nodeIds that store object; using local info
 Switch to numerically closest address
Arbitrary node failure
 Node continues to be responsive, but behaves
incorrectly or maliciously

 Repeated queries fail each time because they


normally take the same route

 How to solve it? Use randomized routing


 The choice among multiple nodes that satisfy the routing
criteria should be made randomly
Pastry: Experimental results
Prototype
 implemented in Java
 emulated network
 Up to 100K pastry nodes
 Quad-processor Compaq AlphaServer
 500MHz Alpha CPU
 6GB main memory
 UNIX
Average Number of Routing Hops

 The number of
route hops scale
with the size of
network.

b=4, |L|=16, |M|=32, 200k random queries


Number of Routing Hops (100K nodes)

 The maximum
route length is 5
(=ceil(log16 105))

b=4, |L|=16, |M|=32, 200k random queries


Number of Routing Hops (failures)
 Without repairs,
routing table
causes as
significant
deterioration of
route quality

 After the repair,


average number
of hops is slightly
higher than
before the
b=4, |L|=16, |M|=32, 200k random queries
failures
5K nodes, 500 failures
CAN: Berkeley

 Overlay:
 A virtual d-dimensional Coordinate space
 Each peer is responsible for a zone

 Routing:
 Each peer maintains info about neighbors
 Greedy algorithm for routing

 Performance Analysis:
 Expected: (d/4)(n1/d) steps for lookup
 d: dimension
Chord: MIT
 Overlay:
 Peers are organized in a ring
 Successor peer
 <k, value> is stored on the
successor of k

 Routing:
 Finger Table:
 More info for close part of the
ring
 Large jumps, then shorter jumps
 Resembling a binary search –
distance halving
Chord Routing – naïve approach

1
I’m node 2. 15
Please find key 14! 14 2

3
How will you find
the location of
12
node 14?
Can’t
we
Hint: Use successor id 10 accelerate
stored in each node
7
lookup?
m=4
Chord Routing – with finger table Start
3
Int.
[3,4)
node
3
Finger table is used to
Ask ‘node’ for keys within range ‘interval’ 4 [4,6) 7

1 6 [6,10) 7
15
10 [10,2) 10
I’m node 2. 14 2
14 ∈[10,2)
Please find key 14!
3

12

Start Int. node O(N)


11 [11,12) 12
12 [12,14) 12 10 Vs.
14
2
[14,2)
`
[2,10)
14
2 14 ∈[14,2)
7
O(log N)
m=4
CAN, CHORD, PASTRY Comparisons
CAN CHORD PASTRY
Initiated Berkeley, 2001 MIT, 2001 RICE & Microsoft,
2001
Type of Overlay d-dimensional Ring Topology Self-adjusting
Cartesian space proximity network
Routing Table Size 2d logN (2b -1)log2b N
Estimated Hop Count O(n1/d) O(log N) O(log2b N)
Upper limit on total Dimension of space Dimension of Size of key
number of nodes Ring
Fault tolerance Update Update Update
Load balancing Natural Natural Natural
Accuracy Exact Exact Exact
Applications NONE CFS PAST(storage),
(Cooperative SQUIRREL(web
File System) caching)
Evaluation of Discovery Algorithms
Conclusions
 Pastry is a generic P2P object location
and routing substrate

 Network proximity routing


 longest prefix with target ID
 numerically closest to target ID

 Scalable, self-organizing, fault-tolerant


 O(log N) routing steps
 O(log N) routing table size
References
[1] A. Rowstron and P. Druschel, “Pastry: Scalable, distributed
object location and routing for large-scale peer-to-peer
systems”, http://research.microsoft.com/~antr/Pastry/
[2] Peter Druschel, ”Scalable peer-to-peer substrates: A
new foundation for distributed applications?”,
http://www.cs.rice.edu/~druschel/comp413/lectures/
Pastry.ppt

[3] M. Kelaskar, V. Matossian, P. Mehra, D. Paul, and M. Parashar,


“A Study of Discovery Mechanisms for Peer-to-Peer
Applications”,
http://www.caip.rutgers.edu/TASSL/Papers/p2p-p2pws02-discov
ery.pdf

You might also like