Chord

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh,
Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes,
Robert E. Gruber
Google, Inc.
OSDI 2006
Introduction
 Dynamo stores objects associated with a key
through a simple interface:
 get(),put()
 It should be possible to scale Dynamo
incrementally
 This requires the ability to partition data
over the set of nodes (storage hosts)
 Dynamo relies on a concept called consistent
hashing
 The approach they used is similar to that found in
Chord.
Distributed Hash Tables (DHT)
 Operationally like standard hash tables
 Stores (key, value) pairs
 The key is like a filename
 The value can be file contents or pointer to
location
 Goal: Efficiently insert/lookup/delete
(key,value) pairs
 Each peer stores a subset of (key, value)
pairs in the system
DHT
 Core operation: Find node responsible for
a key
 Map key to node
 Efficiently route insert/lookup/delete request
to this node
 Allow for frequent node arrivals and
departures
DHT
 Introduce a hash function to map the object being searched for to a unique global identifier:
 e.g., h(“NGC’02 Tutorial Notes”) → 8045
 Distribute the range of the hash function among all nodes in the network
[Figure: nodes in the network, each labeled with the range of hash values it covers (0-999, 1000-1999, 1500-4999, 4500-6999, 7000-8500, 8000-8999, 9000-9500, 9500-9999); the object with ID 8045 falls within the ranges 7000-8500 and 8000-8999.]
 Each node must “know about” at least one copy of each object that hashes within its range (when one exists)
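As a rough sketch of the range idea above, the snippet below gives each node a contiguous slice of a 0-9999 identifier space and looks up the node responsible for an identifier. The node names and the exact boundaries are assumptions made for illustration; they form a non-overlapping partition, whereas the ranges in the figure overlap.

```python
from bisect import bisect_right

# Hypothetical partition of the ID space 0..9999 (not the figure's exact ranges).
# Node i covers IDs from UPPER_BOUNDS[i-1] (or 0) up to UPPER_BOUNDS[i] - 1.
UPPER_BOUNDS = [1000, 2000, 5000, 7000, 8000, 9000, 9500, 10000]
NODES = ["A", "B", "C", "D", "E", "F", "G", "H"]  # hypothetical node names


def responsible_node(object_id):
    """Return the node whose range contains object_id."""
    if not 0 <= object_id < UPPER_BOUNDS[-1]:
        raise ValueError("identifier outside the hash range")
    return NODES[bisect_right(UPPER_BOUNDS, object_id)]


# The slide's example object hashes to 8045, which falls in 8000-8999 here.
print(responsible_node(8045))
```

A sorted list of boundaries keeps the lookup at O(log n) comparisons, but it assumes every peer knows every range, which is exactly what the later slides try to avoid.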
DHT: Desirable Properties
 Key ID space (search space) is uniformly populated
 Mapping of keys to IDs using (consistent) hashing
 A node is responsible for indexing all the keys in a certain subspace of the ID space
 Nodes have only partial knowledge of other nodes’ responsibilities
 Messages should be routed to a node efficiently
(small number of hops)
 Node arrival/departure should only affect a few
nodes.
Consistent Hashing
 The main idea: map both keys and nodes
(node IPs) to the same (metric) ID space
The ring is just one possibility; any metric space will do.
Consistent Hashing
 With high probability, the hash function
balances load (all nodes receive roughly the
same number of keys).
 With high probability, when a node joins (or leaves) an N-node network, only an O(1/N) fraction of the keys are moved to a different location.
 This is clearly the minimum necessary to maintain a balanced load.
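The claim about key movement can be checked empirically. The sketch below hashes hypothetical node and key names onto a ring, adds one node, and counts how many keys change owner. The 32-bit ring size, the naming scheme, and the function names are all assumptions chosen for the experiment.

```python
import hashlib
from bisect import bisect_left

M = 32  # identifier bits used in this sketch (an assumption)


def ident(name):
    """Hash a string onto the 2**M identifier circle with SHA-1."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def owner(ring, key_id):
    """First node ID at or after key_id, wrapping around (ring is sorted)."""
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]


def moved_fraction(n_nodes, n_keys):
    """Fraction of keys whose owner changes when one more node joins."""
    ring = sorted({ident(f"node-{i}") for i in range(n_nodes)})
    keys = [ident(f"key-{j}") for j in range(n_keys)]
    before = [owner(ring, k) for k in keys]
    newcomer = ident("node-new")
    grown = sorted(set(ring) | {newcomer})
    after = [owner(grown, k) for k in keys]
    changed = [(b, a) for b, a in zip(before, after) if b != a]
    # Every key that moves lands on the newcomer, never on another old node.
    assert all(a == newcomer for _, a in changed)
    return len(changed) / n_keys


print(moved_fraction(n_nodes=50, n_keys=20000))
```

The printed fraction depends on where the new node happens to land; on average it is about 1/(N+1), and the internal assertion confirms that no key moves between two pre-existing nodes.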
Consistent Hashing
 The consistent hash function assigns each node
and key an m-bit identifier using SHA-1 as a base
hash function.
 A node’s identifier is chosen by hashing the node’s
IP address.
 A key identifier is produced by hashing the key.
 For more info see:
 D. R. Karger, E. Lehman, F. Leighton, M. Levine, D. Lewin, and R. Panigrahy, “Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the World Wide Web,” in Proc. 29th ACM Symp. Theory of Computing, El Paso, TX, May 1997, pp. 654–663.
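A minimal sketch of the identifier assignment, assuming the m-bit identifier is obtained by reducing the 160-bit SHA-1 digest modulo 2^m (the slides do not say how the digest is shortened). The function name `chord_id` and the default m = 6 are illustrative choices.

```python
import hashlib


def chord_id(text, m=6):
    """Reduce the SHA-1 digest of text to an m-bit identifier."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


node_id = chord_id("129.100.16.93")  # a node is identified by hashing its IP address
key_id = chord_id("LetItBe")         # a key is identified by hashing the key itself
print(node_id, key_id)
```

The identifiers printed here will generally differ from the illustrative values shown on the slides, since those were not necessarily produced with this truncation.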
P2P Middleware: Differences
 Different P2P middlewares differ in:
 The choice of the ID space
 The structure of their network of nodes (i.e.
how each node chooses its neighbors)
 For each object, node(s) whose range(s) cover
that object must be reachable via a “short”
path
 This is a major research topic
Chord
 m-bit identifier space for both keys and nodes
 Key identifier = SHA-1(key)
Key = “LetItBe” → SHA-1 → ID = 50
 Node identifier = SHA-1(IP address)
IP = “129.100.16.93” → SHA-1 → ID = 70
 How do we assign keys to nodes?
Chord
 Nodes organized in
an identifier circle
based on node
identifiers
 Keys assigned to their successor node in the identifier circle,
i.e., the first node whose ID is equal to or follows the key's ID.
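The successor rule can be sketched with a sorted list of node IDs. The ring below uses the node IDs that appear in the later join example (N8, N14, N21, N32, N38, N48, N56) and assumes a 6-bit identifier space; the original figure may contain further nodes.

```python
from bisect import bisect_left

M = 6                                   # identifier bits (assumed toy size)
NODES = [8, 14, 21, 32, 38, 48, 56]     # sorted node IDs from the join example


def successor(ident, nodes=NODES):
    """First node whose ID is equal to or follows ident on the circle."""
    i = bisect_left(nodes, ident % (2 ** M))
    return nodes[i % len(nodes)]        # wrap around past the highest ID


print(successor(50))   # key 50 is stored at N56
print(successor(60))   # key 60 wraps around to N8
```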
Chord
 Hash function
ensures even
distribution of
nodes and keys on
the circle
 Range covered by a node runs from just after
its predecessor's ID up to and including
its own ID
 Assume an N node
network
Chord: Search Possibilities
 Routing table size vs search cost
 Every peer knows every other peer: O(N)
routing table size
 Every peer knows its successor: O(N)
search time.
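 The “compromise” is to have each peer know m other peers (its fingers) at exponentially increasing distances around the circle: O(log N) routing table size and O(log N) search time.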
Finger Table
 Let m be the number of bits in the
key/node identifiers
 Each node, n, maintains a routing table with
at most m entries called the finger table.
 The ith entry in the table at node n contains the identity of the first node, s, that succeeds n by at least $2^{i-1}$ on the identifier circle:
$s = \mathrm{successor}(n + 2^{i-1})$ (arithmetic modulo $2^m$)
 s is called the ith finger of node n
Chord: Finger Table
Finger table: $\mathrm{finger}[i] = \mathrm{successor}(n + 2^{i-1})$, where $1 \le i \le m$
O(log N) table size
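The finger-table definition translates almost directly into code. This sketch assumes m = 6 and the node IDs used in the later join example; `finger_table` is a hypothetical helper name.

```python
from bisect import bisect_left

M = 6
SIZE = 2 ** M
NODES = [8, 14, 21, 32, 38, 48, 56]     # sorted node IDs (from the join example)


def successor(ident, nodes=NODES):
    """First node whose ID is equal to or follows ident on the circle."""
    i = bisect_left(nodes, ident % SIZE)
    return nodes[i % len(nodes)]


def finger_table(n, nodes=NODES):
    """Return [(start, node)] for i = 1..M, with start = (n + 2**(i-1)) mod 2**M."""
    table = []
    for i in range(1, M + 1):
        start = (n + 2 ** (i - 1)) % SIZE
        table.append((start, successor(start, nodes)))
    return table


for start, node in finger_table(21):
    offset = (start - 21) % SIZE
    print(f"N21+{offset} -> N{node}")
```

The table always has m rows, but several rows usually point at the same node, so the number of distinct nodes a peer knows grows roughly with log N.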

The Chord algorithm –
Scalable node localization
Chord: Search
 Assume node n is searching for key k.
 Node n does the following:
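 Find the ith table entry of node n such that $k \in [\mathrm{finger}[i].\mathrm{start},\ \mathrm{finger}[i+1].\mathrm{start})$, where $\mathrm{finger}[i].\mathrm{start} = (n + 2^{i-1}) \bmod 2^m$, and return the node in that entry
 If no such entry exists then return the node in the last entry of the finger table
 The above two steps are repeated at the returned node until the condition in the first step is satisfied.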
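The search can be simulated on a small ring. The sketch below uses the common formulation of Chord lookup (check whether the key lies between a node and its immediate successor, otherwise jump to the farthest finger that still precedes the key), which is not word for word the procedure on the slide. The ring, m = 6, and all function names are assumptions, and finger tables are recomputed from a global node list rather than stored per node.

```python
from bisect import bisect_left

M = 6
SIZE = 2 ** M
NODES = [8, 14, 21, 32, 38, 48, 56]     # sorted node IDs (from the join example)


def successor(ident):
    """First node whose ID is equal to or follows ident on the circle."""
    i = bisect_left(NODES, ident % SIZE)
    return NODES[i % len(NODES)]


def fingers(n):
    """Finger nodes of n for i = 1..M (fingers(n)[0] is n's successor)."""
    return [successor(n + 2 ** (i - 1)) for i in range(1, M + 1)]


def in_open(x, a, b):
    """True if x lies strictly between a and b going clockwise from a."""
    return 0 < (x - a) % SIZE < ((b - a) % SIZE or SIZE)


def lookup(start, key):
    """Return (responsible node, list of nodes visited)."""
    n, path = start, [start]
    while True:
        succ = fingers(n)[0]
        # key in (n, succ]: the immediate successor is responsible
        if in_open(key, n, succ) or key % SIZE == succ:
            return succ, path
        # otherwise jump to the farthest finger that still precedes the key
        n = next((f for f in reversed(fingers(n)) if in_open(f, n, key)), succ)
        path.append(n)


print(lookup(8, 54))
```

Each jump at least halves the remaining clockwise distance to the key, which is where the O(log N) hop count comes from.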
Chord: Join
 Nodes can join (and leave) at any time.
 Challenge: Preserving the ability to locate
every key in the network
 Chord must preserve the following:
 Each node’s successor is correctly maintained
 For every key k, node successor(k) is
responsible for k.
 For lookups to be fast, it is desirable for
the finger tables to be correct.
Chord: Join Implementation
 Each node in Chord maintains a
predecessor pointer.
 This consists of the Chord ID and IP address
of the immediate predecessor of that node.
 It can be used to walk counterclockwise around
the identifier circle.
 The new node to be added learns the identity of an existing Chord node by some external mechanism
Chord: Join Initialization Steps
 Assume n is the node to join.
 Find any existing node, n’.
 Find successor of n from n’. Label this
successor(n).
 Ask successor(n) for its predecessor. This
is labelled as predecessor(successor(n)).
Chord: Join Example
• Assume N26 wants to join; it finds N8
• N8’s finger table suggests that N26 will be “between” N21 and N32.
Chord: Join (Initialize Finger Table)
 Node n needs to have its finger table initialized
 Node n can ask its predecessor-to-be for its finger table as a starting point
Chord: Join (Changing Existing Finger Tables)
 Node n needs to be entered into the finger tables of some existing nodes.
 Node n becomes the ith finger of node p, iff
 p precedes n by at least $2^{i-1}$; and
 The ith finger of node p succeeds n.
 The first node, p, that satisfies these conditions is the immediate predecessor of $n - 2^{i-1}$
 For a given n, the algorithm starts with the ith finger of node n and then continues to walk in the counter-clockwise direction on the identifier circle until it encounters a node whose ith finger precedes n.
Chord: Join Example (add N26)
Start  | N21 (old finger table) | N21 (new finger table)
N21+1  | N32                    | N26
N21+2  | N32                    | N26
N21+4  | N32                    | N26
N21+8  | N32                    | N32
N21+16 | N38                    | N38
N21+32 | N56                    | N56
i=1: Does N21 precede N26 by at least 1 ($2^{i-1}$); yes: N21+1 becomes N26;
i=2: Does N21 precede N26 by at least 2; yes: N21+2 becomes N26;
i=3: Does N21 precede N26 by at least 4; yes: N21+4 becomes N26;
i=4: Does N21 precede N26 by at least 8; no; evaluate N14;
Chord: Join Example (add N26)
Start  | N14 (old finger table) | N14 (new finger table)
N14+1  | N21                    | N21
N14+2  | N21                    | N21
N14+4  | N21                    | N21
N14+8  | N32                    | N26
N14+16 | N32                    | N32
N14+32 | N48                    | N48
i=4: Does N14 precede N26 by at least 8; yes; N14+8 becomes N26
i=5: Does N14 precede N26 by at least 16; no; evaluate N8
Etc.
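The update rule from the join slides can be checked against the N21 and N14 tables. This is a sketch under the assumption of a 6-bit identifier space and a ring containing only the nodes named in the example. `updated_fingers` is a hypothetical helper that applies the two conditions directly to one node at a time, rather than walking the ring counter-clockwise as the protocol does.

```python
from bisect import bisect_left

M = 6
SIZE = 2 ** M
OLD_RING = [8, 14, 21, 32, 38, 48, 56]  # node IDs named in the example
NEW = 26                                # the joining node


def successor(ident, nodes):
    """First node whose ID is equal to or follows ident on the circle."""
    i = bisect_left(nodes, ident % SIZE)
    return nodes[i % len(nodes)]


def fingers(p, nodes):
    """Finger nodes of p for i = 1..M, computed from a full node list."""
    return [successor(p + 2 ** (i - 1), nodes) for i in range(1, M + 1)]


def updated_fingers(p, n, old):
    """n replaces finger i of p iff p precedes n by at least 2**(i-1)
    and the old i-th finger of p succeeds n."""
    gap = (n - p) % SIZE                   # clockwise distance from p to n
    result = []
    for i, f in enumerate(old, start=1):
        old_gap = (f - p) % SIZE or SIZE   # distance from p to its old finger
        result.append(n if 2 ** (i - 1) <= gap < old_gap else f)
    return result


for p in (21, 14):
    old = fingers(p, OLD_RING)
    print(f"N{p}: {old} -> {updated_fingers(p, NEW, old)}")
```

Applying the rule to every existing node gives the same tables as rebuilding all finger tables from scratch on the enlarged ring, which is a useful sanity check on the two conditions.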
Chord: Join (Transferring Keys)
 Move responsibility for all the keys for
which node n is the successor.
 Typically this involves moving data
associated with each key to the new node.
 Node n can become the successor only for keys that were previously the responsibility of the node immediately following n.
 Node n therefore only needs to contact that one node to transfer responsibility for all relevant keys.
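A minimal sketch of the key handoff, assuming each node keeps its keys in a dictionary indexed by key ID. The `Node` class, the helper names, and the stored key IDs are hypothetical; only the node IDs N21, N26, and N32 come from the join example.

```python
SIZE = 64  # 6-bit toy identifier space (an assumption)


class Node:
    def __init__(self, ident):
        self.id = ident
        self.store = {}  # key ID -> value


def in_half_open(x, a, b):
    """True if x lies in the arc (a, b] going clockwise from a."""
    return 0 < (x - a) % SIZE <= ((b - a) % SIZE or SIZE)


def transfer_keys(new_node, succ, pred_id):
    """Move the keys in (pred_id, new_node.id] from succ to new_node."""
    moving = [k for k in succ.store if in_half_open(k, pred_id, new_node.id)]
    for k in moving:
        new_node.store[k] = succ.store.pop(k)
    return moving


n32 = Node(32)
n32.store = {22: "a", 26: "b", 27: "c", 31: "d"}  # hypothetical keys held by N32
n26 = Node(26)
moved = transfer_keys(n26, n32, pred_id=21)       # N26 joins between N21 and N32
print(sorted(moved), sorted(n32.store))
```

Only N32 is contacted: keys 22 and 26 move to N26, while 27 and 31 stay where they were.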
Chord: Join
 The previous discussion on join focuses on a
single node join.
 What if there are multiple node joins?
 Join requires that each node’s successor is
correctly maintained
Chord: Stabilization Protocol
 The successor/predecessor links are
rebuilt by periodic stabilize notification
messages
 Sent by each node to its successor to inform it
of the (possibly new) identity of the
predecessor
 The successor pointers are used to verify
and correct finger table entries.
Chord: Join/Stabilize Example
• N26 joins the system
• N26 acquires N32 as its successor
• N26 notifies N32
• N32 acquires N26 as its predecessor
• N26 copies keys
• N21 runs stabilize() and asks its successor N32 for its predecessor, which is N26.
• N21 acquires N26 as its successor
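The join/stabilize sequence above can be replayed in a few lines. This sketch assumes a toy ring that initially contains only N21 and N32 and omits key copying and finger tables. The method names `join`, `stabilize`, and `notify` are chosen for this sketch, as is the `Node` class.

```python
SIZE = 64  # 6-bit toy identifier space (an assumption)


def between(x, a, b):
    """True if x lies strictly inside the arc (a, b) going clockwise."""
    return 0 < (x - a) % SIZE < ((b - a) % SIZE or SIZE)


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def join(self, succ):
        """Start with a known successor; the predecessor is learned later."""
        self.predecessor = None
        self.successor = succ

    def stabilize(self):
        """Ask the successor for its predecessor and adopt it if it is closer."""
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x
        self.successor.notify(self)

    def notify(self, candidate):
        """candidate believes it might be our predecessor."""
        if self.predecessor is None or between(
            candidate.id, self.predecessor.id, self.id
        ):
            self.predecessor = candidate


n21, n32, n26 = Node(21), Node(32), Node(26)
n21.successor, n32.predecessor = n32, n21   # existing link N21 -> N32
n32.successor, n21.predecessor = n21, n32   # toy two-node ring closes back to N21
n26.join(n32)      # N26 acquires N32 as its successor
n26.stabilize()    # N26 notifies N32, which adopts N26 as its predecessor
n21.stabilize()    # N21 learns of N26 from N32 and adopts it as its successor
print(n21.successor.id, n26.successor.id, n32.predecessor.id)
```

Until N21 runs its own stabilize step, its successor pointer still skips N26, which is why lookups can briefly miss data during a join.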

Chord Stabilization
 Pointers and finger tables may be in a state
of flux
 Is it possible that data will not be found?
 Yes
 Recovery: try again

Chord: Node Failure
[Figure: identifier circle with nodes N10, N80, N85, N102, N113 and N120, and a Lookup(90) request.]
N80 doesn’t know correct successor, so incorrect lookup

Chord: Node Failure
 Solution: Use successor lists
 Each node knows r immediate successors
 After failure, will know first live successor
 Stabilize messages correct finger tables
 Replicas of the data associated with a key at the r
successor nodes might be used
 Application dependent
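A small sketch of how a successor list is used after a failure. It takes the node IDs from the failure figure, assumes r = 3, and assumes that N85 and N102 are the nodes that have failed; `first_live_successor` is a hypothetical helper name.

```python
def first_live_successor(successor_list, alive):
    """Return the first entry of the successor list that is still alive."""
    for node in successor_list:
        if node in alive:
            return node
    raise RuntimeError("all r successors have failed")


# N80's successor list with r = 3, in ring order.
n80_successors = [85, 102, 113]
alive = {10, 80, 113, 120}   # assumption: N85 and N102 have failed

print(first_live_successor(n80_successors, alive))
```

With these assumptions, Lookup(90) at N80 resolves to N113, the first live node that follows key 90. Whether N113 actually holds the data depends on the replication policy.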
Chord Properties
 In a system with N nodes and K keys, with high probability…
 each node is responsible for at most (1 + ε)K/N keys
 each node maintains info. about O(log N) other nodes
 lookups resolved with O(log N) hops
 Node insertions require $O(\log^2 N)$ messages
 The developers of Chord validated this through
simulation studies.
 No consistency among replicas
 Hops have poor network locality
Chord: Network Locality
 Nodes close on ring can be far in the
network.
[Figure: network map of hosts (vu.nl, Lulea.se, OR-DSL, CMU, MIT, MA-Cable, Cisco, CA-T1, Cornell, CCI, NYU, Aros, Utah) with Chord nodes N20, N40, N41 and N80 marked.]
* Figure from http://project-iris.net/talks/dht-toronto-03.ppt
