Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

Hashing (4)

Cuckoo Hashing

1
Cuckoo Hashing
A simple scheme for resolving collisions in a hash table
Guaranteed constant time lookup
Expected constant time insertion
Requires stronger assumption for hash functions
We will work with Uniform Hashing Assumption
We will present a simplified version of the scheme, and
a simplified analysis

2
We have a table T[0 . . m − 1] of m slots
Each slot is either null or contains a single data item
Data items are hashed using two hash functions:
h1 , h2 : U → {0, . . . , m − 1}
We model each hash function as a truly random
function from U to {0, . . . , m − 1}
Any data item  ∈ U resides in one of two slots: either
h1 () or h2 ()
Lookup procedure:
test if T[h1 ()] =  or T[h2 ()] = 
=⇒ guaranteed constant time lookup

3
Procedure to insert a new data item 
let n = # items already in the table
if T[h1 ()] =  or T[h2 ()] =  then
return success // already in table
pos ← h1 ()
repeat n times
if T[pos] = null then
T[pos] ← 
return success // found an empty slot
swap  and T[pos]
pos ← h1 () + h2 () − pos
// ’s alternate position
return failure
// need to rehash

4
The cuckoo graph:
Each slot is a vertex
Each data item  adds a random edge
e = {h1 (), h2 ()} to the graph
• undirected graph
• possibly a multi-graph — repeated edges

5
inserts x in position h1 (x).
Arrows show alternate position of each
procedure insert(x)
A item
if T [h (x)] = x or T [h (x)] = x then return;
0 1 2

C
pos ← h1 (x);
1 loop n times { A B
Insert
if TZ[pos]
at =position
NULL then0:
{ T0[pos]
→3 ←→ 8
x; return};
2 x ↔ T [pos];
B if pos= h1 (x) then pos← h2 (x) else pos← h1 (x);}
3 Norehash(); =⇒ no problem!
cycle insert(x)
H end
4

5
P
Insert
Figure 2. Cuckoo h1 (Z)insertion
Z at hashing = 7 (h 2 (Z) =and
procedure 1):illustration.
The arrows W H alternative
Z Cposition of each item in the dic-
6
7→4→7→1→2
show the
tionary. A new item would be inserted in the position of A by
W moving A to its alternative position, currently occupied by B,
7
andInsert
moving BZtoat 1 (Z) = 7
itshalternative (h2 (Z)
position which=is4):
currently va-
8 W ofHa newZitem W
cant. Insertion in the Hposition of H would not
succeed: 7Since
→4 H→ 7→
is part of a4cycle
→ 7(together
→ 4 . .with
. W), the new
9 only
item would two slots
get kicked for three items =⇒
out again.
failure
3.1 Examples and discussion
Figure 2 shows an example of the cuckoo graph, which is a graph that has an
edge for each item in the dictionary, connecting the two alternative positions of 6
Lessons learned:
• If a new item is inserted at slot s, and there is no
cycle in the graph reachable from s, then insertions
will succeed
• In particular: if there are no cycles, insertion will
succeed
• Even if there are cycles, insertion may succeed:
◦ The exact characterization of failure is a bit more
complicated

7
Analyzing the probability of insertion failure
We will show that if α := n/ m (the “load factor”) is at
most 1/ 4, then the probability that inserting n items
into a table with m slots ends in failure is at most
3/ 4
How? Compute bound on probability p of a cycle in a
multi-graph with m vertices (slots) and n random
edges (data items) e1 , . . . , en
Strategy: for each k = 1, 2, 3, . . . , estimate probability
pk that graph contains a simple cycle of length k
Union bound: p ≤ k≥1 pk
P

NOTE: a more careful analysis shows failure probability


is much smaller: O(1/ m)

8
Typical case: p3 := probability of a 3-cycle:
s1

s0 s2

≤ m3 ways to pick s0 , s1 , s2 , but we count the same


cycle 3 times
∴ # of 3-cycles: ≤ m3 / 3
Probability that
(e1 , e2 , e3 ) = ({s0 , s1 }, {s1 , s2 }, {s2 , s0 })
is (2/ m2 )3
# of triples 1 , 2 , 3 : ≤ n3
∴ p3 ≤ (m3 )/ 3 × (2/ m2 )3 × n3 = (2n/ m)3 / 3

9
The general case (exercise):
(2α)k
pk ≤ , where α := n/ m
k | {z }
load factor

Therefore,
X ∞ (2α)k
X
‚
1
Œ
p≤ pk ≤ = ln
k≥1 k=1
k 1 − 2α

Wolfram Alpha says:  ≤ 1/ 2 =⇒ ln(1/ (1 − )) ≤ 3/ 4


Implication:
α ≤ 1/ 4 =⇒ failure probability ≤ 3/ 4

10
Building a cuckoo hash table
Suppose we attempt to insert n distinct items
1 , . . . , n items into an empty hash table, and stop
when an insertion fails
For r = 1 . . n, let Xr be the number of swaps performed
when we attempt to insert r
Note: Xr = 0 if the insertion procedure fails on one of
1 , . . . , r−1
Assume that α := n/ m ≤ 1/ 4
Claim: E[Xr ] ≤ 3/ 2
It follows that
• Expected cost of attempting to insert n items: O(n)
• Probability that such an attempt succeeds: ≥ 1/ 4
• Expected number of attempts until success: ≤ 4
• Expected cost of building a table: O(n)
11
Claim: E[Xr ] ≤ 3/ 2
Proof:
Suppose the h1 (r ) = s0 and consider the cuckoo graph
corresponding to items 1 , . . . , r−1
We have
n
X
E[Xr ] = Pr[Xr ≥ k]
k=1

If Xr ≥ k, then in the cuckoo graph: either


(i) there is a simple path of length k starting at s0 :
s0 → s1 → · · · → sk ,
or
(ii) there is a simple loop starting at s0 :
s0 → · · · → sℓ−1 → sj (j < ℓ)
12
Let’s estimate the probability qk that there is a simple
path of length k starting at s0 :
s0 → s1 → · · · → sk

# of choices for s1 , . . . , sk : ≤ mk
Probability that
(e1 , . . . , ek ) = ({s0 , s1 }, . . . , {sk−1 , sk })
is (2/ m2 )k
# of tuples 1 , . . . , k : ≤ nk
Therefore,
qk ≤ mk × (2/ m2 )k × nk = (2n/ m)k = (2α)k
≤ 2−k (since α ≤ 1/ 4)

13
Let’s estimate the probability q̃ℓ that there is a simple
loop of length ℓ starting at s0
s0 → · · · → sℓ−1 → sj (j < ℓ)

Homework:
ℓ(2α)ℓ
q̃ℓ ≤
m
Let q̃ := probability of any simple loop starting at s0 :
1 X∞ 1 2α
ℓ(2α)ℓ =
X
q̃ ≤ q̃ℓ ≤ ·
ℓ≥1 m ℓ=1 m (1 − 2α)2
2
≤ (since α ≤ 1/ 4)
m

14
Putting it all together:
n
X n
X
E[Xr ] = Pr[Xr ≥ k] ≤ (2−k + 2/ m)
k=1 k=1
≤ 1 + 2n/ m = 3/ 2

15

You might also like