Dictionary ADT: Dictionaries 4/1/2003 8:43 AM

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 4

Dictionaries 4/1/2003 8:43 AM

Dictionary ADT (§8.1.1)


The dictionary ADT models a Dictionary ADT methods:
searchable collection of key- find(k): if the dictionary has
Dictionaries and Hash Tables
„
element items an item with key k, returns
The main operations of a the position of this element,
dictionary are searching, else, returns a null position.
inserting, and deleting items „ insertItem(k, o): inserts item
0 ∅ Multiple items with the same (k, o) into the dictionary
1 025-612-0001 key are allowed „ removeElement(k): if the
2 981-101-0002 Applications: dictionary has an item with
3 ∅ „ address book key k, removes it from the
4 451-229-0004 „ credit card authorization dictionary and returns its
„ mapping host names (e.g., element. An error occurs if
cs16.net) to internet addresses there is no such element.
(e.g., 128.148.34.101) „ size(), isEmpty()
„ keys(), Elements()

Dictionaries and Hash Tables 1 Dictionaries and Hash Tables 2

Hash Functions and


Log File (§8.1.2) Hash Tables (§8.2)
A log file is a dictionary implemented by means of an unsorted
sequence A hash function h maps keys of a given type to
„ We store the items of the dictionary in a sequence (based on a integers in a fixed interval [0, N − 1]
doubly-linked lists or a circular array), in arbitrary order
Example:
Performance:
„ insertItem takes O(1) time since we can insert the new item at the
h(x) = x mod N
beginning or at the end of the sequence is a hash function for integer keys
„ find and removeElement take O(n) time since in the worst case The integer h(x) is called the hash value of key x
(the item is not found) we traverse the entire sequence to look for
an item with the given key
The log file is effective only for dictionaries of small size or for A hash table for a given key type consists of
dictionaries on which insertions are the most common „ Hash function h
operations, while searches and removals are rarely performed
(e.g., historical record of logins to a workstation) „ Array (called table) of size N

When implementing a dictionary with a hash table,


the goal is to store item (k, o) at index i = h(k)
Dictionaries and Hash Tables 3 Dictionaries and Hash Tables 4

Example Hash Functions (§8.2.2)


We design a hash table for 0 ∅
a dictionary storing items 1 025-612-0001 A hash function is The hash code map is
(SSN, Name), where SSN 2 981-101-0002
usually specified as the applied first, and the
3 ∅ compression map is
(social security number) is a 4 451-229-0004 composition of two
nine-digit positive integer applied next on the
functions: result, i.e.,

Our hash table uses an


9997 ∅ Hash code map: h(x) = h2(h1(x))
array of size N = 10,000 and
the hash function
9998 200-751-9998 h1: keys → integers The goal of the hash
9999 ∅
function is to
h(x) = last four digits of x Compression map:
“disperse” the keys in
h2: integers → [0, N − 1] an apparently random
way
Dictionaries and Hash Tables 5 Dictionaries and Hash Tables 6

1
Dictionaries 4/1/2003 8:43 AM

Hash Code Maps (§8.2.3) Hash Code Maps (cont.)


Memory address: Component sum: Polynomial accumulation: Polynomial p(z) can be
„ We reinterpret the memory We partition the bits of
address of the key object as
„
„ We partition the bits of the evaluated in O(n) time
the key into components key into a sequence of
an integer of fixed length (e.g., 16 components of fixed length
using Horner’s rule:
„ Good in general, except for or 32 bits) and we sum (e.g., 8, 16 or 32 bits) „ The following
numeric and string keys the components a0 a1 … an−1 polynomials are
Integer cast: (ignoring overflows) „ We evaluate the polynomial successively computed,
„ We reinterpret the bits of the „ Suitable for numeric keys p(z) = a0 + a1 z + a2 z2 + … each from the previous
key as an integer of fixed length greater … + an−1zn−1 one in O(1) time
„ Suitable for keys of length than or equal to the at a fixed value z, ignoring p0(z) = an−1
less than or equal to the number of bits of the overflows pi (z) = an−i−1 + zpi−1(z)
number of bits of the integer integer type (e.g., long Especially suitable for strings
„ (i = 1, 2, …, n −1)
type (e.g., char, short, int (e.g., the choice z = 33 gives
and double on many
and float on many machines) at most 6 collisions on a set We have p(z) = pn−1(z)
machines)
of 50,000 English words)
Dictionaries and Hash Tables 7 Dictionaries and Hash Tables 8

Compression Collision Handling


Maps (§8.2.4) (§8.2.5)
Division: Multiply, Add and Collisions occur when 0 ∅
1 025-612-0001
„ h2 (y) = y mod N Divide (MAD): different elements are 2 ∅
„ The size N of the „ h2 (y) = (ay + b) mod N mapped to the same 3 ∅
hash table is usually cell 4 451-229-0004 981-101-0004
„ a and b are
chosen to be a prime nonnegative integers Chaining: let each
The reason has to do such that
„
cell in the table point Chaining is simple,
with number theory a mod N ≠ 0
and is beyond the to a linked list of but requires
„ Otherwise, every
scope of this course integer would map to elements that map additional memory
the same value b there outside the table

Dictionaries and Hash Tables 9 Dictionaries and Hash Tables 10

Linear Probing Search with Linear Probing


Open addressing: the Example: Consider a hash table A Algorithm find(k)
colliding item is placed in a i ← h(k)
different cell of the table „ h(x) = x mod 13 that uses linear probing p←0
Linear probing handles „ Insert keys 18, 41, find(k) repeat
collisions by placing the We start at cell h(k) c ← A[i]
colliding item in the next 22, 44, 59, 32, 31, „
if c = ∅
(circularly) available table cell 73, in this order „ We probe consecutive
return Position(null)
Each table cell inspected is locations until one of the
else if c.key () = k
referred to as a “probe” following occurs
return Position(c)
Colliding items lump together, Š An item with key k is
else
causing future collisions to 0 1 2 3 4 5 6 7 8 9 10 11 12 found, or
i ← (i + 1) mod N
cause a longer sequence of Š An empty cell is found,
p←p+1
probes or
until p = N
41 18 44 59 32 22 31 73 Š N cells have been
return Position(null)
0 1 2 3 4 5 6 7 8 9 10 11 12 unsuccessfully probed

Dictionaries and Hash Tables 11 Dictionaries and Hash Tables 12

2
Dictionaries 4/1/2003 8:43 AM

Updates with Linear Probing Double Hashing


To handle insertions and insertItem(k, o) Double hashing uses a
deletions, we introduce a secondary hash function Common choice of
„ We throw an exception
special object, called if the table is full d(k) and handles compression map for the
AVAILABLE, which replaces
deleted elements „ We start at cell h(k) collisions by placing an secondary hash function:
removeElement(k) „ We probe consecutive item in the first available d2(k) = q − k mod q
„ We search for an item with
cells until one of the cell of the series where
key k following occurs (i + jd(k)) mod N
Š A cell i is found that is q<N
for j = 0, 1, … , N − 1
„
„ If such an item (k, o) is
either empty or stores q is a prime
found, we replace it with the „

special item AVAILABLE


AVAILABLE, or The secondary hash
and we return the position of Š N cells have been function d(k) cannot have The possible values for
this item unsuccessfully probed
zero values d2(k) are
Else, we return a null „ We store item (k, o) in 1, 2, … , q
„
position cell i The table size N must be
a prime to allow probing
of all the cells
Dictionaries and Hash Tables 13 Dictionaries and Hash Tables 14

Performance of
Example of Double Hashing Hashing
k h (k ) d (k ) Probes In the worst case, searches,
Consider a hash 18 5 3 5 insertions and removals on a The expected running
table storing integer 41 2 1 2 hash table take O(n) time time of all the dictionary
22 9 6 9
keys that handles 44 5 5 5 10
The worst case occurs when ADT operations in a
collision with double 59 7 4 7 all the keys inserted into the hash table is O(1)
32 6 3 6 dictionary collide
hashing In practice, hashing is
31 5 4 5 9 0 The load factor α = n/N
„ N = 13 73 8 4 8 affects the performance of a
very fast provided the
„ h(k) = k mod 13 hash table load factor is not close
„ d(k) = 7 − k mod 7 Assuming that the hash to 100%
0 1 2 3 4 5 6 7 8 9 10 11 12 values are like random Applications of hash
Insert keys 18, 41, numbers, it can be shown tables:
22, 44, 59, 32, 31, that the expected number of
„ small databases
73, in this order 31 41 18 32 59 73 22 44 probes for an insertion with
open addressing is „ compilers
0 1 2 3 4 5 6 7 8 9 10 11 12
1 / (1 − α) „ browser caches
Dictionaries and Hash Tables 15 Dictionaries and Hash Tables 16

Universal Hashing Proof of Universality (Part 1)


Let f(k) = ak+b mod p
So a(j-k) is a multiple of p
Let g(k) = k mod N But both are less than p
A family of hash functions
So h(k) = g(f(k)). So a(j-k) = 0. I.e., j=k.
is universal if, for any
0<i,j<M-1, f causes no collisions: (contradiction)
Theorem: The set of „ Let f(k) = f(j).
Pr(h(j)=h(k)) < 1/N. Thus, f causes no collisions.
all functions, h, as
Choose p as a prime „ Suppose k<j. Then

between M and 2M.


defined here, is  aj + b   ak + b 
universal. aj + b −   p = ak + b −  p  p
Randomly select 0<a<p  p   
and 0<b<p, and define   aj + b   ak + b  
h(k)=(ak+b mod p) mod N a ( j − k ) =   −   p
  p   p 

Dictionaries and Hash Tables 17 Dictionaries and Hash Tables 18

3
Dictionaries 4/1/2003 8:43 AM

Proof of Universality (Part 2)


If f causes no collisions, only g can make h cause collisions.
Fix a number x. Of the p integers y=f(k), different from x, the number
such that g(y)=g(x) is at most  p / N  − 1
Since there are p choices for x, the number of h’s that will cause a
collision between j and k is at most
p ( p − 1)
p ( p / N  − 1) ≤
N
There are p(p-1) functions h. So probability of collision is at most
p( p − 1) / N 1
=
p( p − 1) N

Therefore, the set of possible h functions is universal.

Dictionaries and Hash Tables 19

You might also like