Dictionary ADT: Dictionaries 4/1/2003 8:43 AM

Dictionaries 4/1/2003 8:43 AM
Dictionary ADT (§8.1.1)

The dictionary ADT models a Dictionary ADT methods:
searchable collection of key- find(k): if the dictionary has
Dictionaries and Hash Tables

element items an item with key k, returns
The main operations of a the position of this element,
dictionary are searching, else, returns a null position.
inserting, and deleting items insertItem(k, o): inserts item
0 ∅ Multiple items with the same (k, o) into the dictionary
1 025-612-0001 key are allowed removeElement(k): if the
2 981-101-0002 Applications: dictionary has an item with
3 ∅ address book key k, removes it from the
4 451-229-0004 credit card authorization dictionary and returns its
mapping host names (e.g., element. An error occurs if
cs16.net) to internet addresses there is no such element.
(e.g., 128.148.34.101) size(), isEmpty()
keys(), Elements()
Dictionaries and Hash Tables 1 Dictionaries and Hash Tables 2
Hash Functions and

Log File (§8.1.2) Hash Tables (§8.2)
A log file is a dictionary implemented by means of an unsorted
sequence A hash function h maps keys of a given type to
We store the items of the dictionary in a sequence (based on a integers in a fixed interval [0, N − 1]
doubly-linked lists or a circular array), in arbitrary order
Example:
Performance:
insertItem takes O(1) time since we can insert the new item at the
h(x) = x mod N
beginning or at the end of the sequence is a hash function for integer keys
find and removeElement take O(n) time since in the worst case The integer h(x) is called the hash value of key x
(the item is not found) we traverse the entire sequence to look for
an item with the given key
The log file is effective only for dictionaries of small size or for A hash table for a given key type consists of
dictionaries on which insertions are the most common Hash function h
operations, while searches and removals are rarely performed
(e.g., historical record of logins to a workstation) Array (called table) of size N
When implementing a dictionary with a hash table,

the goal is to store item (k, o) at index i = h(k)
Example Hash Functions (§8.2.2)

We design a hash table for 0 ∅
a dictionary storing items 1 025-612-0001 A hash function is The hash code map is
(SSN, Name), where SSN 2 981-101-0002
usually specified as the applied first, and the
3 ∅ compression map is
(social security number) is a 4 451-229-0004 composition of two
nine-digit positive integer applied next on the
functions: result, i.e.,
…
Our hash table uses an

9997 ∅ Hash code map: h(x) = h2(h1(x))
array of size N = 10,000 and
the hash function
9998 200-751-9998 h1: keys → integers The goal of the hash
9999 ∅
function is to
h(x) = last four digits of x Compression map:
“disperse” the keys in
h2: integers → [0, N − 1] an apparently random
way
1
Hash Code Maps (§8.2.3) Hash Code Maps (cont.)

Memory address: Component sum: Polynomial accumulation: Polynomial p(z) can be
We reinterpret the memory We partition the bits of
address of the key object as

We partition the bits of the evaluated in O(n) time
the key into components key into a sequence of
an integer of fixed length (e.g., 16 components of fixed length
using Horner’s rule:
Good in general, except for or 32 bits) and we sum (e.g., 8, 16 or 32 bits) The following
numeric and string keys the components a0 a1 … an−1 polynomials are
Integer cast: (ignoring overflows) We evaluate the polynomial successively computed,
We reinterpret the bits of the Suitable for numeric keys p(z) = a0 + a1 z + a2 z2 + … each from the previous
key as an integer of fixed length greater … + an−1zn−1 one in O(1) time
Suitable for keys of length than or equal to the at a fixed value z, ignoring p0(z) = an−1
less than or equal to the number of bits of the overflows pi (z) = an−i−1 + zpi−1(z)
number of bits of the integer integer type (e.g., long Especially suitable for strings
(i = 1, 2, …, n −1)
type (e.g., char, short, int (e.g., the choice z = 33 gives
and double on many
and float on many machines) at most 6 collisions on a set We have p(z) = pn−1(z)
machines)
of 50,000 English words)
Compression Collision Handling

Maps (§8.2.4) (§8.2.5)
Division: Multiply, Add and Collisions occur when 0 ∅
1 025-612-0001
h2 (y) = y mod N Divide (MAD): different elements are 2 ∅
The size N of the h2 (y) = (ay + b) mod N mapped to the same 3 ∅
hash table is usually cell 4 451-229-0004 981-101-0004
a and b are
chosen to be a prime nonnegative integers Chaining: let each
The reason has to do such that

cell in the table point Chaining is simple,
with number theory a mod N ≠ 0
and is beyond the to a linked list of but requires
Otherwise, every
scope of this course integer would map to elements that map additional memory
the same value b there outside the table
Linear Probing Search with Linear Probing

Open addressing: the Example: Consider a hash table A Algorithm find(k)
colliding item is placed in a i ← h(k)
different cell of the table h(x) = x mod 13 that uses linear probing p←0
Linear probing handles Insert keys 18, 41, find(k) repeat
collisions by placing the We start at cell h(k) c ← A[i]
colliding item in the next 22, 44, 59, 32, 31,
if c = ∅
(circularly) available table cell 73, in this order We probe consecutive
return Position(null)
Each table cell inspected is locations until one of the
else if c.key () = k
referred to as a “probe” following occurs
return Position(c)
Colliding items lump together, An item with key k is
else
causing future collisions to 0 1 2 3 4 5 6 7 8 9 10 11 12 found, or
i ← (i + 1) mod N
cause a longer sequence of An empty cell is found,
p←p+1
probes or
until p = N
41 18 44 59 32 22 31 73 N cells have been
return Position(null)
0 1 2 3 4 5 6 7 8 9 10 11 12 unsuccessfully probed
2
Updates with Linear Probing Double Hashing

To handle insertions and insertItem(k, o) Double hashing uses a
deletions, we introduce a secondary hash function Common choice of
We throw an exception
special object, called if the table is full d(k) and handles compression map for the
AVAILABLE, which replaces
deleted elements We start at cell h(k) collisions by placing an secondary hash function:
removeElement(k) We probe consecutive item in the first available d2(k) = q − k mod q
We search for an item with
cells until one of the cell of the series where
key k following occurs (i + jd(k)) mod N
A cell i is found that is q<N
for j = 0, 1, … , N − 1

If such an item (k, o) is
either empty or stores q is a prime
found, we replace it with the
special item AVAILABLE

AVAILABLE, or The secondary hash
and we return the position of N cells have been function d(k) cannot have The possible values for
this item unsuccessfully probed
zero values d2(k) are
Else, we return a null We store item (k, o) in 1, 2, … , q

position cell i The table size N must be
a prime to allow probing
of all the cells
Performance of
Example of Double Hashing Hashing
k h (k ) d (k ) Probes In the worst case, searches,
Consider a hash 18 5 3 5 insertions and removals on a The expected running
table storing integer 41 2 1 2 hash table take O(n) time time of all the dictionary
22 9 6 9
keys that handles 44 5 5 5 10
The worst case occurs when ADT operations in a
collision with double 59 7 4 7 all the keys inserted into the hash table is O(1)
32 6 3 6 dictionary collide
hashing In practice, hashing is
31 5 4 5 9 0 The load factor α = n/N
N = 13 73 8 4 8 affects the performance of a
very fast provided the
h(k) = k mod 13 hash table load factor is not close
d(k) = 7 − k mod 7 Assuming that the hash to 100%
0 1 2 3 4 5 6 7 8 9 10 11 12 values are like random Applications of hash
Insert keys 18, 41, numbers, it can be shown tables:
22, 44, 59, 32, 31, that the expected number of
small databases
73, in this order 31 41 18 32 59 73 22 44 probes for an insertion with
open addressing is compilers
0 1 2 3 4 5 6 7 8 9 10 11 12
1 / (1 − α) browser caches
Universal Hashing Proof of Universality (Part 1)

Let f(k) = ak+b mod p
So a(j-k) is a multiple of p
Let g(k) = k mod N But both are less than p
A family of hash functions
So h(k) = g(f(k)). So a(j-k) = 0. I.e., j=k.
is universal if, for any
0<i,j<M-1, f causes no collisions: (contradiction)
Theorem: The set of Let f(k) = f(j).
Pr(h(j)=h(k)) < 1/N. Thus, f causes no collisions.
all functions, h, as
Choose p as a prime Suppose k<j. Then
between M and 2M.

defined here, is  aj + b   ak + b 
universal. aj + b −   p = ak + b −  p  p
Randomly select 0<a<p  p   
and 0<b<p, and define   aj + b   ak + b  
h(k)=(ak+b mod p) mod N a ( j − k ) =   −   p
  p   p 
3
Proof of Universality (Part 2)

If f causes no collisions, only g can make h cause collisions.
Fix a number x. Of the p integers y=f(k), different from x, the number
such that g(y)=g(x) is at most  p / N  − 1
Since there are p choices for x, the number of h’s that will cause a
collision between j and k is at most
p ( p − 1)
p ( p / N  − 1) ≤
N
There are p(p-1) functions h. So probability of collision is at most
p( p − 1) / N 1
=
p( p − 1) N
Therefore, the set of possible h functions is universal.
Dictionaries and Hash Tables 19

Dictionary ADT: Dictionaries 4/1/2003 8:43 AM

Uploaded by

Copyright:

Available Formats

You might also like

Dictionary ADT: Dictionaries 4/1/2003 8:43 AM

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Dictionary ADT: Dictionaries 4/1/2003 8:43 AM

Uploaded by

Copyright:

Available Formats

Dictionaries 4/1/2003 8:43 AM

Dictionary ADT (§8.1.1)

Dictionaries and Hash Tables 1 Dictionaries and Hash Tables 2

Hash Functions and

When implementing a dictionary with a hash table,

Example Hash Functions (§8.2.2)

Our hash table uses an

Hash Code Maps (§8.2.3) Hash Code Maps (cont.)

Compression Collision Handling

Dictionaries and Hash Tables 9 Dictionaries and Hash Tables 10

Linear Probing Search with Linear Probing

Dictionaries and Hash Tables 11 Dictionaries and Hash Tables 12

Updates with Linear Probing Double Hashing

special item AVAILABLE

Universal Hashing Proof of Universality (Part 1)

between M and 2M.

Dictionaries and Hash Tables 17 Dictionaries and Hash Tables 18

Proof of Universality (Part 2)

Therefore, the set of possible h functions is universal.

Dictionaries and Hash Tables 19

You might also like