Spring 2021

Data Structures 2

Hash Tables and Hash Functions

- Hash table: an array of some fixed size that
positions elements according to an algorithm called a
hash function.
- A hash function maps keys of a given type to
integers in a fixed interval [0, N-1].
- h(x) = x mod N is a hash function for integer keys.
- The integer h(x) is called the hash value of key x.
- The goal of the hash function is to distribute the keys
evenly over the table, as if at random.

Example: we have a hash table of size 10 and the
hash function h(k) = k % 10.
Insert the elements 41, 34, 7, and 18.

- Note that when we search for an element, we
compute its hash value and look in the slot that the
hash function specifies.

Resulting table (index, key):

1 41
4 34
7 7
8 18
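The size-10 example above can be reproduced with a short Python sketch. The names `h`, `table`, and `search` are illustrative choices, not part of the notes, and the sketch assumes no two keys share a slot (true for these four keys).

```python
def h(k, n=10):
    """Hash function from the example: h(k) = k % n."""
    return k % n

N = 10
table = [None] * N            # fixed-size array, every slot starts empty

for key in (41, 34, 7, 18):
    table[h(key)] = key       # these four keys land in distinct slots

def search(table, key):
    """Look only in the slot that the hash function specifies."""
    return table[h(key, len(table))] == key

print(table)                  # [None, 41, None, None, 34, None, None, 7, 18, None]
print(search(table, 34))      # True
print(search(table, 35))      # False: slot 5 is empty
```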

Collision:
- The event that two hash table elements map into the
same slot in the array.
- Example: with h(k) = k % 10, add 41, 34, 7, 18,
then 21.
- 21 hashes into the same slot as 41 (slot 1).
- 21 should not replace 41 in the hash table;
they should both be there.
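A minimal Python sketch of how such a collision can be detected, assuming a plain list as the table. The names `h`, `table`, and `collisions` are illustrative only. The sketch refuses to overwrite an occupied slot and records the clash instead; it does not yet resolve it.

```python
def h(k):
    return k % 10

table = [None] * 10
collisions = []                       # (new key, key already stored, slot)

for key in (41, 34, 7, 18, 21):
    slot = h(key)
    if table[slot] is None:
        table[slot] = key
    else:
        # Slot is taken: overwriting here would silently lose a key.
        collisions.append((key, table[slot], slot))

print(collisions)                     # [(21, 41, 1)]
```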
Collision Handling/Resolution:
a strategy for fixing collisions in a hash table

We have the following ways to handle collisions:

1. Separate Chaining
2. Open Addressing
a. Linear probing
b. Quadratic Probing
c. Double Hashing

1. Separate chaining
Let each cell in the table point to a linked list
of entries that map there

h(x)=x mod 13
Insert keys 18, 41, 22, 44, 59, 32, 31, 73 in this order

2 41
5 184431
6 32
7 59
8 73
9 22
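The chaining example can be sketched in Python as follows. As a simplification, each cell holds a Python list rather than a linked list; the function names `chain_insert` and `chain_search` are assumptions made for this sketch.

```python
def chain_insert(table, key):
    """Append the key to the chain stored in its home cell."""
    table[key % len(table)].append(key)

def chain_search(table, key):
    """Only the one chain the key hashes to needs to be scanned."""
    return key in table[key % len(table)]

table = [[] for _ in range(13)]       # 13 independent, empty chains
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    chain_insert(table, key)

for index, chain in enumerate(table):
    if chain:
        print(index, chain)
# 2 [41]
# 5 [18, 44, 31]
# 6 [32]
# 7 [59]
# 8 [73]
# 9 [22]
```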

2. Open Addressing
- On a collision, look for another empty spot in the
table.
- Examples of open addressing:
o linear probing
o quadratic probing
o double hashing
- A search under an open addressing scheme must keep
probing until it finds the item or reaches an empty
slot.

a. Linear probing
handles collisions by placing the colliding item in the
next (circularly) available table cell
h(x)=x mod 13
Insert keys 18, 41, 22, 44, 59, 32, 31, 73 in this order

2 41
5 18
6 44
7 59
8 32
9 22
10 31
11 73
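A possible Python sketch of insertion with linear probing, reproducing the table above. The function name `linear_insert` is an assumption of this sketch; empty cells are represented by `None`.

```python
def linear_insert(table, key):
    """Place key in the first free cell at or after key % N, wrapping around."""
    n = len(table)
    home = key % n
    for j in range(n):
        i = (home + j) % n            # next cell, circularly
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("table is full")

table = [None] * 13
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    linear_insert(table, key)

print({i: k for i, k in enumerate(table) if k is not None})
# {2: 41, 5: 18, 6: 44, 7: 59, 8: 32, 9: 22, 10: 31, 11: 73}
```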

Search with Linear Probing
We probe consecutive locations until one of the following
occurs:
- An item with key k is found, or
- An empty cell is found, or
- N cells have been unsuccessfully probed.
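The three stopping conditions map directly onto a loop. The sketch below is one way to write it in Python, using the linear-probing table from the previous example as a hard-coded list; `linear_search` is a name chosen for illustration, and deleted entries are not handled here.

```python
def linear_search(table, key):
    """Return the index of key, or None if it is not in the table."""
    n = len(table)
    home = key % n
    for j in range(n):
        i = (home + j) % n
        if table[i] is None:          # an empty cell is found: key is absent
            return None
        if table[i] == key:           # an item with key k is found
            return i
    return None                       # N cells probed without success

# Table produced by inserting 18, 41, 22, 44, 59, 32, 31, 73 with h(x) = x mod 13.
table = [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]

print(linear_search(table, 31))       # 10
print(linear_search(table, 57))       # None (57 mod 13 = 5, probing stops at empty cell 12)
```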

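Clustering problem
- Elements end up placed close together by probing,
which degrades the hash table's performance.
- Example: with h(k) = k % 10 and linear probing,
add 89, 18, 49, 58, 9. They fill slots 8, 9, 0, 1, 2.
- Searching for the value 28 now has to probe slots
8, 9, 0, 1, 2 and 3 before reaching an empty cell:
more than half the hash table, so lookup is no longer
constant time.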
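To make the cost of clustering concrete, the sketch below counts the cells examined when searching for 28. It assumes the example uses h(k) = k % 10 on a table of size 10 with linear probing; the helper names are illustrative.

```python
def insert(table, key):
    n = len(table)
    for j in range(n):
        i = (key % n + j) % n
        if table[i] is None:
            table[i] = key
            return
    raise RuntimeError("table is full")

def probes_to_search(table, key):
    """Cells examined by a search, counting the empty cell that ends it."""
    n = len(table)
    for j in range(n):
        i = (key % n + j) % n
        if table[i] is None or table[i] == key:
            return j + 1
    return n

table = [None] * 10
for key in (89, 18, 49, 58, 9):
    insert(table, key)

print(table)                          # [49, 58, 9, None, None, None, None, None, 18, 89]
print(probes_to_search(table, 28))    # 6
print(probes_to_search(table, 18))    # 1
```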

b. Quadratic probing
- Resolves a collision on slot i by trying slots
i+1, i+4, i+9, i+16, ... (wrapping around modulo the
table size).
- Example: with h(k) = k % 10, add 89, 18, 49, 58, 9
i. 49 collides (89 is already in slot 9), so we
search ahead by +1 to empty slot 0
ii. 58 collides (18 is already in slot 8), so we
search ahead by +1 to occupied slot 9,
then +4 to empty slot 2
iii. 9 collides (89 is already in slot 9), so we
search ahead by +1 to occupied slot 0,
then +4 to empty slot 3
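The same example can be traced with a small Python sketch, assuming a table of size 10 and h(k) = k % 10. `quadratic_insert` is a hypothetical name. Note that quadratic probing may fail to find a free cell even when one exists, which is why the function can raise.

```python
def quadratic_insert(table, key):
    """Try cells home, home+1, home+4, home+9, ... (mod N) until one is free."""
    n = len(table)
    home = key % n
    for j in range(n):
        i = (home + j * j) % n
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("no free cell found on the probe sequence")

table = [None] * 10
placed = {key: quadratic_insert(table, key) for key in (89, 18, 49, 58, 9)}

print(placed)    # {89: 9, 18: 8, 49: 0, 58: 2, 9: 3}
```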

c. Double Hashing
- You have a primary hash function h1(k)
- Double hashing uses a second hash function h2(k)
and handles collisions by placing an item in the first
available cell of the series
(h1(k)+ j*h2(k)) mod N for j=0, 1, … , N -1

- The secondary hash function h2(k) cannot have zero
values.

- The table size N must be a prime to allow probing of
all the cells
- Common choice of compression function for the
secondary hash function:
o h2(k) = q - (k mod q) where
▪ q < N
▪ q is prime
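The two requirements above (a non-zero step and a prime table size) can be checked experimentally. The sketch below is illustrative only: `probe_sequence` is a made-up helper, and N = 13, q = 7, and the non-prime size 12 are example values chosen for the demonstration.

```python
def probe_sequence(h1, step, n):
    """Cells visited by double hashing: (h1 + j*step) mod n for j = 0..n-1."""
    return [(h1 + j * step) % n for j in range(n)]

# h2(k) = q - (k mod q) with q = 7 always lies in 1..7, so it is never zero.
steps = {7 - k % 7 for k in range(1000)}
print(sorted(steps))                                  # [1, 2, 3, 4, 5, 6, 7]

# N = 13 is prime: every non-zero step reaches all 13 cells.
covers_all = all(len(set(probe_sequence(5, s, 13))) == 13 for s in range(1, 13))
print(covers_all)                                     # True

# N = 12 is not prime: a step of 4 shares a factor with 12 and keeps revisiting cells.
print(sorted(set(probe_sequence(5, 4, 12))))          # [1, 5, 9]
```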

Example 1
Consider a hash table storing integer keys that handles
collisions with double hashing:
h(k) = k mod 13
d(k) = 7 - (k mod 7)
Insert keys 18, 41, 22, 44, 59, 32, 31, 73

k    h(k)  d(k)  Probes
18   5     3     5 is empty
41   2     1     2 is empty
22   9     6     9 is empty
44   5     5     5 is busy; (5+5) % 13 = 10 is empty
59   7     4     7 is empty
32   6     3     6 is empty
31   5     4     5 is busy; (5+4) % 13 = 9 is also busy;
                 (9+4) % 13 = 0 is empty
73   8     4     8 is empty

The final hash table:

0 31
2 41
5 18
6 32
7 59
8 73
9 22
10 44
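Example 1 can be checked with a short Python sketch. The function name `double_hash_insert` and the default parameter `q=7` are choices made for this sketch; the two hash functions are the ones given in the example.

```python
def double_hash_insert(table, key, q=7):
    """Insert using h(k) = k mod N and step d(k) = q - (k mod q)."""
    n = len(table)
    h = key % n
    d = q - key % q                   # always in 1..q, never zero
    for j in range(n):
        i = (h + j * d) % n
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("no free cell found")

table = [None] * 13
for key in (18, 41, 22, 44, 59, 32, 31, 73):
    double_hash_insert(table, key)

print({i: k for i, k in enumerate(table) if k is not None})
# {0: 31, 2: 41, 5: 18, 6: 32, 7: 59, 8: 73, 9: 22, 10: 44}
```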

Example 2
Consider inserting keys 10,22,31,4,15,28,17,88,59 into a
hash table of length m=11 using open addressing with the
hash function h1 (k) =k mod 11.
Illustrate the resulting table when using double hashing
with h2 (k) =1+(k mod 10)
H(k, j) = (h1(k) + j*h2(k)) mod m, j = 0, 1, 2, …, m-1
k    h1(k)  Probes
10   10     10 is empty
22   0      0 is empty
31   9      9 is empty
4    4      4 is empty
15   4      4 is busy, so calculate h2(k) = 1 + (15 mod 10) = 6
            (4 + 6) mod 11 = 10 is busy
            (4 + 2*6) mod 11 = 5 is empty
28   6      6 is empty
17   6      6 is busy, so calculate h2(k) = 1 + (17 mod 10) = 8
            (6 + 8) mod 11 = 3 is empty
88   0      0 is busy, so calculate h2(k) = 1 + (88 mod 10) = 9
            (0 + 9) mod 11 = 9 is busy
            (0 + 2*9) mod 11 = 18 mod 11 = 7 is empty
59   4      4 is busy, so calculate h2(k) = 1 + (59 mod 10) = 10
            (4 + 10) mod 11 = 3 is busy
            (4 + 2*10) mod 11 = 24 mod 11 = 2 is empty

The final hash table will be as follows:

Index  0   1   2   3   4   5   6   7   8   9   10
Key    22      59  17  4   15  28  88      31  10
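The probe-by-probe reasoning in Example 2 can be reproduced by recording every cell visited during each insertion. This is a sketch under the example's definitions (m = 11, h1(k) = k mod 11, h2(k) = 1 + (k mod 10)); `trace_insert` is a name invented here.

```python
def trace_insert(table, key):
    """Insert key and return the list of cells probed, in order."""
    m = len(table)
    h1 = key % m
    h2 = 1 + key % 10
    probed = []
    for j in range(m):
        i = (h1 + j * h2) % m
        probed.append(i)
        if table[i] is None:
            table[i] = key
            return probed
    raise RuntimeError("no free cell found")

table = [None] * 11
for key in (10, 22, 31, 4, 15, 28, 17, 88, 59):
    print(key, trace_insert(table, key))
# 15 probes [4, 10, 5]; 17 probes [6, 3]; 88 probes [0, 9, 7]; 59 probes [4, 3, 2]

print(table)
# [22, None, 59, 17, 4, 15, 28, 88, None, 31, 10]
```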

Notes on Open Addressing:
- To handle insertions and deletions, we introduce a
special object, called AVAILABLE, which replaces
deleted elements.
Remove from hash table:
- We search for an entry with key k
- If such an entry is found,
o We replace it with the special item
AVAILABLE and we return the element o stored
with key k
Insert into hash table:
- We start at cell h(k)
- We probe consecutive cells until
o A cell i is found that is either empty or stores
AVAILABLE; the new entry is stored in cell i
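One way to sketch the AVAILABLE marker in Python is with a unique sentinel object. The sketch assumes linear probing and stores bare integer keys rather than key-element entries; the function names are illustrative. The key point is that a search must skip over AVAILABLE cells instead of stopping at them, while an insertion may reuse them.

```python
AVAILABLE = object()                  # sentinel that replaces deleted elements

def find(table, key):
    n = len(table)
    for j in range(n):
        i = (key % n + j) % n
        if table[i] is None:          # truly empty: key cannot be further on
            return None
        if table[i] is not AVAILABLE and table[i] == key:
            return i
    return None

def remove(table, key):
    i = find(table, key)
    if i is None:
        return None
    table[i] = AVAILABLE              # keep the probe chain intact
    return key

def insert(table, key):
    n = len(table)
    for j in range(n):
        i = (key % n + j) % n
        if table[i] is None or table[i] is AVAILABLE:
            table[i] = key
            return i
    raise RuntimeError("table is full")

table = [None] * 13
for key in (18, 44, 31):              # all three hash to cell 5
    insert(table, key)
remove(table, 44)                     # cell 6 becomes AVAILABLE
print(find(table, 31))                # 7: the search passes over cell 6
print(insert(table, 57))              # 6: the AVAILABLE cell is reused
```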

Analysis of hash tables
Main operation: search for an item in the table
- Worst-case cost of finding an item is O(n)
- Average case can be constant time, O(1)
- Worst-case analysis is not very informative for hash
tables, so we look at the average-case cost
- Cost depends heavily on the load factor
- Load factor λ of a hash table is the ratio:
λ = N / M
where N is the number of elements inserted in the hash
table and M is the array size

Rehashing:
- increasing the size of a hash table's array
- re-storing all of the items into the new array using
the hash function
o When should we rehash?
▪ when the load factor reaches a certain level
▪ when an insertion fails
- To get O(1) average-case performance for lookups
and insertions, we need
o a good hash function
▪ distributes objects evenly among all indices
o a load factor that is not too high
▪ choose a table size appropriate to the number
of elements you expect to store
o to keep rehashing to a minimum
▪ choose the largest initial capacity you
can reasonably afford.
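A rough Python sketch of rehashing driven by the load factor (elements stored divided by array size). Everything specific here is an assumption made for illustration: the class name `ProbingTable`, the threshold of 0.5, the growth rule 2*M + 1, and the use of linear probing.

```python
class ProbingTable:
    MAX_LOAD = 0.5                    # assumed threshold, not taken from the notes

    def __init__(self, capacity=5):
        self.slots = [None] * capacity
        self.count = 0

    def load_factor(self):
        return self.count / len(self.slots)

    @staticmethod
    def _place(slots, key):
        n = len(slots)
        for j in range(n):
            i = (key % n + j) % n
            if slots[i] is None:
                slots[i] = key
                return
        raise RuntimeError("table is full")

    def _rehash(self):
        # Grow the array, then re-store every item using the hash function.
        bigger = [None] * (2 * len(self.slots) + 1)
        for key in self.slots:
            if key is not None:
                self._place(bigger, key)
        self.slots = bigger

    def insert(self, key):
        if (self.count + 1) / len(self.slots) > self.MAX_LOAD:
            self._rehash()
        self._place(self.slots, key)
        self.count += 1

t = ProbingTable()
for key in (41, 34, 7, 18, 21):
    t.insert(key)
print(len(t.slots), t.count)          # 11 5
```

Starting from a larger capacity would avoid the rehash in this run entirely, at the price of more unused cells.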

Hash versus tree
Which is better, a hash table or a search tree?

Hash:
- Better average for lookup and insertion
Tree:
- Guarantee on worst-case search time
- Possible successor and predecessor
- Easy to access items in sorted order
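The difference in ordered operations can be illustrated in Python. The standard library has no balanced search tree, so this sketch swaps in a sorted list with the `bisect` module as a stand-in for the tree side; a real search tree would also keep insertions efficient, which a sorted list does not. `successor` and `predecessor` are names chosen for this sketch.

```python
import bisect

keys = [18, 41, 22, 44, 59, 32, 31, 73]

hashed = set(keys)                    # hash-based: fast membership, no order
ordered = sorted(keys)                # stand-in for an in-order tree traversal

def successor(sorted_keys, k):
    """Smallest stored key strictly greater than k, or None."""
    i = bisect.bisect_right(sorted_keys, k)
    return sorted_keys[i] if i < len(sorted_keys) else None

def predecessor(sorted_keys, k):
    """Largest stored key strictly smaller than k, or None."""
    i = bisect.bisect_left(sorted_keys, k)
    return sorted_keys[i - 1] if i > 0 else None

print(32 in hashed)                   # True
print(ordered)                        # [18, 22, 31, 32, 41, 44, 59, 73]
print(successor(ordered, 32))         # 41
print(predecessor(ordered, 32))       # 31
```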


