Hash Tables


Learning Outcome
Successful students should be able to:
■ Explain the hash table data structure, its advantages and disadvantages
■ Explain collisions in hash tables and the methods for handling them
■ Use and implement hash tables in problem solving
Problem
Let's say we have a list of student records. The primary key is the
student ID (unique).
We need the following operations:
1) Insert
2) Search
3) Delete

Student ID   Name     Contact Number   Address
12345        Sean     018-8888888      8, Jalan Fatt Chye
22222        Carmen   019-9999999      9, Jalan Forever
68686        Gladys   016-6868888      68, Jalan Happy
:            :        :                :

Question: what is the best data structure to use?

Ver 2.0
Possible solutions…
■ Array
– Searching – linear time O(n)
– Storing in a sorted array -> use binary search
  ■ achieves O(log n) search
  ■ but inserting and deleting are costly (need to shift data)
■ Linked list
– Searching – linear time O(n)
■ Balanced BST
– Search, insert and delete – O(log n)
■ Direct access table
– Search, insert and delete – O(1)
What is a direct access table?
■ A data structure that maps records to their corresponding keys using an array
– Records are placed using their key values directly as indexes.

The key value (student ID) is used directly as the array index, e.g. S[12345].

Array Index   Record
[0]
:
[12345]       Sean     018-8888888   8, Jalan Fatt Chye
:
[22222]       Carmen   019-9999999   9, Jalan Forever
:
[68686]       Gladys   016-6868888   68, Jalan Happy
:
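
To make the idea concrete, here is a minimal sketch of a direct access table in C++ (C++17 for std::optional). The Student struct, its field names and the class name DirectAccessTable are assumptions made for illustration; they do not come from the slides.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record type; field names are assumptions for illustration.
struct Student {
    std::string name;
    std::string contact;
    std::string address;
};

// Direct access table: the student ID itself is the array index.
class DirectAccessTable {
public:
    // maxKey must be known in advance; the table allocates maxKey + 1 slots.
    explicit DirectAccessTable(std::size_t maxKey) : slots_(maxKey + 1) {}

    // O(1) insert; returns false if the ID is larger than maxKey.
    bool insert(std::size_t id, const Student& s) {
        if (id >= slots_.size()) return false;
        slots_[id] = s;
        return true;
    }

    // O(1) search; returns nullptr if the ID is out of range or the slot is empty.
    const Student* search(std::size_t id) const {
        if (id >= slots_.size() || !slots_[id]) return nullptr;
        return &*slots_[id];
    }

    // O(1) delete: clear the slot.
    void remove(std::size_t id) {
        if (id < slots_.size()) slots_[id].reset();
    }

private:
    std::vector<std::optional<Student>> slots_;
};
```

Note that a table built for five-digit IDs allocates 100,000 slots even if only three records are stored.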
Direct Access Table - Limitations
■ Requires prior knowledge of the maximum key value
■ Practically useful only if the maximum key value is small.
– Cannot handle key values with many digits, because the key is used as the (integer) array index
■ It wastes memory if there is a significant difference between the
total number of records and the maximum key value.
Hash Table
■ An improved version of the Direct Access Table
■ Hash function: maps a big number or a string to a small integer that can be used as the index of an array (the hash table)
– e.g. H("12345") -> 5

Key Value (Student ID) -> Hash Function -> Array Index
12345 -> [5]
22222 -> [2]
68686 -> [6]

Array Index   Record
[0]
:
[2]           22222   Carmen   019-9999999   9, Jalan Forever
:
[5]           12345   Sean     018-8888888   8, Jalan Fatt Chye
[6]           68686   Gladys   016-6868888   68, Jalan Happy
:
[10]
Hash Table
■ With hashing, the element is stored in slot h(k); that is, we use a hash function h to compute
the slot from the key k.

■ Here, h maps the universe U of keys into the slots of a hash table T[0 … m – 1]:
h : U → {0, 1, … , m – 1}
where the size m of the hash table is typically much less than |U|.

■ We say that an element with key k hashes to slot h(k); we also say that h(k) is the hash value of
key k.
Hash Function
■ It takes an item's key as its parameter and returns an index location
for that particular item.
■ A hash function commonly uses modulo arithmetic to keep the result within the table size.
■ Example of a simple hash function:

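#include <string>

const int TABLE_LENGTH = 10;  // assumed table size; not defined on the slide

// Sums the character codes of the key, then maps the sum into [0, TABLE_LENGTH).
int myHash( const std::string& key )
{
    int value = 0;
    for ( std::size_t i = 0; i < key.length(); i++ )
        value += static_cast<unsigned char>( key[i] );
    return value % TABLE_LENGTH;
}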
Hash Function
■ A hash function can result in a many-to-one mapping and cause
collisions.
– Note: A collision occurs when the hash function maps two or more
keys to the same array index.

■ Collisions cannot be avoided, but their likelihood can be reduced by using a
"good" hash function.

■ A "good" hash function should:
– Reduce the chance of collisions by distributing keys uniformly over the
table
– Be fast to compute
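
As an illustration of how the choice of hash function affects collisions, the sketch below compares the additive hash from the earlier slide with a polynomial hash. The table size of 10, the multiplier 33 and the function names are assumptions for this example, not part of the slides.

```cpp
#include <string>

const int TABLE_LENGTH = 10;  // assumed table size

// Additive hash, as on the earlier slide: the order of characters is ignored,
// so keys made of the same characters (e.g. "ab" and "ba") always collide.
int additiveHash(const std::string& key) {
    int value = 0;
    for (unsigned char c : key) value += c;
    return value % TABLE_LENGTH;
}

// One possible alternative (an assumption, not from the slides): a polynomial
// hash, where the position of each character affects the result.
int polynomialHash(const std::string& key) {
    unsigned long value = 0;
    for (unsigned char c : key) value = value * 33 + c;  // wraps on overflow
    return static_cast<int>(value % TABLE_LENGTH);
}
```

With these assumptions, additiveHash maps both "ab" and "ba" to index 5, while polynomialHash maps them to 9 and 1. Different keys can still collide under either function; a better function only makes collisions less likely.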
Techniques to deal with Collisions
■ Chaining - Store colliding keys in a linked list at the same
hash table index

■ Open addressing - Store colliding keys elsewhere in the table
– Linear Probing
– Quadratic Probing
– Double hashing
Collision Technique : Chaining
■ Store colliding keys in a linked list at the same hash table
index
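
A minimal sketch of chaining in C++ follows. The class name ChainedHashTable, the integer keys with string values, and the key % size hash are all assumptions chosen to keep the example short.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Separate chaining: each slot holds a linked list of (key, value) pairs.
class ChainedHashTable {
public:
    explicit ChainedHashTable(std::size_t size) : buckets_(size) {}

    // Insert a new pair, or overwrite the value if the key already exists.
    void insert(int key, const std::string& value) {
        for (auto& entry : buckets_[slot(key)]) {
            if (entry.first == key) { entry.second = value; return; }
        }
        buckets_[slot(key)].emplace_back(key, value);
    }

    // Walk the chain at h(key); returns nullptr if the key is absent.
    const std::string* search(int key) const {
        for (const auto& entry : buckets_[slot(key)]) {
            if (entry.first == key) return &entry.second;
        }
        return nullptr;
    }

    // Remove the key from its chain; returns true if something was removed.
    bool remove(int key) {
        auto& chain = buckets_[slot(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it) {
            if (it->first == key) { chain.erase(it); return true; }
        }
        return false;
    }

private:
    // Simple modulo hash; assumes non-negative keys such as student IDs.
    std::size_t slot(int key) const {
        return static_cast<std::size_t>(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<int, std::string>>> buckets_;
};
```

With a table of size 10, the keys 12345 and 22225 both hash to index 5 and end up in the same chain, yet both remain searchable.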
Collision Technique : Open Addressing
■ Store colliding keys elsewhere within the table
■ Each element of the hash table holds exactly one piece of data.
■ General idea:
– generate a sequence of hash values, h0, h1, h2, ..., and
– look at each one in the hash table until you find the value (when you are
doing a lookup), or until you find an empty hash table entry (when you
are inserting, or when you are looking up a value that is not there).
■ Three strategies
– Linear Probing
– Quadratic Probing
– Double hashing
Open Addressing : Linear Probing
■ Check consecutive slots starting at H(key) until you
find an empty one
Open Addressing : Linear Probing - Example

Insert a new student, e.g. Dontonio (assume h(x) = 1):
If index 1 is occupied, try h1 = (1 + 1) % 10 = 2
If index 2 is also occupied, try h2 = (1 + 2) % 10 = 3

Searching for a student, e.g. "Kelly":
• Find the hash value, let's say 3
• Start checking at index 3, then 4, then 5. Index 5 is empty, so we conclude that
"Kelly" is not in the table.

■ Main problem: Clustering
– Many consecutive elements form groups, so it takes longer to find a free slot or to
search for an element.
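
The insert and search steps above can be sketched as follows (C++17). The class name LinearProbingTable is an assumption, and the hash value is passed in explicitly so that the slide's "assume h(x) = 1" scenario can be reproduced; a real table would compute it from the key.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Open addressing with linear probing; each slot holds at most one key.
class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t size) : slots_(size) {}

    // Probe h, h+1, h+2, ... (mod table size) until an empty slot
    // (or the same key) is found. Returns the index used, or -1 if full.
    int insert(const std::string& key, std::size_t h) {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t idx = (h + i) % slots_.size();
            if (!slots_[idx] || *slots_[idx] == key) {
                slots_[idx] = key;
                return static_cast<int>(idx);
            }
        }
        return -1;
    }

    // Probe from h until the key is found or an empty slot ends the search.
    bool contains(const std::string& key, std::size_t h) const {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            std::size_t idx = (h + i) % slots_.size();
            if (!slots_[idx]) return false;        // empty slot: key is absent
            if (*slots_[idx] == key) return true;
        }
        return false;                              // scanned the whole table
    }

private:
    std::vector<std::optional<std::string>> slots_;
};
```

This sketch leaves out deletion. Simply emptying a slot would cut the probe sequence for keys stored after it, so deletion in open addressing needs additional handling.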
Open Addressing : Quadratic Probing

E.g. let's say the hash value is 2. The sequence of indexes to check is:
2
if index 2 is full -> (2 + 1*1) % 10 = 3
if index 3 is full -> (2 + 2*2) % 10 = 6, and so on…
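
The example follows the pattern that the i-th probe checks index (h + i*i) % TABLE_SIZE. A short sketch of that sequence is given below; the function names quadraticProbe and probeSequence are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// i-th index checked by quadratic probing: (h + i*i) % tableSize.
std::size_t quadraticProbe(std::size_t h, std::size_t i, std::size_t tableSize) {
    return (h + i * i) % tableSize;
}

// First `count` indexes of the probe sequence starting at hash value h.
std::vector<std::size_t> probeSequence(std::size_t h, std::size_t count,
                                       std::size_t tableSize) {
    std::vector<std::size_t> seq;
    for (std::size_t i = 0; i < count; ++i)
        seq.push_back(quadraticProbe(h, i, tableSize));
    return seq;
}
```

For h = 2 and a table of size 10, the first four indexes are 2, 3, 6 and 1.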
Open Addressing: Double Hashing
■ Applies a second hash function to the key when a collision occurs.

■ A popular second hash function is: hash2(key) = PRIME – (key % PRIME), where
PRIME is a prime smaller than TABLE_SIZE.
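
A minimal sketch of the resulting probe sequence is shown below. It assumes the common scheme in which the i-th probe checks (hash1(key) + i * hash2(key)) % TABLE_SIZE, with hash1(key) = key % TABLE_SIZE. The values TABLE_SIZE = 11 and PRIME = 7, and the function names, are assumptions chosen for the example.

```cpp
const int TABLE_SIZE = 11;  // assumed table size
const int PRIME = 7;        // assumed prime smaller than TABLE_SIZE

// First hash: the initial slot (assumes non-negative integer keys).
int hash1(int key) { return key % TABLE_SIZE; }

// Second hash from the slide: always in the range 1..PRIME, never 0,
// so every probe moves to a different slot.
int hash2(int key) { return PRIME - (key % PRIME); }

// Index checked on the i-th probe (i = 0 is the initial slot).
int doubleHashProbe(int key, int i) {
    return (hash1(key) + i * hash2(key)) % TABLE_SIZE;
}
```

Under these assumptions, key 12345 has hash1 = 3 and hash2 = 3, so the probes visit indexes 3, 6, 9, 1, and so on. Two keys that share the same initial slot usually have different step sizes, which is why the probe sequences do not cluster.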
Double Hashing - Example

Double hashing has poor cache performance but no clustering. Double hashing
requires more computation time, as two hash functions need to be computed.
Comparison: Chaining vs Open Addressing
1. Chaining: Simpler to implement.
   Open addressing: Requires more computation.
2. Chaining: The hash table never fills up; we can always add more elements to a chain.
   Open addressing: The table may become full.
3. Chaining: Less sensitive to the hash function and the load factor (number of keys / capacity).
   Open addressing: Requires extra care to avoid clustering and a high load factor.
4. Chaining: Mostly used when it is unknown how many and how frequently keys may be inserted or deleted.
   Open addressing: Used when the frequency and number of keys are known.
5. Chaining: Cache performance is not good, as keys are stored using linked lists.
   Open addressing: Provides better cache performance, as everything is stored in the same table.
6. Chaining: Wastage of space (some parts of the hash table are never used).
   Open addressing: A slot can be used even if an input doesn't map to it.
7. Chaining: Uses extra space for links.
   Open addressing: No links.
Time Complexity
■ Depends on how the hash function distributes the keys
■ Worst case O(n)
– E.g. all keys hash to the same value.
■ Best case O(1)
– A good hash function does not map too many keys to
the same hash value.
References

■ Introduction: https://www.geeksforgeeks.org/hashing-set-1-introduction/
■ Collision Handling
– https://www.geeksforgeeks.org/hashing-set-2-separate-chaining/
– https://www.geeksforgeeks.org/hashing-set-3-open-addressing/
– https://www.geeksforgeeks.org/double-hashing/
