Professional Documents
Culture Documents
8. Hashing
8. Hashing
8. Hashing
What is Hashing?
Hashing is the process of mapping large amount of data item to smaller table with the help of hashing
function.
Hashing is also known as Hashing Algorithm or Message Digest Function.
It is a technique to convert a range of key values into a range of indexes of an array.
It is used to facilitate the next level searching method when compared with the linear or binary search.
Hashing allows to update and retrieve any data entry in a constant time O(1).
Constant time O(1) means the operation does not depend on the size of the data.
Hashing is used with a database to enable items to be retrieved more quickly.
It is used in the encryption and decryption of digital signatures.
The above figure shows the hash table with the size of n = 10. Each position of the hash table is called
as Slot. In the above hash table, there are n slots in the table, names = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Slot 0,
slot 1, slot 2 and so on. Hash table contains no items, so every slot is empty.
As we know the mapping between an item and the slot where item belongs in the hash table is called the
hash function. The hash function takes any item in the collection and returns an integer in the range of slot
names between 0 to n-1.
Suppose we have integer items {26, 70, 18, 31, 54, 93}. One common method of determining a hash key is
the division method of hashing and the formula is :
Hash Key = Key Value % Number of Slots in the Table
Division method or reminder method takes an item and divides it by the table size and returns the remainder
as its hash value.
Data Item Value % No. of Slots Hash Value
26 26 % 10 = 6 6
70 70 % 10 = 0 0
18 18 % 10 = 8 8
31 31 % 10 = 1 1
54 54 % 10 = 4 4
93 93 % 10 = 3 3
1. Division method
In this the hash function is dependent upon the remainder of a division. For example:-if the record 52, 68, 99, 84
is to be placed in a hash table and let us take the table size is 10.
Then:
h(key) = record% table size.
h(52) = 52%10 = 2
h(68) = 68%10 = 8
h(99) = 99%10 = 9
h(84) = 84%10 = 4
2. Multiplication method
In this method, digits of a key are multiplied and divide the result by the table size to get the address.
For example: consider a key 2134. Now, h(2134) = (2*1*3*4) % 10 = 4 is the address of the key.
4. Folding method
Most machines have a small number of primitive data types for which there are arithmetic instructions.
Frequently key to be used will not fit easily in to one of these data types. The solution is to combine the various
parts of the key in such a way that all parts of the key affect for final result such an operation is termed folding
of the key. That is the key is actually partitioned into number of parts, each part having the same length as that
of the required address. Add the value of each part, ignoring the final carry to get the required address.
This is done in two ways:
Fold-boundary: Here the reversed values of outer parts of key are added.
Suppose, the key is: 12345678 and the required address is of two digits, then break the key into: 21, 34, 56, 87.
Add these, we get 21 + 34 + 56 + 87: 198, ignore first 1 we get 98 as location.
Applications of Hashing:
Hashing provides constant time search, insert and delete operations on average. This is why hashing is one of
the most used data structure, example problems are, distinct elements, counting frequencies of items, finding
duplicates, etc.
There are many other applications of hashing, including modern day cryptography hash functions. Some of
these applications are listed below:
Message Digest
Password Verification
Data Structures(Programming Languages)
Compiler Operation
Rabin-Karp pattern matching Algorithm
Linking File name and path together
Collision
It is a situation in which the hash function returns the same hash key for more than one record, it is called as
collision. Sometimes when we are going to resolve the collision it may lead to an overflow condition and this
overflow and collision condition makes the poor hash function.
What is Collision?
Since a hash function gets us a small number for a key which is a big integer or string, there is possibility that
two keys result in same value. The situation where a newly inserted key maps to an already occupied slot in
hash table is called collision and must be handled using some collision handling technique.
2. Chaining Method
a) Separate Chaining
3. Coalesced Chaining
c) Double Hashing: We use another hash function hash2(x) and look for i*hash2(x) slot in i‟th rotation.
Let hash(x) be the slot index computed using hash function.
If slot hash(x) % S is full, then we try (hash(x) + 1*hash2(x)) % S
If (hash(x) + 1*hash2(x)) % S is also full, then we try (hash(x) + 2*hash2(x)) % S
If (hash(x) + 2*hash2(x)) % S is also full, then we try (hash(x) + 3*hash2(x)) % S
..................................................
Hash function is hash1(key) = key % S and
one good choice is to choose a prime R < S and hash2(key) = R – (key % R)
So, the function is (hash1(key) + i * hash2(key)) % S, for i = 1, 2, 3,…
Chaining Method: The idea is to make each cell of hash table point to a linked list of records that have same
hash function value. Chaining is simple, but requires additional memory outside the table.
Separate Chaining:
The idea is to make each cell of hash table point to a linked list of records that have same hash function value.
Let us consider a simple hash function as “key mod 7” and sequence of keys as 50, 700, 76, 85, 92, 73, 101.
Advantages:
1) Simple to implement.
2) Hash table never fills up, we can always add more elements to chain.
3) Less sensitive to the hash function or load factors.
4) It is mostly used when it is unknown how many and how frequently keys may be inserted or deleted.
Disadvantages:
1) Cache performance of chaining is not good as keys are stored using linked list. Open addressing provides
better cache performance as everything is stored in same table.
2) Wastage of Space (Some Parts of hash table are never used)
3) If the chain becomes long, then search time can become O(n) in worst case.
4) Uses extra space for links.
Performance of Chaining:
Performance of hashing can be evaluated under the assumption that each key is equally likely to be hashed to
any slot of table (simple uniform hashing).
Comparison of above three: Linear probing has the best cache performance but suffers from clustering. One
more advantage of linear probing is easy to compute. Quadratic probing lies between the two in terms of cache
performance and clustering. Double hashing has poor cache performance but no clustering. Double hashing
requires more computation time as two hash functions need to be computed.
S.No. Separate Chaining Open Addressing
1. Chaining is Simpler to implement. Open Addressing requires more computation.
2. In chaining, Hash table never fills up, we can In open addressing, table may become full.
always add more elements to chain.
3. Chaining is Less sensitive to the hash function or Open addressing requires extra care for to avoid
load factors. clustering and load factor.
4. Chaining is mostly used when it is unknown how Open addressing is used when the frequency and
many and how frequently keys may be inserted or number of keys is known.
deleted.
5. Cache performance of chaining is not good as keys Open addressing provides better cache
are stored using linked list. performance as everything is stored in the same
table.
6. Wastage of Space (Some Parts of hash table in In Open addressing, a slot can be used even if an
chaining are never used). input doesn‟t map to it.
7. Chaining uses extra space for links. No links in Open addressing
Coalesced hashing:
Coalesced hashing is a collision avoidance technique when there is a fixed sized data. It is a combination of
both separate chaining and open addressing. It uses the concept of Open Addressing (linear probing) to find first
empty place for colliding element from the bottom of the hash table and the concept of Separate Chaining to link
the colliding elements to each other through pointers. The hash function used is h=(key)%(total number of
keys). Inside the hash table, each node has three fields:
h(key): The value of hash function for a key.
Data: The key itself.
Next: The link to the next colliding elements.
The best case complexity of all these operations is O(1) and the worst case complexity is O(n) where n is the
total number of keys. It is better than separate chaining because it inserts the colliding element in the memory of
hash table only instead of creating a new linked list as in separate chaining.
Example:
n = 10
Input : {20, 35, 16, 40, 45, 25, 32, 37, 22, 55}
Hash function
h(key) = key%10
2. Let‟s start with inserting 20, as h(20)=0 and 0 index is empty so we insert 20 at 0 index.
HASH VALUE DATA NEXT
0 20 NULL
1 – NULL
2 – NULL
3 – NULL
4 – NULL
5 – NULL
6 – NULL
7 – NULL
8 – NULL
9 – NULL
3. Next element to be inserted is 35, h(35)=5 and 5th index empty so we insert 35 there.
HASH VALUE DATA NEXT
0 20 NULL
1 – NULL
2 – NULL
3 – NULL
4 – NULL
5 35 NULL
6 – NULL
7 – NULL
8 – NULL
9 – NULL
6. To insert 45, h(45)=5 which is occupied so again we search for the empty block from the bottom i.e 8 index
value and map the address of this newly inserted node i.e(8) to the Next field of 5th index value node i.e in
the next field of key=35.
HASH VALUE DATA NEXT
0 20 9
1 – NULL
2 – NULL
3 – NULL
4 – NULL
5 35 8
6 16 NULL
7 – NULL
8 45 NULL
9 40 NULL
7. Next to insert 25, h(25)=5 is occupied so search for the first empty block from bottom i.e. 7th index value
and insert 25 there. Now it is important to note that the address of this new node can‟t be mapped on 5th
index value node which is already pointing to some other node. To insert the address of new node we have
to follow the link chain from the 5th index node until we get NULL in next field and map the address of
new node to next field of that node i.e from 5th index node we go to 8th index node which contains NULL
in next field so we insert address of new node i.e(7) in next field of 8th index node.
HASH VALUE DATA NEXT
0 20 9
1 – NULL
2 – NULL
3 – NULL
4 – NULL
5 35 8
6 16 NULL
7 25 NULL
8 45 7
9 40 NULL
9. To insert 37, h(37)=7 which is occupied so search for the first free block from bottom which is 4th index
value. So insert 37 at 4th index value and copy the address of this node in next field of 7th index value node.
HASH VALUE DATA NEXT
0 20 9
1 – NULL
2 32 NULL
3 – NULL
4 37 NULL
5 35 8
6 16 NULL
7 25 4
8 45 7
9 40 NULL
10. To insert 22, h(22)=2 which is occupied so insert it at 3rd index value and map the address of this node in
next field of 2nd index value node.
11. Finally, to insert 55 h(55)=5 which is occupied and the only empty space is 1st index value so insert 55
there. Now again to map the address of this new node we have to follow the chain starting from 5th index
value node until we get NULL in next field i.e from 5th index->8th index->7th index->4th index which
contains NULL in Next fied, and we insert the address of newly inserted node at 4th index value node.
HASH VALUE DATA NEXT
0 20 9
1 55 NULL
2 32 3
3 22 NULL
4 37 1
5 35 8
6 16 NULL
7 25 4
8 45 7
9 40 NULL
Case 1: To delete key=37, first search for 37. If it is present then simply delete the data value and if the node
contains any address in next field and the node to be deleted i.e 37 is itself pointed by some other node(i.e
key=25) then copy that address in the next field of 37 to the next field of node pointing to 37(i.e key=25) and
initialize the NEXT field of key=37 as NULL again and erase the key=37.
Case 2: If key to be deleted is 35 which is not pointed by any other node then we have to pull the chain attached
to the node to be deleted i.e 35 one step back and mark last value of chain to NULL again.
HASH VALUE DATA NEXT
0 20 9
1 – NULL
2 32 3
3 22 NULL
4 – NULL
5 45 8
6 16 NULL
7 55 NULL
8 25 7
9 40 NULL