Hashing: Fundamentals, Solving Search and Insert Problem Using Hashing, Deletion From Hash Table, Collision Resolution
This may not be a good choice when the table size is larger than 9
• In general, for a range of n values, the following hash function turns out to be a reasonable choice:
index = key % n
• Thus, for n = 10 and key = 23, we get index = 3, i.e., 23 is placed at index 3
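A minimal Python sketch of the division method above, assuming keys are non-negative integers. The function name `division_hash` is hypothetical.

```python
def division_hash(key: int, n: int) -> int:
    """Division method: map a non-negative integer key to an index in 0..n-1."""
    return key % n


# The worked example: n = 10, key = 23 gives index 3.
index = division_hash(23, 10)
```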
In hashing, we want the keys to be scattered all over the table. If the keys are hashed to only one area of the table, we can end up with an unnecessarily high number of collisions. Such situations should be avoided.
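The following small illustration uses hypothetical keys, all multiples of 10, with the division method. When the table size shares a factor with every key, all the keys pile into one area of the table. A prime table size such as 7 scatters the same keys. The helper `bucket_counts` is a hypothetical name.

```python
from collections import Counter


def bucket_counts(keys, n):
    """Count how many keys the division method sends to each index of a size-n table."""
    return Counter(key % n for key in keys)


# Hypothetical keys that are all multiples of 10.
keys = [10, 20, 30, 40, 50, 60, 70]

crowded = bucket_counts(keys, 10)  # n = 10: every key hashes to index 0
spread = bucket_counts(keys, 7)    # n = 7: each key gets its own index
```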
Mid-Square Method
• The key is multiplied by itself and the address is obtained by selecting
an appropriate number of bits or digits from the middle of the square
• Usually, the number of bits chosen depends on the size of the hash
table
• The same position in the square must be used for all values
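The following is a decimal-digit sketch of the mid-square method. It works on digits rather than bits, which is an assumption. It also assumes keys have at most `key_digits` digits, so that the square fits in `2 * key_digits` digits and the same digit positions can be taken for every key. Parameter names are hypothetical. Taking r digits gives a table of size 10**r.

```python
def mid_square_hash(key: int, key_digits: int = 4, r: int = 3) -> int:
    """Mid-square method (decimal digits): square the key and keep r middle digits.

    The square of a key with at most key_digits digits has at most
    2 * key_digits digits; the same r digit positions are taken for
    every key, giving an index in 0 .. 10**r - 1.
    """
    square = key * key
    low = (2 * key_digits - r) // 2  # low-order digits discarded for every key
    return (square // 10 ** low) % 10 ** r


# 3121 ** 2 = 9740641, padded to 8 digits: 09740641; digits kept: 406
index = mid_square_hash(3121)
```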
$k(x) = \sum_{i=1}^{r} k_i \, x^{i-1}, \quad i = 1, 2, \ldots, r$
• The above polynomial is used as an intermediate step in multiplicative
hashing
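One way this could fit together is sketched below, under stated assumptions. A key is split into parts k_1, ..., k_r (here, hypothetically, the character codes of a string), and k(x) is evaluated with Horner's rule. The resulting integer is then passed to the multiplicative method, floor(m * frac(k * A)). The choice x = 31 and Knuth's suggested constant A = (sqrt(5) - 1) / 2 are assumptions, not taken from the slides.

```python
import math

# Assumption: Knuth's suggested constant for the multiplicative method.
A = (math.sqrt(5) - 1) / 2


def poly_value(parts, x):
    """Evaluate k(x) = sum_{i=1}^{r} k_i * x**(i-1) with Horner's rule.

    parts[0] is k_1 (coefficient of x**0) and parts[-1] is k_r.
    """
    acc = 0
    for k_i in reversed(parts):  # k_r first, k_1 last
        acc = acc * x + k_i
    return acc


def multiplicative_hash(k, m):
    """Multiplicative method: floor(m * frac(k * A)), an index in 0..m-1."""
    frac = (k * A) % 1.0
    return int(m * frac) % m  # the final % m guards against rounding up to m


# Hypothetical key: the string "key" split into character codes, with x = 31.
k = poly_value([ord(c) for c in "key"], 31)
index = multiplicative_hash(k, 16)
```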
Handling Collision
• A collision happens when a key maps to an index where an element already
exists
• The easiest way to resolve it is to put the element in the next location
• If this location is also filled, we keep moving to the next locations until a free location
is found
• In such cases, these items must be searched for linearly during a search
• So the total time becomes hashing (constant time) plus a linear search over the items
that collided
• During deletion, the emptied location may be marked with a special key; such a
location can be filled later, and searches continue past it instead of stopping there
• In practice, we never allow the hash table to become completely full
• In general, the hash technique works better when there are more free locations
in the table
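The following minimal Python sketch covers insert, search and delete with "move to the next location" collision handling. Keys are assumed to be non-negative integers. The special deletion key is modelled as a unique sentinel object (`_DELETED`, a hypothetical name), so search probes past a deleted slot while insert may reuse it. The class name `LinearProbingTable` is also hypothetical.

```python
_DELETED = object()  # special marker left in a slot by deletion


class LinearProbingTable:
    """Open-addressing table of non-negative integer keys using linear probing."""

    def __init__(self, n: int):
        self.n = n
        self.slots = [None] * n  # None means "never used"

    def _probe(self, key):
        """Yield key % n, then the following locations, wrapping around once."""
        loc = key % self.n
        for _ in range(self.n):
            yield loc
            loc = (loc + 1) % self.n

    def search(self, key):
        """Return the location of key, or None if it is absent."""
        for loc in self._probe(key):
            slot = self.slots[loc]
            if slot is None:
                return None  # a never-used slot ends the search
            if slot is not _DELETED and slot == key:
                return loc
        return None  # probed every slot without finding the key

    def insert(self, key):
        """Place key in the first never-used or deleted slot; return its location."""
        existing = self.search(key)
        if existing is not None:
            return existing
        for loc in self._probe(key):
            if self.slots[loc] is None or self.slots[loc] is _DELETED:
                self.slots[loc] = key
                return loc
        raise RuntimeError("hash table is full")

    def delete(self, key):
        """Mark key's slot with the special marker so later searches probe past it."""
        loc = self.search(key)
        if loc is None:
            return False
        self.slots[loc] = _DELETED
        return True


table = LinearProbingTable(10)
for k in (23, 33, 43):  # all hash to index 3 and occupy 3, 4, 5
    table.insert(k)
table.delete(33)  # index 4 now holds the special marker
```

Because 4 is marked rather than emptied, a search for 43 still reaches index 5, and a later insert of 53 reuses index 4.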
Collision Resolution
• We have seen one way of resolving a collision: looking at the next
location in the table
• This method is also known as Linear Probing
• There are also other methods for finding a location to place an
element when a collision occurs
• The three main techniques used for resolving collisions are Linear
Probing, Quadratic Probing and Chaining
Collision Resolution
• Linear Probing
• Quadratic Probing
• Random Probing
• Rehashing
Linear Probing
• For a given key and table size n, the location for insertion is:
loc = key % n
• If this location is not free, we apply linear probing:
loc = (loc + 1) % n
i.e., we try the next location; if that is not free either, we move on to the one after it
• Suppose multiple keys hash to a single location; in this case we keep
adding values to the next available locations
• These elements can form long chains of occupied locations, resulting in
more collisions
• This phenomenon is called clustering.
• This is one of the main drawbacks of linear probing
• It is also possible that the chains for two different key values join each
other and form an even longer chain; the longer the chains, the more
likely this becomes
• If at any point we reach the end of the table in this process, we wrap
around to the beginning of the table and continue from location 0
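The following sketch uses a 10-slot table and hypothetical keys to show clustering and wrap-around. Keys 20, 30 and 40 all hash to 0 and form a cluster. Key 11 (home 1) is pulled into that cluster. Key 29 (home 9) wraps around past the end of the table and needs six probes. The function name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key with linear probing; return (location, number of probes)."""
    n = len(table)
    loc = key % n
    for probes in range(1, n + 1):
        if table[loc] is None:
            table[loc] = key
            return loc, probes
        loc = (loc + 1) % n  # next location, wrapping from n - 1 back to 0
    raise RuntimeError("hash table is full")


table = [None] * 10
results = {k: linear_probe_insert(table, k) for k in (20, 30, 40, 11, 19, 29)}
# table is now [20, 30, 40, 11, 29, None, None, None, None, 19]
```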
Performance of Quadratic Probing
• Here, keys that map to different locations trace different probe
sequences; therefore, primary clustering is eliminated
• Secondary clustering still remains: keys that map to the same location
still follow the same probe sequence
• If n is a power of 2, i.e., $n = 2^m$ for some m, this method explores
only a small fraction of the locations in the table and is therefore not
very effective
• If n is prime, the method can reach half the locations in the table; this
is usually sufficient for most practical purposes
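The slides do not spell out the quadratic probe formula. The following sketch assumes the common form loc_i = (key % n + i²) % n and counts how many distinct locations a probe sequence can reach. The function names are hypothetical.

```python
def quadratic_probe_sequence(key, n):
    """Locations (key % n + i*i) % n for i = 0, 1, ..., n - 1."""
    home = key % n
    return [(home + i * i) % n for i in range(n)]


def distinct_locations(key, n):
    """How many different table locations the probe sequence can ever reach."""
    return len(set(quadratic_probe_sequence(key, n)))
```

Under this assumption, `distinct_locations(0, 16)` is 4 out of 16 locations for the power of two, while `distinct_locations(5, 13)` is 7 out of 13, i.e. (n + 1) / 2 for the prime. Keys 3 and 16 share home location 3 in a table of size 13 and therefore produce identical sequences, which is the secondary clustering mentioned above.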
Pseudo-Random Probing
• In this method, when a collision occurs, a random sequence of positions
is generated in place of an ordered sequence
• The random sequence generated must contain every position
between 1 and n exactly once
• The table is full when the first duplicate position is generated
• This method reduces the problem of primary clustering
• Because of the expense of the random number generator, this method is
not often used
• The pseudo-random generator produces a permutation of the numbers
from 1 to n and uses it as the probe sequence to find the next free
location
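The following minimal sketch is one possible reading of pseudo-random probing. A single seeded permutation of the offsets 1..n-1 is shared by all keys, and each key probes its home location and then home + offset for each offset in turn. Every location (indexed 0..n-1 here) therefore appears exactly once. The seed, the shared permutation and all function names are assumptions for illustration.

```python
import random


def make_offsets(n, seed=0):
    """A pseudo-random permutation of the offsets 1 .. n-1 (fixed by seed)."""
    offsets = list(range(1, n))
    random.Random(seed).shuffle(offsets)
    return offsets


def probe_sequence(key, n, offsets):
    """Home location first, then home + each offset, so every location appears once."""
    home = key % n
    return [home] + [(home + off) % n for off in offsets]


def pseudo_random_insert(table, key, offsets):
    for loc in probe_sequence(key, len(table), offsets):
        if table[loc] is None:
            table[loc] = key
            return loc
    # Sequence exhausted: the next position would be a duplicate, so the table is full
    raise RuntimeError("hash table is full")


offsets = make_offsets(11)
table = [None] * 11
for k in range(0, 121, 11):  # hypothetical keys that all hash to location 0
    pseudo_random_insert(table, k, offsets)
```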
Double Hashing
• In this method, if a collision occurs, another hash function is used to
decide the next location to search for insertion
• The value generated by the second hash function gives the offset
from the original location
• The value of the offset usually depends on the key and therefore reduces
the chances of primary clustering
• When the size of the table is a prime number, double hashing is
seen to perform very well in practice
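A sketch of double hashing follows. The first hash gives the home location and a second hash gives the offset, so loc_i = (h1 + i * h2) % n. The second hash function here, 1 + key % (n - 1), is a hypothetical choice that is never 0. When n is prime, every offset in 1..n-1 shares no factor with n, so each key's sequence visits every location. This is one reason prime table sizes work well.

```python
def double_hash_sequence(key, n):
    """Probe locations (h1 + i * h2) % n for i = 0, 1, ..., n - 1."""
    h1 = key % n            # first hash: home location
    h2 = 1 + key % (n - 1)  # hypothetical second hash: offset in 1 .. n-1, never 0
    return [(h1 + i * h2) % n for i in range(n)]


def double_hash_insert(table, key):
    for loc in double_hash_sequence(key, len(table)):
        if table[loc] is None:
            table[loc] = key
            return loc
    raise RuntimeError("no free location on this key's probe sequence")
```

With n = 13, keys 3 and 16 share home location 3 but get offsets 4 and 5, so their sequences diverge immediately. With a composite size such as n = 12, key 1 gets offset 2 and can reach only the 6 odd locations.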
Chaining
• Open addressing applies where the key values themselves are stored
as table entries; it is also known as closed hashing
• Another option is to store pointers to key values; this results in a
different method of collision resolution, known as chaining
• In this method, all items that hash to the same location are held on a linked
list
• When an element is not found at the top of the list for its hashed location,
the rest of that linked list is searched sequentially
• Chaining is usually implemented so that each table location holds a pointer
to the top of its own linked list, i.e., each location is represented by a
separate linked list
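The following Python sketch shows chaining with explicit linked-list nodes. Each table location holds a pointer to the top of its own list, and new keys are pushed onto the top. The location is computed with Python's built-in `hash()` modulo n, which is an assumption; for small non-negative integers `hash(k) == k`. Class and method names are hypothetical.

```python
class Node:
    """One linked-list node holding a key and its value."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    """Hash table whose locations each point to the top of a separate linked list."""

    def __init__(self, n: int):
        self.n = n
        self.heads = [None] * n  # None means an empty list

    def _loc(self, key):
        return hash(key) % self.n

    def insert(self, key, value):
        loc = self._loc(key)
        node = self.heads[loc]
        while node is not None:  # key already present: update its value
            if node.key == key:
                node.value = value
                return
            node = node.next
        self.heads[loc] = Node(key, value, self.heads[loc])  # push onto the top

    def search(self, key):
        node = self.heads[self._loc(key)]
        while node is not None:  # sequential search of this one list
            if node.key == key:
                return node.value
            node = node.next
        return None

    def delete(self, key):
        loc = self._loc(key)
        prev, node = None, self.heads[loc]
        while node is not None:
            if node.key == key:
                if prev is None:
                    self.heads[loc] = node.next  # removing the top node
                else:
                    prev.next = node.next
                return True
            prev, node = node, node.next
        return False

    def keys_at(self, loc):
        """Keys on the list at one location, from top to bottom."""
        keys, node = [], self.heads[loc]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable(10)
for k in (3, 13, 23):  # hypothetical keys that all hash to location 3
    table.insert(k, f"value-{k}")
```

Unlike linear probing, deletion here simply unlinks a node, so no special marker key is needed.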
Applications
• Cryptography (message digests)
• Compiler Operation (Symbol table)
• Rabin-Karp Algorithm
Symbol Table
• It is a set of name-value pairs
• Associated with each name in the table is an attribute, a collection
of attributes, or some instructions about what further processing is
needed
• They are normally used when building loaders, assemblers, compilers
or any keyword-driven translator
Operations on Symbol Table
• The operations that are generally performed on
a symbol table are:
• Checking for the presence of a particular key
• Retrieving attributes for a particular key value
• Inserting a new name and its value
• Deleting a name and its value
(Figure: a symbol table drawn as a two-column table with Name and Value columns)
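Python's built-in `dict` is itself a hash table, so it can serve as a minimal symbol table. The sketch below maps each of the four operations listed above onto a dict operation. The names and attribute sets are hypothetical.

```python
# A Python dict is itself a hash table, so it can serve as a symbol table.
# The names and attributes below are hypothetical.
symbol_table = {}

# Inserting a new name and its value (here, a small set of attributes)
symbol_table["count"] = {"type": "int", "scope": "local"}
symbol_table["main"] = {"type": "function", "scope": "global"}

# Checking for the presence of a particular key
has_count = "count" in symbol_table

# Retrieving attributes for a particular key value (None if absent)
count_attrs = symbol_table.get("count")

# Deleting a name and its value
del symbol_table["count"]
```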