10 Hashing Indexing
STRUCTURES
Indexing and Hashing
Example
• Consider a large number of records, each with multiple
fields
SID Row No
17103001 1
17103003 3
17103005 5
Multilevel Indexing
• The purpose of indexing is also to reduce the number of
disk accesses
• If the index table is too large, create an index on the index
table. This second index has to be sparse in order to keep it
small
• Repeat this strategy until the top-level index table is
sufficiently small
• Searching starts at the highest level and proceeds
down through the lower levels to the 1st level
Multilevel Indexing(cont’d)
• Example:
2nd level Index
Phone        Row No
7009047379   1
7889541349   3
8284942755   5
1st level Index
Phone        Row No
7009047379   5
7347555334   3
7889541349   1
8284841852   6
8284942755   2
9766749590   4
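The lookup over these two tables can be sketched as below. This is a minimal illustration, assuming that each Row No in the 2nd level Index is the 1-based position of an entry in the 1st level Index, and that each Row No in the 1st level Index is the row of the record in the data file. The names `LEVEL1`, `LEVEL2` and `lookup` are hypothetical.

```python
from bisect import bisect_right

# 1st level index: (phone, row number in the data file), sorted by phone.
LEVEL1 = [
    (7009047379, 5),
    (7347555334, 3),
    (7889541349, 1),
    (8284841852, 6),
    (8284942755, 2),
    (9766749590, 4),
]

# 2nd level (sparse) index: (phone, 1-based row number in LEVEL1).
LEVEL2 = [
    (7009047379, 1),
    (7889541349, 3),
    (8284942755, 5),
]


def lookup(phone):
    """Return the data-file row number for phone, or None if absent."""
    # Step 1: in the 2nd level, find the last entry whose key <= phone.
    keys2 = [k for k, _ in LEVEL2]
    pos = bisect_right(keys2, phone) - 1
    if pos < 0:
        return None  # phone is smaller than every indexed key
    start = LEVEL2[pos][1] - 1  # convert to a 0-based offset into LEVEL1
    # The block ends where the next 2nd-level entry begins.
    end = LEVEL2[pos + 1][1] - 1 if pos + 1 < len(LEVEL2) else len(LEVEL1)
    # Step 2: scan only that block of the 1st level index.
    for key, row in LEVEL1[start:end]:
        if key == phone:
            return row
    return None
```

For example, `lookup(8284841852)` first selects the 2nd level entry 7889541349, then scans only rows 3 and 4 of the 1st level Index instead of the whole table.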
Hash Function
• A hash function is a function that maps data of arbitrary
size to values of a fixed size
• A hash function maps keys to hash codes (also called hash
values or hashes).
• Hash functions have many applications, such as building hash
tables and cryptography
Hashing
• A key in the data set is mapped to a hash code (an index into
the hash table) using a hash function
• The hash table stores the key and a pointer to the
record in the actual data set
Hash Table
Hash Function: Example
• Key%M, where M is the size of the hash table
• Key folding%M: e.g. if keys have 3n digits, split each key
into groups of 3 digits and sum the groups. Then take the sum mod M.
Let key= 123456789
fold keys= 123+456+789 = 1368
%M = 1368%1000 = 368
Let key=789456123
fold keys= 789+456+123=1368
%M = 1368%1000 = 368
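The two hash functions above can be sketched in a few lines. This is a minimal illustration, assuming non-negative integer keys whose decimal digits are split from the left; the function names `mod_hash` and `fold_hash` are hypothetical.

```python
def mod_hash(key, m):
    """Key % M: the remainder of the key divided by the table size."""
    return key % m


def fold_hash(key, m=1000, group=3):
    """Split the decimal digits of key into groups, sum them, take mod m."""
    digits = str(key)
    parts = [int(digits[i:i + group]) for i in range(0, len(digits), group)]
    return sum(parts) % m


print(fold_hash(123456789))  # 123 + 456 + 789 = 1368, and 1368 % 1000 = 368
print(fold_hash(789456123))  # same groups in another order, so also 368
```

Both keys fold to the same sum, so they receive the same hash index even though the keys differ.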
Collision
• A collision occurs when two keys are mapped to the same hash
index
• We need collision resolution techniques.
• Some techniques are:
1. Chaining
2. Open Addressing:
   1. Linear probing
   2. Quadratic probing
   3. Double Hashing
Chaining
• If two or more keys map to the same hash index, create
a linked list at that index and store the keys in it
• Eg. Let M=10 and keys={10,20,30,40,50}
Index  Key
0      10 20 30
1
2      40
3      50
4
5
6
7
8
9
• Any number of keys can be accommodated
• Search time increases with the length of the chain
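Chaining can be sketched as follows. This is a minimal illustration, assuming the hash function Key%M and using Python lists as a stand-in for linked lists. The class name `ChainedHashTable` and the sample keys are hypothetical and are not the keys from the example above.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (sketch)."""

    def __init__(self, m=10):
        self.m = m
        # One chain per hash index; a Python list stands in for a linked list.
        self.buckets = [[] for _ in range(m)]

    def insert(self, key):
        chain = self.buckets[key % self.m]
        if key not in chain:  # ignore duplicate keys
            chain.append(key)

    def contains(self, key):
        # Only the chain at the key's hash index has to be scanned.
        return key in self.buckets[key % self.m]


table = ChainedHashTable(10)
for k in (10, 20, 32, 45, 52):  # hypothetical sample keys
    table.insert(k)
print(table.buckets[0])  # chain holding 10 and 20
print(table.buckets[2])  # chain holding 32 and 52
```

A search only walks one chain, so its cost depends on how many keys share that index rather than on the total number of keys.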
Linear Probing
• If the hash index is not available, store the key at the next
available index
• Add i to the hash code and take %M, for i = 1, 2, 3, 4, …, until
a free index is found
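Insertion with linear probing can be sketched as below. This is a minimal illustration, assuming the hash function Key%M, a fixed-size table in which `None` marks a free slot, and no deletions. The function name `linear_probe_insert` is hypothetical.

```python
def linear_probe_insert(table, key):
    """Insert key using linear probing; return the index that was used."""
    m = len(table)
    home = key % m
    for i in range(m):  # i = 0 is the home slot, then i = 1, 2, 3, ...
        idx = (home + i) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")


table = [None] * 10
for k in (10, 20, 30, 11):
    print(k, "->", linear_probe_insert(table, k))
```

Here 10 takes index 0, 20 moves on to index 1, 30 to index 2, and 11 finds its home index 1 taken and ends up at index 3.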
Quadratic probing
• If the hash index is not available, increase the hash code
by i + i² and take %M, where i = 1, 2, 3, 4, …
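Quadratic probing with this offset can be sketched as below. This is a minimal illustration, assuming the hash function Key%M, the result taken %M after adding the offset, and `None` marking a free slot. The function name `quadratic_probe_insert` is hypothetical.

```python
def quadratic_probe_insert(table, key):
    """Insert key using the probe offsets i + i*i; return the index used."""
    m = len(table)
    home = key % m
    for i in range(m):  # i = 0 is the home slot (offset 0)
        idx = (home + i + i * i) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    # Unlike linear probing, this sequence need not visit every slot.
    raise RuntimeError("no free slot found on the probe sequence")


table = [None] * 10
for k in (10, 20, 30):
    print(k, "->", quadratic_probe_insert(table, k))
```

With M=10 the offsets 2, 6, 12, 20, … reach only indices 0, 2 and 6 from home index 0, so a fourth key hashing to 0 would fail even though the table is mostly empty. The choice of M therefore matters for this scheme.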
Double Hashing
• Use two hash functions, H1 and H2, to generate the hash code for any key
• The i-th probe is (H1 + i*H2)%M, where i = 0, 1, 2, …
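Double hashing can be sketched as below, assuming the usual formulation in which the i-th probe is (H1 + i*H2)%M. The two hash functions here are hypothetical choices for illustration: H1 = Key%M and H2 = 1 + Key%(M-1), which is never 0. M is taken to be prime so that every slot can be reached.

```python
def h1(key, m):
    return key % m


def h2(key, m):
    # Hypothetical second hash; the +1 keeps the step size from being 0.
    return 1 + key % (m - 1)


def double_hash_insert(table, key):
    """Insert key using double hashing; return the index that was used."""
    m = len(table)
    start, step = h1(key, m), h2(key, m)
    for i in range(m):
        idx = (start + i * step) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")


table = [None] * 11  # M = 11 (prime)
for k in (22, 33, 44, 15):
    print(k, "->", double_hash_insert(table, k))
```

Keys 22, 33 and 44 all have H1 = 0, but their different H2 values send them along different probe sequences (indices 0, 4 and 5), which is what separates double hashing from linear probing.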
Search in Hash Table
• To search for a key, generate the hash index for the
key and look at that location in the hash table
• If the key is not found there, continue probing according to the
collision resolution technique used, until the key is found or an
empty location is reached
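Search under linear probing can be sketched as below. This is a minimal illustration, assuming the hash function Key%M, `None` marking an empty slot, and a table from which no keys have been deleted. The function name `linear_probe_search` is hypothetical.

```python
def linear_probe_search(table, key):
    """Return the index of key in the table, or -1 if it is not present."""
    m = len(table)
    home = key % m
    for i in range(m):
        idx = (home + i) % m
        if table[idx] is None:
            return -1  # an empty slot ends the probe sequence: key is absent
        if table[idx] == key:
            return idx
    return -1  # every slot was probed without finding the key


# Table built by inserting 10, 20, 30, 11 with linear probing and M = 10.
table = [10, 20, 30, 11, None, None, None, None, None, None]
print(linear_probe_search(table, 11))  # probes indices 1, 2, 3
print(linear_probe_search(table, 21))  # probes 1, 2, 3, then stops at empty 4
```

The search must follow the same probe sequence that insertion used; otherwise a key stored away from its home index would never be found.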
Perfect Hash Function
• If each key in a given data set is mapped to a unique hash
code, the function is called a perfect hash function
• Hard to achieve in practice
• Search time using a perfect hash function is always
constant, since there are no collisions to resolve
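Whether a hash function is perfect for a particular data set can be checked directly, as in the sketch below. The helper name `is_perfect` and the sample keys are hypothetical.

```python
def is_perfect(hash_fn, keys):
    """Return True if hash_fn gives every distinct key its own hash code."""
    distinct = set(keys)
    codes = {hash_fn(k) for k in distinct}
    return len(codes) == len(distinct)


def mod10(key):
    return key % 10


print(is_perfect(mod10, [10, 21, 32]))  # codes 0, 1, 2 are all different
print(is_perfect(mod10, [10, 20, 30]))  # all three keys map to code 0
```

The same function can be perfect for one data set and not for another, which is why the definition is always stated with respect to a given set of keys.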