CSN 102: DATA STRUCTURES
Indexing and Hashing
Example
• Consider a large number of records, with multiple fields in
each record

SID       Name              Email                         Phone
17103001  Paras Gupta       paras.gupta986745@gmail.com   7889541349
17103002  Sanamdeep Singh   sanamdeepsingh1@gmail.com     8284942755
17103003  Shreya Gupta      smartsweetcherry@gmail.com    7347555334
17103004  Ashutosh sah      ashutoshsah2000@gmail.com     9766749590
17103005  Barleen Dhaliwal  barleen.dhaliwal@gmail.com    7009047379
17103006  Pranav Dhingra    prnvdhngr323@gmail.com        8284841852
Motivation for Indexing
• Search(SID=17103004): Apply binary search on the SID
column, since the records are sorted on SID. Time complexity = O(log n)
• Search(Phone=7347555334): Linear search over the Phone
column. Time complexity = O(n)
OR
Sort the complete data on the Phone column, and then binary
search. Time complexity = O(n log n) + O(log n)
= O(n log n)

• Sorting and then searching is therefore not efficient: it costs
more than the linear search it was meant to replace.


Indexing
• Create a table with two entries per record (key, pointer to record)
for the column. Sort this table on the key. [This is a one-time cost.]
• The next time a search is made on a column the data set is not
sorted on, binary-search this index table to get the address of the
desired record in O(log n)
Index Table: Example
Phone Row No
7009047379 5
7347555334 3
7889541349 1
8284841852 6
8284942755 2
9766749590 4

• Search(Phone=7347555334): O(log n) using index table
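
The index table above can be sketched in a few lines of Python. This is only an illustration: the names `build_index` and `index_search` are made up for this sketch, and row numbers start at 1 as in the table.

```python
from bisect import bisect_left

# Phone column of the example data set, in row order (row 1 first).
phones = [7889541349, 8284942755, 7347555334,
          9766749590, 7009047379, 8284841852]

def build_index(column):
    """Return (key, row_no) pairs sorted by key: the one-time O(n log n) step."""
    return sorted((key, row_no) for row_no, key in enumerate(column, start=1))

def index_search(index, key):
    """Binary-search the index table; return the row number, or None if absent."""
    # (key,) sorts just before any (key, row_no) pair with the same key.
    pos = bisect_left(index, (key,))
    if pos < len(index) and index[pos][0] == key:
        return index[pos][1]
    return None

phone_index = build_index(phones)
print(index_search(phone_index, 7347555334))  # row number of that phone
```

The data rows themselves are never reordered; only the small (key, row) table is sorted.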


Types of Indexing
• Dense indexing: The index table has an entry for every record,
with a pointer to the corresponding record
• Searching with a dense index means locating the key in the index
table and then accessing the record through its pointer
• E.g.
Phone Row No
7009047379 5
7347555334 3
7889541349 1
8284841852 6
8284942755 2
9766749590 4
Types of Indexing (cont’d)
• Sparse indexing: The index table has entries for only some records
• Possible only when the records are sorted on the key
• To search for a key, find the range of records within which the
desired record lies, then scan that range
• E.g.

SID Row No
17103001 1
17103003 3
17103005 5
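
A minimal sketch of a sparse-index lookup over the example above, assuming the records are stored sorted on SID and rows are numbered from 1. The function name `sparse_search` is illustrative only.

```python
from bisect import bisect_right

# Records sorted on SID (row 1 first), taken from the example data set.
records = [
    (17103001, "Paras Gupta"),
    (17103002, "Sanamdeep Singh"),
    (17103003, "Shreya Gupta"),
    (17103004, "Ashutosh sah"),
    (17103005, "Barleen Dhaliwal"),
    (17103006, "Pranav Dhingra"),
]

# Sparse index: only every second record has an entry (SID, row number).
sparse_index = [(17103001, 1), (17103003, 3), (17103005, 5)]

def sparse_search(sid):
    """Return the row number for sid, or None if no such record exists."""
    keys = [k for k, _ in sparse_index]
    pos = bisect_right(keys, sid) - 1      # last index entry with key <= sid
    if pos < 0:
        return None                        # sid is smaller than every indexed key
    start = sparse_index[pos][1]
    # The range ends where the next index entry begins (or at the end of the data).
    end = sparse_index[pos + 1][1] if pos + 1 < len(sparse_index) else len(records) + 1
    for row_no in range(start, end):       # short sequential scan inside the range
        if records[row_no - 1][0] == sid:
            return row_no
    return None

print(sparse_search(17103004))  # found by scanning the range that starts at row 3
```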
Multilevel Indexing
• Another purpose of indexing is to reduce the number of
disk accesses
• If the index table is too large, create an index on the index
table. This higher-level index has to be sparse, so that it is
smaller than the table it indexes
• Repeat the above strategy until the top-level index table is
sufficiently small
• Searching starts at the highest level and proceeds down
through the lower levels to the 1st level
Multilevel Indexing (cont’d)
• Example:

2nd level Index
Phone       Row No
7009047379  1
7889541349  3
8284942755  5

1st level Index
Phone       Row No
7009047379  5
7347555334  3
7889541349  1
8284841852  6
8284942755  2
9766749590  4
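
The two-level search can be sketched as follows. The sketch assumes that "Row No" in the 2nd level index refers to a row of the 1st level index (numbered from 1), and that "Row No" in the 1st level index refers to a row of the data set. The name `multilevel_search` is illustrative.

```python
from bisect import bisect_right

# 1st level: dense index on Phone -> row number in the data set.
first_level = [(7009047379, 5), (7347555334, 3), (7889541349, 1),
               (8284841852, 6), (8284942755, 2), (9766749590, 4)]

# 2nd level: sparse index on Phone -> row number in the 1st level index.
second_level = [(7009047379, 1), (7889541349, 3), (8284942755, 5)]

def multilevel_search(phone):
    """Search the 2nd level first, then only the matching slice of the 1st level."""
    keys = [k for k, _ in second_level]
    pos = bisect_right(keys, phone) - 1    # last 2nd-level entry with key <= phone
    if pos < 0:
        return None
    start = second_level[pos][1]
    end = (second_level[pos + 1][1] if pos + 1 < len(second_level)
           else len(first_level) + 1)
    for i in range(start, end):            # scan the narrowed 1st-level range
        key, row_no = first_level[i - 1]
        if key == phone:
            return row_no
    return None

print(multilevel_search(8284841852))  # data-set row holding this phone number
```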
Hash Function
• A hash function is a function that maps data of arbitrary
size to values of a fixed size
• A hash function maps keys to hash codes (also called hash
values or hashes).
• Hash functions have many applications, such as building hash
tables and cryptography
Hashing
• A key in the data set is mapped to a hash code (an index into the
hash table) using a hash function
• The hash table stores, at that index, the key and a pointer to the
record in the actual data set

Key → Hash Function → Hash Code
Hash Table: Example

SID       Name              Phone
17103001  Paras Gupta       7889541349
17103002  Sanamdeep Singh   8284942755
17103003  Shreya Gupta      7347555334
17103004  Ashutosh sah      9766749590
17103005  Barleen Dhaliwal  7009047378
17103006  Pranav Dhingra    8284841852

Hash Table
Index  Phone       Row
0      9766749590  3
1
2      8284841852  5
3
4      7347555334  2
5      8284942755  1
6
7
8      7009047378  4
9      7889541349  0
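
The hash function behind this example is not stated here. The sketch below assumes it is Phone % 10, which is consistent with every filled slot in the table, and that data rows are numbered from 0. `build_hash_table` is a hypothetical helper name.

```python
# Phone column of the example data set, in row order (row 0 first).
phones = [7889541349, 8284942755, 7347555334,
          9766749590, 7009047378, 8284841852]
M = 10  # size of the hash table

def build_hash_table(keys, m):
    """Place each (key, row) at index key % m. Assumed hash function."""
    table = [None] * m
    for row, key in enumerate(keys):
        idx = key % m
        if table[idx] is not None:
            # This sketch has no collision handling; the example data needs none.
            raise ValueError(f"collision at index {idx}")
        table[idx] = (key, row)
    return table

hash_table = build_hash_table(phones, M)
for idx, slot in enumerate(hash_table):
    print(idx, slot)
```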
Hash Function: Example
• Key%M, where M is the size of the hash table
• Key folding %M: e.g., if keys have 3n digits, split the key into
groups of 3 digits and sum the groups. Then take the sum mod M.
Let key= 123456789
fold keys= 123+456+789 = 1368
%M = 1368%1000 = 368

Let key=789456123
fold keys= 789+456+123=1368
%M = 1368%1000 = 368
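
Both hash functions can be written down directly. The function names `mod_hash` and `fold_hash` are illustrative. If the number of digits is not a multiple of 3, this sketch simply treats the last group as shorter, which is an assumption not covered by the example.

```python
def mod_hash(key, m):
    """Key % M."""
    return key % m

def fold_hash(key, m, group=3):
    """Split the decimal digits into groups of `group`, sum them, take % M."""
    digits = str(key)
    total = sum(int(digits[i:i + group]) for i in range(0, len(digits), group))
    return total % m

print(fold_hash(123456789, 1000))  # 123 + 456 + 789 = 1368, and 1368 % 1000 = 368
print(fold_hash(789456123, 1000))  # same groups in a different order, same hash code
```

The two keys fold to the same hash code, so they would land on the same index of the hash table.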
Collision
• A collision occurs when two keys are mapped to the same hash
index
• We need collision resolution techniques.
• Some techniques are:
  1. Chaining
  2. Open Addressing:
     1. Linear probing
     2. Quadratic probing
     3. Double Hashing
Chaining
• If two or more keys map to the same hash index, create
a linked list at that index and store the keys in it
• E.g., let M=10 and keys={10,20,30,40,50}. With Key%M, every
key maps to index 0:

Index  Keys
0      10 → 20 → 30 → 40 → 50
1-9    (empty)

• Any number of keys can be accommodated
• Search time increases, since the list at an index must be scanned
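
A small sketch of chaining, assuming the hash function Key%M. For brevity each chain is a Python list rather than a hand-built linked list; the class name `ChainedHashTable` is illustrative.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by keeping a chain per index."""

    def __init__(self, m=10):
        self.m = m
        self.buckets = [[] for _ in range(m)]   # one (initially empty) chain per index

    def insert(self, key):
        chain = self.buckets[key % self.m]
        if key not in chain:                    # ignore duplicate keys
            chain.append(key)

    def search(self, key):
        # Only the chain at the key's hash index has to be scanned.
        return key in self.buckets[key % self.m]

table = ChainedHashTable(m=10)
for k in (10, 20, 30, 40, 50):
    table.insert(k)

print(table.buckets[0])   # all five keys share the chain at index 0
print(table.search(30), table.search(60))
```

The example keys are a worst case: every key collides, so a search degrades to a linear scan of one chain.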
Linear Probing
• If the hash index is already occupied, store the key at the next
available index
• Add i to the hash code and take %M, i.e. try (hash code + i)%M
for i = 1,2,3,4…
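
Insertion with linear probing might look like the sketch below, assuming the hash code is Key%M and that empty slots hold `None`. The function name is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key with linear probing; return the index where it was stored."""
    m = len(table)
    h = key % m                      # assumed hash function: Key % M
    for i in range(m):               # i = 0 is the home slot, then i = 1, 2, 3, ...
        idx = (h + i) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("hash table is full")

table = [None] * 10
for k in (10, 20, 30, 19, 29):
    print(k, "->", linear_probe_insert(table, k))
```

Key 29 hashes to index 9, which is taken by 19, so the probe wraps around to the start of the table and continues until it finds a free index.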
Quadratic probing
• If the hash index is already occupied, increase the hash code
by i+i² and take %M, where i=1,2,3,4…
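
A sketch of insertion with this probe sequence, assuming the hash code is Key%M. Note that the sequence need not visit every slot: the offset i+i² is always even, and for M=10 only the offsets 0, 2 and 6 are ever produced, so an insertion can fail even when the table still has free slots. The function name is illustrative.

```python
def quadratic_probe_insert(table, key):
    """Insert key, probing (h + i + i*i) % M for i = 0, 1, 2, ..."""
    m = len(table)
    h = key % m                          # assumed hash function: Key % M
    for i in range(m):                   # give up after M attempts
        idx = (h + i + i * i) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no free slot found along the probe sequence")

table = [None] * 10
for k in (10, 20, 30):
    print(k, "->", quadratic_probe_insert(table, k))
```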
Double Hashing
• Use two hash functions, H1 and H2, to generate the hash code for a key
• (H1+H2)%M; if that index is also occupied, continue with
(H1+i·H2)%M for i=2,3,4…
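
In a common formulation of double hashing, the i-th probe is (H1 + i·H2)%M for i = 0,1,2,…, so that i = 1 gives the (H1+H2)%M above. The sketch below follows that formulation. The choices H1 = Key%M, H2 = 1 + Key%(M-1) and the prime table size M=11 are assumptions made for illustration; H2 is built so that it is never 0.

```python
def h1(key, m):
    return key % m                   # assumed primary hash function

def h2(key, m):
    return 1 + key % (m - 1)         # assumed secondary hash function, never 0

def double_hash_insert(table, key):
    """Insert key, probing (H1 + i*H2) % M for i = 0, 1, 2, ..."""
    m = len(table)
    start, step = h1(key, m), h2(key, m)
    for i in range(m):
        idx = (start + i * step) % m
        if table[idx] is None:
            table[idx] = key
            return idx
    raise RuntimeError("no free slot found along the probe sequence")

table = [None] * 11                  # a prime size lets every step reach every slot
for k in (22, 33, 44):
    print(k, "->", double_hash_insert(table, k))
```

All three keys have the same H1, but their H2 values differ, so they move away from the collision in different step sizes.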
Search in Hash Table
• To search for a key, generate the hash index for the
key and look at that location in the hash table
• If the key is not found there, continue probing according to the
collision resolution technique used.
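
As an illustration, here is a search that matches linear probing, assuming the hash code is Key%M, that empty slots hold `None`, and that keys are never deleted (so reaching an empty slot proves the key is absent). The function name is illustrative.

```python
def linear_probe_search(table, key):
    """Return the index of key in a linearly probed table, or None if absent."""
    m = len(table)
    h = key % m                      # assumed hash function: Key % M
    for i in range(m):
        idx = (h + i) % m
        if table[idx] is None:
            return None              # empty slot: the key was never inserted
        if table[idx] == key:
            return idx
    return None                      # every slot was checked without a match

# Table after inserting 10, 20, 30 with linear probing (all hash to index 0).
table = [10, 20, 30, None, None, None, None, None, None, None]
print(linear_probe_search(table, 30))
print(linear_probe_search(table, 40))
```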
Perfect Hash Function
• If each key in a given data set is mapped to a unique hash
code, the function is called a perfect hash function
• Hard to achieve in practice
• Search time using a perfect hash function is always
constant, O(1), since no collisions occur.
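
Whether a given function is perfect for a given data set can be checked by looking for duplicate hash codes. The helper `is_perfect` is a hypothetical name; the phone numbers are those of the earlier hash table example, and Key%10 is the assumed hash function.

```python
def is_perfect(hash_fn, keys):
    """True if hash_fn maps every key in keys to a distinct hash code."""
    codes = [hash_fn(k) for k in keys]
    return len(set(codes)) == len(codes)

phones = [7889541349, 8284942755, 7347555334,
          9766749590, 7009047378, 8284841852]

print(is_perfect(lambda k: k % 10, phones))            # no two phones share a code
print(is_perfect(lambda k: k % 10, [10, 20, 30]))      # all three keys map to 0
```

The same function is perfect for one data set and not for another, which is why the definition is always relative to a given set of keys.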
