
Data Structures and Applications

Course Code: BCS304
Module 5
TEXT BOOKS

1. Data Structures using C, Reema Thareja, 2nd Edition, 2018, Oxford University Press.
2. Data Structures using C, Aaron M Tenenbaum, Yedidyah Langsam and Moshe J Augenstein,
2014, low price edition, Pearson education.

Module 5
Sorting and Hashing
• Sorting: Introduction to Sorting
• Radix Sort
• Heap Sort
• Hashing and Collision: Introduction
• Hash Tables
• Different Hash Functions
• Collisions
• Pros and Cons of Hashing
• Applications of Hashing

Sorting
• Sorting means arranging the elements of an array so that they are placed in
some relevant order which may be either ascending or descending.
• Efficient sorting algorithms are widely used to optimize the use of other
algorithms like search and merge algorithms which require sorted lists to
work correctly.
• There are two types of sorting:
• Internal sorting which deals with sorting the data stored in the
computer’s memory
• External sorting which deals with sorting the data stored in files.
• External sorting is applied when there is voluminous data that cannot be
stored in the memory.
RADIX SORT
• Radix sort is a linear sorting algorithm for integers and uses the concept of sorting
names in alphabetical order.
• When a list of names is sorted, the radix is 26 (that is, 26 buckets are used) because
there are 26 letters in the English alphabet.
• Because it distributes the elements into buckets, radix sort is also known as bucket sort.
• Words are first sorted according to the first letter of the name.
• That is, 26 classes are used to arrange the names, where the first class stores the names
that begin with A, the second class contains the names with B, and so on.
• During the second pass, names are grouped according to the second letter.
• This process is continued till the nth pass, where n is the length of the name with
maximum number of letters.
• After every pass, all the names are collected in order of buckets.
• That is, first pick up the names in the first bucket, which contains the names beginning
with A.
• Then collect the names from the second bucket, and so on.
Sort the numbers given below using radix sort.
345, 654, 924, 123, 567, 472, 555, 808, 911

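• First pass: the units place is sorted, giving 911, 472, 123, 654, 924, 345, 555, 567, 808.
• Second pass: the tens place is sorted, giving 808, 911, 123, 924, 345, 654, 555, 567, 472.
• Third pass: the hundreds place is sorted.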

• The numbers are collected bucket by bucket.


• The new list thus formed is the final sorted result:
123, 345, 472, 555, 567, 654, 808, 911, 924.
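The three passes of this example can be traced with a small C sketch. The helper name `radix_pass` and the bound `MAX_N` are assumptions made for illustration; the function performs a single bucket pass on one decimal digit and assumes non-negative integers and at most `MAX_N` elements.

```c
#define RADIX 10   /* decimal digits, so ten buckets (0 to 9) */
#define MAX_N 32   /* assumed upper bound on the number of elements */

/* One bucket pass: distribute arr[0..n-1] into buckets by the digit
   selected by divisor (1 = units, 10 = tens, 100 = hundreds), then
   collect the buckets in order 0..9 back into arr. Elements that fall
   into the same bucket keep their relative order. */
void radix_pass(int arr[], int n, int divisor)
{
    int bucket[RADIX][MAX_N];
    int count[RADIX] = {0};
    int i, j, k = 0;

    for (i = 0; i < n; i++) {
        int digit = (arr[i] / divisor) % RADIX;
        bucket[digit][count[digit]++] = arr[i];
    }
    for (i = 0; i < RADIX; i++)
        for (j = 0; j < count[i]; j++)
            arr[k++] = bucket[i][j];
}
```

Calling `radix_pass` on the list above with divisors 1, 10 and 100 in turn should leave the array in the final sorted order; the array after each call corresponds to one pass of the example.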

Algorithm for RadixSort (ARR, N)
• Step 1: Find the largest number in ARR as LARGE
• Step 2: SET NOD = number of digits in LARGE (the largest number)
• Step 3: SET PASS = 0
• Step 4: Repeat Steps 5 to 10 while PASS <= NOD-1
• Step 5: SET I = 0 and INITIALIZE buckets
• Step 6: Repeat Steps 7 to 9 while I <= N-1
• Step 7: SET DIGIT = digit at PASSth place in ARR[I]
• Step 8: Add ARR[I] to the bucket numbered DIGIT
• Step 9: INCREMENT bucket count for bucket numbered DIGIT and SET I = I + 1
[END OF LOOP]
• Step 10: Collect the numbers in the buckets and SET PASS = PASS + 1
[END OF LOOP]
• Step 11: END
Radix sort
#include <stdio.h>
#define size 10
int largest(int arr[], int n);
void radix_sort(int arr[], int n);
int main()
{
    int arr[size], i, n;
    printf("\n Enter the value of n: ");   // n must not exceed size
    scanf("%d", &n);
    printf("\n Enter the array elements: ");
    for(i=0;i<n;i++)
        scanf("%d", &arr[i]);
    radix_sort(arr, n);
    printf("\n The sorted array is: \n");
    for(i=0;i<n;i++)
        printf(" %d\t", arr[i]);
    return 0;
}
// returns the largest element of arr
int largest(int arr[], int n)
{
    int large=arr[0], i;
    for(i=1;i<n;i++)
    {
        if(arr[i]>large)
            large = arr[i];
    }
    return large;
}
void radix_sort(int arr[], int n)
{
    int bucket[size][size], bcount[size];
    int i, j, k, rem;
    int NOD=0, divisor=1, large, pass;
    large = largest(arr, n);
    // count the number of digits in the largest number
    while(large>0)
    {
        NOD++;
        large /= size;
    }
    for(pass=0; pass<NOD; pass++)
    {
        // Initialize the buckets
        for(i=0;i<size;i++) //0-9
            bcount[i]=0;
        for(i=0;i<n;i++)
        {
            // sort numbers according to digit place
            rem = (arr[i]/divisor)%size;
            bucket[rem][bcount[rem]] = arr[i];
            bcount[rem] += 1;
        }
        // collect back the numbers after PASS pass
        i=0;
        for(k=0;k<size;k++)
        {
            for(j=0;j<bcount[k];j++)
                arr[i++] = bucket[k][j];
        }
        divisor *= size;
    }
}
HEAP SORT
• Given an array ARR with n elements, the heap sort algorithm is used to sort
ARR in two phases:
• Phase 1- build a heap H using the elements of ARR.
• Phase 2- After the creation of the heap, remove the root element of the heap by
shifting it to the end of the array, and then restore the heap structure with the
remaining elements.
• Repeat Phase 2 until the heap is empty.
• In a max heap, the largest value in heap H is always present at the root node.
• So in phase 2, each time the root element is deleted, the elements of ARR are
collected in decreasing order.

Algorithm
• HEAPSORT(ARR, N)
• Step 1: [Build Heap H]
Repeat for I = 0 to N-1
CALL Insert_Heap(ARR, N, ARR[I])
[END OF LOOP]
• Step 2: [Repeatedly delete the root element]
Repeat while N > 0
CALL Delete_Heap(ARR, N, VAL)
SET N = N - 1
[END OF LOOP]
• Step 3: END
Example- 81, 89, 9, 11, 14, 76, 54, 22
• First, construct a heap from the given array and convert it into a max heap. After
conversion, the array elements are 89, 81, 76, 22, 14, 9, 54, 11.
• Next, delete the root element (89) from the max heap by swapping it with the last node
(11). After deleting the root element, heapify again to restore the max heap. The array
elements are now 81, 22, 76, 11, 14, 9, 54, 89.
• Delete the root element (81) by swapping it with the last node (54), then heapify again.
The array elements are now 76, 22, 54, 11, 14, 9, 81, 89.
• Delete the root element (76) by swapping it with the last node (9), then heapify again.
The array elements are now 54, 22, 9, 11, 14, 76, 81, 89.
• Delete the root element (54) by swapping it with the last node (14), then heapify again.
The array elements are now 22, 14, 9, 11, 54, 76, 81, 89.
• Delete the root element (22) by swapping it with the last node (11), then heapify again.
The array elements are now 14, 11, 9, 22, 54, 76, 81, 89.
• Delete the root element (14) by swapping it with the last node (9), then heapify again.
The array elements are now 11, 9, 14, 22, 54, 76, 81, 89.
• Delete the root element (11) by swapping it with the last node (9). The array elements
are now 9, 11, 14, 22, 54, 76, 81, 89.
• Now the heap has only one element left, and after deleting it the heap will be empty.
• After completion of sorting, the array elements are 9, 11, 14, 22, 54, 76, 81, 89.
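The two phases can be sketched in C as follows. This is a minimal illustration, not the slides' own routine: instead of calling Insert_Heap once per element, it builds the heap bottom-up with a sift-down helper, which is a common alternative. The names `sift_down`, `build_max_heap` and `heap_sort` are chosen here for illustration.

```c
/* Restore the max-heap property for the subtree rooted at index i,
   considering only the first n elements of arr. */
void sift_down(int arr[], int n, int i)
{
    for (;;) {
        int largest = i;
        int left = 2 * i + 1;
        int right = 2 * i + 2;
        int tmp;

        if (left < n && arr[left] > arr[largest])
            largest = left;
        if (right < n && arr[right] > arr[largest])
            largest = right;
        if (largest == i)
            return;               /* heap property holds here */
        tmp = arr[i];
        arr[i] = arr[largest];
        arr[largest] = tmp;
        i = largest;              /* continue with the child we swapped into */
    }
}

/* Phase 1: turn arr[0..n-1] into a max heap (bottom-up construction). */
void build_max_heap(int arr[], int n)
{
    int i;
    for (i = n / 2 - 1; i >= 0; i--)
        sift_down(arr, n, i);
}

/* Phase 2: repeatedly swap the root with the last heap element,
   shrink the heap by one, and restore the heap property. */
void heap_sort(int arr[], int n)
{
    int end, tmp;

    build_max_heap(arr, n);
    for (end = n - 1; end > 0; end--) {
        tmp = arr[0];
        arr[0] = arr[end];
        arr[end] = tmp;
        sift_down(arr, end, 0);
    }
}
```

For the example array 81, 89, 9, 11, 14, 76, 54, 22, `build_max_heap` places 89 at the root and 11 in the last position, and `heap_sort` leaves the array in ascending order.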
Hashing and Collision
• Hashing is the process of mapping keys and their values into a hash table by
using a hash function.
• It converts a data set of variable size into a data set of a fixed size.
• It is done for faster access to elements, using a lookup table called a hash table
and a lookup function called a hash function.
• When two or more keys map to the same memory location, a collision occurs.

HASH TABLES
• Hash table is a data structure in which keys are mapped to array positions by
a hash function.
• For example, a hash function can extract the last one or two digits of the key
and use them to map the key to an array location (array index).
• In a hash table, an element with key k is stored at index h(k) and not k.
• It means a hash function h is used to calculate the index at which the element
with key k will be stored.
• The process of mapping the keys to appropriate locations (or indices) in
a hash table is called hashing.
• The main goal of using a hash function is to reduce the range of array indices
that have to be handled.
HASH FUNCTIONS
• A hash function is a mathematical formula which, when applied to a key,
produces an integer which can be used as an index for the key in the hash
table.
• The main aim of a hash function is that elements should be distributed
relatively randomly and uniformly.
• It produces a unique set of integers within some suitable range in order to
reduce the number of collisions.
• In practice, there is no hash function that eliminates collisions completely.
• A good hash function can only minimize the number of collisions by
spreading the elements uniformly throughout the array.

Properties of a Good Hash Function
• Low cost : The cost of executing a hash function must be small, so that
using the hashing technique becomes preferable over other approaches.
• Determinism : A hash procedure must be deterministic. This means that
the same hash value must be generated for a given input value.
• This criterion excludes hash functions that depend on external variable
parameters (such as the time of day) and on the memory address of the object being
hashed (because address of the object may change during processing).
• Uniformity : A good hash function must map the keys as evenly as possible
over its output range.
• This means that the probability of generating every hash value in the output range
should roughly be the same. The property of uniformity also minimizes the number
of collisions.
DIFFERENT HASH FUNCTIONS
1. Division Method - the simplest method of hashing an integer x.
• This method divides x by M and then uses the remainder obtained.
• In this case, the hash function can be given as h(x) = x mod M
• Choose M to be a prime number, because a prime M increases the likelihood that
the keys are mapped uniformly over the output range of values.
2. Multiplication Method
• Step 1: Choose a constant A such that 0 < A < 1.
• Step 2: Multiply the key k by A.
• Step 3: Extract the fractional part of kA.
• Step 4: Multiply the result of Step 3 by the size of hash table (m).
• Hence, the hash function can be given as: h(k) = ⌊m (kA mod 1)⌋
• Given a hash table of size m=1000, map the key k=12345 to an appropriate
location in the hash table.
• Solution: use A = 0.618033, m = 1000, and k = 12345
• h(12345) = ⌊1000 (12345 × 0.618033 mod 1)⌋
• h(12345) = ⌊1000 (7629.617385 mod 1)⌋
• h(12345) = ⌊1000 (0.617385)⌋
• h(12345) = ⌊617.385⌋
• h(12345) = 617
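The division and multiplication methods can be sketched in C as below. The function names are illustrative only, keys are assumed to be non-negative, and the fractional part is taken by truncating the product rather than by calling a library routine.

```c
/* Division method: h(x) = x mod M. */
unsigned int hash_division(unsigned int x, unsigned int m)
{
    return x % m;
}

/* Multiplication method: h(k) = floor(m * (k*A mod 1)), with 0 < A < 1.
   "k*A mod 1" is the fractional part of k*A. */
unsigned int hash_multiplication(unsigned int k, unsigned int m, double a)
{
    double product = (double)k * a;
    /* fractional part: subtract the integer part of the product */
    double fraction = product - (double)(unsigned long long)product;
    return (unsigned int)((double)m * fraction);   /* truncation acts as floor */
}
```

With the values from the worked example, `hash_multiplication(12345, 1000, 0.618033)` evaluates to 617. For the division method, a prime table size such as 97 (an arbitrary choice here) gives `hash_division(12345, 97)` = 26.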

3. Mid-Square Method
• The mid-square method is a good hash function which works in two steps:
• Step 1: Square the value of the key k. That is, find k².
• Step 2: Extract the middle r digits of the result obtained in Step 1.
h(k) = s, where s is obtained by selecting r digits from k².

• Calculate the hash value for keys 1234 and 5642 using the mid-square method.
• The hash table has 100 memory locations.
• Solution: Note that the hash table has 100 memory locations whose indices vary
from 0 to 99. This means that only two digits are needed to map the key to a location
in the hash table, so r = 2.
• When k = 1234, k² = 1522756, h(1234) = 27
• When k = 5642, k² = 31832164, h(5642) = 21
• The 3rd and 4th digits starting from the right are chosen.
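A minimal C sketch of this calculation, assuming a table of 100 locations (r = 2) and the same digit choice as the example, the 3rd and 4th digits from the right. The function name `hash_mid_square` is illustrative.

```c
/* Mid-square method for a table of 100 locations (r = 2):
   square the key, drop the two rightmost digits, and keep the
   next two digits (the 3rd and 4th digits from the right). */
unsigned int hash_mid_square(unsigned long k)
{
    unsigned long long square = (unsigned long long)k * k;
    return (unsigned int)((square / 100) % 100);
}
```

For the keys above, `hash_mid_square(1234)` gives 27 and `hash_mid_square(5642)` gives 21. A different table size would need a different divisor and modulus.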
4. Folding Method - works in the following two steps:
• Step 1: Divide the key value into a number of parts. That is, divide k into parts k1, k2, ...,
kn, where each part has the same number of digits except the last part, which may have
fewer digits than the other parts.
• Step 2: Add the individual parts. That is, obtain the sum of k1 + k2 + ... + kn. The hash
value is produced by ignoring the last carry, if any.

• Given a hash table of 100 locations, calculate the hash value using the folding
method for keys 5678, 321, and 34567.
• Solution: Since there are 100 memory locations to address, we will break the key into
parts where each part (except the last) will contain two digits. The hash values can be
obtained as shown below:
Key      Parts           Sum    Hash value
5678     56 and 78       134    34 (ignore the last carry)
321      32 and 1        33     33
34567    34, 56 and 7    97     97
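The folding steps for these keys can be sketched in C as follows. The sketch assumes a table of 100 locations, so the key's decimal digits are split from the left into two-digit parts; the function name `hash_folding` is illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Folding method for a table of 100 locations: split the decimal
   digits of the key, from the left, into two-digit parts (the last
   part may have a single digit), add the parts, and ignore any
   carry beyond two digits. */
unsigned int hash_folding(unsigned long key)
{
    char digits[32];
    unsigned int sum = 0;
    size_t len, i;

    snprintf(digits, sizeof digits, "%lu", key);
    len = strlen(digits);
    for (i = 0; i < len; i += 2) {
        unsigned int part = (unsigned int)(digits[i] - '0');
        if (i + 1 < len)
            part = part * 10 + (unsigned int)(digits[i + 1] - '0');
        sum += part;
    }
    return sum % 100;   /* drop the carry, keeping two digits */
}
```

For the three keys in the exercise, the function yields 34 for 5678 (56 + 78 = 134, carry ignored), 33 for 321 (32 + 1) and 97 for 34567 (34 + 56 + 7).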

COLLISIONS
• Collisions occur when the hash function maps two different keys to the
same location.
• The two most popular methods of resolving collisions are:
• 1. Open addressing
• 2. Chaining

Collision Resolution by Open Addressing
• Once a collision takes place, open addressing or closed hashing computes new positions
using a probe sequence and the next record is stored in that position.
• In this technique, all the values are stored in the hash table.
• The hash table contains two types of values: sentinel values (e.g., –1) and data values.
• The presence of a sentinel value indicates that the location contains no data value at
present but can be used to hold a value.
• When a key is mapped to a particular memory location, then the value it holds is checked.
• If it contains a sentinel value, then the location is free and the data value can be stored in
it.
• However, if the location already has some data value stored in it, then other slots are
examined systematically in the forward direction to find a free slot.
• If no free location is found, then we have an OVERFLOW condition.
• The process of examining memory locations in the hash table is called probing.

• Open addressing technique can be implemented using linear probing,
quadratic probing, double hashing, and rehashing.
• Linear Probing: h(k, i) = [h'(k) + i] mod m, where m is the size of the
hash table, h'(k) = (k mod m), and i is the probe number that varies from 0
to m-1.
• Quadratic Probing: h(k, i) = [h'(k) + c1 i + c2 i²] mod m, where m is the
size of the hash table, h'(k) = (k mod m), i is the probe number that varies
from 0 to m-1, and c1 and c2 are constants such that c1 and c2 ≠ 0.
• Double Hashing: h(k, i) = [h1(k) + i h2(k)] mod m, where m is the
size of the hash table, h1(k) and h2(k) are two hash functions given as
h1(k) = k mod m, h2(k) = k mod m', i is the probe number that varies from
0 to m-1, and m' is chosen to be less than m; for example, m' = m-1 or m-2.
• Rehashing: create a new hash table double the size of the original hash
table and move all entries into it by recomputing their hash values.
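The probe positions can be computed with three small C helpers. The formulas used are the usual textbook forms, h(k, i) = [h'(k) + i] mod m, h(k, i) = [h'(k) + c1 i + c2 i²] mod m and h(k, i) = [h1(k) + i h2(k)] mod m, stated here as an assumption; the function names are illustrative and keys are assumed to be non-negative.

```c
/* Linear probing: h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m. */
unsigned int probe_linear(unsigned int k, unsigned int i, unsigned int m)
{
    return (k % m + i) % m;
}

/* Quadratic probing: h(k, i) = (h'(k) + c1*i + c2*i*i) mod m. */
unsigned int probe_quadratic(unsigned int k, unsigned int i, unsigned int m,
                             unsigned int c1, unsigned int c2)
{
    return (k % m + c1 * i + c2 * i * i) % m;
}

/* Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m,
   with h1(k) = k mod m and h2(k) = k mod m2, where m2 < m. */
unsigned int probe_double(unsigned int k, unsigned int i, unsigned int m,
                          unsigned int m2)
{
    return (k % m + i * (k % m2)) % m;
}
```

For instance, with m = 7 the key 85 has home slot 1, so `probe_linear(85, 1, 7)` is 2. Note that with h2(k) = k mod m2 the step is zero whenever k is a multiple of m2, in which case every probe returns the home slot.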
Linear Probing - hash function “key mod 7” and a sequence of keys 50,
700, 76, 85, 92, 73, 101, 76
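Inserting this key sequence with linear probing can be sketched in C as below. The names `TABLE_SIZE`, `EMPTY`, `table_init` and `insert_linear` are illustrative, -1 is used as the sentinel for a free slot, and keys are assumed to be non-negative.

```c
#define TABLE_SIZE 7
#define EMPTY (-1)   /* sentinel value marking a free slot */

/* Mark every slot of the table as free. */
void table_init(int table[])
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key using linear probing with h'(k) = k mod TABLE_SIZE.
   Returns the slot used, or -1 if no free slot exists (overflow). */
int insert_linear(int table[], int key)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++) {
        int slot = (key % TABLE_SIZE + i) % TABLE_SIZE;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

The first seven keys fill the table as 700, 50, 85, 92, 73, 101, 76 (slots 0 to 6). The eighth key, the second 76, finds no free slot in a table of seven locations, so this sketch reports overflow for it.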

Quadratic probing - hash function “key mod S” with S = 7 and a sequence of keys
50, 700, 76, 85, 92, 73, 101, 76
• If the slot hash(x) % S is full, then we try (hash(x) + 1*1) % S.
• If (hash(x) + 1*1) % S is also full, then we try (hash(x) + 2*2) % S.
• If (hash(x) + 2*2) % S is also full, then we try (hash(x) + 3*3) % S.
• This process is repeated for all the values of i until an empty slot is found.
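The probing rule above, (hash(x) + i*i) % S, can be sketched in C as follows. The names `TABLE_SIZE`, `EMPTY` and `insert_quadratic` are illustrative, -1 marks a free slot, and the number of probes is capped at S as an assumption so that the loop always terminates.

```c
#define TABLE_SIZE 7   /* S in the text */
#define EMPTY (-1)     /* sentinel value marking a free slot */

/* Insert key using quadratic probing: try (key % S + i*i) % S for
   i = 0, 1, 2, ... Returns the slot used, or -1 if none of the
   probed slots is free. */
int insert_quadratic(int table[], int key)
{
    int i;
    for (i = 0; i < TABLE_SIZE; i++) {
        int slot = (key % TABLE_SIZE + i * i) % TABLE_SIZE;
        if (table[slot] == EMPTY) {
            table[slot] = key;
            return slot;
        }
    }
    return -1;
}
```

Starting from an empty table, the first seven keys land in slots 1, 0, 6, 2, 5, 3 and 4 respectively, so the table reads 700, 50, 85, 73, 101, 92, 76. The final 76 cannot be placed because the table is already full. In general, quadratic probing can also fail to reach a free slot that does exist, since the sequence i*i does not visit every index.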

//program to show searching using closed hashing.
#include <stdio.h>
int ht[10], i, found = 0, key;
void insert_val();
void search_val();
void delete_val();
void display();
int main()
{
    int option;
    //to initialize every element as '-1'
    for ( i = 0;i < 10;i++ )
        ht[i] = -1;
    do
    {
        printf( "\n MENU \n1.Insert \n2.Search \n3.Delete \n4.Display\n5.Exit");
        printf( "\n Enter your option.");
        scanf( "%d", &option);
        switch (option)
        {
            case 1: insert_val();
                break;
            case 2: search_val();
                break;
            case 3: delete_val();
                break;
            case 4: display();
                break;
            case 5: break;
            default: printf("\nInvalid choice entry");
        }
    }
    while (option!=5);
    return 0;
}
void insert_val()
{
    int val, f = 0;   // f becomes 1 once val has been stored
    printf("Enter data to insert: ");
    scanf( "%d", &val );
    key = val % 10;   // home slot 0..9; assumes val is non-negative
    if ( ht[key] == -1 )
    {
        ht[key] = val;
        f = 1;
    }
    else
    {
        // probe forward from the home slot to the end of the table
        for ( i = key + 1;i < 10;i++ )
        {
            if ( ht[i] == -1 )
            {
                ht[i] = val;
                f = 1;
                break;
            }
        }
        // wrap around and probe from the start, only if not yet stored
        for ( i = 0;i < key && f == 0;i++ )
        {
            if ( ht[i] == -1 )
            {
                ht[i] = val;
                f = 1;
            }
        }
    }
    if ( f == 0 )
        printf("\nHash table is full");
}
void display()
{
    for (i = 0;i < 10;i++)
        printf( "\t%d", ht[i]);
}
void delete_val()
{
    search_val();
    if (found==1)
    {
        if ( key != -1 )
        {
            printf("\nDeleted data %d", ht[key]);
            ht[ key ] = -1;
        }
    }
}
void search_val()
{
    int val, flag = 0;
    printf( "\nEnter element to search: " );
    scanf( "%d", &val );
    key = val % 10;   // same home slot as in insert_val
    if ( ht[ key ] == val )
        flag = 1;
    else
    {
        // search from the home slot to the end of the table
        for (i = key + 1;i < 10;i++)
            if(ht[i] == val)
            {
                flag = 1;
                key = i;
                break;
            }
    }
    if (flag == 0)
    {
        // wrap around and search the slots before the home slot
        for (i = 0;i < key;i++)
            if (ht[ i ] == val)
            {
                flag = 1;
                key = i;
                break;
            }
    }
    if (flag == 1)
    {
        found=1;
        printf("\n Item found at position %d",key+1);
    }
    else
    {
        found = 0;
        key = -1;
        printf( "\nThe item not found");
    }
}
Collision Resolution by Chaining
• In chaining, each location in a hash table stores a pointer to a linked list
that contains all the key values that were hashed to that location.
• That is, location l in the hash table points to the head of the linked list of all
the key values that hashed to l.
• However, if no key value hashes to l, then location l in the hash table
contains NULL.
• Chained hash tables with linked lists are widely used due to the simplicity
of the algorithms to insert, delete, and search a key using regular linked list
code.

Hash function is “key mod 7” and the keys are 50, 700, 76, 85, 92, 73, 101, 76

• 50 % 7 = 1
• 700 % 7 = 0
• 76 % 7 = 6
• 85 % 7 = 1 (chained)
• 92 % 7 = 1 (chained)
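Chaining for this key sequence can be sketched in C as below. The names `struct node`, `chain_insert`, `chain_search` and `chain_length` are illustrative; new keys are appended at the end of a chain (inserting at the head would work equally well), duplicates are stored as separate nodes, and keys are assumed to be non-negative.

```c
#include <stdlib.h>

#define TABLE_SIZE 7

struct node {
    int key;
    struct node *next;
};

/* Append key to the chain at index key mod TABLE_SIZE.
   Returns 0 on success, -1 if memory allocation fails. */
int chain_insert(struct node *table[], int key)
{
    struct node **link = &table[key % TABLE_SIZE];
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        return -1;
    n->key = key;
    n->next = NULL;
    while (*link != NULL)          /* walk to the end of the chain */
        link = &(*link)->next;
    *link = n;
    return 0;
}

/* Return 1 if key is present in its chain, 0 otherwise. */
int chain_search(struct node *table[], int key)
{
    const struct node *p;
    for (p = table[key % TABLE_SIZE]; p != NULL; p = p->next)
        if (p->key == key)
            return 1;
    return 0;
}

/* Number of keys stored in the chain at the given index. */
int chain_length(struct node *table[], int index)
{
    int count = 0;
    const struct node *p;
    for (p = table[index]; p != NULL; p = p->next)
        count++;
    return count;
}
```

With every location initialised to NULL, inserting the eight keys gives chains 700 at index 0; 50, 85, 92 at index 1; 73, 101 at index 3; and 76, 76 at index 6. The remaining locations stay NULL.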

Example

• Hash function is h(x) = x mod 8 and the keys (x) are 15, 47, 23, 34, 85, 97, 65,
89, 70.
