Module 5

You might also like

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 223

MODULE-5

GRAPHS

Mrs.JIMSHA K MATHEW
Assistant Professor Department of ISE
Acharya Institute Of Technology

Department of ISE Acharya Institute of Technology


MODULE 5

TOPICS
Graphs: Definitions, Terminologies, Matrix and Adjacency List Representation Of
Graphs, Elementary Graph operations, Traversal methods: Breadth First Search
and Depth First Search.
Sorting and Searching: Insertion Sort, Radix sort, Address Calculation Sort.
Hashing: Hash Table organizations, Hashing Functions, Static and Dynamic
Hashing.
Files and Their Organization: Data Hierarchy, File Attributes, Text Files and
Binary Files, Basic File Operations, File Organizations and Indexing

Department of ISE Acharya Institute of Technology


A graph is an abstract data structure that is used to implement
mathematical concept of graphs. It is basically a collection of vertices (also
the
called nodes) and edges that connect these vertices. A graph is often viewed
as a generalization of the tree structure, where instead of having a purely
parent-to-child relationship between tree nodes, any kind of complex
relationship can exist.
Graphs
Definitions
Vertex – It is a synonym for a node and is generally represented by a circle.
Example:
3 6

Department of ISE Acharya Institute of Technology


Edge – If ‘u’ and ‘v’ are two vertices, then an arc or a line joining the two
vertices ‘u’ and ‘v’ is called an edge.
Example:
3 6 4 7

Undirected Edge – An edge without any direction is called an undirected edge


and is denoted by an ordered pair (u, v).
Example: (3, 6)
Directed Edge – An edge with a direction is called an directed edge and is
denoted by an directed pair <u, v>.
Example: <4, 7>
Graph – A graph G consists of two sets V and E and is denoted as G = (V, E),
where,
V is finite, nonempty set of vertices
Department of ISE Acharya Institute of Technology

E is finite set of edges.


Directed Graph – A graph G = (V, E) is called a directed graph if all the edges
are directed. It is also called as a digraph.
Example: 1 2

3 4

The graph G = (V, E), where


V = {1, 2, 3, 4}
E = { (1,2), (1,4), (2,1),
(3,1), (3,4) }

Department of ISE Acharya Institute of Technology


Undirected Graph – A graph G = (V, E) is called a undirected graph if
all the edges are undirected.
Example:
1 2

3 4

The graph G = (V, E), where


V = {1, 2, 3, 4}
E = { (1,2), (2,4), (1,3),
(3,4)}

Department of ISE Acharya Institute of Technology


Self Loop – A loop is an edge which starts and ends on the same vertex.
It is represented by an ordered pair (i,i).
Example:
1 2

3 4

I n the above graph the self loops are (1, 1) and (2, 2)

Department of ISE Acharya Institute of Technology


Complete Graph – A graph G = (V,E) is said to be a complete graph, if there
exists an edge between every pair of vertices.
Example:
1 2

3 4

Note: In a complete graph with ‘n’ vertices there will be n(n-1)/2 edges.

Department of ISE Acharya Institute of Technology


Subgraph – A subgraph G’ of a graph G is graph such that V(G’) V(G) and
E(G’) E(G).
Example: Consider the graph G

Department of ISE Acharya Institute of Technology


Path – A path from a vertex ‘u’ to a vertex ‘v’ in a graph G is a sequence of
vertices u, i1, i2,……..ik, v such that (i1, i2), (i2, i3), ………( ik, v) are edges in
E(G).
Example: Consider the graph shown below

A path from vertex 1 to vertex 5 is given by 1, 3, 4, 5

Department of ISE Acharya Institute of Technology


Length of the path – It is the number of edges in the path.
Example: In the above graph, the length of the path from vertex 1 to vertex 5
is 3.

Cycle – A cycle is a simple path in which the first and last vertices are the
same.
Example: In the above graph the path 1 2 4 3 1 represents a cycle.

Note: A graph with atleast one cycle is called as a cyclic graph and a graph
with no cycles is called as an acyclic graph.

Department of ISE Acharya Institute of Technology


Connected Graph – A graph G is said to be a connected graph if and only if
there exists a path between every pair of vertices
Example: The following is a connected graph

Disconnected Graph – A graph G is said to be a disconnected graph if there


exists atleast one vertex in the graph such that it cannot be reached from any
of the other vertices in the graph.
Example: The following is a disconnected graph

Department of ISE Acharya Institute of Technology


Degree, In-degree and Out-degree
The degree of a vertex is the number of edges incident to that vertex.
Example: Consider the graph G

Department of ISE Acharya Institute of Technology


Representation /Implementation of Graphs

To implement a graph, you need to first represent the given information in the form of a graph.
The most commonly used ways of representing a graph are as follows:

Set Representation
Adjacency Matrix (Array-based implementation)
Adjacency List(Linked list representation)
Set Representation
Two sets are maintained 1) V, the set of vertices
2) E, the set of Edges (V * V)
If the graph is weighted, then the set E is the ordered collection of 3 tuples.
E=W*V*V where W is the weight

V(G1)={ v1,v2,v3,v4, v5,v6,v7}


E(G1)={(v1,v2),(v1,v3),(v2,v4),(v2,v5),(v3,v6),(v3,v7)}
V(G4)={A,B,C,D,}
E(G4)={(3,A,C), (5,B,A), (1,B,C), (7,B,D), (2,C,A), (4,C,D), (6,D,B), (8,D,C)}

Advantages
1. Efficient utilization of memory
2. Easy representation

Disadvantages
3. We can’t represent parallel , loop edges
4. Graph manipulation is not possible with set
Array-based implementation- Adjacency Matrix

 Adjacency matrix is a sequential representation.

 It is used to represent which nodes are adjacent to each other. i.e. is there any
edge connecting nodes to a graph.

 In this representation, we have to construct a nXn matrix A.


Adjacency Matrix Representation
y Matrix
Consider the following
graph: v1 v2 v3 v4

v1 0 1 0 0
v2 0 0 1 0
v3 0 0 0 0
v4 1 0 1 0
Undirected graph representation
Directed graph representation
weighted graph representation
Linked-list implementation - Adjacency List

 A list is used for each vertex v which contains the vertices which are
adjacent from v (adjacency list) .

 Adjacency list is a linked representation.

In this representation, for each vertex in the graph, we maintain the list of its
neighbors.
It means, every vertex of the graph contains list of its adjacent vertices.
We have an array of vertices which is indexed by the vertex number and for each
vertex v, the corresponding array element points to a singly linked list of neighbors
of v.
Linked representation( Adjacency List)
linked representation is another space – saving way of graph
representation

Node structure for non – weighted graph

NODE LABEL ADJ_LIST

Node structure for weighted graph


WEIGHT NODE LABEL ADJ_LIST
Header node in each list maintains a list of all adjacent vertices of a node.
Weighted graph
Graph Traversal/Searching

Methods:

 Depth-First-Search (DFS) :Algorithm travels from a starting node to some end


node before repeating the search down a different path from the same start node
until the query is answered
 Breadth-First-Search (BFS) :These algorithms conduct searches by exploring the
graph one layer at a time
Graph Traversal/Searching
Traversing a Graph

Traversing a graph means visiting all the vertices in a graph.


You can traverse a graph with the help of the following two
methods:
Depth First Search (DFS) –Implemented using STACK
Breadth First Search (BFS)-Implemented using QUEUE
DFS

Algorithm: DFS(v)
1. Push the starting vertex, v into the stack.
2. Repeat until the stack becomes empty:
a. Pop a vertex from the stack.
b. Visit the popped vertex.
c. Push all the unvisited vertices adjacent to the popped
vertex into the stack.
Stack is used in the implementation of the depth first search.
DFS (Contd.)

Push the starting vertex, v1 into the stack

v1
DFS (Contd.)
Pop a vertex, v1 from the stack
Visit v1
Push all unvisited vertices adjacent to v1 into the stack

Visited:
v1
DFS (Contd.)
Pop a vertex, v1 from the stack
Visit v1
Push all unvisited vertices adjacent to v1 into the stack

v2
v4

Visited:
v1
DFS (Contd.)

Pop a vertex, v2 from the stack


Visit v2
Push all unvisited vertices adjacent to v2 into the stack

v4

Visited:
v1 v2
DFS (Contd.)

Pop a vertex, v2 from the stack


Visit v2
Push all unvisited vertices adjacent to v2 into the stack

v6
v3
v4

Visited:
v1 v2
DFS (Contd.)

Pop a vertex, v6 from the stack


Visit v6
Push all unvisited vertices adjacent to v6 into the stack

v3
v4

There are no unvisited vertices adjacent to v6

Visited:
v1 v2 v6
DFS (Contd.)

Pop a vertex, v3 from the stack


Visit v3
Push all unvisited vertices adjacent to v3 into the stack

v4

Visited:
v1 v2 v6 v3
DFS (Contd.)

Pop a vertex, v3 from the stack


Visit v3
Push all unvisited vertices adjacent to v3 into the stack

v5
v4

Visited:
v1 v2 v6 v3
DFS (Contd.)

Pop a vertex, v4 from the stack


Visit v4
Push all unvisited vertices adjacent to v4 into the stack

v4

There are no unvisited vertices adjacent to v4

Visited:
v1 v2 v6 v3 v5
DFS (Contd.)

Pop a vertex, v5 from the stack


Visit v5
Push all unvisited vertices adjacent to v5 into the stack

There are no unvisited vertices adjacent to v5

Visited:
v1 v2 v6 v3 v5 v4
DFS (Contd.)

The stack is now empty


Therefore, traversal is complete

Visited:
v1 v2 v6 v3 v5 v4
Simple Algorithm – DFS (FORMAL)
Algorithm DFS
1. Push the starting vertex in to the stack STACK
2. While STACK is not empty
1. POP a vertex V
2. If V is not in the VISIT
1. Visit the vertex V
2. Store V in VISIT_LIST
3. PUSH all adjacent vertex of V on to STACK
3. EndIf
3. EndWhile
4. Stop
DFS (Contd.)
Although the preceding algorithm provides a simple and convenient method to
traverse a graph, the algorithm will not work correctly if the graph is not connected.
BFS

Algorithm: BFS(v)
1. Visit the starting vertex, v and insert it into a queue.
2. Repeat step 3 until the queue becomes empty.
3. Delete the front vertex from the queue, visit all its
unvisited adjacent vertices, and insert them into the queue.
BFS (Contd.)

Insert v1 into the queue

v1
BFS (Contd.)
Remove a vertex v1 from the queue
Visit v1
Visit all unvisited vertices adjacent to v1 and insert them in the queue

Visited:
v1
BFS (Contd.)

Visit all unvisited vertices adjacent to v1 and insert them in the queue

v2 v4

v1
BFS (Contd.)
Remove a vertex v2 from the queue
Visit v2
Visit all unvisited vertices adjacent to v2 and insert them in the queue

v2 v4

v1 v2
BFS (Contd.)

Visit all unvisited vertices adjacent to v4 and insert them in the queue

v4 v3 v6

v1 v2
BFS (Contd.)
Remove a vertex v4 from the queue
Visit v4
Visit all unvisited vertices adjacent to v4 and insert them in the queue

v4 v3 v6

v1 v2
BFS (Contd.)
Remove a vertex v3 from the queue
Visit v3
Visit all unvisited vertices adjacent to v3 and insert them in the queue

v3 v6 v5

v3 does not have any unvisited adjacent


vertices

v1 v2 v4
BFS (Contd.)
Remove a vertex v3 from the queue
Visit v3
Visit all unvisited vertices adjacent to v3 and insert them in the queue

v3 v6 v5

v3 does not have any unvisited adjacent


vertices

v1 v2 V3
V4
BFS (Contd.)
Remove a vertex v6 from the queue
Visit v6
Visit all unvisited vertices adjacent to v6 and insert them in the queue

v6 v5

v3 does not have any unvisited adjacent


vertices

v1 v2 v4 v3 v6
BFS (Contd.)

Remove a vertex v5 from the queue


Visit v5
Visit all unvisited vertices adjacent to v5 and insert them in the queue

v5

v5 does not have any unvisited adjacent


vertices

v1 v2 v4 v3 v6 v5
BFS (Contd.)

The queue is now empty


Therefore, traversal is complete

v5 does not have any unvisited adjacent


vertices

v1 v2 v4 v3 v6 v5
Simple Algorithm - BFS
Algorithm BFS
1. Enqueue the starting vertex in to the queue QUEUE
2. While QUEUE not empty
1. Dequeue a vertex V
2. If Vis not in the VISIT
1. Visit the vertex V
2. Store V in VISIT_LIST
3. Enqueue all adjacent vertex of V on to QUEUE
3. EndIf
3. EndWhile
4. Stop
1. FIND THE DFS & BFS OF THE BELOW GRAPH
Difference Between
BFS &DFS

BREADTH FIRST TRAVERSAL DEPTH FIRST TRAVERSAL

BFS finds the shortest path to the destination. DFS goes to the bottom of a subtree, then backtracks.

The full form of BFS is Breadth-First Search. The full form of DFS is Depth First Search.

It uses a queue to keep track of the next location to It uses a stack to keep track of the next location to visit.
visit.
BFS traverses according to tree level. DFS traverses according to tree depth.

It is implemented using FIFO list. It is implemented using LIFO list.

There is no need of backtracking in BFS. There is a need of backtracking in DFS.


Applications of Graphs

Computer Science: In computer science, graph is used to represent networks of communication, data
organization, computational devices,Google map etc.
Physics and Chemistry: Graph theory is also used to study molecules in chemistry and physics.
Social Science: Graph theory is also widely used in sociology.
Mathematics: In this, graphs are useful in geometry and certain parts of topology such as knot theory.
Biology: Graph theory is useful in biology and conservation efforts.
In Computer science graphs are used to represent the flow of computation.
Google maps uses graphs for building transportation systems, where intersection of
two(or more) roads are considered to be a vertex and the road connecting two vertices
is considered to be an edge, thus their navigation system is based on the algorithm to
calculate the shortest path between two vertices.
In Facebook, users are considered to be the vertices and if they are friends then there is
an edge running between them. Facebook’s Friend suggestion algorithm uses graph
theory. Facebook is an example of undirected graph.
In World Wide Web, web pages are considered to be the vertices. There is an edge from
a page u to other page v if there is a link of page v on page u. This is an example of
Directed graph. It was the basic idea behind Google Page Ranking Algorithm.
In Operating System, we come across the Resource Allocation Graph where each
process and resources are considered to be vertices. Edges are drawn from resources to
the allocated process, or from requesting process to the requested resource. If this leads
to any formation of a cycle then a deadlock will occur.
Graph Traversals
Traversing a Graph is the systematic approach of visiting each node of
the graph and do the processing.(Ex:printing the nodes information)

The two traversal techniques of graphs are

1. Breadth First Search(BFS)

2. Depth First Search(DFS)

Department of ISE Acharya Institute of Technology


1. Breadth First Search(BFS)-Level by Level Traversal
In this traversal technique, the vertices are visited in the order of their levels from the source
node i.e all the vertices at distance 'k' from the source vertex are visited before visiting the
nodes at distance 'k+1'.

BFS can be used

1. To check if the graph is connected or not.

2. To find the vertices which are reachable from a given vertex.

3. To find spanning tree if it exists.

4. To find whether a graph is acyclic or not.

Department of ISE Acharya Institute of Technology


BFS in a binary tree

Department of ISE Acharya Institute of Technology


Cycles:
•We need to save auxiliary information
•Each node needs to be marked
• Visited: No need to be put on queue
• Not visited: Put on queue
when found

•What about assuming only two children vertices?


•Need to put all adjacent vertices in queue

Department of ISE Acharya Institute of Technology


□ Each vertex can be in one of three states:
• Unmarked and not on queue
• Marked and on queue
• Marked and off queue
□ The algorithm moves vertices between these states
□ Unmarked and not on queue:
Not reached yet
□ Marked and on queue:
Known, but adjacent
vertices not visited yet
(possibly)
□ Marked and off queue:
Department of ISE Acharya Institute of Technology

Known, all adjacent vertices


Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Depth First Search
•Again, a simple and powerful algorithm
•Given a starting vertex s
•Pick an adjacent vertex, visit it.
□ Then visit one of its adjacent vertices
□ :
□ Until impossible, then backtrack, visit another

Department of ISE Acharya Institute of Technology


Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
BFS

Department of ISE Acharya Institute of Technology


Sorting

Department of ISE Acharya Institute of Technology


Insertion sort is a simple sorting algorithm that works similar to the way you
sort playing cards in your hands. The array is virtually split into a sorted and
an unsorted part. Values from the unsorted part are picked and placed at the
correct position in the sorted part.
Algorithm
To sort an array of size n in ascending order:
1: Iterate from arr[1] to arr[n] over the array.
2: Compare the current element (key) to its predecessor.
3: If the key element is smaller than its predecessor, compare it to the
elements before. Move the greater elements one position up to make space
for the swapped element.

Department of ISE Acharya Institute of Technology


Insertion Sort in C is a comparison-based sorting algorithm that
arranges numbers of an array in order.
It is stable, adaptive, in-place and incremental in nature. The insertion sort is
useful for sorting a small set of data.
It sorts smaller arrays faster than any other sorting algorithm.
Insertion sort is a sorting algorithm that places an unsorted element at
its correct position in each iteration.

Department of ISE Acharya Institute of Technology


INSERTION SORTING

Insertion sort is a simple sorting algorithm that works similar to the way you sort
playing cards in your hands.
The array is virtually split into a sorted and an unsorted part.
Values from the unsorted part are picked and placed at the correct position in the
sorted part.
Insertion Sort

To insert 12, we need to make


room for it by moving first 36
and then 24.
6 10 24 36

12
Insertion Sort

6 10 24 36

12
Insertion Sort

6 10 24 3
6

12
Sorting An Array Using Insertion Sort
Let us take an example of Insertion sort using an array.

The array to be sorted is as follows:

Now for each pass, we compare the current element to all its previous
elements. So in the first pass, we start with the second element.

Department of ISE Acharya Institute of Technology


Department of ISE Acharya Institute of Technology
Thus, we require N number of passes to completely sort an array containing N
number of elements.

Department of ISE Acharya Institute of Technology


As shown in the illustration above, at the end of each pass, one element goes
in its proper place. Hence in general, to place N elements in their proper
place, we need N-1 passes.
Department of ISE Acharya Institute of Technology
FUNCTION FOR INSERTION SORT:
void insertion_sort(int a[],int n)
{
int i,j,item;
for(i=1;i<=n-1;i++)
{
item=a[i];
j=i-1;
while(item<a[j]&&j>=0)
{
a[j+1]=a[j];
j=j-1;
}
a[j+1]=item;
}
}
Department of ISE Acharya Institute of Technology
RADIX SORT/BUCKET SORT

• Radix sort is a non-comparative sorting algorithm that is better than the


comparative sorting algorithms
• Radix sort is one of the sorting algorithms used to sort a list of integer
numbers in order. In radix sort algorithm, a list of integer numbers will
be sorted based on the digits of individual numbers. Sorting is
performed from least significant digit to the most significant digit.

•First, we have to find the largest element (suppose max) from the given
array. Suppose 'x' be the number of digits in max. The 'x' is calculated
because we need to go through the significant places of all elements.
•After that, go through one by one each significant place. Here, we have to
use any stable sorting algorithm to sort the digits of each significant place.

Department of ISE Acharya Institute of Technology


ALGORITHM:

radixSort(arr)  

1.max = largest element in the given array  
2.d = number of digits in the largest element (or, max)  
3.Now, create d buckets of size 0 - 9  
4.for i -> 0 to d  
5.sort the array elements using counting sort (or any stable sort) according to the digits at  
the ith place  
Complexity of the Radix Sort Algorithm

To sort an unsorted list with 'n' number of elements, Radix


sort algorithm needs the following complexities...

WorstCase : O(n)
BestCase O(n)
Average Case :   O(n)
ADDRESS CALCULATION SORT

• In this sorting algorithm, Hash Function f is used with the property of Order Preserving Function which
states that if   if  x>y THEN f(x)>f(y)                                        .

 This algorithm uses an address table to store the values which is simply a
list of Linked lists.
 The Hash function is applied to each value in the array to find its
corresponding address in the address table.
 Then the values are inserted at their corresponding addresses in a sorted
manner by comparing them to the values which are already present at that
address.
Examples:
Input : arr = [29, 23, 14, 5, 15, 10, 3, 18, 1]

Output: After inserting all the values in the address table, the address table looks like this:

ADDRESS 0: 1 --> 3

ADDRESS 1: 5

ADDRESS 2:

ADDRESS 3: 10

ADDRESS 4: 14 --> 15

ADDRESS 5: 18

ADDRESS 6:

ADDRESS 7: 23

ADDRESS 8:

ADDRESS 9: 29
ALGORITHM-ADDRESS CALCULATION SORT
Step1: Read the elements from the array
Step 2:Create an array of linked list with index from 0 to 9

Step 3: Divide each element in the given array by 10 to get the index of the
linked list where the element needs to be stored and store the element using
insert_order function of linked list.

Example :F(91)=91/10=9
F(28)=28/10=2
Step4: Copy the elements from the linked list inorder from 0 to 9 into an array.
Step 6: Display the array elements.
Files and Their Organization:
A file is an external collection of related data treated as a unit.
Files are stored in auxiliary/secondary storage devices.
• Disk
• Tapes
A file is a collection of data records with each record consisting
of one or more fields.

Department of ISE Acharya Institute of Technology


FILE OPERATIONS:
Typical operations include:
•Create
•Delete
•Open
•Close
•Read
•Write

Department of ISE Acharya Institute of Technology


Desirable properties of files:
•Long-term existence
files are stored on disk or other secondary storage and do not disappear when
a user logs off
•Sharable between processes
files have names and can have associated access permissions that permit
controlled sharing
•Structure
files can be organized into hierarchical or more complex structure to reflect
the relationships among files

Department of ISE Acharya Institute of Technology


Field
basic element of data contains a single value fixed or variable length
File
collection of related fields that can be treated as a unit by some application
program fixed or variable length
Record
collection of related fields that can be treated as a unit by some application
program fixed or variable length
Database
collection of related data
relationships among elements of data are explicit
designed for use by a number of different applications
consists of one or more types of files

Department of ISE Acharya Institute of Technology


File attributes
are settings associated with computer files that grant or deny certain rights to
how a user or the operating system can access that file.

For example, IBM compatible computers running MS-DOS or Microsoft


Windows have capabilities of having read, archive, system, and hidden
attributes.

Department of ISE Acharya Institute of Technology


VARIOUS FILE ATTRIBUTES
Read-only - Allows a file to be read, but nothing can be written to the file or
changed.
Archive - Tells Windows Backup to backup the file.
System - A file is marked as a system file indicates that it is a file used by
computer system. these files should not be modified or deleted.
Hidden - A file is marked hidden,when we see the contents of
directory ,we cannot see the file.

Department of ISE Acharya Institute of Technology


In operating systems like Linux, there are three main file attributes:
read (r), write (w), execute (x).

Read - Designated as an "r"; allows a file to be read, but nothing can be


written to or changed in the file.
Write - Designated as a "w"; allows a file to be written to and changed.
Execute - Designated as an "x"; allows a file to be executed by users or the
operating system.

Department of ISE Acharya Institute of Technology


ACCESS METHODS
The access method determines how records can be retrieved:
sequentially or randomly.

• One record after another, • Access one specific record


from beginning to end without having to retrieve all records before it

Department of ISE Acharya Institute of Technology


SEQUENTIAL FILES
Sequential file –
records can only be accessed sequentially, one after another, from
beginning to end.

Department of ISE Acharya Institute of Technology


Processing records in a sequential file

While Not EOF


{
Read the next record
Process the record
}

Department of ISE Acharya Institute of Technology


Applications –
that need to access all records from beginning to end.
-Personal information
If you have to process each record, sequential access is more efficient and
easier than random access.

Sequential file is not efficient for random access.

Department of ISE Acharya Institute of Technology


INDEXED
FILES
Mapping in an indexed file
■To access a record in a file randomly,
you need to know the address of the record.
■An index file can relate the key to the record address.

Department of ISE Acharya Institute of Technology


An index file is made of a data file, which is a sequential file, and an index.
Index – a small file with only two fields:
The key of the sequential file
The address of the corresponding record on the disk.
To access a record in the file :
1. Load the entire index file into main memory.
2. Search the index file to find the desired key.
3. Retrieve the address the record.
4.Retrieve the data record. (using the address)
Inverted file –
you can have more than one index, each with a
different key
Department of ISE Acharya Institute of Technology
Logical view of an indexed file

Department of ISE Acharya Institute of Technology


Mapping in a hashed file
■A hashed file uses a hash function to map the key to the address.
■Eliminates the need for an extra file (index).
■There is no need for an index and all of the overhead associated with it.

Department of ISE Acharya Institute of Technology


Hashing methods
■Direct Hashing –
the key is the address without any algorithmic manipulation.

■Modulo Division Hashing – (Division remainder hashing)


divides the key by the file size and
use the remainder plus 1 for the address.

■Digit Extraction Hashing –


selected digits are extracted from the key and used as the
address.

Department of ISE Acharya Institute of Technology


Direct hashing
Direct Hashing –
the key is the address without any algorithmic manipulation.

Department of ISE Acharya Institute of Technology


■ the file must contain a record for every possible key.
■ Adv. – no collision.
■ Disadv. – space is wasted.

■ Hashing techniques –
map a large population of possible keys into a small address
space.

Department of ISE Acharya Institute of Technology


Modulo division
■address = key % list_size + 1
■list_size : a prime number produces fewer collisions

A new employee numbering system


that will handle 1 million employees.

Department of ISE Acharya Institute of Technology


Digit Extraction Hashing
■selected digits are extracted from the key
and used as the address.

■ Forexample : 1,3,4
6-digit employee number → → → 3-digit
address
◻ 125870 → 158
◻ 122801 → 128
◻ 121267 → 112
◻ …
◻ 123413 → 134

Department of ISE Acharya Institute of Technology


Collision
■Because there are many keys for each address in the file, there is a possibility that
more than one key will hash to the same address in the file.
■Synonyms – the set of keys that hash to the same address.
■Collision – a hashing algorithm produces an address for an insertion key, and that
address is already occupied.
■Prime area – the part of the file that contains all of the home addresses.

Department of ISE Acharya Institute of Technology


Collision Resolution
■With the exception of the directed hashing,
none of the methods we discussed creates one-to-one mapping.

■Several collision resolution methods :


◻ Open addressing resolution
◻ Linked list resolution
◻ Bucket hashing resolution

Department of ISE Acharya Institute of Technology


Open addressing resolution

■Resolve collisions in the prime area.


■The prime area addresses are searched for an open or unoccupied record where
the new data can be placed.
■One simplest strategy –
the next address (home address + 1)
■Disadv. –
each collision resolution increases the possibility of future collisions.

Department of ISE Acharya Institute of Technology


Linked list resolution
The first record is stored in the home address (prime area), but
it contains a pointer to the second record. (overflow area)

Department of ISE Acharya Institute of Technology


Bucket hashing resolution
Bucket –
a node that can accommodate more than one record.

Department of ISE Acharya Institute of Technology


TEXT VERSUS BINARY
Text and binary interpretations of a file
A file stored on a storage device is a sequence of bits that can be
interpreted by an application program as a text file or a binary file.

Department of ISE Acharya Institute of Technology


Text vs. Binary
■Text files –
◻ A file of characters.
◻ Cannot contain integers, floating-point numbers, or any other data
structures in their internal memory format.
◻ Encoding system – ASCII or EBCDIC …

■Binary files –
◻ A collection of data stored in the internal format of the
computer.
◻ Contain data that are meaningful
only if they are properly interpreted by a program.

Department of ISE Acharya Institute of Technology


Hashing
• For a huge database structure, it's tough to search all the index values through
all its level and then you need to reach the destination data block to get the
desired data.
• Hashing method is used to index and retrieve items in a database as it is faster
to search that specific item using the shorter hashed key instead of using its
original value.
• Hashing is an ideal method to calculate the direct location of a data record on
the disk without using index structure.

Department of ISE Acharya Institute of Technology


Department of ISE Acharya Institute of Technology
Important Terminologies using in Hashing
Here, are important terminologies which are used in Hashing:
Data bucket : Data buckets are memory locations where the records are stored. It is
also known as Unit Of Storage.
Key: A DBMS key is an attribute or set of an attribute which helps you to identify a
row(tuple) in a relation(table). This allows you to find the relationship between two
tables.
Hash function: A hash function, is a mapping function which maps all the set of search
keys to the address where actual records are placed.
The function which is used to convert the key to index is called hash function and the
array which stores the records is called hash table.
Hashing is the process of indexing the data so that sorting,searching,inserting and
deleting data becomes fast.
Department of ISE Acharya Institute of Technology
• Hash index : It is an address of the data block. A hash function could be a simple
mathematical function to even a complex mathematical function.

Department of ISE Acharya Institute of Technology


Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
STATIC HASHING
This kind of hashing is called static hashing since the size of the hash table is
fixed.(an array)

Department of ISE Acharya Institute of Technology


To define hash functions:-
•Division Method
It is the most simple method of hashing an integer x. This method divides x by M and then
uses the remainder obtained. In this case, the hash function can be given as
h(x) = x mod M
it is best to choose M to be a prime number because making M a prime number increases the
likelihood that the keys are mapped with a uniformity in the output range of values.
Example :
Calculate the hash values of keys 1234 and 5462. Solution Setting M = 97, hash values can
be calculated as:
h(1234) = 1234 % 97 = 70
h(5642) = 5642 % 97 = 16
Department of ISE Acharya Institute of Technology
• Mid-Square Method
The mid-square method is a good hash function which works in two steps:
Step 1: Square the value of the key. That is, find k2.
Step 2: Extract the middle r digits of the result obtained in Step 1.
h(k) = s where s is obtained by selecting r digits from k2

Department of ISE Acharya Institute of Technology


 Method 1: k: 1234 2345
k 2: 1522756 5499025
H(k): 525 492
 Method 2:
k: 1234 2345 3456
k2 : 1522756 5499025 11943936

H(k): 227 990 943 Or


439
• Folding Method
The folding method works in the following two steps:
Step 1: Divide the key value into a number of parts. That is, divide k into parts k1, k2, ..., kn,
where each part has the same number of digits except the last part which may have lesser
digits than the other parts.
Step 2: Add the individual parts. That is, obtain the sum of k1 + k2 + ... + kn. The hash value
is produced by ignoring the last carry, if any. Note that the number of digits in each part of
the key will vary depending upon the size of the hash table.

Department of ISE Acharya Institute of Technology


3.Folding method
 Partition key k into several parts k1,k2,k3…kn.

 All parts except for the last one have the same
length.
 Add the parts together to obtain the hash
address.
Eg: suppose the size of each part is 2.
k: 1522756 5499025 11943936
Chopping: 01 52 27 56 05 49 90 25 11 94 39 36
Pure folding: 01+52+27+56 05+49+90+25 11+94+39+36
=136 =169 =180
Department of ISE Acharya Institute of Technology
Collision Resolution
• If, when an element is inserted, it hashes to the
same value as an already inserted element, then
we have a collision and need to resolve it.
• There are several methods for dealing with this:
– Separate chaining
– Open addressing
• Linear Probing
• Quadratic Probing
• Double Hashing
 
 In linear probing, collisions are resolved by
sequentially scanning an array until an
empty cell is found.
 Suppose there is a hash table of size h , key
value k and address calculated is i. If ith
location is free then k can be stored in
it.
 If the location already has some data value
stored in it, then other slots are examined
systematically in the forward direction to find
a free slot.
ie, the locations i+1,i+2,…h-1,0,1,..i-1 will be
the sequence of locations.
 Here the hash table is considered as circular, so
that when the last location is reached, the
search proceeds to the first location in the table;
hence the name closed hashing.
 Example:
 Insertitems with keys: 89, 18, 49, 58, 9 into an
empty hash table.
 Table size is 10.
 Hash function is hash(x) = x mod 10.
Figure 20.4
Linear
probing hash
table after
each insertion
Problem 1:Apply the hash function h(x) = x mod 7 for linear probing on the data 2341, 4234,
2839, 430, 22, 397, 3920 and show the resulting hash table

2341 MOD 7=3


4234 MOD 7 =6
2839 MOD 7=4
430 MOD 7 =3 0 397
//collision occurs checking 1 22
for next free available
location (5)
2 3920
3 2341
22 MOD 7 =1
397 MOD 7 =5 //collision occurs checking for next free available location (0) 4 2839
3920 MOD 7=0 //collision occurs checking for next free available location (2)
5 430
6 4234
Problem 2: (***IMPORTANT)

A hash table contains 7 buckets and uses linear probing to solve collision. The key
values are integers and the hash function used is key Mod 7. Draw the table that
results after inserting in the given order the following values: 16,8,4,13,29,11,22.
16,8,4,13,29,11,22.

16 MOD 7=2 0 22
8 MOD 7=1 1 8
4 MOD 7=5 2 16
13 MOD 7 =6 3 29
29 MOD 7 =1//COLLISION 4 11
11 MOD 7=4 //COLLISION 5 4
22 MOD 7 =1 //COLLISION 6 13
(Drawbacks of linear probing)
 One of the problems with linear probing is
clustering, many consecutive elements form groups
and it starts taking time to find a free slot or to
search an element.  
 As long as table is big enough, a free cell can always
be found, but the time to do so can get quite large.
 Worse, even if the table is relatively empty, blocks of
occupied cells start forming.
 This effect is known as primary clustering.
 Any key that hashes into the cluster will require
several attempts to resolve the collision, and then it
will add to the cluster.
 Double Hashing.
 Quadratic Probing.
Double Hashing
 Here a second hash function H2 is used for resolving collision
 The second hash function should be selected in such a way that
the hash addresses generated by the two hash functions are
distinct.
 The second hash function generates a value m for the key k,
which is used for the pseudo random number generator as in the
case of random probing.
 Here m and h are relatively prime.
 Double hashing can be done using :
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE
is size of hash table.
 A popular second hash function is :

hash2(key) = PRIME – (key % PRIME)


where PRIME is a prime smaller than the TABLE_SIZE.
Calculation of second function(H2)
//hash2(key) = PRIME – (key % PRIME)

(hash1(key) + i * hash2(key)) % TABLE_SIZE


Problem 3: (***IMPORTANT)

A hash table contains 10 buckets and uses double hashing to solve collision. The key
values are integers and the hash function used is division method. Draw the table
that results after inserting in the given order the following values:
16,8,24,13,29,14,33.
Quadratic probing
Instead of probing next locations i+1,
i+2,i+3…as in the case of linear probing , here
the next locations to be probed are
i+12,i+22,i+32….
Mathematically, if h is the size of the hash
table and H(k) is the hash function then the
quadratic probing searches the locations:
(k+i2 ) MOD M
for i=1,2,3….
M=table size
 The idea is to keep a list of all elements that hash
to the same value.
 The array elements are pointers to the
first nodes of the lists.
 A new item is inserted to the end of the list

 Eg: Hash table of size 10 is considered


here.
 The hash address for a key is decided by
its last digit.
Collision Resolution Strategies
1. Open Addressing/Closed Hashing
2. Chaining
Once a collision takes place, open addressing or closed hashing computes new
positions using a probe sequence and the next record is stored in that position .
The process of examining memory locations in the hash table is called probing.
Open addressing technique can be implemented using linear probing, quadratic
probing, double hashing.

Department of ISE Acharya Institute of Technology


Linear Probing
When using a linear probe to resolve collision, the item will be stored in the next available
slot in the table, assuming that the table is not already full.

This is implemented via a linear search for an empty slot, from the point of collision. If the
physical end of table is reached during the linear search, the search will wrap around to the
beginning of the table and continue from there.If an empty slot is not found before reaching
the point of collision, the table is full.

If h is the point of collision, probe through h+1,h+2,h+3..................h+i. till an empty slot is


found

Department of ISE Acharya Institute of Technology


Department of ISE Acharya Institute of Technology
Searching a Value using Linear Probing
□The procedure for searching a value in a hash table is same as for storing a value in a
hash table.
□While searching for a value in a hash table, the array index is re-computed and the key of the
element stored at that location is compared with the value that has to be searched.
□If a match is found, then the search operation is successful.
□ If the key does not match, then the search function begins a sequential search of the array
that continues until:
• □ the value is found,or
•□ the search function encounters a vacant location in the array, indicating that the
value is not present, or
•□ the search function terminates because it reaches the end of the table and the
value is not p

Department of ISE Acharya Institute of Technology


• Quadratic Probing
A variation of the linear probing idea is called quadratic probing.

Instead of using a constant “skip” value, if the first hash value that has resulted in collision is h,

the successive values which are probed are h+1, h+4, h+9, h+16, and so on.

In other words, quadratic probing uses a skip consisting of successive perfect squares.

Department of ISE Acharya Institute of Technology


Department of ISE Acharya Institute of Technology
Double Hashing
In double hashing, we use two hash functions rather than a single function.
Double hashing uses the idea of applying a second hash function to the key when a collision
occurs.
The result of the second hash function will be the number of positions form the point of collision
to insert.
There are a couple of requirements for the second function:
•? it must never evaluate to 0
•? must make sure that all cells can be probed

A popular second hash function is: Hash2(key) = R - ( key % R ) where R is a prime number
that is smaller than the size of the table.But any independent hash function may also be used.

Department of ISE Acharya Institute of Technology


Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology
Department of ISE Acharya Institute of Technology

You might also like