Module 5

MODULE-5
GRAPHS
Mrs.JIMSHA K MATHEW
Assistant Professor Department of ISE
Acharya Institute Of Technology
Department of ISE Acharya Institute of Technology

MODULE 5
TOPICS
Graphs: Definitions, Terminologies, Matrix and Adjacency List Representation Of
Graphs, Elementary Graph operations, Traversal methods: Breadth First Search
and Depth First Search.
Sorting and Searching: Insertion Sort, Radix sort, Address Calculation Sort.
Hashing: Hash Table organizations, Hashing Functions, Static and Dynamic
Hashing.
Files and Their Organization: Data Hierarchy, File Attributes, Text Files and
Binary Files, Basic File Operations, File Organizations and Indexing

A graph is an abstract data structure that is used to implement
mathematical concept of graphs. It is basically a collection of vertices (also
the
called nodes) and edges that connect these vertices. A graph is often viewed
as a generalization of the tree structure, where instead of having a purely
parent-to-child relationship between tree nodes, any kind of complex
relationship can exist.
Graphs
Definitions
Vertex – It is a synonym for a node and is generally represented by a circle.
Example:
3 6

Edge – If ‘u’ and ‘v’ are two vertices, then an arc or a line joining the two
vertices ‘u’ and ‘v’ is called an edge.
Example:
3 6 4 7
Undirected Edge – An edge without any direction is called an undirected edge

and is denoted by an ordered pair (u, v).
Example: (3, 6)
Directed Edge – An edge with a direction is called an directed edge and is
denoted by an directed pair <u, v>.
Example: <4, 7>
Graph – A graph G consists of two sets V and E and is denoted as G = (V, E),
where,
V is finite, nonempty set of vertices
E is finite set of edges.

Directed Graph – A graph G = (V, E) is called a directed graph if all the edges
are directed. It is also called as a digraph.
Example: 1 2
3 4
The graph G = (V, E), where

V = {1, 2, 3, 4}
E = { (1,2), (1,4), (2,1),
(3,1), (3,4) }

Undirected Graph – A graph G = (V, E) is called a undirected graph if
all the edges are undirected.
Example:
1 2
3 4
The graph G = (V, E), where

V = {1, 2, 3, 4}
E = { (1,2), (2,4), (1,3),
(3,4)}

Self Loop – A loop is an edge which starts and ends on the same vertex.
It is represented by an ordered pair (i,i).
Example:
1 2
3 4
I n the above graph the self loops are (1, 1) and (2, 2)

Complete Graph – A graph G = (V,E) is said to be a complete graph, if there
exists an edge between every pair of vertices.
Example:
1 2
3 4
Note: In a complete graph with ‘n’ vertices there will be n(n-1)/2 edges.

Subgraph – A subgraph G’ of a graph G is graph such that V(G’) V(G) and
E(G’) E(G).
Example: Consider the graph G

Path – A path from a vertex ‘u’ to a vertex ‘v’ in a graph G is a sequence of
vertices u, i1, i2,……..ik, v such that (i1, i2), (i2, i3), ………( ik, v) are edges in
E(G).
Example: Consider the graph shown below
A path from vertex 1 to vertex 5 is given by 1, 3, 4, 5

Length of the path – It is the number of edges in the path.
Example: In the above graph, the length of the path from vertex 1 to vertex 5
is 3.
Cycle – A cycle is a simple path in which the first and last vertices are the
same.
Example: In the above graph the path 1 2 4 3 1 represents a cycle.
Note: A graph with atleast one cycle is called as a cyclic graph and a graph
with no cycles is called as an acyclic graph.

Connected Graph – A graph G is said to be a connected graph if and only if
there exists a path between every pair of vertices
Example: The following is a connected graph
Disconnected Graph – A graph G is said to be a disconnected graph if there

exists atleast one vertex in the graph such that it cannot be reached from any
of the other vertices in the graph.
Example: The following is a disconnected graph

Degree, In-degree and Out-degree
The degree of a vertex is the number of edges incident to that vertex.
Example: Consider the graph G

Representation /Implementation of Graphs
To implement a graph, you need to first represent the given information in the form of a graph.
The most commonly used ways of representing a graph are as follows:
Set Representation
Adjacency Matrix (Array-based implementation)
Adjacency List(Linked list representation)
Set Representation
Two sets are maintained 1) V, the set of vertices
2) E, the set of Edges (V * V)
If the graph is weighted, then the set E is the ordered collection of 3 tuples.
E=W*V*V where W is the weight
V(G1)={ v1,v2,v3,v4, v5,v6,v7}

E(G1)={(v1,v2),(v1,v3),(v2,v4),(v2,v5),(v3,v6),(v3,v7)}
V(G4)={A,B,C,D,}
E(G4)={(3,A,C), (5,B,A), (1,B,C), (7,B,D), (2,C,A), (4,C,D), (6,D,B), (8,D,C)}
Advantages
1. Efficient utilization of memory
2. Easy representation
Disadvantages
3. We can’t represent parallel , loop edges
4. Graph manipulation is not possible with set
Array-based implementation- Adjacency Matrix
 Adjacency matrix is a sequential representation.
 It is used to represent which nodes are adjacent to each other. i.e. is there any
edge connecting nodes to a graph.
 In this representation, we have to construct a nXn matrix A.

Adjacency Matrix Representation
y Matrix
Consider the following
graph: v1 v2 v3 v4
v1 0 1 0 0
v2 0 0 1 0
v3 0 0 0 0
v4 1 0 1 0
Undirected graph representation
Directed graph representation
weighted graph representation
Linked-list implementation - Adjacency List
 A list is used for each vertex v which contains the vertices which are
adjacent from v (adjacency list) .
 Adjacency list is a linked representation.
In this representation, for each vertex in the graph, we maintain the list of its
neighbors.
It means, every vertex of the graph contains list of its adjacent vertices.
We have an array of vertices which is indexed by the vertex number and for each
vertex v, the corresponding array element points to a singly linked list of neighbors
of v.
Linked representation( Adjacency List)
linked representation is another space – saving way of graph
representation
Node structure for non – weighted graph
NODE LABEL ADJ_LIST
Node structure for weighted graph

WEIGHT NODE LABEL ADJ_LIST
Header node in each list maintains a list of all adjacent vertices of a node.
Weighted graph
Graph Traversal/Searching
Methods:
 Depth-First-Search (DFS) :Algorithm travels from a starting node to some end

node before repeating the search down a different path from the same start node
until the query is answered
 Breadth-First-Search (BFS) :These algorithms conduct searches by exploring the
graph one layer at a time
Graph Traversal/Searching
Traversing a Graph
Traversing a graph means visiting all the vertices in a graph.

You can traverse a graph with the help of the following two
methods:
Depth First Search (DFS) –Implemented using STACK
Breadth First Search (BFS)-Implemented using QUEUE
DFS
Algorithm: DFS(v)
1. Push the starting vertex, v into the stack.
2. Repeat until the stack becomes empty:
a. Pop a vertex from the stack.
b. Visit the popped vertex.
c. Push all the unvisited vertices adjacent to the popped
vertex into the stack.
Stack is used in the implementation of the depth first search.
DFS (Contd.)
Push the starting vertex, v1 into the stack
v1
DFS (Contd.)
Pop a vertex, v1 from the stack
Visit v1
Push all unvisited vertices adjacent to v1 into the stack
Visited:
v1
DFS (Contd.)
Visit v1
v2
v4
Visited:
v1
DFS (Contd.)

Visit v2
v4
Visited:
v1 v2
DFS (Contd.)

Visit v2
v6
v3
v4
Visited:
v1 v2
DFS (Contd.)

Visit v6
v3
v4
There are no unvisited vertices adjacent to v6
Visited:
v1 v2 v6
DFS (Contd.)

Visit v3
v4
Visited:
v1 v2 v6 v3
DFS (Contd.)

Visit v3
v5
v4
Visited:
v1 v2 v6 v3
DFS (Contd.)

Visit v4
v4
Visited:
v1 v2 v6 v3 v5
DFS (Contd.)

Visit v5
Visited:
v1 v2 v6 v3 v5 v4
DFS (Contd.)
The stack is now empty

Therefore, traversal is complete
Visited:
v1 v2 v6 v3 v5 v4
Simple Algorithm – DFS (FORMAL)
Algorithm DFS
1. Push the starting vertex in to the stack STACK
2. While STACK is not empty
1. POP a vertex V
2. If V is not in the VISIT
1. Visit the vertex V
2. Store V in VISIT_LIST
3. PUSH all adjacent vertex of V on to STACK
3. EndIf
3. EndWhile
4. Stop
DFS (Contd.)
Although the preceding algorithm provides a simple and convenient method to
traverse a graph, the algorithm will not work correctly if the graph is not connected.
BFS
Algorithm: BFS(v)
1. Visit the starting vertex, v and insert it into a queue.
2. Repeat step 3 until the queue becomes empty.
3. Delete the front vertex from the queue, visit all its
unvisited adjacent vertices, and insert them into the queue.
BFS (Contd.)
Insert v1 into the queue
v1
BFS (Contd.)
Remove a vertex v1 from the queue
Visit v1
Visit all unvisited vertices adjacent to v1 and insert them in the queue
Visited:
v1
BFS (Contd.)
v2 v4
v1
BFS (Contd.)
Visit v2
v2 v4
v1 v2
BFS (Contd.)
v4 v3 v6
v1 v2
BFS (Contd.)
Visit v4
v4 v3 v6
v1 v2
BFS (Contd.)
Visit v3
v3 v6 v5
v3 does not have any unvisited adjacent

vertices
v1 v2 v4
BFS (Contd.)
Visit v3
v3 v6 v5

vertices
v1 v2 V3
V4
BFS (Contd.)
Visit v6
v6 v5

vertices
v1 v2 v4 v3 v6
BFS (Contd.)

Visit v5
v5

vertices
v1 v2 v4 v3 v6 v5
BFS (Contd.)
The queue is now empty

Therefore, traversal is complete

vertices
v1 v2 v4 v3 v6 v5
Simple Algorithm - BFS
Algorithm BFS
1. Enqueue the starting vertex in to the queue QUEUE
2. While QUEUE not empty
1. Dequeue a vertex V
2. If Vis not in the VISIT
1. Visit the vertex V
2. Store V in VISIT_LIST
3. Enqueue all adjacent vertex of V on to QUEUE
3. EndIf
3. EndWhile
4. Stop
1. FIND THE DFS & BFS OF THE BELOW GRAPH
Difference Between
BFS &DFS
BREADTH FIRST TRAVERSAL DEPTH FIRST TRAVERSAL
BFS finds the shortest path to the destination. DFS goes to the bottom of a subtree, then backtracks.
The full form of BFS is Breadth-First Search. The full form of DFS is Depth First Search.
It uses a queue to keep track of the next location to It uses a stack to keep track of the next location to visit.
visit.
BFS traverses according to tree level. DFS traverses according to tree depth.
It is implemented using FIFO list. It is implemented using LIFO list.
There is no need of backtracking in BFS. There is a need of backtracking in DFS.

Applications of Graphs
Computer Science: In computer science, graph is used to represent networks of communication, data
organization, computational devices,Google map etc.
Physics and Chemistry: Graph theory is also used to study molecules in chemistry and physics.
Social Science: Graph theory is also widely used in sociology.
Mathematics: In this, graphs are useful in geometry and certain parts of topology such as knot theory.
Biology: Graph theory is useful in biology and conservation efforts.
In Computer science graphs are used to represent the flow of computation.
Google maps uses graphs for building transportation systems, where intersection of
two(or more) roads are considered to be a vertex and the road connecting two vertices
is considered to be an edge, thus their navigation system is based on the algorithm to
calculate the shortest path between two vertices.
In Facebook, users are considered to be the vertices and if they are friends then there is
an edge running between them. Facebook’s Friend suggestion algorithm uses graph
theory. Facebook is an example of undirected graph.
In World Wide Web, web pages are considered to be the vertices. There is an edge from
a page u to other page v if there is a link of page v on page u. This is an example of
Directed graph. It was the basic idea behind Google Page Ranking Algorithm.
In Operating System, we come across the Resource Allocation Graph where each
process and resources are considered to be vertices. Edges are drawn from resources to
the allocated process, or from requesting process to the requested resource. If this leads
to any formation of a cycle then a deadlock will occur.
Graph Traversals
Traversing a Graph is the systematic approach of visiting each node of
the graph and do the processing.(Ex:printing the nodes information)
The two traversal techniques of graphs are
1. Breadth First Search(BFS)
2. Depth First Search(DFS)

1. Breadth First Search(BFS)-Level by Level Traversal
In this traversal technique, the vertices are visited in the order of their levels from the source
node i.e all the vertices at distance 'k' from the source vertex are visited before visiting the
nodes at distance 'k+1'.
BFS can be used
1. To check if the graph is connected or not.
2. To find the vertices which are reachable from a given vertex.
3. To find spanning tree if it exists.
4. To find whether a graph is acyclic or not.

BFS in a binary tree

Cycles:
•We need to save auxiliary information
•Each node needs to be marked
• Visited: No need to be put on queue
• Not visited: Put on queue
when found
•What about assuming only two children vertices?

•Need to put all adjacent vertices in queue

□ Each vertex can be in one of three states:
• Unmarked and not on queue
• Marked and on queue
• Marked and off queue
□ The algorithm moves vertices between these states
□ Unmarked and not on queue:
Not reached yet
□ Marked and on queue:
Known, but adjacent
vertices not visited yet
(possibly)
□ Marked and off queue:
Known, all adjacent vertices

Depth First Search
•Again, a simple and powerful algorithm
•Given a starting vertex s
•Pick an adjacent vertex, visit it.
□ Then visit one of its adjacent vertices
□ :
□ Until impossible, then backtrack, visit another

BFS

Sorting

Insertion sort is a simple sorting algorithm that works similar to the way you
sort playing cards in your hands. The array is virtually split into a sorted and
an unsorted part. Values from the unsorted part are picked and placed at the
correct position in the sorted part.
Algorithm
To sort an array of size n in ascending order:
1: Iterate from arr[1] to arr[n] over the array.
2: Compare the current element (key) to its predecessor.
3: If the key element is smaller than its predecessor, compare it to the
elements before. Move the greater elements one position up to make space
for the swapped element.

Insertion Sort in C is a comparison-based sorting algorithm that
arranges numbers of an array in order.
It is stable, adaptive, in-place and incremental in nature. The insertion sort is
useful for sorting a small set of data.
It sorts smaller arrays faster than any other sorting algorithm.
Insertion sort is a sorting algorithm that places an unsorted element at
its correct position in each iteration.

INSERTION SORTING
Insertion sort is a simple sorting algorithm that works similar to the way you sort
playing cards in your hands.
The array is virtually split into a sorted and an unsorted part.
Values from the unsorted part are picked and placed at the correct position in the
sorted part.
Insertion Sort
To insert 12, we need to make

room for it by moving first 36
and then 24.
6 10 24 36
12
Insertion Sort
6 10 24 36
12
Insertion Sort
6 10 24 3
6
12
Sorting An Array Using Insertion Sort
Let us take an example of Insertion sort using an array.
The array to be sorted is as follows:
Now for each pass, we compare the current element to all its previous
elements. So in the first pass, we start with the second element.

Thus, we require N number of passes to completely sort an array containing N
number of elements.

As shown in the illustration above, at the end of each pass, one element goes
in its proper place. Hence in general, to place N elements in their proper
place, we need N-1 passes.
FUNCTION FOR INSERTION SORT:
void insertion_sort(int a[],int n)
{
int i,j,item;
for(i=1;i<=n-1;i++)
{
item=a[i];
j=i-1;
while(item<a[j]&&j>=0)
{
a[j+1]=a[j];
j=j-1;
}
a[j+1]=item;
}
}
RADIX SORT/BUCKET SORT
• Radix sort is a non-comparative sorting algorithm that is better than the

comparative sorting algorithms
• Radix sort is one of the sorting algorithms used to sort a list of integer
numbers in order. In radix sort algorithm, a list of integer numbers will
be sorted based on the digits of individual numbers. Sorting is
performed from least significant digit to the most significant digit.
•First, we have to find the largest element (suppose max) from the given
array. Suppose 'x' be the number of digits in max. The 'x' is calculated
because we need to go through the significant places of all elements.
•After that, go through one by one each significant place. Here, we have to
use any stable sorting algorithm to sort the digits of each significant place.

ALGORITHM:
radixSort(arr)
1.max = largest element in the given array
2.d = number of digits in the largest element (or, max)
3.Now, create d buckets of size 0 - 9
4.for i -> 0 to d
5.sort the array elements using counting sort (or any stable sort) according to the digits at
the ith place
Complexity of the Radix Sort Algorithm
To sort an unsorted list with 'n' number of elements, Radix

sort algorithm needs the following complexities...
WorstCase : O(n)
BestCase O(n)
Average Case : O(n)
ADDRESS CALCULATION SORT
• In this sorting algorithm, Hash Function f is used with the property of Order Preserving Function which
states that if if x>y THEN f(x)>f(y) .
 This algorithm uses an address table to store the values which is simply a
list of Linked lists.
 The Hash function is applied to each value in the array to find its
corresponding address in the address table.
 Then the values are inserted at their corresponding addresses in a sorted
manner by comparing them to the values which are already present at that
address.
Examples:
Input : arr = [29, 23, 14, 5, 15, 10, 3, 18, 1]
Output: After inserting all the values in the address table, the address table looks like this:
ADDRESS 0: 1 --> 3
ADDRESS 1: 5
ADDRESS 2:
ADDRESS 3: 10
ADDRESS 4: 14 --> 15
ADDRESS 5: 18
ADDRESS 6:
ADDRESS 7: 23
ADDRESS 8:
ADDRESS 9: 29
ALGORITHM-ADDRESS CALCULATION SORT
Step1: Read the elements from the array
Step 2:Create an array of linked list with index from 0 to 9
Step 3: Divide each element in the given array by 10 to get the index of the
linked list where the element needs to be stored and store the element using
insert_order function of linked list.
Example :F(91)=91/10=9
F(28)=28/10=2
Step4: Copy the elements from the linked list inorder from 0 to 9 into an array.
Step 6: Display the array elements.
Files and Their Organization:
A file is an external collection of related data treated as a unit.
Files are stored in auxiliary/secondary storage devices.
• Disk
• Tapes
A file is a collection of data records with each record consisting
of one or more fields.

FILE OPERATIONS:
Typical operations include:
•Create
•Delete
•Open
•Close
•Read
•Write

Desirable properties of files:
•Long-term existence
files are stored on disk or other secondary storage and do not disappear when
a user logs off
•Sharable between processes
files have names and can have associated access permissions that permit
controlled sharing
•Structure
files can be organized into hierarchical or more complex structure to reflect
the relationships among files

Field
basic element of data contains a single value fixed or variable length
File
collection of related fields that can be treated as a unit by some application
program fixed or variable length
Record
collection of related fields that can be treated as a unit by some application
program fixed or variable length
Database
collection of related data
relationships among elements of data are explicit
designed for use by a number of different applications
consists of one or more types of files

File attributes
are settings associated with computer files that grant or deny certain rights to
how a user or the operating system can access that file.
For example, IBM compatible computers running MS-DOS or Microsoft

Windows have capabilities of having read, archive, system, and hidden
attributes.

VARIOUS FILE ATTRIBUTES
Read-only - Allows a file to be read, but nothing can be written to the file or
changed.
Archive - Tells Windows Backup to backup the file.
System - A file is marked as a system file indicates that it is a file used by
computer system. these files should not be modified or deleted.
Hidden - A file is marked hidden,when we see the contents of
directory ,we cannot see the file.

In operating systems like Linux, there are three main file attributes:
read (r), write (w), execute (x).
Read - Designated as an "r"; allows a file to be read, but nothing can be

written to or changed in the file.
Write - Designated as a "w"; allows a file to be written to and changed.
Execute - Designated as an "x"; allows a file to be executed by users or the
operating system.

ACCESS METHODS
The access method determines how records can be retrieved:
sequentially or randomly.
• One record after another, • Access one specific record

from beginning to end without having to retrieve all records before it

SEQUENTIAL FILES
Sequential file –
records can only be accessed sequentially, one after another, from
beginning to end.

Processing records in a sequential file
While Not EOF

{
Read the next record
Process the record
}

Applications –
that need to access all records from beginning to end.
-Personal information
If you have to process each record, sequential access is more efficient and
easier than random access.
Sequential file is not efficient for random access.

INDEXED
FILES
Mapping in an indexed file
■To access a record in a file randomly,
you need to know the address of the record.
■An index file can relate the key to the record address.

An index file is made of a data file, which is a sequential file, and an index.
Index – a small file with only two fields:
The key of the sequential file
The address of the corresponding record on the disk.
To access a record in the file :
1. Load the entire index file into main memory.
2. Search the index file to find the desired key.
3. Retrieve the address the record.
4.Retrieve the data record. (using the address)
Inverted file –
you can have more than one index, each with a
different key
Logical view of an indexed file

Mapping in a hashed file
■A hashed file uses a hash function to map the key to the address.
■Eliminates the need for an extra file (index).
■There is no need for an index and all of the overhead associated with it.

Hashing methods
■Direct Hashing –
the key is the address without any algorithmic manipulation.
■Modulo Division Hashing – (Division remainder hashing)

divides the key by the file size and
use the remainder plus 1 for the address.
■Digit Extraction Hashing –

selected digits are extracted from the key and used as the
address.

Direct hashing
Direct Hashing –
the key is the address without any algorithmic manipulation.

■ the file must contain a record for every possible key.
■ Adv. – no collision.
■ Disadv. – space is wasted.
■ Hashing techniques –
map a large population of possible keys into a small address
space.

Modulo division
■address = key % list_size + 1
■list_size : a prime number produces fewer collisions
A new employee numbering system

that will handle 1 million employees.

Digit Extraction Hashing
■selected digits are extracted from the key
and used as the address.
■ Forexample : 1,3,4
6-digit employee number → → → 3-digit
address
◻ 125870 → 158
◻ 122801 → 128
◻ 121267 → 112
◻ …
◻ 123413 → 134

Collision
■Because there are many keys for each address in the file, there is a possibility that
more than one key will hash to the same address in the file.
■Synonyms – the set of keys that hash to the same address.
■Collision – a hashing algorithm produces an address for an insertion key, and that
address is already occupied.
■Prime area – the part of the file that contains all of the home addresses.

Collision Resolution
■With the exception of the directed hashing,
none of the methods we discussed creates one-to-one mapping.
■Several collision resolution methods :

◻ Open addressing resolution
◻ Linked list resolution
◻ Bucket hashing resolution

Open addressing resolution
■Resolve collisions in the prime area.

■The prime area addresses are searched for an open or unoccupied record where
the new data can be placed.
■One simplest strategy –
the next address (home address + 1)
■Disadv. –
each collision resolution increases the possibility of future collisions.

Linked list resolution
The first record is stored in the home address (prime area), but
it contains a pointer to the second record. (overflow area)

Bucket hashing resolution
Bucket –
a node that can accommodate more than one record.

TEXT VERSUS BINARY
Text and binary interpretations of a file
A file stored on a storage device is a sequence of bits that can be
interpreted by an application program as a text file or a binary file.

Text vs. Binary
■Text files –
◻ A file of characters.
◻ Cannot contain integers, floating-point numbers, or any other data
structures in their internal memory format.
◻ Encoding system – ASCII or EBCDIC …
■Binary files –
◻ A collection of data stored in the internal format of the
computer.
◻ Contain data that are meaningful
only if they are properly interpreted by a program.

Hashing
• For a huge database structure, it's tough to search all the index values through
all its level and then you need to reach the destination data block to get the
desired data.
• Hashing method is used to index and retrieve items in a database as it is faster
to search that specific item using the shorter hashed key instead of using its
original value.
• Hashing is an ideal method to calculate the direct location of a data record on
the disk without using index structure.

Important Terminologies using in Hashing
Here, are important terminologies which are used in Hashing:
Data bucket : Data buckets are memory locations where the records are stored. It is
also known as Unit Of Storage.
Key: A DBMS key is an attribute or set of an attribute which helps you to identify a
row(tuple) in a relation(table). This allows you to find the relationship between two
tables.
Hash function: A hash function, is a mapping function which maps all the set of search
keys to the address where actual records are placed.
The function which is used to convert the key to index is called hash function and the
array which stores the records is called hash table.
Hashing is the process of indexing the data so that sorting,searching,inserting and
deleting data becomes fast.
• Hash index : It is an address of the data block. A hash function could be a simple
mathematical function to even a complex mathematical function.

STATIC HASHING
This kind of hashing is called static hashing since the size of the hash table is
fixed.(an array)

To define hash functions:-
•Division Method
It is the most simple method of hashing an integer x. This method divides x by M and then
uses the remainder obtained. In this case, the hash function can be given as
h(x) = x mod M
it is best to choose M to be a prime number because making M a prime number increases the
likelihood that the keys are mapped with a uniformity in the output range of values.
Example :
Calculate the hash values of keys 1234 and 5462. Solution Setting M = 97, hash values can
be calculated as:
h(1234) = 1234 % 97 = 70
h(5642) = 5642 % 97 = 16
• Mid-Square Method
The mid-square method is a good hash function which works in two steps:
Step 1: Square the value of the key. That is, find k2.
Step 2: Extract the middle r digits of the result obtained in Step 1.
h(k) = s where s is obtained by selecting r digits from k2

 Method 1: k: 1234 2345
k 2: 1522756 5499025
H(k): 525 492
 Method 2:
k: 1234 2345 3456
k2 : 1522756 5499025 11943936
H(k): 227 990 943 Or

439
• Folding Method
The folding method works in the following two steps:
Step 1: Divide the key value into a number of parts. That is, divide k into parts k1, k2, ..., kn,
where each part has the same number of digits except the last part which may have lesser
digits than the other parts.
Step 2: Add the individual parts. That is, obtain the sum of k1 + k2 + ... + kn. The hash value
is produced by ignoring the last carry, if any. Note that the number of digits in each part of
the key will vary depending upon the size of the hash table.

3.Folding method
 Partition key k into several parts k1,k2,k3…kn.
 All parts except for the last one have the same
length.
 Add the parts together to obtain the hash
address.
Eg: suppose the size of each part is 2.
k: 1522756 5499025 11943936
Chopping: 01 52 27 56 05 49 90 25 11 94 39 36
Pure folding: 01+52+27+56 05+49+90+25 11+94+39+36
=136 =169 =180
Collision Resolution
• If, when an element is inserted, it hashes to the
same value as an already inserted element, then
we have a collision and need to resolve it.
• There are several methods for dealing with this:
– Separate chaining
– Open addressing
• Linear Probing
• Quadratic Probing
• Double Hashing

 In linear probing, collisions are resolved by
sequentially scanning an array until an
empty cell is found.
 Suppose there is a hash table of size h , key
value k and address calculated is i. If ith
location is free then k can be stored in
it.
 If the location already has some data value
stored in it, then other slots are examined
systematically in the forward direction to find
a free slot.
ie, the locations i+1,i+2,…h-1,0,1,..i-1 will be
the sequence of locations.
 Here the hash table is considered as circular, so
that when the last location is reached, the
search proceeds to the first location in the table;
hence the name closed hashing.
 Example:
 Insertitems with keys: 89, 18, 49, 58, 9 into an
empty hash table.
 Table size is 10.
 Hash function is hash(x) = x mod 10.
Figure 20.4
Linear
probing hash
table after
each insertion
Problem 1:Apply the hash function h(x) = x mod 7 for linear probing on the data 2341, 4234,
2839, 430, 22, 397, 3920 and show the resulting hash table
2341 MOD 7=3

4234 MOD 7 =6
2839 MOD 7=4
430 MOD 7 =3 0 397
//collision occurs checking 1 22
for next free available
location (5)
2 3920
3 2341
22 MOD 7 =1
397 MOD 7 =5 //collision occurs checking for next free available location (0) 4 2839
3920 MOD 7=0 //collision occurs checking for next free available location (2)
5 430
6 4234
Problem 2: (***IMPORTANT)
A hash table contains 7 buckets and uses linear probing to solve collision. The key
values are integers and the hash function used is key Mod 7. Draw the table that
results after inserting in the given order the following values: 16,8,4,13,29,11,22.
16,8,4,13,29,11,22.
16 MOD 7=2 0 22
8 MOD 7=1 1 8
4 MOD 7=5 2 16
13 MOD 7 =6 3 29
29 MOD 7 =1//COLLISION 4 11
11 MOD 7=4 //COLLISION 5 4
22 MOD 7 =1 //COLLISION 6 13
(Drawbacks of linear probing)
 One of the problems with linear probing is
clustering, many consecutive elements form groups
and it starts taking time to find a free slot or to
search an element.
 As long as table is big enough, a free cell can always
be found, but the time to do so can get quite large.
 Worse, even if the table is relatively empty, blocks of
occupied cells start forming.
 This effect is known as primary clustering.
 Any key that hashes into the cluster will require
several attempts to resolve the collision, and then it
will add to the cluster.
 Double Hashing.
 Quadratic Probing.
Double Hashing
 Here a second hash function H2 is used for resolving collision
 The second hash function should be selected in such a way that
the hash addresses generated by the two hash functions are
distinct.
 The second hash function generates a value m for the key k,
which is used for the pseudo random number generator as in the
case of random probing.
 Here m and h are relatively prime.
 Double hashing can be done using :
(hash1(key) + i * hash2(key)) % TABLE_SIZE
Here hash1() and hash2() are hash functions and TABLE_SIZE
is size of hash table.
 A popular second hash function is :
hash2(key) = PRIME – (key % PRIME)

where PRIME is a prime smaller than the TABLE_SIZE.
Calculation of second function(H2)
//hash2(key) = PRIME – (key % PRIME)
(hash1(key) + i * hash2(key)) % TABLE_SIZE

Problem 3: (***IMPORTANT)
A hash table contains 10 buckets and uses double hashing to solve collision. The key
values are integers and the hash function used is division method. Draw the table
that results after inserting in the given order the following values:
16,8,24,13,29,14,33.
Quadratic probing
Instead of probing next locations i+1,
i+2,i+3…as in the case of linear probing , here
the next locations to be probed are
i+12,i+22,i+32….
Mathematically, if h is the size of the hash
table and H(k) is the hash function then the
quadratic probing searches the locations:
(k+i2 ) MOD M
for i=1,2,3….
M=table size
 The idea is to keep a list of all elements that hash
to the same value.
 The array elements are pointers to the
first nodes of the lists.
 A new item is inserted to the end of the list
 Eg: Hash table of size 10 is considered

here.
 The hash address for a key is decided by
its last digit.
Collision Resolution Strategies
1. Open Addressing/Closed Hashing
2. Chaining
Once a collision takes place, open addressing or closed hashing computes new
positions using a probe sequence and the next record is stored in that position .
The process of examining memory locations in the hash table is called probing.
Open addressing technique can be implemented using linear probing, quadratic
probing, double hashing.

Linear Probing
When using a linear probe to resolve collision, the item will be stored in the next available
slot in the table, assuming that the table is not already full.
This is implemented via a linear search for an empty slot, from the point of collision. If the
physical end of table is reached during the linear search, the search will wrap around to the
beginning of the table and continue from there.If an empty slot is not found before reaching
the point of collision, the table is full.
If h is the point of collision, probe through h+1,h+2,h+3..................h+i. till an empty slot is

found

Searching a Value using Linear Probing
□The procedure for searching a value in a hash table is same as for storing a value in a
hash table.
□While searching for a value in a hash table, the array index is re-computed and the key of the
element stored at that location is compared with the value that has to be searched.
□If a match is found, then the search operation is successful.
□ If the key does not match, then the search function begins a sequential search of the array
that continues until:
• □ the value is found,or
•□ the search function encounters a vacant location in the array, indicating that the
value is not present, or
•□ the search function terminates because it reaches the end of the table and the
value is not p

• Quadratic Probing
A variation of the linear probing idea is called quadratic probing.
Instead of using a constant “skip” value, if the first hash value that has resulted in collision is h,
the successive values which are probed are h+1, h+4, h+9, h+16, and so on.
In other words, quadratic probing uses a skip consisting of successive perfect squares.

Double Hashing
In double hashing, we use two hash functions rather than a single function.
Double hashing uses the idea of applying a second hash function to the key when a collision
occurs.
The result of the second hash function will be the number of positions form the point of collision
to insert.
There are a couple of requirements for the second function:
•? it must never evaluate to 0
•? must make sure that all cells can be probed
A popular second hash function is: Hash2(key) = R - ( key % R ) where R is a prime number
that is smaller than the size of the table.But any independent hash function may also be used.


Module 5

Uploaded by

Copyright:

Available Formats

You might also like

Module 5

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Module 5

Uploaded by

Copyright:

Available Formats

MODULE-5

Department of ISE Acharya Institute of Technology

Department of ISE Acharya Institute of Technology

Department of ISE Acharya Institute of Technology

Undirected Edge – An edge without any direction is called an undirected edge

E is finite set of edges.

The graph G = (V, E), where

Department of ISE Acharya Institute of Technology

The graph G = (V, E), where

Department of ISE Acharya Institute of Technology

Department of ISE Acharya Institute of Technology

Department of ISE Acharya Institute of Technology

Department of ISE Acharya Institute of Technology

A path from vertex 1 to vertex 5 is given by 1, 3, 4, 5

Department of ISE Acharya Institute of Technology

Department of ISE Acharya Institute of Technology

Disconnected Graph – A graph G is said to be a disconnected graph if there

Department of ISE Acharya Institute of Technology

Department of ISE Acharya Institute of Technology

V(G1)={ v1,v2,v3,v4, v5,v6,v7}

 Adjacency matrix is a sequential representation.

 In this representation, we have to construct a nXn matrix A.

 Adjacency list is a linked representation.

Node structure for non – weighted graph

NODE LABEL ADJ_LIST

Node structure for weighted graph

 Depth-First-Search (DFS) :Algorithm travels from a starting node to some end

Traversing a graph means visiting all the vertices in a graph.

Push the starting vertex, v1 into the stack

Pop a vertex, v2 from the stack

Pop a vertex, v2 from the stack

Pop a vertex, v6 from the stack

There are no unvisited vertices adjacent to v6

Pop a vertex, v3 from the stack

Pop a vertex, v3 from the stack

Pop a vertex, v4 from the stack

There are no unvisited vertices adjacent to v4

Pop a vertex, v5 from the stack

There are no unvisited vertices adjacent to v5

The stack is now empty

Insert v1 into the queue

v3 does not have any unvisited adjacent

v3 does not have any unvisited adjacent

v3 does not have any unvisited adjacent

Remove a vertex v5 from the queue

v5 does not have any unvisited adjacent

The queue is now empty

v5 does not have any unvisited adjacent

BREADTH FIRST TRAVERSAL DEPTH FIRST TRAVERSAL

It is implemented using FIFO list. It is implemented using LIFO list.

There is no need of backtracking in BFS. There is a need of backtracking in DFS.

The two traversal techniques of graphs are

1. Breadth First Search(BFS)

2. Depth First Search(DFS)

Department of ISE Acharya Institute of Technology

BFS can be used

1. To check if the graph is connected or not.