Download as pdf or txt
Download as pdf or txt
You are on page 1of 75

R L JALAPPA INSTITUTE OF TECHNOLOGY

Dept.. of Computer Science & Engg.

Module 5

Manjunatha B N
Assistant Professor

Department
1 of Computer Science, RLJIT
Outline
 Graphs:
 Definitions, Terminologies
 Matrix and Adjacency List Representation of Graphs
 Elementary Graph operations.
 Traversal methods:
1. Breadth First Search
2. Depth First Search
 Sorting and Searching:
 Insertion Sort, Radix sort
 Address Calculation Sort
 Hashing:
 Hash Table organizations
 Hashing Functions
 Static and Dynamic Hashing
 Files and Their Organization:
 Data Hierarchy
 File Attributes
 Text Files and Binary Files
 Basic File Operations
 File Organizations and Department
Indexingof Computer Science, RLJIT 2
GRAPHS
 A graph G can be defined as an ordered set G(V, E) where V(G)
represents the set of vertices and E(G) represents the set of
edges which are used to connect these vertices.

 A Graph G(V, E) with 5 vertices (A, B, C, D, E) and six edges


((A,B), (B,C), (C,E), (E,D), (D,B), (D,A)) is

Department of Computer Science, RLJIT 3


GRAPHS - TERMINOLOGIES
Vertex :
 Individual data element of a graph is called as Vertex.
 Vertex is also known as node.
 A vertex is normally represented by a circle.
Example:

 In this example 5 nodes identified by 1, 2, 3, 4 and 5. they are also


called vertices.
 It is denoted by set v = {1, 2, 3, 4, 5}

Department of Computer Science, RLJIT 4


Edge:
 An edge is a connecting link between two vertices or Joining the
two vertices is called an edge.
 Edge is also known as Arc.
Example:

 It is an undirected edge, i.e. there is no direction towards vertex 6.


 It is denoted by ordered pair(1,6)

Example:

 It is a directed edge, i.e. there is a direction towards vertex 6.


 It is denoted by directed pair <1, 6>
1 is tail of the edge
6 is head of the edge Department of Computer Science, RLJIT 5
Directed Graph:
 It is also called as digraph.
 A graph G = (V, E) in which every edge is directed is called a
directed graph.
Example:

graph G = (V, E)
where V = {0, 1, 2}
E = {<0, 1>, <1, 0>, <1, 2>}

Department of Computer Science, RLJIT 6


Undirected Graph:
 A graph G = (V, E) in which every edge is undirected.
Example:

graph G = (V, E)
where V = {1, 2,3, 4, 5, 6, 7, 8}
E = ((1, 5), (1, 6), (5, 7)
(6, 8), (6, 4), (8, 2), (8, 3))

Department of Computer Science, RLJIT 7


Self-Loop (Self-Edge) Graph:
 A loop is an edge which starts and ends on the same vertex.
Example:

(1, 1) and (4, 4) are self-loop or


self edge.

Department of Computer Science, RLJIT 8


Multi Graph:
 A graph with multiple occurrence of the same edge between any
two vertices is called multi graph.
Example:

(1, 4) and (4, 3) are multiple


edges.

Department of Computer Science, RLJIT 9


Complete Graph:
 A graph G = ( V, E) is said to be a complete graph, if there exists an
edge between every pair of vertices. i.e. n(n-1)/2 edges required ,
where n is number of vertices.

Example:

n vertices
n(n-1)/2
=4(4-1)/2
=6 edges
Complete graph

Not a Complete graph


Department of Computer Science, RLJIT 10
Circuit or Cycle Graph:
 A cycle is a path in which the first and last vertices are same.

Example:

In this example, the path < 4, 2, 3, 4>


It can also be represented as
<4, 2>, <2, 3>, <3, 4> , <4, 2>
NOTE:
A graph with at least one cycle is called
a cyclic graph.
A graph with no cycle is called acyclic
graph.

Department of Computer Science, RLJIT 11


Connected Graph:
 A graph G = (V, E) (directed or undirected) is said to be connected if
and only if there exists a path between every pair of vertices.

Example:

Department of Computer Science, RLJIT 12


Disconnected Graph:
 A graph G = (V, E), if there exists at least one vertex in a graph that
can‟t be reached from other vertices in the graph is called
disconnected graph.
Example:

Department of Computer Science, RLJIT 13


Representation of Graphs :
 The graphs can be represented in two different methods:
1. Adjacency Matrix
2. Adjacency Linked List

Department of Computer Science, RLJIT 14


1. Adjacency Matrix: Let G = (V, E) be a graph where V is a set of
vertices and E is a set of edges. N be the number of vertices in
graph G.
It is formally defined as
1 if there is an edge from vertex i to vertex j
A[i][j] =
0 if there is no edge from vertex i to vertex j

(a) Directed graph Adjacency Matrix

Department of Computer Science, RLJIT


(b) Undirected graph Adjacency
15
Matrix
Function to read Adjacency Matrix:
void read_adjacency_matrix (int a[10][10], int n)
{
int i, j;
for(i=0; i<n; i++)
{
for(j=0; j<n; j++)
{
scanf(“%d”,&a[i][j]);
}
}
}

Department of Computer Science, RLJIT 16


2. Adjacency Linked List: Let G = (V, E) be a graph. It is an array of
n linked lists where n is the number of vertices in graph G.
Each location of the array represent a vertex of the graph.

(a) Directed graph Adjacency Linked List

(b) Undirected graph Adjacency Linked List


Department of Computer Science, RLJIT 17
Weighted graph: A graph in which a number is assigned to each edge
in a graph is called weighted graph. These numbers are called costs or
weights.
Example:

The values 10, 20,30 and 40 are the weights associated with 4 edges
<1,3>, <1,2>, <3,4> and <2,4>
The weighted graph can be represented using adjacency matrix as well
as adjacency linked lists.
The adjacency matrix consisting of costs(weights) is called cost
adjacency matrix.
The adjacency linked list consisting of costs(weights) is called cost
adjacency linked list.

Department of Computer Science, RLJIT 18


Cost adjacency matrix: It is formally defined as shown below:
w if there is a weight associated with edge from vertex i
to vertex j
A[i][J] =
∞ if there is no edge from vertex i to vertex j

Department of Computer Science, RLJIT 19


For undirected graph: It is formally defined as shown below:
w if there is a weight associated with edge from vertex
(i, j ) to (j, i)
A[i][J] =
∞ if there is no edge from vertex i to vertex j

Department of Computer Science, RLJIT 20


Cost adjacency linked list: A cost adjacency list is an array of n linked
lists.

Department of Computer Science, RLJIT 21


Elementary Graph Operations:
1. Creation of graph using adjacency matrix
 Declare an array of M[size][size] which will store the graph.
 Enter how many nodes needed in a graph.
 Enter the edges of the graph by 2 vertices each say Vi, Vj
indicates some edge.
 If the graph is directed set M[i][j]=1. if the graph is undirected set
M[i][j]=1 and M[j][i]= 1as well.
 When all edges for the desired graph is entered print the graph
M[i][j].

Department of Computer Science, RLJIT 22


Elementary Graph Operations:
2. Vertex insertion:
 The vertex can be inserted in a graph by adding name of new
vertex and name of the existing vertex to which it is attached.

 Now if we insert vertex V4, by attaching it to V3. then the graph


becomes.

Department of Computer Science, RLJIT 23


Elementary Graph Operations:
3. Vertex Deletion:
 When particular vertex is deleted, then all the edges associated
with it gets deleted.

Department of Computer Science, RLJIT 24


Graph Traversals: The process of visiting each node of a graph
systematically in some order is called graph traversal.

Two important traversal techniques are:


1. Breadth First Search (BFS)
2. Depth First Search (DFS)

Breadth First Search (BFS):


It is a method of traversing the graph from an arbitrary vertex(u). First,
visit this node u. Then visit all neighbour of u. Then visit the neighbours
of neighbours of u and soon. Once all vertices have been visited search
will terminate.

Department of Computer Science, RLJIT 25


Example: Traverse the following graph by BFS and print all the vertices
reachable from start vertex a. resolve this by the vertex alphabetical
order.

Initialization: Insert source vertex a into queue and add a to S as shown


below:
u = del(Q) v = adjacent to u Nodes visited S Queue
- - a a
Step 1: Delete an element a from queue.
Find the adjacent nodes to a but not in S.
i.e. b, c, d and e
Add b, c, d and e to S, insert into queue
u = del(Q) v = adjacent to u Nodes visited S Queue
- - a a
a b, c, d, e Department of Computer
a, b, Science,
c, d, RLJIT
e 26 b, c, d, e
Step 2: Delete an element b from queue.
Find the adjacent nodes to b but not in S. i.e. f
Add f to S, insert into queue
u = del(Q) v = adjacent to u Nodes visited S Queue
- - a a
a b, c, d, e a, b, c, d, e b, c, d, e
b f a, b, c, d, e, f c, d, e, f
Step 3: Delete an element c from queue.
Find the adjacent nodes to c but not in S. i.e. g
Add g to S, insert into queue
u = del(Q) v = adjacent to u Nodes visited S Queue
- - a a
a b, c, d, e a, b, c, d, e b, c, d, e
b f a, b, c, d, e, f c, d, e, f
c g a, b, c, d, e, f, g 27 d, e, f, g
Department of Computer Science, RLJIT
Remaining steps:
u = del(Q) v = adjacent to u Nodes visited S Queue
Initialization - - a a
Step 1 a b, c, d, e a, b, c, d, e b, c, d, e
Step 2 b f a, b, c, d, e, f c, d, e, f
Step 3 c g a, b, c, d, e, f, g d, e, f, g
Step 4 d a, b, f a, b, c, d, e, f, g e, f, g
Step 5 e a, g a, b, c, d, e, f, g f, g
Step 6 f b, d a, b, c, d, e, f, g g
Step 7 g c, e a, b, c, d, e, f, g empty

The nodes that are reachable from source a : a, b, c, d, e, f, g

Department of Computer Science, RLJIT 28


Algorithm of BFS using adjacency matrix:
No node is visited to start with insert source u to q
Print u
Mark u as visited i.e. add u to S
while queue is not empty
Delete a vertex u from q
For every vertex adjacent to u
if v is not visited
Print v
Mark v as visited
Insert v to queue
end if
end while

Department of Computer Science, RLJIT 29


Depth First Search (DFS):
It is a method of traversing the graph by visiting each node of the graph in a
systematic order. As the name implies means “To search deeper in the graph”.
Algorithm of DFS:
Step1: Select node u as the start vertex (Select in a order)
push u onto stack and mark it as visited.
we add u to S for marking.
Step 2: while stack is not empty
for vertex u on top of the stack, find the next immediate
adjacent vertex.
if v is adjacent
if a vertex v not visited then,
push it onto stack and number it in the order it is pushed
mark it as visited by adding v to S
else
ignore the vertex
end if
else
remove the vertex from the stack
number it in the order it is popped
end if Department of Computer Science, RLJIT 30
Traverse the following graph using DFS and display the nodes
reachable from a given source vertex:

Solution: Some procedure as we did in BFS. But there are two


changes.
1. Instead of using queue, we use stack.
2. In BFS, all the nodes adjacent and which are not visited are
considered. In DFS, only one adjacent which is not visited earlier
is considered. Rest of the procedure remains same.

Department of Computer Science, RLJIT 31


Stack v = adj(S[top]) Nodes visited S pop(Stack)
Initial step a - a
Stage 1 a b a, b -
Stage 2 a, b d a, b, d -
Stage 3 a, b, d f a, b, d, f -
Stage 4 a, b, d, f - a, b, d, f f
Stage 5 a, b, d - a, b, d, f d
Stage 6 a, b - a, b, d , f b
Stage 7 a c a, b, d, f, c -
Stage 8 a, c g a, b, d, f, c, g -
Stage 9 a, c, g e a, b, d, f, c, g, e -
Stage 10 a, c, g, e - a, b, d, f, c, g, e e
Stage 11 a, c, g - a, b, d, f, c, g, e g
Stage 12 a, c - a, b, d, f, c, g, e c
Stage 13 a - a, b, d, f, c, g, e a
Department of Computer Science, RLJIT 32
Difference between DFS and BFS
DFS BFS
This traversal is done with the This traversal is done with the
help of stack data structure. help of queue data structure.

It works using two ordering. It works using one ordering.


The first order is the order in The order in which the vertices
which the vertices are reached are reached in the same order
for the first time and the they get removed from the
second order in which the queue.
vertices become dead end.

Department of Computer Science, RLJIT 33


Sorting: The sorting is a technique by which the elements are
arranged in some particular order (Ascending Order or Descending
Order).
1. Insertion Sort: This sorting procedure is similar to the way we
play cards. We pick each card and insert it into the proper place so
that cards in hand are arranged in sorted order.
 In this method, the elements are inserted at their appropriate place.
Hence the name is insertion sort.
Example: Sort the elements 25, 75, 40, 10 and 20 using insertion
sort.

Department of Computer Science, RLJIT 34


Example: Sort the elements 25, 75, 40, 10 and 20 using insertion
sort.

Department of Computer Science, RLJIT 35


C function to sort items using insertion sort.

void insertion_sort (int a[ ], int n)


{
int i, j, item;
for(i=1; i<n; i++)
{
item = a[i]; // Insert the item from unsorted part
j = i-1;
while (item <a[j] && j>=0) //Find the appropriate place to insert
{
a[j+1] = a[j];
j = j-1;
}
a[j+1] = item; //Insert at the appropriate place
}
}
Department of Computer Science, RLJIT 36
2. Radix Sort: In this method sorting can be done digit by digit and thus
all the elements can be sorted.
Example: Consider the unsorted array of 8 elements.
45, 37, 05, 09, 06, 11, 18 and 27 Step 2: Now sort the above array
Step 1: Now sort the elements with the help of second last digit.
according to the last digit. Second Last Digit Element
Last Digit Element
0 05, 06, 09
0
1 11, 18
1 11
2 27
2
3 37
3
4 45
4
5
5 45, 05 6
6 06 7
7 37, 27 8
8 18 9
9 09
Since the list of element is of 2 digit, will stop comparing.
Finally the sorted list by radix sort37method will be 05, 06,
Department of Computer Science, RLJIT
Example 2: sort the following data in ascending order using radix
sort 25, 06, 45, 60, 140, 50 Step 2: Sort the elements Step 3: Sort the
Step 1: Now sort the elements according to second last elements according
according to the last digit. digit and sort them. to Third last digit
Second Element and sort them.
Last Digit Element
Last Digit Third Element
0 60, 140, 50
0 06 Last Digit
1
1 0 06, 25,
2 45, 50, 60
2 25
3 1 140
3
4 2
4 45, 140
5 25, 45 3
5 50
6 06 4
6 60
7 5
7
8 6
8
9 7
9
8
Thus, the sorted list of elements are: 06, 25,of 45,
Department 50, Science,
Computer 60, 140 RLJIT
9
38
Algorithm:
Step 1: Read the total number of elements in the array.
Step 2: Store the unsorted elements in the array.
Step 3: Now the simple procedure is to sort the elements by digit
by digit.
Step 4: Sort the elements according to the last digit then second
last digit and so on.
Step 5: Thus, the elements should be sorted for up to the most
significant bit.
Step 6: Store the sorted element in the array and print them.
Step 7: Stop.

Department of Computer Science, RLJIT 39


3. Address Calculation Sort:
 The address calculation sort is a kind of sorting technique in which the
function f is applied to each element.
 The value returned from function f will denote in each sub-file the
element is placed.
 The element is placed into a sub-file in correct sequence by using any
sorting method- simple insertion is often used.
Example: 28, 43, 55, 14, 88, 39, 91, 33, 49
 We will create 10 sub-file, initially each sub-file is empty.
 An array of pointer f(10) is declared.
 The element is passed through a function which returns the ten‟s place
digit. This value indicated the address of that element.
Consider, 28 -> f(28) = 2 The elements can be arranged as follows:
43 -> f(43) = 4
55 -> f(53) = 5
14 -> f(14) = 1
88 -> f(88) = 8
39 -> f(39) = 3
91 -> f(91) = 9
33 -> f(33) = 3 Department of Computer Science, RLJIT 40
Hashing: The process of mapping large amounts of data into a
smaller table using hash function, hash value and hash table is
called hashing.
Hash Value: A function can be used to provide a mapping between
large original data and the smaller table by transforming a key
information into an index to the table.
The index value returned by this function is called hash value.
Hash Function: Hash function is a function which is used to put the
data in the hash table. Hence one can use the same hash function to
retrieve the data from the hash table. Thus hash function is used to
implement the hash table.
The integer returned by the hash function is called hash key.
Hash Table: The data can be stored in the form of a table using
arrays with the help of hash function which gives a hash value as an
index to access any element in the table.
In this table insertion, deletion and retrieve operations takes place with
the help of hash value is called hash table.
Department of Computer Science, RLJIT 41
Create a hash table and using hash function for 5 items:
For example, consider the elements: 10, 11, 12, 13 and 14 and consider the
following function:
H(k) = k % 5 // Hash function is just a function
The value of each function for the items: 10, 11, 12, 13 and 14 can be
obtained as shown below:
H(k) = k % 5

H(10) = 10 % 5 = 0 Insert 10 into table a[0]


H(11) = 11 % 5 = 1 Insert 11 into table a[1]
H(12) = 12 % 5 = 2 hash values Insert 12 into table a[2]
H(13) = 13 % 5 = 3 Insert 13 into table a[3]
H(14) = 14 % 5 = 4 Insert 14 into table a[4]
Thus, the table obtained after inserting each of the items using hash value as
the index is shown below:

The table thus created using elements and hash


values isDepartment
the hash table
of Computer Science, RLJIT 42
What are the different types of hashing techniques?
The two types of hashing techniques are:
1. Static hashing
2. Dynamic hashing

1. Static Hashing: The process of mapping large amount of data into a


table whose size is fixed during compilation time is called static
hashing.
 In static hashing, all the data items are stored in a fixed size table t
called hash table.
 The hash table is implemented as an array ht[0…..m-1] with 0 as the
low index and m-1 as the high index.
 Each item in the table can be stored in ht[0], ht[1], …ht[m-1] where m
is the size of the table.
2. Dynamic Hashing: Here size is not fixed. It can grow or shrink .
 Typical example of dynamic hashing is extensible hashing technique.
 Extensible hashing is a technique which handles a large amount of
data. The data to be placed in the hash table is by extracting certain
Department of Computer Science, RLJIT 43
number of bits.
Hash functions:
The various types of hash functions are
1. Division method
2. Mid square method
3. Folding method
4. Converting keys to integers

Department of Computer Science, RLJIT 44


1. Division method:
The hash function depends upon the remainder of division.
h(k) = record % table size

Typically the divisor is table length.

Example: if the record 54, 72, 89, 37 is to be placed in the hash table
and if the table size is 10 then
h(key) = record % table size
54 % 10 = 4
72 % 10 = 2
89 % 10 = 9
37 % 10 = 7

Department of Computer Science, RLJIT 45


2. Mid- square method:
In this method, the key k is squared. A number in the middle of k2 is
selected by removing the digits from both ends.

The hash function is defined as shown below:


h(k) = l
Where l is obtained by removing digits from both ends of k2.

Example: Consider key k = 2345 its square i.e. k2 = 574525


h(2345) = 45 by discarding 57 in the beginning and 25 from the end.

Department of Computer Science, RLJIT 46


3. Folding method:
In this method, the key k is divided into number of parts k1, k2, k3,
….., kn of same length except the last part. Then all parts are added
together as shown below.

h(k) = k1+k2+k3+….+kn
For example, consider key k = 123987234876. the partitions are k1 =
12, k2 = 39, k3 = 87, k4 = 23, k5 = 48, k6 = 76.
Now, the hash value can be obtained by adding all the partitioned keys
as shown below:
h(k) = 12 + 39 + 87 + 23 + 48 + 76 = 285

Department of Computer Science, RLJIT 47


4. Converting keys to integers :
In this method, the key k is a string, the hash value can be obtained by
converting this string into integer. This can be done by adding all
ASCII values of each character in the string.
Example: k = “ABCD”.
The hash value can be calculated as shown below:
h(“ABCD”) = „A‟ + „B‟ + „C‟ + „D‟ = 266

Department of Computer Science, RLJIT 48


What is collision during hashing ? Or When overflow occurs ?
If the size of the hash table is fixed and the size of the table is smaller
than the total number of keys generated, then two or more keys will
have the same hash address (also called hash value). This phenomenon
of two or more keys being hashed to the same location of hash table is
called collision.
Example:
Consider a hash function
H(key) = recordkey % 10 having the hash table of size 10.
The recordkeys to be placed are 131, 44, 43, 78, 19, 36, 57 and 77
 Now if we try to place 77 in the hash table then
we get the hash key to be 7 and at index 7
already the recordkey 57 is place. This situation
is called collision.
 From the index 7 if we look for next vacant
passion at subsequent indices 8, 9. then we find
that there is no room to place 77 in the hash
Department of Computer Science, RLJIT
table. This situation is called overflow. 49
How to avoid overflow in hashing? Or How to avoid collision ?
The various hashing techniques using collision can be avoided are:
1. Open Addressing
2. Chaining

1. Open Addressing: In this technique, the amount of space


available for storing various data is fixed at compile time by
declaring a fixed array for the hash table. So all the keys are stored
in this fixed hash table itself without the use of linked lists.
In such a table, collision can be avoided by finding another
unoccupied location in the array.
The collision can be avoided using linear probing.
Linear Probing: In linear probing, we linearly probe for next slot.

Department of Computer Science, RLJIT 50


To create the hash table, we need the following:
1. Initial hash table.
2. Select the hashing function.
3. Find the index of each location in the hash table.

1. Create initial hash table: Assume the size of the hash table is HASH_SIZE = 5
and the hash table can be pictorially represented as shown below:
a[0] a[1] a[2] a[3] a[4]

The empty hash table is indicated by storing 0 (zero) values in each location as
shown below:
a[0] a[1] a[2] a[3] a[4]
0 0 0 0 0

Code: void initialize_hash_table (int a[])


{
int i;
for (i=0; i<HASH_SIZE; i++)
{
a[i] = 0;Department of Computer Science, RLJIT 51
}
2. Identify the hash function: Let the hash function is H(K) = K % m
where m is HASH_SIZE.
Code:
int H (int k)
{
return K % HASH_SIZE;
}

3. Find the index for hash table given hash value:


Code:
for (i=0; i<HASH_SIZE; i++)
{
printf(“%d”, i)
}
The above code can also be written using % operator as shown below:
for ( i=0; i<HASH_SIZE; i++)
{
printf(“%d”, i % HASH_SIZE);
} Department of Computer Science, RLJIT 52
Again code can be modified by adding pos to i then perform mod operation
as shown below:

for ( i = 0 ; i < HASH_SIZE; i++)


{
printf( “%d”, (pos+i)%HASH_SIZE);
}

Output for various values of pos:


If pos = 0, output will be : 0 1 2 3 4
If pos = 1, output will be : 1 2 3 4 5
If pos = 2, output will be : 2 3 4 0 1
If pos = 3, output will be : 3 4 0 1 2
If pos = 4, output will be : 4 0 1 2 3

So, index to hash table from pos is given by


index = (pos + i) % HASH_SIZE
So, index to hash table from h_value is given by:
index = (h_value + i) % HASH_SIZE
Department of Computer Science, RLJIT 53
Construct a hash table ht of size 15 (using open addressing method) to store the
following words: like, a, tree, you, first, find, a, place, to, grow, and, then, branch
and out
Solution: Now, we “Find hash address for each word”. Let us find the hash address by
taking the sum of positions alphabets in the word and taking the remainder by dividing
it by 15. The various hash addresses or hash value are shown below:

Department of Computer Science, RLJIT 54


Creating hash table: The words are taken in the order of serial
number from the above table and inserted into hash table one by one
based on the hash address (value or index). The words “like”, “a” and
“tree” are inserted into positions 7, 1 and 3 respectively as shown
below into hash table.

The next word to be selected is “you” whose hash value is 1 and is


already full. So, it is inserted in the next available location 2 as shown
below.

The word “first” with hash value 12 is inserted into ht[12] a shown
below.
Department of Computer Science, RLJIT 55
The next word “find” with hash value 3 should be inserted at ht[3].
But, a word is already present. So, insert “find” at next free slot i.e. 4
as shown below.

The next word “a” with hash value 1 is already in ht[1]. So, select
next word “place” with hash value 7. It is already occupied and hence
insert at next empty space 8 as shown below:

The word “to” with hash value 5 inserted at position 5 as shown


below.

Department of Computer Science, RLJIT 56


The word “grow” has the hash value 3and since the location is full,
the next available location is 6 and so it is inserted at position 6 as
shown below.

The word “and” has the hash value 4. It is full, next free slot is 9 and
hence “and” is inserted at location 9 as shown below.

The word “then” has the hash value 2. It is full. It is inserted into next
free location 10 as shown below.

Department of Computer Science, RLJIT 57


The word “branch” has the hash value1. It is full. It is inserted into
next free location 11 as shown below.

The word “out” has the hash value 11. It is full. It is inserted into next
free location 13 as shown below.

Department of Computer Science, RLJIT 58


2. Chaining:
This technique is a very simple method using which collisions can be
avoided. This is achieved using an array of linked lists. If an item has
to be inserted, using a hash function, the hash value is obtained. This
hash value is used to find the list to which the item is to be added and
using the appropriate function an item can be inserted at the end of the
list. If more than one key has same hash value, all the keys hashed into
same hash value will be inserted at the end of the list and thus
collision is avoided.
Advantages: The searching using this technique takes less time.

Department of Computer Science, RLJIT 59


Example: Construct a hash table ht of size 5 (using chaining method to avoid
collision) and solve the following words: like, a, tree, you, first, find, a, place, to,
grow, and, then, branch, out
Solution: A hash table of size 5 indicates that it is required to construct a hash table
by using an array of 5 linked lists.

Department of Computer Science, RLJIT 60


Construction of Hash table: Now, let us see how to construct a hash
table. Since the size of the hash table is 5, we will have an array of 5
rows with each row having a linked list. Obtain the words one by one,
find the hash address and insert at the end of the list in the appropriate
location in the array. The hash table obtained by computing the hash
values is shown below.

Department of Computer Science, RLJIT 61


Files and Their Organization:
Data hierarchy:
 Data is a representation of facts or concepts in an organized
manner.
 Systematic organization of data is involved.
 It involves various data items such as data fields, records, files and
databases.
1. Data Field: It is an elementary unit of information that stores a
single fact as an attribute.
Example: Name of a student, USN, Phone Number and Address are
various fields or attributes of a Student.
2. Record: It is a collection of related data fields.
3. File: It is a collection of related records.
Example: We may have a file for each student consisting of student‟s
academic record, student‟s personal record and student‟s hostel record
and so on.
4. Database: A database is a data structure that stores organized
Department of Computer Science, RLJIT
information usually in the form of tabular columns. 62
File Attributes:
 File attributes are settings that are associated with computer files.
These attributes may grant or deny certain rights to the user or the
operating system(os) when a file is accessed.
 Various file attributes supported by windows or DOS
1. Read-only: If a file is marked read-only, it can‟t be modified or
deleted. If we try to delete read only file, we get the message
“Access Denied”.
2. Hidden: If a file is marked hidden, we can see the contents of
directory, but we can‟t see the file.
3. System: A file marked as a system file indicated that it is a file
used by the computer system. These files should not be modified
or deleted.
4. Directory: When a directory bit is set, the name associated with
file is considered a directory or sub-directory which contains files.
5. Archive: A archive bit set when a file is modified. So, when
archive bit is set, it is ready for backup. After backup is over,
Department of Computer Science, RLJIT
archive bit is cleared. 63
Text and Binary File :

Text File Binary File


The text files contain plain text The binary file contain the data
data. in binary form.
The text file does not contain The binary file can store the
the graphical data. data such as text, graphics,
image and sound.
Text files can be directly read The binary file can‟t be read
and interpreted. directly.
The text files are not executable The binary files can be
files. executable files.

Department of Computer Science, RLJIT 64


Basic File Operations:
 The basic operations that can be performed on files are given in
below figure:

1. Creation
2. Updating
3. Retrieval
4. Maintenance

Department of Computer Science, RLJIT 65


1. Create a file:
 A file is created by specifying its name and mode. Then the file is
opened for writing records that are read from an input device. all
the records have been written into the file, the file is closed. The
file is now available for future read/write operations by any
program that has been designed to use it in some way or the other.
2. Updating a file:
 A file can be opened for updating using “r+” mode. A file be
updated by inserting a new record or we can delete an existing
record or we can modify the contents of file.
 After updating the file, the file has to be closed.
3. Retrieving from a file:
 It is the process of extracting useful information from a file.
 Information can be retrieved either for analysis purpose or for
generating various reports.
4. Maintaining a file:
 It involves re-organization of file contents without changing the
contents of the file. This is used only for structured changes.
Department of Computer Science, RLJIT
66
File organization:
 The method of storing files in memory in certain order, so that
the insertion or deletion or modification will be faster and
efficient. It also increase the performance of the system is called
file organization.
 Organization of records means the logical arrangement of
records in the file and not the physical layout of the file as
stored on a storage media.
 Following considerations should be kept in mind before
selecting an appropriate file organization method:
1. Rapid access to one or more records.
2. Efficient storage of records.
3. Using redundancy to ensure data integrity.
4. Ease of inserting/updating/deleting one or more records
without disrupting the speed of accessing record.

Department of Computer Science, RLJIT 67


1. Sequential Organization::
 Sequential file organization is the most basic way to organize a
large collection of records in a file
 A sequentially organized file stores the records in the order in
which they were entered.
 Sequential files can be read only sequentially, starting with the
first record in the file.

Department of Computer Science, RLJIT 68


Features:
 Records are written in the order in which they are entered.
 Records are read and written sequentially.
 Deletion or updation of one or more records calls for replacing the
original file with a new file that contains the desired changes.
 Records have the same size and the same field format.
 Records are sorted on a key value.
 Generally used for report generation or sequential reading.
Advantages:
 Simple and easy to Handle, no extra overheads involved
 Sequential files can be stored on magnetic disks as well as magnetic
tapes.
 Well suited for batch– oriented applications.
Disadvantages :
 Records can be read only sequentially. If ith record has to be read,
then all the i–1 records must be read.
 Does not support update operation. A new file has to be created and
the original file has to be replaced with the new file that contains the
desired changes Department of Computer Science, RLJIT 69
 Cannot be used for interactive applications.
2. Relative File Organization :
 If the records are of fixed length and we know the base address of the
file and the length of the record, then any record i can be accessed using
the following formula:
Address of ith record = baseaddress + (i–1) * recordlength

Consider the base address of a file is 1000 and each record occupies 20
bytes, then the address of the 5th record can be given as:
1000 + (5–1) * 20
= 1000 + 80
Department of Computer Science, RLJIT 70
= 1080
Features:
 Provides an effective way to access individual records.
 The record number represents the location of the record relative to the
beginning of the file
 Records in a relative file are of fixed length
 Relative files can be used for both random as well as sequential access
 Every location in the table either stores a record or is marked as FREE.
Advantages:
 Ease of processing. If the relative record number of the record that has to
be accessed is known, then the record can be accessed instantaneously.
 Random access of records makes access to relative files fast.
 Allows deletions and updations in the same file. New records can be
easily added in the free locations based on the relative record number.
 Well suited for interactive applications.
Disadvantages :
 Records can be of fixed length only
 For random access of records, the relative record number must be known
in advance.
 Use of relative files is restricted to disk devices
Department of Computer Science, RLJIT 71
3. Indexed Sequential File Organization:
 Records are of fixed length. Provides fast data retrieval.
 Index table stores the address of the records in the file.
 The ith entry in the index table points to the ith record of the file.
 Index table is read sequentially to find the address of the desired record,
a direct access is made to the address of the specified record in order to
access it randomly.
 Indexed sequential files perform well in situations where sequential
access as well as random access is made to the data.

Department of Computer Science, RLJIT 72


Advantages:
 The indices are small and can be searched quickly, allowing the database
to access only the records it needs.
 Supports applications that require both batch and interactive processing.
 Records can be accessed sequentially as well as randomly
 Updates the records in the same file.
Disadvantages :
 Indexed sequential files can be stored only on disks.
 Needs extra space and overhead to store indices
 Handling these files is more complicated than handling sequential files
 Supports only fixed length records

Department of Computer Science, RLJIT 73


Questions???
1. Define the following: i. Graph ii. Multi-graph iii. Graph with
self-edge iv. Complete graph.
2. What is graph? Give matrix and adjacency list representation of
graph. Show adjacency matrix and list for the below graph.

3. Write algorithm for BFS and DFS graph traversal methods. Or


What are the methods used for traversing a graph. Explain any
one with example and write function for same.
Department of Computer Science, RLJIT 74
Questions???
4. How an insertion sort works? Suppose an array A contains 8
elements as follows: 77, 33, 44, 11, 88, 22, 66, 55. Trace
insertion sort algorithm for sorting in ascending order. Or
Write a C function for insertion sort. Sort the following list
using insertion sort: 50, 30, 10, 70, 40, 20, 60
5. Arrange the following elements in ascending order using
RADIX SORT 151, 60, 875, 342, 12, 477, 689, 128, 15
6. Explain in detail about static and dynamic hashing.
7. Explain different types of HASH function with example.
8. Explain hashing and collision. Explain the methods to resolve
the collision? Explain linear probing with an example.
9. Briefly explain basic operations that can be performed on a file.
Explain indexed sequential file organization.
10. List and explain the methods used for file organization.
Department of Computer Science, RLJIT 75

You might also like