Unit - I

Data Structures and Algorithms UNIT-I Snapshots

Data Structure
Stack
Queues
Linked List
Trees
Application of Trees
Sets and Disjoint Set Union
Huffman Algorithm
Graphs
Depth First Search
Breadth First Search
Searching Techniques
Hashing

1.0 Introduction

This is the introductory unit on data structures. It introduces the fundamental data structures used in designing and developing computer programs of any kind. Programming and data processing require efficient algorithms for accessing data in main memory and on storage devices, and the efficiency of those algorithms is directly linked to the structure of the data being processed.

1.1 Objective

The objective of this lesson is to make the learner understand the fundamental concepts of Data Structure. This lesson makes you familiar with the concepts of stacks, queues, linked lists, trees and graphs. They are the most commonly used data structures.

1.2 Content

1.2.1 Data Structure


Data is represented by data values held temporarily in the program's data area or recorded permanently in a file. The different data values are related to each other, so they must be kept in an organized form. The program has to follow certain rules to access and process the structured data, and therefore a data structure can be summarized as follows:

data structure = organized data + allowed operations

Definition: - A Data Structure is a logical way of organizing data that considers not only the items stored, but also their relationship to each other.


For example, consider a single-dimensional array in the C language declared as follows:

    int a[5];

Here a holds 5 integer values. The elements run from a[0] to a[4], and each value is accessed by its index.

Types of Data Structure

Different data structures are used depending on need and convenience. There are two types of data structures:

a. Linear data structure.
b. Non-linear data structure.

Linear Data Structure

Data is stored in memory locations in some sequential order or by means of pointers. Elements stored in sequential memory locations form Arrays. Elements connected by means of pointers form Linked Lists, which are among the most commonly used data structures. The linear data structures are Arrays, Linked Lists, Stacks and Queues.

Non-Linear Data Structure

A non-linear data structure is one that represents a hierarchical or other non-sequential relationship between individual data items. Examples are Trees and Graphs.
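To make the array example from this section concrete, the following is a minimal sketch of index-based access in C. The function names fill_squares and sum are illustrative choices, not part of the original notes.

```c
#include <stddef.h>

/* Store i*i in a[i] for every index i from 0 to n-1. */
void fill_squares(int a[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);
}

/* Add up the elements by visiting each index from 0 to n-1. */
int sum(const int a[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

With int a[5], calling fill_squares(a, 5) touches exactly the elements a[0] to a[4]; there is no a[5].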

1.2.2 Stack
The most common form of data organization in computer programs is the ordered list or linear list. The stack is a linear data structure. Stacks are suitable for backtracking, that is, for returning to a previous state.

Definition: - A stack is an ordered collection of items in which insertion and deletion of items can be done at one end only, called the top. This means that the last element added to the stack is the first one to be removed, so stacks are called Last In First Out (LIFO) lists.

Example of stack

The most common example of a stack is a stack of dishes; another is a stack of folded towels.

Basic Stack Operations


A stack has two basic operations:

1. Push - adding an item to the stack.
2. Pop - removing an item from the stack.

Figure 1.1 Example of a Stack (elements A, B, C, D, E from bottom to top; Top points at E)

Note: Each time a new element is inserted into the stack, the stack pointer is incremented by one. Similarly, the pointer is decremented by one each time an element is deleted from the stack.

Since the stack supports insertion and deletion of items, it is called a dynamic data structure, that is, a constantly changing object. From the definition one can conclude that all operations take place at a single end, called the top. A new item may be added on top of the stack, or the item at the top may be removed. The pointer keeps track of the top element of the stack.

For example, if A, B, C, D and E are inserted into a stack in that order, the first element to be removed is E.

Stacks can also be represented using linked lists, in which each node is a collection of data and link information. Each node has two fields, called data and link. The data field contains the actual stack element and the link field points to the node containing the next item in the stack. The link field of the last node is zero. Figure 1.2 shows the data and link fields of five elements linked in a stack.

Figure 1.2 Elements of linked stack (Stack points to E, followed by D, C, B and A; the link of A is 0)
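The linked representation described above can be sketched in C as follows. This is only an illustration under assumptions: the names Node, push_node and pop_node and the use of return codes for error reporting are not from the original notes.

```c
#include <stdlib.h>

typedef char item_type;

/* One node of the linked stack: the element plus a link to the node below it. */
typedef struct node {
    item_type data;
    struct node *link;
} Node;

/* Push: allocate a node and make it the new top.
   Returns 0 on success, -1 if memory is exhausted. */
int push_node(Node **top, item_type item)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->data = item;
    n->link = *top;      /* the new node points at the old top */
    *top = n;
    return 0;
}

/* Pop: remove the top node and store its element in *item.
   Returns 0 on success, -1 if the stack is empty (underflow). */
int pop_node(Node **top, item_type *item)
{
    Node *n = *top;
    if (n == NULL)
        return -1;
    *item = n->data;
    *top = n->link;      /* the node below becomes the new top */
    free(n);
    return 0;
}
```

An empty stack is a NULL top pointer. Pushing A, B, C, D and E in that order and then popping returns E first, as in the example above.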

Implementation of stacks

To specify the basic operations of stacks, the programming details of their implementation are needed. We begin with the contiguous implementation, where the stack entries are stored in an array.

Declaration

For the contiguous implementation, set up an array that will hold the entries in the stack and a counter that indicates the number of entries. In the following code, MAXSTACK is a constant giving the maximum number of elements allowed in the stack and item_type is the type of the data stored in the stack. item_type depends on the application and can range from a single number or character to a large record with many fields.

    #define MAXSTACK 10

    typedef int boolean_type;        /* zero is false, non-zero is true */
    typedef char item_type;

    typedef struct stack_tag {
        int top;                     /* number of entries in the stack */
        item_type stack[MAXSTACK];
    } stack_type;

    boolean_type empty(stack_type *);
    boolean_type full(stack_type *);
    void initialize(stack_type *);
    void push(item_type, stack_type *);
    void pop(item_type *, stack_type *);

Push and Pop

A stack is implemented using the Push and Pop operations. These operations must be used with care: popping an item from an empty stack or pushing an item onto a full stack must not be done. Such attempts are regarded as errors fatal to the execution of the program, and therefore Boolean-valued functions must be written to check the stack status.

Algorithm for push

Procedure push[s, top, x, N]: this procedure inserts an element x on top of a stack s that can hold at most N elements.

1. [check for stack overflow] if top >= N then write (Stack overflow) return
2. [increment top] top = top + 1
3. [insert element] s[top] = x
4. [finished] return

The first step of the algorithm checks for an overflow condition. If such a condition exists, the insertion cannot be performed and an appropriate error message results.

    /* Push: push an item onto the stack */
    void push(item_type item, stack_type *stack_ptr)
    {
        if (stack_ptr->top >= MAXSTACK)
            error("stack is full");   /* error is assumed to be a user-supplied reporting routine */
        else
            stack_ptr->stack[stack_ptr->top++] = item;
    }

Algorithm for pop

Function pop[s, top]: this function removes the top element from a stack.

1. [check for underflow on stack] if top = 0 then write (Stack underflow) exit
2. [decrement pointer] top = top - 1
3. [return top element of stack] return s[top + 1]

The first step of this algorithm checks for an underflow condition. If it exists, an appropriate error message is displayed and the procedure terminates.

Other operations

    /* Empty: returns non-zero if the stack is empty */
    boolean_type empty(stack_type *stack_ptr)
    {
        return stack_ptr->top <= 0;
    }

    /* Full: returns non-zero if the stack is full */
    boolean_type full(stack_type *stack_ptr)
    {
        return stack_ptr->top >= MAXSTACK;
    }

The next function, initialize, sets a stack to be empty before it is first used in a program:

    /* Initialize: initialize the stack to be empty */
    void initialize(stack_type *stack_ptr)
    {
        stack_ptr->top = 0;
    }

The stack is represented using a single-dimensional array. The first or bottom element of the stack is stored at stack[0], the second at stack[1] and the ith at stack[i-1]. Associated with the array is an index variable, called top. In the C code above top holds the number of entries, so the top element is at stack[top-1] and top itself indexes the first free slot. The stack is empty if top is less than or equal to 0. Similarly, the stack is full when top reaches MAXSTACK, the maximum size of the stack.

Application of Stack

The main applications of stacks are recursion, compilation of infix expressions and stack machines.
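The text gives push in C but leaves pop as pseudocode. The following self-contained sketch shows one way the array stack could be completed. It departs from the original in one labeled respect: instead of calling an error routine, push and pop report overflow and underflow through their return values, which is an assumption made here so the example compiles on its own.

```c
#define MAXSTACK 10

typedef char item_type;

typedef struct stack_tag {
    int top;                       /* number of entries in the stack */
    item_type stack[MAXSTACK];
} stack_type;

void initialize(stack_type *s)  { s->top = 0; }
int empty(const stack_type *s)  { return s->top <= 0; }
int full(const stack_type *s)   { return s->top >= MAXSTACK; }

/* Push: returns 0 on success, -1 on overflow (stack left unchanged). */
int push(item_type item, stack_type *s)
{
    if (full(s))
        return -1;
    s->stack[s->top++] = item;     /* store, then advance top */
    return 0;
}

/* Pop: returns 0 and stores the top element in *item, or -1 on underflow. */
int pop(item_type *item, stack_type *s)
{
    if (empty(s))
        return -1;
    *item = s->stack[--s->top];    /* step back, then fetch */
    return 0;
}
```

Because top counts the entries, push stores at stack[top] and then increments, while pop decrements first and then reads, mirroring steps 2 and 3 of the two algorithms.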

1.2.3 Queues


In real life, a queue is a waiting line for a specific task, such as a line of people waiting to purchase tickets, where the first person in line is the first person served.

Definition: - A queue is an ordered list in which all insertions take place at one end, the rear, whereas all deletions take place at the other end, the front. Queues therefore work on the First In First Out (FIFO) principle.

The representation of a queue in sequential locations is more difficult than that of a stack. The simplest scheme employs a one-dimensional array and two variables, front and rear. In one common convention, front is one less than the position of the first element in the queue and rear is the position of the last element. (The QINSERT and QDELETE algorithms given later in this section instead let the front pointer index the first element directly.)

The terms front and rear are used in describing a linear list that implements a queue. The entry in a queue that is ready to be processed is the first entry, which will also be the first one to be removed from the queue. The position of the first entry is called the front of the queue. Similarly, the last entry in the queue is the one most recently added, at the rear (or tail) of the queue.

Figure 1.3 Queue with five elements

The above figure contains five elements, A1 to A5. A1 is at the front of the queue and A5 is at the rear. An element can be deleted only from the front of the queue. That is, when A1 is removed, A2 becomes the front of the queue.

Types of Queues

Three types of queues are described below:

i. Circular Queue
ii. Deque
iii. Priority Queue

Figure 1.4 Circular queue of capacity n-1 containing four elements J1, J2, J3 and J4 (front = 0; rear = 3)

Figure 1.4a Insertion into a circular queue (elements J0 to J4; front = 0; rear = 4)

Circular Queue

This technique essentially allows the queue to wrap around upon reaching the end of the array. Without wrap-around, a large amount of memory would be required to accommodate the elements, because positions freed at the front could not be reused.

A circular queue kept as a circular list may be specified by a single pointer p to the list: node(p) is the rear of the queue and the node following it is the front. To insert an element at the rear of a circular queue, the element is inserted after node(p), that is, at the front position of the circular list, and the list pointer is then advanced by one element so that the new element becomes the rear. The above figures show the circular queue structure.

Deques

A deque is a linear list in which elements can be added or removed at either end, but not in the middle (deque means double-ended queue). A deque is usually maintained in a circular array DEQUE with pointers Left and Right, which point to the two ends of the deque. The term circular comes from the fact that DEQUE[1] is taken to come after DEQUE[n] in the array. The two variations of the deque are:

Input restricted deque - allows insertion at only one end of the list, but allows deletion at both ends of the list.
Output restricted deque - allows deletion at only one end of the list, but allows insertion at both ends of the list.

Priority Queues

A priority queue is a collection of elements in which each element has been assigned a priority, and the order in which elements are deleted and processed follows the rules given below:

1. A higher priority element is processed before any lower priority element.


2. Two elements having the same priority are processed in the order in which they were added to the queue.

Basic Queue operations

A queue has three primitive operations, as follows:

Insert(q, x) - inserts an item x at the rear of the queue q. Conceptually there is no limit on the number of elements that can be inserted, although an array implementation is bounded by the size of the array.
Remove(q, x) - deletes the element at the front of the queue q and sets x to its contents. This operation can be applied only if the queue is not empty.
Empty(q) - returns true or false depending on whether or not the queue contains any elements. This operation is always applicable.
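As an illustration of the three primitive operations combined with the wrap-around idea of the circular queue, here is a minimal C sketch. It assumes the convention mentioned for Figure 1.4, where front is one position before the first element and one array slot is kept free, so an array of QSIZE slots holds at most QSIZE - 1 elements. The names queue_type, q_insert, q_remove and q_empty and the value of QSIZE are hypothetical.

```c
#define QSIZE 5   /* array size; capacity is QSIZE - 1 (assumed example value) */

typedef char item_type;

typedef struct {
    int front;                 /* one position before the first element */
    int rear;                  /* position of the last element */
    item_type items[QSIZE];
} queue_type;

void q_init(queue_type *q) { q->front = q->rear = 0; }

/* Empty(q): non-zero if the queue holds no elements. */
int q_empty(const queue_type *q) { return q->front == q->rear; }

/* Insert(q, x): returns 0 on success, -1 if the queue is full. */
int q_insert(queue_type *q, item_type x)
{
    int next = (q->rear + 1) % QSIZE;   /* wrap around at the end of the array */
    if (next == q->front)
        return -1;
    q->rear = next;
    q->items[q->rear] = x;
    return 0;
}

/* Remove(q, x): returns 0 and sets *x, or -1 if the queue is empty. */
int q_remove(queue_type *q, item_type *x)
{
    if (q_empty(q))
        return -1;
    q->front = (q->front + 1) % QSIZE;  /* advance to the first element */
    *x = q->items[q->front];
    return 0;
}
```

Keeping one slot free is a design choice that lets front == rear mean "empty" without a separate element counter.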

Algorithm to insert into a queue

QINSERT(Q, F, R, N, Y)

Q - queue
F & R - front and rear pointers
N - maximum number of elements
Y - element to be inserted

1. [Check for overflow] if R >= N then write (Overflow) return
2. [Increment rear pointer] R = R + 1
3. [Insert element] Q[R] = Y
4. [Set the front pointer properly] if F = 0 then F = 1

Algorithm to delete an element from a queue

QDELETE(Q, F, R)

1. [Check for underflow] if F = 0 then write (Underflow) return
2. [Delete element] Y = Q[F]
3. [Queue empty?] if F = R then F = R = 0 else F = F + 1
4. [Return element] return (Y)

Initially F = R = 0 is the status of the queue. As soon as an item is inserted, the rear pointer is incremented. The front pointer F is set to 1 during the first insertion. While deleting, the front pointer is incremented. If F = R before a deletion, the queue holds a single element; removing it leaves the queue empty, so both pointers are reset to 0.

Applications of Queues

The most common application of queues in computing is job scheduling. That is, in a time-sharing system, programs form a queue waiting to be executed. A line in a bank and cars forming a long line at a busy toll booth are some common examples of queues.
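The QINSERT and QDELETE steps translate almost line by line into C. The sketch below keeps the 1-based positions of the algorithm by leaving array slot 0 unused; the struct name Queue, the return codes and the value chosen for N are assumptions made for illustration.

```c
#define N 4   /* maximum number of elements (assumed value for illustration) */

typedef struct {
    int F, R;          /* front and rear positions; 0 means the queue is empty */
    char Q[N + 1];     /* slots 1..N are used, slot 0 is left unused */
} Queue;

void qinit(Queue *q) { q->F = q->R = 0; }

/* QINSERT: returns 0 on success, -1 on overflow. */
int qinsert(Queue *q, char y)
{
    if (q->R >= N)          /* step 1: check for overflow */
        return -1;
    q->R = q->R + 1;        /* step 2: increment rear pointer */
    q->Q[q->R] = y;         /* step 3: insert element */
    if (q->F == 0)          /* step 4: set the front pointer properly */
        q->F = 1;
    return 0;
}

/* QDELETE: returns 0 and stores the element in *y, or -1 on underflow. */
int qdelete(Queue *q, char *y)
{
    if (q->F == 0)          /* step 1: check for underflow */
        return -1;
    *y = q->Q[q->F];        /* step 2: delete element */
    if (q->F == q->R)       /* step 3: last element removed, reset */
        q->F = q->R = 0;
    else
        q->F = q->F + 1;
    return 0;
}
```

Note that this linear scheme reports overflow once R reaches N even if elements have already been removed from the front. That wasted space is what the circular queue avoids.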

1.2.4 Linked list


Linked lists are data structures that are widely used in real-world programming. A linked list can grow and shrink in size at any time. The definition of the linked list is given below.

Definition: - A linked list is a chain of structures in which each structure consists of data as well as pointers, which store the address (link) of the next logical structure in the list.

The major goal behind the linked list data structure is to eliminate the data movement associated with insertions into and deletions from the middle of a list. The advantage of the linked list is that it is not necessary to know the number of elements and allocate memory beforehand. One major disadvantage of an array, even a dynamically allocated one, is that it cannot grow in place; enlarging it means allocating a new block and possibly copying the contents. A linked list is a structure that leads to algorithms which minimize data movement as insertions and deletions occur in an ordered list. Each element, called a node, contains a data field and a pointer field (the pointer field contains an address).

Types of linked lists

The types of linked lists are given below:


i. Singly linked list: each node contains data and a single link, which attaches it to the next node in the list.
ii. Doubly linked list: each node contains data and two links, one to the previous node and one to the next node.
iii. Circular linked list: a linked list structure in which the last element points to the first element of the list is called a circularly linked list. The link field of the last element contains a pointer to the first element, so the last element no longer points to a NIL value.

Figure 1.5 Linear linked list (list points to a chain of nodes, each with an info field and a next field; the next field of the last node is null)

Basic operations on a linked list

A list is a dynamic structure, that is, the number of nodes in a list may vary dramatically as list operations are performed. The operations are given below:

1. Adding an element to a linked list
2. Removing an element from a linked list

Adding an element into the linked list

There are three ways of adding an element to a list: adding it at the head of the list, adding it at the tail of the list, and adding it after the current node. To add a new node at the front of a linked list, memory is first allocated for the node. If the list is empty, the new node becomes both the head and the tail. Otherwise the new node is linked to the old head and becomes the new head.

Algorithm to insert into a list at the beginning of the list

    if pHead = NULL then
        pNode->pNext = NULL
        pHead = pNode
        pTail = pNode
    else
        pNode->pNext = pHead
        pHead = pNode
    end if

where pHead is the head node, pTail is the tail node and pNode is the node to be inserted.

Algorithm to insert a node at the tail of the list

    if pHead != NULL then
        pTail->pNext = pNode
        pTail = pNode
        pNode->pNext = NULL
    end if

Algorithm to insert pNode after node pAfter

    pNode->pNext = pAfter->pNext
    pAfter->pNext = pNode
    if pNode->pNext = NULL then
        pTail = pNode

In the above algorithm, the new node pNode is inserted after the node pAfter. The next pointer of pAfter is assigned to pNode's next pointer. The next pointer of pAfter is then set to the new node. If pNode has no successor, it has become the new tail.

Removing an element from a linked list

To delete the element at the front of a linked list, keep a pointer to the head node, make the next element in the list the new head, and then release the old head node. Removing the element at the tail requires the preceding element: the tail node is saved, the tail pointer is set to the preceding element, whose link is set to NULL, and the memory used by the old tail node is released.

Deleting a node means making the link of the preceding node point to the node that follows the deleted node. It can be done in three ways:

Deleting a head node
Deleting a middle node
Deleting a tail node

    if pNode = pHead then
        pTemp = pHead
        pHead = pHead->pNext
        delete pTemp
    elsif pNode = pTail then
        pTemp = pTail
        pTail = previous(pTail)
        pTail->pNext = NULL
        delete pTemp
    else
        pTemp = previous(pNode)
        pTemp->pNext = pNode->pNext
        delete pNode
    end if

Here previous(x) denotes the node whose next pointer is x.

Application of Linked Lists

Applications of linked lists are as follows:


In directory structures
In operating systems, for process management
In data management in database systems
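The insertion and deletion algorithms of this section can be sketched in C for a singly linked list with head and tail pointers. The names List, new_node, insert_head, insert_tail, insert_after and delete_node are illustrative. Unlike the pseudocode, delete_node finds the preceding node by walking from the head, since a singly linked node has no link back to its predecessor.

```c
#include <stdlib.h>

typedef struct node {
    int data;
    struct node *pNext;
} Node;

typedef struct {
    Node *pHead;
    Node *pTail;
} List;

/* Allocate a detached node; returns NULL if memory is exhausted. */
Node *new_node(int data)
{
    Node *p = malloc(sizeof *p);
    if (p != NULL) {
        p->data = data;
        p->pNext = NULL;
    }
    return p;
}

/* Insert pNode at the head; it also becomes the tail of an empty list. */
void insert_head(List *l, Node *pNode)
{
    pNode->pNext = l->pHead;
    l->pHead = pNode;
    if (l->pTail == NULL)
        l->pTail = pNode;
}

/* Insert pNode at the tail; it also becomes the head of an empty list. */
void insert_tail(List *l, Node *pNode)
{
    pNode->pNext = NULL;
    if (l->pHead == NULL)
        l->pHead = pNode;
    else
        l->pTail->pNext = pNode;
    l->pTail = pNode;
}

/* Insert pNode after pAfter, updating the tail if pNode is now last. */
void insert_after(List *l, Node *pAfter, Node *pNode)
{
    pNode->pNext = pAfter->pNext;
    pAfter->pNext = pNode;
    if (pNode->pNext == NULL)
        l->pTail = pNode;
}

/* Unlink and free pNode, which must be a node of the list. */
void delete_node(List *l, Node *pNode)
{
    if (pNode == l->pHead) {
        l->pHead = pNode->pNext;
        if (l->pHead == NULL)
            l->pTail = NULL;
    } else {
        Node *prev = l->pHead;
        while (prev->pNext != pNode)    /* locate the preceding node */
            prev = prev->pNext;
        prev->pNext = pNode->pNext;
        if (pNode == l->pTail)
            l->pTail = prev;
    }
    free(pNode);
}
```

No data is moved by any of these operations; only a few pointers change, which is the advantage over an array that the section describes.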

1.2.5 Trees
A tree is a non-linear data structure. It is mainly used to represent data containing a hierarchical relationship between elements, for example records and family trees.

Definition: - A Tree is a data structure used to represent data containing a hierarchical relation between its elements.

Applications of Tree

For example, a tree can be used to represent the Unix file system, in which files and subdirectories are stored under directories. Another example is the representation of the records in a file, in which elementary items are stored under group items.

Basic structure of tree

In a tree each item is called a node. Each node contains data and pointers, which contain the addresses of other nodes. The first node in the tree is called the root. Each piece under a node is called a subtree. A node under which a subtree exists is called a parent node. A node that has no subtree is called a terminal node or leaf.

Level of root = 0
Level of any node = level of parent + 1

Figure 1.6 Sample tree

The tree in the above figure has four levels and thirteen nodes. The node at the topmost level, A, is called the root node. The number of subtrees of a node is called its degree. The degree of A is 3, of C is 1 and of G is 0. Nodes having zero degree are called leaf or terminal nodes. The terminal nodes are K, L, F, G, M, J and I; the remaining nodes are called nonterminals. The children of D are H, J and I, and the parent of D is A.

Types of trees

There are various types of trees, of which two are given below:

1. Unbalanced binary tree: - the heights of the two subtrees of some node differ by more than one.
2. Balanced binary tree: - the heights of the two subtrees of every node never differ by more than one.

Binary Trees

In this tree, no node has more than two children. Binary trees are special cases of general trees.

Figure 1.7 Two Sample Binary Trees

Definition: - A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees called the left and right subtrees.

Properties of binary tree

1. A binary tree with N internal nodes has a maximum of (N+1) external nodes.
2. The external path length of any binary tree with N internal nodes is 2N greater than the internal path length.
3. The height of a full binary tree with N internal nodes is about log2 N.

A binary tree is a useful data structure when two-way decisions must be made at each point in a process. For example, suppose we want to find all duplicates in a list of numbers. A binary tree can reduce the number of comparisons. The first number in the list is placed in a node that is established as the root of a binary tree. Each successive number in the list is then compared to the number in the root. If it matches, we have a duplicate. If it is smaller, the left subtree is examined; if it is larger, the right subtree is examined. If the subtree is empty, the number is not a duplicate and is placed into a new node at that position in the tree. If the subtree is nonempty, the number is compared to the contents of the root of the subtree and the entire process is repeated with the subtree.
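The duplicate-finding procedure just described can be sketched in C as follows. The names TNode, insert_or_find, free_tree and count_duplicates are hypothetical, and the code is a minimal illustration rather than a definitive implementation.

```c
#include <stdlib.h>

typedef struct tnode {
    int info;
    struct tnode *left, *right;
} TNode;

/* Look for x in the tree rooted at *root and insert it if it is absent.
   Returns 1 if x is a duplicate, 0 if it was inserted, -1 if out of memory. */
int insert_or_find(TNode **root, int x)
{
    while (*root != NULL) {
        if (x == (*root)->info)
            return 1;                       /* match: duplicate found */
        root = (x < (*root)->info) ? &(*root)->left : &(*root)->right;
    }
    *root = malloc(sizeof **root);          /* empty subtree: place x here */
    if (*root == NULL)
        return -1;
    (*root)->info = x;
    (*root)->left = (*root)->right = NULL;
    return 0;
}

/* Release every node of the tree. */
void free_tree(TNode *t)
{
    if (t != NULL) {
        free_tree(t->left);
        free_tree(t->right);
        free(t);
    }
}

/* Count the duplicates in list[0..n-1] by feeding each number through the tree. */
int count_duplicates(const int list[], int n)
{
    TNode *root = NULL;
    int dups = 0;
    for (int i = 0; i < n; i++)
        if (insert_or_find(&root, list[i]) == 1)
            dups++;
    free_tree(root);
    return dups;
}
```

Each number is compared only with the nodes on one path from the root, rather than with every number seen so far.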


Traversal of binary tree

Traversing a tree means visiting each node exactly once, for example to search for particular nodes. Let T be a binary tree. There are different ways to proceed, and the methods differ primarily in the order in which they visit the nodes. The different traversals of T are:

Inorder
Postorder
Preorder
Level by level traversal

Figure 1.8 Expression tree

This is an example of an expression tree for (A+B*C)-(D*E).

Inorder Traversal

The general strategy is left-root-right. If T is not empty:

1. Traverse the left subtree in inorder.
2. Visit the root.
3. Traverse the right subtree in inorder.

Postorder Traversal

In this traversal the strategy is left-right-root.

1. Traverse the left subtree in postorder.
2. Traverse the right subtree in postorder.
3. Visit the root.

Preorder Traversal

In this traversal the strategy is root-left-right. Preorder traversal is employed in depth first search.

1. Visit the root.
2. Traverse the left subtree in preorder.
3. Traverse the right subtree in preorder.

Level by level traversal

In this method we traverse level-wise: we first visit the root at level 0, then the nodes of the next level from left to right, then the level after that from left to right, and so on. This is the same as breadth first search. Unlike the other three traversals, this traversal is not naturally recursive. We may therefore use a queue to implement it, and a stack for the other three traversals.

For the expression tree of Figure 1.8 the traversals give:

Inorder         : A+B*C-D*E, that is, (A+B*C)-(D*E) once the parentheses are restored
Preorder        : -+A*BC*DE
Postorder       : ABC*+DE*-
Level by level  : -+*A*DEBC
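The four traversals can be sketched in C as below. Each routine appends the visited characters to a caller-supplied buffer and returns the new end of the string, so the caller writes the terminating '\0'. The names TNode, inorder, preorder, postorder and level_order and the MAXNODES bound are assumptions made for this illustration.

```c
#include <stddef.h>

typedef struct tnode {
    char info;
    struct tnode *left, *right;
} TNode;

/* Left-root-right. */
char *inorder(const TNode *t, char *out)
{
    if (t != NULL) {
        out = inorder(t->left, out);
        *out++ = t->info;
        out = inorder(t->right, out);
    }
    return out;
}

/* Root-left-right. */
char *preorder(const TNode *t, char *out)
{
    if (t != NULL) {
        *out++ = t->info;
        out = preorder(t->left, out);
        out = preorder(t->right, out);
    }
    return out;
}

/* Left-right-root. */
char *postorder(const TNode *t, char *out)
{
    if (t != NULL) {
        out = postorder(t->left, out);
        out = postorder(t->right, out);
        *out++ = t->info;
    }
    return out;
}

#define MAXNODES 32   /* the sketch assumes the tree has at most this many nodes */

/* Level by level, using an array as a queue of nodes waiting to be visited. */
char *level_order(const TNode *t, char *out)
{
    const TNode *queue[MAXNODES];
    int front = 0, rear = 0;
    if (t != NULL)
        queue[rear++] = t;
    while (front < rear) {
        const TNode *n = queue[front++];
        *out++ = n->info;
        if (n->left != NULL && rear < MAXNODES)
            queue[rear++] = n->left;
        if (n->right != NULL && rear < MAXNODES)
            queue[rear++] = n->right;
    }
    return out;
}
```

A typical call is *inorder(root, buf) = '\0'; after which buf holds the inorder string. The three recursive routines rely on the call stack, while level_order needs an explicit queue, which matches the remark above about stacks and queues.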

Many algorithms that use binary trees proceed in two phases. The first phase builds a binary tree and the second traverses the tree.

Applications of Trees

Trees have several applications. One important application is set representation. The implementation of set manipulation using trees is given below.

1.2.6 Sets and disjoint set unions


A tree can be used to represent a set. A set is a collection of elements. Two sets are said to be disjoint if they do not share any common element. This means that any element that belongs to one of the sets does not belong to the other set.

Let us consider three disjoint sets S1, S2 and S3. Let the elements belonging to these sets be as follows:

S1 = {a, g, h, i}
S2 = {b, e, j}
S3 = {c, d, f}

The figure shown below gives the tree representation of the sets S1, S2 and S3. Note that in each set the nodes are linked from the children to the parent and not vice versa.

Figure 1.9(a) Possible tree representation of sets S1, S2 and S3

Let us now have a look at the operations that can be performed on these sets, namely:

1. Disjoint set union: the union of two disjoint sets Si and Sj is written Si U Sj and contains all elements x such that x is in either Si or Sj. Thus, in our example, S1 U S2 = {a, g, h, i, b, e, j}. Since only disjoint sets are considered, it can be assumed that after the union of Si and Sj, the sets Si and Sj no longer exist independently; that is, they are replaced by Si U Sj in the collection of sets.
2. Find(i): this finds the set to which a particular element belongs. Thus, d is in set S3 and a is in set S1.

Union and Find Operations

The union of two trees can be obtained easily by making one tree a subtree of the other. Thus the two sets S1 and S2 from Figure 1.9(a) can be united, and S1 U S2 could then have one of the representations of Figure 1.10(b).

Figure 1.10(b) Possible representations of S1 U S2 (either the tree of S2 becomes a subtree of a, or the tree of S1 becomes a subtree of b)

Union can be accomplished easily if, with each set name, a pointer is kept to the root of the tree representing the set. If, in addition, each root has a pointer to the set name, then to determine which set an element is currently in, we follow parent links to the root of its tree and use the pointer to the set name. To unite sets Si and Sj, we unite the trees with roots FindPointer(Si) and FindPointer(Sj). FindPointer is a function that takes a set name and determines the root of the tree that represents it. This is done by an examination of the [set name, pointer] table. The operation Find(i) determines the root of the tree containing the element i. The function Union(i,j) requires that two trees with roots i and j be joined.
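The parent-link representation and the Find and Union operations can be sketched with a simple array, where parent[i] holds the parent of element i and a root holds -1. The array form, the names set_find, set_union and build_example, and the choice of a, b and c as the roots of S1, S2 and S3 are assumptions made for this illustration.

```c
#define MAXELEM 26   /* elements are the letters 'a'..'z', stored as 0..25 */

/* parent[i] is the parent of element i; a root has parent -1. */
static int parent[MAXELEM];

static int id(char c) { return c - 'a'; }

/* Make every element the root of its own one-element tree. */
void init_sets(void)
{
    for (int i = 0; i < MAXELEM; i++)
        parent[i] = -1;
}

/* Find(i): follow parent links up to the root of the tree containing i. */
int set_find(int i)
{
    while (parent[i] >= 0)
        i = parent[i];
    return i;
}

/* Union(i, j): i and j must be roots of different trees;
   the tree rooted at j becomes a subtree of i. */
void set_union(int i, int j)
{
    parent[j] = i;
}

/* Build S1 = {a,g,h,i}, S2 = {b,e,j}, S3 = {c,d,f} with roots a, b and c. */
void build_example(void)
{
    init_sets();
    parent[id('g')] = parent[id('h')] = parent[id('i')] = id('a');
    parent[id('e')] = parent[id('j')] = id('b');
    parent[id('d')] = parent[id('f')] = id('c');
}
```

After build_example(), set_find(id('d')) returns the root c, and after set_union(id('a'), id('b')) every element of S2 is found under the root a.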

1.2.7 Huffman Algorithm


The Huffman problem is that of finding the minimum-length bit string which can be used to encode a string of symbols. One application is text compression. Huffman's scheme uses a table of the frequency of occurrence of each symbol (or character) in the input. This table may be derived from the input itself or from data which is representative of the input. For instance, the frequency of occurrence of letters in normal English might be derived from processing a large number of text documents and then used for encoding all text documents.

A variable-length bit string is assigned to each character that unambiguously represents that character. This means that no character's encoding may be a prefix of the encoding of another character. If the characters to be encoded are arranged in a binary tree, an encoding for each character is found by following the tree from the root to the character in the leaf: the encoding is the string of symbols on each branch followed.

Figure 1.11 Encoding Tree

For example:

String   Encoding
TEA      10 00 010
SEA      011 00 010
TEN      10 00 110

As desired, the highest frequency letters, E and T, have two-digit encodings, whereas all the others have three-digit encodings. Encoding would be done with a lookup table.

A divide-and-conquer approach might lead to the question of which characters should appear in the left and right subtrees, and to trying to build the tree from the top down. As with the optimal binary search tree, this will lead to an exponential time algorithm.

Operation of the Huffman Algorithm

The following diagrams show how a Huffman encoding tree is built using a straightforward greedy algorithm which combines the two smallest-weight trees at every step.

Initial data sorted by frequency

Combine the two lowest frequencies, F and E, to form a sub-tree of weight 14. Move it into its correct place.

Again combine the two lowest frequencies, C and B, to form a sub-tree of weight 25. Move it into its correct place.

Now the sub-tree with weight 14 and D are combined to make a tree of weight 30. Move it to its correct place.

Now the two lowest weights are held by the "25" and "30" sub-trees, so combine them to make one of weight 55. Move it after the A.

Finally, combine the A and the "55" sub-tree to produce the final tree. The encoding table is:
Character   Encoding
A           0
C           100
B           101
F           1100
E           1101
D           111
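The greedy construction walked through above can be sketched in C. The individual character weights are not stated in the text, so the sketch assumes A = 45, B = 13, C = 12, D = 16, E = 9 and F = 5, which are consistent with the sub-tree weights 14, 25, 30 and 55 quoted in the steps. For brevity it finds the two lightest trees by a linear scan rather than with a heap, and it labels the lighter tree's branch 0 and the other branch 1; all names are hypothetical.

```c
#include <string.h>

#define NSYM 6                      /* number of characters to encode */
#define MAXNODES (2 * NSYM - 1)     /* leaves plus internal nodes */
#define MAXCODE 16                  /* room for the longest code plus '\0' */

typedef struct {
    int weight;
    int left, right;   /* child indices, -1 for a leaf */
    int in_forest;     /* non-zero while the node is still the root of a tree */
} HNode;

static HNode node[MAXNODES];

/* Index of the lightest tree still in the forest, or -1 if there is none. */
static int lightest(int count)
{
    int best = -1;
    for (int i = 0; i < count; i++)
        if (node[i].in_forest && (best < 0 || node[i].weight < node[best].weight))
            best = i;
    return best;
}

/* Walk the tree, recording the path (0 = left, 1 = right) to each leaf. */
static void assign(int n, char *path, int depth, char codes[][MAXCODE])
{
    if (node[n].left < 0) {              /* leaf: n is the symbol index */
        path[depth] = '\0';
        strcpy(codes[n], path);
        return;
    }
    path[depth] = '0';
    assign(node[n].left, path, depth + 1, codes);
    path[depth] = '1';
    assign(node[n].right, path, depth + 1, codes);
}

/* Build the tree greedily from NSYM weights and fill codes[i] for symbol i. */
void huffman(const int weight[NSYM], char codes[NSYM][MAXCODE])
{
    int count = NSYM;
    char path[MAXCODE];
    for (int i = 0; i < NSYM; i++)
        node[i] = (HNode){ weight[i], -1, -1, 1 };
    while (count < MAXNODES) {
        int a = lightest(count);         /* smallest weight */
        node[a].in_forest = 0;
        int b = lightest(count);         /* second smallest weight */
        node[b].in_forest = 0;
        node[count] = (HNode){ node[a].weight + node[b].weight, a, b, 1 };
        count++;
    }
    assign(MAXNODES - 1, path, 0, codes);   /* the last node created is the root */
}
```

With the assumed weights in the order A to F, this labeling convention produces the codes listed in the encoding table.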

A greedy approach places our n characters in n sub-trees and starts by combining the two least-weight nodes into a tree, which is assigned the sum of the two leaf node weights as the weight of its root node. The time complexity of the Huffman algorithm is O(n log n). Using a heap to store the weight of each tree, each iteration requires O(log n) time to determine the cheapest weight and insert the new weight. There are O(n) iterations, one for each item.

Decoding Huffman-Encoded Data

With these variable-length strings, it is not possible to break up an encoded string of bits into characters simply by counting bits; the decoding tree is used instead.


Figure 1.12 Huffman Decoding Algorithm

In the decoding procedure, starting with the first bit in the stream, successive bits are used to determine whether to go left or right in the decoding tree. When a leaf of the tree is reached, a character has been decoded and is placed onto the (uncompressed) output stream. The next bit in the input stream is the first bit of the next character.

Transmission and storage of Huffman-Encoded Data

If the system is continually dealing with data in which the symbols have similar frequencies of occurrence, then both encoders and decoders can use a standard encoding table and decoding tree. However, even text data from various sources will have quite different characteristics. For example, ordinary English text will generally have 'e' at the root of the tree, with short encodings for 'a' and 't', whereas C programs would generally have ';' at the root, with short encodings for other punctuation marks such as '(' and ')' (depending on the number and length of comments). If the data has varying frequencies then, for optimal encoding, an encoding tree has to be generated for each data set and stored or transmitted with the data. The extra cost of transmitting the encoding tree means that there is no overall benefit unless the data stream to be encoded is quite long, so that the savings through compression more than compensate for the cost of transmitting the encoding tree as well.
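The decoding procedure described in this section can be sketched in C. The decoding tree is kept in two small arrays and is built from an encoding table such as the one given earlier (A 0, C 100, B 101, F 1100, E 1101, D 111). The bits are represented as a string of '0' and '1' characters for readability; the array layout and the names tree_init, tree_add and decode are assumptions made for illustration.

```c
#include <stddef.h>

#define MAXNODES 32   /* the sketch assumes the decoding tree fits in this many nodes */

/* child[n][b] is the node reached from node n on bit b, or -1 if there is none. */
static int child[MAXNODES][2];
static char symbol[MAXNODES];   /* character stored at a leaf, 0 for other nodes */
static int nnodes;

static int new_tree_node(void)
{
    child[nnodes][0] = child[nnodes][1] = -1;
    symbol[nnodes] = 0;
    return nnodes++;
}

/* Start a fresh tree that consists of the root only (node 0). */
void tree_init(void)
{
    nnodes = 0;
    new_tree_node();
}

/* Add one table entry, for example tree_add('C', "100"). */
void tree_add(char ch, const char *code)
{
    int n = 0;
    for (; *code != '\0'; code++) {
        int b = *code - '0';
        if (child[n][b] < 0)
            child[n][b] = new_tree_node();
        n = child[n][b];
    }
    symbol[n] = ch;
}

/* Decode a string of '0' and '1' characters into out.
   Returns the number of characters decoded; stops early on invalid input. */
size_t decode(const char *bits, char *out)
{
    size_t count = 0;
    int n = 0;                              /* start at the root */
    for (; *bits == '0' || *bits == '1'; bits++) {
        n = child[n][*bits - '0'];          /* go left on 0, right on 1 */
        if (n < 0)
            break;                          /* bit sequence not in the table */
        if (symbol[n] != 0) {               /* reached a leaf */
            out[count++] = symbol[n];
            n = 0;                          /* the next bit starts a new character */
        }
    }
    out[count] = '\0';
    return count;
}
```

Because no code is a prefix of another, the decoder never has to look ahead: reaching a leaf is enough to know that a character has ended.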

1.2.8 Graphs
Definition: - A graph G consists of two sets, V and E. The set V is a finite, nonempty set of vertices. The set E is a set of pairs of vertices; these pairs are called edges. The notations V(G) and E(G) represent the sets of vertices and edges, respectively, of graph G.

Figure 1.13 Three Sample Graphs (a. G1, b. G2, c. G3)

1 1 1 1 1 1 1 a 11 1 1

B 5 5 5 5

22 E

1 F 1 1 3 3 3

G D C 4 4 4

Figure 1.14 An Undirected Graph (1) In the fig.1,2,3,4,5 are the nodes and A,B,C,D,E,F,G are the edges. Undirected graph: - The pair of vertices representing any edge is unordered. Thus the pairs (u,v) and (v,u) represent the same edge. Path is a sequence of distinct vertices each adjacent to the next. Cycle is a path containing at least three vertices such that the last vertex on the path is adjacent to the first. Connected A graph is connected if there is a path from any vertex to any other vertex. Free tree is defined as a connected undirected graph with no cycles. Directed Graph Each edge of graph G(V,E) is represented by a directed pair<u,v> u is the tail and v is the Head of the edge therefore <v,u> and <u,v> represent two different edges in a directed graph or digraph. E is a set of ordered pair of edges There are several kinds of directed graph: 1 1

Figure 1.15 A graph and its strongly and weakly connected components


Strongly connected: - For each pair of vertices v and w there is a directed path from v to w.
Weakly connected: - If the directions of the edges are suppressed, the resulting undirected graph is connected.
A complete undirected graph with n nodes has n(n-1)/2 edges.
A subgraph of G is a graph G' such that V(G') ⊆ V(G) and E(G') ⊆ E(G).
Multiple edges: - Distinct edges E and E' are called multiple edges if they connect the same end points, that is, E=[u,v] and E'=[u,v].
Loop: - An edge E is called a loop if it has identical start and end points, that is, E=[u,u].
Indegree of a node n is the number of edges having n as head.
Outdegree of a node n is the number of edges having n as tail.
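The definitions of indegree and outdegree, and the edge count of a complete graph, can be made concrete with a small C sketch. The names Edge, degrees and complete_graph_edges are hypothetical, as is the choice to number vertices from 0 and to store the graph as a plain array of <tail, head> pairs.

```c
#include <stddef.h>

/* A directed edge <tail, head>. */
struct Edge {
    int tail;
    int head;
};

/* For vertices numbered 0..n-1, count how many edges have each vertex
   as head (indegree) and as tail (outdegree). */
void degrees(const struct Edge *edges, size_t edge_count,
             int n, int *indegree, int *outdegree)
{
    for (int v = 0; v < n; ++v) {
        indegree[v] = 0;
        outdegree[v] = 0;
    }
    for (size_t k = 0; k < edge_count; ++k) {
        outdegree[edges[k].tail]++;   /* the edge leaves its tail */
        indegree[edges[k].head]++;    /* the edge enters its head */
    }
}

/* Number of edges in a complete undirected graph on n nodes. */
int complete_graph_edges(int n)
{
    return n * (n - 1) / 2;
}
```

Note that a loop <u,u> adds one to both the indegree and the outdegree of u, since u is both its head and its tail.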


Figure 1.16 A Directed Graph

Application of Graph

The most common example of a graph is a collection of cities and the roads between them. Graphs are most frequently applied in areas such as electrical circuits, transportation networks, chemical structures of crystals and artificial intelligence.

Graph Representation

There are three commonly used graph representations:
Adjacency Matrices
Adjacency Multi-lists
Adjacency Lists


Figure 1.17 Graph and Adjacency Matrix

Adjacency Matrix

The adjacency matrix of G is a two-dimensional n X n array, say A, with the property that A(i,j)=1 if the edge (i,j) (<i,j> for a directed graph) is in E(G), and A(i,j)=0 if there is no such edge in G. Consider the adjacency matrices for the graphs G1, G2 and G3 in Figure 1.13. The adjacency matrix for an undirected graph is symmetric; for a directed graph it need not be symmetric. The space needed to represent a graph using its adjacency matrix is $n^2$ bits. For an undirected graph, the degree of any vertex i is its row sum $\sum_{j=1}^{n} A(i,j)$. For a directed graph, the row sum is the out-degree while the column sum is the in-degree. The table below shows the adjacency matrix for the graph of Figure 1.14.

Vertex  1  2  3  4  5
1       0  1  1  0  1
2       1  0  1  0  0
3       1  1  0  1  1
4       0  0  1  0  1
5       1  0  1  1  0

The leading diagonal will always be 0.

Figure 1.18 The adjacency matrix for the graph of Figure 1.14

Adjacency Multi-list

In the adjacency list representation of an undirected graph, each edge (vi,vj) is represented by two entries, one on the list for vi and the other on the list for vj. In the multi-list representation, the adjacency lists are maintained as multilists: for each edge there is exactly one node, but this node is on two lists, that is, the adjacency lists of the two vertices it is incident to. The new node structure is:

M | vertex 1 | vertex 2 | list 1 | list 2

Where M is a one-bit mark field that may be used to indicate whether or not the edge has been examined. The storage requirements are the same as for normal adjacency lists except for the addition of the mark bit M.

Adjacency Lists

In this representation, the n rows of the adjacency matrix are represented as n linked lists. There is one list for each vertex in G. Each node has at least two fields, Vertex and Link. The Vertex fields of the nodes on list i contain the indices of the vertices adjacent to vertex i.

The adjacency list structure for a graph on the vertices 1 to 5 has one list for each of the vertices 1, 2, 3, 4 and 5.

Figure 1.18 Adjacency list structure for graph
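The relation between the two representations can be sketched in C using the adjacency matrix given earlier for the graph of Figure 1.14. The names N, A, degree, AdjNode, build_list and free_list are hypothetical; the code stores vertex k in row and column k - 1 because C arrays start at 0.

```c
#include <stdlib.h>

#define N 5

/* The adjacency matrix given for the graph of Figure 1.14.
   Row and column k stand for vertex k + 1. */
static const int A[N][N] = {
    { 0, 1, 1, 0, 1 },
    { 1, 0, 1, 0, 0 },
    { 1, 1, 0, 1, 1 },
    { 0, 0, 1, 0, 1 },
    { 1, 0, 1, 1, 0 }
};

/* Degree of vertex i + 1 in an undirected graph: the sum of row i. */
int degree(int i)
{
    int sum = 0;
    for (int j = 0; j < N; ++j)
        sum += A[i][j];
    return sum;
}

/* One node of an adjacency list: an adjacent vertex and a link. */
struct AdjNode {
    int vertex;
    struct AdjNode *link;
};

void free_list(struct AdjNode *head)
{
    while (head != NULL) {
        struct AdjNode *next = head->link;
        free(head);
        head = next;
    }
}

/* Build the adjacency list for vertex i + 1 from row i of the matrix.
   Returns NULL for an isolated vertex or if allocation fails. */
struct AdjNode *build_list(int i)
{
    struct AdjNode *head = NULL;
    /* Scan the row backwards so the list ends up in ascending order. */
    for (int j = N - 1; j >= 0; --j) {
        if (A[i][j]) {
            struct AdjNode *node = malloc(sizeof *node);
            if (node == NULL) {
                free_list(head);
                return NULL;
            }
            node->vertex = j + 1;
            node->link = head;
            head = node;
        }
    }
    return head;
}
```

For this matrix the row sums are 3, 2, 4, 2 and 3, which add up to 14, twice the number of edges, because every undirected edge appears in two rows. The list built for vertex 1 contains the vertices 2, 3 and 5.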

1.2.9 Graph Traversal


Many practical problems on graphs require one to systematically examine the nodes and edges of a graph G. There are many possible methods for visiting the vertices of a graph. The two standard methods are:
Breadth first search, which uses a queue as an auxiliary structure to hold nodes for future processing.
Depth first search, which uses a stack, pushing all unvisited vertices adjacent to the one being visited onto the stack and popping the stack to find the next vertex to visit.

Breadth first search

This method can be used for both directed and undirected graphs. It begins at a given node (the starting node) and then proceeds to all the nodes directly connected to that node. The steps are as follows:
1. Begin with any node and mark it as visited.
2. Proceed to the next node having an edge connection to the node in step 1 and mark it as visited.
3. Come back to the node in step 1, descend along an edge toward an unvisited node and mark the new node as visited.
4. Repeat step 3 until all the nodes adjacent to the node in step 1 have been marked as visited.
5. Repeat steps 1 through 4 starting from the node visited in step 2, then starting from the nodes visited in step 3, in the order visited. Keep this up as long as possible before starting a new scan.

When the graph is a tree, this algorithm visits the root node, then all the nodes at level 1, then all the nodes at level 2, and so on. A queue is a convenient structure to keep track of the nodes that have been visited during the search. Once a node is visited it is entered into the queue to wait for its children to be visited. When the search of one node's children is over, the node currently at the front of the queue is removed and all its children are visited.

Visit 1: The starting node 1 has two children, 2 and 3. Nodes 2 and 3 are added to the queue.
Visit 2: Delete node 2 from the queue. Node 2 has children 4 and 5, so 4 and 5 are added to the queue.
Visit 3: Pop 3. Node 3 has child 5, so 5 is added to the queue.
Visit 4: Pop 4. The nodes 6 and 7 are added to the queue.
Visit 5: Pop 5. The node 8 is added to the queue.
Pop 5 again: it is already visited.
Visit 6: Pop 6. The node 9 is added to the queue.
Visit 7: Pop 7.
Visit 8: Pop 8.
Visit 9: Pop 9.

The result: 1,2,3,4,5,6,7,8,9.
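The queue-based search traced above can be sketched in C. The edge set (1-2, 1-3, 2-4, 2-5, 3-5, 4-6, 4-7, 5-8, 6-9) is an assumption inferred from the traversal traces in this section, and the names MAXV, neighbors and bfs are hypothetical. Unlike the trace, this sketch marks a node as visited when it is added to the queue, so node 5 is queued only once.

```c
#define MAXV 10   /* vertices are numbered 1..9; index 0 is unused */

/* neighbors[v] lists the vertices adjacent to v, terminated by 0.
   Assumed edge set, inferred from the traces in the text. */
static const int neighbors[MAXV][4] = {
    { 0 },
    { 2, 3, 0 },
    { 1, 4, 5, 0 },
    { 1, 5, 0 },
    { 2, 6, 7, 0 },
    { 2, 8, 3, 0 },
    { 4, 9, 0 },
    { 4, 0 },
    { 5, 0 },
    { 6, 0 }
};

/* Breadth first search from start. Writes the vertices to order[]
   in the order they are visited and returns how many were visited.
   order[] must have room for MAXV entries. */
int bfs(int start, int order[])
{
    int visited[MAXV] = { 0 };
    int queue[MAXV];              /* each vertex is queued at most once */
    int front = 0, rear = 0, count = 0;

    visited[start] = 1;
    queue[rear++] = start;
    while (front < rear) {
        int u = queue[front++];   /* remove the node at the front */
        order[count++] = u;
        for (int k = 0; neighbors[u][k] != 0; ++k) {
            int w = neighbors[u][k];
            if (!visited[w]) {    /* w is unexplored: queue it */
                visited[w] = 1;
                queue[rear++] = w;
            }
        }
    }
    return count;
}
```

Calling bfs(1, order) on this graph visits the nine vertices level by level, in the order 1, 2, 3, 4, 5, 6, 7, 8, 9.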

Theorem: Algorithm BFS visits all vertices reachable from v.

Algorithm BFS(v)
// A breadth first search of G is carried out
// beginning at vertex v. For any node i,
// visited[i]=1 if i has already been visited. The
// graph G and array visited[] are global;
// visited[] is initialized to zero.
{
    u:=v; // q is a queue of unexplored vertices.
    visited[v]:=1;
    repeat
    {
        for all vertices w adjacent from u do
        {
            if (visited[w]=0) then
            {
                Add w to q; // w is unexplored.
                visited[w]:=1;
            }
        }
        if q is empty then return; // No unexplored vertex.
        Delete u from q; // Get first unexplored vertex.
    } until(false);
}

Depth First search

The main logic of this algorithm is analogous to the preorder traversal of a tree. It searches progressively deeper in a recursive manner, hence the name depth first search. The search begins at the starting node: first we examine the starting node, then each node along a path p that begins there; that is, we process a neighbor of the starting node, then a neighbor of that neighbor, and so on. After coming to a dead end, that is, to the end of the path p, we backtrack on p until we can continue along another path, and so on.

Push 1, 2, 4, 6, 9 to the stack; visit 1, 2, 4, 6, 9.
Pop 9: node 9 has no unprocessed adjacent nodes.
Pop 6: node 6 has no unprocessed adjacent nodes.
Pop 4: node 4 has the unprocessed adjacent node 7; push 4 and 7; visit 7.
Pop 7: node 7 has no unprocessed adjacent nodes.
Pop 4: the node is already processed.
Pop 2: node 2 has the unprocessed adjacent node 5; 2 and 5 are pushed onto the stack; visit 5.
Pop 5: node 5 has the unprocessed adjacent node 8, so 5 and 8 are pushed; visit 8.
Pop 8: node 8 has no unprocessed adjacent nodes.
Pop 5: node 5 has the unprocessed adjacent node 3; push 5 and 3 to the stack; visit 3.
Pop 3, 5, 2, 1: no unprocessed adjacent nodes.

Result: 1,2,4,6,9,7,5,8,3.

The algorithm for Depth First search

Algorithm DFS(G, v)
// v and w are vertices of G.
// DFS calls itself recursively.
{
    Mark v as discovered.
    For each vertex w such that edge vw belongs to G
    {
        If w is undiscovered then
            DFS(G, w); // Explore vw, visit w, explore from there as much as possible, and backtrack from w to v.
        Else
            Check vw without visiting w.
    }
    Mark v as finished.
}

The details of the depth first search technique are available in the 5th chapter.
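A recursive C sketch of the depth first search algorithm follows; the recursion uses the call stack in place of an explicit stack. The edge set and the order in which each node's neighbors are listed (in particular 8 before 3 for node 5) are assumptions, chosen so that the sketch reproduces the result 1,2,4,6,9,7,5,8,3 given above. The names MAXV, neighbors, discovered, dfs_visit and dfs are hypothetical.

```c
#define MAXV 10   /* vertices are numbered 1..9; index 0 is unused */

/* neighbors[v] lists the vertices adjacent to v, terminated by 0.
   Assumed edge set and neighbor order, inferred from the traces. */
static const int neighbors[MAXV][4] = {
    { 0 },
    { 2, 3, 0 },
    { 1, 4, 5, 0 },
    { 1, 5, 0 },
    { 2, 6, 7, 0 },
    { 2, 8, 3, 0 },
    { 4, 9, 0 },
    { 4, 0 },
    { 5, 0 },
    { 6, 0 }
};

static int discovered[MAXV];

static void dfs_visit(int v, int order[], int *count)
{
    discovered[v] = 1;                    /* mark v as discovered */
    order[(*count)++] = v;                /* visit v */
    for (int k = 0; neighbors[v][k] != 0; ++k) {
        int w = neighbors[v][k];
        if (!discovered[w])
            dfs_visit(w, order, count);   /* explore vw, then backtrack to v */
        /* otherwise edge vw is only checked; w is not visited again */
    }
    /* v is finished when the loop ends */
}

/* Depth first search from start. Writes the vertices to order[]
   in the order they are visited and returns how many were visited.
   order[] must have room for MAXV entries. */
int dfs(int start, int order[])
{
    int count = 0;
    for (int v = 0; v < MAXV; ++v)
        discovered[v] = 0;
    dfs_visit(start, order, &count);
    return count;
}
```

Listing node 5's neighbors in ascending order instead (2, 3, 8) would give 1,2,4,6,9,7,5,3,8, which is an equally valid depth first order; the visiting order depends on the order in which adjacent vertices are examined.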

1.2.10 Searching Techniques

Hashing

Hashing is a searching technique that is often used to implement a dictionary. Hashing is one type of address generating technique.
Definition: - Hashing is a search technique in which the elements are ordered with respect to some function of the key value. This function is called a hash function.



Consider student register numbers in the range 00000 to 99999; creating an array with one position for every possible number is impractical. Instead, an array of 100 positions (0 to 99) is used and the position is computed with the mod function. For example: 71325 mod 100 = 25 (address). This is a method of accessing the element. The output of the hash function tells where to search for a particular element; it decides where the element is stored in the array. A hashing function acts upon a given key to return the relative position in the list where the key is expected to be found.

Hash function: - A function that transforms a key into a table index.
Hash of key: - If h is a hash function and key is a key, h(key) is called the hash of key and is the index at which a record with the key key should be placed.
Hash collision or hash clash: - Two records hashing to the same position.

Collisions

For example, the student register numbers 02145 and 82145 hash to the same address, that is, 45. This is a collision. It is impossible to avoid all collisions, but they can be minimized by spreading the elements uniformly throughout the array. The basic methods of dealing with a hash clash are given below.

Chaining

Chaining builds a linked list of all items whose keys hash to the same value. During a search, this short linked list is traversed sequentially for the desired key. This technique involves adding an extra link field to each table position.

Linear Probing

This method is simple to implement. When a collision occurs, we proceed through the table in sequential order until an empty position is found (that is, the colliding element is stored in the next available space). One problem with this method is a situation called clustering: after a number of collisions have been resolved, the distribution of records in the array becomes less and less uniform. The result of clustering is inconsistent efficiency of insertion and selection operations. The probe location is h + i (mod hashsize) for i = 1, 2, ...; the increment function is i.

Quadratic Probing



In this method, if there is a collision at hash address h, the table is probed at locations h+1, h+4, h+9, ..., that is, at location $h + i^2$ (mod hashsize) for i = 1, 2, ...; the increment function is $i^2$. This method substantially reduces clustering, but it does not necessarily examine every free slot of the array.

Rehashing

Rehashing involves using a secondary hash function on the hash key of the item. The rehash function is applied successively until an empty position is found where the item can be inserted. If the hash position of the item is found to be occupied by a different item during a search, the rehash function is again used to locate the item.

Random Probing

This final method uses a pseudorandom number generator to obtain the increment to the hash address. This method is excellent for eliminating clustering, but it is slower.
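The mod 100 hash function together with linear and quadratic probing can be sketched in C as below. This is an illustration only: the names HASHSIZE, EMPTY, table_init, hash, hash_insert, hash_search and the two increment functions are hypothetical, keys are assumed to be non-negative integers, and -1 is assumed to be free for marking empty positions. Register numbers are written without leading zeros (2145 rather than 02145) because a leading zero makes a C integer literal octal.

```c
#define HASHSIZE 100
#define EMPTY (-1)          /* assumes keys are never negative */

static int table[HASHSIZE];

void table_init(void)
{
    for (int i = 0; i < HASHSIZE; ++i)
        table[i] = EMPTY;
}

/* The hash function from the text: key mod 100. */
int hash(int key)
{
    return key % HASHSIZE;
}

/* Increment functions: the i-th probe looks at h + increment(i). */
typedef int (*increment_fn)(int i);

int linear_increment(int i)    { return i; }
int quadratic_increment(int i) { return i * i; }

/* Store key at the first empty probe location.
   Returns the position used, or -1 if no empty slot was found. */
int hash_insert(int key, increment_fn increment)
{
    int h = hash(key);
    for (int i = 0; i < HASHSIZE; ++i) {
        int pos = (h + increment(i)) % HASHSIZE;
        if (table[pos] == EMPTY) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Follow the same probe sequence to find key.
   Returns its position, or -1 if it is not in the table. */
int hash_search(int key, increment_fn increment)
{
    int h = hash(key);
    for (int i = 0; i < HASHSIZE; ++i) {
        int pos = (h + increment(i)) % HASHSIZE;
        if (table[pos] == EMPTY)
            return -1;      /* an empty slot ends the probe sequence */
        if (table[pos] == key)
            return pos;
    }
    return -1;
}
```

Inserting 2145, 82145 and 145, which all hash to 45, places them at 45, 46 and 47 with linear_increment, and at 45, 46 and 49 with quadratic_increment (offsets 0, 1 and 4). A search must use the same increment function as the insertions so that it follows the same probe sequence.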


1.3

Revision Points

Data Structure
A data structure is a logical method of representing data in memory.

Stack
A stack is a data structure in which insertions and deletions are restricted to one end. A stack, in other words, is called a LIFO (Last In First Out) list.

Queue
A queue is a data structure in which insertions are made at one end (called the rear) and deletions are made at the other end (called the front). It is also called a FIFO (First In First Out) list.

Tree
A tree is a data structure used to represent data containing a hierarchical relation between its elements.

Binary Tree
A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees called the left and right subtrees.

Huffman Algorithm
The Huffman problem is that of finding the minimum length bit string which can be used to encode a string of symbols.

Graph
A Graph G consists of a set of vertices (V) and a set of edges (E). V is finite and non-empty. E is a set of pairs of vertices called edges. V(G) represents the vertices and E(G) represents the edges.

Hashing
Hashing is one type of address generating technique. The elements are ordered with respect to some function of the key value. This function is called a hash function.

1.4

Intext Questions
1. Define the data structure.
2. List the types of data structure.
3. Define a stack.
4. Describe the stack operations in detail.
5. Define a queue.
6. List the types of queues.
7. Describe the queue operations in detail.

1.5

Summary
A data structure is a way of organizing data that considers not only the items stored, but also their relationship to each other.

A stack is an ordered collection of items into which new items may be inserted and from which items may be deleted at one end, called the top of the stack. A stack is a container that follows the LIFO principle.

A queue is an ordered list in which all insertions take place at one end, the rear, and all deletions take place at the other end, the front. A queue is a container that follows the FIFO principle.

A linked list is a chain of structures in which each structure consists of data as well as a pointer, which stores the address of the next logical structure in the list. A linked list can grow and shrink dynamically.

A tree is a data structure used to represent data containing a hierarchical relation between its elements.

A binary tree is a finite set of nodes that is either empty or consists of a root and two disjoint binary trees called the left and right subtrees.

A directed graph has each edge represented by a directed pair <u,v>, u being the tail and v the head of the edge.

Breadth first search uses a queue as an auxiliary structure to hold nodes for future processing.

Depth first search uses a stack, pushing all unvisited vertices adjacent to the one being visited onto the stack and popping the stack to find the next vertex to visit.

Hashing is one type of address generating technique. The elements are ordered with respect to some function of the key value. This function is called a hash function.

1.6

Terminal Exercises
1. Define a linked list.
2. What are the types of linked lists?
3. Explain the linked list operations with procedure in detail.
4. Define a Binary Tree.
5. Define a Digraph.
6. Explain graph representation in detail.
7. Explain the two graph traversal methods with algorithms in detail.
8. What does hashing mean?

1.7

Supplementary Materials
1. Ellis Horowitz, Sartaj Sahni, Fundamentals of Computer Algorithms, Galgotia Publications, 1997.
2. Aho, Hopcroft, Ullman, Data Structures and Algorithms, Addison Wesley, 1987.
3. Sara Baase, Computer Algorithms: Introduction to Design and Analysis, Addison Wesley, 1989.
4. Jean-Paul Tremblay & Paul G. Sorenson, An Introduction to Data Structures with Applications, McGraw-Hill, 1984.

1.8

Assignments
1. Explain the array implementation of Stack.
2. Discuss in detail about the various hashing functions.

1.9

Suggested Reading/Reference Books/Set Books

1. Mark Allen Weiss, Data Structures and Algorithm Analysis in C++, Addison Wesley, 1999.
2. Yedidyah Langsam, Moshe J. Augenstein, Aaron M. Tenenbaum, Data Structures Using C and C++, Prentice-Hall, 1997.

1.10 Learning Activities


The following tasks can be performed by a group of students:
1. Explain the Push and Pop operations.
2. Perform any one of the applications of stack.

1.11 Keywords
Data Structure
Tree
Binary Tree
Stack
Queue
Last in First out (LIFO)
First in First out (FIFO)
